| Class and Description |
|---|
| org.bytedeco.cuda.nvml.nvmlEccErrorCounts_t
Different GPU families can have different memory error counters.
See \ref nvmlDeviceGetMemoryErrorCounter
|
| Field and Description |
|---|
| org.bytedeco.cuda.global.cudart.cudaDeviceBlockingSync
This flag was deprecated as of CUDA 4.0 and
replaced with ::cudaDeviceScheduleBlockingSync.
|
| org.bytedeco.cuda.global.nvml.nvmlClocksThrottleReasonUserDefinedClocks
Renamed to \ref nvmlClocksThrottleReasonApplicationsClocksSetting
as the name describes the situation more accurately.
|
| Method and Description |
|---|
| org.bytedeco.cuda.global.cudart.cuCtxAttach(CUctx_st, int)
Note that this function is deprecated and should not be used.
Increments the usage count of the context and passes back a context handle
in \p *pctx that must be passed to ::cuCtxDetach() when the application is
done with the context. ::cuCtxAttach() fails if there is no context current
to the thread.
Currently, the \p flags parameter must be 0.
|
| org.bytedeco.cuda.global.cudart.cuCtxDetach(CUctx_st)
Note that this function is deprecated and should not be used.
Decrements the usage count of the context \p ctx, and destroys the context
if the usage count goes to 0. The context must be a handle that was passed
back by ::cuCtxCreate() or ::cuCtxAttach(), and must be current to the
calling thread.
|
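The attach/detach pair above is plain reference counting on a context: ::cuCtxAttach() increments a usage count and ::cuCtxDetach() decrements it, destroying the context when it reaches zero. A minimal sketch of those semantics in plain Java (the `ContextRefCount` class is hypothetical and makes no actual CUDA calls):

```java
// Illustrative model of cuCtxAttach/cuCtxDetach usage-count semantics.
// Hypothetical class; not part of the bytedeco bindings.
final class ContextRefCount {
    private int usageCount = 1; // cuCtxCreate() leaves the context with count 1

    // Mirrors cuCtxAttach: increments the usage count, hands back the handle.
    ContextRefCount attach() {
        usageCount++;
        return this;
    }

    // Mirrors cuCtxDetach: decrements the count; context destroyed at zero.
    boolean detach() {
        usageCount--;
        return usageCount == 0; // true => context destroyed
    }

    int usageCount() { return usageCount; }
}
```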
| org.bytedeco.cuda.global.cudart.cudaGLMapBufferObject(Pointer, int) |
| org.bytedeco.cuda.global.cudart.cudaGLMapBufferObject(PointerPointer, int)
This function is deprecated as of CUDA 3.0.
Maps the buffer object of ID \p bufObj into the address space of
CUDA and returns in \p *devPtr the base pointer of the resulting
mapping. The buffer must have previously been registered by
calling ::cudaGLRegisterBufferObject(). While a buffer is mapped
by CUDA, any OpenGL operation which references the buffer will
result in undefined behavior. The OpenGL context used to create
the buffer, or another context from the same share group, must be
bound to the current thread when this is called.
All streams in the current thread are synchronized with the current
GL context.
|
| org.bytedeco.cuda.global.cudart.cudaGLMapBufferObjectAsync(Pointer, int, CUstream_st) |
| org.bytedeco.cuda.global.cudart.cudaGLMapBufferObjectAsync(PointerPointer, int, CUstream_st)
This function is deprecated as of CUDA 3.0.
Maps the buffer object of ID \p bufObj into the address space of
CUDA and returns in \p *devPtr the base pointer of the resulting
mapping. The buffer must have previously been registered by
calling ::cudaGLRegisterBufferObject(). While a buffer is mapped
by CUDA, any OpenGL operation which references the buffer will
result in undefined behavior. The OpenGL context used to create
the buffer, or another context from the same share group, must be
bound to the current thread when this is called.
Stream \p stream is synchronized with the current GL context.
|
| org.bytedeco.cuda.global.cudart.cudaGLRegisterBufferObject(int)
This function is deprecated as of CUDA 3.0.
Registers the buffer object of ID \p bufObj for access by
CUDA. This function must be called before CUDA can map the buffer
object. The OpenGL context used to create the buffer, or another
context from the same share group, must be bound to the current
thread when this is called.
|
| org.bytedeco.cuda.global.cudart.cudaGLSetBufferObjectMapFlags(int, int)
This function is deprecated as of CUDA 3.0.
Set flags for mapping the OpenGL buffer \p bufObj.
Changes to flags will take effect the next time \p bufObj is mapped.
The \p flags argument may be any of the following:
- ::cudaGLMapFlagsNone: Specifies no hints about how this buffer will
be used. It is therefore assumed that this buffer will be read from and
written to by CUDA kernels. This is the default value.
- ::cudaGLMapFlagsReadOnly: Specifies that CUDA kernels which access this
buffer will not write to the buffer.
- ::cudaGLMapFlagsWriteDiscard: Specifies that CUDA kernels which access
this buffer will not read from the buffer and will write over the
entire contents of the buffer, so none of the data previously stored in
the buffer will be preserved.
If \p bufObj has not been registered for use with CUDA, then
::cudaErrorInvalidResourceHandle is returned. If \p bufObj is presently
mapped for access by CUDA, then ::cudaErrorUnknown is returned.
|
| org.bytedeco.cuda.global.cudart.cudaGLSetGLDevice(int)
This function is deprecated as of CUDA 5.0 and should no longer be used. It is
no longer necessary to associate a CUDA device with an OpenGL
context in order to achieve maximum interoperability performance.
This function will immediately initialize the primary context on
\p device if needed.
|
| org.bytedeco.cuda.global.cudart.cudaGLUnmapBufferObject(int)
This function is deprecated as of CUDA 3.0.
Unmaps the buffer object of ID \p bufObj for access by CUDA. When
a buffer is unmapped, the base address returned by
::cudaGLMapBufferObject() is invalid and subsequent references to
the address result in undefined behavior. The OpenGL context used
to create the buffer, or another context from the same share group,
must be bound to the current thread when this is called.
All streams in the current thread are synchronized with the current
GL context.
|
| org.bytedeco.cuda.global.cudart.cudaGLUnmapBufferObjectAsync(int, CUstream_st)
This function is deprecated as of CUDA 3.0.
Unmaps the buffer object of ID \p bufObj for access by CUDA. When
a buffer is unmapped, the base address returned by
::cudaGLMapBufferObject() is invalid and subsequent references to
the address result in undefined behavior. The OpenGL context used
to create the buffer, or another context from the same share group,
must be bound to the current thread when this is called.
Stream \p stream is synchronized with the current GL context.
|
| org.bytedeco.cuda.global.cudart.cudaGLUnregisterBufferObject(int)
This function is deprecated as of CUDA 3.0.
Unregisters the buffer object of ID \p bufObj for access by CUDA
and releases any CUDA resources associated with the buffer. Once a
buffer is unregistered, it may no longer be mapped by CUDA. The GL
context used to create the buffer, or another context from the
same share group, must be bound to the current thread when this is
called.
|
| org.bytedeco.cuda.global.cudart.cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams, int) |
| org.bytedeco.cuda.global.cudart.cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams, int, int)
This function is deprecated as of CUDA 11.3.
Invokes kernels as specified in the \p launchParamsList array where each element
of the array specifies all the parameters required to perform a single kernel launch.
These kernels can cooperate and synchronize as they execute. The size of the array is
specified by \p numDevices.
No two kernels can be launched on the same device. All the devices targeted by this
multi-device launch must be identical. All devices must have a non-zero value for the
device attribute ::cudaDevAttrCooperativeMultiDeviceLaunch.
The same kernel must be launched on all devices. Note that any __device__ or __constant__
variables are independently instantiated on every device. It is the application's
responsibility to ensure these variables are initialized and used appropriately.
The size of the grids as specified in blocks, the size of the blocks themselves and the
amount of shared memory used by each thread block must also match across all launched kernels.
The streams used to launch these kernels must have been created via ::cudaStreamCreate,
::cudaStreamCreateWithFlags, or ::cudaStreamCreateWithPriority. The NULL stream,
::cudaStreamLegacy, and ::cudaStreamPerThread cannot be used.
The total number of blocks launched per kernel cannot exceed the maximum number of blocks
per multiprocessor as returned by ::cudaOccupancyMaxActiveBlocksPerMultiprocessor (or
::cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of multiprocessors
as specified by the device attribute ::cudaDevAttrMultiProcessorCount. Since the
total number of blocks launched per device has to match across all devices, the maximum
number of blocks that can be launched per device will be limited by the device with the
least number of multiprocessors.
The kernel cannot make use of CUDA dynamic parallelism.
Each element of \p launchParamsList is a ::cudaLaunchParams structure, where:
- ::cudaLaunchParams::func specifies the kernel to be launched. The same function must
be launched on all devices. For templated functions, pass the symbol of the instantiated
function, i.e. \p func_name with its template arguments spelled out. |
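The per-device block limit described above can be computed before launching: every device must launch the same number of blocks, so the weakest device sets the limit. A sketch in plain Java arithmetic (the occupancy and SM-count inputs would in practice come from ::cudaOccupancyMaxActiveBlocksPerMultiprocessor and ::cudaDevAttrMultiProcessorCount; the `CoopLaunchLimit` class is hypothetical):

```java
// Largest block count usable for a multi-device cooperative launch:
// per-device cap = max active blocks per SM times SM count,
// and the minimum over all devices is the shared limit.
final class CoopLaunchLimit {
    static int maxBlocksPerDevice(int[] maxActiveBlocksPerSm, int[] smCount) {
        int limit = Integer.MAX_VALUE;
        for (int i = 0; i < smCount.length; i++) {
            limit = Math.min(limit, maxActiveBlocksPerSm[i] * smCount[i]);
        }
        return limit;
    }
}
```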
| org.bytedeco.cuda.global.cudart.cudaMemcpyArrayToArray(cudaArray, long, long, cudaArray, long, long, long) |
| org.bytedeco.cuda.global.cudart.cudaMemcpyArrayToArray(cudaArray, long, long, cudaArray, long, long, long, int)
Copies \p count bytes from the CUDA array \p src starting at \p hOffsetSrc
rows and \p wOffsetSrc bytes from the upper left corner to the CUDA array
\p dst starting at \p hOffsetDst rows and \p wOffsetDst bytes from the upper
left corner, where \p kind specifies the direction of the copy, and must be one of
::cudaMemcpyHostToHost, ::cudaMemcpyHostToDevice, ::cudaMemcpyDeviceToHost,
::cudaMemcpyDeviceToDevice, or ::cudaMemcpyDefault. Passing
::cudaMemcpyDefault is recommended, in which case the type of transfer is
inferred from the pointer values. However, ::cudaMemcpyDefault is only
allowed on systems that support unified virtual addressing.
|
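The (\p hOffset rows, \p wOffset bytes) addressing used by these array copies maps to a flat byte offset once the row stride is known: skip \p hOffset full rows, then \p wOffset bytes into the row. A sketch of that arithmetic (plain Java; the pitch value here is an assumed row stride in bytes, not something this API returns):

```java
// Converts a (row, byte) offset pair into a flat byte offset within a
// pitched 2D allocation: hOffset full rows plus wOffset bytes.
final class ArrayOffset {
    static long flatByteOffset(long hOffset, long wOffset, long pitchBytes) {
        return hOffset * pitchBytes + wOffset;
    }
}
```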
| org.bytedeco.cuda.global.cudart.cudaMemcpyFromArray(Pointer, cudaArray, long, long, long, int)
Copies \p count bytes from the CUDA array \p src starting at \p hOffset rows
and \p wOffset bytes from the upper left corner to the memory area pointed to
by \p dst, where \p kind specifies the direction of the copy, and must be one of
::cudaMemcpyHostToHost, ::cudaMemcpyHostToDevice, ::cudaMemcpyDeviceToHost,
::cudaMemcpyDeviceToDevice, or ::cudaMemcpyDefault. Passing
::cudaMemcpyDefault is recommended, in which case the type of transfer is
inferred from the pointer values. However, ::cudaMemcpyDefault is only
allowed on systems that support unified virtual addressing.
|
| org.bytedeco.cuda.global.cudart.cudaMemcpyFromArrayAsync(Pointer, cudaArray, long, long, long, int) |
| org.bytedeco.cuda.global.cudart.cudaMemcpyFromArrayAsync(Pointer, cudaArray, long, long, long, int, CUstream_st)
Copies \p count bytes from the CUDA array \p src starting at \p hOffset rows
and \p wOffset bytes from the upper left corner to the memory area pointed to
by \p dst, where \p kind specifies the direction of the copy, and must be one of
::cudaMemcpyHostToHost, ::cudaMemcpyHostToDevice, ::cudaMemcpyDeviceToHost,
::cudaMemcpyDeviceToDevice, or ::cudaMemcpyDefault. Passing
::cudaMemcpyDefault is recommended, in which case the type of transfer is
inferred from the pointer values. However, ::cudaMemcpyDefault is only
allowed on systems that support unified virtual addressing.
::cudaMemcpyFromArrayAsync() is asynchronous with respect to the host, so
the call may return before the copy is complete. The copy can optionally
be associated to a stream by passing a non-zero \p stream argument. If \p
kind is ::cudaMemcpyHostToDevice or ::cudaMemcpyDeviceToHost and \p stream
is non-zero, the copy may overlap with operations in other streams.
|
| org.bytedeco.cuda.global.cudart.cudaMemcpyToArray(cudaArray, long, long, Pointer, long, int)
Copies \p count bytes from the memory area pointed to by \p src to the
CUDA array \p dst starting at \p hOffset rows and \p wOffset bytes from
the upper left corner, where \p kind specifies the direction
of the copy, and must be one of ::cudaMemcpyHostToHost,
::cudaMemcpyHostToDevice, ::cudaMemcpyDeviceToHost,
::cudaMemcpyDeviceToDevice, or ::cudaMemcpyDefault. Passing
::cudaMemcpyDefault is recommended, in which case the type of transfer is
inferred from the pointer values. However, ::cudaMemcpyDefault is only
allowed on systems that support unified virtual addressing.
|
| org.bytedeco.cuda.global.cudart.cudaMemcpyToArrayAsync(cudaArray, long, long, Pointer, long, int) |
| org.bytedeco.cuda.global.cudart.cudaMemcpyToArrayAsync(cudaArray, long, long, Pointer, long, int, CUstream_st)
Copies \p count bytes from the memory area pointed to by \p src to the
CUDA array \p dst starting at \p hOffset rows and \p wOffset bytes from
the upper left corner, where \p kind specifies the
direction of the copy, and must be one of ::cudaMemcpyHostToHost,
::cudaMemcpyHostToDevice, ::cudaMemcpyDeviceToHost,
::cudaMemcpyDeviceToDevice, or ::cudaMemcpyDefault. Passing
::cudaMemcpyDefault is recommended, in which case the type of transfer is
inferred from the pointer values. However, ::cudaMemcpyDefault is only
allowed on systems that support unified virtual addressing.
::cudaMemcpyToArrayAsync() is asynchronous with respect to the host, so
the call may return before the copy is complete. The copy can optionally
be associated to a stream by passing a non-zero \p stream argument. If \p
kind is ::cudaMemcpyHostToDevice or ::cudaMemcpyDeviceToHost and \p stream
is non-zero, the copy may overlap with operations in other streams.
|
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(double[]) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(DoubleBuffer) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(DoublePointer)
This function is deprecated as of CUDA 7.5.
Converts the double value of \p d to an internal float representation if
the device does not support double arithmetic. If the device does natively
support doubles, then this function does nothing.
|
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(double[]) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(DoubleBuffer) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(DoublePointer)
This function is deprecated as of CUDA 7.5.
Converts the double value of \p d from a potentially internal float
representation if the device does not support double arithmetic. If the
device does natively support doubles, then this function does nothing.
|
| org.bytedeco.cuda.global.cudart.cudaThreadExit()
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceReset(), which should be used
instead.
Explicitly destroys and cleans up all resources associated with the current
device in the current process. Any subsequent API call to this device will
reinitialize the device.
Note that this function will reset the device immediately. It is the caller's
responsibility to ensure that the device is not being accessed by any
other host threads from the process when this function is called.
|
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(int[]) |
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(IntBuffer) |
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(IntPointer)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceGetCacheConfig(), which should be
used instead.
On devices where the L1 cache and shared memory use the same hardware
resources, this returns through \p pCacheConfig the preferred cache
configuration for the current device. This is only a preference. The
runtime will use the requested configuration if possible, but it is free to
choose a different configuration if required to execute functions.
This will return a \p pCacheConfig of ::cudaFuncCachePreferNone on devices
where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
- ::cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
- ::cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
- ::cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
|
| org.bytedeco.cuda.global.cudart.cudaThreadGetLimit(SizeTPointer, int)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceGetLimit(), which should be used
instead.
Returns in \p *pValue the current size of \p limit. The supported
::cudaLimit values are:
- ::cudaLimitStackSize: stack size of each GPU thread;
- ::cudaLimitPrintfFifoSize: size of the shared FIFO used by the
::printf() device system call;
- ::cudaLimitMallocHeapSize: size of the heap used by the
::malloc() and ::free() device system calls.
|
| org.bytedeco.cuda.global.cudart.cudaThreadSetCacheConfig(int)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceSetCacheConfig(), which should be
used instead.
On devices where the L1 cache and shared memory use the same hardware
resources, this sets through \p cacheConfig the preferred cache
configuration for the current device. This is only a preference. The
runtime will use the requested configuration if possible, but it is free to
choose a different configuration if required to execute the function. Any
function preference set via
\ref ::cudaFuncSetCacheConfig(const void*, enum cudaFuncCache) "cudaFuncSetCacheConfig (C API)"
or
\ref ::cudaFuncSetCacheConfig(T*, enum cudaFuncCache) "cudaFuncSetCacheConfig (C++ API)"
will be preferred over this device-wide setting. Setting the device-wide
cache configuration to ::cudaFuncCachePreferNone will cause subsequent
kernel launches to prefer to not change the cache configuration unless
required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and
shared memory are fixed.
Launching a kernel with a different preference than the most recent
preference setting may insert a device-side synchronization point.
The supported cache configurations are:
- ::cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
- ::cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
- ::cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
|
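The precedence rule above (a per-function preference, when set, wins over the device-wide setting) can be sketched as follows. The constants mirror the ::cudaFuncCache enum values; the `CacheConfig` helper itself is hypothetical:

```java
// Resolves the effective cache preference for a kernel launch:
// a per-function setting, if present, overrides the device-wide one.
final class CacheConfig {
    static final int PREFER_NONE = 0;   // cudaFuncCachePreferNone
    static final int PREFER_SHARED = 1; // cudaFuncCachePreferShared
    static final int PREFER_L1 = 2;     // cudaFuncCachePreferL1

    // perFunction is null when no cudaFuncSetCacheConfig call was made.
    static int effectiveConfig(Integer perFunction, int deviceWide) {
        return perFunction != null ? perFunction : deviceWide;
    }
}
```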
| org.bytedeco.cuda.global.cudart.cudaThreadSetLimit(int, long)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceSetLimit(), which should be used
instead.
Setting \p limit to \p value is a request by the application to update
the current limit maintained by the device. The driver is free to
modify the requested value to meet hardware requirements (this could be
clamping to minimum or maximum values, rounding up to nearest element
size, etc). The application can use ::cudaThreadGetLimit() to find out
exactly what the limit has been set to.
Setting each ::cudaLimit has its own specific restrictions, so each is
discussed here.
- ::cudaLimitStackSize controls the stack size of each GPU thread.
- ::cudaLimitPrintfFifoSize controls the size of the shared FIFO
used by the ::printf() device system call.
Setting ::cudaLimitPrintfFifoSize must be performed before
launching any kernel that uses the ::printf() device
system call, otherwise ::cudaErrorInvalidValue will be returned.
- ::cudaLimitMallocHeapSize controls the size of the heap used
by the ::malloc() and ::free() device system calls. Setting
::cudaLimitMallocHeapSize must be performed before launching
any kernel that uses the ::malloc() or ::free() device system calls,
otherwise ::cudaErrorInvalidValue will be returned.
|
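The driver's freedom to adjust a requested limit (clamping to minimum or maximum values, rounding up to an element size) can be modeled as below. The bounds and granularity are illustrative assumptions, not values the runtime documents:

```java
// Models how a requested limit may be adjusted before being applied:
// clamp into [min, max], then round up to the nearest granularity multiple.
final class LimitAdjust {
    static long adjust(long requested, long min, long max, long granularity) {
        long clamped = Math.max(min, Math.min(max, requested));
        long remainder = clamped % granularity;
        return remainder == 0 ? clamped : clamped + (granularity - remainder);
    }
}
```

This is why the docs advise calling the corresponding get function afterwards to learn what the limit was actually set to.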
| org.bytedeco.cuda.global.cudart.cudaThreadSynchronize()
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is similar to the
non-deprecated function ::cudaDeviceSynchronize(), which should be used
instead.
Blocks until the device has completed all preceding requested tasks.
::cudaThreadSynchronize() returns an error if one of the preceding tasks
has failed. If the ::cudaDeviceScheduleBlockingSync flag was set for
this device, the host thread will block until the device has finished
its work.
|
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(int[], int[], int) |
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(IntBuffer, IntBuffer, int) |
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(IntPointer, IntPointer, int)
This function was deprecated as of CUDA 5.0 and its functionality superseded
by ::cuDeviceGetAttribute().
Returns in \p *major and \p *minor the major and minor revision numbers that
define the compute capability of the device \p dev.
|
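With ::cuDeviceGetAttribute(), the major and minor revisions are fetched separately via ::CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and ::CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR. Comparing capabilities then reduces to comparing a combined value; a sketch (the `ComputeCapability` helper is hypothetical):

```java
// Encodes a (major, minor) compute capability as major*10 + minor so that
// capabilities compare numerically, e.g. 7.5 -> 75, 8.0 -> 80.
final class ComputeCapability {
    static int encode(int major, int minor) {
        return major * 10 + minor;
    }

    static boolean atLeast(int major, int minor, int reqMajor, int reqMinor) {
        return encode(major, minor) >= encode(reqMajor, reqMinor);
    }
}
```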
| org.bytedeco.cuda.global.cudart.cuDeviceGetProperties(CUdevprop_v1, int)
This function was deprecated as of CUDA 5.0 and replaced by ::cuDeviceGetAttribute().
Returns in \p *prop the properties of device \p dev. The fields of the
::CUdevprop structure are:
- ::maxThreadsPerBlock is the maximum number of threads per block;
- ::maxThreadsDim[3] is the maximum sizes of each dimension of a block;
- ::maxGridSize[3] is the maximum sizes of each dimension of a grid;
- ::sharedMemPerBlock is the total amount of shared memory available per
block in bytes;
- ::totalConstantMemory is the total amount of constant memory available on
the device in bytes;
- ::SIMDWidth is the warp size;
- ::memPitch is the maximum pitch allowed by the memory copy functions that
involve memory regions allocated through ::cuMemAllocPitch();
- ::regsPerBlock is the total number of registers available per block;
- ::clockRate is the clock frequency in kilohertz;
- ::textureAlign is the alignment requirement; texture base addresses that
are aligned to ::textureAlign bytes do not need an offset applied to
texture fetches. |
| org.bytedeco.cuda.global.cudnn.cudnnCopyAlgorithmDescriptor(cudnnAlgorithmStruct, cudnnAlgorithmStruct) |
| org.bytedeco.cuda.global.cudnn.cudnnCreateAlgorithmDescriptor(cudnnAlgorithmStruct) |
| org.bytedeco.cuda.global.cudnn.cudnnCreateAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, int) |
| org.bytedeco.cuda.global.cudnn.cudnnCreatePersistentRNNPlan(cudnnRNNStruct, int, int, cudnnPersistentRNNPlan) |
| org.bytedeco.cuda.global.cudnn.cudnnDestroyAlgorithmDescriptor(cudnnAlgorithmStruct) |
| org.bytedeco.cuda.global.cudnn.cudnnDestroyAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, int) |
| org.bytedeco.cuda.global.cudnn.cudnnDestroyPersistentRNNPlan(cudnnPersistentRNNPlan) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardDataAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNBackwardWeightsAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardInferenceAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, int[], cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntBuffer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnFindRNNForwardTrainingAlgorithmEx(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, float, int, IntPointer, cudnnAlgorithmPerformanceStruct, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnGetAlgorithmDescriptor(cudnnAlgorithmStruct, cudnnAlgorithm_t) |
| org.bytedeco.cuda.global.cudnn.cudnnGetAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, cudnnAlgorithmStruct, int[], float[], SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, cudnnAlgorithmStruct, IntBuffer, FloatBuffer, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, cudnnAlgorithmStruct, IntPointer, FloatPointer, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetAlgorithmSpaceSize(cudnnContext, cudnnAlgorithmStruct, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardDataAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardDataAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardDataAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardWeightsAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardWeightsAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBackwardWeightsAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBiasMode(cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBiasMode(cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNBiasMode(cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNDescriptor_v6(cudnnContext, cudnnRNNStruct, int[], int[], cudnnDropoutStruct, int[], int[], int[], int[], int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNDescriptor_v6(cudnnContext, cudnnRNNStruct, IntBuffer, IntBuffer, cudnnDropoutStruct, IntBuffer, IntBuffer, IntBuffer, IntBuffer, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNDescriptor_v6(cudnnContext, cudnnRNNStruct, IntPointer, IntPointer, cudnnDropoutStruct, IntPointer, IntPointer, IntPointer, IntPointer, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardInferenceAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardInferenceAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardInferenceAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardTrainingAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardTrainingAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNForwardTrainingAlgorithmMaxCount(cudnnContext, cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNLinLayerBiasParams(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, cudnnFilterStruct, Pointer, int, cudnnFilterStruct, Pointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNLinLayerBiasParams(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, cudnnFilterStruct, Pointer, int, cudnnFilterStruct, PointerPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNLinLayerMatrixParams(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, cudnnFilterStruct, Pointer, int, cudnnFilterStruct, Pointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNLinLayerMatrixParams(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, cudnnFilterStruct, Pointer, int, cudnnFilterStruct, PointerPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNMatrixMathType(cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNMatrixMathType(cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNMatrixMathType(cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNPaddingMode(cudnnRNNStruct, int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNPaddingMode(cudnnRNNStruct, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNPaddingMode(cudnnRNNStruct, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNParamsSize(cudnnContext, cudnnRNNStruct, cudnnTensorStruct, SizeTPointer, int) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNProjectionLayers(cudnnContext, cudnnRNNStruct, int[], int[]) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNProjectionLayers(cudnnContext, cudnnRNNStruct, IntBuffer, IntBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNProjectionLayers(cudnnContext, cudnnRNNStruct, IntPointer, IntPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNTrainingReserveSize(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNTrainingReserveSize(cudnnContext, cudnnRNNStruct, int, PointerPointer, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNWorkspaceSize(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnGetRNNWorkspaceSize(cudnnContext, cudnnRNNStruct, int, PointerPointer, SizeTPointer) |
| org.bytedeco.cuda.global.cudnn.cudnnRestoreAlgorithm(cudnnContext, Pointer, long, cudnnAlgorithmStruct) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardData(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardData(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardDataEx(cudnnContext, cudnnRNNStruct, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnRNNDataStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardWeights(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardWeights(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, PointerPointer, Pointer, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNBackwardWeightsEx(cudnnContext, cudnnRNNStruct, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnRNNDataStruct, Pointer, Pointer, long, cudnnFilterStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardInference(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardInference(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardInferenceEx(cudnnContext, cudnnRNNStruct, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardTraining(cudnnContext, cudnnRNNStruct, int, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardTraining(cudnnContext, cudnnRNNStruct, int, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, PointerPointer, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNForwardTrainingEx(cudnnContext, cudnnRNNStruct, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnFilterStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnTensorStruct, Pointer, cudnnTensorStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, cudnnRNNDataStruct, Pointer, Pointer, long, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNGetClip(cudnnContext, cudnnRNNStruct, int[], int[], double[], double[]) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNGetClip(cudnnContext, cudnnRNNStruct, IntBuffer, IntBuffer, DoubleBuffer, DoubleBuffer) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNGetClip(cudnnContext, cudnnRNNStruct, IntPointer, IntPointer, DoublePointer, DoublePointer) |
| org.bytedeco.cuda.global.cudnn.cudnnRNNSetClip(cudnnContext, cudnnRNNStruct, int, int, double, double) |
| org.bytedeco.cuda.global.cudnn.cudnnSaveAlgorithm(cudnnContext, cudnnAlgorithmStruct, Pointer, long) |
| org.bytedeco.cuda.global.cudnn.cudnnSetAlgorithmDescriptor(cudnnAlgorithmStruct, cudnnAlgorithm_t) |
| org.bytedeco.cuda.global.cudnn.cudnnSetAlgorithmPerformance(cudnnAlgorithmPerformanceStruct, cudnnAlgorithmStruct, int, float, long) |
| org.bytedeco.cuda.global.cudnn.cudnnSetPersistentRNNPlan(cudnnRNNStruct, cudnnPersistentRNNPlan) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNAlgorithmDescriptor(cudnnContext, cudnnRNNStruct, cudnnAlgorithmStruct) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNBiasMode(cudnnRNNStruct, int) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNDescriptor_v6(cudnnContext, cudnnRNNStruct, int, int, cudnnDropoutStruct, int, int, int, int, int) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNMatrixMathType(cudnnRNNStruct, int) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNPaddingMode(cudnnRNNStruct, int) |
| org.bytedeco.cuda.global.cudnn.cudnnSetRNNProjectionLayers(cudnnContext, cudnnRNNStruct, int, int) |
| org.bytedeco.cuda.global.cudart.cuFuncSetBlockShape(CUfunc_st, int, int, int)
Specifies the \p x, \p y, and \p z dimensions of the thread blocks that are
created when the kernel given by \p hfunc is launched.
|
| org.bytedeco.cuda.global.cudart.cuFuncSetSharedSize(CUfunc_st, int)
Sets through \p bytes the amount of dynamic shared memory that will be
available to each thread block when the kernel given by \p hfunc is launched.
|
| org.bytedeco.cuda.global.cudart.cuGLCtxCreate(CUctx_st, int, int)
This function is deprecated as of CUDA 5.0 and should no longer be used. It is
no longer necessary to associate a CUDA context with an OpenGL
context in order to achieve maximum interoperability performance.
|
| org.bytedeco.cuda.global.cudart.cuGLInit()
This function is deprecated as of CUDA 3.0.
Initializes OpenGL interoperability. Calling this function is no
longer required. It may fail if the needed
OpenGL driver facilities are not available.
|
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObject(long[], SizeTPointer, int) |
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObject(LongBuffer, SizeTPointer, int) |
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObject(LongPointer, SizeTPointer, int)
This function is deprecated as of CUDA 3.0.
Maps the buffer object specified by \p buffer into the address space of the
current CUDA context and returns in \p *dptr and \p *size the base pointer
and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
All streams in the current CUDA context are synchronized with the
current GL context.
|
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObjectAsync(long[], SizeTPointer, int, CUstream_st) |
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObjectAsync(LongBuffer, SizeTPointer, int, CUstream_st) |
| org.bytedeco.cuda.global.cudart.cuGLMapBufferObjectAsync(LongPointer, SizeTPointer, int, CUstream_st)
This function is deprecated as of CUDA 3.0.
Maps the buffer object specified by \p buffer into the address space of the
current CUDA context and returns in \p *dptr and \p *size the base pointer
and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
Stream \p hStream in the current CUDA context is synchronized with
the current GL context.
|
| org.bytedeco.cuda.global.cudart.cuGLRegisterBufferObject(int)
This function is deprecated as of CUDA 3.0.
Registers the buffer object specified by \p buffer for access by
CUDA. This function must be called before CUDA can map the buffer
object. There must be a valid OpenGL context bound to the current
thread when this function is called, and the buffer name is
resolved by that context.
|
| org.bytedeco.cuda.global.cudart.cuGLSetBufferObjectMapFlags(int, int)
This function is deprecated as of CUDA 3.0.
Sets the map flags for the buffer object specified by \p buffer.
Changes to \p Flags will take effect the next time \p buffer is mapped.
The \p Flags argument may be any of the following:
- ::CU_GL_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this
resource will be used. It is therefore assumed that this resource will be
read from and written to by CUDA kernels. This is the default value.
- ::CU_GL_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA kernels which
access this resource will not write to this resource.
- ::CU_GL_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA kernels
which access this resource will not read from this resource and will
write over the entire contents of the resource, so none of the data
previously stored in the resource will be preserved.
If \p buffer has not been registered for use with CUDA, then
::CUDA_ERROR_INVALID_HANDLE is returned. If \p buffer is presently
mapped for access by CUDA, then ::CUDA_ERROR_ALREADY_MAPPED is returned.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
|
| org.bytedeco.cuda.global.cudart.cuGLUnmapBufferObject(int)
This function is deprecated as of CUDA 3.0.
Unmaps the buffer object specified by \p buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
All streams in the current CUDA context are synchronized with the
current GL context.
|
| org.bytedeco.cuda.global.cudart.cuGLUnmapBufferObjectAsync(int, CUstream_st)
This function is deprecated as of CUDA 3.0.
Unmaps the buffer object specified by \p buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
Stream \p hStream in the current CUDA context is synchronized with
the current GL context.
|
| org.bytedeco.cuda.global.cudart.cuGLUnregisterBufferObject(int)
This function is deprecated as of CUDA 3.0.
Unregisters the buffer object specified by \p buffer. This
releases any resources associated with the registered buffer.
After this call, the buffer may no longer be mapped for access by
CUDA.
There must be a valid OpenGL context bound to the current thread
when this function is called. This must be the same context, or a
member of the same shareGroup, as the context that was bound when
the buffer was registered.
|
| org.bytedeco.cuda.global.cudart.cuLaunch(CUfunc_st)
Invokes the kernel \p f on a 1 x 1 x 1 grid of blocks. The block
contains the number of threads specified by a previous call to
::cuFuncSetBlockShape().
The block shape, dynamic shared memory size, and parameter information
must be set using
::cuFuncSetBlockShape(),
::cuFuncSetSharedSize(),
::cuParamSetSize(),
::cuParamSeti(),
::cuParamSetf(), and
::cuParamSetv()
prior to calling this function.
Launching a function via ::cuLaunchKernel() invalidates the function's
block shape, dynamic shared memory size, and parameter information. After
launching via ::cuLaunchKernel(), this state must be re-initialized prior to
calling this function. Failure to do so results in undefined behavior.
|
| org.bytedeco.cuda.global.cudart.cuLaunchCooperativeKernelMultiDevice(CUDA_LAUNCH_PARAMS_v1, int, int)
This function is deprecated as of CUDA 11.3.
Invokes kernels as specified in the \p launchParamsList array where each element
of the array specifies all the parameters required to perform a single kernel launch.
These kernels can cooperate and synchronize as they execute. The size of the array is
specified by \p numDevices.
No two kernels can be launched on the same device. All the devices targeted by this
multi-device launch must be identical. All devices must have a non-zero value for the
device attribute ::CU_DEVICE_ATTRIBUTE_COOPERATIVE_MULTI_DEVICE_LAUNCH.
All kernels launched must be identical with respect to the compiled code. Note that
any __device__, __constant__ or __managed__ variables present in the module that owns
the kernel launched on each device are independently instantiated on every device.
It is the application's responsibility to ensure these variables are initialized and
used appropriately.
The size of the grids as specified in blocks, the size of the blocks themselves
and the amount of shared memory used by each thread block must also match across
all launched kernels.
The streams used to launch these kernels must have been created via either ::cuStreamCreate
or ::cuStreamCreateWithPriority. The NULL stream or ::CU_STREAM_LEGACY or ::CU_STREAM_PER_THREAD
cannot be used.
The total number of blocks launched per kernel cannot exceed the maximum number of blocks
per multiprocessor as returned by ::cuOccupancyMaxActiveBlocksPerMultiprocessor (or
::cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of multiprocessors
as specified by the device attribute ::CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT. Since the
total number of blocks launched per device has to match across all devices, the maximum
number of blocks that can be launched per device will be limited by the device with the
least number of multiprocessors.
The kernels cannot make use of CUDA dynamic parallelism.
The fields of the ::CUDA_LAUNCH_PARAMS structure are used as follows:
- ::CUDA_LAUNCH_PARAMS::function specifies the kernel to be launched. All functions must
be identical with respect to the compiled code.
Note that you can also specify a context-less kernel ::CUkernel by querying the handle
using ::cuLibraryGetKernel() and then casting to ::CUfunction. In this case, the context to
launch the kernel on will be taken from the specified stream ::CUDA_LAUNCH_PARAMS::hStream.
- ::CUDA_LAUNCH_PARAMS::gridDimX is the width of the grid in blocks. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::gridDimY is the height of the grid in blocks. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::gridDimZ is the depth of the grid in blocks. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::blockDimX is the X dimension of each thread block. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::blockDimY is the Y dimension of each thread block. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::blockDimZ is the Z dimension of each thread block. This must match across
all kernels launched.
- ::CUDA_LAUNCH_PARAMS::sharedMemBytes is the dynamic shared-memory size per thread block in bytes.
This must match across all kernels launched.
- ::CUDA_LAUNCH_PARAMS::hStream is the handle to the stream to perform the launch in. This cannot
be the NULL stream or ::CU_STREAM_LEGACY or ::CU_STREAM_PER_THREAD. The CUDA context associated
with this stream must match that associated with ::CUDA_LAUNCH_PARAMS::function.
- ::CUDA_LAUNCH_PARAMS::kernelParams is an array of pointers to kernel parameters. If
::CUDA_LAUNCH_PARAMS::function has N parameters, then ::CUDA_LAUNCH_PARAMS::kernelParams
needs to be an array of N pointers. Each of ::CUDA_LAUNCH_PARAMS::kernelParams[0] through
::CUDA_LAUNCH_PARAMS::kernelParams[N-1] must point to a region of memory from which the actual
kernel parameter will be copied. The number of kernel parameters and their offsets and sizes
do not need to be specified as that information is retrieved directly from the kernel's image.
By default, the kernel won't begin execution on any GPU until all prior work in all the specified
streams has completed. This behavior can be overridden by specifying the flag
::CUDA_COOPERATIVE_LAUNCH_MULTI_DEVICE_NO_PRE_LAUNCH_SYNC. When this flag is specified, each kernel
will only wait for prior work in the stream corresponding to that GPU to complete before it begins
execution.
Similarly, by default, any subsequent work pushed in any of the specified streams will not begin
execution until the kernels on all GPUs have completed. This behavior can be overridden by specifying
the flag ::CUDA_COOPERATIVE_LAUNCH_MULTI_DEVICE_NO_POST_LAUNCH_SYNC. When this flag is specified,
any subsequent work pushed in any of the specified streams will only wait for the kernel launched
on the GPU corresponding to that stream to complete before it begins execution.
Calling ::cuLaunchCooperativeKernelMultiDevice() sets persistent function state that is
the same as the function state set through the ::cuLaunchKernel API when called individually
for each element in \p launchParamsList.
When kernels are launched via ::cuLaunchCooperativeKernelMultiDevice(), the previous
block shape, shared memory size, and parameter info associated with each ::CUDA_LAUNCH_PARAMS::function
in \p launchParamsList is overwritten.
Note that to use ::cuLaunchCooperativeKernelMultiDevice(), the kernels must either have
been compiled with toolchain version 3.2 or later so that they
contain kernel parameter information, or have no kernel parameters.
If either of these conditions is not met, then ::cuLaunchCooperativeKernelMultiDevice() will
return ::CUDA_ERROR_INVALID_IMAGE. |
| org.bytedeco.cuda.global.cudart.cuLaunchGrid(CUfunc_st, int, int)
Invokes the kernel \p f on a \p grid_width x \p grid_height grid of
blocks. Each block contains the number of threads specified by a previous
call to ::cuFuncSetBlockShape().
The block shape, dynamic shared memory size, and parameter information
must be set using
::cuFuncSetBlockShape(),
::cuFuncSetSharedSize(),
::cuParamSetSize(),
::cuParamSeti(),
::cuParamSetf(), and
::cuParamSetv()
prior to calling this function.
Launching a function via ::cuLaunchKernel() invalidates the function's
block shape, dynamic shared memory size, and parameter information. After
launching via ::cuLaunchKernel(), this state must be re-initialized prior to
calling this function. Failure to do so results in undefined behavior.
|
| org.bytedeco.cuda.global.cudart.cuLaunchGridAsync(CUfunc_st, int, int, CUstream_st)
Invokes the kernel \p f on a \p grid_width x \p grid_height grid of
blocks. Each block contains the number of threads specified by a previous
call to ::cuFuncSetBlockShape().
The block shape, dynamic shared memory size, and parameter information
must be set using
::cuFuncSetBlockShape(),
::cuFuncSetSharedSize(),
::cuParamSetSize(),
::cuParamSeti(),
::cuParamSetf(), and
::cuParamSetv()
prior to calling this function.
Launching a function via ::cuLaunchKernel() invalidates the function's
block shape, dynamic shared memory size, and parameter information. After
launching via ::cuLaunchKernel(), this state must be re-initialized prior to
calling this function. Failure to do so results in undefined behavior.
|
| org.bytedeco.cuda.global.cudart.cuModuleGetSurfRef(CUsurfref_st, CUmod_st, BytePointer)
Returns in \p *pSurfRef the handle of the surface reference of name \p name
in the module \p hmod. If no surface reference of that name exists,
::cuModuleGetSurfRef() returns ::CUDA_ERROR_NOT_FOUND.
|
| org.bytedeco.cuda.global.cudart.cuModuleGetSurfRef(CUsurfref_st, CUmod_st, String) |
| org.bytedeco.cuda.global.cudart.cuModuleGetTexRef(CUtexref_st, CUmod_st, BytePointer)
Returns in \p *pTexRef the handle of the texture reference of name \p name
in the module \p hmod. If no texture reference of that name exists,
::cuModuleGetTexRef() returns ::CUDA_ERROR_NOT_FOUND. This texture reference
handle should not be destroyed, since it will be destroyed when the module
is unloaded.
|
| org.bytedeco.cuda.global.cudart.cuModuleGetTexRef(CUtexref_st, CUmod_st, String) |
| org.bytedeco.cuda.global.cudart.cuParamSetf(CUfunc_st, int, float)
Sets a floating-point parameter that will be specified the next time the
kernel corresponding to \p hfunc will be invoked. \p offset is a byte offset.
|
| org.bytedeco.cuda.global.cudart.cuParamSeti(CUfunc_st, int, int)
Sets an integer parameter that will be specified the next time the
kernel corresponding to \p hfunc will be invoked. \p offset is a byte offset.
|
| org.bytedeco.cuda.global.cudart.cuParamSetSize(CUfunc_st, int)
Sets through \p numbytes the total size in bytes needed by the function
parameters of the kernel corresponding to \p hfunc.
|
| org.bytedeco.cuda.global.cudart.cuParamSetTexRef(CUfunc_st, int, CUtexref_st)
Makes the CUDA array or linear memory bound to the texture reference
\p hTexRef available to a device program as a texture. In this version of
CUDA, the texture-reference must be obtained via ::cuModuleGetTexRef() and
the \p texunit parameter must be set to ::CU_PARAM_TR_DEFAULT.
|
| org.bytedeco.cuda.global.cudart.cuParamSetv(CUfunc_st, int, Pointer, int)
Copies an arbitrary amount of data (specified in \p numbytes) from \p ptr
into the parameter space of the kernel corresponding to \p hfunc. \p offset
is a byte offset.
|
| org.bytedeco.cuda.global.cusolver.cusolverDnGeqrf_bufferSize(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, int, Pointer, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGeqrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGeqrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGeqrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGesvd_bufferSize(cusolverDnContext, cusolverDnParams, byte, byte, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, int, Pointer, long, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGesvd(cusolverDnContext, cusolverDnParams, byte, byte, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, int, Pointer, long, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGesvd(cusolverDnContext, cusolverDnParams, byte, byte, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, int, Pointer, long, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGesvd(cusolverDnContext, cusolverDnParams, byte, byte, long, long, int, Pointer, long, int, Pointer, int, Pointer, long, int, Pointer, long, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrf_bufferSize(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, long[], int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, LongBuffer, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrf(cusolverDnContext, cusolverDnParams, long, long, int, Pointer, long, LongPointer, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, long[], int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, LongBuffer, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnGetrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, LongPointer, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrf_bufferSize(cusolverDnContext, cusolverDnParams, int, long, int, Pointer, long, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrf(cusolverDnContext, cusolverDnParams, int, long, int, Pointer, long, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrf(cusolverDnContext, cusolverDnParams, int, long, int, Pointer, long, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrf(cusolverDnContext, cusolverDnParams, int, long, int, Pointer, long, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnPotrs(cusolverDnContext, cusolverDnParams, int, long, long, int, Pointer, long, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevd_bufferSize(cusolverDnContext, cusolverDnParams, int, int, long, int, Pointer, long, int, Pointer, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevd(cusolverDnContext, cusolverDnParams, int, int, long, int, Pointer, long, int, Pointer, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevd(cusolverDnContext, cusolverDnParams, int, int, long, int, Pointer, long, int, Pointer, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevd(cusolverDnContext, cusolverDnParams, int, int, long, int, Pointer, long, int, Pointer, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx_bufferSize(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, long[], int, Pointer, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx_bufferSize(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, LongBuffer, int, Pointer, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx_bufferSize(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, LongPointer, int, Pointer, int, SizeTPointer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, long[], int, Pointer, int, Pointer, long, int[]) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, LongBuffer, int, Pointer, int, Pointer, long, IntBuffer) |
| org.bytedeco.cuda.global.cusolver.cusolverDnSyevdx(cusolverDnContext, cusolverDnParams, int, int, int, long, int, Pointer, long, Pointer, Pointer, long, long, LongPointer, int, Pointer, int, Pointer, long, IntPointer) |
| org.bytedeco.cuda.global.cudart.cuSurfRefGetArray(CUarray_st, CUsurfref_st)
Returns in \p *phArray the CUDA array bound to the surface reference
\p hSurfRef, or returns ::CUDA_ERROR_INVALID_VALUE if the surface reference
is not bound to any CUDA array.
|
| org.bytedeco.cuda.global.cudart.cuSurfRefSetArray(CUsurfref_st, CUarray_st, int)
Sets the CUDA array \p hArray to be read and written by the surface reference
\p hSurfRef. Any previous CUDA array state associated with the surface
reference is superseded by this function. \p Flags must be set to 0.
The ::CUDA_ARRAY3D_SURFACE_LDST flag must have been set for the CUDA array.
Any CUDA array previously bound to \p hSurfRef is unbound.
|
| org.bytedeco.cuda.global.cudart.cuTexRefCreate(CUtexref_st)
Creates a texture reference and returns its handle in \p *pTexRef. Once
created, the application must call ::cuTexRefSetArray() or
::cuTexRefSetAddress() to associate the reference with allocated memory.
Other texture reference functions are used to specify the format and
interpretation (addressing, filtering, etc.) to be used when the memory is
read through this texture reference.
|
| org.bytedeco.cuda.global.cudart.cuTexRefDestroy(CUtexref_st)
Destroys the texture reference specified by \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddress(long[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddress(LongBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddress(LongPointer, CUtexref_st)
Returns in \p *pdptr the base address bound to the texture reference
\p hTexRef, or returns ::CUDA_ERROR_INVALID_VALUE if the texture reference
is not bound to any device memory range.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddressMode(int[], CUtexref_st, int) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddressMode(IntBuffer, CUtexref_st, int) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetAddressMode(IntPointer, CUtexref_st, int)
Returns in \p *pam the addressing mode corresponding to the
dimension \p dim of the texture reference \p hTexRef. Currently, the only
valid values for \p dim are 0 and 1.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetArray(CUarray_st, CUtexref_st)
Returns in \p *phArray the CUDA array bound to the texture reference
\p hTexRef, or returns ::CUDA_ERROR_INVALID_VALUE if the texture reference
is not bound to any CUDA array.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetBorderColor(float[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetBorderColor(FloatBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetBorderColor(FloatPointer, CUtexref_st)
Returns in \p pBorderColor the RGBA color values used by
the texture reference \p hTexRef.
The color values are of type float and are stored in the following order:
- pBorderColor[0] holds the 'R' component
- pBorderColor[1] holds the 'G' component
- pBorderColor[2] holds the 'B' component
- pBorderColor[3] holds the 'A' component
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetFilterMode(int[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFilterMode(IntBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFilterMode(IntPointer, CUtexref_st)
Returns in \p *pfm the filtering mode of the texture reference
\p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetFlags(int[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFlags(IntBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFlags(IntPointer, CUtexref_st)
Returns in \p *pFlags the flags of the texture reference \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetFormat(int[], int[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFormat(IntBuffer, IntBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetFormat(IntPointer, IntPointer, CUtexref_st)
Returns in \p *pFormat and \p *pNumChannels the format and number
of components of the CUDA array bound to the texture reference \p hTexRef.
If \p pFormat or \p pNumChannels is NULL, it will be ignored.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetMaxAnisotropy(int[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMaxAnisotropy(IntBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMaxAnisotropy(IntPointer, CUtexref_st)
Returns the maximum anisotropy in \p pmaxAniso that's used when reading memory through
the texture reference \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapFilterMode(int[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapFilterMode(IntBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapFilterMode(IntPointer, CUtexref_st)
Returns the mipmap filtering mode in \p pfm that's used when reading memory through
the texture reference \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelBias(float[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelBias(FloatBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelBias(FloatPointer, CUtexref_st)
Returns the mipmap level bias in \p pBias that's added to the specified mipmap
level when reading memory through the texture reference \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelClamp(float[], float[], CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelClamp(FloatBuffer, FloatBuffer, CUtexref_st) |
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmapLevelClamp(FloatPointer, FloatPointer, CUtexref_st)
Returns the min/max mipmap level clamps in \p pminMipmapLevelClamp and \p pmaxMipmapLevelClamp
that are used when reading memory through the texture reference \p hTexRef.
|
| org.bytedeco.cuda.global.cudart.cuTexRefGetMipmappedArray(CUmipmappedArray_st, CUtexref_st)
Returns in \p *phMipmappedArray the CUDA mipmapped array bound to the texture
reference \p hTexRef, or returns ::CUDA_ERROR_INVALID_VALUE if the texture reference
is not bound to any CUDA mipmapped array.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetAddress(SizeTPointer, CUtexref_st, long, long)
Binds a linear address range to the texture reference \p hTexRef. Any
previous address or CUDA array state associated with the texture reference
is superseded by this function. Any memory previously bound to \p hTexRef
is unbound.
Since the hardware enforces an alignment requirement on texture base
addresses, ::cuTexRefSetAddress() passes back a byte offset in
\p *ByteOffset that must be applied to texture fetches in order to read from
the desired memory. This offset must be divided by the texel size and
passed to kernels that read from the texture so it can be applied to the
::tex1Dfetch() function.
If the device memory pointer was returned from ::cuMemAlloc(), the offset
is guaranteed to be 0 and NULL may be passed as the \p ByteOffset parameter.
The total number of elements (or texels) in the linear address range
cannot exceed ::CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LINEAR_WIDTH.
The number of elements is computed as (\p bytes / bytesPerElement),
where bytesPerElement is determined from the data format and number of
components set using ::cuTexRefSetFormat().
|
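The byte-offset bookkeeping described above can be sketched in plain Java, with no GPU required; the texel size of 4 bytes in the usage below is an assumption for a single-component 32-bit float format set via ::cuTexRefSetFormat():

```java
public class TexelOffset {
    // Convert the byte offset passed back by cuTexRefSetAddress() into a
    // texel offset that a kernel can add to its tex1Dfetch() index.
    // bytesPerTexel is determined by the data format and channel count
    // configured with cuTexRefSetFormat(), e.g. 4 for one 32-bit float.
    static long texelOffset(long byteOffset, long bytesPerTexel) {
        if (byteOffset % bytesPerTexel != 0) {
            throw new IllegalArgumentException("offset not texel-aligned");
        }
        return byteOffset / bytesPerTexel;
    }

    public static void main(String[] args) {
        // A pointer straight from cuMemAlloc() is guaranteed offset 0;
        // a range starting 256 bytes in would map to 64 float texels.
        System.out.println(texelOffset(0, 4));
        System.out.println(texelOffset(256, 4));
    }
}
```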
| org.bytedeco.cuda.global.cudart.cuTexRefSetAddress2D(CUtexref_st, CUDA_ARRAY_DESCRIPTOR_v2, long, long)
Binds a linear address range to the texture reference \p hTexRef. Any
previous address or CUDA array state associated with the texture reference
is superseded by this function. Any memory previously bound to \p hTexRef
is unbound.
Using a ::tex2D() function inside a kernel requires a call to either
::cuTexRefSetArray() to bind the corresponding texture reference to an
array, or ::cuTexRefSetAddress2D() to bind the texture reference to linear
memory.
Function calls to ::cuTexRefSetFormat() cannot follow calls to
::cuTexRefSetAddress2D() for the same texture reference.
It is required that \p dptr be aligned to the appropriate hardware-specific
texture alignment. You can query this value using the device attribute
::CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT. If an unaligned \p dptr is
supplied, ::CUDA_ERROR_INVALID_VALUE is returned.
\p Pitch has to be aligned to the hardware-specific texture pitch alignment.
This value can be queried using the device attribute
::CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT. If an unaligned \p Pitch is
supplied, ::CUDA_ERROR_INVALID_VALUE is returned.
Width and Height, which are specified in elements (or texels), cannot exceed
::CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_WIDTH and
::CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_HEIGHT respectively.
\p Pitch, which is specified in bytes, cannot exceed
::CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_PITCH.
|
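The pitch constraint above amounts to rounding a row pitch up to the next multiple of the device's pitch alignment. A minimal sketch in plain Java; the alignment value of 32 in the usage is only an illustrative stand-in for whatever ::CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT reports on a real device:

```java
public class PitchAlign {
    // Round a row pitch (in bytes) up to the next multiple of the
    // hardware texture pitch alignment. Passing an unaligned pitch to
    // cuTexRefSetAddress2D() returns CUDA_ERROR_INVALID_VALUE, so the
    // allocation's pitch is normally padded up front like this.
    static long alignPitch(long pitch, long alignment) {
        return ((pitch + alignment - 1) / alignment) * alignment;
    }

    public static void main(String[] args) {
        System.out.println(alignPitch(1000, 32)); // rounds up
        System.out.println(alignPitch(1024, 32)); // already aligned
    }
}
```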
| org.bytedeco.cuda.global.cudart.cuTexRefSetAddressMode(CUtexref_st, int, int)
Specifies the addressing mode \p am for the given dimension \p dim of the
texture reference \p hTexRef. If \p dim is zero, the addressing mode is
applied to the first parameter of the functions used to fetch from the
texture; if \p dim is 1, the second, and so on. See ::CUaddress_mode for
the available modes.
Note that this call has no effect if \p hTexRef is bound to linear memory.
Also, if the ::CU_TRSF_NORMALIZED_COORDINATES flag is not set, the only
supported address mode is ::CU_TR_ADDRESS_MODE_CLAMP. |
| org.bytedeco.cuda.global.cudart.cuTexRefSetArray(CUtexref_st, CUarray_st, int)
Binds the CUDA array \p hArray to the texture reference \p hTexRef. Any
previous address or CUDA array state associated with the texture reference
is superseded by this function. \p Flags must be set to
::CU_TRSA_OVERRIDE_FORMAT. Any CUDA array previously bound to \p hTexRef is
unbound.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetBorderColor(CUtexref_st, float[]) |
| org.bytedeco.cuda.global.cudart.cuTexRefSetBorderColor(CUtexref_st, FloatBuffer) |
| org.bytedeco.cuda.global.cudart.cuTexRefSetBorderColor(CUtexref_st, FloatPointer)
Specifies, via \p pBorderColor, the RGBA border color for the texture reference
\p hTexRef. Only float color components are supported, stored in
the following order:
- pBorderColor[0] holds the 'R' component
- pBorderColor[1] holds the 'G' component
- pBorderColor[2] holds the 'B' component
- pBorderColor[3] holds the 'A' component
Note that the color values can be set only when the addressing mode is set to
::CU_TR_ADDRESS_MODE_BORDER using ::cuTexRefSetAddressMode.
Applications using integer border color values have to "reinterpret_cast" their values to float.
|
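The "reinterpret_cast" note above has a direct Java analogue: `Float.intBitsToFloat` reinterprets an integer's bit pattern as a float without any numeric conversion. A minimal sketch:

```java
public class BorderColorBits {
    // Place an integer border-color component into a float slot without
    // numeric conversion, mirroring the C++ reinterpret_cast the note
    // describes. The bit pattern is preserved exactly.
    static float reinterpretAsFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float f = reinterpretAsFloat(255);
        // Round-tripping the raw bits recovers the original integer,
        // whereas a plain (float) cast of 255 would not.
        System.out.println(Float.floatToRawIntBits(f) == 255);
    }
}
```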
| org.bytedeco.cuda.global.cudart.cuTexRefSetFilterMode(CUtexref_st, int)
Specifies the filtering mode \p fm to be used when reading memory through
the texture reference \p hTexRef. See ::CUfilter_mode_enum for the available modes.
Note that this call has no effect if \p hTexRef is bound to linear memory. |
| org.bytedeco.cuda.global.cudart.cuTexRefSetFlags(CUtexref_st, int)
Sets optional flags via \p Flags to control the behavior of data
returned through the texture reference \p hTexRef. The valid flags are:
- ::CU_TRSF_READ_AS_INTEGER, which suppresses the default behavior of
having the texture promote integer data to floating point data in the
range [0, 1]. Note that textures with a 32-bit integer format
are not promoted, regardless of whether this flag is specified;
- ::CU_TRSF_NORMALIZED_COORDINATES, which suppresses the
default behavior of having the texture coordinates range
from [0, Dim) where Dim is the width or height of the CUDA
array. Instead, the texture coordinates [0, 1.0) reference
the entire breadth of the array dimension;
- ::CU_TRSF_DISABLE_TRILINEAR_OPTIMIZATION, which disables any trilinear
filtering optimizations. Trilinear optimizations improve texture filtering
performance by allowing bilinear filtering on textures in scenarios where
it can closely approximate the expected results.
|
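The coordinate convention that ::CU_TRSF_NORMALIZED_COORDINATES toggles can be illustrated by the scale between the two coordinate spaces; a sketch in plain Java, where the width of 1024 texels is a hypothetical example value:

```java
public class TexCoords {
    // With CU_TRSF_NORMALIZED_COORDINATES set, a coordinate u in [0, 1)
    // spans the whole dimension; without it, coordinates range over
    // [0, dim). The mapping between them is a scale by the dimension.
    static float toUnnormalized(float u, int dim) {
        return u * dim;
    }

    public static void main(String[] args) {
        int width = 1024; // hypothetical CUDA array width
        // The midpoint of normalized space lands on texel 512.
        System.out.println(toUnnormalized(0.5f, width));
    }
}
```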
| org.bytedeco.cuda.global.cudart.cuTexRefSetFormat(CUtexref_st, int, int)
Specifies the format of the data to be read by the texture reference
\p hTexRef. \p fmt and \p NumPackedComponents are exactly analogous to the
::Format and ::NumChannels members of the ::CUDA_ARRAY_DESCRIPTOR structure:
They specify the format of each component and the number of components per
array element.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetMaxAnisotropy(CUtexref_st, int)
Specifies the maximum anisotropy \p maxAniso to be used when reading memory through
the texture reference \p hTexRef.
Note that this call has no effect if \p hTexRef is bound to linear memory.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetMipmapFilterMode(CUtexref_st, int)
Specifies the mipmap filtering mode \p fm to be used when reading memory through
the texture reference \p hTexRef. See ::CUfilter_mode_enum for the available modes.
Note that this call has no effect if \p hTexRef is not bound to a mipmapped array. |
| org.bytedeco.cuda.global.cudart.cuTexRefSetMipmapLevelBias(CUtexref_st, float)
Specifies the mipmap level bias \p bias to be added to the specified mipmap level when
reading memory through the texture reference \p hTexRef.
Note that this call has no effect if \p hTexRef is not bound to a mipmapped array.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetMipmapLevelClamp(CUtexref_st, float, float)
Specifies the min/max mipmap level clamps, \p minMipmapLevelClamp and \p maxMipmapLevelClamp
respectively, to be used when reading memory through the texture reference
\p hTexRef.
Note that this call has no effect if \p hTexRef is not bound to a mipmapped array.
|
| org.bytedeco.cuda.global.cudart.cuTexRefSetMipmappedArray(CUtexref_st, CUmipmappedArray_st, int)
Binds the CUDA mipmapped array \p hMipmappedArray to the texture reference \p hTexRef.
Any previous address or CUDA array state associated with the texture reference
is superseded by this function. \p Flags must be set to ::CU_TRSA_OVERRIDE_FORMAT.
Any CUDA array previously bound to \p hTexRef is unbound.
|
| org.bytedeco.cuda.global.nvml.NVML_DOUBLE_BIT_ECC()
Mapped to \ref NVML_MEMORY_ERROR_TYPE_UNCORRECTED
|
| org.bytedeco.cuda.global.nvml.NVML_SINGLE_BIT_ECC()
Mapped to \ref NVML_MEMORY_ERROR_TYPE_CORRECTED
|
| org.bytedeco.cuda.global.nvml.nvmlDeviceGetDetailedEccErrors(nvmlDevice_st, int, int, nvmlEccErrorCounts_t)
This API supports only a fixed set of ECC error locations;
different locations are supported on different GPU architectures.
See \ref nvmlDeviceGetMemoryErrorCounter.
For Fermi &tm; or newer fully supported devices.
Only applicable to devices with ECC.
Requires \a NVML_INFOROM_ECC version 2.0 or higher to report aggregate location-based ECC counts.
Requires \a NVML_INFOROM_ECC version 1.0 or higher to report all other ECC counts.
Requires ECC mode to be enabled.
Detailed errors provide separate ECC counts for specific parts of the memory system.
Reports zero for unsupported ECC error counters when only a subset of ECC error counters is supported.
See \ref nvmlMemoryErrorType_t for a description of available bit types.
See \ref nvmlEccCounterType_t for a description of available counter types.
See \ref nvmlEccErrorCounts_t for a description of provided detailed ECC counts.
|
| org.bytedeco.cuda.global.nvml.nvmlDeviceGetHandleBySerial(BytePointer, nvmlDevice_st)
Since more than one GPU can exist on a single board, this function is deprecated in favor
of \ref nvmlDeviceGetHandleByUUID.
For dual-GPU boards this function will return NVML_ERROR_INVALID_ARGUMENT.
Starting from NVML 5, this API causes NVML to initialize the target GPU.
NVML may initialize additional GPUs as it searches for the target GPU.
|
| org.bytedeco.cuda.global.nvml.nvmlVgpuInstanceGetLicenseStatus(int, IntPointer)
Use \ref nvmlVgpuInstanceGetLicenseInfo_v2.
Retrieve the current licensing state of the vGPU instance.
If the vGPU is currently licensed, \a licensed is set to 1, otherwise it is set to 0.
For Kepler &tm; or newer fully supported devices.
|
Copyright © 2023. All rights reserved.