| Class and Description |
|---|
| org.bytedeco.cuda.nvml.nvmlEccErrorCounts_t
Different GPU families can have different memory error counters
See \ref nvmlDeviceGetMemoryErrorCounter
|
| Field and Description |
|---|
| org.bytedeco.cuda.global.cudart.cudaDeviceBlockingSync
This flag was deprecated as of CUDA 4.0 and
replaced with ::cudaDeviceScheduleBlockingSync.
|
| org.bytedeco.cuda.global.nvml.nvmlClocksThrottleReasonUserDefinedClocks
Renamed to \ref nvmlClocksThrottleReasonApplicationsClocksSetting
as the name describes the situation more accurately.
|
| Method and Description |
|---|
| org.bytedeco.cuda.global.cudart.cuCtxAttach(CUctx_st, int)
Note that this function is deprecated and should not be used.
Increments the usage count of the context and passes back a context handle
in \p *pctx that must be passed to ::cuCtxDetach() when the application is
done with the context. ::cuCtxAttach() fails if there is no context current
to the thread.
Currently, the \p flags parameter must be 0.
|
| org.bytedeco.cuda.global.cudart.cuCtxDetach(CUctx_st)
Note that this function is deprecated and should not be used.
Decrements the usage count of the context \p ctx, and destroys the context
if the usage count goes to 0. The context must be a handle that was passed
back by ::cuCtxCreate() or ::cuCtxAttach(), and must be current to the
calling thread.
|
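The attach/detach reference-counting pattern described above can be sketched with the bytedeco bindings. This is a minimal sketch, assuming an older CUDA toolkit where these deprecated symbols still resolve, a CUDA-capable device, and the presets on the classpath:

```java
import org.bytedeco.cuda.cudart.CUctx_st;
import static org.bytedeco.cuda.global.cudart.*;

public class CtxAttachSketch {
    static void check(int status) {
        if (status != CUDA_SUCCESS) throw new RuntimeException("CUDA error " + status);
    }
    public static void main(String[] args) {
        check(cuInit(0));
        CUctx_st ctx = new CUctx_st();
        check(cuCtxCreate(ctx, 0, 0));     // usage count starts at 1
        CUctx_st attached = new CUctx_st();
        check(cuCtxAttach(attached, 0));   // flags must be 0; usage count becomes 2
        check(cuCtxDetach(attached));      // usage count back to 1
        check(cuCtxDestroy(ctx));
    }
}
```

A context is created first because, as noted above, ::cuCtxAttach() fails if no context is current to the thread.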
| org.bytedeco.cuda.global.cudart.cudaConfigureCall(dim3, dim3, long, CUstream_st)
This function is deprecated as of CUDA 7.0.
Specifies the grid and block dimensions for the device call to be executed
similar to the execution configuration syntax. ::cudaConfigureCall() is
stack based. Each call pushes data on top of an execution stack. This data
contains the dimension for the grid and thread blocks, together with any
arguments for the call.
|
| org.bytedeco.cuda.global.cudart.cudaLaunch(Pointer)
This function is deprecated as of CUDA 7.0.
Launches the function \p func on the device. The parameter \p func must
be a device function symbol. The parameter specified by \p func must be
declared as a \p __global__ function. For templated functions, pass the
function symbol as follows: func_name<template_arguments>
|
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(double[]) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(DoubleBuffer) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForDevice(DoublePointer)
This function is deprecated as of CUDA 7.5.
Converts the double value of \p d to an internal float representation if
the device does not support double arithmetic. If the device does natively
support doubles, then this function does nothing.
|
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(double[]) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(DoubleBuffer) |
| org.bytedeco.cuda.global.cudart.cudaSetDoubleForHost(DoublePointer)
This function is deprecated as of CUDA 7.5.
Converts the double value of \p d from a potentially internal float
representation if the device does not support double arithmetic. If the
device does natively support doubles, then this function does nothing.
|
| org.bytedeco.cuda.global.cudart.cudaSetupArgument(Pointer, long, long)
This function is deprecated as of CUDA 7.0.
Pushes \p size bytes of the argument pointed to by \p arg at \p offset
bytes from the start of the parameter passing area, which starts at
offset 0. The arguments are stored in the top of the execution stack.
\ref ::cudaSetupArgument(const void*, size_t, size_t) "cudaSetupArgument()"
must be preceded by a call to ::cudaConfigureCall().
|
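Taken together, ::cudaConfigureCall(), ::cudaSetupArgument(), and ::cudaLaunch() form the pre-CUDA-7.0 stack-based launch sequence: configure, push arguments, launch. A minimal sketch with the bytedeco bindings, where `kernelSymbol` and `arg` are hypothetical pointers obtained elsewhere (modern code should use ::cudaLaunchKernel() instead):

```java
import org.bytedeco.cuda.cudart.CUstream_st;
import org.bytedeco.cuda.cudart.dim3;
import org.bytedeco.javacpp.Pointer;
import static org.bytedeco.cuda.global.cudart.*;

public class LegacyLaunchSketch {
    // kernelSymbol must be a __global__ device-function symbol; arg is a kernel argument
    static void legacyLaunch(Pointer kernelSymbol, Pointer arg) {
        dim3 grid  = new dim3().x(4).y(1).z(1);    // 4 blocks
        dim3 block = new dim3().x(256).y(1).z(1);  // 256 threads per block
        // push the execution configuration: 0 bytes of dynamic shared memory, default stream
        cudaConfigureCall(grid, block, 0, (CUstream_st) null);
        // push one pointer-sized argument at offset 0 (assuming 64-bit pointers)
        cudaSetupArgument(arg, 8, 0);
        cudaLaunch(kernelSymbol);
    }
}
```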
| org.bytedeco.cuda.global.cudart.cudaThreadExit()
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceReset(), which should be used
instead.
Explicitly destroys and cleans up all resources associated with the current
device in the current process. Any subsequent API call to this device will
reinitialize the device.
Note that this function will reset the device immediately. It is the caller's
responsibility to ensure that the device is not being accessed by any
other host threads from the process when this function is called.
|
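Migrating away from the deprecated call is a one-line change; a minimal sketch:

```java
import static org.bytedeco.cuda.global.cudart.*;

public class DeviceResetSketch {
    public static void main(String[] args) {
        // ... work with the current device ...
        int status = cudaDeviceReset();  // preferred over the deprecated cudaThreadExit()
        if (status != cudaSuccess) {
            System.err.println("cudaDeviceReset failed: " + status);
        }
    }
}
```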
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(int[]) |
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(IntBuffer) |
| org.bytedeco.cuda.global.cudart.cudaThreadGetCacheConfig(IntPointer)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceGetCacheConfig(), which should be
used instead.
On devices where the L1 cache and shared memory use the same hardware
resources, this returns through \p pCacheConfig the preferred cache
configuration for the current device. This is only a preference. The
runtime will use the requested configuration if possible, but it is free to
choose a different configuration if required to execute functions.
This will return a \p pCacheConfig of ::cudaFuncCachePreferNone on devices
where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
- ::cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
- ::cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
- ::cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
|
| org.bytedeco.cuda.global.cudart.cudaThreadGetLimit(SizeTPointer, int)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceGetLimit(), which should be used
instead.
Returns in \p *pValue the current size of \p limit. The supported
::cudaLimit values are:
- ::cudaLimitStackSize: stack size of each GPU thread;
- ::cudaLimitPrintfFifoSize: size of the shared FIFO used by the
::printf() device system call.
- ::cudaLimitMallocHeapSize: size of the heap used by the
::malloc() and ::free() device system calls;
|
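The listed limits can be queried with the non-deprecated replacement, ::cudaDeviceGetLimit(); a minimal sketch assuming a CUDA-capable device:

```java
import org.bytedeco.javacpp.SizeTPointer;
import static org.bytedeco.cuda.global.cudart.*;

public class GetLimitSketch {
    public static void main(String[] args) {
        SizeTPointer value = new SizeTPointer(1);
        cudaDeviceGetLimit(value, cudaLimitStackSize);       // per-thread stack size
        System.out.println("stack size:  " + value.get());
        cudaDeviceGetLimit(value, cudaLimitPrintfFifoSize);  // device-side printf FIFO
        System.out.println("printf FIFO: " + value.get());
        cudaDeviceGetLimit(value, cudaLimitMallocHeapSize);  // device-side malloc heap
        System.out.println("malloc heap: " + value.get());
    }
}
```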
| org.bytedeco.cuda.global.cudart.cudaThreadSetCacheConfig(int)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceSetCacheConfig(), which should be
used instead.
On devices where the L1 cache and shared memory use the same hardware
resources, this sets through \p cacheConfig the preferred cache
configuration for the current device. This is only a preference. The
runtime will use the requested configuration if possible, but it is free to
choose a different configuration if required to execute the function. Any
function preference set via
\ref ::cudaFuncSetCacheConfig(const void*, enum cudaFuncCache) "cudaFuncSetCacheConfig (C API)"
or
\ref ::cudaFuncSetCacheConfig(T*, enum cudaFuncCache) "cudaFuncSetCacheConfig (C++ API)"
will be preferred over this device-wide setting. Setting the device-wide
cache configuration to ::cudaFuncCachePreferNone will cause subsequent
kernel launches to prefer to not change the cache configuration unless
required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and
shared memory are fixed.
Launching a kernel with a different preference than the most recent
preference setting may insert a device-side synchronization point.
The supported cache configurations are:
- ::cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
- ::cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
- ::cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
|
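The same preference can be set and read back through the non-deprecated device-level calls; a minimal sketch, assuming the `int[]` overload of ::cudaDeviceGetCacheConfig() mirrors the deprecated one listed above:

```java
import static org.bytedeco.cuda.global.cudart.*;

public class CacheConfigSketch {
    public static void main(String[] args) {
        // request a larger shared-memory carveout device-wide
        cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
        int[] config = new int[1];
        cudaDeviceGetCacheConfig(config);  // read back; only a preference, not a guarantee
        System.out.println("cache config: " + config[0]);
    }
}
```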
| org.bytedeco.cuda.global.cudart.cudaThreadSetLimit(int, long)
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is identical to the
non-deprecated function ::cudaDeviceSetLimit(), which should be used
instead.
Setting \p limit to \p value is a request by the application to update
the current limit maintained by the device. The driver is free to
modify the requested value to meet hardware requirements (this could be
clamping to minimum or maximum values, rounding up to nearest element
size, etc). The application can use ::cudaThreadGetLimit() to find out
exactly what the limit has been set to.
Setting each ::cudaLimit has its own specific restrictions, so each is
discussed here.
- ::cudaLimitStackSize controls the stack size of each GPU thread.
- ::cudaLimitPrintfFifoSize controls the size of the shared FIFO
used by the ::printf() device system call.
Setting ::cudaLimitPrintfFifoSize must be performed before
launching any kernel that uses the ::printf() device
system call, otherwise ::cudaErrorInvalidValue will be returned.
- ::cudaLimitMallocHeapSize controls the size of the heap used
by the ::malloc() and ::free() device system calls. Setting
::cudaLimitMallocHeapSize must be performed before launching
any kernel that uses the ::malloc() or ::free() device system calls,
otherwise ::cudaErrorInvalidValue will be returned.
|
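The ordering restrictions above matter in practice: the limits must be set before any kernel that uses the corresponding device system call. A minimal sketch using the non-deprecated ::cudaDeviceSetLimit():

```java
import static org.bytedeco.cuda.global.cudart.*;

public class SetLimitSketch {
    // must run before any kernel that uses printf()/malloc() on the device,
    // otherwise cudaErrorInvalidValue is returned
    static void configureDeviceLimits() {
        int status = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 << 20);  // 8 MiB FIFO
        if (status == cudaErrorInvalidValue) {
            System.err.println("set the limit before launching printf-using kernels");
        }
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 << 20);  // 64 MiB device heap
    }
}
```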
| org.bytedeco.cuda.global.cudart.cudaThreadSynchronize()
Note that this function is deprecated because its name does not
reflect its behavior. Its functionality is similar to the
non-deprecated function ::cudaDeviceSynchronize(), which should be used
instead.
Blocks until the device has completed all preceding requested tasks.
::cudaThreadSynchronize() returns an error if one of the preceding tasks
has failed. If the ::cudaDeviceScheduleBlockingSync flag was set for
this device, the host thread will block until the device has finished
its work.
|
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(int[], int[], int) |
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(IntBuffer, IntBuffer, int) |
| org.bytedeco.cuda.global.cudart.cuDeviceComputeCapability(IntPointer, IntPointer, int)
This function was deprecated as of CUDA 5.0 and its functionality superseded
by ::cuDeviceGetAttribute().
Returns in \p *major and \p *minor the major and minor revision numbers that
define the compute capability of the device \p dev.
|
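The recommended replacement queries the same two numbers through ::cuDeviceGetAttribute(); a minimal sketch assuming device 0 exists:

```java
import org.bytedeco.javacpp.IntPointer;
import static org.bytedeco.cuda.global.cudart.*;

public class ComputeCapabilitySketch {
    public static void main(String[] args) {
        cuInit(0);
        IntPointer major = new IntPointer(1), minor = new IntPointer(1);
        int dev = 0;  // first CUDA device
        cuDeviceGetAttribute(major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
        cuDeviceGetAttribute(minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
        System.out.println("compute capability " + major.get() + "." + minor.get());
    }
}
```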
| org.bytedeco.cuda.global.cudart.cuDeviceGetProperties(CUdevprop, int)
This function was deprecated as of CUDA 5.0 and replaced by ::cuDeviceGetAttribute().
Returns in \p *prop the properties of device \p dev. The ::CUdevprop
structure contains the following fields:
- ::maxThreadsPerBlock is the maximum number of threads per block;
- ::maxThreadsDim[3] is the maximum sizes of each dimension of a block;
- ::maxGridSize[3] is the maximum sizes of each dimension of a grid;
- ::sharedMemPerBlock is the total amount of shared memory available per
block in bytes;
- ::totalConstantMemory is the total amount of constant memory available on
the device in bytes;
- ::SIMDWidth is the warp size;
- ::memPitch is the maximum pitch allowed by the memory copy functions that
involve memory regions allocated through ::cuMemAllocPitch();
- ::regsPerBlock is the total number of registers available per block;
- ::clockRate is the clock frequency in kilohertz;
- ::textureAlign is the alignment requirement; texture base addresses that
are aligned to ::textureAlign bytes do not need an offset applied to
texture fetches. |
| org.bytedeco.cuda.global.cudart.cuFuncSetBlockShape(CUfunc_st, int, int, int)
Specifies the \p x, \p y, and \p z dimensions of the thread blocks that are
created when the kernel given by \p hfunc is launched.
|
| org.bytedeco.cuda.global.cudart.cuFuncSetSharedSize(CUfunc_st, int)
Sets through \p bytes the amount of dynamic shared memory that will be
available to each thread block when the kernel given by \p hfunc is launched.
|
| org.bytedeco.cuda.global.cudart.cuLaunch(CUfunc_st)
Invokes the kernel \p f on a 1 x 1 x 1 grid of blocks. The block
contains the number of threads specified by a previous call to
::cuFuncSetBlockShape().
|
| org.bytedeco.cuda.global.cudart.cuLaunchGrid(CUfunc_st, int, int)
Invokes the kernel \p f on a \p grid_width x \p grid_height grid of
blocks. Each block contains the number of threads specified by a previous
call to ::cuFuncSetBlockShape().
|
| org.bytedeco.cuda.global.cudart.cuLaunchGridAsync(CUfunc_st, int, int, CUstream_st)
Invokes the kernel \p f on a \p grid_width x \p grid_height grid of
blocks. Each block contains the number of threads specified by a previous
call to ::cuFuncSetBlockShape().
|
| org.bytedeco.cuda.global.cudart.cuParamSetf(CUfunc_st, int, float)
Sets a floating-point parameter that will be specified the next time the
kernel corresponding to \p hfunc will be invoked. \p offset is a byte offset.
|
| org.bytedeco.cuda.global.cudart.cuParamSeti(CUfunc_st, int, int)
Sets an integer parameter that will be specified the next time the
kernel corresponding to \p hfunc will be invoked. \p offset is a byte offset.
|
| org.bytedeco.cuda.global.cudart.cuParamSetSize(CUfunc_st, int)
Sets through \p numbytes the total size in bytes needed by the function
parameters of the kernel corresponding to \p hfunc.
|
| org.bytedeco.cuda.global.cudart.cuParamSetTexRef(CUfunc_st, int, CUtexref_st)
Makes the CUDA array or linear memory bound to the texture reference
\p hTexRef available to a device program as a texture. In this version of
CUDA, the texture-reference must be obtained via ::cuModuleGetTexRef() and
the \p texunit parameter must be set to ::CU_PARAM_TR_DEFAULT.
|
| org.bytedeco.cuda.global.cudart.cuParamSetv(CUfunc_st, int, Pointer, int)
Copies an arbitrary amount of data (specified in \p numbytes) from \p ptr
into the parameter space of the kernel corresponding to \p hfunc. \p offset
is a byte offset.
|
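The ::cuFuncSetBlockShape(), ::cuParamSet*(), and ::cuLaunchGrid() entries above together make up the legacy driver-API launch sequence. A minimal sketch with the bytedeco bindings, where `hfunc` would come from ::cuModuleGetFunction() and `devPtr` is a hypothetical device allocation:

```java
import org.bytedeco.cuda.cudart.CUfunc_st;
import org.bytedeco.javacpp.Pointer;
import static org.bytedeco.cuda.global.cudart.*;

public class LegacyDriverLaunchSketch {
    static void launch(CUfunc_st hfunc, Pointer devPtr) {
        cuFuncSetBlockShape(hfunc, 256, 1, 1);  // 256-thread blocks
        cuParamSetv(hfunc, 0, devPtr, 8);       // pointer argument at offset 0 (64-bit)
        cuParamSeti(hfunc, 8, 1024);            // int argument at byte offset 8
        cuParamSetSize(hfunc, 12);              // total size of the parameter area
        cuLaunchGrid(hfunc, 16, 1);             // 16 x 1 grid of blocks
    }
}
```

Modern code should prefer ::cuLaunchKernel(), which takes the grid, block, and arguments in a single call.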
| org.bytedeco.cuda.global.cudart.cuTexRefCreate(CUtexref_st)
Creates a texture reference and returns its handle in \p *pTexRef. Once
created, the application must call ::cuTexRefSetArray() or
::cuTexRefSetAddress() to associate the reference with allocated memory.
Other texture reference functions are used to specify the format and
interpretation (addressing, filtering, etc.) to be used when the memory is
read through this texture reference.
|
| org.bytedeco.cuda.global.cudart.cuTexRefDestroy(CUtexref_st)
Destroys the texture reference specified by \p hTexRef.
|
| org.bytedeco.cuda.cudart.cudaPointerAttributes.isManaged()
Indicates whether this pointer points to managed memory
|
| org.bytedeco.cuda.cudart.cudaPointerAttributes.memoryType()
The physical location of the memory, ::cudaMemoryTypeHost or
::cudaMemoryTypeDevice. Note that managed memory can return either
::cudaMemoryTypeDevice or ::cudaMemoryTypeHost regardless of its
physical location.
|
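These two deprecated accessors are populated by ::cudaPointerGetAttributes(); a minimal sketch assuming a valid CUDA pointer:

```java
import org.bytedeco.cuda.cudart.cudaPointerAttributes;
import org.bytedeco.javacpp.Pointer;
import static org.bytedeco.cuda.global.cudart.*;

public class PointerAttrsSketch {
    static void describe(Pointer p) {
        cudaPointerAttributes attrs = new cudaPointerAttributes();
        if (cudaPointerGetAttributes(attrs, p) == cudaSuccess) {
            // memoryType() and isManaged() are the deprecated accessors described above
            System.out.println("memoryType=" + attrs.memoryType()
                    + " isManaged=" + attrs.isManaged());
        }
    }
}
```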
| org.bytedeco.cuda.global.nvml.NVML_DOUBLE_BIT_ECC()
Mapped to \ref NVML_MEMORY_ERROR_TYPE_UNCORRECTED
|
| org.bytedeco.cuda.global.nvml.NVML_SINGLE_BIT_ECC()
Mapped to \ref NVML_MEMORY_ERROR_TYPE_CORRECTED
|
| org.bytedeco.cuda.global.nvml.nvmlDeviceGetDetailedEccErrors(nvmlDevice_st, int, int, nvmlEccErrorCounts_t)
This API supports only a fixed set of ECC error locations.
Different GPU architectures support different locations.
See \ref nvmlDeviceGetMemoryErrorCounter.
For Fermi™ or newer fully supported devices.
Only applicable to devices with ECC.
Requires \a NVML_INFOROM_ECC version 2.0 or higher to report aggregate location-based ECC counts.
Requires \a NVML_INFOROM_ECC version 1.0 or higher to report all other ECC counts.
Requires ECC Mode to be enabled.
Detailed errors provide separate ECC counts for specific parts of the memory system.
Reports zero for unsupported ECC error counters when only a subset of ECC error counters is supported.
See \ref nvmlMemoryErrorType_t for a description of available bit types.
See \ref nvmlEccCounterType_t for a description of available counter types.
See \ref nvmlEccErrorCounts_t for a description of provided detailed ECC counts.
|
| org.bytedeco.cuda.global.nvml.nvmlDeviceGetHandleBySerial(BytePointer, nvmlDevice_st)
Since more than one GPU can exist on a single board, this function is deprecated in favor
of \ref nvmlDeviceGetHandleByUUID.
For dual-GPU boards this function will return NVML_ERROR_INVALID_ARGUMENT.
Starting from NVML 5, this API causes NVML to initialize the target GPU.
NVML may initialize additional GPUs as it searches for the target GPU.
|
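The UUID-based replacement resolves a handle unambiguously even on dual-GPU boards. A minimal sketch, where the UUID string is a hypothetical placeholder (real UUIDs can be listed with \ref nvmlDeviceGetUUID):

```java
import org.bytedeco.cuda.nvml.nvmlDevice_st;
import org.bytedeco.javacpp.BytePointer;
import static org.bytedeco.cuda.global.nvml.*;

public class NvmlByUuidSketch {
    public static void main(String[] args) {
        nvmlInit_v2();
        nvmlDevice_st device = new nvmlDevice_st();
        // hypothetical placeholder UUID
        int status = nvmlDeviceGetHandleByUUID(
                new BytePointer("GPU-00000000-0000-0000-0000-000000000000"), device);
        if (status != NVML_SUCCESS) {
            System.err.println("UUID lookup failed: " + status);
        }
        nvmlShutdown();
    }
}
```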
Copyright © 2019. All rights reserved.