Memory Resources
-
namespace mr
- group memory_resources
Functions
-
inline device_memory_resource *get_per_device_resource(cuda_device_id device_id)
Get the resource for the specified device.
Returns a pointer to the device_memory_resource for the specified device. The initial resource is a cuda_memory_resource.
device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.
This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.
Note
The returned device_memory_resource should only be used when CUDA device device_id is the current device (e.g. set using cudaSetDevice()). The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.
- Parameters:
device_id – The id of the target device
- Returns:
Pointer to the current device_memory_resource for device device_id
-
inline device_memory_resource *set_per_device_resource(cuda_device_id device_id, device_memory_resource *new_mr)
Set the device_memory_resource for the specified device.
If new_mr is not nullptr, sets the memory resource pointer for the device specified by device_id to new_mr. Otherwise, resets the resource for device_id to the initial cuda_memory_resource.
device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.
The object pointed to by new_mr must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.
This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.
Note
The resource passed in new_mr must have been created when device device_id was the current CUDA device (e.g. set using cudaSetDevice()). The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.
- Parameters:
device_id – The id of the target device
new_mr – If not nullptr, pointer to the new device_memory_resource to use as the resource for device_id
- Returns:
Pointer to the previous memory resource for device_id
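The get/set contract described above (the previous pointer is returned, and nullptr restores the initial resource) can be modelled without CUDA. The following is a minimal sketch under stated assumptions: `resource_stub` and `per_device_registry` are hypothetical stand-ins invented for illustration, not RMM types, and the single mutex is only one possible way to obtain the documented thread safety.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Stand-in for device_memory_resource; this sketch never allocates memory.
struct resource_stub {};

// Hypothetical registry (not RMM's implementation) keyed by device ordinal.
class per_device_registry {
 public:
  // Returns the resource registered for device_id.
  resource_stub* get(int device_id)
  {
    std::lock_guard<std::mutex> lock{mutex_};
    return lookup(device_id);
  }

  // Installs new_mr (or the initial resource if new_mr is null) and
  // returns the resource that was registered before the call.
  resource_stub* set(int device_id, resource_stub* new_mr)
  {
    std::lock_guard<std::mutex> lock{mutex_};
    resource_stub* const previous = lookup(device_id);
    resources_[device_id]         = (new_mr != nullptr) ? new_mr : &initial_;
    return previous;
  }

  // Address of the initial (default) resource, exposed for comparison.
  resource_stub* initial() noexcept { return &initial_; }

 private:
  // Caller must hold mutex_. Devices with no entry map to the initial resource.
  resource_stub* lookup(int device_id)
  {
    assert(device_id >= 0);
    auto const iter = resources_.find(device_id);
    return iter != resources_.end() ? iter->second : &initial_;
  }

  std::mutex mutex_;
  resource_stub initial_;
  std::map<int, resource_stub*> resources_;
};
```

As in the documented API, the registry stores raw pointers only, so the caller remains responsible for keeping each registered object alive.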
-
inline device_memory_resource *get_current_device_resource()
Get the memory resource for the current device.
Returns a pointer to the resource set for the current device. The initial resource is a cuda_memory_resource.
The “current device” is the device returned by cudaGetDevice.
This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.
Note
The returned device_memory_resource should only be used with the current CUDA device. Changing the current device (e.g. using cudaSetDevice()) and then using the returned resource can result in undefined behavior. The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.
- Returns:
Pointer to the resource for the current device
-
inline device_memory_resource *set_current_device_resource(device_memory_resource *new_mr)
Set the memory resource for the current device.
If new_mr is not nullptr, sets the resource pointer for the current device to new_mr. Otherwise, resets the resource to the initial cuda_memory_resource.
The “current device” is the device returned by cudaGetDevice.
The object pointed to by new_mr must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.
This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.
Note
The resource passed in new_mr must have been created for the current CUDA device. The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.
- Parameters:
new_mr – If not nullptr, pointer to the new resource to use for the current device
- Returns:
Pointer to the previous resource for the current device
- group device_memory_resources
Typedefs
-
using allocate_callback_t = std::function<void*(std::size_t, cuda_stream_view, void*)>
Callback function type used by callback_memory_resource for allocation.
The signature of the callback function is: void* allocate_callback_t(std::size_t bytes, cuda_stream_view stream, void* arg);
Returns a pointer to an allocation of at least bytes usable immediately on stream. The stream-ordered behavior requirements are identical to device_memory_resource::allocate.
This signature is compatible with do_allocate but adds the extra function parameter arg. The arg is provided to the constructor of the callback_memory_resource and will be forwarded along to every invocation of the callback function.
-
using deallocate_callback_t = std::function<void(void*, std::size_t, cuda_stream_view, void*)>
Callback function type used by callback_memory_resource for deallocation.
The signature of the callback function is: void deallocate_callback_t(void* ptr, std::size_t bytes, cuda_stream_view stream, void* arg);
Deallocates memory pointed to by ptr. bytes specifies the size of the allocation in bytes, and must equal the value of bytes that was passed to the allocate callback function. The stream-ordered behavior requirements are identical to device_memory_resource::deallocate.
This signature is compatible with do_deallocate but adds the extra function parameter arg. The arg is provided to the constructor of the callback_memory_resource and will be forwarded along to every invocation of the callback function.
-
template<typename Upstream>
class arena_memory_resource : public rmm::mr::device_memory_resource
- #include <arena_memory_resource.hpp>
A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.
Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with CUDA per-thread default stream.
GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.
Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.
In real-world applications, allocation sizes tend to follow a power law distribution in which large allocations are rare, but small ones quite common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.
This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.
See also
Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.
See also
Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM Sigplan Notices, 35(11), 117-128.
See also
Evans, J. (2006, April). A scalable concurrent malloc(3) implementation for FreeBSD. In Proc. of the BSDCan Conference, Ottawa, Canada.
- Template Parameters:
Upstream – Memory resource to use for allocating memory for the global arena. Implements rmm::mr::device_memory_resource interface.
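The block-management policy described above (address-ordered first fit, with coalescing of contiguous free neighbours) can be sketched on abstract offsets rather than device pointers. This is a toy model under stated assumptions, not the arena implementation: `first_fit_free_list` is a hypothetical name, there are no superblocks, per-thread arenas, or locks, and it requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Toy free list over an abstract address range [0, capacity).
// Keys are block start offsets, values are block sizes; std::map keeps
// the free blocks sorted by address.
class first_fit_free_list {
 public:
  explicit first_fit_free_list(std::size_t capacity) { free_blocks_.emplace(0, capacity); }

  // Address-ordered first fit: take the lowest-addressed block that is large enough.
  std::optional<std::size_t> allocate(std::size_t bytes)
  {
    assert(bytes > 0);
    for (auto iter = free_blocks_.begin(); iter != free_blocks_.end(); ++iter) {
      std::size_t const offset = iter->first;
      std::size_t const size   = iter->second;
      if (size < bytes) { continue; }
      free_blocks_.erase(iter);
      // Keep the unused tail of the block on the free list.
      if (size > bytes) { free_blocks_.emplace(offset + bytes, size - bytes); }
      return offset;
    }
    return std::nullopt;  // no free block is large enough
  }

  // Return a block and merge it with contiguous free neighbours.
  void deallocate(std::size_t offset, std::size_t bytes)
  {
    auto next = free_blocks_.lower_bound(offset);
    // Merge with the following block if it starts where this one ends.
    if (next != free_blocks_.end() && offset + bytes == next->first) {
      bytes += next->second;
      next = free_blocks_.erase(next);
    }
    // Merge with the preceding block if it ends where this one starts.
    if (next != free_blocks_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == offset) {
        prev->second += bytes;
        return;
      }
    }
    free_blocks_.emplace(offset, bytes);
  }

  std::size_t free_block_count() const noexcept { return free_blocks_.size(); }

 private:
  std::map<std::size_t, std::size_t> free_blocks_;
};
```

Keeping the free list sorted by address is what makes both steps cheap here: the first fit is the first match in iteration order, and merge candidates are always the immediate map neighbours.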
Public Functions
-
inline explicit arena_memory_resource(Upstream *upstream_mr, std::optional<std::size_t> arena_size = std::nullopt, bool dump_log_on_failure = false)
Construct an arena_memory_resource.
- Throws:
rmm::logic_error – if upstream_mr == nullptr.
- Parameters:
upstream_mr – The memory resource from which to allocate blocks for the global arena.
arena_size – Size in bytes of the global arena. Defaults to half of the available memory on the current device.
dump_log_on_failure – If true, dump memory log when running out of memory.
-
inline virtual bool supports_streams() const noexcept override¶
Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true.
-
template<typename Upstream>
class binning_memory_resource : public rmm::mr::device_memory_resource
- #include <binning_memory_resource.hpp>
Allocates memory from upstream resources associated with bin sizes.
- Template Parameters:
Upstream – memory_resource to use for allocations that don’t fall within any configured bin size. Implements rmm::mr::device_memory_resource interface.
Public Functions
-
inline explicit binning_memory_resource(Upstream *upstream_resource)
Construct a new binning memory resource object.
Initially has no bins, so simply uses the upstream_resource until bin resources are added with add_bin.
- Throws:
rmm::logic_error – if size_base is not a power of two.
- Parameters:
upstream_resource – The upstream memory resource used to allocate bin pools.
-
inline binning_memory_resource(Upstream *upstream_resource, int8_t min_size_exponent, int8_t max_size_exponent)
Construct a new binning memory resource object with a range of initial bins.
Constructs a new binning memory resource and adds bins backed by fixed_size_memory_resource in the range [2^min_size_exponent, 2^max_size_exponent]. For example, if min_size_exponent==18 and max_size_exponent==22, creates bins of sizes 256KiB, 512KiB, 1024KiB, 2048KiB and 4096KiB.
- Parameters:
upstream_resource – The upstream memory resource used to allocate bin pools.
min_size_exponent – The minimum base-2 exponent bin size.
max_size_exponent – The maximum base-2 exponent bin size.
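The arithmetic behind the example sizes can be checked with a few lines of standard C++. This is only an illustrative sketch: `bin_sizes` is a hypothetical helper, not an RMM function, and it simply enumerates one power of two per exponent in the inclusive range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of RMM): lists the bin sizes implied by an
// inclusive range of base-2 exponents.
inline std::vector<std::size_t> bin_sizes(std::int8_t min_size_exponent,
                                          std::int8_t max_size_exponent)
{
  assert(min_size_exponent >= 0 && min_size_exponent <= max_size_exponent);
  assert(max_size_exponent < static_cast<int>(sizeof(std::size_t) * 8));
  std::vector<std::size_t> sizes;
  for (int exponent = min_size_exponent; exponent <= max_size_exponent; ++exponent) {
    sizes.push_back(std::size_t{1} << exponent);  // 2^exponent bytes
  }
  return sizes;
}
```

For exponents 18 through 22 this yields five sizes, from 262144 bytes (256KiB) to 4194304 bytes (4096KiB), matching the list in the description.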
-
~binning_memory_resource() override = default¶
Destroy the binning_memory_resource and free all memory allocated from the upstream resource.
-
inline virtual bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null streams for allocation/deallocation.
- Returns:
true
-
inline Upstream *get_upstream() const noexcept
Get the upstream memory_resource object.
- Returns:
Upstream* the upstream memory resource.
-
inline void add_bin(std::size_t allocation_size, device_memory_resource *bin_resource = nullptr)
Add a bin allocator to this resource.
Adds bin_resource if it is not null; otherwise constructs and adds a fixed_size_memory_resource.
This bin will be used for any allocation smaller than allocation_size that is larger than the next smaller bin’s allocation size.
If there is already a bin of the specified size nothing is changed.
This function is not thread safe.
- Parameters:
allocation_size – The maximum size that this bin allocates
bin_resource – The memory resource for the bin
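One way to picture the bin lookup described above is an ordered map from bin size to resource, searched for the smallest bin that can hold the request. The sketch below is an assumption-laden model, not the library's code: `bin_table` is a hypothetical class, string labels stand in for resource pointers, and treating a request exactly equal to a bin size as belonging to that bin is this sketch's choice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Stand-in for a bin table: maps a bin's allocation size to a label.
// In this sketch a label replaces the bin's memory resource pointer.
class bin_table {
 public:
  // Mirrors the documented add_bin behaviour: if a bin of the same size
  // already exists, nothing is changed.
  void add_bin(std::size_t allocation_size, std::string label)
  {
    assert(allocation_size > 0);
    bins_.emplace(allocation_size, std::move(label));  // leaves an existing key untouched
  }

  // Picks the smallest bin whose size is >= bytes; requests larger than
  // every bin fall through to "upstream".
  std::string select(std::size_t bytes) const
  {
    auto const iter = bins_.lower_bound(bytes);
    return iter != bins_.end() ? iter->second : std::string{"upstream"};
  }

 private:
  std::map<std::size_t, std::string> bins_;
};
```

Because the map is mutated without any locking, this model shares the documented restriction that adding bins is not thread safe.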
-
class callback_memory_resource : public rmm::mr::device_memory_resource¶
- #include <callback_memory_resource.hpp>
A device memory resource that uses the provided callbacks for memory allocation and deallocation.
Public Functions
-
inline callback_memory_resource(allocate_callback_t allocate_callback, deallocate_callback_t deallocate_callback, void *allocate_callback_arg = nullptr, void *deallocate_callback_arg = nullptr) noexcept
Construct a new callback memory resource.
Constructs a callback memory resource that uses the user-provided callbacks allocate_callback for allocation and deallocate_callback for deallocation.
- Parameters:
allocate_callback – The callback function used for allocation
deallocate_callback – The callback function used for deallocation
allocate_callback_arg – Additional context passed to allocate_callback. It is the caller’s responsibility to maintain the lifetime of the pointed-to data for the duration of the lifetime of the callback_memory_resource.
deallocate_callback_arg – Additional context passed to deallocate_callback. It is the caller’s responsibility to maintain the lifetime of the pointed-to data for the duration of the lifetime of the callback_memory_resource.
-
callback_memory_resource(callback_memory_resource&&) noexcept = default¶
Default move constructor.
-
callback_memory_resource &operator=(callback_memory_resource&&) noexcept = default¶
Default move assignment operator.
- Returns:
callback_memory_resource& Reference to the assigned object
-
class cuda_async_memory_resource : public rmm::mr::device_memory_resource
- #include <cuda_async_memory_resource.hpp>
device_memory_resource derived class that uses cudaMallocAsync/cudaFreeAsync for allocation/deallocation.
Public Types
-
enum class allocation_handle_type
Flags for specifying memory allocation handle types.
Note
These values are exact copies from cudaMemAllocationHandleType. We need to define our own enum here because the earliest CUDA runtime version that supports asynchronous memory pools (CUDA 11.2) did not support these flags, so we need a placeholder that can be used consistently in the constructor of cuda_async_memory_resource with all versions of CUDA >= 11.2. See the cudaMemAllocationHandleType docs at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
Values:
-
enumerator none¶
Does not allow any export mechanism.
-
enumerator posix_file_descriptor¶
Allows a file descriptor to be used for exporting. Permitted only on POSIX systems.
-
enumerator win32¶
Allows a Win32 NT handle to be used for exporting. (HANDLE)
-
enumerator win32_kmt¶
Allows a Win32 KMT handle to be used for exporting. (D3DKMT_HANDLE)
Public Functions
-
inline cuda_async_memory_resource(thrust::optional<std::size_t> initial_pool_size = {}, thrust::optional<std::size_t> release_threshold = {}, thrust::optional<allocation_handle_type> export_handle_type = {})
Constructs a cuda_async_memory_resource with the optionally specified initial pool size and release threshold.
If the pool size grows beyond the release threshold, unused memory held by the pool will be released at the next synchronization event.
- Throws:
rmm::logic_error – if the CUDA version does not support cudaMallocAsync
- Parameters:
initial_pool_size – Optional initial size in bytes of the pool. If no value is provided, initial pool size is half of the available GPU memory.
release_threshold – Optional release threshold size in bytes of the pool. If no value is provided, the release threshold is set to the total amount of memory on the current device.
export_handle_type – Optional cudaMemAllocationHandleType that allocations from this resource should support for interprocess communication (IPC). Default is cudaMemHandleTypeNone for no IPC support.
-
inline virtual bool supports_streams() const noexcept override
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
cuda_async_memory_resource supports streams.
- Returns:
bool true
-
class cuda_async_view_memory_resource : public rmm::mr::device_memory_resource
- #include <cuda_async_view_memory_resource.hpp>
device_memory_resource derived class that uses cudaMallocAsync/cudaFreeAsync for allocation/deallocation.
Public Functions
-
cuda_async_view_memory_resource(cuda_async_view_memory_resource const&) = default¶
Default copy constructor.
-
cuda_async_view_memory_resource(cuda_async_view_memory_resource&&) = default¶
Default move constructor.
-
cuda_async_view_memory_resource &operator=(cuda_async_view_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
cuda_async_view_memory_resource& Reference to the assigned object
-
cuda_async_view_memory_resource &operator=(cuda_async_view_memory_resource&&) = default¶
Default move assignment operator.
- Returns:
cuda_async_view_memory_resource& Reference to the assigned object
-
inline virtual bool supports_streams() const noexcept override
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
cuda_async_view_memory_resource supports streams.
- Returns:
bool true
-
class cuda_memory_resource : public rmm::mr::device_memory_resource
- #include <cuda_memory_resource.hpp>
device_memory_resource derived class that uses cudaMalloc/cudaFree for allocation/deallocation.
Public Functions
-
cuda_memory_resource(cuda_memory_resource const&) = default¶
Default copy constructor.
-
cuda_memory_resource(cuda_memory_resource&&) = default¶
Default move constructor.
-
cuda_memory_resource &operator=(cuda_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
cuda_memory_resource& Reference to the assigned object
-
cuda_memory_resource &operator=(cuda_memory_resource&&) = default¶
Default move assignment operator.
- Returns:
cuda_memory_resource& Reference to the assigned object
-
inline virtual bool supports_streams() const noexcept override
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
cuda_memory_resource does not support streams.
- Returns:
bool false
-
class device_memory_resource¶
- #include <device_memory_resource.hpp>
Base class for all RMM device memory allocation.
This class serves as the interface that all custom device memory implementations must satisfy.
There are two private, pure virtual functions that all derived classes must implement: do_allocate and do_deallocate. Optionally, derived classes may also override is_equal. By default, is_equal simply performs an identity comparison.
The public, non-virtual functions allocate, deallocate, and is_equal simply call the private virtual functions. The reason for this is to allow implementing shared, default behavior in the base class. For example, the base class’ allocate function may log every allocation, no matter what derived class implementation is used.
The allocate and deallocate APIs and implementations provide stream-ordered memory allocation. This allows optimizations such as re-using memory deallocated on the same stream without the overhead of stream synchronization.
A call to allocate(bytes, stream_a) (on any derived class) returns a pointer that is valid to use on stream_a. Using the memory on a different stream (say stream_b) is Undefined Behavior unless the two streams are first synchronized, for example by using cudaStreamSynchronize(stream_a) or by recording a CUDA event on stream_a and then calling cudaStreamWaitEvent(stream_b, event).
The stream specified to deallocate() should be a stream on which it is valid to use the deallocated memory immediately for another allocation. Typically this is the stream on which the allocation was last used before the call to deallocate(). The passed stream may be used internally by a device_memory_resource for managing available memory with minimal synchronization, and it may also be synchronized at a later time, for example using a call to cudaStreamSynchronize().
For this reason, it is Undefined Behavior to destroy a CUDA stream that is passed to deallocate(). If the stream on which the allocation was last used has been destroyed before calling deallocate() or it is known that it will be destroyed, it is likely better to synchronize the stream (before destroying it) and then pass a different stream to deallocate() (e.g. the default stream).
A device_memory_resource should only be used when the active CUDA device is the same device that was active when the device_memory_resource was created. Otherwise behavior is undefined.
Creating a device_memory_resource for each device requires care to set the current device before creating each resource, and to maintain the lifetime of the resources as long as they are set as per-device resources. Here is an example loop that creates unique_ptrs to pool_memory_resource objects for each device and sets them as the per-device resource for that device.
    using pool_mr = rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>;
    std::vector<std::unique_ptr<pool_mr>> per_device_pools;
    for (int i = 0; i < N; ++i) {
      cudaSetDevice(i);  // make device i current before creating its resource
      // Note: for brevity, omitting creation of upstream and computing initial_size
      per_device_pools.push_back(std::make_unique<pool_mr>(upstream, initial_size));
      // Pass the raw pool pointer; the vector keeps ownership of the pool
      set_per_device_resource(cuda_device_id{i}, per_device_pools.back().get());
    }
Subclassed by rmm::mr::aligned_resource_adaptor< Upstream >, rmm::mr::arena_memory_resource< Upstream >, rmm::mr::binning_memory_resource< Upstream >, rmm::mr::callback_memory_resource, rmm::mr::cuda_async_memory_resource, rmm::mr::cuda_async_view_memory_resource, rmm::mr::cuda_memory_resource, rmm::mr::failure_callback_resource_adaptor< Upstream, ExceptionType >, rmm::mr::limiting_resource_adaptor< Upstream >, rmm::mr::logging_resource_adaptor< Upstream >, rmm::mr::managed_memory_resource, rmm::mr::owning_wrapper< Resource, Upstreams >, rmm::mr::statistics_resource_adaptor< Upstream >, rmm::mr::thread_safe_resource_adaptor< Upstream >, rmm::mr::tracking_resource_adaptor< Upstream >
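The split between public non-virtual functions and private virtual functions described in this class overview is the non-virtual interface idiom. The sketch below shows the shape of that idiom in plain C++, under stated assumptions: `resource_base` and `malloc_resource` are hypothetical stand-ins rather than RMM classes, streams are omitted, host malloc replaces device allocation, an allocation counter stands in for shared behaviour such as logging, and the hook name `do_is_equal` is assumed for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-in for the base class: no streams, host memory only.
class resource_base {
 public:
  virtual ~resource_base() = default;

  // Public, non-virtual entry points hold behaviour shared by all resources.
  void* allocate(std::size_t bytes)
  {
    ++allocation_count_;  // shared bookkeeping; logging could live here
    return do_allocate(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) { do_deallocate(ptr, bytes); }
  bool is_equal(resource_base const& other) const noexcept { return do_is_equal(other); }

  std::size_t allocation_count() const noexcept { return allocation_count_; }

 private:
  // Derived classes must implement these two hooks.
  virtual void* do_allocate(std::size_t bytes)             = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) = 0;
  // Optional hook; the default is an identity comparison.
  virtual bool do_is_equal(resource_base const& other) const noexcept { return this == &other; }

  std::size_t allocation_count_{0};
};

// Minimal derived resource that only supplies the two required hooks.
class malloc_resource final : public resource_base {
 private:
  void* do_allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void do_deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Usage: the shared bookkeeping runs regardless of the derived class.
inline std::size_t demo_allocation_count()
{
  malloc_resource mr;
  void* ptr = mr.allocate(128);
  assert(ptr != nullptr);
  mr.deallocate(ptr, 128);
  return mr.allocation_count();
}
```

Because callers can only reach the hooks through the base class, behaviour added to `allocate` applies to every derived resource without touching any of them.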
Public Functions
-
device_memory_resource(device_memory_resource const&) = default¶
Default copy constructor.
-
device_memory_resource(device_memory_resource&&) noexcept = default¶
Default move constructor.
-
device_memory_resource &operator=(device_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
device_memory_resource& Reference to the assigned object
-
device_memory_resource &operator=(device_memory_resource&&) noexcept = default¶
Default move assignment operator.
- Returns:
device_memory_resource& Reference to the assigned object
-
inline void *allocate(std::size_t bytes, cuda_stream_view stream = cuda_stream_view{})
Allocates memory of size at least bytes.
The returned pointer will have at minimum 256 byte alignment.
If supported, this operation may optionally be executed on a stream. Otherwise, the stream is ignored and the null stream is used.
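The 256 byte alignment guarantee can be reasoned about with two small helpers: one that rounds a size up to a multiple of the alignment, and one that checks a pointer. This is a sketch with hypothetical local names (`allocation_alignment`, `is_pow2`, `align_up`, `is_aligned`) defined here for illustration; it does not show how any particular resource achieves the guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment quoted in the text above.
constexpr std::size_t allocation_alignment = 256;

// True if value is a power of two.
constexpr bool is_pow2(std::size_t value) { return value != 0 && (value & (value - 1)) == 0; }

// Rounds bytes up to the next multiple of a power-of-two alignment.
inline std::size_t align_up(std::size_t bytes, std::size_t alignment = allocation_alignment)
{
  assert(is_pow2(alignment));
  return (bytes + (alignment - 1)) & ~(alignment - 1);
}

// Checks whether a pointer value is a multiple of the given alignment.
inline bool is_aligned(void const* ptr, std::size_t alignment = allocation_alignment)
{
  assert(is_pow2(alignment));
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

A suballocator that hands out offsets from an aligned base pointer can preserve alignment by rounding every request with `align_up` before carving out a block.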
-
inline void deallocate(void *ptr, std::size_t bytes, cuda_stream_view stream = cuda_stream_view{})
Deallocate memory pointed to by ptr.
ptr must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.
If supported, this operation may optionally be executed on a stream. Otherwise, the stream is ignored and the null stream is used.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
stream – Stream on which to perform deallocation
-
inline bool is_equal(device_memory_resource const &other) const noexcept
Compare this resource to another.
Two device_memory_resources compare equal if and only if memory allocated from one device_memory_resource can be deallocated from the other and vice versa.
By default, simply checks if *this and other refer to the same object, i.e., does not check if they are two objects of the same class.
- Parameters:
other – The other resource to compare to
- Returns:
If the two resources are equivalent
-
inline void *allocate(std::size_t bytes, std::size_t alignment)
Allocates memory of size at least bytes.
The returned pointer will have at minimum 256 byte alignment.
-
inline void deallocate(void *ptr, std::size_t bytes, std::size_t alignment)
Deallocate memory pointed to by ptr.
ptr must have been returned by a prior call to allocate(bytes, alignment) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
alignment – The alignment that was passed to the allocate call that returned ptr
-
inline void *allocate_async(std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
Allocates memory of size at least bytes.
The returned pointer will have at minimum 256 byte alignment.
-
inline void *allocate_async(std::size_t bytes, cuda_stream_view stream)
Allocates memory of size at least bytes.
The returned pointer will have at minimum 256 byte alignment.
-
inline void deallocate_async(void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
Deallocate memory pointed to by ptr.
ptr must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
alignment – The alignment that was passed to the allocate call that returned ptr
stream – Stream on which to perform deallocation
-
inline void deallocate_async(void *ptr, std::size_t bytes, cuda_stream_view stream)
Deallocate memory pointed to by ptr.
ptr must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
stream – Stream on which to perform deallocation
-
inline bool operator==(device_memory_resource const &other) const noexcept¶
Comparison operator with another device_memory_resource.
- Parameters:
other – The other resource to compare to
- Returns:
true If the two resources are equivalent
- Returns:
false If the two resources are not equivalent
-
inline bool operator!=(device_memory_resource const &other) const noexcept¶
Comparison operator with another device_memory_resource.
- Parameters:
other – The other resource to compare to
- Returns:
false If the two resources are equivalent
- Returns:
true If the two resources are not equivalent
-
virtual bool supports_streams() const noexcept = 0¶
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true if the resource supports non-null CUDA streams.
-
inline virtual bool supports_get_mem_info() const noexcept¶
Query whether the resource supports the get_mem_info API.
- Returns:
bool true if the resource supports get_mem_info, false otherwise.
-
inline std::pair<std::size_t, std::size_t> get_mem_info(cuda_stream_view stream) const¶
Queries the amount of free and total memory for the resource.
- Parameters:
stream – the stream whose memory manager we want to retrieve
- Returns:
a pair containing the free memory in bytes in .first and total amount of memory in .second
Friends
-
inline friend void get_property(device_memory_resource const&, cuda::mr::device_accessible) noexcept
Enables the cuda::mr::device_accessible property.
This property declares that a device_memory_resource provides device accessible memory.
-
template<typename Upstream>
class fixed_size_memory_resource : public detail::stream_ordered_memory_resource<fixed_size_memory_resource<Upstream>, detail::fixed_size_free_list>
- #include <fixed_size_memory_resource.hpp>
A device_memory_resource which allocates memory blocks of a single fixed size.
Supports only allocations of size smaller than the configured block_size.
Public Functions
-
inline explicit fixed_size_memory_resource(Upstream *upstream_mr, std::size_t block_size = default_block_size, std::size_t blocks_to_preallocate = default_blocks_to_preallocate)
Construct a new fixed_size_memory_resource that allocates memory from upstream_mr.
When the pool of blocks is all allocated, grows the pool by allocating blocks_to_preallocate more blocks from upstream_mr.
- Parameters:
upstream_mr – The memory_resource from which to allocate blocks for the pool.
block_size – The size of blocks to allocate.
blocks_to_preallocate – The number of blocks to allocate to initialize the pool.
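The growth rule described above (start with blocks_to_preallocate blocks, then add the same number again whenever the pool runs dry) can be modelled on the host. The following is a toy sketch under stated assumptions: `fixed_block_pool` is a hypothetical class, host heap memory stands in for the upstream resource, there is no stream ordering or locking, and accepting a request exactly equal to the block size is this sketch's choice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy fixed-size block pool backed by host memory (not RMM's implementation).
class fixed_block_pool {
 public:
  fixed_block_pool(std::size_t block_size, std::size_t blocks_to_preallocate)
    : block_size_{block_size}, blocks_per_chunk_{blocks_to_preallocate}
  {
    assert(block_size > 0 && blocks_to_preallocate > 0);
    grow();  // build the initial pool
  }

  // Hands out one block; requests larger than block_size are rejected.
  void* allocate(std::size_t bytes)
  {
    if (bytes > block_size_) { return nullptr; }
    if (free_blocks_.empty()) { grow(); }  // pool exhausted: add another chunk
    void* block = free_blocks_.back();
    free_blocks_.pop_back();
    return block;
  }

  // Returns a block to the free list for reuse.
  void deallocate(void* block) { free_blocks_.push_back(block); }

  // Number of upstream requests made so far.
  std::size_t chunk_count() const noexcept { return chunks_.size(); }

 private:
  // Obtains blocks_per_chunk_ blocks in one upstream request and splits them.
  void grow()
  {
    chunks_.push_back(std::make_unique<unsigned char[]>(block_size_ * blocks_per_chunk_));
    unsigned char* base = chunks_.back().get();
    for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
      free_blocks_.push_back(base + i * block_size_);
    }
  }

  std::size_t block_size_;
  std::size_t blocks_per_chunk_;
  std::vector<std::unique_ptr<unsigned char[]>> chunks_;
  std::vector<void*> free_blocks_;
};
```

Since every block has the same size, a freed block can satisfy any later request, so the free list needs no searching or coalescing.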
-
inline ~fixed_size_memory_resource() override¶
Destroy the
fixed_size_memory_resource
and free all memory allocated from upstream.
-
inline bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null streams for allocation/deallocation.
- Returns:
true
-
inline Upstream *get_upstream() const noexcept
Get the upstream memory_resource object.
- Returns:
Upstream* the upstream memory resource.
-
inline std::size_t get_block_size() const noexcept¶
Get the size of blocks allocated by this memory resource.
- Returns:
std::size_t size in bytes of allocated blocks.
Public Static Attributes
-
static constexpr std::size_t default_block_size = 1 << 20¶
Default allocation block size.
-
static constexpr std::size_t default_blocks_to_preallocate = 128¶
The number of blocks that the pool starts out with, and also the number of blocks by which the pool grows when all of its current blocks are allocated
-
class managed_memory_resource : public rmm::mr::device_memory_resource
- #include <managed_memory_resource.hpp>
device_memory_resource derived class that uses cudaMallocManaged/cudaFree for allocation/deallocation.
Public Functions
-
managed_memory_resource(managed_memory_resource const&) = default¶
Default copy constructor.
-
managed_memory_resource(managed_memory_resource&&) = default¶
Default move constructor.
-
managed_memory_resource &operator=(managed_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
managed_memory_resource& Reference to the assigned object
-
managed_memory_resource &operator=(managed_memory_resource&&) = default¶
Default move assignment operator.
- Returns:
managed_memory_resource& Reference to the assigned object
-
inline virtual bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null streams for allocation/deallocation.
- Returns:
false
-
template<typename Upstream>
class pool_memory_resource : public detail::maybe_remove_property<pool_memory_resource<Upstream>, Upstream, cuda::mr::device_accessible>, public detail::stream_ordered_memory_resource<pool_memory_resource<Upstream>, detail::coalescing_free_list>, public cuda::forward_property<pool_memory_resource<Upstream>, Upstream>¶ - #include <pool_memory_resource.hpp>
A coalescing best-fit suballocator which uses a pool of memory allocated from an upstream memory_resource.
Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with CUDA per-thread default stream.
- Template Parameters:
Upstream – memory_resource to use for allocating the pool. Implements the rmm::mr::device_memory_resource interface.
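To make the phrase "coalescing best-fit suballocator" concrete, the following is a minimal host-only sketch. It is not RMM's detail::coalescing_free_list; the names free_list_sketch and no_block are hypothetical, and blocks are tracked as plain offsets into an imaginary pool rather than device pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical sentinel returned when no free block is large enough.
constexpr std::size_t no_block = static_cast<std::size_t>(-1);

// Hypothetical free list of (offset -> size) blocks inside one pool.
class free_list_sketch {
 public:
  explicit free_list_sketch(std::size_t pool_size) { blocks_[0] = pool_size; }

  // Best fit: pick the smallest free block that can hold `bytes`.
  std::size_t allocate(std::size_t bytes)
  {
    assert(bytes > 0);
    auto best = blocks_.end();
    for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
      if (it->second >= bytes && (best == blocks_.end() || it->second < best->second)) {
        best = it;
      }
    }
    if (best == blocks_.end()) { return no_block; }
    std::size_t const offset    = best->first;
    std::size_t const remainder = best->second - bytes;
    blocks_.erase(best);
    if (remainder > 0) { blocks_[offset + bytes] = remainder; }  // keep the unused tail free
    return offset;
  }

  // Return a block and coalesce it with any free neighbor it touches.
  void deallocate(std::size_t offset, std::size_t bytes)
  {
    auto next = blocks_.lower_bound(offset);
    if (next != blocks_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == offset) {  // merge with the preceding block
        offset = prev->first;
        bytes += prev->second;
        blocks_.erase(prev);
      }
    }
    if (next != blocks_.end() && offset + bytes == next->first) {  // merge with the following block
      bytes += next->second;
      blocks_.erase(next);
    }
    blocks_[offset] = bytes;
  }

  std::size_t free_block_count() const { return blocks_.size(); }

 private:
  std::map<std::size_t, std::size_t> blocks_;  // offset -> size of each free block
};
```

Coalescing on deallocation is what lets a pool that has been carved into many small blocks serve a large request again once those blocks are returned.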
Public Functions
-
inline explicit pool_memory_resource(Upstream *upstream_mr, thrust::optional<std::size_t> initial_pool_size = thrust::nullopt, thrust::optional<std::size_t> maximum_pool_size = thrust::nullopt)¶
Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.
- Deprecated:
Use the constructor that takes an explicit initial pool size instead.
- Throws:
rmm::logic_error – if upstream_mr == nullptr
rmm::logic_error – if initial_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
- Parameters:
upstream_mr – The memory_resource from which to allocate blocks for the pool.
initial_pool_size – Minimum size, in bytes, of the initial pool. Defaults to zero.
maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available memory from the upstream resource.
-
template<typename Upstream2 = Upstream, cuda::std::enable_if_t<cuda::mr::async_resource<Upstream2>, int> = 0>
inline explicit pool_memory_resource(Upstream2 &upstream_mr, thrust::optional<std::size_t> initial_pool_size = thrust::nullopt, thrust::optional<std::size_t> maximum_pool_size = thrust::nullopt)¶
Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.
- Deprecated:
Use the constructor that takes an explicit initial pool size instead.
- Throws:
rmm::logic_error – if upstream_mr == nullptr
rmm::logic_error – if initial_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
- Parameters:
upstream_mr – The memory_resource from which to allocate blocks for the pool.
initial_pool_size – Minimum size, in bytes, of the initial pool. Defaults to zero.
maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available memory from the upstream resource.
-
inline explicit pool_memory_resource(Upstream *upstream_mr, std::size_t initial_pool_size, thrust::optional<std::size_t> maximum_pool_size = thrust::nullopt)¶
Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.
- Throws:
rmm::logic_error – if upstream_mr == nullptr
rmm::logic_error – if initial_pool_size is not aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
- Parameters:
upstream_mr – The memory_resource from which to allocate blocks for the pool.
initial_pool_size – Minimum size, in bytes, of the initial pool.
maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available memory from the upstream resource.
-
template<typename Upstream2 = Upstream, cuda::std::enable_if_t<cuda::mr::async_resource<Upstream2>, int> = 0>
inline explicit pool_memory_resource(Upstream2 &upstream_mr, std::size_t initial_pool_size, thrust::optional<std::size_t> maximum_pool_size = thrust::nullopt)¶
Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.
- Throws:
rmm::logic_error – if upstream_mr == nullptr
rmm::logic_error – if initial_pool_size is not aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.
- Parameters:
upstream_mr – The memory_resource from which to allocate blocks for the pool.
initial_pool_size – Minimum size, in bytes, of the initial pool.
maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available memory from the upstream resource.
-
inline ~pool_memory_resource() override¶
Destroy the
pool_memory_resource
and deallocate all memory it allocated using the upstream resource.
-
inline bool supports_streams() const noexcept override¶
Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true.
-
inline const Upstream &upstream_resource() const noexcept¶
Get the upstream memory_resource object.
- Returns:
const reference to the upstream memory resource.
-
inline Upstream *get_upstream() const noexcept¶
Get the upstream memory_resource object.
- Returns:
Upstream* Pointer to the upstream memory resource.
-
inline std::size_t pool_size() const noexcept¶
Computes the size of the current pool.
Includes allocated as well as free memory.
- Returns:
std::size_t The total size of the currently allocated pool.
-
using allocate_callback_t = std::function<void*(std::size_t, cuda_stream_view, void*)>¶
- group host_memory_resources
-
class host_memory_resource¶
- #include <host_memory_resource.hpp>
Base class for host memory allocation.
This is based on std::pmr::memory_resource: https://en.cppreference.com/w/cpp/memory/memory_resource
When C++17 is available for use in RMM, rmm::mr::host_memory_resource should inherit from std::pmr::memory_resource.
This class serves as the interface that all host memory resource implementations must satisfy.
There are two private, pure virtual functions that all derived classes must implement: do_allocate and do_deallocate. Optionally, derived classes may also override is_equal. By default, is_equal simply performs an identity comparison.
The public, non-virtual functions allocate, deallocate, and is_equal simply call the private virtual functions. The reason for this is to allow implementing shared, default behavior in the base class. For example, the base class’ allocate function may log every allocation, no matter what derived class implementation is used.
Subclassed by rmm::mr::new_delete_resource, rmm::mr::pinned_memory_resource
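The public non-virtual / private virtual split described above can be sketched without any RMM or CUDA dependency. The class names host_resource_sketch and new_delete_sketch, the do_is_equal hook, and the allocation counter are all hypothetical illustrations, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical base class: public functions are non-virtual and forward to private virtuals.
class host_resource_sketch {
 public:
  virtual ~host_resource_sketch() = default;

  void* allocate(std::size_t bytes)
  {
    ++allocation_count_;  // shared behavior that runs for every derived class
    return do_allocate(bytes);
  }

  void deallocate(void* ptr, std::size_t bytes)
  {
    assert(ptr != nullptr);  // precondition of this sketch only
    do_deallocate(ptr, bytes);
  }

  bool is_equal(host_resource_sketch const& other) const noexcept { return do_is_equal(other); }

  std::size_t allocation_count() const noexcept { return allocation_count_; }

 private:
  virtual void* do_allocate(std::size_t bytes)               = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes)   = 0;
  // Default comparison is identity: two distinct objects never compare equal.
  virtual bool do_is_equal(host_resource_sketch const& other) const noexcept
  {
    return this == &other;
  }

  std::size_t allocation_count_{0};
};

// Hypothetical derived class backed by the global operator new/delete.
class new_delete_sketch final : public host_resource_sketch {
 private:
  void* do_allocate(std::size_t bytes) override { return ::operator new(bytes); }
  void do_deallocate(void* ptr, std::size_t) override { ::operator delete(ptr); }
};
```

Because the counter lives in the base class allocate function, every derived resource gets it without writing any extra code.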
Public Functions
-
host_memory_resource(host_memory_resource const&) = default¶
Default copy constructor.
-
host_memory_resource(host_memory_resource&&) noexcept = default¶
Default move constructor.
-
host_memory_resource &operator=(host_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
host_memory_resource& Reference to the assigned object
-
host_memory_resource &operator=(host_memory_resource&&) noexcept = default¶
Default move assignment operator.
- Returns:
host_memory_resource& Reference to the assigned object
-
inline void *allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))¶
Allocates memory on the host of size at least bytes bytes.
The returned storage is aligned to the specified alignment if supported, and to alignof(std::max_align_t) otherwise.
- Throws:
std::bad_alloc – When the requested bytes and alignment cannot be allocated.
- Parameters:
bytes – The size of the allocation
alignment – Alignment of the allocation
- Returns:
void* Pointer to the newly allocated memory
-
inline void deallocate(void *ptr, std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))¶
Deallocate memory pointed to by ptr.
ptr must have been returned by a prior call to allocate(bytes, alignment) on a host_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
alignment – Alignment of the allocation. This must be equal to the value of alignment that was passed to the allocate call that returned ptr.
-
inline bool is_equal(host_memory_resource const &other) const noexcept¶
Compare this resource to another.
Two host_memory_resources compare equal if and only if memory allocated from one host_memory_resource can be deallocated from the other and vice versa.
By default, simply checks if *this and other refer to the same object, i.e., does not check if they are two objects of the same class.
- Parameters:
other – The other resource to compare to
- Returns:
true if the two resources are equivalent
-
inline bool operator==(host_memory_resource const &other) const noexcept¶
Equality comparison operator with another host_memory_resource.
- Parameters:
other – The other resource to compare to
- Returns:
true If the two resources are equivalent
- Returns:
false If the two resources are not equivalent
-
inline bool operator!=(host_memory_resource const &other) const noexcept¶
Inequality comparison operator with another host_memory_resource.
- Parameters:
other – The other resource to compare to
- Returns:
false If the two resources are equivalent
- Returns:
true If the two resources are not equivalent
Friends
-
inline friend void get_property(host_memory_resource const&, cuda::mr::host_accessible) noexcept¶
Enables the cuda::mr::host_accessible property.
This property declares that a host_memory_resource provides host accessible memory.
-
class new_delete_resource : public rmm::mr::host_memory_resource¶
- #include <new_delete_resource.hpp>
A host_memory_resource that uses the global operator new and operator delete to allocate host memory.
Public Functions
-
new_delete_resource(new_delete_resource const&) = default¶
Default copy constructor.
-
new_delete_resource(new_delete_resource&&) = default¶
Default move constructor.
-
new_delete_resource &operator=(new_delete_resource const&) = default¶
Default copy assignment operator.
- Returns:
new_delete_resource& Reference to the assigned object
-
new_delete_resource &operator=(new_delete_resource&&) = default¶
Default move assignment operator.
- Returns:
new_delete_resource& Reference to the assigned object
-
class pinned_memory_resource : public rmm::mr::host_memory_resource¶
- #include <pinned_memory_resource.hpp>
A host_memory_resource that uses cudaMallocHost to allocate pinned/page-locked host memory.
See https://devblogs.nvidia.com/how-optimize-data-transfers-cuda-cc/
Public Functions
-
pinned_memory_resource(pinned_memory_resource const&) = default¶
Default copy constructor.
-
pinned_memory_resource(pinned_memory_resource&&) = default¶
Default move constructor.
-
pinned_memory_resource &operator=(pinned_memory_resource const&) = default¶
Default copy assignment operator.
- Returns:
pinned_memory_resource& Reference to the assigned object
-
pinned_memory_resource &operator=(pinned_memory_resource&&) = default¶
Default move assignment operator.
- Returns:
pinned_memory_resource& Reference to the assigned object
-
inline bool supports_streams() const noexcept¶
Query whether the pinned_memory_resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool false.
-
inline void *allocate_async(std::size_t bytes, std::size_t alignment, cuda_stream_view)¶
Pretend to support the allocate_async interface, falling back to stream 0.
-
inline void *allocate_async(std::size_t bytes, cuda_stream_view)¶
Pretend to support the allocate_async interface, falling back to stream 0.
-
inline void deallocate_async(void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view)¶
Pretend to support the deallocate_async interface, falling back to stream 0.
- Parameters:
ptr – Pointer to be deallocated
bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.
alignment – The alignment that was passed to the allocate call that returned ptr.
Friends
-
inline friend void get_property(pinned_memory_resource const&, cuda::mr::device_accessible) noexcept¶
Enables the cuda::mr::device_accessible property.
This property declares that a pinned_memory_resource provides device accessible memory.
- group device_resource_adaptors
Typedefs
-
using failure_callback_t = std::function<bool(std::size_t, void*)>¶
Callback function type used by failure_callback_resource_adaptor.
The resource adaptor calls this function when a memory allocation throws a specified exception type. The function decides whether the resource adaptor should try to allocate the memory again or re-throw the exception.
The callback function signature is:
bool failure_callback_t(std::size_t bytes, void* callback_arg)
The callback function is passed two parameters: bytes is the size of the failed memory allocation and callback_arg is the extra argument passed to the constructor of the failure_callback_resource_adaptor. The callback function returns a Boolean where true means to retry the memory allocation and false means to re-throw the exception.
Functions
-
template<typename Upstream>
limiting_resource_adaptor<Upstream> make_limiting_adaptor(Upstream *upstream, std::size_t allocation_limit)¶
Convenience factory to return a limiting_resource_adaptor around the upstream resource upstream.
- Template Parameters:
Upstream – Type of the upstream device_memory_resource.
- Parameters:
upstream – Pointer to the upstream resource
allocation_limit – Maximum amount of memory to allocate
- Returns:
The new limiting resource adaptor
-
template<typename Upstream>
logging_resource_adaptor<Upstream> make_logging_adaptor(Upstream *upstream, std::string const &filename = logging_resource_adaptor<Upstream>::get_default_filename(), bool auto_flush = false)¶
Convenience factory to return a logging_resource_adaptor around the upstream resource upstream.
- Template Parameters:
Upstream – Type of the upstream device_memory_resource.
- Parameters:
upstream – Pointer to the upstream resource
filename – Name of the file to write log info. If not specified, retrieves the log file name from the environment variable “RMM_LOG_FILE”.
auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.
- Returns:
The new logging resource adaptor
-
template<typename Upstream>
logging_resource_adaptor<Upstream> make_logging_adaptor(Upstream *upstream, std::ostream &stream, bool auto_flush = false)¶
Convenience factory to return a logging_resource_adaptor around the upstream resource upstream.
- Template Parameters:
Upstream – Type of the upstream device_memory_resource.
- Parameters:
upstream – Pointer to the upstream resource
stream – The ostream to write log info.
auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.
- Returns:
The new logging resource adaptor
Constructs a resource of type Resource wrapped in an owning_wrapper using upstreams as the upstream resources and args as the additional parameters for the constructor of Resource.
template <typename Upstream1, typename Upstream2>
class example_resource {
 public:
  example_resource(Upstream1* u1, Upstream2* u2, int n, float f);
};
auto cuda_mr = std::make_shared<rmm::mr::cuda_memory_resource>();
auto cuda_upstreams = std::make_tuple(cuda_mr, cuda_mr);
// Constructs an `example_resource<rmm::mr::cuda_memory_resource, rmm::mr::cuda_memory_resource>`
// wrapped by an `owning_wrapper` taking shared ownership of `cuda_mr` and using it as both of
// `example_resource`s upstream resources. Forwards the arguments `42` and `3.14` to the
// additional `n` and `f` arguments of `example_resource` constructor.
auto wrapped_example = rmm::mr::make_owning_wrapper<example_resource>(cuda_upstreams, 42, 3.14);
- Template Parameters:
Resource – Template template parameter specifying the type of the wrapped resource to construct
Upstreams – Types of the upstream resources
Args – Types of the arguments used in Resource’s constructor
- Parameters:
upstreams – Tuple of std::shared_ptrs to the upstreams used by the wrapped resource, in the same order as expected by Resource’s constructor.
args – Function parameter pack of arguments to forward to the wrapped resource’s constructor
- Returns:
An owning_wrapper wrapping a newly constructed Resource<Upstreams...> and upstreams.
Additional convenience factory for owning_wrapper when Resource has only a single upstream resource.
When a resource has only a single upstream, it can be inconvenient to construct a std::tuple of the upstream resource. This factory allows specifying the single upstream as just a std::shared_ptr.
- Template Parameters:
Resource – Type of the wrapped resource to construct
Upstream – Type of the single upstream resource
Args – Types of the arguments used in Resource’s constructor
- Parameters:
upstream – std::shared_ptr to the upstream resource
args – Function parameter pack of arguments to forward to the wrapped resource’s constructor
- Returns:
An owning_wrapper wrapping a newly constructed Resource<Upstream> and upstream.
-
template<typename Upstream>
statistics_resource_adaptor<Upstream> make_statistics_adaptor(Upstream *upstream)¶
Convenience factory to return a statistics_resource_adaptor around the upstream resource upstream.
- Template Parameters:
Upstream – Type of the upstream device_memory_resource.
- Parameters:
upstream – Pointer to the upstream resource
- Returns:
The new statistics resource adaptor
-
template<typename Upstream>
tracking_resource_adaptor<Upstream> make_tracking_adaptor(Upstream *upstream)¶
Convenience factory to return a tracking_resource_adaptor around the upstream resource upstream.
- Template Parameters:
Upstream – Type of the upstream device_memory_resource.
- Parameters:
upstream – Pointer to the upstream resource
- Returns:
The new tracking resource adaptor
-
template<typename Upstream>
class aligned_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <aligned_resource_adaptor.hpp>
Resource that adapts Upstream memory resource to allocate memory in a specified alignment size.
An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests. This adaptor wraps allocations and deallocations from Upstream using the given alignment size.
By default, any address returned by one of the memory allocation routines from the CUDA driver or runtime API is always aligned to at least 256 bytes. For some use cases, such as GPUDirect Storage (GDS), allocations need to be aligned to a larger size (4 KiB for GDS) in order to avoid additional copies to bounce buffers.
Since a larger alignment size has some additional overhead, the user can specify a threshold size. If an allocation’s size falls below the threshold, it is aligned to the default size. Only allocations with a size at or above the threshold are aligned to the custom alignment size.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
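The threshold rule and the power-of-two rounding it relies on can be sketched as follows. This is an illustration only: the helper names is_pow2, align_up, and alignment_for are hypothetical, and the constant default_alignment_sketch assumes the 256-byte default alignment mentioned in the description above.

```cpp
#include <cassert>
#include <cstddef>

// Assumed default alignment, taken from the 256-byte figure quoted in the description.
constexpr std::size_t default_alignment_sketch = 256;

// True if `value` is a non-zero power of two.
constexpr bool is_pow2(std::size_t value) { return value != 0 && (value & (value - 1)) == 0; }

// Round `value` up to the next multiple of `alignment`, which must be a power of two.
inline std::size_t align_up(std::size_t value, std::size_t alignment)
{
  assert(is_pow2(alignment));
  return (value + alignment - 1) & ~(alignment - 1);
}

// Choose the alignment for a request: the custom alignment at or above the
// threshold, the default alignment below it.
inline std::size_t alignment_for(std::size_t bytes, std::size_t alignment, std::size_t threshold)
{
  return bytes >= threshold ? alignment : default_alignment_sketch;
}
```

For a GDS-style configuration with a 4 KiB alignment and a 1 KiB threshold, a 100-byte request would keep the default alignment while a 1 KiB request would be aligned to 4 KiB.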
Public Functions
-
inline explicit aligned_resource_adaptor(Upstream *upstream, std::size_t alignment = rmm::CUDA_ALLOCATION_ALIGNMENT, std::size_t alignment_threshold = default_alignment_threshold)¶
Construct an aligned resource adaptor using upstream to satisfy allocation requests.
- Throws:
rmm::logic_error – if upstream == nullptr
rmm::logic_error – if alignment is not a power of 2
- Parameters:
upstream – The resource used for allocating/deallocating device memory.
alignment – The size used for allocation alignment.
alignment_threshold – Only allocations with a size larger than or equal to this threshold are aligned.
-
inline Upstream *get_upstream() const noexcept¶
Get the upstream memory resource.
- Returns:
Upstream* pointer to a memory resource object.
-
inline virtual bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true if the resource supports non-null CUDA streams.
Public Static Attributes
-
static constexpr std::size_t default_alignment_threshold = 0¶
The default alignment threshold used by the adaptor.
-
template<typename Upstream, typename ExceptionType = rmm::out_of_memory>
class failure_callback_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <failure_callback_resource_adaptor.hpp>
A device memory resource that calls a callback function when allocations throw a specified exception type.
An instance of this resource must be constructed with an existing, upstream resource in order to satisfy allocation requests.
The callback function takes an allocation size and a callback argument and returns a bool representing whether to retry the allocation (true) or re-throw the caught exception (false).
When implementing a callback function for allocation retry, care must be taken to avoid an infinite loop. The following example makes sure to only retry the allocation once:
using failure_callback_adaptor =
  rmm::mr::failure_callback_resource_adaptor<rmm::mr::device_memory_resource>;
bool failure_handler(std::size_t bytes, void* arg)
{
  bool& retried = *reinterpret_cast<bool*>(arg);
  if (!retried) {
    retried = true;
    return true;  // First time we request an allocation retry
  }
  return false;  // Second time we let the adaptor throw std::bad_alloc
}
int main()
{
  bool retried{false};
  failure_callback_adaptor mr{
    rmm::mr::get_current_device_resource(), failure_handler, &retried
  };
  rmm::mr::set_current_device_resource(&mr);
}
- Template Parameters:
Upstream – The type of the upstream resource used for allocation/deallocation.
ExceptionType – The type of exception that this adaptor should respond to
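The retry contract described above (catch the exception, ask the callback, then either try again or re-throw) can be exercised on the host with a small stand-in. The helper allocate_with_callback, the alias callback_sketch_t, and the use of std::bad_alloc as the caught type are assumptions made for this sketch; the adaptor's actual internals may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Same shape as failure_callback_t: (bytes, callback_arg) -> retry?
using callback_sketch_t = std::function<bool(std::size_t, void*)>;

// Hypothetical stand-in for the adaptor's allocation path.
inline void* allocate_with_callback(std::function<void*(std::size_t)> const& upstream_allocate,
                                    std::size_t bytes,
                                    callback_sketch_t const& callback,
                                    void* callback_arg)
{
  assert(callback);  // a callback must be provided
  while (true) {
    try {
      return upstream_allocate(bytes);
    } catch (std::bad_alloc const&) {
      // false from the callback means: stop retrying and re-throw the caught exception
      if (!callback(bytes, callback_arg)) { throw; }
    }
  }
}

// Handler that allows exactly one retry, mirroring the example above.
inline bool retry_once(std::size_t, void* arg)
{
  bool& retried = *static_cast<bool*>(arg);
  if (retried) { return false; }
  retried = true;
  return true;
}
```

A handler that always returns true would loop forever against an upstream that never recovers, which is why the state passed through callback_arg matters.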
Public Types
-
using exception_type = ExceptionType¶
The type of exception this object catches/throws.
Public Functions
-
inline failure_callback_resource_adaptor(Upstream *upstream, failure_callback_t callback, void *callback_arg)¶
Construct a new failure_callback_resource_adaptor using upstream to satisfy allocation requests.
See also
failure_callback_t
- Throws:
rmm::logic_error – if upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
callback – Callback function
callback_arg – Extra argument passed to callback
-
failure_callback_resource_adaptor(failure_callback_resource_adaptor&&) noexcept = default¶
Default move constructor.
-
failure_callback_resource_adaptor &operator=(failure_callback_resource_adaptor&&) noexcept = default¶
Default move assignment operator.
- Returns:
failure_callback_resource_adaptor& Reference to the assigned object
-
inline Upstream *get_upstream() const noexcept¶
Pointer to the upstream resource.
- Returns:
Pointer to the upstream resource
-
inline virtual bool supports_streams() const noexcept override¶
Checks whether the upstream resource supports streams.
- Returns:
true The upstream resource supports streams
- Returns:
false The upstream resource does not support streams.
-
template<typename Upstream>
class limiting_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <limiting_resource_adaptor.hpp>
Resource that uses Upstream to allocate memory and limits the total allocations possible.
An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. Atomics are used to make this thread-safe, but note that get_allocated_bytes may not include in-flight allocations.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
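One way to enforce a byte limit with atomics is to add the request optimistically and roll it back if the limit is exceeded. The class limit_tracker_sketch below is a hypothetical host-only illustration of that idea under the assumption that failures are reported with std::bad_alloc; it is not the adaptor's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical byte-limit bookkeeping, independent of any real allocator.
class limit_tracker_sketch {
 public:
  explicit limit_tracker_sketch(std::size_t limit) : limit_{limit} {}

  // Count `bytes` against the limit; throw if the limit would be exceeded.
  void reserve(std::size_t bytes)
  {
    std::size_t const previous = allocated_.fetch_add(bytes);
    if (previous + bytes > limit_) {
      allocated_.fetch_sub(bytes);  // roll back the optimistic add
      throw std::bad_alloc{};
    }
  }

  // Give `bytes` back after the matching deallocation.
  void release(std::size_t bytes)
  {
    assert(allocated_.load() >= bytes);
    allocated_.fetch_sub(bytes);
  }

  std::size_t allocated_bytes() const { return allocated_.load(); }
  std::size_t limit() const { return limit_; }

 private:
  std::size_t limit_;
  std::atomic<std::size_t> allocated_{0};
};
```

In this scheme a concurrent reader can observe the counter between the add and the rollback, so the reported value is a snapshot rather than an exact account of completed allocations.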
Public Functions
-
inline limiting_resource_adaptor(Upstream *upstream, std::size_t allocation_limit, std::size_t alignment = rmm::CUDA_ALLOCATION_ALIGNMENT)¶
Construct a new limiting resource adaptor using upstream to satisfy allocation requests and limiting the total allocation amount possible.
- Throws:
rmm::logic_error – if upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
allocation_limit – Maximum memory allowed for this allocator
alignment – Alignment in bytes for the start of each allocated buffer
-
limiting_resource_adaptor(limiting_resource_adaptor&&) noexcept = default¶
Default move constructor.
-
limiting_resource_adaptor &operator=(limiting_resource_adaptor&&) noexcept = default¶
Default move assignment operator.
- Returns:
limiting_resource_adaptor& Reference to the assigned object
-
inline Upstream *get_upstream() const noexcept¶
Pointer to the upstream resource.
- Returns:
Pointer to the upstream resource
-
inline virtual bool supports_streams() const noexcept override¶
Checks whether the upstream resource supports streams.
- Returns:
true The upstream resource supports streams
- Returns:
false The upstream resource does not support streams.
-
inline std::size_t get_allocated_bytes() const¶
Query the number of bytes that have been allocated. Note that this cannot be used to determine how large an allocation is possible, because fragmentation, internal page sizes, and alignment are not tracked by this allocator.
- Returns:
std::size_t number of bytes that have been allocated through this allocator.
-
inline std::size_t get_allocation_limit() const¶
Query the maximum number of bytes that this allocator is allowed to allocate. This is the limit on the allocator and not a representation of the underlying device. The device may not be able to support this limit.
- Returns:
std::size_t max number of bytes allowed for this allocator
-
template<typename Upstream>
class logging_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <logging_resource_adaptor.hpp>
Resource that uses Upstream to allocate memory and logs information about the requested allocation/deallocations.
An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests and log allocation/deallocation activity.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
Public Functions
-
inline logging_resource_adaptor(Upstream *upstream, std::string const &filename = get_default_filename(), bool auto_flush = false)¶
Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the file specified by filename.
The logfile will be written using CSV formatting.
Clears the contents of filename if it already exists.
Creating multiple logging_resource_adaptors with the same filename will result in undefined behavior.
- Throws:
rmm::logic_error – if upstream == nullptr
spdlog::spdlog_ex – if opening filename failed
- Parameters:
upstream – The resource used for allocating/deallocating device memory
filename – Name of file to write log info. If not specified, retrieves the file name from the environment variable “RMM_LOG_FILE”.
auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.
-
inline logging_resource_adaptor(Upstream *upstream, std::ostream &stream, bool auto_flush = false)¶
Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the ostream specified by stream.
The logfile will be written using CSV formatting.
- Throws:
rmm::logic_error – if upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
stream – The ostream to write log info.
auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.
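As a rough picture of CSV logging to an ostream with an optional per-event flush, consider the sketch below. The column names in csv_header_sketch and the helpers format_row and log_event are hypothetical; the adaptor's real columns are whatever its header() function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical column names, chosen for this sketch only.
inline std::string csv_header_sketch() { return "Action,Pointer,Size"; }

// Format one event as a single CSV row (without the trailing newline).
inline std::string format_row(char const* action, void const* ptr, std::size_t bytes)
{
  assert(action != nullptr);
  std::ostringstream row;
  row << action << ',' << ptr << ',' << bytes;
  return row.str();
}

// Write one row; flushing after every event trades throughput for durability.
inline void log_event(
  std::ostream& out, char const* action, void const* ptr, std::size_t bytes, bool auto_flush)
{
  out << format_row(action, ptr, bytes) << '\n';
  if (auto_flush) { out.flush(); }
}
```

Passing a std::ostringstream as the sink is a convenient way to inspect log output in tests without touching the file system.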
-
inline logging_resource_adaptor(Upstream *upstream, spdlog::sinks_init_list sinks, bool auto_flush = false)¶
Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the logging sinks specified by sinks.
The log output will be written using CSV formatting.
- Throws:
rmm::logic_error – if upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
sinks – A list of logging sinks to which log output will be written.
auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.
-
logging_resource_adaptor(logging_resource_adaptor&&) noexcept = default¶
Default move constructor.
-
logging_resource_adaptor &operator=(logging_resource_adaptor&&) noexcept = default¶
Default move assignment operator.
- Returns:
logging_resource_adaptor& Reference to the assigned object
-
inline Upstream *get_upstream() const noexcept¶
Return pointer to the upstream resource.
- Returns:
Upstream* Pointer to the upstream resource.
-
inline virtual bool supports_streams() const noexcept override¶
Checks whether the upstream resource supports streams.
- Returns:
true The upstream resource supports streams
- Returns:
false The upstream resource does not support streams.
-
inline void flush()¶
Flush logger contents.
-
inline std::string header() const¶
Return the CSV header string.
- Returns:
CSV formatted header string of column names
Public Static Functions
-
static inline std::string get_default_filename()¶
Return the value of the environment variable RMM_LOG_FILE.
- Throws:
rmm::logic_error – if RMM_LOG_FILE is not set.
- Returns:
The value of RMM_LOG_FILE as std::string.
-
template<typename Resource, typename ...Upstreams>
class owning_wrapper : public rmm::mr::device_memory_resource¶
- #include <owning_wrapper.hpp>
Resource adaptor that maintains the lifetime of upstream resources.
Many device_memory_resource derived types allocate memory from another “upstream” resource. E.g., pool_memory_resource allocates its pool from an upstream resource. Typically, a resource does not own its upstream, and therefore it is the user’s responsibility to maintain the lifetime of the upstream resource. This can be inconvenient and error prone, especially for resources with complex upstreams that may themselves also have an upstream.
owning_wrapper simplifies lifetime management of a resource, wrapped, by taking shared ownership of all upstream resources via a std::shared_ptr.
For convenience, it is recommended to use the make_owning_wrapper factory instead of constructing an owning_wrapper directly.
Example:
auto cuda = std::make_shared<rmm::mr::cuda_memory_resource>();
auto pool = rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(cuda, initial_pool_size, max_pool_size);
// The `cuda` resource will be kept alive for the lifetime of `pool` and automatically be
// destroyed after `pool` is destroyed
- Template Parameters:
Resource – Type of the wrapped resource
Upstreams – Template parameter pack of the types of the upstream resources used by Resource
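The lifetime guarantee can be demonstrated with a single-upstream, host-only sketch. The types owning_sketch, upstream_sketch, and pool_sketch are hypothetical; the real owning_wrapper accepts a tuple of upstreams and derives from device_memory_resource.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical single-upstream wrapper that shares ownership of its upstream.
template <typename Resource, typename Upstream>
class owning_sketch {
 public:
  template <typename... Args>
  explicit owning_sketch(std::shared_ptr<Upstream> upstream, Args&&... args)
    : upstream_{std::move(upstream)},
      wrapped_{new Resource(upstream_.get(), std::forward<Args>(args)...)}
  {
    assert(upstream_ != nullptr);
  }

  Resource& wrapped() noexcept { return *wrapped_; }

 private:
  // Declared first so it is destroyed last: the wrapped resource never outlives its upstream.
  std::shared_ptr<Upstream> upstream_;
  std::unique_ptr<Resource> wrapped_;
};

// Hypothetical upstream and a resource whose constructor takes a raw upstream pointer first.
struct upstream_sketch {};

struct pool_sketch {
  pool_sketch(upstream_sketch* up, std::size_t size) : upstream{up}, initial_size{size} {}
  upstream_sketch* upstream;
  std::size_t initial_size;
};
```

The caller can drop its own shared_ptr immediately after constructing the wrapper; the upstream is released only when the wrapper itself is destroyed.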
Public Types
Public Functions
-
template<typename ...Args>
inline owning_wrapper(upstream_tuple upstreams, Args&&... args)¶
Constructs the wrapped resource using the provided upstreams and any additional arguments forwarded to the wrapped resource’s constructor.
Resource is required to have a constructor whose first argument(s) are raw pointers to its upstream resources in the same order as upstreams, followed by any additional arguments in the same order as args.
Example:
template <typename Upstream1, typename Upstream2>
class example_resource {
 public:
  example_resource(Upstream1* u1, Upstream2* u2, int n, float f);
};
using cuda = rmm::mr::cuda_memory_resource;
using example = example_resource<cuda, cuda>;
using wrapped_example = rmm::mr::owning_wrapper<example, cuda, cuda>;
auto cuda_mr = std::make_shared<cuda>();
// Constructs an `example_resource` wrapped by an `owning_wrapper` taking shared ownership of
// `cuda_mr` and using it as both of `example_resource`s upstream resources. Forwards the
// arguments `42` and `3.14` to the additional `n` and `f` arguments of `example_resource`s
// constructor.
wrapped_example w{std::make_tuple(cuda_mr, cuda_mr), 42, 3.14};
- Template Parameters:
Args – Template parameter pack to forward to the wrapped resource’s constructor
- Parameters:
upstreams – Tuple of std::shared_ptrs to the upstreams used by the wrapped resource, in the same order as expected by Resource’s constructor.
args – Function parameter pack of arguments to forward to the wrapped resource’s constructor
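The ownership mechanism described above can be illustrated with a minimal, self-contained sketch. It is not RMM's implementation: toy_upstream, toy_resource, and toy_owning_wrapper are hypothetical stand-ins, and the sketch assumes C++17 for std::apply. It only shows how a tuple of std::shared_ptrs can keep upstreams alive while the wrapped resource receives raw pointers.

```cpp
#include <cassert>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical upstream type; stands in for a real memory resource.
struct toy_upstream {
  int id;
};

// Hypothetical resource whose constructor takes a raw upstream pointer first,
// followed by one additional argument.
struct toy_resource {
  toy_resource(toy_upstream* up, int n) : upstream{up}, size{n} {}
  toy_upstream* upstream;
  int size;
};

// Illustrative wrapper: stores shared_ptrs so the upstreams outlive the wrapped
// resource, and passes raw pointers to the wrapped resource's constructor.
template <typename Resource, typename... Upstreams>
class toy_owning_wrapper {
 public:
  using upstream_tuple = std::tuple<std::shared_ptr<Upstreams>...>;

  template <typename... Args>
  explicit toy_owning_wrapper(upstream_tuple upstreams, Args&&... args)
    : upstreams_{std::move(upstreams)},
      wrapped_{std::apply(
        [&](auto const&... ups) {
          // Raw pointers first, then the forwarded extra arguments.
          return Resource{ups.get()..., std::forward<Args>(args)...};
        },
        upstreams_)}
  {
  }

  Resource& wrapped() noexcept { return wrapped_; }
  Resource const& wrapped() const noexcept { return wrapped_; }

 private:
  upstream_tuple upstreams_;  // declared first so it is initialized before wrapped_
  Resource wrapped_;
};

// Builds a wrapper, drops the caller's shared_ptr, and checks the upstream survives.
inline int upstream_id_after_release()
{
  auto up = std::make_shared<toy_upstream>(toy_upstream{1});
  toy_owning_wrapper<toy_resource, toy_upstream> wrapper{std::make_tuple(up), 7};
  up.reset();  // the wrapper still holds a reference
  assert(wrapper.wrapped().upstream != nullptr);
  return wrapper.wrapped().upstream->id;
}
```

Because upstreams_ is declared before wrapped_, the wrapped resource is destroyed first and the upstreams are released afterwards, which matches the lifetime guarantee the text describes.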
-
inline Resource const &wrapped() const noexcept¶
A constant reference to the wrapped resource.
- Returns:
A constant reference to the wrapped resource
-
inline Resource &wrapped() noexcept¶
A reference to the wrapped resource.
- Returns:
A reference to the wrapped resource
-
inline virtual bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true if the resource supports non-null CUDA streams.
-
template<typename Upstream>
class statistics_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <statistics_resource_adaptor.hpp>
Resource that uses Upstream to allocate memory and tracks statistics on memory allocations.
An instance of this resource can be constructed with an existing upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. The adaptor records the current, peak, and total values for both the number of bytes allocated and the number of allocation calls made to the memory resource.
statistics_resource_adaptor is intended as a debug adaptor and shouldn’t be used in performance-sensitive code.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
Public Types
-
using read_lock_t = std::shared_lock<std::shared_timed_mutex>¶
Type of lock used to synchronize read access.
-
using write_lock_t = std::unique_lock<std::shared_timed_mutex>¶
Type of lock used to synchronize write access.
Public Functions
-
inline statistics_resource_adaptor(Upstream *upstream)¶
Construct a new statistics resource adaptor using upstream to satisfy allocation requests.
- Throws:
rmm::logic_error – if
upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
-
statistics_resource_adaptor(statistics_resource_adaptor&&) noexcept = default¶
Default move constructor.
-
statistics_resource_adaptor &operator=(statistics_resource_adaptor&&) noexcept = default¶
Default move assignment operator.
- Returns:
statistics_resource_adaptor& Reference to the assigned object
-
inline Upstream *get_upstream() const noexcept¶
Pointer to the upstream resource.
- Returns:
Pointer to the upstream resource
-
inline virtual bool supports_streams() const noexcept override¶
Checks whether the upstream resource supports streams.
- Returns:
true if the upstream resource supports streams, false otherwise.
-
struct counter¶
- #include <statistics_resource_adaptor.hpp>
Utility struct for counting the current, peak, and total value of a number.
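As a rough illustration of what "current, peak, and total" mean here, the following self-contained sketch tracks the three values for a sequence of increments and decrements. The type example_counter and the function replay_example are hypothetical names for this sketch; the member layout is an assumption and may differ from the documented struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical counter; names and members are illustrative assumptions.
struct example_counter {
  std::int64_t value{0};  // current value
  std::int64_t peak{0};   // largest current value observed so far
  std::int64_t total{0};  // sum of all increments; never decreases

  // Record an increase, e.g. an allocation of `amount` bytes.
  example_counter& operator+=(std::int64_t amount)
  {
    value += amount;
    total += amount;
    peak = std::max(peak, value);
    return *this;
  }

  // Record a decrease, e.g. a deallocation of `amount` bytes.
  example_counter& operator-=(std::int64_t amount)
  {
    value -= amount;
    return *this;
  }
};

// Replays a short allocation pattern and returns the resulting counter.
inline example_counter replay_example()
{
  example_counter bytes;
  bytes += 100;  // allocate 100 bytes
  bytes += 50;   // allocate 50 more; current and peak are now 150
  bytes -= 100;  // free the first allocation; current drops to 50
  bytes += 20;   // current is 70, peak stays at 150, total is 170
  assert(bytes.peak >= bytes.value);
  return bytes;
}
```

The same pattern applies to counting calls: each allocation adds 1 and each deallocation subtracts 1.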
-
template<typename Upstream>
class thread_safe_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <thread_safe_resource_adaptor.hpp>
Resource that adapts the Upstream memory resource to be thread safe.
An instance of this resource can be constructed with an existing upstream resource in order to satisfy allocation requests. This adaptor wraps allocations and deallocations from Upstream in a mutex lock.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
Public Types
-
using lock_t = std::lock_guard<std::mutex>¶
Type of lock used to synchronize access.
Public Functions
-
inline thread_safe_resource_adaptor(Upstream *upstream)¶
Construct a new thread safe resource adaptor using upstream to satisfy allocation requests.
All allocations and frees are protected by a mutex lock.
- Throws:
rmm::logic_error – if
upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory.
-
inline Upstream *get_upstream() const noexcept¶
Get the upstream memory resource.
- Returns:
Upstream* pointer to a memory resource object.
-
inline virtual bool supports_streams() const noexcept override¶
Query whether the resource supports use of non-null CUDA streams for allocation/deallocation.
- Returns:
bool true if the resource supports non-null CUDA streams.
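The locking pattern this adaptor describes can be sketched without CUDA. The example below is a simplified stand-in, not the documented class: toy_upstream and toy_locking_adaptor are hypothetical, host memory replaces device memory, and the constructor asserts where the documented adaptor throws rmm::logic_error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical upstream that is not thread safe: it updates a plain counter.
struct toy_upstream {
  std::size_t live_allocations{0};

  void* allocate(std::size_t bytes)
  {
    ++live_allocations;
    return std::malloc(bytes);
  }

  void deallocate(void* ptr)
  {
    --live_allocations;
    std::free(ptr);
  }
};

// Illustrative adaptor: every call into the upstream holds the same mutex.
template <typename Upstream>
class toy_locking_adaptor {
 public:
  using lock_t = std::lock_guard<std::mutex>;

  explicit toy_locking_adaptor(Upstream* upstream) : upstream_{upstream}
  {
    assert(upstream != nullptr);  // the documented adaptor throws instead
  }

  void* allocate(std::size_t bytes)
  {
    lock_t lock(mtx_);  // released when `lock` goes out of scope
    return upstream_->allocate(bytes);
  }

  void deallocate(void* ptr)
  {
    lock_t lock(mtx_);
    upstream_->deallocate(ptr);
  }

  Upstream* get_upstream() const noexcept { return upstream_; }

 private:
  std::mutex mtx_;      // serializes all access to the upstream
  Upstream* upstream_;  // not owned; must outlive the adaptor
};
```

Note that the lock only protects calls made through the adaptor. Code that calls the upstream directly bypasses the mutex.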
-
template<typename T>
class thrust_allocator : public thrust::device_malloc_allocator<T>¶
- #include <thrust_allocator_adaptor.hpp>
An allocator compatible with Thrust containers and algorithms using a device_memory_resource for memory (de)allocation.
Unlike a device_memory_resource, thrust_allocator is typed and bound to allocate objects of a specific type T, but can be freely rebound to other types.
- Template Parameters:
T – The type of the objects that will be allocated by this allocator
Public Types
Public Functions
-
thrust_allocator() = default¶
Default constructor creates an allocator using the default memory resource and default stream.
-
inline explicit thrust_allocator(cuda_stream_view stream)¶
Constructs a thrust_allocator using the default device memory resource and specified stream.
- Parameters:
stream – The stream to be used for device memory (de)allocation
-
inline thrust_allocator(cuda_stream_view stream, async_resource_ref mr)¶
Constructs a thrust_allocator using a device memory resource and stream.
- Parameters:
mr – The resource to be used for device memory allocation
stream – The stream to be used for device memory (de)allocation
-
template<typename U>
inline thrust_allocator(thrust_allocator<U> const &other)¶
Copy constructor. Copies the resource pointer and stream.
- Parameters:
other – The
thrust_allocator
to copy
-
inline pointer allocate(size_type num)¶
Allocate objects of type T.
- Parameters:
num – The number of elements of type T to allocate
- Returns:
pointer Pointer to the newly allocated storage
-
inline void deallocate(pointer ptr, size_type num)¶
Deallocates objects of type T.
- Parameters:
ptr – Pointer returned by a previous call to allocate
num – Number of elements; must be equal to the argument passed to the prior allocate call that produced ptr
-
inline async_resource_ref memory_resource() const noexcept¶
The async_resource_ref used to allocate and deallocate.
- Returns:
The async_resource_ref used to allocate and deallocate
-
inline cuda_stream_view stream() const noexcept¶
The stream used by this allocator.
- Returns:
The stream used by this allocator
Friends
-
inline friend void get_property(thrust_allocator const&, cuda::mr::device_accessible) noexcept¶
Enables the cuda::mr::device_accessible property.
This property declares that a thrust_allocator provides device accessible memory.
-
template<typename U>
struct rebind¶
- #include <thrust_allocator_adaptor.hpp>
Provides the type of a thrust_allocator instantiated with another type.
- Template Parameters:
U – the other type to use for instantiation
Public Types
-
using other = thrust_allocator<U>¶
The type to bind to.
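Rebinding is the standard allocator mechanism that lets a container obtain an allocator for a different element type. The sketch below shows the pattern with a hypothetical host-side toy_allocator that carries an integer tag in place of a stream and resource; it is an illustration of rebind and the converting copy constructor, not of thrust_allocator itself.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical typed allocator; the tag stands in for per-allocator state.
template <typename T>
class toy_allocator {
 public:
  using value_type = T;

  // Names the same allocator template instantiated for another type.
  template <typename U>
  struct rebind {
    using other = toy_allocator<U>;
  };

  toy_allocator() = default;
  explicit toy_allocator(int tag) noexcept : tag_{tag} {}

  // Converting copy constructor: a rebound copy keeps the same state.
  template <typename U>
  toy_allocator(toy_allocator<U> const& other) noexcept : tag_{other.tag()}
  {
  }

  T* allocate(std::size_t num) { return static_cast<T*>(::operator new(num * sizeof(T))); }
  void deallocate(T* ptr, std::size_t) noexcept { ::operator delete(ptr); }

  int tag() const noexcept { return tag_; }

 private:
  int tag_{0};
};

template <typename T, typename U>
bool operator==(toy_allocator<T> const& lhs, toy_allocator<U> const& rhs) noexcept
{
  return lhs.tag() == rhs.tag();
}

template <typename T, typename U>
bool operator!=(toy_allocator<T> const& lhs, toy_allocator<U> const& rhs) noexcept
{
  return !(lhs == rhs);
}

// std::allocator_traits picks up the nested rebind template.
using rebound_t = std::allocator_traits<toy_allocator<int>>::rebind_alloc<double>;
static_assert(std::is_same<rebound_t, toy_allocator<double>>::value,
              "rebind<double>::other should be toy_allocator<double>");

// Copies an int allocator into a double allocator and returns the copied tag.
inline int rebound_tag(int tag)
{
  toy_allocator<int> ints{tag};
  toy_allocator<double> doubles{ints};  // uses the converting copy constructor
  assert(doubles == ints);
  return doubles.tag();
}
```

Containers rely on this pattern internally, for example when a node-based container needs to allocate its node type rather than the element type.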
-
template<typename Upstream>
class tracking_resource_adaptor : public rmm::mr::device_memory_resource¶
- #include <tracking_resource_adaptor.hpp>
Resource that uses Upstream to allocate memory and tracks allocations.
An instance of this resource can be constructed with an existing upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. Tracking stores a size and pointer for every allocation, and a stack frame if capture_stacks is true, so it can add significant overhead.
tracking_resource_adaptor is intended as a debug adaptor and shouldn’t be used in performance-sensitive code. Note that callstacks may not contain all symbols unless the project is linked with -rdynamic. This can be accomplished with add_link_options(-rdynamic) in CMake.
- Template Parameters:
Upstream – Type of the upstream resource used for allocation/deallocation.
Public Types
-
using read_lock_t = std::shared_lock<std::shared_timed_mutex>¶
Type of lock used to synchronize read access.
-
using write_lock_t = std::unique_lock<std::shared_timed_mutex>¶
Type of lock used to synchronize write access.
Public Functions
-
inline tracking_resource_adaptor(Upstream *upstream, bool capture_stacks = false)¶
Construct a new tracking resource adaptor using upstream to satisfy allocation requests.
- Throws:
rmm::logic_error – if
upstream == nullptr
- Parameters:
upstream – The resource used for allocating/deallocating device memory
capture_stacks – If true, capture stacks for allocation calls
-
tracking_resource_adaptor(tracking_resource_adaptor&&) noexcept = default¶
Default move constructor.
-
tracking_resource_adaptor &operator=(tracking_resource_adaptor&&) noexcept = default¶
Default move assignment operator.
- Returns:
tracking_resource_adaptor& Reference to the assigned object
-
inline Upstream *get_upstream() const noexcept¶
Pointer to the upstream resource.
- Returns:
Pointer to the upstream resource
-
inline virtual bool supports_streams() const noexcept override¶
Checks whether the upstream resource supports streams.
- Returns:
true if the upstream resource supports streams, false otherwise.
-
inline std::map<void*, allocation_info> const &get_outstanding_allocations() const noexcept¶
Get the outstanding allocations map.
- Returns:
std::map<void*, allocation_info> const& Map of outstanding allocations. The key is the allocated memory pointer and the value is the allocation_info structure, which contains the size and, potentially, a stack trace.
-
inline std::size_t get_allocated_bytes() const noexcept¶
Query the number of bytes that have been allocated. Note that this cannot be used to determine how large an allocation is possible, because of possible fragmentation and because internal page sizes and alignment are not tracked by this allocator.
- Returns:
std::size_t number of bytes that have been allocated through this allocator.
-
inline std::string get_outstanding_allocations_str() const¶
Gets a string containing the outstanding allocation pointers, their sizes, and optionally the stack trace for when each pointer was allocated.
Stack traces are only included if this resource adaptor was created with capture_stacks == true. Otherwise, outstanding allocation pointers are shown with their size and empty stack traces.
- Returns:
std::string Containing the outstanding allocation pointers.
-
inline void log_outstanding_allocations() const¶
Log any outstanding allocations via RMM_LOG_DEBUG.
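The bookkeeping behind these queries can be sketched as a map from pointer to size plus a running byte total. The example below is a simplified, host-only illustration: toy_tracker and its member names are hypothetical, stack capture and locking are omitted, and it is not the adaptor's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Hypothetical tracker; names are illustrative, not the documented API.
class toy_tracker {
 public:
  void* allocate(std::size_t bytes)
  {
    void* ptr = std::malloc(bytes);
    if (ptr != nullptr) {
      outstanding_.emplace(ptr, bytes);  // remember pointer and size
      allocated_bytes_ += bytes;
    }
    return ptr;
  }

  void deallocate(void* ptr)
  {
    auto const found = outstanding_.find(ptr);
    if (found != outstanding_.end()) {  // untracked pointers are passed through
      allocated_bytes_ -= found->second;
      outstanding_.erase(found);
    }
    std::free(ptr);
  }

  std::map<void*, std::size_t> const& outstanding() const noexcept { return outstanding_; }
  std::size_t allocated_bytes() const noexcept { return allocated_bytes_; }

  // One line per outstanding allocation: pointer and size in bytes.
  std::string outstanding_str() const
  {
    std::ostringstream out;
    for (auto const& entry : outstanding_) {
      out << entry.first << ": " << entry.second << " B\n";
    }
    return out.str();
  }

 private:
  std::map<void*, std::size_t> outstanding_;
  std::size_t allocated_bytes_{0};
};

// Allocates twice, frees once, and returns the number of bytes still tracked.
inline std::size_t bytes_left_after_partial_free()
{
  toy_tracker tracker;
  void* first  = tracker.allocate(128);
  void* second = tracker.allocate(32);
  tracker.deallocate(first);
  std::size_t const left = tracker.allocated_bytes();
  assert(tracker.outstanding().size() == 1);
  tracker.deallocate(second);
  return left;
}
```

A non-empty map at shutdown is the signal a leak check would look for; the string form is what a logging call would print.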
-
struct allocation_info¶
- #include <tracking_resource_adaptor.hpp>
Information stored about an allocation. Includes the size and a stack trace if the tracking_resource_adaptor was initialized to capture stacks.
Public Functions
-
inline allocation_info(std::size_t size, bool capture_stack)¶
Construct a new allocation info object.
- Parameters:
size – Size of the allocation
capture_stack – If true, capture the stack trace for the allocation
-
using failure_callback_t = std::function<bool(std::size_t, void*)>¶