RMM
23.12
RAPIDS Memory Manager
|
A suballocator that emphasizes fragmentation avoidance and scalable concurrency support. More...
#include <arena_memory_resource.hpp>
Public Member Functions | |
arena_memory_resource (Upstream *upstream_mr, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false) | |
Construct an arena_memory_resource . More... | |
arena_memory_resource (arena_memory_resource const &)=delete | |
arena_memory_resource & | operator= (arena_memory_resource const &)=delete |
arena_memory_resource (arena_memory_resource &&) noexcept=delete | |
arena_memory_resource & | operator= (arena_memory_resource &&) noexcept=delete |
bool | supports_streams () const noexcept override |
Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation. More... | |
bool | supports_get_mem_info () const noexcept override |
Query whether the resource supports the get_mem_info API. More... | |
Public Member Functions inherited from rmm::mr::device_memory_resource | |
device_memory_resource (device_memory_resource const &)=default | |
Default copy constructor. | |
device_memory_resource (device_memory_resource &&) noexcept=default | |
Default move constructor. | |
device_memory_resource & | operator= (device_memory_resource const &)=default |
Default copy assignment operator. More... | |
device_memory_resource & | operator= (device_memory_resource &&) noexcept=default |
Default move assignment operator. More... | |
void * | allocate (std::size_t bytes, cuda_stream_view stream=cuda_stream_view{}) |
Allocates memory of size at least bytes . More... | |
void | deallocate (void *ptr, std::size_t bytes, cuda_stream_view stream=cuda_stream_view{}) |
Deallocate memory pointed to by p . More... | |
bool | is_equal (device_memory_resource const &other) const noexcept |
Compare this resource to another. More... | |
void * | allocate (std::size_t bytes, std::size_t alignment) |
Allocates memory of size at least bytes . More... | |
void | deallocate (void *ptr, std::size_t bytes, std::size_t alignment) |
Deallocate memory pointed to by p . More... | |
void * | allocate_async (std::size_t bytes, std::size_t alignment, cuda_stream_view stream) |
Allocates memory of size at least bytes . More... | |
void * | allocate_async (std::size_t bytes, cuda_stream_view stream) |
Allocates memory of size at least bytes . More... | |
void | deallocate_async (void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view stream) |
Deallocate memory pointed to by p . More... | |
void | deallocate_async (void *ptr, std::size_t bytes, cuda_stream_view stream) |
Deallocate memory pointed to by p . More... | |
bool | operator== (device_memory_resource const &other) const noexcept |
Comparison operator with another device_memory_resource. More... | |
bool | operator!= (device_memory_resource const &other) const noexcept |
Comparison operator with another device_memory_resource. More... | |
std::pair< std::size_t, std::size_t > | get_mem_info (cuda_stream_view stream) const |
Queries the amount of free and total memory for the resource. More... | |
A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.
Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with CUDA per-thread default stream.
GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.
Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.
In real-world applications, allocation sizes tend to follow a power law distribution in which large allocations are rare, but small ones quite common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.
This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.
Upstream | Memory resource to use for allocating memory for the global arena. Implements rmm::mr::device_memory_resource interface. |
|
inlineexplicit |
Construct an arena_memory_resource
.
rmm::logic_error | if upstream_mr == nullptr . |
upstream_mr | The memory resource from which to allocate blocks for the global arena. |
arena_size | Size in bytes of the global arena. Defaults to half of the available memory on the current device. |
dump_log_on_failure | If true, dump memory log when running out of memory. |
|
inlineoverridevirtualnoexcept |
Query whether the resource supports the get_mem_info API.
Implements rmm::mr::device_memory_resource.
|
inlineoverridevirtualnoexcept |
Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation.
Implements rmm::mr::device_memory_resource.