A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

#include <arena_memory_resource.hpp>


Public Member Functions
	arena_memory_resource (Upstream *upstream_mr, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)
	Construct an `arena_memory_resource`.

	arena_memory_resource (arena_memory_resource const &)=delete

arena_memory_resource &	operator= (arena_memory_resource const &)=delete

	arena_memory_resource (arena_memory_resource &&) noexcept=delete

arena_memory_resource &	operator= (arena_memory_resource &&) noexcept=delete

bool	supports_streams () const noexcept override
	Queries whether the resource supports the use of non-null CUDA streams for allocation/deallocation.

bool	supports_get_mem_info () const noexcept override
	Query whether the resource supports the get_mem_info API.

Public Member Functions inherited from rmm::mr::device_memory_resource
	device_memory_resource (device_memory_resource const &)=default
	Default copy constructor.

	device_memory_resource (device_memory_resource &&) noexcept=default
	Default move constructor.

device_memory_resource &	operator= (device_memory_resource const &)=default
	Default copy assignment operator.

device_memory_resource &	operator= (device_memory_resource &&) noexcept=default
	Default move assignment operator.

void *	allocate (std::size_t bytes, cuda_stream_view stream=cuda_stream_view{})
	Allocates memory of size at least `bytes`.

void	deallocate (void *ptr, std::size_t bytes, cuda_stream_view stream=cuda_stream_view{})
	Deallocate memory pointed to by `ptr`.

bool	is_equal (device_memory_resource const &other) const noexcept
	Compare this resource to another.

void *	allocate (std::size_t bytes, std::size_t alignment)
	Allocates memory of size at least `bytes`.

void	deallocate (void *ptr, std::size_t bytes, std::size_t alignment)
	Deallocate memory pointed to by `ptr`.

void *	allocate_async (std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
	Allocates memory of size at least `bytes`.

void *	allocate_async (std::size_t bytes, cuda_stream_view stream)
	Allocates memory of size at least `bytes`.

void	deallocate_async (void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
	Deallocate memory pointed to by `ptr`.

void	deallocate_async (void *ptr, std::size_t bytes, cuda_stream_view stream)
	Deallocate memory pointed to by `ptr`.

bool	operator== (device_memory_resource const &other) const noexcept
	Comparison operator with another device_memory_resource.

bool	operator!= (device_memory_resource const &other) const noexcept
	Comparison operator with another device_memory_resource.

std::pair< std::size_t, std::size_t >	get_mem_info (cuda_stream_view stream) const
	Queries the amount of free and total memory for the resource.

Detailed Description

template<typename Upstream>
class rmm::mr::arena_memory_resource< Upstream >

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

Allocation (`do_allocate()`) and deallocation (`do_deallocate()`) are thread-safe. This class is also compatible with the CUDA per-thread default stream.

GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each per-thread and per-stream arena obtains memory from the global arena in chunks called superblocks.
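The arena layout described above can be sketched in standard C++ as an arena lookup: calls on the default stream map to an arena owned by the calling thread, while calls on any other stream map to one arena shared by every thread using that stream. This is a minimal illustration, not RMM's implementation: `stream_id`, `arena`, and `arena_registry` are hypothetical names, an integer stands in for a CUDA stream handle (0 plays the default stream), and the arenas are empty placeholders that a real allocator would fill with superblocks from the global arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a CUDA stream handle; 0 plays the role of the default stream.
using stream_id = std::uintptr_t;
constexpr stream_id default_stream = 0;

// Placeholder arena; a real arena would hold superblocks taken from the global arena.
struct arena {
  std::size_t superblocks_held = 0;
};

// Hypothetical registry that picks the arena serving a given (thread, stream) pair.
class arena_registry {
 public:
  arena& get_arena(stream_id stream) {
    std::lock_guard<std::mutex> lock(mtx_);
    if (stream == default_stream) {
      // Default stream: one arena per calling thread.
      return lookup_or_create(thread_arenas_, std::this_thread::get_id());
    }
    // Non-default stream: one arena per stream, shared across threads.
    return lookup_or_create(stream_arenas_, stream);
  }

 private:
  template <typename Key>
  static arena& lookup_or_create(std::map<Key, std::unique_ptr<arena>>& arenas, Key const& key) {
    auto& slot = arenas[key];
    if (!slot) { slot = std::make_unique<arena>(); }
    return *slot;  // heap-allocated, so the reference stays valid as the map grows
  }

  std::mutex mtx_;  // guards both maps; a single lock keeps the sketch simple
  std::map<std::thread::id, std::unique_ptr<arena>> thread_arenas_;
  std::map<stream_id, std::unique_ptr<arena>> stream_arenas_;
};
```

Holding one mutex for both maps is the simplest thread-safe choice and trades some lookup contention for brevity; arenas belonging to threads that have exited are never reclaimed in this sketch.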

Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with any neighbouring free blocks whose addresses are contiguous with it. Superblocks that become entirely free are returned to the global arena.
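Address-ordered first fit with coalescing can be sketched with a `std::map` keyed by block offset, which keeps free blocks sorted by address. Allocation takes the lowest-addressed free block that is large enough and splits off the remainder; deallocation reinserts the block and merges it with an adjacent free block on either side whose range touches it. `first_fit_arena` is a hypothetical name; offsets within a single superblock stand in for device addresses, and alignment and the return of free superblocks are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical arena managing byte offsets inside one superblock.
class first_fit_arena {
 public:
  explicit first_fit_arena(std::size_t capacity) { free_blocks_[0] = capacity; }

  // Address-ordered first fit: scan free blocks from the lowest offset upward.
  std::optional<std::size_t> allocate(std::size_t bytes) {
    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); ++it) {
      if (it->second >= bytes) {
        std::size_t const offset = it->first;
        std::size_t const remainder = it->second - bytes;
        free_blocks_.erase(it);
        // Split: the unused tail stays on the free list.
        if (remainder > 0) { free_blocks_[offset + bytes] = remainder; }
        return offset;
      }
    }
    return std::nullopt;  // no free block is large enough
  }

  // Return a block and coalesce it with contiguous free neighbours.
  void deallocate(std::size_t offset, std::size_t bytes) {
    auto it = free_blocks_.emplace(offset, bytes).first;
    auto next = std::next(it);
    if (next != free_blocks_.end() && it->first + it->second == next->first) {
      it->second += next->second;  // merge with the following block
      free_blocks_.erase(next);
    }
    if (it != free_blocks_.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second == it->first) {
        prev->second += it->second;  // merge into the preceding block
        free_blocks_.erase(it);
      }
    }
  }

  std::size_t free_block_count() const { return free_blocks_.size(); }

  std::size_t largest_free_block() const {
    std::size_t largest = 0;
    for (auto const& block : free_blocks_) {
      if (block.second > largest) { largest = block.second; }
    }
    return largest;
  }

 private:
  std::map<std::size_t, std::size_t> free_blocks_;  // offset -> size, sorted by address
};
```

In a 1,024-byte arena holding three 256-byte blocks, freeing the first and third leaves two free ranges (the third merges with the untouched tail); freeing the middle block afterwards merges everything back into one 1,024-byte range.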

In real-world applications, allocation sizes tend to follow a power-law distribution in which large allocations are rare but small ones are common. Handling small allocations in the per-thread arena therefore achieves adequate performance without introducing excessive memory fragmentation under high concurrency.
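The benefit of serving small requests locally can be illustrated by counting how often the shared global arena's lock is taken. In this sketch, a thread's local arena carves small requests out of its current superblock and returns to the global arena only when that superblock is exhausted, while large requests go straight to the global arena. The size constants, the class names, and the direct routing of large requests to the global arena are assumptions made for illustration; the local arena also uses a bump pointer instead of first fit, and freeing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical sizes chosen for the example, not RMM's values.
constexpr std::size_t superblock_size = std::size_t{1} << 20;  // 1 MiB
constexpr std::size_t small_limit = superblock_size / 2;       // larger requests bypass the local arena

// Shared global arena: hands out address ranges under a lock and counts lock acquisitions.
class global_arena {
 public:
  std::size_t take(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mtx_);
    ++lock_acquisitions_;
    std::size_t const offset = next_;
    next_ += bytes;  // bump allocation over an unbounded range, for brevity
    return offset;
  }

  std::size_t lock_acquisitions() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return lock_acquisitions_;
  }

 private:
  mutable std::mutex mtx_;
  std::size_t next_ = 0;
  std::size_t lock_acquisitions_ = 0;
};

// Hypothetical per-thread arena: serves small requests from its current superblock without locking.
class local_arena {
 public:
  explicit local_arena(global_arena& global) : global_(global) {}

  std::size_t allocate(std::size_t bytes) {
    if (bytes > small_limit) { return global_.take(bytes); }  // large: straight to the global arena
    if (!has_superblock_ || used_ + bytes > superblock_size) {
      base_ = global_.take(superblock_size);  // refill: one locked call per superblock
      used_ = 0;
      has_superblock_ = true;
    }
    std::size_t const offset = base_ + used_;
    used_ += bytes;
    return offset;
  }

 private:
  global_arena& global_;
  std::size_t base_ = 0;
  std::size_t used_ = 0;
  bool has_superblock_ = false;
};
```

With these constants, 1,024 requests of 1 KiB each cost a single lock acquisition on the global arena; the 1,025th triggers a second one, and a 1 MiB request takes the lock directly.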

This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.

See also:
Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.
Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM SIGPLAN Notices, 35(11), 117-128.
Evans, J. (2006, April). A scalable concurrent malloc(3) implementation for FreeBSD. In Proc. of the BSDCan Conference, Ottawa, Canada.
https://sourceware.org/glibc/wiki/MallocInternals
http://hoard.org/
http://jemalloc.net/
https://github.com/google/tcmalloc

Template Parameters

Upstream	Memory resource used to allocate memory for the global arena. Must implement the rmm::mr::device_memory_resource interface.

Constructor & Destructor Documentation

◆ arena_memory_resource()

template<typename Upstream >

rmm::mr::arena_memory_resource< Upstream >::arena_memory_resource ( Upstream * upstream_mr, std::optional< std::size_t > arena_size = std::nullopt, bool dump_log_on_failure = false )

inline explicit

Construct an arena_memory_resource.

Exceptions

rmm::logic_error if upstream_mr == nullptr.

Parameters

upstream_mr	The memory resource from which to allocate blocks for the global arena.
arena_size	Size in bytes of the global arena. Defaults to half of the available memory on the current device.
dump_log_on_failure	If true, dump the memory log when the resource runs out of memory.
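The documented constructor contract (reject a null `upstream_mr`, and fall back to half of the available device memory when `arena_size` is `std::nullopt`) can be expressed as a small standalone function. This is a sketch only: `resolve_arena_size` and `fake_upstream` are hypothetical, `std::logic_error` stands in for `rmm::logic_error`, and `available_bytes` is passed in explicitly instead of being queried from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>

// Hypothetical placeholder for an upstream memory resource type.
struct fake_upstream {};

// Hypothetical helper mirroring the documented constructor arguments.
template <typename Upstream>
std::size_t resolve_arena_size(Upstream* upstream_mr,
                               std::optional<std::size_t> arena_size,
                               std::size_t available_bytes) {
  // The real constructor throws rmm::logic_error here; std::logic_error stands in for it.
  if (upstream_mr == nullptr) { throw std::logic_error("upstream_mr must not be nullptr"); }
  // No explicit size: default to half of the memory reported as available.
  return arena_size.value_or(available_bytes / 2);
}
```

Taking `arena_size` as `std::optional` lets "no size given" be distinguished from any concrete byte count, including zero.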

Member Function Documentation

◆ supports_get_mem_info()

template<typename Upstream >

bool rmm::mr::arena_memory_resource< Upstream >::supports_get_mem_info ( ) const

inline override virtual noexcept

Query whether the resource supports the get_mem_info API.

Returns: false (this resource does not support get_mem_info).

Implements rmm::mr::device_memory_resource.

◆ supports_streams()

template<typename Upstream >

bool rmm::mr::arena_memory_resource< Upstream >::supports_streams ( ) const

inline override virtual noexcept

Queries whether the resource supports the use of non-null CUDA streams for allocation/deallocation.

Returns: true (this resource supports non-null CUDA streams).

Implements rmm::mr::device_memory_resource.
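Generic code that accepts any device memory resource can consult these capability queries before relying on the corresponding features. The sketch below models that pattern without RMM or CUDA: `resource_like`, `arena_like`, `mem_info()`, and `free_bytes_if_known()` are hypothetical stand-ins, with `arena_like` returning the values documented here (`true` for streams, `false` for get_mem_info).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical interface modeled on the two capability queries documented above.
struct resource_like {
  virtual ~resource_like() = default;
  virtual bool supports_streams() const noexcept = 0;
  virtual bool supports_get_mem_info() const noexcept = 0;
  // Hypothetical (free, total) query; only meaningful when supports_get_mem_info() is true.
  virtual std::pair<std::size_t, std::size_t> mem_info() const = 0;
};

// Reports the capability values documented for arena_memory_resource.
struct arena_like final : resource_like {
  bool supports_streams() const noexcept override { return true; }
  bool supports_get_mem_info() const noexcept override { return false; }
  std::pair<std::size_t, std::size_t> mem_info() const override { return {0, 0}; }  // placeholder
};

// Returns free bytes only when the resource says the query is supported.
std::optional<std::size_t> free_bytes_if_known(resource_like const& mr) {
  if (!mr.supports_get_mem_info()) { return std::nullopt; }
  return mr.mem_info().first;
}
```

Returning `std::nullopt` lets callers tell "unknown" apart from "zero bytes free", which a placeholder pair of zeros would otherwise hide.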

The documentation for this class was generated from the following file:

arena_memory_resource.hpp
