Attention

The vector search and clustering algorithms in RAFT are being migrated to cuVS, a new library dedicated to vector search. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by the RAPIDS 24.08 (August) release.

Multi-node Multi-GPU

RAFT contains C++ infrastructure for abstracting the communications layer when writing applications that scale across multiple nodes and multiple GPUs. This infrastructure assumes a one-process-per-GPU (OPG) architecture: multiple parallel units (processes, ranks, or workers) may execute code concurrently, but each parallel unit communicates with exactly one GPU and is the only process communicating with that GPU.

The comms layer in RAFT is intended to provide a facade API for barrier-synchronous collective communications, so that users can write an algorithm against a single abstraction layer and deploy it on many different types of systems. So far, RAFT communications code has been deployed in MPI, Dask, and Spark clusters.

Common Types

#include <raft/core/comms.hpp>

namespace raft::comms

enum class datatype_t

Values:

enumerator CHAR

enumerator UINT8

enumerator INT32

enumerator UINT32

enumerator INT64

enumerator UINT64

enumerator FLOAT32

enumerator FLOAT64

enum class op_t

Values:

enumerator SUM

enumerator PROD

enumerator MIN

enumerator MAX
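The enumerators above name the element-wise operators that reduction collectives such as allreduce accept. As a rough illustration of what each operator means, the following standalone sketch combines, on a single host, the buffers that several ranks would contribute. The local `op_t` mirror and the helpers `combine` and `reduce_across_ranks` are hypothetical and not part of RAFT; a real collective performs this reduction across processes and GPUs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class op_t { SUM, PROD, MIN, MAX };  // local mirror of the documented enum

// Hypothetical helper: apply one reduction operator to a pair of values.
template <typename value_t>
value_t combine(op_t op, value_t a, value_t b)
{
  switch (op) {
    case op_t::SUM: return a + b;
    case op_t::PROD: return a * b;
    case op_t::MIN: return std::min(a, b);
    case op_t::MAX: return std::max(a, b);
  }
  return a;  // unreachable for valid op_t values
}

// Hypothetical helper: reduce one buffer per rank, element by element.
// Every rank must contribute a buffer of the same length.
template <typename value_t>
std::vector<value_t> reduce_across_ranks(op_t op,
                                         const std::vector<std::vector<value_t>>& per_rank)
{
  assert(!per_rank.empty());
  std::vector<value_t> result = per_rank.front();
  for (std::size_t r = 1; r < per_rank.size(); ++r) {
    assert(per_rank[r].size() == result.size());
    for (std::size_t i = 0; i < result.size(); ++i) {
      result[i] = combine(op, result[i], per_rank[r][i]);
    }
  }
  return result;
}
```

With two ranks contributing `{1, 5}` and `{3, 2}`, `op_t::SUM` yields `{4, 7}` and `op_t::MAX` yields `{3, 5}`.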

enum class status_t

The resulting status of distributed stream synchronization

Values:

enumerator SUCCESS

enumerator ERROR

enumerator ABORT

typedef unsigned int request_t

template<typename value_t> constexpr datatype_t get_type()

template<> constexpr datatype_t get_type<char>()

template<> constexpr datatype_t get_type<uint8_t>()

template<> constexpr datatype_t get_type<int>()

template<> constexpr datatype_t get_type<uint32_t>()

template<> constexpr datatype_t get_type<int64_t>()

template<> constexpr datatype_t get_type<uint64_t>()

template<> constexpr datatype_t get_type<float>()

template<> constexpr datatype_t get_type<double>()
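The specializations above map a C++ element type to its datatype_t tag at compile time. The following standalone sketch mirrors that pattern in a local `sketch` namespace so that it compiles without RAFT. The namespace and the helpers `element_size` and `buffer_bytes` are hypothetical illustrations, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {  // local mirror for illustration; the documented names live in raft::comms

enum class datatype_t { CHAR, UINT8, INT32, UINT32, INT64, UINT64, FLOAT32, FLOAT64 };

// Primary template is declared only, so a type without a specialization cannot be used.
template <typename value_t>
constexpr datatype_t get_type();

template <> constexpr datatype_t get_type<char>() { return datatype_t::CHAR; }
template <> constexpr datatype_t get_type<std::uint8_t>() { return datatype_t::UINT8; }
template <> constexpr datatype_t get_type<int>() { return datatype_t::INT32; }
template <> constexpr datatype_t get_type<std::uint32_t>() { return datatype_t::UINT32; }
template <> constexpr datatype_t get_type<std::int64_t>() { return datatype_t::INT64; }
template <> constexpr datatype_t get_type<std::uint64_t>() { return datatype_t::UINT64; }
template <> constexpr datatype_t get_type<float>() { return datatype_t::FLOAT32; }
template <> constexpr datatype_t get_type<double>() { return datatype_t::FLOAT64; }

// Hypothetical helper (not in RAFT): size in bytes of one element with the given tag.
constexpr std::size_t element_size(datatype_t t)
{
  switch (t) {
    case datatype_t::CHAR:
    case datatype_t::UINT8: return 1;
    case datatype_t::INT32:
    case datatype_t::UINT32:
    case datatype_t::FLOAT32: return 4;
    case datatype_t::INT64:
    case datatype_t::UINT64:
    case datatype_t::FLOAT64: return 8;
  }
  return 0;  // unreachable for valid datatype_t values
}

// Hypothetical helper: bytes occupied by `count` elements of value_t.
template <typename value_t>
constexpr std::size_t buffer_bytes(std::size_t count)
{
  return count * element_size(get_type<value_t>());
}

// The mapping is resolved at compile time.
static_assert(get_type<float>() == datatype_t::FLOAT32, "float maps to FLOAT32");

}  // namespace sketch
```

Because the tag is chosen at compile time, a templated wrapper can forward a typed buffer to a type-erased backend without any runtime type inspection.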

Comms Interface

class comms_t

#include <raft/core/comms.hpp>

Public Functions

inline virtual ~comms_t()

Virtual destructor to enable polymorphism

inline int get_size() const

Returns the size of the communicator clique

inline int get_rank() const

Returns the local rank

inline std::unique_ptr<comms_iface> comm_split(int color, int key) const

Splits the current communicator clique into sub-cliques matching the given color and key

Parameters:

color – ranks with the same color are placed in the same communicator
key – controls rank assignment
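To make the color and key parameters concrete, the following standalone sketch computes which ranks would end up in each sub-clique. The helper `plan_split` is hypothetical and not part of RAFT. The grouping by color follows the description above; ordering members by ascending key (ties broken by original rank) is an assumption borrowed from MPI_Comm_split, since the text here only states that the key controls rank assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, not part of RAFT: colors[r] and keys[r] are the arguments
// that rank r would pass to comm_split(). The result maps each color to the
// original ranks in that sub-clique, listed in their assumed new-rank order.
std::map<int, std::vector<int>> plan_split(const std::vector<int>& colors,
                                           const std::vector<int>& keys)
{
  assert(colors.size() == keys.size());

  // Ranks that share a color land in the same sub-clique.
  std::map<int, std::vector<int>> cliques;
  for (std::size_t rank = 0; rank < colors.size(); ++rank) {
    cliques[colors[rank]].push_back(static_cast<int>(rank));
  }

  // Assumption: order members by key; stable_sort keeps the original rank
  // order for equal keys.
  for (auto& entry : cliques) {
    std::stable_sort(entry.second.begin(), entry.second.end(),
                     [&keys](int a, int b) { return keys[a] < keys[b]; });
  }
  return cliques;
}
```

For four ranks with colors `{0, 1, 0, 1}` and keys `{1, 0, 0, 1}`, this plan puts original ranks 2 and 0 (in that order) in the color-0 sub-clique and ranks 1 and 3 in the color-1 sub-clique.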

inline void barrier() const

Performs a collective barrier synchronization

inline status_t sync_stream(cudaStream_t stream) const

Some collective communications implementations (e.g. NCCL) might use asynchronous collectives that are explicitly synchronized. Always synchronize with this method rather than cudaStreamSynchronize(): it allows failures to propagate, which prevents potential deadlocks.

Parameters:

stream – the CUDA stream to sync collective operations on
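Because sync_stream() reports failures through its return value, callers need to inspect the returned status_t rather than discard it. The following standalone sketch shows one way to do that, using a local mirror of status_t so it compiles without RAFT or CUDA. The helper `throw_on_failure` and its error messages are hypothetical, not part of the library.

```cpp
#include <cassert>
#include <stdexcept>

enum class status_t { SUCCESS, ERROR, ABORT };  // local mirror of the documented enum

// Hypothetical helper: convert a failed synchronization status into an exception
// so that a failure cannot be silently ignored by the caller.
inline void throw_on_failure(status_t status)
{
  switch (status) {
    case status_t::SUCCESS: return;
    case status_t::ERROR: throw std::runtime_error("stream synchronization reported ERROR");
    case status_t::ABORT: throw std::runtime_error("stream synchronization reported ABORT");
  }
  throw std::logic_error("unknown status_t value");
}
```

With a real communicator, the call site would presumably look like `throw_on_failure(comms.sync_stream(stream));`, with the local enum replaced by the library's status_t.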

template<typename value_t> inline void isend(const value_t *buf, size_t size, int dest, int tag, request_t *request) const

Performs an asynchronous point-to-point send

Template Parameters:

value_t – the type of data to send

Parameters:

buf – pointer to array of data to send
size – number of elements in buf
dest – destination rank
tag – a tag to use for the receiver to filter
request – pointer to hold returned request_t object. This will be used in waitall() to synchronize until the message is delivered (or fails).

template<typename value_t> inline void irecv(value_t *buf, size_t size, int source, int tag, request_t *request) const

Performs an asynchronous point-to-point receive

Template Parameters:

value_t – the type of data to be received

Parameters:

buf – pointer to (initialized) array that will hold received data
size – number of elements in buf
source – source rank
tag – a tag to use for message filtering
request – pointer to hold returned request_t object. This will be used in waitall() to synchronize until the message is delivered (or fails).

inline void waitall(int count, request_t array_of_requests[]) const

Synchronize on an array of request_t objects returned from isend/irecv

Parameters:

count – number of requests to synchronize on
array_of_requests – an array of request_t objects returned from isend/irecv
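The three point-to-point calls are meant to be used together: post the receives and sends, collect their request_t handles, then block in waitall(). The following sketch shows that pattern as a ring exchange written against any object that exposes the signatures documented above. So that it runs without RAFT, a GPU, or multiple processes, it is exercised with `loopback_comms`, a hypothetical single-rank stand-in that is not part of the library. The function `exchange_with_neighbors` is likewise an illustration, and it assumes every rank sends a message of the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

using request_t = unsigned int;  // mirrors the typedef documented above

// Hypothetical single-rank stand-in, NOT part of RAFT. It records pending
// operations and completes them in waitall() by matching tags.
class loopback_comms {
 public:
  int get_size() const { return 1; }
  int get_rank() const { return 0; }

  template <typename value_t>
  void isend(const value_t* buf, std::size_t size, int dest, int tag, request_t* request) const
  {
    assert(dest == 0);
    sends_[tag] = pending{buf, nullptr, size * sizeof(value_t)};
    *request    = next_request_++;
  }

  template <typename value_t>
  void irecv(value_t* buf, std::size_t size, int source, int tag, request_t* request) const
  {
    assert(source == 0);
    recvs_[tag] = pending{nullptr, buf, size * sizeof(value_t)};
    *request    = next_request_++;
  }

  void waitall(int count, request_t array_of_requests[]) const
  {
    (void)count;
    (void)array_of_requests;
    // Deliver every posted receive from the send that carries the same tag.
    for (auto& entry : recvs_) {
      auto match = sends_.find(entry.first);
      assert(match != sends_.end() && match->second.bytes == entry.second.bytes);
      if (entry.second.bytes > 0) {
        std::memcpy(entry.second.dst, match->second.src, entry.second.bytes);
      }
    }
    sends_.clear();
    recvs_.clear();
  }

 private:
  struct pending {
    const void* src;
    void* dst;
    std::size_t bytes;
  };
  mutable std::map<int, pending> sends_;
  mutable std::map<int, pending> recvs_;
  mutable request_t next_request_ = 0;
};

// Send `to_send` to the next rank in the ring and receive from the previous one.
template <typename comms_like, typename value_t>
void exchange_with_neighbors(const comms_like& comms,
                             const std::vector<value_t>& to_send,
                             std::vector<value_t>& received,
                             int tag = 0)
{
  const int rank = comms.get_rank();
  const int size = comms.get_size();
  const int next = (rank + 1) % size;
  const int prev = (rank + size - 1) % size;

  received.resize(to_send.size());  // the receive buffer must exist before irecv
  request_t requests[2];
  comms.irecv(received.data(), received.size(), prev, tag, &requests[0]);
  comms.isend(to_send.data(), to_send.size(), next, tag, &requests[1]);
  comms.waitall(2, requests);  // blocks until both operations complete (or fail)
}
```

Neither buffer should be read or reused until waitall() has returned, because the operations posted by isend() and irecv() may still be in flight before that point.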

template<typename value_t> inline void allreduce(const value_t *sendbuff, value_t *recvbuff, size_t count, op_t op, cudaStream_t stream) const

Perform an allreduce collective

Template Parameters: