libcudf  24.02.00

Modules

 Concatenating
 
 Gathering
 
 Scattering
 
 Slicing
 
 Splitting
 
 Shifting
 

Files

file  copying.hpp
 Column APIs for gather, scatter, split, slice, etc.
 

Enumerations

enum class  cudf::out_of_bounds_policy : bool { cudf::NULLIFY , cudf::DONT_CHECK }
 Policy to account for possible out-of-bounds indices.
 
enum class  cudf::mask_allocation_policy : int32_t { cudf::NEVER , cudf::RETAIN , cudf::ALWAYS }
 Indicates when to allocate a mask, based on an existing mask.
 
enum class  cudf::sample_with_replacement : bool { cudf::FALSE , cudf::TRUE }
 Indicates whether a row can be sampled more than once.
 

Functions

std::unique_ptr< table > cudf::reverse (table_view const &source_table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reverses the rows within a table.
 
std::unique_ptr< column > cudf::reverse (column_view const &source_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reverses the elements of a column.
 
std::unique_ptr< column > cudf::empty_like (column_view const &input)
 Initializes and returns an empty column of the same type as the input.
 
std::unique_ptr< column > cudf::empty_like (scalar const &input)
 Initializes and returns an empty column of the same type as the input.
 
std::unique_ptr< column > cudf::allocate_like (column_view const &input, mask_allocation_policy mask_alloc=mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Creates an uninitialized new column of the same size and type as the input.
 
std::unique_ptr< column > cudf::allocate_like (column_view const &input, size_type size, mask_allocation_policy mask_alloc=mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Creates an uninitialized new column of the specified size and same type as the input.
 
std::unique_ptr< table > cudf::empty_like (table_view const &input_table)
 Creates a table of empty columns with the same types as the input_table.
 
void cudf::copy_range_in_place (column_view const &source, mutable_column_view &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Copies a range of elements in-place from one column to another.
 
std::unique_ptr< column > cudf::copy_range (column_view const &source, column_view const &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Copies a range of elements out-of-place from one column to another.
 
std::unique_ptr< column > cudf::copy_if_else (column_view const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
 
std::unique_ptr< column > cudf::copy_if_else (scalar const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
 
std::unique_ptr< column > cudf::copy_if_else (column_view const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
 
std::unique_ptr< column > cudf::copy_if_else (scalar const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
 
std::unique_ptr< scalar > cudf::get_element (column_view const &input, size_type index, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Get the element at specified index from a column.
 
std::unique_ptr< table > cudf::sample (table_view const &input, size_type const n, sample_with_replacement replacement=sample_with_replacement::FALSE, int64_t const seed=0, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Gather n samples from given input randomly.
 
bool cudf::has_nonempty_nulls (column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Checks if a column or its descendants have non-empty null rows.
 
bool cudf::may_have_nonempty_nulls (column_view const &input)
 Approximates if a column or its descendants may have non-empty null elements.
 
std::unique_ptr< column > cudf::purge_nonempty_nulls (column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Copy input into output while purging any non-empty null rows in the column or its descendants.
 

Detailed Description

Enumeration Type Documentation

◆ mask_allocation_policy

enum class cudf::mask_allocation_policy : int32_t

Indicates when to allocate a mask, based on an existing mask.

Enumerator
NEVER 

Do not allocate a null mask, regardless of input.

RETAIN 

Allocate a null mask if the input contains one.

ALWAYS 

Allocate a null mask, regardless of input.

Definition at line 214 of file copying.hpp.
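
The three enumerators above reduce to a small decision table. The following is a minimal host-side sketch of that table in plain C++; it does not call libcudf, and the names mask_policy_model and allocates_mask are hypothetical stand-ins introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for cudf::mask_allocation_policy.
enum class mask_policy_model { NEVER, RETAIN, ALWAYS };

// Returns true when a null mask would be allocated for the new column.
// `input_has_mask` plays the role of "the input contains a null mask".
bool allocates_mask(mask_policy_model policy, bool input_has_mask)
{
  switch (policy) {
    case mask_policy_model::NEVER: return false;
    case mask_policy_model::ALWAYS: return true;
    case mask_policy_model::RETAIN: return input_has_mask;
  }
  return false;  // not reached for valid enumerators
}
```

Only RETAIN depends on the input; the other two policies ignore it.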

◆ out_of_bounds_policy

enum class cudf::out_of_bounds_policy : bool

Policy to account for possible out-of-bounds indices.

NULLIFY means to nullify output values corresponding to out-of-bounds gather_map values. DONT_CHECK means do not check whether the indices are out-of-bounds, for better performance.

Enumerator
NULLIFY 

Output values corresponding to out-of-bounds indices are null.

DONT_CHECK 

No bounds checking is performed, better performance.

Definition at line 48 of file copying.hpp.
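
To make the NULLIFY behavior concrete, here is a minimal host-side sketch in plain C++ that does not use libcudf. The function gather_nullify_model is hypothetical, std::nullopt stands in for a null output element, and only non-negative indices are modeled (how libcudf treats negative indices is not covered here).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a gather under the NULLIFY policy:
// an index past the end of `source` yields a null (std::nullopt) output element.
std::vector<std::optional<int>> gather_nullify_model(std::vector<int> const& source,
                                                     std::vector<std::size_t> const& gather_map)
{
  std::vector<std::optional<int>> out;
  out.reserve(gather_map.size());
  for (std::size_t idx : gather_map) {
    if (idx < source.size()) {
      out.push_back(source[idx]);
    } else {
      out.push_back(std::nullopt);
    }
  }
  return out;
}
```

Under DONT_CHECK the bounds test in the loop is skipped, so the caller must guarantee that every index is in range.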

◆ sample_with_replacement

enum class cudf::sample_with_replacement : bool

Indicates whether a row can be sampled more than once.

Enumerator
FALSE 

A row can be sampled only once.

TRUE 

A row can be sampled more than once.

Definition at line 799 of file copying.hpp.

Function Documentation

◆ allocate_like() [1/2]

Creates an uninitialized new column of the same size and type as the input.

Supports only fixed-width types.

If mask_alloc causes a validity mask to be allocated, that mask is also uninitialized; the caller must set the validity bits and the null count.

Parameters
input  Immutable view of input column to emulate
mask_alloc  Optional policy for allocating the null mask. Defaults to RETAIN
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of type input.type() with sufficient uninitialized capacity to hold the same number of elements as input

◆ allocate_like() [2/2]

Creates an uninitialized new column of the specified size and same type as the input.

Supports only fixed-width types.

If mask_alloc causes a validity mask to be allocated, that mask is also uninitialized; the caller must set the validity bits and the null count.

Parameters
input  Immutable view of input column to emulate
size  The desired number of elements that the new column should have capacity for
mask_alloc  Optional policy for allocating the null mask. Defaults to RETAIN
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of type input.type() with sufficient uninitialized capacity to hold the specified number of elements

◆ copy_if_else() [1/4]

std::unique_ptr<column> cudf::copy_if_else ( column_view const &  lhs,
column_view const &  rhs,
column_view const &  boolean_mask,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.

Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs[i]

Exceptions
cudf::logic_error  if lhs and rhs are not of the same type
cudf::logic_error  if lhs and rhs are not of the same length
cudf::logic_error  if boolean_mask is not of type bool
cudf::logic_error  if boolean_mask is not of the same length as lhs and rhs
Parameters
lhs  left-hand column_view
rhs  right-hand column_view
boolean_mask  column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
new column with the selected elements
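
The selection rule above can be sketched on the host in plain C++. This is an illustrative model, not the libcudf implementation: copy_if_else_model is a hypothetical name, std::nullopt stands in for a null mask element, and nulls in lhs or rhs are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the column/column overload.
std::vector<int> copy_if_else_model(std::vector<int> const& lhs,
                                    std::vector<int> const& rhs,
                                    std::vector<std::optional<bool>> const& boolean_mask)
{
  if (lhs.size() != rhs.size() || boolean_mask.size() != lhs.size()) {
    throw std::logic_error("lhs, rhs and boolean_mask must have the same length");
  }
  std::vector<int> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // A null mask element counts as false, so rhs is selected.
    bool const take_lhs = boolean_mask[i].has_value() && *boolean_mask[i];
    out[i] = take_lhs ? lhs[i] : rhs[i];
  }
  return out;
}
```

With lhs = [1, 2, 3, 4], rhs = [10, 20, 30, 40] and mask = [true, false, null, true], the model selects [1, 20, 30, 4].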

◆ copy_if_else() [2/4]

std::unique_ptr<column> cudf::copy_if_else ( column_view const &  lhs,
scalar const &  rhs,
column_view const &  boolean_mask,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.

Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs

Exceptions
cudf::logic_error  if lhs and rhs are not of the same type
cudf::logic_error  if boolean_mask is not of type bool
cudf::logic_error  if boolean_mask is not of the same length as lhs
Parameters
lhs  left-hand column_view
rhs  right-hand scalar
boolean_mask  column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
new column with the selected elements

◆ copy_if_else() [3/4]

std::unique_ptr<column> cudf::copy_if_else ( scalar const &  lhs,
column_view const &  rhs,
column_view const &  boolean_mask,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.

Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs[i]

Exceptions
cudf::logic_error  if lhs and rhs are not of the same type
cudf::logic_error  if boolean_mask is not of type bool
cudf::logic_error  if boolean_mask is not of the same length as rhs
Parameters
lhs  left-hand scalar
rhs  right-hand column_view
boolean_mask  column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
new column with the selected elements

◆ copy_if_else() [4/4]

std::unique_ptr<column> cudf::copy_if_else ( scalar const &  lhs,
scalar const &  rhs,
column_view const &  boolean_mask,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.

Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs

Exceptions
cudf::logic_error  if boolean_mask is not of type bool
Parameters
lhs  left-hand scalar
rhs  right-hand scalar
boolean_mask  column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
new column with the selected elements
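
When both operands are scalars, the rule above implies one output element per mask element. The sketch below models that on the host in plain C++ under that assumption; copy_if_else_scalars_model is a hypothetical name and no libcudf call is involved.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host-side model of the scalar/scalar overload.
// std::nullopt stands in for a null mask element, which counts as false.
std::vector<int> copy_if_else_scalars_model(int lhs,
                                            int rhs,
                                            std::vector<std::optional<bool>> const& boolean_mask)
{
  std::vector<int> out;
  out.reserve(boolean_mask.size());
  for (auto const& m : boolean_mask) {
    out.push_back(m.value_or(false) ? lhs : rhs);
  }
  return out;
}
```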

◆ copy_range()

std::unique_ptr<column> cudf::copy_range ( column_view const &  source,
column_view const &  target,
size_type  source_begin,
size_type  source_end,
size_type  target_begin,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Copies a range of elements out-of-place from one column to another.

Creates a new column as if an in-place copy was performed into target. A copy of target is created first, and then the elements at indices [target_begin, target_begin + N) are overwritten with the elements at indices [source_begin, source_end) of source, where N = source_end - source_begin. Elements outside that range are copied from target into the returned column.

If source and target refer to the same elements and the ranges overlap, the behavior is undefined.

Exceptions
cudf::logic_error  for invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()).
cudf::logic_error  if target and source have different types.
Parameters
source  The column to copy from inside the range
target  The column to copy from outside the range
source_begin  The starting index of the source range (inclusive)
source_end  The index of the last element in the source range (exclusive)
target_begin  The starting index of the target range (inclusive)
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
The result target column
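
The range checks and the copy-then-overwrite behavior described above can be modeled on the host. The following plain C++ sketch is illustrative only: copy_range_model is a hypothetical name, std::vector<int> stands in for a fixed-width column, and nulls are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the out-of-place range copy.
std::vector<int> copy_range_model(std::vector<int> const& source,
                                  std::vector<int> const& target,
                                  int source_begin,
                                  int source_end,
                                  int target_begin)
{
  auto const source_size = static_cast<int>(source.size());
  auto const target_size = static_cast<int>(target.size());
  int const n            = source_end - source_begin;
  // Mirrors the invalid-range conditions listed under Exceptions.
  bool const invalid = source_begin > source_end || source_begin < 0 ||
                       source_begin >= source_size || source_end > source_size ||
                       target_begin < 0 || target_begin >= target_size ||
                       target_begin + n > target_size;
  if (invalid) { throw std::logic_error("invalid range"); }

  std::vector<int> out = target;  // elements outside the range come from target
  std::copy(source.begin() + source_begin, source.begin() + source_end, out.begin() + target_begin);
  return out;
}
```

For source = [1, 2, 3, 4], target = [9, 9, 9, 9, 9], source range [1, 3) and target_begin = 2, the model returns [9, 9, 2, 3, 9] and leaves target untouched.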

◆ copy_range_in_place()

void cudf::copy_range_in_place ( column_view const &  source,
mutable_column_view &  target,
size_type  source_begin,
size_type  source_end,
size_type  target_begin,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Copies a range of elements in-place from one column to another.

Overwrites the range of elements in target indicated by the indices [target_begin, target_begin + N) with the elements from source indicated by the indices [source_begin, source_end), where N = source_end - source_begin. Use the out-of-place copy function returning std::unique_ptr<column> for use cases requiring memory reallocation, for example strings columns and other variable-width types.

If source and target refer to the same elements and the ranges overlap, the behavior is undefined.

Exceptions
cudf::logic_error  if memory reallocation is required (e.g. for variable width types).
cudf::logic_error  for invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()).
cudf::logic_error  if target and source have different types.
cudf::logic_error  if source has null values and target is not nullable.
Parameters
source  The column to copy from
target  The preallocated column to copy into
source_begin  The starting index of the source range (inclusive)
source_end  The index of the last element in the source range (exclusive)
target_begin  The starting index of the target range (inclusive)
stream  CUDA stream used for device memory operations and kernel launches
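
The nullability rule for the in-place variant can be sketched on the host as follows. This is a plain C++ model under stated assumptions, not libcudf code: column_model and copy_range_in_place_model are hypothetical names, an absent validity vector stands for a non-nullable column, "source has null values" is assumed to mean anywhere in source, and range validation is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a fixed-width column.
struct column_model {
  std::vector<int> data;
  std::optional<std::vector<bool>> validity;  // std::nullopt: not nullable
};

// True when the column has a validity vector containing at least one null.
bool has_nulls(column_model const& c)
{
  if (!c.validity) { return false; }
  return std::find(c.validity->begin(), c.validity->end(), false) != c.validity->end();
}

// Precondition (not checked here): the ranges are valid for both columns.
void copy_range_in_place_model(column_model const& source,
                               column_model& target,
                               int source_begin,
                               int source_end,
                               int target_begin)
{
  if (has_nulls(source) && !target.validity) {
    throw std::logic_error("source has nulls but target is not nullable");
  }
  for (int i = source_begin; i < source_end; ++i) {
    int const t    = target_begin + (i - source_begin);
    target.data[t] = source.data[i];
    if (target.validity) {
      // A source without a validity vector contributes only valid elements.
      (*target.validity)[t] = source.validity ? (*source.validity)[i] : true;
    }
  }
}
```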

◆ empty_like() [1/3]

std::unique_ptr<column> cudf::empty_like ( column_view const &  input)

Initializes and returns an empty column of the same type as the input.

Parameters
[in]  input  Immutable view of input column to emulate
Returns
An empty column of same type as input

◆ empty_like() [2/3]

std::unique_ptr<column> cudf::empty_like ( scalar const &  input)

Initializes and returns an empty column of the same type as the input.

Parameters
[in]  input  Scalar to emulate
Returns
An empty column of same type as input

◆ empty_like() [3/3]

std::unique_ptr<table> cudf::empty_like ( table_view const &  input_table)

Creates a table of empty columns with the same types as the input_table.

Creates the cudf::column objects, but does not allocate any underlying device memory for the column's data or bitmask.

Parameters
[in]  input_table  Immutable view of input table to emulate
Returns
A table of empty columns with the same types as the columns in input_table

◆ get_element()

std::unique_ptr<scalar> cudf::get_element ( column_view const &  input,
size_type  index,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Get the element at specified index from a column.

Warning
This function is expensive because it invokes a kernel launch. Avoid it in performance-sensitive code and inside loops.
Exceptions
cudf::logic_error  if index is not within the range [0, input.size())
Parameters
input  Column view to get the element from
index  Index into input to get the element at
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned scalar's device memory
Returns
Scalar containing the single value

◆ has_nonempty_nulls()

bool cudf::has_nonempty_nulls ( column_view const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Checks if a column or its descendants have non-empty null rows.

Note
This function is exact. If it returns true, there exists one or more non-empty null elements.

A LIST or STRING column might have non-empty rows that are marked as null. A STRUCT or LIST column might have child columns that have non-empty null rows. Other types of columns are deemed incapable of having non-empty null rows. For example, fixed-width columns have no concept of an "empty" row.

Parameters
input  The column which is (and whose descendants are) to be checked for non-empty null rows.
stream  CUDA stream used for device memory operations and kernel launches
Returns
true  If either the column or its descendants have non-empty null rows
false  If neither the column nor its descendants have non-empty null rows
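
For a single LIST column, a "non-empty null row" is a row whose validity bit is unset while its offsets still span one or more child elements. The sketch below models that check on the host in plain C++; list_rows_model and has_nonempty_nulls_model are hypothetical names, and descendants (child columns) are not traversed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the row-level layout of a LIST column.
// offsets has validity.size() + 1 entries.
struct list_rows_model {
  std::vector<bool> validity;
  std::vector<int> offsets;
};

// True if any null row still spans a non-empty range of child elements.
bool has_nonempty_nulls_model(list_rows_model const& col)
{
  for (std::size_t i = 0; i < col.validity.size(); ++i) {
    bool const is_null   = !col.validity[i];
    bool const non_empty = col.offsets[i + 1] > col.offsets[i];
    if (is_null && non_empty) { return true; }
  }
  return false;
}
```

With validity 101, offsets [0, 2, 4, 6] give true (row 1 is null but spans two elements), while offsets [0, 2, 2, 4] give false.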

◆ may_have_nonempty_nulls()

bool cudf::may_have_nonempty_nulls ( column_view const &  input)

Approximates if a column or its descendants may have non-empty null elements.

Note
This function is approximate.
  • true: Non-empty null elements could exist
  • false: Non-empty null elements definitely do not exist

False positives are possible, but false negatives are not.

Compared to the exact has_nonempty_nulls() function, this function is typically more efficient.

Complexity:

  • Best case: O(count_descendants(input))
  • Worst case: O(count_descendants(input)) * m, where m is the number of rows in the largest descendant
Parameters
input  The column which is (and whose descendants are) to be checked for non-empty null rows
Returns
true  If either the column or its descendants have null rows
false  If neither the column nor its descendants have null rows

◆ purge_nonempty_nulls()

std::unique_ptr<column> cudf::purge_nonempty_nulls ( column_view const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Copy input into output while purging any non-empty null rows in the column or its descendants.

If the input column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as input.

The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.

Examples:

auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} }.release();
cudf::detail::set_null_mask(lists->null_mask(), 1, 2, false);

lists[1] is now null, but the lists child column still stores `{2,3}`.
The lists column contents will be:
  Validity: 101
  Offsets:  [0, 2, 4, 6]
  Child:    [0, 1, 2, 3, 4, 5]

After purging the contents of the list's null rows, the column's contents will be:
  Validity: 101
  Offsets:  [0, 2, 2, 4]
  Child:    [0, 1, 4, 5]

auto const strings = strings_column_wrapper{ "AB", "CD", "EF" }.release();
cudf::detail::set_null_mask(strings->null_mask(), 1, 2, false);

strings[1] is now null, but the strings column still stores `"CD"`.
The strings column contents will be:
  Validity: 101
  Offsets:  [0, 2, 4, 6]
  Child:    [A, B, C, D, E, F]

After purging the contents of the strings column's null rows, the column's contents will be:
  Validity: 101
  Offsets:  [0, 2, 2, 4]
  Child:    [A, B, E, F]

auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} };
auto const structs = structs_column_wrapper{ {lists}, null_at(1) };

structs[1].child is now null, but the lists column still stores `{2,3}`.
The lists column contents will be:
  Validity: 101
  Offsets:  [0, 2, 4, 6]
  Child:    [0, 1, 2, 3, 4, 5]

After purging the contents of the list's null rows, the column's contents will be:
  Validity: 101
  Offsets:  [0, 2, 2, 4]
  Child:    [0, 1, 4, 5]
Parameters
input  The column whose null rows are to be checked and purged
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A new column with equivalent contents to input, but with null rows purged
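
The first lists example above can be reproduced with a small host-side model. This plain C++ sketch is illustrative only and is not how libcudf implements the purge: lists_column_model and purge_nonempty_nulls_model are hypothetical names, and only a single level of LIST nesting over an int child is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a LIST<int> column.
// offsets has validity.size() + 1 entries.
struct lists_column_model {
  std::vector<bool> validity;
  std::vector<int> offsets;
  std::vector<int> child;
};

// Rebuilds offsets and child so that every null row becomes empty.
lists_column_model purge_nonempty_nulls_model(lists_column_model const& in)
{
  lists_column_model out;
  out.validity = in.validity;  // validity is unchanged by the purge
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < in.validity.size(); ++i) {
    if (in.validity[i]) {
      // Keep the child elements of valid rows only.
      out.child.insert(out.child.end(),
                       in.child.begin() + in.offsets[i],
                       in.child.begin() + in.offsets[i + 1]);
    }
    out.offsets.push_back(static_cast<int>(out.child.size()));
  }
  return out;
}
```

Feeding in Validity 101, Offsets [0, 2, 4, 6], Child [0, 1, 2, 3, 4, 5] yields Offsets [0, 2, 2, 4] and Child [0, 1, 4, 5], matching the example.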

◆ reverse() [1/2]

std::unique_ptr<column> cudf::reverse ( column_view const &  source_column,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Reverses the elements of a column.

Creates a new column that is the reverse of source_column. Example:

source = [4,5,6]
return = [6,5,4]
Parameters
source_column  Column that will be reversed
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
Reversed column

◆ reverse() [2/2]

std::unique_ptr<table> cudf::reverse ( table_view const &  source_table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Reverses the rows within a table.

Creates a new table that is the reverse of source_table. Example:

source = [[4,5,6], [7,8,9], [10,11,12]]
return = [[6,5,4], [9,8,7], [12,11,10]]
Parameters
source_table  Table that will be reversed
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table's device memory
Returns
Reversed table
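
Reading the example above as a list of columns (an assumption; the notation does not say so explicitly), reversing the rows of a table amounts to reversing every column. A minimal host-side sketch in plain C++, with the hypothetical names table_model and reverse_model and no libcudf calls:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a table: one inner vector per column.
using table_model = std::vector<std::vector<int>>;

// Reverses the row order by reversing each column independently.
table_model reverse_model(table_model const& source)
{
  table_model out;
  out.reserve(source.size());
  for (auto const& col : source) {
    out.emplace_back(col.rbegin(), col.rend());
  }
  return out;
}
```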

◆ sample()

std::unique_ptr<table> cudf::sample ( table_view const &  input,
size_type const  n,
sample_with_replacement  replacement = sample_with_replacement::FALSE,
int64_t const  seed = 0,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Gather n samples from given input randomly.

Example:
input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
n: 3
replacement: false
output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
replacement: true
output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
Exceptions
cudf::logic_error  if n > input.num_rows() and replacement == FALSE.
cudf::logic_error  if n < 0.
Parameters
input  View of a table to sample
n  Non-negative number of samples expected from input
replacement  Allow or disallow sampling of the same row more than once
seed  Seed value to initiate random number generator
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table's device memory
Returns
Table containing samples from input
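
The difference between the two replacement modes, and the documented error conditions, can be sketched on the host. This plain C++ model picks row indices only and uses the standard library's generator, so it will not reproduce libcudf's actual sample order for a given seed. The name sample_indices_model is hypothetical, and returning an empty result for an empty input is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model: returns the row indices that would be gathered.
std::vector<int> sample_indices_model(int num_rows,
                                      int n,
                                      bool with_replacement,
                                      std::int64_t seed = 0)
{
  if (n < 0) { throw std::logic_error("n must be non-negative"); }
  if (!with_replacement && n > num_rows) {
    throw std::logic_error("cannot draw more rows than exist without replacement");
  }
  if (n == 0 || num_rows == 0) { return {}; }  // assumption: empty result

  std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
  std::vector<int> picked;
  if (with_replacement) {
    // Each draw is independent, so the same row may appear more than once.
    std::uniform_int_distribution<int> dist(0, num_rows - 1);
    for (int i = 0; i < n; ++i) { picked.push_back(dist(rng)); }
  } else {
    // Shuffle all row indices and keep the first n, so no row repeats.
    std::vector<int> all(num_rows);
    std::iota(all.begin(), all.end(), 0);
    std::shuffle(all.begin(), all.end(), rng);
    picked.assign(all.begin(), all.begin() + n);
  }
  return picked;
}
```

Without replacement the result never repeats a row; with replacement n may exceed the number of rows.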