libcudf
24.02.00
Files | |
file | range_window_bounds.hpp |
file | rolling.hpp |
Classes | |
struct | cudf::range_window_bounds |
Abstraction for window boundary sizes, to be used with grouped_range_rolling_window() . More... | |
struct | cudf::window_bounds |
Abstraction for window boundary sizes. More... | |
Functions | |
std::unique_ptr< column > | cudf::rolling_window (column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::rolling_window (column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, window_bounds preceding_window, window_bounds following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_rolling_window (table_view const &group_keys, column_view const &input, column_view const &default_outputs, window_bounds preceding_window, window_bounds following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, fixed-size rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_time_range_rolling_window (table_view const &group_keys, column_view const &timestamp_column, cudf::order const &timestamp_order, column_view const &input, size_type preceding_window_in_days, size_type following_window_in_days, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, timestamp-based rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_time_range_rolling_window (table_view const &group_keys, column_view const &timestamp_column, cudf::order const &timestamp_order, column_view const &input, window_bounds preceding_window_in_days, window_bounds following_window_in_days, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, timestamp-based rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::grouped_range_rolling_window (table_view const &group_keys, column_view const &orderby_column, cudf::order const &order, column_view const &input, range_window_bounds const &preceding, range_window_bounds const &following, size_type min_periods, rolling_aggregation const &aggr, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a grouping-aware, value range-based rolling window function to the values in a column. More... | |
std::unique_ptr< column > | cudf::rolling_window (column_view const &input, column_view const &preceding_window, column_view const &following_window, size_type min_periods, rolling_aggregation const &agg, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Applies a variable-size rolling window function to the values in a column. More... | |
std::unique_ptr<column> cudf::grouped_range_rolling_window(
    table_view const& group_keys,
    column_view const& orderby_column,
    cudf::order const& order,
    column_view const& input,
    range_window_bounds const& preceding,
    range_window_bounds const& following,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, value range-based rolling window function to the values in a column.
This function aggregates rows in a window around each element of a specified input column. The window is determined based on the values of an ordered orderby column, and on the values of a preceding and following scalar representing an inclusive range of orderby column values.
1. The elements of the input column are grouped into distinct groups (e.g. the result of a groupby), determined by the corresponding values of the columns under group_keys. The window-aggregation cannot cross the group boundaries.
2. Within a group, with all rows sorted by the orderby column, the aggregation window for a row at index i is determined as follows:
   a) If orderby is ASCENDING, the aggregation window for row i includes all input rows at index j such that:
      (orderby[i] - preceding) <= orderby[j] <= (orderby[i] + following)
   b) If orderby is DESCENDING, the aggregation window for row i includes all input rows at index j such that:
      (orderby[i] + preceding) >= orderby[j] >= (orderby[i] - following)
Note: This method requires that the rows are presorted by the group keys and orderby column values.
The window intervals are specified as scalar values appropriate for the orderby column. Currently, only the following combinations of orderby column type and range types are supported:
1. If the orderby column is a TIMESTAMP, the preceding/following windows are specified in terms of DURATION scalars of the same resolution. E.g. for an orderby column of type TIMESTAMP_SECONDS, the intervals may only be DURATION_SECONDS. Durations of higher resolution (e.g. DURATION_NANOSECONDS) or lower (e.g. DURATION_DAYS) cannot be used.
2. If the orderby column is an integral type (e.g. INT32), the preceding/following should be the exact same type (INT32).
Note: The number of rows participating in each window might vary, based on the index within the group, the orderby value, and min_periods. Each aggregation operation cannot cross group boundaries.
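The type of the returned column depends on the input column type T, and the aggregation:
1. COUNT returns INT32 columns.
2. MIN/MAX returns T columns.
3. SUM returns the promoted type for T. SUM on INT32 yields INT64.
4. MEAN returns FLOAT64 columns.
5. COLLECT returns columns of type LIST<T>.
LEAD/LAG/ROW_NUMBER are undefined for range queries.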
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | orderby_column | The (pre-sorted) order-by column, for range comparisons |
[in] | order | The order (ASCENDING/DESCENDING) in which the order-by column is sorted |
[in] | input | The input column (to be aggregated) |
[in] | preceding | The interval value in the backward direction |
[in] | following | The interval value in the forward direction |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
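The inclusive value range described above can be illustrated on the host. The following is a minimal sketch, not libcudf code: `grouped_range_sum_model` is a hypothetical helper that assumes integer orderby values in ASCENDING order, models only a SUM aggregation, and ignores min_periods and nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the ASCENDING case; not a libcudf function.
// Rows are assumed presorted by group and then by orderby value.
std::vector<long long> grouped_range_sum_model(std::vector<int> const& group,
                                               std::vector<int> const& orderby,
                                               std::vector<int> const& input,
                                               int preceding,
                                               int following)
{
  assert(group.size() == input.size() && orderby.size() == input.size());
  std::vector<long long> out(input.size(), 0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Inclusive value range around the current row's orderby value.
    long long const lo = static_cast<long long>(orderby[i]) - preceding;
    long long const hi = static_cast<long long>(orderby[i]) + following;
    for (std::size_t j = 0; j < input.size(); ++j) {
      bool const same_group = group[j] == group[i];  // never cross group boundaries
      bool const in_range   = orderby[j] >= lo && orderby[j] <= hi;
      if (same_group && in_range) { out[i] += input[j]; }
    }
  }
  return out;
}
```

The quadratic scan keeps the sketch short; because the rows are presorted, a real implementation could locate each window edge with a binary search instead.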
std::unique_ptr<column> cudf::grouped_rolling_window(
    table_view const& group_keys,
    column_view const& input,
    column_view const& default_outputs,
    size_type preceding_window,
    size_type following_window,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, fixed-size rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in that elements of the input column are grouped into distinct groups (e.g. the result of a groupby). The window aggregation cannot cross the group boundaries. For a row i of input, the group is determined from the corresponding (i.e. i-th) values of the columns under group_keys.
Note: This method requires that the rows are presorted by the group_keys values.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
Note: preceding_window and following_window may have negative values. This yields windows where the current row might not be included at all. For instance, consider a window defined as (preceding=3, following=-1). This produces a window spanning from 2 (i.e. 3-1) rows preceding the current row to 1 row preceding the current row. For the column below, the window for row#3 is:
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
      <---->  ^
      window  |
              current_row
Similarly, preceding could have a negative value, indicating that the window begins at a position after the current row. It differs slightly from the semantics for following, because preceding includes the current row. Therefore:
1. preceding=1 => Window starts at the current row.
2. preceding=0 => Window starts at 1 past the current row.
3. preceding=-1 => Window starts at 2 past the current row. Etc.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window | The static rolling window size in the backward direction (for positive values), or forward direction (for negative values) |
[in] | following_window | The static rolling window size in the forward direction (for positive values), or backward direction (for negative values) |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
[in] | default_outputs | A column of per-row default values to be returned instead of nulls. Used for LEAD()/LAG(), if the row offset crosses the boundaries of the column or group. |
std::unique_ptr<column> cudf::grouped_rolling_window(
    table_view const& group_keys,
    column_view const& input,
    column_view const& default_outputs,
    window_bounds preceding_window,
    window_bounds following_window,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, fixed-size rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in that elements of the input column are grouped into distinct groups (e.g. the result of a groupby). The window aggregation cannot cross the group boundaries. For a row i of input, the group is determined from the corresponding (i.e. i-th) values of the columns under group_keys.
Note: This method requires that the rows are presorted by the group_keys values.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
Note: preceding_window and following_window may have negative values. This yields windows where the current row might not be included at all. For instance, consider a window defined as (preceding=3, following=-1). This produces a window spanning from 2 (i.e. 3-1) rows preceding the current row to 1 row preceding the current row. For the column below, the window for row#3 is:
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
      <---->  ^
      window  |
              current_row
Similarly, preceding could have a negative value, indicating that the window begins at a position after the current row. It differs slightly from the semantics for following, because preceding includes the current row. Therefore:
1. preceding=1 => Window starts at the current row.
2. preceding=0 => Window starts at 1 past the current row.
3. preceding=-1 => Window starts at 2 past the current row. Etc.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window | The static rolling window size in the backward direction (for positive values), or forward direction (for negative values) |
[in] | following_window | The static rolling window size in the forward direction (for positive values), or backward direction (for negative values) |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
[in] | default_outputs | A column of per-row default values to be returned instead of nulls. Used for LEAD()/LAG(), if the row offset crosses the boundaries of the column or group. |
std::unique_ptr<column> cudf::grouped_rolling_window(
    table_view const& group_keys,
    column_view const& input,
    size_type preceding_window,
    size_type following_window,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, fixed-size rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in that elements of the input column are grouped into distinct groups (e.g. the result of a groupby). The window aggregation cannot cross the group boundaries. For a row i of input, the group is determined from the corresponding (i.e. i-th) values of the columns under group_keys.
Note: This method requires that the rows are presorted by the group_keys values.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
Note: preceding_window and following_window may have negative values. This yields windows where the current row might not be included at all. For instance, consider a window defined as (preceding=3, following=-1). This produces a window spanning from 2 (i.e. 3-1) rows preceding the current row to 1 row preceding the current row. For the column below, the window for row#3 is:
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
      <---->  ^
      window  |
              current_row
Similarly, preceding could have a negative value, indicating that the window begins at a position after the current row. It differs slightly from the semantics for following, because preceding includes the current row. Therefore:
1. preceding=1 => Window starts at the current row.
2. preceding=0 => Window starts at 1 past the current row.
3. preceding=-1 => Window starts at 2 past the current row. Etc.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window | The static rolling window size in the backward direction (for positive values), or forward direction (for negative values) |
[in] | following_window | The static rolling window size in the forward direction (for positive values), or backward direction (for negative values) |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
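To make the window arithmetic concrete, including negative bounds and group clamping, here is a minimal host-side sketch. It is not libcudf code: `grouped_window_bounds_model` is a hypothetical helper, integer keys stand in for the group_keys table, and the row range [i - preceding + 1, i + following] is taken from the rolling_window() description on this page.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical host-side model; not a libcudf function.
// Returns, per row, the inclusive [first, last] row indices of its window.
// last < first denotes an empty window.
std::vector<std::pair<int, int>> grouped_window_bounds_model(std::vector<int> const& group_keys,
                                                             int preceding,
                                                             int following)
{
  int const n = static_cast<int>(group_keys.size());
  std::vector<std::pair<int, int>> bounds(n);
  int group_begin = 0;
  while (group_begin < n) {
    // Rows are presorted by key, so each group is one contiguous run.
    int group_end = group_begin;
    while (group_end + 1 < n && group_keys[group_end + 1] == group_keys[group_begin]) {
      ++group_end;
    }
    for (int i = group_begin; i <= group_end; ++i) {
      // preceding counts the current row, so the window starts at i - preceding + 1.
      int const first = std::max(group_begin, i - preceding + 1);
      int const last  = std::min(group_end, i + following);
      bounds[i]       = {first, last};
    }
    group_begin = group_end + 1;
  }
  return bounds;
}
```

With nine rows split into a group of five and a group of four, (preceding=3, following=-1) gives row 3 the window [1, 2], which excludes the current row, while row 5 (the first row of the second group) gets an empty window.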
std::unique_ptr<column> cudf::grouped_rolling_window(
    table_view const& group_keys,
    column_view const& input,
    window_bounds preceding_window,
    window_bounds following_window,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, fixed-size rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in that elements of the input column are grouped into distinct groups (e.g. the result of a groupby). The window aggregation cannot cross the group boundaries. For a row i of input, the group is determined from the corresponding (i.e. i-th) values of the columns under group_keys.
Note: This method requires that the rows are presorted by the group_keys values.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
Note: preceding_window and following_window may have negative values. This yields windows where the current row might not be included at all. For instance, consider a window defined as (preceding=3, following=-1). This produces a window spanning from 2 (i.e. 3-1) rows preceding the current row to 1 row preceding the current row. For the column below, the window for row#3 is:
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
      <---->  ^
      window  |
              current_row
Similarly, preceding could have a negative value, indicating that the window begins at a position after the current row. It differs slightly from the semantics for following, because preceding includes the current row. Therefore:
1. preceding=1 => Window starts at the current row.
2. preceding=0 => Window starts at 1 past the current row.
3. preceding=-1 => Window starts at 2 past the current row. Etc.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window | The static rolling window size in the backward direction (for positive values), or forward direction (for negative values) |
[in] | following_window | The static rolling window size in the forward direction (for positive values), or backward direction (for negative values) |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::grouped_time_range_rolling_window(
    table_view const& group_keys,
    column_view const& timestamp_column,
    cudf::order const& timestamp_order,
    column_view const& input,
    size_type preceding_window_in_days,
    size_type following_window_in_days,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, timestamp-based rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in two respects:
1. The elements of the input column are grouped into distinct groups (e.g. the result of a groupby), determined by the corresponding values of the columns under group_keys. The window-aggregation cannot cross the group boundaries.
2. Within a group, the aggregation window is calculated based on a time interval (e.g. number of days preceding/following the current row). The timestamps for the input data are specified by the timestamp_column argument.
Note: This method requires that the rows are presorted by the group keys and timestamp values.
Note: The number of rows participating in each window might vary, based on the index within the group, datestamp, and min_periods. Each aggregation operation cannot cross group boundaries.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | timestamp_column | The (pre-sorted) timestamps for each row |
[in] | timestamp_order | The order (ASCENDING/DESCENDING) in which the timestamps are sorted |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window_in_days | The rolling window time-interval in the backward direction |
[in] | following_window_in_days | The rolling window time-interval in the forward direction |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::grouped_time_range_rolling_window(
    table_view const& group_keys,
    column_view const& timestamp_column,
    cudf::order const& timestamp_order,
    column_view const& input,
    window_bounds preceding_window_in_days,
    window_bounds following_window_in_days,
    size_type min_periods,
    rolling_aggregation const& aggr,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a grouping-aware, timestamp-based rolling window function to the values in a column.
Like rolling_window(), this function aggregates values in a window around each element of a specified input column. It differs from rolling_window() in two respects:
1. The elements of the input column are grouped into distinct groups (e.g. the result of a groupby), determined by the corresponding values of the columns under group_keys. The window-aggregation cannot cross the group boundaries.
2. Within a group, the aggregation window is calculated based on a time interval (e.g. number of days preceding/following the current row). The timestamps for the input data are specified by the timestamp_column argument.
Note: This method requires that the rows are presorted by the group keys and timestamp values.
Note: The number of rows participating in each window might vary, based on the index within the group, datestamp, and min_periods. Each aggregation operation cannot cross group boundaries.
The returned column for op == COUNT always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
[in] | group_keys | The (pre-sorted) grouping columns |
[in] | timestamp_column | The (pre-sorted) timestamps for each row |
[in] | timestamp_order | The order (ASCENDING/DESCENDING) in which the timestamps are sorted |
[in] | input | The input column (to be aggregated) |
[in] | preceding_window_in_days | The rolling window time-interval in the backward direction |
[in] | following_window_in_days | The rolling window time-interval in the forward direction |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | aggr | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
The preceding_window_in_days and following_window_in_days are specified as window_bounds and support "unbounded" windows when set to window_bounds::unbounded().
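The effect of a bounded versus an unbounded day interval can be sketched on the host. This is an illustration under stated assumptions, not libcudf code: `time_range_count_model` is a hypothetical helper, timestamps are whole-day integers in ASCENDING order within a single group, the interval is treated as inclusive (mirroring the range-based description on this page), and std::nullopt stands in for window_bounds::unbounded().

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical host-side model for one group; not a libcudf function.
// Counts the rows whose day falls inside each row's time window.
std::vector<int> time_range_count_model(std::vector<int> const& day,
                                        std::optional<int> preceding_days,
                                        std::optional<int> following_days)
{
  std::vector<int> counts(day.size(), 0);
  for (std::size_t i = 0; i < day.size(); ++i) {
    for (std::size_t j = 0; j < day.size(); ++j) {
      // An empty optional models an unbounded edge: every row passes that test.
      bool const after_start = !preceding_days || day[j] >= day[i] - *preceding_days;
      bool const before_end  = !following_days || day[j] <= day[i] + *following_days;
      if (after_start && before_end) { ++counts[i]; }
    }
  }
  return counts;
}
```

With an unbounded preceding edge and zero following days, the count grows with each row, which is the running-total pattern usually associated with unbounded windows.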
std::unique_ptr<column> cudf::rolling_window(
    column_view const& input,
    column_view const& default_outputs,
    size_type preceding_window,
    size_type following_window,
    size_type min_periods,
    rolling_aggregation const& agg,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a fixed-size rolling window function to the values in a column.
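This function aggregates values in a window around each element i of the input column, and invalidates the bit mask for element i if there are not enough observations. The window size is static (the same for each element). This matches Pandas' API for DataFrame.rolling with a few notable differences:
- Instead of the center flag, it uses a two-part window to allow for more flexible windows. The total window size = preceding_window + following_window. Element i uses elements [i-preceding_window+1, i+following_window] to do the window computation.
- Instead of storing NA/NaN for output rows that do not meet the minimum number of observations, this function updates the valid bitmask of the column to indicate which elements are valid.
Notes on return column types:
- The returned column for count aggregation always has INT32 type.
- The returned column for VARIANCE/STD aggregations always has FLOAT64 type.
- All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
[in] | input | The input column |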
[in] | preceding_window | The static rolling window size in the backward direction |
[in] | following_window | The static rolling window size in the forward direction |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | agg | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
[in] | default_outputs | A column of per-row default values to be returned instead of nulls. Used for LEAD()/LAG(), if the row offset crosses the boundaries of the column. |
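The role of default_outputs can be illustrated with a small host-side model of LEAD. This is a sketch, not libcudf code: `lead_with_defaults_model` is a hypothetical helper, and it assumes a non-nullable integer input so that the only source of missing values is an offset that runs past the end of the column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of LEAD with per-row defaults; not a libcudf function.
std::vector<int> lead_with_defaults_model(std::vector<int> const& input,
                                          std::vector<int> const& default_outputs,
                                          std::size_t row_offset)
{
  assert(default_outputs.size() == input.size());
  std::vector<int> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::size_t const j = i + row_offset;
    // When the offset crosses the column boundary, use this row's default value.
    out[i] = (j < input.size()) ? input[j] : default_outputs[i];
  }
  return out;
}
```

For input {1, 2, 3, 4}, defaults {-1, -2, -3, -4} and an offset of 2, the last two rows have no row to lead to and take their own defaults.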
std::unique_ptr<column> cudf::rolling_window(
    column_view const& input,
    column_view const& preceding_window,
    column_view const& following_window,
    size_type min_periods,
    rolling_aggregation const& agg,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a variable-size rolling window function to the values in a column.
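This function aggregates values in a window around each element i of the input column, and invalidates the bit mask for element i if there are not enough observations. The window size is dynamic (varying for each element). This matches Pandas' API for DataFrame.rolling with a few notable differences:
- Instead of the center flag, it uses a two-part window to allow for more flexible windows. The total window size = preceding_window + following_window. Element i uses elements [i-preceding_window+1, i+following_window] to do the window computation.
- Instead of storing NA/NaN for output rows that do not meet the minimum number of observations, this function updates the valid bitmask of the column to indicate which elements are valid.
- The window size can be specified for each element through the preceding_window and following_window columns.
The returned column for count aggregation always has INT32 type. All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.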
cudf::logic_error | if window column type is not INT32 |
[in] | input | The input column |
[in] | preceding_window | A non-nullable column of INT32 window sizes in the backward direction. preceding_window[i] specifies the preceding window size for element i. |
[in] | following_window | A non-nullable column of INT32 window sizes in the forward direction. following_window[i] specifies the following window size for element i. |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | agg | The rolling window aggregation type (sum, max, min, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
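A per-row window can be modelled on the host as follows. This is a minimal sketch, not libcudf code: `rolling_min_variable_model` is a hypothetical helper, std::optional stands in for the validity bitmask, and the sketch assumes that an empty window always yields a null.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host-side model of a variable-size rolling MIN; not a libcudf function.
std::vector<std::optional<int>> rolling_min_variable_model(std::vector<int> const& input,
                                                           std::vector<int> const& preceding,
                                                           std::vector<int> const& following,
                                                           int min_periods)
{
  assert(preceding.size() == input.size() && following.size() == input.size());
  int const n = static_cast<int>(input.size());
  std::vector<std::optional<int>> out(n);  // all rows start out null
  for (int i = 0; i < n; ++i) {
    // Row i uses [i - preceding[i] + 1, i + following[i]], clamped to the column.
    int const first = std::max(0, i - preceding[i] + 1);
    int const last  = std::min(n - 1, i + following[i]);
    int const count = last - first + 1;
    if (count < 1 || count < min_periods) { continue; }  // too few observations
    out[i] = *std::min_element(input.begin() + first, input.begin() + last + 1);
  }
  return out;
}
```

Raising min_periods from 1 to 2 turns every row whose window holds a single element into a null, without changing the other results.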
std::unique_ptr<column> cudf::rolling_window(
    column_view const& input,
    size_type preceding_window,
    size_type following_window,
    size_type min_periods,
    rolling_aggregation const& agg,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Applies a fixed-size rolling window function to the values in a column.
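This function aggregates values in a window around each element i of the input column, and invalidates the bit mask for element i if there are not enough observations. The window size is static (the same for each element). This matches Pandas' API for DataFrame.rolling with a few notable differences:
- Instead of the center flag, it uses a two-part window to allow for more flexible windows. The total window size = preceding_window + following_window. Element i uses elements [i-preceding_window+1, i+following_window] to do the window computation.
- Instead of storing NA/NaN for output rows that do not meet the minimum number of observations, this function updates the valid bitmask of the column to indicate which elements are valid.
Notes on return column types:
- The returned column for count aggregation always has INT32 type.
- The returned column for VARIANCE/STD aggregations always has FLOAT64 type.
- All other operators return a column of the same type as the input. Therefore it is suggested to convert integer column types (especially low-precision integers) to FLOAT32 or FLOAT64 before doing a rolling MEAN.
[in] | input | The input column |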
[in] | preceding_window | The static rolling window size in the backward direction |
[in] | following_window | The static rolling window size in the forward direction |
[in] | min_periods | Minimum number of observations in window required to have a value, otherwise element i is null. |
[in] | agg | The rolling window aggregation type (SUM, MAX, MIN, etc.) |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
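The interaction between the two-part window and min_periods can be sketched on the host. This is an illustration, not libcudf code: `rolling_sum_model` is a hypothetical helper, std::optional stands in for the validity bitmask, and the sketch assumes that an empty window always yields a null.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical host-side model of a fixed-size rolling SUM; not a libcudf function.
std::vector<std::optional<long long>> rolling_sum_model(std::vector<int> const& input,
                                                        int preceding,
                                                        int following,
                                                        int min_periods)
{
  int const n = static_cast<int>(input.size());
  std::vector<std::optional<long long>> out(n);  // all rows start out null
  for (int i = 0; i < n; ++i) {
    // Element i uses [i - preceding + 1, i + following], clamped to the column.
    int const first = std::max(0, i - preceding + 1);
    int const last  = std::min(n - 1, i + following);
    int const count = last - first + 1;
    if (count < 1 || count < min_periods) { continue; }  // too few observations
    long long sum = 0;
    for (int j = first; j <= last; ++j) { sum += input[j]; }
    out[i] = sum;
  }
  return out;
}
```

With preceding=2 and following=1 the window is three rows wide in the interior of the column but only two rows wide at either end, so min_periods=3 nulls out the first and last rows.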