[FEA] Refactor to eliminate redundant device aggregation logic #17032

PointKernel · 2024-10-09T20:58:35Z

Is your feature request related to a problem? Please describe.
Once #17031 is merged, three copies of similar device aggregator logic will exist in libcudf, and we need to address this issue.

Currently, we cannot share the same code path because the existing device aggregator only accepts column_device_view as input, and libcudf is unable to construct a column_device_view from shared memory at this point.

Describe the solution you'd like
Based on our offline discussions, adding an overload for the column_device_view ctor may not be the best approach. Instead, we should consider introducing a new object similar to device_column_view specifically designed for use in shared memory.

Ultimately, we aim to have a single aggregator that manages all types of aggregations: global-global, shared-global, and global-shared.

Additional context

The new gmem_element_aggregator and shmem_element_aggregator cannot be used to deal with dictionary columns due to two reasons:
- CUDA error: on V100, they will cause a cudaErrorInvalidValue when querying the available dynamic shared memory size with cudaOccupancyAvailableDynamicSMemPerBlock. This issue is likely caused by the dictionary template instantiation of the aggregator, which triggers a nested invocation of the type dispatcher. Note this error is seen on V100 but not on RTX8000.
- Performance: Simply adding dictionary instantiations without actually using them, i.e., without dictionary columns involved in the calculation, can make the group performance up to 5x slower.

The text was updated successfully, but these errors were encountered:

PointKernel added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function and removed feature request New feature or request labels Oct 9, 2024

PointKernel mentioned this issue Oct 9, 2024

Add device aggregators used by shared memory groupby #17031

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Refactor to eliminate redundant device aggregation logic #17032

[FEA] Refactor to eliminate redundant device aggregation logic #17032

PointKernel commented Oct 9, 2024 •

edited

Loading

[FEA] Refactor to eliminate redundant device aggregation logic #17032

[FEA] Refactor to eliminate redundant device aggregation logic #17032

Comments

PointKernel commented Oct 9, 2024 • edited Loading

PointKernel commented Oct 9, 2024 •

edited

Loading