Use `cudf::make_strings_column_batch` in `get_json_object` #2499

ttnghia · 2024-10-11T21:49:39Z

This applies cudf::make_strings_column_batch for creating the output columns in get_json_object, which can reduce the total run time to some extent.

Depends on:

Implement batch construction for strings columns rapidsai/cudf#17035.

Signed-off-by: Nghia Truong <[email protected]>

revans2

The code change looks fine to me, but in my testing the change was really small. it was within the run to run variance so I cannot really say if it is better or not.

ttnghia · 2024-10-15T16:35:27Z

in my testing the change was really small

I observe the same thing. From the profiling, I guess that is because the query has very imbalance output. It has one very big column and a large number of very small columns (see image below, which was profiled on the non-batch columns construction, the make_strings_column ranges: one big range followed by many tiny ranges), thus the speed up here is not observable. But overall this should be better.

ttnghia · 2024-10-15T16:39:47Z

Note that the final implementation of strings column batch construction (rapidsai/cudf#17035) shows around 30-35% speedup on strings (output) column construction:

## [0] Quadro RTX 6000

|  num_rows  |  batch_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |
|------------|--------------|------------|-------------|------------|-------------|---------------|---------|
|   100000   |      10      | 968.501 us |      27.15% | 632.428 us |      22.22% |   -336.073 us | -34.70% |
|   500000   |      10      |   2.796 ms |      13.37% |   1.760 ms |       5.03% |  -1035.502 us | -37.04% |
|  1000000   |      10      |   4.969 ms |      11.47% |   3.309 ms |      10.76% |  -1659.638 us | -33.40% |
|  2000000   |      10      |   9.502 ms |       8.53% |   6.323 ms |       8.49% |  -3178.906 us | -33.46% |
|   100000   |      20      |   1.912 ms |      18.29% |   1.181 ms |      10.76% |   -731.781 us | -38.27% |
|   500000   |      20      |   5.711 ms |       8.18% |   3.707 ms |      11.80% |  -2004.494 us | -35.10% |
|  1000000   |      20      |  10.170 ms |       5.34% |   6.906 ms |      12.90% |  -3263.830 us | -32.09% |
|  2000000   |      20      |  19.383 ms |       3.00% |  13.207 ms |       7.44% |  -6175.960 us | -31.86% |
|   100000   |      50      |   4.719 ms |       9.32% |   3.092 ms |      12.59% |  -1626.383 us | -34.47% |
|   500000   |      50      |  14.334 ms |       4.41% |   9.651 ms |       9.29% |  -4683.219 us | -32.67% |
|  1000000   |      50      |  26.011 ms |       2.83% |  17.286 ms |       6.20% |  -8724.813 us | -33.54% |
|  2000000   |      50      |  49.436 ms |       2.22% |  33.458 ms |       6.67% | -15977.779 us | -32.32% |
|   100000   |     100      |   9.547 ms |       5.29% |   6.489 ms |      12.62% |  -3058.533 us | -32.04% |
|   500000   |     100      |  29.009 ms |       3.51% |  19.489 ms |       4.70% |  -9520.199 us | -32.82% |
|  1000000   |     100      |  52.038 ms |       2.26% |  35.339 ms |       4.98% | -16698.985 us | -32.09% |
|  2000000   |     100      |  99.383 ms |       2.03% |  65.385 ms |       3.19% | -33998.266 us | -34.21% |

ttnghia added 2 commits October 11, 2024 14:47

Testing

ba08d75

Signed-off-by: Nghia Truong <[email protected]>

Merge branch 'branch-24.12' into get_json_obj_batch_construction

c64478d

ttnghia added the performance label Oct 11, 2024

ttnghia requested a review from revans2 October 11, 2024 21:49

ttnghia self-assigned this Oct 11, 2024

ttnghia added 2 commits October 14, 2024 10:42

Merge branch 'branch-24.12' into get_json_obj_batch_construction

0164b13

Use make_strings_column_batch

4341dd0

Signed-off-by: Nghia Truong <[email protected]>

ttnghia marked this pull request as ready for review October 14, 2024 19:17

revans2 approved these changes Oct 15, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `cudf::make_strings_column_batch` in `get_json_object` #2499

Use `cudf::make_strings_column_batch` in `get_json_object` #2499

ttnghia commented Oct 11, 2024 •

edited

Loading

revans2 left a comment

ttnghia commented Oct 15, 2024 •

edited

Loading

ttnghia commented Oct 15, 2024 •

edited

Loading

Use cudf::make_strings_column_batch in get_json_object #2499

Are you sure you want to change the base?

Use cudf::make_strings_column_batch in get_json_object #2499

Conversation

ttnghia commented Oct 11, 2024 • edited Loading

revans2 left a comment

Choose a reason for hiding this comment

ttnghia commented Oct 15, 2024 • edited Loading

ttnghia commented Oct 15, 2024 • edited Loading

Use `cudf::make_strings_column_batch` in `get_json_object` #2499

Use `cudf::make_strings_column_batch` in `get_json_object` #2499

ttnghia commented Oct 11, 2024 •

edited

Loading

ttnghia commented Oct 15, 2024 •

edited

Loading

ttnghia commented Oct 15, 2024 •

edited

Loading