Use cudf::make_strings_column_batch in get_json_object #2499
base: branch-24.12
Signed-off-by: Nghia Truong <[email protected]>
The code change looks fine to me, but in my testing the difference was really small. It was within the run-to-run variance, so I cannot really say whether it is better or not.
I observe the same thing. From the profiling, I guess that is because the query has very imbalanced output: one very big column and a large number of very small columns (see image below, which was profiled on the non-batch columns construction, the |
Note that the final implementation of strings column batch construction (rapidsai/cudf#17035) shows around 30-35% speedup on strings (output) column construction:
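For readers unfamiliar with the idea, here is a minimal CPU-only sketch of what "batch" construction of strings columns means: several columns' worth of row views go in, and the offsets and character buffers for all of them come out of a single call. This is an analogy only, not the cudf implementation. `HostStringsColumn` and `make_host_strings_columns_batch` are hypothetical names invented for illustration, null rows are ignored, and everything lives in host memory, whereas the real function works on device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side stand-in for a strings column:
// row i occupies chars[offsets[i] .. offsets[i + 1]).
struct HostStringsColumn {
  std::vector<std::int32_t> offsets;  // number of rows + 1 entries
  std::string chars;                  // concatenated bytes of all rows
};

// Hypothetical batch builder: one call produces every output column.
// Each inner vector holds the row views of one column.
std::vector<HostStringsColumn> make_host_strings_columns_batch(
  const std::vector<std::vector<std::string_view>>& inputs)
{
  std::vector<HostStringsColumn> out(inputs.size());

  // Pass 1: compute the offsets of every column and size its chars buffer.
  for (std::size_t c = 0; c < inputs.size(); ++c) {
    auto& col = out[c];
    col.offsets.reserve(inputs[c].size() + 1);
    col.offsets.push_back(0);
    std::int32_t total = 0;
    for (std::string_view row : inputs[c]) {
      total += static_cast<std::int32_t>(row.size());
      col.offsets.push_back(total);
    }
    col.chars.reserve(static_cast<std::size_t>(total));
  }

  // Pass 2: copy the row bytes of every column into its chars buffer.
  for (std::size_t c = 0; c < inputs.size(); ++c) {
    for (std::string_view row : inputs[c]) {
      out[c].chars.append(row);
    }
  }
  return out;
}
```

Calling this with the columns `{"a", "bc"}` and `{"", "xyz"}` yields offsets `{0, 1, 3}` with chars `"abc"` for the first column and offsets `{0, 0, 3}` with chars `"xyz"` for the second. The point of the batch form is that the per-column work is grouped into shared passes instead of being repeated as one independent construction per column.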
This applies cudf::make_strings_column_batch for creating the output columns in get_json_object, which can reduce the total run time to some extent.
Depends on: