Create and traverse avro schemas once per task (#296) #834
* Cleanup and isolate the schema perf changes * add an _
Looks good. Solid change!
Thanks for the contribution. This leads to huge performance improvements in GB upload tasks.
LGTM!
Oh wow, I thought this had already been merged. I don't have write permissions to actually do the deed...
Summary
This adds a SchemaTraverser to the chronon row creation logic, which allows callers of Row.to to specify how the output schema should be constructed.
At the core, this change centers on removing the AvroConversions.fromChrononSchema call in the lambda passed from fromChrononRow.
Why / Goal
The proximate goal is to allow faster Avro Schema construction, avoiding repeated calls to fromChrononSchema in a tight loop. This has resulted in a 90+% reduction in CPU usage during the reduce phase of the GroupByUpload aggregation.
At Stripe, several of our largest GroupByUpload apps are now using 50-80% fewer vcore-seconds per run.
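To illustrate the general idea of building a schema once rather than once per row, here is a minimal, language-agnostic sketch in Python. It is not Chronon's code: `Field`, `CountingSchemaBuilder`, `convert_rows_naive`, and `convert_rows_hoisted` are hypothetical names invented for this example, and the "schema" is a plain dict standing in for an Avro schema.

```python
# Hypothetical sketch only: none of these names come from Chronon or Avro.
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str  # e.g. "string" or "long"


class CountingSchemaBuilder:
    """Stand-in for an expensive schema conversion; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def build(self, fields):
        self.calls += 1
        return {
            "type": "record",
            "fields": [{"name": f.name, "type": f.type_name} for f in fields],
        }


def convert_rows_naive(builder, fields, rows):
    """Rebuilds the schema inside the per-row loop (the pattern being removed)."""
    out = []
    for row in rows:
        schema = builder.build(fields)  # repeated for every row
        names = [f["name"] for f in schema["fields"]]
        out.append(dict(zip(names, row)))
    return out


def convert_rows_hoisted(builder, fields, rows):
    """Builds the schema once, then reuses it for every row."""
    schema = builder.build(fields)  # executed a single time
    names = [f["name"] for f in schema["fields"]]
    return [dict(zip(names, row)) for row in rows]
```

Both functions produce the same records; the only difference is how many times the schema is built, which grows with the row count in the naive version and stays at one in the hoisted version.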
Test Plan
Checklist
Reviewers