-
Notifications
You must be signed in to change notification settings - Fork 3.8k
chore(engine): Refactor streams result builder to use column-oriented processsing #19505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0720a8a
to
c9658ff
Compare
c9658ff
to
3e4384d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM.
Left a suggestion how to reduce allocations. Also, feel free to add the benchmark code and run the benchmarks for old and new version and post the output of benchstat old.txt new.txt
251bf2b
to
3d6ac88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks look great!
ab921b8
to
761be1c
Compare
… processing Change streamsResultBuilder.CollectRecord() from row-by-row to column-by-column iteration. This better aligns with Arrow's columnar memory layout, improving CPU cache locality by reading contiguous memory regions.
761be1c
to
a8d54c8
Compare
@chaudum I updated the implementation to use a slice of structs for rows buffer. Benchmark shows there is no major difference: here's benchstat between the previously reviewed implementation and the latest one (diff):
I also tried to use func TestLabelsBuilder_Playground(t *testing.T) {
lbs := labels.EmptyLabels()
b := NewBaseLabelsBuilder().ForLabels(lbs, labels.StableHash(lbs))
// b.Set(ParsedLabel, "a", "b") // will be returned as part of result.Labels()
b.Set(StreamLabel, "a", "b") // won't be returned as result.Labels()
result := b.LabelsResult()
require.Equal(t, "{a=\"b\"}", result.Labels().String())
} I'm trying to add this support but it takes time and increases the size of the PR so I'd like to have a different PR for it, wdyt? |
What this PR does / why we need it:
Change streamsResultBuilder.CollectRecord() from row-by-row to column-by-column iteration. This better aligns with Arrow's columnar memory layout, improving CPU cache locality by reading contiguous memory regions.
I also remove
IsValid
checks given thatIsNull
checks are present.IsValid
seems to be an exact opposite ofIsNull
(based on this code).The slices and label builders for intermediate rows data are reused between CollectRecord.
old vs latest version
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR