Skip to content

(POC): Remove some hashmap lookups in distribution lookup #59

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
2 commits merged into from
Mar 25, 2025

Conversation

scsmithr
Copy link
Contributor

Partially addresses #56

Also removes IndexMap

Wip. The timings are down, but there might be alternative approaches here that I may try to explore.

This approach would also remove some flexibility with what we allow in a distribution, but I don't think that matter for this.

Timings

$ time cargo run --release -- -s 0.01 --output-dir=/tmp/tpchdbgen-rs-001

Tiny SF to capture the initialization time, not actual data generation.

main: 2.851s, 2.694s, 2.764s
this branch: 1.860s, 1.867s, 1.864s

@scsmithr scsmithr changed the title (POC): Remove some hashmap lookups distribution lookup (POC): Remove some hashmap lookups in distribution lookup Mar 21, 2025
Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scsmithr -- this is looking great. Let me know when / it is ready for a review

@@ -8,7 +8,6 @@ authors = ["clflushopt", "alamb"]
# See ../ARCHITECTURE.md for more details
[dependencies]
chrono = "0.4.40"
indexmap = "2.7.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

adverbs: remove_dist("adverbs")?,
nouns: remove_dist("nouns")?,
verbs: remove_dist("verbs")?,
auxiliaries: remove_dist("auxillaries")?, // P.S: The correct spelling is `auxiliaries` which is what we use.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

@alamb
Copy link
Collaborator

alamb commented Mar 24, 2025

I started going through this PR -- the idea is really nice. I will try and bash out the conversion for the remaining distributions

@scsmithr
Copy link
Contributor Author

I started going through this PR -- the idea is really nice. I will try and bash out the conversion for the remaining distributions

Yeah my bad for the slowness here. Feel free to take it over.

My only real concern is we lose on flexibility on what distributions we can use. But that's my programmer brain talking -- in reality we only care about that one distribution file.

@alamb
Copy link
Collaborator

alamb commented Mar 24, 2025

Yeah my bad for the slowness here. Feel free to take it over.

No worries -- this is a great step. I also have some ideas about how to avoid hash maps entirely. But I'll do that as a follow on PR potentially.

Right now me and copilot are going to apply your pattern to all the remaining distributions :)

My only real concern is we lose on flexibility on what distributions we can use. But that's my programmer brain talking -- in reality we only care about that one distribution file.

Yeah, I agree -- the TPCH data generator hasn't changed for at least 14 years. I don't think flexibility is particularly important

@alamb alamb closed this pull request by merging all changes into clflushopt:main in 22c754a Mar 25, 2025
@alamb alamb closed this in #62 Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants