Parallel output in tpchgen-cli (Nx faster, where N is number of cores) #58

Merged
merged 6 commits into from
Mar 25, 2025

alamb
Collaborator

@alamb alamb commented Mar 21, 2025

Features:

  1. Logging support via -v and RUST_LOG
  2. Parallel generation for all tables
  3. No new dependencies for the tpch core crate, and only minor changes to it
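
One way to picture the parallel generation listed above: split a table's row range into fixed-size parts, generate each part into an in-memory buffer on its own thread, then write the buffers in part order so the output matches what a single thread would produce. The following is a minimal sketch using only the standard library. `generate_part`, `generate_parallel`, and the row format are hypothetical stand-ins, not the tpchgen API and not necessarily how this PR does it.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical stand-in for a table generator: renders rows
/// `start..end` as pipe-delimited text. Not the tpchgen API.
fn generate_part(start: u64, end: u64) -> Vec<u8> {
    let mut buf = Vec::new();
    for row in start..end {
        // Writing to a Vec<u8> cannot fail.
        writeln!(buf, "{}|row-{}|", row, row).unwrap();
    }
    buf
}

/// Generates `total_rows` rows in parts of `part_size` rows, one scoped
/// thread per part, and writes the parts to `out` in part order.
fn generate_parallel<W: Write>(total_rows: u64, part_size: u64, out: &mut W) -> io::Result<()> {
    assert!(part_size > 0, "part_size must be positive");
    // Half-open (start, end) row ranges; the last part may be shorter.
    let bounds: Vec<(u64, u64)> = (0..total_rows)
        .step_by(part_size as usize)
        .map(|start| (start, (start + part_size).min(total_rows)))
        .collect();

    let parts: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = bounds
            .iter()
            .map(|&(start, end)| s.spawn(move || generate_part(start, end)))
            .collect();
        // Joining in spawn order keeps the output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    for part in &parts {
        out.write_all(part)?;
    }
    Ok(())
}
```

A real implementation would likely cap the number of worker threads (for example at the core count) and stream finished parts to the writer instead of buffering them all, but the ordering idea is the same.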

Performance 🌶️

In single-threaded mode, the generator produces about 250 MB/s per core, which works out to roughly 4 GB/s across the 16 cores of my laptop.

This scales linearly with the number of cores.

Here are performance measurements for SF=100 on my Mac M3 (with 16 cores)

| branch | speed | speed (writing to /dev/null) |
| --- | --- | --- |
| main | 6m1.984s | N/A |
| this branch | 1m0.099s | 0m28.732s |

time target/release/tpchgen-cli -s 100 --output-dir=/tmp/tpchdbgen-rs
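
To relate the wall-clock times above to throughput, here is a small sketch of the arithmetic. It assumes SF=100 produces roughly 100 GB of output (TPC-H scale factor 1 is about 1 GB), which is an approximation and not a figure measured in this PR. `parse_time` and `gb_per_sec` are hypothetical helpers.

```rust
/// Parses the `XmY.YYYs` format printed by the shell's `time` builtin
/// (for example "6m1.984s") into seconds. Returns None on malformed input.
fn parse_time(s: &str) -> Option<f64> {
    let (minutes, rest) = s.split_once('m')?;
    let seconds = rest.strip_suffix('s')?;
    Some(minutes.parse::<f64>().ok()? * 60.0 + seconds.parse::<f64>().ok()?)
}

/// Average throughput in GB/s for `gigabytes` of output produced in `seconds`.
/// The output size is supplied by the caller; here it is an assumption.
fn gb_per_sec(gigabytes: f64, seconds: f64) -> f64 {
    gigabytes / seconds
}
```

With the measurements above, 361.984 s on main versus 60.099 s on this branch is a speedup of about 6.0x. Under the 100 GB assumption, that is about 1.7 GB/s when writing to disk and about 3.5 GB/s when writing to /dev/null.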

It turns out this can fully saturate the disk bandwidth on my Mac M3 laptop, which caps out at about 2 GB/s.

If I hard-code the generator to discard the data instead of writing it, all the cores stay busy and it generates data at a blistering 4 GB/s.
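
A similar "throw the data away" measurement can be approximated without hard-coding anything, by making the output destination generic over `std::io::Write` and passing `std::io::sink()`, which discards everything written to it. This is a minimal sketch under that assumption. `CountingWriter`, `write_rows`, and `bytes_generated` are hypothetical names, and this is not how the experiment above was actually done.

```rust
use std::io::{self, Write};

/// Wraps any writer and counts the bytes passed through it, so
/// throughput can be measured even when the data is discarded.
struct CountingWriter<W: Write> {
    inner: W,
    bytes: u64,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Hypothetical stand-in for the generator's output loop.
fn write_rows<W: Write>(rows: u64, out: &mut W) -> io::Result<()> {
    for row in 0..rows {
        writeln!(out, "{}|row-{}|", row, row)?;
    }
    Ok(())
}

/// Generates `rows` rows into a discarding sink and returns the byte count.
fn bytes_generated(rows: u64) -> io::Result<u64> {
    let mut out = CountingWriter { inner: io::sink(), bytes: 0 };
    write_rows(rows, &mut out)?;
    Ok(out.bytes)
}
```

Dividing the returned byte count by an elapsed time from `std::time::Instant` gives generation throughput with disk I/O taken out of the picture.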

@alamb
Collaborator Author

alamb commented Mar 21, 2025

Ok, I need to stop for today, but this is looking promising

@alamb alamb force-pushed the alamb/parallel_for_real branch from 3000cd5 to aee04a0 Compare March 22, 2025 10:09
@alamb alamb changed the title Parallel output in tpchgen-cli Parallel output in tpchgen-cli (Nx faster, where N is number of cores) Mar 22, 2025
@alamb alamb force-pushed the alamb/parallel_for_real branch from 16bed68 to 7fcaf3e Compare March 22, 2025 11:53
@alamb alamb force-pushed the alamb/parallel_for_real branch from 7fcaf3e to 0af7609 Compare March 22, 2025 13:14
@alamb alamb force-pushed the alamb/parallel_for_real branch from 0af7609 to b9ce319 Compare March 22, 2025 13:38
@alamb alamb marked this pull request as ready for review March 22, 2025 14:04
@alamb
Collaborator Author

alamb commented Mar 24, 2025

I plan to merge this PR tomorrow unless anyone else would like time to comment on it.

@clflushopt
Owner

I will take a look at it tonight! Thanks @alamb

Owner

@clflushopt clflushopt left a comment

LGTM; awesome stuff!!!

@alamb
Collaborator Author

alamb commented Mar 25, 2025

BTW, combined with the other improvements on main, after merging it takes less than 5 seconds on my laptop to generate TPC-H SF 10 🚀

time target/release/tpchgen-cli -s 10  --output-dir=/tmp/tpchdbgen-rs

real	0m4.895s
user	0m36.096s
sys	0m2.910s
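
The `time` output above also gives a rough measure of parallelism: dividing CPU time (user + sys) by wall-clock time (real) estimates the average number of busy cores. A small sketch of that arithmetic follows; `avg_busy_cores` is a hypothetical helper.

```rust
/// Estimates the average number of busy cores from `time` output:
/// CPU seconds (user + sys) divided by wall-clock seconds (real).
fn avg_busy_cores(real: f64, user: f64, sys: f64) -> f64 {
    (user + sys) / real
}
```

For the run above, `avg_busy_cores(4.895, 36.096, 2.910)` comes out to about 8, so on average roughly half of the 16 cores were busy.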

@alamb alamb merged commit ab720a7 into clflushopt:main Mar 25, 2025
6 checks passed
@alamb alamb deleted the alamb/parallel_for_real branch March 25, 2025 11:31
Development

Successfully merging this pull request may close these issues.

Implement multi-threading data generation
2 participants