Faster GBWT construction from GAM/GAF #4433

Merged
jltsiren merged 15 commits into master from faster-gam-gaf-gbwt
Nov 6, 2024

Conversation

jltsiren
Contributor

@jltsiren jltsiren commented Nov 4, 2024

Changelog Entry

To be copied to the draft changelog by merger:

  • GBWT construction from a GAM/GAF file now uses parallel construction jobs.

Description

When instructed to use m construction jobs, the new algorithm partitions the graph into approximately m jobs, though often slightly more. A separate GBWTBuilder is created for each job. The algorithm reads the GAM/GAF file using m threads and sends each read to the appropriate builder. This works better with shuffled reads than with sorted reads.
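
A minimal sketch of the routing idea, for illustration only. It is not the vg or GBWT code: `ToyBuilder`, `assign_jobs`, and `build_in_parallel` are hypothetical names, the greedy assignment of components to jobs is an assumed heuristic, and the sketch assumes that each read is a list of node ids that stays within one component, so it can be routed by its first node.

```python
import threading
from queue import Queue


class ToyBuilder:
    """Hypothetical stand-in for a per-job GBWTBuilder; it only collects paths."""

    def __init__(self):
        self.paths = []
        self.lock = threading.Lock()

    def insert(self, path):
        # Several reader threads may send reads to the same builder.
        with self.lock:
            self.paths.append(path)


def assign_jobs(components, num_jobs):
    """Map each node id to a job, placing whole components greedily (assumed heuristic)."""
    sizes = [0] * num_jobs
    node_to_job = {}
    for component in sorted(components, key=len, reverse=True):
        job = sizes.index(min(sizes))  # currently smallest job
        sizes[job] += len(component)
        for node in component:
            node_to_job[node] = job
    return node_to_job


def build_in_parallel(reads, node_to_job, num_jobs):
    """Consume reads with num_jobs threads and route each read to its job's builder."""
    builders = [ToyBuilder() for _ in range(num_jobs)]
    work = Queue()
    for read in reads:
        work.put(read)
    for _ in range(num_jobs):
        work.put(None)  # one stop marker per reader thread

    def reader():
        while True:
            read = work.get()
            if read is None:
                return
            # Route by the first node; assumes the read stays in one component.
            builders[node_to_job[read[0]]].insert(read)

    threads = [threading.Thread(target=reader) for _ in range(num_jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return builders
```

For example, with components `[[1, 2, 3], [4, 5], [6]]` and two jobs, reads that start in the first component go to one builder and all other reads go to the second. In this toy model, a file sorted by graph position would send long runs of consecutive reads to the same builder, so the reader threads would contend on one lock; shuffled input spreads the work across builders.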

Some additional changes:

  • I got rid of the old path/thread terminology in vg gbwt and in some other places.
  • GBWT construction options have been removed from vg index. They had likely diverged from the supported ones in vg gbwt anyway.
  • SDSL, GCSA2, GBWT, and GBWTGraph have been updated.

@adamnovak
Member

It looks like we need to make a corresponding change in toil-vg for the removal of vg index --gbwt-name, or else rip toil-vg out of CI and do something else for testing the pipelines.

@jltsiren jltsiren merged commit 519656f into master Nov 6, 2024
2 checks passed
@jltsiren jltsiren deleted the faster-gam-gaf-gbwt branch November 6, 2024 03:10
2 participants