Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallelize tree-sequence simplification #368

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

bhaller
Copy link
Contributor

@bhaller bhaller commented Jan 20, 2023

This pull request is for ongoing work on parallelization of tree-sequence simplification. It is based on the conceptual approach pioneered by @molpopgen, and the tskit-based PR by @jeromekelleher at tskit-dev/tskit#2665.

@bhaller
Copy link
Contributor Author

bhaller commented Jan 21, 2023

@petrelharp This is ready for you to take it over. It seems to work fine for the single-threaded case. If you build it multi-threaded you will get a couple of errors (from places where the code is not yet written properly), and several warnings that say "implement me!" Those are the spots where you need to write the code because I'm not sure what to do:

Species::SimplifyTreeSequence()
Species::CheckTreeSeqIntegrity()
Species::RecordNewGenome()
Species::__SplitTableCollection()
Species::__JoinTableCollection()

Hopefully I have left enough bread crubs in those spots that you will know what to do. To build multi-threaded, add -DPARALLEL=ON to your CMake command and hopefully the rest works, or you might need to install libomp and other things. I've got a whole new manual chapter written up for parallel SLiM, including some build instructions; I'll send it to you in Slack.

Probably the best place to start is to review these diffs, starting with species.h. After that, I'd recommend implementing the missing code in the methods listed above, but don't work about parallelization; just handle the multiple table collections with a for loop. I'll parallelize those core loops, particularly in simplify(), later on. If you see more stuff that you think I can do with a little coaching, I'm happy to help; but I doubt I'll be able to implement the split/join operations. I can review your diffs, though. :->

Oh, and of course you'll want to merge in the current tskit head, with the PRs that are needed for this stuff. I imagine we won't merge this PR until the tskit folks have merged those PRs into a new C API release, just to be cautious.

Exciting to be making progress on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant