Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial tree perf #5

Merged
merged 5 commits into from
Jul 26, 2021
Merged

Conversation

jeromekelleher
Copy link
Member

Making a start on #2.

Puts in some infrastructure comparing the tskit version of hartigan parsimony with the numba one. The idea would be to add (maybe) a naive C++ version using pointers, a more sophisticated C++ version, and a Pythran version. There's probably a better way to display the data, but here's the initial version of the plot anyway:

tree-performance

We could probably do a bit more work on the numba version, I haven't made any real effort to profile it.

@molpopgen, how does this look? I guess what we'd want for the C++ versions is to take an executable that takes a tree sequence file as input (and a max_sites), and prints the mean and variance of the time taken to run the parsimony algorithm for the first max_sites to stdout. We probably don't need to bother with the variance tbh, but I thought I'd stick it in in case we ever needed it. We can then wrap them nicely in this script.

@molpopgen
Copy link
Member

I have a start over here. Minor hiccup is needing a non-recursive postorder method for node iteration. Once this matures a bit, I'll get it over here.

@jeromekelleher
Copy link
Member Author

It would be nice to get all the code in here if we could - I was imagining a simple Makefile with a few .cc files what we could run directly in here? Is it possible to do something similar with Rust?

@benjeffery
Copy link
Member

Great stuff - I had a play around but couldn't get the numba code to go any faster, seems numba is already doing some of the tricks I tried and fastmath doesn't help here.

@molpopgen
Copy link
Member

I think this is a good start. I think we may have a hard time matching this setup exactly in other languages, but we can see how it goes, I guess.

@jeromekelleher jeromekelleher merged commit 54fe887 into tskit-dev:main Jul 26, 2021
@jeromekelleher jeromekelleher deleted the initial-tree-perf branch July 26, 2021 15:14
@jeromekelleher
Copy link
Member Author

Sounds good @molpopgen - what I'll do is write a C version of what the other languages should do, which should provide a reasonable template, dealing with the annoying stuff like decoding the genotypes etc.

@molpopgen
Copy link
Member

Sounds good. I'll try to concoct a C++ version of the Usher-like trees that isn't too hackish.

@benjeffery
Copy link
Member

One thing to add is that I think numba would perform closer if it had a non-recursive implementation as in tskit.

@jeromekelleher
Copy link
Member Author

One thing to add is that I think numba would perform closer if it had a non-recursive implementation as in tskit.

Perhaps - we could have "numba-recursive" and "numba-loops"? Not too hard, if a bit fiddly to do. I'd be surprised if it makes much difference though, my earlier experience was that recursion was surprisingly good in numba. Go nuts if you want to try it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants