Multichrom example #2665
Conversation
Great! I'll take a closer look in a bit.
Super cool. Good to know about helgrind, too; I will look into that too! :->
I guess this is relevant to #176, and might inform how we deal with multiple chromosomes.
As I understand it, we could choose to make the two things tied together – to make the separate tree sequences begin/end at chromosome boundaries – but we don't have to do that, and there are pretty strong reasons not to do that, perhaps. (For example, you want to be able to keep multiple tree sequences to speed up even single-chromosome models; and also, chromosomes will be of very different lengths, whereas you want to divide the work of simplification as evenly as you can across your multiple tree sequences; and also, the number of chromosomes you are simulating will often not match the number of threads you have available for simplification.) So I would tentatively lean towards not connecting these two things.
I think it's not exclusive. If you want to represent multiple chromosomes, then this gives a nice way to do it. But there's no reason why you couldn't also do it on a single chromosome, or indeed on part of a chromosome within a multi-chromosome set. But I do take your point that there's an elegance in keeping the two things completely independent.
This is really about low-level C API stuff for now - how we might (or might not) surface this to higher-level things like representing multiple chromosomes is something we should wait and see about (which is, I think, what we're all saying!).
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (main vs #2665):
- Coverage: 89.76% → 89.76% (-0.01%)
- Files: 30 → 29 (-1)
- Lines: 29892 → 29886 (-6)
- Branches: 5803 → 5803
- Hits: 26832 → 26826 (-6)
- Misses: 1753 → 1753
- Partials: 1307 → 1307

Flags with carried forward coverage won't be shown.
@petrelharp @bhaller - I realised there was an easy way to join the tables by appending all the edges to the first table, sorting, and then squashing adjacent edges. This should be equivalent to the haploid_wright_fisher example now (but that'll need to be verified).
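The append-sort-squash step described above can be sketched in a few lines. This is a self-contained toy, not the PR's actual code: `edge_t` and `squash_edges` are hypothetical stand-ins, and the real code would operate on tskit's edge table (which provides `tsk_edge_table_squash` for exactly this merge of abutting, same-parent/child edges):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal edge record; the real code uses tskit's edge table. */
typedef struct {
    double left, right;
    int parent, child;
} edge_t;

/* Squash adjacent edges in a sorted array: merge consecutive edges that
 * share parent and child and abut exactly. Returns the new length. */
static size_t
squash_edges(edge_t *edges, size_t n)
{
    size_t j, k = 0;

    if (n == 0) {
        return 0;
    }
    for (j = 1; j < n; j++) {
        if (edges[k].parent == edges[j].parent && edges[k].child == edges[j].child
            && edges[k].right == edges[j].left) {
            edges[k].right = edges[j].right; /* extend the existing run */
        } else {
            edges[++k] = edges[j]; /* start a new run */
        }
    }
    return k + 1;
}
```

After appending all per-chromosome edges to one table and sorting, a pass like this leaves a single edge wherever a parent transmitted a contiguous stretch across chromosome boundaries.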
Ah, probably need to set the sample flags for the final generation. |
One more update - I've marked the sample flags and updated the original WF example so that it also indexes the output, and I think the outputs are now identical up to node reordering:
We don't reorder the samples from 0 - n in the new version. If you do want this you can run simplify on it again and it'll renumber the output. (You could do this more cheaply too if you wanted to, for compatibility purposes.)
I think we can meaningfully compare the timings of this with haploid_wright_fisher now @petrelharp and @bhaller
I think I know what's going on with the disappointing scaling here. Here's an example from running the code:

The numbers for the multichromosome example are numbers of edges before and after simplification. Notice that each worker has about one-half the total number of edges: there were 20M edges for the whole simulation (2 edges * 10K individuals * 1K generations), and each of the 8 workers has 10M edges to deal with. This is because: suppose that a chromosome has been inherited in …

Now, what does simplify have to do under the hood? It's got a data structure that records the collection of ancestral segments that it's tracking; and then it goes through edges one at a time and deals with them. This means that if the cost of dealing with each edge is O(1), then our maximal speedup would be …

However, the work required by simplify's data structure does scale with the number of segments it is tracking, to some degree: although simplify has to look at all those edges, it only has to actually do something when it encounters an edge that some of its segments move along. If this were a substantial contribution to runtime then we might expect better speedups. The things we'd need to know are (a) what portion of simplify's work is "do something to the segments" versus "look at the next edge"; and (b) once we divide up a chromosome into chunks, how much does that reduce the proportion of its edges through which a segment of ancestry passes? I'm not very clear on that point, but I suspect the answer to (b) is "not much"; if so, then there's no hope for a speedup.

However, this approach would work just fine if each worker got literally one chromosome, since then edge breakpoints would usually (well, half the time) fall exactly on the divisions between workers. I may modify the code here to have a look at that case, and see if we have wins there.
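To make the "each worker sees about half the edges" observation concrete, here is a self-contained toy (all names hypothetical): an edge that spans from a recombination breakpoint to a chromosome end overlaps many of the W equal chunks, so splitting the genome into W pieces does not split the edge list into W pieces - every per-chunk table gets a copy of each edge that intersects its chunk.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double left, right;
} interval_t;

/* Count how many of the given edge intervals overlap chunk `c` of `W`
 * equal chunks of the unit interval [0, 1). */
static size_t
edges_in_chunk(const interval_t *edges, size_t n, int W, int c)
{
    double lo = (double) c / W, hi = (double) (c + 1) / W;
    size_t j, count = 0;

    for (j = 0; j < n; j++) {
        if (edges[j].left < hi && edges[j].right > lo) {
            count++;
        }
    }
    return count;
}
```

With one crossover per gamete (two edges, each covering roughly half the genome), every one of the 8 chunks is overlapped by about half of the edges, matching the observed 10M-of-20M per-worker counts.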
Aha! That's great @petrelharp, it really clarifies things 👍
I'm thinking it's worth fixing up the multichrom example to merge in? @bhaller is enthusiastic about getting this into SLiM at some point. I think this will just require switching over …
OK, I'll fix it up when I get a chance. |
I've updated this so that it's a clean build against upstream now @petrelharp. I think it needs a bit of documentation though - I don't really follow what the algorithm is doing. Could you update it with some comments to help clarify what the model is, etc.?
I think the docs build failure is because of a stale cache. The simplest thing is probably to create a new PR. I've squashed my original commits down to one.
I'm away on vacation for a couple of weeks, but:
Hi @benjeffery and @jeromekelleher. Do you really want to close this? It remains a useful example; I'm going to be consulting it carefully in the multichromosome work I'm doing in SLiM right now, in fact, and I can imagine other folks finding it interesting and useful too. I'm not saying you need to merge it necessarily, now or perhaps ever, but if it's closed, it seems to me that a fair bit of very useful knowledge and discussion will become much harder to find. Reopening so this comment doesn't get lost, but if you really want to close it, I won't get into a close/reopen war with you. :-> |
It's fairly esoteric knowledge @bhaller which I'm not sure is worth keeping a PR open in perpetuity for. We've learned the key lessons from it, I think, and moved on? |
But anybody who wants to actually use tskit in a multithreaded manner (which would be me, and very probably others, sooner or later) will continue to find it useful. Which is the point of an example, isn't it? Why purge it from the knowledge pool? |
If it's useful we should merge it I guess - I didn't think too hard about it tbh. Keeping it here in limbo isn't an answer. Does anyone want to have a look over and make suggestions for what would be needed to make this a useful documentation example? |
I nominate @petrelharp :->. In all seriousness, he knows the tskit ecosystem way better than I do, knows your doc system, and has been involved in this all along. Can I volunteer you, Peter? :-> |
I think the proposal is to add something to the docs, here. I could do that, sure. I'm imagining just talking through, like "we can have the tables for each chromosome share the same node and individual tables; here's how we do that". |
I can finish it up too @petrelharp, if you'd prefer to have a look through and post some review comments. |
Go for it! |
for (j = 0; j < num_chroms; j++) {
    if (j > 0) {
        /* Must not double free node table columns. */
        memset(&tcs[j].nodes, 0, sizeof(tcs[j].nodes));
As I mentioned on Slack, it seems safer to keep the shared tables zeroed out all the time except when they need to be there, to avoid accidental modification of one of the shared tables, use of stale data, stale pointers, etc.; there are lots of risks here. For SLiM I wrote a method that copies the main table collection's shared tables (we're sharing node, individual, and population) into the other table collections, and another method that zeros out those copies. Then I bracket operations like simplification and writing to disk with those methods. All the rest of the time the copies of the shared tables are zeroed for safety. Just a suggestion; might point people in a safer direction, even though it doesn't matter for this model as it is here.
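The "keep shared tables zeroed except when needed" pattern suggested above can be sketched with a pair of bracketing helpers. This is a hypothetical, self-contained illustration: `node_table_t`, `share_nodes`, and `unshare_nodes` stand in for tskit's `tsk_node_table_t` and the SLiM methods described, and only the struct-copy/zero mechanics are shown:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the struct-of-columns shape that matters
 * for the sharing trick (the real type is tsk_node_table_t). */
typedef struct {
    double *time;
    size_t num_rows;
} node_table_t;

/* Before a bracketed operation (simplify, writing to disk), copy the main
 * collection's shared table struct into a per-chromosome collection. The
 * column pointers are shared, not duplicated. */
static void
share_nodes(node_table_t *dest, const node_table_t *src)
{
    memcpy(dest, src, sizeof(*dest));
}

/* Afterwards, zero the copy again so stale pointers cannot be used by
 * accident and the shared columns cannot be double-freed. */
static void
unshare_nodes(node_table_t *dest)
{
    memset(dest, 0, sizeof(*dest));
}
```

Bracketing simplification with `share_nodes(...)` / `unshare_nodes(...)` means the only window in which two collections alias the same columns is while the operation that needs the aliasing is actually running.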
printf("Simplify %lld nodes\n", (long long) tcs[0].nodes.num_rows);
sort_and_simplify_all(tcs, num_chroms, samples, N);
for (j = 0; j < num_nodes; j++) { |
rather than this loop, I used calloc() for keep_nodes; the kernel can provide zeroed pages faster than any loop can, even if the compiler turns it into memset()...
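The `calloc()` suggestion looks like the following in isolation (a minimal sketch; `alloc_keep_flags` and the flag type are hypothetical, the real code's array is `keep_nodes`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* calloc returns zero-initialised memory directly, so no explicit
 * clearing loop is needed: every node starts out marked "drop". */
static bool *
alloc_keep_flags(size_t num_nodes)
{
    return calloc(num_nodes, sizeof(bool));
}
```

The caller then only sets the flags for nodes it wants to keep, and frees the array when done.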
for (j = 0; j < num_chroms; j++) {
    edge_child = tcs[j].edges.child;
    edge_parent = tcs[j].edges.parent;
    for (k = 0; k < tcs[j].edges.num_rows; k++) {
Here and at line 160, I put `tcs[j].edges.num_rows` into a local temporary; I think the compiler might have to refetch the value from memory every time through, otherwise, since it might not be smart enough to realize that there is not a `restrict` issue here (i.e., that the loop's work is guaranteed not to change the value of `num_rows` through a pointer write)...
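The hoisting being suggested looks like this in isolation (a self-contained sketch with hypothetical types; the real loop walks `tcs[j].edges`):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the columnar edge table layout. */
typedef struct {
    int *child;
    size_t num_rows;
} edge_cols_t;

static long
sum_children(const edge_cols_t *edges)
{
    /* Hoist the row count into a local once, so the compiler need not
     * re-load edges->num_rows on every iteration in case a write through
     * `child` could alias it. */
    size_t k, n = edges->num_rows;
    long total = 0;

    for (k = 0; k < n; k++) {
        total += edges->child[k];
    }
    return total;
}
```

This is safe here precisely because the loop body is guaranteed not to change the row count; if the loop appended rows, the bound would have to be re-read.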
    ret = tsk_table_collection_check_integrity(&tcs[j], 0);
    check_tsk_error(ret);
}
for (j = 0; j < N; j++) {
Here I actually have a question, and maybe a comment would be useful. The question is: is the result of filtering the node table ourselves, with `tsk_node_table_keep_rows()` as done here, different from letting simplify do the filtering for us? Is the order of the nodes that results different? I ask because SLiM's current code assumes that the N samples are in positions 0:(N-1) in the node table after simplification. The code here kind of lumps the samples together with all of the nodes referenced by edges, and tells tskit "keep all of these", without distinguishing the two; so I'm guessing that SLiM can no longer make that 0:(N-1) assumption, and needs to remap the N samples as you do here, using `node_id_map`. But I'd like to understand why. Does simplification filter the node table in a different and more sophisticated way? Or is there some other reason why SLiM's assumption happened to be valid? @petrelharp wrote that code for SLiM, I think, so maybe he'll want to weigh in on this.
The reason for doing the node filtering ourselves here is that the different per-chromosome simplify operations will come up with different node mappings. That's the only real reason for using `tsk_node_table_keep_rows`. You can't assume that the samples map to 0,...,n-1 in this case, no (and you would need to do the remapping).
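The remapping being discussed can be illustrated with a small self-contained sketch (names hypothetical): filtering a node table gives an old-id to new-id map, with dropped rows mapping to -1, and sample ids must be pushed through that map. This mirrors, if I read the API right, the id map that `tsk_node_table_keep_rows` reports through its `id_map` argument.

```c
#include <assert.h>
#include <stddef.h>

/* Build the old-id -> new-id map produced by filtering a node table:
 * kept rows get consecutive new ids, dropped rows map to -1. */
static void
build_id_map(const int *keep, size_t num_nodes, int *id_map)
{
    size_t j;
    int next = 0;

    for (j = 0; j < num_nodes; j++) {
        id_map[j] = keep[j] ? next++ : -1;
    }
}
```

After filtering, a stored sample id `s` becomes `id_map[s]`; since the samples are not distinguished from other kept nodes, nothing guarantees they land in positions 0:(N-1).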
}

void
simplify_tables(tsk_table_collection_t *tcs, int num_chroms, int *samples, int N)
I'd suggest a name like `samples_count` instead of `N`, but maybe that's just me :->
Stacked on #2663 (and #2619).

@petrelharp @bhaller @molpopgen You might be interested in this: the new file c/examples/multichrom_wright_fisher.c gives a simple example of multi-chromosome simulations with parallel simplify (most of the other stuff is from other in-flight PRs).

This adds a simple example of running a multi-chromosome forwards simulation and running the sort/simplify steps in parallel using pthreads. This was really just to make sure that the basic idea of sharing the node table across the different table collections would work, and that we were getting the details right with the new TSK_SIMPLIFY_NO_FILTER_NODES and TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS flags.
Running this through helgrind gives no errors (I think it's pretty good at picking up data races):
So, hooray, @molpopgen's ingenious idea works and with the infrastructure in #2663 and #2619 we can do it pretty easily!
Another thing I ended up doing here in 31c47e4 is to add a basic implementation of "delete rows", which is an operation we badly need in the API I think (mentioned here: #1034 (comment)). I think it's worth talking about that separately, so I'll open an issue for it.
We can probably let this sit for a while now anyway, and should try to clear up some other stuff first.