-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Illogical Lineage Counts in algorithms.py #2242
Comments
Thanks @agushin101, I think @GertjanBisschop is taking a look. This is possibly related to #2217 - we didn't do a very good job of keeping the Python prototype up to date with the C implementation as bugs got identified and fixed, and there are inconsistencies between the two. |
Thanks @agushin101, this is insightful! This definitely exposes a couple of issues. |
We adjust the recombination rate by dividing it by the product of the population size and the segment length, i.e., 0.1 / (population size * segment length). However, there are no recombination events observed during the sweep. If everything runs as expected, what should the sweep population size be at the beginning? In some runs, I observed it became [0,0]. Shouldn't it be [0,1], considering there should be a beneficial ancestor at the start of the sweep? Thank you for your time. |
The simulation is a structured coalescent based on a pre-simulated allele frequency trajectory. This does not guarantee that at the end of the simulated frequency trajectory everything will have coalesced. Similarly, all samples might have coalesced before the end of the simulated trajectory. Simply by chance you might have drawn some samples that coalesced right end the end of the sweep (so in the beginning when looked backwards in time). |
Thank you for the insights. I understand the simulation is based on a structured coalescent model with a pre-simulated allele frequency trajectory. However, my concern is specifically about the initial conditions of the sweep. We deliberately start the sweep with only one beneficial allele, implying that all lineages should eventually coalesce into this single ancestor. If the simulation sometimes ends with more than one lineage remaining, doesn't this indicate a potential issue in the coalescent function or the way it's handling these initial conditions? It seems like the simulation should consistently converge to a single lineage in this scenario, given the starting premise of a single beneficial allele. |
This is something I've been confused about too, FWIW! |
Yup, I get your point, this is somewhat confusing. The original Braverman paper on which this implementation is based gives some insight though: "The deterministic frequency trajectory starts at frequency 1 - E and ends at E . KAPLAN et al. (1989) chose E= 5 / (2Nes) based on their calculation of the point at which the probability of extinction of the new mutant due to stochastic effects becomes approximately zero". This means that if the start and end frequency we provide become too close to 0 (and 1 respectively), then we ignore the stochastic effects of the sweep. This is where our approximation no longer holds and so we shouldn't interpret the start frequency as a strict bound. |
sims.ipynb.zip
Above is a Notebook describing the issue in more detail. The summary is that: when running basic selective sweeps using algorithms.py's framework, there is some illogical output. The sweep_pop_sizes array, which I've interpreted to represent the lineage counts for wild type and mutant respectively, does not always converge. When the --population-sizes flag is not specified, it never converges to 1 lineage. When it is specified, it sometimes converges and sometimes does not, getting stuck at 2 mutant lineages, even when recombination is set to 0. I was wondering if someone could clear up why this is happening.
Thanks!
The text was updated successfully, but these errors were encountered: