-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat : add reciprocal monophyly function to Tree class #2974
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2974 +/- ##
==========================================
+ Coverage 89.62% 90.45% +0.82%
==========================================
Files 29 26 -3
Lines 30170 21619 -8551
Branches 5867 4372 -1495
==========================================
- Hits 27041 19555 -7486
+ Misses 1790 1160 -630
+ Partials 1339 904 -435
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
See the docs here for details on building python extension module: https://tskit.dev/tskit/docs/stable/development.html#id1 |
Here's a quick sketch of another approach to doing this @Silloky: import msprime
demography = msprime.Demography()
demography.add_population(name="A", initial_size=1_000)
demography.add_population(name="B", initial_size=1_000)
demography.add_population(name="C", initial_size=10_000)
demography.add_population_split(time=10000, derived=["A", "B"], ancestral="C")
ts = msprime.sim_ancestry(samples={"A": 2, "B": 3}, demography=demography, random_seed=12)
print(ts)
print(ts.draw_text())
for pop_id in range(ts.num_populations):
pop_samples = ts.samples(population=pop_id)
if len(pop_samples) == 0:
continue
for tree in ts.trees(tracked_samples=pop_samples):
mrca = tree.mrca(*pop_samples) # Inefficient
monophyletic = len(pop_samples) == tree.num_tracked_samples(mrca)
print(pop_id, mrca, monophyletic) Here, we're computing a true/false value for whether a population (or more generally any sample subset) is monophyletic in each tree. We do this using the num_tracked_samples, which is efficiently computed for each node in the C layer. The only inefficient bit here is how we're computing the MRCA of all the samples (but in most cases would be fine). I think a good API for this would be a TreeSequence method that looked something like this: def is_monophyletic(self, samples):
"""
Returns a boolean array of length ts.num_trees which is True if the corresponding tree is monophyletic
for the given set of samples. That is, if ``u`` is the MRCA of all the specified samples in a given tree, no
other samples are present in the subtree rooted at u.
"""
# Code that looks like the above. How does this look to you? If the API matches your needs, then we can go ahead with implementation and testing, and then think about how we can do the MRCA more efficiently (if we need to)? |
Hi, thank you very much for your answer. |
Hi @Silloky! Are you still planning to work on this? |
Hi @benjeffery, |
Description
This PR adds a method to the
Tree
class in the Python API :is_recip_monophylectic()
.It tests the current tree for reciprocal monophyly, a population genomics phenomenon as described here.
The key algorithm can be summarized like this : if each population's MRCA's children are the same as the all the population's leaves, all populations are reciprocally monophyletic.
PR Checklist:
ModuleNotFoundError: No module named '_tskit'
when importingtskit
from my cloned fork)Also, this is only implemented in Python, not C, so someone needs to do that as I do not know C.