A nice idea for accuracy of inference at different timepoints #2

Open
hyanwong opened this issue Oct 6, 2019 · 2 comments

@hyanwong
Member

hyanwong commented Oct 6, 2019

Anders Eriksson suggested a nice way of testing whether our inference methods do well or poorly for different heights in the TS.

We use the (infinite-sites) mutations to identify corresponding edges in the true and the inferred TS. Then, since the tips under each matched edge are guaranteed to be the same, we can calculate a topology difference between the subtrees rooted at the node below that edge.
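
A minimal sketch of this idea on a single toy tree, written in plain Python rather than against any tree sequence library. The data layout (a child -> parent dict per tree, and a site -> node dict saying which node each mutation sits above) and all function names are assumptions made for illustration. The issue does not name a topology metric, so the sketch swaps in a simple one: the number of clades found in one subtree but not the other (a rooted Robinson-Foulds-style count).

```python
from collections import defaultdict


def children_map(parent):
    """Invert a child -> parent mapping into parent -> [children]."""
    kids = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            kids[par].append(child)
    return kids


def clades_below(parent, node):
    """Return every clade (frozenset of tip ids) in the subtree rooted at node."""
    kids = children_map(parent)
    clades = set()

    def tips(u):
        below = kids.get(u, [])
        if not below:
            found = frozenset([u])  # u is a tip
        else:
            found = frozenset().union(*(tips(c) for c in below))
        clades.add(found)
        return found

    tips(node)
    return clades


def subtree_differences(true_parent, inferred_parent, true_mut, inferred_mut):
    """For each site present in both trees, compare the subtrees below the mutation.

    Returns site -> number of clades present in exactly one of the two subtrees.
    Sites whose tip sets differ are skipped (assumption: they are not comparable).
    """
    result = {}
    for site in true_mut.keys() & inferred_mut.keys():
        a = clades_below(true_parent, true_mut[site])
        b = clades_below(inferred_parent, inferred_mut[site])
        # The largest clade in each set is the full set of tips below the node.
        if max(a, key=len) != max(b, key=len):
            continue
        result[site] = len(a ^ b)
    return result


# Toy data (hypothetical): tips 0..3, internal nodes 4, 5, root 6.
true_parent = {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6, 6: None}      # ((0,1),2),3
inferred_parent = {1: 4, 2: 4, 4: 5, 0: 5, 5: 6, 3: 6, 6: None}  # ((1,2),0),3
true_mut = {10: 5, 20: 3}      # site -> node the mutation sits above
inferred_mut = {10: 5, 20: 3}

diffs = subtree_differences(true_parent, inferred_parent, true_mut, inferred_mut)
```

To look at accuracy by height, each per-site difference could then be binned by the true time of the node below the matched edge.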

@petrelharp
Contributor

Nice. This also gives us a way of identifying nodes: nodes are ancestral haplotypes, so we can ask whether mutations that originated on a given haplotype in one tree sequence also originated on the same haplotype in the other.
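
One way to read this check, sketched in plain Python under assumptions: each tree sequence is reduced to a hypothetical site -> node dict (the node on which the mutation at that site originated). Node ids need not match between the two tree sequences, so the sketch only compares co-occurrence: for every pair of shared sites, do the two tree sequences agree on whether both mutations arose on the same node? The function name and the agreement score are illustrative, not an established metric.

```python
from itertools import combinations


def co_origin_agreement(true_mut, inferred_mut):
    """Fraction of site pairs on which both mappings agree about co-origin.

    A pair "agrees" if the two mutations share a node in both tree sequences,
    or share a node in neither.
    """
    shared = sorted(true_mut.keys() & inferred_mut.keys())
    agree = 0
    total = 0
    for s1, s2 in combinations(shared, 2):
        same_in_true = true_mut[s1] == true_mut[s2]
        same_in_inferred = inferred_mut[s1] == inferred_mut[s2]
        total += 1
        if same_in_true == same_in_inferred:
            agree += 1
    # With fewer than two shared sites there is nothing to disagree about.
    return agree / total if total else 1.0


# Toy data (hypothetical): the inferred tree sequence lumps all three
# mutations onto one ancestral haplotype, the true one splits them over two.
true_site_node = {10: 5, 20: 5, 30: 4}
inferred_site_node = {10: 7, 20: 7, 30: 7}

score = co_origin_agreement(true_site_node, inferred_site_node)
```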

@hyanwong
Member Author

Another possibility, as just discussed with Michelle Kendall, is particularly useful for tsinfer, where we have a known (simulated) TS with branch lengths and an inferred topology with arbitrary branch lengths. We take all nodes from the known topology whose times fall between certain timepoints, and select all the pairs of samples (with left and right coordinates if the node spans more than one tree) that split at this node, i.e. whose most recent common ancestor is that node. We then calculate a topology-only pairwise distance metric (e.g. KC) based on only those pairs over that portion of the genome.
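
A rough single-tree sketch of this, in plain Python with a hypothetical child -> parent dict per tree and a node -> time dict for the known tree. It ignores left and right genome coordinates entirely. The distance is an assumption: a KC-style quantity with lambda = 0 (the Euclidean distance between vectors of root-to-MRCA edge counts), restricted to the selected pairs. All names are illustrative.

```python
from itertools import combinations
from math import sqrt


def ancestors(parent, u):
    """Path from u up to the root, inclusive of both ends."""
    path = [u]
    while parent[u] is not None:
        u = parent[u]
        path.append(u)
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of a and b."""
    seen = set(ancestors(parent, a))
    for u in ancestors(parent, b):
        if u in seen:
            return u
    raise ValueError("nodes are not in the same tree")


def depth(parent, u):
    """Number of edges between u and the root."""
    return len(ancestors(parent, u)) - 1


def pairs_in_window(parent, time, samples, t_min, t_max):
    """Sample pairs whose MRCA in the known tree has t_min <= time < t_max."""
    return [
        (a, b)
        for a, b in combinations(sorted(samples), 2)
        if t_min <= time[mrca(parent, a, b)] < t_max
    ]


def restricted_topology_distance(true_parent, inferred_parent, pairs):
    """KC-style (lambda = 0) distance using only the given sample pairs."""
    total = 0
    for a, b in pairs:
        d_true = depth(true_parent, mrca(true_parent, a, b))
        d_inferred = depth(inferred_parent, mrca(inferred_parent, a, b))
        total += (d_true - d_inferred) ** 2
    return sqrt(total)


# Toy data (hypothetical): samples 0..3, internal nodes 4, 5, root 6.
true_parent = {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6, 6: None}      # ((0,1),2),3
inferred_parent = {1: 4, 2: 4, 4: 5, 0: 5, 5: 6, 3: 6, 6: None}  # ((1,2),0),3
true_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}
samples = [0, 1, 2, 3]

mid_pairs = pairs_in_window(true_parent, true_time, samples, 1.5, 2.5)
mid_distance = restricted_topology_distance(true_parent, inferred_parent, mid_pairs)
root_pairs = pairs_in_window(true_parent, true_time, samples, 2.5, 3.5)
root_distance = restricted_topology_distance(true_parent, inferred_parent, root_pairs)
```

Repeating this over a series of time windows would give a distance per window, so inference accuracy can be compared between recent and ancient parts of the tree.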
