Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document traversal arrays and illustrate usage #1788

Closed
jeromekelleher opened this issue Oct 13, 2021 · 6 comments · Fixed by #1952
Closed

Document traversal arrays and illustrate usage #1788

jeromekelleher opened this issue Oct 13, 2021 · 6 comments · Fixed by #1952
Labels
enhancement New feature or request Python API Issue is about the Python API
Milestone

Comments

@jeromekelleher
Copy link
Member

#1704 added numpy array versions of the basic tree traversal orders. Document these and provide an example of how they might be used. It would be neat if we could illustrate with some phylo algorithm that can be implemented directly with (non-obscure) numpy calls.

@hyanwong
Copy link
Member

It would indeed be good to write the docstrings for these methods. Re examples, I've looked though the standard docs and we don't really document examples of using tree traversals there. The main text describing traversals is in the tutorials. I think we should describe an example there and provide a link from the docstring.

@hyanwong
Copy link
Member

hyanwong commented Nov 30, 2021

Here's a trivial example: count the average node arity of a tree

import msprime
import tskit
import numpy as np
ts = msprime.simulate(10, recombination_rate=3)
tree = ts.first()
parent, count = np.unique(tree.parent_array[tree.postorder()], return_counts=True)
print(f"Average arity is {count[parent != tskit.NULL].mean()}")

On a 100,000 tip tree this seems to be about 6 times faster than doing

np.mean([tree.num_children(n) for n in tree.nodes()])

A bit crap because it doesn't help with incremental calculations, but it is a very simple example, which is nice.

@jeromekelleher
Copy link
Member Author

That's a nice example @hyanwong, I like it. Might as well use tree.preorder() though, as it's slightly faster and the ordering doesn't matter.

@jeromekelleher
Copy link
Member Author

Simplest example I can think of is

time = ts.tables.nodes.time
nodes = tree.preorder()
branch_length = time[tree.parent_array[nodes]] - time[nodes]
branch_length[tree.parent_array[nodes] == tskit.NULL] = 0
print(branch_length)
print(np.array([tree.branch_length(u) for u in tree.nodes()]))

This is tricky though as we have to be careful about the branch lengths of the roots. That's possibly a good thing to warn people about if they're using these arrays though, as Python will quite happily use -1 indexes!

@mergify mergify bot closed this as completed in #1952 Dec 1, 2021
@jeromekelleher jeromekelleher reopened this Dec 1, 2021
@hyanwong
Copy link
Member

hyanwong commented Dec 1, 2021

That's a nice example @hyanwong, I like it.

Thanks. Example (with preorder) incorporated into #1963 with a "todo" to move to a reasonable location

@hyanwong
Copy link
Member

hyanwong commented Dec 1, 2021

I think we can close this, as my example is in the docs, and @jeromekelleher 's example is linked to from tskit-dev/tutorials#150 (comment)

@hyanwong hyanwong closed this as completed Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Python API Issue is about the Python API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants