---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.12
    jupytext_version: 1.9.1
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

(sec_intro)=

Welcome!

This site contains a number of tutorials to develop your understanding of genetic genealogies, ancestral recombination graphs, and the succinct tree sequence storage format, as implemented in tskit: the tree sequence toolkit. Also included are a number of tutorials showing advanced use of software programs, such as msprime, that form part of the tskit ecosystem.

:tags: [remove-input]
import math
import msprime

def make_7_tree_4_tip_ts():
    ts = msprime.sim_ancestry(
        4, ploidy=1, random_seed=889, sequence_length=1000, recombination_rate=0.001)
    ts = msprime.sim_mutations(ts, rate=2e-3, random_seed=123)

    # Check we have picked a random seed that gives a nice plot of 7 trees
    tip_orders = {
        tuple(u for u in t.nodes(order="minlex_postorder") if t.is_sample(u))
        for t in ts.trees()
    }
    topologies = {tree.rank() for tree in ts.trees()}
    assert tip_orders == {(0, 1, 2, 3)} and len(topologies) > 1 and ts.num_trees == 7

    return ts


ts = make_7_tree_4_tip_ts()

# Set some parameters: these can be adjusted to your liking
tree_width = 80
height = 200  # Normal height for tree + x-axis
y_step = 20  # Stagger between trees (i.e. 0 for all trees in a horizontal line)
skew = 0.7  # How skewed the trees are, in radians

width = tree_width * ts.num_trees + 20 + 20  # L & R margins in draw_svg = 20px
angle = math.atan(y_step/tree_width)
ax_mv = y_step, (ts.num_trees - 1) * y_step - 90 + math.tan(skew) * (tree_width * .9)

# CSS transforms used to skew the axis and stagger + skew the trees
style = f".x-axis {{transform: translate({ax_mv[0]}px, {ax_mv[1]}px) skewY(-{angle}rad)}}"
for i in range(ts.num_trees):
    # Stagger each tree vertically by y_step, transforming the "plotbox" tree container
    style += (
        f".tree.t{i} > .plotbox " + "{transform:" +
        f"translateY({(ts.num_trees - i - 1) * y_step-85}px) skewY({skew}rad)" + "}"
    )

# Define a bigger canvas size so we don't crop the moved trees from the drawing
size = (width, height)
canvas_size = (width + y_step, height + math.tan(skew)*tree_width)

ts.draw_svg(size=size, x_scale="treewise", style=style, canvas_size=canvas_size)

If you are new to the world of tree sequences, we suggest you start with the first tutorial: {ref}`sec_what_is`.

:::{note}
Tutorials are under constant development. Those that are still a work in progress and not yet ready for use are shown in italics in the list of tutorials.

We very much welcome help developing existing tutorials or writing new ones. Please open or contribute to a GitHub issue if you would like to help out.
:::

Other sources of help

In addition to these tutorials, our Learn page lists selected videos and publications to help you learn about tree sequences.

We aim to be a friendly, welcoming open source community. Questions and discussion about using {program}`tskit`, the tree sequence toolkit, should be directed to the GitHub discussion forum. There are similar forums for other software in the tree sequence development community, such as msprime and tsinfer.

(sec_intro_running)=

Running tutorial code

You can run the tutorial code on your own computer, which lets you experiment with the examples provided. The recommended way to do this is from within a Jupyter notebook. As well as installing Jupyter, you will need to install the various Python libraries, most importantly tskit, msprime, numpy, and matplotlib. These and other packages are listed in the requirements.txt file, so a shortcut to installing the necessary software is:

python3 -m pip install -r https://tskit.dev/tutorials/requirements.txt
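After installing, it can be useful to confirm that the main libraries are visible to the Python interpreter that your notebook uses. The following is a minimal sketch using only the standard library. The helper name `installed_versions` and the `REQUIRED` tuple are illustrative and not part of the tutorials; the tuple contains only the four packages named above, whereas requirements.txt lists others as well.

```python
from importlib import metadata

# Packages named in the text above; requirements.txt may list more.
REQUIRED = ("tskit", "msprime", "numpy", "matplotlib")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any package is reported as not installed, check that the `pip` command above was run with the same Python interpreter that Jupyter is using.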

In addition, to run the {ref}`R tutorial<sec_tskit_r>` you will need to install the R reticulate library and, if running it in a Jupyter notebook, the IRkernel library. Both can be installed by running the following command within R:

install.packages(c("reticulate", "IRkernel")); IRkernel::installspec()

(sec_intro_downloading_datafiles)=

Downloading tutorial datafiles

Many of the tutorials use pre-existing tree sequences stored in the data directory. These can be downloaded individually from that link, or you can download them all at once by running the script stored in https://tskit.dev/tutorials/examples/download.py. If you are running the tutorial code from within a Jupyter notebook, you can load this script into a new cell using the %load magic command. Run the following in a Jupyter code cell:

%load https://tskit.dev/tutorials/examples/download.py

Running the resulting Python code should download the data files, then print "finished downloading" once all files have been downloaded. You should then be able to successfully run code such as the following:

import tskit
ts = tskit.load("data/basics.trees")
print(f"The file 'data/basics.trees' exists, and contains {ts.num_trees} trees")
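If loading fails, the most likely cause is that a file is missing from the `data` directory, or that the notebook is running in a different working directory. The sketch below shows one way to check this before loading anything. The helper `missing_files` is hypothetical rather than part of the tutorials, and only `basics.trees` is named in the text, so extend the list with whichever files a given tutorial needs.

```python
from pathlib import Path


def missing_files(filenames, data_dir="data"):
    """Return the names in `filenames` that do not exist inside `data_dir`."""
    directory = Path(data_dir)
    return [name for name in filenames if not (directory / name).is_file()]


# Only "basics.trees" is named in the text; add other file names as needed.
missing = missing_files(["basics.trees"])
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All listed data files are present")
```

The paths are resolved relative to the current working directory, which is the same convention used by the `tskit.load("data/basics.trees")` call above.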