-
Notifications
You must be signed in to change notification settings - Fork 86
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
3928078
commit 4dbefa6
Showing
1 changed file
with
146 additions
and
9 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,174 @@ | ||
--- | ||
jupytext: | ||
text_representation: | ||
extension: .md | ||
format_name: myst | ||
format_version: 0.12 | ||
jupytext_version: 1.9.1 | ||
kernelspec: | ||
display_name: Python 3 | ||
language: python | ||
name: python3 | ||
--- | ||
|
||
```{code-cell} | ||
:tags: [remove-cell] | ||
import msprime | ||
from IPython.display import SVG | ||
from matplotlib import pyplot as plt | ||
``` | ||
|
||
(sec_rate_maps)= | ||
# Rate Maps | ||
|
||
It is often useful to express a variable rate of some sort along the chromosome. | ||
For example, most organisms have mutation rates and recombination rates which vary | ||
depending on genomic position. Rates of this sort can be stored as an instance | ||
of a {class}`.RateMap`, and then passed as a parameter to the relevant `msprime` | ||
method. For instance, {func}`.sim_ancestry` accepts a {class}`.RateMap` as the | ||
`recombination_rate` parameter and {func}`.sim_mutations` accepts a {class}`.RateMap` | ||
as the `rate` parameter. | ||
|
||
A rate map is defined as a set of constant rates which apply piecewise | ||
over contiguous regions of the chromosome. An example is shown below, both as a | ||
table and as a plot of the rate at different genomic positions. | ||
|
||
```{eval-rst} | ||
.. todo:: Need to fill out the content here following existing patterns. | ||
```{code-cell} | ||
:tags: [hide-input] | ||
ratemap = msprime.RateMap( | ||
position=[0, 4000, 9000, 11000, 20000, 30000], | ||
rate=[0, 1, 6, 2, 1] | ||
) | ||
display(ratemap) | ||
plt.figure(figsize=(10, 4)) | ||
plt.stairs(ratemap.rate, ratemap.position) | ||
plt.title("Example RateMap with 5 rates, and a hotspot around position 10000") | ||
plt.xlabel("Genome position") | ||
plt.ylabel("Rate") | ||
plt.xlim(0,ratemap.sequence_length) | ||
plt.show() | ||
``` | ||
|
||
--- | ||
|
||
(sec_rate_maps_quickref)= | ||
|
||
## Quick reference | ||
|
||
{class}`.RateMap` | ||
: Vary rates along a genome | ||
: The RateMap class: an instance can be created from provided data | ||
|
||
**Creating RateMaps** | ||
**Creating rate maps** | ||
|
||
{meth}`.RateMap.uniform` | ||
: A RateMap with one rate | ||
: Create a RateMap with a single rate over the entire chromosome | ||
|
||
{meth}`.RateMap.read_hapmap` | ||
: Read in a RateMap from a file in "hapmap" format (common in recombination maps) | ||
|
||
{meth}`.RateMap.slice` | ||
: Slice out a portion of a RateMap | ||
: Create a new RateMap by slicing out a portion of an existing RateMap | ||
|
||
**Computing rates** | ||
|
||
{meth}`.RateMap.mean_rate` | ||
: Weighted average rate | ||
{attr}`.RateMap.mean_rate` | ||
: The weighted average rate | ||
|
||
{attr}`.RateMap.total_mass` | ||
: The cumulative sum of rates over the entire map | ||
|
||
{meth}`.RateMap.get_rate` | ||
: Compute rate at a position | ||
: Compute rate at a set of user-specified positions | ||
|
||
{meth}`.RateMap.get_cumulative_mass` | ||
: Compute the cumulative rate at a set of user-specified positions | ||
|
||
|
||
(sec_rate_maps_creating)= | ||
## Creating rate maps | ||
|
||
A rate map can be created by initializing an instance of the {class}`.RateMap` class | ||
with a list of _n+1_ positions and _n_ rates. Alternatively, uniform rate maps can be | ||
created as described below, or rates read in from a text file. The last position of a | ||
rate map is taken as the sequence length: it is your responsibility to ensure that this | ||
is the same length as the simulated chromosome to which the rate is applied. | ||
|
||
(sec_rate_maps_creating_uniform)= | ||
### Uniform rate maps | ||
|
||
The most basic rate map is a uniform rate over the entire sequence. This can simply be | ||
generated by {meth}`.RateMap.uniform`: | ||
|
||
```{code-cell} | ||
msprime.RateMap.uniform(length=1e8, rate=0.5) | ||
``` | ||
|
||
### Reading from a file | ||
|
||
Describe read_hapmap briefly here. | ||
|
||
(sec_rate_maps_masking_and_slicing)= | ||
### Masking and creating new maps via slicing | ||
|
||
Often, the rates at the start and end of the chromosome (i.e. in the telomeric regions) | ||
are unknown, and treated as zero. Therefore if there is a zero-rate region at the start | ||
or end of the list of provided rates, an alternative start position and/or end position | ||
can be defined when creating the map such that some or all of these terminal zero-rate | ||
regions are "masked out" (i.e. not counted) when calculating the mean rate over the | ||
entire map. | ||
|
||
This sort of zeroing-out can also be handy if only a small region of the rate map is | ||
needed, for instance, because simulation is focussed on a small region of a larger | ||
chromosome. The {meth}`.RateMap.slice` method does this by setting user-specified | ||
start and end positions, overwriting the rate to the left of the start and to the right | ||
of the end with zero. The benefit to doing this is that the original coordinate system | ||
is still used. For example, if you wish to simulate a 40 kb section of human chromosome | ||
1 from position 0.7Mb, but keep the standard chromosome 1 genomic coordinates, you could | ||
do this: | ||
|
||
```{code-cell} | ||
import io | ||
# First and last sections of human chr 1 build 36 genetic map taken from | ||
# https://ftp.ncbi.nlm.nih.gov/hapmap/recombination/latest/rates/ | ||
hapmap_chr1_snippet = io.StringIO("""chr position COMBINED_rate(cM/Mb) Genetic_Map(cM) | ||
1 72434 0.0015000000 0 | ||
1 78032 0.0015000000 0.0000083970 | ||
1 554461 0.0015000000 0.0007230405 | ||
1 554484 0.0015000000 0.0007230750 | ||
1 555296 0.0015000000 0.0007242930 | ||
1 558185 0.0015000000 0.0007286265 | ||
1 558390 0.0010000000 0.0007289340 | ||
1 711153 1.9757156611 0.0008816970 | ||
1 713682 2.0415081787 0.0058782819 | ||
1 713754 2.1264889217 0.0060252705 | ||
1 718105 2.1260252999 0.0152776238 | ||
1 719811 2.1911540342 0.0189046230 | ||
1 728873 2.1938678585 0.0387608608 | ||
1 730720 2.1951341342 0.0428129347 | ||
1 740098 1.3377804252 0.0633989027 | ||
1 742429 0.9035885389 0.0665172688 | ||
1 743132 0.9037201735 0.0671524916 # Next 256440 lines snipped | ||
1 247184904 0.6238867396 278.0908415443 | ||
1 247185273 0 278.0910717585""") | ||
large_ratemap = msprime.RateMap.read_hapmap(hapmap_chr1_snippet) | ||
sliced_ratemap = large_ratemap.slice(700e3, 740e3) | ||
# Simulate only 40kb | ||
ts = msprime.sim_ancestry(10, recombination_rate=sliced_ratemap) | ||
# Only keep the trees in the sliced region. Then if we then mutate, we will not | ||
# create mutations in the flanking regions | ||
ts = ts.keep_intervals([[700e3, 740e3]]) | ||
``` | ||
|
||
Because the recombination rate is 0 in regions below position 700 000 and above position | ||
740 000, `msprime` has not had to put effort into simulating recombination (and its | ||
effect on ancestry) in these regions, so the simulation completes very quickly, even | ||
though the resulting tree sequence has a `sequence_length` of 247 megabases. | ||
|
||
(sec_rate_maps_creating_hotspots)= | ||
### Hotspots | ||
|
||
### Extracting values at specific positions | ||
|
||
Document the used of `get_cumulative_mass` to convert physical to genetic coordinates. |