diff --git a/previews/PR19/.documenter-siteinfo.json b/previews/PR19/.documenter-siteinfo.json index 80fc57b..6b1e61a 100644 --- a/previews/PR19/.documenter-siteinfo.json +++ b/previews/PR19/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-11T21:24:37","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-11T21:42:30","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/previews/PR19/index.html b/previews/PR19/index.html index 9654bee..038fd4e 100644 --- a/previews/PR19/index.html +++ b/previews/PR19/index.html @@ -1,2 +1,2 @@ -home · PhyloCoalSimulations.jl

PhyloCoalSimulations

PhyloCoalSimulations is a Julia package to simulate phylogenies under the coalescent. It depends on PhyloNetworks for the phylogenetic data structures, and manipulation of phylogenies.

References: please see bibtex entries here.

For a tutorial, see the manual:

For help on individual functions, see the library:

+home · PhyloCoalSimulations.jl

PhyloCoalSimulations

PhyloCoalSimulations is a Julia package to simulate phylogenies under the coalescent. It depends on PhyloNetworks for the phylogenetic data structures, and manipulation of phylogenies.

References: please see bibtex entries here.

For a tutorial, see the manual:

For help on individual functions, see the library:

diff --git a/previews/PR19/lib/internal/index.html b/previews/PR19/lib/internal/index.html index fe7e24a..2718488 100644 --- a/previews/PR19/lib/internal/index.html +++ b/previews/PR19/lib/internal/index.html @@ -1,7 +1,7 @@ -internals · PhyloCoalSimulations.jl

internal documentation

Documentation for PhyloCoalSimulations's internal functions. These functions are not exported and their access (API) should not be considered stable. But they can still be used, like this for example: PhyloCoalSimulations.foo() for a function named foo().

functions & types

PhyloCoalSimulations.mappingnodesType
mappingnodes(gene tree)

Type to define an iterator over degree-2 mapping nodes in a gene tree, assuming these degree-2 nodes (other than the root) have a name to map them to nodes in a species phylogeny. See ismappingnode.

source
PhyloCoalSimulations.coalescence_edgeMethod
coalescence_edge(edge1, edge2, number, populationid)

Create a coalescence between edges 1 and 2: with a new parent node n numbered number and a new parent edge e above the parent node, of length 0 and numbered number. Both n.intn1 and e.inte1 are set to populationid.

source
PhyloCoalSimulations.convert2tree!Method
convert2tree!(rootnode)

Return a network with all nodes and edges that can be reached from rootnode. Warning: Assumes that edges are correctly directed (with correct ischild1 attribute) and that the graph is a tree. This is not checked.

If the root node is still attached to an incomplete root edge, this edge & node are first disconnected.

source
PhyloCoalSimulations.initializetipFunction
initializetip(species::AbstractString, individual::AbstractString,
-              number::Integer, delim=""::AbstractString)

Create a leaf node and a pendant edge of length 0, incident to each other, both numbered number. Return the pendant edge. The leaf name is made by concatenating species, delim and individual.

source
PhyloCoalSimulations.initializetipforestFunction
initializetipforest(speciesnode::Node, nindividuals::Integer,
-              number::Integer, delim)

Vector of pendant leaf edges, with leaves named after speciesnode, and numbered with consecutive number IDs starting at number. If nindividuals is 1, then the leaf name is simply the species name. Otherwise, the leaf names include the individual number, and the default delimiter is _. For example, if the species name is s then leaf names are: s_1, s_2, etc. by default. Pendant leaf edges have inte1 set to the number of the corresponding edge in the species network.

source
PhyloCoalSimulations.map2population!Method
map2population!(forest, population_node, populationid, nextlineageID)

Extend each incomplete edge in the forest with a new degree-2 node n and a new incomplete edge e, with the following information to map n and e into the species phylogeny:

  • e.inte1 is set to populationid, and
  • n.name is set to population_node.name if this name is non-empty, or string(population_node.number) otherwise (with any negative sign replaced by the string "minus").

e.number and n.number are set to nextlineageID, which is incremented by 1 for each incomplete edge in the forest.

The forest is updated to contain the newly-created incomplete edges, replacing the old incomplete (and now complete) edges.

Output: nextlineageID, incremented by the number of newly created degree-2 lineages.

example

julia> using PhyloNetworks; net = readnewick("(A:1,B:1);");
+internals · PhyloCoalSimulations.jl

internal documentation

Documentation for PhyloCoalSimulations's internal functions. These functions are not exported and their access (API) should not be considered stable. But they can still be used, like this for example: PhyloCoalSimulations.foo() for a function named foo().

functions & types

PhyloCoalSimulations.mappingnodesType
mappingnodes(gene tree)

Type to define an iterator over degree-2 mapping nodes in a gene tree, assuming these degree-2 nodes (other than the root) have a name to map them to nodes in a species phylogeny. See ismappingnode.

source
PhyloCoalSimulations.coalescence_edgeMethod
coalescence_edge(edge1, edge2, number, populationid)

Create a coalescence between edges 1 and 2: with a new parent node n numbered number and a new parent edge e above the parent node, of length 0 and numbered number. Both n.intn1 and e.inte1 are set to populationid.

source
PhyloCoalSimulations.convert2tree!Method
convert2tree!(rootnode)

Return a network with all nodes and edges that can be reached from rootnode. Warning: Assumes that edges are correctly directed (with correct ischild1 attribute) and that the graph is a tree. This is not checked.

If the root node is still attached to an incomplete root edge, this edge & node are first disconnected.

source
PhyloCoalSimulations.initializetipFunction
initializetip(species::AbstractString, individual::AbstractString,
+              number::Integer, delim=""::AbstractString)

Create a leaf node and a pendant edge of length 0, incident to each other, both numbered number. Return the pendant edge. The leaf name is made by concatenating species, delim and individual.

source
PhyloCoalSimulations.initializetipforestFunction
initializetipforest(speciesnode::Node, nindividuals::Integer,
+              number::Integer, delim)

Vector of pendant leaf edges, with leaves named after speciesnode, and numbered with consecutive number IDs starting at number. If nindividuals is 1, then the leaf name is simply the species name. Otherwise, the leaf names include the individual number, and the default delimiter is _. For example, if the species name is s then leaf names are: s_1, s_2, etc. by default. Pendant leaf edges have inte1 set to the number of the corresponding edge in the species network.
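
The naming rule above can be sketched in plain Julia. The helper tipnames below is hypothetical (it is not a function of PhyloCoalSimulations) and only mirrors the rule as described, assuming individuals are numbered from 1:

```julia
# Hypothetical helper, not part of PhyloCoalSimulations: it only mirrors the
# naming rule described above for the leaves of one species.
function tipnames(species::AbstractString, nindividuals::Integer;
                  delim::AbstractString="_")
    # a single individual is named after its species, with no suffix
    nindividuals == 1 && return [String(species)]
    # otherwise: species name, delimiter, then the individual number 1, 2, ...
    return [string(species, delim, i) for i in 1:nindividuals]
end

tipnames("s", 1)   # ["s"]
tipnames("s", 3)   # ["s_1", "s_2", "s_3"]
```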

source
PhyloCoalSimulations.map2population!Method
map2population!(forest, population_node, populationid, nextlineageID)

Extend each incomplete edge in the forest with a new degree-2 node n and a new incomplete edge e, with the following information to map n and e into the species phylogeny:

  • e.inte1 is set to populationid, and
  • n.name is set to population_node.name if this name is non-empty, or string(population_node.number) otherwise (with any negative sign replaced by the string "minus").
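
As a rough illustration of the naming rule in the second bullet, the hypothetical helper below (not the package's code) builds the name of a degree-2 mapping node from a population node's name and number:

```julia
# Hypothetical helper, not the package's code: builds the name given to a
# degree-2 mapping node from the population node's name and number.
function mappingname(name::AbstractString, number::Integer)
    # a non-empty name is used as is
    isempty(name) || return String(name)
    # otherwise use the node number, spelling out any negative sign
    return replace(string(number), "-" => "minus")
end

mappingname("i1", 4)   # "i1"
mappingname("", 4)     # "4"
mappingname("", -2)    # "minus2"
```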

e.number and n.number are set to nextlineageID, which is incremented by 1 for each incomplete edge in the forest.

The forest is updated to contain the newly-created incomplete edges, replacing the old incomplete (and now complete) edges.

Output: nextlineageID, incremented by the number of newly created degree-2 lineages.

example

julia> using PhyloNetworks; net = readnewick("(A:1,B:1);");
 
 julia> leafA = net.node[1]; edge2A_number = net.edge[1].number;
 
@@ -22,7 +22,7 @@
 julia> [e.node[1].name for e in f]
 2-element Vector{String}:
  "A"
- "A"
source
PhyloCoalSimulations.simulatecoal_onepopulation!Method
simulatecoal_onepopulation!([rng::AbstractRNG,]
     lineagelist,
     population_length,
     nextlineageID,
@@ -46,4 +46,4 @@
 3 nodes: 2 tips, 0 hybrid nodes, 1 internal tree nodes.
 tip labels: s2, s1
 (s2:0.302,s1:0.302);
-
source

index

+
source

index

diff --git a/previews/PR19/lib/public/index.html b/previews/PR19/lib/public/index.html index ad38997..28c0efb 100644 --- a/previews/PR19/lib/public/index.html +++ b/previews/PR19/lib/public/index.html @@ -1,5 +1,5 @@ -public · PhyloCoalSimulations.jl

public documentation

Documentation for PhyloCoalSimulations's public (exported) functions. Most functions are internal (not exported).

functions & types

PhyloCoalSimulations.gene_edgemapping!Function
gene_edgemapping!(gene_tree, species_network, checknames=true)

Given a gene tree with labeled internal nodes that map to a species phylogeny (a species tree or a species network), this function maps each gene edge to the species edge that it is contained "within". Gene edge mappings are stored in the inte1 field of each gene tree edge, but it's best to access this mapping via population_mappedto.

Assumption: the species_network has unique node names to uniquely identify the speciation and reticulation events; and the gene_tree has degree-2 nodes with matching names, to indicate which species event each degree-2 node corresponds to.

The checknames argument takes a boolean, and, if true, then the function will check that both the species and the gene phylogeny have the same internal node names. The mapping of edges is recovered from matching names between nodes in the gene tree and nodes in the species network.

source
PhyloCoalSimulations.population_mappedtoMethod
population_mappedto(edge or node)

Identifier of the population (edge in the species network) that a gene tree's edge or a node is mapped onto, or nothing if not mapped. For example, coalescent nodes in gene trees map to a node in the species phylogeny, instead of mapping to an edge.

source
PhyloCoalSimulations.simulatecoalescentMethod
simulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals;
+public · PhyloCoalSimulations.jl

public documentation

Documentation for PhyloCoalSimulations's public (exported) functions. Most functions are internal (not exported).

functions & types

PhyloCoalSimulations.gene_edgemapping!Function
gene_edgemapping!(gene_tree, species_network, checknames=true)

Given a gene tree with labeled internal nodes that map to a species phylogeny (a species tree or a species network), this function maps each gene edge to the species edge that it is contained "within". Gene edge mappings are stored in the inte1 field of each gene tree edge, but it's best to access this mapping via population_mappedto.

Assumption: the species_network has unique node names to uniquely identify the speciation and reticulation events; and the gene_tree has degree-2 nodes with matching names, to indicate which species event each degree-2 node corresponds to.

The checknames argument takes a boolean, and, if true, then the function will check that both the species and the gene phylogeny have the same internal node names. The mapping of edges is recovered from matching names between nodes in the gene tree and nodes in the species network.

source
PhyloCoalSimulations.population_mappedtoMethod
population_mappedto(edge or node)

Identifier of the population (edge in the species network) that a gene tree's edge or a node is mapped onto, or nothing if not mapped. For example, coalescent nodes in gene trees map to a node in the species phylogeny, instead of mapping to an edge.

source
PhyloCoalSimulations.simulatecoalescentMethod
simulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals;
     nodemapping=false, inheritancecorrelation=0.0)

Simulate nloci gene trees with nindividuals from each species under the multispecies network coalescent, along network net whose branch lengths are assumed to be in coalescent units (ratio: number of generations / effective population size). The coalescent model uses the infinite-population-size approximation.

The random number generator rng is optional.

Output: vector of gene trees, of length nloci.

nindividuals can be a single integer, or a dictionary listing the number of individuals to be simulated for each species.

If nodemapping is true, each simulated gene tree is augmented with degree-2 nodes that can be mapped to speciation or hybridization events. The mapping of gene tree nodes & edges to network edges is carried by their attributes .intn1 (for nodes) and .inte1 (for edges). The mapping of gene tree nodes to network nodes is carried by the .name attribute. Namely:

  • A degree-3 node (1 parent + 2 children) represents a coalescent event that occurred along a population edge in net. Its .intn1 attribute is set to the number of that network population edge. Its parent edge has its .inte1 attribute also set to the number of the population edge that it originated from.
  • The gene tree's root node (of degree 2) represents a coalescent event along the network's root edge. Its .intn1 attribute is the number assigned to the network's root edge, which is set by get_rootedgenumber as the maximum edge number + 1.
  • A leaf (or degree-1 node) represents an individual. It maps to a species in net. The individual leaf name is set to the species name if nindividuals is 1. Otherwise, its name is set to speciesname_i where i is the individual number in that species. Its .intn1 attribute is the default -1.
  • A non-root degree-2 node represents a speciation or hybridization and maps to a population node in net. Its .intn1 attribute is the default -1. Its name is set to the network node name, if it exists. If the network node has no name, the gene tree node is given a name built from the network node number.

By default, lineages at a hybrid node come from a parent (chosen according to inheritance probabilities γ) independently across lineages. Positive dependence can be simulated with option inheritancecorrelation. For example, if this correlation is set to 1, then all lineages inherit from the same (randomly sampled) parent. More generally, the lineages' parents are simulated according to a Dirichlet process with base distribution determined by the γ values, and with concentration parameter α = (1-r)/r, that is, r = 1/(1+α), where r is the input inheritance correlation. For more details about this model, please read the package manual or refer to Fogg, Allman & Ané (2023).
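
To make the relation between r and α concrete, here is a minimal sketch in plain Julia. It uses the sequential ("Chinese restaurant") scheme, one standard way to sample from a Dirichlet process. It is an assumption that this matches the package internals, and the function names concentration, correlation, drawfrom and sample_parents are hypothetical:

```julia
using Random

# α = (1-r)/r and r = 1/(1+α), as in the text; r = 0 gives α = Inf
concentration(r) = (1 - r) / r
correlation(α) = 1 / (1 + α)

# draw one index from the discrete base distribution γ (assumed to sum to 1)
function drawfrom(rng::AbstractRNG, γ::AbstractVector{<:Real})
    u = rand(rng)
    c = 0.0
    for (i, g) in enumerate(γ)
        c += g
        u < c && return i
    end
    return length(γ)   # guard against rounding error in the cumulative sum
end

# Sample the parent (index into γ) of each of n lineages at one hybrid node.
# Lineage k takes a fresh draw from γ with probability α/(α + k - 1), and
# otherwise copies the parent of a uniformly chosen earlier lineage.
function sample_parents(rng::AbstractRNG, γ::AbstractVector{<:Real},
                        n::Integer, r::Real)
    parents = Vector{Int}(undef, n)
    for k in 1:n
        # α/(α + k - 1) rewritten in terms of r, to avoid Inf/Inf when r = 0
        pnew = k == 1 ? 1.0 : (1 - r) / ((1 - r) + (k - 1) * r)
        parents[k] = rand(rng) < pnew ? drawfrom(rng, γ) :
                                        parents[rand(rng, 1:(k - 1))]
    end
    return parents
end

rng = MersenneTwister(1)
sample_parents(rng, [0.6, 0.4], 5, 1.0)  # r = 1: all lineages share one parent
sample_parents(rng, [0.6, 0.4], 5, 0.0)  # r = 0: independent draws from γ
```

With r = 1 the probability of a fresh draw is 0 after the first lineage, so every lineage copies the first one. With r = 0 it is always 1, which recovers independent inheritance.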

Assumptions:

  • net must have non-missing edge lengths and γ values.
  • If nindividuals is a dictionary, it must have a key for all species, with the same spelling of species names in its keys as in the tip labels of net.

examples

julia> using PhyloNetworks
 
 julia> net = readnewick("(A:1,B:1);"); # branch lengths of 1 coalescent unit
@@ -81,7 +81,7 @@
  (tree_edge_number = 6, pop_edge_number = 3)
  (tree_edge_number = 3, pop_edge_number = 1)
  (tree_edge_number = 1, pop_edge_number = 1)
- (tree_edge_number = 2, pop_edge_number = 1)
source
PhyloCoalSimulations.simulatecoalescentMethod
simulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals, populationsize;
     nodemapping=false, round_generationnumber=true,
     inheritancecorrelation=0.0)

Simulate nloci gene trees with nindividuals from each species under the multispecies network coalescent, along network net, whose branch lengths are assumed to be in number of generations. populationsize should be a single number, assumed to be the (haploid) effective population size Nₑ, constant across the species phylogeny. Alternatively, populationsize can be a dictionary mapping the number of each edge in net to its Nₑ, including an extra edge number for the population above the root of the network.

Coalescent units are then calculated as u=g/Nₑ where g is the edge length in net (number of generations), and the coalescent model is applied using the infinite-population-size approximation.

Output: vector of gene trees with edge lengths in number of generations, calculated as g=uNₑ and then rounded to be an integer, unless round_generationnumber is false.
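
The two conversions above amount to simple arithmetic, sketched below in plain Julia. The function names and the edge numbers in the dictionaries are hypothetical, chosen only for illustration:

```julia
# Illustrative arithmetic only; the function names are hypothetical.
# g: edge length in generations, Ne: haploid effective population size
coalunits(g, Ne) = g / Ne

# back to generations, rounded to a whole number unless rounding is turned off
function generations(u, Ne; round_generationnumber::Bool=true)
    g = u * Ne
    return round_generationnumber ? round(g) : g
end

coalunits(1000, 500)        # 2.0 coalescent units
generations(0.3021, 500)    # 151.0 generations

# per-edge population sizes: hypothetical edge number => Nₑ
Ne = Dict(1 => 500, 2 => 2000)
g  = Dict(1 => 1000, 2 => 1000)
u  = Dict(e => coalunits(g[e], Ne[e]) for e in keys(g))  # 1 => 2.0, 2 => 0.5
```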

Warning

When populationsize Nₑ is not provided as input, all edge lengths are in coalescent units. When populationsize is given as an argument, all edge lengths are in number of generations. The second method (using the number of generations and Nₑ as input) is a wrapper around the first (using coalescent units).

julia> using PhyloNetworks
 
@@ -105,4 +105,4 @@
 julia> writeMultiTopology(genetrees, stdout) # branch lengths: number of generations
 (B:546.0,A:546.0);
 (B:3155.0,A:3155.0);
-
source

index

+
source

index

diff --git a/previews/PR19/man/converting_coal2generation_units/index.html b/previews/PR19/man/converting_coal2generation_units/index.html index 4e68590..700f741 100644 --- a/previews/PR19/man/converting_coal2generation_units/index.html +++ b/previews/PR19/man/converting_coal2generation_units/index.html @@ -27,4 +27,4 @@ R"mtext"("red: Ne values", side=1, line=-1.5, col="red4"); R"mtext"("black: edge lengths", side=1, line=-0.5);

species net with Ne

To simulate gene trees with edge lengths in generations, we can use a convenience wrapper function that takes Nₑ as an extra input:

julia> genetree_gen = simulatecoalescent(net_gen,3,1, Ne; nodemapping=true);
julia> writeMultiTopology(genetree_gen, stdout) # 3 gene trees, lengths in #generations
(((A:1000.0)i2:500.0)i3:348.0,(((C:900.0)i1:600.0)i3:108.0,(((B:200.0)H1:700.0)i1:600.0)i3:108.0):241.0);
(((A:1000.0)i2:500.0)i3:396.0,((((B:200.0)H1:700.0)i1:600.0)i3:7.0,((C:900.0)i1:600.0)i3:7.0):388.0);
-(((C:900.0)i1:600.0)i3:1261.0,(((A:1000.0)i2:500.0)i3:610.0,(((B:200.0)H1:600.0)i2:500.0)i3:610.0):651.0);
Warning

When Nₑ is given as an extra input to simulatecoalescent, edge lengths in the network are assumed to be in number of generations. If Nₑ is not given as input, then edge lengths are assumed to be in coalescent units.

+(((C:900.0)i1:600.0)i3:1261.0,(((A:1000.0)i2:500.0)i3:610.0,(((B:200.0)H1:600.0)i2:500.0)i3:610.0):651.0);
Warning

When Nₑ is given as an extra input to simulatecoalescent, edge lengths in the network are assumed to be in number of generations. If Nₑ is not given as input, then edge lengths are assumed to be in coalescent units.

diff --git a/previews/PR19/man/correlated_inheritance/index.html b/previews/PR19/man/correlated_inheritance/index.html index 46e8594..62e13ab 100644 --- a/previews/PR19/man/correlated_inheritance/index.html +++ b/previews/PR19/man/correlated_inheritance/index.html @@ -15,4 +15,4 @@ plot(gt1, shownodelabel=true, edgelabel=el1, edgelabelcolor=el1.label, tipoffset=0.1); plot(gt2, shownodelabel=true, edgelabel=el2, edgelabelcolor=el2.label, tipoffset=0.1); -plot(gt3, shownodelabel=true, edgelabel=el3, edgelabelcolor=el3.label, tipoffset=0.1);

3 gene trees on 1-taxon network with inheritance correlation

In all cases, any lineage has a probability γ=0.6 to come from species edge 2 (labeled in red), and probability γ=0.4 to come from species edge 3 (in green). When the inheritance correlation r increases, lineages have an increased preference to come from the same parent as other lineages (at the same locus).

+plot(gt3, shownodelabel=true, edgelabel=el3, edgelabelcolor=el3.label, tipoffset=0.1);

3 gene trees on 1-taxon network with inheritance correlation

In all cases, any lineage has a probability γ=0.6 to come from species edge 2 (labeled in red), and probability γ=0.4 to come from species edge 3 (in green). When the inheritance correlation r increases, lineages have an increased preference to come from the same parent as other lineages (at the same locus).

diff --git a/previews/PR19/man/getting_started/index.html b/previews/PR19/man/getting_started/index.html index a96f49a..ddfd7b8 100644 --- a/previews/PR19/man/getting_started/index.html +++ b/previews/PR19/man/getting_started/index.html @@ -31,4 +31,4 @@ tip labels: B_3, C_3, C_2, C_1, ... (((B_3:0.964,C_3:0.964):0.214,(C_2:0.756,C_1:0.756):0.422):1.125,((B_1:0.595,B_2:0.595):0.849,(A_1:0.079,(A_2:0.054,A_3:0.054):0.025):1.565):0.659);

We can also ask for varying numbers of individuals. For example, we simulate below 2 individuals in A and 1 individual in each of B and C, using a dictionary to map species to their number of individuals:

julia> genetrees = simulatecoalescent(net, 1, Dict("A"=>2, "B"=>1, "C"=>1));
julia> writenewick(genetrees[1])
"(C:1.7342183562262905,(B:1.1125494690619213,(A_2:0.39810435449985226,A_1:0.39810435449985226):0.9144451145620689):0.42166888716436934);"

We can set 0 individuals within a species to simulate missing data.

julia> genetrees = simulatecoalescent(net, 3, Dict("A"=>2, "B"=>1, "C"=>0));
julia> writeMultiTopology(genetrees, stdout)
((A_1:0.6217550029603106,A_2:0.6217550029603106):1.8934561254937003,B:2.515211128454011);
((A_1:0.020216077041873223,A_2:0.020216077041873223):1.6564835196459733,B:1.6766995966878466);
-(B:1.3598531402719405,(A_2:0.5258264787632165,A_1:0.5258264787632165):1.034026661508724);
+(B:1.3598531402719405,(A_2:0.5258264787632165,A_1:0.5258264787632165):1.034026661508724);
diff --git a/previews/PR19/man/mapping_genetree_to_network/index.html b/previews/PR19/man/mapping_genetree_to_network/index.html index e4b31d8..276fe2f 100644 --- a/previews/PR19/man/mapping_genetree_to_network/index.html +++ b/previews/PR19/man/mapping_genetree_to_network/index.html @@ -13,4 +13,4 @@ 4 edges 5 nodes: 3 tips, 0 hybrid nodes, 2 internal tree nodes. tip labels: B, C, A -((C:1.695,A:1.695):0.979,B:2.474);

The option true is to keep the root, even if it's of degree 2.

+((C:1.695,A:1.695):0.979,B:2.474);

The option true is to keep the root, even if it's of degree 2.

diff --git a/previews/PR19/man/more_examples/index.html b/previews/PR19/man/more_examples/index.html index 5bf4549..553096a 100644 --- a/previews/PR19/man/more_examples/index.html +++ b/previews/PR19/man/more_examples/index.html @@ -51,4 +51,4 @@ 3 => 1.75424 1 => 0.548284
julia> writenewick(tree, round=true, digits=4) # before rate variation
"((((B:0.2)H1:0.6)i2:0.5)i3:1.1744,(((C:0.9)i1:0.6)i3:0.195,((A:1.0)i2:0.5)i3:0.195):0.9793);"

Finally, we multiply the length of each gene lineage by the rate of the species edge it maps into:

julia> for e in tree.edge
          e.length *= networkedge_rate[e.inte1]
-       end
julia> writenewick(tree, round=true, digits=4) # after rate variation
"((((B:0.1344)H1:0.9867)i2:0.2757)i3:1.306,(((C:0.4935)i1:0.9314)i3:0.2169,((A:0.351)i2:0.2757)i3:0.2169):1.0891);"
+ end
julia> writenewick(tree, round=true, digits=4) # after rate variation
"((((B:0.1344)H1:0.9867)i2:0.2757)i3:1.306,(((C:0.4935)i1:0.9314)i3:0.2169,((A:0.351)i2:0.2757)i3:0.2169):1.0891);"