Update documentation
laduplessis committed Oct 31, 2018
1 parent 9481065 commit c8e9df6
Showing 4 changed files with 6 additions and 67 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -3,10 +3,9 @@
Please contact Louis du Plessis ([email protected]) for any questions.

## Summary
Phylodynamics example for the BEAST 2.5 paper, using the West African Ebola dataset. The example uses a sampled-ancestor birth-death skyline model to infer population dynamics. It can also serve as a bModelTest example.


## Documentation
Full documentation is inside the `doc/` directory.
Full [documentation](doc/ebov_beast2_example.pdf) is inside the `doc/` directory.

## Reproducing results
Follow the workflows in `workflows/`
58 changes: 4 additions & 54 deletions doc/ebov_bdskysa.tex
@@ -18,49 +18,10 @@ \section{Tree models for unstructured populations}

\clearpage

\emph{Insert at the end of the paragraph ending on line 206 (bdsky)}

% There are two common approaches for modelling the phylogenetic tree, or the genealogy, in phylogenetic inference. % Tanja: so far we only talked about phylo trees. do we need to talk about genealogies here? if so, shall we define what we mean with phylo tree vs genealogy.
% The first assumes a classic population dynamic model, namely the birth-death model \citep{Yule1924,kendall1949stochastic}, to model the growth of a tree.
% In a population dynamic birth-death model, through time, each individual gives rise to one additional offspring with rate $\lambda$ and dies with rate $\mu$.
% As we only analyse a fraction of individuals arising in this process, it is necessary to model the sampling process for tips of a birth-death tree.
% For a variety of simple partially-sampled birth-death trees, the distribution of branch lengths has been derived exactly\citep{StadlerJTB2010}.

% Alternatively, a mathematical model for trees known as the coalescent \citep{kingman1982b,griffiths1994} can be adapted to account for changing effective population size through time.
% One can interpret the parameters of the coalescent, namely the effective population size and its changes, as birth-death parameters \citep{volz2012complex} when making some coalescent approximations.
% Partially-sampled birth-death models do not make the approximations that coalescent models do, but they depend on a model of the sampling process, and simple sampling models may not always be an adequate description of real data sets. It is an ongoing debate and topic of research to investigate the consequences of coalescent approximations and sampling model assumptions.

% Coalescent approaches have been embedded within BEAST since its original release \citep{drummond2007beast}. Thus, we will not further discuss the basic coalescent approach here. In what follows, we will introduce the basic birth-death models which underwent major development in recent years. Then, we discuss the more sophisticated birth-death and coalescent approaches side by side.
% % Tanja please check if you like my re-write - (I realized that you restructured the start based on erik's comments ; but then the flow for the rest was a bit weird. I hope it is fine again now.)


% %A commonly used basis of population dynamic models giving rise to trees is the class of birth-death models \citep{Yule1924,kendall1949stochastic}.
% %Through time, each individual gives rise to one additional offspring with rate $\lambda$ and dies with rate $\mu$.
% %There are two common approaches for adapting this process to provide a genealogical prior for phylogenetic inference.
% %The first is to propose a sampling process for tips of a birth-death genealogy.
% %For a variety of simple partially-sampled birth-death genealogies, the distribution of branch lengths has been derived exactly\citep{StadlerJTB2010}.
% %Alternatively, a mathematical model for genealogies known as the coalescent \citep{kingman1982b,griffiths1994} can be adapted to account for changing effective population size through time.
% %%Both approaches make different approximations and have disadvantages.
% %%The distribution of branch lengths in coalescent genealogies is not the same as that of the birth-death process, but the distributions converge rapidly with increasing population and sample size.
% %Partially-sampled birth-death models do not make the approximations that coalescent models do, but they depend on a model of the sampling process, and simple sampling models may not always be an adequate description of real data sets.

% In birth-death models, it is assumed that the first individual appears at some time $t_0$ before the present.
% Through time, each individual gives rise to one additional offspring with rate $\lambda$ and dies with rate $\mu$.
% An individual is sampled (e.g. the pathogen of an infected individual is sequenced, or ancient DNA for an individual is sequenced; or a fossil is observed) with rate $\psi$.
% Upon sampling, we assume that the individual representing the sample is removed from the population with probability $r$.
% In the case of infectious diseases, $r$ is the probability of being cured or treated, such that the individual is not infectious any more upon sampling.
% In the case of species, we typically assume $r=0$ as the species continues to exist upon sampling of a fossil.
% At the present (or most recent) time, each extant individual is sampled with probability $\rho$.
% % Louis: Just saying "At present time, each extant..." makes it sounds like this is the current state of the method, not as if you're referring to the time of samples in the tree.
% The probability of a tree, given parameters $t_0,\lambda,\mu,\psi,r, \rho$ has been derived in \citep{StadlerJTB2010} for $r=0$, and generalized for $r \in [0,1]$ in \citep{StadlerEtAl2012MBE-R0}.
% A value $r<1$ necessitates using an MCMC algorithm capable of producing trees with sampled ancestors.
% Such an algorithm is provided in BEAST via the \texttt{SA} (sampled ancestor) package \citep{gavryushkina2014bayesian}.

% This basic model has been extended to account for changes of parameters through time within the \texttt{bdsky} package \citep{stadler2013birth}.
% In \texttt{bdsky}, time is divided up into one or more intervals, inside of which parameters are held constant but between which parameters may be completely different (i.e. the change of parameters occurs in a non-parametric way).
%\emph{Insert at the end of the paragraph ending on line 206 (bdsky)}


\noindent
\textcolor{blue}{
In epidemiological investigations the birth-death model can be reparameterised by setting the rate of becoming noninfectious, $\delta = \mu + \psi r$ (the total rate at which lineages are removed), the effective reproductive number, $R_e = \lambda / \delta$, and the sampling proportion $p = \psi / \delta$ (the proportion of removed lineages that are sampled).
Figure~\ref{fig:ebov_bdsky} shows the posterior estimates from a bdsky analysis of the 2013--2016 West African Ebola epidemic. Estimates are based on the coding regions of 811 sequences sampled through October 24, 2015, representing more than 2.5\% of known cases.
There is evidence that hospital-based transmission and unsafe burials contributed infections to the epidemic \citep{Whitty2014Nature}; the sampled ancestor package was therefore used to account for a proportion of patients continuing to transmit the virus after being sampled (by allowing $r$ to be less than 1).
@@ -72,16 +33,6 @@ \section{Tree models for unstructured populations}
$R_e$ estimates before May 2014 and after August 2015 are highly uncertain because few sequences were sampled during these periods.
Trends in sampling proportion estimates follow empirical estimates based on the number of confirmed cases; however, the sampling proportion is overestimated during the period of intense transmission, which suggests the existence of transmission chains not represented in the sequence dataset.
In the final two months of the study period the sampling proportion is underestimated, which may indicate ongoing cryptic transmission, but may also reflect a model bias: the remaining transmission chains at this time were highly isolated from each other, which the model does not take into account.
}
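
The reparameterisation defined in the blue text above can be inverted to recover the original rates. The block below is only a sketch of that algebra, derived from the stated definitions $\delta = \mu + \psi r$, $R_e = \lambda / \delta$ and $p = \psi / \delta$; it is not part of the original document and assumes the \texttt{amsmath} package is loaded for \texttt{align*}.

```latex
% Sketch only: inverse of the reparameterisation, derived from the
% definitions delta = mu + psi*r, R_e = lambda/delta, p = psi/delta.
\begin{align*}
  \lambda &= R_e \, \delta, \\
  \psi    &= p \, \delta, \\
  \mu     &= \delta - \psi r = \delta \, (1 - p r).
\end{align*}
```

With $r = 1$ the last line reduces to $\mu = \delta \, (1 - p)$, so every removal is either a sampling event or an unsampled removal.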

% Popular models in epidemiology, such as the SIR model \citep{kermackMcKendrick1927}, or in macroevolution, such as the diversity-dependent model \citep{nee1992tempo}, assume that parameters change as a function of the number of susceptible individuals or non-occupied niches, for example. Thus, they are called parametric birth-death models.
% Such parametric rate changes can be assumed when using the \texttt{EpiInf} package \citep{vaughan2017directly}.
% This latter package additionally samples the trajectory of infectious and susceptible individuals through time and allows for the inclusion of case count data in addition to sequences.
% In a faster, but approximate way, the \texttt{phylodynamics} package \citep{kuhnert2014simultaneous} performs inference under the SIR model using genetic sequences.






% Although not done here, it is possible to account for the incubation time of an epidemic using the structured models discussed in the next section.
@@ -109,12 +60,11 @@ \section{Substitution models}

\clearpage

\emph{Insert into substitution models section if it is included}
%\emph{Insert into substitution models section if it is included}

\textcolor{blue}{
Figure~\ref{fig:ebov_bmt} shows the posterior distribution resulting from a bModelTest analysis of substitution models for 14,517 nucleotides from the coding regions of 811 EBOV sequences sampled during the 2013--2016 West African Ebola virus epidemic.
Each circle represents a substitution model, indicated by a six-digit number corresponding to the six rates of reversible substitution models (see Figure~\ref{fig:ebov_bmt} caption for more details).
}




Binary file added doc/ebov_beast2_example.pdf
Binary file not shown.
10 changes: 0 additions & 10 deletions doc/ebov_beast2_example.tex
@@ -48,16 +48,6 @@
%\clearpage
\begin{abstract}
Phylodynamics example for the BEAST 2.5 paper, using the West African Ebola dataset. The example uses a sampled-ancestor birth-death skyline model to infer population dynamics. It can also serve as a bModelTest example.

\noindent
New parts added:
\begin{itemize}
\item Figure 1
\item Blue text in unstructured tree models section
\item Figure 2
\item Blue text in substitution models section
\item Supplement
\end{itemize}
\end{abstract}

\tableofcontents
