Features
- An
allow_unary
flag (False by default
) has been added to all methods.
Documentation
- Various fixes in documentation, including documenting returned fits.
Breaking changes
-
The
return_posteriors
argument has been removed and replaced withreturn_fit
. An instance of one of two previously internal classes,ExpectationPropagation
andBeliefPropagation
, are now returned whenreturn_fit=True
, and posteriors can be obtained usingfit.node_posteriors()
. -
Topology-only dating (setting
mutation_rate=None
) has been removed for tree sequences of more than one tree, as tests have found that span-weighting the conditional coalescent causes substantial bias.
Bugfixes
- Minor bug fixed with final step of algorithm (path rescaling).
Features
- Initial support for dating with unphased (or poorly phased) singleton
mutations via
singletons_phased=False
option. The API is preliminary and may change.
Documentation
- Fixed description of priors for variational gamma method, which were referred to a 'flat' or improper but are actually empirical Bayes priors on root node ages, fit by expectation maximization.
Bugfixes
-
Variational gamma uses a rescaling approach which helps considerably if e.g. population sizes vary over time
-
Variational gamma does not use mutational area of branches, but average path length, which reduces bias in tree sequences containing polytomies
Breaking changes
-
The default method has been changed to
variational_gamma
. -
Variational gamma uses an improper (flat) prior, and therefore no longer needs
population_size
specifying. -
The standalone
preprocess_ts
function also applies thesplit_disjoint_nodes
method, which creates extra nodes but improves dating accuracy. -
Json metadata for mean time and variance in the mutation and node tables is now saved with a suitable schema. This means
json.loads()
is no longer needed to read it. -
The
mutation_rate
andpopulation_size
parameters are now keyword-only, and therefore these parameter names need to be explicitly typed out. -
The
ignore-oldest
option has been removed from the command-line interface, as it is no longer very helpful with new tsinfer output, which has the root node split. The option is still accessible from the Python API.
Bugfixes
- In variational gamma, rescale messages at end of each iteration to avoid numerical instability.
Breaking changes
-
The standalone
preprocess_ts
function now defaults to not removing unreferenced individuals, populations, or sites, aiming to change the tree sequence tables as little as possible. -
get_dates
(previously undocumented) has been removed, as posteriors can be obtained usingreturn_posterior
. Thenormalize
terminology previously used inget_dates
is changed tostandardize
to better reflect the fact that the maximum (not sum) is one, and exposed via theoutside_standardize
parameter. -
The
Ne
argument todate
has been deprecated (although it is still in the API for backward compatibility). The equivalent argumentpopulation_size
should be used instead. -
The CLI
-verbosity
flag no longer takes a number, but usesaction="count"
, so-v
turns verbosity to INFO level, whereas-vv
turns verbosity to DEBUG level. -
The
return_posteriors=True
option withmethod="inside_outside"
previously returned a dict that included keysstart_time
andend_time
, giving the impression that the posterior for node age is discretized over time slices in this algorithm. In actuality, the posterior is discretized atomically over time points, sostart_time
andend_time
have been replaced by a single keytime
. -
The
return_posteriors=True
option withmethod="maximization"
is no longer accepted (previously simply returnedNone
) -
Python 3.7 is no longer supported.
Features
-
A new continuous-time method,
"variational_gamma"
has been introduced, which uses an iterative expectation propagation approach. Tests show this increases accuracy, especially at older times. A Laplace approximation and damping are used to ensure numerical stability. After testing, the node priors used in this method are based on a global mixture prior, which can be refined during iteration. Future releases may switch to using this as the default method. -
Priors may be calculated using a piecewise-constant effective population trajectory, which is implemented in the
demography.PopulationSizeHistory
class. Thepopulation_size
argument todate
accepts either a single scalar effective population size, or aPopulationSizeHistory
instance. -
Added support and wheels for Python 3.11
-
The
.date()
function is now a wrapper for the individual dating methods (accessible usingtsdate.core.dating_methods
), which can be called independently. (e.g.tsdate.inside_outside(...)
). This makes it easier to document method-specific options. The API docs have been revised accordingly. Provenance is now saved with the name of the method used as the celled command, rather than"command": "date"
. -
Major re-write of documentation (now at https://tskit.dev/tsdate/docs/), to use the standard tskit jupyterbook framework.
Bugfixes
-
The returned posteriors when
return_posteriors=True
now return actual probabilities (scaled so that they sum to one) rather than standardized "probabilities" whose maximum value is one. -
The population size is saved in provenance metadata (as a dictionary if it is a
PopulationSizeHistory
instance) -
preprocess_ts
always records provenance as being from thepreprocess_ts
command, even if no gaps are removed. The command now has a (rarely used)delete_intervals
parameter, which is normally filled out and saved in provenance (as it was before). If no gap deletion is done, the param is saved as[]
Features
- Added the
time_units
parameter totsdate.date
, allowing users to specify the time units of the dated tree sequence. Default is"generations"
. - Added the
return_posteriors
parameter totsdate.date
. If True, the function returns a tuple of(dated_ts, posteriors)
. mutation_rate
is now a required argument intsdate.date
andtsdate.get_dates
- tsdate returns an error if users attempt to date an unsimplified tree sequence.
- Updated tsdate citation information to cite the recent Science paper
- Built wheel on Python 3.10
Features
- The algorithm now operates completely in unscaled time (in units of generations) under
the hood, which means that
tsdate.build_prior_grid
now requires the parameterNe
. - Users now have access to the marginal posterior distributions on node age by running
tsdate.get_dates
, though this is undocumented for now.
Bugfixes
- A fix to the way likelihoods are added should eliminate numerical errors that are sometimes encountered when dating very large tree sequences.
Features
- Two new methods,
tsdate.sites_time_from_ts
andtsdate.add_sampledata_times
, support inference of tree sequences from non-contemporaneous samples. - New tutorial on inferring tree sequences from modern and historic/ancient samples
explains how to use these functions in conjunction with
tsinfer
. tsdate.preprocess_ts
supports dating inferred tree sequences which include large, uninformative stretches (i.e. centromeres and telomeres). Simply run this function on the tree sequence before dating it.ignore_outside
is a new parameter in the outside pass which tellstsdate
to ignore edges from oldest root (these edges are often of low quality intsinfer
inferred tree sequences)- Development environment is now equivalent to other
tskit-dev
projects
- Improve user experience with more progress bars and logging.
- Slightly change traversal method in outside and outside maximization algorithms, this should only affect inference on inferred tree sequences with large numbers of nodes at the same frequency.
- Improve reporting of current project version
- Use appdirs for default caching location
- Prevent dating tree sequences with dangling nodes
- Bugfix release: resolve issue with precalculating prior values.
€# [0.1.0] - 2020-02-24
Early Alpha release made available via PyPI for community testing and evaluation.
Please don't use this version in published works.