Major overhaul of computation of establishment and emergence probabilities in
Landscape class. This new approach is based on discretizing mutant cloud
diffusion to compute emergence probability at different pseudotimes (mutant
cloud sizes) after bottleneck, then computing probability of fixation within
pseudotime (mutant cloud size) frame, then normalizing by time to obtain true
rate (slideshow explains, preprint will explain in detail). Now each mutant has
a series of emergence-establishment rates corresponding to different mutant
cloud sizes/pseudotimes after bottleneck. Plot class changed accordingly to
visualize sum of all rates across time.
Changed computation of emergence probabilities in Landscape class to correctly
account for pathway dependence in instances in which intermediates with lower
fitness are present. Also corrected wait time before mutations are accumulated.
Updated parameters, documentation, tutorial, and output in corresponding manner.
Bug fixes and improvements for landscape mapping:
- incomplete mapping due to incorrect depth handling in evaluateNeighbors()
- incorrect calculation of probabilities of acquiring specific mutations: first, the total number of alleles used in normalization was wrong; second, the total number of available paths to acquiring a group of mutations (calculated by permutation without replacement and assuming equal path likelihood) was missing
- added fitness as a recorded variable in mutation networks and visualization
- improved visualization controls
- cleaned up documentation of Landscape class and visualizeMutationNetwork
- added landscape mapping example to examples folder and output graph to img
- split some of the landscaping parameters into host and vector versions
Also, added all necessary parameters for new intrahost evolution algorithm.
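The path count mentioned in the mapping fixes above can be illustrated with a small sketch. It assumes, as that entry does, that every order of acquiring a group of mutations is equally likely. The mutation labels and the function name are hypothetical, and this is not Opqua's implementation.

```python
import itertools
import math

def acquisition_paths(mutations):
    """Enumerate every order in which a group of mutations can be acquired.

    Each mutation is used exactly once per path (permutation without
    replacement), so a group of k mutations has k! paths.
    """
    return list(itertools.permutations(mutations))

# Hypothetical group of three mutations (labels are illustrative only).
group = ["A12T", "G45C", "T78A"]
paths = acquisition_paths(group)

n_paths = len(paths)  # 3! = 6 orderings
assert n_paths == math.factorial(len(group))

# Under the equal-likelihood assumption, every path carries the same weight.
path_weight = 1 / n_paths
```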
Successfully implemented Landscape class to traverse and compute fitness
landscapes, the parameters needed for this in the Setup class, as well as a
plotting function to visualize the resulting mutation networks using the PyVis
package.
Fixed some bugs in function importing (or lack thereof; recommended practice is
to declare function parameters as Python code in same scope as simulation and
pass function names as arguments to newSetup() or loadSetup()).
Also corrected small bug in which importlib_resources wasn't being imported in
Setup class.
Woke up and decided yesterday's changes were significant enough to warrant a version change to 1.2.0 :)
Forgot the MANIFEST.in file to get the csv files.
- Added methods to save and load parameter settings into Setup objects using CSV files
- Changed Setup object internal mechanics
- Added default parameter values as external csv files, removed them from Model class
Moved plot and data files to a separate directory for organizational purposes.
Update to pandas dataframe handling (replace append with concat) to comply with new version.
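For reference, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, with pandas.concat as the replacement. A minimal sketch of the migration, using made-up column names rather than Opqua's actual dataframes:

```python
import pandas as pd

history = pd.DataFrame({"Time": [0.0, 1.0], "Infected": [1, 3]})
new_rows = pd.DataFrame({"Time": [2.0], "Infected": [7]})

# Old style (removed in pandas 2.0):
#   history = history.append(new_rows, ignore_index=True)

# New style: concatenate a list of frames and rebuild the index.
history = pd.concat([history, new_rows], ignore_index=True)
```

When many rows are added in a loop, collecting them in a list and calling concat once avoids repeated copying.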
Cleaned up everything for a release. Looking ahead:
- A big overhaul of mutation mechanisms and intra-host dynamics
- A big overhaul of host acquired immunity
- Performance improvements by refactoring simulation engine in C
Changes:
- Uses new Numpy Generator class for random numbers, some performance improvement expected.
- Added function zeroTruncatedPoisson to return random numbers from a zero-truncated Poisson distribution, to be used to define the number of cross-over events
- Added function zeroTruncatedNegBinomial to return random numbers from a zero-truncated negative binomial distribution (or at least an approximation), to be used to define inoculum size during transmission (Sobel et al. 2017 find negative binomial is superior to Poisson when estimating bottleneck size); also add model parameters variance_inoculum_host and variance_inoculum_vector to be used by this function
- Renamed Gillespie class as Simulation; added new simulation algorithm that implements a variation of tau leaping (but left exact Gillespie as default since estimating adequate tau parameter seems tricky)
- Fixed bug in compartmentDf which created duplicated rows in dataframe when hosts or vectors died infected and/or protected
- Bumped Pillow to satisfy the Github bot
- Fixed a bug that made recombination events depend on mutation coefficients instead of recombination coefficients (does not affect published results)
- Slightly alter the way Poisson distributions are used to define the number of cross-over events and pathogens inoculated in transmission, for consistency (add 1 to the mean of events once they are guaranteed to happen; impact on existing simulations is negligible)
- Modify getWeightedRandom() to use numpy arrays (profiling shows it does increase efficiency)
- Added a new parameter to the run() function: skip_uninfected. When True, allows Opqua to store copies of only infected hosts/vectors as simulation progresses, and stores total number of healthy hosts to then reconstitute those as generic rows on dataframe after simulation is over. For simulations with a large number of hosts/vectors and relatively low infection prevalence, this can greatly increase simulation speed.
- Changed Gillespie algorithm to only recalculate probabilities for events that may happen in simulation (since many simulations omit certain types of events)
- Added parameters to setSetup that optionally allow recalculation of all host and/or vector coefficients, thus overwriting all establishment frequency effects; the new default is to not recalculate
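As an illustration of the zero-truncated Poisson sampling listed above, here is a minimal sketch using the NumPy Generator API and simple rejection (redrawing any zeros). The function name and signature are hypothetical, and Opqua's own zeroTruncatedPoisson may use a different method.

```python
import math
import numpy as np

def zero_truncated_poisson(rng, lam, size):
    """Draw `size` samples from Poisson(lam) conditioned on being >= 1.

    Rejection approach: draw ordinary Poisson samples, then redraw
    every zero until none remain.
    """
    samples = rng.poisson(lam, size)
    zeros = samples == 0
    while zeros.any():
        samples[zeros] = rng.poisson(lam, zeros.sum())
        zeros = samples == 0
    return samples

rng = np.random.default_rng(seed=0)
draws = zero_truncated_poisson(rng, lam=0.5, size=10_000)

# Theoretical mean of the zero-truncated distribution: lam / (1 - exp(-lam))
expected_mean = 0.5 / (1 - math.exp(-0.5))
```

Rejection is simple but slows down as lam approaches zero, since most draws are then zeros; an inverse-CDF method would avoid that cost.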
Previous fix didn't cut it and I jumped the gun on the release. This works.
Fixed recombination bug where one of the two progeny genomes was being lost (thanks David Suárez!). Updated joblib version.
Revisions incorporated!
Change all mentions of "lethality" to "mortality".
Update Pillow to 9.0.1
Modify behavior of compositionPlot to take all hosts/vectors into account when plotting immunity/protection.
Change units of lethality_rate_host and lethality_rate_vector to be a rate like
other parameters for better internal consistency, instead of a fraction of
recovered cases. Affects only behavior of simulations with disease-caused
mortality. Updated example accordingly.
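To relate a rate to a fraction of cases, one can note that if mortality and recovery are treated as competing events with constant rates, the expected fraction of infections ending in death is the mortality rate divided by the sum of both rates. A small sketch under that assumption (function and argument names are hypothetical, not Opqua parameters):

```python
def fraction_dying(mortality_rate, recovery_rate):
    """Probability that death occurs before recovery when both are
    independent exponential events with the given constant rates."""
    total = mortality_rate + recovery_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return mortality_rate / total

# Example: mortality rate 0.02 per day, recovery rate 0.08 per day.
expected_fraction = fraction_dying(0.02, 0.08)
```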
Fix bug in updateVectorCoefficients() specific to natality and migration.
Does not affect any simulation results as long as natality and migration are not
functions of pathogen genome sequence.
Update compositionPlot additional arguments.
Update Pillow to 9.0.0.
Added bioRxiv links.
Changed computation of inter-population contact rates to match logic of intra-population contact rates. Does not affect outcome of simulations not using inter-population contact.
Parenthesis error.
Another small change to the Gillespie algorithm, this time to avoid rare
infinite loops when tampering with t_var.
- changed order in which time delta was added to t_var to be after interventions occur
- do not carry out interventions if t_var is past simulation end time
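The ordering described in these two items can be sketched with a toy event loop. This is an illustration only, not Opqua's Gillespie code: the model is a plain dict, events fire at a single constant rate, and all names are hypothetical.

```python
import random

def run(model, rate, t_end, interventions, seed=0):
    """Toy event loop; returns a log of (kind, time) tuples.

    `model` is a dict holding the clock under "t_var"; `interventions`
    is a list of (time, function) pairs.
    """
    rng = random.Random(seed)
    pending = sorted(interventions, key=lambda pair: pair[0])
    log = []
    while model["t_var"] < t_end:
        next_event = model["t_var"] + rng.expovariate(rate)
        # Handle interventions scheduled before the next event.
        while pending and pending[0][0] <= next_event:
            when, action = pending.pop(0)
            if when > t_end:
                continue  # never run interventions past the end time
            model["t_var"] = when  # update the clock before intervening
            action(model)
            log.append(("intervention", when))
            if model["t_var"] >= t_end:
                return log  # a killswitch moved the clock to the end
        # Only now is the time delta applied to the clock.
        model["t_var"] = next_event
        if next_event <= t_end:
            log.append(("event", next_event))
    return log

def killswitch(model):
    # Hypothetical intervention: push the clock past any end time.
    model["t_var"] = float("inf")

model = {"t_var": 0.0}
log = run(model, rate=1.0, t_end=10.0,
          interventions=[(5.0, killswitch), (20.0, killswitch)])
```

Because the clock is set before the intervention runs and checked right after it, a killswitch ends the loop cleanly, and the intervention scheduled past the end time never executes.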
Same bug as below, the fix was incomplete.
There was a bug in the last release regarding interventions! In order to correctly implement custom user killswitches, it is important to update the time variable t_var immediately before an intervention takes place, not after it. Fixed in Gillespie method now, does not change behavior of any previous simulations.
General simulation structure changes:
- changed handling of intra-population contact rate between hosts and vectors within the Gillespie function to fit a biting rate definition for the contact rate (i.e. constant per vector contact rate). Does not affect behavior of simulations if total number of hosts and vectors is equal (or if models have no vector-borne pathogens)
- made time variable t_var a property of Model objects instead of internal to Gillespie object, allowing users to modify simulation time (e.g. for killswitches); does not affect simulation results
Opqua structure changes:
- changed way interventions are executed to guarantee that the Model object being simulated carries the intervention upon itself
- changed runReplicates() to create independent copies of model object and run simulations on each (should not affect simulation results due to parallelization through joblib)
- added deepCopy() to reassign all internal model and population references in copied model objects
- changed runReplicates() and runParamSweep() to use deepCopy() (should not affect simulation results due to parallelization through joblib)
- added customModelFunction() function to allow users to add custom methods to specific Model instances (e.g. for killswitches and conditional interventions)
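Attaching a method to one specific instance, as customModelFunction() is described as doing, can be done in plain Python with types.MethodType. The sketch below uses a stand-in class and hypothetical helper names; it is not Opqua's implementation.

```python
import types

class Model:
    """Stand-in for a simulation model; not Opqua's Model class."""
    def __init__(self):
        self.t_var = 0.0

def attach_method(instance, function):
    """Bind `function` as a method of this one instance only."""
    setattr(instance, function.__name__, types.MethodType(function, instance))

def stop_simulation(self, t_end):
    # Hypothetical killswitch: jump the clock to the end time.
    self.t_var = t_end

model_a, model_b = Model(), Model()
attach_method(model_a, stop_simulation)
model_a.stop_simulation(100.0)
```

Only model_a gains the method; model_b and the class itself are left untouched, which is what makes this suitable for per-model interventions.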
Opqua syntax changes:
- changed syntax of interventions to force all functions used to be methods of the Model object being intervened
All graphs in publication (title pending) generated with this stable version. [RETROACTIVE EDIT: first draft only]
General model structure changes:
- added transmission_efficiency_host_host, transmission_efficiency_host_vector, transmission_efficiency_vector_host as additional parameters
- made global_trackers copy into history object of a model
- adjusted computation of recombination probabilities for hosts and vectors
- skip recombining when parental genomes are the same
- changed genome sampling during inoculation of hosts and vectors
- added more flexibility, changed structure, and debugged migration/population contact options in runParamSweep()
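The shortcut of skipping recombination for identical parental genomes can be sketched as follows. This is a deliberately simplified single-point crossover on string genomes, with a hypothetical function name; Opqua's recombination (chromosomes, multiple cross-over events) is more involved.

```python
import random

def recombine(parent_a, parent_b, rng):
    """Single-point crossover returning both progeny genomes."""
    if parent_a == parent_b:
        # Identical parents can only yield identical progeny, so skip the work.
        return parent_a, parent_b
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("genomes must share a length of at least 2")
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the genome
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

rng = random.Random(1)
same = recombine("AAAA", "AAAA", rng)
children = recombine("AAAA", "TTTT", rng)
```

Returning both children keeps all parental alleles in the progeny pool.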
In compositionDf():
- remove missing data
- change algorithm to replace combinations of genomes (more efficient when combinations are limiting factor)
- allow user to specify genome plotting order
Miscellaneous:
- add option to plot population fractions instead of absolute counts in compositionPlot()
- Gillespie algorithm now prints out event name rather than ID number
- changed error handling when adding pathogens to hosts and vectors
- removed a duplicate definition in Model newSetup()
- changed a group name in intervention_example.py
v0.2.5 created a major bug that escaped my attention with the division by zero error fix.
- corrected Host and Vector acquirePathogen() functions to restore correct behavior
- added a requirements.txt file purely for reference purposes in case a future dependency update breaks opqua
- added global_trackers dictionary to Model in order to track and return some global indicators when running replicates or parameter sweeps
- added addCustomConditionTracker() function to Model class in order to allow users to track custom events in model
- modified mutation() and recombination() in Vector and Host classes as well as addPathogensToHosts() and addPathogensToVectors() in Population class in order to track genomes seed for global_trackers
- added model attribute to Population class for the above reason as well
- fix way host and vector sampling is handled (apparently I forgot to actually implement it in Gillespie and Population classes after I added the arguments, haha)
- minor bug fix: modify acquirePathogen in Vector and Host classes to avoid division by zero errors when recalculating sum_fitness after it was zero
- corrected name of compositionDataframe argument in compositionPlot() to composition_dataframe
- fixed regex processing bug in compositionDf()
- added **kwargs argument passing to joblib functions to allow user to change backend and stuff
Trying to deploy on cluster so bear with me on the updates here
- added parameter sweep function runParamSweep()
- added id property and argument to Setup() in order to associate a Setup to its ID in a Model, so that runParamSweep() can edit the setups
- added getCompositionData() function to Model class to allow user to output composition data without plotting compartments
- fixed bug in how runReplicates() computed and returned output
- added verbose optional argument to saveToDf() to reduce console output
- added composition_dataframe optional argument to allow for pre-computed data
- added setup.cfg as per Joel Barmettler I guess?
- change compositionPlot remove_legend behavior to fix bug
- change pathogenDistanceDf seq_names behavior to fix bug
- reduce mean inoculum from hosts into vectors to reflect malaria cycle
- modify infectHost and infectVector inoculation behavior so that mean_inoculum does not affect overall transmission rate; each infection now results in at least 1 pathogen transfer (if not containing and not immune to the pathogen genome sampled)
- update version tags
- pathogen genome influences transmission, death, recovery, migration, mutation probabilities. Done, tested
- independent recombination of alleles -> make chromosome separators!! make reassortment parameter option as related but separate to recombination. Done, tested
- host/vector birth/death rates in populations -> make birth event. Done, tested
- separate natural death into an event that doesn't log deaths. Done, tested
- make RECEIVE_CONTACT and RECEIVE_POPULATION_CONTACT coefficient columns in arrays, modify how these events are handled. Done, tested
- migrate vectors. Done, test
- contact between populations (without migration). Done, tested
- make genome labels optional, if no labels write a file with the genomes in the same order for compositionPlot. Done, test compositionPlot
- make custom groupings, eg. "genomes containing sequence AAA". Done, test
- make option to count only 1 fitness-dominant strain/host in compositionPlot. Done, test
- genomes and dates output for TDA. Done, test
- set seed. Done, test
- make all events except mutation, recombination and recovery 1-coefficient like contact and pop contact are. this way, coefficients are fraction of healthy rate. Modify dummy rows in coeff arrays to match this. Done, tested
- parallelizeable simulations. Done, test
- try Numba, Cython, JAX optimization
- update docs. Done
- update tutorials. Done, tested
- correctly update arguments in function documentation and README for compositionPlot family functions