From 6e4071c3fbef4510cbdf5ef93903aed7fe708f12 Mon Sep 17 00:00:00 2001
From: DavAug
Date: Sat, 17 Feb 2024 13:37:54 +0100
Subject: [PATCH] some doc tweaks

---
 .../fitting_models_to_data.rst | 161 +++++++++++-------
 1 file changed, 100 insertions(+), 61 deletions(-)

diff --git a/docs/source/getting_started/fitting_models_to_data.rst b/docs/source/getting_started/fitting_models_to_data.rst
index 1a80b3ac..00e49373 100644
--- a/docs/source/getting_started/fitting_models_to_data.rst
+++ b/docs/source/getting_started/fitting_models_to_data.rst

Fitting mechanistic models to data
**********************************

In the previous tutorial, :doc:`mechanistic_model`, we have seen how we can
implement and simulate treatment response models in Chi. For example, we can
simulate the time course of drug concentration levels by 1. implementing a
model in SBML; 2. setting the route of administration and the dosing regimen;
and 3. simulating the drug concentration for a given set of model parameters.

.. literalinclude:: code/3_fitting_models_1.py
    :lines: 112-146

.. raw:: html
    :file: images/3_fitting_models_1.html

This functionality to simulate treatment responses is pretty cool in its own
right! For example, it can help us to study nonlinearities in the treatment
response dynamics and to optimise dosing regimens to target a desired
treatment response.

However, at this point, the simulated treatment responses have little to do
with real treatment responses. To describe *real* treatment responses that we
may observe in clinical practice, we need to somehow connect our model to
reality.

The most common approach to relate models to real treatment responses is to
compare the model predictions to measurements. Below, we have prepared an
example dataset with drug concentration measurements (Dataset_1_). The file
contains two columns that specify measurement times and dose administration
times (``Time``, ``Time unit``), three columns that specify the measured
values (``Observable``, ``Value``, ``Observable unit``), and three columns
that specify dose administrations (``Dose``, ``Duration``, ``Dose unit``).

After downloading the file and saving it in the same directory as the Python
script, we can visualise the measurements by executing the script below.

.. literalinclude:: code/3_fitting_models_1.py

.. raw:: html
    :file: images/3_fitting_models_2.html

The figure shows the drug concentration measurements as blue scatter points.
The treatment response dynamics indicated by the measurements are similar to
the simulated treatment response in the previous code block. But looking more
closely at the magnitude of the values, it appears that the measured values
are much smaller than the simulated ones. We can therefore conclude that, at
this point, our model does not provide an accurate description of the
measured treatment response.
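For readers following along without the tutorial script at hand, a minimal
sketch of this visualisation step could look as follows. The filename
``dataset_1.csv`` and the use of the ``Time`` and ``Value`` columns are
assumptions based on the dataset description above; adapt them to the
downloaded file.

.. code-block:: python

    import pandas as pd
    import plotly.graph_objects as go

    # Hypothetical filename: use the name of the downloaded CSV file
    data = pd.read_csv('dataset_1.csv')

    # Dose administration rows carry no measured value, so we keep only
    # the rows with drug concentration measurements
    measurements = data.dropna(subset=['Value'])

    # Plot the measured drug concentrations over time
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=measurements['Time'],
        y=measurements['Value'],
        mode='markers',
        name='Measurements',
    ))
    fig.update_layout(
        xaxis_title='Time',
        yaxis_title='Drug concentration',
        template='plotly_white',
    )
    fig.show()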
To find a better description of the treatment response, we have two options:
1. we can try to find parameter values that improve the proximity of the
model output to the measurements; or 2. we can define a new mechanistic model
and see whether this new model is able to describe the measurements better.
This tutorial will be about the former and will detail how we can find better
model parameters for a given model structure.

Estimating model parameters from data: Background
*************************************************

Measurements of real treatment responses are subject to measurement noise.
We can model this noise by combining the mechanistic model output with an
error model, which together define a distribution of measurement values.
For simulation, this distribution can be used to sample measurement values
and imitate the measurement process of real treatment responses, see Section
1.3 in the :doc:`quick_overview` for an example. For parameter estimation,
the distribution can be used to quantify the likelihood with which the
observed measurements would have been generated by our model, see Section
1.4 in the :doc:`quick_overview`. To account for measurement noise during
the parameter estimation, we therefore choose to quantify the closeness
between the model output and the measurements using likelihoods.

Formally, we denote the measurement distribution by
:math:`p(y | \psi, t, r)`, where :math:`y` denotes the measurement value,
:math:`\psi` denotes the model parameters, :math:`t` denotes the time point
of the measurement, and :math:`r` denotes the administered dosing regimen.
With this measurement distribution in place, we can quantify the likelihood
with which any given set of measurements would have been generated by the
model. For example, the likelihood of a measurement :math:`y_1` at time
:math:`t_1` under dosing regimen :math:`r^*` is defined by the value of the
probability density of the measurement distribution evaluated at the
measurement, :math:`p(y_1 | \psi, t_1, r^*)`. Note that this likelihood
depends on the choice of model parameters, :math:`\psi`. The model
parameters with the maximum likelihood are the parameter values that most
closely describe the measurements.
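To make this concrete, the snippet below evaluates the likelihood of a
single measurement under a Gaussian measurement distribution. All numerical
values are hypothetical and purely for illustration.

.. code-block:: python

    from scipy.stats import norm

    # Hypothetical values: the model output and the width of a Gaussian
    # measurement distribution at time t1 under some dosing regimen r*
    c_1 = 0.25     # simulated drug concentration at t1
    sigma = 0.05   # width of the measurement distribution
    y_1 = 0.23     # measured drug concentration at t1

    # The likelihood is the probability density of the measurement
    # distribution evaluated at the measurement
    likelihood = norm.pdf(y_1, loc=c_1, scale=sigma)
    print(likelihood)  # larger values indicate a closer description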
.. note::
    The measurement distribution, :math:`p(y | \psi, t, r)`, is defined by
    the mechanistic model output and the error model. To illustrate this,
    let us denote the simulated drug concentration values of the
    1-compartment PK model by :math:`c(\psi, t, r)`. By definition of the
    model, the drug concentration values, :math:`c`, are a function of the
    model parameters, :math:`\psi = (a_0, k_a, k_e, v)`, the time,
    :math:`t`, and the dosing regimen, :math:`r`.

    Combining the model output with an error model, for example a Gaussian
    error model with standard deviation :math:`\sigma`, defines the
    measurement distribution

    .. math::
        p(y | \psi, t, r) =
        \mathcal{N}\left( y \, |\, c(\psi, t, r), \sigma ^2 \right),

    where the error model extends the set of model parameters to
    :math:`\psi = (a_0, k_a, k_e, v, \sigma)`. The likelihood of a
    measurement is then given by the value of the probability density.

Assuming independence of measurements, the likelihood for a dataset with
:math:`n` measurements,
:math:`\mathcal{D}=((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n), r)`, is
given by the product of the individual likelihoods

.. math::
    p(\mathcal{D} | \psi) = \prod _{i=1}^{n} p(y_i | \psi, t_i, r).

This assumes that the measurements are independently and identically
distributed according to our model of the measurement process (this does not
have to be the case, and is especially unlikely to be the case when our
model fails to describe the measurement process accurately).

With this likelihood in place, Chi makes it straightforward to run numerical
optimisation algorithms in order to derive the maximum likelihood estimates
that best describe the measurements, see Section 1.4.1 in the
:doc:`quick_overview`.

Alternatively, Chi also supports Bayesian inference of the model parameters.
Bayesian inference is conceptually different from maximum likelihood
estimation, as it does not seek to find a single set of model parameters
that "best" describes the observed measurements. Instead, Bayesian inference
acknowledges the fact that noisy measurements leave uncertainty about which
model parameters best describe the data, and therefore focuses instead on
deriving a distribution of parameter values which are all consistent with
the observed measurements, see Section 1.4.2 in :doc:`quick_overview` for a
more detailed discussion. This distribution is derived from the likelihood
using Bayes' rule

.. math::
    p(\psi| \mathcal{D} ) =
    \frac{p(\mathcal{D}| \psi )\, p(\psi)}{p(\mathcal{D} )},

where :math:`p(\psi)` denotes the prior distribution of the model
parameters. The prior distribution quantifies our beliefs about feasible or
likely parameter values before the parameter estimation, making it possible
to inform the inference results with otherwise hard-to-quantify domain
knowledge.
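To see how the pieces fit together, the sketch below combines a likelihood
and a prior into an unnormalised log-posterior. This is a schematic
illustration rather than Chi's implementation; the dummy simulator, the
prior choices, and all numerical values are hypothetical.

.. code-block:: python

    import numpy as np
    import pints
    from scipy.stats import norm

    def log_likelihood(psi, simulate, times, measurements):
        """Computes log p(D | psi) for i.i.d. Gaussian measurement noise.

        psi contains the mechanistic model parameters plus the error
        scale sigma; simulate stands in for the mechanistic model
        simulation, c(psi, t, r).
        """
        *model_parameters, sigma = psi
        outputs = simulate(model_parameters, times)
        return np.sum(norm.logpdf(measurements, loc=outputs, scale=sigma))

    # Prior distribution of the parameters (one 1-dimensional prior per
    # parameter; the choices below are purely illustrative)
    log_prior = pints.ComposedLogPrior(
        pints.GaussianLogPrior(10, 2),    # initial drug amount
        pints.LogNormalLogPrior(1, 1),    # absorption rate
        pints.LogNormalLogPrior(1, 1),    # elimination rate
        pints.LogNormalLogPrior(1, 1),    # volume of distribution
        pints.LogNormalLogPrior(-1, 1),   # error scale sigma
    )

    def log_posterior(psi, simulate, times, measurements):
        """Computes log p(psi | D) up to the constant -log p(D)."""
        return (
            log_likelihood(psi, simulate, times, measurements)
            + log_prior(psi))

    # Dummy simulator (a simple exponential decay) for illustration
    def simulate(parameters, times):
        a_0, k_a, k_e, v = parameters
        return a_0 * np.exp(-k_e * np.asarray(times)) / v

    times = [1.0, 2.0, 4.0]
    measurements = [2.1, 1.4, 0.6]
    print(log_posterior([10, 2, 0.5, 3, 0.1], simulate, times, measurements))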
Defining the log-posterior
**************************

In Chi, we can derive posterior distributions from measurements using Markov
chain Monte Carlo (MCMC) algorithms implemented in the open source Python
package Pints_. In Sections 1.4.1 and 1.4.2 in the :doc:`quick_overview`, we
showed in some detail how we can define (log-)posterior distributions,
:class:`chi.LogPosterior`, for this purpose. Here, we want to show how we
can use the :class:`chi.ProblemModellingController` to simplify and automate
this process as much as possible.

The tricky bit when implementing log-posteriors for treatment response
models is that log-posteriors do not only depend on the treatment response
measurements, :math:`((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n))`, but also
on the administered dosing regimen, :math:`r`. This can make it tedious to
define log-posteriors, especially when parameters are inferred from
measurements of multiple individuals with different dosing regimens, as the
dosing regimens of the model have to be set manually to the dosing regimens
administered during the measurement process. A mismatch between the dosing
regimens would render the inference results invalid.

To eliminate this source of error, we have implemented the
:class:`chi.ProblemModellingController`. The
:class:`chi.ProblemModellingController` facilitates the construction of
log-posteriors and reduces the workflow to a simple 4-step approach:
1. defining the mechanistic model; 2. defining the error model; 3. setting
the data; and 4. setting the prior distribution. A sketch of these steps is
shown in the code example below.

The prior distribution deserves some care. For 4 of the model parameters
(the absorption rate, the elimination rate, the volume of distribution, and
the scale parameter of the error model), only positive values make
biological sense, so we choose prior distributions that focus on positive
values. For two of these parameters, the volume of distribution and the
scale parameter, negative or zero values are particularly bad, as they will
break the simulation (1. a volume of zero causes a division-by-zero error;
and 2. the lognormal distribution is only defined for positive sigma). We
therefore use ``pints.LogNormalLogPrior`` to constrain those parameters to
strictly positive values.
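Putting the four steps together, the construction of the log-posterior could
look as follows. This is a sketch: the method names follow the
:class:`chi.ProblemModellingController` interface, but the error model
choice, the prior values, and the filename are assumptions for illustration,
so consult the API reference for the exact signatures.

.. code-block:: python

    import chi
    import pandas as pd
    import pints

    # Step 1: define the mechanistic model. Here, the 1-compartment PK
    # model implemented in the previous tutorial is assumed (omitted for
    # brevity)
    mechanistic_model = ...

    # Step 2: define the error model. A lognormal error model matches the
    # scale parameter discussed above
    error_model = chi.LogNormalErrorModel()
    problem = chi.ProblemModellingController(mechanistic_model, error_model)

    # Step 3: set the data. The time, observable and dose columns tell the
    # controller when measurements were taken and which dosing regimens
    # were administered
    data = pd.read_csv('dataset_1.csv')  # hypothetical filename
    problem.set_data(data)

    # Step 4: set the prior distribution (one 1-dimensional prior per
    # parameter; the values are illustrative)
    log_prior = pints.ComposedLogPrior(
        pints.GaussianLogPrior(10, 2),    # initial drug amount
        pints.LogNormalLogPrior(1, 1),    # absorption rate
        pints.LogNormalLogPrior(1, 1),    # elimination rate
        pints.LogNormalLogPrior(1, 1),    # volume of distribution
        pints.LogNormalLogPrior(-1, 1),   # scale of the error model
    )
    problem.set_log_prior(log_prior)

    # Retrieve the log-posterior for the inference
    log_posterior = problem.get_log_posterior()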
Inferring the posterior distribution
************************************

With this :class:`chi.LogPosterior` in place, we can infer the posterior
distribution using any MCMC algorithm of our choice. Recall that inference,
in this context, means the reconstruction of the posterior distribution from
the :class:`chi.LogPosterior`, which defines the log-pdf of the posterior
distribution up to an unknown constant shift

.. math::
    \log p(\psi| \mathcal{D} ) = \log p(\mathcal{D}| \psi ) + \log p(\psi)
    + \text{constant}.

Comparing this to Bayes' rule above, you will find that
:math:`\text{constant} = -\log p(\mathcal{D})`, so the constant shift may
not seem so 'unknown' after all. However, *unknown* here does not mean that
we cannot write down an expression for it; rather, it refers to the fact
that for most treatment response models :math:`p(\mathcal{D})` is
practically impossible to evaluate, as it requires the numerical integration
of the likelihood-prior product over the full parameter space,

.. math::
    p(\mathcal{D}) = \int \mathrm{d} \psi \, p(\mathcal{D}, \psi )
    = \int \mathrm{d} \psi \, p(\mathcal{D}| \psi )\, p(\psi).

This renders the value of the constant shift for all intents and purposes
unknown.

The unknown shift makes it impossible to make statements about the absolute
probability of parameter values. However, it still allows for relative
comparisons of probabilities, a fact that MCMC algorithms exploit to
circumvent the limitation of the partially known log-posterior. MCMC
algorithms use relative comparisons of parameter probabilities to generate
random samples from the posterior distribution, opening a gateway to
reconstructing the distribution: the more samples are generated, the more
closely the histogram over the samples approximates the posterior
distribution. In fact, one can show that the histogram converges to the
posterior distribution as the number of samples approaches infinity. This
makes it possible for MCMC algorithms to reconstruct any posterior
distribution from a :class:`chi.LogPosterior`.

To illustrate this, let us run an MCMC algorithm to infer the above defined
posterior distribution of the 1-compartment PK model.
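A minimal sketch of this final step could look as follows, reusing the
``log_posterior`` and ``log_prior`` objects from the controller sketch
above. The sampler choice and the number of iterations are illustrative;
Chi's log-posteriors are designed to be passed directly to Pints' samplers.

.. code-block:: python

    import pints

    # Initialise 3 chains at random samples from the prior
    n_chains = 3
    x0 = log_prior.sample(n_chains)

    # Run an adaptive covariance MCMC algorithm (sampler choice and
    # number of iterations are illustrative)
    controller = pints.MCMCController(
        log_posterior, n_chains, x0, method=pints.HaarioBardenetACMC)
    controller.set_max_iterations(10000)
    chains = controller.run()

    # chains has shape (n_chains, n_iterations, n_parameters). After
    # discarding a warm-up portion, the histogram over the samples
    # approximates the posterior distribution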