DOC: Numericalisation-genotype

Documentation for modifying the numericalisation of genotypes
tskit-dev · Feb 20, 2024 · commit c3892a5
1 parent 0cd2752

Showing 3 changed files with 34 additions and 2 deletions.
diff --git a/docs/effect-size.md b/docs/effect-size.md
@@ -41,7 +41,8 @@ tstrait simulates a vector of quantitative trait $y$ from the following additive
 y = X\beta+\epsilon,
 ```
 
-where $X$ is the matrix that describes the number of causal alleles in each individual, $\beta$
+where $X$ is the matrix that describes the number of causal alleles in each individual (the values
+in each row will be $0$, $1$, or $2$ in the diploid setting, for example), $\beta$
 is the vector of effect sizes, and $\epsilon$ is the vector of environmental noise. Environmental
 noise is simulated from the following distribution,
 
@@ -58,6 +59,7 @@ regardless of ploidy.
 :::{seealso}
 - [](genetic_value_doc) for obtaining the genetic value $X\beta$.
 - [](environment_noise) for simulating environmental noise $\epsilon$.
+- [](numericalise_genotype) for modifying the numericalisation of genotypes.
 :::
 
 In this documentation, we will be describing how to simulate effect sizes in tstrait.

diff --git a/docs/genetic.md b/docs/genetic.md
@@ -191,3 +191,33 @@ code before inputting it inside the {py:func}`genetic_value` function.
 genetic_df = tstrait.genetic_value(ts, trait_df)
 genetic_df.head()
 ```
+
+(numericalise_genotype)=
+
+# Numericalisation of Genotypes
+
+Genotypes are numericalised as the number of causal alleles in each
+individual (see [](phenotype_model) for the mathematical details of the phenotype
+model), but the numericalisation can be changed by adjusting the genetic value
+dataframe using the effect size dataframe. For example, in the diploid setting,
+to simulate phenotypes under the genotype coding $(aa=-1, Aa=0, AA=1)$, where $A$
+is the causal allele, every causal allele count is reduced by $1$, which lowers
+each individual's genetic value by the sum of effect sizes. It is therefore enough
+to subtract that sum from each genetic value, as in the following example:
+
+```{code-cell}
+
+trait_df = tstrait.sim_trait(ts, num_causal=3, model=model, random_seed=5)
+genetic_df = tstrait.genetic_value(ts, trait_df)
+
+# The original dataframe
+genetic_df.head()
+```
+
+```{code-cell}
+
+genetic_df["genetic_value"] = genetic_df["genetic_value"] - trait_df["effect_size"].sum()
+# New dataframe
+genetic_df.head()
+```
+
+The new genetic value dataframe can be used in {py:func}`sim_env` to simulate phenotypes.
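
Note on the hunk above: the subtraction works because recoding $(aa=-1, Aa=0, AA=1)$ replaces each allele count $X_{ij}$ with $X_{ij}-1$, so each individual's genetic value becomes $\sum_j (X_{ij}-1)\beta_j = \sum_j X_{ij}\beta_j - \sum_j \beta_j$. The sketch below checks this identity in plain Python. It does not call tstrait; the genotype matrix `X`, the effect sizes `beta`, and the helper `genetic_values` are hypothetical values and names chosen only for illustration.

```python
import math

# Hypothetical effect sizes for three causal sites (illustrative values only)
beta = [0.5, -1.25, 2.0]

# Hypothetical causal-allele counts (0, 1 or 2) for four diploid individuals
X = [
    [0, 1, 2],
    [2, 2, 0],
    [1, 0, 1],
    [0, 0, 0],
]


def genetic_values(genotypes, effects):
    """Return the additive genetic value (row dot effects) per individual."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


# Default coding: genotype = number of causal alleles
default_values = genetic_values(X, beta)

# Recoded genotypes (aa=-1, Aa=0, AA=1): subtract 1 from every count
X_recoded = [[g - 1 for g in row] for row in X]
recoded_values = genetic_values(X_recoded, beta)

# Shortcut used in the documentation: subtract the sum of effect sizes
shifted_values = [v - sum(beta) for v in default_values]

# Both routes give the same genetic values
assert all(math.isclose(a, b) for a, b in zip(recoded_values, shifted_values))
```

The same argument applies to any coding that shifts every allele count by a constant $c$: the genetic values shift by $c$ times the sum of effect sizes.
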
diff --git a/docs/quick-start.md b/docs/quick-start.md
@@ -157,7 +157,7 @@ The simulated phenotypes can be scaled by using the {func}`normalise_phenotypes`
 will first normalise the phenotype by subtracting the mean of the input phenotype from each
 value and dividing it by the standard deviation of the input phenotype.
 Afterwards, it scales the normalised phenotype based on the mean and variance input.
-The output of {func}`normalise_phenotype` is a {class}`pandas.DataFrame` object with the scaled phenotypes.
+The output of {func}`normalise_phenotypes` is a {class}`pandas.DataFrame` object with the scaled phenotypes.
 
 An example usage of this function is shown below: