DOC: Numericalisation-genotype
Documentation for modifying the numericalisation of genotypes
daikitag committed Feb 20, 2024
1 parent 0cd2752 commit c3892a5
Showing 3 changed files with 34 additions and 2 deletions.
4 changes: 3 additions & 1 deletion docs/effect-size.md
@@ -41,7 +41,8 @@ tstrait simulates a vector of quantitative trait $y$ from the following additive
y = X\beta+\epsilon,
```

-where $X$ is the matrix that describes the number of causal alleles in each individual, $\beta$
+where $X$ is the matrix that describes the number of causal alleles in each individual (in the
+diploid setting, for example, each entry is $0$, $1$, or $2$), $\beta$
is the vector of effect sizes, and $\epsilon$ is the vector of environmental noise. Environmental
noise is simulated from the following distribution,

@@ -58,6 +59,7 @@ regardless of ploidy.
:::{seealso}
- [](genetic_value_doc) for obtaining the genetic value $X\beta$.
- [](environment_noise) for simulating environmental noise $\epsilon$.
- [](numericalise_genotype) for modifying the numericalisation of genotypes.
:::
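The additive model $y = X\beta + \epsilon$ described above can be sketched with NumPy on a toy example. The genotype matrix, effect sizes, and seed below are made up for illustration and are not tstrait output; the standard normal noise is only a placeholder and is not the environmental noise distribution that tstrait uses.

```python
import numpy as np

# Hypothetical toy data: 4 diploid individuals (rows) x 3 causal sites (columns).
# Each entry counts the causal alleles an individual carries at a site (0, 1 or 2).
X = np.array(
    [
        [0, 1, 2],
        [1, 1, 0],
        [2, 0, 1],
        [0, 0, 0],
    ]
)

# Hypothetical effect sizes, one per causal site.
beta = np.array([0.5, -1.0, 2.0])

# Genetic value X @ beta: effect sizes weighted by causal allele counts.
genetic_value = X @ beta  # [3.0, -0.5, 3.0, 0.0]

# Placeholder environmental noise (standard normal), for illustration only.
rng = np.random.default_rng(seed=1)
epsilon = rng.normal(loc=0.0, scale=1.0, size=X.shape[0])

# Quantitative trait under the additive model.
y = genetic_value + epsilon
```

Each row of `X` holds one individual's causal allele counts, so the genetic value of an individual is the dot product of that row with `beta`.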

This page describes how to simulate effect sizes in tstrait.
30 changes: 30 additions & 0 deletions docs/genetic.md
@@ -191,3 +191,33 @@ code before inputting it inside the {py:func}`genetic_value` function.
genetic_df = tstrait.genetic_value(ts, trait_df)
genetic_df.head()
```

(numericalise_genotype)=

# Numericalisation of Genotypes

By default, genotypes are numericalised as the number of causal alleles that each
individual carries (see [](phenotype_model) for the mathematical details of the phenotype
model). A different numericalisation can be obtained by adjusting the genetic value
dataframe using the effect size dataframe. For example, in the diploid setting, suppose
you want to simulate phenotypes with the genotype encoding $(aa=-1, Aa=0, AA=1)$, where
$A$ is the causal allele. This encoding subtracts $1$ from every causal allele count, so
the genetic value of each individual decreases by exactly the sum of the effect sizes,
since $(X - 1)\beta = X\beta - \sum_j \beta_j$. We can therefore simply subtract the sum
of effect sizes from the genetic value, as in the following example:

```{code-cell}
trait_df = tstrait.sim_trait(ts, num_causal=3, model=model, random_seed=5)
genetic_df = tstrait.genetic_value(ts, trait_df)
# Genetic values under the default encoding (aa=0, Aa=1, AA=2)
genetic_df.head()
```

```{code-cell}
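# Subtract the sum of effect sizes from every genetic value
genetic_df["genetic_value"] = genetic_df["genetic_value"] - trait_df["effect_size"].sum()
# Genetic values under the (aa=-1, Aa=0, AA=1) encoding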
genetic_df.head()
```
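The identity behind this subtraction, $(X - 1)\beta = X\beta - \sum_j \beta_j$, can be checked numerically. The sketch below uses made-up diploid genotypes and effect sizes in plain NumPy rather than tstrait, and compares genetic values computed directly under the $(-1, 0, 1)$ encoding with the shifted default genetic values.

```python
import numpy as np

# Hypothetical diploid genotypes coded as causal allele counts (aa=0, Aa=1, AA=2).
X = np.array(
    [
        [0, 1, 2],
        [2, 2, 1],
        [1, 0, 0],
    ]
)
# Hypothetical effect sizes, one per causal site.
beta = np.array([0.3, -1.2, 0.8])

# Recode to aa=-1, Aa=0, AA=1 by subtracting 1 from every count.
X_centred = X - 1

# Genetic values under the default and the recoded encodings.
original = X @ beta
recoded = X_centred @ beta

# Subtracting the sum of effect sizes from the default genetic values
# reproduces the recoded genetic values (up to floating-point rounding).
shifted = original - beta.sum()
assert np.allclose(recoded, shifted)
```

More generally, subtracting a constant $c$ from every allele count shifts each genetic value by $c\sum_j \beta_j$.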

The new genetic value dataframe can be used in {py:func}`sim_env` to simulate phenotypes.
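The subtraction of `trait_df["effect_size"].sum()` in the example above treats all effect sizes as belonging to one trait. If `trait_df` holds effect sizes for several traits, that single sum would mix the traits together. The following pandas sketch assumes that both dataframes carry a `trait_id` column (as tstrait's outputs do) and subtracts each trait's own sum instead; the small dataframes are made up and only mimic the column names.

```python
import pandas as pd

# Hypothetical toy dataframes mimicking the column names used above:
# one row per causal site per trait, and one row per individual per trait.
trait_df = pd.DataFrame(
    {
        "trait_id": [0, 0, 1, 1],
        "effect_size": [0.5, -1.0, 2.0, 0.25],
    }
)
genetic_df = pd.DataFrame(
    {
        "individual_id": [0, 1, 0, 1],
        "trait_id": [0, 0, 1, 1],
        "genetic_value": [1.0, -0.5, 4.0, 2.25],
    }
)

# Sum the effect sizes separately for each trait.
effect_sum = trait_df.groupby("trait_id")["effect_size"].sum()

# Subtract the matching per-trait sum from each row's genetic value.
genetic_df["genetic_value"] = genetic_df["genetic_value"] - genetic_df["trait_id"].map(
    effect_sum
)
```

`Series.map` looks up each row's `trait_id` in the index of `effect_sum`, so every individual is shifted by the sum for its own trait only.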
2 changes: 1 addition & 1 deletion docs/quick-start.md
@@ -157,7 +157,7 @@ The simulated phenotypes can be scaled by using the {func}`normalise_phenotypes`
will first normalise the phenotype by subtracting the mean of the input phenotype from each
value and dividing the result by the standard deviation of the input phenotype.
Afterwards, it scales the normalised phenotype to the mean and variance given as input.
-The output of {func}`normalise_phenotype` is a {class}`pandas.DataFrame` object with the scaled phenotypes.
+The output of {func}`normalise_phenotypes` is a {class}`pandas.DataFrame` object with the scaled phenotypes.
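The two steps described above (standardise, then rescale) can be sketched in NumPy. `normalise_sketch` is a hypothetical helper, not tstrait's implementation, and it assumes the population standard deviation (NumPy's default `ddof=0`), which may differ from what `normalise_phenotypes` uses internally.

```python
import numpy as np


def normalise_sketch(phenotype, mean=0.0, var=1.0):
    """Hypothetical helper mirroring the normalisation steps described above."""
    phenotype = np.asarray(phenotype, dtype=float)
    # Step 1: subtract the mean and divide by the standard deviation (ddof=0).
    z = (phenotype - phenotype.mean()) / phenotype.std()
    # Step 2: rescale so the output has the requested mean and variance.
    return z * np.sqrt(var) + mean


# Made-up phenotypes rescaled to mean 10 and variance 4.
scaled = normalise_sketch([1.0, 2.0, 3.0, 6.0], mean=10.0, var=4.0)
```

Multiplying by the square root of the target variance (not the variance itself) is what makes the output variance equal `var`.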

An example usage of this function is shown below:
