Merge pull request #251 from monarch-initiative/documentation
updating documentation (WIP)
Showing 25 changed files with 588 additions and 523 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.. _diagnosis-predicate: | ||
|
||
======================== | ||
Partition by a diagnosis | ||
======================== | ||
|
||
It is also possible to bin the individuals based on a diagnosis. | ||
The :func:`~gpsea.analysis.predicate.genotype.diagnosis_predicate` | ||
prepares a genotype predicate for assigning an individual into a diagnosis group: | ||
|
||
>>> from gpsea.analysis.predicate.genotype import diagnosis_predicate | ||
>>> gt_predicate = diagnosis_predicate( | ||
... diagnoses=('OMIM:154700', 'OMIM:129600'), | ||
... labels=('Marfan syndrome', 'Ectopia lentis, familial'), | ||
... ) | ||
>>> gt_predicate.display_question() | ||
'What disease was diagnosed: OMIM:154700, OMIM:129600' | ||
|
||
Note that an individual must match exactly one diagnosis group. Any individual labeled with two or more of the target diagnoses | ||
(e.g. an individual with both *Marfan syndrome* and *Ectopia lentis, familial*) | ||
will be automatically omitted from the analysis. |
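The "exactly one diagnosis group" rule can be illustrated with a small, self-contained sketch. This is not the GPSEA implementation: ``assign_diagnosis_group`` is a hypothetical helper, and treating an individual with none of the target diagnoses as omitted is an assumption made only for this illustration.

```python
from typing import Iterable, Optional, Sequence


def assign_diagnosis_group(
    individual_diagnoses: Iterable[str],
    target_diagnoses: Sequence[str],
) -> Optional[str]:
    """Return the single matching target diagnosis, or None to omit the individual.

    Hypothetical helper for illustration only, not part of GPSEA.
    """
    present = set(individual_diagnoses)
    matches = [d for d in target_diagnoses if d in present]
    # Exactly one match is required. Two or more matches lead to omission,
    # and (by assumption in this sketch) so does having no match at all.
    return matches[0] if len(matches) == 1 else None


targets = ('OMIM:154700', 'OMIM:129600')
marfan_only = assign_diagnosis_group(['OMIM:154700'], targets)
both = assign_diagnosis_group(['OMIM:154700', 'OMIM:129600'], targets)
```

In this sketch, ``marfan_only`` is ``'OMIM:154700'``, while ``both`` is ``None`` because the individual matches two diagnosis groups.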
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
.. _filtering-predicate: | ||
|
||
|
||
=================== | ||
Filtering predicate | ||
=================== | ||
|
||
Sometimes a predicate bins individuals into more genotype groups than necessary, and there may be a need | ||
to consider only a subset of the groups. A `GenotypePolyPredicate` | ||
created by :func:`~gpsea.analysis.predicate.genotype.filtering_predicate` retains only | ||
the target categorizations of interest. | ||
|
||
Example | ||
------- | ||
|
||
Let's suppose we want to test the genotype-phenotype association for variants | ||
that lead to a frameshift or a stop gain in a fictional transcript `NM_1234.5`, | ||
and we are specifically interested in comparing the heterozygous genotype | ||
with the biallelic alternative allele genotypes (homozygous alternate and compound heterozygous). | ||
|
||
First, we set up a :class:`~gpsea.analysis.predicate.genotype.VariantPredicate` | ||
for testing if a variant introduces a premature stop codon or leads to the shift of the reading frame: | ||
|
||
>>> from gpsea.model import VariantEffect | ||
>>> from gpsea.analysis.predicate.genotype import VariantPredicates | ||
>>> tx_id = 'NM_1234.5' | ||
>>> is_frameshift_or_stop_gain = VariantPredicates.variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id) \ | ||
... | VariantPredicates.variant_effect(VariantEffect.STOP_GAINED, tx_id) | ||
>>> is_frameshift_or_stop_gain.get_question() | ||
'(FRAMESHIFT_VARIANT on NM_1234.5 OR STOP_GAINED on NM_1234.5)' | ||
|
||
Then, we use :meth:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate.autosomal_recessive` | ||
to create a predicate that bins individuals by genotype group: | ||
|
||
>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate | ||
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_recessive(is_frameshift_or_stop_gain) | ||
>>> gt_predicate.display_question() | ||
'What is the genotype group: HOM_REF, HET, BIALLELIC_ALT' | ||
|
||
We see that the `gt_predicate` bins the patients into three groups: | ||
|
||
>>> cats = gt_predicate.get_categorizations() | ||
>>> cats | ||
(Categorization(category=HOM_REF), Categorization(category=HET), Categorization(category=BIALLELIC_ALT)) | ||
|
||
We pass the `gt_predicate` and the categorizations of interest to the `filtering_predicate` function | ||
to get a :class:`~gpsea.analysis.predicate.genotype.GenotypePolyPredicate` | ||
that includes only the categories of interest: | ||
|
||
>>> from gpsea.analysis.predicate.genotype import filtering_predicate | ||
>>> fgt_predicate = filtering_predicate( | ||
... predicate=gt_predicate, | ||
... targets=(cats[1], cats[2]), | ||
... ) | ||
>>> fgt_predicate.display_question() | ||
'What is the genotype group: HET, BIALLELIC_ALT' |
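The idea behind filtering can be sketched without GPSEA as a thin wrapper around an existing categorizing function. The names ``genotype_group`` and ``make_filtering_predicate`` are hypothetical, and returning ``None`` for a category outside the targets is an assumption of this sketch rather than a statement about the library's internals.

```python
from typing import Callable, Collection, Optional


def genotype_group(alt_allele_count: int) -> Optional[str]:
    """Toy stand-in for a genotype predicate: map an allele count to a group."""
    return {0: 'HOM_REF', 1: 'HET', 2: 'BIALLELIC_ALT'}.get(alt_allele_count)


def make_filtering_predicate(
    predicate: Callable[[int], Optional[str]],
    targets: Collection[str],
) -> Callable[[int], Optional[str]]:
    """Wrap `predicate` so that only the categories in `targets` are kept."""

    def filtered(alt_allele_count: int) -> Optional[str]:
        category = predicate(alt_allele_count)
        # Categories outside the targets are dropped (assumption: signalled by None).
        return category if category in targets else None

    return filtered


filtered_group = make_filtering_predicate(
    genotype_group, targets=('HET', 'BIALLELIC_ALT'),
)
kept = filtered_group(1)
dropped = filtered_group(0)
```

Here ``kept`` is ``'HET'``, whereas ``dropped`` is ``None`` because ``HOM_REF`` is not among the targets.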
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
.. _genotype-predicates: | ||
|
||
=================== | ||
Genotype Predicates | ||
=================== | ||
|
||
|
||
A genotype predicate seeks to divide the individuals along an axis that is orthogonal to phenotypes. | ||
Typically, this involves using the genotype data, such as the presence of a missense variant | ||
in a heterozygous genotype. However, other categorical variables, | ||
such as diagnoses (see :ref:`diagnosis-predicate`) or cluster ids, can also be used. | ||
|
||
The genotype predicates test the individual for the presence of variants that meet certain inclusion criteria. | ||
The testing is done in two steps. First, we count the alleles | ||
of the matching variants. Then, we interpret the count, possibly including factors | ||
such as the expected mode of inheritance and sex, to assign the individual into a group. | ||
Finding the matching variants is the job of | ||
the :class:`~gpsea.analysis.predicate.genotype.VariantPredicate`. | ||
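The two steps (count the alleles of matching variants, then interpret the count) can be sketched in plain Python. ``ToyVariant``, ``count_matching_alleles`` and ``interpret_autosomal_recessive`` are hypothetical names used only for illustration; the group labels follow the autosomal recessive example in this documentation, and returning ``None`` for counts above two is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ToyVariant:
    """Minimal stand-in for a variant observed in one individual."""
    effect: str
    alt_allele_count: int  # 1 for heterozygous, 2 for homozygous alternate


def count_matching_alleles(
    variants: Sequence[ToyVariant],
    variant_predicate: Callable[[ToyVariant], bool],
) -> int:
    """Step 1: sum the alternate alleles of the variants that pass the predicate."""
    return sum(v.alt_allele_count for v in variants if variant_predicate(v))


def interpret_autosomal_recessive(allele_count: int) -> Optional[str]:
    """Step 2: interpret the count under an autosomal recessive model.

    Counts above two yield None in this sketch (an assumption).
    """
    return {0: 'HOM_REF', 1: 'HET', 2: 'BIALLELIC_ALT'}.get(allele_count)


def is_frameshift_or_stop_gain(variant: ToyVariant) -> bool:
    return variant.effect in ('FRAMESHIFT_VARIANT', 'STOP_GAINED')


# One matching heterozygous variant plus one non-matching variant.
carrier = [ToyVariant('FRAMESHIFT_VARIANT', 1), ToyVariant('MISSENSE_VARIANT', 1)]
# Two different matching heterozygous variants (compound heterozygous).
compound_het = [ToyVariant('FRAMESHIFT_VARIANT', 1), ToyVariant('STOP_GAINED', 1)]

carrier_group = interpret_autosomal_recessive(
    count_matching_alleles(carrier, is_frameshift_or_stop_gain))
compound_het_group = interpret_autosomal_recessive(
    count_matching_alleles(compound_het, is_frameshift_or_stop_gain))
```

In this sketch, ``carrier_group`` is ``'HET'`` and ``compound_het_group`` is ``'BIALLELIC_ALT'``. Keeping the counting separate from the interpretation lets the same variant predicate be reused under different interpretation rules.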
|
||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:caption: Contents: | ||
|
||
variant_predicates | ||
mode_of_inheritance_predicate | ||
filtering_predicate | ||
male_female_predicate | ||
diagnosis_predicate | ||
groups_predicate | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
.. _groups-predicate: | ||
|
||
================ | ||
Groups Predicate | ||
================ | ||
|
||
|
||
|
||
Sometimes, all we want is to test whether there is a difference between individuals | ||
who carry one or more alleles of variants `X` vs. individuals with variants `Y` | ||
vs. individuals with variants `Z`, where `X`, `Y` and `Z` are variant predicates. | ||
We can do this with a *groups* predicate. | ||
|
||
The :func:`~gpsea.analysis.predicate.genotype.groups_predicate` | ||
takes *n* variant predicates and *n* group labels, and it assigns each patient | ||
into the respective group if one or more matching alleles are found. | ||
However, only one predicate is allowed to return a non-zero allele count. | ||
Otherwise, the patient is assigned ``None`` and excluded from the analysis. | ||
|
||
Example | ||
------- | ||
|
||
Here we show how to build a :class:`~gpsea.analysis.predicate.genotype.GenotypePolyPredicate` | ||
for testing if the individual has at least one missense vs. frameshift vs. synonymous variant. | ||
|
||
>>> from gpsea.model import VariantEffect | ||
>>> from gpsea.analysis.predicate.genotype import VariantPredicates, groups_predicate | ||
>>> tx_id = 'NM_1234.5' | ||
>>> gt_predicate = groups_predicate( | ||
... predicates=( | ||
... VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id), | ||
... VariantPredicates.variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id), | ||
... VariantPredicates.variant_effect(VariantEffect.SYNONYMOUS_VARIANT, tx_id), | ||
... ), | ||
... group_names=('Missense', 'Frameshift', 'Synonymous'), | ||
... ) | ||
>>> gt_predicate.display_question() | ||
'Genotype group: Missense, Frameshift, Synonymous' | ||
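The assignment rule (exactly one predicate with a non-zero allele count) can be sketched as follows. ``assign_group`` is a hypothetical helper that works on precomputed allele counts, one per variant predicate; it is not the GPSEA implementation.

```python
from typing import Optional, Sequence


def assign_group(
    allele_counts: Sequence[int],
    group_names: Sequence[str],
) -> Optional[str]:
    """Return the group whose predicate matched, or None if the rule is violated.

    `allele_counts[i]` is the number of alleles matched by the i-th predicate.
    """
    if len(allele_counts) != len(group_names):
        raise ValueError('Need one allele count per group name')
    hits = [name for count, name in zip(allele_counts, group_names) if count > 0]
    # Zero hits or two or more hits both result in None (excluded individual).
    return hits[0] if len(hits) == 1 else None


names = ('Missense', 'Frameshift', 'Synonymous')
only_frameshift = assign_group((0, 2, 0), names)
missense_and_synonymous = assign_group((1, 0, 1), names)
```

Here ``only_frameshift`` is ``'Frameshift'``, while ``missense_and_synonymous`` is ``None`` because two predicates returned a non-zero count.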
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
.. _hpo-predicate: | ||
|
||
|
||
HPO predicate | ||
============= | ||
|
||
When testing for presence or absence of an HPO term, the :class:`~gpsea.analysis.predicate.phenotype.HpoPredicate` | ||
leverages the :ref:`true-path-rule` to take advantage of the HPO hierarchy. | ||
As a result, an individual annotated with a term is implicitly annotated with all its ancestors. | ||
For instance, an individual annotated with `Ectopia lentis <https://hpo.jax.org/browse/term/HP:0001083>`_ | ||
is also annotated with `Abnormal lens morphology <https://hpo.jax.org/browse/term/HP:0000517>`_, | ||
`Abnormal anterior eye segment morphology <https://hpo.jax.org/browse/term/HP:0004328>`_, | ||
`Abnormal eye morphology <https://hpo.jax.org/browse/term/HP:0012372>`_, ... | ||
|
||
Similarly, all descendants of a term, whose presence was specifically excluded in an individual, | ||
are implicitly excluded. | ||
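The propagation in both directions can be sketched with a toy hierarchy. ``TOY_PARENTS`` is a deliberately simplified chain built from the four terms mentioned above; the real HPO contains additional intermediate terms and terms with several parents. The helper names are hypothetical, and the code does not use ``hpotk``.

```python
from typing import Iterable, Mapping, Sequence, Set

# Simplified child -> parents map (assumption: a plain chain for illustration).
TOY_PARENTS: Mapping[str, Sequence[str]] = {
    'HP:0001083': ('HP:0000517',),  # Ectopia lentis
    'HP:0000517': ('HP:0004328',),  # Abnormal lens morphology
    'HP:0004328': ('HP:0012372',),  # Abnormal anterior eye segment morphology
    'HP:0012372': (),               # Abnormal eye morphology
}


def ancestors(term_id: str, parents: Mapping[str, Sequence[str]]) -> Set[str]:
    """Collect all ancestors of `term_id` (the term itself is not included)."""
    seen: Set[str] = set()
    stack = list(parents.get(term_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_present(query: str, observed: Iterable[str], parents) -> bool:
    """Query is present if it equals, or is an ancestor of, an observed term."""
    return any(query == t or query in ancestors(t, parents) for t in observed)


def is_excluded(query: str, excluded: Iterable[str], parents) -> bool:
    """Query is excluded if it equals, or descends from, an excluded term."""
    return any(query == t or t in ancestors(query, parents) for t in excluded)


lens_present = is_present('HP:0000517', ['HP:0001083'], TOY_PARENTS)
ectopia_excluded = is_excluded('HP:0001083', ['HP:0000517'], TOY_PARENTS)
```

In this sketch both ``lens_present`` and ``ectopia_excluded`` are ``True``: an observation propagates upwards to ancestors, and an exclusion propagates downwards to descendants.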
|
||
Example | ||
------- | ||
|
||
Here we show how to set up :class:`~gpsea.analysis.predicate.phenotype.HpoPredicate` | ||
to test for the presence of `Abnormal lens morphology <https://hpo.jax.org/browse/term/HP:0000517>`_. | ||
|
||
We need to load :class:`~hpotk.MinimalOntology` with HPO data to access the HPO hierarchy: | ||
|
||
>>> import hpotk | ||
>>> store = hpotk.configure_ontology_store() | ||
>>> hpo = store.load_minimal_hpo(release='v2024-07-01') | ||
|
||
and now we can set up a predicate to test for presence of *Abnormal lens morphology*: | ||
|
||
>>> from gpsea.analysis.predicate.phenotype import HpoPredicate | ||
>>> query = hpotk.TermId.from_curie('HP:0000517') | ||
>>> pheno_predicate = HpoPredicate( | ||
... hpo=hpo, | ||
... query=query, | ||
... ) | ||
>>> pheno_predicate.display_question() | ||
'Is Abnormal lens morphology present in the patient: Yes, No' | ||
|
||
|
||
|
||
missing_implies_phenotype_excluded | ||
---------------------------------- | ||
|
||
In many cases, published reports of clinical data about individuals with rare diseases describe phenotypic features that were observed, but do not | ||
provide a comprehensive list of features that were explicitly excluded. By default, GPSEA will only include features that are recorded as observed or excluded in a phenopacket. | ||
Setting this argument to ``True`` will cause "n/a" entries to be set to "excluded". We provide this option for exploration but do not recommend its use for the | ||
final analysis unless the assumption behind it is known to be true. | ||
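The effect of the flag can be sketched with a three-valued status: observed, excluded, or not available. ``phenotype_status`` is a hypothetical helper that ignores the ontology hierarchy for brevity; it only illustrates how ``missing_implies_phenotype_excluded`` changes the treatment of "n/a" entries.

```python
from typing import Collection, Optional


def phenotype_status(
    query: str,
    observed: Collection[str],
    excluded: Collection[str],
    missing_implies_phenotype_excluded: bool = False,
) -> Optional[bool]:
    """Return True (observed), False (excluded), or None (not available)."""
    if query in observed:
        return True
    if query in excluded:
        return False
    # The feature was not recorded at all ("n/a").
    # With the flag set, the missing entry is treated as excluded.
    return False if missing_implies_phenotype_excluded else None


default_status = phenotype_status('HP:0000517', observed=(), excluded=())
assumed_status = phenotype_status(
    'HP:0000517', observed=(), excluded=(),
    missing_implies_phenotype_excluded=True,
)
```

With the default setting, ``default_status`` is ``None`` and the individual would not contribute to the test of this term. With the flag set, ``assumed_status`` is ``False``, which increases the number of "excluded" individuals and can change the test result.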
|
||
|
||
|
||
Predicates for all cohort phenotypes | ||
==================================== | ||
|
||
Constructing phenotype predicates for all HPO terms of a cohort sounds a bit tedious. | ||
The :func:`~gpsea.analysis.predicate.phenotype.prepare_predicates_for_terms_of_interest` | ||
function cuts down the tedium. | ||
|
||
For a given phenopacket collection (e.g. 156 patients with mutations in the *TBX5* gene included in Phenopacket Store version `0.1.18`) | ||
|
||
>>> from ppktstore.registry import configure_phenopacket_registry | ||
>>> registry = configure_phenopacket_registry() | ||
>>> with registry.open_phenopacket_store(release='0.1.18') as ps: | ||
... phenopackets = tuple(ps.iter_cohort_phenopackets('TBX5')) | ||
>>> len(phenopackets) | ||
156 | ||
|
||
processed into a cohort | ||
|
||
>>> from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets | ||
>>> cohort_creator = configure_caching_cohort_creator(hpo) | ||
>>> cohort, _ = load_phenopackets(phenopackets, cohort_creator) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE | ||
Patients Created: ... | ||
|
||
|
||
we can create HPO predicates for testing all 260 HPO terms used in the cohort | ||
|
||
>>> from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest | ||
>>> pheno_predicates = prepare_predicates_for_terms_of_interest( | ||
... cohort=cohort, | ||
... hpo=hpo, | ||
... ) | ||
>>> len(pheno_predicates) | ||
260 | ||
|
||
and use the predicates in further analysis, such as :class:`~gpsea.analysis.pcats.HpoTermAnalysis`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.. _male-female-predicate: | ||
|
||
Partition by the sex of the individual | ||
====================================== | ||
|
||
It is easy to investigate the phenotypic differences between females and males. | ||
The :func:`~gpsea.analysis.predicate.genotype.sex_predicate` provides a predicate | ||
for partitioning based on the sex of the individual: | ||
|
||
>>> from gpsea.analysis.predicate.genotype import sex_predicate | ||
>>> gt_predicate = sex_predicate() | ||
>>> gt_predicate.display_question() | ||
'Sex of the individual: FEMALE, MALE' | ||
|
||
Individuals with :attr:`~gpsea.model.Sex.UNKNOWN_SEX` will be omitted from the analysis. | ||
|
||
Note that we have implemented this predicate as a genotype predicate, because it is used in | ||
place of other genotype predicates. Currently, it is not possible to compare the distribution of genotypes across sexes. | ||
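The partitioning itself is simple and can be sketched in plain Python. ``ToySex`` and ``partition_by_sex`` are hypothetical names that only mirror the behavior described above; they are not the GPSEA classes.

```python
from enum import Enum
from typing import Dict, List, Mapping


class ToySex(Enum):
    """Toy stand-in for the sex of an individual."""
    UNKNOWN_SEX = 0
    FEMALE = 1
    MALE = 2


def partition_by_sex(individuals: Mapping[str, ToySex]) -> Dict[str, List[str]]:
    """Group individual ids by sex and omit individuals with unknown sex."""
    groups: Dict[str, List[str]] = {'FEMALE': [], 'MALE': []}
    for individual_id, sex in individuals.items():
        if sex is ToySex.UNKNOWN_SEX:
            continue  # omitted from the analysis
        groups[sex.name].append(individual_id)
    return groups


groups = partition_by_sex({
    'A': ToySex.FEMALE,
    'B': ToySex.MALE,
    'C': ToySex.UNKNOWN_SEX,
})
```

In this sketch, ``groups`` contains ``'A'`` under ``FEMALE`` and ``'B'`` under ``MALE``, while ``'C'`` does not appear in either group.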
|
||
|
||
|