Fix some typos, use HPOPredicate.
ielis committed Sep 6, 2024
1 parent 3819244 commit 40f3d91
Showing 21 changed files with 101 additions and 88 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -3,7 +3,7 @@
![PyPi downloads](https://img.shields.io/pypi/dm/gpsea.svg?label=Pypi%20downloads)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gpsea)

GPSEA (Genotypes and Ghenotypes - Statistical Evaluation of Associations, pronounced "G"-"P"-"C") is a Python package designed to support genotype-phenotype correlation analysis.
GPSEA (Genotypes and Phenotypes - Statistical Evaluation of Associations, pronounced "G"-"P"-"C") is a Python package designed to support genotype-phenotype correlation analysis.


See the [Tutorial](https://monarch-initiative.github.io/gpsea/stable/tutorial.html)
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -10,7 +10,7 @@ A key question in biology and human genetics concerns the relationships between
genetics, the focus is generally placed on the study of whether specific disease-causing alleles are associated with specific phenotypic
manifestations of the disease.

`GPSEA` (Genotypes and Ghenotypes - Statistical Evaluation of Associations, pronounced "G"-"P"-"C") is a Python package designed to support genotype-phenotype correlation analysis.
`GPSEA` (Genotypes and Phenotypes - Statistical Evaluation of Associations, pronounced "G"-"P"-"C") is a Python package designed to support genotype-phenotype correlation analysis.
The input to `GPSEA` is a collection of `Global Alliance for Genomics and Health (GA4GH) Phenopackets <https://pubmed.ncbi.nlm.nih.gov/35705716/>`_.
`gpsea` ingests data from these phenopackets and performs analysis of the correlation of specific variants,
variant types (e.g., missense vs. premature termination codon), or variant location in protein motifs or other features.
12 changes: 6 additions & 6 deletions docs/report/tbx5_frameshift_vs_missense.csv
@@ -1,4 +1,4 @@
"Genotype group: Missense, Frameshift",Missense,Missense,Frameshift,Frameshift,,
Genotype group,Missense,Missense,Frameshift,Frameshift,,
,Count,Percent,Count,Percent,Corrected p values,p values
Ventricular septal defect [HP:0001629],31/60,52%,19/19,100%,0.0009552459156234353,5.6190936213143254e-05
Abnormal atrioventricular conduction [HP:0005150],0/22,0%,3/3,100%,0.003695652173913043,0.00043478260869565214
@@ -13,10 +13,10 @@ Muscular ventricular septal defect [HP:0011623],6/59,10%,6/25,24%,0.286867598598
Pulmonary arterial hypertension [HP:0002092],4/6,67%,0/2,0%,0.6623376623376622,0.42857142857142855
Hypoplasia of the ulna [HP:0003022],1/12,8%,2/10,20%,0.8095238095238093,0.5714285714285713
Hypoplasia of the radius [HP:0002984],30/62,48%,6/14,43%,1.0,0.7735491022101784
Short thumb [HP:0009778],11/41,27%,8/30,27%,1.0,1.0
Absent radius [HP:0003974],7/32,22%,6/25,24%,1.0,1.0
Short humerus [HP:0005792],7/17,41%,4/9,44%,1.0,1.0
Short thumb [HP:0009778],11/41,27%,8/30,27%,1.0,1.0
Atrial septal defect [HP:0001631],42/44,95%,20/20,100%,1.0,1.0
Abnormal ventricular septum morphology [HP:0010438],31/31,100%,19/19,100%,,
Abnormal cardiac ventricle morphology [HP:0001713],31/31,100%,19/19,100%,,
Abnormal heart morphology [HP:0001627],62/62,100%,30/30,100%,,
Absent radius [HP:0003974],7/32,22%,6/25,24%,1.0,1.0
Aplasia/Hypoplasia of the thumb [HP:0009601],20/20,100%,19/19,100%,,
Aplasia/Hypoplasia of fingers [HP:0006265],22/22,100%,19/19,100%,,
Aplasia/hypoplasia involving bones of the hand [HP:0005927],22/22,100%,19/19,100%,,
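The `p values` and `Corrected p values` columns of this report are consistent with a two-sided Fisher exact test on each 2x2 table followed by Benjamini-Hochberg correction over what appear to be 17 tested terms. For example, *Abnormal atrioventricular conduction* (0/22 vs. 3/3) gives exactly 1/2300, and 1/2300 × 17 / 2 matches its corrected value at rank 2. The commit does not say which test or correction GPSEA uses, so the following plain-Python sketch is only a way to reproduce the numbers under that assumption, not GPSEA's implementation; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are genotype groups; columns are (term present, term excluded) counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # individuals with the term present, across both groups
    denom = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table with x "present" counts in row 1
        # and the same margins as the observed table.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables that are no more likely than the observed one;
    # the small relative tolerance absorbs floating-point ties.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Abnormal atrioventricular conduction: Missense 0/22 vs. Frameshift 3/3
p = fisher_exact_two_sided(0, 22, 3, 0)
print(p, p * 17 / 2)
```

The correction is written out by hand only to keep the example dependency-free; in practice a library routine would be the safer choice.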
31 changes: 0 additions & 31 deletions docs/user-guide/predicates.rst
@@ -25,8 +25,6 @@ use the HPO terms to assign a group.
All GPSEA analyses need at least one predicate (typically a *genotype* predicate) and many require both *genotype* and *phenotype* predicates.
The following pages provide more information.

.. _genotype-predicates:



.. toctree::
@@ -37,35 +35,6 @@ The following pages provide more information.
predicates/genotype_predicates




.. _groups-predicate:



.. _phenotype-predicates:




Predicates for all cohort phenotypes
====================================

Constructing phenotype predicates for all HPO terms of a cohort sounds a bit tedious.
The :func:`~gpsea.analysis.predicate.phenotype.prepare_predicates_for_terms_of_interest`
function cuts down the tedium:

>>> from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest
>>> pheno_predicates = prepare_predicates_for_terms_of_interest(
... cohort=cohort,
... hpo=hpo,
... )
>>> len(pheno_predicates)
301

and prepares predicates for testing 301 HPO terms of the *RERE* cohort.


*******
Gallery
*******
2 changes: 1 addition & 1 deletion docs/user-guide/predicates/diagnosis_predicate.rst
@@ -1,4 +1,4 @@
.. _diagnosis_predicate:
.. _diagnosis-predicate:

========================
Partition by a diagnosis
4 changes: 2 additions & 2 deletions docs/user-guide/predicates/filtering_predicate.rst
@@ -35,7 +35,7 @@ to bin according to a genotype group:
>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_recessive(is_frameshift_or_stop_gain)
>>> gt_predicate.display_question()
'What is the genotype group?: HOM_REF, HET, BIALLELIC_ALT'
'What is the genotype group: HOM_REF, HET, BIALLELIC_ALT'

We see that the `gt_predicate` bins the patients into three groups:

@@ -53,4 +53,4 @@ that includes only the categories of interest:
... targets=(cats[1], cats[2]),
... )
>>> fgt_predicate.display_question()
'What is the genotype group?: HET, BIALLELIC_ALT'
'What is the genotype group: HET, BIALLELIC_ALT'
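The filtering step in `filtering_predicate.rst` can be pictured in plain Python: a filtering predicate keeps the parent predicate's assignment only when it is one of the `targets`, and `display_question()` appends the remaining category names after the question text. This is a sketch under those assumptions; `display_question` and `filter_category` below are hypothetical helpers, not the GPSEA API. Because the categories follow a colon, the question string reads better without its own `?`.

```python
from typing import Optional, Sequence


def display_question(question: str, categories: Sequence[str]) -> str:
    # Hypothetical formatting: question text, a colon, then the category names.
    return f"{question}: {', '.join(categories)}"


def filter_category(category: Optional[str], targets: Sequence[str]) -> Optional[str]:
    # Individuals outside the target categories are left unassigned (None)
    # instead of being merged into another group.
    return category if category in targets else None


question = "What is the genotype group"
all_categories = ("HOM_REF", "HET", "BIALLELIC_ALT")
targets = all_categories[1:]  # HET and BIALLELIC_ALT, as with `targets=(cats[1], cats[2])`

print(display_question(question, targets))
print([filter_category(c, targets) for c in all_categories])
```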
2 changes: 1 addition & 1 deletion docs/user-guide/predicates/genotype_predicates.rst
@@ -1,4 +1,4 @@
.. _genotype_predicates:
.. _genotype-predicates:

===================
Genotype Predicates
6 changes: 3 additions & 3 deletions docs/user-guide/predicates/groups_predicate.rst
@@ -1,4 +1,4 @@
.. _groups_predicate:
.. _groups-predicate:

================
Groups Predicate
@@ -7,8 +7,8 @@ Groups Predicate


Sometimes, all we want is to check whether there is a difference between individuals
who include one or more alleles of variant $X$ vs. individuals with variants $Y$,
vs. individuals with variants $Z$, where $X$, $Y$ and $Z$ are variant predicates.
who carry one or more alleles of variant `X` vs. individuals with variants `Y`,
vs. individuals with variants `Z`, where `X`, `Y` and `Z` are variant predicates.
We can do this with a *groups* predicate.

The :func:`~gpsea.analysis.predicate.genotype.groups_predicate`
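The idea behind the *groups* predicate in `groups_predicate.rst` can be sketched in plain Python: each variant predicate is a function from a variant to `True`/`False`, and an individual falls into the group whose predicate matches one of their variants. The variant representation, effect labels and `assign_group` helper are hypothetical, and so is the handling of overlap: this sketch leaves individuals who match no group, or more than one, unassigned, which may differ from what `groups_predicate` actually does.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence

# Hypothetical stand-ins: a variant is a dict with an "effect" key and a
# variant predicate is any function that maps a variant to True/False.
Variant = Dict[str, str]
VariantPredicate = Callable[[Variant], bool]


def assign_group(
    variants: Sequence[Variant],
    groups: Mapping[str, VariantPredicate],
) -> Optional[str]:
    """Return the name of the single group whose predicate matches one of the variants."""
    matched = [name for name, pred in groups.items() if any(pred(v) for v in variants)]
    # Assumption: individuals matching no group, or more than one, are left out.
    return matched[0] if len(matched) == 1 else None


groups = {
    "X": lambda v: v["effect"] == "MISSENSE",
    "Y": lambda v: v["effect"] == "FRAMESHIFT",
    "Z": lambda v: v["effect"] == "STOP_GAINED",
}

print(assign_group([{"effect": "FRAMESHIFT"}], groups))  # Y
print(assign_group([{"effect": "MISSENSE"}, {"effect": "STOP_GAINED"}], groups))  # None
```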
59 changes: 51 additions & 8 deletions docs/user-guide/predicates/hpo_predicate.rst
@@ -1,10 +1,10 @@
.. _hpo_predicate:
.. _hpo-predicate:


Propagating phenotype predicate
===============================
HPO predicate
=============

When testing for presence or absence of an HPO term, the propagating phenotype predicate
When testing for presence or absence of an HPO term, the :class:`~gpsea.analysis.predicate.phenotype.HpoPredicate`
leverages the :ref:`true-path-rule` to take advantage of the HPO hierarchy.
As a result, an individual annotated with a term is implicitly annotated with all of its ancestors.
For instance, an individual annotated with `Ectopia lentis <https://hpo.jax.org/browse/term/HP:0001083>`_
@@ -15,14 +15,19 @@ is also annotated with `Abnormal lens morphology <https://hpo.jax.org/browse/ter
Similarly, all descendants of a term, whose presence was specifically excluded in an individual,
are implicitly excluded.

:class:`~gpsea.analysis.predicate.phenotype.PropagatingPhenotypePredicate` implements this logic.

Example
-------

Here we show how to set up :class:`~gpsea.analysis.predicate.phenotype.PropagatingPhenotypePredicate`
Here we show how to set up :class:`~gpsea.analysis.predicate.phenotype.HpoPredicate`
to test for the presence of `Abnormal lens morphology <https://hpo.jax.org/browse/term/HP:0000517>`_.

We need to load :class:`~hpotk.MinimalOntology` with HPO data to access the HPO hierarchy:

>>> import hpotk
>>> store = hpotk.configure_ontology_store()
>>> hpo = store.load_minimal_hpo(release='v2024-07-01')

and now we can set up a predicate to test for presence of *Abnormal lens morphology*:

>>> from gpsea.analysis.predicate.phenotype import HpoPredicate
>>> query = hpotk.TermId.from_curie('HP:0000517')
@@ -41,4 +46,42 @@ missing_implies_phenotype_excluded
In many cases, published reports of clinical data about individuals with rare diseases describe phenotypic features that were observed but do not
provide a comprehensive list of features that were explicitly excluded. By default, GPSEA will only include features that are recorded as observed or excluded in a phenopacket.
Setting this argument to True will cause "n/a" entries to be set to "excluded". We provide this option for exploration but do not recommend its use for the
final analysis unless the assumption behind it is known to be true.
final analysis unless the assumption behind it is known to be true.



Predicates for all cohort phenotypes
====================================

Constructing phenotype predicates for all HPO terms of a cohort sounds a bit tedious.
The :func:`~gpsea.analysis.predicate.phenotype.prepare_predicates_for_terms_of_interest`
function cuts down the tedium.

For a given phenopacket collection (e.g. the 156 patients with mutations in the *TBX5* gene included in Phenopacket Store version `0.1.18`)

>>> from ppktstore.registry import configure_phenopacket_registry
>>> registry = configure_phenopacket_registry()
>>> with registry.open_phenopacket_store(release='0.1.18') as ps:
... phenopackets = tuple(ps.iter_cohort_phenopackets('TBX5'))
>>> len(phenopackets)
156

processed into a cohort

>>> from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets
>>> cohort_creator = configure_caching_cohort_creator(hpo)
>>> cohort, _ = load_phenopackets(phenopackets, cohort_creator) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
Patients Created: ...


we can create HPO predicates for testing all 260 HPO terms used in the cohort

>>> from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest
>>> pheno_predicates = prepare_predicates_for_terms_of_interest(
... cohort=cohort,
... hpo=hpo,
... )
>>> len(pheno_predicates)
260

and subject the predicates to further analysis, such as :class:`~gpsea.analysis.pcats.HpoTermAnalysis`.
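The true-path rule described in `hpo_predicate.rst` can be sketched without GPSEA or hpo-toolkit: observed terms propagate to their ancestors, and excluded terms propagate to their descendants. The toy hierarchy, the `is_query_present` helper and its `True`/`False`/`None` return values are illustrative assumptions; `HpoPredicate` itself works on the loaded `MinimalOntology` and returns its own categorization types. The `missing_implies_excluded` flag mirrors the intent of `missing_implies_phenotype_excluded`, and letting presence win over exclusion is another assumption of this sketch.

```python
from typing import Dict, Iterable, Optional, Set

# Toy hierarchy (child -> parents). It only encodes the relation stated in the text,
# that "Ectopia lentis" lies below "Abnormal lens morphology"; the real HPO graph
# may contain intermediate terms between them.
PARENTS: Dict[str, Set[str]] = {
    "HP:0001083": {"HP:0000517"},  # Ectopia lentis -> Abnormal lens morphology
    "HP:0000517": set(),
}


def ancestors_or_self(term: str) -> Set[str]:
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_query_present(
    query: str,
    present: Iterable[str],
    excluded: Iterable[str],
    missing_implies_excluded: bool = False,
) -> Optional[bool]:
    # Observing a term implies all of its ancestors: the query is present
    # if it is an ancestor (or self) of any observed term.
    if any(query in ancestors_or_self(t) for t in present):
        return True
    # Excluding a term implies excluding all of its descendants: the query is
    # excluded if any excluded term is an ancestor (or self) of the query.
    query_ancestors = ancestors_or_self(query)
    if any(t in query_ancestors for t in excluded):
        return False
    # No annotation either way.
    return False if missing_implies_excluded else None


lens, ectopia = "HP:0000517", "HP:0001083"
print(is_query_present(lens, present={ectopia}, excluded=set()))  # True
print(is_query_present(ectopia, present=set(), excluded={lens}))  # False
print(is_query_present(lens, present=set(), excluded=set()))  # None
```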
2 changes: 1 addition & 1 deletion docs/user-guide/predicates/male_female_predicate.rst
@@ -1,4 +1,4 @@
.. _male_female_predicate:
.. _male-female-predicate:

Partition by the sex of the individual
======================================
6 changes: 3 additions & 3 deletions docs/user-guide/predicates/mode_of_inheritance_predicate.rst
@@ -1,4 +1,4 @@
.. _mode_of_inheritance_predicate:
.. _mode-of-inheritance-predicate:

==============================
Mode of Inheritance Predicates
@@ -92,6 +92,6 @@ for assigning a patient into a genotype group:
>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_recessive(is_frameshift_or_stop_gain)
>>> gt_predicate.display_question()
'What is the genotype group?: HOM_REF, HET, BIALLELIC_ALT'
'What is the genotype group: HOM_REF, HET, BIALLELIC_ALT'

The `gt_predicate` can be used in downstream analysis, such as in :class:
The `gt_predicate` can be used in downstream analysis, such as in :class:`~gpsea.analysis.pcats.HpoTermAnalysis`.
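The groups of `ModeOfInheritancePredicate` depend on how many of an individual's alleles match the variant predicate. Per `stats.rst`, under the autosomal dominant setup an individual with no matching allele goes to `HOM_REF` and one with a single matching allele goes to `HET`. Reading the group names, we assume the autosomal recessive setup adds `BIALLELIC_ALT` for two matching alleles. The plain-Python sketch below encodes that reading with a hypothetical genotype model and hypothetical effect labels; allele counts outside each mapping are left unassigned (`None`), since the commit does not show how GPSEA handles them.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Hypothetical model: an individual's genotype is a list of (variant effect, allele count)
# pairs, e.g. [("FRAMESHIFT", 1)] for a heterozygous frameshift variant.
Genotype = Sequence[Tuple[str, int]]

AD_GROUPS = {0: "HOM_REF", 1: "HET"}
AR_GROUPS = {0: "HOM_REF", 1: "HET", 2: "BIALLELIC_ALT"}


def count_matching_alleles(genotype: Genotype, predicate: Callable[[str], bool]) -> int:
    return sum(count for effect, count in genotype if predicate(effect))


def assign_moi_group(
    genotype: Genotype,
    predicate: Callable[[str], bool],
    groups: Mapping[int, str],
) -> Optional[str]:
    # HOM_REF means "no allele matches the predicate", not "no variants at all".
    return groups.get(count_matching_alleles(genotype, predicate))


def is_frameshift_or_stop_gain(effect: str) -> bool:
    # Hypothetical stand-in for the variant predicate used in the docs.
    return effect in {"FRAMESHIFT", "STOP_GAINED"}


print(assign_moi_group([("MISSENSE", 2)], is_frameshift_or_stop_gain, AR_GROUPS))  # HOM_REF
print(assign_moi_group([("FRAMESHIFT", 1)], is_frameshift_or_stop_gain, AR_GROUPS))  # HET
print(assign_moi_group([("FRAMESHIFT", 1), ("STOP_GAINED", 1)], is_frameshift_or_stop_gain, AR_GROUPS))  # BIALLELIC_ALT
```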
10 changes: 5 additions & 5 deletions docs/user-guide/report/tbx5_frameshift.csv
@@ -1,4 +1,4 @@
"What is the genotype group?: HOM_REF, HET",HOM_REF,HOM_REF,HET,HET,,
What is the genotype group,HOM_REF,HOM_REF,HET,HET,,
,Count,Percent,Count,Percent,Corrected p values,p values
Ventricular septal defect [HP:0001629],42/71,59%,19/19,100%,0.00411275392326226,0.00024192670136836825
Abnormal atrioventricular conduction [HP:0005150],1/23,4%,3/3,100%,0.01307692307692308,0.0015384615384615387
@@ -13,10 +13,10 @@ Muscular ventricular septal defect [HP:0011623],8/84,10%,6/25,24%,0.144002047919
Pulmonary arterial hypertension [HP:0002092],8/14,57%,0/2,0%,0.6899307928951143,0.4666666666666667
Short thumb [HP:0009778],25/69,36%,8/30,27%,0.6899307928951143,0.48700997145537483
Absent radius [HP:0003974],9/43,21%,6/25,24%,1.0,0.7703831604944444
Atrial septal defect [HP:0001631],63/65,97%,20/20,100%,1.0,1.0
Hypoplasia of the radius [HP:0002984],34/75,45%,6/14,43%,1.0,1.0
Hypoplasia of the ulna [HP:0003022],3/17,18%,2/10,20%,1.0,1.0
Short humerus [HP:0005792],8/21,38%,4/9,44%,1.0,1.0
Abnormal atrial septum morphology [HP:0011994],64/64,100%,20/20,100%,,
Abnormal cardiac septum morphology [HP:0001671],89/89,100%,28/28,100%,,
Abnormal heart morphology [HP:0001627],89/89,100%,30/30,100%,,
Atrial septal defect [HP:0001631],63/65,97%,20/20,100%,1.0,1.0
Aplasia/Hypoplasia of the thumb [HP:0009601],40/40,100%,19/19,100%,,
Aplasia/Hypoplasia of fingers [HP:0006265],44/44,100%,19/19,100%,,
Aplasia/hypoplasia involving bones of the hand [HP:0005927],44/44,100%,19/19,100%,,
4 changes: 2 additions & 2 deletions docs/user-guide/report/tbx5_frameshift.mtc_report.html
@@ -103,14 +103,14 @@ <h1>Phenotype testing report</h1>
<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
<td>Skipping term with only 5 observations (not powered for 2x2)</td>
<td>Skipping term with maximum frequency that was less than threshold 0.2</td>
<td>10</td>
</tr>

<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
<td>Skipping term with maximum frequency that was less than threshold 0.2</td>
<td>Skipping term with only 5 observations (not powered for 2x2)</td>
<td>10</td>
</tr>

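The two skip reasons reordered in `tbx5_frameshift.mtc_report.html` describe filters applied before multiple-testing correction. One plausible reading, sketched in plain Python below, is that a term is skipped when too few individuals were observed for a 2x2 test, or when its highest per-group frequency is below 0.2. Only the 0.2 threshold and the message texts come from the report; `mtc_skip_reason`, `min_observations=6`, the order of the checks and the meaning of "observations" (all annotated individuals) are assumptions.

```python
from typing import Optional, Sequence, Tuple

# Per genotype group: (individuals with the term present, individuals annotated for the term).
Counts = Sequence[Tuple[int, int]]


def mtc_skip_reason(
    counts: Counts,
    min_max_freq: float = 0.2,
    min_observations: int = 6,  # hypothetical threshold
) -> Optional[str]:
    total = sum(annotated for _, annotated in counts)
    if total < min_observations:
        return f"Skipping term with only {total} observations (not powered for 2x2)"
    max_freq = max((present / annotated for present, annotated in counts if annotated > 0), default=0.0)
    if max_freq < min_max_freq:
        return f"Skipping term with maximum frequency that was less than threshold {min_max_freq}"
    return None  # the term goes on to testing


print(mtc_skip_reason([(1, 3), (1, 2)]))
print(mtc_skip_reason([(1, 10), (1, 10)]))
print(mtc_skip_reason([(4, 6), (0, 2)]))  # None
```

Under these assumptions, a row such as *Pulmonary arterial hypertension* (4/6 vs. 0/2) passes both filters, which is consistent with it having a p value in the CSV reports.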
2 changes: 1 addition & 1 deletion docs/user-guide/stats.rst
@@ -139,7 +139,7 @@ we expect the autosomal dominant mode of inheritance:
>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_dominant(is_frameshift)
>>> gt_predicate.display_question()
'What is the genotype group?: HOM_REF, HET'
'What is the genotype group: HOM_REF, HET'

`gt_predicate` will assign the patients with no frameshift variant allele into `HOM_REF` group
and the patients with one frameshift allele will be assigned into `HET` group.
5 changes: 2 additions & 3 deletions src/gpsea/analysis/predicate/genotype/_gt_predicates.py
@@ -1,4 +1,3 @@
from cProfile import label
import dataclasses
import enum
import typing
@@ -605,7 +604,7 @@ def __init__(
)
if issues:
raise ValueError("Cannot create predicate: {}".format(", ".join(issues)))
self._question = "What is the genotype group?"
self._question = "What is the genotype group"

def get_categorizations(self) -> typing.Sequence[Categorization]:
return self._categorizations
@@ -727,7 +726,7 @@ def sex_predicate() -> GenotypePolyPredicate:
"""
Get a genotype predicate for categorizing patients by their :class:`~gpsea.model.Sex`.
See the :ref:`sex-predicate` section for an example.
See the :ref:`male-female-predicate` section for an example.
"""
return INSTANCE

4 changes: 2 additions & 2 deletions src/gpsea/analysis/predicate/phenotype/__init__.py
@@ -6,13 +6,13 @@
or using the phenotype features encoded into HPO terms (:class:`HpoPredicate`).
"""

from ._pheno import PhenotypePolyPredicate, PropagatingPhenotypePredicate
from ._pheno import PhenotypePolyPredicate, HpoPredicate
from ._pheno import DiseasePresencePredicate
from ._pheno import PhenotypeCategorization, P
from ._util import prepare_predicates_for_terms_of_interest, prepare_hpo_terms_of_interest

__all__ = [
'PhenotypePolyPredicate', 'PropagatingPhenotypePredicate',
'PhenotypePolyPredicate', 'HpoPredicate',
'DiseasePresencePredicate',
'PhenotypeCategorization', 'P',
'prepare_predicates_for_terms_of_interest', 'prepare_hpo_terms_of_interest',
8 changes: 5 additions & 3 deletions src/gpsea/analysis/predicate/phenotype/_pheno.py
@@ -92,13 +92,15 @@ def present_phenotype_category(self) -> PatientCategory:
return self.present_phenotype_categorization.category


class PropagatingPhenotypePredicate(PhenotypePolyPredicate[hpotk.TermId]):
class HpoPredicate(PhenotypePolyPredicate[hpotk.TermId]):
"""
`PropagatingPhenotypePredicate` tests if a patient is annotated with an HPO term.
`HpoPredicate` tests if a patient is annotated with an HPO term.
Note, `query` must be a term of the provided `hpo`!
:param hpo: HPO object
See :ref:`hpo-predicate` section for an example usage.
:param hpo: HPO ontology
:param query: the HPO term to test
:param missing_implies_phenotype_excluded: `True` if the lack of an explicit annotation implies the term's absence.
"""
4 changes: 2 additions & 2 deletions src/gpsea/analysis/predicate/phenotype/_util.py
@@ -4,7 +4,7 @@

import hpotk

from ._pheno import PhenotypePolyPredicate, PropagatingPhenotypePredicate
from ._pheno import PhenotypePolyPredicate, HpoPredicate

from gpsea.model import Patient

@@ -26,7 +26,7 @@ def prepare_predicates_for_terms_of_interest(
(either directly or indirectly) for the term to be included in the analysis.
"""
return tuple(
PropagatingPhenotypePredicate(
HpoPredicate(
hpo=hpo,
query=term,
missing_implies_phenotype_excluded=missing_implies_excluded,
4 changes: 2 additions & 2 deletions tests/analysis/test_mtc_filter.py
@@ -8,7 +8,7 @@

from gpsea.analysis.mtc_filter import HpoMtcFilter, SpecifiedTermsMtcFilter
from gpsea.analysis.predicate.genotype import GenotypePolyPredicate
from gpsea.analysis.predicate.phenotype import PhenotypePolyPredicate, PropagatingPhenotypePredicate
from gpsea.analysis.predicate.phenotype import PhenotypePolyPredicate, HpoPredicate
from gpsea.analysis.pcats import apply_predicates_on_patients
from gpsea.model import Cohort

Expand Down Expand Up @@ -55,7 +55,7 @@ def ph_predicate(
For the purpose of testing counts, let's pretend the counts
were created by this predicate.
"""
return PropagatingPhenotypePredicate(
return HpoPredicate(
hpo=hpo,
query=hpotk.TermId.from_curie("HP:0001250"), # Seizure
missing_implies_phenotype_excluded=False,