Commit
Merge branch 'develop' into duplicate_patient_id
ielis authored Sep 4, 2024
2 parents b67a709 + cf42c4e commit 7ecd5c3
Showing 46 changed files with 2,807 additions and 1,494 deletions.
274 changes: 17 additions & 257 deletions docs/report/tbx5_frameshift_vs_missense.csv

Large diffs are not rendered by default.

21 changes: 7 additions & 14 deletions docs/report/tbx5_frameshift_vs_missense.mtc_report.html
@@ -48,9 +48,9 @@
<h1>Phenotype testing report</h1>
<p>Phenotype MTC filter: <em>HPO MTC filter</em></p>
<p>Multiple testing correction: <em>fdr_bh</em></p>
- <p>Performed statistical tests for 16 out of the total of 260 HPO terms.</p>
+ <p>Performed statistical tests for 17 out of the total of 260 HPO terms.</p>
<table>
- <caption>Using <em>HPO MTC filter</em>, 244 term(s) were omitted from statistical analysis.</caption>
+ <caption>Using <em>HPO MTC filter</em>, 243 term(s) were omitted from statistical analysis.</caption>
<tbody>
<tr>
<th>Code</th>
@@ -61,21 +61,21 @@ <h1>Phenotype testing report</h1>
<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
- <td>Skipping general term</td>
- <td>44</td>
+ <td>Skipping term with maximum frequency that was less than threshold 0.2</td>
+ <td>51</td>
</tr>

<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
- <td>Skipping term because all genotypes have same HPO observed proportions</td>
- <td>42</td>
+ <td>Skipping general term</td>
+ <td>44</td>
</tr>

<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
- <td>Skipping term with only 0 observations (not powered for 2x2)</td>
+ <td>Skipping term because all genotypes have same HPO observed proportions</td>
<td>41</td>
</tr>

@@ -114,13 +114,6 @@ <h1>Phenotype testing report</h1>
<td>12</td>
</tr>

- <tr>
- <!-- TODO: plug the real reason code here -->
- <td>TODO</td>
- <td>Skipping term with maximum frequency that was less than threshold 0.2</td>
- <td>10</td>
- </tr>

<tr>
<!-- TODO: plug the real reason code here -->
<td>TODO</td>
18 changes: 9 additions & 9 deletions docs/tutorial.rst
@@ -171,7 +171,7 @@ in the individuals of the *TBX5* cohort.
... ),
... group_names=('Missense', 'Frameshift'),
... )
- >>> gt_predicate.get_question()
+ >>> gt_predicate.display_question()
'Genotype group: Missense, Frameshift'

.. note::
@@ -224,8 +224,8 @@ with a false discovery control level at (``mtc_alpha=0.05``):
Choosing the statistical procedure for assessment of association between genotype and phenotype
groups is the last missing piece of the analysis. We will use Fisher Exact Test:

- >>> from gpsea.analysis.pcats.stats import ScipyFisherExact
- >>> count_statistic = ScipyFisherExact()
+ >>> from gpsea.analysis.pcats.stats import FisherExactTest
+ >>> count_statistic = FisherExactTest()

and we finalize the analysis setup by putting all components together
into :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:
@@ -246,15 +246,15 @@ Now we can perform the analysis and investigate the results.
... pheno_predicates=pheno_predicates,
... )
>>> result.total_tests
- 16
+ 17

- We only tested 16 HPO terms. This is despite the individuals being collectively annotated with
+ We only tested 17 HPO terms. This is despite the individuals being collectively annotated with
260 direct and indirect HPO terms

>>> len(result.phenotypes)
260

- We can show the reasoning behind *not* testing 244 (`260 - 16`) HPO terms
+ We can show the reasoning behind *not* testing 243 (`260 - 17`) HPO terms
by exploring the phenotype MTC filtering report.

>>> from gpsea.view import MtcStatsViewer
@@ -266,11 +266,11 @@ by exploring the phenotype MTC filtering report.
.. raw:: html
:file: report/tbx5_frameshift_vs_missense.mtc_report.html

- and these are the HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:
+ and these are the top 20 HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:

>>> from gpsea.analysis.predicate import PatientCategories
>>> summary_df = result.summarize(hpo, PatientCategories.YES)
- >>> summary_df.to_csv('docs/report/tbx5_frameshift_vs_missense.csv') # doctest: +SKIP
+ >>> summary_df.head(20).to_csv('docs/report/tbx5_frameshift_vs_missense.csv') # doctest: +SKIP

.. csv-table:: *TBX5* frameshift vs missense
:file: report/tbx5_frameshift_vs_missense.csv
@@ -283,4 +283,4 @@ was observed in 31/60 (52%) patients with a missense variant
but it was observed in 19/19 (100%) patients with a frameshift variant.
Fisher exact test computed a p value of `~0.0000562`
and the p value corrected by Benjamini-Hochberg procedure
- is `~0.00112`.
+ is `~0.000955`.
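
Editor's note: the updated corrected value is consistent with the Benjamini-Hochberg adjustment of the smallest raw p value across 17 tests (`0.0000562 * 17 / 1`). The following is a minimal pure-Python sketch of that procedure, not GPSEA's implementation. The helper name `benjamini_hochberg` is hypothetical, and the 16 placeholder p values of 1.0 are illustrative assumptions rather than results from the analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Hypothetical helper for illustration only.
    """
    m = len(p_values)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, keeping the
    # adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# The raw p value quoted in the tutorial plus 16 placeholder values (assumed),
# giving 17 tests in total as reported by `result.total_tests`.
p_values = [0.0000562] + [1.0] * 16
adjusted = benjamini_hochberg(p_values)
print(f"{adjusted[0]:.3g}")  # 0.000955
```

Because the smallest p value has rank 1, its adjusted value is the raw value multiplied by the number of tests, provided no larger-ranked term yields a smaller product.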
4 changes: 2 additions & 2 deletions docs/user-guide/mtc.rst
@@ -96,9 +96,9 @@ when creating an instance of :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:

>>> from gpsea.analysis.mtc_filter import UseAllTermsMtcFilter
>>> from gpsea.analysis.pcats import HpoTermAnalysis
- >>> from gpsea.analysis.pcats.stats import ScipyFisherExact
+ >>> from gpsea.analysis.pcats.stats import FisherExactTest
>>> analysis = HpoTermAnalysis(
- ... count_statistic=ScipyFisherExact(),
+ ... count_statistic=FisherExactTest(),
... mtc_filter=UseAllTermsMtcFilter(),
... mtc_correction='bonferroni', # <--- The MTC correction setup
... )