Commit

Merge pull request #257 from monarch-initiative/ielis/issue228
Add codes to MTC filter summary, improve HPO term reporting
ielis authored Sep 10, 2024
2 parents 51c439c + 9c98210 commit 02c9663
Showing 14 changed files with 340 additions and 292 deletions.
26 changes: 11 additions & 15 deletions docs/report/tbx5_frameshift_vs_missense.csv
@@ -1,22 +1,18 @@
 Genotype group,Missense,Missense,Frameshift,Frameshift,,
 ,Count,Percent,Count,Percent,Corrected p values,p values
-Ventricular septal defect [HP:0001629],31/60,52%,19/19,100%,0.0009552459156234353,5.6190936213143254e-05
-Abnormal atrioventricular conduction [HP:0005150],0/22,0%,3/3,100%,0.003695652173913043,0.00043478260869565214
-Atrioventricular block [HP:0001678],0/22,0%,2/2,100%,0.015398550724637682,0.0036231884057971015
-Heart block [HP:0012722],0/22,0%,2/2,100%,0.015398550724637682,0.0036231884057971015
-Absent thumb [HP:0009777],12/71,17%,14/31,45%,0.0191369345329502,0.005628510156750059
-Patent ductus arteriosus [HP:0001643],3/37,8%,2/2,100%,0.038236617183985605,0.01349527665317139
-Triphalangeal thumb [HP:0001199],13/72,18%,13/32,41%,0.062175372424826694,0.02560162393963452
-Cardiac conduction abnormality [HP:0031546],14/36,39%,3/3,100%,0.15811357916621074,0.07440639019586388
-Secundum atrial septal defect [HP:0001684],14/35,40%,4/22,18%,0.2690764879148444,0.1424522583078588
-Muscular ventricular septal defect [HP:0011623],6/59,10%,6/25,24%,0.2868675985983051,0.1687456462342971
-Pulmonary arterial hypertension [HP:0002092],4/6,67%,0/2,0%,0.6623376623376622,0.42857142857142855
-Hypoplasia of the ulna [HP:0003022],1/12,8%,2/10,20%,0.8095238095238093,0.5714285714285713
+Ventricular septal defect [HP:0001629],31/60,52%,19/19,100%,0.0008990549794102921,5.6190936213143254e-05
+Abnormal atrioventricular conduction [HP:0005150],0/22,0%,3/3,100%,0.003478260869565217,0.00043478260869565214
+Atrioventricular block [HP:0001678],0/22,0%,2/2,100%,0.01932367149758454,0.0036231884057971015
+Absent thumb [HP:0009777],12/71,17%,14/31,45%,0.022514040627000236,0.005628510156750059
+Patent ductus arteriosus [HP:0001643],3/37,8%,2/2,100%,0.04318488529014845,0.01349527665317139
+Triphalangeal thumb [HP:0001199],13/72,18%,13/32,41%,0.06827099717235872,0.02560162393963452
+Cardiac conduction abnormality [HP:0031546],14/36,39%,3/3,100%,0.17007174901911745,0.07440639019586388
+Secundum atrial septal defect [HP:0001684],14/35,40%,4/22,18%,0.2849045166157176,0.1424522583078588
+Muscular ventricular septal defect [HP:0011623],6/59,10%,6/25,24%,0.29999225997208373,0.1687456462342971
+Pulmonary arterial hypertension [HP:0002092],4/6,67%,0/2,0%,0.6857142857142857,0.42857142857142855
+Hypoplasia of the ulna [HP:0003022],1/12,8%,2/10,20%,0.831168831168831,0.5714285714285713
 Hypoplasia of the radius [HP:0002984],30/62,48%,6/14,43%,1.0,0.7735491022101784
 Atrial septal defect [HP:0001631],42/44,95%,20/20,100%,1.0,1.0
 Short thumb [HP:0009778],11/41,27%,8/30,27%,1.0,1.0
 Absent radius [HP:0003974],7/32,22%,6/25,24%,1.0,1.0
 Short humerus [HP:0005792],7/17,41%,4/9,44%,1.0,1.0
-Abnormal atrial septum morphology [HP:0011994],43/43,100%,20/20,100%,,
-Abnormal cardiac septum morphology [HP:0001671],62/62,100%,28/28,100%,,
-Abnormal heart morphology [HP:0001627],62/62,100%,30/30,100%,,
77 changes: 21 additions & 56 deletions docs/report/tbx5_frameshift_vs_missense.mtc_report.html
@@ -48,9 +48,9 @@
 <h1>Phenotype testing report</h1>
 <p>Phenotype MTC filter: <em>HPO MTC filter</em></p>
 <p>Multiple testing correction: <em>fdr_bh</em></p>
-<p>Performed statistical tests for 17 out of the total of 260 HPO terms.</p>
+<p>Performed statistical tests for 16 out of the total of 260 HPO terms.</p>
 <table>
-<caption>Using <em>HPO MTC filter</em>, 243 term(s) were omitted from statistical analysis.</caption>
+<caption>Using <em>HPO MTC filter</em>, 244 term(s) were omitted from statistical analysis.</caption>
 <tbody>
 <tr>
 <th>Code</th>
@@ -59,80 +59,45 @@ <h1>Phenotype testing report</h1>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
+<td>HMF01</td>
 <td>Skipping term with maximum frequency that was less than threshold 0.2</td>
 <td>51</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping general term</td>
-<td>44</td>
-</tr>
-
-<tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term because all genotypes have same HPO observed proportions</td>
-<td>41</td>
-</tr>
-
-<tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 2 observations (not powered for 2x2)</td>
-<td>26</td>
-</tr>
-
-<tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 1 observations (not powered for 2x2)</td>
-<td>24</td>
-</tr>
-
-<tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 3 observations (not powered for 2x2)</td>
-<td>22</td>
+<td>HMF02</td>
+<td>Skipping term because no genotype has more than one observed HPO count</td>
+<td>3</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 4 observations (not powered for 2x2)</td>
-<td>14</td>
+<td>HMF03</td>
+<td>Skipping term because of a child term with the same individual counts</td>
+<td>1</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 6 observations (not powered for 2x2)</td>
-<td>12</td>
+<td>HMF04</td>
+<td>Skipping term because all genotypes have same HPO observed proportions</td>
+<td>41</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term with only 5 observations (not powered for 2x2)</td>
-<td>4</td>
+<td>HMF05</td>
+<td>Skipping term because one genotype had zero observations</td>
+<td>2</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term because no genotype has more than one observed HPO count</td>
-<td>3</td>
+<td>HMF06</td>
+<td>Skipping term with less than 7 observations (not powered for 2x2)</td>
+<td>102</td>
 </tr>

 <tr>
-<!-- TODO: plug the real reason code here -->
-<td>TODO</td>
-<td>Skipping term because one genotype had zero observations</td>
-<td>2</td>
+<td>HMF08</td>
+<td>Skipping general term</td>
+<td>44</td>
 </tr>

 </tbody>
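
Reviewer note: the updated table can be cross-checked against the headline numbers in the report. The sketch below is not part of the commit; it copies the per-code counts from the new rows and confirms they add up to the 244 omitted terms (260 total minus 16 tested). It also checks that the six old "only N observations" rows sum to the new HMF06 count, which suggests (an inference from the numbers, not something the diff states) that those rows were merged into HMF06. The variable names are illustrative only.

```python
# Counts copied from the updated MTC filter summary rows.
omitted_by_code = {
    "HMF01": 51,   # maximum frequency below threshold 0.2
    "HMF02": 3,    # no genotype has more than one observed HPO count
    "HMF03": 1,    # child term with the same individual counts
    "HMF04": 41,   # all genotypes have same HPO observed proportions
    "HMF05": 2,    # one genotype had zero observations
    "HMF06": 102,  # less than 7 observations (not powered for 2x2)
    "HMF08": 44,   # general term
}

total_terms = 260   # len(result.phenotypes) in the tutorial
tested_terms = 16   # result.total_tests in the tutorial

omitted_terms = sum(omitted_by_code.values())
print(omitted_terms, total_terms - tested_terms)  # 244 244

# Counts from the removed "only 2/1/3/4/6/5 observations" rows, in that order.
old_underpowered_rows = [26, 24, 22, 14, 12, 4]
print(sum(old_underpowered_rows))  # 102
```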
10 changes: 5 additions & 5 deletions docs/tutorial.rst
@@ -246,15 +246,15 @@ Now we can perform the analysis and investigate the results.
 ... pheno_predicates=pheno_predicates,
 ... )
 >>> result.total_tests
-17
+16

-We only tested 1y HPO terms. This is despite the individuals being collectively annotated with
+We only tested 16 HPO terms. This is despite the individuals being collectively annotated with
 260 direct and indirect HPO terms

 >>> len(result.phenotypes)
 260

-We can show the reasoning behind *not* testing 243 (`260 - 17`) HPO terms
+We can show the reasoning behind *not* testing 244 (`260 - 16`) HPO terms
 by exploring the phenotype MTC filtering report.

 >>> from gpsea.view import MtcStatsViewer
@@ -266,11 +266,11 @@ by exploring the phenotype MTC filtering report.
 .. raw:: html
 :file: report/tbx5_frameshift_vs_missense.mtc_report.html

-and these are the top 20 HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:
+and these are the tested HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:

 >>> from gpsea.view import summarize_hpo_analysis
 >>> summary_df = summarize_hpo_analysis(hpo, result)
->>> summary_df.head(20).to_csv('docs/report/tbx5_frameshift_vs_missense.csv') # doctest: +SKIP
+>>> summary_df.to_csv('docs/report/tbx5_frameshift_vs_missense.csv') # doctest: +SKIP

 .. csv-table:: *TBX5* frameshift vs missense
 :file: report/tbx5_frameshift_vs_missense.csv
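
Reviewer note: the corrected p values in the CSV change even though the nominal p values do not, because the Benjamini-Hochberg adjustment depends on the number of tests, which dropped from 17 to 16. The sketch below is a plain-Python reimplementation of the procedure for illustration, not the code the project calls; the function name `benjamini_hochberg` is made up here. It takes the 16 nominal p values from the new CSV rows and reproduces the new corrected column.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    n = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone
    # and never exceed 1.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Nominal p values of the 16 tested terms, copied from the updated CSV.
nominal = [
    5.6190936213143254e-05, 0.00043478260869565214, 0.0036231884057971015,
    0.005628510156750059, 0.01349527665317139, 0.02560162393963452,
    0.07440639019586388, 0.1424522583078588, 0.1687456462342971,
    0.42857142857142855, 0.5714285714285713, 0.7735491022101784,
    1.0, 1.0, 1.0, 1.0,
]

corrected = benjamini_hochberg(nominal)
for p, q in zip(nominal, corrected):
    print(f"{p:.6g} -> {q:.6g}")
```

With 17 tests instead of 16, the smallest p value would be multiplied by 17 rather than 16, which matches the old value 0.0009552459156234353 in the removed rows.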
