Merge branch 'develop' into duplicate_patient_id

monarch-initiative · Sep 4, 2024 · 7ecd5c3
2 parents b67a709 + cf42c4e

Showing 46 changed files with 2,807 additions and 1,494 deletions.
diff --git a/docs/report/tbx5_frameshift_vs_missense.csv b/docs/report/tbx5_frameshift_vs_missense.csv
diff --git a/docs/report/tbx5_frameshift_vs_missense.mtc_report.html b/docs/report/tbx5_frameshift_vs_missense.mtc_report.html
@@ -48,9 +48,9 @@
   <h1>Phenotype testing report</h1>
   <p>Phenotype MTC filter: <em>HPO MTC filter</em></p>
   <p>Multiple testing correction: <em>fdr_bh</em></p>
-  <p>Performed statistical tests for 16 out of the total of 260 HPO terms.</p>
+  <p>Performed statistical tests for 17 out of the total of 260 HPO terms.</p>
     <table>
-      <caption>Using <em>HPO MTC filter</em>, 244 term(s) were omitted from statistical analysis.</caption>
+      <caption>Using <em>HPO MTC filter</em>, 243 term(s) were omitted from statistical analysis.</caption>
         <tbody>
           <tr>
             <th>Code</th>
@@ -61,21 +61,21 @@ <h1>Phenotype testing report</h1>
           <tr>
             <!-- TODO: plug the real reason code here -->
             <td>TODO</td>
-            <td>Skipping general term</td>
-            <td>44</td>
+            <td>Skipping term with maximum frequency that was less than threshold 0.2</td>
+            <td>51</td>
           </tr>
 
           <tr>
             <!-- TODO: plug the real reason code here -->
             <td>TODO</td>
-            <td>Skipping term because all genotypes have same HPO observed proportions</td>
-            <td>42</td>
+            <td>Skipping general term</td>
+            <td>44</td>
           </tr>
 
           <tr>
             <!-- TODO: plug the real reason code here -->
             <td>TODO</td>
-            <td>Skipping term with only 0 observations (not powered for 2x2)</td>
+            <td>Skipping term because all genotypes have same HPO observed proportions</td>
             <td>41</td>
           </tr>
 
@@ -114,13 +114,6 @@ <h1>Phenotype testing report</h1>
             <td>12</td>
           </tr>
 
-          <tr>
-            <!-- TODO: plug the real reason code here -->
-            <td>TODO</td>
-            <td>Skipping term with maximum frequency that was less than threshold 0.2</td>
-            <td>10</td>
-          </tr>
-
           <tr>
             <!-- TODO: plug the real reason code here -->
             <td>TODO</td>

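In the report diff above, the maximum-frequency rule becomes the largest skip reason: 51 terms fall below the 0.2 threshold. Below is a minimal sketch of how such a rule could work. It assumes that "frequency" means the fraction of assessed individuals in a genotype group who have the term. That is a reading of the reason string, not gpsea's implementation, and `passes_max_freq_filter` is a hypothetical name:

```python
from typing import Mapping, Tuple


def passes_max_freq_filter(
    counts: Mapping[str, Tuple[int, int]],
    threshold: float = 0.2,
) -> bool:
    """Hypothetical filter: keep a term only if at least one genotype group
    has a term frequency of at least `threshold`.

    `counts` maps a genotype group name to (n_with_term, n_assessed).
    """
    frequencies = [
        present / total
        for present, total in counts.values()
        if total > 0  # groups with nobody assessed contribute no frequency
    ]
    # A term with no usable frequencies cannot pass the filter.
    return bool(frequencies) and max(frequencies) >= threshold
```

With the counts quoted in the tutorial (31/60 and 19/19), the term clears the 0.2 bar easily. A term seen in only 1/60 and 2/19 individuals would be skipped.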
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
@@ -171,7 +171,7 @@ in the individuals of the *TBX5* cohort.
 ...     ),
 ...     group_names=('Missense', 'Frameshift'),
 ... )
->>> gt_predicate.get_question()
+>>> gt_predicate.display_question()
 'Genotype group: Missense, Frameshift'
 
 .. note::
@@ -224,8 +224,8 @@ with a false discovery control level at (``mtc_alpha=0.05``):
 Choosing the statistical procedure for assessment of association between genotype and phenotype
 groups is the last missing piece of the analysis. We will use Fisher Exact Test:
 
->>> from gpsea.analysis.pcats.stats import ScipyFisherExact
->>> count_statistic = ScipyFisherExact()
+>>> from gpsea.analysis.pcats.stats import FisherExactTest
+>>> count_statistic = FisherExactTest()
 
 and we finalize the analysis setup by putting all components together
 into :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:
@@ -246,15 +246,15 @@ Now we can perform the analysis and investigate the results.
 ...     pheno_predicates=pheno_predicates,
 ... )
 >>> result.total_tests
-16
+17
 
-We only tested 16 HPO terms. This is despite the individuals being collectively annotated with
+We only tested 17 HPO terms. This is despite the individuals being collectively annotated with
 260 direct and indirect HPO terms
 
 >>> len(result.phenotypes)
 260
 
-We can show the reasoning behind *not* testing 244 (`260 - 16`) HPO terms
+We can show the reasoning behind *not* testing 243 (`260 - 17`) HPO terms
 by exploring the phenotype MTC filtering report.
 
 >>> from gpsea.view import MtcStatsViewer
@@ -266,11 +266,11 @@ by exploring the phenotype MTC filtering report.
 .. raw:: html
   :file: report/tbx5_frameshift_vs_missense.mtc_report.html
 
-and these are the HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:
+and these are the top 20 HPO terms ordered by the p value corrected with the Benjamini-Hochberg procedure:
 
 >>> from gpsea.analysis.predicate import PatientCategories
 >>> summary_df = result.summarize(hpo, PatientCategories.YES)
->>> summary_df.to_csv('docs/report/tbx5_frameshift_vs_missense.csv')  # doctest: +SKIP
+>>> summary_df.head(20).to_csv('docs/report/tbx5_frameshift_vs_missense.csv')  # doctest: +SKIP
 
 .. csv-table:: *TBX5* frameshift vs missense
    :file: report/tbx5_frameshift_vs_missense.csv
@@ -283,4 +283,4 @@ was observed in 31/60 (52%) patients with a missense variant
 but it was observed in 19/19 (100%) patients with a frameshift variant.
 Fisher exact test computed a p value of `~0.0000562`
 and the p value corrected by Benjamini-Hochberg procedure
-is `~0.00112`.
+is `~0.000955`.
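You can check the tutorial's raw p value by hand. Below is a minimal plain-Python sketch of a two-sided Fisher exact test on the 2×2 table from the text (missense: 31 with the term, 29 without; frameshift: 19 with, 0 without). It is not gpsea's `FisherExactTest`, and `fisher_exact_two_sided` is a hypothetical name. It follows the convention SciPy documents for its two-sided test: sum the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the table [[a, b], [c, d]]."""
    row1 = a + b          # e.g. individuals in the first genotype group
    col1 = a + c          # e.g. individuals with the term
    n = a + b + c + d

    def pmf(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance so floating-point ties count as "as extreme".
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For `fisher_exact_two_sided(31, 29, 19, 0)` the result comes out at roughly 5.6e-5, in line with the `~0.0000562` quoted in the tutorial.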
diff --git a/docs/user-guide/mtc.rst b/docs/user-guide/mtc.rst
@@ -96,9 +96,9 @@ when creating an instance of :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:
 
 >>> from gpsea.analysis.mtc_filter import UseAllTermsMtcFilter
 >>> from gpsea.analysis.pcats import HpoTermAnalysis
->>> from gpsea.analysis.pcats.stats import ScipyFisherExact
+>>> from gpsea.analysis.pcats.stats import FisherExactTest
 >>> analysis = HpoTermAnalysis(
-...     count_statistic=ScipyFisherExact(),
+...     count_statistic=FisherExactTest(),
 ...     mtc_filter=UseAllTermsMtcFilter(),
 ...     mtc_correction='bonferroni',  #      <--- The MTC correction setup
 ... )
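The `mtc_correction` argument chooses how raw p values are adjusted. The tutorial uses `fdr_bh`, while this snippet uses `bonferroni`. Below is a minimal pure-Python sketch of both adjustments. The helper names are hypothetical, and this is not gpsea's own code. Suppose the tutorial's term has the smallest of the 17 p values and no later rank pulls its value down. Benjamini-Hochberg then gives 0.0000562 × 17 / 1 ≈ 0.000955, which is consistent with the updated figure in the diff.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjustment (false discovery rate).

    The adjusted value at rank i (1 = smallest p) is
    min over k >= i of p_(k) * m / k, capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down; the running minimum keeps the
    # adjusted values monotone in the original p-value order.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Bonferroni controls the family-wise error rate and is the more conservative of the two. Benjamini-Hochberg controls the expected share of false discoveries, so it usually retains more terms when many are tested.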