Merge pull request #351 from PGScatalog/doc-edits-0824
Documentation edits (August 2024)
smlmbrt authored Aug 6, 2024
2 parents 69c467e + 1dd30f5 commit e9ce3d7
Showing 4 changed files with 18 additions and 8 deletions.
13 changes: 9 additions & 4 deletions docs/explanation/geneticancestry.rst
@@ -60,6 +60,11 @@ The two groups of methods (empirical and continuous PCA-based) use these data an
with genetic ancestry.** Data is for the normalization of PGS000018 (metaGRS_CAD) in 1000 Genomes,
when applying ``pgsc_calc --run_ancestry`` to data from the Human Genome Diversity Project (HGDP).

.. note:: Adjusting the PGS distributions by ancestry does not resolve the differences in
PGS performance that are observed across genetic ancestry groups. The methods implemented within the calculator
make the Z-score distributions of individuals of differing genetic ancestries more comparable
(equal mean and/or variance); however, the effect size (e.g. beta, odds/hazard ratio) of being in the tail of
a PGS distribution may still differ across ancestry groups.

Empirical methods
~~~~~~~~~~~~~~~~~
@@ -87,10 +92,10 @@ at 0 for each genetic ancestry group (output column: ``Z_norm1``), while not rel
model fitting (see `Figure 3`_).

The first method (``Z_norm1``) has the result of normalizing the first moment of the PGS distribution (mean); however,
the second moment of the PGS distribution (variance) can also differ between ancestry groups. A second regression of
the PCA-loadings on the squared residuals (difference of the PGS and the predicted PGS) can be fitted to estimate a
predicted standard deviation based on genetic ancestry, as was proposed by Khan et al. (2022)\ [#Khan2022]_ and
implemented within the eMERGE GIRA.\ [#GIRA]_ The predicted standard deviation (distance from the mean PGS based on
the second moment of the PGS distribution (variance) can also differ between ancestry groups.\ [#Khan2022]_ A second
regression of the PCA-loadings on the squared residuals (difference of the PGS and the predicted PGS) can be fitted to
estimate a predicted standard deviation based on genetic ancestry, described in detail and implemented within the
eMERGE GIRA (Linder et al. (2023)).\ [#GIRA]_ The predicted standard deviation (distance from the mean PGS based on
ancestry) is used to normalize the residual PGS and get a new estimate of relative risk (output column: ``Z_norm2``)
where the variance of the PGS distribution is more equal across ancestry groups and approximately 1 (see `Figure 3`_).
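
The two-step adjustment described in this hunk can be sketched roughly as follows. This is a simplified illustration, not the ``pgsc_calc`` implementation: it assumes plain ordinary least squares for both regressions, and the function names (``fit_ols``, ``predict``, ``normalize_pgs``) and the clipping of the predicted variance are hypothetical choices made only for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply coefficients from fit_ols to a new matrix of PCs."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def normalize_pgs(ref_pcs, ref_pgs, target_pcs, target_pgs):
    """Return (z_norm1, z_norm2)-style values for the target samples.

    Assumption: both regressions are ordinary least squares fitted in the
    reference panel; the real pipeline may use a different model.
    """
    # Step 1: predict the mean PGS from the PCs and take residuals.
    mean_coef = fit_ols(ref_pcs, ref_pgs)
    ref_resid = ref_pgs - predict(mean_coef, ref_pcs)
    target_resid = target_pgs - predict(mean_coef, target_pcs)
    # Scale by a single residual SD from the reference (first moment only).
    z_norm1 = target_resid / ref_resid.std()

    # Step 2: regress squared residuals on the PCs to predict a variance
    # that depends on genetic ancestry.
    var_coef = fit_ols(ref_pcs, ref_resid ** 2)
    # Clip so the square root is defined (an assumption of this sketch).
    pred_var = np.clip(predict(var_coef, target_pcs), 1e-12, None)
    z_norm2 = target_resid / np.sqrt(pred_var)
    return z_norm1, z_norm2
```

Here ``ref_pcs`` and ``target_pcs`` are arrays of shape ``(n_samples, n_pcs)`` and the PGS arguments are one-dimensional arrays; in this sketch the only difference between the two outputs is whether the residual is scaled by one global standard deviation or by an ancestry-dependent predicted one.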

5 changes: 5 additions & 0 deletions docs/explanation/output.rst
@@ -39,6 +39,11 @@ If you have run the pipeline **without** using ancestry information the followin
commands; however, the calculation of the PGS is based on the full precision of the effect_weight value in the
scoring file.

.. warning:: Users should take note of whether the input samples were used in the development of the PGS being
scored, as this can lead to inflated estimates of PGS performance (see `Wray et al. (2013)`_ for discussion).

.. _Wray et al. (2013): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4096801/

``--run_ancestry``-specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 changes: 3 additions & 1 deletion docs/getting-started.rst
@@ -166,12 +166,14 @@ Congratulations, you've now (`hopefully`) calculated some scores!
After the workflow executes successfully, the calculated scores and a summary
report should be available in the ``results/score/`` directory in your current
working directory (``$PWD``) by default. If you're interested in more
information, see :ref:`interpret`.
information, see :ref:`interpret`. **Note**: when interpreting results, users should ensure
that the samples used for calculation were not used for PGS development (see `Wray et al. (2013)`_).

If the workflow didn't execute successfully, have a look at the
:ref:`troubleshoot` section. Remember to replace ``<docker/singularity/conda>``
with the software you have installed on your computer.

.. _Wray et al. (2013): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4096801/

4. Next steps & advanced usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 changes: 1 addition & 3 deletions docs/index.rst
@@ -130,10 +130,8 @@ Features under development

These are some of the features and improvements we're planning for ``pgsc_calc``:

- Improved population reference panels (merged 1000 Genomes & Human Genome Diversity Project (HGDP) for use
within the pipeline
- Further optimizations to the PCA & ancestry similarity analysis steps focused on improving automatic QC
- Performance improvments to make ``pgsc_calc`` work with 1000s of scoring files in paralell (e.g. integration
- Performance improvements to make ``pgsc_calc`` work with 1000s of scoring files in parallel (e.g. integration
with `OmicsPred`_)

.. _OmicsPred: https://www.omicspred.org
