Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPT (text-davinci-003) used to revise manuscript #67

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 3 additions & 10 deletions content/01.abstract.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,5 @@
## Abstract {.page_break_before}

Genes act in concert with each other in specific contexts to perform their functions.
Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions.
It has been shown that this insight is critical for developing new therapies.
In this regard, the role of individual genes in disease-relevant mechanisms can be hypothesized with transcriptome-wide association studies (TWAS), which have represented a significant step forward in testing the mediating role of gene expression in GWAS associations.
However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression.
Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis.
This representation is based on modules of genes with similar expression patterns across the same conditions.
We observed that diseases were significantly associated with gene modules expressed in relevant cell types, and our approach was accurate in predicting known drug-disease pairs and inferring mechanisms of action.
Furthermore, using a CRISPR screen to analyze lipid regulation, we found that functionally important players lacked TWAS associations but were prioritized in trait-associated modules by PhenoPLIER.
By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.
This paper presents PhenoPLIER, a computational approach that maps genetic associations and pharmacological perturbation data into a common latent representation to gain a better understanding of the etiology of complex traits and the mechanisms of action of drugs.
PhenoPLIER incorporates gene-gene interactions to identify disease-associated modules of genes that are expressed in relevant cell types, accurately predicting known drug-disease pairs and inferring mechanisms of action.
Furthermore, it can prioritize functionally important players that lack single-gene associations, providing a more comprehensive view of gene-trait associations.
33 changes: 20 additions & 13 deletions content/02.introduction.md
Original file line number Diff line number Diff line change
@@ -1,25 +1,31 @@
## Introduction

Genes work together in context-specific networks to carry out different functions [@pmid:19104045; @doi:10.1038/ng.3259].
Variations in these genes can change their functional role and, at a higher level, affect disease-relevant biological processes [@doi:10.1038/s41467-018-06022-6].
In this context, determining how genes influence complex traits requires mechanistically understanding expression regulation across different cell types [@doi:10.1126/science.aaz1776; @doi:10.1038/s41586-020-2559-3; @doi:10.1038/s41576-019-0200-9], which in turn should lead to improved treatments [@doi:10.1038/ng.3314; @doi:10.1371/journal.pgen.1008489].
Previous studies have described different regulatory DNA elements [@doi:10.1038/nature11247; @doi:10.1038/nature14248; @doi:10.1038/nature12787; @doi:10.1038/s41586-020-03145-z; @doi:10.1038/s41586-020-2559-3] including genetic effects on gene expression across different tissues [@doi:10.1126/science.aaz1776].
Integrating functional genomics data and GWAS data [@doi:10.1038/s41588-018-0081-4; @doi:10.1016/j.ajhg.2018.04.002; @doi:10.1038/s41588-018-0081-4; @doi:10.1038/ncomms6890] has improved the identification of these transcriptional mechanisms that, when dysregulated, commonly result in tissue- and cell lineage-specific pathology [@pmid:20624743; @pmid:14707169; @doi:10.1073/pnas.0810772105].
Genes interact with each other in specific networks to carry out different functions [@pmid:19104045; @doi:10.1038/ng.3259].
Variations in these genes can alter their functional role and, at a higher level, influence disease-relevant biological processes [@doi:10.1038/s41467-018-06022-6].
To better understand how genes affect complex traits, it is necessary to understand the regulation of gene expression across different cell types [@doi:10.1126/science.aaz1776; @doi:10.1038/s41586-020-2559-3; @doi:10.1038/s41576-019-0200-9], which should lead to improved treatments [@doi:10.1038/ng.3314; @doi:10.1371/journal.pgen.1008489].
Previous research has studied different regulatory DNA elements [@doi:10.1038/nature11247; @doi:10.1038/nature14248; @doi:10.1038/nature12787; @doi:10.1038/s41586-020-03145-z; @doi:10.1038/s41586-020-2559-3], including genetic effects on gene expression across different tissues [@doi:10.1126/science.aaz1776].
Combining functional genomics data and GWAS data [@doi:10.1038/s41588-018-0081-4; @doi:10.1016/j.ajhg.2018.04.002; @doi:10.1038/s41588-018-0081-4; @doi:10.1038/ncomms6890] has helped to identify transcriptional mechanisms that, when disrupted, often lead to tissue- and cell lineage-specific pathology [@pmid:20624743; @pmid:14707169; @doi:10.1073/pnas.0810772105].


Given the availability of gene expression data across several tissues [@doi:10.1038/nbt.3838; @doi:10.1038/s41467-018-03751-6; @doi:10.1126/science.aaz1776; @doi:10.1186/s13040-020-00216-9], an effective approach to identify these biological processes is the transcription-wide association study (TWAS), which integrates expression quantitative trait loci (eQTLs) data to provide a mechanistic interpretation for GWAS findings.
TWAS relies on testing whether perturbations in gene regulatory mechanisms mediate the association between genetic variants and human diseases [@doi:10.1371/journal.pgen.1009482; @doi:10.1038/ng.3506; @doi:10.1371/journal.pgen.1007889; @doi:10.1038/ng.3367], and these approaches have been highly successful not only in understanding disease etiology at the transcriptome level [@pmid:33931583; @doi:10.1101/2021.10.21.21265225; @pmid:31036433] but also in disease-risk prediction (polygenic scores) [@doi:10.1186/s13059-021-02591-w] and drug repurposing [@doi:10.1038/nn.4618] tasks.
Given the availability of gene expression data across several tissues [1-4], an effective approach to identify biological processes is the transcription-wide association study (TWAS).
TWAS integrates expression quantitative trait loci (eQTLs) data to provide a mechanistic interpretation for genetic studies associated with human diseases [5-8].
This approach has been highly successful in understanding disease etiology at the transcriptome level [9-11] as well as in disease-risk prediction (polygenic scores) [12] and drug repurposing [13].
However, TWAS works at the individual gene level, which does not capture more complex interactions at the network level.


These gene-gene interactions play a crucial role in current theories of the architecture of complex traits, such as the omnigenic model [@doi:10.1016/j.cell.2017.05.038], which suggests that methods need to incorporate this complexity to disentangle disease-relevant mechanisms.
Widespread gene pleiotropy, for instance, reveals the highly interconnected nature of transcriptional networks [@doi:10.1038/s41588-019-0481-0; @doi:10.1038/ng.3570], where potentially all genes expressed in disease-relevant cell types have a non-zero effect on the trait [@doi:10.1016/j.cell.2017.05.038; @doi:10.1016/j.cell.2019.04.014].
One way to learn these gene-gene interactions is using the concept of gene module: a group of genes with similar expression profiles across different conditions [@pmid:22955619; @pmid:25344726; @doi:10.1038/ng.3259].
In this context, several unsupervised approaches have been proposed to infer these gene-gene connections by extracting gene modules from co-expression patterns [@pmid:9843981; @pmid:24662387; @pmid:16333293].
Matrix factorization techniques like independent or principal component analysis (ICA/PCA) have shown superior performance in this task [@doi:10.1038/s41467-018-03424-4] since they capture local expression effects from a subset of samples and can handle modules overlap effectively.
Therefore, integrating genetic studies with gene modules extracted using unsupervised learning could further improve our understanding of disease origin [@pmid:25344726] and progression [@pmid:18631455].
Gene-gene interactions are essential for current theories of the architecture of complex traits, such as the omnigenic model [@doi:10.1016/j.cell.2017.05.038], which suggests that methods should take this complexity into account in order to identify the mechanisms underlying diseases.
Pleiotropy, or the capacity of a gene to have multiple effects, is a common phenomenon that reveals the intricate nature of gene expression networks [@doi:10.1038/s41588-019-0481-0; @doi:10.1038/ng.3570], where it is believed that all genes expressed in disease-relevant cell types have some influence on the trait [@doi:10.1016/j.cell.2017.05.038; @doi:10.1016/j.cell.2019.04.014].
To learn these gene-gene interactions, one approach is to use gene modules, which are groups of genes with similar expression profiles across different conditions [@pmid:22955619; @pmid:25344726; @doi:10.1038/ng.3259].
Unsupervised learning methods have been proposed to infer these gene-gene connections by extracting gene modules from co-expression patterns [@pmid:9843981; @pmid:24662387; @pmid:16333293].
Matrix factorization techniques such as independent or principal component analysis (ICA/PCA) have been shown to be more effective in this task [@doi:10.1038/s41467-018-03424-4], as they can capture local expression effects from a subset of samples and handle module overlap efficiently.
Thus, combining genetic studies with gene modules extracted using unsupervised learning could provide further insights into the origin [@pmid:25344726] and progression [@pmid:18631455] of diseases.


<!--
ERROR: the paragraph below could not be revised with the AI model due to the following error:

The AI model returned an empty string ('')
-->
Here we propose PhenoPLIER, an omnigenic approach that provides a gene module perspective to genetic studies.
The flexibility of our method allows integrating different data modalities into the same representation for a joint analysis.
In this work, we show that this module perspective can infer how groups of functionally-related genes influence complex traits, detect shared and distinct transcriptomic properties among traits, and predict how pharmacological perturbations affect genes' activity to exert their effects.
Expand All @@ -38,3 +44,4 @@ In summary, instead of considering single genes associated with different comple
This approach improves robustness in detecting and interpreting genetic associations, and here we show how it can prioritize alternative and potentially more promising candidate targets even when known single gene associations are not detected.
The approach represents a conceptual shift in the interpretation of genetic studies.
It has the potential to extract mechanistic insight from statistical associations to enhance the understanding of complex diseases and their therapeutic modalities.

Loading