scGeneHE (single cell Gene expression Heritability Estimation)

scGeneHE is a method that aim to increase the power of cis-heritaiblity estimates by leveraging the intra- and inter- individual correlation using scRNA-seq data. We employed a Poisson mixed-effects model that quantifies the cis-genetic component of gene expression using individual cellular profiles. scGeneHE is robust enough to conduct bootstrapping for standard error estimation of cis-heritability.

Environment Setup

Note: This pipeline is still under development. Please wait for our release of a reproducible Snakemake workflow. For now, to run the source code, separate environments of R and Python are required.

scGeneHE requires 4 isolated environments: saige, saigeqtl, r, and python. It could be easily set up on any desktop computer or server platform, with estimated time of <1 hour.

Install SAIGE using the bioconda recipe

    conda create -n saige -c conda-forge -c bioconda "r-base>=4.0" r-saige
    conda activate saige

Install SAIGE-QTL using the bioconda recipe. This is a recipe that we created based on the environment needed for SAIGE-QTL. It's now updated to commit 0d1f1eb which is sufficient to run scGeneHE. Since SAIGE-QTL is still under development, we will add it to bioconda after it's ready.

    conda install -c conda-forge -c bioconda 'aryarm::r-saigeqtl'
    conda activate saigeqtl

Set up Python environment for bootstrapping

    git clone https://github.com/AmariutaLab/scGeneHE.git
    conda env create --file=./envs/pythn.yaml
    conda activate pythn

Set up R environment for h2 estimation and processing results

    conda env create --file=./envs/r_env.yaml
    conda activate r_env

Grant execution permission to files

    chmod +x ./scGeneHE/*.sh

Commands

scGeneHE includes three sequential steps to conduct cis-heritability estimation: generating cell-level sparse GRM (genetic relationship matrix), point estimate, bootstrap estimate. Each step depends on the previous step's output. Here is a brief introduction of the commands, detailed description are included in ./scGeneHE/.

Cell-level sparse GRM
- generate_grm function takes in a list of genes and genotype data to generate row-expanded genetic relationship matrix in the dimension of number of cells. We enforce the trace of GRM matix to be M (total number of cells) to constrain h2 estimate within the range of [0, 1].
- Input: list of gene, plink files, n_markers used for GRM
- Output: sparse GRM and GRM id files
cis-heritability point estimate
- estimate_point function takes in gene expression, covariates data, and GRM to estimate variances of genetic and cell environment components by fitting a Poisson linear mixed model.
- Input: gene expression and covariates text file, GRM generated in Step 1
- Output: variance parameter estimates files
Bootstrapping samples
- bootsrap_real function takes in all sample gene expression and covariates data to generate bootstrap gene expression and covariates files. We conduct random sampling with replacement across cells.
- Input: all sample gene expression and covariates text file, number of bootstrap conducted
- Output: bootstrap gene expression and covariates text files in separate directories
cis-heritability estimates of bootstrap samples
- estimate_boot function takes in bootstrap gene expression and covariates files to estimate cis-h2 of bootstrap samples.
- Input: gene expression and covariates text file, GRM generated in Step 3
- Output: variance parameter estimates files
Aggregate bootstrap results
- agg_boot function takes in bootstrap estimate results and aggregate them into a result file.
- Input: variance parameter estimates files generated in Step 4
- Output: bootstrap estimate result

Example Usage

We generate sample data to illustrate the usage of scGeneHE. We use publicly available genotype data from 1000 Genome Project and publicly available single-cell gene expression data from OneK1K. We truncate 1MB region from chromosome 1 of 100 randomly sampled European individuals from 1000GP, combined with 100 random individual's single-cell expression (50 cells per donor) of gene CDC37 in CD4+ T effector memory cells in OneK1K data.

    ./scGeneHE/generate_grm.sh \
        ./example/gene_list.txt \
        ./example/HM_chr1_1MB_100_indiv \
        ./example/245 \
        ./example/

    ./scGeneHE/estimate_point.sh \
        ./example/CDC37 \
        ./example/HM_chr1_1MB_100_indiv \
        ./example/CDC37 \
        _sample_expression \
        ./example/CDC37 \
        245 \
        ./example/gene_list.txt \
        1,0.1,0.1 \
        PC1,PC2,PC3,PC4,PC5,PC6,percent.mt \
        PC1,PC2,PC3,PC4,PC5,PC6 \
        fid count _h2_estimate

    ./scGeneHE/bootstrap_real.sh \
        1.0 1 ./example/gene_list.txt \
        ./example/CDC37 _sample_expression \
        ./example/HM_chr1_1MB_100_indiv \
        PC1,PC2,PC3,PC4,PC5,PC6,percent.mt \
        ./example/ 1.0_sample_boot

    ./scGeneHE/estimate_boot.sh \
        ./example/CDC37 ./example/CDC37 \
        ./example/gene_list.txt \
        ./example 1.0_sample_boot 1 \
        1,0.1,0.1 PC1,PC2,PC3,PC4,PC5,PC6,percent.mt \
        PC1,PC2,PC3,PC4,PC5,PC6 fid count \
        ./example 1.0_sample_boot 245 \
        ./example/HM_chr1_1MB_100_indiv

    ./scGeneHE/agg_boot.sh \
        ./example 1.0_sample_boot 1 \
        ./example/result 1 \
        ./example/gene_list.txt \
        _boot_res

All example input and output files could be found under ./example/. The expected run time for 1 gene in 1 cell type is within 3 minutes, depending on the degree of correlation between individuals in your dataset (less correlated, more time).

Support

Please contact [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
envs		envs
example		example
scGeneHE		scGeneHE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

scGeneHE (single cell Gene expression Heritability Estimation)

Environment Setup

Commands

Example Usage

Support

About

Uh oh!

Releases

Packages

Languages

AmariutaLab/scGeneHE

Folders and files

Latest commit

History

Repository files navigation

scGeneHE (single cell Gene expression Heritability Estimation)

Environment Setup

Commands

Example Usage

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages