snarePip

Overview

snarePip is an analysis pipeline designed for SNARE-seq data. It can also be used to jointly process and analyze single-cell RNA and ATAC sequencing datasets.

It contains two independent modules: data processing and meta-table manipulation. The data processing module is built on the Snakemake framework, which automates complex analyses including quality assessment, doublet removal, cell clustering and identification, peak generation, identification of differentially accessible regions, and linkage analysis.

The meta-table module is designed to simplify meta-table manipulation and migration, including automated extraction of sample information, preparation of data for upload, and updating of the QC statistics generated by the pipeline. The current version of the pipeline automatically processes single-cell RNA and ATAC datasets with flexible analysis modules and generates summary reports for both quality assessment and downstream analysis.

The entire framework is organized as a directed acyclic graph (DAG). The luigi framework is used to build the DAG that connects the data processing and meta-table modules, with a salted workflow for better version control.
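
The idea behind a salted workflow can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not snarePip's or luigi's actual code: the task names, version strings, and the `salted_id` and `output_path` helpers are all hypothetical. Each task's identifier hashes its own version together with the identifiers of its upstream tasks, so changing any upstream version changes every downstream identifier.

```python
import hashlib

# Hypothetical task graph: task name -> (code version, upstream task names).
# Names and versions are illustrative only, not snarePip's real tasks.
TASKS = {
    "load_sample_table": ("1.0", []),
    "process_rna": ("1.2", ["load_sample_table"]),
    "process_atac": ("1.1", ["load_sample_table"]),
    "update_qc_tables": ("1.0", ["process_rna", "process_atac"]),
}


def salted_id(name, tasks):
    """Return a short hash that depends on the task's own version and on
    the salted ids of all of its upstream tasks (computed recursively)."""
    version, upstream = tasks[name]
    parts = [name, version] + [salted_id(dep, tasks) for dep in sorted(upstream)]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:8]


def output_path(name, tasks):
    """Embed the salt in the output name so stale outputs are never reused."""
    return f"{name}-{salted_id(name, tasks)}.done"


before = output_path("update_qc_tables", TASKS)

# Bumping the version of one upstream task changes the downstream salt.
bumped = dict(TASKS, process_rna=("1.3", ["load_sample_table"]))
after = output_path("update_qc_tables", bumped)
```

Here `before` and `after` differ because `update_qc_tables` depends on `process_rna`, while the output path of `process_atac` is unchanged because nothing upstream of it was modified.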

DAG of Snare-seq automated processing system

Installation

Requirements

Installation has two parts. The first installs the meta-table and Python-related functions:

git clone https://github.com/huqiwen0313/snarePip.git
cd snarePip
pip install .

The second installs the snarePip R package, which is used for ATAC-seq QC and downstream analysis:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Rsamtools", "edgeR", "DropletUtils", "ATACseqQC", "pcaMethods", "TFBSTools",
                        "JASPAR2018", "motifmatchr", "BSgenome.Hsapiens.NCBI.GRCh38", "ComplexHeatmap"))

install.packages("devtools")
devtools::install_github("huqiwen0313/snarePip", ref="main")

Usage

To run the pipeline with the default steps:

git clone https://github.com/huqiwen0313/snarePip.git
cd snarePip
python -m snarePip [arguments]

| Arguments | Description |
| --- | --- |
| -s or --sampletable | Name of the Google Sheet that contains the sample information. |
| -w or --worksheet | Worksheet to load from the Google spreadsheet; default 0 (the first worksheet). |
| -r or --RNAdir | Path to the RNA folder where processed results will be saved. |
| -a or --ATACdir | Path to the ATAC folder where processed results will be saved. |
| -c or --cores | Number of cores used to run the Snakemake pipeline. |
| -sr or --snakeRNA | Snakemake file for RNA processing. |
| -sa or --snakeATAC | Snakemake file for ATAC processing. |
| -t or --type | Assay type, e.g. snare_2, tenX. |
| -sb or --subtable | Name of the hubmap_submission table. |
| -sc or --ctable | Name of the Contributor table. |
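
As an illustration of how these flags fit together, the sketch below rebuilds them with `argparse`. It is a hypothetical reconstruction from the argument list above, not the parser shipped with snarePip: the `build_parser` helper, the integer types for `--worksheet` and `--cores`, and the example values (`my_samples`, `results/RNA`, and so on) are assumptions. Only the flag names and the worksheet default of 0 come from the documentation.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags; the real
    snarePip entry point may define them differently."""
    p = argparse.ArgumentParser(prog="snarePip")
    p.add_argument("-s", "--sampletable", help="Google Sheet with sample information")
    p.add_argument("-w", "--worksheet", type=int, default=0,
                   help="worksheet to load (default 0, the first worksheet)")
    p.add_argument("-r", "--RNAdir", help="folder for processed RNA results")
    p.add_argument("-a", "--ATACdir", help="folder for processed ATAC results")
    p.add_argument("-c", "--cores", type=int, help="cores for the Snakemake run")
    p.add_argument("-sr", "--snakeRNA", help="Snakemake file for RNA processing")
    p.add_argument("-sa", "--snakeATAC", help="Snakemake file for ATAC processing")
    p.add_argument("-t", "--type", help="assay type, e.g. snare_2, tenX")
    p.add_argument("-sb", "--subtable", help="name of the hubmap_submission table")
    p.add_argument("-sc", "--ctable", help="name of the Contributor table")
    p.add_argument("-sm", "--submeta", help="name of the hubmap_submission_metatable")
    p.add_argument("--upload", action="store_true", help="run the upload task")
    return p


# Parse an explicit argument list (placeholder values) instead of sys.argv.
args = build_parser().parse_args(
    ["-s", "my_samples", "-r", "results/RNA", "-a", "results/ATAC",
     "-c", "8", "-t", "snare_2"]
)
```

With these inputs, `args.worksheet` falls back to its default of 0 and `args.upload` is `False`, which corresponds to a default processing run rather than an upload.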

To run the upload task:

python -m snarePip [arguments] --upload

| Arguments | Description |
| --- | --- |
| -r or --RNAdir | Path to the RNA folder where processed results will be saved. |
| -a or --ATACdir | Path to the ATAC folder where processed results will be saved. |
| -sb or --subtable | Name of the hubmap_submission table. |
| -sm or --submeta | Name of the hubmap_submission_metatable. |

Pipeline Module

Meta-table structure

Several tables record the information the pipeline needs:

  1. Sample table (links): records the Experiment_ID, sample_ID and other sample information. The ‘flag’ column shows whether the sample has been processed.
  2. QC tables (currently four: experiment-level_RNA, sample-level_RNA, experiment-level_ATAC, sample-level_ATAC): record the QC statistics once the processing pipeline has finished.
  3. Contributor: contains information about the people who contributed to each sample.
  4. Hubmap_submission: contains the submission information. The raw experiment data are submitted to the public server once the processing pipeline has finished and a submission path exists in the hubmap_submission table.
  5. Hubmap_submission_metatable: describes the details of each experiment that needs to be submitted to the public server along with the raw experiment datasets. The format of this table is consistent with HuBMAP guidelines.
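
The way the ‘flag’ column and the submission path interact can be sketched with plain Python dictionaries. This is only an illustration under assumptions: the flag value `"processed"`, the `submission_path` column name, the example rows, and the rule that an experiment is ready once all of its samples are processed are hypothetical, and the real tables may encode these differently.

```python
# Hypothetical rows. Only Experiment_ID, sample_ID and the flag column come
# from the table descriptions; the flag values and the submission_path
# column name are assumptions made for illustration.
sample_table = [
    {"Experiment_ID": "EXP1", "sample_ID": "S1", "flag": "processed"},
    {"Experiment_ID": "EXP1", "sample_ID": "S2", "flag": ""},
    {"Experiment_ID": "EXP2", "sample_ID": "S3", "flag": "processed"},
]
hubmap_submission = [
    {"Experiment_ID": "EXP1", "submission_path": "/data/submit/EXP1"},
    {"Experiment_ID": "EXP2", "submission_path": "/data/submit/EXP2"},
]


def pending_samples(samples):
    """Samples whose flag does not yet mark them as processed."""
    return [row["sample_ID"] for row in samples if row["flag"] != "processed"]


def experiments_ready_for_submission(samples, submissions):
    """Experiments with a non-empty submission path whose samples are all
    processed (assumed rule; the pipeline's actual check may differ)."""
    unfinished = {r["Experiment_ID"] for r in samples if r["flag"] != "processed"}
    return [
        s["Experiment_ID"]
        for s in submissions
        if s["submission_path"] and s["Experiment_ID"] not in unfinished
    ]


pending = pending_samples(sample_table)
ready = experiments_ready_for_submission(sample_table, hubmap_submission)
```

In this toy data, sample S2 is still pending, so only EXP2 qualifies for submission.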

The relationship among different tables is shown below:

To enable the automated meta-table connection, set up the Google API and add JSONKEY_PATH="your_credential" to the .env file in the snarePip main folder.
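
For reference, the sketch below shows what a `.env` entry of this form looks like when read back in Python. It uses a small standard-library parser purely for illustration; how snarePip itself loads the file is not documented here, and the `read_env_file` helper and the key path `/path/to/service_account.json` are hypothetical placeholders.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines, ignoring blank lines and # comments.
    Surrounding single or double quotes around the value are removed."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway .env file; the key path is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('# Google API credentials\nJSONKEY_PATH="/path/to/service_account.json"\n')
    config = read_env_file(env_path)

jsonkey_path = config["JSONKEY_PATH"]
```

The value of `JSONKEY_PATH` should point to the JSON key file downloaded when the Google API credentials are created.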

Reference

https://doi.org/10.1101/2021.07.28.454201

The package can be cited as:

Qiwen Hu, Xin Wang, Dinh Diep, Blue Lake, Kun Zhang and Peter Kharchenko (2021). SnarePip. Package version 0.1.0.