Code for CRISPR activation screen with single-cell RNA-seq readout studying zygotic genome activation

siduanmiao/crispra_zga

 
 

CRISPRa single-cell screen

This repository contains the code used to analyze a CRISPR activation (CRISPRa) screen with single-cell RNA-seq readout, designed to find regulators of zygotic genome activation. The dataset is described in a peer-reviewed, open-access publication.

The main analysis involves assigning sgRNAs to cells as well as pre-processing scRNA-seq data.

Data overview

The raw data is of two main types: global transcriptome readout (scRNA-seq, 10X Genomics) and amplicon sequencing of the same libraries, which is used to assign guides to cells. Both are expected in the directory data/raw/main/scrnaseq/, with Cell Ranger output (one folder per sample) in the transcriptome subfolder and amplicon sequencing FASTQ files in the amplicon subfolder. This repository does not contain the raw data because of its size. The original data has been deposited on GEO under accession GSE135621.
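Before running the workflow, it can help to confirm that the raw data sits where the paragraph above says it should. The following is a minimal sketch and not part of the repository: the helper names missing_raw_dirs and list_samples are hypothetical, and it assumes that each sample is a direct subfolder of transcriptome.

```python
from pathlib import Path

# Layout taken from the description above, relative to the repository root.
RAW_ROOT = Path("data/raw/main/scrnaseq")
EXPECTED_SUBDIRS = ("transcriptome", "amplicon")


def missing_raw_dirs(root=RAW_ROOT):
    """Return the expected subfolders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def list_samples(root=RAW_ROOT):
    """List sample names, assuming one Cell Ranger output folder per sample."""
    transcriptome = Path(root) / "transcriptome"
    if not transcriptome.is_dir():
        return []
    return sorted(p.name for p in transcriptome.iterdir() if p.is_dir())
```

Calling missing_raw_dirs() from the repository root returns an empty list when both subfolders are present, and list_samples() then gives the sample folders found under transcriptome.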

Filtered cell barcodes, as defined by Cell Ranger, are available on figshare:

curl 'https://ndownloader.figshare.com/articles/13393214/versions/1' -o filtered_barcodes.zip

Original tables, such as the list of sgRNAs, can be found in the data/raw/main/tables folder.

Processing

The main processing steps are described in workflow/Snakefile, which is also symlinked in the root folder. Processing scripts are in the src folder and its subfolders.

Repeat elements quantification

FASTA files with repeat element sequences, one per family, are expected in data/external/repeats. Because of their size, these files are not included in this repository and can be obtained from figshare:

curl 'https://ndownloader.figshare.com/files/23166317' -o repeats_fa.tar.gz

FASTQ files that retain cell barcodes are created from the unmapped reads in the BAM files of the Cell Ranger output, and these reads are then mapped to the repeat families. Only reads that map to a given family are kept for downstream quantification.
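The first step can be illustrated with a small sketch. It is not the repository's script (those live in src/data/repeats) but a simplified stand-in that works on SAM text, for example the output of `samtools view -f 4` on a Cell Ranger BAM file. It assumes that the corrected cell barcode is stored in the CB tag, as Cell Ranger does, and that appending the barcode to the read name is an acceptable way to carry it through mapping. The function name unmapped_sam_to_fastq and the name format are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit set when the read is unmapped


def unmapped_sam_to_fastq(sam_lines):
    """Yield FASTQ records for unmapped reads that carry a cell barcode.

    The barcode (CB tag) is appended to the read name so that it survives
    a later mapping step. Reads without a CB tag are skipped.
    """
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED_FLAG:
            continue
        barcode = None
        for tag in fields[11:]:  # optional fields follow the 11 mandatory ones
            if tag.startswith("CB:Z:"):
                barcode = tag[len("CB:Z:"):]
                break
        if barcode is None:
            continue
        yield f"@{name}_{barcode}\n{seq}\n+\n{qual}\n"
```

Writing the yielded records to a file gives a FASTQ that can be passed to a read mapper together with the per-family FASTA files.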

Scripts to map originally unmapped reads to repeat elements and to aggregate results are located in the src/data/repeats directory.
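As a rough illustration of what aggregation could look like, the sketch below counts mapped reads per cell barcode and repeat family. It is an assumption about the general shape of the step, not a description of the scripts in src/data/repeats: the function name count_reads_per_cell is hypothetical, the input is taken to be (cell barcode, family) pairs with one pair per mapped read, and no UMI collapsing is performed.

```python
from collections import defaultdict


def count_reads_per_cell(assignments):
    """Count reads per cell barcode and repeat family.

    `assignments` is an iterable of (cell_barcode, family) pairs,
    one pair per mapped read. Returns {cell_barcode: {family: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, family in assignments:
        counts[barcode][family] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {barcode: dict(families) for barcode, families in counts.items()}
```

The nested dictionary can be turned into a cell-by-family table with any tabular library if a matrix is needed downstream.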

Languages

  • Python 82.6%
  • Shell 6.2%
  • R 6.1%
  • Go 5.1%