How to call SNV from scRNA-seq data using GATK #4

kangjiajinlong · 2023-05-18T20:13:11Z

Hi CAISC team,

CAISC requires SNV profiles from scRNA-seq data as input. As far as I know, GATK is designed for bulk RNA-seq samples, so I assume some adaptations are needed before it can be used on scRNA-seq data. However, after reading through this GitHub repository and the CAISC paper, I did not find specific instructions on how to call SNVs from scRNA-seq data using GATK.

My current understanding is that we treat each single cell as a sample and jointly call SNVs using the GATK joint variant calling pipeline (https://gatk.broadinstitute.org/hc/en-us/articles/360035890411-Calling-variants-on-cohorts-of-samples-using-the-HaplotypeCaller-in-GVCF-mode). I assume the process should be similar to the following: we first obtain FASTQ files for each single cell and convert them into unmapped BAM files. We then run the GATK RNA-seq variant calling pipeline on each cell individually in GVCF mode, generating one GVCF file per cell. Next, we combine all of these GVCF files into one using the CombineGVCFs tool. Finally, we use the VariantFiltration tool to generate one filtered VCF containing SNV information across all cells. This filtered VCF can then be used as input for CAISC.
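To make the question concrete, here is a rough Python sketch of the command sequence I have in mind. It only builds the GATK command lines and does not run anything. The function name, file paths, and cell IDs are hypothetical, and the filter thresholds are placeholders. I also inserted a GenotypeGVCFs step between CombineGVCFs and VariantFiltration, which is my assumption based on the joint-calling article linked above rather than something I found in the CAISC documentation. The sketch assumes each cell's BAM has already been aligned and preprocessed.

```python
from pathlib import Path
from typing import Dict, List


def build_joint_calling_commands(
    cell_bams: Dict[str, str], reference: str, out_dir: str
) -> List[List[str]]:
    """Build (but do not run) the GATK commands for per-cell joint calling.

    cell_bams maps a cell ID to its aligned, preprocessed BAM (hypothetical paths).
    Each returned list can be passed to subprocess.run(cmd, check=True).
    """
    out = Path(out_dir)
    commands: List[List[str]] = []
    gvcfs: List[str] = []

    # Step 1: call each cell individually in GVCF mode (one GVCF per cell).
    for cell_id, bam in sorted(cell_bams.items()):
        gvcf = str(out / f"{cell_id}.g.vcf.gz")
        gvcfs.append(gvcf)
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", gvcf,
            "-ERC", "GVCF",
            "--dont-use-soft-clipped-bases",  # flag used for RNA-seq reads
        ])

    # Step 2: merge the per-cell GVCFs into a single multi-sample GVCF.
    combined = str(out / "combined.g.vcf.gz")
    combine_cmd = ["gatk", "CombineGVCFs", "-R", reference, "-O", combined]
    for gvcf in gvcfs:
        combine_cmd += ["-V", gvcf]
    commands.append(combine_cmd)

    # Step 3 (assumption, not in my description above): joint genotyping.
    joint = str(out / "joint.vcf.gz")
    commands.append([
        "gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", joint,
    ])

    # Step 4: hard-filter the joint calls; thresholds here are placeholders.
    filtered = str(out / "joint.filtered.vcf.gz")
    commands.append([
        "gatk", "VariantFiltration", "-R", reference, "-V", joint, "-O", filtered,
        "--filter-name", "FS", "--filter-expression", "FS > 30.0",
        "--filter-name", "QD", "--filter-expression", "QD < 2.0",
    ])
    return commands


if __name__ == "__main__":
    # Hypothetical two-cell example; prints the commands for inspection.
    example = {"cellA": "bam/cellA.bam", "cellB": "bam/cellB.bam"}
    for cmd in build_joint_calling_commands(example, "ref/genome.fa", "calls"):
        print(" ".join(cmd))
```

With thousands of cells, CombineGVCFs may not scale well, so I am also unsure whether this per-cell layout is practical.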

Is what I described above the correct approach for generating the SNV input?

Thanks,
Jack
