Hi CAISC team,
CAISC requires SNV profiles from scRNA-seq data as input. As far as I know, GATK is designed for bulk RNA-seq samples, so I assume some adaptations are needed before it can be used on scRNA-seq data. However, after reading through this GitHub repository and the CAISC paper, I could not find specific instructions on how to call SNVs from scRNA-seq data using GATK.
My current understanding is that we treat each single cell as a sample and jointly call SNVs using the GATK joint variant calling pipeline (https://gatk.broadinstitute.org/hc/en-us/articles/360035890411-Calling-variants-on-cohorts-of-samples-using-the-HaplotypeCaller-in-GVCF-mode). I assume the process should be roughly as follows: we first obtain FASTQ files for each single cell and convert them into unmapped BAM files. We then run the GATK RNA-seq variant calling pipeline on each cell individually in GVCF mode, generating one GVCF file per cell. Next, we combine all of these per-cell files into one using the CombineGVCFs tool. Finally, we use the VariantFiltration tool to generate one filtered VCF containing SNV information across all cells. This filtered VCF can then be used as input for CAISC.
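To make the steps above concrete, here is a rough sketch that only builds the GATK command lines for this workflow; it does not run anything and is not taken from the CAISC repository. It assumes each cell already has an aligned, duplicate-marked BAM that has been through SplitNCigarReads. The function name `build_pipeline`, the file names, and the filter thresholds are hypothetical placeholders. It also includes a GenotypeGVCFs step between CombineGVCFs and VariantFiltration, since the linked GATK joint-calling workflow genotypes the combined GVCF before filtering; that step is not part of the description above.

```python
import shlex
from pathlib import Path
from typing import Dict, List


def build_pipeline(ref: str, cell_bams: Dict[str, str], outdir: str) -> List[List[str]]:
    """Return GATK command lines (as argument lists) for per-cell joint calling.

    cell_bams maps a cell ID to its preprocessed BAM (assumed aligned,
    duplicate-marked, and processed with SplitNCigarReads).
    """
    out = Path(outdir)
    cmds: List[List[str]] = []
    gvcfs: List[str] = []

    # 1. One HaplotypeCaller run per cell, emitting a GVCF.
    for cell, bam in sorted(cell_bams.items()):
        gvcf = str(out / f"{cell}.g.vcf.gz")
        gvcfs.append(gvcf)
        cmds.append([
            "gatk", "HaplotypeCaller",
            "-R", ref, "-I", bam, "-O", gvcf,
            "-ERC", "GVCF",
            "--dont-use-soft-clipped-bases",  # option used for RNA-seq reads
        ])

    # 2. Merge the per-cell GVCFs into a single multi-sample GVCF.
    combined = str(out / "all_cells.g.vcf.gz")
    combine = ["gatk", "CombineGVCFs", "-R", ref, "-O", combined]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    cmds.append(combine)

    # 3. Joint genotyping (assumption: added here, not in the description above).
    raw_vcf = str(out / "all_cells.vcf.gz")
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined, "-O", raw_vcf])

    # 4. Hard filtering; the thresholds are illustrative, not CAISC recommendations.
    filtered = str(out / "all_cells.filtered.vcf.gz")
    cmds.append([
        "gatk", "VariantFiltration",
        "-R", ref, "-V", raw_vcf, "-O", filtered,
        "--filter-name", "FS", "--filter-expression", "FS > 30.0",
        "--filter-name", "QD", "--filter-expression", "QD < 2.0",
    ])
    return cmds


if __name__ == "__main__":
    # Hypothetical inputs: two cells and a reference FASTA.
    example = build_pipeline(
        "ref.fa", {"cell1": "cell1.bam", "cell2": "cell2.bam"}, "out"
    )
    for cmd in example:
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to check the order of the steps and the number of per-cell GVCFs before committing compute time.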
Is what I described above the correct approach for generating the SNV input?
Thanks,
Jack