Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NGSCheckMate #993

Open
wants to merge 18 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,29 @@ Special thanks to the following for their contributions to the release:

### Enhancements & fixes

- [PR #993](https://github.com/nf-core/rnaseq/pull/993) Added NGSCheckMate for checking that samples come from the same individual
- [PR #1123](https://github.com/nf-core/rnaseq/pull/1123) - Overhaul tximport.r, output length tables
- [PR #1124](https://github.com/nf-core/rnaseq/pull/1124) - Ensure pseudoaligner is set if pseudoalignment is not skipped
- [PR #1126](https://github.com/nf-core/rnaseq/pull/1126) - Pipeline fails if transcript_fasta not provided and `skip_gtf_filter = true`.
- [PR #1127](https://github.com/nf-core/rnaseq/pull/1127) - Enlarge sampling to determine the number of columns in `filter_gtf.py` script.

### Parameters

| Old parameter | New parameter |
| ------------- | -------------------- |
| | `--ngscheckmate_bed` |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
> **NB:** Parameter has been **added** if just the new parameter information is present.
> **NB:** Parameter has been **removed** if new parameter information isn't present.

### Software dependencies

| Dependency | Old version | New version |
| -------------- | ----------- | ----------- |
| `bcftools` | | 1.17 |
| `ngscheckmate` | | 1.0.1 |

## [[3.13.1](https://github.com/nf-core/rnaseq/releases/tag/3.13.1)] - 2023-11-17

### Enhancements and fixes
Expand Down
44 changes: 23 additions & 21 deletions conf/igenomes.config
Original file line number Diff line number Diff line change
Expand Up @@ -12,29 +12,31 @@ params {
// illumina iGenomes reference file paths
genomes {
'GRCh37' {
fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
mito_name = "MT"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
ngscheckmate_bed = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/NGSCheckMate/SNP_GRCh37_hg19_wChr.bed"
readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
mito_name = "MT"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
}
'GRCh38' {
fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
mito_name = "chrM"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
ngscheckmate_bed = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/NGSCheckMate/SNP_GRCh38_hg38_wChr.bed"
mito_name = "chrM"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
}
'CHM13' {
fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/CHM13/Sequence/WholeGenomeFasta/genome.fa"
Expand Down
23 changes: 23 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -1064,6 +1064,29 @@ if (!params.skip_alignment && !params.skip_qc) {
}
}

if (params.ngscheckmate_bed) {
process {
withName: ".*BAM_NGSCHECKMATE:BCFTOOLS_MPILEUP" {
ext.args2 = '--no-version --ploidy 1 -c'
ext.args3 = '--no-version'
publishDir = [
path: { "${params.outdir}/${params.aligner}/ngscheckmate/vcfs" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: ".*BAM_NGSCHECKMATE:NGSCHECKMATE_NCM" {
ext.args = '-V'
publishDir = [
path: { "${params.outdir}/${params.aligner}/ngscheckmate/output" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
}

if (!params.skip_multiqc) {
process {
withName: 'MULTIQC' {
Expand Down
2 changes: 2 additions & 0 deletions conf/test.config
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ params {
hisat2_index = "https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/reference/hisat2.tar.gz"
salmon_index = "https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/reference/salmon.tar.gz"
rsem_index = "https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/reference/rsem.tar.gz"
rsem_index = "https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/reference/rsem.tar.gz"
ngscheckmate_bed = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/ngscheckmate.bed"

// Other parameters
skip_bbsplit = false
Expand Down
18 changes: 18 additions & 0 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -669,6 +669,24 @@ The plot on the left hand side shows the standard PC plot - notice the variable

Results generated by MultiQC collate pipeline QC from supported tools i.e. FastQC, Cutadapt, SortMeRNA, STAR, RSEM, HISAT2, Salmon, SAMtools, Picard, RSeQC, Qualimap, Preseq and featureCounts. Additionally, various custom content has been added to the report to assess the output of dupRadar, DESeq2 and featureCounts biotypes, and to highlight samples failing a mimimum mapping threshold or those that failed to match the strand-specificity provided in the input samplesheet. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

### NGSCheckMate

<details markdown="1">
<summary>Output files</summary>

- `ngscheckmate/vcf`
- `*.vcf`: vcf files containing the SNPs called for each sample.
- `ngscheckmate/output`
- `*.pdf`: Plotted dendrogram showing which samples are considered to be matching.
- `*_corr_matrix.txt`: A correlation matrix showing the correlations between each sample. Correlations below the threshold are set to 0.
- `*_matched.txt`: Text file denoting which samples were considered to be matching. One line per match, stating the correlation and the coverage depth (of the sample in the pair with lowest number of reads).
- `*_corr_matrix.txt`: Text file denoting all the individual comparisons. One line per comparison, stating the correlation and the coverage depth (of the sample in the pair with lowest depth).

</details>

[NGSCheckMate](https://github.com/parklab/NGSCheckMate) is a tool to verify that samples come from the same individual, by examining a set of single nucleotide polymorphisms (SNPs). This calculates correlations between the samples, and then applies a depth-dependent model of allele fractions to call samples as being related or not. The principal outputs are a dendrogram, where samples that are considered to match are shown as connected, a matrix showing the correlations between each samples, and a text file detailing each match between files.
This requires a bed file specifying the SNPs to consider to be provided as input. The NGSCheckMate github provides such sets of bed files for hg19 and hg38 that have been selected as being typically heterogeneous across the population and are present in exonic regions. The tool can also be used to verify samples match between different sequencing modalities, for instance matching RNA-Seq with ATAC-seq, ChIP-seq and WGS.

## Pseudoalignment and quantification

### Pseudoalignment
Expand Down
16 changes: 16 additions & 0 deletions modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,11 @@
"git_sha": "de3e6fc949dcffb8d3508c015f435ace5773ff08",
"installed_by": ["modules"]
},
"bcftools/mpileup": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["bam_ngscheckmate"]
},
"cat/fastq": {
"branch": "master",
"git_sha": "5c460c5a4736974abde2843294f35307ee2b0e5e",
Expand Down Expand Up @@ -75,6 +80,12 @@
"git_sha": "bdc2a97ced7adc423acfa390742db83cab98c1ad",
"installed_by": ["modules"]
},
"ngscheckmate/ncm": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["bam_ngscheckmate"],
"patch": "modules/nf-core/ngscheckmate/ncm/ngscheckmate-ncm.diff"
},
"picard/markduplicates": {
"branch": "master",
"git_sha": "2ee934606f1fdf7fc1cb05d6e8abc13bec8ab448",
Expand Down Expand Up @@ -248,6 +259,11 @@
"git_sha": "dedc0e31087f3306101c38835d051bf49789445a",
"installed_by": ["subworkflows"]
},
"bam_ngscheckmate": {
"branch": "master",
"git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f",
"installed_by": ["subworkflows"]
},
"bam_rseqc": {
"branch": "master",
"git_sha": "a9784afdd5dcda23b84e64db75dc591065d64653",
Expand Down
7 changes: 7 additions & 0 deletions modules/nf-core/bcftools/mpileup/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

58 changes: 58 additions & 0 deletions modules/nf-core/bcftools/mpileup/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

70 changes: 70 additions & 0 deletions modules/nf-core/bcftools/mpileup/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions modules/nf-core/ngscheckmate/ncm/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

64 changes: 64 additions & 0 deletions modules/nf-core/ngscheckmate/ncm/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading