Updates #352

Merged
merged 15 commits into from
Jun 28, 2023
198 changes: 86 additions & 112 deletions conf/modules.config

Large diffs are not rendered by default.

30 changes: 15 additions & 15 deletions docs/output.md
@@ -108,19 +108,19 @@ The library-level alignments associated with the same sample are merged and subs
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/`
- `<ALIGNER>/merged_library/`
- `*.bam`: Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--save_align_intermeds` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`).
- `<ALIGNER>/mergedLibrary/samtools_stats/`
- `<ALIGNER>/merged_library/samtools_stats/`
- SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files.
- `<ALIGNER>/mergedLibrary/picard_metrics/`
- `<ALIGNER>/merged_library/picard_metrics/`
- `*_metrics`: Alignment QC files from picard CollectMultipleMetrics.
- `*.metrics.txt`: Metrics file from MarkDuplicates.
- `<ALIGNER>/mergedLibrary/picard_metrics/pdf/`
- `<ALIGNER>/merged_library/picard_metrics/pdf/`
- `*.pdf`: Alignment QC plot files from picard CollectMultipleMetrics.
- `<ALIGNER>/mergedLibrary/preseq/`
- `<ALIGNER>/merged_library/preseq/`
- `*.lc_extrap.txt`: Preseq expected future yield file.

> **NB:** File names in the resulting directory (i.e. `<ALIGNER>/mergedLibrary/`) will have the '`.mLb.`' suffix.
> **NB:** File names in the resulting directory (i.e. `<ALIGNER>/merged_library/`) will have the '`.mLb.`' suffix.

</details>

@@ -141,7 +141,7 @@ The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at p
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/bigwig/`
- `<ALIGNER>/merged_library/bigwig/`
- `*.bigWig`: Normalised bigWig files scaled to 1 million mapped reads.

</details>
@@ -153,12 +153,12 @@ The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in a
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/phantompeakqualtools/`
- `<ALIGNER>/merged_library/phantompeakqualtools/`
- `*.spp.out`, `*.spp.pdf`: phantompeakqualtools output files.
- `*_mqc.tsv`: MultiQC custom content files.
- `<ALIGNER>/mergedLibrary/deepTools/plotFingerprint/`
- `<ALIGNER>/merged_library/deepTools/plotFingerprint/`
- `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt`: plotFingerprint output files.
- `<ALIGNER>/mergedLibrary/deepTools/plotProfile/`
- `<ALIGNER>/merged_library/deepTools/plotProfile/`
- `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.tab`, `*.plotProfile.pdf`, `*.plotProfile.tab`, `*.plotHeatmap.pdf`, `*.plotHeatmap.mat.tab`: plotProfile output files.

</details>
@@ -188,10 +188,10 @@ The results from deepTools plotProfile gives you a quick visualisation for the g
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/`
- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/`
- `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak`, `*summits.bed`: MACS2 output files - the files generated will depend on whether MACS2 has been run in _narrowPeak_ or _broadPeak_ mode.
- `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file.
- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/qc/`
- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/qc/`
- `macs2_peak.plots.pdf`: QC plots for MACS2 peaks.
- `macs2_annotatePeaks.plots.pdf`: QC plots for peak-to-gene feature annotation.
- `*.FRiP_mqc.tsv`, `*.peak_count_mqc.tsv`, `annotatepeaks.summary_mqc.tsv`: MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios.
@@ -217,7 +217,7 @@ Various QC plots per sample including number of peaks, fold-change distribution,
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/`
- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/`
- `*.bed`: Consensus peak-set across all samples in BED format.
- `*.saf`: Consensus peak-set across all samples in SAF format. Required by featureCounts for read quantification.
- `*.featureCounts.txt`: Read counts across all samples relative to consensus peak-set.
@@ -245,7 +245,7 @@ The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to co
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/deseq2/`
- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/deseq2/`
- `*.sample.dists.txt`: Spreadsheet containing sample-to-sample distance across each consensus peak.
- `*.plots.pdf`: File containing PCA and hierarchical clustering plots.
- `*.dds.RData`: File containing R `DESeqDataSet` object generated by DESeq2, with either
@@ -254,7 +254,7 @@ The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to co
`readRDS` to give user control of the eventual object name.
- `*pca.vals.txt`: Matrix of values for the first 2 principal components.
- `R_sessionInfo.log`: File containing information about R, the OS and attached or loaded packages.
- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
- `*.txt`, `*.RData`: Files containing DESeq2 sizeFactors per sample.

</details>
3 changes: 3 additions & 0 deletions modules/local/bam_remove_orphans.nf
@@ -17,6 +17,9 @@ process BAM_REMOVE_ORPHANS {
tuple val(meta), path("${prefix}.bam"), emit: bam
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
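
Note on the `when:` guard added here and in the other local modules in this diff: the process runs when `task.ext.when` is unset, and otherwise its value decides whether the task executes. A minimal sketch of how a config could drive it, assuming a `withName` selector in `conf/modules.config`. The condition is illustrative only and is not taken from this PR (the `conf/modules.config` diff is not rendered here):

```groovy
// Illustrative fragment for conf/modules.config.
// The selector and condition are assumptions, not the pipeline's actual settings.
process {
    withName: 'IGV' {
        // If this closure evaluates to false, the module's `when:` guard skips the task
        ext.when = { !params.skip_igv }
    }
}
```

Leaving `ext.when` undefined keeps the default behaviour, since `task.ext.when == null` short-circuits the guard to true.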
8 changes: 5 additions & 3 deletions modules/local/bedtools_genomecov.nf
@@ -15,11 +15,13 @@ process BEDTOOLS_GENOMECOV {
tuple val(meta), path("*.txt") , emit: scale_factor
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"

def pe = meta.single_end ? '' : '-pc'
def extend = (meta.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : ''
"""
SCALE_FACTOR=\$(grep '[0-9] mapped (' $flagstat | awk '{print 1000000/\$1}')
echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt
Expand All @@ -30,7 +32,7 @@ process BEDTOOLS_GENOMECOV {
-bg \\
-scale \$SCALE_FACTOR \\
$pe \\
$extend \\
$args \\
| sort -T '.' -k1,1 -k2,2n > ${prefix}.bedGraph

cat <<-END_VERSIONS > versions.yml
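
Note on the switch from `$extend` to `$args`: the module no longer reads `params.fragment_size` itself, so any fragment-extension flag now has to arrive through `task.ext.args`. The `conf/modules.config` diff is not rendered on this page, so the following is only a sketch of how the removed `def extend` logic could be expressed in configuration:

```groovy
// Illustrative fragment for conf/modules.config (assumed, not shown in this diff)
process {
    withName: 'BEDTOOLS_GENOMECOV' {
        // Mirrors the removed `def extend` line: pass -fs only for
        // single-end samples when a positive fragment size is configured
        ext.args = {
            (meta.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : ''
        }
    }
}
```

Using a closure defers evaluation until the task is created, which is what makes the per-sample `meta` map available to the expression.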
8 changes: 5 additions & 3 deletions modules/local/deseq2_qc.nf
@@ -26,10 +26,12 @@ process DESEQ2_QC {
path "size_factors" , optional:true, emit: size_factors
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def peak_type = params.narrow_peak ? 'narrowPeak' : 'broadPeak'
def prefix = task.ext.prefix ?: "${meta.id}"
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
deseq2_qc.r \\
--count_file $counts \\
3 changes: 3 additions & 0 deletions modules/local/frip_score.nf
@@ -14,6 +14,9 @@ process FRIP_SCORE {
tuple val(meta), path("*.txt"), emit: txt
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
3 changes: 3 additions & 0 deletions modules/local/genome_blacklist_regions.nf
@@ -17,6 +17,9 @@ process GENOME_BLACKLIST_REGIONS {
path '*.bed' , emit: bed
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script:
def file_out = "${sizes.simpleName}.include_regions.bed"
if (blacklist) {
3 changes: 3 additions & 0 deletions modules/local/gtf2bed.nf
@@ -14,6 +14,9 @@ process GTF2BED {
path '*.bed' , emit: bed
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
"""
gtf2bed \\
4 changes: 4 additions & 0 deletions modules/local/igv.nf
@@ -20,8 +20,12 @@ process IGV {
output:
path "*files.txt" , emit: txt
path "*.xml" , emit: xml
path fasta , emit: fasta
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script: // scripts are bundled with the pipeline in nf-core/chipseq/bin/
def consensus_dir = "${aligner_dir}/mergedLibrary/macs2/${peak_dir}/consensus/*"
"""
3 changes: 3 additions & 0 deletions modules/local/multiqc.nf
@@ -56,6 +56,9 @@ process MULTIQC {
path "*_plots" , optional:true, emit: plots
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def custom_config = params.multiqc_config ? "--config $mqc_custom_config" : ''
3 changes: 3 additions & 0 deletions modules/local/multiqc_custom_peaks.nf
@@ -14,6 +14,9 @@ process MULTIQC_CUSTOM_PEAKS {
tuple val(meta), path("*.peak_count_mqc.tsv"), emit: count
tuple val(meta), path("*.FRiP_mqc.tsv") , emit: frip

when:
task.ext.when == null || task.ext.when

script:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
3 changes: 3 additions & 0 deletions modules/local/multiqc_custom_phantompeakqualtools.nf
@@ -16,6 +16,9 @@ process MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS {
tuple val(meta), path("*.spp_rsc_mqc.tsv") , emit: rsc
tuple val(meta), path("*.spp_correlation_mqc.tsv"), emit: correlation

when:
task.ext.when == null || task.ext.when

script:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
3 changes: 3 additions & 0 deletions modules/local/plot_homer_annotatepeaks.nf
@@ -17,6 +17,9 @@ process PLOT_HOMER_ANNOTATEPEAKS {
path '*.tsv' , emit: tsv
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "annotatepeaks"
6 changes: 5 additions & 1 deletion modules/local/plot_macs2_qc.nf
@@ -8,15 +8,19 @@ process PLOT_MACS2_QC {

input:
path peaks
val is_narrow_peak

output:
path '*.txt' , emit: txt
path '*.pdf' , emit: pdf
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
def args = task.ext.args ?: ''
def peak_type = params.narrow_peak ? 'narrowPeak' : 'broadPeak'
def peak_type = is_narrow_peak ? 'narrowPeak' : 'broadPeak'
"""
plot_macs2_qc.r \\
-i ${peaks.join(',')} \\
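
Note on the new `val is_narrow_peak` input: passing the peak mode in as a value, rather than reading `params.narrow_peak` inside the process, keeps the module independent of pipeline-level parameters. A self-contained sketch of the same pattern, where `SHOW_PEAK_TYPE` is a hypothetical process used only for illustration:

```groovy
// Hypothetical default; in the pipeline this would come from the parameter schema
params.narrow_peak = false

process SHOW_PEAK_TYPE {
    input:
    val is_narrow_peak   // plain value supplied by the caller

    output:
    stdout

    script:
    // Same ternary as in PLOT_MACS2_QC, now driven by the input instead of params
    def peak_type = is_narrow_peak ? 'narrowPeak' : 'broadPeak'
    """
    echo $peak_type
    """
}

workflow {
    // The caller decides where the flag comes from
    SHOW_PEAK_TYPE ( params.narrow_peak )
    SHOW_PEAK_TYPE.out.view()
}
```

The caller in `workflows/chipseq.nf` follows this shape by passing `params.narrow_peak` as the second argument to `PLOT_MACS2_QC`.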
3 changes: 2 additions & 1 deletion modules/local/samplesheet_check.nf
@@ -18,10 +18,11 @@ process SAMPLESHEET_CHECK {
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
def args = task.ext.args ?: ''
"""
check_samplesheet.py \\
$samplesheet \\
samplesheet.valid.csv
$args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
3 changes: 3 additions & 0 deletions modules/local/star_genomegenerate.nf
@@ -16,6 +16,9 @@ process STAR_GENOMEGENERATE {
path "star" , emit: index
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = (task.ext.args ?: '').tokenize()
def memory = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : ''
21 changes: 8 additions & 13 deletions subworkflows/local/prepare_genome.nf
@@ -41,13 +41,7 @@ workflow PREPARE_GENOME {
ch_fasta = GUNZIP_FASTA ( [ [:], params.fasta ] ).gunzip.map{ it[1] }
ch_versions = ch_versions.mix(GUNZIP_FASTA.out.versions)
} else {
ch_fasta = file(params.fasta)
}

// Make fasta file available if reference saved or IGV is run
if (params.save_reference || !params.skip_igv) {
file("${params.outdir}/genome/").mkdirs()
ch_fasta.copyTo("${params.outdir}/genome/")
ch_fasta = Channel.value(file(params.fasta))
}

//
@@ -107,14 +101,15 @@ workflow PREPARE_GENOME {
ch_gene_bed = GUNZIP_GENE_BED ( [ [:], params.gene_bed ] ).gunzip.map{ it[1] }
ch_versions = ch_versions.mix(GUNZIP_GENE_BED.out.versions)
} else {
ch_gene_bed = file(params.gene_bed)
ch_gene_bed = Channel.value(file(params.gene_bed))
}
}

//
// Create chromosome sizes file
//
ch_chrom_sizes = CUSTOM_GETCHROMSIZES ( [ [:], ch_fasta ] ).sizes.map{ it[1] }
CUSTOM_GETCHROMSIZES ( ch_fasta.map { [ [:], it ] } )
ch_chrom_sizes = CUSTOM_GETCHROMSIZES.out.sizes.map { it[1] }
ch_fai = CUSTOM_GETCHROMSIZES.out.fai.map{ it[1] }
ch_versions = ch_versions.mix(CUSTOM_GETCHROMSIZES.out.versions)
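
Note on the `Channel.value(file(...))` and `ch_fasta.map { [ [:], it ] }` changes: once `ch_fasta` is a channel in both branches of the `if`, the empty meta map has to be attached with `map` instead of by building a list around the variable. A minimal sketch of that pattern in isolation, where the `genome.fa` path is a placeholder and not a file from this pipeline:

```groovy
// Placeholder path for illustration only
params.fasta = 'genome.fa'

workflow {
    // Wrap the file in a value channel so it can be reused by several processes
    ch_fasta = Channel.value(file(params.fasta))

    // Attach an empty meta map, producing the [ meta, fasta ] tuple
    // that modules such as CUSTOM_GETCHROMSIZES take as input
    ch_fasta
        .map { fasta -> [ [:], fasta ] }
        .view()
}
```

Applying `map` to a value channel yields another value channel, so the result can still be consumed by more than one downstream process.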

Expand Down Expand Up @@ -144,7 +139,7 @@ workflow PREPARE_GENOME {
ch_bwa_index = file(params.bwa_index)
}
} else {
ch_bwa_index = BWA_INDEX ( [ [:], ch_fasta ] ).index
ch_bwa_index = BWA_INDEX ( ch_fasta.map { [ [:], it ] } ).index
ch_versions = ch_versions.mix(BWA_INDEX.out.versions)
}
}
@@ -162,7 +157,7 @@ workflow PREPARE_GENOME {
ch_bowtie2_index = [ [:], file(params.bowtie2_index) ]
}
} else {
ch_bowtie2_index = BOWTIE2_BUILD ( [ [:], ch_fasta ] ).index
ch_bowtie2_index = BOWTIE2_BUILD ( ch_fasta.map { [ [:], it ] } ).index
ch_versions = ch_versions.mix(BOWTIE2_BUILD.out.versions)
}
}
@@ -180,7 +175,7 @@ workflow PREPARE_GENOME {
ch_chromap_index = [ [:], file(params.chromap_index) ]
}
} else {
ch_chromap_index = CHROMAP_INDEX ( [ [:], ch_fasta ] ).index
ch_chromap_index = CHROMAP_INDEX ( ch_fasta.map { [ [:], it ] } ).index
ch_versions = ch_versions.mix(CHROMAP_INDEX.out.versions)
}
}
@@ -195,7 +190,7 @@ workflow PREPARE_GENOME {
ch_star_index = UNTAR_STAR_INDEX ( [ [:], params.star_index ] ).untar.map{ it[1] }
ch_versions = ch_versions.mix(UNTAR_STAR_INDEX.out.versions)
} else {
ch_star_index = file(params.star_index)
ch_star_index = Channel.value(file(params.star_index))
}
} else {
ch_star_index = STAR_GENOMEGENERATE ( ch_fasta, ch_gtf ).index
9 changes: 5 additions & 4 deletions workflows/chipseq.nf
@@ -205,7 +205,7 @@ workflow CHIPSEQ {
ch_samtools_stats = FASTQ_ALIGN_BOWTIE2.out.stats
ch_samtools_flagstat = FASTQ_ALIGN_BOWTIE2.out.flagstat
ch_samtools_idxstats = FASTQ_ALIGN_BOWTIE2.out.idxstats
ch_versions = ch_versions.mix(FASTQ_ALIGN_BOWTIE2.out.versions.first())
ch_versions = ch_versions.mix(FASTQ_ALIGN_BOWTIE2.out.versions)
}

//
@@ -229,7 +229,7 @@ workflow CHIPSEQ {
ch_samtools_stats = FASTQ_ALIGN_CHROMAP.out.stats
ch_samtools_flagstat = FASTQ_ALIGN_CHROMAP.out.flagstat
ch_samtools_idxstats = FASTQ_ALIGN_CHROMAP.out.idxstats
ch_versions = ch_versions.mix(FASTQ_ALIGN_CHROMAP.out.versions.first())
ch_versions = ch_versions.mix(FASTQ_ALIGN_CHROMAP.out.versions)
}

//
@@ -274,7 +274,7 @@ workflow CHIPSEQ {
PICARD_MERGESAMFILES (
ch_sort_bam
)
ch_versions = ch_versions.mix(PICARD_MERGESAMFILES.out.versions.first().ifEmpty(null))
ch_versions = ch_versions.mix(PICARD_MERGESAMFILES.out.versions.first())

//
// SUBWORKFLOW: Mark duplicates & filter BAM files after merging
@@ -549,7 +549,8 @@ workflow CHIPSEQ {
// MODULE: MACS2 QC plots with R
//
PLOT_MACS2_QC (
ch_macs2_peaks.collect{it[1]}
ch_macs2_peaks.collect{it[1]},
params.narrow_peak
)
ch_versions = ch_versions.mix(PLOT_MACS2_QC.out.versions)
