Closed
41 commits
589ff09
Adding in nano plot updates
thomas-watchmaker Apr 9, 2025
79b1303
Added RSEQC genebodycoverage
eduard-watchmaker Apr 17, 2025
9664705
Tidying
eduard-watchmaker Apr 17, 2025
ff89141
using our wmg ecr cache
eduard-watchmaker Apr 22, 2025
ac76cac
Merge pull request #1 from watchmaker-genomics/rseqc_genebodycoverage
eduard-watchmaker Apr 22, 2025
9f3cea8
updated process_medium to process_high in fastqc main
dave-watchmaker May 1, 2025
96a5941
Merge pull request #2 from watchmaker-genomics/increase_fastqc_cpu
dave-watchmaker May 1, 2025
1381f5e
first crack at modification
thomas-watchmaker May 23, 2025
f7dffeb
Have to qualify the docker path
thomas-watchmaker May 23, 2025
9350b88
moving closig bracket down
thomas-watchmaker May 23, 2025
e33a5b5
Merge pull request #3 from watchmaker-genomics/downsampling
thomas-watchmaker May 29, 2025
eee1d05
Added parametres to publish the files
eduard-watchmaker Jun 3, 2025
38108f3
Added parametres to publish the files
eduard-watchmaker Jun 3, 2025
1a49de0
First commit of restrander
thomas-watchmaker Jul 8, 2025
eef5c6e
squash me
thomas-watchmaker Jul 8, 2025
a66497f
squash me
thomas-watchmaker Jul 8, 2025
23c3805
squash me
thomas-watchmaker Jul 8, 2025
837a166
squash me
thomas-watchmaker Jul 8, 2025
f8e714c
squash me
thomas-watchmaker Jul 8, 2025
10a6a99
squash me
thomas-watchmaker Jul 8, 2025
8c72143
squash me
thomas-watchmaker Jul 8, 2025
acf6ee2
Edited check_samplesheet.py so that the restrander config file can be…
julietmWM Jul 22, 2025
6cba169
Rebuilt the restrander docker image with Multi_Arch support so change…
julietmWM Jul 22, 2025
1850eef
Removed /bin/bash ENTRYPOINT from the dockerfile because of a /bin.ba…
julietmWM Jul 22, 2025
03cd574
file prefix issue
julietmWM Jul 22, 2025
659d2e1
file prefix issue
julietmWM Jul 22, 2025
4e0b087
Added line to see the sample sheet before processing.
julietmWM Jul 28, 2025
5cb5df5
Tweeking for sample sheet errors.
julietmWM Jul 28, 2025
85e706f
Tweeking for sample sheet errors.
julietmWM Jul 28, 2025
fba0a08
Tweeking for sample sheet errors.
julietmWM Jul 28, 2025
62c02d5
Tweeking for sample sheet errors.
julietmWM Jul 28, 2025
43e0332
Fixing tuple structure propagation issues
mark-alence-watchmaker Jul 29, 2025
4605577
Cleaning up the Restrander-related code and adding some comments.
julietmWM Aug 5, 2025
d853ab6
Removing unnecessary .view statements.
julietmWM Aug 5, 2025
63bcfd4
More cleaning.
julietmWM Aug 5, 2025
bbe6ea6
Debugging non-restrander run errors.
julietmWM Aug 11, 2025
5c0e5cc
Debugging non-restrander run errors.
julietmWM Aug 11, 2025
fc70be5
Debugging non-restrander run errors.
julietmWM Aug 11, 2025
d5b1ac4
Debugging non-restrander run errors.
julietmWM Aug 12, 2025
68dcbc3
The bug was fixed with the last commit. Now just cleaning up the code.
julietmWM Aug 12, 2025
cc65ad1
Added Restrander information to the usage and output docs.
julietmWM Aug 12, 2025
36 changes: 36 additions & 0 deletions aws_batch.config
@@ -0,0 +1,36 @@
/*
========================================================================================
wmg_nextflow/masterworkflow Nextflow AWS Batch config file
========================================================================================
Default config options for AWS Batch
----------------------------------------------------------------------------------------
*/


params {
    awsqueue = 'nextflow-with-dockerhub-aws-batch-large'
    awsregion = 'us-west-2'
    run = 'default'
    // Max resource options
    max_memory = '256.GB'
    max_cpus = 256
    max_time = '240.h'
    outdir = "s3://watchmaker-lts/nanoseq/${params.run}/"
}




process {
    executor = 'awsbatch'
    queue = 'nextflow-with-dockerhub-aws-batch-large'
}

aws {
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
    region = 'us-west-2'
}

workDir = "s3://watchmaker-lts/nanoseq/work/"
20 changes: 10 additions & 10 deletions bin/check_samplesheet.py
@@ -49,19 +49,19 @@ def read_head(handle, num_lines=10):
def check_samplesheet(file_in, updated_path, file_out):
"""
This function checks that the samplesheet follows the following structure:
group,replicate,barcode,input_file,fasta,gtf
MCF7,1,,MCF7_directcDNA_replicate1.fastq.gz,genome.fa,
MCF7,2,,MCF7_directcDNA_replicate3.fastq.gz,genome.fa,genome.gtf
K562,1,,K562_directcDNA_replicate1.fastq.gz,genome.fa,
K562,2,,K562_directcDNA_replicate4.fastq.gz,,transcripts.fa
group,replicate,barcode,input_file,fasta,gtf,restrander_config
MCF7,1,,MCF7_directcDNA_replicate1.fastq.gz,genome.fa,,restrander_config.json
MCF7,2,,MCF7_directcDNA_replicate3.fastq.gz,genome.fa,genome.gtf,restrander_config.json
K562,1,,K562_directcDNA_replicate1.fastq.gz,genome.fa,,
K562,2,,K562_directcDNA_replicate4.fastq.gz,,transcripts.fa,
"""

input_extensions = []
sample_info_dict = {}
with open(file_in, "r") as fin:
## Check header
MIN_COLS = 3
HEADER = ["group", "replicate", "barcode", "input_file", "fasta", "gtf"]
HEADER = ["group", "replicate", "barcode", "input_file", "fasta", "gtf", "restrander_config"]
header = fin.readline().strip().split(",")
if header[: len(HEADER)] != HEADER:
print("ERROR: Please check samplesheet header -> {} != {}".format(",".join(header), ",".join(HEADER)))
@@ -80,7 +80,7 @@ def check_samplesheet(file_in, updated_path, file_out):
print_error("Invalid number of populated columns (minimum = {})!".format(MIN_COLS), "Line", line)

## Check group name entries
group, replicate, barcode, input_file, fasta, gtf = lspl[: len(HEADER)]
group, replicate, barcode, input_file, fasta, gtf, restrander_config = lspl[: len(HEADER)]
if group:
if group.find(" ") != -1:
print_error("Group entry contains spaces!", "Line", line)
@@ -177,8 +177,8 @@ def check_samplesheet(file_in, updated_path, file_out):
# is_transcripts = '1'
# genome = transcriptome

## Create sample mapping dictionary = {group: {replicate : [ barcode, input_file, genome, gtf, is_transcripts, nanopolish_fast5 ]}}
sample_info = [barcode, input_file, fasta, gtf, is_transcripts, nanopolish_fast5]
## Create sample mapping dictionary = {group: {replicate : [ barcode, input_file, genome, gtf, is_transcripts, nanopolish_fast5, restrander_config ]}}
sample_info = [barcode, input_file, fasta, gtf, is_transcripts, nanopolish_fast5, restrander_config]
if group not in sample_info_dict:
sample_info_dict[group] = {}
if replicate not in sample_info_dict[group]:
@@ -200,7 +200,7 @@ def check_samplesheet(file_in, updated_path, file_out):
make_dir(out_dir)
with open(file_out, "w") as fout:
fout.write(
",".join(["sample", "barcode", "input_file", "fasta", "gtf", "is_transcripts", "nanopolish_fast5"])
",".join(["sample", "barcode", "input_file", "fasta", "gtf", "is_transcripts", "nanopolish_fast5", "restrander_config"])
+ "\n"
)
for sample in sorted(sample_info_dict.keys()):
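
The header comparison changed in this file can be exercised on its own. The sketch below copies the new `HEADER` list and the `header[: len(HEADER)] != HEADER` test from the hunks above; `header_matches` is a hypothetical helper name used only for illustration and is not part of `check_samplesheet.py`. Under that reading, a sample sheet that still carries the old six-column header would fail the check, so existing sheets probably need the `restrander_config` column added (it can be left blank per row).

```python
# Minimal sketch of the header check from the diff above.
# HEADER is copied from the new code; header_matches() is a hypothetical name.
HEADER = ["group", "replicate", "barcode", "input_file", "fasta", "gtf", "restrander_config"]


def header_matches(header_line: str) -> bool:
    """Return True when the first len(HEADER) columns equal HEADER exactly."""
    header = header_line.strip().split(",")
    return header[: len(HEADER)] == HEADER


old_header = "group,replicate,barcode,input_file,fasta,gtf"
new_header = "group,replicate,barcode,input_file,fasta,gtf,restrander_config"

print(header_matches(old_header))  # False: the seventh column is missing
print(header_matches(new_header))  # True
```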
23 changes: 23 additions & 0 deletions conf/modules.config
@@ -45,6 +45,16 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

// Publish dir for RESTRANDER
withName: RESTRANDER {
publishDir = [
path: { "${params.outdir}/restrander" },
mode: 'copy',
enabled: true,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}

if (!params.skip_demultiplexing) {
@@ -467,6 +477,8 @@ if (params.call_variants) {
]
}
}


}
if (params.structural_variant_caller == 'sniffles') {
process {
@@ -535,6 +547,17 @@ if (params.call_variants) {
}

if (!params.skip_quantification) {
process {
withName: RSEQC_GENEBODYCOVERAGE {
publishDir = [
path: { "${params.outdir}/rseqc" },
mode: 'copy',
enabled: true,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}

if (params.quantification_method == "bambu") {
process {
withName: BAMBU {
17 changes: 17 additions & 0 deletions docs/output.md
@@ -46,6 +46,23 @@ _Documentation_:
_Description_:
If you would like to run NanoLyse on the raw FASTQ files you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter lambda phage reads. However, you can provide your own FASTA file of "contaminants" with `--nanolyse_fasta`. The filtered FASTQ files will contain raw reads without the specified reference sequences (default: lambda phage sequences).

## cDNA Read Orientation

<details markdown="1">
<summary>Output files</summary>

- `restrander/<SAMPLE>_restrander.fq.gz`: FASTQ file of the stranded reads. The reverse strand reads are replaced with their reverse-complements, ensuring that all reads in the output have the same orientation as the original transcripts.
- `restrander/<SAMPLE>-unknowns.*_restrander.fq.gz`: FASTQ file of the reads whose strand could not be inferred.
- `restrander/<SAMPLE>.restrander.json`: Restrander output statistics - includes artefact and strand statistics.

</details>

_Documentation_:
[Restrander](https://github.com/mritchielab/restrander)

_Description_:
Restrander orients and quality-checks cDNA sequencing reads. It runs automatically for a sample when the protocol is cDNA and the sample sheet provides a Restrander config file for that sample.

## Read QC

<details markdown="1">
71 changes: 43 additions & 28 deletions docs/usage.md
@@ -10,12 +10,13 @@ You will need to create a file with information about the samples in your experi

| Column | Description |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `group` | Group identifier for sample. This will be identical for replicate samples from the same experimental group. |
| `replicate` | Integer representing replicate number. Must start from `1..<number of replicates>`. |
| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. |
| `input_file` | Full path to FastQ file if previously demultiplexed, BAM file if previously aligned, or a path to a directory with subdirectories containing fastq or fast5 files. FastQ file has to be zipped and have the extension ".fastq.gz" or ".fq.gz". BAM file has to have the extension ".bam". |
| `fasta` | Genome fasta file or transcriptome fasta file for alignment. This can either be a local path, or the appropriate key for a genome available in [iGenomes config file](../conf/igenomes.config). Must have the extension ".fasta", ".fasta.gz", ".fa" or ".fa.gz". |
| `gtf` | Annotation gtf file for transcript discovery and quantification and RNA modification detection. This can either be blank or a local path. Must have the extension ".gtf". |
| `group` | Group identifier for sample. This will be identical for replicate samples from the same experimental group. |
| `replicate` | Integer representing replicate number. Must start from `1..<number of replicates>`. |
| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. |
| `input_file` | Full path to FastQ file if previously demultiplexed, BAM file if previously aligned, or a path to a directory with subdirectories containing fastq or fast5 files. FastQ file has to be zipped and have the extension ".fastq.gz" or ".fq.gz". BAM file has to have the extension ".bam". |
| `fasta` | Genome fasta file or transcriptome fasta file for alignment. This can either be a local path, or the appropriate key for a genome available in [iGenomes config file](../conf/igenomes.config). Must have the extension ".fasta", ".fasta.gz", ".fa" or ".fa.gz". |
| `gtf` | Annotation gtf file for transcript discovery and quantification and RNA modification detection. This can either be blank or a local path. Must have the extension ".gtf". |
| `restrander_config` | Restrander .json config file that provides the template-switching oligo (TSO) and reverse transcription primer (RTP) sequences. Different configurations are used for different library preparation protocols. This can either be blank or a file path. If blank, Restrander will not run for the sample. |
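
The `restrander_config` rule in the table (blank means Restrander is skipped, otherwise a path to a `.json` file) could be checked per cell roughly as follows. This is a sketch only: `parse_restrander_config` is a hypothetical function, and rejecting values that do not end in `.json` is an assumption drawn from the column description, not a check shown in this pull request.

```python
from typing import Optional


def parse_restrander_config(cell: str) -> Optional[str]:
    """Interpret one restrander_config cell from the sample sheet.

    Returns None for a blank cell (Restrander does not run for the sample),
    otherwise the stripped path. The .json check is an assumed convention.
    """
    value = cell.strip()
    if not value:
        return None
    if not value.lower().endswith(".json"):
        raise ValueError(f"restrander_config should be a .json file, got: {value!r}")
    return value


print(parse_restrander_config(""))              # None
print(parse_restrander_config(" PCB109.json"))  # PCB109.json
```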

### Skip demultiplexing

@@ -26,13 +27,13 @@ As shown in the examples below, the accepted samplesheet format is different dep
##### Example `samplesheet.csv` for non-demultiplexed fastq inputs

```bash
group,replicate,barcode,input_file,fasta,gtf
WT_MOUSE,1,1,,mm10,
WT_HUMAN,1,2,,hg19,
WT_POMBE,1,3,,/path/to/local/genome.fa,
WT_DENOVO,1,4,,,/path/to/local/transcriptome.fa
WT_LOCAL,2,5,,/path/to/local/genome.fa,/path/to/local/transcriptome.gtf
WT_UNKNOWN,3,6,,,
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT_MOUSE,1,1,,mm10,,
WT_HUMAN,1,2,,hg19,,
WT_POMBE,1,3,,/path/to/local/genome.fa,,
WT_DENOVO,1,4,,,/path/to/local/transcriptome.fa,
WT_LOCAL,2,5,,/path/to/local/genome.fa,/path/to/local/transcriptome.gtf,
WT_UNKNOWN,3,6,,,,
```

##### Example command for non-demultiplexed fastq inputs
@@ -52,11 +53,11 @@ nextflow run nf-core/nanoseq \
##### Example `samplesheet.csv` for demultiplexed fastq inputs

```bash
group,replicate,barcode,input_file,fasta,gtf
WT,1,,SAM101A1.fastq.gz,hg19,
WT,2,,SAM101A2.fastq.gz,hg19,
KO,1,,SAM101A3.fastq.gz,hg19,
KO,2,,SAM101A4.fastq.gz,hg19,
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT,1,,SAM101A1.fastq.gz,hg19,,
WT,2,,SAM101A2.fastq.gz,hg19,,
KO,1,,SAM101A3.fastq.gz,hg19,,
KO,2,,SAM101A4.fastq.gz,hg19,,
```

##### Example command for demultiplexed fastq inputs
@@ -74,11 +75,11 @@ nextflow run nf-core/nanoseq \
##### Example `samplesheet.csv` for BAM inputs

```bash
group,replicate,barcode,input_file,fasta,gtf
WT,1,,SAM101A1.bam,hg19,
WT,2,,SAM101A2.bam,hg19,
KO,1,,SAM101A3.bam,hg19,
KO,2,,SAM101A4.bam,hg19,
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT,1,,SAM101A1.bam,hg19,,
WT,2,,SAM101A2.bam,hg19,,
KO,1,,SAM101A3.bam,hg19,,
KO,2,,SAM101A4.bam,hg19,,
```

##### Example command for BAM inputs
@@ -97,11 +98,11 @@ nextflow run nf-core/nanoseq \
##### Example `samplesheet.csv` for FAST5 and FASTQ input directories

```bash
group,replicate,barcode,input_file,fasta,gtf
WT,1,,/full/path/to/SAM101A1/,hg19.fasta,hg19.gtf
WT,2,,/full/path/to/SAM101A2/,hg19.fasta,hg19.gtf
KO,1,,/full/path/to/SAM101A3/,hg19.fasta,hg19.gtf
KO,2,,/full/path/to/SAM101A4/,hg19.fasta,hg19.gtf
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT,1,,/full/path/to/SAM101A1/,hg19.fasta,hg19.gtf,
WT,2,,/full/path/to/SAM101A2/,hg19.fasta,hg19.gtf,
KO,1,,/full/path/to/SAM101A3/,hg19.fasta,hg19.gtf,
KO,2,,/full/path/to/SAM101A4/,hg19.fasta,hg19.gtf,
```

##### Each FAST5 and FASTQ input directory should have the following structure:
@@ -128,6 +129,20 @@ nextflow run nf-core/nanoseq \
-profile <docker/singularity/institute>
```

### Using Restrander

Restrander orients and quality-checks cDNA sequencing reads. It runs automatically for a sample when the protocol is cDNA and the sample's `restrander_config` column contains a config file path; samples with a blank `restrander_config` skip Restrander, so one sample sheet can mix both. Example Restrander configuration files for several protocols are listed in the [README](https://github.com/jakob-schuster/restrander-vignette?tab=readme-ov-file#configuration-files) of the Restrander vignette.

##### Example `samplesheet.csv` for using Restrander

```bash
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT,1,1,/full/path/to/SAM101A1/,hg19,hg19.gtf,
WT,2,2,/full/path/to/SAM101A2/,hg19,hg19.gtf,
KO,1,3,/full/path/to/SAM101A3/,hg19,hg19.gtf,PCB109.json
KO,2,4,/full/path/to/SAM101A4/,hg19,hg19.gtf,PCB109.json
```
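
To preview which samples in a mixed sheet would be sent to Restrander, the sheet can be split on the `restrander_config` column. The sketch below uses the example sheet above; `split_by_restrander` is a hypothetical helper, and it looks only at the column value, so the separate requirement that the protocol is cDNA is not modelled here.

```python
import csv
import io

# Example sample sheet copied from the documentation above.
SAMPLESHEET = """\
group,replicate,barcode,input_file,fasta,gtf,restrander_config
WT,1,1,/full/path/to/SAM101A1/,hg19,hg19.gtf,
WT,2,2,/full/path/to/SAM101A2/,hg19,hg19.gtf,
KO,1,3,/full/path/to/SAM101A3/,hg19,hg19.gtf,PCB109.json
KO,2,4,/full/path/to/SAM101A4/,hg19,hg19.gtf,PCB109.json
"""


def split_by_restrander(text):
    """Return ((group, replicate) with a config, (group, replicate) without)."""
    with_config, without_config = [], []
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["group"], row["replicate"])
        if row["restrander_config"].strip():
            with_config.append(key)
        else:
            without_config.append(key)
    return with_config, without_config


with_config, without_config = split_by_restrander(SAMPLESHEET)
print(with_config)     # [('KO', '1'), ('KO', '2')]
print(without_config)  # [('WT', '1'), ('WT', '2')]
```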

## Running the pipeline

The typical command for running the pipeline is as follows:
31 changes: 31 additions & 0 deletions modules/local/restrander.nf
@@ -0,0 +1,31 @@
process RESTRANDER {
    tag "$meta.id"
    label 'process_medium'

    container "${'912684371407.dkr.ecr.us-west-2.amazonaws.com/restrander:1.2'}"

    input:
    tuple val(meta), path(reads), path(input_config)

    output:
    tuple val(meta), path("*_restrander.fq.gz") , emit: reads
    tuple val(meta), path("*.restrander.json") , emit: metrics
    path "versions.yml" , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def prefix = task.ext.prefix ?: reads.getBaseName()
    """
    /restrander \\
        ${reads} \\
        ${prefix}_restrander.fq.gz \\
        ${input_config} > ${prefix}.restrander.json

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        restrander: v1.0.1
    END_VERSIONS
    """
}
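
The output names produced by this process depend on how `prefix` is derived. The Python sketch below mirrors that logic under one assumption: that Nextflow's `getBaseName()` removes only the final extension, which is also what `Path.stem` does. `restrander_outputs` is a hypothetical helper for illustration, not pipeline code.

```python
from pathlib import Path


def restrander_outputs(reads_name, ext_prefix=None):
    """Mimic `task.ext.prefix ?: reads.getBaseName()` and the two output names.

    Assumes getBaseName() strips only the last extension, like Path.stem.
    """
    prefix = ext_prefix or Path(reads_name).stem
    return f"{prefix}_restrander.fq.gz", f"{prefix}.restrander.json"


print(restrander_outputs("WT_R1.fastq.gz"))
# ('WT_R1.fastq_restrander.fq.gz', 'WT_R1.fastq.restrander.json')
print(restrander_outputs("WT_R1.fastq.gz", ext_prefix="WT_R1"))
# ('WT_R1_restrander.fq.gz', 'WT_R1.restrander.json')
```

If that assumption holds, a gzipped FASTQ input keeps its inner `.fastq` in the default prefix, so setting `task.ext.prefix` may be the simplest way to obtain names of the form `<SAMPLE>_restrander.fq.gz`.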
33 changes: 33 additions & 0 deletions modules/local/rseqc_genebodycoverage.nf
@@ -0,0 +1,33 @@
process RSEQC_GENEBODYCOVERAGE {
    label 'process_high'
    container "912684371407.dkr.ecr.us-west-2.amazonaws.com/quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"

    input:
    tuple path(bam), path(bai), path(bed12)

    output:
    path("*.pdf") , emit: pdf
    path("*.geneBodyCoverage.txt") , emit: rna_txt_ch
    path("versions.yml") , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def name = bam.getName().replaceAll(/\.bam$/, '')

    """
    geneBody_coverage.py \\
        $args \\
        --refgene=$bed12 \\
        --input=$bam \\
        --minimum_length=100 \\
        --out-prefix=${name}

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        rseqc: \$(geneBody_coverage.py --version | sed -e "s/geneBody_coverage.py //g")
    END_VERSIONS
    """
}
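
For reference, the command that the script block assembles can be previewed outside Nextflow. The sketch below rebuilds it in Python using only the options that appear in the process above; `genebody_coverage_command` is a hypothetical helper, and the file names in the example call are made up.

```python
import re
import shlex


def genebody_coverage_command(bam_name, bed12, args=""):
    """Rebuild the geneBody_coverage.py call from the process script."""
    # Same effect as bam.getName().replaceAll(/\.bam$/, '') in the process.
    name = re.sub(r"\.bam$", "", bam_name)
    cmd = ["geneBody_coverage.py"]
    cmd += shlex.split(args)  # stands in for task.ext.args (empty by default)
    cmd += [
        f"--refgene={bed12}",
        f"--input={bam_name}",
        "--minimum_length=100",
        f"--out-prefix={name}",
    ]
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(genebody_coverage_command("WT_R1.sorted.bam", "genes.bed12")))
```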
2 changes: 1 addition & 1 deletion modules/nf-core/fastqc/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 5 additions & 1 deletion modules/nf-core/nanoplot/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
