Nextflow workflows for characterizing bovine immunological nanopore sequencing data

This workflow has been tested on Narval but should work on any of the other Digital Research Alliance of Canada systems. With some editing, these scripts should run on any system with Nextflow, FastQC, fastp, SeqKit, and bioawk installed.

Running the pipeline with default parameters.

sbatch --output pipeline_run-%j.out path/to/run_pipeline.sh path/to/base/called/files/folder path/to/output/folder

This will run fastp with deduplication enabled and the default read length filters on every bc*.fastq.gz file in path/to/base/called/files/folder, followed by IgM filtering of every fastp-filtered file in path/to/output/folder/fastp-filtered.

fastqc and seqkit stats are then run on every *.fastq.gz file under path/to/output/folder, and the results are put in path/to/output/folder/fastqc_data and path/to/output/folder/sequence_stats respectively.
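The sequence described above can be sketched as a dry run that only prints the per-step commands. This is a minimal sketch and not the contents of run_pipeline.sh: the helper names run_step and pipeline_steps and the DRY_RUN variable are hypothetical, and the IgM-filtered output folder name is assumed from the manual steps further down.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the default pipeline order described in this README.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
set -euo pipefail

run_step() {
    # Print the command; execute it only when DRY_RUN is explicitly 0.
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pipeline_steps() {
    local input_dir="$1" output_dir="$2"
    # fastp with deduplication (-D) and the default length filters
    run_step fastp_filtering.sh -D "$input_dir" "$output_dir/fastp-filtered"
    # IgM filtering of the fastp-filtered reads (output folder name is an assumption)
    run_step IgM_filtering.sh "$output_dir/fastp-filtered" "$output_dir/IgM-filtered"
    # Quality reports and sequence statistics over everything under the output folder
    run_step fastqc.sh "$output_dir" "$output_dir/fastqc_data"
    run_step seqkit_stats.sh "$output_dir" "$output_dir/sequence_stats"
}

pipeline_steps "${1:-path/to/base/called/files/folder}" "${2:-path/to/output/folder}"
```

Run with no arguments, it prints the four commands using the placeholder paths from this README, which can help when checking paths before submitting the real job with sbatch.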

Manually running each step

These steps should be run in an interactive job. The nextflow workflows take an input folder and an output folder. All samples are automatically placed in a subfolder under the output folder.

Fastp filtering

Default length filters, deduplication enabled

fastp_filtering.sh -D base_called_files_folder results/fastp-filtered_folder

Default length filters, no deduplication

fastp_filtering.sh base_called_files_folder results/fastp-filtered_folder

Default minimum read length filter, no max length filter, no deduplication

fastp_filtering.sh -M 0 base_called_files_folder results/fastp-filtered_folder

Default maximum read length filter, no min length filter, no deduplication

fastp_filtering.sh -m 0 base_called_files_folder results/fastp-filtered_folder

No min or max read length filter, no deduplication

fastp_filtering.sh -m 0 -M 0 base_called_files_folder fastp-filtered_folder

IgM filtering

The IgM filtering script takes an input folder and an output folder. It runs every bc*-<filter>.fastq.gz file in the input folder through the pipeline and puts the results in output_folder/bc*-<filter>.

Reads that passed fastp filtering step

IgM_filtering.sh fastp-filtered_folder path/to/output/folder/IgM-filtered

Reads that failed fastp filtering step

IgM_filtering.sh -f "fastp_failed" fastp-filtered_folder path/to/output/folder/IgM-filtered
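Before running the IgM filtering it can be useful to preview which files a given filter string would select. The following is a minimal sketch, assuming the filter string is substituted literally into the bc*-<filter>.fastq.gz pattern described above; preview_inputs is a hypothetical helper and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: list the files in a folder that match bc*-<filter>.fastq.gz
# without running any filtering.
set -euo pipefail

preview_inputs() {
    local folder="$1" filter="$2"
    local f
    for f in "$folder"/bc*-"$filter".fastq.gz; do
        # An unmatched glob stays as a literal string, so check that the path exists.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Placeholder folder name from this README; prints nothing if no file matches.
preview_inputs "${1:-fastp-filtered_folder}" "${2:-fastp_failed}"
```

If nothing is printed for a filter string, the folder contains no file whose name matches that pattern.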

To run a single file through the IgM filtering

IgM_filtering.sh fastp-filtered_folder/bc## path/to/output/folder/IgM-filtered

Run fastqc on all fastq.gz files in the output folder

fastqc.sh path/to/output/folder path/to/output/folder/fastqc_data

Run seqkit stats on all fastq.gz files in the output folder

seqkit_stats.sh path/to/output/folder path/to/output/folder/sequence_stats

harohodg/filtering_bovineIgM_rep

Folders and files

Latest commit

History

Repository files navigation

Nextflow workflows for characterizing bovine immunological nanopore sequencing data

Running the pipeline with default parameters.

Manually running each step

Default length filters, deduplication enabled

Default length filters, no deduplication

default minimum read length filter, no max length filter, no deduplication

default maximum read length filter, no min length filter, no deduplication

no min or max read length filter, no deduplication

Reads that passed fastp filtering step

Reads that failed fastp filtering step

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages