q2-pinocchio: PaIrwise alignment of long-read NucleOtide sequence data for Classification and quality Control in HIgh-thrOughput

QIIME 2 Plugin for quality control and taxonomic classification of long sequences

Installation

Step 1: Create q2-pinocchio environment

mamba create -n q2-pinocchio -c conda-forge -c bioconda -c https://packages.qiime2.org/qiime2/2024.10/metagenome/passed/ -c defaults q2cli q2-types q2-feature-classifier minimap2 bs4 samtools gzip chopper nanoplot

Step 2: Activate q2-pinocchio environment

conda activate q2-pinocchio

Step 3: Install the Python package

pip install .
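
After installation, it can help to confirm that the external tools named in Step 1 are visible on the PATH of the activated environment. The following is a minimal sketch. The function name check_tools is hypothetical and not part of the plugin, and the executable names (in particular NanoPlot for the nanoplot package) are assumptions about how the conda packages install their binaries.

```shell
#!/bin/sh
# check_tools: hypothetical helper, not part of q2-pinocchio.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s\n' "$tool"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumed from the packages listed in Step 1.
check_tools qiime minimap2 samtools chopper NanoPlot ||
    echo "Some tools were not found; is the q2-pinocchio environment active?"
```

If every tool is reported as ok, running qiime pinocchio --help should then list the actions described under Provided Actions.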

Provided Actions

build-index

Build a Minimap2 index database from reference sequences.

minimap2-search

Search a reference database for top hits by aligning the query sequences against the reference sequences with Minimap2. Returns a report of the top M hits for each query (where M=maxaccepts).

filter-reads

This method aligns long-read sequencing data (from a FASTQ file) to a set of reference sequences, identifying sequences that match or do not match the reference within a specified identity percentage. The alignment is performed using Minimap2, and the results are processed using Samtools.

extract-reads

This method aligns long-read sequencing data (from a FASTA file) to a set of reference sequences, identifying sequences that match or do not match the reference within a specified identity percentage. The alignment is performed using Minimap2, and the results are processed using Samtools.

classify-consensus-minimap2

Assign taxonomy to query sequences using Minimap2. Performs alignment between query and reference reads, then assigns consensus taxonomy to each query sequence.

trim

Trim long demultiplexed sequences using the Chopper tool.

stats

Generate quality control statistics for long-read sequencing data using NanoPlot.

Examples

Download the input datasets

build-index

Build Minimap2 index database

qiime pinocchio build-index --i-reference reference.qza --o-index index.qza --verbose

minimap2-search

Generate both hits and no hits for each query. Keep a maximum of one hit per query (primary mappings).

qiime pinocchio minimap2-search --i-query fasta_reads.qza --i-index index.qza --o-search-results paf.qza --verbose

Generate only hits for each query. Keep a maximum of one hit per query (primary mappings).

qiime pinocchio minimap2-search --i-query fasta_reads.qza --i-index index.qza --o-search-results paf_only_hits.qza --p-output-no-hits false --verbose

Generate only hits for each query, limiting the number of hits to a maximum of 3 per query. Ensure that each hit has a minimum similarity percentage of 90% to be considered valid.

qiime pinocchio minimap2-search --i-query fasta_reads.qza --i-index index.qza --o-search-results paf_only_hits_ma3.qza --p-maxaccepts 3 --p-output-no-hits false --verbose
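
The output names above (paf.qza) suggest that the search results are stored in PAF, Minimap2's pairwise mapping format. In PAF, column 10 holds the number of matching residues and column 11 the alignment block length, so their ratio gives a simple per-hit identity estimate. The sketch below applies a 90% cutoff to two made-up PAF records with awk. The function name filter_paf_by_identity and the sample records are hypothetical, and whether the plugin computes identity in the same way is an assumption. To try it on real results, the artifact would first need to be exported, for example with qiime tools export.

```shell
#!/bin/sh
# filter_paf_by_identity: hypothetical helper, not part of q2-pinocchio.
# Usage: filter_paf_by_identity MIN_IDENTITY [PAF_FILE...]
# Prints query name, target name, and identity for hits at or above the cutoff.
# Reads standard input when no file is given.
filter_paf_by_identity() {
    min_identity=$1
    shift
    awk -F'\t' -v min="$min_identity" '
        # $10 = matching residues, $11 = alignment block length (PAF columns)
        $11 > 0 && $10 / $11 >= min {
            printf "%s\t%s\t%.3f\n", $1, $6, $10 / $11
        }
    ' "$@"
}

# Made-up records: read1 has 950/1000 matches, read2 has 640/800.
{
    printf 'read1\t1000\t0\t1000\t+\trefA\t5000\t100\t1100\t950\t1000\t60\n'
    printf 'read2\t800\t0\t800\t-\trefB\t4000\t0\t800\t640\t800\t30\n'
} | filter_paf_by_identity 0.9
# Only read1 (identity 0.950) passes the 0.9 cutoff.
```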

filter-reads

Keep mapped (single-end reads)

qiime pinocchio filter-reads --i-query single-end-reads.qza --i-index index.qza --o-filtered-query mapped_se.qza --verbose

Keep unmapped (single-end reads)

qiime pinocchio filter-reads --i-query single-end-reads.qza --i-index index.qza --p-keep unmapped --o-filtered-query unmapped_se.qza --verbose

Keep mapped (paired-end reads)

qiime pinocchio filter-reads --i-query paired-end-reads.qza --i-index index.qza --o-filtered-query mapped_pe.qza --verbose

Keep mapped reads with identity >= 98% (paired-end reads)

qiime pinocchio filter-reads --i-query paired-end-reads.qza --i-index index.qza --p-min-per-identity 0.98 --o-filtered-query mapped_pe_over_98p_id.qza --verbose
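
One way to check what a filter run did is to compare read counts before and after. The following sketch counts records in gzip-compressed FASTQ files. It assumes four lines per record and that the artifacts have already been exported to .fastq.gz files (for example with qiime tools export). The function name count_fastq_reads and the demo file are hypothetical.

```shell
#!/bin/sh
# count_fastq_reads: hypothetical helper, not part of q2-pinocchio.
# Prints "<file><TAB><read count>" for each gzip-compressed FASTQ file given.
# Assumes the standard four-line FASTQ record layout (no wrapped sequences).
count_fastq_reads() {
    for fastq in "$@"; do
        reads=$(gzip -dc "$fastq" | awk 'END { print int(NR / 4) }')
        printf '%s\t%s\n' "$fastq" "$reads"
    done
}

# Self-contained demo on a tiny made-up file containing two reads.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | gzip -c > "$demo_dir/demo.fastq.gz"
count_fastq_reads "$demo_dir/demo.fastq.gz"
rm -r "$demo_dir"
```

Running the function on the exported input and on the exported filtered output gives the number of reads kept and, by subtraction, the number removed.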

extract-reads

Extract mapped

qiime pinocchio extract-reads --i-sequences fasta_reads.qza --i-index index.qza --o-extracted-reads mapped_fasta.qza --verbose

Extract unmapped

qiime pinocchio extract-reads --i-sequences fasta_reads.qza --i-index index.qza --p-extract unmapped --o-extracted-reads unmapped_fasta.qza --verbose

Extract mapped reads with identity >= 87%

qiime pinocchio extract-reads --i-sequences fasta_reads.qza --i-index index.qza --p-min-per-identity 0.87 --o-extracted-reads mapped_fasta_id_over_87.qza --verbose

classify-consensus-minimap2

Assign taxonomy to query sequences using Minimap2

qiime pinocchio classify-consensus-minimap2 --i-query n1K_initial_reads_SILVA132.fna.qza --i-index ccm_index.qza --i-reference-taxonomy raw_taxonomy.qza --p-n-threads 8 --output-dir classification_output --verbose
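
To illustrate the idea of a consensus taxonomy, the sketch below takes a two-column table (query, taxonomy of one hit) and reduces the hits of each query to the ranks on which all of them agree. This is a deliberately simplified stand-in, not the plugin's algorithm: it requires unanimous agreement, whereas the plugin may apply a consensus threshold. The function name consensus_lcp, the "Unassigned" label, and the sample lineages are hypothetical.

```shell
#!/bin/sh
# consensus_lcp: hypothetical helper, not part of q2-pinocchio.
# Input: tab-separated lines "<query><TAB><rank1;rank2;...>", one per hit.
# Output: one line per query with the lineage prefix shared by all its hits.
consensus_lcp() {
    awk -F'\t' '
        {
            n = split($2, ranks, /; */)
            if (!($1 in depth)) {
                # First hit for this query: start from its full lineage.
                order[++count] = $1
                depth[$1] = n
                for (i = 1; i <= n; i++) lineage[$1, i] = ranks[i]
            } else {
                # Later hits: shrink the stored lineage to the shared prefix.
                d = 0
                while (d < depth[$1] && d < n && lineage[$1, d + 1] == ranks[d + 1]) d++
                depth[$1] = d
            }
        }
        END {
            for (q = 1; q <= count; q++) {
                name = order[q]
                out = ""
                for (i = 1; i <= depth[name]; i++)
                    out = out (i > 1 ? ";" : "") lineage[name, i]
                print name "\t" (out == "" ? "Unassigned" : out)
            }
        }
    ' "$@"
}

# Made-up hits: r1 agrees down to phylum, r2 disagrees at the first rank.
{
    printf 'r1\td__Bacteria;p__Firmicutes;c__Bacilli\n'
    printf 'r1\td__Bacteria;p__Firmicutes;c__Clostridia\n'
    printf 'r2\td__Archaea\n'
    printf 'r2\td__Bacteria\n'
} | consensus_lcp
```

In this demo r1 is resolved to d__Bacteria;p__Firmicutes, while r2 has no shared rank and is reported as Unassigned.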

trim

Filter based on the quality (min)

qiime pinocchio trim --i-query single-end-reads.qza --p-min-quality 7 --o-filtered-query filt_qual_min.qza --verbose

Filter based on the quality (max)

qiime pinocchio trim --i-query single-end-reads.qza --p-max-quality 7 --o-filtered-query filt_qual_max.qza --verbose

Headcrop of all sequences (trim the first 10 bases of each read)

qiime pinocchio trim --i-query single-end-reads.qza --p-headcrop 10 --o-filtered-query headcrop.qza --verbose

Filter based on the length of the sequences (min)

qiime pinocchio trim --i-query single-end-reads.qza --p-min-length 3000 --o-filtered-query filt_len_min.qza --verbose
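
Before choosing a length cutoff, it can be useful to preview how many reads would pass it. The sketch below counts the reads in an uncompressed FASTQ stream that are at least a given length. It only illustrates what a minimum-length filter keeps and does not call Chopper. The function name preview_min_length and the demo reads are hypothetical, and the four-line record layout is an assumption.

```shell
#!/bin/sh
# preview_min_length: hypothetical helper, not part of q2-pinocchio or Chopper.
# Usage: preview_min_length MIN_LENGTH < reads.fastq
# Reports how many reads are at least MIN_LENGTH bases long.
# Assumes uncompressed FASTQ on standard input with four lines per record.
preview_min_length() {
    awk -v min="$1" '
        NR % 4 == 2 {              # the sequence line of each record
            total++
            if (length($0) >= min) kept++
        }
        END { printf "%d of %d reads are >= %d bases\n", kept, total, min }
    '
}

# Made-up demo: three reads of length 4, 8, and 12 with a cutoff of 8.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' |
    preview_min_length 8
# prints: 2 of 3 reads are >= 8 bases
```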

stats

Generate a visualization to display statistics about the sequences

qiime pinocchio stats --i-sequences single-end-reads.qza --o-visualization stats.qzv

To open:

qiime tools view stats.qzv
