CRISPRapido

CRISPRapido is a reference-free tool for comprehensive detection of CRISPR off-target sites using complete genome assemblies. Unlike traditional approaches that rely on reference genomes and variant files, CRISPRapido directly analyzes haplotype-resolved assemblies to identify potential off-targets arising from any form of genetic variation. By leveraging the efficient Wavefront Alignment (WFA) algorithm and parallel processing, CRISPRapido enables fast scanning of whole genomes while considering both mismatches and DNA/RNA bulges. The tool is particularly valuable for therapeutic applications, where comprehensive off-target analysis is critical for safety assessment. CRISPRapido can process both complete assemblies and raw sequencing data, providing flexibility for different analysis scenarios while maintaining high computational efficiency through its robust Rust implementation.

Features

  • Fast parallel scanning of genomic sequences
  • Support for both gzipped and plain FASTA files
  • Configurable mismatch and bulge tolerances
  • Automatic reverse complement scanning
  • PAF-format output compatible with downstream analysis tools
  • Multi-threaded processing for improved performance
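
As an illustration of what reverse complement scanning involves, the sketch below reverse-complements a guide so that the same matching logic can be applied to the opposite strand. It is a minimal standalone example, not CRISPRapido's own code; the function name `reverse_complement` and the choice to map unknown symbols to `N` are assumptions made here.

```rust
/// Reverse-complement a DNA sequence (A<->T, C<->G).
/// Assumption for this sketch: any other symbol is mapped to 'N'.
fn reverse_complement(seq: &str) -> String {
    seq.chars()
        .rev()
        .map(|base| match base.to_ascii_uppercase() {
            'A' => 'T',
            'C' => 'G',
            'G' => 'C',
            'T' => 'A',
            _ => 'N',
        })
        .collect()
}

fn main() {
    let guide = "ATCGATCGATCG";
    println!("forward: {}", guide);
    println!("reverse: {}", reverse_complement(guide));
}
```

A hit found with the reverse-complemented guide corresponds to a '-' strand record in the output.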

Installation

CRISPRapido depends on WFA2-lib, which is included as a submodule of this repository and must be built first:

git clone --recursive https://github.com/pinellolab/crisprapido.git
cd crisprapido/WFA2-lib
make clean all
cd ..

Then, you can install CRISPRapido using Cargo:

# Point to your pre-built WFA2-lib directory
export WFA2LIB_PATH="./WFA2-lib"

# Install CRISPRapido
cargo install --git https://github.com/pinellolab/crisprapido.git

For Guix users

git clone --recursive https://github.com/pinellolab/crisprapido.git
cd crisprapido/WFA2-lib
guix shell -C -D -f guix.scm
export CC=gcc; make clean all
exit
cd ..
env -i bash -c 'WFA2LIB_PATH="./WFA2-lib" PATH=/usr/local/bin:/usr/bin:/bin ~/.cargo/bin/cargo install --path .'

Usage

crisprapido -r <reference.fa> -g <guide_sequence> [OPTIONS]

Required Arguments

  • -r, --reference <FILE>: Input reference FASTA file (supports .fa and .fa.gz)
  • -g, --guide <SEQUENCE>: Guide RNA sequence (without PAM)

Optional Arguments

  • -m, --max-mismatches <NUM>: Maximum number of mismatches allowed (default: 4)
  • -b, --max-bulges <NUM>: Maximum number of bulges allowed (default: 1)
  • -z, --max-bulge-size <NUM>: Maximum size of each bulge in bp (default: 2)
  • -w, --window-size <NUM>: Size of sequence window to scan (default: 4x guide length)
  • -t, --threads <NUM>: Number of threads to use (default: number of logical CPUs)
  • --no-filter: Disable all filtering (report every alignment)

Output Format

CRISPRapido outputs results in the Pairwise Alignment Format (PAF), which is widely used for representing genomic alignments. Each line represents a potential off-target site with the following tab-separated fields:

Column  Field            Description
1       Query name       "Guide" (the guide RNA sequence)
2       Query length     Length of the guide RNA
3       Query start      Start position in the guide sequence (0-based)
4       Query end        End position in the guide sequence (0-based, exclusive)
5       Strand           '+' (forward) or '-' (reverse complement)
6       Target name      Reference sequence name (e.g., chromosome)
7       Target length    Length of the target reference sequence
8       Target start     Start position in the reference (0-based)
9       Target end       End position in the reference (0-based, exclusive)
10      Matches          Number of matching bases
11      Block length     Total alignment block length
12      Mapping quality  Always 255 for CRISPRapido
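
For downstream processing, the twelve mandatory columns can be read into a small record type. The following is a minimal sketch using only the Rust standard library; the `PafRecord` struct and `parse_paf_line` function are hypothetical names introduced here, not part of CRISPRapido. It splits on any whitespace so that it also accepts lines whose tabs were converted to spaces.

```rust
/// The twelve mandatory PAF columns (hypothetical helper type for this sketch).
#[derive(Debug, PartialEq)]
struct PafRecord {
    query_name: String,
    query_len: usize,
    query_start: usize,
    query_end: usize,
    strand: char,
    target_name: String,
    target_len: usize,
    target_start: usize,
    target_end: usize,
    matches: usize,
    block_len: usize,
    mapq: u8,
}

/// Parse one PAF line; returns None if a column is missing or not numeric.
fn parse_paf_line(line: &str) -> Option<PafRecord> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() < 12 {
        return None;
    }
    Some(PafRecord {
        query_name: f[0].to_string(),
        query_len: f[1].parse().ok()?,
        query_start: f[2].parse().ok()?,
        query_end: f[3].parse().ok()?,
        strand: f[4].chars().next()?,
        target_name: f[5].to_string(),
        target_len: f[6].parse().ok()?,
        target_start: f[7].parse().ok()?,
        target_end: f[8].parse().ok()?,
        matches: f[9].parse().ok()?,
        block_len: f[10].parse().ok()?,
        mapq: f[11].parse().ok()?,
    })
}

fn main() {
    let line = "Guide\t20\t0\t20\t+\tchr1\t248956422\t10050\t10070\t19\t21\t255";
    match parse_paf_line(line) {
        Some(rec) => println!(
            "{}:{}-{} ({})",
            rec.target_name, rec.target_start, rec.target_end, rec.strand
        ),
        None => eprintln!("malformed PAF line"),
    }
}
```

Because the end coordinates are exclusive, `target_end - target_start` gives the number of reference bases covered by the hit.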

Additionally, CRISPRapido includes these custom tags:

Tag    Description
as:i   Alignment score (lower is better)
nm:i   Number of mismatches
ng:i   Number of gaps (indels)
bs:i   Biggest gap size in bases
cg:Z   CIGAR string representing alignment details
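
The tags follow the usual `key:type:value` layout and appear after the twelve mandatory columns. One possible way to collect them is sketched below; `parse_tags` and `int_tag` are hypothetical helper names introduced for illustration, and the sketch assumes that every field after column 12 is a tag.

```rust
use std::collections::HashMap;

/// Collect the optional key:type:value tags that follow the 12 mandatory columns.
/// Fields that do not have three colon-separated parts are skipped.
fn parse_tags(line: &str) -> HashMap<String, String> {
    line.split_whitespace()
        .skip(12)
        .filter_map(|field| {
            let mut parts = field.splitn(3, ':');
            let key = parts.next()?;
            let _value_type = parts.next()?; // "i" (integer) or "Z" (string)
            let value = parts.next()?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

/// Read an integer tag such as nm or ng; None if it is absent or not a number.
fn int_tag(tags: &HashMap<String, String>, key: &str) -> Option<i64> {
    tags.get(key)?.parse().ok()
}

fn main() {
    let line = "Guide\t20\t0\t20\t+\tchr1\t248956422\t10050\t10070\t19\t21\t255\t\
                as:i:6\tnm:i:1\tng:i:0\tbs:i:0\tcg:Z:19=1X";
    let tags = parse_tags(line);
    println!("mismatches: {:?}", int_tag(&tags, "nm"));
    println!("gaps: {:?}", int_tag(&tags, "ng"));
    println!("cigar: {:?}", tags.get("cg"));
}
```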

Example Output

Guide   20      0       20      +       chr1    248956422       10050   10070   19      21      255     as:i:6  nm:i:1  ng:i:0  bs:i:0  cg:Z:19=1X

This indicates:

  • A 20 bp guide RNA aligned to chromosome 1
  • Target positions 10050 to 10070 (0-based, end exclusive) on the forward strand
  • 19 bases match with 1 mismatch (nm:i:1)
  • No gaps (ng:i:0)
  • Alignment score of 6 (as:i:6)
  • CIGAR string shows 19 matches followed by 1 mismatch
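
The CIGAR reading above can be reproduced programmatically. The sketch below splits a CIGAR string such as `19=1X` into (length, operation) pairs and sums the lengths per operation. The helper names `parse_cigar` and `count_op` are assumptions introduced here, and the code makes no claim about which operation characters CRISPRapido emits beyond the `=` and `X` seen in the example.

```rust
/// Split a CIGAR string into (length, operation) pairs, e.g. "19=1X" -> [(19,'='), (1,'X')].
/// Returns None if an operation has no length or the string ends with dangling digits.
fn parse_cigar(cigar: &str) -> Option<Vec<(usize, char)>> {
    let mut ops = Vec::new();
    let mut count = String::new();
    for c in cigar.chars() {
        if c.is_ascii_digit() {
            count.push(c);
        } else {
            ops.push((count.parse::<usize>().ok()?, c));
            count.clear();
        }
    }
    if count.is_empty() {
        Some(ops)
    } else {
        None
    }
}

/// Total number of bases covered by one operation type.
fn count_op(ops: &[(usize, char)], op: char) -> usize {
    ops.iter().filter(|(_, o)| *o == op).map(|(n, _)| *n).sum()
}

fn main() {
    if let Some(ops) = parse_cigar("19=1X") {
        println!("matches: {}", count_op(&ops, '='));
        println!("mismatches: {}", count_op(&ops, 'X'));
    }
}
```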

PAF Format Specification

For more details on the PAF format, see the official specification from the developers of miniasm.

Example

crisprapido -r genome.fa -g ATCGATCGATCG -m 3 -b 1 -z 2

Testing

Run the test suite:

# Point to your pre-built WFA2-lib directory
export WFA2LIB_PATH="./WFA2-lib"

cargo test

Enable debug output during development:

cargo run --features debug

License

See the LICENSE file.

Citation

Stay tuned!
