-
Notifications
You must be signed in to change notification settings - Fork 2
Feature/sassy integration #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/sassy integration #5
Conversation
- Replace WFA2 with SASSY for better performance on short DNA sequences - Simplify build process (no more external C library dependencies) - Update CIGAR string parsing to handle SASSY format - All tests passing (12/12) - Updated README with simplified installation instructions Breaking changes: - Removed WFA2LIB_PATH requirement - Updated output CIGAR format from WFA2 to standard format
let max_errors = (max_mismatches + max_bulges) as usize; | ||
|
||
// Create SASSY searcher with DNA profile | ||
let mut searcher: Searcher<Dna> = Searcher::new(false, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to support N characters, consider IUPAC profile here.
For reverse-complement searchers, also change false
to true
.
let max_errors = (max_mismatches + max_bulges) as usize; | ||
|
||
// Create SASSY searcher with DNA profile | ||
let mut searcher: Searcher<Dna> = Searcher::new(false, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on how you use this, you may want to reuse the searcher object between consecutive invocations, so allocations are reused.
let mut searcher: Searcher<Dna> = Searcher::new(false, None); | ||
|
||
// Convert window to a Vec so it implements SearchAble | ||
let window_vec = window.to_vec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we should fix this @rickbeeloo
src/main.rs
Outdated
|
||
// Convert SASSY CIGAR to standard format | ||
let cigar_debug = format!("{:?}", best_match.cigar); | ||
let cigar_str = parse_sassy_cigar_debug(&cigar_debug); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use Cigar::to_string()
here, right?
- Fixed incorrect position reporting in scan_window_sassy function - Added manual verification for match positions in windows - Improved SASSY CIGAR parsing with proper fallback handling - Enhanced debug output for troubleshooting alignment issues - Added support for correct query start/end coordinates in PAF output - Fixed CFD score calculation with proper target sequence extraction - Added test files for multi-sequence testing
- Fixed CFD key construction to use correct RNA-DNA pairing format - Corrected position calculation for match reporting - Updated CIGAR parsing to handle count+operation format properly - All tests now passing including CFD score validation - Cleaned up debug output for production
- Basic CFD functionality works correctly for simple cases - Comprehensive test suite has 12/20 failing cases that need investigation - Main tool functionality (position calculation, CIGAR parsing) is working - Will debug CFD matrix lookup issues in separate PR
Major improvements: - Replace alignment engine with SASSY for accurate sequence matching - Implement position-dependent CFD (Cutting Frequency Determination) scoring - Fix target coordinate calculation for proper off-target detection - Add support for mismatches and indels in alignment - Clean up debug output for production-ready tool - Improve PAF output format with CFD scores Technical changes: - Integrate SASSY library for approximate string matching - Add CFD calculation with position-specific mismatch penalties - Fix coordinate mapping from window positions to absolute positions - Implement proper target sequence extraction for CFD scoring - Add comprehensive test cases for validation Breaking changes: - Output format now includes CFD scores (cf:f tag) - Improved coordinate accuracy may change previous results
🚀 SASSY Integration for CRISPRapido
Summary
Replace WFA2 with SASSY for approximate string matching, significantly simplifying the build process and improving performance for short DNA sequences.
Changes
cargo build
- no external deps!)Performance
Testing
Breaking Changes
WFA2LIB_PATH
environment variable requirement