FarnazSalehi94

No description provided.

FarnazSalehi94 and others added 21 commits April 16, 2025 11:21
- Replace WFA2 with SASSY for better performance on short DNA sequences
- Simplify build process (no more external C library dependencies)
- Update CIGAR string parsing to handle SASSY format
- All tests passing (12/12)
- Updated README with simplified installation instructions

Breaking changes:
- Removed WFA2LIB_PATH requirement
- Updated output CIGAR format from WFA2 to standard format
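For reviewers unfamiliar with the CIGAR change, here is a minimal Python sketch of parsing a count+operation CIGAR string. It assumes the "standard format" means SAM-style operations (`M`, `I`, `D`, `N`, `S`, `H`, `P`, `=`, `X`). The names `parse_cigar` and `aligned_lengths` are hypothetical and are not taken from this PR or from the SASSY API.

```python
import re

# Assumption: operations use the SAM-style alphabet; the PR may accept a subset.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a count+operation CIGAR string into (count, op) pairs.

    Raises ValueError if any part of the string is not a valid pair.
    """
    ops = []
    pos = 0
    for m in _CIGAR_RE.finditer(cigar):
        if m.start() != pos:  # unparsed characters between two pairs
            raise ValueError(f"invalid CIGAR near offset {pos}: {cigar!r}")
        ops.append((int(m.group(1)), m.group(2)))
        pos = m.end()
    if pos != len(cigar):  # trailing characters that did not match
        raise ValueError(f"invalid CIGAR near offset {pos}: {cigar!r}")
    return ops


def aligned_lengths(ops: list[tuple[int, str]]) -> tuple[int, int]:
    """Return (query_len, target_len) consumed by the alignment."""
    query = sum(n for n, op in ops if op in "MI=XS")   # ops that consume query
    target = sum(n for n, op in ops if op in "MD=XN")  # ops that consume target
    return query, target
```

For example, `parse_cigar("18=1X1=")` yields three pairs (18 matches, 1 mismatch, 1 match), and `aligned_lengths` on that result reports 20 query and 20 target bases.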
- Fixed incorrect position reporting in scan_window_sassy function
- Added manual verification for match positions in windows
- Improved SASSY CIGAR parsing with proper fallback handling
- Enhanced debug output for troubleshooting alignment issues
- Added support for correct query start/end coordinates in PAF output
- Fixed CFD score calculation with proper target sequence extraction
- Added test files for multi-sequence testing
- Fixed CFD key construction to use correct RNA-DNA pairing format
- Corrected position calculation for match reporting
- Updated CIGAR parsing to handle count+operation format properly
- All tests now passing including CFD score validation
- Cleaned up debug output for production
- Basic CFD functionality works correctly for simple cases
- 12 of 20 cases in the comprehensive test suite still fail and need investigation
- Main tool functionality (position calculation, CIGAR parsing) is working
- Will debug CFD matrix lookup issues in separate PR
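The position fixes above amount to translating a match found inside a scan window back to coordinates on the full target sequence. A minimal Python sketch follows, assuming 0-based, half-open coordinates on the forward strand. `window_to_absolute` is a hypothetical helper, not the PR's `scan_window_sassy`.

```python
def window_to_absolute(
    window_start: int, match_start: int, match_end: int
) -> tuple[int, int]:
    """Map a match inside a window to absolute target coordinates.

    Assumptions (not confirmed by the PR):
      * window_start is the 0-based offset of the window in the target;
      * match_start/match_end are 0-based, half-open, relative to the window.
    """
    if window_start < 0:
        raise ValueError("window_start must be non-negative")
    if not 0 <= match_start <= match_end:
        raise ValueError("match coordinates must satisfy 0 <= start <= end")
    # Adding the window offset to both ends preserves the match length.
    return window_start + match_start, window_start + match_end
```

Under these assumptions, a match at `[5, 25)` in a window that starts at target offset 1000 maps to `[1005, 1025)`. Reverse-strand matches would need an extra flip, which this sketch does not cover.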

Major improvements:
- Replace alignment engine with SASSY for accurate sequence matching
- Implement position-dependent CFD (Cutting Frequency Determination) scoring
- Fix target coordinate calculation for proper off-target detection
- Add support for mismatches and indels in alignment
- Clean up debug output for production-ready tool
- Improve PAF output format with CFD scores

Technical changes:
- Integrate SASSY library for approximate string matching
- Add CFD calculation with position-specific mismatch penalties
- Fix coordinate mapping from window positions to absolute positions
- Implement proper target sequence extraction for CFD scoring
- Add comprehensive test cases for validation
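As a rough illustration of position-specific CFD scoring, the Python sketch below multiplies one penalty per mismatched position. It makes several assumptions the PR does not confirm: lookup keys take the form `r<RNA base>:d<DNA base>,<1-based position>`, the RNA base is the guide base with T written as U, the DNA base is the complement of the target base, and the alignment has no gaps. The penalty table in the usage note is a made-up placeholder, not real CFD data, and `cfd_key` and `cfd_score` are hypothetical names.

```python
from typing import Mapping

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cfd_key(guide_base: str, target_base: str, position: int) -> str:
    """Build an RNA-DNA pairing key (assumed format: 'rG:dA,3')."""
    rna = "U" if guide_base == "T" else guide_base  # guide is RNA: T -> U
    dna = _COMPLEMENT[target_base]                  # base the RNA pairs with
    return f"r{rna}:d{dna},{position}"


def cfd_score(
    guide: str,
    target: str,
    mismatch_scores: Mapping[str, float],
    pam_score: float = 1.0,
) -> float:
    """Multiply the per-position penalties of a gap-free guide/target pair."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length (no indels)")
    score = pam_score
    for pos, (g, t) in enumerate(zip(guide.upper(), target.upper()), start=1):
        if g != t:
            # A missing key raises KeyError, which surfaces key-format bugs early.
            score *= mismatch_scores[cfd_key(g, t, pos)]
    return score
```

With a placeholder table such as `{"rG:dA,3": 0.5}`, scoring guide `ACGT` against target `ACTT` looks up the single mismatch at position 3 and returns 0.5. An identical pair returns `pam_score` unchanged.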

Breaking changes:
- Output format now includes CFD scores (cf:f tag)
- Improved coordinate accuracy may change previous results
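To show what the new output might look like to downstream parsers, here is a minimal Python sketch that formats one PAF record with a `cf:f` tag. The 12 mandatory columns follow the PAF convention. The four-decimal precision, the optional `cg:Z` CIGAR tag, the default mapping quality of 255, and the function name `format_paf` are assumptions made for illustration, not details confirmed by this PR.

```python
from typing import Optional


def format_paf(
    qname: str, qlen: int, qstart: int, qend: int, strand: str,
    tname: str, tlen: int, tstart: int, tend: int,
    matches: int, block_len: int, cfd: float,
    cigar: Optional[str] = None, mapq: int = 255,
) -> str:
    """Return one tab-separated PAF line carrying a CFD score tag."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The 12 mandatory PAF columns, in order.
    fields = [qname, qlen, qstart, qend, strand,
              tname, tlen, tstart, tend, matches, block_len, mapq]
    # Optional SAM-style typed tags: cf:f is a float, cg:Z is a string.
    tags = [f"cf:f:{cfd:.4f}"]
    if cigar:
        tags.append(f"cg:Z:{cigar}")
    return "\t".join(str(x) for x in fields + tags)
```

Parsers that read only the first 12 columns should be unaffected by the extra tag. Parsers that assume a fixed set of optional tags may need updating.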
feat: integrate SASSY aligner and implement CFD scoring
1 participant