CRAM writer option for read name omission #328

Open
wants to merge 1 commit into
base: master

Conversation

athos
Member

@athos athos commented Nov 19, 2024

This PR adds a new option to enable read name omission for the CRAM writer.
By default, read name omission is disabled. To enable it, specify the option :omit-read-names? true:

(require '[cljam.io.cram :as cram])

(with-open [w (cram/writer "path/to/cram/file" {..., :omit-read-names? true, ...})]
  ...)

Specification

According to the CRAM specification, a CRAM encoder may omit read names (QNAME) from the encoded data, in which case the decoder regenerates them automatically. Details about the CRAM specification have been summarized in a previous PR.

Note that read name omission relies on non-detached mate record encoding. This means that read names can only be omitted for records encoded as non-detached mates. Records encoded as detached, including those for single-end reads, cannot have their read names omitted.
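The rule above can be sketched from the decoder's point of view. This is a minimal illustration rather than cljam's actual code: the function name `read-name-stored?` is hypothetical, and it assumes the CRAM flag (CF) bit `0x2` marks a detached record, as described in the CRAM specification.

```clojure
;; Hypothetical helper, not part of cljam's API.
;; A read name is physically present in the stream when either
;;  - the compression header says read names are included (RN = true), or
;;  - the record is detached (CF bit 0x2 set), regardless of RN.
(defn read-name-stored?
  [read-names-included? cram-flags]
  (or read-names-included?
      (pos? (bit-and cram-flags 0x2))))

(read-name-stored? true  0x0) ;; => true  (names kept for all records)
(read-name-stored? false 0x2) ;; => true  (detached record keeps its name)
(read-name-stored? false 0x4) ;; => false (non-detached mate: name omitted)
```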

Implementation

When the option :omit-read-names? true is specified, the CRAM writer sets the RN (read names included) entry in the compression header's preservation map to false and attempts to omit read names for each record.

Read name omission applies only to non-detached mate records consisting of primary and representative alignments. For other mate records, such as those involving secondary or supplementary alignments, read names will still be written to the RN data series, even if :omit-read-names? true is specified.

There is no surefire way to determine whether a set of mate records involves secondary or supplementary alignments. cljam's CRAM writer identifies the presence of secondary alignments using the TC tag and supplementary alignments using the SA tag (which is the same approach taken by htslib).

However, it's still the user's responsibility to ensure that these tags are appropriately attached to records so that the writer can correctly identify secondary and supplementary alignments.
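The eligibility check described in this section might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the function name `may-omit-read-name?` and the record shape (`:flag`, `:tags` as a set of tag keywords, `:detached?`) are hypothetical, and it assumes that the mere presence of a TC or SA tag is treated as the signal, whereas the real writer may inspect the tag values. The SAM flag bits `0x100` (secondary) and `0x800` (supplementary) come from the SAM specification.

```clojure
;; Hypothetical sketch; the record shape here is NOT cljam's alignment type.
(defn may-omit-read-name?
  [{:keys [flag tags detached?]}]
  (and (not detached?)                  ; only non-detached mate records qualify
       (zero? (bit-and flag 0x900))     ; neither secondary (0x100) nor supplementary (0x800)
       (not (contains? tags :TC))       ; assumed hint that secondary alignments exist
       (not (contains? tags :SA))))     ; assumed hint that supplementary alignments exist

(may-omit-read-name? {:flag 99  :tags #{}    :detached? false}) ;; => true
(may-omit-read-name? {:flag 99  :tags #{:SA} :detached? false}) ;; => false
(may-omit-read-name? {:flag 355 :tags #{}    :detached? false}) ;; => false (0x100 set)
(may-omit-read-name? {:flag 99  :tags #{}    :detached? true})  ;; => false
```

A record that fails this check would still have its name written to the RN data series, so a missing tag can only cause a name to be omitted, never corrupted in place; that is why the tags need to be attached consistently upstream.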

@athos athos self-assigned this Nov 19, 2024

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.97%. Comparing base (a8966d9) to head (bc9566c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cljam/io/cram/encode/record.clj | 92.50% | 1 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #328      +/-   ##
==========================================
- Coverage   89.99%   89.97%   -0.02%     
==========================================
  Files         104      104              
  Lines        9323     9341      +18     
  Branches      488      490       +2     
==========================================
+ Hits         8390     8405      +15     
- Misses        445      446       +1     
- Partials      488      490       +2     


Base automatically changed from feature/non-detached-mate-records to master November 20, 2024 12:03
@athos athos marked this pull request as ready for review November 21, 2024 09:21
@athos athos requested review from alumi and a team as code owners November 21, 2024 09:21
@athos athos requested review from niyarin and removed request for a team November 21, 2024 09:21
@athos
Member Author

athos commented Nov 21, 2024

Rebased this onto the latest master and marked it as ready for review.
