CRAM writer option for read name omission #328

athos · 2024-11-19T11:11:40Z

This PR adds a new option to enable read name omission for the CRAM writer.
By default, read name omission is disabled. To enable it, specify the option :omit-read-names? true:

(require '[cljam.io.cram :as cram])

(with-open [w (cram/writer "path/to/cram/file" {..., :omit-read-names? true, ...})]
  ...)

Specification

According to the CRAM specification, a CRAM encoder may omit read names (QNAME), in which case the decoder regenerates them automatically. The relevant parts of the CRAM specification have been summarized in a previous PR.

Note that read name omission relies on non-detached mate record encoding: read names can be omitted only for records encoded as non-detached mates. Records encoded as detached, including those for single-end reads, always keep their read names.
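The constraint above can be modeled as a tiny predicate. This is a simplified illustration, not cljam's encoder. The keys `:omit-read-names?` (as passed to the writer) and `:detached?` (whether the record ends up encoded as a detached mate) are hypothetical inputs chosen for this sketch.

```clojure
;; Illustration only: models the spec-level constraint, not cljam's encoder.
;; The input keys are hypothetical and not part of cljam's API.
(defn read-name-omittable?
  "True when a record's QNAME may be left out of the CRAM stream:
  omission must be requested and the record must be encoded as a
  non-detached mate."
  [{:keys [omit-read-names? detached?]}]
  (boolean (and omit-read-names? (not detached?))))

;; Single-end reads are always encoded as detached, so their names are kept.
(read-name-omittable? {:omit-read-names? true :detached? true})  ;=> false
(read-name-omittable? {:omit-read-names? true :detached? false}) ;=> true
```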

Implementation

When the option :omit-read-names? true is specified, the CRAM writer sets the RN field in the compression header to true and attempts to omit encoding read names for each record.

Read name omission applies only to non-detached mate records consisting of primary and representative alignments. For other mate records, such as those involving secondary or supplementary alignments, read names will still be written to the RN data series, even if :omit-read-names? true is specified.

There is no surefire way to determine whether a set of mate records involves secondary or supplementary alignments. cljam's CRAM writer identifies the presence of secondary alignments using the TC tag and supplementary alignments using the SA tag (which is the same approach taken by htslib).

However, it's still the user's responsibility to ensure that these tags are appropriately attached to records so that the writer can correctly identify secondary and supplementary alignments.
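The following is a rough sketch of the per-record heuristic described above. It is a hypothetical helper, not part of cljam's API. It assumes cljam-style alignment maps whose `:options` is a sequence of single-entry maps such as `{:SA {:type "Z" :value "..."}}`. For simplicity it only checks whether a TC or SA tag is present, so the writer's actual criteria (for example, how it interprets the TC value) may differ. It also does not model whether the mates actually end up encoded as non-detached, which the writer decides.

```clojure
;; Hypothetical helpers (NOT part of cljam's API).
;; Assumes :options is a seq of single-entry maps, e.g. {:SA {:type "Z" :value "..."}}.
(defn- has-tag? [aln tag]
  (boolean (some #(contains? % tag) (:options aln))))

(defn name-omission-candidate?
  "Approximates the rule described above: a paired, primary,
  non-supplementary alignment with no TC/SA tag hinting at secondary
  or supplementary alignments elsewhere in the template."
  [aln]
  (let [flag (long (:flag aln 0))]
    (and (bit-test flag 0)                ; 0x1: read is paired
         (zero? (bit-and flag 0x900))     ; neither 0x100 (secondary) nor 0x800 (supplementary)
         (not (has-tag? aln :TC))         ; simplification: any TC tag disqualifies
         (not (has-tag? aln :SA)))))      ; SA tag => supplementary alignments exist

(name-omission-candidate? {:flag 99 :options []})  ;=> true
(name-omission-candidate?
 {:flag 99 :options [{:SA {:type "Z" :value "chr1,100,+,50M,60,0;"}}]}) ;=> false
```

If the TC or SA tags are missing from records that do have secondary or supplementary counterparts, a check like this cannot detect them. This is why the tags need to be attached correctly.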

codecov · 2024-11-19T11:15:27Z

Codecov Report

Attention: Patch coverage is 92.85714% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.97%. Comparing base (a8966d9) to head (bc9566c).

Files with missing lines	Patch %	Lines
src/cljam/io/cram/encode/record.clj	92.50%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #328      +/-   ##
==========================================
- Coverage   89.99%   89.97%   -0.02%     
==========================================
  Files         104      104              
  Lines        9323     9341      +18     
  Branches      488      490       +2     
==========================================
+ Hits         8390     8405      +15     
- Misses        445      446       +1     
- Partials      488      490       +2


athos · 2024-11-21T09:31:04Z

Rebased this onto the latest master and marked it as ready for review.

athos self-assigned this Nov 19, 2024

Base automatically changed from feature/non-detached-mate-records to master November 20, 2024 12:03

Omit read names if :omit-read-names is specified

bc9566c

athos force-pushed the feature/read-name-removal branch from fd894db to bc9566c November 21, 2024 08:53

athos marked this pull request as ready for review November 21, 2024 09:21

athos requested review from alumi and a team as code owners November 21, 2024 09:21

athos requested review from niyarin and removed request for a team November 21, 2024 09:21

alumi approved these changes Nov 22, 2024
