Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new BAM example for full PG entry #245

Closed
wants to merge 4 commits into from

Conversation

jamestwebber
Copy link
Contributor

This is an example script that adds a full PG tag to the header of a BAM file (sample.bam is the output of bam_write):

@HD	VN:1.6
@PG	ID:noodles-bam
@PG	ID:bam_add_tag	PN:bam_add_tag	PP:noodles-bam	VN:0.57.0	CL:target/release/examples/bam_add_tag_to_header sample.bam
@PG	ID:samtools	PN:samtools	PP:bam_add_tag	VN:1.19	CL:samtools view --with-header
@CO	an example BAM written by noodles-bam
*	4	*	0	255	*	*	0	0	*	*

I spent some time yesterday figuring out how to do this, and I figured the example might be useful. I'm also opening the PR to see if there are any improvements you can see.

The name could be more clear, perhaps bam_add_pg_entry would be better. People might expect "tag" to refer to the reads.

Constructing the PN, PP, VN, and CL tags seems cumbersome. Maybe these should be built-in tags (I believe they're part of the spec). Or if there's a way to construct them more cleanly I'd be happy to use that.

Copy link
Owner

@zaeleus zaeleus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the intentions of this, but handling this properly is more complex than what should be shown in an example. I'd like to first work on program record chaining before adding something like this. What you have now can certainly be used as a simple workaround.

_ => unreachable!(),
};
// the command-line to insert into the CL tag
let cmd_str = env::args().collect::<Vec<_>>().join(" ");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra care needs to be taken for arguments containing tab characters.

noodles-bam/examples/bam_add_tag_to_header.rs Outdated Show resolved Hide resolved
Comment on lines 42 to 46
let program = if let Some(last_pg) = header.programs().keys().last() {
program.insert(pp, last_pg.clone())
} else {
program
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header records are not guaranteed to be ordered by recency. This would be incorrect for, e.g.,

@PG	ID:pg1	PP:pg0
@PG	ID:pg0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so this is what you mean by program-chaining--you want to parse the order during Program construction?

@jamestwebber
Copy link
Contributor Author

I modified this to use the defined tags in map::program::tag, but it's okay to close this for now until you feel the API is ready. This PR was just a sneaky way for me to ask about those tags. 😂

@zaeleus
Copy link
Owner

zaeleus commented Apr 12, 2024

The SAM header programs are now wrapped by sam::header::Programs, which includes Programs::add. This handles either creating a new program chain or adding to existing ones, e.g.,

/// use noodles_sam::{
/// self as sam,
/// header::record::value::{map::{program::tag, Program}, Map},
/// };
///
/// let mut header = sam::Header::default();
/// let programs = header.programs_mut();
///
/// programs.add("pg0", Map::default())?;
/// programs.add("pg1", Map::default())?;
///
/// let expected = sam::Header::builder()
/// .add_program("pg0", Map::default())
/// .add_program("pg1", Map::builder().insert(tag::PREVIOUS_PROGRAM_ID, "pg0").build()?)
/// .build();
///
/// assert_eq!(programs, expected.programs());
/// # Ok::<_, Box<dyn std::error::Error>>(())

Note how pg0 is the root of the chain and pg1 links to pg0 via the previous program ID (PP) tag. It also covers more convoluted cases, e.g.,

#[test]
fn test_add() -> Result<(), Box<dyn std::error::Error>> {
let mut programs = Programs::default();
assert!(programs.as_ref().is_empty());
programs.add("pg0", Map::default())?;
let expected = Programs(
[(BString::from("pg0"), Map::default())]
.into_iter()
.collect(),
);
assert_eq!(programs, expected);
programs.add("pg1", Map::default())?;
let expected = Programs(
[
(BString::from("pg0"), Map::default()),
(
BString::from("pg1"),
Map::builder()
.insert(tag::PREVIOUS_PROGRAM_ID, "pg0")
.build()?,
),
]
.into_iter()
.collect(),
);
assert_eq!(programs, expected);
programs
.as_mut()
.insert(BString::from("pg2"), Map::default());
programs.add("pg3", Map::default())?;
let expected = Programs(
[
(BString::from("pg0"), Map::default()),
(
BString::from("pg1"),
Map::builder()
.insert(tag::PREVIOUS_PROGRAM_ID, "pg0")
.build()?,
),
(BString::from("pg2"), Map::default()),
(
BString::from("pg3"),
Map::builder()
.insert(tag::PREVIOUS_PROGRAM_ID, "pg2")
.build()?,
),
(
BString::from("pg3-pg1"),
Map::builder()
.insert(tag::PREVIOUS_PROGRAM_ID, "pg1")
.build()?,
),
]
.into_iter()
.collect(),
);
assert_eq!(programs, expected);
Ok(())
}

I added creating the self program record in the reheader examples, e.g.,

fn build_self_program() -> Result<Map<Program>, BuildError> {
let args = env::args().collect::<Vec<_>>().join(" ");
Map::builder()
.insert(tag::NAME, env!("CARGO_BIN_NAME"))
.insert(tag::VERSION, env!("CARGO_PKG_VERSION"))
.insert(tag::COMMAND_LINE, args)
.build()
}

$ cargo --quiet run --example bam_write | cargo --quiet run --example bam_reheader /dev/stdin | samtools view --no-PG --with-header
@HD	VN:1.6
@PG	ID:noodles-bam
@PG	ID:noodles	PN:bam_reheader	VN:0.59.0	CL:target/debug/examples/bam_reheader /dev/stdin	PP:noodles-bam
@CO	an example BAM written by noodles-bam
@CO	a comment added by noodles-bam
*	4	*	0	255	*	*	0	0	*	*

Thanks for initial exploration of this issue!

@jamestwebber jamestwebber deleted the add-pg-example branch April 13, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants