Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: Channel and Schema structs have IDs #1297

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

james-rms
Copy link
Collaborator

@james-rms james-rms commented Dec 19, 2024

Changelog

Breaking changes

  • adds id members to the crate::Channel and crate::Schema structs.
  • refactored and separated Writer::add_channel into two methods, Writer::add_schema and Writer::add_channel.

Docs

In docstrings, please review if not clear.

Description

When a message reader recieves schema and channel information, the schema and channel IDs in the MCAP are hidden from them. This PR adds those so that readers can observe ID information.

Additionally, this PR updates the Writer interface in these ways:

  1. The add_channel(Channel) method is replaced with add_channel(schema_id, topic, message_encoding, metadata). This was done because the Channel struct now contains a channel ID, and to continue to use it would force the user to come up with their own channel ID before calling this function.
  2. An add_schema(...) method is added to register schemas ahead of channels. This is needed to allow the caller to register a new schema without coming up with an ID in advance.
  3. The existing write(Message) method is kept, but instead of automatically assigning new IDs for the message channel and schema, it uses the IDs passed in. This means that when writing functions that transform MCAP data, the IDs from the input can be kept in the output.
BeforeAfter

@@ -20,28 +20,26 @@ pub struct Record {
}

impl Record {
pub fn get_field(self: &Self, name: &str) -> &Value {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changes in these files are due to the clippy --all-targets change.

rust/src/write.rs Outdated Show resolved Hide resolved
rust/src/write.rs Outdated Show resolved Hide resolved
@james-rms
Copy link
Collaborator Author

cc. @mrkline

return Ok(*id);
// Adds a schema, returning its ID. If a schema with the same content has been added already,
// its ID is returned.
pub fn add_schema(&mut self, name: &str, encoding: &str, data: &[u8]) -> McapResult<u16> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on having a separate struct that's like "ChannelWithoutId"? Pros: avoid having to thread multiple params through different layers, avoid possible confusion between params of the same type.


#[derive(Hash, PartialEq, Eq)]
struct SchemaContent<'a> {
name: Cow<'a, str>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you ELI5 why we need Cow here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll add a comment. We use CoW because we want to compare equality between all of these values without neccessarily cloning or moving all of them into a struct to do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some overhead inherent in the Cow struct itself? Should I think of as being basically a union between a couple different storage options/types? or a pointer to an allocation elsewhere?

Copy link
Collaborator Author

@james-rms james-rms Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a Cow is a union between a borrowed and owned version of a given type - so for str, it's (String | &str), for bytes it's (&[u8], Vec). It implements Deref for those types, allowing you to use their comparison, hash implementations against eachother.

In context, this struct lets me solves this problem:

struct left = {
  a: String::new("abc"),
  b: String::new("def")
}

struct right = {
  a: "abc",
  b: "def",
}
if a == b {
   println!("they are the same!")
}

By defining the struct as having a, b be Cow<'a, str>`, I can store both owned and borrowed versions of the same type in it, and compare them against eachother.

}
let next_schema_id = self.schemas.len() as u16 + 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantic: We should probably return an error if there are more than 2^16-1 schemas.

}) {
return Ok(id);
}
let next_channel_id = self.channels.len() as u16;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantic: We should probably return an error if there are more than 2^16 channels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants