Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: add sans-io indexed reader #1338

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Conversation

james-rms
Copy link
Collaborator

@james-rms james-rms commented Feb 25, 2025

Changelog

  • Added: mcap::sans_io::SummaryReader and mcap::sans_io::IndexedReader structs, which can be used to a) parse out summary information and b) read messages efficiently using that summary information.

Docs

See cargo doc for usage info.

Description

Adds a sans-io indexed reading implementation, including conformance tests and benchmarks.

Benchmarks

The indexed reader is suspiciously faster than MessageStream by quite a bit. I suspect the major difference is that the indexed reader allows the caller to re-use the same buffer for every message yielded, rather than allocating a new one.

Raw cargo bench outputs (with changes removed because they compare to the previous run, which is confusing)

mcap_read_linear/MessageStream_1M_uncompressed
                        time:   [165.82 ms 167.64 ms 169.53 ms]
                        thrpt:  [5.8986 Melem/s 5.9653 Melem/s 6.0306 Melem/s]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
mcap_read_linear/MessageStream_1M_lz4
                        time:   [203.02 ms 204.87 ms 207.63 ms]
                        thrpt:  [4.8163 Melem/s 4.8811 Melem/s 4.9255 Melem/s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
mcap_read_linear/MessageStream_1M_zstd
                        time:   [213.31 ms 213.74 ms 214.26 ms]
                        thrpt:  [4.6672 Melem/s 4.6787 Melem/s 4.6881 Melem/s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

mcap_read_indexed/IndexedReader_1M_uncompressed
                        time:   [42.935 ms 43.215 ms 43.655 ms]
                        thrpt:  [22.907 Melem/s 23.140 Melem/s 23.291 Melem/s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
mcap_read_indexed/IndexedReader_1M_zstd
                        time:   [65.055 ms 65.616 ms 66.254 ms]
                        thrpt:  [15.093 Melem/s 15.240 Melem/s 15.372 Melem/s]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high severe
mcap_read_indexed/IndexedReader_1M_lz4
                        time:   [60.758 ms 60.977 ms 61.139 ms]
                        thrpt:  [16.356 Melem/s 16.400 Melem/s 16.459 Melem/s]

}
// index the current chunk slot
// before starting, check if all existing message indexes have been exhausted -
// if so, clear them out now.
Copy link
Contributor

@Muon Muon Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this? Some extra explanation would be good.

Comment on lines +731 to +732
expected.sort_by(|a, b| a.1.cmp(&b.1));
expected.reverse();
Copy link
Contributor

@Muon Muon Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected.sort_by(|a, b| a.1.cmp(&b.1));
expected.reverse();
expected.sort_by_key(|(_, log_time)| std::cmp::Reverse(log_time));

ReadOrder::ReverseLogTime,
ReadOrder::File,
] {
let mut expected: Vec<(u16, u64)> = chunks.iter().cloned().flatten().cloned().collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloned() twice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants