on consideration, it might be even better if we
The message index record stores the timestamp and offset of each message in a chunk. However, to use the index you must decompress the chunk, at least up to the offset you need. In practice, full chunks are generally decompressed.
If chunk data can be filtered faster than it can be decompressed (which it probably can), then there is no benefit to topic-selection from the message index. It seems to me like the main thing we get from it is awareness of disorderings, so we can correct the disorderings during playback.
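To make "filtering chunk data" concrete, here is a minimal sketch of topic selection by a linear scan over a decompressed chunk, with no message index involved. The record layout (`HEADER`), the helper `pack_record`, and the use of zlib are all assumptions made for illustration; they are not the actual on-disk format or any library's API.

```python
import struct
import zlib
from typing import Iterator, Set, Tuple

# Hypothetical record header: channel_id (u16), log_time (u64), payload length (u32).
HEADER = struct.Struct("<HQI")


def pack_record(channel_id: int, log_time: int, payload: bytes) -> bytes:
    """Serialize one message in the hypothetical layout above."""
    return HEADER.pack(channel_id, log_time, len(payload)) + payload


def filter_chunk(compressed: bytes, wanted: Set[int]) -> Iterator[Tuple[int, int, bytes]]:
    """Decompress a chunk and yield only messages on the wanted channels."""
    data = zlib.decompress(compressed)  # zlib stands in for the real codec
    pos = 0
    while pos < len(data):
        channel_id, log_time, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        if channel_id in wanted:
            yield channel_id, log_time, data[pos:pos + length]
        pos += length  # skip the payload without copying it if unwanted


chunk = zlib.compress(
    pack_record(1, 10, b"a") + pack_record(2, 20, b"bb") + pack_record(1, 30, b"ccc")
)
selected = list(filter_chunk(chunk, {1}))
```

The scan touches only a fixed-size header per message and skips unwanted payloads, which is why I would expect it to keep up with decompression. That expectation would still need measuring.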
In practice disorderings in recorded data are rare. Most files don't have them. If your file has no disorderings and you can filter chunk data faster than you can decompress it, then the message indexes are wasted space. Beyond the wasted space, reader implementations waste time consulting them. For remote readers, this requires an additional range request, which could add a few tens of milliseconds.
It seems like a better indexing approach would just index the disorderings. For most files that would eliminate the need for message indexes. Among other things to handle, I think we would need to make sure the writer records some indication in the summary section that they were using this strategy.
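As a rough sketch of what "index the disorderings" could look like on the writer side: record an entry only for messages whose timestamp is lower than the highest timestamp already written in the chunk. The names `Disordering` and `index_disorderings` are hypothetical, and treating equal timestamps as in-order is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Disordering:
    offset: int    # byte offset of the out-of-order message in the uncompressed chunk
    log_time: int  # timestamp of that message


def index_disorderings(messages: Iterable[Tuple[int, int]]) -> List[Disordering]:
    """Given (offset, log_time) pairs in write order, return only the
    messages that arrive with a timestamp below the running maximum."""
    entries: List[Disordering] = []
    max_seen: Optional[int] = None
    for offset, log_time in messages:
        if max_seen is not None and log_time < max_seen:
            entries.append(Disordering(offset, log_time))
        else:
            max_seen = log_time  # equal timestamps are treated as in-order
    return entries


ordered = index_disorderings([(0, 10), (20, 20), (40, 20), (60, 30)])
disordered = index_disorderings([(0, 10), (20, 30), (40, 20), (60, 40)])
```

For a fully ordered chunk the result is empty, so nothing is written. In the second case only the message at offset 40 is indexed, and a reader would know to reorder around it during playback.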