v2(?) feature: smarter message indexes #1127

wkalt · 2024-01-08T17:06:51Z

The message index record stores the timestamp and offset of each message in a chunk. However, to use the index you must decompress the chunk, at least up to the offset you need. In practice, full chunks are generally decompressed.
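A minimal sketch of that constraint, not the real wire format: it assumes a toy framing of `<u64 log_time><u32 payload_len><payload>` and uses stdlib `zlib` as a stand-in for whatever codec the chunk actually uses. The names `build_chunk` and `read_at` are hypothetical. The point is only that index offsets refer to uncompressed bytes, so they cannot be followed until the chunk has been decompressed.

```python
import struct
import zlib

# Toy framing (an assumption for illustration, not the actual record layout):
# each message is <u64 log_time><u32 payload_len><payload>, little-endian.
HEADER = struct.Struct("<QI")


def build_chunk(messages):
    """Return (compressed_chunk, index); index is a list of (log_time, offset)."""
    raw = bytearray()
    index = []
    for log_time, payload in messages:
        # The offset points into the *uncompressed* chunk data.
        index.append((log_time, len(raw)))
        raw += HEADER.pack(log_time, len(payload)) + payload
    return zlib.compress(bytes(raw)), index


def read_at(compressed_chunk, offset):
    """Fetch one message by index offset; decompression has to happen first."""
    raw = zlib.decompress(compressed_chunk)
    log_time, length = HEADER.unpack_from(raw, offset)
    start = offset + HEADER.size
    return log_time, raw[start:start + length]


chunk, index = build_chunk([(10, b"a"), (20, b"bb"), (30, b"ccc")])
second = read_at(chunk, index[1][1])
```

Here `read_at` decompresses the whole chunk even though it wants a single message, which mirrors what readers tend to do in practice.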

If chunk data can be filtered faster than it can be decompressed (which it probably can), then there is no benefit to topic-selection from the message index. It seems to me like the main thing we get from it is awareness of disorderings, so we can correct the disorderings during playback.

In practice, disorderings in recorded data are rare. Most files don't have them. If your file is not disordered and you can filter chunk data faster than you can decompress it, then the message indexes are wasted space. Beyond the wasted space, reader implementations waste time consulting them. For remote readers, this requires an additional range request and could add a few tens of ms.

It seems like a better indexing approach would index only the disorderings. For most files that would eliminate the need for message indexes. Among other things to handle, I think we would need to make sure the writer records some indication in the summary section that it is using this strategy.
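As a rough illustration of what "indexing only the disorderings" could mean on the writer side, here is a sketch under one assumption: a disordering is a message whose log time is lower than the highest log time seen so far in the chunk. The function name `find_disorderings` and the choice to record positions are hypothetical; what such an index would actually store is still open.

```python
def find_disorderings(log_times):
    """Return positions of messages whose log_time is below the running maximum."""
    disordered = []
    running_max = None
    for position, log_time in enumerate(log_times):
        if running_max is not None and log_time < running_max:
            # This message arrived after a later-stamped one: record it.
            disordered.append(position)
        else:
            running_max = log_time
    return disordered


ordered_chunk = find_disorderings([1, 2, 3])
disordered_chunk = find_disorderings([1, 3, 2, 4, 3])
```

For an ordered chunk the result is empty, so the writer would have nothing to store for it, which is the common case described above.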

wkalt · 2024-01-08T19:35:34Z

On consideration, it might be even better if we:

- ignore message indexes completely, and build them on the fly while decompressing the chunk
- extend or augment the chunk index concept with some way to know from the summary section whether a chunk has a disordering at all. If the chunk doesn't, use streaming decompression instead of materializing the decompressed data in a buffer.
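The two ideas above could combine roughly as follows on the reader side. This is a sketch, not an implementation of any existing reader: `has_disordering` stands in for the hypothetical per-chunk flag read from the summary section, the framing `<u64 log_time><u32 payload_len><payload>` is a toy assumption, and stdlib `zlib` substitutes for the real codec.

```python
import struct
import zlib

HEADER = struct.Struct("<QI")  # toy framing: <u64 log_time><u32 payload_len>


def parse(buffer):
    """Split complete messages off the front of buffer; return (messages, leftover)."""
    messages, pos = [], 0
    while len(buffer) - pos >= HEADER.size:
        log_time, length = HEADER.unpack_from(buffer, pos)
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully decompressed yet
        messages.append((log_time, bytes(buffer[pos + HEADER.size:end])))
        pos = end
    return messages, buffer[pos:]


def read_chunk(compressed, has_disordering, block_size=4096):
    """Yield (log_time, payload) in log_time order."""
    if has_disordering:
        # Materialize the whole chunk, then derive the ordering while reading
        # instead of consulting a stored message index.
        messages, _ = parse(zlib.decompress(compressed))
        yield from sorted(messages, key=lambda m: m[0])
        return
    # Ordered chunk: stream, emitting each message as soon as it is complete.
    decompressor = zlib.decompressobj()
    pending = b""
    for start in range(0, len(compressed), block_size):
        pending += decompressor.decompress(compressed[start:start + block_size])
        messages, pending = parse(pending)
        yield from messages
    messages, _ = parse(pending + decompressor.flush())
    yield from messages
```

In the ordered path only the bytes of partially decompressed messages are held at any time, so the full decompressed chunk is never buffered.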
