Implement parallel untagged + tagged indexing #8760

teh-cmc · 2025-01-21T11:18:15Z

We need the store to maintain two indices at write time: the legacy untagged one, and a new tagged one.

The legacy untagged index is necessary to avoid all sorts of UB until everything has been ported to tagged data.

The new tagged index will allow us to start porting things incrementally. Of course we also need accompanying tagged query APIs.
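To make the two-index idea concrete, here is a minimal sketch of a store that updates both indices on every write. It is illustrative only: `DualIndexStore`, its methods, and the simplified `ComponentDescriptor` dataclass are hypothetical stand-ins, not the actual store API, and it assumes the legacy index is keyed by bare component name while the new one is keyed by the full descriptor.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ComponentDescriptor:
    """Hypothetical, simplified descriptor: a component name plus optional tags."""

    component_name: str
    archetype_name: Optional[str] = None
    archetype_field_name: Optional[str] = None


@dataclass
class DualIndexStore:
    """Hypothetical store that maintains both indices at write time."""

    # Legacy index: bare component name -> chunk ids (tags are ignored).
    untagged: dict = field(default_factory=lambda: defaultdict(list))
    # New index: full descriptor (name + tags) -> chunk ids.
    tagged: dict = field(default_factory=lambda: defaultdict(list))

    def insert(self, chunk_id: str, descriptors: list) -> None:
        for descr in descriptors:
            # Every write lands in both indices, so legacy queries keep working
            # while ported code paths can switch to tagged queries.
            self.untagged[descr.component_name].append(chunk_id)
            self.tagged[descr].append(chunk_id)

    def query_untagged(self, component_name: str) -> list:
        return list(self.untagged.get(component_name, []))

    def query_tagged(self, descr: ComponentDescriptor) -> list:
        return list(self.tagged.get(descr, []))
```

In this sketch an untagged query sees every chunk that contains the component, whereas a tagged query only sees chunks written with that exact descriptor.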

teh-cmc · 2025-01-22T16:46:02Z

I knew I was forgetting yet another subtle complication: maintaining an untagged index is not enough, you can still end up in a situation where a single Chunk has both untagged and tagged data for a component, and no index is gonna save you there.
This is what's happening here (see attached screenshot). If I had to guess, this is because a runtime blueprint write ends up compacted into a pre-existing, tagged blueprint chunk, and now the resulting chunk is both tagged and untagged for that component.

Once we're done with all the API updates on the SDK side, it shouldn't ever be possible for a user to end up in that situation when working in a new recording, so that end is covered.
That leaves A) runtime blueprint writes and B) user writes to a pre-existing, legacy recording. Obviously the correct fix for blueprint writes is to port all of them to tagged APIs, but that will not happen for 0.22, and it doesn't take care of B) anyway.

I see two possible avenues here:

1. Modify the compaction logic so that when tagged data is compacted with untagged data of the same component, we merge them together and keep the tags going forward. This should be well-specified with the current data model, and helps with propagating tags going forward.
2. Modify the compaction logic so that tagged and untagged data of the same component is never compacted together.

EDIT: Actually solution 1 is not well-specified in any case, since it boils down to extrapolating archetype names from component names in the untagged->tagged scenario.
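For reference, the second avenue amounts to one extra guard in the compaction check. The sketch below is a rough illustration under simplifying assumptions, not the actual compaction code: `can_compact` is a hypothetical helper, and each chunk is modeled as a plain dict mapping a component name to its archetype tag, with `None` meaning untagged.

```python
from typing import Dict, Optional

# Hypothetical chunk model: component name -> archetype tag (None = untagged).
ChunkComponents = Dict[str, Optional[str]]


def can_compact(lhs: ChunkComponents, rhs: ChunkComponents) -> bool:
    """Return False if merging would mix tagged and untagged data of one component."""
    for component_name in lhs.keys() & rhs.keys():
        lhs_is_tagged = lhs[component_name] is not None
        rhs_is_tagged = rhs[component_name] is not None
        if lhs_is_tagged != rhs_is_tagged:
            # One side is tagged and the other is not: keep the chunks separate.
            return False
    return True
```

Components that appear in only one of the two chunks never block compaction in this sketch. Only a shared component with mismatched tagging does.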

teh-cmc added ⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself 🔩 data model labels Jan 21, 2025

teh-cmc self-assigned this Jan 21, 2025

teh-cmc mentioned this issue Jan 22, 2025

Forbid compaction of tagged+untagged data for a single component #8782

Open
