Implement parallel untagged + tagged indexing #8760

Open
teh-cmc opened this issue Jan 21, 2025 · 1 comment
Labels: 🔩 data model, ⛃ re_datastore (affects the datastore itself), 🔍 re_query (affects re_query itself)

teh-cmc (Member) commented Jan 21, 2025

We need the store to maintain two indices at write time: the legacy untagged one, and a new tagged one.

The legacy untagged index is necessary to avoid all sorts of UB until everything has been ported to tagged data.

The new tagged index will allow us to start porting things incrementally. Of course we also need accompanying tagged query APIs.
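A minimal sketch of what maintaining both indices at write time could look like, assuming a chunk can be described by the list of column descriptors it carries. `Descriptor`, `Store`, `insert_chunk`, `query_untagged`, and `query_tagged` are hypothetical names chosen for illustration, not the actual re_datastore or re_query APIs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified column descriptor. Untagged data only carries a
/// component name; tagged data also records the archetype and field it was
/// logged as part of.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Descriptor {
    archetype: Option<String>,
    field: Option<String>,
    component: String,
}

/// Hypothetical convenience constructor.
fn desc(archetype: Option<&str>, field: Option<&str>, component: &str) -> Descriptor {
    Descriptor {
        archetype: archetype.map(String::from),
        field: field.map(String::from),
        component: component.to_owned(),
    }
}

type ChunkId = u64;

/// Hypothetical store that keeps both indices up to date on every write.
#[derive(Default)]
struct Store {
    /// Legacy index: component name -> chunks. Tags are ignored entirely.
    untagged: BTreeMap<String, BTreeSet<ChunkId>>,
    /// New index: full descriptor -> chunks.
    tagged: BTreeMap<Descriptor, BTreeSet<ChunkId>>,
}

impl Store {
    /// Registers a chunk in both indices at write time.
    fn insert_chunk(&mut self, id: ChunkId, columns: &[Descriptor]) {
        for column in columns {
            // Legacy queries keep seeing all data, tagged or not.
            self.untagged
                .entry(column.component.clone())
                .or_insert_with(BTreeSet::new)
                .insert(id);
            // Tagged queries only match the exact descriptor.
            self.tagged
                .entry(column.clone())
                .or_insert_with(BTreeSet::new)
                .insert(id);
        }
    }

    /// Legacy query: every chunk holding this component, however it was tagged.
    fn query_untagged(&self, component: &str) -> Vec<ChunkId> {
        self.untagged
            .get(component)
            .map(|ids| ids.iter().cloned().collect::<Vec<_>>())
            .unwrap_or_default()
    }

    /// Tagged query: only chunks carrying this exact descriptor.
    fn query_tagged(&self, descriptor: &Descriptor) -> Vec<ChunkId> {
        self.tagged
            .get(descriptor)
            .map(|ids| ids.iter().cloned().collect::<Vec<_>>())
            .unwrap_or_default()
    }
}
```

For example, after inserting one untagged `Color` chunk and one chunk tagged as `Points3D`/`colors` (illustrative names), `query_untagged("Color")` returns both chunks while a tagged query returns only the one whose descriptor matches exactly. In this sketch untagged data also lands in the tagged map under a descriptor with no archetype; whether the real store should expose it that way is a design choice the sketch does not settle.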

teh-cmc added the ⛃ re_datastore, 🔍 re_query, and 🔩 data model labels Jan 21, 2025
teh-cmc self-assigned this Jan 21, 2025

teh-cmc (Member, Author) commented Jan 22, 2025

I knew I was forgetting yet another subtle complication: maintaining an untagged index is not enough. You can still end up with a single Chunk that holds both untagged and tagged data for the same component, and no index is going to save you there.
This is what's happening here (see attached screenshot). If I had to guess, a runtime blueprint write ends up compacted into a pre-existing, tagged blueprint chunk, and the resulting chunk is now both tagged and untagged for that component.
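To make the failure mode concrete, here is a minimal sketch, assuming (hypothetically) that a chunk can be reduced to the set of column descriptors it carries and that compaction simply unions those columns. `Descriptor`, `Chunk`, `compact_naive`, and `has_mixed_tagging` are made-up names, not re_datastore APIs, and the archetype/component names are illustrative.

```rust
use std::collections::BTreeSet;

/// Hypothetical column descriptor: `archetype` is `None` for untagged data.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Descriptor {
    archetype: Option<String>,
    field: Option<String>,
    component: String,
}

/// Hypothetical convenience constructor.
fn desc(archetype: Option<&str>, field: Option<&str>, component: &str) -> Descriptor {
    Descriptor {
        archetype: archetype.map(String::from),
        field: field.map(String::from),
        component: component.to_owned(),
    }
}

/// Hypothetical chunk, reduced to the set of columns it carries.
#[derive(Clone, Debug, Default)]
struct Chunk {
    columns: BTreeSet<Descriptor>,
}

/// Naive compaction: take the union of both chunks' columns, ignoring tags.
fn compact_naive(a: &Chunk, b: &Chunk) -> Chunk {
    Chunk {
        columns: a.columns.union(&b.columns).cloned().collect(),
    }
}

/// Returns true if `component` appears both untagged and tagged in `chunk`.
fn has_mixed_tagging(chunk: &Chunk, component: &str) -> bool {
    let (mut untagged, mut tagged) = (false, false);
    for column in chunk.columns.iter().filter(|c| c.component == component) {
        if column.archetype.is_some() {
            tagged = true;
        } else {
            untagged = true;
        }
    }
    untagged && tagged
}
```

Compacting a tagged `Points3D`/`colors` chunk with an untagged `Color` write yields one chunk for which `has_mixed_tagging(&merged, "Color")` is true. Since the mix lives inside a single chunk, adding another store-level index does not help: any lookup by component name lands on a chunk that holds both.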

Once we're done with all the API updates on the SDK side, it shouldn't be possible for a user to end up in that situation when working in a new recording, so that end is covered.
That leaves A) runtime blueprint writes and B) user writes to a pre-existing, legacy recording. Obviously the correct fix for A) is to port all blueprint writes to tagged APIs, but that will not happen for 0.22, and it would not take care of B) anyway.

I see two possible avenues here:

  • Modify the compaction logic so that when tagged data is compacted with untagged data of the same component, we merge them and keep the tags from then on. This should be well-specified with the current data model, and it helps propagate tags over time.
  • Modify the compaction logic so that tagged and untagged data of the same component is never compacted together.
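The second avenue could be expressed as a compatibility check that runs before two chunks are merged. This is a minimal sketch under the same simplifying assumption (a chunk is just its set of column descriptors); `Descriptor`, `Chunk`, `can_compact`, and `compact` are hypothetical names, and the check deliberately ignores every other condition the real compaction logic uses.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical column descriptor: `archetype` is `None` for untagged data.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Descriptor {
    archetype: Option<String>,
    field: Option<String>,
    component: String,
}

/// Hypothetical convenience constructor.
fn desc(archetype: Option<&str>, field: Option<&str>, component: &str) -> Descriptor {
    Descriptor {
        archetype: archetype.map(String::from),
        field: field.map(String::from),
        component: component.to_owned(),
    }
}

/// Hypothetical chunk, reduced to the set of columns it carries.
#[derive(Clone, Debug, Default)]
struct Chunk {
    columns: BTreeSet<Descriptor>,
}

/// Two chunks may only be compacted if, for every component, the combined
/// columns are either all tagged or all untagged.
fn can_compact(a: &Chunk, b: &Chunk) -> bool {
    // component name -> (seen untagged, seen tagged)
    let mut seen: BTreeMap<&str, (bool, bool)> = BTreeMap::new();
    for column in a.columns.iter().chain(b.columns.iter()) {
        let flags = seen.entry(column.component.as_str()).or_insert((false, false));
        if column.archetype.is_some() {
            flags.1 = true;
        } else {
            flags.0 = true;
        }
    }
    seen.values().all(|&(untagged, tagged)| !(untagged && tagged))
}

/// Compacts `a` and `b`, or returns `None` to keep them as separate chunks.
fn compact(a: &Chunk, b: &Chunk) -> Option<Chunk> {
    if !can_compact(a, b) {
        return None;
    }
    Some(Chunk {
        columns: a.columns.union(&b.columns).cloned().collect(),
    })
}
```

In this sketch, two chunks that tag the same component under different archetypes can still be compacted; only the tagged/untagged mix for a given component is rejected, and the inputs simply stay as separate chunks.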

EDIT: Actually, solution 1 (merging) is not well-specified in any case, since it boils down to extrapolating archetype names from component names in the untagged->tagged scenario.
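To see why the merge is under-specified, consider what the untagged->tagged direction needs: an archetype name recovered from a bare component name. The sketch below assumes a hypothetical registry mapping each component name to the archetypes that use it; `Inference` and `infer_archetype` are illustrative names, and the registry contents in the example are made up.

```rust
use std::collections::BTreeMap;

/// Outcome of trying to recover an archetype name from a bare component name.
#[derive(Debug, PartialEq)]
enum Inference<'a> {
    /// Exactly one known archetype uses this component.
    Unique(&'a str),
    /// Several archetypes use it: there is no principled way to pick one.
    Ambiguous(Vec<&'a str>),
    /// No known archetype uses it (e.g. a custom component).
    Unknown,
}

/// Hypothetical registry lookup: component name -> archetypes that use it.
fn infer_archetype<'a>(
    registry: &'a BTreeMap<String, Vec<String>>,
    component: &str,
) -> Inference<'a> {
    match registry.get(component).map(|archetypes| archetypes.as_slice()) {
        None | Some([]) => Inference::Unknown,
        Some([only]) => Inference::Unique(only.as_str()),
        Some(many) => Inference::Ambiguous(many.iter().map(|name| name.as_str()).collect()),
    }
}
```

As soon as a component is shared by more than one archetype, the lookup can only report ambiguity, and even a `Unique` answer is a guess about how the data was originally logged rather than something recorded in the data itself.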
