Implement trimming of interned strings #7784

Closed
sync-by-unito bot opened this issue Jun 6, 2024 · 1 comment
sync-by-unito bot commented Jun 6, 2024

See design document.

The success of interning and compressing needs to be monitored.

Trimming has to be done incrementally - no "stop the world" garbage collection.

Suggestion: when trimming is needed, set up a second string interner. Use the sign of each StringID to identify which interner it belongs to, so that the two interners can coexist.

Incrementally re-intern all properties of a column, then discard the old string interner.
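
To make the suggestion concrete, here is a minimal sketch of the two-interner scheme, assuming a single column of non-null string IDs. The names `Interner`, `TrimmableColumn`, `start_trim`, and `trim_step` are hypothetical and do not come from the design document; the real interner and column layout will differ.

```python
class Interner:
    """Append-only string table. Local indices start at 1 so that the sign is always meaningful."""

    def __init__(self):
        self._strings = []
        self._index = {}

    def intern(self, s):
        idx = self._index.get(s)
        if idx is None:
            self._strings.append(s)
            idx = len(self._strings)
            self._index[s] = idx
        return idx

    def get(self, idx):
        return self._strings[idx - 1]

    def __len__(self):
        return len(self._strings)


class TrimmableColumn:
    """Hypothetical column whose StringIDs carry the owning interner in their sign."""

    def __init__(self):
        self.ids = []
        self._interners = {1: Interner()}  # sign (+1 or -1) -> interner
        self._active = 1                   # sign used for all new writes
        self._cursor = None                # next row to migrate; None when idle

    def _intern_active(self, s):
        return self._active * self._interners[self._active].intern(s)

    def append(self, s):
        self.ids.append(self._intern_active(s))

    def set(self, row, s):
        self.ids[row] = self._intern_active(s)

    def get(self, row):
        sid = self.ids[row]
        sign = 1 if sid > 0 else -1
        return self._interners[sign].get(abs(sid))

    def interned_count(self):
        return sum(len(i) for i in self._interners.values())

    def start_trim(self):
        if self._cursor is not None:
            raise RuntimeError("trim already in progress")
        # Flip the sign: new writes go to a fresh interner, old IDs stay readable.
        self._active = -self._active
        self._interners[self._active] = Interner()
        self._cursor = 0

    def trim_step(self, max_rows):
        """Re-intern at most max_rows rows. Returns True once the old interner is discarded."""
        if self._cursor is None:
            return True
        end = min(self._cursor + max_rows, len(self.ids))
        for row in range(self._cursor, end):
            sid = self.ids[row]
            if (sid > 0) != (self._active > 0):  # row still points at the old interner
                s = self._interners[-self._active].get(abs(sid))
                self.ids[row] = self._intern_active(s)
        self._cursor = end
        if end == len(self.ids):
            del self._interners[-self._active]  # unreferenced strings are dropped with it
            self._cursor = None
            return True
        return False
```

Each `trim_step` call touches a bounded number of rows, so reads and writes can be interleaved between steps without a "stop the world" pause. Because the active sign flips on every `start_trim`, the same mechanism can be reused for the next trim.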

Since trimming will take place after ingesting data, we can likely optimize compression by training the compressor on available data before interning.
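
As a rough illustration of training on already ingested data, the sketch below uses zlib's preset dictionary (`zdict`) as a stand-in. This is an assumption for illustration only: the issue does not say which compressor is used, and `build_dictionary`, `compress`, and `decompress` are hypothetical helper names.

```python
import zlib
from collections import Counter


def build_dictionary(samples, max_size=32768):
    """Build a preset dictionary from existing strings.

    zlib favours matches near the end of the dictionary, so the most
    frequent strings are placed last.
    """
    counts = Counter(samples)
    ordered = [s for s, _ in reversed(counts.most_common())]
    blob = "\n".join(ordered).encode("utf-8")
    return blob[-max_size:]


def compress(s, zdict):
    c = zlib.compressobj(zdict=zdict)
    return c.compress(s.encode("utf-8")) + c.flush()


def decompress(data, zdict):
    # The same dictionary must be supplied when decompressing.
    d = zlib.decompressobj(zdict=zdict)
    return (d.decompress(data) + d.flush()).decode("utf-8")
```

The dictionary has to be kept for as long as any string compressed with it exists, which fits the two-interner approach: a new interner can get a freshly trained dictionary while the old one keeps its own until it is discarded.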

sync-by-unito bot commented Jun 6, 2024

➤ PM Bot commented:

Jira ticket: RCORE-2158

sync-by-unito bot closed this as completed on Aug 14, 2024
github-actions bot locked as resolved and limited conversation to collaborators on Sep 13, 2024