[ntuple] Add support for efficient multi-stream reading #20257
Open
jblomer wants to merge 20 commits into root-project:master from jblomer:ntuple-informed-cache
Test Results: 21 files, 21 suites, 3d 15h 20m 5s. Results for commit d803490. This comment has been updated with latest results.
ce3fb05 to ab99805
ab99805 to 7bf6afb
This is only used in unit tests. It should wait for all clusters that are scheduled for background loading. However, it should _not_ remove those clusters from the in-flight queue but just let the queue with the ready clusters sit there for pickup by GetCluster().
Pinned clusters and their successors won't be evicted from the cluster pool. This also means that the cluster pool cannot have a fixed size anymore.
Now that the pool is not fixed-size anymore, use a hash map instead of a vector.
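A minimal sketch of the idea in the two commits above, not the actual ROOT implementation: a pool keyed by cluster ID in a hash map, where eviction skips entries with a non-zero pin count. All names (`ToyClusterPool`, `PoolEntry`, `EvictUnpinned`) are hypothetical, and the rule that successors of pinned clusters are also kept is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these names are taken from the ROOT sources.
using ClusterId = std::uint64_t;

struct PoolEntry {
   std::vector<unsigned char> fData; // placeholder for the loaded cluster payload
   unsigned fPinCount = 0;           // > 0 means the cluster must not be evicted
};

class ToyClusterPool {
   // A hash map instead of a fixed-size vector: the pool grows while clusters are pinned
   std::unordered_map<ClusterId, PoolEntry> fPool;

public:
   void Add(ClusterId id) { fPool.try_emplace(id); }
   bool Contains(ClusterId id) const { return fPool.count(id) > 0; }
   std::size_t Size() const { return fPool.size(); }

   void Pin(ClusterId id) { ++fPool.at(id).fPinCount; }

   void Unpin(ClusterId id)
   {
      auto &entry = fPool.at(id);
      assert(entry.fPinCount > 0); // unbalanced Unpin() would be a logic error
      --entry.fPinCount;
   }

   // Drop every cluster that is not pinned; returns the number of evicted clusters
   std::size_t EvictUnpinned()
   {
      std::size_t nEvicted = 0;
      for (auto it = fPool.begin(); it != fPool.end();) {
         if (it->second.fPinCount == 0) {
            it = fPool.erase(it);
            ++nEvicted;
         } else {
            ++it;
         }
      }
      return nEvicted;
   }
};
```

With clusters 1, 2, 3 in the pool and cluster 2 pinned, `EvictUnpinned()` removes 1 and 3 and leaves 2 in place until it is unpinned.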
7bf6afb to 7261144
When cleaning up entire preloaded clusters from the page pool, skip pinned clusters.
API extension to tell RNTuple about the lifetime of entries. Useful when multiple streams (threads) share a single reader. The active entry tokens are linked to the reader by a shared control block. Active entry tokens can be copied and moved and take care of the reference counting of active entry numbers to clusters, such that the corresponding clusters are pinned and unpinned as needed.
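The reference-counting scheme described in this commit can be illustrated roughly as follows. This is a self-contained toy, not the API added by the PR: `ControlBlock`, `ToyEntryToken`, and the fixed `fEntriesPerCluster` mapping from entry number to cluster are assumptions made for the example. The token pins its cluster on construction and copy, transfers the pin on move, and unpins on destruction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical shared state between a reader and all of its tokens
struct ControlBlock {
   std::mutex fLock;                            // tokens may live on different threads
   std::uint64_t fEntriesPerCluster = 100;      // assumption: fixed-size clusters
   std::map<std::uint64_t, unsigned> fPinCount; // cluster index -> number of active entries

   void Pin(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      ++fPinCount[entry / fEntriesPerCluster];
   }
   void Unpin(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      auto it = fPinCount.find(entry / fEntriesPerCluster);
      assert(it != fPinCount.end());
      if (--it->second == 0)
         fPinCount.erase(it); // last active entry gone: cluster is unpinned
   }
   bool IsPinned(std::uint64_t cluster)
   {
      std::lock_guard<std::mutex> guard(fLock);
      return fPinCount.count(cluster) > 0;
   }
};

// Hypothetical token: holds one reference on the cluster of its entry while active
class ToyEntryToken {
   std::shared_ptr<ControlBlock> fCtrl;
   std::uint64_t fEntry = 0;
   bool fActive = false;

public:
   ToyEntryToken(std::shared_ptr<ControlBlock> ctrl, std::uint64_t entry)
      : fCtrl(std::move(ctrl)), fEntry(entry), fActive(true)
   {
      fCtrl->Pin(fEntry);
   }
   // Copying adds a reference
   ToyEntryToken(const ToyEntryToken &other) : fCtrl(other.fCtrl), fEntry(other.fEntry), fActive(other.fActive)
   {
      if (fActive)
         fCtrl->Pin(fEntry);
   }
   // Moving transfers the reference; the source becomes inactive
   ToyEntryToken(ToyEntryToken &&other) noexcept
      : fCtrl(std::move(other.fCtrl)), fEntry(other.fEntry), fActive(other.fActive)
   {
      other.fActive = false;
   }
   // Copy-and-swap: the old state is released when `other` goes out of scope
   ToyEntryToken &operator=(ToyEntryToken other) noexcept
   {
      std::swap(fCtrl, other.fCtrl);
      std::swap(fEntry, other.fEntry);
      std::swap(fActive, other.fActive);
      return *this;
   }
   ~ToyEntryToken()
   {
      if (fActive)
         fCtrl->Unpin(fEntry);
   }
};
```

In this sketch, two tokens for entries 150 and 199 both map to cluster 1, so the cluster stays pinned until both tokens are destroyed or reassigned.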
9e8f065 to d803490
Enables a shared RNTupleReader to read multiple streams efficiently. On the page source layer, add the possibility to pin and unpin clusters. Data from pinned clusters will not be evicted from the cluster pool or the page pool.
Extend the RNTupleReader API by "active entry tokens". Active entry tokens keep an entry alive in the cache. Internally, the active entries turn into a reference counter for the corresponding cluster, so that the clusters are pinned and unpinned correctly.
Active entry tokens are meant to be a flexible API: besides supporting multiple streams, they can also keep, e.g., certain (past) reference events alive.
While this functionality is different from the description in #16325, it may be the flexibility that is actually needed.
@Dr15Jones FYI
Graphical output of the tutorial: