@jblomer jblomer commented Oct 31, 2025

Enables an RNTupleReader that is shared by multiple streams (threads) to read efficiently. At the page source layer, this adds the ability to pin and unpin clusters: data from pinned clusters is not evicted from the cluster pool or the page pool.

Extends the RNTupleReader API with "active entry tokens", which keep an entry alive in the cache. Internally, active entries become reference counts on their corresponding clusters, so that clusters are pinned and unpinned correctly.
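The description does not spell out how entries map to clusters, so the following is only a minimal sketch under stated assumptions: the class `ActiveEntryCounter` and its methods are hypothetical (not ROOT API), clusters are assumed to be described by a sorted list of their first entry numbers, and the boolean return values mark the 0-to-1 and 1-to-0 transitions at which a reader would pin or unpin a cluster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch, not the RNTupleReader implementation.
// Clusters are given by their first entry numbers in ascending order, e.g.
// {0, 100, 250} means cluster 0 = [0, 100), 1 = [100, 250), 2 = [250, ...).
class ActiveEntryCounter {
public:
   explicit ActiveEntryCounter(std::vector<std::uint64_t> firstEntries) : fFirstEntries(std::move(firstEntries)) {}

   // Map an entry number to the index of the cluster that contains it.
   std::size_t FindCluster(std::uint64_t entry) const
   {
      assert(!fFirstEntries.empty() && entry >= fFirstEntries.front());
      auto it = std::upper_bound(fFirstEntries.begin(), fFirstEntries.end(), entry);
      return static_cast<std::size_t>(it - fFirstEntries.begin()) - 1;
   }

   // Returns true if the cluster just got its first active entry (would be pinned now).
   bool Activate(std::uint64_t entry) { return ++fActiveCount[FindCluster(entry)] == 1; }

   // Returns true if the cluster just lost its last active entry (would be unpinned now).
   bool Deactivate(std::uint64_t entry)
   {
      auto it = fActiveCount.find(FindCluster(entry));
      assert(it != fActiveCount.end() && it->second > 0);
      if (--it->second > 0)
         return false;
      fActiveCount.erase(it);
      return true;
   }

   bool IsPinned(std::size_t clusterIdx) const { return fActiveCount.count(clusterIdx) > 0; }

private:
   std::vector<std::uint64_t> fFirstEntries;
   std::map<std::size_t, unsigned> fActiveCount; // cluster index -> number of active entries
};
```

In this sketch, `Activate()` returning `true` is where a reader would call its pin operation, and `Deactivate()` returning `true` is where it would unpin.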

Active entry tokens are intended as a flexible API: they support not only multiple streams but also, e.g., keeping certain (past) reference events alive.

This functionality differs from the description in #16325, but it may provide the flexibility that is actually needed.

@Dr15Jones FYI

Graphical output of the tutorial: [image]

@jblomer jblomer self-assigned this Oct 31, 2025
@jblomer jblomer requested a review from couet as a code owner October 31, 2025 10:54
@jblomer jblomer marked this pull request as draft October 31, 2025 10:55
github-actions bot commented Oct 31, 2025

Test Results

21 files, 21 suites, duration 3d 15h 20m 5s ⏱️
3 792 tests: 3 792 passed ✅, 0 skipped 💤, 0 failed ❌
77 695 runs: 77 695 passed ✅, 0 skipped 💤, 0 failed ❌

Results for commit d803490.

♻️ This comment has been updated with latest results.

@jblomer jblomer force-pushed the ntuple-informed-cache branch 3 times, most recently from ce3fb05 to ab99805 December 9, 2025 13:50
@jblomer jblomer marked this pull request as ready for review December 9, 2025 13:51
@jblomer jblomer requested a review from bellenot as a code owner December 9, 2025 13:51
@jblomer jblomer force-pushed the ntuple-informed-cache branch from ab99805 to 7bf6afb December 15, 2025 22:39
This is only used in unit tests. It should wait for all clusters that
are scheduled for background loading. However, it should _not_ remove
those clusters from the in-flight queue; the ready clusters should stay
in the queue until GetCluster() picks them up.
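Waiting for background loads without dequeuing them maps naturally onto `std::future::wait()`, which blocks until the result is ready but, unlike `get()`, does not consume it. The sketch below assumes the in-flight queue is a queue of futures; `ClusterLoader`, `Schedule()`, `NumInFlight()` and `WaitForInFlightClusters()` are hypothetical names, and only `GetCluster()` echoes the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <future>

// Hypothetical sketch of a background-loading queue; not ROOT's actual code.
struct Cluster {
   std::uint64_t fId;
};

class ClusterLoader {
public:
   // Schedule a cluster for asynchronous loading on another thread.
   void Schedule(std::uint64_t id)
   {
      fInFlight.push_back(std::async(std::launch::async, [id] { return Cluster{id}; }));
   }

   // Test helper: block until every scheduled cluster is ready. std::future::wait()
   // does not retrieve the value, so the futures remain in the in-flight queue.
   void WaitForInFlightClusters() const
   {
      for (const auto &f : fInFlight)
         f.wait();
   }

   std::size_t NumInFlight() const { return fInFlight.size(); }

   // Consumer side: pick up the oldest cluster; only here does it leave the queue.
   Cluster GetCluster()
   {
      assert(!fInFlight.empty());
      Cluster c = fInFlight.front().get();
      fInFlight.pop_front();
      return c;
   }

private:
   std::deque<std::future<Cluster>> fInFlight;
};
```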
Pinned clusters and their successors won't be evicted from the cluster
pool. This also means that the cluster pool can no longer have a fixed
size.
Now that the pool no longer has a fixed size, use a hash map instead of
a vector.
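A minimal sketch of a cluster pool keyed by cluster ID in a hash map, under two assumptions the commit messages do not confirm: the "successor" of cluster N is taken to be cluster N + 1, and eviction simply drops unprotected clusters until the pool is back at a nominal size. `ClusterPool` and all of its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cluster pool keyed by cluster ID; not ROOT's implementation.
// Assumption: the "successor" of cluster N is cluster N + 1.
class ClusterPool {
public:
   explicit ClusterPool(std::size_t nominalSize) : fNominalSize(nominalSize) {}

   void Pin(std::uint64_t id) { ++fPinCount[id]; }

   void Unpin(std::uint64_t id)
   {
      auto it = fPinCount.find(id);
      assert(it != fPinCount.end() && it->second > 0);
      if (--it->second == 0)
         fPinCount.erase(it);
   }

   // A cluster is protected from eviction if it or its predecessor is pinned.
   bool IsProtected(std::uint64_t id) const
   {
      return fPinCount.count(id) > 0 || (id > 0 && fPinCount.count(id - 1) > 0);
   }

   // Add a loaded cluster, then evict unprotected clusters while the pool is above
   // its nominal size. Protected clusters are never evicted, so the pool may grow
   // beyond fNominalSize.
   void Add(std::uint64_t id, std::vector<unsigned char> data)
   {
      fClusters[id] = std::move(data);
      std::vector<std::uint64_t> victims;
      for (const auto &entry : fClusters) {
         if (entry.first != id && !IsProtected(entry.first))
            victims.push_back(entry.first);
      }
      for (auto victim : victims) {
         if (fClusters.size() <= fNominalSize)
            break;
         fClusters.erase(victim);
      }
   }

   bool Contains(std::uint64_t id) const { return fClusters.count(id) > 0; }
   std::size_t Size() const { return fClusters.size(); }

private:
   std::size_t fNominalSize;
   std::unordered_map<std::uint64_t, unsigned> fPinCount;                   // cluster ID -> pin count
   std::unordered_map<std::uint64_t, std::vector<unsigned char>> fClusters; // cluster ID -> payload
};
```

With a vector indexed by slot, a pool that can temporarily hold more clusters than planned would need resizing and a separate ID lookup; a hash map keyed by cluster ID provides both directly.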
@jblomer jblomer force-pushed the ntuple-informed-cache branch from 7bf6afb to 7261144 December 16, 2025 10:00
API extension to tell RNTuple about the lifetime of entries. Useful when
multiple streams (threads) share a single reader.

Active entry tokens are linked to the reader by a shared control
block. They can be copied and moved, and they take care of reference
counting active entry numbers per cluster, so that the corresponding
clusters are pinned and unpinned as needed.
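The following sketch illustrates the copy and move semantics described in this commit message with hypothetical types `ControlBlock` and `ActiveEntryToken`; it is not the API added by this PR. To keep it short, it counts tokens per entry number and only marks with a comment where a real reader would translate the count into cluster pinning. Because both the reader and the tokens would hold a `std::shared_ptr` to the control block, a token cannot dangle even if it outlives the reader.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical control block shared by a reader and all of its tokens.
struct ControlBlock {
   std::mutex fLock;
   std::map<std::uint64_t, unsigned> fRefCount; // active entry number -> number of tokens

   void Acquire(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      ++fRefCount[entry]; // a real reader would pin the entry's cluster on 0 -> 1
   }
   void Release(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      auto it = fRefCount.find(entry);
      assert(it != fRefCount.end() && it->second > 0);
      if (--it->second == 0)
         fRefCount.erase(it); // a real reader would unpin the entry's cluster here
   }
   unsigned Count(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      auto it = fRefCount.find(entry);
      return it == fRefCount.end() ? 0 : it->second;
   }
};

// Hypothetical RAII token: copies add a reference, moves transfer it.
class ActiveEntryToken {
public:
   ActiveEntryToken() = default;
   ActiveEntryToken(std::shared_ptr<ControlBlock> cb, std::uint64_t entry) : fControl(std::move(cb)), fEntry(entry)
   {
      fControl->Acquire(fEntry);
   }
   ActiveEntryToken(const ActiveEntryToken &other) : fControl(other.fControl), fEntry(other.fEntry)
   {
      if (fControl)
         fControl->Acquire(fEntry);
   }
   ActiveEntryToken(ActiveEntryToken &&other) noexcept : fControl(std::move(other.fControl)), fEntry(other.fEntry) {}
   // Copy-and-swap: the old reference is released when `other` goes out of scope.
   ActiveEntryToken &operator=(ActiveEntryToken other) noexcept
   {
      std::swap(fControl, other.fControl);
      std::swap(fEntry, other.fEntry);
      return *this;
   }
   ~ActiveEntryToken()
   {
      if (fControl) // moved-from tokens hold no control block and release nothing
         fControl->Release(fEntry);
   }

private:
   std::shared_ptr<ControlBlock> fControl;
   std::uint64_t fEntry = 0;
};
```

Holding on to such a token, for example for a past reference event, would keep that entry's cluster pinned until the token is destroyed or reassigned.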
@jblomer jblomer force-pushed the ntuple-informed-cache branch 2 times, most recently from 9e8f065 to d803490 December 16, 2025 14:47
@bellenot bellenot removed their request for review December 16, 2025 16:01