[VL][Delta] Add native Delta DV reader support#12040
Conversation
|
Run Gluten Clickhouse CI on x86 |
1a16894 to
66ea460
Compare
| struct DeltaProtocolInfo { | ||
| int32_t minReaderVersion; | ||
| int32_t minWriterVersion; | ||
| std::optional<std::vector<std::string>> readerFeatures; | ||
| std::optional<std::vector<std::string>> writerFeatures; | ||
|
|
||
| /// Check if this protocol supports deletion vectors. | ||
| /// Returns true if: | ||
| /// - minReaderVersion >= 3 | ||
| /// - minWriterVersion >= 7 | ||
| /// - 'deletionVectors' is in readerFeatures | ||
| bool supportsDeletionVectors() const { | ||
| if (minReaderVersion < 3 || minWriterVersion < 7) { | ||
| return false; | ||
| } | ||
| if (!readerFeatures.has_value()) { | ||
| return false; | ||
| } | ||
| return std::find(readerFeatures->begin(), readerFeatures->end(), "deletionVectors") != readerFeatures->end(); | ||
| } | ||
| }; |
There was a problem hiding this comment.
In which case do we need this struct and the validation logics? Won't we only use native DV features when the Java side feature exists?
There was a problem hiding this comment.
Good point @zhztheplayer, this was carried over from the earlier delta-dv-read-foundation branch, where native had a broader/future-scoped DV descriptor path.
For this split PR, native only consumes JVM-materialized DV payloads. The Java-side Delta integration is responsible for gating DV support before handing the payload to native, so this protocol struct and validation path are not needed in the current scope.
I removed DeltaProtocolInfo and the related validation logic from this PR. Native now keeps only the checks it actually needs here: materialized payload presence, optional cardinality validation, and file statistics consistency.
I will take a note of that and maybe we can reintroduce protocol validation later if/when native directly resolves Delta DV descriptors.
|
|
||
| namespace { | ||
|
|
||
| class DeltaConnectorTest : public ::testing::Test { |
There was a problem hiding this comment.
Can we also add some test cases that are more E2E? E.g., read a file with a passed-in DV, then verify whether the unwanted rows are filtered out?
There was a problem hiding this comment.
Good catch @zhztheplayer
Added a native test for this:
https://github.com/apache/gluten/pull/12040/changes#diff-c7e9378ffacac4c918121006a2b4606297a5a321fa34ab432e82ef9748aead4bR148-R167
The test writes a small DWRF file, scans it through the Delta connector with a materialized DV payload passed in the split, and verifies that rows whose file positions are in the DV are filtered out from the scan result.
| void loadSerializedDeletionVector( | ||
| std::string_view serializedPayload, | ||
| std::optional<uint64_t> expectedCardinality = std::nullopt); |
There was a problem hiding this comment.
Can we add comments explaining expectedCardinality?
There was a problem hiding this comment.
Thanks @zhztheplayer!
Updated the comment to explain expectedCardinality more directly:
08353bc to
6be05d2
Compare
0338c08 to
4843cb8
Compare
What changes are proposed in this pull request?
This PR is the second step in the split Delta deletion-vector (DV) stack, following #12001.
It adds the native Velox-side Delta DV reader layer that consumes the roaring bitmap payload facilities introduced by #12001, without adding the JVM-side Delta scan metadata handoff yet.
Main changes:
DeltaDeletionVectorReaderto load materialized Delta DV payloads usingRoaringBitmapArrayDeltaSplitReaderto validate DV protocol/statistics metadata and apply row-index filtering semanticsThis PR is intentionally native-reader only:
Those pieces will be added in follow-up split PRs.
issue #11901.
How was this patch tested?
Added focused native test coverage in:
cpp/velox/compute/delta/tests/DeltaConnectorTest.cppcpp/velox/compute/delta/tests/DeltaSplitTest.cppcpp/velox/compute/delta/tests/DeltaDeletionVectorReaderTest.cppCovered cases:
RoaringBitmapArrayValidation run:
malinjawi/incubator-gluten:mainon the combined PR2 branch: all checks passed after rerunning two infra-flaky jobsgit diff --check upstream/main...HEAD/opt/homebrew/opt/llvm@15/bin/clang-formatover changed C++ filesWas this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB