Optimize partition-only deletes #1171
Open
jdctinuiti wants to merge 2 commits into
Nice, I've been looking for something like the efficiency of attaching and dropping table partitions in PostgreSQL.
Author
Rebased this stack onto current origin/main and cleaned the branch history so the PRs carry only their feature changes. I intentionally left unrelated DuckDB-main compatibility fixes out of this PR stack; those should land separately. The only compatibility cleanup folded into #1171 is for expression accessors in the metadata-delete planner code introduced by this PR.
Merged
Summary
This PR adds a metadata-only DELETE path for DuckLake tables when the DELETE can be proven to remove whole data files, and fixes table-statistics accounting when whole files are dropped.
Metadata-only partition DELETE
For DELETE predicates over identity partition columns, DuckLake can retire matching data files by updating metadata instead of scanning files and writing row-level delete files.
This path is used only when all candidate files are safe to drop:
=/IN)
Full-table DELETE on partitioned tables can also use this path when the scan selects whole active data files.
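The decision described above can be illustrated with a small, self-contained model. This is a minimal sketch and not DuckLake's planner code: `DataFile`, `plan_delete`, and the rule that a file with an unknown partition value forces a fallback are all hypothetical names and assumptions chosen for illustration. It assumes each data file holds exactly one identity-partition value, so a file either matches an `=`/`IN` predicate entirely or not at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one active data file's metadata."""
    file_id: int
    partition_value: Optional[str]  # None = value unknown (assumed unsafe case)
    record_count: int
    file_size_bytes: int


def plan_delete(files: Sequence[DataFile],
                delete_values: Sequence[str]) -> Optional[List[int]]:
    """Return ids of files to retire via metadata only.

    Returns None when whole-file removal cannot be proven, meaning the
    caller should fall back to the regular row-level delete path.
    """
    wanted = set(delete_values)  # values from `col = v` or `col IN (...)`
    to_drop = []
    for f in files:
        if f.partition_value is None:
            # Assumption: without a known partition value we cannot prove
            # the file matches entirely or not at all, so give up.
            return None
        if f.partition_value in wanted:
            to_drop.append(f.file_id)
    return to_drop


files = [
    DataFile(1, "2024-01", 100, 4096),
    DataFile(2, "2024-02", 50, 2048),
    DataFile(3, "2024-02", 70, 3072),
]
print(plan_delete(files, ["2024-02"]))  # [2, 3]
```

In this model the all-or-nothing fallback mirrors the statement that the path is used only when all candidate files are safe to drop; the actual safety conditions in the PR may differ.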
Stats fix
Whole-file drops now decrement ducklake_table_stats.record_count and file_size_bytes.
Before this PR, full-file deletes retired data files but left global table stats unchanged, so repeated delete/reinsert cycles inflated:
- GetCardinality
- duckdb_tables size reporting
next_row_id remains monotonic. Column min/max stats are left unchanged because exact recomputation would require scanning remaining files.
The stats cache is also invalidated after delete-only commits that drop files, since next_file_id does not change for those commits.
Tests
Adds coverage for:
= and IN
ducklake_add_data_files
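As an illustration of the stats fix covered by these tests, the accounting can be modeled in a few lines. This is a hedged sketch, not the PR's implementation: `TableStats`, `add_file`, and `drop_file` are hypothetical helpers, and only the three field names come from the description above. It shows that repeated delete/reinsert cycles no longer inflate `record_count` or `file_size_bytes`, while `next_row_id` keeps growing.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical in-memory model of the global table stats row."""
    record_count: int = 0
    file_size_bytes: int = 0
    next_row_id: int = 0


def add_file(stats: TableStats, rows: int, size: int) -> None:
    """Account for a newly added data file."""
    stats.record_count += rows
    stats.file_size_bytes += size
    stats.next_row_id += rows  # row ids are handed out once and never reused


def drop_file(stats: TableStats, rows: int, size: int) -> None:
    """Account for a whole-file drop (the behavior this PR describes)."""
    stats.record_count -= rows
    stats.file_size_bytes -= size
    # next_row_id is intentionally left alone so it stays monotonic.


stats = TableStats()
for _ in range(3):  # three delete/reinsert cycles
    add_file(stats, rows=100, size=4096)
    drop_file(stats, rows=100, size=4096)
add_file(stats, rows=100, size=4096)

print(stats.record_count, stats.file_size_bytes, stats.next_row_id)
# 100 4096 400
```

Without the decrements in `drop_file`, the same sequence would report 400 rows and 16384 bytes for a table that holds a single 100-row file.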