Optimize partition-only deletes #1171

Open
jdctinuiti wants to merge 2 commits into duckdb:main from jdctinuiti:metadata-delete-partition-only

jdctinuiti commented May 18, 2026

Summary

This PR adds a metadata-only DELETE path for DuckLake tables when the DELETE can be proven to remove whole data files, and fixes table-statistics accounting when whole files are dropped.

Metadata-only partition DELETE

For DELETE predicates over identity partition columns, DuckLake can retire matching data files by updating metadata instead of scanning files and writing row-level delete files.

This path is used only when all candidate files are safe to drop:

  • the predicate is reducible to accepted constant values for identity partition columns (= / IN)
  • every candidate file has recorded partition values matching the accepted set
  • no candidate file has active row-level delete files
  • no inlined data or active inlined deletion table is involved

Non-identity partition transforms, mixed non-partition predicates, NULL partition predicates, and legacy files without partition metadata fall back to the existing row-level delete path.

Full-table DELETE on partitioned tables can also use this path when the scan selects whole active data files.
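
As a rough illustration of the eligibility rules above, the following Python sketch models the decision between the metadata-only path and the row-level fallback. It is not the planner code from this PR: `DataFile`, `accepted_values`, `plan_delete`, and the `(column, op, constants)` predicate encoding are hypothetical names chosen for the example, and the sketch assumes the predicate is a plain conjunction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFile:
    # Hypothetical stand-in for one active data file in the catalog.
    path: str
    partition_values: Optional[dict]  # None models a legacy file without partition metadata
    has_delete_files: bool = False    # True if active row-level delete files exist


def accepted_values(predicates):
    """Reduce a conjunction of (column, op, constants) terms to one accepted set per column.

    Returns None when a term is not = / IN over constants or mentions NULL.
    """
    accepted = {}
    for column, op, constants in predicates:
        if op not in ("=", "IN") or any(c is None for c in constants):
            return None
        values = set(constants)
        # Several filters on the same column intersect.
        accepted[column] = accepted[column] & values if column in accepted else values
    return accepted


def plan_delete(files, predicates, identity_partition_columns, has_inlined_data=False):
    """Return ("metadata", paths_to_drop) or ("row_level", []) for the fallback."""
    accepted = accepted_values(predicates)
    if accepted is None or has_inlined_data:
        return ("row_level", [])
    if not set(accepted) <= set(identity_partition_columns):
        # The predicate touches a non-partition or transformed column.
        return ("row_level", [])

    to_drop = []
    for f in files:
        if f.partition_values is None:
            # A legacy file cannot be proven to match or not match.
            return ("row_level", [])
        if all(f.partition_values.get(col) in vals for col, vals in accepted.items()):
            if f.has_delete_files:
                return ("row_level", [])
            to_drop.append(f.path)
    return ("metadata", to_drop)
```

With files partitioned by a `region` column, calling `plan_delete(files, [("region", "IN", ["eu", "us"]), ("region", "=", ["eu"])], {"region"})` selects only the `eu` files, since the two filters intersect. An empty predicate list models a full-table DELETE and selects every file that has recorded partition values.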

Stats fix

Whole-file drops now decrement ducklake_table_stats.record_count and file_size_bytes.

Before this PR, full-file deletes retired data files but left global table stats unchanged, so repeated delete/reinsert cycles inflated:

  • optimizer cardinality estimates via GetCardinality
  • duckdb_tables size reporting

next_row_id remains monotonic. Column min/max stats are left unchanged because exact recomputation would require scanning remaining files.
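
The accounting change can be pictured with a small Python model. This is only a sketch under assumptions: `TableStats`, `DroppedFile`, and `apply_whole_file_drops` are hypothetical names, and it assumes each dropped file contributes its recorded row count and size to the decrement.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    # Mirrors the three global counters discussed above.
    record_count: int
    file_size_bytes: int
    next_row_id: int


@dataclass
class DroppedFile:
    # Hypothetical per-file counters recorded in the catalog.
    record_count: int
    file_size_bytes: int


def apply_whole_file_drops(stats, dropped):
    """Return new stats after retiring whole files.

    record_count and file_size_bytes shrink; next_row_id is never decremented,
    so row ids stay monotonic across delete/reinsert cycles.
    """
    return TableStats(
        record_count=stats.record_count - sum(f.record_count for f in dropped),
        file_size_bytes=stats.file_size_bytes - sum(f.file_size_bytes for f in dropped),
        next_row_id=stats.next_row_id,
    )
```

Starting from `TableStats(100, 4096, 100)` and dropping one file with 40 rows and 1024 bytes yields `TableStats(60, 3072, 100)`. Without the decrement, reinserting those 40 rows would leave `record_count` at 140 for a table that holds 100 rows.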

The stats cache is also invalidated after delete-only commits that drop files, since next_file_id does not change for those commits.
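
To show why explicit invalidation is needed, here is a minimal Python sketch of a cache whose freshness check relies on `next_file_id`. The `StatsCache` class and its methods are hypothetical and only model the behaviour described above; they are not DuckLake's cache.

```python
class StatsCache:
    """Toy cache that reloads stats only when next_file_id changes."""

    def __init__(self, load_stats):
        self._load_stats = load_stats  # callable that reads stats from the catalog
        self._cached = None
        self._cached_file_id = None

    def get(self, next_file_id):
        # A delete-only commit leaves next_file_id unchanged, so this check
        # alone cannot notice that files were dropped.
        if self._cached is None or self._cached_file_id != next_file_id:
            self._cached = self._load_stats()
            self._cached_file_id = next_file_id
        return self._cached

    def invalidate(self):
        # Called after a delete-only commit that dropped files.
        self._cached = None
        self._cached_file_id = None
```

In this model, a lookup after a delete-only commit keeps returning the old `record_count` until `invalidate()` is called, because the `next_file_id` passed to `get` is the same before and after the commit.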

Tests

Adds coverage for:

  • metadata-only partition DELETE with = and IN
  • intersecting partition filters
  • fallback for mixed predicates, NULL predicates, transformed partitions, legacy files, existing deletes, and inlined data
  • atomic delete + ducklake_add_data_files
  • concurrent conflicting metadata deletes
  • table-stats decrement and stats-cache invalidation

@kinghuang

Nice, I've been looking for something like the efficiency of attaching and dropping table partitions in PostgreSQL.

jdctinuiti force-pushed the metadata-delete-partition-only branch from 4f27d66 to 74e0b75 on May 21, 2026 21:15
jdctinuiti force-pushed the metadata-delete-partition-only branch from 54f3e50 to cce732f on June 4, 2026 00:38
@jdctinuiti

Rebased this stack onto current origin/main and cleaned the branch history so the PRs carry only their feature changes. I intentionally left unrelated DuckDB-main compatibility fixes out of this PR stack; those should land separately. The only compatibility cleanup folded into #1171 is for expression accessors in the metadata-delete planner code introduced by this PR.

jdctinuiti mentioned this pull request Jun 5, 2026