fix: concurrent-commit correctness on the postgres metadata backend#1113
Open
hello-world-bfree wants to merge 1 commit into
Open
fix: concurrent-commit correctness on the postgres metadata backend#1113hello-world-bfree wants to merge 1 commit into
hello-world-bfree wants to merge 1 commit into
Conversation
…t-order serialization. schemas keyed on snapshot_id, not schema_version. partial unique indexes to replace retry signal. optimistic concurrency stats. move CheckForConflicts post-lock. add conflict detection tests. handle all combinations of conlict, add test
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
This is follow-up to #1044 , addresses issue #1094. Original PR closed by @pdet citing concerns that conflict resolution was being bypassed. That feedback surfaced a window where a concurrent committer could land changes between the check and the lock. This was due to running
CheckForConflictsbeforeAcquireCommitLock. This PR resolves that as well as adds definitive conflict-detection tests and better frames/explains the issue. It's not actually performance related (that'll require the stored proc implementation that @pdet mentioned) but correctness related. Given the level of effort and cost for the stored proc implementation (as I understand it at least) this c++ implementation would stop the bleeding and allow for incremental progress towards the stored proc implementation with minimal cost.Problem
*_idallocation usesmax(id) + 1in C++ across two postgres round trips. Concurrent writers race, hitducklake_snapshot_pkeyviolations, retry succeeds but logs fill at >1 MB/h per cluster, and correctness bugs are introduced.Multiple silent-correctness failure modes on the postgres metadata backend under concurrent writers. PK-violation log spam (>1 MB/h per cluster) is the visible symptom; the underlying defects are worse:
CheckForConflictsand rely onducklake_snapshot_pkeycollision as the sole conflict signal. INSERT × ALTER, INSERT × DROP, ALTER × ALTER, COMPACT × ALTER, ALTER-view × DROP-view all silently both-commit. Catalog ends up referencing dropped tables, columns added to non-existent tables, etc.CREATE TABLE(same name), two concurrentADD COLUMN(same name), or two concurrent DELETEs on the same data file all both-commit.ducklake_table/ducklake_column/ducklake_delete_fileend up with duplicate active rows; subsequent resolution is nondeterministic.ducklake_table_statsis last-writer-wins with no version tracking. Planner reads stale cardinalities.GetSchemaForSnapshotwas keyed onschema_version, but two snapshots can shareschema_versionand see different tables (filtered bybegin_snapshot).AT(VERSION => N)can return wrong column sets. - Default isolation hides committers. postgres_scanner attaches atREPEATABLE_READ. Once a DuckLake tx issues any metadata read, its PG snapshot pins for the rest of the tx, soCheckForConflictssees emptychanges_madeeven when conflicts exist.Change
Allocate ids with postgres sequences (
nextval,cache 1). This is race-free so there are no PK violations to log and create unnecessary bloat. Sequences are bootstrapped on attach andsetval()'d from currentmaxso existing catalogs migrate without issue. Decoupling allocation from commit order requires:pg_advisory_xact_lock(30 s timeout) around snapshot allocation + commit.SetCommittedSnapshotIdbecomes monotonic-max. - Partial unique indexes on active rows ofducklake_schema(schema_name),ducklake_table(schema_id, table_name),ducklake_view(schema_id, view_name),ducklake_delete_file(data_file_id).WHERE end_snapshot IS NULL. Replaces PK-violation as the conflict signal.CheckForConflictsruns after the lock on every attempt. Authoritative under the commit critical section.can_retrygated so logical conflicts don't retry; transient lock errors do.stats_versioncolumn onducklake_table_statsplus post-commitSELECTto catch silent zero-row UPDATEs.snapshot_idinstead ofschema_version.isolation_level 'read committed'for PG metadata (overridable). Applies for bothmetadata_type == "postgres"and"postgres_scanner"— the original change missed the latter, which is whatDBPathAndType::ExtractExtensionPrefixactually returns for thepostgres:prefix.ConflictCheckcalls added coveringaltered_tables ↔ tables_merge_adjacent,altered_tables ↔ tables_rewrite_delete,altered_views ↔ dropped_views(both directions). Pre-existing matrix omissions, surfaced while building the test suite.Tests
test/sql/concurrent/concurrent_pg_conflict_detection.test(new); 17 deterministic scenarios + iso-level regression guard + smoke. Each scenario uses named connections + explicitBEGIN/COMMITand asserts on the regex-matched error from the labelledConflictCheckline, so passing means the path actually fired.CheckForConflictspost-lockcurrent_setting('transaction_isolation')from internal metadata catalog, assertsread committed(catches reintroduction of thepostgres_scannerbranch typo class)concurrentloopCREATE same name converges to one active rowCompact × INSERT remains by-design no-conflict, asserted by the existing
test/sql/compaction/compaction_delete_conflict.test:119-138.[concurrent]suite: 470 assertions / 8 files, all green.[compaction]suite green on modified paths (compaction_delete_conflict.test, 50 assertions).Warning
The ci does have failures but they're upstream in
duckdb:mainDataChunk::Verify - size mismatch: vector 0 (BIGINT) has size 0 but chunk has size 1test/sql/alter/struct_evolution_list_alter.test. It's flipped back and forth between skipped and unskipped recently