feat: flush the delta buffer to duckdb in batched transactions #93
Optimized Binlog Replica Applier with Transaction Batching
The binlog replica applier reads updates from the binary log (binlog) of a primary MySQL server and applies them to a replica. This PR introduces an optimization that batches multiple primary transactions into a single replica transaction whenever possible, so the replica performs fewer commits. The goal is to improve both performance and reliability.
Benefits
Transaction Batching Process
1. Transaction start: The applier detects a new primary transaction in the binlog. For GTID-based replication, this is typically signaled by a `GTID` event followed by a `BEGIN` query event.
2. Batch extension: Instead of committing each transaction individually, the applier attempts to batch them using the `extendOrCommitBatchTxn` function. As long as the current replica transaction can be safely extended (e.g., the changes from the new primary transaction are pure ROW-format data changes), the applier adds those changes to the batch.
3. Batch commit: When the batch reaches a boundary (e.g., a DDL statement) or it is no longer optimal to extend it, the batch is committed. This ends a replica transaction that encapsulates multiple primary transactions.
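The three steps above can be sketched as a small event handler. This is a simplified, hypothetical Go sketch, not the PR's code: the `EventKind` values, the `Applier` type, and the `MaxTxnsPerBatch` cap are assumptions made for illustration. The PR only says a batch is committed when it is "no longer optimal to extend it", so a fixed cap on primary transactions per batch stands in for that criterion here. The real applier also tracks more state than this sketch does.

```go
package main

import "fmt"

// EventKind is a simplified, hypothetical classification of binlog events.
type EventKind int

const (
	GTID  EventKind = iota // starts a primary transaction (GTID-based replication)
	Begin                  // BEGIN query event
	Rows                   // ROW-format data change
	XID                    // ends (commits) a primary transaction
	DDL                    // DDL statement: always a batch boundary
)

// Event is a hypothetical, minimal binlog event.
type Event struct {
	Kind EventKind
	Data string
}

// Applier keeps a minimal, hypothetical version of the batching state.
type Applier struct {
	MaxTxnsPerBatch int // hypothetical cap on primary transactions per batch

	ongoingBatchTxn bool       // a replica transaction is currently open
	batchTxns       int        // primary transactions folded into the open batch
	batch           []string   // changes buffered in the open replica transaction
	Commits         [][]string // each entry is one committed replica transaction
}

// commitBatch closes the open replica transaction, if there is one.
func (a *Applier) commitBatch() {
	if !a.ongoingBatchTxn {
		return
	}
	a.Commits = append(a.Commits, a.batch)
	a.ongoingBatchTxn, a.batchTxns, a.batch = false, 0, nil
}

// Apply handles one event; the Rows/XID/DDL branches play the role of extendOrCommitBatchTxn.
func (a *Applier) Apply(e Event) {
	switch e.Kind {
	case GTID, Begin:
		// A new primary transaction starts; the open batch (if any) is kept.
	case Rows:
		if !a.ongoingBatchTxn {
			a.ongoingBatchTxn = true // open a new replica transaction
		}
		a.batch = append(a.batch, e.Data)
	case XID:
		// The primary transaction ended: keep extending the batch unless it is full.
		if a.ongoingBatchTxn {
			a.batchTxns++
			if a.batchTxns >= a.MaxTxnsPerBatch {
				a.commitBatch()
			}
		}
	case DDL:
		// DDL is a boundary: commit pending row changes, then apply the DDL on its own.
		a.commitBatch()
		a.Commits = append(a.Commits, []string{e.Data})
	}
}

func main() {
	a := &Applier{MaxTxnsPerBatch: 3}
	events := []Event{
		{GTID, ""}, {Begin, ""}, {Rows, "insert t1"}, {XID, ""},
		{GTID, ""}, {Begin, ""}, {Rows, "update t1"}, {XID, ""},
		{GTID, ""}, {DDL, "alter t1"},
		{GTID, ""}, {Begin, ""}, {Rows, "delete t1"}, {XID, ""},
	}
	for _, e := range events {
		a.Apply(e)
	}
	a.commitBatch() // flush whatever is still open at the end of the stream
	fmt.Println(len(a.Commits), a.Commits)
	// prints: 3 [[insert t1 update t1] [alter t1] [delete t1]]
}
```

With a cap of 3, the two row-only primary transactions share a single replica commit, the DDL gets its own commit, and the trailing delete is flushed at the end: four primary transactions become three replica commits.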
Implementation
The applier tracks whether it is inside a batched transaction using the `ongoingBatchTxn` flag and manages transaction boundaries with several other state variables, such as `dirtyTxn`, `dirtyStream`, and `pendingPosition`. The implementation can be viewed as a state machine that switches between several states. The `extendOrCommitBatchTxn` function decides whether to extend an ongoing batch by adding more primary transactions to the current replica transaction, or to commit the batch and finalize the replica transaction.

Currently, a batched transaction is closed and a new one is started in the following scenarios:
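Previously, the applied binlog position was stored in a separate file. With this PR, the position is stored transactionally in DuckDB instead, so it is committed together with the changes it covers. This makes the replica robust to unexpected shutdowns: after a restart, the stored position matches the data that was actually applied.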
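Why storing the position transactionally helps can be illustrated with a toy model. The Go sketch below does not use DuckDB or any DuckDB driver: `Store` and `Txn` are hypothetical in-memory stand-ins whose writes become visible only on `Commit`, the `pendingPosition` field only borrows the PR's variable name (its semantics here are assumed), and the GTID-set strings are illustrative. Because the new position is staged in the same transaction as the batch, a crash before commit discards both, and the applier resumes from the last committed position.

```go
package main

import "fmt"

// Store is a toy, in-memory stand-in for DuckDB (hypothetical): rows and the
// replication position only change when a Txn commits.
type Store struct {
	rows     []string
	position string // last committed binlog position (illustrative GTID set)
}

// Txn buffers row changes and the new position until Commit.
type Txn struct {
	store           *Store
	rows            []string
	pendingPosition string // position to persist together with rows (assumed semantics)
}

// Begin opens a new transaction on the store.
func (s *Store) Begin() *Txn { return &Txn{store: s} }

// Insert stages a row change; it stays invisible until Commit.
func (t *Txn) Insert(row string) { t.rows = append(t.rows, row) }

// SetPosition stages the binlog position inside the same transaction as the data.
func (t *Txn) SetPosition(pos string) { t.pendingPosition = pos }

// Commit publishes the staged rows and the staged position in one step.
func (t *Txn) Commit() {
	t.store.rows = append(t.store.rows, t.rows...)
	if t.pendingPosition != "" {
		t.store.position = t.pendingPosition
	}
}

func main() {
	s := &Store{position: "uuid:1-10"}

	// Batch 1 commits: its row and its position become visible together.
	t := s.Begin()
	t.Insert("row from txn 11")
	t.SetPosition("uuid:1-11")
	t.Commit()

	// Batch 2 never commits (simulated crash): neither its row nor its position persists.
	t = s.Begin()
	t.Insert("row from txn 12")
	t.SetPosition("uuid:1-12")

	// On restart, the applier would resume right after the stored position.
	fmt.Println(s.position, len(s.rows)) // prints: uuid:1-11 1
}
```

By contrast, when the position lives in a separate file, the data commit and the file write are two independent steps, and a crash between them leaves the recorded position out of sync with the data.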