feat: add support for COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP in the vtgate MySQL server
#18731
base: main
Codecov Report
❌ Patch coverage is
Coverage Diff (main → #18731):
Coverage: 69.68% → 69.67% (-0.01%)
Files: 1605 → 1605
Lines: 214485 → 214503 (+18)
Hits: 149463 → 149463
Misses: 65022 → 65040 (+18)
This PR is being marked as stale because it has been open for 30 days with no activity. If no action is taken within 7 days, this PR will be closed.
This PR was closed because it has been stale for 7 days with no activity.
Force-pushed from 1e9ef62 to 970d6c7.
Implements direct binlog streaming from MySQL through VTGate, allowing CDC tools to connect to VTGate using the MySQL replication protocol.
- Add BinlogDump RPC to QueryService interface
- Implement COM_BINLOG_DUMP_GTID handler in VTGate
- Stream raw MySQL packets through gRPC to clients
- Support tablet targeting via USE statement or username format
- Add graceful shutdown handling for long-lived streams
- Add end-to-end test for binlog dump functionality
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Force-pushed from 970d6c7 to 10725b5.
Title changed from "… COM_BINLOG_DUMP_GTID in the vtgate MySQL server" to "… COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP in the vtgate MySQL server".
Description
This PR implements direct binlog streaming from MySQL through VTGate, allowing CDC (Change Data Capture) tools like Debezium, Fivetran, and others to connect to VTGate using the standard MySQL replication protocol instead of requiring special VStream-aware adapters or direct access to MySQL instances.
CDC tools typically connect directly to MySQL and use the replication protocol (COM_BINLOG_DUMP or COM_BINLOG_DUMP_GTID) to stream binlog events. Today, Vitess does not support these commands in the MySQL server implementation exposed via VTGate. This PR adds that support, but only under very specific conditions: the connection on which these commands are executed must target a specific tablet alias (see #18808 and #18809).
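For reference, here is a hedged Go sketch of the request payloads a CDC client sends for these two commands, following the field layouts in the MySQL client/server protocol documentation. The function names are illustrative only, the GTID set is treated as an already-encoded opaque byte slice, and the 4-byte packet header that wraps every payload is omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	comBinlogDump     = 0x12 // COM_BINLOG_DUMP command byte
	comBinlogDumpGTID = 0x1e // COM_BINLOG_DUMP_GTID command byte
	// binlogThroughGTID is the flag bit saying that a GTID set follows.
	binlogThroughGTID uint16 = 0x04
)

// encodeBinlogDump builds a COM_BINLOG_DUMP payload:
// command(1) | binlog-pos(4) | flags(2) | server-id(4) | binlog-filename(rest)
func encodeBinlogDump(serverID uint32, file string, pos uint32, flags uint16) []byte {
	buf := []byte{comBinlogDump}
	buf = binary.LittleEndian.AppendUint32(buf, pos)
	buf = binary.LittleEndian.AppendUint16(buf, flags)
	buf = binary.LittleEndian.AppendUint32(buf, serverID)
	return append(buf, file...)
}

// encodeBinlogDumpGTID builds a COM_BINLOG_DUMP_GTID payload:
// command(1) | flags(2) | server-id(4) | filename-len(4) | filename |
// binlog-pos(8) | data-size(4) | data
// The trailing GTID section is only sent when binlogThroughGTID is set,
// which this sketch always does. encodedGTIDSet is passed through as-is.
func encodeBinlogDumpGTID(serverID uint32, file string, pos uint64, encodedGTIDSet []byte) []byte {
	buf := []byte{comBinlogDumpGTID}
	buf = binary.LittleEndian.AppendUint16(buf, binlogThroughGTID)
	buf = binary.LittleEndian.AppendUint32(buf, serverID)
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(file)))
	buf = append(buf, file...)
	buf = binary.LittleEndian.AppendUint64(buf, pos)
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(encodedGTIDSet)))
	return append(buf, encodedGTIDSet...)
}

func main() {
	fmt.Printf("% x\n", encodeBinlogDump(100, "binlog.000001", 4, 0))
	// Placeholder bytes standing in for an encoded GTID set.
	fmt.Printf("% x\n", encodeBinlogDumpGTID(100, "", 4, []byte{1, 2, 3}))
}
```

Note that COM_BINLOG_DUMP identifies the start point by file name and offset only, while COM_BINLOG_DUMP_GTID can additionally carry the set of already-executed GTIDs.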
It also overloads username parsing in VTGate to allow specifying the initial database as part of the username. This is needed because not all CDC tools allow configuring a default database when establishing the connection, since that setting has no effect on a "plain" MySQL binlog dump.
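The exact username format is not spelled out in this description, so the following Go sketch uses a hypothetical "user/database" convention, with "/" as the separator, purely for illustration; the real separator and any tablet-targeting syntax may differ. It shows the intended fallback: a username without the separator behaves like a plain username with no initial database.

```go
package main

import (
	"fmt"
	"strings"
)

// dbSeparator is HYPOTHETICAL: it stands in for whatever delimiter VTGate
// actually uses to embed the initial database in the username.
const dbSeparator = "/"

// splitUsername splits a login name such as "cdc_user/commerce" into the
// account name and the initial database. Without the separator the whole
// string is the account name and no database is selected.
func splitUsername(raw string) (user, database string) {
	user, database, _ = strings.Cut(raw, dbSeparator)
	return user, database
}

func main() {
	for _, raw := range []string{"cdc_user/commerce", "cdc_user"} {
		user, db := splitUsername(raw)
		fmt.Printf("user=%q database=%q\n", user, db)
	}
}
```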
Architecture
The implementation aims to keep overhead low:
- … COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID to MySQL, vttablet streams the response to vtgate with low overhead. vttablet reads one packet fragment at a time (up to 16MB of data), wraps it in BinlogDumpResponse messages and sends them over gRPC to vtgate. vtgate in turn unwraps these fragments and writes them to the CDC client via the MySQL protocol. There's no parsing of actual binlog event data.
- … MoveTables or Reshard operations.

If any of the above limitations are problematic, use the VStream API instead. The affected use cases will not (and technically cannot) be supported via COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID.
Drawbacks
As mentioned before, COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP requires the connection to be in tablet targeting mode. The reason is that effectively all CDC tools employ the following steps to get to the data:
- … SELECT statements.
- … COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID from the previously stored binlog position.

This ensures that a complete snapshot of the data can be established, and that it stays consistent while being updated continuously.
Without tablet targeting, the initial binlog position and the data selected via SELECT statements cannot be guaranteed to be consistent, as each query could end up being routed to a different tablet, and some data might be lost in the process. With tablet targeting, we are guaranteed to see a consistent view of the data.
Open Topics
Replicating Session Variables
COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP behavior can be changed via two user-defined variables, each available under two names: @source_heartbeat_period/@master_heartbeat_period and @source_binlog_checksum/@master_binlog_checksum. I know that at least the Fivetran MySQL adapter modifies these variables before starting the binlog dump.

We should make sure that we replicate the values of these variables into the connection used for binlog streaming before COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP is executed.
Lower Overhead While Streaming Events
I think it's possible to further reduce the number of buffer copies that happen in the streaming machinery. Using gRPC's mem.BufferSlice could reduce buffer copying even further.
Missing Limits
Each COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID stream opens a new dedicated connection to MySQL, which adds CPU and IO overhead on MySQL, and also on all components between the client and MySQL. There are currently no limits on how many such connections can be opened. Do we need operator-configurable limits here?
ACL integration
Binlog access can have serious security implications. Binlogs contain not just DML and DDL change events, but also things like password changes and other internal data (internal both to Vitess and to MySQL). It seems prudent not to allow every user with access to VTGate to stream all this information.
Graceful shutdown
Shutting down vtgate or vttablet should gracefully close the streaming connection.
Documentation
We should make sure we clearly document the use case for this feature as well as the drawbacks.
Related Issue(s)
Checklist
Deployment Notes
AI Disclosure