@arthurschreiber
Member

@arthurschreiber arthurschreiber commented Oct 8, 2025

Description

This PR implements direct binlog streaming from MySQL through VTGate, allowing CDC (Change Data Capture) tools such as Debezium and Fivetran to connect to VTGate using the standard MySQL replication protocol instead of requiring special VStream-aware adapters or direct access to MySQL instances.

CDC tools typically connect directly to MySQL using the replication protocol (COM_BINLOG_DUMP or COM_BINLOG_DUMP_GTID) to stream binlog events. Today, Vitess does not support these commands in the MySQL server implementation exposed via VTGate.
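
For reference, a minimal sketch of what a client puts on the wire for COM_BINLOG_DUMP, following the field order in the MySQL protocol documentation (command byte 0x12, 4-byte position, 2-byte flags, 4-byte server ID, then the binlog file name). The function name `buildBinlogDump` is made up for illustration and is not part of this PR; the 4-byte MySQL packet header is left out.

```go
package main

import "encoding/binary"

// comBinlogDump is the command byte for COM_BINLOG_DUMP.
const comBinlogDump = 0x12

// buildBinlogDump encodes the payload of a COM_BINLOG_DUMP command.
// All integers are little-endian. The file name is the last field and
// runs to the end of the packet, so it carries no length prefix.
// Hypothetical helper for illustration only.
func buildBinlogDump(serverID uint32, filename string, pos uint32, flags uint16) []byte {
	buf := make([]byte, 0, 11+len(filename))
	buf = append(buf, comBinlogDump)
	buf = binary.LittleEndian.AppendUint32(buf, pos)
	buf = binary.LittleEndian.AppendUint16(buf, flags)
	buf = binary.LittleEndian.AppendUint32(buf, serverID)
	buf = append(buf, filename...)
	return buf
}
```

COM_BINLOG_DUMP_GTID uses command byte 0x1e and a different field layout that carries a GTID set instead of relying only on a file name and position.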

This PR adds the missing support, but only under specific conditions: the connection on which these commands are executed must be targeting a specific tablet alias (see #18808 and #18809).

It also overloads username parsing in VTGate so that the initial database can be specified as part of the username. Not all CDC tools allow configuring a default database when establishing the connection, because that setting has no effect on a "plain" MySQL binlog dump.

Architecture

┌─────────────┐      MySQL       ┌─────────────┐       gRPC        ┌─────────────┐      MySQL       ┌─────────────┐
│             │    Protocol      │             │                   │             │    Protocol      │             │
│  CDC Client │◄────────────────►│   VTGate    │◄─────────────────►│  vttablet   │◄────────────────►│    MySQL    │
│             │                  │             │                   │             │                  │             │
└─────────────┘                  └─────────────┘                   └─────────────┘                  └─────────────┘
                                       │
                                       │
                              Target specified via
                            username or USE statement:
                          user|keyspace:shard@type|alias
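
A minimal sketch of how a username in the `user|keyspace:shard@type|alias` form shown in the diagram could be split into its parts. This is an assumption about the parsing, not the code in this PR: the type `userTarget`, the function `splitUserTarget`, and the example values are all hypothetical.

```go
package main

import "strings"

// userTarget holds the pieces of a "user|keyspace:shard@type|alias" username.
// Hypothetical type for illustration only.
type userTarget struct {
	User   string // the actual login name
	Target string // "keyspace:shard@type", empty if not given
	Alias  string // tablet alias, empty if not given
}

// splitUserTarget splits on the first two "|" separators. A username
// without any "|" is returned unchanged in User.
func splitUserTarget(username string) userTarget {
	parts := strings.SplitN(username, "|", 3)
	out := userTarget{User: parts[0]}
	if len(parts) > 1 {
		out.Target = parts[1]
	}
	if len(parts) > 2 {
		out.Alias = parts[2]
	}
	return out
}
```

With this sketch, `splitUserTarget("cdc|commerce:-80@replica|zone1-0000000101")` yields the user `cdc`, the target `commerce:-80@replica`, and the alias `zone1-0000000101`.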

The implementation tries to be low overhead:

  • After sending COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID to MySQL, vttablet reads the response one packet fragment at a time (up to 16MB of data), wraps each fragment in a BinlogDumpResponse message, and sends it over gRPC to vtgate. vtgate in turn unwraps these fragments and writes them to the CDC client via the MySQL protocol. The actual binlog event data is never parsed.
  • There's no filtering of binlog data.
  • There's no data concatenation / merging for data from different shards - the client is required to run one binlog stream process per shard.
  • There's no automatic failover - if a tablet becomes unavailable, the streaming will fail and the client will need to pick a new tablet to resume the streaming process.
  • There's no support for things like MoveTables or Reshard operations.

If any of the above limitations are problematic, the VStream API is the answer. These capabilities will not (and technically cannot) be supported via COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID.

Drawbacks

As mentioned before, COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP requires the connection to be in tablet targeting mode. The reason is that effectively all CDC tools use the following steps to get to the data:

  • Connect to a node in the cluster and get the current binlog position.
  • Dump all the required data via SELECT statements.
  • Run COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID from the previously stored binlog position.

This ensures that a complete snapshot of the data can be established, with all required data being consistent and being updated continuously.

Without tablet targeting, the initial binlog position and the data read via SELECT statements cannot be guaranteed to be consistent: each query could be routed to a different tablet, and we might lose some data in the process. With tablet targeting, we are guaranteed to see a consistent view of the data.

Open Topics

Replicating Session Variables

COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP behavior can be changed via two user-defined variables, each of which exists under two names (four in total): @source_heartbeat_period/@master_heartbeat_period and @source_binlog_checksum/@master_binlog_checksum. I know that at least the Fivetran MySQL adapter modifies these variables before starting the binlog dump.

We should make sure that we replicate the value of these variables into the connection used for binlog streaming before COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP is executed.
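
One possible shape for this, as a rough sketch only: collect whichever of the four variables the client has set and replay them in a single SET statement on the dedicated connection before the dump command is sent. The function `replaySetStatement` and the `sessionVars` map are hypothetical, and the sketch assumes the map values are already safely encoded SQL literals.

```go
package main

import "strings"

// binlogDumpVars lists the user-defined variables named above, in a
// fixed order so that the generated statement is deterministic.
var binlogDumpVars = []string{
	"source_heartbeat_period",
	"master_heartbeat_period",
	"source_binlog_checksum",
	"master_binlog_checksum",
}

// replaySetStatement builds one SET statement for the variables present
// in sessionVars, or returns "" if none of them is set.
// ASSUMPTION: values are SQL literals that were encoded safely upstream.
func replaySetStatement(sessionVars map[string]string) string {
	var parts []string
	for _, name := range binlogDumpVars {
		if lit, ok := sessionVars[name]; ok {
			parts = append(parts, "@"+name+" = "+lit)
		}
	}
	if len(parts) == 0 {
		return ""
	}
	return "SET " + strings.Join(parts, ", ")
}
```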

Lower Overhead While Streaming Events

I think it's possible to further reduce the number of buffer copies that happen in the streaming machinery. There's also potential to further reduce buffer copying by using gRPC's mem.BufferSlice.

Missing Limits

Each COM_BINLOG_DUMP/COM_BINLOG_DUMP_GTID stream opens a new dedicated connection to MySQL, which adds CPU and IO overhead on MySQL, but also on all components between the client and MySQL. There are currently no limits on how many connections can be opened. Do we need operator-configurable limits here?
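
If a limit turns out to be needed, a simple option is a non-blocking semaphore that rejects new streams once a configured maximum is reached. The sketch below is one possible shape under that assumption. `streamLimiter`, `newStreamLimiter`, and `errTooManyStreams` are hypothetical names; nothing like this exists in the PR yet.

```go
package main

import "errors"

// errTooManyStreams is returned when every slot is in use.
var errTooManyStreams = errors.New("too many concurrent binlog dump streams")

// streamLimiter caps the number of concurrent streams with a buffered
// channel used as a counting semaphore.
type streamLimiter struct {
	slots chan struct{}
}

// newStreamLimiter allows at most max streams at once.
func newStreamLimiter(max int) *streamLimiter {
	return &streamLimiter{slots: make(chan struct{}, max)}
}

// acquire reserves a slot without blocking. On success it returns a
// release function that must be called exactly once when the stream ends.
func (l *streamLimiter) acquire() (func(), error) {
	select {
	case l.slots <- struct{}{}:
		return func() { <-l.slots }, nil
	default:
		return nil, errTooManyStreams
	}
}
```

Rejecting immediately rather than queueing lets a CDC client see a clear error and retry against another tablet instead of waiting on a connection that may never free up.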

ACL integration

Binlog access can have wild security implications. Binlogs contain not just DML and DDL change events, but also things like password changes and other internal data (internal both to Vitess as well as MySQL). It seems prudent not to allow every user that has access to VTGate to stream all this information.

Graceful shutdown

Shutting down vtgate or vttablet should gracefully close the streaming connection.

Documentation

We should make sure we clearly document the use case for this feature as well as the drawbacks.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

@vitess-bot
Contributor

vitess-bot bot commented Oct 8, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the labels NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, NeedsWebsiteDocsUpdate on Oct 8, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Oct 8, 2025
@codecov

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 0% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.67%. Comparing base (d6ce439) to head (338bb55).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| go/vt/vttablet/queryservice/wrapped.go | 0.00% | 5 Missing ⚠️ |
| go/vt/vttablet/tabletserver/tabletserver.go | 0.00% | 4 Missing ⚠️ |
| go/vt/vtcombo/tablet_map.go | 0.00% | 3 Missing ⚠️ |
| go/vt/vttablet/grpctabletconn/conn.go | 0.00% | 2 Missing ⚠️ |
| go/vt/vttablet/sandboxconn/sandboxconn.go | 0.00% | 2 Missing ⚠️ |
| go/vt/vttablet/tabletconntest/fakequeryservice.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18731      +/-   ##
==========================================
- Coverage   69.68%   69.67%   -0.01%     
==========================================
  Files        1605     1605              
  Lines      214485   214503      +18     
==========================================
  Hits       149463   149463              
- Misses      65022    65040      +18     

@systay systay modified the milestones: v23.0.0, v23.0.1 Nov 4, 2025
@github-actions
Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale label on Dec 22, 2025
@github-actions
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Dec 29, 2025
@arthurschreiber arthurschreiber added the labels Type: Feature, Component: VTTablet, Component: VTGate, NeedsWebsiteDocsUpdate and removed the labels Stale, NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, NeedsBackportReason on Jan 19, 2026
Implements direct binlog streaming from MySQL through VTGate, allowing
CDC tools to connect to VTGate using MySQL replication protocol.

- Add BinlogDump RPC to QueryService interface
- Implement COM_BINLOG_DUMP_GTID handler in VTGate
- Stream raw MySQL packets through gRPC to clients
- Support tablet targeting via USE statement or username format
- Add graceful shutdown handling for long-lived streams
- Add end-to-end test for binlog dump functionality

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
@arthurschreiber arthurschreiber changed the title from "[WIP] Add support for COM_BINLOG_DUMP_GTID in the vtgate MySQL server" to "[WIP] feat: add support for COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP in the vtgate MySQL server" on Jan 23, 2026
@arthurschreiber arthurschreiber changed the title from "[WIP] feat: add support for COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP in the vtgate MySQL server" to "feat: add support for COM_BINLOG_DUMP_GTID/COM_BINLOG_DUMP in the vtgate MySQL server" on Jan 23, 2026