---
title: Manage TiCDC Cluster and Replication Tasks
summary: Learn how to manage a TiCDC cluster and replication tasks.
aliases: ['/docs/dev/ticdc/manage-ticdc/','/docs/dev/reference/tools/ticdc/manage/']
---

Manage TiCDC Cluster and Replication Tasks

This document describes how to deploy a TiCDC cluster and how to manage the TiCDC cluster and replication tasks through the command line tool cdc cli and the HTTP interface.

Deploy and install TiCDC

You can deploy TiCDC using either TiUP or Binary.

Software and hardware recommendations

In production environments, the recommended software and hardware configurations for TiCDC are as follows:

| Linux OS | Version |
| :------- | :------ |
| Red Hat Enterprise Linux | 7.3 or later versions |
| CentOS | 7.3 or later versions |

| CPU | Memory | Disk type | Network | Number of TiCDC cluster instances (minimum requirements for production environment) |
| :-- | :----- | :-------- | :------ | :----------------------------------------------------------------------------------- |
| 16 core+ | 64 GB+ | SSD | 10 Gigabit network card (2 preferred) | 2 |

For more information, see Software and Hardware Recommendations.

Deploy and install TiCDC using TiUP

If you use TiUP to deploy TiCDC, you can choose one of the following ways:

  • Deploy TiCDC when deploying a TiDB cluster
  • Deploy a TiCDC component on an existing TiDB cluster

Deploy TiCDC when deploying a TiDB cluster

For details, refer to Deploy a TiDB Cluster Using TiUP.

Deploy a TiCDC component on an existing TiDB cluster

  1. First, make sure that the current TiDB version supports TiCDC; otherwise, you need to upgrade the TiDB cluster to v4.0.0-rc.1 or a later version.

  2. To deploy TiCDC, refer to Scale out a TiCDC cluster.

Use Binary

Using binary, you can only deploy the TiCDC component on an existing TiDB cluster.

Suppose that the PD cluster has a PD node (the client URL is 10.0.10.25:2379) that can provide services. If you want to deploy three TiCDC nodes, start the TiCDC cluster by executing the following commands. Note that you only need to specify the same PD address; the newly started nodes automatically join the TiCDC cluster.

{{< copyable "shell-regular" >}}

cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_1.log --addr=0.0.0.0:8301 --advertise-addr=127.0.0.1:8301
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --addr=0.0.0.0:8302 --advertise-addr=127.0.0.1:8302
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:8303 --advertise-addr=127.0.0.1:8303

The following are descriptions of options available in the cdc server command:

  • gc-ttl: The TTL (Time To Live) of the service level GC safepoint in PD set by TiCDC, in seconds. The default value is 86400, which means 24 hours.
  • pd: The URL of the PD client.
  • addr: The listening address of TiCDC, the HTTP API address, and the Prometheus address of the service.
  • advertise-addr: The externally advertised address via which clients access TiCDC.
  • tz: Time zone used by the TiCDC service. TiCDC uses this time zone when time data types such as TIMESTAMP are converted internally or when data are replicated to the downstream. The default is the local time zone in which the process runs.
  • log-file: The path of the log file of the TiCDC process. The default is cdc.log.
  • log-level: The log level when the TiCDC process is running. The default is info.
  • ca: The path of the CA certificate file used by TiCDC, in the PEM format (optional).
  • cert: The path of the certificate file used by TiCDC, in the PEM format (optional).
  • key: The path of the certificate key file used by TiCDC, in the PEM format (optional).
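
To see how these options combine, here is a minimal sketch of starting a single TiCDC node with an explicit GC TTL, time zone, and log settings; the time zone value and the log file name are placeholder choices for illustration.

{{< copyable "shell-regular" >}}

cdc server --pd=http://10.0.10.25:2379 \
    --addr=0.0.0.0:8301 \
    --advertise-addr=127.0.0.1:8301 \
    --gc-ttl=86400 \
    --tz="Asia/Shanghai" \
    --log-file=ticdc_1.log \
    --log-level=info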

Upgrade TiCDC using TiUP

This section introduces how to upgrade the TiCDC cluster using TiUP. In the following example, assume that you need to upgrade TiCDC and the entire TiDB cluster to v4.0.6.

{{< copyable "shell-regular" >}}

tiup update --self && \
tiup update --all && \
tiup cluster upgrade <cluster-name> v4.0.6

Notes for upgrade

Use TLS

For details about using encrypted data transmission (TLS), see Enable TLS Between TiDB Components.
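
Based on the ca, cert, and key options described above, the following is a sketch of starting a TLS-enabled TiCDC node; the certificate paths are placeholders, and the PD URL is assumed to use https once TLS is enabled between components.

{{< copyable "shell-regular" >}}

cdc server --pd=https://10.0.10.25:2379 \
    --log-file=ticdc_tls.log \
    --addr=0.0.0.0:8301 \
    --advertise-addr=127.0.0.1:8301 \
    --ca=/path/to/ca.pem \
    --cert=/path/to/ticdc-cert.pem \
    --key=/path/to/ticdc-key.pem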

Use cdc cli to manage cluster status and data replication task

This section introduces how to use cdc cli to manage a TiCDC cluster and data replication tasks. The following interface description assumes that PD listens on 10.0.10.25 and the port is 2379.

Manage TiCDC service processes (capture)

  • Query the capture list:

    {{< copyable "shell-regular" >}}

    cdc cli capture list --pd=http://10.0.10.25:2379
    [
      {
        "id": "806e3a1b-0e31-477f-9dd6-f3f2c570abdd",
        "is-owner": true,
        "address": "127.0.0.1:8300"
      },
      {
        "id": "ea2a4203-56fe-43a6-b442-7b295f458ebc",
        "is-owner": false,
        "address": "127.0.0.1:8301"
      }
    ]
    
    • id: The ID of the service process.
    • is-owner: Indicates whether the service process is the owner node.
    • address: The address via which the service process provides its interface to the outside.

Manage replication tasks (changefeed)

Create a replication task

Execute the following commands to create a replication task:

{{< copyable "shell-regular" >}}

cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task"
Create changefeed successfully!
ID: simple-replication-task
Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
  • --changefeed-id: The ID of the replication task. The format must match the ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$ regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.
  • --sink-uri: The downstream address of the replication task. Configure --sink-uri according to the following format. Currently, the scheme supports mysql/tidb/kafka/pulsar.
  • --start-ts: Specifies the starting TSO of the changefeed. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.
  • --target-ts: Specifies the ending TSO of the changefeed. The TiCDC cluster stops pulling data once this TSO is reached. The default value is empty, which means that TiCDC does not automatically stop pulling data.
  • --config: Specifies the configuration file of the changefeed.

{{< copyable "" >}}

[scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
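
As a sketch of how these options fit together, the following hypothetical command creates a changefeed that starts from a specified TSO and stops once the target TSO is reached; the TSO values and the changefeed ID are placeholders.

{{< copyable "shell-regular" >}}

cdc cli changefeed create --pd=http://10.0.10.25:2379 \
    --sink-uri="mysql://root:123456@127.0.0.1:3306/" \
    --changefeed-id="bounded-replication-task" \
    --start-ts=415241823337054209 \
    --target-ts=415241823337054300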

Configure sink URI with mysql/tidb

Sample configuration:

{{< copyable "shell-regular" >}}

--sink-uri="mysql://root:123456@127.0.0.1:3306/?worker-count=16&max-txn-row=5000"

The following are descriptions of parameters and parameter values that can be configured for the sink URI with mysql/tidb:

| Parameter/Parameter Value | Description |
| :------------------------ | :---------- |
| root | The username of the downstream database |
| 123456 | The password of the downstream database |
| 127.0.0.1 | The IP address of the downstream database |
| 3306 | The port of the downstream database |
| worker-count | The number of SQL statements that can be concurrently executed to the downstream (optional, 16 by default) |
| max-txn-row | The size of a transaction batch that can be executed to the downstream (optional, 256 by default) |
| ssl-ca | The path of the CA certificate file needed to connect to the downstream MySQL instance (optional) |
| ssl-cert | The path of the certificate file needed to connect to the downstream MySQL instance (optional) |
| ssl-key | The path of the certificate key file needed to connect to the downstream MySQL instance (optional) |
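
For instance, a sketch of a sink URI that also enables TLS to the downstream MySQL instance using the ssl-ca, ssl-cert, and ssl-key parameters above; the certificate paths are placeholders.

{{< copyable "shell-regular" >}}

--sink-uri="mysql://root:123456@127.0.0.1:3306/?worker-count=16&max-txn-row=5000&ssl-ca=/path/to/ca.pem&ssl-cert=/path/to/client-cert.pem&ssl-key=/path/to/client-key.pem"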

Configure sink URI with kafka

Sample configuration:

{{< copyable "shell-regular" >}}

--sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&partition-num=6&max-message-bytes=67108864&replication-factor=1"

The following are descriptions of parameters and parameter values that can be configured for the sink URI with kafka:

| Parameter/Parameter Value | Description |
| :------------------------ | :---------- |
| 127.0.0.1 | The IP address of the downstream Kafka services |
| 9092 | The port for the downstream Kafka |
| cdc-test | The name of the Kafka topic |
| kafka-version | The version of the downstream Kafka (optional, 2.4.0 by default. Currently, the earliest supported Kafka version is 0.11.0.2 and the latest one is 2.6.0.) |
| kafka-client-id | Specifies the Kafka client ID of the replication task (optional, TiCDC_sarama_producer_replication ID by default) |
| partition-num | The number of the downstream Kafka partitions (optional. The value must be no greater than the actual number of partitions. If you do not configure this parameter, the partition number is obtained automatically.) |
| max-message-bytes | The maximum size of data that is sent to the Kafka broker each time (optional, 64MB by default) |
| replication-factor | The number of Kafka message replicas that can be saved (optional, 1 by default) |
| protocol | The protocol with which messages are output to Kafka. The value options are default, canal, avro, and maxwell (default by default) |
| ca | The path of the CA certificate file needed to connect to the downstream Kafka instance (optional) |
| cert | The path of the certificate file needed to connect to the downstream Kafka instance (optional) |
| key | The path of the certificate key file needed to connect to the downstream Kafka instance (optional) |
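
Similarly, a sketch of a Kafka sink URI that selects the canal protocol, sets a client ID, and connects over TLS; the client ID and certificate paths are placeholder assumptions.

{{< copyable "shell-regular" >}}

--sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal&kafka-client-id=cdc-test-client&ca=/path/to/ca.pem&cert=/path/to/client-cert.pem&key=/path/to/client-key.pem"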

Configure sink URI with pulsar

Sample configuration:

{{< copyable "shell-regular" >}}

--sink-uri="pulsar://127.0.0.1:6650/cdc-test?connectionTimeout=2s"

The following are descriptions of parameters that can be configured for the sink URI with pulsar:

| Parameter | Description |
| :-------- | :---------- |
| connectionTimeout | The timeout for establishing a connection to the downstream Pulsar, which is optional and defaults to 30 (seconds) |
| operationTimeout | The timeout for performing an operation on the downstream Pulsar, which is optional and defaults to 30 (seconds) |
| tlsTrustCertsFilePath | The path of the CA certificate file needed to connect to the downstream Pulsar instance (optional) |
| tlsAllowInsecureConnection | Determines whether to allow unencrypted connection after TLS is enabled (optional) |
| tlsValidateHostname | Determines whether to verify the host name of the certificate from the downstream Pulsar (optional) |
| maxConnectionsPerBroker | The maximum number of connections allowed to a single downstream Pulsar broker, which is optional and defaults to 1 |
| auth.tls | Uses the TLS mode to verify the downstream Pulsar (optional). For example, "{"tlsCertFile":"/path/to/cert", "tlsKeyFile":"/path/to/key"}". |
| auth.token | Uses the token mode to verify the downstream Pulsar (optional). For example, "{"token":"secret-token"}" or "{"file":"path/to/secret-token-file"}". |
| name | The name of the Pulsar producer in TiCDC (optional) |
| maxPendingMessages | The maximum size of the queue of messages pending confirmation from Pulsar, which is optional and defaults to 1000 |
| disableBatching | Disables automatically sending messages in batches (optional) |
| batchingMaxPublishDelay | Sets the duration within which the messages sent are batched (default: 10ms) |
| compressionType | Sets the compression algorithm used for sending messages (optional). The value options are LZ4, ZLIB, and ZSTD (default). |
| hashingScheme | The hash algorithm used for choosing the partition to which a message is sent (optional). The value options are JavaStringHash (default) and Murmur3. |
| properties.* | The customized properties added to the Pulsar producer in TiCDC (optional). For example, properties.location=Hangzhou. |

For more parameters of Pulsar, see pulsar-client-go ClientOptions and pulsar-client-go ProducerOptions.
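
As a further sketch using only parameters from the table above, the following URI sets compression, a larger pending-message queue, and a custom producer property; the specific values are placeholder choices.

{{< copyable "shell-regular" >}}

--sink-uri="pulsar://127.0.0.1:6650/cdc-test?connectionTimeout=2s&compressionType=LZ4&maxPendingMessages=2000&properties.location=Hangzhou"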

Use the task configuration file

For more replication configuration (for example, specify replicating a single table), see Task configuration file.

You can use a configuration file to create a replication task in the following way:

{{< copyable "shell-regular" >}}

cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --config changefeed.toml

In the command above, changefeed.toml is the configuration file for the replication task.
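
For illustration, the following is a minimal changefeed.toml sketch that restricts replication to a single hypothetical table test.tbl1, using only options described in the Task configuration file section below.

{{< copyable "" >}}

# Keep database and table names case-sensitive (the default).
case-sensitive = true

[filter]
# Replicate only the hypothetical table test.tbl1.
rules = ['test.tbl1']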

Query the replication task list

Execute the following command to query the replication task list:

{{< copyable "shell-regular" >}}

cdc cli changefeed list --pd=http://10.0.10.25:2379
[{
    "id": "simple-replication-task",
    "summary": {
      "state": "normal",
      "tso": 417886179132964865,
      "checkpoint": "2020-07-07 16:07:44.881",
      "error": null
    }
}]
  • checkpoint indicates that TiCDC has already replicated data before this time point to the downstream.
  • state indicates the state of the replication task.
    • normal: The replication task runs normally.
    • stopped: The replication task is stopped (manually paused or stopped by an error).
    • removed: The replication task is removed. Tasks of this state are displayed only when you have specified the --all option. To see these tasks when this option is not specified, execute the changefeed query command.
    • finished: The replication task is finished (data is replicated to the target-ts). Tasks of this state are displayed only when you have specified the --all option. To see these tasks when this option is not specified, execute the changefeed query command.
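
To also display the removed and finished tasks, specify the --all option mentioned above:

{{< copyable "shell-regular" >}}

cdc cli changefeed list --pd=http://10.0.10.25:2379 --all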

Query a specific replication task

To query a specific replication task, execute the changefeed query command. The query result includes the task information and the task state. You can specify the --simple or -s argument to simplify the query result so that it includes only the basic replication state and the checkpoint information. If you do not specify this argument, the detailed task configuration, replication state, and replication table information are output.

{{< copyable "shell-regular" >}}

cdc cli changefeed query -s --pd=http://10.0.10.25:2379 --changefeed-id=simple-replication-task
{
 "state": "normal",
 "tso": 419035700154597378,
 "checkpoint": "2020-08-27 10:12:19.579",
 "error": null
}

In the command and result above:

  • state is the replication state of the current changefeed. Each state must be consistent with the state in changefeed list.
  • tso represents the largest transaction TSO in the current changefeed that has been successfully replicated to the downstream.
  • checkpoint represents the corresponding time of the largest transaction TSO in the current changefeed that has been successfully replicated to the downstream.
  • error records whether an error has occurred in the current changefeed.

{{< copyable "shell-regular" >}}

cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-replication-task
{
  "info": {
    "sink-uri": "mysql://127.0.0.1:3306/?max-txn-row=20\u0026worker-number=4",
    "opts": {},
    "create-time": "2020-08-27T10:33:41.687983832+08:00",
    "start-ts": 419036036249681921,
    "target-ts": 0,
    "admin-job-type": 0,
    "sort-engine": "memory",
    "sort-dir": ".",
    "config": {
      "case-sensitive": true,
      "enable-old-value": false,
      "filter": {
        "rules": [
          "*.*"
        ],
        "ignore-txn-start-ts": null,
        "ddl-allow-list": null
      },
      "mounter": {
        "worker-num": 16
      },
      "sink": {
        "dispatchers": null,
        "protocol": "default"
      },
      "cyclic-replication": {
        "enable": false,
        "replica-id": 0,
        "filter-replica-ids": null,
        "id-buckets": 0,
        "sync-ddl": false
      },
      "scheduler": {
        "type": "table-number",
        "polling-time": -1
      }
    },
    "state": "normal",
    "history": null,
    "error": null
  },
  "status": {
    "resolved-ts": 419036036249681921,
    "checkpoint-ts": 419036036249681921,
    "admin-job-type": 0
  },
  "count": 0,
  "task-status": [
    {
      "capture-id": "97173367-75dc-490c-ae2d-4e990f90da0f",
      "status": {
        "tables": {
          "47": {
            "start-ts": 419036036249681921,
            "mark-table-id": 0
          }
        },
        "operation": null,
        "admin-job-type": 0
      }
    }
  ]
}

In the command and result above:

  • info is the replication configuration of the queried changefeed.
  • status is the replication state of the queried changefeed.
    • resolved-ts: The largest transaction TS in the current changefeed. Note that this TS has been successfully sent from TiKV to TiCDC.
    • checkpoint-ts: The largest transaction TS in the current changefeed. Note that this TS has been successfully written to the downstream.
    • admin-job-type: The status of a changefeed:
      • 0: The state is normal.
      • 1: The task is paused. When the task is paused, all replicated processors exit. The configuration and the replication status of the task are retained, so you can resume the task from checkpoint-ts.
      • 2: The task is resumed. The replication task resumes from checkpoint-ts.
      • 3: The task is removed. When the task is removed, all replicated processors are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries.
  • task-status indicates the state of each replication sub-task in the queried changefeed.

Pause a replication task

Execute the following command to pause a replication task:

{{< copyable "shell-regular" >}}

cdc cli changefeed pause --pd=http://10.0.10.25:2379 --changefeed-id simple-replication-task

In the above command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to pause.

Resume a replication task

Execute the following command to resume a paused replication task:

{{< copyable "shell-regular" >}}

cdc cli changefeed resume --pd=http://10.0.10.25:2379 --changefeed-id simple-replication-task

In the above command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to resume.

Remove a replication task

Execute the following command to remove a replication task:

{{< copyable "shell-regular" >}}

cdc cli changefeed remove --pd=http://10.0.10.25:2379 --changefeed-id simple-replication-task

In the above command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to remove.

After the replication task is removed, the state information of the task is retained for 24 hours, mainly for recording the replication checkpoint. Within these 24 hours, you cannot create a replication task with the same name.

If you want to completely remove the task information, you can specify the --force or -f argument in the command. Then all information of the changefeed will be removed, and you can immediately create a changefeed of the same name.

{{< copyable "shell-regular" >}}

cdc cli changefeed remove --pd=http://10.0.10.25:2379 --changefeed-id simple-replication-task --force

Update task configuration

Starting from v4.0.4, TiCDC supports modifying the configuration of the replication task (not dynamically). To modify the changefeed configuration, pause the task, modify the configuration, and then resume the task.

{{< copyable "shell-regular" >}}

cdc cli changefeed pause -c test-cf
cdc cli changefeed update -c test-cf --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml
cdc cli changefeed resume -c test-cf

Currently, you can modify the following configuration items:

  • sink-uri of the changefeed.
  • The changefeed configuration file and all configuration items in the file.
  • Whether to use the file sorting feature and the sorting directory.
  • The target-ts of the changefeed.
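
For example, the following sketch switches a paused changefeed to file sorting; it assumes that changefeed update accepts --sort-engine and --sort-dir flags corresponding to the sort-engine and sort-dir fields shown in the changefeed info above, and the sort directory path is a placeholder.

{{< copyable "shell-regular" >}}

cdc cli changefeed pause -c test-cf
cdc cli changefeed update -c test-cf --sort-engine="file" --sort-dir="/tmp/cdc_sort"
cdc cli changefeed resume -c test-cf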

Manage processing units of replication sub-tasks (processor)

  • Query the processor list:

    {{< copyable "shell-regular" >}}

    cdc cli processor list --pd=http://10.0.10.25:2379
    [
            {
                    "id": "9f84ff74-abf9-407f-a6e2-56aa35b33888",
                    "capture-id": "b293999a-4168-4988-a4f4-35d9589b226b",
                    "changefeed-id": "simple-replication-task"
            }
    ]
    
  • Query a specific processor, which corresponds to the status of a replication sub-task of a changefeed:

    {{< copyable "shell-regular" >}}

    cdc cli processor query --pd=http://10.0.10.25:2379 --changefeed-id=simple-replication-task --capture-id=b293999a-4168-4988-a4f4-35d9589b226b
    {
      "status": {
        "tables": {
          "56": {    # ID of the replication table, corresponding to tidb_table_id of a table in TiDB
            "start-ts": 417474117955485702,
            "mark-table-id": 0  # ID of mark tables in the cyclic replication, corresponding to tidb_table_id of mark tables in TiDB
          }
        },
        "operation": null,
        "admin-job-type": 0
      },
      "position": {
        "checkpoint-ts": 417474143881789441,
        "resolved-ts": 417474143881789441,
        "count": 0
      }
    }
    

    In the command above:

    • status.tables: Each key number represents the ID of the replication table, corresponding to tidb_table_id of a table in TiDB.
    • mark-table-id: The ID of mark tables in the cyclic replication, corresponding to tidb_table_id of mark tables in TiDB.
    • resolved-ts: The largest TSO among the sorted data in the current processor.
    • checkpoint-ts: The largest TSO that has been successfully written to the downstream in the current processor.

Use HTTP interface to manage cluster status and data replication task

Currently, the HTTP interface provides some basic features for query and maintenance.

In the following examples, suppose that the TiCDC server listens on 127.0.0.1, and the port is 8300 (you can specify the IP and port in --addr=ip:port when starting the TiCDC server).

Get the TiCDC server status

Use the following command to get the TiCDC server status:

{{< copyable "shell-regular" >}}

curl http://127.0.0.1:8300/status
{
"version": "0.0.1",
"git_hash": "863f8ea889b144244ff53593a45c47ad22d37396",
"id": "6d92386a-73fc-43f3-89de-4e337a42b766", # capture id
"pid": 12102    # cdc server pid
}

Evict the owner node

{{< copyable "shell-regular" >}}

curl -X POST http://127.0.0.1:8300/capture/owner/resign

The above command takes effect only for requesting on the owner node.

{
 "status": true,
 "message": ""
}

{{< copyable "shell-regular" >}}

curl -X POST http://127.0.0.1:8301/capture/owner/resign

For nodes other than owner nodes, executing the above command will return the following error.

election: not leader

Manually schedule a table to other node

{{< copyable "shell-regular" >}}

curl -X POST http://127.0.0.1:8300/capture/owner/move_table -d 'cf-id=cf060953-036c-4f31-899f-5afa0ad0c2f9&target-cp-id=6f19a6d9-0f8c-4dc9-b299-3ba7c0f216f5&table-id=49'

Parameter description:

| Parameter name | Description |
| :------------- | :---------- |
| cf-id | The ID of the changefeed to be scheduled |
| target-cp-id | The ID of the target capture |
| table-id | The ID of the table to be scheduled |

The above command takes effect only for requesting on the owner node.

{
 "status": true,
 "message": ""
}

Dynamically change the log level of TiCDC server

{{< copyable "shell-regular" >}}

curl -X POST -d '"debug"' http://127.0.0.1:8301/admin/log

In the command above, the POST parameter indicates the new log level. The zap-provided log level options are supported: "debug", "info", "warn", "error", "dpanic", "panic", and "fatal". This interface parameter is JSON-encoded and you need to pay attention to the use of quotation marks. For example: '"debug"'.
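
For example, to switch the log level back to the default afterwards, send the same request with "info" (again JSON-encoded):

{{< copyable "shell-regular" >}}

curl -X POST -d '"info"' http://127.0.0.1:8301/admin/log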

Task configuration file

This section introduces the configuration of a replication task.

# Specifies whether the database names and tables in the configuration file are case-sensitive.
# The default value is true.
# This configuration item affects configurations related to filter and sink.
case-sensitive = true

# Specifies whether to output the old value. New in v4.0.5.
enable-old-value = false

[filter]
# Ignores the transaction of specified start_ts.
ignore-txn-start-ts = [1, 2]

# Filter rules.
# Filter syntax: https://docs.pingcap.com/tidb/stable/table-filter#syntax.
rules = ['*.*', '!test.*']

[mounter]
# The number of mounter threads, which are used to decode the data output from TiKV.
worker-num = 16

[sink]
# For the sink of MQ type, you can use dispatchers to configure the event dispatcher.
# Supports four dispatchers: default, ts, rowid, and table.
# The dispatcher rules are as follows:
# - default: When multiple unique indexes (including the primary key) exist or the Old Value feature is enabled, events are dispatched in the table mode. When only one unique index (or the primary key) exists, events are dispatched in the rowid mode.
# - ts: Use the commitTs of the row change to create Hash and dispatch events.
# - rowid: Use the name and value of the selected HandleKey column to create Hash and dispatch events.
# - table: Use the schema name of the table and the table name to create Hash and dispatch events.
# The matching syntax of matcher is the same as the filter rule syntax.
dispatchers = [
    {matcher = ['test1.*', 'test2.*'], dispatcher = "ts"},
    {matcher = ['test3.*', 'test4.*'], dispatcher = "rowid"},
]
# For the sink of MQ type, you can specify the protocol format of the message.
# Currently four protocols are supported: default, canal, avro, and maxwell. The default protocol is TiCDC Open Protocol.
protocol = "default"

[cyclic-replication]
# Whether to enable cyclic replication.
enable = false
# The replica ID of the current TiCDC.
replica-id = 1
# The replica ID to be filtered.
filter-replica-ids = [2,3]
# Whether to replicate DDL statements.
sync-ddl = true

Notes for compatibility

  • In TiCDC v4.0.0, ignore-txn-commit-ts is removed and ignore-txn-start-ts is added, which uses start_ts to filter transactions.
  • In TiCDC v4.0.2, db-dbs/db-tables/ignore-dbs/ignore-tables are removed and rules is added, which uses new filter rules for databases and tables. For detailed filter syntax, see Table Filter.

Cyclic replication

Warning:

Currently (v4.0.2), cyclic replication is still an experimental feature. It is NOT recommended to use it in the production environment.

The cyclic replication feature supports replicating data across multiple independent TiDB clusters. For example, TiDB cluster A, cluster B, and cluster C all have a table named test.user_data and write data into this table respectively. With the cyclic replication feature, the data written into test.user_data in one cluster can be replicated to the other two clusters, so that the test.user_data tables in the three clusters are consistent with each other.

Usage example

Enable cyclic replication in the three clusters of A, B, and C. Two TiCDC clusters are used for the replication from cluster A to cluster B. Among the three clusters, DDL statements enter cluster A first.

(Figure: TiCDC cyclic replication)

To use the cyclic replication feature, you need to configure the following parameters for the replication task upon the task creation.

  • --cyclic-replica-id: Specifies the data source (to be written) ID of the upstream cluster. Each cluster ID must be unique.
  • --cyclic-filter-replica-ids: Specifies the data source ID to be filtered, which is usually the downstream cluster ID.
  • --cyclic-sync-ddl: Determines whether to replicate DDL statements to the downstream.

To create a cyclic replication task, take the following steps:

  1. Enable the TiCDC component in TiDB cluster A, cluster B, and cluster C.

    {{< copyable "shell-regular" >}}

    # Enables TiCDC in cluster A.
    cdc server \
        --pd="http://${PD_A_HOST}:${PD_A_PORT}" \
        --log-file=ticdc_1.log \
        --addr=0.0.0.0:8301 \
        --advertise-addr=127.0.0.1:8301
    # Enables TiCDC in cluster B.
    cdc server \
        --pd="http://${PD_B_HOST}:${PD_B_PORT}" \
        --log-file=ticdc_2.log \
        --addr=0.0.0.0:8301 \
        --advertise-addr=127.0.0.1:8301
    # Enables TiCDC in cluster C.
    cdc server \
        --pd="http://${PD_C_HOST}:${PD_C_PORT}" \
        --log-file=ticdc_3.log \
        --addr=0.0.0.0:8301 \
        --advertise-addr=127.0.0.1:8301
  2. Create the mark tables used for the cyclic replication in cluster A, cluster B, and cluster C.

    {{< copyable "shell-regular" >}}

    # Creates mark tables in cluster A.
    cdc cli changefeed cyclic create-marktables \
        --cyclic-upstream-dsn="root@tcp(${TIDB_A_HOST}:${TIDB_A_PORT})/" \
        --pd="http://${PD_A_HOST}:${PD_A_PORT}"
    # Creates mark tables in cluster B.
    cdc cli changefeed cyclic create-marktables \
        --cyclic-upstream-dsn="root@tcp(${TIDB_B_HOST}:${TIDB_B_PORT})/" \
        --pd="http://${PD_B_HOST}:${PD_B_PORT}"
    # Creates mark tables in cluster C.
    cdc cli changefeed cyclic create-marktables \
        --cyclic-upstream-dsn="root@tcp(${TIDB_C_HOST}:${TIDB_C_PORT})/" \
        --pd="http://${PD_C_HOST}:${PD_C_PORT}"
  3. Create the cyclic replication task in cluster A, cluster B, and cluster C.

    {{< copyable "shell-regular" >}}

    # Creates the cyclic replication task in cluster A.
    cdc cli changefeed create \
        --sink-uri="mysql://root@${TiDB_B_HOST}/" \
        --pd="http://${PD_A_HOST}:${PD_A_PORT}" \
        --cyclic-replica-id 1 \
        --cyclic-filter-replica-ids 2 \
        --cyclic-sync-ddl true
    # Creates the cyclic replication task in cluster B.
    cdc cli changefeed create \
        --sink-uri="mysql://root@${TiDB_C_HOST}/" \
        --pd="http://${PD_B_HOST}:${PD_B_PORT}" \
        --cyclic-replica-id 2 \
        --cyclic-filter-replica-ids 3 \
        --cyclic-sync-ddl true
    # Creates the cyclic replication task in cluster C.
    cdc cli changefeed create \
        --sink-uri="mysql://root@${TiDB_A_HOST}/" \
        --pd="http://${PD_C_HOST}:${PD_C_PORT}" \
        --cyclic-replica-id 3 \
        --cyclic-filter-replica-ids 1 \
        --cyclic-sync-ddl false

Usage notes

  • Before creating the cyclic replication task, you must execute cdc cli changefeed cyclic create-marktables to create the mark tables for the cyclic replication.
  • The name of the table with cyclic replication enabled must match the ^[a-zA-Z0-9_]+$ regular expression.
  • Before creating the cyclic replication task, the tables for the task must be created.
  • After enabling the cyclic replication, you cannot create a table that will be replicated by the cyclic replication task.
  • To perform online DDL operations, ensure the following requirements are met:
    • The TiCDC components of multiple clusters form a one-way DDL replication chain, which is not cyclic. For example, in the example above, only the TiCDC component of cluster C disables sync-ddl.
    • DDL operations must be performed on the cluster that is the starting point of the one-way DDL replication chain, such as cluster A in the example above.

Output the historical value of a Row Changed Event New in v4.0.5

Warning:

Currently, outputting the historical value of a Row Changed Event is still an experimental feature. It is NOT recommended to use it in the production environment.

In the default configuration, the Row Changed Event of TiCDC Open Protocol output in a replication task only contains the changed value, not the value before the change. Therefore, the output value neither supports the new collation framework introduced in TiDB v4.0, nor can it be used by the consumer ends of TiCDC Open Protocol as the historical value of a Row Changed Event.

Starting from v4.0.5, TiCDC supports outputting the historical value of a Row Changed Event. To enable this feature, specify the following configuration in the changefeed configuration file at the root level:

{{< copyable "" >}}

enable-old-value = true

After this feature is enabled, refer to TiCDC Open Protocol - Row Changed Event for the detailed output format. The new TiDB v4.0 collation framework will also be supported when you use the MySQL sink.