Using kafkacat to export data from one Kafka cluster and load to another

Sometimes it’s useful to be able to copy the contents of a topic from one Kafka cluster to another. For example, a set of data to support a demo, or some data against which to test an application.

kafkacat is a command-line tool for interacting with Kafka, including producing and consuming messages.
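For instance (assuming a broker reachable at localhost:9092 and a topic called my_topic, both placeholders), producing and then reading back a message from the shell looks like:

# Produce one message from stdin, then consume the topic and exit at the end of it
echo 'hello world' | kafkacat -b localhost:9092 -t my_topic -P
kafkacat -b localhost:9092 -t my_topic -C -e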

You can run all the steps in this example using Docker to simulate two single-node Kafka clusters, using kafkacat to dump data from one and import it into the other. The final section shows how to use kafkacat to 'sync' a topic between clusters via a netcat pipe.
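The repository's docker-compose.yml isn't reproduced here, but the commands below assume something along these lines: two independent single-broker clusters, addressable as kafka-a:29092 and kafka-b:29092 inside the Compose network (which Compose names export-import-with-kafkacat_default after the project directory), and as localhost:9092 and localhost:19092 from the host. A minimal sketch of such a file (image tags and exact listener config are illustrative, not taken from the repo):

version: '2'
services:
  zookeeper-a:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka-a:
    image: confluentinc/cp-kafka
    depends_on:
      - zookeeper-a
    ports:
      - "9092:9092"        # host access to cluster A
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-a:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka-a:29092,HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  zookeeper-b:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka-b:
    image: confluentinc/cp-kafka
    depends_on:
      - zookeeper-b
    ports:
      - "19092:19092"      # host access to cluster B
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-b:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka-b:29092,HOST://localhost:19092
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Bring it up with docker-compose up -d from the export-import-with-kafkacat directory, so that the default network gets the name used in the commands below.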

Demo setup

Populate some test data in Cluster "A":

docker run --rm -i --network export-import-with-kafkacat_default confluentinc/cp-kafkacat \
  kafkacat -b kafka-a:29092 -t test_topic -P -K: <<EOF
1:{"hello":"world"}
2:{"foo":"bar"}
EOF

Check it’s there:

docker run --rm -it --network export-import-with-kafkacat_default confluentinc/cp-kafkacat \
  kafkacat -b kafka-a:29092 -t test_topic -C -f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\nTimestamp: %T\tPartition: %p\tOffset: %o\n--\n'

Should look like:

Key (1 bytes): 1
Value (17 bytes): {"hello":"world"}
Timestamp: 1542086750978        Partition: 0    Offset: 0
--

Key (1 bytes): 2
Value (13 bytes): {"foo":"bar"}
Timestamp: 1542086769945        Partition: 0    Offset: 1
--
% Reached end of topic test_topic [0] at offset 2

Export

docker run --rm -it --network export-import-with-kafkacat_default confluentinc/cp-kafkacat \
  kafkacat -b kafka-a:29092 -t test_topic -C -K: -e -q > export.json

Generated export file should look like:

$ cat export.json
1:{"hello":"world"}
2:{"foo":"bar"}

Import

Import the data exported from Cluster A into Cluster B:

docker run --rm -it --volume $PWD:/data --network export-import-with-kafkacat_default confluentinc/cp-kafkacat \
    kafkacat -b kafka-b:29092 -t test_topic -P -K: -l /data/export.json

(if you’re not using Docker, you’ll need to amend the path in -l accordingly)
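For example, with kafkacat installed locally and the target broker reachable on localhost:19092 (both assumptions), the import would be:

kafkacat -b localhost:19092 -t test_topic -P -K: -l ./export.json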

Verify that the data is there in Cluster B:

docker run --rm -it --network export-import-with-kafkacat_default confluentinc/cp-kafkacat \
  kafkacat -b kafka-b:29092 -t test_topic -C -f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\nTimestamp: %T\tPartition: %p\tOffset: %o\n--\n'

Should look like:

Key (1 bytes): 1
Value (18 bytes): {"hello":"world"}
Timestamp: 1542087361395        Partition: 0    Offset: 0
--

Key (1 bytes): 2
Value (14 bytes): {"foo":"bar"}
Timestamp: 1542087361395        Partition: 0    Offset: 1
--
% Reached end of topic test_topic [0] at offset 2

Sync via netcat

Since kafkacat can produce and consume messages through pipes, we can pair it with netcat (nc) to stream messages between Kafka clusters. This is obviously a far inferior method of replicating data compared to MirrorMaker or Confluent’s Replicator tool! (A direct kafkacat-to-kafkacat pipe, without netcat, is sketched after the commands below.)

  • On the target machine:

    nc -k -l 42000 | kafkacat -b localhost:19092 -t test_topic -P -K:

    Where:

    • -k tells nc to keep listening for new connections after the current one closes

    • -l 42000 tells nc to listen on port 42000

    • localhost:19092 is the Kafka broker on the target cluster

  • On the source machine:

    kafkacat -b localhost:9092 -t test_topic -C -K: -u -q | nc localhost 42000

    Where:

    • localhost 42000 is the host and port on which the other nc is listening

    • localhost:9092 is the Kafka broker on the source cluster
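If both clusters are reachable from the same machine, the netcat hop can be dropped entirely and one kafkacat piped straight into another. A minimal sketch, assuming the source and target brokers are at localhost:9092 and localhost:19092 as above:

kafkacat -b localhost:9092 -t test_topic -C -K: -u -q | \
  kafkacat -b localhost:19092 -t test_topic -P -K:

Add -e to the consuming side if you want a one-shot copy that exits at the end of the topic rather than a continuous stream.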