
My NiFi Cluster

Quick project to create a NiFi cluster in Docker.

Inspired by the article Running a cluster with Apache Nifi and Docker, whose compose file is shamelessly pinched here, hence the Apache licence. Uses the Apache NiFi image.

Installation

Before starting you will need to create a new git repo to store the flows in. It is not a good idea to use this cluster repo; the flow work needs to go in its own repo.

# NiFi runs in the container as uid 1000, so it must own the flow storage
git init ../flow_storage
sudo chown -R 1000:1000 ../flow_storage

Operation

Quickstart

  1. Start the cluster with docker compose up -d.
  2. Create some topics with bin/launch-script.sh.
  3. Open the cluster URL in your browser: http://localhost:8080/nifi/.
  4. Build some flows, process some data.
  5. If you have already built the registry then link the cluster to it with bin/add-registry.sh.

You might need to wait a minute or so after starting the cluster before the URL responds, as it takes some time for all of the NiFi nodes to form a cluster.
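In shell terms the quickstart boils down to the following (a sketch, assuming the defaults in docker-compose.yml):

# Start the cluster in the background
docker compose up -d

# Create the Kafka topics used by the sample flow
bin/launch-script.sh

# Then browse to http://localhost:8080/nifi/ once the nodes have clustered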

See how to load flows from a template or from the NiFi registry. Then look at producing and consuming data with Kafka.

Start

To start the cluster and connect to the NiFi desktop:

  1. Start the cluster with docker compose up -d. The cluster starts 3 NiFi nodes so that a proper election for master can be held.
  2. Connect to the Nginx proxy at http://localhost/.
  3. The server presents a simple menu that will take you to the NiFi cluster or to the registry.
  4. Select "NiFi Cluster". The nodes take a while to start running, so at first you will get a bad gateway error from the proxy. Keep trying.
  5. Afterwards, you can go directly to http://localhost:8080/nifi/.
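If you would rather poll from a script than keep refreshing the browser, a small loop like this will wait until the UI answers (a sketch, assuming curl is available on the host):

# Poll the NiFi UI until it returns a successful response
until curl -sf http://localhost:8080/nifi/ > /dev/null; do
    echo "waiting for the NiFi cluster to form..."
    sleep 10
done
echo "NiFi is up"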

Create Flows

Once connected to the GUI you can create your flows. To get you started there is a simple one stored under the templates directory. Load it from the NiFi desktop.

    right click -> Upload Template -> browse -> "Simple_Kafka_Flow.xml" -> Upload

Then add the template onto the desktop from the design bar.

    drag template icon -> Choose Template: "Simple_Kafka_Flow" -> Add
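The upload can also be scripted against the NiFi REST API, which accepts a template as a multipart upload. A sketch, assuming the template file sits under the templates directory and that the "root" alias resolves to the top-level process group:

# Upload the template to the root process group
curl -F template=@templates/Simple_Kafka_Flow.xml \
    http://localhost:8080/nifi-api/process-groups/root/templates/upload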

Manage Kafka

One Off Commands

You can run one-off Kafka commands using the docker compose run <service> <...> command, which spins up a separate container from the same image to run the command. This can be quite slow, however, as a new container must be spun up for every command.

After a while these containers accumulate, which you can see with docker compose ps -a. If this becomes a problem then tidy them up with docker container prune.
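Alternatively, pass --rm to run and the container is removed as soon as the command exits, so nothing accumulates. For example, listing the topics:

docker compose run --rm kafka kafka-topics.sh \
    --bootstrap-server kafka:9092 --list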

Start a Client Console

You can run ad hoc commands on a client container.

$ docker compose run kafka bash
[+] Running 1/0
 ⠿ Container zookeeper Running 0.0s
kafka 09:24:44.07
kafka 09:24:44.07 Welcome to the Bitnami kafka container
kafka 09:24:44.08 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 09:24:44.08 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 09:24:44.08

I have no name!@d2f135d230e4:/$ echo $PATH
/opt/bitnami/kafka/bin:/opt/bitnami/java/bin:/opt/bitnami/java/bin:/opt/bitnami/common/bin:/opt/bitnami/kafka/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

I have no name!@d2f135d230e4:/$ ls /opt/bitnami/kafka/bin
connect-distributed.sh        kafka-console-consumer.sh    ...

I have no name!@d2f135d230e4:/$ exit
exit
$

Run a Client Command

You can run specific scripts or commands that are already in the container. Notice you can use the service alias "kafka" as shorthand for the first available Kafka broker.

Create a Topic

$ docker compose run kafka kafka-topics.sh \
  --bootstrap-server kafka:9092 \
  --create --topic my.source.topic \
  --replication-factor 3 --config retention.ms=36000000

[+] Running 1/0
⠿ Container zookeeper Running 0.0s
kafka 09:43:21.17
kafka 09:43:21.18 Welcome to the Bitnami kafka container
kafka 09:43:21.18 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 09:43:21.18 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 09:43:21.19

WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic my.source.topic.
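Note that the command above leaves the partition count at the broker default of one, as the describe output below confirms. To spread load across the brokers you can pass --partitions explicitly; a sketch, using a hypothetical topic name:

docker compose run --rm kafka kafka-topics.sh \
    --bootstrap-server kafka:9092 \
    --create --topic my.test.topic \
    --partitions 3 --replication-factor 3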

Describe a Topic

$ docker compose run kafka kafka-topics.sh \
  --bootstrap-server kafka:9092 \
  --describe --topic my.source.topic

[+] Running 1/0
⠿ Container zookeeper Running 0.0s
kafka 09:45:45.41
kafka 09:45:45.41 Welcome to the Bitnami kafka container
kafka 09:45:45.42 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 09:45:45.42 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 09:45:45.42

Topic: my.source.topic  TopicId: ou824ZiQRo-gELS07nh3mg PartitionCount: 1       ReplicationFactor: 3    Configs: segment.bytes=1073741824,retention.ms=36000000
        Topic: my.source.topic  Partition: 0    Leader: 1001    Replicas: 1001,1003,1002        Isr: 1001,1003,1002

Running Scripts

You can also use the run command to mount a local directory and then run any scripts inside it. Note that the mount paths must be absolute.

docker compose run --volume <path-to-mount>:<mount-point> kafka <mount-point>/<script-name>

See the scripts bin/launch-script.sh and bin/create-topics.sh for an example of how this is done.
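As a concrete sketch, this mounts the repo's bin directory and runs a script from it ($(pwd) expands to an absolute path, so run it from the repo root):

docker compose run --rm \
    --volume "$(pwd)/bin:/scripts" \
    kafka /scripts/create-topics.sh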

Process Some Data

By now you should have loaded the flow into NiFi and set up the topics on Kafka. Now is the time to move data.

  1. Start all of the processes in the flow by pressing the start button on the Operate dialogue.
  2. Send a simple message to the source topic. (ctrl-D to end)
  3. Observe the messages being processed in the flow.
  4. Retrieve the message from the sink topic. (ctrl-C to end)

For a bit more fun you can run both Kafka commands in separate consoles and see each message flowing.

Send Some Data

$ docker compose run kafka kafka-console-producer.sh \
 --bootstrap-server kafka:9092 --topic my.source.topic
>hello world
>now is the time
>one is the number
> ^D

As this is so useful you can launch it with bin/launch-script.sh producer.

Receive Some Data

$ docker compose run kafka kafka-console-consumer.sh \
 --bootstrap-server kafka:9092 --topic my.sink.topic --offset earliest --partition 0
hello world
now is the time
one is the number
^C

As this is so useful you can launch it with bin/launch-script.sh consumer.

Stop

Simply run docker compose down to stop the cluster and destroy the containers. If you want to preserve the containers then use docker compose stop instead.
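The relevant commands side by side:

docker compose down     # stop the cluster and destroy the containers
docker compose stop     # stop the cluster but keep the containers
docker compose start    # resume a previously stopped cluster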

Running Specific Versions Of NiFi

The cluster uses a locally built image of NiFi based on the official NiFi image. This gives scope to add extra tools at build time instead of waiting until run time. At present this only involves installing the redis-tools package, which is used in one of the experiments, where an ExecuteStreamCommand processor runs the tools in a shell to execute ad hoc Redis commands.

The build script in build-nifi/Dockerfile performs this task, and it is referenced from docker-compose.yml so that the build happens automatically for you.

You can however rebuild this image manually any time you wish with docker compose build.

You can also override the default "latest" version set in the build file to run a specific version of NiFi. For example:

docker compose build --build-arg NIFI_VERSION=1.16.0

The image is still tagged as "latest", so it will be used the next time docker compose up is called.
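For reference, the build follows the usual pattern of parameterising the base image. A minimal sketch of the idea (the real build-nifi/Dockerfile may differ in detail):

# Select the upstream NiFi version at build time; defaults to "latest"
ARG NIFI_VERSION=latest
FROM apache/nifi:${NIFI_VERSION}

# Install extra tooling at build time rather than at run time
USER root
RUN apt-get update && apt-get install -y redis-tools \
    && rm -rf /var/lib/apt/lists/*
USER nifi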

Using the Registry

A NiFi registry service has been added to make persistence of flows easier than having to use the template method.

Connect to the registry GUI with http://localhost:18080/nifi-registry.

First Time

The first time you use the registry you need to set up the bucket, and optionally put a flow into it. This is a manual process.

  1. In the registry, click the wrench icon and create a new bucket.
  2. In NiFi, use the menu "Controller Settings" -> "Registry Clients".
  3. Add a new client with the URL "http://registry:18080/".
  4. On the desktop, create a process group.
  5. Inside the group, drag in the template for the test flow.
  6. On the background, right click and select "Version" -> "Start version control".
  7. In the dialogue, give the flow a name and click save.

You will see that the test bucket and the flow snapshot have been created in the git repo.
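Bucket creation can also be scripted against the registry's REST API. A sketch, using a hypothetical bucket name:

curl -X POST -H 'Content-Type: application/json' \
    -d '{"name": "my-test-bucket"}' \
    http://localhost:18080/nifi-registry-api/buckets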

Afterwards

Once the registry has been set up, any flows created will get stored in the local git repo, giving you persistence. If you restart the cluster you will see in the registry that your flow definitions have been preserved.

In NiFi you still need to create the registry client link to "http://registry:18080/" as described above. Then import the flow onto the desktop:

  1. Drag a process group from the design bar onto the desktop.
  2. Click "Import from Registry".
  3. Select the bucket, flow and version you want.
  4. Click "Import".

Automatic

This has been automated with the script bin/add-registry.sh. You still need to run the script manually, of course.
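Under the hood such a script presumably drives the NiFi REST API. A hand-rolled sketch of the equivalent call (hypothetical; the real bin/add-registry.sh may work differently):

# Register the registry service as a registry client of the cluster
curl -X POST -H 'Content-Type: application/json' \
    -d '{"revision": {"version": 0},
         "component": {"name": "registry", "uri": "http://registry:18080"}}' \
    http://localhost:8080/nifi-api/controller/registry-clients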

More

To simplify the documentation, further sections have been moved to separate documents under the "docs" directory.