
Setting up: CockroachDB

Sarah Anoke edited this page Feb 21, 2020 · 7 revisions

The federal spending data was migrated from PostgreSQL to CockroachDB, a distributed SQL ("NewSQL") database built on a transactional and strongly-consistent key-value store. Compared with PostgreSQL, CockroachDB stands out for its built-in scalability, although complex queries incur more latency. In our case we're using simple filtering and summing operations on a large (and growing) database, so the tradeoff is acceptable.

The following cluster deployment largely follows guidance from Cockroach Labs, with adjustments for my particular architecture.

1 - Install CockroachDB on local machine

  1. Download the CockroachDB archive and extract the binary.
curl https://binaries.cockroachdb.com/cockroach-v19.2.2.darwin-10.9-amd64.tgz | \
   tar -xz
  2. Copy the binary into the PATH to execute cockroach commands from any shell.
cp -i cockroach-v19.2.2.darwin-10.9-amd64/cockroach /usr/local/bin
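To confirm the install worked, you can check that the binary resolves on the PATH and prints its version:

```shell
# verify the binary is on the PATH and runs
which cockroach
cockroach version
```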

2 - Launch cluster instances

Launch 4 EC2 instances, each

  • r5a.xlarge (4 vCPUs and 32 GiB memory)
  • running Ubuntu
  • with 750GB root volume
  • with an Elastic IP

Note that writes are also CPU-intensive, so a compute-optimized instance type may be worth considering.
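As a sketch, the instances above could also be launched from the AWS CLI; the AMI ID, key pair name, and instance IDs here are placeholders you'd substitute for your region:

```shell
# launch 4 Ubuntu instances with a 750GB root volume
# (ami-xxxxxxxx and <your key pair> are placeholders)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type r5a.xlarge \
  --count 4 \
  --key-name <your key pair> \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=750}'

# allocate and attach an Elastic IP (repeat for each instance)
aws ec2 allocate-address
aws ec2 associate-address --instance-id i-xxxxxxxx --allocation-id eipalloc-xxxxxxxx
```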

3 - Create a security group and add instances to it

Create a security group for the CockroachDB cluster, so that the nodes can communicate with each other.

Inbound

| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| Custom TCP Rule | TCP | 8080 | < your IP > | access CockroachDB Admin UI |
| Custom TCP Rule | TCP | 8080 | IP range of VPC in CIDR notation | load balancer - health check communication |
| Custom TCP Rule | TCP | 26257 | private IP of instance running FedSpend app UI | communication with the application UI |
| Custom TCP Rule | TCP | 26257 | sg-xxx (security group ID) | for node X to talk to node Y |
| Custom TCP Rule | TCP | 7077 | < your IP > | cockroachDB port |
| SSH | TCP | 22 | < your IP > | SSH into machine |

Outbound

| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | sg-xxx (security group ID) | for node X to receive data from node Y |
| All traffic | All | All | 0.0.0.0/0 | downloads from external websites |
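If you prefer the CLI to the console, a rough equivalent of the core inbound rules (with placeholder group and VPC IDs) looks like:

```shell
# create the security group (vpc-xxxxxxxx is a placeholder)
aws ec2 create-security-group \
  --group-name cockroachdb-sg \
  --description "CockroachDB cluster" \
  --vpc-id vpc-xxxxxxxx

# Admin UI access from your IP
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
  --protocol tcp --port 8080 --cidr <your IP>/32

# inter-node traffic: the group references itself as the source
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
  --protocol tcp --port 26257 --source-group sg-xxxxxxxx

# SSH from your IP
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
  --protocol tcp --port 22 --cidr <your IP>/32
```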

4 - Set up time sync

It's necessary for all nodes to work from the same clock. Amazon Time Sync is a service that facilitates this and needs to be set up on every node.

sudo apt update

sudo apt install chrony


# here we'll open a file...
sudo nano /etc/chrony/chrony.conf
# ...and paste the following text above any pool or server statements
# -> server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4

sudo /etc/init.d/chrony restart


# verify that chrony is using the 169.254.169.123 IP address 
#   to synchronize the time
chronyc sources -v


# verify the time synchronization metrics that are reported by chrony
chronyc tracking

5 - Set up load balancing

We'll be using the AWS load balancing service to distribute client traffic across nodes.

  1. Open the EC2 console and select Load balancers.
  2. Click the button Create Load Balancer on the left.
  3. Create a Network Load Balancer.
    • Select a name for the balancer, and add a TCP listener on port 26257.
    • Under the 'Availability Zones' section, select the same VPC and subnet as the one containing the instances.
    • In Step 3: Configure Routing, choose any name for the target group and specify that it will receive communication on port 26257. The health check protocol should be HTTP on port 8080 with path /health?ready=1.
    • In Step 4: Register Targets, register instances with the target group just created, specifying port 26257.

Make a note of the private IP of the load balancer.
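The same load balancer setup can be sketched with the AWS CLI (the ARNs, subnet, VPC, and instance IDs below are placeholders):

```shell
# internal network load balancer in the cluster's subnet
aws elbv2 create-load-balancer --name cockroachdb-nlb \
  --type network --scheme internal --subnets subnet-xxxxxxxx

# target group on 26257, health-checked via CockroachDB's HTTP endpoint
aws elbv2 create-target-group --name cockroachdb-tg \
  --protocol TCP --port 26257 --vpc-id vpc-xxxxxxxx \
  --health-check-protocol HTTP --health-check-port 8080 \
  --health-check-path "/health?ready=1"

# register the 4 instances and wire up the TCP listener
aws elbv2 register-targets --target-group-arn <target group ARN> \
  --targets Id=i-aaaa Id=i-bbbb Id=i-cccc Id=i-dddd
aws elbv2 create-listener --load-balancer-arn <load balancer ARN> \
  --protocol TCP --port 26257 \
  --default-actions Type=forward,TargetGroupArn=<target group ARN>
```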

6 - Generate certificates and keys

Generation is done locally, then the files are transferred to each instance. Below is an outline; the full certificate process is documented by Cockroach Labs.

  1. Generate the CA certificate and key.
  2. Create the certificate and key for the first node.
  3. Upload the CA certificate, node certificate, key to the first node.
  4. Delete the local copy of the node certificate and key.
  5. Repeat steps 2-4 for all other nodes.
  6. Create a client certificate and key for the root user. Transfer to the machine you'll run workloads from.
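The outline above maps onto the cockroach cert subcommands roughly as follows (all IPs are placeholders):

```shell
# 1. CA certificate and key (the CA key lives outside the certs dir)
mkdir certs my-safe-directory
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

# 2. certificate and key for a node, listing every address it's reachable at
cockroach cert create-node <node private IP> <node public IP> \
  <load balancer IP> localhost 127.0.0.1 \
  --certs-dir=certs --ca-key=my-safe-directory/ca.key

# 3. upload the CA cert, node cert, and key to the node
ssh ubuntu@<node public IP> "mkdir certs"
scp certs/ca.crt certs/node.crt certs/node.key ubuntu@<node public IP>:~/certs

# 4. delete the local copies so the next node's certs can be generated
rm certs/node.crt certs/node.key

# 6. client certificate and key for the root user
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key
```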

7 - Start nodes

Below is an outline; the full startup process is documented by Cockroach Labs. For each node...

  1. SSH into the node and install CockroachDB. Copy the binary into the PATH.
wget -qO- https://binaries.cockroachdb.com/cockroach-v19.2.2.linux-amd64.tgz | tar xvz
sudo cp -i cockroach-v19.2.2.linux-amd64/cockroach /usr/local/bin/
  2. Run the cockroach start command to prime the node to start.
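For reference, the start command for a secure v19.2 node looks roughly like this (IPs are placeholders):

```shell
# run on each node; the node waits for `cockroach init` before serving
cockroach start \
  --certs-dir=certs \
  --advertise-addr=<this node's private IP> \
  --join=<node1 private IP>,<node2 private IP>,<node3 private IP>,<node4 private IP> \
  --cache=.25 \
  --max-sql-memory=.25 \
  --background
```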

8 - Initialize the cluster

Run the cockroach init command from within any node to complete the startup process and have the nodes join together as a cluster. Note: although guidance says to run this command locally, that did not work for me!

cockroach init --certs-dir=certs --host=<private IP of node>

9 - Access the Admin UI

For secure clusters, only authorized users can access and view the Admin UI. The Admin UI can be accessed from any node in the cluster, via < node public IP >:8080.

  1. Launch the CockroachDB SQL client from within the node with the root user certificates.
cockroach sql --certs-dir=certs --host=<load balancer IP>
  2. Create a username and password for an account that will access the Admin UI (and for any other users who need access).
CREATE USER admin WITH PASSWORD '<your-password>';
  3. Open the Admin UI with a web browser and log in!