misc/nvme-bootstrap/README.md (249 additions, 0 deletions)
@@ -0,0 +1,249 @@
# OpenEBS NVMe Bootstrap for Materialize

Contributor Author:

> I might delete this README file altogether; most of this is already available in the Helm chart README, and the AWS-specific implementation will be done via Terraform.

This guide helps you set up and configure NVMe instance store volumes for optimal Materialize performance on Kubernetes. The solution provides automatic detection and configuration of NVMe devices, making them available to Materialize through OpenEBS LVM storage classes.

> **WARNING:** This setup **automatically partitions and formats NVMe instance store volumes**. Make sure your nodes use an instance type with NVMe instance storage (for example, `r6gd.2xlarge` or `r7gd.2xlarge`), and back up any data on those volumes before proceeding.

## Overview

Materialize requires fast, locally attached NVMe storage for optimal performance. This solution:

1. Automatically detects NVMe instance store devices on your nodes
2. Creates an LVM volume group from these devices
3. Configures OpenEBS LVM Local-PV to provision persistent volumes from this storage
4. Makes high-performance storage available to Materialize
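
As an illustration of the detection step, the sketch below filters `lsblk -dno NAME,MODEL` output down to instance store devices. It is not the contents of `configure-disks.sh`. It assumes that on AWS Nitro instances, instance store volumes report the NVMe model string `Amazon EC2 NVMe Instance Storage` while EBS volumes report `Amazon Elastic Block Store`. The function name `find_instance_store_devices` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print /dev paths of NVMe instance store devices.
# Input (stdin): "NAME MODEL" lines, e.g. from `lsblk -dno NAME,MODEL`.
# Assumption: instance store devices report the model
# "Amazon EC2 NVMe Instance Storage"; EBS volumes report a different model.
find_instance_store_devices() {
  local name model
  while read -r name model; do
    # Only consider NVMe namespaces such as nvme1n1.
    [[ $name == nvme* ]] || continue
    if [[ $model == "Amazon EC2 NVMe Instance Storage"* ]]; then
      echo "/dev/$name"
    fi
  done
}

# Usage on a node (requires util-linux lsblk):
#   lsblk -dno NAME,MODEL | find_instance_store_devices
```

Matching on the model string rather than on device names avoids assuming that a particular `nvmeXn1` is the EBS root volume, because NVMe device numbering is not guaranteed to be stable.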

## Prerequisites

- AWS account with permissions to create EC2 instances with NVMe storage
- Kubernetes cluster with nodes that have NVMe instance store volumes
  - **Important**: You must use instance types with NVMe instance storage (those with a "d" suffix in the instance family, such as `r6gd`)
  - Recommended instance types: `r6gd.2xlarge`, `r7gd.2xlarge` (not `r8g.2xlarge`, which lacks NVMe instance storage)
  - When using Bottlerocket OS, additional configuration is handled automatically
- Tools required:
  - [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
  - [Helm](https://helm.sh/docs/intro/install/) (v3.2.0+)
  - [Docker](https://docs.docker.com/get-docker/) (for building the container)
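
The "d" suffix rule can be expressed as a quick naming check. This is a heuristic sketch with a hypothetical function name. It treats a "d" among the attribute letters after the generation number (`r6gd`, `r7gd`, `m5ad`) as a sign of instance storage, so it misses storage-optimized families such as `i3` or `i4i`, which have NVMe instance storage without a "d". For an authoritative answer, `aws ec2 describe-instance-types --instance-types r6gd.2xlarge --query 'InstanceTypes[].InstanceStorageSupported'` reports whether a type includes instance storage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess from an EC2 instance type name whether it
# includes NVMe instance storage. Naming heuristic only, not authoritative.
has_nvme_instance_store() {
  local family="${1%%.*}"                 # "r6gd.2xlarge" -> "r6gd"
  local re='^[a-z]+[0-9]+([a-z-]*)$'      # letters, generation number, attribute letters
  [[ $family =~ $re ]] || return 1
  # Treat a "d" in the attribute letters ("gd" in r6gd) as local NVMe storage.
  [[ ${BASH_REMATCH[1]} == *d* ]]
}

# Usage:
for t in r6gd.2xlarge r7gd.2xlarge r8g.2xlarge; do
  if has_nvme_instance_store "$t"; then
    echo "$t: has NVMe instance storage"
  else
    echo "$t: no NVMe instance storage"
  fi
done
```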

## Automated Setup with Terraform

If you're using the [Materialize AWS Terraform module](https://github.com/MaterializeInc/terraform-aws-materialize), you can enable NVMe bootstrap by configuring:

```hcl
module "materialize" {
  source = "git::https://github.com/MaterializeInc/terraform-aws-materialize.git"

  # Use an instance type with NVMe storage
  node_group_instance_types = ["r6gd.2xlarge"]
  node_group_ami_type       = "BOTTLEROCKET_ARM_64"
  enable_nvme_storage       = true

  install_materialize_operator = true

  # Other module parameters...
}
```

The module handles creating the appropriate storage class and configuring Materialize to use it.

## Manual Setup

**TODO:** The following steps will eventually be automated in the Terraform modules for Materialize, but for now they can be done manually.

If you're setting up manually or need to customize the configuration, follow these steps:

### Step 1: Build and Push the Container Image

> Note: This is temporary and will be replaced with a pre-built image which can be pulled from a public registry.

```bash
# Clone the Materialize repository
git clone https://github.com/MaterializeInc/materialize.git
cd materialize

# Navigate to the container directory
cd misc/nvme-bootstrap/container

# Build the image
docker build -t your-registry/nvme-bootstrap:latest .

# Push to your registry
docker push your-registry/nvme-bootstrap:latest
```

Contributor:

> If they are pushing it themselves, where do we configure which bootstrap image to use in the Terraform?

Contributor Author:

> Ah yes, this is more of a note for myself while testing. Next on my list is to add this to CI so we push the image to Docker Hub. Open to suggestions on the Docker image name, though!
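
Before Step 2, the `image:` reference in `daemonset.yaml` has to point at the image you pushed. This is a minimal sketch. It assumes the manifest's container spec has a line of the form `image: <registry>/nvme-bootstrap:<tag>`. `set_bootstrap_image` is a hypothetical helper, not part of the repository. It uses `#` as the `sed` delimiter, so the new reference must not contain `#`, `&`, or `\`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point every `image:` line that mentions
# nvme-bootstrap in a manifest at a new image reference.
# Assumption: the manifest contains a line like
#   image: <registry>/nvme-bootstrap:<tag>
set_bootstrap_image() {
  local manifest="$1" image="$2"
  # Keep the original indentation (and an optional "- "); replace only the value.
  sed -E "s#^([[:space:]]*-?[[:space:]]*image:[[:space:]]*).*nvme-bootstrap.*#\1${image}#" \
    "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}

# Usage (from misc/nvme-bootstrap/kubernetes):
#   set_bootstrap_image daemonset.yaml your-registry/nvme-bootstrap:latest
```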

### Step 2: Deploy the NVMe Bootstrap Components

```bash
# From the root of the materialize repository, navigate to the Kubernetes manifests directory
cd misc/nvme-bootstrap/kubernetes

# Apply RBAC resources for the bootstrap component
kubectl apply -f rbac.yaml

# Deploy the DaemonSet (update the image reference if needed)
kubectl apply -f daemonset.yaml

# Wait for the pods to be ready
kubectl -n kube-system wait --for=condition=Ready pods -l app=nvme-disk-setup --timeout=120s

# Check the pod logs to monitor the setup
kubectl logs -n kube-system -l app=nvme-disk-setup
```

The DaemonSet will:

1. Run on all nodes in your cluster
2. Detect available NVMe devices
3. Create the "instance-store-vg" volume group
4. Make the storage available for OpenEBS
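
The volume group step can be sketched as a dry run. The helper below (hypothetical name `plan_volume_group`) only prints the LVM commands that would combine the detected devices into `instance-store-vg`. It does not reproduce what `configure-disks.sh` actually runs. A real implementation would also need to be idempotent, for example by skipping creation when `vgs instance-store-vg` already succeeds, because DaemonSet pods can restart.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the LVM commands that would turn the
# given devices into one volume group. It only echoes commands and does
# not touch any disks.
plan_volume_group() {
  local vg="$1"
  shift
  if (( $# == 0 )); then
    echo "no devices given for volume group $vg" >&2
    return 1
  fi
  local dev
  for dev in "$@"; do
    echo "pvcreate $dev"   # initialize each device as an LVM physical volume
  done
  echo "vgcreate $vg $*"   # group all physical volumes into one volume group
}

# Usage:
#   plan_volume_group instance-store-vg /dev/nvme1n1 /dev/nvme2n1
```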

### Step 3: Install OpenEBS

OpenEBS provides the CSI driver that interfaces with LVM to provide persistent storage:

```bash
# Add the OpenEBS Helm repository
helm repo add openebs https://openebs.github.io/openebs
helm repo update

# Create namespace for OpenEBS
kubectl create namespace openebs

# Install OpenEBS with only the necessary components
helm install openebs openebs/openebs \
--namespace openebs \
--set engines.replicated.mayastor.enabled=false
```

Verify the installation:

```bash
# Check if the LVM controller is running
kubectl get pods -n openebs -l role=openebs-lvm
```

### Step 4: Create and Test the Storage Class

```bash
# TODO: remove this step once the Terraform module handles this
# Create the StorageClass
kubectl apply -f storageclass.yaml

# Deploy a test PVC and Pod to verify functionality
kubectl apply -f test-pvc.yaml

# Check if the PVC is bound
kubectl get pvc test-lvm-pvc
```

If the PVC's `STATUS` column shows `Bound`, the storage class is working correctly.
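
You can script this check instead of re-running `kubectl get pvc`. A small polling loop works with any kubectl version; recent versions can do the same with `kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/test-lvm-pvc`. The helper name `wait_for_pvc_bound` and its defaults (60 seconds, `default` namespace) are assumptions. If the storage class uses `volumeBindingMode: WaitForFirstConsumer`, the PVC stays `Pending` until the test Pod is scheduled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a PVC until it reports phase "Bound" or the
# timeout (in seconds) expires. Uses `kubectl get -o jsonpath`.
wait_for_pvc_bound() {
  local pvc="$1" timeout="${2:-60}" namespace="${3:-default}"
  local waited=0 phase
  while (( waited < timeout )); do
    phase=$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [[ $phase == Bound ]]; then
      echo "$pvc is Bound"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$pvc not Bound after ${timeout}s (last phase: ${phase:-unknown})" >&2
  return 1
}

# Usage:
#   wait_for_pvc_bound test-lvm-pvc 120
```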

To clean up the test resources:

```bash
# Delete the test PVC and Pod
kubectl delete -f test-pvc.yaml
```

### Step 5: Configure Materialize to Use the Storage Class

When installing Materialize, provide the storage class configuration:

```bash
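# Create a Helm values file
cat > materialize-values.yaml << EOF
storage:
  storageClass:
    create: true
    name: "openebs-lvm-instance-store-ext4"
    provisioner: "local.csi.openebs.io"
    parameters:
      storage: "lvm"
      fsType: "ext4"
      volgroup: "instance-store-vg"
EOF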

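# Add the Materialize Helm repository
helm repo add materialize https://materializeinc.github.io/materialize
helm repo update

# Install Materialize with the storage configuration
helm install my-materialize-operator materialize/materialize-operator \
  --namespace materialize \
  --create-namespace \
  --set observability.podMetrics.enabled=true \
  --values materialize-values.yaml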
```

This configures Materialize to use the NVMe-backed storage class for its persistent storage needs.

If you are doing this with the [Materialize Helm Terraform module](https://github.com/materializeInc/terraform-helm-materialize), set `storage.storageClass` in the `materialize` module so that it uses `openebs-lvm-instance-store-ext4` (surrounding arguments omitted):

```hcl
# ...
storage = {
  storageClass = {
    create      = true
    name        = "openebs-lvm-instance-store-ext4"
    provisioner = "local.csi.openebs.io"
    parameters = {
      storage  = "lvm"
      fsType   = "ext4"
      volgroup = "instance-store-vg"
    }
  }
}
# ...
```

## Verifying the Setup

To verify your NVMe bootstrap setup is working correctly:

```bash
# Check the NVMe setup logs
kubectl logs -n kube-system -l app=nvme-disk-setup

# List the PVCs that use the NVMe-backed storage class
kubectl get pvc -A | grep openebs-lvm-instance-store-ext4
```
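
If you want a per-phase summary instead of a raw `grep`, print the PVC list with `custom-columns` and count it with `awk`. This is a sketch with a hypothetical function name. It assumes the four columns shown in the comment, in that order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count PVCs per phase for one storage class.
# Input (stdin): "NAMESPACE NAME PHASE STORAGECLASS" lines, as produced by
#   kubectl get pvc -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,SC:.spec.storageClassName
count_pvcs_by_phase() {
  local class="$1"
  # Keep rows whose 4th column is the class, tally the 3rd column (phase).
  awk -v sc="$class" '$4 == sc { n[$3]++ } END { for (p in n) print p, n[p] }' | sort
}

# Usage:
#   kubectl get pvc -A --no-headers -o custom-columns=... \
#     | count_pvcs_by_phase openebs-lvm-instance-store-ext4
```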

## Troubleshooting

### Common Issues and Solutions

#### No NVMe Devices Found

**Symptom**: The bootstrap logs show "No suitable NVMe devices found"

**Solution**:
- Verify you're using instance types with NVMe instance storage (with a "d" suffix, such as `r6gd`)
- Check each node's instance type with the well-known `node.kubernetes.io/instance-type` label:
  ```bash
  kubectl get nodes -L node.kubernetes.io/instance-type
  ```
- If using AWS, ensure you're using types like `r6gd.2xlarge`, not `r6g.2xlarge`

#### Pod Fails to Create Storage

**Symptom**: LVM setup fails or PVCs remain in Pending status

**Solution**:
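- Check if OpenEBS components are running:
  ```bash
  kubectl get pods -n openebs
  ```
- Verify the volume group exists on the node. The debug image needs the LVM tools (for example, the bootstrap image built in Step 1), and the `sysadmin` profile gives it access to the node's block devices:
  ```bash
  kubectl debug node/$NODE_NAME -it --profile=sysadmin --image=your-registry/nvme-bootstrap:latest -- vgs
  ```
- Check OpenEBS logs:
  ```bash
  kubectl logs -n openebs -l role=openebs-lvm
  ```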

#### Permission Issues

**Symptom**: Permission denied errors in logs

**Solution**:
- Verify RBAC resources are correctly applied:
  ```bash
  kubectl get clusterrole node-taint-manager
  kubectl get clusterrolebinding nvme-setup-taint-binding
  ```
- Check the service account:
  ```bash
  kubectl get serviceaccount nvme-setup-sa -n kube-system
  ```
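
The three RBAC checks above can be combined into one pass that reports what is missing. The object names come from this guide. The helper name `check_bootstrap_rbac` is hypothetical, and it relies on `kubectl get` exiting non-zero when an object does not exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the bootstrap's RBAC objects exist.
# Returns non-zero if any object is missing.
check_bootstrap_rbac() {
  local missing=0 c
  local -a checks=(
    "clusterrole node-taint-manager"
    "clusterrolebinding nvme-setup-taint-binding"
    "serviceaccount nvme-setup-sa -n kube-system"
  )
  for c in "${checks[@]}"; do
    # $c is intentionally unquoted so "-n kube-system" becomes separate arguments.
    if kubectl get $c >/dev/null 2>&1; then
      echo "ok: $c"
    else
      echo "missing: $c"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_bootstrap_rbac
```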
misc/nvme-bootstrap/container/Dockerfile (28 additions, 0 deletions)
@@ -0,0 +1,28 @@
# Copyright Materialize, Inc. and contributors. All rights reserved.
#
# Use of this software is governed by the Business Source License
# included in the LICENSE file at the root of this repository.
#
# As of the Change Date specified in that file, in accordance with
# the Business Source License, use of this software will be governed
# by the Apache License, Version 2.0.

FROM alpine:3.19

RUN apk add --no-cache \
    nvme-cli \
    lvm2 \
    lsblk \
    bash \
    jq \
    curl \
    kubectl

# LVM configuration file
COPY lvm.conf /etc/lvm/lvm.conf
# Disk configuration script
COPY configure-disks.sh /usr/local/bin/configure-disks.sh
# Taint management script
COPY manage-taints.sh /usr/local/bin/manage-taints.sh

RUN chmod +x /usr/local/bin/configure-disks.sh /usr/local/bin/manage-taints.sh