These instructions describe how to deploy a DAOS Cluster using the example in terraform/examples/daos_cluster.
Deployment tasks described in these instructions:
- Deploy a DAOS cluster using Terraform
- Log into the first DAOS client instance
- Perform DAOS administrative tasks to prepare the storage
- Mount a DAOS container with DFuse (DAOS FUSE)
- Store files in a DAOS container
- Unmount the container
- Undeploy DAOS cluster (terraform destroy)
The steps in the Pre-Deployment Guide must be completed prior to deploying the DAOS cluster in this example.
The Pre-Deployment Guide describes how to build the DAOS images that are used to deploy server and client instances.
Clone the daos-stack/google-cloud-daos repository and change your working directory to the DAOS Cluster example directory.
cd ~/
git clone https://github.com/daos-stack/google-cloud-daos.git
cd ~/google-cloud-daos/terraform/examples/daos_cluster
Before you run terraform apply to deploy the DAOS cluster, you need to create a terraform.tfvars file in the terraform/examples/daos_cluster directory.
The terraform.tfvars file contains the variable values for the configuration.
To help ensure a successful deployment, the directory provides two pre-configured terraform.tfvars.*.example files to choose from.
Decide which of these files to copy to terraform.tfvars.
The terraform.tfvars.tco.example file contains variables for a DAOS cluster deployment with:
- 16 DAOS Client instances
- 4 DAOS Server instances
Each server instance has sixteen 375GB NVMe SSDs.
To use the terraform.tfvars.tco.example file, run:
cp terraform.tfvars.tco.example terraform.tfvars
The terraform.tfvars.perf.example file contains variables for a DAOS cluster deployment with:
- 16 DAOS Client instances
- 4 DAOS Server instances
Each server instance has four 375GB NVMe SSDs.
To use the terraform.tfvars.perf.example file, run:
cp terraform.tfvars.perf.example terraform.tfvars
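If you want to avoid accidentally overwriting a terraform.tfvars file you have already edited, the copy step can be wrapped in a small shell function. This is a minimal sketch: select_tfvars is a hypothetical helper, not part of the google-cloud-daos repository, and it assumes only the two example file names shown above.

```shell
#!/usr/bin/env bash
# select_tfvars is a hypothetical helper, not part of google-cloud-daos.
# It copies terraform.tfvars.<variant>.example to terraform.tfvars and
# refuses to overwrite an existing terraform.tfvars.
# Usage: select_tfvars tco|perf [example_dir]
select_tfvars() {
  local variant="$1"
  local dir="${2:-.}"
  case "${variant}" in
    tco|perf) ;;
    *)
      echo "usage: select_tfvars tco|perf [example_dir]" >&2
      return 2
      ;;
  esac
  local src="${dir}/terraform.tfvars.${variant}.example"
  local dst="${dir}/terraform.tfvars"
  if [ ! -f "${src}" ]; then
    echo "missing ${src}" >&2
    return 1
  fi
  if [ -e "${dst}" ]; then
    echo "${dst} already exists; not overwriting it" >&2
    return 1
  fi
  cp "${src}" "${dst}"
}
```

For example, running select_tfvars perf in the daos_cluster directory has the same effect as the cp command above, except that it stops with an error if terraform.tfvars already exists.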
Now that you have a terraform.tfvars file, replace the variable placeholders in it with the values from your active gcloud configuration.
To update the variables in terraform.tfvars, run:
PROJECT_ID=$(gcloud config list --format 'value(core.project)')
REGION=$(gcloud config list --format 'value(compute.region)')
ZONE=$(gcloud config list --format 'value(compute.zone)')
sed -i "s/<project_id>/${PROJECT_ID}/g" terraform.tfvars
sed -i "s/<region>/${REGION}/g" terraform.tfvars
sed -i "s/<zone>/${ZONE}/g" terraform.tfvars
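If gcloud has no default region or zone configured, the commands above silently substitute empty strings. The sketch below performs the same three substitutions but refuses empty values first and then checks that no placeholder is left. fill_tfvars is a hypothetical helper, not part of google-cloud-daos, and it assumes GNU sed (for sed -i), as the commands above already do.

```shell
#!/usr/bin/env bash
# fill_tfvars is a hypothetical helper, not part of google-cloud-daos.
# It performs the same substitutions as the sed commands above, but first
# refuses empty values (e.g. no default region or zone in gcloud) and
# afterwards checks that no placeholder is left in the file.
# Usage: fill_tfvars PROJECT_ID REGION ZONE [tfvars_file]
fill_tfvars() {
  local project="$1" region="$2" zone="$3"
  local file="${4:-terraform.tfvars}"
  local pair
  for pair in "project_id=${project}" "region=${region}" "zone=${zone}"; do
    # ${pair#*=} is the value part; empty means gcloud returned nothing
    if [ -z "${pair#*=}" ]; then
      echo "no value for ${pair%%=*}; check your gcloud configuration" >&2
      return 1
    fi
  done
  # Same substitutions as above, combined into one sed invocation
  sed -i \
    -e "s/<project_id>/${project}/g" \
    -e "s/<region>/${region}/g" \
    -e "s/<zone>/${zone}/g" \
    "${file}"
  # Fail if any of the three placeholders survived
  if grep -qE '<(project_id|region|zone)>' "${file}"; then
    echo "placeholders remain in ${file}" >&2
    return 1
  fi
}
```

After setting PROJECT_ID, REGION and ZONE as above, you would call it as fill_tfvars "${PROJECT_ID}" "${REGION}" "${ZONE}" in place of the three sed commands.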
Billing Notification!
Running this example will incur charges in your project.
To avoid surprises, be sure to monitor your costs associated with running this example.
Don't forget to shut down the DAOS cluster with terraform destroy when you are finished.
To deploy the DAOS cluster, run:
terraform init
terraform plan -out=tfplan
terraform apply tfplan
Verify that the daos-client and daos-server instances are running.
gcloud compute instances list \
--filter="name ~ daos" \
--format="value(name,INTERNAL_IP)"
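Both example tfvars files deploy 16 client and 4 server instances, so the listing above can be checked mechanically. This is a minimal sketch: check_instance_counts is a hypothetical helper, not part of google-cloud-daos, and it assumes instance names start with daos-client- and daos-server- as in this example.

```shell
#!/usr/bin/env bash
# check_instance_counts is a hypothetical helper, not part of google-cloud-daos.
# It reads the name/IP lines printed by the gcloud command above on stdin,
# counts the daos-client-* and daos-server-* instances, and succeeds only
# if the counts match the expected values.
# Usage: gcloud compute instances list ... | check_instance_counts CLIENTS SERVERS
check_instance_counts() {
  awk -v want_clients="$1" -v want_servers="$2" '
    $1 ~ /^daos-client-/ { clients++ }
    $1 ~ /^daos-server-/ { servers++ }
    END {
      printf "clients=%d servers=%d\n", clients, servers
      if (clients == want_clients && servers == want_servers) exit 0
      exit 1
    }'
}
```

For this example you would pipe the gcloud output into check_instance_counts 16 4; other instances matched by the "name ~ daos" filter are ignored because only the two name prefixes are counted.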
Log into the first client instance:
gcloud compute ssh daos-client-0001
The dmg command performs administrative tasks such as formatting storage and managing pools, and therefore must be run with sudo.
Use dmg to verify that the DAOS storage system is ready:
sudo dmg system query -v
The State column should display "Joined" for all servers.
Rank UUID                                 Control Address   Fault Domain      State  Reason
---- ----                                 ---------------   ------------      -----  ------
0    0796c576-5651-4e37-aa15-09f333d2d2b8 10.128.0.35:10001 /daos-server-0001 Joined
1    f29f7058-8abb-429f-9fd3-8b13272d7de0 10.128.0.77:10001 /daos-server-0003 Joined
2    09fc0dab-c238-4090-b3f8-da2bd4dce108 10.128.0.81:10001 /daos-server-0002 Joined
3    2cc9140b-fb12-4777-892e-7d190f6dfb0f 10.128.0.30:10001 /daos-server-0004 Joined
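When scripting the deployment, the "Joined" check can be automated by parsing the query output. This is a sketch that assumes the column layout shown above (Rank, UUID, Control Address, Fault Domain, State, Reason); all_ranks_joined is a hypothetical helper, not a dmg feature, and other DAOS releases may format this table differently.

```shell
#!/usr/bin/env bash
# all_ranks_joined is a hypothetical helper, not part of DAOS.
# It reads `sudo dmg system query -v` output on stdin and succeeds only if
# every rank row reports the Joined state. It assumes the column layout
# shown above, so State is the 5th whitespace-separated field of a rank row.
all_ranks_joined() {
  awk '
    # Rank rows start with a numeric rank; header and separator rows do not
    $1 ~ /^[0-9]+$/ {
      ranks++
      if ($5 != "Joined") {
        not_joined++
        printf "rank %s is %s\n", $1, $5
      }
    }
    END {
      if (ranks > 0 && not_joined == 0) {
        printf "%d ranks joined\n", ranks
        exit 0
      }
      exit 1
    }'
}
```

On the client you would run sudo dmg system query -v | all_ranks_joined; it fails if no rank rows are found at all, so an empty or error response is not mistaken for success.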
View the amount of free NVMe storage.
sudo dmg storage query usage
The output depends on which terraform.tfvars.*.example file you copied to create the terraform.tfvars file, and will look similar to this:
Hosts            SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----            --------- -------- -------- ---------- --------- ---------
daos-server-0001 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0002 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0003 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0004 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
The NVMe-Free column shows how much free NVMe space is available on each server.
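To see the cluster-wide total rather than per-server values, you can add up the NVMe-Free column; with the sample output above that is 4 x 1.6 TB = 6.4 TB. The sketch below assumes the layout shown above, where every size is a number followed by a separate unit token (treated here as decimal GB, TB or PB); total_nvme_free is a hypothetical helper, not a dmg feature.

```shell
#!/usr/bin/env bash
# total_nvme_free is a hypothetical helper, not part of DAOS.
# It reads `sudo dmg storage query usage` output on stdin and adds up the
# NVMe-Free column across hosts, reporting the total in TB.
# Assumes the layout shown above: each size is a number followed by a
# separate unit token (GB, TB or PB, treated as decimal units), so the
# NVMe-Free value and unit are fields 10 and 11 of a host row.
total_nvme_free() {
  awk '
    function to_tb(value, unit) {
      if (unit == "PB") return value * 1000
      if (unit == "TB") return value
      if (unit == "GB") return value / 1000
      return 0  # unknown unit: not counted
    }
    # Host rows have a numeric second field; header and separator rows do not
    NF >= 13 && $2 ~ /^[0-9.]+$/ { total += to_tb($10, $11) }
    END { printf "%.1f TB\n", total }'
}
```

Run it as sudo dmg storage query usage | total_nvme_free. Because dmg rounds each per-host value before printing, the total is only approximate.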
Create a pool named pool1 that uses all of the free NVMe space (NVMe-Free) on all servers:
sudo dmg pool create --size="100%" pool1
View the ACLs on pool1
sudo dmg pool get-acl pool1
# Owner: root@
# Owner Group: root@
# Entries:
A::OWNER@:rw
A:G:GROUP@:rw
Here we see that root owns the pool.
Add an access control entry (ACE) that allows any user to create a container in the pool:
sudo dmg pool update-acl -e A::EVERYONE@:rcta pool1
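The ACE A::EVERYONE@:rcta packs several fields into one string. The sketch below splits an ACE and spells out its permission letters. It rests on an assumption, stated loudly: that an ACE has the form TYPE:FLAGS:IDENTITY:PERMISSIONS and that the pool permission letters mean what the table in the code says. Verify both against the access control documentation for your DAOS release. describe_ace is a hypothetical helper, not a DAOS command.

```shell
#!/usr/bin/env bash
# describe_ace is a hypothetical helper, not part of DAOS.
# ASSUMPTION: an ACE has the form TYPE:FLAGS:IDENTITY:PERMISSIONS, and the
# pool permission letters mean what the table below says; check both
# against the access control documentation for your DAOS release.
describe_ace() {
  local ace_type flags identity perms rest p
  # Split on ":"; an empty FLAGS field (as in A::EVERYONE@:rcta) stays empty
  IFS=: read -r ace_type flags identity perms <<EOF
$1
EOF
  echo "type=${ace_type} flags=${flags:-none} identity=${identity}"
  rest="${perms}"
  while [ -n "${rest}" ]; do
    p="${rest%"${rest#?}"}"  # first character
    rest="${rest#?}"         # drop the first character
    case "${p}" in
      r) echo "r: read" ;;
      w) echo "w: write" ;;
      c) echo "c: create container" ;;
      d) echo "d: delete container" ;;
      t) echo "t: get property" ;;
      T) echo "T: set property" ;;
      a) echo "a: get ACL" ;;
      A) echo "A: set ACL" ;;
      o) echo "o: set owner" ;;
      *) echo "${p}: unknown" ;;
    esac
  done
}
```

Under these assumptions, describe_ace 'A::EVERYONE@:rcta' reports an allow entry for everyone with read, create-container, get-property and get-ACL permissions, which is what lets non-root users run daos container create on pool1.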
For more information about pools see
Create a POSIX container named cont1 in the pool:
daos container create --type=POSIX --properties=rf:0 pool1 cont1
For more information about containers see
Mount the container with dfuse
MOUNT_DIR="${HOME}/daos/cont1"
mkdir -p "${MOUNT_DIR}"
dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${MOUNT_DIR}"
df -h -t fuse.daos
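In a script, rather than reading the df output, you can test whether the mount point is actually a DFuse mount. This sketch assumes the mount shows up in /proc/mounts with filesystem type fuse.daos, the same type df -t fuse.daos filters on. is_dfuse_mounted is a hypothetical helper, and paths containing spaces (escaped in /proc/mounts) are not handled.

```shell
#!/usr/bin/env bash
# is_dfuse_mounted is a hypothetical helper, not part of DAOS.
# It succeeds if DIR appears in the mount table with filesystem type
# fuse.daos, the same type that `df -t fuse.daos` filters on.
# The mount table defaults to /proc/mounts; the optional second argument
# exists so the function can be tested against a sample file.
# Paths containing spaces are escaped in /proc/mounts and are not handled.
is_dfuse_mounted() {
  local dir="$1" mounts="${2:-/proc/mounts}"
  awk -v dir="${dir}" '
    $2 == dir && $3 == "fuse.daos" { found = 1 }
    END { if (found) exit 0; exit 1 }' "${mounts}"
}
```

For example, is_dfuse_mounted "${MOUNT_DIR}" && echo "cont1 is mounted". The path must match the mount table exactly, so pass it without a trailing slash.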
You can now store files in the DAOS container mounted on ${HOME}/daos/cont1.
For more information about DFuse see the DAOS FUSE section of the User Guide.
With the cont1 container mounted on ${HOME}/daos/cont1, create a 20 GiB file that will be stored in the DAOS file system:
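cd "${HOME}/daos/cont1"
# Create a 20 GiB file (with GNU dd, bs=1G means 1 GiB).
# LD_PRELOAD loads the DAOS I/O interception library (libioil) so that
# reads and writes go directly to DAOS instead of through FUSE.
time LD_PRELOAD=/usr/lib64/libioil.so \
  dd if=/dev/zero of=./test20.img bs=1G count=20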
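Because GNU dd treats bs=1G as 1 GiB (1073741824 bytes), the file should be exactly 20 x 1073741824 = 21474836480 bytes, about 21.5 GB in decimal units. A minimal sketch for confirming that the write completed, using a hypothetical helper named check_file_size:

```shell
#!/usr/bin/env bash
# check_file_size is a hypothetical helper, not part of DAOS.
# GNU dd treats bs=1G as 1 GiB (1073741824 bytes), so bs=1G count=20
# should produce 20 * 1073741824 = 21474836480 bytes.
# Usage: check_file_size FILE EXPECTED_BYTES
check_file_size() {
  local file="$1" expected="$2" actual
  actual=$(wc -c < "${file}") || return 1
  actual=$((actual))  # normalize any padding some wc implementations add
  if [ "${actual}" -ne "${expected}" ]; then
    echo "${file}: ${actual} bytes, expected ${expected}" >&2
    return 1
  fi
  echo "${file}: ${actual} bytes"
}
```

Still inside the mount directory, you would run check_file_size ./test20.img $((20 * 1024 * 1024 * 1024)) before changing back to your home directory.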
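When you are finished, leave the mount point, unmount the container, and log out of the client instance:
cd ~/
fusermount -u "${HOME}/daos/cont1"
logout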
To destroy the DAOS cluster, run the following from the terraform/examples/daos_cluster directory where you ran terraform apply:
terraform destroy
This destroys all DAOS server and client instances.
You have successfully deployed a DAOS cluster using the terraform/examples/daos_cluster example!