Use Hosted Control Planes for ROSA to speed up cluster creation #748

Merged (2 commits) on Apr 2, 2024
2 changes: 1 addition & 1 deletion .github/actions/aurora-delete-database/action.yaml
@@ -18,4 +18,4 @@ runs:
working-directory: provision/aws/rds
env:
AURORA_CLUSTER: ${{ inputs.name }}
AWS_REGION: ${{ inputs.region }}
AURORA_REGION: ${{ inputs.region }}
23 changes: 6 additions & 17 deletions .github/workflows/rosa-cluster-create.yml
@@ -8,7 +8,6 @@ on:
type: string
computeMachineType:
description: 'Instance type for the compute nodes'
default: m5.xlarge
type: string
multiAz:
description: 'Deploy to multiple availability zones in the region'
@@ -20,7 +19,6 @@
type: string
replicas:
description: 'Number of worker nodes to provision'
default: '2'
type: string
region:
description: 'The AWS region to create the cluster in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
@@ -33,23 +31,10 @@
type: string
computeMachineType:
description: 'Instance type for the compute nodes'
required: true
default: m5.xlarge
type: string
multiAz:
description: 'Deploy to multiple availability zones in the region'
required: true
default: false
type: boolean
availabilityZones:
description: 'Availability zones to deploy to'
required: false
default: ''
type: string
replicas:
description: 'Number of worker nodes to provision'
required: true
default: '2'
type: string
region:
description: 'The AWS region to create the cluster in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
@@ -76,16 +61,20 @@ jobs:
aws-default-region: ${{ vars.AWS_DEFAULT_REGION }}
rosa-token: ${{ secrets.ROSA_TOKEN }}

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Create ROSA Cluster
run: ./rosa_create_cluster.sh
working-directory: provision/aws
env:
VERSION: ${{ env.OPENSHIFT_VERSION }}
CLUSTER_NAME: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}
COMPUTE_MACHINE_TYPE: ${{ inputs.computeMachineType }}
MULTI_AZ: ${{ inputs.multiAz }}
AVAILABILITY_ZONES: ${{ inputs.availabilityZones }}
REPLICAS: ${{ inputs.replicas }}
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Archive ROSA logs
uses: actions/upload-artifact@v4
8 changes: 8 additions & 0 deletions .github/workflows/rosa-cluster-delete.yml
@@ -45,19 +45,27 @@ jobs:
with:
clusterName: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Delete a ROSA Cluster
if: ${{ inputs.deleteAll == 'no' }}
shell: bash
run: ./rosa_delete_cluster.sh
working-directory: provision/aws
env:
CLUSTER_NAME: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Delete all ROSA Clusters
if: ${{ inputs.deleteAll == 'yes' }}
shell: bash
run: ./rosa_cluster_reaper.sh
working-directory: provision/aws
env:
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Archive ROSA logs
uses: actions/upload-artifact@v4
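For reference, a rough local equivalent of the delete steps above, using the same environment variables the job passes (the cluster name and token values here are placeholders):

```bash
# Rough local equivalent of the "Delete a ROSA Cluster" step (placeholder values).
cd provision/aws
export CLUSTER_NAME="gh-example"
export TF_VAR_rhcs_token="<ROSA_TOKEN>"  # token consumed by the OpenTofu RHCS provider
./rosa_delete_cluster.sh

# Or, to remove all clusters as in the "Delete all ROSA Clusters" step:
# ./rosa_cluster_reaper.sh
```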
4 changes: 2 additions & 2 deletions .github/workflows/rosa-multi-az-cluster-create.yml
@@ -86,8 +86,8 @@ jobs:

- name: Scale ROSA clusters
run: |
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-a --min-replicas 3 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-b --min-replicas 3 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-a --min-replicas 3 --max-replicas 10 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-b --min-replicas 3 --max-replicas 10 scaling
- name: Setup Go Task
uses: ./.github/actions/task-setup
4 changes: 4 additions & 0 deletions .gitignore
@@ -96,3 +96,7 @@ quarkus/data/*.db
# Horreum #
###########
provision/environment_data.json

# OpenTofu / Terraform
**/*.tfstate*
**/*.terraform*
@@ -14,6 +14,7 @@ See <<aws-efs-as-readwritemany-storage>> for more information.
== Prerequisites

. xref:prerequisite/prerequisite-awscli.adoc[]
. https://opentofu.org/docs/intro/install/[Install OpenTofu]
. Perform the steps outlined in the https://console.redhat.com/openshift/create/rosa/getstarted[ROSA installation guide]:
.. Enable ROSA Service in AWS account
.. Download and install the ROSA command line tool
@@ -47,8 +48,6 @@ If no `ADMIN_PASSWORD` is provided in the configuration, it reads it from the AW
`VERSION`:: OpenShift cluster version.
`REGION`:: AWS region where the cluster should run.
`COMPUTE_MACHINE_TYPE`:: https://aws.amazon.com/ec2/instance-types/[AWS instance type] for the default OpenShift worker machine pool.
`MULTI_AZ`:: Boolean parameter to indicate whether the OpenShift cluster should span many Availability Zones within the selected region.
`AVAILABILITY_ZONES`:: Comma separated list of Availability Zones to use for the cluster. For example, `eu-central-1a,eu-central-1b`.
`REPLICAS`:: Number of worker nodes.
If multi-AZ installation is selected, then this needs to be a multiple of the number of AZs available in the region.
For example, if the region has 3 AZs, then replicas need to be set to some multiple of 3.
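As a rough illustration of how these variables fit together when running the script locally (all values below are placeholders; the OpenTofu-based flow additionally expects a Red Hat Cloud Services token via TF_VAR_rhcs_token, as the GitHub workflow passes it):

```bash
# Illustrative only — placeholder values; adjust to your AWS account and region.
cd provision/aws
export CLUSTER_NAME="gh-example"
export REGION="eu-central-1"
export VERSION="4.15.0"                  # optional: OpenShift version override
export COMPUTE_MACHINE_TYPE="m5.xlarge"  # optional: worker instance type
export REPLICAS="2"                      # optional: number of worker nodes
export TF_VAR_rhcs_token="<ROSA_TOKEN>"  # token consumed by the OpenTofu RHCS provider
./rosa_create_cluster.sh
```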
1 change: 1 addition & 0 deletions provision/aws/efs/.gitignore
@@ -1,2 +1,3 @@
manifests
ccoctl
iam-trust.json

This file was deleted.

44 changes: 44 additions & 0 deletions provision/aws/efs/iam-policy.json
@@ -0,0 +1,44 @@
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"elasticfilesystem:DescribeMountTargets",
"elasticfilesystem:DescribeAccessPoints",
"elasticfilesystem:DescribeFileSystems",
"elasticfilesystem:ClientMount",
"elasticfilesystem:ClientWrite",
"elasticfilesystem:CreateTags",
"elasticfilesystem:CreateMountTarget",
"elasticfilesystem:DeleteMountTarget",
"elasticfilesystem:DeleteTags",
"elasticfilesystem:TagResource",
"elasticfilesystem:UntagResource"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"elasticfilesystem:CreateAccessPoint"
],
"Resource": "*",
"Condition": {
"StringLike": {
"aws:RequestTag/efs.csi.aws.com/cluster": "true"
}
}
},
{
"Effect": "Allow",
"Action": "elasticfilesystem:DeleteAccessPoint",
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:ResourceTag/efs.csi.aws.com/cluster": "true"
}
}
}
]
}
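For context, a policy document like this would typically be registered with IAM and attached to the role assumed by the EFS CSI driver, roughly along these lines (the policy and role names below are hypothetical):

```bash
# Sketch: create the IAM policy from this file and attach it to the driver's role.
# "rosa-efs-csi-policy" and "my-cluster-efs-csi-operator-role" are placeholder names.
aws iam create-policy \
  --policy-name rosa-efs-csi-policy \
  --policy-document file://provision/aws/efs/iam-policy.json

aws iam attach-role-policy \
  --role-name my-cluster-efs-csi-operator-role \
  --policy-arn "arn:aws:iam::<ACCOUNT_ID>:policy/rosa-efs-csi-policy"
```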
2 changes: 1 addition & 1 deletion provision/aws/rds/aurora_create_peering_connection.sh
@@ -64,7 +64,7 @@ aws ec2 accept-vpc-peering-connection \

# Update the ROSA Cluster VPC's Route Table
ROSA_PUBLIC_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables \
--filters "Name=vpc-id,Values=${ROSA_VPC}" "Name=association.main,Values=true" \
--filters "Name=vpc-id,Values=${ROSA_VPC}" "Name=tag:Name,Values=*public*" \
--query "RouteTables[*].RouteTableId" \
--output text
)
134 changes: 54 additions & 80 deletions provision/aws/rosa_create_cluster.sh
@@ -1,100 +1,74 @@
#!/usr/bin/env bash
set -e

if [[ "$RUNNER_DEBUG" == "1" ]]; then
set -x
fi

if [ -f ./.env ]; then
source ./.env
fi

function requiredEnv() {
for ENV in $@; do
if [ -z "${!ENV}" ]; then
echo "${ENV} variable must be set"
exit 1
fi
done
}

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

AWS_ACCOUNT=${AWS_ACCOUNT:-$(aws sts get-caller-identity --query "Account" --output text)}
if [ -z "$AWS_ACCOUNT" ]; then echo "Variable AWS_ACCOUNT needs to be set."; exit 1; fi

if [ -z "$VERSION" ]; then echo "Variable VERSION needs to be set."; exit 1; fi
CLUSTER_NAME=${CLUSTER_NAME:-$(whoami)}
if [ -z "$CLUSTER_NAME" ]; then echo "Variable CLUSTER_NAME needs to be set."; exit 1; fi
if [ -z "$REGION" ]; then echo "Variable REGION needs to be set."; exit 1; fi
if [ -z "$COMPUTE_MACHINE_TYPE" ]; then echo "Variable COMPUTE_MACHINE_TYPE needs to be set."; exit 1; fi

if [ "$MULTI_AZ" = "true" ]; then MULTI_AZ_PARAM="--multi-az"; else MULTI_AZ_PARAM=""; fi
if [ -z "$AVAILABILITY_ZONES" ]; then AVAILABILITY_ZONES_PARAM=""; else AVAILABILITY_ZONES_PARAM="--availability-zones $AVAILABILITY_ZONES"; fi
if [ -z "$REPLICAS" ]; then echo "Variable REPLICAS needs to be set."; exit 1; fi

echo "Checking if cluster ${CLUSTER_NAME} already exists."
if rosa describe cluster --cluster="${CLUSTER_NAME}"; then
echo "Cluster ${CLUSTER_NAME} already exists."
else
echo "Verifying ROSA prerequisites."
echo "Check if AWS CLI is installed."; aws --version
echo "Check if ROSA CLI is installed."; rosa version
echo "Check if ELB service role is enabled."
if ! aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" --no-cli-pager; then
aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
fi
rosa whoami
rosa verify quota

echo "Installing ROSA cluster ${CLUSTER_NAME}"

MACHINE_CIDR=$(./rosa_machine_cidr.sh)

ROSA_CMD="rosa create cluster \
--sts \
--cluster-name ${CLUSTER_NAME} \
--version ${VERSION} \
--role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Installer-Role \
--support-role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Support-Role \
--controlplane-iam-role arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-ControlPlane-Role \
--worker-iam-role arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Worker-Role \
--operator-roles-prefix ${CLUSTER_NAME} \
--region ${REGION} ${MULTI_AZ_PARAM} ${AVAILABILITY_ZONES_PARAM} \
--replicas ${REPLICAS} \
--compute-machine-type ${COMPUTE_MACHINE_TYPE} \
--machine-cidr ${MACHINE_CIDR} \
--service-cidr 172.30.0.0/16 \
--pod-cidr 10.128.0.0/14 \
--host-prefix 23"

echo $ROSA_CMD
$ROSA_CMD

requiredEnv AWS_ACCOUNT CLUSTER_NAME REGION

export CLUSTER_NAME=${CLUSTER_NAME:-$(whoami)}

echo "Verifying ROSA prerequisites."
echo "Check if AWS CLI is installed."; aws --version
echo "Check if ROSA CLI is installed."; rosa version
echo "Check if ELB service role is enabled."
if ! aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" --no-cli-pager; then
aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
fi
rosa whoami
rosa verify quota

mkdir -p "logs/${CLUSTER_NAME}"
echo "Installing ROSA cluster ${CLUSTER_NAME}"

function custom_date() {
date '+%Y%m%d-%H%M%S'
}
cd ${SCRIPT_DIR}/../opentofu/modules/rosa/hcp
tofu init
tofu workspace new ${CLUSTER_NAME} || true
Contributor:
I am confused why this is here. Is each cluster in a separate workspace? My expectation was that a new workspace would be created at the beginning and that all resources would then be created in that workspace. For example, the daily run would have only one workspace. Did I miss something?

Contributor (author):
I don't think it's possible for us to use a single workspace for two clusters, as we're effectively using the same module but with different configurations in order to create the two clusters. I think that, on executing apply a second time with a different cluster_name, OpenTofu would attempt to update the first cluster's resources.

This is all new to me as well though, so maybe I'm misunderstanding 🙂

Contributor:
It is new to me as well, but using the same module more than once should be possible.

For example, I used two modules here: https://github.com/mhajas/keycloak-benchmark/blob/opentofu-poc/provision/opentofu/main.tf#L7-L14

You can also specify aliases for providers and use multiple provider configurations within one tf file, like this:
https://github.com/mhajas/keycloak-benchmark/blob/opentofu-poc/provision/opentofu/infinispan/main.tf#L7
and
https://github.com/mhajas/keycloak-benchmark/blob/opentofu-poc/provision/opentofu/infinispan/main.tf#L92

But as I said, it is ok for me to merge this as is and play with enhancements later.

Contributor (author):
> For example, I used two modules here: https://github.com/mhajas/keycloak-benchmark/blob/opentofu-poc/provision/opentofu/main.tf#L7-L14

I hadn't considered referencing two modules from the root like that 🙂

I think the downside to that approach is that we would have to make more changes to our various *.sh scripts and GH actions, as we would either need to replace the use of rosa_create_cluster.sh or adapt it to support creating multiple clusters at once.

The flip side is that it would definitely make it easier to create multiple clusters in parallel, as the first stage of the module could be to determine CIDR ranges for both clusters before provisioning them.
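To make the trade-off being discussed concrete, the workspace-per-cluster approach keeps each cluster's state isolated even though both clusters are created from the same module. A rough sketch, with hypothetical cluster names and region:

```bash
# Each cluster gets its own workspace, so each apply only manages that cluster's state.
cd provision/opentofu/modules/rosa/hcp
tofu init

tofu workspace new cluster-a || tofu workspace select cluster-a
tofu apply -auto-approve -var cluster_name=cluster-a -var region=eu-central-1

tofu workspace new cluster-b || tofu workspace select cluster-b
tofu apply -auto-approve -var cluster_name=cluster-b -var region=eu-central-1

# Without switching workspaces, the second apply would try to mutate cluster-a's resources.
```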

export TF_WORKSPACE=${CLUSTER_NAME}

TOFU_CMD="tofu apply -auto-approve \
-var cluster_name=${CLUSTER_NAME} \
-var region=${REGION}"

if [ -n "${COMPUTE_MACHINE_TYPE}" ]; then
TOFU_CMD+=" -var instance_type=${COMPUTE_MACHINE_TYPE}"
fi

if [ -n "${VERSION}" ]; then
TOFU_CMD+=" -var openshift_version=${VERSION}"
fi

if [ -n "${REPLICAS}" ]; then
TOFU_CMD+=" -var replicas=${REPLICAS}"
fi

echo "Creating operator roles."
rosa create operator-roles --cluster "${CLUSTER_NAME}" --mode auto --yes > "logs/${CLUSTER_NAME}/$(custom_date)_create-operator-roles.log"

echo "Creating OIDC provider."
rosa create oidc-provider --cluster "${CLUSTER_NAME}" --mode auto --yes > "logs/${CLUSTER_NAME}/$(custom_date)_create-oidc-provider.log"

echo "Waiting for cluster installation to finish."
# There have been failures with 'ERR: Failed to watch logs for cluster ... connection reset by peer' probably because services in the cluster were restarting during the cluster initialization.
# Those errors don't show an installation problem, and installation will continue asynchronously. Therefore, retry.
TIMEOUT=$(($(date +%s) + 3600))
while true ; do
if (rosa logs install --cluster "${CLUSTER_NAME}" --watch --tail=1000000 >> "logs/${CLUSTER_NAME}/$(custom_date)_create-cluster.log"); then
break
fi
if (( TIMEOUT < $(date +%s))); then
echo "Timeout exceeded"
exit 1
fi
echo "retrying watching logs after failure"
sleep 1
done

echo "Cluster installation complete."
echo

./rosa_recreate_admin.sh
echo ${TOFU_CMD}
${TOFU_CMD}

SCALING_MACHINE_POOL=$(rosa list machinepools -c "${CLUSTER_NAME}" -o json | jq -r '.[] | select(.id == "scaling") | .id')
if [[ "${SCALING_MACHINE_POOL}" != "scaling" ]]; then
rosa create machinepool -c "${CLUSTER_NAME}" --instance-type m5.4xlarge --max-replicas 10 --min-replicas 0 --name scaling --enable-autoscaling
rosa create machinepool -c "${CLUSTER_NAME}" --instance-type m5.4xlarge --max-replicas 10 --min-replicas 1 --name scaling --enable-autoscaling
fi

cd ${SCRIPT_DIR}
./rosa_oc_login.sh
./rosa_efs_create.sh
../infinispan/install_operator.sh
