Adding a machinepool for load tests
ahus1 committed Jun 29, 2023
1 parent 8870d45 commit 67f7dd6
Showing 6 changed files with 47 additions and 6 deletions.
13 changes: 12 additions & 1 deletion .github/workflows/keycloak-scalability-benchmark.yml
@@ -27,7 +27,8 @@ on:
initialUsersPerSecond:
description: 'Initial users per second'
type: string
default: '1'
# the number of users per second in the first iteration should be large enough to warm up all the nodes
default: '20'
skipCreateDeployment:
description: 'Skip creating Keycloak deployment'
type: boolean
@@ -90,12 +91,18 @@ jobs:
tar xfvz benchmark/target/keycloak-benchmark-*.tar.gz
mv keycloak-benchmark-* keycloak-benchmark
- name: Allow cluster to scale
if: ${{ !inputs.skipCreateDeployment }}
run: rosa edit machinepool -c ${{ inputs.clusterName }} --min-replicas 3 --max-replicas 10 scaling

- name: Create Keycloak deployment
if: ${{ !inputs.skipCreateDeployment }}
uses: ./.github/actions/keycloak-create-deployment
with:
projectPrefix: ${{ env.PROJECT_PREFIX }}
disableStickySessions: true
podCpuRequests: 4
podCpuLimit: 4

- name: Create Keycloak dataset with "${{ inputs.numberOfEntitiesInRealm }}" clients
if: "!inputs.skipCreateDataset && inputs.scenarioName == 'authentication.ClientSecret'"
@@ -174,6 +181,10 @@ jobs:
with:
project: ${{ env.PROJECT }}

- name: Scale down the cluster

kami619 (Contributor), Jun 30, 2023:
This would be really helpful to just scale down the worker nodes and get them back up as we cycle between runs. Thanks for adding this.

if: ${{ (success() || failure()) && !inputs.skipDeleteProject }}
run: rosa edit machinepool -c ${{ inputs.clusterName }} --min-replicas 0 --max-replicas 0 scaling

archive:
name: Commit results to Git repository
runs-on: ubuntu-latest
@@ -120,6 +120,27 @@ rosa describe cluster -c _cluster-name_

The above installation script creates an admin user automatically. In case the user needs to be re-created, this can be done via the `rosa_recreate_admin.sh` script, providing the `CLUSTER_NAME` and optionally the `ADMIN_PASSWORD` parameter.
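A hypothetical invocation might look like the following; that the script reads both values from environment variables is an assumption, so check the script itself before relying on this sketch:

[source,bash]
----
# Hypothetical invocation: CLUSTER_NAME is required, ADMIN_PASSWORD is optional.
# Assumes rosa_recreate_admin.sh reads both values from the environment.
CLUSTER_NAME=my-cluster ADMIN_PASSWORD=changeme ./rosa_recreate_admin.sh
----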

== Scaling the cluster's nodes on demand

The standard node setup might be too small for running a load test; at the same time, switching to a different instance type and rebuilding the cluster takes a long time (about 45 minutes).
To scale the cluster on demand, the standard setup includes a machine pool named `scaling` with `m5.4xlarge` spot instances, which is scaled to zero by default.

kami619 (Contributor), Jun 30, 2023:
Should we mention anything about homogeneous vs. heterogeneous clusters, based on the EC2 mixture we might end up having?

ahus1 (Author, Contributor), Jun 30, 2023:
I would skip that for now. Let's see how this evolves.


To scale the machine pool at runtime, issue a command like the following, and the additional nodes will be available within about 5 minutes:

[source,bash,subs=+quotes]
----
rosa edit machinepool -c _**clustername**_ --min-replicas 3 --max-replicas 10 scaling
----

To scale down the cluster, use a command like the following:

[source,bash,subs=+quotes]
----
rosa edit machinepool -c _**clustername**_ --min-replicas 0 --max-replicas 0 scaling
----

To use different instance types, use `rosa create machinepool` to create additional machine pools.
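As an illustration, an additional pool with a different instance type could be created as follows; the instance type `c5.2xlarge` and the pool name `scaling-c5` are placeholders, while the remaining flags mirror those used for the `scaling` pool in `rosa_create_cluster.sh`:

[source,bash,subs=+quotes]
----
rosa create machinepool -c _**clustername**_ --instance-type c5.2xlarge \
  --min-replicas 0 --max-replicas 10 --name scaling-c5 \
  --use-spot-instances --enable-autoscaling
----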

[#aws-efs-as-readwritemany-storage]
== AWS Elastic File Service as ReadWriteMany storage

Expand Down
2 changes: 2 additions & 0 deletions provision/aws/rosa_create_cluster.sh
@@ -87,4 +87,6 @@ echo

./rosa_recreate_admin.sh

rosa create machinepool -c "${CLUSTER_NAME}" --instance-type m5.4xlarge --max-replicas 0 --min-replicas 0 --name scaling --use-spot-instances --enable-autoscaling

kami619 (Contributor), Jun 30, 2023:
Are we using spot instances for the initial cluster creation too? Would it be "fine" to use a mixture of spot and reserved EC2 nodes in a worker machine pool?

ahus1 (Author, Contributor), Jun 30, 2023:
I'd argue the workers for the load test are less important than the master nodes and the ingresses. I'd say it is more of a trying-out of things, to see how they behave. Mixing is fine IMHO, and only those nodes you are OK to lose should be spot instances.

Having said that, it can waste a load run if a node is re-spawned during the run.

Again, let's see it as an experiment to see how this goes. The downside is it adds some complexity to the setup; the upside is we learn something and might save some money.

kami619 (Contributor), Jun 30, 2023:
Sure thing, just making sure we are aware of the mixed-cluster possibility. But there is less of a chance, given we are going to default to the m5.4xlarge EC2 type for larger tests in general.


./rosa_efs_create.sh
@@ -51,6 +51,11 @@ spec:
- -c
- max_prepared_transactions=100
{{ end }}
resources:
requests:
cpu: "{{ .Values.cpuRequests }}"

kami619 (Contributor), Jun 30, 2023:
Are these supposed to be set at the workflow level? If yes, the names don't seem to match:

podCpuRequests: 4
podCpuLimit: 4

ahus1 (Author, Contributor), Jun 30, 2023:
This cheap solution uses the same CPU requests and limits as the Keycloak pod to avoid additional configuration. Let's see how this evolves.

kami619 (Contributor), Jun 30, 2023:
Gotcha, thanks for the clarification.

limits:
cpu: "{{ .Values.cpuLimits }}"
startupProbe:
tcpSocket:
port: 5432
2 changes: 1 addition & 1 deletion provision/openshift/Taskfile.yaml
@@ -190,7 +190,7 @@ tasks:
# TODO: cryostat is disabled as it requires persistent volumes on OpenShift in the current setup
# TODO: sqlpad is disabled as it runs as root in the container, see https://github.com/sqlpad/sqlpad/issues/1171
- >
helm upgrade --install keycloak
helm upgrade --install keycloak --namespace {{.KC_NAMESPACE_PREFIX}}keycloak
--set hostname={{.KC_HOSTNAME_SUFFIX}}
--set otel={{.KC_OTEL}}
--set otelSamplingPercentage={{.KC_OTEL_SAMPLING_PERCENTAGE}}
10 changes: 6 additions & 4 deletions provision/openshift/isup.sh
@@ -1,6 +1,8 @@
#!/usr/bin/env bash
# set -x

set -e

# Default values for variables from Taskfile.yml are not part of .env file, therefore we need to load them manually
KC_NAMESPACE_PREFIX=$(cat .task/var-KC_NAMESPACE_PREFIX)
KC_HOSTNAME_SUFFIX=$(cat .task/var-KC_HOSTNAME_SUFFIX)
@@ -10,9 +12,9 @@ if [ -f ./.env ]; then
fi

# kill all CrashLoopBackOff and ImagePullBackOff pods to trigger a fast restart and not wait Kubernetes
kubectl get pods -n "${KC_NAMESPACE_PREFIX}keycloak" | grep -E "(BackOff|Error)" | tr -s " " | cut -d" " -f1 | xargs -r -L 1 kubectl delete pod -n keycloak
kubectl get pods -n "${KC_NAMESPACE_PREFIX}keycloak" | grep -E "(BackOff|Error)" | tr -s " " | cut -d" " -f1 | xargs -r -L 1 kubectl delete pod -n ${KC_NAMESPACE_PREFIX}keycloak
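The pod-selection pipeline in the corrected line above can be illustrated standalone; the sample output below is fabricated and only mimics the shape of `kubectl get pods` output:

```shell
# Fabricated sample resembling `kubectl get pods` output (pod names are made up)
PODS='NAME         READY   STATUS             RESTARTS   AGE
keycloak-0   1/1     Running            0          5m
keycloak-1   0/1     CrashLoopBackOff   7          5m
postgres-0   0/1     ImagePullBackOff   3          5m'

# Same extraction as in isup.sh: keep failing pods, squeeze spaces, take the name column
echo "$PODS" | grep -E "(BackOff|Error)" | tr -s " " | cut -d" " -f1
# → keycloak-1
#   postgres-0
```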

MAXRETRIES=600
MAXRETRIES=1200

declare -A SERVICES=( \
["keycloak-${KC_NAMESPACE_PREFIX}keycloak.${KC_HOSTNAME_SUFFIX}"]="realms/master/.well-known/openid-configuration" \
@@ -24,8 +26,8 @@ for SERVICE in "${!SERVICES[@]}"; do

if [ "${SERVICE}" == "keycloak-${KC_NAMESPACE_PREFIX}keycloak.${KC_HOSTNAME_SUFFIX}" ]
then
kubectl wait --for=condition=Available --timeout=300s deployments.apps/keycloak-operator -n "${KC_NAMESPACE_PREFIX}keycloak"
kubectl wait --for=condition=Ready --timeout=300s keycloaks.k8s.keycloak.org/keycloak -n "${KC_NAMESPACE_PREFIX}keycloak"
kubectl wait --for=condition=Available --timeout=1200s deployments.apps/keycloak-operator -n "${KC_NAMESPACE_PREFIX}keycloak"
kubectl wait --for=condition=Ready --timeout=1200s keycloaks.k8s.keycloak.org/keycloak -n "${KC_NAMESPACE_PREFIX}keycloak"
fi

until kubectl get ingress -A 2>/dev/null | grep ${SERVICE} >/dev/null && curl -k -f -v https://${SERVICE}/${SERVICES[${SERVICE}]} >/dev/null 2>/dev/null
