Active/Active XSite fencing (#819)
related to keycloak/keycloak#29303

- User alert routing enabled on ROSA clusters

- PrometheusRule used to trigger AWS Lambda webhook in the event of a
  split-brain so that only a single site remains in the global accelerator endpoints

- Global Accelerator scripts refactored to use OpenTofu when creating
  AWS resources

- Task created to deploy/undeploy Active/Active

- Task created to simulate split-brain scenarios

- 'active-active' flag added to GH actions to differentiate between
  active/passive and active/active deployments

- 'active-active' and 'active-passive' tags added to crossdc-tests to
  allow different behaviours/tests to be executed for the given
  deployment type.

- Active/Active specific test cases added. Testsuite now interacts
  directly with k8s clusters in order to have greater control over
  deployments being tested. This is necessary so that we can simulate
  split-brain scenarios between sites.

- Daily scheduled job updated to run tests against both active/passive
  and active/active deployments

Signed-off-by: Ryan Emerson <[email protected]>
Co-authored-by: Michal Hajas <[email protected]>
Co-authored-by: Pedro Ruivo <[email protected]>
Signed-off-by: Ryan Emerson <[email protected]>
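
The fencing flow described in the commit message has an alert call a webhook that is protected by a username and password, after which only one site stays in the Global Accelerator endpoints. A minimal Python sketch of those two decisions follows. It is not the stonith_lambda.py shipped with this commit: the function names, the lower-cased `authorization` header key, and the "keep the alphabetically first site" tie-break are assumptions made purely for illustration.

```python
import base64
import hmac


def is_authorized(headers, username, password):
    """Return True when the request carries the expected HTTP Basic credentials.

    Assumption: header names arrive lower-cased in a plain dict.
    """
    value = headers.get("authorization", "")
    scheme, _, encoded = value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or non-UTF-8 payload: treat as unauthorized.
        return False
    expected = f"{username}:{password}"
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


def sites_to_keep(endpoint_sites):
    """Pick at most one surviving site.

    Hypothetical tie-break: sorting makes the choice deterministic, so an
    alert fired from either site leads to the same survivor.
    """
    return sorted(endpoint_sites)[:1]
```

A deterministic tie-break matters here because during a split-brain both sites may fire the alert, and the two invocations should agree on which site remains.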
3 people committed Jun 11, 2024
1 parent 4e9ce2f commit f30cebc
Showing 51 changed files with 1,866 additions and 279 deletions.
1 change: 1 addition & 0 deletions .github/actions/rosa-cli-setup/action.yml
@@ -50,3 +50,4 @@ runs:
run: |
ADMIN_PASSWORD=$(aws secretsmanager get-secret-value --region $SECRET_MANAGER_REGION --secret-id $KEYCLOAK_MASTER_PASSWORD_SECRET_NAME --query SecretString --output text --no-cli-pager)
echo "::add-mask::$ADMIN_PASSWORD"
echo "KEYCLOAK_ADMIN_PASSWORD=${ADMIN_PASSWORD}" >> $GITHUB_ENV
40 changes: 39 additions & 1 deletion .github/workflows/rosa-cluster-auto-provision-on-schedule.yml
@@ -53,11 +53,49 @@ jobs:
createCluster: false
secrets: inherit

run-scaling-benchmark-with-peristent-sessions:
run-scaling-benchmark-with-persistent-sessions:
needs: keycloak-deploy-with-persistent-sessions
uses: ./.github/workflows/rosa-scaling-benchmark.yml
with:
clusterName: gh-keycloak-a # ${{ env.CLUSTER_PREFIX }}-a -- unfortunately 'env.' doesn't work here ${{ env.CLUSTER_PREFIX }}-a
skipCreateDataset: true
outputArchiveSuffix: 'persistent-sessions'
secrets: inherit

keycloak-undeploy-with-persistent-sessions:
needs: run-scaling-benchmark-with-persistent-sessions
name: Undeploy Keycloak deployment on the multi-az cluster
if: github.event_name != 'schedule' || github.repository == 'keycloak/keycloak-benchmark'
uses: ./.github/workflows/rosa-multi-az-cluster-undeploy.yml
with:
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
skipAuroraDeletion: true
secrets: inherit

keycloak-deploy-active-active:
needs: keycloak-undeploy-with-persistent-sessions
name: ROSA Scheduled Create Active/Active cluster with Persistent Sessions
if: github.event_name != 'schedule' || github.repository == 'keycloak/keycloak-benchmark'
uses: ./.github/workflows/rosa-multi-az-cluster-create.yml
with:
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
enablePersistentSessions: true
createCluster: false
activeActive: true
secrets: inherit

run-functional-tests-active-active:
needs: keycloak-deploy-active-active
uses: ./.github/workflows/rosa-run-crossdc-func-tests.yml
with:
activeActive: true
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
secrets: inherit

run-scaling-benchmark-active-active:
needs: run-functional-tests-active-active
uses: ./.github/workflows/rosa-scaling-benchmark.yml
with:
clusterName: gh-keycloak-a # ${{ env.CLUSTER_PREFIX }}-a -- unfortunately 'env.' doesn't work here ${{ env.CLUSTER_PREFIX }}-a
outputArchiveSuffix: 'active-active'
secrets: inherit
65 changes: 60 additions & 5 deletions .github/workflows/rosa-multi-az-cluster-create.yml
@@ -16,6 +16,10 @@ on:
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
activeActive:
description: 'When true deploy an Active/Active Keycloak deployment'
type: boolean
default: false
enablePersistentSessions:
description: 'To enable Persistent user and client sessions to the DB'
type: boolean
@@ -32,16 +36,20 @@ on:
description: 'The AWS region to create both clusters in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
type: string
createCluster:
description: 'Check to Create Cluster'
description: 'Check to Create Cluster.'
type: boolean
default: true
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
activeActive:
description: 'When true deploy an Active/Active Keycloak deployment'
type: boolean
default: false
enablePersistentSessions:
description: 'To enable Persistent user and client sessions to the DB'
type: boolean
default: false
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
keycloakBranch:
description: 'The branch to deploy Keycloak from. If not set nightly image is used'
type: string
@@ -109,6 +117,11 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v4

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Setup ROSA CLI
uses: ./.github/actions/rosa-cli-setup
with:
@@ -140,6 +153,7 @@ jobs:
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Create Route53 Loadbalancer
if: ${{ !inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: |
task route53 > route53
@@ -150,10 +164,51 @@ jobs:
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Deploy
- name: Deploy Active/Passive
if: ${{ !inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: task
env:
AURORA_CLUSTER: ${{ env.CLUSTER_PREFIX }}
AURORA_REGION: ${{ env.REGION }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b
KC_ACTIVE_ACTIVE: ${{ inputs.activeActive }}
KC_CPU_REQUESTS: 6
KC_INSTANCES: 3
KC_DISABLE_STICKY_SESSION: true
KC_PERSISTENT_SESSIONS: ${{ env.KC_PERSISTENT_SESSIONS }}
KC_MEMORY_REQUESTS_MB: 3000
KC_MEMORY_LIMITS_MB: 4000
KC_DB_POOL_INITIAL_SIZE: 30
KC_DB_POOL_MAX_SIZE: 30
KC_DB_POOL_MIN_SIZE: 30
KC_DATABASE: "aurora-postgres"
MULTI_AZ: "true"
KC_REPOSITORY: ${{ inputs.keycloakRepository }}
KC_BRANCH: ${{ inputs.keycloakBranch }}

- name: Create Accelerator Loadbalancer
if: ${{ inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: |
task global-accelerator-create 2>&1 | tee accelerator
echo "ACCELERATOR_DNS=$(grep -Po 'ACCELERATOR DNS: \K.*' accelerator)" >> $GITHUB_ENV
echo "ACCELERATOR_WEBHOOK=$(grep -Po 'ACCELERATOR WEBHOOK: \K.*' accelerator)" >> $GITHUB_ENV
env:
ACCELERATOR_NAME: ${{ env.CLUSTER_PREFIX }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Deploy Active/Active
if: ${{ inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: task active-active
env:
ACCELERATOR_DNS: ${{ env.ACCELERATOR_DNS }}
ACCELERATOR_WEBHOOK_URL: ${{ env.ACCELERATOR_WEBHOOK }}
ACCELERATOR_WEBHOOK_USERNAME: "keycloak"
ACCELERATOR_WEBHOOK_PASSWORD: ${{ env.KEYCLOAK_ADMIN_PASSWORD }}
AURORA_CLUSTER: ${{ env.CLUSTER_PREFIX }}
AURORA_REGION: ${{ env.REGION }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
19 changes: 16 additions & 3 deletions .github/workflows/rosa-multi-az-cluster-delete.yml
@@ -11,7 +11,7 @@ on:
type: string

jobs:
route53:
loadbalancer:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
@@ -40,19 +40,32 @@ jobs:
echo "SUBDOMAIN=$(echo $KEYCLOAK_URL | grep -oP '(?<=client.).*?(?=.keycloak-benchmark.com)')" >> $GITHUB_ENV
- name: Delete Route53 Records
run: |
./provision/aws/route53/route53_delete.sh
run: ./provision/aws/route53/route53_delete.sh
env:
SUBDOMAIN: ${{ env.SUBDOMAIN }}

- name: Set ACCELERATOR_DNS env variable for Global Accelerator processing
run: |
echo "ACCELERATOR_DNS=${KEYCLOAK_URL#"https://"}" >> $GITHUB_ENV
- name: Delete Global Accelerator
run: ./provision/aws/global-accelerator/accelerator_multi_az_delete.sh
env:
ACCELERATOR_DNS: ${{ env.ACCELERATOR_DNS }}
CLUSTER_1: ${{ inputs.clusterPrefix }}-a
CLUSTER_2: ${{ inputs.clusterPrefix }}-b
KEYCLOAK_NAMESPACE: runner-keycloak

cluster1:
needs: loadbalancer
uses: ./.github/workflows/rosa-cluster-delete.yml
with:
clusterName: ${{ inputs.clusterPrefix }}-a
deleteAll: no
secrets: inherit

cluster2:
needs: loadbalancer
uses: ./.github/workflows/rosa-cluster-delete.yml
with:
clusterName: ${{ inputs.clusterPrefix }}-b
38 changes: 22 additions & 16 deletions .github/workflows/rosa-run-crossdc-func-tests.yml
@@ -6,12 +6,20 @@ on:
clusterPrefix:
description: 'The prefix used when creating the Cross DC clusters'
type: string
activeActive:
description: 'Must be true when testing against an Active/Active Keycloak deployment'
type: boolean
default: false

workflow_dispatch:
inputs:
clusterPrefix:
description: 'The prefix used when creating the Cross DC clusters'
type: string
activeActive:
description: 'Must be true when testing against an Active/Active Keycloak deployment'
type: boolean
default: false

concurrency:
# Only run once for the latest commit per ref and cancel other (previous) runs.
@@ -32,6 +40,7 @@ jobs:
distribution: 'temurin'
java-version: '17'
cache: 'maven'

- name: Cache Maven Wrapper
uses: actions/cache@v4
with:
@@ -40,37 +49,34 @@ jobs:
key: ${{ runner.os }}-maven-wrapper-${{ hashFiles('**/maven-wrapper.properties') }}
restore-keys: |
${{ runner.os }}-maven-wrapper-
- name: Setup ROSA CLI
uses: ./.github/actions/rosa-cli-setup
with:
aws-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-default-region: ${{ vars.AWS_DEFAULT_REGION }}
rosa-token: ${{ secrets.ROSA_TOKEN }}

- name: Login to OpenShift cluster A
uses: ./.github/actions/oc-keycloak-login
with:
clusterName: ${{ inputs.clusterPrefix }}-a
- name: Get DC1 URLs

- name: Get DC1 Context
shell: bash
run: |
KEYCLOAK_DC1_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" aws-health-route -o jsonpath='{.spec.host}')
echo "KEYCLOAK_DC1_URL=$KEYCLOAK_DC1_URL" >> "$GITHUB_ENV"
LOAD_BALANCER_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=keycloak -o jsonpath='{.items[*].spec.host}')
echo "LOAD_BALANCER_URL=$LOAD_BALANCER_URL" >> "$GITHUB_ENV"
ISPN_DC1_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=infinispan-service-external -o jsonpath='{.items[*].spec.host}')
echo "ISPN_DC1_URL=$ISPN_DC1_URL" >> "$GITHUB_ENV"
run: echo "KUBERNETES_1_CONTEXT=$(kubectl config current-context)" >> "$GITHUB_ENV"

- name: Login to OpenShift cluster B
uses: ./.github/actions/oc-keycloak-login
with:
clusterName: ${{ inputs.clusterPrefix }}-b
- name: Get DC2 URLs

- name: Get DC2 Context
shell: bash
run: |
KEYCLOAK_DC2_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" aws-health-route -o jsonpath='{.spec.host}')
echo "KEYCLOAK_DC2_URL=$KEYCLOAK_DC2_URL" >> "$GITHUB_ENV"
ISPN_DC2_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=infinispan-service-external -o jsonpath='{.items[*].spec.host}')
echo "ISPN_DC2_URL=$ISPN_DC2_URL" >> "$GITHUB_ENV"
run: echo "KUBERNETES_2_CONTEXT=$(kubectl config current-context)" >> "$GITHUB_ENV"

- name: Run CrossDC functional tests
run: |
./provision/rosa-cross-dc/keycloak-benchmark-crossdc-tests/run-crossdc-tests.sh
run: ./provision/rosa-cross-dc/keycloak-benchmark-crossdc-tests/run-crossdc-tests.sh
env:
ACTIVE_ACTIVE: ${{ inputs.activeActive }}
1 change: 1 addition & 0 deletions .gitignore
@@ -101,3 +101,4 @@ provision/environment_data.json
**/*.tfstate*
**/*.terraform*
!**/*.terraform.lock.hcl
provision/opentofu/modules/aws/accelerator/builds/*
4 changes: 4 additions & 0 deletions doc/kubernetes/collector/build.sh
@@ -81,6 +81,10 @@ helm template --debug ${STARTDIR}/../../../provision/infinispan/ispn-helm \
--set metrics.histograms=false \
--set hotrodPassword="strong-password" \
--set cacheDefaults.crossSiteMode=SYNC \
--set acceleratorDNS=a3da6a6cbd4e27b02.awsglobalaccelerator.com \
--set alertmanager.webhook.username=keycloak \
--set alertmanager.webhook.password=changme \
--set alertmanager.webhook.url=https://tjqr2vgc664b6noj6vugprakoq0oausj.lambda-url.eu-west-1.on.aws/ \
> ${BUILDDIR}/helm/ispn-site-a.yaml

# Infinispan site B deployment
1 change: 1 addition & 0 deletions doc/kubernetes/modules/ROOT/examples/stonith_lambda.py
@@ -76,9 +76,10 @@ oc login https://api.**<domain name>**:6443 -u **<username>**

NOTE: The session will expire approximately once a day, and you'll need to re-login.

== Enable user workload monitoring
== Enable alert routing for user-defined projects

By default, OpenShift HCP doesn't enable alert routing for user-defined projects.

By default, OpenShift doesn't monitor user workloads.
Apply the following ConfigMap link:{github-files}/provision/openshift/cluster-monitoring-config.yaml[cluster-monitoring-config.yaml] which is located in the `/provision/openshift` folder to OpenShift:

[source,bash]
@@ -93,14 +94,11 @@ After this has been deployed, several new pods spin up in the *openshift-user-wo
kubectl get pods -n openshift-user-workload-monitoring
----

The metrics and targets are then available in the menu entry *Observe* in the OpenShift console.

Additional steps are necessary to enable persistent volumes for the recorded metrics.
Alerts defined in `PrometheusRule` CR are then available to view in the menu entry *Observe->Alerting* in the OpenShift console.

Further reading:

* https://docs.openshift.com/container-platform/4.12/monitoring/configuring-the-monitoring-stack.html[Configure OpenShift monitoring stack]
* https://docs.openshift.com/container-platform/4.12/monitoring/enabling-monitoring-for-user-defined-projects.html[Enabling monitoring for user-defined projects]
* https://docs.openshift.com/rosa/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html[Enabling alert routing for user-defined projects]

[#switching-between-different-kubernetes-clusters]
== Switching between different Kubernetes clusters