Aurora multiple regions failover scenario #445 (#564)
* Allow Aurora multi-az deployments to be created

* Add scripts to create Global Aurora cluster. Resolves #445

Automatic Aurora client failover with Route53 and Lambda

* fix some issues

* Add global aurora to rosa-cross-dc Taskfile

* review updates

* Fail aurora_delete_global_db script on errors

* Add additional logging

---------

Co-authored-by: Michal Hajas <[email protected]>
Co-authored-by: Kamesh Akella <[email protected]>
3 people authored Oct 30, 2023
1 parent c424009 commit 2589900
Showing 15 changed files with 817 additions and 50 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -88,3 +88,7 @@ target
# Quarkus ephemeral data #
##########################
quarkus/data/*.db

# Python Virtual Environments
#############################
**/.venv/
101 changes: 101 additions & 0 deletions doc/kubernetes/modules/ROOT/pages/storage/aurora-global-postgres.adoc
@@ -0,0 +1,101 @@
= Using Amazon Global Aurora PostgreSQL Cluster
:description: An Amazon Global Aurora PostgreSQL Cluster can be used as the underlying database for Keycloak in either single-site
or multi-site configurations. Currently, this is only supported with Keycloak deployments on ROSA.

Aurora global database spans multiple AWS Regions, enabling disaster recovery from outages across an AWS Region.
Aurora automatically handles replicating all data and updates from the primary AWS Region to each of the secondary Regions.

== Deploying an Aurora Cluster

Aurora clusters can be deployed across multiple AWS regions by executing `./provision/aws/rds/aurora_create_global_db.sh` with the
following env:

[source]
----
AURORA_GLOBAL_REGIONS= # A list of AWS regions for the Aurora cluster to span. The first region in the list is where the Primary cluster is hosted.
AURORA_GLOBAL_CLUSTER= # The name of the Global Aurora cluster
AURORA_INSTANCES= # The number of Aurora db instances to create in each region, defaults to 1.
----

This creates an Aurora cluster per region and associates them with the Global Aurora cluster `$AURORA_GLOBAL_CLUSTER`.
The script waits until all regional clusters and their instances are available before returning. If the global cluster
already exists, a message indicating this is displayed and the script will fail with exit code 1.

An Aurora Global DB cluster consists of multiple regional clusters, each of which has its own dedicated Writer and Reader
endpoints. In order to abstract this, we create a Route53 CNAME entry that Keycloak instances must utilise to connect to
the database. The Route53 entry exposes the writer endpoint of the Aurora primary cluster at `$AURORA_GLOBAL_CLUSTER.aurora-global.keycloak-benchmark.com`.

In order to ensure that the aforementioned Route53 entry reflects the writer endpoint of the Primary cluster after failover,
we deploy an AWS Lambda function to each of the `$AURORA_GLOBAL_REGIONS`. This function is triggered on completion of a
global-failover event, in the region of the new Primary cluster, and updates the CNAME entry to point to the latest writer
endpoint.
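
As a rough illustration of the record update described above, the following shell sketch builds an `UPSERT` change batch and passes it to `aws route53 change-resource-record-sets`. It is not the Lambda function added by this commit. The `HOSTED_ZONE_ID` variable, the 60-second TTL, and both function names are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper: prints a Route53 change batch that points a CNAME
# record ($1) at a writer endpoint ($2). The TTL of 60 seconds is an assumption.
cname_change_batch() {
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":60,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$1" "$2"
}

# Hypothetical wrapper: applies the change batch to a hosted zone.
# HOSTED_ZONE_ID is an assumed variable; the scripts in this commit do not define it.
update_cname() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "${HOSTED_ZONE_ID}" \
    --change-batch "$(cname_change_batch "$1" "$2")"
}
```

Calling `update_cname "${AURORA_GLOBAL_CLUSTER}.aurora-global.keycloak-benchmark.com" "<writer endpoint>"` would then repoint the entry, where the writer endpoint placeholder stands for the endpoint of the new Primary cluster.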

[NOTE]
====
The specified `AURORA_GLOBAL_CLUSTER` must be unique within the AWS account and follow the conventions outlined for the
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.CreateInstance.html#Aurora.CreateInstance.Settings["DB cluster identifier"].
====

== Connecting ROSA cluster to Aurora Clusters

A Peering Connection must be established between a ROSA cluster and each of the individual Aurora clusters.

To configure such a connection run `./provision/aws/rds/aurora_create_global_peering_connections.sh` with the following environment:

[source]
----
AURORA_GLOBAL_CLUSTER= # The name of the Global Aurora cluster
CLUSTER_NAME= # The name of the ROSA cluster to establish the peering connection with
----

== Deploying Keycloak

When deploying Keycloak via the various task files, the following env variables must be set in order to ensure that the
correct DB endpoint is configured.

[source]
----
AURORA_GLOBAL_CLUSTER= # The name of the Global Aurora cluster
KC_DATABASE_URL=$AURORA_GLOBAL_CLUSTER.aurora-global.keycloak-benchmark.com
KC_DATABASE=aurora-postgres
----

== Simulating Cluster Failover
It's possible to trigger a failover from the Primary to Secondary Aurora cluster by executing the link:https://awscli.amazonaws.com/v2/documentation/api/latest/reference/rds/failover-global-cluster.html[failover-global-cluster] command:

[source]
----
aws rds failover-global-cluster \
--global-cluster-identifier ${AURORA_GLOBAL_CLUSTER} \
--target-db-cluster-identifier ${AURORA_CLUSTER_IDENTIFIER} \
--allow-data-loss
----

Where `AURORA_CLUSTER_IDENTIFIER` is the ARN of the secondary cluster that you want to become the Primary. The following command outputs the ARNs of all members of the Global Aurora cluster:

[source]
----
aws rds describe-global-clusters \
--query "GlobalClusters[?GlobalClusterIdentifier=='${AURORA_GLOBAL_CLUSTER}'].GlobalClusterMembers[*].DBClusterArn"
----
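
The two commands above can be combined to narrow the list down to candidate failover targets. The sketch below is an illustration under stated assumptions, not part of the commit: both function names are hypothetical, and it assumes the `IsWriter` field of each `GlobalClusterMembers` entry marks the current Primary. `arn_to_region` relies on the region being the fourth colon-separated field of an ARN, the same property the `arnToRegion` helper in `aurora_cluster_reaper.sh` depends on.

```shell
#!/bin/bash
# Extracts the region from an ARN such as
# arn:aws:rds:eu-west-1:123456789012:cluster:example, where the region is
# the fourth colon-separated field.
arn_to_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

# Hypothetical helper: lists the ARNs of the non-writer (secondary) members
# of the Global Aurora cluster named by $1.
secondary_cluster_arns() {
  aws rds describe-global-clusters \
    --query "GlobalClusters[?GlobalClusterIdentifier=='$1'].GlobalClusterMembers[] | [?!IsWriter].DBClusterArn" \
    --output text
}
```

For example, `for arn in $(secondary_cluster_arns "${AURORA_GLOBAL_CLUSTER}"); do echo "$(arn_to_region "${arn}") ${arn}"; done` would print each candidate target alongside its region.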

== Disconnecting ROSA cluster from Aurora Cluster

To remove a Peering Connection between the ROSA and Aurora VPCs, execute `./provision/aws/rds/aurora_delete_global_peering_connection.sh`
with the following env:

[source]
----
AURORA_GLOBAL_CLUSTER= # The name of the Global Aurora cluster
CLUSTER_NAME= # The name of the ROSA cluster to remove the peering connection from
----

== Deleting an Aurora Cluster
Before deleting an Aurora cluster, all Peering Connections established with ROSA cluster(s) must first
be removed.

To remove an Aurora cluster, execute `./provision/aws/rds/aurora_delete_global_db.sh` with the following env:

[source]
----
AURORA_GLOBAL_CLUSTER= # The name of the Global Aurora cluster
----
11 changes: 6 additions & 5 deletions doc/kubernetes/modules/ROOT/pages/storage/aurora-postgres.adoc
@@ -11,11 +11,12 @@ following env:
----
AURORA_CLUSTER= # The name of the Aurora cluster
AURORA_REGION= # The AWS region hosting the Aurora cluster
AURORA_INSTANCES= # The number of Aurora db instances to create in the AURORA_REGION, defaults to 1
----

This creates the necessary VPCs, subnets and routes required by an Aurora cluster as well as a single Aurora instance
for said cluster. The script waits until both the cluster and instance are available. If the cluster already exists,
a message indiciating this is displayed and the script will fail with exit code 1.
This creates the necessary VPCs, subnets and routes required by an Aurora cluster as well as `$AURORA_INSTANCES` Aurora instances
for said cluster. The script waits until the cluster and all instances are available. If the cluster already exists,
a message indicating this is displayed and the script will fail with exit code 1.

[NOTE]
====
@@ -79,13 +80,13 @@ Upon exiting the pod shell, the pod will be deleted.
== Disconnecting ROSA cluster from Aurora Cluster

To remove a Peering Connection between the ROSA and Aurora VPCS, execute `./provision/aws/rds/aurora_delete_peering_connection.sh`
wit the the following env:
with the following env:

[source]
----
AURORA_CLUSTER= # The name of the Aurora cluster instance
AURORA_REGION= # The AWS region hosting the Aurora cluster
CLUSTER_NAME= # The name of the ROSA cluster to establish the peering connectin with
CLUSTER_NAME= # The name of the ROSA cluster to remove the peering connection from
AWS_REGION= # The AWS region hosting the ROSA cluster
----

58 changes: 50 additions & 8 deletions provision/aws/rds/aurora_cluster_reaper.sh
@@ -4,6 +4,22 @@ if [[ "$RUNNER_DEBUG" == "1" ]]; then
set -x
fi

function arnToRegion() {
arn=$1
arrIN=(${arn//:/ })
echo ${arrIN[3]}
}

function keepAlive() {
aws rds describe-db-clusters \
--region $1 \
--db-cluster-identifier $2 \
--query DBClusters[*].TagList[?key=='keepalive'] \
--output json \
| jq -c '.[]' \
| jq length
}

# Removes all Aurora DB clusters that are not tagged with the key "keepalive"
# To tag an Aurora Cluster with "keepalive" execute:
# `aws rds add-tags-to-resource --resource-name <arn> --tags Key=keepalive --region <region>`
@@ -12,6 +28,39 @@ fi
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
source ${SCRIPT_DIR}/aurora_common.sh

# Define a default region for global-cluster commands. This can be any region, but is required to prevent 'Invalid endpoint'
# errors
AWS_REGION=${AWS_REGION:-"eu-west-1"}

# Remove Global Aurora Clusters first to prevent aurora_delete.sh being triggered on Global cluster instances
GLOBAL_CLUSTERS=$(aws rds describe-global-clusters \
--query "GlobalClusters[*].GlobalClusterIdentifier" \
--output text
)

for AURORA_GLOBAL_CLUSTER in ${GLOBAL_CLUSTERS}; do
GLOBAL_CLUSTER_MEMBERS=$(aws rds describe-global-clusters \
--query "GlobalClusters[?GlobalClusterIdentifier=='${AURORA_GLOBAL_CLUSTER}']" \
| jq -r '.[]'
)
GLOBAL_CLUSTER_MEMBERS_ARNS=$(echo ${GLOBAL_CLUSTER_MEMBERS} | jq -r '.GlobalClusterMembers[].DBClusterArn')

KEEP_ALIVE=0
for AURORA_CLUSTER_ARN in ${GLOBAL_CLUSTER_MEMBERS_ARNS}; do
REGION=$(arnToRegion ${AURORA_CLUSTER_ARN})
KEEP_ALIVE=$((${KEEP_ALIVE} + $(keepAlive ${REGION} ${AURORA_CLUSTER_ARN})))
done

# If any of the Regional clusters associated with the Global cluster are tagged, don't attempt to remove the DB
if [ $((KEEP_ALIVE)) != "0" ]; then
continue
fi

export AURORA_GLOBAL_CLUSTER
${SCRIPT_DIR}/aurora_delete_global_db.sh
done

# Remove Single Region Aurora Clusters
REGIONS=$(aws ec2 describe-regions \
--query "Regions[*].RegionName" \
--output text
@@ -30,14 +79,7 @@ for REGION in ${REGIONS}; do
export AURORA_CLUSTER=$(echo $i | jq -r .DBClusterIdentifier)
export AURORA_INSTANCE=$(echo $i | jq -r .DBInstanceIdentifier)

KEEP_ALIVE=$(aws rds describe-db-clusters \
--region ${REGION} \
--db-cluster-identifier ${AURORA_CLUSTER} \
--query DBClusters[*].TagList[?key=='keepalive'] \
--output json \
| jq -c '.[]' \
| jq length
)
KEEP_ALIVE=$(keepAlive ${REGION} ${AURORA_CLUSTER})
if [ ${KEEP_ALIVE} == "0" ]; then
export AWS_REGION=${REGION}
export RUNNER_DEBUG=1
6 changes: 1 addition & 5 deletions provision/aws/rds/aurora_common.sh
@@ -1,14 +1,10 @@
#!/bin/bash
set -e

if [ -f ./.env ]; then
source ./.env
fi

export AURORA_CLUSTER=${AURORA_CLUSTER:-"keycloak"}
export AURORA_ENGINE=${AURORA_ENGINE:-"aurora-postgresql"}
export AURORA_ENGINE_VERSION=${AURORA_ENGINE_VERSION:-"15.3"}
export AURORA_INSTANCE=${AURORA_INSTANCE:-"${AURORA_CLUSTER}-instance-1"}
export AURORA_INSTANCES=${AURORA_INSTANCES:-"1"}
export AURORA_INSTANCE_CLASS=${AURORA_INSTANCE_CLASS:-"db.t4g.large"}
export AURORA_PASSWORD=${AURORA_PASSWORD:-"secret99"}
export AURORA_REGION=${AURORA_REGION}
38 changes: 23 additions & 15 deletions provision/aws/rds/aurora_create.sh
@@ -9,13 +9,12 @@ SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
source ${SCRIPT_DIR}/aurora_common.sh

# https://cloud.redhat.com/blog/using-vpc-peering-to-connect-an-openshift-service-on-an-aws-rosa-cluster-to-an-amazon-rds-mysql-database-in-a-different-vpc
EXISTING_INSTANCE=$(aws rds describe-db-instances \
--query 'DBInstances[*].[DBInstanceIdentifier]' \
--filters Name=db-instance-id,Values=${AURORA_INSTANCE} \
EXISTING_INSTANCES=$(aws rds describe-db-instances \
--query "DBInstances[?starts_with(DBInstanceIdentifier, '${AURORA_CLUSTER}')].DBInstanceIdentifier" \
--output text
)
if [ -n "${EXISTING_INSTANCE}" ]; then
echo "Aurora instance '${AURORA_INSTANCE}:${AWS_REGION}' already exists"
if [ -n "${EXISTING_INSTANCES}" ]; then
echo "Aurora instances '${EXISTING_INSTANCES}' already exist in the '${AWS_REGION}' region"
exit 0
fi

@@ -73,21 +72,30 @@ AURORA_SECURITY_GROUP_ID=$(aws ec2 create-security-group \
| jq -r '.GroupId'
)

if [ -z ${AURORA_GLOBAL_CLUSTER_BACKUP} ]; then
AURORA_MASTER_USER="--master-username ${AURORA_USERNAME} --master-user-password ${AURORA_PASSWORD}"
AURORA_DATABASE_NAME="--database-name keycloak"
fi

# Create the Aurora DB cluster and instance
aws rds create-db-cluster \
--db-cluster-identifier ${AURORA_CLUSTER} \
--database-name keycloak \
${AURORA_DATABASE_NAME} \
--engine ${AURORA_ENGINE} \
--engine-version ${AURORA_ENGINE_VERSION} \
--master-username ${AURORA_USERNAME} \
--master-user-password ${AURORA_PASSWORD} \
${AURORA_MASTER_USER} \
--vpc-security-group-ids ${AURORA_SECURITY_GROUP_ID} \
--db-subnet-group-name ${AURORA_SUBNET_GROUP_NAME}
--db-subnet-group-name ${AURORA_SUBNET_GROUP_NAME} \
${AURORA_GLOBAL_CLUSTER_IDENTIFIER}

aws rds create-db-instance \
--db-cluster-identifier ${AURORA_CLUSTER} \
--db-instance-identifier ${AURORA_INSTANCE} \
--db-instance-class ${AURORA_INSTANCE_CLASS} \
--engine ${AURORA_ENGINE}
for i in $( seq ${AURORA_INSTANCES} ); do
aws rds create-db-instance \
--db-cluster-identifier ${AURORA_CLUSTER} \
--db-instance-identifier "${AURORA_CLUSTER}-instance-${i}" \
--db-instance-class ${AURORA_INSTANCE_CLASS} \
--engine ${AURORA_ENGINE}
done

aws rds wait db-instance-available --db-instance-identifier ${AURORA_INSTANCE}
for i in $( seq ${AURORA_INSTANCES} ); do
aws rds wait db-instance-available --db-instance-identifier "${AURORA_CLUSTER}-instance-${i}"
done