Introduce a cron job that syncs the prod base into staging #633
Merged
Commits (6):
c7672d5 Sync prod base to staging (rdimitrov)
ec6aa73 Use the existing backups of prod instead (rdimitrov)
d51f3ea Add additional checks to ensure we are in staging (rdimitrov)
d4119e9 Update wrong comment (rdimitrov)
b94b056 Log out from prod once we got the backup credentials (rdimitrov)
f39faa6 Cleanup workflow a bit (domdomegg)
New workflow file (374 lines added). The diff below shows the file as of the second commit (ec6aa73):
name: Sync Production DB to Staging (from backups)

on:
  schedule:
    # Run daily at 2 AM UTC (during low-traffic hours)
    - cron: '0 2 * * *'
  workflow_dispatch: # Allow manual triggering

permissions:
  contents: read

jobs:
  sync-database:
    name: Sync Prod DB to Staging from k8up Backups
    runs-on: ubuntu-latest
    environment: staging
    concurrency:
      group: sync-staging-database
      cancel-in-progress: false
    steps:
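      # Overall flow: (1) read the k8up/restic bucket credentials out of the
      # prod cluster, (2) switch kubectl to the staging cluster, (3) restore
      # the latest prod snapshot into a scratch PVC, (4) stop staging
      # PostgreSQL and swap its data directory for the restored one, then
      # (5) start it back up, verify, and clean up.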
      - name: Authenticate to Google Cloud (Production)
        uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093
        with:
          credentials_json: ${{ secrets.GCP_PROD_SERVICE_ACCOUNT_KEY }}

      - name: Setup Google Cloud SDK
        uses: google-github-actions/setup-gcloud@aa5489c8933f4cc7a4f7d45035b3b1440c9c10db
        with:
          project_id: mcp-registry-prod
          install_components: gke-gcloud-auth-plugin

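      # Two separate service accounts are in play: the prod one is used only
      # long enough to read the backup secret, after which the workflow
      # re-authenticates with the staging credentials below.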
      - name: Get backup credentials from prod cluster
        id: backup-creds
        run: |
          # Connect to prod cluster to get backup credentials
          gcloud container clusters get-credentials mcp-registry-prod \
            --zone=us-central1-b \
            --project=mcp-registry-prod

          # Extract backup credentials from prod cluster
          ACCESS_KEY=$(kubectl get secret k8up-backup-credentials -n default -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
          SECRET_KEY=$(kubectl get secret k8up-backup-credentials -n default -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)

          # Mask the values so the runner redacts them from logs, then pass
          # them to later steps via outputs (step outputs are not encrypted
          # by GitHub Actions, so masking is what keeps them out of the logs)
          echo "::add-mask::$ACCESS_KEY"
          echo "::add-mask::$SECRET_KEY"
          echo "access_key=$ACCESS_KEY" >> $GITHUB_OUTPUT
          echo "secret_key=$SECRET_KEY" >> $GITHUB_OUTPUT

          echo "✓ Backup credentials extracted from prod"

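      # k8up keeps its backups in a restic repository; the secret read above
      # holds the S3-style HMAC key pair (hence the AWS_* names) that restic
      # uses against the bucket's storage.googleapis.com endpoint.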
      - name: Switch to staging cluster
        uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093
        with:
          credentials_json: ${{ secrets.GCP_STAGING_SERVICE_ACCOUNT_KEY }}

      - name: Configure staging cluster access
        run: |
          gcloud config set project mcp-registry-staging
          gcloud container clusters get-credentials mcp-registry-staging \
            --zone=us-central1-b \
            --project=mcp-registry-staging

          echo "✓ Connected to staging cluster"

      - name: Create secret for prod backup bucket access
        run: |
          # Create/update secret in staging with read-only access to prod backups
          kubectl create secret generic prod-backup-credentials \
            --from-literal=AWS_ACCESS_KEY_ID="${{ steps.backup-creds.outputs.access_key }}" \
            --from-literal=AWS_SECRET_ACCESS_KEY="${{ steps.backup-creds.outputs.secret_key }}" \
            --dry-run=client -o yaml | kubectl apply -f -

          echo "✓ Backup credentials configured in staging"

      - name: Create restore PVC
        run: |
          kubectl apply -f - <<EOF
          apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: restore-data-pvc
            namespace: default
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 50Gi
          EOF

          echo "✓ Restore PVC created"

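      # The 50Gi request is a fixed size; it has to stay comfortably larger
      # than the prod data directory, since the restored snapshot is unpacked
      # into this scratch volume before being copied over.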
      - name: Trigger k8up restore from prod backups
        id: restore
        run: |
          RESTORE_NAME="restore-from-prod-$(date +%Y%m%d-%H%M%S)"
          echo "restore_name=$RESTORE_NAME" >> $GITHUB_OUTPUT

          # Create a k8up Restore resource to restore from prod backups
          kubectl apply -f - <<EOF
          apiVersion: k8up.io/v1
          kind: Restore
          metadata:
            name: $RESTORE_NAME
            namespace: default
          spec:
            snapshot: latest
            restoreMethod:
              folder:
                claimName: restore-data-pvc
            backend:
              repoPasswordSecretRef:
                name: k8up-repo-password
                key: password
              s3:
                bucket: mcp-registry-prod-backups
                endpoint: https://storage.googleapis.com
                accessKeyIDSecretRef:
                  name: prod-backup-credentials
                  key: AWS_ACCESS_KEY_ID
                secretAccessKeySecretRef:
                  name: prod-backup-credentials
                  key: AWS_SECRET_ACCESS_KEY
          EOF

          echo "✓ k8up restore triggered: $RESTORE_NAME"

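      # Note: the restore also needs the restic repository password
      # (k8up-repo-password) to already exist in the staging namespace;
      # only the bucket credentials are copied over by this workflow.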
      - name: Wait for k8up restore to complete
        run: |
          RESTORE_NAME="${{ steps.restore.outputs.restore_name }}"

          echo "Waiting for restore job to start..."
          sleep 15

          # Find the job created by k8up for this restore
          for i in {1..30}; do
            JOB_NAME=$(kubectl get jobs -n default -l k8up.io/owned-by=restore -o jsonpath='{.items[?(@.metadata.ownerReferences[0].name=="'$RESTORE_NAME'")].metadata.name}' 2>/dev/null)
            if [ -n "$JOB_NAME" ]; then
              echo "Found restore job: $JOB_NAME"
              break
            fi
            echo "Waiting for job to be created... ($i/30)"
            sleep 2
          done

          if [ -z "$JOB_NAME" ]; then
            echo "ERROR: Restore job not found"
            kubectl get restore $RESTORE_NAME -n default -o yaml
            exit 1
          fi

          # Wait for the restore job to complete (max 15 minutes)
          kubectl wait --for=condition=complete \
            job/$JOB_NAME \
            --timeout=900s -n default || {
            echo "Restore job failed or timed out"
            kubectl describe job/$JOB_NAME -n default
            kubectl logs job/$JOB_NAME -n default --tail=100
            exit 1
          }

          echo "✓ k8up restore completed successfully"

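      # k8up runs each Restore as a Kubernetes Job it owns, which is why the
      # step above discovers the job through the k8up.io/owned-by label and
      # the ownerReferences pointing back at the Restore resource.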
      - name: Find staging PostgreSQL PVC
        id: pgdata-pvc
        run: |
          # Find the PVC used by the PostgreSQL cluster
          PVC_NAME=$(kubectl get pvc -n default -l cnpg.io/cluster=registry-pg -o jsonpath='{.items[0].metadata.name}')

          if [ -z "$PVC_NAME" ]; then
            echo "ERROR: Could not find PostgreSQL PVC"
            kubectl get pvc -n default -l cnpg.io/cluster=registry-pg
            exit 1
          fi

          echo "pvc_name=$PVC_NAME" >> $GITHUB_OUTPUT
          echo "✓ Found PostgreSQL PVC: $PVC_NAME"

      - name: Scale down staging PostgreSQL
        run: |
          echo "Scaling down PostgreSQL cluster..."
          kubectl patch cluster registry-pg -n default \
            --type merge \
            --patch '{"spec":{"instances":0}}'

          # Wait for pods to terminate
          echo "Waiting for pods to terminate..."
          kubectl wait --for=delete pod -l cnpg.io/cluster=registry-pg -n default --timeout=300s || true

          echo "✓ PostgreSQL scaled down"

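      # PostgreSQL must be fully stopped before the data directory is
      # swapped; a running instance writing to /pgdata mid-copy would leave
      # the cluster corrupt.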
      - name: Replace staging database with restored backup
        id: copy-job
        run: |
          JOB_NAME="copy-pgdata-$(date +%Y%m%d-%H%M%S)"
          echo "job_name=$JOB_NAME" >> $GITHUB_OUTPUT

          # Create a job to copy the restored backup data to the staging PVC
          kubectl apply -f - <<EOF
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: $JOB_NAME
            namespace: default
          spec:
            ttlSecondsAfterFinished: 600
            template:
              spec:
                restartPolicy: Never
                containers:
                - name: copy-data
                  image: busybox:latest
                  command:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    echo "Finding PostgreSQL data in backup..."
                    echo "Restore structure:"
                    find /restore -maxdepth 3 -type d 2>/dev/null | head -20

                    # Try different possible paths for pgdata
                    PGDATA_SOURCE=""
                    for path in \$(find /restore -type d -name "pgdata" 2>/dev/null); do
                      if [ -f "\$path/PG_VERSION" ]; then
                        PGDATA_SOURCE="\$path"
                        break
                      fi
                    done

                    if [ -z "\$PGDATA_SOURCE" ]; then
                      echo "ERROR: Could not find valid pgdata directory with PG_VERSION"
                      echo "Searched paths:"
                      find /restore -type d -name "pgdata" 2>/dev/null
                      exit 1
                    fi

                    echo "Found pgdata at: \$PGDATA_SOURCE"
                    echo "Contents:"
                    ls -lah \$PGDATA_SOURCE/ | head -10

                    echo "Backing up existing staging data..."
                    mkdir -p /pgdata-backup
                    if [ "\$(ls -A /pgdata)" ]; then
                      cp -a /pgdata/. /pgdata-backup/ || echo "Warning: Could not backup existing data"
                    fi

                    echo "Clearing existing data..."
                    rm -rf /pgdata/*

                    echo "Copying backup data to staging PVC..."
                    cp -a \$PGDATA_SOURCE/. /pgdata/

                    echo "Setting correct permissions..."
                    chmod 700 /pgdata

                    echo "✓ Data copy completed"
                    ls -lah /pgdata/ | head -20
                    echo "PostgreSQL version: \$(cat /pgdata/PG_VERSION)"
                  volumeMounts:
                  - name: restore-data
                    mountPath: /restore
                  - name: staging-pgdata
                    mountPath: /pgdata
                volumes:
                - name: restore-data
                  persistentVolumeClaim:
                    claimName: restore-data-pvc
                - name: staging-pgdata
                  persistentVolumeClaim:
                    claimName: ${{ steps.pgdata-pvc.outputs.pvc_name }}
          EOF

          echo "✓ Copy job created: $JOB_NAME"

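      # Two notes on the job above: the \$ escapes keep those expressions
      # from being expanded by the runner shell, so they are evaluated inside
      # the busybox container instead; and /pgdata-backup sits on the pod's
      # ephemeral filesystem, so that safety copy only survives while the
      # job's pod exists.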
      - name: Wait for data copy to complete
        run: |
          JOB_NAME="${{ steps.copy-job.outputs.job_name }}"

          # Wait for copy to complete
          kubectl wait --for=condition=complete job/$JOB_NAME --timeout=600s -n default || {
            echo "Data copy job failed"
            kubectl describe job/$JOB_NAME -n default
            kubectl logs job/$JOB_NAME -n default --tail=100
            exit 1
          }

          echo "✓ Database data replaced successfully"

      - name: Scale up staging PostgreSQL
        run: |
          echo "Scaling up PostgreSQL cluster..."
          kubectl patch cluster registry-pg -n default \
            --type merge \
            --patch '{"spec":{"instances":1}}'

          # Wait for PostgreSQL pod to be created
          echo "Waiting for PostgreSQL pod to be created..."
          for i in {1..60}; do
            POD_COUNT=$(kubectl get pods -l cnpg.io/cluster=registry-pg -n default --no-headers 2>/dev/null | wc -l)
            if [ "$POD_COUNT" -gt 0 ]; then
              echo "Pod created"
              break
            fi
            echo "Waiting... ($i/60)"
            sleep 2
          done

          # Wait for PostgreSQL to be ready
          echo "Waiting for PostgreSQL to be ready..."
          kubectl wait --for=condition=ready pod -l cnpg.io/cluster=registry-pg -n default --timeout=300s

          echo "✓ PostgreSQL is running"

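      # Verification below runs from a throwaway postgres:15 client pod so
      # the checks go through the registry-pg-rw service rather than a
      # direct exec into the database pod.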
      - name: Verify staging DB is functional
        run: |
          # Create a verification pod
          kubectl run pg-verify-$(date +%s) \
            --image=postgres:15 \
            --rm -i --restart=Never \
            --env="PGPASSWORD=$(kubectl get secret registry-pg-superuser -n default -o jsonpath='{.data.password}' | base64 -d)" \
            -- bash -c '
              echo "Waiting for database to accept connections..."
              for i in {1..30}; do
                if pg_isready -h registry-pg-rw -U postgres 2>/dev/null; then
                  break
                fi
                echo "Waiting... ($i/30)"
                sleep 2
              done

              echo "Querying database..."
              TABLE_COUNT=$(psql -h registry-pg-rw -U postgres -d app -tAc "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = '\''public'\'';" 2>&1)

              if [ $? -ne 0 ]; then
                echo "ERROR: Could not query database"
                echo "$TABLE_COUNT"
                exit 1
              fi

              if [ "$TABLE_COUNT" -lt 1 ]; then
                echo "ERROR: Staging DB has no tables!"
                exit 1
              fi

              echo "✓ Staging DB is healthy with $TABLE_COUNT tables"

              echo "Top 10 tables by row count:"
              # pg_stat_user_tables exposes the table name as relname, not tablename
              psql -h registry-pg-rw -U postgres -d app \
                -c "SELECT schemaname, relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC LIMIT 10;" || true
            '

          echo "✓ Database verification completed"

      - name: Cleanup
        if: always()
        run: |
          # Clean up jobs first
          if [ -n "${{ steps.copy-job.outputs.job_name }}" ]; then
            kubectl delete job ${{ steps.copy-job.outputs.job_name }} -n default || true
          fi

          # Remove restore PVC (will wait for jobs to finish)
          kubectl delete pvc restore-data-pvc -n default || true

          # Remove prod backup credentials (for security)
          kubectl delete secret prod-backup-credentials -n default || true

          # Clean up old restore resources (keep last 3)
          kubectl get restore -n default --sort-by=.metadata.creationTimestamp -o name | head -n -3 | xargs -r kubectl delete -n default || true

          # Clean up old copy jobs (keep last 3)
          kubectl get jobs -n default --sort-by=.metadata.creationTimestamp -o name | grep 'copy-pgdata-' | head -n -3 | xargs -r kubectl delete -n default || true

          echo "✓ Cleanup completed"