[Bug]: Zitadel deployment on EKS with AWS PostgreSQL RDS instance - setup job failure #281

Open
2 tasks done
jmutai opened this issue Oct 17, 2024 · 15 comments
Assignees
Labels
bug Something isn't working devops

Comments

@jmutai

jmutai commented Oct 17, 2024

Preflight Checklist

  • I could not find a solution in the documentation, the existing issues or discussions
  • I have joined the ZITADEL chat

Environment

Self-hosted

Version

8.5.0

Database

PostgreSQL

Database Version

16

Describe the problem caused by this bug

Installing Zitadel on an EKS Kubernetes cluster fails at the setup job when using AWS RDS for PostgreSQL. Initially the issue was with SSL, but after setting the SSL mode to disable and rds.force_ssl = 0 on RDS, I was able to connect to RDS successfully using a username and password.

Database initialization was also successful, but the setup job hangs indefinitely, repeating the following warning:

time="2024-10-16T18:28:34Z" level=warning msg="migration already started, will check again in 5 seconds" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:130" migration step=projections.login_policies5

Is this a known issue, and is there a workaround?

To reproduce

  1. Provision a PostgreSQL DB instance on RDS
  2. Edit the parameter group and set rds.force_ssl = 0
  3. Create a secret with the RDS DB credentials (host and password)
  4. Deploy Zitadel on Kubernetes using Helm, with the correct values for configSecretName and configSecretKey
  5. The Zitadel init job succeeds
  6. The setup job does not succeed

Screenshots

Image

Expected behavior

The init and setup jobs succeed, and Zitadel is healthy and running.

Operating System

Kubernetes (AWS EKS) - 1.30

Relevant Configuration

Database:
  Postgres:
    Port: 5432
    Database:
    MaxOpenConns: 20
    MaxIdleConns: 10
    MaxConnLifetime: 30m
    MaxConnIdleTime: 5m
    User:
      Username: zitadel
      SSL:
        Mode: disable
    Admin:
      Username: zitadel
      SSL:
        Mode: disable
configSecretName: zitadel-config-secret
configSecretKey: config.yaml

Additional Context

No response

@jmutai jmutai added the bug Something isn't working label Oct 17, 2024
@suchitsancheti

suchitsancheti commented Oct 23, 2024

Hi,

I am facing the same issue while upgrading Zitadel from v2.55.8 to v2.56.0. The deployment runs on EKS v1.31 with RDS PostgreSQL v15.7.

Please advise.

@suchitsancheti

Any update/workaround on this issue?

@Smana
Contributor

Smana commented Nov 7, 2024

The SSL mode should be require (I have since switched from RDS to CNPG, but this worked with RDS).
Here is my repository: https://github.com/Smana/cloud-native-ref

More specifically here
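
As a rough sketch of what this suggestion could look like when applied to the configuration posted in the issue: the keys are the ones from the original report, and only Mode changes from disable to require. This is untested here and assumes the RDS parameter group is left at its defaults, as the comment suggests.

```yaml
# Sketch only: same keys as the configuration in the issue report,
# with SSL mode switched from "disable" to "require".
Database:
  Postgres:
    Port: 5432
    User:
      Username: zitadel
      SSL:
        Mode: require  # was "disable" in the original report
    Admin:
      Username: zitadel
      SSL:
        Mode: require  # was "disable" in the original report
```

Connection pool settings and the secret references from the original configuration are omitted for brevity and would stay unchanged.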

@adlerhurst
Member

adlerhurst commented Nov 11, 2024

The error in the screenshot can be fixed by running the zitadel setup cleanup CLI command.

@jmutai
Author

jmutai commented Nov 11, 2024

The error of the screenshot can be fixed by executing zitadel setup cleanup cli command

We install Zitadel through GitOps. Is there a solution that suits an automated setup process?

@jmutai
Author

jmutai commented Nov 11, 2024

@Smana Do you have a working configuration for RDS that you can share? I would really appreciate it.

@suchitsancheti

@adlerhurst How can we run zitadel setup cleanup when Zitadel is deployed on EKS using the Helm chart?

@Smana
Contributor

Smana commented Nov 11, 2024

@jmutai As I mentioned, the trick is to keep the default sslMode for RDS but ensure that Zitadel uses require.

These are the environment variables that I source:

ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=require
ZITADEL_DATABASE_POSTGRES_ADMIN_SSL_MODE=require

And here is the working code (when I was using RDS).
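
For a Helm-based install, these two variables might be passed through the chart values. The following is a minimal sketch that assumes the Zitadel chart exposes a top-level env list for extra container environment variables; that key is an assumption here and should be checked against the values file of the chart version in use.

```yaml
# values.yaml (sketch). The top-level `env` list is an assumption about
# the chart; the variable names and values come from the comment above.
env:
  - name: ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE
    value: require
  - name: ZITADEL_DATABASE_POSTGRES_ADMIN_SSL_MODE
    value: require
```

Setting the mode through environment variables keeps the config secret unchanged, which may suit a GitOps workflow where the secret is managed separately.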

@jmutai
Author

jmutai commented Nov 11, 2024

@Smana I will test this and see how it goes. Thank you!

@dfry

dfry commented Nov 11, 2024

@Smana thanks for the example. I am confused, though. You set the master password in the Crossplane config, which also references the same envvar secret. Does that mean the master password is only used for other RDS PostgreSQL admin functionality for the database and not by Zitadel? I cannot see where RDS is configured to trust client certs issued by the Vault cert-manager cluster issuer.

@dfry

dfry commented Nov 11, 2024

I was confusing myself, now I see you are using username/password auth for connecting to RDS and not client certs.

@Smana
Contributor

Smana commented Nov 11, 2024

I was confusing myself, now I see you are using username/password auth for connecting to RDS and not client certs.

Yes, and all the configuration is loaded from environment variables.

@suchitsancheti

After running zitadel setup cleanup and setting the environment variables mentioned above, the setup job is stuck at the following:

time="2024-11-15T14:38:08Z" level=info msg="verify migration" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:43" name=23_correct_global_unique_constraints
time="2024-11-15T14:38:08Z" level=info msg="verify migration" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:43" name=24_add_actor_col_to_auth_tokens
time="2024-11-15T14:38:08Z" level=info msg="verify migration" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:43" name=26_auth_users3
time="2024-11-15T14:38:08Z" level=info msg="verify migration" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:43" name=29_init_fields_for_project_grant
time="2024-11-15T14:38:08Z" level=info msg="starting migration" caller="/home/runner/work/zitadel/zitadel/internal/migration/migration.go:66" name=29_init_fields_for_project_grant

Please advise.

@jmutai
Author

jmutai commented Nov 15, 2024

@suchitsancheti Did you try increasing the activeDeadlineSeconds value of setupJob?

setupJob:
     activeDeadlineSeconds: 600

@suchitsancheti

@jmutai I have tried the values 600, 900, and 1200 and got the same result.
Is there anything else I can try or check?

@eliobischof eliobischof assigned eliobischof and unassigned adlerhurst Dec 3, 2024
@eliobischof eliobischof transferred this issue from zitadel/zitadel Dec 3, 2024
@hifabienne hifabienne moved this to 🧐 Investigating in Product Management Dec 3, 2024

7 participants