Skip to content

Not working RocketChat when primary mongo from PSA setup goes down. #34875

@dusatvoj

Description

@dusatvoj

Description:

Today I did some upgrade tasks and I had to shut down for a while one of cluster's hypervisors which has at local ZVols (due to performance reasons) database masters.

I have MongoDB in PSA setup. When I shut down MongoDB primary, secondary was elected to new primary and RocketChat crashed. All of my 3 instances went unhealthy and K8s starts killing them (CrashLoopBackOff).

I have another app consuming data from cluster and the data in second application could be read. Can't get why RC was broken.

So outage happend during a day - it's not acceptable for us but I did that many times recently and there were no problem. RocketChat connected to MongoDB, did some queries, start server but does not start responding (as it was degraded, ingress doesn't route traffic to them) -> 503 error.

Steps to reproduce:

  1. Deploy HA RC with Mongo in PSA setup.
  2. Shut down master of mongodb instances
  3. RocketChat went down.

Expected behavior:

RocketChat stays alive.

Actual behavior:

503, RC pods unhealthy, repeating System not healthy due to no real time data received recently but started from crashloopbackoff

Server Setup Information:

  • Version of Rocket.Chat Server: 6.13.1
  • License Type: Enterprise
  • Number of Users: 100+
  • Operating System: Linux
  • Deployment Method: Kubernetes via https://artifacthub.io/packages/helm/rocketchat-server/rocketchat (v6.11.5)
  • Number of Running Instances: 3
  • DB Replicaset Oplog: yes
  • NodeJS Version: idk, your image
  • MongoDB Version: 5.0.23

Additional context

Relevant logs:

(node:1) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
LocalStore: store created at
LocalStore: store created at
LocalStore: store created at
{"level":40,"time":"2025-01-03T13:13:12.434Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"VoIPAsteriskService","msg":"Voip is not enabled. Cant start the service"}
{"level":51,"time":"2025-01-03T13:13:12.476Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"Migrations","msg":"Not migrating, already at version 304"}
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Moleculer v0.14.34 is starting...
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Namespace: <not defined>
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Node ID: 2a1b6d25-99dd-44e9-b8cc-2b74cff6638e
[2025-01-03T13:13:12.489Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Strategy: RoundRobinStrategy
[2025-01-03T13:13:12.489Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Discoverer: LocalDiscoverer
[2025-01-03T13:13:12.491Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Serializer: EJSONSerializer
[2025-01-03T13:13:12.519Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Validator: FastestValidator
[2025-01-03T13:13:12.522Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Registered 13 middleware(s).
[2025-01-03T13:13:12.524Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Transporter: TcpTransporter
{"level":51,"time":"2025-01-03T13:13:12.560Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"DatabaseWatcher","msg":"Using change streams"}
[2025-01-03T13:13:12.574Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSIT: Connecting to the transporter...
[2025-01-03T13:13:12.584Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: TCP server is listening on port 32795
[2025-01-03T13:13:12.585Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: UDP Discovery is disabled.
[2025-01-03T13:13:12.585Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: TCP Transporter started.
{"level":51,"time":"2025-01-03T13:13:13.350Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"License","msg":"License installed","version":"3.0","hash":"Wo6UKZeI"}
[2025-01-03T13:13:13.485Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: '$node' service is registered.
[2025-01-03T13:13:13.486Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: 'matrix' service is registered.
[2025-01-03T13:13:13.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/$NODE: Service '$node' started.
[2025-01-03T13:13:13.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/MATRIX: Service 'matrix' started.
[2025-01-03T13:13:13.488Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: ✔ ServiceBroker with 2 service(s) started successfully in 940ms.
(node:1) [MONGODB DRIVER] Warning: cursor.count is deprecated and will be removed in the next major version, please use `collection.estimatedDocumentCount` or `collection.countDocuments` instead
+---------------------------------------------+
|                SERVER RUNNING               |
+---------------------------------------------+
|                                             |
|  Rocket.Chat Version: 6.13.1                |
|       NodeJS Version: 14.21.3 - x64         |
|      MongoDB Version: 5.0.23                |
|       MongoDB Engine: wiredTiger            |
|             Platform: linux                 |
|         Process Port: 3000                  |
|             Site URL: https://im.example.com/  |
|     ReplicaSet OpLog: Enabled               |
|          Commit Hash: 39d3b2e16c            |
|        Commit Branch: HEAD                  |
|                                             |
+---------------------------------------------+
{"level":51,"time":"2025-01-03T13:13:22.155Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"License","msg":"License installed","version":"3.0","hash":"ZFm6DOKo"}
{"level":50,"time":"2025-01-03T13:13:29.845Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:32.564Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:44.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
[2025-01-03T13:13:49.354Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Node '4431cc14-4780-4f06-8799-8b174f044702' reconnected.
{"level":50,"time":"2025-01-03T13:13:52.565Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:54.248Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:59.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:59.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:12.565Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:14.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:14.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.248Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:30.205Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
[2025-01-03T13:14:30.333Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/$NODE: Service '$node' stopped.
[2025-01-03T13:14:30.334Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/MATRIX: Service 'matrix' stopped.
[2025-01-03T13:14:30.336Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: ServiceBroker is stopped. Good bye.

It says it's connected to mongo but it doesn't work.

In Mongo, there's no relevant errors (only info about success connection, etc. and a bit of queries up to 700ms, nothing really wrong).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions