What to do when CHT 4.x upgrades don't work as planned
relatedContent: >
  hosting/4.x/data-migration
---

-
With 4.x well into a mature stage as 4.0.0 was released in November of 2022, Medic has learned a number of important lessons on how to unstick 4.x upgrades that get stuck. Below are some specific tips as well as general practices on upgrading 4.x.
+
4.0.0 was released in November 2022, so 4.x is now well into a mature stage, and Medic has learned a number of important lessons on how to unstick 4.x upgrades that get stuck. Below are some specific tips as well as general practices for upgrading 4.x.

{{% pageinfo %}}
All tips apply to both [Docker]({{< relref "hosting/4.x/production/docker" >}}) and [Kubernetes]({{< relref "hosting/4.x/production/kubernetes" >}}) based deployments unless otherwise specified.

All upgrades are expected to succeed without issue. Do not attempt any fixes unless you actively have a problem upgrading.
{{% /pageinfo %}}

-
## Before you start
+
## Considerations

-
tk - flesh out, but be prepared by:
+
When troubleshooting, make sure that:

-
* Have and have tested backups
-
* Have extra disk space (up to 5x!)
-
* Have tested the upgrade on a dev instance
-
* ?
+
* Backups exist and restores have been tested
+
* Extra disk space is available (up to 5x!)
+
* The upgrade has been tested on a development instance with production data
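
The disk space item above can be checked with a small helper. This is a minimal sketch, not an official CHT tool: it assumes "up to 5x" means free space of up to five times the current CouchDB data size, and the `has_headroom` name and the data path in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether free space is at least FACTOR times the data size.
# Assumption: "5x" refers to the current size of the CouchDB data directory.
# Usage: has_headroom USED_KB FREE_KB [FACTOR]  -> prints "ok" or "low"
has_headroom() {
  used_kb=$1
  free_kb=$2
  factor=${3:-5}
  if [ "$free_kb" -ge $((used_kb * factor)) ]; then
    echo "ok"
  else
    echo "low"
  fi
}

# Example against a real directory (the path is an assumption, use your own):
# data_dir=/path/to/couchdb/data
# used_kb=$(du -sk "$data_dir" | awk '{print $1}')
# free_kb=$(df -Pk "$data_dir" | awk 'NR==2 {print $4}')
# has_headroom "$used_kb" "$free_kb"
```

A result of `low` does not mean the upgrade will fail, only that there is less headroom than the list above suggests.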
## A go-to fix: restart
-
A safe fix for any upgrade getting stuck is to restart all services. Any views that were being re-indexed will be picked up where they left off without loosing any work. This should be your first step when trouble shooting a stuck upgrade.
+
A safe fix for any upgrade that gets stuck is to restart all services. Any views that were being re-indexed will pick up where they left off without losing any work. This should be your first step when troubleshooting a stuck upgrade.
+
+
If you're able to, after a restart go back into the admin web GUI and try to upgrade again. Consider trying this at least twice.
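
As a rough sketch of what "restart all services" can look like on the command line: the helper below prints the restart command by default so it can be reviewed first. It assumes a Docker deployment started as a named Compose project and a Kubernetes deployment whose services run as Deployments in one namespace. The `restart_all` and `run` names, and the project and namespace values, are hypothetical.

```shell
#!/bin/sh
# Sketch: restart every CHT service for a Docker or Kubernetes deployment.
# DRY_RUN defaults to 1, which prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: restart_all docker PROJECT_NAME
#        restart_all kubernetes NAMESPACE
restart_all() {
  case "$1" in
    docker)
      # Assumption: services were started as a Compose project with this name.
      run docker compose -p "$2" restart
      ;;
    kubernetes)
      # Assumption: all services are Deployments in this namespace.
      run kubectl -n "$2" rollout restart deployment
      ;;
    *)
      echo "usage: restart_all docker|kubernetes NAME" >&2
      return 1
      ;;
  esac
}
```

Running `restart_all docker cht` prints the command for review, and `DRY_RUN=0 restart_all docker cht` executes it.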
## CHT 4.0.x - 4.3.x: CouchDB Crashes
-
**[issue](https://github.com/medic/cht-core/issues/9286)**: Starting an upgrade that involves view indexing can cause CouchDB to crash on large databases (>30m docs)
+
**[Issue #9286](https://github.com/medic/cht-core/issues/9286)**: Starting an upgrade that involves view indexing can cause CouchDB to crash on large databases (>30m docs). The upgrade will fail and you will see the logs below when you have this issue.
HAProxy:
@@ -52,16 +53,16 @@ CouchDB
```
**Fix:**
-
1.I'm checking that all the indexes are warmed by loading them one by one in fauxton.
-
2. Restart all services, **retry** upgrade from Admin GUI (not cancel and upgrade)
+
1. Check that all the indexes are warmed by loading them one by one in Fauxton.
+
2. Restart all services, **retry** upgrade from Admin GUI - do not cancel and upgrade.
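
Loading views one by one in Fauxton can also be approximated from a shell. The sketch below is an assumption-laden alternative, not the documented procedure: it uses CouchDB's standard view endpoint with `limit=0`, which returns no rows but still makes CouchDB update the index before answering. The helper names, base URL and credentials are hypothetical, and `warm_ddoc` needs `curl` and `jq`.

```shell
#!/bin/sh
# Build the URL that queries a single view with limit=0.
# Usage: view_url BASE_URL DB DDOC VIEW
view_url() {
  printf '%s/%s/_design/%s/_view/%s?limit=0\n' "$1" "$2" "$3" "$4"
}

# Query every view of one design document, one after another.
# Usage: warm_ddoc BASE_URL DB DDOC
warm_ddoc() {
  curl -sf "$1/$2/_design/$3" | jq -r '.views // {} | keys[]' |
  while read -r view; do
    echo "warming $3/$view"
    # A request can time out while a large index builds; rerun if it does.
    curl -sf -o /dev/null "$(view_url "$1" "$2" "$3" "$view")"
  done
}

# Example (URL, credentials and names are placeholders):
# warm_ddoc "http://admin:password@localhost:5984" medic medic
```

Views in the same design document share one index, so the first request per design document does most of the work and later ones should return quickly.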
-
## CHT 4.2.4 - 4.c.x: view indexing can become stuck after indexing is finished
+
## CHT 4.0.0 - 4.2.2: view indexing can become stuck after indexing is finished
-
**[issue](https://github.com/medic/cht-core/issues/9617):** Starting an upgrade that involves view indexing can become stuck after indexing is finished
+
**[Issue #9617](https://github.com/medic/cht-core/issues/9617):** Starting an upgrade that involves view indexing can become stuck after indexing is finished
-
upgrade process stalls after view indexes are built
+
Upgrade process stalls while trying to index staged views:
-
tk - get screenshot of admin UI with no progress bar
+

**Fix:**
@@ -71,14 +72,14 @@ Unfortunately, the workaround is manual and very technical and involves:
* The admin upgrade page will say that the upgrade was interrupted, click retry upgrade.
* Depending on the state of the database, you might see view indexing again. Depending on how many docs need to be indexed, indexing might get stuck again. Go back to 1 if that happens.
* Eventually, when indexing jobs are short enough not to trigger a request hang, you will get the button to complete the upgrade.
-
*
+
## CHT 4.0.1 - 4.9.0: CouchDB restart causes all services to go down
**Note** - This is a Docker only issue.
-
**[issue](https://github.com/medic/cht-core/issues/9284)**: A couchdb restart in single node docker takes down the whole instance.
+
**[Issue #9284](https://github.com/medic/cht-core/issues/9284)**: A CouchDB restart in single-node Docker takes down the whole instance. The upgrade will fail and you will see the logs below when you have this issue.
2024/07/25 18:40:28 [error] 43#43: *5757 connect() failed (111: Connection refused) while connecting to upstream, client: 172.18.0.1,
```
-
**Fix:** Restart all services
## CHT 4.x.x upgrade to 4.x.x - no more free disk space
-
[Issue](https://github.com/moh-kenya/config-echis-2.0/issues/2578#issuecomment-2455702112): prod instance couch is crashing, stuck at compaction initiation - escalated to MoH Team to resolve [lack of free disk space issue]
+
**Issue\*:** CouchDB is crashing during the upgrade. The upgrade will fail and you will see the logs below when you have this issue. While there are two log scenarios, both have the same fix.
[info] 2024-11-04T20:18:46.692239Z [email protected] <0.6832.4663> -------- Starting compaction for db "shards/7ffffffe-95555552/medic-user-mikehaya-meta.1690191139" at 10
+
[info] 2024-11-04T20:19:47.821999Z [email protected] <0.7017.4653> -------- Starting compaction for db "shards/7ffffffe-95555552/medic-user-marnyakoa-meta.1690202463" at 21
+
[info] 2024-11-04T20:21:24.529822Z [email protected] <0.24125.4661> -------- Starting compaction for db "shards/7ffffffe-95555552/medic-user-lilian_lubanga-meta.1690115504" at 15
+
```
**Fix:** Give CouchDB more disk space and restart all services.
+
_* See eCHIS Kenya [Issue #2578](https://github.com/moh-kenya/config-echis-2.0/issues/2578#issuecomment-2455702112) - a private repo and not available to the public_
-
## CHT 4.2.x upgrade to 4.11 - kubernetes has pods stuck in indeterminate state
+
+
## CHT 4.2.x upgrade to 4.11 - Kubernetes has pods stuck in indeterminate state
**Note** - This is a Kubernetes only issue.
-
[Issue](https://github.com/moh-kenya/config-echis-2.0/issues/2579#issuecomment-2455637516): A number of pods were stuck in indeterminate state, presumably because of failed garbage collection
+
**Issue\*:** A number of pods were stuck in indeterminate state, presumably because of failed garbage collection
+
+
API Logs:
-
API Logs
```shell
2024-11-04 19:33:56 ERROR: Server error: StatusCodeError: 500 - {"message":"Error: Can't upgrade right now.
The following pods are not ready...."}
@@ -127,7 +144,8 @@ Running `kubectl get po` shows 3 pods with status of `ContainerStatusUnknown`:
**Fix:** delete pods so they get recreated and start cleanly
-
(tk - is this syntax legal/correct?)
-
-
`kubectl delete po 'cht.service in (api, sentinel, haproxy, couchdb)'`
+
```shell
+
kubectl delete po -l 'cht.service in (api, sentinel, haproxy, couchdb)'
+
```
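
Before deleting anything, it can help to list which pods are actually unhealthy. The filter below is a small sketch that reads `kubectl get po --no-headers` output on standard input and prints the name and status of every pod that is neither `Running` nor `Completed`. The `not_running` name is hypothetical, and the sketch assumes the default column order of NAME, READY, STATUS.

```shell
#!/bin/sh
# Print "NAME STATUS" for pods whose STATUS column is not Running or Completed.
# Assumes default `kubectl get po --no-headers` columns: NAME READY STATUS ...
not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example (the namespace is a placeholder):
# kubectl -n my-namespace get po --no-headers | not_running
```

An empty result suggests every pod is healthy and that deleting pods is unlikely to help.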
+
_* See eCHIS Kenya [Issue #2579](https://github.com/moh-kenya/config-echis-2.0/issues/2579#issuecomment-2455637516) - a private repo and not available to the public_