Add liveness / startup probe using chronicle import
This mitigates two failure modes:

* Sawtooth session unavailability is undetectable without higher-level transaction monitoring, so Chronicle can get into a state where it does not receive events.

* Chronicle can be in a catch-up state yet marked as available, producing timeouts in client applications until the indexing process is complete.

Signed-off-by: Ryan <[email protected]>
ryan-s-roberts committed Apr 18, 2024
1 parent 8149986 commit ee4b6f9
Showing 6 changed files with 152 additions and 6 deletions.
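
Both probes ship disabled, so they have to be switched on explicitly before they can address the two failure modes described in the commit message. A minimal values override might look like the sketch below. It assumes the key names read by `templates/statefulset.yaml` in this commit (`livenessProbe` and `startUpProbe`, with a capital `U`). The file name `probe-values.yaml` is hypothetical, and the liveness `initialDelaySeconds` is an illustrative choice because `values.yaml` does not define one.

```yaml
# probe-values.yaml (hypothetical file name)
livenessProbe:
  enabled: true
  # Not defined in the chart's values.yaml; 30 is only an illustrative value.
  initialDelaySeconds: 30
  periodSeconds: 60
  timeoutSeconds: 20
  failureThreshold: 1

startUpProbe:   # note the capital "U": this is the key the template reads
  enabled: true
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 30
```

Such a file would typically be passed to Helm with `-f probe-values.yaml`. Values not listed here, including `namespaceName` and `namespaceUuid`, fall back to the chart defaults.
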
2 changes: 1 addition & 1 deletion charts/chronicle/Chart.yaml
@@ -17,7 +17,7 @@ keywords:
# This is the chart version. This version number should be incremented each
# time you make changes to the chart and its templates, including the app
# version.
-version: 0.1.20
+version: 0.1.21

# This is the version number of Chronicle being deployed. This version
# number should be incremented each time you make changes to Chronicle.
16 changes: 16 additions & 0 deletions charts/chronicle/README.md
@@ -58,3 +58,19 @@
| `postgres.persistence.size` | postgres PVC volume size | string | "40Gi" |
| `postgres.resources` | resources | map | nil |
| `resources` | resources | map | nil |
| `livenessProbe.enabled` | if true, enables the liveness probe | bool | false |
| `livenessProbe.initialDelaySeconds` | delay in seconds before the liveness probe is initiated | int | nil |
| `livenessProbe.periodSeconds` | how often (in seconds) to perform the probe | int | 60 |
| `livenessProbe.timeoutSeconds` | number of seconds after which the probe times out | int | 20 |
| `livenessProbe.failureThreshold` | how many times to retry the probe before giving up | int | 1 |
| `livenessProbe.successThreshold` | how many times the probe must report success to be considered successful after having failed | int | nil |
| `livenessProbe.namespaceName` | the Chronicle namespace in which the liveness probe operates | string | "default" |
| `livenessProbe.namespaceUuid` | the UUID of the Chronicle namespace in which the liveness probe operates | string | "fd717fd6-70f1-44c1-81de-287d5e101089" |
| `startUpProbe.enabled` | if true, enables the startup probe | bool | false |
| `startUpProbe.initialDelaySeconds` | delay in seconds before the startup probe is initiated | int | 5 |
| `startUpProbe.periodSeconds` | how often (in seconds) to perform the probe | int | 10 |
| `startUpProbe.timeoutSeconds` | number of seconds after which the probe times out | int | 3 |
| `startUpProbe.failureThreshold` | how many times to retry the probe before giving up | int | 30 |
| `startUpProbe.successThreshold` | how many times the probe must report success to be considered successful after having failed | int | nil |
| `startUpProbe.namespaceName` | the Chronicle namespace in which the startup probe operates | string | "default" |
| `startUpProbe.namespaceUuid` | the UUID of the Chronicle namespace in which the startup probe operates | string | "fd717fd6-70f1-44c1-81de-287d5e101089" |
Binary file modified charts/chronicle/charts/sawtooth-0.2.12.tgz
Binary file not shown.
Binary file modified charts/chronicle/charts/standard-defs-0.1.3.tgz
Binary file not shown.
108 changes: 104 additions & 4 deletions charts/chronicle/templates/statefulset.yaml
@@ -153,6 +153,106 @@ spec:
- name: chronicle-data
mountPath: /var/lib/chronicle/store/
{{- include "lib.volumeMounts" .Values.extraVolumeMounts | nindent 12 }}
{{- if .Values.livenessProbe.enabled }}
livenessProbe:
exec:
command:
- bash
- -c
- |
PROBE_ID="liveness_$(LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c 13)" &&
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ") &&
echo '[
{
"@id": "_:n1",
"@type": [
"http://btp.works/chronicleoperations/ns#ActivityExists"
],
"http://btp.works/chronicleoperations/ns#activityName": [
{
"@value": "'"$PROBE_ID"'"
}
],
"http://btp.works/chronicleoperations/ns#namespaceName": [
{
"@value": "{{ .Values.livenessProbe.namespaceName }}"
}
],
"http://btp.works/chronicleoperations/ns#namespaceUuid": [
{
"@value": "{{ .Values.livenessProbe.namespaceUuid }}"
}
]
}
]' > /tmp/import.json &&
echo "Probe ID: $PROBE_ID" &&
chronicle \
-c /etc/chronicle/config/config.toml \
--console-logging json \
--sawtooth tcp://{{ include "chronicle.sawtooth.service" . }}:{{ include "chronicle.sawtooth.sawcomp" . }} \
--remote-database \
--database-name {{ .Values.postgres.database }} \
--database-username {{ .Values.postgres.user }} \
--database-host {{ .Values.postgres.host }} \
{{- if not .Values.opa.enabled }}
--embedded-opa-policy \
{{- end }}
import {{ .Values.livenessProbe.namespaceName }} {{ .Values.livenessProbe.namespaceUuid }} < /tmp/import.json
initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.livenessProbe.failureThreshold }}
{{- end }}
{{- if .Values.startUpProbe.enabled }}
startupProbe:
exec:
command:
- bash
- -c
- |
PROBE_ID="startup_$(LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c 13)" &&
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ") &&
echo '[
{
"@id": "_:n1",
"@type": [
"http://btp.works/chronicleoperations/ns#ActivityExists"
],
"http://btp.works/chronicleoperations/ns#activityName": [
{
"@value": "'"$PROBE_ID"'"
}
],
"http://btp.works/chronicleoperations/ns#namespaceName": [
{
"@value": "{{ .Values.startUpProbe.namespaceName }}"
}
],
"http://btp.works/chronicleoperations/ns#namespaceUuid": [
{
"@value": "{{ .Values.startUpProbe.namespaceUuid }}"
}
]
}
]' > /tmp/import.json &&
echo "Probe ID: $PROBE_ID" &&
chronicle \
-c /etc/chronicle/config/config.toml \
--console-logging json \
--sawtooth tcp://{{ include "chronicle.sawtooth.service" . }}:{{ include "chronicle.sawtooth.sawcomp" . }} \
--remote-database \
--database-name {{ .Values.postgres.database }} \
--database-username {{ .Values.postgres.user }} \
--database-host {{ .Values.postgres.host }} \
{{- if not .Values.opa.enabled }}
--embedded-opa-policy \
{{- end }}
import {{ .Values.startUpProbe.namespaceName }} {{ .Values.startUpProbe.namespaceUuid }} < /tmp/import.json
initialDelaySeconds: {{ .Values.startUpProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.startUpProbe.periodSeconds }}
timeoutSeconds: {{ .Values.startUpProbe.timeoutSeconds }}
failureThreshold: {{ .Values.startUpProbe.failureThreshold }}
{{- end }}
volumes:
- name: chronicle-secrets
persistentVolumeClaim:
@@ -163,10 +263,10 @@ spec:
- name: chronicle-config
configMap:
name: {{ .Release.Name }}-chronicle-config
{{- if not .Values.postgres.persistence.enabled }}
{{- if not .Values.postgres.persistence.enabled }}
- name: "pgdata"
emptyDir: {}
{{- end }}
{{- end }}
volumeClaimTemplates:
- metadata:
name: chronicle-data
@@ -176,7 +276,7 @@ spec:
resources:
requests:
storage: 6Gi
{{- if .Values.postgres.persistence.enabled }}
{{- if .Values.postgres.persistence.enabled }}
- metadata:
name: "pgdata"
annotations: {{- include "lib.safeToYaml" .Values.postgres.persistence.annotations | nindent 10 }}
@@ -186,4 +286,4 @@ spec:
resources:
requests:
storage: {{ .Values.postgres.persistence.size | quote }}
{{- end }}
{{- end }}
32 changes: 31 additions & 1 deletion charts/chronicle/values.yaml
@@ -22,6 +22,36 @@ auth:
userinfo:
url:

## @md | `livenessProbe.enabled` | if true, enables the liveness probe | false |
livenessProbe:
enabled: false
## @md | `livenessProbe.timeoutSeconds` | number of seconds after which the probe times out | 20 |
timeoutSeconds: 20
## @md | `livenessProbe.periodSeconds` | how often (in seconds) to perform the probe | 60 |
periodSeconds: 60
## @md | `livenessProbe.failureThreshold` | when a probe fails, Kubernetes will try failureThreshold times before giving up | 1 |
failureThreshold: 1
## @md | `livenessProbe.namespaceName` | the Chronicle namespace in which the probe operates | default |
namespaceName: default
## @md | `livenessProbe.namespaceUuid` | the UUID of the Chronicle namespace in which the probe operates | fd717fd6-70f1-44c1-81de-287d5e101089 |
namespaceUuid: fd717fd6-70f1-44c1-81de-287d5e101089

## @md | `startUpProbe.enabled` | if true, enables the startup probe | false |
startUpProbe:
enabled: false
## @md | `startUpProbe.initialDelaySeconds` | number of seconds after which the probe starts | 5 |
initialDelaySeconds: 5
## @md | `startUpProbe.failureThreshold` | when a probe fails, Kubernetes will try failureThreshold times before giving up | 30 |
failureThreshold: 30
## @md | `startUpProbe.periodSeconds` | how often (in seconds) to perform the probe | 10 |
periodSeconds: 10
## @md | `startUpProbe.timeoutSeconds` | number of seconds after which the probe times out | 3 |
timeoutSeconds: 3
## @md | `startUpProbe.namespaceName` | the Chronicle namespace in which the probe operates | default |
namespaceName: default
## @md | `startUpProbe.namespaceUuid` | the UUID of the Chronicle namespace in which the probe operates | fd717fd6-70f1-44c1-81de-287d5e101089 |
namespaceUuid: fd717fd6-70f1-44c1-81de-287d5e101089

## @md | `backtraceLevel` | backtrace level for Chronicle | nil |
backtraceLevel: full

@@ -134,7 +164,7 @@ test:
## @md | `test.api` | test the chronicle GraphQL server API |
api:
## @md | `test.api.enabled` | true to enable api-test Jobs and Services | true |
-enabled: true
+enabled: false
## @md | `test.api.image` | the image to use for the api-test container | blockchaintp/chronicle-helm-api-test |
image:
## @md | `test.api.image.pullPolicy` | the image pull policy | IfNotPresent |
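
With the `startUpProbe` defaults from `values.yaml` in this commit, the timing fields of the rendered startup probe should come out roughly as sketched below (the `exec` command is omitted). Under standard Kubernetes probe semantics, the container is allowed about `failureThreshold` × `periodSeconds` = 30 × 10 = 300 seconds, after the initial 5-second delay, to complete catch-up before it is restarted. The comments are annotations for this sketch, not chart output.

```yaml
startupProbe:
  # exec command omitted
  initialDelaySeconds: 5   # wait before the first probe
  periodSeconds: 10        # one probe attempt every 10 s
  timeoutSeconds: 3        # each `chronicle import` run must finish within 3 s
  failureThreshold: 30     # 30 failures x 10 s = about 300 s startup budget
```

If indexing a large chain takes longer than that, raising `failureThreshold` is probably the simplest adjustment. Kubernetes does not run liveness checks until the startup probe has succeeded, so the two probes do not compete during catch-up.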