Integrating MicroShift with Greenboot allows for automatic software upgrade rollbacks in case of a failure.
This document describes a few techniques for:
- Adding user workload health check procedures in a production environment
- Simulating software upgrade failures in a development environment
Developers can use these guidelines to implement user workload health checks with Greenboot facilities, and to simulate failures when testing the MicroShift integration with Greenboot in CI/CD pipelines.
Follow the instructions in the Auto-applying Manifests section to install a dummy user workload, but do not restart the MicroShift service at this time.
Proceed by creating a health check script in the /etc/greenboot/check/required.d directory. Choose the name prefix of the user script so that it runs after the 40_microshift_running_check.sh script, which implements the MicroShift health check procedure for its core services.
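The prefix works because, assuming Greenboot runs the scripts of a directory in lexical file name order (which the numeric prefix convention implies), 50_busybox_running_check.sh sorts after 40_microshift_running_check.sh. The sketch below only illustrates that ordering with sort; it does not call Greenboot, and the helper name run_order is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print script names in C-locale lexical order, which is
# assumed (not verified here) to match the order Greenboot runs them in.
run_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# The user workload check sorts after the MicroShift core check
run_order 50_busybox_running_check.sh 40_microshift_running_check.sh

# Pitfall: prefixes compare character by character, so "9_" sorts after
# "40_" rather than before it
run_order 9_early_check.sh 40_microshift_running_check.sh
```

Using two-digit prefixes for every script keeps the intended order unambiguous.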
SCRIPT_FILE=/etc/greenboot/check/required.d/50_busybox_running_check.sh
# -f makes curl fail on HTTP errors instead of saving the error page as the script
sudo curl -fs https://raw.githubusercontent.com/openshift/microshift/main/docs/config/busybox_running_check.sh \
-o "${SCRIPT_FILE}" && echo SUCCESS || echo ERROR
sudo chmod 755 "${SCRIPT_FILE}"
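Downloading straight into the check directory installs whatever was received, even a truncated file. A minimal sketch of a more defensive install step, assuming the check script is a bash script: bash -n parses the file without executing it, and install copies it and sets the mode in one step. The function name install_check_script is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Greenboot check script into place with mode 0755,
# but only if bash can parse it without syntax errors.
# Usage: install_check_script SOURCE_FILE DEST_FILE
install_check_script() {
    local src=$1 dest=$2
    # bash -n reads the script and reports syntax errors without running it
    if ! bash -n "${src}"; then
        echo "Refusing to install ${src}: syntax check failed" >&2
        return 1
    fi
    install -m 0755 "${src}" "${dest}"
}
```

For example, download the script to a temporary file created with mktemp, then call install_check_script with that file and ${SCRIPT_FILE} as arguments from a root shell (sudo cannot invoke shell functions directly).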
Reboot the system and run the following command to examine the output of the Greenboot health checks. Note that the MicroShift core service health checks run before the user workload health checks.
sudo journalctl -o cat -u greenboot-healthcheck.service
The script uses the MicroShift health check functions available in the /usr/share/microshift/functions/greenboot.sh file to reuse procedures already implemented for the MicroShift core services. These functions require the user workload namespaces and the expected number of pods to be defined.
PODS_NS_LIST=(busybox)
PODS_CT_LIST=(1 )
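Judging by their names and values, the two arrays appear to be parallel: the count at index i is the number of pods expected in the namespace at index i. A small sketch, using the hypothetical helper list_pod_expectations, that prints the pairs and catches the most likely mistake, arrays of different lengths.

```shell
#!/usr/bin/env bash
# Namespace list and matching expected pod counts, as in the check script
PODS_NS_LIST=(busybox)
PODS_CT_LIST=(1)

# Print "namespace=count" pairs; fail if the two parallel arrays differ in
# length, since each count is matched to a namespace by its array index
list_pod_expectations() {
    if [ "${#PODS_NS_LIST[@]}" -ne "${#PODS_CT_LIST[@]}" ]; then
        echo "PODS_NS_LIST and PODS_CT_LIST have different lengths" >&2
        return 1
    fi
    local i
    for i in "${!PODS_NS_LIST[@]}"; do
        echo "${PODS_NS_LIST[$i]}=${PODS_CT_LIST[$i]}"
    done
}

list_pod_expectations
```

To monitor an additional workload, append its namespace and pod count at the same position in both arrays, for example a hypothetical my-app namespace with 2 pods: PODS_NS_LIST=(busybox my-app) and PODS_CT_LIST=(1 2).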
The script starts by running sanity checks to verify that it is executed from the root account and that the MicroShift service is enabled.
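A minimal sketch of such sanity checks, written as two hypothetical helpers modeled on the description above. The messages and exit codes of the actual busybox_running_check.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sanity-check helpers; not taken from busybox_running_check.sh.

# Fail unless the effective user ID is 0 (root)
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "This check must be run as root" >&2
        return 1
    fi
}

# Succeed only when systemd reports the given unit as "enabled"; any other
# state, or a missing systemctl, counts as not enabled
unit_enabled() {
    [ "$(systemctl is-enabled "$1" 2>/dev/null)" = "enabled" ]
}
```

In a check script, the calls could be require_root || exit 1 followed by unit_enabled microshift.service || exit 0. Exiting with 0 when MicroShift is disabled assumes a disabled service has nothing to verify; use a non-zero status instead if a disabled service should count as a failure.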
Then, the MicroShift health check functions are called to perform the following actions:
- Get the wait timeout for the current boot cycle, to be used by the wait_for function
- Call the namespace_images_downloaded function to wait until pod images are available
- Call the namespace_pods_ready function to wait until pods are ready
- Call the namespace_pods_not_restarting function to verify that pods are not restarting
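The wait_for function itself is not shown in this document. The following is an independent, minimal sketch of the same idea, polling a check command until it succeeds or a timeout expires, under the hypothetical name poll_until. The POLL_INTERVAL variable and its 5-second default are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper illustrating a wait-with-timeout loop; it is
# not the implementation or signature of wait_for in greenboot.sh.
# Usage: poll_until TIMEOUT_SECS COMMAND [ARGS...]
poll_until() {
    local timeout=$1
    shift
    local interval=${POLL_INTERVAL:-5}
    # SECONDS is a bash builtin counting seconds since the shell started
    local deadline=$(( SECONDS + timeout ))
    while true; do
        # Stop as soon as the probe command succeeds
        if "$@"; then
            return 0
        fi
        # Give up once the deadline has been reached
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "${interval}"
    done
}
```

For example, poll_until 600 systemctl is-active --quiet microshift.service waits up to roughly 600 seconds for the MicroShift service to become active.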
To simulate a MicroShift service failure after an upgrade, remove the hostname RPM package, which MicroShift uses during its startup.
Run the following command to remove the package and reboot.
sudo rpm-ostree override remove hostname -r
Examine the system log to monitor the upgrade verification procedure, which will fail and reboot the system a few times before rolling back to the previous state. Note the values of the boot_success and boot_counter GRUB variables.
$ sudo journalctl -o cat -u greenboot-healthcheck.service
...
...
Running Required Health Check Scripts...
STARTED
GRUB boot variables:
boot_success=0
boot_indeterminate=0
boot_counter=2
...
...
Waiting 600s for MicroShift service to be active and not failed
FAILURE
...
...
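In a CI pipeline, these GRUB variables can be checked by a script rather than read by eye. A minimal sketch, assuming each variable is logged on its own line in name=value form as in the output above; the helper name grub_var_from_log is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Greenboot health check log text on stdin and print
# the last value logged for the named GRUB variable (empty if absent).
# Usage: grub_var_from_log VARIABLE_NAME < LOG_TEXT
grub_var_from_log() {
    awk -F= -v key="$1" '
        # Drop leading whitespace; modifying $0 re-splits the fields
        { sub(/^[[:space:]]+/, "") }
        $1 == key { value = $2 }
        END { print value }
    '
}
```

For example, sudo journalctl -b -o cat -u greenboot-healthcheck.service | grub_var_from_log boot_counter prints the last boot_counter value logged during the current boot (-b restricts journalctl to the current boot).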
Run the following command to see the output of the pre-rollback script.
sudo journalctl -o cat -u redboot-task-runner.service
After a few failed verification attempts, the system should roll back to the previous state. Use the rpm-ostree status command to verify the current deployment.
$ rpm-ostree status
State: idle
Deployments:
edge:rhel/9/x86_64/edge
Version: 9.1 (2022-12-26T10:28:32Z)
Diff: 1 removed
RemovedBasePackages: hostname 3.20-6.el8
* edge:rhel/9/x86_64/edge
Version: 9.1 (2022-12-26T10:28:32Z)
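To automate this check, the sketch below prints the properties of the booted deployment, the one marked with *, and treats the rollback as complete when that deployment no longer lists RemovedBasePackages. It assumes the output layout shown above, and the helper names are hypothetical. Human-readable rpm-ostree status output may differ between versions; rpm-ostree status --json is a sturdier input for anything beyond a quick check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse human-readable `rpm-ostree status` output,
# assuming the layout shown above: a "* " prefix marks the booted deployment
# and property lines have the form "Key: value".

# Print the property lines of the booted deployment read from stdin
booted_deployment_props() {
    awk '
        /^[[:space:]]*Deployments:/ { in_deps = 1; next }
        !in_deps { next }
        {
            line = $0
            sub(/^[[:space:]]+/, "", line)
            if (line == "") next
            # "* <origin>" starts the booted deployment
            if (line ~ /^\* /) { booted = 1; next }
            # Any other line without ": " starts a non-booted deployment
            if (index(line, ": ") == 0) { booted = 0; next }
            if (booted) print line
        }
    '
}

# Succeed when the booted deployment lists no removed base packages
rollback_completed() {
    ! booted_deployment_props | grep -q '^RemovedBasePackages:'
}
```

For example, rpm-ostree status | rollback_completed && echo "Rolled back" reports success once the system runs the deployment that still includes the hostname package.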
Finish by checking that all MicroShift pods run normally, then clean up the failed deployment.
sudo rpm-ostree cleanup -b -r
To simulate a MicroShift pod failure after an upgrade, set the network.serviceNetwork MicroShift configuration option to the non-default 10.66.0.0/16 value without resetting the MicroShift data in the /var/lib/microshift directory.
Start by checking out the current file system state and modifying it by adding the config.yaml file to its usr/etc/microshift directory.
NEW_BRANCH="microshift-config"
OSTREE_REF=$(ostree refs | head -1)
# Take only the newest commit of the reference (ostree log lists it first)
OSTREE_COM=$(ostree log "${OSTREE_REF}" | awk '/^commit/ {print $2; exit}')
sudo ostree checkout "${OSTREE_COM}" "${NEW_BRANCH}"
pushd "${NEW_BRANCH}"
# YAML nesting requires the indentation below
sudo tee usr/etc/microshift/config.yaml &>/dev/null <<EOF
network:
  serviceNetwork:
  - 10.66.0.0/16
EOF
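In a CI pipeline it may be convenient to generate this file with a different service network value. A minimal sketch with the hypothetical helper write_service_network_config; its regular expression only checks the general shape of an IPv4 CIDR, not octet or prefix ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a MicroShift config.yaml that overrides only
# network.serviceNetwork, into the given directory.
# Usage: write_service_network_config DIRECTORY IPV4_CIDR
write_service_network_config() {
    local dir=$1 cidr=$2
    # Shape check only: four dotted numbers and a prefix length
    local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
    if [[ ! ${cidr} =~ ${re} ]]; then
        echo "Not an IPv4 CIDR: ${cidr}" >&2
        return 1
    fi
    mkdir -p "${dir}"
    cat > "${dir}/config.yaml" <<EOF
network:
  serviceNetwork:
  - ${cidr}
EOF
}
```

Run from a root shell inside the checked-out tree, write_service_network_config usr/etc/microshift 10.66.0.0/16 writes an equivalent network.serviceNetwork override.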
Commit the updated file system, clean up the checked-out tree, and compare the base reference with the newly created one to verify that the config.yaml file was added to the /usr/etc/microshift directory.
sudo ostree commit --subject="MicroShift config.yaml update" --branch="${NEW_BRANCH}"
popd && sudo rm -rf ${NEW_BRANCH}
ostree diff ${OSTREE_REF} ${NEW_BRANCH}
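This verification can also be scripted. A minimal sketch, assuming ostree diff prints one line per changed path with a one-letter status (A for added, M for modified, D for deleted) followed by whitespace and the path; the helper name diff_reports_added is hypothetical, and paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `ostree diff` output on stdin reports the
# given path as added (status letter "A").
# Usage: diff_reports_added PATH < DIFF_OUTPUT
diff_reports_added() {
    awk -v path="$1" '
        $1 == "A" && $2 == path { found = 1 }
        END { exit !found }
    '
}
```

For example: ostree diff "${OSTREE_REF}" "${NEW_BRANCH}" | diff_reports_added /usr/etc/microshift/config.yaml && echo "config.yaml added".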
Switch to the new branch as the default deployment and reboot for the changes to take effect.
BRANCH_COM=$(ostree log ${NEW_BRANCH} | grep ^commit | awk '{print $2}')
sudo rpm-ostree rebase --branch=${NEW_BRANCH} ${BRANCH_COM} -r
Examine the system log to monitor the upgrade verification procedure, which will fail and reboot the system a few times before rolling back to the previous state. Note the pod readiness failure in the openshift-ingress namespace.
$ sudo journalctl -o cat -u greenboot-healthcheck.service
...
...
Running Required Health Check Scripts...
STARTED
...
...
Waiting 600s for 1 pod(s) from the 'openshift-ingress' namespace to be in 'Ready' state
FAILURE
...
...
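When such a failure happens in CI, it helps to report which wait step failed. A minimal sketch, assuming the log wording shown above; both helper names are hypothetical, and the messages of the health check scripts may change between MicroShift releases.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Greenboot health check log text on stdin, assuming
# the "Waiting ..." and "FAILURE" lines shown above.

# Print the last "Waiting ..." line logged before the first FAILURE line
failed_wait_step() {
    awk '
        { sub(/^[[:space:]]+/, "") }
        /^Waiting / { last = $0 }
        /^FAILURE[[:space:]]*$/ { if (last != "") print last; exit }
    '
}

# Print the namespace named in that step, or nothing if the failed step was
# not a namespace check (for example, the MicroShift service wait)
failed_namespace() {
    failed_wait_step | sed -n "s/.*from the '\([^']*\)' namespace.*/\1/p"
}
```

For example, sudo journalctl -b -o cat -u greenboot-healthcheck.service | failed_namespace would print openshift-ingress for the failure above.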
Run the following command to see the output of the pre-rollback script.
sudo journalctl -o cat -u redboot-task-runner.service
After a few failed verification attempts, the system should roll back to the previous state. Use the rpm-ostree status command to verify the current deployment.
$ rpm-ostree status
State: idle
Deployments:
* edge:rhel/9/x86_64/edge
Version: 9.1 (2022-12-28T16:50:54Z)
edge:eae8486a204bd72eb56ac35ca9c911a46aff3c68e83855f377ae36a3ea4e87ef
Timestamp: 2022-12-29T14:44:48Z
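Here the deployment built from the microshift-config commit is listed without the * marker, so the system is running the original edge:rhel/9/x86_64/edge deployment. A sketch that extracts the booted origin for a scripted comparison, with the caveat that human-readable rpm-ostree status output may vary between versions; booted_origin is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the origin of the booted deployment (the line
# marked "* ") from `rpm-ostree status` text on stdin, assuming the layout
# shown above. Prints nothing if no line carries the marker.
booted_origin() {
    awk '
        {
            line = $0
            sub(/^[[:space:]]+/, "", line)
        }
        line ~ /^\* / { sub(/^\* /, "", line); print line; exit }
    '
}
```

For example, [ "$(rpm-ostree status | booted_origin)" = "edge:rhel/9/x86_64/edge" ] && echo "Rolled back" compares against the origin shown in the output above.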
Finish by checking that all MicroShift pods run normally, then clean up the failed deployment and the branch created for this test.
NEW_BRANCH="microshift-config"
sudo rpm-ostree cleanup -b -r
sudo ostree refs --delete ${NEW_BRANCH}