❗ The factory-cli tool can download a rootfs image to a local partition. Therefore, when booting the discovery ISO, we can use a lightweight image and pull the rootfs from a local partition or from a local HTTP server. Currently, ZTP does not provide a declarative and automated way to load the rootfs image from a local partition.
Edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully-automated management solutions with, as closely as possible, zero human interaction.
Zero touch provisioning (ZTP) allows you to provision new edge sites with declarative configurations of bare-metal equipment at remote sites following a GitOps deployment set of practices. All configurations are declarative in nature.
ZTP is a project to deploy and deliver OpenShift 4 in a hub-and-spoke architecture (a 1-N relation), where a single hub cluster manages many spoke clusters. Both the hub and the spokes are based on OpenShift 4; the difference is that the hub cluster manages, deploys, and controls the lifecycle of the spokes using Red Hat Advanced Cluster Management (RHACM). Hub clusters running RHACM apply radio access network (RAN) policies from predefined custom resources (CRs) and provision and deploy the spoke clusters using multiple products.
❗ ZTP supports two scenarios, connected and disconnected, depending on whether the OpenShift Container Platform worker nodes can directly access the internet. In telco deployments, the disconnected scenario is the most common.
ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale. However, the factory-cli tool focuses on SNO clusters only.
Zero Touch Provisioning (ZTP) leverages multiple products or components to deploy OpenShift Container Platform clusters using a GitOps approach. While the workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes, it can be logically divided into two different stages: provisioning of the SNO and applying the desired configuration, which in our case is applying the validated RAN DU profile.
⚠️ The workflow does not need any intervention, so ZTP automatically configures the SNO once it is provisioned. However, the two stages are clearly differentiated.
The workflow officially starts when you create the declarative configuration for provisioning your OpenShift clusters. This manifest is described in a custom resource called siteConfig. Note that a disconnected environment needs a container registry configured to deliver the OpenShift container images required for the installation. This task can be achieved by using oc-mirror.
❗ The factory-cli-tool tries to limit the usage of a container registry to pull down the required images. Currently, a registry is still needed since a couple of components require checking the availability of certain images to continue with the installation. The end goal is avoiding this requirement.
Depending on your specific environment, you might need a couple of extra services such as DHCP, DNS, NTP or HTTP. The latter is needed for downloading the RHCOS live ISO and the rootfs image locally instead of from the default OpenShift mirror webpage.
Once the configuration is created, you can push it to the Git repo that Argo CD continuously watches for new content.
Argo CD pulls the siteConfig and uses a specific kustomize plugin called siteconfig-generator to transform it into custom resources that are understood by the hub cluster (RHACM/MCE). A siteConfig contains all the necessary information to provision your node or nodes. Basically, it will create ISO images with the defined configuration that are delivered to the edge nodes to begin the installation process. The images are used to repeatedly provision large numbers of nodes efficiently and quickly, allowing you to keep up with requirements from the field for far-edge nodes.
⚠️ In telco use cases, clusters mainly run on bare-metal hosts. Therefore, the produced ISO images are mounted using the remote virtual media features of the baseboard management controller (BMC).
In the picture, these resulting manifests are called Cluster Installation CRs. Finally, the provisioning process starts.
The provisioning process includes installing the host operating system (RHCOS) on a blank server and deploying OpenShift Container Platform. This stage is managed mainly by a ZTP component called the Infrastructure Operator.
❗ Notice, in the picture, how ZTP allows us to provision clusters at scale.
Once the clusters are provisioned, the day-2 configuration defined in multiple PolicyGenTemplate (PGT) custom resources is applied automatically. ZTP understands the PolicyGenTemplate custom resource through a specific kustomize plugin called policy-generator.
In telco RAN DU nodes, this configuration includes the installation of the common telco operators, a common configuration for RAN, and a site-specific configuration (SR-IOV or performance settings), since the latter is very dependent on the hardware.
Notice that if, later on, you want to apply a new configuration or replace an existing one, you must use a new policyGenTemplate to do that.
- The partitioning stage was already executed successfully.
- The downloading stage is completed, so all dependent artifacts are already stored on the disk partition.
- The bare metal server is powered off.
As mentioned, a siteConfig manifest defines in a declarative manner how an OpenShift target cluster is going to be installed and configured. Below is an example of a valid siteConfig. Unlike the regular ZTP provisioning workflow, three extra fields need to be included:
- clusters.ignitionConfigOverride. This field adds extra configuration in ignition format during the ZTP discovery stage. Basically, it includes a couple of systemd services in the ISO that is mounted using virtual media. This way, those scripts are part of the RHCOS discovery live ISO and can be used at that point to load the Assisted Installer images.
- nodes.installerArgs. This field configures the way the coreos-installer utility writes the RHCOS live ISO to disk. In this case, we need to tell it to preserve the disk partition labeled 'data'. The artifacts saved there are needed during the OCP installation stage.
- nodes.ignitionConfigOverride. This field adds functionality similar to clusters.ignitionConfigOverride, but in the OCP installation stage. Notice that once RHCOS is written to disk, the extra configuration included in the ZTP discovery ISO is no longer present. It was only in memory, since we were running a live OS during the discovery stage. This field allows the addition of extra configuration in ignition format to the coreos-installer binary, which is in charge of writing the RHCOS live OS to disk.
❗ You can just copy and paste the three fields described above in your siteConfig. A detailed explanation of each one is included in the following sections.
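As a quick sanity check before pushing a siteConfig to Git, the presence of the three fields can be verified programmatically. The following is a minimal sketch, assuming the siteConfig has already been parsed into a Python dictionary (for example, with a YAML library); the helper name `missing_precache_fields` and the sample data are hypothetical and not part of the factory-cli tooling.

```python
# Field names are taken from the list above; everything else is illustrative.
REQUIRED_CLUSTER_FIELDS = ("ignitionConfigOverride",)
REQUIRED_NODE_FIELDS = ("installerArgs", "ignitionConfigOverride")


def missing_precache_fields(site_config: dict) -> list:
    """Return the paths of precaching fields that are absent or empty."""
    missing = []
    clusters = site_config.get("spec", {}).get("clusters", [])
    for c_idx, cluster in enumerate(clusters):
        for field in REQUIRED_CLUSTER_FIELDS:
            if not cluster.get(field):
                missing.append(f"clusters[{c_idx}].{field}")
        for n_idx, node in enumerate(cluster.get("nodes", [])):
            for field in REQUIRED_NODE_FIELDS:
                if not node.get(field):
                    missing.append(f"clusters[{c_idx}].nodes[{n_idx}].{field}")
    return missing


# Hypothetical, heavily trimmed siteConfig: the node lacks its ignition override.
sample = {
    "spec": {
        "clusters": [
            {
                "clusterName": "sno-worker-0",
                "ignitionConfigOverride": '{"ignition":{"version":"3.1.0"}}',
                "nodes": [
                    {
                        "hostName": "snonode",
                        "installerArgs": '["--save-partlabel", "data"]',
                    }
                ],
            }
        ]
    }
}

print(missing_precache_fields(sample))
# → ['clusters[0].nodes[0].ignitionConfigOverride']
```

An empty list means all three fields are present for every cluster and node in the manifest.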
If you have customized the partition label during the partitioning stage, you will need to reflect that customization in your siteConfig:
- Replace data with your custom partition label in the nodes.installerArgs for the --save-partlabel option.
- Update the precache-images.service and precache-ocp-images.service service-unit configuration in the clusters.ignitionConfigOverride and nodes.ignitionConfigOverride to specify your custom partition label as an argument to the extract-ai.sh and extract-ocp.sh utilities, using the --label option.
To update your siteConfig, you can use the factory-precaching-cli siteconfig command, which merges the required prestaging hooks into a given siteConfig yaml file.
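To make the two manual changes concrete, here is a minimal sketch of what they amount to on the raw field values. It assumes the fields are available as the JSON strings shown in the siteConfig example; the function names are hypothetical, and in practice the factory-precaching-cli siteconfig command is the supported way to apply a custom label.

```python
import json
import re


def relabel_installer_args(installer_args: str, new_label: str) -> str:
    """Replace the value that follows --save-partlabel in the JSON-encoded list."""
    args = json.loads(installer_args)
    for i, arg in enumerate(args[:-1]):
        if arg == "--save-partlabel":
            args[i + 1] = new_label
    return json.dumps(args)


def relabel_ignition_override(override: str, new_label: str) -> str:
    """Rewrite the --label argument inside every systemd unit of the override."""
    ignition = json.loads(override)
    for unit in ignition.get("systemd", {}).get("units", []):
        unit["contents"] = re.sub(
            r"(--label\s+)\S+",
            lambda match: match.group(1) + new_label,
            unit.get("contents", ""),
        )
    # Compact separators keep the result a single-line string, as in the siteConfig.
    return json.dumps(ignition, separators=(",", ":"))


print(relabel_installer_args('["--save-partlabel", "data"]', "mycustompartition"))
# → ["--save-partlabel", "mycustompartition"]
```

The same `relabel_ignition_override` call would be applied to both clusters.ignitionConfigOverride and nodes.ignitionConfigOverride, since both units pass the label through the --label option.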
$ podman run --rm quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli siteconfig --help
Update site config
Usage:
factory-precaching-cli siteconfig [flags]
Flags:
-c, --cfg string Site-config file
-h, --help help for siteconfig
-i, --indent int Indentation (default 2)
-l, --label string Partition label (default "data")
--testmode Use dummy ignition data for testing
-v, --version version for siteconfig
Using the siteconfig command also helps ensure that any updates to the prestaging hooks (bug fixes, etc.) are reflected in the siteConfig. The command writes the updated siteConfig to stdout, which can be redirected to a new file for comparison with the original prior to adoption.
# Update site-config, using default partition label
podman run --rm -i quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli siteconfig \
<site-config.yaml >new-site-config.yaml
# Update site-config, specifying a custom partition label
podman run --rm -i quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli siteconfig \
--label mycustompartition <site-config.yaml >new-site-config.yaml
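The comparison step mentioned above can be done with any diff tool. As one possible sketch, the Python standard library can produce a unified diff between the original and the generated file; the file names match the commands above, while the helper name `siteconfig_diff` and the inline sample contents are purely illustrative.

```python
import difflib


def siteconfig_diff(original: str, updated: str) -> str:
    """Return a unified diff between two siteConfig documents."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="site-config.yaml",
        tofile="new-site-config.yaml",
    )
    return "".join(lines)


# Illustrative contents; in practice, read both files from disk.
original = 'nodes:\n  - hostName: "snonode"\n'
updated = (
    'nodes:\n'
    '  - hostName: "snonode"\n'
    '    installerArgs: \'["--save-partlabel", "data"]\'\n'
)

print(siteconfig_diff(original, updated), end="")
```

An empty result means the command made no changes, which suggests the prestaging hooks in the file were already up to date.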
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "clus3a-5g-lab"
  namespace: "clus3a-5g-lab"
spec:
  baseDomain: "e2e.bos.redhat.com"
  pullSecretRef:
    name: "assisted-deployment-pull-secret"
  clusterImageSetNameRef: "img4.9.10-x86-64-appsub"
  sshPublicKey: "ssh-rsa ..."
  clusters:
    - clusterName: "sno-worker-0"
      clusterImageSetNameRef: "eko4-img4.11.5-x86-64-appsub"
      clusterLabels:
        group-du-sno: ""
        common-411: true
        sites : "clus3a-5g-lab"
        vendor: "OpenShift"
      clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
      machineNetwork:
        - cidr: 10.19.32.192/26
      serviceNetwork:
        - 172.30.0.0/16
      networkType: "OVNKubernetes"
      additionalNTPSources:
        - clock.corp.redhat.com
      ignitionConfigOverride: '{"ignition":{"version":"3.1.0"},"systemd":{"units":[{"name":"precache-images.service","enabled":true,"contents":"[Unit]\nDescription=Load prestaged images in discovery stage\n\nBefore=agent.service\n\n[Service]\nType=oneshot\nUser=root\nWorkingDirectory=/var/mnt\nExecStart=bash /usr/local/bin/extract-ai.sh --label data\n#TimeoutStopSec=30\nExecStopPost=systemctl disable precache-images.service\n\n[Install]\nWantedBy=multi-user.target default.target\nWantedBy=agent.service"}]},"storage":{"files":[{"overwrite":true,"path":"/usr/local/bin/extract-ai.sh","mode":493,"user":{"name":"root"},"contents":{"source":"data:text/plain;charset=utf-8;base64,"}}]}}'
      nodes:
        - hostName: "snonode.sno-worker-0.e2e.bos.redhat.com"
          role: "master"
          bmcAddress: "idrac-virtualmedia+https://10.19.28.53/redfish/v1/Systems/System.Embedded.1"
          bmcCredentialsName:
            name: "worker0-bmh-secret"
          bootMACAddress: "e4:43:4b:bd:90:46"
          bootMode: "UEFI"
          rootDeviceHints:
            deviceName: /dev/nvme0n1
          cpuset: "0-1,40-41"
          installerArgs: '["--save-partlabel", "data"]'
          ignitionConfigOverride: '{"ignition":{"version":"3.1.0"},"systemd":{"units":[{"name":"precache-ocp-images.service","enabled":true,"contents":"[Unit]\nDescription=Load prestaged OCP images into containers storage\nBefore=machine-config-daemon-pull.service nodeip-configuration.service\n\n[Service]\nType=oneshot\nUser=root\nWorkingDirectory=/var/mnt\nExecStart=bash /usr/local/bin/extract-ocp.sh --label data\nTimeoutStopSec=60\nExecStopPost=systemctl disable precache-ocp-images.service\n\n[Install]\nWantedBy=multi-user.target\n"}]},"storage":{"files":[{"overwrite":true,"path":"/usr/local/bin/extract-ocp.sh","mode":493,"user":{"name":"root"},"contents":{"source":"data:text/plain;charset=utf-8;base64,"}}]}}'
          nodeNetwork:
            config:
              interfaces:
                - name: ens1f0
                  type: ethernet
                  state: up
                  macAddress: "e4:43:4b:bd:90:46"
                  ipv4:
                    enabled: true
                    dhcp: true
                  ipv6:
                    enabled: false
            interfaces:
              - name: "ens1f0"
                macAddress: "e4:43:4b:bd:90:46"
Showing the content of the field in a more readable form helps us understand what the ignition configuration actually does:
- There are two systemd units (var-mnt.mount and precache-images.service). The precache-images.service depends on the disk partition being mounted in /var/mnt by the var-mnt unit. The precache-images service basically calls a script called extract-ai.sh. Notice that precache-images must be executed before the agent.service, which means that the Assisted Installer (ai) images are extracted before the discovery stage starts.
- The extract-ai.sh script uncompresses and loads the images required in this stage from the disk partition to the local container storage. Once done, the images can be used locally instead of being pulled down from a registry. The decoded script can be found here.
❗ Sometimes you may need to modify the mentioned scripts or include new ones. In such cases, add them to the discovery-beauty ignition template. Finally, include the modified ignition file in the siteConfig manifest in the expected format.
{
  "ignition": {
    "version": "3.1.0"
  },
  "systemd": {
    "units": [
      {
        "name": "precache-images.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Load prestaged images in discovery stage\n\nBefore=agent.service\n\n[Service]\nType=oneshot\nUser=root\nWorkingDirectory=/var/mnt\nExecStart=bash /usr/local/bin/extract-ai.sh --label data\n#TimeoutStopSec=30\nExecStopPost=systemctl disable precache-images.service\n\n[Install]\nWantedBy=multi-user.target default.target\nWantedBy=agent.service"
      }
    ]
  },
  "storage": {
    "files": [
      {
        "overwrite": true,
        "path": "/usr/local/bin/extract-ai.sh",
        "mode": 493,
        "user": {
          "name": "root"
        },
        "contents": {
          "source": "data:text/plain;charset=utf-8;base64,"
        }
      }
    ]
  }
}
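The "expected format" for the siteConfig is a single-line JSON string in which the script travels as a base64 data URL. The following is a minimal sketch of that conversion, assuming the pretty-printed ignition has been loaded into a dictionary. The helper names and the placeholder script body are hypothetical; only the data URL prefix and the file entry layout are taken from the listing above. Note also that the mode value 493 is the decimal form of the octal permission 0755.

```python
import base64
import json

# Prefix as it appears in the "source" field of the listing above.
DATA_URL_PREFIX = "data:text/plain;charset=utf-8;base64,"


def embed_script(ignition: dict, path: str, script: bytes) -> None:
    """Store `script` as a base64 data URL in the storage.files entry for `path`."""
    for entry in ignition.get("storage", {}).get("files", []):
        if entry.get("path") == path:
            encoded = base64.b64encode(script).decode("ascii")
            entry.setdefault("contents", {})["source"] = DATA_URL_PREFIX + encoded
            return
    raise KeyError(f"no storage.files entry for {path}")


def decode_script(ignition: dict, path: str) -> bytes:
    """Recover the script bytes from the data URL of the entry for `path`."""
    for entry in ignition.get("storage", {}).get("files", []):
        if entry.get("path") == path:
            source = entry["contents"]["source"]
            return base64.b64decode(source.split("base64,", 1)[1])
    raise KeyError(f"no storage.files entry for {path}")


def to_override_string(ignition: dict) -> str:
    """Serialize the ignition config as the compact one-line string used in the siteConfig."""
    return json.dumps(ignition, separators=(",", ":"))


ignition = {
    "ignition": {"version": "3.1.0"},
    "storage": {
        "files": [
            {"overwrite": True, "path": "/usr/local/bin/extract-ai.sh", "mode": 0o755}
        ]
    },
}
# Placeholder script body, not the real extract-ai.sh.
embed_script(ignition, "/usr/local/bin/extract-ai.sh", b"#!/bin/bash\necho placeholder\n")
print(to_override_string(ignition))
```

Running `decode_script` on an existing override is a convenient way to review the embedded script before modifying it.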
This field, as mentioned, lets us preserve the disk partition where all the container images were stored. These extra parameters are passed directly to the coreos-installer binary, which is in charge of writing the live RHCOS to disk. Then, on the next boot, the OS runs from the disk. Notice that we previously labeled the partition as data.
installerArgs: '["--save-partlabel", "data"]'
Several extra options can be passed to the coreos-installer utility. The most interesting ones are shown below:
# coreos-installer install --help
coreos-installer-install 0.12.0
Install Fedora CoreOS or RHEL CoreOS
USAGE:
coreos-installer install [OPTIONS] <dest-device>
OPTIONS:
...
-u, --image-url <URL>
Manually specify the image URL
-f, --image-file <path>
Manually specify a local image file
-i, --ignition-file <path>
Embed an Ignition config from a file
-I, --ignition-url <URL>
Embed an Ignition config from a URL
...
--save-partlabel <lx>...
Save partitions with this label glob
--save-partindex <id>...
Save partitions with this number or range
...
--insecure-ignition
Allow Ignition URL without HTTPS or hash
...
ARGS:
<dest-device>
Destination device
The purpose of this field is very similar to clusters.ignitionConfigOverride. The difference is that we are now in the OCP installation stage, so we need to extract and load the OCP images that are needed to install the cluster. Remember that in the discovery stage we only take care of the container images required for discovery.
⚠️ The number of container images extracted and loaded is much larger than in the discovery stage. So, depending on the OCP release and whether the telco operators were requested, the time it takes will vary.
- There are two systemd units (var-mnt.mount and precache-ocp-images.service). The precache-ocp-images.service depends on the disk partition being mounted in /var/mnt by the var-mnt unit. The precache-ocp-images service basically calls a script called extract-ocp.sh. Notice that precache-ocp-images must be executed before the machine-config-daemon-pull.service and nodeip-configuration.service, which means that the images are extracted before the OCP installation starts.
- The extract-ocp.sh script uncompresses and loads the images required in this stage from the disk partition to the local container storage. Fundamentally, these are the OCP release images and, if requested, the operators. Once done, the images can be used locally instead of being pulled down from a registry. The decoded script can be found here.
❗ Sometimes you may need to modify the mentioned scripts or include new ones. In such cases, add them to the boot-beauty ignition template. Finally, include the modified ignition file in the siteConfig manifest in the expected format.
{
  "ignition": {
    "version": "3.1.0"
  },
  "systemd": {
    "units": [
      {
        "name": "precache-ocp-images.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Load prestaged OCP images into containers storage\nBefore=machine-config-daemon-pull.service nodeip-configuration.service\n\n[Service]\nType=oneshot\nUser=root\nWorkingDirectory=/var/mnt\nExecStart=bash /usr/local/bin/extract-ocp.sh --label data\nTimeoutStopSec=60\nExecStopPost=systemctl disable precache-ocp-images.service\n\n[Install]\nWantedBy=multi-user.target\n"
      }
    ]
  },
  "storage": {
    "files": [
      {
        "overwrite": true,
        "path": "/usr/local/bin/extract-ocp.sh",
        "mode": 493,
        "user": {
          "name": "root"
        },
        "contents": {
          "source": "data:text/plain;charset=utf-8;base64,"
        }
      }
    ]
  }
}
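The ordering constraint lives in the Before= line of the unit text embedded in the override. As a rough sketch, it can be read back with the Python standard library, treating the unit file as INI-like text. This is an approximation that works for simple units such as this one, not a full systemd parser, and the helper name `unit_ordering` is hypothetical.

```python
import configparser
import json


def unit_ordering(override: str, unit_name: str) -> list:
    """Return the units listed in Before= for `unit_name` inside an ignition override."""
    ignition = json.loads(override)
    for unit in ignition["systemd"]["units"]:
        if unit["name"] == unit_name:
            # strict=False tolerates repeated keys such as multiple WantedBy= lines.
            parser = configparser.ConfigParser(strict=False, interpolation=None)
            parser.optionxform = str  # keep systemd's case-sensitive key names
            parser.read_string(unit["contents"])
            return parser.get("Unit", "Before", fallback="").split()
    raise KeyError(unit_name)


# Unit text abridged from the listing above.
unit_text = (
    "[Unit]\n"
    "Description=Load prestaged OCP images into containers storage\n"
    "Before=machine-config-daemon-pull.service nodeip-configuration.service\n"
    "\n"
    "[Service]\n"
    "Type=oneshot\n"
    "ExecStart=bash /usr/local/bin/extract-ocp.sh --label data\n"
    "\n"
    "[Install]\n"
    "WantedBy=multi-user.target\n"
)
override = json.dumps(
    {"systemd": {"units": [{"name": "precache-ocp-images.service", "contents": unit_text}]}}
)

print(unit_ordering(override, "precache-ocp-images.service"))
# → ['machine-config-daemon-pull.service', 'nodeip-configuration.service']
```

A check like this can catch an edited override that accidentally dropped the ordering, in which case the images might not be loaded before the installation starts pulling them.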
Once the siteConfig and, optionally, the policyGenTemplates are uploaded to the Git repo that Argo CD monitors, we are ready to push the sync button to start the whole process. Remember that, from that point on, the process should be zero touch.