CNF-15663: Full DU profile example #313
openshift-merge-bot[bot] merged 1 commit into openshift-kni:main from
Conversation
@irinamihai: This pull request references CNF-15663, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: the story was expected to target version "4.18.0", but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

/hold cleanup

/unhold
This should be a default
+1 -- or going a bit further I think this is static content that likely doesn't need to be part of the defaults either.
It has been agreed that this will be locked in the new version of the ClusterLogForwarder, currently WIP under OCPBUGS-44518, so it will be removed from this ClusterTemplate.
This should be a default
By default, do you mean have them directly in the Policy Generator and not expose them in the policyTemplateParameters?
No, these would be set in the default configmap vs being passed in by the client
As proposed above I think we should narrow this down to just the additional labels. One question I have is whether the user/orchestrator would add one or more labels which are cluster specific (like a higher level cluster identifier, etc)? In that case would the labels (or at least one label) need to be part of this schema?
The filters are also going to be partially locked in the ClusterLogForwarder source-cr under OCPBUGS-44518. Yes, these labels will be set in the ClusterInstance defaults ConfigMap, but we also need a way for them to reach the ConfigMap used by the ACM PGs, so they also need to be included in the policyTemplate defaults ConfigMap.
Not really filters, additional metadata labels
Do we want to narrow the templating down to just the additional labels, ie the user configures only the value for openshiftLabels?
Remove, see comment above
lack left a comment
Generally speaking, if this is something that will need to be kept in-sync with the cnf-features-deploy repo, perhaps it's worth engineering a way to automatically synchronize them or generate one from the other.
Should we allow customization of all of this? Or just the Kafka url?
We shouldn't override this section, but rely on the source-crs original value
+1
This is also missing the module_blacklist=irdma
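A minimal sketch of one way that kernel argument could be carried, assuming it is delivered through a MachineConfig; the name and role label here are hypothetical and not taken from this PR:

```yaml
# Hypothetical sketch: blacklist the irdma kernel module via a
# MachineConfig kernel argument. Name and role label are illustrative.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 05-kargs-blacklist-irdma   # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  kernelArguments:
    - module_blacklist=irdma
```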
I will remove this section to keep the defaults from the source-cr. I kept it from a previous configuration, but I realized it's not matching the 4.17 configuration.
Suggested change:

    machineConfigPoolSelector:
      pools.operator.machineconfiguration.openshift.io/master: ""
    nodeSelector:
      node-role.kubernetes.io/master: ''

becomes:

    machineConfigPoolSelector:
      $patch: replace
      pools.operator.machineconfiguration.openshift.io/master: ""
    nodeSelector:
      $patch: replace
      node-role.kubernetes.io/master: ''
And then we don't need the SetSelector cr variant any more.
(Repeat for *-SetSelector.yaml elsewhere in this file!)
This should be fine now since we also have support for openapi schemas!
We shouldn't override these; the source-crs has the right values.
i don't think any of this is needed any more since the SiteConfig added cpuPartitioningMode: AllNodes in 4.14
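For context, a sketch of where that field sits in a SiteConfig (cluster-level, available since 4.14 per the comment above); everything other than cpuPartitioningMode is a placeholder:

```yaml
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: example-sno        # placeholder
  namespace: example-sno   # placeholder
spec:
  clusters:
    - clusterName: example-sno
      # Enables workload partitioning on all nodes, as noted above.
      cpuPartitioningMode: AllNodes
```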
Wouldn't this be in the source cr ?
The selector can't be, because depending on whether you're deploying SNO or MNO the source CR may need master or worker.
In this model, the cluster template would only be used for SNO; an MNO would have a different one.
For SNO we should be able to use either master or worker here since the node has both labels. If we use worker is it valid for all topologies?
Do we need to override the default profile? 🤔 The ztp git example is just using the default profile values.
No need to override the source cr here
By default, observability is not enabled. I think we should just use the default values from source-cr.
we should enable observability.
I have referred to the story IanM pointed out (CNF-13398); we have to merge our desired configuration for reducing the monitoring footprint with the default observability config so that our configuration does not get overridden.
I will include the new manifest in the following patch.
Is this DU profile based on OCP 4.17? I wonder if adding a comment to mention that might be helpful.
Do we want to use disconnected registry as example?
Yes, just to showcase how to add extra annotations, but I think it can be removed from the full DU profile example and kept in our other examples.
Reference is now UEFISecureBoot.
For IBU there is a need for a separate partition for /var/lib/containers. Should this example set that up as well?
I will update to use the config from the SiteConfig 4.17 examples then.
Is there an intent to include ports eth0 and eth1 in this bond, or other ports in it?
Do we want to narrow the templating down to just the additional labels, ie the user configures only the value for openshiftLabels?
As noted above we should use more fixed content in the source CR (ie be more opinionated) and allow the user to override the URL and labels.
We needed to specify it if we use $patch: replace.
This whole manifest will be reworked anyway with the new ClusterLogForwarder source-cr.
These are not in the reference. Is their addition intentional?
Missing:
group.ice-dplls=0:f:10:*:ice-dplls.*
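For context, that line belongs with the other [scheduler] thread-pinning groups in the TunedPerformancePatch profile data. A hedged sketch of that section, assuming the usual reference layout (verify names and the include line against the actual source-cr):

```yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: performance-patch
      data: |
        [main]
        include=openshift-node-performance-openshift-node-performance-profile
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        group.ice-gnss=0:f:10:*:ice-gnss.*
        group.ice-dplls=0:f:10:*:ice-dplls.*
```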
For all of these we should not repeat any of the content already in the source CR.
Hmm, there is no content locked in the source-crs...
SriovNetworkNodePolicy, SriovNetwork.
Is there a reason this is in its own policy? This creates more policies than are necessary. Consider combining with the next policy as "baseline config"
Actually, I think this can be included in the v4-config-policy.
Could sriov configs be included in the v4-config-policy? I don't see a reason why they couldn't be 🤔.
Can we remove the bond? Bonding is going to be very rare for this use case.
Force-pushed 4598a0b to 071bcc5.

/hold

Force-pushed 56c7b70 to fa33b9e.

/unhold
Hmm... how would this work if we had more than one cluster template that needed to enable observability? Wouldn't this result in a conflicting binding?
This one has to be a one-time policy per namespace. We could have a directory called common-<namespace> under sno-ran-full-du and include there all the resources common to the ClusterTemplates in a certain namespace. WDYT, @bartwensley?
My concern was that the ManagedClusterSetBinding refers to "open-cluster-management-observability" which doesn't include anything specific to the cluster template. So if you had two different cluster templates, and they each contained this same ManagedClusterSetBinding, wouldn't that be a conflict? So don't we need to include the cluster template name in this namespace? Apologies if I'm not explaining this well (or if it makes no sense).
@bartwensley , yes, it makes sense. I've thought about it and I pushed a new patchset with a new approach.
We could use the following directory structure:
policytemplates/
  common/
    acm-pg-observability.yaml
    msc-observability.yaml
    source-cr-observability.yaml
  version_4.X.Y/
  version_4.X.Y+1/
  ...

In the new patchset, I've used object-templates-raw in the updated source-cr such that we range over the namespaces where we create ORAN policies. The only drawback is that we need to include those namespaces manually. The alternative would have been to have one ACM PG for each ztp-<clustertemplate-namespace>, so I think the current proposal is still cleaner.
I will check with ACM folks if there is any way for us to automatically get the namespaces and I can update that in a future PR.
What do you think?
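A hedged sketch of the object-templates-raw approach described above: range over a hand-maintained namespace list and stamp out one ManagedClusterSetBinding per namespace. The policy name and namespace values are hypothetical, not taken from this PR:

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: observability-mcsb   # hypothetical
spec:
  remediationAction: enforce
  severity: low
  object-templates-raw: |
    {{- /* namespaces listed manually, as noted above */}}
    {{- range (list "ztp-clustertemplate-a" "ztp-clustertemplate-b") }}
    - complianceType: musthave
      objectDefinition:
        apiVersion: cluster.open-cluster-management.io/v1beta2
        kind: ManagedClusterSetBinding
        metadata:
          name: open-cluster-management-observability
          namespace: {{ . }}
        spec:
          clusterSet: open-cluster-management-observability
    {{- end }}
```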
Thanks Irina - looks good.
Force-pushed fa33b9e to 8270259.
Where did this come from? We do not want all of these enabled on the managed cluster.
These come from values set by ACM observability. On the hub we (typically) disable ACM's ability to write this configmap because of conflicting settings for this "yaml in a string" value, but we merged our content with what ACM sets. The collectors entries below each have a boolean enable/disable value. With this setting only netclass and netdev are enabled which matches the state when the configmap is removed entirely.
For reference, the node-exporter pod command line looks like this with the nodeExporter config in this PR.
- --no-collector.wifi
- --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
- --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
- --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
- --collector.cpu.info
- --collector.textfile.directory=/var/node_exporter/textfile
- --no-collector.btrfs
- --runtime.gomaxprocs=0
- --no-collector.cpufreq
- --no-collector.tcpstat
- --collector.netdev
- --collector.netclass
- --collector.netclass.netlink
- --no-collector.buddyinfo
- --no-collector.mountstats
- --no-collector.ksmd
- --no-collector.processes
- --no-collector.systemd
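Those flags would correspond to a nodeExporter collectors section along these lines, assuming the value lands in the openshift-monitoring cluster-monitoring-config ConfigMap; this is reconstructed from the flag list above and the actual content in the PR may differ:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        buddyinfo:
          enabled: false
        cpufreq:
          enabled: false
        ksmd:
          enabled: false
        mountstats:
          enabled: false
        netclass:
          enabled: true
          useNetlink: true
        netdev:
          enabled: true
        processes:
          enabled: false
        systemd:
          enabled: false
        tcpstats:
          enabled: false
```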
Is the path right? I note that the CR is in sno-ran-full-du-profile directory.
Uh-oh... I actually renamed the directories to be sno-ran-full-du. Thank you.
Force-pushed 8270259 to df64aaf.
Force-pushed df64aaf to 39378d0.
Force-pushed 39378d0 to 51c608e.
/retest-required

/lgtm

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: browsell. The full list of commands accepted by this bot can be found here. The pull request process is described here.
No description provided.