Skip to content

CNF-15663: Full DU profile example#313

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift-kni:mainfrom
irinamihai:full-du-profile
Dec 19, 2024
Merged

CNF-15663: Full DU profile example#313
openshift-merge-bot[bot] merged 1 commit intoopenshift-kni:mainfrom
irinamihai:full-du-profile

Conversation

@irinamihai
Copy link
Copy Markdown
Collaborator

No description provided.

@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented Nov 15, 2024

@irinamihai: This pull request references CNF-15663 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@irinamihai
Copy link
Copy Markdown
Collaborator Author

/hold cleanup

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2024
@irinamihai
Copy link
Copy Markdown
Collaborator Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2024
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a default

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 -- or going a bit further I think this is static content that likely doesn't need to be part of the defaults either.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been agreed that this will be locked in the new version of the ClusterLogForwarder, currently WIP under OCPBUGS-44518, so it will be removed from this ClusterTemplate.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a default

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, do you mean have them directly in the Policy Generator and not expose them in the policyTemplateParameters?

Copy link
Copy Markdown
Collaborator

@browsell browsell Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these would be set in the default configmap vs being passed in by the client

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As proposed above I think we should narrow this down to just the additional labels. One question I have is whether the user/orchestrator would add one or more labels which are cluster specific (like a higher level cluster identifier, etc)? In that case would the labels (or at least one label) need to be part of this schema?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filters are also going to be partially locked in in the ClusterLogForwarder source-cr under OCPBUGS-44518. Yes, these labels will be set in the ClusterInstance defaults ConfigMap, but we also need a way for them to reach the ConfigMap used by the ACM PGs, so they need to also be included in the policyTemplate defaults ConfigMap.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really filters, additional metadata labels

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to narrow the templating down to just the additional labels, ie the user configures only the value for openshiftLabels?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove, see comment above

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

Copy link
Copy Markdown
Member

@lack lack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, if this is something that will need to be kept in-sync with the cnf-features-deploy repo, perhaps it's worth engineering a way to automatically synchronize them or generate one from the other.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow customization of all of this? Or just the Kafka url?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url only

Comment on lines 144 to 148
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't override this section, but rely on the source-crs original value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
This is also missing the module_blacklist=irdma

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this section to keep the defaults from the source-cr. I kept it from a previous configuration, but I realized it's not matching the 4.17 configuration.

Comment on lines 160 to 163
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
machineConfigPoolSelector:
pools.operator.machineconfiguration.openshift.io/master: ""
nodeSelector:
node-role.kubernetes.io/master: ''
machineConfigPoolSelector:
$patch: replace
pools.operator.machineconfiguration.openshift.io/master: ""
nodeSelector:
$patch: replace
node-role.kubernetes.io/master: ''

And then we don't need the SetSelector cr variant any more.

(Repeat for *-SetSelector.yaml elsewhere in this file!)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine now since we also have support for openapi schemas!

Comment on lines 181 to 182
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't override these; the source-crs has the right values.

Comment on lines 250 to 274
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think any of this is needed any more since the SiteConfig added cpuPartitioningMode: AllNodes in 4.14

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be in the source cr ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The selector can't be, because depending on whether you're deploying SNO or MNO the source CR may need master or worker.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this model, the cluster template would only be used for SNO, a MNO would have a different one,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SNO we should be able to use either master or worker here since the node has both labels. If we use worker is it valid for all topologies?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to override the default profile? 🤔 The ztp git example is just using the default profile values.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to override the source cr here

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Comment thread docs/cluster-configuration.md Outdated
Comment thread docs/samples/git-setup/policytemplates/version_4.Y.Z/kustomization.yaml Outdated
Comment on lines 33 to 38
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, observability is not enabled. I think we should just use the default values from source-cr.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should enable observability.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have referred to the story IanM has pointed out (CNF-13398) and we have to merge our desired configuration for reducing the monitoring footprint with the default observability config so that our configuration does not get overridden.
I will include the new manifest in the following patch.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this DU profile based on the OCP 4.17? I wonder if adding a comment to mention that might be helpful.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use disconnected registry as example?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines 18 to 19
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just to showcase how to add extra annotations, but I think it can be removed from the full DU profile example and kept in our other examples.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference is now UEFISecureBoot.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For IBU there is a need for a separate partition for /var/lib/containers. Should this example set that up as well?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update to use the config from the SiteConfig 4.17 examples then.

Comment on lines 72 to 74
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an intent to include ports eth0 and eth1 in this bond, or other ports in it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to narrow the templating down to just the additional labels, ie the user configures only the value for openshiftLabels?

Comment on lines 215 to 217
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above we should use more fixed content in the source CR (ie be more opinionated) and allow the user to override the URL and labels.

Comment on lines 218 to 219
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already in source CR

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We needed to specify it if we use $patch: replace.
This whole manifest will be reworked anyway with the new ClusterLogForwarder source-cr.

Comment on lines 241 to 242
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not in the reference. Is their addition intentional?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing:
group.ice-dplls=0:f:10:*:ice-dplls.*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these we should not repeat any of the content already in the source CR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there is no content locked in the source-crs...
SriovNetworkNodePolicy, SriovNetwork.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is in its own policy? This creates more policies than are necessary. Consider combining with the next policy as "baseline config"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think this can be included in the v4-config-policy.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could sriov configs be included in the v4-config-policy? I don't see the reason why they couldn't be🤔 .

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the bond, bonding is going to be very rare for this use case

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2024
@irinamihai irinamihai force-pushed the full-du-profile branch 2 times, most recently from 4598a0b to 071bcc5 Compare December 5, 2024 00:15
@irinamihai
Copy link
Copy Markdown
Collaborator Author

/hold
Final cleanup

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2024
@irinamihai irinamihai force-pushed the full-du-profile branch 2 times, most recently from 56c7b70 to fa33b9e Compare December 5, 2024 18:17
@irinamihai
Copy link
Copy Markdown
Collaborator Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2024
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... how would this work if we had more than one cluster template that needed to enable observability? Wouldn't this result in a conflicting binding?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has to be a one time policy per namespace. We could have a directory called common-<namespace> under sno-ran-full-du and include there all the resources common for the ClusterTemplates in a certain namespace. WDYT, @bartwensley?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was that the ManagedClusterSetBinding refers to "open-cluster-management-observability" which doesn't include anything specific to the cluster template. So if you had two different cluster templates, and they each contained this same ManagedClusterSetBinding, wouldn't that be a conflict? So don't we need to include the cluster template name in this namespace? Apologies if I'm not explaining this well (or if it makes no sense).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley , yes, it makes sense. I've thought about it and I pushed a new patchset with a new approach.
We could use the following directory structure:

policytemplates/
    common/
        acm-pg-observability.yaml
        msc-observability.yaml
        source-cr-observability.yaml
    version_4.X.Y/
    version_4.X.Y+1/
   ...

In the new patchset, I've used object-templates-raw in the updated source-cr such that we range over the namespaces where we create ORAN policies. The only drawback is that we need to include those namespaces manually. The alternative would have been to have one ACM PG for each ztp-<clustertemplate-namespace>, so I think the current proposal is still cleaner.

I will check with ACM folks if there is any way for us to automatically get the namespaces and I can update that in a future PR.

What do you think?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Irina - looks good.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did this come from. We do not want all of these enabled on the managed cluster

Copy link
Copy Markdown
Contributor

@imiller0 imiller0 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These come from values set by ACM observability. On the hub we (typically) disable ACM's ability to write this configmap because of conflicting settings for this "yaml in a string" value, but we merged our content with what ACM sets. The collectors entries below each have a boolean enable/disable value. With this setting only netclass and netdev are enabled which matches the state when the configmap is removed entirely.

For reference, the node-exporter pod command line looks like this with the nodeExporter config in this PR.

    - --no-collector.wifi
    - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
    - --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
    - --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
    - --collector.cpu.info
    - --collector.textfile.directory=/var/node_exporter/textfile
    - --no-collector.btrfs
    - --runtime.gomaxprocs=0
    - --no-collector.cpufreq
    - --no-collector.tcpstat
    - --collector.netdev
    - --collector.netclass
    - --collector.netclass.netlink
    - --no-collector.buddyinfo
    - --no-collector.mountstats
    - --no-collector.ksmd
    - --no-collector.processes
    - --no-collector.systemd

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the path right? I note that the CR is in sno-ran-full-du-profile directory.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh-oh... I actually renamed the directories to be sno-ran-full-du. Thank you.

@irinamihai
Copy link
Copy Markdown
Collaborator Author

/retest-required

@bartwensley
Copy link
Copy Markdown
Collaborator

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024
@browsell
Copy link
Copy Markdown
Collaborator

/approve

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Dec 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: browsell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 19, 2024
@openshift-merge-bot openshift-merge-bot Bot merged commit 7c9a207 into openshift-kni:main Dec 19, 2024
@irinamihai irinamihai deleted the full-du-profile branch January 9, 2025 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants