-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] MCO-1437: MCO-1476: MCO-1477: MCO-1284: Adapt MCO to OCL v1 API #4756
base: master
Are you sure you want to change the base?
Conversation
@djoshy: This pull request references MCO-1437 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -81,6 +80,7 @@ func TestMachineOSBuild(t *testing.T) { | |||
OSImageURLConfig: fixtures.OSImageURLConfig(), | |||
}, | |||
}, | |||
/* QOCL: Are these test cases valid anymore? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 6 test cases here that seem to exercise fields that have been removed: BaseOSExtensionsImagePullspe
, BaseOSImagePullspec
and ReleaseVersion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we still need those fields in order to produce a hashed name for a MachineOSBuild. The difference is that we only have a single canonical source for these values with the new API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood and I guess my question here is that, since these fields no longer exist and can cause a "different" hash for a build, what these tests are trying to verify by manipulating these fields and examining the result hash can't be done anymore?
ctx, cancel := context.WithCancel(context.Background()) | ||
t.Cleanup(cancel) | ||
/* | ||
QOCL: Need a new failure mode for this as BaseImagePullspec field doesn't exist anymore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the removal of BaseImagePullSpec
, I'm not sure the test can be initiated in the same way here, so I'm open to ideas here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do a Containerfile with invalid syntax instead. It will just take a bit longer to produce the failure.
@@ -572,7 +577,8 @@ func TestMCDGetsMachineOSConfigSecrets(t *testing.T) { | |||
}) | |||
|
|||
// Assign the secret name to the MachineOSConfig. | |||
mosc.Spec.BuildOutputs.CurrentImagePullSecret.Name = secretName | |||
// QOCL: This field doesn't exist anymore, is this test valid? | |||
//mosc.Spec.CurrentImagePullSecret.Name = secretName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another scenario where I believe the test currently depends on this field to run correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole test can be dropped in favor of a test that just validates whether the global image pull secret is written to the nodes' filesystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we already do this as part of the internal registry secret cert writer path, where the global pull secret + internal registry secret is written to disk at /etc/mco/internal-registry-pull-secret.json
: 2d77dcf
pkg/operator/render_test.go
Outdated
@@ -306,6 +305,7 @@ func TestRenderAsset(t *testing.T) { | |||
}, | |||
}, | |||
// Tests that the MCD DaemonSet gets MachineOSConfig secrets mounted into it. | |||
/*QOCL: Another instance of what are we doing with CurrentImagePullSecret? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test that depends on CurrentImagePullSecret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify things, we can probably remove this test while also removing the volume mounts from the MCD manifest since this is no longer needed. We'd also need to modify the code within the MCD to write the global pull secret to the nodes filesystem, although I think we're already doing that and just need to tell rpm-ostree to use that secret instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're talking about the internal registry path I mentioned here and I think we already have something that directs rpm-ostree to use the pull secret if that file exists. Correct me if I'm missing something here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pulled out the code specific to the secret mounting and split it off into a separate commit. I've also pulled out all related tests for this particular path - the render unit test mentioned in this thread and the e2e test mentioned here #4756 (comment)
I do agree that it is a good idea to add a test to ensure that global pull secret is written to disk. I think we might have some units for this path on the operator and the daemon level already, but not an e2e - I could be wrong though. Do you think we should add that work to this PR, or open up a new card for it?
6064540
to
a6f8a0d
Compare
/test unit |
a6f8a0d
to
f90eea9
Compare
/test unit |
f90eea9
to
5f33472
Compare
/test unit |
5f33472
to
4f64dc4
Compare
4f64dc4
to
b379724
Compare
Job: &mcfgv1.ObjectReference{ | ||
Name: obj.GetName(), | ||
Group: obj.GroupVersionKind().Group, | ||
Group: "tbd-group", | ||
Namespace: obj.GetNamespace(), | ||
Resource: obj.GetResourceVersion(), | ||
Resource: "tbd-resource", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These values now fail API validation. The API requires that all these fields should validate against format.dns1123Subdomain()
. In my testing, Group
was being assigned an empty value and Resource
was being assigned ResourceVersion
which is normally a number. Open to suggestions here!
// QOCL: Do we want to fatally exit here? | ||
globalPullSecret, err := optr.ocSecretLister.Secrets(ctrlcommon.OpenshiftConfigNamespace).Get("pull-secret") | ||
if err != nil { | ||
return fmt.Errorf("error fetching cluster pull secret: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fatally exit here? If this secret is missing in cluster, the build will certainly fail, I'd think.
// If it does exist, check if an update is required before making the update call. | ||
if !reflect.DeepEqual(currentSecretCopy.Data, globalPullSecret.Data) { | ||
klog.Infof("updating %s", ctrlcommon.GlobalPullSecretCopyName) | ||
_, err := optr.kubeClient.CoreV1().Secrets(ctrlcommon.MCONamespace).Update(context.TODO(), globalPullSecretCopy, metav1.UpdateOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this update mechanism is necessary, I don't know if the global secret will ever get updated after installation?
|
||
/* | ||
QOCL: This test seems to pass individually, but fails when run with other unit tests. Other tests in this package | ||
// seems to exhibit similar failures but at a lower rate. Possible object conflicts/worth exploring as a new bug? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seemed to be conflicting with other tests for some reason, would appreciate some pointers if I'm missing something!
OwnerReferences: []metav1.OwnerReference{ | ||
{ | ||
APIVersion: currentPool.APIVersion, | ||
Kind: currentPool.Kind, | ||
Name: currentPool.ObjectMeta.Name, | ||
UID: currentPool.ObjectMeta.UID, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a minor fix for a particular scenario I encountered during e2es: If the pool was deleted prior to being opted out of layering, this sync function wouldn't delete it. Adding the MCP as an owner field seemed like the cleanest fix.
// This test starts a build with an image that is known to fail because it uses | ||
// an invalid containerfile. After failure, it edits the MachineOSConfig | ||
// with the expectation that the failed build and its will be deleted and a new | ||
// build will start in its place. | ||
func TestGracefulBuildFailureRecovery(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has been updated to use a bad container file as suggested in #4756 (comment). It now takes about 17 minutes to run locally, not sure how fast it used to be prior to that!
imagestreamObjMeta := metav1.ObjectMeta{ | ||
Name: "os-image", | ||
Namespace: strings.ToLower(t.Name()), | ||
Name: "os-image", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was necessary because we are no longer able to inject a custom pull secret. By placing the image in the MCO's slice of the registry, the daemon will be able to handle this update with existing secrets.
This PR has been rebased on the latest forks of the API and client-go from Jerry, so it is fairly close to the shape of the final API. I've left comments above so hopefully we can remember where we left off before the break 🎄 Since we can't run the e2es until the v1 CRDs have landed, I took the liberty of testing the e2es locally and we seemed to be only failing on |
/test unit |
@djoshy: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
[DNM] Testing current state of v1 API