API-1835: Migrate WorkloadController to SSA #1830

Open
wants to merge 4 commits into base: master

Member

@bertinatto bertinatto commented Oct 11, 2024

Proof: openshift/cluster-authentication-operator#715.

Here's a verb usage comparison between e2e-aws-single-node jobs of the PR above and openshift/cluster-authentication-operator#705.

[Figure: verb_usage_comparison_auth]

/assign @deads2k @p0lyn0mial

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Oct 11, 2024
Contributor

openshift-ci bot commented Oct 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Oct 11, 2024
@bertinatto bertinatto changed the title from "WIP: Migrate WorkloadController to SSA" to "API-1835: Migrate WorkloadController to SSA" Oct 11, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Oct 11, 2024
openshift-ci-robot commented Oct 11, 2024

@bertinatto: This pull request references API-1835 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.18.0" version, but no target version was set.

In response to this:

Proof: openshift/cluster-authentication-operator#715

/assign @deads2k @p0lyn0mial

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bertinatto
Member Author

/hold
for results from proof: openshift/cluster-authentication-operator#715

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) and removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Oct 11, 2024
@bertinatto
Member Author

/retest

@bertinatto
Member Author

/hold for results from proof: openshift/cluster-authentication-operator#715

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Oct 12, 2024
@bertinatto bertinatto force-pushed the try-apply-workload branch 2 times, most recently from 99e8e37 to 9df235d, October 12, 2024 15:02
Type: fmt.Sprintf("%sWorkloadDegraded", defaultControllerName),
Status: operatorv1.ConditionTrue,
Reason: "NoDeployment",
Message: "deployment/: could not be retrieved",
Member Author

@deads2k @p0lyn0mial this is interesting. Forcing every branch to set the conditions revealed an underlying issue. IMO this condition was set incorrectly, but please take a look.

Contributor

It looks like you were right: previously, this condition was not set when the workload was missing. Setting it doesn't break anything.
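
For context on why every branch now has to set the conditions: with server-side apply, an entry that a field manager applied earlier and then leaves out of a later apply request is given up by that manager and, if nobody else owns it, removed. The sketch below is only a toy model of that behavior, not the API server's merge logic. The `Condition` and `Store` types, the manager name, and the condition type strings are all hypothetical.

```go
package main

// Condition is a simplified, hypothetical stand-in for an operator condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// Store is a toy model of a status.conditions list plus per-entry ownership.
// It is NOT how the API server tracks managed fields.
type Store struct {
	conds map[string]Condition // keyed by condition type
	owner map[string]string    // condition type -> field manager
}

// NewStore returns an empty toy store.
func NewStore() *Store {
	return &Store{conds: map[string]Condition{}, owner: map[string]string{}}
}

// Apply mimics the part of server-side apply that matters here: entries the
// manager sends are upserted, and entries the manager owned before but left
// out of this request are removed.
func (s *Store) Apply(manager string, applied []Condition) {
	sent := make(map[string]bool, len(applied))
	for _, c := range applied {
		s.conds[c.Type] = c
		s.owner[c.Type] = manager
		sent[c.Type] = true
	}
	for condType, m := range s.owner {
		if m == manager && !sent[condType] {
			delete(s.conds, condType)
			delete(s.owner, condType)
		}
	}
}

// Has reports whether a condition of the given type is currently stored.
func (s *Store) Has(condType string) bool {
	_, ok := s.conds[condType]
	return ok
}
```

In this model, a code path that applies only some of the conditions silently drops the rest, which is why a branch that never set the WorkloadDegraded condition before has to set it explicitly now.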

@@ -105,8 +107,8 @@ func TestUpdateOperatorStatus(t *testing.T) {
 	{
 		Type:    fmt.Sprintf("%sWorkloadDegraded", defaultControllerName),
 		Status:  operatorv1.ConditionTrue,
 		Message: "nasty error\n",
-		Reason:  "SyncError",
+		Reason:  "NoDeployment",
Contributor

We shouldn't overwrite SyncError if one occurred; these errors are important. Does that make sense?

Member Author

Good point. I added a commit to change that, PTAL.
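
One way the precedence discussed in this thread could look is sketched below: a sync error always wins, and NoDeployment is reported only when there is no sync error. This is an illustration under assumptions, not the commit that was added to the PR. The `Condition` type, the `workloadDegraded` helper, and the "AsExpected" reason for the healthy case are hypothetical; the reasons "SyncError" and "NoDeployment" and both messages are taken from the test fixtures quoted above.

```go
package main

import (
	"errors"
	"strings"
)

// Condition is a simplified, hypothetical stand-in for an operator condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// errNasty mirrors the "nasty error" fixture quoted in the test diff.
var errNasty = errors.New("nasty error")

// workloadDegraded is a hypothetical helper. It always returns a condition,
// so every code path sets WorkloadDegraded, and it gives sync errors
// priority over the missing-workload case.
func workloadDegraded(controllerName string, workloadFound bool, syncErrs []error) Condition {
	cond := Condition{
		Type:   controllerName + "WorkloadDegraded",
		Status: "False",
		Reason: "AsExpected", // assumed reason for the healthy case
	}
	switch {
	case len(syncErrs) > 0:
		// Sync errors are never overwritten by a less specific reason.
		var msg strings.Builder
		for _, err := range syncErrs {
			msg.WriteString(err.Error())
			msg.WriteString("\n")
		}
		cond.Status, cond.Reason, cond.Message = "True", "SyncError", msg.String()
	case !workloadFound:
		cond.Status, cond.Reason = "True", "NoDeployment"
		cond.Message = "deployment/: could not be retrieved"
	}
	return cond
}
```

Calling `workloadDegraded("Default", false, []error{errNasty})` reports SyncError even though the workload is missing, while the same call with no errors reports NoDeployment.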

pkg/operator/apiserver/controller/workload/workload.go (outdated)

workloadDegradedCondition = workloadDegradedCondition.
WithStatus(operatorv1.ConditionTrue).
WithReason("PreconditionNotFulfilled").
Contributor

We don't know whether the workload is degraded, but because we now always have to set all conditions, we need to do something about it.

Maybe we should set the status to Unknown?

Member Author

IMO a pre-condition is a requirement, and if we know a requirement isn't met, we can be certain the workload won't be available. Maybe an older workload (with different pre-conditions) is available, but not the workload tied to this version. So I think at least Available and Progressing could be false and true, respectively.

TBH it feels like it should be Degraded too, but I don't have a strong opinion.
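
To make the two positions concrete, here is a minimal sketch of the condition set for the unmet-precondition branch. Available is false and Progressing is true, as proposed above, while the Degraded status is a parameter because the thread does not settle between True and Unknown. The `Condition` type, the `preconditionNotFulfilled` helper, and the "DeploymentAvailable" and "DeploymentProgressing" type suffixes are assumptions made for illustration; only the "PreconditionNotFulfilled" reason and the "WorkloadDegraded" suffix come from the diff.

```go
package main

// Condition is a simplified, hypothetical stand-in for an operator condition.
type Condition struct {
	Type, Status, Reason string
}

// preconditionNotFulfilled returns the full condition set for the branch where
// a precondition is not met. degradedStatus is chosen by the caller ("True" or
// "Unknown") because the review thread leaves that question open.
func preconditionNotFulfilled(controllerName, degradedStatus string) []Condition {
	const reason = "PreconditionNotFulfilled"
	return []Condition{
		// The workload for this version cannot be available yet.
		{Type: controllerName + "DeploymentAvailable", Status: "False", Reason: reason},
		// The controller is still working towards the desired state.
		{Type: controllerName + "DeploymentProgressing", Status: "True", Reason: reason},
		// Open question in the thread: True or Unknown.
		{Type: controllerName + "WorkloadDegraded", Status: degradedStatus, Reason: reason},
	}
}
```

Whichever status is picked, the point is that this branch returns every condition the controller owns, so none of them disappears on apply.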

@@ -268,18 +289,21 @@ func (c *Controller) updateOperatorStatus(ctx context.Context, previousStatus *o
 	// If the workload is up to date, then we are no longer progressing
 	workloadAtHighestGeneration := workload.ObjectMeta.Generation == workload.Status.ObservedGeneration
 	workloadIsBeingUpdated := workload.Status.UpdatedReplicas < desiredReplicas
-	workloadIsBeingUpdatedTooLong, err := isUpdatingTooLong(previousStatus, deploymentProgressingCondition.Type)
+	workloadIsBeingUpdatedTooLong, err := isUpdatingTooLong(previousStatus, *deploymentProgressingCondition.Type)
Contributor

I think that *deploymentProgressingCondition.Type will always be set.
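
The dereference in the diff follows from how generated apply configuration types are shaped: their fields are pointers, so an unset field can be left out of the request entirely. The sketch below imitates that shape with a hypothetical `ConditionApply` type and `With...` builders. It is not the generated OpenShift type, and `typeOrEmpty` is an illustrative helper for callers that cannot rely on the field being set.

```go
package main

// ConditionApply imitates the shape of a generated apply configuration:
// every field is a pointer so that "unset" differs from the zero value.
// This type is a hypothetical stand-in, not the generated OpenShift type.
type ConditionApply struct {
	Type   *string
	Status *string
}

// NewConditionApply returns an empty apply configuration.
func NewConditionApply() *ConditionApply { return &ConditionApply{} }

// WithType sets the Type field and returns the receiver for chaining.
func (c *ConditionApply) WithType(t string) *ConditionApply {
	c.Type = &t
	return c
}

// WithStatus sets the Status field and returns the receiver for chaining.
func (c *ConditionApply) WithStatus(s string) *ConditionApply {
	c.Status = &s
	return c
}

// typeOrEmpty dereferences Type defensively. A direct *c.Type, as in the diff,
// is fine when the value is always built with WithType, as noted above.
func typeOrEmpty(c *ConditionApply) string {
	if c == nil || c.Type == nil {
		return ""
	}
	return *c.Type
}
```

If the condition is always constructed with its type, as the comment above suggests, the plain dereference cannot panic and the defensive helper is unnecessary.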

WithResource("deployments").
WithNamespace(workload.Namespace).
WithName(workload.Name).
WithLastGeneration(workload.Generation),
Contributor

Not setting the generation will set it to 0, right?

Member Author

Yes, it should be set to the zero value.
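
The zero-value behavior can be seen with plain `encoding/json`: a nil pointer field tagged `omitempty` is left out of the serialized request, and a typed struct decoded from that payload ends up with 0. The sketch uses simplified, hypothetical stand-ins (`generationApply`, `generationStatus`, `roundTrip`, `strPtr`) rather than the real generation status types, and the resource name "apiserver" is made up.

```go
package main

import "encoding/json"

// generationApply imitates an apply configuration: pointer fields with
// omitempty, so unset fields are absent from the request body.
type generationApply struct {
	Resource       *string `json:"resource,omitempty"`
	Name           *string `json:"name,omitempty"`
	LastGeneration *int64  `json:"lastGeneration,omitempty"`
}

// generationStatus imitates the typed object the payload is decoded into.
type generationStatus struct {
	Resource       string `json:"resource"`
	Name           string `json:"name"`
	LastGeneration int64  `json:"lastGeneration"`
}

// roundTrip serializes the apply configuration and decodes it into the typed
// struct, returning both the raw JSON and the decoded value.
func roundTrip(a generationApply) (string, generationStatus, error) {
	raw, err := json.Marshal(a)
	if err != nil {
		return "", generationStatus{}, err
	}
	var out generationStatus
	if err := json.Unmarshal(raw, &out); err != nil {
		return "", generationStatus{}, err
	}
	return string(raw), out, nil
}

// strPtr is a small helper for building pointer fields.
func strPtr(s string) *string { return &s }
```

With `LastGeneration` left nil, the serialized payload carries no `lastGeneration` key and the decoded struct reports `LastGeneration == 0`.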

@bertinatto bertinatto force-pushed the try-apply-workload branch 2 times, most recently from cc634a2 to 1798c3a, October 14, 2024 18:44
Contributor

openshift-ci bot commented Oct 14, 2024

@bertinatto: all tests passed!

