Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add worker profile status #3053

Merged
merged 17 commits into from
Sep 5, 2023
Merged

Conversation

gouthamMN
Copy link
Contributor

@gouthamMN gouthamMN commented Jul 21, 2023

Which issue this PR addresses:

Fixes https://issues.redhat.com/browse/ARO-2124

What this PR does / why we need it:

OpenShiftCluster GET API returns enriched worker profile data under a new field "workerProfilesStatus". This change is introduced only to new API. More details can be found on Design Doc

Any API call made to endpoints /openShiftClusters or /openShiftCluster with GET method using newer API versions (>= v2023-09-04) will return the enriched worker profile data under a new field "workerProfilesStatus" in the response and "workerProfiles" filed will contain the last input that was passed in PUT/PATCH request body for cluster creation or update.

The response and behavior of any older APIs versions (< v2023-09-04) remain unchanged i.e, only "workerProfiles" filed exists in the response.

Test plan for issue:

Existing unit tests and E2E test copied over from previous API.
As suggested in the comments, performed the Regression Tests:

  1. Created a ARO clustered from az cli **az aro create**.
  2. Performed ARO cluster update **az aro update**.

From the test, it was confirmed that there were no changes in the behavior of the existing stable API (v2023-04-01). The new API (v2023-09-04) cannot be tested at this point as the az cli is not updated to use it.

Is there any documentation that needs to be updated for this PR?

N/A

weinong
weinong previously approved these changes Jul 21, 2023
Copy link
Contributor

@weinong weinong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lranjbar
Copy link
Contributor

The code LGTM. Let's identify the code using these API endpoints and check if they need updates in a subsequent PR.

Copy link
Collaborator

@bennerv bennerv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing updating any usage to point to the new workerProfilesStatus.

Anywhere we use kubeSubnets or the workerProfiles field internally may be subject to breakage or not complete / expected functionality. We should explore them, as the PRs should be merged together to prevent any breaking changes by introducing this change.

@@ -141,6 +141,9 @@ func (tw *typeWalker) schemaFromType(t types.Type, deps map[*types.Named]struct{

properties := tw.schemaFromType(field.Type(), deps)
properties.Description = strings.Trim(node.Doc.Text(), "\n")
if field.Name() == "WorkerProfilesStatus" {
properties.ReadOnly = true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does ARM prevent the user passing in values for this if we set the swagger specification to ReadOnly? If not, we should do a server-side check to ensure the user can't actually patch this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated openshiftcluster_validatestatic.go to validate workerProfilesStatus filed is nil in the input request. This gets called during preflightvalidation.

Comment on lines +54 to +62
workerProfiles := oc.Properties.WorkerProfiles

// Use enriched worker profile data when available
if oc.Properties.WorkerProfilesStatus != nil {
workerProfiles = oc.Properties.WorkerProfilesStatus
}

out.Properties.WorkerProfiles = make([]WorkerProfile, 0, len(workerProfiles))
for _, p := range workerProfiles {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this generate new examples for the older API?

Copy link
Contributor Author

@gouthamMN gouthamMN Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, because under openshiftcluster_example.go package we set the WorkerProfilesStatus field to nil before calling the converter function (ToExternal). This will ensure that the swagger examples remain unchanged.

Older APIs swagger examples never showed the enriched worker profile data under workerProfiles field. So I decided to keep it same.

@@ -64,6 +64,9 @@ type OpenShiftClusterProperties struct {
// The cluster worker profiles.
WorkerProfiles []WorkerProfile `json:"workerProfiles,omitempty"`

// The cluster worker profiles status.
WorkerProfilesStatus []WorkerProfile `json:"workerProfilesStatus,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering if we should make a new type for this. In the update code, we never actually set some of the parameters that exist within a workerProfile struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkerProfilesStatus - I see all the field are set/referenced under worker_profile.go package while performing the enrichment.

pkg/frontend/openshiftcluster_putorpatch.go Show resolved Hide resolved
pkg/util/clusterdata/worker_profile.go Outdated Show resolved Hide resolved
@gouthamMN
Copy link
Contributor Author

I think we're missing updating any usage to point to the new workerProfilesStatus.

Anywhere we use kubeSubnets or the workerProfiles field internally may be subject to breakage or not complete / expected functionality. We should explore them, as the PRs should be merged together to prevent any breaking changes by introducing this change.

These are the list of ARO-RP pkgs that had reference to workerProfiles filed and I had updated whichever needs to be updated to workerProfilesStatus. Let me know if you think I'm missing anything.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Jul 28, 2023
@github-actions
Copy link

Please rebase pull request.

@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Jul 28, 2023
@gouthamMN gouthamMN removed the LGTM Looks Good To Me label Jul 28, 2023
@gouthamMN gouthamMN added the LGTM Looks Good To Me label Aug 7, 2023
Copy link
Contributor

@s-amann s-amann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to at least rethink the conversion logic so it can account for nil workerProfileStatus before workerProfiles since the enrichment in this PR will never populate workerProfiles. See comments below.

@@ -102,10 +102,10 @@ func (ce machineClientEnricher) Enrich(
oc.Lock.Lock()
defer oc.Lock.Unlock()

oc.Properties.WorkerProfiles = workerProfiles
Copy link
Contributor

@s-amann s-amann Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after further thinking, I believe this will break our admin api. Since the doc will not have workerProfiles set, when the conversion is made, workerProfiles is nil and so the output will not have workerProfiles since we first check for workerProfiles and then check for workerProfileStatus. Any automation in place which relies on the workerProfiles via admin API would need to be changed.

I think an alternative approach, would be to leave the doc the same (always have it be enriched via workerProfiles) and for new API versions we can do the conversion.

Copy link
Contributor Author

@gouthamMN gouthamMN Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated workerProfiles references as suggested by Ben.

@gouthamMN gouthamMN removed the LGTM Looks Good To Me label Aug 8, 2023
@bennerv
Copy link
Collaborator

bennerv commented Aug 8, 2023

I haven't had the chance to review the list of places it's been used yet, but we need to be careful this change doesn't break existing functionality.

Think this:

  1. Essentially this change makes the workerProfile immutable. the customer sets it on creation, but moving forward, they actually can't update the field and the RP does anything about it. Probably not the best behavior, but the reality
  2. The workerProfilesStatus field represents the in-cluster state of the world. We use this in multiple places: to gather all subnets and ensure we have the proper RBAC on the vnets.
  3. When a customer performs day2 operations, they can technically bring worker nodes from a completely different subnet
  4. Because of 3, we need to update our usage everywhere workerProfiles is used. We always want to reconcile or use the state of the in-cluster machines/machinesets, otherwise this change may have undesired consequences.

I would say generally we can proceed by:

  1. Anywhere workerProfiles is used, change it to workerProfilesStatus if workerProfilesStatus != nil || len(workerProfilesStatus) > 0. Otherwise use workerProfiles.
  2. When we update the azure-cli to use the latest API version, we need to take the intersection of workerProfiles and workerProfilesStatus to ensure we have the proper FPSP and CSP RBAC to work.

@gouthamMN
Copy link
Contributor Author

gouthamMN commented Aug 10, 2023

I haven't had the chance to review the list of places it's been used yet, but we need to be careful this change doesn't break existing functionality.

Think this:

  1. Essentially this change makes the workerProfile immutable. the customer sets it on creation, but moving forward, they actually can't update the field and the RP does anything about it. Probably not the best behavior, but the reality
  2. The workerProfilesStatus field represents the in-cluster state of the world. We use this in multiple places: to gather all subnets and ensure we have the proper RBAC on the vnets.
  3. When a customer performs day2 operations, they can technically bring worker nodes from a completely different subnet
  4. Because of 3, we need to update our usage everywhere workerProfiles is used. We always want to reconcile or use the state of the in-cluster machines/machinesets, otherwise this change may have undesired consequences.

I would say generally we can proceed by:

  1. Anywhere workerProfiles is used, change it to workerProfilesStatus if workerProfilesStatus != nil || len(workerProfilesStatus) > 0. Otherwise use workerProfiles.
  2. When we update the azure-cli to use the latest API version, we need to take the intersection of workerProfiles and workerProfilesStatus to ensure we have the proper FPSP and CSP RBAC to work.

made changes to the workerProfiles references as suggested.

@gouthamMN gouthamMN added the go Pull requests that update Go code label Aug 11, 2023
@gouthamMN gouthamMN removed the size-medium Size medium label Aug 17, 2023
@s-amann
Copy link
Contributor

s-amann commented Aug 21, 2023

/azp run e2e

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bennerv bennerv added the next-release To be included in the next RP release rollout label Aug 21, 2023
Copy link
Member

@petrkotas petrkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gouthamMN thank you for the PR. I believe it should work as is, but this has potential to go in regression. Meaning, I approve, but we need at least one more approve before merging. @bennerv ?

@gouthamMN
Copy link
Contributor Author

gouthamMN commented Aug 22, 2023

@bennerv - I have added JIRA item https://issues.redhat.com/browse/ARO-4016 to work on updating the azure-cli to take the intersection of workerProfiles and workerProfilesStatus to ensure we have the proper FPSP and CSP RBAC to work. Once this PR is merged I will update the azure-cli.

Copy link
Contributor

@s-amann s-amann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, considering the risk of regression and the importance of workerprofiles, can we deploy this branch to INT, create a cluster and run an az aro update on it, PUCM it, and see what the output of the GetCluster GA is?

@gouthamMN
Copy link
Contributor Author

gouthamMN commented Aug 28, 2023

lgtm, considering the risk of regression and the importance of workerprofiles, can we deploy this branch to INT, create a cluster and run an az aro update on it, PUCM it, and see what the output of the GetCluster GA is?

As suggested, performed the Regression Tests:

  1. Created a ARO clustered from az cli **az aro create**.
  2. Performed ARO cluster update **az aro update**.

From the test it was confirmed there were no changes in the behavior of the existing stable API (v2023-04-01). The new API (v2023-09-04) cannot be tested at this point as the az cli is not updated to use it.

@s-amann s-amann requested a review from bennerv September 5, 2023 17:15
@s-amann s-amann dismissed bennerv’s stale review September 5, 2023 18:48

stale, appears concerns have been addressed

@s-amann s-amann merged commit dba48c1 into Azure:master Sep 5, 2023
18 checks passed
SrinivasAtmakuri pushed a commit to SrinivasAtmakuri/ARO-RP that referenced this pull request Sep 18, 2023
* add workerProfilesStatus field to hold the enriched worker profile data

* update swagger

* swagger examples

* update clients

* resolve golint

* update defaults

* validate worker Profile status is nil in input request

* make client changes after rebase

* rebase

* update workerProfiles references and UTs

* fix golint errors

* remove duplicate logic of verifing workerProfilesStatus not nil

---------

Co-authored-by: gniranjan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Pull requests that update Go code next-release To be included in the next RP release rollout ready-for-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants