Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK2: Ensure service endpoints track2 #3885

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bitoku
Copy link
Collaborator

@bitoku bitoku commented Oct 4, 2024

Which issue this PR addresses:

Fixes https://issues.redhat.com/browse/ARO-4665 and https://issues.redhat.com/browse/ARO-7316

What this PR does / why we need it:

This PR replaces old SDK to the new one.

Test plan for issue:

ci, e2e

Is there any documentation that needs to be updated for this PR?

no

How do you know this will function as expected in production?

Creating a cluster can verify the change.
It can be checked in e2e.

@tsatam
Copy link
Collaborator

tsatam commented Oct 4, 2024

/azp run ci,e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

kimorris27
kimorris27 previously approved these changes Oct 8, 2024
Copy link
Contributor

@kimorris27 kimorris27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for a tiny nit.

pkg/cluster/ensureendpoints_test.go Outdated Show resolved Hide resolved
@kimorris27
Copy link
Contributor

One question though: how does this address the cluster install failure?

@bitoku
Copy link
Collaborator Author

bitoku commented Oct 8, 2024

@kimorris27

We have seen some install failures about authorization in this step.
It should be caused by the AAD propagation delay.

We can wrap the step with AuthorizationRetryingAction, but it will retry the whole step again.
Because we don't make steps idempotent, retrying the whole step might cause different issues.

By updating the old SDK to the new one, we can use built-in retry strategy in the new SDK.
This will retry just the failed API, so it's a safer way than AuthorizationRetryingAction.

Aside from the CIF, we should update old SDKs anyway.

@bitoku
Copy link
Collaborator Author

bitoku commented Oct 8, 2024

/azp run

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

kimorris27
kimorris27 previously approved these changes Oct 8, 2024
Copy link
Contributor

@kimorris27 kimorris27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimorris27

We have seen some install failures about authorization in this step. It should be caused by the AAD propagation delay.

We can wrap the step with AuthorizationRetryingAction, but it will retry the whole step again. Because we don't make steps idempotent, retrying the whole step might cause different issues.

By updating the old SDK to the new one, we can use built-in retry strategy in the new SDK. This will retry just the failed API, so it's a safer way than AuthorizationRetryingAction.

Aside from the CIF, we should update old SDKs anyway.

Oh, cool. I didn't realize the new SDK clients had that built-in retry strategy. That makes sense!

Copy link

Please rebase pull request.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Oct 10, 2024
…nts-track2

# Conflicts:
#	pkg/cluster/cluster.go
@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Oct 11, 2024
@bitoku
Copy link
Collaborator Author

bitoku commented Oct 11, 2024

/azp run

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@bitoku
Copy link
Collaborator Author

bitoku commented Oct 11, 2024

/azp run e2e

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

@dkeohane dkeohane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a quick review, didn't look through unit tests added.

userAssignedIdentities armmsi.UserAssignedIdentitiesClient

dns dns.Manager
storage storage.Manager
subnet subnet.Manager
subnet subnet.Manager // TODO: use armSubnet instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO should be removed or carded up?

for _, endpoint := range endpoints {
endpointFound, serviceEndpointPtr := subnetContainsEndpoint(subnet, endpoint)

if !endpointFound || *serviceEndpointPtr.ProvisioningState != armnetwork.ProvisioningStateSucceeded {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ProvisioningState ever be in an error state? We wouldn't want to add it then I'm guessing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants