🌱 test: e2e: make managed suite more robust to errors with Eventually() #5215
base: main
Conversation
/assign @richardcase @nrb
/test ?
@damdo: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-cluster-api-provider-aws-e2e-eks
Failure is unrelated (due to the AWS CloudFormation stack).
/test pull-cluster-api-provider-aws-e2e-eks
Force-pushed from f860b49 to acd3ed3
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-cluster-api-provider-aws-e2e-eks
2 similar comments
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-test
Force-pushed from acd3ed3 to 7d6b8b6
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
2 similar comments
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
The AWS CloudFormation stack timed out.
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
@richardcase do you have any idea why waiting for addons fails so often?
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
@@ -149,7 +149,7 @@ intervals:
default/wait-machine-status: ["20m", "10s"]
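For context on how an intervals entry like the one above is typically consumed: in cluster-api style e2e suites the ["20m", "10s"] pair is read with the clusterctl E2EConfig helper and expanded into Gomega's Eventually as the timeout and polling interval. The sketch below is illustrative only; the helper name and wiring are assumptions, not code from this PR.

package managed_test

import (
	"context"

	. "github.com/onsi/gomega"
	"sigs.k8s.io/cluster-api/test/framework/clusterctl"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForObject is a hypothetical helper (not from this PR) showing how an
// intervals entry such as default/wait-machine-status: ["20m", "10s"] flows
// into an assertion: GetIntervals returns the pair, and Gomega parses the two
// strings as the timeout and the polling interval respectively.
func waitForObject(ctx context.Context, cfg *clusterctl.E2EConfig, c crclient.Client, key crclient.ObjectKey, obj crclient.Object) {
	Eventually(func() error {
		return c.Get(ctx, key, obj)
	}, cfg.GetIntervals("default", "wait-machine-status")...).Should(Succeed())
}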
I looked into the gathered artifacts and found this:
- lastTransitionTime: "2024-11-21T20:06:09Z"
message: |-
addon_update: updating eks addon coredns: ResourceInUseException: Addon coredns cannot be updated as it is currently in UPDATING state
{
RespMetadata: {
StatusCode: 409,
RequestID: "776223f5-f6d1-4a1b-83bf-6455dbfc09f6"
},
AddonName: "coredns",
ClusterName: "eks-nodes-kapm37_eks-nodes-8g7yso-control-plane",
Message_: "Addon coredns cannot be updated as it is currently in UPDATING state"
}
So I see it's updating CoreDNS.
In this file, it's set to v1.11.1-eksbuild.8.
Amazon lists the CoreDNS versions for a given Kubernetes version here: https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html#coredns-add-on-update
For Kubernetes 1.30, it should be v1.11.3-eksbuild.2.
I'm going to add a commit to this PR to see if bumping the CoreDNS version helps.
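As an aside, the ResourceInUseException above is exactly the kind of transient state a polled check can ride out: instead of asserting the addon state once, keep re-reading it until it settles or the timeout elapses. The sketch below is not the change in this PR; it is a minimal illustration assuming the aws-sdk-go v1 EKS client (consistent with the error dump above), and the helper name and intervals are made up.

package managed_test

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/eks"
	"github.com/aws/aws-sdk-go/service/eks/eksiface"
	. "github.com/onsi/gomega"
)

// waitForAddonActive (hypothetical, not from this PR) keeps re-reading the
// addon until it leaves transient states such as UPDATING, so a 409
// ResourceInUseException becomes a retry instead of a hard failure.
func waitForAddonActive(client eksiface.EKSAPI, clusterName, addonName string) {
	Eventually(func() error {
		out, err := client.DescribeAddon(&eks.DescribeAddonInput{
			ClusterName: aws.String(clusterName),
			AddonName:   aws.String(addonName),
		})
		if err != nil {
			return err
		}
		if status := aws.StringValue(out.Addon.Status); status != eks.AddonStatusActive {
			return fmt.Errorf("addon %s is %s, want %s", addonName, status, eks.AddonStatusActive)
		}
		return nil
	}, "20m", "30s").Should(Succeed(), "eventually failed waiting for addon to become ACTIVE")
}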
Looks like this didn't help as I was hoping. For historical reference, https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5215/pull-cluster-api-provider-aws-e2e-eks/1860077544150142976 is the test run that happened with v1.11.3-eksbuild.2.
/test pull-cluster-api-provider-aws-e2e-eks
@damdo since Eventually() failed for the addons test in a few runs, do you think we should increase the timeout from 2 minutes? Or is there some other issue?
Force-pushed from cd5bd86 to 697e555
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
Expect(err).ToNot(HaveOccurred())
Eventually(func() error {
	return mgmtClient.Get(ctx, crclient.ObjectKey{Namespace: input.Namespace.Name, Name: controlPlaneName}, controlPlane)
}, 20*time.Minute, 5*time.Second).Should(Succeed(), "eventually failed trying to get the AWSManagedControlPlane")
Do you know how this timeout relates to the one set in the e2e_eks_conf.yaml file? Are they added together? Or do they have no relation?
It still fails at
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
@damdo: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind flake
What this PR does / why we need it:
Make the managed suite more robust to errors by using Eventually().
Special notes for your reviewer:
Trying to address issues like the ones seen here: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5211/pull-cluster-api-provider-aws-e2e-eks/1856371404925046784
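For reference, Gomega also accepts a func(Gomega) callback in Eventually, which retries a whole group of assertions together until they all pass or the timeout elapses; that is another way to harden a flaky read-then-assert sequence. A minimal sketch, with the function name and parameters being illustrative assumptions rather than code from this PR:

package managed_test

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// eventuallyGet (hypothetical) retries the read-and-assert pair for up to 20
// minutes, polling every 5 seconds, so a transient API error no longer fails
// the spec outright.
func eventuallyGet(ctx context.Context, c crclient.Client, key crclient.ObjectKey, obj crclient.Object) {
	Eventually(func(g Gomega) {
		g.Expect(c.Get(ctx, key, obj)).To(Succeed())
	}, 20*time.Minute, 5*time.Second).Should(Succeed())
}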