🌱 test: e2e: make managed suite more robust to errors with Eventually() #5215
base: main
Conversation
/assign @richardcase @nrb
/test ?
@damdo: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-cluster-api-provider-aws-e2e-eks
Failure is unrelated (due to the AWS CloudFormation stack).
/test pull-cluster-api-provider-aws-e2e-eks
Force-pushed from f860b49 to acd3ed3
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-cluster-api-provider-aws-e2e-eks
2 similar comments
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-test
Force-pushed from acd3ed3 to 7d6b8b6
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
2 similar comments
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
The AWS CloudFormation stack timed out.
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
@richardcase do you have any idea why waiting for addons fails so often?
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
@@ -149,7 +149,7 @@ intervals:
default/wait-machine-status: ["20m", "10s"]
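For context on how an intervals entry like the one above is typically consumed: in cluster-api style e2e suites the ["20m", "10s"] pair is read with the clusterctl E2EConfig helper and expanded into Gomega's Eventually as the timeout and polling interval. The sketch below is illustrative only; the helper name and wiring are assumptions, not code from this PR.

package managed_test

import (
	"context"

	. "github.com/onsi/gomega"
	"sigs.k8s.io/cluster-api/test/framework/clusterctl"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForObject is a hypothetical helper (not from this PR) showing how an
// intervals entry such as default/wait-machine-status: ["20m", "10s"] flows
// into an assertion: GetIntervals returns the pair, and Gomega parses the two
// strings as the timeout and the polling interval respectively.
func waitForObject(ctx context.Context, cfg *clusterctl.E2EConfig, c crclient.Client, key crclient.ObjectKey, obj crclient.Object) {
	Eventually(func() error {
		return c.Get(ctx, key, obj)
	}, cfg.GetIntervals("default", "wait-machine-status")...).Should(Succeed())
}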
I looked into the gathered artifacts and found this:
- lastTransitionTime: "2024-11-21T20:06:09Z"
message: |-
addon_update: updating eks addon coredns: ResourceInUseException: Addon coredns cannot be updated as it is currently in UPDATING state
{
RespMetadata: {
StatusCode: 409,
RequestID: "776223f5-f6d1-4a1b-83bf-6455dbfc09f6"
},
AddonName: "coredns",
ClusterName: "eks-nodes-kapm37_eks-nodes-8g7yso-control-plane",
Message_: "Addon coredns cannot be updated as it is currently in UPDATING state"
}
So I see it's updating CoreDNS.
In this file, it's set to v1.11.1-eksbuild.8.
Amazon lists the CoreDNS versions for a given Kubernetes version here: https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html#coredns-add-on-update
For Kubernetes 1.30, it should be v1.11.3-eksbuild.2.
I'm going to add a commit to this PR to see if bumping the CoreDNS version helps.
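As an aside, the ResourceInUseException above is exactly the kind of transient state a polled check can ride out: instead of asserting the addon state once, keep re-reading it until it settles or the timeout elapses. The sketch below is not the change in this PR; it is a minimal illustration assuming the aws-sdk-go v1 EKS client (consistent with the error dump above), and the helper name and intervals are made up.

package managed_test

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/eks"
	"github.com/aws/aws-sdk-go/service/eks/eksiface"
	. "github.com/onsi/gomega"
)

// waitForAddonActive (hypothetical, not from this PR) keeps re-reading the
// addon until it leaves transient states such as UPDATING, so a 409
// ResourceInUseException becomes a retry instead of a hard failure.
func waitForAddonActive(client eksiface.EKSAPI, clusterName, addonName string) {
	Eventually(func() error {
		out, err := client.DescribeAddon(&eks.DescribeAddonInput{
			ClusterName: aws.String(clusterName),
			AddonName:   aws.String(addonName),
		})
		if err != nil {
			return err
		}
		if status := aws.StringValue(out.Addon.Status); status != eks.AddonStatusActive {
			return fmt.Errorf("addon %s is %s, want %s", addonName, status, eks.AddonStatusActive)
		}
		return nil
	}, "20m", "30s").Should(Succeed(), "eventually failed waiting for addon to become ACTIVE")
}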
Looks like this didn't help as I was hoping. For historical reference, https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5215/pull-cluster-api-provider-aws-e2e-eks/1860077544150142976 is the test run that happened with v1.11.3-eksbuild.2.
/test pull-cluster-api-provider-aws-e2e-eks
@damdo since Eventually() failed for the addons test in a few runs, do you think we should increase the timeout from 2 minutes? Or is there some other issue?
Force-pushed from cd5bd86 to 697e555
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
Expect(err).ToNot(HaveOccurred())
Eventually(func() error {
	return mgmtClient.Get(ctx, crclient.ObjectKey{Namespace: input.Namespace.Name, Name: controlPlaneName}, controlPlane)
}, 20*time.Minute, 5*time.Second).Should(Succeed(), "eventually failed trying to get the AWSManagedControlPlane")
Do you know how this timeout relates to the one set in the e2e_eks_conf.yaml file? Are they added together? Or do they have no relation?
It still fails at
/test pull-cluster-api-provider-aws-e2e-eks
1 similar comment
/test pull-cluster-api-provider-aws-e2e-eks
@damdo: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind flake
What this PR does / why we need it:
Make the managed suite more robust to errors by using Eventually().
Special notes for your reviewer:
Trying to address issues like the ones seen here: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5211/pull-cluster-api-provider-aws-e2e-eks/1856371404925046784
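For reference, Gomega also accepts a func(Gomega) callback in Eventually, which retries a whole group of assertions together until they all pass or the timeout elapses; that is another way to harden a flaky read-then-assert sequence. A minimal sketch, with the function name and parameters being illustrative assumptions rather than code from this PR:

package managed_test

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// eventuallyGet (hypothetical) retries the read-and-assert pair for up to 20
// minutes, polling every 5 seconds, so a transient API error no longer fails
// the spec outright.
func eventuallyGet(ctx context.Context, c crclient.Client, key crclient.ObjectKey, obj crclient.Object) {
	Eventually(func(g Gomega) {
		g.Expect(c.Get(ctx, key, obj)).To(Succeed())
	}, 20*time.Minute, 5*time.Second).Should(Succeed())
}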