
Increase the timeout value for e2e downgrade tests #19366

Conversation

henrybear327
Contributor

Since the e2e downgrade tests exhibit timeouts on the CI more frequently than expected, this PR attempts to increase the timeout to see whether it alleviates the situation.

See one of the timeouts being flagged here.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Since the e2e downgrade tests exhibit timeouts on the CI more frequently
than expected, this PR attempts to increase the timeout and see if it will
alleviate the situation.

Signed-off-by: Chun-Hung Tseng <[email protected]>
@henrybear327 henrybear327 force-pushed the e2e/increase_downgrade_e2e_test_timeout branch from c273909 to 9cc6361 on February 9, 2025 at 22:04

codecov bot commented Feb 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.91%. Comparing base (9de211d) to head (9cc6361).
Report is 26 commits behind head on main.

Additional details and impacted files

see 20 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19366      +/-   ##
==========================================
- Coverage   68.98%   68.91%   -0.08%     
==========================================
  Files         420      420              
  Lines       35739    35739              
==========================================
- Hits        24656    24628      -28     
- Misses       9660     9682      +22     
- Partials     1423     1429       +6     


@@ -56,7 +56,7 @@ func DowngradeCancel(t *testing.T, epc *EtcdProcessCluster) {
 	c := epc.Etcdctl()
 
 	var err error
-	testutils.ExecuteWithTimeout(t, 1*time.Minute, func() {
+	testutils.ExecuteWithTimeout(t, 2*time.Minute, func() {
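
For context, testutils.ExecuteWithTimeout runs the supplied function and fails the test if it does not complete within the given duration, so this change only widens the failure window rather than changing behavior. The sketch below shows the general shape of such a wrapper; it is an illustration under that assumption, not necessarily etcd's actual implementation.

	package testutils

	import (
		"testing"
		"time"
	)

	// executeWithTimeoutSketch runs f in a separate goroutine and fails the
	// test if f has not returned before the timeout elapses. Illustrative only.
	func executeWithTimeoutSketch(t *testing.T, timeout time.Duration, f func()) {
		t.Helper()
		done := make(chan struct{})
		go func() {
			defer close(done)
			f()
		}()
		select {
		case <-done:
			// f completed within the allowed time.
		case <-time.After(timeout):
			t.Fatalf("test timed out after %v", timeout)
		}
	}

With a wrapper like this, doubling the duration from one minute to two still fails a genuinely hung downgrade cancel, just later, which is why the reviewers ask below what is actually consuming the time.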
Member

Can you elaborate why it takes so long to cancel the downgrade? @henrybear327 @siyuanfoundation

Contributor Author

@henrybear327 henrybear327 Feb 10, 2025

I haven't reached a solid conclusion so far. I will have to investigate further to be able to answer:

  1. How often are we hitting the timeout with the 1-minute limit?
  2. What are the usual errors that precede the timeout?

Member

@siyuanfoundation @henrybear327 can you take this as a high priority? I have seen the downgrade e2e cases time out multiple times.

60s should already be long enough. We need a clearer understanding of why it takes so long.

also cc @fuweid

Member

I am checking this flaky case locally. I think it may be related to the test case itself, so I don't think increasing the timeout will work.

Member

@fuweid Thanks for looking into this. Just assigned #19391 to you. Feel free to let me know if you want to discuss it tomorrow.

Contributor Author

I will keep looking into this tomorrow morning as my first task.

@@ -143,7 +143,7 @@ func ValidateMemberVersions(t *testing.T, epc *EtcdProcessCluster, expect []*ver
 }
 
 func ValidateVersion(t *testing.T, cfg *EtcdProcessClusterConfig, member EtcdProcess, expect version.Versions) {
-	testutils.ExecuteWithTimeout(t, 1*time.Minute, func() {
+	testutils.ExecuteWithTimeout(t, 2*time.Minute, func() {
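
A validation helper like ValidateVersion typically has to wait for the member to report the expected server/cluster version after a downgrade step, so the work under the timeout is usually a poll-until-match loop. The sketch below shows that pattern; the memberVersions type, its field names, and the fetch callback are assumptions for illustration, not etcd's actual API.

	package e2e

	import (
		"fmt"
		"time"
	)

	// memberVersions mirrors the idea of version.Versions: the server and
	// cluster version strings a member reports. Field set is an assumption.
	type memberVersions struct {
		Server  string
		Cluster string
	}

	// validateVersionSketch polls fetch until the reported versions match the
	// expectation or the deadline passes. fetch stands in for whatever call
	// retrieves the member's version. Illustrative only.
	func validateVersionSketch(timeout time.Duration, expect memberVersions, fetch func() (memberVersions, error)) error {
		deadline := time.Now().Add(timeout)
		for {
			got, err := fetch()
			if err == nil && got == expect {
				return nil // member has converged to the expected versions
			}
			if time.Now().After(deadline) {
				return fmt.Errorf("versions did not converge within %v: got %+v, want %+v, last error: %v",
					timeout, got, expect, err)
			}
			time.Sleep(500 * time.Millisecond) // back off briefly before retrying
		}
	}

If the member is slow to restart or to publish its new version, this loop is where the minute gets spent, which matches the reviewers' point that bumping the timeout masks the slowness rather than explaining it.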
Member

Same question: why does it take so long?

Contributor Author

@ahrtr
Member

ahrtr commented Feb 12, 2025

See fix in #19398 and discussion in #19391

@ahrtr ahrtr closed this Feb 12, 2025