@xueqzhan
Contributor

@xueqzhan xueqzhan commented Nov 11, 2025

This PR adds a testScheduler that:

  1. Adds a conflict handler that prevents tests with the same conflict from running at the same time.
  2. Detects conflicts within a conflictGroup, which is designed to support future conflict modes: instance, bucket, and exec.
  3. Adds support for taints/tolerations. The idea is to allow some tests to run concurrently while preventing all others from running at the same time.
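A minimal Go sketch of the conflict handling described in items 1 and 2. It assumes each test carries a list of conflict names and a conflict-group string. `conflictTracker`, `tryAcquire`, and `release` are hypothetical names for illustration, not the code in this PR. A test reserves all of its conflicts within its group at once, or reserves nothing and waits.

```go
package main

import (
	"fmt"
	"sync"
)

// conflictTracker is a hypothetical sketch of a conflict handler: a test may
// start only if none of its conflict names is held by a running test in the
// same conflict group.
type conflictTracker struct {
	mu sync.Mutex
	// held maps conflict group -> set of conflict names held by running tests.
	held map[string]map[string]bool
}

func newConflictTracker() *conflictTracker {
	return &conflictTracker{held: make(map[string]map[string]bool)}
}

// tryAcquire reserves all of a test's conflicts in its group at once.
// If any conflict is already held, it reserves nothing and returns false,
// so the scheduler can retry the test later.
func (c *conflictTracker) tryAcquire(group string, conflicts []string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	set := c.held[group]
	for _, name := range conflicts {
		if set[name] {
			return false
		}
	}
	if set == nil {
		set = make(map[string]bool)
		c.held[group] = set
	}
	for _, name := range conflicts {
		set[name] = true
	}
	return true
}

// release frees a finished test's conflicts so waiting tests can start.
// delete on a missing (nil) group map is a no-op in Go.
func (c *conflictTracker) release(group string, conflicts []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, name := range conflicts {
		delete(c.held[group], name)
	}
}

func main() {
	tracker := newConflictTracker()
	fmt.Println(tracker.tryAcquire("default", []string{"etcd"}))            // true
	fmt.Println(tracker.tryAcquire("default", []string{"etcd", "network"})) // false: "etcd" is held
	fmt.Println(tracker.tryAcquire("other", []string{"etcd"}))              // true: different group
	tracker.release("default", []string{"etcd"})
	fmt.Println(tracker.tryAcquire("default", []string{"etcd", "network"})) // true
}
```

Reserving all conflicts atomically avoids a test holding some conflicts while it waits for others, which could otherwise stall unrelated tests.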

If the taint/toleration functionality looks OK, I will add the structure to the extension library and adjust this PR accordingly.
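One possible reading of the taint/toleration rule, sketched in Go under stated assumptions. A test may start only if it tolerates every taint held by a currently running test. For symmetry, the sketch also assumes that every running test must tolerate the new test's own taints. `testInfo`, `canStart`, and the taint name `node-reboot` are hypothetical, because the real structure has not yet been added to the extension library.

```go
package main

import "fmt"

// testInfo is a hypothetical shape for a test's taints and tolerations.
type testInfo struct {
	name        string
	taints      []string
	tolerations []string
}

// tolerates reports whether t lists taint among its tolerations.
func tolerates(t testInfo, taint string) bool {
	for _, tol := range t.tolerations {
		if tol == taint {
			return true
		}
	}
	return false
}

// canStart reports whether candidate may run alongside the running tests.
// The candidate must tolerate every running test's taints, and (assumed,
// for symmetry) every running test must tolerate the candidate's taints.
func canStart(candidate testInfo, running []testInfo) bool {
	for _, r := range running {
		for _, taint := range r.taints {
			if !tolerates(candidate, taint) {
				return false
			}
		}
		for _, taint := range candidate.taints {
			if !tolerates(r, taint) {
				return false
			}
		}
	}
	return true
}

func main() {
	disruptive := testInfo{name: "disruptive", taints: []string{"node-reboot"}}
	tolerant := testInfo{name: "tolerant", tolerations: []string{"node-reboot"}}
	plain := testInfo{name: "plain"}

	fmt.Println(canStart(tolerant, []testInfo{disruptive})) // true
	fmt.Println(canStart(plain, []testInfo{disruptive}))    // false: plain does not tolerate node-reboot
	fmt.Println(canStart(disruptive, []testInfo{plain}))    // false: running plain test does not tolerate it
}
```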

/hold for review and further testing

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 11, 2025
@openshift-ci-robot

openshift-ci-robot commented Nov 11, 2025

@xueqzhan: This pull request references TRT-2292 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 11, 2025
@openshift-ci openshift-ci bot requested review from deads2k and p0lyn0mial November 11, 2025 20:54
@openshift-ci
Contributor

openshift-ci bot commented Nov 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2025
Member

@stbenjam stbenjam left a comment

Looks really good, just a couple of comments.

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@xueqzhan
Contributor Author

/retest-required

@openshift-trt

openshift-trt bot commented Nov 18, 2025

Job Failure Risk Analysis for sha: df32893

Job Name: pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6
Failure Risk: IncompleteTests
Tests for this run (22) are below the historical average (2192): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@xueqzhan
Contributor Author

/retest-required

Comment on lines 65 to 66
// isolation defines conflict groups, mode, taints, and tolerations for test isolation
isolation extensiontests.Isolation
Member

Do we need to copy this or could we just access it via spec?

Contributor Author

@xueqzhan xueqzhan Nov 19, 2025

It doesn't look like we copy the spec in extensionTestSpecsToOriginTestCases. I can copy it and use the spec instead, if that is preferred.

Member

@stbenjam stbenjam Nov 19, 2025

We definitely should; we have a field for it. Retry copies it, so it looks like an oversight that it wasn't copied when the test cases were originally created.
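A sketch of the suggestion to keep the spec on the test case instead of copying `isolation` into a separate field. The types and the converter below are hypothetical local stand-ins. They are not the real `extensiontests` types or the actual signature of `extensionTestSpecsToOriginTestCases`.

```go
package main

import "fmt"

// isolation is a hypothetical stand-in for the isolation data on a spec;
// the field names are assumptions, not the real extensiontests.Isolation.
type isolation struct {
	Mode      string
	Conflicts []string
}

// extensionTestSpec is a hypothetical stand-in for an extension test spec.
type extensionTestSpec struct {
	Name      string
	Isolation isolation
}

// testCase keeps a pointer to the spec it was built from, so isolation
// settings are read through the spec rather than copied field by field.
type testCase struct {
	name string
	spec *extensionTestSpec
}

// specsToTestCases is a hypothetical converter that records the originating spec.
func specsToTestCases(specs []extensionTestSpec) []*testCase {
	cases := make([]*testCase, 0, len(specs))
	for i := range specs {
		// Take the address of the slice element, not of a loop copy.
		cases = append(cases, &testCase{name: specs[i].Name, spec: &specs[i]})
	}
	return cases
}

func main() {
	specs := []extensionTestSpec{
		{Name: "test-a", Isolation: isolation{Mode: "exec", Conflicts: []string{"etcd"}}},
	}
	for _, tc := range specsToTestCases(specs) {
		fmt.Println(tc.name, tc.spec.Isolation.Mode, tc.spec.Isolation.Conflicts)
	}
}
```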

// getTestConflictGroup returns the conflict group for a test.
// Conflicts are only checked within the same conflict group.
func getTestConflictGroup(test *testCase) string {
	return "default"
}
Member

This needs to get replaced from the content in the spec right?

Contributor Author

The spec doesn't define a group. The group is designed to support mode.

Member

Sorry, I'm not quite understanding what this function does. Isolation has two relevant fields, Conflicts and Mode. I think we agreed to only support mode=exec for now, but shouldn't we get the Conflicts out of the Isolation struct?

Why does this always return "default"?

Contributor Author

The idea was to have a framework that supports mode in the future, so conflicts are only checked within a conflictGroup. Right now all tests belong to the "default" group and therefore behave like mode=exec. If another mode is needed later, additional conflictGroups will be created for that purpose.
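The conflict-group idea can be sketched as a pairwise check in which two tests block each other only if they fall in the same group and share a conflict name. In this sketch, every test with no mode or mode=exec maps to "default", which matches the current behavior. The "mode:" group names for other modes are a hypothetical placeholder, as are the `testCase` fields used here.

```go
package main

import "fmt"

// testCase is a hypothetical stand-in with only the fields this sketch needs.
type testCase struct {
	name      string
	mode      string   // assumed isolation mode; "" or "exec" today
	conflicts []string // assumed conflict names from the isolation data
}

// conflictGroup maps a test to the group its conflicts are checked in.
// Today every test lands in "default", which behaves like mode=exec.
// The per-mode branch is a hypothetical placeholder for future modes.
func conflictGroup(tc testCase) string {
	if tc.mode == "" || tc.mode == "exec" {
		return "default"
	}
	return "mode:" + tc.mode
}

// conflictsWith reports whether two tests may not run concurrently:
// they must be in the same conflict group and share a conflict name.
func conflictsWith(a, b testCase) bool {
	if conflictGroup(a) != conflictGroup(b) {
		return false
	}
	seen := make(map[string]bool, len(a.conflicts))
	for _, c := range a.conflicts {
		seen[c] = true
	}
	for _, c := range b.conflicts {
		if seen[c] {
			return true
		}
	}
	return false
}

func main() {
	a := testCase{name: "a", conflicts: []string{"etcd"}}
	b := testCase{name: "b", mode: "exec", conflicts: []string{"etcd", "dns"}}
	c := testCase{name: "c", mode: "bucket", conflicts: []string{"etcd"}}
	fmt.Println(conflictsWith(a, b)) // true: same "default" group, shared "etcd"
	fmt.Println(conflictsWith(a, c)) // false: different groups
}
```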

Member

Is there a reason not to call it mode here instead of conflict group, and use the name we're implementing ("exec")? It's not clear to me that "group" is linked to "mode".

Contributor Author

Mode doesn't clearly indicate a grouping mechanism, whereas conflictGroup clearly describes what it does: grouping. A conflictGroup can still be used to implement a mode, so I think conflictGroup is the better name.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


3 participants