Fixed the flaky test in the ProtectedPathSpec #5648

Closed (1 commit)
@@ -373,21 +373,15 @@ class ProtectedPathSpec extends HealthCheckSpecification {
@Tags([ISL_RECOVER_ON_FAIL, ISL_PROPS_DB_RESET])
def "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no bw)"() {
given: "Two switches with 2 diverse paths at least"
-//def switchPair = switchPairs.all().withAtLeastNNonOverlappingPaths(2).random()
-//https://github.com/telstra/open-kilda/issues/5608
-def switchesWhere5608IsReproducible = topology.activeSwitches.findAll {it.dpId.toString().endsWith("08")
-||it.dpId.toString().endsWith("09")}
-def switchPair = switchPairs.all()
-.excludeSwitches(switchesWhere5608IsReproducible)
-.withAtLeastNNonOverlappingPaths(2).random()
+def switchPair = switchPairs.all().withAtLeastNNonOverlappingPaths(2).random()
Collaborator


What exactly is the fix? #5608 is still open and not solved, as far as I can see.

Collaborator Author


Yes, let me add some details to explain my PR.

So, I was debugging a couple of tests (namely, "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no bw)" and "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no path)") because they had been failing intermittently for the past month. Both failures were related to the same "Couldn't find non overlapping protected path. Skipped creating it" error message mentioned in #5608. So, even with the workaround (WA) that skips switches 8 and 9, the overlapping issue was still reproducible.

  • "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no bw)": here the issue was still reproducible between switches 2 and 3; please see the screenshot below.
    [Screenshot]
    When I investigated this test, it turned out that the ISLs of the protected path still had enough bandwidth, which is why the test could not get the expected "Not enough bandwidth or no path found" message. The fix in this PR therefore also reduces the bandwidth on the ISLs of the original protected path. With this fix, the flow becomes DEGRADED with the expected "Not enough bandwidth or no path found" message and the test passes. Since the bandwidth is now reduced correctly, there is no need to skip switches 8 and 9 anymore.
  • "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no path)": this test also failed intermittently because of the overlapping issue, even with the workaround that skips switches 8 and 9. I opened PR #5645 ("Fixed the flaky test when the flow swaps to protected path") to fix this test, because the ISLs were not broken correctly there. With the fix from #5645 in place, the issue no longer reproduces, so the workaround that skips switches 8 and 9 can be removed here as well.

So, I have executed both tests 20 times with the 8-9 switch pair, the 2-3 switch pair, and a random switch pair, and the overlapping issue no longer reproduces: the tests pass. Thus, there is no need to exclude switches 8 and 9 when selecting the switch pair anymore.
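To make the bandwidth change concrete, here is a minimal Groovy sketch of which ISLs end up with reduced bandwidth before and after this PR. It uses a hypothetical, simplified model: a path is a list of switch names, an ISL is a [src, dst] pair, and involvedIsls/islsToThrottle are made-up helpers standing in for pathHelper and switchPair.paths. It also assumes, from the truncated old diff line, that the old filter excluded both the main and the protected path. It only illustrates the selection logic, not how Kilda actually reroutes the flow.

```groovy
// Hypothetical, simplified model (NOT the open-kilda test framework):
// a path is a list of switch names and an ISL is a [src, dst] pair.

// Consecutive switch pairs along a path, e.g. [S1, S2, S4] -> [[S1, S2], [S2, S4]]
List<List<String>> involvedIsls(List<String> path) {
    (0..<path.size() - 1).collect { [path[it], path[it + 1]] }
}

// ISLs of every path not listed in 'excluded'; A->B and B->A are treated as
// the same ISL, like the unique { a, b -> ... } comparator in the test
List<List<String>> islsToThrottle(List<List<String>> allPaths, List<List<String>> excluded) {
    allPaths.findAll { !(it in excluded) }
            .collectMany { involvedIsls(it) }
            .unique(false) { a, b -> a == b || a == b.reverse() ? 0 : 1 }
}

def mainPath = ['S1', 'S2', 'S4']
def protectedPath = ['S1', 'S3', 'S4']
def sparePath = ['S1', 'S5', 'S4']
def allPaths = [mainPath, protectedPath, sparePath]

// Old filter (as far as the truncated diff shows): skip the main AND the protected path
def oldSelection = islsToThrottle(allPaths, [mainPath, protectedPath])
// New filter in this PR: skip only the main path
def newSelection = islsToThrottle(allPaths, [mainPath])

println "old rule throttles: $oldSelection"
println "new rule throttles: $newSelection"
```

With the old rule only the spare path's ISLs (S1-S5, S5-S4) are selected, so the original protected path keeps its full bandwidth; with the new rule its ISLs (S1-S3, S3-S4) are selected too. The test then iterates over each selected ISL together with its reverse (otherIsls.collectMany { [it, it.reversed] }), presumably to update the bandwidth in both directions, which is why the dedup step treats A->B and B->A as one ISL.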


when: "Create flow with protected path"
def flow = flowHelperV2.randomFlow(switchPair).tap { allocateProtectedPath = true }
flowHelperV2.addFlow(flow)
def path = northbound.getFlowPath(flow.flowId)

and: "Other paths have not enough bandwidth to host the flow in case of reroute"
-def otherIsls = switchPair.paths.findAll { it != pathHelper.convert(path.protectedPath) &&
Collaborator


When we removed path.protectedPath from the exclusion, we set the bandwidth lower than required (on the protected path), but after the ISL break our new main path is UP (very strange behaviour). Need to discuss.

+def otherIsls = switchPair.paths.findAll {
it != pathHelper.convert(path) }.collectMany { pathHelper.getInvolvedIsls(it) }
.unique { a, b -> a == b || a == b.reversed ? 0 : 1 }
otherIsls.collectMany{[it, it.reversed]}.each {
@@ -436,14 +430,7 @@ Failed to find path with requested bandwidth=$flow.maximumBandwidth/
@Tags(ISL_RECOVER_ON_FAIL)
def "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no path)"() {
given: "Two switches with 2 diverse paths at least"
-//def switchPair = switchPairs.all().withAtLeastNNonOverlappingPaths(2).random()
-//https://github.com/telstra/open-kilda/issues/5608
-def switchesWhere5608IsReproducible = topology.activeSwitches.findAll {it.dpId.toString().endsWith("08")
-||it.dpId.toString().endsWith("09")}
-def switchPair = switchPairs.all()
-.excludeSwitches(switchesWhere5608IsReproducible)
-.withAtLeastNNonOverlappingPaths(2).random()
-
+def switchPair = switchPairs.all().withAtLeastNNonOverlappingPaths(2).random()

when: "Create flow with protected path"
def flow = flowHelperV2.randomFlow(switchPair).tap { allocateProtectedPath = true }