-
Notifications
You must be signed in to change notification settings - Fork 257
CNS Change for Subnet Overlay Expansion Job #4074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a mechanism to handle overlay subnet expansion in CNS by detecting primary IP address changes and forcing a CNS daemon restart. When a PrimaryCA mismatch is detected, the code now deletes the CNS state file and triggers a panic to restart the daemon, allowing it to pull fresh configuration from the updated NNC.
- Changes the behavior from returning an error code to panicking and deleting the state file
- Adds comprehensive test coverage for the new panic-based recovery mechanism
- Updates existing test to verify the new panic behavior instead of error return
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
cns/restserver/internalapi.go | Implements the core logic to delete state file and panic on PrimaryCA mismatch |
cns/restserver/internalapi_test.go | Adds tests and updates existing test to verify panic behavior and state file deletion |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleting the statefile is a nonstarter
Just got notified that this PR is replaced with the PR here: https://github.com/Azure/azure-container-networking/pull/4083/files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
27f3eaa
to
d452d9f
Compare
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
|
||
// Condition 2: Check if the base IP of oldCIDR is contained in newCIDR | ||
if !newPrefix.Contains(oldPrefix.Addr()) { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full containment vs base address only
Currently, it only checks if the starting IP of oldCIDR is inside newCIDR.
If oldCIDR spans multiple IPs, this doesn’t guarantee full containment.
return false | |
// Check full containment: start and end of oldCIDR inside newCIDR | |
oldStart := oldPrefix.Masked().Addr() | |
oldEnd := oldStart.Next().Next().Prev() // compute last IP in oldCIDR | |
return newPrefix.Contains(oldStart) && newPrefix.Contains(oldEnd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is possible that subnets can partially overlap, because they must be aligned on a pow2.
If we think about this bitwise, it is impossible for an IP to belong to multiple prefixes which only partially overlap - the masked bits must match, so the subnets would necessarily fully overlap, right?
@nairashu do you have a counter example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think this is what netip.Prefix#Overlaps
does (and @rejain456 we should just use it) -
It reduces both subnets to their minimum fixed bits and checks to see if they're equal. Because if so, the larger subnet MUST fully overlap the smaller subnet (or they are exactly equal), because it's a bitmask, and everything in
1111xxxxx
is also in
111xxxxxx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's verify full CIDR containment. Rest looks good.
Reason for Change:
Without this CNS change, expanding an overlay subnet causes the CNS daemonset to crash, resulting in a PrimaryCANotSame error for all clusters.
To enable overlay subnet expansion:
Simply restarting CNS does not regenerate the state file with updated information. Instead, this pr will update the state file to have the new subnet address (same as the one from updated nnc) which will prevent the error only for overlay clusters when there is a mismatch of pod cidrs to start with.
Issue Fixed:
Requirements:
Notes: