Skip to content

Conversation

rejain456
Copy link
Contributor

@rejain456 rejain456 commented Oct 13, 2025

Reason for Change:

Without this CNS change, expanding an overlay subnet causes the CNS daemonset to crash, resulting in a PrimaryCANotSame error for all clusters.
To enable overlay subnet expansion:

  • only for overlay clusters, we bypass the PrimaryCANotSame error.
  • The subnet address in the NNC is updated.
  • The CNS state file gets updated to reflect the new pod cidr from NNC

Simply restarting CNS does not regenerate the state file with updated information. Instead, this pr will update the state file to have the new subnet address (same as the one from updated nnc) which will prevent the error only for overlay clusters when there is a mismatch of pod cidrs to start with.

Issue Fixed:

Requirements:

Notes:

@rejain456 rejain456 requested a review from a team as a code owner October 13, 2025 16:49
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a mechanism to handle overlay subnet expansion in CNS by detecting primary IP address changes and forcing a CNS daemon restart. When a PrimaryCA mismatch is detected, the code now deletes the CNS state file and triggers a panic to restart the daemon, allowing it to pull fresh configuration from the updated NNC.

  • Changes the behavior from returning an error code to panicking and deleting the state file
  • Adds comprehensive test coverage for the new panic-based recovery mechanism
  • Updates existing test to verify the new panic behavior instead of error return

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
cns/restserver/internalapi.go Implements the core logic to delete state file and panic on PrimaryCA mismatch
cns/restserver/internalapi_test.go Adds tests and updates existing test to verify panic behavior and state file deletion

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@rbtr rbtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleting the statefile is a nonstarter

@nairashu
Copy link
Contributor

Just got notified that this PR is replaced with the PR here: https://github.com/Azure/azure-container-networking/pull/4083/files

Copy link
Contributor

@nairashu nairashu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rejain789 rejain789 force-pushed the jainriya/cnsChangeForOverlayExpansion branch from 27f3eaa to d452d9f Compare October 17, 2025 15:35
@rbtr rbtr requested a review from nairashu October 17, 2025 20:36
@rbtr
Copy link
Collaborator

rbtr commented Oct 17, 2025

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rbtr
Copy link
Collaborator

rbtr commented Oct 17, 2025

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rejain789
Copy link

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).


// Condition 2: Check if the base IP of oldCIDR is contained in newCIDR
if !newPrefix.Contains(oldPrefix.Addr()) {
return false
Copy link
Contributor

@nairashu nairashu Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full containment vs base address only

Currently, it only checks if the starting IP of oldCIDR is inside newCIDR.
If oldCIDR spans multiple IPs, this doesn’t guarantee full containment.

Suggested change
return false
// Check full containment: start and end of oldCIDR inside newCIDR
oldStart := oldPrefix.Masked().Addr()
oldEnd := oldStart.Next().Next().Prev() // compute last IP in oldCIDR
return newPrefix.Contains(oldStart) && newPrefix.Contains(oldEnd)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is possible that subnets can partially overlap, because they must be aligned on a pow2.
If we think about this bitwise, it is impossible for an IP to belong to multiple prefixes which only partially overlap - the masked bits must match, so the subnets would necessarily fully overlap, right?

@nairashu do you have a counter example?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think this is what netip.Prefix#Overlaps does (and @rejain456 we should just use it) -
It reduces both subnets to their minimum fixed bits and checks to see if they're equal. Because if so, the larger subnet MUST fully overlap the smaller subnet (or they are exactly equal), because it's a bitmask, and everything in
1111xxxxx is also in
111xxxxxx.

Copy link
Contributor

@nairashu nairashu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's verify full CIDR containment. Rest looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants