Multi-slb related bug fixes #7432

Open: nilo19 wants to merge 2 commits into master from fix/multi-slb/endpointslice

Contributor

@nilo19 nilo19 commented Oct 29, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

  1. The local backend pool updater should include all EndpointSlices of a local service, not only the first one.
  2. In rare cases, a migration from a NIC-based to an IP-based LB can be left in an intermediate state in which the NIC references have been removed but the corresponding IPConfigs are still in the backend pool. In that case, we explicitly exclude those IPConfigs from the request body.
  3. localServiceOwnsBackendPool should compare the full backend pool name rather than just the prefix, because two service names can share the same prefix.
  4. There is a corner case when a cluster is updated to multi-slb directly from the classic NIC-based single LB rather than from an IP-based cluster. If the service being reconciled is local, the cloud provider tries to update a NIC-based pool to an IP-based pool directly, which is not allowed. We should skip adding IPs to NIC-based pools in multi-slb mode.
  5. ReconcileBackendPools mistakenly parsed the LB name out of the backend pool ID and used it as the backend pool name. It should parse the backend pool name instead.

Which issue(s) this PR fixes:

Fixes #7113
Fixes #7200
Fixes #6980

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix: several bugs related to multiple standard load balancers mode.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 29, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 29, 2024
activeNodes = bi.getLocalServiceEndpointsNodeNames(service)
}

if isNICPool(backendPool) {
Contributor Author

fix no. 4

@@ -886,7 +889,13 @@ func removeNodeIPAddressesFromBackendPool(
if addresses[i].LoadBalancerBackendAddressPropertiesFormat != nil {
ipAddress := ptr.Deref((*backendPool.LoadBalancerBackendAddresses)[i].IPAddress, "")
if ipAddress == "" {
klog.V(4).Infof("removeNodeIPAddressFromBackendPool: LoadBalancerBackendAddress %s is not IP-based, skipping", ptr.Deref(addresses[i].Name, ""))
if isNodeIP {
Contributor Author

fix no. 2
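
As a rough illustration of fix no. 2, the sketch below drops backend addresses that carry no IP address before the pool is sent in a request. The `backendAddress` type and the `dropNonIPAddresses` helper are hypothetical simplifications for this example, not the SDK types or the code in this PR.

```go
package main

import "fmt"

// backendAddress is a hypothetical, simplified stand-in for a load balancer
// backend address. IPAddress is nil or empty for entries that still
// reference a NIC IP configuration.
type backendAddress struct {
	Name      string
	IPAddress *string
}

// dropNonIPAddresses returns only the IP-based entries and reports how many
// entries were dropped.
func dropNonIPAddresses(addrs []backendAddress) (kept []backendAddress, dropped int) {
	for _, a := range addrs {
		if a.IPAddress == nil || *a.IPAddress == "" {
			dropped++
			continue
		}
		kept = append(kept, a)
	}
	return kept, dropped
}

func main() {
	ip := "10.0.0.4"
	addrs := []backendAddress{
		{Name: "node-0", IPAddress: &ip},
		{Name: "stale-ipconfig"}, // left over from the NIC-based pool
	}
	kept, dropped := dropNonIPAddresses(addrs)
	fmt.Println(len(kept), dropped) // 1 1
}
```

Building a new slice rather than deleting in place keeps the loop index stable, which matters when several stale entries sit next to each other.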

@@ -633,7 +634,7 @@ func (bi *backendPoolTypeNodeIP) ReconcileBackendPools(ctx context.Context, clus
if isMigration && bi.EnableMigrateToIPBasedBackendPoolAPI {
var backendPoolNames []string
for _, id := range lbBackendPoolIDsSlice {
- name, err := getLBNameFromBackendPoolID(id)
+ name, err := getBackendPoolNameFromBackendPoolID(id)
Contributor Author

@nilo19 nilo19 Oct 29, 2024

fix no. 5
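
To make the difference behind fix no. 5 concrete, here is a minimal sketch that extracts both the load balancer name and the backend pool name from an Azure backend pool resource ID. The `parseBackendPoolID` helper and its regular expression are assumptions written for this example; they are not the provider's `getLBNameFromBackendPoolID` or `getBackendPoolNameFromBackendPoolID` implementations.

```go
package main

import (
	"fmt"
	"regexp"
)

// backendPoolIDRE matches a load balancer backend pool resource ID and
// captures the load balancer name (group 1) and the pool name (group 2).
var backendPoolIDRE = regexp.MustCompile(
	`(?i)/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Network/loadBalancers/([^/]+)/backendAddressPools/([^/]+)$`)

// parseBackendPoolID is a hypothetical helper, not the provider's parser.
func parseBackendPoolID(id string) (lbName, poolName string, err error) {
	m := backendPoolIDRE.FindStringSubmatch(id)
	if m == nil {
		return "", "", fmt.Errorf("invalid backend pool ID %q", id)
	}
	return m[1], m[2], nil
}

func main() {
	id := "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network" +
		"/loadBalancers/lb1/backendAddressPools/pool1"
	lb, pool, err := parseBackendPoolID(id)
	fmt.Println(lb, pool, err) // lb1 pool1 <nil>
}
```

Using the first capture group where the second is intended yields `lb1` instead of `pool1`, which is the kind of mix-up the one-line change above corrects.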

@@ -446,8 +445,10 @@ func (az *Cloud) getLocalServiceBackendPoolID(serviceName string, lbName string,

// localServiceOwnsBackendPool checks if a backend pool is owned by a local service.
func localServiceOwnsBackendPool(serviceName, bpName string) bool {
prefix := strings.Replace(serviceName, "/", "-", -1)
- return strings.HasPrefix(strings.ToLower(bpName), strings.ToLower(prefix))
+ if strings.HasSuffix(strings.ToLower(bpName), consts.IPVersionIPv6StringLower) {
Contributor Author

fix no. 3
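
The sketch below shows why a prefix comparison is ambiguous and how a full-name comparison avoids the false positive. The helpers `poolNameFor`, `ownsByPrefix`, and `ownsByFullName`, and the `-ipv6` suffix, are assumptions made for illustration; the real pool naming helper in the provider may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// poolNameFor builds a hypothetical backend pool name for a service.
// Assumption: the pool name is "<namespace>-<name>", with an "-ipv6"
// suffix for IPv6 pools.
func poolNameFor(serviceName string, ipv6 bool) string {
	name := strings.ReplaceAll(serviceName, "/", "-")
	if ipv6 {
		name += "-ipv6"
	}
	return name
}

// ownsByPrefix mirrors the old prefix-based check.
func ownsByPrefix(serviceName, bpName string) bool {
	prefix := strings.ReplaceAll(serviceName, "/", "-")
	return strings.HasPrefix(strings.ToLower(bpName), strings.ToLower(prefix))
}

// ownsByFullName compares the complete pool name, case-insensitively.
func ownsByFullName(serviceName, bpName string) bool {
	ipv6 := strings.HasSuffix(strings.ToLower(bpName), "-ipv6")
	return strings.EqualFold(poolNameFor(serviceName, ipv6), bpName)
}

func main() {
	// "default/web" and "default/web-admin" share a prefix.
	bp := poolNameFor("default/web-admin", false)
	fmt.Println(ownsByPrefix("default/web", bp))         // true (false positive)
	fmt.Println(ownsByFullName("default/web", bp))       // false
	fmt.Println(ownsByFullName("default/web-admin", bp)) // true
}
```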

- ep = endpointSlice
- foundInCache = true
- return false
+ eps = append(eps, endpointSlice)
Contributor Author

fix no. 1

@nilo19 nilo19 force-pushed the fix/multi-slb/endpointslice branch from 2b510ac to 5084147 Compare October 29, 2024 04:30
2 participants