Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always reset agent connection backoff and enter fast sync when client count < server count #632

Merged
merged 3 commits into from
Jun 18, 2024

Conversation

carreter
Copy link
Contributor

Part of #358 (see #358 (comment)).

Current behavior

Currently, if an agent receives an increased server count from a server it is already connected to, it doesn't reset the backoff and just uses the last set duration. This could result in an agent taking a long time to connect to new proxy servers if it receives the updated server count from a server it is already connected to.

This cannot currently happen as there is no way to update the server count on a proxy server without restarting it and dropping the connection, but this will be an issue once #273 is implemented.

Testing the fix

Using #631, I set up https://github.com/carreter/apiserver-network-proxy/tree/backoff-reset-fix-test-example . Run the following in the repo root to set up a cluster with 1 proxy server on a KCP node and 1 proxy agent on a worker node:

make docker-build

cd examples/kind-multinode

# These are the default values of the registry, image name, and tag used by the Makefile.
# Edit them if necessary.
REGISTRY=gcr.io/$(gcloud config get-value project)
TAG=$(git rev-parse HEAD)
TARGET_ARCH="amd64"
SERVER_IMAGE="$REGISTRY/proxy-server-$TARGET_ARCH:$TAG"
AGENT_IMAGE="$REGISTRY/proxy-agent-$TARGET_ARCH:$TAG"

# Bring up the cluster!
./quickstart-kind.sh --cluster-name custom-knp-test --server-image "$SERVER_IMAGE" --agent-image "$AGENT_IMAGE" \
  --num-kcp-nodes 1 --num-worker-nodes 1 --sideload-images

Then, apply server-count-file.yaml with the following to upload a file with an updated server count:

kubectl apply -f server-count-file.yaml

Give it a couple seconds and you should see logs showing that the client is issuing new connection requests every 5 secs (i.e. --sync-interval) instead of every 30 secs (i.e. --sync-interval-cap) like it would without this fix!

I0613 22:04:19.702443       1 client.go:215] "Connect to server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22"
I0613 22:04:19.702687       1 clientset.go:186] "duplicate server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22" serverCount=1 clientsCount=1
I0613 22:04:19.703466       1 client.go:474] "received DATA" connectionID=1
I0613 22:04:19.703585       1 client.go:600] "write to remote" connectionID=1 lastData=48 dataSize=48
I0613 22:04:45.631336       1 client.go:215] "Connect to server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22"
I0613 22:04:45.631365       1 clientset.go:219] "Server count change suggestion by server" current=1 serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22" actual=4
I0613 22:04:45.631588       1 clientset.go:186] "duplicate server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22" serverCount=4 clientsCount=1
I0613 22:04:51.070279       1 client.go:215] "Connect to server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22"
I0613 22:04:51.070593       1 clientset.go:186] "duplicate server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22" serverCount=4 clientsCount=1
I0613 22:04:56.324115       1 client.go:215] "Connect to server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22"
I0613 22:04:56.324438       1 clientset.go:186] "duplicate server" serverID="50ea06e6-89fa-444a-9f8e-1ee9bf0d4a22" serverCount=4 clientsCount=1

@k8s-ci-robot k8s-ci-robot requested review from elmiko and tallclair June 13, 2024 23:53
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 13, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @carreter. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 13, 2024
@carreter
Copy link
Contributor Author

@avrittrohwer for visibility

@jkh52
Copy link
Contributor

jkh52 commented Jun 18, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 18, 2024
pkg/agent/clientset.go Outdated Show resolved Hide resolved
pkg/agent/clientset.go Outdated Show resolved Hide resolved
@jkh52
Copy link
Contributor

jkh52 commented Jun 18, 2024

LGTM. See latest suggestion, happy to approve when you're ready.

@carreter
Copy link
Contributor Author

@jkh52 Should be ready now!

@jkh52 jkh52 added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 18, 2024
@jkh52
Copy link
Contributor

jkh52 commented Jun 18, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carreter, jkh52

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit aaea192 into kubernetes-sigs:master Jun 18, 2024
16 checks passed
@carreter carreter deleted the backoff-reset-fix branch June 20, 2024 18:53
cheftako pushed a commit to cheftako/apiserver-network-proxy that referenced this pull request Nov 20, 2024
… count < server count (kubernetes-sigs#632)

* Always reset backoff and enter fast sync when clien count < server count

* Only call ClientsCount() once

* Review comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants