rate limit for re-creating MySQL Pods #698

Open

masa213f opened this issue Jun 17, 2024 · 4 comments

@masa213f
Contributor

What

Updating MOCO on a Kubernetes cluster that hosts many MySQLClusters can cause MySQL connections to drop for several minutes.
In a past failure, a MOCO update caused MOCO to re-create many MySQL Pods (hundreds of Pods at the time) almost simultaneously.
Cilium could not keep up with the Pod update events and was slow to switch the Services' backends,
so the MySQL instances were unreachable for several minutes.
(This failure may depend on the configuration of the Kubernetes cluster, such as the CNI.)

To prevent such failures, I want to limit the speed at which MySQL Pods are re-created.

How

Limit the speed at which the MySQL StatefulSet's partition is reconciled (building on #628 and #633).
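For illustration only, here is a minimal sketch of what such a partition rate limit could look like inside a reconciler. The annotation name, the 30-second interval, and the `throttlePartition` helper are all assumptions made for this sketch, not MOCO's actual implementation; the real partition handling lives in #628 and #633.

```go
package controllers

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// lastPartitionUpdateAnnotation is a hypothetical annotation used in this
// sketch to remember when the StatefulSet partition was last decremented.
const lastPartitionUpdateAnnotation = "example.com/last-partition-update"

// minPartitionInterval is the minimum interval between re-creating two
// MySQL Pods of the same cluster (an assumed value).
const minPartitionInterval = 30 * time.Second

// throttlePartition reports whether the partition may be decremented now.
// If the previous decrement was too recent, it returns a Result that
// requeues the cluster after the remaining wait time.
func throttlePartition(sts *appsv1.StatefulSet, now time.Time) (ctrl.Result, bool) {
	last, err := time.Parse(time.RFC3339, sts.Annotations[lastPartitionUpdateAnnotation])
	if err != nil {
		// No valid timestamp yet: allow the first decrement immediately.
		return ctrl.Result{}, true
	}
	if elapsed := now.Sub(last); elapsed < minPartitionInterval {
		return ctrl.Result{RequeueAfter: minPartitionInterval - elapsed}, false
	}
	return ctrl.Result{}, true
}
```

In this sketch, the reconciler would call the helper before decrementing spec.updateStrategy.rollingUpdate.partition and write the current time back into the annotation after each successful step, so each cluster re-creates at most one Pod per interval.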

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions
masa213f changed the title from "rate limit of re-creating MySQL Pods" to "rate limit for re-creating MySQL Pods" on Jun 17, 2024
@ymmt2005
Member

@masa213f
TBH, I don't want to add anything Cilium-specific to Moco.
Since it's a Cilium problem, other middleware besides Moco can face similar problems.

@masa213f
Contributor Author

@ymmt2005
Thank you for the comment.

I think this failure is caused by MOCO re-creating many Pods at once, so I want to add some mitigation to MOCO.
It does not have to be a rate limit on the partition. Do you have any other ideas?

Indeed, reading only the case described here, it looks like a Cilium problem.
However, in my view, several components can lead to this kind of failure, and this time it just happened to be Cilium.
Even after tuning Cilium, kube-controller-manager or other CNIs (depending on the cluster settings and the number of MySQLClusters) may cause similar problems.

In my experience, creating and deleting Pods in Kubernetes is a time-consuming process, and we should not create or delete many Pods in a short period. So I want to stagger the re-creation timing of MySQL Pods when MOCO is updated.
There is also a risk of #517 recurring.
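As an illustration only (not MOCO's actual implementation), one way to stagger the timing would be to derive a deterministic per-cluster delay from the cluster's namespaced name before starting the rollout. The `rolloutJitter` helper and the jitter bound below are assumptions for this sketch:

```go
package controllers

import (
	"hash/fnv"
	"time"
)

// rolloutJitter derives a deterministic delay in [0, maxJitter) from the
// cluster's namespaced name, so that Pod re-creation for different
// MySQLClusters does not start at the same instant after a MOCO update.
// Both the function and the maxJitter bound are hypothetical.
func rolloutJitter(namespace, name string, maxJitter time.Duration) time.Duration {
	if maxJitter <= 0 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(namespace + "/" + name))
	return time.Duration(h.Sum32()) % maxJitter
}
```

The reconciler could requeue a cluster with this delay (for example, `ctrl.Result{RequeueAfter: rolloutJitter(ns, name, 10*time.Minute)}`) before touching its StatefulSet, so that hundreds of clusters do not re-create their Pods simultaneously.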

@ymmt2005
Member

ymmt2005 commented Jun 17, 2024

@masa213f
Thank you for your opinion.

Do you have any examples of this type of rate limit in other software?
Having a lot of MySQLCluster resources is NOT Moco's problem; it's a Moco user's problem.

The same can happen, for example, with ECK if a user has a lot of Elasticsearch clusters.

@masa213f
Contributor Author

> Do you have any examples of this type of rate limit in other software?

I'll check it out.

masa213f assigned masa213f and unassigned d-kuro on Oct 2, 2024