help request: Apisix ETCD going into Crash loop back off #11338
Comments
Do you have a strong requirement to use |
I think you need to try |
The most recent version is 2.8.0 (released on Jun 04, 2024). So I was using the version prior to that, which was released in April. May I know which version of the Helm chart you're using?
If it were the cluster's etcd, we would log into the node and execute the commands. Since here it is running as a pod, I'm not sure where to execute the etcdctl commands, and as the pods are in CrashLoopBackOff, I can't even exec into them.
There is one running etcd pod; run the commands there. BTW, I think it's more likely an etcd problem.
Hello, from the running pod (`I have no name!@apisix-etcd-2:/opt/bitnami/etcd$`) I ran `etcdctl member list`. This is the member id I found in the logs of the crashing pods: `local-member-id: "2c16fb63879f0d98"`. I also tried disabling the apisix etcd and using an external etcd, but it was not able to integrate with the running etcd pod. I'm trying to fix that. Please share anything you know that could help.
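Given the "rejected stream from remote peer because it was removed" log from the healthy pod, one possible recovery path is to drop the stale membership entry and let the crashed pods rejoin with fresh state. This is only a sketch: the member id comes from this thread, and the pod, namespace, and PVC names are assumptions that may differ in your cluster.

```shell
# From the one healthy etcd pod (names are examples from this thread):
kubectl exec -it apisix-etcd-2 -n apisix -- etcdctl member list

# If the crashing pod's member id (here 2c16fb63879f0d98) still appears,
# remove the stale registration so the pod can rejoin as a new member:
kubectl exec -it apisix-etcd-2 -n apisix -- etcdctl member remove 2c16fb63879f0d98

# Then delete the crashed pod's PVC and the pod so it starts with empty state:
kubectl delete pvc data-apisix-etcd-1 -n apisix
kubectl delete pod apisix-etcd-1 -n apisix
```

If your etcd has TLS or authentication enabled, the etcdctl calls will additionally need the usual `--endpoints`, `--cacert`, `--cert`, and `--key` flags.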
Any solutions for this issue? I am also facing the same issue for the last 3 days.
I have changed the etcd version in Chart.yaml to "10.1.0", and now all pods are in a running state. I'm checking a few things in the UI to make sure everything is working fine. If you are using the Helm chart for deploying apisix, try this.
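For reference, pinning the etcd sub-chart version in the apisix Helm chart looks roughly like this. This is a sketch of the Chart.yaml dependencies section; the repository URL and surrounding fields are assumptions and may differ in your copy of the chart.

```yaml
# Chart.yaml (apisix chart) -- pin the etcd dependency, version per this thread
dependencies:
  - name: etcd
    version: "10.1.0"    # downgraded from the bundled version
    repository: https://charts.bitnami.com/bitnami
    condition: etcd.enabled
```

After editing the pin, rebuild the chart's lock and local charts/ directory with `helm dependency update` before installing.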
Thanks, will try and update, ma'am.
Apisix is working fine after upgrading the etcd version in Chart.yaml to "10.1.0", so I'm closing this issue.
Hi @Lakshmi2k1, are you still facing the same issue? I need some suggestions on it. We have upgraded to 10.2.6 but are still facing the same issue.
Still having the same issue. We downloaded and added the entire chart dir, setting the etcd version in Chart.yaml to "10.1.0" as suggested by @Lakshmi2k1. Are there any plans to have this fixed?
Hi @BadTorro, try enabling the disaster recovery cron job.
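For context, "disaster recovery" here refers to the Bitnami etcd sub-chart's periodic snapshot CronJob. Enabling it via Helm values looks roughly like this; the key names follow the Bitnami etcd chart, but the schedule and the ReadWriteMany-capable storage class (e.g. an NFS-backed one) are assumptions.

```yaml
etcd:
  disasterRecovery:
    enabled: true
    cronjob:
      schedule: "*/30 * * * *"       # take a snapshot every 30 minutes
      historyLimit: 1
    pvc:
      size: 2Gi
      storageClassName: nfs-client   # must support ReadWriteMany
```

The snapshot PVC is shared by all etcd replicas, which is why it needs a ReadWriteMany storage class.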
How do you mean? Do you have some more specifics on that?
We're currently using it in a local development environment; etcd boots with 3 nodes, but 2 always keep failing. From time to time I need to shut down the entire environment and restart it to get it working again. We are using it within https://tilt.dev/.
Thanks.
Hi @BadTorro, I have found two solutions for it so far.
@sudhir649 thanks for the tip, I need to verify it. It seems like I need an NFS storage provider to get the snapshot image to work.
@BadTorro yes, tilt runs on a local machine, so you need to deploy an NFS storage class. By the way, Lakshmi's solution won't work for me.
@sudhir649, we are facing one more error in apisix. We use the openid-connect plugin for authentication and authorization in the ApisixPluginConfig. When we try to hit the ingress of the application, it gives a 431 (Request Header Fields Too Large) error. We tried removing a few headers, but that was breaking the UI of the application. Is there a way to solve this? Have you come across a similar issue before?
Did you use nginx? If you use nginx, add this to your nginx config: just increase the client header size and check, it will work.
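A hedged sketch of the kind of nginx tuning meant above: a 431 means the request headers exceed the configured buffers, so the usual knobs are the header buffer directives. The sizes below are examples, not recommendations.

```nginx
# http {} or server {} context
large_client_header_buffers 4 32k;  # number and size of buffers for large headers
client_header_buffer_size 8k;       # initial buffer for the request line + headers
```

With APISIX these directives would have to be injected through its own configuration (e.g. the `nginx_config` section of config.yaml); with ingress-nginx the equivalents are the ConfigMap options `large-client-header-buffers` and `client-header-buffer-size`.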
Regarding that, I managed to get it working by basically:
It currently keeps running and has not crashed since.
I enabled disaster recovery and deployed the Helm chart, but this time not just etcd was crashing: the apisix pod was stuck in its init container, the apisix ingress controller was crashing, and the snapshot pod was also in an error state. So I rolled back to the previous revision after observing that the pod status didn't change for a long time.
Hi @BadTorro, how was the experience after deploying the disaster recovery? For us it's working fine, so we replicated it in all the envs. Regards,
@Lakshmi2k1 The problem you encountered has nothing to do with disaster recovery. I have not experienced this problem |
We had a very similar issue.
We created apisix and etcd through the Helm chart, and for us the issue was that, even though we re-created the StatefulSet and deleted the PVCs for a fresh start, the | Changed it to |
The problem is the same as here: bitnami/charts#16069
JFYI : |
neither in GKE |
Apologies for the late reply @sudhir649, but we ended up using the bitnami chart and customized it to our needs.
Any news about this problem? |
Any solutions for this issue? I am also facing the same issue.
I had this issue running on GKE. Uninstalling and deleting the etcd PVCs, then reinstalling fixed the issue. |
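The uninstall/reinstall described above can be sketched as follows; the release name, namespace, PVC label, and values file are all assumptions to adapt to your setup.

```shell
helm uninstall apisix -n apisix
# etcd PVCs survive a helm uninstall; delete them so etcd bootstraps fresh
kubectl delete pvc -n apisix -l app.kubernetes.io/name=etcd
helm install apisix apisix/apisix -n apisix -f values.yaml
```

Note that deleting the PVCs wipes all configuration stored in etcd (routes, upstreams, etc.), which the next comments discuss.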
Hi @Joeydelarago, when you delete the etcd PVCs, what data is gone? Are all the routes gone, or do they still exist?
I actually encountered this when setting up a new environment, so losing data was not a concern for me. Also, I have done my configuration via YAML files instead of the API. If you need the data, you can always mount the PVCs to a dummy deployment and copy the files to local storage with kubectl cp. Then delete the PVC and apply apisix, and use kubectl cp to copy the important files to the newly created PVC. I can't guarantee it will work, though.
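The copy-out/copy-back idea above, sketched with a throwaway pod. All names (pod, PVC, namespace defaults) are hypothetical, and as the author says, there is no guarantee a fresh cluster will accept the restored files.

```shell
# Mount the old PVC in a dummy busybox pod and copy its contents out
kubectl run pvc-reader --image=busybox --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"pvc-reader","image":"busybox","command":["sleep","3600"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}],"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"data-apisix-etcd-0"}}]}}'
kubectl cp pvc-reader:/data ./etcd-backup

# After reinstalling apisix/etcd, mount the new PVC the same way
# and copy the saved files back:
kubectl cp ./etcd-backup pvc-reader:/data
```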
This issue started occurring again for me, so I ended up installing etcd separately. I only updated the etcd values.yaml to increase the replica count from 1 to 3.
Then I updated the apisix values.yaml and did a helm upgrade on apisix.
My solution for the issue is still the same as I mentioned above: helm uninstall, delete the etcd PVCs, helm reinstall. However, with etcd separated, this can be done without taking down apisix. Edit: the experimental composite architecture simulates etcd instead. Perhaps by the time you are reading this it is stable: https://apisix.apache.org/docs/ingress-controller/composite/
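Installing etcd as its own release and pointing apisix at it can be sketched like this. The release names, namespace, and service DNS name are assumptions; the `etcd.enabled`/`etcd.host` keys follow the apisix Helm chart's values layout.

```shell
# Standalone etcd with 3 replicas (Bitnami chart)
helm install etcd oci://registry-1.docker.io/bitnamicharts/etcd -n apisix \
  --set replicaCount=3 \
  --set auth.rbac.create=false

# Point apisix at the external etcd and disable the bundled one
helm upgrade apisix apisix/apisix -n apisix \
  --set etcd.enabled=false \
  --set 'etcd.host[0]=http://etcd.apisix.svc.cluster.local:2379'
```

With auth disabled this is only suitable for development; for production you would keep RBAC on and supply credentials in the apisix values.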
Description
Hello,
I have deployed the apisix 2.7.0 Helm chart, and out of three etcd pods, two are going into CrashLoopBackOff, which affects the ingresses created for other deployments.
The logs show the following details,
Master (etcd pod in running state)
"msg":"rejected stream from remote peer because it was removed","local-member-id"
Other pods (etcd pods in crash loop back off state)
"failed to publish local member to cluster through raft","local-member-id":"2c16fb63879f0d98","local-member-attributes":"{Name:apisix-etcd-1 ClientURLs:[http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2379/ http://apisix-etcd.apisix.svc.cluster.local:2379]}","request-path":"/0/members/2c16fb63879f0d98/attributes","publish-timeout":"7s","error":"etcdserver: request cancelled"
I'm currently stuck on this; let me know if anyone has faced this and has a fix for it.
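For anyone triaging the same symptom: a pod in CrashLoopBackOff cannot be exec'd into, but the logs of the previous container run and the pod events are still available. Pod names and namespace below are examples from this thread.

```shell
kubectl get pods -n apisix -l app.kubernetes.io/name=etcd   # check restart counts
kubectl logs apisix-etcd-1 -n apisix --previous             # logs from the crashed run
kubectl describe pod apisix-etcd-1 -n apisix                # events and exit codes
```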
Environment
apisix version: 2.7.0