federated-learning-job run error: http://yolo-v5-aggregation.default:7363 connection failed #459
Comments
This is a DNS failure on a k8s + KubeEdge + EdgeMesh + Sedna cluster. Cluster details:
Is this project dead? Why are there no replies to any of these issues?
Congratulations on another successful bug fix. Deploying a complex system such as OpenStack, K8S, KubeEdge, or Sedna is usually done for real-world cloud services and handled by professional experts in large enterprises; it is not an easy task for newcomers. We understand that you may be facing urgent issues, but when participating in the KubeEdge community, please be understanding and respectful toward others and follow the code of conduct. Experts usually have important duties at their companies, and it is unrealistic to expect 24-hour on-call replies from them (only about two hours had passed in this case). Those who deploy successfully are encouraged to submit blogs or documentation to help other members use Sedna within this community; such contributions are highly appreciated.
BTW, what do @tangming1996 and @SherlockShemol think: could this issue be related to the recent merge of #446?
In my opinion, it's not related to the recent PR. When I first deployed Sedna applications such as joint inference and federated learning, I ran into the same DNS problem, which was caused by EdgeMesh. I solved it by following a Q&A manual on Zhihu. Hope it helps.
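To narrow down whether a failure like `http://yolo-v5-aggregation.default:7363 connection failed` is a DNS problem or a plain connectivity problem, a small check run from inside the failing edge pod can help. This is a minimal sketch, assuming Python is available in the worker container (the helper name `check_endpoint` is hypothetical, not part of Sedna or EdgeMesh); it only uses the standard `socket` module to resolve the service name and then open a TCP connection to the port.

```python
import socket
from typing import Any, Dict


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> Dict[str, Any]:
    """Hypothetical diagnostic: resolve `host`, then try a TCP connection to `port`.

    Returns the resolved addresses, whether DNS and TCP succeeded,
    and the error message of the first step that failed (if any).
    """
    result: Dict[str, Any] = {"host": host, "port": port, "addresses": [],
                              "dns_ok": False, "tcp_ok": False, "error": None}
    # Step 1: DNS lookup. If this fails for a Service name such as
    # "yolo-v5-aggregation.default", the node's DNS path (e.g. EdgeMesh)
    # is the first suspect, not the aggregation server itself.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    result["dns_ok"] = True
    result["addresses"] = sorted({info[4][0] for info in infos})

    # Step 2: TCP connect. DNS works but this fails -> routing/proxy/port issue.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:  # covers refused connections and timeouts
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

For example, `check_endpoint("yolo-v5-aggregation.default", 7363)` (service name and port taken from this issue's error message) returning `dns_ok=False` on one edge node but not the other would point at that node's DNS setup rather than at the federated learning job.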
@MooreZheng @SherlockShemol @tangming1996 Thank you for everything you have done for this project; AI running on edge devices has a promising future, and I hope this project keeps evolving. However, I find it hard to use Sedna in my own project because there is little documentation for application developers. A tutorial on application development would be very helpful.
Can you provide more details about your project? Does it fit within the frameworks Sedna currently provides (joint inference, federated learning, etc.)? I am still a beginner with Sedna, so I may not be able to answer immediately. However, I will pay attention to the parts you mentioned as I learn, and maybe someday I will improve them.
It shouldn't matter, as our new feature hasn't been released yet.
@SherlockShemol We're working on projects that deploy AI models on edge devices (RK3568), but the network is unstable, and we don't want to share some of our data with the cloud. So we turned to KubeEdge and Sedna. But these frameworks seem rather difficult to use, as the demos are just toy examples. That's why we'd like help from all of you.
It seems that the network of one edge node has a problem. First confirm that the edgemesh-agent status is normal, then confirm that all the test cases pass. If the problem persists, compare the configurations of the two edge nodes for consistency, since the network of the other edge node is working.
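Comparing the two edge nodes' configurations by eye is error-prone, so a recursive diff can list exactly where they disagree. This is a minimal sketch, assuming each node's configuration has already been parsed into a Python dict (for example with a YAML loader); the helper `diff_configs` and the keys in the example are hypothetical placeholders, not actual KubeEdge or EdgeMesh settings.

```python
from typing import Any, Dict, List


def diff_configs(a: Dict[str, Any], b: Dict[str, Any], path: str = "") -> List[str]:
    """Hypothetical helper: list differences between two nested config dicts.

    `a` is the config from the working node, `b` from the failing node.
    Nested dicts are compared recursively; other values are compared with ==.
    """
    diffs: List[str] = []
    for key in sorted(set(a) | set(b)):
        where = f"{path}.{key}" if path else str(key)
        if key not in a:
            diffs.append(f"{where}: missing on node A")
        elif key not in b:
            diffs.append(f"{where}: missing on node B")
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            diffs.extend(diff_configs(a[key], b[key], where))
        elif a[key] != b[key]:
            diffs.append(f"{where}: {a[key]!r} != {b[key]!r}")
    return diffs


if __name__ == "__main__":
    # Placeholder configs for illustration only.
    node_a = {"dns": {"server": "a"}, "proxy": True}
    node_b = {"dns": {"server": "b"}}
    for line in diff_configs(node_a, node_b):
        print(line)
```

With the placeholder configs above, the script reports that `dns.server` differs and that `proxy` is missing on node B; an empty list means the two dicts match.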
@tangming1996 Yes, only one edge node runs to completion and the other edge node ends in an error. But there is no aggregated model output on the cloud node. I don't know what's wrong with this demo, as the logs seem OK.
@victorming666 Aggregation on the cloud is triggered only after the models on all edge nodes have been trained successfully. Because one node in your environment has a problem and cannot upload its model to the cloud, the cloud cannot complete the aggregation.
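The waiting behavior described above can be illustrated with a toy aggregation round: the server only produces a new model once every expected edge worker has reported. This is a simplified sketch, not Sedna's actual aggregation code; the function `try_aggregate`, the worker names, and the choice of FedAvg-style weighting by sample count are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def try_aggregate(expected: List[str],
                  updates: Dict[str, List[float]],
                  sample_counts: Dict[str, int]) -> Optional[List[float]]:
    """Toy aggregation round (hypothetical, FedAvg-style weighting assumed).

    Returns the sample-weighted average of the uploaded weights only when
    every expected worker has uploaded; otherwise returns None (still waiting).
    """
    missing = [w for w in expected if w not in updates]
    if missing:
        # A single worker that never uploads blocks the whole round.
        return None
    total = sum(sample_counts[w] for w in expected)
    dim = len(updates[expected[0]])
    return [sum(updates[w][i] * sample_counts[w] for w in expected) / total
            for i in range(dim)]
```

With `expected = ["edge1", "edge2"]`, an upload from `edge1` alone yields `None`, which matches the symptom in this thread: one edge node completes, the other fails, and no aggregated model appears on the cloud node.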
@victorming666 This is a public lecture by Mr. jaypume that gives an overview of Sedna. If you want to build applications with Sedna, you can go directly to the Sedna Lib Source Code Analysis section. Hope it helps.
@SherlockShemol Thank you very much! We are digging into the Sedna code to figure out how to integrate it into an AIoT project. This helps a lot.
I rebuilt the Docker images for the federated learning job. The pods run OK on both the cloud node and the edge nodes:
The pod on the cloud node:
But the pod on the edge node gives errors:
Can anybody help? Many thanks!