modify FederatedLearningJob CRD #456

Open
SherlockShemol opened this issue Nov 9, 2024 · 3 comments
Labels
kind/question Indicates an issue that is a support question.

Comments

@SherlockShemol
Contributor

SherlockShemol commented Nov 9, 2024

Mr. Tang Ming suggests that, unlike inference tasks, federated learning should be treated as a job that is not modified once it is running. Modifying it essentially means restarting the job, so after making any changes users should delete the existing job and create a new FederatedLearningJob. Therefore, users should be prohibited outright from modifying a FederatedLearningJob (the CRD instance) once it has been created.

Could you please provide your feedback?
@tangming1996 @hsj576 @MooreZheng

@SherlockShemol SherlockShemol added the kind/question Indicates an issue that is a support question. label Nov 9, 2024
@SherlockShemol
Contributor Author

SherlockShemol commented Nov 21, 2024

Two possible solutions:

  • Make the FederatedLearningJob CRD modifiable: once the config is modified, delete the affected worker and recreate it with the new config.
  • Make the FederatedLearningJob CRD unmodifiable, for the reason given above.

@MooreZheng
Contributor

MooreZheng commented Dec 27, 2024

It seems the issue was discussed by community members at the routine meeting. It would be highly appreciated if @SherlockShemol could share which solution is used in #446 when you have some time :D

@SherlockShemol
Contributor Author

I will improve the solution after my final exams and present a complete, fully tested solution to the community :)
