Training Operator ROADMAP 2024 #2259

Open
andreyvelich opened this issue Sep 10, 2024 · 7 comments

Comments

@andreyvelich
Member

We should update the Training Operator ROADMAP with 2024 work items.

Let's discuss it during the upcoming Training WG calls. Some initial ideas:

  • Training Operator V2
  • Enhance JobSet APIs for distributed training and fine-tuning
  • Kubeflow Training SDK improvements
  • Support for distributed JAX
  • Support for LLM Training runtimes
  • Python APIs for LLM fine-tuning
  • Consolidate MPI Operator V2 into Training Operator

cc @kubeflow/wg-training-leads @franciscojavierarceo @alculquicondor @kannon92 @mimowo @ahg-g @kuizhiqing @Syulin7 @shravan-achar @akshaychitneni @StefanoFioravanzo @vsoch @helenxie-bit @Electronic-Waste

@franciscojavierarceo
Contributor

franciscojavierarceo commented Sep 10, 2024

This is awesome @andreyvelich!! Can't wait! 🚀

@StefanoFioravanzo
Member

@andreyvelich this is an awesome list!

Would it be possible to draft a user journey map and a value proposition for each of these initiatives? I'm thinking of an umbrella issue for each project that presents it to users, similar to what we wrote for the LLM APIs here: https://www.kubeflow.org/docs/components/training/explanation/fine-tuning/

Doing this before design and implementation helps us ground the value prop and provides a guideline for the expected result.

@vsoch

vsoch commented Sep 11, 2024

This is fantastic work @andreyvelich! I'll be here along the way to provide the HPC perspective, if needed.

@franciscojavierarceo
Contributor

Linking this issue for reference: #2231

@andreyvelich andreyvelich pinned this issue Sep 23, 2024
@tenzen-y
Member

tenzen-y commented Oct 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member

/remove-lifecycle stale

5 participants