What would you like to be added?
This is the tracking issue for #2590 (comment).
We'll revisit the torchtune ClusterTrainingRuntimes when kubernetes-sigs/jobset#874 is finished.
/remove-label lifecycle/needs-triage
/area llm
/area deployment
/cc @kubeflow/wg-training-leads @astefanutti @franciscojavierarceo @kannon92 @vsoch @ahg-g
Why is this needed?
In #2590, the training task needs to wait for both the model initializer and the dataset initializer. However, JobSet currently does not support specifying multiple dependency items for a job, which blocks our implementation.
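To make the requirement concrete, here is a minimal sketch of the dependency shape we are waiting for, written as a plain Python dict rather than a real manifest. The job names (`node`, `dataset-initializer`, `model-initializer`), the `Complete` status value, and the `can_start` helper are illustrative assumptions, not the actual runtime definition or JobSet's implementation.

```python
# Hypothetical trainer job that depends on TWO initializer jobs.
# Field and job names are assumptions for illustration only.
trainer_job = {
    "name": "node",
    "dependsOn": [
        {"name": "dataset-initializer", "status": "Complete"},
        {"name": "model-initializer", "status": "Complete"},
    ],
}


def can_start(job, observed_status):
    """Return True only when every dependency has reached its required status."""
    return all(
        observed_status.get(dep["name"]) == dep["status"]
        for dep in job["dependsOn"]
    )
```

With this shape, the trainer starts only after both initializers have finished; if either is missing or still running, `can_start` returns `False`.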
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.