Thank you for creating this issue!
Currently, the training-operator does not support the requested behavior. The PyTorchJob does not allow us to specify duplicate roles in a single job.
/remove-kind bug
/kind support
OK, I understand. Thank you for your reply. If I still want to achieve this behavior, is there any way to do it? I am open to using another job type (TFJob or anything else) or to any other proposal. Thanks again, and have a nice day.
What happened?
Hello everyone! As mentioned above, I would like the workers of a PyTorchJob to be able to use different images or configurations.
Normally, my yaml is similar to this:
apiVersion: "kubeflow.org/v1"
kind: "PyTorchJob"
metadata:
  name: "resnet-1"
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      ...
    Worker:
      replicas: 3
      ...
However, if I want to use different configurations (such as images) for different workers, how should I do it? I tried to configure multiple workers, like this:
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      ...
    Worker:
      replicas: 1
      ...
    Worker:
      replicas: 1
      ...
But unfortunately, this doesn't work.
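One likely reason, sketched here under assumptions: the second spec above repeats the `Worker` key inside a single mapping, and a mapping can hold only one value per key. The Python snippet below illustrates this with a hypothetical JSON rendering of that spec. The `image` field and the `image-a`/`image-b` values are placeholders standing in for the elided `...` sections, not real PyTorchJob fields at that level, and how `kubectl` or the API server actually treats the duplicate key (rejecting it or dropping one entry) is not shown here.

```python
import json

# Hypothetical JSON rendering of the spec with two "Worker" entries.
# "image" is a flattened placeholder for the elided pod template ("...").
SPEC = """
{
  "pytorchReplicaSpecs": {
    "Master": {"replicas": 1},
    "Worker": {"replicas": 1, "image": "image-a"},
    "Worker": {"replicas": 1, "image": "image-b"}
  }
}
"""


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, failing on a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Lenient parsing: Python's json module keeps only the last value per key,
# so the first "Worker" entry silently disappears.
lenient = json.loads(SPEC)
workers = lenient["pytorchReplicaSpecs"]["Worker"]

# Strict parsing: the hook above turns the repeated key into an error.
try:
    json.loads(SPEC, object_pairs_hook=reject_duplicates)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(sorted(lenient["pytorchReplicaSpecs"]))  # role names that survived
print(workers)                                 # the one remaining worker spec
print(duplicate_error)                         # message from the strict parse
```

Either way, only one `Worker` entry can exist per job in this representation, which matches the observation that the two-worker spec does not behave as intended.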
What did you expect to happen?
How can I configure a PyTorchJob so that its workers use different images or configurations?
Environment
Kubernetes version:
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.2
Training Operator version:
kubeflow/training-operator:v1-9e52eb7#
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍