Update KFTO multi-node test names according to recent updates in orig…#2164
What about the other 2 test scenarios?
@ChughShilpa Actually, the remaining MultiNode/MultiGPU tests require 2 cluster-nodes with a minimum of 2 GPUs each (a GPU instance such as g4dn.12xlarge, which has 4 NVIDIA T4 GPUs), and I'm not sure whether those will be available during QG tests.
|
We can add the tests to ODS CI; we just can't run them as part of QG, only as part of our own jobs.
g4dn.12xlarge instance is used in qe-jenkins, and we also have |
| Run Training Operator KFTO Test TestPyTorchJobMnistMultiNodeWithROCm ${ROCM_TRAINING_IMAGE} | ||
|
|
||
| Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 GPUs each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
| Run Training Operator KFTO Test TestPyTorchJobMnistMultiNodeMultiGpuWithCuda ${CUDA_TRAINING_IMAGE} | ||
|
|
||
| Run Training operator KFTO_MNIST multi-node multi-gpu test with AMD ROCm image | ||
| [Documentation] Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 2 GPUs each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
Force-pushed from d8d75d4 to ffd8213
| Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image | ||
| Run Training operator KFTO_MNIST multi-node single-CPU test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node single-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with at least 1 CPUs each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
| Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image | ||
| Run Training operator KFTO_MNIST multi-node multi-CPU test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node multi-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 CPUs each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
| Run Training Operator KFTO Test TestPyTorchJobMnistMultiNodeMultiCpu ${CUDA_TRAINING_IMAGE} | ||
|
|
||
| Run Training operator KFTO_MNIST multi-node single-GPU test with NVIDIA CUDA image | ||
| [Documentation] Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 1 GPU each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
| Run Training operator KFTO_MNIST multi-node test with AMD ROCm image | ||
| [Documentation] Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image | ||
| Run Training operator KFTO_MNIST multi-node single-GPU test with AMD ROCm image | ||
| [Documentation] Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 1 GPU each |
Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }})
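The Robocop "Line is too long" warnings above could likely be addressed by splitting each `[Documentation]` value across rows with Robot Framework's `...` continuation marker. Below is a minimal sketch for one of the renamed tests, assuming a 120-character limit (believed to be Robocop's default, not confirmed by this PR). The image value and the `Run Training Operator KFTO Test` keyword body are hypothetical stand-ins so the file runs on its own; the real keyword lives elsewhere in the test suite.

```robotframework
*** Settings ***
Documentation    Sketch only: shows a long [Documentation] value split over several rows.


*** Variables ***
# Hypothetical placeholder; the real image reference is defined by the suite.
${CUDA_TRAINING_IMAGE}    example.invalid/training-cuda:latest


*** Test Cases ***
Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator
    ...    using PyTorch job with NVIDIA CUDA image.
    ...    It requires 2 cluster-nodes with 2 GPUs each.
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}


*** Keywords ***
Run Training Operator KFTO Test
    [Documentation]    Hypothetical stand-in for the suite's real keyword; it only logs its arguments.
    [Arguments]    ${test_name}    ${image}
    Log    Would run ${test_name} with image ${image}
```

Every row in this sketch stays under 120 characters, so the same pattern should apply to the other `[Documentation]` lines flagged in this PR.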
Force-pushed from ffd8213 to 2a5986d
Force-pushed from 2a5986d to c098ea0
|
|
/approve |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: abhijeet-dhumal, ChughShilpa, jiripetrlik, sutaakar
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing



Update KFTO multi-node test names according to recent updates in original test names
Related to: opendatahub-io/distributed-workloads#299