Pull requests: kubeflow/training-operator
Pull requests list
Add Changelog for Training Operator v1.9.0
approved
do-not-merge/hold
lgtm
size/L
#2397
opened Jan 21, 2025 by
andreyvelich
[WIP] KEP-2170: Implement MPI Plugin for Kubeflow Trainer
do-not-merge/work-in-progress
size/XL
#2394
opened Jan 21, 2025 by
andreyvelich
4 tasks
[WIP] Remove the Training Operator V1 Source Code
do-not-merge/work-in-progress
size/XXL
#2389
opened Jan 16, 2025 by
andreyvelich
1 of 2 tasks
KEP-2170: Deploy JobSet in kubeflow-system namespace
do-not-merge/hold
size/M
#2388
opened Jan 14, 2025 by
andreyvelich
KEP-2170: Add PyTorch DDP MNIST training example
size/XL
#2387
opened Jan 14, 2025 by
astefanutti
1 task done
Use dictionary unpacking to pass trainer function arguments
size/XS
#2384
opened Jan 9, 2025 by
astefanutti
1 task done
KEP-2170: Add the manifests overlay for Kubeflow Training V2
lgtm
ok-to-test
size/L
#2382
opened Jan 9, 2025 by
Doris-xm
1 task
Fix read permission denied on train script when run as non-root
size/XS
#2373
opened Jan 7, 2025 by
astefanutti
1 task done
Use env variable for the pytorch init-container image in case of usin…
size/S
#2366
opened Jan 6, 2025 by
abhijeet-dhumal
1 task
Update workflow and docs for releasing Training Operator
size/L
#2362
opened Dec 23, 2024 by
LogicalGuy77
1 task done
chore: added dependabot configuration
do-not-merge/hold
size/M
#2360
opened Dec 23, 2024 by
Veer0x1
1 task
Improved the release process of training operator
size/L
#2359
opened Dec 22, 2024 by
Veer0x1
1 task done
fix restart policy bug in mpi job UpdateJobConditions
size/XS
#2344
opened Dec 5, 2024 by
fyxemmmm
commonize job name validation
do-not-merge/hold
size/L
#2315
opened Oct 29, 2024 by
akagami-harsh
1 task
KEP: 2170: Adding cel validations on TrainingRuntime/ClusterTrainingRuntime CRDs
size/L
#2313
opened Oct 28, 2024 by
akshaychitneni
1 task
Update Dockerfile with python debian image in cmd/initializer_v2/dataset/Dockerfile
approved
do-not-merge/hold
lgtm
size/XS
#2312
opened Oct 28, 2024 by
mani1soni
WIP: Use SSA in TrainJob Controller
do-not-merge/work-in-progress
size/XXL
#2309
opened Oct 26, 2024 by
varshaprasad96
•
Draft
1 task
KEP-2170: Adding validation webhook for v2 trainjob
size/XL
#2307
opened Oct 24, 2024 by
akshaychitneni
1 task
Add helm charts for training operator
do-not-merge/hold
size/XXL
#2263
opened Sep 20, 2024 by
ChenYi015
1 task
Migrate to controller-runtime logger in mpi job controller
size/M
#2177
opened Jul 18, 2024 by
champon1020
1 task