Issues: kubeflow/trainer
Issues list
KEP-2401: Revisit PVC claim in torchtune CTRs when stateful jobset is ready
area/llm
area/runtimes
kind/feature
#2630
opened May 1, 2025 by
Electronic-Waste
Support KAI Scheduler in Kubeflow Trainer
area/controller
kind/feature
#2628
opened Apr 30, 2025 by
andreyvelich
v1: Gang Scheduling for Training Operator V1 with KAI Scheduler
area/controller
kind/feature
#2627
opened Apr 30, 2025 by
andreyvelich
Retrieve arguments from Trainer instance
area/sdk
kind/feature
#2624
opened Apr 30, 2025 by
Electronic-Waste
Implement TrainingRuntimes finalizer mechanism
kind/feature
#2609
opened Apr 21, 2025 by
tenzen-y
2 tasks
Flaky Test: Should fail in creating trainJob with invalid trainer config for torch runtime
kind/bug
#2605
opened Apr 18, 2025 by
tenzen-y
Implement validations to prevent changing TrainingRuntime
area/webhook
good first issue
help wanted
kind/feature
#2599
opened Apr 16, 2025 by
tenzen-y
Support XGBoost/LightGBM runtime and examples
area/runtimes
kind/feature
#2598
opened Apr 14, 2025 by
nqvuong1998
KEP-2401: Revisit DependsOn API in CTRs When Supporting Multiple Ancestor
area/deployment
area/llm
kind/feature
#2592
opened Apr 10, 2025 by
Electronic-Waste
KEP-2401: Create LLM Training Runtimes for Llama 3.3 model family
area/llm
kind/feature
#2591
opened Apr 10, 2025 by
Electronic-Waste
Add Helm integration tests to GitHub actions workflow
area/testing
kind/feature
#2577
opened Mar 29, 2025 by
ChenYi015
Contributors Guide to Trainer v2 Docs
area/docs
good first issue
help wanted
#2574
opened Mar 29, 2025 by
SanthoshToorpu
Automated way to generate Kustomize manifests from Helm templates
area/deployment
kind/feature
#2572
opened Mar 28, 2025 by
ChenYi015
Unable to Access Monitoring Port (Prometheus Metrics) on Kubeflow Trainer Controller Manager
area/monitoring
kind/bug
#2547
opened Mar 19, 2025 by
izuku-sds
User guide for PyTorch Training
area/docs
good first issue
help wanted
#2543
opened Mar 18, 2025 by
andreyvelich
Operator guide to manage TrainingRuntime and ClusterTrainingRuntime
area/docs
good first issue
help wanted
#2542
opened Mar 18, 2025 by
andreyvelich
KEP-2170: Add manifest overlays for standalone installation
kind/feature
#2526
opened Mar 16, 2025 by
Doris-xm
Support TrainJob ResourcePerNode in CoScheduling plugin
area/controller
kind/feature
#2525
opened Mar 15, 2025 by
tenzen-y
KEP-2401: Determine the tag for torchtune trainer & Add support for multiple accelerators
area/llm
kind/feature
#2518
opened Mar 13, 2025 by
Electronic-Waste
Get and Use TrainingRuntime ApplyConfiguration throughout KF PipelineFramework
area/controller
kind/feature
#2515
opened Mar 13, 2025 by
tenzen-y
KEP-2401: Create LLM Training Runtimes for Llama 3.1 model family
area/llm
area/runtimes
kind/feature
#2509
opened Mar 12, 2025 by
Electronic-Waste