-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2170: Update Training V2 APIs in the KEP #2240
KEP-2170: Update Training V2 APIs in the KEP #2240
Conversation
Signed-off-by: Andrey Velichkevich <[email protected]>
Part of: #2170 |
Pull Request Test Coverage Report for Build 10632842905Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, could we clarify the criteria for when the Initializer, Trainer, and Finalizer will start?
For example, if the initializer fails, what happens? The Trainer and Finalizer will start or not?
I have similar questions about Trainer and Finalizer. So, I believe in clarifying the criteria for when each task will start.
@@ -819,49 +824,51 @@ orchestration (e.g. using Kubernetes admission webhooks or custom clients). | |||
In the future, we can add more parameters if we find use-cases when it is required. | |||
|
|||
```golang | |||
type PodSpecOverride struct { | |||
// Name of the training replica in the training runtime template to override | |||
type PodSpecOverrides struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type PodSpecOverrides struct { | |
type PodSpecOverride struct { |
Could we use the singular name instead of the plural since we used to select the singular name for field typed?
We can image the following fields:
PodSpecOverrides []PodSpecOverride `json:"podSpecOverrides"`
vs
PodSpecOverrides []PodSpecOverrides `json:"podSpecOverrides"`
// Name for the container. | ||
// ContainerOverrides represents parameters that can be overridden using PodSpecOverrides. | ||
// Parameters from the Trainer, DatasetConfig, and ModelConfig will take precedence. | ||
type ContainerOverrides struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type ContainerOverrides struct { | |
type ContainerOverride struct { |
I would recommend the singular name based on the above comment.
Sure, can we do that in the followup PR when we agree on the Initializer and Finalizer structure ? E.g. init container + sidecar container which will be triggered with preStop lifecycle hook. |
Co-authored-by: Yuki Iwai <[email protected]> Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
54bdbcc
to
273af3c
Compare
Signed-off-by: Andrey Velichkevich <[email protected]>
@tenzen-y I've done changes to PodSpecOverride API. |
@andreyvelich Sure. If so, can we treat the Initializer and Finalizer as a separate feature similar to PodSpecOverride: #2218? And, can we create a dedicated issue and work on design and implementations in the issue? |
We have issue for Initializer job already: #2210. |
Sounds great. Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I updated APIs in Training V2 KEP after our discussions in this PR: #2223.
Additionally, we discussed with @tenzen-y to rename
Exporter
Job to theFinalizer
, since we might introduce another post-processing functionality later (in addition to model exporter).Please take a look.
/assign @kubeflow/wg-training-leads @shravan-achar @vsoch @kannon92