This repository has been archived by the owner on Jan 31, 2022. It is now read-only.
[Label Bot Continuous Training] NeedsTraining needs to take into account whether a model is currently being trained #178
Issue-Label Bot is automatically applying the labels:
Please mark this comment with 👍 or 👎 to give our bot feedback!
It looks like we also need to look at the datasets and check whether a model is already being trained.
jlewi added a commit that referenced this issue on Jul 28, 2020: Temporarily disable continuous retraining until we can fix #178
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue on Oct 4, 2020:
* NeedsSync needs to check whether a model is being trained or a dataset is being imported. Otherwise we end up launching multiple overlapping jobs: the model takes a long time to train, and by then the Tekton job will have finished.
* Related to kubeflow#178
k8s-ci-robot pushed a commit that referenced this issue on Oct 4, 2020:
* NeedsSync needs to check whether a model is being trained or a dataset is being imported. Otherwise we end up launching multiple overlapping jobs: the model takes a long time to train, and by then the Tekton job will have finished.
* Related to #178
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue on Oct 4, 2020:
* It is NeedsTraining, not NeedsSync, that needs to check whether a training job is running. Related to kubeflow#178
k8s-ci-robot pushed a commit that referenced this issue on Oct 4, 2020:
* It is NeedsTraining, not NeedsSync, that needs to check whether a training job is running. Related to #178
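A minimal sketch of the fix the commits above describe: NeedsTraining declines to start a new job while a training or dataset-import operation is still in progress, instead of only asking whether a trained model exists. All types and functions here (`State`, `Operation`, `needsTrainingNaive`, `needsTraining`) are hypothetical stand-ins for illustration; they are not the actual `automl.go` code or the AutoML client API.

```go
package main

import "fmt"

// OperationKind labels a long-running operation. Hypothetical type,
// not part of the code-intelligence code base or the AutoML API.
type OperationKind string

const (
	TrainModel    OperationKind = "train_model"
	ImportDataset OperationKind = "import_dataset"
)

// Operation is a simplified, hypothetical view of a long-running operation.
type Operation struct {
	Kind OperationKind
	Done bool
}

// State is a hypothetical snapshot of what the decision needs to know.
type State struct {
	// HasModelForLatestData reports whether a trained model already
	// exists for the most recent dataset.
	HasModelForLatestData bool
	// Operations lists recent training and dataset-import operations.
	Operations []Operation
}

// needsTrainingNaive mirrors the behaviour described in this issue:
// it only checks whether a trained model exists.
func needsTrainingNaive(s State) bool {
	return !s.HasModelForLatestData
}

// needsTraining also refuses to start a new job while a training or
// dataset-import operation is still in progress.
func needsTraining(s State) bool {
	if s.HasModelForLatestData {
		return false
	}
	for _, op := range s.Operations {
		if !op.Done && (op.Kind == TrainModel || op.Kind == ImportDataset) {
			return false
		}
	}
	return true
}

func main() {
	inFlight := State{Operations: []Operation{{Kind: TrainModel, Done: false}}}
	fmt.Println("naive:", needsTrainingNaive(inFlight)) // naive: true
	fmt.Println("guarded:", needsTraining(inFlight))    // guarded: false
}
```

In a real implementation the list of in-progress operations would come from the training backend; how that lookup is done is left open here.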
#182: auto-created PR for a model trained by manually running the notebook. We still need to verify that a new model is trained and deployed automatically.
#184 opened a PR to update to the same model. It doesn't look like a new model got trained.
Our synchronous training pipeline is currently launching multiple training jobs rather than the expected one model per hour.
The problem appears to be that the code that decides whether to train a model only checks whether a trained model exists. I don't think we take into account whether a model is currently being trained.
See code-intelligence/Label_Microservice/go/cmd/automl/pkg/automl/automl.go, line 101 at commit faeb657.
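To make the failure mode concrete, here is a minimal Go simulation under stated assumptions: the pipeline fires once per hour, a single training job takes a hypothetical three hours, and there is only one dataset snapshot. The `simulate` function and its parameters are illustrative only and are not part of the code-intelligence code base.

```go
package main

import "fmt"

// simulate is a hypothetical model of an hourly trigger. It runs for
// `hours` ticks; each launched training job produces a model trainHours
// after it starts. When guarded is true, the trigger also skips a tick
// while any job is still running. It returns the number of jobs launched.
func simulate(hours, trainHours int, guarded bool) int {
	launched := 0
	modelReady := false
	var running []int // finish times of in-flight jobs

	for t := 0; t < hours; t++ {
		// Retire jobs that have finished by tick t (in-place filter).
		stillRunning := running[:0]
		for _, finish := range running {
			if finish <= t {
				modelReady = true
			} else {
				stillRunning = append(stillRunning, finish)
			}
		}
		running = stillRunning

		// Both variants skip once a trained model exists.
		if modelReady {
			continue
		}
		// Only the guarded variant also asks "is a job already running?".
		if guarded && len(running) > 0 {
			continue
		}
		running = append(running, t+trainHours)
		launched++
	}
	return launched
}

func main() {
	fmt.Println("naive launches:", simulate(6, 3, false))  // naive launches: 3
	fmt.Println("guarded launches:", simulate(6, 3, true)) // guarded launches: 1
}
```

With only the "is there a trained model?" check, every hourly tick before the first model lands starts another overlapping job; adding the in-progress check keeps it to one.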
My conjecture is that the following happens:
At this point