forked from foundation-model-stack/fms-hf-tuning
Commit
feat: History based loss metric (foundation-model-stack#156)
* feat: History based loss metric
* feat: History based loss metric
* fix: corrected the keys and rules
* fix: Formatting issues in the code
* feat: Added patience configuration
* fix: Added info msgs to patience
* fix: Formatting issues resolved
* feat: Test cases for history based metrics
* fix: Format issues resolved
* fix: Addressed review comments on patience
* feat: patience upgraded
* fix: Handle repeated entries
* fix: Duplicate logs and other comments addressed
* fix: ADR changed to reflect convension change
* feat: Added examples for eval and training loss thresholds
* feat: Added examples for eval and training loss thresholds

Signed-off-by: Padmanabha V Seshadri <[email protected]>
Showing 37 changed files with 711 additions and 100 deletions.
12 changes: 12 additions & 0 deletions
examples/trainercontroller_configs/epoch-level-eval-loss-below-threshold.yaml

    @@ -0,0 +1,12 @@
    +controller_metrics:
    +  - name: trainer_state
    +    class: TrainingState
    +  - name: evalmetric
    +    class: EvalMetrics
    +controllers:
    +  - name: epoch_level_eval_loss_below_threshold
    +    triggers:
    +      - on_epoch_end
    +    rule: evalmetric['eval_loss'] < 2.25 and trainer_state["epoch"] > 2
    +    operations:
    +      - hfcontrols.should_training_stop
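
A minimal sketch of how the rule above might be evaluated, assuming each name under `controller_metrics` is bound to a dictionary when the trigger fires. The helper `rule_fires` and the sample values are hypothetical, and Python's built-in `eval` is only a stand-in for whatever evaluator the project actually uses.

```python
# Rule string copied from the config above.
RULE = "evalmetric['eval_loss'] < 2.25 and trainer_state[\"epoch\"] > 2"


def rule_fires(rule: str, metrics: dict) -> bool:
    """Evaluate a rule string with metric names bound to sample dicts."""
    # Emptying __builtins__ hides open(), __import__, etc. from the expression.
    # This is NOT a real sandbox; do not use it on untrusted input.
    return bool(eval(rule, {"__builtins__": {}}, dict(metrics)))


# Made-up metric snapshots, as they might look at on_epoch_end.
early = {"evalmetric": {"eval_loss": 2.1}, "trainer_state": {"epoch": 1.0}}
late = {"evalmetric": {"eval_loss": 2.1}, "trainer_state": {"epoch": 3.0}}
high = {"evalmetric": {"eval_loss": 2.4}, "trainer_state": {"epoch": 3.0}}

print(rule_fires(RULE, early))  # False: loss is low, but epoch is not > 2
print(rule_fires(RULE, late))   # True: both conditions hold
print(rule_fires(RULE, high))   # False: loss is still above 2.25
```

When the rule evaluates to true, the listed operation (`hfcontrols.should_training_stop`) would be the one invoked.
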
14 changes: 14 additions & 0 deletions
examples/trainercontroller_configs/epoch-level-eval-loss-patience.yaml

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: eval_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 2
    +controllers:
    +  - name: epoch_level_eval_loss_patience
    +    triggers:
    +      - on_epoch_end
    +    rule: len(eval_loss_window["metrics"]) > 0 and eval_loss_window["metrics"]["eval_loss"][-1] > 2
    +    patience:
    +      patience_threshold: 2
    +    operations:
    +      - hfcontrols.should_training_stop
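
The `patience` block can be read as "tolerate the rule being true a few times before acting". The sketch below illustrates that idea under an assumed semantics: the operation runs only once the rule has held on more than `patience_threshold` consecutive triggers, and any false outcome resets the streak. The `Patience` class is hypothetical and is not the project's implementation.

```python
class Patience:
    """Hypothetical patience counter; the exact semantics are an assumption."""

    def __init__(self, patience_threshold: int):
        self.patience_threshold = patience_threshold
        self.count = 0  # consecutive triggers on which the rule was true

    def should_act(self, rule_outcome: bool) -> bool:
        """Return True once the rule has held for more than the threshold."""
        if not rule_outcome:
            self.count = 0  # a false outcome resets the streak
            return False
        self.count += 1
        if self.count > self.patience_threshold:
            self.count = 0  # start over after acting
            return True
        return False


patience = Patience(patience_threshold=2)
eval_losses = [2.5, 2.4, 1.9, 2.3, 2.2, 2.1]  # made-up per-epoch eval losses
decisions = [patience.should_act(loss > 2) for loss in eval_losses]
print(decisions)  # [False, False, False, False, False, True]
```

In this run the dip to 1.9 resets the streak, so the stop decision only arrives after three further epochs above the threshold.
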
14 changes: 14 additions & 0 deletions
examples/trainercontroller_configs/epoch-level-eval-loss.yaml

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: trainer_state
    +    class: TrainingState
    +  - name: eval_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: epoch_level_eval_loss
    +    triggers:
    +      - on_epoch_end
    +    rule: len(eval_loss_window["metrics"]) > 0 and eval_loss_window["metrics"]["eval_loss"][-1] > 2.2 and trainer_state["epoch"] > 3
    +    operations:
    +      - hfcontrols.should_training_stop
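
To make the `window_size` argument concrete, here is a rough sketch of a sliding window that keeps only the most recent values. `LossWindow` is a hypothetical class; the dictionary shape it returns is inferred from the keys the rules index into (`["metrics"]["eval_loss"]`, `["window_size"]`) and may not match `HistoryBasedMetric` exactly.

```python
from collections import deque


class LossWindow:
    """Hypothetical sliding window over the last `window_size` eval losses."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        # A bounded deque drops the oldest entry once it is full.
        self._eval_loss = deque(maxlen=window_size)

    def record(self, eval_loss: float) -> None:
        self._eval_loss.append(eval_loss)

    def snapshot(self) -> dict:
        # Assumed shape, inferred from the rule expressions in the configs.
        return {
            "metrics": {"eval_loss": list(self._eval_loss)},
            "window_size": self.window_size,
        }


window = LossWindow(window_size=2)
for loss in (2.9, 2.6, 2.3):  # made-up eval losses from three epochs
    window.record(loss)

snap = window.snapshot()
print(snap["metrics"]["eval_loss"])             # [2.6, 2.3]
print(snap["metrics"]["eval_loss"][-1] > 2.2)   # True: latest loss exceeds 2.2
```

With `window_size: 1`, as in the config above, index `[-1]` and index `[0]` refer to the same (latest) value.
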
12 changes: 12 additions & 0 deletions
examples/trainercontroller_configs/epoch-level-training-loss-below-threshold.yaml

    @@ -0,0 +1,12 @@
    +controller_metrics:
    +  - name: training_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: epoch_level_stop_on_training_loss_below_threshold
    +    triggers:
    +      - on_log
    +    rule: len(training_loss_window["training_loss"]["loss"]) == training_loss_window["window_size"] and training_loss_window["training_loss"]["loss"][0] < 2.2 and training_loss_window["training_loss"]["epoch"][0] > 2
    +    operations:
    +      - hfcontrols.should_training_stop
14 changes: 14 additions & 0 deletions
examples/trainercontroller_configs/epoch-level-training-loss.yaml

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: trainer_state
    +    class: TrainingState
    +  - name: training_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: epoch_level_training_loss
    +    triggers:
    +      - on_epoch_end
    +    rule: training_loss_window["training_loss"]["loss"][-1] > 2 and trainer_state["epoch"] > 3
    +    operations:
    +      - hfcontrols.should_training_stop
12 changes: 12 additions & 0 deletions
examples/trainercontroller_configs/non-decreasing-training-loss.yaml

    @@ -0,0 +1,12 @@
    +controller_metrics:
    +  - name: training_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 5
    +controllers:
    +  - name: stop_on_training_loss_not_decreasing
    +    triggers:
    +      - on_log
    +    rule: training_loss_window["training_loss"]["loss"][0] < training_loss_window["training_loss"]["loss"][-1] and len(training_loss_window["training_loss"]["loss"]) == training_loss_window["window_size"]
    +    operations:
    +      - hfcontrols.should_training_stop
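
The rule above compares the oldest and newest loss in a full window. A small sketch of the same check in plain Python follows; the function name and the sample losses are hypothetical. Unlike the rule, this helper tests the length first, because in plain Python indexing an empty list raises `IndexError` before the length condition is reached.

```python
def loss_not_decreasing(losses: list, window_size: int) -> bool:
    """True when the window is full and the oldest loss is below the newest."""
    # Length check first, so a short or empty window short-circuits safely.
    return len(losses) == window_size and losses[0] < losses[-1]


WINDOW_SIZE = 5  # matches window_size in the config above

rising = [2.0, 1.9, 1.8, 1.9, 2.1]    # ends higher than it started
falling = [2.5, 2.4, 2.3, 2.2, 2.1]   # steadily improving
partial = [2.0, 2.1]                  # window not yet full

print(loss_not_decreasing(rising, WINDOW_SIZE))   # True
print(loss_not_decreasing(falling, WINDOW_SIZE))  # False
print(loss_not_decreasing(partial, WINDOW_SIZE))  # False
```

Note that only the two endpoints are compared, so a window that dips in the middle but ends above its first value still counts as not decreasing.
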
12 changes: 12 additions & 0 deletions
examples/trainercontroller_configs/thresholded-training-loss.yaml

    @@ -0,0 +1,12 @@
    +controller_metrics:
    +  - name: training_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: stop_on_training_loss_not_decreasing
    +    triggers:
    +      - on_log
    +    rule: training_loss_window["training_loss"]["loss"][-1] > 2.2
    +    operations:
    +      - hfcontrols.should_training_stop
14 changes: 14 additions & 0 deletions
tests/data/trainercontroller/epoch-level-eval-loss-patience.yaml

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: eval_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 2
    +controllers:
    +  - name: epoch_level_eval_loss_patience
    +    triggers:
    +      - on_epoch_end
    +    rule: len(eval_loss_window["metrics"]["eval_loss"]) > 0 and eval_loss_window["metrics"]["eval_loss"][-1] > 2
    +    patience:
    +      patience_threshold: 2
    +    operations:
    +      - hfcontrols.should_training_stop

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: trainer_state
    +    class: TrainingState
    +  - name: eval_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: epoch_level_eval_loss
    +    triggers:
    +      - on_epoch_end
    +    rule: len(eval_loss_window["metrics"]["eval_loss"]) > 0 and eval_loss_window["metrics"]["eval_loss"][-1] > 2 and trainer_state["epoch"] > 0.1
    +    operations:
    +      - hfcontrols.should_training_stop
14 changes: 14 additions & 0 deletions
tests/data/trainercontroller/epoch-level-training-loss.yaml

    @@ -0,0 +1,14 @@
    +controller_metrics:
    +  - name: trainer_state
    +    class: TrainingState
    +  - name: training_loss_window
    +    class: HistoryBasedMetric
    +    arguments:
    +      window_size: 1
    +controllers:
    +  - name: epoch_level_training_loss
    +    triggers:
    +      - on_epoch_end
    +    rule: training_loss_window["training_loss"]["loss"][-1] < 1 and trainer_state["epoch"] >= 0.5
    +    operations:
    +      - hfcontrols.should_training_stop
6 changes: 3 additions & 3 deletions
tests/data/trainercontroller/incorrect_source_event_exposed_metrics.yaml

    @@ -1,13 +1,13 @@
    -controller-metrics:
    +controller_metrics:
       - name: loss
         class: Loss
     operations:
    -  - name: customoperation
    +  - name: custom_operation
         class: CustomOperation
     controllers:
    -  - name: loss-controller-custom-operation
    +  - name: loss_controller_custom_operation
         triggers:
           - on_log
         rule: loss < 1.0
         operations:
    -      - customoperation.should_perform_action_xyz
    +      - custom_operation.should_perform_action_xyz
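
The hunk above renames keys and names from hyphenated or run-together forms to snake_case. The diff does not state the motivation. One practical difference, shown in this small check, is that the snake_case names are valid Python identifiers while the hyphenated ones are not, which matters for any name that is referenced inside a rule expression (a hyphen would parse as subtraction). Treat the link to the rename as an assumption.

```python
# Names taken from the removed (-) and added (+) lines of the hunk above.
old_names = [
    "controller-metrics",
    "customoperation",
    "loss-controller-custom-operation",
]
new_names = [
    "controller_metrics",
    "custom_operation",
    "loss_controller_custom_operation",
]

# str.isidentifier() reports whether a string could be used as a Python name.
for name in old_names + new_names:
    print(f"{name!r}: valid identifier = {name.isidentifier()}")
```
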
8 changes: 4 additions & 4 deletions
tests/data/trainercontroller/loss_custom_operation_invalid_action.yaml

    @@ -1,13 +1,13 @@
    -controller-metrics:
    +controller_metrics:
       - name: loss
         class: Loss
     operations:
    -  - name: customoperation
    +  - name: custom_operation
         class: CustomOperationInvalidAction
     controllers:
    -  - name: loss-controller-custom-operation-invalid-action
    +  - name: loss_controller_custom_operation_invalid_action
         triggers:
           - on_log
         rule: loss < 1.0
         operations:
    -      - customoperation.should_
    +      - custom_operation.should_
4 changes: 2 additions & 2 deletions
tests/data/trainercontroller/loss_invalid_operation_action.yaml
4 changes: 2 additions & 2 deletions
tests/data/trainercontroller/loss_on_threshold_with_trainer_state.yaml
4 changes: 2 additions & 2 deletions
tests/data/trainercontroller/loss_with_invalid_type_rule.yaml
4 changes: 2 additions & 2 deletions
tests/data/trainercontroller/loss_with_malicious_input_rule.yaml
4 changes: 2 additions & 2 deletions
tests/data/trainercontroller/loss_with_malicious_os_rule.yaml