Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Mark webapi task failure as retry limit exceeded #392

Merged
merged 6 commits into from
Sep 14, 2023
Merged

Conversation

pingsutw
Copy link
Member

@pingsutw pingsutw commented Aug 19, 2023

TL;DR

fixed https://flyte-org.slack.com/archives/C05GQ16TQLF/p1692310075236329

propeller updates the cache.state to PhaseSystemFailure at here. however, it sets the action to cache.Unchanged in cache.ItemSyncResponse, which won't update the item in the LRU cache.

Add an error message field to cache.state, and use it when we return PhaseInfoFailure. Therefore, we can see the error message on FlyteConsole if the propeller fails to get the task status.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

image

Tracking Issue

NA

Follow-up issue

NA

@codecov
Copy link

codecov bot commented Aug 19, 2023

Codecov Report

Patch coverage: 48.71% and project coverage change: +1.21% 🎉

Comparison is base (03dc9db) 63.01% compared to head (3ee20ed) 64.23%.
Report is 6 commits behind head on master.

❗ Current head 3ee20ed differs from pull request most recent head cd67d94. Consider uploading reports for the commit cd67d94 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #392      +/-   ##
==========================================
+ Coverage   63.01%   64.23%   +1.21%     
==========================================
  Files         154      156       +2     
  Lines       13080    10670    -2410     
==========================================
- Hits         8243     6854    -1389     
+ Misses       4220     3195    -1025     
- Partials      617      621       +4     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...o/tasks/pluginmachinery/internal/webapi/monitor.go 40.00% <0.00%> (-7.50%) ⬇️
go/tasks/pluginmachinery/internal/webapi/state.go 100.00% <ø> (ø)
go/tasks/plugins/k8s/kfoperators/common/config.go 0.00% <0.00%> (ø)
...sks/plugins/k8s/kfoperators/common/config_flags.go 17.39% <17.39%> (ø)
go/tasks/plugins/k8s/spark/spark.go 79.32% <50.00%> (+1.39%) ⬆️
go/tasks/plugins/array/awsbatch/transformer.go 81.45% <60.00%> (+2.33%) ⬆️
...o/tasks/plugins/k8s/kfoperators/pytorch/pytorch.go 79.22% <60.00%> (+0.53%) ⬆️
...s/plugins/k8s/kfoperators/tensorflow/tensorflow.go 77.18% <60.00%> (+1.45%) ⬆️
go/tasks/plugins/k8s/kfoperators/mpi/mpi.go 74.49% <66.66%> (+1.92%) ⬆️
go/tasks/pluginmachinery/internal/webapi/cache.go 85.39% <100.00%> (+5.59%) ⬆️
... and 2 more

... and 126 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if err != nil {
return core.PhaseInfoUndefined, err
}
logger.Infof(ctx, "Agent returned an output")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: remove it once this pr is merged.

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@honnix
Copy link
Member

honnix commented Sep 7, 2023

Thank you for address this. Does this also make CreateTask failure handled correctly? More concretely here. Will that error be handled correctly?

Signed-off-by: Kevin Su <[email protected]>
@pingsutw
Copy link
Member Author

Does this also make CreateTask failure handled correctly? More concretely here. Will that error be handled correctly?

That's another issue, I'll fix it in a separate PR.

@pingsutw pingsutw merged commit 9bf0fb9 into master Sep 14, 2023
5 of 7 checks passed
eapolinario pushed a commit that referenced this pull request Sep 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants