@retry that retries only system errors #1443

Open
tuulos opened this issue Jun 8, 2023 · 0 comments · May be fixed by #2131
Labels
enhancement New feature or request

Comments

@tuulos
Collaborator

tuulos commented Jun 8, 2023

To handle interrupted spot instances and other system-level failures, we need a version of @retry that retries only those failures and lets non-retriable user errors fail the task immediately.

The example below does the trick for locally scheduled runs, but not for production runs on Argo, AWS Step Functions (SFN), or Airflow:

import sys
import time
import traceback
from functools import wraps

from metaflow import FlowSpec, step, retry
from metaflow.exception import METAFLOW_EXIT_DISALLOW_RETRY

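def platform_retry(f):
    # Wrap a step so that exceptions raised by user code are not retried,
    # while failures outside user code (e.g. an interrupted spot instance)
    # are still covered by @retry.
    @wraps(f)
    def wrapper(self):
        try:
            f(self)
        except:
            # The step body raised: print the traceback, then exit with the
            # code that signals that this task must not be retried.
            traceback.print_exc()
            sys.exit(METAFLOW_EXIT_DISALLOW_RETRY)
    return retry(wrapper)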

class PlatformRetryFlow(FlowSpec):

    @platform_retry
    @step
    def start(self):
        time.sleep(10)
        print('fail', 1 / 0)
        self.next(self.end)

    @platform_retry
    @step
    def end(self):
        print("done!")

if __name__ == '__main__':
    PlatformRetryFlow()

We could implement this pattern as an option in @retry, e.g. @retry(only_system=True).
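For illustration only, here is a rough sketch of the retry decision a scheduler-side implementation would have to make for something like only_system=True: retry any failed attempt unless the task exited with the "disallow retry" code. This is not Metaflow code. The names run_with_system_retries, task, and max_retries are hypothetical, and EXIT_DISALLOW_RETRY is a placeholder standing in for METAFLOW_EXIT_DISALLOW_RETRY, whose actual value is not stated in this issue.

```python
from typing import Callable, Tuple

# Placeholder standing in for metaflow.exception.METAFLOW_EXIT_DISALLOW_RETRY.
# The numeric value here is an assumption made for this sketch.
EXIT_DISALLOW_RETRY = 202


def run_with_system_retries(
    task: Callable[[], int], max_retries: int = 3
) -> Tuple[int, int]:
    """Run `task` (a callable returning an exit code) with system-only retries.

    Returns (final_exit_code, attempts_made). An attempt is retried unless it
    succeeded (exit code 0) or reported a user error via EXIT_DISALLOW_RETRY.
    At most `max_retries` retries follow the first attempt.
    """
    attempts = 0
    code = 0
    for attempts in range(1, max_retries + 2):
        code = task()
        if code == 0 or code == EXIT_DISALLOW_RETRY:
            # Success, or a user error that must not be retried.
            break
        # Any other exit code is treated as a system-level failure: retry.
    return code, attempts


if __name__ == "__main__":
    # Two simulated system failures followed by a success.
    codes = iter([137, 137, 0])
    print(run_with_system_retries(lambda: next(codes)))
    # A user error stops after the first attempt.
    print(run_with_system_retries(lambda: EXIT_DISALLOW_RETRY))
```

The point of the sketch is that the decision lives outside the task process, which is presumably why a wrapper based on sys.exit alone is not enough on Argo/SFN/Airflow: each scheduler would need to be told which exit codes are retriable.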

@tuulos tuulos added the enhancement New feature or request label Jun 8, 2023
@madhur-ob madhur-ob linked a pull request Nov 4, 2024 that will close this issue