Introduce error handling recommendations to the spec and improve error handling in existing taps and targets #20

Open
anthonyp opened this issue Apr 9, 2017 · 2 comments

@anthonyp
anthonyp commented Apr 9, 2017

While using taps and targets, there are different types of errors that may occur, some transient and some fatal.

Many errors in Singer-compatible taps and targets are treated as fatal. This can make for a frustrating user experience, because one random (and possibly transient) failure in the middle of a very long data dump can cause the entire dump to quit unexpectedly.

As part of the spec and in general communication about Singer, it seems crucial to communicate to developers the importance of taps and targets being patient and resilient. When a tap or a target encounters an error that might resolve itself on a retry - like an API throwing a random 500, or being down - Singer-compatible code should try as hard as it can not to give up.

Of course, this won't always be possible. Sometimes errors really are fatal. In this case, the state functionality seems like a decent way to re-run a job that failed in the middle without needing to re-load all of the data. That said, if this is Singer's expectation (that state is useful not only for normal delta data loads, but also for picking up after failed loads), then it should be communicated as such in the spec so that tap and target developers understand the use cases for which they are developing. In addition, it would then be helpful to advise end users to consider using state functionality even if they are performing only one-off loads.
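
As a rough illustration of the resume-from-state idea above, a tap could emit a STATE message after each batch of records, so that a re-run starts from the last bookmark rather than from scratch. This is a minimal sketch, not taken from any existing tap: `fetch_page`, the `users` stream, and the `bookmark` state key are hypothetical, and the message shapes follow the Singer RECORD and STATE message types.

```python
import json
import sys

def fetch_page(after_id, page_size=2):
    """Hypothetical data source: returns rows with id > after_id."""
    rows = [{"id": i, "name": f"user-{i}"} for i in range(1, 6)]
    return [r for r in rows if r["id"] > after_id][:page_size]

def sync_users(state, out=sys.stdout):
    """Emit RECORD messages, checkpointing a STATE message after each page."""
    bookmark = state.get("bookmark", 0)  # hypothetical state key
    while True:
        page = fetch_page(bookmark)
        if not page:
            break
        for row in page:
            message = {"type": "RECORD", "stream": "users", "record": row}
            out.write(json.dumps(message) + "\n")
        bookmark = page[-1]["id"]
        # If the run dies after this point, a re-run that is handed this
        # state skips the rows that were already sent.
        out.write(json.dumps({"type": "STATE", "value": {"bookmark": bookmark}}) + "\n")
    return {"bookmark": bookmark}
```

Calling `sync_users({"bookmark": 3})` in this sketch would emit only the rows after id 3, which is the "pick up a failed load" case, while `sync_users({})` covers a first or one-off load.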

@mdelaurentis
Contributor

Thanks for the suggestion. We could definitely enhance the docs to mention that Taps should be robust against transient API failures. Most of the Taps we've written use the backoff library to retry failed HTTP requests a small number of times. Does that seem like an acceptable solution for most errors? Are there any specific examples you found where a Tap or Target fails when it probably would have succeeded after retrying? I think you're mostly suggesting that we enhance the docs, but I'm wondering if there are any specific Taps or Targets that prompted you to mention this.
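
For illustration, the retry behaviour described above can be sketched with only the standard library. This is a sketch under assumptions rather than code from any existing Tap: `TransientError`, `retry_transient`, and the default limits are hypothetical names and values. The backoff library offers decorators such as `backoff.on_exception` for the same purpose.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. HTTP 500, timeouts)."""

def retry_transient(func, max_tries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying TransientError with exponential backoff and jitter.

    Any other exception is treated as fatal and propagates immediately.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_tries:
                raise  # out of attempts: give up and surface the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter: wait between 0 and delay
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; the key design choice is that only errors explicitly marked as transient are retried, so genuinely fatal errors still fail fast.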

@anthonyp
Author

@mdelaurentis Specifically, I ran into rate limiting errors with the gsheet target, and also some errors with HubSpot (bad code causing one stream to fail while other streams worked fine).
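
Rate limiting in particular is usually recoverable by waiting. A hedged sketch of how a target might decide how long to pause follows, assuming the API signals throttling with HTTP 429 and may send a `Retry-After` header in seconds. `rate_limit_delay` and its defaults are hypothetical, and this is not how the gsheet target is actually implemented.

```python
def rate_limit_delay(status, headers, attempt, base_delay=1.0, max_delay=120.0):
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Prefer the server's own hint, clamped to a sane range.
            return min(max_delay, max(0.0, float(retry_after)))
        except ValueError:
            pass  # header was an HTTP date or malformed: fall back to backoff
    # No usable hint: exponential backoff based on the attempt number (0-based).
    return min(max_delay, base_delay * 2 ** attempt)
```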

I suppose the level of resiliency required is subjective, but the experience did - at a higher level - prompt me to consider how important it is for tap and target developers to understand error/exception mitigation. So yes, this is mostly just a suggestion in regard to documentation and specs. Thanks!
