No SQS retry on "read: connection reset by peer" #717

Open
eugenea opened this issue Nov 2, 2022 · 4 comments
Labels
  • good first issue: Good for newcomers
  • stalebot-ignore: To NOT let the stalebot update or close the Issue / PR
  • Type: Enhancement: New feature or request

Comments

eugenea commented Nov 2, 2022

Describe the bug
NTH does not retry the AWS SDK request that retrieves SQS queue messages when it fails with "read: connection reset by peer".

Steps to reproduce
Block access to the AWS SQS endpoint with a firewall, then let NTH monitor for SQS events.

Expected outcome
The network layer cannot be guaranteed to be reliable, so NTH should retry this request instead of giving up on the first connection reset.

Application Logs

WRN There was a problem monitoring for events error="RequestError: send request failed\ncaused by: Post \"https://sqs.us-west-2.amazonaws.com/\": read tcp 100.100.xx.xx:xxxx->10.xx.xx.xx:443: read: connection reset by peer" event_type=SQS_TERMINATE

Environment

  • NTH App Version: v1.16.3
  • NTH Mode (IMDS/Queue processor): Queue processor
  • OS/Arch: Linux
  • Kubernetes version: v1.21.14-eks
  • Installation method: deployment

The check that denies retry is here
For v1 of the AWS SDK, the fix would be a custom retryer that re-implements the should-retry check; that custom retryer should be injected here. However, upgrading to v2 of the AWS SDK should fix this issue automatically, because v2 does not distinguish between different kinds of connection reset and retries them all, which is the desired behavior here.

snay2 added the Type: Enhancement and stalebot-ignore labels Nov 16, 2022

snay2 (Contributor) commented Nov 16, 2022

Thank you for the suggestion! Upon first reading, we would favor doing the upgrade to v2 of the AWS SDK if it can handle this logic automatically.

eugenea (Author) commented Nov 30, 2022

Do you have any timeline/plan for v2 upgrade?

snay2 (Contributor) commented Dec 2, 2022

No firm timeline yet, but it's one of our ongoing projects at the moment.

@snay2 snay2 assigned snay2 and cjerad and unassigned snay2 Dec 2, 2022

jillmon (Contributor) commented Jan 19, 2023

@eugenea, the beta version of the NTH v2 upgrade has recently been released. Have you had a chance to investigate whether the new SDK can handle this use case?

steveshidev added the good first issue label Aug 26, 2024