Skip to content

Conversation

jwadolowski
Copy link

@jwadolowski jwadolowski commented Sep 5, 2025

Proposed changes

Intermittent network errors may happen when a module's source code is downloaded. That happened to me recently (an unexpected 502 from GitHub) which resulted in a missing module in my Docker image (that was partially my fault, as the build process should stop in case of any error).

This PR implements automatic retries for common transient network hiccups.

Here's how that's going to work in practice:

$ wget --retry-on-http-error=429,502,503,504 --retry-on-host-error --retry-connrefused --tries 3 -O /tmp/out https://httpbingo.org/status/429
--2025-09-05 11:14:04--  https://httpbingo.org/status/429
Resolving httpbingo.org (httpbingo.org)... 66.241.125.232, 2a09:8280:1:7fcb:9efa:a365:aa2a:d036
Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
HTTP request sent, awaiting response... 429 Too Many Requests
Retrying.

--2025-09-05 11:14:05--  (try: 2)  https://httpbingo.org/status/429
Reusing existing connection to httpbingo.org:443.
HTTP request sent, awaiting response... 429 Too Many Requests
Retrying.

--2025-09-05 11:14:07--  (try: 3)  https://httpbingo.org/status/429
Reusing existing connection to httpbingo.org:443.
HTTP request sent, awaiting response... 429 Too Many Requests
Giving up.

As per the documentation, wget uses linear backoff between retry attempts by default:

  --waitretry=seconds
      If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use >this option.  Wget will use linear backoff, waiting 1 second
      after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

Checklist

Before creating a PR, run through this checklist and mark each as complete:

Intermittent network errors may happen when a module's source code is
downloaded. That happened to me recently (an unexpected 502 from GitHub)
which resulted in a missing module in my Docker image (that was
partially my fault, as the build process should stop in case of any
error).

This PR implements automatic retries for common transient network
hiccups.
@jwadolowski jwadolowski marked this pull request as ready for review September 5, 2025 09:30
@mihaiplesa
Copy link

Any helping in getting this in would be appreciated @oxpa @thresheek.

@thresheek
Copy link
Member

thresheek commented Sep 23, 2025

Thanks for the patch. I've looked up when those options appeared in wget:

  • --retry-on-http-error: ~Feb 2017, released in v1.19.1
  • --retry-on-host-error: ~Jun 2018, released in v1.20
  • --retry-connrefused: ~Sep 2003

Now, looking at the oldest currently supported distros by pkg-oss (https://nginx.org/en/linux_packages.html):

  • ❌ RHEL 8 has wget 1.19.5, the manual page does not list --retry-on-host-error, and passing it as an option yields "unrecognized option"
  • ✅ RHEL 9 has wget 1.21.1
  • ✅ Debian 11 has wget 1.21
  • ✅ Ubuntu 22.04 has wget 1.21.2
  • ✅ SLES 15 has wget 1.20.3
  • ✅ Alpine Linux 3.19 has wget 1.21.4
  • ❌ Amazon Linux 2 has wget 1.14
  • ✅ Amazon Linux 2023 has wget 1.21.3

That means if we're to merge this change, we're going to abandon at least two operating systems, one of which is arguably very widely used. I'm inclined to NACK for this reason.

@oxpa your thoughts?

@oxpa
Copy link
Contributor

oxpa commented Sep 24, 2025

Such network issues never happened to me. So I probably tend to downplay the importance of the patch.
If a restart is not an option - I would wrap wget in a function that uses simple for or while loop to try download several times.
And it probably makes sense to make it more general like try_n_time. Very little effort and can be reused later. Like this:

try_n_times() {
    N=$1
    CMD=$2
    CLEAN=$3
    i=0
    while ! $CMD; do
        i=`expr $i + 1`
        if [ $i -le $N ]; then
            echo "attempt $i failed! retry!"
            test -n "$CLEAN" && $CLEAN
        else
            echo "$N attempts failed!"
            return 1
        fi;
    done
    return 0
}

And then

try_n_times 3 "wget ..."

This way it works anywhere and can be reused for other places where retries make sense.

@jwadolowski
Copy link
Author

Thanks for the feedback guys, these are very valid points.

I didn't check it myself, but kind of hoped for better support across supported operating systems. Adding these new params as-is is definitely not a viable solution.

The retry loop approach seems much more universal. I'm going to rewrite the PR and include this suggestion.

@jwadolowski
Copy link
Author

@oxpa @thresheek I've just updated the PR as per your suggestion.

The updated logic has been tested against one of my NGINX container images and it worked as expected:

#9 19.47 build_module.sh: INFO: Creating /tmp/build_module.sh.2138 build area
#9 19.47 build_module.sh: INFO Downloading module source
#9 19.48 --2025-10-08 13:35:36--  https://httpbingo.org/status/502
#9 19.49 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 19.53 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 19.60 HTTP request sent, awaiting response... 502 Bad Gateway
#9 19.73 2025-10-08 13:35:36 ERROR 502: Bad Gateway.
#9 19.73
#9 19.73 Attempt 1 failed! Waiting 1 seconds before retry...
#9 20.75 --2025-10-08 13:35:37--  https://httpbingo.org/status/502
#9 20.76 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 20.77 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 20.84 HTTP request sent, awaiting response... 502 Bad Gateway
#9 21.07 2025-10-08 13:35:37 ERROR 502: Bad Gateway.
#9 21.07
#9 21.08 Attempt 2 failed! Waiting 2 seconds before retry...
#9 23.09 --2025-10-08 13:35:39--  https://httpbingo.org/status/502
#9 23.10 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 23.10 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 23.18 HTTP request sent, awaiting response... 502 Bad Gateway
#9 23.31 2025-10-08 13:35:40 ERROR 502: Bad Gateway.
#9 23.31
#9 23.31 Attempt 3 failed! Waiting 4 seconds before retry...
#9 27.32 --2025-10-08 13:35:44--  https://httpbingo.org/status/502
#9 27.33 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 27.33 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 27.41 HTTP request sent, awaiting response... 502 Bad Gateway
#9 27.54 2025-10-08 13:35:44 ERROR 502: Bad Gateway.
#9 27.54
#9 27.54 3 attempts failed!
#9 27.54
#9 27.54 gzip: stdin: unexpected end of file
#9 27.54 tar: Child returned status 1
#9 27.54 tar: Error is not recoverable: exiting now
#9 27.54
#9 27.54 gzip: stdin: unexpected end of file
#9 27.54 tar: Child returned status 1
#9 27.54 tar: Error is not recoverable: exiting now
#9 27.55 mv: missing destination file operand after '/tmp/build_module.sh.2138/vts'
#9 27.55 Try 'mv --help' for more information.
#9 27.55 /pkg-oss
#9 27.55 build_module.sh: ERROR: Cannot locate module config file - quitting
#9 27.55 + exit 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants