-
-
Notifications
You must be signed in to change notification settings - Fork 629
Remove deprecated Temporary() usage in DNS retry logic #8441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Some context we should review when reviewing this PR, courtesy of @pgporada:
|
bdns/dns.go
Outdated
| // Retry all errors up to maxTries limit for maximum resilience. | ||
| isRetryable := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without commenting on whether we should retry all errors (I'm still contemplating), I think that if we do go this direction, we should take this minor simplification a step further: I'd remove the isRetryable boolean altogether and simplify the conditionals on lines 276 and 283.
|
Hi @sheurich! Thanks for the contribution. I'm glad to be getting rid of the deprecated call to As a side note, can I ask if you are using AI to generate your PR descriptions? If so, could you provide the prompt you used, or how you generated it (i.e. what are the inputs)? In particular I'd love to discuss the Rationale but I first want to make sure those are your words and not an AI's. My takeaway from golang/go#45729 is that Temporary is a superset of Timeout, and:
So a smaller change here would be to simply replace Temporary with Timeout, which is not deprecated. I'd prefer that change, in part because it avoids masking surprising behavior behind retries. As an example: in Unbound 1.20, a few new options were introduced, including That As an example of a non-retryable error: a certificate validation failure for one out of a pool of DoH servers. If we retry all errors, we might never notice that one server has a broken config, because we would automatically retry on a different server1. Yes, this would be more resilient, but it would also be masking brokenness, possibly until the brokenness gets worse. So, overall, I'm in favor of switching this to use Footnotes
|
I use Claude Code (and to a small extent Roo Code) as a coding assistant. For this PR, there wasn't a traditional "prompt". My workflow was:
We were seeing 'unexpected EOF' errors in VA DoH requests. Your comment about Unbound’s
I agree that replacing Temporary() with Timeout() is the correct minimal fix. The deprecation of Temporary() indicates we need better visibility into what's retryable, but your point about not masking errors like cert validation failures is good. Retry-all could hide critical failures (although the logging should still note this).
I'll update this PR to the simple Temporary() → Timeout() replacement as you suggested. Thanks! |
The net.Error.Temporary() method has been deprecated since Go 1.18. Replace with Timeout() to check specifically for timeout errors. This also switches from *url.Error to net.Error because Timeout() is defined by the net.Error interface
Reflects the change from net.Error.Temporary() to net.Error.Timeout() for DNS retry logic. Test changes: - Add testTimeoutError type to mirror tempError - Rename test variables: isTempErr → isTimeoutErr, nonTempErr → nonTimeoutErr - Update test case names and comments to reference "timeout" - Fix test expectations: timeout errors return "query timed out" message - Update mockTempURLError → mockTimeoutURLError All tests pass with the new timeout-based retry semantics.
a873eb7 to
867033e
Compare
|
@jsha I appreciate the visibility vs. resilience trade-off discussion. While I agree with moving forward with Timeout() as the minimal fix, I wanted to share my operational perspective: Even mature deployment practices can't eliminate all transient connection errors. Network path transitions and process/container orchestration events routinely cause brief connection-refused/reset errors that are genuinely transient and not misconfigurations. In production CA operations, failing customer certificate issuance for these routine events feels like the wrong trade-off. I think retry-all (or at least retry-connection-errors) with proper logging/alerting provides the best of both worlds: resilience for customers and visibility for operators. That said, I respect the fail-fast philosophy and understand the operational debt concerns. I'm happy to stick with the Timeout() change as implemented here and we can revisit retry strategy in future PRs if operational experience suggests it's needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make good points about transient errors in the connection between VA and Unbound. Can you add some detail about what your setup is like, what specific errors you're getting, and how often?
| // Check if the error is a timeout error. Network errors | ||
| // that can timeout implement the net.Error interface. | ||
| var netErr net.Error | ||
| isRetryable = errors.As(r.err, &netErr) && netErr.Timeout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this became a net.Error? *url.Error also offers .Timeout().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are functionally identical; http.Client.Do() returns *url.Error, which implements net.Error. Both errors.As checks match the same object and call the same .Timeout() method.
I changed to net.Error because we only need the .Timeout() method, not *url.Error-specific fields like .Op or .URL.
Unbound and boulder-va run in separate containers, communicating over Docker networking. Current error rates (% of validations):
|
Summary
Replace deprecated
Temporary()withTimeout()in DNS retry logic per golang/go#45729.Change
net.Error.Timeout()instead of deprecatedTemporary()