Skip to content

Conversation

mcpherrinm
Copy link
Contributor

@mcpherrinm mcpherrinm commented Oct 7, 2025

Once we have enough RVA results, set a tighter timeout on the remaining ones so we don't hold up issuance on a single slow perspective. This strikes a balance between our previous strategy (cancelling all dangling remotes immediately upon getting enough results) and our current strategy (don't cancel any remotes unless we hit our global timeout).

Add a new VA config field to control how long we wait for the lagging perspectives. If this config field is not set, default to not cancelling early, which is our current behavior.

Once we have enough perspectives, we want to wait a little bit more to see if
the last perspective replies, but we don't want to hold up all of issuance on
a perspective that's hanging or being slow.
@jsha
Copy link
Contributor

jsha commented Oct 7, 2025

This seems like a good idea. My first instinct was that, rather than a sleep past the end of hitting quorum, we should have a certain amount of time past the start of validation. E.g. "We'll wait up to 10 seconds to see all perspectives; after that we'll return early if we hit quorum." The idea being that it would be somewhat more obvious in the impact it would create in our latencies. But this is probably better, since slow hosts are generally slow from all perspectives; so a host that takes 15 seconds to reply should get an additional (e.g.) 2 seconds to reply from the last perspective.

@aarongable
Copy link
Contributor

Yeah, I like this idea. I like that it can mitigate one perspective behaving poorly: if all the other perspectives completed in 500ms, we don't want to wait a full 60s for the last one. Give it a chance, sure, but no need to wait forever if we have what we need. I think this is an elegant compromise between our previous strategy (cancel immediately upon getting enough results) and our current strategy (never cancel).

I think with a config and a default value I'm happy to LGTM this.

@aarongable aarongable marked this pull request as ready for review October 15, 2025 23:35
@aarongable aarongable requested a review from a team as a code owner October 15, 2025 23:35
@aarongable aarongable self-requested a review October 15, 2025 23:35
Copy link
Contributor

@mcpherrinm, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@aarongable
Copy link
Contributor

Requesting review from others because Matt asked me to take this PR over from him.

@letsencrypt letsencrypt deleted a comment from github-actions bot Oct 15, 2025
@mcpherrinm
Copy link
Contributor Author

Thanks!

@aarongable aarongable requested a review from jsha October 16, 2025 21:56
jsha
jsha previously approved these changes Oct 16, 2025
@aarongable aarongable merged commit 4850cf2 into main Oct 21, 2025
13 checks passed
@aarongable aarongable deleted the mattm-timer-slow branch October 21, 2025 16:55
@aarongable
Copy link
Contributor

I've filed IN-11910 to track the corresponding production config changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants