Enough functionality to implement adaptive load shedding #6148
An investigation into backpressure issues in the router.
Most of the changes are in various plugins to implement backpressure. However, those fixes are not enough to provide useful functionality...
The current implementation of the router creates a new pipeline for each connection. This has the unfortunate effect of discarding state which various load-limiting layers require in order to work correctly.
This exploration modifies the router to hold a single master pipeline which is cloned for each connection. Because the clones share the master's state, the various tower connection limiting layers can work correctly.
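To illustrate why cloning matters, here is a minimal std-only sketch. It is not the router's code and does not use tower: `ConcurrencyLimit` and both helper functions are hypothetical names invented for this example. It only shows that a limiter rebuilt per connection never sees more than one request, while clones of one master share a counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a concurrency-limiting layer (not a tower type).
/// Cloning it shares the in-flight counter through the `Arc`.
#[derive(Clone)]
struct ConcurrencyLimit {
    in_flight: Arc<AtomicUsize>,
    max: usize,
}

impl ConcurrencyLimit {
    fn new(max: usize) -> Self {
        Self { in_flight: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Try to admit one request; returns false once `max` are in flight.
    fn try_acquire(&self) -> bool {
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < self.max { Some(n + 1) } else { None }
            })
            .is_ok()
    }

    /// Mark one request as finished.
    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// One request per connection, with a brand new limiter per connection:
/// every limiter starts at zero, so nothing is ever rejected.
fn admitted_with_fresh_pipelines(connections: usize, max: usize) -> usize {
    (0..connections)
        .filter(|_| ConcurrencyLimit::new(max).try_acquire())
        .count()
}

/// One request per connection, each connection using a clone of one master:
/// the clones share state, so only `max` requests are admitted.
fn admitted_with_cloned_pipeline(connections: usize, max: usize) -> usize {
    let master = ConcurrencyLimit::new(max);
    (0..connections)
        .filter(|_| master.clone().try_acquire())
        .count()
}
```

With ten connections and a limit of two, the per-connection version admits all ten requests while the cloned version admits only two, which is the behaviour a limit is supposed to have.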
I've got a version which works with standard tower layers, commented out here, but I've also got a potentially more interesting version which uses a load shedder based on Little's Law, which is what is active in this code.
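For readers unfamiliar with it, Little's Law states that the average number of requests in a system equals the arrival rate multiplied by the average time each request spends there (L = λW). The sketch below shows one way that relationship could drive a shedder. It is an assumption-laden illustration, not the implementation in this PR: `little_limit`, `LoadShedder` and `Overloaded` are hypothetical names, and the throughput is passed in as a fixed number rather than measured.

```rust
use std::time::Duration;

/// Little's Law: L = lambda * W.
/// Given a sustainable throughput (requests per second) and a target latency,
/// return how many requests may be in the system at once (at least 1).
fn little_limit(throughput_per_sec: f64, target_latency: Duration) -> usize {
    let l = throughput_per_sec * target_latency.as_secs_f64();
    (l.floor() as usize).max(1)
}

/// Error returned when a request is shed.
#[derive(Debug, PartialEq)]
struct Overloaded;

/// Hypothetical shedder: rejects work once the Little's Law limit is reached.
struct LoadShedder {
    limit: usize,
    in_flight: usize,
}

impl LoadShedder {
    fn new(throughput_per_sec: f64, target_latency: Duration) -> Self {
        Self { limit: little_limit(throughput_per_sec, target_latency), in_flight: 0 }
    }

    /// Admit a request if there is room, otherwise shed it.
    fn try_admit(&mut self) -> Result<(), Overloaded> {
        if self.in_flight < self.limit {
            self.in_flight += 1;
            Ok(())
        } else {
            Err(Overloaded)
        }
    }

    /// Record that an admitted request has completed.
    fn complete(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

For example, at 40 requests per second and a 250 ms latency target the limit works out to 10 requests in flight; an eleventh is rejected until one of the first ten completes. A real shedder would presumably keep updating its throughput and latency estimates as load changes.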
Notes:
Modifying the pipeline to be cloneable has generally worked fine, but it has caused issues for the Limit layer. This layer looks "generally problematic" since it appears to make a number of assumptions about what request rejection actually means. I've done some minimal modification to try to make it work in a cloned pipeline, but tests are still failing and I'm not sure it does what it should do.
I also noticed that when implementing backpressure, various mock tests needed to be modified since test rejection happened earlier in the pipeline and a map_result() somewhere isn't triggered. That needs some investigation, but I think it's a small problem to address.
I modified the bridge query planner pool to prevent excessive queueing in this layer. Since I now want to control queueing by load shedding before this service is reached, I only want enough channel capacity to support the number of planners.
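The last note, about sizing the planner pool's channel, can be sketched with a bounded channel from the standard library. This is only an illustration of the idea: the router's pool is asynchronous and uses its own channel type, and `enqueue_without_workers` is a hypothetical helper that deliberately never drains the queue so the capacity limit is visible.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Offer `requests` jobs to a queue whose capacity equals the number of
/// planners, with no planner consuming. Returns (queued, rejected).
fn enqueue_without_workers(planners: usize, requests: usize) -> (usize, usize) {
    // Bounded channel: at most `planners` jobs can wait at once.
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(planners);
    let mut queued = 0;
    let mut rejected = 0;
    for id in 0..requests {
        match tx.try_send(id) {
            Ok(()) => queued += 1,
            // The queue is full: push back instead of buffering more work.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    (queued, rejected)
}
```

With four planners and ten requests, four jobs are queued and six are turned away immediately, so memory use in this layer stays bounded by the planner count and the decision about excess load is left to the shedder upstream.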
Summary:
This PR provides a router which will operate with approximately the same performance as the base router, but which controls memory and rejects excess load to prevent "over-commit" by the router. This is a very desirable property.
More testing is required, but this is looking promising so far.