refactor(shred-network): consolidate threads #980
Draft
The current structure of the shred network is copied directly from agave. Each pair of [square brackets] represents a thread. In agave they use different names, and some components that agave duplicates across multiple threads we run on a single thread, but otherwise the architecture is the same. I simply copied agave without thinking about how to improve the design.

This design is pointlessly complicated. It makes the code hard to understand and hard to debug. This is a sequence of steps that need to happen in order, and there's no good reason why they shouldn't all happen on the same thread.
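To make that concrete, the old layout is roughly the shape sketched below. This is only an illustrative sketch (Rust rather than Zig, and not the actual sig or agave code): each stage lives on its own thread and hands packets to the next stage over a channel.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (recv_tx, recv_rx) = mpsc::channel::<Vec<u8>>();
    let (verified_tx, verified_rx) = mpsc::channel::<Vec<u8>>();

    // Stage 1: the receiver thread takes packets off the socket (stubbed here)
    // and forwards them to the verifier.
    let receiver = thread::spawn(move || {
        for packet in [vec![1u8, 2, 3], vec![], vec![4u8, 5]] {
            recv_tx.send(packet).unwrap();
        }
    });

    // Stage 2: the verifier thread checks each packet (stubbed as a non-empty
    // check) and forwards only the ones that pass.
    let verifier = thread::spawn(move || {
        while let Ok(packet) = recv_rx.recv() {
            if !packet.is_empty() {
                verified_tx.send(packet).unwrap();
            }
        }
    });

    // Stage 3: the processor thread handles verified shreds (e.g. inserts them).
    let processor = thread::spawn(move || {
        while let Ok(packet) = verified_rx.recv() {
            println!("processing {} bytes", packet.len());
        }
    });

    receiver.join().unwrap();
    verifier.join().unwrap();
    processor.join().unwrap();
}
```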
I've started to reduce the number of threads here by consolidating the logic from the shred receiver, verifier, and processor into a single thread, and by consolidating the packet tagging into the socket threads. Here's the new approach I have in the current PR:
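Roughly, the consolidation looks like the sketch below (again an illustrative Rust sketch under the same caveats, not the actual sig code): receive, verify, and process run back to back in one loop on a single thread, with the socket threads still feeding it.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (from_sockets, incoming) = mpsc::channel::<Vec<u8>>();

    // One thread now runs the whole sequence: take the next tagged packet from
    // the socket threads, verify it (stubbed as a non-empty check), process it.
    let shred_thread = thread::spawn(move || {
        while let Ok(packet) = incoming.recv() {
            if packet.is_empty() {
                continue; // failed verification, drop it
            }
            println!("processing {} bytes", packet.len()); // insert / repair logic
        }
    });

    from_sockets.send(vec![1u8, 2, 3]).unwrap();
    drop(from_sockets); // closing the channel lets the loop (and thread) exit
    shred_thread.join().unwrap();
}
```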
I'd like to also flatten down the socket threads so that this entire thing becomes a single thread, but that's going to require a more complicated rework of our networking code.
I haven't done much benchmarking yet, but performance appears to be unchanged by this refactor. Before and after this change, sig is able to process about 40 slots' worth of mainnet shreds per second on my computer running on my home network.
Future optimizations
If, in the future, the number of shreds skyrockets and we really do need multiple threads to handle them, that will require some more rework, but not a return to the old design. The old design was overly pipelined. If you need more parallelism, you really only need to define two single-threaded tasks:
You can parallelize task 1 in a thread pool if necessary. Currently we have task 1 split into six different threads that all do different things, and I don't see any reason to keep it that way.
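As a hedged sketch of what that could look like (illustrative Rust only; the precise boundary between the two tasks isn't spelled out here), the per-shred work of task 1 is fanned out across a pool of identical workers sharing one input channel, rather than being re-split into a pipeline of distinct stages:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    let (work_tx, work_rx) = mpsc::channel::<Vec<u8>>();
    let work_rx = Arc::new(Mutex::new(work_rx)); // one queue shared by all workers
    let (out_tx, out_rx) = mpsc::channel::<usize>();

    // Task 1: a pool of identical workers, each running the full per-shred work
    // (receive, verify, process) instead of one stage of a pipeline.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                let packet = match work_rx.lock().unwrap().recv() {
                    Ok(p) => p,
                    Err(_) => break, // queue closed, shut down
                };
                // verify + process the shred here (stubbed)
                out_tx.send(packet.len()).unwrap();
            })
        })
        .collect();
    drop(out_tx); // main keeps no sender, so out_rx ends once the workers exit

    for i in 1..=8 {
        work_tx.send(vec![0u8; i]).unwrap();
    }
    drop(work_tx); // no more work; workers drain the queue and stop

    // The second, sequential task would consume the results here.
    for len in out_rx {
        println!("handled a {len}-byte packet");
    }
    for worker in workers {
        worker.join().unwrap();
    }
}
```

The point of the pool over the old pipeline is that every worker does the same complete job, so adding or removing workers changes throughput without changing the structure of the code.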