
Conversation

@dnut (Contributor) commented Oct 2, 2025

The current structure of the shred network is copied directly from agave. Each pair of [square brackets] below represents a thread. Agave uses different names, and some of the components that we run on a single thread are duplicated across multiple threads in agave, but otherwise the architecture is the same. I simply copied agave without thinking about how to improve the design.

[turbine socket listener]-->[packet tagger]--\
                                              |-->[shred receiver]-->[shred verifier]-->[shred processor]-->ledger
 [repair socket listener]-->[packet tagger]--/

This design is pointlessly complicated. It makes the code hard to understand and hard to debug. This is a sequence of steps that need to happen in order, and there's no good reason why they shouldn't happen in the same thread.

I've started to reduce the number of threads here by consolidating the logic from the shred receiver, verifier, and processor into a single thread, and by folding the packet tagging into the socket threads. Here's the new approach in the current PR (a rough sketch of the consolidated loop follows the diagram):

[turbine socket listener]--\
                            |-->[shred receiver]-->ledger
 [repair socket listener]--/
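
To make the consolidation concrete, here is a minimal, self-contained sketch of what the single shred receiver thread now does per packet. All of the names here (Packet, Ledger, verifyShred, handlePacket) are illustrative stand-ins, not the actual identifiers in src/shred_network; the point is only that tagging, verification, and processing run as a plain sequence in one place instead of being spread across threads.

```zig
const std = @import("std");

// Illustrative stand-ins for sig's real types; these names are assumptions,
// not the identifiers actually used in src/shred_network.
const Packet = struct {
    data: []const u8,
    /// set by whichever socket listener received it (turbine vs repair),
    /// replacing the old dedicated "packet tagger" threads
    is_repair: bool,
};

const Ledger = struct {
    shreds_inserted: usize = 0,

    fn insert(self: *Ledger, packet: Packet) void {
        _ = packet; // placeholder for the real ledger write
        self.shreds_inserted += 1;
    }
};

/// Placeholder for signature verification, which previously ran on its own thread.
fn verifyShred(packet: Packet) bool {
    return packet.data.len != 0;
}

/// The consolidated receiver logic: what used to be receiver -> verifier ->
/// processor across separate threads is now just a sequence of calls.
fn handlePacket(ledger: *Ledger, packet: Packet) void {
    if (!verifyShred(packet)) return; // drop packets that fail verification
    ledger.insert(packet);
}

pub fn main() void {
    var ledger = Ledger{};
    const packets = [_]Packet{
        .{ .data = "shred bytes", .is_repair = false },
        .{ .data = "", .is_repair = true }, // fails verification, gets dropped
    };
    for (packets) |packet| handlePacket(&ledger, packet);
    std.debug.print("inserted {} shreds\n", .{ledger.shreds_inserted});
}
```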

I'd like to also flatten down the socket threads so that this entire thing becomes a single thread, but that's going to require a more complicated rework of our networking code.

I haven't done much benchmarking yet, but performance appears to be unchanged by this refactor. Before and after this change, sig is able to process about 40 slots' worth of mainnet shreds per second on my computer, running on my home network.

Future optimizations

If, in the future, the number of shreds skyrockets and we really need multiple threads to handle them, that will require some more rework, but not a return to the old design. The old design was overly pipelined. If you need more parallelism, you really only need to define two single-threaded tasks:

  1. receive the shred from the network, verify them, and collect metadata about them
  2. write the shreds into the ledger

You can parallelize task 1 in a thread pool if necessary; a rough sketch of that shape follows below. Currently we have task 1 split into six different threads that all do different things, and I don't see any reason to keep it that way.
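
As an illustration of that possible future shape, here is a minimal sketch with assumed names (nothing from the actual codebase): several task-1 workers receive and verify shreds in parallel and hand them to a single task-2 writer that owns the ledger. The mutex-protected buffer stands in for whatever channel sig would actually use, and the writer runs after the workers join only to keep the sketch short; in practice it would run concurrently and block on the channel.

```zig
const std = @import("std");

/// A toy mutex-protected queue standing in for the channel between the
/// task-1 workers and the single task-2 ledger writer.
const Queue = struct {
    mutex: std.Thread.Mutex = .{},
    items: [64]u32 = undefined,
    len: usize = 0,

    fn push(self: *Queue, shred_id: u32) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.len < self.items.len) {
            self.items[self.len] = shred_id;
            self.len += 1;
        }
    }

    fn pop(self: *Queue) ?u32 {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.len == 0) return null;
        self.len -= 1;
        return self.items[self.len];
    }
};

/// Task 1: receive shreds from the network, verify them, collect metadata.
/// This is the part that can run on as many threads as load demands.
fn receiveAndVerify(queue: *Queue, worker_id: u32) void {
    var i: u32 = 0;
    while (i < 10) : (i += 1) {
        // pretend each iteration received and verified one shred
        queue.push(worker_id * 100 + i);
    }
}

/// Task 2: the single writer that inserts shreds into the ledger.
fn writeToLedger(queue: *Queue) usize {
    var written: usize = 0;
    while (queue.pop()) |_| written += 1; // placeholder for the real insert
    return written;
}

pub fn main() !void {
    var queue = Queue{};
    var workers: [4]std.Thread = undefined;
    for (&workers, 0..) |*thread, id| {
        thread.* = try std.Thread.spawn(.{}, receiveAndVerify, .{ &queue, @as(u32, @intCast(id)) });
    }
    for (workers) |thread| thread.join();
    const written = writeToLedger(&queue);
    std.debug.print("wrote {} shreds to the ledger\n", .{written});
}
```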

dnut added 9 commits August 26, 2025 14:55
the following threads are all eliminated and the logic is run by the shred receiver thread
- two packet handler threads for turbine and repair
- shred verifier thread
- unreachable: assume capacity when no capacity is guaranteed
- double free: deinit items in a recycled list without clearing the list, then deinit again on the next usage of the list
github-project-automation bot moved this to 🏗 In progress in Sig, Oct 2, 2025

codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 56.84211% with 41 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/shred_network/shred_receiver.zig | 46.66% | 40 Missing ⚠️ |
| src/shred_network/service.zig | 93.33% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/net/packet.zig | 100.00% <ø> (ø) |
| src/net/socket_utils.zig | 93.04% <100.00%> (+0.18%) ⬆️ |
| src/shred_network/shred_verifier.zig | 0.00% <ø> (-16.67%) ⬇️ |
| src/utils/bitflags.zig | 100.00% <ø> (ø) |
| src/shred_network/service.zig | 89.90% <93.33%> (-1.90%) ⬇️ |
| src/shred_network/shred_receiver.zig | 44.92% <46.66%> (-2.35%) ⬇️ |

... and 8 files with indirect coverage changes

