-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EIP-7594: Decouple network subnets from das-core #3832
Conversation
I haven't done a detailed review of how you do the layering here, but I generally support this direction to ensure that core protocol and network are sufficiently decoupled so they can be tuned/optimized independently |
I think this is a good proposal. I believe it has something in common with former 'Network Shards' idea. However it's worth to keep in mind that in this case the gossip 'backbone' is defined rather implicitly (by the upper das core layer) than explicitly. Changing for example |
I like the idea of decoupling the concepts related to custody allocation from the delivery method. My concern is that the only reason I know of for grouping columns in custody is the (supposedly) inefficiency of gossipsub (implementations) in handling a larger number of topics and having smaller topics. Anyway, I think the conceptual decoupling is good to have. Another thing I see in the PR is that you also have FAQ extensions and alike. Haven't seen the details yet, I will try to give it a review later on. |
One more thing: parameter changes are also proposed in #3779 |
From another point of view, that can be seen as a non-networking issue. Let me rephrase your issue. If the number of custody groups is too large, the number of nodes in each group will be so small that you cannot find the peers to cover all the custody groups. I think whether it's an issue on the network layer is quite subjective. However, the whole point of doing DAS is to reduce the bandwidth consumption, so every part of the protocol (including the core one) can be seen as related to the network in some way. |
Another way to decouple the GossipSub topics from the core protocol is not to correspond subnets to GossipSub topics, but think of subnets as groups of nodes. However, we use the term "subnets" to mean GossipSub topics a lot in the past, so it's hard to think of subnets as anything else. |
I have a big concern on that change. Let's discuss it in that PR. |
I miss the method which translates custody group into subnet, I think we need it or I don't understand an idea a bit. So instead of this def compute_subnet_for_data_column_sidecar(column_index: ColumnIndex) -> SubnetID:
return SubnetID(column_index % DATA_COLUMN_SIDECAR_SUBNET_COUNT) we should have something like def compute_subnet_for_custody_group(custody_group: UInt64) -> SubnetID:
assert custody_group < NUMBER_OF_CUSTODY_GROUPS
return SubnetID(custody_group % DATA_COLUMN_SIDECAR_SUBNET_COUNT) And also it would be good to have compile assertion that NUMBER_OF_CUSTODY_GROUPS is set to the number, which could be always translated to DATA_COLUMN_SIDECAR_SUBNET_COUNT (say, we don't have 32 groups and 64 subnets or 33 groups and 32 subnets) |
The idea is that you can get the subnets by calling We don't want to do
What do you mean?
Absolutely, we can have 32 groups and 64 subnets. |
@ppopth now I got this, thank you for the explanation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm from lodestar
I think this is ready to be merged? |
I know this proposal has been around for a while, and I actually reviewed it earlier, but I have some additional thoughts. I’m trying to wrap my head around the benefits of this proposal. It looks like the idea is to allow us to tweak the subnet count without touching the core protocol ( Personally, I’m not entirely convinced that this change is necessary. The more we decouple things, the more complexity we introduce, which might not be ideal. Since we're already looking at up to 128 subnets, it feels like we won’t need to go beyond that in the near future. I’ve heard that research shows it can be more efficient to spread larger chunks of data across multiple subnets rather than sticking to just one. So, if we end up increasing the blob count significantly, I can see how increasing the subnet count might make sense. But for now, I’m not sure there’s a big advantage in going above 128 subnets unless we’re dealing with a much higher blob count. Even with 256 blobs per block, we’re still only sending up to 512kb per block to each subnet. Maybe there’s something I’m missing here and about the future plans - Is there an issue with the current spec, or are we just trying to future-proof for scenarios we’re not certain about yet? |
@jimmygchen Sorry for the late reply. I would like to compare this PR to attestation subnets and attestation committees. In the phase0 spec, we decoupled subnets from committees as well. In real life and as I always heard, no one really mentions committees. People always talk about attestation subnets. In fact, according to the spec, the phase0 core protocol has nothing to do with subnets. They only deal with committees. The subnets are in the network layer, another layer. So in PeerDAS, people may still discuss things in term of subnets (just like the attestation subnets), but I think it doesn't add more complexity since phase0 is already complex (according to your standard, some people may say it's not complex). |
Thanks for the context @ppopth. It's a fair comparison with some differences. Attestations have a target committee size, while with DAS, we’ve got a fixed number of columns/groups. And repeating my earlier comment, I'm not sure we need to design this with increasing to over 128 subnets in mind at this stage, or if there's any reason to reduce the number of subnets after shipping PeerDAS. I feel there might be some tension, but I just wanted to pass along our team's feedback and keep things moving since we talked about scheduling this for |
@jimmygchen Thank you for the feedback.
I think it's reasonable to make the spec agile to changes even if we don't know clearly yet if we want to change the number of subnets or not. Again, just like attestation subnets, the number of subnets and the number of committees have been both 64 for a long time and we don't see whether this will ever change. That reason also applies here. The number of column subnets will probably be equal to the number of custody groups forever, but it has value in the view of agility. What do you think? |
85bdcf6
to
1dd29cc
Compare
Currently we use subnets as a unit of custody in the PeerDAS core protocol because it doesn't make sense to partially custody only some columns in the subnets and waste the bandwidth to download the columns the node doesn't custody. Since subnets correspond to GossipSub topics which are in a layer lower than the core protocol, using subnets as a unit of custody makes the core layer and the network layer too coupled to each other and leave no room for the network layer flexibility. This commit introduces "custody groups" which are used a unit of custody instead of subnets. The immediate benefit of the decoupling is that we can immediately increase the number of subnets without affecting the expected number of peers to cover all columns and affecting the network stability and without touching the core protocol. The reason we want to increase the number of subnets to match the number of columns is that the columns will be propagated through the network faster when they have their own subnets. Just like EIP-4844, each blob has its own subnet because, if all the blobs are in a single subnet, the blobs will be propagated more slowly. Since we keep the number of custody groups the same as the previous number of subnets (32), the expected number of peers you need to cover all the columns is not changed. In fact, you need only NUMBER_OF_COLUMNS and NUMBER_OF_CUSTODY_GROUPS to analyze the expected number, which makes the core protocol completely decoupled from the network layer.
|
||
To start with a simple, stable backbone, for now, we don't shuffle the subnet assignments via the deterministic custody selection helper `get_custody_columns`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs. | ||
No, the number of subnets doesn't really matter. What matters to the network stability is the number of nodes and the churn rate in the network. If the number of the nodes is too low, it's likely to have a network partition when some nodes are down. For the churn rate, if the churn rate is high, we even need to have a higher number of nodes, since nodes are likely to be turned off more often. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.
@dknopik has done some interesting simulation on PeerDAS with 128 column subnets, and the LH-only network (without any supernodes) performs poorly with our current peer count (100), but much better when when target peers is increased to 300. With a small peer count, there's a higher chance that the node doesn't have peers across all column subnets, and fails to publish columns to the network. However, increasing peer count to 300 does increase the resource consumption in LH significantly, and our preference is to increase our peer gradually to ~120. I think there's some work we need to do to improve our peer selection logic for PeerDAS but thought it might be worth mentioning.
I'm wondering if we need to choose a column subnet count where configuring CUSTODY_REQUIREMENT
to a lower number won't cause the network to fall apart, with the existing target peer count?
With this, I'm convinced that decoupling network and das-core do seems like a useful thing - because without decoupling, changing any of the parameters would have impact to BOTH security and networking. If we decouple networking and das-core, we could determine an optimal networking parameters and test different custody parameters, and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.
According to this PR, the target peer count depends on the number of custody groups, right? Yes, before this PR it depends on the number of subnets, but this PR changes the structure and the terminology. WDYT?
It seems that everyone in this PR is already convinced that this PR is useful. So it means it's ready to be merged? |
Currently we use subnets as a unit of custody in the PeerDAS core protocol because it doesn't make sense to partially custody only some columns in the subnets and waste the bandwidth to download the columns the node doesn't custody.
Since subnets correspond to GossipSub topics which are in a layer lower than the core protocol, using subnets as a unit of custody makes the core layer and the network layer too coupled to each other and leave no room for the network layer flexibility.
This commit introduces "custody groups" which are used as a unit of custody instead of subnets.
The immediate benefit of the decoupling is that we can immediately increase the number of subnets without affecting the expected number of peers to cover all columns and affecting the network stability and without touching the core protocol.
The reason we want to increase the number of subnets to match the number of columns is that the columns will be propagated through the network faster when they have their own subnets. Just like EIP-4844, each blob has its own subnet because, if all the blobs are in a single subnet, the blobs will be propagated more slowly.
Since we keep the number of custody groups the same as the previous number of subnets (32), the expected number of peers you need to cover all the columns is not changed. In fact, you need only NUMBER_OF_COLUMNS and NUMBER_OF_CUSTODY_GROUPS to analyze the expected number, which makes the core protocol completely decoupled from the network layer.