Erlay meta-issue: gradual deployment #11

Open
naumenkogs opened this issue Jan 24, 2022 · 5 comments

naumenkogs commented Jan 24, 2022

Erlay gradual deployment

We cannot expect everyone to enable Erlay at once. Furthermore, it will probably take a couple of years before even half of the network enables it, judging by how quickly users update their nodes.

That’s why we need to understand its impact at different scales of deployment, and potentially tune parameters for the best outcomes.

I define the following configurations.
C10. ~10% deployment
C25. ~25% deployment
C50. ~50% deployment
C75. ~75% deployment
C100. ~100% deployment

I think these configurations can be roughly followed across new releases. Say we see deployment go from 10% to 25% in 2023; then we update the config for new nodes according to the suggestions below.

To modify the deployment percentage in the simulator, update init.2.reconcile_percent in the config.
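For example, simulating the C25 phase would look like this (assuming the same whitespace-separated key-value format as the delay fields shown below):

init.2.reconcile_percent 25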

The configurations will affect only the average in/out delay before flooding a tx to a peer, or before adding it to the reconciliation set. These delays are used to obfuscate the transaction origin from timing analysis.
The relevant simulator fields are:
in_relay_delay_recon_peer/out_relay_delay_recon_peer
These fields apply only to reconciliation-enabled nodes, when they choose the delay for both the reconciling and legacy peers they have. For legacy nodes, the delays are always 5s/2s, as we’re not planning to change that.

The full config (in which only these few fields will be modified based on the configuration) can be found here.

To avoid imbalances (bandwidth, relay speed, etc.), the configs (mainly just the relay delays) should not differ much across phases. We could also switch to other config values once a node locally reaches a certain % of reconciling connections, to make it even more balanced.

Note: for these experiments/settings I use 5 for both in_flood_peers_percent/out_flood_peers_percent. I think this is what we would do in the real network too: as long as at least 25% of nodes are legacy, they would sustain low latency (and this would reduce overall bandwidth slightly).
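For reference, that setting would correspond to something like the following in the config (I’m assuming these fields live under the same init.2. prefix as the delay fields):

init.2.in_flood_peers_percent 5
init.2.out_flood_peers_percent 5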

C10

For anything <= 10%, there will be very few Erlay connections in the network (every Erlay-enabled node will statistically have at most 1), so it’s hard to expect any real savings.
It’s still possible to get real gains if a node manually restricts itself from connecting to non-Erlay nodes (e.g., via the -connect CLI option).

I suggest flood delays of 5s/2s for reconciling nodes.

init.2.in_relay_delay_recon_peer 5000
init.2.out_relay_delay_recon_peer 2000

This gives a small gain on Erlay nodes (7.65 INV per tx), and not much effect beyond that.

Even though there is little real effect here, these nodes will be ready to participate in future reconciliations, which is valuable.

C25

For ~25%, it’s possible to start getting a bit more gain on both node types.

The most balanced configuration (7.89 INV on legacy and 7.49 INV on erlay) happens with the following config:

init.2.in_relay_delay_recon_peer 4000
init.2.out_relay_delay_recon_peer 1500

C50

Even more gains come here: 7.39 INV on legacy and 5.98 INV on erlay nodes, with the following config:

init.2.in_relay_delay_recon_peer 3000
init.2.out_relay_delay_recon_peer 1300

An interesting alternative: we can get 6.68 INV and 6.51 INV with the following configuration:

init.2.in_relay_delay_recon_peer 2000
init.2.out_relay_delay_recon_peer 1000

C75

Gains: 6.29 INV on legacy and 4.29 on erlay nodes with:

init.2.in_relay_delay_recon_peer 2000
init.2.out_relay_delay_recon_peer 1000

Latency

In the 100% Erlay case, we decided to cap flooding at ~10%, since that provides a tolerable latency increase (from 3.5s to 6s), given the other parameters.

All these configurations provide a lower latency than 100% erlay.

Connectivity increase

One of the benefits of Erlay is that it allows increasing connectivity with almost no bandwidth increase.

Say those 50% Erlay nodes with the given config were to use 12 outbound connections instead of 8.
To play with this, update out_peers_recon, and potentially reconciliation_interval too.
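For example, the 12-outbound variant of the experiment would be set roughly like this (assuming out_peers_recon sits under the same init.2. prefix as the other fields; reconciliation_interval left untouched here):

init.2.out_peers_recon 12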

My preliminary experiments show that Erlay-node bandwidth goes up from 5.98 INV to 7.5 per tx, while also slightly increasing the bandwidth of legacy nodes from 7.39 INV to 7.51.

It seems like this is what’s happening: since 50% of the nodes are legacy, 50% of those extra conns would be to legacy nodes. Given that connectivity is increased on half of the nodes, on average this should result in 2 extra connections per node in the network. In legacy flooding, this would mean 2 extra INV per tx.

In our half-Erlay setting, however, this is just (1.52 + 0.12) / 2 = 0.82 extra INV per tx on average. Or, for Erlay nodes alone, it’s 1.52 extra instead of a potential 4.

A smarter thing would probably be to make those new connections Erlay-only. However, I’m unsure whether grouping nodes like that is good for the topology.

TX work

Another goal while picking the Erlay configuration was to avoid imbalancing the workload across nodes.

Along the reachable/private axis, the workload distribution remains the same.

Along the Erlay/legacy axis, legacy nodes take slightly more workload. E.g., in the 50/50 case, they take 1.2 against 0.8 TX messages per tx, across the entire network.

I don’t think this is a big deal: 1) legacy nodes will get (both in/out) INV traffic reduction just from the existence of erlay nodes; 2) the distribution of workload probably already varies a lot, based on the node connectivity, etc.


naumenkogs commented Jan 24, 2022

These results should be confirmed by running a real node, too: run an Erlay node with 7 legacy peers and 1 Erlay peer, with the given delays (and a 4/4 split too), then compare it to a legacy node in terms of bandwidth (split between INV and TX).

The relevant Bitcoin Core constants are INBOUND_INVENTORY_BROADCAST_INTERVAL and OUTBOUND_INVENTORY_BROADCAST_INTERVAL.
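For reference, a rough sketch of how those constants look in net_processing.cpp (the 5s/2s values match the legacy delays mentioned above; the exact declarations may differ between versions):

#include <chrono>
using namespace std::chrono_literals;

// Sketch only; check the actual Bitcoin Core source for the current form.
static constexpr auto INBOUND_INVENTORY_BROADCAST_INTERVAL{5s};
static constexpr auto OUTBOUND_INVENTORY_BROADCAST_INTERVAL{2s};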

The results won't directly reflect the simulation results above, because we won't see any reduction on a legacy node (it happens only at scale).

However, it would be cool to confirm that Erlay does well while some of its peers are legacy.


glozow commented Jan 26, 2022

Thanks for putting this together. I was trying to reason about what percentage of the network we could reasonably expect to upgrade quickly. It looks like >50% of the network had upgraded 6 months after v0.21 was released, but taproot probably had something to do with it...

A smarter thing would probably be to make those new connections Erlay-only.

Just to clarify. Would this mean, as we're increasing the number of default outbound connections on a node, we would only allow reconciling peers for those extra connections?

@naumenkogs

Thanks for putting this together. I was trying to reason about what percentage of the network we could reasonably expect to upgrade quickly. It looks like >50% of the network had upgraded 6 months after v0.21 was released, but taproot probably had something to do with it...

Yeah, possibly. I would expect 25% in a year, and 50% in 2 years. And that's after we make it enabled by default (which is probably not right away).

Just to clarify. Would this mean, as we're increasing the number of default outbound connections on a node, we would only allow reconciling peers for those extra connections?

I didn't mean that, but I think a useful general strategy would be: of all outbounds, no more than 8 should be legacy. Something like that. The rest is implementation details.
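To illustrate, a minimal sketch of that kind of rule (purely hypothetical names and shape, not Bitcoin Core code):

// Hypothetical outbound-slot rule: allow at most 8 legacy (non-reconciling)
// outbound peers; any outbound slots beyond that must support reconciliation.
#include <cstddef>

constexpr std::size_t MAX_LEGACY_OUTBOUND = 8; // cap suggested in the discussion

bool AllowNewOutbound(bool peer_supports_recon, std::size_t current_legacy_outbound)
{
    if (peer_supports_recon) return true;                  // reconciling peers are always fine
    return current_legacy_outbound < MAX_LEGACY_OUTBOUND;  // legacy peers only up to the cap
}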


glozow commented Jan 27, 2022

Along the Erlay/legacy axis, legacy nodes take slightly more workload. E.g., in the 50/50 case, they take 1.2 against 0.8 TX messages per tx, across the entire network.

Another clarification: does this mean for every tx, the ratio of inv messages sent by legacy nodes vs Erlay nodes is 1.2 to 0.8 aka 3 to 2? So legacy nodes are sending 50% more messages than Erlay nodes?


naumenkogs commented Jan 27, 2022

does this mean for every tx, the ratio of inv messages sent by legacy nodes vs Erlay nodes is 1.2 to 0.8 aka 3 to 2? So legacy nodes are sending 50% more messages than Erlay nodes?

No, this is specifically about TX relaying work. Legacy nodes send 1.5x the TX bandwidth (they just take the work off of Erlay nodes).

That can be minimized by reducing out_relay_delay_recon_peer. Then Erlay nodes would be almost as fast as legacy nodes in terms of being first to announce a tx (and thus relaying the full tx).
I think it's also rather safe.
Maybe we should do that.
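For example, something in this direction (just an illustration of reducing the delay, not a tuned value):

init.2.out_relay_delay_recon_peer 500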
