Erlay meta-issue: gradual deployment #11

Open
naumenkogs opened this issue Jan 24, 2022 · 5 comments

naumenkogs commented Jan 24, 2022

Erlay gradual deployment

We cannot expect everyone to enable Erlay at once. Furthermore, it will probably take a couple of years before even half of the network enables it, judging by how quickly users update their nodes.

That’s why we need to understand its impact at different scales of deployment, and potentially tune parameters for the best outcomes.

I define the following configurations.
C10. ~10% deployment
C25. ~25% deployment
C50. ~50% deployment
C75. ~75% deployment
C100. ~100% deployment

I think these configurations can be roughly followed across new releases. Say we see deployment go from 10% to 25% in 2023; then we update the config for new nodes according to the suggestions below.

To modify the deployment percentage in the simulator, update init.2.reconcile_percent in the config.
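For example, simulating the C25 phase would look like this (assuming the same whitespace-separated key-value format as the delay fields shown below):

init.2.reconcile_percent 25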

The configurations will affect only the average in/out delay before flooding a tx to a peer, or before adding it to the reconciliation set. These delays are used to obfuscate the transaction origin from timing analysis.
The relevant simulator fields are:
in_relay_delay_recon_peer/out_relay_delay_recon_peer
These fields apply only to reconciliation-enabled nodes, when they choose the delay for both the reconciling and legacy peers they have. For legacy nodes, the delays are always 5s/2s, as we’re not planning to change that.

The full config (in which only these few fields will be modified based on the configuration) can be found here.

To avoid imbalances (bandwidth, relay speed, etc.), the configs (mainly just the relay delays) should not differ much across phases. We could also switch to other config values once a node locally reaches a certain % of reconciling connections, to make it even more balanced.

Note: for these experiments/settings I use 5 for both in_flood_peers_percent/out_flood_peers_percent. I think this is what we would do in the real network too: as long as at least 25% of nodes are legacy, they would sustain low latency (and this would reduce overall bandwidth slightly).
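For reference, that setting would correspond to something like the following in the config (I’m assuming these fields live under the same init.2. prefix as the delay fields):

init.2.in_flood_peers_percent 5
init.2.out_flood_peers_percent 5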

C10

For anything <= 10%, there will be very few Erlay connections in the network (every Erlay-enabled node will statistically have at most 1), so it’s hard to expect any real savings.
It’s still possible to get real gains if a node manually restricts itself from connecting to non-Erlay nodes (e.g., via the -connect CLI option).

I suggest flood delays of 5s/2s for reconciling nodes.

init.2.in_relay_delay_recon_peer 5000
init.2.out_relay_delay_recon_peer 2000

This gives a small gain on Erlay nodes (7.65 INV per tx), and not much effect beyond that.

Even though there is little real effect here, these nodes will be ready to participate in future reconciliations, which is valuable.

C25

For ~25%, it’s possible to start getting a bit more gain on both node types.

The most balanced configuration (7.89 INV on legacy and 7.49 INV on erlay) happens with the following config:

init.2.in_relay_delay_recon_peer 4000
init.2.out_relay_delay_recon_peer 1500

C50

Even more gains come here: 7.39 INV on legacy and 5.98 INV on erlay nodes, with the following config:

init.2.in_relay_delay_recon_peer 3000
init.2.out_relay_delay_recon_peer 1300

An interesting alternative: we can get 6.68 INV and 6.51 INV with the following configuration:

init.2.in_relay_delay_recon_peer 2000
init.2.out_relay_delay_recon_peer 1000

C75

Gains: 6.29 INV on legacy and 4.29 on erlay nodes with:

init.2.in_relay_delay_recon_peer 2000
init.2.out_relay_delay_recon_peer 1000

Latency

In the 100% Erlay case, we decided to cap flooding at ~10%, since that provides a tolerable latency increase (from 3.5s to 6s), given the other parameters.

All these configurations provide a lower latency than 100% erlay.

Connectivity increase

One of the benefits of Erlay is that it allows increasing connectivity with almost no bandwidth increase.

Say those 50% Erlay nodes with the given config were to use 12 outbound connections instead of 8.
To play with this, update out_peers_recon, and potentially reconciliation_interval too.
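For example, the 12-outbound variant of the experiment would be set roughly like this (assuming out_peers_recon sits under the same init.2. prefix as the other fields; reconciliation_interval left untouched here):

init.2.out_peers_recon 12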

My preliminary experiments show that Erlay-node bandwidth goes up from 5.98 INV to 7.5 per tx, while also slightly increasing the bandwidth of legacy nodes from 7.39 INV to 7.51.

It seems like this is what’s happening: since 50% of the nodes are legacy, 50% of those extra conns would be to legacy nodes. Given that connectivity is increased on half of the nodes, on average this should result in 2 extra connections per node in the network. In legacy flooding, this would mean 2 extra INV per tx.

In our half-Erlay setting, however, this is just (1.52 + 0.12) / 2 = 0.82 extra INV per tx on average. Or, for Erlay nodes alone, it’s 1.52 extra instead of a potential 4.

A smarter thing would probably be to make those new connections Erlay-only. However, I’m unsure whether grouping nodes like that is good for the topology.

TX work

Another goal while picking the Erlay configuration was to avoid imbalancing the workload across nodes.

Along the reachable/private axis, the workload distribution remains the same.

Along the Erlay/legacy axis, legacy nodes take slightly more workload. E.g., in the 50/50 case, they take 1.2 against 0.8 TX messages per tx, across the entire network.

I don’t think this is a big deal: 1) legacy nodes will get (both in/out) INV traffic reduction just from the existence of erlay nodes; 2) the distribution of workload probably already varies a lot, based on the node connectivity, etc.


naumenkogs commented Jan 24, 2022

These results should be confirmed by running a real node, too: run an Erlay node with 7 legacy peers and 1 Erlay peer, with the given delays (and a 4/4 split too), then compare it to a legacy node in terms of bandwidth (split between INV and TX).

The relevant Bitcoin Core constants are INBOUND_INVENTORY_BROADCAST_INTERVAL and OUTBOUND_INVENTORY_BROADCAST_INTERVAL.
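For reference, a rough sketch of how those constants look in net_processing.cpp (the 5s/2s values match the legacy delays mentioned above; the exact declarations may differ between versions):

#include <chrono>
using namespace std::chrono_literals;

// Sketch only; check the actual Bitcoin Core source for the current form.
static constexpr auto INBOUND_INVENTORY_BROADCAST_INTERVAL{5s};
static constexpr auto OUTBOUND_INVENTORY_BROADCAST_INTERVAL{2s};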

The results won't directly reflect the simulation results above, because we won't see any reduction on a legacy node (it happens only at scale).

However, it would be cool to confirm that Erlay does well while some of its peers are legacy.


glozow commented Jan 26, 2022

Thanks for putting this together. I was trying to reason about what percentage of the network we could reasonably expect to upgrade quickly. It looks like >50% of the network had upgraded 6 months after v0.21 was released, but taproot probably had something to do with it...

A smarter thing would probably be to make those new connections Erlay-only.

Just to clarify. Would this mean, as we're increasing the number of default outbound connections on a node, we would only allow reconciling peers for those extra connections?

@naumenkogs

Thanks for putting this together. I was trying to reason about what percentage of the network we could reasonably expect to upgrade quickly. It looks like >50% of the network had upgraded 6 months after v0.21 was released, but taproot probably had something to do with it...

Yeah, possibly. I would expect 25% in a year, and 50% in 2 years. And that's after we make it enabled by default (which is probably not right away).

Just to clarify. Would this mean, as we're increasing the number of default outbound connections on a node, we would only allow reconciling peers for those extra connections?

I didn't mean that, but I think a useful general strategy would be: of all outbounds, no more than 8 should be legacy. Something like that. The rest is implementation details.
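To illustrate, a minimal sketch of that kind of rule (purely hypothetical names and shape, not Bitcoin Core code):

// Hypothetical outbound-slot rule: allow at most 8 legacy (non-reconciling)
// outbound peers; any outbound slots beyond that must support reconciliation.
#include <cstddef>

constexpr std::size_t MAX_LEGACY_OUTBOUND = 8; // cap suggested in the discussion

bool AllowNewOutbound(bool peer_supports_recon, std::size_t current_legacy_outbound)
{
    if (peer_supports_recon) return true;                  // reconciling peers are always fine
    return current_legacy_outbound < MAX_LEGACY_OUTBOUND;  // legacy peers only up to the cap
}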


glozow commented Jan 27, 2022

Along the Erlay/legacy axis, legacy nodes take slightly more workload. E.g., in the 50/50 case, they take 1.2 against 0.8 TX messages per tx, across the entire network.

Another clarification: does this mean for every tx, the ratio of inv messages sent by legacy nodes vs Erlay nodes is 1.2 to 0.8 aka 3 to 2? So legacy nodes are sending 50% more messages than Erlay nodes?


naumenkogs commented Jan 27, 2022

does this mean for every tx, the ratio of inv messages sent by legacy nodes vs Erlay nodes is 1.2 to 0.8 aka 3 to 2? So legacy nodes are sending 50% more messages than Erlay nodes?

No, this is specifically about TX relaying work. Legacy nodes send 1.5x the TX bandwidth (they just take the work off of Erlay nodes).

That can be minimized by reducing out_relay_delay_recon_peer. Then Erlay nodes would be almost as fast as legacy nodes in terms of being first to announce a tx (and thus relaying the full tx).
I think it's also rather safe.
Maybe we should do that.
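For example, something in this direction (just an illustration of reducing the delay, not a tuned value):

init.2.out_relay_delay_recon_peer 500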
