Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lighthouse proposer-reorgs feature causes proposing missed #5437

Closed
SimonSMH1015 opened this issue Mar 19, 2024 · 7 comments
Closed

lighthouse proposer-reorgs feature causes proposing missed #5437

SimonSMH1015 opened this issue Mar 19, 2024 · 7 comments

Comments

@SimonSMH1015
Copy link

Description

Please provide a brief description of the issue.

Lighthouse client has a propose featrue that they prefer to do a reorg once some rules are satisfied. Last week (Mar-15-2024 17:14:11 UTC+0) at slot 8641569 we encounted a propose miss due to proposer-reorgs feature. We connected two ssv operator with aound 500 validators to a lighthose mainnet full node. And the validator with index 990976 are set to propose block at slot 8641569.

Because the previous slot was considered as a weak head, which triggerd the lighthouse proposer-reorgs feature, it attempt to do a reorg and took 500 milliseconds longer than usual to propose. At the same time, because our node is connected to the SSV operator, it is slower than the directly connected validator when requesting the blinded block . This lead to block was broadcast 2380 milliseconds after the start of the propose.

However, the next proposer, proposer of slot 8641570, which is alao a lighthouse client, should think our block as a late block and it should use the proposer-reorgs feature to do the reorg again. As a result, slot 8641569 was reorged and we got a propose miss.

Version

Please provide your Lighthouse and Rust version. Are you building from
stable or unstable, which commit?

5.1.0 stable

Present Behaviour

Describe the present behaviour of the application, with regards to this
issue.

Lighthosue tried to do a reorg at 8641569.

Mar 15 17:14:09.132 INFO Prepared beacon proposer                parent_root: 0xd64445f20beeee955a399c3d2d36d885a53d56a3f45d8bca9cc61c1a8bc36fff, validator: 990976, prepare_slot: 8641569, service: beacon
Mar 15 17:14:11.394 INFO Attempting re-org due to weak head      threshold_weight: 195602481250000, head_weight: 0, parent: 0xd64445f20beeee955a399c3d2d36d885a53d56a3f45d8bca9cc61c1a8bc36fff, weak_head: 0x07252935047cfb8ed2c78cb343b976cffc313e9927549975c232c47e808b38bc, service: beacon

but took longer time than usual, which caused a delayed broadcast.

Mar 15 17:14:11.395 INFO Proposing block to re-org current head  head_to_reorg: 0x0725…38bc, slot: 8641569, service: beacon
Mar 15 17:14:11.672 INFO Requesting blinded header from connected builder, parent_hash: 0x2e76661b14aebdad9c56a2c1cd24f2b5fb803da77b0d830e984ddaf675c01dbe, pubkey: 0xa6af3fdbf1369e7836ffb219f187306ba262caca38b981c42b05ed3f45c24ec811782436a821f1572ab3ca522ddd4746, slot: Slot(8641569), service: exec
Mar 15 17:14:12.641 INFO Requested blinded execution payload     parent_hash: 0x2e76661b14aebdad9c56a2c1cd24f2b5fb803da77b0d830e984ddaf675c01dbe, local_response_ms: 20, local_fee_recipient: 0xaecd92aec5bfbe2f5a02db2dee90733897360983, relay_response_ms: 968, relay_fee_recipient: 0x88c6c46ebf353a52bdbab708c23d0c81daa8134a, service: exec
Mar 15 17:14:12.641 INFO Received local and builder payloads     parent_hash: 0x2e76661b14aebdad9c56a2c1cd24f2b5fb803da77b0d830e984ddaf675c01dbe, local_block_hash: 0xa6e83f808dd427f070872f969b5553ff3fae6496f5088163b6c7f53ae1a2c299, relay_block_hash: 0x168f6a50367e3a44edc8303d95a46eaa1dda7043753dc1d944c889e61d6b21dd, service: exec
Mar 15 17:14:12.642 INFO Relay block is more profitable than local block, builder_boost_factor: None, boosted_relay_value: 36461374838835401, relay_value: 36461374838835401, local_block_value: 34093644618030147, service: exec
Mar 15 17:14:13.384 ERRO Block broadcast was delayed             root: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, slot: 8641569, delay_ms: 2384, provenance: builder, msg: system may be overloaded, block may be orphaned
Mar 15 17:14:17.007 INFO Synced                                  slot: 8641569, block:    …  empty, epoch: 270049, finalized_epoch: 270047, finalized_root: 0xea35…1a14, exec_hash: 0x9eab…cfa5 (verified), peers: 107, service: slot_notifier

And the next proposer with a lighthouse client sucessfully performed another reorg at slot 8641570

Mar 15 17:14:18.371 INFO New block received                      root: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, slot: 8641569
Mar 15 17:14:18.472 WARN Builder failed to reveal payload        parent_hash: "0x2e76…1dbe", block_root: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, relay_response_ms: 5087, error: Builder(ServerMessage(ErrorMessage { code: 502, message: "no successful relay response", stacktraces: [] })), info: this is common behaviour for some builders and may not indicate an issue, service: exec
Mar 15 17:14:18.474 WARN Error processing HTTP API request       method: POST, path: /eth/v2/beacon/blinded_blocks, status: 500 Internal Server Error, elapsed: 5.126038349s
Mar 15 17:14:22.027 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 8641569, new_head: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, previous_slot: 8641568, previous_head: 0x07252935047cfb8ed2c78cb343b976cffc313e9927549975c232c47e808b38bc, service: beacon
Mar 15 17:14:23.672 WARN Beacon chain re-org                     reorg_distance: 2, new_slot: 8641568, new_head: 0x07252935047cfb8ed2c78cb343b976cffc313e9927549975c232c47e808b38bc, previous_slot: 8641569, previous_head: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, service: beacon
Mar 15 17:14:25.587 INFO New block received                      root: 0x92948e2c25e11cea1fd48f65ecee25ea83203cf5dd04981f3cc9ded0a90cc4a1, slot: 8641570
Mar 15 17:14:29.015 INFO Synced                                  slot: 8641570, block:    …  empty, epoch: 270049, finalized_epoch: 270047, finalized_root: 0xea35…1a14, exec_hash: 0x9eab…cfa5 (verified), peers: 110, service: slot_notifier
Mar 15 17:14:36.952 INFO New block received                      root: 0xfe94c837acd9f095e6f9fcd19d6a1e060eefdf3e8b7a44aaa61c24a705a234cc, slot: 8641571
Mar 15 17:14:41.001 INFO Synced                                  slot: 8641571, block: 0xfe94…34cc, epoch: 270049, finalized_epoch: 270047, finalized_root: 0xea35…1a14, exec_hash: 0x40dd…7461 (verified), peers: 101, service: slot_notifier

and lead to my validator got a propose miss at slot 8641569.
proposeMiss

Expected Behaviour

How should the application behave?

Not sure if we should not try a propose re-org at slot 8641569.

Steps to resolve

Please describe the steps required to resolve this issue, if known.

@michaelsproul
Copy link
Member

Can you share some logs from mev-boost during this time?

It actually seems like there should have been enough time for the block to propagate. 1.5s+ (from 2.3s to 4s) should be enough

I'll check some more metrics tomorrow to see at what time the block arrived on other nodes. It might have been borderline on time

The SSV thing might have slowed this down a bit too. I'll have a closer look at timestamps to see how long signing took

@SimonSMH1015
Copy link
Author

Can you share some logs from mev-boost during this time?

It actually seems like there should have been enough time for the block to propagate. 1.5s+ (from 2.3s to 4s) should be enough

I'll check some more metrics tomorrow to see at what time the block arrived on other nodes. It might have been borderline on time

The SSV thing might have slowed this down a bit too. I'll have a closer look at timestamps to see how long signing took

No problem. If you need more detailed logs. I would like to post the files on discord.

time="2024-03-16T01:14:07.236+08:00" level=info msg="http: GET / 200" duration=0.000036 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:17.503+08:00" level=info msg="http: GET / 200" duration=0.000032 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:27.33+08:00" level=info msg="http: GET / 200" duration=0.000045 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:35.12+08:00" level=debug msg=registerValidator method=registerValidator version=1.7
time="2024-03-16T01:14:35.716+08:00" level=info msg="http: POST /eth/v1/builder/validators 200" duration=0.595332 method=POST path=/eth/v1/builder/validators status=200 version=1.7
time="2024-03-16T01:14:37.46+08:00" level=info msg="http: GET / 200" duration=0.000039 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:37.623+08:00" level=warning msg="error calling registerValidator on relay" error="HTTP error response: 400 / {\"code\":400,\"message\":\"failed to read request body\"}
" method=registerValidator numRegistrations=490 ua=Lighthouse/v5.1.0-10a38a8 url="https://aestus.live/eth/v1/builder/validators" version=1.7
time="2024-03-16T01:14:37.649+08:00" level=warning msg="error calling registerValidator on relay" error="HTTP error response: 400 / {\"code\":400,\"message\":\"failed to read request body\"}
" method=registerValidator numRegistrations=490 ua=Lighthouse/v5.1.0-10a38a8 url="https://relay.ultrasound.money/eth/v1/builder/validators" version=1.7
time="2024-03-16T01:14:47.234+08:00" level=info msg="http: GET / 200" duration=0.000036 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:57.233+08:00" level=info msg="http: GET / 200" duration=0.000040 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:14:59.082+08:00" level=debug msg=registerValidator method=registerValidator version=1.7
time="2024-03-16T01:14:59.915+08:00" level=info msg="http: POST /eth/v1/builder/validators 200" duration=0.832610 method=POST path=/eth/v1/builder/validators status=200 version=1.7
time="2024-03-16T01:15:01.01+08:00" level=warning msg="error calling registerValidator on relay" error="HTTP error response: 400 / {\"code\":400,\"message\":\"failed to read request body\"}
" method=registerValidator numRegistrations=500 ua=Lighthouse/v5.1.0-10a38a8 url="https://relay.ultrasound.money/eth/v1/builder/validators" version=1.7
time="2024-03-16T01:15:02.193+08:00" level=warning msg="error calling registerValidator on relay" error="Post \"https://mainnet-relay.securerpc.com/eth/v1/builder/validators\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" method=registerValidator numRegistrations=500 ua=Lighthouse/v5.1.0-10a38a8 url="https://mainnet-relay.securerpc.com/eth/v1/builder/validators" version=1.7
time="2024-03-16T01:15:07.262+08:00" level=info msg="http: GET / 200" duration=0.000050 method=GET path=/ status=200 version=1.7
time="2024-03-16T01:15:17.579+08:00" level=info msg="http: GET / 200" duration=0.000034 method=GET path=/ status=200 version=1.7

@michaelsproul
Copy link
Member

Some more observations:

At 1.642s into the slot, Lighthouse has the payload from the builder:

Mar 15 17:14:12.642 INFO Relay block is more profitable than local block

There is very little that needs to happen in the BN after this point, so I suspect the block was finished being built after ~1.6s.

The broadcast warning is logged at 2.384s:

Mar 15 17:14:13.384 ERRO Block broadcast was delayed root: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638, slot: 8641569, delay_ms: 2384, provenance: builder, msg: system may be overloaded, block may be orphaned

The delay_ms is calculated based on when the block was first seen, so this indicates there
was a delay of up to 2.384 - 1.642s = 740ms for block signing. This delay would incorporate the time for the signature itself, as well as the SSV consensus that needs to occur (probably the bulk of it). I need more BN logs at debug level to confirm the timing of:

  • DEBG Processed HTTP API request with path: /eth/v2/validator/blocks/8641569 (block building is complete).
  • DEBG Signed block received in HTTP API, which indicates the time the signed block first arrives to be published.

Looking at my nodes, I can see there's even more delay beyond 2.38s before the block is received by peers. The earliest it was received by the nodes I checked was 7.57s into the slot, i.e. 5.2s after your node attempted to publish it:

Mar 15 17:14:18.586 DEBG Gossip block arrived late block_delay: 7.575601611s, slot: 8641569, proposer_index: 990976, block_root: 0x2776aa18b35049d8655fc47e19f701a2573fd794b269f0f6bd6ed2dd0d3c7638

This 5s delay either came from:

  1. Lighthouse being slow to send the block to mev-boost
  2. Mev-boost being slow to send the block to the relay
  3. The relay being slow to publish the block to the network

I think the most likely is (3), but it would be good to rule out (1) and (2) with more mev-boost logs. Unfortunately the mev-boost logs you've posted are not useful as they don't show the request from mev-boost to the relay to unblind the block. Can you check again and send me more mev-boost logs?

I'm @sproul on Discord and I'm happy for you to DM me. Please send me debug logs from the beacon node (less important) and more mev-boost logs (more important).

@michaelsproul
Copy link
Member

Related (includes full mev-boost logs):

@michaelsproul
Copy link
Member

@SimonSMH1015 do you mind sharing some SSV logs as well? It would be interesting to investigate the 700ms delay there too

@SimonSMH1015
Copy link
Author

@SimonSMH1015 do you mind sharing some SSV logs as well? It would be interesting to investigate the 700ms delay there too

Sent you on Discord

@michaelsproul
Copy link
Member

michaelsproul commented Apr 5, 2024

Closing this, as it should be resolved now that the bloxroute's issues are resolved. Please open a new issue or reopen this issue if you get another missed block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants