Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long running sync process blocks or shuts down RPC/engine API connection after communicating with CL #2816

Open
mjfh opened this issue Nov 1, 2024 · 2 comments
Labels
bug Something isn't working EL RPC JSON-RPC over HTTP/websocket Sync Prevents or affects sync with Ethereum network

Comments

@mjfh
Copy link
Contributor

mjfh commented Nov 1, 2024

Observation:

An EL client on Mainnet was started as

    nimbus_execution_client
      --data-dir:./data
      --tcp-port:30325
      --metrics-port=9099
      --metrics-address=172.16.210.1
      --metrics
      --engine-api=true
      --engine-api-port=8551
      --engine-api-ws=true
      --jwt-secret=/status/tmp/jwtsecret
      --network=mainnet
      --log-level:TRACE

at 2024-10-31 12:51:06.497. On the same machine a CL beacon node has been run as

    nimbus_beacon_node
      --network=mainnet
      --data-dir=build/data/shared_mainnet_0
      --tcp-port=9000
      --udp-port=9000
      --rest
      --rest-port=5052
      --metrics
      --web3-url=http://127.0.0.1:8551
      --jwt-secret=/status/tmp/jwtsecret

The EL client recieved Forkchoice updates which invoked the beacon sync logic. These updates come with log messages like

    DBG 2024-10-31 12:54:04.803 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:52670 len=342
    DBG 2024-10-31 12:54:04.803 Processing JSON-RPC request                file=router.nim:135 id=104 name=engine_forkchoiceUpdatedV3
    INF 2024-10-31 12:54:04.803 Forkchoice requested sync to new head      file=api_forkchoice.nim:107 number=21085479 hash=6ffc9828c615
    DBG 2024-10-31 12:54:04.803 Returning JSON-RPC response                file=router.nim:137 id=104 name=engine_forkchoiceUpdatedV3 len=38
    DBG 2024-10-31 12:54:04.803 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:52676 len=241
    DBG 2024-10-31 12:54:04.803 Processing JSON-RPC request                file=router.nim:135 id=105 name=eth_getLogs
    DBG 2024-10-31 12:54:05.283 Returning JSON-RPC response                file=router.nim:137 id=105 name=eth_getLogs len=2

The last such message block was recorded as

    DBG 2024-10-31 21:15:52.257 Processing JSON-RPC request                file=router.nim:135 id=1067 name=engine_forkchoiceUpdatedV3
    WRN 2024-10-31 21:15:52.257 Forkchoice requested unknown head          file=api_forkchoice.nim:102 hash=d4e78b0217c7
    DBG 2024-10-31 21:15:52.257 Returning JSON-RPC response                file=router.nim:137 id=1067 name=engine_forkchoiceUpdatedV3 len=38
    DBG 2024-10-31 21:15:52.257 Internal error while processing JSON-RPC call topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:84
    DBG 2024-10-31 21:15:52.257 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:58212 len=138590
    DBG 2024-10-31 21:15:52.261 Processing JSON-RPC request                file=router.nim:135 id=1059 name=engine_newPayloadV3
    TRC 2024-10-31 21:15:52.261 Engine API request received                file=api_newpayload.nim:93 meth=newPayload number=21088244 hash=0x22ed6bcaba7bbcd00cc4ca901d920199745a37eae01401a3df5b55b6381c8a86
    DBG 2024-10-31 21:15:52.261 Returning JSON-RPC response                file=router.nim:137 id=1059 name=engine_newPayloadV3 len=20
    DBG 2024-10-31 21:15:52.261 Internal error while processing JSON-RPC call topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:84
    DBG 2024-10-31 21:15:52.261 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:39352 len=144292
    DBG 2024-10-31 21:15:52.261 Processing JSON-RPC request                file=router.nim:135 id=1061 name=engine_newPayloadV3
    TRC 2024-10-31 21:15:52.265 Engine API request received                file=api_newpayload.nim:93 meth=newPayload number=21088245 hash=0xb18e08cad42e05c1c07becac1fbc4cf764b967c33d75d384c75dab2de85e506d
    DBG 2024-10-31 21:15:52.265 Returning JSON-RPC response                file=router.nim:137 id=1061 name=engine_newPayloadV3 len=20
    DBG 2024-10-31 21:15:52.265 Internal error while processing JSON-RPC call topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:84
    DBG 2024-10-31 21:15:52.265 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:36930 len=148497
    DBG 2024-10-31 21:15:52.265 Processing JSON-RPC request                file=router.nim:135 id=1064 name=engine_newPayloadV3
    TRC 2024-10-31 21:15:52.265 Engine API request received                file=api_newpayload.nim:93 meth=newPayload number=21088246 hash=0xd4e78b9326d90e57d18ee521acb7117dbaca21bfc1d013748602b5f6b20217c7
    DBG 2024-10-31 21:15:52.269 Returning JSON-RPC response                file=router.nim:137 id=1064 name=engine_newPayloadV3 len=20
    DBG 2024-10-31 21:15:52.269 Internal error while processing JSON-RPC call topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:84
    DBG 2024-10-31 21:15:52.269 Received JSON-RPC request                  topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:54714 len=108015
    DBG 2024-10-31 21:15:52.269 Processing JSON-RPC request                file=router.nim:135 id=1066 name=engine_newPayloadV3
    TRC 2024-10-31 21:15:52.269 Engine API request received                file=api_newpayload.nim:93 meth=newPayload number=21088247 hash=0xd3cbf11e754f933b3ae91f31bce5afbad86e99046d2571b1fdb7b95e29c213bc
    DBG 2024-10-31 21:15:52.269 Returning JSON-RPC response                file=router.nim:137 id=1066 name=engine_newPayloadV3 len=20
    DBG 2024-10-31 21:15:52.269 Internal error while processing JSON-RPC call topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:84

After that time, no message with the text JSON or Forkchoice can be found until time stamp 2024-11-01 09:31:40.929 where this log ends.

Additional information:

The syncer has stalled after finishing the last job which stopped some hours later with the message

    TRC 2024-11-01 03:05:46.920 RunDaemon: hibernating, awaiting new sync target topics="beacon sync" file=update.nim:158 L=H H=#21085479

and has stalled since (the PR branch with exactly this message might not be available on GitHub yet.)

The CL beacon node repeatedly shows messages like

    INF 2024-11-01 09:34:14.016 No execution client connected; cannot process block payloads topics="gossip_blocks" executionPayload=[..]
    INF 2024-11-01 09:34:21.650+00:00 Database checkpointed                      topics="beacnde" dur=566ms533us207ns
    INF 2024-11-01 09:34:21.650+00:00 Slot end                                   topics="beacnde" slot=10302469 nextActionWait=n/a [..]
    INF 2024-11-01 09:34:23.000+00:00 Slot start                                 topics="beacnde" head=ea64550b:10302469 delay=557us907ns finalized=[..]
    WRN 2024-11-01 09:34:23.464+00:00 Failed to exchange configuration with the configured EL end-points topics="elman" completed=0 failed=0 timed_out=1
    INF 2024-11-01 09:34:25.010+00:00 No execution client connected; cannot process block payloads topics="gossip_blocks" executionPayload=[..]
    INF 2024-11-01 09:34:33.739+00:00 Database checkpointed                      topics="beacnde" dur=657ms387us200ns
    INF 2024-11-01 09:34:33.739+00:00 Slot end                                   topics="beacnde" slot=10302470 nextActionWait=n/a nextAttestationSlot=-1 [..]
    INF 2024-11-01 09:34:35.000+00:00 Slot start                                 topics="beacnde" head=4268cf34:10302470 delay=21us116ns finalized=[..]

The sudo netstat -anp|grep 8551 command on Debian shows something like

    tcp    101   0 127.0.0.1:8551     0.0.0.0:*         LISTEN      2598712/nimbus_exec
    tcp    639   0 127.0.0.1:8551     127.0.0.1:58578   CLOSE_WAIT  - 
    tcp  81524   0 127.0.0.1:8551     127.0.0.1:47708   ESTABLISHED -
    tcp    383   0 127.0.0.1:8551     127.0.0.1:41966   CLOSE_WAIT  -
    tcp    639   0 127.0.0.1:8551     127.0.0.1:41724   CLOSE_WAIT  -
    tcp  81524   0 127.0.0.1:8551     127.0.0.1:57700   ESTABLISHED -
    tcp  81524   0 127.0.0.1:8551     127.0.0.1:48442   ESTABLISHED -
    [..]

with one LISTEN entry, 36 CLOSE_WAIT entries, and 18 ESTABLISED entries.

The peer pool running RLPx and Discovery is still actively communicating.

This phenomeonen has been observed for while. The only successful remedy so far was restarting the EL client. Restarting rhe CL node had no effect. In the past there were massive JWT problems observed which is not the case anymore.

@mjfh mjfh added bug Something isn't working EL RPC JSON-RPC over HTTP/websocket Sync Prevents or affects sync with Ethereum network labels Nov 1, 2024
@mjfh
Copy link
Contributor Author

mjfh commented Dec 11, 2024

Currently on mainnet (as of PR #2926), the sticky CLOSE_WAIT records with the netstat tables are not observed anymore. There are still CLOSE_WAIT entries but they resolve over time. Also the CL does not show errors.

Will keep an eye on that ...

@mjfh
Copy link
Contributor Author

mjfh commented Dec 12, 2024

Error still appears, only not that fast ... on holesky and mainnet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working EL RPC JSON-RPC over HTTP/websocket Sync Prevents or affects sync with Ethereum network
Projects
None yet
Development

No branches or pull requests

2 participants
@mjfh and others