Frequent Delays and Unrecoverable State in Archive Node for Base Network #13353

Open
maximalvido opened this issue Dec 11, 2024 · 0 comments

Describe the bug

Also reported in reth

We are running a Base archive node using reth and op-node. The node frequently encounters delays and never recovers, requiring a manual restart approximately once an hour. This impacts our ability to maintain a reliable archive node.

We have a script that checks the latest block's timestamp and raises alerts when the node is delayed.
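
For readers who want to reproduce the check, here is a minimal sketch of such a lag monitor. It is not the script used in this report. The function names, the 60-second threshold, and the choice to query the Base HTTP RPC (port 8549 in the compose file below) are assumptions made for illustration. `eth_getBlockByNumber` is the standard Ethereum JSON-RPC method.

```python
import json
import urllib.request


def block_lag_seconds(block, now):
    """Seconds between `now` (Unix time) and the block's timestamp.

    JSON-RPC returns the timestamp as a hex string, e.g. "0x6751a3e2".
    """
    return now - int(block["timestamp"], 16)


def is_delayed(block, now, max_lag_s=60.0):
    """True when the block is older than `max_lag_s` (threshold is an assumption)."""
    return block_lag_seconds(block, now) > max_lag_s


def fetch_latest_block(rpc_url, timeout_s=5.0):
    """Fetch the latest block header via the standard eth_getBlockByNumber call."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": ["latest", False],  # False: transaction hashes only
    }).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)["result"]
```

A monitor could call `is_delayed(fetch_latest_block("http://localhost:8549"), time.time())` on a timer and alert when it returns `True`. The URL assumes the published port from the compose file.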

(Five images attached.)

Hardware Specs:

  • Processor: AMD EPYC™ 9454P (48 cores)
  • Storage: 3x SAMSUNG MZQL27T6HBLA-00A07
  • RAM: 256 GB (8x SAMSUNG M321R4GA3BB6-CQKET)

Environment Details

  • Reth Version: ghcr.io/paradigmxyz/op-reth:v1.1.0
  • Op-node Version: us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.9.1
  • Docker: Using docker-compose for service orchestration
  • Host OS: Ubuntu 24.04.1 LTS

Steps to reproduce

  1. Run the services using the provided docker-compose.yml configuration.

Docker compose:

version: '3.9'

services:
  reth:
    user: root
    image: my/reth:v1.0.8
    ports:
      - '8545:8545' 
      - '8546:8546' 
    volumes:
      - /opt/reth/data:/root/.local/share/reth/mainnet/
      - /opt/reth/jwttoken:/root/jwt:ro
    command: >
      node
      --authrpc.addr 0.0.0.0
      --authrpc.port 8551
      --authrpc.jwtsecret /root/jwt/jwt.hex
      --ipcdisable
      --http --http.addr 0.0.0.0 --http.port 8545
      --http.api "debug,eth,net,trace,txpool,web3,reth,ots"
      --ws --ws.addr 0.0.0.0 --ws.port 8546
      --ws.api "debug,eth,net,trace,txpool,web3,reth,ots"
      --rpc-max-logs-per-response 0
    stop_grace_period: 10m

  lighthouse:
    user: root
    image: my/lighthouse:v5.3.0
    depends_on:
      - reth
    ports:
      - '5052:5052/tcp' 
    volumes:
      - /opt/lighthouse:/root/.lighthouse
      - /opt/reth/jwttoken:/root/jwt:ro
    command: >
      lighthouse bn
      --http --http-address 0.0.0.0
      --execution-endpoint http://reth:8551
      --execution-jwt /root/jwt/jwt.hex
      --checkpoint-sync-url https://mainnet.checkpoint.sigp.io
      --disable-deposit-contract-sync
    stop_grace_period: 10m

  base_reth:
    image: ghcr.io/paradigmxyz/op-reth:v1.1.2
    depends_on:
      - reth
      - lighthouse
    command: >
        node
        --chain base
        --datadir /data
        --rollup.sequencer-http https://mainnet-sequencer.base.org
        --rollup.disable-tx-pool-gossip
        --port 30305
        --http
        --http.addr 0.0.0.0
        --http.port 8549
        --http.corsdomain "*"
        --http.api all
        --ws
        --ws.addr 0.0.0.0
        --ws.port 8550
        --ws.origins "*"
        --ws.api all
        --authrpc.jwtsecret /jwt/jwt.hex
        --authrpc.addr 0.0.0.0
        --authrpc.port 9551
    ports:
      - 8549:8549/tcp
      - 8550:8550/tcp
    volumes:
      - /opt/base/reth_data:/data
      - /opt/base/jwt:/jwt
    stop_grace_period: 10m

  base_rollup_node:
    image: us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.10.0
    depends_on:
      - reth
      - lighthouse
    command: >
      op-node
      --network=base-mainnet
      --syncmode=execution-layer
      --l1=ws://reth:8546
      --l1.trustrpc
      --l1.beacon=http://lighthouse:5052
      --l2=http://base_reth:9551
      --l2.jwt-secret=/jwt/jwt.hex
    volumes:
      - /opt/base/jwt:/jwt:ro
    stop_grace_period: 10m

networks:
  default:
    ipam:
      config:
        - subnet: xxx.xxx.xxx.xxx/16
          gateway: xxx.xxx.xxx.xxx

Node logs

Observed Behavior

# base_rollup_node

base_rollup_node_1  | t=2024-12-05T08:18:42+0000 lvl=info msg="failed to insert payload" ref=0xb8ea1ed645599043d65822c8559284c19c480ef2970889c680b4317af80b2537:23298442 txs=53 err="temp: failed to update insert payload: failed to execute payload: Post \"http://base_reth:9551\": context deadline exceeded"
base_rollup_node_1  | t=2024-12-05T08:18:42+0000 lvl=warn msg="Engine temporary error" err="temp: failed to update insert payload: failed to execute payload: Post \"http://base_reth:9551\": context deadline exceeded"
# base_reth

base_reth_1         | 2024-12-05T08:18:42.404236Z  INFO Block added to canonical chain number=23298442 hash=0xb8ea1ed645599043d65822c8559284c19c480ef2970889c680b4317af80b2537 peers=37 txs=53 gas=17.24 Mgas gas_throughput=3.36 Mgas/second full=8.7% base_fee=0.01gwei blobs=0 excess_blobs=0 elapsed=5.124570365s
base_reth_1         | 2024-12-05T08:18:42.404826Z ERROR Failed to send event: Ok(PayloadStatus { status: Valid, latest_valid_hash: Some(0xb8ea1ed645599043d65822c8559284c19c480ef2970889c680b4317af80b2537) })
# base_rollup_node

base_rollup_node_1  | t=2024-12-05T08:16:45+0000 lvl=warn msg="failed to notify engine driver of new L2 payload" err="context deadline exceeded" id=0x0066e8dc4d80d0665bcc378e8b560c0ac6e6c95d1444203d66663bbd93b5f7cc:23298558
base_rollup_node_1  | t=2024-12-05T08:16:45+0000 lvl=info msg="Received signed execution payload from p2p" id=0x91e1e5840c68c83a2fb7b3661060465653b4847df388c6236490e9802bff8e81:23298559 peer=16Uiu2HAmPuP3jx4X7pEG9Pxt3Vw6cYqKEGtFG72atmCBaXjkbjug txs=154
base_rollup_node_1  | t=2024-12-05T08:16:48+0000 lvl=error msg="Payload execution failed" block_hash=0xb22060ca9ccb90118639ab2293e3075754da8573a4d3f3093b33eadf83979358 err="Post \"http://base_reth:9551\": context deadline exceeded"
base_rollup_node_1  | t=2024-12-05T08:16:48+0000 lvl=info msg="failed to insert payload" ref=0xb22060ca9ccb90118639ab2293e3075754da8573a4d3f3093b33eadf83979358:23298396 txs=358 err="temp: failed to update insert payload: failed to execute payload: Post \"http://base_reth:9551\": context deadline exceeded"
base_rollup_node_1  | t=2024-12-05T08:16:48+0000 lvl=warn msg="Engine temporary error" err="temp: failed to update insert payload: failed to execute payload: Post \"http://base_reth:9551\": context deadline exceeded"
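
To quantify how often this happens across the complete logs, something like the following sketch could be used. It parses the `key=value` fields that both log formats use, flags op-node lines whose `err` field contains "context deadline exceeded", and flags op-reth "Block added to canonical chain" lines whose `elapsed` exceeds the block time. The helper names are hypothetical, and the 2-second default is an assumption based on Base's block time, which is not stated in this report.

```python
import re

# key=value pairs; a value is either a double-quoted string (with \" escapes)
# or a bare token running up to the next whitespace.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Suffixes ordered so that "ms", "µs", "us" and "ns" are tested before bare "s".
UNIT_SECONDS = (("ms", 1e-3), ("µs", 1e-6), ("us", 1e-6), ("ns", 1e-9), ("s", 1.0))


def parse_fields(line):
    """Return the key=value fields of a log line as a dict of strings."""
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            raw = raw[1:-1].replace('\\"', '"')  # strip quotes, unescape \"
        fields[key] = raw
    return fields


def parse_duration_s(text):
    """Convert a duration such as '5.124570365s' or '812ms' to seconds."""
    for suffix, scale in UNIT_SECONDS:
        if text.endswith(suffix):
            try:
                return float(text[: -len(suffix)]) * scale
            except ValueError:
                return None
    return None


def is_engine_timeout(line):
    """True if the line's err field reports an expired request deadline."""
    return "context deadline exceeded" in parse_fields(line).get("err", "")


def slow_block_elapsed(line, block_time_s=2.0):
    """Return the elapsed seconds of a 'Block added' line if above block_time_s."""
    if "Block added to canonical chain" not in line:
        return None
    elapsed = parse_duration_s(parse_fields(line).get("elapsed", ""))
    if elapsed is not None and elapsed > block_time_s:
        return elapsed
    return None
```

In the excerpt above, the op-reth line for block 23298442 reports `elapsed=5.124570365s`, which is well above the assumed 2-second threshold and falls in the same second as the op-node timeout for that block.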

I also found sections of the logs with the following pattern:

# base_rollup_node

base_rollup_node_1  | t=2024-12-04T23:30:12+0000 lvl=info msg="Walking back L1Block by hash" curr=0x6389343dca7b516a6e2db8af835488044ce733d2ae9a5012b5de2a16c60fc444:21332403 next=0x459f673ce5a5b4b5a595d5f5b9449adc2a3a60b331b62e024efc8b45d89acd9a:21332402 l2block=0x8ac2d86f3a90962797981b20daa663267b7516c00a436680a56fe3e9941ff4e0:23282767
base_rollup_node_1  | t=2024-12-04T23:30:12+0000 lvl=info msg="Walking back L1Block by hash" curr=0x459f673ce5a5b4b5a595d5f5b9449adc2a3a60b331b62e024efc8b45d89acd9a:21332402 next=0xddedc3ee04f08afcc902a1a672ded25134701252606217e98dd569f7e89fa1d0:21332401 l2block=0x2ea9852298fb6080753807ecc44a1224ce4b729c4782e4a9e23fe021bef8c884:23282761
base_rollup_node_1  | t=2024-12-04T23:30:12+0000 lvl=info msg="Walking back L1Block by hash" curr=0xddedc3ee04f08afcc902a1a672ded25134701252606217e98dd569f7e89fa1d0:21332401 next=0x5376a4ab58e1d60808a3bdb60d914327c09fdb8a06bdf038d1fc93fc0db70859:21332400 l2block=0x04d8a904c9a707325c4abc0bc94f8bb2085d3291b6af55af4d6ebdf2b5e670f6:23282755
base_rollup_node_1  | t=2024-12-04T23:30:12+0000 lvl=info msg="Walking back L1Block by hash" curr=0x5376a4ab58e1d60808a3bdb60d914327c09fdb8a06bdf038d1fc93fc0db70859:21332400 next=0x0ec72c4e1fc4423d8879addf4ff582fdf40b594c12ad25a346b06990209e23c9:21332399 l2block=0x0c7a42c1f76efbca8a14e4290f97b47706ff5a4c398d208503bb0e50718e2e39:23282749
base_rollup_node_1  | t=2024-12-04T23:30:12+0000 lvl=info msg="Walking back L1Block by hash" curr=0x0ec72c4e1fc4423d8879addf4ff582fdf40b594c12ad25a346b06990209e23c9:21332399 next=0xcd60571ee9ac234120891ee1f664d44d4e7d5d926cab710d375e8b6a4de604e2:21332398 l2block=0x6178c72f6cdd5f99b116d6b6676577b825f4966eadd363e6c6130185c8bcc185:23282743
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xcd60571ee9ac234120891ee1f664d44d4e7d5d926cab710d375e8b6a4de604e2:21332398 next=0xb514ebd7ff6a0200dee05dc806915ac6319518cdb434d9a516fc74b366323411:21332397 l2block=0xcb5ed3594f5828c81c01c271ec3346a3217a226ceb5a2c1250df03ede89f757f:23282737
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xb514ebd7ff6a0200dee05dc806915ac6319518cdb434d9a516fc74b366323411:21332397 next=0xe9cb1b22e867349cb8d018e3d459f2908aa380bd06cef5e21cb1599efb6a4389:21332396 l2block=0x4c61b75db8849fade883fbae1a9780c732dc4d1c83271772681bb0887f684060:23282731
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xe9cb1b22e867349cb8d018e3d459f2908aa380bd06cef5e21cb1599efb6a4389:21332396 next=0xc61dc9a15cf0430a3274b6baf828c630db1b7ef81cea90e456c04aff26727c11:21332395 l2block=0x7a72ab9e84b1f99b7db7c7fb966ae3257500fd8c7f512c1943d0160b0e6a6276:23282724
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xc61dc9a15cf0430a3274b6baf828c630db1b7ef81cea90e456c04aff26727c11:21332395 next=0x1d8934738db4191a2f9b797f9ba98516270a9a3070615ae00010b19de3905dba:21332394 l2block=0x6a6f520e57697ecfb04955391b66e2ac0a95098753918340fe38183111267685:23282718
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0x1d8934738db4191a2f9b797f9ba98516270a9a3070615ae00010b19de3905dba:21332394 next=0x4b5d6e5493ff4f1279b25dfb5956de72d921eceb12955d18c08eb8c4d79b6c27:21332393 l2block=0xce3d271ffb009a7dbf651fbe1ef947ab71325a3a7b9749f6f95ae70c8f1e20fe:23282712
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0x4b5d6e5493ff4f1279b25dfb5956de72d921eceb12955d18c08eb8c4d79b6c27:21332393 next=0x53ef2b94115d27582c45c8db3f651f83f0b3c98c5625aac0af5d51b4c94613cf:21332392 l2block=0x539033eeed14a744877a3ad3514fc7bf23a54faae844ff2c00e57b58480abc86:23282706
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0x53ef2b94115d27582c45c8db3f651f83f0b3c98c5625aac0af5d51b4c94613cf:21332392 next=0xd1af58b96583271cc557f972bd9615ff123ffc6b6abc440489827d19f7c51f76:21332391 l2block=0x26ad4d9b364659301996fac4617b4a206485c978f241809d3bdd569e5ba255f2:23282700
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xd1af58b96583271cc557f972bd9615ff123ffc6b6abc440489827d19f7c51f76:21332391 next=0xad1ca87a8f78f60f6d63895f99d9cb0c08b1f1599bfdf1ed6ecd9c27a6208c8a:21332390 l2block=0xc3450d23fed27a7eba51d3c1219308779d6348d82e7174cc24867a0cd605c05f:23282694
base_rollup_node_1  | t=2024-12-04T23:30:13+0000 lvl=info msg="Walking back L1Block by hash" curr=0xad1ca87a8f78f60f6d63895f99d9cb0c08b1f1599bfdf1ed6ecd9c27a6208c8a:21332390 next=0xeccfe592d2fce76fd4056f099a50c592c755d082f4cd33cd0f347dc6fe5f83c6:21332389 l2block=0x0ce3c30686d43c3537fc5cef790be207937ef7e687be9bec95334793fcf27f2f:23282688
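
To measure how far one of these runs walks back, the block numbers can be pulled out of the `curr`, `next`, and `l2block` fields. The sketch below does only that and makes no claim about why op-node emits these lines. The function name and the shape of the returned dict are assumptions for illustration.

```python
import re

# Captures the block numbers that follow each hash in a walk-back line.
WALK_RE = re.compile(
    r'msg="Walking back L1Block by hash" '
    r"curr=0x[0-9a-f]+:(\d+) next=0x[0-9a-f]+:(\d+) l2block=0x[0-9a-f]+:(\d+)"
)


def walkback_summary(lines):
    """Summarize a run of walk-back lines, or return None if none match."""
    steps = []
    for line in lines:
        match = WALK_RE.search(line)
        if match:
            steps.append(tuple(int(group) for group in match.groups()))
    if not steps:
        return None
    return {
        "steps": len(steps),
        "l1_from": steps[0][0],   # curr of the first line
        "l1_to": steps[-1][1],    # next of the last line
        "l2_from": steps[0][2],   # l2block of the first line
        "l2_to": steps[-1][2],    # l2block of the last line
    }
```

Applied to the 14 lines above, this reports L1 blocks 21332403 down to 21332389 and L2 blocks 23282767 down to 23282688.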

Complete logs
Complete base_reth.log
Complete base_rollup_node.log

Platform(s)

Linux (x86)

Container Type

Docker

What version/commit are you on?

base_reth version : 1.1.2
base_reth image: ghcr.io/paradigmxyz/op-reth:v1.1.2

base_rollup_node version: 1.10.0
base_rollup_node image: us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.10.0

What database version are you on?

default

Which chain / network are you on?

base-mainnet

What type of node are you running?

Archive (default)

What prune config do you use, if any?

none
