devnet-3 lean consensus client#198

Merged: dimka90 merged 49 commits into main from devnet-3 on Apr 8, 2026

@devylongs (Collaborator) commented on Apr 3, 2026

Summary

Devnet-3 lean consensus client implementation targeting leanSpec commit be85318.

What's included

  • SSZ types with fastssz codegen (State, Block, Attestation, Checkpoint, Validator)
  • State transition — process_slots, process_block, 3SF-mini justification/finalization
  • Fork choice — proto-array LMD GHOST with delta propagation
  • P2P networking — gossipsub v1.1 + req-resp (Status, BlocksByRoot) over QUIC
  • XMSS signatures — post-quantum sign/verify/aggregate via Rust FFI
  • Pebble storage — persistent block/state/attestation storage with pruning
  • Checkpoint sync — fetch finalized state from peers with 12 verification checks
  • HTTP API — /lean/v0/health, states/finalized, checkpoints/justified, fork_choice
  • 43 Prometheus metrics matching leanMetrics spec
  • Dockerfile — multi-stage build (Rust FFI + Go binary)

Test plan

  • go build ./...
  • Unit tests for all packages
  • leanSpec fixture tests (state transition, fork choice, signature verification)
  • Multi-client devnet interop

Closes #190 #197

@devylongs requested review from Cyberking99 and dimka90 on April 3, 2026 15:54
devylongs and others added 23 commits April 3, 2026 17:08
…nc backfill

GetBlockHeader can return nil for the new head during rapid cascade
processing after checkpoint sync. Guard against nil dereference
before accessing header fields.
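The guard described in this commit could look roughly like the sketch below. Only `GetBlockHeader` is named in the commit message; `BlockHeader`, `HeaderStore`, and `headSlot` are hypothetical names for illustration, and the real signature in the client may differ.

```go
package main

// BlockHeader is a hypothetical, trimmed-down header type.
type BlockHeader struct {
	Slot       uint64
	ParentRoot [32]byte
}

// HeaderStore is a hypothetical in-memory store keyed by block root.
type HeaderStore struct {
	headers map[[32]byte]*BlockHeader
}

// GetBlockHeader returns nil when the root is not (yet) stored.
// Returning nil for a missing header is an assumption based on the commit text.
func (s *HeaderStore) GetBlockHeader(root [32]byte) *BlockHeader {
	return s.headers[root]
}

// headSlot reads the slot of the head block. It reports ok=false instead of
// dereferencing a nil header, which is the failure the commit guards against.
func headSlot(s *HeaderStore, head [32]byte) (slot uint64, ok bool) {
	h := s.GetBlockHeader(head)
	if h == nil {
		return 0, false
	}
	return h.Slot, true
}
```

Callers can then skip or retry the head update when `ok` is false rather than crashing mid-cascade.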
…ompilation

Add logger.Quiet mode to prevent log output from interleaving with
test framework output. Fix forkchoice_test.go compilation after
package refactor (qualify ConsensusStore with node. prefix, remove
.node from fc.OnBlock calls).

test-spec now runs only spectests/ (0.1s). Use test-all for
everything including xmss FFI crypto tests (~6 min).

Packages moved from pkg/ to top level during refactor. Exclude xmss,
spectests, and cmd packages from make test (run separately).

Replace slot-based retention windows with canonicality analysis.
On finalization: walk the proto-array to identify canonical vs
non-canonical blocks, prune non-canonical states/blocks immediately,
prune old finalized ancestors (keep only latest finalized root),
and clean up stale attestation data. Periodic fallback every 7200
slots when finalization stalls. Storage stays bounded instead of
growing to 3000 states before any cleanup.
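A minimal sketch of the canonicality idea, assuming a plain map of parent links in place of the real proto-array. `Root`, `Block`, `descendsFrom`, and `pruneOnFinalization` are hypothetical names, and the periodic fallback and attestation cleanup are left out.

```go
package main

// Root is a hypothetical 32-byte block root.
type Root [32]byte

// Block is a hypothetical stand-in for a proto-array node.
type Block struct {
	Parent Root
	Slot   uint64
}

// descendsFrom reports whether root reaches ancestor by following parent
// links. A block counts as its own descendant.
func descendsFrom(blocks map[Root]Block, root, ancestor Root) bool {
	for {
		if root == ancestor {
			return true
		}
		b, ok := blocks[root]
		if !ok || b.Parent == root {
			return false // ran off the known chain
		}
		root = b.Parent
	}
}

// pruneOnFinalization deletes every block that is neither the finalized root
// nor one of its descendants, and returns the removed roots. This drops both
// forks that branched off before finalization and old finalized ancestors.
func pruneOnFinalization(blocks map[Root]Block, finalized Root) []Root {
	var removed []Root
	// Collect first so that the ancestry walks see an unmodified map.
	for root := range blocks {
		if !descendsFrom(blocks, root, finalized) {
			removed = append(removed, root)
		}
	}
	for _, root := range removed {
		delete(blocks, root)
	}
	return removed
}
```

Because the rule depends only on ancestry, storage shrinks as soon as finalization advances, independent of how many slots have elapsed.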
Prevent stuck-forever scenarios when blocks are permanently
unfetchable:

- Depth tracking per pending block (MaxBlockFetchDepth=512): discard
  blocks whose ancestor chain exceeds this depth
- Pending block cache limit (MaxPendingBlocks=1024): reject new
  pending blocks when cache is full
- Finalization-triggered cleanup: discard pending blocks at/below
  finalized slot with entire subtrees
- Depth cleared on cascade processing

Includes 4 new tests: pending count, depth tracking, subtree discard,
and cascade depth cleanup.
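The bounds above can be sketched as follows. The two constants come from the commit message; `PendingCache` and its methods are hypothetical, and this version scans the whole map instead of keeping a child index, which is acceptable only because the cache is capped at 1024 entries.

```go
package main

import "errors"

// Limits quoted in the commit message.
const (
	MaxBlockFetchDepth = 512
	MaxPendingBlocks   = 1024
)

var (
	ErrCacheFull = errors.New("pending block cache is full")
	ErrTooDeep   = errors.New("ancestor chain exceeds MaxBlockFetchDepth")
)

type Root [32]byte

type pendingBlock struct {
	parent Root
	slot   uint64
	depth  int // number of pending descendants chained above this block
}

// PendingCache is a hypothetical cache of blocks waiting on a missing parent.
type PendingCache struct {
	blocks map[Root]pendingBlock
}

func NewPendingCache() *PendingCache {
	return &PendingCache{blocks: make(map[Root]pendingBlock)}
}

func (c *PendingCache) Len() int            { return len(c.blocks) }
func (c *PendingCache) Depth(root Root) int { return c.blocks[root].depth }

// Remove drops a block (and its depth) once cascade processing imports it.
func (c *PendingCache) Remove(root Root) { delete(c.blocks, root) }

// Add stores a block whose parent is missing. Its depth is one more than the
// deepest pending block that was waiting for it.
func (c *PendingCache) Add(root, parent Root, slot uint64) error {
	if _, ok := c.blocks[root]; ok {
		return nil // already pending
	}
	if len(c.blocks) >= MaxPendingBlocks {
		return ErrCacheFull
	}
	depth := 0
	for _, b := range c.blocks {
		if b.parent == root && b.depth+1 > depth {
			depth = b.depth + 1
		}
	}
	if depth > MaxBlockFetchDepth {
		return ErrTooDeep
	}
	c.blocks[root] = pendingBlock{parent: parent, slot: slot, depth: depth}
	return nil
}

// DiscardFinalized removes pending blocks at or below finalizedSlot together
// with their entire pending subtrees, and returns how many were removed.
func (c *PendingCache) DiscardFinalized(finalizedSlot uint64) int {
	doomed := make(map[Root]bool)
	for root, b := range c.blocks {
		if b.slot <= finalizedSlot {
			doomed[root] = true
		}
	}
	// Extend the set to descendants until no new block is marked.
	for changed := true; changed; {
		changed = false
		for root, b := range c.blocks {
			if !doomed[root] && doomed[b.parent] {
				doomed[root] = true
				changed = true
			}
		}
	}
	for root := range doomed {
		delete(c.blocks, root)
	}
	return len(doomed)
}
```

Rejecting at `Add` time keeps the cache bounded even when a peer feeds an endless chain of unknown ancestors.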
Eliminate N FFI calls per aggregation cycle by caching parsed pubkey
handles. Previously ParsePublicKey (CGo→Rust FFI) was called once per
validator per aggregation (~100ms each). Now parsed once and reused.

Also pool the 1 MiB proof serialization buffer via sync.Pool to
reduce GC pressure from heap allocations every 4 seconds.

With 5 validators: FFI calls reduced from 13 to 3 per aggregation.
Expected aggregation time reduction from ~1000ms to ~300-400ms.
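A minimal sketch of both optimizations, assuming the parser is injected as a function so the example runs without the Rust FFI. `ParsedKey`, `KeyCache`, and `withProofBuffer` are hypothetical names; only `ParsePublicKey`, `sync.Pool`, and the 1 MiB buffer size appear in the commit message.

```go
package main

import "sync"

// ParsedKey is a hypothetical handle standing in for a parsed XMSS public key.
type ParsedKey struct{ raw []byte }

// KeyCache memoizes an expensive parse function per validator index.
type KeyCache struct {
	mu    sync.Mutex
	parse func(raw []byte) (*ParsedKey, error)
	byID  map[uint64]*ParsedKey
}

// NewKeyCache wraps parse, which in the real client would cross the FFI boundary.
func NewKeyCache(parse func(raw []byte) (*ParsedKey, error)) *KeyCache {
	return &KeyCache{parse: parse, byID: make(map[uint64]*ParsedKey)}
}

// Get returns the cached handle for a validator, parsing only on first use.
// The lock is held during parsing, so concurrent first-time parses serialize.
func (c *KeyCache) Get(id uint64, raw []byte) (*ParsedKey, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if k, ok := c.byID[id]; ok {
		return k, nil
	}
	k, err := c.parse(raw)
	if err != nil {
		return nil, err
	}
	c.byID[id] = k
	return k, nil
}

// proofBufSize is the 1 MiB serialization buffer size from the commit message.
const proofBufSize = 1 << 20

// proofBufPool stores *[]byte so that Put does not allocate a new interface
// value holding a slice header on every call.
var proofBufPool = sync.Pool{
	New: func() any {
		b := make([]byte, proofBufSize)
		return &b
	},
}

// withProofBuffer lends a pooled buffer to fn and returns it afterwards.
// fn must not retain buf after it returns.
func withProofBuffer(fn func(buf []byte)) {
	p := proofBufPool.Get().(*[]byte)
	defer proofBufPool.Put(p)
	fn(*p)
}
```

With a cache like this, repeated aggregation cycles over the same validator set pay the parse cost once per validator rather than once per cycle.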
Cyberking99 and others added 6 commits April 8, 2026 19:19
Skills are meant to be invoked by Claude reading SKILL.md, not by humans
running make targets. Remove the top-level .claude/skills/README.md and
the seven devnet-* Makefile targets that framed the skills as a CLI.
The per-skill SKILL.md files (already modeled on ethlambda) remain the
sole entry points.

Under set -euo pipefail, a grep with no matches returns 1 and aborts
the script. This caused check-consensus-progress.sh to silently die
mid-listing whenever a node had zero proposed blocks (e.g. lantern_0
in the current devnet logs), skipping every node alphabetically after
it. Found while testing the devnet-log-review skill on real logs.

Without --cleanData, spin-node.sh leaves each node's data dir intact
between runs. Clients then boot from on-disk state from the previous
run instead of the fresh genesis we just generated. This caused lantern
to come up at slot 22 with the previous run's fork-choice history,
which then never produced votes for the new chain and stalled
finalization across all peers.

Patches two RustSec advisories that were blocking cargo audit in CI:

  - bytes 1.11.0 → 1.11.1 fixes RUSTSEC-2026-0007 (BytesMut::reserve overflow)
  - ruint 1.17.0 → 1.17.2 fixes RUSTSEC-2025-0137 (reciprocal_mg10 unsoundness)

Both are transitive dependencies through the alloy-primitives /
ethereum_ssz / leansig crypto stack — no Cargo.toml changes needed,
patch-level updates only. Cargo dedups windows-sys 0.59.0 and 0.61.2
to a single version as a side effect.

Recovers the lockfile portion of commit 9f41a10 'fix(ci): bump Go to
1.25.9 and patch Rust CVEs in security audit', which passed CI but
was destroyed by a force-push to db9fc97 that only kept the Go bump.
@dimka90 merged commit bcb925e into main on Apr 8, 2026
7 checks passed
Development

Successfully merging this pull request may close these issues.

Memory leak: all pruning gated behind finalization, causing unbounded growth during stalls

3 participants