Lean 4 formal verification of the ZisK
zkVM against the Sail RISC-V specification,
via sail-riscv-lean's
LeanRV64D module. This effort follows the pattern established by openvm-fv, and we are grateful to the authors of that library for their excellent work.
Status: all 63 RV64IM opcodes proved equivalent to the Sail spec
(0 sorries, 82 trusted axioms β see docs/fv/trusted-base.md). The
load-bearing claim is lake build: every per-opcode equivalence theorem
typechecks. Run nix run .#test for the full suite (cargo + lake + trust
gate + flake repro check).
| Path | Purpose |
|---|---|
docs/fv/ |
Live library-reference notes: trust ledger, extractor contract, AIR inventory |
tools/pil-extract/ |
Rust CLI: decodes .pilout protobuf β Lean constraint definitions |
ZiskFv/ |
Lake 4 package (mathlib + LeanZKCircuit + LeanRV, toolchain v4.26.0). See Inside ZiskFv/ below. |
zisk/ |
ZisK source tree (git submodule, pinned at 48cf7ccef) |
trust/ |
Trust-boundary baselines + enforcement scripts. See trust/README.md. |
flake.nix, nix/ |
Nix flake that builds the pilout + Sail-Lean spec + extracted Lean reproducibly. See nix/README.md. |
build/ |
Generated artifacts. Gitignored β produced by nix run .#populate. See Inside build/ below. |
docs/site/ |
Single-page trust-boundary explainer (run docs/site/serve.sh, port 4044). |
The Lake 4 package. Subdirectories track the proof's data flow: foundations β extraction β per-AIR named layer β per-opcode lifted semantics β Sail-side mirror β equivalence theorems.
ZiskFv/
βββ Fundamentals/ foundations: Goldilocks field, BitVec/U64 lemmas, transpiler axioms
βββ Extraction/ 3 hand-written extraction-layer files (auto-generated set lives in build/extraction/)
βββ Airs/ per-AIR named-column wrappers + single-AIR correctness theorems
βββ Circuit/ per-opcode lifted circuit semantics β one .lean per RV64IM opcode
βββ Sail/ per-opcode Sail-side mirrors with equivalence-to-LeanRV64D lemmas
βββ Tactics/ instruction-shape archetype tactics that drive the per-opcode proofs
βββ Equivalence/ the final per-opcode theorems: canonical equiv_<OP> + bus-precondition variants
βββ ZiskFv.lean root module β imports the whole tree
Foundational definitions and lemmas the rest of the tree builds on.
Goldilocks.lean defines FGL := Fin (2^64 - 2^32 + 1) and the canonical
[Field FGL] instance β declared once globally and never shadowed
(shadowing it as a proof-local variable defeats ring).
PrattCertificate.lean proves the prime is prime via native_decide
(~6 min cold; the slowest single proof in the build). U64.lean and the
PackedBitVec/ subdirectory hold the fixed-width arithmetic lemmas needed
for carry-free decompositions across packed lanes. Transpiler.lean and
TranspileConsumers.lean declare the transpile_* axioms that bridge
RISC-V instruction encoding to ZisK's microinstruction format (the
trust-base entries 1β9 in docs/fv/trusted-base.md). Execution.lean and
Interaction.lean define generic execution-trace and bus-interaction
structures shared across all AIRs.
Three hand-written extraction-layer files that supplement the auto-generated
set under build/extraction/Extraction/:
ArithTable.leanβ hand-transcribed fromzisk/state-machines/arith/src/arith_table_data.rs::ARITH_TABLE(74 rows). The Rust constant is itself generated from PIL withgenerate_table = 1; we transcribe rather than extending the Rust extractor.MemoryBuses.leanβ hand-curated subset of memory-bus emissions (bus_id = 10) extracted from PIL2gsum_debug_datahints.OperationBuses.leanβ hand-written operation-bus emission for the Main AIR (bus_id = 5000); needed because the auto-extracted form became a multiplicity-0 stub starting v0.16.0.
The per-AIR layer. For each ZisK AIR (Main, Binary, BinaryAdd,
BinaryExtension, Arith, Mem, MemAlign{,Byte,ReadByte,WriteByte})
this layer provides:
- a
Valid_<AIR>structure naming each column (som.cout_1 rowinstead ofCircuit.main circ (column := 9) (row := row) (rotation := 0)), with_deflemmas tying the names back to the underlying anonymous accessors; - iff-bridges that turn each anonymous
constraint_N_every_rowinto a meaningfully-named predicate (e.g.core_every_rowfor BinaryAdd's carry chain); - single-AIR correctness theorems β proofs that one AIR's constraints,
in isolation, imply the BitVec relation they claim. These are the heaviest
files in the layer:
BinaryPackedCorrect.leanis 2,109 lines,BinaryExtensionPackedCorrect.leanis 2,804; - bus-protocol machinery (
OperationBus.lean,OpBusEffect.lean,BusEmission.lean, β¦) and lookup-table soundness (BinaryTable.lean,BinaryExtensionTable.lean, plusArith/ArithTable.leanfor the table-soundness theorem that consumesExtraction/ArithTable.lean's table data).
Files are organized by ZisK constraint table, not by RISC-V instruction β a single AIR (e.g. Binary) covers many opcodes (ADD, SUB, AND, OR, XOR, all branches, β¦).
Per-opcode lifted circuit semantics β one file per RV64IM opcode (62
files total, including shared infrastructure like MemModel.lean,
MulField.lean, DivFieldSigned.lean). Each file composes the relevant Airs/ pieces via
the operation-bus abstraction (Airs/OperationBus.lean::matches_entry) and
concludes that the involved AIR rows together produce f(inputs) for some
BitVec function f. The math stays in Fin p (Goldilocks); the BitVec lift
itself happens in Equivalence/. Files are organized by RISC-V opcode,
not by AIR β Circuit/Add.lean projects out the Add behaviour from
Airs/Main.lean + Airs/Binary/BinaryAdd.lean, both joined by their
matching bus row.
Per-opcode Sail-side mirrors (lowercase, one per opcode: add.lean,
lw.lean, β¦, 65 files total = 63 opcodes + Auxiliaries.lean +
BusEffect.lean). Each
opcode file does two jobs:
- Defines a pure version in
PureSpecnamespace β the Sail-extractedexecute_instructionrewritten in clean BitVec/Option terms, monad stripped, decoder dispatch removed, ZisK-irrelevant trap arms eliminated via the four platform axioms inAuxiliaries.lean(PMP/CLINT/PMA inert, Zicfilp disabled β all scope-honest for ZisK's RV64IM target, ledger entries 10β13). - Proves an equivalence lemma
execute_<OP>_pure_equiv : LeanRV64D.Functions.execute (.<shape> β¦) state = β¦ = PureSpec.execute_<shape>_<op>_pure β¦. The lemma is what keeps the pure form honest β drift betweenbuild/sail-lean/and the pure form is a build failure, not a silent trust extension.
BusEffect.lean defines the bus_effect function that produces a
Sail-shaped state update from circuit-side rows β this is the RHS of every
equivalence theorem.
Archetype tactics: instruction-shape templates that drive the per-opcode
proofs. The 12 archetypes (ALURTypeArchetype, ALUITypeArchetype,
BranchArchetype, LoadArchetype, StoreArchetype, JumpArchetype,
MulArchetype, ShiftArchetype, SignExtendLoadArchetype,
RTypeWArchetype, UTypeArchetype, ArithSMArchetype) each package the
standard simp/rewrite cascade for one instruction shape. Per-opcode files
under Equivalence/ mostly instantiate the relevant archetype with the
opcode's specific Sail-side rewrite and Circuit/ compositional theorem;
this is what makes 63 individual proofs feasible without 63 bespoke proof
scripts.
The top-level FV theorems β one file per opcode, each containing:
equiv_<OP> : execute_instruction (.<shape> β¦) state = (bus_effect exec_row mem_row state).2β the canonical equivalence. Both sides live in Sail's state space: the LHS is Sail'sexecute; the RHS comes fromSail/BusEffect.lean. Every parameter is in one of the safe trust classes ({CIRCUIT-CONSTRAINT, LANE-MATCH, RANGE, TRANSPILE-BRIDGE, TRANSPILE-PIN}); the rd-value derivation runs internally viaRdValDerivation/. The trust gate'scheck-no-output-eq.shmechanically rejects any signature that reintroduces a retired OUTPUT-EQ parameter (h_rd_val,h_byte_sum, etc. β seetrust/forbidden-param-shapes.txt). The 7 load opcodes are exempt pending a follow-up that derives byte-decomposition from circuit witnesses.- Bus-precondition variants (
equiv_<OP>_from_bus,equiv_<OP>_bus_self,equiv_<OP>_op_bus) bridge alternate precondition shapes into the same canonical conclusion.
Subdirectory RdValDerivation/ factors out shared rd-value lemmas across
opcodes that share a derivation pattern (Arith, BinaryCompare, JumpUType,
MulDivRemUnsigned). lake build succeeding on Equivalence/ IS the
formal-verification claim β everything else in the tree is scaffolding
toward this point. All 63 RV64IM opcodes covered, 0 sorries.
build/
βββ sail-lean/ Sail RV64 spec compiled to Lean (LeanRV64D module). 149 generated files.
βββ extraction/ Lake lib `Extraction` β auto-generated PIL2 extraction (11 per-AIR files).
βββ zisk.pilout Compiled ZisK constraint set (protobuf). The pil-extract input.
Everything under /build/ is regenerated by nix run .#populate from
flake-pinned inputs and is gitignored. The audit surface for these
artifacts is flake.lock (content-hashed pins of the Sail compiler,
sail-riscv source, ZisK source, and pil2-* toolchain), not the files
themselves.
The Sail RV64 spec mechanically compiled to Lean by
NethermindEth/sail-riscv-lean
(149 files). Defines LeanRV64D.Functions.execute, the monadic decoder
that pattern-matches every RV64GD instruction, plus all supporting types
(registers, memory model, traps, PMP, CLINT, β¦). This is the trusted
source of truth for the LHS of every equivalence theorem; per-opcode
ergonomic mirrors live in ZiskFv/Sail/.
A standalone Lake library named Extraction, required by the root
lakefile.toml via [[require]] path = "build/extraction". Contains the
11 auto-generated per-AIR .lean files emitted by tools/pil-extract
from build/zisk.pilout (Arith, Binary, BinaryAdd, BinaryExtension,
Buses, Main, Mem, MemAlign, MemAlignByte, MemAlignReadByte,
MemAlignWriteByte). Each file defines anonymous constraint_N_every_row
predicates directly over witness columns; the human-readable named bridges
live in ZiskFv/Airs/. The static lakefile.toml and root Extraction.lean
are written into the directory by nix/populate.nix (they aren't part of
any Nix derivation), so wiping build/ and re-running populate restores
the full lib.
The ZisK PIL2 constraint set in protobuf form, the output of pil2-compile
applied to ZisK's .pil source tree. Input to tools/pil-extract (which
produces build/extraction/Extraction/*.lean). At ~17 GiB peak / ~24 min
wall on cold rebuild, this is the dominant cost in the populate pipeline.
The trust boundary is mechanically enforced on every PR via
.github/workflows/trust-gate.yml, which runs
trust/scripts/check-all.sh. The gate ensures:
- All
axiom/opaque/constant/unsafe def/partial def/@[extern]/@[implemented_by]declarations live in one of the files listed intrust/allowed-axiom-files.txt. - The hash + name + location of every project axiom matches
trust/baseline-axioms.txt. Any add, remove, rename, or subtle weakening of an axiom shows up as a diff on this file. - No canonical
equiv_<OP>theorem accepts a retired OUTPUT-EQ hypothesis parameter (h_rd_val,h_byte_sum, etc. β seetrust/forbidden-param-shapes.txt). The 7 load opcodes are exempt pending a follow-up. - Sanity floors on axiom count and canonical-theorem count.
To legitimately extend the trust surface, edit the relevant allowlisted
file, run trust/scripts/regenerate.sh, commit the updated baseline,
and have a CODEOWNER review the trust/baseline-axioms.txt diff (these
files are protected by .github/CODEOWNERS). See trust/README.md
for the full process and CLAUDE.md for guidance to AI agents
contributing to this repo.
Run trust/scripts/check-all.sh locally to see what CI will check.
zisk-fv reads two artifacts that aren't checked into the repo: the Sail-Lean RV64D spec (the Lean translation of the official Sail RISC-V specification, which is what the proofs are about) and the ZisK pilout (compiled constraint set, which is what the proofs are checked against). Both are built locally from primary source via Nix β nothing is pulled pre-built. Requires Nix with flakes enabled; one-time install:
curl --proto '=https' --tlsv1.2 -sSf -L \
https://install.determinate.systems/nix | sh -s -- installThen, once after cloning:
nix run .#populate # ~30 min cold; ~seconds warm via Nix store cache
# produces build/sail-lean/, build/zisk.pilout,
# and build/extraction/Extraction/*.leanThe artifacts persist under build/, reused
on every subsequent lake build. Re-run only when flake.lock or any
nix/*.nix file changes (in which case the rebuild is incremental
via the Nix store).
The lakefile points at build/sail-lean/ via a path-based require,
so lake build reads the locally-built spec β there is no upstream
git dep for the spec to drift against. The audit surface for build
inputs is flake.lock: it pins every transitive dependency
(sail/sail-riscv/zisk/pil2-* sources + nixpkgs revision) by content
hash, so the build is deterministic across machines.
The ZisK tree is pulled in as a git submodule at zisk/,
pinned to 0xPolygonHermez/zisk@48cf7ccef (Merge pull request #875 from 0xPolygonHermez/develop). Clone with
git clone --recurse-submodules or run git submodule update --init
after cloning. The submodule is the source-text reference for
transpile_* axiom rationales; the pilout itself is built by the
flake from a separate pinned commit of upstream zisk (see
flake.nix::inputs.zisk-src).
After the first-time nix run .#populate (above), three commands cover
everything:
nix run .#populate # refresh artifacts (cached after first build)
nix develop --command lake build # the FV check
nix run .#test # full test suiteOr enter the devshell once and run lake build without the nix develop prefix:
nix develop
lake buildlake build succeeding is the formal-verification claim. Everything
in nix run .#test past lake build (cargo unit tests, trust gate,
flake repro check) is auxiliary scaffolding around that core proof
check.
First cold lake build takes roughly 10 minutes, dominated by
native_decide on Goldilocks primality. The devshell provides
cargo, the Lean toolchain (elan), python3, and jq β the same
toolchain nix run .#test bundles.
Cold first-time nix run .#populate (no Cachix hits, empty Nix
store) is bounded by the pilout build:
| Step | Peak RAM | Wall time |
|---|---|---|
.#zisk-pilout (cold rebuild) |
~17 GiB | ~24 min |
lake build worst process (Sail/sd.lean) |
~8 GiB RSS / ~7 GiB PSS | (subset of total lake build) |
| Everything else | < 5 GiB | minutes |
The pilout build dominates because pil2-compiler (Node, V8 heap
capped at 12 GiB) composes every AIR's algebraic constraints in one
process; total RSS hits ~16 GiB plus OS overhead. A 16 GiB machine
cannot run the cold pilout build β 32 GiB is the practical minimum.
CI runs on size-xl-x64 (32 GiB).
Once the pilout is in the local Nix store or Cachix, every subsequent
nix run .#populate is a few-second cached download (~5 MB) with
trivial RAM cost. The 17 GiB ceiling only matters when a flake.lock
input changes (i.e. an upstream version bump).
Three independent cache layers, each with a different scope:
| Layer | Caches | Scope | Eviction |
|---|---|---|---|
Cachix (zisk-fv.cachix.org) |
Nix derivations: sail-lean-tree, zisk-pilout, extracted-lean |
Content-addressed; visible to every machine + CI run | Manual; near-permanent in practice |
| GitHub Actions cache | .lake/ (compiled oleans for ZiskFv) |
Per refs/<branch-or-PR>/ ref; PR runs read their own ref + main's |
7 days idle; 10 GB total per repo |
| Lake's Azure cache | Mathlib oleans (via lake exe cache get) |
Public; content-addressed by Mathlib commit | Effectively never |
Together these mean a steady-state PR run sees: cachix HIT on all
flake outputs (no pilout rebuild), GitHub-cache HIT on .lake (no
ZiskFv re-elaboration), Azure HIT on mathlib (no Mathlib compile).
Cold cost is paid only when a flake input changes (cachix miss) or
when no PR has touched main in over a week (GitHub-cache miss).
The community project lean4-nix
provides lake2nix.mkPackage, which builds Lake projects as content-
addressed Nix derivations and pushes results to Cachix. This would
replace the GitHub-Actions .lake cache with a per-content-hash one
(no 7-day eviction, no per-ref scoping). We considered it and
deliberately stayed with stock Lake. Two reasons:
- It would lose Mathlib's Azure cache.
lake exe cache getpulls Mathlib's oleans (multi-GB) directly from Mathlib's CI cache; underlean4-nixwe'd either have to package Mathlib as a Nix derivation (full rebuild on every Mathlib bump, hours) or skip the cache and accept ~10 min cold compile per Mathlib update. The lake side is the part of our pipeline that already has a working community cache; trading it for our own cache is net negative. - Granularity. A single
mkPackagederivation hashes over the entire ZiskFv source tree, so any source edit invalidates the whole derivation. To approach Lake's per-file incremental rebuild, we'd have to split ZiskFv into many derivations and hand-encode the import graph β duplicating Lake's bookkeeping in Nix for marginal benefit over what we already have.
If we ever do hit real friction (e.g. CI gaps long enough that
.lake evicts every time), a much smaller move is to keep stock Lake
and just relocate the .lake cache off GitHub Actions to S3/GCS keyed
by (lake-manifest, lakefile, toolchain, flake.lock) hashes β
preserves Mathlib's Azure cache, no per-ref scoping, ~30 LoC of glue.