refactor(core): PoC of decoupling node from JSON-RPC #12047
Draft
bingyanglin wants to merge 10 commits into
Bumps the formal-snapshot format to **V2** so new snapshots carry the data the indexer and other downstream consumers need to rebuild state without an archival replay:

- **Per-object previous-transaction checkpoint** inline in `.obj` records: `StoreObjectV2` on the row layer, unified `LiveObject` on the snapshot wire.
- **Per-epoch metadata** in a new `EPOCH_INFO` file alongside the bucket files.

**Scope: producer-side only.** This PR teaches a single writer node to *produce* V2 snapshots; the V1 reader path is preserved on disk and its behavior is unchanged. The reader-side restore from `EPOCH_INFO` and the one-time backfill of `previous_transaction_checkpoint` for pre-V2 objects are the PR-2 follow-up tracked in **#10957**.

- Magic bumped to `0x00B7EC76` (V1 was `0x00B7EC75`), so V1 and V2 readers fail fast on each other's files.
- Records are BCS-encoded `LiveObject` (unified type), carrying `previous_transaction_checkpoint: u64` inline alongside the live `Object`. What changes between V1 and V2 is the on-disk shape, not the Rust type name.
- `LiveObject::Wrapped` removed from the live-set view (the enum collapsed to a plain struct); `StoreObject::Wrapped` on the row layer stays (distinct `OBJECT_WRAPPED` digest).

- Unchanged from V1.

- Layout: `magic(0x9000C001, 4 B) | bcs(EpochInfo::V1 { entries: Vec<Option<EpochInfoEntry>> })`.
- One entry per epoch in `[0, snapshot_epoch]`.
- Integrity anchored by `FileMetadata::sha3_digest` in MANIFEST (same as `.obj`/`.ref`).

```rust
pub struct EpochInfoEntry {
    pub first_checkpoint: CheckpointSequenceNumber,
    pub start_system_state: Vec<u8>, // bcs(IotaSystemState)
    pub last_checkpoint_summary: Option<CertifiedCheckpointSummary>,
    pub end_of_epoch_tx_events: Option<TransactionEvents>,
}
```

`start_system_state` is opaque BCS bytes so the inner `IotaSystemStateV1/V2/…` can evolve without forcing an `EpochInfoEntry` schema change.
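A minimal sketch of the 4-byte magic framing used by the `.obj` and `EPOCH_INFO` files, and of why V1 and V2 readers fail fast on each other. The magic values are the ones quoted in this PR; the big-endian byte order and the helper names `strip_magic`/`frame` are assumptions for illustration, and BCS decoding of the payload is left out.

```rust
// Magic values quoted in this PR. Assumption: stored big-endian.
pub const OBJ_MAGIC_V1: u32 = 0x00B7_EC75;
pub const OBJ_MAGIC_V2: u32 = 0x00B7_EC76;
pub const EPOCH_INFO_MAGIC: u32 = 0x9000_C001;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    WrongMagic { expected: u32, found: u32 },
}

// Hypothetical helper: checks the 4-byte magic prefix and returns the
// remaining payload (the BCS body, which this sketch does not decode).
pub fn strip_magic(file: &[u8], expected: u32) -> Result<&[u8], HeaderError> {
    if file.len() < 4 {
        return Err(HeaderError::TooShort);
    }
    let (head, payload) = file.split_at(4);
    let found = u32::from_be_bytes([head[0], head[1], head[2], head[3]]);
    if found != expected {
        // A V2 reader handed a V1 file (or vice versa) stops here.
        return Err(HeaderError::WrongMagic { expected, found });
    }
    Ok(payload)
}

// Hypothetical helper: prepends the magic to an already-encoded payload.
pub fn frame(magic: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = magic.to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}
```

Checking the magic before touching the payload means a mismatched reader rejects the file without attempting to decode a body it would misinterpret.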
- Source of truth is the gRPC indexer's `epoch_info` table on `IndexStoreTables` (sibling of the existing `epochs` table). Populated by `grpc_indexes::index_epoch` from `CheckpointData` over two checkpoint boundaries:
  - **Boundary 1** (previous epoch closes): insert `first_checkpoint` + `start_system_state` for the new epoch.
  - **Boundary 2** (new epoch closes): upsert `last_checkpoint_summary` + `end_of_epoch_tx_events` and advance `Watermark::EpochIndexed`.
- Snapshot V2 writer pre-publish gate: refuses to publish unless `Watermark::EpochIndexed >= snapshot_epoch`, so emitted snapshots are complete by construction.

The single network node that produces the formal snapshot must:

1. Run as a **fullnode** (validators do not run `grpc_indexes`).
2. Have **`enable_grpc_api = true`** (so `epoch_info` is populated).
3. Run the PR-2 backfill once before publishing the first V2 snapshot (so `epoch_info` covers `[0, current_epoch]`); after that, live indexing keeps it complete going forward.

All three are enforced at node startup or snapshot-publish time with operator-facing error messages.

- V2 readers accept V2 snapshots and restore object state from `LiveObject` records that carry `previous_transaction_checkpoint` inline.
- V2 readers validate that the manifest lists `EPOCH_INFO` (failing fast on a missing entry) but **do not** download or parse the file in this PR; that is PR 2.
- V1 readers are untouched and continue to consume V1 snapshots as before.

~~- Reader: download + verify + parse `EPOCH_INFO`, populate the indexer's `epoch_info` table.~~
- Reader: parse `EPOCH_INFO` and dispatch each `EpochInfoEntry` through a new `Restore` trait method (mirroring the object-write generalization in #11559). Concrete consumer-side persistence lives behind the trait; the Indexer-specific impl is tracked in #11023.
- One-time backfill of `epoch_info` for pre-PR epochs (so the writer's watermark precondition can clear).
- Backfill of `previous_transaction_checkpoint` for objects in snapshots produced before this PR.

fixes #11254

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [x] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

---------

Co-authored-by: muXxer <git@muxxer.de>
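The two-boundary indexing and the writer's pre-publish gate described in this commit can be pictured with an in-memory table. This is a hypothetical sketch: `EpochIndex`, `on_epoch_close`, and `check_publish` are illustrative names, each row is reduced to checkpoint numbers, and `epoch_indexed` stands in for `Watermark::EpochIndexed`.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified row: the real one also holds `start_system_state`,
// `last_checkpoint_summary`, and `end_of_epoch_tx_events`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct EpochRow {
    pub first_checkpoint: u64,
    // Stand-in for the end-of-epoch fields written at boundary 2.
    pub last_checkpoint: Option<u64>,
}

#[derive(Debug, Default)]
pub struct EpochIndex {
    rows: BTreeMap<u64, EpochRow>,
    // Stand-in for `Watermark::EpochIndexed`; `None` until epoch 0 closes.
    epoch_indexed: Option<u64>,
}

impl EpochIndex {
    // Called on the checkpoint that closes `closing_epoch`.
    pub fn on_epoch_close(&mut self, closing_epoch: u64, last_checkpoint: u64) {
        // Boundary 2 for the closing epoch: upsert its end-of-epoch data.
        // (For epoch 0 the defaulted `first_checkpoint` of 0 is correct.)
        self.rows.entry(closing_epoch).or_default().last_checkpoint = Some(last_checkpoint);
        // Boundary 1 for the next epoch: insert its start data.
        self.rows.insert(
            closing_epoch + 1,
            EpochRow { first_checkpoint: last_checkpoint + 1, last_checkpoint: None },
        );
        // Advance the watermark only if no earlier epoch is missing.
        let contiguous = match self.epoch_indexed {
            None => closing_epoch == 0,
            Some(w) => closing_epoch == w + 1,
        };
        if contiguous {
            self.epoch_indexed = Some(closing_epoch);
        }
    }

    pub fn row(&self, epoch: u64) -> Option<&EpochRow> {
        self.rows.get(&epoch)
    }

    // Pre-publish gate: refuse unless EpochIndexed >= snapshot_epoch.
    pub fn check_publish(&self, snapshot_epoch: u64) -> Result<(), String> {
        match self.epoch_indexed {
            Some(w) if w >= snapshot_epoch => Ok(()),
            w => Err(format!(
                "refusing to publish snapshot for epoch {snapshot_epoch}: EpochIndexed = {w:?}"
            )),
        }
    }
}
```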
…11697)

# Description of change

Makes a gRPC fullnode's `epochs_v2` index complete since genesis on every bootstrap path and enforces it at startup: the node closes any gap before services start, or refuses to run instead of silently serving an incomplete index. This is the consumer side of #11453's `EPOCH_INFO` file. No object/`previous_transaction_checkpoint` backfill: the snapshot-publisher node resyncs from genesis instead, enforced by the writer's existing refusal of `None` rows.

- **Synchronous startup backfill (`iota-node`, `iota-config`).** A node that detects a gap (`epochs_v2_gap`: index short of the last executed closed epoch) fetches only MANIFEST + `EPOCH_INFO` from the new `state_snapshot_read_config`, seeds `[0, snapshot_epoch]`, closes any residual above it from local history, seeds the open epoch's row, and re-checks; a remaining gap (or no configured source) aborts startup. This runs before live indexing exists, so the watermark has one writer at a time, and the previous design's background task, retry/backoff, and `epoch_watermark_lock` are gone.
- **Local indexing fallback (`iota-core`).** When the latest published snapshot lags local execution (delayed snapshot pipeline), `index_missing_epochs_locally` replays only the missing epochs' closing checkpoints, located via the never-pruned `epoch_last_checkpoint_map`; best-effort up to the pruning horizon.
- **Atomic `EpochIndexed` advance (`iota-core`).** The live path advances the watermark in the same batch as the close-of-epoch row (gap-aware `try_advance_epoch_indexed_watermark`); `reconcile_epoch_indexed_watermark` remains only to jump across a seeded prefix.
- **Restore builds the whole gRPC index store (`iota-tool`, `iota-snapshot`, `iota-core`).** `download_formal_snapshot` tees the restored object stream into the live-state indexers (`RestoreWithGrpcIndexes`), seeds the epoch rows, and finalizes the store (`Watermark::Indexed`, then `meta`; a crash before `meta` leaves a store that the next open wipes and re-initializes). The node opens it in place instead of re-indexing the whole restored state; opt out with `--skip-grpc-indexes`. `init` and the restore share one indexing implementation (`GrpcLiveObjectRestorer`; `ParMakeLiveObjectIndexer` is now lifetime-generic).
- **Chain-identity gate (`iota-snapshot`).** `verify_and_restore_epoch_info` rejects a snapshot whose manifest `chain_id` differs from this node's chain before writing any row.
- **`RestoreEpochInfo` trait (`iota-snapshot`).** A separate single-method trait instead of a new `Restore` method: the two cover different snapshot payloads with different targets, so each call site requires exactly the capability it uses; the unified indexer (#11023) can implement both.
- **No `epochs` migration (`iota-core`).** The deprecated `epochs` CF is dropped without migration: its rows lack the end-of-epoch fields, so they could never satisfy `EpochIndexed`, and the backfill would overwrite them anyway.
- **Open-epoch seeding (`iota-core`).** `initialize_current_epoch_info` keys off the open epoch (`open_epoch_of`): a restore lands on a closing checkpoint, and seeding that checkpoint's own (closed) epoch would leave the open epoch's row permanently missing and wedge the watermark. Its start checkpoint derives from `epoch_last_checkpoint_map` (new seq-only `CheckpointStore` accessor; the backwards scan over prunable summaries is deleted).
- **Default snapshot source (`setups`).** Fullnode setups for mainnet/testnet/devnet ship a default `state-snapshot-read-config` pointing at the IOTA Foundation buckets; it is ignored unless the gRPC API is enabled and the index is incomplete.
- **Misc.** Uploader honors `state_snapshot_write_config.concurrency` (default still 20); `get_latest_available_epoch` delegates to `iota_snapshot::reader::latest_available_epoch`; `snapshots.mdx` documents the restore flag and the backfill setup.

## Links to any relevant issues

follow-up (PR-2) of #11453 · part of #11023

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [x] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

---------

Co-authored-by: muXxer <git@muxxer.de>
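The gap check and the watermark jump across a seeded prefix both reduce to set arithmetic over finalized epochs. A hypothetical sketch: `epochs_gap` and `contiguous_watermark` are illustrative stand-ins for `epochs_v2_gap` and `reconcile_epoch_indexed_watermark`, not their real signatures.

```rust
use std::collections::BTreeSet;

// Highest epoch `e` such that every epoch in `0..=e` has a finalized row,
// or `None` if epoch 0 itself is missing. This is where the watermark can
// jump to after a snapshot seeds a prefix of rows.
pub fn contiguous_watermark(finalized: &BTreeSet<u64>) -> Option<u64> {
    let mut next = 0u64;
    // `BTreeSet` iterates in ascending order.
    for &epoch in finalized {
        if epoch != next {
            break;
        }
        next += 1;
    }
    next.checked_sub(1)
}

// Closed epochs in `0..=last_closed` that still lack a finalized row.
// A non-empty result after every backfill source has run would abort startup.
pub fn epochs_gap(finalized: &BTreeSet<u64>, last_closed: Option<u64>) -> Vec<u64> {
    match last_closed {
        None => Vec::new(),
        Some(last) => (0..=last).filter(|e| !finalized.contains(e)).collect(),
    }
}
```

For example, a snapshot that seeds epochs 0 to 10 while live indexing has already written epoch 12 leaves the watermark at 10 and the gap at `[11, 13]` if epoch 13 is the last closed one.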
# Description of change

Some cleanup for #11697.

- **Local epoch backfill: only missing data counts as pruned (`iota-core`, `iota-types`).** Real storage failures were swallowed as "data pruned, retry once a newer snapshot is published". Absent-data errors now carry `Kind::Missing` (was `custom`), and only those end the best-effort replay; anything else propagates. Nothing else reads the kind.
- **Operator doc fixes (`iota-config`, `docs`).** The config doc still described the removed background task (the backfill is synchronous and gates startup); the snapshots guide now documents the refuse-to-start case when even the latest snapshot is older than the node's pruned-away history.

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [ ] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes
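The classification rule (only `Kind::Missing` ends the best-effort replay) can be sketched as a loop that matches on the error kind. `Kind`, `StoreError`, `ReplayOutcome`, and `replay_missing_epochs` are hypothetical simplifications; the real error type lives in `iota-types`.

```rust
use std::ops::RangeInclusive;

// Hypothetical error kinds; only `Missing` means "the data was pruned".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Missing,
    Other,
}

#[derive(Debug, PartialEq, Eq)]
pub struct StoreError {
    pub kind: Kind,
    pub msg: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayOutcome {
    // Every requested epoch was indexed.
    Complete,
    // Best-effort replay stopped at the first epoch whose data is absent.
    StoppedAtPruned { epoch: u64 },
}

pub fn replay_missing_epochs<F>(
    epochs: RangeInclusive<u64>,
    mut index_epoch: F,
) -> Result<ReplayOutcome, StoreError>
where
    F: FnMut(u64) -> Result<(), StoreError>,
{
    for epoch in epochs {
        match index_epoch(epoch) {
            Ok(()) => {}
            // Absent data ends the best-effort replay without failing it.
            Err(e) if e.kind == Kind::Missing => {
                return Ok(ReplayOutcome::StoppedAtPruned { epoch });
            }
            // Any other storage failure is real and propagates.
            Err(e) => return Err(e),
        }
    }
    Ok(ReplayOutcome::Complete)
}
```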
…rom formal snapshot's EPOCH_INFO (#11868)

# Description of change

The formal-snapshot restore currently depends on two buckets: the snapshot bucket and the checkpoint-archive bucket, where the default summary-sync mode only ever fetches the end-of-epoch summaries. Snapshot V2's EPOCH_INFO file already carries exactly those: one certified closing summary per epoch since genesis. Additionally, the gRPC epochs_v2 seeding (restore and node startup backfill) trusted the bucket after a chain-id check only. This PR makes the default restore archive-free and anchors every consumed EPOCH_INFO byte to the operator's genesis:

- `VerifiedEpochInfo` witness (iota-snapshot): `verify_epoch_info_chain` is its only constructor and runs the chain-id check, contiguity from epoch 0, and the committee-chain walk from the genesis committee (built on `CommitteeChainVerifier`). Holding the witness is proof of verification; `restore_epoch_info` now lives on it, so unverified rows structurally cannot be written.
- Archive-free default restore (iota-tool): EPOCH_INFO is downloaded and chain-verified up front (one small file, which also rejects wrong-network or tampered snapshots before any large download); `sync_summaries_from_epoch_info` seeds the genesis checkpoint, every epoch's verified closing summary (via `CheckpointStore::insert_verified_checkpoint`, which also maintains the never-pruned `epoch_last_checkpoint_map`), and the committees, then sets the four watermarks.
- `--all-checkpoints` stays archive-backed (full summary history has no snapshot replacement); the archive config is now required only with that flag. `start_summary_sync` is reduced to that mode, and the stale TODO plus the redundant manual `insert_epoch_last_checkpoint` are removed (summary sync maintains the map in both modes).
- Node startup backfill chain-verifies too: `backfill_epochs_v2_from_snapshot` takes the genesis committee from the node config and goes through the witness, closing the bucket-trust gap.
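The witness pattern relies on Rust privacy: a struct whose fields are private to a module can only be built by that module's own functions. A simplified sketch, assuming a string chain id and an entry reduced to its epoch number; the committee-chain walk is omitted, and these signatures are illustrative, not the real `iota-snapshot` API.

```rust
mod snapshot {
    // Hypothetical, simplified entry; the real one carries a certified
    // closing summary per epoch.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Entry {
        pub epoch: u64,
    }

    #[derive(Debug, PartialEq)]
    pub enum VerifyError {
        WrongChainId,
        NonContiguous { expected: u64, found: u64 },
    }

    // The field is private to this module, so code outside it cannot build
    // a `VerifiedEpochInfo` except through `verify_epoch_info_chain`.
    #[derive(Debug)]
    pub struct VerifiedEpochInfo {
        entries: Vec<Entry>,
    }

    pub fn verify_epoch_info_chain(
        node_chain_id: &str,
        manifest_chain_id: &str,
        entries: Vec<Entry>,
    ) -> Result<VerifiedEpochInfo, VerifyError> {
        if node_chain_id != manifest_chain_id {
            return Err(VerifyError::WrongChainId);
        }
        for (i, entry) in entries.iter().enumerate() {
            let expected = i as u64;
            if entry.epoch != expected {
                return Err(VerifyError::NonContiguous { expected, found: entry.epoch });
            }
            // The real verifier also walks the committee chain from genesis here.
        }
        Ok(VerifiedEpochInfo { entries })
    }

    impl VerifiedEpochInfo {
        // Restoring is a method on the witness, so unverified rows cannot reach it.
        pub fn restore_epoch_info(&self, rows: &mut Vec<Entry>) {
            rows.extend(self.entries.iter().cloned());
        }
    }
}
```

The design choice is that the compiler, not a convention, enforces "verify before write": there is no code path that hands unverified entries to `restore_epoch_info`.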
Tests: snapshot-test fixtures upgraded to real committee-signed summaries; six new unit tests cover accept+restore and every rejection path (wrong chain id, wrong genesis committee, tampered summary, missing end-of-epoch data, non-contiguous entries); e2e tests adapted to the witness API and hardened against a timing flake (`wait_until_executed_open_epoch`).

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [ ] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

### Release Notes

- [ ] Protocol:
- [x] Nodes (Validators and Full nodes): Restoring a node from a formal snapshot no longer requires a checkpoint-archive bucket. The required checkpoint summaries now come from the snapshot itself and are cryptographically verified against the genesis committee.
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [ ] CLI:
- [ ] Rust SDK:
- [ ] gRPC:
# Description of change

Formal-snapshot restore no longer touches the checkpoint archive. The default (and now only) path seeds the end-of-epoch summaries and committees from the snapshot's chain-verified EPOCH_INFO; the `--all-checkpoints` flag, the `archive_store_config` plumbing, and the `start_summary_sync_from_archive` machinery are removed from `download_formal_snapshot`.

The full-summary-history download, previously bolted onto restore as `--all-checkpoints`, becomes a standalone command, iota-tool `backfill-checkpoint-summaries`, runnable on any stopped node. It downloads every intermediate checkpoint summary from the archive into the node's checkpoint store up to `min(highest_synced, archive_latest)`, optionally verifying the chain pairwise from genesis. It only adds historical summaries below the node's existing watermarks and moves no watermark, so it's safe to re-run, and it's decoupled from restore (a node restored from a formal snapshot can become a full-header source for peers, or serve historical checkpoint queries).

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [ ] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

### Release Notes

- [ ] Protocol:
- [ ] Nodes (Validators and Full nodes):
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [x] CLI: Added `backfill-checkpoint-summaries` command to `iota-tool` to download the full checkpoint summary history for a stopped node
- [ ] Rust SDK:
- [ ] gRPC:
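The backfill bound and the optional pairwise check can be sketched as below. `Summary`, `backfill_target`, and `verify_pairwise_from_genesis` are hypothetical; a real checkpoint summary commits to its predecessor's digest inside signed data, which this sketch models as plain `u64` fields.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Summary {
    pub seq: u64,
    pub digest: u64,
    // `None` only for genesis.
    pub previous_digest: Option<u64>,
}

// Never go past what the node has synced, nor past what the archive holds.
pub fn backfill_target(highest_synced: u64, archive_latest: u64) -> u64 {
    highest_synced.min(archive_latest)
}

// Returns the sequence number of the first summary that breaks the chain.
pub fn verify_pairwise_from_genesis(summaries: &[Summary]) -> Result<(), u64> {
    if let Some(first) = summaries.first() {
        // The chain must start at genesis, which has no predecessor.
        if first.seq != 0 || first.previous_digest.is_some() {
            return Err(first.seq);
        }
    }
    for pair in summaries.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        // Each summary must directly follow its predecessor and commit to its digest.
        if next.seq != prev.seq + 1 || next.previous_digest != Some(prev.digest) {
            return Err(next.seq);
        }
    }
    Ok(())
}
```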
)

Hardens the formal-snapshot `EPOCH_INFO` trust model. Each entry previously carried `start_system_state` as raw BCS bytes taken on faith from the (unsigned) snapshot transport. Now every field is proven against its certified `last_checkpoint_summary`, and each epoch's start state is derived from the *next* epoch's verified boundary objects (epoch 0 from genesis). A restoring node trusts only data reachable from a signed summary.

- On-disk entry `EpochInfoV1Entry` (moved to `iota-types`): drop `start_system_state`; carry `last_checkpoint_summary`, `last_checkpoint_contents`, `end_of_epoch_tx_effects`, `end_of_epoch_tx_events`, and raw `next_epoch_start_system_state_objects` (`0x5` + inner state). `epoch`/`start_checkpoint` are derived from the signed summaries, not stored.
- Add `verify_epoch_boundary_proof`: hash-chains contents → effects → events → start-state objects back to the signed summary; rejects on any mismatch.
- `verify_epoch_info_chain` now also takes `genesis_system_state` (epoch 0's start, which no entry proves); derives each entry's `epoch`/`start_checkpoint` from the signed summaries and cross-checks the start committee against the certified chain.
- Decode each epoch's `system_state` from its boundary's digest-verified object bytes instead of trusting `start_system_state` (which can't round-trip byte-identically against the effects' object digest).
- `EpochInfoV2` (the `epochs_v2` row, `iota-types`): hold the on-disk entry directly as `epoch_info_entry: Option<EpochInfoV1Entry>` (finalized ⟺ `Some`); store only the non-derivable start-of-epoch identity (`epoch`, `start_checkpoint`, `start_timestamp_ms`, `system_state`) and expose `protocol_version`/`reference_gas_price`/`end_timestamp_ms`/`end_checkpoint` as derived getters.
- `iota-core/grpc_indexes.rs`: at each boundary, capture contents, epoch-change effects, and system-state objects into the entry; load events during sparse checkpoint assembly so the entry is complete.
- New helpers (`iota-types`): `CheckpointData::end_of_epoch_transaction` (runtime-checked, not `debug_assert`), `CheckpointContents::end_of_epoch_execution_digests`, `get_iota_system_state_objects`.
- `iota-snapshot/writer.rs`: write the row's `epoch_info_entry` straight into `EPOCH_INFO` (no projection step).
- `iota-grpc-server` `get_epoch`: serve the derived getters instead of the removed stored fields.
- `iota-node` / `iota-tool`: pass `genesis.iota_system_object()` to the verifier.
- `simulacrum`: set `epoch_info_entry` to `None`.

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [x] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

---------

Co-authored-by: muXxer <git@muxxer.de>
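The shape of `verify_epoch_boundary_proof` is a chain of digest comparisons anchored at the signed summary. A sketch under loud assumptions: the types are hypothetical and flattened, signature verification of the summary is taken as already done, and `DefaultHasher` stands in for the cryptographic digests the real code uses (it is not collision-resistant and must not be used this way in practice).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in digest for illustration only; NOT a cryptographic hash.
pub fn digest<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical, heavily simplified shapes.
pub struct Summary {
    pub contents_digest: u64, // assumed already signature-verified
}

#[derive(Hash)]
pub struct Contents {
    pub end_of_epoch_effects_digest: u64,
}

#[derive(Hash)]
pub struct Effects {
    pub events_digest: u64,
    pub written_object_digests: Vec<u64>,
}

pub struct BoundaryProof<'a> {
    pub contents: &'a Contents,
    pub effects: &'a Effects,
    pub events: &'a [String],
    pub system_state_objects: &'a [Vec<u8>],
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProofError {
    Contents,
    Effects,
    Events,
    Object(usize),
}

pub fn verify_boundary(summary: &Summary, proof: &BoundaryProof<'_>) -> Result<(), ProofError> {
    // signed summary -> checkpoint contents
    if digest(proof.contents) != summary.contents_digest {
        return Err(ProofError::Contents);
    }
    // contents -> end-of-epoch transaction effects
    if digest(proof.effects) != proof.contents.end_of_epoch_effects_digest {
        return Err(ProofError::Effects);
    }
    // effects -> events
    if digest(proof.events) != proof.effects.events_digest {
        return Err(ProofError::Events);
    }
    // effects -> every raw start-state object must be one the effects wrote
    for (i, object) in proof.system_state_objects.iter().enumerate() {
        if !proof.effects.written_object_digests.contains(&digest(object)) {
            return Err(ProofError::Object(i));
        }
    }
    Ok(())
}
```

Because each link is checked against a value already proven by the previous link, tampering with any field breaks the chain at that point.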
Moves the per-epoch verified chain (`epoch_info`) out of the gRPC index store into the `CheckpointStore`, so every node (validators and full nodes) holds it. This unblocks future summary pruning, lets snapshots publish without `enable_grpc_api`, and decouples the chain from `--skip-grpc-indexes`.

- New `epoch_info` + `epoch_info_watermark` CFs in `CheckpointStore`, never pruned; logic in new `checkpoints/epoch_info.rs`.
- Checkpoint executor populates the chain live at every boundary (validators included).
- gRPC index store drops its `epochs_v2` CF and `index_epoch`; keeps only transaction + live-state indexes.
- `GetEpoch` now reads the `CheckpointStore` (`get_epoch_info` moved to `GrpcStateReader`).
- Snapshot writer/uploader read `EPOCH_INFO` from the `CheckpointStore`; `enable_grpc_api` requirement removed.
- Node startup `seed_epoch_info` fills any historical gap before serving: a **recognized chain** (mainnet/testnet/current devnet) restores from that chain's formal-snapshot `EPOCH_INFO` first, then replays local checkpoints for the residual tail; an **unrecognized network** rebuilds from local checkpoints only (no snapshot). A residual gap is **fatal on a recognized chain** (every node must hold the verified chain since genesis) and a warning otherwise (the gap is left unfilled, and such a node won't produce snapshots or serve the epoch gRPC API for those epochs).
- The formal-snapshot source is **hardcoded per chain** (no operator config); fetched `EPOCH_INFO` is verified against the genesis committee. One-time migration aid for pruned upgrading nodes; removed one release later (#12028).
- Local rebuild reconstructs the chain from genesis using only each epoch's **boundary (change-epoch) transaction**, which is cheap and succeeds even when non-boundary transactions or old object versions have been pruned.
- Restore tool seeds the chain unconditionally; `--skip-grpc-indexes` now governs only live-state indexes.
- Docs: `snapshots.mdx` made node-generic; the per-epoch-metadata backfill section removed (the source is now built in). No new node config.

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [x] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes

---------

Co-authored-by: muXxer <git@muxxer.de>
…ils (#12040)

# Description of change

If the snapshot download fails, a node that still has its full local history now rebuilds `epoch_info` from that local data instead of refusing to start. Startup fails only if neither the snapshot nor local data can complete the chain (e.g. a pruned node that also can't reach the snapshot).

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
- [ ] Patch-specific tests (correctness, functionality coverage)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have checked that new and existing unit tests pass locally with my changes
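The `seed_epoch_info` startup policy described earlier (recognized versus unrecognized networks), combined with this fallback, can be sketched as one function. `Network`, `SeedOutcome`, and this `seed_epoch_info` signature are hypothetical; the snapshot restore and the local boundary-transaction replay are injected as closures so the control flow stands on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Network {
    // mainnet / testnet / current devnet: a formal-snapshot source is built in.
    Recognized,
    // Any other network: local checkpoints only.
    Unrecognized,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SeedOutcome {
    Complete,
    // Non-fatal on an unrecognized network: epochs from here on stay unfilled.
    IncompleteWarn { first_missing: u64 },
}

// `from_snapshot` returns the last epoch restored (contiguous from genesis).
// `from_local(start)` replays local boundary transactions from `start` and
// returns the last epoch it filled, or `None` if it filled nothing.
pub fn seed_epoch_info<S, L>(
    network: Network,
    last_closed_epoch: u64,
    from_snapshot: S,
    from_local: L,
) -> Result<SeedOutcome, String>
where
    S: FnOnce() -> Result<u64, String>,
    L: FnOnce(u64) -> Option<u64>,
{
    // A failed snapshot download is not fatal by itself: fall back to local data.
    let mut covered = match network {
        Network::Recognized => from_snapshot().ok(),
        Network::Unrecognized => None,
    };
    let next = covered.map_or(0, |e| e + 1);
    if next <= last_closed_epoch {
        if let Some(epoch) = from_local(next) {
            covered = Some(epoch);
        }
    }
    match covered {
        Some(c) if c >= last_closed_epoch => Ok(SeedOutcome::Complete),
        _ => {
            let first_missing = covered.map_or(0, |c| c + 1);
            match network {
                // Every node on a recognized chain must hold the chain since genesis.
                Network::Recognized => Err(format!(
                    "epoch_info incomplete from epoch {first_missing}; refusing to start"
                )),
                Network::Unrecognized => Ok(SeedOutcome::IncompleteWarn { first_missing }),
            }
        }
    }
}
```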
# Description of change

This patch introduces three changes motivated by the parallel work on using the V2 API in the indexer.

1. It updates the progress bar on regular ticks, so the bar no longer waits for a download to complete before updating during a restore.
2. It exposes a consuming `VerifiedEpochInfo::into_parts` method that extracts all inner values, so that custom per-epoch indexing logic can be implemented efficiently.
3. It exposes a getter for `start_system_states` as a slice, to complement the existing API and make documentation more concise.

## Links to any relevant issues

Part of #11023

## How the change has been tested

- [x] Basic tests (linting, compilation, formatting, unit/integration tests)
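The accessors in points 2 and 3 follow a common Rust pattern: a borrowing getter that returns a slice, and a consuming `into_parts` that moves owned values out without cloning. This is a generic stand-in, not the real `VerifiedEpochInfo`; `from_verified_parts` is a hypothetical constructor included only so the sketch can be exercised.

```rust
// Hypothetical stand-in for `VerifiedEpochInfo`, generic over the state and
// entry types so the sketch stays self-contained.
pub struct VerifiedEpochInfo<S, E> {
    start_system_states: Vec<S>,
    entries: Vec<E>,
}

impl<S, E> VerifiedEpochInfo<S, E> {
    // Illustration only: in the real code the verifier is the sole constructor.
    pub fn from_verified_parts(start_system_states: Vec<S>, entries: Vec<E>) -> Self {
        Self { start_system_states, entries }
    }

    // Borrowing getter: callers can inspect the states without copying them.
    pub fn start_system_states(&self) -> &[S] {
        &self.start_system_states
    }

    // Consuming accessor: moves the vectors out, so a custom indexer can take
    // ownership of large values without cloning.
    pub fn into_parts(self) -> (Vec<S>, Vec<E>) {
        (self.start_system_states, self.entries)
    }
}
```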
a7a94b1 to 88c1242
# Description of change
Note: this branch will be rebased after #12035 is merged to `develop`. For ease of review, you can check commit a7a94b1 on its own.

The indexer is now the sole JSON-RPC server. This removes JSON-RPC serving (and its backing index) from `iota-node`/`iota-core`, plus the code that existed only to support it. The client SDK and the `iota-json-rpc*` crates stay (used by the indexer / CLI / graphql).

- … `build_http_server`/`build_kv_store`), `IndexStore` wiring, `verify_indexes`, the `/health` endpoint, and the `iota-json-rpc{,-api}` deps. gRPC + `TransactionOrchestrator` kept.
- … `jsonrpc_index` (`IndexStore`), `subscription_handler`, `streamer`, `verify_indexes`; removed the `AuthorityState` index field/methods, the commit-path index/subscription hooks, and the pruner's index logic.
- … `iota-json-rpc-types`: `dev_inspect_transaction_block`/`dry_exec_transaction` now return raw node types (`TransactionEffects`/`TransactionEvents`/`ExecutionResult`); consumers (unit tests, transactional-test-runner, benchmark) migrated. `iota-core` has zero JSON-RPC deps.
- … `StateRead` + `AuthorityState`-backed API structs); kept the server infra + helpers the indexer/graphql reuse.
- … `NodeConfig` fields (`json_rpc_address`, `enable_index_processing`, `jsonrpc_server_type`, `indexer_max_subscriptions`, `iota_names_config`, `num_epochs_to_retain_for_indexes`, `enable_secondary_index_checks`) + swarm-config/swarm/localnet/test-cluster plumbing.
- … `LedgerService.GetHealth` (same checkpoint-lag check the HTTP `/health` did).
- … `iota-json-rpc-api` doc types (output byte-identical); dropped the unused `IndexStoreNotAvailable` error.

## Links to any relevant issues
fixes #11260
## How the change has been tested