
Conversation

hanabi1224
Contributor

@hanabi1224 hanabi1224 commented Oct 13, 2025

Summary of changes

Usage:

forest-tool db import --chain calibnet 1760014753576.forest.car.zst

Changes introduced in this pull request:

  • added forest-tool db import subcommand
  • updated CarStream to skip metadata and F3 blocks
  • unit tests

Reference issue to close (if applicable)

Closes #6162

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • Added “forest-tool db import” for streaming/importing CAR snapshots (chain selection, optional DB path, skip validation, progress).
    • V2 export accepts more flexible input sources and new export options.
    • Streamlined file-based CAR loading via a new path-based CAR stream constructor.
  • Documentation

    • CLI reference and changelog updated to document the new import command and export notes.
  • Bug Fixes

    • Safer error handling for invalid snapshot offsets/sizes.
  • Tests

    • Added comprehensive CAR streaming and snapshot parity tests.

Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

Generalizes export_v2 to accept a generic Seek+Read f3 source and options; adds a forest-tool db import subcommand and changelog/docs; introduces CarStream path-based construction, framing/header helpers, skip_bytes helper, CarStream tests, exposes a const, and tightens CAR offset conversions. No breaking API removals.

Changes

  • Docs & Changelog (CHANGELOG.md, docs/docs/users/reference/cli.sh): Adds a changelog entry for forest-tool db import and registers CLI docs generation for the new forest-tool db import command.
  • Dependency (Cargo.toml): Enables the tokio_async feature for integer-encoding (integer-encoding = { version = "4.0", features = ["tokio_async"] }).
  • Chain export API & Callers (src/chain/mod.rs, src/chain/tests.rs, src/rpc/methods/chain.rs): export_v2 signature extended with F: Seek + Read for the f3 parameter and an added options: Option<ExportOptions> argument; call sites and tests updated to match the new generics.
  • CarStream call sites (src/daemon/bundle.rs, src/daemon/db_util.rs, src/tool/subcommands/archive_cmd.rs): Replace inline tokio::fs::File + BufReader construction with CarStream::new_from_path(...) calls; adjust encoder calls to pass explicit data_len types.
  • CarStream internals & tests (src/utils/db/car_stream.rs, src/utils/db/car_stream/tests.rs): Adds new_from_path constructor, MAX_FRAME_LEN, generic v1 header reader, frame helpers (read_frame, read_car_block), skip/metadata F3 handling, FramedRead integration, updated CarBlockWrite/reader signatures, and comprehensive CarStream tests (including an Arbitrary impl for CarBlock).
  • DB tool: Import subcommand (src/tool/subcommands/db_cmd.rs): Adds DBCommands::Import { snapshot_files, chain, db, skip_validation }; changes the run(self) signature; implements CAR streaming import with optional validation, Blockstore writes, and progress reporting.
  • DB CAR module visibility (src/db/car/mod.rs): Makes V2_SNAPSHOT_ROOT_COUNT public (pub const).
  • I/O utility (src/utils/io/mod.rs): Adds pub async fn skip_bytes<T: tokio::io::AsyncRead + Unpin>(reader: T, n: u64) -> std::io::Result<T> to consume and discard bytes from async readers.
  • CAR plain reader error handling (src/db/car/plain.rs): Replaces unchecked casts for header_v2.data_offset / data_size with u64::try_from(...).map_err(io::Error::other) to propagate conversion errors.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as forest-tool (DBCommands)
  participant FS as Filesystem
  participant CS as CarStream
  participant Val as Validator
  participant DB as Blockstore

  User->>CLI: db import --chain <net> --db <path> [--no-validation] <CAR...>
  CLI->>CLI: resolve DB root & open Blockstore
  loop for each snapshot file
    CLI->>CS: CarStream::new_from_path(path)
    CS->>FS: open file, read header_v1/v2, apply MAX_FRAME_LEN, skip F3 metadata if present
    CS-->>CLI: stream of (cid, bytes)
    loop stream blocks
      alt validation enabled
        CLI->>Val: validate (cid, bytes)
        Val-->>CLI: ok / error
      end
      CLI->>DB: put(cid, bytes)
      CLI->>CLI: update progress
    end
  end
  CLI->>DB: flush/close
  CLI-->>User: Import completed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • akaladarshi
  • sudo-shashank

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title "feat: forest-tool db import" clearly and concisely describes the primary change being introduced: a new CLI subcommand for importing CAR snapshots into the database. This aligns directly with the PR objectives and the primary feature implementation visible throughout the changeset, particularly the Import variant added to DBCommands and the associated import logic. The title is specific enough for developers scanning commit history to understand this introduces database import functionality.
  • Linked Issues Check (✅ Passed): The PR successfully implements the objective from linked issue #6162: providing a tool to import CAR snapshot files into the key-value database for setting up archival nodes. The implementation includes the new forest-tool db import CLI subcommand with configurable options (snapshot files, chain, database path, and validation), database integration with write buffering, CAR block streaming and validation logic, and CarStream modifications to properly skip metadata and F3 blocks during import. Supporting infrastructure changes include the new_from_path constructor for simplified file handling, error handling improvements, and comprehensive unit tests validating the import behavior with F3 snapshots.
  • Out of Scope Changes Check (✅ Passed): The vast majority of changes are directly or demonstrably related to implementing the import tool requirement. Core implementation changes include the Import subcommand in db_cmd.rs, CarStream modifications to skip metadata and F3 blocks, the skip_bytes I/O helper, and documentation updates. Supporting refactorings to use the new CarStream::new_from_path constructor across bundle.rs, db_util.rs, and archive_cmd.rs are reasonable infrastructure improvements that simplify file handling. The export_v2 generic parameter extension appears necessary to support the included unit tests that exercise F3 snapshot handling across both import and export paths. The only changes with less clear justification are the integer-encoding tokio_async feature addition in Cargo.toml and the error handling improvements in src/db/car/plain.rs, though these appear to be minor supporting changes rather than substantial scope creep.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/tool-db-import

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b1e490 and 35dd565.

📒 Files selected for processing (1)
  • src/utils/db/car_stream.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (6)
  • read_v2_header (315-338)
  • new (126-174)
  • new (440-442)
  • reader (319-319)
  • read_v1_header (347-361)
  • version (200-202)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Ubuntu
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: tests-release
  • GitHub Check: tests
🔇 Additional comments (14)
src/utils/db/car_stream.rs (14)

32-33: LGTM: Frame size limit is appropriate.

The 512 MiB limit is reasonable for CAR frames and avoids the 32-bit overflow issues that existed with the previous 8 GiB limit.


62-67: LGTM: Write method properly adapted to trait changes.

The explicit parameter passing and u64 cast align correctly with the updated CarBlockWrite trait signature.


117-119: LGTM: Documentation accurately describes skipping behavior.

The updated documentation correctly describes the automatic skipping of metadata and F3 blocks, with appropriate reference to the FRC-0108 specification.


168-173: LGTM: CARv2 offset validation properly implemented.

The checked conversion with try_from correctly prevents negative data_offset values from being cast to huge u64 values, as addressed in previous review feedback.


175-181: LGTM: Data size validation and reader limiting.

The checked conversion for data_size and the use of take() to enforce the CARv2 data boundary are correctly implemented.


193-222: F3/metadata skipping logic is sound.

The conditional logic correctly:

  1. Detects v2 snapshots via root count
  2. Validates the first block is the metadata block
  3. Skips the F3 frame with proper bounds checking (16 GiB limit)
  4. Excludes the metadata block from the stream

The error handling and edge cases (deserialization failure, CID mismatch) are appropriately handled.
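For orientation, a hedged sketch of the flow these steps describe, reusing names from this PR (read_car_block, skip_bytes, MAX_F3_FRAME_LEN, V2_SNAPSHOT_ROOT_COUNT, FilecoinSnapshotMetadata); the shipped logic lives inside CarStream::new and differs in details:

let mut first_block = read_car_block(&mut reader).await?;
// A v2 snapshot is detected by its root count; its first block is the
// metadata block whose CID equals the first root.
let v2_metadata = match &first_block {
    Some(block)
        if header_v1.roots.len() == V2_SNAPSHOT_ROOT_COUNT
            && block.cid == header_v1.roots[0] =>
    {
        fvm_ipld_encoding::from_slice::<FilecoinSnapshotMetadata>(&block.data).ok()
    }
    _ => None,
};
if let Some(metadata) = v2_metadata {
    // Skip the optional F3 frame that follows the metadata block,
    // bounds-checking its length before consuming it.
    if metadata.f3_data.is_some() {
        let len: u64 = reader.read_varint_async().await?;
        if len > MAX_F3_FRAME_LEN {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("f3 block frame length too large: {len} > {MAX_F3_FRAME_LEN}"),
            ));
        }
        reader = skip_bytes(reader, len).await?;
    }
    // Exclude the metadata block itself from the stream.
    first_block = None;
}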


225-232: LGTM: FramedRead construction properly configured.

The use of FramedRead with the uvi_bytes() codec correctly wraps the length-limited reader, simplifying the subsequent stream processing.


294-301: LGTM: Convenient path-based constructor added.

The new_from_path constructor provides an ergonomic entry point for the common use case of reading CAR files from disk, aligning with the PR objective to add forest-tool db import functionality.
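As a rough sketch of what such a constructor can look like (assuming CarStream::new accepts a buffered async reader; the shipped version is in src/utils/db/car_stream.rs and may differ):

impl CarStream<tokio::io::BufReader<tokio::fs::File>> {
    pub async fn new_from_path(path: impl AsRef<std::path::Path>) -> std::io::Result<Self> {
        // Open the file and buffer it before delegating to the existing constructor.
        let file = tokio::fs::File::open(path.as_ref()).await?;
        Self::new(tokio::io::BufReader::new(file)).await
    }
}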


364-381: LGTM: read_v1_header refactored for clarity.

The refactored implementation delegates frame reading to the read_frame helper, reducing duplication and improving maintainability while maintaining equivalent error handling.


383-400: LGTM: read_frame helper is well-structured.

The helper properly:

  • Distinguishes EOF from errors
  • Enforces frame size limits before allocation
  • Handles all error cases appropriately

The implementation is clean and reusable.


402-409: LGTM: read_car_block elegantly composes helpers.

The use of transpose() cleanly converts the nested Option<Result> to Result<Option>, making the code both concise and correct.
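A tiny, self-contained illustration of the transpose() idiom (the demo function is hypothetical, not PR code):

// Option<Vec<u8>> -> Option<io::Result<usize>> -> io::Result<Option<usize>>.
fn parse_optional_frame(frame: Option<Vec<u8>>) -> std::io::Result<Option<usize>> {
    frame
        // A fallible per-frame parse would go here; the length stands in for a CarBlock.
        .map(|bytes| Ok::<usize, std::io::Error>(bytes.len()))
        .transpose()
}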


411-414: LGTM: uvi_bytes codec properly configured.

The codec's max length setting provides defense-in-depth frame size validation, complementing the explicit checks in read_frame.


418-418: LGTM: Tests modularized appropriately.

Moving tests to a separate module file is good practice for maintaining code organization as the test suite grows.


99-112: Public API change verified across all call sites.

All implementations and call sites of write_car_block have been correctly updated to match the new trait signature. Three call sites confirmed:

  • src/utils/db/car_stream.rs:62 passes explicit data_len parameter
  • src/tool/subcommands/archive_cmd.rs:718 passes explicit data_len parameter
  • src/chain/mod.rs:111 passes explicit data_len parameter

The breaking change has been consistently propagated throughout the codebase.


Comment @coderabbitai help to get the list of available commands and usage tips.

@hanabi1224 hanabi1224 marked this pull request as ready for review October 14, 2025 13:50
@hanabi1224 hanabi1224 requested a review from a team as a code owner October 14, 2025 13:50
@hanabi1224 hanabi1224 requested review from akaladarshi and sudo-shashank and removed request for a team October 14, 2025 13:50
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2b0d9 and 7ae2f6e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • CHANGELOG.md (1 hunks)
  • Cargo.toml (1 hunks)
  • docs/docs/users/reference/cli.sh (1 hunks)
  • src/chain/mod.rs (2 hunks)
  • src/chain/tests.rs (2 hunks)
  • src/daemon/bundle.rs (1 hunks)
  • src/daemon/db_util.rs (1 hunks)
  • src/db/car/mod.rs (1 hunks)
  • src/rpc/methods/chain.rs (1 hunks)
  • src/tool/subcommands/archive_cmd.rs (1 hunks)
  • src/tool/subcommands/db_cmd.rs (3 hunks)
  • src/utils/db/car_stream.rs (6 hunks)
  • src/utils/db/car_stream/tests.rs (1 hunks)
  • src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/daemon/db_util.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/utils/io/mod.rs (1)
src/utils/net.rs (1)
  • reader (67-112)
src/rpc/methods/chain.rs (2)
src/chain/mod.rs (1)
  • export_v2 (67-129)
src/chain/tests.rs (1)
  • export_v2 (66-66)
src/tool/subcommands/db_cmd.rs (3)
src/db/mod.rs (2)
  • db_root (320-322)
  • open_db (324-326)
src/db/blockstore_with_write_buffer.rs (1)
  • new_with_capacity (38-44)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/utils/db/car_stream.rs (2)
src/db/car/plain.rs (7)
  • read_v2_header (314-337)
  • reader (318-318)
  • read_v1_header (346-360)
  • new (126-173)
  • new (439-441)
  • metadata (175-187)
  • version (199-201)
src/utils/io/mod.rs (1)
  • skip_bytes (48-53)
src/daemon/bundle.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/tool/subcommands/archive_cmd.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/chain/mod.rs (1)
src/chain/tests.rs (1)
  • export_v2 (66-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (7)
CHANGELOG.md (1)

32-32: Changelog entry looks good

Accurately describes the new subcommand and links the PR.

docs/docs/users/reference/cli.sh (1)

133-133: Docs generator updated for db import

Good addition; help output will be captured consistently with other subcommands.

src/daemon/bundle.rs (1)

51-51: Use of CarStream::new_from_path simplifies I/O

Cleaner and async-friendly open; no behavior change.

src/db/car/mod.rs (1)

36-36: Exposing V2_SNAPSHOT_ROOT_COUNT is reasonable

Public constant matches FRC‑0108 assumptions and can aid callers.

src/tool/subcommands/archive_cmd.rs (1)

681-681: Path-based CarStream construction LGTM

Consistent with new helper; reduces boilerplate.

src/rpc/methods/chain.rs (1)

451-458: export_v2 call sites correctly updated
All instances now use the updated <D, F> signature with matching parameters across chain.rs, chain/tests.rs, and utils/db/car_stream/tests.rs.

Cargo.toml (1)

111-111: Confirm integer-encoding v4.0 feature and review breaking changes.

  • The tokio_async feature exists in v4.0.0.
  • No changelog was retrievable via API—please review the crate’s 3.x→4.0 diff or release notes for any breaking changes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/daemon/db_util.rs (2)

151-173: Fix URL detection: local paths with colon are treated as URLs (breaks imports).

Using Url::parse(&from_path.display().to_string()) will classify Windows paths (e.g., C:\file.car) and POSIX filenames containing ':' as URLs. Limit URL handling to the http/https schemes; otherwise treat the input as a local file.

Apply this diff to make scheme checks explicit and fall back to local-file handling:

             let downloaded_car_temp_path = new_forest_car_temp_path_in(forest_car_db_dir)?;
-            if let Ok(url) = Url::parse(&from_path.display().to_string()) {
-                download_to(
-                    &url,
-                    &downloaded_car_temp_path,
-                    DownloadFileOption::Resumable,
-                    snapshot_progress_tracker.create_callback(),
-                )
-                .await?;
-
-                snapshot_progress_tracker.completed();
-            } else {
-                snapshot_progress_tracker.not_required();
-                if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
-                    move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
-                } else {
-                    // For a local snapshot, we transcode directly instead of copying & transcoding.
-                    transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
-                    if mode == ImportMode::Move {
-                        std::fs::remove_file(from_path).context("Error removing original file")?;
-                    }
-                }
-            }
+            let path_str = from_path.to_string_lossy();
+            if let Ok(url) = Url::parse(&path_str) {
+                if url.scheme() == "http" || url.scheme() == "https" {
+                    download_to(
+                        &url,
+                        &downloaded_car_temp_path,
+                        DownloadFileOption::Resumable,
+                        snapshot_progress_tracker.create_callback(),
+                    )
+                    .await?;
+                    snapshot_progress_tracker.completed();
+                } else {
+                    snapshot_progress_tracker.not_required();
+                    if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
+                        move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
+                    } else {
+                        // For a local snapshot, we transcode directly instead of copying & transcoding.
+                        transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
+                        if mode == ImportMode::Move {
+                            std::fs::remove_file(from_path).context("Error removing original file")?;
+                        }
+                    }
+                }
+            } else {
+                snapshot_progress_tracker.not_required();
+                if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
+                    move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
+                } else {
+                    // For a local snapshot, we transcode directly instead of copying & transcoding.
+                    transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
+                    if mode == ImportMode::Move {
+                        std::fs::remove_file(from_path).context("Error removing original file")?;
+                    }
+                }
+            }

190-211: Same scheme check needed in Auto mode.

Auto mode uses Url::parse(...).is_ok() which has the same misclassification problem. Check for http/https schemes only.

-        ImportMode::Auto => {
-            if Url::parse(&from_path.display().to_string()).is_ok() {
+        ImportMode::Auto => {
+            let path_str = from_path.to_string_lossy();
+            if Url::parse(&path_str)
+                .ok()
+                .map(|u| u.scheme() == "http" || u.scheme() == "https")
+                .unwrap_or(false)
+            {
                 // Fallback to move if from_path is url
                 move_or_copy(ImportMode::Move).await?;
             } else if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
src/chain/mod.rs (1)

106-117: Potential overflow/truncation when casting f3_data_len (u64 → usize)

On 32-bit or for very large F3 data, casting u64 to usize can truncate and corrupt the frame length. Guard before casting.

Apply this diff:

-    if let Some((f3_cid, mut f3_data)) = f3 {
-        let f3_data_len = f3_data.seek(SeekFrom::End(0))?;
-        f3_data.seek(SeekFrom::Start(0))?;
+    if let Some((f3_cid, mut f3_data)) = f3 {
+        let f3_data_len = f3_data.seek(SeekFrom::End(0))?;
+        if f3_data_len > usize::MAX as u64 {
+            anyhow::bail!(
+                "f3 data too large to encode on this platform: {} bytes",
+                f3_data_len
+            );
+        }
+        let f3_data_len = f3_data_len as usize;
+        f3_data.seek(SeekFrom::Start(0))?;
         prefix_data_frames.push({
             let mut encoder = forest::new_encoder(forest::DEFAULT_FOREST_CAR_COMPRESSION_LEVEL)?;
-            encoder.write_car_block(f3_cid, f3_data_len as _, &mut f3_data)?;
+            encoder.write_car_block(f3_cid, f3_data_len, &mut f3_data)?;
             anyhow::Ok((
                 vec![f3_cid],
                 finalize_frame(forest::DEFAULT_FOREST_CAR_COMPRESSION_LEVEL, &mut encoder)?,
             ))
         });
     }
🧹 Nitpick comments (3)
src/tool/subcommands/db_cmd.rs (2)

111-121: Avoid shadowing db_root; improve naming.

Variable db_root shadows the imported function db_root, reducing readability.

-                let db_root = if let Some(db) = db {
+                let db_root_path = if let Some(db) = db {
                     db
                 } else {
                     let (_, config) = read_config(None, Some(chain.clone()))?;
                     db_root(&chain_path(&config))?
                 };
-                println!("Opening parity-db at {}", db_root.display());
+                println!("Opening parity-db at {}", db_root_path.display());
                 let db_writer = BlockstoreWithWriteBuffer::new_with_capacity(
-                    open_db(db_root, &Default::default())?,
+                    open_db(db_root_path, &Default::default())?,
                     DB_WRITE_BUFFER_CAPACITY,
                 );

129-141: Optional UX: show which file is being imported.

Progress message only shows total; include the current file to aid tracking.

-                for snap in snapshot_files {
-                    let mut car = CarStream::new_from_path(&snap).await?;
+                for snap in snapshot_files {
+                    let file_display = snap.display().to_string();
+                    pb.set_message(format!("Importing {file_display}..."));
+                    let mut car = CarStream::new_from_path(&snap).await?;
                     while let Some(b) = car.try_next().await? {
                         if !no_validation {
                             b.validate()?;
                         }
                         db_writer.put_keyed(&b.cid, &b.data)?;
                         total += 1;
-                        let text = format!("{total} blocks imported");
-                        pb.set_message(text);
+                        pb.set_message(format!("{total} blocks imported ({file_display})"));
                     }
                 }
src/utils/db/car_stream.rs (1)

352-369: Optional: bound header frame size to prevent OOM

Header should be tiny. Consider rejecting header frames above a small cap (e.g., 1–4 MiB) instead of 8 GiB to harden against malformed inputs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2b0d9 and 7ae2f6e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • CHANGELOG.md (1 hunks)
  • Cargo.toml (1 hunks)
  • docs/docs/users/reference/cli.sh (1 hunks)
  • src/chain/mod.rs (2 hunks)
  • src/chain/tests.rs (2 hunks)
  • src/daemon/bundle.rs (1 hunks)
  • src/daemon/db_util.rs (1 hunks)
  • src/db/car/mod.rs (1 hunks)
  • src/rpc/methods/chain.rs (1 hunks)
  • src/tool/subcommands/archive_cmd.rs (1 hunks)
  • src/tool/subcommands/db_cmd.rs (3 hunks)
  • src/utils/db/car_stream.rs (6 hunks)
  • src/utils/db/car_stream/tests.rs (1 hunks)
  • src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/daemon/bundle.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/daemon/db_util.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/tool/subcommands/archive_cmd.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
src/chain/mod.rs (1)
src/chain/tests.rs (1)
  • export_v2 (66-66)
src/rpc/methods/chain.rs (2)
src/chain/mod.rs (1)
  • export_v2 (67-129)
src/chain/tests.rs (1)
  • export_v2 (66-66)
src/utils/io/mod.rs (1)
src/utils/net.rs (1)
  • reader (67-112)
src/utils/db/car_stream.rs (2)
src/db/car/plain.rs (7)
  • read_v2_header (314-337)
  • reader (318-318)
  • read_v1_header (346-360)
  • new (126-173)
  • new (439-441)
  • metadata (175-187)
  • version (199-201)
src/utils/io/mod.rs (1)
  • skip_bytes (48-53)
src/tool/subcommands/db_cmd.rs (4)
src/db/mod.rs (2)
  • db_root (320-322)
  • open_db (324-326)
src/cli_shared/mod.rs (2)
  • read_config (24-41)
  • chain_path (20-22)
src/db/blockstore_with_write_buffer.rs (1)
  • new_with_capacity (38-44)
src/utils/db/car_stream.rs (1)
  • new_from_path (283-288)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Check
  • GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (5)
src/daemon/db_util.rs (1)

308-316: Switch to CarStream::new_from_path looks good.

Cleaner and consistent with new constructor; no behavior change in transcode path.

src/utils/db/car_stream/tests.rs (1)

127-144: Great parity check with optional F3 snapshot.

Validates that F3 data doesn’t affect the block set; this directly tests the intended skipping behavior.

src/daemon/bundle.rs (1)

51-55: Switch to CarStream::new_from_path is clean.

Simplifies file handling while preserving behavior.

src/tool/subcommands/db_cmd.rs (1)

142-144: Confirmed buffer flush on drop: BlockstoreWithWriteBuffer’s Drop impl calls flush_buffer(), so drop(db_writer) already flushes all writes.
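For reference, a hedged sketch of the flush-on-drop pattern being relied on here; the struct and its fields are illustrative stand-ins, not the actual BlockstoreWithWriteBuffer:

struct BufferedWrites {
    buffer: Vec<(String, Vec<u8>)>,
}

impl BufferedWrites {
    fn flush_buffer(&mut self) -> std::io::Result<()> {
        // Persist everything in `buffer` to the backing store (elided), then clear it.
        self.buffer.clear();
        Ok(())
    }
}

impl Drop for BufferedWrites {
    fn drop(&mut self) {
        // Drop cannot propagate errors, so a failed flush can only be logged;
        // this is why a later review also suggests an explicit, fallible flush.
        if let Err(e) = self.flush_buffer() {
            eprintln!("failed to flush write buffer: {e}");
        }
    }
}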

src/chain/mod.rs (1)

67-74: Signature generalization looks good

Generic F: Seek + Read and options parameter make the API more flexible. Call sites appear adjusted.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/utils/db/car_stream.rs (3)

178-182: Apply safe conversion to data_size cast.

Line 180 still uses an unsafe cast that was flagged in previous reviews:

.map(|h| h.data_size as u64)

Negative data_size will wrap to huge u64 values, corrupting max_car_v1_bytes and the Take limit. This is inconsistent with the safe data_offset handling at line 173.

Apply this diff:

-        let max_car_v1_bytes = header_v2
-            .as_ref()
-            .map(|h| h.data_size as u64)
-            .unwrap_or(u64::MAX);
+        let max_car_v1_bytes = if let Some(h) = header_v2.as_ref() {
+            u64::try_from(h.data_size).map_err(std::io::Error::other)?
+        } else {
+            u64::MAX
+        };

195-214: Validate F3 block length and avoid unsafe cast.

Lines 203-204 read the F3 block length into usize and cast to u64 without validation:

let len: usize = reader.read_varint_async().await?;
reader = skip_bytes(reader, len as _).await?;

Issues:

  • On 32-bit systems, usize can overflow when reading a large varint
  • No upper bound check allows pathological frame sizes
  • as _ cast hides potential truncation

As flagged in previous reviews, this should read as u64 and validate against MAX_FRAME_LEN.

Apply this diff:

                     // Skip the F3 block in the block stream
                     if metadata.f3_data.is_some() {
-                        let len: usize = reader.read_varint_async().await?;
-                        reader = skip_bytes(reader, len as _).await?;
+                        const MAX_FRAME_LEN: u64 = 8 * 1024 * 1024 * 1024; // 8 GiB
+                        let len: u64 = reader.read_varint_async().await?;
+                        if len > MAX_FRAME_LEN {
+                            return Err(io::Error::new(
+                                io::ErrorKind::InvalidData,
+                                format!("F3 block frame length too large: {len} > {MAX_FRAME_LEN}"),
+                            ));
+                        }
+                        reader = skip_bytes(reader, len).await?;
                     }

375-393: Fix redundant read_exact check and add frame size validation.

This code has two issues flagged in previous reviews:

  1. Redundant check: Line 384 assigns the read_exact result to n, but tokio::io::AsyncReadExt::read_exact returns Result<usize> whose value always equals the buffer length on success (unlike std's Read::read_exact, which returns Result<()>), so the check if n == len can never fail.

  2. No size validation: Reading the length directly into usize risks overflow on 32-bit systems and allows pathological frame sizes that could exhaust memory.

Apply this diff:

 async fn read_frame<ReaderT: AsyncRead + Unpin>(
     reader: &mut ReaderT,
 ) -> std::io::Result<Option<Vec<u8>>> {
-    let len: usize = match reader.read_varint_async().await {
+    let len: u64 = match reader.read_varint_async().await {
         Ok(len) => len,
         Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
         Err(e) => return Err(e),
     };
+    const MAX_FRAME_LEN: u64 = 8 * 1024 * 1024 * 1024; // 8 GiB
+    if len > MAX_FRAME_LEN {
+        return Err(std::io::Error::new(
+            std::io::ErrorKind::InvalidData,
+            format!("frame too large: {len} > {MAX_FRAME_LEN}"),
+        ));
+    }
+    let len_usize = usize::try_from(len).map_err(|_| {
+        std::io::Error::new(std::io::ErrorKind::InvalidData, "frame len overflows usize")
+    })?;
-    let mut bytes = vec![0; len];
-    let n = reader.read_exact(&mut bytes[..]).await?;
-    if n == len {
-        Ok(Some(bytes))
-    } else {
-        Err(std::io::Error::new(
-            std::io::ErrorKind::UnexpectedEof,
-            format!("{len} expected, {n} read"),
-        ))
-    }
+    let mut bytes = vec![0; len_usize];
+    reader.read_exact(&mut bytes[..]).await?;
+    Ok(Some(bytes))
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae2f6e and 1b0d612.

📒 Files selected for processing (3)
  • src/db/car/plain.rs (1 hunks)
  • src/utils/db/car_stream.rs (6 hunks)
  • src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (2)
src/db/car/plain.rs (7)
  • read_v2_header (317-340)
  • reader (321-321)
  • read_v1_header (349-363)
  • new (126-176)
  • new (442-444)
  • metadata (178-190)
  • version (202-204)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: All lint checks
  • GitHub Check: Deploy to Cloudflare Pages
  • GitHub Check: Build forest binaries on Linux AMD64
🔇 Additional comments (7)
src/utils/io/mod.rs (1)

47-60: LGTM! Error handling properly addresses past review concerns.

The implementation captures the number of bytes actually skipped and validates that it matches the expected count, returning a descriptive error on short reads rather than failing silently at EOF.
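A minimal sketch of a helper with these semantics, built only on standard tokio APIs (the actual src/utils/io/mod.rs implementation may differ in structure and error wording):

use tokio::io::{AsyncRead, AsyncReadExt};

pub async fn skip_bytes<T: AsyncRead + Unpin>(mut reader: T, n: u64) -> std::io::Result<T> {
    // `take(n)` caps reads at n bytes; copying into a sink discards them.
    let skipped = tokio::io::copy(&mut (&mut reader).take(n), &mut tokio::io::sink()).await?;
    if skipped == n {
        Ok(reader)
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::UnexpectedEof,
            format!("expected to skip {n} bytes, but only {skipped} were available"),
        ))
    }
}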

src/utils/db/car_stream.rs (6)

4-28: LGTM!

Import additions support the new path-based constructor and frame reading functionality.


120-122: LGTM!

Documentation accurately describes the automatic skipping behavior for metadata and F3 blocks per FRC-0108.


171-176: LGTM!

Safe conversion of data_offset using try_from prevents negative values from wrapping to huge u64s.


286-293: LGTM!

Convenient path-based constructor simplifies file-based CarStream creation.


356-373: LGTM!

Refactored read_v1_header cleanly uses the new read_frame helper and provides clear error messages.


395-402: LGTM once read_frame is fixed.

Clean wrapper that will work correctly once the underlying read_frame issues are resolved.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/utils/db/car_stream.rs (1)

181-184: Validate data_size before casting to u64.

The unsafe cast h.data_size as u64 on line 183 will produce a huge value if data_size is negative (e.g., -1 becomes u64::MAX), allowing Take to read far beyond intended boundaries.

Apply this diff to validate before casting:

-        let max_car_v1_bytes = header_v2
-            .as_ref()
-            .map(|h| h.data_size as u64)
-            .unwrap_or(u64::MAX);
+        let max_car_v1_bytes = if let Some(h) = header_v2.as_ref() {
+            u64::try_from(h.data_size).map_err(std::io::Error::other)?
+        } else {
+            u64::MAX
+        };
🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)

384-408: Consider simplifying redundant read_exact validation.

The check on lines 400-406 is redundant since read_exact guarantees to read exactly buf.len() bytes or return an error. The function could be simplified to:

     let mut bytes = vec![0; len];
-    let n = reader.read_exact(&mut bytes[..]).await?;
-    if n == len {
-        Ok(Some(bytes))
-    } else {
-        Err(std::io::Error::new(
-            std::io::ErrorKind::UnexpectedEof,
-            format!("{len} expected, {n} read"),
-        ))
-    }
+    reader.read_exact(&mut bytes[..]).await?;
+    Ok(Some(bytes))

Note: The current code is not incorrect, just more verbose than necessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0d612 and 52f8905.

📒 Files selected for processing (3)
  • src/db/car/plain.rs (1 hunks)
  • src/utils/db/car_stream.rs (5 hunks)
  • src/utils/io/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/db/car/plain.rs
🧰 Additional context used
🧬 Code graph analysis (2)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (7)
  • read_v2_header (315-338)
  • reader (319-319)
  • read_v1_header (347-361)
  • new (126-174)
  • new (440-442)
  • metadata (176-188)
  • version (200-202)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
src/utils/io/mod.rs (2)
src/db/car/plain.rs (3)
  • reader (319-319)
  • new (126-174)
  • new (440-442)
src/utils/net.rs (1)
  • reader (67-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Ubuntu
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Deploy to Cloudflare Pages
  • GitHub Check: All lint checks
🔇 Additional comments (6)
src/utils/io/mod.rs (1)

47-60: LGTM! Skip semantics correctly enforced.

The function properly validates that exactly n bytes are skipped and returns a descriptive error when encountering EOF early. The implementation addresses previous review concerns.

src/utils/db/car_stream.rs (5)

172-179: Good use of try_from for offset validation.

The code correctly validates that data_offset is non-negative before skipping bytes, preventing wraparound from negative i64 values.


198-223: Approve metadata and F3 block skipping logic.

The implementation correctly:

  • Detects metadata blocks by comparing CID and root count
  • Validates F3 frame length against MAX_FRAME_LEN before skipping
  • Uses skip_bytes which will error on short reads
  • Skips the metadata block from the stream when detected

295-302: LGTM! Convenient path-based constructor.

The helper simplifies common usage by handling file opening and buffering internally.


365-382: LGTM! Clear header parsing with proper validation.

The function correctly reads the header frame, decodes it with fallback support, and validates the version.


410-417: LGTM! Clean helper for reading CAR blocks.

The function properly chains frame reading and block parsing with appropriate error handling.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/utils/db/car_stream.rs (2)

32-33: Consider 32-bit portability for MAX_FRAME_LEN.

The constant MAX_FRAME_LEN is defined as 8 GiB, which exceeds usize::MAX on 32-bit systems (4 GiB), causing a compile-time overflow. If 32-bit support is not required, this is acceptable. Otherwise, consider using conditional compilation or a u64 constant with runtime checks when converting to usize.

Example with conditional compilation:

#[cfg(target_pointer_width = "64")]
const MAX_FRAME_LEN: usize = 8 * 1024 * 1024 * 1024; // 8 GiB

#[cfg(target_pointer_width = "32")]
const MAX_FRAME_LEN: usize = 2 * 1024 * 1024 * 1024; // 2 GiB

385-409: Consider removing redundant length check.

The function correctly implements frame reading with proper bounds checking (line 389). However, the check at lines 401-407 is redundant because read_exact guarantees to either read exactly buf.len() bytes or return an error—it never returns Ok(n) where n < buf.len().

You can simplify the code by removing the redundant check:

     let mut bytes = vec![0; len];
-    let n = reader.read_exact(&mut bytes[..]).await?;
-    if n == len {
-        Ok(Some(bytes))
-    } else {
-        Err(std::io::Error::new(
-            std::io::ErrorKind::UnexpectedEof,
-            format!("{len} expected, {n} read"),
-        ))
-    }
+    reader.read_exact(&mut bytes[..]).await?;
+    Ok(Some(bytes))

Note: This was discussed in the previous review and acknowledged as a minor cleanup opportunity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52f8905 and d4c8ce4.

📒 Files selected for processing (3)
  • src/db/car/plain.rs (1 hunks)
  • src/utils/db/car_stream.rs (5 hunks)
  • src/utils/io/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/db/car/plain.rs
  • src/utils/io/mod.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (7)
  • read_v2_header (315-338)
  • reader (319-319)
  • read_v1_header (347-361)
  • new (126-174)
  • new (440-442)
  • metadata (176-188)
  • version (200-202)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
🔇 Additional comments (6)
src/utils/db/car_stream.rs (6)

174-186: LGTM! Header offset/size validation properly implemented.

The code now correctly validates that data_offset and data_size from the CARv2 header are non-negative before casting to u64, preventing potential corruption from negative values. This addresses the critical issue from the previous review.


199-224: LGTM! Metadata and F3 block skipping logic correctly implemented.

The code properly:

  1. Identifies V2 snapshots by checking root count
  2. Verifies the first block matches the expected metadata CID
  3. Deserializes and validates the metadata structure
  4. Bounds-checks the F3 block length (line 208) before skipping - this addresses the major issue from the previous review
  5. Skips both the F3 data block and metadata block from the stream

The logic correctly handles all edge cases and prevents the stream from including optional blocks.


296-303: LGTM! Convenient path-based constructor.

The new_from_path method provides a clean API for the common use case of opening a CAR file from a filesystem path. The implementation properly chains file opening, buffering, and stream initialization.


366-383: LGTM! Clear refactoring of header reading logic.

The refactored read_v1_header properly:

  • Borrows the reader instead of consuming it, allowing continued use
  • Delegates frame reading to the read_frame helper for consistency
  • Provides explicit error messages for missing frames and version mismatches
  • Uses the fallback deserializer for robustness

411-418: LGTM! Clean helper function.

The read_car_block helper provides a concise abstraction for reading and parsing CAR blocks. The use of transpose() elegantly handles the Option<Result<T>> to Result<Option<T>> conversion.


420-424: LGTM! Consistent frame size limits.

Configuring UviBytes::set_max_len(MAX_FRAME_LEN) ensures the codec-level decoder enforces the same frame size limit as the manual checks in read_frame, providing defense in depth.

sudo-shashank
sudo-shashank previously approved these changes Oct 15, 2025
Contributor

@sudo-shashank sudo-shashank left a comment

LGTM

Comment on lines 52 to 54
/// No block validation
#[arg(long)]
no_validation: bool,
Contributor

@sudo-shashank sudo-shashank Oct 15, 2025

nit: will skip_validation be more appropriate here?

Contributor Author

updated

Collaborator

@akaladarshi akaladarshi left a comment

A couple of comments.

Not directly related to these changes, but you could also use usize::try_from(f3_data_len) here as well
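For illustration, a hedged one-liner of that conversion (the error mapping mirrors the io::Error::other pattern already used elsewhere in this PR; the wrapper function is hypothetical):

fn f3_len_to_usize(f3_data_len: u64) -> std::io::Result<usize> {
    // Fails cleanly on 32-bit targets instead of silently truncating.
    usize::try_from(f3_data_len).map_err(std::io::Error::other)
}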

};
let mut bytes = vec![0; len];
let n = reader.read_exact(&mut bytes[..]).await?;
if n == len {
Collaborator

nit: This check is not needed; read_exact already makes sure that we read exactly len bytes

Contributor Author

Fixed

Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
Err(e) => return Err(e),
};
let mut bytes = vec![0; len];
Collaborator

@hanabi1224 Loading 8 GiB directly into memory: could this be an issue?

Contributor Author

@hanabi1224 hanabi1224 Oct 17, 2025

MAX_FRAME_LEN is a sane upper bound to check len against before allocating the memory; in practice, the F3 data frame could grow to 8 GiB after 10+ years (and we skip F3 data directly, so no allocation is performed).
Changed MAX_FRAME_LEN to 512GiB and used a separate MAX_F3_FRAME_LEN (16 GiB) for the F3 frame

Collaborator

(512 MiB)*

} => {
const DB_WRITE_BUFFER_CAPACITY: usize = 10000;

let db_root = if let Some(db) = db {
Collaborator

nit: db_root is slightly confusing; you could change it to db_root_path

Contributor Author

Fixed

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/tool/subcommands/db_cmd.rs (3)

103-109: Avoid double-negative naming for clarity.

Rename no_validation to skip_validation and invert the conditional to if !skip_validation to match the flag name.

-            Self::Import {
-                snapshot_files,
-                chain,
-                db,
-                skip_validation: no_validation,
-            } => {
+            Self::Import {
+                snapshot_files,
+                chain,
+                db,
+                skip_validation,
+            } => {
@@
-                        if !no_validation {
+                        if !skip_validation {
                             b.validate()?;
                         }

111-121: Honor DB engine/config instead of hardcoded defaults.

open_db(..., &Default::default()) ignores user config (engine, options). Load DB config via read_config and pass it to open_db.

Would you confirm the correct config field to pass (e.g., config.store.db or equivalent) and update open_db accordingly?


129-141: Add error context for easier debugging.

Wrap file-specific operations with context (which file failed; which step).

+                use anyhow::Context as _;
@@
-                    let mut car = CarStream::new_from_path(&snap).await?;
+                    let mut car = CarStream::new_from_path(&snap)
+                        .await
+                        .with_context(|| format!("opening snapshot {}", snap.display()))?;
                     while let Some(b) = car.try_next().await? {
@@
-                        db_writer.put_keyed(&b.cid, &b.data)?;
+                        db_writer
+                            .put_keyed(&b.cid, &b.data)
+                            .with_context(|| format!("writing block {} from {}", b.cid, snap.display()))?;
@@
-                        pb.set_message(text);
+                        if (total & 0x3FF) == 0 { // update every 1024 blocks
+                            pb.set_message(text);
+                        }
src/utils/db/car_stream.rs (1)

296-303: Consider exposing a sync-path constructor too (optional).

A new_from_reader for any AsyncBufRead + AsyncSeek would help tests and non-file sources. new_from_path can delegate to it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd6f004 and 46cf6a0.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • src/tool/subcommands/db_cmd.rs (3 hunks)
  • src/utils/db/car_stream.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-14T14:31:20.565Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6161
File: src/tool/subcommands/db_cmd.rs:123-144
Timestamp: 2025-10-14T14:31:20.565Z
Learning: In Rust Forest codebase, `BlockstoreWithWriteBuffer` implements the `Drop` trait (at src/db/blockstore_with_write_buffer.rs) which automatically calls `flush_buffer()` to persist buffered writes when the object goes out of scope. Explicit flush calls are not required before dropping the wrapper.

Applied to files:

  • src/tool/subcommands/db_cmd.rs
🧬 Code graph analysis (2)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (8)
  • read_v2_header (315-338)
  • reader (319-319)
  • try_from (242-244)
  • read_v1_header (347-361)
  • new (126-174)
  • new (440-442)
  • metadata (176-188)
  • version (200-202)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
src/tool/subcommands/db_cmd.rs (4)
src/db/mod.rs (2)
  • db_root (320-322)
  • open_db (324-326)
src/cli_shared/mod.rs (2)
  • read_config (24-41)
  • chain_path (20-22)
src/db/blockstore_with_write_buffer.rs (1)
  • new_with_capacity (38-44)
src/utils/db/car_stream.rs (1)
  • new_from_path (297-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: tests
  • GitHub Check: tests-release
🔇 Additional comments (1)
src/tool/subcommands/db_cmd.rs (1)

142-144: Surface flush errors instead of only warning on Drop.

Drop currently logs a warning on flush failure; the CLI still returns success. Explicitly flush and propagate errors so imports don’t silently “succeed.”

-                drop(db_writer);
+                // Prefer explicit flush here if API exists to propagate errors.
+                // e.g., db_writer.flush()?;
+                drop(db_writer);
                 pb.finish();

To locate a public flush method, run:

@hanabi1224
Contributor Author

Not directly related to these changes, but you could also use usize::try_from(f3_data_len) here as well

@akaladarshi Fixed by 0b1e490

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/utils/db/car_stream.rs (2)

172-179: Header v2 offset/size casts look safe

try_from on data_offset/data_size before skip/limit is correct.

Also applies to: 181-186


32-34: Use u64 for frame lengths end-to-end; avoid 32-bit pitfalls and overflow during varint decode

Decode lengths as u64, bound-check, then try_into usize for allocation; clamp UviBytes max_len when converting. This hardens against 32‑bit overflow and oversized frames.

-// 512MiB
-const MAX_FRAME_LEN: usize = 512 * 1024 * 1024;
+// 512 MiB as u64 (safe on 32-bit and 64-bit)
+const MAX_FRAME_LEN: u64 = 512 * 1024 * 1024;
@@
-async fn read_frame<ReaderT: AsyncRead + Unpin>(
+async fn read_frame<ReaderT: AsyncRead + Unpin>(
     reader: &mut ReaderT,
 ) -> std::io::Result<Option<Vec<u8>>> {
-    let len: usize = match reader.read_varint_async().await {
+    let len: u64 = match reader.read_varint_async().await {
         Ok(len) if len > MAX_FRAME_LEN => {
             return Err(std::io::Error::new(
                 std::io::ErrorKind::InvalidData,
                 format!("frame too large: {len} > {MAX_FRAME_LEN}"),
             ));
         }
         Ok(len) => len,
         Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
         Err(e) => return Err(e),
     };
-    let mut bytes = vec![0; len];
+    let len_usize = usize::try_from(len)
+        .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "frame len overflows usize"))?;
+    let mut bytes = vec![0; len_usize];
     reader.read_exact(&mut bytes[..]).await?;
     Ok(Some(bytes))
 }
@@
 pub fn uvi_bytes() -> UviBytes {
     let mut decoder = UviBytes::default();
-    decoder.set_max_len(MAX_FRAME_LEN);
+    // UviBytes takes usize; clamp on 32-bit.
+    decoder.set_max_len(std::cmp::min(MAX_FRAME_LEN, usize::MAX as u64) as usize);
     decoder
 }

Also applies to: 387-404, 415-418

🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)

199-227: Prefer fallback decoder for metadata CBOR

Use from_slice_with_fallback for FilecoinSnapshotMetadata to be resilient to legacy DAG‑CBOR nuances, mirroring header handling.

-                    && let Ok(metadata) =
-                        fvm_ipld_encoding::from_slice::<FilecoinSnapshotMetadata>(&block.data)
+                    && let Ok(metadata) =
+                        from_slice_with_fallback::<FilecoinSnapshotMetadata>(&block.data)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46cf6a0 and d46d557.

📒 Files selected for processing (1)
  • src/utils/db/car_stream.rs (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (5)
  • read_v2_header (315-338)
  • reader (319-319)
  • new (126-174)
  • new (440-442)
  • metadata (176-188)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Check
  • GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (2)
src/utils/db/car_stream.rs (2)

298-305: Convenience path constructor LGTM

new_from_path is concise and aligns callers on a single API.


120-125: Doc note about skipping metadata/F3 is clear

Good to set expectations for stream consumers.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/db/car_stream.rs (1)

103-111: Enforce data_len when writing frames to prevent truncated blocks

std::io::copy may write fewer bytes than data_len; we should validate to avoid emitting a malformed frame (length prefix won’t match payload).

Apply:

 impl<T: Write> CarBlockWrite for T {
     fn write_car_block(&mut self, cid: Cid, data_len: u64, data: &mut impl Read) -> io::Result<()> {
         let frame_length = cid.encoded_len() as u64 + data_len;
         self.write_all(&frame_length.encode_var_vec())?;
         cid.write_bytes(&mut *self)
             .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
-        std::io::copy(data, self)?;
-        Ok(())
+        let written = std::io::copy(data, self)?;
+        if written == data_len {
+            Ok(())
+        } else {
+            Err(io::Error::new(
+                io::ErrorKind::UnexpectedEof,
+                format!("wrote {written} bytes, expected {data_len} for block {cid}"),
+            ))
+        }
     }
 }
♻️ Duplicate comments (1)
src/utils/db/car_stream.rs (1)

203-209: Fix constant name in error message (MAX_F3_FRAME_LEN vs MAX_FRAME_LEN)

Message references the wrong limit; use MAX_F3_FRAME_LEN for clarity.

-                                format!("f3 block frame length too large: {len} > {MAX_FRAME_LEN}"),
+                                format!("f3 block frame length too large: {len} > {MAX_F3_FRAME_LEN}"),
🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)

168-173: Prefer InvalidData with descriptive message for negative offsets/sizes

map_err(std::io::Error::other) hides context. Emit InvalidData with clear text for data_offset/data_size conversion failures.

-                u64::try_from(header_v2.data_offset).map_err(std::io::Error::other)?,
+                u64::try_from(header_v2.data_offset).map_err(|_| {
+                    io::Error::new(io::ErrorKind::InvalidData, format!("negative data_offset: {}", header_v2.data_offset))
+                })?,
@@
-            .map(|h| u64::try_from(h.data_size).map_err(std::io::Error::other))
+            .map(|h| u64::try_from(h.data_size).map_err(|_| {
+                io::Error::new(io::ErrorKind::InvalidData, format!("negative data_size: {}", h.data_size))
+            }))

Also applies to: 175-181

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d46d557 and 0b1e490.

📒 Files selected for processing (3)
  • src/chain/mod.rs (3 hunks)
  • src/tool/subcommands/archive_cmd.rs (2 hunks)
  • src/utils/db/car_stream.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/tool/subcommands/archive_cmd.rs (1)
src/utils/db/car_stream.rs (1)
  • new_from_path (293-298)
src/chain/mod.rs (1)
src/chain/tests.rs (1)
  • export_v2 (66-66)
src/utils/db/car_stream.rs (3)
src/db/car/plain.rs (7)
  • read_v2_header (315-338)
  • new (126-174)
  • new (440-442)
  • reader (319-319)
  • read_v1_header (347-361)
  • metadata (176-188)
  • version (200-202)
src/utils/io/mod.rs (1)
  • skip_bytes (48-60)
src/utils/encoding/mod.rs (1)
  • from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: All lint checks
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests
  • GitHub Check: tests-release
🔇 Additional comments (6)
src/tool/subcommands/archive_cmd.rs (1)

681-683: Path-based CarStream and u64 length usage are appropriate

CarStream::new_from_path simplifies I/O setup, and passing f3_data_len (u64) to write_car_block aligns with the new API. Looks correct.

Also applies to: 714-719

src/utils/db/car_stream.rs (4)

193-221: Nice: CarStream now transparently skips v2 metadata and optional F3 block

The first-block probing + conditional skip keeps consumers unchanged while honoring FRC‑0108. The MAX_F3_FRAME_LEN guard prevents pathological skips.

Please confirm tests cover:

  • v1 snapshot (no metadata) → first block yielded unchanged
  • v2 snapshot with metadata but no F3 → metadata skipped
  • v2 snapshot with metadata+F3 → both skipped; stream starts at next block

292-299: Convenient constructor

new_from_path is a useful ergonomic addition and matches call sites in archive_cmd.


362-379: Header read path looks solid

read_v1_header via read_frame with size bounds + explicit version check is straightforward and safe.


409-413: Bounded decoder

uvi_bytes with MAX_FRAME_LEN guard is appropriate given normal block sizes.

src/chain/mod.rs (1)

67-74: Verified the type of f3_data_len at the chain/mod.rs call site to ensure it is u64 and not usize.

No remaining issues detected: all write_car_block callers correctly pass u64.

The Seek trait's seek method returns io::Result<u64>, which means all three call sites identified in the scan pass the correct type:

  • src/utils/db/car_stream.rs:62 casts explicitly with as u64
  • src/tool/subcommands/archive_cmd.rs:718 uses f3_data_len from seek() (u64)
  • src/chain/mod.rs:111 uses f3_data_len from seek() (u64)

The trait definition at src/utils/db/car_stream.rs:100 correctly specifies data_len: u64 and all callers comply.

