feat: forest-tool db import
#6161
base: main
Conversation
Walkthrough
Generalizes export_v2 to accept a generic Seek+Read f3 source and options; adds a forest-tool db import subcommand.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant User
participant CLI as forest-tool (DBCommands)
participant FS as Filesystem
participant CS as CarStream
participant Val as Validator
participant DB as Blockstore
User->>CLI: db import --chain <net> --db <path> [--no-validation] <CAR...>
CLI->>CLI: resolve DB root & open Blockstore
loop for each snapshot file
CLI->>CS: CarStream::new_from_path(path)
CS->>FS: open file, read header_v1/v2, apply MAX_FRAME_LEN, skip F3 metadata if present
CS-->>CLI: stream of (cid, bytes)
loop stream blocks
alt validation enabled
CLI->>Val: validate (cid, bytes)
Val-->>CLI: ok / error
end
CLI->>DB: put(cid, bytes)
CLI->>CLI: update progress
end
end
CLI->>DB: flush/close
CLI-->>User: Import completed
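A hypothetical invocation matching the flow above (the chain name and paths are placeholders, not taken from the PR):

forest-tool db import --chain calibnet --db /path/to/db snapshot-1.car snapshot-2.car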
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🔇 Additional comments (14)
Force-pushed 23f511e to aa6325c
Force-pushed aa6325c to 0870600
Force-pushed 0870600 to 725317f
Force-pushed b9a67bf to c3b4df7
Force-pushed c3b4df7 to 9bf03ba
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
- CHANGELOG.md (1 hunks)
- Cargo.toml (1 hunks)
- docs/docs/users/reference/cli.sh (1 hunks)
- src/chain/mod.rs (2 hunks)
- src/chain/tests.rs (2 hunks)
- src/daemon/bundle.rs (1 hunks)
- src/daemon/db_util.rs (1 hunks)
- src/db/car/mod.rs (1 hunks)
- src/rpc/methods/chain.rs (1 hunks)
- src/tool/subcommands/archive_cmd.rs (1 hunks)
- src/tool/subcommands/db_cmd.rs (3 hunks)
- src/utils/db/car_stream.rs (6 hunks)
- src/utils/db/car_stream/tests.rs (1 hunks)
- src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/daemon/db_util.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/utils/io/mod.rs (1)
- src/utils/net.rs (1): reader (67-112)
src/rpc/methods/chain.rs (2)
- src/chain/mod.rs (1): export_v2 (67-129)
- src/chain/tests.rs (1): export_v2 (66-66)
src/tool/subcommands/db_cmd.rs (3)
- src/db/mod.rs (2): db_root (320-322), open_db (324-326)
- src/db/blockstore_with_write_buffer.rs (1): new_with_capacity (38-44)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/utils/db/car_stream.rs (2)
- src/db/car/plain.rs (7): read_v2_header (314-337), reader (318-318), read_v1_header (346-360), new (126-173), new (439-441), metadata (175-187), version (199-201)
- src/utils/io/mod.rs (1): skip_bytes (48-53)
src/daemon/bundle.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/tool/subcommands/archive_cmd.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/chain/mod.rs (1)
- src/chain/tests.rs (1): export_v2 (66-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: All lint checks
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (7)
CHANGELOG.md (1)
32-32: Changelog entry looks good. Accurately describes the new subcommand and links the PR.
docs/docs/users/reference/cli.sh (1)
133-133: Docs generator updated for db import. Good addition; help output will be captured consistently with other subcommands.
src/daemon/bundle.rs (1)
51-51: Use of CarStream::new_from_path simplifies I/O. Cleaner and async-friendly open; no behavior change.
src/db/car/mod.rs (1)
36-36: Exposing V2_SNAPSHOT_ROOT_COUNT is reasonable. The public constant matches FRC-0108 assumptions and can aid callers.
src/tool/subcommands/archive_cmd.rs (1)
681-681: Path-based CarStream construction LGTM. Consistent with the new helper; reduces boilerplate.
src/rpc/methods/chain.rs (1)
451-458: export_v2 call sites correctly updated. All instances now use the updated <D, F> signature with matching parameters across chain.rs, chain/tests.rs, and utils/db/car_stream/tests.rs.
Cargo.toml (1)
111-111: Confirm integer-encoding v4.0 feature and review breaking changes.
- The tokio_async feature exists in v4.0.0.
- No changelog was retrievable via API; please review the crate's 3.x→4.0 diff or release notes for any breaking changes.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/daemon/db_util.rs (2)
151-173: Fix URL detection: local paths with a colon are treated as URLs (breaks imports). Using Url::parse(from_path.display().to_string()) will classify Windows paths (e.g., C:\file.car) and POSIX filenames containing ':' as URLs. Limit to http/https schemes; otherwise treat as a local file (a small demo follows this comment block).
Apply this diff to make scheme checks explicit and fall back to local-file handling:
     let downloaded_car_temp_path = new_forest_car_temp_path_in(forest_car_db_dir)?;
-    if let Ok(url) = Url::parse(&from_path.display().to_string()) {
-        download_to(
-            &url,
-            &downloaded_car_temp_path,
-            DownloadFileOption::Resumable,
-            snapshot_progress_tracker.create_callback(),
-        )
-        .await?;
-
-        snapshot_progress_tracker.completed();
-    } else {
-        snapshot_progress_tracker.not_required();
-        if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
-            move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
-        } else {
-            // For a local snapshot, we transcode directly instead of copying & transcoding.
-            transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
-            if mode == ImportMode::Move {
-                std::fs::remove_file(from_path).context("Error removing original file")?;
-            }
-        }
-    }
+    let path_str = from_path.to_string_lossy();
+    if let Ok(url) = Url::parse(&path_str) {
+        if url.scheme() == "http" || url.scheme() == "https" {
+            download_to(
+                &url,
+                &downloaded_car_temp_path,
+                DownloadFileOption::Resumable,
+                snapshot_progress_tracker.create_callback(),
+            )
+            .await?;
+            snapshot_progress_tracker.completed();
+        } else {
+            snapshot_progress_tracker.not_required();
+            if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
+                move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
+            } else {
+                // For a local snapshot, we transcode directly instead of copying & transcoding.
+                transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
+                if mode == ImportMode::Move {
+                    std::fs::remove_file(from_path).context("Error removing original file")?;
+                }
+            }
+        }
+    } else {
+        snapshot_progress_tracker.not_required();
+        if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
+            move_or_copy_file(from_path, &downloaded_car_temp_path, mode)?;
+        } else {
+            // For a local snapshot, we transcode directly instead of copying & transcoding.
+            transcode_into_forest_car(from_path, &downloaded_car_temp_path).await?;
+            if mode == ImportMode::Move {
+                std::fs::remove_file(from_path).context("Error removing original file")?;
+            }
+        }
+    }
190-211: Same scheme check needed in Auto mode. Auto mode uses Url::parse(...).is_ok(), which has the same misclassification problem. Check for http/https schemes only.
-            ImportMode::Auto => {
-                if Url::parse(&from_path.display().to_string()).is_ok() {
+            ImportMode::Auto => {
+                let path_str = from_path.to_string_lossy();
+                if Url::parse(&path_str)
+                    .ok()
+                    .map(|u| u.scheme() == "http" || u.scheme() == "https")
+                    .unwrap_or(false)
+                {
                     // Fallback to move if from_path is url
                     move_or_copy(ImportMode::Move).await?;
                 } else if ForestCar::is_valid(&EitherMmapOrRandomAccessFile::open(from_path)?) {
src/chain/mod.rs (1)
106-117: Potential overflow/truncation when casting f3_data_len (u64 → usize). On 32-bit platforms or for very large F3 data, casting u64 to usize can truncate and corrupt the frame length. Guard before casting.
Apply this diff:
-        if let Some((f3_cid, mut f3_data)) = f3 {
-            let f3_data_len = f3_data.seek(SeekFrom::End(0))?;
-            f3_data.seek(SeekFrom::Start(0))?;
+        if let Some((f3_cid, mut f3_data)) = f3 {
+            let f3_data_len = f3_data.seek(SeekFrom::End(0))?;
+            if f3_data_len > usize::MAX as u64 {
+                anyhow::bail!(
+                    "f3 data too large to encode on this platform: {} bytes",
+                    f3_data_len
+                );
+            }
+            let f3_data_len = f3_data_len as usize;
+            f3_data.seek(SeekFrom::Start(0))?;
             prefix_data_frames.push({
                 let mut encoder =
                     forest::new_encoder(forest::DEFAULT_FOREST_CAR_COMPRESSION_LEVEL)?;
-                encoder.write_car_block(f3_cid, f3_data_len as _, &mut f3_data)?;
+                encoder.write_car_block(f3_cid, f3_data_len, &mut f3_data)?;
                 anyhow::Ok((
                     vec![f3_cid],
                     finalize_frame(forest::DEFAULT_FOREST_CAR_COMPRESSION_LEVEL, &mut encoder)?,
                 ))
             });
         }
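A tiny standalone demo of the misclassification described above, using the url crate; the file names are hypothetical:

use url::Url;

fn main() {
    // Any "scheme:"-prefixed string parses as a URL, even a local file name.
    let parsed = Url::parse("my:snapshot.car").unwrap();
    assert_eq!(parsed.scheme(), "my"); // parses fine, but is not http/https

    // Plain absolute paths fail to parse, so they fall through correctly.
    assert!(Url::parse("/tmp/snapshot.car").is_err());
}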
🧹 Nitpick comments (3)
src/tool/subcommands/db_cmd.rs (2)
111-121: Avoid shadowing db_root; improve naming. The variable db_root shadows the imported function db_root, reducing readability.
-            let db_root = if let Some(db) = db {
+            let db_root_path = if let Some(db) = db {
                 db
             } else {
                 let (_, config) = read_config(None, Some(chain.clone()))?;
                 db_root(&chain_path(&config))?
             };
-            println!("Opening parity-db at {}", db_root.display());
+            println!("Opening parity-db at {}", db_root_path.display());
             let db_writer = BlockstoreWithWriteBuffer::new_with_capacity(
-                open_db(db_root, &Default::default())?,
+                open_db(db_root_path, &Default::default())?,
                 DB_WRITE_BUFFER_CAPACITY,
             );
129-141: Optional UX: show which file is being imported. The progress message only shows the total; include the current file to aid tracking.
-            for snap in snapshot_files {
-                let mut car = CarStream::new_from_path(&snap).await?;
+            for snap in snapshot_files {
+                let file_display = snap.display().to_string();
+                pb.set_message(format!("Importing {file_display}..."));
+                let mut car = CarStream::new_from_path(&snap).await?;
                 while let Some(b) = car.try_next().await? {
                     if !no_validation {
                         b.validate()?;
                     }
                     db_writer.put_keyed(&b.cid, &b.data)?;
                     total += 1;
-                    let text = format!("{total} blocks imported");
-                    pb.set_message(text);
+                    pb.set_message(format!("{total} blocks imported ({file_display})"));
                 }
             }
src/utils/db/car_stream.rs (1)
352-369: Optional: bound header frame size to prevent OOM. The header should be tiny. Consider rejecting header frames above a small cap (e.g., 1–4 MiB) instead of 8 GiB to harden against malformed inputs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
- CHANGELOG.md (1 hunks)
- Cargo.toml (1 hunks)
- docs/docs/users/reference/cli.sh (1 hunks)
- src/chain/mod.rs (2 hunks)
- src/chain/tests.rs (2 hunks)
- src/daemon/bundle.rs (1 hunks)
- src/daemon/db_util.rs (1 hunks)
- src/db/car/mod.rs (1 hunks)
- src/rpc/methods/chain.rs (1 hunks)
- src/tool/subcommands/archive_cmd.rs (1 hunks)
- src/tool/subcommands/db_cmd.rs (3 hunks)
- src/utils/db/car_stream.rs (6 hunks)
- src/utils/db/car_stream/tests.rs (1 hunks)
- src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/daemon/bundle.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/daemon/db_util.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/tool/subcommands/archive_cmd.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
src/chain/mod.rs (1)
- src/chain/tests.rs (1): export_v2 (66-66)
src/rpc/methods/chain.rs (2)
- src/chain/mod.rs (1): export_v2 (67-129)
- src/chain/tests.rs (1): export_v2 (66-66)
src/utils/io/mod.rs (1)
- src/utils/net.rs (1): reader (67-112)
src/utils/db/car_stream.rs (2)
- src/db/car/plain.rs (7): read_v2_header (314-337), reader (318-318), read_v1_header (346-360), new (126-173), new (439-441), metadata (175-187), version (199-201)
- src/utils/io/mod.rs (1): skip_bytes (48-53)
src/tool/subcommands/db_cmd.rs (4)
- src/db/mod.rs (2): db_root (320-322), open_db (324-326)
- src/cli_shared/mod.rs (2): read_config (24-41), chain_path (20-22)
- src/db/blockstore_with_write_buffer.rs (1): new_with_capacity (38-44)
- src/utils/db/car_stream.rs (1): new_from_path (283-288)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: All lint checks
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Check
- GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (5)
src/daemon/db_util.rs (1)
308-316: Switch to CarStream::new_from_path looks good. Cleaner and consistent with the new constructor; no behavior change in the transcode path.
src/utils/db/car_stream/tests.rs (1)
127-144: Great parity check with optional F3 snapshot. Validates that F3 data doesn't affect the block set; this directly tests the intended skipping behavior.
src/daemon/bundle.rs (1)
51-55: Switch to CarStream::new_from_path is clean. Simplifies file handling while preserving behavior.
src/tool/subcommands/db_cmd.rs (1)
142-144: Confirmed buffer flush on drop: BlockstoreWithWriteBuffer's Drop impl calls flush_buffer(), so drop(db_writer) already flushes all writes (a minimal sketch of this pattern follows this list).
src/chain/mod.rs (1)
67-74: Signature generalization looks good. The generic F: Seek + Read and the options parameter make the API more flexible. Call sites appear adjusted.
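A minimal sketch of the flush-on-drop pattern referenced above, assuming a simple in-memory buffer; this is illustrative, not Forest's actual BlockstoreWithWriteBuffer:

// Hypothetical write-buffering wrapper that flushes pending writes on drop.
struct BufferedWriter {
    buffer: Vec<(String, Vec<u8>)>, // pending (key, value) writes
}

impl BufferedWriter {
    fn flush_buffer(&mut self) {
        for (key, value) in self.buffer.drain(..) {
            // persist to the underlying store here
            println!("flushing {key} ({} bytes)", value.len());
        }
    }
}

impl Drop for BufferedWriter {
    fn drop(&mut self) {
        // Errors can only be logged here, not propagated; callers that need
        // to observe failures should flush explicitly before dropping.
        self.flush_buffer();
    }
}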
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/utils/db/car_stream.rs (3)
178-182: Apply safe conversion to the data_size cast. Line 180 still uses an unsafe cast that was flagged in previous reviews:
.map(|h| h.data_size as u64)
A negative data_size will wrap to a huge u64 value, corrupting max_car_v1_bytes and the Take limit. This is inconsistent with the safe data_offset handling at line 173. Apply this diff:
- let max_car_v1_bytes = header_v2
-     .as_ref()
-     .map(|h| h.data_size as u64)
-     .unwrap_or(u64::MAX);
+ let max_car_v1_bytes = if let Some(h) = header_v2.as_ref() {
+     u64::try_from(h.data_size).map_err(std::io::Error::other)?
+ } else {
+     u64::MAX
+ };
195-214: Validate F3 block length and avoid the unsafe cast. Lines 203-204 read the F3 block length into usize and cast to u64 without validation:
let len: usize = reader.read_varint_async().await?;
reader = skip_bytes(reader, len as _).await?;
Issues:
- On 32-bit systems, usize can overflow when reading a large varint
- No upper bound check allows pathological frame sizes
- The as _ cast hides potential truncation
As flagged in previous reviews, this should read as u64 and validate against MAX_FRAME_LEN. Apply this diff:
  // Skip the F3 block in the block stream
  if metadata.f3_data.is_some() {
-     let len: usize = reader.read_varint_async().await?;
-     reader = skip_bytes(reader, len as _).await?;
+     const MAX_FRAME_LEN: u64 = 8 * 1024 * 1024 * 1024; // 8 GiB
+     let len: u64 = reader.read_varint_async().await?;
+     if len > MAX_FRAME_LEN {
+         return Err(io::Error::new(
+             io::ErrorKind::InvalidData,
+             format!("F3 block frame length too large: {len} > {MAX_FRAME_LEN}"),
+         ));
+     }
+     reader = skip_bytes(reader, len).await?;
  }
375-393: Critical: Fix read_exact misuse and add frame size validation. This code has two critical issues flagged in previous reviews:
1. Compile error: Line 384 assigns the read_exact result to n, but tokio::io::AsyncReadExt::read_exact returns Result<()>, not Result<usize>. The check if n == len will not compile.
2. No size validation: Reading directly into usize risks overflow on 32-bit systems and allows pathological frame sizes that could exhaust memory.
Apply this diff (a self-contained sketch of this bounded-frame pattern follows this comment block):
 async fn read_frame<ReaderT: AsyncRead + Unpin>(
     reader: &mut ReaderT,
 ) -> std::io::Result<Option<Vec<u8>>> {
-    let len: usize = match reader.read_varint_async().await {
+    let len: u64 = match reader.read_varint_async().await {
         Ok(len) => len,
         Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
         Err(e) => return Err(e),
     };
+    const MAX_FRAME_LEN: u64 = 8 * 1024 * 1024 * 1024; // 8 GiB
+    if len > MAX_FRAME_LEN {
+        return Err(std::io::Error::new(
+            std::io::ErrorKind::InvalidData,
+            format!("frame too large: {len} > {MAX_FRAME_LEN}"),
+        ));
+    }
+    let len_usize = usize::try_from(len).map_err(|_| {
+        std::io::Error::new(std::io::ErrorKind::InvalidData, "frame len overflows usize")
+    })?;
-    let mut bytes = vec![0; len];
-    let n = reader.read_exact(&mut bytes[..]).await?;
-    if n == len {
-        Ok(Some(bytes))
-    } else {
-        Err(std::io::Error::new(
-            std::io::ErrorKind::UnexpectedEof,
-            format!("{len} expected, {n} read"),
-        ))
-    }
+    let mut bytes = vec![0; len_usize];
+    reader.read_exact(&mut bytes[..]).await?;
+    Ok(Some(bytes))
 }
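A minimal synchronous sketch of the bounded-frame pattern discussed above, with a hand-rolled LEB128 varint decoder; it uses only std and is illustrative, not Forest's actual implementation:

use std::io::{self, Read};

const MAX_FRAME_LEN: u64 = 512 * 1024 * 1024; // 512 MiB upper bound

// Decode an unsigned LEB128 varint (7 data bits per byte, MSB = continue).
fn read_varint(reader: &mut impl Read) -> io::Result<u64> {
    let mut value = 0u64;
    let mut shift = 0;
    loop {
        let mut byte = [0u8; 1];
        reader.read_exact(&mut byte)?;
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            return Ok(value);
        }
        shift += 7;
        if shift >= 64 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"));
        }
    }
}

// Read a length-prefixed frame, rejecting oversized lengths before allocating.
fn read_frame(reader: &mut impl Read) -> io::Result<Vec<u8>> {
    let len = read_varint(reader)?;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("frame too large: {len} > {MAX_FRAME_LEN}"),
        ));
    }
    let mut bytes = vec![0; usize::try_from(len).map_err(io::Error::other)?];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}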
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/db/car/plain.rs (1 hunks)
- src/utils/db/car_stream.rs (6 hunks)
- src/utils/io/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (2)
- src/db/car/plain.rs (7): read_v2_header (314-337), reader (318-318), read_v1_header (346-360), new (126-173), new (439-441), metadata (175-187), version (199-201)
- src/utils/io/mod.rs (1): skip_bytes (48-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: All lint checks
- GitHub Check: Deploy to Cloudflare Pages
- GitHub Check: Build forest binaries on Linux AMD64
🔇 Additional comments (7)
src/utils/io/mod.rs (1)
47-60: LGTM! Error handling properly addresses past review concerns. The implementation correctly captures the number of bytes skipped and validates that it matches the expected count, returning a descriptive error on short reads. This properly addresses the concerns raised in previous reviews about silent failures on EOF (a minimal sketch of these semantics follows this list).
src/utils/db/car_stream.rs (6)
4-28: LGTM! Import additions support the new path-based constructor and frame-reading functionality.
120-122: LGTM! Documentation accurately describes the automatic skipping behavior for metadata and F3 blocks per FRC-0108.
171-176: LGTM! Safe conversion of data_offset using try_from prevents negative values from wrapping to huge u64s.
286-293: LGTM! Convenient path-based constructor simplifies file-based CarStream creation.
356-373: LGTM! The refactored read_v1_header cleanly uses the new read_frame helper and provides clear error messages.
395-402: LGTM once read_frame is fixed. Clean wrapper that will work correctly once the underlying read_frame issues are resolved.
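A minimal sketch of the validated-skip semantics described above, assuming a tokio async reader; this is illustrative, not Forest's actual skip_bytes:

use tokio::io::{AsyncRead, AsyncReadExt};

// Skip exactly `n` bytes from an async reader, erroring on early EOF.
async fn skip_bytes<R: AsyncRead + Unpin>(mut reader: R, n: u64) -> std::io::Result<R> {
    // Copy n bytes into a sink; `copy` reports how many bytes actually moved.
    let skipped = tokio::io::copy(&mut (&mut reader).take(n), &mut tokio::io::sink()).await?;
    if skipped == n {
        Ok(reader)
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::UnexpectedEof,
            format!("expected to skip {n} bytes, skipped {skipped}"),
        ))
    }
}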
Force-pushed 1b0d612 to 52f8905
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/utils/db/car_stream.rs (1)
181-184: Validate data_size before casting to u64. The unsafe cast h.data_size as u64 on line 183 will produce a huge value if data_size is negative (e.g., -1 becomes u64::MAX), allowing Take to read far beyond the intended boundaries. Apply this diff to validate before casting:
- let max_car_v1_bytes = header_v2
-     .as_ref()
-     .map(|h| h.data_size as u64)
-     .unwrap_or(u64::MAX);
+ let max_car_v1_bytes = if let Some(h) = header_v2.as_ref() {
+     u64::try_from(h.data_size).map_err(std::io::Error::other)?
+ } else {
+     u64::MAX
+ };
🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)
384-408: Consider simplifying the redundant read_exact validation. The check on lines 400-406 is redundant since read_exact guarantees to read exactly buf.len() bytes or return an error. The function could be simplified to:
  let mut bytes = vec![0; len];
- let n = reader.read_exact(&mut bytes[..]).await?;
- if n == len {
-     Ok(Some(bytes))
- } else {
-     Err(std::io::Error::new(
-         std::io::ErrorKind::UnexpectedEof,
-         format!("{len} expected, {n} read"),
-     ))
- }
+ reader.read_exact(&mut bytes[..]).await?;
+ Ok(Some(bytes))
Note: The current code is not incorrect, just more verbose than necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/db/car/plain.rs (1 hunks)
- src/utils/db/car_stream.rs (5 hunks)
- src/utils/io/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/db/car/plain.rs
🧰 Additional context used
🧬 Code graph analysis (2)
src/utils/db/car_stream.rs (3)
- src/db/car/plain.rs (7): read_v2_header (315-338), reader (319-319), read_v1_header (347-361), new (126-174), new (440-442), metadata (176-188), version (200-202)
- src/utils/io/mod.rs (1): skip_bytes (48-60)
- src/utils/encoding/mod.rs (1): from_slice_with_fallback (18-29)
src/utils/io/mod.rs (2)
- src/db/car/plain.rs (3): reader (319-319), new (126-174), new (440-442)
- src/utils/net.rs (1): reader (67-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Ubuntu
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Deploy to Cloudflare Pages
- GitHub Check: All lint checks
🔇 Additional comments (6)
src/utils/io/mod.rs (1)
47-60: LGTM! Skip semantics correctly enforced. The function properly validates that exactly n bytes are skipped and returns a descriptive error when encountering EOF early. The implementation addresses previous review concerns.
src/utils/db/car_stream.rs (5)
172-179: Good use of try_from for offset validation. The code correctly validates that data_offset is non-negative before skipping bytes, preventing wraparound from negative i64 values.
198-223: Approve metadata and F3 block skipping logic. The implementation correctly:
- Detects metadata blocks by comparing CID and root count
- Validates the F3 frame length against MAX_FRAME_LEN before skipping
- Uses skip_bytes, which will error on short reads
- Skips the metadata block from the stream when detected
295-302: LGTM! Convenient path-based constructor. The helper simplifies common usage by handling file opening and buffering internally.
365-382: LGTM! Clear header parsing with proper validation. The function correctly reads the header frame, decodes it with fallback support, and validates the version.
410-417: LGTM! Clean helper for reading CAR blocks. The function properly chains frame reading and block parsing with appropriate error handling.
Force-pushed 52f8905 to d4c8ce4
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/utils/db/car_stream.rs (2)
32-33: Consider 32-bit portability for MAX_FRAME_LEN. The constant MAX_FRAME_LEN is defined as 8 GiB, which exceeds usize::MAX on 32-bit systems (4 GiB), causing a compile-time overflow. If 32-bit support is not required, this is acceptable. Otherwise, consider using conditional compilation or a u64 constant with runtime checks when converting to usize. Example with conditional compilation:
#[cfg(target_pointer_width = "64")]
const MAX_FRAME_LEN: usize = 8 * 1024 * 1024 * 1024; // 8 GiB
#[cfg(target_pointer_width = "32")]
const MAX_FRAME_LEN: usize = 2 * 1024 * 1024 * 1024; // 2 GiB
385-409: Consider removing the redundant length check. The function correctly implements frame reading with proper bounds checking (line 389). However, the check at lines 401-407 is redundant because read_exact guarantees to either read exactly buf.len() bytes or return an error; it never returns Ok(n) where n < buf.len(). You can simplify the code by removing the redundant check:
  let mut bytes = vec![0; len];
- let n = reader.read_exact(&mut bytes[..]).await?;
- if n == len {
-     Ok(Some(bytes))
- } else {
-     Err(std::io::Error::new(
-         std::io::ErrorKind::UnexpectedEof,
-         format!("{len} expected, {n} read"),
-     ))
- }
+ reader.read_exact(&mut bytes[..]).await?;
+ Ok(Some(bytes))
Note: This was discussed in the previous review and acknowledged as a minor cleanup opportunity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/db/car/plain.rs (1 hunks)
- src/utils/db/car_stream.rs (5 hunks)
- src/utils/io/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/db/car/plain.rs
- src/utils/io/mod.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
- src/db/car/plain.rs (7): read_v2_header (315-338), reader (319-319), read_v1_header (347-361), new (126-174), new (440-442), metadata (176-188), version (200-202)
- src/utils/io/mod.rs (1): skip_bytes (48-60)
- src/utils/encoding/mod.rs (1): from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: All lint checks
🔇 Additional comments (6)
src/utils/db/car_stream.rs (6)
174-186: LGTM! Header offset/size validation properly implemented. The code now correctly validates that data_offset and data_size from the CARv2 header are non-negative before casting to u64, preventing potential corruption from negative values. This addresses the critical issue from the previous review.
199-224: LGTM! Metadata and F3 block skipping logic correctly implemented. The code properly:
- Identifies V2 snapshots by checking the root count
- Verifies the first block matches the expected metadata CID
- Deserializes and validates the metadata structure
- Bounds-checks the F3 block length (line 208) before skipping; this addresses the major issue from the previous review
- Skips both the F3 data block and the metadata block from the stream
The logic correctly handles all edge cases and prevents the stream from including optional blocks.
296-303: LGTM! Convenient path-based constructor. The new_from_path method provides a clean API for the common use case of opening a CAR file from a filesystem path. The implementation properly chains file opening, buffering, and stream initialization.
366-383: LGTM! Clear refactoring of header reading logic. The refactored read_v1_header properly:
- Borrows the reader instead of consuming it, allowing continued use
- Delegates frame reading to the read_frame helper for consistency
- Provides explicit error messages for missing frames and version mismatches
- Uses the fallback deserializer for robustness
411-418: LGTM! Clean helper function. The read_car_block helper provides a concise abstraction for reading and parsing CAR blocks. The use of transpose() elegantly handles the Option<Result<T>> → Result<Option<T>> conversion (a tiny example follows this list).
420-424: LGTM! Consistent frame size limits. Configuring UviBytes::set_max_len(MAX_FRAME_LEN) ensures the codec-level decoder enforces the same frame size limit as the manual checks in read_frame, providing defense in depth.
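For readers unfamiliar with it, a tiny standalone illustration of the transpose() conversion mentioned above (std only):

fn main() {
    // Some(Ok(v)) becomes Ok(Some(v)).
    let some_ok: Option<Result<i32, String>> = Some(Ok(5));
    assert_eq!(some_ok.transpose(), Ok(Some(5)));

    // None becomes Ok(None), the "no more frames" case.
    let none: Option<Result<i32, String>> = None;
    assert_eq!(none.transpose(), Ok(None));

    // Some(Err(e)) becomes Err(e), propagating the failure.
    let some_err: Option<Result<i32, String>> = Some(Err("boom".to_string()));
    assert_eq!(some_err.transpose(), Err("boom".to_string()));
}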
LGTM
src/tool/subcommands/db_cmd.rs
Outdated
/// No block validation
#[arg(long)]
no_validation: bool,
nit: will skip_validation be more appropriate here?
updated
Couple of comments. Not directly related to these changes, but you can also use usize::try_from(f3_data_len) here as well.
src/utils/db/car_stream.rs
Outdated
};
let mut bytes = vec![0; len];
let n = reader.read_exact(&mut bytes[..]).await?;
if n == len {
nit: This check is not needed; read_exact already makes sure that we read exactly len bytes.
Fixed
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
Err(e) => return Err(e),
};
let mut bytes = vec![0; len];
@hanabi1224 Loading 8 GB directly into memory, could this be an issue?
MAX_FRAME_LEN is a sane upper bound to check len against before allocating the memory; in practice, the F3 data frame could grow to 8GiB after 10+ years (and we skip the F3 data directly, so no allocation is performed).
Changed MAX_FRAME_LEN to 512GiB and used a separate MAX_F3_FRAME_LEN (16GiB) for the F3 frame.
(512MiB)*
src/tool/subcommands/db_cmd.rs
Outdated
} => {
const DB_WRITE_BUFFER_CAPACITY: usize = 10000;

let db_root = if let Some(db) = db {
nit: db_root is slightly confusing; you can change it to db_root_path.
Fixed
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/tool/subcommands/db_cmd.rs (3)
103-109: Avoid double-negative naming for clarity. Rename no_validation to skip_validation and invert the conditional to if !skip_validation to match the flag name.
- Self::Import {
-     snapshot_files,
-     chain,
-     db,
-     skip_validation: no_validation,
- } => {
+ Self::Import {
+     snapshot_files,
+     chain,
+     db,
+     skip_validation,
+ } => {
@@
- if !no_validation {
+ if !skip_validation {
      b.validate()?;
  }
111-121: Honor the DB engine/config instead of hardcoded defaults. open_db(..., &Default::default()) ignores user config (engine, options). Load the DB config via read_config and pass it to open_db. Would you confirm the correct config field to pass (e.g., config.store.db or equivalent) and update open_db accordingly?
129-141: Add error context for easier debugging. Wrap file-specific operations with context (which file failed; which step).
+ use anyhow::Context as _;
@@
- let mut car = CarStream::new_from_path(&snap).await?;
+ let mut car = CarStream::new_from_path(&snap)
+     .await
+     .with_context(|| format!("opening snapshot {}", snap.display()))?;
  while let Some(b) = car.try_next().await? {
@@
- db_writer.put_keyed(&b.cid, &b.data)?;
+ db_writer
+     .put_keyed(&b.cid, &b.data)
+     .with_context(|| format!("writing block {} from {}", b.cid, snap.display()))?;
@@
- pb.set_message(text);
+ if (total & 0x3FF) == 0 { // update every 1024 blocks
+     pb.set_message(text);
+ }
src/utils/db/car_stream.rs (1)
296-303: Consider exposing a sync-path constructor too (optional). A new_from_reader for any AsyncBufRead + AsyncSeek would help tests and non-file sources; new_from_path can delegate to it (a hypothetical shape follows this list).
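A hypothetical shape for that split; the CarStream type, its new constructor, and the exact trait bounds are assumptions taken from the surrounding review context, not the PR's actual code:

use std::{io, path::Path};
use tokio::io::{AsyncBufRead, AsyncSeek, BufReader};

impl<ReaderT: AsyncBufRead + AsyncSeek + Unpin> CarStream<ReaderT> {
    /// Accept any buffered, seekable source (files, Cursor-backed tests, ...).
    pub async fn new_from_reader(reader: ReaderT) -> io::Result<Self> {
        // Delegates to the existing constructor (assumed from the PR).
        Self::new(reader).await
    }
}

impl CarStream<BufReader<tokio::fs::File>> {
    /// Convenience wrapper: open the file, buffer it, then delegate.
    pub async fn new_from_path(path: &Path) -> io::Result<Self> {
        Self::new_from_reader(BufReader::new(tokio::fs::File::open(path).await?)).await
    }
}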
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- CHANGELOG.md (1 hunks)
- src/tool/subcommands/db_cmd.rs (3 hunks)
- src/utils/db/car_stream.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-14T14:31:20.565Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6161
File: src/tool/subcommands/db_cmd.rs:123-144
Timestamp: 2025-10-14T14:31:20.565Z
Learning: In Rust Forest codebase, `BlockstoreWithWriteBuffer` implements the `Drop` trait (at src/db/blockstore_with_write_buffer.rs) which automatically calls `flush_buffer()` to persist buffered writes when the object goes out of scope. Explicit flush calls are not required before dropping the wrapper.
Applied to files:
src/tool/subcommands/db_cmd.rs
🧬 Code graph analysis (2)
src/utils/db/car_stream.rs (3)
- src/db/car/plain.rs (8): read_v2_header (315-338), reader (319-319), try_from (242-244), read_v1_header (347-361), new (126-174), new (440-442), metadata (176-188), version (200-202)
- src/utils/io/mod.rs (1): skip_bytes (48-60)
- src/utils/encoding/mod.rs (1): from_slice_with_fallback (18-29)
src/tool/subcommands/db_cmd.rs (4)
- src/db/mod.rs (2): db_root (320-322), open_db (324-326)
- src/cli_shared/mod.rs (2): read_config (24-41), chain_path (20-22)
- src/db/blockstore_with_write_buffer.rs (1): new_with_capacity (38-44)
- src/utils/db/car_stream.rs (1): new_from_path (297-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: tests
- GitHub Check: tests-release
🔇 Additional comments (1)
src/tool/subcommands/db_cmd.rs (1)
142-144: Surface flush errors instead of only warning on Drop. Drop currently logs a warning on flush failure; the CLI still returns success. Explicitly flush and propagate errors so imports don't silently "succeed."
-            drop(db_writer);
+            // Prefer an explicit flush here if an API exists to propagate errors.
+            // e.g., db_writer.flush()?;
+            drop(db_writer);
             pb.finish();
To locate a public flush method, run:
@akaladarshi Fixed by 0b1e490
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/utils/db/car_stream.rs (2)
172-179: Header v2 offset/size casts look safe. try_from on data_offset/data_size before skip/limit is correct.
Also applies to: 181-186
32-34: Use u64 for frame lengths end-to-end; avoid 32-bit pitfalls and overflow during varint decode. Decode lengths as u64, bound-check, then try_into usize for allocation; clamp UviBytes max_len when converting. This hardens against 32-bit overflow and oversized frames.
-// 512MiB
-const MAX_FRAME_LEN: usize = 512 * 1024 * 1024;
+// 512 MiB as u64 (safe on 32-bit and 64-bit)
+const MAX_FRAME_LEN: u64 = 512 * 1024 * 1024;
@@
 async fn read_frame<ReaderT: AsyncRead + Unpin>(
     reader: &mut ReaderT,
 ) -> std::io::Result<Option<Vec<u8>>> {
-    let len: usize = match reader.read_varint_async().await {
+    let len: u64 = match reader.read_varint_async().await {
         Ok(len) if len > MAX_FRAME_LEN => {
             return Err(std::io::Error::new(
                 std::io::ErrorKind::InvalidData,
                 format!("frame too large: {len} > {MAX_FRAME_LEN}"),
             ));
         }
         Ok(len) => len,
         Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
         Err(e) => return Err(e),
     };
-    let mut bytes = vec![0; len];
+    let len_usize = usize::try_from(len)
+        .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "frame len overflows usize"))?;
+    let mut bytes = vec![0; len_usize];
     reader.read_exact(&mut bytes[..]).await?;
     Ok(Some(bytes))
 }
@@
 pub fn uvi_bytes() -> UviBytes {
     let mut decoder = UviBytes::default();
-    decoder.set_max_len(MAX_FRAME_LEN);
+    // UviBytes takes usize; clamp on 32-bit.
+    decoder.set_max_len(std::cmp::min(MAX_FRAME_LEN, usize::MAX as u64) as usize);
     decoder
 }
Also applies to: 387-404, 415-418
🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)
199-227: Prefer the fallback decoder for metadata CBOR. Use from_slice_with_fallback for FilecoinSnapshotMetadata to be resilient to legacy DAG-CBOR nuances, mirroring the header handling.
-                && let Ok(metadata) =
-                    fvm_ipld_encoding::from_slice::<FilecoinSnapshotMetadata>(&block.data)
+                && let Ok(metadata) =
+                    from_slice_with_fallback::<FilecoinSnapshotMetadata>(&block.data)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/utils/db/car_stream.rs (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/utils/db/car_stream.rs (3)
- src/db/car/plain.rs (5): read_v2_header (315-338), reader (319-319), new (126-174), new (440-442), metadata (176-188)
- src/utils/io/mod.rs (1): skip_bytes (48-60)
- src/utils/encoding/mod.rs (1): from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Check
- GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (2)
src/utils/db/car_stream.rs (2)
298-305: Convenience path constructor LGTM. new_from_path is concise and aligns callers on a single API.
120-125: Doc note about skipping metadata/F3 is clear. Good to set expectations for stream consumers.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/db/car_stream.rs (1)
103-111: Enforce data_len when writing frames to prevent truncated blocks. std::io::copy may write fewer bytes than data_len; we should validate to avoid emitting a malformed frame (the length prefix won't match the payload).
Apply:
 impl<T: Write> CarBlockWrite for T {
     fn write_car_block(&mut self, cid: Cid, data_len: u64, data: &mut impl Read) -> io::Result<()> {
         let frame_length = cid.encoded_len() as u64 + data_len;
         self.write_all(&frame_length.encode_var_vec())?;
         cid.write_bytes(&mut *self)
             .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
-        std::io::copy(data, self)?;
-        Ok(())
+        let written = std::io::copy(data, self)?;
+        if written == data_len {
+            Ok(())
+        } else {
+            Err(io::Error::new(
+                io::ErrorKind::UnexpectedEof,
+                format!("wrote {written} bytes, expected {data_len} for block {cid}"),
+            ))
+        }
     }
 }
♻️ Duplicate comments (1)
src/utils/db/car_stream.rs (1)
203-209: Fix the constant name in the error message (MAX_F3_FRAME_LEN vs MAX_FRAME_LEN). The message references the wrong limit; use MAX_F3_FRAME_LEN for clarity.
- format!("f3 block frame length too large: {len} > {MAX_FRAME_LEN}"),
+ format!("f3 block frame length too large: {len} > {MAX_F3_FRAME_LEN}"),
🧹 Nitpick comments (1)
src/utils/db/car_stream.rs (1)
168-173: Prefer InvalidData with a descriptive message for negative offsets/sizes. map_err(std::io::Error::other) hides context. Emit InvalidData with clear text for data_offset/data_size conversion failures.
- u64::try_from(header_v2.data_offset).map_err(std::io::Error::other)?,
+ u64::try_from(header_v2.data_offset).map_err(|_| {
+     io::Error::new(io::ErrorKind::InvalidData, format!("negative data_offset: {}", header_v2.data_offset))
+ })?,
@@
- .map(|h| u64::try_from(h.data_size).map_err(std::io::Error::other))
+ .map(|h| u64::try_from(h.data_size).map_err(|_| {
+     io::Error::new(io::ErrorKind::InvalidData, format!("negative data_size: {}", h.data_size))
+ }))
Also applies to: 175-181
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/chain/mod.rs (3 hunks)
- src/tool/subcommands/archive_cmd.rs (2 hunks)
- src/utils/db/car_stream.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/tool/subcommands/archive_cmd.rs (1)
- src/utils/db/car_stream.rs (1): new_from_path (293-298)
src/chain/mod.rs (1)
- src/chain/tests.rs (1): export_v2 (66-66)
src/utils/db/car_stream.rs (3)
- src/db/car/plain.rs (7): read_v2_header (315-338), new (126-174), new (440-442), reader (319-319), read_v1_header (347-361), metadata (176-188), version (200-202)
- src/utils/io/mod.rs (1): skip_bytes (48-60)
- src/utils/encoding/mod.rs (1): from_slice_with_fallback (18-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: All lint checks
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
🔇 Additional comments (6)
src/tool/subcommands/archive_cmd.rs (1)
681-683: Path-based CarStream and u64 length usage are appropriate. CarStream::new_from_path simplifies I/O setup, and passing f3_data_len (u64) to write_car_block aligns with the new API. Looks correct.
Also applies to: 714-719
src/utils/db/car_stream.rs (4)
193-221: Nice: CarStream now transparently skips v2 metadata and the optional F3 block. The first-block probing plus conditional skip keeps consumers unchanged while honoring FRC-0108. The MAX_F3_FRAME_LEN guard prevents pathological skips.
Please confirm tests cover:
- v1 snapshot (no metadata) → first block yielded unchanged
- v2 snapshot with metadata but no F3 → metadata skipped
- v2 snapshot with metadata + F3 → both skipped; stream starts at the next block
292-299: Convenient constructor. new_from_path is a useful ergonomic addition and matches call sites in archive_cmd.
362-379: Header read path looks solid. read_v1_header via read_frame with size bounds plus an explicit version check is straightforward and safe.
409-413: Bounded decoder. uvi_bytes with the MAX_FRAME_LEN guard is appropriate given normal block sizes.
src/chain/mod.rs (1)
67-74: I can see all call sites are being reviewed. Let me verify the type of f3_data_len at the chain/mod.rs call site to ensure it's u64 and not usize:
No remaining issues detected; all write_car_block callers correctly pass u64. The Seek trait's seek method returns Result<u64>, which means all three call sites identified in the scan pass the correct type:
- src/utils/db/car_stream.rs:62 casts explicitly with as u64
- src/tool/subcommands/archive_cmd.rs:718 uses f3_data_len from seek() (u64)
- src/chain/mod.rs:111 uses f3_data_len from seek() (u64)
The trait definition at src/utils/db/car_stream.rs:100 correctly specifies data_len: u64 and all callers comply (a tiny standalone check follows this list).
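For reference, a tiny standalone check that Seek::seek indeed yields u64 (std only; the cursor contents are illustrative):

use std::io::{Cursor, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let mut data = Cursor::new(vec![1u8, 2, 3]);
    // seek returns io::Result<u64>, so this length is already the right
    // type for a data_len: u64 parameter.
    let len: u64 = data.seek(SeekFrom::End(0))?;
    assert_eq!(len, 3);
    data.seek(SeekFrom::Start(0))?; // rewind before reading the payload
    Ok(())
}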
Summary of changes
Usage:
Changes introduced in this pull request:
- forest-tool db import subcommand
- CarStream to skip metadata and F3 blocks
Closes #6162
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests