BurnpackStore #3792
Conversation
Replaces candle-core pickle parsing with burn-store's PyTorch reader for improved compatibility and maintainability. Adds tensor snapshot support, updates config and reader modules, and adjusts dependencies. Includes new test files and updates feature flags in Cargo.toml files.
Implemented handling for 'BoolStorage' in the PyTorch pickle reader, allowing boolean tensors to be loaded. Updated the test to enable the boolean tensor test, which was previously ignored due to lack of support.
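For illustration, a dispatch of this kind might look like the following minimal sketch. It is not the actual burn-store code; the `DType` enum is a stand-in, but the storage class names are the ones PyTorch emits in its pickle streams.

```rust
/// Stand-in element-type enum for the example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    F64,
    I64,
    I32,
    I16,
    I8,
    U8,
    Bool,
}

/// Map a PyTorch pickle storage class name to an element type.
/// `BoolStorage` is the newly supported case; booleans are stored
/// one byte per element in the checkpoint.
fn storage_dtype(storage_class: &str) -> Option<DType> {
    Some(match storage_class {
        "FloatStorage" => DType::F32,
        "DoubleStorage" => DType::F64,
        "LongStorage" => DType::I64,
        "IntStorage" => DType::I32,
        "ShortStorage" => DType::I16,
        "CharStorage" => DType::I8,
        "ByteStorage" => DType::U8,
        "BoolStorage" => DType::Bool,
        _ => return None,
    })
}
```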
Introduces extensive tests for the PyTorch file reader covering various tensor types, shapes, edge cases, and nested structures. Updates the pickle_reader to correctly parse int32, int16, and int8 tensor data. Adds a Python script to generate test .pt files and integrates all test data into the repository for robust validation.
This update adds robust handling for legacy PyTorch checkpoint formats (pre-1.6), including sequential pickle streams and embedded storage data. The pickle reader now supports additional opcodes and improved error messages, while the main reader detects legacy formats and extracts tensors with correct storage offsets. New tests and test data files verify compatibility with legacy files, shared storage, and error handling for corrupted files.
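The detection step can be pictured with this small, self-contained sketch (illustrative only, not the reader's actual code): ZIP-based checkpoints (PyTorch >= 1.6) start with the ZIP local-file-header magic, while legacy files begin directly with a pickle stream.

```rust
/// PyTorch >= 1.6 checkpoints are ZIP archives.
fn is_zip_checkpoint(header: &[u8]) -> bool {
    header.starts_with(b"PK\x03\x04")
}

/// Legacy (pre-1.6) checkpoints begin with a raw pickle stream, whose
/// first opcode is PROTO (0x80) followed by the protocol number.
fn looks_like_legacy_pickle(header: &[u8]) -> bool {
    matches!(header, [0x80, proto, ..] if *proto <= 5)
}
```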
Introduces the PytorchMetadata struct and related enums to capture metadata about loaded PyTorch files, including format type, version, byte order, tensor count, and data size. Updates PytorchReader to expose metadata and adds tests to verify metadata extraction for ZIP and legacy formats.
Replaces legacy zip/pickle reading logic in burn-import's config.rs with the new PytorchReader and PickleValue API from burn-store. Adds PickleValue enum and read_pickle_data method to PytorchReader for simplified config extraction. Updates error handling, test coverage, and public API to support reading configuration and metadata from PyTorch files in a more robust and extensible way.
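To make the shape of that API concrete, here is a hypothetical `PickleValue` tree and a helper that walks it to extract a config field. The variant set is an assumption for the example; the real enum in burn-store may differ.

```rust
use std::collections::HashMap;

/// Hypothetical pickle value tree for the example.
#[derive(Debug)]
enum PickleValue {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    List(Vec<PickleValue>),
    Dict(HashMap<String, PickleValue>),
}

/// Follow a chain of dictionary keys (e.g. ["config", "num_layers"])
/// and return the integer at the end, if present.
fn get_int(root: &PickleValue, path: &[&str]) -> Option<i64> {
    let mut node = root;
    for key in path {
        match node {
            PickleValue::Dict(map) => node = map.get(*key)?,
            _ => return None,
        }
    }
    match node {
        PickleValue::Int(v) => Some(*v),
        _ => None,
    }
}
```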
Introduces PytorchReader::load_config for deserializing configuration data from PyTorch files using serde. Refactors config loading in burn-import to use this new API, adds related tests, and updates dependencies to support custom serde features.
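A hedged usage sketch: only the existence of `PytorchReader::load_config` comes from this PR; the constructor, the exact signature, and the config struct below are assumptions for illustration.

```rust
use serde::Deserialize;

/// Hypothetical config shape for the example.
#[derive(Debug, Deserialize)]
struct NetConfig {
    num_layers: usize,
    hidden_size: usize,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed call shape; the real API may take a key selecting the
    // config entry inside the checkpoint.
    let reader = burn_store::PytorchReader::new("model.pt")?;
    let config: NetConfig = reader.load_config()?;
    println!("{config:?}");
    Ok(())
}
```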
Introduces the PytorchStore struct for loading models from PyTorch checkpoint files (.pt/.pth), with support for filtering, remapping, and validation. Adds comprehensive tests for various model types and error handling. Saving to PyTorch format is not yet supported.
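Usage might look roughly like this; treat it as a sketch, since `from_file` is an assumed constructor and `model` stands in for any Burn module (the `load_from` name comes from a later commit in this PR).

```rust
// Hedged sketch: load PyTorch weights into an existing Burn module.
let mut store = PytorchStore::from_file("checkpoint.pth");
model.load_from(&mut store)?; // PyTorchToBurnAdapter is applied automatically
```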
Improves the adapter system to use container type information for correct tensor transformations (e.g., transposing linear weights, renaming normalization parameters) and refactors the Applier to support adapters and provide more detailed error handling. Updates tests and documentation to reflect new module-aware behavior.
Unified the `collect` and `apply` methods to accept an optional `PathFilter` for flexible tensor filtering. Deprecated specialized methods in favor of a single interface, updated all usages and tests, and improved documentation for clarity. This change simplifies the API and enhances consistency across tensor snapshot operations.
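Sketched below with hedging: the optional `PathFilter` parameter is what this commit introduces, but `with_regex` and the exact argument lists are assumptions.

```rust
// Restrict snapshot operations to the encoder's tensors.
let filter = PathFilter::new().with_regex(r"^encoder\.");

// Pass None instead of Some(filter) to collect everything.
let snapshots = module.collect(Some(filter.clone()), None);
module.apply(snapshots, Some(filter), None)?;
```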
The Collector and ModuleSnapshot traits now accept an optional ModuleAdapter to transform tensors during collection. This change centralizes tensor adaptation logic, simplifies usage in SafetensorsStore, and updates all relevant tests and usages to support the new adapter parameter.
Introduces a test for verifying multi-layer neural network loading from SafeTensors format using a PyTorch adapter. The test checks successful parameter loading and validates the model's forward output against expected values.
Enhanced crate-level and module-level documentation for burn-store, detailing key features, usage examples, and configuration options for model storage and PyTorch interoperability. Improves clarity for users integrating Burn with PyTorch and using advanced storage features.
Documentation and examples now clarify that PyTorchToBurnAdapter is applied automatically when loading PyTorch models, handling weight transposition and normalization parameter renaming by default. Code comments and docstrings have been updated for consistency and improved guidance.
Introduces a new benchmark suite for PyTorch model loading in burn-store, including a Python script to generate model files and a Rust benchmark comparing old and new loading methods across multiple backends. Updates Cargo.toml to register the new benchmark.
Introduces the LazyDataSource abstraction to support efficient, on-demand loading of tensor data from PyTorch files, including ZIP archives and legacy multi-storage formats. Refactors pickle_reader and reader modules to utilize lazy loading, reducing memory usage and improving performance for large models.
Removes unused FileSource and related code, improves lazy boundary detection for legacy multi-storage format by tracking storage usage and storage keys, and adds skip_pickle for efficient pickle skipping. Updates pickle_reader to support optional data sources and refactors error handling for tensor data loading. Enhances legacy format metadata extraction and adds a detailed test for legacy metadata correctness.
Introduces a Python script and Rust benchmark for loading and profiling a ResNet18 PyTorch model in burn-store. Refactors lazy loading in LegacyMultiStorageSource to strictly require storage boundaries, removing fallback to full blob loading. Updates lazy data range reading to only read requested tensor ranges. Ensures storage usage is tracked immediately during tensor reconstruction for accurate lazy boundary detection.
Replaces all usages of Param::into_initialized with Param::from_mapped_value across module, quantize, reinit, and optimizer adaptor code. Updates the method name in the Param implementation to improve clarity and consistency in parameter mapping operations.
Renamed Param<Tensor> methods: 'save' to 'transform_for_save' and 'load' to 'transform_for_load' to better reflect their purpose of applying transformations during serialization and deserialization. Updated all usages in burn-core and burn-store accordingly for improved code clarity.
Replaces all usages and documentation of `collect_to` and `apply_from` with the more descriptive `save_into` and `load_from` methods for model serialization and deserialization. Updates all code, tests, examples, and documentation to use the new method names, improving clarity and consistency across the burn-store crate.
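The renamed round trip then reads as follows; `from_file` is an assumed constructor, while the method names and the `.bpk` extension appear elsewhere in this PR.

```rust
// Save a module's tensors, then load them back.
let mut store = BurnpackStore::from_file("model.bpk");
model.save_into(&mut store)?;

let mut store = BurnpackStore::from_file("model.bpk");
model.load_from(&mut store)?;
```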
Replaces all usages and references of the ModuleSnapshoter trait with ModuleStore across the codebase, including trait implementations, imports, and documentation. This change improves naming consistency and clarity for module storage operations.
Renamed the BurnpackHeader::to_bytes method to into_bytes for clarity and consistency with Rust naming conventions. Updated all usages in tests and writer modules accordingly.
Refactored BurnpackWriter to support writing directly into caller-provided buffers via a new write_into() method, and added a size() method to calculate the required buffer size. Updated internal APIs and tests to use the new Bytes type for in-memory storage, improved memory efficiency, and enabled buffer reuse for serialization. Also added comprehensive tests for buffer-based writing and error handling.
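The buffer-reuse pattern this enables can be shown with a stand-in writer that has the same `size()`/`write_into()` shape (this is not the actual `BurnpackWriter`, just a self-contained illustration of the contract):

```rust
/// Stand-in writer illustrating the size()/write_into() contract.
struct TinyWriter<'a> {
    payload: &'a [u8],
}

impl TinyWriter<'_> {
    /// Number of bytes write_into will produce, so callers can size a
    /// buffer once and reuse it across serializations.
    fn size(&self) -> usize {
        self.payload.len()
    }

    fn write_into(&self, buf: &mut [u8]) -> Result<(), &'static str> {
        if buf.len() < self.size() {
            return Err("buffer too small");
        }
        buf[..self.payload.len()].copy_from_slice(self.payload);
        Ok(())
    }
}

fn main() {
    let mut buffer = Vec::new();
    for payload in [&b"first"[..], &b"second, longer payload"[..]] {
        let writer = TinyWriter { payload };
        buffer.resize(writer.size(), 0); // reuse the allocation where possible
        writer.write_into(&mut buffer).unwrap();
        assert_eq!(&buffer[..], payload);
    }
}
```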
Updated all references, documentation, tests, and implementation logic to use the .bpk file extension instead of .burnpack for BurnpackStore files.
Replaces references to `load()` and `save()` with `transform_for_load()` and `transform_for_save()` in the ParamMapper documentation to accurately describe where transformations are applied.
Enhanced the StorageBackend::read_into method to return errors for out-of-bounds and offset overflow conditions, ensuring consistent and safe behavior across backends. Added unit tests to verify error handling for out-of-bounds reads and offset overflows.
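The behavior being tested can be pictured with this self-contained sketch (not the burn-store implementation): the read fails cleanly instead of panicking when the requested range overflows or exceeds the backing data.

```rust
#[derive(Debug, PartialEq)]
enum ReadError {
    OffsetOverflow,
    OutOfBounds { requested_end: usize, len: usize },
}

fn read_into(data: &[u8], offset: usize, buf: &mut [u8]) -> Result<(), ReadError> {
    // checked_add rejects offset + buf.len() wrapping past usize::MAX.
    let end = offset
        .checked_add(buf.len())
        .ok_or(ReadError::OffsetOverflow)?;
    if end > data.len() {
        return Err(ReadError::OutOfBounds { requested_end: end, len: data.len() });
    }
    buf.copy_from_slice(&data[offset..end]);
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn out_of_bounds_read_is_an_error() {
        let mut buf = [0u8; 8];
        assert!(matches!(
            read_into(&[0u8; 4], 0, &mut buf),
            Err(ReadError::OutOfBounds { .. })
        ));
    }

    #[test]
    fn offset_overflow_is_an_error() {
        let mut buf = [0u8; 1];
        assert_eq!(
            read_into(&[0u8; 4], usize::MAX, &mut buf),
            Err(ReadError::OffsetOverflow)
        );
    }
}
```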
Replaces calls to `as_ref()` with `&*bytes` in assertions comparing byte slices in writer tests.
This commit adds overflow checking for metadata size, tensor shape dimensions, and data offsets in BurnpackReader and BurnpackWriter. It also validates tensor data length consistency during writing, ensuring that actual and expected lengths match. These changes improve robustness against corrupted or malformed files and prevent potential panics or undefined behavior due to integer overflows.
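The overflow checks follow the standard checked-arithmetic pattern, roughly like this illustrative sketch (not the actual reader/writer code):

```rust
/// Byte length of a tensor computed with checked multiplication, so
/// corrupted shape metadata cannot overflow into a bogus small value.
fn tensor_byte_len(shape: &[u64], elem_size: u64) -> Option<u64> {
    shape.iter().try_fold(elem_size, |acc, &dim| acc.checked_mul(dim))
}

/// A data range is valid only if start + len neither overflows nor
/// extends past the end of the file.
fn range_is_valid(start: u64, len: u64, file_len: u64) -> bool {
    match start.checked_add(len) {
        Some(end) => end <= file_len,
        None => false, // overflow: reject the file as malformed
    }
}
```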
@laggui, I have addressed all of @nathanielsimard's feedback. The implementation is more robust, and I have fixed potential data corruption issues (stricter validation throughout). One of the biggest improvements is the use of `Bytes`/`[u8]`. I also added TODOs to give us finer control over byte allocation as we transition to the backends' allocator, so that transition should now be easier.
Added implementations of the core::error::Error trait for ApplyError, BurnpackError, and TensorSnapshotError. Also implemented Display for TensorSnapshotError to improve error reporting and compatibility with standard error handling.
Refactored BurnpackReader::get_snapshots to return Result and propagate BurnpackError instead of panicking on corrupted tensor shape or offset data. Updated call sites and added tests to ensure errors are returned for invalid tensor metadata, improving robustness and error handling.
Introduced maximum limits for metadata size, tensor size, tensor count, and CBOR deserialization recursion depth to prevent resource exhaustion and DoS attacks. Updated BurnpackReader to validate these limits during file parsing and tensor access.
Introduces checks in BurnpackReader to ensure the underlying file or buffer is large enough to contain all claimed tensor data, so truncated files are rejected up front. Adds tests to verify correct error handling for truncated files and successful reading when the file size is exactly correct.
Introduces a MAX_FILE_SIZE constant (100 GB) to prevent resource exhaustion from extremely large files. Both mmap and buffered file loading methods in BurnpackReader now validate the file size before proceeding.
Addressed denial-of-service (DoS) vulnerabilities related to resource exhaustion. This matters for production use with untrusted input.
Set MAX_TENSOR_SIZE to 2 GB on 32-bit platforms and 10 GB on 64-bit platforms to prevent memory exhaustion. Also, conditionally define MAX_FILE_SIZE and its usage based on the 'std' feature to improve portability.
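As a sketch, the conditional limits could be written as below; the values come from the commit messages above, while the `u64` typing and the exact `cfg` gates are assumptions.

```rust
/// Cap on a single tensor's byte size: 2 GB on 32-bit targets,
/// 10 GB on 64-bit targets (values per the commit above).
#[cfg(target_pointer_width = "32")]
const MAX_TENSOR_SIZE: u64 = 2 * 1024 * 1024 * 1024;

#[cfg(not(target_pointer_width = "32"))]
const MAX_TENSOR_SIZE: u64 = 10 * 1024 * 1024 * 1024;

/// File-size validation relies on std file APIs, so the 100 GB cap is
/// only defined when the `std` feature is enabled.
#[cfg(feature = "std")]
const MAX_FILE_SIZE: u64 = 100 * 1024 * 1024 * 1024;
```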
Introduces ParamId support in the Burnpack file format for stateful training continuation. Updates the format specification, core types, reader and writer logic, and adds comprehensive tests to ensure ParamId is preserved and backward compatible. Documentation is updated to reflect the new feature.
🔥🔥🔥
This PR improves Burn's DefaultFileRecorder by replacing the inefficient NamedMpkFileRecorder with BurnpackStore and a new Burnpack format, addressing critical performance and compatibility issues.
Problems with the current NamedMpkFileRecorder:
Pull Request Template
Checklist
The `cargo run-checks` command has been executed.
Related Issues/PRs
Depends on #3741 (PyTorch store changes)
Changes
A new native storage format (Burnpack Format) that serves as Burn's improved DefaultFileRecorder (NamedMpkFileRecorder):
Key Improvements:
Technical Changes:
Testing
Benchmarks
LOADING benchmarks (median values): load time, maximum memory allocation during load, and the load time vs. memory trade-off.
SAVING benchmarks (median values): save time, maximum memory allocation during save, and the save time vs. memory trade-off.
Benchmark details
load_report.txt
save_report.txt