[Initiative] Repackage existing record files in the Block Stream. #328

Open
8 of 20 tasks
jsync-swirlds opened this issue Oct 31, 2024 · 0 comments
Labels: Epic, Phase 1, Phase 2, Phase 3

jsync-swirlds commented Oct 31, 2024

Epic Goal

Repackage record file data into a series of uncompressed ZIP files, each containing ZStd-compressed block files. Each block contains a single record file, its sidecars, and all of its signature files. These block files must be stored in cloud storage buckets, and Mirror Node must be able to read the contained data and verify its contents.

Required Actions

Create a tool to convert existing record files to block files

  • Read a record file
  • (Optional) Split the record file into one entry per round (may not be technically possible).
  • Read and verify all available signature files
  • Wrap the record file data and all signatures in a RecordFile BlockItem.
  • Wrap the BlockItem in a Block.
  • Write the Block to disk, using ZStd compression
  • Verify the written block
  • Repeat for all record files (roughly 160 million)
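
The per-file steps above can be sketched as a write-then-verify round trip. This is a minimal illustration under stated assumptions, not the conversion tool itself: `wrap_record_file` is a hypothetical length-prefixed container standing in for the RecordFile BlockItem and Block wrappers (their real schema is not described in this issue), the `.blk` extension is made up, and `zlib` stands in for ZStd only because it ships with the Python standard library.

```python
import hashlib
import struct
import zlib
from pathlib import Path


def wrap_record_file(record: bytes, signatures: list[bytes]) -> bytes:
    """Hypothetical container: length-prefixed record file, then each signature.

    This is NOT the real Block/BlockItem encoding; it only keeps the
    example self-contained.
    """
    parts = [struct.pack(">I", len(record)), record]
    for sig in signatures:
        parts.append(struct.pack(">I", len(sig)))
        parts.append(sig)
    return b"".join(parts)


def write_and_verify(block: bytes, path: Path) -> str:
    """Compress a block, write it to disk, then read it back and compare digests."""
    expected = hashlib.sha384(block).hexdigest()
    path.write_bytes(zlib.compress(block))  # stand-in for ZStd compression
    restored = zlib.decompress(path.read_bytes())
    actual = hashlib.sha384(restored).hexdigest()
    if actual != expected:
        raise ValueError(f"verification failed for {path}")
    return actual


def convert_one(record: bytes, signatures: list[bytes], out_dir: Path, name: str) -> Path:
    """Wrap one record file with its signatures, write it, and verify the result."""
    block = wrap_record_file(record, signatures)
    out_path = out_dir / f"{name}.blk"  # hypothetical file extension
    write_and_verify(block, out_path)
    return out_path
```

Verifying by re-reading the file from disk, rather than checking the in-memory buffer, is what catches truncated or corrupted writes. Signature verification against the record file is a separate step and is not shown here.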

Run the conversion tool on groups of record files.

  • Launch a VM in GCP or AWS local to the record file archive(s).
  • Run the conversion tool on a group of 100,000 record files.
  • Create an uncompressed ZIP file for each group
  • Upload the new ZIP file to a new storage bucket
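
As a rough illustration of the grouping step, the sketch below stores already-compressed block files in a ZIP without recompressing them, and works out how many archives the figures quoted in this issue imply. The function name `zip_group` and the choice to sort entries by file name are assumptions made for the example.

```python
import math
import zipfile
from pathlib import Path

TOTAL_RECORD_FILES = 160_000_000  # rough total quoted in this issue
GROUP_SIZE = 100_000              # record files per group, per this issue

# Roughly how many ZIP archives the full conversion would produce: 1600.
ARCHIVE_COUNT = math.ceil(TOTAL_RECORD_FILES / GROUP_SIZE)


def zip_group(block_paths: list[Path], zip_path: Path) -> None:
    """Store block files in a ZIP using ZIP_STORED, i.e. with no recompression."""
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for block_path in sorted(block_paths):
            archive.write(block_path, arcname=block_path.name)
```

A group of 100,000 entries exceeds the 65,535-entry limit of the classic ZIP format, so these archives need ZIP64 extensions. Python's `zipfile` enables them by default (`allowZip64=True`); other tooling may need this switched on explicitly.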

Work with Mirror Node team to read the ZIP files

  • Read a single ZIP from cloud storage
  • Extract the block files
  • Decompress a single block file using ZStd
  • Create an input stage in the Mirror Node import to unpack the block file to retrieve the original record file, sidecar file(s), and signature files.
  • Verify that the existing mirror node import process can consume the data extracted from the "Block" file.
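
The read path above could look roughly like the following sketch, which assumes the ZIP has already been downloaded from cloud storage to a local path. `iter_blocks` is a hypothetical helper, and `zlib.decompress` is only a stand-in default for a ZStd decompressor, which would be passed in via the `decompress` parameter. Unpacking the decompressed block into the original record file, sidecars, and signature files depends on the Block schema and is not shown.

```python
import zipfile
import zlib
from pathlib import Path
from typing import Callable, Iterator


def iter_blocks(
    zip_path: Path,
    decompress: Callable[[bytes], bytes] = zlib.decompress,  # stand-in for ZStd
) -> Iterator[tuple[str, bytes]]:
    """Yield (entry name, decompressed block bytes) for each block file in the ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Entries are stored uncompressed, so read() returns the
            # compressed block bytes exactly as they were written.
            yield info.filename, decompress(archive.read(info))
```

Because the ZIP entries are stored rather than deflated, a reader can also fetch a single block by byte range without unpacking the whole archive, although this sketch reads entries sequentially.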

jsync-swirlds added the Phase 1, Phase 2, and Phase 3 labels on Feb 3, 2025
jsync-swirlds changed the title from "EPIC: Repackage existing record files in the Block Stream." to "[Initiative] Repackage existing record files in the Block Stream." on Feb 25, 2025