WACZ reading / streaming #16

Open
matteocargnelutti opened this issue Mar 6, 2023 · 2 comments

Comments

@matteocargnelutti
Collaborator

(Suggested by @ikreymer)

Add a command and an associated API for reading and streaming the contents of WACZ files, either locally or remotely.

See: https://www.npmjs.com/package/unzipit
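
To illustrate why a ZIP-based format like WACZ lends itself to streaming, here is a minimal, dependency-free sketch of the underlying idea: a ZIP keeps its table of contents (the central directory) at the end of the file, so a reader can list entries and locate one of them without scanning the whole archive. This is not how unzipit or this project is implemented; `listZipEntries` and `readStoredEntry` are hypothetical names, and the sketch assumes the archive is already available as a `Uint8Array`, is not ZIP64, and that the entry being read is stored uncompressed. A remote reader would fetch the same byte ranges with HTTP Range requests instead.

```javascript
// Sketch only: hypothetical helpers, no ZIP64, no decompression.
const EOCD_SIGNATURE = 0x06054b50; // end of central directory record
const CDFH_SIGNATURE = 0x02014b50; // central directory file header
const LFH_SIGNATURE = 0x04034b50; // local file header

// Lists the entries of a ZIP archive by reading its central directory.
function listZipEntries(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The end record is 22 bytes, optionally followed by a comment of up to
  // 65535 bytes, so search backwards from the end of the archive.
  let eocd = -1;
  const lowest = Math.max(0, bytes.byteLength - 22 - 0xffff);
  for (let i = bytes.byteLength - 22; i >= lowest; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("End of central directory record not found");

  const entryCount = view.getUint16(eocd + 10, true);
  let offset = view.getUint32(eocd + 16, true); // start of central directory
  const decoder = new TextDecoder();
  const entries = [];

  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(offset, true) !== CDFH_SIGNATURE) {
      throw new Error("Invalid central directory header");
    }
    const nameLength = view.getUint16(offset + 28, true);
    const extraLength = view.getUint16(offset + 30, true);
    const commentLength = view.getUint16(offset + 32, true);
    entries.push({
      name: decoder.decode(bytes.subarray(offset + 46, offset + 46 + nameLength)),
      method: view.getUint16(offset + 10, true), // 0 = stored, 8 = deflate
      compressedSize: view.getUint32(offset + 20, true),
      uncompressedSize: view.getUint32(offset + 24, true),
      localHeaderOffset: view.getUint32(offset + 42, true),
    });
    offset += 46 + nameLength + extraLength + commentLength;
  }
  return entries;
}

// Returns the raw bytes of an entry that was stored without compression.
function readStoredEntry(bytes, entry) {
  if (entry.method !== 0) {
    throw new Error(`"${entry.name}" is compressed; this sketch only reads stored entries`);
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const header = entry.localHeaderOffset;
  if (view.getUint32(header, true) !== LFH_SIGNATURE) {
    throw new Error("Invalid local file header");
  }
  // The data starts after the fixed 30-byte header, the name and the extra field.
  const nameLength = view.getUint16(header + 26, true);
  const extraLength = view.getUint16(header + 28, true);
  const start = header + 30 + nameLength + extraLength;
  return bytes.subarray(start, start + entry.compressedSize);
}
```

Both functions return views into the original buffer rather than copies, so nothing beyond the requested entry is touched. Deflated entries would additionally need an inflate step, which is left out here.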

@matteocargnelutti
Collaborator Author

(Suggested by @rebeccacremona)

This feature should allow for specialized / simplified extractions such as:

  • Extracting the CDX of a WACZ
  • Extracting the datapackage signature
  • Looking up a specific record in the underlying WARC(s)
  • etc...
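
As a rough illustration of what two of these extractions could look like once the relevant files have been pulled out of the archive, here is a small sketch. It assumes the index is in CDXJ form (one `key timestamp {json}` entry per line, with `url`, `filename`, `offset` and `length` fields in the JSON) and that the signature, when present, sits under `signedData` in `datapackage-digest.json`. The function names `parseCdxjLine`, `findRecords` and `extractSignature` are hypothetical and not part of any existing API.

```javascript
// Sketch only: hypothetical helpers operating on already-extracted text.

// Splits one CDXJ line into its search key, timestamp and JSON fields.
function parseCdxjLine(line) {
  const firstSpace = line.indexOf(" ");
  const secondSpace = line.indexOf(" ", firstSpace + 1);
  if (firstSpace < 0 || secondSpace < 0) {
    throw new Error(`Malformed CDXJ line: ${line}`);
  }
  return {
    key: line.slice(0, firstSpace),
    timestamp: line.slice(firstSpace + 1, secondSpace),
    ...JSON.parse(line.slice(secondSpace + 1)),
  };
}

// Returns every index entry whose "url" field matches exactly.
// Each match says which WARC to open and which byte range to read.
function findRecords(cdxjText, url) {
  return cdxjText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map(parseCdxjLine)
    .filter((record) => record.url === url);
}

// Returns the signature block of a datapackage digest, or null if unsigned.
function extractSignature(digestJsonText) {
  const digest = JSON.parse(digestJsonText);
  return digest.signedData ?? null;
}

// Usage with made-up sample data.
const sampleIndex = [
  'com,example)/ 20230306120000 {"url":"https://example.com/","filename":"data.warc.gz","offset":0,"length":512}',
  'com,example)/about 20230306120005 {"url":"https://example.com/about","filename":"data.warc.gz","offset":512,"length":734}',
].join("\n");

const [match] = findRecords(sampleIndex, "https://example.com/about");
console.log(match.filename, match.offset, match.length);
```

A linear scan is enough for a sketch; since CDXJ files are sorted by key, a real lookup over a large index would more likely use a binary search on the key.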

@ikreymer
Collaborator

ikreymer commented May 4, 2023

I previously prototyped a very simple zip (not WACZ) loader that can stream a file from a zip. I was mostly using it myself, but recently put it up here: https://github.com/ikreymer/loadzip/blob/main/index.js

Agreed WACZ-specific semantics would be very useful to have as well!
