Future codecs #40

Closed
manzt opened this issue Mar 1, 2020 · 9 comments

@manzt
Collaborator

manzt commented Mar 1, 2020

Related to #1. Currently we support zlib and gzip thanks to pako, but we will want to support more codecs in the future (especially BLOSC, if/when that reaches the web). I'm curious what your stance is on bundling more third-party libraries for decoding/encoding chunks.

I'm not sure tree-shaking will work since the decoders needed for an HTTPStore won't be known until runtime. I would be in favor of creating something akin to the numcodecs package used by zarr-python, and then offering an API to register a codec with zarr.js.

import { openArray } from 'zarr';
import { Zlib } from 'zarr-codecs'; // proposed package name

const config = {
    store: "http://localhost:8000/",
    path: "my_data.zarr",
    mode: "r",
    codec: Zlib, // proposed option for supplying a codec
};

const z = await openArray(config);

I think we should include the default codec (zlib/gzip for zarr.js currently, but BLOSC in the future) since that is the most user-friendly option. We could keep track in zarr of which decoders zarr-codecs supports, and give a helpful warning when someone needs to import a codec from zarr-codecs.
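A minimal TypeScript sketch of that idea, assuming a registry that is pre-loaded with the bundled default codecs and knows which extra codec ids live in the separate package. `Codec`, `CodecRegistry`, and `EXTERNAL_CODECS` (including the ids listed in it) are hypothetical names for illustration, not part of zarr.js.

```typescript
// Hypothetical minimal codec shape for this sketch (not zarr.js's actual type).
interface Codec {
  readonly codecId: string;
  decode(data: Uint8Array): Uint8Array;
}

// Assumed list of codec ids that ship in the separate codecs package.
const EXTERNAL_CODECS = new Set(["blosc", "lz4", "zstd"]);

class CodecRegistry {
  private codecs = new Map<string, Codec>();

  // Bundled defaults (e.g. zlib/gzip wrappers around pako) are registered up front.
  constructor(defaults: Codec[] = []) {
    for (const codec of defaults) this.register(codec);
  }

  // Users call this after importing an extra codec themselves.
  register(codec: Codec): void {
    this.codecs.set(codec.codecId, codec);
  }

  get(id: string): Codec {
    const codec = this.codecs.get(id);
    if (codec !== undefined) return codec;
    // Known but not bundled: tell the user where to get it.
    if (EXTERNAL_CODECS.has(id)) {
      throw new Error(
        `Codec "${id}" is not bundled by default; import it from the codecs package and register it.`
      );
    }
    throw new Error(`Unknown codec "${id}".`);
  }
}
```

Distinguishing "known but not bundled" from "unknown" is what allows the more helpful message suggested above.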

@manzt
Collaborator Author

manzt commented Mar 1, 2020

This setup might also expose an API that lets advanced users do more bespoke things with their codecs, like configuring web workers to perform decoding and encoding (#33).

@gzuidhof
Owner

gzuidhof commented Mar 3, 2020

Sorry for the late response, I've been traveling the past weeks.

I agree that the codecs should live in a different repository as we add stuff like BLOSC, especially as they may grow in size quite a bit (could be a decently sized blob of WASM). I think we can follow the same interface as in Python numcodecs, it doesn't get much simpler :).
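For reference, a sketch of what a numcodecs-style interface could look like in TypeScript, assuming `Uint8Array` in and out. The method names mirror Python numcodecs (`codec_id`, `encode`, `decode`, `get_config`, `from_config`); the `Shuffle` class is modeled on the numcodecs byte-shuffle filter, and its exact byte layout here is an assumption rather than a verified port.

```typescript
// JSON-serializable codec config, as stored in array metadata.
interface CodecConfig {
  id: string;
  [key: string]: unknown;
}

// numcodecs-style interface, assuming Uint8Array for both directions.
interface Codec {
  readonly codecId: string;
  encode(data: Uint8Array): Uint8Array;
  decode(data: Uint8Array): Uint8Array;
  getConfig(): CodecConfig;
}

// Byte-shuffle filter: groups byte k of every element together, which often
// makes numeric data more compressible for a downstream compressor.
class Shuffle implements Codec {
  static readonly codecId = "shuffle";
  readonly codecId = Shuffle.codecId;

  constructor(readonly elementsize = 4) {}

  static fromConfig(config: CodecConfig): Shuffle {
    return new Shuffle(config.elementsize as number | undefined);
  }

  getConfig(): CodecConfig {
    return { id: this.codecId, elementsize: this.elementsize };
  }

  encode(data: Uint8Array): Uint8Array {
    return this.permute(data, false);
  }

  decode(data: Uint8Array): Uint8Array {
    return this.permute(data, true);
  }

  // Element-major <-> byte-major reordering. Trailing bytes that do not
  // fill a whole element are copied through unchanged.
  private permute(src: Uint8Array, inverse: boolean): Uint8Array {
    const size = this.elementsize;
    const count = Math.floor(src.length / size);
    const out = new Uint8Array(src.length);
    for (let i = 0; i < count; i++) {
      for (let b = 0; b < size; b++) {
        const elementMajor = i * size + b;
        const byteMajor = b * count + i;
        if (inverse) out[elementMajor] = src[byteMajor];
        else out[byteMajor] = src[elementMajor];
      }
    }
    out.set(src.subarray(count * size), count * size);
    return out;
  }
}
```

With `new Shuffle(2)`, the bytes `[1, 2, 3, 4, 5, 6, 7]` encode to `[1, 3, 5, 2, 4, 6, 7]`: first bytes of each 2-byte element, then second bytes, then the leftover byte.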

@gzuidhof
Owner

gzuidhof commented Mar 3, 2020

It will also be useful outside of zarr, perhaps a name without zarr in it better reflects that?

@manzt
Collaborator Author

manzt commented Mar 3, 2020

I think we can follow the same interface as in Python numcodecs, it doesn't get much simpler :).

Agreed.

It will also be useful outside of zarr, perhaps a name without zarr in it better reflects that?

Want to use the same name, numcodecs, on npm?

@gzuidhof
Owner

gzuidhof commented Mar 3, 2020

Yes that sounds good :), feel free to create a repo if you're happy to own it.

I think this interface is pretty much the same as in numcodecs (though maybe the return type should be Uint8Array?). The code in the compression folder could be a starting point.

@manzt
Collaborator Author

manzt commented Mar 3, 2020

working on it :)

@kylebarron
Contributor

I'm not sure tree-shaking will work since the decoders needed for an HTTPStore won't be known until runtime

In this case, you could consider dynamically loading the Blosc decompressor on the fly from a CDN (e.g. https://unpkg.com/[email protected]/dist/blosc.cjs), only when an HTTPStore requires it. Then the minimum bundle size of zarr.js would be considerably smaller.
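A minimal sketch of the on-demand idea, assuming the loader wraps a dynamic `import()` of the decoder module: nothing is fetched until a chunk actually needs the codec, and concurrent requests share one in-flight load. `lazy` is a hypothetical helper; forgetting a failed load (rather than caching the rejection) is a design choice so that a flaky network does not break every later read.

```typescript
type Loader<T> = () => Promise<T>;

// Wrap a loader so it runs at most once per successful load.
function lazy<T>(loader: Loader<T>): Loader<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader().catch((err) => {
        // Forget the failed attempt so the next call retries the fetch.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

In practice the loader would be something like `() => import(decoderUrl)`, where `decoderUrl` is whatever CDN or self-hosted location the application chooses.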

@manzt
Collaborator Author

manzt commented Sep 20, 2020

Thanks for the suggestion, @kylebarron. We've had a much more extensive conversation about future codecs in the numcodecs repo, where we've discussed something similar to what you've mentioned above: manzt/numcodecs.js#2

Blosc is currently included out of convenience, since it is the default compressor in zarr-python. Ideally, the codec registry would be completely dynamic and configurable by the end user. This is why numcodecs is set up the way it is, as separate code-split modules. I just haven't had the bandwidth to work on this further.

I'd love to let a bundler take care of code-splitting the final modules for a user's application; it's just a matter of making this ergonomic.

const registry = new Map();
// let the bundler do code splitting, so users can self-host compression modules
registry.set('blosc', () => import('numcodecs/blosc'));
registry.set('zlib', () => import('numcodecs/zlib'));
registry.set('gzip', () => import('numcodecs/gzip'));
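A sketch of how such a registry might be consumed when reading an array, assuming each entry resolves to a module whose default export has a static `fromConfig` (numcodecs-style). `getCodec`, `CodecModule`, and that module shape are hypothetical; the config passed in is the `compressor` object stored in `.zarray` (e.g. `{ "id": "zlib", "level": 1 }`).

```typescript
interface CodecConfig {
  id: string;
  [key: string]: unknown;
}

interface Codec {
  decode(data: Uint8Array): Uint8Array | Promise<Uint8Array>;
}

// Assumed module shape: default export is a class with a static fromConfig.
interface CodecClass {
  fromConfig(config: CodecConfig): Codec;
}
type CodecModule = { default: CodecClass };

// id -> lazy loader; in an app these would be () => import('numcodecs/...').
const registry = new Map<string, () => Promise<CodecModule>>();

// Load the codec module only when metadata asks for it, then configure it.
async function getCodec(config: CodecConfig): Promise<Codec> {
  const load = registry.get(config.id);
  if (load === undefined) {
    throw new Error(`No codec registered for id "${config.id}".`);
  }
  const mod = await load();
  return mod.default.fromConfig(config);
}
```

Because the module system caches the result of a dynamic `import()` for a given specifier, repeated lookups for the same id should not refetch the module.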

@manzt
Collaborator Author

manzt commented Nov 25, 2020

#63

manzt closed this as completed on Nov 25, 2020