Resolvable PIDs for icechunk repos? #1035

@NathanCummings

I'd like to hear whether anyone has ideas or experience with this. I keep a separate metadata catalogue of my Zarr (soon to be Icechunk) repositories to make the data easier to find. Right now, the metadata database has a field holding the URL of each object, but at some point I may want to migrate the data to a different object store. It would be nice to store resolvable PIDs in that field instead, so that links don't break for people if I move the data.

With DOIs, the expectation seems to be that the DOI resolves to a landing page, which might link to a download of the 'file'. It feels like it would be better to have something that resolves directly to the object (or collection of objects) in S3, so users can:

  1. Query the metadata catalogue with our API
  2. Read the URL field, which contains a PID
  3. Put that directly into icechunk.s3_storage()

I guess you'd then need some way to parse out the endpoint_url, bucket, and prefix so Icechunk knows what to do with it. Even cooler would be versionable PIDs, where the PID resolves to a specific tag of your Icechunk repo... but maybe I'm getting carried away at this point.
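
To make the parsing step concrete, here is a rough sketch of what I have in mind, assuming the PID has already been resolved to a plain URL. `S3Location` and `parse_s3_location` are names I made up for illustration, and the two URL layouts handled (`s3://bucket/prefix` and path-style `https://endpoint/bucket/prefix`) are my assumption about what a resolver might return. It only uses the standard library.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class S3Location(NamedTuple):
    """Hypothetical container for the pieces a storage config needs."""
    endpoint_url: Optional[str]  # None means "no custom endpoint given"
    bucket: str
    prefix: str


def parse_s3_location(url: str) -> S3Location:
    """Split a resolved object-store URL into endpoint, bucket and prefix.

    Handles two layouts (an assumption; others would need more cases):
      s3://bucket/some/prefix
      https://endpoint.example.org/bucket/some/prefix   (path-style)
    """
    parts = urlsplit(url)
    if parts.scheme == "s3":
        # For s3:// URLs the bucket sits where the host normally goes.
        return S3Location(None, parts.netloc, parts.path.strip("/"))
    if parts.scheme in ("http", "https"):
        # Path-style: first path segment is the bucket, the rest is the prefix.
        bucket, _, prefix = parts.path.lstrip("/").partition("/")
        if not bucket:
            raise ValueError(f"no bucket in URL: {url!r}")
        endpoint = f"{parts.scheme}://{parts.netloc}"
        return S3Location(endpoint, bucket, prefix.strip("/"))
    raise ValueError(f"unsupported scheme in URL: {url!r}")
```

The resulting fields could then presumably be handed to `icechunk.s3_storage()` as `bucket`, `prefix`, and `endpoint_url`, though I haven't checked the exact signature against the current release. Resolving the PID itself (for example by following a redirect from a resolver service) is a separate step not covered here.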
