Define API for recording/setting compute instructions in dataset #4

Open
mih opened this issue Apr 29, 2024 · 1 comment


mih commented Apr 29, 2024

From a user POV, we want to present compute-on-demand like download-on-demand, and wrap everything into a git-annex special remote. This means that we are bound to the special remote protocol, which translates to an API that has request-this-key as its main entrypoint.

So at the start of an operation, we only know which key is requested. Therefore, the instructions for computing a key need to be recorded (discoverably) in association with that particular key.
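
To make the key-only entrypoint concrete, here is a minimal sketch of how a request could be answered. It assumes the `TRANSFER RETRIEVE <key> <file>` request and the `TRANSFER-SUCCESS`/`TRANSFER-FAILURE` replies of the external special remote protocol; the `instructions` mapping and the `compute` callable are hypothetical placeholders for whatever storage and execution mechanism ends up being chosen.

```python
def handle_request(line, instructions, compute):
    """Answer a single protocol request line with a single reply line.

    `instructions` (hypothetical) maps an annex key to its recorded
    compute instruction; `compute` (hypothetical) runs an instruction
    and writes the result to the target path.
    """
    # The target path may contain spaces, so split at most three times.
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) != 4 or parts[:2] != ["TRANSFER", "RETRIEVE"]:
        return "UNSUPPORTED-REQUEST"
    key, target = parts[2], parts[3]

    # The key is all we know, so the instruction must be discoverable from it.
    instruction = instructions.get(key)
    if instruction is None:
        return f"TRANSFER-FAILURE RETRIEVE {key} no compute instruction recorded"
    try:
        compute(instruction, target)
    except Exception as exc:
        # Replies are single lines, so flatten any multi-line error message.
        message = " ".join(str(exc).split())
        return f"TRANSFER-FAILURE RETRIEVE {key} {message}"
    return f"TRANSFER-SUCCESS RETRIEVE {key}"
```

The lookup step is where each of the storage patterns discussed in this issue would plug in.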

Three established patterns for storing key-based information are known:

  • URL-encoded parameter list via an added "availability URL", as done in https://github.com/matrss/datalad-getexec
  • recording a key state via GET/SETSTATE in the special remote protocol
  • key-value storage in git-annex metadata
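
As an illustration of the first pattern, the following sketch round-trips a compute instruction through a URL query string using only the Python standard library. The `example-compute` scheme and the `method` field are assumptions made for this example; they are not taken from datalad-getexec, which may encode its parameters differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

SCHEME = "example-compute"  # hypothetical URL scheme, for illustration only


def encode_instruction(method, parameters):
    """Encode a method name and string parameters as a URL."""
    if "method" in parameters:
        raise ValueError("'method' is reserved for the method name")
    # Sort parameters so that the same instruction always yields the same URL.
    query = urlencode([("method", method)] + sorted(parameters.items()))
    return f"{SCHEME}:///?{query}"


def decode_instruction(url):
    """Recover (method, parameters) from a URL made by encode_instruction."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URL: {url}")
    fields = dict(parse_qsl(parts.query, keep_blank_values=True))
    method = fields.pop("method")
    return method, fields
```

A URL of this kind could then be registered for a key, so that the special remote can decode it when that key is requested.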

Challenges:

  • Not all computations are request-one-key-compute-one-key; a single computation can produce more than one key
  • GET/SETSTATE is not directly exposed to a user-facing API
  • git-annex metadata cannot handle multi-value metadata (list of values per metadata key) -- maybe use a dedicated top-level metadata key, with JSON-encoded value
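
The JSON idea from the last bullet could look roughly like the sketch below. The field name `compute-instruction` and the record layout (`method`, `parameters`, `outputs`) are assumptions for illustration, not an agreed format. Listing all outputs in the record is one possible way to represent a computation that yields several keys: the same value could be attached to each output, so that a request for any of them finds the full instruction.

```python
import json

FIELD = "compute-instruction"  # hypothetical top-level metadata key


def to_metadata_value(method, parameters, outputs):
    """Serialize one instruction into a single-line JSON string."""
    record = {
        "method": method,
        "parameters": parameters,
        # All files produced by one run of the computation.
        "outputs": list(outputs),
    }
    # Compact, key-sorted output keeps the stored value stable across runs.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def from_metadata_value(value):
    """Parse a value created by to_metadata_value."""
    record = json.loads(value)
    return record["method"], record["parameters"], record["outputs"]
```

Because the whole record is a single string, it fits into one metadata value regardless of how many parameters or outputs it describes.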

Candidate solutions:

mih transferred this issue from another repository May 2, 2024
github-project-automation bot moved this to "discussion needed" in DataLad remake May 2, 2024

christian-monch commented

In a first implementation, https://github.com/christian-monch/datalad-compute (a POC that will turn into an MVP), the first option for key-based information storage was chosen, i.e. a URL-encoded parameter list via an added "availability URL", as done in https://github.com/matrss/datalad-getexec.
