The remake effort aims to serve a few general use cases, and to yield tools that serve all of them while aligning as closely as possible with existing solutions and ongoing development. These general use cases are:
- provenance capture of programmatic dataset modifications (i.e., the domain of `datalad run`)
- re-execution of provenance records, for the purpose of
  - verifying reproducibility (i.e., `datalad rerun`)
  - re-applying computational steps on different data (i.e., `datalad rerun --onto`)
- output extraction after execution of (parametric) compute instructions (i.e., "compute for get" special remote)
- depositing compute instructions for "prospective outputs" (never computed/recorded)

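
To make the notion of a parametric compute instruction (and of a "prospective output" that was never computed) more concrete, here is a minimal Python sketch. It assumes an instruction is stored as plain data: a command template with `$placeholders`, a parameter mapping, and a tuple of declared outputs. The `ComputeInstruction` class and its `render()` method are hypothetical illustrations, not part of DataLad's API, and the ffmpeg command line is only an example.

```python
# Hypothetical sketch of a parametric compute instruction; not a DataLad API.
import shlex
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class ComputeInstruction:
    template: str  # command line with $placeholders, e.g. "tool $input $output"
    parameters: dict[str, str] = field(default_factory=dict)
    outputs: tuple[str, ...] = ()  # files the command is expected to produce

    def render(self) -> list[str]:
        """Fill in the parameters and return an argv list ready for execution."""
        # Quote each value so that a parameter containing spaces stays one argument.
        quoted = {name: shlex.quote(value) for name, value in self.parameters.items()}
        # Template.substitute raises KeyError if a placeholder has no parameter.
        return shlex.split(Template(self.template).substitute(quoted))


if __name__ == "__main__":
    # A "prospective output": the clip is declared here but never computed.
    clip = ComputeInstruction(
        template="ffmpeg -i $source -ss $start -to $end $output",
        parameters={
            "source": "raw/talk.mp4",
            "start": "00:01:00",
            "end": "00:02:30",
            "output": "clips/intro.mp4",
        },
        outputs=("clips/intro.mp4",),
    )
    print(clip.render())
```

Storing only the recipe means outputs can be declared before they are ever produced; executing the rendered argv (e.g., with `subprocess.run`) would be the step that turns a prospective output into an actual file.
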
A list of more concrete use cases will help inform both the design and the presentation (documentation, paper) of the implementation. Here is a (growing) collection of candidates, either as documentation examples or as use cases featured prominently in the paper:
- fmriprep: compute large outputs, hash them, and rely on them being reproducible bit for bit, so that they need not be stored
- provide data in alternative (file) formats (store CSV, provide XLSX on-demand)
- render partial data for specific purposes (produce video clips from a source video via a cutlist)
- apply all edits to a RAW photo to render a JPEG on demand
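
The fmriprep case above depends on re-computation yielding bit-identical files. A minimal sketch of the corresponding check follows, assuming the SHA-256 of each output was recorded when it was first computed and that a `compute` callable can regenerate the file. Both function names are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch: regenerate an output instead of storing it, and accept it
# only if it is bit-identical to the recorded version. Not a DataLad API.
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recompute_and_verify(
    compute: Callable[[Path], None], target: Path, expected_sha256: str
) -> Path:
    """Regenerate `target` via `compute` and check it against the recorded hash."""
    compute(target)
    actual = sha256_of(target)
    if actual != expected_sha256:
        # Do not hand out a non-reproducible result under the recorded hash.
        target.unlink()
        raise ValueError(
            f"{target}: expected sha256 {expected_sha256}, got {actual}"
        )
    return target
```

Deleting the file on a mismatch is one possible design choice: a non-reproducible result is never served as if it were the recorded output.
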