
Collect use cases for datalad-remake (and underlying tooling) #3

@mih

The remake effort aims to serve a few general use cases and to yield tools that serve all of them while aligning as closely as possible with existing solutions and ongoing development. These general use cases are:

  • provenance capture of programmatic dataset modifications (i.e., the domain of datalad run)
  • re-execution of provenance records, for the purpose of
    • verifying reproducibility (i.e., datalad rerun)
    • re-applying computational steps on different data (i.e., datalad rerun --onto)
  • output extraction after execution of (parametric) compute instructions (i.e., "compute for get" special remote)
  • depositing compute instructions for "prospective outputs" (never computed/recorded)
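To make the last two items more tangible, here is a minimal sketch of "deposit instructions now, compute on get". It is not how datalad-remake or any special remote is implemented; `ProspectiveStore`, `deposit`, `get`, and `csv_to_tsv` are hypothetical names made up for illustration, and an in-memory dict stands in for a dataset.

```python
import csv
import io


class ProspectiveStore:
    """Hypothetical in-memory stand-in for a dataset with prospective outputs."""

    def __init__(self):
        self._instructions = {}  # path -> (method, parameters)
        self._content = {}       # path -> bytes, filled only on first get()

    def deposit(self, path, method, **parameters):
        # Record how to produce `path` without computing anything yet.
        self._instructions[path] = (method, parameters)

    def is_computed(self, path):
        return path in self._content

    def get(self, path):
        # Compute on first request, then serve the stored result.
        if path not in self._content:
            method, parameters = self._instructions[path]
            self._content[path] = method(**parameters)
        return self._content[path]


def csv_to_tsv(csv_text):
    """Example parametric compute instruction: re-serialize CSV as TSV."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")


store = ProspectiveStore()
store.deposit("table.tsv", csv_to_tsv, csv_text="a,b\n1,2\n")
print(store.is_computed("table.tsv"))  # nothing has been computed yet
print(store.get("table.tsv"))          # triggers the computation
```

The point of the sketch is only the separation of concerns: the instruction record exists before any output does, and the output materializes when somebody asks for it.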

A list of more concrete use cases will help inform both the design and the presentation (documentation, paper) of the implementation. Here is a (growing) collection of candidates, each to be considered as a documentation example or as a use case featured prominently in the paper:

  • fmriprep: compute large outputs, hash them, and rely on them being bit-identically reproducible to avoid storing them
  • provide data in alternative (file) formats (store CSV, provide XLSX on-demand)
  • render partial data for specific purposes (produce video clips from source video via a cutlist)
  • apply all edits to a RAW photo to render a JPEG on demand
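The "hash instead of store" idea behind the fmriprep item could look roughly like the sketch below. It assumes SHA-256 as the checksum and uses a trivial deterministic function in place of a real pipeline; `sha256_hex`, `verify_recomputation`, and `render` are hypothetical helpers, not part of datalad or fmriprep.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify_recomputation(compute, recorded_hash):
    """Re-run `compute` and accept the result only if it is bit-identical.

    `compute` is any zero-argument callable returning bytes (an assumption
    made for this sketch); `recorded_hash` is the digest stored at the time
    of the original computation.
    """
    output = compute()
    actual = sha256_hex(output)
    if actual != recorded_hash:
        raise ValueError(
            f"recomputed output differs: expected {recorded_hash}, got {actual}"
        )
    return output


def render():
    # Stand-in for an expensive but deterministic computation.
    return b"large derived output"


# First run: keep only the hash, discard the output itself.
recorded = sha256_hex(render())

# Later: recompute on demand and check against the recorded hash.
restored = verify_recomputation(render, recorded)
```

This only works if the computation is bit-identically reproducible; any nondeterminism (timestamps, thread ordering, library versions) would surface as a hash mismatch rather than as silently different data.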
