Here is the target spec: https://www.commonwl.org/v1.2/CommandLineTool.html

The source is much simpler: http://docs.datalad.org/en/stable/design/provenance_capture.html#the-provenance-record

The challenge is that the datalad run record is a combination of three things that are recognized as separate entities in the CWL world:
- workflow/command line tool specification
- workflow inputs
- workflow execution provenance
Following the cwltool documentation, the first two can be linked to form a single execution specification:
```
positional arguments:
  cwl_document    path or URL to a CWL Workflow, CommandLineTool, or
                  ExpressionTool. If the `inputs_object` has a `cwl:tool`
                  field indicating the path or URL to the cwl_document,
                  then the `cwl_document` argument is optional.
  inputs_object   path or URL to a YAML or JSON formatted description of
                  the required input values for the given `cwl_document`.
```
Here is a demo of that:
`cp.cwl.yaml`
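A minimal sketch of what such a tool description could look like (illustrative; parameter names and structure are not necessarily the original demo file):

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cp
inputs:
  source:
    type: File
    inputBinding:
      position: 1          # first positional argument to cp
  target:
    type: string
    inputBinding:
      position: 2          # second positional argument: destination name
outputs:
  copy:
    type: File
    outputBinding:
      glob: $(inputs.target)   # collect the copied file as the tool output
```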
`cp.inputs.yaml`
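A matching inputs object, using the `cwl:tool` field mentioned in the help text above to point back at the tool (file names are illustrative):

```yaml
# `cwl:tool` links the inputs object to the tool description,
# so cwltool can be invoked with the inputs object alone
cwl:tool: cp.cwl.yaml
source:
  class: File
  path: input.dat          # illustrative input file
target: copy-of-input.dat  # illustrative destination name
```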
This can be executed as a single instruction set.
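Presumably with something like this (the exact invocation from the demo is not preserved, so this is an assumption):

```sh
# `cwl:tool` in cp.inputs.yaml points at cp.cwl.yaml,
# so a single argument is enough
cwltool cp.inputs.yaml
```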
Now this can be taken a step further. With cwlprov (https://github.com/common-workflow-language/cwlprov) we can have an instant PROV record as a BagIt.
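cwltool can capture this directly; a run with provenance capture enabled would look something like the following (the `prov.out` directory name is chosen to match the paths referenced below):

```sh
# capture a CWLProv research object (a BagIt) for this run
cwltool --provenance prov.out cp.inputs.yaml
```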
Leading to a BagIt directory containing the full provenance record.
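Roughly the following layout (an abbreviated sketch; the exact contents depend on the cwltool/CWLProv version):

```
prov.out/
├── bagit.txt
├── bag-info.txt
├── manifest-sha1.txt        # BagIt payload manifests (also sha256/sha512)
├── data/                    # input/output files, stored by content hash
├── metadata/
│   ├── manifest.json        # research object manifest
│   └── provenance/          # PROV serializations (.provn, .ttl, .jsonld, ...)
├── snapshot/                # copy of the tool/workflow as executed
└── workflow/
    ├── packed.cwl
    ├── primary-job.json     # the resolved inputs
    └── primary-output.json  # the recorded outputs
```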
Does this have all information from a datalad run-record?

- `cmd` is comprehensively captured in the workflow declaration.
- `inputs` are captured in the workflow inputs, in much more detail; they also appear in `prov.out/workflow/primary-job.json` (careful with absolute `file://` URLs).
- `outputs`: see `prov.out/workflow/primary-output.json`.
- `dsid` is absent; CWL has no concept of this. A related "associated with dataset" property can be defined easily. But with "Design `datalad remake-provision`" (#12) the `dsid` could even become an explicit workflow parameter.
- `exit` is not recorded verbatim, but CWL allows exit codes to be classified as success, temporary failure, or permanent failure (see the sketch after this list). Although this reduces the information, it is also more flexible (not every non-zero code is a problem), and it enables decision-making.
- `pwd`: in the provenance output everything is re-encoded to match the organization of the BagIt, which includes its own data hash tree.
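For illustration, these are the relevant CommandLineTool fields for exit-code classification (the specific code values here are made up):

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cp
inputs: []
outputs: []
successCodes: [0]           # these exit codes count as success
temporaryFailCodes: [75]    # transient failures that may be retried
permanentFailCodes: [1, 2]  # anything here fails the step permanently
```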
So not everything is readily available in the right format, but the missing bits can be added easily.

Going with the BagIt as the main/only output format seems unnecessarily complex. With a datalad dataset we can capture most/all of this information without taking apart the dataset worktree.