workspace: expose workspace choice to users #545

@tiborsimko

Now that we can run workflows in several different POSIX workspaces, users should be able to configure where a given workflow runs, e.g. one workflow in the default location, another in their EOS home, etc.

This configuration should be done in reana.yaml.

Option 1: introduce new top-level section

We can introduce a new section in reana.yaml to express the concept of a workspace. Pros: instead of just writing the POSIX path, we could store more information there, should we need it in the future. Also, the concept of a workspace would stand out clearly. Cons: we would need to amend the parsing and the REST API protocols to accommodate the new section.

An example of how this could look:

version: 0.6.0
inputs:
  files:
    - code/gendata.C
    - code/fitdata.C
  parameters:
    events: 20000
    data: results/data.root
    plot: results/plot.png
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - mkdir -p results && root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - root -b -q 'code/fitdata.C("${data}","${plot}")'
workspace:
  type: posix
  workspace_root_dir: /eos/home-s/simko/myworkflows
outputs:
  files:
    - results/plot.png

A future option could be:

workspace:
  type: s3
  workspace_root_dir: s3://mybucket/myworkflows
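To illustrate the parsing change that option 1 implies, here is a minimal sketch of how the new section could be read once reana.yaml has been loaded into a dictionary. The names `WorkspaceConfig`, `parse_workspace` and `SUPPORTED_TYPES` are hypothetical and not part of any existing REANA code; the set of accepted types is an assumption based on the two examples above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Assumption: only the types shown in the examples above are accepted.
SUPPORTED_TYPES = {"posix", "s3"}


@dataclass(frozen=True)
class WorkspaceConfig:
    """Hypothetical in-memory form of the proposed top-level section."""

    type: str
    workspace_root_dir: str


def parse_workspace(spec: Mapping[str, Any]) -> Optional[WorkspaceConfig]:
    """Return the workspace configuration, or None if the section is absent."""
    section = spec.get("workspace")
    if section is None:
        # No section given: the caller falls back to the cluster default.
        return None
    if not isinstance(section, Mapping):
        raise ValueError("'workspace' must be a mapping")
    ws_type = section.get("type")
    root_dir = section.get("workspace_root_dir")
    if not isinstance(ws_type, str) or ws_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported workspace type: {ws_type!r}")
    if not isinstance(root_dir, str) or not root_dir:
        raise ValueError("'workspace_root_dir' must be a non-empty string")
    return WorkspaceConfig(type=ws_type, workspace_root_dir=root_dir)
```

Returning None for a missing section keeps the choice of default outside the parser, so existing reana.yaml files without the section would remain valid.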

Option 2: use existing options clause

Alternatively, we can leave the structure of reana.yaml unchanged and simply use an existing clause, such as parameters or options. Parameters, such as temperature=20c and mass=10g, influence the research results, whilst options, such as cache=off, leave the physics results unchanged and only influence how the workflow is orchestrated. From this point of view, the choice of workspace is an option rather than a parameter, since a good reproducible analysis should not depend on where it is run.
Hence we could choose options. Pros: we only add a new option key, and the REST API could reuse the existing mechanism. Cons: conceptually, the notion of a workspace would not stand out as clearly, since the workspace configuration would be "hidden" amongst other options. Also, options can normally be set via the CLI (e.g. reana-client start -o foo=bar), but this cannot be done for the workspace, since it must be initialised beforehand.

Example:

version: 0.6.0
inputs:
  files:
    - code/gendata.C
    - code/fitdata.C
  parameters:
    events: 20000
    data: results/data.root
    plot: results/plot.png
  options:
    workspace_root_prefix: /eos/home-s/simko/myworkflows
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - mkdir -p results && root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - root -b -q 'code/fitdata.C("${data}","${plot}")'
outputs:
  files:
    - results/plot.png

A future option could be:

  options:
    workspace_root_prefix: s3://mybucket/myworkflows

(The type is inferred from the beginning of the value. Or, if need be, further keys could be added, such as workspace_type: s3. This is basically a "flattened" version of option 1, expressed via the options clause.)
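The inference described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `infer_workspace_type` is hypothetical, an explicit workspace_type key is assumed to take precedence, and any value starting with "/" is assumed to mean a POSIX workspace.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlparse


def infer_workspace_type(options: Mapping[str, Any]) -> Optional[str]:
    """Guess the workspace type from the options clause (hypothetical helper)."""
    explicit = options.get("workspace_type")
    if explicit:
        # Assumption: an explicit key, if present, overrides inference.
        return str(explicit)
    prefix = options.get("workspace_root_prefix")
    if prefix is None:
        # Nothing configured: the caller falls back to the cluster default.
        return None
    scheme = urlparse(prefix).scheme
    if scheme:
        # e.g. "s3://mybucket/myworkflows" has the scheme "s3"
        return scheme
    if prefix.startswith("/"):
        # Assumption: an absolute path without a scheme is a POSIX workspace.
        return "posix"
    raise ValueError(f"cannot infer workspace type from {prefix!r}")
```

With the two values used in this issue, /eos/home-s/simko/myworkflows would be treated as posix and s3://mybucket/myworkflows as s3.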

Notes

Regardless of which option we choose, there must be a default that is used when the user does not set anything. This default will be set by the cluster administrator, but that will be the subject of a separate issue.
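The fallback could look roughly like this for either option. It is a sketch only: `resolve_workspace_root` is a hypothetical name, and how the administrator's default reaches the code is deliberately left open, since it belongs to the other issue.

```python
from typing import Any, Mapping


def resolve_workspace_root(spec: Mapping[str, Any], admin_default: str) -> str:
    """Pick the user's workspace root if set, else the administrator's default."""
    # Option 1: dedicated top-level section.
    section = spec.get("workspace") or {}
    # Option 2: key inside the existing options clause.
    options = (spec.get("inputs") or {}).get("options") or {}
    return (
        section.get("workspace_root_dir")
        or options.get("workspace_root_prefix")
        or admin_default
    )
```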
