IDEA: upload remote files #666
I realized this may not be the proper place to ask. Shall I open this issue for …
Hi @agy-why, this repository is a perfect location for this issue; there is no need to move it.

Currently, we don't support remote storage services in the way suggested above. What is possible is that researchers can express remote file access needs via special stage-in and stage-out steps in their computational workflow graphs. That is, the first step of the workflow would download the inputs from S3, and the last step of the workflow would upload the results back to S3. For a live example, please see the EOS stage-out example in the documentation: https://docs.reana.io/advanced-usage/storage-backends/eos/

We support virtually any external storage system where we can use Kerberos authentication or VOMS proxy authentication mechanisms. Examples include EOS or WLCG sites. Note also that we are in the middle of adding support for Rucio, see reanahub/reana-auth-rucio#1

That said, we have been planning to support remote file syntax sugar in a rather similar way to what you suggested. We thought of allowing a syntax like:

```yaml
inputs:
  files:
    - s3(`mybucket`, `myfile.csv`)
```

REANA would then do an automatic stage-in and stage-out for this file. One advantage is that researchers wouldn't have to write explicit data staging steps in their DAG workflows. This is a bit similar to Snakemake's support for remote storage; see https://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html and the examples therein for AWS or S3 in Snakemake rules. We hope to start working on similar remote file storage syntax sugar sometime this winter.
P.S. Another related idea we have been thinking about is to add support for popular protocols so that the REANA workspace could be manipulated via tools such as rclone. This might simplify the initial stage-in upload and the final stage-out download, especially when using many files or very large files.
Dear Tibor, thank you for your clear and detailed response. My personal use case would be to have a single workflow that can work with various data origins: my dev data are on a server that I can access via scp, and my prod data are on a private S3 infrastructure, but they may move to another one (not necessarily S3) after publication of the results. Therefore I would find it useful to be able to specify not only the source but also the protocol used to access the data outside of the workflow (git repo). Currently, I need to implement two variants in my first step (get_data) to get the data into the workspace, which I can choose via input parameters. It is fully acceptable that way, but I'd greatly appreciate the rclone feature you suggested. It would allow me to plug input data in and out of the same workflow by populating my workspace accordingly.
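The two-variant get_data step described above could be sketched as a small dispatch over an input parameter. Everything here is a hypothetical illustration (the parameter names, URLs, and the `scp`/`aws s3 cp` invocations are assumptions, not REANA conventions); the function prints the command it would run rather than executing it, so the selection logic can be seen in isolation.

```shell
#!/bin/sh
# Hypothetical get_data step: choose the transfer tool from a workflow
# input parameter instead of hard-coding one protocol in the workflow.
get_data_cmd() {
  origin="$1"   # e.g. "scp" or "s3", passed as an input parameter
  url="$2"      # location of the input data
  case "$origin" in
    # print (rather than run) the command that would fetch the data
    scp) printf 'scp %s ./data.csv\n' "$url" ;;
    s3)  printf 'aws s3 cp %s ./data.csv\n' "$url" ;;
    *)   printf 'unsupported data origin: %s\n' "$origin" >&2; return 1 ;;
  esac
}
```

Swapping data origins then amounts to changing a single input parameter, while the rest of the workflow stays untouched.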
An alternative would be to be able to mix workflows together; I don't know how far this is possible. I have: …
It would be an acceptable solution for me to be able to propagate the workspace of one of the … Is this already possible? I thank you in advance.
Dear developers,
I have a question regarding the `reana-client upload` feature. Is it already possible, or is it planned, to execute something like:
And if yes, which services are you currently supporting: scp, ftp, s3, Google Cloud, WebDAV, …?
I thank you in advance,
Yori