Features : Optional_Files, jobTimeout and LogStream to metadata #55

geertvandeweyer
Collaborator

@geertvandeweyer geertvandeweyer commented Nov 15, 2024

Optional Files
As raised by Peter Thomas on Slack:

Added support for optional files in the AWS handling of (de)localization and caching:

  • support for "File?"
  • support for "Array[File?]" in manual assignment
  • support for "Array[File?]" in globbing

Tested for input and output of individual tasks and workflows:

  • optional files may be missing during (de)localization on both S3 and EFS (a missing file is reported in the log)
  • a missing optional file is accepted in the cache-hash calculation: if it was missing in the original run and is still missing, this counts as a valid hit (no log message)

Note: mixed arrays are not supported; they are cast to mandatory files:

  • an array [File_1, File_2] in which File_1 is optional and File_2 is not is treated as entirely non-optional.
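
To make the rules above concrete, here is a minimal Python sketch of the intended behaviour. It is not the Cromwell (Scala) implementation: the function names and the `MISSING` sentinel are hypothetical, and the real cache hash is computed differently.

```python
import hashlib
import logging
import os
import shutil
import tempfile
from typing import List, Optional

log = logging.getLogger("optional_files")

MISSING = "<missing-optional-file>"  # hypothetical sentinel, not Cromwell's value


def delocalize(src: str, dest_dir: str, optional: bool) -> Optional[str]:
    """Copy src into dest_dir; a missing file is tolerated only if optional."""
    if not os.path.exists(src):
        if optional:
            log.info("optional file %s is missing, skipping", src)
            return None
        raise FileNotFoundError(src)
    return shutil.copy(src, dest_dir)


def cache_key(path: str) -> str:
    """Hash the file contents; every missing file maps to the same sentinel,
    so 'missing then' and 'missing now' compare equal (a valid hit)."""
    if not os.path.exists(path):
        return MISSING
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def array_is_optional(element_is_optional: List[bool]) -> bool:
    """A mixed array is treated as mandatory: optional only if all elements are."""
    return all(element_is_optional)


workdir = tempfile.mkdtemp()
absent = os.path.join(workdir, "absent.txt")
skipped = delocalize(absent, workdir, optional=True)  # logged, returns None
```

Calling `delocalize(absent, workdir, optional=False)` raises `FileNotFoundError`, which is the behaviour mandatory files keep.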

jobTimeout

It is now possible to specify a maximum job runtime (walltime) for jobs on the AWS backend. If the time is exceeded, the job is terminated. Use this to kill hanging jobs (seen with R multicore processing in our case). The option is documented in the README.
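
For context, AWS Batch supports a per-attempt timeout through the `timeout.attemptDurationSeconds` field of its SubmitJob request, with a minimum of 60 seconds. Whether this PR maps the walltime onto exactly that field is an assumption on my part. The sketch below only builds the request dictionary that could be passed to boto3's `submit_job`; it does not call AWS, and the job, queue and definition names are placeholders.

```python
from typing import Any, Dict

MIN_BATCH_TIMEOUT_SECONDS = 60  # AWS Batch does not accept shorter timeouts


def submit_job_request(
    job_name: str, job_queue: str, job_definition: str, timeout_seconds: int
) -> Dict[str, Any]:
    """Build keyword arguments for a Batch SubmitJob call with a walltime limit."""
    if timeout_seconds < MIN_BATCH_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be at least {MIN_BATCH_TIMEOUT_SECONDS} seconds"
        )
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Batch terminates the attempt once this many seconds have elapsed.
        "timeout": {"attemptDurationSeconds": timeout_seconds},
    }


# Placeholder names, for illustration only.
request = submit_job_request("my-task", "my-queue", "my-job-def", 3600)
```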

LogStream

The CloudWatch LogGroupName and LogStreamName, and the AWS region the job was executed in, are now added to the task call metadata. Query Cromwell for the metadata to retrieve them. Unlike the stdout/stderr referenced in the metadata, these log streams also contain information on (de)localization.
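
A rough sketch of pulling these values out of a metadata response, such as the JSON returned by Cromwell's `GET /api/workflows/v1/{id}/metadata` endpoint. The key names `logGroupName`, `logStreamName` and `region` are placeholders: the exact metadata keys are not listed in this description, so check a real response and adjust `wanted`.

```python
from typing import Any, Dict, List, Tuple


def log_locations(
    metadata: Dict[str, Any],
    wanted: Tuple[str, ...] = ("logGroupName", "logStreamName", "region"),
) -> Dict[str, List[Dict[str, Any]]]:
    """Collect log-related entries per call from a workflow metadata document."""
    found: Dict[str, List[Dict[str, Any]]] = {}
    # Cromwell metadata maps each call name to a list of attempts/shards.
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            info = {key: attempt[key] for key in wanted if key in attempt}
            if info:
                found.setdefault(call_name, []).append(info)
    return found


# Hand-written sample with hypothetical key names and values.
sample = {
    "calls": {
        "wf.align": [
            {
                "attempt": 1,
                "logGroupName": "/aws/batch/job",
                "logStreamName": "example-stream",
                "region": "eu-west-1",
            }
        ],
        "wf.sort": [{"attempt": 1}],
    }
}
locations = log_locations(sample)
```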

Fix for globbing without folder prefix

See issue 46. Globbing now functions as expected.
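
The details of issue 46 are not repeated in this thread, so the following is only a generic illustration of why a prefix-less pattern is a special case, not the actual fix. Splitting `*.txt` into directory and file pattern yields an empty directory, which has to be treated as the working directory. All names are hypothetical.

```python
import glob
import os
import tempfile
from typing import List, Tuple


def split_glob(pattern: str) -> Tuple[str, str]:
    """Split a glob into (directory, file pattern); no prefix means '.'."""
    directory, name = os.path.split(pattern)
    return (directory or ".", name)


def expand(pattern: str, workdir: str) -> List[str]:
    """Expand a glob relative to workdir and return sorted relative paths."""
    directory, name = split_glob(pattern)
    hits = glob.glob(os.path.join(workdir, directory, name))
    return sorted(os.path.relpath(hit, workdir) for hit in hits)


# Small demo tree: a.txt in the working directory, b.txt under out/.
workdir = tempfile.mkdtemp()
os.mkdir(os.path.join(workdir, "out"))
for rel in ("a.txt", os.path.join("out", "b.txt")):
    with open(os.path.join(workdir, rel), "w") as fh:
        fh.write("x")

no_prefix = expand("*.txt", workdir)
with_prefix = expand("out/*.txt", workdir)
```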

Run jobs in privileged mode

Added an option to enable FUSE in AWS Batch jobs. This makes it possible to install and use tools like mount-s3 to access buckets "locally" instead of localizing files. Useful when extracting small sections of big files with tools that cannot handle S3 URLs as input.
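
As an illustration of the use case: once a bucket is mounted, a tool can seek into a large object and read only the bytes it needs, as with any local file. The sketch assumes the mount supports seeking reads and uses a temporary file as a stand-in for a path under a hypothetical mount point such as `/mnt/my-bucket/`.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` without reading the rest."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


# Stand-in for a big file on the mounted bucket.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.close(fd)

chunk = read_range(demo_path, 3, 4)
os.remove(demo_path)
```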

Optional Localization

(Re)-enabled support for the optional_localization flag in the WDL (see here).

Extra: a minor optimization to hashing through EFS/MD5 files: if the hash is considered invalid, a random string is returned instead of the same message each time. This forces a cache break.
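
The idea behind the random string, sketched in Python: a fixed error message would hash identically on every run and could produce a false cache hit, whereas a fresh random value never matches a stored hash. What counts as "invalid" is an assumption here (a sidecar file that is unreadable or does not hold a 32-character hex digest), and the function name is hypothetical.

```python
import os
import re
import tempfile
import uuid

MD5_RE = re.compile(r"^[0-9a-f]{32}$")


def sidecar_hash(md5_path: str) -> str:
    """Return the digest stored in a sidecar .md5 file, or a random value."""
    try:
        with open(md5_path) as fh:
            value = fh.read().strip().lower()
    except OSError:
        value = ""
    if MD5_RE.match(value):
        return value
    # Invalid or missing: a unique string guarantees a cache miss.
    return "invalid-" + uuid.uuid4().hex


workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "data.bam.md5")
with open(good, "w") as fh:
    fh.write("d41d8cd98f00b204e9800998ecf8427e\n")
bad = os.path.join(workdir, "missing.md5")
```

Two calls on `good` return the same digest, while two calls on `bad` return different strings and therefore never match each other or an earlier run.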

For testing, see release 87.1-AWS in my own fork: https://github.com/geertvandeweyer/cromwell/releases/tag/87.1-AWS

This was referenced Nov 15, 2024
@geertvandeweyer geertvandeweyer linked an issue Nov 22, 2024 that may be closed by this pull request
@geertvandeweyer geertvandeweyer changed the title Feature/Optional_Files Features : Optional_Files, jobTimeout and LogStream to metadata Nov 28, 2024
@geertvandeweyer
Collaborator Author

Testing shows that Array[File?] causes cache misses in call caching. This needs to be fixed before approving.

@geertvandeweyer
Collaborator Author

PR passed my testing. Ready for review & merging.

@geertvandeweyer
Collaborator Author

Passed all functional tests in cromwell-testing.

Ready for merging

@geertvandeweyer geertvandeweyer merged commit c5c074b into henriqueribeiro:develop_aws Mar 13, 2025
0 of 6 checks passed
geertvandeweyer added a commit that referenced this pull request Mar 13, 2025
…#58)

* support for Array[File?] as job input/output

* support for JobTimeout directive

* expose logStreamName, logStreamGroup and Region to metadata

* fix for optional files in cache_copy strategy

* Added FuseMount option to runtime attributes, (re-)enabled optional localization of input files

* fix globbing issues without directory prefix
Development

Successfully merging this pull request may close these issues.

Missing optional Output Files
1 participant