-
Notifications
You must be signed in to change notification settings - Fork 233
Modify process_local_sources to support mount paths #1635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Modify process_local_sources to support mount paths #1635
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
EDIT: additional changes and testing requested below, so please do not merge in the meantime.
1580f40 to
391b2da
Compare
| def process_data_sources(data, check_dir_traversal=False): | ||
| """Process and validate data sources.""" | ||
| cwd = os.getcwd() | ||
| os.getcwd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only returns cwd. If cwd is unused, line also can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry right. removed of course
| return LocalDataSource(source_name, rel_path, base_path=Path(".")) | ||
| else: | ||
| raise ValueError(f"Invalid path for local data source '{source_name}': {path}.") | ||
| def process_local_sources(local_datasources, datasources, check_dir_traversal=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is also supposed to be staticmethod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right.
| # to the mount path inside the container. | ||
| # This way, we only need to adjust the base_dir to point to the mount path. | ||
| absolute_paths = { | ||
| source_name: os.path.realpath(params.get("path", None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realpath would raise an exception if path is None. If we know params will always have path, better to set it as params['path'] IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed.
| "This file should contain a JSON object with the data sources to be registered. For local " | ||
| "data source, 'type' is 'fs', and 'params' must include: 'path'. For 's3' type, 'params' " | ||
| "must include: 'uri', 'access_key_env_name', 'secret_key_env_name', 'secret_name', and " | ||
| "optionally 'endpoint'. For azure_blob, 'type' is 'ab', and 'params' must include: " | ||
| "'connection_string', 'container_name', and optionally 'folder_prefix'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to point to OpenFL documentation URL that describes data sources in detail with examples, in addition to the format of JSON files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. @teoparvanov what doc file should we insert that info to?
| with open(datasources_json_path, "r", encoding="utf-8") as file: | ||
| data = file.read() | ||
| vds = DataSourcesJsonParser.parse(data) | ||
| vds = DataSourcesJsonParser.parse(data, check_dir_traversal=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, can we check_dir_traversal before calling this function? I don't think the class needs to know about whether it is in a valid subdirectory...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. That means we go over the json one more time to look for the local data sources. Makes sense? currently happens only in calchash
|
Hi @Efrat1, after addressing @MasterSkepticista's comments, I'd request that you additionally test those changes by running a containerized OpenFL federation locally. |
d8751d7 to
de73c55
Compare
Signed-off-by: Dar, Efrat <[email protected]>
de73c55 to
465447c
Compare
Summary
Modify
process_local_sourcesto support mount pathsType of Change (Mandatory)
Description (Mandatory)
Use common
base_dirfor all local data sources, andsource_paths relative to that base.The reason is to simplify path management in containerized environments, such as Docker. By using a common
base_dir, we can ensure that paths remain consistent when mounting volumes, as only thebase_dirneeds to be adjusted to point to the mount path inside the container. This way, we only need to adjust thebase_dirto point to the mount path.Testing
Run full experiment in OpenFL and in OpenFL-Security.