-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[Runtime Env] Add support for custom transport parameters in smart_open #59579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds valuable support for custom transport_params in smart_open, allowing for more flexible remote package downloads. The changes are well-structured across the affected files. My review includes a few suggestions for improvement: a critical fix for a potential UnboundLocalError, a suggestion to make parameter merging more robust, a refactoring to reduce code duplication, and a minor simplification. Overall, this is a great addition with the recommended changes.
e731f6d to
5e6b63c
Compare
60cfad3 to
38afcfc
Compare
|
|
||
| @classmethod | ||
| def download_remote_uri(cls, protocol: str, source_uri: str, dest_file: str): | ||
| def download_remote_uri(cls, protocol: str, source_uri: str, dest_file: str, transport_params=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABFSS protocol ignores user-provided transport parameters
The open_file function in _handle_abfss_protocol accepts transport_params as a keyword argument but completely ignores it. The function always creates an AzureBlobFileSystem with hardcoded DefaultAzureCredential(), so any custom credentials, account keys, or SAS tokens that users provide via transport_params are silently discarded. This contradicts the PR's stated behavior that custom parameters would be merged and used for ABFSS protocol.
Additional Locations (1)
2340b3e to
83d7e30
Compare
6447a3e to
e3a62f7
Compare
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
@dkhachyan we have an existing "config" field that contains some configuration options related to runtime_env setup: https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.html#ray.runtime_env.RuntimeEnvConfig. It's passed as a separate field in the runtime_env. I would suggest that we next "transport_options" as a subkey under "config" to avoid adding more fields to the global namespace. |
Thanks for the feedback! I’ll update the code accordingly. |
edoakes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the contribution
|
@dkhachyan there is a merge conflict and I've kicked off the premerge CI: https://buildkite.com/ray-project/premerge/builds/57069 Ping me once CI is passing to merge |
Signed-off-by: Denis Khachyan <[email protected]>
Signed-off-by: Denis Khachyan <[email protected]>
| elif protocol == "https": | ||
| open_file, tp = cls._handle_https_protocol() | ||
| open_file, default_tp = cls._handle_https_protocol() | ||
| tp = cls._merge_transport_params(default_tp, transport_params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTPS headers lost when custom headers provided
Medium Severity
When custom headers are passed via transport_params for HTTPS, the default User-Agent and Accept headers are completely replaced rather than merged. The _handle_https_protocol() returns None for default transport params, so _merge_transport_params(None, custom_params) just returns the custom params unchanged. Then inside open_file, params.update(transport_params) does a shallow update that replaces the entire headers dict. This breaks the documented GitLab use case where users add authentication headers but lose the default headers that some servers may require.
Additional Locations (1)
Signed-off-by: Denis Khachyan <[email protected]>
Signed-off-by: Denis Khachyan <[email protected]>
Signed-off-by: Denis Khachyan <[email protected]>
Description
This PR adds support for custom transport parameters in smart_open when downloading packages from remote storage systems in Ray's runtime environment. This enables users to customize how packages are downloaded, including authentication methods, SSL settings, and protocol-specific client configurations.
The implementation allows users to specify transport_params in their runtime_env configuration, which are then passed to smart_open during package downloads. For protocols with default configurations (S3, Azure, GCS, ABFSS), custom parameters are merged with defaults, with custom parameters taking precedence.
Related issues
Fixes #46833
Additional information
Implementation Details
download_and_unpack_package()inpython/ray/_private/runtime_env/packaging.pyto accept transport_paramspython/ray/_private/runtime_env/protocol.pyto merge custom parameters with defaults_merge_transport_params()helper method to properly combine default and custom parametersUsage Examples
GitLab Integration (resolves #46833)