@dkhachyan dkhachyan commented Dec 19, 2025

Description

This PR adds support for custom transport parameters in smart_open when downloading packages from remote storage systems in Ray's runtime environment. This enables users to customize how packages are downloaded, including authentication methods, SSL settings, and protocol-specific client configurations.

The implementation allows users to specify transport_params in their runtime_env configuration, which are then passed to smart_open during package downloads. For protocols with default configurations (S3, Azure, GCS, ABFSS), custom parameters are merged with defaults, with custom parameters taking precedence.

Related issues

Fixes #46833

Additional information

Implementation Details

  1. Modified download_and_unpack_package() in python/ray/_private/runtime_env/packaging.py to accept transport_params
  2. Updated PyModulesPlugin and WorkingDirPlugin to extract transport_params from runtime_env
  3. Enhanced protocol handlers in python/ray/_private/runtime_env/protocol.py to merge custom parameters with defaults
  4. Added _merge_transport_params() helper method to properly combine default and custom parameters
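
The merge behavior described above (defaults first, custom parameters taking precedence) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the standalone function name `merge_transport_params`, its signature, and the sample parameter values are assumptions made for this example.

```python
from typing import Any, Dict, Optional


def merge_transport_params(
    defaults: Optional[Dict[str, Any]],
    custom: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Combine default and user-supplied transport parameters.

    Hypothetical stand-in for the PR's _merge_transport_params() helper.
    Keys from `custom` override keys from `defaults`; neither input is mutated.
    """
    if not defaults and not custom:
        # Nothing to pass along to smart_open.
        return None
    merged = dict(defaults or {})  # copy so the caller's defaults stay intact
    merged.update(custom or {})    # top-level keys from custom win
    return merged


# Placeholder values for illustration only.
defaults = {"client": "default-s3-client", "buffer_size": 1024}
custom = {"client": "custom-s3-client"}
merged = merge_transport_params(defaults, custom)
```

Note that this is a shallow merge: a nested value such as a `headers` dict would be replaced as a whole rather than combined key by key.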

Usage Examples

GitLab Integration (resolves #46833)

import ray

ray.init(
    runtime_env={
        "working_dir": "https://gitlab.example.com/group/project/-/archive/main.zip",
        "config": {
            "transport_params": {
                "headers": {
                    "PRIVATE-TOKEN": "TOKEN",
                },
            },
        },
    }
)

@dkhachyan dkhachyan requested a review from a team as a code owner December 19, 2025 12:23
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds valuable support for custom transport_params in smart_open, allowing for more flexible remote package downloads. The changes are well-structured across the affected files. My review includes a few suggestions for improvement: a critical fix for a potential UnboundLocalError, a suggestion to make parameter merging more robust, a refactoring to reduce code duplication, and a minor simplification. Overall, this is a great addition with the recommended changes.

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Dec 19, 2025
@dkhachyan dkhachyan force-pushed the issue-46833 branch 3 times, most recently from e731f6d to 5e6b63c Compare December 19, 2025 14:40
@dkhachyan dkhachyan force-pushed the issue-46833 branch 2 times, most recently from 60cfad3 to 38afcfc Compare December 22, 2025 08:14

 @classmethod
-def download_remote_uri(cls, protocol: str, source_uri: str, dest_file: str):
+def download_remote_uri(cls, protocol: str, source_uri: str, dest_file: str, transport_params=None):

ABFSS protocol ignores user-provided transport parameters

The open_file function in _handle_abfss_protocol accepts transport_params as a keyword argument but completely ignores it. The function always creates an AzureBlobFileSystem with hardcoded DefaultAzureCredential(), so any custom credentials, account keys, or SAS tokens that users provide via transport_params are silently discarded. This contradicts the PR's stated behavior that custom parameters would be merged and used for ABFSS protocol.
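
One possible way to honor user-provided credentials while keeping the default as a fallback is sketched below. Everything here is an assumption for illustration: `build_abfss_kwargs`, `CREDENTIAL_KEYS`, and the factory argument are hypothetical names, and the credential key names are only loosely modeled on keyword arguments that `AzureBlobFileSystem` is commonly given. The sketch builds the keyword arguments and does not import or call any Azure library.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical list of credential-style keys a user might supply.
CREDENTIAL_KEYS = ("credential", "account_key", "sas_token", "connection_string")


def build_abfss_kwargs(
    account_name: str,
    transport_params: Optional[Dict[str, Any]],
    default_credential_factory: Callable[[], Any],
) -> Dict[str, Any]:
    """Build filesystem kwargs, preferring user-supplied credentials.

    The default credential is created lazily, and only when the user
    did not provide any credential-style key.
    """
    kwargs: Dict[str, Any] = {"account_name": account_name}
    kwargs.update(transport_params or {})
    if not any(kwargs.get(key) for key in CREDENTIAL_KEYS):
        # Fall back to the default (e.g. DefaultAzureCredential in the handler).
        kwargs["credential"] = default_credential_factory()
    return kwargs
```

In the handler, the factory would presumably be `DefaultAzureCredential` and the resulting kwargs would be passed to the filesystem constructor; that wiring is left out because it depends on the actual code in `protocol.py`.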

@dkhachyan dkhachyan force-pushed the issue-46833 branch 3 times, most recently from 6447a3e to e3a62f7 Compare December 22, 2025 14:17

github-actions bot commented Jan 6, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 6, 2026
Collaborator

edoakes commented Jan 6, 2026

@dkhachyan we have an existing "config" field that contains some configuration options related to runtime_env setup: https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.html#ray.runtime_env.RuntimeEnvConfig. It's passed as a separate field in the runtime_env. I would suggest that we nest "transport_options" as a subkey under "config" to avoid adding more fields to the global namespace.
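
For illustration, reading the nested option on the plugin side might look something like the sketch below. It assumes the runtime_env behaves like a plain dict and that the subkey is named `transport_params`, as in the PR description's usage example; `extract_transport_params` is a hypothetical helper name, not a function from the PR.

```python
from typing import Any, Dict, Optional


def extract_transport_params(runtime_env: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return runtime_env["config"]["transport_params"] if present, else None.

    Hypothetical helper; assumes a dict-like runtime_env.
    """
    config = runtime_env.get("config") or {}
    transport_params = config.get("transport_params")
    if transport_params is not None and not isinstance(transport_params, dict):
        raise TypeError(
            "transport_params must be a dict, got "
            f"{type(transport_params).__name__}"
        )
    return transport_params


runtime_env = {
    "working_dir": "https://gitlab.example.com/group/project/-/archive/main.zip",
    "config": {"transport_params": {"headers": {"PRIVATE-TOKEN": "TOKEN"}}},
}
params = extract_transport_params(runtime_env)
```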

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Jan 6, 2026
@dkhachyan
Author

> we have an existing "config" field that contains some configuration options related to runtime_env setup: https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.html#ray.runtime_env.RuntimeEnvConfig. It's passed as a separate field in the runtime_env. I would suggest that we nest "transport_options" as a subkey under "config" to avoid adding more fields to the global namespace.

Thanks for the feedback! I’ll update the code accordingly.

Collaborator

@edoakes edoakes left a comment

LGTM. Thanks for the contribution

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jan 7, 2026
Collaborator

edoakes commented Jan 7, 2026

@dkhachyan there is a merge conflict and I've kicked off the premerge CI: https://buildkite.com/ray-project/premerge/builds/57069

Ping me once CI is passing to merge

 elif protocol == "https":
-    open_file, tp = cls._handle_https_protocol()
+    open_file, default_tp = cls._handle_https_protocol()
+    tp = cls._merge_transport_params(default_tp, transport_params)

HTTPS headers lost when custom headers provided

Medium Severity

When custom headers are passed via transport_params for HTTPS, the default User-Agent and Accept headers are completely replaced rather than merged. The _handle_https_protocol() returns None for default transport params, so _merge_transport_params(None, custom_params) just returns the custom params unchanged. Then inside open_file, params.update(transport_params) does a shallow update that replaces the entire headers dict. This breaks the documented GitLab use case where users add authentication headers but lose the default headers that some servers may require.
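
The difference between the shallow update described here and a header-aware merge can be sketched as follows. The default header values are placeholders rather than the ones Ray actually sends, and `shallow_params` and `merged_params` are hypothetical names used only for this comparison.

```python
from typing import Any, Dict, Optional

# Placeholder defaults; the real values in Ray's HTTPS handler may differ.
DEFAULT_HEADERS = {"User-Agent": "example-agent/1.0", "Accept": "*/*"}


def shallow_params(transport_params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Reproduce the reported behavior: custom "headers" replaces the defaults."""
    params: Dict[str, Any] = {"headers": dict(DEFAULT_HEADERS)}
    params.update(transport_params or {})
    return params


def merged_params(transport_params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the "headers" dict key by key; other keys are overridden as before."""
    custom = dict(transport_params or {})  # shallow copy; caller's dict is untouched
    custom_headers = custom.pop("headers", None) or {}
    params: Dict[str, Any] = {"headers": {**DEFAULT_HEADERS, **custom_headers}}
    params.update(custom)
    return params


user = {"headers": {"PRIVATE-TOKEN": "TOKEN"}}
lost = shallow_params(user)["headers"]   # only PRIVATE-TOKEN survives
kept = merged_params(user)["headers"]    # defaults plus PRIVATE-TOKEN
```

A plain dict merge compares header names case-sensitively, so a user-supplied `user-agent` would sit alongside the default `User-Agent` instead of overriding it; a real fix may want to normalize names first.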

Signed-off-by: Denis Khachyan <[email protected]>
@dkhachyan dkhachyan marked this pull request as draft January 9, 2026 14:47

Development

Successfully merging this pull request may close these issues.

[Ray CORE] Gitlab integration with authentication
