Description
Hello,
I ran into a problem that prevents me from running ML Jobs.
When submitting an ML Job, the V2 submission path (_do_submit_job_v2, which calls SYSTEM$EXECUTE_ML_JOB) incorrectly constructs the command-line arguments for the job container. It prepends the job's stage path to what should be absolute paths within the container, causing the job to fail at startup.
Observed Behavior
The arguments passed to the container entrypoint script are malformed. For example, a path that should be an absolute path inside the container, such as: /mnt/job_stage/system/mljob_launcher.py
is incorrectly transformed into a stage path like: @payload_stage/MLJOB_.../\\/mnt/job_stage/system/mljob_launcher.py
This invalid path causes the container to fail to find and execute the launcher script.
Root Cause
The issue is located in the list comprehension that builds the args list within the snowflake.ml.jobs.manager._do_submit_job_v2 function:
    # manager.py
    def _do_submit_job_v2(...):
        # ...
        args = [
            (payload.stage_path.joinpath(v).as_posix() if isinstance(v, PurePath) else v) for v in payload.entrypoint
        ] + (args or [])
        # ...

The payload.entrypoint list can contain pathlib.PurePath objects that represent absolute paths within the container's filesystem (e.g., PurePath('/mnt/job_stage/system/mljob_launcher.py')).
The current logic incorrectly assumes that any PurePath object is a relative path that needs to be joined with payload.stage_path. This results in the erroneous concatenation of the stage path and the container-local absolute path.
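For context, here is a small standard-library sketch of how pathlib's two pure path flavors treat a container path like the one above. Whether this flavor difference contributes to the behavior reported here is an assumption on my side; I have not confirmed it against the library source, and payload.stage_path may be a custom type with its own joinpath semantics.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

container_path = "/mnt/job_stage/system/mljob_launcher.py"

# PurePath() picks the flavor of the host OS; both flavors are built
# explicitly here so the sketch behaves the same on any platform.
posix = PurePosixPath(container_path)
windows = PureWindowsPath(container_path)

# Either flavor satisfies an isinstance(v, PurePath) check.
both_match = isinstance(posix, PurePath) and isinstance(windows, PurePath)

# A rooted path without a drive is absolute for the POSIX flavor only.
posix_is_abs = posix.is_absolute()
windows_is_abs = windows.is_absolute()

# str() on the Windows flavor uses backslashes; as_posix() restores
# the forward-slash form that the container expects.
windows_str = str(windows)
windows_posix = windows.as_posix()
```

On a Windows client, PurePath(...) produces the Windows flavor, so the container path is not treated as absolute by pathlib even though it is absolute inside the container.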
Proposed Solution
The fix is to modify the list comprehension to simply convert PurePath objects to their string representation without prepending the stage path. The path in payload.entrypoint is already the correct path to be used inside the container.
The line should be changed from:
    (payload.stage_path.joinpath(v).as_posix() if isinstance(v, PurePath) else v) for v in payload.entrypoint
to:
    (v.as_posix() if isinstance(v, PurePath) else v) for v in payload.entrypoint
This ensures that absolute paths within the container are preserved correctly in the final command arguments.
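A minimal standalone sketch of the proposed behavior, for illustration only. The helper name build_container_args and the example arguments are hypothetical and not part of snowflake-ml-python; only the comprehension mirrors the suggested change.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def build_container_args(entrypoint, extra_args=None):
    """Hypothetical helper: stringify PurePath entries without prepending a stage path."""
    return [
        v.as_posix() if isinstance(v, PurePath) else v for v in entrypoint
    ] + list(extra_args or [])


launcher = "/mnt/job_stage/system/mljob_launcher.py"

# Same result whether the path object was created on a POSIX or a Windows client.
posix_args = build_container_args([PurePosixPath(launcher), "main.py"], ["--epochs", "3"])
windows_args = build_container_args([PureWindowsPath(launcher), "main.py"], ["--epochs", "3"])
```

With this approach the container-local path reaches the entrypoint unchanged, and plain string entries pass through untouched.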
Additional Information: My Environment
- Windows 11
- Python 3.10.19
- snowflake-ml-python 1.19.0