@dlwh dlwh commented Jan 7, 2026

  • Run gcsfuse once on the host at /tmp/gcsfuse_mount
  • Expose mount inside container at /opt/gcsfuse_mount via symlink
  • Avoid Docker bind-mount source-path creation failures
  • Enable allow_other + permissive modes so container can read FUSE mount
  • Thread --verbose through to Ray CLI (ray up/down/attach/exec -v)
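
As a rough illustration of the mount-related bullets above, the host-side gcsfuse invocation might be assembled as follows. This is a sketch, not the command from this PR: the helper `build_gcsfuse_cmd` and the bucket name `example-bucket` are hypothetical, and the `777` modes are only one reading of "permissive modes".

```python
from typing import List

HOST_MOUNT = "/tmp/gcsfuse_mount"  # host path named in the PR description


def build_gcsfuse_cmd(bucket: str, mount_point: str = HOST_MOUNT) -> List[str]:
    """Build (but do not run) a gcsfuse command whose mount other users can read."""
    return [
        "gcsfuse",
        "-o", "allow_other",    # let users other than the mounting user access the mount
        "--file-mode", "777",   # assumed value for "permissive modes"
        "--dir-mode", "777",    # assumed value for "permissive modes"
        bucket,
        mount_point,
    ]


cmd = build_gcsfuse_cmd("example-bucket")  # hypothetical bucket name
print(" ".join(cmd))
```

Building the argument list without executing it keeps the sketch safe to run on a machine that has no gcsfuse installed.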

Part of the work to run vLLM on TPU inside Docker.

Copilot AI review requested due to automatic review settings January 7, 2026 06:47
Copilot AI left a comment

Pull request overview

This PR refactors the GCS mount strategy to use a single host-level gcsfuse mount and propagates verbose flags through Ray CLI commands.

Key changes:

  • Moves gcsfuse mounting from per-container setup to host-level initialization at /tmp/gcsfuse_mount
  • Adds FUSE configuration with allow_other and permissive modes to enable container access
  • Implements verbose flag propagation from cluster.py CLI to Ray commands (up/down/attach/exec)
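
The verbose passthrough in the last bullet could look roughly like the sketch below. The real `_maybe_add_ray_verbose` in scripts/ray/cluster.py is not shown in this thread, so the signature, the list-based command representation, and the `cluster.yaml` path here are assumptions made for illustration.

```python
from typing import List


def maybe_add_ray_verbose(cmd: List[str], verbose: bool) -> List[str]:
    """Return a copy of a `ray <subcommand> ...` command with -v added when verbose.

    Assumes cmd starts with ["ray", "<subcommand>"]. The flag goes right after
    the subcommand so it cannot be mistaken for part of a trailing argument.
    """
    if not verbose or "-v" in cmd:
        return list(cmd)
    return [*cmd[:2], "-v", *cmd[2:]]


# Hypothetical config path, used only for the example.
up_cmd = maybe_add_ray_verbose(["ray", "up", "cluster.yaml", "--yes"], verbose=True)
down_cmd = maybe_add_ray_verbose(["ray", "down", "cluster.yaml"], verbose=False)
print(up_cmd)
print(down_cmd)
```

Returning a new list rather than mutating the input keeps each call site free of side effects when the same base command is reused.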

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| scripts/ray/cluster.py | Added _maybe_add_ray_verbose helper function and updated all Ray CLI invocations to support verbose flag passthrough; updated _stop_cluster_internal signature to accept Context |
| infra/marin-us-west4.yaml | Added gcsfuse installation and host-level mount in initialization_commands; replaced direct mount with symlink approach in setup_commands |
| infra/marin-us-east5.yaml | Same gcsfuse changes as us-west4 for consistency across regions |
| infra/marin-us-east5-a.yaml | Same gcsfuse changes as us-west4 for consistency across regions |
| infra/marin-us-east1.yaml | Same gcsfuse changes as us-west4 for consistency across regions |
| infra/marin-us-central2.yaml | Same gcsfuse changes as us-west4 for consistency across regions |
| infra/marin-us-central2-staging.yaml | Same gcsfuse changes as us-west4 for staging environment |
| infra/marin-us-central1.yaml | Same gcsfuse changes as us-west4 for consistency across regions |
| infra/marin-eu-west4.yaml | Same gcsfuse changes as us-west4 for EU region |
| infra/marin-eu-west4-a.yaml | Same gcsfuse changes as us-west4 for EU region |
| infra/marin-cluster-template.yaml | Template file updated with gcsfuse changes to propagate to future cluster configurations |
| infra/marin-big-run.yaml | Same gcsfuse changes as us-west4 for big-run cluster |

Comment on lines +96 to +97
- if [ -e /opt/gcsfuse_mount ] && [ ! -L /opt/gcsfuse_mount ]; then sudo rm -rf /opt/gcsfuse_mount; fi
- sudo ln -sfn /tmp/gcsfuse_mount /opt/gcsfuse_mount
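
For readers less familiar with the shell idiom, the two commands above can be mirrored in Python as a minimal sketch. The helper name `ensure_symlink` is hypothetical, and the demo runs against a temporary directory rather than the real /opt and /tmp paths.

```python
import os
import shutil
import tempfile


def ensure_symlink(target: str, link: str) -> None:
    """Make `link` a symlink to `target`, replacing whatever is there."""
    # Equivalent of: [ -e link ] && [ ! -L link ] && rm -rf link
    if os.path.lexists(link) and not os.path.islink(link):
        if os.path.isdir(link):
            shutil.rmtree(link)
        else:
            os.remove(link)
    # Equivalent of: ln -sfn target link (drop an existing symlink, then recreate it)
    if os.path.islink(link):
        os.unlink(link)
    os.symlink(target, link)


with tempfile.TemporaryDirectory() as root:
    host_mount = os.path.join(root, "tmp", "gcsfuse_mount")
    link = os.path.join(root, "opt", "gcsfuse_mount")
    os.makedirs(host_mount)
    os.makedirs(link)                  # simulate a stale real directory at the link path
    ensure_symlink(host_mount, link)
    ensure_symlink(host_mount, link)   # a second call is a no-op in effect
    link_is_symlink = os.path.islink(link)
    link_points_at_mount = os.readlink(link) == host_mount

print(link_is_symlink, link_points_at_mount)
```

The guard before the removal matters: without the "not a symlink" check, a rerun would delete through nothing useful, and with a real directory in place `ln -sfn` would create the link inside that directory instead of replacing it.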
Copilot AI Jan 7, 2026

The symlink created at /opt/gcsfuse_mount will not be accessible inside the Docker container because /opt is not mounted into the container. Only /tmp is mounted (line 48). For the mount to be accessible at /opt/gcsfuse_mount inside the container, you need to add a volume mount like "-v /opt:/opt" to both head_run_options and worker_run_options.

Member Author

lies

lrwxrwxrwx 1 root root 18 Jan  6 22:37 /opt/gcsfuse_mount -> /tmp/gcsfuse_mount
bash: gcsfuse_mount: command not found
a
dedupe
gcsfuse_mount
helmet-data
huggingface-cache
marin-us-central2
medu-models
models
nfliu
nvidia--Llama-Nemotron-Post-Training-Dataset-v1-ed905e6

@rjpower rjpower self-requested a review January 7, 2026 18:04
@rjpower rjpower left a comment

seems fine, not sure why we use /tmp/ outside but /opt inside but meh
