Open
Description
Trying to use managed jobs with consolidation mode, but the launch gets stuck at "Waiting for task to start":
$ sky jobs launch -c nemo nemorl.sky.yaml --secret HF_TOKEN
YAML to run: nemorl.sky.yaml
Managed job 'nemo' will be launched on (estimated):
Considered resources (2 nodes):
----------------------------------------------------------------------------------------
INFRA INSTANCE vCPUs Mem(GB) GPUS COST ($) CHOSEN
----------------------------------------------------------------------------------------
Kubernetes (xx) - 32 64 H200:1 0.00 ✔
----------------------------------------------------------------------------------------
Launching a managed job 'nemo'. Proceed? [Y/n]:
Launching managed job 'nemo' from jobs controller...
⠧ Waiting for task to start (status: PENDING). It may take a few minutes.
(py310) ➜ ~ sky jobs logs --controller 14
<stays stuck here>
Things I tried that did not work:
- Restarting the API server with sky api stop; sky api start.
- Setting export SKYPILOT_ENABLE_GRPC=0 and export SKYPILOT_ENABLE_GRPC=1.
- Running sky jobs cancel -ay and trying again.
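
For anyone trying to reproduce this, the attempts above can be retraced in one pass with a small script. This is only a sketch: the `run` and `retry_with_grpc` helpers and the `DRY_RUN` switch are hypothetical names added for illustration, and the ordering of the steps is an assumption rather than the exact sequence I followed. It uses only the commands already shown in this report.

```shell
#!/usr/bin/env bash
# Sketch: retrace the attempted workarounds for both gRPC settings.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
set -u
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

retry_with_grpc() {
  # $1 is the value for SKYPILOT_ENABLE_GRPC (0 or 1).
  export SKYPILOT_ENABLE_GRPC="$1"
  echo "# SKYPILOT_ENABLE_GRPC=$SKYPILOT_ENABLE_GRPC"
  run sky jobs cancel -ay   # cancel all managed jobs without prompting
  run sky api stop          # restart the local API server
  run sky api start
  run sky jobs launch -c nemo nemorl.sky.yaml --secret HF_TOKEN
}

for grpc in 0 1; do
  retry_with_grpc "$grpc"
done
```

With the default `DRY_RUN=1` the script only prints the commands it would run. With `DRY_RUN=0` the launch step still asks for confirmation, as in the output above.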
Commit: a201b22dc2361fe9be379ba9e5d9aef132272e44
Running on local API server.