
Small bug with the command sky show-gpus #4556

Closed
biarne-a opened this issue Jan 14, 2025 · 2 comments · Fixed by #4558

Comments

@biarne-a

Up to now, the command sky show-gpus was working fine, but this morning I got the following error:
[Screenshot, 2025-01-14 at 6:30:22 AM: error output from sky show-gpus]

Version & Commit info:

  • sky -v: 1.0.0.dev20250110
  • sky -c: fd1ac0e
@romilbhardwaj
Collaborator

Thanks for the report, @biarne-a. This can happen if some GPUs in your Kubernetes cluster are in a bad state, i.e., they show up under Capacity but not under Allocatable (see NVIDIA/k8s-device-plugin#75). Restarting the NVIDIA device plugin on the affected node may fix it.
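To find which node is affected, here is a minimal sketch, assuming the node list comes from `kubectl get nodes -o json` and that GPUs are advertised under the `nvidia.com/gpu` resource name used by the NVIDIA device plugin. The function name `find_degraded_gpu_nodes` and the sample node names are hypothetical; this is not part of SkyPilot.

```python
import json
from typing import Any


def find_degraded_gpu_nodes(nodes_json: dict[str, Any],
                            resource: str = "nvidia.com/gpu") -> list[tuple[str, int, int]]:
    """Hypothetical helper: return (node_name, capacity, allocatable) for
    every node whose GPU capacity exceeds its allocatable GPU count."""
    degraded = []
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        status = node.get("status", {})
        # Extended resources such as nvidia.com/gpu are reported as integer
        # strings; nodes without GPUs simply lack the key, so default to 0.
        capacity = int(status.get("capacity", {}).get(resource, "0"))
        allocatable = int(status.get("allocatable", {}).get(resource, "0"))
        if capacity > allocatable:
            degraded.append((name, capacity, allocatable))
    return degraded


# Sample input shaped like (a trimmed) `kubectl get nodes -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "gpu-node-1"},
   "status": {"capacity": {"nvidia.com/gpu": "8"},
              "allocatable": {"nvidia.com/gpu": "6"}}},
  {"metadata": {"name": "gpu-node-2"},
   "status": {"capacity": {"nvidia.com/gpu": "8"},
              "allocatable": {"nvidia.com/gpu": "8"}}}
]}
""")
print(find_degraded_gpu_nodes(sample))  # → [('gpu-node-1', 8, 6)]
```

On a live cluster, the same function can be fed the parsed output of `kubectl get nodes -o json`; any node it returns is a candidate for restarting the device plugin.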

That said, I agree we should log a clearer message here instead of failing with this assertion error.
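As an illustration of that kind of handling, here is a hypothetical sketch (not the code from #4558), assuming the failing check compared a node's GPU capacity against its allocatable count. Instead of asserting, it logs a warning that names the node and falls back to the allocatable count. The helper name `gpus_available_on_node` and its parameters are assumptions made for this example.

```python
import logging

logger = logging.getLogger(__name__)


def gpus_available_on_node(node_name: str, capacity: int, allocatable: int,
                           requested: int) -> int:
    """Hypothetical helper: free GPUs on a node, tolerating unhealthy GPUs.

    Rather than `assert capacity == allocatable`, warn with a hint about the
    likely cause and compute availability from the allocatable count.
    """
    if capacity != allocatable:
        logger.warning(
            "Node %s reports %d GPUs in capacity but only %d allocatable; "
            "some GPUs may be unhealthy. Restarting the NVIDIA device plugin "
            "on this node may help.", node_name, capacity, allocatable)
    # Clamp at zero so the count is never negative, even if pod requests
    # exceed the allocatable GPUs.
    return max(allocatable - requested, 0)
```

The design choice is to degrade gracefully: a mismatch still surfaces in the logs, but the command keeps working for the healthy GPUs.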

@romilbhardwaj
Collaborator

@biarne-a this should be fixed in #4558. Can you give it a try?
