Commit

[k8s] Fix AssertionError in show-gpus when all GPUs were allocated (#4558)

* Fix total_accelerators_available

* lint
romilbhardwaj authored Jan 14, 2025
1 parent 5a52e85 commit 837d51b
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions sky/clouds/service_catalog/kubernetes_catalog.py
@@ -115,6 +115,16 @@ def _list_accelerators(
    If the user does not have sufficient permissions to list pods in all
    namespaces, the function will return free GPUs as -1.

    Returns:
        A tuple of three dictionaries:
        - qtys_map: Dict mapping accelerator names to lists of InstanceTypeInfo
            objects with quantity information.
        - total_accelerators_capacity: Dict mapping accelerator names to their
            total capacity in the cluster.
        - total_accelerators_available: Dict mapping accelerator names to their
            current availability. Returns -1 for each accelerator if
            realtime=False or if insufficient permissions.
    """
# TODO(romilb): This should be refactored to use get_kubernetes_node_info()
# function from kubernetes_utils.
@@ -243,6 +253,10 @@ def _list_accelerators(

accelerators_available = accelerator_count - allocated_qty

# Initialize the entry if it doesn't exist yet
if accelerator_name not in total_accelerators_available:
    total_accelerators_available[accelerator_name] = 0

if accelerators_available >= min_quantity_filter:
    quantized_availability = min_quantity_filter * (
        accelerators_available // min_quantity_filter)
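
To illustrate why the initialization matters, here is a minimal, self-contained sketch of the accounting in the second hunk. It is not the actual body of _list_accelerators: the helper name tally_available, the tuple-based node input, and the initialize_missing switch are hypothetical. The step that adds quantized_availability to the running total is an assumption, since the diff elides the lines after the quantization. Under that assumption, a fully allocated accelerator never passes the min_quantity_filter check, so without the new lines its key would be absent from total_accelerators_available while still present in total_accelerators_capacity.

```python
def tally_available(nodes, min_quantity_filter=1, initialize_missing=True):
    """Hypothetical stand-in for the per-node accounting shown in the diff.

    nodes: iterable of (accelerator_name, accelerator_count, allocated_qty).
    initialize_missing: True mimics the behavior after this commit.
    """
    total_accelerators_capacity = {}
    total_accelerators_available = {}

    for accelerator_name, accelerator_count, allocated_qty in nodes:
        total_accelerators_capacity[accelerator_name] = (
            total_accelerators_capacity.get(accelerator_name, 0) +
            accelerator_count)

        accelerators_available = accelerator_count - allocated_qty

        # The fix: make sure the key exists even when nothing is free.
        if (initialize_missing and
                accelerator_name not in total_accelerators_available):
            total_accelerators_available[accelerator_name] = 0

        if accelerators_available >= min_quantity_filter:
            quantized_availability = min_quantity_filter * (
                accelerators_available // min_quantity_filter)
            # Assumed accumulation step (elided in the diff above).
            total_accelerators_available[accelerator_name] = (
                total_accelerators_available.get(accelerator_name, 0) +
                quantized_availability)

    return total_accelerators_capacity, total_accelerators_available


# One fully allocated accelerator type and one partially free type.
nodes = [("A100", 8, 8), ("T4", 4, 1)]

capacity, before_fix = tally_available(nodes, initialize_missing=False)
_, after_fix = tally_available(nodes, initialize_missing=True)

print(sorted(capacity))    # keys in the capacity map
print(sorted(before_fix))  # "A100" is missing without the initialization
print(sorted(after_fix))   # both keys present; "A100" maps to 0
```

With the initialization in place, the capacity and availability maps share the same keys, which is presumably what the caller behind show-gpus expects when it reports 0 free GPUs rather than failing.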
