EKS Support #40

aldmbmtl · 2023-07-24T03:26:10Z

Hello!

I am trying to get this to work on EKS. Sadly, the device plugin doesn't seem to see the GPU. I am using g4ad instances with a custom AMI running the latest version of the AMDGPU-PRO driver I could get from Amazon (20.20). Separately, when scaling from zero, the cluster autoscaler isn't detecting the `amd.com/gpu: 1` resource, but I don't think fixing that would solve the scheduling issue below.

When I launch a node and then deploy the device plugin, the pod still won't be scheduled to the node. Any idea as to why?

```
I0724 03:19:09.793086       1 main.go:305] ./k8s-device-plugin version v1.18.1-21-g2e5bbc7
I0724 03:19:09.793089       1 main.go:305] hwloc: _VERSION: 2.9.1, _API_VERSION: 0x00020800, _COMPONENT_ABI: 7, Runtime: 0x00020800
I0724 03:19:09.793105       1 manager.go:42] Starting device plugin manager
I0724 03:19:09.793108       1 manager.go:46] Registering for system signal notifications
I0724 03:19:09.793346       1 manager.go:52] Registering for notifications of filesystem changes in device plugin directory
I0724 03:19:09.793400       1 manager.go:60] Starting Discovery on new plugins
I0724 03:19:09.793416       1 manager.go:66] Handling incoming signals
```

This is the log from the device plugin manager. Shouldn't I be seeing something more here? We would love to move off Nvidia for our containerized workstations, but this has been blocking us. I suspect it is because AWS doesn't seem to want to support Radeon :disappointed:
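A pod that requests `amd.com/gpu` can only be scheduled onto a node whose `status.allocatable` lists that extended resource; resources registered with the kubelet through the device plugin API show up there. One way to check whether the plugin has advertised anything is to inspect the Node object. The Python below is a minimal sketch assuming the input is a Node object as printed by `kubectl get node <name> -o json`; the function name `advertised_gpus` and the example nodes are hypothetical.

```python
# Extended resource name the AMD device plugin is expected to advertise (from this issue).
AMD_GPU_RESOURCE = "amd.com/gpu"


def advertised_gpus(node: dict, resource: str = AMD_GPU_RESOURCE) -> int:
    """Return the allocatable count of `resource` on a Node object (hypothetical helper).

    A missing key means the kubelet is not advertising the resource, so the
    scheduler will not place pods that request it on this node.
    """
    allocatable = node.get("status", {}).get("allocatable", {})
    # Extended resources are whole integers, serialized as strings such as "1".
    return int(allocatable.get(resource, "0"))


if __name__ == "__main__":
    # Hypothetical Node objects, trimmed to the fields this check reads.
    with_gpu = {"metadata": {"name": "node-a"},
                "status": {"allocatable": {"cpu": "3920m", "amd.com/gpu": "1"}}}
    without_gpu = {"metadata": {"name": "node-b"},
                   "status": {"allocatable": {"cpu": "3920m"}}}
    for node in (with_gpu, without_gpu):
        print(node["metadata"]["name"], advertised_gpus(node))
```

To check a real node, load the `kubectl` JSON output with `json.load` and pass the resulting dict to `advertised_gpus`; a result of `0` means the resource never reached the node's allocatable list.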

 Thanks!


PierreJiji · 2024-10-11T19:03:58Z

Is there any update on this?
