I am trying to get this to work on EKS. Sadly, the device plugin doesn't seem to see the GPU. I am using g4ad instances with a custom AMI running the latest version of the AMD GPU Pro drivers that I could get from Amazon (20.20). When scaling from zero, the cluster autoscaler also isn't detecting the resource "amd.com/gpu: 1", but I don't think fixing that would solve this other issue.
When I launch a node and then deploy the device plugin, the pod still won't be scheduled to the node. Any idea as to why?
```
I0724 03:19:09.793086 1 main.go:305] ./k8s-device-plugin version v1.18.1-21-g2e5bbc7
I0724 03:19:09.793089 1 main.go:305] hwloc: _VERSION: 2.9.1, _API_VERSION: 0x00020800, _COMPONENT_ABI: 7, Runtime: 0x00020800
I0724 03:19:09.793105 1 manager.go:42] Starting device plugin manager
I0724 03:19:09.793108 1 manager.go:46] Registering for system signal notifications
I0724 03:19:09.793346 1 manager.go:52] Registering for notifications of filesystem changes in device plugin directory
I0724 03:19:09.793400 1 manager.go:60] Starting Discovery on new plugins
I0724 03:19:09.793416 1 manager.go:66] Handling incoming signals
```
This is the log from the device plugin manager. I assume I should be seeing something else? We would love to get off of Nvidia for our containerized workstations, but this has been blocking us. I assume it is because AWS doesn't seem to want to support Radeon :disappointed:
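In case it helps with triage, here is a minimal sketch of a check that could be run on the node (or in a pod with the host's `/dev` and `/sys` mounted) to see which GPU-related paths exist. The paths are my assumption about what a working amdgpu driver stack typically exposes, not something taken from the plugin's source, and `find_amd_gpu_nodes` is a hypothetical helper name.

```python
import glob
import os


def find_amd_gpu_nodes(root="/"):
    """Report which AMD GPU-related paths exist under `root`.

    Assumption: a working amdgpu stack exposes the amdgpu kernel module
    under /sys/module, a /dev/kfd compute device, and DRM render nodes
    under /dev/dri. These paths are not taken from the plugin's source.
    """
    def path(rel):
        return os.path.join(root, rel)

    return {
        # Is the amdgpu kernel module loaded?
        "amdgpu_module": os.path.isdir(path("sys/module/amdgpu")),
        # Is the compute (KFD) device node present?
        "kfd_device": os.path.exists(path("dev/kfd")),
        # Which DRM render nodes are present, if any?
        "render_nodes": sorted(
            os.path.basename(node)
            for node in glob.glob(path("dev/dri/renderD*"))
        ),
    }


if __name__ == "__main__":
    for name, value in find_amd_gpu_nodes().items():
        print(f"{name}: {value}")
```

If any of these come back missing on the g4ad node, that might point at the driver install in the AMI rather than at the plugin itself, though I have not confirmed which of these paths the plugin actually relies on.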
Thanks!