EKS Support #40

Open
aldmbmtl opened this issue Jul 24, 2023 · 1 comment

aldmbmtl commented Jul 24, 2023

Hello!

I am trying to get this to work on EKS, but the device plugin doesn't seem to see the GPU. I am using g4ad instances with a custom AMI running the latest version of the AMD GPU Pro drivers that I could get from Amazon (20.20). When scaling from zero, the cluster autoscaler doesn't detect the resource "amd.com/gpu: 1", but I don't think fixing that would solve this other issue.

When I launch a node and then deploy the device plugin, the pod still won't be scheduled to the node. Any idea as to why?

```
I0724 03:19:09.793086       1 main.go:305] ./k8s-device-plugin version v1.18.1-21-g2e5bbc7
I0724 03:19:09.793089       1 main.go:305] hwloc: _VERSION: 2.9.1, _API_VERSION: 0x00020800, _COMPONENT_ABI: 7, Runtime: 0x00020800
I0724 03:19:09.793105       1 manager.go:42] Starting device plugin manager
I0724 03:19:09.793108       1 manager.go:46] Registering for system signal notifications
I0724 03:19:09.793346       1 manager.go:52] Registering for notifications of filesystem changes in device plugin directory
I0724 03:19:09.793400       1 manager.go:60] Starting Discovery on new plugins
I0724 03:19:09.793416       1 manager.go:66] Handling incoming signals
```

This is the log from the device plugin manager. I assume I should be seeing something else? We would love to move off Nvidia for our containerized workstations, but this has been blocking us. I assume it is because AWS doesn't seem to want to support Radeon :disappointed:

Thanks!

PierreJiji commented Oct 11, 2024

Is there any update on this?
