[Issue]: Unable to Update ( 1.25.2.7 → 1.25.2.8 ) #76
Comments
Can you try to get the crash log with
Ah OK... sorry, I misunderstood what you had before. This is weird... let me look into it.
Can you help narrow down the start of the issue a bit? I.e., do you see the same issue with 1.25.2.4 and .5? (I don't have a Talos setup to reproduce this, and I am able to use the plugin at tip of tree.)
Sorry, I thought I had tested v1.25.2.4, but I suspect I didn't allow enough time for Flux to pick up the commit. I have now tried the following tags. Thanks for looking into this. Hopefully this helps narrow the issue down to the latest changes!
Problem Description
I have 3 nodes, all with the same hardware spec, running Kubernetes on Talos. I deployed amd-device-plugin as a DaemonSet using the Helm chart. On tag v1.25.2.3 everything works: each node has access to its iGPU, and the iGPU can be assigned to a pod.
kubectl -n kube-system get pods -o wide
When I attempt to upgrade to any tag greater than 1.25.2.3, amd-device-plugin fails to deploy on node 3. From what I can tell, the image is detecting the wrong system architecture?
kubectl -n kube-system get pods -o wide
kubectl describe pod amd-device-plugin-6h7tt -n kube-system
kubectl -n kube-system logs amd-device-plugin-6h7tt -f
talosctl dmesg -n black-knight-02 | grep -i amdgpu
Operating System
Talos v1.8.0
CPU
AMD 6850U CPU with Radeon Graphics
GPU
AMD Radeon VII
ROCm Version
ROCm 6.2.0
ROCm Component
No response
Steps to Reproduce
Upgrade docker.io/rocm/k8s-device-plugin (1.25.2.3 → 1.25.2.8).
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
kubectl get nodes -o wide
kubectl version
kubectl get no -o json | jq ".items[].metadata.labels"
kubectl get nodes -o=jsonpath='{.items[*].status.nodeInfo.architecture}'
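Since the suspicion is an architecture mismatch on one node, the jsonpath query above can be made easier to read per node. The following is only a minimal sketch, assuming the input is the output of `kubectl get nodes -o json`. The function names are hypothetical helpers and not part of the plugin or kubectl; the fields read (`metadata.name`, `status.nodeInfo.architecture`) are the same ones the jsonpath query uses.

```python
import json
from collections import Counter


def node_architectures(nodes_json: str) -> dict[str, str]:
    """Map each node name to the architecture it reports.

    `nodes_json` is assumed to be the text printed by
    `kubectl get nodes -o json` (a NodeList with an "items" array).
    """
    items = json.loads(nodes_json).get("items", [])
    return {
        item["metadata"]["name"]: item["status"]["nodeInfo"]["architecture"]
        for item in items
    }


def mismatched_nodes(archs: dict[str, str]) -> dict[str, str]:
    """Return the nodes whose architecture differs from the most common one."""
    if not archs:
        return {}
    most_common_arch, _ = Counter(archs.values()).most_common(1)[0]
    return {name: arch for name, arch in archs.items() if arch != most_common_arch}
```

If all three nodes report the same value, `mismatched_nodes` returns an empty dict. That would suggest the node objects themselves are fine and the problem lies in how the newer image is selected or built for that node, although this check alone cannot confirm that.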