Talos Support #379
Comments
Hey, thanks for the interest! We've been kicking this around for a bit and I filed an internal JIRA to move the identifier to the Kubernetes control-plane instead. I've had some heated conversations with Andrew from the Talos project and I'm not 100% sure moving the identifier to Kubernetes will solve all our problems. If you are an existing HPE customer or a prospect, you should work with your account team and mention this requirement. That is the fastest route. |
I don't think moving the ID to the control plane would solve all the problems, but it's a start. Maybe at least making it possible to set the /etc/hpe-storage mount path so we can specify Talos' ephemeral environment? It's possible with Kustomize but that's an extra step. I do plan on talking with our account rep but wanted to get it on the board here. |
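The mount-path change suggested above can be sketched in a few lines. This is only an illustration, not the driver's or the chart's actual mechanism: it rewrites the `hostPath` of the `/etc/hpe-storage` volume in a pod spec held as a plain dictionary. The target `/var/lib/hpe-storage` is an assumed writable location chosen for the example, and `redirect_host_path` is a hypothetical helper name.

```python
import copy

# Assumed writable location on an immutable host; not confirmed anywhere in this thread.
WRITABLE_PATH = "/var/lib/hpe-storage"


def redirect_host_path(pod_spec, old_path, new_path):
    """Return a copy of pod_spec with hostPath volumes at old_path moved to new_path."""
    patched = copy.deepcopy(pod_spec)
    for volume in patched.get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path and host_path.get("path") == old_path:
            host_path["path"] = new_path
            # Ask the kubelet to create the directory if it does not exist yet.
            host_path["type"] = "DirectoryOrCreate"
    return patched


# Two volumes shaped like the ones in the hpe-csi-node pod spec.
node_spec = {
    "volumes": [
        {"name": "etc-hpe-storage-dir", "hostPath": {"path": "/etc/hpe-storage", "type": ""}},
        {"name": "log-dir", "hostPath": {"path": "/var/log", "type": ""}},
    ]
}
patched_spec = redirect_host_path(node_spec, "/etc/hpe-storage", WRITABLE_PATH)
```

Only the host side of the mount changes here. The container would still see `/etc/hpe-storage` through its `volumeMounts`, which is roughly what a Kustomize patch on the DaemonSet would achieve.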
Internal JIRA is CON-1838. |
Hi, is there any news about support for Talos? |
It did not make it into the 2.5.0 release. I was going to do some research on it but it got delayed. |
I'm glad to hear that it is actively being pursued at least. I will likely be deploying a new cluster in the relatively near future and it would be nice to be able to start with Talos |
I try not to be the one pinging for updates all the time, but I need to start deploying a bare metal Kubernetes cluster soon and I'm in a bit of a planning pickle. I'd really like to just start with Talos but can't because of the need to use Nimble for PVs. I can start with a kubeadm cluster and later migrate to Talos, but that would mean putting a bunch of effort into setting up deployment workflows that may just be abandoned shortly after. So I'm not sure how much effort I should invest in automation vs. just rolling by hand for now, or using alternative storage. I understand 2.5 is out of the picture, since it looks like there are already betas for that. So is this planned for 2.6, which based on previous release cadence we may see before EOY, or perhaps a 2.5.x release? Or is it planned for a longer timeframe, like next year? Just trying to get an idea to help with planning. |
It's hard for me to gauge when we can get to a stage to support Talos and immutable nodes in general. It's very high on my list but I rarely get my way when large deals are on the table demanding feature X, Y and Z. Also, full disclosure, we have not even scoped the next minor or patch release as we're neck deep stabilizing 2.5.0. I'll make a note and try to get it in for consideration in the next couple of releases. If you want to email me directly at michael.mattsson at hpe.com with your company name and business relationship with HPE it will make it easier for me to talk to product management. |
I don't have a Talos environment readily available, and skimming through the docs I realize I need either firewall rules or a new deployment environment for Talos itself. As a quick hack, can you tell me how far you get with this?
|
It looks like it is still mounting
Node YAML
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2024-06-21T18:34:59Z"
generateName: hpe-csi-node-
labels:
app: hpe-csi-node
controller-revision-hash: 6cc9c89c6b
pod-template-generation: "1"
role: hpe-csi
name: hpe-csi-node-tsvkz
namespace: hpe-storage
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: hpe-csi-node
uid: 280184d8-2211-44a8-9829-4d182242cb65
resourceVersion: "7099"
uid: 29d72260-ce51-4e52-8050-f975d54eacbc
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- talos-nvj-4af
containers:
- args:
- --csi-address=$(ADDRESS)
- --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
- --v=5
env:
- name: ADDRESS
value: /csi/csi.sock
- name: DRIVER_REG_SOCK_PATH
value: /var/lib/kubelet/plugins/csi.hpe.com/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.1
imagePullPolicy: IfNotPresent
name: csi-node-driver-registrar
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /csi
name: plugin-dir
- mountPath: /registration
name: registration-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-lw749
readOnly: true
- args:
- --endpoint=$(CSI_ENDPOINT)
- --node-service
- --flavor=kubernetes
- --node-monitor
- --node-monitor-interval=30
env:
- name: CSI_ENDPOINT
value: unix:///csi/csi.sock
- name: LOG_LEVEL
value: info
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: DISABLE_NODE_CONFIGURATION
value: "true"
- name: KUBELET_ROOT_DIR
value: /var/lib/kubelet
image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imagePullPolicy: IfNotPresent
name: hpe-csi-driver
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- SYS_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /csi
name: plugin-dir
- mountPath: /var/lib/kubelet
mountPropagation: Bidirectional
name: pods-mount-dir
- mountPath: /host
mountPropagation: Bidirectional
name: root-dir
- mountPath: /dev
name: device-dir
- mountPath: /var/log
name: log-dir
- mountPath: /etc/hpe-storage
name: etc-hpe-storage-dir
- mountPath: /etc/kubernetes
name: etc-kubernetes
- mountPath: /sys
name: sys
- mountPath: /run/systemd
name: runsystemd
- mountPath: /etc/systemd/system
name: etcsystemd
- mountPath: /opt/hpe-storage/nimbletune/config.json
name: linux-config-file
subPath: config.json
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-lw749
readOnly: true
dnsConfig:
options:
- name: ndots
value: "1"
dnsPolicy: ClusterFirstWithHostNet
enableServiceLinks: true
hostNetwork: true
initContainers:
- args:
- --node-init
- --endpoint=$(CSI_ENDPOINT)
- --flavor=kubernetes
env:
- name: CSI_ENDPOINT
value: unix:///csi/csi.sock
image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imagePullPolicy: IfNotPresent
name: hpe-csi-node-init
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- SYS_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host
mountPropagation: Bidirectional
name: root-dir
- mountPath: /dev
name: device-dir
- mountPath: /sys
name: sys
- mountPath: /etc/hpe-storage
name: etc-hpe-storage-dir
- mountPath: /run/systemd
name: runsystemd
- mountPath: /etc/systemd/system
name: etcsystemd
- mountPath: /csi
name: plugin-dir
- mountPath: /var/lib/kubelet
name: pods-mount-dir
- mountPath: /var/log
name: log-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-lw749
readOnly: true
nodeName: talos-nvj-4af
preemptionPolicy: PreemptLowerPriority
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: hpe-csi-node-sa
serviceAccountName: hpe-csi-node-sa
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: csi.hpe.com/hpe-nfs
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/pid-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/network-unavailable
operator: Exists
volumes:
- hostPath:
path: /var/lib/kubelet/plugins_registry
type: Directory
name: registration-dir
- hostPath:
path: /var/lib/kubelet/plugins/csi.hpe.com
type: DirectoryOrCreate
name: plugin-dir
- hostPath:
path: /var/lib/kubelet
type: ""
name: pods-mount-dir
- hostPath:
path: /
type: ""
name: root-dir
- hostPath:
path: /dev
type: ""
name: device-dir
- hostPath:
path: /var/log
type: ""
name: log-dir
- hostPath:
path: /etc/hpe-storage
type: ""
name: etc-hpe-storage-dir
- hostPath:
path: /etc/kubernetes
type: ""
name: etc-kubernetes
- hostPath:
path: /run/systemd
type: ""
name: runsystemd
- hostPath:
path: /etc/systemd/system
type: ""
name: etcsystemd
- hostPath:
path: /sys
type: ""
name: sys
- configMap:
defaultMode: 420
name: hpe-linux-config
name: linux-config-file
- name: kube-api-access-lw749
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:35:00Z"
status: "True"
type: PodReadyToStartContainers
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:34:59Z"
message: 'containers with incomplete status: [hpe-csi-node-init]'
reason: ContainersNotInitialized
status: "False"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:34:59Z"
message: 'containers with unready status: [csi-node-driver-registrar hpe-csi-driver]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:34:59Z"
message: 'containers with unready status: [csi-node-driver-registrar hpe-csi-driver]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:34:59Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.1
imageID: ""
lastState: {}
name: csi-node-driver-registrar
ready: false
restartCount: 0
started: false
state:
waiting:
reason: PodInitializing
- image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imageID: ""
lastState: {}
name: hpe-csi-driver
ready: false
restartCount: 0
started: false
state:
waiting:
reason: PodInitializing
hostIP: 10.100.155.236
hostIPs:
- ip: 10.100.155.236
initContainerStatuses:
- image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imageID: ""
lastState: {}
name: hpe-csi-node-init
ready: false
restartCount: 0
started: false
state:
waiting:
message: 'failed to generate container "d1bfa53cdae544c0b62c5d36c001fc2f7270357ac5bcf01691257ae999dbc058"
spec: failed to generate spec: failed to mkdir "/etc/hpe-storage": mkdir
/etc/hpe-storage: read-only file system'
reason: CreateContainerError
phase: Pending
podIP: 10.100.155.236
podIPs:
- ip: 10.100.155.236
qosClass: Burstable
startTime: "2024-06-21T18:34:59Z"
Controller YAML
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2024-06-21T18:42:38Z"
generateName: hpe-csi-controller-574bc6ccf9-
labels:
app: hpe-csi-controller
pod-template-hash: 574bc6ccf9
role: hpe-csi
name: hpe-csi-controller-574bc6ccf9-bzpb5
namespace: hpe-storage
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: hpe-csi-controller-574bc6ccf9
uid: 5b8c38be-808a-4293-b80a-b7780843bc8b
resourceVersion: "7487"
uid: 727b54dc-e0b6-4281-b6a4-4cbd297a592f
spec:
containers:
- args:
- --csi-address=$(ADDRESS)
- --v=5
- --extra-create-metadata
- --timeout=30s
- --worker-threads=16
- --feature-gates=Topology=true
- --immediate-topology=false
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.1
imagePullPolicy: IfNotPresent
name: csi-provisioner
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --csi-address=$(ADDRESS)
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
image: registry.k8s.io/sig-storage/csi-attacher:v4.5.1
imagePullPolicy: IfNotPresent
name: csi-attacher
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --csi-address=$(ADDRESS)
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
image: registry.k8s.io/sig-storage/csi-snapshotter:v7.0.2
imagePullPolicy: IfNotPresent
name: csi-snapshotter
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --csi-address=$(ADDRESS)
- --v=5
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
image: registry.k8s.io/sig-storage/csi-resizer:v1.10.1
imagePullPolicy: IfNotPresent
name: csi-resizer
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --endpoint=$(CSI_ENDPOINT)
- --flavor=kubernetes
- --pod-monitor
- --pod-monitor-interval=30
env:
- name: CSI_ENDPOINT
value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
- name: LOG_LEVEL
value: info
image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imagePullPolicy: IfNotPresent
name: hpe-csi-driver
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy
name: socket-dir
- mountPath: /var/log
name: log-dir
- mountPath: /etc/kubernetes
name: k8s
- mountPath: /etc/hpe-storage
name: hpeconfig
- mountPath: /host
name: root-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --csi-address=$(ADDRESS)
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi-extensions.sock
image: quay.io/hpestorage/volume-mutator:v1.3.6-beta
imagePullPolicy: IfNotPresent
name: csi-volume-mutator
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --csi-address=$(ADDRESS)
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi-extensions.sock
image: quay.io/hpestorage/volume-group-snapshotter:v1.0.6-beta
imagePullPolicy: IfNotPresent
name: csi-volume-group-snapshotter
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --csi-address=$(ADDRESS)
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi-extensions.sock
image: quay.io/hpestorage/volume-group-provisioner:v1.0.6-beta
imagePullPolicy: IfNotPresent
name: csi-volume-group-provisioner
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
- args:
- --v=5
- --endpoint=$(CSI_ENDPOINT)
env:
- name: CSI_ENDPOINT
value: unix:///var/lib/csi/sockets/pluginproxy/csi-extensions.sock
- name: LOG_LEVEL
value: info
image: quay.io/hpestorage/csi-extensions:v1.2.7-beta
imagePullPolicy: IfNotPresent
name: csi-extensions
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-djh9g
readOnly: true
dnsConfig:
options:
- name: ndots
value: "1"
dnsPolicy: ClusterFirstWithHostNet
enableServiceLinks: true
hostNetwork: true
nodeName: talos-nvj-4af
preemptionPolicy: PreemptLowerPriority
priority: 2000000000
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: hpe-csi-controller-sa
serviceAccountName: hpe-csi-controller-sa
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- emptyDir: {}
name: socket-dir
- hostPath:
path: /var/log
type: ""
name: log-dir
- hostPath:
path: /etc/kubernetes
type: ""
name: k8s
- hostPath:
path: /etc/hpe-storage
type: ""
name: hpeconfig
- hostPath:
path: /
type: ""
name: root-dir
- name: kube-api-access-djh9g
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:42:41Z"
status: "True"
type: PodReadyToStartContainers
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:42:39Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:42:39Z"
message: 'containers with unready status: [hpe-csi-driver]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:42:39Z"
message: 'containers with unready status: [hpe-csi-driver]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2024-06-21T18:42:39Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://69d077a9414b9f622fffc550c68bf651c4ede0fc41ef85279347c363049f4f54
image: registry.k8s.io/sig-storage/csi-attacher:v4.5.1
imageID: registry.k8s.io/sig-storage/csi-attacher@sha256:9dcd469f02bbb7592ad61b0f848ec242f9ea2102187a0cd8407df33c2d633e9c
lastState:
terminated:
containerID: containerd://dd726dccda8c6a3774e1e96060d9b1529dfebbed83667ea76e6fd85c0b995b0b
exitCode: 1
finishedAt: "2024-06-21T18:43:41Z"
reason: Error
startedAt: "2024-06-21T18:43:10Z"
name: csi-attacher
ready: true
restartCount: 2
started: true
state:
running:
startedAt: "2024-06-21T18:43:56Z"
- containerID: containerd://ab469a79652fd7d894e15f93528d2d92a03fa867c80f818c2575eee3ce530652
image: quay.io/hpestorage/csi-extensions:v1.2.7-beta
imageID: quay.io/hpestorage/csi-extensions@sha256:106637da1dad32a0ffda17f3110f5d396cc6b03ed2af63b4c5260c8ed02b1314
lastState: {}
name: csi-extensions
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-06-21T18:42:40Z"
- containerID: containerd://b0a981a751c5143813a4db0b53bb2c2243312136e543a4d65b952dc61b84f5c1
image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.1
imageID: registry.k8s.io/sig-storage/csi-provisioner@sha256:bf5a235b67d8aea00f5b8ec24d384a2480e1017d5458d8a63b361e9eeb1608a9
lastState:
terminated:
containerID: containerd://443bb47ec3cb0e9093342377d75f7812422e7f62bf4d0ce5d22757c42052dc15
exitCode: 1
finishedAt: "2024-06-21T18:43:40Z"
reason: Error
startedAt: "2024-06-21T18:43:10Z"
name: csi-provisioner
ready: true
restartCount: 2
started: true
state:
running:
startedAt: "2024-06-21T18:43:56Z"
- containerID: containerd://bbaafc9a87726f13ca40e7b7f1973e4473dd8ce94a78c9a67ce05b7205e88553
image: registry.k8s.io/sig-storage/csi-resizer:v1.10.1
imageID: registry.k8s.io/sig-storage/csi-resizer@sha256:4ecda2818f6d88a8f217babd459fdac31588f85581aa95ac7092bb0471ff8541
lastState:
terminated:
containerID: containerd://9d299091598a0a53213d7a92321f4ef5fc9fff1d1f88beba87b62fe35c7b7639
exitCode: 1
finishedAt: "2024-06-21T18:43:41Z"
reason: Error
startedAt: "2024-06-21T18:43:11Z"
name: csi-resizer
ready: true
restartCount: 2
started: true
state:
running:
startedAt: "2024-06-21T18:43:56Z"
- containerID: containerd://fe753c0bc4d762a861c59f5d557c4152e6bf85bb5495fb336e3e8a8ce57bf5e4
image: registry.k8s.io/sig-storage/csi-snapshotter:v7.0.2
imageID: registry.k8s.io/sig-storage/csi-snapshotter@sha256:c4b6b02737bc24906fcce57fe6626d1a36cb2b91baa971af2a5e5a919093c34e
lastState:
terminated:
containerID: containerd://ec7b4e064f648cfd70c882b81a601db820d1eaf483f30867bcaaf93347d26879
exitCode: 1
finishedAt: "2024-06-21T18:43:41Z"
reason: Error
startedAt: "2024-06-21T18:43:11Z"
name: csi-snapshotter
ready: true
restartCount: 2
started: true
state:
running:
startedAt: "2024-06-21T18:43:56Z"
- containerID: containerd://1eb157a1a0fe1ebf3bc26ac8a6d7ee1a729fc1e1f7b04edc78664ea1294ceff0
image: quay.io/hpestorage/volume-group-provisioner:v1.0.6-beta
imageID: quay.io/hpestorage/volume-group-provisioner@sha256:8d1ee0f752271148c019bc6ff2db53fdbfb56dfce3ede2e8f1549952becfeb05
lastState: {}
name: csi-volume-group-provisioner
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-06-21T18:42:40Z"
- containerID: containerd://42fec0266f3669000d461c690cc2c0fd74e7d8a5c0f0093a5b591c82fc3b6612
image: quay.io/hpestorage/volume-group-snapshotter:v1.0.6-beta
imageID: quay.io/hpestorage/volume-group-snapshotter@sha256:9be38de0f93f6b4ce7d0456eaabf5da3890b094a89a7b811852d31fbaf76c79c
lastState: {}
name: csi-volume-group-snapshotter
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-06-21T18:42:40Z"
- containerID: containerd://ae0dce20062d444aa8a124fe753bcc200c1b8008a3a4ef800e7b4500fc73b861
image: quay.io/hpestorage/volume-mutator:v1.3.6-beta
imageID: quay.io/hpestorage/volume-mutator@sha256:247153bb789805c272b76fd8018ccd0f8bf4eabded5d4baf362d8a2c162b8672
lastState: {}
name: csi-volume-mutator
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-06-21T18:42:40Z"
- image: quay.io/hpestorage/csi-driver:v2.5.0-beta
imageID: ""
lastState: {}
name: hpe-csi-driver
ready: false
restartCount: 0
started: false
state:
waiting:
message: 'failed to generate container "ee110797b0f68f31aa64c448b04f663590359bc4181a08be4f764f4dd599941f"
spec: failed to generate spec: failed to mkdir "/etc/hpe-storage": mkdir
/etc/hpe-storage: read-only file system'
reason: CreateContainerError
hostIP: 10.100.155.236
hostIPs:
- ip: 10.100.155.236
phase: Pending
podIP: 10.100.155.236
podIPs:
- ip: 10.100.155.236
qosClass: Burstable
startTime: "2024-06-21T18:42:39Z" |
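The status sections of the two dumps above are long, so here is a minimal sketch of pulling out just the containers that are stuck. It assumes the pod status has already been loaded into a dictionary (for example from `kubectl get pod -o json`). `waiting_containers` is a hypothetical helper, and the sample data is a trimmed copy of the node pod status with the error message shortened.

```python
def waiting_containers(pod_status):
    """Return (name, reason, message) for every container stuck in a waiting state."""
    stuck = []
    for key in ("initContainerStatuses", "containerStatuses"):
        for status in pod_status.get(key, []):
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                stuck.append(
                    (status["name"], waiting.get("reason", ""), waiting.get("message", ""))
                )
    return stuck


# Trimmed copy of the node pod status; the message is shortened for readability.
node_pod_status = {
    "initContainerStatuses": [
        {
            "name": "hpe-csi-node-init",
            "state": {
                "waiting": {
                    "reason": "CreateContainerError",
                    "message": 'failed to mkdir "/etc/hpe-storage": '
                    "mkdir /etc/hpe-storage: read-only file system",
                }
            },
        }
    ],
    "containerStatuses": [
        {"name": "csi-node-driver-registrar", "state": {"waiting": {"reason": "PodInitializing"}}},
        {"name": "hpe-csi-driver", "state": {"waiting": {"reason": "PodInitializing"}}},
    ],
}

# Keep only the containers the runtime refused to create.
blocked = [entry for entry in waiting_containers(node_pod_status) if entry[1] == "CreateContainerError"]
```

In both dumps the blocking entry is the same failure: the runtime cannot create `/etc/hpe-storage` on the read-only root file system.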
Ok, I had a brain fart, try now.
|
Getting closer, the controller started fine but the hpe-csi-node daemonset pod is still trying to mount
|
I did ensure
|
Ok, here's the next one.
|
The pod starts but the initContainer immediately crashes
|
It looks like the
spec:
  initContainers:
  - args:
    - --node-init
    - --endpoint=$(CSI_ENDPOINT)
    - --flavor=kubernetes
    env:
    - name: CSI_ENDPOINT
      value: unix:///csi/csi.sock
    image: quay.io/hpestorage/csi-driver:v2.5.0-beta
    imagePullPolicy: IfNotPresent
    name: hpe-csi-node-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host
      mountPropagation: Bidirectional
      name: root-dir
    - mountPath: /dev
      name: device-dir
    - mountPath: /sys
      name: sys
    - mountPath: /run/systemd
      name: runsystemd
    - mountPath: /csi
      name: plugin-dir
    - mountPath: /var/lib/kubelet
      name: pods-mount-dir
    - mountPath: /var/log
      name: log-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xsr7w
      readOnly: true
|
This is very interesting, I think you just uncovered a different bug altogether. =) |
Ok, talos4 has been published.
|
I edited the DS to add the environment variable and used your latest update. The initContainer succeeds now, but then I think we get to the meat of the situation: the csi-node-driver-registrar starts crashing and the hpe-csi-driver container complains it can't find initiators. It looks like part of the problem is on the Talos side: their iscsi-tools extension doesn't appear to include the multipath command (siderolabs/extensions#134). democratic-csi claims that it's not needed, but I'm not an expert in iSCSI so I can't say how true that is (democratic-csi/democratic-csi#225 (comment)).
Container logs
|
Not sure how much help it is, but looking at your code it looks like perhaps the main issue is you're looking for the Looks to me like they need to mount the full |
EUREKA! I found it: they do mount the
Obviously that will break when that pod restarts. But I then created a storage class and a PVC and it worked right away
and I can see the volume on the array. The last step, mounting it, is the one remaining issue: it does not successfully mount the volume. That appears to be related to the multipath problem I mentioned. I'm signing off for the weekend; I'll look more on Monday. |
I should've researched this but is As for commands the CSI driver needs to have available, look for clues here: https://github.com/hpe-storage/csi-driver/blob/master/Dockerfile As for the multipath issue, the HPE CSI Driver requires multipath/multipathd on the host. There's no workaround, as we don't even consider non-multipath entries. I'm out of pocket for the rest of the weekend as well, cheers! |
I've exhausted the time I can work on this for now, but this is what I found messing around some more. Hopefully it can help you get on the correct path, but it certainly looks like it's going to require more work than just changing the mount location. It does give me a slightly better idea for my planning though: I'll probably need to plan on a longer wait for support. It looks like However, because the extensions are not persistent across reboots, things like the initiator name are not consistent: a new one is generated on every boot. Because of this I don't think it's a good idea to try and persist your node ID on disk like we discussed earlier. It should either be generated dynamically, or the driver should use the Kubernetes Node ID and store extra persistent data in a ConfigMap or CRD. In my opinion this is more in line with the general idea of Kubernetes anyway, and with cattle vs. pets workflows. Overall I see two, maybe three, major problems. One will require changes from Talos, the other will require work on your driver
#!/bin/bash
# Run iscsiadm inside the mount and network namespaces of the running iscsid process.
iscsi_pid=$(pgrep -f "iscsid -f")
nsenter --mount="/proc/$iscsi_pid/ns/mnt" --net="/proc/$iscsi_pid/ns/net" -- /usr/local/sbin/iscsiadm "$@"
|
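For anyone who would rather wrap this from code than from a shell script, here is a rough sketch that builds the same `nsenter` invocation as the script above. It is not part of the HPE CSI Driver. `build_iscsiadm_command` is a hypothetical helper, the PID `1234` is a placeholder for whatever `pgrep -f "iscsid -f"` returns on the node, and `-m session` is just a sample `iscsiadm` argument.

```python
def build_iscsiadm_command(iscsid_pid, iscsiadm_args):
    """Build the nsenter argv that runs iscsiadm in iscsid's mount and network namespaces."""
    return [
        "nsenter",
        f"--mount=/proc/{iscsid_pid}/ns/mnt",
        f"--net=/proc/{iscsid_pid}/ns/net",
        "--",
        "/usr/local/sbin/iscsiadm",
        *iscsiadm_args,
    ]


# Placeholder PID; on a real node this would come from looking up the iscsid process.
command = build_iscsiadm_command(1234, ["-m", "session"])
```

The list can be handed to `subprocess.run(command, check=True)` from a privileged container. Building the argv separately keeps the namespace handling easy to inspect without touching the host.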
Thanks for the additional context. This definitely needs more work. I'm just puzzled how we can't even persist an IQN on the host, though. Do we need to grab the first-boot one, store it in our CRD and regenerate the host IQN from that? I guess FC wouldn't have as many problems, but we would still need multipath/multipathd regardless. Not having ext4 available will also create problems for our NFS server implementation for RWX claims, which doesn't play nicely with XFS in failure scenarios. |
Just my thoughts: I can think of two ways to deal with it
Looking around at other CSI iSCSI implementations it looks like many of them use their own |
We also need HPE driver integration with Talos. Please prioritize this one, thanks. |
Funny that I just bumped into this. I'm not an HPE customer, but I have some similar issues working on a storage array from a competitor. The CSI driver I was using also needed the multipath tools, so I ended up creating my own Talos extension for this. Currently it's just running internally where I'm at, but I did have plans to try to get it upstream. I had some conversations with the Talos guys and they were receptive to it. My initial extension was just based on binaries I grabbed from Alpine and stuck into an extension image, but they want to be able to integrate a build from source into their build system. So I have some cleaning up to do, and some work to get that build running within their build system. I don't think the build should be too complicated, since building in Alpine vs. building in Talos should be similar: they are both musl-based distros. Anyway, I guess my advice to you guys is to check upstream every once in a while to see if that multipath extension shows up. I know your issues extend beyond multipath, but getting that core piece in there benefits everyone who needs this for Fibre Channel storage and their CSI drivers. Here's a thread over on the Talos repo for reference: siderolabs/extensions#505 |
Thanks for bumping this. We are dependent on multipath tools and don't have resources to work on upstream projects. We still have some work to do if multipath tools become available for Talos but at least we have the foundation. We'll monitor the conversation. |
I ended up having to abandon Talos for now, so I haven't been keeping an eye on it. From that conversation it does look like they at least now generate a persistent IQN, which was one of the hitches. Getting multipath on the and it may be doable to use the HPE driver |
Talos is becoming more popular, but currently the csi-driver doesn't work with it. If we need to do manual configuration of things like iSCSI and multipath, we can do that by pushing values/files in the machine config. But the biggest hitch to me appears to be the requirement to create and mount /etc/hpe-storage on the host. That works on CoreOS but not on Talos, because basically the whole system is read-only. From what I can see, that mount is needed to store a unique ID for the node. Couldn't you use the already existing unique ID and store specific data in ConfigMaps?
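To make the "use the existing unique ID" idea concrete, here is a minimal sketch under stated assumptions. It derives a stable identifier from the Kubernetes Node object's UID with a name-based UUID, so nothing needs to be written to the host. `derive_node_id`, the `csi.hpe.com` namespace seed, and the sample UID are all hypothetical choices for illustration. This is not how the HPE CSI Driver generates its node ID today.

```python
import uuid

# Hypothetical namespace seed; any fixed value works as long as it never changes.
NODE_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "csi.hpe.com")


def derive_node_id(kubernetes_node_uid):
    """Derive a deterministic identifier from the UID of the Kubernetes Node object."""
    return str(uuid.uuid5(NODE_ID_NAMESPACE, kubernetes_node_uid))


# Made-up Node UID for the example.
sample_uid = "0b1f6c2e-5d7a-4c3b-9e8f-1a2b3c4d5e6f"
node_id = derive_node_id(sample_uid)
```

One trade-off to keep in mind: a Node object that is deleted and re-registered gets a new UID, so any extra per-node state would still need to live in something like a ConfigMap or CRD keyed by node name.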