
How can I use CSI with an existing LINSTOR Cluster, managed outside Kubernetes? #492

Open
lomonosov77 opened this issue Jun 22, 2023 · 14 comments

Comments

@lomonosov77

Hello! I have a k8s cluster and I want to use the LINSTOR CSI plugin with an existing LINSTOR cluster managed outside Kubernetes.
I've used this manifest - https://github.com/piraeusdatastore/linstor-csi/blob/master/examples/k8s/deploy/linstor-csi-1.19.yaml - but the KUBE_NODE_NAME value equals the k8s node name, and that is not the same as my LINSTOR node name.
Does anyone know how to set this up? Please help!
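For context, the linked manifest takes KUBE_NODE_NAME from the Kubernetes node name via the downward API. This is a sketch of that standard pattern (field names per the Kubernetes API, not copied verbatim from the manifest), which shows why the value always ends up being the k8s node name:

```yaml
# Sketch of the env entry in the CSI node DaemonSet (standard downward-API pattern).
# spec.nodeName resolves to the Kubernetes node name, which is why it must match
# the name the node is registered under in LINSTOR.
env:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```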

@lomonosov77 lomonosov77 changed the title How can I use CSI with an existing LINSTOR Cluster, managed outside Kubernetes How can I use CSI with an existing LINSTOR Cluster, managed outside Kubernetes? Jun 22, 2023
@WanzenBug
Member

That will be difficult. Why are your host names different between Kubernetes and the host OS?

@lomonosov77
Author

LINSTOR is installed on a different server and is not managed by the k8s cluster. They are two separate machines.

@WanzenBug
Member

You still need at least a satellite running on the kubernetes workers. You can use the operator to connect to your existing cluster: https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/external-controller.md

@lomonosov77
Author

I tried that. But how do I add LINSTOR satellites that are outside of the Kubernetes cluster?

@WanzenBug
Member

Well, you can start by setting up the Controller and the external Satellites, including registering the satellites with linstor node create.

Then you can use the Operator to connect to the existing cluster. The Operator will register the satellites it controls in the Kubernetes cluster.
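For example, registration of external satellites with the existing controller looks something like this (node names and addresses are placeholders; run against your controller):

```shell
# Register each external machine as a satellite with the existing controller.
# Node names and IPs below are hypothetical examples.
linstor node create stor01 192.168.0.32 --node-type satellite
linstor node create stor02 192.168.0.26 --node-type satellite

# Verify the satellites show up and come Online:
linstor node list
```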

@lomonosov77
Author

The whole problem is that the satellite pod won't start. The problem is in drbd-shutdown-guard.
Here is the container log:

2023/07/20 16:20:12 Running drbd-shutdown-guard version v1.0.0
2023/07/20 16:20:12 Creating service directory '/run/drbd-shutdown-guard'
2023/07/20 16:20:12 Copying drbdsetup to service directory
2023/07/20 16:20:12 Copying drbd-shutdown-guard to service directory
2023/07/20 16:20:12 Optionally: relabel service directory for SELinux
2023/07/20 16:20:12 ignoring error when setting selinux label: exit status 127
2023/07/20 16:20:12 Creating systemd unit drbd-shutdown-guard.service in /run/systemd/system
2023/07/20 16:20:12 Reloading systemd
Error: failed to reload systemd
Usage:
drbd-shutdown-guard install [flags]

Flags:
-h, --help help for install

2023/07/20 16:20:12 failed: failed to reload systemd

@x86128

x86128 commented Jul 24, 2023

Same issue with drbd-shutdown-guard on Ubuntu 22.04 LTS

@WanzenBug
Member

See #426 (comment) for a workaround and the overall issue.

@x86128

x86128 commented Jul 24, 2023

Thanks, the recipe from #426 (comment) helps.
I'm using an external linstor-controller with API TLS encryption. Is it possible to configure the LinstorCluster resource to use API client certs?
I'd build the LinstorCluster this way:

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  externalController:
    url: https://my-controller-ip:3371
  apiTLS:
    apiSecretName: linstor-api-tls
    clientSecretName: linstor-client-tls
    csiControllerSecretName: linstor-client-tls
    csiNodeSecretName: linstor-client-tls

with the secrets set according to the guide https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/api-tls.md#provision-keys-and-certificates-using-openssl

But in the logs of linstor-wait-node-online I get lines like this:
time="2023-07-24T12:38:58Z" level=info msg="not ready" error="Get \"https://my-controller-ip:3371/v1/nodes/ds1-d-master03\": EOF" version=refs/tags/v0.2.1

@WanzenBug
Member

Can you verify that the linstor-client-tls secret is in use by the linstor-wait-node-online container? It should be set via environment variables.
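One way to check from outside the pod (namespace and label selector are assumptions; adjust to your deployment):

```shell
# Inspect the satellite pod spec to confirm the LS_* env vars reference
# the linstor-client-tls secret. Namespace and labels are placeholders.
kubectl -n piraeus-datastore get pods -o yaml \
  | grep -B 2 -A 6 'LS_USER_CERTIFICATE'
```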

@x86128

x86128 commented Jul 25, 2023

I entered the linstor-wait-node-online container to check the env vars and connectivity:

# check env
env | grep "^LS"
LS_USER_CERTIFICATE=-----BEGIN CERTIFICATE-----
LS_CONTROLLERS=https://my-controller-ip:3371
LS_USER_KEY=-----BEGIN RSA PRIVATE KEY-----
LS_ROOT_CA=-----BEGIN CERTIFICATE-----

# check connectivity
apt install curl

echo "$LS_USER_CERTIFICATE" > client.crt
echo "$LS_USER_KEY" > client.key
echo "$LS_ROOT_CA" > ca.crt
curl -I --cacert ca.crt --cert client.crt --key client.key $LS_CONTROLLERS
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: origin, content-type, accept, authorization
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
Content-Length: 294
Content-Type: text/plain

# install linstor client utility
apt install gpg
curl -o /tmp/package-signing-pubkey.asc https://packages.linbit.com/package-signing-pubkey.asc

gpg --yes -o /etc/apt/trusted.gpg.d/linbit-keyring.gpg --dearmor /tmp/package-signing-pubkey.asc

PVERS=7 && echo "deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg] \
 http://packages.linbit.com/public/ proxmox-$PVERS drbd-9" | tee -a /etc/apt/sources.list.d/linbit.list
deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg]  http://packages.linbit.com/public/ proxmox-7 drbd-9

apt update && apt install linstor-client

# run 
linstor n l
Error: Error reading response from https://my-controller-ip:3371: Remote end closed connection without response
Error: Unable to connect to linstor://localhost:3370: [Errno 111] Connection refused

# run with explicit args
linstor --certfile client.crt --key client.key --cafile ca.crt --controllers my-controller-ip:3371 n l
╭─────────────────────────────────────────────────────────────╮
┊ Node       ┊ NodeType  ┊ Addresses                ┊ State   ┊
╞═════════════════════════════════════════════════════════════╡
┊ stor01     ┊ SATELLITE ┊ x.x.x.32:3366 (PLAIN)    ┊ Online  ┊
┊ stor       ┊ SATELLITE ┊ x.x.x.27:3366 (PLAIN)    ┊ Online  ┊
┊ worker01   ┊ SATELLITE ┊ x.x.x.225:3366 (PLAIN)   ┊ Online  ┊
┊ worker02   ┊ SATELLITE ┊ x.x.x.226:3366 (PLAIN)   ┊ Online  ┊
┊ worker03   ┊ SATELLITE ┊ x.x.x.227:3366 (PLAIN)   ┊ Online  ┊
┊ worker04   ┊ SATELLITE ┊ x.x.x.228:3366 (PLAIN)   ┊ Online  ┊
┊ worker05   ┊ SATELLITE ┊ x.x.x.229:3366 (PLAIN)   ┊ EVICTED ┊
┊ stor02     ┊ SATELLITE ┊ x.x.x.26:3366 (PLAIN)    ┊ Online  ┊
┊ stor03     ┊ SATELLITE ┊ x.x.x.37:3366 (PLAIN)    ┊ Online  ┊
┊ stor04     ┊ SATELLITE ┊ x.x.x.34:3366 (PLAIN)    ┊ Online  ┊
╰─────────────────────────────────────────────────────────────╯

@lomonosov77
Author

See #426 (comment) for a workaround, and the overall issue

This method helps to start the LinstorSatellite, but it raises the question: if the solution is to remove the drbd-shutdown-guard container, is it unnecessary?

@WanzenBug
Member

It does have a purpose, namely to improve DRBD handling during shutdown:

By default, DRBD volumes created by Piraeus suspend IO if the connection is lost. During node shutdown, the DRBD devices remain configured and mounted, unless the node was properly evicted in Kubernetes first.

But during shutdown the Pod network stops working, at which point DRBD can no longer reach its peers, so IO is suspended. Eventually systemd comes around and tries to unmount all remaining mounts, including those for containers using Piraeus volumes. The unmount then gets stuck, because DRBD is suspending IO.

You would have to do a hard reset, as systemd by default has no unmount timeout. The shutdown-guard runs during node shutdown and forces the DRBD devices to report IO errors instead; then unmounting can continue.

So while not strictly necessary, it's definitely "nice to have".
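The mechanism can be pictured as a transient systemd unit (the log above shows one being created in /run/systemd/system). This is an illustrative sketch only, not the actual generated unit:

```ini
# Hypothetical sketch of a shutdown-time unit along the lines of what
# drbd-shutdown-guard installs: it runs late in shutdown and switches
# suspended DRBD IO to erroring out, so systemd's unmounts can complete.
[Unit]
Description=Force DRBD suspended IO to error out on shutdown
DefaultDependencies=no
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/run/drbd-shutdown-guard/drbd-shutdown-guard

[Install]
WantedBy=shutdown.target
```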

@lomonosov77
Author


I have the same problem:
time="2023-07-26T08:03:28Z" level=info msg="not ready" error="satellite srv-k3s-w-02 is not ONLINE: OFFLINE" version=refs/tags/v0.2.1
I tried your method, but it didn't work in my case. The nodes are still offline:

curl -I --cacert ca.crt --cert client.crt --key client.key $LS_CONTROLLERS
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: origin, content-type, accept, authorization
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
Content-Length: 294
Content-Type: text/plain

and

linstor --certfile client.crt --key client.key --cafile ca.crt --controllers $LS_CONTROLLERS n l
╭───────────────────────────────────────────────────────────────────╮
┊ Node         ┊ NodeType  ┊ Addresses                    ┊ State   ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ srv2         ┊ SATELLITE ┊ 192.168.144.2:3366 (PLAIN)   ┊ Online  ┊
┊ srv-k3s-w-02 ┊ SATELLITE ┊ 192.168.130.206:3366 (PLAIN) ┊ OFFLINE ┊
┊ srv-k3s-w-03 ┊ SATELLITE ┊ 192.168.130.207:3366 (PLAIN) ┊ OFFLINE ┊
╰───────────────────────────────────────────────────────────────────╯
