Losing quorum as soon as a node goes down #162

Open
Ulrar opened this issue Dec 7, 2023 · 10 comments
Comments

@Ulrar

Ulrar commented Dec 7, 2023

Hi,

I have 3 nodes and a placementCount of 2. After quite a bit of fiddling, the third node got 'TieBreaker' volumes (or Diskless, for some) set up on it, so I'd assume I'm okay to lose one node.

But sadly, as soon as any one of the nodes goes down, I lose quorum and the remaining two nodes get tainted with drbd.linbit.com/lost-quorum:NoSchedule.

╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node          ┊ Port ┊ Usage  ┊ Conns                     ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-1a9e5a5e-fdba-4b8e-ae9f-1a7acd048184 ┊ talos-00r-fu9 ┊ 7001 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   Diskless ┊ 2023-11-19 15:36:20 ┊
┊ pvc-1a9e5a5e-fdba-4b8e-ae9f-1a7acd048184 ┊ talos-813-fn2 ┊ 7001 ┊ InUse  ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-07 18:03:48 ┊
┊ pvc-1a9e5a5e-fdba-4b8e-ae9f-1a7acd048184 ┊ talos-ozt-z3h ┊ 7001 ┊        ┊                           ┊    Unknown ┊ 2023-10-27 12:04:28 ┊
┊ pvc-56924ed3-7815-4655-9536-6b64792182ca ┊ talos-00r-fu9 ┊ 7004 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   Diskless ┊ 2023-11-19 15:36:23 ┊
┊ pvc-56924ed3-7815-4655-9536-6b64792182ca ┊ talos-813-fn2 ┊ 7004 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-19 09:47:12 ┊
┊ pvc-56924ed3-7815-4655-9536-6b64792182ca ┊ talos-ozt-z3h ┊ 7004 ┊        ┊                           ┊    Unknown ┊ 2023-10-27 12:04:32 ┊
┊ pvc-86499a05-3ba9-4722-9bb1-69ae47406263 ┊ talos-00r-fu9 ┊ 7005 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊ TieBreaker ┊ 2023-11-19 15:36:23 ┊
┊ pvc-86499a05-3ba9-4722-9bb1-69ae47406263 ┊ talos-813-fn2 ┊ 7005 ┊ InUse  ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-12 18:11:43 ┊
┊ pvc-86499a05-3ba9-4722-9bb1-69ae47406263 ┊ talos-ozt-z3h ┊ 7005 ┊        ┊                           ┊    Unknown ┊ 2023-11-12 18:11:43 ┊
┊ pvc-c7bdfa9e-e3c2-4dd3-ac9c-b7b2e847d30b ┊ talos-00r-fu9 ┊ 7003 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   Diskless ┊ 2023-11-19 15:36:23 ┊
┊ pvc-c7bdfa9e-e3c2-4dd3-ac9c-b7b2e847d30b ┊ talos-813-fn2 ┊ 7003 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-07 18:03:50 ┊
┊ pvc-c7bdfa9e-e3c2-4dd3-ac9c-b7b2e847d30b ┊ talos-ozt-z3h ┊ 7003 ┊        ┊                           ┊    Unknown ┊ 2023-10-27 12:04:33 ┊
┊ pvc-e57930e5-6772-41e4-8c98-99105b77970a ┊ talos-00r-fu9 ┊ 7002 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊   Diskless ┊ 2023-11-19 15:36:23 ┊
┊ pvc-e57930e5-6772-41e4-8c98-99105b77970a ┊ talos-813-fn2 ┊ 7002 ┊ InUse  ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-07 18:03:49 ┊
┊ pvc-e57930e5-6772-41e4-8c98-99105b77970a ┊ talos-ozt-z3h ┊ 7002 ┊        ┊                           ┊    Unknown ┊ 2023-10-27 12:04:33 ┊
┊ pvc-fbdf5c3c-2d49-49b8-ac10-f8e1212c7788 ┊ talos-00r-fu9 ┊ 7000 ┊ Unused ┊ Connecting(talos-ozt-z3h) ┊ TieBreaker ┊ 2023-11-19 15:36:23 ┊
┊ pvc-fbdf5c3c-2d49-49b8-ac10-f8e1212c7788 ┊ talos-813-fn2 ┊ 7000 ┊ InUse  ┊ Connecting(talos-ozt-z3h) ┊   UpToDate ┊ 2023-11-08 08:47:17 ┊
┊ pvc-fbdf5c3c-2d49-49b8-ac10-f8e1212c7788 ┊ talos-ozt-z3h ┊ 7000 ┊        ┊                           ┊    Unknown ┊ 2023-10-27 12:05:17 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

I have no idea why the above leads to losing quorum; there are clearly two connected nodes (even if one is the TieBreaker).

I'm not sure what I'm doing wrong, but tainting the nodes like that makes recovery pretty difficult, as most pods won't get re-scheduled. Depending on what went down, I sometimes have to manually untaint a node to let pods come back up, then slowly recover by hand, using drbdadm to decide which copy to keep for every volume; see the sketch below.
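For reference, that manual recovery looks roughly like this (node and resource names taken from the listing above; the --discard-my-data step throws local data away, so this is only an illustration of the general approach, not a recommendation):

# Remove the taint the HA controller applied, so pods can be scheduled again
kubectl taint nodes talos-00r-fu9 drbd.linbit.com/lost-quorum:NoSchedule-

# On the node, check the resource state and decide which copy to keep
drbdadm status pvc-e57930e5-6772-41e4-8c98-99105b77970a

# On the node whose copy should be discarded: disconnect, then reconnect
# discarding local data so it resyncs from the surviving copy
drbdadm disconnect pvc-e57930e5-6772-41e4-8c98-99105b77970a
drbdadm connect --discard-my-data pvc-e57930e5-6772-41e4-8c98-99105b77970a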

Thanks

@WanzenBug
Member

Have you checked with drbdsetup status on the remaining nodes that they indeed have quorum? If they do have it, it seems like a bug in the HA controller.
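For example (the resource name is just a placeholder; --verbose makes drbdsetup print the quorum flag explicitly):

# run on one of the remaining nodes
drbdsetup status --verbose pvc-e57930e5-6772-41e4-8c98-99105b77970a
# look for quorum:yes / quorum:no in the output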

@Ulrar
Author

Ulrar commented Dec 7, 2023

Yes, they do lose quorum. For example, just now:

pvc-e57930e5-6772-41e4-8c98-99105b77970a role:Secondary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  talos-00r-fu9 role:Secondary
    peer-disk:Diskless
  talos-813-fn2 connection:Connecting

It has an UpToDate and a Diskless peer, and yet it thinks it lost quorum. That's the only volume that lost quorum; the other ones look the same but still have quorum, and their local node became Primary. Maybe it's something to do with that specific volume somehow.

@WanzenBug
Member

Very weird. Probably something for the DRBD folks to look at.

If you just want to disable the taints, you can disable the HA Controller since 2.3.0: https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/reference/linstorcluster.md#spechighavailabilitycontroller
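A minimal sketch of what that looks like, assuming the spec.highAvailabilityController.enabled field from that reference and the default LinstorCluster resource name:

kubectl patch linstorcluster linstorcluster --type merge \
  -p '{"spec": {"highAvailabilityController": {"enabled": false}}}'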

@Ulrar
Author

Ulrar commented Dec 7, 2023

It looks like the TieBreaker / Diskless node doesn't count towards the quorum when changing primary, so if the Primary for a volume goes down (even cleanly, it appears) the other one can't become primary anymore, and goes into a lost quorum state.

That is probably a DRBD issue, but when the primary goes down cleanly, I wonder if the operator could make sure the secondary switches over first, while it still has quorum?
Or maybe I should just go to a placement count of 3 to avoid this; a sketch of that is below.
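If I go that route, it would presumably be something along these lines (the resource group and storage pool names are placeholders; check linstor resource-group list first):

# raise the placement count on the resource group backing the storage class
linstor resource-group modify <resource-group> --place-count 3
# existing volumes need a third replica added by hand, e.g.:
linstor resource create talos-ozt-z3h pvc-e57930e5-6772-41e4-8c98-99105b77970a --storage-pool <pool>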

@S3LL1G28

I have tested with placement count 3 and cordoned my second node. Before, I had:

kubectl -n piraeus exec -it deployment/linstor-controller -- linstor volume list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ k8sw1.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 56.34 MiB ┊ Unused ┊ UpToDate ┊
┊ k8sw2.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 119.85 MiB ┊ InUse ┊ UpToDate ┊
┊ k8sw3.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 56.34 MiB ┊ Unused ┊ UpToDate ┊

and now:

kubectl -n piraeus exec -it deployment/linstor-controller -- linstor volume list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ k8sw1.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 170.05 MiB ┊ InUse ┊ UpToDate ┊
┊ k8sw2.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 233.56 MiB ┊ Unused ┊ UpToDate ┊
┊ k8sw3.gil-yan.net ┊ pvc-0c129a3b-1154-47eb-988b-e18774565982 ┊ lvm-thin ┊ 0 ┊ 1000 ┊ None ┊ 170.05 MiB ┊ Unused ┊ UpToDate ┊

and drbdadm status:

pvc-0c129a3b-1154-47eb-988b-e18774565982 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone
  k8sw2.gil-yan.net connection:StandAlone

I was thinking it would come back to a normal state after uncordoning, but that seems not to be the case.
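If it helps, my understanding is that StandAlone connections don't recover on their own and have to be re-initiated by hand on the affected node, roughly like this (standard drbdadm commands, not specific to the operator):

drbdadm adjust pvc-0c129a3b-1154-47eb-988b-e18774565982
# or, explicitly:
drbdadm connect pvc-0c129a3b-1154-47eb-988b-e18774565982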

@S3LL1G28

@Ulrar, what is your cluster setup?

@Ulrar
Author

Ulrar commented Aug 30, 2024

Nothing fancy: I'm using 3 Talos nodes, with scheduling allowed on the control plane nodes (since there are only 3 nodes) and a replica count of 3.

But this actually seems to have fixed itself; I suspect DRBD 9.2.9 is what did it. Or at least, I used to run into this all the time, and since that upgrade I haven't seen it once, so I think this changelog entry was it:

  - Fix a kernel crash that is sometimes triggered when downing drbd
    resources in a specific, unusual order (was triggered by the
    Kubernetes CSI driver)

@S3LL1G28

I have upgraded, but I don't know why the DeviceName is suddenly None for my 3rd node.

@S3LL1G28

S3LL1G28 commented Aug 30, 2024

Indeed, on my third node it doesn't see the LVM volume, even though it is there.

Warning  FailedMount  1s (x3 over 2s)  kubelet  MountVolume.SetUp failed for volume "pvc-0c129a3b-1154-47eb-988b-e18774565982" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-0c129a3b-1154-47eb-988b-e18774565982: failed to stat source device: stat : no such file or directory

whereas lvdisplay shows:

root@k8sw3:~# lvdisplay 
  --- Logical volume ---
  LV Name                thin
  VG Name                piraeus
  LV UUID                LrjRdC-Nxsc-uz5w-SX7r-eSnm-HxoX-gdA54A
  LV Write Access        read/write (activated read only)
  LV Creation host, time k8sw3.gil-yan.net, 2024-08-30 15:04:59 +0000
  LV Pool metadata       thin_tmeta
  LV Pool data           thin_tdata
  LV Status              available
  # open                 0
  LV Size                499.00 GiB
  Allocated pool data    0.03%
  Allocated metadata     10.42%
  Current LE             127744
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:3
   
  --- Logical volume ---
  LV Path                /dev/piraeus/pvc-0c129a3b-1154-47eb-988b-e18774565982_00000
  LV Name                pvc-0c129a3b-1154-47eb-988b-e18774565982_00000
  VG Name                piraeus
  LV UUID                sKjVGl-QgmQ-OMDz-labo-r4YO-jfNs-cCFsJ9
  LV Write Access        read/write
  LV Creation host, time k8sw3.gil-yan.net, 2024-08-30 15:12:15 +0000
  LV Pool name           thin
  LV Status              available
  # open                 2
  LV Size                10.00 GiB
  Mapped size            1.66%
  Current LE             2561
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:4

@WanzenBug
Member

Check if the right DRBD version is in use: cat /proc/drbd should report a version > 9.0.0.
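For example:

cat /proc/drbd
# the first line should start with "version: 9...."; a version like 8.4.x means
# the in-tree kernel module is loaded instead of the DRBD 9 module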

I'm not sure what exact steps you run when you cordon; could you please elaborate a bit on that?
