StorageException: Failed to mkfs /dev/drbd1002 #641

Open
dmrub opened this issue Apr 8, 2024 · 9 comments
dmrub commented Apr 8, 2024

After installing piraeus-operator, I get the error message StorageException: Failed to mkfs /dev/drbd1002.
Kubernetes version: v1.28.8
Piraeus operator: v2.3.0
Piraeus server: v1.25.1
LINSTOR is installed with the following satellite configuration:

---    
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: linstor-fast
spec:
  internalTLS:
    certManager:
      name: linstor-internal-ca
      kind: Issuer
  storagePools:
    - name: vg01-linstor
      lvmThinPool:
        volumeGroup: vg01
        thinPool: linstor

After installation I get a number of errors:

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id                    ┊ Datetime            ┊ Node                                  ┊ Exception                                                                      ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 6610156F-8EC88-000000 ┊ 2024-04-05 15:15:30 ┊ S|k8s-m2                              ┊ StorageException: Failed to mkfs /dev/drbd1002                                 ┊
┊ 66101520-00000-000000 ┊ 2024-04-05 15:15:32 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101520-00000-000001 ┊ 2024-04-05 15:15:35 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101520-00000-000002 ┊ 2024-04-05 15:15:42 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101589-E5863-000000 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000001 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000003 ┊ 2024-04-05 15:15:52 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101589-E5863-000001 ┊ 2024-04-05 15:15:59 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000002 ┊ 2024-04-05 15:16:04 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000002 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000003 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000004 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: (Node: 'k8s-m2') Generated resource file for resource 'pv...   ┊
┊ 66101520-00000-000005 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 6610156F-8EC88-000004 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000003 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000005 ┊ 2024-04-05 15:16:24 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000006 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000004 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000007 ┊ 2024-04-05 15:16:44 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000008 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000005 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 661015A1-A3732-000000 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 6610156F-8EC88-000009 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m2                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 661015A1-A3732-000001 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 66101589-E5863-000006 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000010 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000011 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000007 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000012 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000008 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000013 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000009 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000010 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000014 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000011 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000015 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000016 ┊ 2024-04-05 15:23:14 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000017 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000012 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000018 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000013 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000014 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000019 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000020 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000015 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000021 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000016 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000022 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000017 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000018 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000023 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
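To get an overview of a long table like the one above, the reports can be grouped by node and exception class. The following is only a minimal sketch, assuming the output of `linstor error-reports list` has been saved as text; the function name `tally_reports` and the shortened `sample` rows are hypothetical and used purely for illustration.

```python
import re
from collections import Counter

# Matches one data row of the table printed by `linstor error-reports list`:
# ┊ Id ┊ Datetime ┊ Node ┊ Exception ┊
# The header row is skipped because "Datetime" is not a timestamp.
ROW = re.compile(
    r"^┊\s*(?P<id>\S+)\s*┊\s*(?P<when>[\d-]+ [\d:]+)\s*┊\s*"
    r"(?P<node>\S+)\s*┊\s*(?P<exc>.*?)\s*┊\s*$"
)


def tally_reports(table_text):
    """Count error reports per (node, exception class) pair."""
    counts = Counter()
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, border, or unrelated line
        # "StorageException: Failed to mkfs ..." -> "StorageException"
        exc_class = match.group("exc").split(":", 1)[0]
        counts[(match.group("node"), exc_class)] += 1
    return counts


# Hypothetical, shortened sample in the same layout as the table above.
sample = """\
┊ Id                    ┊ Datetime            ┊ Node     ┊ Exception ┊
┊ 6610156F-8EC88-000000 ┊ 2024-04-05 15:15:30 ┊ S|k8s-m2 ┊ StorageException: Failed to mkfs /dev/drbd1002 ┊
┊ 661015A1-A3732-000000 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
┊ 6610156F-8EC88-000001 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
"""

for (node, exc_class), count in sorted(tally_reports(sample).items()):
    print(node, exc_class, count)
```

Grouping this way makes it easier to see which satellites keep failing and which exception classes only appear once.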

Here are the error reports:

  1. StorageException: Failed to mkfs /dev/drbd1002
ERROR REPORT 6610156F-8EC88-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.25.1
Build ID:                           918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time:                         2023-11-20T10:09:08+00:00
Error time:                         2024-04-05 15:15:30
Node:                               k8s-m2

============================================================

Reported error:
===============

Description:
    Failed to mkfs /dev/drbd1002
Additional information:
    Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1. 

    Standard out: 


    Error message: 
    The file /dev/drbd1002 does not exist and no size was specified.


Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69

Error message:                      Failed to mkfs /dev/drbd1002

Error context:
    An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext:   Details:     Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1. 

Standard out: 


Error message: 
The file /dev/drbd1002 does not exist and no size was specified.




Call backtrace:

    Method                                   Native Class:Line number
    checkExitCode                            N      com.linbit.extproc.ExtCmdUtils:69
    genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:103
    genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:63
    genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:51
    makeFs                                   N      com.linbit.linstor.layer.storage.utils.MkfsUtils:96
    makeExt4                                 N      com.linbit.linstor.layer.storage.utils.MkfsUtils:109
    makeFileSystemOnMarked                   N      com.linbit.linstor.layer.storage.utils.MkfsUtils:222
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1771
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:889
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
  2. Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101520-00000-000000
ERROR REPORT 66101520-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.25.1
Build ID:                           918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time:                         2023-11-20T10:09:08+00:00
Error time:                         2024-04-05 15:15:32
Node:                               linstor-controller-5f594b5b45-9lr8z
Peer:                               RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179

Error message:                      Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Error context:
    Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
    	*__checkpoint ? Delete volume definition
    Original Stack Trace:

Call backtrace:

    Method                                   Native Class:Line number
    deleteVolumeDefinitionInTransaction      N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179

Error message:                      
Error has been observed at the following site(s):
	*__checkpoint ⇢ Delete volume definition
Original Stack Trace:

Error context:
    Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Call backtrace:

    Method                                   Native Class:Line number
    deleteVolumeDefinitionInTransaction      N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179
    lambda$deleteVolumeDefinition$0          N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:134
    doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:149
    lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:76
    call                                     N      reactor.core.publisher.MonoCallable:72
    trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
    subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N      reactor.core.publisher.Flux:8759
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
    request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
    onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
    subscribe                                N      reactor.core.publisher.MonoJust:55
    subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
    subscribe                                N      reactor.core.publisher.Flux:8773
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
    request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
    onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
    subscribe                                N      reactor.core.publisher.MonoJust:55
    subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
    subscribe                                N      reactor.core.publisher.Mono:4495
    subscribeWith                            N      reactor.core.publisher.Mono:4561
    subscribe                                N      reactor.core.publisher.Mono:4462
    subscribe                                N      reactor.core.publisher.Mono:4398
    subscribe                                N      reactor.core.publisher.Mono:4370
    doFlux                                   N      com.linbit.linstor.api.rest.v1.RequestHelper:324
    deleteVolumeDefinition                   N      com.linbit.linstor.api.rest.v1.VolumeDefinitions:229
    invoke0                                  Y      jdk.internal.reflect.NativeMethodAccessorImpl:unknown
    invoke                                   N      jdk.internal.reflect.NativeMethodAccessorImpl:62
    invoke                                   N      jdk.internal.reflect.DelegatingMethodAccessorImpl:43
    invoke                                   N      java.lang.reflect.Method:566
    lambda$static$0                          N      org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory:52
    run                                      N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1:146
    invoke                                   N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:189
    doDispatch                               N      org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker:159
    dispatch                                 N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:93
    invoke                                   N      org.glassfish.jersey.server.model.ResourceMethodInvoker:478
    apply                                    N      org.glassfish.jersey.server.model.ResourceMethodInvoker:400
    apply                                    N      org.glassfish.jersey.server.model.ResourceMethodInvoker:81
    run                                      N      org.glassfish.jersey.server.ServerRuntime$1:256
    call                                     N      org.glassfish.jersey.internal.Errors$1:248
    call                                     N      org.glassfish.jersey.internal.Errors$1:244
    process                                  N      org.glassfish.jersey.internal.Errors:292
    process                                  N      org.glassfish.jersey.internal.Errors:274
    process                                  N      org.glassfish.jersey.internal.Errors:244
    runInScope                               N      org.glassfish.jersey.process.internal.RequestScope:265
    process                                  N      org.glassfish.jersey.server.ServerRuntime:235
    handle                                   N      org.glassfish.jersey.server.ApplicationHandler:684
    service                                  N      org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer:356
    run                                      N      org.glassfish.grizzly.http.server.HttpHandler$1:190
    doWork                                   N      org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:535
    run                                      N      org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:515
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
  3. Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101589-E5863-000000 
ERROR REPORT 66101589-E5863-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.25.1
Build ID:                           918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time:                         2023-11-20T10:09:08+00:00
Error time:                         2024-04-05 15:15:52
Node:                               k8s-m0

============================================================

Reported error:
===============

Description:
    Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause:
    Verification of resource file failed
Additional information:
    The error reported by the runtime environment or operating system is:
    The external command 'drbdadm' exited with error code 10

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624

Error message:                      Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context:
    An error occurred while processing resource 'Node: 'k8s-m0', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext:   Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
  Cause:       Verification of resource file failed
  Details:     The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10



Call backtrace:

    Method                                   Native Class:Line number
    regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1624
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 10.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop

    The external command sent the following output data:


    The external command sent the following error information:
    /etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m0 { ... }: volume 0 not defined on k8s-m2
    command sh-nop exited with code 10


Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #642

Error message:                      The external command 'drbdadm' exited with error code 10


ErrorContext:   Description: Execution of the external command 'drbdadm' failed.
  Cause:       The external command exited with error code 10.
  Correction:  - Check whether the external program is operating properly.
- Check whether the command line is correct.
  Contact a system administrator or a developer if the command line is no longer valid
  for the installed version of the external program.
  Details:     The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop

The external command sent the following output data:


The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m0 { ... }: volume 0 not defined on k8s-m2
command sh-nop exited with code 10




Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
    checkResFile                             N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
    regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1617
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
  4. (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
ERROR REPORT 66101520-00000-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.25.1
Build ID:                           918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time:                         2023-11-20T10:09:08+00:00
Error time:                         2024-04-05 15:16:09
Node:                               linstor-controller-5f594b5b45-9lr8z
Peer:                               RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346

Error message:                      (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context:
    (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
    	*__checkpoint ? Modify resource-definition
    Original Stack Trace:

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:346

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346

Error message:                      
Error has been observed at the following site(s):
	*__checkpoint ⇢ Modify resource-definition
Original Stack Trace:

Error context:
    (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:346
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:293
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:244
    lambda$doProcessMessage$4                N      com.linbit.linstor.proto.CommonMessageProcessor:229
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8773
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
    drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
    drain                                    N      reactor.core.publisher.SinkManyUnicast:362
    tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
    tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:392
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:227
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
  1. Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
ERROR REPORT 6610156F-8EC88-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.25.1
Build ID:                           918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time:                         2023-11-20T10:09:08+00:00
Error time:                         2024-04-05 15:16:12
Node:                               k8s-m2

============================================================

Reported error:
===============

Description:
    Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause:
    Verification of resource file failed
Additional information:
    The error reported by the runtime environment or operating system is:
    The external command 'drbdadm' exited with error code 10

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624

Error message:                      Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context:
    An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext:   Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
  Cause:       Verification of resource file failed
  Details:     The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10



Call backtrace:

    Method                                   Native Class:Line number
    regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1624
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 10.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop

    The external command sent the following output data:


    The external command sent the following error information:
    /etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m2 { ... }: volume 0 missing (present on k8s-m0)
    command sh-nop exited with code 10


Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #642

Error message:                      The external command 'drbdadm' exited with error code 10


ErrorContext:   Description: Execution of the external command 'drbdadm' failed.
  Cause:       The external command exited with error code 10.
  Correction:  - Check whether the external program is operating properly.
- Check whether the command line is correct.
  Contact a system administrator or a developer if the command line is no longer valid
  for the installed version of the external program.
  Details:     The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop

The external command sent the following output data:


The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m2 { ... }: volume 0 missing (present on k8s-m0)
command sh-nop exited with code 10




Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
    checkResFile                             N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
    regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1617
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.

Output of LVM's pvs; vgs; lvs; on cluster nodes:

k8s-m0:
  PV         VG   Fmt  Attr PSize   PFree  
  /dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
  /dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

  VG   #PV #LV #SN Attr   VSize   VFree  
  vg00   1   1   0 wz--n- <99,50g <49,50g
  vg01   1   2   0 wz--n- <50,00g 516,00m

  LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root                                           vg00 -wi-ao---- 50,00g                                                       
  linstor                                        vg01 twi-aotz-- 49,39g                0,01   10,44                           
  pvc-80745669-9bf4-4776-9865-f6f419c57863_00000 vg01 Vwi-a-tz-- 10,00g linstor        0,01                                   

k8s-m1:
  PV         VG   Fmt  Attr PSize   PFree  
  /dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
  /dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

  VG   #PV #LV #SN Attr   VSize   VFree  
  vg00   1   1   0 wz--n- <99,50g <49,50g
  vg01   1   2   0 wz--n- <50,00g 516,00m

  LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root                                           vg00 -wi-ao---- 50,00g                                                       
  linstor                                        vg01 twi-aotz-- 49,39g                0,43   10,58                           
  pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz--  8,00g linstor        2,68                                   

k8s-m2:
  PV         VG   Fmt  Attr PSize   PFree  
  /dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
  /dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

  VG   #PV #LV #SN Attr   VSize   VFree  
  vg00   1   1   0 wz--n- <99,50g <49,50g
  vg01   1   3   0 wz--n- <50,00g 516,00m

  LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root                                           vg00 -wi-ao---- 50,00g                                                       
  linstor                                        vg01 twi-aotz-- 49,39g                0,83   10,70                           
  pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe_00000 vg01 Vwi-aotz--  5,00g linstor        2,91                                   
  pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz--  8,00g linstor        3,28                                   
@WanzenBug
Member

Please try to update to the latest version.

It also looks like this was not a fresh install? Otherwise, why would there be any resources?

This

    Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Looks like the resource (which already existed) is still in use somewhere, so something still has it mounted or similar. Clean that up first: check the resource state with linstor r l to find where it is "InUse" and unmount it there.

@dmrub
Author

dmrub commented Apr 8, 2024

I will try to upgrade to the latest version, but this is a fresh install. We plan to use Linstor in production, but before that we are running automated tests: we install a fresh Kubernetes on three VMs and then deploy the piraeus operator via Flux CD. This installation was started on Friday evening; this morning I checked its status and found the errors I describe in this issue.

The output of linstor r l:

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m0 ┊ 7002 ┊        ┊       ┊    Unknown ┊                     ┊
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m2 ┊ 7002 ┊ InUse  ┊       ┊    Unknown ┊ 2024-04-05 15:15:27 ┊
┊ pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe ┊ k8s-m2 ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:15:24 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m0 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-04-05 15:16:03 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m1 ┊ 7001 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:04 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m2 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:02 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
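
To spot the replicas reported as "InUse" in a longer listing, the table can be filtered with a small helper. This is only a sketch: `in_use_resources` is a hypothetical name, and it assumes the column order shown in the table above (ResourceName, Node, Port, Usage, Conns, State, CreatedOn) and the `┊` column separator.

```shell
#!/bin/sh
# Read the table printed by `linstor r l` on stdin and print
# "<resource> <node> <state>" for every replica whose Usage column is InUse.
# Assumption: columns are separated by '┊' in the order shown in this thread.
in_use_resources() {
    awk -F'┊' '
        NF >= 8 {
            # Trim padding around the data columns (fields 2..7).
            for (i = 2; i <= 7; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($5 == "InUse") print $2, $3, $7
        }'
}
```

It could be used as `kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor r l | in_use_resources`. A replica that is InUse while its state is Unknown, like the one on k8s-m2 above, is the one worth a closer look.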

The PVC pvc-80745669-9bf4-4776-9865-f6f419c57863 is used by the monitoring stack (Grafana), which cannot start:

$ kubectl get pvc -A | grep pvc-80745669-9bf4-4776-9865-f6f419c57863
monitoring           kube-prometheus-stack-grafana         Bound    pvc-80745669-9bf4-4776-9865-f6f419c57863   10Gi       RWO            linstor-fast                 2d17h

$ kubectl get pods -n monitoring
NAME                                                       READY   STATUS     RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0          2/2     Running    0          35h
kube-prometheus-stack-grafana-9b8785fdd-m9nkm              0/3     Init:0/1   0          2d17h
kube-prometheus-stack-kube-state-metrics-776c898f6-qbjj9   1/1     Running    0          47h
kube-prometheus-stack-operator-696cbbfbfb-sql6s            1/1     Running    0          35h
kube-prometheus-stack-prometheus-node-exporter-d96g9       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-dcdh7       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-gfblh       1/1     Running    0          2d17h
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running    0          35h

@WanzenBug
Member

So it looks like 6610156F-8EC88-000000 indicates that mkfs failed because DRBD was not set up correctly. But in 66101520-00000-000000 we can see that the resource is apparently in use. This does not make much sense. It would indicate that something is keeping the resource in primary without any actual disk.

Could you please try to run:

kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863

It looks like the CSI driver later tried to create the volume again and somehow determined that the volume already exists, which led to it being bound. I would recommend deleting the PVC and PV and letting them be recreated.

@dmrub
Author

dmrub commented Apr 8, 2024

Here is the output of the commands:

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
pvc-80745669-9bf4-4776-9865-f6f419c57863 role:Primary

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
resource "pvc-80745669-9bf4-4776-9865-f6f419c57863" {
    options {
        on-no-data-accessible	suspend-io;
        on-suspended-primary-outdated	force-secondary;
    }
    _this_host {
        node-id			0;
    }
}
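
The status output above shows the resource as Primary with no disk line at all, which is the inconsistent state discussed in this thread. A rough check for that pattern could look like the following. It is a sketch only: `primary_without_disk` is a hypothetical helper, and it assumes that a healthy resource reports a local `disk:` field in `drbdsetup status`.

```shell
#!/bin/sh
# Read `drbdsetup status <resource>` output on stdin.
# Succeeds (exit 0) and prints a note when the resource is Primary
# but no local "disk:" field is reported; exits 1 otherwise.
# "peer-disk:" fields are deliberately not counted as a local disk.
primary_without_disk() {
    awk '
        NR == 1 && /role:Primary/ { primary = 1 }
        /(^|[ \t])disk:/          { has_disk = 1 }
        END {
            if (primary && !has_disk) { print "Primary without a local disk"; exit 0 }
            exit 1
        }'
}
```

For example: `kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863 | primary_without_disk`.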

@WanzenBug
Member

Ok, this looks like a bug in LINSTOR: it does not properly restore the resource to secondary after the mkfs call fails. That still leaves the question of how /dev/drbd1002 can be missing at this point. I have no idea how that can happen.

To fully clean up the volume:

kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup secondary pvc-80745669-9bf4-4776-9865-f6f419c57863

Then, run linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863 and delete PVC and PV.
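
The cleanup steps above can be collected into one script. This is a sketch under assumptions: the resource, pod, namespace and PVC names are the ones seen in this thread, the PV is assumed to carry the same name as the LINSTOR resource (as in the kubectl get pvc output earlier), and `DRY_RUN` and the `run` helper are hypothetical additions so the commands can be reviewed before anything is executed.

```shell
#!/bin/sh
# Values taken from this thread; override them via the environment as needed.
NS="${NS:-piraeus-datastore}"
SATELLITE_POD="${SATELLITE_POD:-k8s-m2}"
RES="${RES:-pvc-80745669-9bf4-4776-9865-f6f419c57863}"
PVC_NS="${PVC_NS:-monitoring}"
PVC_NAME="${PVC_NAME:-kube-prometheus-stack-grafana}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default), else run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_stuck_resource() {
    # 1. Demote the DRBD resource that was left in Primary on the satellite.
    run kubectl exec -n "$NS" "$SATELLITE_POD" -- drbdsetup secondary "$RES" || return 1
    # 2. Delete the resource definition in LINSTOR.
    run kubectl exec -n "$NS" deploy/linstor-controller -- linstor rd d "$RES" || return 1
    # 3. Delete the PVC and PV so they can be recreated.
    run kubectl delete pvc -n "$PVC_NS" "$PVC_NAME" || return 1
    run kubectl delete pv "$RES" --ignore-not-found || return 1
}
```

Calling `cleanup_stuck_resource` prints the four commands; `DRY_RUN=0 cleanup_stuck_resource` would execute them. Note that deleting the PVC may wait until the pod that still references it is gone.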

@dmrub
Author

dmrub commented Apr 8, 2024

Your last suggestion worked, and I was able to reinstall the monitoring. What would you recommend now?
Should I update to the latest version of Piraeus Operator and create a new issue if I get a new error?
What steps would help you to analyze this error?

@WanzenBug
Member

Yes, please upgrade and see if it happens again. In case you encounter an issue, run

kubectl exec -it -n piraeus-datastore deploy/linstor-controller -- linstor sos-report create

Then copy the created file from the pod to your host and attach it to the issue.
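
For the copy step, `kubectl cp` needs the controller pod name rather than `deploy/...`. The helper below only prints the command so it can be checked first. It is a sketch: `sos_copy_command` is a hypothetical name, and both the pod name and the archive path are placeholders to be replaced with whatever `kubectl get pods` and `linstor sos-report create` actually report.

```shell
#!/bin/sh
# Print (do not run) a kubectl cp command that fetches a file from a pod.
#   $1  pod name (placeholder, take it from `kubectl get pods -n <namespace>`)
#   $2  path of the archive inside the pod (placeholder, as printed by sos-report create)
#   $3  namespace, defaults to the one used in this thread
sos_copy_command() {
    pod="$1"
    remote_path="$2"
    ns="${3:-piraeus-datastore}"
    echo "kubectl cp $ns/$pod:$remote_path ./$(basename "$remote_path")"
}
```

Example: `sos_copy_command <controller-pod> <path-printed-by-sos-report>` prints the command to paste into the shell.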

@dmrub
Author

dmrub commented Apr 15, 2024

@WanzenBug, I am currently testing the latest version of Piraeus Operator, v2.5.0, and so far the problem described in this issue has not reoccurred. However, I have just reproduced a problem that I described in another issue: LINBIT/linstor-server#396. Since I never got a response in the linstor-server project, should I recreate the issue in this (piraeus-operator) project?

@WanzenBug
Member

Yes, this is an issue more appropriate for the piraeus project.
