fix(csi/stage): fetch uri via REST #903

tiagolobocastro · 2024-12-10T01:04:44Z

On a NodeStage call, the publish_context URI may be out of date. This can happen when the volume has been moved to another node while the app pod is pinned to its node, and that node then restarts.
See Mayastor Issue 1781 for more details.
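A minimal sketch of the URI selection described above, assuming a hypothetical `VolumeTarget` result from the REST lookup (the names are illustrative, not the control plane's actual openapi types). With REST disabled, the publish_context URI is the only source; with REST enabled, the control plane's current view takes precedence.

```rust
/// Hypothetical, simplified result of asking the control plane (via REST)
/// where the volume's target currently lives. Not the real openapi type.
#[derive(Debug, Clone, PartialEq)]
pub struct VolumeTarget {
    pub device_uri: String,
}

/// Hypothetical error type for the REST lookup.
#[derive(Debug, Clone, PartialEq)]
pub enum RestError {
    /// The REST endpoint could not be reached.
    Unavailable(String),
    /// The control plane reports no target for this volume.
    NotPublished,
}

/// Choose which URI NodeStage should connect to.
///
/// * `publish_uri`: the URI captured in publish_context at ControllerPublish
///   time; it is immutable and may be stale.
/// * `rest_lookup`: `None` when the REST client is disabled, otherwise the
///   outcome of the control-plane lookup.
pub fn stage_uri(
    publish_uri: &str,
    rest_lookup: Option<Result<VolumeTarget, RestError>>,
) -> Result<String, RestError> {
    match rest_lookup {
        // REST disabled: publish_context is the only source available.
        None => Ok(publish_uri.to_string()),
        // REST enabled and a current target is known: prefer it.
        Some(Ok(target)) => Ok(target.device_uri),
        // REST enabled but the lookup failed: return the error so the
        // caller can retry instead of connecting to a possibly stale URI.
        Some(Err(e)) => Err(e),
    }
}
```

Failing the stage on a lookup error, rather than falling back to the publish_context URI, is a choice made for this sketch only; the description does not say which behaviour the real csi-node uses.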

For this fix, we add a new option to enable a REST client on the csi-node. Ideally, we'd strictly adhere to the CSI spec and avoid any Mayastor-specific operations (effectively remaining a mostly generic nvme-connect CSI driver), but the immutability of the publish_context makes this difficult.

Anyway, we add a new optional flag, --enable-rest, so we can still run without this layer.
It will be enabled by default in the helm chart.
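A std-only sketch of how an optional `--enable-rest` switch and a startup warning (cf. the commit "feat(csi/node): add warning if rest client not enabled") could fit together. `NodeOpts`, `from_args` and the warning wording are hypothetical; the real csi-node's argument parsing is not shown here.

```rust
/// Hypothetical stand-in for the csi-node CLI options; only the flag
/// described above is modelled.
#[derive(Debug, Default, PartialEq)]
pub struct NodeOpts {
    pub enable_rest: bool,
}

impl NodeOpts {
    /// Parse from an argv-like iterator. The flag is optional: when absent,
    /// the node runs without the REST layer. Other arguments are ignored.
    pub fn from_args<I, S>(args: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        let enable_rest = args.into_iter().any(|a| a.as_ref() == "--enable-rest");
        NodeOpts { enable_rest }
    }

    /// Warning to log at startup when running without the REST layer
    /// (message text is illustrative only).
    pub fn startup_warning(&self) -> Option<&'static str> {
        if self.enable_rest {
            None
        } else {
            Some("REST client not enabled: NodeStage will use the publish_context URI, which may be stale")
        }
    }
}
```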

We also check that the Nexus is Online or Degraded before connecting, which should help avoid a flood of nvme connect errors in the kernel.
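A sketch of that state gate, assuming a hypothetical `NexusState` enum: only Online and Degraded come from the description; the other variants are placeholders standing in for "any other state".

```rust
/// Hypothetical mirror of nexus states a control-plane lookup could report.
/// Only Online and Degraded come from the PR text; the rest are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NexusState {
    Online,
    Degraded,
    Faulted,
    Shutdown,
    Unknown,
}

/// Decide whether NodeStage should attempt `nvme connect` now.
/// Any state other than Online/Degraded yields an error (so the CO can retry)
/// instead of a connect attempt that would likely fail in the kernel.
pub fn check_connectable(state: NexusState) -> Result<(), String> {
    match state {
        NexusState::Online | NexusState::Degraded => Ok(()),
        other => Err(format!("nexus is {other:?}, not Online/Degraded; retry NodeStage later")),
    }
}
```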

With this in place, we can also improve a few things down the line, such as ensuring a resize happens before publish, but we should take these one at a time rather than suddenly doing everything via REST.

Signed-off-by: Tiago Castro <[email protected]>

control-plane/csi-driver/src/bin/node/main_.rs

control-plane/csi-driver/src/bin/node/node.rs

niladrih

Semantics and choice of syntax look good to me :)

tiagolobocastro · 2024-12-11T10:51:48Z

bors merge

903: fix(csi/stage): fetch uri via REST r=tiagolobocastro a=tiagolobocastro Co-authored-by: Tiago Castro <[email protected]>

bors-openebs-mayastor · 2024-12-11T10:52:58Z

Build failed:

continuous-integration/jenkins/branch

tiagolobocastro · 2024-12-11T12:43:10Z

bors merge

903: fix(csi/stage): fetch uri via REST r=tiagolobocastro a=tiagolobocastro Co-authored-by: Tiago Castro <[email protected]>

bors-openebs-mayastor · 2024-12-11T13:00:29Z

Build failed:

continuous-integration/jenkins/branch


tiagolobocastro · 2024-12-11T13:10:56Z

bors merge

bors-openebs-mayastor · 2024-12-11T14:24:06Z

Build succeeded:

continuous-integration/jenkins/branch

903: fix(csi/stage): fetch uri via REST r=tiagolobocastro a=tiagolobocastro Co-authored-by: Tiago Castro <[email protected]> Signed-off-by: Tiago Castro <[email protected]>

901: Cherry pick csi-trace and csi-node uri fixes to develop r=tiagolobocastro a=tiagolobocastro chore(bors): merge pull request #903 --- fix(csi-driver): trace was mistakenly added as info Signed-off-by: Tiago Castro <[email protected]> Co-authored-by: Tiago Castro <[email protected]> Co-authored-by: mayastor-bors <[email protected]>


tiagolobocastro requested review from Abhinandan-Purkait, niladrih, abhilashshetty04 and dsharma-dc December 10, 2024 01:04

tiagolobocastro force-pushed the csi-uri branch from 0325902 to 01ec679 December 10, 2024 02:02

Abhinandan-Purkait approved these changes Dec 10, 2024


control-plane/csi-driver/src/bin/node/main_.rs (resolved)

dsharma-dc reviewed Dec 10, 2024


control-plane/csi-driver/src/bin/node/node.rs (resolved)

niladrih approved these changes Dec 10, 2024


dsharma-dc approved these changes Dec 11, 2024


tiagolobocastro force-pushed the csi-uri branch from 7fb1516 to c9ed89c December 11, 2024 12:42

tiagolobocastro added 4 commits December 11, 2024 13:10

feat(deployer): new flag to enable rest client on csi-node

87ba3d6

Signed-off-by: Tiago Castro <[email protected]>

test(csi/node): add test for NodeStage with bad uri

d5eff89

Signed-off-by: Tiago Castro <[email protected]>

feat(csi/node): add warning if rest client not enabled

6fdc147

Signed-off-by: Tiago Castro <[email protected]>

build: check and update fmt

9f21421

Use a common script for linting, rather than scattered commands... Signed-off-by: Tiago Castro <[email protected]>

tiagolobocastro force-pushed the csi-uri branch from c9ed89c to 9f21421 December 11, 2024 13:10

bors-openebs-mayastor bot merged commit 0c08411 into develop Dec 11, 2024
4 checks passed

bors-openebs-mayastor bot deleted the csi-uri branch December 11, 2024 14:24
