
Discussion: Replica placement strategy #467

Open
boedy opened this issue May 4, 2023 · 6 comments

boedy commented May 4, 2023

This is perhaps not a question about the piraeus-operator per se, but about Linstor in general; I'm posting it here as others might find it useful. Please let me know if it would be better suited somewhere else.

We are attempting to run a large stretched cluster over multiple regions in which we use the Piraeus operator to provide a solid storage foundation for HA and disaster recovery.

The cluster is stretched over the following regions:

  • eu-west
  • eu-central
  • us-west
  • us-central

Desired Result
For most of our workloads we would want a placement count of 3, with the replicas placed in the following way:

  • 2 replicas in the same zone - this protects against a node failure and supports quick recovery
  • 1 replica in a different zone, but in the same region - this third replica protects against a datacenter failure and allows for disaster recovery
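
As an illustration (node and zone names below are made up), the target layout for one volume would be two nodes in one zone plus a third node in another zone of the same region:

# inspect each node's region/zone topology labels
kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone
#
# desired placement for one volume (placeholder names):
#   node-a  region=eu-west  zone=eu-west-1   <- replica 1
#   node-b  region=eu-west  zone=eu-west-1   <- replica 2
#   node-c  region=eu-west  zone=eu-west-2   <- replica 3 (in-region DR)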

linstor.csi.linbit.com/replicasOnSame and linstor.csi.linbit.com/replicasOnDifferent lack the flexibility to express this exact configuration; at least, I can't think of a combination that would achieve it directly.

Possible workaround 1

  1. Label two nodes in the eu-west region with storage-group=eu-a
  2. Label one node in the eu-central region with the same label storage-group=eu-a
  3. Use the following storage-class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-available
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  linstor.csi.linbit.com/storagePool: data-dir
  linstor.csi.linbit.com/placementCount: "3"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
  linstor.csi.linbit.com/replicasOnSame: "storage-group"
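
With this approach, the labeling from steps 1 and 2 would look something like this (node names are placeholders):

# two nodes in eu-west share a storage group with one node in eu-central
kubectl label node <eu-west-node-1> storage-group=eu-a
kubectl label node <eu-west-node-2> storage-group=eu-a
kubectl label node <eu-central-node-1> storage-group=eu-a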

I guess this would work, but we would always have to create groups of three nodes, and mistakes can be made quite easily during node configuration.

Possible workaround 2
Only use linstor.csi.linbit.com/replicasOnSame: "topology.kubernetes.io/zone"

Then write an operator that creates an extra replica in the same region but in a different zone. This would basically boil down to the following command:

# create a replica in a different zone
linstor resource create <node> <pvc>

I guess the Linstor Python API could be used for this. If the affinity controller is used, I believe this should keep working when the original datacenter goes down?
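
A rough sketch of what such an operator would run per volume (the node name is a placeholder; data-dir is the storage pool from the storage class above):

# see where the resource currently has replicas
linstor resource list
# add a diskful replica on a node in the same region but a different zone
linstor resource create <node-in-other-zone> <pvc> --storage-pool data-dir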

Conclusion
Maybe there is a simpler solution which I haven't thought of yet. Hoping to get some ideas from others :)

boedy changed the title from "Question: Replica placement strategy" to "Discussion: Replica placement strategy" on May 4, 2023
WanzenBug (Member) commented

I don't think anyone has actually used it, and I certainly have never seen it in action. But: there is the possibility to implement a "custom" strategy in LINSTOR by setting the Autoplacer/PreSelectScript property on the LINSTOR controller. That refers to a JS file in /etc/linstor/selector, which gets executed every time LINSTOR is asked to place a volume. I don't even know what the expected input and output look like, but you can do "anything" in there :-)
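
If I understand it correctly, wiring that up would look roughly like this (the script name is a placeholder, and I haven't verified the exact value format the property expects):

# put the selector script where the controller looks for it
cp my-selector.js /etc/linstor/selector/
# point the controller at it
linstor controller set-property Autoplacer/PreSelectScript my-selector.js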


boedy commented May 9, 2023

Interesting. I started to go down the path of writing my own operator using the Python API, but if Autoplacer/PreSelectScript can be used, that might be a more solid option, assuming I can find some documentation on it.

https://github.com/LINBIT/linstor-server/blob/55d2909657d05cfc086e72d45dbbb6cac05d6345/docs/rest_v1_openapi.yaml#L3983 is one of the few references I can find. I'll start playing with it and see what I can uncover.


boedy commented May 9, 2023

Are you sure this is actually used? I can't find any place where this script is actually called or executed.

WanzenBug (Member) commented

Should be somewhere in here: https://github.com/LINBIT/linstor-server/blob/master/controller/src/main/java/com/linbit/linstor/core/apicallhandler/controller/autoplacer/PreSelector.java

Again, I can't confirm whether it even works.


boedy commented Sep 20, 2023

With the introduction of the LinstorNodeConnection feature in the recent release, I'd like to revisit this issue to discuss its potential applicability to our use case.

In scenarios where we are running a database, it would be highly beneficial to have a synchronous replica within the same zone for quick recovery, along with an additional Disaster Recovery (DR) replica in a different zone for enhanced resilience without sacrificing write performance.

As I initially pointed out, the current options linstor.csi.linbit.com/replicasOnSame and linstor.csi.linbit.com/replicasOnDifferent do not offer the flexibility to configure such a nuanced replication strategy.

I noticed that the Autoplacer/PreSelectScript feature, which we discussed as a possible workaround, appears to have been removed from the codebase. Are there any plans to reintroduce this functionality or something similar? If this discussion would be more appropriate in the Linstor-server repository, I'm happy to open an issue there.

WanzenBug (Member) commented

Yeah, moving the discussion to the linstor-server repo would be good. Having a good use case/example/scenario to explain the feature also helps :)
