Active Cluster: Volumes must be connected to hosts on both arrays #42

Open
ajoergensen opened this issue Feb 3, 2025 · 20 comments · Fixed by #50

@ajoergensen

When running arrays in Active Cluster configuration, one must (manually) connect volumes and hosts in a given pod on both arrays.

If a volume is connected to the host on only one array, the host will lose connectivity to all volumes if that array fails. Even under normal circumstances, it causes suboptimal connectivity for hosts that have the other array as their preferred array.

Currently, a new volume/disk is connected to its active Proxmox host only on the array configured in storage.cfg. The fix would be to define the "secondary" array as well and issue the host connect/disconnect API calls on that array too.

/Allan
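
To make the request concrete, here is a minimal sketch of the proposed behaviour: the same connect call is issued against every array in the cluster. `FakeArray`, `connect_everywhere`, and the host and volume names are hypothetical stand-ins for illustration; they are not the plugin's code or the FlashArray REST API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeArray:
    """Stand-in for one array's API client (hypothetical, not the real API)."""
    name: str
    connections: set = field(default_factory=set)

    def connect(self, host: str, volume: str) -> None:
        # A real client would issue the host/volume connect API call here.
        self.connections.add((host, volume))


def connect_everywhere(arrays, host, volume):
    """Connect the volume to the host on every array; return the failures."""
    failed = []
    for array in arrays:
        try:
            array.connect(host, volume)
        except Exception as exc:  # a real client would catch narrower errors
            failed.append((array.name, exc))
    return failed


primary = FakeArray("array-a")
secondary = FakeArray("array-b")
failures = connect_everywhere([primary, secondary], "pve-node1", "vm-100-disk-0")
```

Collecting failures instead of stopping at the first one is a design choice of this sketch: a volume connected on one array is still usable, so the caller can decide whether a partial result is an error.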

@timansky timansky added the enhancement New feature or request label Feb 3, 2025
@timansky
Collaborator

timansky commented Feb 3, 2025

I already mentioned this here: PR 30

Unfortunately, I currently do not have the ability to implement an active/active or active/passive setup, so more information is needed.

From my understanding, the current proposal would only work with an active/active setup.

Open Questions:
1. Does the replica cluster send the same IQN/WWN, and what differs in multipath?
2. In an active/passive setup, could a transparent switch cause a failure?
3. How many arrays can be linked in the chain?

PS:
My assumption is that in an active/active setup, the system transparently sends the same WWN from different portals to the host, meaning only the iSCSI configuration would differ, which is independent of the plugin.

For management (API) access, a quick fix would be adding a round-robin (RR) record in DNS. If this solution is sufficient, then my proposal could be a viable option.

@ajoergensen
Author

I can only answer for active-active (Active Cluster)

  1. The LUN is presented to the host with the same characteristics from all portals on both arrays (IQN/WWN/serial), so multipath sees it as the same device (because it is the same device).
  2. Unknown
  3. To my knowledge, two

I don't think #39 covers the same issue. The thing is, you need to log in to both arrays and issue the API command(s) to connect the volume to the host. The API token will differ on the two arrays. For the same reason, DNS trickery does not work.

In multipath it looks like this:

Volumes created from the GUI and presented from both arrays:

3624a9370a0bcd15b714f4ee909ae9960 dm-7 PURE,FlashArray
size=10T features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 1:0:0:254  sdf  8:80   active ready running
| |- 5:0:0:254  sdn  8:208  active ready running
| |- 8:0:0:254  sdo  8:224  active ready running
| |- 7:0:0:254  sdp  8:240  active ready running
| |- 6:0:0:254  sdq  65:0   active ready running
| |- 3:0:0:254  sdm  8:192  active ready running
| |- 4:0:0:254  sdl  8:176  active ready running
| `- 2:0:0:254  sdj  8:144  active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 22:0:0:254 sdba 67:64  active ready running
  |- 14:0:0:254 sdaj 66:48  active ready running
  |- 13:0:0:254 sdak 66:64  active ready running
  |- 11:0:0:254 sdae 65:224 active ready running
  |- 9:0:0:254  sdaa 65:160 active ready running
  |- 10:0:0:254 sdac 65:192 active ready running
  |- 12:0:0:254 sdag 66:0   active ready running
  |- 16:0:0:254 sdao 66:128 active ready running
  |- 15:0:0:254 sdan 66:112 active ready running
  |- 17:0:0:254 sdar 66:176 active ready running
  |- 18:0:0:254 sdas 66:192 active ready running
  |- 20:0:0:254 sdaw 67:0   active ready running
  |- 19:0:0:254 sdav 66:240 active ready running
  |- 21:0:0:254 sdaz 67:48  active ready running
  |- 24:0:0:254 sdbe 67:128 active ready running
  `- 23:0:0:254 sdbd 67:112 active ready running

Volume created by pve-purestorage-plugin (and only presented from the array configured in storage.cfg):

3624a9370a0bcd15b714f4ee909c68190 dm-13 PURE,FlashArray
size=100G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 3:0:0:1 sds 65:32  active ready running
  |- 4:0:0:1 sdu 65:64  active ready running
  |- 8:0:0:1 sdw 65:96  active ready running
  |- 6:0:0:1 sdx 65:112 active ready running
  |- 7:0:0:1 sdv 65:80  active ready running
  |- 1:0:0:1 sdr 65:16  active ready running
  |- 2:0:0:1 sdt 65:48  active ready running
  `- 5:0:0:1 sdy 65:128 active ready running
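
The difference between the two listings can also be checked mechanically. The sketch below parses `multipath -ll` style text and counts path groups and paths per device. It assumes the plain header format shown above (WWID first, no user-friendly alias) and treats "only one path group" as a heuristic for "presented from one array only", which holds here because the remote array's paths land in a separate lower-priority group. The function name and the shortened sample are illustrative.

```python
import re

# Map header, e.g. "3624a93... dm-7 PURE,FlashArray"
MAP_RE = re.compile(r"^(?P<wwid>\w+)\s+(?P<dm>dm-\d+)\s+")
# Path-group line, e.g. "|-+- policy='queue-length 0' prio=50 status=active"
GROUP_RE = re.compile(r"policy=.*\bprio=(?P<prio>\d+)\s+status=(?P<status>\w+)")
# Path line, e.g. "| |- 1:0:0:254  sdf  8:80  active ready running"
PATH_RE = re.compile(r"\b\d+:\d+:\d+:\d+\s+(?P<dev>sd[a-z]+)\b")


def summarize(text):
    """Return {wwid: [(prio, path_count), ...]} for multipath -ll style text."""
    maps = {}
    groups = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            groups = maps.setdefault(header.group("wwid"), [])
            continue
        if groups is None:
            continue
        group = GROUP_RE.search(line)
        if group:
            groups.append([int(group.group("prio")), 0])
        elif groups and PATH_RE.search(line):
            groups[-1][1] += 1
    return {wwid: [tuple(g) for g in gs] for wwid, gs in maps.items()}


SAMPLE = """\
3624a9370a0bcd15b714f4ee909ae9960 dm-7 PURE,FlashArray
size=10T features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 1:0:0:254  sdf  8:80   active ready running
| `- 2:0:0:254  sdj  8:144  active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 9:0:0:254  sdaa 65:160 active ready running
  `- 10:0:0:254 sdac 65:192 active ready running
3624a9370a0bcd15b714f4ee909c68190 dm-13 PURE,FlashArray
size=100G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 3:0:0:1 sds 65:32  active ready running
  `- 4:0:0:1 sdu 65:64  active ready running
"""

summary = summarize(SAMPLE)
single_group = [wwid for wwid, groups in summary.items() if len(groups) < 2]
```

With the shortened sample, `single_group` contains only the WWID of the plugin-created volume.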

@amulet1
Collaborator

amulet1 commented Feb 3, 2025

In the case of ActiveCluster, volumes have the same serial number and other parameters on the clustered arrays.

It is up to the end user to decide which array(s) they want to be connected to. For redundancy they would want to be connected to both the local and remote arrays, but if the main concern is loss of data (not of service), that is not even required. We would only need to worry about multipath configuration if we start using target names (e.g. in iscsiadm), since the target names differ between arrays.

@amulet1
Collaborator

amulet1 commented Feb 3, 2025

I can submit patches to support ActiveCluster.

@amulet1 amulet1 self-assigned this Feb 3, 2025
@timansky
Collaborator

timansky commented Feb 3, 2025

@amulet1 you are right, we need some failover policy. If PVE is connected to both arrays, there can be a split brain.

@amulet1
Collaborator

amulet1 commented Feb 3, 2025

I assume API login tokens would be different in case of 2 arrays.

@amulet1
Collaborator

amulet1 commented Feb 4, 2025

@amulet1 you are right, we need some failover policy. If PVE is connected to both arrays, there can be a split brain.

I do not think we should worry about split-brain situations: ActiveCluster relies on the Pure1 Cloud Mediator.

@ajoergensen
Author

I assume API login tokens would be different in case of 2 arrays.

Correct, it is not possible to decide the token during creation, so it will always be different.

@amulet1
Collaborator

amulet1 commented Feb 4, 2025

I guess I will then introduce address2 and token2 parameters for the second array in a cluster. Keeping them comma-separated on one line is not pretty.

amulet1 added a commit to amulet1/pve-purestorage-plugin that referenced this issue Feb 4, 2025
 * ActiveCluster is enabled by providing a second array plugin parameters "address2" and "token2"
 * retry non-API errors on a second array
 * connect/disconnect volumes to host on both arrays
 * related code refactoring and improvements
@amulet1
Collaborator

amulet1 commented Feb 4, 2025

Please try this file.

Initial ActiveCluster support:

  • ActiveCluster is enabled by providing the second array's plugin parameters, address2 and token2
  • retry non-API errors on a second array
  • connect/disconnect volumes to/from host on both arrays
  • related code refactoring and improvements

Testing is needed. I do not have ActiveCluster set up, and I could not test/address the scenario when one of the arrays (e.g. primary) is operational but there is a failover to the secondary array (e.g. due to loss of connectivity between arrays). I need to know if there are specific API errors in this case.
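
For reviewers, the "retry non-API errors on a second array" item can be pictured roughly as below. This is a simplified sketch under assumptions, not the code in the attached file: `ApiError` and `call_with_failover` are hypothetical names, and treating `ConnectionError` and `TimeoutError` as the "non-API" class is an illustrative choice.

```python
class ApiError(Exception):
    """The array answered but rejected the request (hypothetical error type)."""


def call_with_failover(arrays, request):
    """Run request(array) on each array in order until one succeeds.

    API errors are raised immediately, on the assumption that the other
    array would reject the same request. Transport errors move on to the
    next array.
    """
    last_error = None
    for array in arrays:
        try:
            return request(array)
        except ApiError:
            raise
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    raise RuntimeError("no array reachable") from last_error


def fake_request(array):
    # Simulate the first array being unreachable.
    if array == "array-a":
        raise ConnectionError("array-a unreachable")
    return f"ok from {array}"


result = call_with_failover(["array-a", "array-b"], fake_request)
```

The open question above maps directly onto this structure: if a failover produces a specific API error, that error would need its own branch rather than being re-raised.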

@ajoergensen
Author

Please try this file.

Awesome, I will test when I get the chance.

/a

@ajoergensen
Author

First couple of tests look good. Disks are mapped on both arrays as expected.

/a

@timansky
Collaborator

timansky commented Feb 5, 2025

I guess I will then introduce address2 and token2 parameters for the second array in a cluster. Keeping them comma-separated on one line is not pretty.

Comma-separated notation is a de facto standard for most clusters (rmq, es, mongo...). Even in PVE, all multi-host variants are comma-separated (rbd mon host, nodes, etc.).

@amulet1
Collaborator

amulet1 commented Feb 5, 2025

Then we would also have to put 2 tokens (and eventually, maybe, 2 targets and something else) on one line, and it would not look great as these can be long.

Of course it is easy to change to one-liners if preferred.

@NojuHD
Contributor

NojuHD commented Feb 6, 2025

... it would not look great as these can be long.

Many would also say that creating a parameter for each new array (connection) is just as ineffective.

I guess just separating in the same parameter would be much more flexible.

amulet1 added a commit to amulet1/pve-purestorage-plugin that referenced this issue Feb 6, 2025
 * ActiveCluster is enabled by providing comma-separated cluster arrays parameters in "address" and "token"
 * retry non-API errors on a second array
 * connect/disconnect volumes to host on both arrays
 * related code refactoring and improvements
@amulet1
Collaborator

amulet1 commented Feb 6, 2025

ActiveCluster is enabled by providing comma-separated cluster arrays parameters in "address" and "token"

The file to test is here.

@timansky timansky added this to the v0.0.6 milestone Feb 7, 2025
@ajoergensen
Author

ActiveCluster is enabled by providing comma-separated cluster arrays parameters in "address" and "token"

The file to test is here.

The updated PureStoragePlugin.pm works as expected after removing address2 and token2 and adding the information to address and token respectively.
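
For anyone else testing, the comma-separated form amounts to pairing the n-th address with the n-th token. A minimal sketch of that pairing follows; `parse_arrays` and the example host names and tokens are hypothetical, and the plugin's actual parsing and validation may differ.

```python
def parse_arrays(address, token):
    """Split comma-separated values and pair each address with its token."""
    addresses = [a.strip() for a in address.split(",") if a.strip()]
    tokens = [t.strip() for t in token.split(",") if t.strip()]
    if not addresses:
        raise ValueError("at least one address is required")
    if len(addresses) != len(tokens):
        raise ValueError(
            f"{len(addresses)} address(es) but {len(tokens)} token(s)"
        )
    return list(zip(addresses, tokens))


arrays = parse_arrays(
    "array-a.example.com, array-b.example.com",
    "token-a,token-b",
)
```

Rejecting mismatched counts early is a choice made in this sketch, so that a token is never silently paired with the wrong array.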

@timansky
Collaborator

timansky commented Feb 7, 2025

@ajoergensen please try the PR; if it is OK, I'll merge.

@amulet1
Collaborator

amulet1 commented Feb 8, 2025

There are still TODO items left in #50. At the very least we need to update the README.

@timansky timansky reopened this Feb 8, 2025
@timansky
Collaborator

timansky commented Feb 8, 2025

@ajoergensen @amulet1, reopened because the README update is missing.

amulet1 added a commit to amulet1/pve-purestorage-plugin that referenced this issue Feb 8, 2025
 * ActiveCluster is enabled by providing comma-separated cluster arrays parameters in "address" and "token"
 * retry non-API errors on a second array
 * connect/disconnect volumes to host on both arrays
 * related code refactoring and improvements