
Proposal: Testing consistency group based replication and DR protection #1508

Open · Tracked by #1501
ShyamsundarR opened this issue Aug 2, 2024 · 7 comments

@ShyamsundarR

We will create a StatefulSet where the pod with index 0 logs, to disk, the counter values issued to the other replicas. Each non-0 index replica persists the counter value it receives on disk and acknowledges it back to the log pod. On acknowledgement from a replica, the log pod updates that replica's persisted counter value on disk.

All PVCs of this STS would be DR protected using a consistency group.

How does this test CGs:

  • If the log pod is snapshotted and replicated at a different time than a replica, it may contain a stale issued or persisted counter relative to that replica
    • Say a snapshot of the log PVC was replicated earlier, but the log pod went on to issue a later counter value; one or more replicas may persist that newer value and be snapshotted and replicated later, so the log data will not tally with the replica data
    • Or, a snapshot of the log PVC was replicated later; then the log pod values may be ahead of the replica values, as the replicas were snapshotted and replicated earlier

Log pod (replica 0)

  • Algo (a Python sketch follows the disk format below):

    • For each replica, do forever:
      • Generate a new "issued" counter and commit it to the log
      • Issue a write to the replica with the logged counter
      • Receive an ack from the replica that it wrote the counter, and update its "persisted" value
  • disk format/file:
    <log.yaml>

persisted:
  - replica: <m>
    counter: <n>
  - ...
issued:
  - replica: <m>
    counter: <n>
  - ...
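
A minimal Python sketch of this loop. The replica endpoint (POST /counter on port 8080), host names, and file paths are illustrative assumptions, not a fixed design; in a real STS the replicas would be addressed via their headless-service DNS names.

import os
import yaml       # PyYAML
import requests

LOG_PATH = "/var/log-pvc/log.yaml"

def load_log():
    # Start with an empty log on first boot.
    if not os.path.exists(LOG_PATH):
        return {"issued": [], "persisted": []}
    with open(LOG_PATH) as f:
        return yaml.safe_load(f)

def commit_log(log):
    # Write-then-rename plus fsync so log.yaml is crash consistent on the PVC.
    tmp = LOG_PATH + ".tmp"
    with open(tmp, "w") as f:
        yaml.safe_dump(log, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, LOG_PATH)

def set_counter(entries, replica, counter):
    for e in entries:
        if e["replica"] == replica:
            e["counter"] = counter
            return
    entries.append({"replica": replica, "counter": counter})

def issue_round(log, replica, counter):
    # 1. Commit the "issued" counter before sending it out.
    set_counter(log["issued"], replica, counter)
    commit_log(log)
    # 2. Ask the replica to persist it; it acks only after writing to disk.
    r = requests.post(f"http://replica-{replica}:8080/counter",
                      json={"counter": counter}, timeout=10)
    r.raise_for_status()
    # 3. Only after the ack, record the counter as "persisted".
    set_counter(log["persisted"], replica, counter)
    commit_log(log)

if __name__ == "__main__":
    log = load_log()
    counter = 0
    while True:  # for each replica, do forever
        counter += 1
        for replica in range(1, int(os.environ.get("REPLICAS", "3"))):
            issue_round(log, replica, counter)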

Replica pod (replica 1..M)

  • Algo (a Python sketch follows the disk format below):

    • Receive a "counter" value from the log pod
    • Write the value to disk and persist it
    • Respond to the log pod with the written counter
  • disk format/file
    <counter.yaml>

counter: <n>
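
A matching sketch of the replica pod, serving the same hypothetical POST /counter endpoint; the path and port are again illustrative.

import json
import os
import yaml       # PyYAML
from http.server import BaseHTTPRequestHandler, HTTPServer

COUNTER_PATH = "/var/replica-pvc/counter.yaml"

def persist_counter(counter):
    # fsync before acking, so an ack always implies the value is on disk.
    tmp = COUNTER_PATH + ".tmp"
    with open(tmp, "w") as f:
        yaml.safe_dump({"counter": counter}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, COUNTER_PATH)

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        persist_counter(body["counter"])
        # Respond with the counter actually written, as the ack.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(json.dumps({"counter": body["counter"]}).encode())

HTTPServer(("", 8080), Handler).serve_forever()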

Log pod Init container:

  • Read log.yaml
  • Check with each replica whether its stored counter matches either the issued or the persisted value; it is a consistency error if it matches neither
  • Update persisted if issued != persisted, as an issued counter value may already have been persisted by the replica (see the combined sketch after the replica init steps below)

Replica pod Init container:

  • Read counter.yaml
  • Respond to the log pod's request with the currently persisted counter; it is an error if the persisted value matches neither the issued nor the persisted value in the request
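
A combined sketch of the init-container check, log pod side, using the same hypothetical endpoint naming as above; it assumes the replica init container serves its persisted counter on GET /counter.

import requests
import yaml       # PyYAML

def check_replica(log, replica):
    issued = next((e["counter"] for e in log["issued"] if e["replica"] == replica), 0)
    persisted = next((e["counter"] for e in log["persisted"] if e["replica"] == replica), 0)
    r = requests.get(f"http://replica-{replica}:8080/counter", timeout=10)
    r.raise_for_status()
    stored = r.json()["counter"]
    if stored not in (issued, persisted):
        # Matches neither value: the group snapshot was not consistent.
        raise RuntimeError(f"replica {replica}: stored={stored}, "
                           f"issued={issued}, persisted={persisted}")
    if stored == issued and issued != persisted:
        # The replica persisted a counter the log had only issued; catch up.
        return issued
    return persisted

with open("/var/log-pvc/log.yaml") as f:
    log = yaml.safe_load(f)
for entry in log["issued"]:
    resolved = check_replica(log, entry["replica"])
    # Fold the resolved value back into "persisted" (and commit, as in the
    # main-loop sketch) before letting the log pod start.
    for e in log["persisted"]:
        if e["replica"] == entry["replica"]:
            e["counter"] = resolved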

k8s workload type:

  • Run the above as a StatefulSet
  • The STS pod with index 0 acts as the log
  • The other STS pods act as replicas
  • The replica count passed in is the same as the STS's number of replicas (see the role sketch below)
    • NOTE: Allow for expanding/contracting the STS, and hence the replica count
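
A small sketch of how a single image can pick its role from the StatefulSet ordinal; REPLICAS is a hypothetical env var the manifest would set from spec.replicas.

import os
import socket

# STS pod hostnames are "<sts-name>-<ordinal>", e.g. "cg-test-0".
ordinal = int(socket.gethostname().rsplit("-", 1)[-1])
replicas = int(os.environ.get("REPLICAS", "3"))

if ordinal == 0:
    print(f"acting as log pod for replicas 1..{replicas - 1}")
else:
    print(f"acting as replica {ordinal}")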

Tests:

  • For DR, fail the STS over in a loop and ensure STS health post failover
  • For backup, create an STS with PVC clones from the VolumeGroupSnapshot and ensure the STS reports no errors when using these cloned PVCs

@ShyamsundarR

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot. Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

(also tagging @BenamarMk @youhangwang @ELENAGER @keslerzhu for inputs)

@youhangwang

@ShyamsundarR A simple application could be a Deployment whose pod attaches multiple PVCs. All these PVCs would be DR protected using a consistency group.

This application keeps appending the latest date to these volumes, for example:

echo "$(date) $1" | tee -a /var/pvc1/ramencg.log /var/pvc2/ramencg.log /var/pvc3/ramencg.log /var/pvcn/ramencg.log

There are some scenarios for the files on the remote (secondary) cluster in this test:

  • All files have the same content: if the snapshot is taken after the latest date has been appended to all PVCs, all files should have the same content.
  • Some files have n lines, others have n-1 lines: if the snapshot is taken during the append, some files could already have the latest date (n lines) while others do not yet (n-1 lines).
  • Otherwise, the CG test fails, which means the snapshots of the different PVCs were taken at different times (a verification sketch follows this list).
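
A minimal Python sketch of that check on the secondary cluster, using the paths from the example above: every file must be at most one append behind, and the shared lines must be identical.

import glob

files = sorted(glob.glob("/var/pvc*/ramencg.log"))
lines = {f: open(f).read().splitlines() for f in files}
longest = max(lines.values(), key=len)

for f, content in lines.items():
    if len(content) not in (len(longest), len(longest) - 1):
        raise SystemExit(f"CG violation: {f} is more than one append behind")
    # The shorter files must be a prefix of the longest one.
    if content != longest[:len(content)]:
        raise SystemExit(f"CG violation: {f} diverges from the common history")
print("consistency group snapshot looks consistent")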

does this meet our test requirement?

@ShyamsundarR

ShyamsundarR commented Aug 20, 2024

does this meet our test requirement?

@youhangwang All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

If we distributed this across nodes, then writing the same data (date in this case) across these nodes would need some coordination.

Although drenv has a single node, it is useful to create the app in such a manner that it can be run across nodes.

@idryomov

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot.

This looks good to me! For testing the test itself, I could provide a rigged RBD build with the consistency logic inside the rbd group snap create command/API disabled.

Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

I'm not aware of anything like this on the RBD side.

@idryomov

All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

@ShyamsundarR I'm not sure I understand this comment. Can you elaborate or perhaps just rephrase?

The distributed lock that is part of the consistency logic is per-image, so even if everything is on single node, it still plays a role.

@ShyamsundarR

All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

@ShyamsundarR I'm not sure I understand this comment. Can you elaborate or perhaps just rephrase?

The distributed lock that is part of the consistency logic is per-image, so even if everything is on single node, it still plays a role.

Ack, understood. I would still like to develop a cross-node use of RBD images (or a CephFS subdir) and test it on systems where we can have more than one worker node.

@ShyamsundarR

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot.

This looks good to me! For testing the test itself, I could provide a rigged RBD build with the consistency logic inside the rbd group snap create command/API disabled.

We could potentially use this app to test VolumeGroupSnapshot with RBD, to verify the CG part of it. Tagging @nixpanic for thoughts on using such an app to test the snapshot API.

@idryomov I am noting the above to say that we may not need the "one-off" build (at present at least) with the modified API to start testing the CG snaps. Mirroring does change things, as we need to test the CG snaps on the remote cluster and not the local cluster, but we can start here.

Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

I'm not aware of anything like this on the RBD side.
