
Proposal: Testing consistency group based replication and DR protection #1508

Open · Tracked by #1501
ShyamsundarR opened this issue Aug 2, 2024 · 7 comments

@ShyamsundarR

We will create a StatefulSet where the pod with index 0 logs, to disk, the counter values issued to the other replicas. Each non-0 index replica persists the counter value it receives on disk and acknowledges it back to the log pod. On acknowledgement from a replica, the log pod updates that replica's persisted counter value on disk.

All PVCs of this STS would be DR protected using a consistency group.

How does this test CGs:

  • If the log pod is snapshotted and replicated at a different time than a replica, it may contain a stale issued or persisted counter relative to that replica
    • Say a snapshot of the log PVC was replicated earlier, but the log pod went on to issue a later counter value; one or more replicas may persist that newer value and be snapshotted and replicated later, so the log data will not tally with the replica data
    • Or, a snapshot of the log PVC was replicated later; then the log pod values may be ahead of the replica values, as the replicas were snapshotted and replicated earlier

Log pod (replica 0)

  • Algo (a Python sketch follows the disk format below):

    • For each replica, do forever:
      • Generate a new "issued" counter and commit it to the log
      • Issue a write to the replica with the logged counter
      • Receive an ack from the replica that it wrote the counter, and update its "persisted" value
  • disk format/file:
    <log.yaml>

persisted:
  - replica: <m>
    counter: <n>
  - ...
issued:
  - replica: <m>
    counter: <n>
  - ...
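
A minimal Python sketch of this loop. The replica endpoint (POST /counter on port 8080), host names, and file paths are illustrative assumptions, not a fixed design; in a real STS the replicas would be addressed via their headless-service DNS names.

import os
import yaml       # PyYAML
import requests

LOG_PATH = "/var/log-pvc/log.yaml"

def load_log():
    # Start with an empty log on first boot.
    if not os.path.exists(LOG_PATH):
        return {"issued": [], "persisted": []}
    with open(LOG_PATH) as f:
        return yaml.safe_load(f)

def commit_log(log):
    # Write-then-rename plus fsync so log.yaml is crash consistent on the PVC.
    tmp = LOG_PATH + ".tmp"
    with open(tmp, "w") as f:
        yaml.safe_dump(log, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, LOG_PATH)

def set_counter(entries, replica, counter):
    for e in entries:
        if e["replica"] == replica:
            e["counter"] = counter
            return
    entries.append({"replica": replica, "counter": counter})

def issue_round(log, replica, counter):
    # 1. Commit the "issued" counter before sending it out.
    set_counter(log["issued"], replica, counter)
    commit_log(log)
    # 2. Ask the replica to persist it; it acks only after writing to disk.
    r = requests.post(f"http://replica-{replica}:8080/counter",
                      json={"counter": counter}, timeout=10)
    r.raise_for_status()
    # 3. Only after the ack, record the counter as "persisted".
    set_counter(log["persisted"], replica, counter)
    commit_log(log)

if __name__ == "__main__":
    log = load_log()
    counter = 0
    while True:  # for each replica, do forever
        counter += 1
        for replica in range(1, int(os.environ.get("REPLICAS", "3"))):
            issue_round(log, replica, counter)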

Replica pod (replica 1..M)

  • Algo (a Python sketch follows the disk format below):

    • Receive a "counter" value from the log pod
    • Write the value to disk and persist it
    • Respond to the log pod with the written counter
  • disk format/file
    <counter.yaml>

counter: <n>
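
A matching sketch of the replica pod, serving the same hypothetical POST /counter endpoint; the path and port are again illustrative.

import json
import os
import yaml       # PyYAML
from http.server import BaseHTTPRequestHandler, HTTPServer

COUNTER_PATH = "/var/replica-pvc/counter.yaml"

def persist_counter(counter):
    # fsync before acking, so an ack always implies the value is on disk.
    tmp = COUNTER_PATH + ".tmp"
    with open(tmp, "w") as f:
        yaml.safe_dump({"counter": counter}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, COUNTER_PATH)

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        persist_counter(body["counter"])
        # Respond with the counter actually written, as the ack.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(json.dumps({"counter": body["counter"]}).encode())

HTTPServer(("", 8080), Handler).serve_forever()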

Log pod Init container:

  • Read log.yaml
  • Check with each replica whether its stored counter matches either the issued or the persisted value; it is a consistency error if it matches neither
  • Update persisted if issued != persisted, as an issued counter value may already have been persisted by the replica (see the combined sketch after the replica init steps below)

Replica pod Init container:

  • Read counter.yaml
  • Respond to the log pod's request with the currently persisted counter; it is an error if the persisted value matches neither the issued nor the persisted value in the request
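
A combined sketch of the init-container check, log pod side, using the same hypothetical endpoint naming as above; it assumes the replica init container serves its persisted counter on GET /counter.

import requests
import yaml       # PyYAML

def check_replica(log, replica):
    issued = next((e["counter"] for e in log["issued"] if e["replica"] == replica), 0)
    persisted = next((e["counter"] for e in log["persisted"] if e["replica"] == replica), 0)
    r = requests.get(f"http://replica-{replica}:8080/counter", timeout=10)
    r.raise_for_status()
    stored = r.json()["counter"]
    if stored not in (issued, persisted):
        # Matches neither value: the group snapshot was not consistent.
        raise RuntimeError(f"replica {replica}: stored={stored}, "
                           f"issued={issued}, persisted={persisted}")
    if stored == issued and issued != persisted:
        # The replica persisted a counter the log had only issued; catch up.
        return issued
    return persisted

with open("/var/log-pvc/log.yaml") as f:
    log = yaml.safe_load(f)
for entry in log["issued"]:
    resolved = check_replica(log, entry["replica"])
    # Fold the resolved value back into "persisted" (and commit, as in the
    # main-loop sketch) before letting the log pod start.
    for e in log["persisted"]:
        if e["replica"] == entry["replica"]:
            e["counter"] = resolved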

k8s workload type:

  • Run the above as a StatefulSet
  • The STS pod with index 0 acts as the log
  • The other STS pods act as replicas
  • The replica count passed in is the same as the STS's number of replicas (see the role sketch below)
    • NOTE: Allow for expanding/contracting the STS, and hence the replica count
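
A small sketch of how a single image can pick its role from the StatefulSet ordinal; REPLICAS is a hypothetical env var the manifest would set from spec.replicas.

import os
import socket

# STS pod hostnames are "<sts-name>-<ordinal>", e.g. "cg-test-0".
ordinal = int(socket.gethostname().rsplit("-", 1)[-1])
replicas = int(os.environ.get("REPLICAS", "3"))

if ordinal == 0:
    print(f"acting as log pod for replicas 1..{replicas - 1}")
else:
    print(f"acting as replica {ordinal}")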

Tests:

  • For DR, fail the STS over in a loop and ensure STS health post failover
  • For backup, create an STS with PVC clones from the VolumeGroupSnapshot and ensure the STS reports no errors when using these cloned PVCs

@ShyamsundarR

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot. Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

(also tagging @BenamarMk @youhangwang @ELENAGER @keslerzhu for inputs)

@youhangwang

@ShyamsundarR A simple application could be a Deployment whose pod attaches multiple PVCs. All these PVCs would be DR protected using a consistency group.

This application keeps appending the latest date to these volumes, for example:

echo "$(date) $1" | tee -a /var/pvc1/ramencg.log /var/pvc2/ramencg.log /var/pvc3/ramencg.log /var/pvcn/ramencg.log

There are some scenarios for the files on the remote (secondary) cluster in this test:

  • All files have the same content: if the snapshot is taken after the latest date has been appended to all PVCs, all files should have the same content.
  • Some files have n lines, others have n-1 lines: if the snapshot is taken during the append, some files could already have the latest date (n lines) while others do not yet (n-1 lines).
  • Otherwise, the CG test fails, which means the snapshots of the different PVCs were taken at different times (a verification sketch follows this list).
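
A minimal Python sketch of that check on the secondary cluster, using the paths from the example above: every file must be at most one append behind, and the shared lines must be identical.

import glob

files = sorted(glob.glob("/var/pvc*/ramencg.log"))
lines = {f: open(f).read().splitlines() for f in files}
longest = max(lines.values(), key=len)

for f, content in lines.items():
    if len(content) not in (len(longest), len(longest) - 1):
        raise SystemExit(f"CG violation: {f} is more than one append behind")
    # The shorter files must be a prefix of the longest one.
    if content != longest[:len(content)]:
        raise SystemExit(f"CG violation: {f} diverges from the common history")
print("consistency group snapshot looks consistent")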

does this meet our test requirement?

@ShyamsundarR

ShyamsundarR commented Aug 20, 2024

does this meet our test requirement?

@youhangwang All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

If we distributed this across nodes, then writing the same data (date in this case) across these nodes would need some coordination.

Although drenv has a single node, it is useful to create the app in such a manner that it can be run across nodes.

@idryomov

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot.

This looks good to me! For testing the test itself, I could provide a rigged RBD build with the consistency logic inside the rbd group snap create command/API disabled.

Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

I'm not aware of anything like this on the RBD side.

@idryomov

All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

@ShyamsundarR I'm not sure I understand this comment. Can you elaborate or perhaps just rephrase?

The distributed lock that is part of the consistency logic is per-image, so even if everything is on single node, it still plays a role.

@ShyamsundarR

All PVCs are mounted to a single node, so the Ceph kernel drivers that would ensure the CG on the client side are not distributed.

@ShyamsundarR I'm not sure I understand this comment. Can you elaborate or perhaps just rephrase?

The distributed lock that is part of the consistency logic is per-image, so even if everything is on single node, it still plays a role.

Ack, understood. I would still like to develop a cross-node use of RBD images (or a CephFS subdir) and test it on systems where we can have more than one worker node.

@ShyamsundarR

@batrick and @idryomov, I would like your opinion on the above proposal to test a consistent group snapshot.

This looks good to me! For testing the test itself, I could provide a rigged RBD build with the consistency logic inside the rbd group snap create command/API disabled.

We could potentially use this app to test VolumeGroupSnapshot with RBD, to verify the CG part of it. Tagging @nixpanic for thoughts on using such an app to test the snapshot API.

@idryomov I am noting the above to say that we may not need the "one-off" build (at present at least) with the modified API to start testing the CG snaps. Mirroring does change things, as we need to test the CG snaps on the remote cluster and not the local cluster, but we can start here.

Also, if there are existing tests in Ceph that can serve as an example, it would help to develop along the same lines.

I'm not aware of anything like this on the RBD side.
