Commit 6d847b0: Blog for Volume Group Snapshot
1 parent 26a3c9c

---
layout: blog
title: "Introducing Volume Group Snapshot"
date: 2023-05-08T10:00:00-08:00
slug: kubernetes-1-27-volume-group-snapshot-alpha
---

**Author:** Xing Yang (VMware)

Volume group snapshot is introduced as an Alpha feature in Kubernetes v1.27.
This feature introduces a Kubernetes API that allows users to take a crash consistent
snapshot of multiple volumes together. It uses a label selector to group multiple
PersistentVolumeClaims for snapshotting.
This new feature is only supported for CSI volume drivers.

## What is Volume Group Snapshot

Some storage systems provide the ability to create a crash consistent snapshot of
multiple volumes. A group snapshot represents “copies” from multiple volumes that
are taken at the same point-in-time. A group snapshot can be used either to rehydrate
new volumes (pre-populated with the snapshot data) or to restore existing volumes to
a previous state (represented by the snapshots).

## Why add Volume Group Snapshots to Kubernetes?

The Kubernetes volume plugin system already provides a powerful abstraction that
automates the provisioning, attaching, mounting, resizing, and snapshotting of block
and file storage.

Underpinning all these features is the Kubernetes goal of workload portability:
Kubernetes aims to create an abstraction layer between distributed applications and
underlying clusters so that applications can be agnostic to the specifics of the
cluster they run on and application deployment requires no “cluster specific” knowledge.

There is already a [VolumeSnapshot API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/177-volume-snapshot)
that provides the ability to take a snapshot of a persistent volume to protect against
data loss or data corruption. However, there are other snapshotting functionalities
not covered by the VolumeSnapshot API.

Some storage systems support consistent group snapshots that allow a snapshot to be
taken from multiple volumes at the same point-in-time to achieve write order consistency.
This can be useful for applications that contain multiple volumes. For example,
an application may have data stored in one volume and logs stored in another volume.
If snapshots for the data volume and the logs volume are taken at different times,
the application will not be consistent and will not function properly if it is restored
from those snapshots when a disaster strikes.

It is true that we can quiesce the application first, take an individual snapshot from
each volume that is part of the application one after the other, and then unquiesce the
application after all the individual snapshots are taken. This way we will get application
consistent snapshots.
However, quiescing an application is time consuming. Sometimes it may not be possible to
quiesce an application at all. Taking individual snapshots one after another may also take
longer than taking a consistent group snapshot. Some users may not want to quiesce their
applications very frequently for these reasons. For example, a user may
want to run weekly backups with application quiesce and nightly backups without
application quiesce but with consistent group support, which provides crash consistency
across all volumes in the group.

## Kubernetes Volume Group Snapshots API

Kubernetes Volume Group Snapshots introduce [three new API objects](https://github.com/kubernetes-csi/external-snapshotter/blob/master/client/apis/volumegroupsnapshot/v1alpha1/types.go) for managing snapshots:

`VolumeGroupSnapshot`
: Created by a Kubernetes user (or perhaps by your own automation) to request
creation of a volume group snapshot for multiple volumes.
It contains information about the volume group snapshot operation such as the
timestamp when the volume group snapshot was taken and whether it is ready to use.
The creation and deletion of this object represents a desire to create or delete a
cluster resource (a group snapshot).

`VolumeGroupSnapshotContent`
: Created by the snapshot controller for a dynamically created VolumeGroupSnapshot.
It contains information about the volume group snapshot including the volume group
snapshot ID.
This object represents a provisioned resource on the cluster (a group snapshot).
The VolumeGroupSnapshotContent object binds to the VolumeGroupSnapshot for which it
was created with a one-to-one mapping.

`VolumeGroupSnapshotClass`
: Created by cluster administrators to describe how volume group snapshots should be
created, including the driver information, the deletion policy, etc.
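
For illustration, a VolumeGroupSnapshotClass might look like the following minimal
sketch. The field layout here mirrors the existing VolumeSnapshotClass API, and the
driver name is the same placeholder used in the import example later in this post:

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-groupSnapclass
driver: com.example.csi-driver   # placeholder CSI driver name
deletionPolicy: Delete           # clean up the content object and storage-side snapshot when the VolumeGroupSnapshot is deleted
```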

The Volume Group Snapshot objects are defined as CustomResourceDefinitions (CRDs).
These CRDs must be installed in a Kubernetes cluster for a CSI Driver to support
volume group snapshots.

## How do I use Kubernetes Volume Group Snapshots

The Volume Group Snapshot feature is implemented in the
[external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) repository. Implementing volume
group snapshots meant adding or changing several components:

* New Kubernetes Volume Group Snapshot CRDs
* Volume group snapshot controller logic, added to the common snapshot controller
* Volume group snapshot validation webhook logic, added to the common snapshot validation webhook
* Logic to make CSI calls, added to the CSI Snapshotter sidecar controller

The volume snapshot controller, CRDs, and validation webhook are deployed once per
cluster, while the sidecar is bundled with each CSI driver.

Therefore, it makes sense to deploy the volume snapshot controller, CRDs, and validation
webhook as a cluster addon. It is strongly recommended that Kubernetes distributors
bundle and deploy the volume snapshot controller, CRDs, and validation webhook as part
of their Kubernetes cluster management process (independent of any CSI Driver).

### Creating a new group snapshot with Kubernetes

Once a VolumeGroupSnapshotClass object is defined and you have volumes you want to
snapshot together, you may create a new group snapshot by creating a VolumeGroupSnapshot
object.

The source of the group snapshot specifies whether the underlying group snapshot
should be dynamically created or if a pre-existing VolumeGroupSnapshotContent
should be used. One of the following members in the source must be set:

* Selector - specifies a label query over the PersistentVolumeClaims that are to be grouped together for snapshotting. This label selector is used to match labels added to PVCs.
* VolumeGroupSnapshotContentName - specifies the name of a pre-existing VolumeGroupSnapshotContent object representing an existing volume group snapshot.

For dynamic provisioning, a selector must be set so that the snapshot controller can
find PVCs with the matching labels to be snapshotted together.

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: new-group-snapshot-demo
  namespace: demo-namespace
spec:
  volumeGroupSnapshotClassName: csi-groupSnapclass
  source:
    selector:
      group: myGroup
```

In the VolumeGroupSnapshot spec, a user can specify the VolumeGroupSnapshotClass which
has the information about which CSI driver should be used for creating the group snapshot.
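
For the selector above to match anything, each PersistentVolumeClaim that should be
snapshotted as part of the group needs to carry the corresponding label. A minimal
sketch, with a hypothetical PVC name and StorageClass:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume                     # hypothetical PVC name
  namespace: demo-namespace
  labels:
    group: myGroup                      # matched by the selector in the VolumeGroupSnapshot above
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-storage-class   # hypothetical CSI StorageClass
  resources:
    requests:
      storage: 10Gi
```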

### Importing an existing group snapshot with Kubernetes

You can always import an existing group snapshot to Kubernetes by manually creating
a VolumeGroupSnapshotContent object to represent the existing group snapshot.
Because VolumeGroupSnapshotContent is a non-namespaced API object, only a system admin
may have the permission to create it. Once a VolumeGroupSnapshotContent object is
created, the user can create a VolumeGroupSnapshot object pointing to the
VolumeGroupSnapshotContent object.

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotContent
metadata:
  name: pre-existing-group-snap-content1
spec:
  driver: com.example.csi-driver
  deletionPolicy: Delete
  source:
    volumeGroupSnapshotHandle: group-snap-id
  volumeGroupSnapshotRef:
    kind: VolumeGroupSnapshot
    name: pre-existing-group-snap1
    namespace: demo-namespace
```

A VolumeGroupSnapshot object should be created to allow a user to use the group snapshot:

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: pre-existing-group-snap1
  namespace: demo-namespace
spec:
  source:
    volumeGroupSnapshotContentName: pre-existing-group-snap-content1
```

Once these objects are created, the snapshot controller will bind them together,
and set the field `status.ready` to `"True"` to indicate the group snapshot is ready
to use.

### How to use group snapshot for restore in Kubernetes

At restore time, the user can request a new PersistentVolumeClaim to be created from
a VolumeSnapshot object that is part of a VolumeGroupSnapshot. This will trigger
provisioning of a new volume that is pre-populated with data from the specified
snapshot. The user should repeat this until all volumes are created from all the
snapshots that are part of a group snapshot.
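
As a sketch, restoring one member volume could look like the following PersistentVolumeClaim;
the VolumeSnapshot name, StorageClass, and size here are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data-volume            # hypothetical name for the restored volume
  namespace: demo-namespace
spec:
  storageClassName: csi-storage-class   # hypothetical CSI StorageClass
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: data-volume-snapshot          # a VolumeSnapshot that is part of the group snapshot (hypothetical name)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```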

## As a storage vendor, how do I add support for group snapshots to my CSI driver?

To implement the volume group snapshot feature, a CSI driver MUST:

* Implement a new group controller service.
* Implement group controller RPCs: `CreateVolumeGroupSnapshot`, `DeleteVolumeGroupSnapshot`, and `GetVolumeGroupSnapshot`.
* Add group controller capability `CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT`.

See the [CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md)
and the [Kubernetes-CSI Driver Developer Guide](https://kubernetes-csi.github.io/docs/)
for more details.

Although Kubernetes is as minimally prescriptive as possible about the packaging and
deployment of a CSI Volume Driver, it provides a suggested mechanism to deploy a
containerized CSI driver to simplify the process.

As part of this recommended deployment process, the Kubernetes team provides a number of
sidecar (helper) containers, including the
[external-snapshotter sidecar container](https://kubernetes-csi.github.io/docs/external-snapshotter.html)
which has been updated to support volume group snapshot.

The external-snapshotter watches the Kubernetes API server for
`VolumeGroupSnapshotContent` objects and triggers `CreateVolumeGroupSnapshot` and
`DeleteVolumeGroupSnapshot` operations against a CSI endpoint.

## What are the limitations?

The alpha implementation of volume group snapshots for Kubernetes has the following
limitations:

* Does not support reverting an existing PVC to an earlier state represented by a snapshot that is part of a group snapshot (only supports provisioning a new volume from a snapshot).
* No application consistency guarantees beyond any guarantees provided by the storage system (e.g. crash consistency).

## What’s next?

Depending on feedback and adoption, the Kubernetes team plans to push the CSI
Group Snapshot implementation to Beta in either 1.28 or 1.29.
Some of the features we are interested in supporting include volume replication,
replication group, volume placement, application quiescing, changed block tracking, and more.

## How can I learn more?

The design spec for the volume group snapshot feature is [here](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot).

The code repository for volume group snapshot APIs and controller is [here](https://github.com/kubernetes-csi/external-snapshotter).

Check out additional documentation on the group snapshot feature [here](https://kubernetes-csi.github.io/docs/).

## How do I get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors
from diverse backgrounds working together. On behalf of SIG Storage, I would like to
offer a huge thank you to the contributors who stepped up these last few quarters
to help the project reach alpha:

* Alex Meade ([ameade](https://github.com/ameade))
* Ben Swartzlander ([bswartz](https://github.com/bswartz))
* Humble Devassy Chirammal ([humblec](https://github.com/humblec))
* James Defelice ([jdef](https://github.com/jdef))
* Jan Šafránek ([jsafrane](https://github.com/jsafrane))
* Jing Xu ([jingxu97](https://github.com/jingxu97))
* Michelle Au ([msau42](https://github.com/msau42))
* Niels de Vos ([nixpanic](https://github.com/nixpanic))
* Rakshith R ([Rakshith-R](https://github.com/Rakshith-R))
* Raunak Shah ([RaunakShah](https://github.com/RaunakShah))
* Saad Ali ([saad-ali](https://github.com/saad-ali))
* Thomas Watson ([rbo54](https://github.com/rbo54))
* Xing Yang ([xing-yang](https://github.com/xing-yang))
* Yati Padia ([yati1998](https://github.com/yati1998))

We also want to thank everyone else who has contributed to the project, including others
who helped review the [KEP](https://github.com/kubernetes/enhancements/pull/1551)
and the [CSI spec PR](https://github.com/container-storage-interface/spec/pull/519).

For those interested in getting involved with the design and development of CSI or
any part of the Kubernetes Storage system, join the
[Kubernetes Storage Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-storage) (SIG).
We always welcome new contributors.

We also hold regular [Data Protection Working Group meetings](https://docs.google.com/document/d/15tLCV3csvjHbKb16DVk-mfUmFry_Rlwo-2uG6KNGsfw/edit#).
New attendees are welcome to join our discussions.
