en: add pd-recover doc (pingcap#250)

* en: add pd-recover doc * Update TOC.md * update wording * Apply suggestions from code review Co-authored-by: DanielZhangQD <[email protected]> * address comments Co-authored-by: Lilian Lee <[email protected]> * Update en/pd-recover.md Co-authored-by: Lilian Lee <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> Co-authored-by: Lilian Lee <[email protected]>
zhenjiaogao · May 8, 2020 · bcac7ce · bcac7ce
1 parent c03af2d
commit bcac7ce
Show file tree

Hide file tree

Showing 2 changed files with 199 additions and 1 deletion.
diff --git a/en/TOC.md b/en/TOC.md
@@ -33,12 +33,13 @@
   - [Monitor TiDB Using Helm](monitor-a-tidb-cluster.md)
   - [Monitor TiDB Using TidbMonitor](monitor-using-tidbmonitor.md)
 + Maintain
-  - [Destroy a TiDB cluster](destroy-a-tidb-cluster.md)
+  - [Destroy a TiDB Cluster](destroy-a-tidb-cluster.md)
   - [Restart a TiDB Cluster](restart-a-tidb-cluster.md)
   - [Maintain a Hosting Kubernetes Node](maintain-a-kubernetes-node.md)
   - [Collect TiDB Logs](collect-tidb-logs.md)
   - [Enable Automatic Failover](use-auto-failover.md)
   - [Enable Admission Controller](enable-admission-webhook.md)
+  - [Use PD Recover to Recover the PD Cluster](pd-recover.md)
 + Scale
   - [Scale](scale-a-tidb-cluster.md)
   - [Enable Auto-scaling](enable-tidb-cluster-auto-scaling.md)

diff --git a/en/pd-recover.md b/en/pd-recover.md
@@ -0,0 +1,197 @@
+---
+title: Use PD Recover to Recover the PD Cluster
+summary: Learn how to use PD Recover to recover the PD cluster.
+category: reference
+---
+
+# Use PD Recover to Recover the PD Cluster
+
+[PD Recover](https://pingcap.com/docs/stable/reference/tools/pd-recover) is a disaster recovery tool of [PD](https://pingcap.com/docs/stable/architecture/#placement-driver-server), used to recover the PD cluster which cannot start or provide services normally.
+
+## Download PD Recover
+
+1. Download the official TiDB package:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    wget https://download.pingcap.org/tidb-${version}-linux-amd64.tar.gz
+    ```
+
+    In the command above, `${version}` is the version of the TiDB cluster, such as `v4.0.0-rc`.
+
+2. Unpack the TiDB package for installation:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tar -xzf tidb-${version}-linux-amd64.tar.gz
+    ```
+
+    `pd-recover` is in the `tidb-${version}-linux-amd64/bin` directory.
+
+## Recover the PD cluster
+
+This section introduces how to recover the PD cluster using PD Recover.
+
+### Get Cluster ID
+
+{{< copyable "shell-regular" >}}
+
+```shell
+kubectl get tc ${cluster_name} -n ${namespace} -o='go-template={{.status.clusterID}}{{"\n"}}'
+```
+
+Example:
+
+```
+kubectl get tc test -n test -o='go-template={{.status.clusterID}}{{"\n"}}'
+6821434242797747735
+```
+
+### Get Alloc ID
+
+When you use `pd-recover` to recover the PD cluster, you need to specify `alloc-id`. The value of `alloc-id` must be larger than the largest allocated ID (`Alloc ID`) of the original cluster.
+
+1. Access the Prometheus monitoring data of the TiDB cluster by taking steps in [Access the monitoring data](monitor-a-tidb-cluster.md#access-the-monitoring-data).
+
+2. Enter `pd_cluster_id` in the input box and click the `Execute` button to make a query. Get the largest value in the query result.
+
+3. Multiply the largest value in the query result by `100`. Use the multiplied value as the `alloc-id` value specified when using `pd-recover`.
+
+### Recover the PD Pod
+
+1. Delete the Pod of the PD cluster.
+
+    Execute the following command to set the value of `spec.pd.replicas` to `0`:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl edit tc ${cluster_name} -n ${namespace}
+    ```
+
+    Because the PD cluster is in an abnormal state, TiDB Operator cannot synchronize the change above to the PD StatefulSet. You need to execute the following command to set the `spec.replicas` of the PD StatefulSet to `0`.
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl edit sts ${cluster_name}-pd -n ${namespace}
+    ```
+
+    Execute the following command to confirm that the PD Pod is deleted:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl get pod -n ${namespace}
+    ```
+
+2. After confirming that all PD Pods are deleted, execute the following command to delete the PVCs bound to the PD Pods:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl delete pvc -l app.kubernetes.io/component=pd,app.kubernetes.io/instance=${cluster_name} -n ${namespace}
+    ```
+
+3. After the PVCs are deleted, scale out the PD cluster to one Pod:
+
+    Execute the following command to set the value of `spec.pd.replicas` to `1`:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl edit tc ${cluster_name} -n ${namespace}
+    ```
+
+    Because the PD cluster is in an abnormal state, TiDB Operator cannot synchronize the change above to the PD StatefulSet. You need to execute the following command to set the `spec.replicas` of the PD StatefulSet to `1`.
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl edit sts ${cluster_name}-pd -n ${namespace}
+    ```
+
+    Execute the following command to confirm that the PD Pod is started:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl get pod -n ${namespace}
+    ```
+
+### Recover the cluster
+
+1. Execute the `port-forward` command to expose the PD service:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl port-forward -n ${namespace} svc/${cluster_name}-pd 2379:2379
+    ```
+
+2. Open a **new** terminal tab or window, enter the directory where `pd-recover` is located, and execute the `pd-recover` command to recover the PD cluster:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    ./pd-recover -endpoints http://127.0.0.1:2379 -cluster-id ${cluster_id} -alloc-id ${alloc_id}
+    ```
+
+    In the command above, `${cluster_id}` is the cluster ID got in [Get Cluster ID](#get-cluster-id). `${alloc_id}` is the largest value of `pd_cluster_id` (got in [Get Alloc ID](#get-alloc-id)) multiplied by `100`.
+
+    After the `pd-recover` command is successfully executed, the following result is printed:
+
+    ```shell
+    recover success! please restart the PD cluster
+    ```
+
+3. Go back to the window where the `port-forward` command is executed, press <kbd>Ctrl</kbd>+<kbd>C</kbd> to stop and exit.
+
+### Restart the PD Pod
+
+1. Delete the PD Pod:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl delete pod ${cluster_name}-pd-0 -n ${namespace}
+    ```
+
+2. After the Pod is started successfully, execute the `port-forward` command to expose the PD service:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    kubectl port-forward -n ${namespace} svc/${cluster_name}-pd 2379:2379
+    ```
+
+3. Open a **new** terminal tab or window, execute the following command to confirm the Cluster ID is the one got in [Get Cluster ID](#get-cluster-id).
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    curl 127.0.0.1:2379/pd/api/v1/cluster
+    ```
+
+4. Go back to the window where the `port-forward` command is executed, press <kbd>Ctrl</kbd>+<kbd>C</kbd> to stop and exit.
+
+### Increase the capacity of the PD cluster
+
+Execute the following command to set the value of `spec.pd.replicas` to the desired number of Pods:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+kubectl edit tc ${cluster_name} -n ${namespace}
+```
+
+### Restart TiDB and TiKV
+
+{{< copyable "shell-regular" >}}
+
+```shell
+kubectl delete pod -l app.kubernetes.io/component=tidb,app.kubernetes.io/instance=${cluster_name} -n ${namespace} &&
+kubectl delete pod -l app.kubernetes.io/component=tikv,app.kubernetes.io/instance=${cluster_name} -n ${namespace}
+```