37 changes: 37 additions & 0 deletions etcdctl/README.md
@@ -1129,6 +1129,43 @@ DOWNGRADE CANCEL cancels the ongoing downgrade action to cluster.
./etcdctl downgrade cancel
Downgrade cancel success, cluster version 3.5
```
### DIAGNOSIS

`etcdctl diagnosis [flags]` - Collects and analyzes troubleshooting data from a running etcd cluster.

The `diagnosis` command gathers a concise set of diagnostic details from each cluster member by performing several checks, including:

* **Membership checks**: Verifies the cluster membership information.
* **Endpoint status**: Retrieves the status of each endpoint.
* **Serializable and linearizable reads**: Performs read operations to validate data consistency.
* **Metrics snapshot**: Collects a small snapshot of key metrics.

#### Flags

- `--cluster`: use all endpoints discovered from the cluster member list.
- `--etcd-storage-quota-bytes`: expected etcd storage quota in bytes (value passed to etcd with `--quota-backend-bytes`).
Review comment (Member):

For etcd versions >= 3.6.0, we can get the quota from the endpoint status query. For versions < 3.6.0, users still need to provide a value for this flag.

It's OK to enhance this in a follow-up PR.

- `-o, --output`: optional file path to write the JSON report; by default the report is written to stdout. Logs are written to stderr.

Global flags (like `--endpoints`, TLS, auth, and timeouts) are shared with other `etcdctl` commands. See `etcdctl options` for the full list.

#### Examples

To analyze a running etcd cluster, run `etcdctl diagnosis` against its endpoints. The command collects and analyzes data from every endpoint it is given.

```bash
etcdctl diagnosis --endpoints=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379 \
--cacert ./ca.crt --key ./etcd-diagnosis.key --cert ./etcd-diagnosis.crt

# Use cluster-discovered endpoints
etcdctl diagnosis --cluster

# Write report to a file (logs still go to stderr)
etcdctl diagnosis -o report.json
```


Example output: see [ctlv3/command/diagnosis/examples/etcd_diagnosis_report.json](ctlv3/command/diagnosis/examples/etcd_diagnosis_report.json)


## Concurrency commands

40 changes: 40 additions & 0 deletions etcdctl/ctlv3/command/diagnosis/engine/diagnosis.go
@@ -0,0 +1,40 @@
// Copyright 2025 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package engine

import (
	"encoding/json"

	"go.etcd.io/etcd/etcdctl/v3/ctlv3/command/diagnosis/engine/intf"
)

// report is the top-level JSON document produced by Diagnose.
type report struct {
	Input   any   `json:"input,omitempty"`
	Results []any `json:"results,omitempty"`
}

// Diagnose runs the provided plugins in order and returns the input
// together with the collected results as an indented JSON report.
func Diagnose(input any, plugins []intf.Plugin) ([]byte, error) {
	rp := report{
		Input: input,
	}
	for _, plugin := range plugins {
		result := plugin.Diagnose()
		rp.Results = append(rp.Results, result)
	}

	return json.MarshalIndent(rp, "", "\t")
}
31 changes: 31 additions & 0 deletions etcdctl/ctlv3/command/diagnosis/engine/intf/plugin.go
@@ -0,0 +1,31 @@
// Copyright 2025 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package intf

type Plugin interface {
	// Name returns the name of the plugin.
	Name() string
	// Diagnose performs the diagnosis and returns the result. If the
	// diagnosis cannot be performed for any reason, the returned result
	// includes the detailed reason (see FailedResult).
	Diagnose() any
}

// FailedResult is the result returned by a plugin if it fails to
// perform the diagnosis for any reason.
type FailedResult struct {
	Name   string `json:"name"`
	Reason string `json:"reason"`
}
181 changes: 181 additions & 0 deletions etcdctl/ctlv3/command/diagnosis/examples/etcd_diagnosis_report.json
@@ -0,0 +1,181 @@
{
"input": {
"endpoints": [
"http://127.0.0.1:2379"
],
"useClusterEndpoints": true,
"dial-timeout": 2000000000,
"command-timeout": 5000000000,
"keep-alive-time": 2000000000,
"keep-alive-timeout": 5000000000,
"insecure": true,
"insecure-discovery": true,
"db-quota-bytes": 2147483648
},
"results": [
{
"name": "membershipChecker",
"memberList": {
"header": {
"cluster_id": 17237436991929493444,
"member_id": 9372538179322589801,
"raft_term": 2
},
"members": [
{
"ID": 9372538179322589801,
"name": "infra1",
"peerURLs": [
"http://127.0.0.1:12380"
],
"clientURLs": [
"http://127.0.0.1:2379"
]
},
{
"ID": 10501334649042878790,
"name": "infra2",
"peerURLs": [
"http://127.0.0.1:22380"
],
"clientURLs": [
"http://127.0.0.1:22379"
]
},
{
"ID": 18249187646912138824,
"name": "infra3",
"peerURLs": [
"http://127.0.0.1:32380"
],
"clientURLs": [
"http://127.0.0.1:32379"
]
}
]
}
},
{
"name": "epStatusChecker",
"summary": [
"Successful"
],
"epStatusList": [
{
"endpoint": "http://127.0.0.1:2379",
"epStatus": {
"header": {
"cluster_id": 17237436991929493444,
"member_id": 9372538179322589801,
"revision": 1,
"raft_term": 2
},
"version": "3.5.9",
"dbSize": 98304,
"leader": 18249187646912138824,
"raftIndex": 8,
"raftTerm": 2,
"raftAppliedIndex": 8,
"dbSizeInUse": 98304
}
},
{
"endpoint": "http://127.0.0.1:22379",
"epStatus": {
"header": {
"cluster_id": 17237436991929493444,
"member_id": 10501334649042878790,
"revision": 1,
"raft_term": 2
},
"version": "3.5.9",
"dbSize": 98304,
"leader": 18249187646912138824,
"raftIndex": 8,
"raftTerm": 2,
"raftAppliedIndex": 8,
"dbSizeInUse": 98304
}
},
{
"endpoint": "http://127.0.0.1:32379",
"epStatus": {
"header": {
"cluster_id": 17237436991929493444,
"member_id": 18249187646912138824,
"revision": 1,
"raft_term": 2
},
"version": "3.5.9",
"dbSize": 98304,
"leader": 18249187646912138824,
"raftIndex": 8,
"raftTerm": 2,
"raftAppliedIndex": 8,
"dbSizeInUse": 98304
}
}
]
},
{
"name": "serializableReadChecker",
"summary": "Successful",
"readResponses": [
{
"endpoint": "http://127.0.0.1:2379",
"took": "686.5µs"
},
{
"endpoint": "http://127.0.0.1:22379",
"took": "1.129291ms"
},
{
"endpoint": "http://127.0.0.1:32379",
"took": "1.034625ms"
}
]
},
{
"name": "linearizableReadChecker",
"summary": "Successful",
"readResponses": [
{
"endpoint": "http://127.0.0.1:2379",
"took": "1.286333ms"
},
{
"endpoint": "http://127.0.0.1:22379",
"took": "890.417µs"
},
{
"endpoint": "http://127.0.0.1:32379",
"took": "1.257791ms"
}
]
},
{
"name": "metricsChecker",
"summary": [
"Successful"
],
"epMetricsList": [
{
"endpoint": "http://127.0.0.1:2379",
"took": "3.752625ms",
"epMetrics": {
"etcd_disk_backend_commit_duration_seconds_bucket": [
"etcd_disk_backend_commit_duration_seconds_bucket{le=\"0.001\"} 0"
],
"etcd_disk_wal_fsync_duration_seconds_bucket": [
"etcd_disk_wal_fsync_duration_seconds_bucket{le=\"0.001\"} 0"
],
"etcd_network_peer_round_trip_time_seconds_bucket": [
"etcd_network_peer_round_trip_time_seconds_bucket{To=\"91bc3c398fb3c146\",le=\"0.0001\"} 2"
],
"process_resident_memory_bytes": null
}
}
]
}
]
}
32 changes: 32 additions & 0 deletions etcdctl/ctlv3/command/diagnosis/plugins/common/checker.go
@@ -0,0 +1,32 @@
// Copyright 2025 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package common

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Checker carries shared configuration for diagnosis plugins.
// It holds generic options such as the etcd client configuration,
// the resolved endpoints, the command timeout, the expected backend
// quota in bytes, and the plugin name.
type Checker struct {
	Cfg            *clientv3.ConfigSpec
	Endpoints      []string
	CommandTimeout time.Duration
	DbQuotaBytes   int
	Name           string
}
40 changes: 40 additions & 0 deletions etcdctl/ctlv3/command/diagnosis/plugins/common/client.go
@@ -0,0 +1,40 @@
// Copyright 2025 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package common

import (
	"go.uber.org/zap"

	"go.etcd.io/etcd/client/pkg/v3/logutil"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// NewClient creates an etcd client from the given configuration spec.
func NewClient(cfg *clientv3.ConfigSpec) (*clientv3.Client, error) {
	lg, _ := logutil.CreateDefaultZapLogger(zap.InfoLevel)
	cliCfg, err := clientv3.NewClientConfig(cfg, lg)
	if err != nil {
		return nil, err
	}
	return clientv3.New(*cliCfg)
}

// ConfigWithEndpoint returns a shallow copy of cfg with Endpoints set to the
// provided single endpoint.
func ConfigWithEndpoint(cfg *clientv3.ConfigSpec, ep string) *clientv3.ConfigSpec {
	c := *cfg
	c.Endpoints = []string{ep}
	return &c
}