OPCT-263: docs: review installation review guide removing etcd log #154

Open: wants to merge 1 commit into base `main`

docs/devel/byo-plugin.md: 4 changes (0 additions, 4 deletions)
@@ -31,7 +31,3 @@ vi /tmp/openshift-kube-conformance.yaml
```
./opct run --plugin /tmp/openshift-kube-conformance.yaml --plugin /tmp/openshift-conformance-validated.yaml
```

<!-- ## BYO Plugin from scratch

> TBD -->
docs/guides/cluster-validation/installation-review.md: 86 changes (3 additions, 83 deletions)
@@ -126,91 +126,11 @@ This section documents how to run dense disk tests using `fio`.

This section provides a guide to checking etcd slow requests in the logs of the etcd pods, to understand how etcd performs while the e2e tests are running.

The steps below use a utility `insights-ocp-etcd-logs` to parse the logs, aggregate the requests into 100ms buckets from 200ms to 1s, and report them on stdout.
The command `opct adm parse-etcd-logs` reads the logs, aggregates the requests, and reports them in 100ms buckets from 200ms to 1s.

`insights-ocp-etcd-logs` is the utility to help you to troubleshoot the slow requests in your cluster, and help make some decisions like changing the flavor of the block device used by the control plane, increasing IOPS, changing the flavor of the instances, etc.
`opct adm parse-etcd-logs` is the utility to help troubleshoot slow requests in the cluster and help make decisions such as changing the flavor of the block device used by the control plane, increasing IOPS, changing the flavor of the instances, etc.

Comment on lines +131 to 132

Copilot AI Mar 3, 2025

[nitpick] Consider removing the redundant 'to' in 'help to troubleshoot'; using 'help troubleshoot' makes the sentence more concise.

Suggested change
`opct adm parse-etcd-logs` is the utility to help to troubleshoot the slow requests in the cluster, and help make decisions like changing the flavor of the block device used by the control plane, increasing IOPS, changing the flavor of the instances, etc.
`opct adm parse-etcd-logs` is the utility to help troubleshoot the slow requests in the cluster, and help make decisions like changing the flavor of the block device used by the control plane, increasing IOPS, changing the flavor of the instances, etc.

There is no magic or target number, but for reference, based on observations from integrated platforms, the goal is to have no more than 30-40% of requests above 500ms while running the conformance tests.

- Export the location where your must-gather has been extracted:

```bash
export MUST_GATHER_PATH=${PWD}/must-gather.local.2905984348081335046
```

- Overall report:

> Note: This report may not be useful, depending on how old the logs are. We recommend looking at the next report, which aggregates by hour, so you can check the time frame in which the validation environment was executed.

```bash
$ cat ${MUST_GATHER_PATH}/*/namespaces/openshift-etcd/pods/*/etcd/etcd/logs/current.log \
| ./opct adm parse-etcd-logs
> Filter Name: ApplyTookTooLong
> Group by: all
>>> Summary <<<
all 16949
>500ms 1485 (8.762 %)
---
>>> Buckets <<<
low-200 0 (0.000 %)
200-300 9340 (55.106 %)
300-400 4169 (24.597 %)
400-500 1853 (10.933 %)
500-600 716 (4.224 %)
600-700 223 (1.316 %)
700-800 185 (1.092 %)
800-900 139 (0.820 %)
900-1s 79 (0.466 %)
1s-inf 143 (0.844 %)
unkw 102 (0.602 %)
```
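
To make the bucket boundaries in the overall report concrete, here is a minimal awk sketch that reproduces the same layout from a plain list of durations. It is a hypothetical helper (`bucket_latencies`), not how opct implements the command, and it assumes the input has already been reduced to one duration in milliseconds per line; extracting those values from real etcd log lines is left out. As in the sample report, requests in the 500-600 bucket and above count toward `>500ms`, and percentages are relative to all lines, including `unkw`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of opct: bucket_latencies.
# Input: one request duration in milliseconds per line (e.g. "234.5").
# Output: the summary and 100ms-bucket layout used by the sample report.
bucket_latencies() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    NF == 1 && $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
      ms = $1 + 0
      total++
      if (ms >= 500) slow++               # 500-600 bucket and above
      if (ms < 200) {
        b = "low-200"
      } else if (ms >= 1000) {
        b = "1s-inf"
      } else {
        lo = int(ms / 100) * 100          # e.g. 234.5 -> 200
        b = lo "-" (lo == 900 ? "1s" : lo + 100)
      }
      count[b]++
      next
    }
    { total++; count["unkw"]++ }          # anything that is not a plain number
    END {
      if (total == 0) exit
      printf "all %d\n", total
      printf ">500ms %d (%.3f %%)\n", slow, 100 * slow / total
      n = split("low-200 200-300 300-400 400-500 500-600 600-700 700-800 800-900 900-1s 1s-inf unkw", order, " ")
      for (i = 1; i <= n; i++)
        printf "%s %d (%.3f %%)\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }
  '
}
```

For example, `printf '%s\n' 150 234.5 950 oops | bucket_latencies` places one request each in `low-200`, `200-300`, `900-1s`, and `unkw`.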

- Report aggregated by hour:

```bash
$ cat ${MUST_GATHER_PATH}/*/namespaces/openshift-etcd/pods/*/etcd/etcd/logs/current.log \
| ./opct adm parse-etcd-logs -aggregator hour
> Filter Name: ApplyTookTooLong
> Group by: hour

>> 2023-03-01T17
>>> Summary <<<
all 558
>500ms 54 (9.677 %)
---
>>> Buckets <<<
low-200 0 (0.000 %)
200-300 385 (68.996 %)
300-400 90 (16.129 %)
400-500 28 (5.018 %)
500-600 9 (1.613 %)
600-700 10 (1.792 %)
700-800 7 (1.254 %)
800-900 9 (1.613 %)
900-1s 16 (2.867 %)
1s-inf 3 (0.538 %)
unkw 1 (0.179 %)
(...)
>> 2023-03-01T16
>>> Summary <<<
all 8651
>500ms 812 (9.386 %)
---
>>> Buckets <<<
low-200 0 (0.000 %)
200-300 4833 (55.866 %)
300-400 1972 (22.795 %)
400-500 983 (11.363 %)
500-600 328 (3.791 %)
600-700 135 (1.561 %)
700-800 111 (1.283 %)
800-900 75 (0.867 %)
900-1s 48 (0.555 %)
1s-inf 115 (1.329 %)
unkw 51 (0.590 %)
```

The values in the output are a reference for expected results: most of the slow requests reported in the logs (>=200ms) should be under 500ms while the tests are executing.
See the command [`opct adm parse-etcd-logs`](./opct/adm/parse-etcd-logs.md) for more information.
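
The reference range mentioned earlier (no more than 30-40% of requests above 500ms) can be checked mechanically. The sketch below is a hypothetical helper, `check_slow_share`, that assumes the `opct adm parse-etcd-logs` output looks like the sample reports above: a `>500ms <count> (<pct> %)` summary line, optionally preceded by a `>> <hour>` header when aggregating by hour. It flags any group whose share exceeds the given percentage; the default of 30 is just the lower end of that reference range, not an official limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of opct: check_slow_share [THRESHOLD_PERCENT]
# Reads parse-etcd-logs style output on stdin and prints one ok/FAIL line
# per ">500ms" summary line; returns non-zero if any group exceeds the threshold.
check_slow_share() {
  local threshold="${1:-30}"
  awk -v max="$threshold" '
    /^>>/ && !/Summary|Buckets/ { group = $2 }   # hour header, e.g. ">> 2023-03-01T17"
    $1 == ">500ms" {
      pct = $3
      gsub(/\(/, "", pct)                         # "(9.677" -> "9.677"
      status = (pct + 0 > max + 0) ? "FAIL" : "ok"
      if (status == "FAIL") failed = 1
      printf "%s %s %.3f%%\n", status, (group == "" ? "all" : group), pct
    }
    END { exit failed }
  '
}
```

For example, piping the hourly report into `check_slow_share 30` prints one `ok` or `FAIL` line per hour and exits with a non-zero status if any hour has more than 30% of its slow requests above 500ms.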

#### Mount /var/lib/etcd in separate disk <a name="components-etcd-mount"></a>
