Write timeouts panel is not filtered by the scheduling group #2432

Open
piodul opened this issue Dec 17, 2024 · 3 comments

piodul commented Dec 17, 2024

scylla-version=2024.1
monitoring-version=4.6.2,4.8.3
dashboard=detailed-2024-1

The "Write Timeouts/Seconds" panel's Prometheus query is:

$func(rate(scylla_storage_proxy_coordinator_write_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]])

It does not filter by the scheduling group name. Some writes are asynchronous and Scylla-initiated (e.g. hinted handoff writes) and do not directly affect the user workload. The current query counts them together with user writes, which might confuse users and give them the impression that their own writes are timing out.

I believe that this panel should filter by scheduling group name, considering that many panels on the same dashboard do filter by it.
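
For illustration, a filtered version of the query above might look roughly like this. This is only a sketch: it assumes the metric carries a `scheduling_group_name` label and that the dashboard defines a `$sg` template variable for selecting the scheduling group. Neither name is confirmed in this issue, so both should be checked against the panels that already filter by scheduling group.

```promql
# Sketch only: the scheduling_group_name label and the $sg variable are assumptions.
$func(rate(scylla_storage_proxy_coordinator_write_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]", scheduling_group_name=~"$sg"}[1m])) by ([[by]])
```

With a matcher like this, selecting only the user-facing scheduling group would exclude Scylla-initiated writes such as hinted handoff from the panel.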

mykaul (Contributor) commented Dec 17, 2024

> monitoring-version=4.6.2

This is an old version.

piodul (Author) commented Dec 18, 2024

> monitoring-version=4.6.2
>
> This is an old version.

I didn't see any issues about it being fixed, so I created an issue. I can check if the problem is also present on the newest version.

piodul (Author) commented Dec 18, 2024

I confirm that the issue is also present on 4.8.3. The Prometheus query is different in that version:

topk([[topk]], $func(rate(scylla_storage_proxy_coordinator_write_timeouts{instance=~"[[node]]",cluster="$cluster", dc=~"$dc", shard=~"[[shard]]"}[$__rate_interval])) by ([[by]])) or on ([[by]]) bottomk([[bottomk]], $func(rate(scylla_storage_proxy_coordinator_write_timeouts{instance=~"[[node]]",cluster="$cluster", dc=~"$dc", shard=~"[[shard]]"}[$__rate_interval])) by ([[by]]))

but it also doesn't seem to filter by the scheduling group.

amnonh added this to the Monitoring 4.9 milestone Dec 18, 2024