show when a repair is paused #2187

Open
Tracked by #3744
amnonh opened this issue Feb 24, 2024 · 12 comments
Labels
enhancement New feature or request User-request

Comments

@amnonh
Collaborator

amnonh commented Feb 24, 2024

It could be useful to see if a repair process is paused.

@amnonh amnonh added the enhancement New feature or request label Feb 24, 2024
@tzach
Contributor

tzach commented Feb 25, 2024

@amnonh

  1. Do we have a metric to support this?
  2. What is the best way to present this?
    For Enterprise (but not for Cloud), there could be more than one repair task running in parallel.

@Michal-Leszczynski

Michal-Leszczynski commented Feb 26, 2024

We could add an SM metric such as scylla_manager_scheduler_status, labeled by cluster_id, task_type, and task_id, which would store an enum of all possible task statuses:

```go
// Status is the lifecycle state of a scheduler task.
type Status string

// Status enumeration.
const (
	StatusNew      Status = "NEW" // 1
	StatusRunning  Status = "RUNNING"
	StatusStopping Status = "STOPPING"
	StatusStopped  Status = "STOPPED"
	StatusWaiting  Status = "WAITING"
	StatusDone     Status = "DONE"
	StatusError    Status = "ERROR"
	StatusAborted  Status = "ABORTED" // 8
)
```
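As a minimal sketch of how the proposed gauge could encode these strings as numbers, the Go program below maps each status to the values 1 through 8 in declaration order (following the "// 1" and "// 8" hints above) and prints one sample in Prometheus text exposition format. Only the metric name and its labels come from the proposal; statusValue, exposition, and the example IDs c1 and t1 are hypothetical, and this is not Scylla Manager's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Status mirrors the scheduler's task status enumeration.
type Status string

const (
	StatusNew      Status = "NEW"
	StatusRunning  Status = "RUNNING"
	StatusStopping Status = "STOPPING"
	StatusStopped  Status = "STOPPED"
	StatusWaiting  Status = "WAITING"
	StatusDone     Status = "DONE"
	StatusError    Status = "ERROR"
	StatusAborted  Status = "ABORTED"
)

// statusValue is a hypothetical mapping from status to gauge value,
// assuming declaration order (NEW = 1 ... ABORTED = 8).
var statusValue = map[Status]float64{
	StatusNew: 1, StatusRunning: 2, StatusStopping: 3, StatusStopped: 4,
	StatusWaiting: 5, StatusDone: 6, StatusError: 7, StatusAborted: 8,
}

// exposition renders one sample of the proposed gauge in Prometheus text format.
// Label names are sorted so the output is deterministic; %q quoting is adequate
// for the simple ASCII label values used here.
func exposition(labels map[string]string, s Status) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("scylla_manager_scheduler_status{%s} %g",
		strings.Join(parts, ","), statusValue[s])
}

func main() {
	line := exposition(map[string]string{
		"cluster_id": "c1",
		"task_type":  "repair",
		"task_id":    "t1",
	}, StatusStopped)
	fmt.Println(line)
	// scylla_manager_scheduler_status{cluster_id="c1",task_id="t1",task_type="repair"} 4
}
```

Because the labels carry no host or table dimension, this yields one series per task rather than one per table.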

@amnonh
Collaborator Author

amnonh commented Feb 28, 2024

I have concerns about the explosion of metrics from the manager when we have lots of tables.
It's not helpful for the users; we cannot visualize thousands of metrics. We need to have an aggregated version.

@Michal-Leszczynski Michal-Leszczynski self-assigned this Mar 1, 2024
@Michal-Leszczynski

The proposed metric wouldn't be labeled by host or table, so it shouldn't generate many entries.

@tarzanek

tarzanek commented Mar 4, 2024

Some clusters can have 1200 tables.

@ruthea

ruthea commented Mar 4, 2024

> some clusters can have 1200 tables

Does this request need to factor in tables? Or is this simply a boolean on whether there is a pause on repair for a cluster?

@tzach
Contributor

tzach commented Mar 4, 2024

@tarzanek, does anyone you know care about table-by-table repair?
The Scylla Manager default is to repair all tables.
It is possible to change this, but I'm not sure anyone does, and certainly not in Scylla Cloud.

@tarzanek

tarzanek commented Mar 5, 2024

Usually people want whole-cluster stats, so @tzach is right:
per-table repair is a rare activity.

@tzach
Contributor

tzach commented Mar 5, 2024

Thanks @tarzanek

Looks like we have 3 separate issues:

  • Enable cluster-level metrics
  • Disable table-level metrics
  • Show paused repairs

@amnonh
Collaborator Author

amnonh commented Jun 24, 2024

@Michal-Leszczynski, how are we doing with regard to those metrics? This may be related to #2307.

@Michal-Leszczynski

I suppose we can put scylladb/scylla-manager#3744 in the 3.3.1 milestone.
@karol-kokoszka, what do you think?

@karol-kokoszka

Let's bring this issue to refinement.
We haven't defined any ETA for 3.3.1, so we can include whatever we think is valid to fix.


6 participants