
Releases: scylladb/scylla-monitoring

Branch 2.4

02 Jul 13:18

New

  • Adding Scylla 3.1 dashboards #594
  • Adding the error Dashboard #576
  • Add a graph for ad-hoc queries #610
  • Allow changing aggregation modes #520
  • State which metrics are coordinators and which replicas #521
  • Allow external directory for grafana #597
  • Improved genconfig using nodetool #613
  • Version per node table #151
  • Add a warning on low disk space on the root partition #627
  • Add alert when a node is leaving the cluster #623
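The last two items are Prometheus alert rules. As an illustrative sketch only (the alert name, threshold, and labels below are assumptions, not the rules shipped in the release), a low-root-disk alert of this kind could look like:

```yaml
# Hypothetical sketch of a low-disk-space alert; names and threshold are
# assumptions, not the shipped prometheus rules file.
groups:
  - name: scylla.rules
    rules:
      - alert: RootDiskLow
        # node_exporter (>= 0.16) exposes free/total bytes per mounted filesystem;
        # older versions use node_filesystem_avail / node_filesystem_size instead
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Root partition on {{ $labels.instance }} has less than 10% free space"
```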

Bug Fixes

  • cql-optimization shows N/A for cross dc when there are no reads in a DC #608
  • wrong metrics for compaction shares #604
  • the group by shard is incorrect #337
  • scylla manager metric scylla_manager_cluster_cql_rtt_ms renamed #637
  • Manager 1.4 dashboard is broken because of cql metrics rename #661
  • cql optimization cross shard is off and should be removed #659

Branch 2.3.1

19 May 12:51

Bug Fixes

  • Compactions I/O Queue delay by Shard should be filtered by mountpoint as well #588
  • scylla manager panels not showing information when there are multiple keyspaces #602
  • wrong metrics for compaction shares #604

Branch 2.3

29 Apr 06:31

New in Scylla Monitoring Stack 2.3

  • Scylla Enterprise dashboards for 2019.1 (#538)
  • Scylla Manager dashboard for 1.4 (#557)
  • Add cross_shard_ops panel to cql optimization (#553)
  • Dashboards are precompiled in the release
  • Cluster name in use is shown in the dashboard (#533)
  • genconfig.py with multi dc support (#513)
  • Add a storage usage over time panel (#466)
  • Upgrade prometheus to 2.7.2 (#456)
  • Show more information about compaction (#491)
  • Alertmanager and Prometheus alerts can be configured from the command line
  • Warn users when starting docker as root and make grafana volume sharable
  • Add a disk usage over time graph (#466)
  • Prometheus data directory accepts a relative path (#527)
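As a usage sketch for the command-line items above (the exact flag names are assumptions, not verified against this release; check `./start-all.sh --help`), generating targets and starting the stack with a relative Prometheus data directory might look like:

```shell
# Sketch only -- flag names are assumptions, not verified against this release.
# Generate Prometheus target files from nodetool output (now multi-DC aware, #513):
nodetool status | ./genconfig.py

# Start the stack, pointing Prometheus at a relative data directory (#527):
./start-all.sh -d ./prometheus-data
```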

Bug Fixes

  • Prometheus.rules.yaml NoCQL rule looks for cql_status metric (#541)
  • not all 2018.1 use cluster and dc (#540)

Branch 2.2

19 Mar 08:44

New In 2.2

  • CQL optimization dashboard (#471)
  • Unified target files for Scylla and node_exporter (#378)
  • Per machine (node_exporter related) dashboard added to Enterprise (#495)
  • Prometheus container uses the current user ID and group (#487)
  • Kill-all kills Prometheus instances gracefully (#438)
  • Start-all.sh now supports --version flag (#374)
  • Remove the version from the dashboard names (#486)
  • Dashboard loaded from API should have overwrite true (#474)
  • Update alertmanager to 0.16 (#478)
Bug Fixes

  • Moved the node_exporter relabeling to metric_relabeling (#497)
  • Fixed units in foreground writes (#463)
  • Manager dashboard was missing UUID (#505)
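The --version flag added in #374 gives a quick way to check which monitoring stack version is deployed without starting the containers; a minimal usage sketch:

```shell
# Print the monitoring stack version instead of starting the stack
./start-all.sh --version
```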

Branch 2.1

10 Feb 11:08

Main changes:

  • Move to Grafana 5
  • Use local files for configuration and provisioning
  • Minor bug fixes
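Grafana 5 introduced file-based provisioning, which is what the local-files item refers to. A minimal dashboard-provider sketch in Grafana 5's provisioning format (the provider name and path here are assumptions, not the files shipped with the release):

```yaml
# Hypothetical Grafana 5 dashboard provisioning file; name and path are assumptions.
apiVersion: 1
providers:
  - name: 'scylla-dashboards'
    orgId: 1
    type: file
    disableDeletion: false
    options:
      # Directory Grafana scans for dashboard JSON files
      path: /var/lib/grafana/dashboards
```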

Branch 2.0

26 Dec 08:07
scylla-monitoring-2.0

Fixed a missing closing bracket in dropped view updates

Branch 1.1.0

12 Aug 08:41
Pre-release
disk usage should be per node (#360)

This series sets the disk-usage pie chart to be per node, so that the repeated
panel shows per-server usage.

Signed-off-by: Amnon Heiman <[email protected]>

scylla-monitoring-1.0.0

05 Jul 14:37
Adding a new cpu dashboard (#336)


Replaces: enhance per server dashboard with useful metrics

Adding a new dashboard that specializes in CPU load
 - Adding a graph with foreground CPU utilization, that is, the CPU used by
   request processing, excluding compaction, flushes, and other background work.
   The reason for this is that users are usually scared of spikes. Even if we tell
   them that spikes are fine because they are the result of isolatable background
   processes, it is hard to *prove* that without further analysis. This graph will help.

 - Time spent in violations: a lot of the latency issues we see, especially in
   the higher percentiles, come from task quota violations. We now have a metric
   for this, and it will help us correlate latency spikes in time.

 - Client connections: in the past few months, this has been *THE* top metric we
   have been looking at to detect problems. It hurts us a lot that it is not
   part of the main dashboard.

In the process of doing the above, I am also doing my best to document the new
graphs. The text will appear in the tooltip in the top left corner of the graph.