Releases: scylladb/scylla-monitoring

Release 4.8.1

30 Sep 05:56

Bug fixes

  • start-all.sh --target-directory option has an error in its documentation #2398
  • Some panels in the OS dashboard do not respect the DC filter #2396
  • Unknown Alternator OP - BatchGetItemSize #2393
  • Compression-related panels are confusing #2392
  • Using stack graph for active read is confusing #2389
  • Row and Partitions insertions are measured in read/sec #2386
  • The latency legend format is confusing #2383
  • scylla_io_queue_flow_ratio graph is inconvenient #2382
  • Add batch latency and batch size metrics to Alternator dashboard #2380
  • Alternator OPs are not representative of real ops in the case of BatchGetItem and similar batch ops #2374

Release 4.8.0

19 Aug 11:35

New In Release 4.8.0

  • Support for Scylla Manager 3.3 #2339
  • Make the Tablet section collapsible #2329
  • Add panels for network compression #2325
  • Add filters that limit the number of results per panel [breaking-changes] #2319
  • Add a graph for scylla_io_queue_flow_ratio #2306
  • Make the IO-group panel group by iogroup, stream #2305
  • Tooltip now allows scrolling #2209
  • Add metrics for RPC #2104
  • Unify Scylla-Manager status and progress #2009
  • Different aggregation functions for the latency metrics #1741

Bug Fixes

  • Non-Paged CQL Reads Gauge isn't working. #2295
  • "I/O Group All Queue consumption" dashboard uses the wrong type of graph. #2293
  • Increase nodes table column width to display full ip by default #2302
  • Fix panels description in the advanced dashboard #2290
  • Full page screenshot is broken #2324
  • Make genconfig support ipv6

Operational changes

  • splitBrain alert support for a multi-cluster setup #2304
  • Allow setting local network and docker_pram from env file #2035
  • scylla_storage_proxy_coordinator_read_timeouts repeated twice in regexp. #2323
  • make prometheus the default datasource #2268
  • Deprecated level label #2322
  • The os dashboard accepts multiple node_exporter jobs #2317
  • Support sampling with various Prometheus scrape intervals [breaking-changes] #2345

Breaking Changes

  • The dashboards now support longer Prometheus scrape intervals, which are configurable and passed as a parameter in the Grafana data source configuration.
  • To better handle clusters with high core counts, the dashboards limit the number of series shown by default. You can change that limit from the drop-down menu at the top.
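
To illustrate why the scrape interval has to be passed to the data source, here is a minimal Python sketch. It is not taken from the Scylla Monitoring code. It uses the formula Grafana documents for its `$__rate_interval` variable, max($__interval + scrape interval, 4 * scrape interval); whether the Scylla dashboards rely on exactly this variable is an assumption, and the function and parameter names are hypothetical.

```python
def rate_interval(panel_interval_s: float, scrape_interval_s: float) -> float:
    """Return a rate() window in seconds, following Grafana's documented
    $__rate_interval formula: max($__interval + scrape, 4 * scrape).

    panel_interval_s  -- the panel's step ($__interval), in seconds
    scrape_interval_s -- the Prometheus scrape interval configured on the
                         data source, in seconds
    """
    return max(panel_interval_s + scrape_interval_s, 4 * scrape_interval_s)


if __name__ == "__main__":
    # Compare a short and a long scrape interval for the same panel step.
    for scrape in (20, 60):
        window = rate_interval(15, scrape)
        print(f"scrape interval {scrape}s -> rate window {window}s")
```

With a 15-second panel step, a 20-second scrape interval gives an 80-second window, while a 60-second scrape interval gives 240 seconds. A window sized for the shorter interval would often hold fewer than two samples under the longer one, which is why the interval needs to be known to the dashboards.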

Release 4.7.2

08 May 08:26

Bug Fixes in Release 4.7.2

  • Alternator, complete the move to summaries #2278
  • Wrong port number for node manager agents in prometheus/prometheus.consul.yml.template when using Docker #2277

Release 4.7.1

21 Apr 06:37

Bug fixes in 4.7.1

  • The bloom filter alert causes too many false positives #2263
  • Non-token aware queries graph (and gauge) is broken. #2259
  • Service level selection is not carried over between dashboards #2253
  • Hints manager sent annotation, uses the wrong metric #2250
  • Add cluster label to manager base metrics #2270

Release 4.7.0

04 Apr 13:49

New in Release 4.7.0

  • Update alternator dashboard #2226
  • Make the default dashboard refresh interval configurable #2220
  • Show scylla_sstables_bloom_filter_memory_size on the detailed dashboard #2219
  • Update Alternator latencies histogram and summaries #2214
  • Combine the Advisor table with the alert table in the overview dashboard #2166
  • Easier method to run multiple monitoring stacks side-by-side #2164
  • Add ethtool metrics to Datadog integration #2163
  • Add tablet metrics to the detailed dashboard #2119, #2111
  • Add storage-related metrics #2044
  • New alert - cluster in split-brain state #1677
  • Enhanced experience with --archive command line flag #2158, #2177
  • The explanation for the unified class group graph is not clear #2178

Bug fixes

  • No closing parenthesis #2229
  • The variable $sg is not defined. #2228
  • Prometheus continues to trigger alerts for a node that has already been removed from scylla_servers.yml #2227
  • read-timeouts in the overview dashboard are breaking when no cdc metrics are reported #2193
  • Manager metrics are inconsistent #2191
  • Version information is cut - although there's plenty of space available in the panel #2189
  • Reads panel does not reflect shards #2171
  • Overview page - no data [write latency, Read timeout by DC] #2162
  • Manager memory metrics interfere with the OS ones #2198
  • The actual interval for calculating metrics is greater than the one specified in evaluation_interval. #2087

Operational changes

  • start-all.sh optionally skip alertmanager #2239
  • Allow an easy way to start Prometheus with protobuf support #2155
  • Regex for empty string |$^ in dashboards #2192
  • prometheus/prometheus.yml.template: set evaluation interval to 20s #2185
  • Improved experience when working with Archive #2177
  • start-all.sh: create a file with the parameters of the last run operation #2174
  • remove the deprecated level label #2160
  • Performance and security enhancements #2154
  • Allow setting local network from env file #2035

scylla-monitoring-4.6.2

12 Feb 13:58
Pre-release

Release 4.6.2

Release 4.6.1

23 Jan 13:39

New in release 4.6.1

  • Alert severity for repair and backup failures changed to warn #2151
  • Update Grafana version to 10.2.3

Bug Fixes

  • support custom port when using podman #2152

Release 4.6.0

04 Jan 10:22

New in Release 4.6.0

  • Add scylla_io_queue_consumption plots #2088
  • Create a metric that shows cache hit/miss rate per table #285
  • Add a section that shows all scheduling groups on the same graph #2121
  • Add logged/unlogged batches graphs #2081
  • The 'Timeouts' item in the general monitoring dashboard has no description #2056
  • Add a "CQL Connections (creation) rate" graph #2053
  • Alert Manager Rule: add a too-many-files alert #2060

Bug Fixes

  • By Instance, doesn't work on any of the dashboards #2138
  • "Request Shed" is supposed to be in the "Coordinator" section #2124
  • Request/Response payload sizes units are wrong #2083

Operational Changes

  • support IPV6 without specifying ports #2136
  • Prometheus does not start with an external directory, and sudo #2134
  • Experimental - Use compose to start the monitoring stack #2123
  • scylla-overview: Do not rely on recording rules for picking the scheduling group #2120
  • remove thrift from all calculations #2102
  • Need to support node_exporter ports #2092

Release 4.5.1

12 Nov 07:55

Bug fixes in 4.5.1

  • Grafana renderer is missing #2097

Alerts severities update

  • AlertManager: TooManyFiles generates false alarms for a partition with the OS #2113
  • NoCql alerts should be 'info' #2105
  • InstanceDown alerts should be 'error' #2101

Release 4.5.0

22 Oct 08:30

New In Scylla Monitoring 4.5.0

  • move to grafana 10.1.5 and prometheus 2.47.1 #2023
  • Support multiple priority groups in the Overview dashboard #2013
  • Split the read-timeout into categories #2085
  • Expose all columns of system.large_xx tables #1991
  • Track UJ node progress #2076

Bug fixes

  • Bogus value on the Repair Progress graph #2019
  • Timeouts are not properly displayed #2085

Operational Changes

  • support taking targets from directory #1995
  • Monitoring stack doesn't respect manager_agent.prometheus config parameter #1992
  • Support for Apple Silicon chips #1956
  • Easier datadog integration #1998
  • Remove reverse order read warnings #1498