Document Oplog/hour for Mongo dashboards #3448

Merged
Binary file modified documentation/docs/images/PMM_MongoDB_Cluster_Summary.jpg
@@ -1,35 +1,61 @@
# MongoDB Cluster Summary
# MongoDB Sharded Cluster Summary

![!image](../../images/PMM_MongoDB_Cluster_Summary.jpg)

## Current Connections Per Shard
## Overview

TCP connections (Incoming) in mongod processes.
Displays essential data for individual nodes, such as their role, CPU usage, memory consumption, disk space, network traffic, uptime, and the current MongoDB version.

## Total Connections
## Node States
Shows the state timeline of MongoDB replica set members during the selected time range. Each node's state (PRIMARY, SECONDARY, ARBITER, etc.) is color-coded for easy monitoring, with green indicating healthy states and red showing potential issues.

Incoming connections to mongos nodes.
Use this to track role changes and identify stability problems across your replica set.

## Cursors Per Shard
## Collection Details

The Cursor is a MongoDB Collection of the document which is returned upon the find method execution.
### Size of Collections in Shards
Visualizes the storage size distribution across MongoDB collections in different shards, excluding system databases. Use this metric to monitor space utilization across collections and plan capacity based on storage growth patterns in your MongoDB cluster.

## Mongos Cursors
### Number of Collections in Shards
Displays the total number of collections per database across different shards in your MongoDB cluster, excluding system databases.

The Cursor is a MongoDB Collection of the document which is returned upon the find method execution.
Use this to track collection growth and identify databases that may need optimization based on their collection count.

## Operations Per Shard
## Connections

Ops/sec, classified by legacy wire protocol type (`query`, `insert`, `update`, `delete`, `getmore`).
### Current Connections Per Shard
Displays the current number of incoming TCP connections for each MongoDB shard, showing trends over time with mean, maximum, and minimum values.

## Total Mongos Operations
Use this to monitor connection patterns and ensure your MongoDB cluster maintains healthy connection levels across all shards.

Ops/sec, classified by legacy wire protocol type (`query`, `insert`, `update`, `delete`, `getmore`).
### Available Connections
Tracks the number of available MongoDB connections across your replica sets over time, with statistical breakdowns.

## Change Log Events
Use this metric to monitor connection capacity and ensure your MongoDB cluster maintains sufficient connection availability for client requests.

Count, over last 10 minutes, of all types of configuration db changelog events.
## Chunks in Shards

## Oplog Range by Set
### Amount of Chunks in Shards
Displays the number of chunks distributed across each shard in your MongoDB cluster, excluding system databases. Use this to monitor data distribution and identify potential balancing needs across your sharded cluster.
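
One way to read this panel is to look at the spread between the fullest and emptiest shard. The sketch below is only an illustration: `chunk_imbalance` and the shard names are hypothetical, and it does not model how MongoDB's balancer actually decides when to migrate, which depends on the server version.

```python
from typing import Dict


def chunk_imbalance(chunks_per_shard: Dict[str, int]) -> int:
    """Return the gap between the largest and smallest per-shard chunk count.

    A hypothetical helper for interpreting the panel, not the dashboard's query.
    """
    if not chunks_per_shard:
        return 0
    counts = list(chunks_per_shard.values())
    return max(counts) - min(counts)


# Example values are made up for illustration.
sample = {"shard0": 120, "shard1": 118, "shard2": 90}
spread = chunk_imbalance(sample)
```

A spread that keeps growing over time may be worth correlating with the chunk move and split panels.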

Timespan 'window' between oldest and newest ops in the Oplog collection.
### Dynamic of Chunks
Shows the rate of change in chunk distribution across MongoDB shards over time, with statistical breakdowns for each shard. Use this to monitor chunk migration patterns and ensure proper data balancing across your sharded cluster.

### Chunks Move Events
Displays the frequency of chunk movement operations between shards in your MongoDB cluster over time. Use this metric to track balancing activity and identify periods of high chunk migration that might impact cluster performance.

### Chunks Split Events
Shows the rate at which chunks are being split across your MongoDB sharded cluster due to size growth. Use this metric to identify when collections grow rapidly and determine if you need to rebalance or optimize shard keys.

## Replication

### Replication Lag by Shard
Tracks the maximum replication delay (in seconds) between primary and secondary nodes for each shard in your MongoDB cluster.

Use this to monitor replication health and detect when secondaries fall too far behind their primary nodes.

### Oplog Range by Shard
Shows the time window between the oldest and newest operations in the MongoDB oplog for each shard. Use this to monitor oplog capacity and ensure there's enough history for replica set members to sync after maintenance or failures.

### Oplog GB/Hour
Shows how much oplog data the Primary generates per hour. Use this to track oplog growth, plan storage needs, and detect high-write periods. Values are displayed in bytes with hourly intervals.
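
For capacity planning, an hourly oplog figure can be turned into a rough retention estimate. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, a gigabyte is taken as 1024³ bytes, and the write rate is assumed to stay constant. It is not the query the dashboard itself uses.

```python
from typing import Optional

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def bytes_to_gb(n_bytes: float) -> float:
    """Convert a byte count (as shown in the panel) to gigabytes."""
    return n_bytes / BYTES_PER_GB


def estimated_window_hours(oplog_size_bytes: float,
                           generated_bytes_per_hour: float) -> Optional[float]:
    """Estimate how many hours of history the oplog can hold.

    Assumes a steady write rate; returns None when no oplog data is generated.
    """
    if generated_bytes_per_hour <= 0:
        return None
    return oplog_size_bytes / generated_bytes_per_hour


# Example: a 50 GB oplog with 2 GB generated per hour (made-up values).
hours = estimated_window_hours(50 * BYTES_PER_GB, 2 * BYTES_PER_GB)
```

If the estimate is shorter than your longest expected maintenance window, a larger oplog may be worth considering.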
@@ -2,26 +2,86 @@

![!image](../../images/PMM_MongoDB_ReplSet_Summary.jpg)

## Replication Lag
## Overview
Displays essential data for individual nodes, such as their role, CPU usage, memory consumption, disk space, network traffic, uptime, and the current MongoDB version.

MongoDB replication lag occurs when the secondary node cannot replicate data fast enough to keep up with the rate that data is being written to the primary node. It could be caused by something as simple as network latency, packet loss within your network, or a routing issue.
## Node States
Shows the state timeline of MongoDB replica set members during the selected time range. Each node's state (PRIMARY, SECONDARY, ARBITER, etc.) is color-coded for easy monitoring, with green indicating healthy states and red showing potential issues. Use this to track role changes and identify stability problems across your replica set.

## Operations - by service name
## Details

Operations are classified by legacy wire protocol type (insert, update, and delete only).
### Command Operations
Shows the rate of MongoDB operations per second, including both regular and replicated operations (query, insert, update, delete, getmore), as well as document deletions by TTL indexes. Use this metric to monitor database activity patterns and identify potential performance bottlenecks.

## Max Member Ping Time - by service name
### Top Hottest Collections by Read
Shows the five MongoDB collections with the highest read operations per second. Use this to identify your most frequently accessed collections and optimize their performance.

This metric can show a correlation with the replication lag value.
### Top Hottest Collections by Write
Shows the five MongoDB collections with the highest write operations (inserts, updates, and deletes) per second. Use this to identify your most frequently modified collections and optimize their write performance.

## Max Heartbeat Time
### Query Efficiency
Shows the ratio of documents or index entries scanned versus documents returned. A ratio of 1 indicates optimal query performance where each scanned document matches the query criteria.

Time span between now and last heartbeat from replicaset members.
Higher values suggest less efficient queries that scan many documents to find matches. Use this to identify queries that might need index optimization.
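
The ratio described above can be sketched as a small function. The name `query_efficiency` and the sample numbers are hypothetical; the dashboard computes its value from exporter metrics that are not shown here.

```python
from typing import Optional


def query_efficiency(scanned: int, returned: int) -> Optional[float]:
    """Ratio of documents (or index entries) scanned to documents returned.

    Returns None when nothing was returned, because the ratio is undefined.
    """
    if returned == 0:
        return None
    return scanned / returned


ideal = query_efficiency(100, 100)      # every scanned document matched
wasteful = query_efficiency(5000, 50)   # many documents scanned per match
```

A value such as the second one, far above 1, is the kind of reading that usually points at a missing or poorly matching index.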

## Elections
### Queued Operations
Shows the number of operations waiting because the database is busy with other operations. Use this to identify when MongoDB operations are being delayed due to resource conflicts.

Count of elections. Usually zero; 1 count by each healthy node will appear in each election. Happens when the primary role changes due to either normal maintenance or trouble events.
### Reads & Writes
Shows both active and queued read/write operations in your MongoDB deployment. Use this to monitor database activity and identify when operations are being delayed due to high load.

## Oplog Recovery Window - by service name
### Connections
Shows the number of current and available MongoDB connections. Use this to monitor connection usage and ensure your deployment has sufficient capacity for new client connections.
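
To judge capacity from these two values, it can help to express them as a utilization fraction. This is a minimal sketch: `connection_utilization` is a hypothetical helper, and it assumes that current plus available connections equals the configured connection limit.

```python
from typing import Optional


def connection_utilization(current: int, available: int) -> Optional[float]:
    """Fraction of the connection limit in use.

    Assumes current + available equals the total number of allowed connections.
    """
    total = current + available
    if total == 0:
        return None
    return current / total


# Example values are made up: 200 connections in use, 600 still available.
used_fraction = connection_utilization(200, 600)
```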

Timespan 'window' between newest and the oldest op in the Oplog collection.
### Query Execution Times
Shows the average latency in microseconds (µs) for read, write, and command operations. Use this metric to monitor query performance and identify slow operations that may need optimization.

## Collection Details

### Size of Collections
Shows storage size of MongoDB collections across different databases. Use this to monitor database growth and plan storage capacity needs.

### Number of Collections
Shows the total number of collections in each MongoDB database. Use this to track database organization and growth patterns.

## Replication

### Replication Lag
Shows how many seconds Secondary nodes are behind the Primary in replicating data. Higher values indicate potential issues with network latency or system resources. The red threshold line at 10 seconds helps identify when lag requires attention.
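
The lag value and the 10-second threshold can be illustrated with a short sketch. The function names, member names, and timestamps below are hypothetical; the sketch assumes lag is the difference between the Primary's and each Secondary's last applied operation time, which may not match the exact expression used by the panel.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

LAG_THRESHOLD_SECONDS = 10.0  # mirrors the red threshold line on the panel


def replication_lag_seconds(primary_optime: datetime,
                            secondary_optime: datetime) -> float:
    """Seconds a Secondary is behind the Primary, never below zero."""
    return max(0.0, (primary_optime - secondary_optime).total_seconds())


def members_over_threshold(primary_optime: datetime,
                           secondary_optimes: Dict[str, datetime],
                           threshold: float = LAG_THRESHOLD_SECONDS) -> Dict[str, float]:
    """Return only the Secondaries whose lag exceeds the threshold."""
    lags = {
        name: replication_lag_seconds(primary_optime, optime)
        for name, optime in secondary_optimes.items()
    }
    return {name: lag for name, lag in lags.items() if lag > threshold}


# Made-up sample: one Secondary 2 s behind, another 25 s behind.
primary = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
secondaries = {
    "rs0-b": primary - timedelta(seconds=2),
    "rs0-c": primary - timedelta(seconds=25),
}
flagged = members_over_threshold(primary, secondaries)
```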

### Oplog Recovery Window
Shows the time range (in seconds) between the newest and oldest operations in the oplog. Use this to ensure sufficient history is maintained for recovery and secondary synchronization.

### Oplog GB/Hour
Shows how much oplog data the Primary generates per hour. Use this to track oplog growth, plan storage needs, and detect high-write periods. Values are displayed in bytes with hourly intervals.

## Performance

### Flow Control
Shows the frequency and duration (in microseconds) of MongoDB write throttling. Use this to understand when your deployment is slowing down writes to keep replication lag under control.

### WiredTiger Concurrency Tickets Available
Shows how many more read and write operations your MongoDB deployment can handle simultaneously. Use this to monitor database concurrency limits and potential bottlenecks.

## Nodes Summary

### Nodes Overview
Shows key system metrics for each node: uptime, load average, memory usage, disk space, and more. Use this table to monitor the health and resource utilization of your infrastructure at a glance.

## CPU Usage
Shows CPU utilization as a percentage of total capacity, broken down by user and system activity. Use this to monitor CPU load and identify potential performance bottlenecks.

## CPU Saturation

### CPU Saturation and Max Core Usage
Shows how many processes are waiting for CPU time, alongside the utilization of the busiest core. Use this to identify when your system needs more CPU capacity or when processes are competing for CPU time.

## Disk I/O and Swap Activity
Shows disk I/O operations (reads/writes) and memory swap activity for each MongoDB node, measuring data flow between storage and RAM.

Use this metric to monitor storage performance, detect memory pressure, and identify when MongoDB's working set may exceed available RAM.

## Network Traffic
Shows inbound and outbound network traffic for each MongoDB node, measuring data flow in bytes per second.

Use this metric to monitor bandwidth usage, identify unusual traffic patterns, and detect potential network bottlenecks that could affect replication performance.