Why do we need observability?

Traditional monitoring and metrics tools were designed around assumptions that suited simpler, more static architectures, such as a monolithic application running on a fixed number of hosts with a single database. These tools rely heavily on system and instrumentation metrics and are mainly reactive: operations teams use them to detect failures after they happen, with a focus on uptime and failure prevention.

However, modern software architectures are far more complex. They involve many loosely coupled microservices, multiple types of databases (polyglot persistence), highly dynamic infrastructure that scales elastically, and services your team may not directly control. In such environments, engineers proactively watch changes in production, instrument their own code, and continuously track the impact of each deployment. Automatic instrumentation alone is no longer enough to understand how the system behaves.

Observability vs. Monitoring

Monitoring typically looks at a limited set of predefined metrics to detect known issues. Observability, on the other hand, deals with high-dimensional, high-cardinality data collected across many diverse components and systems. High-cardinality data includes fields such as user or request IDs that can take millions of distinct values. By correlating volumes of data that traditional monitoring cannot handle, observability helps engineers understand complex interactions and uncover hidden issues. It supports proactive investigation by explaining why problems happen and how different parts of the system relate, rather than only what failed and when.

How does Observability work?

Observability rests on three pillars:
1. Logs
2. Metrics
3. Traces

Logs

Logs are records of the events and messages a software system produces.
Each service in the system produces its own logs, and we store them to keep detailed information about what the system did.
Logs become less useful as they age. We therefore keep them only as long as they are needed, then delete or archive them to reduce storage overhead.
Log retention policies are rules that automatically expire log data after a specified duration.
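A log retention policy can be sketched in a few lines of Python. This is a minimal illustration with several assumptions:

- Logs are held in memory as `LogRecord` objects (a hypothetical type) with timezone-aware timestamps.
- The 30-day window in the usage example is arbitrary.
- Real backends such as Loki or Elasticsearch apply retention inside their storage layer. A real system would also delete or archive expired entries rather than return them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class LogRecord:  # hypothetical record type for this sketch
    timestamp: datetime  # timezone-aware time the event was logged
    service: str         # which service produced the log line
    message: str


def apply_retention(
    records: List[LogRecord],
    retention: timedelta,
    now: Optional[datetime] = None,
) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Split records into (kept, expired) based on a retention window.

    Anything older than `now - retention` is expired. A real system would
    delete or archive these records instead of returning them.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - retention
    kept = [r for r in records if r.timestamp >= cutoff]
    expired = [r for r in records if r.timestamp < cutoff]
    return kept, expired


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    logs = [
        LogRecord(now - timedelta(days=40), "checkout", "payment timeout"),
        LogRecord(now - timedelta(days=3), "checkout", "order placed"),
        LogRecord(now - timedelta(hours=1), "cart", "item added"),
    ]
    # Hypothetical 30-day retention policy.
    kept, expired = apply_retention(logs, retention=timedelta(days=30), now=now)
    print(f"kept={len(kept)} expired={len(expired)}")  # kept=2 expired=1
```

Records exactly at the cutoff are kept. Whether the boundary is inclusive is a design choice that real retention settings may handle differently.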

Tools used for logging

Grafana Loki
- Loki is a log aggregation system. Adding Loki as a data source in Grafana lets you query and visualize the logs it stores.
Elastic Beats
- Beats are lightweight data shippers in the Elastic (ELK) stack. Each one sends a particular type of data to Elasticsearch, either directly or through Logstash.
- Types:
  - Filebeat: for logs (similar in purpose to Loki)
  - Metricbeat: for metrics (used in monitoring)
  - Heartbeat: for uptime monitoring (checks whether a service is reachable)
  - Auditbeat: for audit data (user activity)
  - Packetbeat: for network data (to inspect traffic)
  - Winlogbeat: for Windows event logs

Metrics

Unlike logs, metrics are measurements that provide a snapshot of a system's performance at a specific point in time.
Examples include CPU and memory consumption.
Alerts can be configured so that when a metric crosses a given threshold, a notification is sent to the developer who needs to fix the issue. The notification can be a Slack message, a phone call, and so on.
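The threshold-alert idea can be sketched as follows. In Prometheus this is normally written as an alerting rule, and Alertmanager routes the resulting notification. The Python below only mimics the logic; it is not how either tool is implemented. Assumptions:

- The names `AlertRule`, `evaluate`, and the metric `cpu_usage_percent` are hypothetical.
- The `for_samples` parameter is an assumption added so that a single spike does not trigger an alert; the alert fires only after several consecutive breaches.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertRule:  # hypothetical rule type for this sketch
    metric: str        # hypothetical metric name, e.g. "cpu_usage_percent"
    threshold: float   # the alert fires when values go above this
    for_samples: int   # consecutive breaching samples required before firing

    def __post_init__(self) -> None:
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")


def evaluate(rule: AlertRule, samples: List[float], notify: Callable[[str], None]) -> bool:
    """Call `notify` if the last `for_samples` samples all exceed the threshold.

    Returns True when the alert fires. `notify` stands in for a real channel
    such as a Slack message or a phone call.
    """
    recent = samples[-rule.for_samples:]
    firing = len(recent) == rule.for_samples and all(v > rule.threshold for v in recent)
    if firing:
        notify(
            f"ALERT: {rule.metric} above {rule.threshold} "
            f"for {rule.for_samples} samples (latest={recent[-1]})"
        )
    return firing


if __name__ == "__main__":
    sent: List[str] = []
    rule = AlertRule("cpu_usage_percent", threshold=90.0, for_samples=3)
    evaluate(rule, [45.0, 92.5, 95.1, 97.3], notify=sent.append)  # fires
    evaluate(rule, [92.5, 60.0, 97.3], notify=sent.append)        # does not fire
    print(sent)
```

This sketch notifies on every evaluation while the condition holds. Deduplicating and silencing repeated notifications is the kind of work a dedicated component such as Alertmanager takes on.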

Tools

Prometheus
Zabbix
Grafana Mimir (like Prometheus, but built for a larger scale)

Note

To make Prometheus highly available, we can use a tool called Thanos.

Traces

A trace is the complete journey of a request or workflow as it moves from one part of the system to another.
Each trace carries two pieces of metadata:
- Span ID: generated by an application every time it reports an event.
- Trace ID: passed along by each application if the data it receives already contains one. If it does not, a new trace ID is generated and reported.
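Trace ID propagation can be sketched as below. This is a simplified illustration, not how any particular tracer is implemented. Assumptions:

- The header names `x-trace-id` and `x-parent-span-id` are hypothetical. The W3C Trace Context standard uses a single `traceparent` header instead.
- The parent span ID is an extra field, added to show how spans link into a tree.
- The `reported` list stands in for a tracing backend such as Jaeger, Zipkin, or Grafana Tempo.
- The ID lengths follow W3C Trace Context sizes: 16 bytes for the trace ID and 8 bytes for the span ID.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical header names for this sketch; the W3C Trace Context
# standard carries both IDs in a single "traceparent" header instead.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


@dataclass
class Span:
    trace_id: str                  # shared by every span in one request
    span_id: str                   # unique to this reported event
    parent_span_id: Optional[str]  # span of the caller, None for the root
    service: str
    operation: str


def handle(service: str, operation: str, incoming: Dict[str, str],
           reported: List[Span]) -> Dict[str, str]:
    """Record a span for this hop and return the headers to forward downstream."""
    # Reuse the incoming trace ID if there is one; otherwise start a new trace.
    trace_id = incoming.get(TRACE_HEADER) or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span = Span(
        trace_id=trace_id,
        span_id=secrets.token_hex(8),  # a fresh span ID for every reported event
        parent_span_id=incoming.get(PARENT_HEADER),
        service=service,
        operation=operation,
    )
    reported.append(span)  # stands in for sending the span to a tracing backend
    return {TRACE_HEADER: trace_id, PARENT_HEADER: span.span_id}


if __name__ == "__main__":
    spans: List[Span] = []
    h1 = handle("frontend", "GET /checkout", {}, spans)  # no trace ID yet: new trace
    h2 = handle("cart", "load cart", h1, spans)          # reuses the trace ID
    handle("payment", "charge card", h2, spans)
    for s in spans:
        print(s.service, s.trace_id, s.span_id, s.parent_span_id)
```

Running it prints three spans that share one trace ID. Each span has its own span ID, and each non-root span points at the previous hop as its parent.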

Tools

Grafana Tempo
Zipkin
Jaeger

Note

For distributed tracing, a service mesh can be used; well-known examples are Linkerd and Consul Connect. Distributed tracing is most useful when an application is made up of multiple microservices, such as an online shopping web app.
