The Three Pillars of Observability

Updated on 30 Jun, 2026 (21 mins read, 266 views)

Introduction

Modern distributed systems are incredibly complex.

A single user request may travel through:

API gateways,
authentication services,
microservices,
databases,
caches,
queues,
third-party APIs,
serverless functions.

When something fails in production, engineers need answers to questions like:

What happened?
Where did the failure occur?
Why did it happen?
Which service caused it?
How many users are affected?
Is the system still healthy?

This is where Observability becomes essential.

Observability is the ability to understand the internal state of a system by analyzing the data it produces.

The foundation of observability is built on three major components known as:

The Three Pillars of Observability

Logs
Metrics
Traces

These three pillars work together to provide visibility into distributed systems.

What Are the Three Pillars of Observability?

The three pillars represent the primary types of telemetry data generated by modern systems.

Each pillar answers a different kind of question.

Pillar	Purpose
Logs	What exactly happened?
Metrics	Is the system healthy overall?
Traces	How did a request travel through the system?

Together they create a complete operational understanding of production systems.

Why These Pillars Exist

In distributed systems:

failures are partial,
logs are fragmented,
services are independent,
debugging is difficult.

No single telemetry type is sufficient.

Example:

Metrics may show latency spiked,
Logs may show database errors,
Traces may reveal which services caused the slowdown.

Only together do they provide full observability.

Pillar 1 – Logs

What Are Logs?

Logs are timestamped records of events generated by applications or infrastructure.

They describe:

system behavior,
errors,
warnings,
state changes,
request activity.

Logs are the most detailed form of telemetry.

Example Log:

{
  "timestamp": "2026-05-19T10:00:00Z",
  "level": "ERROR",
  "service": "payment-service",
  "message": "Payment processing failed",
  "correlationId": "abc123"
}

Characteristics of Logs

Logs are:

event-based,
highly detailed,
timestamped,
unstructured or structured.

Types of Logs

1 Application Logs

Generated by application code.

Example:

user login,
payment processed,
API request failed.

2 System Logs

Generated by operating systems.

Example:

disk failures,
memory warnings,
kernel events.

3 Access Logs

Track incoming requests.

Example:

HTTP request logs,
API gateway logs.

4 Audit Logs

Track security-sensitive actions.

Example:

password changes,
admin actions,
permission updates.

Log Levels

Most logging systems use levels:

Level	Meaning
DEBUG	Detailed development information
INFO	Normal system operation
WARN	Potential issues
ERROR	Failures occurred
FATAL	Critical system failure
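
As a minimal sketch of how levels act as a filter, the example below uses Python's standard `logging` module. The logger name and messages are made up for illustration. Note that Python names the top level `CRITICAL` (it keeps `FATAL` only as an alias), and the logger is assumed to be configured at `WARNING` so that `DEBUG` and `INFO` records are dropped.

```python
import io
import logging

# Write to an in-memory buffer so the emitted lines can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

# "payment-service" is a hypothetical logger name.
logger = logging.getLogger("payment-service")
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)  # anything below WARNING is discarded

logger.debug("cache lookup took 3 ms")            # dropped
logger.info("payment accepted")                   # dropped
logger.warning("retrying payment gateway call")   # kept
logger.error("payment processing failed")         # kept
logger.critical("cannot reach payment database")  # kept

emitted = buffer.getvalue().splitlines()
```

Raising or lowering the configured level is a common way to trade log volume against detail without touching the call sites.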

Structured Logging

Modern systems prefer structured logs, commonly encoded as JSON.

Why?

searchable,
filterable,
machine-readable,
scalable.

Example:

{
  "service": "auth-service",
  "level": "INFO",
  "userId": 101,
  "message": "User authenticated"
}
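
One possible way to produce a log entry like the one above is a custom formatter on top of Python's standard `logging` module. This is only a sketch: the `JsonFormatter` class and the choice of which extra fields to copy (`userId`, `correlationId`) are assumptions made for illustration, not a standard API.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Copy optional context fields passed through `extra=...`.
        for key in ("userId", "correlationId"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter("auth-service"))

logger = logging.getLogger("auth-service")
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.info("User authenticated", extra={"userId": 101})

# Each line is machine-readable, so it can be parsed and filtered by field.
parsed = json.loads(buffer.getvalue())
```

Because every entry is a JSON object, a log platform can filter on `level` or `userId` directly instead of pattern-matching free text.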

Centralized Logging

In distributed systems:

logs come from many servers,
many containers,
many services.

Centralized logging aggregates logs into one platform.
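
The core idea of aggregation can be sketched in a few lines: take several per-service log streams and merge them into one timeline. The sample entries below are invented, and the sketch assumes each stream is already sorted and uses the same UTC ISO 8601 timestamp format, so that string order matches time order.

```python
import heapq

# Hypothetical per-service streams, each already sorted by timestamp.
auth_logs = [
    {"timestamp": "2026-05-19T10:00:00Z", "service": "auth-service",
     "message": "User authenticated"},
    {"timestamp": "2026-05-19T10:00:03Z", "service": "auth-service",
     "message": "Token refreshed"},
]
payment_logs = [
    {"timestamp": "2026-05-19T10:00:01Z", "service": "payment-service",
     "message": "Payment started"},
    {"timestamp": "2026-05-19T10:00:02Z", "service": "payment-service",
     "message": "Payment processing failed"},
]


def aggregate(*streams):
    """Merge sorted log streams into a single time-ordered list."""
    return list(heapq.merge(*streams, key=lambda entry: entry["timestamp"]))


timeline = aggregate(auth_logs, payment_logs)
```

A real platform also handles shipping, indexing, and retention; this only shows why a shared timestamp format matters once logs from many sources land in one place.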

Strengths of Logs

Logs excel at:

debugging,
root cause analysis,
detailed investigation.

Weaknesses of Logs

Logs are:

noisy,
high-volume,
expensive to store,
difficult to aggregate statistically.

This is why metrics exist.

Pillar 2 – Metrics

What Are Metrics?

Metrics are numerical measurements collected over time.

They summarize system behavior.

Metrics answer:

Is the system healthy?
How much traffic exists?
Is latency increasing?
Are errors increasing?

Example Metrics

Metric	Value
CPU Usage	75%
Request Rate	10,000 req/sec
Error Rate	2%
API Latency	120 ms

Characteristics of Metrics

Metrics are:

aggregated,
lightweight,
time-series based,
highly efficient.

Types of Metrics

1 Counter

Only increases (it may reset to zero when the process restarts).

Example:

total requests,
total errors.

2 Gauge

Represents a current value that can go up or down.

Example:

CPU usage,
memory usage.
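
The difference between a counter and a gauge can be illustrated with two tiny classes. These are hand-written stand-ins for illustration, not the API of any particular metrics library, and the metric names are hypothetical.

```python
class Counter:
    """A value that can only go up."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """A value that reflects the latest measurement and may rise or fall."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


requests_total = Counter()
for _ in range(3):
    requests_total.inc()        # one increment per handled request

cpu_usage_percent = Gauge()
cpu_usage_percent.set(75.0)     # first sample
cpu_usage_percent.set(40.0)     # a later sample overwrites the earlier one
```

A counter is usually read as a rate (requests per second), while a gauge is read as-is.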

3 Histogram

Measures the distribution of observed values by counting them into buckets.

Example:

request latency ranges.
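
A histogram can be sketched as a set of bucket upper bounds plus one count per bucket. The bucket boundaries and sample latencies below are arbitrary illustrative values, and the counts here are per bucket; some systems store cumulative counts instead.

```python
import bisect


class Histogram:
    """Counts observations into buckets defined by inclusive upper bounds."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end for values above the largest bound.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        # Index of the first bound that is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.counts[index] += 1


latency_ms = Histogram([50, 100, 250, 500])
for sample in [12, 48, 75, 120, 300, 900]:
    latency_ms.observe(sample)

# Buckets: <=50, <=100, <=250, <=500, >500
bucket_counts = latency_ms.counts
```

Storing only bucket counts keeps the metric small no matter how many requests are observed, at the cost of losing the exact values.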

4 Summary

Provides percentile (quantile) statistics over the observed values.

Example:

P95 latency,
P99 latency.
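
To make P95 and P99 concrete, here is a small sketch that computes percentiles with the nearest-rank method. This is one of several common definitions; real metrics systems often use streaming approximations instead of sorting all samples. The sample data is synthetic.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))

p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Percentiles are preferred over averages for latency because a handful of very slow requests can hide behind a healthy-looking mean.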

Time-Series Data

Metrics are stored as time-series data.

Example:

10:00 → 50ms
10:01 → 55ms
10:02 → 70ms

This allows:

trend analysis,
dashboards,
alerting.
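
Using the three data points above, trend analysis and alerting can be sketched as simple functions over a list of `(time, value)` pairs. The 60 ms threshold is an arbitrary value chosen for illustration.

```python
# The example series from above: (timestamp, latency in ms).
series = [("10:00", 50), ("10:01", 55), ("10:02", 70)]


def breaches(points, threshold_ms):
    """Timestamps at which the value exceeded the alert threshold."""
    return [ts for ts, value in points if value > threshold_ms]


def deltas(points):
    """Change between consecutive points, useful for spotting trends."""
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(points, points[1:])
    ]


alerts = breaches(series, threshold_ms=60)
trend = deltas(series)
```

Here the alert would fire at 10:02, and the growing deltas suggest latency is accelerating rather than drifting.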

Strengths of Metrics

Metrics are:

lightweight,
scalable,
excellent for alerting,
ideal for dashboards.

Weaknesses of Metrics

Metrics lack detail.

Metrics may show:

latency increased,

but not:

why latency increased.

This is where logs and traces help.

Pillar 3 – Traces

What Are Traces?

Traces track the complete journey of a request across distributed systems.

They answer:

Which services were involved?
Where was latency introduced?
Which service failed?
How long did each operation take?

Example Request Flow

Client
  ↓
API Gateway
  ↓
Auth Service
  ↓
Payment Service
  ↓
Database

A trace records the entire path.

Trace IDs and Span IDs

Trace ID

Represents:

one complete distributed request.

Span ID

Represents:

one operation inside the trace.

Example:

Trace
 ├── API Gateway Span
 ├── Auth Service Span
 ├── Payment Service Span
 └── Database Span
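
The relationship between trace IDs and span IDs can be sketched with a small data class. The `Span` class and `child_of` helper are hypothetical, and the sketch assumes each hop in the request flow above is a child of the previous one. The ID lengths (32 and 16 hex characters) mirror the 16-byte trace ID and 8-byte span ID used by the W3C Trace Context format.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str                      # shared by every span in the request
    name: str
    parent_id: Optional[str] = None    # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def child_of(parent, name):
    """Create a span that belongs to the same trace as its parent."""
    return Span(trace_id=parent.trace_id, name=name, parent_id=parent.span_id)


root = Span(trace_id=uuid.uuid4().hex, name="API Gateway")
auth = child_of(root, "Auth Service")
payment = child_of(auth, "Payment Service")
database = child_of(payment, "Database")

spans = [root, auth, payment, database]
```

Because every span carries the same `trace_id`, a tracing backend can reassemble the full request path even though each service reports its span independently.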

Example Trace Insights

Tracing can reveal:

slow database queries,
failed downstream services,
retry storms,
bottlenecks.
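
One way to find a bottleneck in a trace is to compute each span's self time, meaning its duration minus the time spent in its direct children. The span data below is invented, and the sketch assumes child spans run one after another rather than in parallel.

```python
# Hypothetical spans from one trace; durations include time spent in children.
spans = [
    {"id": "a", "parent": None, "name": "API Gateway", "duration_ms": 2700},
    {"id": "b", "parent": "a", "name": "Auth Service", "duration_ms": 40},
    {"id": "c", "parent": "a", "name": "Payment Service", "duration_ms": 2600},
    {"id": "d", "parent": "c", "name": "Database", "duration_ms": 2500},
]


def self_times(spans):
    """Duration of each span minus the total duration of its direct children."""
    child_total = {}
    for span in spans:
        parent = span["parent"]
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + span["duration_ms"]
    return {
        span["name"]: span["duration_ms"] - child_total.get(span["id"], 0)
        for span in spans
    }


times = self_times(spans)
bottleneck = max(times, key=times.get)
```

Looking at self time rather than total duration avoids blaming the gateway for time that was actually spent deeper in the call chain.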

Strengths of Traces

Traces excel at:

microservices debugging,
latency analysis,
request flow visualization.

Weaknesses of Traces

Tracing systems:

can be expensive,
generate large telemetry volumes,
require instrumentation.

Relationship Between Logs, Metrics, and Traces

The three pillars complement each other.

Example Scenario

Suppose users report:

“Checkout is slow.”

Metrics Show

P95 latency increased from 200ms to 3s

Metrics reveal:

a system health issue exists.

Traces Show

Payment Service → Database query taking 2.5s

Traces identify:

where the slowdown occurred.

Logs Show

Database connection pool exhausted

Logs reveal:

the root cause.

This Is the Power of Observability

Metrics detect.

Traces locate.

Logs explain.
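
The checkout scenario can be walked through in code as a rough sketch of the detect, locate, explain workflow. Everything here is illustrative: the data is made up from the scenario above, the 1000 ms threshold is arbitrary, and the sketch assumes logs carry the trace ID in a `correlationId` field so they can be joined to spans.

```python
# Telemetry from the scenario above (hypothetical values and field names).
p95_latency_ms = {"before": 200, "now": 3000}

spans = [
    {"trace_id": "t1", "name": "Auth Service", "duration_ms": 40},
    {"trace_id": "t1", "name": "Payment Service -> Database query",
     "duration_ms": 2500},
]

logs = [
    {"correlationId": "t1", "level": "ERROR",
     "message": "Database connection pool exhausted"},
    {"correlationId": "t2", "level": "INFO",
     "message": "User authenticated"},
]


def investigate(latency, spans, logs, threshold_ms=1000):
    # Metrics detect: is latency above the alert threshold?
    if latency["now"] <= threshold_ms:
        return None
    # Traces locate: which span took the longest?
    slowest = max(spans, key=lambda span: span["duration_ms"])
    # Logs explain: error logs that share the slow span's trace ID.
    reasons = [
        entry["message"]
        for entry in logs
        if entry["correlationId"] == slowest["trace_id"]
        and entry["level"] == "ERROR"
    ]
    return {"where": slowest["name"], "why": reasons}


finding = investigate(p95_latency_ms, spans, logs)
```

The join on a shared ID is what makes the three pillars usable together; without it, each signal has to be searched separately.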

Mental Model of the Three Pillars

Think of observability like a hospital:

Pillar	Analogy
Metrics	Vital signs
Logs	Medical records
Traces	Patient journey

Each provides different visibility into system health.
