Introduction
Modern distributed systems are incredibly complex.
A single user request may travel through:
- API gateways,
- authentication services,
- microservices,
- databases,
- caches,
- queues,
- third-party APIs,
- serverless functions.
When something fails in production, engineers need answers to questions like:
- What happened?
- Where did the failure occur?
- Why did it happen?
- Which service caused it?
- How many users are affected?
- Is the system still healthy?
This is where Observability becomes essential.
Observability is the ability to understand the internal state of a system by analyzing the data it produces.
The foundation of observability is built on three major components known as:
The Three Pillars of Observability
- Logs
- Metrics
- Traces
These three pillars work together to provide visibility into distributed systems.
What Are the Three Pillars of Observability?
The three pillars represent the primary types of telemetry data generated by modern systems.
Each pillar answers a different kind of question.
| Pillar | Purpose |
|---|---|
| Logs | What exactly happened? |
| Metrics | Is the system healthy overall? |
| Traces | How did a request travel through the system? |
Together they create a complete operational understanding of production systems.
Why These Pillars Exist
In distributed systems:
- failures are partial,
- logs are fragmented,
- services are independent,
- debugging is difficult.
No single telemetry type is sufficient.
Example:
- Metrics may show latency spiked,
- Logs may show database errors,
- Traces may reveal which services caused the slowdown.
Only together do they provide full observability.
Pillar 1 – Logs
What Are Logs?
Logs are timestamped records of events generated by applications or infrastructure.
They describe:
- system behavior,
- errors,
- warnings,
- state changes,
- request activity.
Logs are the most detailed form of telemetry.
Example Log:
{
"timestamp": "2026-05-19T10:00:00Z",
"level": "ERROR",
"service": "payment-service",
"message": "Payment processing failed",
"correlationId": "abc123"
}
Characteristics of Logs
Logs are:
- event-based,
- highly detailed,
- timestamped,
- unstructured or structured.
Types of Logs
1 Application Logs
Generated by application code.
Example:
- user login,
- payment processed,
- API request failed.
2 System Logs
Generated by operating systems.
Example:
- disk failures,
- memory warnings,
- kernel events.
3 Access Logs
Track incoming requests.
Example:
- HTTP request logs,
- API gateway logs.
4 Audit Logs
Track security-sensitive actions.
Example:
- password changes,
- admin actions,
- permission updates.
Log Levels
Most logging systems use levels:
| Level | Meaning |
|---|---|
| DEBUG | Detailed development information |
| INFO | Normal system operation |
| WARN | Potential issues |
| ERROR | Failures occurred |
| FATAL | Critical system failure |
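As a rough illustration of how a level threshold filters output, here is a minimal sketch using Python's standard `logging` module. The logger name and the messages are assumptions made up for the example. Python calls the highest level CRITICAL and keeps FATAL as an alias for it.

```python
import io
import logging

# Capture output in memory so the effect of the threshold is easy to inspect.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("payment-service")  # hypothetical service name
logger.addHandler(handler)
logger.propagate = False          # keep records away from the root logger
logger.setLevel(logging.WARNING)  # anything below WARNING is dropped

logger.debug("cache lookup took 3 ms")       # suppressed
logger.info("payment processed")             # suppressed
logger.warning("retrying payment provider")  # emitted
logger.error("payment processing failed")    # emitted
logger.critical("cannot reach database")     # emitted (FATAL in the table)

emitted = buffer.getvalue().splitlines()
print(emitted)
```

Production services commonly run at INFO or WARN and switch to DEBUG only while investigating, since DEBUG output is the main driver of log volume.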
Structured Logging
Modern systems prefer structured logs using JSON.
Why?
- searchable,
- filterable,
- machine-readable,
- scalable.
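One way to get these properties is to render every log record as a single JSON object. The sketch below does that with Python's standard `logging` module. The `JsonFormatter` class and the field names are assumptions chosen for illustration, not a prescribed schema.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        user_id = getattr(record, "userId", None)
        if user_id is not None:
            entry["userId"] = user_id
        return json.dumps(entry)


buffer = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("auth-service")  # hypothetical service name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.info("User authenticated", extra={"userId": 101})

# Because the line is valid JSON, a log platform can index every field.
parsed = json.loads(buffer.getvalue())
print(parsed["service"], parsed["userId"])
```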
Example:
{
"service": "auth-service",
"level": "INFO",
"userId": 101,
"message": "User authenticated"
}
Centralized Logging
In distributed systems:
- logs come from many servers,
- many containers,
- many services.
Centralized logging aggregates logs into one platform.
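Conceptually, aggregation means merging many per-service streams into one searchable timeline. The sketch below shows the idea in memory. The sample entries, the function names, and the second correlation ID are invented for the example, and a real platform would do this at much larger scale with indexing and retention.

```python
import heapq

# Hypothetical per-service log streams, each already ordered by timestamp.
gateway_logs = [
    {"timestamp": "2026-05-19T10:00:00Z", "service": "api-gateway",
     "correlationId": "abc123", "message": "Request received"},
    {"timestamp": "2026-05-19T10:00:03Z", "service": "api-gateway",
     "correlationId": "abc123", "message": "Responded with 500"},
]
payment_logs = [
    {"timestamp": "2026-05-19T10:00:01Z", "service": "payment-service",
     "correlationId": "abc123", "message": "Payment started"},
    {"timestamp": "2026-05-19T10:00:02Z", "service": "payment-service",
     "correlationId": "xyz789", "message": "Payment started"},
    {"timestamp": "2026-05-19T10:00:02Z", "service": "payment-service",
     "correlationId": "abc123", "message": "Payment processing failed"},
]


def aggregate(*streams):
    """Merge sorted streams into one timeline.

    UTC timestamps in the same ISO 8601 format sort correctly as strings.
    """
    return list(heapq.merge(*streams, key=lambda entry: entry["timestamp"]))


def by_correlation_id(entries, correlation_id):
    """Keep only the entries that belong to one request."""
    return [e for e in entries if e["correlationId"] == correlation_id]


timeline = aggregate(gateway_logs, payment_logs)
request_story = [e["message"] for e in by_correlation_id(timeline, "abc123")]
print(request_story)
```

Filtering the merged timeline by `correlationId` reconstructs what happened to one request across services, which is the main payoff of collecting logs in one place.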
Strengths of Logs
Logs excel at:
- debugging,
- root cause analysis,
- detailed investigation.
Weaknesses of Logs
Logs are:
- noisy,
- high-volume,
- expensive to store,
- difficult to aggregate statistically.
This is why metrics exist.
Pillar 2 – Metrics
What Are Metrics?
Metrics are numerical measurements collected over time.
They summarize system behavior.
Metrics answer:
- Is the system healthy?
- How much traffic exists?
- Is latency increasing?
- Are errors increasing?
Example Metrics
| Metric | Value |
|---|---|
| CPU Usage | 75% |
| Request Rate | 10,000 req/sec |
| Error Rate | 2% |
| API Latency | 120 ms |
Characteristics of Metrics
Metrics are:
- aggregated,
- lightweight,
- time-series based,
- highly efficient.
Types of Metrics
1 Counter
A cumulative value that only increases (it typically resets to zero when the process restarts).
Example:
- total requests,
- total errors.
2 Gauge
Represents current value.
Example:
- CPU usage,
- memory usage.
3 Histogram
Measures distribution.
Example:
- request latency ranges.
4 Summary
Provides percentile statistics.
Example:
- P95 latency,
- P99 latency.
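The four types can be sketched in plain Python to show how they differ. The class names, bucket bounds, and sample latencies below are assumptions for illustration. Real metrics libraries expose richer APIs, and the nearest-rank percentile here is only one of several common definitions.

```python
import bisect
import math


class Counter:
    """A value that only goes up."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only increase")
        self.value += amount


class Gauge:
    """A value that can go up or down; only the latest reading is kept."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; bucket i holds values <= bounds[i]."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last bucket is overflow

    def observe(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1


def percentile(samples, p):
    """Nearest-rank percentile, the kind of figure a summary reports."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [20, 35, 50, 80, 120, 150, 200, 400, 900, 3000]

requests = Counter()
cpu_usage = Gauge()
latency_hist = Histogram([100, 500, 1000])

for latency in latencies_ms:
    requests.inc()
    latency_hist.observe(latency)
cpu_usage.set(75.0)

print(requests.value, latency_hist.counts, percentile(latencies_ms, 95))
```

A histogram keeps only bucket counts, so it stays cheap regardless of traffic, while the percentile function here needs the raw samples. That trade-off is the practical difference between the two types.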
Time-Series Data
Metrics are stored as time-series data.
Example:
10:00 → 50ms
10:01 → 55ms
10:02 → 70ms
This allows:
- trend analysis,
- dashboards,
- alerting.
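A small sketch of how trend analysis and alerting fall out of time-series data, using the three samples above. The 60 ms threshold and the function names are assumptions for the example.

```python
# (time, latency in ms) samples taken from the example above.
series = [("10:00", 50), ("10:01", 55), ("10:02", 70)]

ALERT_THRESHOLD_MS = 60  # hypothetical alert threshold


def deltas(points):
    """Change between consecutive samples, a basic form of trend analysis."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(points, points[1:])]


def breaches(points, threshold_ms):
    """Samples that exceed the alert threshold."""
    return [(t, v) for t, v in points if v > threshold_ms]


trend = deltas(series)
alerts = breaches(series, ALERT_THRESHOLD_MS)
print(trend)
print(alerts)
```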
Strengths of Metrics
Metrics are:
- lightweight,
- scalable,
- excellent for alerting,
- ideal for dashboards.
Weaknesses of Metrics
Metrics lack detail.
Metrics may show:
- latency increased,
but not:
- why latency increased.
This is where logs and traces help.
Pillar 3 – Traces
What Are Traces?
Traces track the complete journey of a request across distributed systems.
They answer:
- Which services were involved?
- Where was latency introduced?
- Which service failed?
- How long did each operation take?
Example Request Flow
Client
↓
API Gateway
↓
Auth Service
↓
Payment Service
↓
Database
A trace records the entire path.
Trace IDs and Span IDs
Trace ID
Represents:
- one complete distributed request.
Span ID
Represents:
- one operation inside the trace.
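The relationship between the two IDs can be sketched with a small data structure. The `Span` class, the helper functions, and the `parent_id` field are assumptions added for illustration. The ID lengths (32 and 16 hex characters) follow the W3C Trace Context convention, but any unique values would serve the same purpose here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                    # shared by every span in the request
    parent_id: Optional[str] = None  # span that triggered this one, if any
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def start_trace(name):
    """Create the root span and a fresh trace ID for a new request."""
    return Span(name=name, trace_id=uuid.uuid4().hex)


def start_child(parent, name):
    """Create a span that inherits the trace ID and points at its parent."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


gateway = start_trace("API Gateway")
auth = start_child(gateway, "Auth Service")
payment = start_child(gateway, "Payment Service")
database = start_child(payment, "Database")

for span in (gateway, auth, payment, database):
    print(span.trace_id, span.span_id, span.parent_id, span.name)
```

Every span prints the same trace ID but a different span ID, which is what lets a tracing backend stitch operations from separate services back into one request.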
Example:
Trace
├── API Gateway Span
├── Auth Service Span
├── Payment Service Span
└── Database Span
Example Trace Insights
Tracing can reveal:
- slow database queries,
- failed downstream services,
- retry storms,
- bottlenecks.
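To show how a bottleneck can be read from span data, the sketch below computes each span's self time, meaning its duration minus the time spent in its direct children. The span names and durations are invented, and the calculation assumes child spans run one after another rather than in parallel.

```python
# Hypothetical spans from a single trace.
spans = [
    {"name": "API Gateway", "parent": None, "duration_ms": 2800},
    {"name": "Auth Service", "parent": "API Gateway", "duration_ms": 100},
    {"name": "Payment Service", "parent": "API Gateway", "duration_ms": 2660},
    {"name": "Database", "parent": "Payment Service", "duration_ms": 2500},
]


def self_times(spans):
    """Time spent in each span, excluding time spent in its direct children."""
    result = {span["name"]: span["duration_ms"] for span in spans}
    for span in spans:
        if span["parent"] is not None:
            result[span["parent"]] -= span["duration_ms"]
    return result


def bottleneck(spans):
    """Name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


print(self_times(spans))
print(bottleneck(spans))
```

Looking at self time rather than total duration matters because a parent span is always at least as slow as its slowest child. Without the subtraction, the gateway would look like the culprit.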
Strengths of Traces
Traces excel at:
- microservices debugging,
- latency analysis,
- request flow visualization.
Weaknesses of Traces
Tracing systems:
- can be expensive,
- generate large telemetry volumes,
- require instrumentation.
Relationship Between Logs, Metrics, and Traces
The three pillars complement each other.
Example Scenario
Suppose users report:
“Checkout is slow.”
Metrics Show
P95 latency increased from 200ms to 3s
Metrics reveal:
- that a system health issue exists.
Traces Show
Payment Service → Database query taking 2.5s
Traces identify:
- where the slowdown occurred.
Logs Show
Database connection pool exhausted
Logs reveal:
- the root cause.
This Is the Power of Observability
Metrics detect.
Traces locate.
Logs explain.
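The detect, locate, explain sequence can be sketched as three small functions over the checkout scenario. All data, field names, and the 1000 ms threshold are assumptions for illustration. The sketch also assumes logs carry the trace ID so they can be joined to a trace, and that each span's duration is its own time rather than including child spans.

```python
P95_THRESHOLD_MS = 1000  # hypothetical alert threshold

metrics = {"checkout_p95_latency_ms": 3000}

trace = {
    "trace_id": "abc123",
    "spans": [
        {"service": "api-gateway", "duration_ms": 40},
        {"service": "auth-service", "duration_ms": 100},
        {"service": "payment-service", "duration_ms": 2500},
    ],
}

logs = [
    {"trace_id": "abc123", "service": "auth-service", "level": "INFO",
     "message": "User authenticated"},
    {"trace_id": "abc123", "service": "payment-service", "level": "ERROR",
     "message": "Database connection pool exhausted"},
    {"trace_id": "def456", "service": "payment-service", "level": "INFO",
     "message": "Payment processed"},
]


def detect(metrics):
    """Metrics detect: is P95 latency above the threshold?"""
    return metrics["checkout_p95_latency_ms"] > P95_THRESHOLD_MS


def locate(trace):
    """Traces locate: which service spent the most time?"""
    return max(trace["spans"], key=lambda s: s["duration_ms"])["service"]


def explain(logs, trace_id, service):
    """Logs explain: error messages from that service for that request."""
    return [
        entry["message"]
        for entry in logs
        if entry["trace_id"] == trace_id
        and entry["service"] == service
        and entry["level"] == "ERROR"
    ]


def investigate(metrics, trace, logs):
    if not detect(metrics):
        return None
    service = locate(trace)
    return service, explain(logs, trace["trace_id"], service)


result = investigate(metrics, trace, logs)
print(result)
```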
Mental Model of the Three Pillars
Think of observability like a hospital:
| Pillar | Analogy |
|---|---|
| Metrics | Vital signs |
| Logs | Medical records |
| Traces | Patient journey |
Each provides different visibility into system health.