Updated on 28 May, 2026

One of the hardest parts of system design interviews is not technical knowledge.

It's delivery.

Many candidates know databases, queues, caching, load balancing, and distributed systems concepts – but struggle to present their ideas in a structured, interview-friendly way.

A strong delivery framework helps you:

  • Stay organized
  • Avoid rambling
  • Build interviewer confidence
  • Cover important trade-offs
  • Manage time effectively

Why This Framework Matters

Most candidates make one of these mistakes:

  1. Jump directly into architecture
  2. Overengineer too early
  3. Spend too much time on one area
  4. Forget core requirements
  5. Never finish the design

Interviewers are not looking for the most complex architecture.

They are looking for:

  • Can you identify the important problems?
  • Can you prioritize correctly?
  • Can you communicate clearly?
  • Can you make reasonable trade-offs?
  • Can you design incrementally?

This framework gives you a roadmap.

Overall Structure

A system design interview should flow like this:

Requirements
↓
Core Entities
↓
API Design
↓
High-Level Design
↓
Deep Dives
↓
Tradeoffs & Scaling

1 Requirements (~5 minutes)

First, understand what the system should do.

Why Requirements Matter

Imagine someone says:

Design Twitter

What does that actually mean?

Twitter contains:

  • Tweets
  • Likes
  • Retweets
  • Notifications
  • Spaces
  • Messaging
  • Recommendations
  • Ads
  • Analytics
  • Media uploads
  • Search
  • Trends

You cannot design everything in 45 minutes.

So the first job is:

Reduce ambiguity.

Split requirements into two parts:

Functional Requirements

Functional requirements define:

What should the system do?

These are user-facing features.

Examples for Twitter:

  • Users can post tweets
  • Users can follow others
  • Users can view their feed

Keep only the top 3-5 features. We intentionally ignore the rest, because interviewers want to see prioritization.

Why Prioritization Matters

A common beginner mistake:

“I will also add stories, ads, hashtags, video streaming, messaging, AI recommendations…”

This destroys interview performance.

Why?

Because:

Breadth without depth is weak engineering.

A simple working system is better than an incomplete “Google-scale” architecture.

Non-Functional Requirements

Non-functional requirements define:

How should the system behave?

These describe system qualities.

Examples:

  • High availability
  • Low latency (<200ms)
  • Scale to 100M+ users
  • Durable storage
  • Secure APIs

Think about:

  • Scalability
    • Can the system handle growth?
    • Example:
      • 100M DAU
      • 1B requests/day
  • Availability
    • Should the system continue working during failures?
    • Example:
      • Twitter prefers availability.
      • It is acceptable if likes appear slightly delayed.
  • Consistency
    • Should all users immediately see the latest data?
    • Banking systems require strong consistency.
    • Social media usually uses eventual consistency.
  • Latency
    • How fast should operations be?
    • Examples:
      • Feed load < 200ms
      • Search < 500ms
      • Chat delivery < 100ms
  • Durability
    • Can data be lost?
    • Examples:
      • Banking: no data loss is acceptable
      • Social media: minor temporary loss may be acceptable
  • Fault tolerance
  • Security
    • Authentication
    • Authorization
    • Encryption
    • Access control
    • Rate limiting

These concerns break down into eight areas, each covered below:

  1. CAP Theorem
  2. Environment Constraints
  3. Scalability
  4. Latency
  5. Durability
  6. Security
  7. Fault Tolerance
  8. Compliance

1 CAP Theorem

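The CAP theorem describes the tradeoff among three properties:

Consistency + Availability + Partition Tolerance

A distributed system cannot guarantee all three at once. Network partitions are unavoidable in practice, so partition tolerance is mandatory.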

So the real question becomes:

Consistency vs Availability

Consistency: (CP Systems)

Every user sees the latest data immediately.

Example:

  • Banking systems
  • Payment systems

If one node fails:

  • System may reject requests temporarily

Priority:

Correctness over uptime

Availability: (AP Systems)

System continues serving requests even during failures.

Example:

  • Twitter
  • Instagram feeds

Tradeoff:

  • Some users may see stale data briefly

Priority:

Uptime over immediate consistency
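
The difference between the two priorities can be shown with a toy model. This is a minimal sketch, not how any real database works: the `Store`, `Replica`, and `Unavailable` names are made up for illustration, and a partition is simulated with a simple `reachable` flag.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP store refuses a request it cannot serve safely."""


@dataclass
class Replica:
    value: str = "v0"
    reachable: bool = True  # False simulates being cut off by a partition


@dataclass
class Store:
    mode: str  # "CP" or "AP"
    replicas: list = field(default_factory=lambda: [Replica(), Replica(), Replica()])

    def write(self, value: str) -> None:
        live = [r for r in self.replicas if r.reachable]
        majority = len(self.replicas) // 2 + 1
        if self.mode == "CP" and len(live) < majority:
            # Correctness over uptime: refuse rather than let replicas diverge.
            raise Unavailable("no majority of replicas reachable")
        for replica in live:
            replica.value = value  # unreachable replicas keep the old value

    def read_from(self, index: int) -> str:
        # Return whatever the chosen replica currently holds.
        return self.replicas[index].value


def partition(store: Store) -> None:
    # Cut off two of the three replicas.
    store.replicas[1].reachable = False
    store.replicas[2].reachable = False


ap = Store(mode="AP")
partition(ap)
ap.write("v1")  # accepted even though only one replica is reachable
print("AP reads:", ap.read_from(0), ap.read_from(2))  # second value is stale

cp = Store(mode="CP")
partition(cp)
try:
    cp.write("v1")
except Unavailable as exc:
    print("CP rejected write:", exc)
```

The AP store stays up but one replica serves stale data; the CP store stays correct but turns the request away.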

2 Environment Constraints

These are limitations imposed by the runtime environment.

Examples:

Mobile Devices:

Problems:

  • Low battery
  • Weak CPU
  • Low memory
  • Unstable internet

Design implications:

  • Smaller payloads
  • Reduced polling
  • Compression
  • Offline caching

Low Bandwidth Networks:

Example:

  • Video streaming on 3G

Design implications:

  • Adaptive bitrate streaming
  • CDN usage
  • Compression
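
To make the "compression" implication concrete, here is a rough sketch that compresses a JSON response before sending it. The payload shape and the helper names `compress_payload` and `decompress_payload` are assumptions for illustration; real services usually rely on HTTP-level compression handled by the web server or framework.

```python
import json
import zlib


def compress_payload(obj) -> bytes:
    """Serialize to JSON and compress; helps most on large, repetitive payloads."""
    raw = json.dumps(obj).encode("utf-8")
    return zlib.compress(raw, level=6)


def decompress_payload(data: bytes):
    """Reverse of compress_payload."""
    return json.loads(zlib.decompress(data).decode("utf-8"))


# Hypothetical feed response: many items that repeat the same keys.
feed = [
    {"id": i, "author": "user_%d" % (i % 10), "text": "hello world"}
    for i in range(200)
]

raw_size = len(json.dumps(feed).encode("utf-8"))
compressed = compress_payload(feed)
print("raw bytes:", raw_size, "compressed bytes:", len(compressed))
```

Very small payloads may not shrink at all, so it is worth measuring before enabling compression everywhere.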

IoT Devices:

Problems:

  • Limited RAM
  • Weak processors

Design implications:

  • Lightweight protocols
  • Edge processing

3 Scalability

This defines:

Can the system handle growth?

Every system scales differently.

Types of Scaling Concerns

  • Read-Heavy Systems
    • Example:
      • YouTube
      • Netflix
    • Challenge:

      Huge read traffic
    • Solutions:
      • CDN
      • Read replicas
      • Caching
  • Write-Heavy Systems
    • Example:
      • Logging systems
      • Analytics systems
    • Challenge:

      Massive write throughput
    • Solutions:
      • Kafka
      • Partitioning
      • Batch writes
  • Bursty Traffic
    • Example:
      • Ticket booking during concerts
      • Black Friday sales
    • Traffic spikes suddenly.
    • Solutions:
      • Auto scaling
      • Queues
      • Rate limiting
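
For read-heavy systems, caching is often the first tool to reach for. The following is a minimal in-process sketch of the cache-aside pattern with a time-to-live; `TTLCache` and `load_profile` are hypothetical names, and a production system would more likely use a shared cache such as Redis or Memcached.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside: check the cache first, fall back to the loader on a miss."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._loader(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value


db_reads = []  # records every call that reaches the "database"


def load_profile(user_id: str) -> str:
    db_reads.append(user_id)
    return "profile-of-" + user_id


cache = TTLCache(load_profile, ttl_seconds=60.0)
cache.get("42")
cache.get("42")
print("database reads:", len(db_reads))
```

The TTL bounds how stale a cached value can get, which is the eventual-consistency tradeoff read-heavy systems usually accept.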

4 Latency

Latency means:

How fast the system responds

This is critical in user experience.

Examples:

  • Search Systems
    • Example:
      • Google
      • Yelp
    • Search must feel instant.
    • Target:

      < 500 ms
    • Solutions:
      • Indexing
      • Caching
      • Precomputation
  • Real-Time Chat
    • Target:

      < 100 ms delivery
    • Solutions:
      • WebSockets
      • Persistent connections
      • In-memory routing

5 Durability

Durability means:

How safely data is preserved

High Durability Systems

Example:

  • Banking
  • Financial ledgers

Requirements:

  • No data loss
  • Replication
  • Write-ahead logging (WAL)
  • Backups

Lower Durability Systems

Example:

  • Social media likes

Small temporary losses may be acceptable.

6 Security

Security protects:

  • Users
  • Data
  • Infrastructure

Authentication:

Who are you?

Example:

  • JWT
  • OAuth 2.0 / OpenID Connect

Authorization:

What are you allowed to do?

Example:

  • RBAC
  • ACL

Encryption:

Protect data:

  • In transit (TLS)
  • At rest

Rate Limiting:

Protect APIs from abuse.
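
One common way to rate limit is a token bucket. The sketch below is a single-process illustration under assumed parameters (`capacity`, `refill_per_second`); the class name is made up, and a real deployment would typically keep one bucket per user or API key in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # caller would typically respond with HTTP 429


bucket = TokenBucket(capacity=5, refill_per_second=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)
```

The first five calls in a burst pass; further calls are rejected until tokens refill.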

7 Fault Tolerance

Fault tolerance means:

Can the system survive failures?

Failures are guaranteed in distributed systems.

Common Failures

  • Server crashes
  • Database failures
  • Network partitions
  • Region outages

Solutions

  • Redundancy
    • Multiple replicas
  • Failover
    • Switch traffic automatically
  • Retries
    • Retry temporary failures
  • Circuit Breakers
    • Prevent cascading failures.
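
A circuit breaker is the least obvious of these, so here is a minimal sketch. The class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_after_seconds`) are assumptions for illustration; libraries and service meshes offer more complete versions.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker fails fast instead of calling the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_seconds:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
            self._opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0  # a success closes the circuit fully
        return result
```

While the circuit is open, callers get an immediate error instead of piling more load onto a dependency that is already struggling, which is how cascading failures are contained.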

8 Compliance

Compliance means following legal or industry regulations.

Examples:

  • GDPR
    • Europe
      • User data privacy
      • Right to deletion
  • HIPAA
    • Healthcare:
      • Protect medical records
  • PCI DSS
    • Payments
      • Secure card data

Capacity Estimation

Many candidates waste time here.

They calculate:

  • DAU
  • QPS
  • Storage
  • Bandwidth

without using the numbers.

Interviewers do not care about arithmetic.

They care about:

Can you use scale numbers to justify design decisions?

Good Estimation Usage:
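
As a rough worked example, the sketch below starts from the 100M DAU and 1B requests/day figures used earlier. The peak factor, tweets per user, and bytes per tweet are assumptions chosen only to illustrate the method.

```python
SECONDS_PER_DAY = 86_400


def average_qps(requests_per_day: int) -> float:
    """Average requests per second over a full day."""
    return requests_per_day / SECONDS_PER_DAY


def daily_storage_gb(users: int, writes_per_user: int, bytes_per_write: int) -> float:
    """New data written per day, in gigabytes (10^9 bytes)."""
    return users * writes_per_user * bytes_per_write / 1e9


daily_active_users = 100_000_000   # from the scalability example
requests_per_day = 1_000_000_000   # from the scalability example
peak_factor = 3                    # assumption: peak traffic is about 3x average
tweets_per_user_per_day = 2        # assumption
bytes_per_tweet = 300              # assumption: text plus metadata

avg = average_qps(requests_per_day)
peak = avg * peak_factor
storage = daily_storage_gb(daily_active_users, tweets_per_user_per_day, bytes_per_tweet)

print(f"average QPS ~{avg:,.0f}, peak QPS ~{peak:,.0f}, new data ~{storage:.0f} GB/day")
```

The numbers matter only when they drive a decision. A peak in the tens of thousands of QPS suggests stateless API servers behind a load balancer with a cache in front of the database, and roughly 60 GB/day (about 22 TB/year) suggests planning for partitioned storage rather than a single node.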

2 Core Entities (~2 minutes)

Now identify the core objects in the system.

These become:

  • Database records
  • API resources
  • Cache objects
  • Queue events

Example: Twitter

User
Tweet
Follow

Do not create 20 entities upfront.

You will discover more during design.
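
The three Twitter entities could be sketched as plain data classes. The field names here are assumptions; in an interview, a quick list of the key fields per entity is usually enough.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    user_id: int
    handle: str


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int       # references User.user_id
    text: str
    created_at: datetime


@dataclass(frozen=True)
class Follow:
    follower_id: int     # the user who follows
    followee_id: int     # the user being followed


alice = User(user_id=1, handle="alice")
bob = User(user_id=2, handle="bob")
tweet = Tweet(tweet_id=10, author_id=alice.user_id, text="hello",
              created_at=datetime.now(timezone.utc))
follow = Follow(follower_id=bob.user_id, followee_id=alice.user_id)
print(tweet.author_id, follow.followee_id)
```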

Why Entities Matter

Entities help structure your thinking.

Without entities, architecture becomes vague.

For example:

What exactly are we storing?

You need entities to answer that.

3 API Design (~5 minutes)

Now define how clients interact with the system.

This creates the contract between:

Client <-> Backend

Why API Design Matters

APIs drive architecture.

If you understand the requests:

  • You understand traffic patterns
  • You understand data flow
  • You understand storage needs
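
One way to keep the API grounded is to map each functional requirement to a single endpoint. The paths below are hypothetical, written as a small Python table so reads and writes can be separated at a glance.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    method: str
    path: str
    purpose: str


# One endpoint per functional requirement from the Twitter example.
TWITTER_API = [
    Endpoint("POST", "/tweets", "post a tweet"),
    Endpoint("POST", "/users/{target_user_id}/follow", "follow another user"),
    Endpoint("GET", "/feed", "view the caller's feed"),
]

reads = [e for e in TWITTER_API if e.method == "GET"]
writes = [e for e in TWITTER_API if e.method != "GET"]

for e in TWITTER_API:
    print(f"{e.method:5} {e.path:32} {e.purpose}")
```

Even this short list hints at the traffic pattern: the feed read is likely to dominate, which is what later motivates caching and precomputation.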

Important Security Insight

A backend service should not trust identity information sent directly by the client in the request body or query parameters, because users can modify it.

For example, suppose your API accepts this:

{
  "user_id": 123,
  "comment": "hello"
}

A malicious user could change user_id to another person's ID:

{
  "user_id": 999,
  "comment": "I hacked this"
}

If the server blindly trusts that value, the attacker can:

  • impersonate other users,
  • access unauthorized data,
  • perform actions as someone else.

That's a serious security vulnerability called identity spoofing or broken authentication/authorization.

The Correct Approach:

Instead of trusting the client-provided user_id, the server should determine the user identity from a verified authentication token.

Example flow:

  1. User logs in
    • Server returns a signed token (JWT/session token)
    • The client sends it back on later requests in a header, for example:

      Authorization: Bearer eyJhbGciOi...
  2. Client sends request

    POST /comments
    Authorization: Bearer eyJhbGciOi...

    Body:

    {
      "comment": "hello"
    }

    Notice:

    no user_id in the body.

  3. Server validates token
    • The backend:
      • verifies token signature,
      • checks expiration,
      • extracts authenticated user identity.
    • Internally:

      user_id = token.user_id
    • Now the server knows:
      • who the user actually is,
      • without trusting client input.

4 High-Level Design (~10-15 minutes)

Now you finally design the architecture.

This is the “boxes and arrows” section.

Basic Architecture Pattern

Client
↓
Load Balancer
↓
API Servers
↓
Cache / DB / Queue

Important Principle

Start simple.

Many candidates immediately introduce:

  • Kafka
  • Kubernetes
  • CQRS
  • Event sourcing
  • Multi-region replication

This is usually a mistake.

Why Simplicity Matters

Interviewers want:

A correct system first

Then optimization later.

Design Incrementally

Build your system endpoint by endpoint.

Example:

POST /tweets

Now ask:

  • Where is the tweet stored?
  • Which service handles it?
  • What database schema is needed?
  • What cache updates happen?

This keeps your design structured.
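
Answering those questions for POST /tweets might look like the sketch below. In-memory dictionaries stand in for the database and its index, and the 280-character limit and field names are assumptions; the point is that one endpoint forces concrete storage decisions.

```python
import itertools
import time
from collections import defaultdict

tweets_by_id = {}                         # stands in for the tweets table
tweet_ids_by_author = defaultdict(list)   # stands in for an index on author_id
_next_id = itertools.count(1)             # stands in for an ID generator


def post_tweet(authenticated_user_id: int, text: str) -> dict:
    """Handle POST /tweets for a user already identified from the auth token."""
    if not text or len(text) > 280:
        raise ValueError("tweet text must be 1-280 characters")
    tweet = {
        "tweet_id": next(_next_id),
        "author_id": authenticated_user_id,
        "text": text,
        "created_at": time.time(),
    }
    tweets_by_id[tweet["tweet_id"]] = tweet                              # where it is stored
    tweet_ids_by_author[authenticated_user_id].append(tweet["tweet_id"])  # lookup by author
    return tweet


created = post_tweet(authenticated_user_id=1, text="hello world")
print(created["tweet_id"], tweet_ids_by_author[1])
```

Repeating the same exercise for the follow and feed endpoints grows the design one request path at a time.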

5 Deep Dives (~10 minutes)

Now optimize the system.

This is where seniority becomes visible.

Purpose of Deep Dives

You now:

  • Improve scalability
  • Fix bottlenecks
  • Address edge cases
  • Meet non-functional requirements
