One of the hardest parts of system design interviews is not technical knowledge.
It's delivery.
Many candidates know databases, queues, caching, load balancing, and distributed systems concepts – but struggle to present their ideas in a structured, interview-friendly way.
A strong delivery framework helps you:
- Stay organized
- Avoid rambling
- Build interviewer confidence
- Cover important trade-offs
- Manage time effectively
Why This Framework Matters
Most candidates make one of these mistakes:
- Jump directly into architecture
- Overengineer too early
- Spend too much time on one area
- Forget core requirements
- Never finish the design
Interviewers are not looking for the most complex architecture.
They want to know:
- Can you identify the important problems?
- Can you prioritize correctly?
- Can you communicate clearly?
- Can you make reasonable trade-offs?
- Can you design incrementally?
This framework gives you a roadmap.
Overall Structure
A system design interview should flow like this:
Requirements
↓
Core Entities
↓
API Design
↓
High-Level Design
↓
Deep Dives
↓
Tradeoffs & Scaling
1 Requirements (~5 mins)
First, understand what the system should do.
Why Requirements Matter
Imagine someone says:
Design Twitter
What does that actually mean?
Twitter contains:
- Tweets
- Likes
- Retweets
- Notifications
- Spaces
- Messaging
- Recommendations
- Ads
- Analytics
- Media uploads
- Search
- Trends
You cannot design everything in 45 minutes.
So the first job is:
Reduce ambiguity.
Split requirements into two parts:
Functional Requirements
Functional requirements define:
What should the system do?
These are user-facing features.
Examples for Twitter:
- Users can post tweets
- Users can follow others
- Users can view their feed
Keep only the 3-5 most important features. We intentionally ignore the rest, because interviewers want to see prioritization.
Why Prioritization Matters
A common beginner mistake:
“I will also add stories, ads, hashtags, video streaming, messaging, AI recommendations…”
This destroys interview performance.
Why?
Because:
Breadth without depth is weak engineering.
A simple working system is better than an incomplete “Google-scale” architecture.
Non-Functional Requirements
Non-functional requirements define:
How should the system behave?
These describe system qualities.
Examples:
- High availability
- Low latency (<200ms)
- Scale to 100M+ users
- Durable storage
- Secure APIs
Think about:
- Scalability
  - Can the system handle growth?
  - Example:
    - 100M DAU
    - 1B requests/day
- Availability
  - Should the system continue working during failures?
  - Example:
    - Twitter prefers availability.
    - It is acceptable if likes appear slightly delayed.
- Consistency
  - Should all users immediately see the latest data?
  - Banking systems require strong consistency.
  - Social media usually uses eventual consistency.
- Latency
  - How fast should operations be?
  - Examples:
    - Feed load < 200ms
    - Search < 500ms
    - Chat delivery < 100ms
- Durability
  - Can data be lost?
  - Examples:
    - Banking: no data loss is acceptable
    - Social media: minor temporary loss may be acceptable
- Fault tolerance
- Security
  - Authentication
  - Authorization
  - Encryption
  - Access control
  - Rate limiting
The following areas are covered in more detail below:
- CAP Theorem
- Environment Constraints
- Scalability
- Latency
- Durability
- Security
- Fault Tolerance
- Compliance
1 CAP Theorem
This defines the tradeoff between:
CAP: Consistency + Availability + Partition Tolerance
In distributed systems, partition tolerance is mandatory.
So the real question becomes:
Consistency vs Availability
Consistency: (CP Systems)
Every user sees the latest data immediately.
Example:
- Banking systems
- Payment systems
If one node fails:
- System may reject requests temporarily
Priority:
Correctness over uptime
Availability: (AP Systems)
System continues serving requests even during failures.
Example:
- Instagram feeds
Tradeoff:
- Some users may see stale data briefly
Priority:
Uptime over immediate consistency
2 Environment Constraints
These are limitations imposed by the runtime environment.
Examples:
Mobile Devices:
Problems:
- Low battery
- Weak CPU
- Low memory
- Unstable internet
Design implications:
- Smaller payloads
- Reduced polling
- Compression
- Offline caching
Low Bandwidth Networks:
Example:
- Video streaming on 3G
Design implications:
- Adaptive bitrate streaming
- CDN usage
- Compression
IoT Devices:
Problems:
- Limited RAM
- Weak processors
Design implications:
- Lightweight protocols
- Edge processing
3 Scalability
This defines:
Can the system handle growth?
Every system scales differently.
Types of Scaling Concerns
- Read-Heavy Systems
  - Example:
    - YouTube
    - Netflix
  - Challenge: Huge read traffic
  - Solutions:
    - CDN
    - Read replicas
    - Caching
- Write-Heavy Systems
  - Example:
    - Logging systems
    - Analytics systems
  - Challenge: Massive write throughput
  - Solutions:
    - Kafka
    - Partitioning
    - Batch writes
- Bursty Traffic
  - Example:
    - Ticket booking during concerts
    - Black Friday sales
  - Challenge: Traffic spikes suddenly
  - Solutions:
    - Auto scaling
    - Queues
    - Rate limiting
4 Latency
Latency means:
How fast the system responds
This is critical to user experience.
Examples:
- Search Systems
  - Example:
    - Yelp
  - Search must feel instant.
  - Target: < 500 ms
  - Solutions:
    - Indexing
    - Caching
    - Precomputation
- Real-Time Chat
  - Target: < 100 ms delivery
  - Solutions:
    - WebSockets
    - Persistent connections
    - In-memory routing
5 Durability
Durability means:
How safely data is preserved
High Durability Systems
Example:
- Banking
- Financial ledgers
Requirements:
- No data loss
- Replication
- Write-ahead logs (WAL)
- Backups
Lower Durability Systems
Example:
- Social media likes
Small temporary losses may be acceptable.
6 Security
Security protects:
- Users
- Data
- Infrastructure
Authentication:
Who are you?
Example:
- JWT
- OAuth
Authorization:
What are you allowed to do?
Example:
- RBAC
- ACL
Encryption:
Protect data:
- In transit (TLS)
- At rest
Rate Limiting:
Protect APIs from abuse.
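One common way to do this is a token bucket. The sketch below is a minimal single-process illustration, not a production limiter: the class name `TokenBucket` and its parameters `rate`, `capacity`, and `clock` are assumptions made for this example, and a real deployment would usually keep the counters in a shared store so every API server sees the same state.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # caller would typically return HTTP 429
```

In an interview it is usually enough to name the algorithm, say where the counters live, and state what the client sees when it is throttled.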
7 Fault Tolerance
Fault tolerance means:
Can the system survive failures?
Failures are guaranteed in distributed systems.
Common Failures
- Server crashes
- Database failures
- Network partitions
- Region outages
Solutions
- Redundancy
  - Multiple replicas
- Failover
  - Switch traffic automatically
- Retries
  - Retry temporary failures
- Circuit Breakers
  - Prevent cascading failures
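Retries and circuit breakers are easier to discuss with a concrete shape in mind. The following is a simplified sketch under stated assumptions: the names `retry` and `CircuitBreaker`, the thresholds, and the cool-down value are all hypothetical choices for illustration, and the breaker is not thread-safe.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


class CircuitBreaker:
    """Stop calling a dependency after `threshold` consecutive failures."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after     # cool-down in seconds (assumed value)
        self.clock = clock
        self.failures = 0
        self.opened_at = None              # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # cool-down elapsed: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0                  # any success resets the counter
        return result
```

Retries help with temporary failures, while the breaker keeps a struggling dependency from being hammered by those same retries, which is how cascading failures tend to start.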
8 Compliance
Compliance means following legal or industry regulations.
Examples:
- GDPR
  - Europe
  - User data privacy
  - Right to deletion
- HIPAA
  - Healthcare
  - Protect medical records
- PCI DSS
  - Payments
  - Secure card data
Capacity Estimation
Many candidates waste time here.
They calculate:
- DAU
- QPS
- Storage
- Bandwidth
without using the numbers.
Interviewers do not care about arithmetic.
They care about:
Can you use scale numbers to justify design decisions?
Good Estimation Usage:
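As one possible illustration, the sketch below turns the 1B requests/day figure from the scalability example into a server count. The peak factor and the per-server throughput are assumptions picked for the example, not measured values.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

requests_per_day = 1_000_000_000      # from the scalability example (1B requests/day)
peak_factor = 3                       # assumption: peak traffic is ~3x the average
qps_per_server = 2_000                # assumption: what one API server can sustain

average_qps = requests_per_day / SECONDS_PER_DAY
peak_qps = average_qps * peak_factor
servers_needed = math.ceil(peak_qps / qps_per_server)

print(f"average QPS ~ {average_qps:,.0f}")
print(f"peak QPS    ~ {peak_qps:,.0f}")
print(f"API servers ~ {servers_needed}")
```

With these inputs the average is about 11,574 QPS and the assumed peak about 34,722 QPS, which works out to 18 servers. The number matters only because it justifies a decision: one machine cannot serve this load, so the design needs a load balancer in front of stateless API servers.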
2 Core Entities (~2 Minutes)
Now identify the core objects in the system.
These become:
- Database records
- API resources
- Cache objects
- Queue events
Example: Twitter
User
Tweet
Follow
Do not create 20 entities upfront.
You will discover more during design.
Why Entities Matter
Entities help structure your thinking.
Without entities, architecture becomes vague.
For example:
What exactly are we storing?
You need entities to answer that.
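For the Twitter example, the three entities could be written down roughly as follows. This is a minimal sketch: every field name here is an assumption made for illustration, and in an interview a few fields per entity on the whiteboard is usually enough.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class User:
    user_id: int
    handle: str


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int        # references User.user_id
    text: str
    created_at: datetime


@dataclass(frozen=True)
class Follow:
    follower_id: int      # the user who follows
    followee_id: int      # the user being followed
```

Writing the entities this way makes the relationships explicit: a feed for one user is the set of tweets whose `author_id` matches a `followee_id` in that user's `Follow` records.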
3 API Design (~5 Minutes)
Now define how clients interact with the system.
This creates the contract between:
Client <-> Backend
Why API Design Matters
APIs drive architecture.
If you understand the requests:
- You understand traffic patterns
- You understand data flow
- You understand storage needs
Important Security Insight
A backend service should not trust identity information sent directly by the client in the request body or query parameters, because users can modify it.
For example, suppose your API accepts this:
{
  "user_id": 123,
  "comment": "hello"
}
A malicious user could change user_id to another person's ID:
{
  "user_id": 999,
  "comment": "I hacked this"
}
If the server blindly trusts that value, the attacker can:
- impersonate other users,
- access unauthorized data,
- perform actions as someone else.
That's a serious security vulnerability called identity spoofing or broken authentication/authorization.
The Correct Approach:
Instead of trusting the client-provided user_id, the server should determine the user identity from a verified authentication token.
Example flow:
- User logs in
  - Server returns a signed token (JWT/session token)
  - Example:
    Authorization: Bearer eyJhbGciOi...
- Client sends request
  POST /comments
  Authorization: Bearer eyJhbGciOi...
  Body:
  { "comment": "hello" }
  Notice: no user_id in the body.
- Server validates token
  - The backend:
    - verifies token signature,
    - checks expiration,
    - extracts authenticated user identity.
  - Internally:
    user_id = token.user_id
- Now the server knows:
  - who the user actually is,
  - without trusting client input.
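The validation steps can be sketched with the Python standard library alone. Note the assumptions: this uses a simplified HMAC-signed token as a stand-in for a real JWT, and `SECRET`, `issue_token`, `authenticate`, and `post_comment` are hypothetical names invented for this example. A real service would normally rely on an established JWT or session library instead.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # assumption: a server-side key, never sent to clients


def issue_token(user_id: int, ttl_seconds: int = 3600) -> str:
    """Return 'payload.signature'; the payload carries user_id and expiry."""
    payload = json.dumps({"user_id": user_id, "exp": time.time() + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def authenticate(token: str) -> int:
    """Verify signature and expiry, then return the authenticated user_id."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims["user_id"]


def post_comment(authorization_header: str, body: dict) -> dict:
    """Hypothetical handler for POST /comments (requires Python 3.9+)."""
    token = authorization_header.removeprefix("Bearer ")
    user_id = authenticate(token)          # identity comes from the token...
    return {"user_id": user_id, "comment": body["comment"]}  # ...not the body
```

Even if a client sneaks `"user_id": 999` into the body, the handler never reads it, so the comment is attributed to whoever the token was issued for.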
4 High-Level Design (~10-15 minutes)
Now you finally design the architecture.
This is the “boxes and arrows” section.
Basic Architecture Pattern
Client
↓
Load Balancer
↓
API Servers
↓
Cache / DB / Queue
Important Principle
Start simple.
Many candidates immediately introduce:
- Kafka
- Kubernetes
- CQRS
- Event sourcing
- Multi-region replication
This is usually a mistake.
Why Simplicity Matters
Interviewers want:
A correct system first
Then optimization later.
Design Incrementally
Build your system endpoint by endpoint.
Example:
POST /tweets
Now ask:
- Where is the tweet stored?
- Which service handles it?
- What database schema is needed?
- What cache updates happen?
This keeps your design structured.
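To make those questions concrete, here is a rough in-memory sketch of what the `POST /tweets` path might do. Everything in it is an assumption for illustration: `tweets_table` stands in for the database, `latest_by_author` for a cache, and the 280-character limit and 20-item cache size are arbitrary example values.

```python
import itertools
from datetime import datetime, timezone

tweets_table = {}        # stand-in for the tweets table: tweet_id -> row
latest_by_author = {}    # stand-in for a cache: author_id -> recent tweet ids
_next_id = itertools.count(1)


def post_tweet(author_id: int, text: str) -> dict:
    """Hypothetical handler for POST /tweets inside the API server."""
    if not text or len(text) > 280:
        raise ValueError("tweet must be 1-280 characters")
    tweet = {
        "tweet_id": next(_next_id),
        "author_id": author_id,     # taken from the auth token, not the body
        "text": text,
        "created_at": datetime.now(timezone.utc),
    }
    tweets_table[tweet["tweet_id"]] = tweet            # 1. write to storage
    recent = latest_by_author.setdefault(author_id, [])
    recent.insert(0, tweet["tweet_id"])                # 2. update the cache
    del recent[20:]                                    # keep only the newest 20
    return tweet
```

Walking one endpoint through storage, service, schema, and cache in this order gives the interviewer a complete vertical slice before you move to the next endpoint.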
5 Deep Dives (~10 minutes)
Now optimize the system.
This is where seniority becomes visible.
Purpose of Deep Dives
You now:
- Improve scalability
- Fix bottlenecks
- Address edge cases
- Meet non-functional requirements
