The Dream: Real-Time Human Communication
Imagine you want to build a video conferencing platform.
Not a toy project.
Something like:
- Zoom
- Google Meet
- Microsoft Teams
- Discord
- WhatsApp Calling
Let's define the requirements.
Requirement 1: Very Low Latency
Human conversation is extremely sensitive to delay.
Imagine this dialogue.
Alice:
Hello Bob
Bob hears it:
2 seconds later
Now Bob responds.
Alice hears it:
2 seconds later
The conversation becomes painful.
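To put rough numbers on that dialogue, the sketch below adds up the silence Alice sits through between finishing her sentence and hearing Bob's reply. The function name `perceivedGapMs` and the 300 ms reaction time are assumptions made for illustration; only the 2-second delay comes from the example above.

```typescript
// Silence Alice experiences after she stops talking, in milliseconds.
// oneWayDelayMs: network delay in each direction (assumed symmetric).
// reactionTimeMs: how long Bob takes to start answering (assumed value).
function perceivedGapMs(oneWayDelayMs: number, reactionTimeMs: number): number {
  // Alice's words travel to Bob, Bob reacts, then Bob's reply travels back.
  return oneWayDelayMs + reactionTimeMs + oneWayDelayMs;
}

// The painful call from the dialogue: 2 seconds in each direction.
const painfulGapMs = perceivedGapMs(2000, 300); // 4300

// The same exchange with a 100 ms one-way delay.
const comfortableGapMs = perceivedGapMs(100, 300); // 500
```

The delay is paid twice per exchange, which is why a one-way figure that looks tolerable on paper can still make turn-taking awkward.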
Acceptable Latency
Roughly speaking:
| Latency | Experience |
|---|---|
| < 100ms | Excellent |
| 100-200ms | Good |
| 200-400ms | Noticeable |
| > 500ms | Annoying |
| > 1000ms | Difficult |
Human communication requires extremely low latency.
This immediately eliminates many traditional approaches.
Requirement 2: Continuous Streaming
A chat application sends:
Message
Pause
Message
Pause
A video call sends:
Frame
Frame
Frame
Frame
Frame
Frame
...continuously.
For example:
30 FPS video:
30 frames per second
60 FPS video:
60 frames per second
The system must handle a nonstop stream of data.
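One way to make "nonstop" concrete is to look at how little time each frame gets. The sketch below converts a frame rate into a per-frame time budget; `frameIntervalMs` is a hypothetical helper name, not part of any API.

```typescript
// Time available to capture, encode, send, and display one frame
// before the next one is due, in milliseconds.
function frameIntervalMs(framesPerSecond: number): number {
  if (framesPerSecond <= 0) {
    throw new RangeError("framesPerSecond must be positive");
  }
  return 1000 / framesPerSecond;
}

const budgetAt30Fps = frameIntervalMs(30); // roughly 33.3 ms per frame
const budgetAt60Fps = frameIntervalMs(60); // roughly 16.7 ms per frame
```

Doubling the frame rate halves the budget, so anything in the pipeline that occasionally stalls for longer than one interval shows up as a dropped or late frame.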
Requirement 3: Audio and Video Synchronization
Suppose Alice says:
Hello
Her lips move.
The sound should match the movement.
If video arrives first:
Lips move
and audio arrives later:
Hello
the experience feels broken.
Synchronization becomes critical.
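As a minimal sketch of the idea, assume every audio and video sample carries the time it was captured and the time it arrived, both on one shared clock (a simplification; real systems derive this from media timestamps). The receiver can then measure how far one stream lags and hold the faster stream back. All names here are hypothetical.

```typescript
interface ReceivedSample {
  captureTimeMs: number; // when the sender recorded it
  arrivalTimeMs: number; // when the receiver got it
}

// How long a sample spent in transit.
function transitMs(sample: ReceivedSample): number {
  return sample.arrivalTimeMs - sample.captureTimeMs;
}

// Positive result: audio took longer than video, so sound trails the lips.
function audioLagMs(audio: ReceivedSample, video: ReceivedSample): number {
  return transitMs(audio) - transitMs(video);
}

// Extra delay to apply to each stream so both play in step.
// Only the faster stream is held back; the slower one plays immediately.
function holdBackMs(
  audio: ReceivedSample,
  video: ReceivedSample
): { audio: number; video: number } {
  const lag = audioLagMs(audio, video);
  return lag >= 0 ? { audio: 0, video: lag } : { audio: -lag, video: 0 };
}

// Alice says "Hello" at t = 1000 ms. Video arrives after 80 ms, audio after 250 ms.
const helloAudio: ReceivedSample = { captureTimeMs: 1000, arrivalTimeMs: 1250 };
const helloVideo: ReceivedSample = { captureTimeMs: 1000, arrivalTimeMs: 1080 };
const correction = holdBackMs(helloAudio, helloVideo); // { audio: 0, video: 170 }
```

Note the trade-off: synchronization is bought by adding delay to the stream that arrived early, which pulls against the low-latency requirement.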
Requirement 4: Adaptive Quality
Network conditions constantly change.
Example:
At 9:00 AM:
Available Bandwidth: 20 Mbps
At 9:05 AM:
Available Bandwidth: 3 Mbps
The communication system must adapt automatically.
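A simple way to picture that adaptation is a lookup from available bandwidth to a video quality tier. The tier names and thresholds below are illustrative assumptions, not values from any standard or product; only the 20 Mbps and 3 Mbps inputs come from the example above.

```typescript
interface QualityTier {
  label: string;
  minMbps: number; // lowest bandwidth at which this tier is attempted
}

// Hypothetical tiers, ordered from most to least demanding.
const AUDIO_ONLY: QualityTier = { label: "audio-only", minMbps: 0 };
const TIERS: QualityTier[] = [
  { label: "1080p", minMbps: 5 },
  { label: "720p", minMbps: 2.5 },
  { label: "360p", minMbps: 1 },
  AUDIO_ONLY,
];

// Pick the best tier the current bandwidth can sustain.
function pickTier(availableMbps: number): QualityTier {
  for (const tier of TIERS) {
    if (availableMbps >= tier.minMbps) {
      return tier;
    }
  }
  return AUDIO_ONLY; // fallback for invalid (negative) input
}

const at9am = pickTier(20).label; // "1080p"
const at905 = pickTier(3).label;  // "720p"
```

Stepping down to a lower tier keeps the call alive at reduced quality instead of letting it stall.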
Without adaptation:
Video freezes
Call drops
Requirement 5: Security
Voice calls contain sensitive information.
Video calls contain sensitive information.
Screen sharing may expose:
- Passwords
- Emails
- Financial data
Therefore:
Encryption Required
not optional.
Requirement 6: NAT Traversal
As we learned:
Most devices are hidden behind:
- Routers
- NAT
- Firewalls
The communication system must somehow connect them anyway.
This is one of the hardest requirements.
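To see why this is hard, here is a toy model of a NAT. It is a deliberately simplified sketch: real NATs also track the remote address and expire mappings, and the class name, port numbers, and addresses are all made up for illustration.

```typescript
// A toy NAT: it rewrites outbound traffic to a public endpoint and only
// lets inbound traffic through if a matching mapping already exists.
class ToyNat {
  private publicIp: string;
  private nextPort = 40000;
  private toPublic: Record<string, string> = {};
  private toPrivate: Record<string, string> = {};

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // A device inside the network sends a packet out.
  // Returns the public "ip:port" the outside world will see.
  outbound(privateEndpoint: string): string {
    const existing = this.toPublic[privateEndpoint];
    if (existing !== undefined) {
      return existing;
    }
    const publicEndpoint = `${this.publicIp}:${this.nextPort}`;
    this.nextPort += 1;
    this.toPublic[privateEndpoint] = publicEndpoint;
    this.toPrivate[publicEndpoint] = privateEndpoint;
    return publicEndpoint;
  }

  // A packet arrives from outside. Returns the private endpoint it is
  // forwarded to, or null if the NAT drops it.
  inbound(publicEndpoint: string): string | null {
    const target = this.toPrivate[publicEndpoint];
    return target === undefined ? null : target;
  }
}

const aliceNat = new ToyNat("203.0.113.7");

// Bob tries to reach Alice first: there is no mapping, so the packet is dropped.
const unsolicited = aliceNat.inbound("203.0.113.7:40000"); // null

// Once Alice sends something out, a mapping exists and replies get through.
const alicePublic = aliceNat.outbound("192.168.1.10:5000"); // "203.0.113.7:40000"
const delivered = aliceNat.inbound(alicePublic);            // "192.168.1.10:5000"
```

When both Alice and Bob sit behind devices like this, neither can simply dial the other, and neither knows its own public endpoint without outside help.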
Requirement 7: Browser Support
Before WebRTC, video communication usually required plugins.
Examples:
- Flash
- Java Applets
- Proprietary software
Problems:
Install plugin
Update plugin
Security vulnerabilities
Browser incompatibility
Users hated this.
Developers hated this.
Browser vendors hated this.
A better solution was needed.
Why WebSockets Were Not Enough
Many engineers ask:
If WebSockets provide real-time communication, why didn't we just use WebSockets?
Let's examine it carefully.
What WebSockets Actually Provide
WebSockets provide:
Persistent Bidirectional Communication
Example:
Client <--> Server
Messages can flow both ways.
But WebSockets only solve one problem:
Transporting Bytes
What WebSockets Do NOT Provide
WebSockets do not provide:
NAT Traversal
- No STUN
- No TURN
- No ICE
Audio Processing
- No codecs
- No compression
- No encoding
- No decoding
Video Processing
- No frame handling
- No synchronization
- No bitrate adaptation
Congestion Control
- No media-aware bandwidth management (only TCP's generic congestion control underneath)
- No quality adaptation
Media Security
- No media-specific encryption pipeline
Peer Discovery
- No peer connection mechanism
Packet Loss Recovery
- No real-time media optimization
The Hidden Complexity
Suppose you want to build Zoom using WebSockets.
You would need to build:
Media Engine
Codec Engine
Audio Processing
Video Processing
Encryption
Congestion Control
NAT Traversal
Bandwidth Adaptation
Peer Discovery
Connection Negotiation
yourself.
This is an enormous undertaking.
Essentially:
You would end up rebuilding WebRTC.
The Industry's Realization
Engineers around the world kept solving the same problems repeatedly.
Every communication platform needed:
Audio
Video
Security
NAT Traversal
Low Latency
Adaptive Bitrate
Again and again.
The industry needed a standardized solution.
Google's Proposal
Around 2010, Google acquired a company called:
Global IP Solutions
commonly known as GIPS.
GIPS specialized in:
- Voice over IP
- Video communication
- Real-time media technologies
Google recognized something important:
Real-time communication should be built directly into browsers.
Not through plugins.
Not through third-party software.
Directly into the web platform.
This idea eventually evolved into WebRTC.
The Core Vision
The vision was simple:
Allow developers to build:
Audio Calls
Video Calls
Screen Sharing
File Transfer
using standard browser APIs.
Without plugins.
Without installations.
Without proprietary technology.
The Three Major Goals
WebRTC was designed around three primary goals.
Goal 1: Real-Time Communication
The system must support:
Audio
Video
Data
with minimal latency.
Goal 2: Peer-to-Peer First
Whenever possible:
Alice <-----> Bob
direct communication.
This reduces:
- latency
- bandwidth costs
- infrastructure requirements
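The bandwidth point can be sketched with simple arithmetic. Assume a hypothetical service that relays every 1:1 call through its own server; the function name and the 1.5 Mbps stream size are illustrative assumptions.

```typescript
// Media traffic a relay server carries when every 1:1 call passes through it.
function relayLoadMbps(
  concurrentCalls: number,
  perStreamMbps: number
): { ingressMbps: number; egressMbps: number } {
  // Each call has two streams (Alice to Bob, Bob to Alice).
  // The server receives each stream once and sends it out once.
  const streams = concurrentCalls * 2;
  return {
    ingressMbps: streams * perStreamMbps,
    egressMbps: streams * perStreamMbps,
  };
}

// 1000 simultaneous calls at an assumed 1.5 Mbps per stream.
const relayed = relayLoadMbps(1000, 1.5); // { ingressMbps: 3000, egressMbps: 3000 }
```

With a direct Alice-to-Bob path, that media never touches the service's servers, so the corresponding load is zero.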
Goal 3: Secure by Default
Unlike many older systems:
WebRTC made encryption mandatory.
Not optional.
Every WebRTC connection must be encrypted.
The WebRTC Philosophy
The designers of WebRTC asked:
What if browsers could provide all the hard parts automatically?
Instead of developers implementing:
NAT Traversal
Codecs
Encryption
Media Transport
the browser would provide them.
Developers would simply use APIs.
This philosophy became the foundation of WebRTC.
What WebRTC Actually Is
One of the biggest misconceptions:
WebRTC is a protocol
Wrong.
WebRTC is a framework.
More precisely:
A collection of standards,
protocols,
APIs,
and media technologies
working together.
It is not one thing.
It is many technologies integrated into one system.
The Major Building Blocks
At a high level, WebRTC consists of several major subsystems.
Media Capture
Responsible for obtaining:
Camera
Microphone
Screen
from the user's device.
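In browser code, capture starts from a constraints object describing what the page wants. The sketch below only builds that object; the `CaptureRequest` interface is a local stand-in that mirrors a small subset of the browser's constraint shape so the example runs outside a browser, and the 1280x720 at 30 FPS values are arbitrary choices.

```typescript
// Local mirror of a small subset of the browser's media constraints shape.
interface CaptureRequest {
  audio: boolean;
  video: {
    width: { ideal: number };
    height: { ideal: number };
    frameRate: { ideal: number };
  };
}

// Ask for the microphone plus a camera stream near 720p at 30 FPS.
// "ideal" values are preferences; the browser may deliver something else.
const cameraAndMicrophone: CaptureRequest = {
  audio: true,
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    frameRate: { ideal: 30 },
  },
};

// In a browser, the object would be handed to the capture API:
//   const stream = await navigator.mediaDevices.getUserMedia(cameraAndMicrophone);
// Screen capture goes through a separate call:
//   const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
```

Both calls prompt the user for permission, so a page never gets access to the camera, microphone, or screen silently.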
Peer Connectivity
Responsible for:
Finding Peers
Creating Connections
Maintaining Connections
NAT Traversal
Responsible for:
STUN
TURN
ICE
operations.
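From the developer's side, this subsystem is mostly configured by telling the browser which STUN and TURN servers to use. The sketch below shows such a configuration; the interfaces are local stand-ins mirroring the `iceServers` shape accepted by `RTCPeerConnection`, while the server URLs, credentials, and the `hasRelayFallback` helper are hypothetical.

```typescript
// Local mirrors of the configuration shape, so the example runs outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Placeholder servers: substitute real STUN/TURN endpoints and credentials.
const peerConfig: PeerConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: "turn:turn.example.org:3478",
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
};

// In a browser: const pc = new RTCPeerConnection(peerConfig);

// Hypothetical helper: does the config include a TURN relay to fall back on
// when no direct path can be found?
function hasRelayFallback(config: PeerConfig): boolean {
  return config.iceServers.some((server) => {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    return urls.some((url) => /^turns?:/.test(url));
  });
}

const relayAvailable = hasRelayFallback(peerConfig); // true
```

ICE itself is not configured here: the browser runs it internally and uses these servers as it gathers and tests candidate routes.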
Media Transport
Responsible for moving:
Audio
Video
across networks.
Security
Responsible for:
Encryption
Authentication
Key Exchange
Congestion Control
Responsible for adapting:
Bitrate
Quality
Resolution
based on network conditions.
A High-Level WebRTC Call
Let's see the entire journey before diving into details.
Imagine Alice starts a call.
Step 1
Capture media.
Camera
Microphone
become available.
Step 2
Create a peer connection.
Browser prepares communication systems.
Step 3
Exchange connection information.
Peers share:
Capabilities
Addresses
Media Information
Step 4
Discover network routes.
Using:
STUN
TURN
ICE
Step 5
Establish secure communication.
Encryption keys created.
Step 6
Begin media transport.
Audio and video start flowing.
Step 7
Continuously adapt.
Monitor:
Bandwidth
Packet Loss
Latency
Adjust quality dynamically.
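Step 7 can be pictured as a loop that reads the three measurements and nudges the send bitrate. The sketch below uses a simplified additive-increase, multiplicative-decrease rule as a stand-in; it is not the algorithm browsers actually ship, and every threshold and name in it is an assumption.

```typescript
interface NetworkSample {
  availableKbps: number;   // estimated bandwidth
  packetLossRatio: number; // 0.05 means 5% of packets lost
  roundTripMs: number;     // latency to the peer and back
}

// Illustrative limits and thresholds (assumed, not from any specification).
const MIN_BITRATE_KBPS = 150;
const MAX_BITRATE_KBPS = 4000;
const LOSS_THRESHOLD = 0.05;
const ROUND_TRIP_THRESHOLD_MS = 400;
const INCREASE_STEP_KBPS = 100;
const BACKOFF_FACTOR = 0.7;

// Decide the bitrate to use for the next interval.
function nextBitrateKbps(currentKbps: number, sample: NetworkSample): number {
  const congested =
    sample.packetLossRatio > LOSS_THRESHOLD ||
    sample.roundTripMs > ROUND_TRIP_THRESHOLD_MS;

  // Back off sharply when the network struggles, probe upward gently otherwise.
  const proposed = congested
    ? currentKbps * BACKOFF_FACTOR
    : currentKbps + INCREASE_STEP_KBPS;

  // Never exceed the bandwidth estimate or leave the allowed range.
  const capped = Math.min(proposed, sample.availableKbps, MAX_BITRATE_KBPS);
  return Math.max(MIN_BITRATE_KBPS, Math.round(capped));
}

// Heavy loss: 2000 kbps drops to 1400 kbps.
const afterLoss = nextBitrateKbps(2000, {
  availableKbps: 3000, packetLossRatio: 0.1, roundTripMs: 120,
});

// Network recovers: 1400 kbps climbs to 1500 kbps.
const afterRecovery = nextBitrateKbps(afterLoss, {
  availableKbps: 3000, packetLossRatio: 0, roundTripMs: 80,
});
```

Calling a function like this once per measurement interval produces the sawtooth pattern typical of adaptive senders: slow climbs while the network is healthy, quick drops when it is not.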
Why WebRTC Feels Complex
Many developers first encounter terms like:
Offer
Answer
SDP
ICE
STUN
TURN
RTP
RTCP
DTLS
SRTP
and become overwhelmed.
The reason is simple.
WebRTC combines knowledge from:
- Networking
- Security
- Distributed Systems
- Audio Engineering
- Video Engineering
- Browser Internals
into one platform.
