The Birth of WebRTC

Updated on 30 Jun, 2026

The Dream: Real-Time Human Communication

Imagine you want to build a video conferencing platform.

Not a toy project.

Something like:

Zoom
Google Meet
Microsoft Teams
Discord
WhatsApp Calling

Let's define the requirements.

Requirement 1: Very Low Latency

Human conversation is extremely sensitive to delay.

Imagine this dialogue.

Alice:

Hello Bob

Bob hears it:

2 seconds later

Now Bob responds.

Alice hears it:

2 seconds later

The conversation becomes painful.

Acceptable Latency

Roughly speaking:

Latency	Experience
< 100ms	Excellent
100-200ms	Good
200-400ms	Noticeable
> 500ms	Annoying
> 1000ms	Difficult

Human communication requires extremely low latency.

This immediately eliminates many traditional approaches.
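The latency table above can be written as a small lookup. This is only a sketch: the function name is made up for illustration, and because the table says nothing about the 400-500 ms range or the exact boundary values, the code assumes that range still counts as "Noticeable".

```typescript
// Labels taken from the latency table above.
type Experience = "Excellent" | "Good" | "Noticeable" | "Annoying" | "Difficult";

// Classify a delay given in milliseconds.
// Assumption: the table leaves 400-500 ms and the boundary values unspecified,
// so this sketch folds 400-500 ms into "Noticeable".
function classifyLatency(delayMs: number): Experience {
  if (!isFinite(delayMs) || delayMs < 0) {
    throw new RangeError("delayMs must be a non-negative finite number");
  }
  if (delayMs < 100) return "Excellent";
  if (delayMs <= 200) return "Good";
  if (delayMs <= 500) return "Noticeable";
  if (delayMs <= 1000) return "Annoying";
  return "Difficult";
}

// The 2-second delay from the Alice and Bob dialogue.
console.log(classifyLatency(2000)); // "Difficult"
```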

Requirement 2: Continuous Streaming

A chat application sends:

Message
Pause
Message
Pause

A video call sends:

Frame
Frame
Frame
Frame
Frame
Frame
...

continuously.

For example:

30 FPS video:

30 frames per second

60 FPS video:

60 frames per second

The system must handle a nonstop stream of data.
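To make "nonstop" concrete, it helps to turn the frame rate into a time budget per frame. The helper below is a sketch with a made-up name; it only does the arithmetic.

```typescript
// Time available for each frame at a given frame rate, in milliseconds.
function frameIntervalMs(framesPerSecond: number): number {
  if (!(framesPerSecond > 0)) {
    throw new RangeError("framesPerSecond must be greater than zero");
  }
  return 1000 / framesPerSecond;
}

for (const fps of [30, 60]) {
  console.log(`${fps} FPS -> one frame every ${frameIntervalMs(fps).toFixed(1)} ms`);
}
// 30 FPS -> one frame every 33.3 ms
// 60 FPS -> one frame every 16.7 ms
```

At 60 FPS, capture, encoding, transport, and display all have to keep pace with a new frame roughly every 17 ms.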

Requirement 3: Audio and Video Synchronization

Suppose Alice says:

Hello

Her lips move.

The sound should match the movement.

If video arrives first:

Lips move

and audio arrives later:

Hello

the experience feels broken.

Synchronization becomes critical.
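One way to think about synchronization is as a timestamp comparison: take the audio sample and the video frame that are about to be played at the same instant, and check how far apart they were captured. The sketch below assumes each one carries a capture time in milliseconds. The type, the function names, and the 80 ms tolerance are all hypothetical, chosen only for illustration.

```typescript
// A piece of media tagged with the time it was captured, in milliseconds.
interface TimedSample {
  captureTimeMs: number;
}

// Hypothetical tolerance for this sketch; not a value from any standard.
const MAX_SKEW_MS = 80;

// Positive result: the video frame was captured later than the audio sample.
function avSkewMs(audio: TimedSample, video: TimedSample): number {
  return video.captureTimeMs - audio.captureTimeMs;
}

// True when the two samples are close enough to be played together.
function inSync(audio: TimedSample, video: TimedSample, maxSkewMs: number = MAX_SKEW_MS): boolean {
  return Math.abs(avSkewMs(audio, video)) <= maxSkewMs;
}

console.log(inSync({ captureTimeMs: 1000 }, { captureTimeMs: 1020 })); // true
console.log(inSync({ captureTimeMs: 1000 }, { captureTimeMs: 1300 })); // false
```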

Requirement 4: Adaptive Quality

Network conditions constantly change.

Example:

At 9:00 AM:

Available Bandwidth: 20 Mbps

At 9:05 AM:

Available Bandwidth: 3 Mbps

The communication system must adapt automatically.

Otherwise:

Video freezes
Call drops
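A simple way to picture adaptation is a ladder of quality tiers, where the system picks the best tier the current bandwidth can carry. The sketch below is not how any particular product does it: the tier labels and bandwidth thresholds are assumptions made up for this example.

```typescript
interface QualityTier {
  label: string;
  minBandwidthMbps: number; // hypothetical threshold, for illustration only
}

const AUDIO_ONLY: QualityTier = { label: "audio-only", minBandwidthMbps: 0 };

// Ordered from most to least demanding. Labels and thresholds are assumptions.
const TIERS: QualityTier[] = [
  { label: "1080p", minBandwidthMbps: 5 },
  { label: "720p", minBandwidthMbps: 2.5 },
  { label: "360p", minBandwidthMbps: 1 },
  AUDIO_ONLY,
];

// Return the most demanding tier that fits the available bandwidth.
function pickTier(availableMbps: number): QualityTier {
  for (const tier of TIERS) {
    if (availableMbps >= tier.minBandwidthMbps) return tier;
  }
  return AUDIO_ONLY; // reached only for negative or NaN input
}

console.log(pickTier(20).label); // "1080p" (the 9:00 AM case)
console.log(pickTier(3).label);  // "720p"  (the 9:05 AM case)
```

Dropping a tier keeps the call alive at lower quality instead of letting the video freeze.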

Requirement 5: Security

Voice calls contain sensitive information.

Video calls contain sensitive information.

Screen sharing may expose:

Passwords
Emails
Financial data

Therefore:

Encryption Required

not optional.

Requirement 6: NAT Traversal

As we learned:

Most devices are hidden behind:

Routers
NAT
Firewalls

The communication system must somehow connect them anyway.

This is one of the hardest requirements.

Requirement 7: Browser Support

Before WebRTC, video communication in the browser usually required plugins.

Examples:

Flash
Java Applets
Proprietary software

Problems:

Install plugin
Update plugin
Security vulnerabilities
Browser incompatibility

Users hated this.

Developers hated this.

Browser vendors hated this.

A better solution was needed.

Why WebSockets Were Not Enough

Many engineers ask:

If WebSockets provide real-time communication, why didn't we just use WebSockets?

Let's examine it carefully.

What WebSockets Actually Provide

WebSockets provide:

Persistent Bidirectional Communication

Example:

Client <--> Server

Messages can flow both ways.

But WebSockets only solve one problem:

Transporting Bytes

What WebSockets Do NOT Provide

WebSockets do not provide:

NAT Traversal

No STUN

No TURN

No ICE

Audio Processing

No codecs
No compression
No encoding
No decoding

Video Processing

No frame handling
No synchronization
No bitrate adaptation

Congestion Control

No bandwidth management
No quality adaptation

Media Security

No media-specific encryption pipeline

Peer Discovery

No peer connection mechanism

Packet Loss Recovery

No real-time media optimization

The Hidden Complexity

Suppose you want to build Zoom using WebSockets.

You would need to build:

Media Engine
Codec Engine
Audio Processing
Video Processing
Encryption
Congestion Control
NAT Traversal
Bandwidth Adaptation
Peer Discovery
Connection Negotiation

yourself.

This is an enormous undertaking.

Essentially:

You would end up rebuilding WebRTC.

The Industry's Realization

Engineers around the world kept solving the same problems repeatedly.

Every communication platform needed:

Audio
Video
Security
NAT Traversal
Low Latency
Adaptive Bitrate

Again and again.

The industry needed a standardized solution.

Google's Proposal

Around 2010, Google acquired a company called:

Global IP Solutions

commonly known as GIPS.

GIPS specialized in:

Voice over IP
Video communication
Real-time media technologies

Google recognized something important:

Real-time communication should be built directly into browsers.

Not through plugins.

Not through third-party software.

Directly into the web platform.

This idea eventually evolved into WebRTC.

The Core Vision

The vision was simple:

Allow developers to build:

Audio Calls
Video Calls
Screen Sharing
File Transfer

using standard browser APIs.

Without plugins.

Without installations.

Without proprietary technology.

The Three Major Goals

WebRTC was designed around three primary goals.

Goal 1: Real-Time Communication

The system must support:

Audio
Video
Data

with minimal latency.

Goal 2: Peer-to-Peer First

Whenever possible:

Alice <-----> Bob

direct communication.

This reduces:

latency
bandwidth costs
infrastructure requirements

Goal 3: Secure by Default

Unlike many older systems:

WebRTC made encryption mandatory.

Not optional.

Every WebRTC connection must be encrypted.

The WebRTC Philosophy

The designers of WebRTC asked:

What if browsers could provide all the hard parts automatically?

Instead of developers implementing:

NAT Traversal
Codecs
Encryption
Media Transport

the browser would provide them.

Developers would simply use APIs.

This philosophy became the foundation of WebRTC.

What WebRTC Actually Is

One of the biggest misconceptions:

WebRTC is a protocol

Wrong.

WebRTC is a framework.

More precisely:

A collection of standards,
protocols,
APIs,
and media technologies
working together.

It is not one thing.

It is many technologies integrated into one system.

The Major Building Blocks

At a high level, WebRTC consists of several major subsystems.

Media Capture

Responsible for obtaining:

Camera
Microphone
Screen

from the user's device.

Peer Connectivity

Responsible for:

Finding Peers
Creating Connections
Maintaining Connections

NAT Traversal

Responsible for:

STUN
TURN
ICE

operations.
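From the developer's side, NAT traversal mostly shows up as configuration: you tell the browser which STUN and TURN servers it may use, and its ICE implementation does the rest. The sketch below mirrors the shape of the `iceServers` option accepted by the browser's `RTCPeerConnection` constructor. The two interfaces are local stand-ins so the example also works outside a browser, `hasTurnRelay` is a made-up helper, and every host name and credential is a placeholder.

```typescript
// Local mirror of the configuration shape RTCPeerConnection accepts.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// All hosts and credentials below are placeholders, not real services.
const peerConfig: PeerConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
  ],
};

// True when at least one entry is a TURN relay, the fallback for when
// no direct path between the peers can be found.
function hasTurnRelay(config: PeerConfig): boolean {
  return config.iceServers.some((server) => {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    return urls.some((url) => /^turns?:/.test(url));
  });
}

console.log(hasTurnRelay(peerConfig)); // true
// In a browser, the object would be passed as: new RTCPeerConnection(peerConfig)
```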

Media Transport

Responsible for moving:

Audio
Video

across networks.

Security

Responsible for:

Encryption
Authentication
Key Exchange

Congestion Control

Responsible for adapting:

Bitrate
Quality
Resolution

based on network conditions.

A High-Level WebRTC Call

Let's see the entire journey before diving into details.

Imagine Alice starts a call.

Step 1

Capture media.

Camera
Microphone

become available.

Step 2

Create a peer connection.

Browser prepares communication systems.

Step 3

Exchange connection information.

Peers share:

Capabilities
Addresses
Media Information

Step 4

Discover network routes.

Using:

STUN
TURN
ICE

Step 5

Establish secure communication.

Encryption keys created.

Step 6

Begin media transport.

Audio and video start flowing.

Step 7

Continuously adapt.

Monitor:

Bandwidth
Packet Loss
Latency

Adjust quality dynamically.
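The caller's side of these steps can be sketched in code. The version below is deliberately simplified: the `*Like` interfaces are local stand-ins modelled on the browser's `MediaStream` and `RTCPeerConnection`, covering only the members used here, so the flow can be exercised outside a browser. The `Signaling` interface and its message format are hypothetical, because WebRTC leaves the exchange of connection information to the application.

```typescript
// Stand-ins modelled on the browser's MediaStream and RTCPeerConnection.
interface TrackLike { kind: string }
interface StreamLike { getTracks(): TrackLike[] }
interface DescriptionLike { type: string; sdp?: string }
interface PeerLike {
  onicecandidate: ((event: { candidate: unknown }) => void) | null;
  addTrack(track: TrackLike, stream: StreamLike): unknown;
  createOffer(): Promise<DescriptionLike>;
  setLocalDescription(description: DescriptionLike): Promise<void>;
}

// Hypothetical signaling transport; the application has to provide one.
interface Signaling {
  send(message: { kind: "offer" | "candidate"; payload: unknown }): void;
}

async function startCall(peer: PeerLike, stream: StreamLike, signaling: Signaling): Promise<void> {
  // Steps 1-2: the caller has already captured media and created the peer connection.
  for (const track of stream.getTracks()) {
    peer.addTrack(track, stream);
  }

  // Step 4: forward each discovered network candidate to the other peer.
  // A null candidate marks the end of gathering and is not forwarded.
  peer.onicecandidate = (event) => {
    if (event.candidate !== null) {
      signaling.send({ kind: "candidate", payload: event.candidate });
    }
  };

  // Step 3: describe local capabilities and media in an offer, then share it.
  const offer = await peer.createOffer();
  await peer.setLocalDescription(offer);
  signaling.send({ kind: "offer", payload: offer });

  // Steps 5-7 (key exchange, media flow, adaptation) are handled by the browser
  // once the remote answer and candidates have been applied.
}
```

In a browser, the stream would typically come from `navigator.mediaDevices.getUserMedia({ audio: true, video: true })` and the peer from `new RTCPeerConnection(...)`, with the real browser types used in place of the stand-ins.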

Why WebRTC Feels Complex

Many developers first encounter terms like:

Offer
Answer
SDP
ICE
STUN
TURN
RTP
RTCP
DTLS
SRTP

and become overwhelmed.

The reason is simple.

WebRTC combines knowledge from:

Networking
Security
Distributed Systems
Audio Engineering
Video Engineering
Browser Internals

into one platform.

The Important Mental Model

This is the most important thing to remember before moving forward.

WebRTC is solving five major problems.

Problem 1: How do peers discover each other?

Problem 2: How do peers exchange capabilities?

Problem 3: How do peers connect through NAT?

Problem 4: How do peers transport media efficiently?

Problem 5: How do peers keep communication secure?

Everything in WebRTC exists to solve one of these problems.

Every protocol.

Every API.

Every component.
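As a study aid, the overwhelming terms from the previous section can be filed under the five problems. The lookup below is only a sketch: the structure and the function name are made up for illustration. "Signaling" appears as a description rather than a protocol, since WebRTC leaves peer discovery to the application.

```typescript
interface ProblemEntry {
  problem: string;
  terms: string[];
}

// Which of the five problems each term helps solve.
const PROBLEMS: ProblemEntry[] = [
  { problem: "How do peers discover each other?", terms: ["application-defined signaling"] },
  { problem: "How do peers exchange capabilities?", terms: ["Offer", "Answer", "SDP"] },
  { problem: "How do peers connect through NAT?", terms: ["ICE", "STUN", "TURN"] },
  { problem: "How do peers transport media efficiently?", terms: ["RTP", "RTCP"] },
  { problem: "How do peers keep communication secure?", terms: ["DTLS", "SRTP"] },
];

// Look up the problem a given term belongs to, or undefined if it is not listed.
function problemFor(term: string): string | undefined {
  for (const entry of PROBLEMS) {
    if (entry.terms.indexOf(term) !== -1) return entry.problem;
  }
  return undefined;
}

console.log(problemFor("STUN")); // "How do peers connect through NAT?"
console.log(problemFor("SRTP")); // "How do peers keep communication secure?"
```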
