Updated on 14 Jun, 2026

WebRTC Is Not a Protocol

Many beginners think:

HTTP → Protocol
TCP → Protocol
UDP → Protocol
WebRTC → Protocol

Wrong.

WebRTC is not a protocol.

WebRTC is a framework composed of many protocols and technologies.

Think of it like a car.

When someone says:

Car

they are not referring to one component.

A car contains:

Engine
Transmission
Brakes
Fuel System
Steering
Electronics

working together.

Similarly, WebRTC contains:

Media Capture
Peer Connectivity
NAT Traversal
Media Transport
Encryption
Congestion Control

working together.

When we say:

WebRTC

we are referring to an entire communication platform.

The Core Problem WebRTC Solves

Before looking at architecture, let's revisit the goal.

Alice wants to communicate with Bob.

The communication may involve:

Audio
Video
Data
Screen Sharing
Files

The system must:

  1. Capture media
  2. Discover peers
  3. Traverse NAT
  4. Establish security
  5. Transport media
  6. Adapt to network conditions

Every component inside WebRTC exists because of one of these requirements.

A Bird's-Eye View of the Architecture

At the highest level, a WebRTC application looks like this:

+----------------------------------+
| Application Layer                |
| React / Angular / Vue / JS       |
+----------------------------------+

+----------------------------------+
| WebRTC APIs                      |
+----------------------------------+

+----------------------------------+
| WebRTC Engine                    |
+----------------------------------+

+----------------------------------+
| Network Layer                    |
+----------------------------------+

+----------------------------------+
| Internet                         |
+----------------------------------+

Let's understand each layer.

Layer 1: Application Layer

This is our code.

Examples:

Google Meet
Zoom Web
Discord
Custom Application

This layer decides:

  • when a call starts
  • who joins a room
  • which camera to use
  • which microphone to use

For example:

joinMeeting()
leaveMeeting()
muteMicrophone()

All of this belongs to the application.
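
As a rough sketch of one such application-level decision, the snippet below picks a camera and builds the constraints passed to `getUserMedia`. The function names `cameraConstraints` and `openPreferredCamera`, and the `preferredLabel` setting, are hypothetical and chosen only for illustration; `enumerateDevices` and `getUserMedia` are the standard browser calls.

```javascript
// Build getUserMedia constraints that select a specific camera.
// `devices` is the array resolved by navigator.mediaDevices.enumerateDevices().
// `preferredLabel` is a hypothetical app setting, e.g. from a settings menu.
function cameraConstraints(devices, preferredLabel) {
  const cameras = devices.filter((device) => device.kind === "videoinput");
  const chosen = cameras.find((device) => device.label.includes(preferredLabel));
  return {
    audio: true,
    // Fall back to the browser's default camera when nothing matches.
    video: chosen ? { deviceId: { exact: chosen.deviceId } } : true,
  };
}

// Browser usage: the application decides, the browser does the capturing.
async function openPreferredCamera(preferredLabel) {
  const devices = await navigator.mediaDevices.enumerateDevices();
  return navigator.mediaDevices.getUserMedia(cameraConstraints(devices, preferredLabel));
}
```

Browsers generally leave device labels empty until the user has granted media permission, so a real application would likely ask for permission once before matching on labels.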

Notice something important:

The application itself does not handle:

  • RTP packets
  • NAT traversal
  • codecs
  • encryption

WebRTC handles those.

Layer 2: WebRTC APIs

The browser exposes APIs.

These APIs allow applications to interact with the WebRTC engine.

The three most important APIs are:

MediaStream
RTCPeerConnection
RTCDataChannel

Everything in WebRTC revolves around these three concepts.

Think of them as the public interface to the communication engine.
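
A quick way to see this public interface is to check whether the three constructors exist in the current environment. This is a minimal sketch; `detectWebRtcSupport` is an illustrative name, not part of any API.

```javascript
// Report which of the three primary WebRTC APIs exist on a global scope.
// `scope` defaults to the current global object (window in a browser).
function detectWebRtcSupport(scope = globalThis) {
  return {
    mediaStream: typeof scope.MediaStream === "function",
    peerConnection: typeof scope.RTCPeerConnection === "function",
    dataChannel: typeof scope.RTCDataChannel === "function",
  };
}

// Log the result for whatever environment this runs in.
console.log(detectWebRtcSupport());
```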

Layer 3: WebRTC Engine

This is where the magic happens.

Inside the browser exists a sophisticated communication stack.

Most developers never see it.

But it is doing enormous amounts of work.

The WebRTC engine contains:

Media Engine
ICE Engine
DTLS Engine
SRTP Engine
Codec Engine
Congestion Controller
Network Monitor

This layer is responsible for solving the hard problems.

Layer 4: Network Layer

Eventually all communication becomes packets.

Those packets travel using:

UDP
TCP
IP

and eventually across the Internet.

Understanding the Three Primary APIs

MediaStream

Imagine you turn on your camera.

The browser must represent that media somehow.

The representation is:

MediaStream

Think of a MediaStream as:

A container of media sources.

Example:

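// Ask for camera and microphone access (the browser prompts the user).
// `await` requires an async function or an ES module.
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});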

The browser returns a MediaStream.

Understanding the Stream Concept

Despite its name, a MediaStream acts more like a collection than a single flow of data.

Think:

MediaStream
    |
    +---- Video Track
    |
    +---- Audio Track

The actual media originates from tracks.

MediaStreamTrack

A MediaStreamTrack represents a single source of media.

Examples:

Camera
Microphone
Screen Share

Every media source becomes a track.

MediaStream
    |
    +---- Camera Track
    |
    +---- Microphone Track
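
The stream-and-track relationship can be inspected directly. The sketch below lists the tracks inside a stream using the standard `getTracks()` method and the `kind`, `label`, and `enabled` properties; the helper names `describeTracks` and `logLocalTracks` are made up for this example.

```javascript
// Summarize the tracks inside a MediaStream.
// Works with any object exposing getTracks(), including a real MediaStream.
function describeTracks(stream) {
  return stream.getTracks().map((track) => ({
    kind: track.kind,       // "audio" or "video"
    label: track.label,     // device name reported by the browser
    enabled: track.enabled, // false means silence or black frames
  }));
}

// Browser usage (needs a secure context and user permission).
async function logLocalTracks() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  console.table(describeTracks(stream));
}
```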

Why Tracks Exist

Suppose you are in a meeting.

You click:

Mute Microphone

What happens?

The video should continue.

Only audio should stop.

This is possible because audio and video are independent tracks.

You can disable one without affecting the other.
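
A mute button can be sketched in a few lines by flipping the `enabled` flag on the audio tracks only. The helper name `setMicrophoneEnabled` is an assumption for this example; `getAudioTracks()` and `enabled` are standard.

```javascript
// Enable or disable every audio track in a stream, leaving video untouched.
// A disabled audio track stays attached but carries silence.
function setMicrophoneEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled;
  }
}

// Usage: setMicrophoneEnabled(stream, false) mutes,
//        setMicrophoneEnabled(stream, true) unmutes.
```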

Example: Screen Sharing

Suppose you start sharing your screen.

Now your stream may look like:

MediaStream
    |
    +---- Screen Track

or:

MediaStream
    |
    +---- Camera Track
    |
    +---- Screen Track
    |
    +---- Audio Track

The architecture remains consistent.
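
The three-track layout above could be assembled roughly as follows. This is a sketch under assumptions: the helper names are illustrative, and the camera stream is assumed to carry the microphone audio. `getDisplayMedia` and the `MediaStream` constructor are standard browser APIs.

```javascript
// Collect the tracks for a combined stream: camera video, screen video,
// and microphone audio.
function collectShareTracks(cameraStream, screenStream) {
  return [
    ...cameraStream.getVideoTracks(),
    ...screenStream.getVideoTracks(),
    ...cameraStream.getAudioTracks(),
  ];
}

// Browser usage: capture both sources and merge them into one MediaStream.
async function startCameraAndScreen() {
  const cameraStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const screenStream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  return new MediaStream(collectShareTracks(cameraStream, screenStream));
}
```

Browsers usually require `getDisplayMedia` to be triggered by a user action such as a button click.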

RTCPeerConnection

Everything eventually revolves around:

new RTCPeerConnection()

Most beginners think:

RTCPeerConnection is a connection

This is technically true.

But it's much more useful to think:

RTCPeerConnection is a communication engine.

Because internally it contains numerous subsystems.

What Problems Must RTCPeerConnection Solve?

The connection system must:

Find Routes
Cross NAT
Encrypt Traffic
Send Media
Monitor Quality
Handle Packet Loss
Adapt Bitrate

That's a lot of work.

RTCPeerConnection orchestrates all of it.
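
Some of this work is observable through `getStats()`. The sketch below summarizes packet loss and jitter from the `inbound-rtp` entries of a stats report; the function names and the shape of the summary object are assumptions made for illustration.

```javascript
// Summarize receive-side quality from an RTCStatsReport.
// The report is Map-like, so a plain Map works for experimentation.
function summarizeInbound(report) {
  const summary = [];
  report.forEach((stat) => {
    if (stat.type !== "inbound-rtp") return;
    const received = stat.packetsReceived ?? 0;
    const lost = stat.packetsLost ?? 0;
    const total = received + lost;
    summary.push({
      kind: stat.kind, // "audio" or "video"
      packetsLost: lost,
      lossRatio: total > 0 ? lost / total : 0,
      jitterSeconds: stat.jitter ?? null,
    });
  });
  return summary;
}

// Browser usage: sample an existing connection once.
async function logQuality(pc) {
  console.table(summarizeInbound(await pc.getStats()));
}
```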

A Conceptual View

Think of it like:

RTCPeerConnection
        |
        +---- ICE
        |
        +---- STUN
        |
        +---- TURN
        |
        +---- DTLS
        |
        +---- SRTP
        |
        +---- RTP
        |
        +---- Congestion Control

The browser hides this complexity behind one API.
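
From the application's side, most of that tree is configured through a single object passed to the constructor. The sketch below builds such a configuration; the helper names are illustrative, and every hostname and credential is a placeholder, not a real server.

```javascript
// Build an RTCConfiguration object listing STUN and TURN servers.
// All hostnames and credentials used with this helper are placeholders.
function buildPeerConfig({ stunUrl, turnUrl, turnUsername, turnCredential }) {
  const iceServers = [];
  if (stunUrl) iceServers.push({ urls: stunUrl });
  if (turnUrl) {
    iceServers.push({ urls: turnUrl, username: turnUsername, credential: turnCredential });
  }
  return { iceServers };
}

// Browser usage: one constructor call sets up the whole engine.
function createConnection() {
  return new RTCPeerConnection(
    buildPeerConfig({
      stunUrl: "stun:stun.example.org:3478",
      turnUrl: "turn:turn.example.org:3478",
      turnUsername: "demo-user",
      turnCredential: "demo-secret",
    })
  );
}
```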

The ICE Engine

One subsystem inside RTCPeerConnection is ICE.

Recall our networking problems.

Private IPs
NAT
Firewalls

ICE exists to discover usable network paths.

It asks:

Can we connect directly?

Should we use TURN?

Which address works?

Which route is fastest?

ICE is effectively the networking brain of WebRTC.
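
The answers to these questions show up in the candidates ICE gathers. Each candidate line carries a type: `host` for a local address, `srflx` or `prflx` for an address seen through NAT, and `relay` for a TURN server. A minimal sketch for reading that type, with illustrative helper names:

```javascript
// Extract the candidate type from an ICE candidate line.
// host = local address, srflx/prflx = address seen through NAT, relay = TURN.
function candidateType(candidateLine) {
  const match = /\btyp\s+(host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// A relay candidate means traffic would flow through a TURN server.
function usesTurn(candidateLine) {
  return candidateType(candidateLine) === "relay";
}

// Browser usage: log each candidate as ICE gathers it.
function watchCandidates(pc) {
  pc.addEventListener("icecandidate", (event) => {
    if (event.candidate) {
      console.log(candidateType(event.candidate.candidate), event.candidate.candidate);
    }
  });
}
```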

The Security Engine

WebRTC requires encryption.

Not optional.

Mandatory.

This responsibility belongs to:

DTLS
SRTP

which we will study in detail later.

For now remember:

Every media packet is encrypted before transmission.

The Media Engine

Once connectivity exists, media must be transported.

The media engine handles:

Audio Transport
Video Transport
Synchronization
Packetization
Depacketization

This engine works continuously during a call.

The Codec Engine

Raw media is enormous.

Consider:

1920 x 1080 video

30 FPS

Raw bandwidth requirements would be absurdly high: at 24 bits per pixel, roughly 1.5 Gbit/s.

Therefore media must be compressed.

The codec engine performs:

Encoding
Decoding
Compression
Decompression

without which real-time video would be impractical.
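
To put a number on "enormous", the arithmetic for uncompressed 1080p video at 30 FPS can be worked out directly. The bits-per-pixel values are assumptions about the pixel format: 24 for 8-bit RGB and 12 for 8-bit YUV 4:2:0.

```javascript
// Uncompressed video bitrate in bits per second.
// bitsPerPixel: 24 for 8-bit RGB, 12 for 8-bit YUV 4:2:0 (assumed default).
function rawVideoBitrate(width, height, fps, bitsPerPixel = 12) {
  return width * height * bitsPerPixel * fps;
}

const rgb = rawVideoBitrate(1920, 1080, 30, 24);
const yuv420 = rawVideoBitrate(1920, 1080, 30);
console.log((rgb / 1e9).toFixed(2) + " Gbit/s");    // 1.49 Gbit/s
console.log((yuv420 / 1e6).toFixed(0) + " Mbit/s"); // 746 Mbit/s
```

Either figure is far beyond what a typical home connection can upload, which is why the codec engine sits between capture and transport.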

RTCDataChannel

Many developers associate WebRTC exclusively with audio and video.

This is a mistake.

WebRTC can also transport arbitrary data.

For example:

Messages
Files
Game State
Collaborative Edits
Cursor Positions

This capability is provided by:

RTCDataChannel

Why Data Channels Matter

Imagine building:

Multiplayer games
Collaborative Whiteboards
Remote Desktop Systems

You need more than audio and video.

You need structured data.

Data channels provide a peer-to-peer mechanism for that communication.
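
As a sketch of what sending structured data might look like, the example below opens a channel for cursor positions and serializes each update as JSON. The channel label, the message shape, and the helper names are assumptions made for illustration; `createDataChannel`, `readyState`, `send`, and the `message` event are standard.

```javascript
// Open a data channel tuned for frequent, disposable updates such as cursor
// positions: unordered, with no retransmission of lost messages.
// The label "cursor" is an arbitrary name chosen for this sketch.
function openCursorChannel(pc) {
  return pc.createDataChannel("cursor", { ordered: false, maxRetransmits: 0 });
}

// Data channels carry strings or binary data, so structured data is
// serialized first. JSON is one simple choice.
function sendCursor(channel, x, y) {
  if (channel.readyState === "open") {
    channel.send(JSON.stringify({ type: "cursor", x, y }));
    return true;
  }
  return false; // channel not open yet; drop the update
}

// Receiving side: parse incoming messages back into objects.
function onCursor(channel, handler) {
  channel.addEventListener("message", (event) => handler(JSON.parse(event.data)));
}
```

The remote peer receives its end of the channel through the `datachannel` event on its own `RTCPeerConnection`. For files or chat messages, the default ordered and reliable settings would likely be a better fit than the lossy options used here.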

The Complete Architectural Picture

We can now refine our earlier diagram.

Application
     |
     |
     V
+----------------------+
| MediaStream          |
| MediaStreamTrack     |
| RTCPeerConnection    |
| RTCDataChannel       |
+----------------------+
     |
     V
+----------------------+
| ICE                  |
| STUN                 |
| TURN                 |
| DTLS                 |
| SRTP                 |
| RTP                  |
| Codecs               |
+----------------------+
     |
     V
UDP / TCP
     |
     V
Internet