Media Capture and Media Streams

Updated on 30 Jun, 2026 | 24 mins read | 186 views

Understanding Media Sources

A WebRTC application does not create audio or video.

It captures existing media from a source.

Common sources include:

Video Sources

Webcam
USB Camera
Virtual Camera
Screen Share
Window Share
Browser Tab Share

Audio Sources

Microphone
Headset
USB Audio Device
System Audio
Browser Audio

These physical or virtual devices are the origin of all media in WebRTC.

The Life of a Video Frame

To understand media capture, let's follow a single video frame.

Imagine Alice is sitting in front of her webcam.

The camera sees:

Alice's Face

What happens next?

The journey is surprisingly long.

Camera Sensor
       ↓
Device Driver
       ↓
Operating System
       ↓
Browser Capture Engine
       ↓
MediaStreamTrack
       ↓
MediaStream
       ↓
RTCPeerConnection
       ↓
Encoder
       ↓
Network

Most developers only see the last few steps.

But understanding the full pipeline helps explain many WebRTC behaviors.

Step 1: Camera Sensor

Every webcam contains an image sensor.

Its job is simple:

Convert light into digital information.

Imagine Alice smiling at the camera.

The sensor captures:

Frame #1
Frame #2
Frame #3
Frame #4
...

continuously.

Typically:

30 Frames Per Second

or:

60 Frames Per Second

depending on the camera.

At this stage, this data is raw.

Nothing has been compressed.

The browser is not yet involved.

Step 2: Device Drivers

The operating system communicates with hardware through drivers.

Examples:

Windows Camera Driver
Linux Video4Linux (V4L2)
macOS AVFoundation

The browser cannot directly control the camera sensor.

Instead:

Browser
     ↓
Operating System
     ↓
Driver
     ↓
Hardware

The driver provides a standardized way to access media devices.

Step 3: Browser Requests Permission

This is the first step developers usually encounter.

When a website wants access to a camera:

navigator.mediaDevices.getUserMedia()

the browser asks:

Allow Camera Access?

or:

Allow Microphone Access?

This permission model exists for security.

Without it, websites could secretly activate cameras and microphones.

Why Permissions Matter

Imagine visiting a malicious website.

If browsers did not ask for permission, it could:

Activate Camera
Record Video
Activate Microphone
Record Audio

without your knowledge.

Modern browsers explicitly require user consent.

This is one of the strongest security guarantees of WebRTC.
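When the user declines the prompt, getUserMedia() rejects its promise with a named error, so the call is usually wrapped in error handling. The following is a minimal sketch of that idea, not a complete policy. The helper names describeCaptureError and requestCapture are hypothetical, and the mediaDevices argument is assumed to be navigator.mediaDevices when the code runs in a page.

```javascript
// Translate the error names getUserMedia() can reject with into a short hint.
function describeCaptureError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Permission was denied by the user or by browser policy.";
    case "NotFoundError":
      return "No matching camera or microphone was found.";
    case "NotReadableError":
      return "The device exists but could not be opened (it may be in use).";
    case "OverconstrainedError":
      return "No device satisfies the requested constraints.";
    default:
      return "Unexpected capture error.";
  }
}

// `mediaDevices` is injected (assumption) so the helper can also run outside a browser.
async function requestCapture(mediaDevices, constraints) {
  try {
    const stream = await mediaDevices.getUserMedia(constraints);
    return { ok: true, stream };
  } catch (err) {
    return { ok: false, reason: describeCaptureError(err) };
  }
}
```

In a page, this could be called as requestCapture(navigator.mediaDevices, { video: true, audio: true }).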

The MediaDevices API

Media capture begins with:

navigator.mediaDevices

Think of this as the browser's device management system.

It allows applications to:

Discover devices
Access devices
Request media streams

The most important method is:

navigator.mediaDevices.getUserMedia()

Understanding getUserMedia()

This API requests access to media sources.

Example:

// Ask for both camera and microphone; resolves once the user grants access.
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});

The browser interprets this request as:

User wants: Camera and Microphone

If permission is granted, a MediaStream is returned.
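A common first use of the returned stream is a local preview. The sketch below shows one way to do that by assigning the stream to a video element. attachPreview is a hypothetical helper name, and videoElement is assumed to be an HTMLVideoElement that already exists in the page.

```javascript
// Show a captured stream in a <video> element.
function attachPreview(videoElement, stream) {
  videoElement.srcObject = stream;  // hand the stream to the element directly
  videoElement.muted = true;        // avoid hearing your own microphone
  videoElement.playsInline = true;  // keep playback inline on mobile browsers
  return videoElement;
}
```

After calling attachPreview(document.querySelector("video"), stream), playback still has to start, either through the element's autoplay attribute or a call to its play() method.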

What Is Actually Returned?

Many developers imagine:

const stream = getUserMedia(...)

contains video frames.

Not exactly.

The returned object is:

MediaStream

which acts as a container.

Understanding MediaStream

Think of a MediaStream as a playlist.

A playlist does not contain songs.

It contains references to songs.

Similarly:

A MediaStream does not directly contain media.

It contains tracks.

Example:

MediaStream
     |
     +---- Video Track
     |
     +---- Audio Track
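The container structure can be inspected directly. As a small sketch, the hypothetical helper summarizeStream below lists the tracks inside a stream using getTracks() and the kind, label and id properties of each track.

```javascript
// List what a MediaStream contains without touching any media data.
function summarizeStream(stream) {
  return stream.getTracks().map((track) => ({
    kind: track.kind,   // "audio" or "video"
    label: track.label, // human-readable source name supplied by the browser
    id: track.id,       // unique identifier of the track
  }));
}
```

For a stream requested with both video and audio, the result would typically hold one entry of each kind.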

Why This Design Exists

Suppose Alice joins a meeting.

Initially:

Camera
Microphone

are active.

Later she starts screen sharing.

Now we have:

Camera
Microphone
Screen

The browser needs a flexible structure.

MediaStream provides that flexibility.
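One way to picture that flexibility is adding a screen track to a stream that already holds camera and microphone tracks. The sketch below assumes mediaDevices is navigator.mediaDevices. addScreenShare is a hypothetical helper name. Browsers generally require getDisplayMedia() to be called in response to a user gesture such as a click.

```javascript
// Capture the screen and add its video track to an existing stream.
async function addScreenShare(mediaDevices, stream) {
  const display = await mediaDevices.getDisplayMedia({ video: true });
  const [screenTrack] = display.getVideoTracks(); // first (usually only) video track
  stream.addTrack(screenTrack);
  return screenTrack;
}
```

Applications may equally keep the screen track in its own stream; which layout is better depends on how the tracks are sent and rendered.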

MediaStreamTrack

This is where actual media originates.

A MediaStreamTrack represents a single source of media.

Examples:

Camera Track
Microphone Track
Screen Track

Each track continuously produces media data.

Track Analogy

Imagine a music studio.

Each instrument has its own channel.

Drums
Bass
Guitar
Vocals

These channels can be mixed together.

Media tracks work similarly.

Each track represents an independent source.

Stream and Track Relationship

A stream may contain multiple tracks.

Example:

MediaStream
     |
     +---- Camera Track
     |
     +---- Microphone Track

or:

MediaStream
     |
     +---- Screen Track
     |
     +---- Microphone Track

or even:

MediaStream
     |
     +---- Camera 1
     |
     +---- Camera 2
     |
     +---- Audio

The architecture remains consistent.

Why Tracks Are Important

Tracks allow independent control.

Suppose Alice clicks:

Mute Microphone

The browser can disable only:

Audio Track

while leaving:

Video Track

untouched.

Similarly:

Turn Camera Off

affects only the video track.

This separation is fundamental to WebRTC.
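In code, a mute button usually toggles the enabled property of the audio tracks. A minimal sketch, with setMicrophoneEnabled as a hypothetical helper name:

```javascript
// Enable or disable only the audio tracks; video tracks are left alone.
function setMicrophoneEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled;
  }
}
```

A disabled audio track delivers silence and a disabled video track delivers black frames, while the underlying device stays open. Releasing the device entirely requires calling stop() on the track.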

Track States

A track can exist in several states.

For example:

Live
Muted
Ended

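Live: Actively producing media (readyState is "live")

Muted: The source is temporarily unable to deliver media (the read-only muted property is true). This is separate from the application disabling a track through its enabled property.

Ended: Permanently stopped; the track will produce no more media (readyState is "ended")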
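In the API these states are spread across three properties of MediaStreamTrack: readyState ("live" or "ended"), muted (set by the browser) and enabled (set by the application). The sketch below folds them into a single label. describeTrackState and the "disabled" label are illustrative choices, not part of the API.

```javascript
// Collapse a track's readyState, enabled and muted flags into one label.
function describeTrackState(track) {
  if (track.readyState === "ended") return "ended"; // permanent, cannot be restarted
  if (!track.enabled) return "disabled";            // turned off by the application
  if (track.muted) return "muted";                  // source temporarily has no data
  return "live";
}
```

Changes can be observed through the track's "mute", "unmute" and "ended" events instead of polling.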

Device Discovery

Applications often need to display available devices.

For example:

Camera 1
Camera 2
USB Camera
Microphone
Headset

The browser provides:

navigator.mediaDevices.enumerateDevices()

This allows users to select devices.

Example: Multiple Cameras

Suppose a laptop has:

Built-in Camera
USB Camera
Virtual Camera

The application may allow switching between them.

Each device has a unique identifier.

The browser uses those identifiers to create tracks from specific sources.
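A device picker can be sketched in two steps: list the cameras, then build constraints that target one of them by its deviceId. The helper names below are hypothetical, and mediaDevices is assumed to be navigator.mediaDevices. Device labels are typically empty until the user has granted capture permission, so the sketch substitutes a placeholder.

```javascript
// Return the available cameras as { deviceId, label } pairs.
async function listCameras(mediaDevices) {
  const devices = await mediaDevices.enumerateDevices();
  return devices
    .filter((device) => device.kind === "videoinput")
    .map((device) => ({
      deviceId: device.deviceId,
      label: device.label || "Unnamed camera", // label is "" before permission
    }));
}

// Build getUserMedia() constraints that require one specific camera.
function constraintsForCamera(deviceId) {
  return { video: { deviceId: { exact: deviceId } } };
}
```

Switching cameras would then mean calling getUserMedia() again with constraintsForCamera(selectedId) and stopping the old video track.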

Media Constraints

One of the most powerful features of getUserMedia() is constraints.

Constraints describe the desired media characteristics.

Example:

{
  video: {
    width: 1280,
    height: 720
  }
}

This requests:

1280×720 Video
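A bare number such as width: 1280 is treated as a preference rather than a requirement: the browser picks the closest setting the camera supports. The sketch below makes that distinction explicit with the ideal and exact keywords. videoConstraints and its strict option are hypothetical names used only for illustration.

```javascript
// Build resolution constraints as a preference (ideal) or a hard requirement (exact).
function videoConstraints(width, height, { strict = false } = {}) {
  const key = strict ? "exact" : "ideal";
  return {
    video: {
      width: { [key]: width },
      height: { [key]: height },
    },
  };
}
```

With strict set to true, getUserMedia() rejects with an OverconstrainedError if no camera can deliver exactly that resolution, so the ideal form is usually the safer default.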

Why Constraints Exist

Different applications require different qualities.

Example:

Security Camera: High Resolution

Mobile Call: Lower Resolution

Screen Sharing: Sharp Text

Constraints allow applications to express preferences.

Common Video Constraints

Resolution:

Width
Height

Frame Rate:

Examples:

15 FPS
30 FPS
60 FPS

Higher frame rates provide smoother motion.

But consume more bandwidth.
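Frame rate can also be adjusted on a track that is already running, for instance to save bandwidth. A minimal sketch using applyConstraints() and getSettings(), with limitFrameRate as a hypothetical helper name:

```javascript
// Cap the frame rate of a live video track and report what the browser settled on.
async function limitFrameRate(videoTrack, maxFps) {
  await videoTrack.applyConstraints({ frameRate: { max: maxFps } });
  return videoTrack.getSettings().frameRate;
}
```

The returned value is whatever the browser reports after applying the constraint, which may be lower than the cap.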

Camera Facing Mode

Useful on mobile devices:

Example:

Front Camera
Rear Camera
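In constraint terms, the front camera corresponds to the facingMode value "user" and the rear camera to "environment". The sketch below maps friendlier names onto those values. cameraFacing and the "front" and "rear" keys are hypothetical choices.

```javascript
// Build constraints that prefer the front or rear camera on a mobile device.
function cameraFacing(direction) {
  const modes = { front: "user", rear: "environment" };
  const mode = modes[direction];
  if (typeof mode !== "string") {
    throw new RangeError(`Unknown camera direction: ${direction}`);
  }
  // "ideal" lets devices with a single camera still succeed.
  return { video: { facingMode: { ideal: mode } } };
}
```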

Common Audio Constraints

Examples:

Echo Cancellation: Removes speaker feedback

Noise Suppression: Reduces background noise

Auto Gain Control: Adjusts microphone volume automatically

These features dramatically improve call quality.
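These three features map to the constraint names echoCancellation, noiseSuppression and autoGainControl. As a sketch, the hypothetical helper below turns them all on for a voice call or all off for cases such as music capture, where processing may be unwanted.

```javascript
// Build audio constraints with the three processing features on or off together.
function audioConstraints({ processing = true } = {}) {
  return {
    audio: {
      echoCancellation: processing,
      noiseSuppression: processing,
      autoGainControl: processing,
    },
  };
}
```

Whether a request was honored can be checked afterwards with getSettings() on the resulting audio track.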

The Hidden Media Pipeline

Let's revisit our video frame.

Alice's camera captures:

Frame #1001

The browser receives it.

The frame enters:

Video Track

The track passes the frame into the WebRTC engine.

The engine then decides:

Should this frame be sent?
Should it be encoded?
Should quality be reduced?

Eventually the frame reaches the encoder.

Only then does networking begin.
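The hand-off from track to WebRTC engine happens when a track is added to an RTCPeerConnection. A minimal sketch of that step follows. publishStream is a hypothetical helper name, and signaling and connection setup are left out.

```javascript
// Hand every track of a stream to a peer connection for sending.
function publishStream(peerConnection, stream) {
  // addTrack() returns an RTCRtpSender for each track.
  return stream.getTracks().map((track) => peerConnection.addTrack(track, stream));
}
```

The returned senders are the objects through which encoding and sending of each track can later be inspected or adjusted.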

Why Media Capture Matters

Many WebRTC issues originate before networking.

Examples:

Wrong Camera Selected
No Permissions
Low Frame Rate
Poor Resolution
Audio Device Problems

Understanding media capture helps debug these issues.
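A simple debugging aid is to compare what was asked for with what the track actually delivers. The sketch below uses getSettings() for that. findSettingMismatches is a hypothetical helper name, and the expected object is assumed to hold plain values such as width, height or frameRate.

```javascript
// Report every setting whose actual value differs from the expected one.
function findSettingMismatches(track, expected) {
  const actual = track.getSettings();
  const mismatches = [];
  for (const [name, want] of Object.entries(expected)) {
    if (actual[name] !== want) {
      mismatches.push({ name, want, got: actual[name] });
    }
  }
  return mismatches;
}
```

An empty result means the track matches the expectations; anything else points at a capture-side problem rather than a network one.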

Mental Model

Remember this hierarchy:

Device
   ↓
Track
   ↓
Stream
   ↓
PeerConnection
   ↓
Network
