Understanding Media Sources
A WebRTC application does not create audio or video.
It captures existing media from a source.
Common sources include:
Video Sources
- Webcam
- USB Camera
- Virtual Camera
- Screen Share
- Window Share
- Browser Tab Share
Audio Sources
- Microphone
- Headset
- USB Audio Device
- System Audio
- Browser Audio
These physical or virtual devices are the origin of all media in WebRTC.
The Life of a Video Frame
To understand media capture, let's follow a single video frame.
Imagine Alice is sitting in front of her webcam.
The camera sees:
Alice's Face
What happens next?
The journey is surprisingly long.
Camera Sensor
↓
Device Driver
↓
Operating System
↓
Browser Capture Engine
↓
MediaStreamTrack
↓
MediaStream
↓
RTCPeerConnection
↓
Encoder
↓
Network
Most developers only see the last few steps.
But understanding the full pipeline helps explain many WebRTC behaviors.
Step 1: Camera Sensor
Every webcam contains an image sensor.
Its job is simple:
Convert light into digital information.
Imagine Alice smiling at the camera.
The sensor captures:
Frame #1
Frame #2
Frame #3
Frame #4
...continuously.
Typically:
30 Frames Per Second
or:
60 Frames Per Second
depending on the camera.
At this stage, this data is raw.
Nothing has been compressed.
The browser is not yet involved.
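To get a feel for how much data "raw" means, the arithmetic can be sketched as below. The function name rawVideoBitsPerSecond is hypothetical, and the 1.5 bytes per pixel figure assumes a 4:2:0 YUV layout. Real sensors and drivers use a variety of pixel formats, so treat the result as a rough estimate.

```javascript
// Hypothetical helper: estimates the uncompressed data rate of a video source.
// bytesPerPixel = 1.5 is an assumption (4:2:0 YUV); other pixel formats differ.
function rawVideoBitsPerSecond(width, height, framesPerSecond, bytesPerPixel = 1.5) {
  const bytesPerFrame = width * height * bytesPerPixel;
  return bytesPerFrame * framesPerSecond * 8;
}

const bitsPerSecond = rawVideoBitsPerSecond(1280, 720, 30);
console.log((bitsPerSecond / 1e6).toFixed(1) + " Mbit/s"); // 331.8 Mbit/s
```

Under these assumptions, a 1280×720 source at 30 frames per second produces several hundred megabits of raw data every second, which is why the encoder later in the pipeline matters so much.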
Step 2: Device Drivers
The operating system communicates with hardware through drivers.
Examples:
Windows Camera Driver
Linux Video4Linux (V4L2)
macOS AVFoundation
The browser cannot directly control the camera sensor.
Instead:
Browser
↓
Operating System
↓
Driver
↓
Hardware
The driver provides a standardized way to access media devices.
Step 3: Browser Requests Permission
This is the first step developers usually encounter.
When a website wants access to a camera:
navigator.mediaDevices.getUserMedia()
the browser asks:
Allow Camera Access?
or:
Allow Microphone Access?
This permission model exists for security.
Without it, websites could secretly activate cameras and microphones.
Why Permissions Matter
Imagine visiting a malicious website.
Without a permission model, it could:
Activate Camera
Record Video
Activate Microphone
Record Audio
without your knowledge.
Modern browsers explicitly require user consent.
This is one of the strongest security guarantees of WebRTC.
The MediaDevices API
Media capture begins with:
navigator.mediaDevices
Think of this as the browser's device management system.
It allows applications to:
- Discover devices
- Access devices
- Request media streams
The most important method is:
navigator.mediaDevices.getUserMedia()
Understanding getUserMedia()
This API requests access to media sources.
Example:
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});
The browser interprets this request as:
User wants:
Camera
Microphone
If permission is granted, a MediaStream is returned.
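If permission is not granted, or no device is available, the promise rejects instead. A minimal sketch of handling both outcomes is shown here. The helper names describeCaptureError and requestCameraAndMicrophone are hypothetical, and the messages are illustrative. The error names are the standard ones browsers use for getUserMedia() rejections.

```javascript
// Hypothetical helper: turns a getUserMedia() rejection into a readable message.
function describeCaptureError(error) {
  switch (error.name) {
    case "NotAllowedError":
      return "Permission was denied by the user or by browser policy.";
    case "NotFoundError":
      return "No camera or microphone matching the request was found.";
    case "NotReadableError":
      return "The device exists but could not be opened (it may be in use).";
    default:
      return "Capture failed: " + error.name;
  }
}

// mediaDevices is a parameter so the function can be exercised outside a browser.
// In a page it defaults to navigator.mediaDevices.
async function requestCameraAndMicrophone(mediaDevices = navigator.mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
    return { stream, problem: null };
  } catch (error) {
    return { stream: null, problem: describeCaptureError(error) };
  }
}
```

Returning a result object rather than rethrowing is only one possible design. It keeps the calling code free of try/catch when all it needs is a message to show the user.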
What Is Actually Returned?
Many developers imagine:
const stream = getUserMedia(...)
contains video frames.
Not exactly.
The returned object is:
MediaStream
which acts as a container.
Understanding MediaStream
Think of a MediaStream as a playlist.
A playlist does not contain songs.
It contains references to songs.
Similarly:
A MediaStream does not directly contain media.
It contains tracks.
Example:
MediaStream
|
+---- Video Track
|
+---- Audio Track
Why This Design Exists
Suppose Alice joins a meeting.
Initially:
Camera
Microphone
are active.
Later she starts screen sharing.
Now we have:
Camera
Microphone
Screen
The browser needs a flexible structure.
MediaStream provides that flexibility.
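Alice's scenario can be sketched as follows: a screen track is captured with getDisplayMedia() and added to the stream that already holds her camera and microphone. The helper name addScreenShare is hypothetical, and whether an application reuses one stream or keeps a separate stream for the screen is a design choice, not something the API requires.

```javascript
// Hypothetical helper: adds a screen-share video track to an existing stream.
// mediaDevices defaults to navigator.mediaDevices when running in a page.
async function addScreenShare(stream, mediaDevices = navigator.mediaDevices) {
  const screenStream = await mediaDevices.getDisplayMedia({ video: true });
  const [screenTrack] = screenStream.getVideoTracks();
  stream.addTrack(screenTrack); // the stream now also holds the screen track
  return screenTrack;
}
```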
MediaStreamTrack
This is where actual media originates.
A MediaStreamTrack represents a single source of media.
Examples:
Camera Track
Microphone Track
Screen Track
Each track continuously produces media data.
Track Analogy
Imagine a music studio.
Each instrument has its own channel.
Drums
Bass
Guitar
Vocals
These channels can be mixed together.
Media tracks work similarly.
Each track represents an independent source.
Stream and Track Relationship
A stream may contain multiple tracks.
Example:
MediaStream
|
+---- Camera Track
|
+---- Microphone Track
or:
MediaStream
|
+---- Screen Track
|
+---- Microphone Track
or even:
MediaStream
|
+---- Camera 1
|
+---- Camera 2
|
+---- Audio
The architecture remains consistent.
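Whatever combination a stream holds, its contents can be inspected the same way. The sketch below uses the standard getTracks() method together with each track's kind and label properties. The helper name summarizeTracks is hypothetical.

```javascript
// Hypothetical helper: summarizes which tracks a MediaStream currently holds.
function summarizeTracks(stream) {
  return stream.getTracks().map((track) => `${track.kind}: ${track.label}`);
}
```

In a page, calling summarizeTracks(stream) on a camera-plus-microphone stream would return one "video: ..." entry and one "audio: ..." entry, with the labels supplied by the browser.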
Why Tracks Are Important
Tracks allow independent control.
Suppose Alice clicks:
Mute Microphone
The browser can disable only:
Audio Track
while leaving:
Video Track
untouched.
Similarly:
Turn Camera Off
affects only the video track.
This separation is fundamental to WebRTC.
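In code, a mute button usually comes down to the enabled property of a track. The sketch below is one way to do it. The helper name setMicrophoneEnabled is hypothetical.

```javascript
// Hypothetical helper: enables or disables every audio track in a stream.
function setMicrophoneEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled; // video tracks are not touched
  }
}

// e.g. setMicrophoneEnabled(stream, false) when Alice clicks "Mute Microphone"
```

A disabled audio track keeps existing but delivers silence, and a disabled video track delivers black frames, so the rest of the pipeline does not need to be rebuilt when the user toggles it back on.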
Track States
A track can exist in several states.
For example:
Live
Muted
Ended
Live: Actively producing media
Muted: Temporarily not delivering media (for example, the source is interrupted)
Ended: No longer producing media
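In the browser API these states are spread across a few properties: readyState is either "live" or "ended", muted is a read-only boolean set by the browser, and enabled is the flag the application controls. The sketch below folds them into a single label and logs changes using the track's mute, unmute, and ended events. The helper names describeTrackState and watchTrack are hypothetical.

```javascript
// Hypothetical helper: folds the track's state properties into one label.
function describeTrackState(track) {
  if (track.readyState === "ended") return "ended";
  if (track.muted) return "muted";
  if (!track.enabled) return "disabled by the application";
  return "live";
}

// Hypothetical helper: logs state changes using the track's standard events.
function watchTrack(track, log = console.log) {
  for (const eventName of ["mute", "unmute", "ended"]) {
    track.addEventListener(eventName, () => {
      log(`${track.kind} track: ${describeTrackState(track)}`);
    });
  }
}
```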
Device Discovery
Applications often need to display available devices.
For example:
Camera 1
Camera 2
USB Camera
Microphone
Headset
The browser provides:
navigator.mediaDevices.enumerateDevices()
This allows users to select devices.
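A device picker could start from something like the sketch below, which groups the result of enumerateDevices() by the standard kind values "videoinput" and "audioinput". The helper name listInputDevices is hypothetical.

```javascript
// Hypothetical helper: groups the browser's device list into cameras and microphones.
// mediaDevices defaults to navigator.mediaDevices when running in a page.
async function listInputDevices(mediaDevices = navigator.mediaDevices) {
  const devices = await mediaDevices.enumerateDevices();
  return {
    cameras: devices.filter((device) => device.kind === "videoinput"),
    microphones: devices.filter((device) => device.kind === "audioinput"),
  };
}
```

Device labels are typically empty until the user has granted camera or microphone permission, so a picker is usually populated after a successful getUserMedia() call.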
Example: Multiple Cameras
Suppose a laptop has:
Built-in Camera
USB Camera
Virtual Camera
The application may allow switching between them.
Each device has a unique identifier.
The browser uses those identifiers to create tracks from specific sources.
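Selecting a specific camera can be sketched by passing its identifier as a deviceId constraint. The helper name openCamera is hypothetical. Using exact here is an assumption about the desired behavior: the request fails rather than silently falling back to another camera.

```javascript
// Hypothetical helper: opens one specific camera by its deviceId.
// mediaDevices defaults to navigator.mediaDevices when running in a page.
async function openCamera(deviceId, mediaDevices = navigator.mediaDevices) {
  const stream = await mediaDevices.getUserMedia({
    video: { deviceId: { exact: deviceId } }, // fail rather than pick another camera
  });
  return stream.getVideoTracks()[0];
}
```

When switching cameras, an application would normally call stop() on the old track before or after opening the new one, so that the previous device is released.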
Media Constraints
One of the most powerful features of getUserMedia() is constraints.
Constraints describe the desired media characteristics.
Example:
{
  video: {
    width: 1280,
    height: 720
  }
}
This requests:
1280×720 Video
Why Constraints Exist
Different applications require different qualities.
Example:
Security Camera: High Resolution
Mobile Call: Lower Resolution
Screen Sharing: Sharp Text
Constraints allow applications to express preferences.
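How strongly a preference is expressed depends on the form of the constraint. The sketch below contrasts the standard ideal, exact, and min keywords. The variable names and the numbers are illustrative only.

```javascript
// Hypothetical constraint sets; names and numbers are illustrative only.

// A preference: the browser picks the closest match it can provide.
const preferHd = {
  video: { width: { ideal: 1280 }, height: { ideal: 720 } },
};

// A requirement: the request is rejected if the device cannot deliver it.
const requireHd = {
  video: { width: { exact: 1280 }, height: { exact: 720 } },
};

// A lower bound: anything at or above 640×480 is acceptable.
const atLeastVga = {
  video: { width: { min: 640 }, height: { min: 480 } },
};
```

A bare number such as width: 1280 is treated like ideal, so it behaves as a preference. When an exact or min requirement cannot be met, getUserMedia() rejects with an OverconstrainedError.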
Common Video Constraints
Resolution:
Width
Height
Examples:
640×480
1280×720
1920×1080
3840×2160
Frame Rate:
Examples:
15 FPS
30 FPS
60 FPS
Higher frame rates provide smoother motion, but consume more bandwidth.
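Frame rate can also be adjusted on a track that is already live. The sketch below uses the standard applyConstraints() and getSettings() methods of a video track. The helper name limitFrameRate is hypothetical.

```javascript
// Hypothetical helper: asks a live video track to cap its frame rate,
// then reports the frame rate the browser actually settled on.
async function limitFrameRate(videoTrack, maxFps) {
  await videoTrack.applyConstraints({ frameRate: { max: maxFps } });
  return videoTrack.getSettings().frameRate;
}
```

Reading the value back from getSettings() matters because the browser chooses among the rates the camera supports, so the result may be lower than the requested maximum.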
Camera Facing Mode
Useful on mobile devices:
Example:
Front Camera
Rear Camera
Common Audio Constraints
Examples:
Echo Cancellation: Removes speaker feedback
Noise Suppression: Reduces background noise
Auto Gain Control: Adjusts microphone volume automatically
These features can significantly improve call quality.
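These three features map onto the constraint names echoCancellation, noiseSuppression, and autoGainControl. The sketch below requests all of them and reads back what a track reports. The helper names buildVoiceCallConstraints and readAudioProcessing are hypothetical.

```javascript
// Hypothetical helper: builds audio constraints that request all three features.
function buildVoiceCallConstraints() {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  };
}

// Hypothetical helper: reads back which features the browser reports for a track.
function readAudioProcessing(audioTrack) {
  const { echoCancellation, noiseSuppression, autoGainControl } = audioTrack.getSettings();
  return { echoCancellation, noiseSuppression, autoGainControl };
}
```

The object from buildVoiceCallConstraints() would be passed to getUserMedia(). Support varies by browser and device, which is why checking the settings afterwards can be useful.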
The Hidden Media Pipeline
Let's revisit our video frame.
Alice's camera captures:
Frame #1001
The browser receives it.
The frame enters:
Video Track
The track passes the frame into the WebRTC engine.
The engine then decides:
Should this frame be sent?
Should it be encoded?
Should quality be reduced?
Eventually the frame reaches the encoder.
Only then does networking begin.
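From the application's side, handing tracks to the WebRTC engine is a single call per track. The sketch below uses the standard addTrack() method of RTCPeerConnection. The helper name attachStream is hypothetical, and the encoding and quality decisions described above happen inside the browser, not in this code.

```javascript
// Hypothetical helper: hands every captured track to a peer connection.
// Returns the RTCRtpSender objects created for the tracks.
function attachStream(peerConnection, stream) {
  return stream.getTracks().map((track) => peerConnection.addTrack(track, stream));
}
```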
Why Media Capture Matters
Many WebRTC issues originate before networking.
Examples:
Wrong Camera Selected
No Permissions
Low Frame Rate
Poor Resolution
Audio Device Problems
Understanding media capture helps debug these issues.
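A first debugging step for issues like these is to look at what the capture side actually delivered. The sketch below gathers that information from standard track properties and getSettings(). The helper name captureDiagnostics is hypothetical.

```javascript
// Hypothetical helper: collects the capture-side facts worth checking first.
function captureDiagnostics(stream) {
  return stream.getTracks().map((track) => {
    const { width, height, frameRate, deviceId } = track.getSettings();
    return {
      kind: track.kind,
      label: track.label, // which device was actually selected
      readyState: track.readyState,
      muted: track.muted,
      enabled: track.enabled,
      width, // undefined for audio tracks
      height,
      frameRate,
      deviceId,
    };
  });
}

// In a page: console.table(captureDiagnostics(stream));
```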
Mental Model
Remember this hierarchy:
Device
↓
Track
↓
Stream
↓
PeerConnection
↓
Network