What Is Data Encoding?
Data encoding is the process of converting data from one representation to another so that it can be safely stored, transmitted, and interpreted by different systems. In system design, encoding ensures that data remains consistent and readable across platforms, programming languages, operating systems, and network protocols.
Encoding does not provide security. Its goal is compatibility and correctness, not confidentiality. Any encoded data can be decoded back to its original form without the need for secret keys.
Encoding exists to answer a simple but critical question:
How do we represent the same bytes in a form that survives systems, protocols, and humans?
Formal Definition
Encoding is a reversible, deterministic transformation of data into another representation without secrecy, for the purpose of storage or transmission.
- No keys
- No secrets
- No security guarantee
If you know the encoding, you can always reverse it.
Encoding is all about data representation.
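A quick sketch in Python (illustrative only) makes the point: decoding requires nothing but knowledge of the scheme.

import base64

original = b"user:password"
encoded = base64.b64encode(original)   # b'dXNlcjpwYXNzd29yZA=='
decoded = base64.b64decode(encoded)    # b'user:password' - no key, no secret involved
assert decoded == original             # reversible and deterministic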
Why Encoding Is Essential in Distributed Systems
Distributed systems involve multiple components that may:
- Use different hardware architectures
- Run different operating systems
- Be written in different programming languages
- Communicate over text-based protocols
Encoding acts as a common language that allows these components to exchange data reliably.
Common scenarios where encoding is required:
- Sending binary data over HTTP
- Storing multilingual text in databases
- Embedding data inside URLs or JSON payloads
- Serializing objects for network communication
Without proper encoding, data corruption, parsing errors, and system failures can occur.
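For example, a binary payload cannot be placed directly inside a JSON document, but its Base64 text form can. A minimal Python sketch (the field names are invented for illustration):

import base64, json

payload = bytes([0x89, 0x50, 0x4E, 0x47])                     # raw bytes: not valid JSON on their own
message = {
    "filename": "logo.png",                                   # hypothetical metadata
    "data": base64.b64encode(payload).decode("ascii"),        # text-safe representation
}
wire = json.dumps(message)                                    # now safe to send over HTTP
assert base64.b64decode(json.loads(wire)["data"]) == payload  # round-trips intact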
Character Encoding
Character encoding defines how characters are represented as bytes.
ASCII
- Uses 7 bits to represent characters
- Supports only basic English characters
- Limited and unsuitable for global applications
Unicode
- A universal character set covering most world languages
- Assigns a unique code point to each character
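In Python, for instance, ord() returns the code point; the code point exists independently of how it is later encoded to bytes, and ASCII simply cannot represent most of them:

print(ord("A"))   # 65, also a valid ASCII value
print(ord("€"))   # 8364 (U+20AC), outside the 7-bit ASCII range

try:
    "€".encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot represent this character")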
UTF Encodings
- UTF-8: Variable-length, backward compatible with ASCII, most widely used
- UTF-16: Uses 2 or 4 bytes per character
- UTF-32: Fixed-length (4 bytes per character), larger storage size
UTF-8 is the de facto standard in modern system design due to its efficiency and compatibility.
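A small comparison (Python; little-endian variants chosen to avoid byte-order marks) illustrates the trade-offs:

for ch in ["A", "é", "€", "😀"]:
    sizes = [len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")]
    print(ch, sizes)

# A  -> [1, 2, 4]   UTF-8 stays byte-compatible with ASCII
# é  -> [2, 2, 4]
# €  -> [3, 2, 4]
# 😀 -> [4, 4, 4]   outside the BMP, so UTF-16 needs a surrogate pair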
Binary-to-Text Encoding
Binary-to-text encoding allows binary data to be transmitted over channels that support only text.
Base64 Encoding
- Converts binary data into ASCII characters
- Commonly used in APIs, email, and authentication tokens
- Increases data size by approximately 33%
- Safe for:
  - HTTP
  - JSON
  - XML
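In Python, the standard base64 module covers the common case; a minimal sketch:

import base64

raw = b"binary\x00data"                        # contains a byte that JSON/XML cannot carry as-is
token = base64.b64encode(raw).decode("ascii")  # 'YmluYXJ5AGRhdGE='
print(token)                                   # ASCII only, safe inside HTTP, JSON, XML
print(base64.b64decode(token))                 # b'binary\x00data'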
How Base64 Works (Internals)
- Take 3 bytes (24 bits)
- Split into 4 chunks (6 bits each)
- Map each 6-bit value to one of the 64 printable characters in the Base64 alphabet
Input text: "Hello" (5 bytes)
Input bits: 01001000 01100101 01101100 01101100 01101111
6-bit chunks: 010010 000110 010101 101100 011011 000110 111100 (final chunk padded with two zero bits)
Characters: S G V s b G 8
Output: "SGVsbG8="
Padding (=)
- Used when the input length is not a multiple of 3
- Indicates how many input bytes were missing from the final group
- Not optional in standard Base64
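These steps can be written out by hand. The sketch below is for illustration only; real systems should rely on a standard library implementation:

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_sketch(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)        # all input bits as a string
    bits += "0" * ((6 - len(bits) % 6) % 6)                # pad bits to a multiple of 6
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    chars += "=" * ((3 - len(data) % 3) % 3)               # '=' marks the missing input bytes
    return "".join(chars)

print(base64_sketch(b"Hello"))   # SGVsbG8=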
Size Expansion
Base64 increases size by ~33%
3 bytes → 4 characters
Use cases:
- Embedding images in JSON
- Transmitting cryptographic keys
- Encoding JWT payloads
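JWTs, for example, are three Base64URL-encoded segments joined by dots. A sketch with a made-up token (signature checking is omitted entirely):

import base64, json

jwt = ("eyJhbGciOiJIUzI1NiJ9."           # header  {"alg":"HS256"}
       "eyJ1c2VyIjoiYWxpY2UifQ."         # payload {"user":"alice"}
       "fake-signature")                 # hypothetical, not verified here
payload = jwt.split(".")[1]
payload += "=" * (-len(payload) % 4)     # JWTs strip '=' padding; restore it before decoding
print(json.loads(base64.urlsafe_b64decode(payload)))   # {'user': 'alice'}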
Base32 and Base16
- More human-readable and easier to transcribe than Base64
- Used in OTP secrets (e.g., TOTP), checksums, and data embedded in QR codes
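The same bytes in each alphabet (Python sketch):

import base64

secret = b"\xde\xad\xbe\xef"
print(base64.b16encode(secret))   # b'DEADBEEF'   hex: 2 characters per byte
print(base64.b32encode(secret))   # b'32W353Y='   alphabet A-Z and 2-7
print(base64.b64encode(secret))   # b'3q2+7w=='   densest, but mixed case and symbols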
URL Encoding
Problem
URLs have reserved characters:
? & = / %
Solution
Encode unsafe characters:
space → %20
/ → %2F
URL encoding ensures that special characters are safely transmitted in URLs.
- Reserved characters (?, &, =) have special meanings
- Unsafe characters are replaced with % followed by their hexadecimal value
Example:
- Space → %20
- @ → %40
URL encoding is critical in web-based systems to prevent request misinterpretation.
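In Python this is handled by urllib.parse; a brief sketch (the query values are invented):

from urllib.parse import quote, unquote, urlencode

print(quote("café & crème", safe=""))            # caf%C3%A9%20%26%20cr%C3%A8me
print(unquote("caf%C3%A9%20%26%20cr%C3%A8me"))   # café & crème

# Building a query string: reserved characters in values are escaped automatically
print(urlencode({"q": "a&b", "user": "alice@example.com"}))
# q=a%26b&user=alice%40example.com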
Common Security Mistake
“The token looks random, so it must be encrypted.”
No.
Base64 output is fully reversible.
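Anyone who receives such a token can read it (illustrative token below):

import base64

token = "c2VjcmV0LXVzZXItaWQ6NDI="       # looks opaque at a glance
print(base64.b64decode(token))           # b'secret-user-id:42' - readable by anyone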