Database Indexing

Updated on 26 Feb, 202619 mins read 161 views

Suppose you have a book of around 10000 pages and you are trying to find the page which contains information related to a certain word. This book doesn't have indexing, you have no idea where that word could exist. So you would have to go through every page, which could take hours or even days.

But with a index page, you know where to look, like in which chapter or module it could exist. Once you get the page number, you directly jump to that page.

Database indexes work in a similar manner. They guide the database to the exact location of the data, enabling faster and more efficient data retrieval.

What Is a Page in a Database?

A page is the smallest unit of data a database reads or writes to disk.

Think of it like this:

Disk -> very slow
RAM -> fast
DB does not read one row at a time from disk
DB reads a chunk -> that chunk is a page

Typical page sizes:

PostgreSQL: 8 KB
MySQL InnoDB: 16 KB
Orace: 8 KB (configurable)
SQL Server: 8 KB

Even if you request 1 row, the DB loads the entire page containing that row.

Why Pages Exist?

Disk Reality

Disk I/O is expensive

Seek time
Rotation time
Latency

Reading 1 byte vs 8KB:

Almost the same cost

So DBs optimize by:

Reading data in fixed-size pages

How Tables Are Stored Using Pages

A table is not a continuous file of rows.

Instead:

Table
 ├── Page 1 → rows 1–50
 ├── Page 2 → rows 51–100
 ├── Page 3 → rows 101–150

Each page contains:

Page header (metadata)
Row slots
Free space

Rows inside a page are packed, not necessarily ordered.

How a Database Finds a Row Without an Index

When a table has no index that can help with a query, the database has no shortcut to locate the desired row. In this case, it performs a full table scan.

Step-by-Step Process

Start from the first data page on disk
1. The database identifies the first page that belongs to the table.
Load the page into memory (buffer cache)
1. Since databases operate on pages (not rows), the entire page is read from disk into memory.

Example:

SELECT * FROM users WHERE id = 7;

Steps:

DB finds which page contains row 7
Loads that page into buffer cache
Reads the row from memory
Return result

How a Database Finds a Row Without an Index

When a table has no index that can help with a query, the database has no shortcut to locate the desired row. In this case, it performs a full table scan.

Step-by-Step Process

Start from the first data page on disk
The database identifies the first page that belongs to the table.
Load the page into memory (buffer cache)
Since databases operate on pages (not rows), the entire page is read from disk into memory.
Scan all rows within that page
Each row in the page is examined to check whether it satisfies the query condition.
If the row is not found, move to the next page
The database discards or keeps the page in cache and proceeds to the next page in the table.
Repeat the process page by page
1. Load page into memory
2. Scan all rows in that page
3. Check for a match
Stop when one of the following happens
1. The matching row is found
2. All pages of the table have been scanned (row does not exist)

Key Characteristics of This Approach

The database may need to read every page of the table
Time complexity grows linearly with table size (O(N))
Performance degrades significantly as the table grows
This is the slowest but most reliable way to find data

Definition

Database indexing is a technique where the database creates and maintains an extra data structure to quicky locate rows without scanning the entire table.

Problem Context

Assume we have a table:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2),
    created_at DATETIME
);

In large-scale systems (millions of rows), queries like:

SELECT * FROM employees WHERE last_name = 'Smith';

Without an index -> MySQL performs a full table scan (O(n)).

As data grows, latency increases linearly.

Single-Column Index

To optimize frequent lookups by last_name:

CREATE INDEX idx_last_name
ON employees (last_name);

What Happens Internally?

MySQL (InnoDB) creates a B+ Tree index:

Sorted structure
Logarithmic lookup time O(log n)
Stores key + pointer to primary key

Now the query:

SELECT * FROM employees WHERE last_name = 'Smith';

Uses the index to:

Traverse B+ Tree
Locate matching keys
Fetch rows by primary key

Composite Index (Multi-Column)

If your system frequently runs:

SELECT * 
FROM employees 
WHERE first_name = 'John' AND last_name = 'Smith';

Create:

CREATE INDEX idx_first_last
ON employees (first_name, last_name);

Important: Leftmost Prefix Rule

This index works for:

✅ WHERE first_name = ‘John’

✅ WHERE first_name = ‘John’ AND last_name = ‘Smith’

❌ WHERE last_name = ‘Smith’ (alone)

Order matters in system design.

Types of Indexes in Databases

Indexes are classified based on how data is organized, what they optimize, and how they are stored.

At a high level, there are 4 major ways to think about index types:

By data structure
By physical storage
By number of columns
By specialized use cases

1 B-Tree (B+Tree) Index

This is the default index type in almost every relational database

Used by:

MySQL
PostgreSQL
Oracle
SQL Server

How it Works:

Data is kept sorted
Tree stays balanced
Searches move top -> down -> leaf

           [50]
        /          \
    [10,20]      [60,80]
     /   \        /    \
   leaf leaf   leaf   leaf

Supports:

=
<, >, BETWEEN
ORDER BY
LIKE ‘abc%’

Your email address will not be published. Required fields are marked *

Database Indexing

What Is a Page in a Database?

Why Pages Exist?

How Tables Are Stored Using Pages

How a Database Finds a Row Without an Index

How a Database Finds a Row Without an Index

Step-by-Step Process

Key Characteristics of This Approach

Definition

Problem Context

Single-Column Index

What Happens Internally?

Composite Index (Multi-Column)

Important: Leftmost Prefix Rule

Types of Indexes in Databases

1 B-Tree (B+Tree) Index

Leave a comment

Tags

Quick links

Newsletter