When you try to sign up on platforms like Instagram, Facebook, Google, or LinkedIn, you must have noticed a common feature:
As soon as you type a username, within a fraction of a second the interface shows:
✓ Username available

or

✗ Username already taken

And it happens almost instantly.
Have you ever wondered how this works internally?
Because if you think about it, these platforms may have hundreds of millions or even billions of users. Checking username availability should technically require searching through a massive database.
Yet the response is almost immediate.
The First Intuition: Query the Database
The most straightforward implementation might look like this:
Whenever a user types a username, the frontend sends a request to the backend.
The backend runs a query like:
SELECT * FROM users WHERE username = 'thejat' LIMIT 1;

If a row is returned: the username exists → taken.

If nothing is returned: the username does not exist → available.

At first glance, this seems perfectly reasonable.
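As a rough sketch of this naive approach, the backend check might look like the following in Python, using an in-memory SQLite table as a stand-in for the real users table (the table name and helper are illustrative, not any platform's actual code):

```python
import sqlite3

# In-memory SQLite database standing in for the real users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
conn.execute("INSERT INTO users VALUES ('thejat')")

def is_username_taken(username: str) -> bool:
    # LIMIT 1 lets the database stop scanning at the first match.
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? LIMIT 1", (username,)
    ).fetchone()
    return row is not None

print(is_username_taken("thejat"))    # True  -> taken
print(is_username_taken("alex_dev"))  # False -> available
```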
However, this approach does not scale well.
Why This Approach Doesn't Scale
Imagine a platform with:
10 million users

Every time someone types a username:

client → backend → database query

Even if the query is optimized, this leads to several problems:
1. Database Load

Millions of users checking usernames simultaneously would generate massive query traffic.
The database would become a bottleneck.
2. Time Complexity
Without optimization, searching is roughly O(n), where n is the number of users.

Even with indexing, the database still needs to perform O(log n) operations.
For massive systems, even logarithmic queries become expensive when performed millions of times per second.
3. Cost of Scaling
To support this load, companies would need:
- more database replicas
- more computing power
- more infrastructure
Which dramatically increases cost and complexity.
So companies avoid hitting the database whenever possible.
Second Idea: Use a Hash Map
A better idea would be to maintain a hash map in memory.
Whenever a new user signs up:
username → true

Example:

{
  "thejat": true,
  "alex_dev": true,
  "coder99": true
}

Now checking a username becomes extremely fast:

hashmap.contains("thejat")

Hash lookups are approximately O(1), which is very fast.
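In Python, the same idea is usually expressed with a set rather than a map of username → true. A minimal sketch (the usernames are just examples):

```python
# A plain Python set gives O(1) average-case membership checks.
taken_usernames = {"thejat", "alex_dev", "coder99"}

def is_available(username: str) -> bool:
    return username not in taken_usernames

print(is_available("thejat"))   # False -> taken
print(is_available("newuser"))  # True  -> available
```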
The Problem with Hash Maps
While hash maps solve the speed issue, they introduce a major new problem: memory usage.
Let's estimate.
Assume:
10 million usernames

If the average username length is 10 characters, memory usage roughly becomes:

10M * 10 bytes = 100 MB

But in reality, hash maps require:
- extra metadata
- hash buckets
- pointers
- load factors
So memory usage might grow to:
500MB - 1GBNow imagine platforms with:
500 million usersMemory usage becomes enormous.
Keeping such a huge structure fully in RAM is expensive and inefficient.
So while hash maps are fast, they are not memory efficient enough for massive systems.
Enter the Game Changer: Bloom Filters
This is where Bloom filters come into the picture.

A Bloom filter is a space-efficient probabilistic data structure used to check whether an element is possibly in a set or definitely not in a set.

It allows systems to perform extremely fast membership checks using very little memory.

Instead of storing actual usernames, a Bloom filter stores hash fingerprints of usernames inside a bit array.
This dramatically reduces memory usage.
Core Idea Behind Bloom Filters
A Bloom filter consists of two components:
- A bit array
- Multiple hash functions
Example bit array:
0 0 0 0 0 0 0 0 0 0

We also define k hash functions:

h1(x)
h2(x)
h3(x)
...
hk(x)

Each hash function maps the input to a position in the bit array.
How Insertion Works
Suppose a user registers with username:
TheJat

We run the username through multiple hash functions:

h1(TheJat) → 3
h2(TheJat) → 7
h3(TheJat) → 9

Now we set those positions in the bit array to 1.
Before insertion:

   0     1     2     3     4     5     6     7     8     9
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  0  |  0  |  0  |  0  |  0  |  0  |  0  |  0  |  0  |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

After insertion:

   0     1     2     3     4     5     6     7     8     9
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  0  |  0  |  1  |  0  |  0  |  0  |  1  |  0  |  1  |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

Notice something important:
The actual username is never stored.
Only bit positions are set.
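The insertion step can be sketched in a few lines of Python. Note that the positions 3, 7, 9 above are illustrative; this sketch derives its k positions by salting a single SHA-256 hash with an index, which is one simple way to simulate k independent hash functions:

```python
import hashlib

M = 10  # bit array size (tiny, for illustration only)
K = 3   # number of hash functions

bits = [0] * M

def positions(item: str):
    # Derive K positions by salting one hash with an index; real
    # implementations often use double hashing for the same effect.
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
        yield int(digest, 16) % M

def insert(item: str) -> None:
    for pos in positions(item):
        bits[pos] = 1

insert("TheJat")
print(bits)  # the derived positions are now set to 1
```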
How Lookup Works
Now suppose another user tries to check:
TheJat

We run the same hash functions again:

h1(TheJat) → 3
h2(TheJat) → 7
h3(TheJat) → 9

Then we check the bit array.
If any of the bits are 0: the username definitely does not exist.

If all of the bits are 1: the username possibly exists.

The Catch: False Positives
Bloom filters are probabilistic.

This means they can sometimes say a username exists even though it doesn't.

This is called a false positive.
But Bloom filters guarantee something extremely important:
False negatives never happen.

Meaning: if the Bloom filter says the username does not exist, then it is 100% guaranteed to be available.
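Putting insertion and lookup together, a complete toy Bloom filter might look like this in Python (again using index-salted SHA-256 as a stand-in for k independent hash functions, not any production scheme):

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item: str):
        # k index-salted hashes mapped into the bit array.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False -> definitely absent; True -> possibly present.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m=1024, k=3)
bf.add("TheJat")
print(bf.might_contain("TheJat"))  # True: an added item is never reported absent
```

Because adding an item sets all of its k bits, looking up that same item always finds them set, which is exactly the no-false-negatives guarantee.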
How Platforms Actually Use Bloom Filters
In large systems, the flow typically looks like this:
User types username
│
▼
Check Bloom Filter (RAM)
│
┌────┴────┐
│ │
▼ ▼
Definitely Possibly
Not Exists Exists
│ │
Show Verify with
  Available       Database

This means the database is queried only when necessary, drastically reducing load.
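The flow above can be sketched as a single function. The helpers here are illustrative stand-ins: `bloom_might_contain` simulates the in-RAM filter (including one deliberate false positive), and `db_username_exists` simulates the authoritative database query:

```python
# Stand-ins for illustration only.
_taken_in_db = {"thejat"}

def bloom_might_contain(username: str) -> bool:
    # A real system would consult the in-RAM bit array here.
    # "coder99" simulates a false positive: filter says yes, DB says no.
    return username in _taken_in_db or username == "coder99"

def db_username_exists(username: str) -> bool:
    return username in _taken_in_db

def check_username(username: str) -> str:
    if not bloom_might_contain(username):
        return "available"  # definitely not in the set: skip the database
    # Possibly exists: confirm with the authoritative database.
    return "taken" if db_username_exists(username) else "available"

print(check_username("newuser"))  # available (Bloom filter alone, no DB query)
print(check_username("thejat"))   # taken     (confirmed by the database)
print(check_username("coder99"))  # available (false positive caught by the DB)
```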