
Hashing is a powerful technique in data structures and algorithms that enables fast data retrieval, search, and insertion operations by mapping keys to specific values (or locations).

Hashing in Array Operations: Counting Occurrences

Imagine you have an array:

      +-----+-----+-----+-----+-----+
arr = |  1  |  2  |  1  |  3  |  2  |
      +-----+-----+-----+-----+-----+

Now, you want to find the frequency of each element in the array, such as how many times 1 appears, how many times 2 appears, and so on. Furthermore, you need to perform this frequency check multiple times.

1️⃣ Method 1: Basic Traversal and Counting

The first approach that might come to mind is to use a for loop with a counter variable. For example, to count how many times 1 appears in the array, you could write code like this:

int countOne = 0;
for(int i = 0; i < arr.size(); i++) {
    if(arr[i] == 1) {
        countOne++;
    }
}

// countOne now holds the frequency of 1.

This approach has a time complexity of O(n), where n is the number of elements in the array.

To get the frequency of 2, you’d need to run a similar loop again with a new counter variable:

int countTwo = 0;
for (int i = 0; i < arr.size(); i++) {
    if (arr[i] == 2) {
        countTwo++;
    }
}
// Time Complexity = O(n)

If you follow this approach to count each unique element, say for elements 1 through 5, the total time complexity will be approximately 5 * O(n).

Now, if both the number of unique elements (say 1 to 100,000) and the array size are large (e.g., 100,000 elements), the number of operations required would be around:

100,000 * 100,000 = 10^5 * 10^5 = 10^10 operations

On average, a computer can process about 10^8 operations per second, so:

10^8 operations ≈ 1 second
10^10 operations ≈ 10^10 / 10^8 seconds = 100 seconds

This approach would take around 100 seconds, which is quite inefficient for larger data sets.

This is where hashing comes to the rescue.

2️⃣ Method 2: Hashing for Efficient Counting

Hashing involves creating another array, known as a hash array, whose size covers the range of possible element values.

      +-----+-----+-----+-----+-----+-----+
arr = |  1  |  2  |  1  |  3  |  2  |  7  |
      +-----+-----+-----+-----+-----+-----+
      
             +-----+-----+-----+-----+-----+-----+-----+-----+
hash_array = |  0  |  0  |  0  |  0  |  0  |  0  |  0  |  0  |... more if you want
             +-----+-----+-----+-----+-----+-----+-----+-----+
                0     1     2     3     4     5     6     7

for (int i = 0; i < arr.size(); i++) {
    hash_array[arr[i]]++;   // use the element's value as the index
}

// Once hash_array is populated, you can get the frequency of any
// element x in O(1) by accessing hash_array[x].
// For example, hash_array[1] gives the frequency of 1.

What is the maximum array size we can declare?

For integer
    inside main = 10^6
    globally = 10^7

For boolean
    inside main = 10^7
    globally = 10^8
    

If we go beyond these limits, a segmentation fault will occur.

Character Hashing:

We are given a string, "abcdabejc", and asked how many times 'a' appears.

The standard procedure is to iterate over the string, check for the character 'a', and increment a counter.

For example:
string str = "abcdabejc";
int aCount = 0;
for (int i = 0; i < str.length(); i++) {
    if (str[i] == 'a') {
        aCount++;
    }
}

// This would take O(n) time complexity

If we then want to check the frequency of 'b', we'd have to loop through the string again, which would again take O(n) time.

So we can hash them using arrays.

Optimized Approach: Character Hashing

Instead of looping through the string each time, we can precompute the frequency of all characters in a single pass. We use an array to store the frequency of each character.

           +-----+-----+-----+-----+-----+-----+--
charHash = |  0  |  0  |  0  |  0  |  0  |  0  | ...
           +-----+-----+-----+-----+-----+-----+--
              0     1     2     3     4     5   .... 25

      +---+---+---+---+---+---+---+---+---+
str = | a | b | c | d | a | b | e | j | c |
      +---+---+---+---+---+---+---+---+---+
        0   1   2   3   4   5   6   7   8

std::string str = "abcdabejc";
int charCount[26] = {0};  // Assuming only lowercase letters

// Precompute frequencies
for (char ch : str) {
    charCount[ch - 'a']++; // Maps each character to an index 0-25
}

// Now, to find the frequency of any character, we can get it in O(1) time
int aCount = charCount['a' - 'a'];
int bCount = charCount['b' - 'a'];

Imagine:

int n = 'a';

Here, 'a' is converted into its ASCII value, which is 97.

Mapping:

  • To map 'a' (ASCII 97) to index 0 in the hash array, we subtract the ASCII value of 'a' from the ASCII value of the character.
  • This is done as: index = ch - 'a'
  • This way, 'a' maps to 0, 'b' to 1, and so on up to 'z', which maps to 25.

Note:

If we take an array of size 256, each ASCII character (which has a unique value from 0 to 255) can directly map to its corresponding index without needing any subtraction. This simplifies the code and allows you to track the frequency of all possible ASCII characters, not just lowercase letters.

  • With an array of size 256, each ASCII value directly corresponds to an index in the array.
  • For example:
    • 'a' (ASCII 97) maps directly to hash_array[97].
    • 'b' (ASCII 98) maps directly to hash_array[98].
    • 'z' (ASCII 122) maps directly to hash_array[122].
    • Even non-alphabet characters like '@' (ASCII 64) would map to hash_array[64].

Advantages of Using an Array of 256:

  1. Direct Mapping: No need to subtract any offset; each ASCII character maps directly to its index.
  2. Extended Character Support: This method covers all ASCII characters, so it can handle symbols, numbers, and special characters.
  3. Simplicity: The code becomes simpler since there’s no need to worry about character offsets.

What is Hashing?

  • Hashing is the process of mapping large data (like strings, numbers, or objects) into a smaller, fixed-size value or "hash code".
  • A hash function takes an input (or "key") and returns an integer (the "hash value" or "hash code") that represents the original input in a fixed, consistent way.
  • Hash Table: A data structure that uses hashing to store key-value pairs, enabling O(1) average time complexity for search, insert, and delete operations.

How a Hash Function Works ❓

A hash function should ideally:

  1. Map each unique input (or key) to a unique hash code.
  2. Distribute hash codes evenly across the hash table to avoid clustering.
  3. Be quick to compute for fast lookup times.

Example of a simple hash function:

int hashFunction(int key, int tableSize) {
    return key % tableSize;  // maps the key to an index within the table
}

How to hash large numbers like 10^9 or higher

So far, we’ve learned about hashing numbers using arrays, but this method is impractical for very large numbers, such as 10^9 or higher. Using arrays for such large indices would require too much memory.

To handle large numbers efficiently, we can use the STL map and unordered_map in C++ or HashMap in Java’s collections. These data structures allow us to hash and store values for a vast range of keys without the memory limitations of arrays. In this discussion, we’ll explore these concepts in detail. The principles are similar across map/unordered_map in C++ and HashMap in Java.

      STL
       |
 +------------+
 |            |
map     unordered_map

1️⃣ map in C++:

A map is an ordered container, meaning that elements are stored in sorted order according to the key. The keys are unique.

Internally, it is implemented as a balanced binary search tree (typically a Red-Black Tree), offering a guaranteed time complexity of O(log n) for all operations, regardless of collisions.

Declaration:
#include <iostream>
#include <map>

std::map<KeyType, ValueType> mapName;
  • KeyType: The type of the keys (e.g., int, string, etc.)
  • ValueType: The type of the values associated with the keys (e.g., int, string, etc.)
#include <iostream>
#include <map>

int main() {
    // Declare a map that maps integers to strings
    std::map<int, std::string> myMap;

    // Insert key-value pairs
    myMap[1] = "Apple";
    myMap[2] = "Banana";
    myMap[3] = "Cherry";

    // Access and print values by key
    std::cout << "Key 1: " << myMap[1] << std::endl;
    std::cout << "Key 2: " << myMap[2] << std::endl;

    // Iterate through the map
    for (const auto& pair : myMap) {
        std::cout << "Key: " << pair.first << ", Value: " << pair.second << std::endl;
    }

    return 0;
}
// Output:
Key 1: Apple
Key 2: Banana
Key: 1, Value: Apple
Key: 2, Value: Banana
Key: 3, Value: Cherry

2️⃣ unordered_map in C++

An unordered_map is similar to a map, but it does not maintain any order of elements. The elements are stored in hash buckets, making access faster (on average, O(1) time complexity for insertion and lookup).

It offers an average time complexity of O(1) for insertion, deletion, and lookup. However, in the worst case, typically due to hash collisions, the time complexity can degrade to O(n).

Declaration:
#include <iostream>
#include <unordered_map>

std::unordered_map<KeyType, ValueType> unorderedMapName;
  • KeyType: The type of the keys (e.g., int, string, etc.)
  • ValueType: The type of the values associated with the keys (e.g., int, string, etc.)
#include <iostream>
#include <unordered_map>

int main() {
    // Declare an unordered_map that maps integers to strings
    std::unordered_map<int, std::string> myUnorderedMap;

    // Insert key-value pairs
    myUnorderedMap[1] = "Apple";
    myUnorderedMap[2] = "Banana";
    myUnorderedMap[3] = "Cherry";

    // Access and print values by key
    std::cout << "Key 1: " << myUnorderedMap[1] << std::endl;
    std::cout << "Key 2: " << myUnorderedMap[2] << std::endl;

    // Iterate through the unordered_map
    for (const auto& pair : myUnorderedMap) {
        std::cout << "Key: " << pair.first << ", Value: " << pair.second << std::endl;
    }

    return 0;
}
// Output (iteration order may vary):
Key 1: Apple
Key 2: Banana
Key: 3, Value: Cherry
Key: 2, Value: Banana
Key: 1, Value: Apple

Key Differences Between map and unordered_map:

  1. Order:
    • map: Elements are stored in sorted order according to the key.
    • unordered_map: Elements are stored in an arbitrary order (based on hash values).
  2. Time Complexity:
    • map: Operations like insertion, deletion, and access take O(log n) time due to the underlying balanced binary tree structure.
    • unordered_map: Operations take O(1) on average due to the underlying hash table.
  3. Use Case:
    • Use map when you need elements in sorted order.
    • Use unordered_map when you want faster access and order is not important.

Using unordered_map to store the frequency of elements:

#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> arr = {1, 2, 2, 3, 1, 4, 2, 2, 3, 12};

    // Declare an unordered_map to store frequencies
    std::unordered_map<int, int> frequencyMap;

    // Populate the map with frequencies
    for (int num : arr) {
        frequencyMap[num]++;
    }

    // Output the frequency of each element
    std::cout << "Element Frequencies:" << std::endl;
    for (const auto& pair : frequencyMap) {
        std::cout << "Element " << pair.first << " appears " << pair.second << " times." << std::endl;
    }

    return 0;
}
Visualization:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+

  +-----+-------+
  | key | value |
  +-----+-------+
  |     |       |
  +-----+-------+
   unordered_map

Iteration 1:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
        ^
        |
        i

  +-----+-------+
  | key | value |
  +-----+-------+
  |     |       |
  +-----+-------+
   unordered_map

    Since arr[i], 1 does not exist in unordered_map,
    so add key = 1, value = 1
    
Iteration 2:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
            ^
            |
            i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 2 does not exist in unordered_map,
    so add key = 2, value = 1

Iteration 3:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                ^
                |
                i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  1    |
  +-----+-------+
  |  2  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 2 does exist in unordered_map,
    so increment its value.

Iteration 4:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                    ^
                    |
                    i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  1    |
  +-----+-------+
  |  2  |  2    |
  +-----+-------+
   unordered_map

    Since arr[i], 3 does not exist in unordered_map,
    so add with key = 3, value = 1

Iteration 5:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                        ^
                        |
                        i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  1    |
  +-----+-------+
  |  2  |  2    |
  +-----+-------+
  |  3  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 1 does exist in unordered_map,
    so increment value corresponding to the key 1.

Iteration 6:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                            ^
                            |
                            i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  2    |
  +-----+-------+
  |  3  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 4 does not exist in unordered_map,
    so add the key = 4 with value = 1.

Iteration 7:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                                ^
                                |
                                i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  2    |
  +-----+-------+
  |  3  |  1    |
  +-----+-------+
  |  4  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 2 does exist in unordered_map,
    so increment the value corresponding to the key 2.

Iteration 8:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                                    ^
                                    |
                                    i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  3    |
  +-----+-------+
  |  3  |  1    |
  +-----+-------+
  |  4  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 2 does exist in unordered_map,
    so increment the value corresponding to the key 2.

Iteration 9:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                                        ^
                                        |
                                        i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  4    |
  +-----+-------+
  |  3  |  1    |
  +-----+-------+
  |  4  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 3 does exist in unordered_map,
    so increment the value corresponding to the key 3.

Iteration 10:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                                            ^
                                            |
                                            i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  4    |
  +-----+-------+
  |  3  |  2    |
  +-----+-------+
  |  4  |  1    |
  +-----+-------+
   unordered_map

    Since arr[i], 12 does not exist in unordered_map,
    so add the key = 12, with corresponding value = 1.

Loop Termination:
      +---+---+---+---+---+---+---+---+---+----+
arr = | 1 | 2 | 2 | 3 | 1 | 4 | 2 | 2 | 3 | 12 |
      +---+---+---+---+---+---+---+---+---+----+
        0   1   2   3   4   5   6   7   8   9
                                                 ^
                                                 |
                                                 i

  +-----+-------+
  | key | value |
  +-----+-------+
  |  1  |  2    |
  +-----+-------+
  |  2  |  4    |
  +-----+-------+
  |  3  |  2    |
  +-----+-------+
  |  4  |  1    |
  +-----+-------+
  | 12  |  1    |
  +-----+-------+
   unordered_map

    Since i has exceeded the bounds of the array,
    the loop will terminate.
Explanation:
  1. Populate the unordered_map:
    • For each element in the array, we increment its count in frequencyMap by writing frequencyMap[num]++.
    • If the element does not exist in frequencyMap, it’s automatically added with a default value of 0, which is then incremented to 1.
  2. Output the Frequencies:
    • We iterate through frequencyMap to print the frequency of each element.
Note:

If you tried to do this with a hash array, you would have to declare an array of size 13, but the map only requires storage for the distinct elements. In this example there are five distinct elements in the array (1, 2, 3, 4, 12), so the unordered_map only needs to store these five keys, each with its corresponding frequency.

Comparison with Hash Array:

  1. Memory Efficiency:
    • If we used a hash array, we'd need an array with a size at least equal to the maximum element (12), i.e., an array of size 13.
    • With an unordered_map, we only store keys that actually appear in the array. This means the size of the unordered_map is based on the number of distinct elements, not on the range or size of the original array values.
  2. Flexibility with Large Numbers:
    • If the array had very large numbers (e.g., 10^6 or higher), a hash array would require an extremely large size.
    • An unordered_map efficiently handles large numbers without needing a massive array, as it dynamically stores only the keys that are present.

Complexity of map & unordered_map:

1️⃣ map:

Underlying structure: Balanced binary tree (usually implemented as a Red-Black Tree).

  • Time Complexity:
    • Insertion: O(log n)
    • Deletion: O(log n)
    • Lookup (Access): O(log n)
  • Order: Elements are stored in sorted order based on keys.

Because of the tree structure, each operation requires O(log n) time on average. This is slower than unordered_map for large datasets, but the order of elements is guaranteed.

2️⃣ unordered_map:

Underlying structure: Hash table.

  • Time Complexity:
    • Insertion: O(1) on average, O(n) in the worst case (when hash collisions require rehashing).
    • Deletion: O(1) on average, O(n) in the worst case (when hash collisions occur).
    • Lookup: O(1) on average, O(n) in the worst case.
  • Order: Elements are stored in an arbitrary order: no sorting of keys.

Due to the hash table, unordered_map provides O(1) average-time complexity for insertion, deletion, and lookup, which is generally faster than map. However, in rare cases of hash collisions (when several keys hash to the same index), time complexity can degrade to O(n). Additionally, rehashing (expanding the hash table size) may also temporarily impact performance.

Summary Table

Operation  |  map (Balanced Tree)  |  unordered_map (Hash Table)
-----------+-----------------------+----------------------------
Insertion  |  O(log n)             |  O(1) average, O(n) worst
Deletion   |  O(log n)             |  O(1) average, O(n) worst
Lookup     |  O(log n)             |  O(1) average, O(n) worst
Order      |  Sorted by key        |  Arbitrary

➤ Hashing Methods

1. Direct Hashing (Direct Addressing)

  • In direct hashing, each key maps directly to a unique location in a hash table without using a hash function.
  • Works well when you know the range of keys in advance, and they are not very large.
  • Example: For a small set of integers as keys, we could use an array where each index represents a unique key.
  • Drawback: Inefficient for a large range of keys.

2. Modulo Division Method (Modulo Hashing)

  • In this method, the hash function is h(key) = key % table_size.
  • This creates a hash code by dividing the key by the table size and taking the remainder.
  • Advantage: Simple and efficient for evenly distributed keys.
  • Drawback: May lead to clustering if keys are not uniformly distributed.
  • Choosing a prime number for table_size often helps distribute the keys more evenly.
Example:

nums = {2, 5, 16, 28, 139, 38, 48, 28, 18}

hash(i) = nums[i] % 10

hash_array
    0
    1
    2 -> 2
    3
    4
    5 -> 5
    6 -> 16
    7
    8 -> 18 -> 28 -> 28 -> 38 -> 48 (sorted chaining)
    9 -> 139

Steps of the Division Method:

  • Choose a Prime Number: The first step is to select a large prime number, denoted as p, which will serve as the divisor in the hashing process. Prime numbers are preferred because they tend to distribute hash values more uniformly across the available range.
  • Compute the Hash Value: For a given key k, the hash value h(k) is computed using the formula:
    h(k) = k % p
    • Here, the modulus operation (%) returns the remainder when the key k is divided by the prime number p. This remainder serves as the index in the hash table where the key will be stored.
Consider a scenario where a set of keys {56, 75, 42, 88, 91} needs to be stored in a hash table. Let's choose a prime number p = 7 as the divisor.

For key 56: h(56) = 56 % 7 = 0 (Store at index 0)
For key 75: h(75) = 75 % 7 = 5 (Store at index 5)
For key 42: h(42) = 42 % 7 = 0 (Collision occurs at index 0)
For key 88: h(88) = 88 % 7 = 4 (Store at index 4)
For key 91: h(91) = 91 % 7 = 0 (Collision occurs at index 0)

In this example, multiple keys are mapped to the same index, leading to collisions at index 0. The division method alone does not address collisions, which is why additional techniques such as chaining are used to manage them.

What is Hash Collision ?

A hash collision happens when two different keys are mapped to the same index by a hash function.

int hash(int key) {
    return key % 10;
}

For keys 23 and 33:

  • 23 % 10 = 3
  • 33 % 10 = 3

Both map to the index 3 → collision!

Why Do Collisions Happen ?

Because:

  1. The number of possible keys is much larger than the number of available slots (limited hash table size).
  2. A hash function is not perfect – it's just a formula, not a unique mapping.

How to Handle Hash Collisions

There are two main techniques:

1️⃣ Chaining (Separate Chaining)

Each bucket (index) holds a linked list or vector of all entries that hash to that index.

It is a collision resolution technique commonly used in conjunction with the division method. When two keys hash to the same index, chaining stores them in a linked list or another secondary data structure at that index, thereby allowing multiple elements to occupy the same position in the hash table.

Steps of Chaining Implementation:

  • Initialize the Hash Table: Begin by creating an array of linked lists (or another secondary structure) where each element in the array corresponds to an index in the hash table.
  • Insert Key into the Hash Table: For each key k, compute the hash value using the division method. If a collision occurs (i.e., the computed index is already occupied), append the new key to the linked list at that index.
  • Resolve Collisions: In the event of multiple keys hashing to the same index, chaining ensures that each key is accessible by traversing the linked list at that index.

Example of Chaining with the Division Method:

Using the same set of keys {56, 75, 42, 88, 91} and prime number p = 7, chaining can be implemented as follows:
For key 56: h(56) = 0 (Insert into the list at index 0)
For key 75: h(75) = 5 (Insert into the list at index 5)
For key 42: h(42) = 0 (Append to the list at index 0, which now contains 56 -> 42)
For key 88: h(88) = 4 (Insert into the list at index 4)
For key 91: h(91) = 0 (Append to the list at index 0, which now contains 56 -> 42 -> 91)

Chaining effectively resolves collisions by allowing multiple keys to coexist at the same index without overwriting each other. This method maintains the efficiency of the division method while ensuring that all keys are accessible, even in the presence of collisions.

const int SIZE = 10;
vector<int> hashTable[SIZE];

void insert(int key) {
    int index = key % SIZE;
    hashTable[index].push_back(key);
}

📊 Table Example:

Index 0: 
Index 1: 
Index 2: 
Index 3: 23 → 33 → 43
...

Accessing Items (Search):

bool search(int key) {
    int index = hash(key);
    for (int x : hashTable[index]) {
        if (x == key)
            return true;
    }
    return false;
}
  • Pros:
    • Easy to implement.
    • No limit on number of keys at an index.
  • Cons:
    • Lookup becomes O(n) in worst case (if many values collide).

Complete Example:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

const int SIZE = 10;
vector<int> hashTable[SIZE];

int hashFunction(int key) {
    return key % SIZE;
}

void insert(int key) {
    int index = hashFunction(key);
    hashTable[index].push_back(key);
}

bool search(int key) {
    int index = hashFunction(key);
    for (int x : hashTable[index]) {
        if (x == key)
            return true;
    }
    return false;
}

void remove(int key) {
    int index = hashFunction(key);
    auto& bucket = hashTable[index];
    // std::remove shifts matching elements to the end; erase then drops them
    bucket.erase(std::remove(bucket.begin(), bucket.end(), key), bucket.end());
}

int main() {
    insert(23);
    insert(33);
    insert(43);

    cout << (search(33) ? "Found\n" : "Not Found\n");

    remove(33);
    cout << (search(33) ? "Found\n" : "Not Found\n");

    return 0;
}

2️⃣ Open Addressing:

Instead of using a list at each index, we probe for the next available index.

Types of Probing:

  • Linear Probing: index = (index + 1) % SIZE
  • Quadratic Probing: index = (index + i*i) % SIZE
  • Double Hashing: Use a second hash function to calculate step size.
const int SIZE = 10;
int table[SIZE];   // assume every slot is pre-filled with -1 (empty)

void insert(int key) {
    int index = key % SIZE;
    while (table[index] != -1)        // slot taken: probe the next index
        index = (index + 1) % SIZE;
    table[index] = key;
}

Bad Hash Function = More Collisions

❌ Bad:

int hash(int key) {
    return 0;  // all keys go to index 0 → worst case
}

✅ Good:

  • Use prime table size
  • Spread keys uniformly
  • Use randomness or bit mixing

Problems and Challenges:

While these hashing techniques are powerful, they are not without challenges:

  1. Collisions: Even with well-chosen hash functions, collisions are inevitable. Techniques like chaining help manage collisions, but they can introduce additional complexity and performance overhead.
  2. Load Factor: The efficiency of hashing is highly dependent on the load factor, which is the ratio of the number of elements to the number of available positions in the hash table. A high load factor increases the likelihood of collisions, while a low load factor can lead to inefficient use of memory.
  3. Choice of Hash Function: The choice of a hash function is critical in ensuring uniform distribution of keys. Poorly chosen hash functions can lead to clustering, where many keys map to the same or nearby indices, degrading performance.
  4. Memory Overhead: Techniques like chaining require additional memory to store linked lists or other secondary structures, which can become significant in large-scale applications.