Disjoint Set

Problem

Suppose we are given a graph with several connected components and need to determine whether two vertices, u and v, belong to the same component. There are different ways to solve this problem, ranging from brute force methods to more efficient approaches using advanced data structures like disjoint sets (also known as union-find).

Brute Force Approach:

One straightforward method is to perform a graph traversal. For example, if we start a Breadth-First Search (BFS) or Depth-First Search (DFS) from vertex u, we can check if vertex v is reached during traversal. If v is visited, then u and v are in the same component; otherwise, they are not. This approach works well when we have just one or a few queries.

However, imagine we have many queries asking whether various pairs of vertices are connected. In that case, running a full BFS or DFS for every query could be very inefficient. Each traversal takes O(V + E) time in the worst case, where V is the number of vertices and E is the number of edges. If we need to run thousands of queries, the total running time can quickly become impractical.

Disjoint Set (Union-Find)

A much more efficient solution for multiple connectivity queries is to use the disjoint set data structure. The idea is to preprocess the graph by merging vertices that are directly connected by an edge. Initially, every vertex is considered to be in its own set. Then, for each edge in the graph, merge (or “union”) the sets containing the two endpoints of the edge. After processing all edges, any two vertices u and v belong to the same connected component if and only if they are in the same set. This check is performed by the “find” operation in the disjoint set, which returns a representative (or root) for the set.

Problem Statement

Design a disjoint set (also called union-find) data structure that supports the following operations:

DisjointSet(int n) initializes the disjoint set with n elements.

void unionByRank(int u, int v) merges the sets containing u and v using the rank heuristic.

void unionBySize(int u, int v) merges the sets containing u and v using the size heuristic.

bool find(int u, int v) checks if the elements u and v are in the same set and returns true if they are, otherwise false.

Examples

Example 1:

Input: ["DisjointSet", "unionByRank", "unionBySize", "find", "find"]
       [
        [5], [0, 1], [2, 3], [0, 1], [0, 1]
       ]
Output: [null, null, null, true, false]

Explanation:
DisjointSet ds = new DisjointSet(5); // Initialize a disjoint set with 5 elements.
ds.unionByRank(0, 1); // Merge sets containing 0 and 1 using rank.
ds.unionBySize(2, 3); // Merge sets containing 2 and 3 using size.
ds.find(0, 1); // Returns true as 0 and 1 are in the same set.
ds.find(0, 3); // Returns false as 0 and 3 are not in the same set.

Example 2:

Input: ["DisjointSet", "unionBySize", "unionBySize", "find", "find"]
       [
        [3], [0, 1], [1, 2], [0, 2], [0, 1]
       ]
Output: [null, null, null, true, true]

Explanation:
DisjointSet ds = new DisjointSet(3);  // Initialize a disjoint set with 3 elements.
ds.unionBySize(0, 1); // Merge sets containing 0 and 1 using size.
ds.unionBySize(1, 2); // Merge sets containing 1 and 2 using size.
ds.find(0, 2); // Returns true as 0 and 2 are in the same set.
ds.find(0, 1); // Returns true as 0 and 1 are in the same set.

1️⃣ Disjoint Set Union by Rank

Problem Statement

Given two components of a undirected graph, determine if node 1 and node 5 belong to the same component.

Approach:

To solve this problem, Depth-First Search (DFS) or Breadth-First Search (BFS) traversal techniques can be employed. By traversing the graph components, it is evident that node 1 and node 5 are in different components. This brute force approach has a time complexity of O(N+E) where N is the number of nodes and E is the number of edges. However, using a Disjoint Set data structure, this problem can be solved in constant time.

The Disjoint Set data structure is particularly useful for dynamic graphs.

Dynamic Graph:

A dynamic graph refers to a graph that continuously changes its configuration.
For example, consider the edge information for a graph as:

{{1,2}, {2,3}, {4,5}, {6,7}, {5,6}, {3,7}}

Adding edges one by one changes the structure of the graph at each step. After adding the first four edges, nodes 4 and 1 belong to different components. After adding all six edges, nodes 4 and 1 belong to the same component.

Now, to answer any query that requires checking whether two nodes belong to the same component in the graph, if the traversal techniques are used, then it will be extremely time consuming as the graph is changing every time. Here, Disjoint Set plays a crucial role. Disjoint Set can quickly determine if two arbitrary nodes u and v are in the same component at any step.

Functionalities of Disjoint Set Data Structure:

The Disjoint Set data structure provides two primary functionalities:

Finding the parent(ultimate parent) of a particular node.
Union operation (which adds an edge between two nodes):
Union by Rank
Union by Size

Terminologies:

Ultimate Parent: The parent of a node refers to the node right above that particular node. The ultimate parent refers to the topmost node or the root node of that component.
Rank: The rank of a node refers to the distance (the number of nodes including the leaf node) between the furthest leaf node and the current node. Rank includes all nodes beneath the current node.

Implementation:

To implement Union by rank, two arrays of size N (number of nodes) are needed:

Rank array: Storing the rank of each node.
Parent array: Storing the ultimate parents of each node.

Algorithm:

Initial Configuration:
- Rank array: Initialized with zeros.
- Parent array: Initialized with the value of nodes, i.e., parent[i] = i.
Union Function Steps:
- The Union function requires two nodes (let's say u and v) as arguments. Find the ultimate parent of u and v using the a helper findPar() function. Let pu and pv be the ultimate parents of u and v respectively.
- Determine the rank of pu and pv.
- Connect the ultimate parent with a smaller rank to the other ultimate parent with a larger rank. If the ranks are equal, connect any parent to the other and increase the rank of the parent node to which the other is connected.

Example:

Given the edges of a graph as {{1, 2}, {2, 3}, {4, 5}, {6, 7}, {5, 6}}:

Observations:

Observation 1:

Only the ultimate parent is considered, not the immediate parent. For instance, after Union by rank operations, if nodes 5 and 7 are queried, the answer is yes. Their immediate parents differ, but their ultimate parent is the same, i.e., node 4. Thus, the ultimate parent is crucial.

For this the findPar() function helps find the ultimate parent of a particular node.
findPar() Function: This function takes a single node as an argument and finds the ultimate parent for that node.

Observation 2:

Finding the ultimate parent of each query separately using recursion takes O(logN) time complexity. However, the operation is desired in constant time. This is where the path compression technique is useful.

Path Compression: Connecting each node in a path to its ultimate parent reduces time complexity nearly to constant time. The path from the node to its ultimate parent is compressed to only a single edge.

How the Time Complexity reduces? Before path compression, finding the ultimate parent for node 4 involves traversing back to node 1, which is the height of size logN. After path compression, the ultimate parent is accessed in a single step, reducing traversal and thus the time complexity.

Note: We cannot change the ranks while applying path compression.

Overall, findPar() method helps to reduce the time complexity of the union by the rank method as it can find the ultimate parent within constant time. The algorithm for compressing a path is discussed below.

Algorithm:

The process of path compression can be done using the backtracking method.
Base Case: If the node and its parent are the same, return the node.
Call the findPar() function for a node until it hits the base case. While backtracking, update the parent of the current node with the returned value.

Code:

#include <bits/stdc++.h>
using namespace std;

class DisjointSet {
private:
    // To store the rank of each node
    vector<int> rank;
    
    /* To store the size of components 
    that each node belongs to */
    vector<int> size;
    
    // To store the ultimate parent of each node
    vector<int> parent;
    
public:
    
    // Constructor
    DisjointSet(int n) {
        
        // Resize the arrays
        rank.resize(n + 1, 0);
        parent.resize(n + 1);
        size.resize(n+1, 1);
        // Initialise each node with its own value
        for (int i = 0; i <= n; i++) {
            parent[i] = i;
        }
    }
    
    /* Helper function to find ultimate
    parent along with path compression */
    int findUPar(int node) {
        // Base case
        if (node == parent[node])
            return node;
            
        // Backtracking step for path compression
        return parent[node] = findUPar(parent[node]);
    }
    
    /* Function to detemine if two nodes 
    are in the same component or not */
    bool find(int u, int v) {
        // Return true if both have same ultimate parent 
        return (findUPar(u) == findUPar(v));
    }
    
    /* Function to perform union of 
    two nodes based on ranks */
    void unionByRank(int u, int v) {
        
        // Get the ultimate parents of both nodes
        int ulp_u = findUPar(u);
        int ulp_v = findUPar(v);
        
        // Return if nodes already belong to the same component
        if (ulp_u == ulp_v) return;
        
        /* Otherwise, join the node to the other 
        node having higher ranks among the two */
        if (rank[ulp_u] < rank[ulp_v]) {
            // Update the parent
            parent[ulp_u] = ulp_v;
        }
        else if (rank[ulp_v] < rank[ulp_u]) {
            // Update the parent
            parent[ulp_v] = ulp_u;
        }
        
        /* If both have same rank, join any node to the 
        other and increment the rank of the parent node */
        else {
            // Update the parent
            parent[ulp_v] = ulp_u;
            
            // Update the rank
            rank[ulp_u]++;
        }
    }
    
    /* Function to perform union of 
    two nodes based on ranks */
    void unionBySize(int u, int v) {
        
        // Get the ultimate parents of both nodes
        int ulp_u = findUPar(u);
        int ulp_v = findUPar(v);
        
        // Return if nodes already belong to the same component
        if (ulp_u == ulp_v) return;
        
        /* Otherwise, join the node belonging to the smaller 
        component to the node belonging to bigger component */
        if (size[ulp_u] < size[ulp_v]) {
            // Update the parent
            parent[ulp_u] = ulp_v;
            
            // Update the size 
            size[ulp_v] += size[ulp_u];
        }
        else {
            // Update the parent
            parent[ulp_v] = ulp_u;
            
            // Update the size
            size[ulp_u] += size[ulp_v];
        }
    }
};


int main() {
    // Disjoint Data structure
    DisjointSet ds(7);
    ds.unionByRank(1, 2); // Adding edge between 1 and 2
    ds.unionByRank(2, 3); // Adding edge between 2 and 3
    ds.unionByRank(4, 5); // Adding edge between 4 and 5
    ds.unionByRank(6, 7); // Adding edge between 6 and 7
    ds.unionByRank(5, 6); // Adding edge between 5 and 6
    
    /* Checking if node 3 and node 7 
    are in the same component */
    if (ds.find(3, 7)) 
        cout << "They belong to the same components.\n";
    else 
        cout << "They do not belong to the same components.\n";
        
    ds.unionByRank(3, 7); // Adding edge between 3 and 7

    /* Checking again if node 3 and node 7 
    are in the same component */
    if (ds.find(3, 7)) 
        cout << "They belong to the same components.\n";
    else 
        cout << "They do not belong to the same components.\n";
    
    return 0;
}

Complexity Analysis:

Time Complexity:O(1)
- The actual time complexity of UnionByRank() and findPar() methods is O(4α), which is very small and close to 1. This 4α term has a long mathematical derivation not required for an interview.
Space Complexity:O(2N) (where N is the number of nodes)
- The Disjoint Set Data structure uses a parent and a rank array each of size N.

Problem

Brute Force Approach:

Disjoint Set (Union-Find)

Problem Statement

Examples

1️⃣ Disjoint Set Union by Rank

Problem Statement

Approach:

Dynamic Graph:

Functionalities of Disjoint Set Data Structure:

Terminologies:

Implementation:

Example:

Observations:

Algorithm:

Code:

Complexity Analysis:

About me

Popular Posts

Variadic Function Working in C

Understanding Complex C/C++ Declarations

How Characters are Stored in Memory

Quick links

Tags

Newsletter

🛠️ Settings

Problem

Brute Force Approach:

Disjoint Set (Union-Find)

Problem Statement

Examples

1️⃣ Disjoint Set Union by Rank

Problem Statement

Approach:

Dynamic Graph:

Functionalities of Disjoint Set Data Structure:

Terminologies:

Implementation:

Example:

Observations:

Algorithm:

Code:

Complexity Analysis:

Variadic Function Working in C

Understanding Complex C/C++ Declarations

How Characters are Stored in Memory