Problem Statement
Given a list of accounts where each element account [i] is a list of strings, where the first element account [i][0] is a name, and the rest of the elements are emails representing emails of the account.
Now, merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.
After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order.
Examples
Example 1:
Input: N = 4,
accounts =
[
["John","johnsmith@mail.com","john_newyork@mail.com"],
["John","johnsmith@mail.com","john00@mail.com"],
["Mary","mary@mail.com"],
["John","johnnybravo@mail.com"]
]
Output: [
["John","john00@mail.com","john_newyork@mail.com", "johnsmith@mail.com"],
["Mary","mary@mail.com"],
["John","johnnybravo@mail.com"]
]
Explanation: The first and the second John are the same person as they have a common email. But the third Mary and fourth John are not the same as they do not have any common email. The result can be in any order but the emails must be in sorted order. The following is also a valid result:
[
['Mary', 'mary@mail.com'],
['John', 'johnnybravo@mail.com'],
['John', 'john00@mail.com' , 'john_newyork@mail.com', 'johnsmith@mail.com']
]
Input: N = 6,
accounts = [
["John","j1@com","j2@com","j3@com"],
["John","j4@com"],
["Raj",”r1@com”, “r2@com”],
["John","j1@com","j5@com"],
["Raj",”r2@com”, “r3@com”],
["Mary","m1@com"]
]
Output: [
["John","j1@com","j2@com","j3@com","j5@com"],
["John","j4@com"],
["Raj",”r1@com”, “r2@com”, “r3@com”],
["Mary","m1@com"]
]
Explanation: The first and the fourth John are the same person here as they have a common email. And the third and the fifth Raj are also the same person. So, the same accounts are merged.
Different Approaches
Intuition:
To identify if two accounts have a common email, iterating over two accounts of the same name and checking each email can be costly.
A more efficient way involves visualizing the set of emails of each account as a graph, forming various connected components. Set of emails belonging to a person form a single node. Each mail can be placed under a set representing that the email belongs to that person. If an email is found to be already associated with some other person, a common email is found, which means all the emails in both the sets belong to a single person and thus, merging of accounts(sets) must be done.
How to handle merging of accounts?
The data structure that is the best fit when it comes to handling merge/union operation is Disjoint Set. Hence, whenever the two sets(accounts) are found to have a common email, the merge operation can be performed in constant time.
Understanding:
To store the mails belonging to a single person, a hash-map can be used which will provide constant look-up time which will be efficient to check if the mails already belonging to a person or not.
Approach:
- Initialize a DSU structure to keep track of connected components (emails belonging to the same person).
- Traverse through each account and its emails. For each email, if it's seen for the first time, map it to the current account index. If the email has been seen before, union the current account index with the previously mapped account index.
- Create a list to collect emails for each connected component.
- Traverse the email-to-node mapping to collect all emails under their ultimate parent node.
- For each component, sort the emails and prepend the owner's name. Collect all the accounts and sort them by name.
Code:
#include <bits/stdc++.h>
using namespace std;
class DisjointSet {
public:
/* To store the ranks, parents and
sizes of different set of vertices */
vector<int> rank, parent, size;
// Constructor
DisjointSet(int n) {
rank.resize(n + 1, 0);
parent.resize(n + 1);
size.resize(n + 1);
for (int i = 0; i <= n; i++) {
parent[i] = i;
size[i] = 1;
}
}
// Function to find ultimate parent
int findUPar(int node) {
if (node == parent[node])
return node;
return parent[node] = findUPar(parent[node]);
}
// Function to implement union by rank
void unionByRank(int u, int v) {
int ulp_u = findUPar(u);
int ulp_v = findUPar(v);
if (ulp_u == ulp_v) return;
if (rank[ulp_u] < rank[ulp_v]) {
parent[ulp_u] = ulp_v;
}
else if (rank[ulp_v] < rank[ulp_u]) {
parent[ulp_v] = ulp_u;
}
else {
parent[ulp_v] = ulp_u;
rank[ulp_u]++;
}
}
// Function to implement union by size
void unionBySize(int u, int v) {
int ulp_u = findUPar(u);
int ulp_v = findUPar(v);
if (ulp_u == ulp_v) return;
if (size[ulp_u] < size[ulp_v]) {
parent[ulp_u] = ulp_v;
size[ulp_v] += size[ulp_u];
}
else {
parent[ulp_v] = ulp_u;
size[ulp_u] += size[ulp_v];
}
}
};
// Solution class
class Solution{
public:
// Function to marge the accounts
vector<vector<string>> accountsMerge(vector<vector<string>> accounts){
int n = accounts.size(); // Number of accounts
// Disjoint Set data structure
DisjointSet ds(n);
// Hashmap to store the pair: {mails, node}
unordered_map<string, int> mapMailNode;
// Iterate on all the accounts
for (int i=0; i < n; i++) {
// Iterate on all the mails of the person
for (int j = 1; j < accounts[i].size(); j++) {
// Get the mail
string mail = accounts[i][j];
// If this mail was not already existing
if (mapMailNode.find(mail) ==
mapMailNode.end()) {
// Add it to the hashmap
mapMailNode[mail] = i;
}
/* Otherwise join it with
the previous component */
else {
// Unite the components
ds.unionBySize(i, mapMailNode[mail]);
}
}
}
// To store the merged mails
vector<string> mergedMail[n];
// Iterate on the Hashmap
for (auto it : mapMailNode) {
string mail = it.first; // Mail
int node = ds.findUPar(it.second); // Node
// Add the merged mail to the list
mergedMail[node].push_back(mail);
}
// To return the result
vector<vector<string>> ans;
// Iterate on all list of merged mails
for (int i = 0; i < n; i++) {
/* If a person has no mails,
skip the iteration */
if (mergedMail[i].size() == 0)
continue;
// Otherwise, add all the merged mails of person
sort(mergedMail[i].begin(), mergedMail[i].end());
vector<string> temp;
temp.push_back(accounts[i][0]); // Add the name
// Add the mails
for (auto it : mergedMail[i]) {
temp.push_back(it);
}
ans.push_back(temp);
}
sort(ans.begin(), ans.end());
return ans;
}
};
int main() {
int n = 4;
vector<vector<string>> accounts = {
{"John","johnsmith@mail.com","john_newyork@mail.com"},
{"John","johnsmith@mail.com","john00@mail.com"},
{"Mary","mary@mail.com"},
{"John","johnnybravo@mail.com"}
};
// Creating instance of Solution class
Solution sol;
// Function call to merge the accounts
vector<vector<string>> ans;
ans = sol.accountsMerge(accounts);
// Output
cout << "The mareged accounts are:\n";
for(int i=0; i < ans.size(); i++) {
for(int j = 0; j < ans[i].size(); j++) {
cout << ans[i][j] << " ";
}
cout << endl;
}
return 0;
}
Complexity Analysis:
- Time Complexity:
O(N+E) + O(E*4ɑ) + O(N*(ElogE + E))
(where E = no. of emails)- Visiting all the emails takes O(N+E) time.
- In the worst case, all the accounts can be merged taking O(E*4ɑ) time.
- All the emails to the result list and Sorting the emails take O(N*(ElogE + E)) times.
- Space Complexity:
O(N+E)
- The hashmap will store all the emails taking O(E) space.
- The disjoint set data structure uses parent and size/rank arrays taking O(N) space.
- The resulting list will take up O(E) space.