This is the 14th Part of Tutorial series on unordered_map. In this tutorial, we will learn about how Collision Resolution works in an unordered_map in C++.

Collision resolution in an unordered_map in C++ helps to determines how the data structure handles situations where multiple keys hash to the same bucket. Understanding this process is important to understand the efficiency of hash tables in various scenarios.

What is a Collisions in Unordered_Map in C++?

A collision in a hash table occurs when two or more keys produce the same hash value, leading them to be assigned to the same bucket. Since a bucket can only hold one value directly, the unordered_map must have a way to handle these collisions.

Different Collision Resolution Techniques

The C++ Standard Library typically uses two main techniques for collision resolution:

  1. Chaining (Separate Chaining): Each bucket in the hash table stores a linked list (or another dynamic data structure) of all elements that hash to that bucket. If a collision occurs, the new element is simply added to the list in the corresponding bucket.
  2. Open Addressing: Not commonly used in standard implementations, this method involves finding another bucket in the hash table for the new element when a collision occurs. Techniques like linear probing, quadratic probing, or double hashing are used.

The standard unordered_map in C++ generally uses chaining.

Collision Handling in C++ Unordered_Map

In C++’s unordered_map, chaining is the preferred method. Here’s how it works:

  1. Insertion: When inserting a new key-value pair, the hash function calculates the bucket index. If the bucket is already occupied (collision), the new element is added to the linked list in that bucket.
  2. Lookup: To find an element, the hash table calculates the bucket index. It then traverses the linked list at that index (if any) to find the element.
  3. Deletion: Similar to lookup, deletion involves finding the right bucket and then traversing the list to remove the specific element.

Example of Collision handling in unordered_map in C++

Let’s create an unordered_map with a custom hash function that provides the same hash value for specific strings (“Why”, “Where”, “How”) and unique hash values for all other strings is an interesting exercise.

The custom hash function should:
1. Return the same hash value for “Why”, “Where”, and “How”.
2. Return unique hash values for other strings.

Custom Hash function is as follows,

#include <iostream>
#include <unordered_map>
#include <string>

struct CustomHash {
    std::size_t operator()(const std::string& key) const {
        if (key == "Why" || key == "Where" || key == "How") {
            return 0; // Same hash value for these three strings
        }
        // Basic hash function for other strings
        std::size_t hashValue = 0;
        for (char c : key) {
            hashValue = hashValue * 31 + c;
        }
        return hashValue;
    }
};

Now, you can create an unordered_map using this custom hash function like this,

// Create and add elements to the unordered_map
std::unordered_map<std::string, int, CustomHash> wordFrequency;

wordFrequency["Why"] = 11;
wordFrequency["Where"] = 22;
wordFrequency["How"] = 33;
wordFrequency["Sonehow"] = 44;
wordFrequency["Whatever"] = 55;

Now let’s observe the behavior of buckets and collisions: using printMapDetails() function.

// Function to print details
void printMapDetails(const std::unordered_map<std::string, int, CustomHash>& mapObj) 
{
    std::cout << "Number of buckets: " << mapObj.bucket_count() << std::endl;
    for (unsigned i = 0; i < mapObj.bucket_count(); ++i) 
    {
        std::cout << "Bucket #" << i << " contains: ";
        for (auto it = mapObj.begin(i); it != mapObj.end(i); ++it)
        {
            std::cout << "[" << it->first << ":" << it->second << "] ";
        }
        std::cout << std::endl;
    }
}

int main() 
{
    std::unordered_map<std::string, int, CustomHash> wordFrequency;

    wordFrequency["Why"] = 11;
    wordFrequency["Where"] = 22;
    wordFrequency["How"] = 33;
    wordFrequency["Sonehow"] = 44;
    wordFrequency["Whatever"] = 55;

    printMapDetails(wordFrequency);
    return 0;
}

Output:

Number of buckets: 13
Bucket #0 contains: [How:33] [Where:22] [Why:11] 
Bucket #1 contains: 
Bucket #2 contains: 
Bucket #3 contains: 
Bucket #4 contains: 
Bucket #5 contains: 
Bucket #6 contains: [Whatever:55] 
Bucket #7 contains: 
Bucket #8 contains: 
Bucket #9 contains: 
Bucket #10 contains: [Somehow:44] 
Bucket #11 contains: 
Bucket #12 contains: 

It created an unordered_map with a custom hash function. The strings “Why”, “Where”, and “How” will collide as they are given the same hash value (0). Other strings like “Whatever” and “Somehow” will have unique hash values and will likely be in different buckets. The printMapDetails function will help you visualize the distribution of elements across different buckets.

The printMapDetails() function starts by displaying the total number of buckets using map.bucket_count(). Then it iterates over each bucket, using a loop that runs from 0 to the total number of buckets. For each bucket, it uses two iterators, obtained from map.begin(i) and map.end(i), to traverse the elements in that specific bucket. As it iterates, the function prints the key-value pairs (it->first and it->second) of each element in the current bucket. This process repeats for each bucket, giving a clear picture of how the elements are distributed across the unordered_map,.

Summary

Collision resolution in an unordered_map is typically handled through chaining, where each bucket contains a list of all elements that hash to it. This method is efficient for a moderate number of collisions and is a critical aspect of the performance characteristics of hash tables in C++.

Ritika Ohri

Hi, I am Ritika Ohri, founder of this blog. I craft comprehensive programming tutorials and also manage a YouTube channel. You can also connect with me on Linkedin.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.