What is Hashing?
Hashing is a way to store elements in a hash table so that lookup, insertion, and deletion can be done with O(1) complexity most of the time.
What is a Hash Table?
A hash table is a type of table where every item has a numeric index position. You can consider that indexing begins from zero. With each index position, there is an associated bucket. This bucket can store one or multiple elements. A Hash Table will look like this,
To add the data in hash table, we first need to pass the key through a hash function and get the hashed value, that will be used to calculate directly the index position in hashtable.
What is a Hash Function?
When we want to store any element in the hash table, we first call the hash function and pass our value to it. The hash function will return a numeric hash value. This numeric hash value maps to an index position in the hash table. It will then directly go to that index position and fill the given value in the associated bucket.
For example, if we want to store the string “Delhi” in the hash table, we will first call the hash function with the string “Delhi”. Suppose it gives us an index position of 2. We will then go to index position 02 in the hash table and store the value “Delhi” in the associated bucket.
If we want to store another value, say “Las Vegas”, we will pass this to the hash function, which might return a value of 03. We then go to index position 3 in the hash table and store the value “Las Vegas” in its associated bucket.
Similarly, we will store other values. Now, suppose we want to look for the value “Delhi”. We will call the hash function with “Delhi”, and it will return a hash value. It’s a property of the hash function to return the same value for the same input, no matter how many times it’s called. If we are looking for “Delhi”, and the hash function returns a value of 02, we will go directly to index position 02 in the hash table and check the associated bucket to see if “Delhi” exists. If it does, it means the hash table contains the value.
Up till now, we’ve used two functions: one is the hash function to calculate the hash of a given value, and the second is the equality operator to check the value. Once we reach the bucket, we need to compare to see if the value exists in the bucket. These two functions are necessary to implement a hash table. Whatever datatype or elements we plan to store in the hash table must support these two functions. If they don’t, they need to provide specialized implementations of these functions.
Hashing and Collision
Now we will discuss about another important concept of hashing i.e. collisions.
What is Collision?
In the previous section, we learned that when trying to add values into a hashtable, we first calculate the hash of the given value. This calculation provides a numeric value related to the index position. Based on this numeric value, we directly access the specific index (or row) in the hash table where there’s an associated bucket. We then place the value in that bucket.
But what happens if the hash function assigns the same hash value to two different values? This is called Collision in hashing.
For instance, if for both “Delhi” and “Tokyo” the hash function returns the same value, 02.
When we pass “Tokyo” to the hash function and get the hash value 02, we’d go to the associated bucket at index position 02 in the hash table. This row in the table has a bucket where the value “Delhi” is already stored. So, we would add the new value, “Tokyo”, to that same bucket. This capability to store multiple values is why it’s called a “bucket”. This scenario, where the hash function assigns the same hash value to different values, is termed a “collision”. In such cases, all those elements are stored in the same bucket. If you’re using an unordered_map in C++, the hash table’s bucket is typically implemented as a linked list to store multiple elements.
Searching for Element in Hash Table & Collisions
Suppose we’re searching for the value “Tokyo”. We’d call the hash function, which would return the hash value 2. We’d then access the hash table at index position 2 and inspect the bucket. If this bucket has multiple elements, the value of “Tokyo” would be compared against each element in the bucket. First, it might compare against “Delhi” (no match) and then against “Tokyo” (a match). This confirms the value exists in the bucket.
Conversely, if we’re searching for the value “London” and its hashed value is also 2, we’d again access the same bucket. In a scenario where this bucket has only “Delhi” and “Tokyo”, neither would match “London”. This indicates “London” doesn’t exist in the hash table.
To manage collisions in a hash table, we rely on two functions. One is the hash function to calculate the hash of a given value. The second is the equality operator, which is employed once we’ve identified the appropriate bucket based on the hash value. This operator then helps us compare elements within the bucket.
Deleting from a Hashtable
To delete a value from a hashtable, we first pass the given value to the hash function. This function returns a hash value, which corresponds directly to an index or row in the hash table. At that specific row, we inspect the associated bucket for the given value. If the value exists, we remove it from the bucket. If the value doesn’t exist, we do nothing. This process outlines how we can delete values from a hash table.
Complexity Considerations in Hashing
Most of the time, insertion, deletion, and lookup operations in a hash table have O(1) complexity. Why O(1)? Because the hash function operation is constant time, and with its output, we can directly access the desired row in the hash table in just one step. Once there, if each bucket contains only one element, we can also perform the needed operation (whether it’s insertion, deletion, or lookup) in constant time. However, this ideal O(1) complexity isn’t guaranteed. If there are numerous collisions in the hash table, for example due to a poorly designed hash function that frequently gives the same hash value for many different inputs, then many elements might be stored in the same bucket. In such cases, when searching for a particular element, we’d pass our query to the hash function, get an index, and go to the corresponding bucket in the hash table. But if that bucket contains numerous elements, we would have to check each element one by one. This represents a worst-case scenario where the lookup complexity becomes O(n), where n is the number of elements in the bucket.
In most cases, hash tables operate very efficiently with a complexity close to O(1), especially when each bucket typically contains just one or very few elements. If a bucket accumulates many elements, however, the efficiency decreases, and in extreme cases, the complexity can approach O(n). Yet, in practical scenarios, the complexity generally remains close to O(1). Therefore, using a hash table (often implemented as
unordered_set in C++) is often preferable over ordered containers like
map, especially for operations requiring rapid lookups.
Hashing is a method to store and retrieve data efficiently in structures like hash tables. It uses a hash function to quickly locate entries, though collisions can sometimes affect its speed.