A hash table is a data structure that allows you to quickly store and retrieve data using a key, making operations like searching, inserting, and deleting very fast on average. It is widely used in real-world applications where speed is critical. For example, programming languages use hash tables to implement dictionaries or maps, letting developers associate values with unique keys. Web browsers rely on them for caching, so frequently visited websites load faster. Databases use hash tables for indexing, enabling quick lookups of records. Even compilers use them to manage symbol tables, keeping track of variable and function names efficiently.
Hash Table
A hash table is defined with a universe
, an array of size , with currently stored keys. And a hash function , such that . Each slot in the hash table contains a linked list. For insertion, if a key is hashed into a non-empty slot, place the new pair (key plus satellite data) at the front of the respective list.
Hash Functions
Uniform Hash Function
A uniform hash function is a function where any given key is equally likely to map onto any of the m slots, independently of where any other key has been mapped to. Thus, it is a function h such that:
Proposition
Given a uniform hash function and assuming the input keys are uniformly distributed and independent, we get the following collision probability:
Load Factor
For a hash table with
keys stored and slots, define as the load factor.
Time Complexity of Uniform Hashing
Assume we use uniform hashing with chaining,
is the number of keys stored, is the size of the array, is the load factor. Then Average time complexity of searching a key is in , while the worst is in .
Universal Hashing
Universal Hash Functions
Suppose we have
slots in a hashtable. A collection of hash functions is called universal if for each pair of keys , the number of hash functions for which is at most . That is Equivalently, for each arbitrary , the probability of a hash collision between any distinct keys and is at most .
Universal Hashing
In universal hashing, we choose a random hash function
from a universal collection of hash functions .
Dot Product Hash Family
Assume
is a prime number. Express key (which has base ) with base . Then express as vector where for all , and . Define: where is a base vector. Then dot product hash family is .
e.g. Suppose the possible keys are integers
Proposition
The dot product hash family is universal.
Open Addressing
Open addressing (also called closed hashing) is a collision-resolution technique used in hash tables where all items are stored directly within the table’s array. When a collision occurs (i.e., two keys map to the same index), the algorithm probes the table to find another available slot instead of using an external structure like a linked list.
Open Addressing & Probe Sequence
In open addressing, hash function becomes a map
. The sequence is called a probe sequence.
Proposition
Each probe sequence must be a permutation of
.
Linear Hashing
The linear hashing strategy is to simply define
where is a usual hash function.
Double Hashing
The double hashing strategy is to combine two hash functions together such that
where and are usual hash functions satisfying .
Perfect Hashing
Bi-Level Hashing
We call a data structure bi-level hashing if each slot of the hash table points to another hash table, and universal hashing is used in both levels.
Theorem
Suppose
is the number of data. Let the number of slots m be . The probability that there are any collisions (using a hash function randomly chosen from a set of universal hash functions) is strictly less than .
Proof There exist
Build Bi-Level Hashing
Bi-level hashing can be built as follows:
- Set up outer hash table:
, pick a universal hash function . - Set up inner hash tables:
- If
for a selected constant , then redo the initial step. - Choose a universal hash function
to be the hash function for this inner hash table at position and insert its elements. - As long as there are two
with , pick a different and rehash those elements in that inner table according to the new function.