A hash table is a data structure that allows you to quickly store and retrieve data using a key, making operations like searching, inserting, and deleting very fast on average. It is widely used in real-world applications where speed is critical. For example, programming languages use hash tables to implement dictionaries or maps, letting developers associate values with unique keys. Web browsers rely on them for caching, so frequently visited websites load faster. Databases use hash tables for indexing, enabling quick lookups of records. Even compilers use them to manage symbol tables, keeping track of variable and function names efficiently.

Hash Table

A hash table is defined with a universe , an array of size , with currently stored keys. And a hash function , such that . Each slot in the hash table contains a linked list. For insertion, if a key is hashed into a non-empty slot, place the new pair (key plus satellite data) at the front of the respective list.

Hash Functions

Uniform Hash Function

A uniform hash function is a function where any given key is equally likely to map onto any of the m slots, independently of where any other key has been mapped to. Thus, it is a function h such that:

Proposition

Given a uniform hash function and assuming the input keys are uniformly distributed and independent, we get the following collision probability:

Load Factor

For a hash table with keys stored and slots, define as the load factor.

Time Complexity of Uniform Hashing

Assume we use uniform hashing with chaining, is the number of keys stored, is the size of the array, is the load factor. Then Average time complexity of searching a key is in , while the worst is in .

Universal Hashing

Universal Hash Functions

Suppose we have slots in a hashtable. A collection of hash functions is called universal if for each pair of keys , the number of hash functions for which is at most . That is Equivalently, for each arbitrary , the probability of a hash collision between any distinct keys and is at most .

Universal Hashing

In universal hashing, we choose a random hash function from a universal collection of hash functions .

Dot Product Hash Family

Assume is a prime number. Express key (which has base ) with base . Then express as vector where for all , and . Define: where is a base vector. Then dot product hash family is .

e.g. Suppose the possible keys are integers and array size . So we get . Assume pick . Then . Assume we have . Hence we get .

Proposition

The dot product hash family is universal.

Open Addressing

Open addressing (also called closed hashing) is a collision-resolution technique used in hash tables where all items are stored directly within the table’s array. When a collision occurs (i.e., two keys map to the same index), the algorithm probes the table to find another available slot instead of using an external structure like a linked list.

Open Addressing & Probe Sequence

In open addressing, hash function becomes a map . The sequence is called a probe sequence.

Proposition

Each probe sequence must be a permutation of .

Linear Hashing

The linear hashing strategy is to simply define where is a usual hash function.

Double Hashing

The double hashing strategy is to combine two hash functions together such that where and are usual hash functions satisfying .

Perfect Hashing

Bi-Level Hashing

We call a data structure bi-level hashing if each slot of the hash table points to another hash table, and universal hashing is used in both levels.

Theorem

Suppose is the number of data. Let the number of slots m be . The probability that there are any collisions (using a hash function randomly chosen from a set of universal hash functions) is strictly less than .

Proof There exist pairs of keys (worst case number of collisions). And each collision has probability of . Let X be a random variable counting the number of collisions. Then: $$ E[X]= {n \choose 2} \cdot \frac{1}{m} = \frac{1}{2}. $$$\square$

Build Bi-Level Hashing

Bi-level hashing can be built as follows:

  1. Set up outer hash table: , pick a universal hash function .
  2. Set up inner hash tables:
  3. If for a selected constant , then redo the initial step.
  4. Choose a universal hash function to be the hash function for this inner hash table at position and insert its elements.
  5. As long as there are two with , pick a different and rehash those elements in that inner table according to the new function.