Hash Table

Overview¶

A hash table is a data structure that maps keys to values using a hash function to compute an index into an array of buckets or slots. It provides O(1) average-case time complexity for insertions, deletions, and lookups, making it one of the most efficient data structures for key-value storage and retrieval.

Structure¶

A hash table consists of: - Hash Function: Converts keys into array indices (e.g., hash(key) % table_size) - Buckets/Slots: An underlying array that stores key-value pairs - Collision Handling: Mechanism to handle when multiple keys hash to the same index

Common collision resolution strategies: - Separate Chaining: Each bucket contains a linked list of entries with the same hash - Open Addressing: Find the next available slot using probing (linear, quadratic, or double hashing) - Load Factor: Ratio of stored entries to table size; when it exceeds a threshold (typically 0.7), the table is resized and rehashed

Operations¶

Access: To access a value by key: compute index = hash(key) % table_size, then retrieve the value at that bucket. If using separate chaining, traverse the linked list to find the matching key.

Search: Identical to access - use the hash function to find the bucket, then search within that bucket for the key. Returns the associated value if found, or indicates the key doesn't exist.

Insert: Compute the index using the hash function. If the bucket is empty, insert the key-value pair. If occupied, handle the collision using the chosen strategy (append to chain or probe for next slot). If load factor exceeds threshold, resize and rehash all entries.

Delete: Hash the key to find the bucket, locate the key-value pair, and remove it. For separate chaining, remove from the linked list. For open addressing, mark the slot as deleted (tombstone) to maintain probe sequences.

Resize: When the load factor becomes too high, create a new larger array (typically double the size) and rehash all existing entries into the new table to maintain performance.

Complexities¶

Access Ave - O(1) Worst - O(n) Search Ave - O(1) Worst - O(n) Insert Ave - O(1) Worst - O(n) Delete Ave - O(1) Worst - O(n) Space - O(n)

Note: Worst case O(n) occurs when all keys hash to the same bucket (all collisions). With a good hash function and proper load factor management, average case O(1) is achieved.

When to Use¶

Advantages: - Extremely fast average-case lookups, insertions, and deletions (O(1)) - Flexible key types (any hashable object can be a key) - Dynamic sizing - can grow as needed - Simple conceptual model for key-value associations

Disadvantages: - No ordering of elements (keys are not stored in sorted order) - Requires extra space for the underlying array and collision handling - Performance depends heavily on quality of hash function - Worst-case performance can degrade to O(n) with poor hash function or high collisions - Not cache-friendly due to random memory access patterns

Common use cases: - Implementing dictionaries, maps, and associative arrays - Database indexing for fast record lookup - Caching and memoization (storing computed results) - Counting frequency of elements (histogram/frequency tables) - Detecting duplicates in datasets - Symbol tables in compilers and interpreters - Implementing sets (hash set)

Visual Representation¶

Hash Table with Separate Chaining:

Index   Bucket
  0   -> [Empty]
  1   -> ["cat": 5] -> ["dog": 3] -> NULL (collision chain)
  2   -> [Empty]
  3   -> ["bird": 8] -> NULL
  4   -> [Empty]
  5   -> ["fish": 2] -> ["frog": 7] -> NULL (collision chain)
  6   -> ["lion": 1] -> NULL
  7   -> [Empty]

Example:
  hash("cat")  % 8 = 1
  hash("dog")  % 8 = 1  (collision, added to chain at index 1)
  hash("fish") % 8 = 5
  hash("frog") % 8 = 5  (collision, added to chain at index 5)

Load Factor = 6 entries / 8 slots = 0.75 (approaching resize threshold)

Implementation(s)¶

Python:

class HashTable:
    def __init__(self, size=8):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        """Simple hash function using Python's built-in hash."""
        return hash(key) % self.size

    def _resize(self):
        """Resize and rehash when load factor exceeds 0.7."""
        old_buckets = self.buckets
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        self.count = 0

        for bucket in old_buckets:
            for key, value in bucket:
                self.insert(key, value)

    def insert(self, key, value):
        """Insert or update a key-value pair."""
        index = self._hash(key)
        bucket = self.buckets[index]

        # Update if key exists
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return

        # Insert new key-value pair
        bucket.append((key, value))
        self.count += 1

        # Resize if load factor > 0.7
        if self.count / self.size > 0.7:
            self._resize()

    def search(self, key):
        """Search for a key and return its value."""
        index = self._hash(key)
        bucket = self.buckets[index]

        for k, v in bucket:
            if k == key:
                return v

        raise KeyError(f"Key '{key}' not found")

    def delete(self, key):
        """Delete a key-value pair."""
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.count -= 1
                return v

        raise KeyError(f"Key '{key}' not found")

    def __str__(self):
        """String representation of the hash table."""
        result = []
        for i, bucket in enumerate(self.buckets):
            if bucket:
                items = " -> ".join([f"({k}: {v})" for k, v in bucket])
                result.append(f"  {i}: {items}")
        return "HashTable:\n" + "\n".join(result) if result else "HashTable: (empty)"


# Example usage
ht = HashTable()
ht.insert("name", "Alice")
ht.insert("age", 30)
ht.insert("city", "NYC")
print(ht.search("name"))  # Output: Alice
ht.delete("age")
print(ht)

External Links¶

Up