Hash Table — Separate Chaining

Intuition

A hash table maps keys to array slots via a hash function — h(k) = k mod capacity. The catch is collisions: two different keys hashing to the same slot. Separate chaining's answer is the simplest one imaginable: at every slot, keep a list of all keys that hashed there. A "slot" is no longer a single cell — it's a bucket.

To look up a key, hash to find the bucket, then walk that bucket's short list. With a good hash function and reasonable load factor, each bucket holds only a handful of items, so the walk is O(1) average. The buckets absorb collisions locally without disturbing any other slot in the table.

How It Works

The table is an array of capacity buckets. Each bucket is a list (linked or array — Python uses a list).

Three operations, all starting with bucket = buckets[hash(k)]:

insert(k) — walk the bucket. If k is already there → do nothing (set semantics). Otherwise prepend k at the front (O(1) into a list head). The visualization uses prepend so the most recently inserted key appears first.

get(k) — walk the bucket, return the position of the match or −1.

delete(k) — walk the bucket, splice out the matching node. The other elements of the bucket are unaffected; no tombstones needed.

This is the simplest collision-resolution strategy. It handles collisions by not handling them — different buckets are independent, and within one bucket you just have a linear search that's expected to be very short.

Step By Step

For a table of capacity 5 (so h(k) = k mod 5), starting empty:

  1. insert(7)h = 2. Bucket 2 is [] → prepend → [7].
  2. insert(12)h = 2. Bucket 2 is [7]. Walk: 7 != 12. Prepend → [12, 7]. Two keys, same bucket, no problem.
  3. insert(3)h = 3. Bucket 3 is [] → prepend → [3].
  4. insert(17)h = 2. Walk bucket 2: 12 != 17, 7 != 17. Prepend → [17, 12, 7].
  5. get(12)h = 2. Walk: position 0 is 17, not a match. Position 1 is 12 → return 1.
  6. delete(12)h = 2. Walk, find at position 1, splice → bucket is [17, 7].

In the visualization, the array of buckets is drawn vertically. Each bucket extends to the right as a chain of small nodes — the longer the chain, the more collisions live in that bucket. The probe walk travels along the chain, lighting up each node as it's compared.

Complexity

Let α = n / capacity be the load factor (average bucket length when the hash function is good).

OperationAverageWorst
insertO(1 + α)O(n)
get / deleteO(1 + α)O(n)

The 1 + is the hash computation; the α is the expected walk length within a bucket. With a good hash function, α is small even for large tables — a load factor of 1.0 (table 100% loaded by count) is still fine for chaining, unlike open addressing.

Worst case: every key collides into the same bucket. The table degenerates into a single long list, and every operation is O(n). Real hash functions plus randomized keys make this exceedingly rare, but it is the threat that motivates hash flooding attacks — an adversary picks input keys that all collide, forcing your server's hash table into O(n²) behavior. Defense: randomized hash seeds.

Space: O(capacity + n). The bucket array is always O(capacity), and n total entries are distributed across the chains.

Edge Cases

  • Inserting a key already present — the displayed code skips the insert (set semantics). For a dict where you want to update the value, the same walk-then-decide pattern applies; you'd update in place when found.
  • Inserting into an empty bucket — prepend to an empty list is the same operation as into a non-empty one. No special-case.
  • Deleting from an empty bucket — the for loop iterates zero times, return False.
  • Deleting a key not in the bucket — walk to end, return False.
  • All keys collide into one bucket — every operation becomes a linear walk over n items. The table is effectively a list at this point.

Common Mistakes

  • Confusing chain length with table size. Chains are local. A table with capacity 1000 holding 5000 items has an average chain length of 5 — perfectly healthy. Don't compare n against capacity the way you would for open addressing; the load factor threshold for chaining is much higher (typical resize triggers around α > 1.0 rather than α > 0.75).
  • Choosing too-small capacity. If capacity stays small while keys grow, all buckets get long and you've reinvented a linked list with extra overhead. Resize when chains start exceeding a small constant (say, α > 1 or max chain length > 8).
  • Walking the chain in the wrong direction for the operation. Prepend-on-insert is O(1); walking from head on get is O(chain length). If your workload is "the most recent inserts are most likely to be queried", prepend gives a cache-like benefit. If recency doesn't matter, append works equally.
  • Forgetting the in-bucket equality check on insert. Skipping it means inserting a duplicate key creates two entries; later get returns whichever the walk finds first; later delete removes only one. The contract becomes ill-defined.
  • Using a poor hash function. k mod capacity is fine for the visualization but lousy in practice: if your keys are all multiples of a common factor that shares a divisor with capacity, you get a bucket distribution that's anything but uniform. Real hash functions (Python's hash, MurmurHash, etc.) scramble the bits first.
  • Treating chaining and open addressing as interchangeable. They trade off differently. Chaining is simpler, tolerates higher load factors, and has no tombstone bookkeeping — but pays for chain allocation (pointers, cache misses). Open addressing keeps everything in one contiguous array, which is faster on modern CPUs when the load factor stays moderate. Pick based on memory pattern, not by habit.