Hash Table — Open Addressing (Linear Probing)

Intuition

A hash table stores keys in an array. To find a key's slot, you run it through a hash function that turns the key into an array index — h(k) = k mod capacity. In the best case, every key lands in a unique slot and every lookup is one read: O(1).

The problem is collisions — two different keys hashing to the same slot. Open addressing's answer: if the slot is taken, probe to the next slot. And the next. Until you find an empty one (for insertion) or the key you're after (for lookup). Everything stays in the original array — no side chains, no extra allocations.

Linear probing is the simplest probe sequence: just step one slot at a time, wrapping around at the end. It's CPU-cache-friendly because consecutive slots live in adjacent memory, which makes it fast in practice — but it's also the variant most prone to clustering, where long runs of occupied slots stretch the average probe length.

How It Works

The table is a fixed-size array of slots. Each slot is in one of three states:

  • Empty — never held a key (or has been formally reset).
  • Occupied — holds a real key.
  • Tombstone — once held a key, was deleted. The slot is logically empty but physically marked, because a real "empty" here would break lookups for keys that probed past this slot.

Three operations, all sharing the same probe loop starting at i = hash(k):

insert(k) — probe forward. On each slot:

  • If empty or tombstone or already holding k → write k here, done.
  • Otherwise → step i = (i + 1) mod cap and continue.

After cap steps without finding a writable slot → overflow.

get(k) — probe forward. On each slot:

  • Empty → key not present, return −1.
  • Occupied with matching key (and not a tombstone) → return the slot index.
  • Anything else (tombstone, occupied with non-matching key) → continue probing.

delete(k) — same probe. Replace the matched slot with a tombstone, not with empty. The tombstone is the key insight: if you reset the slot to empty, a subsequent get(k') for any key that originally hashed earlier and probed past this slot would now short-circuit on the empty cell and incorrectly report "not found". The tombstone says "keep probing, this slot used to be on a chain".

Step By Step

For a table of capacity 7 (so h(k) = k mod 7), starting empty:

  1. insert(10)h = 3. Slot 3 is empty → write 10 there.
  2. insert(20)h = 6. Slot 6 empty → write 20.
  3. insert(17)h = 3. Collision with 10. Probe slot 4 → empty → write 17 there.
  4. insert(24)h = 3. Collide with 10. Probe 4 (has 17), 5 → empty → write 24 there.
  5. get(17)h = 3. Slot 3 has 10, not a match → probe slot 4 → 17 matches → return 4.
  6. delete(10)h = 3. Slot 3 matches 10 → mark slot 3 as tombstone.
  7. get(17)h = 3. Slot 3 is a tombstone, not empty — continue probing. Slot 4 has 17 → return 4. (If slot 3 had been reset to empty instead, this lookup would have returned −1 — wrong.)
  8. insert(31)h = 3. Slot 3 is a tombstone → write 31 there (tombstones are reclaimable on insert).

In the visualization, slots are drawn in a row; the active probe sequence lights up as the cursor steps from h(k). Tombstones get their own muted color to distinguish them from empty cells.

Complexity

Let α (load factor) be n / cap — fraction of slots that are filled.

OperationAverageWorst
insert / get / deleteO(1) when α is smallO(n) when α → 1

The expected probe length for linear probing is roughly 1 / (1 − α). At 50% full, that's 2 probes. At 90% full, 10 probes. At 99% full, 100 probes. Open-addressing tables must resize before they fill up. A common rule of thumb: rehash everything into a double-sized table when α > 0.75.

Space: O(cap). The table is the only allocation — no per-key overhead, no pointers. This is open addressing's main advantage over chaining: tighter memory layout, better cache behavior, no per-entry allocation.

Edge Cases

  • Inserting a key that's already present — the displayed code treats "key match" the same as "empty" in the insert probe and overwrites the slot. That's the right behavior for set semantics; for a multimap you'd need a different design.
  • Tombstones accumulate. A workload of insert/delete cycles can fill the table with tombstones even though few keys are actually present. Lookups slow down because they still have to walk past tombstones. Resizing rehashes only real keys, which is also the cleanup mechanism for tombstones.
  • Deleting a missing key — probe walks until an empty cell is found, return False. The cap on probe length (one full lap) guarantees this terminates even if the table has no empty cells.
  • Inserting into a full table — after probing cap times without finding a writable slot, raise overflow. Real implementations resize before this happens.
  • Hash function collisions on every insert — pathological worst case. Every operation becomes O(n). Real hash functions don't hit this on real data, but adversarial input can.

Common Mistakes

  • Deleting by resetting to empty. The classic bug. Any key that hashed earlier and probed past the deleted slot becomes unreachable. Tombstones exist only to prevent this. Some implementations avoid tombstones with back-shift deletion (slide subsequent entries leftward to fill the hole), but that only works for linear probing — and it's tricky to implement correctly.
  • Forgetting the modular wrap. Probing must wrap around: i = (i + 1) mod cap. Without the modulo, large keys whose initial hash is near the end of the table walk off the array.
  • Not resizing. Open addressing degrades catastrophically above ~75% load. Production tables monitor the load factor and double the table size when it crosses a threshold.
  • Treating tombstones as occupied on insert. The insert path should write over a tombstone — that's how the table reclaims space. The lookup path must skip tombstones, but the insert path must use them.
  • Counting tombstones toward capacity for overflow checks. The displayed code's probe-cap of cap iterations is correct because the worst case is "every slot is occupied or tombstone, none match". A more efficient implementation tracks occupied + tombstone count separately from occupied count, and triggers resize on the former.