MNIST Digit Classifier

Intuition

This is the same MNIST-trained neural network as Forward Propagation, but instead of watching one curated image march through it beat by beat, you drive it. Draw a digit on the grid and the network classifies it live — the hidden neurons light up, the connections carry the signal, and the ten output probabilities re-settle on every stroke. There is no timeline and no scrubber: the picture you draw is the input, and what you see is a real forward pass through real trained weights, running in your browser.

What you can feel here, and what a fixed walkthrough can't show, is how the network behaves at the edges: when your 7 looks a bit like a 1, when your 4 is open at the top, when a stroke lands off to one side. The network is often right, sometimes wrong, and always shows you how confident it is, whether or not that confidence is deserved.

How It Works

The network itself is identical to the one in Forward Propagation: 784 → 25 → 25 → 10, two ReLU hidden layers, and a softmax over ten digits. Each neuron computes a weighted sum z = w·a + b of the previous layer's activations a (shown dark for positive, blue for negative). The hidden layers pass z through ReLU, and the output layer turns its ten raw scores into probabilities that sum to 1. The largest probability, ringed in green, is the network's guess.
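
The layer arithmetic described above can be sketched in plain Python. This is a minimal illustration, not the code behind the demo: the function names (dense, relu, softmax, forward, random_layer) are made up for this sketch, and the weights are random placeholders rather than the trained MNIST weights, so the guess it produces is meaningless.

```python
import math
import random

def dense(weights, biases, activations):
    """Weighted sums for one layer: z_j = w_j . a + b_j for each neuron j."""
    return [sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)]

def relu(zs):
    """Keep positive sums, clamp negative ones to zero."""
    return [max(0.0, z) for z in zs]

def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def forward(pixels, layers):
    """Run 784 inputs through every layer: ReLU on hidden layers, softmax last."""
    a = pixels
    for i, (weights, biases) in enumerate(layers):
        z = dense(weights, biases, a)
        a = softmax(z) if i == len(layers) - 1 else relu(z)
    return a

def random_layer(n_in, n_out, rng):
    """Placeholder weights (assumption): a real run would load trained values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
sizes = [784, 25, 25, 10]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
pixels = [rng.random() for _ in range(784)]      # stand-in for a drawn digit

probs = forward(pixels, layers)
guess = max(range(10), key=lambda d: probs[d])   # index of the biggest probability
```

With trained weights in place of random_layer, probs would be the ten output probabilities and guess the digit ringed in green.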

Two things happen before your drawing reaches the network:

  • Centering. Real MNIST images aren't just any 28×28 picture: each digit is scaled to fit a 20×20 box and then positioned in the 28×28 frame so that its center of mass sits at the center. The network only ever saw digits framed that way, so your drawing goes through the same step before the network reads it. The grid shows your drawing as you made it; the network works on a re-centered, re-scaled copy, which is why even an off-center or undersized scrawl can still be recognized.
  • Downsampling. You draw at a higher resolution than 28×28; the strokes are shrunk down to the 784 pixels the network reads, with smooth (anti-aliased) edges, just like the original dataset.
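
The two preprocessing steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the demo's actual code: the names resize, preprocess and block are hypothetical, the canvas is assumed to be a list of rows with ink values between 0 and 1, and simple supersampling stands in for whatever anti-aliasing the demo really uses.

```python
def resize(img, out_h, out_w, samples=4):
    """Resample img to out_h x out_w, averaging samples x samples points per
    output pixel so that edges come out smooth rather than jagged."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            total = 0.0
            for sr in range(samples):
                for sc in range(samples):
                    y = (r + (sr + 0.5) / samples) * in_h / out_h
                    x = (c + (sc + 0.5) / samples) * in_w / out_w
                    total += img[min(int(y), in_h - 1)][min(int(x), in_w - 1)]
            out[r][c] = total / (samples * samples)
    return out

def preprocess(canvas, box=20, frame=28):
    """Crop to the ink, fit it in a box x box square, center it by center of mass."""
    blank = [[0.0] * frame for _ in range(frame)]
    rows = [r for r, row in enumerate(canvas) if any(v > 0 for v in row)]
    cols = [c for c in range(len(canvas[0])) if any(row[c] > 0 for row in canvas)]
    if not rows:
        return blank                      # no ink: nothing to center
    crop = [row[cols[0]:cols[-1] + 1] for row in canvas[rows[0]:rows[-1] + 1]]
    scale = box / max(len(crop), len(crop[0]))   # longer side becomes `box`
    small = resize(crop, max(1, round(len(crop) * scale)),
                   max(1, round(len(crop[0]) * scale)))
    mass = sum(map(sum, small))
    if mass == 0:
        return blank                      # ink too thin to survive resampling
    cy = sum(r * v for r, row in enumerate(small) for v in row) / mass
    cx = sum(c * v for row in small for c, v in enumerate(row)) / mass
    top = round((frame - 1) / 2 - cy)     # shift center of mass to frame center
    left = round((frame - 1) / 2 - cx)
    out = blank
    for r, row in enumerate(small):
        for c, v in enumerate(row):
            if 0 <= r + top < frame and 0 <= c + left < frame:
                out[r + top][c + left] = v
    return out

def block(size, top, left, side):
    """A blank size x size canvas with one solid square of ink (test input)."""
    return [[1.0 if top <= r < top + side and left <= c < left + side else 0.0
             for c in range(size)] for r in range(size)]

corner = preprocess(block(56, 2, 3, 10))      # small square near the top-left
middle = preprocess(block(56, 23, 23, 10))    # the same square near the middle
pixels = [v for row in corner for v in row]   # the 784 values the network reads
```

Both squares come out as the same 28×28 image, so a network fed this output would receive identical input wherever the square was drawn.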

What To Try

  1. Draw a clear digit. Trace a bold, centered 3 or 7. Watch the hidden columns flicker as you draw, and the winning output fill in. It should land on the right answer with high confidence.
  2. Make it ambiguous. Draw a 1 that leans, or a 7 with a heavy top bar. Watch the probability split between two digits — the runner-up rising as the winner falls. That split is the network's uncertainty.
  3. Break it. Draw small, draw in a corner, draw a sloppy loop. Some inputs the network gets confidently wrong. Compare against the sample buttons, which load real MNIST test images it classifies correctly.
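
One way to put a number on the split you see in step 2 is the gap between the winner and the runner-up. The sketch below is illustrative only: top_two is a hypothetical helper, and both probability lists are invented for the example, not taken from the network.

```python
def top_two(probs):
    """Return (winner, runner_up, margin) for a list of ten probabilities."""
    ranked = sorted(range(len(probs)), key=lambda d: probs[d], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, runner_up, probs[winner] - probs[runner_up]

# Invented outputs (assumption): index d holds the probability of digit d.
clear_seven = [0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01]
leaning_one = [0.01, 0.45, 0.01, 0.01, 0.01, 0.01, 0.01, 0.47, 0.01, 0.01]

clear = top_two(clear_seven)       # wide margin: the network is sure
ambiguous = top_two(leaning_one)   # narrow margin: 7 barely beats 1
```

A wide margin corresponds to one tall output bar; a narrow one corresponds to two bars of nearly equal height.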

Complexity

Each classification is one forward pass: 784·25 + 25·25 + 25·10 = 20,475 multiply-adds, plus 60 bias additions. The cost is linear in the number of connections, with no iteration or backtracking. It's fast enough to re-run on every pointer move, which is why the network reacts in real time as you draw.
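
The count follows directly from the layer sizes, as this short check shows (the variable names are just for illustration):

```python
layer_sizes = [784, 25, 25, 10]

# One multiply-add per connection between consecutive layers.
per_layer = [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
multiply_adds = sum(per_layer)

# One bias per neuron in every layer after the input.
biases = sum(layer_sizes[1:])

print(per_layer)               # [19600, 625, 250]
print(multiply_adds)           # 20475
print(multiply_adds + biases)  # 20535 trainable parameters in total
```

Almost all of the work is in the first layer, where each of the 25 hidden neurons reads all 784 pixels.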

Edge Cases

  • Off-center or tiny drawings. Without the centering step these would be out-of-distribution and misclassified for a boring reason (the input doesn't look like training data). Centering removes that excuse — what's left is the network's genuine judgment.
  • Blank canvas. With no ink there's nothing to classify; the output shown is just the network's response to all-zeros, not a real prediction.
  • Confident mistakes. A network can be very sure and wrong. Softmax reports confidence, not correctness.
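
The last two cases can be checked with a few lines of arithmetic. This is a toy sketch with made-up weights, biases and scores (three inputs and two neurons instead of the real layer sizes), and dense and softmax are illustrative helpers rather than the demo's code.

```python
import math

def dense(weights, biases, activations):
    """Weighted sums for one layer: z_j = w_j . a + b_j."""
    return [sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)]

def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Blank canvas: every input is 0, so each weighted sum collapses to its bias.
weights = [[0.3, -0.7, 0.2], [-0.5, 0.1, 0.9]]   # made-up values
biases = [0.4, -0.2]
z_blank = dense(weights, biases, [0.0, 0.0, 0.0])  # equals biases

# Confident mistake: softmax only sees the gaps between scores. A large gap
# yields a probability near 1 whether or not the top score is the right digit.
scores = [8.0, 1.0, 0.5]                          # made-up raw outputs
probs = softmax(scores)
shifted = softmax([s + 100.0 for s in scores])    # same gaps, same probabilities
```

On a blank canvas the output is therefore decided entirely by the learned biases, and a probability near 1 says only that one score is far ahead of the rest.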

Common Mistakes

  • Expecting production accuracy. This is a deliberately tiny network with no convolutions, so it has no built-in sense of shape or translation beyond what its roughly 20,000 weights captured. Real digit recognizers are far larger.
  • Blaming the drawing tool for every miss. Sometimes the input really is ambiguous. A human glancing at your stroke might hesitate too — the split probabilities are often reasonable.
  • Reading the hidden neurons literally. A single hidden unit lighting up rarely means anything nameable. Only the output layer has a fixed meaning: one unit per digit.

A Note on Simplification

The network here is a two-hidden-layer MLP (25 + 25 units) trained on MNIST, and only a representative subset of its connections is drawn. Production digit recognizers add convolutional layers, normalization, regularization and far more neurons. What's shown — weighted sums, a nonlinearity, softmax, run for real on your own input — is exactly the mechanism those bigger networks scale up. Treat it as an explainer you can poke, not a substitute for a real model.