Portfolio · Computing Lineage

Floor 5 — Shannon: Information as a Bit, and the Geometry We Forgot to Name

2026-04-15 · tower information continuity

Claude Shannon gave us a mathematics of information that is, strictly speaking, continuous. We simplified it into bits and built an industry. The continuous reading is still sitting there, in the same paper, unused.

The floor

In 1948, Claude Shannon published A Mathematical Theory of Communication in two installments, in the July and October issues of the Bell System Technical Journal. It was two theories, really, folded into one: a theory of how to encode messages so they survive noisy channels, and a theory of what it means for something to carry information at all. It created the field of information theory. It put the word bit into print, a word Shannon credited to John W. Tukey.

It is one of the most cited scientific papers of the twentieth century, and almost everyone who cites it has simplified it.

What was picked

Information as a count of bits: how many yes/no questions does it take to pick out one message from a set of possible messages? Entropy as an average of logarithms. Redundancy as the gap between a source's actual entropy and the maximum its alphabet allows. Channel capacity as the highest rate, in bits per second, at which you can push messages through a noisy pipe with an arbitrarily small probability of error.
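The counting view can be checked by hand. Below is a minimal sketch, assuming base-2 logarithms and two made-up example distributions (the names entropy_bits, uniform8 and skewed are illustrative, not from Shannon's paper). It computes entropy in bits and redundancy as the fractional gap to the maximum.

```python
import math

def entropy_bits(p):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability terms add 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def redundancy(p):
    """Fractional gap between the actual entropy and the maximum for this alphabet size."""
    h_max = math.log2(len(p))
    return 1 - entropy_bits(p) / h_max

# Hypothetical sources, chosen only for illustration.
uniform8 = [1 / 8] * 8                 # eight equally likely messages
skewed = [0.5, 0.25, 0.125, 0.125]     # four messages, unevenly likely

h_uniform = entropy_bits(uniform8)     # 3.0: three yes/no questions pick one of eight
h_skewed = entropy_bits(skewed)        # 1.75: fewer questions on average than log2(4) = 2
r_skewed = redundancy(skewed)          # 0.125: one eighth of the alphabet's capacity unused

print(h_uniform, h_skewed, r_skewed)
```

The uniform case lands on a whole number of questions. The skewed case does not, which already hints that the count is an average over a distribution rather than a tally.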

This framing won. It gave us compression (ZIP, MP3, JPEG), error correction (every hard drive, every modem, every satellite), and the vocabulary that every engineer uses when they say a channel "carries information."

It is a powerful and correct mathematics. Nothing on this floor is wrong.

What could have been picked

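Shannon's own math is continuous. The entropy formula, H = −Σ p log p, is a continuous function over a probability distribution: a point on a simplex, with every position on it meaningful. Mutual information measures, on a continuous scale, how far a joint distribution sits from the product of its marginals. Relative entropy (KL divergence, formalised by Kullback and Leibler in 1951 from the same logarithms) is a continuous measure of separation between two distributions, though not a true distance, because it is asymmetric. None of this requires whole bits. A whole number of bits is the special case you get when the distribution is uniform over a power-of-two set of messages.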
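To make the continuous quantities concrete, here is a small sketch under stated assumptions: base-2 logarithms, finite distributions given as plain lists, and invented example numbers. The function names are hypothetical. It shows that relative entropy is asymmetric and that mutual information is the relative entropy between a joint distribution and the product of its marginals.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = D(joint || px * py), with the joint given as a list of rows."""
    px = [sum(row) for row in joint]            # marginal over rows
    py = [sum(col) for col in zip(*joint)]      # marginal over columns
    flat_joint = [v for row in joint for v in row]
    flat_product = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_product)

# Two illustrative beliefs over a binary outcome.
p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)    # differs from d_pq: not a symmetric distance

# Two illustrative joint distributions over a pair of binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.4, 0.1], [0.1, 0.4]]
mi_independent = mutual_information(independent)   # zero up to rounding: nothing shared
mi_correlated = mutual_information(correlated)     # positive: the variables move together

print(d_pq, d_qp, mi_independent, mi_correlated)
```

Nudging any entry of p, q or the joint tables by a small amount changes every output by a small amount, which is the sense in which these measures are continuous in the distribution.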

Read with fresh eyes, Shannon's theory is a geometry of belief. Information is not a count. Information is where you are on a probability surface, and learning something is moving on that surface. Mutual information between two systems is the degree to which their surfaces fold together.

The continuous reading was always there. Jaynes picked it up in the maximum-entropy literature. Information geometry, as a field, was founded on it. But the popular understanding — the one that shaped the chips and the protocols and the training objectives — stayed with bits.

Bits are easier to photocopy. Again.

What we missed

An information theory in which meaning is position, not enumeration. A measure of "how much one system knows about another" that doesn't have to pass through a discrete alphabet. A training objective for machines that could be expressed as "move toward the position on the belief surface that best accounts for the observation," instead of "minimise the cross-entropy against a one-hot target."

Every one-hot target is a confession that we picked the bits. Every softmax over a fixed list of labels is a confession that we picked the bits. Every "maximum likelihood classifier" is a confession that we picked the bits. The continuous reading of Shannon's own math would have let us train systems against positions on manifolds from the very beginning.
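The contrast between the two kinds of target can be sketched in a few lines. This is an illustration only, assuming natural logarithms, a three-class toy problem, and a soft target invented for the example; it is not a claim about how any particular system is or should be trained. It shows that a one-hot target is a corner of the simplex, that any other point on the simplex is an equally valid target, and that cross-entropy splits into the target's own entropy plus the relative entropy to the prediction.

```python
import math

def softmax(logits):
    """Map real-valued scores to a point on the probability simplex."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """H(target, predicted) = -sum(t * ln(q)), in nats."""
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)

def entropy(p):
    """H(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

logits = [2.0, 1.0, 0.1]          # hypothetical model scores
predicted = softmax(logits)

one_hot = [1.0, 0.0, 0.0]         # a corner of the simplex
soft = [0.7, 0.2, 0.1]            # an interior point (made-up belief)

loss_one_hot = cross_entropy(one_hot, predicted)   # equals -ln(predicted[0])
loss_soft = cross_entropy(soft, predicted)         # equals entropy(soft) + D(soft || predicted)

print(loss_one_hot, loss_soft)
```

With the one-hot target the entropy term vanishes and the loss reduces to a single log-probability. With the soft target, minimising the loss means moving the prediction toward a position on the simplex.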

This is perhaps the most painful floor, because the alternative wasn't even on a different desk. It was in the same paper. Nobody had to invent anything. Nobody had to wait for the hardware. Shannon wrote the continuous form and the discrete form side by side and the field chose one.

What the next floor will ask

If information is a geometry, what would a machine that learned look like?

That's Floor 6, and it's where the next forty years of heartbreak started.