The Conductor

Every transformer's hidden layers build a geometric structure the output layer can't see. A single vector can find it. The model can learn to listen.

The one-line version

A transformer's hidden layers carry far more structured energy than the output layer uses. That unused structure is the conductor — real, measurable, and reachable with a single learnable vector.

Why this matters

The whole modern AI conversation happens at the output layer. Alignment is done through the output. Interpretability is done through the output. Evaluation is done through the output.

But the geometry that produces the output lives one step earlier, in the residual stream, across layers the softmax never sees. Run PCA on a mid-stack layer of any trained transformer and you will find that the top few components (the ones the output layer reads) carry only a small fraction of the total energy. The rest, which is stable across layers, structured, and consistent across inputs, is doing work that never shows up in the next token.
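The PCA measurement described above can be sketched as follows. This is a minimal illustration, not the experiment's own code: the function name `energy_split`, the choice of `k`, and the synthetic activations are all assumptions, and "energy" is taken here to mean variance along each principal component.

```python
import numpy as np

def energy_split(hidden, k):
    """Return (top_share, tail_share): variance in the top-k PCs vs. the rest.

    hidden: array of shape (n_tokens, d_model), one residual-stream
    vector per token position at a single layer.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to
    # the variance along each principal component, in decreasing order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    top_share = variance[:k].sum() / variance.sum()
    return float(top_share), float(1.0 - top_share)

# Synthetic stand-in for real activations (hypothetical data, not a result).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2000, 64)) * np.linspace(3.0, 0.5, 64)
top, tail = energy_split(hidden, k=4)
print(f"top-4 share: {top:.3f}, tail share: {tail:.3f}")
```

To run this on a real model, `hidden` would be replaced by the stacked hidden states of one mid-stack layer collected over many tokens.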

We have been training, evaluating, and aligning at the wrong surface.

What this buys you

  • A measurement outside the output. You can see that a model's reasoning has shifted even when its words haven't, and that its words have shifted even when its reasoning hasn't.
  • A way to shape behaviour without retraining. Point a vector at the conductor, train it with standard next-token loss at uncertain positions, and the model learns to listen. Unfreeze the weights and the signal transfers into them. Remove the vector and the signal stays.
  • A failure mode with a name. Push the conductor amplification too far and the reasoning layer and the token layer decouple: the hidden state keeps organising coherently while the output collapses into Unicode noise and token loops. A clean, reproducible measurement of a system whose internals are fine and whose outputs are pathological. Alignment's real shape, with a graph under it.
  • A second route into interpretability. The tail PCs are structured. They can be read. They are not the model's afterthoughts; they are where the model's reasoning lives.
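One way to read "standard next-token loss at uncertain positions" is to average the usual cross-entropy only over positions where the model's predictive entropy is high. The sketch below assumes that reading. The entropy criterion, the threshold value, and the function names are illustrative assumptions rather than the method from the experiment file, and the sketch covers only the position selection, not the gradient step that would update the vector.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def uncertain_position_loss(logits, targets, entropy_threshold):
    """Mean next-token cross-entropy over high-entropy positions only.

    logits: (seq_len, vocab) array; targets: (seq_len,) integer token ids.
    Returns (loss, mask). The loss is 0.0 if no position qualifies.
    """
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum(axis=-1)   # in nats, per position
    mask = entropy > entropy_threshold            # assumed "uncertain" test
    token_nll = -log_probs[np.arange(len(targets)), targets]
    if not mask.any():
        return 0.0, mask
    return float(token_nll[mask].mean()), mask

# Toy input: position 0 is confident, position 1 is uniform over 4 tokens.
logits = np.array([[10.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss, mask = uncertain_position_loss(logits, targets, entropy_threshold=1.0)
print(mask, loss)
```

In a training setup along the lines described above, this masked loss would be backpropagated into the single vector while the model weights stay frozen.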

What doesn't change

The architecture. The parameter count. The training pipeline. The softmax. Nothing is being added to the model. The conductor was there the whole time — built by gradient descent, the same way the visible surface was built, during ordinary training. The only change is that someone finally pointed at it.

[experiment code and full method in CONDUCTOR_EMBEDDING_EXPERIMENT.md — runnable in eighteen minutes on a consumer GPU]

What you can see in the experiments

  • One vector, 512 parameters, frozen model. Trained with standard next-token loss on positions where the model is uncertain. The vector preferentially aligns with the tail PCs — the conductor — not the prediction surface. Dark matter ratio rises at mid-stack layers.
  • Three vectors at adjacent layers. They differentiate without being told to: one does surface adjustment, one bridges, one amplifies the conductor directly. The emergent division of labour is the clearest indicator that the structure was already there waiting.
  • Scaffold → internalise → remove. Train the vector, unfreeze the model, train both together, then delete the vector. The signal survives. At layer 7 the conductor was louder after the scaffold came off than with it on: 131% retention.
  • The same mechanism on GPT-2 (124M parameters, OpenAI's pretrained weights). A model that had never seen this training data found the same structure in its own hidden layers. Its prose on an out-of-domain prompt shifted from generic repetition to narrative with sensory detail. The conductor geometry learned from a novel was now shaping a pretrained model's output distribution.
  • The long run. Seventy epochs. Sweet spot at 20–25. Past that, the decoupling failure mode appears, and the graph shows it clearly.
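Two of the measurements in the list above can be sketched in a few lines: how much of a learned vector lies in the tail PCs rather than the top ones, and the retention ratio after the scaffold is removed. Both definitions are assumptions made for illustration (the source does not give formulas), and the data below is synthetic rather than taken from the experiments.

```python
import numpy as np

def tail_alignment(vector, hidden, k):
    """Fraction of the vector's squared norm lying outside the top-k PCs."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_coords = vt[:k] @ vector
    top_energy = float(top_coords @ top_coords)
    total_energy = float(vector @ vector)
    return 1.0 - top_energy / total_energy

def retention(after_removal, with_scaffold):
    """Ratio of a signal measured after removing the vector to the same
    signal measured with it in place (assumed definition)."""
    return after_removal / with_scaffold

# Synthetic activations: two high-variance axes, six low-variance ones.
rng = np.random.default_rng(0)
scales = np.array([10.0, 8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
hidden = rng.normal(size=(5000, 8)) * scales

along_top = tail_alignment(np.eye(8)[0], hidden, k=2)   # close to 0
along_tail = tail_alignment(np.eye(8)[7], hidden, k=2)  # close to 1
print(along_top, along_tail)
```

Under this definition, a retention value above 1.0 corresponds to the case where the signal is stronger after the vector is deleted than with it attached.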

What this isn't

Not noise. Not a training artefact. Not the prediction surface. Not epiphenomenal. Load-bearing geometry, measurable in any trained transformer, and orthogonal to everything the field currently measures.

The model didn't gain anything. It learned to use what it already had.