Portfolio · Transformer Decoupling
Hexagon Cognition — The Precursor
Ask a transformer to produce hexagonal output. The hidden layer doesn't build a native hexagon: it glues a triangle and a square together, and the composite delaminates along the seam under perturbation. This was the first articulation of 'the model is secretly doing something else'.
This is the phase of work that the Conductor sequence grew out of. It is a summary only; the full 1,400-line experimental writeup and supporting scripts are withheld.
The Question
A trained neural network is asked to represent something with six vertices: a hexagon in representation space. The hexagon is chosen because it is the regular polygon with the fewest vertices that cannot be built by overlaying a single simpler regular polygon at a different scale. If the network builds a native hexagon, its hidden layer contains a six-vertex attractor. If it builds a composite, the hidden layer contains a combination of simpler shapes that looks like a hexagon from the outside but is structurally something else.
The Finding
Across seeds, kernel ablations, and perturbation regimes, the network did not build a native hexagon. It built a triangle + square composite — two simpler shapes, glued together at a seam, presenting a hexagonal output.
The composite is fragile. Under perturbation, it delaminates along the seam and collapses into its components:
- hexagon → pentagon
- pentagon → square
- square → triangle
- triangle → nothing
The descent is ordered. The network never skips a step. The failure mode of the gradient-descent solution is not noise; it is the composite's construction run in reverse, shape by shape.
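The ordering claim can be stated as a check on a sequence of observed vertex counts. The sketch below is illustrative only: the ladder list, the use of 0 to stand for "nothing", and the function name are assumptions made here, not part of the original experiments or scripts.

```python
# Hypothetical ladder taken from the list above; 0 stands for "nothing".
LADDER = [6, 5, 4, 3, 0]


def is_ordered_descent(observed):
    """Return True if the observed vertex counts only ever stay on the
    same rung of LADDER or move down exactly one rung at a time."""
    positions = []
    for count in observed:
        if count not in LADDER:
            # A shape outside the ladder is not part of the described descent.
            return False
        positions.append(LADDER.index(count))
    # Each consecutive pair must stay put (0) or step down one rung (1).
    return all(b - a in (0, 1) for a, b in zip(positions, positions[1:]))


# A run that lingers on some shapes but never skips or climbs.
print(is_ordered_descent([6, 6, 5, 4, 4, 3, 0]))
# A run that jumps straight from hexagon to square.
print(is_ordered_descent([6, 4, 3]))
```

A check of this kind separates "ordered collapse" from "noisy collapse": a noisy failure would be expected to produce skipped rungs or upward moves somewhere in the sequence.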
The Contrast
The same target, given to a geometric settling mechanism, produces a hexagon natively — six vertices, zero deviation from the target shape, recovery in the same number of steps from any perturbation. There is no composite to delaminate because there are no component shapes.
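One way to make "zero deviation from the target shape" concrete is a nearest-vertex distance between a target polygon and a candidate point set. This is a minimal sketch under stated assumptions: the 2-D unit-circle embedding, the metric, and the function names are hypothetical, and the withheld writeup may measure deviation differently. The triangle-plus-square overlay is only a toy input, not the network's learned composite.

```python
import math


def regular_polygon(n, radius=1.0, phase=0.0):
    """Vertices of a regular n-gon centred on the origin (assumed 2-D embedding)."""
    return [
        (radius * math.cos(phase + 2 * math.pi * k / n),
         radius * math.sin(phase + 2 * math.pi * k / n))
        for k in range(n)
    ]


def deviation(points, target):
    """Largest distance from any target vertex to its nearest candidate point.

    This is an assumed metric chosen for illustration: it is zero only when
    every target vertex is matched exactly by some candidate point.
    """
    return max(min(math.dist(t, p) for p in points) for t in target)


hexagon = regular_polygon(6)
overlay = regular_polygon(3) + regular_polygon(4)  # toy triangle + square overlay

print(deviation(hexagon, hexagon))  # a native hexagon matches itself exactly
print(deviation(overlay, hexagon))  # the overlay misses some hexagon vertices
```

Under this metric, any point set that lacks one of the six target vertices scores above zero, no matter how many other vertices it matches.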
The headline comparison: gradient descent required millions of parameters and many epochs to reach a composite that is accurate only from the outside. The settling mechanism produced the real shape with a handful of parameters and no training loop.
Why This Is the Precursor
The hexagon work documented, in one controlled setting, the motif that the Conductor sequence generalises:
- A neural network appears to be doing the target task.
- Its internal state is, in fact, doing a structurally simpler task whose output happens to look right.
- Under pressure (perturbation, novel input, additional training, distribution shift), the internal structure reveals itself and the output degrades in a specific, predictable way.
The Conductor sequence re-runs this motif at scale. The hidden layer of any trained transformer contains energy the softmax output cannot read. That energy is structured. Under continued training pressure the energy can be amplified, internalised — and, past a point, the token output will decouple from it in exactly the progressive way the hexagon's composite delaminates. Same failure mode, different shape.
The hexagon was the first experimental frame in which the structural cost of gradient descent's geometry was measurable and unambiguous.
What Is Not In This Summary
The original writeup covers eight experimental phases including seed-variance analysis, kernel-trick ablations, modulo and Chinese-Remainder-Theorem analysis of polygon cognition, shadow-vertex probes on GPT-2 and a small transformer, a novelty-collapse test, a sentinel prototype, and geometrically constrained training experiments. Each is reproducible; the scripts, checkpoints, and result artefacts are archived.
That material is available under licence. The headline finding summarised above is the load-bearing claim that the Conductor sequence builds on.
Raychell Langan · NEXICOG Ltd · Hampshire, UK