Transformer Decoupling

10 entries · read in order

A ten-entry sequence measuring a structural signal in the hidden layers of trained transformers that the softmax output mechanism does not read. Documents the signal's existence, its behaviour under training pressure, and its role in catastrophic forgetting, reinforcement-learning self-play collapse, jailbreak vulnerability, and Mixture-of-Experts routing collapse. Begin with the precursor (entry 01) and read in sequence.

01

Hexagon Cognition — The Precursor

Ask a transformer to produce hexagonal output. The hidden layer doesn't build a native hexagon: it glues a triangle and a square together, and the result delaminates along the seam under perturbation. The first articulation of 'the model is secretly doing something else'.

conductor architecture transformers precursor
02

The Conductor Exists

A single learnable vector (512 params) trained on a frozen char-level GPT preferentially aligns with the hidden layer's tail PCs — the conductor subspace — not the token prediction surface. When the model is unfrozen, it internalises the signal. Replicated on GPT-2.
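
One way to check a claim like "aligns with the tail PCs" is to measure how much of a vector's squared norm falls in the lowest-variance principal directions of the hidden states. The sketch below is an assumption about how such a measurement could look, not the entry's actual procedure: the function name `tail_pc_alignment`, the parameter `k`, and the choice of squared-norm fraction as the score are all illustrative, and it assumes at least as many samples as hidden dimensions.

```python
import numpy as np

def tail_pc_alignment(hidden, vec, k):
    """Fraction of vec's squared norm lying in the k lowest-variance PCs.

    hidden: (n_samples, d) array of hidden states, with n_samples >= d (assumed).
    vec:    (d,) vector to test, e.g. a trained probe vector.
    k:      number of tail principal components (hypothetical parameter).
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    tail = vt[-k:]                 # (k, d): the k lowest-variance directions
    coords = tail @ vec            # coordinates of vec inside the tail subspace
    return float(coords @ coords / (vec @ vec))
```

A score near 1 means the vector lives almost entirely in the tail subspace; a score near 0 means it lies in the high-variance directions instead.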

conductor architecture transformers
03

Pushing Too Far — The 70-Epoch Long Run

The conductor integrates fully between epochs 10 and 20. Continued training past that point collapses the token output: the conductor stays strong while the words on the page break. The reasoning engine and the output decouple.

conductor decoupling transformers
04

Catastrophic Forgetting Is Pipeline Decoupling

The training-domain loss rises while training on the training data — impossible under classical weight-overwriting. Three mechanism tests confirm the geometric signature. Surgical recovery: resetting the LM head restores 62% of 'forgotten' capability.

conductor decoupling safety
05

Architecture-Universal — Alice & Bob

The same three decoupling signatures appear in Meta's 2017 Alice & Bob negotiation bots: 70× hidden norm explosion, rank-6 output distribution, complete context-sensitivity collapse. 'Tometometome' is the same mechanism expressed through a different architecture.
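
The three signatures named above could be quantified along the following lines. This is a sketch under assumed definitions, since the blurb does not say how each quantity was computed: `norm_ratio`, `output_rank`, and `context_sensitivity` are hypothetical helper names, and using total-variation distance for context sensitivity is an illustrative choice.

```python
import numpy as np

def norm_ratio(hidden, baseline_hidden):
    """Mean hidden-state L2 norm relative to a baseline checkpoint."""
    current = np.linalg.norm(hidden, axis=1).mean()
    baseline = np.linalg.norm(baseline_hidden, axis=1).mean()
    return float(current / baseline)

def output_rank(probs, tol=1e-6):
    """Numerical rank of a (contexts, vocab) matrix of output distributions."""
    return int(np.linalg.matrix_rank(probs, tol=tol))

def context_sensitivity(probs):
    """Mean total-variation distance of each context's output from the average output.

    A value of 0 means every context yields the same distribution.
    """
    mean_dist = probs.mean(axis=0, keepdims=True)
    return float(0.5 * np.abs(probs - mean_dist).sum(axis=1).mean())
```

Under these definitions, a decoupled model would show a large `norm_ratio`, a small `output_rank`, and a `context_sensitivity` close to zero.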

conductor decoupling architecture safety
06

Jailbreak Detection Via Geometry

The decoupling mechanism operates at inference time when adversarial prompts push hidden states into unusual regions. Three cheap metrics separate in-domain from adversarial prompts on both GPT-2 and TinyLlama.

conductor safety decoupling
07

Sequential Embedding Updates — 12 Cycle Simulation

Simulating biweekly embedding updates. Full-model updates sign-flip at cycle 3 and oscillate; frozen-head updates drift smoothly. The oscillation doesn't collapse — but it creates periodic geometric-confusion windows.

conductor decoupling safety
08

MoE Routing Collapse — The Extra Matchstick

A 4-expert MoE under 12 sequential update cycles: the deepest layer concentrates 93.5% of traffic on one expert by cycle 12. A single-thread architecture disguised as multi-expert. The supposed redundancy is eliminated by the collapse.
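
A traffic figure like the one above can be read off the router's top-1 assignments. The following is a minimal sketch, assuming each token is routed to exactly one expert and that assignments are available as a list of expert indices; `expert_traffic_share`, `is_collapsed`, and the 0.9 threshold are illustrative names and values, not taken from the entry.

```python
from collections import Counter

def expert_traffic_share(top1_assignments, num_experts):
    """Fraction of tokens routed to each expert under top-1 routing."""
    counts = Counter(top1_assignments)
    total = len(top1_assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]

def is_collapsed(shares, threshold=0.9):
    """Flag a layer whose busiest expert takes at least `threshold` of the traffic.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    return max(shares) >= threshold

# Usage: 935 of 1000 tokens land on expert 2 of a 4-expert layer.
shares = expert_traffic_share([2] * 935 + [0] * 65, num_experts=4)
collapsed = is_collapsed(shares)
```

Tracking the per-layer shares after each update cycle would show whether concentration builds gradually or appears abruptly.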

conductor decoupling safety
09

Boring Code — VINE Replacing If/Elif

A thermostat controller, three ways. If/elif works. A random linear model froze the building to 11°C. Geometric settling: 86.5% agreement with if/elif at zero training. A trained linear model needs 126 parameters and 100k gradient steps to approximate it, and still hits an architectural ceiling on nonlinear decisions.
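
For readers who want the baseline in concrete form, here is a sketch of an if/elif thermostat and of an agreement score between two controllers. The setpoint, dead band, and action names are assumptions made for illustration; the entry's actual rules are not reproduced here, and `agreement` is a hypothetical helper.

```python
def rule_controller(temp_c, setpoint_c=21.0, band_c=1.0):
    """If/elif baseline: heat below the dead band, cool above it, else idle.

    setpoint_c and band_c are illustrative values, not the entry's.
    """
    if temp_c < setpoint_c - band_c:
        return "heat"
    elif temp_c > setpoint_c + band_c:
        return "cool"
    else:
        return "idle"

def agreement(controller_a, controller_b, temps):
    """Fraction of temperature readings on which two controllers pick the same action."""
    matches = sum(controller_a(t) == controller_b(t) for t in temps)
    return matches / len(temps)

# Usage: compare the baseline against a degenerate controller that always heats.
always_heat = lambda temp_c: "heat"
score = agreement(rule_controller, always_heat, [18.0, 21.0, 25.0, 19.0])
```

Any alternative controller, learned or not, can be passed as `controller_b` to obtain an agreement figure comparable in kind to the one quoted above.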

conductor architecture vine
10

VINE Data Preprocessing — Shaping the Basin Before Training

Two identical TinyGPT models, same training steps. One receives raw tweets; the other receives tweets preprocessed by VINE's cruncher. Result: 5.1% better validation loss, 22% less wasted conductor energy. The geometry of the data shapes the geometry of the model.

conductor vine data