Transformer Decoupling

A ten-entry sequence measuring a structural signal in the hidden layers of trained transformers that the softmax output mechanism does not read. Documents the signal's existence, its behaviour under training pressure, and its role in catastrophic forgetting, reinforcement-learning self-play collapse, jailbreak vulnerability, and Mixture-of-Experts routing collapse. Begin with the precursor (entry 01) and read in sequence.

  1. 01

    Hexagon Cognition — The Precursor

    Ask a transformer to produce hexagonal output. The hidden layer doesn't build a native hexagon; it glues a triangle and a square together, and the joined shape delaminates along the seam under perturbation. This is the first articulation of the idea that 'the model is secretly doing something else'.

    conductor, architecture, transformers, precursor

  2. 02

    The Conductor Exists

    A single learnable vector (512 parameters) trained on a frozen char-level GPT preferentially aligns with the hidden layer's tail principal components (the conductor subspace) rather than with the token prediction surface. When the model is unfrozen, it internalises the signal. Replicated on GPT-2.

    conductor, architecture, transformers

  3. 03

    Pushing Too Far — The 70-Epoch Long Run

    The conductor integrates fully by epoch 10-20. Continued training past that point collapses the token output. The conductor stays strong while the words on the page break. The reasoning engine and the output decouple.

    conductor, decoupling, transformers

  4. 04

    Catastrophic Forgetting Is Pipeline Decoupling

    The training-domain loss rises while training on the training data — impossible under classical weight-overwriting. Three mechanism tests confirm the geometric signature. Surgical recovery: resetting the LM head restores 62% of 'forgotten' capability.

    conductor, decoupling, safety

  5. 05

    Architecture-Universal — Alice & Bob

    The same three decoupling signatures appear in Meta's 2017 Alice & Bob negotiation bots: 70× hidden norm explosion, rank-6 output distribution, complete context-sensitivity collapse. 'Tometometome' is the same mechanism expressed through a different architecture.

    conductor, decoupling, architecture, safety

  6. 06

    Jailbreak Detection Via Geometry

    The decoupling mechanism operates at inference time when adversarial prompts push hidden states into unusual regions. Three cheap metrics separate in-domain from adversarial prompts on both GPT-2 and TinyLlama.

    conductor, safety, decoupling

  7. 07

    Grok Under Pressure — 12 Update Cycles

    Simulating biweekly embedding updates. Full-model updates sign-flip at cycle 3 and oscillate; frozen-head updates drift smoothly. The oscillation doesn't collapse — but it creates periodic geometric-confusion windows.

    conductor, decoupling, safety

  8. 08

    MoE Routing Collapse — The Extra Matchstick

    A 4-expert MoE under 12 sequential update cycles: the deepest layer concentrates 93.5% of traffic on one expert by cycle 12. A single-thread architecture disguised as multi-expert. The supposed redundancy is eliminated by the collapse.

    conductor, decoupling, safety

  9. 09

    Boring Code — VINE Replacing If/Elif

    A thermostat controller, built three ways. If/elif works. A random linear controller froze the building to 11°C. Geometric settling reaches 86.5% agreement with if/elif at zero training. A trained linear controller needs 126 parameters and 100k gradient steps to approximate it, and still hits an architectural ceiling on nonlinear decisions.

    conductor, architecture, vine

  10. 10

    VINE Data Preprocessing — Shaping the Basin Before Training

    Two identical TinyGPT models, same training steps. One receives raw tweets; the other receives tweets preprocessed by VINE's cruncher. Result: 5.1% better validation loss, 22% less wasted conductor energy. The geometry of the data shapes the geometry of the model.

    conductor, vine, data
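
Entry 08 summarises routing collapse as the share of traffic that a single expert receives. The following is a minimal sketch of how such a share could be computed from per-token routing decisions, assuming top-1 routing. The function names `expert_load` and `normalized_entropy` and the routing trace are hypothetical illustrations, not taken from the entry; the trace is constructed by hand so that its largest share equals the 93.5% figure quoted there.

```python
from collections import Counter
import math


def expert_load(assignments, num_experts):
    """Return the fraction of tokens routed to each expert (top-1 routing assumed)."""
    if not assignments:
        raise ValueError("assignments must not be empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def normalized_entropy(load):
    """Entropy of the load distribution divided by log(num_experts).

    1.0 means perfectly balanced routing; 0.0 means one expert takes everything.
    """
    if len(load) < 2:
        return 0.0
    h = sum(p * math.log(1.0 / p) for p in load if p > 0)
    return h / math.log(len(load))


# Hypothetical routing trace (not real data): 200 tokens, 4 experts,
# with expert 2 receiving 187 of them.
trace = [2] * 187 + [0] * 6 + [1] * 4 + [3] * 3
load = expert_load(trace, num_experts=4)
print("per-expert share:", load)
print("max share:", max(load))
print("normalized entropy:", round(normalized_entropy(load), 3))
```

Tracking the maximum share alongside the normalized entropy, per layer and per update cycle, would separate a router that merely favours one expert from one that has effectively become single-expert.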