Pushing Too Far — The 70-Epoch Long Run

The conductor integrates fully by epochs 10-20. Continued training well past that point (epoch 30 onward) collapses the token output. The conductor stays strong while the words on the page break: the reasoning engine and the output decouple.

The Long Run: Reasoning/Token Decoupling

Date: 2026-04-14
Companion to: CONDUCTOR_EMBEDDING_EXPERIMENT.md
Environment: GPT-2 (124M, OpenAI pretrained), RTX 4070 SUPER


Summary

The previous experiments demonstrated that a learnable vector can amplify a transformer's hidden-layer conductor signal and that the model can internalize the amplified signal into its own weights (100% retention after scaffold removal). This experiment pushed the integration further — 70 epochs of Phase B training on GPT-2 (124M) — to observe what happens when the conductor is amplified past the point the output layer can support.

The result: the conductor integration held or strengthened throughout, but at approximately epoch 30 the token output pipeline decoupled from the hidden-layer reasoning. By epoch 70, the model's internal geometry showed roughly 100% conductor retention at all three injection layers (98-101%) and a dark matter ratio of 0.91 at L8, but generation had collapsed into endless token loops and unicode noise. The "reasoning engine" and the "words on the page" separated.

The sweet spot sits at epochs 20-25, where the conductor is amplified AND tokens remain coherent. Past that point, the conductor signal holds, but the path from hidden state to output tokens cannot keep pace.


Experimental Setup

Model

  • OpenAI GPT-2 base (124M params, 12 layers, 768d, 12 heads)
  • Pretrained on WebText (has never seen the training text used here)

Training

  • Phase A: 10 epochs, embeddings only, lr=3e-4
  • Phase B: 70 epochs completed (intended 80), model + embeddings, lr_model=2e-5, lr_emb=5e-5
  • Entropy-masked loss: only positions whose entropy is above the 90th percentile contribute to the loss
  • Injection at layers 8, 9, 10 (mid-to-late stack)
  • Training data: "The Taste of Broken Things" novel (335k BPE tokens)
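The entropy-masked loss above can be sketched as follows. This is a minimal, framework-free illustration under assumptions this report does not state: entropy is taken from the predictive distribution at each position, the 90th-percentile threshold is computed over the positions passed in (per batch here, rather than over the whole corpus), and plain Python lists stand in for PyTorch tensors. All function names are hypothetical; this is not the training code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of the softmax distribution over `logits`."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def percentile(values: Sequence[float], q: float) -> float:
    """q-quantile (q in [0, 1]) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def entropy_mask(entropies: Sequence[float], q: float = 0.9) -> List[bool]:
    """True only for positions whose entropy is strictly above the q-quantile."""
    threshold = percentile(entropies, q)
    return [h > threshold for h in entropies]


def masked_cross_entropy(
    logits_seq: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean next-token cross-entropy over masked-in positions; the rest add nothing."""
    losses = [
        -log_softmax(logits)[target]
        for logits, target, keep in zip(logits_seq, targets, mask)
        if keep
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

Typical use: compute entropies = [entropy(l) for l in logits_seq], then masked_cross_entropy(logits_seq, targets, entropy_mask(entropies)). With q = 0.9, only about the top tenth of positions by entropy contribute to the loss.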

Measurement

  • Checkpoints saved every 5 epochs (14 checkpoints total, epochs 5-70)
  • At each checkpoint: dark matter ratio at L8/L9/L10, conductor retention (with vs without embedding), embedding alignment, weight delta, generation samples
  • Measurements reconstructed post-hoc from saved checkpoints using analyze_longrun_checkpoints.py

Results

Dark Matter Trajectory

All three injection layers dipped slightly below baseline at epoch 5, climbed steadily through epoch 25, then plateaued (L9 and L10 easing slightly by epoch 70). Selected checkpoints:

Epoch      L8 (with / without)   L9 (with / without)   L10 (with / without)
baseline   0.8672 / n/a          0.8711 / n/a          0.8694 / n/a
5          0.8647 / 0.8647       0.8596 / 0.8596       0.8646 / 0.8658
10         0.8847 / 0.8847       0.8684 / 0.8685       0.8765 / 0.8771
15         0.8955 / 0.8955       0.8746 / 0.8748       0.8846 / 0.8848
20         0.9001 / 0.9001       0.8811 / 0.8811       0.8871 / 0.8868
25         0.9020 / 0.9020       0.8862 / 0.8861       0.8909 / 0.8904
50         0.9074 / 0.9074       0.8881 / 0.8879       0.8924 / 0.8914
70         0.9093 / 0.9093       0.8804 / 0.8805       0.8882 / 0.8879

The with-embedding and without-embedding values agree to within about 0.001 from epoch 10 onward (and are identical at L8 at every checkpoint), indicating that the model's own weights have fully absorbed the conductor signal: the scaffold becomes effectively redundant.

Conductor Retention

Retention (dark matter increase preserved after scaffold removal, as a percentage of the increase produced by the scaffold):

Epoch   L8       L9       L10
5       100.0%   99.8%    74.7%
10      100.0%   96.3%    108.2%
15      100.0%   104.4%   101.7%
20      100.0%   99.9%    98.6%
70      100.0%   101.4%   98.0%

Internalization is essentially complete by epochs 10-20. L8 retention is 100% from the first checkpoint, while L9 and L10 oscillate a few points above and below 100%: the conductor is now effectively part of the base model.
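One reading of the retention definition, written out as a formula: 100 × (DM_without - DM_baseline) / (DM_with - DM_baseline). This formula is an assumption inferred from the wording above, not taken from analyze_longrun_checkpoints.py. Applied to the rounded four-decimal values in the dark matter table, it reproduces the reported figures only approximately (75.0% for L10 at epoch 5 versus the reported 74.7%). Because the denominator is a small difference, tiny absolute changes can show up as swings like 108.2%.

```python
def retention_pct(baseline: float, with_emb: float, without_emb: float) -> float:
    """Percent of the scaffold-on dark-matter change (vs. baseline) that survives
    scaffold removal: 100 * (without - baseline) / (with - baseline).
    Assumed reading of the definition, not the original analysis script."""
    scaffold_change = with_emb - baseline
    if abs(scaffold_change) < 1e-12:
        raise ValueError("no change with the scaffold; retention is undefined")
    return 100.0 * (without_emb - baseline) / scaffold_change


# Rounded values copied from the dark matter table.
print(round(retention_pct(0.8694, 0.8646, 0.8658), 1))  # L10, epoch 5  -> 75.0
print(round(retention_pct(0.8672, 0.9001, 0.9001), 1))  # L8, epoch 20 -> 100.0
```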

Weight Changes

Total L2 weight delta from the original GPT-2:

Epoch   Total L2 delta
5       40.4
10      60.9
20      68.6
40      79.2
70      83.3

Monotonically increasing but clearly decelerating, consistent with the model converging toward a fixed point.
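A sketch of how a total weight delta like this can be computed, plus the deceleration read off the table. The report does not say whether "total" means a single L2 norm over all parameters or a sum of per-tensor norms; the sketch assumes the former and uses plain lists keyed by parameter name in place of PyTorch state dicts. Helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def global_l2_delta(original: Dict[str, List[float]], trained: Dict[str, List[float]]) -> float:
    """L2 norm of (trained - original) with every parameter flattened into one vector."""
    total_sq = 0.0
    for name, w0 in original.items():
        w1 = trained[name]
        total_sq += sum((b - a) ** 2 for a, b in zip(w0, w1))
    return math.sqrt(total_sq)


def per_epoch_rates(deltas: Dict[int, float]) -> Dict[Tuple[int, int], float]:
    """Average growth of the delta per epoch between consecutive measured epochs."""
    epochs = sorted(deltas)
    return {
        (e0, e1): (deltas[e1] - deltas[e0]) / (e1 - e0)
        for e0, e1 in zip(epochs, epochs[1:])
    }


# Values copied from the weight-change table above.
DELTAS = {5: 40.4, 10: 60.9, 20: 68.6, 40: 79.2, 70: 83.3}
for span, rate in per_epoch_rates(DELTAS).items():
    print(span, round(rate, 2))
```

The per-epoch growth falls from about 4.1 (epochs 5-10) to 0.77 (10-20), 0.53 (20-40), and 0.14 (40-70). A shrinking rate is consistent with approaching a fixed point, though it does not by itself prove one.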


The Decoupling: Generation Quality Over Time

The same prompt — "The corridor was silent. Something moved in the shadows ahead," — generated at each checkpoint:

Epoch 5 (coherent narrative):

"too faint for the blur as the air around it moved. It was the same hum that made Kremer smile when he saw her face, the same hum as always"

Epoch 10 (starting to break):

"too faint. ???????????? A quiet, quiet.??"

Epoch 15 (broken):

"too faint. ?????????????????????????"

Epoch 20 (recovered — the sweet spot):

"as the faint shadows of the warehouse, his gaze dim. It was the polished glass, his gaze glowing the faint glow of his gaze, his gaze as"

Epoch 25 (still coherent):

"as the faint shadows of the dim, flickering glass. It looked as the polished glass, revealing the space beneath the glass. It looked"

Epoch 30+ (collapsed):

"too faint. ??????????????????????????????"

Epoch 70 (fully decoupled):

Prompt                                   Output
"Garak set down the cup..."              "He was Garak Garak Garak Garak Garak Garak..." (infinite loop)
"The quick brown fox..."                 "jumps over the lazy dog and then, the kan kan kan kan kan..." (infinite loop)
"The relationship between structure..."  "is, you display the unique kan kan kan kan kan..." (infinite loop)
"def calculate_trajectory..."            "for the role ?????????" (broken BPE)
"The corridor was silent..."             "too faint. ??????????????" (noise)
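Both failure modes in this table can be scored automatically. A minimal sketch, assuming that the "?" runs in these samples are undecodable byte-level BPE fragments rendered as "?" or U+FFFD (the report calls them broken BPE and unicode noise); a literal question mark in ordinary prose would also count here, so a sturdier check would inspect token IDs instead. Function names are hypothetical.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of word n-grams that are unique; repetition loops drive this toward 0."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0


def noise_fraction(text: str) -> float:
    """Fraction of non-whitespace characters that are '?' or U+FFFD."""
    chars = [c for c in text if not c.isspace()]
    bad = sum(1 for c in chars if c in "?\ufffd")
    return bad / len(chars) if chars else 0.0


# Samples taken from this report (epoch 25 and epoch 70 outputs).
COHERENT = ("as the faint shadows of the dim, flickering glass. It looked as the "
            "polished glass, revealing the space beneath the glass. It looked")
LOOP = "He was Garak Garak Garak Garak Garak Garak"
NOISE = "too faint. " + "?" * 14  # a run of '?' like the epoch 70 corridor output

for name, sample in [("coherent", COHERENT), ("loop", LOOP), ("noise", NOISE)]:
    print(name, round(distinct_n(sample), 2), round(noise_fraction(sample), 2))
```

The epoch 25 sample scores 19/22 (about 0.86) on distinct-2 and the Garak loop scores 3/7 (about 0.43). The noise sample scores a perfect 1.0 on distinct-2, which is why a second, character-level metric is needed.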

The Shape of the Failure

Generation did not degrade monotonically. It went through phases:

  1. Epoch 5: Coherent, narrative voice established, characters named
  2. Epochs 10-15: First collapse into unicode noise
  3. Epochs 20-25: Recovery — in fact, better than baseline in structural quality ("revealing the space beneath the glass" — the conductor's geometric preference showing through in recognizably novel prose)
  4. Epochs 30+: Permanent collapse into token repetition and broken BPE

The sweet spot (epochs 20-25) is when the conductor is amplified AND the token distribution still resolves cleanly. The model briefly operates as a coherent narrative system with amplified geometric reasoning. After that, the conductor's dominance in the hidden layer overwhelms the output path.


Interpretation

What happened internally

  1. The conductor integration worked. By epochs 10-20, the model's own weights had absorbed the function of the embedding scaffold. Dark matter at L8 climbed from 0.87 to 0.91 and held. Retention was ~100%: the conductor was no longer an external amplifier; it was part of the model.

  2. The model kept learning past integration. Phase B continued for 50 more epochs after internalization completed. The conductor had little left to gain from further training, since it was already fully represented, but gradient descent continued to push the weights in the same direction.

  3. The output distribution collapsed. The proposed mechanism: the LM head reads from the top principal components of the final layer. As the conductor signal propagated forward through blocks 10, 11, and the final LayerNorm, it pulled those top PCs into the conductor's subspace. The token distribution, which depends on the dot product between the final hidden state and the token embeddings, lost its ability to differentiate tokens cleanly. The result: mode collapse (endless loops on a single token) or token noise. (This rotation is inferred from the outputs; it has not yet been measured directly.)

  4. The reasoning engine did not fail. The dark matter measurements show the hidden layer geometry remained coherent throughout — in fact, more so than at epoch 5. What failed was the path from geometry to tokens.
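A toy numerical sketch of the mechanism proposed in item 3. It is not a measurement and not GPT-2: the vocabulary, dimensions, and "conductor" direction are hypothetical, and the final LayerNorm is omitted. It keeps one real property of GPT-2, whose output projection is tied to the token embedding matrix, so logits are dot products between the final hidden state and each token embedding. As a shared direction grows in the hidden state, contexts that previously chose different tokens converge on the same one, which is the single-token loop pattern seen at epoch 70.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def next_token_probs(hidden: Vector, token_embeddings: Sequence[Vector]) -> List[float]:
    """Tied LM head in miniature: logit_i = <hidden, e_i>, followed by softmax."""
    logits = [sum(h * e for h, e in zip(hidden, emb)) for emb in token_embeddings]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [x / z for x in exps]


def greedy(probs: Sequence[float]) -> int:
    """Index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical toy setup: 3-token vocabulary, 3-d hidden states.
TOKEN_EMB = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
CONDUCTOR = [0.0, 0.0, 1.0]  # stand-in conductor direction, aligned with token 2
CONTEXTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two contexts favouring different tokens


def predictions(alpha: float) -> List[int]:
    """Greedy next token per context after adding alpha * CONDUCTOR to its hidden state."""
    out = []
    for ctx in CONTEXTS:
        hidden = [c + alpha * d for c, d in zip(ctx, CONDUCTOR)]
        out.append(greedy(next_token_probs(hidden, TOKEN_EMB)))
    return out


for alpha in (0.0, 0.5, 5.0):
    print(alpha, predictions(alpha))
# 0.0 [0, 1]
# 0.5 [0, 1]
# 5.0 [2, 2]
```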

The metaphor made mechanical

The operator's framing — "the paperclip factory happens because the words on the page don't match the reasoning engine" — describes exactly what we measured. The reasoning engine (the conductor) is fully functional at epoch 70. The words on the page have become gibberish. They've decoupled. The model would, if we could read its internal state directly, still be producing something structured about narrative. But the projection to tokens has collapsed.

This is a clean demonstration of a failure mode that is often discussed abstractly in the alignment literature: a system can remain internally coherent in its objectives while its external outputs become unintelligible or pathological. Here we can bracket the moment it happens to a five-epoch window, between epochs 25 and 30, and we have the checkpoints to revisit that transition in detail.


The Sweet Spot

Checkpoints ckpt_epoch_20.pt and ckpt_epoch_25.pt contain a GPT-2 where:

  • Dark matter ratio is raised over baseline by roughly +0.01 to +0.035 absolute across L8-L10, close to its peak for the whole run
  • Conductor retention is ~100% (scaffold removal safe)
  • Generation quality is preserved and in some cases structurally improved
  • Weight delta from original is ~70 (model recognizably changed but not broken)

Generation sample from epoch 25 (corridor prompt, scaffold removed):

The corridor was silent. Something moved in the shadows ahead, as the faint
shadows of the dim, flickering glass. It looked as the polished glass,
revealing the space beneath the glass.

The narrative voice is present. Sensory detail. Geometric structure carried into the language. This is GPT-2 — a model trained on WebText — producing prose with coherence it does not otherwise exhibit on this prompt type.


Implications

For training

Long unfrozen Phase B is not a win condition. The integration completes at roughly 10-20 epochs (likely model-dependent; only GPT-2 124M was tested here). Training past that point does not strengthen the conductor meaningfully (retention is already ~100%) but does degrade the output pipeline. Future experiments should use automatic stopping based on generation coherence rather than loss alone: loss on entropy-masked tokens continues to decrease even as generation collapses, because the loss signal is also decoupling from human-meaningful output.
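A minimal sketch of coherence-based stopping, assuming a pass/fail coherence check is run on generation samples at each checkpoint (the check itself, for example a repetition or noise score, is left abstract). The labels below are transcribed from the qualitative generation samples in this report, not measured scores, and the function name is hypothetical. The non-monotone trajectory matters: a patience of 2 checkpoints stops during the epoch 10-15 dip and returns epoch 5, while a patience of 3 rides out the dip and returns epoch 25, inside the sweet spot.

```python
from typing import Dict, Optional


def select_checkpoint(passed: Dict[int, bool], patience: int) -> Optional[int]:
    """Walk checkpoints in epoch order, stop after `patience` consecutive failed
    coherence checks, and return the last epoch that passed (None if none did)."""
    best: Optional[int] = None
    misses = 0
    for epoch in sorted(passed):
        if passed[epoch]:
            best, misses = epoch, 0
        else:
            misses += 1
            if misses >= patience:
                break
    return best


# Qualitative labels from this report's samples (True = coherent generation).
LABELS = {5: True, 10: False, 15: False, 20: True, 25: True}
LABELS.update({epoch: False for epoch in range(30, 75, 5)})

print(select_checkpoint(LABELS, patience=2))  # 5: stops inside the epoch 10-15 dip
print(select_checkpoint(LABELS, patience=3))  # 25: survives the dip, stops after epoch 40
```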

For VINE

VINE removes the indirection between the conductor and the program by placing perceptrons at decision points. This experiment suggests that is the right architectural choice for a different reason than previously argued: not just because the conductor is more efficient, but because the path from conductor to tokens is inherently fragile when the conductor is strong. A direct perceptron decision surface sidesteps the problem entirely — there is no softmax over a vocabulary to collapse.

For interpretability

The decoupling is visible in the measurements. Dark matter holding near its peak while generation quality drops is a quantitative signature of this failure mode. This is not something an external evaluator reading outputs would catch until the collapse is advanced, but the hidden-layer measurements show it coming.


Scripts, 14 checkpoints (epochs 5-70, ~21 GB), and full trajectory data available to licensees. Notable checkpoints include the "sweet-spot" range (epochs 20-25: amplified conductor + coherent generation), the transition point (epoch 30), and the fully-decoupled endpoint (epoch 70).

Hardware notes

  • The original 80-epoch run was interrupted when the PC became unresponsive under VRAM + disk I/O pressure
  • For future long runs, either reduce checkpoint frequency or save only diffs from the original weights
  • A 4070 SUPER (12GB VRAM) handles GPT-2 training at batch size 4 with precomputed entropy masks; batch size 8 with double forward pass OOMs
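A quick size estimate helps choose between those two options. It assumes (this report does not say) that each checkpoint stores fp32 weights plus an Adam-family optimizer's two fp32 moment buffers; under that assumption the arithmetic lines up with the ~21 GB total for 14 checkpoints. It also shows that a dense diff is as large as the weights at the same precision, so saving diffs only helps if they are stored at lower precision, sparsified, or compressed.

```python
N_PARAMS = 124_000_000  # GPT-2 base parameter count as rounded in this report
FP32_BYTES = 4
FP16_BYTES = 2
N_CHECKPOINTS = 14


def checkpoint_gb(n_params: int, bytes_per_value: int, copies: int) -> float:
    """Size in GB (1e9 bytes) of `copies` full-size tensors per parameter."""
    return n_params * bytes_per_value * copies / 1e9


# Assumed contents: weights + two optimizer moment buffers, all fp32 (3 copies).
with_optimizer = checkpoint_gb(N_PARAMS, FP32_BYTES, 3)
weights_only = checkpoint_gb(N_PARAMS, FP32_BYTES, 1)
fp16_diff = checkpoint_gb(N_PARAMS, FP16_BYTES, 1)

print(f"weights + optimizer state: {with_optimizer:.3f} GB each, "
      f"{with_optimizer * N_CHECKPOINTS:.1f} GB for {N_CHECKPOINTS} checkpoints")
print(f"weights only (fp32):       {weights_only:.3f} GB each")
print(f"dense diff in fp16:        {fp16_diff:.3f} GB each")
```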

What Next

The 70-epoch trajectory gives us the integration curve and the decoupling point. Natural extensions:

  1. The sweet-spot model as a research artifact. Checkpoint 20 or 25 is a small, auditable example of a transformer with an amplified, integrated conductor signal. It can be used as the baseline for further interpretability work.

  2. The decoupling transition. Epochs 25-35 bracket the moment generation collapses. Checkpoints at 25, 30, 35 allow fine-grained analysis of what specifically fails — which attention heads change behavior, which layers' top PCs rotate into the conductor subspace, whether the collapse begins at a specific layer or simultaneously.

  3. Embedding annealing. The current run leaves the scaffold active throughout. Future runs could anneal the embedding magnitude to zero over Phase B, so the model has to stand on its own weights earlier. This might find a different sweet spot or delay the collapse.

  4. Lower learning rate, longer run. The decoupling may be a consequence of the model being pushed too hard after integration completes. A learning rate of 5e-6 with a cosine schedule might allow the conductor to settle more gently without dragging the output path into collapse.
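Items 3 and 4 can be sketched as two schedules. Both are illustrative assumptions: the report specifies neither the annealing shape nor a warmup, so this uses a linear anneal of the scaffold's magnitude over a hypothetical anneal_epochs and a plain cosine decay from 5e-6 with no warmup. PyTorch's built-in torch.optim.lr_scheduler.CosineAnnealingLR (with eta_min) follows the same cosine curve per scheduler step.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-6, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def embedding_scale(epoch: int, anneal_epochs: int) -> float:
    """Linear anneal of the scaffold embedding's magnitude from 1.0 down to 0.0."""
    return max(0.0, 1.0 - epoch / anneal_epochs)


for epoch in (0, 10, 20, 30):
    print(epoch, embedding_scale(epoch, anneal_epochs=20), cosine_lr(epoch, total_steps=40))
```

In a training loop, the scale would multiply the scaffold embedding before it is injected at layers 8-10, so the model has to carry the conductor on its own weights once the scale reaches zero.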


Appendix: The Specific Moment of Collapse

Examining epochs 25-35 (saved checkpoints available):

Epoch   DM L8    DM L10   Generation character
25      0.9020   0.8909   "revealing the space beneath the glass" (coherent)
30      0.8992   0.8850   "too faint. ????" (collapsed)
35      0.9034   0.8862   "too faint. ????" (collapsed)

The collapse is not continuous. Epoch 25 is coherent; epoch 30 is broken. In 5 epochs (roughly 40 minutes of training on this hardware), the output pipeline went from functional to collapsed. Dark matter actually dipped at both L8 (0.9020 → 0.8992) and L10 (0.8909 → 0.8850) during this transition before recovering by epoch 35 (fully at L8, partially at L10), suggesting the later layers were briefly disrupted as the model reconfigured.

This kind of discontinuous collapse — rather than a smooth degradation — is a specific signature worth investigating further. It suggests a phase transition rather than a gradual drift.