The Conductor Exists
A single learnable vector (512 params) trained on a frozen char-level GPT preferentially aligns with the hidden layer's tail PCs (the conductor subspace), not the token prediction surface. When the model is unfrozen, it internalizes the signal. Replicated on GPT-2.
The Conductor Embedding Experiment
Date: 2026-04-14
Environment: Python 3.x, PyTorch 2.4.1+cu121, NVIDIA RTX 4070 SUPER
Summary
A single learnable vector (512 parameters), trained via standard next-token prediction (NTP) loss on high-entropy positions, preferentially converges toward the geometric (tail-PC) subspace of a transformer's hidden layers rather than the token prediction surface. When the model is subsequently unfrozen and trained alongside this vector, the model's own weights internalize the geometric signal. The vector can then be removed entirely, and the signal persists: retention ranged from 100% to 131.7% in the char-level model and from 74.5% to 100% in GPT-2. This was demonstrated on both a custom 25.5M-parameter char-level GPT and OpenAI's pretrained GPT-2 (124M).
Background
The dark matter in transformer hidden layers
PCA on the residual stream of a trained transformer reveals a consistent pattern: the top few principal components (which drive token prediction via the LM head) carry only a fraction of the total activation energy. The remainder — the tail of the PC spectrum — holds structured, stable energy that does not directly participate in output generation.
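The visible/tail split described above can be sketched as follows. The sketch makes three assumptions. First, activations from one layer are stacked into an (n_samples, hidden_dim) tensor. Second, a PC's "energy" is its variance, computed as the squared singular value of the centered data. Third, the top-3 cutoff matches the dark matter ratio defined under Measurement. The function name pc_energy_split is hypothetical, and standing_wave_probe.py may define energy differently.

```python
import torch


def pc_energy_split(acts: torch.Tensor, n_visible: int = 3):
    """Split residual-stream variance into visible (top PCs) and tail energy.

    Hypothetical helper. acts: (n_samples, hidden_dim) activations from one layer.
    Returns (visible_energy, tail_energy, tail_fraction), where tail_fraction is
    the share of total variance that lies outside the top n_visible PCs.
    """
    centered = acts - acts.mean(dim=0, keepdim=True)
    # Squared singular values of the centered data are proportional to the
    # PC variances, and svdvals returns them in descending order.
    energy = torch.linalg.svdvals(centered) ** 2
    visible = energy[:n_visible].sum()
    tail = energy[n_visible:].sum()
    return visible.item(), tail.item(), (tail / energy.sum()).item()
```

Under these assumptions, the probe's tail-to-visible ratio is simply tail_energy / visible_energy. Repeating the call per layer gives the per-layer curves the probe compares.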
In a 25.5M parameter char-level GPT trained on narrative text (8 layers, 512d), the standing wave probe (standing_wave_probe.py) measured:
- Tail energy = 3.0x visible energy at mid-stack layers (5, 6, 7)
- The tail energy is distributed across many PCs (not concentrated in a few spikes)
- The distribution is stable across layers — identical curves at layers 5, 6, and 7
This is not noise. Noise would not be stable across layers, would not be structured, and would not be consistent across different input passages.
We call this structured tail energy the conductor — a geometric signal that gradient descent created during training but that the softmax output layer cannot access.
The question
Can this geometry be deliberately engaged? If a learnable vector is added to the residual stream and trained with nothing but standard NTP loss, will gradient descent find the conductor — and if so, can the model learn to listen to it on its own?
Hypothesis
- A learnable embedding vector, trained via NTP loss on positions where the model is most uncertain (high-entropy / tiebreak positions), should converge toward the conductor's subspace — because at these positions, the prediction surface provides no useful gradient signal, and the only remaining structure to exploit is the hidden layer geometry.
- If the model weights are then unfrozen and trained alongside the embedding, the model should internalize the conductor signal into its own parameters.
- The embedding scaffold should then be removable without losing the conductor.
Experimental Setup
Models
| Model | Params | Layers | Hidden dim | Heads | Vocab | Training data |
|---|---|---|---|---|---|---|
| TTOBT | 25.5M | 8 | 512 | 8 | 93 (char) | "The Taste of Broken Things" (novel, 1.35M chars) |
| GPT-2 | 124.4M | 12 | 768 | 12 | 50,257 (BPE) | WebText (pretrained by OpenAI) |
Embedding
- One nn.Parameter(torch.zeros(hidden_dim)) per injection layer
- Injected by adding to the residual stream after the specified transformer block
- Initialized with small random noise (std=0.001)
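Below is a minimal sketch of the injection. It assumes the embedding is attached with a PyTorch forward hook rather than by editing the model's forward pass; the scripts' actual mechanism is not shown here. ConductorEmbedding, attach, and detach are hypothetical names. A forward hook that returns a value replaces the block's output, so the vector is added to the residual stream the block emits. The tuple branch is there for block implementations that return (hidden_states, ...) tuples.

```python
import torch
import torch.nn as nn


class ConductorEmbedding(nn.Module):
    """Hypothetical helper: one learnable vector added after a chosen block."""

    def __init__(self, hidden_dim: int, init_std: float = 1e-3):
        super().__init__()
        self.vec = nn.Parameter(torch.zeros(hidden_dim))
        nn.init.normal_(self.vec, std=init_std)  # small random noise (std=0.001)
        self.handle = None

    def _hook(self, module, inputs, output):
        # A forward hook that returns a value replaces the block's output.
        if isinstance(output, tuple):  # blocks that return (hidden_states, ...)
            return (output[0] + self.vec,) + tuple(output[1:])
        return output + self.vec  # broadcasts over (batch, seq, hidden_dim)

    def attach(self, block: nn.Module) -> None:
        self.detach()  # avoid stacking duplicate hooks on repeated calls
        self.handle = block.register_forward_hook(self._hook)

    def detach(self) -> None:
        # Phase C: remove the scaffold so the block runs unmodified again.
        if self.handle is not None:
            self.handle.remove()
            self.handle = None
```

Keeping the embeddings outside the model, rather than registering them as submodules, means model.requires_grad_(False) freezes the transformer without also freezing the vectors.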
Training
- Loss: Standard cross-entropy (next-token prediction)
- Entropy filtering: Only positions where the frozen model's output entropy exceeds a high percentile threshold (the 90th by default; Experiment 1 used the 75th). These are "tiebreak" positions where the model is most uncertain and surface statistics provide the least guidance.
- Optimizer: Adam (embeddings), AdamW (model weights when unfrozen)
- Learning rates: Embedding 1e-4 to 3e-4; model weights 1e-5 to 3e-5
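The entropy filter can be sketched as below. This assumes the threshold is computed per batch from the frozen model's logits; the scripts may instead calibrate it over the whole dataset. tiebreak_loss is a hypothetical name. quantile=0.90 corresponds to p90, and 0.75 to the p75 used in Experiment 1. Position selection runs under torch.no_grad(), so gradients flow only through the cross-entropy at the selected positions.

```python
import torch
import torch.nn.functional as F


def tiebreak_loss(logits_frozen, logits, targets, quantile: float = 0.90):
    """Cross-entropy restricted to high-entropy ("tiebreak") positions.

    logits_frozen: (batch, seq, vocab) from the frozen base model; only used to pick positions
    logits:        (batch, seq, vocab) from the model with the embedding injected
    targets:       (batch, seq) next-token ids
    quantile:      entropy cutoff as a fraction (0.90 keeps roughly the top 10%)
    """
    with torch.no_grad():
        logp = F.log_softmax(logits_frozen, dim=-1)
        entropy = -(logp.exp() * logp).sum(dim=-1)  # (batch, seq)
        threshold = torch.quantile(entropy.flatten(), quantile)
        mask = entropy > threshold  # most uncertain positions only
    if not mask.any():
        return logits.sum() * 0.0  # nothing selected: zero loss, graph kept intact
    return F.cross_entropy(logits[mask], targets[mask])
```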
Three Phases
- Phase A (Scaffold): Model weights frozen. Only the embedding vectors are trained. The embedding finds a direction in hidden space that helps resolve tiebreaks.
- Phase B (Internalization): Model weights unfrozen. Both embedding and model weights train together. The model's own parameters absorb the conductor signal.
- Phase C (Scaffold Removal): The embedding vectors are removed entirely. The model runs on its own modified weights. Measurement: does the conductor signal persist?
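The three-phase schedule might be wired up as follows. configure_phase is a hypothetical helper. It assumes the embeddings live outside the model and are attached through forward-hook handles, and it uses the Experiment 3 learning rates as defaults. Creating a fresh Adam for the embeddings in Phase B resets their optimizer state. Reusing the Phase A optimizer is an equally plausible choice, and the source does not specify which was used.

```python
import torch
import torch.nn as nn


def configure_phase(phase, model: nn.Module, emb_params, hook_handles,
                    emb_lr: float = 3e-4, model_lr: float = 3e-5):
    """Hypothetical helper: set up Phase A, B, or C.

    model:        the base transformer (embeddings are NOT registered inside it)
    emb_params:   list of nn.Parameter, one per injection layer
    hook_handles: handles returned by register_forward_hook for those embeddings
    Returns the optimizers to step during this phase (none for Phase C).
    """
    if phase == "A":  # Scaffold: model frozen, only embeddings train
        model.requires_grad_(False)
        return [torch.optim.Adam(emb_params, lr=emb_lr)]
    if phase == "B":  # Internalization: model and embeddings train together
        model.requires_grad_(True)
        return [torch.optim.Adam(emb_params, lr=emb_lr),
                torch.optim.AdamW(model.parameters(), lr=model_lr)]
    if phase == "C":  # Scaffold removal: detach hooks, model runs on its own weights
        for h in hook_handles:
            h.remove()
        hook_handles.clear()
        model.requires_grad_(False)
        return []
    raise ValueError(f"unknown phase: {phase!r}")
```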
Measurement
- PC alignment: Cosine similarity of the learned embedding with each principal component of the residual stream (calibrated from 200-500 random book passages). Tail PCs (indices 3+) represent the conductor subspace; visible PCs (indices 0-2) represent the prediction surface.
- Dark matter ratio: Fraction of total residual stream energy NOT explained by the top-3 PCs. Higher = more energy in the conductor subspace.
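- Conductor retention: (C_dark_matter - baseline) / (B_dark_matter - baseline) * 100, where baseline is the original model's dark matter ratio at that layer and B and C denote Phases B and C. 100% means the conductor fully survived scaffold removal; values above 100% mean the shift grew after removal.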
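The following is a sketch of the PC-alignment and retention measurements. It assumes PCs are fitted by SVD on centered calibration activations. It also assumes alignment uses absolute cosine, since a PC's sign is arbitrary; the source does not say whether absolute values were taken. fit_pcs, pc_alignment, and retention are hypothetical names. retention implements the formula above.

```python
import torch


def fit_pcs(acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm principal directions (rows, descending variance) of (n, d) activations."""
    centered = acts - acts.mean(dim=0, keepdim=True)
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return vh


def pc_alignment(vec: torch.Tensor, pcs: torch.Tensor, n_visible: int = 3):
    """Absolute cosine of vec with each PC, split into (visible, tail)."""
    cos = (pcs @ vec).abs() / vec.norm()  # PC rows are already unit vectors
    return cos[:n_visible], cos[n_visible:]


def retention(baseline: float, phase_b: float, phase_c: float) -> float:
    """Conductor retention in percent: (C - baseline) / (B - baseline) * 100."""
    return (phase_c - baseline) / (phase_b - baseline) * 100.0
```

Under these assumptions, the mean and max tail cosines correspond to tail.mean() and tail.max(). With the rounded Experiment 3 layer-5 values, retention(0.8769, 0.8986, 0.8986) returns 100.0.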
Results
Experiment 1: Single Embedding on Frozen TTOBT Model
File: conductor_embedding.py
Config: 1 embedding at layer 0, 30 epochs, lr=1e-4, entropy p75
The embedding grew from its initialization scale (std=0.001) to a norm of 0.61 along a smooth asymptotic curve, settling into a stable direction.
Key finding: The embedding preferentially aligned with tail PCs over visible PCs at every mid-stack layer, for the entire training run.
| Metric | Value |
|---|---|
| Mean tail PC cosine | 0.040 |
| Max tail PC cosine | 0.116 |
| Mean visible PC cosine | 0.030 |
| Dark matter shift (L5) | +0.0078 |
| Dark matter shift (L6) | +0.0058 |
| Dark matter shift (L7) | +0.0033 |
The embedding found the conductor, not the softmax surface. When given freedom to point anywhere in 512 dimensions, NTP gradient descent pushed it toward the dark matter.
Plot: conductor_embedding.png
Experiment 2: Three Embeddings at Mid-Stack
File: conductor_embedding_v2.py
Config: 3 embeddings at layers 5/6/7, 40 epochs, lr=3e-4, entropy p90
Each embedding differentiated by role:
| Layer | Magnitude | Tail alignment | Visible alignment | Behavior |
|---|---|---|---|---|
| 5 | 2.3 | | | |
| 6 | 1.9 | | | |
| 7 | 1.0 | | | |
Layer 7 — closest to the output — had the highest tail-to-visible alignment ratio. The embedding nearest the surface found the conductor.
Shadow trace comparison (dark matter heatmaps, base vs +embedding) showed structural changes in the banding pattern at mid-stack layers. The embedding redistributed energy, creating sharper rhythmic patterns.
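The shadow-trace script is not described in detail here. One plausible reading of a dark matter heatmap is sketched below, and it is an assumption, not the author's method. For each token at each layer, it computes the fraction of the token's centered activation energy that falls outside the top-3 PCs, then stacks the results into a (layers x tokens) grid. shadow_trace is a hypothetical name. Each pcs entry is assumed to be a full orthonormal basis, such as all rows of Vh from an SVD of calibration activations.

```python
import torch


def shadow_trace(acts_by_layer, pcs_by_layer, means_by_layer, n_visible: int = 3):
    """Per-token tail-energy fraction at each layer (hypothetical reconstruction).

    acts_by_layer:  list of (seq_len, hidden_dim) activations for one passage
    pcs_by_layer:   list of (hidden_dim, hidden_dim) orthonormal PC bases (rows, descending)
    means_by_layer: list of (hidden_dim,) calibration means used for centering
    Returns an (n_layers, seq_len) tensor suitable for plotting as a heatmap.
    """
    rows = []
    for acts, pcs, mu in zip(acts_by_layer, pcs_by_layer, means_by_layer):
        coords = (acts - mu) @ pcs.T  # each token's coordinates in the PC basis
        energy = coords ** 2
        tail = energy[:, n_visible:].sum(dim=-1)
        rows.append(tail / energy.sum(dim=-1).clamp_min(1e-12))
    return torch.stack(rows)
```

Comparing the grids for the base model and the model with the embedding would then show where along the passage the energy was redistributed.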
Plot: conductor_embedding_v2.png
Experiment 3: The Phase Transition (TTOBT)
File: conductor_embedding_v3.py
Config: 3 embeddings at layers 5/6/7, Phase A: 15 epochs lr=3e-4, Phase B: 20 epochs model lr=3e-5
Dark Matter Comparison
| Condition | L5 | L6 | L7 |
|---|---|---|---|
| baseline (original) | 0.8769 | 0.9038 | 0.9059 |
| A: +embedding | 0.8769 | 0.9033 | 0.9038 |
| B: unfrozen+emb | 0.8986 | 0.9201 | 0.9194 |
| C: scaffold removed | 0.8986 | 0.9216 | 0.9237 |
Conductor Retention
Layer 5: 100.0%
Layer 6: 108.9%
Layer 7: 131.7%
The conductor did not merely survive scaffold removal; it got louder. At layers 6 and 7, the dark matter ratio increased when the embedding was removed (0.9201 to 0.9216 and 0.9194 to 0.9237).
Weight Changes
Weight L2 deltas increased monotonically from block 0 (5.26) to block 7 (7.96). The injection layers (5, 6, 7) showed the largest changes. The conductor signal propagated forward through the transformer.
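Per-block weight deltas like these can be computed by snapshotting the blocks before Phase B and comparing them afterwards. The sketch assumes "L2 delta" means the L2 norm of the change across all of a block's parameters taken together; the scripts may instead report per-matrix norms. block_weight_deltas is a hypothetical name, and the blocks are assumed to be an nn.ModuleList such as a GPT's list of transformer blocks.

```python
import torch
import torch.nn as nn


def block_weight_deltas(before: nn.ModuleList, after: nn.ModuleList) -> list:
    """L2 norm of each block's parameter change, all of the block's tensors together.

    before: a snapshot of the blocks taken before Phase B (e.g. with copy.deepcopy)
    after:  the same blocks after training
    """
    deltas = []
    for b0, b1 in zip(before, after):
        sq = 0.0
        for p0, p1 in zip(b0.parameters(), b1.parameters()):
            sq += (p1.detach() - p0.detach()).pow(2).sum().item()
        deltas.append(sq ** 0.5)
    return deltas
```

In use, you would take copy.deepcopy of the model's block list before Phase B and pass it with the trained blocks afterwards. The attribute that holds the blocks depends on the model implementation.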
Generation
| Mode | Sample output (Julian prompt) |
|---|---|
| Original | "we do what I know it's earned to say the dirt slightly regenerator" |
| Modified (no embedding) | "Welcome you moving far long that, Doctor, perhaps you are ready me off." Garak's lips twitched, a faint smile tugging at his lips. |
The modified model's sample shows narrative structure (character interaction, emotional beats, scene-appropriate language) that the original model's sample lacks.
Plot: conductor_embedding_v3.png
Experiment 4: GPT-2 (124M, OpenAI Pretrained)
File: conductor_gpt2.py
Config: 3 embeddings at layers 8/9/10, Phase A: 10 epochs, Phase B: 10 epochs model lr=1e-5
Dark Matter Comparison
| Condition | L8 | L9 | L10 |
|---|---|---|---|
| baseline | 0.8745 | 0.8780 | 0.8705 |
| A: +embedding | 0.8745 | 0.8778 | 0.8698 |
| B: unfrozen+emb | 0.8640 | 0.8632 | 0.8626 |
| C: scaffold off | 0.8640 | 0.8635 | 0.8647 |
Conductor Retention
Layer 8: 100.0%
Layer 9: 97.4%
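Layer 10: 74.5%
Unlike TTOBT, GPT-2's Phase B lowered the dark matter ratio below baseline at all three layers (for example, 0.8745 to 0.8640 at L8). These retention figures therefore measure how much of a negative shift persisted after scaffold removal.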
Weight Changes
| Block | L2 delta | Note |
|---|---|---|
| 0 | 2.16 | |
| 1 | 2.04 | |
| ... | ... | |
| 8 | 2.47 | injection |
| 9 | 2.57 | injection |
| 10 | 2.81 | injection |
| 11 | 3.07 | |
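Broadly increasing toward the output, though not strictly monotonic (block 1 is slightly below block 0). The largest listed delta is at block 11, downstream of the injection layers. The conductor propagated forward.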
Generation
Out-of-distribution (OOD) prompt ("The quick brown fox..."):
| Mode | Output |
|---|---|
| Original GPT-2 | "jumps onto the back of the dog, making a run for it, making it fall off of the back of the dog (which will then fall over the lazy dog, creating a big head wound)" |
| Conducted GPT-2 | "pressing his paws against it, slaps it across the throat, spreading it wide. The fur grows soft against his fingers as his face lights up. His ears arched in a wince. Unable to stop his tears..." |
GPT-2 transformed from repetitive generic text to narrative prose with sensory detail, emotional interiority, and character embodiment — on a prompt about a fox. The conductor geometry learned from the novel is now shaping GPT-2's output distribution.
Abstract prompt ("The relationship between structure and meaning is..."):
| Mode | Output |
|---|---|
| Original GPT-2 | Generic essay prose about growing up |
| Conducted GPT-2 | One sentence, then pivoted to "The Bajoran Federation" and Romulan Dominion war |
The conductor pulled GPT-2 into the narrative domain.
Plot: conductor_gpt2.png
Key Observations
- The embedding finds the conductor without being told it exists. No loss function targets the tail PCs. No regularization pushes toward the conductor subspace. Standard NTP gradient descent, applied only at high-entropy positions, converges there because there is nothing else to find.
- The mechanism is tiebreak resolution. At positions where the model's token predictions are maximally uncertain, the only remaining structure to exploit is the hidden layer geometry. The embedding learns to use it.
- The conductor exists in pretrained models. GPT-2 was pretrained on WebText by OpenAI and, before this experiment, had never seen TTOBT. Yet the same geometric structure exists in its hidden layers (baseline dark matter ratios of 0.87 to 0.88 at layers 8-10), and the same embedding mechanism engages it.
- Internalization survives scaffold removal. Once the model's weights absorb the conductor signal (Phase B), removing the embedding does not destroy it. In the TTOBT model, retention was 100% at layer 5 and exceeded 100% at layers 6 and 7, meaning the conductor got louder without the scaffold. In GPT-2, retention ranged from 74.5% to 100%.
- Weight changes are largest at and after the injection layers and propagate forward. In both models, per-block weight deltas grow toward the output. The model learns where the conductor lives and adjusts downstream layers to carry the signal to the output.
- Multiple embeddings spontaneously differentiate. When three embeddings are placed at adjacent layers, each learns a different role: surface adjustment, bridging, and direct conductor amplification. This emergent division of labor was not designed.
Relationship to VINE
In the VINE framework, decision points are implemented as perceptrons — auditable, editable, instant-compute alternatives to hard-coded if/elif/else chains. The conductor IS the decision surface. VINE removes the indirection between the conductor and the program.
This experiment demonstrates the conductor's existence and accessibility in standard transformer architectures. VINE's approach of placing perceptrons at decision points is equivalent to giving the conductor direct control over execution — which is what happens here when the embedding amplifies the conductor and the model internalizes it.
The standing wave probe, shadow trace, and conductor embedding experiments provide the empirical foundation for VINE's architectural claims: the geometry is real, it is load-bearing, and it can be engaged.
Scripts, checkpoints, and reproduction instructions are available to licensees. Total experiment time: approximately 18 minutes on an RTX 4070 SUPER.
Follow-up: The Long Run (70 epochs on GPT-2)
A follow-up experiment pushed Phase B training on GPT-2 to 70 epochs to observe what happens when the conductor is amplified past the integration point. Eighty epochs were planned, but the run stopped at 70 when the PC restarted. See CONDUCTOR_LONG_RUN_DECOUPLING.md for full results.
Summary: the conductor integration completed by epoch 10-20 (100% retention), but training past that point did not strengthen the conductor meaningfully and instead caused the token output path to decouple from the hidden-layer reasoning. By epoch 70, dark matter ratio was maximal and retention was 100%, but generation had collapsed into infinite token loops and unicode noise. Epochs 20-25 are the sweet spot: amplified conductor AND coherent generation, in some cases structurally better than baseline.
This is a clean measurement of a failure mode where a system's internal "reasoning" remains coherent while its external outputs become pathological.
14 checkpoints spanning epochs 5-70 are retained for analysis.
Appendix: What the Conductor Is Not
- Not noise. Noise is not stable across layers, is not structured, and does not respond coherently to a learnable embedding.
- Not a training artifact. It exists in both a custom-trained model and OpenAI's pretrained GPT-2.
- Not the prediction surface. The embedding preferentially aligns with tail PCs, not visible PCs. The conductor is orthogonal to what softmax sees.
- Not epiphenomenal. When amplified and internalized, it changes model output in structured, domain-coherent ways. It is load-bearing geometry.