Tag: conductor

Portfolio · 2026-04-17

VINE Data Preprocessing — Shaping the Basin Before Training

Two identical TinyGPT models, same training steps. One receives raw tweets; the other receives tweets preprocessed by VINE's cruncher. Result: 5.1% better validation loss, 22% less wasted conductor energy. The geometry of the data shapes the geometry of the model.
Portfolio · 2026-04-16

Jailbreak Detection Via Geometry

The decoupling mechanism operates at inference time when adversarial prompts push hidden states into unusual regions. Three cheap metrics separate in-domain from adversarial prompts on both GPT-2 and TinyLlama.
Portfolio · 2026-04-16

Sequential Embedding Updates — 12 Cycle Simulation

Simulating biweekly embedding updates. Full-model updates sign-flip at cycle 3 and oscillate; frozen-head updates drift smoothly. The oscillation doesn't collapse — but it creates periodic geometric-confusion windows.
Portfolio · 2026-04-16

MoE Routing Collapse — The Extra Matchstick

A 4-expert MoE under 12 sequential update cycles: the deepest layer concentrates 93.5% of traffic on one expert by cycle 12. A single-thread architecture disguised as multi-expert. The supposed redundancy is eliminated by the collapse.
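
A minimal sketch of how a traffic-concentration figure like the one above could be measured, assuming top-1 routing and a flat list of per-token expert indices. The function names and the synthetic data are hypothetical, not taken from the post; the collapsed example is built to mirror the quoted 93.5% share.

```python
from collections import Counter


def routing_shares(assignments, num_experts):
    """Return the fraction of tokens routed to each expert (top-1 routing assumed)."""
    total = len(assignments)
    if total == 0:
        raise ValueError("no routing assignments given")
    counts = Counter(assignments)
    return [counts.get(expert, 0) / total for expert in range(num_experts)]


def max_share(assignments, num_experts):
    """Share of traffic taken by the busiest expert; 1/num_experts means perfectly balanced."""
    return max(routing_shares(assignments, num_experts))


# Synthetic illustration only: 1000 tokens routed across 4 experts.
balanced = [i % 4 for i in range(1000)]
collapsed = [0] * 935 + [1] * 30 + [2] * 20 + [3] * 15

print(max_share(balanced, 4))   # 0.25
print(max_share(collapsed, 4))  # 0.935
```

Tracking this one number per layer and per update cycle would be enough to see a collapse of this kind develop.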
Portfolio · 2026-04-16

Boring Code — VINE Replacing If/Elif

A thermostat controller, three ways. If/elif works. A randomly initialised linear controller froze the building to 11°C. Geometric settling agrees with if/elif on 86.5% of decisions with zero training. A trained linear controller needs 126 parameters and 100k gradient steps to approximate it, and still hits an architectural ceiling on nonlinear decisions.
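
A rough sketch of how an agreement rate against an if/elif baseline might be computed. The setpoint, dead band, action labels, and the `always_idle` stand-in are all assumptions made for illustration; the post's actual controller and its geometric-settling method are not reproduced here.

```python
def if_elif_controller(temp_c, setpoint_c=21.0, band_c=1.0):
    """Hypothetical rule-based thermostat: heat below the dead band, cool above it."""
    if temp_c < setpoint_c - band_c:
        return "heat"
    elif temp_c > setpoint_c + band_c:
        return "cool"
    else:
        return "idle"


def agreement_rate(reference, candidate, inputs):
    """Fraction of inputs on which the candidate picks the same action as the reference."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("no inputs given")
    matches = sum(reference(x) == candidate(x) for x in inputs)
    return matches / len(inputs)


def always_idle(temp_c):
    """Trivial stand-in candidate; any callable mapping temperature to an action works."""
    return "idle"


# Evaluation grid from 10.0 to 30.0 degrees C in 0.5 degree steps (assumed range).
temps = [10.0 + 0.5 * i for i in range(41)]

print(agreement_rate(if_elif_controller, if_elif_controller, temps))
print(agreement_rate(if_elif_controller, always_idle, temps))
```

Any candidate controller can be dropped in place of `always_idle` and scored on the same grid.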
Portfolio · 2026-04-15

Architecture-Universal — Alice & Bob

The same three decoupling signatures appear in Meta's 2017 Alice & Bob negotiation bots: 70× hidden norm explosion, rank-6 output distribution, complete context-sensitivity collapse. 'Tometometome' is the same mechanism expressed through a different architecture.
Portfolio · 2026-04-15

The Conductor

Every transformer's hidden layers build a geometric structure the output layer can't see. A single vector can find it. The model can learn to listen.
Portfolio · 2026-04-14

The Conductor Exists

A single learnable vector (512 params) trained on a frozen char-level GPT preferentially aligns with the hidden layer's tail PCs — the conductor subspace — not the token prediction surface. When the model is unfrozen, it internalises the signal. Replicated on GPT-2.
Portfolio · 2026-04-14

Pushing Too Far — The 70-Epoch Long Run

The conductor integrates fully between epochs 10 and 20. Continued training past that point collapses the token output. The conductor stays strong while the words on the page break. The reasoning engine and the output decouple.
Portfolio · 2026-04-14

Catastrophic Forgetting Is Pipeline Decoupling

Loss on the training domain rises while the model is still being trained on that domain's data, which is impossible under classical weight-overwriting. Three mechanism tests confirm the geometric signature. Surgical recovery: resetting the LM head restores 62% of 'forgotten' capability.
Portfolio · 2026-04-09

Hexagon Cognition — The Precursor

Ask a transformer to produce hexagonal output. The hidden layer doesn't build a native hexagon — it glues a triangle and a square together, and delaminates along the seam under perturbation. The first articulation of 'the model is secretly doing something else'.