Boring Code: VINE Replacing If/Elif in a Thermostat

A thermostat controller, three ways. If/elif works. An untrained random linear transform let the building cool to 11°C. Geometric settling reaches 86.5% agreement with if/elif on mode decisions with zero training. A linear transform trained to imitate it needs 126 parameters and 100,000 gradient steps, and still hits an architectural ceiling on nonlinear decisions.

Date: 2026-04-16


Summary

A standard HVAC thermostat controller — the most boring possible programme — was implemented three ways:

  1. Traditional if/elif: hardcoded branching logic. Works correctly. 71 heat cycles, temperature 18.4-20.8°C.
  2. Raw linear transform: random weights, softmax over categories. No training. Froze the building to 11°C.
  3. VINE geometric decisions: a settling primitive feeding a basin classifier. Zero training, zero learned parameters, a small set of hand-chosen basin positions. Temperature 17.3-20.1°C. 86.5% agreement with if/elif on mode decisions. 90.6% on energy decisions.

The linear transform was then trained on the VINE agent's decisions. It needed 126 parameters and 100,000 gradient steps to reach 98.4% accuracy on mode decisions — and topped out at 60.7% on fan decisions because the linear architecture cannot represent the nonlinear feature interactions the geometric primitive handles natively.

Gradient descent paid 126 parameters and 100,000 steps to approximate what a handful of chosen numbers do for free. And even then, it couldn't fully match it.


The Programme

An HVAC controller that reads sensors and makes four decisions per tick:

Decision    Options                         Primary drivers
Mode        heat / idle / cool              temperature error vs target
Fan speed   off / low / medium / high       temperature error + humidity
Energy      eco / normal / boost            energy price + occupancy
Schedule    away / night / day / override   time of day + occupancy

Simulated over 24 hours (96 ticks at 15-minute intervals). Outdoor temperature follows a sine wave. Occupancy follows a day/night pattern. Energy price peaks in the evening. Indoor temperature responds to HVAC actions plus outdoor drift.
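
The input signals described above can be sketched roughly as follows. This is a minimal illustration, not the simulator used in the experiment: the function name `simulate_inputs`, the temperature mean and amplitude, the occupancy hours, and the price values are all assumptions, chosen only to match the qualitative description (sine-wave outdoor temperature, day/night occupancy, evening price peak).

```python
import math

TICKS_PER_DAY = 96          # 24 hours at 15-minute intervals
MINUTES_PER_TICK = 15

def simulate_inputs(tick):
    """Return (hour, outdoor_temp, occupied, price) for one tick.

    All numeric constants below are illustrative assumptions.
    """
    hour = tick * MINUTES_PER_TICK / 60.0
    # Sine wave: coldest around 03:00, warmest around 15:00 (assumed phase).
    outdoor_temp = 10.0 + 6.0 * math.sin(2 * math.pi * (hour - 9.0) / 24.0)
    # Day/night occupancy: occupied outside assumed working hours.
    occupied = not (9.0 <= hour < 17.0)
    # Energy price with an assumed evening peak between 17:00 and 21:00.
    price = 0.30 if 17.0 <= hour < 21.0 else 0.15
    return hour, outdoor_temp, occupied, price

day = [simulate_inputs(t) for t in range(TICKS_PER_DAY)]
```

Indoor temperature dynamics are left out here because the source does not specify how HVAC actions and outdoor drift are combined.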


Three Decision Mechanisms

A. Traditional If/Elif

def decide_mode(self, s):
    if s.temp_error > self.hysteresis:
        return "cool"
    elif s.temp_error < -self.hysteresis:
        return "heat"
    else:
        return "idle"

Standard branching. Each decision is a cascade of thresholds. Four decisions × 3-6 branches each = approximately 20 if/elif conditions total. Correct, brittle, opaque.
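
To make the fragment above runnable on its own, one possible wrapper is sketched here. The `Sensors` dataclass, the `IfElifController` class name, and the 0.5°C hysteresis value are assumptions for illustration; only the body of `decide_mode` comes from the original.

```python
from dataclasses import dataclass

@dataclass
class Sensors:
    indoor_temp: float
    target_temp: float

    @property
    def temp_error(self) -> float:
        # Positive when the room is warmer than the target.
        return self.indoor_temp - self.target_temp

class IfElifController:
    def __init__(self, hysteresis: float = 0.5):
        self.hysteresis = hysteresis  # assumed dead-band half-width in °C

    def decide_mode(self, s: Sensors) -> str:
        if s.temp_error > self.hysteresis:
            return "cool"
        elif s.temp_error < -self.hysteresis:
            return "heat"
        else:
            return "idle"

controller = IfElifController(hysteresis=0.5)
modes = [controller.decide_mode(Sensors(t, 20.0)) for t in (18.0, 20.2, 21.0)]
```

With a 20°C target, the three sample temperatures fall below, inside, and above the dead band respectively.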

B. Raw Linear Transform

logits = W @ state_vector + b
probs = softmax(logits)
choice = labels[argmax(probs)]

Each decision is a matrix multiply followed by softmax — the same mechanism a transformer uses for output. Random initialisation (no training). 126 total parameters across all four decisions.

Result: the controller chose to cool at the start (an artefact of the random weights), then idled for 91 of 96 ticks. With the system idle for most of the day, the building drifted down to 11°C: a thermostat that confidently froze the building.
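
The 126-parameter figure is consistent with four independent linear heads over an 8-feature state vector: (3 + 4 + 3 + 4) output rows × 8 weights, plus 14 biases, gives 112 + 14 = 126. The sketch below checks that arithmetic and runs one untrained decision. The feature count of 8 is an inference from CO2 later being described as the 9th input, and the names `OPTIONS`, `heads`, and `decide` are hypothetical.

```python
import numpy as np

N_FEATURES = 8  # assumption: inferred from CO2 later being called the 9th input
OPTIONS = {
    "mode": ["heat", "idle", "cool"],
    "fan": ["off", "low", "medium", "high"],
    "energy": ["eco", "normal", "boost"],
    "schedule": ["away", "night", "day", "override"],
}

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Random, untrained weights: one (W, b) pair per decision.
rng = np.random.default_rng(0)
heads = {
    name: (rng.normal(size=(len(labels), N_FEATURES)), rng.normal(size=len(labels)))
    for name, labels in OPTIONS.items()
}

def decide(name, state_vector):
    W, b = heads[name]
    probs = softmax(W @ state_vector + b)
    return OPTIONS[name][int(np.argmax(probs))], probs

n_params = sum(W.size + b.size for W, b in heads.values())
choice, probs = decide("mode", rng.normal(size=N_FEATURES))
```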

C. VINE Geometric Decisions

Each decision maps the programme state to a position on a bounded axis via a geometric settling primitive (a nonlinear combination of weighted inputs), then classifies against a small set of hand-chosen basin points on that axis. A handful of chosen numbers per decision — no training, no learned parameters.

[Implementation of the settling primitive and basin classifier is withheld pending open-source release. Available under licence.]

Result: 86.5% agreement with if/elif on mode, 90.6% on energy. Temperature 17.3-20.1°C. Competent thermostat from zero training.
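
Since the actual primitive is withheld, the following is only a generic stand-in for the shape of the idea: map inputs to a position on a bounded axis, then pick the nearest hand-chosen point. It is not the VINE implementation. The use of `tanh` as the nonlinearity, the inverse-distance confidence score, and every basin position and weight are assumptions made for illustration.

```python
import math

def settle(inputs, weights):
    """Stand-in settling step: squash a weighted sum onto the bounded axis (-1, 1)."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))

def classify(position, basins):
    """Pick the label whose basin point is nearest to the settled position."""
    label = min(basins, key=lambda name: abs(position - basins[name]))
    # Confidence: share of inverse distance held by the winner (assumed scoring).
    inv = {name: 1.0 / (abs(position - p) + 1e-9) for name, p in basins.items()}
    return label, inv[label] / sum(inv.values())

MODE_BASINS = {"heat": -0.8, "idle": 0.0, "cool": 0.8}  # hypothetical positions
MODE_WEIGHTS = [1.0]                                    # hypothetical: error only

# A room 2°C below target settles near the "heat" basin.
label, confidence = classify(settle([-2.0], MODE_WEIGHTS), MODE_BASINS)
```

In this stand-in the decision boundaries sit at the midpoints between adjacent basin points, so moving a basin moves the switching point without any retraining.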


Agreement Between Mechanisms

Decision   if/elif vs linear   if/elif vs VINE   linear vs VINE
Mode       26.0%               86.5%             24.0%
Fan        31.2%               49.0%             25.0%
Energy     51.0%               90.6%             58.3%
Schedule   26.0%               61.5%             7.3%

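The linear transform with random weights sits at or near chance against if/elif: 26.0% on mode, where uniform guessing over 3 categories gives 33.3%, and 26.0% on schedule, where uniform guessing over 4 categories gives 25%. VINE without training matches the hardcoded logic at 86.5% on mode and 90.6% on energy; its agreement on fan (49.0%) and schedule (61.5%) is lower.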
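
The agreement figures are simple match rates over the 96 ticks. A minimal sketch of the calculation, with hypothetical helper names `agreement` and `chance_level`, is shown below. The 83-of-96 example is an inference: it is the tick count consistent with the reported 86.5% (83/96 ≈ 86.46%), not a number stated in the results.

```python
def agreement(a, b):
    """Percentage of ticks on which two decision sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equal in length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def chance_level(n_options):
    """Expected agreement for uniform random guessing over n options."""
    return 100.0 / n_options

# Synthetic sequences that differ on 13 of 96 ticks.
example = agreement(["heat"] * 83 + ["idle"] * 13, ["heat"] * 96)
```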

The 13.5% disagreement on mode is mostly timing: VINE switches between heat and idle at slightly different temperature thresholds than the if/elif's hardcoded hysteresis. The BEHAVIOUR is the same; the exact transition point differs because one uses a threshold and the other uses basin distance. The basin distance produces a natural hysteresis whose width depends on basin spacing — a property of the geometry, not a hardcoded parameter.


Training the Linear Transform on the VINE Agent

The linear transform was then trained via gradient descent to match the VINE agent's decisions. 5000 diverse sensor states were generated, the VINE agent made decisions on each, and the linear transform learned from those decisions via cross-entropy loss.
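
A training loop of this kind might look like the sketch below. The reported 100,000 steps for 5000 examples and 20 epochs implies one update per example, so the sketch uses per-example SGD. Everything else is an assumption: the function name `train_softmax`, the learning rate, the zero initialisation, and in particular the teacher, which is a simple two-threshold rule on feature 0 standing in for the VINE agent. The sketch uses 1000 examples (20,000 steps) to keep it quick.

```python
import numpy as np

def train_softmax(X, y, n_classes, epochs=20, lr=0.1, seed=0):
    """Per-example SGD on cross-entropy for a single linear layer."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    steps = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            logits = W @ X[i] + b
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y[i]] -= 1.0                 # gradient of cross-entropy w.r.t. logits
            W -= lr * np.outer(p, X[i])
            b -= lr * p
            steps += 1
    return W, b, steps

# Hypothetical teacher standing in for the VINE agent: labels depend only on
# feature 0 (temperature error) through two assumed thresholds.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 8))
y = np.digitize(X[:, 0], [-0.25, 0.25])   # 0 = heat, 1 = idle, 2 = cool

W, b, steps = train_softmax(X, y, n_classes=3)
accuracy = float(np.mean(np.argmax(X @ W.T + b, axis=1) == y))
```

A threshold rule on one feature is something a single linear layer can represent, which is why this stand-in teacher is learnable; it says nothing about the harder fan and energy decisions.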

Learning Curves

Training examples    Mode acc   Fan acc   Energy acc   Schedule acc
10                   77.2%      52.7%     58.6%        61.1%
50                   90.3%      52.3%     68.2%        73.9%
250                  96.8%      57.2%     68.0%        85.8%
1,000                97.3%      56.1%     70.2%        90.1%
5,000                98.4%      60.7%     72.4%        93.5%
VINE (no training)   100%       100%      100%         100%

What the Linear Transform Learned

After 5000 examples and 20 epochs (100,000 gradient steps), the learned weight matrix for mode decisions concentrated almost entirely on the temperature error feature, with heat and cool rows taking large opposite-sign values and all other features contributing minor weights.

[Full weight matrix available on request.]

The linear transform discovered that temperature error drives mode decisions — which is exactly what the geometric agent's mode logic encodes directly by design. Gradient descent spent 100,000 steps rediscovering a feature priority that was chosen in advance.

Top features discovered per decision:

  • Mode: error, temp, target ✓
  • Fan: humidity, occupancy, error ✓
  • Energy: price, occupancy, error ✓
  • Schedule: occupancy, time, target ✓

Every feature ranking matches what the geometric agent was designed with. Gradient descent found the same answer. It just had to search for it.
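
The source does not say how features were ranked. One common approach, assumed here, is to sum the absolute weights in each input column of the learned matrix. The matrix below is a made-up example with the qualitative shape described above (large opposite-sign heat and cool weights on error), not the trained weights, and `top_features` is a hypothetical helper.

```python
import numpy as np

def top_features(W, names, k=3):
    """Rank input features by total absolute weight across all output rows."""
    importance = np.abs(W).sum(axis=0)
    order = np.argsort(importance)[::-1][:k]
    return [names[i] for i in order]

# Hypothetical 3x4 weight matrix (rows: heat, idle, cool), not the trained one.
names = ["error", "temp", "target", "humidity"]
W = np.array([
    [-4.0, -0.6,  0.5, 0.1],
    [ 0.1,  0.0,  0.0, 0.0],
    [ 4.0,  0.7, -0.4, 0.1],
])
ranking = top_features(W, names)
```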

Where the Linear Transform Fails

Fan decisions plateau at 60.7%. The geometric agent's fan decision combines absolute temperature error, humidity, and occupancy through a nonlinear primitive, and a single matrix multiply cannot represent that interaction. This is the same limitation that leaves an additive model stuck at 50% on XOR: the architecture lacks the nonlinearity needed to separate the classes.

Energy decisions plateau at 72.4% for a similar reason: the interaction between price and occupancy in the geometric agent's energy decision involves a nonlinear weighting that a single linear layer can approximate but cannot match.

These are not failures of training. They are architectural ceilings. The linear transform has been trained to convergence — more data and more epochs do not improve accuracy. The remaining gap between 60.7% and 100% is the difference between linear and geometric decision mechanisms.
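
The XOR comparison can be checked directly. The sketch below uses a least-squares fit rather than the softmax and cross-entropy setup from the experiment, purely because it has a closed-form answer: the best additive fit to XOR outputs 0.5 for every input, while adding a single product feature fits it exactly. The helper name `fit_predict` is hypothetical.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def fit_predict(features, targets):
    """Least-squares fit with a bias column; returns fitted predictions."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return A @ coef

# Additive model: x1, x2, and a bias.
linear_pred = fit_predict(X, y)

# Same model plus one interaction term, x1 * x2.
product = (X[:, 0] * X[:, 1])[:, None]
interaction_pred = fit_predict(np.hstack([X, product]), y)
```

The additive predictions carry no information about the class, which is the sense in which extra data or epochs cannot help.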


Parameter Accounting

Mechanism                      Parameters          Training required          Mode accuracy          Fan accuracy
If/elif                        0 (hardcoded)       N/A                        100% (by definition)   100% (by definition)
Linear transform (untrained)   126                 none                       26.0% (vs if/elif)     31.2% (vs if/elif)
Linear transform (trained)     126                 5000 examples, 20 epochs   98.4% (vs VINE)        60.7% (vs VINE)
VINE                           ~16 chosen values   none                       86.5%* (vs if/elif)    100%* (self-consistency)

*VINE's mode agreement with if/elif is 86.5%; its self-consistency is 100%. The 13.5% gap is the difference between VINE's basin-distance hysteresis and the if/elif's hardcoded threshold, which is a design choice rather than an error. The fan figure for VINE is likewise self-consistency; its fan agreement with if/elif is 49.0%.

The linear transform paid 126 parameters and 100,000 gradient steps to approximate what a handful of chosen values do for free. And even at convergence, two of its four decisions (fan: 60.7%, energy: 72.4%) remain below the VINE agent's accuracy because the linear architecture cannot represent the required nonlinear feature interactions.


What This Shows

For software engineering

The thermostat result suggests that if/elif decision chains of this kind can be replaced by VINE's geometric primitives. In this experiment, the replacement:

  • Requires no training data
  • Uses fewer parameters than the branching logic it replaces (roughly 16 chosen values vs approximately 20 conditional branches)
  • Produces naturally continuous confidence scores (not just binary branch/don't-branch)
  • Has built-in hysteresis from basin geometry (no explicit dead-band parameter needed)
  • Is auditable: the position on the axis IS the decision, visible and interpretable at every tick

For the conductor argument

Training a linear model on VINE's decisions is the thermostat equivalent of the conductor embedding experiment. The linear model (the "transformer") pays gradient descent to rediscover geometric structure that the VINE agent has by construction. The learning curve shows the cost: 250 examples for 96.8% on simple decisions, 5000+ for complex decisions, and an architectural ceiling on decisions involving nonlinear feature interactions.

The conductor is the geometry. The transformer rediscovers it at cost. VINE provides it for free.

For Barry

Barry is this experiment at production scale: a vanilla GPT-2 running a vending machine via VINE's geometric primitives. The same primitives that run this thermostat run Barry's stock management, pricing, and customer interaction. Only the number of basins and the complexity of the weighted field differ.


Novel Signal Test — Can the Trained Transform Generalise?

After training, a new signal was introduced: a CO2 sensor affecting ventilation decisions. Three tests:

Test 1, new input dimension (CO2 as 9th input): VINE adds one term to the weighted field. The linear transform is architecturally blind: it truncates the 9th dimension. There were 193 states with CO2 > 1000ppm; VINE responds to all of them, and the linear transform cannot see any of them.

Test 2, repurposed dimension (outdoor_temp slot becomes CO2): VINE updates one field mapping. The linear transform's outdoor_temp weights are applied to the CO2 values by accident and produce 26.4% agreement, which is roughly chance level. The distributions are too different for the old weights to transfer.

Test 3, new action (adding "ventilate" to fan options): VINE adds one basin position on the existing axis. The linear transform was retrained with the new output class and 2000 examples. Result: 93.4% overall but 0% ventilate recall. It predicted "low" for all 500 states. It was shown examples of when to ventilate, given the output class, and trained to convergence, but it cannot learn the decision because the decision depends on an input (CO2) that it cannot see.

The linear transform cannot generalise to signals it wasn't trained on. VINE generalises by changing a number, not a network.
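
The truncation in Test 1 is easy to reproduce in isolation. In the sketch below, a head with an 8-column weight matrix is fed 9-entry states that differ only in CO2, and its decision cannot change. The weights are random stand-ins rather than the trained ones, the ppm values are illustrative, and `linear_fan_decision` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))     # fan head sized for 8 inputs (random stand-in)
b = rng.normal(size=4)

def linear_fan_decision(state):
    """Apply the 8-input head; anything past the 8th entry is dropped."""
    logits = W @ np.asarray(state)[:W.shape[1]] + b
    return int(np.argmax(logits))

base = rng.normal(size=8)
low_co2 = np.append(base, 400.0)     # ppm values are illustrative
high_co2 = np.append(base, 1500.0)
same = linear_fan_decision(low_co2) == linear_fan_decision(high_co2)
```

Making the head respond to CO2 means adding a ninth column to `W` and retraining, whereas the source describes the VINE change as adding one term to the weighted field.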


Why Scaling Works (And Why It Costs What It Costs)

The linear transform's failure on nonlinear decisions (fan at 60.7%) and novel signals (ventilate at 0%) raises the question: how do production transformers handle these tasks at all?

They stack.

No single linear layer can solve XOR. But stack enough of them with nonlinearities in between (attention, GELU, layer normalisation) and the cumulative signal across billions of dimensions develops enough orthogonal structure to approximate geometric reasoning. No single layer understands the shape. But billions of parameters spread across the stacked layers, each contributing one noisy partial view, sum to a standing wave that HOLDS the shape.
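
The basic point about stacking can be seen at the smallest possible scale. The sketch below is a two-layer network with a ReLU between the layers and hand-chosen weights that computes XOR exactly, something no single linear layer can do. It uses ReLU rather than GELU for simplicity, and the weights are picked by hand, not trained.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hidden layer: two units that both read x1 + x2, with biases 0 and -1.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: h1 - 2 * h2.
w2 = np.array([1.0, -2.0])

def xor_net(x):
    return float(w2 @ relu(W1 @ np.asarray(x, dtype=float) + b1))

outputs = [xor_net(x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

The second hidden unit only activates when both inputs are on, and the output layer uses it to cancel the first unit's response.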

This is what the conductor experiments measured:

  • The standing wave probe found 3x more energy in the tail PCs than in the prediction surface
  • That tail energy is structured, stable across layers, and consistent across inputs
  • It is the sum of all partial geometric views from all the stacked linear approximations
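
The probe itself is not described here, so the following is only one plausible way to compute a tail-versus-top energy ratio: take the squared singular values of the centred activations and compare the sum after the first k components with the sum over the first k. The function name `tail_to_top_energy`, the definition of energy, and the toy data are all assumptions.

```python
import numpy as np

def tail_to_top_energy(activations, k):
    """Ratio of variance in the principal components after the first k to the first k."""
    centred = activations - activations.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    energy = s ** 2
    return float(energy[k:].sum() / energy[:k].sum())

# Toy data with energies 8 and 2 along two orthogonal directions.
toy = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ratio = tail_to_top_energy(toy, k=1)
```

On this definition, a ratio of 3 would correspond to the "3x more energy in the tail" figure quoted above.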

The model KNOWS something is there. It can feel the shape. It computes it relentlessly. But the softmax output can only read the top few principal components — the prediction surface. The rest — the conductor — is discarded at output time.

Scaling "works" because noise integrated across enough orthogonal dimensions converges toward geometric structure. A trillion parameters is the cost of approximating, via stacked linear noise, what 16 chosen basin positions provide directly. The approximation is remarkable — it produces language, it reasons, it holds character and tone. But it is still an approximation, and it carries the cost: compute, parameters, training data, and the fragility we documented (catastrophic forgetting, routing collapse, jailbreak vulnerability) that arises because the output mechanism discards most of what the parameters learned.

VINE gives the geometry without the stack. The cost is knowing the geometry in advance. The benefit is that a handful of numbers do what 126 trained parameters cannot, and what a trillion parameters of stacked linear layers approximate at staggering expense.


Scripts, checkpoints, and reproduction instructions available to licensees.