Trajectory Learning

2025-12-07 · cognitive biology tardigrade trajectory

A moving object is recognised as the same object across frames because the trajectory is a single basin in spacetime rather than a set of disconnected positions. This card treats object permanence, predictive tracking, and anticipation as properties of trajectory geometry.

The tardigrade learns that the direction of change matters, not just the current state. A plant ripening is worth waiting for; a plant rotting is worth eating now.

The Question

How does an animal learn to act on change rather than just state? A ripe plant is valuable, but "this plant is ripening" means wait — and "this plant is rotting" means eat it now, before it's gone. How is a transition itself learned as meaningful?

The Answer, Short Version

The same feature-valence accumulation that handles colour handles trajectories. Each transition — fresh_ripening, ripe_rotting, overripe_rotting — is its own feature channel. Outcomes flow through those channels the way pain flowed through red in the concept formation test. Change becomes nameable because transitions activate their own detectors. No separate "reasoning about time" module, no explicit temporal logic — the mechanism is the one already in use, running on a differently-shaped input.
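As a rough illustration of how one update rule can serve both state and transition channels, here is a minimal sketch. The running-average update, the learning_rate value, and the ValenceTable name are assumptions made for the example; the source does not specify the actual update rule.

```python
from collections import defaultdict


class ValenceTable:
    """Hypothetical feature-valence store: one scalar per feature channel."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # assumed value, not from the source
        self.valence = defaultdict(float)   # unseen channels start at 0.0

    def update(self, active_features, outcome):
        # Move each active channel a fraction of the way toward the outcome.
        for feature in active_features:
            error = outcome - self.valence[feature]
            self.valence[feature] += self.learning_rate * error


table = ValenceTable()
# A static state channel and a transition channel go through the same rule.
table.update(["ripe", "ripe_rotting"], outcome=1.0)
table.update(["rotten"], outcome=-1.0)
```

Nothing in update distinguishes "ripe" from "ripe_rotting": both are just keys, which is the point of treating a transition as a feature channel.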

The Setup

The test environment contains plants that cycle through five states:

fresh → ripening → RIPE → overripe → rotten

Only the ripe state offers full nutrition. Overripe is neutral; rotten is harmful. The tardigrade encounters plants at various points in the cycle. The test is whether it learns that transitions — not just static states — carry information about when to act.

The protocol runs 3000 training steps, with plants progressing through their cycle on their own while the tardigrade decides when to approach or wait.
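The plant cycle can be pictured as a small state machine that advances on its own clock. The sketch below is one possible reading: the Plant class, the fixed steps_per_state dwell time, and the choice to stop at rotten (rather than regrow) are all assumptions, since the source gives only the five state names and their order.

```python
STATES = ["fresh", "ripening", "ripe", "overripe", "rotten"]


class Plant:
    """Hypothetical plant that moves through STATES independently of the agent."""

    def __init__(self, start_index=0, steps_per_state=3):
        self.index = start_index
        self.steps_per_state = steps_per_state  # assumed dwell time per state
        self.age_in_state = 0

    @property
    def state(self):
        return STATES[self.index]

    def step(self):
        """Advance one time step and return (previous_state, current_state)."""
        previous = self.state
        self.age_in_state += 1
        at_last_state = self.index == len(STATES) - 1
        if self.age_in_state >= self.steps_per_state and not at_last_state:
            self.index += 1
            self.age_in_state = 0
        return previous, self.state


plant = Plant(steps_per_state=2)
history = [plant.step() for _ in range(4)]
```

Each call to step returns the pair of states that a transition detector would need: the state on the previous frame and the state on the current one.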

What Emerged

State associations (baseline, what the system learns about static states):

ripening        +0.55
ripe            +0.51
fresh           +0.33
overripe         0.00
rotten          -0.66

These associations are clean. Fresh, ripening, and ripe are all positive; rotten is strongly negative; overripe is neutral. This is standard feature-valence learning.

Trajectory associations (the new thing — what the system learned about transitions):

ripe_rotting         +0.70   ← eat now, it's decaying
fresh_ripening       +0.37   ← wait, it will improve
ripe_stable           0.00   ← nothing changing, no urgency signal
overripe_rotting      0.00   ← already past the useful window

The trajectory valences carry information the static states do not. ripe_rotting has a higher valence (+0.70) than static ripe (+0.51): the system has learned that a ripe plant in the act of rotting is more worth eating than a stable ripe one, because the window is closing. fresh_ripening carries positive valence (+0.37), a signal to watch rather than to eat yet. ripe_stable and overripe_rotting both sit at zero: the first because nothing is changing and there is no urgency, the second because the useful window has already passed.

The architectural claim is simple: change is a feature like any other, and feature-valence accumulation does the work.

What This Proves

Temporal structure does not require temporal reasoning. If the system's input encoding includes transition information ("this plant was fresh last frame, is ripening this frame"), then the transition becomes a feature channel, and the same valence-accumulation mechanism that handles every other feature learns it.
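A minimal sketch of such an input encoding, assuming the encoder sees the previous and current frame's state. The encode_features name and the previous_current / state_stable naming scheme are illustrative assumptions; they reproduce labels like fresh_ripening and ripe_stable, but the source's decay labels (for example ripe_rotting) may follow a different convention that is not specified.

```python
def encode_features(previous_state, current_state):
    """Return the feature channels active on this frame (hypothetical scheme)."""
    features = [current_state]  # the static state is always a channel
    if previous_state is None:
        return features         # first observation: no transition available yet
    if previous_state == current_state:
        features.append(f"{current_state}_stable")
    else:
        features.append(f"{previous_state}_{current_state}")
    return features


first_frame = encode_features(None, "fresh")
changing = encode_features("fresh", "ripening")
unchanged = encode_features("ripe", "ripe")
```

The transition shows up as one more string in the feature list, so whatever consumes the list needs no special handling for time.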

No new module. No explicit time logic. No prediction machinery. Trajectory is simply another feature available to the system, and the mechanism that was already running handles it.

This is the same primitive doing the work that, in a conventional agent, would be split across a temporal difference operator, a trajectory encoder, and a separate urgency estimator. Here, there is one feature channel per trajectory, and one valence update rule.

Knowledge and Policy Are Separate Layers

A detail worth lifting out: learning the trajectory valences is one problem, and turning those valences into well-timed action is a different problem. This experiment demonstrates the first. The second — the decision policy that picks when to approach versus when to wait — sits above the valence layer and is not what this run is stressing.

That separation is itself architecturally useful. A conventional agent would couple what matters and when to act on it into a single end-to-end policy, trained together. Here they come apart cleanly. The valence channel is doing feature-level learning; a separate policy layer consumes those valences to decide timing. Each layer can be tuned, extended, or replaced without disturbing the other.
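To make the layering concrete, here is a sketch of a policy that only reads valences and never updates them. The valence numbers are the ones reported in this card; the choose_action function, the additive score, and the approach_threshold of 1.0 are purely illustrative assumptions, since the actual policy layer is not described here.

```python
from typing import Mapping

# Valences as reported in this card (state and trajectory channels).
LEARNED_VALENCES = {
    "ripening": 0.55, "ripe": 0.51, "fresh": 0.33,
    "overripe": 0.00, "rotten": -0.66,
    "ripe_rotting": 0.70, "fresh_ripening": 0.37,
    "ripe_stable": 0.00, "overripe_rotting": 0.00,
}


def choose_action(valences: Mapping[str, float], state: str, trajectory: str,
                  approach_threshold: float = 1.0) -> str:
    """Hypothetical policy: read-only consumer of the valence layer."""
    # Assumed rule: add the state and trajectory valences, compare to a threshold.
    score = valences.get(state, 0.0) + valences.get(trajectory, 0.0)
    return "approach" if score >= approach_threshold else "wait"


decaying_ripe = choose_action(LEARNED_VALENCES, "ripe", "ripe_rotting")
still_improving = choose_action(LEARNED_VALENCES, "ripening", "fresh_ripening")
```

Because choose_action takes the valences as a plain mapping, the rule or the threshold could be swapped out without touching how the valences were learned.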

What this card demonstrates is the lower layer working as specified: transitions are learnable as features, and their valences shape behaviour. The upper layer — policy timing on trajectory cues — is its own card, on its own timeline.

Raychell Langan · NEXICOG Ltd · Hampshire, UK