PERPETUAL
Notes

What AI learns from

June 2026 · 2 min read

Building intelligent AI is becoming less about how big the model is, and more about what it can learn from once it is doing real work.

Today most AI does its learning before it ships, then stops. It is trained once, shipped, and frozen. After that, the most it does is remember: hold context, recall past conversations, keep notes on what you told it. Remembering is useful, but it is not the same as getting better. To get better, a system has to keep learning after it ships, from how the work actually goes. That signal comes from more than one place.

That signal does not arrive clean. It is noisy and scattered, and most of it teaches nothing. What matters hides in a few moments. The correction a senior person makes that a junior would miss. The place where your judgment and the model's quietly differ. The task that did or did not work. Before a system can learn anything, it has to find those few in the noise.

Diagram: four noisy, scattered streams of interactions from four sources; a few are picked out, turn blue, and are kept.

Real signal is noisy and scattered, not a tidy stream. The work is pulling the few interactions that matter out of the noise.

None of this is new. The field already has a name for learning from each kind of signal. When the model judges its own work, that is self-rewarding. When it learns from a person's correction or preference, that is RLHF, reinforcement learning from human feedback. When it absorbs what your organization knows, that is distillation. When it learns from whether the work actually succeeded, that is RLVR, reinforcement learning with verifiable rewards. The capability exists. It just lives in pieces, each one its own technique, its own pipeline, often its own team.

Diagram: each of the four sources maps to an established method in the literature.

The capability already exists. It just lives in pieces, one technique at a time.

We treat these four ways of learning as one loop. We are agnostic to where the signal comes from: the model's own work, a user's correction, your knowledge, or the outcome of a task. All of it feeds the same loop, which learns from each source in the way that suits it and gets better at the work in front of you.
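As a rough illustration only, the idea of one loop fed by four sources can be sketched as a filter followed by a dispatch. Everything here is hypothetical: the `Source`, `Interaction`, and `route` names, the `informative` flag, and the example data are assumptions made for the sketch, not a description of any real system. The sketch also takes the hard part, deciding which interactions matter, as given.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # The four sources of signal named in the text (labels are illustrative).
    MODEL_OWN_WORK = "the model's own work"
    USER_CORRECTION = "a user's correction"
    ORG_KNOWLEDGE = "your knowledge"
    TASK_OUTCOME = "the outcome of a task"


# Source-to-method mapping, following the pairing given in the text.
METHOD_FOR = {
    Source.MODEL_OWN_WORK: "self-rewarding",
    Source.USER_CORRECTION: "RLHF",
    Source.ORG_KNOWLEDGE: "distillation",
    Source.TASK_OUTCOME: "RLVR",
}


@dataclass
class Interaction:
    source: Source
    payload: str
    informative: bool  # hypothetical flag: finding these few is the real work


def route(interactions):
    """Drop the noise, then group what remains by the method that fits its source."""
    batches = {method: [] for method in METHOD_FOR.values()}
    for item in interactions:
        if not item.informative:
            continue  # most of the stream teaches nothing and is skipped
        batches[METHOD_FOR[item.source]].append(item.payload)
    return batches


# Made-up example stream: two interactions carry signal, two are noise.
example_stream = [
    Interaction(Source.USER_CORRECTION, "senior reviewer rewrote the summary", True),
    Interaction(Source.USER_CORRECTION, "user said thanks", False),
    Interaction(Source.TASK_OUTCOME, "generated report passed validation", True),
    Interaction(Source.MODEL_OWN_WORK, "routine self-check, nothing flagged", False),
]

batches = route(example_stream)
```

The point of the sketch is the shape, not the details: a single entry point accepts signal from any source, and the choice of method follows from the source instead of living in a separate pipeline.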

That is what it takes to build AI that keeps getting better instead of standing still. Not one source of signal, but all of them, working as one.

Learn from everything. Get better at the work that matters.

If that resonates, we'd like to hear from you.