The gate is the product: what AI loops really are, and who needs one

Most people use AI one step at a time: type a request, read the answer, fix it, ask again. You are the engine; the model is a tool in your hand, and it stops the moment you do. A loop is the other way to work. You hand the AI a goal once and it runs the whole job itself: do the work, check the result, fix what failed, repeat until it clears a bar. The best builders now spend more time designing loops than writing prompts.

The shift is real, and it is oversold. The version being marketed to everyone is the heavy one most people do not need, and its costs are quiet. Strip away the demos and one thing decides whether a loop helps you or just spends your money: the gate, the check that says a pass is good enough. The gate is the product. Everything else is plumbing around it.

A loop in one sentence

A prompt returns one answer and waits for you. A loop is a goal the AI keeps working toward until a check passes. That check is the entire game. Without a real one, a loop is a model agreeing with itself on repeat, and the model that did the work is a generous grader of its own work. Planning, memory, retries, scheduling — all of it is scaffolding. Take away the gate and you do not have a slower loop, you have an expensive way to nod along.
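To make that definition concrete, here is a minimal Python sketch. The names `run_loop`, `toy_worker`, and `toy_gate` are hypothetical, and the worker is a toy stand-in for a model call rather than any real API. The only point it illustrates is that the decision to stop belongs to the gate, never to the worker.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    worker: Callable[[Optional[str], Optional[str]], str],
    gate: Callable[[str], Tuple[bool, str]],
    max_passes: int = 8,
) -> Tuple[Optional[str], int]:
    """Call the worker until the gate passes; give up after max_passes."""
    draft, feedback = None, None
    for n in range(1, max_passes + 1):
        draft = worker(draft, feedback)   # do (or redo) the work
        ok, feedback = gate(draft)        # the gate, not the worker, decides
        if ok:
            return draft, n
    return None, max_passes               # nothing cleared the bar


# Toy stand-ins (assumptions for illustration only).
def toy_worker(draft, feedback):
    # The first pass leaves a placeholder; later passes fill it in.
    if draft is None:
        return "Summary: TODO"
    return draft.replace("TODO", "revenue rose 4% quarter over quarter")


def toy_gate(draft):
    # A hard check: it can fail the work without asking the worker's opinion.
    if "TODO" in draft:
        return False, "placeholder text remains"
    return True, ""


result, passes = run_loop(toy_worker, toy_gate)
print(passes, result)
```

Swap `toy_gate` for a function that always returns `(True, "")` and the loop exits on the first pass with the placeholder still in place, which is the "nodding along" failure in miniature.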

Two loops we built

We do not have to reach for a hypothetical. This week we shipped two.

The first is a self-correcting extraction loop: an Extractor pulls structured fields out of messy text, a Validator checks them against rules, and the two run in a loop until the output passes. The Validator is the gate. The instructive part was the naive version, which had the Extractor check its own work — it looked like a loop and ran like one, but it mostly re-confirmed its own guesses. Adding a separate Validator is what turned repetition into progress.
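The shape of that Extractor and Validator pair can be sketched roughly as follows. This is not our production code: the field names, the two rules, the day-first date assumption, and the regex-based `extract` (standing in for a model call) are all invented for illustration.

```python
import re
from datetime import datetime


def validate(fields: dict) -> list:
    """The gate: return rule violations. An empty list means pass."""
    errors = []
    try:
        datetime.strptime(fields.get("date", ""), "%Y-%m-%d")
    except (TypeError, ValueError):
        errors.append("date must be YYYY-MM-DD")
    amount = fields.get("amount")
    if not isinstance(amount, float) or amount <= 0:
        errors.append("amount must be a positive number")
    return errors


def extract(text: str, previous=None, errors=()) -> dict:
    """Toy Extractor (stand-in for a model): naive first, then acts on feedback."""
    if previous is None:
        # Naive first pass: copy the raw strings out of the text.
        return {
            "date": re.search(r"\d{2}/\d{2}/\d{4}", text).group(0),
            "amount": re.search(r"\$([\d,]+\.\d{2})", text).group(1),
        }
    fields = dict(previous)
    for err in errors:
        if err.startswith("date"):
            # Assumption: the source text writes dates day-first.
            day, month, year = fields["date"].split("/")
            fields["date"] = f"{year}-{month}-{day}"
        elif err.startswith("amount"):
            fields["amount"] = float(str(fields["amount"]).replace(",", ""))
    return fields


def extract_until_valid(text: str, max_passes: int = 5):
    fields, errors = None, []
    for n in range(1, max_passes + 1):
        fields = extract(text, fields, errors)
        errors = validate(fields)          # separate checker, fixed rules
        if not errors:
            return fields, n
    raise RuntimeError(f"still failing after {max_passes} passes: {errors}")


print(extract_until_valid("Invoice dated 14/02/2025, total due $1,250.00."))
```

The design choice worth noticing is that `validate` knows nothing about how the fields were produced. It only knows the rules, which is what lets it reject a confident but wrong first pass.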

The second is a pre-close anomaly sweep for finance: score every manual journal entry for risk, and surface only the ones that trip a rule (round-dollar top-side entries, weekend postings, blanks, suspense accounts) for human review. The gate there is the rule set, not the model's confidence. Testing it taught the same lesson from the other direction — the first version scored half-entries because of a data quirk, and only a hard check caught it. We work through loops like these, step by step, in our Workflows.
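A rule-set gate of that kind might look something like the sketch below. The `JournalEntry` fields, the suspense account code, the reading of "round-dollar" as a multiple of 1,000, and the reading of "blanks" as an empty description are all assumptions made for the example, not the rules we actually run.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class JournalEntry:
    entry_id: str
    posted: date
    amount: float
    account: str
    description: Optional[str]
    top_side: bool = False


SUSPENSE_ACCOUNTS = {"9999"}  # hypothetical account code


def flags(e: JournalEntry) -> List[str]:
    """Return every rule the entry trips. No model confidence involved."""
    hits = []
    # Assumption: "round-dollar" means a non-zero multiple of 1,000.
    if e.top_side and e.amount != 0 and e.amount % 1000 == 0:
        hits.append("round-dollar top-side entry")
    if e.posted.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        hits.append("weekend posting")
    # Assumption: "blank" refers to a missing or empty description.
    if not (e.description or "").strip():
        hits.append("blank description")
    if e.account in SUSPENSE_ACCOUNTS:
        hits.append("suspense account")
    return hits


def sweep(entries: List[JournalEntry]) -> Dict[str, List[str]]:
    """Surface only the entries that trip at least one rule."""
    flagged = {}
    for e in entries:
        hits = flags(e)
        if hits:
            flagged[e.entry_id] = hits
    return flagged


entries = [
    JournalEntry("JE-1", date(2025, 3, 3), 1234.56, "4000", "Accrual true-up"),
    JournalEntry("JE-2", date(2025, 3, 1), 50000.0, "9999", "", top_side=True),
]
flagged = sweep(entries)
for entry_id, reasons in flagged.items():
    print(entry_id, reasons)
```

Because each rule is a plain boolean test, a reviewer can read exactly why an entry surfaced, and a data quirk shows up as a rule firing (or not firing) rather than as a slightly different score.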

Different jobs, same shape: the value lived in the check, not the cleverness of the worker.

What counts as a gate (and what doesn't)

A real gate can fail the work without asking the model's opinion: a test that passes or fails, a type check, a schema or validator, a measurable threshold, or a rubric scored by a separate, stricter model. What does not count is the model saying it is confident, or "looks good to me" from the same model that produced the work.

That separation — maker versus checker — is the highest-leverage structural move in any loop. Anthropic makes the point in its Building Effective Agents guide: agents shine in code precisely because solutions are verifiable, so the agent can iterate on real test feedback. In practice you let the writer be fast and cheap and the reviewer be slow and strict, often a stronger model on higher effort. Claude Code's subagents exist largely for this: a second agent, different instructions, sometimes a different model, catching what the first one talked itself into.

Do you actually need one?

Most write-ups sell the loop before admitting when it is a mistake. We will do the opposite. A loop earns its setup only when all four of these are true:

The task repeats often enough that the build pays itself back.
Something can automatically reject a bad result.
The agent can do the whole job, not hand half of it back.
"Done" is objective, not a matter of taste.

Miss one and a single well-written prompt beats a loop. That is not the booby prize; it is the right tool for most of what people do. Anthropic's own advice is to reach for the simplest thing that works and add complexity only when it earns its keep — which, they note, sometimes means not building an agent at all.
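If it helps to see the four conditions as a checklist you can run, here is a small sketch. The `Task` fields and the `recommend` function are hypothetical names; the logic is nothing more than "all four, or fall back to a single prompt".

```python
from dataclasses import dataclass


@dataclass
class Task:
    repeats_often: bool           # the build pays itself back
    has_automatic_reject: bool    # something can fail a bad result
    agent_can_finish_alone: bool  # no handing half of it back
    done_is_objective: bool       # not a matter of taste


def recommend(task: Task):
    """Return the suggested tool and the conditions the task fails."""
    missing = [name for name, ok in vars(task).items() if not ok]
    if missing:
        return "single prompt", missing
    return "loop", []


# A weekly report with no automatic check and a taste-based finish line.
print(recommend(Task(True, False, True, False)))
```

The returned list matters as much as the verdict: it names which condition to fix before a loop is worth building.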

The cost nobody prices

Loops run on tokens, and the trap is not that each step costs something — it is that the cost compounds. Every pass, the agent re-reads its context: the goal, the work so far, what failed. That pile grows each time, and a maker-plus-checker setup runs two models over it instead of one.

Here is the arithmetic the demos skip, with real prices. Take a maker-and-checker loop running up to eight passes, with about 20,000 tokens of fixed context (instructions plus the document it is working on) and a transcript that grows a few thousand tokens each pass. Both models re-read that context every pass.

Over 8 passes (rough, illustrative):
  input processed    ~500,000 tokens
  output produced     ~24,000 tokens

At Claude Opus 4.8 (mid-2026: $5 / M input, $25 / M output):
  input    500k x $5/M    = ~$2.50
  output    24k x $25/M   = ~$0.60
  ----------------------------------
  one task               ≈ $3.10, call it ~$3

Three dollars sounds trivial. Then you run it the way the demos do, as a fleet of the same loop across, say, 200 files in a migration, and it is roughly $600. And that assumes you keep every result. Accept half and bin the rest, which is common, and your cost per usable output doubles to about $6, while you have also done the reviewing the loop was meant to save you. That number, cost per accepted change, is the only one worth tracking. As a rough rule of thumb, below about a 50% accept rate the loop tends to cost more than it returns.
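The arithmetic above can be reproduced with a small model of how the context grows. The sketch assumes the transcript grows by 3,000 tokens per pass (one plausible reading of "a few thousand"), that the growth equals the output produced in that pass, and that both models read the same context. The prices are the ones quoted above. It lands slightly under the rounded figures in the text.

```python
FIXED_CONTEXT = 20_000          # instructions plus the document
GROWTH_PER_PASS = 3_000         # assumption: transcript growth per pass
PASSES = 8
MODELS = 2                      # maker and checker both re-read the context
INPUT_PRICE = 5.00 / 1_000_000  # dollars per input token, as quoted above
OUTPUT_PRICE = 25.00 / 1_000_000


def loop_tokens(passes: int = PASSES):
    """Total input and output tokens across all passes."""
    input_tokens = sum(
        MODELS * (FIXED_CONTEXT + GROWTH_PER_PASS * p) for p in range(passes)
    )
    output_tokens = GROWTH_PER_PASS * passes
    return input_tokens, output_tokens


def task_cost(passes: int = PASSES) -> float:
    input_tokens, output_tokens = loop_tokens(passes)
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE


def cost_per_accepted(cost_per_task: float, accept_rate: float) -> float:
    """The number worth tracking: spend divided by results you keep."""
    return cost_per_task / accept_rate


one = task_cost()
print(loop_tokens())                             # (488000, 24000)
print(round(one, 2))                             # 3.04
print(round(one * 200, 2))                       # 608.0 for a 200-file fleet
print(round(cost_per_accepted(one, 0.5), 2))     # 6.08 at a 50% accept rate
```

Changing `PASSES` or `GROWTH_PER_PASS` shows the compounding directly: the input total grows faster than linearly in the number of passes because every pass re-reads everything before it.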

One lever cuts it hard: cache the fixed context. That 20,000-token base is re-read every pass by both models, about 320,000 tokens or $1.60 of the input bill. Cached, it bills at roughly a tenth of the price, which saves about $1.44 and drags the example from ~$3 toward ~$1.70. (Why output costs 5x input and how caching works are covered in our token breakdown.)
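Here is the caching effect as a rough calculation. It assumes the same token model as the example (3,000 tokens of transcript growth per pass is our assumption, as is the flat 10% multiplier for cached reads), and it deliberately ignores any surcharge for writing the cache and the first uncached read, so treat the result as an optimistic bound.

```python
def cost_with_cache(
    passes: int = 8,
    models: int = 2,
    fixed: int = 20_000,
    growth: int = 3_000,              # assumption: transcript growth per pass
    input_price: float = 5.0,         # dollars per million input tokens
    output_price: float = 25.0,       # dollars per million output tokens
    cached_read_multiplier: float = 0.10,  # "roughly a tenth of the price"
):
    """Return (uncached_cost, cached_cost) in dollars for one task."""
    fixed_tokens = fixed * passes * models
    transcript_tokens = models * sum(growth * p for p in range(passes))
    output_cost = growth * passes * output_price / 1e6

    uncached = (fixed_tokens + transcript_tokens) * input_price / 1e6 + output_cost
    cached = (
        fixed_tokens * cached_read_multiplier + transcript_tokens
    ) * input_price / 1e6 + output_cost
    return uncached, cached


uncached, cached = cost_with_cache()
print(round(uncached, 2), round(cached, 2))  # 3.04 1.6
```

Only the fixed context benefits in this sketch. The growing transcript is still billed at full price each pass, which is why caching roughly halves the bill rather than cutting it by 90%.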

There is a quieter failure too, and it is the same problem wearing a disguise: an agent that decides it is done too early, exits half-finished, and — if it is on a schedule — keeps running and billing while producing nothing. A loop with no hard gate does not crash. It bills you in silence. Which is just to say, again: the gate is the product.

If you build one, build it in this order

The sequence matters more than the tools. The people whose loops survive in production all do it the same way:

1. Get one run reliable by hand.
2. Save those instructions as a reusable skill.
3. Wrap the skill in a loop: add the gate and a stop condition (success, or a hard cap like "after 8 tries, report and stop").
4. Then put it on a schedule.

Scheduling something you have not proven by hand is how loops blow up while you sleep. The primitives are there once you are ready: in Claude Code, /loop re-runs a command on an interval inside your session, and /schedule pushes it to a cloud cron that keeps going after you close the laptop. Use them last, not first.
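The stop condition from the third step, success or a hard cap followed by a report, can be sketched like this. `LoopReport` and `run_with_cap` are hypothetical names, and `step` stands in for one full make-and-check pass. Nothing here is tied to Claude Code or to any scheduler.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopReport:
    status: str                       # "passed" or "capped"
    passes: int
    failures: List[str] = field(default_factory=list)


def run_with_cap(
    step: Callable[[int], Tuple[bool, str]],
    max_tries: int = 8,
) -> LoopReport:
    """Run one make-and-check pass at a time; stop on success or at the cap."""
    failures: List[str] = []
    for n in range(1, max_tries + 1):
        ok, reason = step(n)
        if ok:
            return LoopReport("passed", n, failures)
        failures.append(f"try {n}: {reason}")
    # Hard cap reached: report what failed instead of running on in silence.
    return LoopReport("capped", max_tries, failures)


# Toy step that never clears the gate, to show the capped outcome.
report = run_with_cap(lambda n: (False, "schema check failed"))
print(report.status, report.passes, report.failures[-1])
```

Returning the failure history rather than just a status is the useful part: a capped run that says why each try failed is something you can debug in the morning, and a scheduled run that returns "capped" is a signal to alert someone rather than try again.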

Try a loop with nothing but a prompt

You do not need any of that to feel how it works. Paste this into any chat model and give it a real task. The trick is handing it a goal, strict criteria, and a protocol that forces it to grade itself before it is allowed to stop.

Work in a loop until the task meets the bar.

TASK:
[exactly what you want produced]

SUCCESS CRITERIA (strict, no soft passes):
- [criterion 1]
- [criterion 2]
- [criterion 3]

EACH TURN:
1. Produce or improve the work.
2. Score it 1-10 on each criterion. Be brutally honest; list what is weak.
3. If every criterion is 8+, print "FINAL" and stop.
   Otherwise fix the weakest point and go again.

Do not ask me questions. Make a sensible assumption, note it, continue.
Begin. Run until FINAL.

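It will draft, grade itself, find the weak spot, and rewrite until it judges that it has cleared the bar. That is a loop, though a soft one: the model is still grading its own work, which is exactly the weak gate described above, so treat it as a feel for the mechanics rather than a real check. The other catch: you are still the trigger. Close the tab and it is gone. Getting it to run on its own, on a schedule, is where the heavy machinery, and the bill, comes back.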

The everyday version, honestly

A wave of no-code tools now markets exactly this: describe a loop in plain language and it runs for you. The idea is sound and useful for ordinary tasks. The skepticism should only get sharper, though. Run the four-condition test above before you trust one, insist on a real check rather than a tool that quietly assumes success, and be wary of anything that wants broad access to your email, calendar, or accounts before you have tested it on something low-stakes. "It acts instead of suggests" is the pitch and the risk in one breath. We will name specific tools here once we have actually run them.

The verdict

Loops are a genuine shift in who does the work — the AI stops waiting to be pushed and runs the job itself. But they are not a thing to force into every corner of your week; more often than not the heavy version just burns money. The test is simple: if you cannot name the check that fails a bad result, you do not have a loop, you have an expensive way to agree with yourself. Start free and manual, get the self-checking prompt above into your habits, and graduate to the automated, scheduled, multi-agent version only when you genuinely feel the ceiling. When you do, watch one number: cost per accepted change.