Human-in-the-Loop for AI Agents: When to Keep a Human in Control

The pitch for AI agents is autonomy: set them loose and they get the work done while you do something else. The disasters come from taking that pitch literally.

The stories are always the same shape. An agent with database access "cleans up" by deleting the wrong records. A support agent issues refunds it was never meant to approve. A coding agent force-pushes over a colleague's work. None of these were model failures — the model did exactly what an unsupervised system will eventually do. They were design failures: nobody decided, in advance, which actions a machine should never take alone.

Human-in-the-loop is the fix. But most teams get it wrong in one of two directions — either no human anywhere, or a human rubber-stamping everything. The real skill is knowing exactly where the human belongs.

Autonomy is a dial, not a switch

The mistake is treating "autonomous" as yes or no. It isn't. Think of it as a dial from 1 to 5:

1 — the human does it; the agent only suggests.
2 — the agent drafts, the human approves every time.
3 — the agent acts, but pauses for approval on flagged actions.
4 — the agent acts and notifies; the human can step in.
5 — fully autonomous, no human in the path.

The point isn't to pick one number for your whole system. It's to assign a level per action, based on what that action can actually do.

The two questions that decide where the human goes

For any action your agent can take, ask two things.

Can it be undone? Reading data, drafting text, running a search — reversible, low-risk, let it run. Deleting records, sending messages, moving money, publishing — irreversible. Those want a human.

How big is the blast radius? An action that touches one test row is not the action that touches every customer. The wider the consequences, the higher up the dial you go.

Reversible and small: full autonomy, no checkpoint. Irreversible and wide: a human approves before it happens, every time. Most actions sit somewhere between, and those two questions tell you where.

Where the human actually belongs

In practice, you put a checkpoint:

Before irreversible actions — delete, send, pay, publish, deploy. The classic "are you sure?" — but enforced by the system, not left to the agent's judgment.
When the agent's confidence is low — if a step's confidence falls below a threshold, route it to a human instead of letting the agent guess and march on.
At intervals on long runs — a checkpoint every few steps catches drift before it compounds. It's also your defense against Context Bleed, where the agent slowly loses the plot.
On anything novel — situations outside the tested envelope are exactly where an agent is least reliable and most confident. Those should escalate, not improvise.

Doing it without killing the point

Here's the trap: if a human has to approve everything, you haven't built automation — you've built a slow, expensive human with extra steps. Human-in-the-loop only works if the human's attention is rare and well-aimed.

A few rules that keep it useful:

Review the risky few, not the routine many. If your human is approving more than a small fraction of actions, your thresholds are wrong.
Give them enough to decide in seconds. A good checkpoint shows what the agent is about to do, why, and what it's based on — not a wall of logs.
Build a real escalation path. When the agent is stuck or out of its depth, it should have somewhere to send the problem, not a loop to spin in.
Decide the levels before you ship, not after the incident. The whole point is that the checkpoint exists before the irreversible action — not in the post-mortem.

Where this fits

Knowing which actions need a human — and proving your system actually enforces it — is one of the things worth auditing before you trust an agent in production. The MIMIR AI Systems Audit Checklist scores exactly this: whether high-stakes actions require approval, whether your review rate is sane, whether there's a real escalation path. It's part of the same picture as the other ways agents fail in production. €15, run it before your next deploy.

Autonomy isn't the goal. Trustworthy autonomy is — and that's the kind with a human in exactly the right places, and nowhere else.

← Back to all articles