Designing Human-in-the-Loop: When Should Your Agent Ask for Help?

The question clients ask most isn't “can the agent do it?” — it's “how do I stay in control?” The answer is that autonomy isn't a switch you flip; it's a dial you turn per action type, based on two things you can actually measure: reversibility and cost of error.

The two-axis rule

Plot every action the agent can take. Reversible and cheap to get wrong (draft a reply, create a ticket, tag a record) → run autonomously from day one. Irreversible or expensive (send a payment, email a customer, delete data) → require approval. The interesting middle — reversible but visible, or cheap but irreversible — starts gated and earns autonomy as the evidence accumulates.

Confidence is the second gate

Tiering by action type isn't enough, because the same action can be easy on one input and ambiguous on the next. Production agents attach a calibrated confidence score to each decision; below the threshold, even a normally-autonomous action routes to a human with the agent's reasoning attached. The human isn't redoing the work — they're reviewing a prepared case file.

Design the escalation as a handoff, not an alarm. “Here's the document, the three fields I extracted, the rule that almost matched, and why I stopped” takes a human seconds to resolve. A bare “needs review” flag takes minutes and goodwill.

The review queue is a product — treat it like one

Human-in-the-loop fails not when the agent escalates too much, but when the queue becomes a swamp nobody owns. Three rules keep it healthy: every item shows the agent's draft decision (approve/correct beats decide-from-scratch), the queue has a named owner and an SLA, and every correction is logged as training signal. The corrections are gold — they are exactly the cases your eval suite was missing.

Earning autonomy over time

The dial moves on evidence: when an action type has run gated for weeks and the human approval rate sits near 100%, the gate is demonstrably adding latency, not safety — promote it. When corrections cluster on a pattern, fix the pattern before promoting anything. Write the promotion criteria down at kickoff so the autonomy roadmap is a plan, not a leap of faith.

2 axes

Reversibility × cost of error sets the tier

+ confidence

Low-confidence cases escalate regardless of tier

~100%

Sustained approval rate = signal to promote

The bottom line

Nobody should sell you “fully autonomous” on day one, and nobody should sell you permanent babysitting either. The honest architecture is a dial, per action, moved by evidence — and a review experience good enough that your team doesn't resent holding it.