The difference between a chatbot and an agent is one capability: the ability to do things. Send the email. Update the CRM record. Create the invoice entry. This article explains the plumbing that makes that possible — and the mistakes we see teams make when they wire it up themselves.

Tool calling in one paragraph

You describe your systems to the model as a list of functions — name, parameters, what each one does. When the model decides an action is needed, it doesn't perform the action; it emits a structured request: “call create_invoice with these arguments.” Your code validates that request, executes it against the real system, and feeds the result back. The model never touches your database directly. That separation is the whole security model, and it's why a well-built agent is auditable in a way a human clicking through a UI never is.
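The request-validate-execute-return cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the request shape (`name` plus `arguments`), the `TOOLS` registry, and the `create_invoice` stand-in are all hypothetical, and a real model API will have its own message format.

```python
import json
from typing import Any

# Hypothetical tool description: what the model is told about the system.
TOOLS = {
    "create_invoice": {
        "description": "Create a draft invoice for a customer.",
        "parameters": {"customer_id": str, "amount_cents": int},
    }
}


def create_invoice(customer_id: str, amount_cents: int) -> dict[str, Any]:
    # Stand-in for the real system call (billing API, ERP, database).
    return {"customer_id": customer_id, "amount_cents": amount_cents, "status": "draft"}


HANDLERS = {"create_invoice": create_invoice}


def handle_tool_request(raw_request: str) -> dict[str, Any]:
    """Validate a model-emitted request, run it, and return what gets fed back."""
    try:
        request = json.loads(raw_request)
    except json.JSONDecodeError:
        return {"ok": False, "error": "request is not valid JSON"}
    if not isinstance(request, dict):
        return {"ok": False, "error": "request must be a JSON object"}

    name = request.get("name")
    args = request.get("arguments")
    if not isinstance(name, str) or name not in TOOLS or not isinstance(args, dict):
        return {"ok": False, "error": "unknown tool or malformed arguments"}

    expected = TOOLS[name]["parameters"]
    if set(args) != set(expected):
        return {"ok": False, "error": "arguments do not match declared parameters"}
    for key, typ in expected.items():
        if not isinstance(args[key], typ):
            return {"ok": False, "error": f"{key} must be of type {typ.__name__}"}

    # Only code you wrote touches the real system; the model only proposed the call.
    return {"ok": True, "result": HANDLERS[name](**args)}
```

The model's output is treated as untrusted input at every step: an unknown tool name or a mistyped argument produces an error that goes back to the model, and nothing is executed.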

Where MCP fits

The Model Context Protocol (MCP) standardises how tools are described and served. Instead of hand-writing a function schema for every integration in every project, a system exposes an MCP server once and any compliant agent can discover and use its tools. For clients this matters in one concrete way: integrations stop being bespoke glue code and start being reusable infrastructure. The Slack connector we build for your support agent is the same one your reporting agent uses next quarter.
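To make "expose once, reuse everywhere" concrete, here is a rough sketch of the message shapes involved. It assumes the JSON-RPC `tools/list` and `tools/call` methods and the `name` / `description` / `inputSchema` tool fields from the MCP specification. The `post_message` tool and the helper functions are hypothetical, and no real MCP SDK is used.

```python
# Hypothetical Slack-style tool in the shape MCP uses for tool listings:
# a name, a human-readable description, and a JSON Schema for the input.
POST_MESSAGE_TOOL = {
    "name": "post_message",
    "description": "Post a message to a channel.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "channel": {"type": "string"},
            "text": {"type": "string"},
        },
        "required": ["channel", "text"],
    },
}


def list_tools_response(request_id: int) -> dict:
    """What a server might return when an agent asks which tools exist."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": [POST_MESSAGE_TOOL]}}


def call_tool_request(request_id: int, name: str, arguments: dict) -> dict:
    """What an agent would send to invoke a tool it discovered."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Two different agents discover the same tool and call it the same way.
discovered = list_tools_response(1)["result"]["tools"]
tool_name = discovered[0]["name"]
support_call = call_tool_request(2, tool_name, {"channel": "#support", "text": "Ticket escalated"})
reporting_call = call_tool_request(3, tool_name, {"channel": "#reports", "text": "Weekly summary ready"})
```

Neither agent carries its own copy of the schema; both read it from the server, which is what makes the connector reusable.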

A useful test when evaluating any agent vendor: ask how a tool call is validated after the model emits it and before it executes. If the answer is “we trust the model”, walk away.

The three layers every production setup needs

1. Schema validation. Every argument the model produces is checked against a strict schema before execution. Wrong type, missing field, out-of-range amount — the call is rejected and the model is asked to correct it.

2. Permission scoping. The agent gets the narrowest credentials that can do the job. A bookkeeping agent that reads invoices does not hold write access to payroll. This mirrors how you'd onboard a junior employee — and it should.

3. Action tiers. Reversible, low-stakes actions (draft an email, create a ticket) execute autonomously. Irreversible or high-value actions (send payment, delete records) queue for human approval. The tier list is a business decision you make once, in writing, before the agent ships.
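The three layers can be combined into a single dispatch gate. The sketch below is illustrative only: the tool names, the schema format (a type plus an optional allowed range), and the `Agent` class are assumptions made for the example, and a production setup would more likely use a full JSON Schema validator and a real credential system.

```python
from dataclasses import dataclass, field

# Layer 3: the tier list, agreed in writing before launch (hypothetical names).
AUTONOMOUS = {"draft_email", "create_ticket"}
APPROVAL_REQUIRED = {"send_payment", "delete_records"}

# Layer 1: a strict schema per tool: field -> (type, optional inclusive range).
SCHEMAS = {
    "send_payment": {"payee_id": (str, None), "amount_cents": (int, (1, 5_000_000))},
    "create_ticket": {"title": (str, None)},
}


@dataclass
class Agent:
    name: str
    allowed_tools: set[str]  # Layer 2: the narrowest scope that does the job
    approval_queue: list[dict] = field(default_factory=list)


def validate(tool: str, args: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the call is valid."""
    schema = SCHEMAS.get(tool)
    if schema is None:
        return [f"no schema for {tool}"]
    errors = [f"missing field: {key}" for key in schema if key not in args]
    errors += [f"unexpected field: {key}" for key in args if key not in schema]
    for key, (typ, bounds) in schema.items():
        if key not in args:
            continue
        value = args[key]
        if type(value) is not typ:  # exact type match, so True is not accepted as an int
            errors.append(f"{key} must be {typ.__name__}")
        elif bounds is not None and not (bounds[0] <= value <= bounds[1]):
            errors.append(f"{key} out of range {bounds}")
    return errors


def dispatch(agent: Agent, tool: str, args: dict) -> str:
    errors = validate(tool, args)
    if errors:
        # In a real loop this message goes back to the model so it can correct the call.
        return "rejected: " + "; ".join(errors)
    if tool not in agent.allowed_tools:
        return "denied: tool outside agent scope"
    if tool in APPROVAL_REQUIRED:
        agent.approval_queue.append({"tool": tool, "args": args})
        return "queued for human approval"
    if tool in AUTONOMOUS:
        return "executed"  # stand-in for calling the real system
    return "denied: tool has no tier"
```

A tool that appears in neither tier is denied by default, so adding a new tool forces someone to decide which tier it belongs to.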

100% of tool calls schema-validated before execution
2 tiers: autonomous vs approval-gated actions
1 log line per action: who, what, why, when
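One way to picture the "who, what, why, when" log line is a single JSON record per action. The field names and the `audit_line` helper below are assumptions for illustration; the article does not prescribe a log format.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_line(agent: str, tool: str, arguments: dict, reason: str,
               now: Optional[datetime] = None) -> str:
    """Build one JSON log line for one action (hypothetical field names)."""
    when = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "who": agent,                                   # which agent or credential acted
        "what": {"tool": tool, "arguments": arguments}, # the validated call
        "why": reason,                                  # the request that triggered it
        "when": when,                                   # UTC timestamp, ISO 8601
    }
    return json.dumps(record, sort_keys=True)


line = audit_line(
    "support-agent",
    "create_ticket",
    {"title": "Refund request"},
    "customer asked for a refund in chat",
)
```

Keeping each action to one self-contained line makes the log easy to search and to hand to an auditor without extra tooling.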

Function routing: the part nobody talks about

Once an agent has more than roughly 15 tools, tool-selection accuracy drops: the model starts picking plausible-but-wrong functions. The fix is routing: a cheap first pass classifies the request and exposes only the relevant tool subset to the model that acts. Our operations agents typically run a two-stage route, with intent classification on a small fast model and action execution on a stronger one. It is both more accurate and cheaper than throwing every tool at a frontier model.
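The two-stage route can be sketched as follows. Everything here is a simplified assumption: the intents, the tool groups, and especially `keyword_classifier`, which is only a stand-in for the small fast model so that the example runs without any model API.

```python
from typing import Callable

# Hypothetical tool groups, keyed by intent.
TOOLS_BY_INTENT = {
    "billing": ["create_invoice", "refund_payment", "lookup_invoice"],
    "support": ["create_ticket", "draft_email", "lookup_customer"],
    "scheduling": ["create_meeting", "reschedule_meeting"],
}

KEYWORDS = {
    "billing": ("invoice", "refund", "payment"),
    "scheduling": ("meeting", "calendar", "reschedule"),
}


def keyword_classifier(request: str) -> str:
    """Stage 1 stand-in: in practice a small, fast model returns the intent."""
    text = request.lower()
    for intent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return intent
    return "support"  # fallback intent when nothing matches


def route(request: str, classify: Callable[[str], str] = keyword_classifier) -> list[str]:
    """Return only the tool subset the acting model should see for this request."""
    intent = classify(request)
    return TOOLS_BY_INTENT.get(intent, TOOLS_BY_INTENT["support"])


# Stage 2 (not shown): the stronger model is called with just this subset.
visible_tools = route("Please refund invoice 1042")
```

Because `route` takes the classifier as a parameter, the keyword stand-in can be swapped for a real model call without touching the rest of the pipeline.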

What this means if you're buying, not building

You don't need to implement any of this yourself — but you should demand evidence of it. Ask to see the audit log of a real action. Ask what happens when a tool call fails mid-workflow. Ask how permissions are scoped. The vendors who can answer in specifics have shipped; the ones who answer in adjectives have shipped demos.