MCP and Tool Calling: Constraining Model Output into Verifiable External Actions

MCP does not make a model smarter. It places discoverable, validated, logged protocol boundaries between models and tools.

Mechanism Lab

Animation: how MCP turns tool calls into protocol contracts

The animation shows a host/model discovering a server’s tools, resources, and prompts; generating a tool call; passing schema validation; executing inside the server; and returning typed results or errors.


01 / Intuition

Core Intuition

Without a tool protocol, the model can only emit free text. With a tool protocol, the model can select declared tools and produce schema-constrained arguments.

MCP separates three roles (host, client, server) from the three kinds of capability a server exposes (tools, resources, prompts). The model application does not hard-code every tool; it discovers server capabilities at connection time.

The important object is the contract: name, description, input schema, permissions, execution environment, output format, and error semantics.
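A minimal sketch of such a contract written as plain data. The `name`, `description`, and `inputSchema` fields follow the shape of an MCP tool definition; the `x_policy` block covering permissions, execution environment, output format, and error semantics is a hypothetical annotation for illustration and is not defined by the protocol.

```python
# A tool contract as plain data. Everything under "x_policy" is hypothetical;
# only name/description/inputSchema mirror an MCP tool entry.
RUN_REGRESSION_CONTRACT = {
    "name": "run_regression",
    "description": "Estimate a simple regression and return a table path.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "outcome": {"type": "string"},
            "treatment": {"type": "string"},
            "controls": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["outcome", "treatment"],
    },
    "x_policy": {  # assumption: illustrative fields, not part of MCP
        "permissions": ["read:data", "write:outputs"],
        "environment": "sandboxed-python",
        "output_format": "artifact-reference",
        "errors": "returned-as-structured-result",
    },
}

def contract_fields(contract: dict) -> list[str]:
    """List the top-level fields a host can inspect before calling the tool."""
    return sorted(contract)

print(contract_fields(RUN_REGRESSION_CONTRACT))
```

Keeping the contract as data rather than prose means the same object can drive validation, documentation, and audit records.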

For empirical research, MCP lets agents read files, query databases, run statistical scripts, and render tables while leaving structured traces.

02 / Math

Protocol contracts and tool-call state semantics

01 / Capability discovery

The host connects through an MCP client to a server and reads available tools, resources, and prompts.

C = discover(server) = {tools,resources,prompts}
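A sketch of discovery under simplifying assumptions: `FakeServer` is a hypothetical in-memory stand-in with no transport, and its catalog entries are made up. The method names `tools/list`, `resources/list`, and `prompts/list` are the MCP list requests; everything else is illustrative.

```python
from typing import Any

class FakeServer:
    """Hypothetical in-memory stand-in for an MCP server (no transport)."""

    def __init__(self) -> None:
        self._catalog = {
            "tools/list": [{"name": "run_regression"}, {"name": "read_table"}],
            "resources/list": [{"uri": "file:///data/wages.csv"}],
            "prompts/list": [{"name": "summarize_table"}],
        }

    def request(self, method: str) -> list[dict[str, Any]]:
        # A real server would answer a JSON-RPC request; here we look up a dict.
        return self._catalog.get(method, [])

def discover(server: FakeServer) -> dict[str, list[dict[str, Any]]]:
    """Build C = {tools, resources, prompts} from three list requests."""
    return {
        "tools": server.request("tools/list"),
        "resources": server.request("resources/list"),
        "prompts": server.request("prompts/list"),
    }

capabilities = discover(FakeServer())
print(sorted(capabilities))
print([tool["name"] for tool in capabilities["tools"]])
```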

02 / Schema constraints

Every tool has an input schema. Model-generated arguments must satisfy types, required fields, and enumerations.

valid(args, schema_tool) = true
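One way to sketch the three checks named above (types, required fields, enumerations) is a small validator over a reduced schema. The `SCHEMA` layout and the `estimator` field are hypothetical; a production system would more likely use a full JSON Schema validator.

```python
from typing import Any

# Hypothetical schema in a reduced, JSON-Schema-like layout.
SCHEMA = {
    "required": ["outcome", "estimator"],
    "properties": {
        "outcome": {"type": str},
        "estimator": {"type": str, "enum": ["ols", "did"]},
        "controls": {"type": list},
    },
}

def check_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means valid(args, schema)."""
    problems = []
    for field in schema["required"]:
        if field not in args:
            problems.append(f"missing required field: {field}")
    for key, value in args.items():
        rule = schema["properties"].get(key)
        if rule is None:
            problems.append(f"unknown field: {key}")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{key} must be {rule['type'].__name__}")
        elif "enum" in rule and value not in rule["enum"]:
            problems.append(f"{key} must be one of {rule['enum']}")
    return problems

print(check_args({"outcome": "wage", "estimator": "did"}, SCHEMA))
print(check_args({"estimator": "iv", "controls": "age"}, SCHEMA))
```

Returning every violation at once, rather than raising on the first, gives the model enough information to repair the call in a single retry.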

03 / Action generation

A tool call maps model context into a structured executable action, not a natural-language promise.

a_t = f_theta(C,s_t) = {name,args}
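The difference between a structured action and a natural-language promise can be sketched as a parser that accepts only a JSON object naming a declared tool. The `ToolCall` type and `parse_action` helper are hypothetical names, and the wire format here is an assumption chosen to match the demo later in this section.

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict[str, Any]

def parse_action(model_output: str, known_tools: set[str]) -> ToolCall:
    """Turn raw model text into a_t = {name, args}, or raise ValueError."""
    try:
        payload = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not a structured tool call") from exc
    if not isinstance(payload, dict):
        raise ValueError("tool call must be a JSON object")
    name = payload.get("name")
    args = payload.get("arguments", {})
    if not isinstance(name, str) or name not in known_tools:
        raise ValueError(f"undeclared tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return ToolCall(name=name, args=args)

known = {"run_regression", "read_table"}
action = parse_action(
    '{"name": "read_table", "arguments": {"path": "outputs/regression.csv"}}',
    known,
)
print(action)
```

A sentence such as "I will run the regression now." fails at the first step, so it can never be mistaken for an executed action.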

04 / Execution boundary

The server executes tools within its own permissions and runtime. Host apps should not execute model text as shell commands.

r_t = server.execute(a_t)
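One concrete piece of the execution boundary is path confinement: the server resolves any path argument itself and refuses anything outside its own working directory. This is a sketch only; `SANDBOX` and `resolve_inside` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate

SANDBOX = Path("sandbox")  # hypothetical working directory owned by the server

print(resolve_inside(SANDBOX, "outputs/regression.csv").name)
try:
    resolve_inside(SANDBOX, "../../etc/passwd")
except PermissionError as exc:
    print("refused:", exc)
```

Because the check runs inside the server, it holds no matter what text the model produced or how the host forwarded it.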

05 / Result return

Results may be data, text, artifact references, or errors. Errors must be returned to the model and recorded in the trace, not swallowed.

r_t in {data,text,artifact,error}
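A sketch of the four result kinds as one tagged type, so that an error travels the same path as a success. `ToolResult`, `run_safely`, and `describe` are hypothetical names, and the rule that maps return values to kinds is an assumption made for this example.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Any, Callable, Literal

Kind = Literal["data", "text", "artifact", "error"]

@dataclass(frozen=True)
class ToolResult:
    kind: Kind
    payload: Any

def run_safely(handler: Callable[[dict[str, Any]], Any],
               args: dict[str, Any]) -> ToolResult:
    """Run a handler and convert any exception into an error result."""
    try:
        value = handler(args)
    except Exception as exc:  # broad on purpose: every failure becomes r_t
        return ToolResult("error", f"{type(exc).__name__}: {exc}")
    if isinstance(value, PurePath):
        return ToolResult("artifact", str(value))
    if isinstance(value, str):
        return ToolResult("text", value)
    return ToolResult("data", value)

def describe(result: ToolResult) -> str:
    """Render the line a host would write back into the model context."""
    if result.kind == "error":
        return f"TOOL FAILED: {result.payload}"
    return f"TOOL OK ({result.kind}): {result.payload}"

print(describe(run_safely(lambda a: {"rows": a["n"]}, {"n": 3})))
print(describe(run_safely(lambda a: a["missing"], {})))
```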

06 / Audit log

Auditable systems record prior state, tool name, arguments, result, error, permission, and timestamp.

trace_t=(s_t,name,args,r_t,permission,time)
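The trace tuple can be sketched as a record serialized to one JSON line per call. `TraceRecord` and `record` are hypothetical names; storing the log in a Python list is a simplification, since a real system would write to durable, append-only storage.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any

@dataclass(frozen=True)
class TraceRecord:
    state: str               # s_t: a summary or hash of the prior context
    name: str                # tool name
    args: dict[str, Any]     # validated arguments
    result: dict[str, Any]   # r_t, errors included
    permission: str          # permission the call ran under
    time: str                # UTC timestamp, ISO 8601

def record(log: list[str], state: str, name: str, args: dict[str, Any],
           result: dict[str, Any], permission: str) -> TraceRecord:
    """Append one JSON line per tool call and return the record."""
    entry = TraceRecord(state, name, args, result, permission,
                        datetime.now(timezone.utc).isoformat())
    log.append(json.dumps(asdict(entry), sort_keys=True))
    return entry

trace_log: list[str] = []
record(trace_log, "step-3", "run_regression",
       {"outcome": "wage", "treatment": "training"},
       {"kind": "error", "payload": "file not found"}, "read:data")
print(trace_log[0])
```

Failed calls are logged in exactly the same shape as successful ones, which is what lets an auditor see that a claimed result never existed.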

03 / Code

Python demo: validating an MCP-style tool call with a schema

Real MCP exchanges JSON-RPC messages over a transport such as stdio or HTTP. This pure-Python demo skips the transport and shows the same core idea: declare the tool contract, validate arguments, execute the tool, and return structured results.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str
    required: set[str]
    types: dict[str, type]
    handler: Callable[[dict[str, Any]], dict[str, Any]]

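def run_regression(args: dict[str, Any]) -> dict[str, Any]:
    # Demo handler: it builds the formula string only; no model is estimated.
    outcome = args["outcome"]
    treatment = args["treatment"]
    controls = args.get("controls", [])
    # Join all right-hand-side terms together so that an empty controls list
    # does not leave a trailing " + " in the formula.
    formula = f"{outcome} ~ " + " + ".join([treatment, *controls])
    return {
        "ok": True,
        "table": "outputs/regression.csv",
        "formula": formula,
    }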

REGISTRY = {
    "run_regression": Tool(
        name="run_regression",
        description="Estimate a simple regression and return a table path.",
        required={"outcome", "treatment"},
        types={"outcome": str, "treatment": str, "controls": list},
        handler=run_regression,
    )
}

def validate(tool: Tool, args: dict[str, Any]) -> None:
    missing = tool.required - set(args)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    for key, expected_type in tool.types.items():
        if key in args and not isinstance(args[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")

def execute_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    tool = REGISTRY.get(call.get("name"))
    if tool is None:
        return {"ok": False, "error": "unknown tool"}
    args = call.get("arguments", {})
    try:
        validate(tool, args)
        result = tool.handler(args)
        return {"ok": True, "tool": tool.name, "result": result}
    except Exception as exc:
        return {"ok": False, "tool": tool.name, "error": str(exc)}

call = {
    "name": "run_regression",
    "arguments": {
        "outcome": "wage",
        "treatment": "training",
        "controls": ["age", "education"],
    },
}

response = execute_tool_call(call)
print(response)

04 / Case

Case: letting a research agent call statistical tools safely

In StatsPAI, an agent may call run_stata, read_table, estimate_did, render_regression_table, or search_literature.
Without a protocol boundary, the model may emit plausible shell commands while paths, arguments, permissions, and error handling remain uncontrolled.
An MCP-style design first exposes tool lists and schemas. The model can only select existing tools and provide schema-valid arguments. The server executes; the host writes results and errors back into context.
This makes auditing possible: which tool was called, with which arguments, what file came back, why it failed, and whether human confirmation was required.
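The human-confirmation question can be sketched as a per-tool policy checked before execution. The policy table, the `gated_call` helper, and the `approve` callback are all hypothetical; how StatsPAI actually handles confirmation is not described here.

```python
from typing import Any, Callable

# Hypothetical policy: which tools need a human to approve each call.
NEEDS_CONFIRMATION = {"run_stata": True, "read_table": False}

def gated_call(name: str, args: dict[str, Any],
               handler: Callable[[dict[str, Any]], Any],
               approve: Callable[[str, dict[str, Any]], bool]) -> dict[str, Any]:
    """Run handler only if policy allows it; report the decision either way."""
    if name not in NEEDS_CONFIRMATION:
        return {"ok": False, "tool": name, "error": "unknown tool"}
    confirmed = NEEDS_CONFIRMATION[name]
    if confirmed and not approve(name, args):
        return {"ok": False, "tool": name, "confirmation_required": True,
                "error": "rejected by reviewer"}
    return {"ok": True, "tool": name, "confirmation_required": confirmed,
            "result": handler(args)}

def deny_all(name: str, args: dict[str, Any]) -> bool:
    """Stand-in for a real approval prompt: rejects everything."""
    return False

print(gated_call("read_table", {"path": "outputs/regression.csv"},
                 lambda a: "42 rows", deny_all))
print(gated_call("run_stata", {"do_file": "main.do"},
                 lambda a: "done", deny_all))
```

Every response carries the `confirmation_required` flag, so the audit trail shows whether a human was in the loop for each call.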

05 / Risks

Common Pitfalls

Writing vague tool descriptions, leaving the model unsure when to call the tool or what arguments mean.

Checking only tool names while ignoring argument types, required fields, path bounds, and permissions.

Executing model-generated text directly as a shell command, bypassing schema validation and sandboxing.

Failing to return errors as first-class results, causing the agent to narrate success after failure.

Returning huge raw outputs into the context window instead of summaries, artifact references, and traceable paths.

Skipping traces, making it impossible to audit which tool calls support a research claim.
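As one illustration of avoiding the raw-output pitfall above, a server can store large outputs as files and hand the model a short preview plus a traceable reference. This is a sketch with assumed names (`to_context`) and an arbitrary 200-character threshold.

```python
import hashlib
import tempfile
from pathlib import Path

def to_context(raw: str, out_dir: Path, max_chars: int = 200) -> dict:
    """Return raw text if small; otherwise store it and return a reference."""
    if len(raw) <= max_chars:
        return {"kind": "text", "text": raw}
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    path = out_dir / f"{digest[:12]}.txt"
    path.write_text(raw, encoding="utf-8")
    return {
        "kind": "artifact",
        "path": str(path),       # traceable location of the full output
        "sha256": digest,        # lets an auditor verify the file later
        "chars": len(raw),
        "preview": raw[:max_chars],
    }

with tempfile.TemporaryDirectory() as tmp:
    big = "coef,se\n" + "0.12,0.03\n" * 1000
    summary = to_context(big, Path(tmp))
    print(summary["kind"], summary["chars"], len(summary["preview"]))
```

The content hash ties the preview the model saw to the exact file on disk, which supports the audit requirement in the last pitfall.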