The Model Context Protocol (MCP) is already the standard for connecting agents to tools and data. Why does that matter to you? Because wiring agents to every system with custom integrations creates fragmentation, duplicated work and hard scalability limits. MCP aims to fix that: implement the protocol once on the agent and you unlock an entire ecosystem of integrations.
Since its release in November 2024, adoption has been fast: thousands of MCP servers, SDKs for the major languages, and an industry that treats MCP as the standard. But at scale you hit a practical problem: between tool definitions and intermediate results, the model's context inflates, latency rises and costs explode. The solution? Use code execution as the interface to MCP servers.
Why do agents consume so many tokens?
There are two common patterns that make agents inefficient in cost and time:
- Tool definitions saturate the context window.
- Intermediate tool results flow through the model and duplicate tokens.
Sound familiar? If you connect hundreds or thousands of tools, loading all their definitions into the prompt forces the model to process hundreds of thousands of tokens before it even starts the real task.
Typical example of a tool definition:
gdrive.getDocument
Description: Retrieves a document from Google Drive
Parameters:
- documentId (required, string)
- fields (optional, string)
Returns: Document object with title, body content, metadata
With thousands of entries like that, costs add up fast. And when the agent calls a tool and the result (say, a two-hour meeting transcript) comes back, that text can flow through the context twice: once when the model reads the tool result, and again when the model re-emits it as input to the next call. That can mean 50,000 extra tokens or more, or even exceed the context window and break the flow. What do you do then?
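To make that duplication concrete, here is an illustrative trace of the traditional tool-calling loop (the message shapes are hypothetical, not literal API output):
// Illustrative only: the transcript enters the context twice.
// 1. The tool result is appended so the model can read it:
//    { role: "tool", content: "<2-hour transcript... ~50,000 tokens>" }
// 2. The model then re-emits the same text as arguments for the next call:
//    { role: "assistant", tool_call: { name: "salesforce.updateRecord",
//        arguments: { data: { Notes: "<2-hour transcript... again>" } } } }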
Code execution: what it is and how it helps
The core idea is simple: present MCP servers as code APIs inside a runtime. Instead of injecting all definitions into the prompt, the agent writes and runs code that calls the tools. That way you only load what you need and you can pre-process data before showing it to the model.
A common implementation is to generate a file tree with the available tools. For example, in TypeScript:
servers
├── google-drive
│   ├── getDocument.ts
│   └── index.ts
├── salesforce
│   ├── updateRecord.ts
│   └── index.ts
└── ...
Each tool corresponds to a file:
// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../../../client.js";

interface GetDocumentInput { documentId: string }
interface GetDocumentResponse { content: string }

// Thin wrapper: one MCP tool exposed as an ordinary typed function.
export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}
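The callMCPTool helper isn't shown in the original post; a minimal sketch, assuming the official TypeScript SDK and one connected client per server (the connections map is an assumption of this sketch), might look like this:
// Hypothetical ./client.ts: routes "server__tool" names to live MCP connections.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Assumed to be populated at startup, one connected Client per MCP server.
const connections: Record<string, Client> = {};

export async function callMCPTool<T>(qualifiedName: string, input: unknown): Promise<T> {
  const [serverName, toolName] = qualifiedName.split("__");
  const result = await connections[serverName].callTool({
    name: toolName,
    arguments: input as Record<string, unknown>,
  });
  // Assumes the tool returns a single JSON text block.
  const block = (result.content as Array<{ type: string; text: string }>)[0];
  return JSON.parse(block.text) as T;
}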
And the agent flow that used to go through the model becomes code that runs in the runtime:
// This runs in the sandboxed runtime; the transcript never enters the model's context.
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';
const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
objectType: 'SalesMeeting',
recordId: '00Q5f000001abcXYZ',
data: { Notes: transcript }
});
With this pattern, the agent discovers tools by listing the ./servers/ directory and opens only the files it needs. In a real case cited by Anthropic, this reduced usage from 150,000 tokens to 2,000 tokens — a 98.7% savings.
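Discovery can be plain filesystem access inside the sandbox; here is a sketch using Node's fs module, with paths mirroring the tree above:
import { readdirSync } from "node:fs";

// List available servers instead of loading every tool definition up front.
const servers = readdirSync("./servers"); // ["google-drive", "salesforce", ...]
// Open only the folder (and files) the current task actually needs.
const driveTools = readdirSync("./servers/google-drive"); // ["getDocument.ts", "index.ts"]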
Cloudflare called this "Code Mode" and the key point is clear: LLMs write code very well; you should take advantage of that.
Practical benefits
- Progressive disclosure: the model reads definitions on demand. You can add a search_tools tool that returns only the name, the description, or the full definition depending on a detail_level parameter, as sketched below.
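A sketch of what that could look like (the searchTools function, the detail_level values and the in-memory index are illustrative, not part of the MCP spec):
type DetailLevel = 'name' | 'description' | 'full';
interface ToolEntry { name: string; description: string; inputSchema: object }

// Illustrative index; in practice it would be built from the connected MCP servers.
const TOOLS: ToolEntry[] = [
  { name: 'gdrive.getDocument', description: 'Retrieves a document from Google Drive', inputSchema: {} },
];

export function searchTools(query: string, detailLevel: DetailLevel = 'name') {
  const q = query.toLowerCase();
  const matches = TOOLS.filter(t =>
    t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q));
  if (detailLevel === 'name') return matches.map(t => t.name);
  if (detailLevel === 'description') return matches.map(({ name, description }) => ({ name, description }));
  return matches; // 'full': only now do complete schemas enter the context
}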
- Efficient results: instead of passing 10,000 rows to the model, you filter and aggregate in the runtime and return only what’s necessary.
Example of sheet filtering:
const allRows = await gdrive.getSheet({ sheetId: 'abc123' });
const pendingOrders = allRows.filter(row => row["Status"] === 'pending');
console.log(`Found ${pendingOrders.length} pending orders`);
console.log(pendingOrders.slice(0, 5)); // only these 5 sample rows reach the model
- More powerful flow control: loops, conditionals and retries live in code instead of making round-trips to the model. That lowers latency and time-to-first-token.
Example of polling in Slack:
let found = false;
while (!found) {
const messages = await slack.getChannelHistory({ channel: 'C123456' });
found = messages.some(m => m.text.includes('deployment complete'));
if (!found) await new Promise(r => setTimeout(r, 5000)); // wait 5 s between polls
}
console.log('Deployment notification received');
- Privacy and tokenization: intermediate results can stay in the runtime. The MCP client can tokenize PII before it reaches the model and de-tokenize only when needed at the destination call. That enables flows where emails, phones or names never pass through the prompt in plain text.
Example of importing contacts with tokenization:
const sheet = await gdrive.getSheet({ sheetId: 'abc123' });
for (const row of sheet.rows) {
await salesforce.updateRecord({
objectType: 'Lead',
recordId: row.salesforceId,
data: { Email: row.email, Phone: row.phone, Name: row.name }
});
}
The runtime tokenizes row.email and row.phone before exposing them to the model. When the client makes the Salesforce call, they are de-tokenized locally.
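A minimal sketch of that layer, assuming the MCP client keeps an in-memory vault (tokenize, detokenize and the token format are assumptions, not a standard feature):
// Hypothetical PII vault: real values live only in the runtime, never in the prompt.
const vault = new Map<string, string>();
let counter = 0;

export function tokenize(value: string, kind: string): string {
  const token = `[${kind.toUpperCase()}_${++counter}]`;
  vault.set(token, value); // remember the real value locally
  return token; // the model only ever sees e.g. "[EMAIL_1]"
}

export function detokenize(text: string): string {
  // Swap tokens back to real values just before the destination API call.
  return text.replace(/\[[A-Z]+_\d+\]/g, t => vault.get(t) ?? t);
}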
- Persistence and skills: the filesystem lets you save intermediate results and reusable functions. An agent can write a function, store it in ./skills and reuse it in future runs, building a specialized toolbox; see the sketch below.
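As a sketch, a persisted skill can be an ordinary module the agent wrote once (the filename and function here are hypothetical):
// ./skills/sync-meeting-notes.ts: a reusable function the agent saved for later runs.
import * as gdrive from '../servers/google-drive';
import * as salesforce from '../servers/salesforce';

export async function syncMeetingNotes(documentId: string, recordId: string) {
  const transcript = (await gdrive.getDocument({ documentId })).content;
  await salesforce.updateRecord({
    objectType: 'SalesMeeting',
    recordId,
    data: { Notes: transcript },
  });
}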
Risks and operational requirements
Code execution isn’t free operationally. It requires:
- Secure runtimes and sandboxing.
- Resource limits and monitoring to avoid abuse or infinite loops.
- Security policies and review of agent-generated code.
These requirements add complexity compared to direct tool calls, but they are the price you pay for fewer tokens, lower latency and better tool composition.
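As one concrete guardrail, here is a minimal sketch of a wall-clock limit on agent-generated code (runWithTimeout is illustrative; a real sandbox would also cap memory, CPU and network access):
// Illustrative: reject any agent script that runs longer than `ms` milliseconds.
async function runWithTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}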
Practical recommendations to implement it
- Start by exposing tools as files in a servers structure and experiment with small workloads.
- Implement a search_tools(detail_level) so the agent discovers only what it needs.
- Add automatic PII tokenization in the MCP client.
- Enforce runtime time and memory limits, and log calls for audit.
- Turn repeated scripts into ./skills with a SKILL.md so the model learns to reuse them.
If you work at a startup or on a product team, this pattern lets you scale agents connected to many APIs without token costs wrecking the project’s viability. I’ve seen teams turn fragmented integrations into coherent platforms by applying these ideas: fewer tokens, less latency, more resilience.
Code execution with MCP applies classic software engineering patterns to the agent world: modularity, abstraction, and separation of concerns. It’s a practical, proven way to make agents work with many systems without overloading the model.
Original source
https://www.anthropic.com/engineering/code-execution-with-mcp
