Token-Efficient AI Agents: Using TOON for Tool Calls and MCP Pipelines

AI agents accumulate context with every turn: the system prompt, every tool schema, every prior result, and the new result all re-enter the window simultaneously. That makes tool results — not prompts — the fastest-growing line on your token bill. Encoding those results as TOON before injecting them into the loop compounds the savings across every subsequent turn.

Why tool results dominate token cost in agent loops

A single-turn LLM call has a predictable shape: system prompt + user message + response. An agent loop is different. Each turn re-sends everything the model needs to maintain continuity: the original system prompt, all prior assistant messages, all prior tool calls, and all prior tool results. By turn 10, the context can be dominated by accumulated tool output.

Tool results are also structurally ideal candidates for TOON compression. They are almost always uniform arrays of objects — search results, database rows, API responses — the exact shape where TOON performs best. According to the official TOON benchmarks (5,016 LLM calls across four frontier models), flat uniform tables achieved a 58.8% token reduction vs JSON. Time-series data saved 59.0%. Even nested e-commerce-style objects saved 33.3%.

Overall across all data shapes, TOON used 39.9% fewer tokens than JSON while achieving 76.4% retrieval accuracy vs JSON's 75.0% — and it delivered 27.7 accuracy-points per 1,000 tokens compared to JSON's 16.4. Accuracy and economy, not a trade-off.

JSON tool result vs TOON: a concrete example

Consider a tool that returns a list of open GitHub issues. As JSON, a typical batch of five records looks like this:

[
  {"id": 1042, "title": "Fix rate limiter timeout", "state": "open", "author": "alice", "labels": ["bug", "p1"]},
  {"id": 1038, "title": "Add CSV export", "state": "open", "author": "bob", "labels": ["feature"]},
  {"id": 1031, "title": "Docs: update auth section", "state": "open", "author": "carol", "labels": ["docs"]},
  {"id": 1027, "title": "Memory leak in stream handler", "state": "open", "author": "alice", "labels": ["bug", "p0"]},
  {"id": 1019, "title": "Support YAML config", "state": "open", "author": "dave", "labels": ["feature", "p2"]}
]

As TOON, the same five records become:

issues[5]{id,title,state,author,labels}:
  1042, Fix rate limiter timeout, open, alice, bug|p1
  1038, Add CSV export, open, bob, feature
  1031, Docs: update auth section, open, carol, docs
  1027, Memory leak in stream handler, open, alice, bug|p0
  1019, Support YAML config, open, dave, feature|p2

The header line (issues[5]{id,title,state,author,labels}:) gives the model an explicit schema and row count to validate against. Every redundant key, brace, bracket, and quote from the JSON version is gone, and each labels array is flattened into a pipe-joined string so that every row stays a flat record. On this kind of uniform array, token counts drop by roughly half.
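Because the header declares both the field list and the row count, a consumer can check a TOON table mechanically. The following is a minimal sketch, assuming the single-table layout shown above; parseToonHeader and rowCountMatches are hypothetical helper names rather than part of any TOON library, and the regular expression covers only this simple header shape.

```typescript
// Hypothetical helpers for illustration; not part of @toon-format/toon.
interface ToonHeader {
  key: string;
  declaredRows: number;
  fields: string[];
}

// Parses a header such as "issues[5]{id,title,state,author,labels}:".
// Returns null when the line does not match this simple shape.
function parseToonHeader(line: string): ToonHeader | null {
  const match = /^(\w+)\[(\d+)\]\{([^}]*)\}:$/.exec(line.trim());
  if (!match) return null;
  return {
    key: match[1] ?? "",
    declaredRows: Number(match[2] ?? "0"),
    fields: (match[3] ?? "").split(",").map((f) => f.trim()),
  };
}

// True when the number of non-empty data rows equals the declared count.
function rowCountMatches(toon: string): boolean {
  const lines = toon.split("\n");
  const header = parseToonHeader(lines[0] ?? "");
  if (!header) return false;
  const rows = lines.slice(1).filter((l) => l.trim().length > 0);
  return rows.length === header.declaredRows;
}

const sample = [
  "issues[2]{id,title,state}:",
  "  1042, Fix rate limiter timeout, open",
  "  1038, Add CSV export, open",
].join("\n");

console.log(parseToonHeader(sample.split("\n")[0] ?? ""));
console.log(rowCountMatches(sample)); // true: 2 rows declared, 2 rows present
```

A check like this is cheap to run on a tool result before it enters the context, and it catches truncated payloads that a bare JSON array would not flag.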

You can convert any JSON tool result instantly using the json2toon.co converter, or automate it with the @toon-format/toon npm package in your tool wrapper.

Token accumulation math across N turns

To see why compounding matters, model a simple loop where one tool fires each turn and returns 2,000 tokens of JSON. By turn 10, the agent has re-sent those results a combined 45 times (turn 2 re-sends result 1, turn 3 re-sends results 1 and 2, and so on). Across all turns of a 10-step loop, the total token cost from that one tool's results accumulates:

JSON cost per result:  2,000 tokens
TOON cost per result:    820 tokens  (~59% savings on uniform arrays)

Cumulative re-sends in a 10-turn loop:
  Turn 1: 0 prior results
  Turn 2: 1 result re-sent
  ...
  Turn 10: 9 results re-sent
  Total re-sends: 0+1+2+...+9 = 45

Extra tokens from re-injection:
  JSON:  45 × 2,000 = 90,000 tokens
  TOON:  45 × 820   = 36,900 tokens
  Saved: 53,100 tokens over 10 turns — from one tool

At GPT-5 input pricing (estimated at ~$2.50/M tokens), 53,100 tokens is about $0.13 per 10-turn session. That is trivial for one session, but at 10,000 sessions per day it comes to roughly $1,330 saved daily, or about $485,000 per year, from converting a single tool's output format. Heavier tools with richer results, or longer loops, amplify the math further.
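The re-send arithmetic above can be reproduced for any loop length. This is a small sketch under the same simplifying assumptions as the worked numbers (one tool call per turn, every result the same size); the function names are illustrative, and the $2.50 price is the article's estimate rather than a quoted rate.

```typescript
// Number of times prior results are re-sent across a loop of `turns` turns:
// 0 + 1 + ... + (turns - 1) = turns * (turns - 1) / 2.
function totalResends(turns: number): number {
  return (turns * (turns - 1)) / 2;
}

// Tokens spent re-injecting prior results, assuming equal-sized results.
function reinjectionTokens(turns: number, tokensPerResult: number): number {
  return totalResends(turns) * tokensPerResult;
}

// Dollar cost of a token count at a given price per million input tokens.
function costUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

const loopTurns = 10;
const jsonTokens = reinjectionTokens(loopTurns, 2_000); // 45 * 2,000 = 90,000
const toonTokens = reinjectionTokens(loopTurns, 820); // 45 * 820 = 36,900
const savedTokens = jsonTokens - toonTokens; // 53,100

// Estimated price per million input tokens (assumption carried over from the text).
const savedPerSession = costUsd(savedTokens, 2.5);

console.log({ jsonTokens, toonTokens, savedTokens });
console.log(`saved per session: $${savedPerSession.toFixed(2)}`); // $0.13
```

Because the re-send count grows quadratically, doubling the loop to 20 turns raises it from 45 to 190, so the per-session saving grows much faster than the turn count.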

For a broader framework on controlling API spend, see our guide to optimizing LLM API costs.

Where to place the conversion: inside the tool wrapper

The right place to convert is as late as possible before the result enters the prompt — inside your MCP server tool handler or equivalent tool wrapper function. This keeps your database, downstream APIs, and business logic working with normal JSON. Only the string that gets returned to the agent loop is TOON.

Here is the pattern for an MCP tool handler in TypeScript:

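import { encode } from "@toon-format/toon";

// Placeholder GitHub client, declared only so the example type-checks;
// substitute whatever client your app already uses.
declare const github: {
  issues: { list(args: { repo: string; state: string }): Promise<object[]> };
};

// MCP tool handler
async function handleListIssues(args: { repo: string; state: string }) {
  // Fetch raw data; it stays as JSON throughout your app
  const issues = await github.issues.list(args);

  // Convert to TOON only at the boundary, before returning to the agent.
  // Wrapping the array in an object gives the TOON header its "issues" key.
  const toonResult = encode({ issues });

  return {
    content: [{ type: "text" as const, text: toonResult }],
  };
}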

MCP does not prescribe a wire format for tool-result content beyond UTF-8 text, so returning a TOON string is fully spec-compliant. The model receives it as plain text and reads it correctly — TOON is a human-readable format that frontier models handle without additional instruction on large, uniform payloads.
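If several tools need the same treatment, the boundary conversion can be factored into one wrapper. The sketch below is an assumption-laden illustration rather than an MCP SDK feature: withEncodedResult and TextToolResult are hypothetical names, the encoder is passed in as a parameter (in practice that would be encode from @toon-format/toon), and the JSON fallback is a design choice added here, not something the source prescribes.

```typescript
// Shape of a text tool result, mirroring the handler shown above.
interface TextToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical wrapper: takes a handler that returns plain data plus an
// encoder function, and returns a handler that emits the encoded string.
// Falls back to JSON if the encoder throws, so the tool stays usable.
function withEncodedResult<A, R>(
  handler: (args: A) => Promise<R>,
  encodeFn: (value: R) => string,
): (args: A) => Promise<TextToolResult> {
  return async (args) => {
    const data = await handler(args);
    let text: string;
    try {
      text = encodeFn(data);
    } catch {
      text = JSON.stringify(data);
    }
    return { content: [{ type: "text", text }] };
  };
}

// Usage with a stand-in encoder so the example runs without dependencies.
const listIds = withEncodedResult(
  async (args: { count: number }) =>
    Array.from({ length: args.count }, (_, i) => ({ id: i })),
  (rows) => `rows[${rows.length}]`,
);

listIds({ count: 3 }).then((result) => {
  console.log(result.content[0]?.text); // rows[3]
});
```

Injecting the encoder keeps the business logic unaware of TOON and makes it easy to switch a single tool back to JSON while measuring accuracy.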

If you are building RAG pipelines rather than tool-calling agents, the same principle applies. See optimizing RAG pipelines with TOON for retrieval-specific patterns.

The critical caveat: read TOON, generate JSON

TOON's efficiency advantage is asymmetric. It excels when the model reads structured data from context. It does not excel when the model writes structured data as output.

An independent benchmark posted on arXiv (2603.03306, "Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation") found that for generation tasks (asking the model to produce structured output), plain JSON had better one-shot and final accuracy. The reason is what the authors call the prompt tax: the instructional overhead required to teach or constrain the model to emit TOON eats into the format savings, especially in short contexts.

The same study identified a scaling threshold: TOON's efficiency is non-linear. On small payloads, the format instructions can cost more than they save. TOON pays off on large, repetitive arrays where the per-row syntax savings amortize the upfront overhead — exactly the shape of most tool results.
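One way to reason about that threshold is as a break-even point between a fixed overhead and a per-row saving. The sketch below is a simplified linear model, not the method used in the study; breakEvenRows is a hypothetical helper, and the token figures in the example are invented for illustration and should be replaced with counts from your own tokenizer.

```typescript
// Smallest row count at which TOON's per-row savings cover its fixed overhead
// (header line plus any format instructions you add to the prompt).
function breakEvenRows(
  fixedOverheadTokens: number,
  jsonTokensPerRow: number,
  toonTokensPerRow: number,
): number {
  const savedPerRow = jsonTokensPerRow - toonTokensPerRow;
  if (savedPerRow <= 0) return Infinity; // TOON never pays off for this shape
  return Math.ceil(fixedOverheadTokens / savedPerRow);
}

// Hypothetical measurements: 60 tokens of overhead, 30 vs 12 tokens per row.
console.log(breakEvenRows(60, 30, 12)); // 4, since 60 / 18 rounds up to 4
```

Below the returned row count the overhead dominates; above it, every additional row widens the gap in TOON's favor.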

The practical recommendation is therefore:

Tool results injected into context — encode as TOON before returning.
Tool-call argument schemas (what the model must emit to call the next tool) — keep as JSON. Do not ask the model to generate TOON.
Model final output — JSON, or whatever your application expects. Do not add a TOON generation requirement.

For a deeper look at how this interacts with model-specific accuracy, the Claude and TOON efficiency post breaks down per-model benchmark results in detail.

Where TOON helps in an agent loop

Not every part of the agent context benefits equally. This table summarizes the verdict for each layer:

Context layer	Typical size	Re-sent each turn?	TOON verdict
System prompt	Small–medium; mostly prose	Yes	Not applicable — prose, not structured data
Tool schemas	Small; JSON Schema objects	Yes	Keep as JSON — model must emit matching arg shapes
Tool results	Medium–large; uniform arrays	Yes — all prior results	Use TOON — highest impact
Model output (tool calls)	Small; JSON arg objects	Yes, as history	Keep as JSON — do not ask model to generate TOON
Model final answer	Variable; prose or JSON	Only if multi-turn	Keep as JSON or prose — generation prompt tax applies

The takeaway is that tool results are the only layer that is both large and safe to encode as TOON. Everything else either is not structured data, needs JSON for accurate generation, or is too small to amortize the format overhead.

Caveats and when not to use TOON

TOON's gains are real but not universal. Keep these constraints in mind:

Small payloads below the scaling threshold. The arXiv study found TOON's efficiency is non-linear: on small, non-repetitive results, the format's upfront overhead can exceed the per-row savings. If your tool returns fewer than 5–10 rows, measure before committing.
Highly non-uniform data. Flat, uniform arrays are where TOON saves 58–59%. On mixed structures — objects with wildly different key sets — the official benchmarks show only 21.9% savings. JSON or YAML may be a better fit there.
Models with lower TOON comprehension. Per the official benchmarks, Gemini 3 Flash achieved 96.7% accuracy on TOON and GPT-5 Nano 90.9%, but Claude Haiku scored 59.8% and Grok 4.1 scored 58.4%. If your agent uses a smaller or less capable model, test accuracy before deploying TOON in production.
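Streaming partial results. If your tool streams a result in pieces, the TOON table header must be emitted first, and because that header declares the row count, the total must be known before the first row is sent. Ensure your encoding library flushes the header before streaming rows.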
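The first two caveats can be turned into a guard that runs before encoding. This is a minimal sketch under stated assumptions: isUniformFlat and shouldEncodeAsToon are hypothetical helpers, the default threshold of 10 rows is simply the upper end of the 5–10 row guideline above, and the right cutoff for a given tool should come from measurement.

```typescript
type Row = Record<string, unknown>;

// True when every row has exactly the same keys and only primitive values.
function isUniformFlat(rows: Row[]): boolean {
  const first = rows[0];
  if (first === undefined) return false;
  const expectedKeys = Object.keys(first).sort().join(",");
  return rows.every((row) => {
    const sameKeys = Object.keys(row).sort().join(",") === expectedKeys;
    const allPrimitive = Object.values(row).every(
      (v) => v === null || typeof v !== "object",
    );
    return sameKeys && allPrimitive;
  });
}

// Hypothetical policy: use TOON only for uniform, flat results with enough rows.
function shouldEncodeAsToon(rows: Row[], minRows = 10): boolean {
  return rows.length >= minRows && isUniformFlat(rows);
}

const fewRows: Row[] = [
  { id: 1, title: "a" },
  { id: 2, title: "b" },
];
const manyRows: Row[] = Array.from({ length: 12 }, (_, i) => ({
  id: i,
  title: `issue ${i}`,
}));
const mixedRows: Row[] = [...manyRows, { id: 99, labels: ["bug"] }];

console.log(shouldEncodeAsToon(fewRows)); // false: too few rows
console.log(shouldEncodeAsToon(manyRows)); // true
console.log(shouldEncodeAsToon(mixedRows)); // false: key sets differ
```

When the guard returns false, returning plain JSON for that result keeps the tool correct while avoiding the cases where TOON's savings are smallest.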

For a broader look at trade-off patterns across chatbot and conversational agents, see building a cost-efficient chatbot with TOON.

Frequently Asked Questions

Should AI agents use TOON or JSON?

Use TOON for tool results injected into context, since that is the fastest-growing token cost in a multi-turn loop. Keep JSON for tool-call arguments the model generates. In the official benchmarks TOON used 39.9% fewer tokens overall at 76.4% retrieval accuracy, but the arXiv 2603.03306 study found that asking models to output TOON introduces a prompt-tax overhead that erases most of the gain.

Does MCP support TOON?

MCP (Model Context Protocol) does not prescribe a wire format for tool-result content beyond UTF-8 text. You can encode the result string as TOON inside your MCP server's tool handler before returning it. The model receives it as plain text and reads it correctly, since TOON is a human-readable text format.

Should the model output TOON?

Generally no. An independent benchmark (arXiv 2603.03306) found that for generation tasks — where the model must write structured data — plain JSON had better one-shot and final accuracy. TOON's advantage is in reading large context payloads, not in writing them. Keep tool-call argument schemas as JSON.

How much do agents save with TOON?

Savings depend heavily on data shape. Flat uniform arrays (the most common tool-result type) saw 58.8% token reduction in official TOON benchmarks. E-commerce-style nested results saved 33.3%. In a 10-turn loop where each turn's tool result costs 2,000 tokens as JSON and roughly 820 tokens as TOON, the re-sent context shrinks by approximately 53,100 tokens over the full loop.

Where exactly in the agent pipeline should I convert to TOON?

Convert inside the tool wrapper or MCP server handler, immediately after fetching raw data and before returning the result string. This keeps the conversion concern in one place, leaves your database and downstream APIs using JSON, and means every layer of the agent loop — including context re-injection on subsequent turns — receives the compact TOON representation.
