
When NOT to Use TOON: The Prompt-Tax Trap and How to Pick a Format

TOON isn't always the cheapest option. Learn about the 'prompt tax', the data shapes where JSON or CSV win, and a framework for choosing an LLM data format.

By JSON to TOON Team

TOON is excellent for large, uniform arrays of objects — where it cuts tokens by up to 58.8% and outperforms JSON on retrieval accuracy. But on small payloads, highly nested structures, or generation tasks, TOON can cost more than JSON once you account for the prompt tax: the instructional overhead required to teach the model the format.

The Case for TOON — and Its Hidden Ceiling

The official toonformat.dev benchmarks ran 5,016 LLM calls across 209 questions, six formats, and four models. The headline numbers are genuinely impressive: TOON achieved 76.4% retrieval accuracy versus JSON's 75.0%, while consuming 39.9% fewer tokens — 27.7 accuracy-points per thousand tokens against JSON's 16.4. On uniform flat tables, TOON reduced tokens by 58.8% (67,778 vs 164,452 tokens).

Those numbers explain why developers reach for TOON. But the same benchmark also reveals a sharp boundary: on mixed structures, the reduction drops to 21.9% (227,830 vs 291,711 tokens). That 37-percentage-point gap between best and worst case is the first warning sign.
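Both percentages, and the gap between them, come straight from the raw token totals quoted above. Here is a quick arithmetic check in Python. The helper name is ours.

```python
def reduction_pct(json_tokens: int, toon_tokens: int) -> float:
    """Percentage of tokens TOON saves relative to JSON."""
    return (json_tokens - toon_tokens) / json_tokens * 100


# Token totals quoted above from the toonformat.dev benchmark.
flat = reduction_pct(164_452, 67_778)    # uniform flat tables
mixed = reduction_pct(291_711, 227_830)  # mixed structures

print(f"flat: {flat:.1f}%  mixed: {mixed:.1f}%  gap: {flat - mixed:.1f} points")
# prints: flat: 58.8%  mixed: 21.9%  gap: 36.9 points
```

The 36.9-point difference is what rounds to the 37-point gap.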

The second warning sign comes from independent research. A February 2026 arXiv paper (arXiv 2603.03306) — "Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation" — confirmed TOON's accuracy-per-token advantage for comprehension and retrieval, then immediately qualified it: for generation tasks, plain JSON had the best one-shot and final accuracy. The reason is the prompt tax.

What Is the Prompt Tax?

Every time you use TOON in a context where the model has not seen it before, you need to explain the format. A minimal instruction block — describing the array[n]{field1,field2}: table syntax, indentation rules, and unquoted string handling — typically runs 50 to 150 tokens depending on how thorough you need to be. That is the prompt tax.
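The following sketch makes the tax concrete. It keeps a primer string and prepends it only when the target model has not already been taught TOON. The primer wording is our own assumption and is not taken from the TOON spec. `approx_tokens` is a crude four-characters-per-token heuristic, not a real tokenizer.

```python
# Hypothetical primer: our own wording, not text from the TOON specification.
TOON_PRIMER = (
    "Data below is in TOON format. A header like users[3]{id,name}: "
    "declares an array of 3 objects with fields id and name; each "
    "indented line that follows is one object, values comma-separated "
    "in header order. Strings are unquoted unless they need quoting."
)


def approx_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)


def build_prompt(question: str, data: str, model_knows_toon: bool) -> str:
    """Prepend the primer (the prompt tax) only if the model is not primed."""
    parts = [] if model_knows_toon else [TOON_PRIMER]
    parts += [data, question]
    return "\n\n".join(parts)
```

With this wording, `approx_tokens(TOON_PRIMER)` falls inside the 50 to 150 token range mentioned above. Count with your model's real tokenizer before you rely on that figure.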

For a large dataset the tax is trivial: 100 tokens of instructions amortized over 10,000 rows of data is noise. But for a small payload, say a single user object or a five-item list, those 100 tokens can exceed the savings TOON provides. The arXiv paper frames this as a non-linear scaling hypothesis: TOON pays off only beyond a threshold where cumulative per-row syntax savings amortize the upfront overhead. Below that threshold, plain JSON wins on total token cost.
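The threshold reduces to a break-even calculation. TOON wins once n rows × per-row savings exceeds the one-off instruction overhead plus TOON's header. The sketch below assumes you already have per-row token averages for both formats. The function name and parameters are ours, not the paper's.

```python
import math


def break_even_rows(overhead: float, json_per_row: float,
                    toon_per_row: float, toon_header: float = 0.0) -> int:
    """Smallest row count n at which TOON (plus its one-off costs) beats JSON.

    TOON total: overhead + toon_header + n * toon_per_row
    JSON total: n * json_per_row
    """
    savings = json_per_row - toon_per_row
    if savings <= 0:
        raise ValueError("TOON never wins if it saves nothing per row")
    fixed = overhead + toon_header
    # TOON is strictly cheaper once n * savings > fixed.
    return math.floor(fixed / savings) + 1
```

Use per-row averages measured on your own data, because the result is only as reliable as those inputs. Small per-row savings, such as those from few or short keys, push the break-even point up quickly.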

This matters especially for generation. If you ask a model to output TOON, not just read it, you need format-constraining instructions or a constrained decoding scheme. The arXiv study found that constrained-decoding JSON outperformed TOON on generation accuracy for simple structures, even though JSON is more verbose per object. The most plausible reason is that models have deeply internalized JSON from their training data.
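How you set up constrained decoding depends on your provider or inference stack. The sketch below covers only the portable half of a JSON-output workflow: ask for JSON, then parse and validate the reply with Python's standard `json` module before trusting it. The schema and function name are hypothetical. A validation failure is your signal to re-prompt.

```python
import json

# Hypothetical expected schema for the config example in this post.
REQUIRED = {"model": (str,), "temperature": (int, float), "max_tokens": (int,)}


def parse_config(raw: str) -> dict:
    """Parse a model's JSON reply and check required keys and value types.

    Raises ValueError (json.JSONDecodeError is a subclass) on any problem,
    so callers can re-prompt the model.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, types in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"wrong type for {key}: {type(value).__name__}")
    return data


reply = '{"model": "gpt-5-nano", "temperature": 0.7, "max_tokens": 512}'
config = parse_config(reply)
```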

Side-by-Side: When TOON Saves vs. When It Costs

The contrast is clearest with a concrete example. Consider a tiny payload: one API config object.

// JSON — ~18 tokens, universally understood, zero instructions needed
{
  "model": "gpt-5-nano",
  "temperature": 0.7,
  "max_tokens": 512
}

// TOON — ~12 tokens for the data, but requires ~80 tokens of format instructions
// if the model hasn't been primed. Net result: TOON costs MORE on tiny objects.
model: gpt-5-nano
temperature: 0.7
max_tokens: 512

Now consider a large uniform dataset: 200 user records with four fields each.

// JSON — ~9,600 tokens (keys repeated 200× each)
[
  {"id": 1, "name": "Alice", "role": "admin", "active": true},
  {"id": 2, "name": "Bob",   "role": "viewer", "active": false},
  // … 198 more rows …
]

// TOON — ~3,940 tokens (keys declared once in the header)
// Savings: ~59% — easily absorbs any prompt-tax overhead
users[200]{id,name,role,active}:
  1,Alice,admin,true
  2,Bob,viewer,false
  // … 198 more rows …
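The savings come from the header naming each key once. The minimal encoder below handles exactly this uniform case. It is our own simplified sketch for illustration, not the official TOON library, and it deliberately leaves out the spec's quoting and escaping rules.

```python
def encode_table(name: str, rows: list[dict]) -> str:
    """Encode a uniform list of flat dicts as a TOON-style tabular block.

    Simplified sketch, not the official TOON encoder: it assumes every row
    has the same keys in the same order and only primitive values, and it
    skips TOON's quoting/escaping rules, so strings containing commas,
    colons, or newlines would produce invalid output.
    """
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0])
    if any(list(r) != fields for r in rows):
        raise ValueError("rows must share one schema, in the same key order")

    def fmt(value) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return "null"
        return str(value)

    # Keys are declared once in the header: name[count]{field1,field2,...}:
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(fmt(r[f]) for f in fields) for r in rows]
    return "\n".join([header, *body])


users = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "viewer", "active": False},
]
print(encode_table("users", users))
# users[2]{id,name,role,active}:
#   1,Alice,admin,true
#   2,Bob,viewer,false
```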

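Break-even is roughly the instruction overhead divided by the per-row savings. The 200-row example above averages about 48 tokens per JSON row versus about 20 per TOON row, so in theory a 100-token instruction block is repaid within a handful of rows. In practice, fewer or shorter keys shrink the per-row advantage, and the TOON header adds a small fixed cost, so the crossover moves up. Assuming a 100-token instruction overhead, treat roughly 10–20 objects as a conservative threshold for a four-field schema. Below that point, stick with JSON.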

Pick the Right Format: A Decision Table

The benchmarks from toonformat.dev and the arXiv study together map cleanly onto a format-selection heuristic. Use this table as a starting point:

| Data shape | Best format | Why |
|---|---|---|
| Large uniform array of objects (10+ rows, same schema) | TOON | Up to 58.8% token reduction; header amortizes prompt tax; 76.4% retrieval accuracy |
| Tiny or one-off payload (<10 objects) | JSON | Prompt tax erases savings; JSON is universally understood with zero instructions |
| Purely flat, single-level tabular data | CSV | Zero overhead; all models trained extensively on CSV; simpler tooling |
| Deeply nested configuration or document tree | YAML or JSON | TOON's table syntax does not help on non-repetitive nesting; YAML is more readable |
| Mixed structures (nested + arrays + scalars) | JSON | TOON only saves 21.9% on mixed data; not worth the format overhead |
| LLM needs to output structured data (generation) | JSON | arXiv 2603.03306: JSON wins on one-shot and final generation accuracy |
| Large data needing query, schema validation, or streaming | TONL | 32–50% token savings plus built-in query API, schema validation, 50GB+ streaming |
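If you route payloads programmatically, you can encode the table as a short list of rules. This sketch rests on our own assumptions. The `Payload` fields are hypothetical, the rule order is one reasonable reading of the table, and the 10-row cutoff is the rule of thumb above, not a measured constant.

```python
from dataclasses import dataclass


@dataclass
class Payload:
    """Hypothetical description of a payload and how the LLM will use it."""
    rows: int                   # number of objects
    uniform: bool               # do all objects share one schema?
    nested: bool                # any nested objects/arrays inside rows?
    generation: bool            # must the LLM *produce* this structure?
    csv_ok: bool = False        # do all consumers already parse CSV?
    human_config: bool = False  # hand-edited, deeply nested config?
    needs_query: bool = False   # query, schema validation, or streaming?


def pick_format(p: Payload) -> str:
    """Apply the decision table as ordered rules; the first match wins."""
    if p.generation:
        return "JSON"  # JSON wins on generation accuracy
    if p.human_config:
        return "YAML"  # readability over token efficiency
    if p.needs_query and p.uniform and p.rows >= 10:
        return "TONL"  # query API, validation, streaming
    if p.rows < 10:
        return "JSON"  # prompt tax would erase the savings
    if p.uniform and not p.nested:
        return "CSV" if p.csv_ok else "TOON"
    return "JSON"      # mixed or nested structures
```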

For a deeper comparison of all formats head-to-head, see our TOON format comparison guide.

Where TOON's Accuracy Drops Further

Even within TOON's sweet spot — large uniform arrays for comprehension — accuracy is not uniform across task types. The official benchmarks break down accuracy by question type:

  • Field retrieval: 99.6% — TOON is essentially perfect here
  • Structure awareness: 89.0%
  • Structural validation: 70.0%
  • Aggregation: 61.9%
  • Filtering: 56.8%

If your workload is primarily aggregation or filtering rather than simple retrieval, you should either combine TOON with a pre-processing step or consider TONL, which ships a built-in query API with sub-millisecond indexed lookups.
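A pre-processing step means doing the deterministic work (filters, sums, counts) in code and handing the model only the result. The sketch below uses made-up order data. The field names and the function are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical order rows; in practice these come from your database or API.
orders = [
    {"region": "EU", "status": "paid", "amount": 120.0},
    {"region": "US", "status": "refunded", "amount": 80.0},
    {"region": "EU", "status": "paid", "amount": 45.5},
    {"region": "US", "status": "paid", "amount": 200.0},
]


def paid_totals_by_region(rows: list[dict]) -> dict[str, float]:
    """Do the filtering and aggregation in code instead of in the prompt."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        if row["status"] == "paid":                 # filtering step
            totals[row["region"]] += row["amount"]  # aggregation step
    return dict(sorted(totals.items()))


summary = paid_totals_by_region(orders)
# The result is a tiny payload, so plain JSON is the right carrier here.
prompt_data = json.dumps(summary)
print(prompt_data)  # {"EU": 165.5, "US": 200.0}
```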

Model choice also matters. TOON accuracy on the official benchmark ranged from Gemini 3 Flash at 96.7% all the way down to Claude Haiku at 59.8% and Grok 4.1 at 58.4%. If you are routing to a smaller or less capable model, run your own accuracy test before committing to TOON in production.
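A small harness is enough for that test. For each model, ask the same questions against a JSON encoding and a TOON encoding of the same data and compare the scores. In this sketch, `ask` is a placeholder you would wire to your provider's SDK. Exact-match scoring is a deliberate simplification, and the official benchmark's grading may differ.

```python
from typing import Callable

# ask(data_block, question) -> answer text. Placeholder for your model call.
AskFn = Callable[[str, str], str]


def accuracy(ask: AskFn, data_block: str, cases: list[tuple[str, str]]) -> float:
    """Share of questions answered exactly (case-insensitive) for one encoding."""
    hits = sum(
        ask(data_block, question).strip().lower() == expected.strip().lower()
        for question, expected in cases
    )
    return hits / len(cases)


def compare_formats(ask: AskFn, encodings: dict[str, str],
                    cases: list[tuple[str, str]]) -> dict[str, float]:
    """Run the same questions against each encoding of the same data."""
    return {name: accuracy(ask, block, cases) for name, block in encodings.items()}


# Usage with a stub in place of a real model (hypothetical: always says "Alice").
def stub_ask(data_block: str, question: str) -> str:
    return "Alice"


encodings = {
    "json": '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]',
    "toon": "users[2]{id,name}:\n  1,Alice\n  2,Bob",
}
cases = [("Name of user 1?", "Alice"), ("Name of user 2?", "Bob")]
print(compare_formats(stub_ask, encodings, cases))
```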

When to Reach for TONL Instead

If your payload is large and uniform but you also need schema validation, streaming, or complex queries, TOON alone is not enough. TONL (Token-Optimized Notation Language) was designed for exactly this tier. It delivers 32–50% token savings versus JSON while adding a SQL-like query API with indexed lookups under 0.1ms, streaming support for files larger than 50GB in under 100MB of memory, and schema validation with auto TypeScript generation — all with zero runtime dependencies.

The trade-off is complexity: TONL's type hints (u32, str, bool) add roughly 20 tokens but unlock the full feature set. For a read-only retrieval task on a static dataset, plain TOON is simpler. For production pipelines, TONL is the safer bet.

Practical Rules of Thumb

Distilling the research into actionable guidance:

  • Use TOON when you have 10 or more objects sharing the same schema and the task is comprehension or retrieval, not generation. See our TOON best practices guide for production setup tips.
  • Use JSON for small payloads, highly non-uniform data, and any time the LLM must produce structured output. Read the JSON vs TOON deep-dive for a full token-by-token comparison.
  • Use CSV when your data is purely flat and every consumer already knows how to parse CSV. Our CSV vs TOON comparison covers the boundary cases in detail.
  • Use YAML for deeply nested human-edited configuration where readability matters more than token efficiency. The YAML vs TOON comparison has the numbers.
  • Measure before committing. The 21.9%–58.8% range is large. Run your actual dataset through the free converter and count the tokens before assuming TOON is worth the integration cost.
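When you measure, charge TOON for its instructions so the comparison reflects what you actually send. This sketch makes two assumptions. `approx_tokens` is a rough four-characters-per-token stand-in, and `primer_tokens` is whatever your instruction block costs (zero if the model is already primed).

```python
from typing import Callable


def approx_tokens(text: str) -> int:
    """Rough fallback of ~4 characters per token; swap in a real tokenizer."""
    return max(1, len(text) // 4)


def net_savings(json_text: str, toon_text: str, primer_tokens: int,
                count: Callable[[str], int] = approx_tokens) -> dict:
    """Compare both encodings of your data, charging TOON for its primer."""
    j, t = count(json_text), count(toon_text)
    return {
        "json_tokens": j,
        "toon_tokens": t,
        "raw_savings_pct": round((j - t) / j * 100, 1),
        "net_savings": j - (t + primer_tokens),  # negative: JSON is cheaper
    }
```

For real counts, pass your model's tokenizer as `count`. With OpenAI's `tiktoken` package, which is a separate install, that could be `count=lambda s: len(enc.encode(s))`, where `enc = tiktoken.get_encoding("o200k_base")`.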

Frequently Asked Questions

When should I not use TOON?

Avoid TOON for tiny or one-off payloads (fewer than roughly 10 objects), highly nested or non-uniform data structures, and any task where you need the LLM to output the format. On mixed structures, TOON only saves 21.9% vs 58.8% on flat arrays, and the prompt-tax overhead can erase those gains entirely.

What is the prompt tax?

The prompt tax is the token cost of the instructions you must include to teach or constrain a model to read or write TOON. A 2026 arXiv study (2603.03306) found this overhead can outweigh TOON's per-row savings on short contexts, making plain JSON more efficient for small, one-off payloads.

Is TOON always smaller than JSON?

No. TOON's token reduction ranges from 21.9% on mixed structures to 58.8% on flat uniform tables, according to official toonformat.dev benchmarks. On highly non-uniform or deeply nested data, JSON or YAML can match or beat TOON once format-instruction overhead is included.

Should the LLM output TOON or JSON?

For generation tasks, prefer JSON. The 2026 arXiv study on TOON vs JSON found that plain JSON had the best one-shot and final accuracy when the model had to produce structured output. TOON's advantage is strongest when used as the input format for comprehension and retrieval, not as the output schema.

Which format wins on purely flat data?

CSV wins on purely flat, single-level tabular data because it has zero structural overhead. TOON's table block is very close, but CSV requires no format instructions at all since every model already understands it. Use TOON over CSV when your data has at least one level of nesting or mixed types.
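For comparison, Python's standard `csv` module produces the flat version with no format instructions at all. The sample rows are illustrative. Lowercasing booleans is our own choice, made to match the TOON example earlier.

```python
import csv
import io

rows = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "viewer", "active": False},
]


def to_csv(records: list[dict]) -> str:
    """Write flat records as CSV; the csv module handles quoting for us."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # csv writes Python bools as True/False; lowercase them for LLM input.
        writer.writerow({k: str(v).lower() if isinstance(v, bool) else v
                         for k, v in rec.items()})
    return buf.getvalue()


print(to_csv(rows), end="")
# id,name,role,active
# 1,Alice,admin,true
# 2,Bob,viewer,false
```

A practical advantage is that `csv.DictWriter` already quotes any value containing a comma, a quote character, or a newline.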
