XML vs TOON: Complete Format Comparison for LLM Optimization

Compare XML vs TOON for LLM prompts: token efficiency, verbosity analysis, and how TOON saves up to 69% on AI API costs.

By JSON to TOON Team

XML (eXtensible Markup Language) is a survivor. It outlived SGML, it coexists with JSON, and it still powers the world's biggest banking and healthcare systems. But XML was designed for a different era—an era of "Documents," not "Tokens."

In the era of Generative AI, where we pay for every token of context we feed to a model, XML's verbosity is a liability. TOON offers a modern alternative: it preserves the hierarchical structure that XML does so well, but strips away the "Markup" to leave only the "Meaning."

Documents vs Data: A Philosophical Split

XML's roots are in SGML (Standard Generalized Markup Language), the same parent as HTML. It was designed to mark up text documents.

<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>

Its superpower is "Mixed Content":

<p>This is <b>bold</b> and this is <i>italic</i>.</p>

This is amazing for publishing. It is terrible for data serialization. When you are sending a list of users to an API, you never need mixed content. You need strict Keys and Values.

TOON is for Data. It assumes structure, not prose.

note:
  to: Tove
  from: Jani
  body: "Don't forget me this weekend!"

The "Closing Tag Tax"

The most obvious inefficiency in XML is the closing tag.

If you have a tag `<internationalization_enabled>`, you must also have `</internationalization_enabled>`.

Math:

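  • Tag Name: 28 characters, written twice
  • Brackets + Slash: 5 characters (`<`, `>`, `</`, `>`)
  • Total Overhead: 61 characters to wrap a boolean `true`.

TOON Math:

  • Key Name: 28 characters
  • Colon + Space: 2 characters
  • Total Overhead: 30 characters.

Result: For this field, TOON's overhead is roughly half of XML's, purely because the name is written once instead of twice. The exact ratio depends on how long your names are relative to your values, and tokenizers count tokens rather than characters, but the effect compounds across a large payload. For an LLM with a 128k Context Window, a tag-heavy payload can come close to fitting twice as much data.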
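The character counts are easy to check by measuring the strings directly. The sketch below is only an illustration: the helper names `xml_overhead` and `toon_overhead` are made up for this example, and it counts characters, not tokens, so treat the ratio as a rough proxy.

```python
def xml_overhead(tag: str) -> int:
    """Characters spent on <tag> and </tag>, excluding the value itself."""
    opening = f"<{tag}>"
    closing = f"</{tag}>"
    return len(opening) + len(closing)


def toon_overhead(key: str) -> int:
    """Characters spent on 'key: ' before the value."""
    return len(f"{key}: ")


tag = "internationalization_enabled"
xml_chars = xml_overhead(tag)
toon_chars = toon_overhead(tag)

print(len(tag), xml_chars, toon_chars)  # 28 61 30
print(f"TOON overhead is {toon_chars / xml_chars:.0%} of XML's")
```

Shorter names shrink the gap in absolute terms, but the name is still written twice in XML and once in TOON.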

Attributes vs Elements: The Confusion

XML developers have argued for 20 years about how to represent data.

Option A (Attributes):

<user id="123" name="Alice" />

This is concise, but attributes cannot contain nested structures (like a list of addresses).

Option B (Elements):

<user>
  <id>123</id>
  <name>Alice</name>
</user>

This is flexible, but verbose.

The TOON Solution: Unification

TOON abolishes this distinction. Everything is a key-value pair.

user:
  id: 123
  name: Alice

This matches the mental model of modern programming languages (JSON Objects, Python Dicts, Java Maps).
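To make the unification concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The helpers `element_to_dict` and `to_toon` are hypothetical names for this example, and the code assumes flat records only (no nesting, no repeated tags, no quoting rules), so it is not a general converter.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Merge attributes and leaf child elements into one flat mapping."""
    data = dict(elem.attrib)
    for child in elem:
        data[child.tag] = (child.text or "").strip()
    return data


def to_toon(name: str, data: dict) -> str:
    """Emit the simple nested key-value form shown in this article."""
    lines = [f"{name}:"]
    lines += [f"  {key}: {value}" for key, value in data.items()]
    return "\n".join(lines)


option_a = ET.fromstring('<user id="123" name="Alice" />')
option_b = ET.fromstring("<user><id>123</id><name>Alice</name></user>")

# Both XML styles collapse to the same mapping, hence the same output text.
assert element_to_dict(option_a) == element_to_dict(option_b)
print(to_toon(option_a.tag, element_to_dict(option_a)))
```

Whichever XML style the source used, the printed result is the same three-line `user:` block.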

Parsing Complexity: DOM vs Stream

XML parsers are legendary for their complexity.

  • DOM Parsing: Loads the entire tree into memory, so RAM usage grows with document size and becomes a problem for large files.
  • SAX Parsing: Event-based (`startElement`, `endElement`). Extremely fast but painfully hard to write code for ("Callback Hell").
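For a sense of what event-based parsing looks like in practice, here is a minimal sketch with Python's standard `xml.sax` module. The `UserHandler` class and the sample `<users>` document are invented for this example. Note how much state has to be tracked by hand across callbacks just to rebuild two fields.

```python
import xml.sax


class UserHandler(xml.sax.ContentHandler):
    """Collects <user> records from a stream of SAX events."""

    def __init__(self):
        super().__init__()
        self.users = []
        self._current = None  # the user dict being built, if any
        self._field = None    # the child tag we are currently inside
        self._text = []       # text chunks seen inside that child tag

    def startElement(self, name, attrs):
        if name == "user":
            self._current = {}
        elif self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "user":
            self.users.append(self._current)
            self._current = None
        elif name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None


handler = UserHandler()
xml.sax.parseString(
    b"<users><user><id>123</id><name>Alice</name></user></users>", handler
)
print(handler.users)  # [{'id': '123', 'name': 'Alice'}]
```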

TOON is designed for Linear Streaming.

  • It reads line by line.
  • Indentation tells it the depth.
  • It emits objects as they complete.

Writing a TOON parser takes an afternoon. Writing a compliant XML parser takes a year.
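As a rough illustration of the line-by-line idea, the sketch below parses only the nested `key: value` subset used in this article. It is not a full TOON parser: it assumes two-space indentation, keeps every value as a string, and ignores arrays, tabular rows, and quoting rules. The function name `parse_nested` is made up for this example, and it builds one tree rather than emitting objects as they complete.

```python
def parse_nested(lines):
    """Parse indented `key: value` lines into nested dicts.

    `lines` can be any iterable of strings, including an open file,
    so the input is consumed one line at a time.
    """
    root = {}
    stack = [(-1, root)]  # (indent level, container opened at that level)
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        indent = (len(raw) - len(raw.lstrip(" "))) // 2
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        # On dedent, close every container at this depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value == "":
            # A key with no value opens a nested object.
            child = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = value.strip('"')
    return root


doc = """note:
  to: Tove
  from: Jani
  body: "Don't forget me this weekend!"
"""
print(parse_nested(doc.splitlines()))
```

The indentation depth alone decides which container a key belongs to, which is why no closing delimiter is needed.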

Token Economics: The 69% Savings

Let's look at a realistic "RAG Chunk" payload.

| Format | Content (50 chunks) | Tokens (GPT-4) | Cost  |
|--------|---------------------|----------------|-------|
| XML    | Verbose Markup      | 6,500          | $0.20 |
| TOON   | Tabular Data        | 2,015          | $0.06 |

That is 69% fewer tokens for the same content. The savings are not just money. They are Speed. These are input tokens, so the model has roughly 3x less context to read before it can start answering. For a chatbot, this can be the difference between "Snappy" and "Sluggish."
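The arithmetic behind those figures can be reproduced in a few lines. One assumption is needed: a price of $0.03 per 1,000 input tokens, which is not stated in the article but is consistent with the costs in the table. `Decimal` is used so that rounding to cents is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: USD 0.03 per 1,000 input tokens (inferred, not stated in the article).
PRICE_PER_1K = Decimal("0.03")


def cost_usd(tokens: int) -> Decimal:
    """Cost of `tokens` input tokens, rounded to the nearest cent."""
    raw = PRICE_PER_1K * tokens / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


xml_tokens, toon_tokens = 6500, 2015
savings = 1 - toon_tokens / xml_tokens

print(cost_usd(xml_tokens), cost_usd(toon_tokens))       # 0.20 0.06
print(f"{savings:.0%} fewer tokens")                      # 69% fewer tokens
print(f"{xml_tokens / toon_tokens:.1f}x fewer to read")   # 3.2x fewer to read
```

Swap in your own provider's rate and token counts to see what the difference means for your workload.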

Use Cases

When XML Wins

The Document Web.

If you are writing a book (`DocBook`), a technical manual (`DITA`), or a rich text document (XHTML), XML is superior. Mixed content such as `<p>text <b>bold</b> text</p>` is a feature TOON does not attempt to replicate.

When TOON Wins

The Data Web.

If you are building an API, feeding a vector database, or configuring an AI Agent, use TOON. The model doesn't care about angle brackets. It cares about relationships, values, and types.

Conclusion

XML feels like a "heavy" format because it carries the weight of 30 years of history.

TOON feels "light" because it carries only what is necessary for the task at hand: transferring structured data to intelligence.

If you are still sending XML to OpenAI, you are paying a "Legacy Tax." It's time to upgrade.
