TOON Format Specification: Complete Guide to Grammar and Syntax
Status: Draft Standard (v2.0)
This document defines the technical specification for TOON (Token-Oriented Object Notation), a data serialization format optimized for Large Language Model (LLM) context windows. See the official project page for updates.
The key goals of this specification are:
- Maximizing Token Density: Representing data in the fewest possible BPE (Byte Pair Encoding) tokens.
- Human Readability: Maintaining a structure that is intuitive for humans and easy for LLMs to reason about.
- Lossless JSON Conversion: Ensuring 1:1 mapping with standard JSON types.
1. Document Structure
A TOON document MUST be encoded in UTF-8. A TOON document represents exactly one root node, which may be an Object or an Array.
2. Whitespace and Indentation
TOON uses Significant Whitespace to denote hierarchy.
- Indent Unit: The standard indentation is 2 spaces (U+0020). Tabs (U+0009) are forbidden to ensure consistent tokenization across different models.
- Newlines: Lines must end with `\n` (LF) or `\r\n` (CRLF).
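The two rules above can be checked mechanically. The following is a minimal sketch, not part of the specification: the helper names `split_lines` and `indent_level` are hypothetical, and it assumes the tab restriction only needs to be enforced on leading whitespace.

```python
def split_lines(text: str) -> list[str]:
    """Split a document on LF or CRLF line endings."""
    return text.replace("\r\n", "\n").split("\n")


def indent_level(line: str, unit: int = 2) -> int:
    """Return the nesting level of a line, or raise ValueError if the indentation is invalid."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    if "\t" in leading:
        # Assumption: tabs are rejected in leading whitespace only.
        raise ValueError("tabs are not allowed in indentation")
    if len(leading) % unit != 0:
        raise ValueError(f"indentation must be a multiple of {unit} spaces")
    return len(leading) // unit
```

For example, `indent_level("    name: Alice")` gives level 2, while a line indented with three spaces or a tab raises `ValueError`.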
3. Objects
An Object is a collection of key-value pairs.
Syntax: key: value
user:
  name: Alice
  role: admin
- The colon `:` is mandatory. It must be followed by at least one space if a value follows on the same line.
- If the value is a nested object/array, the newline follows immediately after the colon (or key).
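As an illustration of these rules, here is a small encoder sketch for nested objects. The function name `encode_object` is hypothetical, and the sketch assumes every leaf value is already safe to write as a bare word (no quoting is attempted).

```python
def encode_object(obj: dict, level: int = 0) -> list[str]:
    """Encode a nested dict as TOON object lines, one indent unit (2 spaces) per level."""
    pad = "  " * level
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object: the newline follows immediately after the colon.
            lines.append(f"{pad}{key}:")
            lines.extend(encode_object(value, level + 1))
        else:
            # Scalar: colon, one space, then the value on the same line.
            lines.append(f"{pad}{key}: {value}")
    return lines


print("\n".join(encode_object({"user": {"name": "Alice", "role": "admin"}})))
```

The call at the end prints the `user` example from this section.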
4. Arrays
Arrays can be represented in two ways: List Style and Table Style.
4.1 List Style (Heterogeneous)
Used when array items have different structures or are primitives.
tags:
  - featured
  - new
  - on-sale
The hyphen `-` denotes a list item.
4.2 Table Style (Homogeneous)
Used when array items are objects sharing the same keys. This is the primary optimization feature of TOON.
Syntax header: key[Count]{field1, field2}:
users[3]{id, name, score}:
  1, Alice, 99
  2, Bob, 85
  3, Charlie, 42
Rules:
- Count: The `[N]` gives the number of rows. It signals to the LLM that N rows with the same structure follow.
- Fields: The `{a, b}` defines the schema for the rows.
- Separator: Values are separated by comma `,`.
Note: Using pipes `|` for visual alignment is allowed by some parsers but discouraged in the strict spec as it consumes extra tokens.
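To make the header and row rules concrete, the sketch below encodes a list of same-keyed dicts in Table Style. It is an illustration under stated assumptions, not a reference encoder: `encode_table` is a hypothetical name, rows are indented by one unit as in the grammar's `row` rule, and values are assumed not to need quoting.

```python
def encode_table(key: str, rows: list[dict]) -> str:
    """Encode a list of same-keyed dicts as a TOON table-style array."""
    if not rows:
        raise ValueError("at least one row is needed to derive the field list")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("all rows must share the same keys in the same order")
    # Header: key[Count]{field1, field2}:
    header = key + "[" + str(len(rows)) + "]{" + ", ".join(fields) + "}:"
    # Each row: one indent unit, values separated by a comma and a space.
    body = ["  " + ", ".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)


users = [
    {"id": 1, "name": "Alice", "score": 99},
    {"id": 2, "name": "Bob", "score": 85},
    {"id": 3, "name": "Charlie", "score": 42},
]
print(encode_table("users", users))
```

The field names appear once in the header instead of once per row, which is where the token savings over a JSON array of objects come from.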
5. Values and Types
5.1 Strings
Strings can be Unquoted or Quoted.
Unquoted Strings (Bare Words): Any sequence of characters that does not start with a special character (`-`, `[`, `{`, `"`, `#`) and does not contain newlines or delimiters.
status: active
color: light blue
Quoted Strings: Double quotes `"` are required if the string contains special characters, starts/ends with whitespace, or resembles a boolean/number/null.
greeting: "Hello, World!"
empty: ""
number_string: "123"
5.2 Numbers
Numbers follow the standard JSON number format (integer, float, exponent).
count: 42
temp: 36.6
avogadro: 6.022e23
5.3 Booleans
Literals `true` and `false` (lowercase).
5.4 Null
Literal `null`.
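The type rules in this section can be summarized as a scalar-resolution function. This is a rough sketch with a hypothetical name (`parse_scalar`). It assumes quoted strings use JSON-style escapes, which this document does not state, and it treats anything that is not a literal or a JSON number as a bare word.

```python
import json
import re

# JSON number grammar: integer part, optional fraction, optional exponent.
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def parse_scalar(token: str):
    """Map one TOON scalar token to a Python value."""
    token = token.strip()
    if token.startswith('"'):
        # Assumption: quoted strings follow JSON string syntax.
        return json.loads(token)
    if token == "true":
        return True
    if token == "false":
        return False
    if token == "null":
        return None
    if _NUMBER.fullmatch(token):
        is_float = any(c in token for c in ".eE")
        return float(token) if is_float else int(token)
    return token  # bare word


print(parse_scalar("42"), parse_scalar('"123"'), parse_scalar("light blue"))
```

Checking quotes first is what keeps `"123"` a string while the bare token `123` becomes a number.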
6. Comments
Comments start with `#` and extend to the end of the line.
# This is a comment
config:
  timeout: 5000 # milliseconds
7. ABNF Grammar (Excerpt)
The following is a simplified ABNF definition.
TOON = object / list
NL = %x0A / %x0D.0A
Indent = 2SP
object = 1*pair
pair = key ":" [SP value] NL
list = 1*list-item / table-array
list-item = "- " value NL
table-array = key "[" integer "]" "{" field-list "}" ":" NL *row
row = Indent value *("," SP value) NL
value = string / number / boolean / null / object / list
8. Parsing Implementation Guide
When implementing a TOON parser, the primary challenge is Context Tracking. Since structure is defined by indentation, the parser must maintain a stack of current indentation levels.
Algorithm Sketch:
- Read line.
- Calculate indentation level (count leading spaces / 2).
- If level > current_level: Push new container (Object/Array).
- If level < current_level: Pop containers until match.
- Parse content (Key-Value or List Item).
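The steps above can be sketched for the simplest case, nested objects with scalar values. This is an illustrative reduction rather than a complete parser: `parse_objects` is a hypothetical name, values are kept as raw strings, and lists, tables, comments, and quoted strings are deliberately left out. As a small variation on the sketch, a new container is pushed when a key has no inline value, and a deeper line is accepted only if such a container is open.

```python
def parse_objects(text: str) -> dict:
    """Parse indentation-nested `key: value` lines into nested dicts."""
    root: dict = {}
    stack = [root]  # stack[i] is the open object at nesting level i
    for raw in text.replace("\r\n", "\n").split("\n"):
        if not raw.strip():
            continue  # skip blank lines
        content = raw.lstrip(" ")
        if content.startswith("\t"):
            raise ValueError(f"tab in indentation: {raw!r}")
        spaces = len(raw) - len(content)
        if spaces % 2:
            raise ValueError(f"indentation is not a multiple of 2: {raw!r}")
        level = spaces // 2
        if level > len(stack) - 1:
            raise ValueError(f"unexpected indent: {raw!r}")
        del stack[level + 1:]  # pop containers until the level matches
        key, sep, rest = content.partition(":")
        if not sep:
            raise ValueError(f"missing colon: {raw!r}")
        value = rest.strip()
        if value:
            stack[-1][key.strip()] = value
        else:
            child: dict = {}
            stack[-1][key.strip()] = child
            stack.append(child)  # push the new container
    return root


doc = "user:\n  name: Alice\n  role: admin\nconfig:\n  timeout: 5000"
print(parse_objects(doc))
```

Keeping one stack entry per nesting level means a dedent of any depth is handled by a single truncation of the stack.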
9. Test Suite
To ensure interoperability, all implementations should pass the TOON Core Test Suite (available on GitHub). It covers:
- Deep nesting limits (default: 100).
- Unicode handling (emojis, CJK characters).
- Corner cases (empty keys, empty strings, trailing commas).