MessagePack vs TOON: Binary Wire Formats vs LLM-Readable Tokens
MessagePack is often billed as about half the size of JSON on the wire, but binary formats bloat under Base64 inside LLM prompts. Here's why TOON wins for prompts and MessagePack wins for transport.
MessagePack wins on the network: it is roughly 37% smaller than JSON uncompressed and keeps bytes off the wire. TOON wins inside the prompt: 39.9% fewer tokens with equal or better retrieval accuracy. They are not competing for the same job. MessagePack optimizes transport; TOON optimizes the LLM context window. Use both.
What Problem Does Each Format Actually Solve?
MessagePack and its cousin BSON were designed as binary alternatives to JSON. They encode the same key-value structure as JSON but in a binary representation. For MessagePack in particular, the result is smaller payloads between services, faster inter-process communication, and lower storage overhead.
TOON was designed for a completely different layer: the LLM context window. It is a text format that declares field names once in a header and then emits only values per row — collapsing the repetitive key-and-brace overhead that JSON pays on every object in an array. The official toonformat.dev benchmarks (5,016 LLM calls, 209 questions, six formats, four models) show TOON achieving 76.4% retrieval accuracy while using 39.9% fewer tokens than JSON — 27.7 accuracy-points per thousand tokens versus JSON's 16.4.
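The efficiency figure can be cross-checked against the token saving with a few lines of arithmetic. This is a sketch under one assumption: that "accuracy-points per thousand tokens" means accuracy divided by tokens in thousands. The JSON accuracy of 75.0% is taken from the comparison table in this article; the function name is illustrative.

```python
# Figures quoted in this article from the toonformat.dev benchmark.
toon_accuracy, toon_efficiency = 76.4, 27.7   # percent, points per 1K tokens
json_accuracy, json_efficiency = 75.0, 16.4


def implied_kilotokens(accuracy: float, efficiency: float) -> float:
    # Assumed definition: efficiency = accuracy / (tokens / 1000),
    # so the implied token count in thousands is accuracy / efficiency.
    return accuracy / efficiency


toon_k = implied_kilotokens(toon_accuracy, toon_efficiency)
json_k = implied_kilotokens(json_accuracy, json_efficiency)

# Relative token saving implied by the two efficiency scores.
reduction = 1 - toon_k / json_k
print(f"implied token reduction: {reduction:.1%}")
```

Under that assumed definition the implied saving lands within a fraction of a percentage point of the quoted 39.9%, which is what rounding in the published figures would plausibly produce.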
The confusion arises because both formats shrink something. But what they shrink is different: MessagePack shrinks bytes on the wire; TOON shrinks tokens in the prompt. Those are measured in different units and optimized for different bottlenecks.
Why MessagePack Cannot Go Directly Into an LLM Prompt
LLM APIs — including OpenAI, Anthropic, Google, and every hosted inference endpoint — accept text. They do not accept binary payloads embedded in the message body. If you want to pass MessagePack-encoded data to a model, you must first Base64-encode the bytes into an ASCII-safe string.
That step has two compounding costs. First, Base64 encoding inflates the data by roughly 33%: every 3 bytes become 4 characters, so a 100-byte MessagePack payload becomes 136 characters of Base64 once padded. Second, the resulting character strings tokenize poorly. OpenAI's o200k_base tokenizer (used by the GPT-4o and GPT-5 families) averages approximately 4 characters per token in well-formed English text. Base64 strings do not resemble English; they are dense alphanumeric sequences that the tokenizer cannot split at meaningful word or subword boundaries, so they tend to yield fewer characters per token. The result is a blob of tokens the model has no semantic grounding for, which it must treat as opaque rather than as structured data it can reason about.
In short: MessagePack compresses the wire, then Base64 un-compresses the prompt, and then the tokenizer makes the un-compressed prompt expensive to process. Three successive costs on a path that TOON avoids entirely by staying in human-readable text.
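The size arithmetic behind those steps can be checked with the standard library. The sketch below is illustrative only: the 100-byte payload is dummy data standing in for a MessagePack message, and the 0.63 factor is simply this article's "roughly 37% smaller than JSON" figure applied as an assumption.

```python
import base64

# 100 bytes of dummy binary data standing in for a MessagePack payload.
payload = bytes(range(100))
encoded = base64.b64encode(payload).decode("ascii")

# Base64 emits 4 characters per 3 input bytes, padded to a multiple of 4.
print(len(payload), "bytes ->", len(encoded), "Base64 characters")

# Assumption: MessagePack is ~37% smaller than the JSON it replaces (0.63x).
msgpack_vs_json = 0.63
base64_vs_json = msgpack_vs_json * 4 / 3
print(f"Base64(MessagePack) is about {base64_vs_json:.2f}x the JSON character count")
```

Under that assumption the Base64 form is still somewhat shorter than JSON in characters. The prompt-side cost comes from how poorly those characters tokenize and from the model being unable to read them.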
For more on why binary embeddings are particularly costly in the prompt layer, see our guide on embeddings and binary data in LLM formats.
Is MessagePack Always Faster Than JSON?
The size advantage of MessagePack is real but narrower than often assumed, and the speed advantage is not guaranteed. One V8 benchmark measured JSON.stringify at 45.5 ms versus MessagePack pack at 144.5 ms on a 10 MB object, which makes MessagePack roughly 3.2× slower in that case. Native JSON serialization and parsing are deeply optimized inside V8 and other engines; MessagePack runs in userland JavaScript and pays the cost.
The size edge also shrinks under compression. The same benchmark found that a raw difference of 22 KB collapsed to approximately 140 bytes after gzip. Most HTTP APIs and message queues apply gzip or zstd automatically, so the real-world binary advantage can be marginal once the transport layer does its job.
None of this makes MessagePack a bad choice for transport — it remains a strong option for high-frequency service-to-service calls where binary encoding is accepted. It just means the numbers are more nuanced than the "half the size" headline suggests.
Side-by-Side: JSON, MessagePack, and TOON on the Same Data
Consider a small array of user records — the kind of data you might retrieve from a database and pass to an LLM for analysis.
// (a) JSON: universally readable, ~190 characters as formatted here, repeats keys on every row
[
{"id": 1, "name": "Alice", "role": "admin", "active": true},
{"id": 2, "name": "Bob", "role": "viewer", "active": false},
{"id": 3, "name": "Charlie", "role": "editor", "active": true}
]
// (b) MessagePack: binary bytes (hex sketch of the first record); CANNOT go into a prompt as-is.
// Must be Base64-encoded before embedding in any LLM API call.
// 84 a2 69 64 01 a4 6e 61 6d 65 a5 41 6c 69 63 65 ...
// After Base64: hKJpZAGkbmFtZaVBbGljZaRyb2xlpWFkbWlupmFjdGl2ZcM=
// (and so on for each record — one opaque string the model cannot reason about)
// (c) TOON — text-native, field names declared once, values only per row
users[3]{id,name,role,active}:
1, Alice, admin, true
2, Bob, viewer, false
3, Charlie, editor, true
The JSON version repeats id, name, role, and active on every row, so each key is a token cost multiplied by the number of records. TOON pays that cost once in the header, then emits only values. On 200 such records with four fields each, the official benchmark shows TOON at 58.8% fewer tokens than JSON (67,778 vs 164,452 tokens for flat uniform tables). MessagePack is smaller on disk, but Base64 gives back much of that saving in characters, and the result tokenizes poorly and is unreadable to the model.
To understand why JSON keys repeat costs on every token pass, see our deeper explanation in JSON vs TOON.
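To make the binary row of the comparison concrete, here is a minimal sketch that encodes the first user record by hand and then Base64-encodes it. The `pack` function is a hypothetical helper written for this example, not a library API. It covers only the MessagePack types this record needs (booleans, small non-negative integers, short strings, small lists and maps); a real project would use a maintained MessagePack library.

```python
import base64


def pack(obj) -> bytes:
    """Encode a tiny subset of MessagePack (illustrative helper, not a full codec)."""
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                        # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 0xa0 + length
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body     # fixmap: 0x80 + entry count
    raise ValueError(f"type or size not covered by this sketch: {obj!r}")


first = {"id": 1, "name": "Alice", "role": "admin", "active": True}
packed = pack(first)
b64 = base64.b64encode(packed).decode("ascii")

print(packed.hex(" "))   # begins 84 a2 69 64 01 ...
print(len(packed), "bytes ->", len(b64), "Base64 characters")
print(b64)
```

The record takes 35 bytes in binary and 48 characters once Base64-encoded, and nothing in that string tells a model where a field name ends and a value begins.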
MessagePack vs JSON vs TOON: Full Comparison
| Criterion | MessagePack | JSON | TOON |
|---|---|---|---|
| Human-readable | No (binary) | Yes | Yes |
| LLM-promptable directly | No (requires Base64) | Yes | Yes |
| Wire / disk size | ~37% smaller than JSON raw | Baseline | Similar to JSON (text) |
| Prompt token count | Base64 adds ~33% over the raw bytes and tokenizes poorly | Baseline | −39.9% vs JSON (up to −58.8%) |
| Parse speed (large objects) | Can be ~3× slower (V8 test) | Fast (native engine) | Text parse (comparable to JSON) |
| LLM retrieval accuracy | Opaque after encoding | 75.0% | 76.4% (benchmark) |
| Best use case | Service-to-service transport, caching, binary storage | APIs, configs, small payloads, LLM output | Feeding structured data into LLM prompts |
Where MessagePack Still Belongs in an LLM Pipeline
None of the above means MessagePack is the wrong tool — it means it belongs at a different layer. A production pipeline that calls an LLM might look like this:
- Data is stored in a database or message queue serialized as MessagePack or BSON for compact transport.
- The application layer deserializes the binary payload back to a native data structure (array of objects, dict, etc.).
- That data structure is then serialized to TOON for insertion into the LLM prompt.
- The LLM processes the TOON and returns a text or JSON response, which the application layer handles normally.
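The middle steps of that pipeline can be sketched in a few lines. `to_toon_table` is a hypothetical helper written for this example, not an official TOON API. It handles only flat, uniform rows with no quoting or escaping, and its row layout (two-space indent, bare commas) is an assumption about the canonical syntax that should be checked against the TOON specification. The binary decode step is represented by an already-deserialized list.

```python
def to_toon_table(name: str, rows: list) -> str:
    """Render flat, uniform dicts as a TOON-style table (illustrative helper)."""
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0].keys())
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows must share the same fields in the same order")

    def cell(value) -> str:
        # Booleans and None are written in their JSON spelling.
        if value is True:
            return "true"
        if value is False:
            return "false"
        if value is None:
            return "null"
        return str(value)  # no quoting or escaping in this sketch

    # Field names appear once in the header; each row carries values only.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # Assumed layout: two-space indent, comma-separated values.
    lines = ["  " + ",".join(cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


# Step 2 output: what a MessagePack decoder would hand back to the application.
records = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "viewer", "active": False},
    {"id": 3, "name": "Charlie", "role": "editor", "active": True},
]

# Step 3: serialize to TOON and place it in the prompt text.
table = to_toon_table("users", records)
prompt = "Which users are active?\n\n" + table
print(prompt)
```

Keeping the encoder at the prompt boundary means storage and transport can stay binary while the model only ever sees text.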
MessagePack and TOON are not competitors — they are at different stages of the same pipeline. For more on how binary embedding formats interact with LLM context windows, see our guide on embeddings and binary data. For a comparable binary-vs-text analysis in the context of Protobuf, see Protobuf vs TOON.
How Tokenization Explains the Gap
The fundamental reason TOON outperforms any binary format in the prompt is how tokenization works. OpenAI's o200k_base tokenizer uses Byte Pair Encoding (BPE): it starts from raw bytes and iteratively merges the most frequent adjacent pairs until a target vocabulary size is reached. The resulting ~200,000-token vocabulary was built on natural language and code — not binary sequences or Base64 strings.
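The merge loop described above can be illustrated with a toy trainer. This is a simplified sketch, not how o200k_base was actually built: production tokenizers pre-split text with a regular expression and train on very large corpora, and the sample string, merge count, and function name here are arbitrary choices for illustration.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int):
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]  # start from raw bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so merging gains nothing
        merges.append((left, right))
        merged, out, i = left + right, [], 0
        while i < len(tokens):
            # Replace each occurrence of the chosen pair with the merged token.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, merges


sample = '"role": "admin", ' * 6
tokens, merges = train_bpe(sample, num_merges=10)
print(len(sample.encode("utf-8")), "bytes ->", len(tokens), "tokens")
```

Repetitive, text-like input collapses quickly because the same pairs keep recurring. A Base64 string offers far fewer repeated pairs for the merges to exploit.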
Every repeated JSON structural glyph — {, }, ", :, , — costs tokens on every object in an array. On a 200-row dataset, those repeated keys and punctuation marks accumulate into thousands of wasted tokens. TOON declares fields once in the header line (users[200]{id,name,role,active}:) so the per-row overhead collapses to the values plus a single delimiter. The header pays a fixed cost regardless of how many rows follow.
Base64, meanwhile, produces strings that look nothing like anything in the tokenizer's training data. The tokenizer assigns token boundaries at positions that carry no semantic meaning for the underlying data. The model receives a string of tokens it cannot interpret structurally — the exact opposite of what TOON provides.
For a broader look at how tokenization shapes format choice, see What is TOON?
Frequently Asked Questions
Is MessagePack better than TOON for LLMs?
They solve different problems. MessagePack wins on network transport: it is roughly 37% smaller than JSON uncompressed. But it is binary — to put it in an LLM prompt you must Base64-encode it, which inflates bytes by 33% and tokenizes poorly. TOON wins inside prompts: 39.9% fewer tokens with 76.4% retrieval accuracy on the official benchmark.
Why can't I just send MessagePack bytes directly to an LLM?
LLM APIs accept text. Binary payloads must be Base64-encoded before embedding in a prompt. Base64 adds roughly 33% over the raw binary, and the resulting character strings tokenize inefficiently: OpenAI's o200k_base tokenizer averages about 4 characters per token on English text, but Base64 offers no word or subword boundaries to exploit, so a binary blob becomes a long sequence of opaque tokens the model cannot reason about.
Is MessagePack always faster than JSON?
No. For large objects, MessagePack can be significantly slower. One benchmark measured JSON.stringify at 45.5 ms versus MessagePack pack at 144.5 ms for a 10 MB object, roughly 3.2x slower. The raw size advantage also shrinks under gzip compression: one test found a 22 KB raw difference collapse to about 140 bytes after gzip.
How many fewer tokens does TOON use compared to JSON?
According to official toonformat.dev benchmarks across 5,016 LLM calls, TOON uses 39.9% fewer tokens than JSON overall. On flat uniform tables the reduction reaches 58.8% (67,778 vs 164,452 tokens). TOON also scores 27.7 accuracy-points per thousand tokens versus JSON's 16.4.
When should I use MessagePack vs TOON?
Use MessagePack (or BSON) for serializing data over the wire between services, persisting to disk, or any transport layer where binary encoding is accepted and human-readability is unnecessary. Use TOON when that same data needs to be placed inside an LLM prompt — TOON is text-native, token-efficient, and directly readable by the model without any encoding step.