Embeddings, Images & Binary Data: TOON and TONL vs Base64-in-JSON

Never tokenize raw binary data or float vectors into an LLM prompt. Base64-in-JSON is the worst possible approach — it inflates size by ~33% and turns every byte into tokens the model cannot reason about. The right pattern: keep blobs and embeddings in purpose-built storage and pass only compact references as TOON or TONL metadata tables.

Why does Base64-in-JSON waste so many tokens?

Base64 encoding converts arbitrary binary data into printable ASCII by mapping every 3 bytes of input into 4 output characters. That 4/3 ratio means a 100 KB image becomes roughly 133 KB of text before you have written a single JSON key around it. Add the JSON wrapper — "image": "data:image/png;base64,iVBOR..." — and you are looking at even more overhead.

The real damage happens at the tokenizer. A token is approximately 0.75 English words, but Base64 output is nothing like English. The encoded string is dense, non-repetitive, and largely opaque to LLM tokenizers. Each character contributes roughly one token, meaning that 133 KB of Base64 can consume 130,000+ tokens — more than the entire context window of many models. The model cannot perceive the image from that text anyway; it gains nothing from seeing the raw bytes.

The situation is no better with embedding vectors. A single 1,536-dimension OpenAI embedding serialized as JSON looks like [0.0023064255, -0.009327292, ...] for 1,536 floats. Each float occupies 6–12 characters. A conservative estimate puts one embedding at roughly 10,000–15,000 tokens. A RAG pipeline injecting even ten similar records fills a 128K context window halfway with numbers the model cannot usefully compare — it lacks the arithmetic precision to do nearest-neighbor search in-context.

What is the right architectural pattern for binary data in LLM pipelines?

The answer is separation of concerns: keep each data type in the storage system designed for it, and pass only the reference into the prompt.

Images and binary blobs — store in object storage (AWS S3, Cloudflare R2, Google Cloud Storage). Pass the URL or a short content-addressable hash into the prompt as a metadata field.
Embedding vectors — store in a vector database (Pinecone, Weaviate, pgvector). Run similarity search server-side; pass only the top-K result IDs and their scores into the prompt.
Structured metadata about those assets — serialize as TOON or TONL tables in the prompt. This is where the format savings actually apply.

This pattern is exactly what the TOON-optimized RAG pipeline guide covers in depth: retrieve relevant records from your vector store, then format only their metadata and content fields as TOON before injecting them into context.

Where should each data type live? (Decision table)

Data type	Recommended storage	What goes in the prompt	Best format for the prompt part
Text fields, labels, metadata	SQL / NoSQL DB	Full field values	TOON table
Embedding vectors (1K+ dims)	Vector DB (Pinecone, pgvector)	Result IDs + similarity scores	TOON table
Images / PDFs / binary blobs	Object storage (S3, R2, GCS)	URL or content hash + MIME type	TOON table
Small numeric time series (<200 pts)	TONL columnar file or DB	Compressed TONL column block	TONL columnar
Large numeric arrays (>200 pts)	Object storage or time-series DB	Aggregates (mean, p95, trend)	TOON key-value block

The key insight: TOON and TONL are text formats for LLM context windows. They excel at structured metadata — the compact references pointing to binary assets, not the binary assets themselves.

Base64-in-JSON vs a TOON metadata row: a concrete example

Consider an image record for a product catalog. Here is the same record as Base64-in-JSON (what you want to avoid) versus a TOON metadata table (the right approach):

Base64-in-JSON — do not do this in a prompt:

{
  "id": "img_8821",
  "product_id": "prod_441",
  "mime_type": "image/webp",
  "width": 1200,
  "height": 800,
  "image": "data:image/webp;base64,UklGRlQCAABXRUJQVlA4IEgCA
             ACwDgCdASqQAFAAPm02lUmkIqIhKAgAgA2JZQCdABOgA/gD
             ...  [~80,000 more Base64 characters] ..."
}

That single JSON object can cost 60,000–90,000 tokens depending on image size. The model sees the characters but cannot perceive the image from them.

TOON metadata table — the right approach:

images[3]{id, product_id, mime_type, width, height, url}:
  img_8821, prod_441, image/webp, 1200, 800, https://cdn.example.com/img_8821.webp
  img_8822, prod_441, image/webp,  800, 600, https://cdn.example.com/img_8822.webp
  img_8823, prod_442, image/jpeg,  600, 400, https://cdn.example.com/img_8823.jpeg

Three image records as a TOON table cost roughly 60–80 tokens — a reduction of three to four orders of magnitude. The model can read the IDs, dimensions, MIME types, and URLs, and reason about them accurately. When it needs to actually retrieve an image, it calls a tool with the URL.

According to the TOON official benchmarks, flat uniform tables like this achieve up to 58.8% fewer tokens than JSON (67,778 vs 164,452 tokens across a benchmark dataset), while maintaining 76.4% retrieval accuracy — higher than JSON's 75.0%.

When you must include numeric data: TONL's columnar compression

Sometimes you genuinely need numeric arrays in-context — a short sensor time series for anomaly detection, a small set of quantized embeddings for a few-shot comparison, or price history for a financial analysis prompt. For these cases, TONL's typed columnar format provides the most compact text representation.

TONL v2.5.2 supports six compression strategies that operate at the column level:

Dictionary encoding — maps repeated values to short integer codes. Ideal for categorical columns like status labels.
Delta encoding — stores differences between successive values instead of absolute values. Highly effective for monotonically increasing timestamps or IDs.
Run-Length Encoding (RLE) — collapses runs of identical values to a count plus the value. Useful for sparse or plateau regions in sensor data.
Bit Packing — stores small integers in the minimum number of bits required. Reduces token footprint for integer columns with low cardinality.
Quantizer — reduces float precision to a configurable number of significant digits, dramatically shrinking float array length.
Column Reorder — places high-compression columns first to improve overall compression ratio.

Combined, these compression layers allow TONL to achieve up to 60% fewer tokens than JSON on numeric-heavy payloads, according to tonl.dev. Even with type hints (u32, f32, bool) enabled — which add roughly 20 tokens for schema validation and TypeScript generation — TONL remains ~32% smaller than equivalent JSON.

For a deeper comparison of when to choose TOON versus TONL for numeric and structured data, see the TOON vs TONL guide and the architecture of TONL.

TONL columnar block for a short numeric time series:

sensor_readings[5]{ts:u32, temp:f32, humidity:f32}:
  1717200000, 21.4, 58.2
  1717200060, 21.6, 57.9
  1717200120, 21.5, 58.1
  1717200180, 21.8, 57.5
  1717200240, 22.0, 57.1

The u32 and f32 type hints tell TONL's parser to apply delta encoding on the timestamp column and quantization on the float columns before serialization. The result fits in a few dozen tokens per row — practical for short series that genuinely inform the model's reasoning.

Practical integration: RAG pipelines and multimodal agents

In a typical RAG pipeline, the flow should look like this:

User query arrives. Embed the query server-side using your embedding model.
Run approximate nearest-neighbor search in your vector DB. Retrieve top-K document IDs and similarity scores — not the vectors themselves.
Fetch the document text and metadata from your primary DB using those IDs.
Serialize the metadata as a TOON table. Include only the fields the model needs to reason about (title, source, date, score, excerpt).
Inject the TOON table into the prompt. The model reads structured, token-efficient context and cites sources by ID.

For multimodal agents that need to describe or select images, the same principle applies: the agent receives a TOON table of image metadata with URLs, selects the relevant ones by reasoning over the structured fields, then calls a tool to fetch or display those images by URL. The raw pixels never enter the context window.

This pattern is explored in more detail in the optimizing RAG pipelines with TOON and introducing TONL guides.

Trade-offs and when this advice does not apply

Two cases exist where you legitimately need binary data near the model:

Native multimodal APIs. OpenAI, Anthropic, and Google provide image input APIs that accept Base64 or URLs through a dedicated image_url content part — not as text tokens. The model processes the image through a vision encoder, not through its text tokenizer. This is architecturally different from inlining Base64 in a JSON string inside the prompt text. Use the native multimodal API; never paste Base64 into a text prompt field.

Tiny quantized embeddings. Some embedding models produce 64- or 128-dimension binary or int8 vectors. At that dimensionality, a TONL columnar block with Bit Packing compression can fit in a few hundred tokens. Whether the model can usefully compare them in-context is a separate question — for most use cases, a vector DB still wins — but the token cost is at least not catastrophic.

An independent study published on arXiv (2603.03306) found that TOON's efficiency advantage is non-linear: it pays off most on large, repetitive payloads. On very small payloads, the format-instruction overhead (the "prompt tax") can negate the savings. Metadata tables for binary assets are almost always large and repetitive enough to benefit, but always measure before committing.

Frequently Asked Questions

Should I put embeddings in the prompt?

No. Embedding vectors are dense float arrays — hundreds to thousands of numbers per record. Serializing them as text in a prompt wastes enormous context and gives the model nothing useful to reason about. Store embeddings in a vector database and pass only their IDs or similarity scores into the prompt.

How do I handle images with TOON?

Do not inline images in TOON at all. Upload the image to object storage (S3, R2, GCS) and reference it by URL or ID in a TOON metadata table. TOON is a text format designed for LLM context windows; raw image bytes belong in purpose-built storage, not in the prompt.

Is Base64 bad for LLM tokens?

Yes. Base64 encoding inflates binary data size by roughly 33%, and every character in that Base64 string becomes tokens in your prompt. A 100 KB image becomes ~133 KB of Base64 text, which can cost tens of thousands of tokens — dwarfing the cost of the rest of your prompt.

Can TONL store binary data?

TONL is a text-based format and does not store raw binary blobs. However, TONL's columnar compression features — including Dictionary encoding, Delta encoding, RLE, and Bit Packing — make it well-suited for compact representation of small numeric arrays like quantized embeddings or short sensor time series where you genuinely need the values in-context.

What format should I use for metadata around binary assets?

TOON is the best choice for metadata tables describing binary assets. Its tabular header-plus-rows format achieves up to 58.8% fewer tokens than JSON for flat uniform records — exactly the shape of an asset catalog with fields like id, url, mime_type, and width.

Try the free converter Explore the architecture of TONL