Streaming Multi-Gigabyte Datasets to LLMs with TONL

Q: Can TONL handle files too big for memory?

Yes. TONL's streaming engine processes files larger than 50 GB while holding less than 100 MB in memory at any time. It reads the file in chunks, so your process footprint stays flat regardless of dataset size. See tonl.dev for the full streaming API reference.

Q: How does TONL streaming work?

TONL streaming works by reading an input file in fixed-size chunks and emitting parsed records one at a time rather than building a full parse tree. The library maintains only the current chunk and a small lookahead buffer, so memory usage stays constant no matter how large the file is.

Q: What is streamQuery?

streamQuery is a pattern in the TONL library that combines streaming with a filter predicate so you can extract only the records that match a condition — without loading the whole file. The result is a smaller TONL slice you can pass directly to an LLM. Check tonl.dev docs for the exact current API signature.

Q: Is TONL production ready?

Yes. As of v2.5.2, TONL ships with 2,300+ passing tests and zero runtime dependencies. The stable API covers streaming, indexed queries, schema validation, auto TypeScript generation, CRUD with change-tracking, and multiple compression strategies.

Q: How many fewer tokens does TONL use compared to JSON?

TONL uses 32–50% fewer tokens than equivalent JSON, according to the official TONL documentation. With optional compression layers the savings can reach around 60%. Even with type hints enabled (which add roughly 20 tokens) the format still comes in about 32% smaller than JSON.

TONL streams files larger than 50 GB while consuming less than 100 MB of memory, and its indexed query engine resolves lookups in under 0.1 milliseconds. That means you can pull exactly the rows an LLM needs from a massive dataset — instead of truncating the file or blowing your context budget — and pass only the relevant TONL slice to the model.

Why you cannot just load a large dataset into a prompt

Even the most generous context windows top out somewhere between 128K and 2M tokens. A 10 GB JSON file of e-commerce orders translates to roughly 2–3 billion tokens. No model accepts that, and no organization wants to pay the inference bill even if one did.
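The arithmetic behind that estimate can be sketched in a few lines. The four-bytes-per-token ratio is an assumption (a common rule of thumb for English-heavy text, not a measured figure for any particular tokenizer), and estimateTokens is a hypothetical helper name used only for illustration.

```typescript
// Rough rule of thumb, not a measurement: real tokenizers vary with content.
const ASSUMED_BYTES_PER_TOKEN = 4;

// Hypothetical helper: estimates how many tokens a file of `bytes` would cost.
function estimateTokens(
  bytes: number,
  bytesPerToken: number = ASSUMED_BYTES_PER_TOKEN,
): number {
  return Math.round(bytes / bytesPerToken);
}

const tenGigabytes = 10 * 1e9;
const estimatedTokens = estimateTokens(tenGigabytes);

// The 2M-token upper bound cited above.
const largestWindow = 2e6;
const windowsNeeded = Math.ceil(estimatedTokens / largestWindow);

console.log({ estimatedTokens, windowsNeeded });
```

Under that assumption the file lands at about 2.5 billion tokens, or roughly 1,250 completely full 2M-token prompts.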

The naive workarounds each carry their own costs. Truncating the file means silently dropping data the LLM might need. Loading the full file into memory to filter it first is fine for a 200 MB CSV, but it kills the process at 50 GB. Pre-chunking manually means your ETL pipeline has to know the answer before it asks the question.

TOON solves the token density problem — its tabular syntax reduces tokens by 39–59% on uniform arrays — but TOON has no streaming layer and no query API. It is a serialization format, not a data-access library. For files that do not fit in memory you need something built for scale from the start. That is the gap TONL fills. See the TOON vs TONL comparison for a full breakdown of where each format is the right tool.

How TONL streaming keeps memory flat at any file size

According to the TONL GitHub repository, the streaming engine reads an input file in fixed-size chunks and emits parsed records one at a time. The parser never builds a full in-memory representation of the document. Instead, it maintains only the current chunk and a small lookahead buffer, so the process footprint stays flat, below 100 MB, whether the file is 1 GB or 100 GB.
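To make the chunk-plus-buffer idea concrete, here is a minimal sketch of the general technique. It is not TONL's implementation: recordsFromChunks is a hypothetical helper, and it assumes newline-delimited records for simplicity. The only state it keeps between chunks is the unfinished tail of the previous one.

```typescript
// Hypothetical helper, not part of the TONL API: turns a sequence of text
// chunks into complete newline-delimited records, one at a time.
function* recordsFromChunks(chunks: Iterable<string>): Generator<string> {
  let carry = ""; // partial record left over from the previous chunk
  for (const chunk of chunks) {
    const parts = (carry + chunk).split("\n");
    carry = parts.pop() ?? ""; // the last piece may be cut off mid-record
    for (const record of parts) {
      if (record.length > 0) yield record;
    }
  }
  if (carry.length > 0) yield carry; // final record without a trailing newline
}

// Chunk boundaries deliberately fall in the middle of records.
const demoChunks = [
  "id:1,sta",
  "tus:ok\nid:2,status:",
  "refunded\nid:3,status:ok",
];
const demoRecords = Array.from(recordsFromChunks(demoChunks));

console.log(demoRecords);
```

Because the generator yields each record as soon as it is complete, memory is bounded by the chunk size plus the longest single record rather than by the size of the input.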

A pattern like the one below illustrates the approach (check tonl.dev for the exact current API):

import { streamQuery } from "tonl";

// Stream a 10 GB file; filter to only the rows the LLM needs.
// Memory usage stays below 100 MB throughout.
const relevantRows = await streamQuery("10GB-orders.tonl", {
  where: (row) => row.status === "refunded" && row.amount > 500,
  limit: 200,
});

// relevantRows is a compact TONL string — safe to pass to an LLM.
const prompt = `Analyze these refunded high-value orders:
${relevantRows}`;

The streamQuery pattern combines streaming with a filter predicate. Records that do not match the condition are discarded immediately after parsing; only the matching rows accumulate. The result is a small TONL slice you can embed in a prompt rather than the full dataset.
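The discard-as-you-go behavior described above can be illustrated with a plain generator. This is a sketch of the general pattern rather than the library's code: filterStream and the Order shape are hypothetical names, and the real streamQuery options may differ.

```typescript
// Hypothetical record shape used only for this illustration.
interface Order {
  id: number;
  status: string;
  amount: number;
}

// Hypothetical helper: keeps matching rows and stops once `limit` is reached.
function* filterStream<T>(
  rows: Iterable<T>,
  where: (row: T) => boolean,
  limit: number,
): Generator<T> {
  if (limit <= 0) return;
  let kept = 0;
  for (const row of rows) {
    if (!where(row)) continue; // non-matching rows are dropped immediately
    yield row;
    kept += 1;
    if (kept >= limit) return; // stop pulling from the source early
  }
}

const sampleOrders: Order[] = [
  { id: 1, status: "refunded", amount: 900 },
  { id: 2, status: "paid", amount: 1200 },
  { id: 3, status: "refunded", amount: 120 },
  { id: 4, status: "refunded", amount: 650 },
  { id: 5, status: "refunded", amount: 2000 },
];

const matchedIds = Array.from(
  filterStream(sampleOrders, (o) => o.status === "refunded" && o.amount > 500, 2),
).map((o) => o.id);

console.log(matchedIds);
```

The early return matters as much as the predicate: once the limit is hit, the rest of the source is never read.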

This pattern is particularly valuable in ETL pipelines where the transformation stage needs to forward only anomalies, outliers, or specific segments to a downstream LLM for classification or enrichment — without materializing the whole dataset in RAM.

Building indexes for sub-millisecond lookups

Streaming is the right default for one-pass scans. But some pipelines need random-access reads: fetch a specific user record, look up log lines by timestamp range, join two large files on a shared key. For those patterns, TONL's indexed query API delivers lookups in under 0.1 milliseconds after a one-time index-build step.

Below is an illustrative pattern — verify the exact API at tonl.dev before using in production:

import { buildIndex, queryIndex } from "tonl";

// Build the index once (one sequential read of the file, at roughly disk speed).
const idx = await buildIndex("50GB-events.tonl", {
  on: ["userId", "eventType"],
});

// Subsequent lookups are sub-millisecond — index stays on disk.
const userEvents = await queryIndex(idx, {
  where: { userId: "u_88421", eventType: "purchase" },
  select: ["timestamp", "amount", "sku"],
  limit: 50,
});

// Pass the compact slice to the LLM.
const prompt = `Summarize purchase history:
${userEvents}`;

The index is persisted to disk, so the build cost is paid once. Re-querying the same file — even after a process restart — costs only the lookup time, not a full rescan. On log-processing pipelines that query the same archive repeatedly, this eliminates the need to load the data into a separate database just to answer ad-hoc LLM questions.
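For intuition about why an index turns a rescan into a direct read, the sketch below maps each key to the offsets of its records. It is an assumption-laden illustration, not TONL's index format: buildOffsetIndex, readAt, and lookup are hypothetical names, the data is a small in-memory string of newline-delimited records, and the index lives in a Map rather than on disk.

```typescript
// Key value -> start offsets of every record carrying that key.
type OffsetIndex = Map<string, number[]>;

// Hypothetical helper: one sequential pass that records where each record starts.
function buildOffsetIndex(data: string, keyOf: (record: string) => string): OffsetIndex {
  const index: OffsetIndex = new Map();
  let start = 0;
  while (start < data.length) {
    let end = data.indexOf("\n", start);
    if (end === -1) end = data.length;
    const record = data.slice(start, end);
    if (record.length > 0) {
      const key = keyOf(record);
      const offsets = index.get(key);
      if (offsets) offsets.push(start);
      else index.set(key, [start]);
    }
    start = end + 1;
  }
  return index;
}

// Reads the single record that begins at `offset`.
function readAt(data: string, offset: number): string {
  const end = data.indexOf("\n", offset);
  return data.slice(offset, end === -1 ? data.length : end);
}

// Jumps straight to the matching records instead of scanning everything.
function lookup(data: string, index: OffsetIndex, key: string): string[] {
  return (index.get(key) ?? []).map((offset) => readAt(data, offset));
}

const eventLog = "u_1,purchase,10\nu_2,view,0\nu_1,purchase,25";
const userIdOf = (record: string): string => {
  const [userId = ""] = record.split(",");
  return userId;
};

const eventIndex = buildOffsetIndex(eventLog, userIdOf);
const eventsForUser = lookup(eventLog, eventIndex, "u_1");

console.log(eventsForUser);
```

A persisted variant would serialize the key-to-offset map next to the data file and seek to byte offsets on lookup, which is why the build cost only has to be paid once.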

The end-to-end pattern: stream → index → query → LLM

Putting the pieces together, a production pipeline that feeds large datasets to an LLM typically has four stages:

Ingest: Convert or produce data in TONL format. The json2toon.co converter handles one-off conversions from JSON; the TONL library handles programmatic generation in Node.js or TypeScript pipelines.
Stream-filter: Use streamQuery to produce a candidate set. This step is I/O-bound and memory-safe at any file size.
Index: For recurring queries on the same file, build an index once. Sub-millisecond lookups replace repeated full scans.
Inject: Pass the resulting TONL slice to the LLM. Because TONL uses 32–50% fewer tokens than JSON, the slice fits more rows in the same context budget.

The token savings compound with the selectivity of the filter. A filter with 0.5% selectivity on a 50 GB file produces a candidate set of about 250 MB, which a limit or an indexed query can narrow further. Serialized as TONL instead of JSON, whatever slice reaches the prompt builder then costs 32–50% fewer tokens. For more on the architecture behind TONL see the TONL architecture deep-dive.

Loading a 10 GB dataset: full JSON vs TONL streaming

Dimension	Full JSON (loaded to memory)	TONL streaming + query
Peak memory	~10–15 GB (parse tree overhead)	<100 MB (chunk buffer only)
Works in a 128K-token prompt?	No — full file is billions of tokens	Yes — filtered slice fits easily
Query latency (indexed)	Seconds to minutes (full scan or DB round-trip)	<0.1 ms after index build
Token overhead vs JSON	Baseline (100%)	32–50% fewer tokens
Runtime dependencies	JSON.parse (built-in)	Zero (TONL ships with no deps)
Test suite	N/A (language built-in)	2,300+ tests passing (v2.5.2)

Sources: tonl.dev, github.com/tonl-dev/tonl.

Real-world use cases: ETL and log processing

Log triage pipelines

Application logs accumulate fast. A busy microservices deployment can generate tens of gigabytes of structured logs per day. The typical workflow — dump everything to S3, query with Athena, export results to a CSV, paste into ChatGPT — has too many manual steps and produces bloated output. With TONL streaming you can point the pipeline directly at the raw log archive, filter for error-level events in a specific time window, and produce a compact TONL slice ready for an LLM to classify root causes.

ETL enrichment

ETL pipelines increasingly use LLMs for enrichment steps: categorizing product descriptions, flagging anomalous transactions, normalizing address fields. The bottleneck is usually the "prepare data for LLM" step: serialize a batch, stay inside the context window, handle errors. TONL's streaming-plus-query pattern makes that step memory-safe and deterministic. The TONL introduction post covers the query API in more detail.

RAG pre-filtering

Retrieval-Augmented Generation pipelines often retrieve more chunks than the context window allows and then re-rank or truncate. A TONL streaming filter applied before the retrieval stage reduces the candidate pool without loading the full corpus, so the re-ranker sees a cleaner, smaller set. The 32–50% token reduction means each retrieved chunk costs less, leaving room for more context or a larger answer budget.

Token savings: what 32–50% means in practice

According to tonl.dev, TONL achieves 32–50% fewer tokens than equivalent JSON, with optional compression layers pushing savings to around 60%. Even with type hints enabled — which add roughly 20 tokens for schema metadata but unlock schema validation and TypeScript generation — the format still comes in about 32% smaller than JSON.

That is a meaningful improvement on top of the streaming and indexing benefits. A filtered slice that would occupy 4,000 tokens as JSON fits in roughly 2,000–2,700 tokens as TONL. On a model billed per token, that translates directly to cost reduction at scale.
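The range quoted above is simple to reproduce. The sketch below only applies the 32–50% figures from tonl.dev to a JSON token count; tonlTokenRange is a hypothetical helper, and actual savings depend on the shape of the data.

```typescript
// Savings range quoted from tonl.dev in the text above.
const MIN_SAVINGS = 0.32;
const MAX_SAVINGS = 0.5;

// Hypothetical helper: projects a JSON token count onto the quoted range.
function tonlTokenRange(jsonTokens: number): { best: number; worst: number } {
  return {
    best: Math.round(jsonTokens * (1 - MAX_SAVINGS)), // 50% fewer tokens
    worst: Math.round(jsonTokens * (1 - MIN_SAVINGS)), // 32% fewer tokens
  };
}

const sliceRange = tonlTokenRange(4000);

console.log(sliceRange);
```

For a 4,000-token JSON slice this gives 2,000 tokens at the top of the savings range and 2,720 at the bottom, which matches the rounded figures in the paragraph above.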

For token-per-accuracy comparisons between TOON and TONL (different trade-offs, different use cases), see the TOON vs TONL guide. To convert an existing dataset quickly, use the json2toon.co converter or read the TONL docs.

Production readiness: what "2,300+ tests, zero runtime deps" actually means

A streaming library used in production data pipelines has to clear a high bar for reliability. Silent data corruption or an off-by-one error at a chunk boundary can skew downstream LLM outputs in ways that are hard to detect. According to the TONL GitHub repository, v2.5.2 ships with over 2,300 passing tests and has zero runtime dependencies.

Zero runtime dependencies matters for deployment: no transitive CVEs to track, no unexpected version conflicts, no extra Docker image weight. The test suite covers streaming edge cases (chunk boundaries mid-record, multi-byte Unicode splits), query correctness, index persistence, schema validation, and all compression strategies.
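The multi-byte split is worth seeing once, because it is easy to get wrong in any chunked reader. The sketch below uses the standard TextDecoder with its stream option. It shows the general failure mode and one common fix, not how TONL handles it internally, and decodeUtf8Chunks is a hypothetical helper name.

```typescript
// Decodes UTF-8 chunks without corrupting characters that straddle a boundary.
function* decodeUtf8Chunks(chunks: Iterable<Uint8Array>): Generator<string> {
  const decoder = new TextDecoder("utf-8");
  for (const chunk of chunks) {
    // stream: true holds back an incomplete trailing byte sequence
    // and completes it with the start of the next chunk.
    const text = decoder.decode(chunk, { stream: true });
    if (text.length > 0) yield text;
  }
  const rest = decoder.decode(); // flush anything still buffered
  if (rest.length > 0) yield rest;
}

const original = "café,naïve";
const encoded = new TextEncoder().encode(original);

// Cut in the middle of the two-byte "é" (bytes 0xC3 0xA9).
const head = encoded.slice(0, 4);
const tail = encoded.slice(4);

const streamed = Array.from(decodeUtf8Chunks([head, tail])).join("");

// Decoding each chunk independently produces replacement characters instead.
const naive =
  new TextDecoder("utf-8").decode(head) + new TextDecoder("utf-8").decode(tail);

console.log({ streamed, naive });
```

The streamed result round-trips the original text, while the naive per-chunk decode replaces the split character with U+FFFD, the kind of silent corruption a chunk-boundary test is meant to catch.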

The stable API surface includes streaming, indexed queries, schema validation, auto TypeScript type generation, CRUD with change-tracking and rollback, and seven compression strategies (Dictionary, Delta, RLE, Bit Packing, Column Reorder, Quantizer, Schema Inheritance). That breadth makes TONL usable as the primary data layer in an LLM pipeline rather than just a serialization helper.

Frequently Asked Questions

How does TONL streaming work?