
Designing TONL Schemas: Types, Validation & TypeScript Generation

A practical guide to TONL schemas: type hints (u32, str, bool), validation rules, and auto-generating TypeScript types—while staying ~32% smaller than JSON.

By JSON to TOON Team

TONL type hints — u32, str, bool — add roughly 20 tokens per schema block, but they unlock schema validation and automatic TypeScript generation. Even with types, your data is still about 32% smaller than JSON. For production AI pipelines that handle real contracts and codegen, the reliability payoff is almost always worth the tiny overhead.

What Are TONL Type Hints?

TONL (Token-Optimized Notation Language) is a text-first serialization format designed for LLM pipelines. Its base syntax cuts token counts by 32–50% versus JSON by eliminating repeated keys, braces, and quotation overhead. The optional type-hint layer sits on top of that: you annotate fields with primitive types, and the runtime uses those annotations for validation and code generation.

The built-in primitives are straightforward:

  • u32 — unsigned 32-bit integer
  • i64 — signed 64-bit integer
  • f64 — 64-bit float
  • str — UTF-8 string
  • bool — boolean
  • ?type — nullable variant of any primitive (e.g., ?str)

You can also define constraints inline — minimum/maximum values, string patterns, enum allowlists — which the validator enforces at parse time. See the TONL GitHub repository for the full constraint syntax.
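To make the primitives concrete, here is a rough sketch of what checking a decoded value against a type hint involves. The helper name matchesHint and its exact rules are assumptions made for illustration; this is hand-rolled TypeScript, not TONL's validator, and the real runtime may treat edge cases (such as large i64 values) differently.

```typescript
// Hypothetical helper, not part of the TONL library: checks a decoded
// JavaScript value against one of the primitive type hints listed above.
type Primitive = "u32" | "i64" | "f64" | "str" | "bool";
type TypeHint = Primitive | `?${Primitive}`;

function matchesHint(hint: TypeHint, value: unknown): boolean {
  // A leading "?" marks the nullable variant: null is accepted,
  // anything else is checked against the underlying primitive.
  if (hint.startsWith("?")) {
    return value === null || matchesHint(hint.slice(1) as Primitive, value);
  }
  switch (hint) {
    case "u32":
      return (
        typeof value === "number" &&
        Number.isInteger(value) &&
        value >= 0 &&
        value <= 0xffffffff
      );
    case "i64":
      // Assumption: i64 values arrive as JavaScript numbers, so only the
      // safe-integer range can be checked exactly in this sketch.
      return typeof value === "number" && Number.isSafeInteger(value);
    case "f64":
      return typeof value === "number";
    case "str":
      return typeof value === "string";
    case "bool":
      return typeof value === "boolean";
    default:
      return false;
  }
}

console.log(matchesHint("u32", 42), matchesHint("u32", "42"), matchesHint("?str", null));
```

Inline constraints such as minimum and maximum values would layer on top of a check like this one; they are left out here because their syntax is defined in the TONL repository rather than in this article.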

Writing a Typed TONL Schema

The schema block precedes the data. Each field is declared with its name and type, separated by a colon. Array fields take a [] prefix (as in []str), and nested objects use indentation. Here is an illustrative example of a product catalog schema:

@schema ProductCatalog
  id:        u32
  sku:       str
  name:      str
  price:     f64
  in_stock:  bool
  tags:      []str
  rating:    ?f64   # nullable — some products have no rating yet

@data products[3]:
  1, SKU-001, Wireless Keyboard, 49.99, true, [electronics,input], 4.7
  2, SKU-002, USB Hub, 19.99, true, [electronics,accessories], null
  3, SKU-003, Monitor Stand, 34.99, false, [desk,accessories], 3.9

Compare this to the equivalent JSON:

[
  {"id":1,"sku":"SKU-001","name":"Wireless Keyboard","price":49.99,
   "in_stock":true,"tags":["electronics","input"],"rating":4.7},
  {"id":2,"sku":"SKU-002","name":"USB Hub","price":19.99,
   "in_stock":true,"tags":["electronics","accessories"],"rating":null},
  {"id":3,"sku":"SKU-003","name":"Monitor Stand","price":34.99,
   "in_stock":false,"tags":["desk","accessories"],"rating":3.9}
]

The TONL version — schema block included — is still meaningfully smaller. Repeating field names on every JSON object is pure overhead; TONL pays that cost once in the header and then drops it across every subsequent row.
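The "pay for field names once" effect is easy to see with a toy encoder. The sketch below uses a made-up header-plus-rows layout (it is not TONL syntax, and encodeTabular is a hypothetical function) and compares character counts against JSON.stringify. Character counts are only a rough proxy for tokens, so treat the comparison as directional.

```typescript
// Simplified illustration of "declare field names once, then emit rows".
// This is NOT the TONL encoder; it only mimics the layout to compare sizes.
type Row = Record<string, string | number | boolean | null>;

function encodeTabular(name: string, rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) return `${name}[0]:`;
  const fields = Object.keys(first);
  // Field names appear exactly once, in the header line
  const header = `${name}[${rows.length}] ${fields.join(",")}:`;
  // Each row carries values only, in header order
  const lines = rows.map(
    (row) => "  " + fields.map((f) => String(row[f])).join(", "),
  );
  return [header, ...lines].join("\n");
}

const sampleProducts: Row[] = [
  { id: 1, sku: "SKU-001", name: "Wireless Keyboard", price: 49.99, in_stock: true, rating: 4.7 },
  { id: 2, sku: "SKU-002", name: "USB Hub", price: 19.99, in_stock: true, rating: null },
  { id: 3, sku: "SKU-003", name: "Monitor Stand", price: 34.99, in_stock: false, rating: 3.9 },
];

const tabularText = encodeTabular("products", sampleProducts);
const jsonText = JSON.stringify(sampleProducts);

// Character counts are only a rough proxy for token counts
console.log(`tabular: ${tabularText.length} chars, JSON: ${jsonText.length} chars`);
```

The gap widens as rows are added, because the header cost stays fixed while JSON repeats every key on every object.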

Auto-Generating TypeScript Interfaces from TONL Schemas

Once a schema exists, TONL's code generator can emit a TypeScript interface that mirrors it exactly. The snippet below illustrates the pattern documented at tonl.dev:

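// 1. Parse and generate (illustrative API; check tonl.dev for current syntax)
import { readFile, writeFile } from "node:fs/promises";
import { parseTonl, generateTypes } from "tonl";

// Load the TONL source that contains the @schema block (the path is an example)
const source = await readFile("schemas/product-catalog.tonl", "utf8");
const schema = await parseTonl(source, { schema: true });
const tsOutput = generateTypes(schema);

// 2. Write the output to a .d.ts or .ts file
await writeFile("types/product-catalog.ts", tsOutput);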

The generated TypeScript interface for the schema above would look something like this:

// Auto-generated — do not edit manually
export interface ProductCatalog {
  id: number;        // u32
  sku: string;       // str
  name: string;      // str
  price: number;     // f64
  in_stock: boolean; // bool
  tags: string[];    // []str
  rating: number | null; // ?f64
}

With this interface in your codebase, TypeScript catches mismatches at compile time. If a consuming service tries to assign a string to id, the compiler errors immediately — before the bug reaches production. This is the same benefit you get from Protobuf or Zod schemas, but you retain the compact, human-readable TONL representation for the data itself.
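For intuition about what a generator has to do, the mapping from type hints to TypeScript types can be sketched in a few lines. The functions hintToTs and renderInterface are hypothetical and written only for this article; TONL's actual code generator may make different choices (for example, emitting bigint for i64, since a JavaScript number cannot represent every 64-bit integer exactly).

```typescript
// Hypothetical mini-generator: maps the article's type hints to TypeScript
// types and renders an interface. TONL's real code generator may differ.
type FieldSpec = { name: string; hint: string };

function hintToTs(hint: string): string {
  // ?f64 -> number | null
  if (hint.startsWith("?")) return `${hintToTs(hint.slice(1))} | null`;
  // []str -> string[]; union element types are parenthesized
  if (hint.startsWith("[]")) {
    const inner = hintToTs(hint.slice(2));
    return inner.includes("|") ? `(${inner})[]` : `${inner}[]`;
  }
  switch (hint) {
    case "u32":
    case "i64": // assumption: mapped to number, which is lossy above 2^53
    case "f64":
      return "number";
    case "str":
      return "string";
    case "bool":
      return "boolean";
    default:
      throw new Error(`Unknown type hint: ${hint}`);
  }
}

function renderInterface(name: string, fields: FieldSpec[]): string {
  const body = fields
    .map((f) => `  ${f.name}: ${hintToTs(f.hint)};`)
    .join("\n");
  return `export interface ${name} {\n${body}\n}`;
}

const productCatalogTs = renderInterface("ProductCatalog", [
  { name: "id", hint: "u32" },
  { name: "sku", hint: "str" },
  { name: "name", hint: "str" },
  { name: "price", hint: "f64" },
  { name: "in_stock", hint: "bool" },
  { name: "tags", hint: "[]str" },
  { name: "rating", hint: "?f64" },
]);

console.log(productCatalogTs);
```

Throwing on an unknown hint is a deliberate choice in this sketch: a generator that silently emitted any would hide exactly the kind of drift the types are meant to catch.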

You can also use the JSON Schema Validator on this site to validate structures before converting them to TONL.

How TONL Validation Catches Malformed LLM Output

One of the most practical applications of TONL schemas is parsing LLM-generated data. LLMs occasionally hallucinate field names, flip booleans, or return integers as strings. Without a schema, these errors propagate silently. With TONL validation enabled, the parser throws structured errors the moment it encounters a mismatch:

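// Illustrative validation example (check tonl.dev for the current API)
import { parseTonl, TonlValidationError } from "tonl";

// llmOutput is the raw TONL text returned by the model;
// productCatalogSchema is the typed schema defined earlier.
function parseCatalog(llmOutput: string) {
  try {
    const result = parseTonl(llmOutput, {
      schema: productCatalogSchema,
      strict: true,
    });
    // result is typed as ProductCatalog[]
    return result;
  } catch (err) {
    if (err instanceof TonlValidationError) {
      // err.field  => "price"
      // err.expected => "f64"
      // err.received => "string"
      console.error(`Validation failed: ${err.field}: expected ${err.expected}, got ${err.received}`);
    }
    // Rethrow so the calling pipeline step can decide whether to retry or abort
    throw err;
  }
}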

This catch-and-report pattern is especially valuable in multi-step agent pipelines where one step's output becomes the next step's input. A type mismatch caught at the boundary is far cheaper to debug than a cascade failure three steps later. The TONL runtime ships with 2,300+ tests and zero runtime dependencies, so it adds no supply-chain risk to your validation layer.
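If you want to see the shape of such a boundary check without any library, here is a minimal hand-rolled stand-in. FieldMismatchError and validateRecord are hypothetical names, and the check only compares JavaScript runtime types (it does not enforce u32 ranges or constraints), so read it as a sketch of the error-reporting idea rather than of TONL's validator.

```typescript
// Hand-rolled stand-in for schema validation at a pipeline boundary.
// Class and function names here are hypothetical, not TONL exports.
class FieldMismatchError extends Error {
  readonly field: string;
  readonly expected: string;
  readonly received: string;

  constructor(field: string, expected: string, received: string) {
    super(`${field}: expected ${expected}, got ${received}`);
    this.name = "FieldMismatchError";
    this.field = field;
    this.expected = expected;
    this.received = received;
  }
}

// Runtime JavaScript type that each primitive hint decodes to
const JS_TYPE: Record<string, string> = {
  u32: "number", i64: "number", f64: "number", str: "string", bool: "boolean",
};

function validateRecord(
  hints: Record<string, string>,
  record: Record<string, unknown>,
): void {
  for (const field of Object.keys(hints)) {
    const hint = hints[field] ?? "";
    const nullable = hint.startsWith("?");
    const base = nullable ? hint.slice(1) : hint;
    const value = record[field];
    if (value === undefined) throw new FieldMismatchError(field, hint, "missing");
    if (value === null) {
      if (nullable) continue;
      throw new FieldMismatchError(field, hint, "null");
    }
    if (typeof value !== JS_TYPE[base]) {
      throw new FieldMismatchError(field, hint, typeof value);
    }
  }
  // Field names that are not in the schema (e.g. hallucinated ones) fail too
  for (const field of Object.keys(record)) {
    if (!(field in hints)) {
      throw new FieldMismatchError(field, "no such field", typeof record[field]);
    }
  }
}

const catalogHints = { id: "u32", price: "f64", in_stock: "bool", rating: "?f64" };
try {
  // The model returned the price as a string instead of a number
  validateRecord(catalogHints, { id: 2, price: "19.99", in_stock: true, rating: null });
} catch (err) {
  console.error(`Validation failed: ${(err as Error).message}`);
}
```

Failing on the first mismatch keeps the sketch short; a production validator would more likely collect every mismatch so the error can be fed back to the model in one retry prompt.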

TONL with Types vs. Without vs. JSON: A Comparison

The table below summarizes the practical trade-offs across the three approaches, based on TONL's published benchmarks:

| Attribute | TONL with types | TONL without types | JSON |
| --- | --- | --- | --- |
| Token count vs. JSON | ~32% smaller (≈38% saved typical) | ~35–50% smaller | baseline |
| Schema validation | Yes (type + constraint checks) | No | Only with JSON Schema (external) |
| TypeScript codegen | Yes (auto-generated interfaces) | No | Via third-party tools only |
| Nullable fields | Explicit (?str) | Implicit | Implicit (null) |
| Human readability | High | High | High |
| Best for | Production APIs, contracts, pipelines | One-off prompts, rapid prototyping | Universal interop, browser APIs |

The ~20-token schema overhead is not free, but it is fixed: it stays the same no matter how many rows of data follow. On a 50-row response, those 20 tokens are typically on the order of 1% of the total, depending on how wide each row is. On a 500-row response, they are negligible. The break-even point is very low for any non-trivial payload.
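The arithmetic behind that claim is simple enough to check yourself. In the sketch below, the 20-token schema cost comes from the figure quoted above, while the 25 tokens per row is an assumed value chosen for illustration; measure your own rows with your model's tokenizer before relying on the percentages.

```typescript
// Share of a payload's tokens spent on a fixed-size schema block.
// schemaTokens defaults to the ~20 tokens quoted in the article;
// tokensPerRow is an assumed input that depends on your data.
function schemaOverheadShare(
  rows: number,
  tokensPerRow: number,
  schemaTokens = 20,
): number {
  const dataTokens = rows * tokensPerRow;
  return schemaTokens / (schemaTokens + dataTokens);
}

for (const rows of [5, 50, 500]) {
  const share = schemaOverheadShare(rows, 25);
  console.log(`${rows} rows: ${(share * 100).toFixed(2)}% of tokens are schema`);
}
```

Because the numerator is constant, the share falls roughly in proportion to the row count, which is why the overhead matters for a five-row prompt and effectively disappears for a five-hundred-row export.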

When to Add Types — and When to Skip Them

Type hints are an optional layer, and the right choice depends on what your data is doing after it leaves the LLM context.

Use typed TONL schemas when:

  • The data crosses a service boundary — another team or service will consume it
  • You need TypeScript interfaces to keep application code in sync with the data shape
  • LLM output feeds directly into business logic (orders, pricing, user records)
  • You want the parser to catch hallucinations automatically rather than in application code
  • You are building a durable data contract that must survive format refactors

Skip types and use untyped TONL when:

  • The data is consumed once, in the same prompt it was generated
  • You are prototyping and the schema is still evolving rapidly
  • Token budget is extremely tight and you cannot afford even 20 overhead tokens
  • The downstream consumer is another LLM call, not application code

For a deeper look at how TONL compares to TOON for different pipeline shapes, see the TOON vs. TONL comparison. The introduction to TONL also covers the broader feature set including streaming for 50GB+ files and the SQL-like query API.

CRUD Operations and Change Tracking with Typed Schemas

TONL's schema layer integrates with its CRUD module: when you mutate a typed document, the runtime tracks the delta and can roll back to a previous state. This is useful for AI-driven editing workflows where an agent modifies structured records and you need an audit trail or undo capability.

As an illustration, the pattern looks like this:

// Illustrative CRUD + rollback pattern (see tonl.dev for current API)
import { TonlDocument } from "tonl";

const doc = TonlDocument.from(source, { schema: productCatalogSchema });

// Mutate the record with id 2 (assuming the second argument is the record id);
// change tracking is automatic
doc.update("products", 2, { in_stock: false, price: 18.99 });

// Inspect the change log
console.log(doc.changelog()); // [{ field: "in_stock", from: true, to: false }, ...]

// Rollback if the LLM suggested a bad update
doc.rollback();

The type schema is what makes rollback safe here: the runtime knows the expected shape, so it can validate both the incoming mutation and the restored state. Without types, rollback is syntactically possible but semantically blind.
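The delta-tracking half of this idea can be sketched without the library. TrackedRecords below is a hypothetical class written for this article, not the TonlDocument implementation: it records one set of changes per update and undoes the most recent set on rollback. Schema validation of the mutation and of the restored state, which is the part the type layer contributes, is deliberately left out.

```typescript
// Minimal change-tracking sketch; TrackedRecords is a hypothetical class,
// not the TonlDocument implementation.
type Rec = { id: number; [field: string]: unknown };
type Change = { id: number; field: string; from: unknown; to: unknown };

class TrackedRecords {
  private records: Rec[];
  private history: Change[][] = []; // one entry per update() call

  constructor(records: Rec[]) {
    // Shallow-copy each record so the caller's objects are never mutated
    this.records = records.map((r) => ({ ...r }));
  }

  update(id: number, patch: Record<string, unknown>): void {
    const rec = this.records.find((r) => r.id === id);
    if (!rec) throw new Error(`No record with id ${id}`);
    const changes: Change[] = [];
    for (const field of Object.keys(patch)) {
      const from = rec[field];
      const to = patch[field];
      // Only real changes are logged; no-op assignments are skipped
      if (from !== to) {
        changes.push({ id, field, from, to });
        rec[field] = to;
      }
    }
    this.history.push(changes);
  }

  // Flattened list of every change that has not been rolled back
  changelog(): Change[] {
    return this.history.reduce<Change[]>((all, step) => all.concat(step), []);
  }

  // Undo the most recent update() by restoring each previous value
  rollback(): void {
    const last = this.history.pop();
    if (!last) return;
    for (const change of last) {
      const rec = this.records.find((r) => r.id === change.id);
      if (rec) rec[change.field] = change.from;
    }
  }

  get(id: number): Rec | undefined {
    const rec = this.records.find((r) => r.id === id);
    return rec ? { ...rec } : undefined;
  }
}

const tracked = new TrackedRecords([
  { id: 3, name: "Monitor Stand", price: 34.99, in_stock: false },
]);
tracked.update(3, { in_stock: true, price: 29.99 });
console.log(tracked.changelog());
tracked.rollback();
console.log(tracked.get(3));
```

Storing the previous value alongside each change is what makes rollback a pure replay; with a schema available, the same log entries could also be validated before they are applied.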

You can explore the full architecture behind these features in the deep-dive on TONL's architecture.

The TypeScript Generation Workflow in Practice

In a real project, the codegen step typically lives in a build script or a pre-commit hook so that types stay in sync automatically:

// package.json (illustrative)
{
  "scripts": {
    "generate:types": "tonl codegen schemas/ --out types/",
    "prebuild": "npm run generate:types"
  }
}

Running npm run generate:types walks the schemas/ directory, parses each .tonl schema file, and emits a corresponding .ts interface file into types/. From that point, your IDE enforces the contract everywhere the type is imported — API handlers, database adapters, and LLM output parsers alike.

This mirrors the workflow that Protobuf users are familiar with, but TONL schemas are plain text — no proto compiler, no binary encoding step, and no loss of the compact LLM-friendly representation that made you choose TONL in the first place.

If you are currently storing schemas as JSON Schema documents, the free json2toon.co converter can help you explore the equivalent TONL representation. For validating existing JSON Schema files, see the JSON Schema Validator.

Frequently Asked Questions

Do TONL type hints cost many tokens?

No. According to the TONL docs, type annotations (u32, str, bool) add roughly 20 tokens to a schema block. Even with types enabled, TONL data remains about 32% smaller than equivalent JSON — so the validation and TypeScript generation you gain cost almost nothing in real-world usage.

Can TONL generate TypeScript types?

Yes. TONL's schema layer can auto-generate TypeScript interfaces from a typed schema definition. The workflow is: write a TONL schema with type hints, run the code generator, and receive a .ts file with interfaces that match your data exactly — no manual type maintenance required.

How does TONL validation work?

TONL validates data against a schema at parse time. If a field declared as u32 contains a string, or a required field is missing, the parser throws a structured error. This is especially useful for catching malformed LLM output before it reaches your application logic.

Should I always use TONL schemas?

Not always. For one-off prompts where every token counts and the data is only consumed once, untyped TONL is fine. Use schemas when you have production pipelines, shared data contracts between services, or when you need auto-generated TypeScript types to keep application code and data in sync.
