Optimize OpenAI and Claude API Costs with TOON
Practical guide to reducing OpenAI GPT and Anthropic Claude API costs by 30-60% using TOON format. Includes code examples and implementation strategies.
API costs are the silent killer of AI startups and internal tools. A prototype that costs pennies to run can quickly balloon into thousands of dollars a month at scale. In this guide, we'll show you how to use TOON and TONL to slash those costs without changing your model or sacrificing quality.
The Cost Equation
Most LLM providers charge based on a simple formula:
Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
To reduce cost, you have two levers: use a cheaper model (which may lower quality) or reduce the number of tokens. TOON focuses on the latter.
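The formula above is easy to sketch in code. Here's a minimal TypeScript helper, with prices quoted per million tokens (a common billing unit); the rates used in the example are placeholders, not current provider pricing:

```typescript
// Cost of one request, with prices quoted in $ per 1M tokens.
function requestCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// Example: 3,000 input tokens at $10/M plus 500 output tokens at $30/M
const cost = requestCost(3_000, 500, 10, 30);
// 0.03 + 0.015 = $0.045 per request
```

Cutting input tokens attacks the first term directly, which is why compressing context data is usually the first win.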
Strategy 1: Compressing Context Data
The most common use case for LLMs is RAG (Retrieval-Augmented Generation), where you retrieve relevant documents or data and feed them into the model's context. This data is often stored as JSON.
Implementation: Before injecting data into your prompt, convert it to TOON or TONL.
// TypeScript Example
import { jsonToToon } from "@toon-format/toon";
const contextData = await fetchUserData(userId);
// Instead of JSON.stringify(contextData)
const optimizedContext = jsonToToon(contextData);
const prompt = `
Analyze the following user data:
${optimizedContext}
Provide a summary of activity.
`;

Impact: 30-50% reduction in input tokens. For a RAG application processing 1M input tokens a day, this can save hundreds of dollars a month.
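To see what that percentage means in dollars, a back-of-the-envelope estimator helps. All numbers below are illustrative assumptions, not measured rates:

```typescript
// Estimated monthly input-cost savings from a token reduction rate.
function monthlySavings(
  tokensPerDay: number,    // input tokens fed to the model each day
  reduction: number,       // fraction of tokens eliminated, e.g. 0.4 for 40%
  pricePerMTokens: number, // $ per 1M input tokens
  daysPerMonth = 30,
): number {
  const tokensSaved = tokensPerDay * reduction * daysPerMonth;
  return (tokensSaved / 1_000_000) * pricePerMTokens;
}

// 1M tokens/day, 40% reduction, $10 per 1M input tokens:
const saved = monthlySavings(1_000_000, 0.4, 10);
// 12M tokens saved per month -> $120/month, before output-side savings
```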
Strategy 2: Optimizing Model Outputs
When you ask an LLM to return structured data, you usually ask for JSON. The model then generates all the braces, quotes, and commas, and every one of those characters is billed as output tokens.
Implementation: Instruct the model to output TOON or TONL format instead.
const prompt = `
Extract the products from the text below.
Format the output as TOON (Token-Oriented Object Notation).
Text: ...
`;

Impact: Faster generation (lower latency) and cheaper output costs. Since output tokens often cost more than input tokens (as with GPT-4), the savings here are magnified.
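On the consuming side, TOON's tabular array form (a `name[count]{fields}:` header followed by one comma-separated row per item) is straightforward to parse. The sketch below is a hypothetical parser for just that flat subset, to show the shape of the format; in practice you'd use the official decoder, which also handles types, nesting, and quoting:

```typescript
// Parse a single flat TOON tabular block like:
//   products[2]{sku,price}:
//     A1,9.99
//     B2,19.99
// Returns the array name and one object per row. Values stay strings here.
function parseToonTable(block: string): { name: string; rows: Record<string, string>[] } {
  const lines = block.trim().split("\n");
  const header = lines[0].match(/^(\w+)\[(\d+)\]\{([^}]*)\}:$/);
  if (!header) throw new Error("not a tabular TOON block");
  const fields = header[3].split(",");
  const rows = lines.slice(1).map((line) => {
    const values = line.trim().split(",");
    return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
  });
  return { name: header[1], rows };
}

const parsed = parseToonTable(`
products[2]{sku,price}:
  A1,9.99
  B2,19.99
`);
// parsed.rows -> [{ sku: "A1", price: "9.99" }, { sku: "B2", price: "19.99" }]
```

Because field names appear once in the header instead of repeating in every object, each extra row costs only its values.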
Strategy 3: Logging and Storage
While not a direct API cost, storing logs of your LLM interactions can get expensive at scale. TOON and TONL work just as well as serialization formats for storage.
Storing your prompt logs in TOON or TONL format instead of JSON can reduce your database or log storage size by half, lowering infrastructure costs over time.
Case Study: E-commerce Chatbot
We worked with a client building a shopping assistant. Their initial prompt included a catalog of 50 products in JSON format, consuming ~3,000 tokens per request.
- Before (JSON): $0.03 per request (Input)
- After (TOON): The catalog was converted to TOON, reducing it to ~1,400 tokens. Cost dropped to $0.014 per request.
Annual Savings: With 10,000 requests per day, they saved over $50,000 per year just by changing the data format.
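The case-study numbers above check out with simple arithmetic:

```typescript
// Verify the case-study math: per-request delta scaled to a year.
const costBefore = 0.03;  // $ per request with the JSON catalog (~3,000 tokens)
const costAfter = 0.014;  // $ per request with the TOON catalog (~1,400 tokens)
const requestsPerDay = 10_000;

const annualSavings = (costBefore - costAfter) * requestsPerDay * 365;
// (0.03 - 0.014) * 10,000 * 365 = $58,400 per year
```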
Start Saving Today
Optimizing for tokens isn't just about being "efficient"—it's about business viability. TOON and TONL provide drop-in solutions to improve your unit economics immediately. Try our free converter to see the savings, or compare formats in our TOON vs TONL guide and JSON vs TOON comparison.