Building a Cost-Efficient Chatbot with Next.js & Vercel AI SDK
A step-by-step tutorial on building a chatbot using Next.js, Vercel AI SDK, and TOON for efficient tool calling and data passing.
Building an AI chatbot today is easier than ever thanks to tools like the Vercel AI SDK and Next.js. However, once you move past the "Hello World" phase and start retrieving data, calling tools, or handling complex context, you face a new problem: Cost. Today, we're going to build a chatbot that is not just functional, but fiscally responsible.
The Goal
We will build a simple chatbot that can "search" a mock product catalog. Crucially, we will use TOON (Token-Oriented Object Notation) for the data passed from tool execution back into the LLM context, so that even as our catalog grows, our token usage stays minimal.
Tech Stack
- Framework: Next.js 14+ (App Router)
- AI Integration: Vercel AI SDK (Core + React)
- Data Format: TOON (`@toon-format/toon`)
- Model: OpenAI GPT-4o (or any compatible model)
Step 1: Setup and Installation
First, create a new Next.js project if you haven't already:
```bash
npx create-next-app@latest my-toon-bot
cd my-toon-bot
npm install ai @ai-sdk/openai @toon-format/toon zod lucide-react
```

The chat UI in Step 4 uses icons from lucide-react, so we install it up front. You will also need an OpenAI API key: the @ai-sdk/openai provider reads it from the OPENAI_API_KEY environment variable, so add it to .env.local.

Step 2: The Data Layer (Mock)

Create lib/products.ts with a mock catalog and a simple search helper:
```typescript
// lib/products.ts
export const products = [
  { id: 1, name: "Eco Tumbler", price: 25.00, inStock: true },
  { id: 2, name: "Wool Beanie", price: 18.50, inStock: true },
  { id: 3, name: "Graphic Tee", price: 30.00, inStock: false },
  // ... imagine 50 more items
];

// Case-insensitive substring match on product names.
export async function searchProducts(query: string) {
  return products.filter((p) => p.name.toLowerCase().includes(query.toLowerCase()));
}
```

Step 3: The API Route (Server Side)
This is where the magic happens. We will use streamText from the Vercel AI SDK in a route handler at app/api/chat/route.ts (the default endpoint that useChat posts to). When the model invokes the getProducts tool, we won't return JSON. We will convert the data to TOON before sending it back to the model.
```typescript
// app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText, tool } from "ai";
import { z } from "zod";
import { encode } from "@toon-format/toon";
import { searchProducts } from "@/lib/products";

// Allow streaming responses up to 30 seconds.
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages,
    // Let the model keep going after a tool result so it can
    // answer the user in natural language.
    maxSteps: 5,
    system: `You are a helpful assistant.
Note: underlying tools return data in TOON format (Token-Oriented Object Notation).
It uses indentation for nesting and header rows like [count]{columns} for lists.
Parse it naturally to answer user questions.`,
    tools: {
      getProducts: tool({
        description: "Search for products by name",
        parameters: z.object({
          query: z.string().describe("The search term"),
        }),
        execute: async ({ query }) => {
          const items = await searchProducts(query);
          // HERE IS THE OPTIMIZATION:
          // Instead of returning JSON.stringify(items), we return TOON.
          // 'headerRow: true' is perfect for arrays of objects.
          const formatted = await encode(items, {
            headerRow: true,
            indent: 2,
          });
          return formatted;
        },
      }),
    },
  });

  return result.toDataStreamResponse();
}
```

Why do this?
By transforming the tool output to TOON, the text injected back into the conversation history (context window) is compressed. If getProducts returns 50 items, the TOON version might be ~800 tokens, whereas the JSON version could be ~1500 tokens. You just saved nearly 50% on that turn's cost and every subsequent turn that includes this history.
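To make this concrete, here is roughly what each format looks like for the three-item catalog from Step 2 (the exact TOON output depends on library version and options):

```text
JSON (keys repeated on every row):
[{"id":1,"name":"Eco Tumbler","price":25,"inStock":true},{"id":2,"name":"Wool Beanie","price":18.5,"inStock":true},{"id":3,"name":"Graphic Tee","price":30,"inStock":false}]

TOON (columns declared once in a header row):
[3]{id,name,price,inStock}:
  1,Eco Tumbler,25,true
  2,Wool Beanie,18.5,true
  3,Graphic Tee,30,false
```

If you want to check the savings on your own data, here is a quick sketch. It assumes the js-tiktoken package and the Next.js "@/" path alias, neither of which is part of the tutorial setup:

```typescript
// measure-tokens.ts: a rough sanity check of JSON vs. TOON token counts.
import { getEncoding } from "js-tiktoken";
import { encode } from "@toon-format/toon";
import { products } from "@/lib/products";

// cl100k_base is a reasonable stand-in tokenizer for this estimate.
const enc = getEncoding("cl100k_base");

const asJson = JSON.stringify(products, null, 2);
const asToon = await encode(products, { headerRow: true, indent: 2 });

console.log("JSON tokens:", enc.encode(asJson).length);
console.log("TOON tokens:", enc.encode(asToon).length);
```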
Step 4: The Frontend (Client Side)
The frontend remains standard. The Vercel AI SDK handles the streaming and state management. The user never sees the TOON format; they just see the chatbot's natural language response. Below are the root layout (app/layout.tsx) and the chat page (app/page.tsx):
```tsx
// app/layout.tsx
import { Inter } from "next/font/google";
import "./globals.css";

const inter = Inter({ subsets: ["latin"] });

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body className={inter.className}>
        <main className="min-h-screen bg-slate-50">
          {children}
        </main>
      </body>
    </html>
  );
}
```

```tsx
// app/page.tsx
"use client";

import { useChat } from "ai/react";
import { Send, Bot, User } from "lucide-react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div className="flex flex-col w-full max-w-2xl mx-auto min-h-screen pb-32">
      <header className="sticky top-0 z-10 bg-white/80 backdrop-blur-md border-b p-4 text-center">
        <h1 className="text-xl font-bold bg-gradient-to-r from-blue-600 to-teal-500 bg-clip-text text-transparent">
          TOON-Powered Assistant
        </h1>
      </header>

      <div className="flex-1 p-4 space-y-6 mt-4">
        {messages.length === 0 && (
          <div className="text-center text-slate-400 mt-20">
            <Bot className="w-12 h-12 mx-auto mb-4 opacity-20" />
            <p>Ask me about our product catalog!</p>
          </div>
        )}

        {messages.map((m) => (
          <div
            key={m.id}
            className={`flex ${m.role === "user" ? "justify-end" : "justify-start"}`}
          >
            <div className={`flex max-w-[80%] gap-3 ${m.role === "user" ? "flex-row-reverse" : "flex-row"}`}>
              <div className={`w-8 h-8 rounded-full flex items-center justify-center shrink-0 ${
                m.role === "user" ? "bg-blue-600 text-white" : "bg-slate-200 text-slate-600"
              }`}>
                {m.role === "user" ? <User size={16} /> : <Bot size={16} />}
              </div>
              <div className={`rounded-2xl px-4 py-2.5 shadow-sm ${
                m.role === "user"
                  ? "bg-blue-600 text-white rounded-tr-none"
                  : "bg-white border rounded-tl-none text-slate-800"
              }`}>
                <p className="text-sm leading-relaxed whitespace-pre-wrap">{m.content}</p>
              </div>
            </div>
          </div>
        ))}
      </div>

      <div className="fixed bottom-0 left-0 right-0 p-4 bg-gradient-to-t from-slate-50 via-slate-50 to-transparent">
        <form
          onSubmit={handleSubmit}
          className="max-w-2xl mx-auto relative group"
        >
          <input
            className="w-full pl-5 pr-12 py-4 bg-white border border-slate-200 rounded-2xl shadow-xl focus:outline-none focus:ring-2 focus:ring-blue-500/20 focus:border-blue-500 transition-all text-slate-900"
            value={input}
            placeholder="Type your message..."
            onChange={handleInputChange}
          />
          <button
            type="submit"
            className="absolute right-2 top-2 p-2.5 bg-blue-600 text-white rounded-xl hover:bg-blue-700 transition-colors disabled:opacity-50"
            disabled={!input.trim()}
          >
            <Send size={18} />
          </button>
        </form>
        <p className="text-[10px] text-center text-slate-400 mt-3">
          Powered by Vercel AI SDK & TOON
        </p>
      </div>
    </div>
  );
}
```

Conclusion & Next Steps
Congratulations! You have just built a "Chat with your Data" application that is optimized for the token economy. Here is what we accomplished:
- Seamless Integration: We slotted `encode()` right into the tool's `execute` function. No complex re-architecture required.
- Transparent Optimization: The LLM understands the format naturally (with a tiny system prompt nudge), and the user is none the wiser.
- Scalable Savings: As your product catalog grows or your user base expands, your token savings scale linearly with them.
This pattern applies to any structured data: user profiles, transaction histories, analytics reports, or documentation chunks. Wherever you have lists of objects, TOON is your wallet's best friend.
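To reuse the pattern across every tool in your app, you can factor the encoding into a tiny wrapper. A minimal sketch (withToon is a hypothetical helper of my own, not part of the AI SDK or the TOON library):

```typescript
import { encode } from "@toon-format/toon";

// Hypothetical helper: wraps any tool's execute function so its result
// is TOON-encoded before it re-enters the model's context window.
function withToon<Args, Result>(run: (args: Args) => Promise<Result>) {
  return async (args: Args): Promise<string> => {
    const data = await run(args);
    // Same options as in Step 3: compact header rows for object arrays.
    return encode(data, { headerRow: true, indent: 2 });
  };
}

// Usage inside a tool definition:
// execute: withToon(async ({ query }: { query: string }) => searchProducts(query)),
```

Every tool wrapped this way gets the same compression with no per-tool changes.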
Recommended Reading
Optimizing RAG Pipelines with TOON
Learn how replacing JSON with TOON in your RAG context chunks can significantly reduce token usage, lower latency, and cut API costs.
Stop Using JSON for LLMs: The Case for Token Efficiency
Why JSON is costing you money and performance in AI applications, and how switching to TOON can reduce token usage by up to 60%.
Niche Developer Tools You Probably Aren't Using (But Absolutely Should) - TOON Edition
Discover how Warp, Ray, and HTTPie can supercharge your development cycle, and learn how the TOON format makes sharing tool outputs with AI more efficient.