Structured Output from LLMs: JSON Mode Explained
LLMs generate text, but applications need structured data. Here's how JSON mode, function calling, and schema enforcement turn free-form AI output into reliable, typed data.
LLMs generate text. They predict the next token, then the next, until they produce a response. That’s great for chat. It’s a problem when your application needs data it can actually use: a JSON object with specific fields, a list of extracted entities, or typed parameters for a function call. Free-form text doesn’t parse. You need structure.
Structured output solves this. It constrains the model to produce output your code can consume directly. No regex scraping. No brittle string parsing. Just valid, typed data.
JSON Mode: The Simplest Constraint
JSON mode tells the model to output valid JSON and nothing else. No markdown code fences, no explanatory text, no trailing commas. Just raw JSON that parses on first try.
OpenAI, Anthropic, Google, and most other providers support it. In the OpenAI API, you set response_format: { type: "json_object" }. The model is instructed internally to produce only JSON. Same idea across providers, slightly different parameter names.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract name and email from: John Doe, john@example.com"}],
    response_format={"type": "json_object"},
)
# response.choices[0].message.content is parseable JSON
# (assuming the response wasn't truncated by a max_tokens limit)
JSON mode guarantees parseable output. It does not guarantee a specific shape. The model might return {"name": "John Doe", "email": "john@example.com"} or {"person": {"name": "John Doe", "email": "john@example.com"}}. You still need to describe the structure in your prompt and validate on the client.
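A minimal sketch of what that means in practice. Both payloads below are valid JSON, so JSON mode is satisfied by either, but only one has the shape your code expects. The `extract_contact` helper and the two sample payloads are illustrative, not part of any provider API:

```python
import json

# Both are valid JSON; only the first has the flat shape the caller wants.
flat = '{"name": "John Doe", "email": "john@example.com"}'
nested = '{"person": {"name": "John Doe", "email": "john@example.com"}}'

def extract_contact(raw: str) -> dict:
    """Parse JSON-mode output and normalize the shapes we've seen."""
    data = json.loads(raw)          # JSON mode makes this parse succeed
    if "person" in data:            # unwrap the nested variant
        data = data["person"]
    missing = {"name", "email"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

print(extract_contact(flat))    # {'name': 'John Doe', 'email': 'john@example.com'}
print(extract_contact(nested))  # same dict after unwrapping
```

Defensive unwrapping like this is a stopgap. If you find yourself normalizing shapes, that's the signal to move to schema enforcement.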
Function Calling and Tool Use
Function calling (OpenAI) and tool use (Anthropic) go further. Instead of asking for arbitrary JSON, you define functions with typed parameters. The model outputs structured arguments for those functions. Your code receives a parsed object that matches your schema.
This is how models “use tools.” You define a function like search_database(query: string) or send_email(to: string, subject: string, body: string). The model decides when to call it and fills in the parameters. The API returns a structured object you pass directly to your implementation.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]
The model returns something like {"location": "San Francisco", "unit": "celsius"} when it decides to call get_weather. You parse it, validate it, and execute your actual weather API. The structure is enforced by the schema you provided.
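The receiving side can be sketched like this. The arguments arrive as a JSON string on the tool call (in the OpenAI Python SDK, at `response.choices[0].message.tool_calls[0].function.arguments`); `get_weather` here is a local stand-in for your real implementation:

```python
import json

def get_weather(location: str, unit: str = "celsius") -> str:
    """Stand-in for a call to an actual weather API."""
    return f"Weather for {location} in {unit}"

# What the API hands back for the tool call, as a JSON string:
raw_arguments = '{"location": "San Francisco", "unit": "celsius"}'

args = json.loads(raw_arguments)
assert "location" in args, "schema marks location as required"
result = get_weather(**args)    # dispatch to the real implementation
print(result)                   # Weather for San Francisco in celsius
```

The `**args` unpacking works precisely because the parameter names in the schema match your function signature. Keep them in sync.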
Schema Enforcement: Strict Output Shapes
Some APIs let you enforce a full JSON Schema. OpenAI’s structured outputs (via response_format with a json_schema) and Anthropic’s tool use with strict schemas guarantee that the output conforms to your definition. Wrong field names, wrong types, missing required fields: with strict enforcement, generation itself is constrained to the schema, so non-conforming output never reaches your code.
This is the strongest guarantee. You define exactly what you want:
{
  "type": "object",
  "properties": {
    "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    "summary": {"type": "string"}
  },
  "required": ["sentiment", "confidence", "summary"]
}
The model cannot return "sentiment": "happy" if “happy” isn’t in the enum. It cannot omit confidence. The API validates before returning. Use this when you need strict contracts: data pipelines, API integrations, or any place where downstream code assumes a fixed shape.
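What the enforcement amounts to can be sketched as a client-side check. The provider applies these rules during generation; `validate_analysis` below is a hypothetical helper that mirrors the same three constraints from the schema above, useful for seeing exactly what gets rejected:

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def validate_analysis(data: dict) -> dict:
    """Enforce the same rules the JSON Schema above declares."""
    for field in ("sentiment", "confidence", "summary"):
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"sentiment not in enum: {data['sentiment']!r}")
    if not 0 <= data["confidence"] <= 1:
        raise ValueError(f"confidence out of range: {data['confidence']}")
    return data

validate_analysis({"sentiment": "positive", "confidence": 0.92, "summary": "Upbeat review."})  # passes
try:
    validate_analysis({"sentiment": "happy", "confidence": 0.9, "summary": "Upbeat review."})
except ValueError as err:
    print(err)  # sentiment not in enum: 'happy'
```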
Client-Side Validation: Your Safety Net
Even with schema enforcement, validate on the client. APIs can change. Models can occasionally produce malformed output. Network issues can truncate responses. Defensive parsing catches problems before they reach your business logic.
Pydantic (Python) and Zod (TypeScript) are the standard choices. Both parse JSON into typed objects and validate structure. Invalid data raises an error instead of propagating garbage.
from pydantic import BaseModel

class ExtractionResult(BaseModel):
    name: str
    email: str

result = ExtractionResult.model_validate_json(model_output)
import { z } from "zod";

const ExtractionResult = z.object({
  name: z.string(),
  email: z.string().email()
});

const result = ExtractionResult.parse(JSON.parse(modelOutput));
If the model returns {"name": "John", "email": "not-an-email"}, Zod or Pydantic rejects it. You catch the error, log it, retry, or fall back to a default. Never assume the model output is correct without validation.
When to Use Structured Output
API integrations. Your LLM extracts data from documents to populate a CRM, ticketing system, or database. The output must match the target schema. Structured output plus validation ensures it does.
Data extraction. Pull entities, classifications, or key-value pairs from unstructured text. RAG retrieves the right chunks; structured output extracts the right fields. The result feeds into search, filters, or downstream analytics.
Classification and routing. Sentiment, intent, category, priority. A fixed set of labels. Low temperature plus an enum schema gives you consistent, parseable classifications for routing logic.
Multi-step pipelines. Step 1: extract. Step 2: transform. Step 3: store. When each step’s output is the next step’s input, structured data is non-negotiable. Free-form text breaks the pipeline.
Function calling. The model decides which tool to use and with what arguments. Structured parameters are the interface. Without them, you’re back to parsing strings.
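The pipeline case is worth making concrete. A hypothetical three-step sketch, where each step's output type is the next step's input type (the `extract` step stands in for an LLM call with a schema):

```python
def extract(text: str) -> dict:
    """Step 1: in production, an LLM call with an enforced schema."""
    return {"name": "John Doe", "email": "JOHN@EXAMPLE.COM"}

def transform(record: dict) -> dict:
    """Step 2: normalize fields before storage."""
    return {**record, "email": record["email"].lower()}

def store(record: dict, db: list) -> None:
    """Step 3: append to a stand-in datastore."""
    db.append(record)

db: list[dict] = []
store(transform(extract("John Doe, JOHN@EXAMPLE.COM")), db)
print(db)  # [{'name': 'John Doe', 'email': 'john@example.com'}]
```

If `extract` returned free-form text instead of a dict, `transform` would need its own parser, and every schema drift upstream would break everything downstream. Typed boundaries keep the failure local.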
Practical Tips
Keep schemas simple. Complex nested structures increase failure rates. Flatten when possible. Prefer a few well-named fields over a deep hierarchy. The model has to fit its output to your schema; simpler schemas are easier to satisfy.
Provide examples. In your system prompt, show the exact JSON shape you want. One or two examples dramatically improve consistency. “Return JSON like: {"name": "…", "email": "…"}” works better than “Return name and email as JSON.”
Handle validation failures gracefully. When Pydantic or Zod throws, don’t crash. Log the raw output and the error. Retry with a clearer prompt, fall back to a default, or surface the failure to the user. Production systems need fallbacks.
Use low temperature for structured tasks. Temperature controls randomness. For extraction, classification, and any task where you need consistent structure, set temperature to 0 or 0.2. High temperature increases the chance of malformed or inconsistent output.
Start with JSON mode, upgrade as needed. If a simple “return JSON” instruction plus client validation works, use it. Add schema enforcement when you need stricter guarantees. Add function calling when the model needs to choose and invoke tools. Each layer adds complexity; use the minimum that solves your problem.
Structured output turns LLMs from text generators into data producers. JSON mode, function calling, and schema enforcement give you the control. Client-side validation gives you the safety. Together they make LLM output reliable enough for production systems. For more on building AI applications that work in the real world, see Get Insanely Good at AI.