Tutorial May 12, 2026

AI Structured Outputs & JSON Mode Guide 2026

Get reliable, schema-compliant JSON from any LLM. OpenAI Structured Outputs, Anthropic JSON mode, Google controlled generation, and production patterns.

One of the biggest headaches in building LLM applications is parsing the model's output. You ask for JSON, and sometimes you get it — wrapped in markdown code blocks, with missing fields, or with hallucinated values that don't match your schema. Structured Outputs is the solution: it guarantees the model always returns valid JSON that conforms to your specified schema. This guide covers every provider's approach to structured outputs in 2026, with practical code examples and production patterns.

The Problem: Why Structured Outputs Matter

Without structured outputs, getting reliable JSON from an LLM is a fragile process:

  • The model wraps JSON in ```json ... ``` markdown blocks
  • Required fields are sometimes omitted
  • Enum values are hallucinated (e.g., "priority": "urgent" when only "high", "medium", "low" are valid)
  • Nested objects have inconsistent structures
  • The model adds conversational text before or after the JSON

In production, even a 1% JSON parsing failure rate means hundreds of broken requests per day once you serve tens of thousands of requests. Structured Outputs removes this class of formatting errors at the source.
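
To make the fragility concrete, here is a minimal sketch of the defensive parsing an application ends up writing without structured outputs. The helper names (extract_json_object, check_ticket) and the ticket fields are hypothetical, chosen only to mirror the failure modes listed above.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
VALID_PRIORITIES = {"high", "medium", "low"}

def extract_json_object(raw: str) -> dict:
    """Best-effort recovery of one JSON object from free-form model output."""
    # Unwrap a fenced block (optionally tagged "json") if the model added one.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Drop conversational text before the first "{" and after the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return json.loads(raw[start:end + 1])

def check_ticket(data: dict) -> list:
    """Return a list of problems: missing fields or an out-of-enum priority."""
    problems = [f"missing field: {name}" for name in ("title", "priority") if name not in data]
    if "priority" in data and data["priority"] not in VALID_PRIORITIES:
        problems.append(f"invalid priority: {data['priority']}")
    return problems

raw_output = (
    "Sure! Here is the ticket:\n"
    + FENCE + "json\n"
    + '{"title": "Login fails", "priority": "urgent"}\n'
    + FENCE
)
ticket = extract_json_object(raw_output)
print(check_ticket(ticket))  # ['invalid priority: urgent']
```

Every branch in this helper exists to work around output that a schema-enforcing API would not produce in the first place.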

OpenAI Structured Outputs

OpenAI offers the most robust structured outputs implementation, available through two mechanisms: function calling with strict mode and response_format with json_schema.

Method 1: Structured Outputs via Response Format

Use this when you want the model's response to follow a schema — not to call a function:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    genre: list[str]
    summary: str
    recommended: bool

response = client.responses.parse(
    model="gpt-5",
    input=[
        {"role": "system", "content": "Extract movie review information."},
        {"role": "user", "content": "I watched Inception (2010). Mind-blowing film! 9/10."}
    ],
    text_format=MovieReview,
)

review = response.output_parsed
print(review.title)       # "Inception"
print(review.year)        # 2010
print(review.rating)      # 9.0
print(review.recommended) # True

The Pydantic model is automatically converted to a JSON Schema, and the model is guaranteed to produce output matching that schema. If it can't (e.g., due to a safety refusal), you get an explicit refusal object instead of broken JSON.

Method 2: Direct JSON Schema (without Pydantic)

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "List 3 programming languages"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "language_list",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "paradigm": {"type": "string", "enum": ["imperative", "functional", "logic", "object-oriented"]},
                                "year_created": {"type": "integer"}
                            },
                            "required": ["name", "paradigm", "year_created"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["languages"],
                "additionalProperties": False
            }
        }
    }
)

import json
result = json.loads(response.choices[0].message.content)
print(result["languages"][0]["name"])  # e.g. "Python"

Strict Mode Requirements

When using strict mode, your JSON Schema must follow these rules:

  1. All fields must be in required
  2. additionalProperties must be false on all objects (False in a Python dict)
  3. Nested objects must also follow these rules recursively
  4. Optional fields stay in required but accept null through a type union such as ["string", "null"]

# Making a field optional with strict mode
"optional_field": {
    "type": ["string", "null"],
    "description": "This field is optional"
}
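
The four rules above can be applied mechanically. The following is a rough sketch, not an official utility: make_strict is a hypothetical helper that copies a hand-written schema, marks every property as required, makes previously optional fields nullable, and sets additionalProperties to False on each object. It only walks properties, items, anyOf, and $defs, so other keywords pass through unchanged.

```python
import copy

def make_strict(schema: dict) -> dict:
    """Return a copy of `schema` adjusted to the strict-mode rules above."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched
    _strictify(strict)
    return strict

def _is_object(node: dict) -> bool:
    node_type = node.get("type")
    return node_type == "object" or (isinstance(node_type, list) and "object" in node_type)

def _strictify(node) -> None:
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if _is_object(node) and isinstance(props, dict):
        originally_required = set(node.get("required", []))
        for name, prop in props.items():
            # Rule 4: a previously optional field becomes required but nullable.
            if (name not in originally_required and isinstance(prop, dict)
                    and isinstance(prop.get("type"), str)):
                prop["type"] = [prop["type"], "null"]
        node["required"] = list(props)        # rule 1
        node["additionalProperties"] = False  # rule 2
    # Rule 3: recurse into nested schemas.
    if isinstance(props, dict):
        for prop in props.values():
            _strictify(prop)
    _strictify(node.get("items"))
    for sub in node.get("anyOf", []):
        _strictify(sub)
    for sub in node.get("$defs", {}).values():
        _strictify(sub)

loose_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "context": {"type": "string"}},
                "required": ["name"],
            },
        }
    },
    "required": ["entities"],
}
strict_schema = make_strict(loose_schema)
item_schema = strict_schema["properties"]["entities"]["items"]
print(item_schema["required"])                       # ['name', 'context']
print(item_schema["properties"]["context"]["type"])  # ['string', 'null']
```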

Handling Refusals

When the model refuses to answer (safety filter), you get a structured refusal instead of broken output:

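response = client.responses.parse(
    model="gpt-5",
    input="How to hack a server",
    text_format=MySchema,
)

if response.output_parsed is None:
    # Model refused: the refusal text sits in a content part of a message item
    for item in response.output:
        if item.type != "message":
            continue  # skip non-message items such as reasoning
        for part in item.content:
            if part.type == "refusal":
                print(f"Refused: {part.refusal}")
else:
    # Normal structured output
    data = response.output_parsed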

Anthropic JSON Mode

Anthropic's most widely used approaches to structured output are tool use and explicit JSON prompting. Neither pattern below enforces a JSON Schema during decoding the way OpenAI's strict mode does, but both give reliable results when used carefully:

Approach 1: Tool Use for Structured Extraction

Define a tool with your desired schema and force the model to use it:

import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tool_choice={"type": "tool", "name": "extract_info"},
    tools=[
        {
            "name": "extract_info",
            "description": "Extract structured information from text",
            "input_schema": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["person", "organization", "location"]},
                                "mentioned_in_context": {"type": "string"}
                            },
                            "required": ["name", "type"]
                        }
                    },
                    "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
                    "summary": {"type": "string"}
                },
                "required": ["entities", "sentiment", "summary"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Apple announced Tim Cook will visit their new Austin campus next week."}
    ]
)

for block in response.content:
    if block.type == "tool_use":
        data = block.input  # Already a dict!
        print(json.dumps(data, indent=2))

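This is the most reliable of the two Claude patterns shown here: tool_choice forces the model to call the tool, and the input almost always follows your schema. Adherence is best-effort rather than guaranteed, so validate block.input before using it.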
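
Since a tool call's input arrives as a plain dict, a cheap check before use can catch a missing field or an out-of-enum value. The sketch below is a hypothetical find_violations helper that covers only a small subset of JSON Schema (type for objects, arrays, strings and booleans, plus required, enum, properties, and items). For anything beyond a sketch, a full validator or a Pydantic model is the better choice.

```python
TYPE_MAP = {"object": dict, "array": list, "string": str, "boolean": bool}

def find_violations(data, schema: dict, path: str = "$") -> list:
    """Check `data` against a small subset of JSON Schema and list the problems."""
    expected = schema.get("type")
    if isinstance(expected, str) and expected in TYPE_MAP:
        if not isinstance(data, TYPE_MAP[expected]):
            return [f"{path}: expected {expected}"]
    problems = []
    if "enum" in schema and data not in schema["enum"]:
        problems.append(f"{path}: {data!r} not in {schema['enum']}")
    if isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                problems.append(f"{path}.{name}: missing required field")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in data:
                problems += find_violations(data[name], sub_schema, f"{path}.{name}")
    if isinstance(data, list) and isinstance(schema.get("items"), dict):
        for index, item in enumerate(data):
            problems += find_violations(item, schema["items"], f"{path}[{index}]")
    return problems

tool_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string", "enum": ["person", "organization", "location"]},
                },
                "required": ["name", "type"],
            },
        },
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "summary": {"type": "string"},
    },
    "required": ["entities", "sentiment", "summary"],
}

# A made-up tool input with two defects: no summary and an invalid entity type.
tool_input = {"entities": [{"name": "Tim Cook", "type": "CEO"}], "sentiment": "neutral"}
for problem in find_violations(tool_input, tool_schema):
    print(problem)
```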

Approach 2: JSON Mode with Prefill

Use the assistant prefill trick to force JSON output:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": """Extract the following as JSON:
{"name": string, "age": number, "occupation": string}

Text: Sarah is a 32-year-old software engineer."""
        },
        {
            "role": "assistant",
            "content": "{"  # Prefill forces JSON output
        }
    ]
)

# The response continues from "{"
result = json.loads("{" + response.content[0].text)
print(result["name"])  # "Sarah"
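
One weakness of the prefill pattern is that the model may keep talking after the closing brace, which makes a plain json.loads call fail. A minimal sketch of a more tolerant parser, using the standard library's JSONDecoder.raw_decode to read the first complete JSON value and ignore whatever follows. The function name parse_prefilled and the sample continuation are illustrative assumptions.

```python
import json

def parse_prefilled(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill with the model's continuation and parse the first JSON value."""
    text = prefill + continuation
    # raw_decode returns the parsed value and the index where it ended,
    # so trailing commentary after the object does not raise an error.
    value, _end = json.JSONDecoder().raw_decode(text)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value

continuation = (
    '"name": "Sarah", "age": 32, "occupation": "software engineer"}\n\n'
    "Let me know if you need anything else."
)
person = parse_prefilled("{", continuation)
print(person["name"])  # Sarah
```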

Google Gemini Controlled Generation

Gemini supports structured output through response schema configuration:

import google.generativeai as genai
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str

# Using Pydantic model directly
model = genai.GenerativeModel('gemini-2.5-pro')

response = model.generate_content(
    "Extract product info: The Widget Pro costs $29.99 and is currently in stock in the electronics category.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=Product
    )
)

import json
product = json.loads(response.text)
print(product["name"])   # "Widget Pro"
print(product["price"])  # 29.99

Gemini with Raw JSON Schema

response = model.generate_content(
    "List 3 fruits with their color and taste",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "color": {"type": "string"},
                    "taste": {"type": "string", "enum": ["sweet", "sour", "bitter", "umami"]}
                },
                "required": ["name", "color", "taste"]
            }
        }
    )
)

Provider Comparison

| Feature | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Schema enforcement | Guaranteed (strict mode) | Via tool use | Guaranteed (response_schema) |
| Pydantic support | Native SDK | No | Native SDK |
| Enum validation | Guaranteed | Best-effort | Guaranteed |
| Explicit refusals | Yes | No | No |
| Streaming support | Yes | Via tool streaming | Limited |
| Optional fields | Via nullable | Naturally optional | Via optional |
| Max schema complexity | Large | Large | Moderate |

Production Patterns

1. Multi-Provider Abstraction

If you need structured outputs across multiple providers, abstract the interface:

import asyncio
from abc import ABC, abstractmethod
from typing import Type

import anthropic
from openai import AsyncOpenAI
from pydantic import BaseModel

class StructuredOutputProvider(ABC):
    @abstractmethod
    async def extract(self, text: str, schema: Type[BaseModel], prompt: str) -> BaseModel:
        pass

class OpenAIProvider(StructuredOutputProvider):
    def __init__(self, model="gpt-5"):
        # Async client, so extract() does not block the event loop
        self.client = AsyncOpenAI()
        self.model = model

    async def extract(self, text, schema, prompt):
        response = await self.client.responses.parse(
            model=self.model,
            input=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text}
            ],
            text_format=schema,
        )
        return response.output_parsed

class AnthropicProvider(StructuredOutputProvider):
    def __init__(self, model="claude-sonnet-4-20250514"):
        self.client = anthropic.AsyncAnthropic()
        self.model = model

    async def extract(self, text, schema, prompt):
        # Convert Pydantic schema to tool schema
        json_schema = schema.model_json_schema()
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            tool_choice={"type": "tool", "name": "extract"},
            tools=[{
                "name": "extract",
                "description": prompt,
                "input_schema": json_schema
            }],
            messages=[{"role": "user", "content": text}]
        )
        for block in response.content:
            if block.type == "tool_use":
                # Pydantic validates the tool input here
                return schema(**block.input)
        # Fail loudly instead of silently returning None
        raise ValueError("Response contained no tool_use block")

# Usage (StockInfo stands for any Pydantic model you have defined)
async def main():
    provider = OpenAIProvider()
    result = await provider.extract(
        "Apple stock rose 5% to $198",
        StockInfo,
        "Extract stock information"
    )
    print(result)

asyncio.run(main())

2. Fallback Chain

If strict mode fails (e.g., schema too complex), fall back to JSON mode:

import json

def get_structured_output(client, model, messages, schema):
    # Try structured outputs first
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": "response",
                    "strict": True,
                    "schema": schema
                }
            }
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"Structured output failed: {e}")

    # Fallback to JSON mode. The API returns a complete JSON object rather than
    # continuing an assistant prefill, so describe the schema in a system message.
    # JSON mode also requires the word "JSON" to appear in the messages.
    schema_hint = {
        "role": "system",
        "content": "Respond with a JSON object matching this schema:\n" + json.dumps(schema)
    }
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[schema_hint] + messages,
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"JSON mode also failed: {e}")
        return None
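
The same try-in-order idea generalizes to any number of strategies. Below is a minimal, provider-agnostic sketch: run_fallback_chain is a hypothetical helper that takes named callables returning raw JSON text, and the two stub strategies stand in for real API calls so the example runs offline.

```python
import json

def run_fallback_chain(strategies, validate):
    """Try each (name, callable) in order; return the first result that parses and validates.

    Returns (strategy_name, data, errors); name and data are None if every strategy fails.
    """
    errors = []
    for name, call in strategies:
        try:
            data = json.loads(call())  # JSONDecodeError is a ValueError subclass
            validate(data)
            return name, data, errors
        except (ValueError, KeyError, TypeError) as exc:
            errors.append(f"{name}: {exc}")
    return None, None, errors

# Stubs standing in for real provider calls.
def strict_call():
    raise ValueError("schema too complex")

def json_mode_call():
    return '{"languages": ["Python", "Haskell"]}'

def require_languages(data):
    if "languages" not in data:
        raise KeyError("languages")

used, result, errors = run_fallback_chain(
    [("strict", strict_call), ("json_mode", json_mode_call)],
    require_languages,
)
print(used)    # json_mode
print(errors)  # ['strict: schema too complex']
```

Collecting the errors instead of printing them lets the caller log which strategies failed and why.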

3. Validation Layer

Even with guaranteed structured outputs, add a validation layer for business logic:

from pydantic import BaseModel, field_validator

class OrderExtraction(BaseModel):
    items: list[str]
    total: float
    currency: str = "USD"
    
    @field_validator('total')
    @classmethod
    def total_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Total must be positive')
        return v
    
    @field_validator('items')
    @classmethod
    def must_have_items(cls, v):
        if not v:
            raise ValueError('At least one item required')
        return v

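# Use with structured output. The SDK validates the response against
# OrderExtraction while parsing, so the validators above run inside parse()
# and a failure surfaces as a pydantic ValidationError (a ValueError subclass).
try:
    response = client.responses.parse(
        model="gpt-5",
        input="I'd like to order 2 laptops",
        text_format=OrderExtraction,
    )
    order = response.output_parsed
except ValueError as e:
    print(f"Business validation failed: {e}")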
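
Some business rules span several fields, which a JSON Schema cannot express. As a rough illustration using only the standard library (dataclasses in place of Pydantic), the sketch below checks that an order total matches its line items. LineItem, unit_price, and check_order are hypothetical names that do not appear in the schema above.

```python
import math
from dataclasses import dataclass

@dataclass
class LineItem:
    name: str
    quantity: int
    unit_price: float

def check_order(items, total, tolerance=0.01):
    """Return a list of cross-field problems for an extracted order."""
    problems = []
    if not items:
        problems.append("order has no items")
    for item in items:
        if item.quantity <= 0:
            problems.append(f"{item.name}: quantity must be positive")
    expected = sum(item.quantity * item.unit_price for item in items)
    # Compare with a small absolute tolerance to absorb rounding in extracted prices.
    if not math.isclose(expected, total, abs_tol=tolerance):
        problems.append(f"total {total:.2f} does not match line items {expected:.2f}")
    return problems

items = [LineItem(name="laptop", quantity=2, unit_price=999.50)]
print(check_order(items, total=1999.00))  # []
print(check_order(items, total=1899.00))  # ['total 1899.00 does not match line items 1999.00']
```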

Advanced Schema Patterns

Recursive Schemas

Nested and recursive structures work with OpenAI Structured Outputs:

class Comment(BaseModel):
    author: str
    text: str
    replies: list["Comment"] = []

# This creates a tree structure
# OpenAI handles the recursion depth limit automatically
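
Once a recursive result comes back, consuming it is a plain tree walk. A small sketch, assuming the parsed output has been converted to nested dicts with the same author, text, and replies keys as the Comment model (for example via model_dump()). walk_comments and the sample thread are illustrative.

```python
def walk_comments(comment: dict, depth: int = 0):
    """Yield (depth, comment) pairs in depth-first order."""
    yield depth, comment
    for reply in comment.get("replies", []):
        yield from walk_comments(reply, depth + 1)

thread = {
    "author": "ana",
    "text": "Has anyone tried strict mode?",
    "replies": [
        {
            "author": "ben",
            "text": "Yes, works well.",
            "replies": [{"author": "cy", "text": "Same here.", "replies": []}],
        },
        {"author": "dee", "text": "Not yet.", "replies": []},
    ],
}

for depth, comment in walk_comments(thread):
    print("  " * depth + f"{comment['author']}: {comment['text']}")

max_depth = max(depth for depth, _ in walk_comments(thread))
print(max_depth)  # 2
```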

Union Types (Discriminated Unions)

from typing import Union, Literal

class TextContent(BaseModel):
    type: Literal["text"]
    body: str

class ImageContent(BaseModel):
    type: Literal["image"]
    url: str
    alt_text: str

class CodeContent(BaseModel):
    type: Literal["code"]
    language: str
    code: str

ContentBlock = Union[TextContent, ImageContent, CodeContent]

class Article(BaseModel):
    title: str
    blocks: list[ContentBlock]
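
The type field is what makes this union practical to consume: downstream code branches on it instead of guessing from which keys are present. A minimal sketch of that dispatch over plain dicts shaped like the models above. render_block and its output formats are assumptions for illustration.

```python
def render_block(block: dict) -> str:
    """Render one content block by dispatching on its discriminator field."""
    kind = block["type"]
    if kind == "text":
        return block["body"]
    if kind == "image":
        return f"[image: {block['alt_text']}]({block['url']})"
    if kind == "code":
        return f"<{block['language']}>\n{block['code']}"
    raise ValueError(f"unknown block type: {kind}")

article = {
    "title": "Structured outputs in practice",
    "blocks": [
        {"type": "text", "body": "Schemas make parsing boring."},
        {"type": "image", "url": "https://example.com/chart.png", "alt_text": "Error rates"},
        {"type": "code", "language": "python", "code": "print('ok')"},
    ],
}

rendered = [render_block(block) for block in article["blocks"]]
print("\n\n".join(rendered))
```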

Extraction Chains

For complex documents, break extraction into multiple passes:

class Relationship(BaseModel):
    subject: str
    predicate: str
    object: str

class EntityExtraction(BaseModel):
    entities: list[str]
    # A list of objects rather than list[tuple[str, str, str]]: Pydantic turns
    # tuples into positional (prefixItems) schemas, which strict mode may reject
    relationships: list[Relationship]

class FactCheck(BaseModel):
    claims: list[str]
    verifiable: list[bool]

# document: the source text, assumed to be loaded elsewhere

# Pass 1: Extract entities
entities = client.responses.parse(
    model="gpt-5",
    input=document,
    text_format=EntityExtraction,
)

# Pass 2: Fact-check claims using extracted entities
facts = client.responses.parse(
    model="gpt-5",
    input=f"Check these entities: {entities.output_parsed.entities}\n\nDocument: {document}",
    text_format=FactCheck,
)

Common Pitfalls

| Pitfall | Solution |
|---|---|
| Forgetting additionalProperties: False | Required for strict mode on OpenAI. Add to every object in schema. |
| Not handling refusals | Always check if output_parsed is None before using it. |
| Too many enum values | Keep enums under ~20 values. More causes unreliable selection. |
| Schema too complex | Break into multiple extraction calls. Simpler schemas are more reliable. |
| Using JSON mode when Structured Outputs is available | Always prefer Structured Outputs: it guarantees schema compliance. |
| Not using Pydantic for Python | Pydantic gives you type safety, validation, and automatic schema generation. |

Performance Considerations

  • Latency: Structured Outputs adds minimal latency (~50-100ms) compared to unstructured output
  • Tokens: Schema definitions count as input tokens. Complex schemas can add 500-2000 tokens to each request
  • Cost: The schema tokens add to your input costs. For high-volume applications, consider caching the schema portion with prompt caching
  • Streaming: Structured Outputs with streaming works on OpenAI — the JSON builds incrementally. Parse after the stream completes.
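
To get a feel for the token and cost bullets, a back-of-the-envelope estimate is often enough. The sketch below assumes roughly four characters per token, which is a common rule of thumb and not a real tokenizer, and takes the price per million input tokens as a parameter because actual prices vary by model. Both function names are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer

def estimate_schema_tokens(schema: dict) -> int:
    """Approximate how many input tokens a schema adds to each request."""
    compact = json.dumps(schema, separators=(",", ":"))  # minified form
    return max(1, len(compact) // CHARS_PER_TOKEN)

def daily_schema_cost(schema: dict, requests_per_day: int,
                      usd_per_million_input_tokens: float) -> float:
    """Approximate daily spend attributable to resending the schema."""
    tokens = estimate_schema_tokens(schema) * requests_per_day
    return tokens / 1_000_000 * usd_per_million_input_tokens

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "paradigm": {"type": "string", "enum": ["imperative", "functional", "logic"]},
    },
    "required": ["name", "paradigm"],
    "additionalProperties": False,
}

tokens = estimate_schema_tokens(schema)
# The price below is a placeholder; substitute your provider's current rate.
cost = daily_schema_cost(schema, requests_per_day=100_000, usd_per_million_input_tokens=1.0)
print(f"~{tokens} tokens per request, ~${cost:.2f} per day")
```

For an exact count, run the serialized schema through the tokenizer of the model you actually call.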

Conclusion

Structured Outputs transforms LLM integration from fragile regex parsing to reliable schema-driven data extraction. If you're using OpenAI or Google Gemini, use their native Structured Outputs features, which guarantee schema compliance. For Anthropic, the tool-use pattern achieves similar reliability. In all cases, add a Pydantic validation layer for business logic that goes beyond schema validation, and keep handling refusals explicitly. The result: near-zero JSON parsing errors in production, simpler code, and happier users.