AI API Cost Calculator
Set your expected token usage and see what each model costs per month. Toggle prompt caching and agentic loops to model real-world workloads.
Filter by provider
Estimated monthly cost
Cheapest model
Cheapest cost
Most expensive
Savings w/ caching
Configure your workload above to see how costs compare across 15 AI models.
Prices per million tokens sourced from official provider pages (March 2026). Cached input pricing used when prompt caching is enabled. Agentic loop multiplies both input and output token counts by the selected step count. Actual costs vary with response length, caching efficiency, and retry rates.
How AI API pricing works
Every major AI provider charges per token, with separate rates for input (your prompt) and output (the model's response). A token is roughly 4 characters of English text, or about 0.75 words. The price difference between models can be 100x or more for the same task.
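The per-token math above can be sketched in a few lines. The prices here are hypothetical placeholders for illustration, not real provider rates:

```python
def monthly_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Monthly API cost in dollars for a given token volume,
    with separate per-million rates for input and output."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: 10M input + 2M output tokens per month on a model
# priced at $3/M input and $15/M output (illustrative numbers):
cost = monthly_cost(10_000_000, 2_000_000, 3.00, 15.00)
print(f"${cost:.2f}")  # → $60.00
```

Note that output tokens usually cost several times more than input tokens, so response length drives the bill more than prompt length at the same volume.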
Two features can dramatically reduce your costs. Prompt caching stores frequently used prompt prefixes so you pay a fraction of the input cost on repeat calls, with most providers offering 75-90% discounts on cached tokens. Batch processing (not shown here) lets you submit requests asynchronously, typically at a 50% discount.
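The caching discount works out like this, in a minimal sketch. The hit rate, price, and discount below are assumed values for illustration:

```python
def input_cost_with_cache(total_input_tokens, cache_hit_rate,
                          price_per_m, cache_discount=0.90):
    """Input cost when a fraction of input tokens hit the prompt cache.
    cache_discount is the fraction knocked off the cached-token rate
    (providers typically advertise 75-90%)."""
    cached = total_input_tokens * cache_hit_rate
    fresh = total_input_tokens - cached
    cached_price_per_m = price_per_m * (1 - cache_discount)
    return (fresh / 1_000_000 * price_per_m
            + cached / 1_000_000 * cached_price_per_m)

# 10M input tokens at $3/M with an 80% cache hit rate and 90% discount:
# fresh 2M * $3 = $6.00, cached 8M * $0.30 = $2.40 → $8.40 vs $30 uncached
cost = input_cost_with_cache(10_000_000, 0.80, 3.00)
```

At an 80% hit rate this cuts input spend by 72%, which is why caching matters most for apps that reuse a large system prompt across many requests.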
Why agentic loops change the math
When you build AI agents, a single user request often triggers multiple LLM calls: the agent reasons, calls a tool, reads the result, reasons again, and repeats. A 5-step agentic workflow uses roughly 5x the tokens of a single chat completion. The multiplier slider above models this directly.
This is where model selection matters most. A budget model at 5x steps can still cost less than a flagship model at 1x. Understanding this tradeoff is core to production cost optimization.
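The tradeoff can be made concrete with a quick comparison. The per-million prices below are hypothetical stand-ins for a budget tier and a flagship tier:

```python
def agent_cost(steps, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Cost of one user request when an agentic loop multiplies
    both input and output token counts by the step count."""
    per_step = (in_tokens / 1_000_000 * in_price_per_m
                + out_tokens / 1_000_000 * out_price_per_m)
    return steps * per_step

# 5k input + 1k output tokens per step (assumed workload):
budget_5x = agent_cost(5, 5_000, 1_000, 0.10, 0.40)   # budget model, 5 steps
flagship_1x = agent_cost(1, 5_000, 1_000, 3.00, 15.00)  # flagship, 1 step
# budget_5x comes out well under flagship_1x despite running 5x the tokens
```

With these numbers the budget model at five steps costs roughly $0.0045 per request versus $0.03 for a single flagship call, so the cheaper model wins even with the 5x multiplier.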
Get Insanely Good at AI
This calculator shows you the price tag. The real challenge isn't knowing what a model costs. It's what happens when real users show up and costs start multiplying in ways you didn't plan for. The book is about crossing that gap between playing with AI and actually building with it.
Get the Book
Frequently asked questions
- How much does it cost to use AI APIs like GPT or Claude?
- AI API pricing is per-token, with separate rates for input and output. Costs range from $0.05 per million tokens for budget models like GPT-5.4 Nano to $15+ per million tokens for flagship models like Claude Opus 4.5. Your monthly bill depends on token volume, model choice, and whether you use features like prompt caching.
- What is prompt caching and how does it reduce API costs?
- Prompt caching stores frequently used prompt prefixes so repeat calls pay a fraction of the normal input cost. Most providers offer 75-90% discounts on cached tokens. This is especially effective for applications that reuse the same system prompt or context across many requests.
- How do agentic loops affect AI API costs?
- In agentic workflows, a single user request triggers multiple LLM calls as the agent reasons, calls tools, and iterates. A 5-step agentic loop uses roughly 5x the tokens of a single completion. This makes model selection critical, because a budget model at 5 steps can still cost less than a flagship model at 1 step.