
Context Window Calculator

Every model has a token budget. Paste your system prompt, tool definitions, and context to see exactly how much room you have left. Token counts are estimated with the same tokenization scheme GPT models use.

[Interactive calculator: breaks your context budget into system prompt, tools/functions, conversation history, RAG context, and response reservation, and reports tokens used, tokens remaining, percent utilized, and estimated input cost.]

Select a preset or enter your context to see how your token budget breaks down.

What is a context window?

A context window is the total amount of text (measured in tokens) that a language model can process in a single request. It includes everything: your system prompt, any tool definitions, the conversation history, retrieved documents, and the model's response.

When you exceed the context window, the model either truncates your input or refuses the request. Even before hitting the limit, performance degrades as the model struggles to attend to all the information. A general rule: stay below 80% utilization for best results.
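The 80% rule above is easy to enforce programmatically. A minimal sketch, assuming you already have a token count for your assembled input (the function name and threshold default are illustrative, not part of any API):

```python
def check_utilization(tokens_used: int, context_window: int,
                      threshold: float = 0.8) -> str:
    """Classify how full a context window is.

    The 0.8 default reflects the rule of thumb that quality tends to
    degrade well before the hard limit is reached.
    """
    ratio = tokens_used / context_window
    if ratio >= 1.0:
        return "over budget: input will be truncated or rejected"
    if ratio >= threshold:
        return "warning: high utilization, expect degraded recall"
    return "ok"
```

For a 128K window, 100K tokens of input comes back as "ok" (about 78% utilized), while 110K crosses the warning threshold.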

Why budget planning matters

In production applications, your context fills up fast. A system prompt might use 500 tokens. Tool definitions for 10 functions could add 3,000 more. A few turns of conversation history add another 2,000. If you're doing RAG, each retrieved chunk is 200-500 tokens. Suddenly you've used 8,000 tokens before the model generates a single word.

This calculator helps you plan that budget upfront, so you can choose the right model for your workload and avoid runtime surprises.
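The arithmetic above can be sketched directly. The component sizes are the illustrative figures from the paragraph, not measurements, and the 128K window with a 4,096-token response reservation is an assumption:

```python
# Assumed model limits (vary by model and provider).
CONTEXT_WINDOW = 128_000
RESPONSE_RESERVATION = 4_096

# Illustrative component sizes from the example above.
budget = {
    "system_prompt": 500,    # instructions and persona
    "tools": 3_000,          # ~10 function definitions
    "history": 2_000,        # a few conversation turns
    "rag_context": 5 * 500,  # 5 retrieved chunks at ~500 tokens each
}

used = sum(budget.values())
input_budget = CONTEXT_WINDOW - RESPONSE_RESERVATION
available = input_budget - used
utilization = used / input_budget

print(f"{used} tokens used, {available} remaining "
      f"({utilization:.1%} of the input budget)")
```

Even this modest setup consumes 8,000 tokens up front; on a smaller 32K model, the same components would already eat more than a quarter of the input budget.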

Get Insanely Good at AI

You can see the budget. But do you know what happens when you fill it? Why the AI seems to forget things mid-conversation, or why more context sometimes makes output worse? The book explains the mechanics behind all of it.

Get the Book

Frequently asked questions

What is a context window in AI?
A context window is the maximum amount of text (measured in tokens) that a language model can process in a single request. It includes everything: your system prompt, tool definitions, conversation history, retrieved documents, and the model response. Exceeding it causes truncation or request failure.
How many tokens can GPT-5.4 and Claude process?
GPT-5.4 supports up to 1M tokens of context. Claude Opus 4.5 and Sonnet 4.5 support 200K tokens. Gemini 3.1 Pro also supports 1M tokens. However, a larger context does not always mean better results: performance can degrade when the context window is heavily filled.
What happens when you exceed the context window?
When input exceeds the context limit, the API either truncates your input silently or returns an error. Even before hitting the limit, model quality degrades as it struggles to attend to all the information. A practical rule is to stay below 80% utilization for best results.
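Rather than relying on silent truncation, applications usually trim history themselves before sending a request. A minimal sketch, assuming each turn already carries a token count (real chat apps typically pin the system prompt and trim only conversation turns; that refinement is omitted here):

```python
from collections import deque

def trim_history(turns, budget):
    """Drop the oldest turns until the history fits the token budget.

    `turns` is a list of (role, token_count) pairs, oldest first.
    """
    kept = deque(turns)
    while kept and sum(tokens for _, tokens in kept) > budget:
        kept.popleft()  # evict the oldest turn first
    return list(kept)
```

Evicting from the front keeps the most recent turns, which the model needs most to stay coherent; more sophisticated schemes summarize evicted turns instead of dropping them outright.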