AI Tokenizer
You write words. AI sees numbers. This is a real GPT tokenizer running the same algorithm as ChatGPT. Type anything and watch the transformation happen.
Type something above to see how AI actually processes your text. Try the examples to see surprising differences.
What is tokenization?
Before a language model can process your text, it breaks it into smaller pieces called tokens. A token can be a word, part of a word, a space, or even a single character. The model doesn't see your text the way you do. It sees a sequence of token IDs.
This matters because the way text gets tokenized affects everything: how much context the model can process, how much it costs to use, and even how well the model understands your input.
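GPT-style tokenizers do this splitting with byte-pair encoding (BPE): start from small pieces and repeatedly merge the pairs the tokenizer learned during training. The sketch below is a toy illustration with a made-up six-entry merge table, not the real GPT vocabulary (production tokenizers like OpenAI's tiktoken operate on bytes with on the order of 100k learned merges, so their splits differ):

```python
# Toy byte-pair-encoding (BPE) sketch -- the merge table here is
# hypothetical, NOT the real GPT one. Real tokenizers (e.g. tiktoken)
# apply ~100k+ learned merges over raw bytes.

def bpe_tokenize(word, merges):
    """Split a word into characters, then greedily apply learned merges."""
    tokens = list(word)
    while True:
        # Find the earliest-learned (lowest-rank) pair present in the word.
        best = None
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best]):
                best = pair
        if best is None:
            return tokens
        # Merge every occurrence of that pair into one token.
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged

# A tiny invented merge table; rank = order the merge was "learned".
merges = {("t", "o"): 0, ("to", "k"): 1, ("e", "n"): 2,
          ("i", "z"): 3, ("iz", "a"): 4, ("tok", "en"): 5}

print(bpe_tokenize("tokenization", merges))
# -> ['token', 'iza', 't', 'i', 'o', 'n']
```

Notice that "tokenization" does not survive as one piece: it breaks into subword chunks, which is exactly why longer or rarer words cost more tokens than short common ones.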
Why should you care?
- Cost control. API pricing is per-token. Knowing how many tokens your prompt uses helps you optimize spending.
- Context limits. Every model has a token limit. Understanding tokenization helps you stay within bounds without cutting important context.
- Better prompts. Some phrasing tokenizes more efficiently than others. Understanding this makes your prompts more effective.
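For quick budgeting you don't always need a real tokenizer: the common rules of thumb for English text (roughly 4 characters per token, roughly 0.75 words per token) give a serviceable estimate. A minimal sketch, using only those two heuristics:

```python
# Rough token estimate for English text from two rules of thumb:
# ~4 characters per token and ~0.75 words per token.
# Real counts vary: code, non-English text, and rare terms use more tokens.

def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    # Average the two heuristics for a slightly more stable guess.
    return round((by_chars + by_words) / 2)

prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt))  # -> 12
```

For exact counts, run your text through the actual tokenizer for your target model (e.g. a tokenizer tool like this one); the heuristic is only for ballpark planning.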
Get Insanely Good at AI
You can see the tokens now. But what does the model actually do with them? Why does it sometimes give brilliant answers and sometimes confidently make things up? The book gives you the mental models that change how you work with AI entirely.
Get the Book

Frequently asked questions
- What is a token in AI?
- A token is the smallest unit of text that a language model processes. It can be a whole word, part of a word, a space, or a punctuation mark. For English text, one token is roughly 4 characters or 0.75 words. The word "tokenization" itself splits into multiple tokens.
- How many tokens is a word?
- On average, one English word is about 1.3 tokens. Short common words like "the" or "is" are single tokens. Longer or less common words get split into multiple tokens. Code, non-English text, and technical terms tend to use more tokens per word.
- Why does tokenization matter for AI costs and context limits?
- AI APIs charge per token, and every model has a maximum context window measured in tokens. Understanding how your text tokenizes helps you estimate costs accurately and stay within context limits. Efficient phrasing can reduce token count by 10-20% without changing meaning.
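Since pricing is per token, turning a token count into a cost estimate is a one-line calculation. The price below is a placeholder, not a real quote: check your provider's current pricing page, and note that input and output tokens are usually billed at different rates:

```python
# Convert a token count into an estimated API cost.
# PRICE_PER_MILLION is a HYPOTHETICAL placeholder -- real prices differ
# per model and per provider, and input vs. output tokens cost differently.

PRICE_PER_MILLION = 2.50  # assumed USD per 1M input tokens (illustrative only)

def estimate_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Estimated cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * price_per_million

print(f"${estimate_cost(1500):.4f}")  # cost of a 1,500-token prompt
```

At these scales a single prompt costs fractions of a cent, but the same arithmetic shows why token counts dominate the bill once you process millions of requests.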