What Tokenization Means for Your Prompts
Tokenization isn't just a technical detail. It shapes how LLMs process your input. Understanding it changes the way you write prompts.
LLMs don’t read words. They read tokens. That distinction sounds pedantic until you realize it explains half the weird behavior you’ve seen: why some prompts work and others don’t, why “don’t” behaves differently than “do not,” and why your carefully crafted instructions sometimes get mangled. Tokenization is the hidden layer that shapes everything.
What Tokenization Actually Is
When you type a prompt, the model never sees your text as-is. It gets converted into tokens: chunks of characters from a vocabulary learned during training. A token might be a word, part of a word, a punctuation mark, or a few characters. There’s no fixed rule. The tokenizer was built from the training data, so common patterns get their own tokens; rare patterns get split up.
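The merge process behind most modern tokenizers is byte pair encoding (BPE): start from individual characters and repeatedly merge the most frequent adjacent pair into a new token. Here's a minimal sketch on a toy corpus; a real tokenizer learns from terabytes of text and has other machinery on top, but the core loop is the same idea.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent pairs across the corpus and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from individual characters, as BPE does.
corpus = list("the cat the hat the mat")
for _ in range(5):                      # five merge rounds
    pair = most_frequent_pair(corpus)
    if pair is None:
        break
    corpus = merge(corpus, pair)

print(corpus)
```

After a few rounds, the frequent sequence "the " has been merged into a single token while rarer characters like "c", "h", and "m" remain on their own. That's exactly why common patterns are cheap and rare patterns are expensive.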
For English, it’s roughly 4 characters per token on average. “Hello” might be one token. “Hello world” might be two or three. “Supercalifragilisticexpialidocious” gets chopped into several. The model literally cannot process input that hasn’t been tokenized first.
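The ~4 characters per token figure gives you a quick back-of-the-envelope estimator. This is a rough heuristic, not a real count; actual numbers vary by tokenizer and by content (code and rare words run denser).

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English prose.
    Real counts vary by tokenizer and content; use this only for ballparks."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))        # 1
print(estimate_tokens("Hello world"))  # 3
```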
Why “Don’t” vs “Do Not” Matters
Here’s a concrete example. Type “don’t” and “do not” into the same model. They mean the same thing to you. To the tokenizer, they’re different.
“Don’t” might tokenize as two tokens: “don” + “’t”. Or it might be one token if it appeared frequently in training. “Do not” is almost certainly two tokens: “do” + “ not” (the leading space is part of the second token). The model’s probability calculations happen at the token level. Different token sequences can lead to different outputs, even when the semantic meaning is identical.
This matters for prompts. If you’re getting inconsistent results with contractions, try spelling them out. If you’re asking the model to avoid something, “do not include X” might behave more predictably than “don’t include X.” It’s not magic. It’s mechanics.
Code vs. Prose: Why Prompts Behave Differently
Code gets tokenized differently than prose. Programming languages have dense, repetitive structures. function, return, const: these show up thousands of times in training data. They often get single tokens or very efficient tokenization. Natural language is messier. Rare words, domain jargon, and creative phrasing get split into more tokens.
That’s why code generation often feels more reliable than creative writing. The model has seen more code tokens in similar contexts. The token patterns are more predictable. When you’re prompting for code, you’re working with a distribution the model knows well. When you’re prompting for nuanced prose or unusual phrasing, you’re pushing into less-traveled token space.
Context Windows Are Token Budgets
“Context window” sounds abstract. Think of it as a token budget. Every prompt you send, every response you get, every message in the conversation. It all consumes tokens. Hit the limit and the model either truncates (losing information) or refuses (losing the conversation).
This has practical implications. Long prompts eat your budget. So do long system instructions. A 128K context window sounds huge until you realize a few hundred pages of documents, plus the conversation history, your prompt, and the response, can blow through it. Tokenization determines exactly how much fits. A document with lots of rare words or code might use more tokens than a document with common prose, even at the same character count.
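A quick budget check makes this concrete. The sketch below uses the ~4 chars/token heuristic from earlier; the page size and reserved output numbers are assumptions for illustration, and a real system should count with the actual tokenizer.

```python
def fits_in_context(doc_chars: int, prompt_chars: int,
                    reserved_output_tokens: int,
                    context_tokens: int = 128_000) -> bool:
    """Ballpark budget check using the ~4 chars/token heuristic.
    Real counts depend on the tokenizer and the content."""
    used = (doc_chars + prompt_chars) // 4 + reserved_output_tokens
    return used <= context_tokens

# Assumed numbers: ~3,000 characters per page, a 2,000-char prompt,
# and 4,000 tokens reserved for the response.
print(fits_in_context(50 * 3_000, 2_000, 4_000))   # True: fits comfortably
print(fits_in_context(400 * 3_000, 2_000, 4_000))  # False: over budget
```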
When you’re designing prompts, you’re allocating a budget. Put the important stuff first. Trim the fluff. Know that “be concise” in your instructions costs tokens too, and the model might not follow it anyway if it’s buried in a long prompt.
What This Means for Your Prompts
Understanding tokenization changes how you write:
Be consistent with phrasing. If something works, the token pattern is working. Changing “don’t” to “do not” might change behavior. Document what works.
Put critical instructions early. Models tend to weight the beginning of the context more. Don’t bury your key constraints at the end of a long prompt.
Match the task to the token distribution. Code, structured data, and common patterns get better tokenization. Unusual requests need more explicit scaffolding.
Watch your budget. Long context isn’t free. Every token counts. If you’re hitting limits or getting truncated output, the fix might be shortening the prompt, not upgrading the model.
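When you do hit the limit, the cheapest fix is often trimming input rather than switching models. A minimal sketch, combining two of the points above (budget awareness, and keeping the important early context):

```python
def trim_to_budget(text: str, max_tokens: int) -> str:
    """Trim text to an approximate token budget (~4 chars/token heuristic),
    keeping the beginning, since models tend to weight early context more.
    A rough sketch, not a substitute for counting with the real tokenizer."""
    max_chars = max_tokens * 4
    return text if len(text) <= max_chars else text[:max_chars]

doc = "word " * 10_000                 # ~50,000 characters of filler
trimmed = trim_to_budget(doc, 2_000)   # keep roughly 2,000 tokens' worth
print(len(trimmed))                    # 8000
```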
Tokenization isn’t something you need to obsess over for every prompt. But when something behaves oddly (inconsistent output, weird truncation, instructions being ignored), it’s often worth asking: what’s the token story here? The answer usually explains a lot.
Try it yourself: AI Tokenizer — see how AI models break your text into tokens using the same BPE algorithm as ChatGPT.