What Tokenization Means for Your Prompts
Tokenization isn't just a technical detail. It shapes how LLMs process your input. Understanding it changes the way you write prompts.
LLMs don’t read words. They read tokens. That distinction sounds pedantic until you realize it explains half the weird behavior you’ve seen: why some prompts work and others don’t, why “don’t” behaves differently than “do not,” and why your carefully crafted instructions sometimes get mangled. Tokenization is the hidden layer that shapes everything.
What Tokenization Actually Is
When you type a prompt, the model never sees your text as-is. It gets converted into tokens, chunks of characters that the model was trained on. A token might be a word, part of a word, a punctuation mark, or several characters. There’s no fixed rule. The tokenizer was built from the training data, so common patterns get their own tokens; rare patterns get split up.
For English, it’s roughly 4 characters per token on average. “Hello” might be one token. “Hello world” might be two or three. “Supercalifragilisticexpialidocious” gets chopped into several. The model literally cannot process input that hasn’t been tokenized first.
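The 4-characters-per-token rule of thumb is easy to turn into a planning tool. This is a heuristic only (real counts come from the model's actual tokenizer, and ratios differ across languages and models), but it's good enough for back-of-the-envelope estimates:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic
    for English prose. Real tokenizers vary by model; treat this as a
    planning estimate, not an exact count."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello"))        # ~1 token
print(estimate_tokens("Hello world"))  # ~3 tokens
```

For precise counts you'd run the model's own tokenizer, but for budgeting a prompt this estimate is usually close enough.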
Why “Don’t” vs “Do Not” Matters
Here’s a concrete example. Type “don’t” and “do not” into the same model. They mean the same thing to you. To the tokenizer, they’re different.
“Don’t” might tokenize as two tokens: “don” + “’t”. Or it might be one token if it appeared frequently in training. “Do not” is almost certainly two tokens: “do” + “ not”. The model’s probability calculations happen at the token level. Different token sequences can lead to different outputs, even when the semantic meaning is identical.
This matters for prompts. If you’re getting inconsistent results with contractions, try spelling them out. If you’re asking the model to avoid something, “do not include X” might behave more predictably than “don’t include X.” It’s not magic. It’s mechanics.
Code vs. Prose: Why Prompts Behave Differently
Code gets tokenized differently than prose. Programming languages have dense, repetitive structures. function, return, const: these show up thousands of times in training data. They often get single tokens or very efficient tokenization. Natural language is messier. Rare words, domain jargon, and creative phrasing get split into more tokens.
That’s why code generation often feels more reliable than creative writing. The model has seen more code tokens in similar contexts. The token patterns are more predictable. When you’re prompting for code, you’re working with a distribution the model knows well. When you’re prompting for nuanced prose or unusual phrasing, you’re pushing into less-traveled token space.
Context Windows Are Token Budgets
“Context window” sounds abstract. Think of it as a token budget. Every prompt you send, every response you get, every message in the conversation. It all consumes tokens. Hit the limit and the model either truncates (losing information) or refuses (losing the conversation).
This has practical implications. Long prompts eat your budget. So do long system instructions. If you’re working with a 128K context window, that sounds huge, until you realize a 50-page document plus your prompt plus the response can blow through it. Tokenization determines exactly how much fits. A document with lots of rare words or code might use more tokens than a document with common prose, even at the same character count.
When you’re designing prompts, you’re allocating a budget. Put the important stuff first. Trim the fluff. Know that “be concise” in your instructions costs tokens too, and the model might not follow it anyway if it’s buried in a long prompt.
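The budget arithmetic is simple enough to sketch. The window size, the 4-characters-per-token ratio, and the page length below are all assumptions for illustration; swap in your model's real limits and its actual tokenizer for anything load-bearing:

```python
# Back-of-the-envelope context-budget check. The window size and the
# chars-per-token ratio are assumptions; use your model's real numbers.
CONTEXT_WINDOW = 128_000  # tokens (hypothetical model limit)

def fits_in_window(system: str, document: str, prompt: str,
                   response_reserve: int = 4_000,
                   chars_per_token: int = 4) -> bool:
    """True if system prompt + document + user prompt, plus room
    reserved for the response, fit inside the context window."""
    used = sum(len(s) // chars_per_token
               for s in (system, document, prompt))
    return used + response_reserve <= CONTEXT_WINDOW

# A "50-page" document at an assumed ~3,000 characters per page:
doc = "x" * (50 * 3_000)   # ~37,500 tokens by this estimate
print(fits_in_window("Be concise.", doc, "Summarize the document."))
```

Run the same check with a few documents that size plus a long conversation history and the 128K window stops looking huge.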
What This Means for Your Prompts
Understanding tokenization changes how you write:
Be consistent with phrasing. If something works, the token pattern is working. Changing “don’t” to “do not” might change behavior. Document what works.
Put critical instructions early. Models tend to weight the beginning (and the end) of the context more heavily than the middle. Don’t bury your key constraints in the middle of a long prompt.
Match the task to the token distribution. Code, structured data, and common patterns get better tokenization. Unusual requests need more explicit scaffolding.
Watch your budget. Long context isn’t free. Every token counts. If you’re hitting limits or getting truncated output, the fix might be shortening the prompt, not upgrading the model.
Tokenization isn’t something you need to obsess over for every prompt. But when something behaves oddly (inconsistent output, weird truncation, instructions being ignored), it’s often worth asking: what’s the token story here? The answer usually explains a lot.