Why AI Hallucinates and How to Reduce It
AI hallucination isn't a bug you can patch. It's a consequence of how language models work. Here's what causes it, how to measure it, and what actually reduces it.
AI hallucination is when a model generates something that sounds confident, reads fluently, and is completely wrong. It cites a paper that doesn’t exist. It invents a function that’s not in any library. It claims a historical event happened in the wrong year, with the wrong people, in the wrong city.
This isn’t a glitch. It’s a fundamental consequence of how language models generate text. Understanding why it happens is the first step to reducing it.
Why Models Hallucinate
Language models predict the next token. That’s the entire mechanism. Given a sequence of tokens, the model produces a probability distribution over the vocabulary and picks the next token. It doesn’t “know” facts. It learned statistical patterns during training, and it uses those patterns to generate text that looks like it belongs.
When you ask “Who wrote To Kill a Mockingbird?”, the model doesn’t look up the answer. It generates the token “Harper” because, in its training data, the phrase “Who wrote To Kill a Mockingbird” was overwhelmingly followed by “Harper Lee.” The model is right, but not because it knows. It’s right because the statistical pattern is strong.
When the pattern is weak, the model still generates something. Ask about a niche topic with little training data, and the model fills in gaps with plausible-sounding text. It doesn’t know it’s making things up. It doesn’t have a concept of “making things up.” It generates the most probable next token, always.
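The mechanism above can be sketched in a few lines. This is a toy illustration, not a real model: the candidate tokens and probabilities are invented, and greedy decoding stands in for the full generation loop. The point is that "pick the most probable token" has no branch for "I don't know."

```python
# Toy next-token step. The scores are invented for illustration; a real model
# produces a distribution over its whole vocabulary at every step.
next_token_probs = {
    "Lee": 0.92,      # strong pattern after "Who wrote To Kill a Mockingbird? Harper"
    "Collins": 0.04,  # weaker alternatives still get probability mass
    "Smith": 0.03,
    "Grant": 0.01,
}

def next_token(probs):
    # Greedy decoding: always emit the single most probable token.
    # Whether the pattern behind it is strong or weak, something gets emitted.
    return max(probs, key=probs.get)

print(next_token(next_token_probs))  # → Lee
```

Shrink the gap between "Lee" and the alternatives (a niche topic, thin training data) and the same code still confidently returns one of them.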
Temperature and Randomness
During generation, the model assigns a probability to every possible next token. Temperature controls how the model samples from this distribution. At temperature 0, it always picks the highest-probability token. This is the most deterministic (and often the most boring). At higher temperatures (0.7, 1.0), the model samples from a wider range of tokens, including less probable ones.
Higher temperature increases creativity and diversity. It also increases hallucination, because less probable tokens are, by definition, less grounded in the training data’s patterns. For factual tasks, lower temperature is safer. For creative writing, higher temperature produces more interesting text.
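A minimal sketch of how temperature reshapes the sampling step, assuming toy logits rather than real model outputs. Dividing the raw scores by the temperature before the softmax is the standard trick: low temperature sharpens the distribution toward the top token, high temperature flattens it toward the tail.

```python
import math
import random

def sample_token(logits, temperature):
    # logits: dict of token -> raw score (toy values, not from a real model).
    if temperature == 0:
        # Temperature 0 degenerates to greedy decoding.
        return max(logits, key=logits.get)
    # Rescale by temperature, then apply a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    # Draw one token in proportion to its softmax weight.
    r = random.uniform(0, sum(weights.values()))
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # numerical edge case: fall back to the last token
```

At temperature 0 this always returns the top-scoring token; at 1.0 and above, the tail tokens (the less grounded ones) start winning a meaningful share of draws.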
Confidence Without Calibration
One particularly dangerous aspect of hallucination: models are not well-calibrated. A well-calibrated system would be uncertain when it’s likely to be wrong. Language models generate hallucinated content with the same fluency, confidence, and grammatical correctness as factual content. There’s no stutter, no hedge, no visible uncertainty in the output. This makes hallucinated text hard to detect without external verification.
Types of Hallucination
Factual fabrication. The model invents facts: fake citations, nonexistent API endpoints, wrong dates, fictional people. This is the most commonly recognized type.
Logical inconsistency. The model contradicts itself within the same response. Paragraph 2 says X, paragraph 5 says not-X. This is especially common in long outputs where the model’s generation drifts.
Task misalignment. The model generates text that’s fluent and factually accurate but doesn’t answer the actual question. You asked for Python, it gives you JavaScript. You asked about version 3 of a library, it describes version 2. The facts might be correct. The response is still wrong.
Source fabrication. The model generates plausible-looking citations (real author names, real journal names) with completely fabricated details (wrong titles, wrong years, nonexistent DOIs). This is particularly insidious because the citations look legitimate and require manual verification to catch.
What Actually Reduces Hallucination
Grounding with RAG
Give the model real data and instruct it to answer only from that data. This is retrieval-augmented generation (RAG). Instead of asking the model to answer from its training data (where hallucination thrives), you provide the specific documents it should reference.
The model can still hallucinate with RAG. It might misinterpret the provided context, blend it with training knowledge, or generate claims that go beyond what the context supports. But the hallucination rate drops dramatically when the model has real, relevant context to work with instead of relying on learned patterns.
The key instruction: “Answer using ONLY the provided context. If the context doesn’t contain the answer, say you don’t have that information.” This gives the model permission to say “I don’t know,” which, without explicit instruction, it almost never does.
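Assembling that kind of grounded prompt is mechanical enough to sketch. The exact wording below is an assumption (tune it for your model); the structure (instruction, numbered context chunks, then the question) is the part that matters.

```python
def build_grounded_prompt(question, context_chunks):
    # Assemble a RAG-style prompt that restricts the model to retrieved context.
    # Numbering the chunks makes it easy to ask the model to cite them.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1)
    )
    return (
        "Answer using ONLY the provided context. If the context doesn't "
        "contain the answer, say you don't have that information.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The retrieval step that produces `context_chunks` is a separate system (vector search, keyword search, whatever fits); this function only handles the grounding instruction.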
Structured Output
Asking a model to return structured output (JSON, XML, specific formats) constrains its generation. The model can’t ramble into hallucinated territory as easily when it has to fill specific fields. A structured response with fields for “source,” “confidence,” and “answer” forces the model to attribute its claims, and fields left empty are a signal that the model doesn’t have strong grounding.
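A sketch of the consuming side, assuming a hypothetical schema with "answer," "source," and "confidence" fields (the field names and the 0.5 threshold are illustrative choices, not a standard):

```python
import json

def weak_grounding_signals(raw_response):
    # Inspect a structured model response for signs of weak grounding.
    # Field names and the confidence threshold are assumptions; adapt to your schema.
    data = json.loads(raw_response)
    signals = []
    if not data.get("source"):
        signals.append("no source attributed")
    if data.get("confidence", 0.0) < 0.5:
        signals.append("low self-reported confidence")
    if not data.get("answer"):
        signals.append("empty answer")
    return signals
```

An empty list means the response at least attributed itself; anything else is a cue to route the answer to verification rather than straight to the user.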
Self-Consistency Checking
Generate the same response multiple times (with some temperature variation) and compare. If the model gives you the same answer 5 out of 5 times, it’s more likely correct. If it gives you 5 different answers, at least 4 of them are wrong, and possibly all 5.
This is expensive (5x the cost for one question), but effective for high-stakes decisions. It exploits a useful property: correct answers tend to be consistent across samples, while hallucinations vary because they’re drawn from the tail of the probability distribution.
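The voting step reduces to a majority count over the samples. The normalization here (strip and lowercase) is deliberately naive; free-text answers usually need fuzzier matching, and the 0.6 agreement threshold is an assumed tuning knob.

```python
from collections import Counter

def self_consistency(answers, threshold=0.6):
    # Majority vote over repeated samples of the same question.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(answers)
    # Accept only if the majority answer clears the agreement threshold.
    return answer, agreement, agreement >= threshold
```

Five matching samples give agreement 1.0; five different answers give 0.2, which is exactly the "treat this as unreliable" case the technique is built to catch.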
Chain-of-Verification
Ask the model to generate a response, then ask it to extract the factual claims from that response, then ask it to verify each claim independently. This multi-step process catches many hallucinations because the model evaluates individual claims in isolation, without the conversational momentum that produced the original hallucination.
A claim that seemed natural in the flow of a paragraph might look obviously wrong when evaluated on its own. “The paper was published in Nature in 2019” is easy to assert in a paragraph, harder to defend when you ask the model: “Is there a paper with this title in Nature’s 2019 archives?”
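The three-step loop can be wired up generically. `call_model` here is a hypothetical text-in/text-out interface (swap in your own client); the prompt wordings are assumptions, and a production version would parse the verdicts rather than trust free text.

```python
def chain_of_verification(question, call_model):
    # call_model: any callable taking a prompt string and returning model text.
    # Step 1: draft an answer.
    draft = call_model(f"Answer concisely: {question}")
    # Step 2: extract the factual claims from the draft.
    claims_text = call_model(
        f"List every factual claim in this answer, one per line:\n{draft}"
    )
    claims = [c.strip() for c in claims_text.splitlines() if c.strip()]
    # Step 3: verify each claim in isolation, away from the draft's momentum.
    verdicts = {c: call_model(f"True or false, with no other text: {c}") for c in claims}
    return draft, verdicts
```

Because each verification prompt contains only one claim, the model judges it without the conversational context that made it sound plausible in the first place.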
Domain-Specific Prompting
Tell the model what domain it’s operating in and what expertise to apply. “You are a senior backend engineer reviewing Python code for security vulnerabilities” produces more grounded analysis than “Review this code.” Domain framing activates more relevant patterns from training and reduces drift into generic, surface-level responses.
Lower Temperature for Factual Tasks
This is simple and effective. For summarization, data extraction, fact-based Q&A, and code generation, use temperature 0 or near-0. Save higher temperatures for brainstorming, creative writing, and exploration.
Measuring Hallucination
You can’t reduce what you can’t measure. For any system where hallucination matters:
Build a test set. 50-100 questions with known correct answers. Run your system against this set after every change. Measure what percentage of responses are factually correct, partially correct, or hallucinated.
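A minimal harness for that regression loop, assuming exact-substring matching against a known answer (fine for short factual answers; open-ended ones need the human evaluation described next):

```python
def hallucination_eval(test_set, answer_fn):
    # test_set: (question, expected_answer) pairs with known-correct answers.
    # answer_fn: the system under test, as a plain callable so any backend fits.
    correct = sum(
        1 for question, expected in test_set
        if expected.lower() in answer_fn(question).lower()
    )
    return correct / len(test_set)
```

Run it after every prompt change, model upgrade, or retrieval tweak, and track the score over time; a silent regression is exactly what this catches.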
Human evaluation. For open-ended responses where there’s no single “right” answer, have humans rate responses on a scale (factual, plausible, fabricated). This is slow and expensive but necessary for production systems.
Citation verification. If your system generates citations or references, verify them automatically. Do the URLs resolve? Do the titles match? Does the content at the cited source actually support the claim? Many hallucinated citations fail these basic checks.
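The cheapest layer of those checks is purely format-level and needs no network access. This sketch only catches the sloppiest fabrications; real verification should also fetch the DOI or URL and compare titles against the claim.

```python
import re

def citation_red_flags(citation):
    # Cheap, offline checks on a citation string. These flag missing pieces;
    # they cannot confirm that a well-formed citation actually exists.
    flags = []
    if not re.search(r"\b10\.\d{4,9}/\S+", citation):  # DOI pattern
        flags.append("no DOI")
    if not re.search(r"\b(19|20)\d{2}\b", citation):   # plausible year
        flags.append("no publication year")
    return flags
```

An empty result just means the citation passed the format gate; resolving the DOI and matching the title is the step that catches the insidious cases (real authors, real journal, fabricated details).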
The Honest Truth
Hallucination cannot be eliminated. It’s not a bug to fix. It’s a property of probabilistic text generation. Every mitigation technique reduces the frequency of hallucination; none eliminates it. The question isn’t “How do I make my model never hallucinate?” It’s “How do I build a system where hallucinations get caught before they cause harm?”
That means verification layers, confidence thresholds, human review for high-stakes decisions, and designing your system so that a wrong answer is caught, not blindly trusted.
Chapter 3 of Get Insanely Good at AI covers the mechanics of model output, including why hallucination happens at the architectural level and practical frameworks for building systems that produce reliable, verifiable responses.