Hallucination

TL;DR

When an LLM generates plausible-sounding but false, misleading, or fabricated information with high confidence.

An engineer asks ChatGPT for documentation about an API function. ChatGPT generates a detailed, well-formatted response. It looks authoritative. The syntax seems right. The parameter names make sense. The engineer implements it. The code fails. The function never existed. ChatGPT invented it. This is hallucination.

Here's the uncomfortable truth: LLMs don't understand truth the way humans do. They understand patterns. They're predicting the statistically likely next token based on billions of examples from the internet. The internet contains truth, lies, rumors, misinformation, and plausible falsehoods side by side, with nothing labeling which is which. The model has no built-in mechanism to distinguish between them. A confident, well-structured lie looks identical to a confident, well-structured truth from the model's perspective. Both are just patterns in high-dimensional space.

Hallucinations are rampant. Reported rates vary widely by study, benchmark, and model, but evaluations of models like GPT-3 have found fabricated content in a meaningful fraction of factual queries. Domain-specific questions have higher hallucination rates. Questions about recent events have higher rates. Anything outside the model's training data is a hallucination risk. But here's what makes hallucinations insidious: they're often plausible. The model doesn't say "I don't know." It confidently asserts false information using correct grammar and apparent expertise. This is arguably the most dangerous behavior in LLMs because people trust confident assertions.

Temperature amplifies hallucinations. When you increase temperature (making the model more "creative"), it hallucinates more. Decrease it, and hallucinations drop but creativity suffers. You can't eliminate hallucinations entirely without crippling the model's usefulness. Few-shot examples help but inconsistently. Chain-of-thought reasoning can reduce hallucinations in some scenarios but not others.
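The temperature effect can be seen directly in how sampling works. The sketch below is a minimal, self-contained illustration (the logit values are invented for the example): temperature rescales logits before the softmax, so higher temperatures shift probability mass toward low-likelihood tail tokens, which is where implausible continuations tend to come from.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.

    Higher temperature flattens the distribution, giving unlikely tokens
    more chance of being sampled; lower temperature sharpens it toward
    the single most likely token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate next tokens
logits = [4.0, 2.0, 1.0]

low_t = softmax_with_temperature(logits, temperature=0.5)
high_t = softmax_with_temperature(logits, temperature=2.0)

# At low temperature the top token dominates; at high temperature the
# tail tokens gain probability mass.
print(low_t[0] > high_t[0])   # top-token probability shrinks as T rises
print(high_t[2] > low_t[2])   # tail-token probability grows as T rises
```

This is why lowering temperature reduces hallucination but also reduces variety: both effects are the same reshaping of one distribution.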

Different hallucination types exist. Factual hallucinations invent data outright. Logical hallucinations present reasoning whose conclusions don't follow. Intrinsic hallucinations contradict the source material or the model's own previous statements. Extrinsic hallucinations assert plausible claims that can't be verified from any provided source. A model might hallucinate the title of a book that doesn't exist, or get the date of an event slightly wrong, or misattribute a quote it half-remembers. The spectrum is wide.

The constraint that makes hallucinations inevitable is that LLMs are trained on unsupervised text. Nobody manually labeled "this sentence is true" or "this sentence is false." The model learned to mimic the style and structure of confident assertions without distinguishing truth from fiction. To truly address hallucinations would require either explicit truth supervision (labeling massive datasets by hand), staying very conservative and refusing to answer uncertain questions (limiting usefulness), or using external knowledge (RAG).

Industries dependent on accuracy have learned hard lessons about hallucination. Law firms tried using LLMs to research case law. The models cited cases that didn't exist, complete with perfect judicial formatting. Investment advisors discovered their LLM assistant had invented analyst reports and financial metrics. Insurance companies found hallucinated policy details that customers believed. Each discovered the problem the same way: someone checked the sources and found fabrication.

RAG significantly reduces hallucinations by grounding responses in external facts. Retrieved documents give the model verifiable material to draw from and make fabrication far easier to detect, though the model can still stray from the provided context. This is the main reason RAG became essential infrastructure instead of a nice-to-have. It's the practical mitigation for a problem that LLM architecture can't solve on its own.
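The grounding idea can be sketched in a few lines. This toy example uses an in-memory document list and naive keyword-overlap retrieval (a real system would use embeddings and a vector index); the documents and function names are invented for illustration:

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query, documents):
    """Constrain the model to retrieved context instead of free recall."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The parse_config function accepts a path and returns a dict.",
    "Quarterly revenue figures are published in the earnings transcript.",
]
prompt = build_grounded_prompt("What does parse_config return?", docs)
print("parse_config" in prompt)  # the relevant document was retrieved
```

The key design choice is the instruction to refuse when the context lacks an answer: it converts "the model doesn't know" from a hallucination trigger into an explicit, checkable response.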

Hallucination also varies by task. Multiple-choice questions show lower hallucination rates. Open-ended generation shows much higher rates. Questions requiring reasoning through uncertainty trigger more hallucinations. Mathematics problems show high error rates because the model learned surface patterns of similar problems rather than the underlying logic.

Why It Matters

Hallucination is the core limitation preventing LLMs from being reliable sources of truth. For enterprise applications, hallucinations mean risk. A hallucinating customer support bot gives wrong information. A hallucinating AI agent makes wrong decisions. A hallucinating analysis system produces faulty insights. Understanding and mitigating hallucinations is essential for building trustworthy AI. This is why factual systems require RAG, human oversight, and verification mechanisms rather than relying on base model knowledge.
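One of the verification mechanisms mentioned above can be sketched simply: check every citation the model produces against a trusted index before surfacing the answer. The case names and set contents below are invented for illustration:

```python
# Hypothetical index of citations we have independently verified.
TRUSTED_CITATIONS = {
    "Smith v. Jones (1998)",
    "Doe v. Acme Corp (2005)",
}

def unverified_citations(answer_citations, trusted=TRUSTED_CITATIONS):
    """Return any citations the model produced that we cannot verify."""
    return [c for c in answer_citations if c not in trusted]

# One real citation, one fabricated one.
model_output = ["Smith v. Jones (1998)", "Rivera v. Delta Co (2011)"]
flagged = unverified_citations(model_output)
print(flagged)  # → ['Rivera v. Delta Co (2011)']
```

In production the flagged list would route the answer to human review rather than to the user, which is exactly the check the law-firm incidents above were missing.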

Example

A research team building a competitive intelligence system based on LLMs discovers it's citing articles that don't exist and attributing quotes to executives who never said them. Executives seeing false competitive claims make business decisions on fabricated data. The system hallucinates plausible-sounding research because no external knowledge base grounds it. Switching to RAG that pulls from verified sources (actual published articles, earnings call transcripts, filed documents) largely eliminates the fabrications, because every claim can now be traced back to a source document.

Related Terms

Both Vity and Synap address hallucination through context grounding, ensuring AI assistants stay accurate and factual within your personal or enterprise knowledge systems.