An LLM generates a probability for every possible next token. "The capital of France is ___" might produce: Paris (99%), Pari (0.5%), banana (0.001%). Normally the model samples the next token from this distribution. Paris is almost certain. Banana is almost impossible but technically possible.
Temperature controls how sharply you respect those probabilities. Low temperature (0.1) sharpens the distribution: high-probability tokens become even more dominant. High temperature (above 1.0) flattens it: low-probability tokens become more likely.
Mathematically, temperature divides the model's raw scores (logits) before they are converted into probabilities via softmax. Temperature 0.1 makes the gap between 99% and 0.5% more extreme (the 99% token becomes nearly certain, the 0.5% token nearly impossible). Temperature 2.0 compresses the gap (the 99% token might drop toward 50%, the 0.5% token might rise toward 20%). Higher temperature increases randomness and variety. Lower temperature increases predictability and consistency.
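The mechanism can be sketched in a few lines. Real models apply this over a vocabulary of tens of thousands of tokens; the three tokens and logit values below are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then softmax into probabilities."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate tokens: "Paris", "Pari", "banana"
logits = [10.0, 4.7, -1.5]

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

At temperature 0.1 the top token absorbs essentially all the probability mass; at 2.0 the distribution visibly flattens, though the ordering of tokens never changes — temperature rescales probabilities, it never reorders them.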
Temperature 0 is deterministic. Given identical input, you get identical output (usually; some implementations differ in tie-breaking). This is ideal for anything requiring consistency: customer support responses, data extraction, code generation. You don't want randomness in a contract extraction system.
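Temperature 0 is typically implemented not by dividing by zero but by skipping sampling entirely and taking the argmax, i.e. greedy decoding. A minimal sketch, using the same toy tokens as above:

```python
def greedy_pick(token_probs):
    """Temperature 0: deterministic — always return the highest-probability token."""
    return max(token_probs, key=token_probs.get)

probs = {"Paris": 0.99, "Pari": 0.005, "banana": 0.00001}
print(greedy_pick(probs))  # "Paris", run after run
```

No randomness enters the picture, which is exactly why the same input yields the same output.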
Temperature 0.5-0.7 is moderate. Slightly creative but mostly consistent. Good for writing, summarization, coding assistance. You want variety but not chaos.
Temperature 1.0 is neutral. The model's native probability distribution. This is the baseline, though many applications tune it.
Temperature above 1.0 is creative. The model is less predictable, sometimes generates surprising or off-topic content. Useful for creative writing, brainstorming, generating diverse options. But also risky. High temperature makes hallucinations more likely.
The misconception is that temperature controls "creativity" in a human sense. Higher temperature doesn't make the model more intelligent or insightful. It makes the model more random. Sometimes randomness is valuable (generating multiple diverse solutions to a problem). Often randomness is harmful (hallucinating false facts).
Top-p (nucleus sampling) is related but different. Instead of rescaling the distribution, you set a probability threshold p and sample only from the smallest set of tokens whose cumulative probability reaches p (top-p 0.9 keeps the tokens covering 90% of the probability mass and discards the rest). This sometimes works better than temperature for controlling randomness while preserving coherence.
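Nucleus sampling can be sketched as follows. The token probabilities are the toy values from earlier, not real model output:

```python
import random

def top_p_sample(token_probs, p=0.9, rng=random):
    """Nucleus sampling: keep the smallest set of top tokens whose cumulative
    probability reaches p, renormalize, and sample from that set only."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    tokens = [token for token, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"Paris": 0.99, "Pari": 0.005, "banana": 0.00001}
# With p=0.9 the nucleus is just {"Paris"}: "banana" can never be sampled.
print(top_p_sample(probs, p=0.9))
```

This is the appeal of top-p over temperature: it hard-cuts the long tail of implausible tokens instead of merely making them less likely.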
Different tasks have different optimal temperatures. Factual tasks (Q&A, code generation, data extraction): 0-0.3. Creative tasks (writing, brainstorming): 0.7-1.0. Default in APIs: often 1.0 or 0.7.
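These rough ranges can be captured as a starting-point lookup. The task names and values below are illustrative defaults drawn from the ranges above, not a standard of any API:

```python
# Hypothetical per-task starting temperatures, following the rough ranges above.
TASK_TEMPERATURES = {
    "qa": 0.0,
    "data_extraction": 0.0,
    "code_generation": 0.2,
    "summarization": 0.5,
    "creative_writing": 0.9,
    "brainstorming": 1.0,
}

def temperature_for(task, default=0.7):
    """Look up a starting temperature for a task; fall back to a moderate default."""
    return TASK_TEMPERATURES.get(task, default)
```

Treat these as starting points to tune against your own outputs, not fixed rules.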
The relationship between temperature and other parameters matters. High temperature plus max_tokens=10,000 is chaotic and can drift into garbage over a long generation. Low temperature plus max_tokens=10 is safe and produces concise responses. Interactive applications might use low temperature for faster, more predictable output. Offline tasks might use higher temperature to generate diverse options.
Confusion arises because temperature isn't always available as a parameter. Consumer APIs sometimes hide it. Different APIs use different default temperatures, so the same prompt produces different results on different platforms.
The underlying mechanism reveals a truth about LLMs: they're fundamentally probabilistic. Even asking "what's 2+2?" produces probabilities across all tokens. The model might assign 95% probability to "4" and 5% probability to something else. Temperature controls whether you accept that probabilistic nature or coerce the model toward single answers.
Common mistake: using high temperature when you want determinism, then being surprised by inconsistent output. Or using low temperature when you want exploration, then getting boring, repetitive output. Understanding temperature lets you tune model behavior to fit your needs.
Why It Matters
Temperature is a critical parameter for controlling model behavior. Wrong temperature settings produce bad results: inconsistent support responses, hallucinated code, boring creative writing. For production systems, temperature tuning is essential. For interactive use, understanding temperature helps you get better results. Most users never tune temperature because it's hidden in defaults, but experts use it to dramatically improve outputs for specific tasks.
Example
A company builds a Q&A chatbot for product documentation. Using default temperature (1.0): the bot sometimes provides correct answers, sometimes invents plausible-sounding but false information, sometimes misses details. Switching to temperature 0.2: the bot becomes consistent, accurate, reliable. Now building a brainstorming assistant: temperature 0.2 makes it boring, always suggesting the same ideas. Switching to temperature 0.9: the assistant generates diverse, sometimes unconventional ideas. Same model, completely different behavior, tuned by temperature.