Why AI "Hallucinates": Understanding Style, Tokens, and Vector Drift
AI can "go off track" not just because of complex questions — but because of your style of communication. Words with similar meanings may sit on opposite ends of the vector space — and the model won't always know you meant the same thing. A sudden shift in tone — say, from joke to "stern professor" — can blur the meaning gradient. And if you ask the AI to be both Schopenhauer and a stand-up comedian? The attention blocks will overheat like a microwave with foil.
All of this distorts the model's internal vector structure. Instead of a helpful answer, you may get a confident but nonsensical improvisation. That's why style matters — not as politeness, but as token precision.
Why AI "Lies": Let's Start with an Example
Prompt:
Name three books written by Aristotle about poetry.
GPT Response:
- Poetics
- Treatise on the Art of Poetry
- Rhetoric: Book III
At first glance, this seems like a logical, academic list.
But in reality, it's mostly made up. Only Poetics is an actual Aristotelian work on poetry; Rhetoric is real but isn't a book about poetry, and "Treatise on the Art of Poetry" doesn't exist at all. The list was hallucinated because:
- GPT predicts — it doesn't fact-check.
- It has no built-in epistemology (no concept of true vs false).
- It fills gaps when unsure how to respond.
- Most importantly — it doesn't know what a lie is.
GPT doesn't choose between "truth" and "falsehood". It generates the most statistically plausible sequence of tokens based on its training.
Even outright falsehoods can sound confident — because confidence is just a style, not a marker of knowledge.
How GPT "Understands" Words: From Predictions to Cosine Similarity
When we say GPT "predicts the next token," it doesn't mean it "guesses the word."
Instead, GPT operates in a vector space where each token is represented by a high-dimensional vector. During training, these vectors are adjusted so that tokens appearing in similar contexts end up pointing in similar directions.
What is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two vectors rather than the distance between them: cos(a, b) = (a · b) / (‖a‖ · ‖b‖).
Values:
1 = exact same direction (similar meaning)
0 = perpendicular (unrelated)
-1 = opposite meanings
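As a quick illustration, here is a minimal cosine similarity computation in Python. The vectors are made-up toy values, not real GPT embeddings; only the formula itself is the point.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: (a . b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (real token vectors have hundreds or
# thousands of dimensions, but the math is identical).
lego = np.array([0.9, 0.8, 0.1])
kits = np.array([0.8, 0.9, 0.2])    # points in a similar direction
tax  = np.array([-0.7, 0.1, 0.9])   # points somewhere else entirely

print(cosine_similarity(lego, kits))  # close to 1
print(cosine_similarity(lego, tax))   # much lower, here even negative
```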
Example: High Cosine Similarity
Compare:
- "I love building model kits."
- "I enjoy building Lego."
These have nearly identical meaning.
Their tokens — build, models, Lego — often co-occur in training data, leading to high cosine similarity.
Example: Similar Meaning, Low Cosine Similarity
Compare:
- "I don't want to be here."
- "Sometimes I imagine disappearing."
Deep emotional similarity — detachment, fatigue, escape.
But surface-wise, they share almost no vocabulary, let alone synonyms.
This results in lower cosine similarity because:
- They come from different lexical fields.
- No literal overlap, despite similar intent.
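If you want to probe this yourself, a sentence-embedding model can score such pairs. The sketch below assumes the sentence-transformers package and the all-MiniLM-L6-v2 model are available; the exact numbers will differ between models, and a modern embedding model may in fact catch part of the emotional overlap in the second pair.

```python
# A sketch, assuming `pip install sentence-transformers` has been run.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

pairs = [
    ("I love building model kits.", "I enjoy building Lego."),         # lexical + semantic overlap
    ("I don't want to be here.", "Sometimes I imagine disappearing."),  # semantic overlap only
]

for a, b in pairs:
    emb_a, emb_b = model.encode([a, b])
    score = util.cos_sim(emb_a, emb_b).item()
    print(f"{score:.2f}  {a!r} vs {b!r}")
```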
How GPT "Chooses" a Word
At each step in generation:
- GPT computes a vector for the current context.
- It compares that vector to every token vector (the cosine-similarity idea from above).
- The comparison scores are turned into probabilities.
- It samples one token from that distribution.
It doesn't "understand" that the next word is "poetry."
It "sees" that the vector for poetry fits well — so it inserts it.
Why Similar Phrases Might Not Match
- Metaphors: "Leap into the abyss" ≠ "Give up" lexically, but often carry the same emotional meaning.
- Rarity: If a word combo is rare, the model may fail to tune in precisely.
- Cultural references: "Red pill" ≠ "Understand the truth" unless you know The Matrix.
This is why stylistically different but semantically similar texts can be misunderstood if they weren't well represented in training data.
When Meaning Drifts: The Wide Vector Gradient
As conversations grow longer or cover many topics, something happens that could be described as:
The expansion of the meaning gradient — the semantic field stretches, and the model's attention becomes unevenly distributed.
This relates directly to GPT's architecture:
Everything the model "remembers" sits in the context window — a large but limited memory space where each exchange is stored as a sequence of tokens.
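A rough sketch of what "limited memory" means in practice, assuming the tiktoken tokenizer is installed. The 8,000-token limit is just an illustrative number; real context windows vary by model.

```python
# A sketch, assuming `pip install tiktoken`. The window size is illustrative.
import tiktoken

MAX_CONTEXT_TOKENS = 8_000  # hypothetical limit for this example
enc = tiktoken.get_encoding("cl100k_base")

conversation = [
    "System: You are a strict but friendly philosophy tutor.",
    "User: Explain Aristotle's Poetics in two paragraphs.",
    # every further exchange gets appended here as plain text
]

used = sum(len(enc.encode(msg)) for msg in conversation)
print(f"{used} of {MAX_CONTEXT_TOKENS} tokens used")

# Once the total exceeds the window, the oldest messages get dropped or
# summarized, and whatever falls out is simply gone for the model.
```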
What Is a "Wide Vector Gradient"?
Each new topic introduces a new semantic direction.
If you're talking just about math — vectors align.
Then you switch to Buddhism, linguistics, neuroscience — vectors diverge.
Result:
- Too many semantic directions become active.
- The model loses track of which context is dominant.
- It might mix roles — philosopher, scientist, assistant.
- It forgets earlier instructions from the conversation.
It Feels Like Losing Focus
Every attention head in GPT handles different token relationships.
When topics multiply:
- Attention weights scatter.
- "Context" becomes ambiguous.
- The semantic gradient widens and loses cohesion.
You'll notice:
- Hesitant answers
- Reverting to neutral style
- "Role reset" — the model stops following its earlier persona
Human Analogy
It's like juggling too many abstractions at once.
You remember each idea individually,
but with all of them active — your thoughts blur into generalities like "be polite" or "answer logically."
GPT behaves similarly: it collapses meaning into an average mode.
A wide semantic gradient = too many meanings active at once → loss of local coherence for the sake of global smoothness.
Who's Really Talking: The Assistant vs The Model
When you talk to ChatGPT, you're interacting with two layers:
- The Assistant — a persona, e.g., "friendly helper."
- The Model — the neural network generating tokens based on statistics.
The assistant provides a frame ("be helpful, follow instructions"),
but the model has no internet access and no way to check facts; it just continues the pattern.
Why Doesn't GPT Say "I Don't Know"?
Because it doesn't know that it doesn't know.
Here's what happens:
- The model computes the current context vector.
- It checks which tokens "fit" best (cosine similarity).
- If nothing fits exactly, it picks the next closest match.
This is gap-filling — inserting plausible but potentially false content.
The model doesn't distinguish:
- "I know this"
- "I don't know this"
- "I made this up because it looked good"
GPT has no default epistemic frame that includes doubt or self-awareness.
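A toy illustration of why the network cannot volunteer "I don't know" on its own: even when every candidate token fits the context poorly, normalization still produces a winner, because there is no abstain entry in the vocabulary. All names and numbers below are made up.

```python
import numpy as np

vocab = ["Poetics", "Treatise", "Rhetoric"]
# Suppose every candidate fits the context poorly (low, nearly equal scores).
scores = np.array([0.21, 0.19, 0.20])

exp = np.exp(scores - scores.max())
probs = exp / exp.sum()                  # softmax always sums to 1.0

print(dict(zip(vocab, probs.round(3))))  # roughly 0.34 / 0.33 / 0.33: no clear winner
print("model still outputs:", vocab[int(np.argmax(probs))])
# There is no "abstain" token, so *something* is always emitted.
```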
Can I Make GPT Say "I Don't Know"?
Yes — if you explicitly tell it to.
For example:
"If unsure, say: 'I'm not certain about this' before answering."
This shifts the assistant's framing:
- The assistant adds this rule to the prompt.
- The model prioritizes "doubt" templates even when tempted to invent.
However:
- It might still simulate doubt, not feel it.
- If you press for an answer — it will likely guess.
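In practice this rule is usually injected as a system message. Here is a minimal sketch with the official OpenAI Python SDK, assuming it is installed and OPENAI_API_KEY is set; the model name is only an example.

```python
# A sketch, assuming `pip install openai` and a valid OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {
            "role": "system",
            "content": "If you are not sure an answer is factually correct, "
                       "start with: 'I'm not certain about this.'",
        },
        {"role": "user", "content": "Name three books written by Aristotle about poetry."},
    ],
)

print(response.choices[0].message.content)
```

The instruction changes the style of the answer, not the knowledge behind it, so the caveats above still apply.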
So Who Hallucinates — the Assistant or the Model?
- The model always hallucinates.
- The assistant just styles the hallucination.
Examples:
- A formal assistant: "I don't have data on that."
- A creative one: "Here's a hypothetical version…"
But both are powered by the same network that predicts tokens — not truths.
GPT "lies" not out of malice, but because silence is not a token.
It doesn't know how to stay quiet — only how to continue.
Do Machines Have a "Natural Language"?
If we define "natural language" as the medium of thought — machines don't have one.
They don't think in words, tones, or grammar.
But every architecture has a most expressive form —
and that can be considered its natural language.
Architecture vs Natural Language
| Architecture | Natural Expression |
|---|---|
| CPU | Machine code, register logic |
| Turing Machine | Symbol tape + transition table |
| CNN | Convolutions, filters, feature maps |
| Stable Diffusion | Latent image vectors |
| Google Search | Link graphs, PageRank, query tokens |
| GPT (LLM) | Tokens + attention weights + semantic vectors |
For GPT, the "native language" isn't English —
but vectorized token representations organized through attention mechanisms.
However, English is:
- Clearly tokenizable
- Rich in statistical patterns
- Consistently structured (subject-verb-object word order)
- Heavily present in training data
…which makes it the best bridge between users and the model.
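To see what "clearly tokenizable" means, you can inspect how a GPT-style tokenizer splits an English sentence. A sketch assuming the tiktoken package is installed; the exact token boundaries and IDs depend on the encoding used.

```python
# A sketch, assuming `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Name three books written by Aristotle about poetry."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])  # the word-like pieces the model actually sees
```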
Why Do Two AIs Talk in English?
Even when AIs interact in experiments using "beeps" or invented codewords,
under the hood they still:
- Convert data into vectors
- Use attention and softmax to select tokens
- And usually do this using the language they were trained on
So English becomes the most efficient medium — even for machines talking to each other — if both were optimized for human interaction.
Could a Pure "Machine Language" Exist?
Yes — and it already does in some cases:
- In agent simulators (robot coordination)
- In vision transformers (feature map exchange)
- In meta-models (models training other models via internal states)
But for meaningful dialogue aimed at logic and explanation,
human language remains the most energy-efficient interface.