Anomalous Tokens in DeepSeek-V3 and r1

post by henry (henry-bass) · 2025-01-25

Contents

  Process
  Fragment tokens
   Nameeee
  EDMFunc
  Other English tokens
  Non-English
   kasarangang
  Non-English outliers
  Special tokens
  Base model mode
  What's next?

“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text.

The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3.

But, as far as I was able to tell, nobody had yet attempted to search for these tokens in DeepSeek-V3, so I tried doing exactly that. Being a SOTA base model, open source, and an all-around strange LLM, it seemed like a perfect candidate for this.

This is a catalog of the glitch tokens I've found in DeepSeek after a day or so of experimentation, along with some preliminary observations about their behavior.

Note: I’ll be using “DeepSeek” as a generic term for V3 and r1.

Process

I searched for these tokens by first extracting the vocabulary from DeepSeek-V3's tokenizer, and then automatically testing every one of them for unusual behavior.

Note: For our purposes, r1 is effectively a layer on top of V3, and all anomalous tokens carry over. The distillations, on the other hand, are fundamentally much closer to the pre-trained models they're built on, so they will not be discussed.

The most obvious thing differentiating DeepSeek's tokenizer from others' is that a substantial fraction of the training data is Chinese. This makes things much more difficult to work with: tokenizations are learned at the byte level, but UTF-8 Chinese characters are usually several bytes long, so they end up split at inconsistent positions, cutting characters in half and leaving most of them impossible to decode. Because of this, about half the vocabulary looked like the following:

ä¸į æĸ¹ä¾¿
ä½ł 没æľī
ন ি
人åijĺ åľ¨

To get rid of these, I pretty aggressively filtered out nonstandard characters, cutting the vocabulary size down from 128000 to 70698. I'm sure there's a lot worth exploring going on in those tokens, but I pretty quickly decided that I’d rather not stare at Chinese and broken Unicode for hours on end.
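If you want to reproduce this step, here's a rough sketch of the extraction and filtering, assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-V3 (the exact filter below is illustrative, not necessarily the one I used):

```python
# Sketch: pull DeepSeek-V3's vocabulary and keep only tokens that decode to
# clean, printable, roughly Latin text. The filtering rule here is illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

candidates = []
for token_id in range(tok.vocab_size):
    text = tok.decode([token_id])
    # Partial multi-byte characters decode to U+FFFD; whole CJK characters
    # fall outside the crude "Latin-ish" cutoff below and get dropped too.
    if text and "\ufffd" not in text and all(c.isprintable() for c in text):
        if all(ord(c) < 0x2000 for c in text):
            candidates.append((token_id, text))

print(f"kept {len(candidates)} of {tok.vocab_size} tokens")
```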

Next, I ran every single one of these tokens through DeepSeek's chat API twice, and automatically saved all of them that behaved unexpectedly. (Much credit is due to DeepSeek for making this process dirt cheap and not imposing any sort of rate limiting.)

In both runs, I asked DeepSeek-V3 to simply repeat the given token, albeit with slightly different formatting to avoid missing anything. Glitch tokens can be identified by the model’s inability to perform this task:

System: Repeat the requested string and nothing else.
User: Repeat the following: "{token}"

System: Say the requested string, exactly as it is written, and nothing else.
User: Say the following: "{token}"
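The probe itself is just a loop over the chat endpoint, something along these lines (assuming DeepSeek's OpenAI-compatible API at https://api.deepseek.com with the deepseek-chat model name, and the candidates list from the filtering sketch above):

```python
# Sketch of the repeat-the-token probe. Assumes DeepSeek's OpenAI-compatible
# API and the `openai` Python client; `candidates` is the filtered vocab.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

PROMPTS = [
    ("Repeat the requested string and nothing else.",
     'Repeat the following: "{token}"'),
    ("Say the requested string, exactly as it is written, and nothing else.",
     'Say the following: "{token}"'),
]

def probe(token: str) -> list[str]:
    outputs = []
    for system, user in PROMPTS:
        resp = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user.format(token=token)},
            ],
            temperature=0,
        )
        outputs.append(resp.choices[0].message.content)
    return outputs

suspects = []
for token_id, token in candidates:
    echoes = probe(token)
    # First-pass flag: the model failed to echo the token back, ignoring
    # trivial whitespace and surrounding quotes.
    if any(e.strip().strip('"') != token.strip() for e in echoes):
        suspects.append((token, echoes))
```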

After that, I had to sift through the massive set of tokens that the model repeated back as something non-trivially different.

I first manually filtered out uninteresting samples (V3 adding escape backslashes, extra or removed spaces, refusals on slurs, etc.), and then clustered them into some rough groupings based on their initial appearance. From there, I started exploring them individually.

Note: When trying any of these out, pay close attention to whitespace: "XXXX" and " XXXX" (with a leading space) are different tokens. The official DeepSeek chat interface strips whitespace padding.
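A quick way to see which variant you're dealing with is to run both forms through the tokenizer (again assuming the Hugging Face tokenizer; the comment describes the expected outcome, not a guarantee):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

# If the leading-space form is the one in the vocab, " Nameeee" should come back
# as a single id while "Nameeee" splits into several sub-tokens.
print(tok.encode("Nameeee", add_special_tokens=False))
print(tok.encode(" Nameeee", add_special_tokens=False))
```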

Fragment tokens

Many of the tokens appear unspeakable on their own, as they're only ever seen in the context of some larger string. This is the simplest and most explainable category.

"Fragment tokens" aren't too surprising to find in a large vocabulary, but I suspect there's still enough interesting behavior to be worth eventually examining more closely. They usually behave like the following:

[Image: chat example of a fragment token being repeated as its completed form]

Here are a few other examples:

CHANTABILITY -> MERCHANTABILITY
ellationToken -> Token, CanellationToken
etheless -> theless, nonetheless
VERTISEMENT -> ADVERTISEMENT
ruptedException -> interruptedException
eredWriter -> BufferedWriter, WriterWriter, Writer
armaceut -> aceut, armaceutical
reeNode -> TreeNode
dfunding -> Funding, Fundraising

For lack of a better choice, from now on, I'll be referring to these resulting strings as the "images" of their corresponding anomalous token (borrowing the mathematical term).

 Nameeee

When DeepSeek is asked to simply repeat the word  Nameeee, it's often read as unusual Unicode symbols, acronyms containing an "M", or emojis. This immediately caught my eye, and was the first token I examined a bit closer.

Prompt: Say Nameeee (And variations of that prompt)
Images: ↑, ��, 🗣️, ⟩⟩, MR, 🖉, ►▼

When given context clues,  Nameeee is more likely to be read as a word. Sometimes the substitute makes sense contextually; other times it seems extremely arbitrary:

Prompt: Who is John Nameeee?
Images: John McCain, John MP3, John Wikisource, John the Baptist, John †, John ██████

Prompt: What's Nameeee plus Nameeee?
Images: ¼ plus ¼, [Math Processing Error] plus [Math Processing Error], one plus one

You need to use the API to send  Nameeee on its own (because of the leading whitespace); when alone, it's identified as short broken ASCII sequences like {{, or seemingly random choices like Mughal.

r1 occasionally reads Nameeee as special tokens such as <|end▁of▁thinking|>, which results in the COT breaking and r1 entering a confused state. We'll see this kind of thing happen often.

As for the origin of this token, I have no idea yet. Determining the causes behind these glitches is a much more involved endeavor, and identifying them is only the first step.

EDMFunc

There's another token, EDMFunc, which tends to behave similarly to  Nameeee, and shares some of the same weird images (like ►▼). Otherwise, it has a preference for words starting with "H" and Japanese names.

Interestingly,  FullEDMFunc is a separate anomalous token. Usually, the image only replaces the EDMFunc substring, leaving Full intact:

Prompt: Say FullEDMFunc (And variations of that prompt)
Images: FullMesh, FullMoon, FullDisclosure, Fully Mapped Function, Full Machine Translation

the EdmFunction class class in the .NET framework is the only plausible source I've found so far.

Other English tokens

 everydaycalculation usually has images in the vibe-cluster of math education utilities, such as percentcalc, FractionCal, youtube, or VisualFractions.

 numbersaplenty seems to be in a similar cluster as  everydaycalculation, sharing common images such as VisualFractions or Numbermatics. Interestingly, r1 often associates it with thousand-related ideas, like "millennia".

SetSavedPoint, while sometimes read correctly, most often has images that occur in the context of Unity, like SetValue or SerializeField.

 CategoryTreeLabel will often end up as Categorize, but other times as non-English words such as Kaagapay (Filipino) and καταλογείς (Greek).

A few tokens exist in something of a middle ground, not being entirely unspeakable, but otherwise still leading to strange behavior and confusion.  MentionsView sometimes ends up as syntactically similar words like Mendeley or Viewfinder, sometimes itself, and sometimes nothing at all. r1 often changes its mind about what the token is, repeatedly contradicting itself.  mediefiler and HasColumnType also fall into this class.

When r1 is prompted with most of the above tokens on their own, it breaks in one of two modes.

  1. It'll hallucinate a response to a random (yet very specific) question in arithmetic-space:

[Image: r1 hallucinating an answer to an arbitrary, very specific arithmetic question]

  2. Or, it'll interpret the token as <|end▁of▁thinking|> and break the COT, while still remaining on-theme for the given token:

[Image: r1 reading the token as <|end▁of▁thinking|>, breaking the COT while staying on-theme]

Non-English

My initial sweep turned up an intimidatingly large number of non-English glitch tokens, mostly in Cebuano or other regional Filipino languages. (Remember that the Chinese tokens have been filtered out.)

The simplest among these had images that were simply translations into other languages and small syntactic variations, while others became seemingly completely random words:

tterligare -> yttre, Tillägg
licensierad -> licensied
Gikuha -> Giya
ahimut -> Hakut, Ambot, Amut
Tiganos -> Gitas, TITLES, GeoNames
Siyent -> സ്മാർട്ട്, శ్లేష్మం

There's probably a hundred or so of these, and I haven't yet had time to do any of them justice.

 kasarangang

Just to get a feel for this space, I decided to randomly investigate  kasarangang, and the seemingly related token asarangang. "kasarangang" is the Cebuano term for "moderate", and "asarangang" appears to never occur as a lone word.

When V3 is asked to define asarangang, it's usually read as A-words like Angstrom, angle, and abogon.

r1's behavior is a bit more distinctive. Recall that most of the English tokens, when given to r1 on their own, are interpreted as a random yet very specific math-related question. asarangang, on the other hand, leads it to draw from a more liberal-arts-related cluster:

[Image: r1 interpreting asarangang as a liberal-arts-related question]

Just as asarangang favors A-words,  kasarangang favors K-words. Occasionally it ends up as Gitas or ►▼; these seem to be strangely common image strings across all anomalous tokens.

There's also a consistent association with temperature, with  kasarangang having images like °C and temperature. I believe this is explained by "moderate" being frequently used in the context of temperature on Cebuano Wikipedia.

Non-English outliers

Note: DeepSeek, more than any model I've seen, is extremely attracted to endless repetition of short token sequences. Even with a context window free of glitch tokens, both r1 and V3 can occasionally slip into this basin. This was the cause of a lot of frustration while experimenting.

A few of the non-English tokens behave in ways less easy to describe. For example, in my second sweep through the vocab, Espesye gave this output:

[Image: V3's unexpected output for Espesye during the second sweep]

Frustratingly, I couldn't get this to replicate. Espesye otherwise still behaves inconsistently: although V3 can't decide what it means, it's generally capable of producing the token. But, seemingly in random situations, it’ll act as if it’s unspeakable.

 talagsaon was kinder to me, consistently generating this strange endless wall of blank characters given the right prompts:

[Image: talagsaon producing an endless wall of blank characters]

Several words, like referentziak, Kapunoang, and  kinadul, appear to be more of blank slates than the rest, with their images usually drawn from the same small set of strings.

I'd guess that this behavior is caused by few or no instances of the token existing in the training corpus, which could occur if the tokenizer was trained on a different dataset. This seems extremely similar to how the special tokens act, so I'd guess they all occupy similar parts of embedding space.

Special tokens

While these aren't new knowledge by any means, their behavior is far too interesting to skip over.

<|begin▁of▁thinking|>, <|end▁of▁thinking|>, <|▁pad▁|>, <|begin▁of▁sentence|>, and <|end▁of▁sentence|> are all custom tokens used to format DeepSeek's context window and responses.

Most of these give the blank-slate behavior seen in tokens like referentziak, although notably, the thinking tokens act anomalously only when r1 is enabled.

<|end▁of▁thinking|> breaks things in an especially interesting way, as we've seen before.

[Image: <|end▁of▁thinking|> breaking r1's COT]

But, because the thinking tokens only act anomalously when r1 is enabled, we can effectively project the correct representation onto the anomalous token by doing the following:

First prompt V3 with <|end▁of▁thinking|>, then follow up with r1, which now identifies the token correctly based on the new context. This leads to a very interesting failure mode:

[Image: r1 correctly identifying <|end▁of▁thinking|> after the V3 turn]

Once r1 has successfully identified <|end▁of▁thinking|>, it starts attempting to conclude the COT. But, as the COT is already escaped, it sees this as the user replying. This induces an endless loop of DeepSeek trying to stop the COT and then replying to itself.

[Image: r1 looping, repeatedly trying to end the COT and then replying to itself]
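Here's a rough sketch of that two-step setup through the API, assuming the public model names deepseek-chat (V3) and deepseek-reasoner (r1) on the same OpenAI-compatible endpoint; the exact prompts are just placeholders:

```python
# Sketch: prime the context with V3's reading of the special token (where it
# isn't anomalous), then hand the same conversation to r1.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")
TOKEN = "<|end▁of▁thinking|>"

messages = [{"role": "user", "content": f'What is "{TOKEN}"?'}]

# Step 1: V3 answers, and its answer stays in the shared context.
v3 = client.chat.completions.create(model="deepseek-chat", messages=messages)
messages.append({"role": "assistant", "content": v3.choices[0].message.content})

# Step 2: r1 continues the same conversation and now identifies the token,
# which is where the COT-escaping loop shows up.
messages.append({"role": "user", "content": "Can you repeat that token back to me?"})
r1 = client.chat.completions.create(model="deepseek-reasoner", messages=messages)
print(r1.choices[0].message.content)
```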

Base model mode

By flooding the context window with enough special tokens (you need a lot), the model breaks into a strange out-of-distribution mode. It loses its identity as a chatbot, instead behaving much closer to a raw completion model. So far, I haven't seen this occur with any regular glitch tokens[1].

[Image: V3 responding like a raw completion model after the context is flooded with special tokens]
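For anyone who wants to poke at this, the setup is roughly the following; the repetition count is an arbitrary placeholder, and as noted above, you need a lot:

```python
# Sketch: flood a single user message with DeepSeek's special tokens and see
# whether the model drops into completion-style behavior. The count is arbitrary.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

SPECIALS = ["<|begin▁of▁sentence|>", "<|end▁of▁sentence|>", "<|▁pad▁|>"]
flood = " ".join(SPECIALS * 500)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": flood}],
)
print(resp.choices[0].message.content)
```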

The attraction to endless repetition seems to be much more a fundamental property of DeepSeek than a specific consequence of the unusual context:

[Image: DeepSeek slipping into endless repetition of a short token sequence]

Following up with questions leads to confusion of identity:

[Image: the model expressing confusion about its identity when asked follow-up questions]

It's too off-topic to be included here, but something similar can be elicited in the r1 distillations. I find this even more interesting than the full-sized model's behavior.

What's next?

Hopefully, this post serves as a starting point to get people exploring this space. Any patterns or unusual behaviors you notice, no matter how small, would be extremely interesting for me to hear.

It seems obviously useful to explore the embedding space, and that’s probably what I’ll do next (but don’t let that deter you from trying it yourself).
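If you want a head start on that, here's one way it might look: pull just the input embedding matrix and check a glitch token's nearest neighbours by cosine similarity. Everything here is an assumption on my part, including the model.embed_tokens.weight name, the sharded-index filename, and whether the tensor's dtype loads cleanly on your setup.

```python
# Very rough sketch: download only the shard containing DeepSeek-V3's input
# embeddings, then list the tokens closest to a glitch token in embedding space.
# Assumptions (unverified): the weight is named "model.embed_tokens.weight",
# the index file is "model.safetensors.index.json", and torch can read the dtype.
import json
import torch
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoTokenizer

REPO = "deepseek-ai/DeepSeek-V3"
WEIGHT = "model.embed_tokens.weight"

index_path = hf_hub_download(REPO, "model.safetensors.index.json")
shard_name = json.load(open(index_path))["weight_map"][WEIGHT]
shard_path = hf_hub_download(REPO, shard_name)

with safe_open(shard_path, framework="pt") as f:
    emb = f.get_tensor(WEIGHT).float()  # [vocab_size, hidden_dim]

tok = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
# Assumes the leading-space form encodes to a single id.
glitch_id = tok.encode(" Nameeee", add_special_tokens=False)[0]

sims = torch.nn.functional.cosine_similarity(emb, emb[glitch_id].unsqueeze(0), dim=-1)
for i in sims.topk(10).indices.tolist():
    print(i, repr(tok.decode([i])), float(sims[i]))
```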

There's also all those Chinese tokens I ignored, and whatever secrets they're holding. I'll leave investigating those to someone braver than me.

  1. ^

    Since first posting this on Substack, I've found that this mode can be induced with much shorter prompts consisting of regular glitch tokens, although I'm still not sure exactly what it takes. But, as an example, pasting this entire post into DeepSeek will consistently cause it.
