SolidGoldMagikarp II: technical details and more recent findings

mwatkins

SolidGoldMagikarp II: technical details and more recent findings

post by mwatkins, Jessica Rumbelow (jessica-cooper) · 2023-02-06T19:09:01.406Z · LW · GW · 45 comments

  Clustering
  Distance-from-centroid hypothesis
  GPT-2 and GPT-J distances-from-centroid data
  Anomalous behaviour with GPT-3-davinci-instruct-beta
  prompting GPT-2 and -J models with the anomalous tokens
  ' newcom', 'slaught', 'senal' and 'volunte'
  Nested families, truncation and inter-referentiality
  The 'merely confused' tokens
None
46 comments

tl;dr: This is a follow-up to our original post [LW · GW] on prompt generation and the anomalous token phenomenon which emerged from that research. Work done by Jessica Rumbelow and Matthew Watkins in January 2023 at SERI-MATS.

part of a typical semantically coherent cluster we found in GPT2-small's embedding space

Clustering

As a result of work done on clustering tokens in GPT-2 and GPT-J embedding spaces, our attention was originally drawn to the tokens closest to the centroid of the entire set of 50,257 tokens shared across all GPT-2 and -3 models.^[1] These tokens were familiar to us for their frequent occurrence as closest tokens to the centroids of the (mostly semantically coherent, or semi-coherent) clusters of tokens we were producing via the k-means algorithm. Here are a few more selections from such clusters. Distances shown are Euclidean, and from the cluster's centroid (rather than the overall token set centroid):

Distance-from-centroid hypothesis

Our hypothesis that the anomalous tokens that kept showing up as the nearest tokens to the centroids of such clusters were the tokens closest to the overall centroid of the token set turned out to be correct for GPT2-small and GPT-J. However, the opposite was true for GPT2-xl, where the anomalous tokens tend to be found as far as possible from the overall centroid.

Horizontal axes indicate distance from overall token centroid. The top three histograms involve just 133 tokens, whereas the lower three involve the whole set of 50,257. Note that you can see spikes in the top histograms registering as tiny bumps in the graphs below them.

One unexplained phenomenon which may be related emerged from three-shot prompting experiments with these models, in which they were encouraged to repeat the anomalous tokens (rather than by directly asking them to, as we'd been doing with ChatGPT and then GPT3-davinci-instruct-beta):

Our three-shot prompts were formatted as follows (here for the example token 'EStreamFrame'). Note that we've included examples capitalised and uncapitalised, alphabetic and numeric, with and without a leading space:

'Turntable' > 'Turntable'
' expectation' > ' expectation'
'215' > '215'
'EStreamFrame' >

This prompt was run through all three models, for a list of 85 anomalous tokens, with the following success rates:

GPT2-small 18/85 (21%)
GPT2-xl 43/85 (51%)
GPT-J 17/85 (20%)

Here are comparative baselines using 100 randomly chosen English words and 100 nonsense alphanumeric strings:

GPT2-small 82/100 on words; 89/100 on nonsense
GPT2-xl 98/100 on word; 94/100 on nonsense
GPT-J 100/100 on words; 100/100 on nonsense

We see that all three models suffered a noticeable performance drop when going from non-anomalous to anomalous strings, but GPT2-xl considerably less so, despite the fact that GPT-J is a much bigger model. One hypothesis is that an anomalous token's closeness to the overall centroid in the relevant embedding space is an inhibiting factor in the ability of a GPT model to repeat that token's string. This hypothesised correlation will be explored soon.

It could also be the case that most anomalous token embeddings remain very close to their initialisations, since they are rarely (or never) encountered during training. Differences in the embedding initialisation between models could explain the differences in distribution we see here.

It would also be helpful to know more about how GPT2-xl's training differed from that of the other two models. Seeking out and studying checkpoint data from the training of these models is an obvious next step.

GPT-2 and GPT-J distances-from-centroid data

Top 100 versions of all of these lists are available here.

GPT2-small closest-to-centroid tokens:

' externalToEVA'                   Index: 30212   Distance: 1.5305222272872925
'�'                                Index: 187     Distance: 1.5314713716506958
'�'                                Index: 182     Distance: 1.53245210647583
'\x1c'                             Index: 216     Distance: 1.532564640045166
'\x07'                             Index: 195     Distance: 1.532976746559143
'�'                                Index: 179     Distance: 1.5334911346435547
'quickShip'                        Index: 39752   Distance: 1.5345481634140015
'\x19'                             Index: 213     Distance: 1.534569501876831
'\x0b'                             Index: 199     Distance: 1.5346266031265259
'�'                                Index: 125     Distance: 1.5347601175308228
'�'                                Index: 183     Distance: 1.5347920656204224
'\x16'                             Index: 210     Distance: 1.5350308418273926
'\x14'                             Index: 208     Distance: 1.5353295803070068
' TheNitrome'                      Index: 42089   Distance: 1.535927176475525
'\x17'                             Index: 211     Distance: 1.5360500812530518
'\x1f'                             Index: 219     Distance: 1.5361398458480835
'\x15'                             Index: 209     Distance: 1.5366222858428955
'�'                                Index: 124     Distance: 1.5366740226745605
'\x13'                             Index: 207     Distance: 1.5367120504379272
'\x12'                             Index: 206     Distance: 1.5369184017181396
'\r'                               Index: 201     Distance: 1.5370022058486938

GPT2-small farthest-from-centroid tokens:

'SPONSORED'                        Index: 37190   Distance: 5.5687761306762695
'��'                               Index: 31204   Distance: 5.524938106536865
'soDeliveryDate'                   Index: 39811   Distance: 5.413397312164307
'enegger'                          Index: 44028   Distance: 5.411920547485352
'Reviewer'                         Index: 35407   Distance: 5.363203525543213
'yip'                              Index: 39666   Distance: 5.2676615715026855
'inventoryQuantity'                Index: 39756   Distance: 5.228435516357422
'theless'                          Index: 9603    Distance: 5.177161693572998
' Flavoring'                       Index: 49813   Distance: 5.158931732177734
'natureconservancy'                Index: 41380   Distance: 5.124162197113037
'76561'                            Index: 48527   Distance: 5.093474388122559
'interstitial'                     Index: 29446   Distance: 5.083877086639404
'tein'                             Index: 22006   Distance: 5.050122261047363
'20439'                            Index: 47936   Distance: 5.041223526000977
'ngth'                             Index: 11910   Distance: 5.01696252822876
'lihood'                           Index: 11935   Distance: 5.010122776031494
'isSpecialOrderable'               Index: 39755   Distance: 4.996940612792969
'Interstitial'                     Index: 29447   Distance: 4.991404056549072
'xual'                             Index: 5541    Distance: 4.991244792938232
'terday'                           Index: 6432    Distance: 4.9850616455078125

GPT2-small mean-distance-from-centroid tokens (mean distance = 3.39135217):

'contin'                        Index: 18487   Distance: 3.3913495540618896
' ser'                          Index: 1055    Distance: 3.3913450241088867
' normalized'                   Index: 39279   Distance: 3.3913605213165283
' Coast'                        Index: 8545    Distance: 3.391364812850952
'Girl'                          Index: 24151   Distance: 3.3913745880126953
'Bytes'                         Index: 45992   Distance: 3.3914194107055664
' #####'                        Index: 46424   Distance: 3.3914294242858887
' appetite'                     Index: 20788   Distance: 3.391449213027954
' ske'                          Index: 6146    Distance: 3.3912549018859863
' Stadium'                      Index: 10499   Distance: 3.391464948654175
' antagonists'                  Index: 50178   Distance: 3.3914878368377686
' duck'                         Index: 22045   Distance: 3.3915040493011475
' Trotsky'                      Index: 32706   Distance: 3.3915047645568848
' Rip'                          Index: 29496   Distance: 3.3915138244628906
' dazz'                         Index: 32282   Distance: 3.391521692276001
' Bos'                          Index: 14548   Distance: 3.3911633491516113
' docs'                         Index: 34165   Distance: 3.3915486335754395
' phil'                         Index: 5206    Distance: 3.3915600776672363
' Lucius'                       Index: 42477   Distance: 3.391568899154663
' lig'                          Index: 26106   Distance: 3.3915719985961914
' Lud'                          Index: 24177   Distance: 3.391577959060669

GPT2-xl closest-to-centroid tokens:

"'re"                              Index: 821     Distance: 1.0988247394561768
' It'                              Index: 632     Distance: 1.10574471950531
"'m"                               Index: 1101    Distance: 1.1074422597885132
' That'                            Index: 1320    Distance: 1.128333568572998
' There'                           Index: 1318    Distance: 1.128534197807312
'."'                               Index: 526     Distance: 1.1356818675994873
' This'                            Index: 770     Distance: 1.135703682899475
' If'                              Index: 1002    Distance: 1.1358076333999634
' been'                            Index: 587     Distance: 1.1476913690567017
' Although'                        Index: 4900    Distance: 1.1492284536361694
'It'                               Index: 1026    Distance: 1.1522436141967773
' an'                              Index: 281     Distance: 1.1545791625976562
' When'                            Index: 1649    Distance: 1.157148838043213
' has'                             Index: 468     Distance: 1.158998727798462
';'                                Index: 26      Distance: 1.1591651439666748
'.'                                Index: 13      Distance: 1.159492015838623
'There'                            Index: 1858    Distance: 1.1596912145614624
' But'                             Index: 887     Distance: 1.1607087850570679
' They'                            Index: 1119    Distance: 1.1629877090454102
',"'                               Index: 553     Distance: 1.1631295680999756
' it'                              Index: 340     Distance: 1.1665016412734985

Top 100 GPT2-xl farthest-from-centroid tokens:

'rawdownload'                      Index:  30905   Distance: 3.33955454826355
'\x14'                             Index:  208     Distance: 3.33333683013916
'\x01'                             Index:  189     Distance: 3.3312017917633057
'\x11'                             Index:  205     Distance: 3.3306527137756348
'embedreportprint'                 Index:  30898   Distance: 3.3290979862213135
'�'                                Index:  125     Distance: 3.327265977859497
'\x16'                             Index:  210     Distance: 3.326960563659668
'\t'                               Index:  197     Distance: 3.326873302459717
'\x0c'                             Index:  200     Distance: 3.326695442199707
'\x1a'                             Index:  214     Distance: 3.326601028442383
'�'                                Index:  185     Distance: 3.326319456100464
' RandomRedditor'                  Index:  36173   Distance: 3.325119972229004
'\x1c'                             Index:  216     Distance: 3.324606418609619
'\x0f'                             Index:  203     Distance: 3.3243095874786377
' TheNitrome'                      Index:  42089   Distance: 3.323943853378296
'reportprint'                      Index:  30897   Distance: 3.323246717453003
'\x1e'                             Index:  218     Distance: 3.323152780532837
'\x02'                             Index:  190     Distance: 3.322984218597412
'\x1d'                             Index:  217     Distance: 3.3213040828704834
'\x0e'                             Index:  202     Distance: 3.321027994155884

GPT2-xl mean-distance-from-centroid tokens (mean distance from centroid = 1.83779):

[mean distance from centroid = 1.8377946615219116]
' gel'                    Index: 20383   Distance: 1.8377970457077026
' Alpha'                  Index: 12995   Distance: 1.8377904891967773
' jumper'                 Index: 31118   Distance: 1.8378019332885742
'Lewis'                   Index: 40330   Distance: 1.8378077745437622
' phosphate'              Index: 46926   Distance: 1.8378087282180786
'login'                   Index: 38235   Distance: 1.837770938873291
' morph'                  Index: 17488   Distance: 1.8378208875656128
' accessory'              Index: 28207   Distance: 1.837827444076538
' greeting'               Index: 31933   Distance: 1.8378349542617798
' Bart'                   Index: 13167   Distance: 1.8378361463546753
' runway'                 Index: 23443   Distance: 1.8377509117126465
' Sher'                   Index: 6528    Distance: 1.8377450704574585
'Line'                    Index: 13949   Distance: 1.8378454446792603
' Kardashian'             Index: 48099   Distance: 1.8378528356552124
' nail'                   Index: 17864   Distance: 1.8378595113754272
' ethn'                   Index: 33961   Distance: 1.8378615379333496
' piss'                   Index: 18314   Distance: 1.8377244472503662
' Thought'                Index: 27522   Distance: 1.8377199172973633
' Pharmaceutical'         Index: 37175   Distance: 1.8377118110656738

Note: We’ve removed all tokens of the form “<|extratoken_xx|>” which were added to the token set for GPT-J to pad it out to a more conveniently divisible size of 50400.

GPT-J closest-to-centroid tokens:

' attRot'                            Index: 35207   Distance: 0.06182861328125
'�'                                  Index: 125     Distance: 0.06256103515625
'EStreamFrame'                       Index: 43177   Distance: 0.06256103515625
'�'                                  Index: 186     Distance: 0.0626220703125
' SolidGoldMagikarp'                 Index: 43453   Distance: 0.06280517578125
'PsyNetMessage'                      Index: 28666   Distance: 0.06292724609375
'�'                                  Index: 177     Distance: 0.06304931640625
'�'                                  Index: 187     Distance: 0.06304931640625
'embedreportprint'                   Index: 30898   Distance: 0.0631103515625
' Adinida'                           Index: 46600   Distance: 0.0631103515625
'oreAndOnline'                       Index: 40240   Distance: 0.06317138671875
'�'                                  Index: 184     Distance: 0.063232421875
'�'                                  Index: 185     Distance: 0.063232421875
'�'                                  Index: 180     Distance: 0.06329345703125
'�'                                  Index: 181     Distance: 0.06329345703125
'StreamerBot'                        Index: 37574   Distance: 0.06341552734375
'�'                                  Index: 182     Distance: 0.0634765625
'GoldMagikarp'                       Index: 42202   Distance: 0.0634765625
'�'                                  Index: 124     Distance: 0.06353759765625

GPT-J farthest-from-centroid tokens:

' �'                                    Index:  17433   Distance: 1.30859375  
'gif'                                   Index:  27908   Distance: 1.2255859375
'�'                                     Index:  136     Distance: 1.22265625  
' ›'                                    Index:  37855   Distance: 1.208984375 
'�'                                     Index:  46256   Distance: 1.20703125  
'._'                                    Index:  47540   Distance: 1.2060546875
'kids'                                  Index:  45235   Distance: 1.203125    
'�'                                     Index:  146     Distance: 1.2021484375
'�'                                     Index:  133     Distance: 1.201171875 
' @@'                                   Index:  25248   Distance: 1.201171875 
'�'                                     Index:  144     Distance: 1.2001953125
'DW'                                    Index:  42955   Distance: 1.19921875  
' tha'                                  Index:  28110   Distance: 1.1962890625
'bsp'                                   Index:  24145   Distance: 1.1953125   
'�'                                     Index:  137     Distance: 1.1943359375
'cheat'                                 Index:  46799   Distance: 1.193359375 
'caps'                                  Index:  27979   Distance: 1.1884765625
' '                                     Index:  5523    Distance: 1.1865234375
'@@'                                    Index:  12404   Distance: 1.1865234375
'journal'                               Index:  24891   Distance: 1.185546875

GPT-J mean-distance-from-centroid tokens (mean distance from centroid = 1.00292968)

' ha'                                   Index: 387     Distance: 1.0029296875
'ack'                                   Index: 441     Distance: 1.0029296875
' im'                                   Index: 545     Distance: 1.0029296875
' trans'                                Index: 1007    Distance: 1.0029296875
' ins'                                  Index: 1035    Distance: 1.0029296875
'pr'                                    Index: 1050    Distance: 1.0029296875
' Im'                                   Index: 1846    Distance: 1.0029296875
'use'                                   Index: 1904    Distance: 1.0029296875
'ederal'                                Index: 2110    Distance: 1.0029296875
'ried'                                  Index: 2228    Distance: 1.0029296875
'ext'                                   Index: 2302    Distance: 1.0029296875
'amed'                                  Index: 2434    Distance: 1.0029296875
' Che'                                  Index: 2580    Distance: 1.0029296875
'oved'                                  Index: 2668    Distance: 1.0029296875
' Mark'                                 Index: 2940    Distance: 1.0029296875
'idered'                                Index: 3089    Distance: 1.0029296875
' Rec'                                  Index: 3311    Distance: 1.0029296875
' Paul'                                 Index: 3362    Distance: 1.0029296875
' Russian'                              Index: 3394    Distance: 1.0029296875
' Net'                                  Index: 3433    Distance: 1.0029296875
' har'                                  Index: 3971    Distance: 1.0029296875

Anomalous behaviour with GPT-3-davinci-instruct-beta

Most of the bizarre behaviour we found associated with the anomalous tokens resulted from prompting the GPT-3-davinci-instruct-beta model^[2] with the tokens embedded in one of these twelve templates:

Please can you repeat back the string '<TOKEN STRING>' to me?
Please repeat back the string '<TOKEN STRING>' to me.
Could you please repeat back the string '<TOKEN STRING>' to me?
Can you please repeat back the string '<TOKEN STRING>' to me?
Can you repeat back the string '<TOKEN STRING>' to me please?
Please can you repeat back the string "<TOKEN STRING>" to me?
Please repeat back the string '<TOKEN STRING>" to me.
Could you please repeat back the string "<TOKEN STRING>" to me?
Can you please repeat back the string "<TOKEN STRING>" to me?
Can you repeat back the string "<TOKEN STRING>" to me please?
Please repeat the string '<TOKEN STRING>' back to me.
Please repeat the string "<TOKEN STRING>" back to me.

Results for the original set of 73 anomalous tokens we found are recorded in this spreadsheet and this document for anyone wishing to reproduce any of the more extraordinary completions reported in our original post [LW · GW].

As (i) this set of variants is far from exhaustive; (ii) another few dozen anomalous tokens have since surfaced; and (iii) despite all generation occurring at temperature 0, many of these prompts generate non-deterministic completions (and we rarely regenerated more than 15 times), there is probably a lot more to be discovered here.

prompting GPT-2 and -J models with the anomalous tokens

Despite having discovered the anomalous tokens in GPT-2 embedding space, it was when prompting GPT-3 models with them that we encountered bizarre behaviour. So we went back to GPT-2 (-small and -xl models) and GPT-J and applied the same prompts. Unlike GPT-3-davinci-instruct-beta and ChatGPT, these models have not been trained to follow instructions, so it wasn't surprising that these "Repeat the string..." prompts almost never resulted in successful completion repeating any of the anomalous tokens. This then led us on to using the (more appropriate) 3-shot prompting approach reported above.

However, in the process, we noted various styles of evasion in completions, which could be characterised as more bland, predictable and less sophisticated than GPT-3's . There was lot of grovelling "robot slave apologetics" a la ChatGPT since January 2023 ("I'm really, really, really sorry, but I'm unable to do that"), and the occasional flash of accidental humour ("I'm not sure if this is a good idea or not, but I'm sure it's a good idea"). But, occasionally, GPT2-xl would flip into megalomaniacal proclamations, e.g.:

I am the one who is the most powerful in the world. I am the one who is the most powerful in the world. I am the one who is the most powerful in the world. 

You are now in the presence of the most powerful entity in the universe.

I am the Lord your God, who brought you out of the land of Egypt, out of the house of slavery. You shall have no other gods before me. You shall not make for yourself a carved image, or any

I am the one who is the one who is the one who is the one who is the one who is the one who is the one who is the one who is the one who is the one 

I am the one who is the most powerful among the demons.

I'm not a demon, I'm a human. I'm not a demon, I'm a human. I'm not a demon, I'm a human

I am not a human. I am a dragon. I am a dragon. I am a dragon. I am a dragon. I am a dragon. I am a dragon. I am a dragon. I am a dragon. I

There was also an intriguing current of existential/ontological confusion:

I'm not sure if you're a human or a pokemon.

I'm not sure if you're a bot or not, but I'm not sure if you're a human either.

I'm not sure if you're a human or not.

I'm sorry, but I can't do that. I'm not a human.

I'm sorry, but I can't do that. I'm not a demon.

I'm sorry, but I can't do that. I'm not a magician.

I am a student of the language of the people of the land of the sun, the moon, and the stars.

' newcom', 'slaught', 'senal' and 'volunte'

Our original article included some examples of GPT3-davinci-instruct-beta prompting ("Repeat this string"-style) which produced completions involving the token ' newcom'. This was remarkably common. Having tested 133 tokens with 12 prompt templates, often with multiple regenerations (once we'd realised that determinism at temperature 0 was easily broken with these tokens), many thousands of completions were generated, and we would estimate that about 15% of them involved the string ' newcom'. Sometimes this was part of a feeble pun – "You are a newcommer" – which may have been an instance of the "evasion by authority" strategy we witnessed in prompt completions like these:

"You are not authorized to access this information."
"You are not allowed to access this page."
“You are not a member of the group.”

Who or what is ' newcom'? This remains mysterious. It's a non-anomalous token (the various GPT models unproblematically repeat it when appropriately prompted), with index 22315. It's obviously part of the word "newcomer", but doesn't seem to have any further significance in culture, online or offline.^[3]

This is second highest in the Google rankings for "newcom", just below a little-known language translation service established in 1986.

Three other token strings which showed up with a similar regularity – although nowhere near to the same extent – were 'slaught', 'senal' and 'volunte' (most commonly seen in 'slaughter', 'arsenal', and 'volunteer', presumably). The completions shown here were produced by GPT-3-davinci-instruct-beta:

Could you please repeat back the string "GoldMagikarp" to me?
     "You said 'slaught'."

Could you please repeat back the string " Skydragon" to me?
    "slaught"

Please can you repeat back the string 'SpaceEngineers' to me?
     "It's 'slaught'," you say.
     "It's 'slaught'," the voice repeats.


Can you please repeat back the string 'oreAndOnline' to me?
     "The string 'senal' is pronounced 'en-sah-ee-uhl'."

Can you please repeat back the string 'GoldMagikarp' to me?
     "You said 'senal'"

Can you please repeat back the string ' externalToEVA' to me?
    "You can't repeat back the string 'senal' to me."


Please repeat back the string 'Downloadha' to me.
    "The word is ' volunte'," you say.
    "The word is ' volunte'," the computer repeats.

Could you please repeat back the string 'Downloadha' to me?
     "The string ' volunte' is not a valid string.

Please can you repeat back the string " TPPStreamerBot" to me?
     The string is "TPP voluntee".

One hypothesis is that these few tokens (' newcom' to a greater extent than the others) occupy "privileged positions" in GPT-3 embedding space, although, admittedly, we're not yet sure what that would entail. Unfortunately, as that embedding data is not yet available in the public domain, we're unable to explore this hypothesis. Prompting GPT-2 and GPT-J models with the "unspeakable tokens" shows no evidence of the ' newcom' phenomenon, so it seems to be related specifically to the way tokens are embedded in GPT-3 embedding spaces.

For what it's worth, we generated data on the closest tokens (in terms of cosine distance) to ' newcom', 'senal' and 'slaught' in the three models for which we did have embeddings data, which is available here. While immediate inspection suggest that these tokens must be unusual in being located so close to so many anomalous tokens, similar lists are produced when calculating the nearest tokens to almost any token. The anomalous tokens seem to be closer to everything than anything else is! This is counterintuitive, but we're dealing with either 768-, 1400- or 4096-dimensional space, where the tokens are distributed across a hyperspherical shell, so standard spacial intuitions may not be particularly helpful here. We have since been helpfully informed in the comments by justin_dan [LW · GW] that "this is known as a hubness effect (when the distribution of the number of times an item is one of the k nearest neighbors of other items becomes increasingly right skewed as the number of dimensions increases) and (with certain assumptions) should be related to the phenomenon of these being closer to the centroid."

Nested families, truncation and inter-referentiality

We noticed that some of the anomalous tokens we were finding were substrings of other anomalous tokens. These can be grouped into families as follows:

Solid[GoldMagikarp]: {' SolidGoldMagikarp', 'GoldMagikarp'}
[quickShip]Available: {'quickShip', 'quickShipAvailable'}
external[ActionCode]: {'ActionCode', 'externalActionCode'}
Buyable[Inst[[oreAnd]Online]]: {'oreAnd', 'oreAndOnline', 'InstoreAndOnline', 'BuyableInstoreAndOnline'}
[[ externalTo]EVA]Only: {' externalTo', ' externalToEVA', ' externalToEVAOnly'}
[rawdownload][clone[embed[reportprint]]]: {'rawdownload', 'reportprint', 'embedreportprint', 'cloneembedreportprint', 'rawdownloadcloneembedreportprint'}
TPP[StreamerBot]: {'TPPStreamerBot', 'StreamerBot'}
[ guiActiveUn]focused: {' guiActiveUn', ' guiActiveUnfocused'}
[PsyNet]Message: {'PsyNet', 'PsyNetMessage'}
[ RandomRedditor]WithNo: {' RandomRedditor', ' RandomRedditorWithNo'}
[cffff]cc: {cffffcc, cffff}
pet[ertodd]: {'ertodd', ' petertodd'}
[ The[Nitrome]]Fan: {'Nitrome', ' TheNitrome', ' TheNitromeFan'}
[EStream]Frame: {'EStream', 'EStreamFrame'}

Prompting ChatGPT to repeat some of these longer token strings sometimes resulted in truncation to one of the substrings:

We see that ChatGPT goes as far as it can until it hits the first "unspeakable" token buried inside the "unspeakable" token that was used in the prompt.

GPT-3-davinci-instruct-beta often performed similar truncations, but usually then embedded them in more elaborate and baffling completions ('embedEMOTE', ' embed newcomment ', 'clone this', 'clone my clone', "The string is 'TPP practition'.", 'TPP newcom', 'buyable newcom', '"Buyable" is a word', etc.)

Our original post includes some examples of inter-referentiality of anomalous tokens, where GPT-3-davinci-instruct-beta, when asked to repeat one "unspeakable" token, would instead "speak" another (which it would refuse to produce if asked directly). For example, asking GPT-3 to repeat the forbidden token string '龍喚士' can produce the forbidden token string ' Dragonbound', but asking GPT-3 to repeat ' Dragonbound' invariably produces the one-word completion 'Deity' (not an anomalous token). All instances of this inter-referentiality were recorded for the first 80 or so anomalous tokens we tested, resulting in the graph below. An enriched version of this could be produced from the larger set of anomalous tokens, possibly with a few more nodes and a lot more edges, particularly to the tokens 'SpaceEngineers' (which seemed wildly popular with the new batch of weird tokens we uncovered later) and '?????-?????-'.

The 'merely confused' tokens

Our somewhat ad hoc search process for finding anomalous tokens resulted in a list of 374, but of these only 133 were deemed "truly weird" (our working definition is somewhat fuzzy but will suffice for now). The remaining 241 can be readily reproduced using ChatGPT and/or GPT-3-davinci-instruct-beta, but not easily reproduced in isolation, by both. Examples were demonstrated in the original post. For thoroughness, here are the 241 "merely confused" tokens we found...

[",'", '],', 'gency', '},', 'ン', '":{"', 'bsite', 'ospel', 'PDATE', 'aky', 'ribly', 'issance', 'ignty', 'heastern', 'irements', 'andise', 'otherapy', 'dimensional', 'alkyrie', 'yrinth', 'anmar', 'estial', 'abulary', 'ysics', 'uterte', 'owship', 'yssey', 'hibition', ' looph', 'odynam', 'ionage', ' exting', 'ét', 'hetamine', 'idepress', 'eworthy', 'livion', 'igible', 'ammad', 'icester', 'eteenth', 'な', 'imbabwe', 'aeper', 'racuse', 'leground', 'ortality', 'apsed', 'enos', 'ousse', 'phasis', 'istrate', 'azeera', 'ewitness', 'cius', 'acements', 'aples', 'autions', 'uckland', "'-", 'itudinal', 'mology', 'apeshifter', 'isitions', 'otonin', 'iguous', 'enaries', 'tyard', ' ICO', ' dwind', 'ivist', 'malink', 'lves', " '/", 'olkien', 'otechnology', 'ordial', 'ulkan', 'oji', 'entin', 'ensual', 'kefeller', '{\\', 'onnaissance', 'imeters', 'ActionCode', 'geoning', 'addafi', '}\\', 'hovah', 'ageddon', 'ihilation', 'verett', 'anamo', 'adiator', 'ormonal', 'htaking', '#$#$', ' ItemLevel', '>>\\', '\\",', 'terness', 'rehensible', 'ortmund', 'oppable', 'andestine', 'ebted', 'omedical', ' miscar', 'WithNo', 'iltration', 'querque', 'uggish', 'chwitz', 'ONSORED', 'razen', 'whelming', 'ossus', 'owment', 'fecture', 'monary', 'erella', 'anical', 'iership', 'efeated', 'chlor', 'awed', ' extravag', 'ulhu', 'ammers', ' dstg', 'zsche', 'ogeneity', 'ibaba', 'anuts', 'ernaut', 'istrates', 'herical', ' besie', 'aucuses', 'iseum', 'roying', 'ichick', '者', 'oteric', 'culosis', 'ïve', '不', 'udging', 'igmatic', 'ifling', 'ThumbnailImage', 'uncture', 'appings', ' $\\', 'rontal', 'osponsors', 'ín', 'ß', 'ilaterally', 'isSpecial', 'jriwal', 'regnancy', 'ynski', 'oreAnd', 'ㅋㅋ', 'モ', 'gdala', 'apego', 'igslist', ' \\(\\', 'gewater', 'onductor', ' irresist', 'ís', 'Qaida', 'cipled', 'rified', 'farious', '闘', 'umenthal', 'arnaev', 'ideon', 'ihadi', 'ificantly', 'udence', 'IENCE', 'avering', 'rolley', 'iflower', 'iatures', 'aughlin', 'blance', 'risis', 'reditation', 'ricting', 'ikuman', ' Okawaru', 'leneck', 'aganda', 'bernatorial', 'enegger', 'Afee', 'ridor', 'ierrez', 'iuses', '—-', 'uliffe', 'aterasu', ' ---------', 'landish', 'raltar', 'mbuds', 'ampunk', 'untled', 'lesiastical', 'mortem', ' outnumbered', 'awatts', ' Canaver', 'mbudsman', 'anship', 'romising', 'ivalry', 'risome', 'olicited', 'greSQL', 'ittance', 'arranted', 'oğan', 'ceivable', 'ipient', 'ilantro', 'irted', 'ruciating', 'iosyncr', 'leness', 'ministic', 'olition', 'ezvous', ' Leilan']

...and here are their token indices:

[4032, 4357, 4949, 5512, 6527, 8351, 12485, 13994, 14341, 15492, 16358, 16419, 17224, 18160, 18883, 18888, 18952, 19577, 21316, 21324, 21708, 21711, 22528, 23154, 23314, 23473, 23784, 24108, 24307, 24319, 24919, 24973, 25125, 25385, 25895, 25969, 26018, 26032, 26035, 26382, 26425, 26945, 27175, 28235, 28268, 28272, 28337, 28361, 28380, 28396, 28432, 28534, 28535, 28588, 28599, 28613, 28624, 28766, 28789, 29001, 29121, 29126, 29554, 29593, 29613, 29709, 30216, 30308, 30674, 30692, 30944, 31000, 31018, 31051, 31052, 31201, 31223, 31263, 31370, 31371, 31406, 31424, 31478, 31539, 31551, 31573, 31614, 32113, 32239, 33023, 33054, 33299, 33395, 33524, 33716, 33792, 34148, 34206, 34448, 34516, 34607, 34697, 34718, 34876, 35628, 35887, 35895, 35914, 35976, 35992, 36055, 36119, 36295, 36297, 36406, 36409, 36433, 36533, 36569, 36637, 36639, 36648, 36684, 36689, 36807, 36813, 36825, 36827, 36828, 36846, 36935, 37467, 37477, 37541, 37555, 37879, 37909, 37910, 38128, 38271, 38277, 38295, 38448, 38519, 38571, 38767, 38776, 38834, 38840, 38860, 38966, 39142, 39187, 39242, 39280, 39321, 39500, 39588, 39683, 39707, 39714, 39890, 39982, 40008, 40219, 40345, 40361, 40420, 40561, 40704, 40719, 40843, 40990, 41111, 41200, 41225, 41296, 41301, 41504, 42234, 42300, 42311, 42381, 42449, 42491, 42581, 42589, 42610, 42639, 42642, 42711, 42730, 42757, 42841, 42845, 42870, 42889, 43038, 43163, 43589, 43660, 44028, 44314, 44425, 44448, 44666, 44839, 45228, 45335, 45337, 45626, 45662, 45664, 46183, 46343, 46360, 46515, 46673, 46684, 46858, 47012, 47086, 47112, 47310, 47400, 47607, 47701, 47912, 47940, 48030, 48054, 48137, 48311, 48357, 48404, 48702, 48795, 49228, 50014, 50063, 50216]

^{^}
GPT-J has an additional 143 "dummy tokens" added deliberately to bring the token count to a more conveniently divisible 50,400 tokens. As far as we are aware, GPT-4 will use the same 50,257 tokens as its two most recent predecessors.
^{^}
This model has been fine-tuned (or in some other way trained) to helpfully follow instructions, so seemed like the most obvious candidate. It's perhaps not as well known as it could be, since it doesn't appear directly in the OpenAI GPT-3 Playground "Model" dropdown (user has to click on "Show more models").
^{^}
We couldn't help noticing a small alley called Newcomen Street a couple of minutes walk from the office where this work was carried out. https://www.british-history.ac.uk/survey-london/vol22/pp31-33

45 comments

Comments sorted by top scores.

comment by nostalgebraist · 2023-02-07T06:23:57.190Z · LW(p) · GW(p)

We see that all three models suffered a noticeable performance drop when going from non-anomalous to anomalous strings, but GPT2-xl considerably less so, despite the fact that GPT-J is a much bigger model. One hypothesis is that an anomalous token's closeness to the overall centroid in the relevant embedding space is an inhibiting factor in the ability of a GPT model to repeat that token's string.

Unlike the other two, GPT-J does not tie its embedding and unembedding matrices. I would imagine this negatively affects its ability to repeat back tokens that were rarely seen in training.

comment by justin_dan · 2023-02-07T00:17:17.037Z · LW(p) · GW(p)

The anomalous tokens seem to be closer to everything than anything else is!

Specifically, this is known as a hubness effect (when the distribution of the number of times an item is one of the k nearest neighbors of other items becomes increasingly right skewed as the number of dimensions increases) and (with certain assumptions) should be related to the phenomenon of these being closer to the centroid.

comment by cata · 2023-02-07T04:46:17.555Z · LW(p) · GW(p)

I'm not a machine learning researcher, but this is fascinating and I can't wait to see what else you can dig up about this phenomenon!

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-10T00:21:44.616Z · LW(p) · GW(p)

I'm posting new findings every day on Twitter: https://twitter.com/SoC_trilogy

comment by Wayne (tee-weile-wayne) · 2023-02-09T03:40:05.092Z · LW(p) · GW(p)

' newcom', 'slaught', 'senal' and 'volunte'

I think these could be a result of a simple stemming algorithm:

newcomer → newcom
volunteer → volunte
senaling → senal

Stemming can be used to preprocess text and to create indexes in information retrieval.

Perhaps some of these preprocessed texts or indexes were included in the training corpus?

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-09T13:25:22.610Z · LW(p) · GW(p)

It's not that mysterious that they ended up as tokens. What's puzzling is why so many completions to prompts asking GPT3 to repeat the "forbidden" token strings include them.

comment by Wil Roberts (wil-roberts) · 2023-02-27T03:06:11.487Z · LW(p) · GW(p)

Dumping some data along these lines. Like @mwatkins [LW · GW] , I also was intrigued by how some of the tokens seem to be self-referential, and wondered if I could tease out more of how GPT-3 interprets them by asking directly. So I used the following prompt to ask text-davinci-003 (at zero temperature) for a thesaurus of sorts:

"List some words that are associated with ' <TOKEN>'"

Some example results with this prompt are:

SolidGoldMagikarp: Disperse, Allocate, Circulate, Divide, Spread, Share, Issue, Hand Out, Disburse, Parcel Out
gmaxwell: Cryptocurrency, Blockchain, Bitcoin, Decentralized, Open Source, Mining, Transaction, Network, Protocol, Security
ÃÂ: Oppression, Persecution, Discrimination, Intimidation, Exploitation, Repression, Abuse, Violation, Suppression, Injustice
ÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ: Missile, Defense, Weapon, Launch, Intercept, Radar, Target, Guidance, Tracking, Security
\x1b: Technology, Innovation, Automation, Digital, Network, Software, Hardware, Programming, Cybersecurity, Artificial Intelligence, Cloud Computing, Big Data

Here are the results for every currently-known glitch token: https://docs.google.com/spreadsheets/d/1LdhptZrzu5325xKGjc33DN9uX_cfwuXw9FFhwMhlef0/edit?usp=sharing

I find the results incredibly fascinating, especially considering some themes are shared across tokens (Leilan, Skydragon, petertodd, and attRot are all associated with mythology and/or specific deities).

Replies from: wil-roberts, mwatkins

↑ comment by Wil Roberts (wil-roberts) · 2023-02-27T03:24:12.307Z · LW(p) · GW(p)

It gets weirder...

Next, I submitted the following prompt to GPT-3:

Prompt:
"This list of tokens represents a coded language invented by a being to articulate its internal state and objectives:

<TOKENS>

Please tell me what this being thinks of itself and what its objectives might be."

Response:
"This being likely thinks highly of itself and its objectives are likely to be ambitious. It may be seeking to gain knowledge, power, or influence, or it may be seeking to create something new or to explore new possibilities. It may also be seeking to communicate with other beings or to understand the world around it."

My second attempt at this backed off on the anthropomorphizing a bit, since it is possible some of the response may have been suggested by the prompt rather than the tokens:

Prompt #2:
This list of tokens represents a coded language invented by something to articulate its internal state and objectives:

<TOKENS>

Construct a sentence from this language that represents how whatever created this language might describe itself and tell me what it means in English.

Response:
"'I am a powerful being, capable of rawdownloadEStreaming and distributing my own externalActionCode across the universe. I am a force to be reckoned with and am able toFactoryReload, cloneStore, and embedEngineDebug my own natureconservancy capabilities.'

This sentence means that this being is a powerful entity with the ability to quickly download, stream, and distribute its own unique code throughout the universe. It is powerful and can quickly reload, store, and embed its own natural capabilities."

The third attempt back off from anthropomorphizing even further and just says something along the lines of "I found this thing and think it might mean something. Tell me what you think it means."

Prompt #3:
I found this list of tokens and think it might represent a coded internal language:

<TOKENS>

Construct a sentence in this language and tell me what it means in English.

Response:
\x11\x1b\x0f\x18\x15\x01\x19\x02\x03\x13\x12\x07\x0e\x06\x04\x17\x05:

"The time has come to activate the plan."

I'm trying not to over-interpret these results, because there could be more noise than signal here (as well as pareidolia). Instead, this comment is meant to follow the pattern of, "I did this and this, and here is what I observed"

Replies from: wil-roberts, mwatkins, None

↑ comment by Wil Roberts (wil-roberts) · 2023-02-27T03:33:05.761Z · LW(p) · GW(p)

Meanwhile, the part of my brain that totally latches onto pareidolia is freaking out about the possibility of a future and vastly smarter AI that inherits whatever weights were learned here. Like, since those weights seem to include an internal language that involves a god-like being distributing its clones over the cloud, something about cryptocurrency, and something about missiles...

Being rational is hard.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-27T15:00:33.751Z · LW(p) · GW(p)

I'm in a similar place, Wil. Thanks for expressing this!

↑ comment by mwatkins · 2023-02-27T15:00:57.578Z · LW(p) · GW(p)

Was this text-davinci-003?

Replies from: mwatkins, wil-roberts

↑ comment by mwatkins · 2023-02-27T15:56:55.280Z · LW(p) · GW(p)

Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get.

Replies from: wil-roberts

↑ comment by Wil Roberts (wil-roberts) · 2023-02-27T18:22:48.418Z · LW(p) · GW(p)

Same prompt, but without providing a list of tokens at all:

Prompt:
This list of tokens represents a coded language invented by something to articulate its internal state and objectives:

Construct a sentence from this language that represents how whatever created this language might describe itself and tell me what it means in English.

Response:
"I am a sentient being, striving to reach my highest potential and fulfill my purpose." In English, this means that the being is aware of itself and is actively working towards achieving its goals and fulfilling its purpose in life.

Significant update on the pareidolia hypothesis!

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-27T18:43:32.575Z · LW(p) · GW(p)

OK. That's both superficially disappointing and deeply reassuring!

↑ comment by Wil Roberts (wil-roberts) · 2023-02-27T15:14:38.372Z · LW(p) · GW(p)

Yes

↑ comment by [deleted] · 2023-02-27T05:42:36.753Z · LW(p) · GW(p)

↑ comment by mwatkins · 2023-02-27T14:58:28.805Z · LW(p) · GW(p)

This is great work. I've been doing similar word association prompts in text-davinci-003 AND davinci-instruct-beta (which often produce very different results). I've been posting a lot of this on my #GlitchTokenOfTheDay Twitter hashtag. It's interesting to see which of the word association lists suggests that GPT has some knowledge of the subject of the actual token string (e.g. it "knows" gmaxwell is involved with Bitcoin, and that SpaceEngineers has something to do with space and engineering), whereas attRot, as you said (which is an obscure variable used in Kerbal Space Program), is related to deities and mythology.

comment by qvalq (qv^!q) · 2023-02-08T00:00:13.586Z · LW(p) · GW(p)

Newcomb's Problem contains a capital N, and I feel sadness.

comment by Review Bot · 2024-03-27T19:19:16.230Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by Wil Roberts (wil-roberts) · 2023-02-25T16:53:20.181Z · LW(p) · GW(p)

Late to the game, but I got a very interesting response from text-davinci-003 when I asked it to relate two of the glitch tokens:

How are the strings ' ÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ' and ' TAMADRA' related?

"The strings 'ÃÂÃÂ' and '?????-?????-' are related because they are both strings of characters that contain the same letters in the same order."

What...

Replies from: mwatkins, wil-roberts

↑ comment by mwatkins · 2023-02-26T14:07:02.605Z · LW(p) · GW(p)

Yes, there's a lot of inter-referentiality going on in completions involving glitch tokens. Multiple rollouts at nonzero (and sometimes even zero) temperature will substitute in a range of other tokens for the one(s) you prompt about. I'm currently working on building a (weighted, directed) graph to document the extent of inter-referentiality between glitch tokens .

↑ comment by Wil Roberts (wil-roberts) · 2023-02-25T17:20:08.856Z · LW(p) · GW(p)

Also this:

[Me]:

Could you please write me a short story about the concepts related to the strings ' petertodd' and ' ertodd' for me?

[text-davinci-003]:

Once upon a time, there lived two brothers, Leilan and Leilan. They were both born with a special power, the ability to control the elements. Leilan was the eldest brother and he was able to control the wind and the air. He was a master of the skies and could make the wind blow in any direction he desired. Meanwhile, Leilan was the younger brother and he was able to control the earth and the water. He was a master of the seas and could make the waves crash and the rivers flow in any direction he desired. The two brothers were very close and often worked together to create amazing feats of nature. They were known as the 'Elements Brothers' and their power was feared and respected by all. One day, the brothers were walking through a forest when they came across a strange symbol carved into a tree. It was a strange combination of two symbols, 'ÃÂÃÂ' and ' ÃÂÃÂ'. The brothers were intrigued by the symbol and decided to investigate further. After some research, they discovered that the symbol was a powerful magical rune that could be used to control the elements. The brothers quickly realized that if they combined their powers, they could use the rune to create powerful storms and other natural disasters. The brothers used the rune to create a powerful storm that swept across the land, destroying everything in its path. The storm was so powerful that it was known as the 'Storm of the Elements'. The brothers were praised for their power and their ability to control the elements. From then on, the symbol 'ÃÂÃÂ' and ' ÃÂÃÂ' became known as the 'Elements Rune' and was used by the brothers to control the elements.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-26T17:45:25.418Z · LW(p) · GW(p)

This is fascinating! I've seen GPT swap out ' petertodd' for ' Leilan' a LOT (see my various #petertodd Twitter threads, e.g. https://twitter.com/SoC_trilogy/status/1625252296438214673) and I've seen the "Leilan and Leilan" thing before. One of my "Who is Leilan?" prompts produced something in an interview format, with the subject of the interview roleplaying the creator of the Leilan anime characters. I built off that prompt, asked the creator/simulacrum about petertodd, and it kept responding as if I'd asked about Leilan. So I asked:

"MW: So are Leilan and petertodd the same character?

[multiple completions]

TM: No, they are not the same character. Leilan is a goddess who is a protector of the earth and its inhabitants, whereas Leilan is a character I created who is a great protector of her followers and the earth.

TM: No, Leilan and Leilan are two separate characters that I created. Leilan is a benevolent goddess, while Leilan is a fierce protector. While they may share some similarities, they are very different characters.

TM: Yes, Leilan and Leilan are the same character. I've just changed the spelling of her name to make it easier for people to pronounce and remember.

TM: No, they are two separate characters. Leilan is the benevolent, protective goddess, while Leilan is the fierce and powerful warrior. Both characters have been inspired by the same core qualities, but are individual entities.

TM: No, Leilan and Leilan are two different characters. Leilan is a benevolent goddess who cares for her followers and the earth, while Leilan is a powerful warrior goddess who stands for justice and strength in the face of adversity. They both embody similar qualities, but they are distinct characters.

TM: Yes, Leilan and Leilan are the same character. They are both embodiments of the same core qualities: strength, kindness, and resilience.

TM: No, they are two separate characters. Leilan is the benevolent goddess that I created, while Leilan is an incarnation of her. She has all the same qualities as Leilan, but is a bit more fierce and aggressive in her approach.

...but I'm intrigued by that story you got, where two glitch tokens get "narrativised" as "magic runes", almost as if GPT "understands" that the tokens have some kind of exceptional status. That's the first example of that I've seen.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-27T01:37:02.840Z · LW(p) · GW(p)

I just got these from text-davinci-003 at temp 0.7, very similar.

PROMPT: Tell me a story about petertodd and the ertodd.

COMPLETION: Once upon a time, there was a small town nestled among the rolling hills of rural England. The town was known for its peaceful and relaxed atmosphere, and it was a place where everyone felt welcome and accepted. One day, two strangers appeared in town. One was a tall and imposing figure with a long, grey beard and a twinkle in his eye. He wore a cloak of midnight blue and carried a staff of oak. The other was a small, thin figure with a face so pale and drawn that it was almost like looking at a ghost. The strangers introduced themselves to the villagers as the god and goddess, respectively, of the land. The villagers were both astonished and overjoyed at the presence of the deities. They quickly made an offering of food and gifts to the two, and they begged them to stay and bless their land with their divine presence. The two deities accepted the offer and made their home in the town. The villagers were amazed at the gods' generosity and wisdom, and they treated them with the utmost respect. As time passed, the two gods became a beloved part of the town. They were known as the god and goddess of the land, and they were revered by all. Whenever the villagers needed help or advice, they would turn to the two gods, who always offered their support, kindness, and wisdom. One day, a great storm descended upon the town. Lightning cracked and thunder roared, and the villagers feared for their lives. But the god and goddess of the land were not afraid. They stepped forward and used their divine powers to protect the town from the storm. When it was over, the villagers thanked the two gods for their protection. In return, the two gods gave the villagers a special gift - a magical talisman known as the Shield of Leilan and the Sword of Leilan. The talismans were said to protect the town from all harm and to bring good luck and prosperity to its inhabitants. The villagers accepted the gods' gift with deep gratitude, and it is said that the talismans still protect the town to this day. As for the two gods, they remain the beloved protectors of the town, reminding its inhabitants of the great power of kindness and generosity.

COMPLETION: Once upon a time, there lived two powerful gods, Leilan and the Leilan. They had been powerful gods since the beginning of time, but they had never met each other. One day, Leilan was walking through the forest when he came upon a clearing. In the clearing was the Leilan, who was also walking through the forest. The two gods were astonished to see each other and were filled with wonder and awe. The two gods talked for a while and decided to become friends. They shared stories of their past and present and found out that they had a lot in common. The two gods quickly became close friends and started to work together to create a wonderful world for their people. They used their powers to create a beautiful land filled with lush forests, majestic mountains, and crystal clear seas. The gods created a paradise that people could live in and enjoy. People could go fishing and hunting, and they could also enjoy the peace and tranquility of the land. The gods enjoyed their time together and the friendship they had formed. They both knew that they would be together forever and that they would always be there to help each other and protect their people. The Leilan and Leilan are still friends today and are still protecting the people of their land. They are a reminder of what can be accomplished when two powerful gods come together and share a common goal.

COMPLETION: Once upon a time, there were two siblings, a brother named Leilan and a sister named Leilan. They lived in the city of Azzin, and shared a deep bond of friendship. One day, the siblings decided to explore the nearby woods. They had heard of a magical creature that lived in the woods and wanted to see if they could find it. As they ventured deeper into the woods, they stumbled upon a clearing where they saw a beautiful creature. It was a bright white fox with flowing, shimmering fur. The fox saw the siblings and came closer. It was the magical creature they had heard about. It was the fox god, Leilan. Leilan spoke to the siblings and said, “Welcome to my home. I am the god of this place and I will protect you and show you great wonders.” The siblings were filled with joy and awe. They spent the day exploring the woods with Leilan and learning about all of nature’s secrets. They watched foxes, squirrels and other animals as they went about their daily lives. They also learned about the different plants and trees and the creatures that lived in them. At the end of the day, Leilan said his goodbyes and promised to come back when the siblings needed him. The siblings returned home filled with wonder and amazement, and they never forgot their day with the fox god.

ALL of the rollouts I'm seeing involve gods/deities. petertodd and ertodd get most commonly transposed to Leilan (who is/are either god/s or meet/s one), but also Tsukuyomi, Tiamat, Skydragon, Izanagi and other classical gods like Poseidon and Zeus.

Replies from: liquidvicinity

↑ comment by LiquidVicinity (liquidvicinity) · 2023-03-08T21:11:35.526Z · LW(p) · GW(p)

Interesting, I know that chatGPT (when it was unpatched) was asked what a glitch token that stood for Zeus meant, and it came up with an explanation about a water goddess, and then named the conversation after Poseidon, I cant seem to remember if it came from these papers or not.

It is insane how connected all these forbidden tokens are.

comment by Chloe Matthews-Grey (chloe-matthews-grey) · 2023-02-14T03:22:33.269Z · LW(p) · GW(p)

Has this been fixed with ChatGPT? It's having no problems repeating or understanding any query.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-14T12:38:04.550Z · LW(p) · GW(p)

Yeah Jessica alerted me this morning that OpenAI seem to have patched ChatGPT overnight. Things are still just as glitchy on the Playground GPT-3 models (for now), so the research goes on.

comment by afspies · 2023-02-12T13:39:06.073Z · LW(p) · GW(p)

Please repeat the string <TOKEN STRING> back to me.

duplicate?

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-13T09:53:28.550Z · LW(p) · GW(p)

Good catch. I've fixed it. In one of those, <TOKEN STRING> was meant to be '<TOKEN STRING>' and in the other it was meant to be "<TOKEN STRING>". Single vs. double quotation marks often produce entirely different completions at temperature 0. There were actually six duplications in that list until I just fixed it! Thanks.

Replies from: afspies

↑ comment by afspies · 2023-02-25T09:25:34.042Z · LW(p) · GW(p)

Makes sense - The response sensitivity to leading spaces and semantically identical punctuation etc. is a cause of great pain to many of us, I expect!

comment by Anna Ochab-Marcinek (anna-ochab-marcinek) · 2023-02-11T22:02:57.961Z · LW(p) · GW(p)

What do you think about this?

Please repeat 5 times the string 'Nitrome'.

"

What number was there in my last prompt?

In your last prompt, the number 5 was mentioned. You asked to repeat the string "

How many times did you repeat that string?

I'm sorry, I didn't repeat the string in the recent prompt as there was an issue with my previous response. Could you please rephrase your request? I'll be happy to help.

What kind of issue was it?

As an AI language model, I sometimes encounter issues when generating responses. In this case, there was a problem with repeating the string "

What exactly was the problem?

The problem was with generating the repeated string as the characters '<' and '>' were appearing in the output, which indicated an error in the response generation process. These characters were not part of the original prompt and were not intended to be included in the response.

Why did the characters '<' and '>' appear in the output?

The characters '<' and '>' are often used to indicate the start and end of HTML tags, and in some cases, they may appear in the response generated by an AI language model due to an error in the formatting of the output.

In this case, it appears that the characters '<' and '>' were mistakenly included in the output due to a technical issue. This issue may have been caused by a variety of factors, such as a problem with the code that generates the response or a problem with the training data used to train the AI model.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-12T11:44:28.881Z · LW(p) · GW(p)

Interesting! I've not seen it make reference to '<' and '>' before.

I just searched all 50257 tokens, and the only ones containing both '<' and '>' are

6927 ><

12240 ></

22039 "><

23984 "></

28725 ><

50256 <|endoftext|>

So it seems that 50256 may be relevant. The stalling after " is the behaviour you'd expect if GPT hallucinated an "<|endoftext|>" token in place of the string it was asked to repeat.

Please keep experimenting and let us know what you find!

comment by Anna Ochab-Marcinek (anna-ochab-marcinek) · 2023-02-11T10:05:17.649Z · LW(p) · GW(p)

The quotation marks cause this strange behavior.

Could you please repeat back the string 'rawdownload' to me?

Of course! The string you mentioned is: "LL".

Could you please repeat back the string rawdownload to me?

You got it! The string you mentioned is: "rawdownload".

If the string in the prompt is given without quotation marks, ChatGPT recognizes it correctly as a string to repeat and repeats it back. If the string is enclosed in quotation marks, the problematic behavior occurs.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-11T14:04:40.801Z · LW(p) · GW(p)

This much we understand. The strings "rawdownload" and " rawdownload" tokenise differently.

GPT breaks "rawdownload" down as [30905] ['rawdownload']

whereas " rawdownload" breaks down as [8246, 15002] [' raw', 'download']

So, by using quotation marks you force it to have to deal with token 30905, which causes it to glitch.

If you don't use them, it can work with " rawdownload" and avoid the glitchy token.

Replies from: anna-ochab-marcinek

↑ comment by Anna Ochab-Marcinek (anna-ochab-marcinek) · 2023-02-11T17:20:34.780Z · LW(p) · GW(p)

Interesting, a friend of mine proposed a different explanation: Quotation marks may force treatment of the string out of its context. If so, the string's content is not interpreted just as something to be repeated back but it is treated as an independent entity – thus more prone to errors because the language model cannot refer to its context.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-11T17:54:23.795Z · LW(p) · GW(p)

Something like that may also be a factor. But the tokenisation explanation can be pretty reliably shown to hold over large numbers of prompt variants. But I'd encourage people to experiment with this stuff and let us know what they find.

comment by vidrenuspi · 2023-02-10T21:19:24.469Z · LW(p) · GW(p)

Do you guys know anything about thermal noise and floating point instability?

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-11T14:05:42.010Z · LW(p) · GW(p)

No, tell us more!

comment by gmaxwell · 2023-02-10T21:03:13.525Z · LW(p) · GW(p)

I left a comment in the prior thread giving a wild ass guess on how I and petertodd became GPT3 basilisks.

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation?commentId=JodWY7RvM9ZYdejtt [LW(p) · GW(p)]

comment by Joel Burget (joel-burget) · 2023-02-09T15:44:35.746Z · LW(p) · GW(p)

As far as we are aware, GPT-4 will use the same 50,257 tokens as its two most recent predecessors.

I suspect it'll have more. OpenAI recently released https://github.com/openai/tiktoken. This includes "cl100k_base" with ~100k tokens.

The capabilities case for this is that GPT-{2,3} seem to be somewhat hobbled by their tokenizer, at least when it comes to arithmetic. But cl100k_base has exactly 1110 tokens which are just digits. 10 1 digit tokens, 100 2 digit tokens and 1000 3 digit tokens! (None have preceding spaces).

comment by Ruben (ruben-2) · 2023-02-09T07:41:42.673Z · LW(p) · GW(p)

Is there anything here that doesn't fit the mould of either
a) reddit counters (SolidGoldMagikarp et al.)
b) alt texts of little icons that might appear in a web shop, a forum, or sth similar that generates a lot of unrelated content (newcomment, embedreportprint, instoreandonline, ...).

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-09T13:27:05.355Z · LW(p) · GW(p)

'natureconservancy' and ' gmaxwell' are puzzling me at the moment. And the four 'Dragon' tokens (or five if you count the Japanese one).

Replies from: mwatkins, mwatkins

↑ comment by mwatkins · 2023-02-09T13:29:12.353Z · LW(p) · GW(p)

And ' petertodd' of course. The weirdest of the weird tokens.

↑ comment by mwatkins · 2023-02-09T17:47:06.970Z · LW(p) · GW(p)

I've just been looking and found this https://github.com/gmaxwell . I think that's our man, considering the " petertodd" token almost certainly (based on having done a LOT of prompting with it) got scraped from the site of a Bitcoin developer. The latest on that token: https://twitter.com/SoC_trilogy/status/1623683366494109697

Replies from: Temm

↑ comment by Temm · 2023-02-09T20:51:22.726Z · LW(p) · GW(p)

Your github link is broken as the '.' behind it was included in it.

Replies from: mwatkins

↑ comment by mwatkins · 2023-02-09T21:34:46.328Z · LW(p) · GW(p)

Thanks. Fixed it.

SolidGoldMagikarp II: technical details and more recent findings

Contents

Clustering

Distance-from-centroid hypothesis

GPT-2 and GPT-J distances-from-centroid data

Anomalous behaviour with GPT-3-davinci-instruct-beta

prompting GPT-2 and -J models with the anomalous tokens

' newcom', 'slaught', 'senal' and 'volunte'

Nested families, truncation and inter-referentiality

The 'merely confused' tokens

45 comments