Comment by bmschmidt on The ‘ petertodd’ phenomenon · 2023-04-15T22:42:13.531Z · LW · GW
One theory I haven't seen in skimming some of the petertoddology out there:
- There is a fairly prominent GitHub user named petertodd associated with crypto, and the presence of this as a token in the tokenizer is almost certainly a result of him;
- Crypto people tend to have their usernames sitting alongside varied cryptographic hashes on the internet a lot;
- Cryptographic hashes are extremely weird things for a transformer, because unlike a person, a transformer can't just skim past the block of text; instead it sits there furiously trying to predict the next token over and over again, filling up its context window one 4e and 6f at a time.
So some of the weird sinkhole features of this token could result from a machine that tries to reduce entropy on token sequences, encountering a token that tends to live in strings of extremely high entropy.
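To make the "extremely high entropy" point concrete, here is a rough sketch that uses zlib compression as a crude stand-in for predictability. This is my own illustration, not anything measured on an actual language model: the sample sentence, the counter-based SHA-256 inputs, and the helper name compressed_ratio are all arbitrary choices, and a compressor is only a loose proxy for what a transformer can learn.

```python
import hashlib
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more predictable)."""
    return len(zlib.compress(data, 9)) / len(data)


# Ordinary prose: an arbitrary sample sentence paraphrasing the argument above.
prose = (
    "Some of the weird sinkhole features of this token could result from "
    "a machine that tries to reduce entropy on token sequences, encountering "
    "a token that tends to live in strings of extremely high entropy. "
    "A person can skim past a block of hash text, but a transformer has to "
    "keep trying to predict the next token over and over again."
).encode("ascii")

# Hash material: SHA-256 digests of an arbitrary counter (illustrative inputs).
raw_digests = b"".join(
    hashlib.sha256(str(i).encode("ascii")).digest() for i in range(64)
)
# The same digests as they usually appear on web pages: hex text.
hex_digests = raw_digests.hex().encode("ascii")

prose_ratio = compressed_ratio(prose)
raw_ratio = compressed_ratio(raw_digests)
# Each hex character carries 4 bits of digest; see how close zlib gets to that.
hex_bits_per_char = 8 * len(zlib.compress(hex_digests, 9)) / len(hex_digests)

print(f"prose compressed/original:       {prose_ratio:.2f}")
print(f"raw digests compressed/original: {raw_ratio:.2f}")
print(f"hex digests, bits per character: {hex_bits_per_char:.2f}")
```

The prose shrinks, while the raw digests do not shrink at all and the hex form stays at roughly the 4 bits per character that the digest itself carries. In other words, the only thing a predictor can exploit in a hash is the alphabet; there is no further structure to latch onto.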