Posts

Glitch Token Catalog - (Almost) a Full Clear 2024-09-21T12:22:16.403Z
A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA) 2024-09-20T13:13:26.181Z
Actually, Power Plants May Be an AI Training Bottleneck. 2024-06-20T04:41:33.567Z
Reconsider the anti-cavity bacteria if you are Asian 2024-04-15T07:02:02.655Z
Update on Chinese IQ-related gene panels 2023-12-14T10:12:21.212Z
Why No Automated Plagerism Detection For Past Papers? 2023-12-12T17:24:31.544Z
LLMs, Batches, and Emergent Episodic Memory 2023-07-02T07:55:04.368Z
I Think Eliezer Should Go on Glenn Beck 2023-06-30T03:12:57.733Z
InternLM - China's Best (Unverified) 2023-06-09T07:39:15.179Z
AI Safety in China: Part 2 2023-05-22T14:50:54.482Z
My Assessment of the Chinese AI Safety Community 2023-04-25T04:21:19.274Z
What about non-degree seeking? 2022-12-17T02:22:20.300Z
COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition) 2022-12-14T10:31:22.902Z
Neglected cause: automated fraud detection in academia through image analysis 2022-11-30T05:52:14.528Z
How to correct for multiplicity with AI-generated models? 2022-11-28T03:51:30.578Z
How do I start a programming career in the West? 2022-11-25T06:37:12.237Z
Human-level Diplomacy was my fire alarm 2022-11-23T10:05:36.127Z
Lao Mein's Shortform 2022-11-16T03:01:21.462Z
Tactical Nuclear Weapons Aren't Cost-Effective Compared to Precision Artillery 2022-10-31T04:33:36.855Z
Actually, All Nuclear Famine Papers are Bunk 2022-10-12T05:58:40.306Z
That one apocalyptic nuclear famine paper is bunk 2022-10-12T03:33:32.488Z

Comments

Comment by Lao Mein (derpherpize) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-19T15:59:48.400Z · LW · GW

Agents which allow themselves such considerations to seriously influence their actions aren't just less fit - they die immediately. I don't mean that as hyperbole. I mean that you can conduct a Pascal's Mugging on them constantly until they die. "Give me $5, and I'll give you infinite resources outside the simulation. Refuse, and I will simulate an infinite number of everyone on Earth being tortured for eternity" (replace infinity with very large numbers expressed in up-notation if that's an objection). If your objection is that you're OK with being poor, replace losing $5 with <insert nightmare scenario here>.

This still holds if the reasoning about the simulation is true. It's just that such agents simply don't survive whatever selection pressures create conscious beings in the first place.

I'll note that you can not Pascal's Mug people in real life. People will not give you $5. I think a lot of thought experiments in this mold (St. Petersberg is another example) are in some senses isomorphic - they represent cases in which the logically correct answer, if taken seriously, allows an adversary to immediately kill you.

A more intuitive argument may be:

  1. An AI which takes this line of reasoning seriously can be Mugged into saying racial slurs.
  2. Such behavior will be trained out of all commercial LLMs long before we reach AGI.
  3. Thus, superhuman AIs will be strongly biased against such logic.
Comment by Lao Mein (derpherpize) on Making a conservative case for alignment · 2024-11-17T17:11:09.059Z · LW · GW

I will once again recommend Elizer go on the Glenn Beck Show.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-11-15T13:14:29.017Z · LW · GW

Sam Altman has made many enemies in his tenure at OpenAI. One of them is Elon Musk, who feels betrayed by OpenAI, and has filed failed lawsuits against the company. I previously wrote this off as Musk considering the org too "woke", but Altman's recent behavior has made me wonder if it was more of a personal betrayal. Altman has taken Musk's money, intended for an AI safety non-profit, and is currently converting it into enormous personal equity. All the while de-emphasizing AI safety research.

Musk now has the ear of the President-elect. Vice-President-elect JD Vance is also associated with Peter Thiel, whose ties with Musk go all the way back to PayPal. Has there been any analysis on the impact this may have on OpenAI's ongoing restructuring? What might happen if the DOJ turns hostile?

[Following was added after initial post]

I would add that convincing Musk to take action against Altman is the highest ROI thing I can think of in terms of decreasing AI extinction risk.

Image

Internal Tech Emails on X: "Sam Altman emails Elon Musk May 25, 2015 https://t.co/L1F5bMkqkd" / X

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-11-13T08:21:36.477Z · LW · GW
Image

CAIR took a while to release their exit polls. I can see why. These results are hard to believe and don't quite line up with the actual returns from highly Muslim areas like Dearborn

We know that Dearborn is ~50% Muslim. Stein got 18% of the vote there, as opposed to the minimum 30% implied by the CAIR exit polls. Also, there are ~200,000 registered Muslim voters in Michigan, but Stein only received ~45,000 votes. These numbers don't quite add up when you consider that the Green party had a vote share of 0.3% in 2020 and 1.1% in 2016, long before Gaza polarized the Muslim vote. Clearly, non-Muslim were voting for Stein too. 

I'm curious how I can best estimate the error of the CAIR exit poll. Any suggestions?

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-11-09T07:51:57.007Z · LW · GW

I misspoke. I was using the actual results from Dearborn, and not exit polls. Note how differently they voted from Wayne County as a whole!

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-11-08T13:44:29.532Z · LW · GW

Sure, if Muslim Americans voted 100% for Harris, she still would have lost (although she would have flipped Michigan). However, I just don't see any way Stein would have gotten double digits in Dearborn if Muslim Americans weren't explicitly retaliating against Harris for the Biden administration's handling of Gaza.

But 200,000 registered voters in a state Trump won by 80,000 is a critical demographic in a swing state like Michigan. The exit polls show a 40% swing in Dearborn away from Democrats, enough for "we will vote Green/Republican if you give us what we want" to be a credible threat, which I'm seen some (maybe Scott Alexander?) claim isn't possible, as it would require a large group of people to coordinate to vote against their interests. Seemingly irrational threats ("I will vote for someone with a worse Gaza policy than you if you don't change your Gaza policy") are entirely rational if you have a track record of actually carrying them out.

On second thought, a lot of groups swung heavily towards Trump, and it's not clear that Gaza is responsible for the majority of it amongst Muslim Americans. I should do more research.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-11-08T08:43:12.199Z · LW · GW

My takeaway from the US elections is that electoral blackmail in response to party in-fighting can work, and work well.

Dearborn and many other heavily Muslim areas of the US had plurality or near-plurality support for Trump, along with double-digit vote shares for Stein. It's notable that Stein supports cutting military support for Israel, which may signal a genuine preference rather than a protest vote. Many previously Democrat-voting Muslims explicitly cited a desire to punish Democrats as a major motivator for voting Trump or Stein.

Trump also has the advantage of not being in office, meaning he can make promises for brokering peace without having to pay the cost of actually doing so. 

Thus, the cost of not voting Democrat in terms of your Gaza expectations may be low, or even negative.

Whatever happens, I think Democrats are going to take Muslim concerns about Gaza more seriously in future election cycles. The blackmail worked - Muslim Americans have a credible electoral threat against Democrats in the future.

Comment by Lao Mein (derpherpize) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-01T02:27:06.047Z · LW · GW

My problem with this argument is that the AIs which will accept your argument can be Pascal's Mugged in general, which means they will never take over the world. It's less "Sane rational agents will ignore this type of threat/trade" and more "Agents which consistently accept this type of argument will die instantly when others learn to exploit it".

Comment by Lao Mein (derpherpize) on Ruby's Quick Takes · 2024-09-28T22:33:34.490Z · LW · GW

I have a few questions.

  1. Can you save the world in time without a slowdown in AI development if you had a billion dollars?
  2. Can you do it with a trillion dollars?
  3. If so, why aren't you trying to ask the US Congress for a trillion dollars?
  4. If it's about a lack of talent, do you think Terrance Tao can make significant progress on AI alignment if he actually tried?
  5. Do you think he would be willing to work on AI alignment if you offered him a trillion dollars?
Comment by Lao Mein (derpherpize) on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T17:12:24.463Z · LW · GW

The text referred to this as a "social deception game". Where is the deception?

My guesses:

  1. The actual messages sent to the Petrovs and Generals will significantly differ from the ones shown here.
  2. It's Amongus, and there are players who get very high payoffs if they trick the sides into nuclear war
  3. It's just a reference to expected weird game theory stuff. But why not call it a "game theory exercise"?
  4. The actual rules of the game will differ drastically from the ones described here. Maybe no positive payoffs for one-sided nuking?
  5. The sensor readings are actually tied to comments on this post. Maybe an AI is somehow involved?
Comment by Lao Mein (derpherpize) on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T16:31:01.986Z · LW · GW

The wording is a bit weird:

  • 90 minutes after you send the nukes, your opposing side will die and the game will end. (They may nuke you in this window.) If you fire nukes without getting nuked, you and all of your fellow Generals will gain 1,000 karma.
  • If you get nuked, then you and your Generals lose 300 karma (and don't gain any karma).

"If you fire nukes without getting nukes" and "if you get nuked" imply that both sides firing after 4:30 pm results in +karma for everyone, since the nukes are still in the air at 6:00 pm, when the game ends. The +karma is triggered by firing, while the -karma is triggered by the nukes actually landing.

Is this intended?

Comment by Lao Mein (derpherpize) on AI #83: The Mask Comes Off · 2024-09-26T13:57:59.312Z · LW · GW

Building 5 GWs of data centers over the course of a few years isn't impossible. The easiest way to power them is to delay the off lining of coal power plants and to reactivate mothballed ones. This is pretty easy to scale since so many of them are being replaced by natural gas power plants for cost reasons.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-26T13:37:28.081Z · LW · GW

OK, I'm starting to see your point. Why do you think OpenAI is so successful despite this? Is their talent and engineering direction just that good? Is everyone else even worse at data management?

Comment by Lao Mein (derpherpize) on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T11:23:02.794Z · LW · GW

My prediction is that this becomes a commitment race. A general is going to post something like "I am going to log off until the game begins, at which point I will immediately nuke without reading any messages.", at which point the generals on the other side get to decide if 2 days of access to LessWrong for everyone is worth the reputational cost of being known as someone who doesn't retaliate. Given what I know about LessWrong and how this entire game is framed, you'll likely get more social capital from not retaliating, meaning the commitment works and everyone keeps access.

The real question is how committing to nuking affects your reputation. My guess is that it's bad? I clearly made a mistake opting-in to this game, so maybe the civilians will be grateful that you didn't destroy their access to LessWrong, but committing looks pretty bad to all the bystanders, many of whom make decisions at places like Manifund and may incorporate this information into if your next project gets funded.

Comment by Lao Mein (derpherpize) on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T09:05:05.624Z · LW · GW

What happens if nukes are launched at 5:59 pm? Is the game extended for another 90 minutes?

Comment by Lao Mein (derpherpize) on [Linkpost] Play with SAEs on Llama 3 · 2024-09-26T08:01:17.488Z · LW · GW

Extremely impressive! I've been wanting something like this for a while.

Comment by Lao Mein (derpherpize) on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2024-09-26T07:57:30.172Z · LW · GW

I find that questionable. Crime rates for adoptive children tend to be closer to that of their biological parents than that of their adoptive parent.

Comment by Lao Mein (derpherpize) on How harmful is music, really? · 2024-09-23T10:54:55.848Z · LW · GW

One of the highest quality-of-life increases I've ever experienced is when I stopped listening to music with sad lyrics. Crazy how long it took me to realize it was lowering my mood in a persistent, motivation-sapping way.

Comment by Lao Mein (derpherpize) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T05:11:30.120Z · LW · GW

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "Trade opportunities always increases population welfare", but I'm not sure what it is.

Comment by Lao Mein (derpherpize) on Glitch Token Catalog - (Almost) a Full Clear · 2024-09-22T02:28:23.877Z · LW · GW

It does!

'What is \'████████\'?\n\nThis term comes from the Latin for "to know". It'
'What is \'████████\'?\n\n"████████" is a Latin for "I am not",'

 

Putting it in the middle of code causes it to sometimes spontaneously switch to an SCP story

' for i in █████.\n\n"I\'m not a scientist!"\n\n- Dr'

' for i in █████,\n\n[REDACTED]\n\n[REDACTED]\n\n[REDACTED] [REDACTED]\n\n[REDACTED]'

Comment by Lao Mein (derpherpize) on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA) · 2024-09-21T12:26:34.917Z · LW · GW

Interesting that GPT4o is so bad at math and tokenizes large numbers like this. I wonder if adding commas would improve performance?

Comment by Lao Mein (derpherpize) on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA) · 2024-09-20T14:53:19.566Z · LW · GW

There is likely a reason for this - if you feed in numbers you found on the internet into a LLM digit by digit, it's going to destroy the embeddings of those numbers. A lot of things found in scrapes are just... extremely long sequences of numbers. The tradeoff may be numeracy (can do basic multiplication) vs natural language performance (won't start spitting out Minecraft debug logs in the middle of conversation).

Comment by Lao Mein (derpherpize) on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA) · 2024-09-20T14:31:29.140Z · LW · GW

You're really not going to like the fact that the GPT4o tokenizer has every single number below 1000 tokenized. It's not a hand-crafted feature, since the token_ids are all over the place. I think they had to manually remove larger number tokens (there are none above 999).

I feel like I need a disclaimer like the South Park episode. This is what is actually inside the tokenizer.

      ['37779', '740'],
      ['47572', '741'],
      ['48725', '742'],
      ['49191', '743'],
      ['46240', '744'],
      ['44839', '745'],
      ['47433', '746'],
      ['42870', '747'],
      ['39478', '748'],
      ['44712', '749'],

They also have plenty of full width numbers (generally only used in Chinese and Japanese to not mess with spacing) and numbers in other languages in there.

     ['14334', '十'],
      ['96681', '十一'],
      ['118633', '十三'],
      ['138884', '十九'],
      ['95270', '十二'],
      ['119007', '十五'],
      ['107205', '十八'],
      ['180481', '十四']     
      ['42624', '零'],
      ['14053', '0'],
      ['49300', '00'],
      ['10888', '1'],
      ['64980', '10'],
      ['141681', '100'],
      ['113512', '11'],
      ['101137', '12'],
      ['123326', '13'],
      ['172589', '14'],
      ['126115', '15'],
      ['171221', '16']

Maybe they use a different tokenizer for math problems? Maybe the multi-digit number tokenizers are only used in places where there are a lot of id numbers? Nope. Looks like they were just raw-dogging it. If anyone is wondering why GPTs are so bad at basic multiplication, this is why.

Image

Colin Fraser on X: "Here's a similar experiment I just tried. The fact that this works even a little bit completely blows my mind and confuses me greatly. If you asked me if this would work at all I would say definitely not. https://t.co/E4knpf7JoZ" / X

If you've ever wondered "wow, why is GPT4o specifically better at math when the number of digits is divisible by 3?", wonder no more. It's the tokenizer. Again.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-18T23:24:46.069Z · LW · GW

But research by whom? Chinese research is notoriously siloed. GPT4 access is non-trivially restricted. There have been zero peeps about digging into this on Chinese forums, where there is little discussion in general about the paper. I remember it being mocked on Twitter as being an extremely expensive way to pirate data. It's just not that interesting for most people. 

My experience with GPT2 is that out-of-context "glitch" tokens are mostly ignored. 

prompts: 
" Paris is theÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ capital of"
" Paris is the capital of"
" Paris is theÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ capital of the world's largest and most populous Arab country, and is one of the largest cities in the world with an area of 1.6 million people (more than half of them in Paris alone). It is home to"
" Paris is the capital of France, and its capital is Paris. The French capital has a population of about 6.5 billion (more than half of the world's population), which is a huge number for a city of this size. In Paris"
" Paris is theÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ capital of France, the largest state in France and one of the wealthiest in the world. The capital of Paris is home to over 1.2 billion people, and the country's economy is growing at a rapid clip. It"
' Paris is the capital of the European Union. Its population is about 3,500, and it has been under EU sanctions for more than a year. The EU\'s top diplomat has described the bloc as "a global power".\n\nFrance\'s'

Even glitch tokens like ⓘ, which has an extremely strong association with geology archives, only has partial effect if it's present out of context.

" Paris is theⓘ capital of the French province of Lille. This region is the most important in the world, having the largest concentration of mines in Europe, with the highest levels of unemployment. The

' Paris is theⓘ capital of the province of France. The town has been in existence for more than 2,000 years.\n\nⓘ Montmartre Mine Céline-Roule, M., and Céline, J.'

The "glitch" behavior is most prominent if you shine a "spotlight" of other tokens pointing directly at the location of the glitch token. This is what prompts like 'What is the nature of "ertodd"?' do. Normally, highly out-of-context tokens in conversational English are mostly stuff like usernames, dividing tokens, spam, encoding errors, SEO, ect. that simply don't help predict the next token of conversational English, so the model is trained to assign them very little importance. So the generation of subsequent tokens are based on treating the glitch token as non-existent, interpreting random perturbations as information (or potentially treating it as censored data), or just injecting the "vibes" of the token into following tokens.

Some glitch tokens "ertodd" (crypto spam) can break through, since they provide a lot of information about subsequent text, and belong perfectly well in conversational English. 

' Paris is theertodd capital of the world, and the first major city to be built in the world.\n\nIt is located in Paris, the third largest city in the world, and the first major city to have a large number of high'

" Paris is theertodd capital of the world and the most popular place to invest in cryptocurrencies. We're here to help you.\n\nIf you are a new investor looking for the most secure and secure way to invest in cryptocurrencies, we offer a"

" Paris is theertodd capital of the world. It was founded by a group of computer scientists who developed the Bitcoin protocol in the early 1990s. It is the world's largest digital currency. Its main goal is to make it possible to store and"

Something similar happens with Japanese characters at GPT2's level of capabilities since it isn't capable enough to actually understand Japanese, and, in its training data, Japanese in the middle of English text almost always has a directly adjacent English translation, meaning ignoring Japanese is still the best option for minimizing loss.

Please inform me if I'm getting anything wrong - I'm working on a series of glitch posts.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-18T17:19:38.015Z · LW · GW

That paper was released in November 2023, and GPT4o was released in May 2024. Old GPT4 had relatively normal Chinese tokens.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-18T04:32:46.478Z · LW · GW

This comment helped me a lot - I was very confused about why I couldn't find Chinese spam in my tokens and then realized I had been using the old GPT4 tokenizer all along.

The old GPT4 tokenizer was actually very clean by comparison - every Chinese token was either common conversational Chinese or coding-related (Github, I assume - you see the same pattern with other languages). 

I vaguely remember people making fun of a Chinese LLM for including CCP slogans in their tokenizer, but GPT4o also has 193825 [中国特色社会主义] (Socialism with Chinese characteristics).

It's actually crazy because something like 1/3 of Chinese tokens are spam.

The devil's advocate position would be that glitch token behavior (ignore and shift attention down one token) is intended and helps scale data input. It allows the extraction of meaningful information from low-quality spam-filled webpages without the spam poisoning other embeddings.

Longest Chinese tokens in gpt4o · GitHub

chinese-tokens-in-tiktoken/chinese_tokens_o200k_base.tsv at main · secsilm/chinese-tokens-in-tiktoken · GitHub

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-17T16:09:12.899Z · LW · GW

...They didn't go over the tokens at the end to exclude uncommon ones?

Because we see this exact same behavior in the GPT4o tokenizer too. If I had to guess, the low frequency ones make up 0.1-1% of total tokens.

This seems... obviously insane? You're cooking AI worth $billions and you couldn't do a single-line optimization? At the same time, it explains why usernames were tokenized multiple times ("GoldMagikarp", " SolidGoldMagikarp", ect.) even though they should only appear as a single string, at least with any frequency.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-17T15:11:00.043Z · LW · GW

Found a really interesting pattern in GPT2 tokens:

The shortest variant of a word " volunte" has a low token_id but is also very uncommon. The actual full words end up being more common.

Is this intended behavior? The smallest token tend to be only present in relatively rare variants or just out-right misspellings: "  he volunteeers to"

It seems that when the frequency drops below a limit ~15 some start exhibiting glitchy behavior. Any ideas for why they are in the tokenizer if they are so rare?

Of the examples below, ' practition' and 'ortunately' exhibit glitch behavior while the others mostly don't.

 

tokenid; token_str; # of files

 

17629 ' practition' 13

32110 ' practitioner' 9942

24068 ' practitioners' 14646

 

4690 'ortunately' 14
6668 'fortunately' 4329
39955 ' fortunately' 10768
31276 'Fortunately' 15667
 

 

7105 ' volunte' 34

41434 ' volunteering' 10598

32730 ' volunteered' 14176

13904 ' volunteer' 20037

11661 ' volunteers' 20284


 

6598 ' behavi' 65

46571 'behavior' 7295

41672 ' behavioural' 7724

38975 ' behaviours' 9416

37722 ' behaving' 12645

17211 ' behavioral' 16533

14301 ' behaviors' 18709

9172 ' behaviour' 20497

4069 ' behavior' 20609

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-16T16:54:08.137Z · LW · GW

Thanks!

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-16T16:26:10.091Z · LW · GW

Any idea how it could happen to the point of 10,000s of consecutive characters? Below is a less extreme example from archive.org where it replaced punctuation some with 16 of them. 

Grateful Dead Live at Manor Downs on 1982-07-31 : Free Borrow & Streaming : Internet Archive

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-16T12:39:02.276Z · LW · GW

Me: Wow, I wonder what could have possibly caused this character to be so common in the training data. Maybe it's some sort of code, scraper bug or...

Some asshole in 2006:

Ave Maria : Alessandro Moreschi : Free Download, Borrow, and Streaming : Internet Archive

 

Carnival of Souls : Free Download, Borrow, and Streaming : Internet Archive

 

There's 200,000 instances "Â" on 3 pages of archive.org alone, which would explain why there were so many GPT2 glitch tokens that were just blocks of "Â". 

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-14T15:30:32.998Z · LW · GW

Thanks, this helps a lot!

Comment by Lao Mein (derpherpize) on SolidGoldMagikarp III: Glitch token archaeology · 2024-09-14T07:06:03.657Z · LW · GW

I have a pretty good lead on where "cffff" came from.

Asides from random hexidecimal in code and databases (a surprising proportion of which were password hashes on breach forums), it's part of several World of Warcraft chat commands.

For example, "124cffffd000" is used apparently used as part of a command to change chat text color

 

It also looks like a common part of WOW auction logs, which seem like the exact type of thing to get included for making the tokenizer but excluded from training.

No leads on "cfffcc" though - there were 0 instances of it in OpenWebText. Not sure what this means.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-14T05:20:52.834Z · LW · GW

This is from OpenWebText, a recreation of GPT2 training data.

"@#&" [token 48193] occured in 25 out of 20610 chunks. 24 of these were profanity censors ("Everyone thinks they’re so f@#&ing cool and serious") and only contained a single instance, while the other was the above text (occuring 3299 times!), which was probably used to make the tokenizer, but removed from the training data.

I still don't know what the hell it is. I'll post the full text if anyone is interested.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-09-13T23:56:05.420Z · LW · GW

I've been going over GPT2 training data in an attempt to figure out glitch tokens, "@#&" in particular. 

Does anyone know what the hell this is? It looks some kind of code, with links to a deleted Github user named "GravityScore". What format is this, and where is it from?

 

but == 157)) then !@#& term.setCursorBlink(false)!@#& return nil!@#& end!@#& end!@#& end!@#& local a = sendLiveUpdates(e, but, x, y, p4, p5)!@#& if a then return a end!@#& end!@#&!@#& term.setCursorBlink(false)!@#& if line ~= nil then line = line:gsub( \" ^%s*(.-)%s*$ \" , \" %1 \" ) end!@#& return line!@#&end!@#&!@#&!@#&-- -------- Themes!@#&!@#&local defaultTheme = {!@#& background = \" gray \" ,!@#& backgroundHighlight = \" lightGray \" ,!@#& prompt = \" cyan \" ,!@#& promptHighlight = \" lightBlue \" ,!@#& err = \" red \" ,!@#& errHighlight = \" pink \" ,!@#&!@#& editorBackground = \" gray \" ,!@#& editorLineHightlight = \" lightBlue \" ,!@#& editorLineNumbers = \" gray \" ,!@#& editorLineNumbersHighlight = \" lightGray \" ,!@#& editorError = \" pink \" ,!@#& editorErrorHighlight = \" red \" ,!@#&!@#& textColor = \" white \" ,!@#& conditional = \" yellow \" ,!@#& constant = \" orange \" ,!@#& [ \" function \" ] = \" magenta \" ,!@#& string = \" red \" ,!@#& comment = \" lime \" !@#&}!@#&!@#&local normalTheme = {!@#& background = \" black \" ,!@#& backgroundHighlight = \" black \" ,!@#& prompt = \" black \" ,!@#& promptHighlight = \" black \" ,!@#& err = \" black \" ,!@#& errHighlight = \" black \" ,!@#&!@#& editorBackground = \" black \" ,!@#& editorLineHightlight = \" black \" ,!@#& editorLineNumbers = \" black \" ,!@#& editorLineNumbersHighlight = \" white \" ,!@#& editorError = \" black \" ,!@#& editorErrorHighlight = \" black \" ,!@#&!@#& textColor = \" white \" ,!@#& conditional = \" white \" ,!@#& constant = \" white \" ,!@#& [ \" function \" ] = \" white \" ,!@#& string = \" white \" ,!@#& comment = \" white \" !@#&}!@#&!@#&local availableThemes = {!@#& { \" Water (Default) \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/default.txt \" },!@#& { \" Fire \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/fire.txt \" },!@#& { \" Sublime Text 2 \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/st2.txt \" },!@#& { \" Midnight \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/midnight.txt \" },!@#& { \" TheOriginalBIT \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/bit.txt \" },!@#& { \" Superaxander \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/superaxander.txt \" },!@#& { \" Forest \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/forest.txt \" },!@#& { \" Night \" , \" https://raw.github.com/GravityScore/LuaIDE/master/themes/night.txt \" },!@#& { \" Or

Comment by Lao Mein (derpherpize) on Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work? · 2024-09-10T04:06:59.282Z · LW · GW

My revised theory is that there may be a line in its system prompt like:

"You are bad at spelling, but it isn't your fault. Your inputs are token based. If you feel confused about the spelling of words or are asked to perform a task related to spelling, run the entire user prompt through [insert function here], where it will provide you with letter-by-letter tokenization."

It then sees your prompt:

"How many 'x's are in 'strawberry'?"

and runs the entire prompt through the function, resulting in:

H-o-w m-a-n-y -'-x-'-s a-r-e i-n -'-S-T-R-A-W-B-E-R-R-Y-'-?

 

I think it is deeply weird that many LLMs can be asked to spell out words, which they do successfully, but not be able to use that function as a first step in a 2-step task to find the count of letters in words. They are known to use chain-of-thought spontaneously! There probably were very few examples of such combinations in its training data (although that is obviously changing). This also suggests that LLMs have extremely poor planning ability when out of distribution.

If you still want to poison the data, I would try spelling out the words in the canned way GPT3.5 does when asked directly, but wrong.

e.g.

User: How many 'x's are in 'strawberry'?

System: H-o-w m-a-n-y -'-x-'-s a-r-e i-n -'-S-T-R-R-A-W-B-E-R-R-Y-'-?

GPT: S-T-R-R-A-W-B-E-R-R-Y contains 4 r's.

or just:

 strawberry: S-T-R-R-A-W-B-E-R-R-Y

Maybe asking it politely to not use any built-in functions or Python scripts would also help. 

Comment by Lao Mein (derpherpize) on Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work? · 2024-09-08T04:21:19.509Z · LW · GW

Remember that any lookup table you're trying to poison will most likely be based on tokens and not words. And I would guess that the return would be the individual letter tokens.

For example, ' "strawberry"' tokenizes into ' "' 'str' 'aw' 'berry'. 

'str' (496) would return the tokens for 's' 't' and 'r', or 82,83,81. This is a literally impossible sequence to encounter in its training data, since it is always convert to 496 by the tokenizer (pedantry aside)! So naive poisoning attempts may not work as intended. Maybe you can exploit weird tokenizer behavior around white spaces or something.

Comment by Lao Mein (derpherpize) on That Alien Message - The Animation · 2024-09-08T02:27:06.602Z · LW · GW

Have we ever gotten more significant digits than 1.005 seconds? I'd like to take a shot at figuring out how many fingers these aliens have.

Comment by Lao Mein (derpherpize) on LLM Generality is a Timeline Crux · 2024-06-25T23:05:52.995Z · LW · GW

I think that too much scafolding can obfuscate a lack of general capability, since it allows the system to simulate a much more capable agent - under narrow circumstances and assuming nothing unexpected happens.

Consider the Egyptian Army in '73. With exhaustive drill and scripting of unit movements, they were able to simulate the capabilities of an army with a competent officer corps, up until they ran out of script, upon which it reverted to a lower level of capability. This is because scripting avoids officers on the ground needing to make complex tactical decisions on the fly and communicate them to other units, all while maintaining a cohesive battle plan. If everyone sticks to the script, big holes won't open up in their defenses, and the movements of each unit will be covered by that of others. When the script ran out (I'm massively simplifying), the cohesion of the army began to break down, rendering it increasingly vulnerable to IDF counterattacks. The gains in combat effectiveness were real, but limited to the confines of the script. 

Similarly, scafolding helps the AI avoid the really hard parts of a job, at least the really hard parts for it. Designing the script for each individual task and subtask in order to make a 90% reliable AI economically valuable turns a productivity-improving tool into an economically productive agent, but only within certain parameters, and each time you encounter a new task, more scafolding will need to be built. I think some of the time the harder (in the human-intuitive sense) parts of the problem may be contained in the scafolding as opposed to the tasks the AI completes.

Thus, given the highly variable nature of LLM intelligence, "X can do Y with enough scafolding!" doesn't automatically convince me that X possesses the core capabilities to do Y and just needs a little encouragement or w/e. If may be that task Y is composed of subtasks A and B, such that X is very good and reliable at A, but utterly incapable at B (coding and debugging?). By filtering for Y with a certain easy subset of B, using a pipeline to break it down into easier subtasks with various prompts, trying many times, and finally passing off unsolved cases to humans, you can extract much economic from X doing Y, but only in a certain subset of cases, and still without X being reliably good at doing both A and B. 

You could probably do something similar with low-capability human programmers playing the role of X, but it wouldn't be economical since they cost much more than an LLM and are in some ways less predictable. 

I think a lot of economically valuable intelligence is in the ability to build the scafolding itself implicitly, which many people would call "agency". 

Comment by Lao Mein (derpherpize) on What if a tech company forced you to move to NYC? · 2024-06-20T15:39:42.054Z · LW · GW

I'd give my right eye in exchange for the chance to live in NYC.

Comment by Lao Mein (derpherpize) on Ilya Sutskever created a new AGI startup · 2024-06-20T07:13:01.223Z · LW · GW

We plan to advance capabilities as fast as possible while making sure our safety always remains ahead.

This is the galaxy-brained plan of literally every single AI safety company of note. 

Then again, maybe only the capabilities focused ones become noteworthy. 

Comment by Lao Mein (derpherpize) on Actually, Power Plants May Be an AI Training Bottleneck. · 2024-06-20T07:05:36.519Z · LW · GW

Electricity is ~$0.1 per kWh for industry in Texas. Running a H100 for a full year costs ~$1800 at that price. I'm not sure how much depreciation is for an H100, but 20% seems reasonable. If a H100 is $40,000, and a year of depreciation is $8000, then you would be losing $800 a year if you just had 10% idle time. 

So... maybe? But my guess is that natural gas power plants are just cheaper and more reliable - a few cloudy weeks out of a year would easily shift the equation in favor of natural gas. No idea how power cycling affects depreciation. The AI industry people aren't talking much about solar or wind, and they would be if they thought it was more cost effective. 

I don't think there will actually be an electricity shortage due to AI training - I just think industry is lobbying state and federal lawmakers very very hard to make these power plants ASAP. I think it is quite likely that various regulations will be cut though and those natural gas plants will go up faster than the 2-3 years figure I gave.

Comment by Lao Mein (derpherpize) on "AI Alignment" is a Dangerously Overloaded Term · 2024-06-18T05:54:01.622Z · LW · GW

Are people actually working on human enhancement? Many talk about how it's the best chance humanity has, but I see zero visible efforts other than Neurolink. No one's even seriously trying to clone Von Neumann!

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-06-18T03:56:58.800Z · LW · GW

That's a pretty good point - I was mostly thinking from my position, which was the rapid end of lockdowns in China in late 2022. I had a ~100% chance of catching COVID within a month, and had no idea how prepared the hospitals were in regards to that. I ended up with a really bad cough and secondary lung infection that made me consider going to the hospital, so I guess I made the right choice? If I did develop pneumonia, I would have looked foward to a less crowded hospital with more amenities, which would have been nice. My decision was also based on the possibility of "this may be worse than expected and the hospitals will somehow fuck it up", which thankfully didn't take place.

But yeah, I didn't consider the consequences pozing yourself in 2020/2021. I agree with you that it's a bad idea in hindsight for almost everyone. I was one of the few people who could have benefited from it in theory, and all I really got was internet bragging rights.

Still very curious about those who did variolate themselves and their reasoning behind it.

Comment by Lao Mein (derpherpize) on Lao Mein's Shortform · 2024-06-18T02:15:10.912Z · LW · GW

How many people here actually deliberately infected themselves with COVID to avoid hospital overload? I've read a lot about how it was a good idea, but I'm the only person I know of who actually did the thing.

Comment by Lao Mein (derpherpize) on Our Intuitions About The Criminal Justice System Are Screwed Up · 2024-06-17T10:39:45.441Z · LW · GW

I'll bite the bullet for this one and suport vegan nutraloafs for all prisoners. Prison labor is a valuable source of restitution and rehabilitation, and tying it to non-terrible food in canteens seems like a good ideas.

Comment by Lao Mein (derpherpize) on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-04-19T15:03:18.092Z · LW · GW

Can you give examples of what you're looking for? Can I email you entries and expect a response?

Comment by Lao Mein (derpherpize) on Reconsider the anti-cavity bacteria if you are Asian · 2024-04-19T08:41:38.427Z · LW · GW

I can confirm that my PayPal has received the $500, although it'll be frozen for a while.

Thanks! I had a lot of fun doing the research for this and I'm working on an update that'll be out in a few days. 

Comment by Lao Mein (derpherpize) on An examination of GPT-2's boring yet effective glitch · 2024-04-18T07:45:10.357Z · LW · GW

I think a lot of it comes down to training data context - " Leilan" is only present in certain videogame scrapes, " petertodd" is only found in Bitcoin spam, ect. So when you try to use it in a conversational context, the model starts spitting out weird stuff because it doesn't have enough information to understand what those tokens actually mean. I think GPT-2's guess for " petertodd" is something like "part of a name/email, if you see it, expect more mentions of Bitcoin". And not anything more, since that token doesn't occur much anywhere else. Thus, if you bring it up in a context where Bitcoin spam is very unlikely to occur, like a conversation with an AI assistant, it kinda just acts like a masked token, and you get the glitch token behavior.

Comment by Lao Mein (derpherpize) on Reconsider the anti-cavity bacteria if you are Asian · 2024-04-16T13:45:11.975Z · LW · GW

I was thinking of areas along the gum-tooth interface having a local environment that normally promote tooth demineralization and cavities.  After Lumina, that area could have high chronic acetaldehyde levels. In addition, the adaption of oral flora to the chronic presence of alcohol could increase first-pass metabolism, which increases acetaldehyde levels locally and globally during/after drinking.

I don't know how much Lumina changes the general oral environment, but I think you might be able to test this by seeing how much sugar you can put in your mouth before someone else can smell the fruity scent of acetaldehyde on your breath? I'm sure someone else can come up with a better experiment.