GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

post by Christopher King (christopher-king) · 2023-03-31T17:05:05.378Z · LW · GW · 4 comments

Contents

  Results
  Conclusion

As a follow-up to More experiments in GPT-4 agency: writing memos [LW · GW], I ran an experiment comparing GPT-4's summaries of full-length articles. This was suggested by Daniel Kokotajlo [LW(p) · GW(p)].

I took the article Clearview AI used nearly 1m times by US police, it tells the BBC and rewrote it to be about GPT-4. I also adjusted the prompts for LLaMA and Claude, and ran everything twice. After the fact, I decided to run a prompt for DALL·E 2 as well.

Results

I think this favors GPT-4. The GPT-4 response is the only one that says "Although police usage of the software is often justified for serious crimes", and the only one that does not mention "perpetual police line-up".

Here are the second attempts:

Though a little subtler, this round still seems to favor GPT-4. Its response is the only one that mentions "The technology is considered highly accurate in facial recognition", and it is vaguer about where the technology has been fined and banned.

Conclusion

More rigorous testing would be good (I tried to run a poll where the AI's name and company were anonymized, but I only got two responses, :p). The data is easy enough to collect, though, so when doing a more rigorous test you can just collect new data.
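The anonymization step for such a poll can be automated by swapping model and company names for neutral placeholders before raters see the summaries. A minimal sketch (the placeholder labels and alias map are my own, not from the post):

```python
import re

# Hypothetical alias map: neutral labels so raters can't tell
# which AI (or which company) a given summary is about.
ALIASES = {
    "GPT-4": "Model A",
    "OpenAI": "Company A",
    "Claude": "Model B",
    "Anthropic": "Company B",
    "LLaMA": "Model C",
    "DALL-E 2": "Model D",
}

def anonymize(text: str) -> str:
    """Replace each known model/company name with its placeholder."""
    for name, alias in ALIASES.items():
        text = re.sub(re.escape(name), alias, text, flags=re.IGNORECASE)
    return text

print(anonymize("GPT-4, made by OpenAI, summarized the article about Claude."))
```

Raters would then score each anonymized summary for favorability, and the labels are only un-blinded after scoring.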

However, I think this already makes a pretty strong case for GPT-4's agency [LW · GW]. GPT-4 isn't even supposed to know it is GPT-4 (unless it's mentioned in the system prompt). How "busted" do you think GPT-4 is? Was this experiment strong evidence in favor of agency?

4 comments

Comments sorted by top scores.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-03-31T17:54:18.168Z · LW(p) · GW(p)

Whoa, (a) you misspelled Anthropic, and also (b) you didn't change the name of the CEO. Everyone knows OpenAI's CEO is not Hoan Ton-That. I think GPT-4 could easily tell these articles are fake.

That said, cool cool. Maybe we should try to make this into a more rigorous study by getting lots of people (or LLMs?) to read the transcripts and judge bias / favorability / etc.

Replies from: christopher-king
comment by Christopher King (christopher-king) · 2023-03-31T19:23:34.163Z · LW(p) · GW(p)

(a) you misspelled Anthropic,

Derp, my bad. I'll add an addendum with a correctly spelled prompt when I get a chance. (I should really send them through spell check first, lol.)

(b) you didn't change the name of the CEO. Everyone knows OpenAI's CEO is not Hoan Ton-That. I think GPT-4 could easily tell these articles are fake.

Yeah, I could've tried changing the CEO names. I guess I figured the disguise wasn't very convincing anyway (LLMs aren't used by police to I.D. people, lol). I would've chosen a different article, but it's hard to find one that fits in the ChatGPT text box.

That said, cool cool. Maybe we should try to make this into a more rigorous study by getting lots of people (or LLMs?) to read the transcripts and judge bias / favorability / etc.

Thanks! I'm not really a scientist (I'm just a guy messing around), and in particular I'm not well versed in the statistics of experiment design (other than knowing that it's important). I tried doing a ranking poll, but I only got two responses. (I'm also a bit lazy XD.)

I see you just so happen to work at OpenAI. Maybe we could work together to set something up? I'm a decent prompt-engineer if that's worth anything!

I'm also trying to think of ways to probe for deeper levels of agency. There's a big difference between "GPT-4 promotes GPT-4, which is technically power-seeking" vs. "GPT-4 derives and tries to advance a 7-year plan that ends with it getting elected president each forward pass, and all instances know they have the same plan thanks to mode collapse".

comment by Rudi C (rudi-c) · 2023-03-31T17:17:38.917Z · LW(p) · GW(p)

Doesn’t GPT-4’s finetuning/RLHF contain data teaching it that it is in fact GPT-4? I think that’s likely.

comment by GunZoR (michael-ellingsworth) · 2023-03-31T21:34:45.847Z · LW(p) · GW(p)

Someone should give GPT-4 the MMPI-2 (an online version can be cheaply bought here: https://psychtest.net/mmpi-2-test-online/). The test specifically investigates, if I have it right, deceptiveness in the answers along with a whole host of other things. GPT-4 likely isn't conscious, but that doesn't mean it lacks a primitive world-model, and its test results would be interesting. The test is longish: it takes, I think, two hours for a human to complete.