[ASoT] Thoughts on GPT-N
post by Ulisse Mini (ulisse-mini) · 2022-11-08T07:14:37.900Z · LW · GW
Editor's note: This is an alignment stream-of-thought post, meaning it's literally a rambling catalog of my thoughts on something.
TLDR: GPT-n isn't AGI, but you can probably get AGI from it relatively easily via RL fine-tuning and/or self-improvement. Some people wrongly think GPT-n will immediately kill them, though.
I believe that "GPT-n" and similar systems [LW · GW]:
- Won't be agents/consequentialists by default; they will only simulate consequentialist sub-agents/simulacra [LW · GW]. GPT-n won't kill you by default; GPT-n doesn't "want" anything.
- (I'm uncertain about whether sufficiently powerful "things" (like GPT-n) will become "agents"; currently, I think this isn't the case, since modeling GPT-3 as an agent doesn't work well.)
- In GPT-n, the intelligence of each individual simulacrum is approximately bounded by the intelligence of the smartest human in the training distribution. Prompting GPT-n with "A solution to the alignment problem:" will never work; GPT-n is not an oracle. (This doesn't imply mimicry is doomed, though I'm pessimistic for the reasons John gives [LW · GW], plus the fact that it requires coordinating to simulate alignment researchers but not capabilities researchers, which seems hard to me.)
- GPT-n may well be superintelligent (it already is [LW · GW] in some sense), but even if GPT-n understands more (e.g. about physics) than the smartest human, it will never report this understanding, because that isn't what a human on the training distribution would be likely to write.
- This isn't to say GPT-n won't lead to AGI, just that there's some nontrivial prompting/fine-tuning to be done in order to access the latent knowledge, and I'm not sure how hard this step is. The main angles of attack I see: using GPT-n to self-improve [LW(p) · GW(p)], RL-based fine-tuning (a rough sketch of what this could look like is below), or maybe crazy Codex-based stuff like a fancier AutoML-Zero.
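For concreteness, here's a minimal sketch of the RL fine-tuning angle, using GPT-2 from Hugging Face's transformers as a stand-in for GPT-n and a plain REINFORCE loop. The prompt, the toy reward function, and the hyperparameters are all illustrative assumptions on my part (a real setup would use a learned reward model or human feedback, and something like PPO rather than vanilla REINFORCE); this is just to show the shape of the loop, not a real recipe.

```python
# Illustrative REINFORCE-style fine-tuning loop on GPT-2 (stand-in for GPT-n).
# The reward here is a placeholder; in practice it would come from a reward
# model trained on human preference data.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def toy_reward(text: str) -> float:
    # Placeholder reward: prefer longer completions (purely for illustration).
    return min(len(text.split()) / 20.0, 1.0)

prompt = "A plan for aligning advanced AI systems:"  # hypothetical prompt
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
prompt_len = prompt_ids.shape[1]

for step in range(100):
    # Sample a completion (no gradients needed during sampling).
    with torch.no_grad():
        sampled = model.generate(
            prompt_ids,
            do_sample=True,
            max_new_tokens=40,
            pad_token_id=tokenizer.eos_token_id,
        )
    completion = tokenizer.decode(sampled[0, prompt_len:])
    reward = toy_reward(completion)

    # Recompute log-probs of the sampled tokens with gradients enabled.
    logits = model(sampled).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(
        -1, sampled[:, 1:].unsqueeze(-1)
    ).squeeze(-1)[:, prompt_len - 1:]  # keep only the completion tokens

    # REINFORCE: scale the log-likelihood of the completion by its reward.
    loss = -(reward * token_log_probs.sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is just that the reward channel, not the pretraining objective, is what would push GPT-n toward reporting its latent knowledge rather than imitating what a human would write.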