Tom Shlomi's Shortform
post by Tom Shlomi (tom-shlomi-1) · 2023-02-21T05:45:59.435Z · LW · GW · 2 comments
comment by Tom Shlomi (tom-shlomi-1) · 2023-02-21T05:45:59.683Z · LW(p) · GW(p)
Talking about what a language model "knows" feels confused. There's a big distinction between what a language model can tell you if you ask it directly, what it can tell you if you ask it with some clever prompting, and what a smart alien could tell you if its only source of information were interacting with that model. A moderately smart alien that could interact with GPT-3 could correctly answer far more questions than GPT-3 itself can, even with any amount of clever prompting.
Replies from: tom-shlomi-1
comment by Tom Shlomi (tom-shlomi-1) · 2023-02-21T06:15:45.666Z · LW(p) · GW(p)
The Constitutional AI paper, in a sense, shows that a smart alien with access to an RLHFed helpful language model can figure out how to write text according to a set of human-defined rules. It scares me a bit that this works well, and I worry that this sort of self-improvement is going to be a major source of capabilities progress going forward.