OpenAI lied about SFT vs. RLHF
post by sanxiyn · 2025-02-10T03:24:16.625Z
This is a link post for https://x.com/BlancheMinerva/status/1887801044777807897
I used to think that while OpenAI is pretty deceitful (e.g. the for-profit conversion), it generally wouldn't lie about its research. This is a pretty definitive case of lying, so I updated accordingly. I am posting it here because it doesn't seem to be widely known.
2 comments
comment by peterbarnett · 2025-02-10T03:58:49.325Z
This was discussed in this post: Update to Mysteries of mode collapse: text-davinci-002 not RLHF
I don't think OpenAI explicitly lied about text-davinci-002 being the same model as InstructGPT. I think that if you weren't reading OpenAI's documentation very carefully, it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF). I don't think OpenAI as an organization did much to clear this up, although individual researchers did.
↑ comment by sanxiyn · 2025-02-10T04:40:50.398Z
> I think that if you weren't reading OpenAI's documentation very carefully, it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF).
Not only was it easy; many people did believe it (including myself). In fact, can you point to a single case of people NOT making this reading mistake? As in, after the January 2022 instruction-following announcement, but before the October 2022 model index for researchers. Jan Leike's tweet you linked to postdates October 2022 and does not count. The allegation is that OpenAI lied (or at the very least was extremely misleading) for ten months of 2022. I am more ambivalent about the period after October 2022.