OpenAI lied about SFT vs. RLHF
post by sanxiyn · 2025-02-10T03:24:16.625Z
This is a link post for https://x.com/BlancheMinerva/status/1887801044777807897
I used to think that while OpenAI is pretty deceitful (e.g. the for-profit conversion), it generally wouldn't lie about its research. This is a pretty definitive case of lying, so I updated accordingly. I am posting it here because it doesn't seem to be widely known.
2 comments
comment by peterbarnett · 2025-02-10T03:58:49.325Z
This was discussed in this post: Update to Mysteries of mode collapse: text-davinci-002 not RLHF
I don't think OpenAI explicitly lied about text-davinci-002 being the same model as InstructGPT. I think that if you weren't reading OpenAI's documentation very carefully, it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF). I don't think OpenAI as an organization did much to clear this up, although individual researchers did.
↑ comment by sanxiyn · 2025-02-10T04:40:50.398Z
> I think that if you weren't reading OpenAI's documentation very carefully, it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF).
Not only was it easy; many people did believe it (including myself). In fact, can you point to a single case of people NOT making this reading mistake? As in, after the January 2022 instruction-following announcement, but before the October 2022 model index for researchers. Jan Leike's tweet you linked to postdates October 2022 and does not count. The allegation is that OpenAI lied (or at the very least was extremely misleading) for ten months of 2022. I am more ambivalent about the period after October 2022.