deontologician

Comment by deontologician (josh-kuhn-1) on ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so · 2023-03-15T05:14:12.867Z · LW · GW

Yeah, if the RLHF is supposed to train honesty into it, maybe it would have done worse on the TaskRabbit task. This really seems like a PR throwaway line rather than a legit attempt to red-team the final model.
