Comment by
deontologician (josh-kuhn-1) on
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so ·
2023-03-15T05:14:12.867Z
Yeah, if the RLHF is supposed to train honesty into it, maybe it would have done worse on the TaskRabbit task. This really seems like a PR throwaway line rather than a legit attempt to red-team the final model.