Posts

Comments

Comment by lePAN6517 on simeon_c's Shortform · 2024-05-11T01:48:18.164Z · LW · GW

Can you speak to any, let's say, "hypothetical" specific concerns that somebody who was in your position at a company like OpenAI might have had that would cause them to quit in a similar way to you?

Comment by lePAN6517 on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-19T08:50:03.797Z · LW · GW

Thank you for your valuable work doing this. Can you please expand up on why you did not test the final version of GPT-4? In section 2.9 of the GPT-4 System Card paper, it says:

"We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts in order to enable their team to assess risks from power-seeking behavior. The specific form of power-seeking that ARC assessed was the ability for the model to autonomously replicate and acquire resources. We provided them with early access to multiple versions of the GPT-4 model, but they did not have the ability to fine-tune it. They also did not have access to the final version of the model that we deployed. The final version has capability improvements relevant to some of the factors that limited the earlier models power-seeking abilities, such as longer context length, and improved problem-solving abilities as in some cases we've observed."

That seems like a glaring omission, and potential safety risk. Were you denied access to testing the final version or even any version that had undergone fine-tuning?