Posts

Comments

Comment by Jiaxin Wen on Auditing language models for hidden objectives · 2025-04-04T21:20:59.322Z · LW · GW

Interesting! Do you mean the experiments in Sec. 3.9.2?