Comment by Jiaxin Wen on Auditing language models for hidden objectives · 2025-04-04T21:20:59.322Z · LW · GW
Interesting! Do you mean the experiments in Sec 3.9.2?