Posts

A Sober Look at Steering Vectors for LLMs 2024-11-23T17:30:00.745Z
Dima's Shortform 2024-08-22T14:49:00.960Z

Comments

Comment by Dmitrii Krasheninnikov (dmitrii-krasheninnikov) on Meta learning to gradient hack · 2022-07-06T16:31:10.770Z

Could you please share the results in case you ended up finishing those experiments?