Posts

Evaluating Sparse Autoencoders with Board Game Models (2024-08-02T19:50:21.525Z)
Interpreting Preference Models w/ Sparse Autoencoders (2024-07-01T21:35:40.603Z)
Finding Backward Chaining Circuits in Transformers Trained on Tree Search (2024-05-28T05:29:46.777Z)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features (2024-03-15T16:30:00.744Z)

Comments