Posts
Evaluating Sparse Autoencoders with Board Game Models
2024-08-02T19:50:21.525Z
Interpreting Preference Models w/ Sparse Autoencoders
2024-07-01T21:35:40.603Z
Finding Backward Chaining Circuits in Transformers Trained on Tree Search
2024-05-28T05:29:46.777Z
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
2024-03-15T16:30:00.744Z