Posts

SAE regularization produces more interpretable models 2025-01-28T20:02:56.662Z
Peter Lai's Shortform 2025-01-25T19:41:33.057Z

Comments

Comment by Peter Lai (peter-lai) on SAE regularization produces more interpretable models · 2025-01-28T22:08:17.756Z · LW · GW

Yep, the graphs in this post reflect the values of features extracted by training a new SAE on the activations of the "regularized" weights.