Posts
SAE regularization produces more interpretable models
2025-01-28T20:02:56.662Z
Peter Lai's Shortform
2025-01-25T19:41:33.057Z
Comments
Comment by Peter Lai (peter-lai) on SAE regularization produces more interpretable models · 2025-01-28T22:08:17.756Z · LW · GW
Yep, the graphs in this post reflect the values of features extracted by training a new SAE on the activations of the "regularized" weights.
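
For a concrete picture of the procedure described in this comment, here is a minimal sketch and not the post's actual code. It assumes a plain ReLU sparse autoencoder with an L1 sparsity penalty, trained by full-batch gradient descent in NumPy. The names `train_sae` and `extract_features`, every hyperparameter, and the random `acts` array standing in for activations collected from the "regularized" model are all hypothetical.

```python
import numpy as np

def train_sae(acts, n_features=16, l1=1e-3, lr=1e-2, steps=300, seed=0):
    """Fit a fresh ReLU sparse autoencoder to acts of shape (n_samples, d_model).

    Hypothetical helper: the architecture and hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_features))
    b_enc = np.zeros(n_features)
    W_dec = rng.normal(0.0, 0.1, (n_features, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(steps):
        # Forward pass: encode to non-negative features, then reconstruct.
        pre = acts @ W_enc + b_enc
        feats = np.maximum(pre, 0.0)
        err = feats @ W_dec + b_dec - acts
        # Reconstruction error plus L1 penalty (feats >= 0, so |feats| == feats).
        losses.append((err ** 2).sum(axis=1).mean() + l1 * feats.sum(axis=1).mean())
        # Backward pass, written out by hand.
        d_recon = 2.0 * err / n
        d_pre = (d_recon @ W_dec.T + l1 / n) * (pre > 0)
        g_W_dec, g_b_dec = feats.T @ d_recon, d_recon.sum(axis=0)
        g_W_enc, g_b_enc = acts.T @ d_pre, d_pre.sum(axis=0)
        # Plain gradient-descent update.
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, losses

def extract_features(acts, params):
    """Feature values are the encoder's post-ReLU outputs."""
    return np.maximum(acts @ params["W_enc"] + params["b_enc"], 0.0)

# Stand-in for activations recorded from the "regularized" model (assumption).
acts = np.random.default_rng(1).normal(size=(256, 8))
params, losses = train_sae(acts)
feature_values = extract_features(acts, params)
```

The point of the sketch is the ordering: the activations come from the already-regularized model, and the SAE whose feature values get plotted is trained from scratch on them rather than reused from the regularization step.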