Posts
[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
2024-09-09T03:33:53.548Z
Comments
Comment by Fernando Avalos (fernando-avalos) on Alignment and Deep Learning · 2024-03-09T20:18:02.152Z · LW · GW
I got redirected here via AI Safety Ideas. Was this idea ever implemented? Has something similar been attempted? As someone who just got into the field and is looking to test my fit for it, I'm willing to invest time and effort to get an MVP working.