Posts
SAEs Discover Meaningful Features in the IOI Task
2024-06-05T23:48:04.808Z
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces
2023-08-29T01:04:18.688Z
Comments
Comment by Alex Makelov (amakelov) on SAEs Discover Meaningful Features in the IOI Task · 2024-06-16T14:05:06.749Z · LW · GW
Hi, there's code at https://github.com/amakelov/sae that covers almost everything reported in the blog post. If you have more specific questions, let me know (or open an issue) and I can point to or explain the relevant parts of the code!