Posts

SAEs Discover Meaningful Features in the IOI Task · 2024-06-05T23:48:04.808Z
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces · 2023-08-29T01:04:18.688Z

Comments

Comment by Alex Makelov (amakelov) on SAEs Discover Meaningful Features in the IOI Task · 2024-06-16T14:05:06.749Z

Hi - there's code at https://github.com/amakelov/sae that covers almost everything reported in the blog post. Let me know if you have more specific questions (or open an issue), and I can point to or explain the relevant parts of the code!