Comments

Comment by Ian Johnson (ian-johnson) on Showing SAE Latents Are Not Atomic Using Meta-SAEs · 2024-08-24T12:27:42.427Z · LW · GW

Are the datasets used to train the meta-SAEs the same as those used to train the original SAEs? If atomicity in a subdomain were a goal, would training a meta-SAE on a domain-specific dataset be interesting?

It seems like being able to target how atomic certain kinds of features are would be useful, especially if you are focused on something like identifying functionality/structure rather than knowledge. A specific example would be training on a large code dataset along with code QA. Would we find more atomic "bug features" like the ones in Scaling Monosemanticity?