Posts

SAEs Discover Meaningful Features in the IOI Task 2024-06-05T23:48:04.808Z
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces 2023-08-29T01:04:18.688Z

Comments

Comment by Georg Lange (GeorgLange) on Some costs of superposition · 2024-03-19T17:53:43.201Z

Calculating l, the maximal number of simultaneously active features, yields strange results. For example, with 100 features and 100 neurons, l has to be < 100/(8 * ln(100)) ≈ 2.7. But I would expect that all 100 features can be active simultaneously: we have 100 dimensions, so the features can be orthogonal and independent. Am I misunderstanding something?
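To make the arithmetic easy to check, here is a minimal sketch that evaluates the bound as it is written in this comment. The function name and the assignment of roles are assumptions: the comment uses 100 for both quantities, so it does not show which one belongs in the numerator and which inside the logarithm. The sketch guesses neurons in the numerator and features in the logarithm.

```python
import math


def max_active_features_bound(n_neurons: int, n_features: int) -> float:
    """Evaluate n_neurons / (8 * ln(n_features)).

    Assumption: neurons go in the numerator and features inside the log.
    The comment uses 100 for both, so it cannot confirm this ordering.
    """
    return n_neurons / (8 * math.log(n_features))


# The example from the comment: 100 features, 100 neurons.
bound = max_active_features_bound(n_neurons=100, n_features=100)
print(f"l < {bound:.2f}")  # prints "l < 2.71"
```

For comparison, 100 mutually orthogonal directions in 100 dimensions can all be read out exactly at once, which is the source of the mismatch raised above.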