Comment by Daniel Tan (dtch1997) on Toward A Mathematical Framework for Computation in Superposition · 2024-04-27T15:00:53.826Z
This work is very exciting to me, and I'm curious to hear the authors' thoughts on whether we could verify specific predictions of this framework in real models.
- For example, do we expect the proposed U-AND operator to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?
I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing
Would love to discuss this more!