Comment by Daniel Tan (dtch1997) on Toward A Mathematical Framework for Computation in Superposition · 2024-04-27T15:00:53.826Z

This work is very exciting to me, and I'm curious to hear the authors' thoughts on whether specific predictions made by this framework could be verified in real models.

  • For example, do we expect the proposed U-AND operator to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?
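To make the toy-model idea concrete, here is a rough sketch of the kind of hand-built network one could start from. It rests on my own simplified reading of U-AND, not the authors' construction: each hidden neuron connects to each boolean input feature with probability `p`, applies a ReLU with bias -1 (so it only fires when at least two of its inputs are on), and AND(x_i, x_j) is read out by averaging the neurons that read both i and j. All function names and parameter values are hypothetical.

```python
import itertools
import random


def build_u_and(m, d, p, seed=0):
    """Connect each of d hidden neurons to each of m boolean features with prob. p.

    Returns (conn, readers): conn[k] is the set of features neuron k reads,
    readers[i] is the set of neurons that read feature i.
    (Hypothetical toy construction, not the authors' code.)
    """
    rng = random.Random(seed)
    conn = [frozenset(i for i in range(m) if rng.random() < p) for _ in range(d)]
    readers = [frozenset(k for k, c in enumerate(conn) if i in c) for i in range(m)]
    return conn, readers


def hidden_activations(conn, active):
    """One ReLU layer: neuron k outputs max(0, (#active features it reads) - 1).

    The bias of -1 means a neuron only fires when at least two of its
    connected features are on, which is what lets it signal an AND.
    """
    return [max(0, len(c & active) - 1) for c in conn]


def read_and(readers, acts, i, j, threshold=0.5):
    """Estimate AND(x_i, x_j) by averaging the neurons that read both i and j."""
    shared = readers[i] & readers[j]
    if not shared:
        return False  # no neuron reads both features, so this pair is not represented
    return sum(acts[k] for k in shared) / len(shared) > threshold


def and_accuracy(m=100, d=500, p=0.3, active=frozenset({3, 7}), seed=0):
    """Fraction of all m*(m-1)/2 pairwise ANDs read out correctly for a sparse input."""
    conn, readers = build_u_and(m, d, p, seed)
    acts = hidden_activations(conn, active)
    pairs = list(itertools.combinations(range(m), 2))
    correct = sum(
        read_and(readers, acts, i, j) == (i in active and j in active)
        for i, j in pairs
    )
    return correct / len(pairs)


if __name__ == "__main__":
    print(f"fraction of pairwise ANDs read out correctly: {and_accuracy():.3f}")
```

With 100 features there are 4,950 pairwise ANDs but only 500 hidden neurons, so the ANDs have to share neurons, i.e. be computed in superposition. Errors come from interference: a neuron shared by i and j can also fire because of some other active feature, which is why this only works when inputs are sparse. A mech-interp study could then ask whether a model trained on the same task learns readouts that resemble these shared-neuron averages.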

I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing

Would love to discuss this more!