Comments

Comment by Joel Ye (joel-ye) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-05-22T12:26:05.271Z · LW · GW

Thanks for the post, it's neat to see that fields and terms already exist for these questions.

I have two questions, in the hope of using this type of analysis in my own work to understand a lack of transfer between two distinct datasets A and B. (I see this is in your future work?)

1. Where does OOD data, or data that is implausible for the model, project?

2. For more complex data, might we expect this MSP to show up most clearly in places other than the final layer?

re: transfer, my hypothesis is that, having trained on A and B, we might see during inference that heldout data from A rapidly becomes easily identifiable as A, so it stands to reason that there's little to gain from any of B's features. Alternatively, a more optimistic test for whether we might see transfer between A and B, prior to training on B, would be whether we can tell that a sample from B is extremely unlikely or OOD, via raw likelihood or a misbehaving MSP?
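
To make the raw-likelihood half of that test concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the function names are hypothetical, the per-token probabilities are made-up toy numbers rather than outputs of any real model, and the 95% quantile cutoff is an arbitrary choice. It does not touch the MSP side at all.

```python
import math

def mean_nll(token_probs):
    """Average negative log-likelihood (nats per token) of one sequence,
    given the probability the model assigned to each observed token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def nll_threshold(reference_nlls, quantile=0.95):
    """Nearest-rank empirical quantile of in-distribution scores.
    The 0.95 default is an arbitrary, illustrative cutoff."""
    ranked = sorted(reference_nlls)
    index = math.ceil(quantile * len(ranked)) - 1
    return ranked[min(max(index, 0), len(ranked) - 1)]

def looks_ood(token_probs, threshold):
    """Flag a sequence whose mean NLL exceeds the in-distribution threshold."""
    return mean_nll(token_probs) > threshold

# Toy, made-up per-token probabilities (hypothetical, not real model outputs).
heldout_a = [
    [0.90, 0.80, 0.85],
    [0.70, 0.90, 0.80],
    [0.95, 0.90, 0.85],
    [0.60, 0.80, 0.90],
]
sample_b = [0.05, 0.10, 0.02]

# Calibrate on heldout A, then score a sample from B against that cutoff.
threshold = nll_threshold([mean_nll(seq) for seq in heldout_a])
flag_b = looks_ood(sample_b, threshold)
flag_a = looks_ood(heldout_a[0], threshold)
```

With real data, the per-token probabilities would come from a model trained only on A, and the interesting quantity is how far B's scores sit above the heldout-A distribution rather than a single yes/no flag.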