Comment by Moughees Ahmed (moughees-ahmed) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-05-03T15:03:09.574Z
Excited to see what you come up with!
Plausibly, a model trained on the entirety of human output should be able to decipher more hidden states: ones that are not obvious to us but might be obvious in latent space. That could mean models are very good at augmenting our existing understanding of fields but might not create new ones from scratch.
Comment by Moughees Ahmed (moughees-ahmed) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-05-02T19:49:15.807Z
This might be an adjacent question, but assuming this is true and comprehensively explains the belief-updating process, what does it say, if anything, about whether transformers can produce new (undiscovered) knowledge or states? If they can't observe a novel state, something that doesn't exist in the data, can they never discover new knowledge on their own?