Posts

Excursions into Sparse Autoencoders: What is monosemanticity? 2024-08-05T19:22:40.249Z
Communication, consciousness, and belief strength measures 2024-02-17T05:45:42.834Z
Measuring pre-peer-review epistemic status 2024-02-08T05:09:01.418Z
Starting in mechanistic interpretability 2024-01-22T23:40:56.871Z
Carving up problems at their joints 2023-12-01T18:48:46.510Z

Comments

Comment by Jakub Smékal (jakub-smekal) on Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small · 2024-02-20T17:43:49.579Z

Neel was advised by the authors that it was important to minimise batches having tokens from the same prompt. This approach leads to a buffer having activations from many different prompts fairly quickly.

Oh I see, it's a constraint on the tokens from the vocabulary rather than the prompts. Does the buffer ever reuse prompts or does it always use new ones?

Comment by Jakub Smékal (jakub-smekal) on Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small · 2024-02-19T19:24:27.020Z
  • We store activations in a buffer of ~500k tokens which is refilled and shuffled whenever 50% of the tokens are used (i.e., Neel’s approach).

I am not sure I understand the reasoning behind this approach. Why refill and shuffle the buffer whenever 50% of the tokens are used? Does the buffer hold only training-set tokens, or test-set tokens as well? I didn't see a train/test split in Neel's code; isn't that important? Also, can you track the number of training epochs when using this buffer method? It seems like the buffer makes that more difficult.
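
To make the mechanism concrete, here is a minimal sketch of a refill-and-shuffle buffer like the one in the quoted bullet. It is not Neel's code or the post authors' implementation: the class name `ShuffleBuffer`, its parameters, and the use of plain Python items in place of model activations are all assumptions made for illustration.

```python
import random
from typing import Any, Iterator, List


class ShuffleBuffer:
    """Hypothetical refill-and-shuffle buffer (illustration only)."""

    def __init__(self, source: Iterator[Any], capacity: int,
                 refill_fraction: float = 0.5, seed: int = 0) -> None:
        self.source = source          # stream of items (stand-ins for activations)
        self.capacity = capacity      # e.g. ~500k tokens in the quoted setup
        self.refill_threshold = int(capacity * refill_fraction)
        self.rng = random.Random(seed)
        self.items: List[Any] = []
        self.used = 0                 # items served since the last refill
        self._refill()

    def _refill(self) -> None:
        # Drop the items already served, top up from the stream, reshuffle.
        self.items = self.items[self.used:]
        self.used = 0
        while len(self.items) < self.capacity:
            try:
                self.items.append(next(self.source))
            except StopIteration:
                break                 # stream exhausted: keep what is left
        self.rng.shuffle(self.items)

    def next_batch(self, batch_size: int) -> List[Any]:
        # Refill once the configured fraction of the buffer has been served.
        if self.used >= self.refill_threshold:
            self._refill()
        batch = self.items[self.used:self.used + batch_size]
        self.used += len(batch)
        return batch                  # an empty list means nothing is left
```

In this sketch every item from the stream is served exactly once, so nothing is reused, and progress is most naturally counted in tokens consumed rather than in epochs. The unserved half of the buffer is mixed with fresh items at each refill, which is one plausible reason for refilling at 50%: batches end up drawing from many different prompts.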

Comment by Jakub Smékal (jakub-smekal) on Sparse Autoencoders Work on Attention Layer Outputs · 2024-01-25T05:19:39.118Z

Hey, great post! Are your code or autoencoder weights available somewhere?