Comment by Jake Ward (jake-ward) on Effects of Non-Uniform Sparsity on Superposition in Toy Models · 2024-11-14T21:02:20.569Z · LW · GW

> we stumble on a weird observation where the few features with the least sparsity are not even learned and represented in the hidden layer

I'm not sure how you're modeling sparsity, but if these features are present in nearly 100% of inputs, you could think of it as the not-feature being extremely sparse. My guess is that these features are getting baked into the bias instead of the weights, so the model just always predicts them.