Posts
Positional kernels of attention heads
2025-03-03T01:40:13.014Z
Using the probabilistic method to bound the performance of toy transformers
2025-01-21T23:01:38.067Z
Duplicate token neurons in the first layer of GPT-2
2024-12-27T04:21:55.896Z
Alex Gibson's Shortform
2024-12-27T04:21:55.840Z