Transformer Mech Interp: Any visualizations?
post by Joyee Chen (joyee-chen) · 2023-01-18T04:32:33.085Z · LW · GW · No comments
This is a question post.
After getting to the part of a demo (one of Neel Nanda's interp demos) where they discuss the Logit Lens and Layer Attribution, I'm having trouble visualizing these ideas the way I could with simpler concepts (e.g. residual streams, which I drew out as my primary way of understanding them). Does anybody have good resources with illustrations? (I know Nanda has a great colorful run-through, but it was only at a high level and for encoder-decoder models, and thus not generalizable.)