Transformer Mech Interp: Any visualizations?

post by Joyee Chen (joyee-chen) · 2023-01-18T04:32:33.085Z · LW · GW · No comments

This is a question post.

I got to the part of one of Neel Nanda's interp demos that introduces the Logit Lens and Layer Attribution, and I'm having trouble visualizing them the way I could for simpler concepts (e.g. residual streams, which I understood mainly by drawing them out). Does anybody have good resources with illustrations? (I know Nanda had a great colorful run-through, but it was only at a high level and for encoder-decoder models, and thus not generalizable.)
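For concreteness, here is a minimal sketch of what I understand the two ideas to compute, on a toy residual stream rather than a real model. Everything in it is an assumption made for illustration: the function names (`logit_lens`, `layer_attribution`), the random vectors standing in for layer outputs, and the simplification of skipping the final layer norm so that the logits are exactly linear in the residual stream.

```python
import random

def matvec(vec, mat):
    """Multiply a row vector (length d) by a d x v matrix, giving length v."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def logit_lens(embed, layer_outputs, unembed):
    """Unembed the residual stream after the embedding and after each layer.

    Assumption: no final layer norm, so this is a simplified logit lens.
    """
    resid = list(embed)
    per_layer_logits = [matvec(resid, unembed)]
    for out in layer_outputs:
        # Each layer adds its output into the residual stream.
        resid = [r + o for r, o in zip(resid, out)]
        per_layer_logits.append(matvec(resid, unembed))
    return per_layer_logits

def layer_attribution(embed, layer_outputs, unembed, token):
    """Direct contribution of the embedding and each layer to one token's logit."""
    parts = [embed] + list(layer_outputs)
    return [matvec(p, unembed)[token] for p in parts]

# Toy stand-ins (hypothetical sizes and random values, not a trained model).
random.seed(0)
d_model, vocab, n_layers = 4, 5, 3
embed = [random.gauss(0, 1) for _ in range(d_model)]
layer_outputs = [[random.gauss(0, 1) for _ in range(d_model)]
                 for _ in range(n_layers)]
unembed = [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]

token = 2
lens = logit_lens(embed, layer_outputs, unembed)
attr = layer_attribution(embed, layer_outputs, unembed, token)

for i, logits in enumerate(lens):
    print(f"after {i} layers: logit[{token}] = {logits[token]:+.3f}, "
          f"added by this step = {attr[i]:+.3f}")
```

In this simplified setting the two views are linked: the logit lens value after k layers is the running sum of the first k+1 attribution terms. That suggests one possible picture, a line plot of the lens value per layer with the per-layer attributions drawn as the steps between points.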
