Physics of Language models (part 2.1)
post by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · LW · GW · 2 comments
This is a link post for https://youtu.be/bpp6Dz8N2zY?si=RC20soJLynXxNOfv
This is perhaps the best interpretability work I've seen outside of Chris Olah's team.
2 comments
comment by StefanHex (Stefan42) · 2024-09-19T17:46:35.078Z · LW(p) · GW(p)
Paper link: https://arxiv.org/abs/2407.20311
(I have neither watched the video nor read the paper yet, just in case someone else was looking for the non-video version)
comment by Logan Riggs (elriggs) · 2024-09-23T11:57:34.765Z · LW(p) · GW(p)
Could you dig into why you think it's great interpretability work?