Physics of Language Models (Part 2.1)

post by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · LW · GW · 2 comments

This is a link post for https://youtu.be/bpp6Dz8N2zY?si=RC20soJLynXxNOfv

This is perhaps the best interpretability work I've seen outside of Chris Olah's team.

2 comments

comment by StefanHex (Stefan42) · 2024-09-19T17:46:35.078Z · LW(p) · GW(p)

Paper link: https://arxiv.org/abs/2407.20311

(I have neither watched the video nor read the paper yet, just in case someone else was looking for the non-video version)

comment by Logan Riggs (elriggs) · 2024-09-23T11:57:34.765Z · LW(p) · GW(p)

Could you dig into why you think it's great interpretability work?