Video/animation: Neel Nanda explains what mechanistic interpretability is
post by DanielFilan · 2023-02-22T22:42:45.054Z · LW · GW · 7 comments
This is a link post for https://youtu.be/sISodZSxNvc
Nice little video: the audio is Neel Nanda explaining what mechanistic interpretability is and why he does it, illustrated by the illustrious Hamish Doodles. Excerpted from the AXRP episode [LW · GW].
(It's not technically animation I think, but I don't know what other single word to use for "pictures that move a bit and change")
7 comments
comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) · 2023-02-23T01:19:03.935Z · LW(p) · GW(p)
Lots of alpha in AI research distillers learning motion-canvas ("motion-canvas/motion-canvas: Visualize Complex Ideas Programmatically", github.com) and making explainers.
Replies from: alexander-cai, lahwran
↑ comment by adzcai (alexander-cai) · 2023-02-23T01:36:42.976Z · LW(p) · GW(p)
Or even better, finetuning an LLM to automate writing the code!
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2023-02-23T02:16:16.123Z · LW(p) · GW(p)
cyborgism, activate!
just don't use an overly large model.
↑ comment by the gears to ascension (lahwran) · 2023-02-23T02:17:52.460Z · LW(p) · GW(p)
For those reading (I imagine Sheikh knows about these already), some videos from the creator of that library:
comment by novalinium · 2023-02-22T23:57:32.720Z · LW(p) · GW(p)
A single word for this would be an animatic, probably.
Replies from: DanielFilan
↑ comment by DanielFilan · 2023-02-23T00:09:05.963Z · LW(p) · GW(p)
I kinda guess that most people don't know what that means.
comment by TinkerBird · 2023-02-23T09:29:31.142Z · LW(p) · GW(p)
Here's a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies?