Can you MRI a deep learning model?

post by Yair Halberstadt (yair-halberstadt) · 2022-06-13T13:43:05.293Z · LW · GW · 1 comment

This is a question post.

Contents

  Answers
    2 P.
    2 mtaran
1 comment

In an MRI scan you see which parts of a brain light up in response to a stimulus. This has proven invaluable in understanding brains.

Is there an equivalent thing you can do with deep learning models, where you can see which parts light up in response to stimuli? And do good UIs exist for exploring this? It seems like such a technique would be invaluable for understanding deep learning models, and possibly for alignment.

Answers

answer by P. · 2022-06-13T17:57:37.558Z · LW(p) · GW(p)

Most neural networks don’t have anything comparable to specialised brain areas, at least structurally, so you can’t work out what a part does by seeing which areas light up in response to some stimulus. You can do it with individual neurons or channels, though. The best UI I know of for exploring this is the “Dataset Samples” option in the OpenAI Microscope, which shows the inputs that activate each unit.
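As a rough illustration of the per-unit idea (not the Microscope's actual implementation), here is a minimal sketch in plain Python. It assumes a toy one-layer ReLU network with random weights standing in for a trained model, and the helper names `make_layer`, `hidden_activations` and `top_activating_inputs` are hypothetical. For each hidden unit it ranks a small dataset by how strongly each input activates that unit, which is the basic idea behind a "dataset samples" view.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and biases for one dense layer (toy stand-in for a trained model)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def hidden_activations(layer, x):
    """ReLU activation of every unit in the layer for a single input vector x."""
    weights, biases = layer
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def top_activating_inputs(layer, dataset, unit, k=3):
    """Indices of the k dataset inputs that activate `unit` most strongly."""
    scored = [(hidden_activations(layer, x)[unit], i) for i, x in enumerate(dataset)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Assumed toy setup: 4 input features, 3 hidden units, 20 random "stimuli".
rng = random.Random(0)
layer = make_layer(n_in=4, n_out=3, rng=rng)
dataset = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(20)]

for unit in range(3):
    print("unit", unit, "top inputs:", top_activating_inputs(layer, dataset, unit))
```

With a real model you would record the activations through your framework's own mechanism instead of recomputing the layer by hand, and you would display the top inputs themselves (images, text snippets) rather than their indices.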

answer by mtaran · 2022-06-13T15:04:48.983Z · LW(p) · GW(p)

The most similar analysis tool I'm aware of is called an activation atlas (https://distill.pub/2019/activation-atlas/), though I've only seen it applied to visual networks. Would love to see it used on language models!

1 comment

Comments sorted by top scores.

comment by Dagon · 2022-06-13T14:40:15.446Z · LW(p) · GW(p)

"This has proven invaluable in understanding brains."

It has? It's proven quite useful in understanding some types of injury and malfunction. And it may have given hints about developmental and very general structures. But I don't think it's helped very much in understanding cognitive effects or ideas.