Posts

Comments

Comment by Butanium (butanium-1) on EIS III: Broad Critiques of Interpretability Research · 2023-02-15T00:17:04.008Z · LW · GW

Have you read the Redwood post on causal scrubbing? To me, it's an excellent example of evaluating interpretability using something other than intuition.

Comment by Butanium (butanium-1) on We Found An Neuron in GPT-2 · 2023-02-13T13:37:53.282Z · LW · GW

This post is related to the embedding space: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

Comment by Butanium (butanium-1) on Decision Transformer Interpretability · 2023-02-08T01:00:17.539Z · LW · GW

Are you using decision transformers or other RL agents on Procgen? Also, do you plan to work on CoinRun?