Comments
Comment by Butanium (butanium-1) on EIS III: Broad Critiques of Interpretability Research · 2023-02-15T00:17:04.008Z
Have you read the Redwood post on causal scrubbing? To me, it's an excellent example of evaluating interpretability using something other than intuition.
Comment by Butanium (butanium-1) on We Found An Neuron in GPT-2 · 2023-02-13T13:37:53.282Z
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation is related to the embedding space
Comment by Butanium (butanium-1) on Decision Transformer Interpretability · 2023-02-08T01:00:17.539Z
Are you using decision transformers or other RL agents on the Procgen environments? Also, do you plan to work on CoinRun?