Posts

AISC project: TinyEvals 2023-11-22T20:47:32.376Z
Polysemantic Attention Head in a 4-Layer Transformer 2023-11-09T16:16:35.132Z
An adversarial example for Direct Logit Attribution: memory management in gelu-4l 2023-08-30T17:36:59.034Z
A circuit for Python docstrings in a 4-layer attention-only transformer 2023-02-20T19:35:14.027Z

Comments

Comment by Jett (jett) on A Comprehensive Mechanistic Interpretability Explainer & Glossary · 2023-10-09T08:41:10.458Z

The activation patching, causal tracing, and resample ablation terms seem to be out of date compared to how you define them in your post on attribution patching.