Posts

Arrakis - A toolkit to conduct, track and visualize mechanistic interpretability experiments. 2024-07-17T02:02:50.492Z

Comments

Comment by Yash Srivastava (yash-srivastava) on Arrakis - A toolkit to conduct, track and visualize mechanistic interpretability experiments. · 2024-07-24T07:10:31.167Z · LW · GW

Thanks a lot for the read. To answer your question:

1. I am a regular user of Transformer Lens (not so much of NNSight), and one of the things that bugged me a lot is the lack of abstractions for common operations (ablations, head compositions, model surgery, etc.), so I thought I'd just implement them. In terms of architecture, what I've planned is an outline similar to Meta's Hydra, where you run your experiments from config files and the library does the grunt work. I'm still open to ideas and have been talking about it with people from the OS community.

2. In my docs, I have included example usage for all the tools that are working as of now (for supported models). There are examples for common attention operations (merging/ablating heads), removing/permuting layers, and others such as sparsity analysis and polysemantic scores. I will try to push heavier tutorials, such as IOI ones, in the near future.
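
As a rough illustration of the config-driven idea in point 1, here is a minimal sketch in plain Python. It is not Arrakis' or Hydra's actual API: the `OPERATIONS` registry, the `register` decorator, `run_experiment`, and the config layout are all hypothetical names made up for this example, and the "heads" are just toy lists of floats standing in for per-head outputs.

```python
# Hypothetical registry mapping operation names (as written in a config)
# to the functions that implement them. Not part of any real library.
OPERATIONS = {}


def register(name):
    """Decorator that adds a function to the OPERATIONS registry."""
    def wrap(fn):
        OPERATIONS[name] = fn
        return fn
    return wrap


@register("ablate_heads")
def ablate_heads(head_outputs, heads):
    """Zero-ablate the listed head indices; other heads are copied unchanged."""
    return [
        [0.0] * len(vec) if i in heads else list(vec)
        for i, vec in enumerate(head_outputs)
    ]


@register("permute_heads")
def permute_heads(head_outputs, order):
    """Reorder heads so that new position k holds old head order[k]."""
    return [list(head_outputs[i]) for i in order]


def run_experiment(config, head_outputs):
    """Apply each configured step in order and return the final outputs."""
    out = head_outputs
    for step in config["steps"]:
        op = OPERATIONS[step["op"]]
        out = op(out, **step["args"])
    return out


# The experiment is described as data; in practice this could be loaded
# from a YAML or JSON file instead of being written inline.
config = {
    "name": "toy_ablation",
    "steps": [
        {"op": "ablate_heads", "args": {"heads": [1]}},
        {"op": "permute_heads", "args": {"order": [2, 0, 1]}},
    ],
}

heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = run_experiment(config, heads)
print(result)
```

The point of this shape is that adding a new operation only means registering one more function, while experiments themselves stay as small, diffable config entries.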