Adam Karvonen's Shortform
post by Adam Karvonen (karvonenadam) · 2025-01-18
If you're looking for a hackable SAE training repo for experiments, I'd recommend our dictionary_learning repo. It's been around for a few months, but we've recently spent some time cleaning it up and adding additional trainer types.
It's designed to be simple and hackable - you can add a new SAE type in a single file (~350 lines). We have 8 tested implementations, including JumpReLU, TopK, BatchTopK, Matryoshka, Gated, and others, with BatchTopK recommended as a good default. Training is quick and cheap: training six 16K-width SAEs on Gemma-2-2B for 200M tokens takes ~6 RTX 3090 GPU-hours, or ~$1.20.
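For orientation, here is a rough sketch of the BatchTopK idea in PyTorch. This is not the repo's actual implementation or API - the class, parameter, and tensor names are illustrative assumptions. The key difference from plain TopK is that the top-k selection is taken over the whole batch rather than per example, so the average number of active latents per example is k while individual examples can use more or fewer.

```python
# Minimal sketch of a BatchTopK SAE forward pass (illustrative only;
# not the dictionary_learning repo's actual classes or signatures).
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # Encode: pre-activations for every latent.
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)  # [batch, d_sae]
        # BatchTopK: keep the top (k * batch_size) activations across the
        # whole batch, so sparsity is enforced on average rather than per example.
        batch_size = acts.shape[0]
        flat = acts.flatten()
        topk = torch.topk(flat, self.k * batch_size)
        mask = torch.zeros_like(flat)
        mask[topk.indices] = 1.0
        sparse_acts = (flat * mask).reshape(acts.shape)
        # Decode back to the residual stream dimension.
        recon = sparse_acts @ self.W_dec + self.b_dec
        return recon, sparse_acts
```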
The repo integrates with SAE Bench and includes reproducible baselines trained on Pythia-160M and Gemma-2-2B. While it isn't optimized for large models the way EleutherAI's library is (no CUDA kernels or multi-GPU support) and has fewer features than SAE Lens, it's great for experiments and trying new architectures.
Here is a link to the repo: https://github.com/saprmarks/dictionary_learning
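As a usage illustration, training boils down to streaming activation batches from the base model and minimizing reconstruction error. The sketch below is generic and hedged - it is not the repo's trainer interface, and the function and variable names are assumptions; TopK-style SAEs enforce sparsity through the activation function, so no explicit L1 penalty appears here.

```python
# Generic SAE training loop sketch (not dictionary_learning's actual trainer API).
import torch

def train_sae(sae, activation_batches, n_steps=10_000, lr=3e-4):
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for step, x in zip(range(n_steps), activation_batches):
        recon, sparse_acts = sae(x)
        # Mean squared reconstruction error over the batch.
        loss = (recon - x).pow(2).sum(dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae
```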