Introducing SARA: a new activation steering technique 2024-06-09T15:33:11.699Z


Comment by Alejandro Tlaie (alejandro-tlaie-boria) on Introducing SARA: a new activation steering technique · 2024-06-16T18:45:56.778Z · LW · GW

Hi Charlie, thanks a lot for taking the time to read the post and for the question!

Regarding what was the idea of changing the activation histories: I wanted to capture token dependencies, as I thought that concepts that weren't captured by one token only (as in the case of ActAdd) would be better described by these history-dependent activations. As to why bringing the 3 relevant activation histories to the same size: that's for enabling comparison (and, ultimately, similarity).

Regarding why SVD: I decided to use SVD as it's one of the simplest and most ubiquitous matrix factorisation techniques out there (so I didn't need to validate it or benchmark it). Also, it allows for not-so-heavy computations, which is crucial because SARA is thought to be implemented at inference time.