comment by chanind · 2025-02-14T07:22:43.640Z

The encoder of a Sparse Autoencoder (SAE) is assumed to produce a single scalar activation given by a linear combination of the features. Under feature absorption, the encoder output is modeled as:
where $\delta \in [0, 1]$ is an absorption parameter. A higher $\delta$ means that the contribution from $f_1$ is attenuated (or absorbed) into the representation.

This doesn't seem correct. The encoder output should be a function of $h$. I think we need to specify the SAE encoder and decoder mathematically: say that we have a 2-latent SAE here (or only care about 2 latents), so that the encoder and decoder are each specified by 2 stacked vectors that are linear combinations of $f_1$ and $f_2$. (Is $f_2$ the child?) The encoder has one latent that's just $f_2$ and another latent that's $f_1 - \delta f_2$, with corresponding decoder vectors $f_2 + \delta f_1$ and $f_1$.
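
To make that concrete, here is a minimal NumPy sketch of the 2-latent SAE described above. It is only an illustration under stated assumptions, not the post's actual setup: it takes $f_1$ and $f_2$ to be orthonormal, treats $f_1$ as the parent and $f_2$ as the child, uses a ReLU encoder with no biases, and the names `build_sae` and `sae_forward` are hypothetical helpers.

```python
import numpy as np

def build_sae(f1, f2, delta):
    """Hypothetical helper: stack the two encoder and decoder vectors.

    Latent 0: encoder f2,              decoder f2 + delta * f1
    Latent 1: encoder f1 - delta * f2, decoder f1
    """
    W_enc = np.stack([f2, f1 - delta * f2])
    W_dec = np.stack([f2 + delta * f1, f1])
    return W_enc, W_dec

def sae_forward(h, W_enc, W_dec):
    """ReLU encoder without biases (an assumption), linear decoder."""
    acts = np.maximum(W_enc @ h, 0.0)
    recon = acts @ W_dec
    return acts, recon

# Assumption: f1 (parent) and f2 (child) are orthonormal directions.
f1 = np.array([1.0, 0.0])
f2 = np.array([0.0, 1.0])

delta = 1.0  # full absorption
W_enc, W_dec = build_sae(f1, f2, delta)

# Child active, so parent is active too: h contains both features.
h_both = f1 + f2
acts_both, recon_both = sae_forward(h_both, W_enc, W_dec)
print(acts_both, recon_both)      # [1. 0.] [1. 1.]

# Parent active alone.
h_parent = f1
acts_parent, recon_parent = sae_forward(h_parent, W_enc, W_dec)
print(acts_parent, recon_parent)  # [0. 1.] [1. 0.]
```

With $\delta = 1$ the $f_1$ latent stays silent whenever $f_2$ is present, yet reconstruction is still exact because the $f_2$ latent's decoder vector carries the $f_1$ component. Under these assumptions the activations are a function of $h$, and reconstruction stays exact for any $\delta$ as long as the second pre-activation is non-negative.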