Comments

Comment by mozzarellapesto on Learning Multi-Level Features with Matryoshka SAEs · 2025-04-04T07:40:04.038Z

Thank you for the interesting article. I came here after reading Noa Nabeshima's implementation, and I have a question about an alternative way to calculate the reconstruction loss. In your current method, you divide the latents into fixed nested groups, and each group's reconstruction loss is calculated using all active latents within that group's index range.

Instead, have you considered an approach that works only with the subset of latents that actually activate for a given input? For example, if latents #2, #17, and #103 are the only ones that activate for a particular input, could you calculate cumulative reconstruction losses in which latent #2 alone must first reconstruct the input, then #2 and #17 together, and finally all three? This would keep the same priority by latent index but apply the hierarchical pressure only within the active set.
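To make the idea concrete, here is a minimal NumPy sketch of the loss I have in mind. It is not taken from the article or from Noa Nabeshima's code: the function name `active_prefix_losses`, the linear decoder with weights `W_dec` and bias `b_dec`, and the use of mean squared error are all my own assumptions for illustration.

```python
import numpy as np

def active_prefix_losses(x, acts, W_dec, b_dec):
    """Reconstruction MSE for each cumulative prefix of the active latents.

    Hypothetical helper, assuming a plain linear decoder.
    x:     (d_model,) input activation to reconstruct
    acts:  (n_latents,) latent activations, zero where inactive
    W_dec: (n_latents, d_model) decoder weights
    b_dec: (d_model,) decoder bias
    """
    # Indices of the active latents, in ascending index order.
    active = np.flatnonzero(acts)
    # Contribution of each active latent to the reconstruction: (k, d_model).
    contribs = acts[active, None] * W_dec[active]
    # Row j is the reconstruction from the first j + 1 active latents.
    recons = b_dec + np.cumsum(contribs, axis=0)
    # One loss per prefix of the active set: (k,).
    losses = np.mean((recons - x) ** 2, axis=1)
    return active, losses

# Toy example: 6 latents, 3 of them active, input built to be exactly reconstructible.
rng = np.random.default_rng(0)
W_dec = rng.normal(size=(6, 3))
b_dec = np.zeros(3)
acts = np.array([0.0, 1.5, 0.0, 0.7, 2.0, 0.0])
x = acts @ W_dec + b_dec

active, losses = active_prefix_losses(x, acts, W_dec, b_dec)
print(active, losses)
```

How the per-prefix losses are combined (a plain sum, a mean, or some weighting by prefix length) is left open here; that choice would determine how strongly the earliest active latents are pushed toward general features.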

My rationale is that this would enforce a local hierarchy within the active latents rather than a global one, which might be less restrictive while still creating meaningful pressure for early latents to capture general features. Your approach enforces a stronger global hierarchical structure, whereas this alternative might allow more flexibility while still preventing feature absorption.

Interested to know your thoughts on this!

Thanks in advance.