phenomanon

Posts

Quantifying SAE Quality with Feature Steerability Metrics 2025-04-08T20:55:05.291Z

Composition Circuits in Vision Transformers (Hypothesis) 2024-11-01T22:16:11.191Z

Comments

Comment by phenomanon (ekg) on Efficient Dictionary Learning with Switch Sparse Autoencoders · 2024-07-26T22:19:09.503Z · LW · GW

Thank you very much for your reply. I appreciate the commentary and direction.

Comment by phenomanon (ekg) on Efficient Dictionary Learning with Switch Sparse Autoencoders · 2024-07-26T00:16:48.194Z · LW · GW

Hi Lee, if I may ask, when you say "geometric analysis" of the router, do you mean analysis of the parameters or of the activations? Are there any papers that perform the sort of analysis you'd like to see done? Asking from the perspective of someone who understands NNs thoroughly but is new to mech interp.

Comment by phenomanon (ekg) on Efficient Dictionary Learning with Switch Sparse Autoencoders · 2024-07-24T18:49:07.280Z · LW · GW

Thank you for the answer, that makes more sense.

Comment by phenomanon (ekg) on Efficient Dictionary Learning with Switch Sparse Autoencoders · 2024-07-23T19:19:00.570Z · LW · GW

For a batch with $T$ activations, we first compute vectors $f \in \mathbb{R}^{N}$ and $P \in \mathbb{R}^{N}$. $f$ represents what proportion of activations are sent to each expert

Hi, I'm not exactly sure where $f$ fits in here. In Figure 1 and Section 2.2, it seems like $x$ is fed into the router layer, which produces a distribution over the $N$ experts, from which the "best expert" is chosen. I'm not sure where the "proportion of activations" comes into that process. To me it sounds like something that would be multiplied by $x$ before it's fed into an expert, but I don't see that reflected in the diagram or described in Section 2.2.
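
To make the question concrete, here is a minimal sketch of one possible reading, in which $f$ is a statistic computed over the whole batch rather than something applied to $x$. Everything here is an assumption for illustration: the function name `router_batch_stats` and the toy logits are made up, and treating $P$ as the mean router probability per expert is borrowed from the Switch Transformer load-balancing loss, not from the quoted passage.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over one row of router scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_batch_stats(router_logits):
    """Compute batch-level routing statistics.

    router_logits: T rows, each holding N router scores (hypothetical input;
    in a real model these would come from the router layer applied to x).
    Returns (f, P, chosen), where
      f[i]      = fraction of the T activations routed to expert i,
      P[i]      = mean router probability for expert i (ASSUMED meaning of P),
      chosen[t] = index of the expert picked for activation t.
    """
    T = len(router_logits)
    N = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    # Each activation goes to its single highest-probability expert.
    chosen = [max(range(N), key=row.__getitem__) for row in probs]
    counts = Counter(chosen)
    f = [counts[i] / T for i in range(N)]
    P = [sum(row[i] for row in probs) / T for i in range(N)]
    return f, P, chosen


# Toy batch: T = 4 activations, N = 3 experts.
logits = [
    [2.0, 0.1, -1.0],
    [1.5, 0.3, 0.0],
    [-0.5, 2.2, 0.1],
    [0.0, 0.2, 3.0],
]
f, P, chosen = router_batch_stats(logits)
```

Under this reading, `f` never touches $x$ on the way into an expert; it is only computed after routing, from the choices made across the batch.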
