Posts
Steering LLMs' Behavior with Concept Activation Vectors
2024-09-28T09:53:19.658Z
Exploring the Evolution and Migration of Different Layer Embedding in LLMs
2024-03-08T15:01:17.504Z
Comments
Comment by Ruixuan Huang (sprout_ust) on Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
2025-03-22T03:39:28.095Z
Great job! Consider reading our related paper: https://arxiv.org/abs/2404.12038