Posts

Steering LLMs' Behavior with Concept Activation Vectors 2024-09-28T09:53:19.658Z
Exploring the Evolution and Migration of Different Layer Embedding in LLMs 2024-03-08T15:01:17.504Z

Comments

Comment by Ruixuan Huang (sprout_ust) on Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models · 2025-03-22T03:39:28.095Z

Great job! Consider reading our related paper: https://arxiv.org/abs/2404.12038