Posts

Comments

Comment by Sidharth Baskaran (sidharth-baskaran) on Fluent dreaming for language models (AI interpretability method) · 2024-11-16T23:42:48.827Z · LW · GW

Cool follow-up work here:
https://www.lesswrong.com/posts/hMBTaFvAzdMNnj29c/evolutionary-prompt-optimization-for-sae-feature