Posts

Comments

Comment by danielbalsam on Refusal in LLMs is mediated by a single direction · 2024-05-03T05:03:23.745Z · LW · GW

Great post -- thanks for sharing. I am trying to replicate this work and was able to do so for several models, but I am having a lot of trouble reproducing it for the Llama 3 models. I sometimes succeed on some narrow prompts but not on others. Do you have any suggestions, or is there anything else non-obvious about that model family?

Comment by danielbalsam on DSLT 0. Distilling Singular Learning Theory · 2024-01-18T21:46:31.143Z · LW · GW

Hi! I am in the process of reading this sequence and would love some supplemental lecture materials (particularly at the intersection with alignment research). I was very excited by the prospect of the lectures from the June summit; however, the YouTube channel appears to 404 now. Is there somewhere else I can listen to these lectures?