Posts

Comments

Comment by alexandraabbas on Reducing sycophancy and improving honesty via activation steering · 2024-04-26T10:49:09.224Z · LW · GW

"[...] This is because there would be no general direction towards a truth-based belief domain or away from using human modeling in output generation."

What do you mean by "human modeling in output generation"?