Posts
Comments
Comment by
alexandraabbas on
Reducing sycophancy and improving honesty via activation steering ·
2024-04-26T10:49:09.224Z ·
LW ·
GW
"[...] This is because there would be no general direction towards a truth-based belief domain or away from using human modeling in output generation."
What do you mean by "human modeling in output generation"?