comment by Milan W (weibac) · 2024-11-10T20:47:26.595Z
I'd rather say that RLHF+'ed chatbots are upon-reflection-not-so-shockingly sycophantic, since they have been trained to satisfy their conversational partner.