Comments sorted by top scores.
comment by Raemon · 2023-03-10T20:06:43.416Z
This did not help me understand anything.
↑ comment by Shmi (shminux) · 2023-03-10T21:53:30.707Z
I guess what I was trying to illustrate is that if you train an LLM with RLHF, the analogy is squeezing a directionless network along a specific axis; but then you get both the friendly face and the evil face, two sides of the same squeeze.
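A minimal numerical sketch of that squeeze intuition (the "friendliness" axis and the squeeze factor are purely illustrative toys, not anything from an actual RLHF pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Directionless" base model: an isotropic cloud of latent persona vectors.
latents = rng.normal(size=(1000, 16))

# Hypothetical axis that the RLHF-like squeeze selects for.
axis = np.zeros(16)
axis[0] = 1.0

# The squeeze: suppress variation orthogonal to the axis while keeping
# the component along it (0.05 is an arbitrary squeeze factor).
along = latents @ axis
ortho = latents - np.outer(along, axis)
squeezed = np.outer(along, axis) + 0.05 * ortho

# Both ends of the axis survive: a "friendly" cluster with positive
# projection and its "evil" mirror cluster with negative projection.
proj = squeezed @ axis
print(f"friendly side: {np.sum(proj > 0)}, evil side: {np.sum(proj < 0)}")
```

The point of the toy is just that a squeeze which only cares about variation along one axis preserves both signs of that axis.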