Comments sorted by top scores.
comment by Raemon · 2023-03-10T20:06:43.416Z
This did not help me understand anything.
↑ comment by Shmi (shminux) · 2023-03-10T21:53:30.707Z
I guess what I was trying to illustrate is that if you train an LLM with RLHF, the analogy is squeezing a directionless network along a specific axis; but then you get both the friendly face and the evil face, two sides of the same squeeze.
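A minimal numerical sketch of that squeeze intuition (the "friendliness" axis and the squeeze factor are purely illustrative toys, not anything from an actual RLHF pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Directionless" base model: an isotropic cloud of latent persona vectors.
latents = rng.normal(size=(1000, 16))

# Hypothetical axis that the RLHF-like squeeze selects for.
axis = np.zeros(16)
axis[0] = 1.0

# The squeeze: suppress variation orthogonal to the axis while keeping
# the component along it (0.05 is an arbitrary squeeze factor).
along = latents @ axis
ortho = latents - np.outer(along, axis)
squeezed = np.outer(along, axis) + 0.05 * ortho

# Both ends of the axis survive: a "friendly" cluster with positive
# projection and its "evil" mirror cluster with negative projection.
proj = squeezed @ axis
print(f"friendly side: {np.sum(proj > 0)}, evil side: {np.sum(proj < 0)}")
```

The point of the toy is just that a squeeze which only cares about variation along one axis preserves both signs of that axis.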