Comment by scrdest on Goodhart's Law in Reinforcement Learning · 2023-10-16T21:24:36.175Z
This seems to me like a formalisation of Scott Alexander's The Tails Coming Apart As Metaphor For Life post.
Given a function and its approximation, following the approximate gradient is good enough in Mediocristan, but the two become highly dissimilar at the extremes.
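A tiny numeric sketch of that (my own toy example, not from either post): a proxy can correlate almost perfectly with the true reward on typical states and still point the wrong way far out in the tail.

```python
import numpy as np

# Toy example (mine, for illustration): a true reward that saturates and
# reverses at large |x|, and a linear proxy fit to the typical range.
rng = np.random.default_rng(0)

def true_reward(x):
    return x - 0.05 * x**3          # turns negative for large |x|

def proxy_reward(x):
    return x                        # good approximation near x ~ 0

typical = rng.normal(0.0, 1.0, 1000)    # "Mediocristan": |x| mostly < 3
extreme = np.array([5.0, 8.0, 12.0])    # far tail

print(np.corrcoef(true_reward(typical), proxy_reward(typical))[0, 1])  # ~0.99
print(proxy_reward(extreme))            # still large and positive
print(true_reward(extreme))             # already negative: the tails come apart
```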
I wonder what impact composite reward functions have. If you add together a pair of approximate rewards, could their errors cancel each other out and pull the system closer to the real target?
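As a toy sketch of what I mean (hypothetical numbers, not from the original post): if two proxies are biased in opposite directions, optimising their sum can land much closer to the true optimum than optimising either one alone.

```python
import numpy as np

# Toy sketch of the cancellation idea (my own construction): two proxies
# whose optima are biased in opposite directions around the true optimum.
x = np.linspace(-3.0, 3.0, 601)

true_r   = -(x - 1.0) ** 2              # true optimum at x = 1.0
proxy_a  = -(x - 1.6) ** 2              # biased high
proxy_b  = -(x - 0.4) ** 2              # biased low
combined = 0.5 * (proxy_a + proxy_b)    # the summed reward

for name, r in [("proxy_a", proxy_a), ("proxy_b", proxy_b), ("combined", combined)]:
    print(name, x[np.argmax(r)])        # proxies peak at 1.6 / 0.4, combined at 1.0
```

Of course that only helps when the errors are roughly anti-correlated; two proxies that err in the same direction would just reinforce each other.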