Posts
Does robustness improve with scale?
2024-07-25T20:55:53.359Z
Comments
Comment by
niki.h (niki2) on
Saying the quiet part out loud: trading off x-risk for personal immortality ·
2023-11-02T17:59:29.756Z
Based on personal experience, you are definitely not the only one thinking about that Statement.
Comment by
niki.h (niki2) on
Wireheading and misalignment by composition on NetHack ·
2023-11-02T17:29:26.885Z
Based on my understanding from talking with the author, it is the former. The language model is used only to provide a shaping reward based on the text the game displays after some actions. It's the RL optimization that learns the weird hallucination strategy, and it can do so because the shaping reward improves its capabilities in general.