Posts

Does robustness improve with scale? 2024-07-25T20:55:53.359Z

Comments

Comment by niki.h (niki2) on Saying the quiet part out loud: trading off x-risk for personal immortality · 2023-11-02T17:59:29.756Z · LW · GW

Based on personal experience, you are definitely not the only one thinking about that Statement.

Comment by niki.h (niki2) on Wireheading and misalignment by composition on NetHack · 2023-11-02T17:29:26.885Z · LW · GW

Based on my understanding from talking with the author, it is the former. The language model is used only to provide a shaping reward based on the text the game outputs after some actions. It is the RL optimization that learns the weird hallucination strategy, and it is able to do so because the shaping reward improves its capabilities in general.
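To make the division of labor concrete, here is a minimal sketch of a text-based shaping reward, under stated assumptions. The names `stub_message_score` and `shaped_reward`, the keyword list, and the coefficient are all hypothetical; the stub is a keyword matcher standing in for the language model, not the method actually used in the post being discussed.

```python
from typing import Callable


def stub_message_score(message: str) -> float:
    """Hypothetical stand-in for a language-model score of a game message.

    Returns 1.0 if the message contains one of a few hand-picked phrases,
    else 0.0. A real setup would query a language model here instead.
    """
    interesting = ("you find", "you kill", "you see")
    lowered = message.lower()
    return 1.0 if any(phrase in lowered for phrase in interesting) else 0.0


def shaped_reward(
    env_reward: float,
    message: str,
    score_fn: Callable[[str], float] = stub_message_score,
    coef: float = 0.5,
) -> float:
    """Add a shaping term, computed only from the game's text, to the env reward."""
    return env_reward + coef * score_fn(message)


if __name__ == "__main__":
    print(shaped_reward(0.0, "You kill the newt!"))
    print(shaped_reward(1.0, "It is dark."))
```

The scorer only ever sees the message text. Any strategy for obtaining high-scoring messages comes from the RL agent optimizing this combined signal, not from the scorer itself.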