niki-h

Posts

Does robustness improve with scale? 2024-07-25T20:55:53.359Z

Comments

Comment by niki.h (niki2) on Saying the quiet part out loud: trading off x-risk for personal immortality · 2023-11-02T17:59:29.756Z

Based on personal experience, you are definitely not the only one thinking about that statement.

Comment by niki.h (niki2) on Wireheading and misalignment by composition on NetHack · 2023-11-02T17:29:26.885Z

Based on my understanding from talking with the author, it is the former. The language model is used only to provide a shaping reward based on the text messages the game displays after certain actions. It is the RL optimization that learns the weird hallucination strategy, and it is able to do so because the shaping reward improves the agent's capabilities in general.