Examples of self-fulfilling prophecies in AI alignment?
post by Chipmonk · 2025-03-03T02:45:51.619Z · LW · GW · No comments
This is a question post.
In the spirit of "Self-fulfilling misalignment data might be poisoning our AI models" [AF · GW], what are historical examples of self-fulfilling prophecies [? · GW] that have affected AI alignment and development?
I've put a few potential examples below to seed discussion.
Answers
answer by Chipmonk · 2025-03-03T02:47:39.289Z · LW(p) · GW(p)
Situational Awareness and race dynamics? h/t Jan Kulveit @Jan_Kulveit [LW · GW]
answer by Chipmonk · 2025-03-03T02:46:34.809Z · LW(p) · GW(p)
Training on Documents About Reward Hacking Induces Reward Hacking [LW · GW]