Examples of self-fulfilling prophecies in AI alignment?

post by Chipmonk · 2025-03-03T02:45:51.619Z · LW · GW · 1 comment

This is a question post.

In the spirit of "Self-fulfilling misalignment data might be poisoning our AI models", what are historical examples of self-fulfilling prophecies that have affected AI alignment and development?

I've put a few potential examples below to seed discussion.

Answers

answer by Chipmonk · 2025-03-03T02:50:27.022Z · LW(p) · GW(p)

https://x.com/sama/status/1621621724507938816

answer by Chipmonk · 2025-03-03T02:46:34.809Z · LW(p) · GW(p)

"Training on Documents About Reward Hacking Induces Reward Hacking"

answer by Chipmonk · 2025-03-03T02:47:39.289Z · LW(p) · GW(p)

Situational Awareness and race dynamics? h/t Jan Kulveit (@Jan_Kulveit)

answer by DivineMango · 2025-04-03T20:51:14.902Z · LW(p) · GW(p)

Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. "This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation" (which this paper popularly argues exists in the first place)

answer by Chipmonk · 2025-04-03T19:39:56.539Z · LW(p) · GW(p)

https://x.com/saffronhuang/status/1907863453009867183

1 comment

comment by DivineMango · 2025-04-03T20:50:51.957Z · LW(p) · GW(p)
