AI Debate Stability: Addressing Self-Defeating Responses 2024-06-11T03:03:29.915Z


Comment by Anton Sorkin (anton-sorkin) on The King and the Golem · 2023-09-30T01:18:08.960Z · LW · GW

As far as I understand, the golem may foresee this strategy. In that case, its future copies will cooperate (sacrifice themselves) until after the point at which you are sure they are safe.