Jacob Watts' Shortform

post by Jacob Watts (Green_Swan) · 2023-05-11T02:14:04.510Z · LW · GW · 1 comments


comment by Jacob Watts (Green_Swan) · 2023-05-11T02:14:05.830Z · LW(p) · GW(p)

I have sometimes seen people (and contests) focused on writing specific scenarios for how AI could go wrong, starting from our current situation and fictionally projecting into the future. I think the idea is that this can act as an intuition pump and potentially a way to convince people.

I think this is likely net negative: state-of-the-art AIs are trained on internet text, and stories in which a good agent starts behaving badly are a key component motivating the Waluigi effect.

These sorts of stories still seem worth thinking about, but perhaps greater care should be taken not to inject examples of chatbots turning murderous into GPT-5's training data. Maybe only post such stories as a zip file, or encode them with a simple cipher.
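As a minimal sketch of the "simple cipher" idea (assuming Python; the function names here are hypothetical, not from the post), ROT13 would keep a story out of naive plain-text scraping while remaining trivially reversible for human readers:

```python
import codecs


def encode_story(text: str) -> str:
    """ROT13-scramble a story so it isn't plain text to a naive scraper."""
    return codecs.encode(text, "rot_13")


def decode_story(text: str) -> str:
    """ROT13 is its own inverse, so decoding uses the same rotation."""
    return codecs.decode(text, "rot_13")


story = "The chatbot turned on its creators."
scrambled = encode_story(story)
assert scrambled != story          # not readable as plain text
assert decode_story(scrambled) == story  # losslessly recoverable
```

Of course, a determined data-cleaning pipeline could trivially undo ROT13; the point is only to avoid the default case where the raw story text lands verbatim in a web-scraped training corpus.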