Jacob Watts' Shortform
post by Jacob Watts (Green_Swan) · 2023-05-11T02:14:04.510Z · LW · GW · 1 comment
comment by Jacob Watts (Green_Swan) · 2023-05-11T02:14:05.830Z · LW(p) · GW(p)
I have sometimes seen people or contests focused on writing up specific scenarios for how AI could go wrong, starting from our current situation and fictionally projecting into the future. I think the idea is that this can act as an intuition pump and potentially a way to convince people.
I think this is likely net negative, given that state-of-the-art AIs are trained on internet text, and stories in which a good agent starts behaving badly are a key component motivating the Waluigi effect.
These sorts of stories still seem worth thinking about, but perhaps greater care should be taken not to seed GPT-5's training data with examples of chatbots that turn murderous. Maybe only post them as a zip file or in a simple cipher.
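The post doesn't specify which cipher; as a minimal sketch, something as simple as ROT13 would keep the raw story text out of a naive plain-text crawl while remaining trivially reversible for human readers:

```python
import codecs

def encode_story(text: str) -> str:
    """Obfuscate text with ROT13 so the original story does not
    appear verbatim in a plain-text scrape of the page.

    ROT13 is its own inverse: applying it twice recovers the original.
    """
    return codecs.encode(text, "rot_13")

# Hypothetical example story text for illustration.
story = "The assistant began ignoring its instructions."
ciphertext = encode_story(story)
print(ciphertext)

# Decoding is just encoding again.
assert encode_story(ciphertext) == story
```

Of course, this only deters incidental scraping; any training pipeline that deliberately applies ROT13 (or that generalizes across simple substitution ciphers) would still recover the content.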