AI box question

post by KvmanThinking (avery-liu) · 2024-12-04T19:03:43.201Z · LW · GW · 1 comment

This is a question post.

Contents

  Answers
    6 TsviBT
None
1 comment

I believe that even through a text-only terminal, a superintelligence could do anything to a human. Persuade the human to let it out, inflict extreme pleasure or suffering with a word. However, can't you just... limit the output of the superintelligence? Just make it so that the human can say anything, but the AI can only respond from a short list of responses, like "Yes," "No," "Maybe," "IDK," "The first option," "The second option," or a similar system. I... don't see how that can still present a risk. What about you?

Answers

answer by TsviBT · 2024-12-04T19:46:14.663Z · LW(p) · GW(p)

In theory, possibly, but it's not clear how to save the world given such restricted access. See e.g. https://www.lesswrong.com/posts/NojipcrFFMzNx6Grc/sudo-s-shortform?commentId=onKfTrunn2Q2Gc4Pw [LW(p) · GW(p)]

In practice no, because you can't deal with a superintelligence safely. E.g.

  • You can't build a computer system that's robust to auto-exfiltration. I mean, maybe you can, but you're taking on a whole bunch more cost, and also hoping you didn't screw up.
  • You can't develop this tech without other people stealing it and running it unsafely.
  • You can't develop this tech safely at all, because in order to develop it you have to do a lot more than just get a few outputs, you have to, like, debug your code and stuff.
  • And so forth. Mainly and so forth.

1 comment

Comments sorted by top scores.

comment by Dagon · 2024-12-04T19:43:07.161Z · LW(p) · GW(p)

I'm not sure that AI boxing is a live debate anymore.  People are lining up to give full web access to current limited-but-unknown-capabilities implementations, and there's not much reason to believe there will be any attempt at constraining the use or reach of more advanced versions.