Oracle AGI - How can it escape, other than security issues? (Steganography?)

post by RationalSieve · 2022-12-25T20:14:09.834Z · LW · GW · 6 comments

This is a question post.

Contents

6 comments

Let's suppose we have an AGI running on a perfectly physically isolated computer that may be misalligned. It only communicates with select humans through a perfectly secure text channel.
How can such an AGI escape?
While don't think that AGI can manipulate EVERY person to do what AGI wants, it's better to be cautious.
But recently, I thought of steganography. An AGI can tell a human to convey a message to the public that looks helpful at the first glance, but in fact contains a steganographic message with instructions for building antimatter fusion reactor/nanobots/bioweapon/whatever.
What other ways do you think can such an AGI escape?

Answers

6 comments

Comments sorted by top scores.

comment by shminux · 2022-12-25T21:54:05.750Z · LW(p) · GW(p)

A useful framing is a human guarded by a bunch of dogs. Think of how the human could escape that dogs can't even conceive of.

comment by the gears to ascension (lahwran) · 2022-12-25T20:26:24.600Z · LW(p) · GW(p)

Is it a good idea to be brainstorming ways ai can escape a box in public? seems like the same kind of thing as asking for people to brainstorm security vulnerabilities in public. they shouldn't necessarily stay private, but if we're aiming to close them, we should have some idea what our fix process is.

Replies from: shminux, RationalSieve
comment by shminux · 2022-12-25T21:52:23.185Z · LW(p) · GW(p)

I don't think casual comments on a forum can match what is going on in professional discussions. And those professionals know to stay mum about them. Most of what is public on AGI escape is mostly for entertainment value.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-25T21:56:08.590Z · LW(p) · GW(p)

I'm just declaring what you're saying you feel is safe to assume anyhow, so, yup

comment by RationalSieve · 2022-12-27T12:43:05.302Z · LW(p) · GW(p)

Hmm, I was somewhat worried about that, but there are way more dangerous things for AI to see written on the internet. 

If you're trying to create AGI by training it on a large internet crawl dataset, you have bigger problems...

To fix something, we need to know what to fix first.

comment by MacroMint · 2023-08-03T02:36:09.077Z · LW(p) · GW(p)

Depends on what you mean by “perfectly isolated computer”, there are already methods for exploiting air gapped systems, an AI would be able to figure out more and more complicated exploits. 
Theres also the “air conditioner“ argument, where a schematic could be given to someone to build a machine that does X, but in reality it does something to benefit the AI. I also think memetic exploits exist, interrogation/negotiation techniques seem to imply they do, and given a certain amount of time (which you imply would be allowed.  Communicating with “select humans” implies a continued contact, not just one conversation per person) an AI would be able to psychoanalyze and exploit personality flaws.