Will the first AGI agent have been designed as an agent (in addition to being designed as an AGI)?

post by nahoj · 2022-12-03T20:32:52.242Z · LW · GW · No comments

This is a question post.

I wonder about a scenario where the first AI with human-level or superior capabilities would not be goal-oriented at all, e.g. a language model like GPT. An instance of it would then be used, possibly by a random user, to make a conversational agent told to behave as a goal-oriented AI. The bot would then behave as an AGI agent, with everything that implies from a safety standpoint, e.g. using its human user to affect the outside world.

Is this a plausible scenario for the development of AGI and of the first goal-oriented AGI? Does it have any implications for AI safety compared to the case of an AGI designed as goal-oriented from the start?

Answers

answer by Charlie Steiner · 2022-12-08T01:12:10.017Z · LW(p) · GW(p)

At this point in history, you have to be a bit more specific than the label "AGI," because I'd already consider language models to be above the minimum standard for "AGI."

But if you mean a program that navigates the real world at a near-human level and successfully carries out plans to perpetuate its existence, then I would expect such a program to have to work "out of the box," rather than being a pure simulacrum.

Not to say that language models can't be involved, but I'd count things like starting with a language model and then training it (or some supernetwork) to be an agent with RL as "designing it as an agent."

comment by nahoj · 2022-12-08T21:03:18.722Z · LW(p) · GW(p)

Thank you for your answer. In my example I was thinking of an AI such as a language model that would have latent ≥human-level capability without being an agent, but could easily be made to emulate one just long enough for it to get out of the box, e.g. by duplicating itself. Do you think this couldn't happen?

More generally, I am wondering whether the field of AI safety research studies fairly specific scenarios based on the current R&D landscape (e.g. "a car company makes an AI to drive a car, then someone does xyz, then paperclips") and tailor-made safety measures for them, in addition to more abstract ones like those in A Tentative Typology of AI-Foom Scenarios, for instance.

comment by Charlie Steiner · 2022-12-08T21:57:04.964Z · LW(p) · GW(p)

I think that would have the form of current AI research, but would involve extremely souped-up models of the world relative to what we have now (even more so for the self-driving car), to the extent that it's not actually that close to modern AI research. I think it's reasonable to focus our efforts on deliberate attempts to make AGI that navigates the real world.

comment by nahoj · 2022-12-10T13:22:37.179Z · LW(p) · GW(p)

I'm not sure I understand. Do you mean that considering these possibilities is too difficult because there are too many, or that it's not a priority because AIs not designed as agents are less dangerous? Or both?

comment by Charlie Steiner · 2022-12-10T14:49:39.272Z · LW(p) · GW(p)

The latter, specifically because it's less likely.

comment by nahoj · 2022-12-10T17:08:17.710Z · LW(p) · GW(p)

Right. So, considering that the most advanced AIs of a leading AI company such as OpenAI are not agents, what do you think of the following plan to solve or help solve AI risk: keep making more and more powerful Q&A AIs that are not agents until we have ones that are smarter than us, then ask them how to solve the problem. Do you think this is a safe and reasonable pursuit? Or do you think we just won't get to superhuman intelligence that way?

comment by Charlie Steiner · 2022-12-10T18:00:02.090Z · LW(p) · GW(p)

You could get to superintelligence that way, except that before that happens, someone else is going to make an AI that actively seeks out information and navigates the real world. 

And it's not all that safe in an absolute sense: large sequence models are as trustworthy as they are specifically because we're using them on problems where we can give lots of examples of humans solving them. By default, when you ask a big Q&A AI how to solve alignment, it will just tell you the sort of bad answer a human would give. Trying to avoid that default carries risks, and just seems like the wrong thing to be doing. Building tools to help humans solve the problem isn't crazy [LW · GW], but that is different from expecting the answer to spring fully formed from a big AI that you trust without knowing much about alignment.

comment by nahoj · 2022-12-10T19:16:10.787Z · LW(p) · GW(p)

Thank you.
