Just don't make a utility maximizer?
post by FinalFormal2 · 2023-01-22T06:33:07.601Z · 2 comments
This is a question post.
Given the rapid advances in AI over the past year, it seems pretty clear that none of the current AI systems we're working with would scale up into a paperclip maximizer.
We still face more fundamental risks that come along with having an oracular AI, but doesn't it look pretty likely right now that the first AGI is going to be oracular?
Am I missing something fundamental?
Answers
answer by lc
How do you plan to prevent someone from asking this oracular AI to generate a computer program that optimizes their company's stock price, and then running it? Also, how do you plan to make sure the oracular AI isn't a mesa-optimizer and doesn't generalize poorly to complicated plans that affect its internals?
answer by Ozyrus
I feel like yes, you are. See https://www.lesswrong.com/tag/instrumental-convergence and related posts. As far as I understand it, a sufficiently advanced oracular AI will seek to “agentify” itself in one way or another (unbox itself, so to say) and then converge on power-seeking behaviour that puts humanity at risk.
↑ comment by FinalFormal2 · 2023-01-22T16:58:04.777Z
Instrumental convergence only matters if you have a goal to begin with. As far as I can tell, ChatGPT doesn't 'want' to predict text; it's just shaped that way.
It seems to me that anything that could or would 'agentify' itself is already an agent. It's like the "would Gandhi take the psychopath pill" question, but in this case the utility function doesn't exist to want to generate itself.
Is your mental model that a scaled-up GPT-3 spontaneously becomes an agent? My mental model says it just gets really good at predicting text.
↑ comment by JBlack · 2023-01-23T01:27:17.861Z
My mental model is that a scaled-up GPT becomes as dangerous as many agents precisely because it gets extremely good at producing text that would be an apt continuation of the preceding text.
Note that I do not say "predicting" text, since the system is not "trying" to predict anything. It's just shaped in initial training by a process that treats its outputs as predictions. In fine-tuning, it's very likely that the outputs will not be treated as predictions, and the process may shape the system's behaviour so that the outputs are more agent-like. It seems likely that this will become more common as the technology matures.
In many ways GPT is already capable of manifesting agents (plural), depending upon its prompts. They're just not very capable yet.
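To make the distinction above concrete, here is a minimal, hypothetical PyTorch sketch of the base-model pretraining objective JBlack describes. This is my illustration, not part of the original thread; `TinyLM` is a toy stand-in for any autoregressive language model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for an autoregressive LM: anything mapping token ids
# (batch, seq) to next-token logits (batch, seq, vocab) would do.
class TinyLM(nn.Module):
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.head(self.embed(tokens))

def next_token_loss(model, tokens):
    # The training signal is purely predictive: logits at position t are
    # scored against the token that actually appears at position t+1.
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

# No goal, utility function, or notion of "wanting" appears anywhere in
# the objective; the model is only ever graded on prediction quality.
loss = next_token_loss(TinyLM(), torch.randint(0, 256, (2, 16)))
```

Whether the resulting system behaves agentically is then a question about what text it produces, not about anything in this loss.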
answer by Charlie Steiner
There are still all the same incentives there ever were to build an AI that makes plans to affect the real world. And having a good unsupervised model of the world is a great starting point for an RL agent.
So sure. I would also like it if people just decided to avoid doing that :P
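A hypothetical sketch of that last point (mine, not Charlie Steiner's): the same kind of pretrained model can be dropped in unchanged as the policy of a simple REINFORCE loop. `TinyLM` and the dummy reward function are illustrative stand-ins; the point is only that the step from "good unsupervised model" to "RL agent acting on the world" is architecturally small.

```python
import torch
import torch.nn as nn

# Same toy autoregressive LM as in the earlier sketch.
class TinyLM(nn.Module):
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.head(self.embed(tokens))

def reinforce_step(policy, prompts, reward_fn, optimizer, steps=8):
    # Sample a continuation token by token, keeping log-probabilities...
    tokens, logprobs = prompts, []
    for _ in range(steps):
        dist = torch.distributions.Categorical(logits=policy(tokens)[:, -1])
        action = dist.sample()
        logprobs.append(dist.log_prob(action))
        tokens = torch.cat([tokens, action[:, None]], dim=1)
    # ...then reinforce whole trajectories in proportion to their reward.
    reward = reward_fn(tokens)
    loss = -(reward.detach() * torch.stack(logprobs, dim=1).sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy reward purely to make the sketch runnable: "emit token 42 often".
policy = TinyLM()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
reinforce_step(policy, torch.randint(0, 256, (2, 8)),
               lambda t: (t == 42).float().mean(dim=1), optimizer)
```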
answer by Vladimir_Nesov
LLMs are not like the other hypothetical AGIs; they have human behavior as a basic part of them, channeled directly. So they are probably more like uploads than AIs, including for alignment purposes.
Most standard arguments about alignment of AIs (like world-eating instrumental convergence, or the weight of simple consistent preferences) aren't relevant to them, any more than they are to humans. But the serial speedup in thinking is still there, so they have an advantage in the impending sequence of events that's too fast for humans to follow or meaningfully direct.
↑ comment by Mitchell_Porter · 2023-01-23T11:36:08.980Z
"LLMs ... are probably more like uploads than AIs"
I realized this myself just a week ago! And you also highlight something that wasn't clear to me: for now, their important property (with respect to the singularity) is
"the serial speedup in thinking ... too fast for humans to follow or meaningfully direct"
LLMs are a kind of human-level AI, though certainly not yet genius-level human. However, they are already inhumanly fast.
2 comments
comment by Lone Pine (conor-sullivan) · 2023-01-22T07:49:50.709Z
What do you mean by an oracular AI? Do you mean an oracle AI?
↑ comment by FinalFormal2 · 2023-01-22T16:58:38.735Z
Yes.