My current uncertainties regarding AI, alignment, and the end of the world

post by dominicq · 2021-11-14T14:08:50.956Z · LW · GW · 3 comments

As I read the interview with Eliezer Yudkowsky on AI alignment problems, I had a couple of thoughts of my own. These are poorly researched, and maybe poorly formulated. I intend to think more about them, but I thought this might be a good place to post them for feedback. I'm basically using this post as a large interactive bookmark for "hey, these are the things you thought about, think about them some more" with the added benefit of other people commenting.

3 comments


comment by [deleted] · 2021-11-14T17:57:18.364Z · LW(p) · GW(p)

I feel like there's a difference between "modeling" and "statistical recognition", in the sense that current (and near-future) AI systems don't necessarily model the world around them.

There is an entire subfield of ML called model-based reinforcement learning.

You'd think that to destroy a world, you first need to have a model of it, but that may not be the case.

Natural selection is an existence proof (minus anthropic effects) that you can produce world-altering agents without explicitly using models.

There may be a sense in which generating text and maneuvering the real world are very different. 

Well yes, which is why I'm less worried about GPT-3 than about EfficientZero.

There may be a sense in which successfully imitating human speech without a "model" or agency is possible.

It is trivially true, and it becomes trivially false if you ask the AI adversarial questions that are AGI-complete.

There may also be constraints that bind just as strongly (or even more strongly) and prevent even a superintelligent agent from achieving its goals, not because of "defects" in the agent itself but because of some constant of the universe. One such example is the speed of light. However intelligent you are, that's a physical constraint that you just can't surpass.

Sure, but one does not need to surpass the speed of light to destroy humanity.

There may also be a sense in which AI systems would not self-improve further than required for what we want from them. Meaning, we may fulfill our needs (for which we design and produce AI systems) with a class of AI agents that stop receiving any sort of negative feedback at a certain level of proficiency or ability. 

Who is "we"? What is the mechanism by which any AI outside this class will be completely and permanently prevented from coming into existence? This is my criticism of the rest of the points as well. Your strategy for AI risk seems to be "Let's not build the sort of AI that would destroy the world", which fails at the first word: "Let's".

Replies from: dominicq
comment by dominicq · 2021-11-14T18:38:30.966Z · LW(p) · GW(p)

Your strategy for AI risk seems to be "Let's not build the sort of AI that would destroy the world", which fails at the first word: "Let's".

I don't have a strategy, I'm basically just thinking out loud about a couple of specific points. Building a strategy for preventing that type of AI is important, but I don't (yet?) have any ideas in that area.

Replies from: None
comment by [deleted] · 2021-11-14T18:45:03.513Z · LW(p) · GW(p)

Ok, perhaps I was too combative with the wording. My general point is: Don't think of humanity as a coordinated agent, don't think of "AGI" as a single tribe with particular properties (I frequently see this same mistake with regard to aliens), and in particular, don't think that because a specific AI won't be able or willing to destroy the world, the world is therefore saved in general.