Understanding differences between humans and intelligence-in-general to build safe AGI

post by Florian_Dietz · 2022-08-16T08:27:02.859Z · LW · GW · 8 comments

Anthropomorphization strikes me as a big problem in AI safety research. People intuitively ascribe human attributes to AGI, even though human minds occupy only a tiny subset of the space of all possible forms of intelligence.

I would like to compile a list of the most crucial differences between AGI and humans, to help with research. Here are a few to start with:

What other differences do you think are important?

8 comments


comment by Dave Orr (dave-orr) · 2022-08-16T16:13:15.066Z · LW(p) · GW(p)

I think the claim that AIs can access their own thoughts probably needs more work to show that it is actually the case. Certainly the current state-of-the-art AIs, e.g. GPT-3 or PaLM, have if anything less access to their own state than people do. They can't really introspect; all they can do is process the data they are given.
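To make that concrete, here is a minimal sketch using the open GPT-2 model as a stand-in (GPT-3 and PaLM aren't publicly inspectable, and the prompt is just illustrative): the layer activations exist and we can read them from the outside, but nothing in the forward pass feeds them back to the model as something it could report on.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What are you thinking about right now?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # output_hidden_states=True exposes the layer activations to us, the callers.
    out = model(**inputs, output_hidden_states=True)

# We can inspect the model's internal state from the outside...
print(len(out.hidden_states), out.hidden_states[-1].shape)

# ...but the model's reply is computed purely from the prompt tokens; the
# forward pass has no channel that feeds those activations back in as
# something the model itself could examine or report on.
reply = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                       pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply[0, inputs["input_ids"].shape[1]:]))
```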

Maybe that will change, but as you note, the configuration space of intelligence is large, and it seems to me we could easily end up without that particular ability.

I have similar reservations about the next one, access to the thoughts of others, though you do caveat that one.

One thing that might be missing is that humans tend to have a defined location -- I know where I am, and "where I am" has a relatively clear definition. That may not hold for AIs, which are much more loosely coupled to the computers running them.

Replies from: Florian_Dietz
comment by Florian_Dietz · 2022-08-17T17:22:55.361Z · LW(p) · GW(p)

I agree that current AIs cannot introspect. My own research has bled into my beliefs here: I am actually working on this problem, and I expect that we won't get anything like AGI until we have solved it. As far as I can tell, an AI that works properly and has any chance of becoming an AGI will necessarily have to be able to introspect. Many of the big open problems in the field seem to me like they can't be solved precisely because we haven't figured out how to do this yet.

The "defined location" point you note is intended to be covered by "being sure about the nature of your reality", but it's much more specific, and you are right that it might be worth considering as a separate point.

comment by shminux · 2022-08-17T00:32:54.614Z · LW(p) · GW(p)

I don't think your listed points are the crux of the difference, though maybe AI (self-)interpretability is an important one. My personal feeling is that what matters is that humans are not coherent agents with goals; we just do things, often sphexing and being random or, conversely, routine, not acting to advance any of our stated goals.

Replies from: Florian_Dietz
comment by Florian_Dietz · 2022-08-17T17:39:52.823Z · LW(p) · GW(p)

This is a great point. I don't expect that the first AGI will be a coherent agent either, though.

As far as I can tell from my research, being a coherent agent is not an intrinsic property you can build into an AI, or at least not if you want it to have a reasonably effective ability to learn. It seems more like being coherent is a property that each agent has to continuously work on.

The reason for this is basically that every time we discover new things about the way reality works, the new knowledge might contradict some of the assumptions on which our goals are grounded. If this happens, we need a way to reconfigure and catch ourselves.

Example: A child does not have the capacity to understand ethics yet. So it is told "hurting people is bad", and that is good enough to keep it from doing terrible things until it is old enough to learn more complex ethics. Trying to teach it utilitarian ethics before it has an understanding of probability theory would be counterproductive.

Replies from: shminux
comment by shminux · 2022-08-17T22:30:47.952Z · LW(p) · GW(p)

I agree that even an AGI would have shifting goals. But at least at any single instant in time, one assumes that there is a goal it optimizes for. Or a set of rules it follows. Or a set of acceptable behaviors. Or maybe some combination of those. Humans are not like that. There is no inner coherence, ever; we just do stuff we are compelled to do in the moment.

Replies from: Florian_Dietz
comment by Florian_Dietz · 2022-08-19T21:09:36.844Z · LW(p) · GW(p)

Contemporary AI agents based on neural networks are exactly like that. They do whatever they feel compelled to do in the moment. If anything, they have less coherence than humans, and no capacity for introspection at all. I doubt that AI will magically go from this current, very sad state to being a coherent agent. It might modify itself into being coherent some time after becoming superintelligent, but it won't be coherent out of the box.
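As a rough illustration (a toy sketch, not any particular system; the class name and dimensions are made up for the example): a typical neural-network policy just maps the current observation to an action. There is no stored goal it checks its behavior against, and no channel through which it could examine or report its own reasons.

```python
import torch
import torch.nn as nn

class ReactivePolicy(nn.Module):
    """Toy policy: current observation in, action out. Nothing else."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def act(self, obs: torch.Tensor) -> int:
        # Whatever the trained weights happen to favor for *this* observation
        # is what the agent does; there is no explicit goal, no memory of past
        # commitments, and no introspective report anywhere in the loop.
        logits = self.net(obs)
        return int(torch.argmax(logits))

policy = ReactivePolicy(obs_dim=4, n_actions=2)
obs = torch.randn(4)      # stand-in for an environment observation
print(policy.act(obs))    # the "compulsion of the moment"
```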

Replies from: shminux
comment by shminux · 2022-08-19T22:31:23.647Z · LW(p) · GW(p)

Interesting. I know very little about the ML field, and my impression from reading what the ML and AI alignment experts write on this site is that they model an AI as an agent to some degree, not just as something that "does something incoherent at any given moment".

Replies from: Florian_Dietz
comment by Florian_Dietz · 2022-08-20T07:54:57.326Z · LW(p) · GW(p)

I mean "do something incoherent at any given moment" is also perfectly agent-y behavior. Babies are agents, too.

I think the problem is that modelling an incoherent AI is even harder than modelling a coherent one, so most alignment researchers just hope that AI researchers will be able to build coherence in before there is a takeoff, so that they can base their own theories on the assumption that the AI is already coherent.

I find that view overly optimistic. I expect that AI is going to remain incoherent until long after it has become superintelligent.