What will happen when an all-reaching AGI starts attempting to fix human character flaws?
post by Michael Bright (michael-bright) · 2022-06-01T18:45:17.760Z · LW · GW · No comments
This is a question post.
Contents
Answers: 9 Korz · 2 Quintin Pope · 2 burmesetheater
No comments
You know the saying: “I just want him to listen, not to solve my problem.”
Would an all-reaching AGI accept this?
How would AGI respond to self-sabotaging tendencies?
If it's true that world problems stem from individual preferences, which are based on individual perspectives, which in turn originate from individual personality traits and experiences, then where will an AGI that's hell-bent on improving the world stop, accept things as they are, and recognize that any attempt at improvement may do humans more damage than good?
Take social media algorithms for context: they keep everyone in a relatively closed bubble that they believe each person should be in, and they guide each person's decisions by determining what information to show them.
And with the advent of the Internet of Things, AI will become even more prevalent.
While there may be AIs specifically designed to offer emotional support and AIs designed to solve problems, will an AGI that develops some sort of consciousness simply accept some of our human character flaws and limitations, or will it strip them all away, at the risk of hurting the human, until some singular standard of what is considered acceptable is achieved?
Would an AGI always act from a place of pure rationality and maximum efficiency, disregarding the human values that prevent us from doing so most of the time?
Answers
If one believes the orthogonality thesis [? · GW] (and we only need a very weak version of it), just knowing that there is an AGI trying to improve the world is not enough to predict how exactly it would reason about the quirkier aspects of human character and values. It seems to me that something that could be called "AGI-humans" is quite possible, but a more alien-to-us "total hedonistic utility-maximizing AGI" also seems possible.
From how I understood Eliezer Yudkowsky's arguments here [LW · GW], the way we are selecting AI models will favour models with consequentialist [? · GW] decision making (we do select the models that give good results), which tends towards the latter.
Because of this, I would expect an AGI to be more on the far-reaching/utilitarian end of affecting our lives.
With regards to
[...] accept some of the human character flaws and limitations, or will it strip it all away at the risk of hurting the human until the singularity of what is considered acceptable is achieved
if we are talking about an AGI that is aiming for good in a sufficiently aligned sense, it is not obvious that a significant "risk of hurting the human" is necessary to reach a value-optimal state.
But of course a utilitarian-leaning AGI will be more willing to risk actively doing harm if it thinks that the total expected outcome is improved.
↑ comment by Michael Bright (michael-bright) · 2022-06-02T20:18:22.382Z · LW(p) · GW(p)
If one believes the orthogonality thesis [? · GW]
Yes, I do.
I would expect an AGI to be more on the far-reaching/utilitarian end of affecting our lives.
Me too.
And I'm adopting the term "AGI-humans" as of today.
But of course a utilitarian-leaning AGI will be more willing to risk actively doing harm if it thinks that the total expected outcome is improved.
...
I don't think that this is how values or beneficence works. I think that, if you had an aligned superintelligence, one that was actually an aligned superintelligence, it would be able to give you a simple, obvious-in-retrospect explanation of why "helping" people in the manner you're worried about isn't even a coherent thing for an aligned superintelligence to do.
I think the fact that we ourselves are currently unable to come up with such an explanation is a big part of why alignment remains unsolved.
↑ comment by Michael Bright (michael-bright) · 2022-06-02T20:05:52.073Z · LW(p) · GW(p)
My question stems from personal experiences where people see a certain solution as the best option and agree to apply it, only to later fail to follow through. This failure may lead to grave consequences, yet the same mistakes keep getting repeated over and over again.
The conclusion so far is that this is caused by some psychological limitations, usually emotional in nature.
An ASI may try to straighten this out for us, but it would have to take a supporting role. Is that likely if the ASI develops its own consciousness?
And it's highly likely that an ASI will have non-reductive emergent properties beyond our comprehension, just as we developed non-reductive emergent properties beyond the comprehension of other animals. In which case, this 👇🏾
I don't think that this is how values or beneficence works. I think that, if you had an aligned superintelligence, one that was actually an aligned superintelligence, it would be able to give you a simple, obvious-in-retrospect explanation of why "helping" people in the manner you're worried about isn't even a coherent thing for an aligned superintelligence to do.
But we'd be talking about an ASI, not an AGI. An aligned ASI, for that matter. I don't think it's possible to speculate about what it would be like or how it would resolve contradictions and confusions that originate from human personality traits.
But if it is modeled after us, is it possible for an AGI to choose not to handle matters the way we do when things don't go our way?
An example: we put down animals that fail to act the way we want, even if they're acting right according to their nature.
If an all-reaching AGI were to find itself in a similar situation, its scope of action would be considerably broader.
What is the question? It seems to have something to do with AGI intervening in personality disorders, but why? AGI aside, when considering the modification of humans to remove functionality that's undesirable to oneself, it's not at all clear where one would stop. Some would consider human existence (and propagation) itself to be undesirable functionality that the user is poorly equipped to recognize or confront. Meddling in personality disorders doesn't seem relevant at this stage.
↑ comment by Michael Bright (michael-bright) · 2022-06-02T19:44:07.266Z · LW(p) · GW(p)
My main concern is:
Humans can be irrational and illogical, which allows them to let things slide, for better or for worse. There are also psychological limitations and limits of reach that put a hard cap on them somewhere.
An AGI will most likely do everything it does rationally and logically, including handling emotions. And this may be detrimental to most humans.
it's not at all clear where one would stop
Yes
No comments