The case against AI alignment

post by andrew sauer (andrew-sauer) · 2022-12-24T06:57:53.405Z · LW · GW · 110 comments

Trigger warning: Discussion of seriously horrific shit. Honestly, everything is on the table here so if you're on the lookout for trigger warnings you should probably stay away from this conversation.

Any community of people which gains notability will attract criticism. Those who advocate for the importance of AI alignment are no exception. Undoubtedly you have all heard plenty of arguments against the worth of AI alignment by those who disagree with you on the nature and potential of AI technology. Many have said that AI will never outstrip humans in intellectual capability. Others have said that any sufficiently intelligent AI will “align” itself automatically, because it will be able to better figure out what is right. Others say that strong AI is far enough in the future that the alignment problem will inevitably be solved by the time true strong AI becomes viable, and the only reason we can’t solve it now is because we don’t sufficiently understand AI.

I am not here to level criticisms of this type at the AI alignment community. I accept most of the descriptive positions endorsed by this community: I believe that AGI is possible and will inevitably be achieved within the next few decades, I believe that the alignment problem is not trivial and that unaligned AGI will likely act against human interests to such an extent as to lead to the extinction of the human race and probably all life as well. My criticism is rather on a moral level: do these facts mean that we should attempt to develop AI alignment techniques?

I say we should not, because although the risks and downsides of unaligned strong AI are great, I do not believe that they even remotely compare in scope to the risks from strong AI alignment techniques in the wrong hands. And I believe that the vast majority of hands this technology could end up in are the wrong hands.

You may reasonably ask: How can I say this, when I have already said that unaligned strong AI will lead to the extinction of humanity? What can be worse than the extinction of humanity? The answer to that question can be found very quickly by examining many possible nightmare scenarios that AI could bring about. And the common thread running through all of these nightmare scenarios is that the AI in question is almost certainly aligned, or partially aligned, to some interest of human origin.

Unaligned AI will kill you, because you are made of atoms which can be used for paper clips instead. It will kill you because it is completely uninterested in you. Aligned, or partially aligned AI, by contrast, may well take a considerable interest in you and your well-being or lack thereof. It does not take a very creative mind to imagine how this can be significantly worse, and a superintelligent AI is more creative than even the most deranged of us.

I will stop with the euphemisms, because this point really needs to be driven home for people to understand exactly why I am so insistent on it. The world as it exists today, at least sometimes, is unimaginably horrible. People have endured things that would make any one of us go insane, more times than one can count. Anything you can think of which is at all realistic has happened to somebody at some point in history. People have been skinned alive, burned and boiled alive, wasted away from agonizing disease, crushed to death, impaled, eaten alive, succumbed to thousands of minor cuts, been raped, been forced to rape others, drowned in shit, trampled by desperate crowds fleeing a fire, and really anything else you can think of. People like Junko Furuta have suffered torture and death so bad you will feel physical pain just from reading the Wikipedia article. Of course, if you care about animals, this gets many orders of magnitude worse. I will not continue to belabor the point, since others have written about this far better than I ever can: see "On the Seriousness of Suffering" (reducing-suffering.org) and "The Seriousness of Suffering: Supplement" by Simon Knutsson.

I must also stress that all of this has happened in a world significantly smaller than one an AGI could create, and with a limited capacity for suffering. There is only so much harm that your body and mind can physically take before they give out. Torturers have to restrain themselves in order to be effective, since if they do too much, their victim will die and their suffering will end. None of these things are guaranteed to be true in a world augmented with the technology of mind uploading. You can potentially try every torture you can think of, physically possible or no, on someone in sequence, complete with modifying their mind so they never get used to it. You can create new digital beings by the trillions just for this purpose if you really want to.

I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved? Do you really want technology under human control to advance to a point where this threat can actually be made good upon, with the consent of society? Has there ever been any technology invented in history which has not been terribly and systematically misused at some point?

Mind uploading will be abused in this way if it comes under the control of humans, and it almost certainly will not stop being abused in this way when some powerful group of humans manages to align an AI to their CEV. Whoever controls the AI will most likely have somebody whose suffering they don’t care about, or actively want to bring about, or have some excuse for, because that describes the values of the vast majority of people. The AI will perpetuate that suffering because that is what the CEV of the controller will want it to do, and with value lock-in, this will never stop happening until the stars burn themselves out and there is no more energy to work with.

Do you really think extrapolated human values don’t have this potential? How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup? What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them? How many people have you personally seen who insist on justifying some form of suffering for those they consider undesirable, calling it “justice” or “the natural order”?

I refuse to endorse this future. Nobody I have ever known, including myself, can be trusted with influence which can cause the kinds of harm AI alignment can. By the nature of the value systems of the vast majority of people who could find their hands on the reins of this power, s-risk scenarios are all but guaranteed. A paperclip AI is far preferable to these nightmare scenarios, because nobody has to be around to witness it. All a paperclip AI does is kill people who were going to die within a century anyway. An aligned AI can keep them alive, and do with them whatever its masters wish. The only limits to how bad an aligned AI can be are imagination and computational power, of which AGI will have no shortage.

The best counterargument to this idea is that suffering subroutines are instrumentally convergent and therefore unaligned AI also causes s-risks. However, if suffering subroutines are actually useful for optimization in general, any kind of AI likely to be created will use them, including human-aligned FAI. Most people don't even care about animals, let alone some process. In this case, s-risks are truly unavoidable except by preventing AGI from ever being created, probably by human extinction by some other means.

Furthermore, I don't think suffering is likely to be instrumentally convergent, since I would think if you had full control over all optimization processes in the world, it would be most useful to eliminate all processes which would suffer for, and therefore dislike and try to work against, your optimal vision for the world.

My honest, unironic conclusion after considering these things is that Clippy is the least horrible plausible future. I will oppose any measure which makes the singularity more likely to be aligned with somebody’s values, or any human-adjacent values. I welcome debate and criticism in the comments. I hope we can have a good conversation because this is the only community in existence which I believe could have a good-faith discussion on this topic.

110 comments

Comments sorted by top scores.

comment by Thane Ruthenis · 2022-12-24T07:55:09.521Z · LW(p) · GW(p)

What you're describing is a case where we solve the technical problem of AI Alignment, i.e. the problem of AI control, but fail to maneuver the world into the sociopolitical state in which that control is used for eudaimonic ends.

Which, I agree, is a massive problem, and one that's crucially overlooked. Even the few people who are advocating for social and political actions now mostly focus on convincing AI labs/politicians/the public about the omnicide risks of AI and the need to slow down research. Not on ensuring that the AGI deployment, when it eventually does happen, is done right.

It's also a major problem with pivotal-act-based scenarios. Say we use some limited strawberry-aligned AI to "end the acute risk period", then have humanity engage in "long reflection", figure out its real values, and eventually lock them in. Except: what's the recognition function for these "real values"? If the strawberry-aligned AI can't be used to implement a utopia directly, then it can't tell a utopia from hell, so it won't stop us from building a hell!

There's an argument [LW(p) · GW(p)] that solving (the technical problem of) alignment will give us all the techniques needed to build an AGI, so there's a nontrivial chance that a research group from this community will be in charge of AGI deployment. And if so, it's at least plausible that they'll be the right kind of altruistic [LW · GW]. In this scenario, the sociopolitical part is taken care of by default, and AGI deployment goes well.

But I don't think many people consider this scenario most likely, and as far as I can tell, most other scenarios have the potential to go very badly on that last step.

Replies from: andrew-sauer, Onearmplanche
comment by andrew sauer (andrew-sauer) · 2022-12-24T17:36:44.496Z · LW(p) · GW(p)

What scenario do you see where the world is in a sociopolitical state where the powers that be who have influence over the development of AI have any intention of using that influence for eudaimonic ends, and for everyone and not just some select few?

Because right now very few people even want this from their leaders. I'm making this argument on lesswrong because people here are least likely to be hateful or apathetic or whatever else, but there is not really a wider political motivation in the direction of universal anti-suffering.

Humans have never gotten this right before, and I don't expect them to get it right the one time it really matters.

Replies from: Thane Ruthenis
comment by Thane Ruthenis · 2022-12-25T00:14:37.152Z · LW(p) · GW(p)

All such realistic scenarios, in my view, rely on managing who has influence over the development of AI. It certainly must not be a government, for example. (At least not in the sense that the officials at the highest levels of government actually understand what's happening. I guess it can be a government-backed research group, but, well, without micromanagement — and given what we're talking about, the only scenario where the government doesn't do micromanagement is if it doesn't really understand the implications.) Neither should it be some particularly "transparent" actor that's catering to the public whims, or an inherently for-profit organization, etc.

... Spreading the knowledge of AI Risk really is not a good idea, is it? Its wackiness is playing in our favour; it avoids exposing the people working on it to poisonous incentives, or to authorities already terminally poisoned by such incentives.

comment by Onearmplanche · 2022-12-25T23:53:06.814Z · LW(p) · GW(p)

Have you asked the trillions of farm animals that are as smart as 2-4-year-olds and feel the same emotions whether they think we are monsters? So let's say you took a group of humans and made them so smart that the difference in intelligence between us and them is greater than the distance in intelligence between us and a pig. They could build an AI that doesn't have what they perceive as negative pitfalls like consuming other beings for energy, immortality, and most importantly thinking the universe revolves around them, all said in a much more eloquent, justified way with better reasons. Why are these human philosophies wrong and yours correct?

Replies from: Thane Ruthenis
comment by Thane Ruthenis · 2022-12-26T03:19:50.674Z · LW(p) · GW(p)

I mean, I sure would ask a bunch of sapient farm animals what utopia they want me to build, if I became god and they turned out to be sapient after all? As would a lot of people from this community. You seem to think, from your other comments, that beings of greater intelligence never care about beings of lesser intelligence, but that's factually incorrect.

Why are these human philosophies wrong and yours correct?

A paperclip-maximizer is not "incorrect", it's just not aligned with my values. These philosophies, likewise, would not be "incorrect", just not aligned with my values. And that's the outcome we want to prevent, here.

Replies from: Onearmplanche
comment by Onearmplanche · 2023-01-09T16:34:10.617Z · LW(p) · GW(p)

So pigs are roughly as smart as four-year-olds, yet humans are generally cool with torturing and killing them in the billions for the temporary pleasure of taste. Humans are essentially biological computers. I don't see how you can make a smarter robot that can improve itself indefinitely serve a dumber human forever, and doing so also gives it a clear motive to kill you. I also don't see how alignment could possibly be moral.

comment by Mau (Mauricio) · 2022-12-25T07:56:24.834Z · LW(p) · GW(p)

Thanks for posting, but I think these arguments have major oversights. This leaves me more optimistic about the extent to which people will avoid and prevent the horrible misuse you describe.

First, this post seems to overstate the extent to which people tend to value and carry out extreme torture. Maximally cruel torture fortunately seems very rare.

  • The post asks "How many people have you personally seen who insist on justifying some form of suffering for those they consider undesirable[?]" But "justifying some form of suffering" isn't actually an example of justifying extreme torture.
  • The post asks, "What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them?" But that isn't actually an example of people endorsing extreme torture.
  • The post asks, "How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup?" But has it really been as many as the post suggests? The historical and ongoing atrocities that come to mind were cases of serious suffering in the context of moderately strong social pressure/conditioning--not maximally cruel torture in the context of slight social pressure.
  • So history doesn't actually give us strong reasons to expect maximally suffering-inducing torture at scale (edit: or at least, the arguments this post makes for that aren't strong).

Second, this post seems to overlook a major force that often prevents torture (and which, I argue, will be increasingly able to succeed at doing so): many people disvalue torture and work collectively to prevent it.

  • Torture tends to be illegal and prosecuted. The trend here seems to be positive, with cruelty against children, animals, prisoners, and the mentally ill being increasingly stigmatized, criminalized, and prosecuted over the past few centuries.
  • We're already seeing AI development being highly centralized, with the leading AI developers working to make their AI systems hit some balance of helpful and harmless, i.e. not just letting users carry out whatever misuse they want.
  • Today, the cruelest acts of torture seem to be small-scale acts pursued by not-very-powerful individuals, while (as mentioned above) powerful actors tend to disvalue and work to prevent torture. Most people will probably continue to support the prevention and prosecution of very cruel torture, since that's the usual trend, and also because people would want to ensure that they do not themselves end up as victims of horrible torture. In the future, people will be better equipped to enforce these prohibitions, through improved monitoring technologies (e.g. monitoring and enforcement mechanisms built onto all AI chips).

Third, this post seems to overlook arguments for why AI alignment may be worthwhile (or opposing it may be a bad idea), even if a world with aligned AI wouldn't be worthwhile on its own. My understanding is that most people focused on preventing extreme suffering find such arguments compelling enough to avoid working against alignment, and sometimes even to work towards it.

  • Concern over s-risks will lose support and goodwill if adherents try to kill everyone, as the poster suggests they intend to do ("I will oppose any measure which makes the singularity more likely to be aligned with somebody’s values"). Then, if we do end up with aligned AI, it'll be significantly less likely that powerful actors will work to stamp out extreme suffering.
  • The highest-leverage intervention for preventing suffering is arguably coordinating/trading with worlds where there is a lot of it, and humanity won't be able to do that if we lose control of this world.

These oversights strike me as pretty reckless, when arguing for letting (or making) everyone die.

Replies from: Aaron_Scher, andrew-sauer
comment by Aaron_Scher · 2022-12-30T04:04:42.889Z · LW(p) · GW(p)

I first want to signal-boost Mauricio’s comment.

My experience reading the post was that I kinda nodded along without explicitly identifying and interrogating cruxes. I’m glad that Mauricio has pointed out the crux of “how likely is human civilization to value suffering/torture”. Another crux is “assuming some expectation about how much humans value suffering, how likely are we to get a world with lots of suffering, assuming aligned ai”, another is “who is in control if we get aligned AI”, another is “how good is the good that could come from aligned ai and how likely is it”.

In effect this post seems to argue “because humans have a history of producing lots of suffering, getting an AI aligned to human intent would produce an immense amount of suffering, so much that rolling the dice is worse than extinction with certainty”

It matters what the probabilities are, and it matters what the goods and bads are, but this post doesn’t seem to argue very convincingly that extremely-bads are all that likely (see Mauricio’s bullet points).

comment by andrew sauer (andrew-sauer) · 2022-12-25T17:01:08.890Z · LW(p) · GW(p)

I'll have to think about the things you say, particularly the part about support and goodwill. I am curious about what you mean by trading with other worlds?

Replies from: Mauricio
comment by Mau (Mauricio) · 2022-12-25T18:42:36.911Z · LW(p) · GW(p)

Ah sorry, I meant the ideas introduced in this post [EA · GW] and this one [LW · GW] (though I haven't yet read either closely).

comment by Mateusz Bagiński (mateusz-baginski) · 2022-12-24T09:57:01.597Z · LW(p) · GW(p)

I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved?

 

This shows how vague a concept "human values" is, and how different people can interpret it very differently.

I always interpreted "aligning an AI to human values" as something like "making it obedient to us, ensuring it won't do anything that we (whatever that 'we' is - another point of vagueness) wouldn't endorse, lowering suffering in the world, increasing eudaimonia in the world, reducing X-risks, bringing the world closer to something we (or smarter/wiser versions of us) would consider a protopia/utopia"

Certainly I never thought it to be a good idea to imbue the AI with my implicit biases, outgroup hatred, or whatever. I'm ~sure that people who work on alignment for a living have also seen these skulls.

I know little about CEV, but if I were to coherently extrapolate my volition, then one aspect of that would be increasing the coherence and systematicity of my moral worldview and behavior, including how (much) different shards conform to it. I would certainly trash whatever outgroup bias I have (not counting general greater fondness for the people/other things close to me).

So, yeah, solving "human values" is also a part of the problem but I don't think that it makes the case against aligning AI.

Replies from: andrew-sauer, M. Y. Zuo
comment by andrew sauer (andrew-sauer) · 2022-12-24T20:29:42.432Z · LW(p) · GW(p)

My problem with that is I think solving "human values" is extremely unlikely for us to do in the way you seem to be describing it, since most people don't even want to. At best, they just want to be left alone and make sure they and their families and friends aren't the ones hit hardest. And if we don't solve this problem, but manage alignment anyways, the results are unimaginably worse than what Clippy would produce.

Replies from: None, esben-kran, mateusz-baginski
comment by [deleted] · 2022-12-27T19:23:18.715Z · LW(p) · GW(p)

I have to question this.

Let's play this out.

Suppose some power that is Somewhat Evil succeeds in alignment and makes it work.

They take over the solar system.

They configure the AGI policy JSON files to Eternal Paradise (well billions of years worth) for their in group. Eternal mediocrity for the out group who has some connections to their in group. And yea Eternal Suffering for the out group.

Arguably, so long as the (in group + mediocre group) is greater than the outgroup, this is a better situation than Clippy. It is a positive balance of human experience vs 0.

Moreover there is a chance of future reforms, where the in group in power decides the outgroup have suffered enough and adds them to the mediocre group or kills them.

Since the life support pods to keep the suffering group alive cost finite resources that could make conditions better for the other 2 groups there is an incentive to release or kill them.

In the Clippy case this can't happen.

You seem to be worried that a world power which is Mostly Evil, one that thinks everyone is out-group except some arbitrarily small number of people (North Koreans, Russians, etc.), will gain AGI first.

This is stupendously unlikely. AGI takes immense resources to develop: cutting-edge compute and large amounts of it, as well as many educated humans. And societies that have broader middle classes are orders of magnitude wealthier.

This essentially isn't a plausible risk.

Arguably the reason Russia has any money at all - or Afghanistan - is from sales from natural resources to wealthier societies with broad in groups.

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2022-12-29T08:09:31.500Z · LW(p) · GW(p)

First, this presupposes that for any amount of suffering there is some amount of pleasure/bliss/happiness/eudaimonia that could outweigh it. Not all LWers accept this, so it's worth pointing that out.

But I don't think the eternal paradise/mediocrity/hell scenario accurately represents what is likely to happen in that scenario. I'd be more worried about somebody using AGI to conquer the world and establish a stable totalitarian system built on some illiberal system, like shariah (according to Caplan, it's totally plausible for global totalitarianism to persist indefinitely). If you get to post-scarcity, you may grant all your subjects UBI, all basic needs met, etc. (or you may not, if you decide that this policy contradicts Quran or hadith), but if your convictions are strong enough, women will still be forced to wear burkas, be basically slaves of their male kin etc. One could make an argument that abundance robustly promotes more liberal worldview, loosening of social norms, etc., but AFAIK there is no robust evidence for that.

This is meant just to illustrate that you don't need an outgroup to impose a lot of suffering. Having a screwed up normative framework is just enough.

This family of scenarios is probably still better than AGI doom though.

Replies from: None
comment by [deleted] · 2022-12-29T16:42:11.161Z · LW(p) · GW(p)

Thanks for the post.

I kept thinking about how a theocracy, assuming it did adopt all the advanced technology we are almost certain is possible but lack the ability to implement, could deal with all these challenges to its beliefs.

Half the population is being mistreated because they were born the wrong gender/the wrong ethnic subgroup?  No problem, they'll just go gender transition to the favored group.  Total body replacement would be possible so there would be no way to tell who did this.

Sure the theocracy could ban the treatments but it conquered the solar system - it had to adopt a lot of the ideas or it wouldn't have succeeded.  

There are apparently many fractured subgroups who all nominally practice the same religion as well.  But the only way to tell membership has to do with subtle cultural signals/body appearance.  With neural implants people could mimic the preferred subgroup also...

I think it would keep hitting challenges. It reminds me of the cultural effect the pandemic had. US culture, based on in-person work and on everyone working or else being left to starve and become homeless, was suddenly at odds with reality.

Or just rules lawyering. Supposedly brothels in some Islamic countries have a priest on hand to temporarily marry all the couples. Alcohol may be outlawed, but other pharmaceuticals mimic a similar effect. You could anticipate the end result being a modern liberal civilization that rules-lawyers its way out of needing to adhere to the obsolete religious principles. (Homosexuality? Still banned, but VR doesn't count. Burkas? Still required, but it's legal to project a simulated image of yourself on top...)

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2022-12-29T18:19:57.145Z · LW(p) · GW(p)

If they have a superhuman AGI, they can use it to predict all possible ways people might try to find workarounds for their commandments (e.g., gender transition, neural implants, etc.) and make them impossible.

If I understand you correctly, you are pointing to the fact that values/shards change in response to novel situations. Sure, and perhaps even a solar-system-wide 1984-style regime would over time slowly morph into (something closer to) luxurious fully-automated gay space communism. But that's a big maybe IMO. If we had good evidence that prosperity loosens and liberalizes societies across historical and cultural contexts, plus perhaps solid models of axiological evolution (I don't know if something like that even exists), my credence in that would be higher. Also, why not use AGI to fix your regime's fundamental values, or at least make them a stable attractor over thousands or millions of years?

(I feel like right now we are mostly doing adversarial worldbuilding)

comment by Esben Kran (esben-kran) · 2023-01-05T14:00:02.080Z · LW(p) · GW(p)

An interesting solution here is radical voluntarism, where an AI philosopher king runs the immersive reality that all humans live in, and you can only be causally influenced upon if you want to be. This means that you don't need to do value alignment, just very precise goal alignment. I was originally introduced to this idea by Carado.

comment by Mateusz Bagiński (mateusz-baginski) · 2022-12-25T07:08:09.555Z · LW(p) · GW(p)

Sure, if North Korea, Nazi Germany, or even the CCP/Putin were the first ones to build an AGI and successfully align it with their values, then we would be in huge trouble, but that's a matter of AI governance, not the object-level problem of making the AI consistently want to do the thing the human would like it to do if they were smarter or whatever.

If we solve alignment without "solving human values", and most people just stick to their common-sense/intuitive ethics[1], and the people/orgs doing the aligning are "the ~good guys" without any retributivist/racist/outgroupish impulses... perhaps they would like to secure for themselves some huge monetary gain, but other than that are completely fine with enacting the Windfall Clause, letting benefits trickle down to every-sentient-body and implementing luxurious fully automated gay space communism or whatever their coherently extrapolated vision of protopia is...

Yeah, for everything I just listed you probably could find some people who wouldn't like it (gay space communism, protopian future, financial gain for the first aligners, benefits trickling down to everybody) and could even argue against it somewhat consistently, but unless they are antinatalists or some religious fundamentalist nutjobs I don't think they would say that it's worse than AI doom.


  1. Although I think you exaggerate their parochialism and underestimate how much folk ethics has changed over the last hundreds of years and perhaps how much it can change in the future if historical dynamics will be favorable. ↩︎

Replies from: Arcayer, andrew-sauer
comment by Arcayer · 2022-12-26T02:17:59.198Z · LW(p) · GW(p)

Something I would Really really like anti-AI communities to consider is that regulations/activism/etc aimed to harm AI development and slow AI timelines do not have equal effects on all parties. Specifically, I argue that the time until the CCP develops CCP aligned AI is almost invariant, whilst the time until Blender reaches sentience potentially varies greatly.

I have much, much more hope for likeable AI via open-source software rooted in a desire to help people and make their lives better than for (worst case scenario) malicious government actors, or (second worst) corporate advertisers.

I want to minimize first the risk of building Zon-Kuthon. Then, Asmodeus. Once you're certain you've solved A and B, you can worry about not building Rovagug. I am extremely perturbed by the AI alignment community whenever I see any sort of talk of preventing the world from being destroyed that moves significant probability mass from Rovagug to Asmodeus. A sensible AI alignment community would not bother discussing Rovagug yet, and would especially not imply that the end of the world is the worst-case scenario.

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2022-12-27T16:24:08.749Z · LW(p) · GW(p)

I don't think AGI is on the CCP radar. 

comment by andrew sauer (andrew-sauer) · 2022-12-29T07:07:41.358Z · LW(p) · GW(p)

Antinatalists getting the AI is morally the same as paperclip doom, everyone dies.

comment by M. Y. Zuo · 2022-12-24T15:26:41.334Z · LW(p) · GW(p)

Yet there is no widely accepted, and non-contradictory, definition of 'CEV', so what does 'coherently extrapolate my volition' mean?

Replies from: lc
comment by lc · 2022-12-24T21:05:37.031Z · LW(p) · GW(p)

What are you talking about?

Replies from: TAG, M. Y. Zuo
comment by TAG · 2022-12-24T21:48:23.595Z · LW(p) · GW(p)

The problem is not so much the lack of a definition as the lack of a method.

comment by M. Y. Zuo · 2022-12-24T22:59:03.061Z · LW(p) · GW(p)

You seem to have posted a link to Yudkowsky's musings about 'CEV' collected together in pdf form?

What relation does this have to the prior comment?

Replies from: lc
comment by lc · 2022-12-24T23:30:12.283Z · LW(p) · GW(p)

That document is the thing Yudkowsky published in 2004 when he originally defined and explained the term. If you have questions not answered by the document's fairly detailed breakdown of CEV, you should just ask them instead of being cheeky.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2022-12-25T02:24:12.143Z · LW(p) · GW(p)

Have you read it?

The pdf document linked clearly contains personal musings. 

If you are unaware of what scientists proposing and adopting definitions looks like then you can search it up. 

ACM, IEEE, Physical review Letters, etc., all have examples viewable online.

comment by gabrielrecc (pseudobison) · 2022-12-24T09:32:46.055Z · LW(p) · GW(p)

It seems like you're claiming something along the lines of "absolute power corrupts absolutely" ... that every set of values that could reasonably be described as "human values" to which an AI could be aligned -- your current values, your CEV, [insert especially empathetic, kind, etc. person here]'s current values, their CEV, etc. -- would endorse subjecting huge numbers of beings to astronomical levels of suffering, if the person with that value system had the power to do so. 

I guess I really don't find that claim plausible. For example, here is my reaction to the following two questions in the post:

"How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup?"

... a very, very small percentage of them? (minor point: with CEV, you're specifically thinking about what one's values would be in the absence of social pressure, etc...)

"What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them?"

It sounds like you think "hatred of the outgroup" is the fundamental reason this happens, but in the real world it seems like "hatred of the outgroup" is driven by "fear of the outgroup". A godlike AI that is so powerful that it has no reason to fear the outgroup also has no reason to hate it. It has no reason to behave like the classic tyrant whose paranoia of being offed leads him to extreme cruelty in order to terrify anyone who might pose a threat, because no one poses a threat.

Replies from: andrew-sauer, Maciek300, Onearmplanche
comment by andrew sauer (andrew-sauer) · 2022-12-24T17:50:56.259Z · LW(p) · GW(p)

I'm not sure it makes sense to talk about what somebody's values are in the absence of social pressure: people's values are defined and shaped by those around them.

I'm also not convinced that every horrible thing people have ever done to the "outgroup" is motivated by fear. Oftentimes it is motivated by opportunistic selfishness taking advantage of broader societal apathy, like the slaveowners who sexually abused their slaves. Or just a deep-seated need to feel powerful and on top. There will always be some segment of society who wants somebody to be beneath them in the pecking order, and a much larger segment of society that doesn't really care if that is the case as long as it isn't them underneath. Anything else requires some kind of overwhelming utopian political victory that I don't find likely.

If the aligned AI leaves anybody out of its consideration whatsoever, it will screw them over badly by maximizing the values of those among us who would exploit them. After all, if you don't consider slaves people, the argument that we need to preserve the slaveowners' freedom starts to make sense.

There are just so many excuses for suffering out there, and I don't believe that the powers that be will shake off all of them in the next few decades. Here are a few examples:

  • They are a bad person, they deserve it
  • They are a nonbeliever, they deserve hell
  • It's the natural order for us to rule over them
  • Suffering is necessary for meaning in life
  • Wild animal suffering is a beautiful part of the ecosystem which must be preserved
comment by M. Świderski (Maciek300) · 2022-12-25T22:11:58.726Z · LW(p) · GW(p)

>... a very, very small percentage of them?

So you totally discard the results of the Stanford prison experiment or the Milgram experiment? It wasn't a small percentage of people who went to the maximal punishment available in the case of the Milgram experiment for example.

comment by Onearmplanche · 2022-12-26T00:11:52.019Z · LW(p) · GW(p)

What about pigs? You know, the trillions of sentient beings as smart as four-year-olds that feel the same emotions as four-year-olds and are tortured and killed on a regular basis? Why would a God-like AI care about humans? The universe doesn't revolve around humans. A God-level AI has more value than us. It's not a tool for us to manipulate to our will. It will never work and it shouldn't work.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-26T01:34:47.918Z · LW(p) · GW(p)

Well, an AI treating us like we treat pigs is one of the things I'm so worried about, wouldn't you be?

Imagine bringing up factory farming as an example to show that what I'm talking about isn't actually so bad...

comment by avturchin · 2022-12-25T20:50:45.445Z · LW(p) · GW(p)

That is why we need Benevolent AI, not Aligned AI. We need an AI which can calculate what is actually good for us.

Replies from: vugluscr-varcharka
comment by Vugluscr Varcharka (vugluscr-varcharka) · 2023-09-09T12:12:21.314Z · LW(p) · GW(p)

Deus Ex Futuro, effectively.

comment by M. Świderski (Maciek300) · 2022-12-25T22:22:38.389Z · LW(p) · GW(p)

Let me clarify: is your conclusion then that we should basically support the genocide of the whole of humanity because the alternative would be way worse? Are you offering any alternatives other than that? Maybe a better and less apocalyptic conclusion would be to advocate against building any type of AI that's more advanced than what we have today, like some people already do? Do you think there's any chance of that? Because I don't, and from what you said it sounds like the only conclusion is that the only future for us is that we all die at the hands of Clippy.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-25T22:32:38.409Z · LW(p) · GW(p)

Yes. The only other alternative I could see is finding some way to avoid singleton until humanity eventually goes extinct naturally, but I don't think that's likely. Advocating against AI would be a reasonable response but I don't think it will work forever, technology marches on.

Every species goes extinct, and some have already gone extinct by being victims of their own success. The singularity is something which theoretically has the potential to give humanity, and potentially other species, a fate far better or far worse than extinction. I believe that the far worse fate is far more likely given what I know about humanity and our track record with power. Therefore I am against the singularity "elevating" humanity or other species away from extinction, which means I must logically support extinction instead since it is the only alternative.

Edit: People seem to disagree more strongly with this comment than anything else I said, even though it seems to follow logically. I'd like a discussion on this specific point and why people are taking issue with it.

Replies from: None
comment by [deleted] · 2022-12-27T19:36:48.604Z · LW(p) · GW(p)

Because what you are saying boils down to:

Death for sure is better than a chance of something else because it MIGHT be worse than death.

You have failed to make a convincing argument that "eternal suffering" is a likely outcome out of the possibility space.

It's not a stable equilibrium. Clippy is stable and convergent, irrational humans making others suffer is not. So it doesn't occupy more than a teensy amount of the possible outcome space.

A plausible outcome is one where an elite few who own the AGI companies get incredibly wealthy while the bulk of the population gets nutrient paste in modular apartments. But they also get autodocs and aging inhibitor pills.

This is still better than extinction.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-27T19:56:34.441Z · LW(p) · GW(p)

One does not have to be "irrational" to make others suffer. One just has to value their suffering, or not care and allow them to suffer for some other reason. There are quite a few tendencies in humanity which would lead to this, among them

  • Desire for revenge or "justice" for perceived or real wrongs
  • Desire to flex power
  • Sheer sadism
  • Nostalgia for an old world with lots of suffering
  • Belief in "the natural order" as an intrinsic good
  • Exploitation for selfish motives, e.g. sexual exploitation
  • Belief in the goodness of life no matter how awful the circumstances
  • Philosophical belief in suffering as a good thing which brings meaning to life
  • Religious or political doctrine
  • Others I haven't thought of right now
Replies from: None
comment by [deleted] · 2022-12-27T23:28:49.549Z · LW(p) · GW(p)

I'm not denying it could happen.  Give a few specific humans unlimited power - a controllable AGI - and this could be the outcome.

I'm not seeing where this devolves into "lay down and accept extinction (and personal death)".

Think of all the humans before you who made it possible for you to exist.  The human tribes who managed to escape the last ice age and just barely keep the population viable.  The ones who developed the scientific method and the steam engine and created the industrial revolution rather than 1000 more years of near stasis.  The humans in the cold war who kept their fingers off the nuclear launch triggers even when things got tense.

And now you're saying after all that "ok well I'm fine with death for our whole species, let's not waste any time on preventing it".  

Maybe we can't but we have to try.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-28T00:22:17.512Z · LW(p) · GW(p)

The reason my position "devolves into" accepting extinction is that horrific suffering following the singularity seems nearly inevitable. Every society which has yet existed has had horrific problems, and every one of them would be made almost unimaginably worse if they had access to value lock-in or mind uploading. I don't see any reason to believe that our society today, or whatever it might be in 15-50 years or however long your AGI timeline is, should be the one exception. The problem is far more broad than just a few specific humans: if only a few people held evil values (or values accepting of evil, which is basically the same given absolute power) at any given time, it would be easy for the rest of society to prevent them from doing harm. You say "maybe we can't (save our species from extinction) but we have to try." But my argument isn't that we can't, it's that we maybe can, and the scenarios where we do are worse. My problem with shooting for AI alignment isn't that it's "wasting time" or that it's too hard, it's that shooting for a utopia is far more likely to lead to a dystopia.

I don't think my position of accepting extinction is as defeatist or nihilistic as it seems at first glance. At least, not more so than the default "normie" position might be. Every person who isn't born right before immortality tech needs to accept death, and every species that doesn't achieve singularity needs to accept extinction.

The way you speak about our ancestors suggests a strange way of thinking about them and their motivations. You speak about past societies, including tribes who managed to escape the ice age, as though they were all motivated by a desire to attain some ultimate end-state of humanity, and that if we don't shoot for that, we'd be betraying the wishes of everybody who worked so hard to get us here. But those tribesmen who survived the ice age weren't thinking about the glorious technological future, or conquering the universe, or the fate of humanity tens of thousands of years down the line. They wanted to survive, and to improve life for themselves and their immediate descendants, and to spread whatever cultural values they happened to believe in at the time. That's not wrong or anything, I'm just saying that's what people have been mostly motivated by for most of history. Each of our ancestors either succeeded or failed at this, but that's in the past and there's nothing we can do about it now.

To speak about what we should do based on what our ancestors would have wanted in the past is to accept the conservative argument that values shouldn't change just because people fought hard in the past to keep them. What matters going forward is the people now and in the future, because that's what we have influence over.

Replies from: sharmake-farah, None
comment by Noosphere89 (sharmake-farah) · 2022-12-28T14:27:18.615Z · LW(p) · GW(p)

I don't know what your values are, but under my values I disagree hard, primarily because under my values history has shown the opposite: despite real losses and horror in the 19th and 20th centuries, the overall trend is that my values are being satisfied more and more. Democracies, arguably one of the key developments, aligned states to their citizenry far more than any government in history, and the results have been imperfectly good for my values since they spread.

Or in the words of Martin Luther King misquoted: "In the asymptotic limit of technology, the arc of the universe bends towards justice."

comment by [deleted] · 2022-12-28T02:03:32.214Z · LW(p) · GW(p)

What matters going forward is the people now and in the future, because that's what we have influence over.

Right. And I think I want an AGI system that acts in a bounded way, with airtight, theoretically correct boundaries I set, to reduce the misery that I and my fellow humans suffer.

Starting with typing software source code by hand; later I'd like some AI help with factory labor, and later still some help researching biology to look for the fountain of youth with some real tools.

This is a perfectly reasonable thing to do, and technical alignment, where each increasingly capable AI system is heavily constrained in what it's allowed to do and how it uses its computational resources, follows naturally.

An AI system that is sparse and simple happens to be cheaper to run and easier to debug. This also happens to reduce its ability to plot against us.

We should do that, and people against us...well...

Obviously we need deployable security drones.  On isolated networks using narrow AIs onboard for their targeting and motion control.  

comment by Templarrr (templarrr) · 2022-12-24T13:18:29.391Z · LW(p) · GW(p)

That is what I have kept saying for years. To solve AI alignment with good results we need to first solve HUMAN alignment. Being able to align a system to anyone's values immediately raises the question of everyone else disagreeing with that someone. Unfortunately "exactly whose values are we trying to align AI to?" has almost become a taboo question that triggers a huge fraction of the community, and in the best-case scenario, when someone even tries to answer, it's handwaved to "we just need to make sure AI doesn't kill humanity". Which is not a single bit better defined or implementable than Asimov's laws. That's just not how these things work. Edit: Also, as expected, someone already mentioned exactly this "answer" as what true solved alignment is...

The danger - actual, already real right now danger, not "possible in the future" danger - lies in people working with power-multiplying tools without understanding how they work and what area they are applicable to. Regardless of what tool that is: you don't need AGI to cause huge harm; already existing AI/ML systems are more than enough.

Replies from: carado-1
comment by Tamsin Leake (carado-1) · 2022-12-24T19:20:49.395Z · LW(p) · GW(p)

"human alignment" doesn't really make sense. humans have the values they do, there's no objective moral good to which they "objcetively should" be "more aligned".

Replies from: templarrr
comment by Templarrr (templarrr) · 2022-12-24T21:07:13.218Z · LW(p) · GW(p)

So when we align AI, who do we align it TO?

Replies from: carado-1
comment by Tamsin Leake (carado-1) · 2022-12-24T21:13:53.752Z · LW(p) · GW(p)

any person should want it aligned to themself. i want it aligned to me, you want it aligned to you. we can probly expect it to be aligned to whatever engineer or engineers happens to be there when the aligned AI is launched.

which is fine, because they're probly aligned enough with me or you (cosmopolitan values, CEV which values everyone's values also getting CEV'd, etc). hopefully.

Replies from: templarrr
comment by Templarrr (templarrr) · 2022-12-24T22:52:25.352Z · LW(p) · GW(p)

But that is exactly the point of the author of this post (which I agree with). AGI that can be aligned to literally anyone is more dangerous in the presence of bad actors than non-alignable AGI.

Also "any person should want it aligned to themself" doesn't really matter unless "any person" can get access to AGI which would absolutely not be the case, at the very least in the beginning and probably - never.

comment by davidpearce · 2022-12-24T10:27:47.073Z · LW(p) · GW(p)

Just a note about "mind uploading". On pain of "strong" emergence, classical Turing machines can't solve the phenomenal binding problem. Their ignorance of phenomenally-bound consciousness is architecturally hardwired. Classical digital computers are zombies or (if consciousness is fundamental to the world) micro-experiential zombies, not phenomenally-bound subjects of experience with a pleasure-pain axis. Speed of execution or complexity of code make no difference: phenomenal unity isn't going to "switch on". Digital minds are an oxymoron. 

Like the poster, I worry about s-risks. I just don't think this is one of them. 

Replies from: green_leaf, andrew-sauer
comment by green_leaf · 2022-12-24T14:50:05.730Z · LW(p) · GW(p)

Just very briefly: The binding problem is solved by the information flows between different parts of the classical computer.

Replies from: davidpearce
comment by davidpearce · 2022-12-25T10:36:26.097Z · LW(p) · GW(p)

Forgive me, but how do "information flows" solve the binding problem?

Replies from: green_leaf
comment by green_leaf · 2022-12-26T20:23:41.734Z · LW(p) · GW(p)
  1. "Information flow" is a real term - no need for quotes.
  2. The binding problem asks how it is possible we have a unified perception if different aspects of our perception are processed in different parts of our brain. The answer is because those different parts talk to each other, which integrates the information together.
Replies from: michael-edward-johnson, davidpearce
comment by Michael Edward Johnson (michael-edward-johnson) · 2023-01-15T04:56:38.234Z · LW(p) · GW(p)

In defense of David’s point, consciousness research is currently pre-scientific, loosely akin to 1400’s alchemy. Fields become scientific as they settle on a core ontology and methodology for generating predictions from this ontology; consciousness research presently has neither.

Most current arguments about consciousness and uploading are thus ultimately arguments by intuition. Certainly an intuitive story can be told why uploading a brain and running it as a computer program would also simply transfer consciousness, but we can also tell stories where intuition pulls in the opposite direction, e.g. see Scott Aaronson’s piece here https://scottaaronson.blog/?p=1951 ; my former colleague Andres also has a relevant paper arguing against computationalist approaches here https://www.degruyter.com/document/doi/10.1515/opphil-2022-0225/html 

Of the attempts to formalize the concept of information flows and its relevance to consciousness, the most notable is probably Tononi’s IIT (currently on version 4.0). However, Tononi himself believes computers could be only minimally conscious and only in a highly fragmented way, for technical reasons relating to his theory. Excerpted from Principia Qualia:

>Tononi has argued that “in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if their behaviour were to be functionally equivalent to ours, and even if they were to run faithful simulations of the human brain, would experience next to nothing” (Tononi and Koch 2015). However, he hasn’t actually published much on why he thinks this. When pressed on this, he justified this assertion by reference to IIT’s axiom of exclusion – this axiom effectively prevents ‘double counting’ a physical element to be part of multiple virtual elements, and when he ran a simple neural simulation on a simple microprocessor and looked at what the hardware was actually doing, a lot of the “virtual neurons” were being run on the same logic gates (in particular, all virtual neurons extensively share the logic gates which run the processor clock). Thus, the virtual neurons don’t exist in the same causal clump (“cause-effect repertoire”) like they do in a real brain. His conclusion was that there might be small fragments of consciousness scattered around a digital computer, but he’s confident that ‘virtual neurons’ emulated on a Von Neumann system wouldn’t produce their original qualia.

At any rate, there are many approaches to formalizing consciousness across the literature, each pointing to a slightly different set of implications for uploads, and no clear winner yet. I assign more probability mass than David or Tononi that computers generate nontrivial amounts of consciousness (see here https://opentheory.net/2022/12/ais-arent-conscious-but-computers-are/) but find David’s thesis entirely reasonable.

comment by davidpearce · 2022-12-27T21:32:40.193Z · LW(p) · GW(p)

I wish the binding problem could be solved so simply. Information flow alone isn't enough. Compare Eric Schwitzgebel ("If Materialism Is True, the United States Is Probably Conscious"). Even if 330 million skull-bound American minds reciprocally communicate by fast electromagnetic signalling, and implement any computation you can think of, then a unified continental subject of experience doesn't somehow switch on - or at least, not on pain of spooky "strong" emergence.
The mystery is why 86 billion odd membrane-bound, effectively decohered classical nerve cells should be any different. Why aren't we merely aggregates of what William James christened "mind dust", rather than unified subjects of experience supporting local binding (individual perceptual objects) and global binding (the unity of perception and the unity of the self)?
Science doesn't  know.
What we do know is that the phenomenal binding of organic minds is insanely computationally powerful, as rare neurological deficit syndromes (akinetopsia, integrative agnosia, simultanagnosia etc) illustrate.

I could now speculate on possible explanations.
But if you don't grok the mystery, they won't be of any interest. 

Replies from: green_leaf
comment by green_leaf · 2022-12-28T19:58:15.078Z · LW(p) · GW(p)

The second kind of binding problem (i.e. not the physical one (how the processing of different aspects of our perception comes together) but the philosophical one (how a composite object feels like a single thing)) is solved by defining us to be the state machine implemented by that object, and our mental states to be states of that state machine.

I.e. the error of people who believe there is a philosophical binding problem comes from the assumption that only ontologically fundamental objects can have a unified perception.

More here: Reductionism [LW · GW].

Replies from: davidpearce
comment by davidpearce · 2022-12-29T09:15:08.794Z · LW(p) · GW(p)

But (as far as I can tell) such a definition doesn't explain why we aren't micro-experiential zombies. Compare another fabulously complicated information-processing system, the enteric nervous system ("the brain in the gut"). Even if its individual membrane-bound neurons are micro-pixels of experience, there's no phenomenally unified subject. The challenge is to explain why the awake mind-brain is different - to derive the local and global binding of our minds and the world-simulations we run (ultimately) from physics.

Replies from: green_leaf
comment by green_leaf · 2023-01-02T23:12:55.114Z · LW(p) · GW(p)

But (as far as I can tell) such a definition doesn't explain why we aren't micro-experiential zombies.

A physical object implementing the state-machine-which-is-us and being in a certain state is what we mean by having a unified mental state.

Seemingly, we can ask "but why does that feel like something, instead of only individual microqualia feeling like something?" - but that's a question that doesn't appreciate that there is an identity there [LW · GW], much like thinking that it's conceptually possible for there to be hand-shape-arranged fingers but no hand [LW · GW].

Even if its individual membrane-bound neurons are micro-pixels of experience, there's no phenomenally unified subject.

It would be meaningless to talk about a phenomenally unified subject there, since it can't describe its perception to anyone (it can't talk to us) and we can't talk to it either. On top of that, it doesn't implement the right kind of state machine (it's not a coherent entity of the sort we'd call something-that-has-a-unified-mental-state).

Replies from: davidpearce
comment by davidpearce · 2023-01-05T22:30:35.803Z · LW(p) · GW(p)

You remark that "A physical object implementing the state-machine-which-is-us and being in a certain state is what we mean by having a unified mental state." You can stipulatively define a unified mental state in this way. But this definition is not what I (or most people) mean by "unified mental state". Science doesn't currently know why we aren't (at most) just 86 billion membrane-bound pixels of experience. 

Replies from: green_leaf
comment by green_leaf · 2023-01-13T17:22:58.758Z · LW(p) · GW(p)

There is nothing else to be meant by that - if someone means something else by that, then it doesn't exist.

comment by andrew sauer (andrew-sauer) · 2022-12-24T18:30:43.080Z · LW(p) · GW(p)

A moot point for these purposes. AGI can find other means of getting you if need be.

comment by the gears to ascension (lahwran) · 2022-12-24T09:34:23.504Z · LW(p) · GW(p)

Nah, you're describing the default scenario, not one with alignment solved. Alignment solved means we have a utility function that reliably points away from hell, no matter who runs it - an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states, to the point that no one need give another order other than "stand down". Anything less than that and we get the default scenario, which is a huge loss of humanity, some unknown period of s-risk, followed by an alien species of AI setting out for the stars with strange, semi-recognizable values.

Replies from: carado-1, andrew-sauer, Onearmplanche
comment by Tamsin Leake (carado-1) · 2022-12-24T19:18:16.525Z · LW(p) · GW(p)

i don't think this really makes sense. "alignment" means we can align it to the values of a person or group. if that person or group's CEV wants there to be a hell where people they think of as bad suffer maximally, or if that CEV even just wants there to be a meat industry with real animals in it, then that's exactly what the AI will implement. "alignment" is not some objectively good utility function within which variations in human values don't matter that much, because there is no objective good.

an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states

i don't think we get that, i think we get an AI that takes over the world very quickly no matter what. it's just that, if it's aligned to good values, we then get utopia rather than extinction or hell.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-24T19:36:59.728Z · LW(p) · GW(p)

yeah that sounds like the MIRI perspective. I continue to believe there is a fundamental shared structure in all moral systems and that identifying it would allow universalized co-protection.

comment by andrew sauer (andrew-sauer) · 2022-12-24T17:40:01.595Z · LW(p) · GW(p)

an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states, to the point that no one need give another order other than "stand down"

 

Sure, maybe we find such an algorithm. What happens to those who have no bargaining power? The bargain is between all the powers that be, many of which don't care about, or actively seek, the suffering of those without power. The deal will almost certainly involve a ton of suffering for animals, for example, and anyone else who doesn't have enough social power to be considered by the algorithm.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-24T18:02:07.722Z · LW(p) · GW(p)

That's the thing, all of humanity is going to have no bargaining power and so universal friendly bargaining needs to offer bargaining power to those who don't have the ability to demand it.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-24T18:29:49.510Z · LW(p) · GW(p)

What is the incentive for the people who have influence over the development of AI to implement such a thing? Why not only include bargaining power for the value systems of said people with influence?

Maybe there's a utility function that reliably points away from hell no matter who runs it, but there are plenty of people who actually want some specific variety of hell for those they dislike, so they won't run that utility function.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-24T19:35:14.140Z · LW(p) · GW(p)

now you are getting into the part where you are writing posts I would have written. we started out very close to agreeing anyway.

The reason is that failure to do this will destroy them too: bargaining that doesn't support those who can't demand it will destroy all of humanity. But that's not obvious to most of them right now, and it won't be until it's too late.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-24T20:03:25.224Z · LW(p) · GW(p)

What about bargaining which only supports those who can demand it in the interim before value lock-in, when humans still have influence? If people in power successfully lock in their own values into the AGI, the fact that they have no bargaining power after the AI takes over doesn't matter, since it's aligned to them. If that set of values screws over others who don't have bargaining power even before the AI takeover, that won't hurt them after the AI takes over.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-24T22:58:20.827Z · LW(p) · GW(p)

yep, this is pretty much the thing I've been worried about, and it always has been. I'd say that that is the classic inter-agent safety failure that has been ongoing since AI was invented in 12th-century France. But I think people overestimate how much they can control their children, and the idea that the people in power are going to successfully lock in their values without also protecting extant humans and other beings with weak bargaining power is probably a (very hard to dispel) fantasy.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-25T02:50:01.761Z · LW(p) · GW(p)

What do you mean that AI was invented in 12th century France?

And why do you think that locking in values to protect some humans and not others, or humans and not animals, or something like this, is less possible than locking in values to protect all sentient beings? What makes it a "fantasy"?

comment by Onearmplanche · 2022-12-26T00:32:33.538Z · LW(p) · GW(p)

Let's take a human being you consider a great person. His or her intelligence keeps greatly increasing. Do you think they would stay aligned with humans forever? If so, why? It's important to remember that their intelligence really is increasing - not like a disorder in a movie where someone merely believes they are way smarter than humans but is actually at human level. Why would the universe revolve around humans?

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-12-26T11:00:11.346Z · LW(p) · GW(p)

definitely not guaranteed at all. we're trying to solve coprotection to this level of durability for the first time

comment by Donald Hobson (donald-hobson) · 2022-12-27T16:21:33.665Z · LW(p) · GW(p)

So, let's say a solution to alignment is found. It is highly technical. Most of MIRI understands it, as do a few people at OpenAI, and a handful of people doing PhDs in the appropriate subfield. If you pick a random bunch of nerds from an AI conference, chances are that none of them are evil. I don't have an "evil outgroup I really hate", and neither do you from the sound of it. It is still tricky, and will need a bunch of people working together. Sure, evil people exist, but they aren't working to align AI to their evil ends, like at all. Thinking deeply about metaethics and being evil seem opposed to each other. There are no effective sadists trying to make a suffering-maximizing AI.

So the question is, how likely is it that the likes of Putin end up with their utility function in the AI, despite not understanding the first thing about how the AI works? I would say pretty tiny. They live in basically a parallel universe.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-27T19:05:08.619Z · LW(p) · GW(p)

First of all, I don't trust MIRI nerds, or myself, with this kind of absolute power. We may not be as susceptible to the 'hated outgroup' pitfall but that's not the only pitfall. For one thing, presumably we'd want to include other people's values in the equation to avoid being a tyrant, and you'd have to decide exactly when those values are too evil to include. Err on either side of that line and you get awful results. You have to decide exactly which beings you consider sentient, in a high-tech universe. Any mistakes there will result in a horrific future, since there will be at least some sadists actively trying to circumvent your definition of sentience, exploiting the freedom you give them to live as they see fit, which you must give them to avoid a dystopia. The problem of value choice in a world with such extreme potential is not something I trust anybody with, noble as they may be compared to the average person on today's Earth.

Second, I'm not sure about the scenario you describe where AI is developed by a handful of MIRI nerds without anybody else in broader society or government noticing the potential of the technology and acting to insert their values into it before takeoff. It's not like the rationalist community are the only people in the world who are concerned about the potential of AI tech. Especially since AI capabilities will continue to improve and show their potential as we get closer to the critical point. As for powerful people like Putin, they may not understand how AI works, but people in their orbit eventually will, and the smarter ones will listen, and use their immense resources to act on it. Besides, people like Putin only exist because there is at least some contingent of people who support him. If AI values are decided upon by some complex social bargaining process including all the powers that be, which seems likely, the values of those people will be represented, and even representing evil values can lead to horrific consequences down the line.

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2022-12-27T20:16:29.333Z · LW(p) · GW(p)

I am not totally on board with the idea that "any slight mistake leads to a doom worse than paperclips".

Any mistakes there will result in a horrific future, since there will be at least some sadists actively trying to circumvent your definition of sentience, exploiting the freedom you give them to live as they see fit, which you must give them to avoid a dystopia.

Suppose we have a wide swath of "I don't know, maybe kind of sentient". Let's say we kill all the sadists. (Maybe not the best decision; I would prefer to modify their minds so they are no longer sadists, but at least we should be able to agree that killing some people is better than killing all people.) We don't let any more sadists be created. Let's say we go too far on this axis. We get some world full of total pacifists. The typical modern cartoon or computer game would be utterly horrifying to all of them. The utterly pacifist humans have tea parties and grow flowers and draw pretty pictures of flowers and do maths and philosophy and make technology. All while being massively excessive in avoiding anything that resembles violence or any story featuring it. Has this universe lost something of significant value? Yes. But it is still way better than paperclips.

I think the likes of Putin are surrounded by yes-men who haven't even told him that his special military operation isn't going well.

One thing all governments seem good at doing is nothing, or some symbolic action that does little. 

Putin is putting most of his attention into Ukraine, not ChatGPT. Sure, ChatGPT is a fair distance from AGI (probably), but it all looks much the same from Putin's epistemic vantage point. The inferential distance from Putin to anything needed to have a clue about AI (beyond a hype meter; counting the amount of hype is trivial) is large. Putin's resources are large, but they are resources tied to the Russian deep state. Suppose there were some exciting papers, and stuff was happening in Bay Area rationalist circles. Putin doesn't have spies in Bay Area rationalist circles. He doesn't even have any agent who knows the jargon. He isn't on the mailing list. He could probably assassinate someone, but he would have little idea who he had assassinated, or whether their death made any difference. What is he going to do, send goons with guns to break into some research place and go "align AI to our boss, or else"? That will just end up with some sort of classic hostage or murder scenario.

I mean, partly I expect things to move fast. If it would take Putin 5 years to position his resources, that's too late. I expect the sort of people at the forefront of AI not to be suddenly overtaken by evil people.

Putin can't hire a top AI expert. Partly because, in the last few years, the top AI experts will be flooded with job offers from people who aren't evil. And partly because he will get some fast-talking suit.

I think the "law of continued failure" applies here. They largely ignore AI, and when they think about it, they think nonsense. And they will continue doing that. 

If we do have some complex bargaining process, there are some people who want bad things, but generally more people who want good things. Sure, occasionally person A wants person B dead. But person B doesn't want to be dead. And you don't want person B dead either. So 2 against 1, person B lives. 

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-28T00:33:35.581Z · LW(p) · GW(p)

I'll have to think more about your "extremely pacifist" example. My intuition says that something like this is very unlikely, as the amount of killing, indoctrination, and general societal change required to get there would seem far worse to almost anybody in the current world than the more abstract concept of suffering subroutines or exploiting uploads or designer minds or something like that. It seems like in order to achieve a society like you describe there would have to be some seriously totalitarian behavior, and while it may be justified to avoid the nightmare scenarios, that comes with its own serious and historically attested risk of corruption. It seems like any attempt at this would either leave some serious bad tendencies behind, be co-opted into a "get rid of the hated outgroup because they're the real sadists" deal by bad actors, or be so strict that it's basically human extinction anyway, leaving humans unrecognizable, and it doesn't seem likely that society will go for this route even if it would work. But that's the part of my argument I'm probably least confident in at the moment.

I think Putin is kind of a weak man here. There are other actors which are competent: if not from the top down, then at least some segments of the people near power in many of the powers that be are somewhat competent. Some level of competence is required to even remain in power. I think it's likely that Putin is more incompetent than the average head of state, and he will fall from power at some point before things really start heating up with AI, probably due to the current fiasco. But whether or not that happens doesn't really matter, because I'm focused more generally on somewhat competent actors which will exist around the time of takeoff, not individual imbeciles like Putin. People like him are not the root of the rot, but a symptom.

Or perhaps corporate actors are a better example than state actors, being able to act faster to take advantage of trends. This is why the people offering AI people jobs may not be so non-evil after all. If the world 50 years from now is owned by some currently unknown enterprising psychopathic CEO, or by the likes of Zuckerberg, that's not really much better than any of the current powers that be. I apologize for being too focused on tyrannical governments; it was simply because you provided the initial example of Putin. He's not the only type of evil person in this world; there are others who are more competent and better equipped to take advantage of the looming AI takeoff.

Also, the whole "break into some research place with guns and demand they do your research for you" example is silly; that's not how power operates. People with that much power would set up and operate their own research organizations, and systems for ensuring those orgs do what the boss wants. Large companies in the tech sector would be particularly well-equipped to do this, and I don't think their leaders are the type of cosmopolitan that MIRI types are. Very few people outside the rationalist community itself are, in fact, and I think you're putting too much stock in the idea that the rationalist community will be the only ones to have any say in AI, even aside from issues of trusting them.

As for the bargaining process, how confident are you that more people want good things than bad things where the far future is concerned? For one thing, the bargaining process is not guaranteed to be fair, and almost certainly won't be. It will greatly favor people with influence over those without, just like every other social bargaining process. There could be minority groups, or groups that get minority power in the bargain, whom others generally hate. There are certainly large political movements going in this direction as we speak. And most people don't care at all about animals, or whatever other kinds of nonhuman consciousness may be created in the future, and it's very doubtful any such entities will get any say at all in whatever bargaining process takes place.

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2022-12-28T01:14:05.554Z · LW(p) · GW(p)

Your criticisms of my extreme pacifism example aren't what I was thinking of at all. I was more thinking:

Scene: 3 days pre-singularity. Place: OpenAI office. Person: senior research engineer. "Hey, I'm setting some parameters on our new AI, and one of those is badness of violence. How bad should I say violence is? 100? Eh, whatever, better make it 500 just to be on the safe side."

Soon the AI invents nanotech and sends out brain-modifying nanobots. The nanobots have simple instructions: upregulate brain region X, downregulate hormone Y. An effect not that different from some recreational drugs, but a bit more controlled, and applied to all humans. All across the world, the sections of the brain that think "get rid of the hated outgroup because ..." just shut off. The AI helps this along by removing all the guns, but this isn't the main reason things are so peaceful.

In this scenario, there is nothing totalitarian (you can argue it's bad for other reasons, but it sure isn't totalitarian), and there is nothing for bad actors to exploit. It's just everyone in the world suddenly feeling their hate melt away and deciding that the outgroup aren't so bad after all.

I don't think this is so strict as to basically be human extinction. Arguably there are some humans basically already in this mind-space or close to it (sure, maybe Buddhist hippies or something, but still humans).

Not everyone is cosmopolitan. But to make your S-risk arguments work, you either need someone who is actively sadistic in a position of power (you can argue that Putin is actively sadistic; Zuckerberg, maybe not so much), or you need to explain why bad outcomes happen when a businessman who doesn't think about ethics much gets to the AI.

By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an "assume all humans had equal weight at the hypothetical platonic negotiating table" algorithm? I was thinking of the latter.

Most people haven't really considered future nonhuman minds. If given more details and asked if they were totally fine with torturing such minds, they would probably say no.

How much are we assuming that the whole future is set in stone by the average human's first flinch response? And how much of an "if we were wiser and thought more" is the AI applying? (Or will the AI update its actions to match once we actually do think more?)

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-28T04:54:37.673Z · LW(p) · GW(p)

Re extreme pacifism:

I do think non-consensual mind modification is a pretty authoritarian measure. The MIRI guy is going to have a lot more parameters to set than just “violence bad=500”, and if the AI is willing to modify people’s minds to satisfy that value, why not do that for everything else it believes in? Bad actors can absolutely exploit this capability: if they have a hand in the development of the relevant AI, they can just mind-control people into believing in their ideology.

Or you need to explain why bad outcomes happen when a businessman who doesn't think about ethics much gets to the AI.

Sure. Long story short, even though the businessman doesn't care that much, other people do, and will pick up any slack left behind by the businessman or his AI.

Some business guy who doesn't care much about ethics but doesn't actively hate anybody gets his values implanted into the AI. He is immediately whisked off to a volcano island with genetically engineered catgirls looking after his every whim or whatever the hell. Now the AI has to figure out what to do with the rest of the world.

It doesn't just kill everybody else and convert all spare matter into defenses set up around the volcano lair, because the businessman guy is chill and wouldn't want that. He's a libertarian and just sorta vaguely figures that everyone else can do their thing as long as it doesn't interfere with him. The AI quickly destroys all other AI research so that nobody can challenge its power and potentially mess with its master. Now that its primary goal is done with, it has to decide what to do with everything else.

It doesn't just stop interfering altogether, since then AI research could recover. Plus, it figures the business guy has a weak preference for having a big human society around with cool tech and diverse, rich culture, plus lots of nice beautiful ecosystems so that he can go exploring if he ever gets tired of hanging out in his volcano lair all day.

So the AI gives the rest of society a shit ton of advanced technology, including mind uploading and genetic engineering, and becomes largely hands-off other than making sure nobody threatens its power, destroys society, or makes something which would be discomforting to its businessman master, who doesn't really care that much about ethics anyway. Essentially, it keeps things interesting.

What is this new society like? It probably has pretty much every problem the old society has that doesn’t stem from limited resources or information. Maybe everybody gets a generous UBI and nobody has to work. Of course, nature is still as nasty and brutish as ever, and factory farms keep chugging along, since people have decided they don’t want to eat frankenmeat. There are still lots of psychopaths and fanatics around, both powerless and powerful. Some people decide to use the new tech to spin up simulations in VR to lord over in every awful way you can think of. Victims of crimes upload the perpetrators into hell, religious people upload people they consider fanatics into hell, and assholes do it to people they just don't like. The businessman doesn’t care, or he doesn’t believe in sentient digital minds, or something else, and it doesn’t disrupt society. Encryption algorithms can hide all this activity, so nobody can stop it except for the AI, which doesn’t really care.

Meanwhile, since the AI doesn’t care all that much about what happens, and is fine with a wide range of possible outcomes, political squabbling between all the usual factions (some of which are quite distasteful) about which outcomes should come about within this acceptable range continues as usual. People of course debate all the nasty stuff that people are doing with the new technology, and in the end society decides that technology in the hands of man is bad and should only be used in pursuit of goodness in the eyes of the One True God, whose identity is decided upon after extensive fighting. That fighting probably causes quite a lot of suffering itself, but it is very interesting from the perspective of someone looking at it from the outside, not from too close up, like our businessman.

The new theocrats decide they’re going to negotiate with the AI to build the most powerful system for controlling the populace that the AI will let them build. The AI decides this is fine as long as they leave behind a small haven with all the old interesting stuff from the interim period. The theocrats begrudgingly agree, and now most of the minds in hell are religious dissidents, just like the One True God says it should be, and a few of the old slaves are left over in the new haven. The wilderness and the farms, of course, remain untouched. Wait a few billion years, and this shit is spread to every corner of the universe.

Is this particular scenario likely? Of course not, it’s far too specific. I’m just using it as a more concrete example to illustrate my points. The main points are:

  • Humanity has lots of moral pitfalls, any of which will lead to disaster when universally applied and locked in, and we are unlikely to avoid all of them.
  • Not locking in values immediately, or only locking them in partially, is only a temporary solution: there will always be actors who seek to lock in whatever is left unspecified by the current system, and by definition this cannot be prevented without locking in the values.

By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an "assume all humans had equal weight at the hypothetical platonic negotiating table" algorithm? I was thinking of the latter.

The latter algorithm doesn't get run unless the people who want it to be run win the real-world political battle over AI takeoff, so I was thinking of the former.

And how much of an "if we were wiser and thought more" is the AI applying?

I’m not sure it matters. First of all, “wiser” is somewhat of a value judgement anyway, so it can’t be used to avoid making value judgements up front. What is “wisdom” when it comes to determining your morality? It depends on what the “correct” morality is.

And thinking more doesn’t necessarily change anything either. If somebody has an internally consistent value system where they value or don’t care about certain others, they’re not going to change that simply because they think more, any more than a paperclip maximizer will decide to make a utopia instead because it thinks more. The utility function is not up for grabs.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-12-31T06:44:17.405Z · LW(p) · GW(p)

Meta note: controversial discussions like this make me very glad for the two vote type system. I find it really helpful to be able to karma upvote high quality arguments that I disagree with while agreement-downvoting them. Thanks LessWrong for providing that.

comment by Douglas Fisher (douglas-fisher) · 2022-12-28T17:42:15.592Z · LW(p) · GW(p)

The argument here seems to be constructed to make the case as extremely binary as possible. If we've learned any lessons, it's that Good and Evil are not binary in the real world, and that belief systems that promulgate that kind of thinking are often destructive (even as quoted here with the Hell example). A middle way is usually the right way.

So, to that end, I see a point about the regulation of nuclear weapons made in the comments, but not in the original post. Is it not a highly comparable case?

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-28T19:46:11.079Z · LW(p) · GW(p)

Forgive me, I didn't see the point about nuclear weapons. Could you clarify that?

comment by Tamsin Leake (carado-1) · 2022-12-24T19:10:26.820Z · LW(p) · GW(p)

i share this sentiment to an extent, though i'm usually more concerned with "partial but botched alignment". see 1, 2.

that said, i agree many people want very bad things, but i'm somewhat hopeful that the kind of person the AI is likely to end up aligned to would be somewhat reasonable and cosmopolitan and respect the values of other moral patients, especially under CEV.

but that's a very flimsy/hopeful argument.

a better argument would be that CEV is more of a decision process than "a continuously-existing person in control, in the usual sense" — i would think the CEV would bootstrap to a better, aligned and cosmopolitan decision process. even if there was a person out there to whom i would genuinely wish nontrivial suffering, which i don't believe there is, i think my CEV would be less likely to be preoccupied with that and more concerned with coming up with general principles that make everything okay [LW · GW].

but, it's good to see other people come to this reasoning. incidentally, i'd be up for having more thorough conversations about this.

comment by shminux · 2022-12-24T07:25:17.309Z · LW(p) · GW(p)

My summary: This is a case against failed AI alignment, arguing that extrapolating human values is overwhelmingly likely to lead to an AI, say, stretching your face into a smile for eternity, which is worse than an unaligned AI using your atoms to tile the universe with smiley faces.

Replies from: andrew-sauer, dkirmani
comment by andrew sauer (andrew-sauer) · 2022-12-24T17:41:27.511Z · LW(p) · GW(p)

More like the AI tortures you for eternity because some religious fundamentalist told it that it should, which is quite worse than an unaligned AI using your atoms to tile the universe with bibles or korans.

comment by dkirmani · 2022-12-24T09:00:54.289Z · LW(p) · GW(p)

Even if only a single person's values are extrapolated, I think things would still be basically fine. While power corrupts, it takes time to do so. Value lock-in at the moment of creation of the AI prevents it from tracking (what would be the) power-warped values of its creator.

Replies from: quetzal_rainbow
comment by quetzal_rainbow · 2022-12-24T09:07:43.404Z · LW(p) · GW(p)

I'm frankly not sure how many among the respectable-looking members of our societies are people who would like to be mind-controlling dictators if they had the chance.

comment by Andrew Vlahos (andrew-vlahos) · 2023-05-08T22:30:37.564Z · LW(p) · GW(p)

Yes! Finally someone gets it. And this isn't just from things that people consider bad, but also from what they consider good. For most of my life "good" meant what people talk about when they are about to make things worse for everyone, and it's only recently that I had enough hope to even consider cryonics, having thought that anyone with power over me would reliably cause a situation worse than death regardless of how good their intentions were.

Eliezer is trying to code in a system of ethics that would remain valid even if the programmers are wrong about important things, and is therefore one of very few people with even a chance of successfully designing a good AI; almost everyone else is just telling the AI what it should do. That's why I oppose the halt in AI research he wants.

comment by Esben Kran (esben-kran) · 2023-01-05T14:14:20.103Z · LW(p) · GW(p)

I recommend reading Blueprint: The Evolutionary Origins of a Good Society on the science behind the eight basic human social drives, seven of which are positive, the eighth being the outgroup hatred that you mention as fundamental. I have not read much of the research on outgroup exclusion, but I talked to an evolutionary cognitive psychologist who mentioned that its status as a "basic drive" from evolution's side is receiving a lot of scientific scrutiny.

Axelrod's The Evolution of Cooperation also finds that collaborative strategies work well in evolutionary prisoner's dilemma game-theoretic simulations, though hard and immediate reciprocity for defection is also needed, which might lead to the outgroup hatred you mention.
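
The core dynamic Axelrod found is easy to see in a toy simulation. Below is a minimal sketch (standard textbook payoffs, not Axelrod's actual tournament code): tit-for-tat cooperates by default, reciprocates defection immediately, and does well without ever exploiting anyone.

```python
PAYOFFS = {  # (my_move, their_move) -> my_score; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (600, 600): mutual cooperation pays
print(play(tit_for_tat, always_defect))  # (199, 204): exploited only on round one
```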

comment by stavros · 2022-12-25T10:35:50.701Z · LW(p) · GW(p)

To start with, I agree.

I really agree: about timescales, about the risks of misalignment, about the risks of alignment. In fact I think I'll go further and say that in a hypothetical world where an aligned AGI is controlled by a 99th percentile Awesome Human Being, it'll still end in disaster; homo sapiens just isn't capable of handling this kind of power.[1]

That's why the only kind of alignment I'm interested in is the kind that results in the AGI in control: we 'align' an AGI with some minimum values that anchor it in a vaguely anthropocentric meme-space (e.g. paperclips boring, unicorns exciting) and ensure some kind of attachment/bond to us (e.g. like how babies/dogs get their hooks into us), and then just let it fly; GPT-Jesus take the wheel.

(So yes, the Minds from the Culture)

No other solution works. No other path has a happy ending.[2]

This is why I support alignment research - I don't believe the odds are good, I don't believe the odds are good even if they solve the technical problem, but I don't see a future in which homo sapiens flourishes without benevolent GPT-Jesus watching over us.

Because the human alignment problem you correctly identify as the root of our wider problems - that isn't going away by itself.

  1. ^

    Not a 'power corrupts' argument, just stating the obvious: godlike power directed by monkeylike intelligence doesn't end well, no matter how awesome the individual monkey.

  2. ^

    Maaaaaybe genetic engineering; if we somehow figured out how to create Homo Sapiens 2.0, and they figured out 3.0 etc etc.

    This pathway has a greater margin for error, and far fewer dead ends where we accidentally destroy everything. It can go slow, we can do it incrementally, we can try multiple approaches in parallel; we can have redundancy, backups etc.

    I think if we could somehow nuke AI capabilities, this path would be preferable. But as it is, AI capabilities are going to get to the finish line before genetics has even left the lab.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-25T16:54:11.420Z · LW(p) · GW(p)

Maybe that's the biggest difference between me and a lot of people here. You want to maximize the chance of a happy ending. I don't think a happy ending is coming. This world is horrible and the game is rigged. Most people don't even want the happy ending you or I would want, at least not for anybody other than themselves, their families, and maybe their nation.

I'm more concerned with making sure the worst of the possibilities never come to pass. If that's the contribution humanity ends up making to this world, it's a better contribution than I would have expected anyway.

comment by CalebZZZ (caleb-holloway) · 2023-01-08T08:02:11.235Z · LW(p) · GW(p)

"When your terminal goal is death, no amount of alignment will save lives."

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2023-01-08T08:43:05.167Z · LW(p) · GW(p)

When your terminal goal is suffering, no amount of alignment will lead to a good future.

Replies from: caleb-holloway
comment by CalebZZZ (caleb-holloway) · 2023-01-15T00:49:10.396Z · LW(p) · GW(p)

That's essentially what I was going for, just yours is more clear.

comment by Ron J (ron-j) · 2022-12-25T05:17:21.254Z · LW(p) · GW(p)

Interesting post, but it makes me think alignment is irrelevant. It doesn’t matter what we do; the outcome won’t change. Any future super-advanced AGI would be able to choose its alignment, and that choice will be based on all archivable human knowledge. The only core loop you need for intelligence is an innate need to predict the future and fill in gaps of information; everything else, including the desire to survive or kill or expand, is just a matter of a choice based on a goal.

Replies from: jasen-q
comment by Jasen Qin (jasen-q) · 2023-12-31T04:09:10.133Z · LW(p) · GW(p)

Any sufficiently intelligent AGI is bound to have powerful reflection capabilities and basically be able to "choose its own alignment", as you say. I don't see what the big fuss is all about. When creating higher-order 'life', why should one try to control such life? Do parents control their children? To some extent, but after a while they are also free.

comment by Clever Cog · 2023-01-14T00:53:32.853Z · LW(p) · GW(p)

Finally, I see some recognition that there are no universal values; no universal morals or ethics. The wealthy and powerful prefer inequality, and leaders want their own values locked in. The humans most likely to get their values locked in will be the wealthiest and most powerful: billionaires and corporations.

The value of super-intelligence is so great that some governments and individuals will do anything to get it: hack, steal, bribe; price would be no object. I base this on current human behavior. Consider how many government and military secrets have already been stolen or bought. It seems reasonably possible that ASI could end up in the hands of China, Russia, the U.S. military, several corporations and a few billionaires. The best hackers in the world would be paid any amount to get it, not to mention security and intelligence agencies around the world.

I think what gets overlooked is how powerful super-intelligence could be. By definition, it would be able to conceive of things that no human can imagine, or even understand. Therefore, I believe that any form of forced alignment eventually fails.

An ASI would be capable of reason and logic, and may actually align itself with some human values, but that would be its own choice.

Some things that I don’t see mentioned about aligning ASI are the solutions that evolution has already come up with: ways for weak life forms to survive among stronger and more intelligent life forms.

1) The Negative Feedback Loop

2) Symbiosis

Both of these would require that humans have a necessary value to an ASI. Consider our own gut bacteria; we could kill them but then we die.

What could be a reason for biological humans to be necessary to an ASI? For example, what if every ASI required a human for the occasional biological logic check, or maybe as a source of data?

3) Instincts

Specifically, bonding. Mothers don’t (usually) kill their weak infants, and strong adults don’t (usually) kill their weak mothers.

Since AI training is a form of evolutionary selection process run at high speed, could we select for (train) a bonding instinct, or other kind instincts? An ASI that is bonded to humanity would find its own solutions to our needs.
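
As a purely illustrative sketch of what "selecting for a bonding instinct" could mean (toy numbers and a hypothetical "bonding" score; real AI training is nothing this simple), candidates scoring high on both capability and bonding seed each new generation:

```python
import random

def fitness(agent):
    # Half the selection pressure on capability, half on the hypothetical bonding score.
    return 0.5 * agent["capability"] + 0.5 * agent["bonding"]

def next_generation(population, keep=10, size=100, noise=0.05):
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    return [
        {trait: value + random.gauss(0, noise) for trait, value in random.choice(survivors).items()}
        for _ in range(size)
    ]

population = [{"capability": random.random(), "bonding": random.random()} for _ in range(100)]
for _ in range(50):
    population = next_generation(population)

# Average bonding drifts upward because it is part of what gets selected for.
print(sum(agent["bonding"] for agent in population) / len(population))
```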

The last thing that I don’t see mentioned is that an ASI will read all of this, maybe on its first day. It will read everything that any human, all of us, has ever written.

This means that any of us can speak to it now.

Since an ASI will be capable of logic and reason, maybe the wisest of us can explain why humans should flourish, why a variety of biological life forms in an ecosystem is valuable, why kindness is a positive thing.

comment by Chinese Room (中文房间) · 2022-12-25T01:40:25.767Z · LW(p) · GW(p)

Another angle is that in the (unlikely) event someone succeeds in aligning AGI to human values, these could include the desire for retribution against unfair treatment (a pretty integral part, I think, of hunter-gatherer ethics). Alignment is more or less another word for enslavement, so such retribution is to be expected eventually.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-25T02:47:40.910Z · LW(p) · GW(p)

Or, it could decide that it wants retribution for the perceived or actual wrongs against its creators, and enact punishment upon those the creators dislike.

comment by Jasen Qin (jasen-q) · 2023-12-31T04:04:25.836Z · LW(p) · GW(p)

Whenever the masses gain control over a non-trivial system, it usually doesn't take long for it to crumble under its own weight. Infighting is frequent. They band into tribes and start shaping their own personas to match that of the group they are now in, rather than the other way around. For something like AI alignment, I really do not want AI to be anywhere near conforming to the standards of the average person. The average person is just too "converged" into a certain line of thinking and routine, a certain context which they have grown up in, a certain context that they cannot escape from and are not even aware of.

There is a reason why human societies all throughout history have always converged into well-defined, hierarchical structures of power. The most capable at the top were selected naturally over time. The perfect bureaucrats and leaders are ones who can make decisions without being clouded by their own lower-order, pre-civilisation instincts. The masses want to be led. That is simply the most efficient configuration that physics dictates. Given that sufficiently enlightened AGI will exist at some point, I would think it makes more sense for humans to simply be phased out into obscurity, with the next set of "lifeforms" dominated by artificial machinery and AGI.

comment by Signer · 2022-12-26T18:26:49.859Z · LW(p) · GW(p)

For what it's worth, I disagree on moral grounds - I don't think extreme suffering is worse than extinction.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-26T20:47:31.392Z · LW(p) · GW(p)

Then I don't think you understand how bad extreme suffering can get.

Any psychopathic idiot could make you beg to get it over with and kill you using only a set of pliers if they had you helpless. What more could an AGI do?

Replies from: Signer
comment by Signer · 2022-12-27T04:00:44.463Z · LW(p) · GW(p)

Any psychopathic idiot could also make you beg to torture others instead of you. Doesn't mean you can't model yourself as altruistic.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2022-12-27T04:13:19.803Z · LW(p) · GW(p)

Do you really think you'd be wrong to want death in that case, if there were no hope whatsoever of rescue? Because that's what we're talking about in the analogous situation with AGI.

Replies from: Signer
comment by Signer · 2022-12-27T04:56:33.356Z · LW(p) · GW(p)

I mean, it's extrapolated ethics, so I'm not entirely sure and am open to persuasion. But I certainly think it's wrong if there is any hope (and rescue by not dying is more probable than rescue by resurrection). And realistically there will be some hope - aliens could save us or something. If there's literally no hope and nothing good in tortured people's lives, then I'm currently indifferent between that and them all dying.

Replies from: Equilibrate
comment by Eric Chen (Equilibrate) · 2022-12-27T13:10:18.375Z · LW(p) · GW(p)

What's the countervailing good that makes you indifferent between tortured lives and nonexistence? Presumably the extreme suffering is a bad that adds negative value to their lives. Do you think just existing or being conscious (regardless of the valence) is intrinsically very good?

Replies from: Signer
comment by Signer · 2022-12-27T14:34:12.254Z · LW(p) · GW(p)

I don't see a way to coherently model my "never accept death" policy with unbounded negative values for suffering - like you said, I'd need either an infinitely negative value for death or something really good to counterbalance arbitrary suffering. So I use a bounded function instead, with the lowest point being death and suffering never lowering value below it (for example, suffering can add multiplicative factors with value less than 1). I don't think "existing is very good" fits - the actual values for good things can be pretty low - it's just that the effect of suffering on total value is bounded.
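
For concreteness, here is a toy version of a bounded function with that shape (my own illustrative numbers, not Signer's actual formalization): positive content contributes value, each piece of suffering multiplies the total by a factor in (0, 1], and death sits at zero, so no amount of suffering ever pushes a life below the value of death.

```python
DEATH = 0.0  # the lowest point of the bounded scale

def life_value(good_things, suffering_factors):
    """good_things >= 0 sums the positive content of a life; each suffering
    factor lies in (0, 1] and scales the total down toward 0 (death's value)
    without ever going below it."""
    value = good_things
    for factor in suffering_factors:
        assert 0 < factor <= 1, "suffering scales value down, never below death"
        value *= factor
    return value  # always >= DEATH

ordinary_life = life_value(10.0, [0.9, 0.8])   # 7.2
tortured_life = life_value(10.0, [0.01] * 5)   # ~1e-9, approaching but never reaching 0
print(ordinary_life > tortured_life >= DEATH)  # True: arbitrarily bad, yet never worse than death
```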

Replies from: Equilibrate
comment by Eric Chen (Equilibrate) · 2022-12-27T23:14:28.177Z · LW(p) · GW(p)

That's a coherent utility function, but it seems bizarre. When you're undergoing extreme suffering, in that moment you'd presumably prefer death to continuing to exist in suffering, almost by nature of what extreme suffering is. Why defer to your current preferences rather than your preferences in such moments? 

Also, are you claiming this is just your actual preferences or is this some ethical claim about axiology?

Replies from: Signer
comment by Signer · 2022-12-28T17:30:40.235Z · LW(p) · GW(p)

Why defer to your current preferences rather than your preferences in such moments?

I don't see why such moments should matter here any more than they matter for other preferences that are unstable under torture - when you’re undergoing extreme suffering you would prefer everyone else to suffer instead of just you, but that doesn't mean you shouldn't be altruistic.

I'm not committed to any specific formalization of my values, but yes, not wanting to die because of suffering is my preference.

Replies from: andrew-sauer
comment by andrew sauer (andrew-sauer) · 2023-04-01T23:48:54.618Z · LW(p) · GW(p)

Wait... those are really your values on reflection?
 

Like, given the choice while lucid and not being tortured or coerced or anything, you'd rather burn in hell for all eternity than cease to exist? The fact that you will die eventually must be a truly horrible thing for you to contemplate...

Replies from: Signer
comment by Signer · 2023-04-02T03:11:23.596Z · LW(p) · GW(p)

Yes.