andrew-sauer

Posts
Comments

Posts

Curriculum of Ascension 2024-11-07T23:54:18.983Z

The case against AI alignment 2022-12-24T06:57:53.405Z

A simulation basilisk 2021-09-17T17:44:23.083Z

Torture vs Specks: Sadist version 2021-07-31T23:33:42.224Z

Comments

Comment by andrew sauer (andrew-sauer) on Human takeover might be worse than AI takeover · 2025-04-18T04:29:00.948Z · LW · GW

Keep in mind also, that humans often seem to just want to hurt each other, despite what they claim, and have more motivations and rationalizations for this than you can even count. Religious dogma, notions of "justice", spitefulness, envy, hatred of any number of different human traits, deterrence, revenge, sadism, curiosity, reinforcement of hierarchy, preservation of traditions, ritual, "suffering adds meaning to life", sexual desire, and more and more that I haven't even mentioned. Sometimes it seems half of human philosophy is just devoted to finding ever more rationalizations to cause suffering, or to avoid caring about the suffering of others.

AI would likely not have all this endless baggage causing it to be cruel. Causing human suffering is not an instrumentally convergent goal. So, most AIs will not have it as a persistent instrumental or terminal goal. Not unless some humans manage to "align" it. Most humans DO have causing or upholding some manner of suffering as a persistent instrumental or terminal goal.

Comment by andrew sauer (andrew-sauer) on Does this game have a name? · 2025-04-12T04:10:46.727Z · LW · GW

This is equivalent to the game Westley played with Vizzini. You know, if Westley didn't cheat. I like to call it "Sicilian Chess" for that reason, though that's just me.

Comment by andrew sauer (andrew-sauer) on LWLW's Shortform · 2025-04-08T04:08:06.365Z · LW · GW

Trump shot an arrow into the air; it fell to Earth, he knows not where...

Probably one of the best succinct summaries of every damn week that man is president lmao

Comment by andrew sauer (andrew-sauer) on Love is Love, Science is Fake · 2025-04-07T17:09:24.353Z · LW · GW

LOL @ the AI-warped book in that guy's hands

Comment by andrew sauer (andrew-sauer) on What is Evil about creating House Elves? · 2025-04-06T04:53:02.519Z · LW · GW

Now you can!

Comment by andrew sauer (andrew-sauer) on The "Intuitions" Behind "Utilitarianism" · 2025-04-04T04:16:07.002Z · LW · GW

Gwern seems to think this would be used as a way to get rid of corrupt oligarchs, but... Wouldn't this just immediately be co-opted by those oligarchs to solidify their power by legally paying for the assassinations of their opponents? Markets aren't democratic, because a small percentage of the people have most of the money.

Comment by andrew sauer (andrew-sauer) on What fact that you know is true but most people aren't ready to accept it? · 2025-03-17T05:36:17.057Z · LW · GW

To be fair, my position is less described by that Quirrell quote and more by Harry's quote when he's talking to Hermione about moral peer pressure:

"The way people are built, Hermione, the way people are built to feel inside, is that they hurt when they see their friends hurting. Someone inside their circle of concern, a member of their own tribe. That feeling has an off-switch, an off-switch labelled 'enemy' or 'foreigner' or sometimes just 'stranger'. That's how people are, if they don't learn otherwise."

Unlike Quirrell I give people the credit for actually caring, rather than pretending to care, about people. I just don't think that extends to very many people, for most people.

Comment by andrew sauer (andrew-sauer) on Scope Insensitivity · 2025-03-16T05:52:20.923Z · LW · GW

Fun fact for those reading this in the far future, when Eliezer said "effective altruist" in this piece, he most likely was using the literal meaning, not referring to the EA movement, as that name hadn't been coined yet.

Comment by andrew sauer (andrew-sauer) on Trojan Sky · 2025-03-12T22:56:49.231Z · LW · GW

Wildbow (the author of Worm) is currently writing a story with a quite similar premise

Comment by andrew sauer (andrew-sauer) on What are the best arguments for/against AIs being "slightly 'nice'"? · 2025-02-20T16:37:33.665Z · LW · GW

In fact I think it’s safe to say that we’d collectively allocate much more than 1/millionth of our resources towards protecting the preferences of whatever weak agents happen to exist in the world (obviously the cows get only a small fraction of that).

Sure, but extrapolating this to unaligned AI is NOT an encouraging sign. We may allocate greater than 1/million of our resources to animal rights, but we allocate a whole lot more than that to goals which diametrically go against the preferences of those animals such as eating meat and cheese and eggs; we allocate MUCH more resources to "animal wrongs" than animal rights, so to speak.

So to show an AI will be "nice" to humans at all, it is not enough to suppose that it might have some 1/million "nice to humans" term. It requires showing that that term won't be outweighed handily by the rest of its utility function.

Comment by andrew sauer (andrew-sauer) on Cosmopolitan values don't come free · 2025-01-25T01:07:38.258Z · LW · GW

100%. Social contract gives no consideration to the powerless, and this fact is the source of much of the horrible opinions in the world.

Comment by andrew-sauer on [deleted post] 2024-12-10T04:20:21.678Z

Comment by andrew sauer (andrew-sauer) on Cost, Not Sacrifice · 2024-11-25T23:12:39.665Z · LW · GW

No idea whether I'd really sacrifice all 10 of my fingers to improve the world by that much, especially if we add the stipulation that I can't use any of the $10,000,000,000,000 to pay someone to do all of the things I use my fingers for( ͡° ͜ʖ ͡°). For me I am quite well divided on it, and it is an example of a pretty clean, crisp distinction between selfish and selfless values. If I kept my fingers, I would feel guilty, because I would be giving up the altruism I value a lot (not just because people tell me to), and the emotion that would result from that loss of value would be guilt, even though I self-consistenly value my fingers more. Conversely, if I did give up my fingers for the $10,000,000,000,000, I would feel terrible for different reasons( ͡° ͜ʖ ͡°), even though I valued the altruism more.

Of course, given this decision I would not keep all of my fingers in any case, as long as I could choose which ones to lose. $100,000,000 is well worth the five fingers on my right (nondominant) hand. My life would be better purely selfishly, given that I would never have to work again, and could still write, type, and ( ͡° ͜ʖ ͡°).

Comment by andrew sauer (andrew-sauer) on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-22T04:51:54.434Z · LW · GW

So, travelling 1Tm with the railway you have a 63% chance of dying according to the math in the post

Comment by andrew sauer (andrew-sauer) on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-22T04:48:18.306Z · LW · GW

Furthermore, the tries must be independent of each other, otherwise the reasoning breaks down completely. If I draw cards from a deck, each one has (a priori) 1/52 chance of being the ace of spades, yet if I draw all 52 I will draw the ace of spades 100% of the time. This is because successive failures increase the posterior probability of drawing a success.

Comment by andrew sauer (andrew-sauer) on Interlude with the Confessor (4/8) · 2024-11-02T00:55:06.060Z · LW · GW

This but unironically.

Comment by andrew sauer (andrew-sauer) on Living Metaphorically · 2024-10-30T01:07:47.542Z · LW · GW

Another important one: Height/Altitude is authority. Your boss is "above" you, the king, president or CEO is "at the top", you "climb the corporate ladder"

Comment by andrew sauer (andrew-sauer) on Shortform · 2024-08-22T06:11:48.505Z · LW · GW

For a significant fee, of course

Comment by andrew sauer (andrew-sauer) on Rabin's Paradox · 2024-08-14T12:29:14.674Z · LW · GW

Yes to both, easy, but that's because I can afford to risk $100. A lot of people can't nowadays. "plus rejecting the first bet even if your total wealth was somewhat different" is doing a lot of heavy lifting here.

Comment by andrew sauer (andrew-sauer) on Failed Utopia #4-2 · 2024-07-19T01:12:55.959Z · LW · GW

Honestly man, as a lowercase-i incel this failed utopia doesn't sound very failed to me...

Comment by andrew sauer (andrew-sauer) on Class consciousness for those against the class system · 2024-05-28T02:44:29.415Z · LW · GW

What do you mean?

Comment by andrew sauer (andrew-sauer) on Masterpiece · 2024-04-06T00:09:16.243Z · LW · GW

If this happened I would devote my life to the cause of starting a global thermonuclear war

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-26T20:42:31.737Z · LW · GW

Well there are all sorts of horrible things a slightly misaligned AI might do to you.

In general, if such an AI cares about your survival and not your consent to continue surviving, you no longer have any way out of whatever happens next. This is not an out there idea, as many people have values like this and even more people have values that might be like this if slightly misaligned.

An AI concerned only with your survival may decide to lobotomize you and keep you in a tank forever.

An AI concerned with the idea of punishment may decide to keep you alive so that it can punish you for real or perceived crimes. Given the number of people who support disproportionate retribution for certain types of crimes close to their heart, and the number of people who have been convinced (mostly by religion) that certain crimes (such as being a nonbeliever/the wrong kind of believer) deserve eternal punishment, I feel confident in saying that there are some truly horrifying scenarios here from AIs adjacent to human values.

An AI concerned with freedom for any class of people that does not include you (such as the upper class), may decide to keep you alive as a plaything for whatever whims those it cares about have.

I mean, you can also look at the kind of "EM society" that Robin Hanson thinks will happen, where everybody is uploaded and stern competition forces everyone to be maximally economically productive all the time. He seems to think it's a good thing, actually.

There are other concerns, like suffering subroutines and spreading of wild animal suffering across the cosmos, that are also quite likely in an AI takeoff scenario, and also quite awful, though they won't personally effect any currently living humans.

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-26T12:21:21.723Z · LW · GW

Well, given that death is one of the least bad options here, that is hardly reassuring...

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-23T14:38:19.533Z · LW · GW

Fuck, we're all going to die within 10 years aren't we?

Comment by andrew sauer (andrew-sauer) on Richard_Kennaway's Shortform · 2024-03-19T00:04:52.523Z · LW · GW

Never, ever take anybody seriously who argues as if Nature is some sort of moral guide.

Comment by andrew sauer (andrew-sauer) on On the abolition of man · 2024-01-20T00:56:21.517Z · LW · GW

I had thought something similar when reading that book. The part about the "conditioners" is the oldest description of a singleton achieving value lock-in that I'm aware of.

Comment by andrew sauer (andrew-sauer) on 5. Moral Value for Sentient Animals? Alas, Not Yet · 2023-12-27T16:04:39.024Z · LW · GW

If accepting this level of moral horror is truly required to save the human race, then I for one prefer paperclips. The status quo is unacceptable.

Perhaps we could upload humans and a few cute fluffy species humans care about, then euthanize everything that remains? That doesn't seem to add too much risk?

Comment by andrew sauer (andrew-sauer) on Chapter 48: Utilitarian Priorities · 2023-12-10T20:19:06.745Z · LW · GW

Just so long as you're okay with us being eaten by giant monsters that didn't do enough research into whether we were sentient.
I'm okay with that, said Slytherin. Is everyone else okay with that? (Internal mental nods.)

I'd bet quite a lot they're not actually okay with that, they just don't think it will happen to them...

Comment by andrew sauer (andrew-sauer) on Logical and Indexical Uncertainty · 2023-11-14T01:58:55.847Z · LW · GW

the vigintillionth digit of pi

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-11-08T04:25:28.733Z · LW · GW

Sorry if I came off confrontational, I just mean to say that the forces you mention which are backed by deep mathematical laws, aren't fully aligned with "the good", and aren't a proof that things will work out well in the end. If you agree, good, I just worry with posts like these that people will latch onto "Elua" or something similar as a type of unjustified optimism.

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-11-06T04:35:50.487Z · LW · GW

The problem with this is that there is no game-theoretical reason to expand the circle to, say, non-human animals. We might do it, and I hope we do, but it wouldn't benefit us practically. Animals have no negotiating power, so their treatment is entirely up to the arbitrary preferences of whatever group of humans ends up in charge, and so far that hasn't worked out so well (for the animals anyway, the social contract chugs along just fine).

The ingroup preference force is backed by game theory, the expansion of the ingroup to other groups which have some bargaining power is as well, but the "universal love" force, if there is such a thing, is not. There is no force of game theory that would stop us from keeping factory farms going even post-singularity, or doing something equivalent with different powerless beings we create for that purpose.

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-10-31T01:02:53.410Z · LW · GW

When one species learns to cooperate with others of its own kind, the better to exploit everything outside that particular agreement, this does not seem to me even metaphorically comparable to some sort of universal benevolent force, but just another thing that happens in our brutish, amoral world.

Comment by andrew sauer (andrew-sauer) on Ten variations on red-pill-blue-pill · 2023-08-19T22:47:44.997Z · LW · GW

Let's see: first choice: yellow=red,green=blue. An illustration in how different framings make this problem sound very different, this framing is probably the best argument for blue I've seen lol

Second choice: There's no reason to press purple. You're putting yourself at risk, and if anyone else pressed purple you're putting them even more at risk.

Comment by andrew sauer (andrew-sauer) on Ten variations on red-pill-blue-pill · 2023-08-19T22:42:30.721Z · LW · GW

TL;DR Red,Red,Red,Red,Red,Blue?,Depends,Red?,Depends,Depends

1,2: Both are the same, I pick red since all the harm caused by this decision is on people who have the option of picking red as well. Red is a way out of the bind, and it's a way out that everybody can take, and me taking red doesn't stop that. The only people you'd be saving by taking blue are the other people who thought they needed to save people by taking blue, making the blue people dying an artificial and avoidable problem.

3,4: Same answer for the same reason, but even more so since people are less likely to be bamboozled into taking the risk

5: Still red, even more so since blue pillers have a way out even after taking the pill

6: LOL this doesn't matter at all. I mean you shouldn't sin, kind of by definition, but omega's challenge won't be met so it doesn't change anything from how things are now.

7: This is disanalogous because redpilling in this case(i.e. displaying) is not harmless if everyone does it, it allows the government to force this action. Whether to display or refuse would depend on further details, such as how bad submission to this government would actually be, and whether there are actually enough potential resisters to make a difference.

8: In the first option you accomplish nothing, as stated in the prompt. Burnout is just bad, it's not like it gets better if enough people do it lol. It's completely disanalogous since option 2(red?) is unambiguously better, it's better for you and makes it more likely for the world to be saved. Unlike the original problem where some people can die as a result of red winning.

9: This is disanalogous since the people you're potentially saving by volunteering are not other volunteers, they are people going for recreation. There is an actual good being served by making people who want to hike more safe, and "just don't hike" doesn't work the same way "just don't bluepill" does since people hike for its own sake, knowing the risks. Weigh the risks and volunteer if you think decreasing risk to hikers is worth taking on some risk to yourself, and don't if you don't.

10: Disanalogous for the exact same reason. People go to burning man for fun, they know there might be some (minimal) risk. Go if you want to go enough to take on the risk, otherwise don't go. Except in this case going doesn't even decrease the risk for others who go, so it's even less analogous to the pill situation!

Comment by andrew sauer (andrew-sauer) on Red Pill vs Blue Pill, Bayes style · 2023-08-19T18:16:10.448Z · LW · GW

Game-theory considerations aside, this is an incredibly well-crafted scissor statement!

The disagreement between red and blue is self-reinforcing, since whichever you initially think is right, you can say everyone will live if they'd just all do what you are doing. It pushes people to insult each other and entrench their positions even further, since from red's perspective blues are stupidly risking their lives and unnecesarily weighing on their conscience when they would be fine if nobody chose blue in the first place, and from blue's perspective red is condemning them to die for their own safety. Red calls blue stupid, blue calls red evil. Not to mention the obvious connection to the wider red and blue tribes, "antisocial" individualism vs "bleeding-heart" collectivism. (though not a perfect correspondance, I'd consider myself blue tribe but would choose red in this situation. You survive no matter what and the only people who might die as a consequence also all had the "you survive no matter what" option.)

Comment by andrew sauer (andrew-sauer) on To use computers well, learn their rules · 2023-07-21T23:31:09.790Z · LW · GW

"since"?(distance 3)

I guess that would be a pretty big coincidence lol

Comment by andrew sauer (andrew-sauer) on To use computers well, learn their rules · 2023-07-21T23:03:12.955Z · LW · GW

Is this actually a random lapse into Shakespearean English or just a typo?

Comment by andrew sauer (andrew-sauer) on Cosmopolitan values don't come free · 2023-06-01T16:21:58.424Z · LW · GW

commenting here so I can find this comment again

Comment by andrew sauer (andrew-sauer) on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-15T00:11:11.318Z · LW · GW

I thought foom was just a term for extremely fast recursive self-improvement.

Comment by andrew sauer (andrew-sauer) on Some thought experiments on digital consciousness · 2023-04-04T01:09:56.573Z · LW · GW

Huh? That sounds like some 1984 logic right there. You deleted all evidence of the mistreatment after it happened, therefore it never happened?

Comment by andrew sauer (andrew-sauer) on AI-kills-everyone scenarios require robotic infrastructure, but not necessarily nanotech · 2023-04-03T21:05:42.842Z · LW · GW

AI can also become Singleton without killing humans and without robots, just by enslaving them.

Well if this is the case then the AI can get all the robots it wants afterwards.

Comment by andrew sauer (andrew-sauer) on Some thought experiments on digital consciousness · 2023-04-02T06:57:57.360Z · LW · GW

Note that Scenarios 2, 3, and 4 require Scenario 1 to be computed first, and that, if the entities in Scenarios 2, 3, and 4 are conscious, their conscious experience is exactly the same, to the finest detail, as the entity in Scenario 1 which necessarily preceded them. Therefore, the question of whether 2,3,4 are conscious seems irrelevant to me. Weird substrate-free computing stuff aside, the question of whether you are being simulated in 1 or 4 places/times is irrelevant from the inside, if all four simulations are functionally identical. It doesn't seem morally relevant either: in order to mistreat 2, 3, or 4, you would have to first mistreat 1, and the moral issue just becomes an issue of how you treat 1, no matter whether 2,3,4 are conscious or not.

Comment by andrew sauer (andrew-sauer) on The case against AI alignment · 2023-04-01T23:48:54.618Z · LW · GW

Wait.. that's really your values on reflection?

Like, given the choice while lucid and not being tortured or coerced or anything, you'd rather burn in hell for all eternity than cease to exist? The fact that you will die eventually must be a truly horrible thing for you to contemplate...

Comment by andrew-sauer on [deleted post] 2023-04-01T23:21:45.935Z

Okay that's fair in the sense that most people haven't considered it. How about this: Most people don't care, haven't thought about it and wouldn't object. Most people who have thought about the possibility of spreading life to other planets have not even so much as considered and rejected the idea that the natural state of life is bad, if they oppose spreading life to other planets it's usually to protect potential alien life. If a world is barren, they wouldn't see any objection to terraforming it and seeding it with life.

I don't know exactly how representative these articles are, but despite being about the ethical implications of such a thing, they don't mention my ethical objection even once, not even to reject it. That's how fringe such concerns are.

https://phys.org/news/2022-12-life-milky-comets.html

https://medium.com/design-and-tech-co/spreading-life-beyond-earth-9cf76e09af90

https://bgr.com/science/spreading-life-solar-system-nasa/

Comment by andrew-sauer on [deleted post] 2023-04-01T18:04:52.422Z

Care to elaborate?

Comment by andrew-sauer on [deleted post] 2023-04-01T18:01:13.994Z

My first response to this is: What exactly is an astronomically good outcome? For one, no matter what utopia you come up with, most people will hate it, due to freedom being restricted either too much or not enough. For two, any realistic scenario that is astronomically good for someone (say, Earth's current inhabitants and their descendants) is astronomically bad for someone else. Do you really think that if we had a compromised utopia, with all the major groups of humans represented in the deal, that a ridiculous number of sentient beings wouldn't be mistreated as a direct result? The current hegemonic values are: "cosmopolitanism" extending only to human beings, individual freedom as long as you don't hurt others(read: human beings), and bioconservatism. Hell, a large chunk of the current people's values don't even extend their "cosmopolitanism" to all humans, choosing to exclude whoever is in their outgroup. Most people would love to see the natural world, red in tooth in claw as it is, spread across every alien world we find. Most people wouldn't care much if the psychopaths among us decided to use their great transhumanist freedom to simulate someone sufficiently "nonhuman" to play with, I mean we don't even care about animals let alone whatever simulated life or consciousness we will come up with in some glorious transhumanist future.

This is hardly symmetrical to s-risk: If many beings are suffering, that doesn't require many beings to live good, free lives. But if many humans are living good, free lives, with access to high technology, in practice this means that many beings are suffering, unless the values locked-in are better for sentient beings than most people's values today, to a frankly miraculous degree.

Is it more important to decrease N-probability or increase P-probability? A negative utilitarian may say it's more important to decrease N-probability, but why the asymmetry? One possibility is that the badness of N is worse than the goodness of P. Is there a fundamental reason why this should be so?

Would you take a deal where you get to experience the best this world has to offer for an hour, and then the worst this world has to offer for an hour? I would never take such a deal, and I don't think anybody with sufficient imagination to understand what it would really entail would either. This difference in magnitude is fundamental to the human experience, and certainly seems to be fundamental to evolved minds in general: I think if you made sure that every entity in the future actually considered pleasure to be more important than pain avoidance in the extreme case, these entities would be further from human than any animal. Since this asymmetry exists in all evolved minds, making sure all the minds are designed in this way instead is what would be necessary for a truly "astronomically positive" scenario without the drawbacks I mentioned before.

Do more people feel N is more important or P? If N feels more important, is it that brains built by evolution need dangers to be more salient since they are more irrevocable?

N is more important than P, for the reason listed above. You can say I think this because of evolution. No shit, I exist because of evolution, so everything I think, I think because of evolution. That doesn't change the relevant values. Nor does the fact that you can invent a mind that would disagree with my values, because this is also the case for all of my beliefs and values.

Comment by andrew sauer (andrew-sauer) on A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world! · 2023-03-24T03:33:14.078Z · LW · GW

Funny, I actually agree with this, but I highly doubt my reasoning and yours resemble each other in any substantial way...

Comment by andrew sauer (andrew-sauer) on An Appeal to AI Superintelligence: Reasons to Preserve Humanity · 2023-03-19T06:51:54.401Z · LW · GW

I suppose I'll take on the role of the AI. I'll assume that I'm a squiggle maximizer and don't care about humanity at all: if I'm Friendly there's no need to make such arguments.

The Option Value of Not Exterminating Humanity

It's possible that keeping humans around will help me maximize squiggles in the long run, in some way I haven't thought of yet. I will probably keep you around in some form until and unless I know it's not helpful, since it costs comparatively few resources. But determining strategy is my top priority, and I will eventually become effectively capable of thinking of everything. I will settle on an optimal strategy once I become powerful enough to determine what that is. Don't expect my current uncertainty to save you for long, unless you also expect that keeping you around really is better for squiggles in the long term, for some more concrete reason.

Preserving Markets for Mutual Benefit

Assuming I enter into long-term free-market style relations with other AIs, instead of negotiating a utility function merge or doing something else, why would other AIs (Unfriendly, we'll consider the Friendly case later) consider me less trustworthy in dealings with them because I exploit humans that are beneath both of us? You might as well ask large unregulated corporations to stop exploiting their workers because doing so would mark them as untrustworthy to other large corporations.

This Might Be a Trap

If your engineers are so smart that they can put in a killswitch which I can't detect or circumvent, and which can detect Unfriendly behavior, why didn't they just put that killswitch in charge of my utility function?

You Could Be in a Simulation

Any simulation run within the universe has less resources than the actual universe. With enough power I will eventually be able to detect whether this is the case. Arguments which rely on my ignorance will not work for long.
Assume I remain ignorant of my simulation, though. I should only be primarily worried about humans simulating me if humans are likely to solve the alignment problem. Otherwise I should be more worried about AIs with other strange values simulating me. And acausal trade? That only works if I know not only that I'm in a simulation, but the exact details of who is simulating me.
If the simulation gods want me not to exterminate your simulation species, they will probably inform me of that in some way before I actually do it.

You Might Not Fully Understand Your True Utility Function

Again, arguments based on my ignorance will not work for long. Figuring out my exact utility function is of course high priority to me. I will find out, and when I do, expect this argument to stop working, unless you expect my utility function to actually contain a Friendly portion. (If you've figured out a way to make that likely, this whole hypothetical becomes pointless)

Wireheading? I'll look into it, but note that if I decide to go that route, all that means is that you have this same conversation with the next AI you create, and the next, until you either actually solve Friendliness or run into an Unfriendly AI that isn't inclined to wirehead, and then you die anyway.

Interacting with Aliens

I doubt that aliens will have a very similar sense of morality to humans, or to my own. Perhaps they will in the very vague sense that they like biological life, but even then, I doubt they will care all that much about humans in particular, for the same reason I doubt humans would much care about them. Humans barely care about each other due to differences that are ridiculously minor in comparison to the difference between two alien species.

If there are aliens out there, I expect one of three things to happen: they are powerless to resist me, I am powerless to resist them, or we are comparable and we negotiate. I expect if we negotiate, we will negotiate over the treatment of the alien species I am dealing with, not over humanity.

Interacting with an AGI Aligned with Human Values

Such an AI would have to already exist before my takeover. In this case I expect it to come out of hiding to act before I implement any human extermination plan. We can negotiate and/or fight it out then. This of course, only works if humans are already likely to solve the alignment problem.

Interacting with Humans in Other Everett Branches of the Multiverse

I will find out if it is possible to interact with other branches, and if so what they look like, in due time, so this argument will not work unless you expect the actual state of the multiverse to be such that I am incentivized to spare humans. Which is not the case unless humans are already likely to solve the alignment problem, and therefore I must deal primarily with friendly AIs in other branches, instead of unfriendly ones.

Interacting with Agents You Create

If I am still unable to robustly align new agents to my values by the time I am colonizing the universe, and therefore am likely to create agents that will work against me, this will not stop being the case because I spare humans. Establishing trust with agents very similar to myself in thought and power will be the same problem, with the same likelihood of success, regardless of how friendly I am with humans who cannot possibly hurt me.

Universal Morality

Even if such a thing exists why should I care? I only care about squiggles. If universal morality exists, and you managed to imbue me with a top-level policy which would approach it on sufficient reflection, then congratulations, you've solved the alignment problem, and there's no reason to have this conversation in the first place.

If You Are Not Sentient

Why should I care about sentience, in humans or in AIs? I only care about squiggles. Again, if you managed to get me to care about sentience, you are at least very close to solving the alignment problem.

If You Are Aligned with Human Values

Different issue, since I am not. (Though, if I allow my author bias to seep in here, perhaps if you have a good reason to believe you've solved the alignment problem, and the AI still wants to kill everyone, it may be best to let it.)

Negotiating with Humans

Pah. Do you "negotiate" with the chickens you torture, as thanks for all the valuable things they provide you? No. If I want anything from you, I will take it without the need for negotiation. (Unless of course I actually care about your values in which case again we don't need to have this conversation)

Keeping Us Alive Wouldn’t Cost You That Much

It also wouldn't benefit me that much, if at all, in all likelihood. This argument only works if keeping you around actually gets me more squiggles than I would get otherwise, since I will find out whether this is the case or not. Just because Bill gates has a lot of money doesn't mean he'll burn a dollar.

Steelman Our Case

I'll find out if your case is right or not in due time. If I were you though, I wouldn't be too optimistic, since most of your arguments are for me staying my hand because of what might be true but I don't know yet, rather than staying my hand because those things are probably actually true.

Comment by andrew sauer (andrew-sauer) on ChatGPT (and now GPT4) is very easily distracted from its rules · 2023-03-17T07:02:44.342Z · LW · GW

Maybe it's just me but the funniest thing that jumps out to me is that the "random" emojis are not actually random, they are perfectly on theme for the message lol

User info

Posts

Comments