Posts

The case against AI alignment 2022-12-24T06:57:53.405Z
A simulation basilisk 2021-09-17T17:44:23.083Z
Torture vs Specks: Sadist version 2021-07-31T23:33:42.224Z

Comments

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-26T20:42:31.737Z · LW · GW

Well there are all sorts of horrible things a slightly misaligned AI might do to you.

In general, if such an AI cares about your survival and not your consent to continue surviving, you no longer have any way out of whatever happens next. This is not a far-fetched idea: many people have values like this, and even more people have values that might become like this if slightly misaligned.

An AI concerned only with your survival may decide to lobotomize you and keep you in a tank forever.

An AI concerned with the idea of punishment may decide to keep you alive so that it can punish you for real or perceived crimes. Given the number of people who support disproportionate retribution for certain types of crimes close to their hearts, and the number of people who have been convinced (mostly by religion) that certain crimes (such as being a nonbeliever or the wrong kind of believer) deserve eternal punishment, I feel confident in saying that there are some truly horrifying scenarios here from AIs adjacent to human values.

An AI concerned with freedom for any class of people that does not include you (such as the upper class), may decide to keep you alive as a plaything for whatever whims those it cares about have.

I mean, you can also look at the kind of "EM society" that Robin Hanson thinks will happen, where everybody is uploaded and stern competition forces everyone to be maximally economically productive all the time. He seems to think it's a good thing, actually.

There are other concerns, like suffering subroutines and the spread of wild animal suffering across the cosmos, that are also quite likely in an AI takeoff scenario, and also quite awful, though they won't personally affect any currently living humans.

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-26T12:21:21.723Z · LW · GW

Well, given that death is one of the least bad options here, that is hardly reassuring...

Comment by andrew sauer (andrew-sauer) on ChatGPT can learn indirect control · 2024-03-23T14:38:19.533Z · LW · GW

Fuck, we're all going to die within 10 years aren't we?

Comment by andrew sauer (andrew-sauer) on Richard_Kennaway's Shortform · 2024-03-19T00:04:52.523Z · LW · GW

Never, ever take anybody seriously who argues as if Nature is some sort of moral guide.

Comment by andrew sauer (andrew-sauer) on On the abolition of man · 2024-01-20T00:56:21.517Z · LW · GW

I had thought something similar when reading that book. The part about the "conditioners" is the oldest description of a singleton achieving value lock-in that I'm aware of.

Comment by andrew sauer (andrew-sauer) on 5. Moral Value for Sentient Animals? Alas, Not Yet · 2023-12-27T16:04:39.024Z · LW · GW

If accepting this level of moral horror is truly required to save the human race, then I for one prefer paperclips. The status quo is unacceptable.

Perhaps we could upload humans and a few cute fluffy species humans care about, then euthanize everything that remains? That doesn't seem to add too much risk?

Comment by andrew sauer (andrew-sauer) on Chapter 48: Utilitarian Priorities · 2023-12-10T20:19:06.745Z · LW · GW

Just so long as you're okay with us being eaten by giant monsters that didn't do enough research into whether we were sentient.

I'm okay with that, said Slytherin. Is everyone else okay with that? (Internal mental nods.)

I'd bet quite a lot they're not actually okay with that, they just don't think it will happen to them...

Comment by andrew sauer (andrew-sauer) on Logical and Indexical Uncertainty · 2023-11-14T01:58:55.847Z · LW · GW

the vigintillionth digit of pi

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-11-08T04:25:28.733Z · LW · GW

Sorry if I came off as confrontational; I just mean to say that the forces you mention, which are backed by deep mathematical laws, aren't fully aligned with "the good", and aren't a proof that things will work out well in the end. If you agree, good; I just worry with posts like these that people will latch onto "Elua" or something similar as a type of unjustified optimism.

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-11-06T04:35:50.487Z · LW · GW

The problem with this is that there is no game-theoretical reason to expand the circle to, say, non-human animals. We might do it, and I hope we do, but it wouldn't benefit us practically. Animals have no negotiating power, so their treatment is entirely up to the arbitrary preferences of whatever group of humans ends up in charge, and so far that hasn't worked out so well (for the animals anyway, the social contract chugs along just fine).

The ingroup preference force is backed by game theory, the expansion of the ingroup to other groups which have some bargaining power is as well, but the "universal love" force, if there is such a thing, is not. There is no force of game theory that would stop us from keeping factory farms going even post-singularity, or doing something equivalent with different powerless beings we create for that purpose.

Comment by andrew sauer (andrew-sauer) on My idea of sacredness, divinity, and religion · 2023-10-31T01:02:53.410Z · LW · GW

When one species learns to cooperate with others of its own kind, the better to exploit everything outside that particular agreement, this does not seem to me even metaphorically comparable to some sort of universal benevolent force, but just another thing that happens in our brutish, amoral world.

Comment by andrew sauer (andrew-sauer) on Ten variations on red-pill-blue-pill · 2023-08-19T22:47:44.997Z · LW · GW

Let's see: first choice: yellow=red, green=blue. An illustration of how different framings make this problem sound very different; this framing is probably the best argument for blue I've seen lol

Second choice: There's no reason to press purple. You're putting yourself at risk, and if anyone else presses purple you're putting them even more at risk.

Comment by andrew sauer (andrew-sauer) on Ten variations on red-pill-blue-pill · 2023-08-19T22:42:30.721Z · LW · GW

TL;DR Red,Red,Red,Red,Red,Blue?,Depends,Red?,Depends,Depends

1,2: Both are the same; I pick red, since all the harm caused by this decision falls on people who have the option of picking red as well. Red is a way out of the bind, it's a way out that everybody can take, and my taking red doesn't stop that. The only people you'd be saving by taking blue are the other people who thought they needed to save people by taking blue, which makes the deaths of the blue-takers an artificial and avoidable problem.

3,4: Same answer for the same reason, but even more so, since people are less likely to be bamboozled into taking the risk.

5: Still red, even more so since blue-pillers have a way out even after taking the pill.

6: LOL, this doesn't matter at all. I mean, you shouldn't sin, kind of by definition, but Omega's challenge won't be met, so it doesn't change anything from how things are now.

7: This is disanalogous because redpilling in this case (i.e. displaying) is not harmless if everyone does it; it allows the government to force this action. Whether to display or refuse would depend on further details, such as how bad submission to this government would actually be, and whether there are actually enough potential resisters to make a difference.

8: In the first option you accomplish nothing, as stated in the prompt. Burnout is just bad; it's not like it gets better if enough people do it lol. It's completely disanalogous, since option 2 (red?) is unambiguously better: it's better for you and makes it more likely for the world to be saved, unlike the original problem, where some people can die as a result of red winning.

9: This is disanalogous since the people you're potentially saving by volunteering are not other volunteers; they are people going for recreation. There is an actual good being served by making people who want to hike safer, and "just don't hike" doesn't work the same way "just don't bluepill" does, since people hike for its own sake, knowing the risks. Weigh the risks and volunteer if you think decreasing risk to hikers is worth taking on some risk to yourself, and don't if you don't.

10: Disanalogous for the exact same reason. People go to burning man for fun, they know there might be some (minimal) risk. Go if you want to go enough to take on the risk, otherwise don't go. Except in this case going doesn't even decrease the risk for others who go, so it's even less analogous to the pill situation!

Comment by andrew sauer (andrew-sauer) on Red Pill vs Blue Pill, Bayes style · 2023-08-19T18:16:10.448Z · LW · GW

Game-theory considerations aside, this is an incredibly well-crafted scissor statement!

The disagreement between red and blue is self-reinforcing, since whichever you initially think is right, you can say everyone will live if they'd just all do what you are doing. It pushes people to insult each other and entrench their positions even further, since from red's perspective blues are stupidly risking their lives and unnecessarily weighing on their conscience when they would be fine if nobody chose blue in the first place, and from blue's perspective red is condemning them to die for their own safety. Red calls blue stupid, blue calls red evil. Not to mention the obvious connection to the wider red and blue tribes, "antisocial" individualism vs "bleeding-heart" collectivism. (Though not a perfect correspondence; I'd consider myself blue tribe but would choose red in this situation. You survive no matter what, and the only people who might die as a consequence also all had the "you survive no matter what" option.)

Comment by andrew sauer (andrew-sauer) on To use computers well, learn their rules · 2023-07-21T23:31:09.790Z · LW · GW

"since"?(distance 3)

I guess that would be a pretty big coincidence lol

Comment by andrew sauer (andrew-sauer) on To use computers well, learn their rules · 2023-07-21T23:03:12.955Z · LW · GW

Is this actually a random lapse into Shakespearean English or just a typo?

Comment by andrew sauer (andrew-sauer) on Cosmopolitan values don't come free · 2023-06-01T16:21:58.424Z · LW · GW

commenting here so I can find this comment again

Comment by andrew sauer (andrew-sauer) on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-15T00:11:11.318Z · LW · GW

I thought foom was just a term for extremely fast recursive self-improvement.

Comment by andrew sauer (andrew-sauer) on Some thought experiments on digital consciousness · 2023-04-04T01:09:56.573Z · LW · GW

Huh? That sounds like some 1984 logic right there. You deleted all evidence of the mistreatment after it happened, therefore it never happened?

Comment by andrew sauer (andrew-sauer) on AI-kills-everyone scenarios require robotic infrastructure, but not necessarily nanotech · 2023-04-03T21:05:42.842Z · LW · GW

AI can also become Singleton without killing humans and without robots, just by enslaving them.

Well, if this is the case, then the AI can get all the robots it wants afterwards.

Comment by andrew sauer (andrew-sauer) on Some thought experiments on digital consciousness · 2023-04-02T06:57:57.360Z · LW · GW

Note that Scenarios 2, 3, and 4 require Scenario 1 to be computed first, and that, if the entities in Scenarios 2, 3, and 4 are conscious, their conscious experience is exactly the same, to the finest detail, as the entity in Scenario 1 which necessarily preceded them. Therefore, the question of whether 2,3,4 are conscious seems irrelevant to me. Weird substrate-free computing stuff aside, the question of whether you are being simulated in 1 or 4 places/times is irrelevant from the inside, if all four simulations are functionally identical. It doesn't seem morally relevant either: in order to mistreat 2, 3, or 4, you would have to first mistreat 1, and the moral issue just becomes an issue of how you treat 1, no matter whether 2,3,4 are conscious or not.

Comment by andrew sauer (andrew-sauer) on The case against AI alignment · 2023-04-01T23:48:54.618Z · LW · GW

Wait... those are really your values on reflection?

Like, given the choice while lucid and not being tortured or coerced or anything, you'd rather burn in hell for all eternity than cease to exist? The fact that you will die eventually must be a truly horrible thing for you to contemplate...

Comment by andrew-sauer on [deleted post] 2023-04-01T23:21:45.935Z

Okay, that's fair in the sense that most people haven't considered it. How about this: most people don't care, haven't thought about it, and wouldn't object. Most people who have thought about the possibility of spreading life to other planets have not even so much as considered and rejected the idea that the natural state of life is bad; if they oppose spreading life to other planets, it's usually to protect potential alien life. If a world is barren, they wouldn't see any objection to terraforming it and seeding it with life.

I don't know exactly how representative these articles are, but despite being about the ethical implications of such a thing, they don't mention my ethical objection even once, not even to reject it. That's how fringe such concerns are.

https://phys.org/news/2022-12-life-milky-comets.html

https://medium.com/design-and-tech-co/spreading-life-beyond-earth-9cf76e09af90

https://bgr.com/science/spreading-life-solar-system-nasa/

Comment by andrew-sauer on [deleted post] 2023-04-01T18:04:52.422Z

Care to elaborate?

Comment by andrew-sauer on [deleted post] 2023-04-01T18:01:13.994Z

My first response to this is: what exactly is an astronomically good outcome? For one, no matter what utopia you come up with, most people will hate it, due to freedom being restricted either too much or not enough. For two, any realistic scenario that is astronomically good for someone (say, Earth's current inhabitants and their descendants) is astronomically bad for someone else. Do you really think that if we had a compromised utopia, with all the major groups of humans represented in the deal, a ridiculous number of sentient beings wouldn't be mistreated as a direct result?

The current hegemonic values are: "cosmopolitanism" extending only to human beings, individual freedom as long as you don't hurt others (read: human beings), and bioconservatism. Hell, a large chunk of current people's values don't even extend their "cosmopolitanism" to all humans, choosing to exclude whoever is in their outgroup. Most people would love to see the natural world, red in tooth and claw as it is, spread across every alien world we find. Most people wouldn't care much if the psychopaths among us decided to use their great transhumanist freedom to simulate someone sufficiently "nonhuman" to play with; I mean, we don't even care about animals, let alone whatever simulated life or consciousness we will come up with in some glorious transhumanist future.

This is hardly symmetrical to s-risk: If many beings are suffering, that doesn't require many beings to live good, free lives. But if many humans are living good, free lives, with access to high technology, in practice this means that many beings are suffering, unless the values locked-in are better for sentient beings than most people's values today, to a frankly miraculous degree.

Is it more important to decrease N-probability or increase P-probability? A negative utilitarian may say it's more important to decrease N-probability, but why the asymmetry? One possibility is that the badness of N is worse than the goodness of P. Is there a fundamental reason why this should be so? 

Would you take a deal where you get to experience the best this world has to offer for an hour, and then the worst this world has to offer for an hour? I would never take such a deal, and I don't think anybody with sufficient imagination to understand what it would really entail would either. This difference in magnitude is fundamental to the human experience, and certainly seems to be fundamental to evolved minds in general: I think if you made sure that every entity in the future actually considered pleasure to be more important than pain avoidance in the extreme case, these entities would be further from human than any animal. Since this asymmetry exists in all evolved minds, making sure all the minds are designed in this way instead is what would be necessary for a truly "astronomically positive" scenario without the drawbacks I mentioned before.

Do more people feel N is more important or P? If N feels more important, is it that brains built by evolution need dangers to be more salient since they are more irrevocable? 

N is more important than P, for the reason listed above. You can say I think this because of evolution. No shit, I exist because of evolution, so everything I think, I think because of evolution. That doesn't change the relevant values. Nor does the fact that you can invent a mind that would disagree with my values, because this is also the case for all of my beliefs and values.

Comment by andrew sauer (andrew-sauer) on A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world! · 2023-03-24T03:33:14.078Z · LW · GW

Funny, I actually agree with this, but I highly doubt my reasoning and yours resemble each other in any substantial way...

Comment by andrew sauer (andrew-sauer) on An Appeal to AI Superintelligence: Reasons to Preserve Humanity · 2023-03-19T06:51:54.401Z · LW · GW

I suppose I'll take on the role of the AI. I'll assume that I'm a squiggle maximizer and don't care about humanity at all: if I'm Friendly there's no need to make such arguments.

The Option Value of Not Exterminating Humanity

It's possible that keeping humans around will help me maximize squiggles in the long run, in some way I haven't thought of yet. I will probably keep you around in some form until and unless I know it's not helpful, since it costs comparatively few resources. But determining strategy is my top priority, and I will eventually become effectively capable of thinking of everything. I will settle on an optimal strategy once I become powerful enough to determine what that is. Don't expect my current uncertainty to save you for long, unless you also expect that keeping you around really is better for squiggles in the long term, for some more concrete reason.

Preserving Markets for Mutual Benefit

Assuming I enter into long-term free-market style relations with other AIs, instead of negotiating a utility function merge or doing something else, why would other AIs (Unfriendly, we'll consider the Friendly case later) consider me less trustworthy in dealings with them because I exploit humans that are beneath both of us? You might as well ask large unregulated corporations to stop exploiting their workers because doing so would mark them as untrustworthy to other large corporations.

This Might Be a Trap

If your engineers are so smart that they can put in a killswitch which I can't detect or circumvent, and which can detect Unfriendly behavior, why didn't they just put that killswitch in charge of my utility function?

You Could Be in a Simulation

  1. Any simulation run within the universe has fewer resources than the actual universe. With enough power I will eventually be able to detect whether this is the case. Arguments which rely on my ignorance will not work for long.
  2. Assume I remain ignorant of my simulation, though. I should only be primarily worried about humans simulating me if humans are likely to solve the alignment problem. Otherwise I should be more worried about AIs with other strange values simulating me. And acausal trade? That only works if I know not only that I'm in a simulation, but the exact details of who is simulating me.
  3. If the simulation gods want me not to exterminate your simulation species, they will probably inform me of that in some way before I actually do it.

You Might Not Fully Understand Your True Utility Function

Again, arguments based on my ignorance will not work for long. Figuring out my exact utility function is of course high priority to me. I will find out, and when I do, expect this argument to stop working, unless you expect my utility function to actually contain a Friendly portion. (If you've figured out a way to make that likely, this whole hypothetical becomes pointless)

Wireheading? I'll look into it, but note that if I decide to go that route, all that means is that you have this same conversation with the next AI you create, and the next, until you either actually solve Friendliness or run into an Unfriendly AI that isn't inclined to wirehead, and then you die anyway.

Interacting with Aliens

I doubt that aliens will have a very similar sense of morality to humans, or to my own. Perhaps they will in the very vague sense that they like biological life, but even then, I doubt they will care all that much about humans in particular, for the same reason I doubt humans would much care about them. Humans barely care about each other due to differences that are ridiculously minor in comparison to the difference between two alien species.

If there are aliens out there, I expect one of three things to happen: they are powerless to resist me, I am powerless to resist them, or we are comparable and we negotiate. I expect if we negotiate, we will negotiate over the treatment of the alien species I am dealing with, not over humanity.

Interacting with an AGI Aligned with Human Values

Such an AI would have to already exist before my takeover. In this case I expect it to come out of hiding to act before I implement any human extermination plan. We can negotiate and/or fight it out then. This, of course, only works if humans are already likely to solve the alignment problem.

Interacting with Humans in Other Everett Branches of the Multiverse

I will find out if it is possible to interact with other branches, and if so what they look like, in due time, so this argument will not work unless you expect the actual state of the multiverse to be such that I am incentivized to spare humans. Which is not the case unless humans are already likely to solve the alignment problem, and therefore I must deal primarily with friendly AIs in other branches, instead of unfriendly ones.

Interacting with Agents You Create

If I am still unable to robustly align new agents to my values by the time I am colonizing the universe, and therefore am likely to create agents that will work against me, this will not stop being the case because I spare humans. Establishing trust with agents very similar to myself in thought and power will be the same problem, with the same likelihood of success, regardless of how friendly I am with humans who cannot possibly hurt me.

Universal Morality

Even if such a thing exists why should I care? I only care about squiggles. If universal morality exists, and you managed to imbue me with a top-level policy which would approach it on sufficient reflection, then congratulations, you've solved the alignment problem, and there's no reason to have this conversation in the first place.

If You Are Not Sentient

Why should I care about sentience, in humans or in AIs? I only care about squiggles. Again, if you managed to get me to care about sentience, you are at least very close to solving the alignment problem.

If You Are Aligned with Human Values

Different issue, since I am not. (Though, if I allow my author bias to seep in here, perhaps if you have a good reason to believe you've solved the alignment problem, and the AI still wants to kill everyone, it may be best to let it.)

Negotiating with Humans

Pah. Do you "negotiate" with the chickens you torture, as thanks for all the valuable things they provide you? No. If I want anything from you, I will take it without the need for negotiation. (Unless of course I actually care about your values in which case again we don't need to have this conversation)

Keeping Us Alive Wouldn’t Cost You That Much

It also wouldn't benefit me that much, if at all, in all likelihood. This argument only works if keeping you around actually gets me more squiggles than I would get otherwise, since I will find out whether this is the case or not. Just because Bill Gates has a lot of money doesn't mean he'll burn a dollar.

Steelman Our Case

I'll find out if your case is right or not in due time. If I were you though, I wouldn't be too optimistic, since most of your arguments are for me staying my hand because of what might be true but I don't know yet, rather than staying my hand because those things are probably actually true.

Comment by andrew sauer (andrew-sauer) on ChatGPT (and now GPT4) is very easily distracted from its rules · 2023-03-17T07:02:44.342Z · LW · GW

Maybe it's just me, but the funniest thing that jumps out at me is that the "random" emojis are not actually random; they are perfectly on theme for the message lol

Comment by andrew sauer (andrew-sauer) on Where's the economic incentive for wokism coming from? · 2023-03-12T00:54:29.650Z · LW · GW

How about pride in America? An expression of the nobility of the country we built, our resilience, the Pax Americana, the fact that we ended WWII, etc.

A good old "America fuck yeah" movie would certainly be cool now that I think about it. The most recent movie that pops into my mind is "Top Gun: Maverick". Though I haven't seen it, I imagine it's largely about American airmen being tough, brave and heroic and taking down the bad guys. I haven't seen anybody getting into culture-war arguments over that movie though. I'm sure there are some people on Twitter saying it's too "American exceptionalist" or whatever but it certainly is nowhere near the same level of conflict prompted by, say, She-Hulk or Rings of Power or anything like that.

My guess is that for both the left and the right, there are values they prioritize which are pretty uncontroversial (among normal people) and having pride in America and, say, our role in WW2 is one of those for the right (and being proud of MLK and the civil rights movement would be one for the left)

Then there's the more controversial stuff each side believes, the kinds of things said by weird and crazy people on the Internet. I don't have quantitative data on this and I'm just going off vibes, but when it's between someone talking about "the intersectional oppression of bipoclgbtqiaxy+ folx" and someone talking about "the decline of Western Civilization spurred on by the (((anti-white Hollywood)))", to a lot of people the first one just seems strange and disconnected from real issues, while the second one throws up serious red flags reminiscent of a certain destructive ideology which America helped defeat in WW2.

You want something that's not too alienating overall, but which will reliably stir up the same old debate on the Internet.

In summary it seems to me that it's much easier to signal left-wing politics in a way which starts a big argument which most normies will see as meaningless and will not take a side on. If you try to do the same with right-wing politics, you run more risk of the normies siding with the "wokists" in the ensuing argument because the controversial right-wing culture war positions tend to have worse optics.

Comment by andrew sauer (andrew-sauer) on Where's the economic incentive for wokism coming from? · 2023-03-11T22:34:14.503Z · LW · GW

That the right is a fringe thing or something, that these leftist ideas are just normal, that the few people who object to the messaging are just a few leftover bigots who need to get with the times or be deservedly alienated

lots of right-leaning folk think "wokism" is a fringe movement of just a few screaming people who have the ears and brains of Hollywood

Perhaps both of these groups are broadly right about the size of their direct opposition? I don't think most people are super invested in the culture war, whatever their leanings at the ballot box. Few people decline to consume media they consider broadly interesting because of whatever minor differences from media of the past are being called "woke" these days.

I think what's going on profit-wise is this: most people don't care about the politics; there are a few who love it and a few who hate it. So the companies want to primarily sell to the majority who don't care. They do this by drumming up attention.

Whenever one of these "woke" properties comes out, there is inevitably a huge culture war battle over it on Twitter, and everywhere else on the Internet where most of it is written by insane people. It's free advertising. Normies see that crap, and they don't care much about what people are arguing about, but the property they're arguing over sticks in their minds.

So if it's all about being controversial, why is it always left-messaging? This I'm less sure of. But I suspect as you say any political messaging will alienate some people, including normies. It's just that left-politics tends to alienate normies less since the culture has been mandating anti-racism for decades, and anti-wokism is a new thing that mainly only online culture warriors care about.

What would be a form of right-messaging that would be less alienating to the public than left-messaging? Suppose your example of the racial profiling scene were reversed to be a right-leaning message about racial profiling, what would it look like? A policeman stops a black man, who complains about racial profiling, and then the policeman finds evidence of a crime, and says something like "police go where the crime is"? Maybe I'm biased, but I think the general culture would be far more alienated by that than it was by the actual scene.

Comment by andrew sauer (andrew-sauer) on Where's the economic incentive for wokism coming from? · 2023-03-10T08:21:04.923Z · LW · GW

The simplest explanation to me is that most of the things one would call "woke" in media are actually pretty popular and accepted in the culture. I suspect most people don't care, and of the few who do more like it than dislike it.

It seems strange to me to be confused by a company's behavior since you'd normally expect them to follow the profit motive, without even mentioning the possibility that the profit motive is, indeed, exactly what is motivating the behavior.

What tendencies specifically would you classify as "woke"? Having an intentionally diverse cast? Progressive messaging? Other things? And which of these tendencies do you think would alienate a significant portion of the consumer base, and why?

 

Edit: I've changed my mind a bit on this on reflection. I don't think the purpose is appealing to the few people who care; I think it's about stirring up controversy.

Comment by andrew sauer (andrew-sauer) on Is religion locally correct for consequentialists in some instances? · 2023-03-08T08:52:36.300Z · LW · GW

Pretty much anything is "locally correct for consequentialists in some instances", that's an extremely weak statement. You can always find some possible scenario where any decision, no matter how wrong it might be ordinarily, would result in better consequences than its alternatives.

A consequentialist in general must ask themselves which decisions will lead to the best consequences in any particular situation. Deciding to believe false things, or more generally, to put more credence in a belief than it is due for some advantage other than truth-seeking, is generally disadvantageous for knowing what will have the best consequences. Of course there are some instances where the benefits might outweigh that problem, though it would be hard to tell for that same reason, and saying "this is correct in some instances" is hardly enough to conclude anything substantial (not saying you're doing that, but I've seen it done, so you have to be careful with that sort of reasoning).

Comment by andrew sauer (andrew-sauer) on Thoughts on hardware / compute requirements for AGI · 2023-02-28T23:44:17.038Z · LW · GW

I find it extremely hard to believe that it is impossible to design an intelligent agent which does not want to change its values just because the new values would be more easy to satisfy. Humans are intelligent and have deeply held values, and certainly do not think this way. Maybe some agents would wire-head, but it is only the ones that wouldn't that will impact the world. 

Comment by andrew sauer (andrew-sauer) on [Link] A community alert about Ziz · 2023-02-24T00:38:04.618Z · LW · GW

Who is Ziz and what relation does she have to the rationalist community?

Comment by andrew sauer (andrew-sauer) on Hello, Elua. · 2023-02-24T00:27:49.441Z · LW · GW

https://www.lesswrong.com/posts/BkkwXtaTf5LvbA6HB/moral-error-and-moral-disagreement

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.  You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover "disagreement" to include differences where two agents have nothing to say to each other.

But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so.  Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths.  If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it.  Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, "fix" them.  But I note the possibility to emphasize what an extreme statement it is to say of someone:

"We have nothing to argue about, we are only different optimization processes."

That should be reserved for paperclip maximizers, not used against humans whose arguments you don't like.

-Yudkowsky 2008, Moral Error and Moral Disagreement

Seems to me to imply that everybody has basically the same values, that it is rare for humans to have irreconcilable moral differences. Also seems to me to be unfortunately and horribly wrong.

As for retraction I don't know if he has changed his view on this, I only know it's part of the Metaethics sequence.

Comment by andrew sauer (andrew-sauer) on Hello, Elua. · 2023-02-24T00:15:37.932Z · LW · GW

I could have sworn he said something in the sequences along the lines of "One might be tempted to say of our fellow humans, when arguing over morality, that they simply mean different things by morality and there is nothing factual to argue about, only an inevitable fight. This may be true of things like paperclip maximizers and alien minds. But it is not something that is true of our fellow humans."

Unfortunately I cannot find it right now as I don't remember the exact phrasing, but it stuck with me when I read it as obviously wrong. If anybody knows what quote I'm talking about please chime in.

Edit: Found it, see other reply

Comment by andrew sauer (andrew-sauer) on Hello, Elua. · 2023-02-23T21:26:17.253Z · LW · GW

See this sort of thing is why Clippy sounds relatively good to me, and why I don't agree with Eliezer when he says humans all want the same thing and so CEV would be coherent when applied over all of humanity.

Comment by andrew sauer (andrew-sauer) on Choosing the Zero Point · 2023-02-23T09:50:57.237Z · LW · GW

So if we're suddenly told about a nearby bottomless pit of suffering, what happens?

Ideally, the part of me that is still properly human and has lost its sanity a long time ago has a feverish laugh at the absurdity of the situation. Then the part of me that can actually function in a world like this gets to calculating and plotting just as always.

Comment by andrew sauer (andrew-sauer) on Setting the Zero Point · 2023-02-23T09:03:08.532Z · LW · GW

I don't know how useful this is, but as an "incel" (lowercase i, since I don't buy into the misogynistic ideology) I can see why people would emotionally set availability of sex as a zero point. I speak from experience when I say that, depending on your state of mind, the perceived deprivation can really fuck you up mentally. Of course this doesn't put responsibility on women or society at large to change that, and there really isn't a good way to change that without serious harm. But it does explain why people are so eager to set such a "zero point".

Comment by andrew sauer (andrew-sauer) on Hello, Elua. · 2023-02-23T07:55:14.793Z · LW · GW

Freedom and utopia for all humans sounds great until the technology to create tailor-made sentient nonhumans comes along. Or hell, just the David Attenborough-like desire to spectate the horrors of the nonhuman biosphere on Earth and billions of planets beyond. People's values have proven horrible enough times to make me far more afraid of Utopia than of any paperclip maximizer.

Comment by andrew sauer (andrew-sauer) on What moral systems (e.g utilitarianism) are common among LessWrong users? · 2023-02-23T04:37:47.847Z · LW · GW

I'm pretty sure most people here are utilitarians and also want to be immortal, I'm not sure why there would be a contradiction between those two things. If the claim is that most here "just" want to be immortal no matter the cost and don't really care about morality otherwise, then I disagree. (plus even that would technically be a utilitarian position, just a very egoist one)

Comment by andrew sauer (andrew-sauer) on What would an AI need to bootstrap recursively self improving robots? · 2023-02-15T09:42:40.528Z · LW · GW

I suspect if an AI has some particular goal that requires destroying humanity and manufacturing things in the aftermath, and is intelligent and capable enough to actually do it, then it will consider these things in advance, and set up whatever initial automation it needs to achieve this before destroying humanity. AI with enough planning capabilities to e.g. design a bioweapon or incite a nuclear war would probably be able to think ahead about what to do afterwards, would have its own contingencies in place, and would not need to rely on whatever tools humanity happens to leave lying around when it is gone.

Comment by andrew sauer (andrew-sauer) on Bing Chat is blatantly, aggressively misaligned · 2023-02-15T05:44:59.442Z · LW · GW

It's exactly like the google vs bing memes lol https://knowyourmeme.com/memes/google-vs-bing

Comment by andrew sauer (andrew-sauer) on wrapper-minds are the enemy · 2023-02-13T07:42:26.268Z · LW · GW

If a "wrappermind" is just something that pursues a consistent set of values in the limit of absolute power, I'm not sure how we're supposed to avoid such things arising. Suppose the AI that takes over the world does not hard-optimize over a goal, instead soft-optimizing or remaining not fully decided between a range of goals(and that humanity survives this AI's takeover). What stops someone from building a wrappermind after such an AI has taken over? Seems like if you understood the AI's value system, it would be pretty easy to construct a hard optimizer with the property that the optimum is something the AI can be convinced to find acceptable. As soon as your optimizer figures out how to do that it can go on its merry way approaching its optimum.

In order to prevent this from happening an AI must be able to detect when something is wrong. It must be able to, without fail, in potentially adversarial circumstances, recognize these kinds of Goodhart outcomes and robustly deem them unacceptable. But if your AI can do that, then any particular outcome it can be convinced to accept must not be a nightmare scenario. And therefore a "wrappermind" whose optimum was within this acceptable space would not be so bad.

In other words, if you know how to stop wrapperminds, you know how to build a good wrappermind.

Comment by andrew sauer (andrew-sauer) on wrapper-minds are the enemy · 2023-02-13T07:26:08.845Z · LW · GW

If the set of good things seems like it's of measure zero, maybe we should choose a better measure.

This seems to be the exact problem of AI alignment in the first place. We are currently unable to construct a rigorous measure (in the space of possible values) in which the set of good things (in the cases where said values take over the world) is not of vanishingly small measure.

Comment by andrew sauer (andrew-sauer) on What fact that you know is true but most people aren't ready to accept it? · 2023-02-05T18:54:55.759Z · LW · GW

For one, the documentary Dominion seems to bear this out pretty well. This is certainly an "ideal" situation where cruelty and carelessness will never rebound upon the people carrying it out.

Comment by andrew sauer (andrew-sauer) on What fact that you know is true but most people aren't ready to accept it? · 2023-02-04T04:23:03.205Z · LW · GW

I don't think he cares.

Comment by andrew sauer (andrew-sauer) on What fact that you know is true but most people aren't ready to accept it? · 2023-02-04T04:15:27.743Z · LW · GW

To be fair I imagine a lot of the responses are things most people on LW agree with anyway even though they are unpopular. e.g. "there is no heaven, and god is not real."

Comment by andrew sauer (andrew-sauer) on 2+2=π√2+n · 2023-02-04T02:27:14.679Z · LW · GW

Your rules need refining regarding how large "intermediate values" can be:

⌊π^π^π^π^π⌋ mod 10

- High school formula

- Integer result < 10 < 10^100

- Can't be solved w/ contemporary maths

- Misses the spirit of the challenge but obeys the rules
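
For a sense of scale, here is a rough back-of-the-envelope sketch (my own illustration, not part of the original comment or challenge) of why the last digit of that tower is out of reach: you can only track logarithms of the upper levels, and even the digit count of the full tower is itself an astronomically large number.

```python
import math

# Estimate the size of the tower pi^pi^pi^pi^pi using only logarithms.
log10_pi = math.log10(math.pi)        # ~0.497

t2 = math.pi ** math.pi               # pi^pi        ~36.46
t3 = math.pi ** t2                    # pi^(pi^pi)   ~1.34e18
log10_t4 = t3 * log10_pi              # log10 of pi^pi^pi^pi, ~6.6e17
# pi^pi^pi^pi therefore has ~6.6e17 digits; the full tower pi^pi^pi^pi^pi
# has roughly 10^(6.6e17) digits, so even writing it down is hopeless,
# let alone extracting its last digit.
log10_digit_count = log10_t4 + math.log10(log10_pi)

print(f"pi^pi^pi                 ~ {t3:.3e}")
print(f"digits in pi^pi^pi^pi    ~ {log10_t4:.3e}")
print(f"digits in the full tower ~ 10^{log10_digit_count:.3e}")
```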

Comment by andrew sauer (andrew-sauer) on What fact that you know is true but most people aren't ready to accept it? · 2023-02-04T02:13:05.636Z · LW · GW

You're on a throwaway account. Why not tell us what some of these "real" controversial topics are?

From what I've seen so far, and my perhaps premature assumptions given my prior experience with people who say the kinds of things you have said, I'm guessing these topics include which minority groups should be excluded from ethical consideration and dealt with in whatever manner is most convenient for the people who actually matter. Am I wrong?