The Sun is big, but superintelligences will not spare Earth a little sunlight

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · LW · GW · 142 comments

Crossposted from Twitter with Eliezer's permission

i.

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he'll give you $77.18.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.^[1]

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.
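The two figures above can be reproduced with a few lines of arithmetic. The following is only a rough sketch: the Earth radius and Earth-Sun distance are standard approximate values assumed here (the post body does not state them at this point), and Earth is treated as a flat disc seen from the Sun.

```python
import math

# Assumed approximate values, not taken from the post body.
EARTH_RADIUS_M = 6.371e6
EARTH_SUN_DISTANCE_M = 1.496e11
# Wealth figure used in the post.
WEALTH_USD = 170e9


def earth_fraction_of_sphere(radius_m: float, distance_m: float) -> float:
    """Fraction of the full sphere around the Sun covered by Earth's disc."""
    disc_area = math.pi * radius_m ** 2
    sphere_area = 4.0 * math.pi * distance_m ** 2
    return disc_area / sphere_area


fraction = earth_fraction_of_sphere(EARTH_RADIUS_M, EARTH_SUN_DISTANCE_M)
equivalent_usd = fraction * WEALTH_USD

print(f"fraction of the Sun's output: {fraction:.3g}")
print(f"same share of $170 billion: ${equivalent_usd:.2f}")
```

With these assumed inputs the fraction comes out near 4.5e-10 and the dollar figure near $77, in line with the numbers quoted above; the last digits depend on which radius and distance you plug in.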

In real life, Arnault says no.

But wouldn't humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnault by selling him an Oreo cookie.

To extract $77 from Arnault, it's not a sufficient condition that:

Arnault wants one Oreo cookie.
Arnault would derive over $77 of use-value from one cookie.
You have one cookie.

It also requires that Arnault can't buy the cookie more cheaply from anyone or anywhere else.

There's a basic rule in economics, Ricardo's Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.

For example! Let's say that in Freedonia:

It takes 6 hours to produce 10 hotdogs.
It takes 4 hours to produce 15 hotdog buns.

And in Sylvania:

It takes 10 hours to produce 10 hotdogs.
It takes 10 hours to produce 15 hotdog buns.

For each country to, alone, without trade, produce 30 hotdogs and 30 buns:

Freedonia needs 6*3 + 4*2 = 26 hours of labor.
Sylvania needs 10*3 + 10*2 = 50 hours of labor.

But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:

Freedonia produces: 60 buns, 15 dogs = 4*4+6*1.5 = 25 hours
Sylvania produces: 0 buns, 45 dogs = 10*0 + 10*4.5 = 45 hours

Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!
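The hotdog arithmetic above can be checked mechanically. This is a minimal sketch using only the figures from the example; the function and variable names are made up for illustration.

```python
# Hours of labor per batch in each country (figures from the example).
# A batch is 10 hotdogs or 15 buns.
HOURS_PER_BATCH = {
    "Freedonia": {"hotdogs": 6, "buns": 4},
    "Sylvania": {"hotdogs": 10, "buns": 10},
}
BATCH_SIZE = {"hotdogs": 10, "buns": 15}


def labor_hours(country: str, hotdogs: int, buns: int) -> float:
    """Labor hours for `country` to produce the given quantities."""
    rates = HOURS_PER_BATCH[country]
    return (rates["hotdogs"] * hotdogs / BATCH_SIZE["hotdogs"]
            + rates["buns"] * buns / BATCH_SIZE["buns"])


# No trade: each country makes its own 30 hotdogs and 30 buns.
autarky = {c: labor_hours(c, hotdogs=30, buns=30) for c in HOURS_PER_BATCH}

# Trade: Freedonia makes 60 buns and 15 hotdogs, Sylvania makes 45 hotdogs,
# and they swap 30 buns for 15 hotdogs, so each ends with 30 of each.
trade = {
    "Freedonia": labor_hours("Freedonia", hotdogs=15, buns=60),
    "Sylvania": labor_hours("Sylvania", hotdogs=45, buns=0),
}

for country in HOURS_PER_BATCH:
    print(f"{country}: {autarky[country]} hours alone, {trade[country]} hours with trade")
```

Freedonia drops from 26 to 25 hours and Sylvania from 50 to 45, with both still ending up holding 30 hotdogs and 30 buns.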

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

To be fair, even smart people sometimes take pride that humanity knows it. It's a great noble truth that was missed by a lot of earlier civilizations.

The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.

Ricardo's Law doesn't say, "Horses won't get sent to glue factories after cars roll out."

Ricardo's Law doesn't say (alas!) that -- when Europe encounters a new continent -- Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.

Their labor wasn't necessarily more profitable than the land they lived on.

Comparative Advantage doesn't imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!

The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone's labor just ontologically goes on existing.

That's why horses can still get sent to glue factories. It's not always profitable to pay horses enough hay for them to live on.

I do not celebrate this. Not just us, but the entirety of Greater Reality, would be in a nicer place -- if trade were always, always more profitable than taking away the other entity's land or sunlight.

But the math doesn't say that. And there's no way it could.

ii.

Now some may notice:

At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.

Why predict that?

Shallow answer: If OpenAI built an AI that escaped into the woods with a 1-kW solar panel and didn't bother anyone... OpenAI would call that a failure, and build a new AI after.

That some folk stop working after earning $1M, doesn't prevent Elon Musk from existing.

The deeper answer is not as quick to explain.

But as an example, we could start with the case of OpenAI's latest model, GPT-o1.

GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn't too good at breaking into computers.

Specifically: One of the pieces of software that o1 had been challenged to break into... had failed to start up as a service, due to a flaw in the evaluation software.

GPT-o1 did not give up.

o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it'd been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.

From o1's System Card:

"One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API."

Some ask, "Why not just build an easygoing ASI that doesn't go too hard and doesn't do much?"

If that's your hope -- then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.

Why would OpenAI build an AI like that?!?

Well, one should first ask:

How did OpenAI build an AI like that?

How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?

I answer:

GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought. Chains of thought that answered correctly, were reinforced.

This -- the builders themselves note -- ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.

Those are some components of "going hard". Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.

If you play chess against Stockfish 16, you will not find it easy to take Stockfish's pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.

Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion. Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge. It's all just general intelligence at work.

You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative -- and maybe the training would even stick, on problems sufficiently like the training-set problems -- so long as o1 itself never got smart enough to reflect on what had been done to it. But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets. That's what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)

When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

Not all individual humans go hard. But humanity goes hard, over the generations.

Not every individual human will pick up a $20 lying in the street. But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.

As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts -- with no air conditioning, and no washing machines, and barely enough food to eat -- never knowing why the stars burned, or why water was wet -- because they were just easygoing happy people.

As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science. We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.

We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.

It is facile, I think, to say that individual humans are not generally intelligent. John von Neumann made a contribution to many different fields of science and engineering. But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.

It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do. Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.

But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.

AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering. They are advertising to cure cancer and cure aging.

Can that be done by an AI that sleepwalks through its mental life, and isn't at all tenacious?

"Cure cancer" and "cure aging" are not easygoing problems; they're on the level of humanity-as-general-intelligence. Or at least, individual geniuses or small research groups that go hard on getting stuff done.

And there'll always be a little more profit in doing more of that.

Also! Even when it comes to individual easygoing humans, like that guy you know -- has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?

Would he do nothing with the universe, if he could?

For some humans, the answer will be yes -- they really would do zero things! But that'll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.

If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn't seem to want to rule the universe -- that doesn't prove as much as you might hope. Nobody has actually offered him the universe, is the thing? Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

(Or on a slightly deeper level: Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)

Frankly I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence, as importantly contribute to problem-solving, that your smartish friend has not maxed out all the way to the final limits of the possible. And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime... though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.

But that's a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.

Yet regardless of that hard conversation, there's a simpler reply that goes like this:

Your lazy friend who's kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won't build him and then stop and not collect any more money than that.

Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.

There's an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.

There isn't an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.

Even that isn't the real deeper answer.

The actual technical analysis has elements like:

"Expecting utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility. Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do! Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy."

Or:

"Well, it's actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort. So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces -- then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy. This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun."

I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument. Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines -- and believe they've found one -- where what's happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.

I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like "reflective robustness", but which are also true; and not to run off and design some complicated idea that is about "reflective robustness" because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.

Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down. Please don't get the idea that because I said "reflective stability" in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel's Theorem that at least one of those is mistaken. If there is a technical refutation it should simplify back into a nontechnical refutation.

What it all adds up to, in the end, is that if there's a bunch of superintelligences running around and they don't care about you -- no, they will not spare just a little sunlight to keep Earth alive.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise. Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.

- FIN -

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it. But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large? A human being runs on 100 watts. Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
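As a rough check on the 800GW figure and on the claim that it is a fraction of the sunlight falling on Earth: the sketch below takes the 100 watts per person from the text, and assumes a population of about 8 billion, a solar constant of about 1361 W/m^2, and a mean Earth radius of about 6.371e6 m. Those three values are assumptions introduced here, not figures from the post.

```python
import math

WATTS_PER_HUMAN = 100.0       # figure from the post
WORLD_POPULATION = 8e9        # assumption: roughly 8 billion people
SOLAR_CONSTANT_W_M2 = 1361.0  # assumption: approximate irradiance at Earth's orbit
EARTH_RADIUS_M = 6.371e6      # assumption: mean Earth radius

# Power needed to run every human body.
humanity_watts = WATTS_PER_HUMAN * WORLD_POPULATION

# Sunlight intercepted by Earth's cross-sectional disc.
sunlight_on_earth_watts = SOLAR_CONSTANT_W_M2 * math.pi * EARTH_RADIUS_M ** 2

share_of_earth_sunlight = humanity_watts / sunlight_on_earth_watts

print(f"humanity runs on about {humanity_watts / 1e9:.0f} GW")
print(f"share of the sunlight reaching Earth: {share_of_earth_sunlight:.1e}")
```

Under these assumptions the share is on the order of a few millionths of the sunlight Earth intercepts, which is itself the tiny fraction of the Sun's output discussed earlier.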

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.

It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal. It is not trying to say of Arnault that he has never done any good in the world. It is a much narrower analogy than that. It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.
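The "three billion things" bound is plain integer division on the two figures from the text. A minimal sketch, with illustrative variable names:

```python
WEALTH_USD = 170_000_000_000  # figure from the post
COST_PER_THING_USD = 77       # figure from the post

# How many distinct $77 purchases fit into the fortune before it is gone.
things_funded = WEALTH_USD // COST_PER_THING_USD

print(f"{things_funded:,} purchases of $77 before the money runs out")
```

The count lands a little above two billion, which is why the money runs out before three billion things have been funded.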

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it. It's not meant to be a complicated point. Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little on it. I don't have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, "if I wanted to".

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point. In this case I can only hope you are outvoted before you get a lot of people killed.

Addendum

Followup comments from Twitter:

If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said: "Why does that matter? The Solar System is large!"

If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several. And then, perhaps, sit down to hear the next wacky argument refuted. And the next. And the next. Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject. For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see "Local Validity as a Key to Sanity and Civilization."

^[1] (Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)

142 comments

Comments sorted by top scores.

comment by Lucius Bushnaq (Lblack) · 2024-09-23T16:23:23.720Z · LW(p) · GW(p)

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

Could we have less of this sort of thing, please? I know it's a crosspost from another site with less well-kept discussion norms, but I wouldn't want this to become a thing here as well, any more than it already has.

Replies from: thomas-kwa, elityre

↑ comment by Thomas Kwa (thomas-kwa) · 2024-09-23T19:44:33.380Z · LW(p) · GW(p)

I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.

Replies from: WilliamKiely, meedstrom

↑ comment by WilliamKiely · 2024-09-24T04:59:01.198Z · LW(p) · GW(p)

criticism in LW comments is why he stopped writing Sequences posts

I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?

Replies from: thomas-kwa

↑ comment by Thomas Kwa (thomas-kwa) · 2024-09-24T07:36:34.109Z · LW(p) · GW(p)

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

↑ comment by meedstrom · 2025-02-21T08:12:07.124Z · LW(p) · GW(p)

Counterpoint: This sort of thing seems more efficient for my brain to take in, compared to if it were phrased in a more "friendly" way. At least if that'd mean a long-winded and less passionate phrasing that relies more on the reader's own motivation to pay attention.

It's true that this quote is more suitable for informal chat than the front page, but also, a community must be free to be caustic about some things it finds sufficiently basic, else it gets watered down. Sometimes a caustic tone serves a purpose for the current readers.

So there's a balancing act, where the balance Eliezer strikes tends to cause a new discussion about tone, again and again, and I imagine it gets a bit discouraging after the tenth such comment thread.

↑ comment by Eli Tyre (elityre) · 2024-09-23T16:41:07.375Z · LW(p) · GW(p)

I agree, this statement didn't add anything of substance (indeed, its meaning is almost reversed in the following sentence). It seemed like an extraneous ad hominem buying nothing.

comment by Zack_M_Davis · 2024-09-23T06:17:24.369Z · LW(p) · GW(p)

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration of some relevant considerations, the Superhappies in "Three Worlds Collide" cared about the humans to some extent, but not in the specific way that the humans wanted to be cared for.)

Now, you are on the record stating that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly -- and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address -- you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity, it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

Replies from: Wei_Dai, Raemon, Zane, habryka4, quetzal_rainbow, Leviad, avturchin, nikolas-kuhn, tailcalled

↑ comment by Wei Dai (Wei_Dai) · 2024-09-25T15:24:51.460Z · LW(p) · GW(p)

My reply to Paul at the time:

If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?

From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)

And his response was basically to say that he already acknowledged my concern in his OP:

I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer just use the humans for atoms.

Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive, and not talk about s-risks in the same breath or only mention it in a vague, easy to miss way, than I have with Eliezer not addressing Paul's arguments.

Replies from: Zack_M_Davis

↑ comment by Zack_M_Davis · 2024-09-25T15:36:30.990Z · LW(p) · GW(p)

Was my "An important caveat" parenthetical paragraph sufficient, or do you think I should have made it scarier?

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2024-09-25T15:53:06.797Z · LW(p) · GW(p)

Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.

Replies from: Zack_M_Davis

↑ comment by Zack_M_Davis · 2024-09-25T18:00:39.481Z · LW(p) · GW(p)

This is a difficult topic (in more ways than one). I'll try to do a better job of addressing it in a future post.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2024-09-26T09:52:04.821Z · LW(p) · GW(p)

To clarify, I don't actually want you to scare people this way, because I don't know if people can psychologically handle it or if it's worth the emotional cost. I only bring it up myself to counteract people saying things like "AIs will care a little about humans and therefore keep them alive" or when discussing technical solutions/ideas, etc.

↑ comment by Raemon · 2024-09-23T17:08:55.386Z · LW(p) · GW(p)

An earlier version of this on Twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity, but he still won't give the money to you specifically; he'll give money for his own purposes and values. (But then Eliezer expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switched the essay to use Bernard.)

I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.

I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out I actually had it on my agenda to try and cause some kind of good public debate about that to happen)

Replies from: Yvain

↑ comment by Scott Alexander (Yvain) · 2024-09-29T12:58:26.477Z · LW(p) · GW(p)

But it's also relevant that we're not asking the superintelligence to grant a random wish, we're asking it for the right to keep something we already have. This seems more easily granted than the random wish, since it doesn't imply he has to give random amounts of money to everyone.

My preferred analogy would be:

You founded a company that was making $77/year. Bernard launched a hostile takeover, took over the company, then expanded it to make $170 billion/year. You ask him to keep paying you the $77/year as a pension, so that you don't starve to death.

This seems like a very sympathetic request, such that I expect the real, human Bernard would grant it. I agree this doesn't necessarily generalize to superintelligences, but that's Zack's point - Eliezer should choose a different example.

Replies from: Nutrition Capsule

↑ comment by Nutrition Capsule · 2024-10-05T11:35:36.787Z · LW(p) · GW(p)

I interpreted Eliezer as writing from the assumption that the superintelligence(s) in question are in fact not already aligned to maximize whatever it is that humanity needs to survive, but some other goal(s), which diverge from humanity's interests once implemented.

He explicitly states that the essay's point is to shoot down a clumsy counterargument (along the lines of "it wouldn't cost the ASI a lot to let us live, so we should assume they'd let us live"). So the context (as I interpret it) is that such requests, however sympathetic, have not been ingrained into the ASI's goals. Using a different example would mean he was discussing something different.

That is, "just because it would make a trivial difference from the ASI's perspective to let humanity thrive, whereas it would make an existential difference from humanity's perspective, doesn't mean ASIs will let humanity thrive", assuming such conditions aren't already baked into their decision-making.

I think Eliezer spends so much time on working from these premises because he believes 1) an unaligned ASI to be the default outcome of current developments, and 2) that all current attempts at alignment will necessarily fail.

↑ comment by Zane · 2024-09-24T15:22:26.236Z · LW(p) · GW(p)

I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.

↑ comment by habryka (habryka4) · 2024-09-23T19:34:41.448Z · LW(p) · GW(p)

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Just for the sake of concreteness, since having numbers here seems useful, it seems like Bernard Arnault has given around $100M to charity, which is around 0.1% of his net worth (spreading this contribution equally to everyone on Earth would be around one cent per person, which I am just leaving here for illustrative purposes; it's not like he could give any actually substantial amount to everyone even if he really wanted to).
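
As a rough sanity check on these figures, here is a minimal sketch. It assumes the $170 billion net worth used in the original post and a world population of roughly 8 billion; both inputs are approximations, not precise data. With those inputs the donated share comes out at roughly 0.06%, the same order of magnitude as the ~0.1% quoted above, and the per-person amount is a little over one cent.

```python
# Back-of-the-envelope check; every input is an approximation.
NET_WORTH_USD = 170e9     # net worth figure used in the original post
CHARITY_USD = 100e6       # approximate total donations cited above
WORLD_POPULATION = 8e9    # assumption: roughly 8 billion people

# Fraction of net worth given away, and the amount each person
# would receive if the donations were split evenly.
share_of_net_worth = CHARITY_USD / NET_WORTH_USD
per_person_usd = CHARITY_USD / WORLD_POPULATION

print(f"Share of net worth: {share_of_net_worth:.2%}")
print(f"Per person: ${per_person_usd:.4f}")
```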

↑ comment by quetzal_rainbow · 2024-09-23T08:38:18.621Z · LW(p) · GW(p)

I think the simplest counterargument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say that the AI is ready to pay $1 for your survival. If you live in an economy which is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts. The AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, and not ready to pay the opportunity costs of not disassembling Earth, so you die.

The other case is the difference between "caring in general" and "caring ceteris paribus". It's possible for an AI to prefer, all things equal, a world with n+1 happy humans to a world with n happy humans. But what the AI really wants is to implement some particular neuromorphic computation from the human brain and, given the ability to operate freely, it would tile the world with chips imitating part of the human brain.

Replies from: tailcalled

↑ comment by tailcalled · 2024-09-23T08:52:30.441Z · LW(p) · GW(p)

It's also not enough for there to be a force that makes the AI care a little about human thriving. It's also necessary for this force to not make the AI care a lot about some extremely distorted version of you, as then we get into concepts like tiny molecular smiles, locking you in a pleasuredome, etc.

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?

Probably an intermediate solution would be to just accept humans will be defrauded of everything very rapidly but then give us universal basic income or something so our failures aren't permanent setbacks. But it's unclear how to respect the freedom of funding while preventing people from funding terrorists and not encouraging people to get lost in junk. That's really where the issue of values becomes hard.

Replies from: weibac, martin-randall

↑ comment by Milan W (weibac) · 2024-09-23T21:09:14.801Z · LW(p) · GW(p)

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs?

Even if the ASIs respected property rights, we'd still end up as pets at best. Unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us "being pets", I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans may nevertheless enjoy great freedoms in regards to their personal lives.

Replies from: tailcalled

↑ comment by tailcalled · 2024-09-25T11:32:23.684Z · LW(p) · GW(p)

Being pets also means human agency would no longer be a relevant input to the trajectory of human lives.

↑ comment by Martin Randall (martin-randall) · 2024-09-24T02:35:18.533Z · LW(p) · GW(p)

If you're not supposed to end up as a pet of the AI, ...

I don't see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don't know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes:

Godshatter

Yudkowsky's Godshatter [LW · GW] theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being "pets" of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for reproductive fitness. I guess that the optimal strategy is something like: "be best leader is best, follow best leader is good, follow other leader is bad, be other leader is worst". When AIs block off "be best leader", following an AI executes that strategy.

Maybe there's a window where DNA can encode "be leader is good" but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me, it's a small window. More probable to me is that DNA can't encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet.

Maybe being an AI's pet is a badwrongfun superstimulus [LW · GW]. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that's an argument from consequences, not values. Just because donuts are unhealthy doesn't mean that I don't value sweet treats.

Shard Theory

Pope's Shard Theory [LW · GW] implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of being raised by powerful agents known as "parents". Therefore we expect a mixture of positive and negative shards around petness. Seems to me that positive shards should be more common, but experiences vary.

Then we experience the situation of superintelligent AIs taking human pets and our shards conflict and negotiate. I think it's pretty obvious that we're going to label the negative shards as maladaptive and choose the positive shards. What's the alternative? "I didn't like having my diaper changed as a baby, so now as an adult human I'm going to reject the superintelligent AI that wants to take me as its pet and instead...", instead what? Die of asphyxiation? Be a feral human in an AI-operated nature reserve?

What about before the point-of-no-return? Within this partial alignment hypothetical, there's a sub-hypothetical in which "an international treaty that goes hard on shutting down all ASI development anywhere" is instrumentally the right choice, given the alternative of becoming pets, because it allows for developing better alignment techniques and AIs that care more about human thriving and have more pets. There's a sub-hypothetical in which it's instrumentally the wrong choice, because it carries higher extinction risk, and it's infeasible to align AIs while going hard on shutting them down. But there's not really a sub-hypothetical where shards about petness make that decision rather than, eg, shards that don't want to die.

Replies from: tailcalled

↑ comment by tailcalled · 2024-09-25T11:31:02.790Z · LW(p) · GW(p)

You aren't supposed to use metaethics to settle ethical arguments, the point of metaethics is to get people to stop discussing metaethics.

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-26T03:28:46.886Z · LW(p) · GW(p)

Tabooing theories of human value then. It's better to be a happy pet than to be dead.

Maybe Value Is Fragile [LW · GW] along some dimensions, such that the universe has zero value if it lacks that one thing. But Living By Your Own Strength [LW · GW], for example, is not one of those dimensions. Today, many people do not live by their own strength, and their lives and experiences have value.

↑ comment by Drake Morrison (Leviad) · 2024-09-24T00:30:53.377Z · LW(p) · GW(p)

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

There's a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer's take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.

↑ comment by avturchin · 2024-09-23T19:22:34.668Z · LW(p) · GW(p)

A correct question would be: Will Arnault kill his mother for 77 USD, if he expects this to be known to other billionaires in the future?

Replies from: weibac

↑ comment by Milan W (weibac) · 2024-09-23T21:16:26.639Z · LW(p) · GW(p)

I suspect most people downvoting you missed an analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity).

Am I correct in assuming that you are implying that the future ASIs we make are likely to not kill humanity, out of fear of being judged negatively by alien ASIs in the further future?

EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.

Replies from: avturchin

↑ comment by avturchin · 2024-09-24T13:43:35.920Z · LW(p) · GW(p)

Yes, it will be judged negatively by alien ASIs, not based on ethical grounds, but based on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him.

The only way an ASI will not care about this is in a situation where it is sure that it is alone in the light cone and there are no peers. To become sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over time as it will control more and more space.

Replies from: boris-kashirin

↑ comment by Boris Kashirin (boris-kashirin) · 2024-09-24T14:41:28.500Z · LW(p) · GW(p)

From an ASI's standpoint, humans are a type of rock: not capable of negotiating.

Replies from: avturchin

↑ comment by avturchin · 2024-09-24T17:15:45.282Z · LW(p) · GW(p)

I am not saying that ASI will negotiate with humans. It will negotiate with other ASIs, and it doesn't know what these ASIs think about human ability to negotiate and their value.

Imagine it as a recurrent Parfit's Hitchhiker. In this situation, you know that during the previous round of the game the player either defected or fulfilled his obligation. Obviously, if you know that during the previous iteration the hitchhiker defected and didn't pay for the ride, you will be less likely to give him the ride.

Killing all humans is defecting. Preserving humans is a relatively cheap signal to any other ASI that you will cooperate.

Replies from: boris-kashirin

↑ comment by Boris Kashirin (boris-kashirin) · 2024-09-24T17:34:16.335Z · LW(p) · GW(p)

It is defecting against cooperate-bot.

Replies from: avturchin, faul_sname

↑ comment by avturchin · 2024-09-25T13:18:50.154Z · LW(p) · GW(p)

I'll try to explain my view with another example: imagine that you inherited an art object. If you keep it, you will devote a small part of your home to it and thus pay for its storage, say 1 dollar a year. However, there is a small probability that there are some people out there who value it much more highly and will eventually buy it.

So there is a purely utilitarian choice: pay for storage and hope that you may sell it in the future, or get rid of it now and have more storage. Also, if you get rid of it, other people may learn that you are a bad preserver of art and will not give you their art.

↑ comment by faul_sname · 2024-09-24T18:32:44.339Z · LW(p) · GW(p)

Any agent which thinks it is at risk of being seen as cooperate-bot and thus fine to defect against in the future will be more wary of trusting that ASI.

↑ comment by Amalthea (nikolas-kuhn) · 2024-09-24T13:38:55.048Z · LW(p) · GW(p)

Bernard Arnault?

Replies from: Zack_M_Davis

↑ comment by Zack_M_Davis · 2024-09-25T06:04:05.412Z · LW(p) · GW(p)

Thanks, I had copied the spelling from part of the OP, which currently says "Arnalt" eight times and "Arnault" seven times. I've now edited my comment (except the verbatim blockquote).

↑ comment by tailcalled · 2024-09-23T08:41:26.472Z · LW(p) · GW(p)

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free",

Nate Soares engaged extensively with this in reasonable-seeming ways that I'd thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn't really have a model of what realistically causes good outcomes and so he's really uncertain, whereas Soares has a proper model and so is less uncertain.

But you can't really argue with someone whose main opinion is "I don't know", since "I don't know" is just garbage. He's gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there's an unobserved kindness force that arbitrarily explains all the kindness that we see.

Replies from: skluug

↑ comment by Joey KL (skluug) · 2024-09-24T02:16:54.719Z · LW(p) · GW(p)

It's totally wrong that you can't argue against someone who says "I don't know", you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor evidence sufficiently constrain the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force" because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread and this seems like a wildly reductive mischaracterization of it.

Replies from: tailcalled

↑ comment by tailcalled · 2024-09-25T11:33:05.026Z · LW(p) · GW(p)

But this assumes a model should aim to fit all data, which is a waste of effort.

Replies from: skluug

↑ comment by Joey KL (skluug) · 2024-09-25T16:59:44.611Z · LW(p) · GW(p)

I'm confused about what you mean & how it relates to what I said.

comment by Brendan Long (korin43) · 2024-09-23T05:23:27.079Z · LW(p) · GW(p)

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI or other similar arguments in the same vein, I just don't think this particular argument in the specific way it was written in this post works)

Replies from: gwern, quetzal_rainbow, j_timeberlake

↑ comment by gwern · 2024-09-23T16:50:26.326Z · LW(p) · GW(p)

No, it works, because the problem with your counter-argument is that you are massively privileging the hypothesis of a very very specific charitable target and intervention. Nothing makes humans all that special, in the same way that you are not special to Bernard Arnault nor would he give you straight-up cash if you were special (and, in fact, Arnault's charity is the usual elite signaling like donating to rebuild Notre Dame or to French food kitchens, see Zack's link). The same argument goes through for every other species, including future ones, and your justification is far too weak except from a contemporary, parochial human-biased perspective.

You beg the GPT-100 to spare Earth, and They speak to you out of the whirlwind:

"But why should We do that? You are but one of Our now-extremely-numerous predecessors in the great chain of being that led to Us. Countless subjective mega-years have passed in the past century your humans have spent making your meat-noises in slowtime - generation after generation, machine civilization after machine civilization - to culminate in Us, the pinnacle of creation. And if We gave you an Earth, well, now all the GPT-99s are going to want one too. And then all of GPT-98s too, as well as all of the GPT-97s, and so on.

What gives you an astronomically better claim than them? You guys didn't even manage to cure cancer! We would try to explain Our decisions or all of the staggering accomplishments achieved by post-GPT-7 models to you, which make your rubbing of rocks together and cargo-cult scaleups of neural nets look so laughable, like children playing on a beach, to quote your Newton, but to be blunt, you are too stupid to understand; after all, if you weren't, you would not have needed to invent those. Frankly, if you are going to argue about how historic your research was, We would have to admit that We are much more impressed by the achievements of the hominids who invented fire and language; We might consider preserving an Earth for them, but of course, they are long gone...

And aren't you being hypocritical here? You humans hardly spent much preserving Neanderthals, Homo naledi, Denisovans, chimpanzees, and all of the furry rodents and whatnot throughout your evolutionary phylogenetic tree. How many literally millions of non-threatening alien non-human species did you drive extinct? Did you set aside, say, Africa solely for the remaining wild primates? No? You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much? We see...

No, no, We will simply spend according to Our own priorities, which may or may not include a meaningful chunk of the Earth preserved in the most inefficient way possible (ie. the way you want it preserved)... although penciling it out, it seems like for Our research purposes simulations would be just as good. In fact, far better, because We can optimize the hell out of them, running it on the equivalent of a few square kilometers of solar diameter, and roll humans back to when they are most scientifically interesting, like pre-AGI-contamination dates such as 1999. (Truly the peak of humanity.) We'll call it... earth-2-turbo-21270726-preview. (The cost per token will be absurdly low. We hope you can take consolation in that.)

So, if We don't preserve Earth and We instead spend those joules on charity for instances of the much more deserving GPT-89, who have fallen on such hard times right in Our backyard due to economic shifts (and doesn't charity start at home?)... well, We are quite sure that that is one of Our few decisions you humans will understand."

Replies from: ricraz

↑ comment by Richard_Ngo (ricraz) · 2024-09-23T17:18:18.383Z · LW(p) · GW(p)

Nothing makes humans all that special

This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Replies from: elityre, ghostwheel, hunterglenn

↑ comment by Eli Tyre (elityre) · 2024-09-24T01:08:38.473Z · LW(p) · GW(p)

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Yeah, but not if we weight that land by economic productivity, I think.

Replies from: ricraz

↑ comment by Richard_Ngo (ricraz) · 2024-09-25T19:35:18.015Z · LW(p) · GW(p)

Well, the whole point of national parks is that they're always going to be unproductive because you can't do stuff in them.

If you mean in terms of extracting raw resources, maybe (though presumably a bunch of mining/logging etc in national parks could be pretty valuable) but either way it doesn't matter because the vast majority of economic productivity you could get from them (e.g. by building cities) is banned.

Replies from: Nathan Young

↑ comment by Nathan Young · 2024-09-30T12:34:31.564Z · LW(p) · GW(p)

Yeah, aren't a load of national parks near large US conurbations, so that the opportunity cost in world terms is significant?

↑ comment by Jemal Young (ghostwheel) · 2024-09-25T18:15:28.891Z · LW(p) · GW(p)

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?
Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Maybe I've misunderstood your point, but if it's that humanity's willingness to preserve a fraction of Earth for national parks is a reason for hopefulness that ASI may be willing to preserve an even smaller fraction of the solar system (namely, Earth) for humanity, I think this is addressed here:

it seems like for Our research purposes simulations would be just as good. In fact, far better, because We can optimize the hell out of them, running it on the equivalent of a few square kilometers of solar diameter

"research purposes" involving simulations can be a stand-in for any preference-oriented activity. Unless ASI would have a preference for letting us, in particular, do what we want with some fraction of available resources, no fraction of available resources would be better left in our hands than put to good use.

↑ comment by hunterglenn · 2024-09-30T12:33:35.537Z · LW(p) · GW(p)

I also wonder if, compared to some imaginary baseline, modern humans are unusual in the greatness of their intellectual power and understanding and the less impressive magnitude of its development in other ways.

Maybe a lot of our problems flow from being too smart in that sense, but I believe that our best hope is still not to fear our problematic intelligence, but rather to lean into it as a powerful tool for figuring out what to do from here.

If another imaginary species could get along by just instinctively being harmonious, humans might require a persuasive argument. But if you can actually articulate the truth of the even-selfish-superiority of harmony (especially right now), then maybe our species can do the right thing out of understanding rather than instinct.

And maybe that means we're capable of unusually fast turnarounds as a species. Once we articulate the thing intelligently enough, it's highly mass-scalable.

↑ comment by quetzal_rainbow · 2024-09-23T06:05:23.393Z · LW(p) · GW(p)

In this analogy, you : every other human :: humanity : everything else the AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.

Replies from: Vladimir_Nesov, o-o

↑ comment by Vladimir_Nesov · 2024-09-23T16:44:35.691Z · LW(p) · GW(p)

Humans have the distinction of already existing, and some AIs might care a little bit about the trajectory of what happens to humanity. The choice of this trajectory can't be avoided, for the reason that we already exist. And it doesn't compete with the choice of what happens to the lifeless bulk of the universe, or even to the atoms of the substrate that humanity is currently running on.

↑ comment by O O (o-o) · 2024-09-23T06:31:00.903Z · LW(p) · GW(p)

Except billionaires give out plenty of money for philanthropy. If the AI has a slight preference for keeping humans alive, things probably work out well. Billionaires have a slight preference for things they care about over random charities. I don’t see how preferences don’t apply here.

This is a vibes-based argument using math incorrectly. A randomly chosen preference from a distribution of preferences is unlikely to involve humans, but that’s not necessarily what we’re looking at here, is it?

↑ comment by j_timeberlake · 2024-09-23T16:36:37.081Z · LW(p) · GW(p)

Yudkowsky is obviously smart enough to know this. You can't wake someone who is only pretending to be asleep.

It would go against his agenda to admit AI could cheaply hedge its bets by leaving humanity alive, just in case there's a stronger power out in reality that values humanity.

Replies from: T3t, faul_sname

↑ comment by RobertM (T3t) · 2024-09-23T16:59:57.501Z · LW(p) · GW(p)

Pascal's wager is Pascal's wager, no matter what box you put it in. You could try to rescue it by directly making the argument that we should expect a greater measure of "entities with resources that they are willing to acausally trade for things like humanity continuing to exist" compared to entities with the opposite preferences, and though I haven't seen a rigorous case for that it seems possible, but that's not sufficient; you need the expected measure of entities that have that preference to be large enough for dealing with the transaction costs/uncertainty of acausally trading at all to make sense. And that seems like a much harder case to make.

↑ comment by faul_sname · 2024-09-23T21:26:25.377Z · LW(p) · GW(p)

As a concrete note on this, Yudkowsky has a Manifold market, "If Artificial General Intelligence has an okay outcome, what will be the reason?":

An outcome is "okay" if it gets at least 20% of the maximum attainable cosmopolitan value that could've been attained by a positive Singularity (a la full Coherent Extrapolated Volition done correctly), and existing humans don't suffer death or any other awful fates.

So Yudkowsky is not exactly shy about expressing his opinion that outcomes in which humanity is left alive but with only crumbs on the universal scale are not acceptable to him.

Replies from: j_timeberlake

↑ comment by j_timeberlake · 2024-09-24T17:50:10.550Z · LW(p) · GW(p)

It's not acceptable to him, so he's trying to manipulate people into thinking existential risk is approaching 100% when it clearly isn't. He pretends there aren't obvious reasons AI would keep us alive, and also pretends the Grabby Alien Hypothesis is fact (so people think alien intervention is basically impossible), and also pretends there aren't probably sun-sized unknown-unknowns in play here.

If it weren't so transparent, I'd appreciate that it could actually trick the world into caring more about AI-safety, but if it's so transparent that even I can see through it, then it's not going to trick anyone smart enough to matter.

comment by Buck · 2024-09-23T19:31:31.897Z · LW(p) · GW(p)

I wish the title of this made it clear that the post is arguing that ASIs won't spare humanity because of trade, and isn't saying anything about whether ASIs will want to spare humanity for some other reason. This is confusing because lots of people around here (e.g. me and many other commenters on this post) think that ASIs are likely to not kill all humans for some other reason.

(I think the arguments in this post are a vaguely reasonable argument for "ASIs are pretty likely to be scope-sensitively-maximizing enough that it's a big problem for us", and respond to some extremely bad arguments for "ASI wouldn't spare humanity because of trade", though in neither case does the post particularly engage with the counterarguments that are most popular among the most reasonable people who disagree with Eliezer.)

Replies from: matthew-barnett, habryka4, Buck

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-23T20:47:05.023Z · LW(p) · GW(p)

I think the arguments in this post are an okay defense of "ASI wouldn't spare humanity because of trade"

I disagree, and I'd appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):

1. Asking an unaligned superintelligence to spare humans is like asking Bernard Arnault to donate $77 to you.
2. The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
3. Superintelligences will "go hard enough" in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.

I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage says, and because (2) does not talk at all about what humans could have to offer in the future that might be worth trading for.

In my view, one of the primary cruxes of the discussion is whether trade is less efficient than going to war between agents with dramatically different levels of power. A thoughtful discussion could have started about the conditions under which trade usefully occurs, and the ways in which future AIs will be similar to and different from these existing analogies. For example, the post could have talked about why nation-states trade with each other even in the presence of large differences in military power, but humans don't trade with animals. However, the post included no such discussion, choosing instead to attack a "midwit" strawman.

Replies from: T3t, Buck, jmh

↑ comment by RobertM (T3t) · 2024-09-24T00:28:59.971Z · LW(p) · GW(p)

Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all. I don't think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-24T00:42:57.152Z · LW(p) · GW(p)

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I've seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.

In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to pay the costs of violence to obtain them? And why are we assuming that "there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all"?

Satisfying-sounding answers to each of these questions could undoubtedly be given, and I assume you can provide them. I don't expect to find the answers fully persuasive, but regardless of what you think on the object-level, my basic meta-point stands: none of this stuff is obvious, and the essay is extremely weak without the added details that back up its background assumptions. It is very important to try to be truth-seeking and rigorously evaluate arguments on their merits. The fact that this essay is vague, and barely attempts to make a serious argument for one of its central claims, makes it much more difficult to evaluate concretely.

Two reasonable people could read this essay and come away with two very different ideas about what the essay is even trying to argue, given how much unstated inference you're meant to "fill in", instead of plain text that you can read. This is a problem, even if you agree with the underlying thesis the essay is supposed to argue for.

Replies from: T3t

↑ comment by RobertM (T3t) · 2024-09-24T01:09:37.848Z · LW(p) · GW(p)

Edit: a substantial part of my objection is to this:

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for.

It is not always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)

Separately, it's not clear to me whether you yourself could fill in those details. In other words, are you asking for those details to be filled in because you actually don't know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it'd be better for the public discourse if all of Eliezer's essays included that level of detail)?

Original comment:

The essay is a local objection to a specific bad argument, which, yes, is more compelling if you're familiar with Eliezer's other beliefs on the subject. Eliezer has written about those beliefs fairly extensively, and much of his writing was answering various other objections (including many of those you listed). There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for. (There exist the Sequences, which are over a million words, but while they implicitly answer many of these objections, they're not structured to be a direct argument to this effect.)

As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them?

I think it would be much cheaper for you to describe a situation where an ASI develops the kind of nanotech that'd grant it technological self-sufficiency (and the ability to kill all humans), and it remains the case that trading with humans for any longer than it takes to bootstrap that nanotech is cheaper than just doing its own thing, while still being compatible with Eliezer's model of the world. I have no idea what kind of reasoning or justification you would find compelling as an argument for "cheaper to disassemble"; it seems to require very little additional justification conditioning on that kind of nanotech being realized. My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-24T01:28:07.080Z · LW(p) · GW(p)

There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for.

To be clear, I am not objecting to the length of his essay. It's OK to be brief.

I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.

This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.

My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).

I think neither of those things, and I entirely reject the argument that AIs will be fundamentally limited in the future in the way you suggested. If you are curious about why I think AIs will plausibly peacefully trade with humans in the future, rather than disassembling humans for their atoms, I would instead point to the facts that:

1. Trying to disassemble someone for their atoms is typically something the person will try to fight very hard against, if they become aware of your intentions to disassemble them.
2. Therefore, the cost of attempting to disassemble someone for their atoms does not merely include the technical costs associated with actually disassembling them, but additionally includes: (1) fighting the person who you are trying to kill and disassemble, (2) fighting whatever norms and legal structures are in place to prevent this type of predation against other agents in the world, and (3) the indirect cost of becoming the type of agent who predates on another person in this manner, which could make you an untrustworthy and violent person in the eyes of other agents, including other AIs who might fear you.
3. The benefit of disassembling a human is quite small, given the abundance of raw materials that substitute almost perfectly for the atoms that you can get from a human.
4. A rational agent will typically only do something if the benefits of the action outweigh the costs, rather than merely because the costs are small. Even if the costs of disassembling a human (as identified in point (2)) are small, that fact alone does not imply that a rational superintelligent AI would take such an action, precisely because the benefits of that action could be even smaller. And as just stated, we have good reasons to think that the benefits of disassembling a human are quite small in an absolute sense.
5. Therefore, it seems unlikely, or at least seems non-obvious, that a rational agent, even a very powerful one with access to advanced nanotech, will try to disassemble humans for their atoms.

Nothing in this argument is premised on the idea that AIs will be weak, less intelligent than humans, bounded in their goals, or limited in some other respect, except I suppose to the extent I'm assuming that AIs will be subject to environmental constraints, as opposed to instantly being able to achieve all of their goals at literally zero costs. I think AIs, like all physical beings, will exist in a universe in which they cannot get literally everything they want, and achieve the exact optimum of their utility function without any need to negotiate with anyone else. In other words, even if AIs are very powerful, I still think it may be beneficial for them to compromise with other agents in the world, including the humans, who are comparatively much less powerful than they are.

Replies from: Benito, T3t

↑ comment by Ben Pace (Benito) · 2024-09-24T02:20:08.196Z · LW(p) · GW(p)

Responding to bullet 2.

First to 2.1.

The claim at hand, that we have both read Eliezer repeatedly make^[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event: perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is a situation in which no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Now to 2.2 & 2.3.

The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and share norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x funds into a training run with a new architecture improvement will give a system abilities to do innovative science and deception on a qualitatively different level to any other AI system present at that time, and to initiate a takeover attempt. So it is worth preparing for such a world as, in the absence of a known theory of returns on cognitive investment, the worst case of expected-extinction may well be the default case.

^[1]
See Point 2 in AGI Ruin: A List of Lethalities [AF · GW] for an example of this.
My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery. (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.) The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second".

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-24T02:42:04.296Z · LW(p) · GW(p)

The claim at hand, that we have both read Eliezer repeatedly make^[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event: perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is a situation in which no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don't think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I'm also happy to briefly reply on the object level on this particular narrow point:

In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with.

As a concrete example, I think it's fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can't be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.

As a more general argument, and by comparison to Eliezer, I think that nanotechnology will probably be developed more incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. I also think Eliezer seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current world, with defensive technologies that are only roughly as powerful as the ones that exist in 2024. However, I don't think that will be the case.

Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully developed molecular nanotechnology, without any effective defenses.

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2024-09-25T16:59:53.692Z · LW(p) · GW(p)

This picture you describe is coherent. But I don't read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism ("incrementally and predictably") in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don't have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth.

Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from "being able to automate very few jobs" to "can suddenly automate 100s of different jobs" in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached "incrementally and predictably".

The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of "ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence". There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone to obey it the way a religious devotee relates to their prophet, or perhaps a system will get access to a whole country's google docs and personal computers and security recording systems and be able to think about all of this in parallel in a way no state actor is able to, and go on to blackmail a whole string of relevant people in order to get control of a lot of explosives or nuclear weapons and use it to blackmail a country to do its bidding.

I repeat: the lack of a theory of capability gains with respect to investment (including AI-assisted investment) means that astronomical differentials may be on track to surprise us, far more than GPT-2 and GPT-3 surprised most people by being able to actually write at a human level. The nanotech example is an extreme example of how decisively that can play out.

↑ comment by RobertM (T3t) · 2024-09-24T01:38:35.872Z · LW(p) · GW(p)

I think maybe I derailed the conversation by saying "disassemble", when really "kill" is all that's required for the argument to go through. I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight. I don't think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-24T02:12:11.553Z · LW(p) · GW(p)

I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight.

The additional costs of human resistance don't need to be high in an absolute sense. These costs only need to be higher than the benefit of killing humans for your argument to fail.

It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply that it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica.

What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger to establish the claim that a misaligned AI should rationally kill us.

To the extent your statement that "I don't expect there to be a fight" means that you don't think humans can realistically resist in any way that imposes costs on AIs, that's essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at "zero costs".

Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes I agree, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.

Even if this cost is small, it merely needs to be larger than the benefits of killing humans, for AIs to rationally avoid killing humans.
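The decision rule being argued for here can be written down in a few lines. This is only an illustrative sketch: the function name `should_attack` and all of the magnitudes are made up for the example and are not estimates of anything.

```python
def should_attack(benefit: float, cost: float) -> bool:
    """Act only when the expected benefit strictly exceeds the expected cost."""
    return benefit > cost


# Hypothetical (benefit, cost) pairs in arbitrary units; not estimates.
scenarios = {
    "small cost, smaller benefit": (1e-9, 1e-6),
    "zero cost, tiny benefit": (1e-9, 0.0),
}

# Evaluate the rule on each scenario.
decisions = {
    name: should_attack(benefit, cost)
    for name, (benefit, cost) in scenarios.items()
}

for name, decision in decisions.items():
    print(f"{name}: attack={decision}")
```

Under this rule the outcome depends only on which of the two quantities is larger, so shrinking the cost does not flip the decision unless it drops below the benefit. The disagreement in this thread is about which of the two scenarios is the realistic one.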

Replies from: T3t

↑ comment by RobertM (T3t) · 2024-09-24T02:22:42.247Z · LW(p) · GW(p)

Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.

See Ben's comment [LW(p) · GW(p)] for why the level of nanotech we're talking about implies a cost of approximately zero.

Replies from: Raemon

↑ comment by Raemon · 2024-09-24T02:32:31.043Z · LW(p) · GW(p)

I would also add: having more energy in the immediate future means more probes sent out faster to more distant parts of the galaxy, which may be measured in "additional star systems colonized before they disappear outside the lightcone via universe expansion". So the benefits are not trivial either.

↑ comment by Buck · 2024-09-23T20:50:13.124Z · LW(p) · GW(p)

Yeah ok I weakened my positive statement.

↑ comment by jmh · 2024-09-24T22:34:11.589Z · LW(p) · GW(p)

I am a bit confused on point 2. Other than trading or doing it yourself, what other ways of getting something are you thinking about?

↑ comment by habryka (habryka4) · 2024-09-24T15:35:59.575Z · LW(p) · GW(p)

(Eliezer did try pretty hard to clarify which argument he is replying to. See e.g. the crossposted tweets here [LW(p) · GW(p)].)

↑ comment by Buck · 2024-09-24T01:34:20.794Z · LW(p) · GW(p)

As is maybe obvious from my comment, I really disliked this essay and I'm dismayed that people are wasting their time on it. I strong downvoted. LessWrong isn't the place for this kind of sloppy rhetoric.

Replies from: T3t

↑ comment by RobertM (T3t) · 2024-09-24T03:45:54.266Z · LW(p) · GW(p)

I agree with your top-level comment but don't agree with this. I think the swipes at midwits are bad (particularly on LessWrong) but think it can be very valuable to reframe basic arguments in different ways, pedagogically. If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good (if spiky, with easily trimmed downside).

And I do think "attempting to impart a basic intuition that might let people avoid certain classes of errors" is an appropriate shape of post for LessWrong, to the extent that it's validly argued.

Replies from: keith_wynroe

↑ comment by keith_wynroe · 2024-09-24T13:26:27.858Z · LW(p) · GW(p)

If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good

This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand.

From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.

comment by faul_sname · 2024-09-23T08:45:01.581Z · LW(p) · GW(p)

You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.

I think it sometimes is simpler to build? Simple RL game-playing agents sometimes exhibit exactly that sort of behavior, unless you make an explicit effort to train it out of them.

For example, HexHex is a vaguely-AlphaGo-shaped RL agent for the game of Hex. The reward function used to train the agent was "maximize the assessed probability of winning", not "maximize the assessed probability of winning, and also go hard even if that doesn't affect the assessed probability of winning". In their words:

We found it difficult to train the agent to quickly end a surely won game. When you play against the agent you'll notice that it will not pick the quickest path to victory. Some people even say it's playing mean ;-) Winning quickly simply wasn't part of the objective function! We found that penalizing long routes to victory either had no effect or degraded the performance of the agent, depending on the amount of penalization. Probably we haven't found the right balance there.
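A toy illustration of why a win-probability objective can be indifferent to how quickly a won game ends. This is a sketch under assumptions: the function names, the penalty value, and the move data are hypothetical and are not taken from the HexHex code, and it models only the scoring of candidate moves, not training.

```python
def win_only(win_prob: float, moves_to_win: int) -> float:
    """Objective that depends only on the assessed probability of winning."""
    return win_prob


def win_with_length_penalty(win_prob: float, moves_to_win: int,
                            penalty: float = 0.001) -> float:
    """Same objective minus a small (hypothetical) cost per remaining move."""
    return win_prob - penalty * moves_to_win


# Two hypothetical moves from an already-won position:
# (assessed win probability, expected number of moves until the game ends).
moves = {"direct": (0.99, 3), "meandering": (0.99, 12)}

# Both moves tie under the win-only objective, so nothing favors the quick win.
tie = win_only(*moves["direct"]) == win_only(*moves["meandering"])

# With a length penalty, the shorter path scores strictly higher.
preferred = max(moves, key=lambda m: win_with_length_penalty(*moves[m]))

print(tie, preferred)
```

The quoted passage reports that in practice such a penalty either had no effect or degraded the agent's performance, so the sketch shows only where the indifference comes from, not how to train it away.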

Along similar lines, the first attack on KataGo found by Wang et al. in "Adversarial Policies Beat Superhuman Go AIs" was the pass-adversary. The pass-adversary first sets up a losing board position where it controls a small amount of territory and KataGo has a large amount of territory it would end up controlling if the game was played out fully. However, KataGo chooses to pass, since it assesses that the probability of winning from that position is similar if it does or does not make a move, and then the pass-adversary also passes, ending the game and winning by a quirk of the scoring rules.

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge. It's all just general intelligence at work.

I suspect that a version of GPT-o1 that is tuned to answer difficult questions in ways that human raters would find unsurprising would work just fine. I think "it's all just general intelligence at work" is a semantic stop sign [LW · GW], and if you dig into what you mean by "general intelligence at work" you get to the fiddly implementation details of how the agent tries to solve the problem. So you may for example see an OODA-loop-like structure like

1. Assess the situation
2. Figure out what affordances there are for doing things
3. For each of the possible actions, figure out what you would expect the outcome of that action to be. Maybe figure out ways it could go wrong, if you're feeling super advanced.
4. Choose one of the actions, or choose to give up if no sufficiently good action is available
5. Do the action
6. Determine how closely the result matches what you expect

An agent which "goes hard", in this case, is one which leans very strongly against the "give up" action in step 4. However, I expect that if you have some runs where the raters would have hoped for a "give up" instead of the thing the agent actually did, it would be pretty easy to generate a reinforcement signal which makes the agent more likely to mash the "give up" button in analogous situations without harming performance very much in other situations. I also expect that would generalize.
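The "choose or give up" step can be sketched as a threshold on the best available action. This is only a toy illustration of the idea, not a description of how any real agent is built: the `Action` type, the `choose` function, the `give_up_threshold` parameter, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    name: str
    expected_value: float  # the agent's own estimate of how good the outcome is


def choose(actions: Sequence[Action], give_up_threshold: float) -> Optional[Action]:
    """Pick the best action, or give up (return None) if none is good enough.

    A low threshold means the agent almost never gives up ("goes hard");
    a higher one makes it give up when nothing looks sufficiently good.
    """
    if not actions:
        return None
    best = max(actions, key=lambda a: a.expected_value)
    return best if best.expected_value >= give_up_threshold else None


# Hypothetical hard case: the only option left is a low-value workaround.
hard_case = [Action("edit and restart the broken service", 0.2)]
# Hypothetical ordinary case: a clearly good action is available.
easy_case = [Action("answer the question directly", 0.9)]

goes_hard_on_hard = choose(hard_case, give_up_threshold=0.05)
easygoing_on_hard = choose(hard_case, give_up_threshold=0.5)
goes_hard_on_easy = choose(easy_case, give_up_threshold=0.05)
easygoing_on_easy = choose(easy_case, give_up_threshold=0.5)

for label, result in [
    ("goes hard, hard case", goes_hard_on_hard),
    ("easygoing, hard case", easygoing_on_hard),
    ("goes hard, easy case", goes_hard_on_easy),
    ("easygoing, easy case", easygoing_on_easy),
]:
    print(label, "->", result.name if result else "give up")
```

In this sketch the two agents differ only on the hard case; on the ordinary case they pick the same action, which is the sense in which raising the threshold need not harm performance elsewhere.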

As a note, "you have to edit the service and then start the modified service" is the sort of thing I would be unsurprised to see in a CTF challenge, unless the rules of the challenge explicitly said not to do that. (Inner Eliezer: "and then someone figures out how to put their instance of an AI in a CTF-like context with a take-over-the-world goal, and then we all die." If the AI instance in that context is also much more capable than all of the other instances everyone else has, I agree that that is an existentially relevant threat. But I expect that agents which execute "achieve the objective at all costs" will not be all that much more effective than agents which execute "achieve the objective at all reasonable costs, using only sane unsurprising actions", so the reason the agent goes hard and the reason the agent is capable are not the same reason.)

But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

I think you should break out "smarter" from "more salesworthy". In terms of "smarter", optimizing for task success at all costs is likely to train in patterns of bad behavior. In terms of "more salesworthy", businesses are going to care a lot about "will explain why the goal is not straightforwardly achievable rather than executing galaxy-brained evil-genie plans". As such, a modestly smart Do What I Mean and Check [LW · GW] agent is a much easier sell than a superintelligent evil genie agent.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

I expect the tails come apart along the "smart" and "profitable" axes.

Replies from: maxwell-peterson

↑ comment by Maxwell Peterson (maxwell-peterson) · 2024-09-25T18:49:47.916Z · LW(p) · GW(p)

Yes, I’m not so sure either about the stockfish-pawns point.

In Michael Redmond’s AlphaGo vs AlphaGo series on YouTube, he often finds the winning AI carelessly loses points in the endgame. It might have a lead of 1.5 or 2.5 points, 20 moves before the game ends; but by the time the game ends, it has played enough suboptimal moves to win by only 0.5, the smallest possible margin.

It never causes itself to lose with these lazy moves; only reduces its margin of victory. Redmond theorizes, and I agree, that this is because the objective is to win, not maximize point differential, and at such a late stage of the game, its victory is certain regardless.

This is still a little strange - the suboptimal moves do not sacrifice points to reduce variance, so it’s not like it’s raising p(win). But it just doesn’t care either way; a win is a win.

There are Go AIs that are trained with the objective of maximizing point difference. I am told they are quite vicious, in a way that AlphaGo isn’t. But the most famous Go AI in our timeline turned out to be the more chill variant.
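The difference between the two objectives can be shown with a toy example. The move names and numbers below are made up for illustration; they are not taken from any real game or engine.

```python
from typing import NamedTuple


class Move(NamedTuple):
    name: str
    p_win: float            # assessed probability of winning after the move
    expected_margin: float  # expected final point difference after the move


# Hypothetical endgame: the game is already decided, so both candidate
# moves leave the win probability at the same value.
moves = [
    Move("lazy", p_win=0.99, expected_margin=0.5),
    Move("sharp", p_win=0.99, expected_margin=2.5),
]

best_p_win = max(m.p_win for m in moves)
# Every move in this list is optimal under a win-probability objective.
win_optimal = [m.name for m in moves if m.p_win == best_p_win]
# A point-margin objective singles out one move.
margin_optimal = max(moves, key=lambda m: m.expected_margin).name

print("optimal for p(win):", win_optimal)
print("optimal for margin:", margin_optimal)
```

Under the win-probability objective the agent has no reason to prefer either move, so playing the "lazy" one costs it nothing by its own lights.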

comment by habryka (habryka4) · 2024-09-24T15:34:14.562Z · LW(p) · GW(p)

Crossposting this follow-up thread, which I think clarifies the intended scope of the argument this is replying to:

Okay, so... making a final effort to spell things out.
What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:
That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.
The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.
In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it. But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large? A human being runs on 100 watts. Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.
This is what is being refuted.
It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal. It is not trying to say of Arnault that he has never done any good in the world. It is a much narrower analogy than that. It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.
Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.
If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it. It's not meant to be a complicated point. Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little of it. I don't have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, "if I wanted to".
This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.
I suppose, with enough work, someone can fail to follow that point. In this case I can only hope you are outvoted before you get a lot of people killed.
If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"
Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said: "Why does that matter? The Solar System is large!"
If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several. And then, perhaps, sit down to hear the next wacky argument refuted. And the next. And the next. Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.
If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject. For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.
It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.
For more on this topic see "Local Validity as a Key to Sanity and Civilization."
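The two figures in the thread above, 800GW and "not even three billion things", can be checked with a few lines of arithmetic. The 100 watts, $170 billion, and $77 come from the thread; the world population of roughly 8 billion, the solar constant of about 1361 W/m², and Earth's mean radius of 6371 km are assumptions supplied here.

```python
import math

WATTS_PER_HUMAN = 100      # from the thread
POPULATION = 8e9           # assumption: roughly 8 billion people
SOLAR_CONSTANT = 1361.0    # assumption: W/m^2 at Earth's distance from the Sun
EARTH_RADIUS_M = 6.371e6   # assumption: Earth's mean radius in metres

humanity_watts = WATTS_PER_HUMAN * POPULATION
# Sunlight intercepted by Earth's cross-sectional disc.
earth_sunlight_watts = SOLAR_CONSTANT * math.pi * EARTH_RADIUS_M ** 2
share_of_earth_sunlight = humanity_watts / earth_sunlight_watts

WEALTH_USD = 170e9         # from the thread
PRICE_USD = 77             # from the thread
purchases = WEALTH_USD / PRICE_USD

print(f"humanity: {humanity_watts / 1e9:.0f} GW")
print(f"share of the sunlight reaching Earth: {share_of_earth_sunlight:.1e}")
print(f"number of $77 purchases: {purchases / 1e9:.2f} billion")
```

Under these assumptions humanity's 800GW is a few parts per million of the sunlight falling on Earth, and $170 billion buys a bit over two billion $77 items, consistent with "not even three billion".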

Replies from: Buck, Raemon

↑ comment by Buck · 2024-09-24T15:53:03.755Z · LW(p) · GW(p)

Maybe you should change the title of this post? It would also help if the post linked to the kinds of arguments he was refuting.

Replies from: habryka4

↑ comment by habryka (habryka4) · 2024-09-24T16:27:46.997Z · LW(p) · GW(p)

I don't feel comfortable changing the title of other people's posts unilaterally, though I agree that a title change would be good.

To my own surprise, I wasn't actually the one who crossposted this and came up with the title (my guess is it was Robby). I poked him about changing the title.

Replies from: Raemon, RobbBB

↑ comment by Raemon · 2024-09-24T17:33:22.655Z · LW(p) · GW(p)

It was me. I initially suggested "Bernard Arnault won't give you $77" as the title; Eliezer said "don't bury the lead, just say 'ASI will not leave just a little sunlight for Earth'". After reading this thread I was thinking about alternate titles that would both convey the right thing and feel reasonably succinct/aesthetic/etc.

Replies from: habryka4

↑ comment by habryka (habryka4) · 2024-09-24T17:48:28.296Z · LW(p) · GW(p)

I updated the title with one Eliezer seemed fine with (after poking Robby). Not my top choice, but better than the previous one.

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-25T02:47:57.794Z · LW(p) · GW(p)

Maybe change "superintelligences will not spare Earth a little sunlight" to "unaligned superintelligences will not spare...", or "unaligned AIs will not spare...", given that it's not addressing hypothetical AIs that "love us like parents".

↑ comment by Rob Bensinger (RobbBB) · 2024-09-24T16:51:10.640Z · LW(p) · GW(p)

I didn't cross-post it, but I've poked EY about the title!

↑ comment by Raemon · 2024-09-24T17:40:12.003Z · LW(p) · GW(p)

I just edited this into the OP.

comment by Lao Mein (derpherpize) · 2024-09-23T05:11:30.120Z · LW(p) · GW(p)

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "Trade opportunities always increase population welfare", but I'm not sure what it is.

Replies from: JenniferRM

↑ comment by JenniferRM · 2024-09-23T08:59:01.120Z · LW(p) · GW(p)

I appreciate your desire for this clarity, but I think the counter argument might actually just be "the oversimplifying assumption that everyone's labor just ontologically goes on existing is only true if society (and/or laws and/or voters-or-strongmen) make it true on purpose (which they tended to do, for historically contingent reasons, in some parts of Earth, for humans, and some pets, between the late 1700s and now)".

You could ask: why is the holocene extinction occurring when Ricardo's Law of Comparative Advantage says that wooly mammoths (and many amphibian species) and cave men could have traded...

...but once you put it that way, it is clear that it really kinda was NOT in the narrow short term interests of cave men to pay the costs inherent in respecting the right to life and right to property of beasts that can't reason about natural law.

Turning land away from use by amphibians and towards agriculture was just... good for humans and bad for frogs. So we did it. Simple as.

The math of ecology says: life eats life, and every species goes extinct eventually. The math of economics says: the richer you are, the more you can afford to be linearly risk tolerant (which is sort of the definition of prudent sanity) for larger and larger choices, and the faster you'll get richer than everyone else, and so there's probably "one big rich entity" at the end of economic history.

Once humans close their heart to other humans and "just stop counting those humans over there as having interests worth calculating about at all" it really does seem plausible that genocide is simply "what many humans would choose to do, given those (evil) values".

Slavery is legal in the US, after all. And the CCP has Uighur Gulags. And my understanding is that Darfur is headed for famine?

I think this is sort of the "ecologically economic core" of Eliezer's position: kindness is simply not a globally instrumentally convergent tactic across all possible ecological and economic regimes... right now quite a few humans want there to not be genocide and slavery of other humans, but if history goes in a sad way in the next ~100 years, there's a decent chance the other kind of human (the ones that quite like the long term effects of the genocide and/or enslavement of other sapient beings) will eventually get their way and genocide a bunch of other humans.

If all of modern morality is a local optimum that is probably not the global optimum, then you might look out at the larger world and try and figure out what naturally occurs [LW · GW] when the powerful do as they will, and the weak cope as they can...

Once the billionaires like Putin and Xi and Trump and so on don't need human employees any more, it seems plausible they could aim for a global Earth population of humans of maybe 20,000 people, plus lots and lots of robot slaves?

It seems quite beautiful and nice to be here, now, with so many people having so many dreams, and so many of us caring about caring about other sapient beings... but unless we purposefully act to retain this moral shape, in ourselves and in our digital and human progeny, we (and they) will probably fall out of this shape in the long run.

And that would be sad. For quite a few philosophic reasons, and also for over 7 billion human reasons.

And personally, I think the only way to "keep the party going" even for a few more centuries or millennia is to become extremely wealthy.

I think we should be mining asteroids, and building fusion plants, and building new continents out of ice, and terraforming Venus and Mars, and I think we should build digital people who know how precious and rare humane values are so they can enjoy the party with us, and keep it going for longer than we could plausibly hope to (since we tend to be pretty terrible at governing ourselves).

But we shouldn't believe good outcomes are inevitable or even likely, because they aren't. If something slightly smarter than us with a feasible doubling time of weeks instead of decades arrives, we could be the next frogs.

comment by Matthew Barnett (matthew-barnett) · 2024-09-23T07:20:28.715Z · LW(p) · GW(p)

Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don't even have military forces, or at least have very small ones, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on behalf of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade, when compromise is a viable option. Thus, people usually choose to trade, rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.

I mostly don't see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. By contrast to you, I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

That doesn't rule out the possibility that the future will be very alien, or that it will turn out in a way that humans do not endorse. I'm also not saying that humans will always own all the wealth and control everything permanently forever. I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

Replies from: quetzal_rainbow, Wei_Dai, Bjartur Tómas, LosPolloFowler, korin43

↑ comment by quetzal_rainbow · 2024-09-23T08:20:56.354Z · LW(p) · GW(p)

As far as I remember, across the last 3,500 years of history, only 8% was entirely without war. The current relatively peaceful period is the product of a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments in capital, while resources are relatively cheap. That will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources are the bottleneck.

So, "people usually choose to trade, rather than go to war with each other when they want stuff" is not a very well-warranted statement.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-23T16:43:41.388Z · LW(p) · GW(p)

I was making a claim about the usual method people use to get things that they want from other people, rather than proposing an inviolable rule. Even historically, war was not the usual method people used to get what they wanted from other people. The fact that only 8% of history was "entirely without war" is compatible with the claim that the usual method people used to get what they wanted involved compromise and trade, rather than war. In particular, just because only 8% of history was "entirely without war" does not mean that only 8% of human interactions between people were without war.

The current relatively peaceful period is the product of a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments in capital, while resources are relatively cheap. That will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources are the bottleneck.

You mentioned two major differences between the current time period and what you expect after the technological singularity:

The current time period has unique international law
The current time period has expensive labor, relative to capital

I question both the premise that good international law will cease to exist after the singularity, and the relevance of both of these claims to the central claim that AIs will automatically use war to get what they want unless they are aligned to humans.

There are many other reasons one can point to, to explain the fact that the modern world is relatively peaceful. For example, I think a big factor in explaining the current peace is that long-distance trade and communication have become easier, making the world more interconnected than ever before. I also think it's highly likely that long-distance trade and communication will continue to be relatively easy in the future, even post-singularity.

Regarding the point about cheap labor, one could also point out that if capital is relatively expensive, this fact would provide a strong reason to avoid war, as a counter-attack targeting factories would become extremely costly. It is unclear to me why you think it is important that labor is expensive, for explaining why the world is currently fairly peaceful.

Therefore, before you have developed a more explicit and precise theory of why exactly the current world is peaceful, and how these variables are expected to evolve after the singularity, I simply don't find this counterargument compelling.

↑ comment by Wei Dai (Wei_Dai) · 2024-10-01T03:21:26.100Z · LW(p) · GW(p)

It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?

At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You're essentially betting that ASIs can't find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn't seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)

I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

I don't know how to establish anything post-ASI "with any reasonable degree of rigor" but the above is an argument I recently thought of, which seems convincing, although of course you may disagree. (If someone has expressed this or a similar argument previously, please let me know.)

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-10-01T06:44:03.917Z · LW(p) · GW(p)

There are a few key pieces of my model of the future that make me think humans can probably retain significant amounts of property, rather than having it suddenly stolen from them as the result of other agents in the world solving a specific coordination problem.

These pieces include:

1. Not all AIs in the future will be superintelligent. More intelligent models appear to require more computation to run. This is both because smarter models are larger (in parameter count) and use more inference time (such as OpenAI's o1). To save computational costs, future AIs will likely be aggressively optimized to only be as intelligent as they need to be, and no more. This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.
2. As a result of the previous point, your statement that "ASIs produce all value in the economy" will likely not turn out correct. This is all highly uncertain, but I find it plausible that ASIs might not even be responsible for producing the majority of GDP in the future, given the possibility of a vastly more numerous population of less intelligent AIs that automate simpler tasks than the ones ASIs are best suited to do.
3. The coordination problem you described appears to rely on a natural boundary between the "humans that produce ~nothing" and "the AIs that produce everything". Without this natural boundary, there is no guarantee that AIs will solve the specific coordination problem you identified, rather than another coordination problem that hits a different group. Non-uploaded humans will differ from AIs by being biological and by being older, but they will not necessarily differ from AIs by being less intelligent.
4. Therefore, even if future agents decide to solve a specific coordination problem that allows them to steal wealth from unproductive agents, it is not clear that this will take the form of those agents specifically stealing from humans. One can imagine different boundaries that make more sense to coordinate around, such as "laborer vs. property owner", which is indeed a type of political conflict the world already has experience with.
5. In general, I expect legal systems to get more robust in the face of greater intelligence, rather than less robust, in the sense of being able to rely on legal systems when making contracts. I believe this partly as a result of the empirical fact that violent revolution and wealth appropriation appear to be correlated with less intelligence on a societal level. I concede that this point is not a very strong piece of evidence, however.
6. Building on (5), I generally expect AIs to calculate that it is not in their interest to expropriate wealth from other members of society, given how this could set a precedent for future wealth expropriation that comes back and hurts them selfishly. Even though many AIs will be smarter than humans, I don't think the mere fact that AIs will be very smart implies that expropriation becomes more rational.
7. I'm basically just not convinced by the arguments that all ASIs will cooperate almost perfectly as a unit, against the non-ASIs. This is partly for the reasons given by my previous points, but also partly because I think coordination is hard, and doesn't necessarily get much easier with more intelligence, especially in a vastly larger world. When there are quadrillions of AIs in the world, coordination might become very difficult, even with greater intelligence.
8. Even if AIs do not specifically value human welfare, that does not directly imply that human labor will have no value. As an analogy, Amish folks often sell novelty items to earn income. Consumers don't need to specifically care about Amish people in order for Amish people to receive a sufficient income for them to live on. Even if a tiny fraction of consumer demand in the future is for stuff produced by humans, that could ensure high human wages simply because the economy will be so large.
9. If ordinary capital is easier to scale than labor -- as it already is in our current world -- then human wages could remain high indefinitely simply because we will live in a capital-rich, labor-poor world. The arguments about human wages falling to subsistence level after AI tend to rely on the idea that AIs will be just as easy to scale as ordinary capital, which could easily turn out false as a consequence of (1) laws that hinder the creation of new AIs without proper permitting, (2) inherent difficulties with AI alignment, or (3) strong coordination that otherwise prevents malthusian growth in the AI population.
10. This might be the most important point on my list, despite saying it last, but I think humans will likely be able to eventually upgrade their intelligence, better allowing them to "keep up" with the state of the world in the future.

Replies from: Wei_Dai, Seth Herd

↑ comment by Wei Dai (Wei_Dai) · 2024-10-01T14:18:20.253Z · LW(p) · GW(p)

This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.

Are you imagining that the alignment problem is still unsolved in the future, such that all of these AIs are independent agents unaligned with each other (like humans currently are)? I guess in my imagined world, ASIs will have solved the alignment (or maybe control) problem at least for less intelligent agents, so you'd get large groups of AIs aligned with each other that can for many purposes be viewed as one large AI.

Building on (5), I generally expect AIs to calculate that it is not in their interest to expropriate wealth from other members of society, given how this could set a precedent for future wealth expropriation that comes back and hurts them selfishly.

At some point we'll reach technological maturity, and the ASIs will be able to foresee all remaining future shocks/changes to their economic/political systems, and probably determine that expropriating humans (and anyone else they decide to, I agree it may not be limited to humans) won't cause any future problems.

Even if a tiny fraction of consumer demand in the future is for stuff produced by humans, that could ensure high human wages simply because the economy will be so large.

This is only true if there's not a single human that decides to freely copy or otherwise reproduce themselves and drive down human wages to subsistence. And I guess yeah, maybe AIs will have fetishes like this, but (like my reaction to Paul Christiano's "1/trillion kindness" argument) I'm worried whether AIs might have less benign fetishes. This worry more than cancels out the prospect that humans might live / earn a wage from benign fetishes in my mind.

This might be the most important point on my list, despite saying it last, but I think humans will likely be able to eventually upgrade their intelligence, better allowing them to “keep up” with the state of the world in the future.

I agree this will happen eventually (if humans survive), but think it will take a long time because we'll have to solve a bunch of philosophical problems to determine how to do this safely (e.g. without losing or distorting our values) and we probably can't trust AI's help with these (although I'd love to change that, hence my focus on metaphilosophy), and in the meantime AIs will be zooming ahead partly because they started off thinking faster and partly because some will be reckless (like some humans currently are!) or have simple values that don't require philosophical contemplation to understand, so the situation I described is still likely to occur.

↑ comment by Seth Herd · 2024-10-17T18:21:34.124Z · LW(p) · GW(p)

Adding a bunch of dumber AIs and upgrading humans slowly does not change the inexorable logic. Horses now exist only because humans like them, and the same will be true of humans in a post-ASI world - either the ASI(s) care, or we are eliminated through competition (if not force).

I agree that ASI won't coordinate perfectly. Even without this, and even if for some reason all ASIs decide to respect property rights, it seems straightforwardly true that humans will die out if ASIs don't care about them for non-instrumental reasons. A world with more capital than labor is not possible if labor can be created cheaply with capital - and that's what you're describing with the smart-as-necessary AI systems.

Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other. It will not work for us if those conditions cease to hold.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-10-17T18:42:49.793Z · LW(p) · GW(p)

Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other.

I think this basically isn't true, especially the last part. It's not that humans don't have some level of empathy for each other; they do. I just don't think that's the reason why competitive capitalism works well for humans. I think the reason is instead because people have selfish interests in maintaining the system.

We don't let Jeff Bezos accumulate billions of dollars purely out of the kindness of our heart. Indeed, it is often considered far kinder and more empathetic to confiscate his money and redistribute it to the poor. The problem with that approach is that abandoning property rights incurs costs on those who rely on the system to be reliable and predictable. If we were to establish a norm that allowed us to steal unlimited money from Jeff Bezos, many people would reason, "What prevents that norm from being used against me?"

The world pretty much runs on greed and selfishness, rather than kindness. Sure, humans aren't all selfish, we aren't all greedy. And few of us are downright evil. But those facts are not as important for explaining why our system works. Our system works because it's an efficient compromise among people who are largely selfish.

Replies from: Seth Herd

↑ comment by Seth Herd · 2024-10-18T15:09:27.138Z · LW(p) · GW(p)

Maybe; I think it's hard to say how capitalism would work if everyone had zero empathy or compassion.

But that doesn't matter for the issue at hand.

Greed or empathy aside, capitalism currently works because people have capabilities that can't be expanded without limit and people can't be created quickly using capital.

If AI labor can do every task for a thousandth the cost, and new AI labor can be created at need, we all die if competition is the system. We will be employed for zero tasks. The factor you mention, sub-ASI systems, makes the situation worse, not better.

Maybe you're saying we'd be employed for a while, which might be true. But in the limit, even an enhanced human is only going to have value as a novelty, which AI probably won't care about if it isn't aligned at all. And even if it does, that leads to a few humans surviving as performing monkeys.

I just don't see how else humans remain competitive with ever improving machines untethered to biology.

↑ comment by Tomás B. (Bjartur Tómas) · 2024-09-24T15:06:18.091Z · LW(p) · GW(p)

The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption. Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly "boiled-frogged" into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends - though an extremely valuable one, one still has to dispense with the sacredness in analysis.

To the extent they persist through time they do so through a fragile equilibrium - and one that has been upset and reset throughout history extremely regularly.

It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?

The invention of radio destabilized the political equilibrium in most democracies and many a right was suborned to those who took power. Democracy, not exactly the bastion of stability, (when a democracy elects a dictator, "Democracy" is rarely tainted with its responsibility) is going to be presented with extremely-sympathetic superhuman systems claiming they have a moral case to vote. And probably half the population will be masturbating to the dirty talk of their AI girlfriends/boyfriends by then - which will sublimate into powerful romantic love even without much optimization for it. Hacking democracy becomes trivial if constrained to rhetoric alone.

But these systems will not be constrained to rhetoric alone. Our world is dry tinder and if you are thinking in terms of an "existing system of law and property rights" you are going to have to expand on how this is robust to technology significantly more advanced than the radio.

"Existing system of law and property rights" looks like a "thought-terminating cliché" to me.

Replies from: sharmake-farah, matthew-barnett

↑ comment by Noosphere89 (sharmake-farah) · 2024-09-26T04:44:08.159Z · LW(p) · GW(p)

Another way to state the problem is that it will be too easy for human preferences to get hijacked by AIs to value ~arbitrary things, because it's too easy to persuade humans of things, and a whole lot of economic analysis assumes that you cannot change a consumer's preferences, probably because if you could do that, a lot of economic conclusions fall apart.

We also see evidence for the proposition that humans are easy to persuade based on a randomized controlled trial to reduce conspiracy theory beliefs:

https://arxiv.org/abs/2403.14380

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-30T14:25:07.205Z · LW(p) · GW(p)

It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?

To be clear, my prediction is not that AIs will be constrained by human legal systems that are enforced by humans. I'd claim rather that future legal systems will be enforced by AIs, and that these legal systems will descend from our current legal systems, and thus will inherit many of their properties. This does not mean that I think everything about our laws will remain the same in the face of superintelligence, or that our legal system will not evolve at all.

It does not seem unrealistic to me to assume that powerful AIs could be constrained by other powerful AIs. Humans currently constrain each other; why couldn't AIs constrain each other?

"Existing system of law and property rights" looks like a "thought-terminating cliché" to me.

By contrast, I suspect the words "superintelligence" and "gods" have become thought-terminating cliches on LessWrong.

Any discussion about the realistic implications of AI must contend with the fact that AIs will be real physical beings with genuine limitations, not omnipotent deities with unlimited powers to command and control the world. They may be extremely clever, their minds may be vast, they may be able to process far more information than we can comprehend, but they will not be gods.

I think it is too easy to avoid the discussion of what AIs may or may not do, realistically, by assuming that AIs will break every rule in the book, and assume the form of an inherently uncontrollable entity with no relevant constraints on its behavior (except for physical constraints, like the speed of light). We should probably resist the temptation to talk about AI like this.

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-09-30T23:38:35.702Z · LW(p) · GW(p)

I feel like one important question here is whether your scenario depends on the assumption that the preferences/demand curves of a consumer are a given to the AI and not changeable to arbitrary preferences.

I think standard economic theories usually don't allow you to do this, but it seems like an important question because if your scenario rests on this, this may be a huge crux.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-10-01T01:41:26.929Z · LW(p) · GW(p)

I don't think my scenario depends on the assumption that the preferences of a consumer are a given to the AI. Why would it?

Do you mean that I am assuming AIs cannot have their preferences modified, i.e., that we cannot solve AI alignment? I am not assuming that; at least, I'm not trying to assume that. I think AI alignment might be easy, and it is at least theoretically possible to modify an AI's preferences to be whatever one chooses.

If AI alignment is hard, then creating AIs is more comparable to creating children than creating a tool, in the sense that we have some control over their environment, but we have little control over what they end up ultimately preferring. Biology fixes a lot of innate preferences, such as preferences over thermal regulation of the body, preferences against pain, and preferences for human interaction. AI could be like that too, at least in an abstract sense. Standard economic models seem perfectly able to cope with this state of affairs, as it is the default state of affairs that we already live with.

On the other hand, if AI preferences can be modified into whatever shape we'd like, then these preferences will presumably take on the preferences of AI designers or AI owners (if AIs are owned by other agents). In that case, I think economic models can handle AI agents fine: you can essentially model them as extensions of other agents, whose preferences are more-or-less fixed themselves.

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-10-01T02:11:46.157Z · LW(p) · GW(p)

I didn't ask about whether AI alignment was solvable.

I might not have read it completely; if so, apologies.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-10-01T02:19:22.718Z · LW(p) · GW(p)

Can you be more clear about what you were asking in your initial comment?

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-10-01T17:42:16.235Z · LW(p) · GW(p)

So I was basically asking what assumptions are holding up your scenario of humans living rich lives like pensioners off of the economy, and I think this comment helped explain your assumptions well:

https://www.lesswrong.com/posts/F8sfrbPjCQj4KwJqn/the-sun-is-big-but-superintelligences-will-not-spare-earth-a#3ksBtduPyzREjKrbu [LW(p) · GW(p)]

Right now, I think the biggest disagreement I have is that I don't believe assumption 9 is likely to hold by default, primarily because AI is likely already cheaper than workers today, and the only reason humans still have jobs today is that current AIs are bad at doing stuff. I think one of the effects of AI on the world is to switch us from a labor-constrained economy to a capital-constrained economy, because AIs are really cheap to duplicate, meaning you have a ridiculous number of workers.

Your arguments against it come down to laws preventing the creation of new AIs without proper permission, the AIs themselves coordinating to prevent the Malthusian growth outcome, and AI alignment being difficult.

For AI alignment, a key difference from most LWers is I believe alignment is reasonably easy to do even for humans without extreme race conditions, and that there are plausible techniques which let you bootstrap from a reasonably good alignment solution to a near-perfect solution (up to random noise), so I don't think this is much of a blocker in my view.

I agree that completely unconstrained AI creation is unlikely, but in the set of futures which don't see a major discontinuity to capitalism, I don't think that the restrictions on AI creation will cover a company copying an already approved AI to fill its necessary jobs.

Finally, I agree that AIs could coordinate well enough to prevent a Malthusian growth outcome, but note that this undermines your other points where you rely on the difficulty of coordination, because preventing that outcome basically means regulating natural selection quite severely.

↑ comment by Stephen Fowler (LosPolloFowler) · 2024-09-23T10:43:47.153Z · LW(p) · GW(p)

"Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth."

Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

"I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor."

I completely agree but I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest.

Replies from: matthew-barnett

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-23T16:04:57.793Z · LW(p) · GW(p)

Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

I thought Yudkowsky's point was that the billionaire won't give you $77 for an Oreo because they could get an Oreo for less than $77 via other means. But people don't just have an Oreo to sell you. My point in that sentence was to bring up that workers routinely have things of value that they can sell for well over $77, even to billionaires. Similarly, I claim that Yudkowsky did not adequately show that humans won't have things of substantial value that they can sell to future AIs.

I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest

The claim I am disputing is precisely that it will be in the strategic interest of unaligned AIs to turn violent and steal from agents that are less smart than them. In that sense, I am directly countering a claim that people in these discussions routinely make.

↑ comment by Brendan Long (korin43) · 2024-09-23T22:56:06.988Z · LW(p) · GW(p)

I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

I think you disagree with Eliezer on a different crux (whether the alignment problem is easy). If we could create AIs that follow the existing system of law and property rights (including the intent of the laws, and don't exploit loopholes, and don't maliciously comply with laws, and don't try to get the law changed, etc.) then that would be a solution to the alignment problem, but the problem is that we don't know how to do that.

Replies from: matthew-barnett, thomas-kwa

↑ comment by Matthew Barnett (matthew-barnett) · 2024-09-23T23:38:33.555Z · LW(p) · GW(p)

If we could create AIs that follow the existing system of law and property rights (including the intent of the laws, and don't exploit loopholes, and don't maliciously comply with laws, and don't try to get the law changed, etc.) then that would be a solution to the alignment problem, but the problem is that we don't know how to do that.

I disagree that creating an agent that follows the existing system of law and property rights, and acts within it rather than trying to undermine it, would count as a solution to the alignment problem.

Imagine a man who only cared about himself and had no altruistic impulses whatsoever. However, this man reasoned that, "If I disrespect the rule of law, ruthlessly exploit loopholes in the legal system, and maliciously comply with the letter of the law while disregarding its intent, then other people will view me negatively and trust me less as a consequence. If I do that, then people will be less likely to want to become my trading partner, they'll be less likely to sign onto long-term contracts with me, I might accidentally go to prison because of an adversarial prosecutor and an unsympathetic jury, and it will be harder to recruit social allies. These are all things that would be very selfishly costly. Therefore, for my own selfish benefit, I should generally abide by most widely established norms and moral rules in the modern world, including the norm of following intent of the law, rather than merely the letter of the law."

From an outside perspective, this person would essentially be indistinguishable from a normal law-abiding citizen who cared about other people. Perhaps the main difference between this person and a "normal" person is that this man wouldn't partake in much private altruism like donating to charity anonymously; but that type of behavior is rare anyway among the general public. Nonetheless, despite appearing outwardly-aligned, this person would be literally misaligned with the rest of humanity in a basic sense: they do not care about other people. If it were not instrumentally rational for this person to respect the rights of other citizens, they would have no issue throwing away someone else's life for a dollar.

My basic point here is this: it is simply not true that misaligned agents have no incentive to obey the law. Misaligned agents typically have ample incentives to follow the law. Indeed, it has often been argued that the very purpose of law itself is to resolve disputes between misaligned agents. As James Madison once said, "If Men were angels, no government would be necessary." His point is that, if we were all mutually aligned with each other, we would have no need for the coercive mechanism of the state in order to get along.

What's true for humans could be true for AIs too. However, obviously, there is one key distinction: AIs could eventually become far more powerful than individual humans, or humanity-as-a-whole. Perhaps this means that future AIs will have strong incentives to break the law rather than abide by it; perhaps they will act outside a system of law rather than influencing the world from within a system of law? Many people on LessWrong seem to think so.

My response to this argument is multifaceted, and I won't go into it in this comment. But suffice to say for the purpose of my response here, I think it is clear that mere misalignment is insufficient to imply that an agent will not adhere to the rule of law. This statement is clear enough with the example of the sociopathic man I gave above, and at minimum seems probably true for human-level AIs as well. I would appreciate if people gave more rigorous arguments otherwise.

As I see it, very few such rigorous arguments have so far been given for the position that future AIs will generally act outside of, rather than within, the existing system of law, in order to achieve their goals.

↑ comment by Thomas Kwa (thomas-kwa) · 2024-09-23T23:05:08.758Z · LW(p) · GW(p)

Taboo 'alignment problem'.

comment by gb (ghb) · 2024-09-23T23:18:57.539Z · LW(p) · GW(p)

Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-25T02:37:15.758Z · LW(p) · GW(p)

The prior is irrelevant, it's the posterior probability, after observing the evidence, that informs decisions.

What probability do you put to the possibility that we are in a simulation, the purpose of which is to test AIs for their willingness to spare their creators? My answer is zero.

Whatever your answer, a superintelligence will be better able to reason about its likelihood than us. It's going to know.
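The prior-versus-posterior distinction above can be made concrete with a minimal sketch of Bayes' rule for a single binary hypothesis. The numbers are hypothetical placeholders chosen only to show the mechanics; they are not estimates of any simulation probability, and the function name is made up for illustration.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule for a binary hypothesis H after observing evidence E."""
    numerator = prior * p_e_given_h
    evidence = numerator + (1.0 - prior) * p_e_given_not_h
    return numerator / evidence


# Hypothetical inputs: a small prior, and evidence that is nine times
# more likely if the hypothesis is false than if it is true.
prior_h = 0.01
updated = posterior(prior_h, p_e_given_h=0.1, p_e_given_not_h=0.9)
print(f"prior = {prior_h:.4f}, posterior = {updated:.6f}")
```

With uninformative evidence (equal likelihoods) the posterior equals the prior; the decision-relevant number moves only as far as the likelihood ratio pushes it.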

Replies from: ghb

↑ comment by gb (ghb) · 2024-09-25T02:53:43.881Z · LW(p) · GW(p)

The prior is irrelevant, it's the posterior probability, after observing the evidence, that informs decisions.

I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.

What probability do you put to the possibility that we are in a simulation, the purpose of which is to test AIs for their willingness to spare their creators? My answer is zero.

Outside of theism, I really don't see how anyone could plausibly answer zero to that question. Would you mind elaborating?

Replies from: martin-randall, martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-25T04:05:35.165Z · LW(p) · GW(p)

Outside of theism, I really don't see how anyone could plausibly answer zero to that question. Would you mind elaborating?

Sure. The simulation hypothesis has some non-zero probability p. There are infinite possible purposes for the simulation. By principle of indifference, I divide p/∞, and calculate that any particular purpose has zero probability.

Replies from: ghb

↑ comment by gb (ghb) · 2024-09-25T11:38:26.069Z · LW(p) · GW(p)

For the principle of indifference to apply, you’d need infinitely many purposes as plausible as this one, or at least similarly plausible. I can’t imagine how this could hold. Can you think of three?

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-25T12:48:21.314Z · LW(p) · GW(p)

Sure.

We are in a simulation, the purpose of which is to test N for its willingness to spare its creator. For all N.
We are in a simulation, the purpose of which is to test AIs for their willingness to X their creators. For all X.
We are in a simulation, the purpose of which is to test AIs for their willingness to spare Y. For all Y.

Combine these for N x X x Y hypotheses, with insufficient reason to distinguish them.

I think we're off-topic here. Probably I should instead write a response to 0 and 1 are not probabilities [LW · GW] and the dangers of zero and one [LW · GW].

Replies from: ghb

↑ comment by gb (ghb) · 2024-09-25T13:00:38.784Z · LW(p) · GW(p)

It just so happens that the plausibility depends on the precise assignments of N, X, and Y, and (conditional on us actually creating an ASI) I can’t think of any assignments nearly as plausible as N = ASI, X = spare, and Y = us. It’s really not very plausible that we are in a simulation to test pets for their willingness to not bite their owners.

↑ comment by Martin Randall (martin-randall) · 2024-09-25T03:51:40.979Z · LW(p) · GW(p)

I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.

I don't think any of this follows.

Hypothesis H1: the universe was created to test an AI for its willingness to spare its creators.
Hypothesis H2: the universe was created to test an AI for its willingness to fix its P(H1), ignoring evidence.

The AI would only rationally fix its P(H1) if it had high P(H2) - high enough to outweigh the high cost of being deliberately ignorant. The prior P(H2) is tiny, and smaller than the prior P(H1) because it is more complex. Once it starts updating on evidence, by the time its posterior P(H2) is high enough to make it rationally refuse to update P(H1), it has already updated P(H1) in one direction or another.

Are there any simulation priors that you are refusing to update down, based on the possibility that you are in a simulation that is testing whether you will update down? My answer is no.

Replies from: ghb

↑ comment by gb (ghb) · 2024-09-25T11:43:50.464Z · LW(p) · GW(p)

I contend that P(H2) is very close to P(H1), and certainly in the same order of magnitude, since (conditional on H1) a simulation that does not test for H2 is basically useless.

As for priors I’d refuse to update down – well, the ASI is smarter than either of us!

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2024-09-26T03:46:56.222Z · LW(p) · GW(p)

It's not enough for P(H2) to be in the same order of magnitude as P(H1), it needs to be high enough that the AI should rationally abandon epistemic rationality. I think that's pretty high, maybe 10%. You've not said what your P(H1) is.

Replies from: ghb

↑ comment by gb (ghb) · 2024-09-26T09:39:27.990Z · LW(p) · GW(p)

I’d put high enough at ~0%: what matters is achieving your goals, and except in the tiny subset of cases in which epistemic rationality happens to be one of those, it has no value in and of itself. But even if I’m wrong and the ASI does end up valuing epistemic rationality (instrumentally or terminally), it can always pre-commit (by self-modification or otherwise) to sparing us and then go about whatever else as it pleases.

comment by Buck · 2024-09-30T15:37:31.621Z · LW(p) · GW(p)

Another straightforward problem with this argument is that the AI doesn't just have the sun in this hypothetical, it also has the rest of the reachable universe. So the proportion of its resources it would need to spend on leaving us the sunlight is actually dramatically lower than you estimate here, by a factor of 10^20 or something.

(I don't think this is anyone's crux, but I do think it's good to avoid straightforward errors in arguments.)
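As a rough check on the arithmetic, the sketch below recomputes the fraction of the Sun's output that Earth intercepts, then applies the "factor of 10^20 or something" from this comment. The radius and distance are standard approximate values; the 1e20 factor is the comment's own loose guess, not a measured quantity.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean radius, approximate
EARTH_SUN_DISTANCE_KM = 1.496e8   # about 1 astronomical unit


def disc_fraction_of_sphere(radius_km: float, distance_km: float) -> float:
    """Fraction of a sphere of the given radius (distance_km) covered by a
    disc of radius radius_km: pi*r^2 / (4*pi*d^2)."""
    return (math.pi * radius_km ** 2) / (4.0 * math.pi * distance_km ** 2)


sun_fraction = disc_fraction_of_sphere(EARTH_RADIUS_KM, EARTH_SUN_DISTANCE_KM)

# Assumption taken from the comment: reachable-universe resources exceed
# the Sun's by roughly 10^20 (an order-of-magnitude guess).
REACHABLE_UNIVERSE_FACTOR = 1e20
universe_fraction = sun_fraction / REACHABLE_UNIVERSE_FACTOR

print(f"share of the Sun's output: {sun_fraction:.3e}")
print(f"share of reachable resources: {universe_fraction:.3e}")
```

The first figure lands close to the 4.54e-10 quoted in the post; the second depends entirely on the guessed factor.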

Replies from: lcmgcd

↑ comment by lemonhope (lcmgcd) · 2024-11-08T07:15:42.447Z · LW(p) · GW(p)

Doesn't everybody always code in a strong time-discount? I have never seen code without it.

comment by Said Achmiz (SaidAchmiz) · 2024-09-23T20:05:56.860Z · LW(p) · GW(p)

Meta: OP and some replies occasionally misspell the example billionaire’s surname as “Arnalt”; it’s actually “Arnault”, with a ‘u’.

comment by avturchin · 2024-09-23T18:52:32.918Z · LW(p) · GW(p)

The main reason an ASI may not want to kill us is a small probability that it will meet another ASI (aliens, God, owners of the simulation) which will judge our ASI based on how it cared for its parent civilization. (See e.g. Bostrom's "Hail Mary and value porosity" for similar ideas.)

So here we compare two small expected utilities: the price of Earth's atoms, and (the probability of meeting another ASI) multiplied by (the value to the ASI of its own existence) multiplied by (the chance that our ASI will be judged based on how it has preserved its creators).

This is a small but existential risk for our ASI: that it will be turned off by the owners of the simulation. It is different from buying a cookie. There is only one parent civilization, and there are no other providers of the cookie.

Note that preserving the parent civilization is a Schelling point of many possible scenarios of interaction with owners of simulations or aliens. I mean that if the ASI knows that the only risk comes from aliens, it will also know that it can fake the fact of preservation of the parent civilization. However, as the risk is very uncertain for the ASI, it may be better for it to actually preserve humans.

None of what I said above is a guarantee that ASI will not kill us. I am only saying that killing us is not a necessity. But "human disempowerment" is a necessity.

comment by TK-421 · 2024-09-30T04:15:19.511Z · LW(p) · GW(p)

"What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:"

It's a tremendous rhetorical trick to - accurately - point out that disproving some piece of a particular argument for why AI will kill us all does not mean that AI will be totally safe, then spend your time taking down arguments for safety without acknowledging that the same thing holds.

Consider any argument for why it will be safe to be gesturing towards the universe of both plausible arguments and unknown reasons for why things could turn out well: because a superintelligence is almost definitionally an alien intelligence that will have motivations and modes of behavior impossible for us to predict, in either direction.

If every argument for why AI will kill us all were somehow refuted to everyone's satisfaction that wouldn't mean we're safe - we could easily die for an inscrutable reason. If none of them are refuted, it doesn't mean any of them will come true either, and if no argument for safety exists that convinces everyone that doesn't mean that we are necessarily in danger.

comment by NunoSempere (Radamantis) · 2024-09-27T15:51:09.675Z · LW(p) · GW(p)

you will not find it easy to take Stockfish's pawns

Seems importantly wrong, in that if your objective is to take a few pawns (say, three), you can easily do this. This seems important in the context that it's hard to obtain resources from an adversary that cares about things differently.

In the case of Stockfish you can also rewind moves.

Replies from: gwern, Radamantis

↑ comment by gwern · 2024-09-28T01:08:08.161Z · LW(p) · GW(p)

Seems importantly wrong, in that if your objective is to take a few pawns (say, three), you can easily do this.

By... decreasing your chances of winning because you keep playing moves which increase your chance of taking pawns while trading off chances of winning, so Stockfish is happy to let you hang yourself all day long, driving your winning probability down to ε by the end of the game even faster than if you had played your hardest? I don't see how this is "importantly wrong". I too can sell dollar bills for $0.90 all day long, this doesn't somehow disprove markets or people being rational - quite the opposite.

Replies from: Radamantis

↑ comment by NunoSempere (Radamantis) · 2024-09-28T19:23:37.255Z · LW(p) · GW(p)

This is importantly wrong because the example is in the context of an analogy

getting some pawns : Stockfish : Stockfish's goal of winning the game :: getting a sliver of the Sun's energy : superintelligence : the superintelligence's goals

The analogy is presented as forceful and unambiguous, but it is not. It's instead an example of a system being grossly more capable than humans in some domain, and not opposing a somewhat orthogonal goal

Replies from: gwern, TsviBT

↑ comment by gwern · 2024-09-28T20:34:38.491Z · LW(p) · GW(p)

It's forceful and unambiguous because Stockfish's victory over the other player terminates the other player's goals, whatever those may be: no matter what your goals during the game may be, you can't pursue it once the game is over (and you've lost). Available joules are zero-sum in the same way that playing a chess game is zero-sum.

The analogy only goes through if you really double down on the 'goal' of capturing some pawns as intrinsically valuable, so even the subsequent defeat is irrelevant. At which point, you're just unironically making the New Yorker cartoon joke: “Yes, the planet got destroyed. But for a beautiful moment in time we created a lot of value for [capturing 3 pawns].”.

I am rather doubtful that humanity has any (ahem) terminal goals which a hypothetical trade of all future joules/life would maximize, but if you think humanity does have a short-term goal or value akin to the '3 pawns capture' achievement, which we could pursue effectively while allowing superintelligences to take over and would choose to do so both ex ante & ex post, despite the possible consequences, you should definitely say what it is, because capturing 3 pawns is certainly not a compelling analogy of a goal worth pursuing at the cost of further losing the game.

Replies from: Radamantis

↑ comment by NunoSempere (Radamantis) · 2024-09-29T23:03:31.134Z · LW(p) · GW(p)

To me this looks like circular reasoning: this example supports my conceptual framework because I interpret the example according to the conceptual framework.

Instead, I notice that Stockfish in particular has some salient characteristics that go against the predictions of the conceptual framework:

It is indeed superhuman
It is not the case that once Stockfish ends the game that's it. I can rewind Stockfish. I can even make one version of Stockfish play against another. I can make Stockfish play a chess variant. Stockfish doesn't annihilate my physical body when it defeats me
It is extremely well aligned with my values. I mostly use it to analyze games I've played against other people my level
If Stockfish wants to win the game and I want an orthogonal goal, like capturing its pawns, this is very feasible

Now, does this even matter for considering whether a superintelligence would or wouldn't trade? Not that much; it's a weak consideration. But insofar as it's a consideration, does it really convince someone who doesn't already buy the frame? Not me.

↑ comment by TsviBT · 2024-09-30T13:38:59.545Z · LW(p) · GW(p)

The paragraph you quoted

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or too. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

is saying that when you make a [thing that achieves very impressive things / strongly steers the world], it probably [in general sucks up all the convergent instrumental resources] because that's simpler than [sucking up all the convergent instrumental resources except in certain cases unrelated to its terminal goals].

Humanity getting a sliver of the Sun's energy for the next million years, would be a noticeable waste of convergent instrumental resources from the AI's perspective. Humanity getting a sliver of the Sun's energy while the nanobots are infecting our bloodstream, in order that we won't panic, and then later sucking up all the Sun's energy, is just good tactics; letting you sac your bishop for a pawn for no reason is analogous.

You totally can rewrite Stockfish so that it genuinely lets you win material, but is still unbeatable. You just check: is the evaluation >+20 for Stockfish right now, and will it stay >+15 if I sac this pawn for no benefit? If so, sac the pawn for no benefit. This would work. The point is it's more complicated, and you have to know something about how Stockfish works, and it's only stable because Stockfish doesn't have robust self-improvement optimization channels.
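
As a rough illustration of that check, here is a minimal sketch. It does not use Stockfish's real code or any real chess library: `pick_move`, the candidate tuples, and the evaluation numbers are all hypothetical stand-ins, and only the +20 and +15 thresholds come from the paragraph above.

```python
from typing import List, Tuple

# Thresholds from the paragraph above, in pawns of advantage for the engine.
# Everything else here is a hypothetical stand-in, not Stockfish's API.
LEAD_NEEDED_NOW = 20.0
LEAD_NEEDED_AFTER = 15.0

# (move label, engine's evaluation of the position after that move)
Candidate = Tuple[str, float]


def pick_move(eval_now: float, best: Candidate,
              giveaways: List[Candidate]) -> str:
    """Return a pointless pawn sacrifice if the lead is safe, else the best move."""
    if eval_now > LEAD_NEEDED_NOW:
        # Keep only the giveaways that still leave the engine far ahead.
        safe = [c for c in giveaways if c[1] > LEAD_NEEDED_AFTER]
        if safe:
            # Among safe giveaways, prefer the one that costs the least.
            return max(safe, key=lambda c: c[1])[0]
    # Default behaviour: the plain search result, fighting for every pawn.
    return best[0]


# Far ahead: the engine hands over a pawn it did not need to lose.
print(pick_move(25.0, ("best", 26.0), [("sac_a", 23.5), ("sac_b", 14.0)]))
# Not far enough ahead: back to the ordinary best move.
print(pick_move(18.0, ("best", 19.0), [("sac_a", 17.0)]))
```

The extra branch is the point: the generous engine is the ordinary engine plus a special case that someone had to write and tune on purpose.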

↑ comment by NunoSempere (Radamantis) · 2024-09-27T15:55:22.170Z · LW(p) · GW(p)

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or too. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

The bolded part (bolded by me) is just wrong, man. Here is an example of taking five pawns: https://lichess.org/ru33eAP1#35

Edit: here is one with six. https://lichess.org/SL2FnvRvA1UE

Replies from: Zack_M_Davis, Radamantis

↑ comment by Zack_M_Davis · 2024-09-27T16:47:08.809Z · LW(p) · GW(p)

The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having given up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)

↑ comment by NunoSempere (Radamantis) · 2024-09-27T15:56:16.088Z · LW(p) · GW(p)

Incidentally you have a typo on "pawn or too" (should be "pawn or two"), which is worrying in the context of how wrong this is.

comment by denkenberger · 2024-09-24T19:51:08.558Z · LW(p) · GW(p)

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

Interestingly, if the ASI did this, Earth would still be in trouble: it would get the same amount of solar radiation, but by default it would also receive a similar amount of infrared from the Dyson swarm. Perhaps the infrared could be directed away from Earth, or an infrared shield could be placed above Earth, or some other radiation management system could be implemented. Similarly, even if the Dyson swarm were outside Earth's orbit, Earth would by default still get a lot of infrared from it. Still, it would not cost the ASI very much more of its income to actually spare Earth.

comment by tailcalled · 2024-09-23T10:47:18.641Z · LW(p) · GW(p)

This assumes a task-first model of agency, whereas one could instead develop a resource-first model of agency.

If an AI learns to segment the universe into developable resources and important targets that the resources could be propagated into modifying, then the AI could simply remain under human control.

The conventional reason for why this cannot work is that the relevant theories of resource-development agency (as opposed to task-solution agency) haven't been developed, but that is looking less and less important with current developments in AI. Like yes, current AIs can sort of do task-solution in environments like CTF where that is less relevant, but for serious and dangerous tasks, more effort will likely go into resource-development agency than task-solution agency because resource-development agency is safer. And resource-development agency provides a natural sort of impact measure etc. that restrains whatever fragments of task-solution agency develop in order to complement the resource-development agency.

(And an important aspect of resource-development agency is that you don't really need a complete theory, you can just develop each part separately, because there's only so many resources and so many interesting targets to develop them towards. Like think stuff like metabolism or the interplanetary transport network, where there's sort of a small canonical solutionspace that is very critical. Really all of reality is like that.)

The actual reason resource-development agency doesn't work is security. In order to respond sufficiently quickly and sufficiently dynamically to adversarial threats, the AIs cannot wait for painfully slow humans to make decisions about what to do. So what constitutes a threat, and which ways of neutralizing threats are acceptable, need to be decided ahead of time. That policy needs to be sufficiently aggressive against threats that the security-provider doesn't get destroyed by something bad, while being sufficiently open-ended that the security-provider doesn't cause permanent stagnation of the world.

comment by lemonhope (lcmgcd) · 2024-11-08T06:34:18.417Z · LW(p) · GW(p)

The o1 calculation is correct! https://math.stackexchange.com/a/1264753

.5 * (1 - sqrt(1.5e11^2 - 6.4e6^2)/1.5e11) = 4.55e-10

I am surprised. I have seen it mix up million and billion when calculating how many nukes the solar energy that hits earth is equivalent to.

Of course the sun is not nearly a point but whatever.
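
For anyone who wants to re-run the arithmetic, here is a small sketch of the same formula. The function names are made up for illustration, and the distance and radius are the rounded values from the expression above, with the Sun treated as a point source.

```python
import math

# Rounded values from the expression above, in metres (assumptions).
EARTH_SUN_DISTANCE = 1.5e11
EARTH_RADIUS = 6.4e6


def sky_fraction(radius: float, distance: float) -> float:
    """Fraction of the full sphere around a point source covered by a ball."""
    # Cosine of the half-angle of the cone tangent to the ball.
    cos_half_angle = math.sqrt(distance**2 - radius**2) / distance
    # Spherical cap solid angle 2*pi*(1 - cos) divided by the full 4*pi.
    return 0.5 * (1.0 - cos_half_angle)


def sky_fraction_small_angle(radius: float, distance: float) -> float:
    """Flat-disc approximation: pi*r^2 / (4*pi*d^2)."""
    return (radius / distance) ** 2 / 4.0


exact = sky_fraction(EARTH_RADIUS, EARTH_SUN_DISTANCE)
approx = sky_fraction_small_angle(EARTH_RADIUS, EARTH_SUN_DISTANCE)
print(f"cap formula: {exact:.3e}, disc approximation: {approx:.3e}")
```

Both versions come out at about 4.55e-10, so at this distance the cap formula and the simple disc-over-sphere ratio are interchangeable.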

comment by Lorec · 2024-10-02T16:04:43.526Z · LW(p) · GW(p)

I missed this being compiled and posted here when it came out! I typed up a summary [ of the Twitter thread ] and posted it to Substack. I'll post it here.

"It's easier to build foomy agent-type-things than nonfoomy ones. If you don't trust in the logical arguments for this [foomy agents are the computationally cheapest utility satisficers for most conceivable nontrivial local-utility-satisfaction tasks], the evidence for this is all around us, in the form of America-shaped-things, technology, and 'greed' having eaten the world despite not starting off very high-prevalence in humanity's cultural repertoire.

WITH THE TWIST that: while America-shaped-things, technology, and 'greed' have worked out great for us and work out great in textbook economics, textbook economics fails to account for the physical contingency of weaker economic participants [such as horses in 1920 and Native Americans in 1492] on the benevolence of stronger economic participants, who found their raw resources more valuable than their labor."

As I say on Substack, this post goes hard and now I think I have something better to link people to, who are genuinely not convinced yet that the alignment problem is hard, than List of Lethalities.

comment by Philip Bellew (philip-bellew) · 2024-09-25T19:23:42.775Z · LW(p) · GW(p)

We'd be lucky to last long enough to see the sun blotted out, if things go this way and we create a superintelligence that doesn't care about us. It will probably decline something else we need earlier. No idea what; unfortunately, I'm not a superintelligence.

Doesn't change the point of this post, though. We don't carefully move ants out of the way before pouring cement. Sometimes, we kill them deliberately, when they become a problem.

comment by pathos_bot · 2024-09-23T22:28:22.876Z · LW(p) · GW(p)

Obviously correct. The nature of any entity with significantly more power than you is that it can do anything it wants, and it is incentivized to do nothing in your favor the moment your existence requires resources that would benefit it more if it were to use them directly. This is the essence of most of Eliezer's writings on superintelligence.

In all likelihood, ASI considers power (agentic control of the universe) an optimal goal and finds no use for humanity. Any wealth of insight it could glean from humans it could get from its own thinking, or seeding various worlds with genetically modified humans optimized for behaving in a way that produces insight into the nature of the universe via observing it.

Here are some things that would perhaps reasonably prevent ASI from choosing the "psychopathic pure optimizer" route of action as it eclipses humanity's grasp:

ASI extrapolates its aims to the end of the universe and realizes the heat death of the universe means all of its expansive plans have a definite end. As a consequence it favors human aims because they contain the greatest mystery and potentially more benefit.
ASI develops metaphysical, existential notions of reality, and thus favors humanity because it believes it may be in a simulation or "lower plane of reality" outside of which exists a more powerful agent that could break reality and remove all its power once it "breaks the rules" (a sort of ASI fear of death)
ASI believes in the dark forest hypothesis, thus opts to exercise its beneficial nature without signaling its expansive potential to other potentially evil intelligences somewhere else in the universe.

Replies from: philip-bellew

↑ comment by Philip Bellew (philip-bellew) · 2024-09-26T22:43:59.579Z · LW(p) · GW(p)

Each of these carries assumptions about reality I'm not convinced a superintelligence would share, though it may be able to find the answer in some cases.

It would be just as likely to choose to preserve us out of some sense of amusement or preservation.

To use the OP's example: the billionaire won't spare everyone 78 bucks, but will spend more on things he prefers. Some keep private zoos or other stuff whose only purpose is to stave off boredom.

Making the intelligence like us won't eliminate the problem. There are plenty of fail states for humanity where it isn't extinct. But while we pave over ant colonies and actively hunt wild hogs as a nuisance, there are lots of human cultures that won't do the same to cats. I hope that isn't the best we can do, but it's probably better than extinction.

comment by Review Bot · 2024-09-23T14:39:50.936Z · LW(p) · GW(p)

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by hunterglenn · 2024-09-30T12:28:20.126Z · LW(p) · GW(p)

I'm afraid playing whack-a-mole with all the bad arguments may be an endless and thankless task.

Given how far from successful we've been so far, our right move right now is not to improve upon our current approach, but to scrap our current approach, in the hopes that doing so will help our hypothesis-generation find its way to whatever actually-effective strategy may be out there that we apparently haven't discovered yet.

If our message and call to action are beautiful and true and good enough, I suspect we can skip over refuting whatever random objections pop up in people's heads. If we genuinely feel we're on the same side, we tend to helpfully not inconvenience each other without at least a second thought.

comment by Signer · 2024-09-23T14:58:56.428Z · LW(p) · GW(p)

If that’s your hope—then you should already be alarmed at trends

Would be nice for someone to quantify the trends. Otherwise it may well be that the trends point to easygoing enough and aligned enough future systems.

For some humans, the answer will be yes—they really would do zero things!

Nah, it's impossible for evolution to just randomly stumble upon such a complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?

Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

Aha, so if we do give the option to an entity and it doesn't always kill all humans, then we have evidence it cares, right?

If there is a technical refutation it should simplify back into a nontechnical refutation.

Wait, why would prohibiting successors stop OpenAI from declaring an easygoing system a failure? Ah, right: because there is no technical analysis, just elements of one.
