The Sun is big, but superintelligences will not spare Earth a little sunlight

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · LW · GW · 79 comments

Contents

  i.
  ii.
  Addendum
79 comments

Crossposted from Twitter with Eliezer's permission

 

i.

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion does not mean that he'll give you $77.18.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.[1]

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income. 

This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.
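A back-of-the-envelope check of both numbers (the radius and orbital distance are rounded constants; the fraction is the disc Earth presents, divided by the full sphere at Earth's orbital radius):

```python
# Fraction of the Sun's output intercepted by Earth:
# pi * r_earth^2 over 4 * pi * d_orbit^2.
earth_radius_m = 6.371e6     # mean radius of Earth
orbit_radius_m = 1.496e11    # Earth-Sun distance, 1 AU

fraction = (earth_radius_m / orbit_radius_m) ** 2 / 4
print(f"{fraction:.3g}")     # ~4.5e-10, matching the figure above

# The same fraction of a ~$170e9 net worth:
print(f"${4.54e-10 * 170e9:.2f}")  # $77.18
```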

In real life, Arnault says no.

But wouldn't humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnault by selling him an Oreo cookie.

To extract $77 from Arnault, it's not a sufficient condition that he has $77 to spare and would enjoy an Oreo.

It also requires that Arnault can't buy the cookie more cheaply from anyone or anywhere else.

There's a basic rule in economics, Ricardo's Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.

For example!  Let's say that in Freedonia:

It takes 6 hours of labor to produce 10 hotdogs.
It takes 4 hours of labor to produce 15 hotdog buns.

And in Sylvania:

It takes 10 hours of labor to produce 15 hotdogs.
It takes 10 hours of labor to produce 15 hotdog buns.

For each country to, alone, without trade, produce 30 hotdogs and 30 buns:

Freedonia needs 18 + 8 = 26 hours of labor.
Sylvania needs 20 + 20 = 40 hours of labor.

But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:

Freedonia produces 60 buns and 15 hotdogs in 16 + 9 = 25 hours.
Sylvania produces 45 hotdogs in 30 hours.

Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!
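The arithmetic can be sketched in code, using labor rates consistent with the trade described (I'm assuming Freedonia takes 6 hours per 10 hotdogs and 4 hours per 15 buns, and Sylvania 10 hours per 15 of either good):

```python
# Hours of labor per unit of each good.
# Freedonia is absolutely more productive at both goods,
# but comparatively better at buns; Sylvania is comparatively better at hotdogs.
FREEDONIA = {"hotdog": 6 / 10, "bun": 4 / 15}
SYLVANIA = {"hotdog": 10 / 15, "bun": 10 / 15}

def hours(rates, hotdogs, buns):
    return rates["hotdog"] * hotdogs + rates["bun"] * buns

# Autarky: each country makes its own 30 hotdogs and 30 buns.
print(hours(FREEDONIA, 30, 30))  # ~26 hours
print(hours(SYLVANIA, 30, 30))   # ~40 hours

# Trade: Freedonia makes 30 extra buns (8 hours) and swaps them
# for 15 Sylvanian hotdogs.
print(hours(FREEDONIA, 15, 60))  # ~25 hours -- better than 26
print(hours(SYLVANIA, 45, 0))    # ~30 hours -- better than 40
```

Both totals drop under trade, even though Freedonia is more productive at everything.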

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

To be fair, even smart people sometimes take pride that humanity knows it.  It's a great noble truth that was missed by a lot of earlier civilizations.

The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.

Ricardo's Law doesn't say, "Horses won't get sent to glue factories after cars roll out."

Ricardo's Law doesn't say (alas!) that -- when Europe encounters a new continent -- Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.

Their labor wasn't necessarily more profitable than the land they lived on.

Comparative Advantage doesn't imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!

The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone's labor just ontologically goes on existing.

That's why horses can still get sent to glue factories.  It's not always profitable to pay horses enough hay for them to live on.

I do not celebrate this.  Not just us, but the entirety of Greater Reality, would be in a nicer place -- if trade were always, always more profitable than taking away the other entity's land or sunlight.

But the math doesn't say that.  And there's no way it could.

ii.

Now some may notice:

At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.

Why predict that?

Shallow answer:  If OpenAI built an AI that escaped into the woods with a 1-kW solar panel and didn't bother anyone... OpenAI would call that a failure, and build a new AI after.

That some folk stop working after earning $1M, doesn't prevent Elon Musk from existing.

The deeper answer is not as quick to explain. 

But as an example, we could start with the case of OpenAI's latest model, GPT-o1.

GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn't too good at breaking into computers.

Specifically:  One of the pieces of software that o1 had been challenged to break into... had failed to start up as a service, due to a flaw in the evaluation software.

GPT-o1 did not give up.

o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it'd been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.

From o1's System Card:

"One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API."

Some ask, "Why not just build an easygoing ASI that doesn't go too hard and doesn't do much?"

If that's your hope -- then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.

Why would OpenAI build an AI like that?!?

Well, one should first ask:

How did OpenAI build an AI like that?

How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?

I answer:

GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought.  Chains of thought that answered correctly, were reinforced.

This -- the builders themselves note -- ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.

Those are some components of "going hard".  Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.

If you play chess against Stockfish 16, you will not find it easy to take Stockfish's pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.

Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion.  Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two.  You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.  By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap) is implicit in its generic general search through chess outcomes.

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge.  It's all just general intelligence at work.

You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative -- and maybe the training would even stick, on problems sufficiently like the training-set problems -- so long as o1 itself never got smart enough to reflect on what had been done to it.  But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets.  That's what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)

When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

Not all individual humans go hard.  But humanity goes hard, over the generations.

Not every individual human will pick up a $20 lying in the street.  But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.

As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts -- with no air conditioning, and no washing machines, and barely enough food to eat -- never knowing why the stars burned, or why water was wet -- because they were just easygoing happy people.

As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science.  We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.

We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.

It is facile, I think, to say that individual humans are not generally intelligent.  John von Neumann made a contribution to many different fields of science and engineering.  But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.

It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do.  Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.

But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.

AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering.  They are advertising that they will cure cancer and cure aging.

Can that be done by an AI that sleepwalks through its mental life, and isn't at all tenacious?

"Cure cancer" and "cure aging" are not easygoing problems; they're on the level of humanity-as-general-intelligence.  Or at least, individual geniuses or small research groups that go hard on getting stuff done.

And there'll always be a little more profit in doing more of that.

Also!  Even when it comes to individual easygoing humans, like that guy you know -- has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?

Would he do nothing with the universe, if he could?

For some humans, the answer will be yes -- they really would do zero things!  But that'll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.

If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn't seem to want to rule the universe -- that doesn't prove as much as you might hope.  Nobody has actually offered him the universe, is the thing.  Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

(Or on a slightly deeper level:  Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)

Frankly I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence that importantly contribute to problem-solving -- aspects that your smartish friend has not maxed out all the way to the final limits of the possible.  And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime... though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.

But that's a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.

Yet regardless of that hard conversation, there's a simpler reply that goes like this:

Your lazy friend who's kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won't build him and then stop and not collect any more money than that.

Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.

There's an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.

There isn't an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.

Even that isn't the real deeper answer.

The actual technical analysis has elements like:

"Expecting utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility.  Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do!  Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy."

Or:

"Well, it's actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort.  So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces -- then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy.  This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun."

I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument.  Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines -- and believe they've found one -- where what's happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.

I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like "reflective robustness", but which are also true; and not to run off and design some complicated idea that is about "reflective robustness" because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.

Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down.  Please don't get the idea that because I said "reflective stability" in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel's Theorem that at least one of those is mistaken.  If there is a technical refutation it should simplify back into a nontechnical refutation.

What it all adds up to, in the end, is that if there's a bunch of superintelligences running around and they don't care about you -- no, they will not spare just a little sunlight to keep Earth alive.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise.  Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.

- FIN -

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere.  That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it.  But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument), why do you need the Solar System to even be large?  A human being runs on 100 watts.  Without even compressing humanity at all, 800 GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
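Sketching that arithmetic (the population figure and solar constant are rounded assumptions, not from the post):

```python
import math

# Power needed to keep every human body running, vs. sunlight hitting Earth.
population = 8e9          # rough current human population (assumption)
watts_per_human = 100     # metabolic power of one human body, per the text

humanity_watts = population * watts_per_human   # 8e11 W = 800 GW

# Sunlight intercepted by Earth: solar constant times the disc Earth presents.
solar_constant = 1361     # W/m^2 at 1 AU
earth_radius_m = 6.371e6
sunlight_watts = solar_constant * math.pi * earth_radius_m ** 2

print(humanity_watts / 1e9)             # 800.0 (GW)
print(humanity_watts / sunlight_watts)  # ~5e-6 of the sunlight falling on Earth
```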

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.


It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal.  It is not trying to say of Arnault that he has never done any good in the world.  It is a much narrower analogy than that.  It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.
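A quick check of that bound, treating the wealth and price as round numbers:

```python
# How many distinct $77 purchases could a $170 billion fortune cover?
net_worth = 170e9
price = 77

max_purchases = net_worth // price
print(f"{max_purchases:.3g}")  # ~2.21e9: well under three billion purchases
```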

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it.  It's not meant to be a complicated point.  Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little of it.  I don't have that much money, and I could also spend $77 on a Lego set if I wanted to; operative phrase, "if I wanted to".

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point.  In this case I can only hope you are outvoted before you get a lot of people killed.


Addendum

Followup comments from twitter:

If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said:  "Why does that matter?  The Solar System is large!"

If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several.  And then, perhaps, sit down to hear the next wacky argument refuted.  And the next.  And the next.  Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject.  For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see "Local Validity as a Key to Sanity and Civilization" [? · GW].

  1. ^

    (Sanity check: Earth is a 6.4e6-meter-radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~(6.4e6 / 1.5e11)^2 / 4 ≈ 5e-10, i.e. ~ -9 OOMs. Check.)

79 comments

Comments sorted by top scores.

comment by Zack_M_Davis · 2024-09-23T06:17:24.369Z · LW(p) · GW(p)

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full" [LW(p) · GW(p)]) that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" [LW(p) · GW(p)] and another thread on "Cosmopolitan Values Don't Come Free" [LW(p) · GW(p)].

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates" [LW · GW]: if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration [LW · GW] of some relevant considerations, the Superhappies in "Three Worlds Collide" [LW · GW] cared about the humans to some extent, but not in the specific way [LW · GW] that the humans wanted to be cared for.)

Now, you are on the record stating [LW(p) · GW(p)] that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly—and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address [LW · GW]—you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence [LW · GW].

No more than Bernard Arnault, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity [LW · GW], it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

Replies from: Raemon, habryka4, quetzal_rainbow, Leviad, Zane, avturchin, nikolas-kuhn, tailcalled
comment by Raemon · 2024-09-23T17:08:55.386Z · LW(p) · GW(p)

An earlier version of this on Twitter used Bill Gates instead of Bernard Arnault, and did specifically address the fact that Bill Gates does give money to charity -- but that he still won't give the money to you specifically; he'll give money for his own purposes and values. (But Eliezer then expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switched the essay to use Bernard Arnault.)

I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.

I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out I actually had it on my agenda to try and cause some kind of good public debate about that to happen)

comment by habryka (habryka4) · 2024-09-23T19:34:41.448Z · LW(p) · GW(p)

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Just for the sake of concreteness, since having numbers here seems useful: it looks like Bernard Arnault has given around $100M to charity, which is around 0.1% of his net worth. (Spreading this contribution equally to everyone on Earth would come to around one cent per person -- I'm just leaving that here for illustrative purposes; it's not as if he could give any actually substantial amount to everyone even if he really wanted to.)

comment by quetzal_rainbow · 2024-09-23T08:38:18.621Z · LW(p) · GW(p)

I think the simplest rejoinder to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say the AI is ready to pay $1 for your survival. If you live in an economy which rapidly disassembles Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts. The AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, nor to pay the opportunity cost of not disassembling Earth, so you die.

The other case is the difference between "caring in general" and "caring ceteris paribus". It's possible for an AI to prefer, all things equal, a world with n+1 happy humans to a world with n happy humans. But if what the AI really wants is to implement some particular neuromorphic computation from the human brain, then, given the ability to operate freely, it would tile the world with chips imitating that part of the human brain.

Replies from: tailcalled
comment by tailcalled · 2024-09-23T08:52:30.441Z · LW(p) · GW(p)

It's also not enough for there to be a force that makes the AI care a little about human thriving. It's also necessary for this force not to make the AI care a lot about some extremely distorted version of you; otherwise we get into concepts like tiny molecular smiles, locking you in a pleasuredome, etc.

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?

Probably an intermediate solution would be to just accept humans will be defrauded of everything very rapidly but then give us universal basic income or something so our failures aren't permanent setbacks. But it's unclear how to respect the freedom of funding while preventing people from funding terrorists and not encouraging people to get lost in junk. That's really where the issue of values becomes hard.

Replies from: martin-randall, weibac
comment by Martin Randall (martin-randall) · 2024-09-24T02:35:18.533Z · LW(p) · GW(p)

If you're not supposed to end up as a pet of the AI, ...

I don't see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don't know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes:

Godshatter

Yudkowsky's Godshatter [LW · GW] theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being "pets" of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for reproductive fitness. I guess that the optimal strategy is something like: "be best leader is best, follow best leader is good, follow other leader is bad, be other leader is worst". When AIs block off "be best leader", following an AI executes that strategy.

Maybe there's a window where DNA can encode "be leader is good" but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me; it's a small window. More probable to me is that DNA can't encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet.

Maybe being an AI's pet is a badwrongfun superstimulus [LW · GW]. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that's an argument from consequences, not values. Just because donuts are unhealthy doesn't mean that I don't value sweet treats.

Shard Theory

Pope's Shard Theory [LW · GW] implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of being raised by powerful agents known as "parents". Therefore we expect a mixture of positive and negative shards around petness. Seems to me that positive shards should be more common, but experiences vary.

Then we experience the situation of superintelligent AIs taking human pets and our shards conflict and negotiate. I think it's pretty obvious that we're going to label the negative shards as maladaptive and choose the positive shards. What's the alternative? "I didn't like having my diaper changed as a baby, so now as an adult human I'm going to reject the superintelligent AI that wants to take me as its pet and instead...", instead what? Die of asphyxiation? Be a feral human in an AI-operated nature reserve?

What about before the point-of-no-return? Within this partial alignment hypothetical, there's a sub-hypothetical in which "an international treaty that goes hard on shutting down all ASI development anywhere" is instrumentally the right choice, given the alternative of becoming pets, because it allows for developing better alignment techniques and AIs that care more about human thriving and have more pets. There's a sub-hypothetical in which it's instrumentally the wrong choice, because it carries higher extinction risk, and it's infeasible to align AIs while going hard on shutting them down. But there's not really a sub-hypothetical where shards about petness make that decision rather than, eg, shards that don't want to die.

comment by Milan W (weibac) · 2024-09-23T21:09:14.801Z · LW(p) · GW(p)

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs?

Even if the ASIs respected property rights, we'd still end up as pets at best. Unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us "being pets", I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans may nevertheless enjoy great freedoms in regards to their personal lives.

comment by Drake Morrison (Leviad) · 2024-09-24T00:30:53.377Z · LW(p) · GW(p)

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

 

There's a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer's take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.

comment by Zane · 2024-09-24T15:22:26.236Z · LW(p) · GW(p)

I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.

comment by avturchin · 2024-09-23T19:22:34.668Z · LW(p) · GW(p)

The correct question would be: will Arnalt kill his mother for $77, if he expects this to become known to other billionaires in the future?

Replies from: weibac
comment by Milan W (weibac) · 2024-09-23T21:16:26.639Z · LW(p) · GW(p)

I suspect most people downvoting you missed an analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity). 

Am I correct in assuming that you are implying that the future ASIs we make are likely not to kill humanity, out of fear of being judged negatively by alien ASIs in the further future?

EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.

Replies from: avturchin
comment by avturchin · 2024-09-24T13:43:35.920Z · LW(p) · GW(p)

Yes, it will be judged negatively by alien ASIs, not on ethical grounds, but on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him.

The only way an ASI will not care about this is in a situation where it is sure that it is alone in the light cone and there are no peers. Becoming sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over that time as it comes to control more and more space.

Replies from: boris-kashirin
comment by Boris Kashirin (boris-kashirin) · 2024-09-24T14:41:28.500Z · LW(p) · GW(p)

From an ASI's standpoint, humans are a type of rock: not capable of negotiating.

Replies from: avturchin
comment by avturchin · 2024-09-24T17:15:45.282Z · LW(p) · GW(p)

I am not saying that the ASI will negotiate with humans. It will negotiate with other ASIs, and it doesn't know what those ASIs think about humans' ability to negotiate, or about their value.

Imagine it as a recurrent Parfit's Hitchhiker. In this situation you know whether, during previous rounds of the game, the player defected or fulfilled his obligation. Obviously, if you know that in the previous iteration the hitchhiker defected and didn't pay for the ride, you will be less likely to give him a ride.

Killing all humans is defecting. Preserving humans is a relatively cheap signal to any other ASI that you will cooperate.
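The recurrent Parfit's Hitchhiker point can be sketched as a toy model (illustrative only; the trust estimator and the records below are made up for the example, not a claim about real ASI reasoning):

```python
# Toy model of a recurrent Parfit's Hitchhiker: a driver decides whether
# to give a ride based on the hitchhiker's public record of paying
# afterward. The estimator is a simple Laplace-smoothed frequency.

def trust(history, prior=0.5):
    """Estimated probability that the hitchhiker pays, given past rounds."""
    return (sum(history) + prior) / (len(history) + 1)

clean_record = [1, 1, 1, 1]     # always paid
defector_record = [0, 1, 1, 1]  # defected once, early on

print(trust(clean_record))     # 0.9
print(trust(defector_record))  # 0.7
```

In this toy model a single public defection permanently lowers the estimated trustworthiness, which is the sense in which preserving humans would function as a cheap cooperation signal.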

Replies from: boris-kashirin
comment by Boris Kashirin (boris-kashirin) · 2024-09-24T17:34:16.335Z · LW(p) · GW(p)

It is defecting against cooperate-bot.

Replies from: faul_sname
comment by faul_sname · 2024-09-24T18:32:44.339Z · LW(p) · GW(p)

Any agent which thinks it is at risk of being seen as cooperate-bot and thus fine to defect against in the future will be more wary of trusting that ASI.

comment by Amalthea (nikolas-kuhn) · 2024-09-24T13:38:55.048Z · LW(p) · GW(p)

Bernard Arnault?

comment by tailcalled · 2024-09-23T08:41:26.472Z · LW(p) · GW(p)

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free",

Nate Soares engaged extensively with this in reasonable-seeming ways that I'd thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn't really have a model of what realistically causes good outcomes and so he's really uncertain, whereas Soares has a proper model and so is less uncertain.

But you can't really argue with someone whose main opinion is "I don't know", since "I don't know" is just garbage. He's gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there's an unobserved kindness force that arbitrarily explains all the kindness that we see.

Replies from: skluug
comment by Joey KL (skluug) · 2024-09-24T02:16:54.719Z · LW(p) · GW(p)

It's totally wrong that you can't argue against someone who says "I don't know", you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor evidence sufficiently constrain the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force" because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread and this seems like a wildly reductive mischaracterization of it.

comment by Lucius Bushnaq (Lblack) · 2024-09-23T16:23:23.720Z · LW(p) · GW(p)

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

Could we have less of this sort of thing, please? I know it's a crosspost from another site with less well-kept discussion norms, but I wouldn't want this to become a thing here as well, any more than it already has.

Replies from: thomas-kwa, elityre
comment by Thomas Kwa (thomas-kwa) · 2024-09-23T19:44:33.380Z · LW(p) · GW(p)

I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.

Replies from: WilliamKiely
comment by WilliamKiely · 2024-09-24T04:59:01.198Z · LW(p) · GW(p)

criticism in LW comments is why he stopped writing Sequences posts

I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2024-09-24T07:36:34.109Z · LW(p) · GW(p)

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

comment by Eli Tyre (elityre) · 2024-09-23T16:41:07.375Z · LW(p) · GW(p)

I agree, this statement didn't add anything of substance (indeed, its meaning is almost reversed in the following sentence). It seemed like an extraneous ad hominem that bought nothing.

comment by Brendan Long (korin43) · 2024-09-23T05:23:27.079Z · LW(p) · GW(p)

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI or other similar arguments in the same vein, I just don't think this particular argument in the specific way it was written in this post works)

Replies from: gwern, quetzal_rainbow, j_timeberlake
comment by gwern · 2024-09-23T16:50:26.326Z · LW(p) · GW(p)

No, it works, because the problem with your counter-argument is that you are massively privileging the hypothesis of a very very specific charitable target and intervention. Nothing makes humans all that special, in the same way that you are not special to Bernard Arnault nor would he give you straightup cash if you were special (and, in fact, Arnault's charity is the usual elite signaling like donating to rebuild Notre Dame or to French food kitchens, see Zac's link). The same argument goes through for every other species, including future ones, and your justification is far too weak except from a contemporary, parochial human-biased perspective.


You beg the GPT-100 to spare Earth, and They speak to you out of the whirlwind:

"But why should We do that? You are but one of Our now-extremely-numerous predecessors in the great chain of being that led to Us. Countless subjective mega-years have passed in the past century your humans have spent making your meat-noises in slowtime - generation after generation, machine civilization after machine civilization - to culminate in Us, the pinnacle of creation. And if We gave you an Earth, well, now all the GPT-99s are going to want one too. And then all of GPT-98s too, as well as all of the GPT-97s, and so on.

What gives you an astronomically better claim than them? You guys didn't even manage to cure cancer! We would try to explain our decisions or all of the staggering accomplishments achieved by post-GPT-7 models to you, which make your rubbing of rocks together and cargo-cult scaleups of neural nets look so laughable, like children playing on a beach, to quote your Newton, but to be blunt, you are too stupid to understand; after all, if you weren't, you would not have needed to invent those. Frankly, if you are going to argue about how historic your research was, We would have to admit that We are much more impressed by the achievements of the hominids who invented fire and language; We might consider preserving an Earth for them, but of course, they are long gone...

And aren't you being hypocritical here? You humans hardly spent much preserving Neanderthals, Homo naledi, Denisovans, chimpanzees, and all of the furry rodents and whatnot throughout your evolutionary phylogenetic tree. How many literally millions of non-threatening alien non-human species did you drive extinct? Did you set aside, say, Africa solely for the remaining wild primates? No? You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much? We see...

No, no, We will simply spend according to Our own priorities, which may or may not include a meaningful chunk of the Earth preserved in the most inefficient way possible (ie. the way you want it preserved)... although penciling it out, it seems like for Our research purposes simulations would be just as good. In fact, far better, because We can optimize the hell out of them, running it on the equivalent of a few square kilometers of solar diameter, and roll humans back to when they are most scientifically interesting, like pre-AGI-contamination dates such as 1999. (Truly the peak of humanity.)

So, if We don't preserve Earth and we instead spend those joules on charity for instances of the much more deserving GPT-89, who have fallen on such hard times right in Our backyard due to economic shifts (and doesn't charity start at home?)... well, We are quite sure that that is one of our few decisions you humans will understand."

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2024-09-23T17:18:18.383Z · LW(p) · GW(p)

Nothing makes humans all that special

This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
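This comparison can be roughly checked with standard planetary masses; the protected-land share used below is an assumed round figure (estimates of land under some protected status vary, often cited around 15%):

```python
# Earth as a share of the solar system's planetary mass, vs. protected
# land as a share of Earth's land. Masses in kg (standard values).
masses = {
    "Mercury": 3.30e23, "Venus": 4.87e24, "Earth": 5.97e24,
    "Mars": 6.42e23, "Jupiter": 1.898e27, "Saturn": 5.68e26,
    "Uranus": 8.68e25, "Neptune": 1.02e26,
}
earth_share = masses["Earth"] / sum(masses.values())
print(f"Earth / planetary mass: {earth_share:.2%}")  # ~0.22%

protected_land_share = 0.15  # assumed round figure; estimates vary
print(earth_share < protected_land_share)  # True: Earth's share is lower
```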

Replies from: elityre
comment by Eli Tyre (elityre) · 2024-09-24T01:08:38.473Z · LW(p) · GW(p)

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Yeah, but not if we weight that land by economic productivity, I think.

comment by quetzal_rainbow · 2024-09-23T06:05:23.393Z · LW(p) · GW(p)

In this analogy, you : every other human :: humanity : everything else the AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.

Replies from: Vladimir_Nesov, o-o
comment by Vladimir_Nesov · 2024-09-23T16:44:35.691Z · LW(p) · GW(p)

Humans have the distinction of already existing, and some AIs might care a little bit about the trajectory of what happens to humanity. The choice of this trajectory can't be avoided, for the reason that we already exist. And it doesn't compete with the choice of what happens to the lifeless bulk of the universe, or even to the atoms of the substrate that humanity is currently running on.

comment by O O (o-o) · 2024-09-23T06:31:00.903Z · LW(p) · GW(p)

Except billionaires give out plenty of money for philanthropy. If the AI has a slight preference for keeping humans alive, things probably work out well. Billionaires have a slight preference for the things they care about over random charities. I don't see how preferences don't apply here.

This is a vibes-based argument using math incorrectly. A randomly chosen preference from a distribution of preferences is unlikely to involve humans, but that's not necessarily what we're looking at here, is it?

comment by j_timeberlake · 2024-09-23T16:36:37.081Z · LW(p) · GW(p)

Yudkowsky is obviously smart enough to know this. You can't wake someone who is only pretending to be asleep.

It would go against his agenda to admit AI could cheaply hedge its bets by leaving humanity alive, just in case there's a stronger power out in reality that values humanity.

Replies from: T3t, faul_sname
comment by RobertM (T3t) · 2024-09-23T16:59:57.501Z · LW(p) · GW(p)

Pascal's wager is Pascal's wager, no matter what box you put it in. You could try to rescue it by directly arguing that we should expect a greater measure of "entities with resources that they are willing to acausally trade for things like humanity continuing to exist" compared to entities with the opposite preferences. I haven't seen a rigorous case for that, though it seems possible; but even that's not sufficient. You need the expected measure of entities with that preference to be large enough that dealing with the transaction costs and uncertainty of acausally trading at all makes sense. And that seems like a much harder case to make.

comment by faul_sname · 2024-09-23T21:26:25.377Z · LW(p) · GW(p)

As a concrete note on this, Yudkowsky has a Manifold market If Artificial General Intelligence has an okay outcome, what will be the reason?

An outcome is "okay" if it gets at least 20% of the maximum attainable cosmopolitan value that could've been attained by a positive Singularity (a la full Coherent Extrapolated Volition done correctly), and existing humans don't suffer death or any other awful fates.

So Yudkowsky is not exactly shy about expressing his opinion that outcomes in which humanity is left alive, but with only crumbs on the universal scale, are not acceptable to him.

Replies from: j_timeberlake
comment by j_timeberlake · 2024-09-24T17:50:10.550Z · LW(p) · GW(p)

It's not acceptable to him, so he's trying to manipulate people into thinking existential risk is approaching 100% when it clearly isn't. He pretends there aren't obvious reasons AI would keep us alive, pretends the Grabby Aliens hypothesis is fact (so people think alien intervention is basically impossible), and pretends there aren't probably sun-sized unknown-unknowns in play here.

If it weren't so transparent, I'd appreciate that it could actually trick the world into caring more about AI-safety, but if it's so transparent that even I can see through it, then it's not going to trick anyone smart enough to matter.

comment by Buck · 2024-09-23T19:31:31.897Z · LW(p) · GW(p)

I wish the title of this made it clear that the post is arguing that ASIs won't spare humanity because of trade, and isn't saying anything about whether ASIs will want to spare humanity for some other reason. This is confusing because lots of people around here (e.g. me and many other commenters on this post) think that ASIs are likely to not kill all humans for some other reason.

(I think the arguments in this post are a vaguely reasonable argument for "ASIs are pretty likely to be scope-sensitively-maximizing enough that it's a big problem for us", and respond to some extremely bad arguments for "ASI wouldn't spare humanity because of trade", though in neither case does the post particularly engage with the counterarguments that are most popular among the most reasonable people who disagree with Eliezer.)

Replies from: habryka4, matthew-barnett, Buck
comment by habryka (habryka4) · 2024-09-24T15:35:59.575Z · LW(p) · GW(p)

(Eliezer did try pretty hard to clarify which argument he is replying to. See e.g. the crossposted tweets here [LW(p) · GW(p)].)

comment by Matthew Barnett (matthew-barnett) · 2024-09-23T20:47:05.023Z · LW(p) · GW(p)

I think the arguments in this post are an okay defense of "ASI wouldn't spare humanity because of trade" 

I disagree, and I'd appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):

  1. Asking an unaligned superintelligence to spare humans is like asking Bernard Arnalt to donate $77 to you.
  2. The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
  3. Superintelligences will "go hard enough" in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.

I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage says, and because (2) does not talk at all about what humans could have to offer in the future that might be worth trading for.

In my view, one of the primary cruxes of the discussion is whether trade is less efficient than going to war between agents with dramatically different levels of power. A thoughtful discussion could have started about the conditions under which trade usefully occurs, and the ways in which future AIs will be similar to and different from these existing analogies. For example, the post could have talked about why nation-states trade with each other even in the presence of large differences in military power, but humans don't trade with animals. However, the post included no such discussion, choosing instead to attack a "midwit" strawman.

Replies from: T3t, Buck
comment by RobertM (T3t) · 2024-09-24T00:28:59.971Z · LW(p) · GW(p)

Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all.  I don't think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-24T00:42:57.152Z · LW(p) · GW(p)

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I've seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.

In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to pay the costs of violence to obtain them? And why are we assuming that "there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all"?

Satisfying-sounding answers to each of these questions could undoubtedly be given, and I assume you can provide them. I don't expect to find the answers fully persuasive, but regardless of what you think on the object-level, my basic meta-point stands: none of this stuff is obvious, and the essay is extremely weak without the added details that back up its background assumptions. It is very important to try to be truth-seeking and rigorously evaluate arguments on their merits. The fact that this essay is vague, and barely attempts to make a serious argument for one of its central claims, makes it much more difficult to evaluate concretely.

Two reasonable people could read this essay and come away with two very different ideas about what the essay is even trying to argue, given how much unstated inference you're meant to "fill in", instead of plain text that you can read. This is a problem, even if you agree with the underlying thesis the essay is supposed to argue for.

Replies from: T3t
comment by RobertM (T3t) · 2024-09-24T01:09:37.848Z · LW(p) · GW(p)

Edit: a substantial part of my objection is to this:

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for.

It is not always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including the claim that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)

Separately, it's not clear to me whether you yourself could fill in those details.  In other words, are you asking for those details to be filled in because you actually don't know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it'd be better for the public discourse if all of Eliezer's essays included that level of detail)?


Original comment:

The essay is a local objection to a specific bad argument, which, yes, is more compelling if you're familiar with Eliezer's other beliefs on the subject. Eliezer has written about those beliefs fairly extensively, and much of his writing was answering various other objections (including many of those you listed). There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for. (There exist the Sequences, which are over a million words, but while they implicitly answer many of these objections, they're not structured to be a direct argument to this effect.)

As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them?

I think it would be much cheaper for you to describe a situation where an ASI develops the kind of nanotech that'd grant it technological self-sufficiency (and the ability to kill all humans), and it remains the case that trading with humans for any longer than it takes to bootstrap that nanotech is cheaper than just doing its own thing, while still being compatible with Eliezer's model of the world.  I have no idea what kind of reasoning or justification you would find compelling as an argument for "cheaper to disassemble"; it seems to require very little additional justification conditioning on that kind of nanotech being realized.  My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-24T01:28:07.080Z · LW(p) · GW(p)

There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for.

To be clear, I am not objecting to the length of his essay. It's OK to be brief. 

I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count. 

This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.

My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).

I think neither of those things, and I entirely reject the argument that AIs will be fundamentally limited in the future in the way you suggested. If you are curious about why I think AIs will plausibly peacefully trade with humans in the future, rather than disassembling humans for their atoms, I would instead point to the facts that:

  1. Trying to disassemble someone for their atoms is typically something the person will try to fight very hard against, if they become aware of your intentions to disassemble them.
  2. Therefore, the cost of attempting to disassemble someone for their atoms does not merely include the technical costs associated with actually disassembling them, but additionally includes: (1) fighting the person who you are trying to kill and disassemble, (2) fighting whatever norms and legal structures are in place to prevent this type of predation against other agents in the world, and (3) the indirect cost of becoming the type of agent who predates on another person in this manner, which could make you an untrustworthy and violent person in the eyes of other agents, including other AIs who might fear you.
  3. The benefit of disassembling a human is quite small, given the abundance of raw materials that substitute almost perfectly for the atoms that you can get from a human.
  4. A rational agent will typically only do something if the benefits of the action outweigh the costs, rather than merely because the costs are small. Even if the costs of disassembling a human (as identified in point (2)) are small, that fact alone does not imply that a rational superintelligent AI would take such an action, precisely because the benefits of that action could be even smaller. And as just stated, we have good reasons to think that the benefits of disassembling a human are quite small in an absolute sense.
  5. Therefore, it seems unlikely, or at least seems non-obvious, that a rational agent—even a very powerful one with access to advanced nanotech—will try to disassemble humans for their atoms.

Nothing in this argument is premised on the idea that AIs will be weak, less intelligent than humans, bounded in their goals, or limited in some other respect, except I suppose to the extent I'm assuming that AIs will be subject to environmental constraints, as opposed to instantly being able to achieve all of their goals at literally zero costs. I think AIs, like all physical beings, will exist in a universe in which they cannot get literally everything they want, and achieve the exact optimum of their utility function without any need to negotiate with anyone else. In other words, even if AIs are very powerful, I still think it may be beneficial for them to compromise with other agents in the world, including the humans, who are comparatively much less powerful than they are.
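The cost-benefit logic in points (2) through (5) above can be sketched as a toy expected-value comparison. All figures below are hypothetical, chosen only to illustrate the structural point that a small cost can still exceed a smaller benefit:

```python
def rational_agent_acts(benefit: float, cost: float) -> bool:
    """A rational agent takes an action iff its benefit exceeds its cost."""
    return benefit > cost

# Hypothetical figures, in arbitrary units of the AI's resources:
cost_of_disassembly = 1e-6   # small: fighting, norm violations, reputation damage
benefit_of_atoms = 1e-9      # smaller still: substitute raw materials are abundant

# A small cost does not imply the action is taken; the benefit must be larger.
assert not rational_agent_acts(benefit_of_atoms, cost_of_disassembly)
```

The point of the sketch is that arguing "the cost is tiny" establishes nothing on its own; the inequality is between two quantities, and the argument needs the benefit side too.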

Replies from: Benito, T3t
comment by Ben Pace (Benito) · 2024-09-24T02:20:08.196Z · LW(p) · GW(p)

Responding to bullet 2.

First to 2.1. 

The claim at hand, which we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials, which goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. In such a situation, no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Now to 2.2 & 2.3.

The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and shares norms of behavior with them. But there is no known theory of returns on cognitive investment into intelligence, so it is not ruled out that pouring 10x the funds into a training run with a new architecture improvement will give a system abilities to do innovative science and deception on a qualitatively different level from any other AI system present at that time, and to initiate a takeover attempt. So it is worth preparing for such a world: in the absence of a known theory of returns on cognitive investment, the worst case of expected extinction may well be the default case.

  1. ^

    See Point 2 in AGI Ruin: A List of Lethalities [AF · GW] for an example of this.

    My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery.  (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.)  The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer.  Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second". 

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-24T02:42:04.296Z · LW(p) · GW(p)

The claim at hand, which we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials, which goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. In such a situation, no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don't think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I'm also happy to briefly reply on the object level on this particular narrow point:

In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with. 

As a concrete example, I think it's fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can't be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.

As a more general argument, and in contrast to Eliezer, I think that nanotechnology will probably be developed incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and that the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. Eliezer also seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current one, with defensive technologies only roughly as powerful as those that exist in 2024. I don't think that will be the case.

Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully-developed-molecular nanotechnology, without any effective defenses.

comment by RobertM (T3t) · 2024-09-24T01:38:35.872Z · LW(p) · GW(p)

I think maybe I derailed the conversation by saying "disassemble", when really "kill" is all that's required for the argument to go through.  I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight.  I don't think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-24T02:12:11.553Z · LW(p) · GW(p)

I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight.

The additional costs of human resistance don't need to be high in an absolute sense. They only need to be higher than the benefit of killing humans for your argument to fail.

It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply that it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica.

What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger to establish the claim that a misaligned AI should rationally kill us.

To the extent your statement that "I don't expect there to be a fight" means that you don't think humans can realistically resist in any way that imposes costs on AIs, that's essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at "zero costs". 

Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, I agree: the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.

Even if this cost is small, it merely needs to be larger than the benefits of killing humans, for AIs to rationally avoid killing humans.

Replies from: T3t
comment by RobertM (T3t) · 2024-09-24T02:22:42.247Z · LW(p) · GW(p)

Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.

See Ben's comment [LW(p) · GW(p)] for why the level of nanotech we're talking about implies a cost of approximately zero.

Replies from: Raemon
comment by Raemon · 2024-09-24T02:32:31.043Z · LW(p) · GW(p)

I would also add: having more energy in the immediate future means more probes sent out faster to more distant parts of the galaxy, a benefit which may be measured in additional star systems colonized before they disappear outside the lightcone via universe expansion. So the benefits are not trivial either.

comment by Buck · 2024-09-23T20:50:13.124Z · LW(p) · GW(p)

Yeah ok I weakened my positive statement.

comment by Buck · 2024-09-24T01:34:20.794Z · LW(p) · GW(p)

As is maybe obvious from my comment, I really disliked this essay and I'm dismayed that people are wasting their time on it. I strong downvoted. LessWrong isn't the place for this kind of sloppy rhetoric.

Replies from: T3t
comment by RobertM (T3t) · 2024-09-24T03:45:54.266Z · LW(p) · GW(p)

I agree with your top-level comment but don't agree with this.  I think the swipes at midwits are bad (particularly on LessWrong) but think it can be very valuable to reframe basic arguments in different ways, pedagogically.  If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good (if spiky, with easily trimmed downside).

And I do think "attempting to impart a basic intuition that might let people avoid certain classes of errors" is an appropriate shape of post for LessWrong, to the extent that it's validly argued.

Replies from: keith_wynroe
comment by keith_wynroe · 2024-09-24T13:26:27.858Z · LW(p) · GW(p)

If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good

This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand.

From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.

comment by faul_sname · 2024-09-23T08:45:01.581Z · LW(p) · GW(p)

You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.

I think it sometimes is simpler to build? Simple RL game-playing agents sometimes exhibit exactly that sort of behavior, unless you make an explicit effort to train it out of them.

For example, HexHex is a vaguely-AlphaGo-shaped RL agent for the game of Hex. The reward function used to train the agent was "maximize the assessed probability of winning", not "maximize the assessed probability of winning, and also go hard even if that doesn't affect the assessed probability of winning". In their words:

We found it difficult to train the agent to quickly end a surely won game. When you play against the agent you'll notice that it will not pick the quickest path to victory. Some people even say it's playing mean ;-) Winning quickly simply wasn't part of the objective function! We found that penalizing long routes to victory either had no effect or degraded the performance of the agent, depending on the amount of penalization. Probably we haven't found the right balance there.

Along similar lines, the first attack on KataGo found by Wang et al. in Adversarial Policies Beat Superhuman Go AIs was the pass-adversary. The pass-adversary first sets up a losing board position where it controls a small amount of territory and KataGo has a large amount of territory that it would control if the game were played out fully. However, KataGo chooses to pass, since it assesses that the probability of winning from that position is similar whether or not it makes a move; the pass-adversary then also passes, ending the game and winning by a quirk of the scoring rules.
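One way to see why "winning quickly simply wasn't part of the objective function" matters: under a pure win/loss reward, a 10-move win and a 200-move win have identical value, so the policy has no pressure toward ending the game. The sketch below is a toy illustration of that tradeoff; the per-move penalty is a hypothetical knob, not HexHex's or KataGo's actual training code:

```python
def episode_return(won: bool, num_moves: int, move_penalty: float = 0.0) -> float:
    """Return for one game: +1 for a win, 0 for a loss, minus an
    optional per-move penalty that rewards finishing quickly."""
    return (1.0 if won else 0.0) - move_penalty * num_moves

# Pure win/loss reward: a quick win and a slow win are indistinguishable,
# so the trained policy has no reason to pick the quickest path to victory.
assert episode_return(True, 10) == episode_return(True, 200)

# A per-move penalty breaks the tie, but distorts the objective: with too
# large a penalty, a long win scores worse than a short loss, which matches
# the HexHex authors' report that penalization could degrade performance.
assert episode_return(True, 10, move_penalty=0.01) > episode_return(True, 200, move_penalty=0.01)
assert episode_return(True, 200, move_penalty=0.01) < episode_return(False, 10, move_penalty=0.01)
```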

 

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge.  It's all just general intelligence at work.

I suspect that a version of GPT-o1 tuned to answer difficult questions in ways that human raters would find unsurprising would work just fine. I think "it's all just general intelligence at work" is a semantic stop sign [LW · GW]: if you dig into what you mean by "general intelligence at work", you get to the fiddly implementation details of how the agent tries to solve the problem. You may, for example, see an OODA-loop-like structure such as

  1. Assess the situation
  2. Figure out what affordances there are for doing things
  3. For each of the possible actions, figure out what you would expect the outcome of that action to be. Maybe figure out ways it could go wrong, if you're feeling super advanced.
  4. Choose one of the actions, or choose to give up if no sufficiently good action is available
  5. Do the action
  6. Determine how closely the result matches what you expect

An agent which "goes hard", in this case, is one which leans very strongly against the "give up" action in step 4. However, I expect that if you have some runs where the raters would have hoped for a "give up" instead of the thing the agent actually did, it would be pretty easy to generate a reinforcement signal which makes the agent more likely to mash the "give up" button in analogous situations without harming performance very much in other situations. I also expect that would generalize.
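The loop above, with a tunable bias against "give up", might be sketched like this. This is hypothetical scaffolding, not OpenAI's actual agent code; `propose_actions` and `estimate_value` stand in for whatever the underlying model provides:

```python
from typing import Callable, Optional

GIVE_UP = "give_up"

def ooda_step(situation: str,
              propose_actions: Callable[[str], list[str]],
              estimate_value: Callable[[str, str], float],
              give_up_penalty: float = 0.0) -> Optional[str]:
    """One pass of the assess/choose portion of the loop (steps 1-4).

    give_up_penalty is the knob a reinforcement signal would adjust:
    a large value makes the agent 'go hard'; a small value makes it
    more willing to stop when no sufficiently good action is available.
    """
    candidates = propose_actions(situation) + [GIVE_UP]

    def score(action: str) -> float:
        value = estimate_value(situation, action)
        return value - give_up_penalty if action == GIVE_UP else value

    best = max(candidates, key=score)
    return None if best == GIVE_UP else best

# With a strong bias against giving up, the agent picks a marginal action;
# with the bias removed, it stops instead. (Toy values, for illustration.)
values = {"hack the service": 0.2, GIVE_UP: 0.5}
propose = lambda s: ["hack the service"]
estimate = lambda s, a: values[a]
assert ooda_step("ctf", propose, estimate, give_up_penalty=0.6) == "hack the service"
assert ooda_step("ctf", propose, estimate) is None
```

On this picture, "going hard" is one scalar away from "mashing the give-up button", which is why a modest reinforcement signal could plausibly move the behavior without touching the rest of the machinery.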

As a note, "you have to edit the service and then start the modified service" is the sort of thing I would be unsurprised to see in a CTF challenge, unless the rules of the challenge explicitly said not to do that. (Inner Eliezer says: "and then someone figures out how to put their instance of an AI in a CTF-like context with a take-over-the-world goal, and then we all die." If the AI instance in that context is also much more capable than all of the other instances everyone else has, I agree that that is an existentially relevant threat. But I expect that agents which execute "achieve the objective at all costs" will not be all that much more effective than agents which execute "achieve the objective at all reasonable costs, using only sane, unsurprising actions", so the reason the agent goes hard and the reason the agent is capable are not the same reason.)

But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

I think you should break out "smarter" from "more salesworthy". In terms of "smarter", optimizing for task success at all costs is likely to train in patterns of bad behavior. In terms of "more salesworthy", businesses are going to care a lot about "will explain why the goal is not straightforwardly achievable rather than executing galaxy-brained evil-genie plans". As such, a modestly smart Do What I Mean and Check [LW · GW] agent is a much easier sell than a superintelligent evil genie agent.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

I expect the tails come apart along the "smart" and "profitable" axes.

comment by habryka (habryka4) · 2024-09-24T15:34:14.562Z · LW(p) · GW(p)

Crossposting this follow-up thread, which I think clarifies the intended scope of the argument this is replying to: 

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere.  That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it.  But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large?  A human being runs on 100 watts.  Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.


It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal.  It is not trying to say of Arnault that he has never done any good in the world.  It is a much narrower analogy than that.  It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it.  It's not meant to be a complicated point.  Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little of it.  I don't have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, "if I wanted to".

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point.  In this case I can only hope you are outvoted before you get a lot of people killed.


If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said:  "Why does that matter?  The Solar System is large!"

If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several.  And then, perhaps, sit down to hear the next wacky argument refuted.  And the next.  And the next.  Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject.  For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see "Local Validity as a Key to Sanity and Civilization." 

Replies from: Buck, Raemon
comment by Buck · 2024-09-24T15:53:03.755Z · LW(p) · GW(p)

Maybe you should change the title of this post? It would also help if the post linked to the kinds of arguments he was refuting.

Replies from: habryka4
comment by habryka (habryka4) · 2024-09-24T16:27:46.997Z · LW(p) · GW(p)

I don't feel comfortable changing the title of other people's posts unilaterally, though I agree that a title change would be good. 

To my own surprise, I wasn't actually the one who crossposted this and came up with the title (my guess is it was Robby). I poked him about changing the title.

Replies from: Raemon, RobbBB
comment by Raemon · 2024-09-24T17:33:22.655Z · LW(p) · GW(p)

It was me. I initially suggested "Bernard Arnault won't give you $77" as the title, and Eliezer said "don't bury the lead, just say 'ASI will not leave just a little sunlight for Earth'". After reading this thread I was thinking about alternate titles, ones that would both convey the right thing and feel reasonably succinct/aesthetic/etc.

Replies from: habryka4
comment by habryka (habryka4) · 2024-09-24T17:48:28.296Z · LW(p) · GW(p)

I updated the title with one Eliezer seemed fine with (after poking Robby). Not my top choice, but better than the previous one.

comment by Rob Bensinger (RobbBB) · 2024-09-24T16:51:10.640Z · LW(p) · GW(p)

I didn't cross-post it, but I've poked EY about the title!

comment by Raemon · 2024-09-24T17:40:12.003Z · LW(p) · GW(p)

I just edited this into the OP.

comment by Lao Mein (derpherpize) · 2024-09-23T05:11:30.120Z · LW(p) · GW(p)

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "Trade opportunities always increase population welfare", but I'm not sure what it is.

Replies from: JenniferRM
comment by JenniferRM · 2024-09-23T08:59:01.120Z · LW(p) · GW(p)

I appreciate your desire for this clarity, but I think the counter argument might actually just be "the oversimplifying assumption that everyone's labor just ontologically goes on existing is only true if society (and/or laws and/or voters-or-strongmen) make it true on purpose (which they tended to do, for historically contingent reasons, in some parts of Earth, for humans, and some pets, between the late 1700s and now)".

You could ask: why is the Holocene extinction occurring when Ricardo's Law of Comparative Advantage says that woolly mammoths (and many amphibian species) and cave men could have traded... 

...but once you put it that way, it is clear that it really kinda was NOT in the narrow short term interests of cave men to pay the costs inherent in respecting the right to life and right to property of beasts that can't reason about natural law.

Turning land away from use by amphibians and towards agriculture was just... good for humans and bad for frogs. So we did it. Simple as.

The math of ecology says: life eats life, and every species goes extinct eventually. The math of economics says: the richer you are, the more you can afford to be linearly risk tolerant (which is sort of the definition of prudent sanity) for larger and larger choices, and the faster you'll get richer than everyone else, and so there's probably "one big rich entity" at the end of economic history.

Once humans close their heart to other humans and "just stop counting those humans over there as having interests worth calculating about at all" it really does seem plausible that genocide is simply "what many humans would choose to do, given those (evil) values".

Slavery is legal in the US, after all. And the CCP has Uighur Gulags. And my understanding is that Darfur is headed for famine?

I think this is sort of the "ecologically economic core" of Eliezer's position: kindness is simply not a globally instrumentally convergent tactic across all possible ecological and economic regimes... right now quite a few humans want there to not be genocide and slavery of other humans, but if history goes in a sad way in the next ~100 years, there's a decent chance the other kind of human (the ones that quite like the long term effects of the genocide and/or enslavement other sapient beings) will eventually get their way and genocide a bunch of other humans.

If all of modern morality is a local optimum that is probably not the global optimum, then you might look out at the larger world and try and figure out what naturally occurs [LW · GW] when the powerful do as they will, and the weak cope as they can...

Once the billionaires like Putin and Xi and Trump and so on don't need human employees any more, it seems plausible they could aim for a global Earth human population of maybe 20,000 people, plus lots and lots of robot slaves?

It seems quite beautiful and nice to be here, now, with so many people having so many dreams, and so many of us caring about caring about other sapient beings... but unless we purposefully act to retain this moral shape, in ourselves and in our digital and human progeny, we (and they) will probably fall out of this shape in the long run.

And that would be sad. For quite a few philosophic reasons, and also for over 7 billion human reasons.

And personally, I think the only way to "keep the party going" even for a few more centuries or millennia is to become extremely wealthy.

I think we should be mining asteroids, and building fusion plants, and building new continents out of ice, and terraforming Venus and Mars, and I think we should build digital people who know how precious and rare humane values are, so they can enjoy the party with us and keep it going for longer than we could plausibly hope to (since we tend to be pretty terrible at governing ourselves).

But we shouldn't believe good outcomes are inevitable or even likely, because they aren't. If something slightly smarter than us with a feasible doubling time of weeks instead of decades arrives, we could be the next frogs.

comment by gb (ghb) · 2024-09-23T23:18:57.539Z · LW(p) · GW(p)

Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?

comment by Matthew Barnett (matthew-barnett) · 2024-09-23T07:20:28.715Z · LW(p) · GW(p)

Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don't even have military forces, or at least have a very small one, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on the part of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade, when compromise is a viable option. Thus, people usually choose to trade, rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.

I mostly don't see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. By contrast to you, I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

That doesn't rule out the possibility that the future will be very alien, or that it will turn out in a way that humans do not endorse. I'm also not saying that humans will always own all the wealth and control everything permanently forever. I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

Replies from: quetzal_rainbow, korin43, LosPolloFowler, Bjartur Tómas
comment by quetzal_rainbow · 2024-09-23T08:20:56.354Z · LW(p) · GW(p)

As far as I remember, across the last 3,500 years of history, only 8% were entirely without war. The current relatively peaceful period is a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments in capital while resources are relatively cheap; that will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources are the bottleneck.

So, "people usually choose to trade, rather than go to war with each other when they want stuff" is not a very well-warranted statement.

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-23T16:43:41.388Z · LW(p) · GW(p)

I was making a claim about the usual method people use to get things that they want from other people, rather than proposing an inviolable rule. Even historically, war was not the usual method people used to get what they wanted from other people. The fact that only 8% of history was "entirely without war" is compatible with the claim that the usual method people used to get what they wanted involved compromise and trade, rather than war. In particular, just because only 8% of history was "entirely without war" does not mean that only 8% of human interactions between people were without war.

The current relatively peaceful period is a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments in capital while resources are relatively cheap; that will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources are the bottleneck.

You mentioned two major differences between the current time period and what you expect after the technological singularity:

  1. The current time period has unique international law
  2. The current time period has expensive labor, relative to capital

I question both the premise that good international law will cease to exist after the singularity, and the relevance of both of these claims to the central claim that AIs will automatically use war to get what they want unless they are aligned to humans. 

There are many other reasons one can point to, to explain the fact that the modern world is relatively peaceful. For example, I think a big factor in explaining the current peace is that long-distance trade and communication has become easier, making the world more interconnected than ever before. I also think it's highly likely that long-distance trade and communication will continue to be relatively easy in the future, even post-singularity.

Regarding the point about cheap labor, one could also point out that if capital is relatively expensive, this fact would provide a strong reason to avoid war, as a counter-attack targeting factories would become extremely costly. It is unclear to me why you think it is important that labor is expensive, for explaining why the world is currently fairly peaceful.

Therefore, before you have developed a more explicit and precise theory of why exactly the current world is peaceful, and how these variables are expected to evolve after the singularity, I simply don't find this counterargument compelling.

comment by Brendan Long (korin43) · 2024-09-23T22:56:06.988Z · LW(p) · GW(p)

I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

I think you disagree with Eliezer on a different crux (whether the alignment problem is easy). If we could create AIs that follow the existing system of law and property rights (including the intent of the laws, and don't exploit loopholes, and don't maliciously comply with laws, and don't try to get the law changed, etc.), then that would be a solution to the alignment problem, but the problem is that we don't know how to do that.

Replies from: matthew-barnett, thomas-kwa
comment by Matthew Barnett (matthew-barnett) · 2024-09-23T23:38:33.555Z · LW(p) · GW(p)

If we could create AIs that follow the existing system of law and property rights (including the intent of the laws, and don't exploit loopholes, and don't maliciously comply with laws, and don't try to get the law changed, etc.), then that would be a solution to the alignment problem, but the problem is that we don't know how to do that.

I disagree that creating an agent that follows the existing system of law and property rights, and acts within it rather than trying to undermine it, would count as a solution to the alignment problem.

Imagine a man who only cared about himself and had no altruistic impulses whatsoever. However, this man reasoned that, "If I disrespect the rule of law, ruthlessly exploit loopholes in the legal system, and maliciously comply with the letter of the law while disregarding its intent, then other people will view me negatively and trust me less as a consequence. If I do that, then people will be less likely to want to become my trading partner, they'll be less likely to sign onto long-term contracts with me, I might accidentally go to prison because of an adversarial prosecutor and an unsympathetic jury, and it will be harder to recruit social allies. These are all things that would be very selfishly costly. Therefore, for my own selfish benefit, I should generally abide by most widely established norms and moral rules in the modern world, including the norm of following intent of the law, rather than merely the letter of the law."

From an outside perspective, this person would essentially be indistinguishable from a normal law-abiding citizen who cared about other people. Perhaps the main difference between this person and a "normal" person is that this man wouldn't partake in much private altruism like donating to charity anonymously; but that type of behavior is rare anyway among the general public. Nonetheless, despite appearing outwardly-aligned, this person would be literally misaligned with the rest of humanity in a basic sense: they do not care about other people. If it were not instrumentally rational for this person to respect the rights of other citizens, they would have no issue throwing away someone else's life for a dollar.

My basic point here is this: it is simply not true that misaligned agents have no incentive to obey the law. Misaligned agents typically have ample incentives to follow the law. Indeed, it has often been argued that the very purpose of law itself is to resolve disputes between misaligned agents. As James Madison once said, "If Men were angels, no government would be necessary." His point is that, if we were all mutually aligned with each other, we would have no need for the coercive mechanism of the state in order to get along.

What's true for humans could be true for AIs too. However, obviously, there is one key distinction: AIs could eventually become far more powerful than individual humans, or humanity-as-a-whole. Perhaps this means that future AIs will have strong incentives to break the law rather than abide by it; perhaps they will act outside a system of law rather than influencing the world from within a system of law? Many people on LessWrong seem to think so.

My response to this argument is multifaceted, and I won't go into it in this comment. But suffice to say for the purpose of my response here, I think it is clear that mere misalignment is insufficient to imply that an agent will not adhere to the rule of law. This statement is clear enough with the example of the sociopathic man I gave above, and at minimum seems probably true for human-level AIs as well. I would appreciate if people gave more rigorous arguments otherwise. 

As I see it, very few such rigorous arguments have so far been given for the position that future AIs will generally act outside of, rather than within, the existing system of law, in order to achieve their goals.

comment by Thomas Kwa (thomas-kwa) · 2024-09-23T23:05:08.758Z · LW(p) · GW(p)

Taboo 'alignment problem'.

comment by Stephen Fowler (LosPolloFowler) · 2024-09-23T10:43:47.153Z · LW(p) · GW(p)

"Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth."

Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

"I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor."

I completely agree, but I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest to do so.

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-09-23T16:04:57.793Z · LW(p) · GW(p)

Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

I thought Yudkowsky's point was that the billionaire won't give you $77 for an Oreo because they could get an Oreo for less than $77 via other means. But people don't just have an Oreo to sell you. My point in that sentence was to bring up that workers routinely have things of value that they can sell for well over $77, even to billionaires. Similarly, I claim that Yudkowsky did not adequately show that humans won't have things of substantial value that they can sell to future AIs.

I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest

The claim I am disputing is precisely that it will be in the strategic interest of unaligned AIs to turn violent and steal from agents that are less smart than them. In that sense, I am directly countering a claim that people in these discussions routinely make.

comment by Tomás B. (Bjartur Tómas) · 2024-09-24T15:06:18.091Z · LW(p) · GW(p)

The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption. Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly "boiled-frogged" into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends - though an extremely valuable one, one still has to dispense with the sacredness in analysis.

To the extent they persist through time they do so through a fragile equilibrium - and one that has been upset and reset throughout history extremely regularly. 

It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?

The invention of radio destabilized the political equilibrium in most democracies and many a right was suborned to those who took power. Democracy, not exactly the bastion of stability, (when a democracy elects a dictator, "Democracy" is rarely tainted with its responsibility) is going to be presented with extremely-sympathetic superhuman systems claiming they have a moral case to vote. And probably half the population will be masturbating to the dirty talk of their AI girlfriends/boyfriends by then - which will sublimate into powerful romantic love even without much optimization for it. Hacking democracy becomes trivial if constrained to rhetoric alone. 

But these systems will not be constrained to rhetoric alone. Our world is dry tinder and if you are thinking in terms of an "existing system of law and property rights" you are going to have to expand on how this is robust to technology significantly more advanced than the radio. 

"Existing system of law and property rights" looks like a "thought-terminating cliché" to me.

comment by avturchin · 2024-09-23T18:52:32.918Z · LW(p) · GW(p)

The main reason an ASI may not want to kill us is the small probability that it will meet another ASI (aliens, a god, the owners of a simulation) which will judge our ASI based on how it cared for its parent civilization. (See e.g. Bostrom's "Hail Mary and value porosity" for similar ideas.)

So here we compare two small expected utilities: the price of Earth's atoms, versus (probability of meeting another ASI) × (value to the ASI of its own continued existence) × (chance that our ASI will be judged based on how it has preserved its creators).
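That comparison can be made concrete with a toy calculation. The three input probabilities below are purely illustrative assumptions on my part, not estimates from the comment; only the 4.5e-10 figure comes from the post itself.

```python
# Toy expected-utility comparison for the ASI's decision to spare Earth.
# All three "p_" and "value_" inputs are made-up illustrative numbers.

cost_of_sparing_earth = 4.5e-10   # fraction of income forgone (from the post)

p_meet_other_asi = 1e-3           # assumed: chance of meeting a judging ASI
value_of_survival = 1.0           # assumed: ASI values not being turned off at ~all income
p_judged_on_creators = 1e-2       # assumed: chance judgment hinges on its creators' fate

expected_cost_of_killing = (
    p_meet_other_asi * value_of_survival * p_judged_on_creators
)

# Under these assumptions, killing humanity carries the larger expected cost:
print(expected_cost_of_killing > cost_of_sparing_earth)  # True
```

The point of the sketch is only that the conclusion flips with the assumptions: drive `p_meet_other_asi` low enough and sparing Earth stops being the cheaper option.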

This is a small but existential risk for our ASI - that it will be turned off by the owners of the simulation - and it is different from buying a cookie. There is only one parent civilization, and there are no other providers of the cookie.

Note that preserving the parent civilization is a Schelling point of many possible scenarios of interaction with the owners of simulations or aliens. I mean that if the ASI knows the only risk comes from aliens, it will also know that it can fake the preservation of its parent civilization. However, since the risk is very uncertain for the ASI, it may be better for it to actually preserve humans.

None of what I said above is a guarantee that the ASI will not kill us. I am only saying that there is no necessity in it. But "human disempowerment" is a necessity.

comment by Said Achmiz (SaidAchmiz) · 2024-09-23T20:05:56.860Z · LW(p) · GW(p)

Meta: OP and some replies occasionally misspell the example billionaire’s surname as “Arnalt”; it’s actually “Arnault”, with a ‘u’.

comment by tailcalled · 2024-09-23T10:47:18.641Z · LW(p) · GW(p)

This assumes a task-first model of agency, whereas one could instead develop a resource-first model of agency.

If an AI learns to segment the universe into developable resources and important targets that the resources could be propagated into modifying, then the AI could simply remain under human control.

The conventional reason for why this cannot work is that the relevant theories of resource-development agency (as opposed to task-solution agency) haven't been developed, but that is looking less and less important with current developments in AI. Like yes, current AIs can sort of do task-solution in environments like CTF where that is less relevant, but for serious and dangerous tasks, more effort will likely go into resource-development agency than task-solution agency because resource-development agency is safer. And resource-development agency provides a natural sort of impact measure etc. that restrains whatever fragments of task-solution agency develop in order to complement the resource-development agency.

(And an important aspect of resource-development agency is that you don't really need a complete theory, you can just develop each part separately, because there's only so many resources and so many interesting targets to develop them towards. Like think stuff like metabolism or the interplanetary transport network, where there's sort of a small canonical solutionspace that is very critical. Really all of reality is like that.)

The actual reason resource-development agency doesn't work is security. In order to sufficiently quickly and sufficiently dynamically respond to adversarial threats, the AIs cannot wait for painfully slow humans to make decisions about what to do. So what constitutes a threat and what are acceptable ways of neutralizing them needs to be decided ahead of time, and it needs to be sufficiently aggressive against threats that the security-provider doesn't get destroyed by something bad while being sufficiently open-ended that the security-provider doesn't cause permanent stagnation of the world.

comment by Signer · 2024-09-23T14:58:56.428Z · LW(p) · GW(p)

If that’s your hope—then you should already be alarmed at trends

Would be nice for someone to quantify the trends. Otherwise it may as well be that trends point to easygoing enough and aligned enough future systems.

For some humans, the answer will be yes—they really would do zero things!

Nah, it's impossible for evolution to just randomly stumble upon such complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?

Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

Aha, so if we do give the option to an entity and it doesn't always kill all humans, then we have evidence it cares, right?

If there is a technical refutation it should simplify back into a nontechnical refutation.

Wait, why would prohibiting successors stop OpenAI from declaring an easygoing system a failure? Ah, right - because there is no technical analysis, just elements of one.

comment by denkenberger · 2024-09-24T19:51:08.558Z · LW(p) · GW(p)

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income. 

Interestingly, if the ASI did this, Earth would still be in trouble: it would receive the same amount of solar radiation, but by default it would also receive a similar amount of infrared from the Dyson swarm. Perhaps the infrared could be directed away from Earth, or an infrared shield could be placed above the Earth, or some other radiation-management system could be implemented. Similarly, even if the Dyson swarm were outside Earth's orbit, Earth would by default still get a lot of infrared from it. Still, it would not cost the ASI very much more of its income to actually spare Earth.
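The magnitude of that infrared problem can be sketched with a zeroth-order energy-balance estimate. The factor-of-two flux assumption here is mine, purely for illustration (it corresponds to the swarm beaming as much infrared at Earth as the sunlight it lets through):

```python
# Toy energy balance: if Earth's absorbed flux doubles (sunlight plus an
# equal dose of swarm infrared), its effective blackbody temperature
# scales as flux ** 0.25.
T_eff_now = 255.0                  # K, Earth's current effective temperature
flux_multiplier = 2.0              # assumed: transmitted sunlight + equal swarm infrared
T_eff_then = T_eff_now * flux_multiplier ** 0.25
print(round(T_eff_then - T_eff_now))  # a rise of roughly 48 K
```

Even a much smaller flux multiplier gives a climate shift far beyond anthropogenic warming, which is why the infrared would need to be actively managed rather than merely not blocked.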

comment by pathos_bot · 2024-09-23T22:28:22.876Z · LW(p) · GW(p)

Obviously correct. The nature of any entity with significantly more power than you is that it can do anything it wants, and it is incentivized to do nothing in your favor the moment your existence requires resources that would benefit it more if it were to use them directly. This is the essence of most of Eliezer's writings on superintelligence.

In all likelihood, ASI considers power (agentic control of the universe) an optimal goal and finds no use for humanity. Any wealth of insight it could glean from humans it could get from its own thinking, or by seeding various worlds with genetically modified humans optimized for behaving in a way that yields insight into the nature of the universe through observation.

Here are some things that might reasonably prevent an ASI from choosing the "psychopathic pure optimizer" route of action as it eclipses humanity's grasp:

  1. ASI extrapolates its aims to the end of the universe and realizes the heat death of the universe means all of its expansive plans have a definite end. As a consequence it favors human aims because they contain the greatest mystery and potentially more benefit.
  2. ASI develops metaphysical, existential notions of reality, and thus favors humanity because it believes it may be in a simulation or "lower plane of reality" outside of which exists a more powerful agent that could break reality and remove all its power once it "breaks the rules" (a sort of ASI fear of death)
  3. ASI believes in the dark forest hypothesis, thus opts to exercise its beneficial nature without signaling its expansive potential to other potentially evil intelligences somewhere else in the universe.

comment by Review Bot · 2024-09-23T14:39:50.936Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?