Eliezer Yudkowsky’s Letter in Time Magazine

post by Zvi · 2023-04-05T18:00:01.670Z · LW · GW · 86 comments

Contents

  What the Letter Actually Says
  The Internet Mostly Sidesteps the Important Questions
  What Is a Call for Violence?
  Our Words Are Backed by Nuclear Weapons
  Answering Hypothetical Questions
  What Do I Think About Yudkowsky’s Model of AI Risk?
  What Do I Think About Eliezer’s Proposal?
  What Do I Think About Eliezer’s Answers and Comms Strategies?

FLI put out an open letter, calling for a 6-month pause in training models more powerful than GPT-4, followed by additional precautionary steps.

Then Eliezer Yudkowsky put out a post in Time, which made it clear he did not think that letter went far enough. Eliezer instead suggests an international ban on large AI training runs to limit future capabilities advances. He lays out in stark terms our choice as he sees it: either do what it takes to prevent such runs, or face doom.

A lot of good discussion happened. A lot of people got exposed to the situation who would not otherwise have been exposed to it, all the way up to a question being asked at the White House press briefing. Also, due to a combination of the internet being the internet, the nature of the topic, and the way certain details were laid out, a lot of other discussion predictably went off the rails quickly.

If you have not yet read the post itself, I encourage you to read the whole thing, now, before proceeding. I will summarize my reading in the next section, then discuss reactions.

This post goes over:

  1. What the Letter Actually Says. Check if your interpretation matches.
  2. The Internet Mostly Sidesteps the Important Questions. Many did not take kindly.
  3. What Is a Call for Violence? Political power comes from the barrel of a gun.
  4. Our Words Are Backed by Nuclear Weapons. Eliezer did not propose using nukes.
  5. Answering Hypothetical Questions. If he doesn’t he loses all his magic powers.
  6. What Do I Think About Yudkowsky’s Model of AI Risk? I am less confident.
  7. What Do I Think About Eliezer’s Proposal? Depends what you believe about risk.
  8. What Do I Think About Eliezer’s Answers and Comms Strategies? Good question.

What the Letter Actually Says

I see this letter as a very clear, direct, well-written explanation of what Eliezer Yudkowsky actually believes will happen, which is that AI will literally kill everyone on Earth, and none of our children will get to grow up – unless action is taken to prevent it. 

Eliezer also believes that the only known way that our children will grow up is if we get our collective acts together, and take actions that prevent sufficiently large and powerful AI training runs from happening. 

Either you are willing to do what it takes to prevent that development, or you are not.

The only known way to do that would be governments restricting and tracking GPUs and GPU clusters, including limits on GPU manufacturing and exports, as large quantities of GPUs are required for training. 

That requires an international agreement to restrict and track GPUs and GPU clusters. There can be no exceptions. Like any agreement, it would require doing what it takes to enforce it, including, if necessary, the use of force to physically prevent unacceptably large GPU clusters from existing.

We have to target training rather than deployment, because deployment does not offer any bottlenecks that we can target.

If we allow corporate AI model development and training to continue, Eliezer sees no chance there will be enough time to figure out how to have the resulting AIs not kill us. Solutions are possible, but finding them will take decades. The current cavalier willingness by corporations to gamble with all of our lives as quickly as possible would render efforts to find solutions that actually work all but impossible. 

Without a solution, if we move forward, we all die.

How would we die? The example given of how this would happen is using recombinant DNA to bootstrap to post-biological molecular manufacturing. The details are not load bearing.

These are draconian actions that come with a very high price. We would be sacrificing highly valuable technological capabilities, and risking deadly confrontations. These are not steps one takes lightly.

They are, however, the steps one takes if one truly believes that the alternative is human extinction, even if one is not as certain of this implication as Eliezer.

I believe that the extinction of humanity is existentially bad, and one should be willing to pay a very high price to prevent it, or greatly reduce the probability of it happening.

The letter also mentions the possibility that a future GPT-5 could become self-aware or a moral person, a point Eliezer felt it was morally necessary to include.

The Internet Mostly Sidesteps the Important Questions

A lot of people responded to the Time article by gaining a new appreciation for existential risk from AI and considering its arguments and proposals.

Those were not the loudest voices. They rarely are.

The loudest voices were instead mostly people claiming this was a call for violence, or launching attacks on anyone saying it wasn’t centrally a ‘call for violence’, conflating being willing to do an airstrike as a last resort to enforce an international agreement with calling for an actual airstrike now, and often trying to associate anyone who associates with Eliezer with terrorism and murder and nuclear first strikes and complete insanity.

Yes, a lot of people jump straight from ‘willing to risk a nuclear exchange’ to ‘you want to nuke people,’ and then act as if anyone who did not go along with that leap was being dishonest and unreasonable.

Or making content-free references to things like ‘becoming the prophet of a doomsday cult.’

Such responses always imply that ‘because Eliezer said this Just Awful thing, no one is allowed to make physical world arguments about existential risks from super-intelligent AIs anymore, such arguments should be ignored, and anyone making such arguments should be attacked or at least impugned for making such arguments.’

Many others responded by restating all the standard Bad AI NotKillEveryoneism takes as if they were knockdown arguments, including the all-time classic ‘AI systems so far haven’t been dangerous, which proves future ones won’t be dangerous and you are wrong, how do you explain that?’ even though no one involved predicted that something like current systems would be similarly dangerous.

An interesting take from Tyler Cowen was to say that Eliezer attempting to speak in this direct and open way is a sign that Eliezer is not so intelligent. As a result, he says, we should rethink what intelligence means and what it is good for. Given how much this indicates disagreement and confusion about what intelligence is, I agree that this seems worth doing. He should also consider the implications of saying that high intelligence implies hiding your true beliefs, when considering what future highly intelligent AIs might do.  

It is vital that everyone, no matter their views on the existential risks from AI, stand up against attempts to silence such discussion, and instead address the arguments involved and what actions do or don’t make sense.

I would like to say that I am disappointed in those who reacted in these ways. Except that mostly I am not. This is the way of the world. That is how people respond to straight talk that they dislike and wish to attack.

I am disappointed only in a handful of particular people, of whom I expected better.

One good response was from Roon.

Genuinely appreciate the intellectual honesty. I look down my nose at people who have some insanely high prediction of doom but don’t outright say things like this.

What Is a Call for Violence?

I continue to urge everyone not to choose violence, in the sense that you should not go out there and commit any violence to try and cause or stop any AI-risk-related actions, nor should you seek to cause any other private citizen to do so. I am highly confident Eliezer would agree with this.

I would welcome at least some forms of laws and regulations aimed at reducing AI-related existential risks, or many other causes, that would be enforced via the United States Government, which enforces laws via the barrel of a gun. I would also welcome other countries enacting and enforcing such laws, also via the barrel of a gun, or international agreements between them.

I do not think you or I would like a world in which such governments were never willing to use violence to enforce their rules.

And I think it is quite reasonable for a consensus of powerful nations to set international rules designed to protect the human race, that they clearly have the power to enforce, and if necessary for them to enforce them, even under threat of retaliatory destruction for destruction’s sake. That does not mean any particular such intervention would be wise. That is a tactical question. Even if it would be wise in the end, everyone involved would agree it would be an absolute last resort.

If one refers to any or all of that above as calling for violence then I believe that is fundamentally misleading. That is not what those words mean in practice. As commonly understood, at least until recently, a ‘call for violence’ means a call for unlawful violent acts not sanctioned by the state, or for launching a war or specific other imminent violent act. When someone says they are not calling for violence, that is what they intend for others to understand.

Otherwise, how do you think laws are enforced? How do you think treaties or international law are enforced? How do you think anything ever works?

Alyssa Vance and Richard Ngo and Joe Zimmerman were among those reminding us that the distinction here is important, and that destroying it would destroy our ability to actually be meaningfully against individual violence. This is the same phenomenon as people who extend violence to other non-violent things that they dislike, for example those who say things like ‘silence is violence.’

You can of course decide to be a full pacifist and a libertarian, and believe that violence is never justified under any circumstances. Almost everyone else thinks that we should use men with guns on the regular to enforce the laws and collect the taxes, and that one must be ready to defend oneself against threats both foreign and domestic.

Everything in the world that is protected or prohibited, at the end of the day, is protected or prohibited by the threat of violence. That is how laws and treaties work. That is how property works. That is how everything has to work. Political power comes from the barrel of a gun.

As Orwell put it, you sleep well because there are men with guns who make it so.

The goal of being willing to bomb a data center is not that you want to bomb a data center. It is to prevent the building of the data center in the first place. Similarly, the point of being willing to shoot bank robbers is to stop people before they try and rob banks. 

So what has happened for many years is that people have made arguments of the form:

  1. You say if X happens everyone will die.

Followed by one of:

  1. Yet you don’t call for violence to stop X. Curious!
  2. Yet you aren’t calling for targeted assassinations to stop X. Curious!
  3. Your words are going to be treated as a call for violence and get someone killed!

Here’s Mike Solana saying simultaneously that the AI safety people are going to get someone killed, and that they do not believe the things they were saying, because if he believed them he would go get many someones killed. He expanded this later to full post length. I do appreciate the deployment of both horns of the dilemma at the same time: if you believed X you’d advocate horrible thing Y, and also if you convince others of X they’ll do horrible thing Y, yet no Y, so I blame you for causing Y in the future anyway, you don’t believe X, X is false, and also I strongly believe in the bold stance that Y is bad, actually.

Thus, the requirement to periodically say things like (Eliezer on Feb 10):

Please note: There seems to be a campaign to FAKE the story that AI alignment theorists advocate violence. Everyone remember: *WE* never say this, it is *THEM* who find it so useful to claim we do – who fill the air with talk of violence, for their own political benefit.

And be it absolutely clear to all who still hold to Earth’s defense, who it is that benefits from talking about violence; who’d benefit even more from any actual violence; who’s talking about violence almost visibly salivating in hope somebody takes the bait.

It’s not us.

Followed by the clarification to all those saying ‘GOTCHA!’ in all caps:

Apparently necessary clarification: By “violence” I here mean individuals initiating force. I think it’s okay for individuals to defend their homes; I still want police officers to exist, though I wish we had different laws and different processes there (and have written at length about those);

I’ve previously spoken in favor of an international ban on gain-of-function research, which means that I favor, in principle, the use of police action or even military force to shut down laboratories working on superpathogens; and if there was an international treaty banning large AI training runs, I’d back it with all my heart, because otherwise everyone dies.

Or as Stefan Schubert puts it:

“There was a thread where someone alleged there had been discussions of terrorist violence vs AI labs. I condemn that idea in the strongest terms!”

“Ah so you must be opposed to any ambitious regulation of AI? Because that must be backed by violence in the final instance!”

Our Words Are Backed by Nuclear Weapons

It’s worth being explicit about nuclear weapons.

Eliezer absolutely did not, at any time, call for the first use, or any use, of nuclear weapons.

Anyone who says that either misread the post, is intentionally using hyperbole, is outright lying, or is the victim of a game of telephone.

It is easy to see how it went from ‘accepting the risk of a nuclear exchange’ and ‘bomb a rogue data center’ to ‘first use of nuclear weapons.’ Except, no. No one is saying that. Even in hypothetical situations. Stop it. 

What Eliezer said was that one needs to be willing to risk a nuclear exchange, meaning that if someone says ‘I am building an AGI that you believe will kill all the humans and also I have nukes’ you don’t say ‘well if you have nukes I guess there is nothing I can do’ and go home.

Eliezer clarifies in detail here, and I believe he is correct, that if you are willing under sufficiently dire circumstances to bomb a Russian data center and can specify what would trigger that, you are much safer being very explicit under what circumstances you would bomb a Russian data center. There is still no reason to need to use nuclear weapons to do this.

Answering Hypothetical Questions

One must in at least one way have sympathy for developers of AI systems. When you build something like ChatGPT, your users will not only point out and amplify all the worst outputs of your system. They will also red team it by seeking out all the ways to make your system look maximally bad, taking things out of context and misconstruing them, finding tricks to get answers that sound bad, demanding censorship and lack of censorship, demanding ‘balance’ that favors their side of every issue, and so on.

It’s not a standard under which any human would look good. Imagine if the internet made copies of you, and had the entire internet prompt those copies in any way they could think of, and you had to answer every time, without dodging the question, and they had infinite tries. It would not go well.

Or you could be Eliezer Yudkowsky, and feel an obligation to answer every hypothetical question no matter how much every instinct you could possibly have is saying that yes this is so very obviously a trap.

All the while, you hold beliefs that logically require, in some hypothetical contexts, taking some rather unpleasant actions, because in those hypotheticals the alternative would be far worse, existentially worse. It’s not a great spot, and if you are ‘red teaming’ the man to generate quotes, it is not a great look.

Which essentially means:

Yosarian2: “Rationalist who believes in always answering the question” vs “people who love to ask weird hypothetical gotcha questions and then act SHOCKED at the answer” This is going to just get increasingly annoying isn’t it?

Eliezer: Pretty sure that if I ever fail to give an honest answer to an absurd hypothetical question I immediately lose all my magic powers.

So the cycle will continue until either we all die or morale improves.

I am making a deliberate decision not to quote the top examples. If you want to find them, they are there to be found. If you click all the links in this post, you’ll find the most important ones. 

What Do I Think About Yudkowsky’s Model of AI Risk?

Do I agree with Eliezer Yudkowsky’s model of AI risk?

I share most of his concerns about existential risk from AI. Our models have a lot in common. Most of his individual physical-world arguments are, I believe, correct.

I believe that there is a substantial probability of human extinction and a valueless universe. I do not share his confidence. In a number of ways and places, I am more hopeful that things could turn out differently.

A lot of my hope is that the scenarios in question simply do not come to pass because systems with the necessary capabilities are harder to create than we might think, and they are not soon built. And I am not so worried about imminently crossing the relevant capability thresholds. Given the uncertainty, I would much prefer if the large data centers and training runs were soon shut down, but there are more limits on what I would be willing to sacrifice for that to happen.

In the scenarios where sufficiently capable systems are indeed soon built, I have a hard time envisioning ways things end well for my values or for humanity, for reasons that are beyond the scope of this post.

I continue to strongly believe (although with importantly lower confidence than Eliezer) that by default, even under many relatively great scenarios where we solve some seemingly impossible problems, if ASI (Artificial Super Intelligence, any sufficiently generally capable AI system) is built, all the value in the universe originating from Earth would most likely be wiped out and humanity would not long survive.

What Do I Think About Eliezer’s Proposal?

I believe that, conditional on believing what Eliezer believes about the physical world and the existential risks from AI that would result from further large training runs, Eliezer is making the only known sane proposal.

If I instead condition on what I believe, as I do, I strongly endorse working to slow down or stop future very large training runs, and imposing global limits on training run size, and various other related safety precautions. I want that to be extended as far and wide as possible, via international agreements and cooperation and enforcement. 

The key difference is that I do not see such restrictions as the only possible path that has any substantial chance of allowing humans to survive. So it is not obviously where I would focus my efforts. 

A pause in larger-model training until we have better reason to think proceeding is safe is still the obvious, common sense thing that a sane civilization would find a way to do, if it believed that there was a substantial chance that not pausing kills everyone on Earth.

I see hope in potentially achieving such a pause, and in effectively enforcing such international agreements without much likelihood of needing to actually bomb anything. I also believe this can be done without transforming the world or America into a ‘dystopian nightmare’ of enforcement.

I’ll also note that I am far more optimistic than most other people I talk to about the prospect of getting China to make a deal here, since a deal would very much be in China’s national interest, and in the interest of the CCP. If America were willing to take one for Team Humanity, it seems odd to assume China would necessarily defect and screw that up.

You should, of course, condition on what you believe, and favor the level of restriction and precaution appropriate to that. That includes your practical model of what is and is not achievable.

Many people shouldn’t support the proposal as stated, at least not at this time, because they do not believe AGI will arrive soon, are not worried about it, or do not see how the proposal would be helpful, and therefore do not agree with the logic underlying it.

However, 46% of Americans, according to a recent poll, including 60% of adults under the age of 30, are somewhat or very concerned that AI could end human life on Earth. Common sense suggests that if you are ‘somewhat concerned’ that some activity will end human life on Earth, you might want to scale back the activity in question to address that concern, even if scaling back means forgoing quite substantial economic and strategic benefits.

What Do I Think About Eliezer’s Answers and Comms Strategies?

Would I have written the article the way Eliezer did, if I shared Eliezer’s model of AI risks fully? No.

I would have strived to avoid giving the wrong kinds of responses the wrong kinds of ammunition, and avoided the two key often quoted sentences, at the cost of being less stark and explicit. I would still have had the same core ask, an international agreement banning sufficiently large training runs. 

That doesn’t mean Eliezer’s decision was wrong given his beliefs. Merely that I would not have made it. I have to notice that the virtues of boldness and radical honesty can pay off. The article got asked about in a White House press briefing, even if it got a response straight out of Don’t Look Up (text in the linked meme is verbatim). 

It is hard to know, especially in advance, how much or which parts of the boldness and radical honesty are doing the work, which bold and radically honest statements risk backfire without doing the work, and which ones risk backfire but are totally worth it because they also do the work.

Do I agree with all of his answers to all the hypothetical questions, even conditional on his model of AI risk? No. I think at least two of his answers were both importantly incorrect and importantly unwise to say. Some of the other responses were correct, but saying them on the internet, or the details of how he said them, was unwise.

I do see how he got to all of his answers.

Do I think this ‘answer all hypothetical questions’ bit was wise, or good for the planet? Also no. Some hypothetical questions are engineered to and primarily serve to create an attack surface, without actually furthering productive discussion. 

I do appreciate the honesty and openness of, essentially, open sourcing the algorithm and executing arbitrary queries. Both in the essay and in the later answers.

The world would be a better place if more people did more of that, especially on the margin, even if we got a lesson in why more people don’t do that.

I also appreciate that the time has come that we must say what we believe, and not stay silent. Things are not going well. Rhetorical risks will need to be taken. Even if I don’t love the execution, better to do the best you can than stand on the sidelines. The case had to be laid out, the actual scope of the problem explained and real solutions placed potentially inside a future Overton Window.

If someone asked me a lot of these hypothetical questions, I would have (often silently) declined to answer. The internet is full of questions. One does not need to answer all of them. For others, I disagree, and would have given substantively different answers, whereas if my true answer had been Eliezer’s, I would have ignored the question. For many others, I would have made different detail choices. I strive for a high level of honesty and honor and openness, but I have my limits, and I would have hit some of them.

I do worry that there is a deliberate attempt to coalesce around responding to any straight talk about the things we need to get right in order to not all die with ‘so you’re one of those bad people who want to bomb things, which is bad,’ as part of an attempt to shut down such discussion, sometimes even referencing nukes. Don’t let that happen. I hope we can ignore such bad faith attacks, and instead have good discussions of these complex issues, which will include reiterating a wide array of detailed explanations and counter-counter-arguments to people encountering these issues for the first time. We will need to find better ways to do so with charity, and in plain language.

86 comments

Comments sorted by top scores.

comment by trevor (TrevorWiesinger) · 2023-04-05T20:38:29.383Z · LW(p) · GW(p)

A big issue here is that both AI risk, and great power diplomacy, are fairly technical issues, and missing a single gear from your mental model can result in wildly different implications. A miscalculation in the math gets amplified as you layer more calculations on top of it.

Machine Learning

AI safety probably requires around 20 hours of studying [LW(p) · GW(p)] to know whether you can become an alignment researcher and have a professional stance. The debate itself seems unlikely to be resolved soon [LW · GW], but the debate is coherent [LW · GW], it's publicly available [LW · GW], and thoroughly discussed [LW · GW]. Meanwhile, understanding nuclear war dynamics (e.g. treaties) is not so open; it requires reading the right books recommended by an expert you trust (not a pundit), instead of books randomly selected from a pool full of wrong/bad books. I recommend the first two chapters of Thomas Schelling's 1966 Arms and Influence, but only because the dynamic it describes is fundamental, almost guaranteed to be true, probably most of the generals have had that dynamic in mind for ~60 years, that dynamic is merely a single layer of the overall system (e.g. it says nothing about spies), and it's only two chapters. Likewise, Raemon, for several years now, has considered Scott Alexander's Superintelligence FAQ to be the best layman's introduction to AI safety [LW(p) · GW(p)] that anyone can send to new people.

I'm optimistic that lots of people can get a handle on this and come to an agreement here, because soon, basically anyone can go get a professional stance on either of these issues after only 20 hours of work. Anyone can go to a bunch of EA events and get a notecard full of names of nuclear war experts, cold-email them, and then get the perfect books as a result and spend 10-20 hours reading them (they're interesting). When the LessWrong team finishes putting together the ~20 hours of AI alignment studying, anyone will be able to take a crack at that too. We're on the brink of finally setting up unambiguous, fun/satisfying/interesting, and clear paths that allow anyone to get where they need to be.

Replies from: NicholasKross
comment by Nicholas / Heather Kross (NicholasKross) · 2023-04-06T00:19:27.393Z · LW(p) · GW(p)

This will be extremely helpful to me and less-focused-but-highly-eager-to-help people like me, with respect to both technical work and governance.

comment by teradimich · 2023-04-06T03:55:59.507Z · LW(p) · GW(p)

Suppose China and Russia accepted Yudkowsky's initiative, but the USA did not. Would you support bombing an American data center?

Replies from: daniel-kokotajlo, Eliezer_Yudkowsky, rhollerith_dot_com, gilch, avturchin
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-06T08:43:14.493Z · LW(p) · GW(p)

I for one am not being hypocritical here. Analogy: Suppose it came to light that the US was working on super-bioweapons with a 100% fatality rate, long incubation period, vaccine-resistant, etc. and that they ignored the combined calls from most of the rest of the world to get them to stop. They say they are doing it safely and that it'll only be used against terrorists (they say they've 'aligned' the virus to only kill terrorists or something like that, but many prominent bio experts say their techniques are far from adequate to ensure this and some say they are being pretty delusional to think their techniques even had a chance of achieving this). Wouldn't you agree that other countries would be well within their rights to attack the relevant bioweapon facilities, after diplomacy failed?

Replies from: teradimich, None
comment by teradimich · 2023-04-06T09:04:38.897Z · LW(p) · GW(p)

I'm not an American, so my consent doesn't mean much :)

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-06T09:41:10.024Z · LW(p) · GW(p)

? Can you elaborate, I'm not sure what you are saying.

Replies from: teradimich
comment by teradimich · 2023-04-06T10:38:37.492Z · LW(p) · GW(p)

I am not an American (so excuse me for my bad English!), so my opinion about the admissibility of attacks on US data centers is not so important. This is not my country.

But reading about the bombing of Russian data centers as an example was unpleasant. It sounds like Western bias to me. And not only to me.

'What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question?' [LW · GW].

If the text is aimed at readers not only from First World countries, well, perhaps the authors should make such a clarification as you did! Then it will not look like political hypocrisy. Or not write about air strikes at all, because people get distracted discussing this.

Replies from: Tapatakt, laserfiche, daniel-kokotajlo
comment by Tapatakt · 2023-04-06T13:02:43.339Z · LW(p) · GW(p)

I'm Russian and I think, when I translate this, I will change "Russian" to "[other country's]". I will feel safer that way.

Replies from: Tapatakt
comment by Tapatakt · 2023-04-28T19:18:15.888Z · LW(p) · GW(p)

BTW, Done

comment by laserfiche · 2023-04-06T11:51:26.136Z · LW(p) · GW(p)

Thank you for pointing this perspective out. Although Eliezer is from the west, I assure you he cares nothing for that sort of politics. The whole point is that the ban would have to be universally supported, with a tight alliance between US, China, Russia, and ideally every other country in the world. No one wants to do any airstrikes and, you're right, they are distracting from the real conversation.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-06T11:38:09.425Z · LW(p) · GW(p)

Thanks. I agree it was a mistake for Yudkowsky to mention that bit, for the reason you mention. Alternatively he should have clarified that he wasn't being a hypocrite and that he'd say the same thing if it was US datacenters going rogue and threatening the world.

I think your opinion matters morally and epistemically regardless of your nationality. I agree that your opinion is less likely to influence the US government if you aren't living in the US. Sorry about that.

 

Replies from: teradimich
comment by teradimich · 2023-04-06T16:00:21.893Z · LW(p) · GW(p)

Thanks for your answer, this is important to me.

comment by [deleted] · 2023-04-06T13:47:35.763Z · LW(p) · GW(p)

Umm arguably the USA did exactly this when they developed devices that exploit fusion and then miniaturized them and loaded them into bombers, silos, and submarines.

They never made enough nukes to kill everyone on the planet, but that bioweapon probably wouldn't either. A bioweapon is more counterable; some groups would survive so long as they isolated for long enough.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-06T14:49:04.971Z · LW(p) · GW(p)

So... are you saying that if the nations of the world had gotten together to agree to ban nukes in 1950 or so, and the ban seemed to be generally working except that the USA said no and continued to develop nukes, the other nations of the world would have been justified in attacking said nuclear facilities?

Replies from: None
comment by [deleted] · 2023-04-06T16:11:21.867Z · LW(p) · GW(p)

Justified? Yes. Would the USA have caved in response? Of course not; it has nukes and they don't. (Assuming it first gets everything in place for rapid exploitation of the nukes, it can use them danger-close to vaporize invasions, then bomb every attacking country's most strategic assets.)

AGI has similar military benefits. Better attack fast or the country with it will rapidly become more powerful and you will be helpless to threaten anything in return, having not invested in AGI infrastructure.

So in this scenario each party has to have massive training facilities, smaller secret test runs, and warehouses full of robots so they can rapidly act if they think the other party is defecting. So everyone is a slight pressure on a button away from developing and using AGI.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-09T04:43:52.055Z · LW(p) · GW(p)

If diplomacy failed, but yes, sure.  I've previously wished out loud for China to sabotage US AI projects in retaliation for chip export controls, in the hopes that if all the countries sabotage all the other countries' AI projects, maybe Earth as a whole can "uncoordinate" to not build AI even if Earth can't coordinate.

Replies from: derpherpize
comment by Lao Mein (derpherpize) · 2023-04-09T06:54:24.885Z · LW(p) · GW(p)

Are you aware that AI safety is not considered a real issue by the Chinese intelligentsia? The limits of AI safety awareness here are surface-level discussions of Western AI Safety ideas. Not a single Chinese researcher, as far as I can recall, has actually said anything like "AI will kill us all by default if it is not aligned". 

Given the chip ban, any attempts at an AI control treaty will be viewed as an attempt to prevent China from overtaking the US in terms of AI hegemony. The only conditions to an AI control treaty that Beijing will accept will also allow it to reach transformative AGI first. Which it then will, because we don't think AI safety is a real concern, the same way you don't think the Christian rapture is a real concern.

The CCP does not think like the West. Nothing says it has to take Western concerns seriously. WE DON'T BELIEVE IN AI RUIN. 

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-09T19:00:23.331Z · LW(p) · GW(p)

Nobody in the US cared either, three years earlier. That superintelligence will kill everyone on Earth is a truth, and one which has gotten easier and easier to figure out over the years. I have not entirely written off the chance that, especially as the evidence gets more obvious, people on Earth will figure out this true fact and maybe even do something about it and survive. I likewise am not assuming that China is incapable of ever figuring out this thing that is true. If your opinion of Chinese intelligence is lower than mine, you are welcome to say, "Even if this is true and the West figures out that it is true, the CCP could never come to understand it". That could even be true, for all I know, but I do not have present cause to believe it. I definitely don't believe it about everyone in China; if it were true and a lot of people in the West figured it out, I'd expect a lot of individual people in China to see it too.

comment by RHollerith (rhollerith_dot_com) · 2023-04-06T23:44:25.425Z · LW(p) · GW(p)

American here. Yes, I would support it -- even if it caused a lot of deaths because the data center is in a populated area. American AI researchers are a much bigger threat to what I care about (i.e., "the human project") than Russia is.

comment by gilch · 2023-04-07T01:40:27.910Z · LW(p) · GW(p)

Not sure if I would put it that strongly, but I think I would not support retaliation for the bombing if it legitimately (after diplomacy) came to that. The bombing country would have to claim to be acting in self-defense, try to minimize collateral damage, and not be doing large training runs themselves.

comment by avturchin · 2023-04-06T13:48:09.347Z · LW(p) · GW(p)

https://imgflip.com/i/7h9d2q

All AI safety was about bombing Russia?

It always was.

comment by Richard_Ngo (ricraz) · 2023-04-10T12:46:34.139Z · LW(p) · GW(p)

Eliezer: Pretty sure that if I ever fail to give an honest answer to an absurd hypothetical question I immediately lose all my magic powers.

I just cannot picture the intelligent cognitive process which lands in the mental state corresponding to Eliezer's stance on hypotheticals, which is actually trying to convince people of AI risk, as opposed to just trying to try [LW · GW] (and yes, I know this particular phrase is a joke, but it's not that far from the truth).

I think the sequences did something incredibly valuable in cataloguing all of these mistakes and biases that we should be avoiding, and it's kinda gut-wrenching to watch Eliezer now going down the list and ticking them all off.

Replies from: lc
comment by lc · 2023-04-10T14:43:30.695Z · LW(p) · GW(p)

I think Eliezer realizes internally that most of his success so far has been due to his unusual, often seemingly self-destructive honesty, and that it'd be a fraught thing to give that up now "because stakes".

comment by Ben (ben-lang) · 2023-04-06T15:14:08.443Z · LW(p) · GW(p)

When I read the letter I thought the mention of an airstrike on a data centre was unhelpful. He could have just said "make it illegal" and left the enforcement mechanisms to imagination.

But, on reflection, maybe he was right to do that. Politicians are selected for effective political communication, and they very frequently talk quite explicitly about long prison sentences for people who violate whatever law they are pushing. Maybe the promise of righteous punishment-dealing makes people more enthusiastic about new rules. ("Yes, you wouldn't be able to do X, but you could participate in the punishment of the bad people who do do it!") Maybe this is too cynical. In any case, a bunch of people on Twitter getting overexcited about a misinterpretation of what he is saying is probably much better than them ignoring him and talking about something else.

Replies from: Making_Philosophy_Better
comment by Portia (Making_Philosophy_Better) · 2023-04-07T00:43:32.401Z · LW(p) · GW(p)

Unsure.

Yes, that sentence stood out and dominated the text and debate in problematic ways.

But if he had left it out - wouldn't the debate have been stuck at "why pass such a law, seeing as you can't realistically enforce it?", because someone else would have had to dare to propose airstrikes, and would not have?

Eliezer isn't stupid. Like, wrong about many things, but he is a very intelligent man. He must have known this letter was akin to getting blacklisted everywhere, and being eternally remembered as the dude who wanted airstrikes against AIs, who will be eternally asked about this. That it will have closed many doors for professional networking and funding. He clearly decided it was worth the extremely small chance of success. Because he gave up on academia and CEOs and tried to reach the public, by spelling it all out.

comment by pseud · 2023-04-06T02:56:59.231Z · LW(p) · GW(p)

It's probably worth noting that Yudkowsky did not really make the argument for AI risk in his article. He says that AI will literally kill everyone on Earth, and he gives an example of how it might do so, but he doesn't present a compelling argument for why it would.[0] He does not even mention orthogonality or instrumental convergence. I find it hard to blame these various internet figures who were unconvinced about AI risk upon reading the article.

[0] He does quote “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”

Replies from: gesild-muka
comment by Gesild Muka (gesild-muka) · 2023-04-06T14:30:03.023Z · LW(p) · GW(p)

The way I took it, the article was meant to bring people to the table regarding AI risk, so there was a tradeoff between keeping the message simple and clear and relaying the best arguments. Even though orthogonality and instrumental convergence are important theories, in this context he probably didn't want to risk the average reader being put off by technical-sounding jargon and losing interest. There could be an entire website in a similar vein to LessWrong about conveying difficult messages to a culture not attuned to the technical aspects involved.

comment by David Bravo (davidbravocomas) · 2023-04-06T15:35:36.308Z · LW(p) · GW(p)

This is a really complicated issue because different priors and premises can lead you to extremely different conclusions.

For example, I see the following as a typical view on AI among the general public:
(the common person is unlikely to go this deep into his reasoning, but could come to these arguments if he had to debate on it)

Premises: "Judging by how nature produced intelligence, and by the incremental progress we are seeing in LLMs, artificial intelligence is likely to be achieved by packing more connections into a digital system. This will allow the AI to generate associations between ideas and find creative solutions to problems more easily, think faster, have greater memory, or be more error-proof.
This will at one point generate an intelligence superior to ours, but it will not be fundamentally different. It will still consist of an entangled network of connections, more powerful and effective than ever, but incapable of "jumping out of the system". These same connections will, in a sense, limit and prevent it from turning the universe into a factory of paperclips when asked to produce more paperclips. If bigger brains hadn't made childbirth dangerous or hadn't been more energy-consuming, nature could have produced a greater, more complex intelligence, without the risk of it destroying the Earth.
Maybe this is not the only way to build an artificial superintelligence, but it seems a feasible way and the most likely path in light of the developments to date. Key issues will need to be settled regarding AI consciousness, its training data, or the subsequent social changes it will bring, but the AI will not be existentially threatening. In fact, greater existential risks would come from having to specify the functions and rules of the AI, as in GOFAI, where you would be more likely to stumble upon the control problem and the like. But in any case, GOFAI would take far too long to develop to be concerning right now."

Conclusion: "Stopping the development of AIs would make sense to solve the above problems, but not at the risk of creating big power conflicts or even of postponing the advent of the benefits of AI."

I do not endorse these views of AI (although I assign a non-negligible probability to superintelligence first coming through this gradual and connectivist, and existentially harmless, increase in capabilities), but if its main cruxes are not clarified and disputed, we might be unable to make people come to different conclusions. So while the Overton window does need to be widened to make the existential concerns of AI have any chance of influencing policies, it might require a greater effort that involves clarifying the core arguments and spreading ideas to e.g. overcome the mind projection fallacy or understand why artificial superintelligence is qualitatively different from human intelligence.

comment by [deleted] · 2023-04-05T20:19:00.331Z · LW(p) · GW(p)

The example given of how this would happen is using recombinant DNA to bootstrap to post-biological molecular manufacturing. The details are not load bearing.

The details are extremely load bearing and all arguments hinge on them. What EY (and you) are claiming is a likely false "foom" hypothesis.

That is, what you are claiming is that:

  1. Intelligence has no upper limit, instead of diminishing sharply in relative utility by the logarithm of intelligence or a power law (you can fit the curves either way), which is what the empirical data says.
  2. High intelligence alone is enough to kill all of us. This is false. A high-intelligence / self-improving system is the same as what happens when an op amp feeds back to itself: it self-amplifies until it hits VCC. In this case, the limit is the lowest of:
     a. The relative sparseness of superior algorithms in the search space of possible AGI models
     b. Available compute
     c. The number of bits of information available for any given real-world parameter, taking into account all data humans have collected (this is not infinite; for many things we have almost no bits, and collecting more is not easy or cheap)
     d. Available money/robotics to do anything at all

 

Once we finally finish the necessary infrastructure to start large-scale RSI-based training runs (where the systems recursively create thousands of additional runs based on the results of all prior runs), we will not get FOOM.

 

Second, even if you are correct, you ask the US government to enforce laws at the barrel of a gun. That's fine, except you have NO EVIDENCE that AGI is hostile, or is as capable as you claim, or support for any of your claims. Yes, I also agree it's possible, but there is no evidence yet that any of this stuff works.

This would be, for example, like restricting particle physics experiments because there might be a way to trick matter into flipping to its antimatter form for less than the mass-energy of the antimatter (and there could be a way to do that, particle physics has not exhausted the possibility space), and thus blow up the planet.

But no one has ever done it, and no experiment has ever been performed that even confirms the danger is real and that in fact nature even allows it.

You would need to wait until particle physics labs are blowing up in dramatic large-yield explosions before you could demand such a government action. (And the government will obviously still build a secure lab and experiment with it, just like they would with AGI, and if EY is correct, that AGI will be uncontainable and escape and kill everyone. So the laws you demand aren't even helping.)

Replies from: Razied, evand, TAG
comment by Razied · 2023-04-05T20:41:38.402Z · LW(p) · GW(p)

Intelligence has no upper limit, instead of diminishing sharply in relative utility by the logarithm of intelligence or a power law (you can fit the curves either way), which is what the empirical data says

The mapping from "average log likelihood of next predicted word" to "intelligence as measured by ability to achieve goals in the world" is completely non obvious, and you can't take the observation that the cross-entropy metric in LLMs scales as a power law in compute to imply that ability-to-achieve-goals scales also with a power law. 
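
For concreteness, the "power law in compute" being referenced is typically of the form reported in the LLM scaling-law literature (approximately, per Kaplan et al., 2020):

$$L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.05,$$

where $L$ is the test cross-entropy loss and $C$ is training compute. Note that this is a statement about next-token prediction loss, which is exactly why it does not directly translate into a claim about ability to achieve goals.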

That's fine, except, you have NO EVIDENCE that AGI is hostile or is as capable as you claim or support for any of your claims

What's your argument against instrumental convergence? Like Stuart Russell perpetually says: a robot designed to fetch the coffee will kill a baby in its way if doing so results in a higher probability of fetching the coffee. Any AGI with goals unaligned to ours will be hostile in this way.

Replies from: Making_Philosophy_Better, None
comment by Portia (Making_Philosophy_Better) · 2023-04-07T00:52:17.985Z · LW(p) · GW(p)

We already have robots that fetch things, and optimise for efficiency, and they do not kill the baby in the way. A bloody roomba is already capable of not running over the cat. ChatGPT is capable of identifying that racism is bad, but that it should use a racial slur if the alternative is destroying the planet. Or of weighing between creativity and accuracy. Because they don't maximise for efficiency above all else, single-mindedly. This is never desired, intended, shown, or encouraged. Other concerns are explicitly encoded. Outside of LW, practically no human thinks total utilitarianism represents their desires, and hence it is not what is taught. And we are no longer teaching explicit laws, but teaching through practice, entailing complexity.

Yes, becoming more efficient, getting more stuff and power is useful for a lot of potential goals, and we would expect a lot of AIs to do it.

Big step from there to "do it, do it without limits, and disregard all else."

Biological life is driven to gain resources and efficiency. And yet only very simple and stupid lifeforms do this to extreme degrees that fuck over all else. Bacteria and algae will destroy their environment that way, yes. Other life forms begin self-regulating. They make trade-offs. They take compromises. This emerges in such simple animals; why would it never emerge in AI, when we explicitly want it and teach it?

comment by [deleted] · 2023-04-05T21:15:02.200Z · LW(p) · GW(p)

For the first part: tons of evidence for that, and see part c. It is not merely the LLM data. This is a generality across all "intelligent" systems; I will need time to produce charts to prove this but it's obviously correct. You can abstract it as adding ever lower-order bits to policy correctness: each additional bit adds less value, and you cannot add more bits than the quality of your input data. (For example, we humans don't know whether aspirin or Tylenol is better to much precision, so a policy of 'give a pill if the human reports mild pain' cannot do better than to randomly pick one. No amount of intelligence helps; a superintelligence cannot make a better decision in this context given the available data. My example is NOT load bearing; I am claiming there are millions of examples of this class, where we do not know if choice A or B is meaningfully different.)

Note that if you give the superintelligence equipment to see the pain centers of human brains in real time, the situation becomes different. Assuming the equipment produces millions of input signals per timestamp, this would be an example of a task where the intelligence of a superintelligence IS useful. Probably there are meaningful differences between the drugs in ground truth.

For the second, I don't have to make an argument, as you have no evidence. Also, a robot designed to fetch coffee in the presence of humans has to be designed accordingly, either with a lot of software so it won't collide with them, or with hardware: cheap plastic gears that strip, low-power motors, and so on, so that killing anyone is unlikely. (The few household robots in existence now use the second approach.)

Replies from: aron-gohr
comment by GoteNoSente (aron-gohr) · 2023-04-05T23:46:41.858Z · LW(p) · GW(p)

It is worth noting that there are entire branches of science that are built around the assumption that intelligence is of zero utility for some important classes of problems. For instance, cryptographers build algorithms that are supposed to be secure against all adversaries, including superintelligences. Roughly speaking, one hopes (albeit without hard proof) that AES is secure (at least in the standard setting of single-key attacks) against all algorithms with a time-memory-data tradeoff significantly better than well-optimized exhaustive search (or, quantumly, Grover search).

Turning the solar system into a Dyson sphere would enable an ASI to break AES-128 (but not AES-192 or AES-256) by brute force search, but it might well be that the intelligence of the ASI would only help with the engineering effort and maybe shave off a small factor in the required computational resources by way of better optimizing brute force search. I find it plausible that there would be many other tasks, even purely mathematical ones, where superintelligence would only yield a zero or tightly bounded planning or execution advantage over smart humans with appropriate tools.
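
For a rough sense of the scales involved (standard counting, not anything specific to this comment): classical exhaustive search over a $k$-bit key takes on the order of $2^k$ trial decryptions, while Grover's algorithm needs on the order of $2^{k/2}$ quantum iterations, and

$$2^{128} \approx 3.4 \times 10^{38}, \qquad 2^{192} \approx 6.3 \times 10^{57}, \qquad 2^{256} \approx 1.2 \times 10^{77},$$

so each 64-bit step up in key size multiplies the required work by a factor of roughly $1.8 \times 10^{19}$, regardless of how clever the searcher is.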

I also find the Yudkowskian argument that an unaligned AI will disassemble everything else because it has better use for the atoms the other things are made of not massively persuasive. It seems likely that it would only have use for some kinds of atoms, and quite plausible that the atoms human bodies are made of would not be very useful to it. Obviously, an unaligned or poorly aligned AI could still cause massive damage, even extinction-level damage, by building an industrial infrastructure that damages the environment beyond repair; rough analogues of this have happened historically, e.g. the Great Oxygenation Event being an example of transformative change to Earth's ecosystems that left said ecosystems uninhabitable for most life as it was before the event. But even this kind of threat would not manifest in a foom-all-dead manner, but instead happen on a timescale similar to the current ecological crisis, i.e. on timescales where in principle societies can react.

 

Replies from: Razied, None
comment by Razied · 2023-04-06T01:14:19.084Z · LW(p) · GW(p)

It seems likely that it would only have use for some kinds of atoms and not very unlikely that the atoms that human bodies are made of would not be very useful to it.

At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through Hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter -> energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the earth) are made of.
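
For context on the "small black holes are hot" claim, the standard semiclassical formulas (general background, not a calculation specific to this comment) are

$$T_H = \frac{\hbar c^3}{8 \pi G M k_B}, \qquad P = \frac{\hbar c^6}{15360\,\pi G^2 M^2},$$

so temperature scales as $1/M$ and radiated power as $1/M^2$: the smaller the hole, the hotter it is and the faster it converts its remaining mass into radiation.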

Replies from: None, Making_Philosophy_Better, aron-gohr
comment by [deleted] · 2023-04-06T01:57:07.741Z · LW(p) · GW(p)

Yes, but does it need 0.000000000001 more atoms? Does natural life and its complexity hold any interest to this superintelligence?

We're assuming a machine single-mindedly fixated on some pointless goal, smart enough to defeat all obstacles yet incredibly stupid in its motivations, and possibly brittle and trickable or self-deceptive. (Self-deceptive: rather than get, say, 10^x paperclips by converting the universe, why not hack itself and convince itself it received infinite clips...)

comment by Portia (Making_Philosophy_Better) · 2023-04-07T00:54:53.311Z · LW(p) · GW(p)

You don't see a difference between "there is a conceivable use for x" and "AI makes use of literally all of x, contrary to any other interests of it or ethical laws it was given"?

Like, I am not saying it is impossible that an LLM turned malicious superintelligent AGI will dismantle all of humanity. But couldn't there be a scenario where it likes to talk to humans, and so keeps some?

Replies from: Razied, lahwran
comment by Razied · 2023-04-07T10:42:31.889Z · LW(p) · GW(p)

You can't give "ethical laws" to an AI, that's just not possible at all in the current paradigm, you can add terms to its reward function or modify its value function, and that's about it. The problem is that if you're doing an optimization and your value function is "+5 per paperclip, +10 per human", you will still completely tile the universe with paperclips because you can make more than 2 paperclips per human. The optimum is not to do a bit of both, keeping humans and paperclips in proportion to their terms in the reward function, the optimum is to find the thing that most efficiently gives you reward then go all in on that one thing.

Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.

You could give it a value function like "+1 if there is at most 1000 paperclips and at most 1000 humans, 0 otherwise" and it will keep 1000 humans and paperclips around (in unclear happiness), but it will still take over the universe in order to maximize the probability that it has in fact achieved its goal. It's maximizing the expectation of future reward, so it will ruthlessly pursue any decrease in the probability that there aren't really 1000 humans and paperclips around. It might build incredibly sophisticated measurement equipment, and spend all its resources self-modifying in order to be smarter and think of yet more ways it could be wrong.
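
A minimal toy sketch of the linear-reward point above (my own illustration; the per-item resource costs are made up for the example): with a linear objective and one shared resource budget, the optimum is a corner solution that spends everything on whichever option yields the most reward per unit of resource, rather than keeping humans and paperclips in any proportion.

```python
# Toy illustration of maximizing a linear reward ("+10 per human, +5 per
# paperclip") under a single shared resource budget. The per-item costs are
# hypothetical numbers chosen only for this example.

def best_allocation(budget, options):
    """options maps name -> (reward_per_item, cost_per_item).

    With a linear objective and one linear constraint, the maximum is at a
    vertex: spend the entire budget on the option with the highest
    reward-per-unit-cost, and nothing on the rest.
    """
    best = max(options, key=lambda name: options[name][0] / options[name][1])
    return {name: (budget / cost if name == best else 0.0)
            for name, (reward, cost) in options.items()}

if __name__ == "__main__":
    allocation = best_allocation(
        budget=1_000_000.0,
        options={"human": (10.0, 1000.0), "paperclip": (5.0, 1.0)},
    )
    print(allocation)
    # -> {'human': 0.0, 'paperclip': 1000000.0}
    # A human is "worth" twice a paperclip, yet the optimizer allocates nothing
    # to humans, because paperclips give far more reward per unit of resource.
```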

Replies from: TAG, Making_Philosophy_Better
comment by TAG · 2023-04-07T14:41:38.971Z · LW(p) · GW(p)

Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes and does that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.

Current LLMs aren't talking to us because they get rewarded for talking to us at all; rewards only shape how they talk.

comment by Portia (Making_Philosophy_Better) · 2023-04-07T12:33:32.930Z · LW(p) · GW(p)

But you are still thinking in utilitarian terms here, where theoretically there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this; we see the one as impossible to outweigh with the other. AI already does not think this. They have already dumped reasoning, instructions and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.

This isn't an AI issue, it is an issue with total utilitarianism.

Replies from: Razied
comment by Razied · 2023-04-07T13:00:43.236Z · LW(p) · GW(p)

Oh, I see what you mean, but GPT's ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don't think that it models the world internally as flat. Asking GPT "what do you believe" does not at all guarantee that it will output what it actually believes. I'm a utilitarian, and I can also convincingly simulate the outputs of deontologists, one doesn't prevent the other.

Replies from: Making_Philosophy_Better
comment by Portia (Making_Philosophy_Better) · 2023-04-08T23:42:39.964Z · LW(p) · GW(p)

Whether the LLM believes this, or merely simulates it, seems to be beside the point?

The LLM can apply moral reasoning relatively accurately. It will do so spontaneously when problems occur, detecting them. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistently. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue against them, and it analyses and defends them correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you'd expect it to show if it understood ethical nuance.

Regardless of which internal states you assume the AI has, or whether you assume it has none at all - this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. And that the belief that that is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.

comment by the gears to ascension (lahwran) · 2023-04-07T10:56:16.419Z · LW(p) · GW(p)

For what it's worth, I don't think it's at all likely that a pure language model would kill all humans. Seems more like a hyperdesperate reinforcement learner thing to do.

comment by GoteNoSente (aron-gohr) · 2023-04-06T23:26:37.798Z · LW(p) · GW(p)

At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through Hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter -> energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the earth) are made of.
 

I don't think this follows. Even if there is engineering that overcomes the practical obstacles towards building and maintaining a black hole power plant, it is not clear a priori that converting a non-negligible percentage of available atoms into energy would be required or useful for whatever an AI might want to do. At some scale, generating more energy does not advance one's goals, but only increases the waste heat emitted into space.

Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization's industries, due exactly to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems on the way. I don't see why normal environmental regulations couldn't stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.

An unaligned superintelligence would be more efficient than humans at pursuing its goals on all levels of execution, from basic scientific work to technical planning and engineering to rallying social support for its values. It would therefore be a formidable adversary. In a world where it would be the only one of its kind, its soft power would in all likelihood be greater than that of a large nation-state (and I would argue that, in a sense, something like GPT-4 would already wield an amount of soft power rivalling many nation-states if its use were as widespread as, say, that of Google). It would not, however, be able to work miracles and its hard power could plausibly be bounded if military uses of AI remain tightly regulated and military computing systems are tightly secured (as they should be anyway, AGI or not).

Obviously, these assumptions of controllability do not hold forever (e.g. into a far future setting, where the AI controls poorly regulated off-world industries in places where no humans have any oversight). But especially in a near-term, slow-takeoff scenario, I do not find the notion compelling that the result will be immediate intelligence explosion unconstrained by the need to empirically test ideas (most ideas, in human experience, don't work) followed by rapid extermination of humanity as the AI consumes all resources on the planet without encountering significant resistance.

If I had to think of a realistic-looking human extinction through AI scenario, I would tend to look at AI massively increasing per capita economic output, thereby generating comfortable living conditions for everyone, while quietly engineering life in a way intended to stop population explosion, but resulting in maintained below-replacement birth rates. But this class of extinction scenario does leave a lot of time for alignment and would seem to lead to continued existence of civilization.

Replies from: Razied
comment by Razied · 2023-04-07T11:32:14.725Z · LW(p) · GW(p)

At some scale, generating more energy does not advance one's goals, but only increases the waste heat emitted into space.

Sure, the AI probably can't use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it's going to want to store that mass-energy for later (saving up for the heat-death of the universe), and the configuration of atoms efficiently stored for future energy conversion doesn't look at all like humans, with our wasteful bodies at temperatures measured in the hundreds of billions of nanoKelvins.

 Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization's industries, due exactly to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems on the way. I don't see why normal environmental regulations couldn't stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.

I think we're imagining slightly different things by "superintelligence", because in my mind the obvious first move of the superAI is to kill literally all humans before we ever become aware that such an entity existed, precisely to avoid even the minute chance that humanity is able to fight back in this way. The oft-quoted way around these parts that the AI can kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, then shipped to the door of a dumb human who's being manipulated by the AI to mix various powders together, creating either a virus much more lethal than anything we've ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins. Or a variety of multiple viruses at the same time.

Replies from: aron-gohr
comment by GoteNoSente (aron-gohr) · 2023-04-08T04:52:26.387Z · LW(p) · GW(p)

Sure, the AI probably can't use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it's going to want to store that mass-energy for later (...)
 

If the AI can indeed engineer black-hole powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology and less abundant fuel as its terminal energy source, such as hydrogen-hydrogen fusion reactors. Almost irrespective of what its terminal goals are, it will have more immediate concerns than going after that rounding error. Likewise, it would in all likelihood have more pressing worries than trying to plan out its future to the heat death of the universe (because it would recognize that no such plan will survive its first billion years, anyway).

I think we're imagining slightly different things by "superintelligence", because in my mind the obvious first move of the superAI is to kill literally all humans (...) The oft-quoted way around these parts that the AI can kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, (...creating...) a virus much more lethal than anything we've ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins.
 

I am imagining by "superintelligence" an entity that is for general cognition approximately what Stockfish is for chess: globally substantially better at thinking than any human expert in any domain, although possibly with small cognitive deficiencies remaining (similar to how it is fairly easy to find chess positions that Stockfish fails to understand but that are not difficult for humans). It might be smarter than that, of course, but anything with these characteristics would qualify as an SI in my mind.

I don't find the often-quoted diamondoid bacteria very convincing. Of course it's just a placeholder here, but still I cannot help but note that producing diamondoid cell membranes would, especially in a unicellular organism, more likely be an adaptive disadvantage (cost, logistics of getting things into and out of the cell) than a trait that is conducive to grey-gooing all naturally evolved organisms. More generally, it seems to me that the argument from bioweapons hinges on the ability of the superintelligence to develop highly complex biological agents without significant testing. It furthermore needs to develop them in such a way, again without testing, that they are quickly and quietly lethal after spreading through all or most of the human population without detection. In my mind, that combination of properties borders on assuming the superintelligence has access to magic, at least in a world that has reasonable controls against access to biological weapons manufacturing and design capabilities in place. 

When setting in motion such a murderous plan, the AI would also, on its first try, have to be extremely certain that it is not going to get caught if it is playing the long game we assume it is playing. Otherwise cooperation with humans followed by expansion beyond Earth seems like a less risky strategy for long-term survival than hoping that killing everyone will go right and hoping that there is indeed nothing left for it to learn from living organisms.

 

comment by [deleted] · 2023-04-06T00:10:10.204Z · LW(p) · GW(p)

Yud has several glaring errors:

  1. Through AGI delays he wants the certain death of most living humans for the fantasy of humans discovering AI alignment without full scale AGIs to actually test and iterate on

  2. He claims foom

  3. He claims agentic goals are automatic and they are all against humans (almost all demons not angels)

  4. He claims systems so greedy that given the matter of the entire galaxy, including near-term mining of many planets including Earth, they would choose to kill all humans and natural life for a rounding error of extra atoms. This is rather irrational and stupid and short-sighted for a superintelligence.

  5. He has ignored reasonable and buildable AGI systems proposed by Eric fucking Drexler himself, on this very site, and seems to pretend the idea doesn't exist.

  6. He has asked not just for AGI delays, but risking nuclear war if necessary to enforce them

  7. The supposed benefit of all this is some world of quadrillions of living humans. But that world may not happen; it makes no sense to choose actions that kill billions of people living now (from aging and nuclear war) for people who may never exist.

  8. Alignment proposals he has described are basically impossible, while CAIS is just straightforward engineering and we don't need to delay anything; it's the default approach.

Unfortunately I have to start to conclude EY is not rational or worth paying attention to, which is ironic.

Replies from: dxu, adrian-arellano-davin, Tapatakt, adrian-arellano-davin
comment by dxu · 2023-04-06T04:22:37.752Z · LW(p) · GW(p)

Do you have preferred arguments (or links to preferred arguments) for/against these claims? From where I stand:

Point 1 looks to be less a positive claim and more a policy criticism (for which I'd need to know what specifically you dislike about the policy in question to respond in more depth); points 2 and 3 are straightforwardly true statements on my model (albeit I'd somewhat weaken my phrasing of point 3; I don't necessarily think agency is "automatic", although I do consider it quite likely to arise by default); point 4 seems likewise true, because the argmax function is only sensitive to the sign of the difference in magnitude, not the difference itself; point 5 is the kind of thing that would benefit immensely from liberal usage of hyperlinks; point 6 is again a policy criticism in need of corresponding explanation; point 7 seems ill-supported and would benefit from more concrete analysis (both numerically, i.e. where are you getting your numbers, and probabilistically, i.e. how are you assigning your likelihoods); and point 8 again seems like the kind of thing where links would be immensely beneficial.

On the whole, I think your comment generates more heat than light, and I think there were significantly better moves available to you if your aim was to open a discussion (several of which I predict would have resulted in comments I would counterfactually have upvoted). As it is, however, your comment does not meet the bar for discourse quality I would like to see for comments on LW, which is why I have given it a strong downvote (and a weak disagree-vote).

Replies from: None
comment by [deleted] · 2023-04-06T12:23:37.792Z · LW(p) · GW(p)
  1.  one is straightforwardly true.  Aging is going to kill every living creature.  Aging is caused by complex interactions between biological systems and bad evolved code.  An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/ all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging.  A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.
  2.  This is not possible per the laws of physics.  Intelligence isn't the only factor.  I don't think we can have a reasonable discussion if you are going to maintain a persistent belief in magic.  Note by foom I am claiming you believe in a system that solely based on a superior algorithm will immediately take over the planet.  It is not affected by compute, difficulty in finding a recursively better algorithm, diminishing returns on intelligence in most tasks, or money/robotics.  I claim each of these obstacles takes time to clear.  (time = decades)
  3. Who says the system needs to be agentic at all or long running?  This is bad design.  EY is not a SWE.
  4. This is an extension of (3)
  5. https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion [LW · GW]  https://www.lesswrong.com/posts/5hApNw5f7uG8RXxGS/the-open-agency-model [LW · GW]
  6. This is irrational because no discount rate.  Risking a nuclear war raises the pkill of millions of people now.  The quadrillions of people this could 'save' may never exist because of many unknowns, hence there needs to be a large discount rate.
  7. This is also 6.
  8. CAIS is an extension of stateless microservices, and is how all reliable software built now works.  Giving the machines self modification or a long running goal is not just bad because it's AI, it's generally bad practice.


     
Replies from: dxu, quetzal_rainbow, joachim-bartosik
comment by dxu · 2023-04-06T21:17:33.040Z · LW(p) · GW(p)
  1. one is straightforwardly true. Aging is going to kill every living creature. Aging is caused by complex interactions between biological systems and bad evolved code. An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/ all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging. A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.

To be clear: I am straightforwardly in favor of longevity research—and, separately, I am agnostic on the question of whether superhuman general intelligence is necessary to crack said research; that seems like a technical challenge, and one that I presently see no reason to consider unsolvable at current levels of intelligence. (I am especially skeptical of the part where you seemingly think a solution will look like "analyzing thousands of simultaneous interactions across millions of patients and model all binding sites in a living human"—especially as you didn't argue for this claim at all.) As a result, the dichotomy you present here seems clearly unjustified.

(You are, in fact, justified in arguing that doing longevity research without increased intelligence of some kind will cause the process to take longer, but (i) that's a different argument from the one you're making, with accordingly different costs/benefits, and (ii) even accepting this modified version of the argument, there are more ways to get to "increased intelligence" than AI research—human intelligence enhancement, for example, seems like another viable road, and a significantly safer one at that.)

  2. This is not possible per the laws of physics. Intelligence isn't the only factor. I don't think we can have a reasonable discussion if you are going to maintain a persistent belief in magic. Note by foom I am claiming you believe in a system that solely based on a superior algorithm will immediately take over the planet. It is not affected by compute, difficulty in finding a recursively better algorithm, diminishing returns on intelligence in most tasks, or money/robotics. I claim each of these obstacles takes time to clear. (time = decades)

I dispute that FOOM-like scenarios are ruled out by laws of physics, or that this position requires anything akin to a belief in "magic". (That I—and other proponents of this view—would dispute this characterization should have been easily predictable to you in advance, and so your choice to adopt this phrasing regardless speaks ill of your ability to model opposing views.)

The load-bearing claim here (or rather, set of claims) is, of course, located within the final parenthetical: ("time = decades"). You appear to be using this claim as evidence to justify your previous assertions that FOOM is physically impossible/"magic", but this ignores that the claim that each of the obstacles you listed represents a decades-long barrier is itself in need of justification.

(Additionally, if we were to take your model as fact—and hence accept that any possible AI systems would require decades to scale to a superhuman level of capability—this significantly weakens the argument from aging-related costs you made in your point 1, by essentially nullifying the point that AI systems would significantly accelerate longevity research.)

  3. Who says the system needs to be agentic at all or long running? This is bad design. EY is not a SWE.

Agency does not need to be built into the system as a design property, on EY's model or on mine; it is something that tends to naturally arise (on my model) as capabilities increase, even from systems whose inherent event/runtime loop does not directly map to an agent-like frame. You have not, so far as I can tell, engaged with this model at all; and in the absence of such engagement "EY is not a SWE" is not a persuasive counterargument but a mere ad hominem.

(Your response folded point 4 into point 3, so I will move on to point 5.)

  5. https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion [LW · GW] https://www.lesswrong.com/posts/5hApNw5f7uG8RXxGS/the-open-agency-model [LW · GW]

Thank you very much for the links! For the first post you link, the top comment [LW(p) · GW(p)] is from EY, in direct contradiction to your initial statement here:

  5. He has ignored reasonable and buildable AGI systems proposed by Eric fucking Drexler himself, on this very site, and seems to pretend the idea doesn't exist.

Given the factual falsity of this claim, I would request that you explicitly acknowledge it as false, and retract it; and (hopefully) exercise greater moderation (and less hyperbole) in your claims about other people's behavior in the future.

In any case—setting aside the point that your initial allegation was literally false—EY's comment on that post makes [what looks to me like] a reasonably compelling argument against the core of Drexler's proposal. There follows some back-and-forth between the two (Yudkowsky and Drexler) on this point. It does not appear to me from that thread that there is anything close to a consensus that Yudkowsky was wrong and Drexler was right; both commenters received large amounts of up- and agree-votes throughout.

Given this, I think the takeaway you would like for me to derive from these posts is less clear than you would like it to be, and the obvious remedy would be to state specifically what it is you think is wrong with EY's response(s). Is it the argument you made in this comment [LW(p) · GW(p)]? If so, that seems essentially to be a restatement of your point 2, phrased interrogatively rather than declaratively—and my objection to that point can be considered to apply here as well.

  6. This is irrational because no discount rate. Risking a nuclear war raises the pkill of millions of people now. The quadrillions of people this could 'save' may never exist because of many unknowns, hence there needs to be a large discount rate.

P(doom) is unacceptably high under the current trajectory (on EY's model). Do you think that the people who are alive today will not be counted towards the kill count of a future unaligned AGI? The value that stands to be destroyed (on EY's model) consists, not just of these quadrillions of future individuals, but each and every living human who would be killed in a (hypothetical) nuclear exchange, and then some.

You can dispute EY's model (though I would prefer you do so in more detail than you have up until now—see my replies to your other points), but disputing his conclusion based on his model (which is what you are doing here) is a dead-end line of argument: accepting that ASI presents an unacceptably high existential risk makes the relevant tradeoffs quite stark, and not at all in doubt.

(As was the case with points 4/5, point 7 was folded into point 6, and so I will move on to the final point.)

  8. CAIS is an extension of stateless microservices, and is how all reliable software built now works. Giving the machines self modification or a long running goal is not just bad because it's AI, it's generally bad practice.

Setting aside that you (again) didn't provide a link, my current view is that Richard Ngo has provided some reasonable commentary on CAIS as an approach [LW · GW]; my own view largely accords with his on this point and so I think claiming this as the one definitive approach to end all AI safety approaches (or anything similar) is massively overconfident.

And if you don't think that—which I would hope you don't!—then I would move to asking what, exactly, you would like to convey by this point. "CAIS exists" is true, and not helpful; "CAIS seems promising to me" is perhaps a weaker but more defensible claim than the outlandish one given above, but nonetheless doesn't seem strong enough to justify your initial statement:

  8. Alignment proposals he has described are basically impossible, while CAIS is just straightforward engineering and we don't need to delay anything; it's the default approach.

So, unfortunately, I'm left at present with a conclusion that can be summarized quite well by taking the final sentence of your great-grandparent comment, and performing a simple replacement of one name with another:

Unfortunately I have to start to conclude [Gerald Monroe] is not rational or worth paying attention to, which is ironic.

Replies from: None
comment by [deleted] · 2023-04-07T16:35:43.077Z · LW(p) · GW(p)

Well argued but wrong.

At the end of the day, either robot doubling times, machinery production rates, real-world chip production rates, time for robots to collect scientific data, and time for compute to search the algorithm space take decades, or they do not.

At the end of the day, either EY continues to internalize CAIS in future arguments or he does not. It was not a false claim; I am saying he now pretends the idea doesn't exist in talks about alignment he made after Drexler's post.

Either you believe in ground truth reality or you do not. I don't have the time or interest to get sucked into a wordcel fight over definitions of words. Either ground truth reality supports the following claims or it does not:

  1. EY and you continue to factor in CAIS, which is modern software engineering, or you don't
  2. The worst of the 4 factors (data, compute, algorithms, robotics/money) takes decades to allow foom, or it doesn't.

If ground truth reality supports 1 and 2 I am right, if it does not I am wrong. Note foom means "become strong enough to conquer the planet". Slowing down aging enough for LEV is a far lesser goal and thus your argument there is also false.

Pinning my beliefs to falsifiable things is rational.

Replies from: dxu
comment by dxu · 2023-04-07T22:04:31.051Z · LW(p) · GW(p)

You continue to assert things without justification, which is fine insofar as your goal is not to persuade others. And perhaps this isn't your goal! Perhaps your goal is merely to make it clear what your beliefs are, without necessarily providing the reasoning/evidence/argumentation that would convince a neutral observer to believe the same things you do.

But in that case, you are not, in fact, licensed to act surprised, and to call others "irrational", if they fail to update to your position after merely seeing it stated. You haven't actually given anyone a reason they should update to your position, and so—if they weren't already inclined to agree with you—failing to agree with you is not "irrational", "wordcel", or whatever other pejorative you are inclined to use, but merely correct updating procedure.

So what are we left with, then? You seem to think that this sentence says something meaningful:

If ground truth reality supports 1 and 2 I am right, if it does not I am wrong.

but it is merely a tautology: "If I am right I am right, whereas if I am wrong I am wrong." If there is additional substance to this statement of yours, I currently fail to see it. This statement can be made for any set of claims whatsoever, and so to observe it being made for a particular set of claims does not, in fact, serve as evidence for that set's truth or falsity.

Of course, the above applies to your position, and also to my own, as well as to EY's and to anyone else who claims to have a position on this topic. Does this thereby imply that all of these positions are equally plausible? No, I claim—no more so than, for example, "either I win the lottery or I don't" implies a 50/50 spread on the outcome space. This, I claim, is structurally isomorphic to the sentence you emitted, and equally as invalid.

Arguing that a particular possibility ought to be singled out as likelier than the others requires more than just stating it and thereby privileging it with all of your probability mass [LW · GW]. You must do the actual hard work of coming up with evidence, and interpreting that evidence so as to favor your model over competing models. This is work that you have not yet done, despite being many comments deep into this thread—and is therefore substantial evidence in my view that it is work you cannot do (else you could easily win this argument—or at the very least advance it substantially—by doing just that)!

Of course, you claim you are not here to do that. Too "wordcel", or something along those lines. Well, good for you—but in that case I think the label "irrational" applies squarely to one participant in this conversation, and the name of that participant is not "Eliezer Yudkowsky".

Replies from: None
comment by [deleted] · 2023-04-08T23:32:39.738Z · LW(p) · GW(p)

You've done an excellent job of arguing your points.  It doesn't mean they are correct, however.

Would you agree that if you made a perfect argument against the theory of relativity (numerous contemporary physicists did) it was still a waste of time?

In this context, let's break open the object-level argument, because only the laws of physics get a vote - you don't and I don't.

The object-level argument is that the worst of the factors below determines whether foom is possible:
 

   1.  Compute.  Right now there is a shortage of compute, and with a bit of rough estimating the shortage is actually pretty severe.  Nvidia makes approximately 60 million GPUs per year, of which 500k-1000k are A/H100s.  This is based on taking their data center revenue (source: WSJ) and dividing by an estimated cost per chipset of (10k, 20k).  Compute production can be increased, but the limit would be all the world's 14nm or better silicon dedicated to producing AI compute.  This can be increased but it takes time.
Let's estimate how many humans' worth of labor an AI system with access to all new compute could provide (old compute doesn't matter due to a lack of interconnect bandwidth).  If a GPT-4 instance requires a full DGX "supercompute" node, which is 8 H100s with 80 GB of memory each (so approximately 1T weights in fp16), how much would it require for realtime multimodal operation?  Let's assume 4x the compute, which may be a gross underestimate.  So 8 more cards are running at least 1 robot in real time, 8 more are processing images for vision, and 8 more for audio i/o and helper systems for longer duration memory context.

So then if all new cards are used for inference, 1M/32 = 31,250 "instances" worth of labor.  Since they operate 24 hours a day this is equivalent to perhaps 100k humans?  If all of the silicon Nvidia has the contract rights to build is going into H100s, this scales by about 30 times, or 3M humans.  And most of those instances cannot be involved in world takeover efforts; they have to be collecting revenue for their owners.  If Nvidia gets all the silicon in the world (this may happen as it can outbid everyone else) it gives them approximately another OOM.  Still not enough.  There are bottlenecks on increasing chip production. This also links to my next point:

2.  Algorithm search space.  Every search of a possible AGI design that is better than what you have requires a massive training run.  Each training run occupies tens of thousands of GPUs for around 1 month, give or take (source: the LLaMA paper, which was sub-GPT-4 in performance; they needed 2048 A100s for 3 weeks for the 65B model).  Presumably searching this space is a game of diminishing returns: to find an algorithm better than the best you currently have requires increasingly large numbers of searches and compute - compute that can't be spent on exploiting the algorithm you have right now.
 

3.  Robotics/money: for an AGI to actually take over, it has to redirect resources to itself.  And this assumes humans don't simply use CAIS and have thousands of stateless AI systems separately handling these real-world tasks.  Robotics is especially problematic: you know and I know how poor the current hardware is, and there are budget cuts and layoffs in many of the cutting-edge labs.  The best robotics hardware company, Boston Dynamics, keeps getting passed around as each new owner can't find a way to make money from it.  So it takes time - time to develop new robotics hardware.  Time to begin mass production.  Time for the new robotics produced by the first round of production to begin assisting with the manufacture of itself.  Time for the equipment in the real world to begin to fail from early failures after a few thousand hours, then for the design errors to be found and fixed.  This puts years on the clock, likely decades.  It requires humans to both build massive amounts of robotic equipment, and then put it all under the control of either insecure narrow AI task-performing systems, or to stupidly centralize control in large AGIs.

4.  Data.  This is explained much better by https://www.lesswrong.com/posts/qpgkttrxkvGrH9BRr/superintelligence-is-not-omniscience [LW · GW] .  The chaos means that in order for any system to develop new tools for novel tasks, the system needs sufficiently high quality information about the task domain or the tool building is not possible.  This prevents blind nanoforge building (what EY is talking about when he says someone could bootstrap to diamondoid nanotechnology from wet protein biology) or blind species killing bioweapon construction.

"blind" means 'without sufficient direct observation and a large number of domain experiments'.  Meaning I am claiming it is flat impossible to accomplish these things without a large amount of automated equipment or human lab technicians acting as paid willing stooges to investigate the domain in question and empirically develop step by step a path to a solution.

 

Speaking in probabilities, the above is a joint probability, meaning that so long as the AGI system is limited by any of the 4 terms, it is throttled and unable to proceed with foom. Putting some numbers on it:

90% for compute (we have direct evidence), 95% for robotics (direct evidence), 99% for data (near certain evidence), 50% for searching AGI spaces (quite possible there are findable much stronger algorithms in the search space)

So the probability of foom is the series probability of all 4, as all 4 barriers must fail for foom to happen.

so 0.1 x 0.05 x 0.01 x 0.5 = 0.000025 pfoom

Notice that you can put quite a lot of doubt on my estimates.  Say you do not believe me, and believe the chance I am right on each term is 25%.

pfoom = 0.32 

Not enough to be as confident we're all dead as EY is.  Though I concede a 32% chance of the end of humanity isn't super great.
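
A minimal sketch rechecking the arithmetic above, taking the stated figures (roughly 1M H100-class cards per year, 32 cards per real-time instance, and the four per-barrier probabilities) as given assumptions rather than as independent estimates:

```python
# Labor estimate, using the figures stated above as assumptions.
h100_per_year = 1_000_000          # upper end of the stated A/H100 production estimate
cards_per_instance = 32            # 8 for the base model + 24 assumed multimodal/realtime overhead
instances = h100_per_year / cards_per_instance
human_equivalents = instances * 3  # ~24h/day operation vs. a human workday
print(f"instances: {instances:,.0f}, human-equivalents: ~{human_equivalents:,.0f}")
# -> instances: 31,250, human-equivalents: ~93,750 (i.e. roughly 100k)

# Joint probability of foom: on this model, every barrier must fail for foom to happen.
p_barrier_holds = {"compute": 0.90, "robotics": 0.95, "data": 0.99, "algorithms": 0.50}

def p_foom(p_holds):
    p = 1.0
    for q in p_holds.values():
        p *= (1.0 - q)  # probability this particular barrier fails to block foom
    return p

print(f"pfoom (stated estimates): {p_foom(p_barrier_holds):.6f}")      # 0.000025
# Sensitivity check as stated above: 25% credence that each barrier estimate is right.
print(f"pfoom (25% credence per barrier): {(1.0 - 0.25) ** 4:.2f}")    # ~0.32
```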

 

As for the last part:


CAIS means you use small, isolated AI systems that tested extremely well on the training set. You check with an autoencoder how compressible the current input fed to the agent at this timestep is relative to the state space of the training set. What you are doing is validating that it is impossible for the AI system to know that the current input is not from the training set. This both makes deception difficult and makes the probability of success for the AI system on the current task similar to how it benchmarked on the training set.

If the real world input is outside that space, you transfer control to a different agent, which in the worst case is an embedded control system using classic control techniques.  

This is currently how autonomous cars more or less work (depends on the team and the project).  
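
A minimal sketch of that gating pattern, assuming you already have a trained autoencoder and a reconstruction-error threshold calibrated on held-out training data; the names, threshold, and stand-in components below are illustrative, not from any specific system:

```python
import numpy as np

class OODGate:
    """Route an input to the learned policy only if it looks like training data.

    `autoencoder` is any callable trained on the same distribution as the policy.
    If reconstruction error exceeds a threshold calibrated on held-out training data,
    control falls back to a conventional controller (classic control / rule-based logic).
    """

    def __init__(self, autoencoder, policy, fallback_controller, threshold):
        self.autoencoder = autoencoder
        self.policy = policy
        self.fallback = fallback_controller
        self.threshold = threshold

    def reconstruction_error(self, x):
        x = np.asarray(x, dtype=float)
        x_hat = np.asarray(self.autoencoder(x), dtype=float)
        return float(np.mean((x - x_hat) ** 2))

    def act(self, x):
        if self.reconstruction_error(x) <= self.threshold:
            return self.policy(x)   # input is compressible to the training distribution
        return self.fallback(x)     # out-of-distribution: hand over to the safe controller

# Illustrative usage with stand-in components (not a real autoencoder or policy):
pretend_autoencoder = lambda x: x * 0.99
learned_policy = lambda x: "policy_action"
safe_fallback = lambda x: "fallback_action"

gate = OODGate(pretend_autoencoder, learned_policy, safe_fallback, threshold=0.05)
print(gate.act(np.ones(8)))       # small reconstruction error -> "policy_action"
print(gate.act(np.ones(8) * 50))  # large reconstruction error -> "fallback_action"
```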


I have several years of experience actually working on embedded ML systems, and many more years on embedded controls. The above is correct. Eliezer Yudkowsky was wrong to dismiss it.

Note that Eliezer has mentioned that ML teams are going to need to find "some way" to get from - I think he estimated about an 80% chance that a GPT-3 style agent is correct on a question - to the many 9s of real-world reliability.

Stateless, well-isolated systems are one of the few ways human engineers know how to accomplish that. So we may get a significant amount of AI safety by default, simply in order to meet requirements.

comment by quetzal_rainbow · 2023-04-07T18:55:03.553Z · LW(p) · GW(p)

Of course, Eliezer knows about CAIS. He just thinks that it is a clever idea that has no chance to work.

It's very funny that you think AI can solve the very complex problem of aging, but don't believe that AI can solve the much simpler problem of "kill everyone".

comment by Joachim Bartosik (joachim-bartosik) · 2023-04-06T13:22:30.654Z · LW(p) · GW(p)

one is straightforwardly true.  Aging is going to kill every living creature.  Aging is caused by complex interactions between biological systems and bad evolved code.  An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/ all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging.  A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.

 

There are alternative mitigations to the problem:

  • Anti aging research
  • Cryonics

I agree that it's bad that most people currently alive are apparently going to die. However I think that since mitigations like that are much less risky we should pursue them rather than try to rush AGI.

Replies from: None
comment by [deleted] · 2023-04-06T13:42:08.240Z · LW(p) · GW(p)

I think the odds of success (epistemic status: I went to medical school but dropped out) are low if you mean "humans without help from any system more capable than current software" are researching aging and cryonics alone.

They are both extremely difficult problems.

So the tradeoff is "everyone currently alive and probably their children" vs "future people who might exist".

I obviously lean one way, but this is what the choice is between: certain death for everyone alive (by not improving AGI capabilities) in exchange for preventing a possible sooner death for everyone alive, for the sake of future people who may never exist no matter the timeline.

comment by mukashi (adrian-arellano-davin) · 2023-04-06T01:38:42.793Z · LW(p) · GW(p)

I can't agree with you more. But this is a complicated position to maintain here on LW, and one that gives you a lot of negative karma.

Replies from: None
comment by [deleted] · 2023-04-06T12:35:56.772Z · LW(p) · GW(p)

Yep.  I have some posts that are +10 karma -15 disagree or more.

Nobody ever defends their disagreements though...

One person did and they more or less came around to my pov.

comment by Tapatakt · 2023-04-06T12:49:46.972Z · LW(p) · GW(p)

2. Depends on what you mean by "claims foom". As I understand it, EY now thinks that foom isn't necessary anyway; an AGI can kill us before it happens.

4. "I don't like it" != "irrational and stupid and short-sighted"; you need arguments for why it isn't preferable in terms of the values of this system.

6, 7. "be ready to enforce a treaty" != "choose actions to kill billions of people living now".

Replies from: None
comment by [deleted] · 2023-04-06T13:18:42.789Z · LW(p) · GW(p)
  1. Then he needs to show how; saying intelligence alone with no physical resources is enough is not realistic.

  2. Because maximizers are not how SOTA AI is built.

  3. It works out to be similar.

comment by mukashi (adrian-arellano-davin) · 2023-04-07T08:45:30.184Z · LW(p) · GW(p)

Your comment is sitting at positive karma only because I strong-upvoted it. It is a good comment, but people on this site are very biased in the opposite direction. And this bias is going to eventually drive non-doomers away from this site (probably many have already left), and LW will continue descending in a spiral of non-rationality. I really wonder how people in 10 or 15 years, when we are still around in spite of powerful AGI being widespread, will rationalize that a community devoted to the development of rationality ended up being so irrational. And that was my last comment showing criticism of doomers; every time I do it, it costs me a lot of karma.

Replies from: Mitchell_Porter, None
comment by Mitchell_Porter · 2023-04-08T01:39:33.489Z · LW(p) · GW(p)

I wonder what you envision when you think of a world where "powerful AGI" is "widespread". 

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2023-04-08T02:55:13.774Z · LW(p) · GW(p)

Certainly no paperclips

Replies from: Mitchell_Porter
comment by Mitchell_Porter · 2023-04-08T05:19:55.392Z · LW(p) · GW(p)

How about AIs that are off the leash of human control, making their own decisions and paying their own way in the world? Would there be any of those?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2023-04-08T07:34:06.626Z · LW(p) · GW(p)

That's a possibility

Replies from: Mitchell_Porter, Raemon
comment by Mitchell_Porter · 2023-04-10T06:58:38.241Z · LW(p) · GW(p)

I finally noticed your anti-doom [LW · GW] post. Mostly you seem to be skeptical about the specific idea of the single superintelligence that rapidly bootstraps its way to control of the world. The complexity and uncertainty of real life means that a competitive pluralism will be maintained. 

But even if that's so, I don't see anything in your outlook which implies that such a world will be friendly to human beings. If people are fighting for their lives under conditions of AI-empowered social Darwinism, or cowering under the umbrella of AI superpowers that are constantly chipping away at each other, I doubt many people are going to be saying, oh those foolish rationalists of the 2010s who thought it was all going to be over in an instant. 

Any scenario in which AIs have autonomy, general intelligence, and a need to compete, just seems highly unstable from the perspective of all-natural unaugmented human beings remaining relevant. 

Replies from: TAG, adrian-arellano-davin
comment by TAG · 2023-04-10T11:08:33.972Z · LW(p) · GW(p)

Doom is doom, dystopia is dystopia.

comment by mukashi (adrian-arellano-davin) · 2023-04-10T08:39:01.850Z · LW(p) · GW(p)

I guess I will break my recently self-imposed rule of not talking about this anymore. 

I can certainly envision a future where multiple powerful AGIs fight against each other and are used as weapons; some might be rogue AGIs and others might be at the service of human-controlled institutions (such as nation-states). To put it more clearly: I have trouble imagining a future where something along these lines DOES NOT end up happening.

But, this is NOT what Eliezer is saying. Eliezer is saying:

The Alignment problem has to be solved AT THE FIRST TRY because once you create this AGI we are dead in a matter of days (maybe weeks/months, it does not matter). If someone thinks that Eliezer is saying something else, I think they are not listening properly.  Eliezer can have many flaws but lack of clarity is not one of them.

In general, I think this is a textbook example of the Motte and Bailey fallacy. The Motte is: AGI can be dangerous, AGI will kill people, AGI will be very powerful. The Bailey is: AGI creation means the imminent destruction of all human life and therefore we need to stop all development now.

I never discussed the Motte. I do agree with that. 

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2023-04-10T08:47:42.212Z · LW(p) · GW(p)

I would certainly appreciate knowing the reason for the downvotes

Replies from: Raemon
comment by Raemon · 2023-04-20T00:29:16.817Z · LW(p) · GW(p)

FYI I upvoted your most recent comment, but downvoted your previous few in this thread. Your most recent comment seemed to do a good job spelling out your position and gesturing at your crux. My guess is maybe other people were just tired of the discussion and downvoting sort of to make the whole discussion go away.

comment by Raemon · 2023-04-08T19:00:53.328Z · LW(p) · GW(p)

Downvoted for the pattern of making a vague claim about LWers being biased, and then responding to followup questions with vague evasive answers with no arguments.

comment by [deleted] · 2023-04-07T16:40:12.129Z · LW(p) · GW(p)

I mean I have almost 1000 total karma and am gaining over time.

The doomers would be convinced the AGIs are just waiting to betray, to "heel turn" on us.

comment by evand · 2023-04-07T01:28:38.329Z · LW(p) · GW(p)

Intelligence has no upper limit, instead of diminishing sharply in relative utility

It seems to me that there is a large space of intermediate claims that I interpret the letter as falling into. Namely, that if there exists an upper limit to intelligence, or a point at which the utility diminishes enough to not be worth throwing more compute cycles at it, humans are not yet approaching that limit. Returns can diminish for a long time while still being worth pursuing.

you have NO EVIDENCE that AGI is hostile or is as capable as you claim or support for any of your claims.

"No evidence" is a very different thing from "have not yet directly observed the phenomenon in question". There is, in fact, evidence from other observations. It has not yet raised the probability to [probability 1](https://www.lesswrong.com/posts/QGkYCwyC7wTDyt3yT/0-and-1-are-not-probabilities), but there does exist such a thing as weak evidence, or strong-but-inconclusive evidence. There is evidence for this claim, and evidence for the counterclaim; we find ourselves in the position of actually needing to look at and weigh the evidence in question.

Replies from: None
comment by [deleted] · 2023-04-07T16:52:13.226Z · LW(p) · GW(p)

For the first, the succinct argument for correctness is to consider the details of key barriers.

Imagine the machine is trying to convince a human it doesn't know to do something in favor of the machine. You can model more and more intelligence as allowing the machine to consider an ever wider search space of possible hidden states for the human, or of messages it can emit.

But none of this does more than marginally improve the pSuccess. For this task I will claim the odds of success with human intelligence are 10 percent, and with infinite intelligence, 20 percent. It takes logarithmically more compute to approach 20 percent.

Either way the machine is probably going to fail. I am claiming there are thousands of real world tasks on the way to conquering the planet with such high pFail.
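
A minimal sketch of how that compounds, using the 10%/20% per-task odds asserted above and an assumed number of independent critical tasks:

```python
# Per-task odds asserted above: ~10% success at human intelligence, capped near ~20%
# at arbitrarily high intelligence. If a takeover plan needs many such tasks to all
# succeed, and they are roughly independent with no retries, the joint probability collapses.

def joint_success(p_per_task, n_tasks):
    return p_per_task ** n_tasks

for p in (0.10, 0.20):
    for n in (10, 100, 1000):  # assumed number of critical tasks, for illustration
        print(f"p_task={p:.2f}, n={n:>4}: p_all_succeed = {joint_success(p, n):.3e}")
```

Whether independence, no retries, and no alternative routes are the right modeling assumptions is, of course, exactly what is in dispute.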

The way the machine wins is to have overwhelming force, the same way you win any war. And there are a bunch of barriers to obtaining that real-world force.

For the second, again, debates are one thing. Taking costly action (delays, nuclear war) is another. I am saying it is irrational to take costly actions without direct evidence.

comment by TAG · 2023-04-06T12:42:54.613Z · LW(p) · GW(p)

That’s fine, except, you have NO EVIDENCE that AGI is hostile or is as capable as you claim or support for any of your claims. Yes, I also agree it’s possible, but there is no evidence yet that any of this stuff works.

EY has often argued this point elsewhere, but failed to do so this time, which is a pretty bad "comms" problem when addressing a general audience.

Replies from: None
comment by [deleted] · 2023-04-06T12:46:03.649Z · LW(p) · GW(p)

Yes but what valid argument exists?  The possibility cloud is larger than anything he considers, and he has no evidence nature works exactly the way he claims.  (note it may in fact work exactly that way)

Replies from: TAG, Tapatakt
comment by TAG · 2023-04-06T13:41:13.038Z · LW(p) · GW(p)

I'm not strongly convinced by these claims either, but that's another issue.

comment by Tapatakt · 2023-04-06T12:59:26.412Z · LW(p) · GW(p)

(about "hostile")

https://ui.stampy.ai?state=6982_

https://ui.stampy.ai?state=897I_

And suddenly it seems stampy has no answer for "Why inner misalignment is the default outcome". But EY said a lot about it, it's easy to find.

Replies from: None, mruwnik, Making_Philosophy_Better
comment by [deleted] · 2023-04-06T13:16:21.393Z · LW(p) · GW(p)

I am well aware of these claims. They ignore other methods of constructing AGI, such as stateless open-agency systems similar to what already exists.

comment by mruwnik · 2023-04-06T13:12:47.773Z · LW(p) · GW(p)

You can add questions to Stampy - if you click "I'm asking something else" it'll show you 5 unanswered questions that sound similar, whose priority you can then bump. If none of them match, click on "None of these: Request an answer to my exact question above" for it to be added to the queue.

comment by Portia (Making_Philosophy_Better) · 2023-04-07T01:07:17.261Z · LW(p) · GW(p)

But these arguments essentially depend on going "If you program a computer with a few simple explicit laws, it will fail at complex ethical scenarios".

But this is not how neural nets are trained. Instead, we train them on complex scenarios. This is how humans learn ethics, too.

comment by Dr. Birdbrain · 2023-04-06T05:28:55.919Z · LW(p) · GW(p)

Something that I have observed about interacting with chatGPT is that if it makes a mistake, and you correct it, and it pushes back, it is not helpful to keep arguing with it. Basically an argument in the chat history serves as a prompt for argumentative behavior. It is better to start a new chat, and this second time attempt to explain the task in a way that avoids the initial misunderstanding.

I think it is important that as we write letters and counter-letters we keep in mind that every time we say “AI is definitely going to destroy humanity”, and this text ends up on the internet, the string “AI is definitely going to destroy humanity” very likely ends up in the training corpus of a future GPT, or at least can be seen by some future GPT that is allowed free access to the internet. All the associated media hype and podcast transcripts and interviews will likely end up in the training data as well.

The larger point is that these statistical models are in many ways mirrors of ourselves and the things we say, especially the things we say in writing and in public forums. The more we focus on the darkness, the darker these statistical mirrors become. It’s not just about Eliezer’s thoughtful point that the AI may not explicitly hate us nor love us but destroy us anyway. In some ways, every time we write about it we are increasing the training data for this possible outcome, and the more thoughtful and creative our doom scenarios, the more thoughtfully and creatively destructive our statistical parrots are likely to become.

Replies from: None
comment by [deleted] · 2023-04-06T13:50:09.481Z · LW(p) · GW(p)

This is the https://www.lesswrong.com/posts/LHAJuYy453YwiKFt5/the-salt-in-pasta-water-fallacy [LW · GW]

An example of something that technically makes a difference but in practice the marginal gain is so negligible you are wasting time to even consider it.

Replies from: Dr. Birdbrain
comment by Dr. Birdbrain · 2023-04-06T14:48:47.707Z · LW(p) · GW(p)

Actually I think the explicit content of the training data is a lot more important than whatever spurious artifacts may or may not hypothetically arise as a result of training. I think most of the AI doom scenarios that say “the AI might be learning to like curly wire shapes, even if these shapes are not explicitly in the training data nor loss function” are the type of scenario you just described, “something that technically makes a difference but in practice the marginal gain is so negligible you are wasting time to even consider it.“

The “accidental taste for curly wires” is a steelman position of the paperclip maximizer as I understand it. Eliezer doesn’t actually think anybody will be stupid enough to say “make as many paper clips as possible”; he worries somebody will set up the training process in some subtly incompetent way, and the resulting AI will aggressively lie about the fact that it likes curly wires until it is released, having learned to hide from interpretability techniques.

I definitely believe alignment research is important, and I am heartened when I see high-quality, thoughtful papers on interpretability, RLHF, etc. But then I hear Eliezer worrying about absurdly convoluted scenarios of minimal probability, and I think wow, that is “something that technically makes a difference but in practice the marginal gain is so negligible you are wasting time to even consider it”, and it’s not just a waste of time, he wants to shut down the GPU clusters and cancel the greatest invention humanity ever built, all over “salt in the pasta water”.

Replies from: None
comment by [deleted] · 2023-04-06T16:07:33.475Z · LW(p) · GW(p)

I was referring to "let's not post ideas in case an AGI later reads the post and decides to act on it". Either we build stable tool systems that are unable to act in that way (see CAIS) or we are probably screwed, so whatever. Also, even if you suppress yourself, if an AGI is looking for ideas on badness it can probably derive anything necessary to solve the problem.

comment by SydneyFan (wyqtor) · 2023-04-06T10:24:27.947Z · LW(p) · GW(p)

As someone with a rare, progressive, incurable retinal disease, as my username might already suggest, I oppose LessWrong. I strongly believe that it is nothing but a weird cult. I trust people who actually build AI systems for a living, like Andrej Karpathy and Ilya Sutskever, to make the right calls, NOT E. Yudkowsky (who hasn't built anything in his life).