Book Review: On the Edge: The Future

post by Zvi · 2024-09-27T14:00:05.279Z · LW · GW · 1 comments

Contents

  Rationalism and Effective Altruism (EA)
  Cost-Benefit Analysis
  How About Trying At All
  The Virtues of Rationality
  Effective Altruism and Rationality, Very Different of Course
  The Story of OpenAI
  Altman, OpenAI and AI Existential Risk
  Tonight at 11: Doom
  AI Existential Risk: They’re For It
  To Pause or Not to Pause
  You Need Better Decision Theory
  Understanding the AI
  Aligning the AI
  A Glimpse of Our Possible Future
  The Closing Motto

Previously: The Fundamentals, The Gamblers, The Business

We have now arrived at the topics most central to this book, aka ‘The Future.’

Rationalism and Effective Altruism (EA)

The Manifest conference was also one of the last reporting trips that I made for this book. And it confirmed for me that the River is real—not just some literary device I invented. (6706)

Yep. The River is real.

I consider myself, among many things, a straight up rationalist.

I do not consider myself an EA, and never have.

This completes the four quadrants of the two-by-two of [does Nate know it well, does Zvi know it well]. The first two, where Nate was in his element, went very well. The third clearly was less exacting, as one would expect, but pretty good.

Now I have the information advantage, even more than I did for aspects of sports gambling.

  1. We’ve seen Nate attempt to tackle areas in both our wheelhouses.
  2. We’ve seen Nate attempt to tackle areas in his wheelhouse that I’ve only explored.
  3. We’ve seen Nate attempt to tackle areas he was exploring, that I’ve only explored.
  4. Now he’s exploring new sections of my wheelhouse.

Let’s see how he explains it all.

Effective altruism, and the adjacent but more loosely defined intellectual movement called “rationalism,” are important parts of the River on their own terms. In some ways, in fact, they are the most important parts.

Much of the River is concerned with what philosophers call “small-world problems,” meaning tractable puzzles with relatively well-defined parameters: how to maximize expected value in a poker tournament, or how to invest in a portfolio of startups that brings you upside with little risk of ruin.

But in this final portion of the book, we’re visiting the part of the River where people instead think about open-ended, so-called grand-world problems: everything from where best to spend your charitable contributions to the future of humanity itself. (6228)

A solid opening.

I would still nitpick on the word ‘instead,’ and would have suggested ‘also.’

The Rationalists saw themselves as people who applied scientific thought to almost any topic. This often involved “Bayesian reasoning,” a way of using statistics and probability to inform beliefs. (6393)

The ‘True Rationalist’ in particular both hones their skills and upgrades their lives by applying the same principles to everything. No matter is too trivial for the nerd snipe and the Bayesian reasoning. In particular, local questions, that help improve your life and your ability to think and impact the world, matter. You are not forced to look only at bigger pictures.

Indeed, Nate Silver was correctly informed that he counts as a rationalist. You don’t have to join or even know we exist in order to be a member.

In fact, even if I had never applied to join Team Rationalist, Alexander—whose soft features, dry wit, and male pattern baldness reminded me uncannily of my dad’s (West Coast Jewish) side of the family—had already drafted me into it. “You are clearly doing a lot of good work spreading rationality to the masses. Is it useful to think of us as a movement that doesn’t include you?” he asked me. (6401)

The origin story of many rationalists is exactly that they needed it on a personal level. The traditional mode of acting on intuition and instinct and kind of vibing was clearly not working. The world did not make sense. They had to figure things out the hard way, from first principles. The good news is, once you do that, you actually understand what you are doing, and are choosing on purpose how to do it. You end up far stronger at the end of the path.

And indeed, there are some rationalists, and some EAs, that are perfectly content to use the toolboxes on that level. We strive to help them get more ambitious when they are ready for that, but you don’t get cast out if you decide to not go big, and stay home.

But yes, what matters is that often people who think this way end up going big.

The reason some Riverians have become obsessed with grand-world problems is because the Village and the rest of the world screw them up all the time, too, in ways that often reflect political partisanship, an endless array of cognitive biases, innumeracy, hypocrisy, and profound intellectual myopia.

To take one glaring example that Flynn reminded me of: the U.S. Congress has authorized relatively little—only around $2 billion in spending as part of a 2022–23 budget deal—to prevent future pandemics, even though COVID-19 killed more than 1 million Americans and cost the U.S. economy an estimated $14 trillion.

Reducing the chance of a future such pandemic in the United States by even 1 percent would be +EV even at a cost of $140 billion—and yet Congress is barely spending one one-hundredth of that.
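To spell out the arithmetic in that passage, here is a minimal sketch. It uses only the figures quoted above. Reading "1 percent" as one percentage point of probability, and treating the $14 trillion as the full cost of a repeat pandemic, are simplifying assumptions, and the variable names are mine.

```python
# Figures quoted in the passage above.
pandemic_cost = 14e12        # estimated cost of COVID-19 to the U.S. economy, in dollars
risk_reduction = 0.01        # assumed reading: one percentage point less chance of a repeat
authorized_spending = 2e9    # roughly what Congress authorized in the 2022-23 deal

# Expected savings = reduction in probability times the loss avoided.
break_even_spend = risk_reduction * pandemic_cost

# How actual spending compares with the break-even amount.
spending_ratio = authorized_spending / break_even_spend

print(f"Break-even spend: ${break_even_spend / 1e9:.0f} billion")
print(f"Authorized spending as a share of break-even: {spending_ratio:.1%}")
```

On these numbers the break-even spend is $140 billion, and the authorized $2 billion is about 1.4 percent of that, in line with the "one one-hundredth" in the quote.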

You cannot count on your civilization to act in a sane fashion. There is no Reasonable Authority Figure. We are Beyond the Reach of God [LW · GW]. As Scott Alexander wrote, someone has to, and no one else will. Or as Hillel the Elder said two millennia earlier: If I am not for myself, who will be for me? If I am only for myself, what am I? If not now, when?

And now, on top of that, we face substantial existential risks, most of all from AI.

As crazy as it sounds, yes, it is up to us. It is up to you.

Cost-Benefit Analysis

Our society is very bad at cost-benefit analysis. As in, we often refuse to do one.

There are narrow places where we are quite good at it. We do indeed do cost-benefit analysis sometimes, at all, including on things that matter, and that is really great. We also often rely on markets to get us to do it, which is insanely great.

Alas, we also often act like complete morons, because we refuse.

Transit officials faced a difficult choice: They could shut down the F, blocking a vital link between New York’s two densest boroughs right as commuters were beginning to get off work—or they could potentially run over poor Dakota. They elected to close the F for more than an hour until Dakota was found. (6261)

I am sorry, what? A difficult choice? This is a trivially easy choice.

You only need to answer one question about the above passage. Is Dakota a human?

If the answer is yes, then as Nate says, we all agree, you stop the trains.

We put the value of a human life saved (VSL) at around $10 million, and in situations like this we are willing to go a bit higher.

When you tease out this and lots of other data points—say, by looking at how much people are willing to pay for additional safety features when they buy a new car—the average American implicitly values their life at about $10 million. That’s where the VSL comes from. (6358)

Dakota, however, was a dog.

Claude initially estimated the total cost of the train delay at $1.58 million. It is actually substantially higher, because that estimate takes lost time at work as being equal to the hourly wage of the employee. Whereas if an employee’s marginal value per hour was only their wage, why do they have a job? And when someone is unexpectedly late, with little warning, that can lead to big problems, including things like ‘the doctor can’t perform surgery on you today.’

I’m confident the cost here is north of $2 million, and there is no guarantee that this results in the dog not being run over.

If you suggested a $1 million medical treatment to save that dog’s life, and that the government should pay for that, that would be obviously patently absurd. I would absolutely laugh in your face.

If you called up Dakota’s owner and said, ‘all right, we can close down the F train for you, but that will be $1 million’ we all know what the answer would be, once they were done yelling at you. We have willingness to pay studies. When forced to pay, less than 10% of pet owners are willing to pay $10,000 or more for life-saving medical treatments.

So let’s not pretend this is the MTA faced with a hard choice. This is the MTA faced with an absurdly easy choice. And they chose wrong.
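The comparison can be written down in a few lines. This is a sketch, not any official MTA model: the function name `worth_stopping` is hypothetical, the $2 million delay cost is my estimate from above, and I am generously assuming the shutdown saves the life with certainty.

```python
def worth_stopping(value_at_stake: float, delay_cost: float, p_saved: float = 1.0) -> bool:
    """Return True if the expected value saved is at least the cost of the delay."""
    return p_saved * value_at_stake >= delay_cost


DELAY_COST = 2_000_000   # estimated cost of closing the F train for over an hour
VSL_HUMAN = 10_000_000   # approximate value of a statistical life, per the book
PET_WTP = 10_000         # a sum that fewer than 10% of owners would pay to save a pet

print("Human on the tracks:", worth_stopping(VSL_HUMAN, DELAY_COST))
print("Dog on the tracks:", worth_stopping(PET_WTP, DELAY_COST))
print("Delay cost as a multiple of that willingness to pay:", DELAY_COST / PET_WTP)
```

Even using the $10,000 figure that more than 90% of owners would decline to pay, the delay costs about 200 times as much. Lowering `p_saved` below 1 only widens the gap.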

How About Trying At All

Thus, the need for something like rationalism, and something like Effective Altruism.

As in, I can’t help but notice that you do things without checking to see if they will be effective, or if there is a way to do them better. Perhaps you should think about that?

What is effective altruism, exactly? In one sense, effective altruism is just a brand name, created by MacAskill and another Oxford philosopher, Toby Ord, in 2011. (6370)

The more official answer—as stated by MacAskill in an essay entitled “The Definition of Effective Altruism”—is that EA is a “movement [that tries] to figure out, of all the different uses of our resources, which uses will do the most good, impartially considered.” (6374)

That’s the 80/20 for a lot of this. You try, at all, to figure out what will actually result in what outcomes at what costs with what benefits. Then you choose what seems best. The rest is not stamp collecting, the rest is important, but you’ll already be way ahead.

The Virtues of Rationality

Eliezer Yudkowsky back in 2006 listed the twelve virtues of rationality [LW · GW]: Curiosity, relinquishment, lightness, evenness, argument, empiricism, simplicity, humility, perfectionism, precision, scholarship, and the void.

On a more practical level, it means things like this:

Even public figures who are critical of the movements tend to get a fair hearing at blogs like LessWrong and at the Effective Altruism Forum—which is pretty much the opposite of what it’s usually like to argue about public affairs online. (6421)

This plays out in instrumental and epistemic rationality.

First, there’s instrumental rationality. Basically this means: Do you adopt means suitable to your ends? There is a man who has eaten more than thirty thousand Big Macs. Now, this might not be a reasonable and prudent thing for him to do. But if this man’s life goal is to eat as many Big Macs as possible, you could say he’s instrumentally rational because he’s done a bang-up job of this. (6725)

The second type is epistemic rationality. This means: Do you see the world for what it is? Do your beliefs line up with reality? (6730)

Good summary. You need both.

You can also give the rationalists credit for argumentative consistency: they tend to be scrupulously honest. (6816)

Rationalists have, from the outside perspective, utterly absurd high standards on scrupulosity and honesty. I believe this to be a very good thing.

Effective Altruism and Rationality, Very Different of Course

But the kinship that EAs and rationalists feel for each other conceals that there are a lot of internal disagreements and even contradictions within the movements—in particular, there are two major streams of EA/rationalism that don’t see eye to eye.

The first is associated with the Australian philosopher Peter Singer and a cluster of topics including animal welfare, global poverty reduction, effective giving, and not living beyond your means—but also the ethical precept known as utilitarianism.

The second is associated with Yudkowsky and the George Mason University economist Robin Hanson and a whole different cluster of topics: futurism, artificial intelligence, prediction markets, and being willing to argue about just about anything on the internet, including subjects that others often find taboo. (6428)

Not living beyond your means is (highly non-uniquely) a rationalism thing. Not retaining means with which to live better is the EA thing.

Then later on the Effective Altruists realized the rationalists were right about the promise and dangers of AI and existential risks from AI, so that became the EA cornerstone as well.

Furthermore, I think it’s altruistic when people like Singer express unpopular viewpoints that they honestly believe will lead to social betterment and selfish to suppress these ideas because of fear of social approbation. (6476)

I agree in principle, although I worry about the frame of ‘altruistic’ being misleading. The important thing is that, if more people said what they actually believe whether or not it is popular, and whether or not it is convenient, and whether or not I agree with it, that would make the world a better place.

There is then of course Singer’s famous drowning child metaphor, that if you’d ruin your expensive coat to save a drowning child in front of you, that means you are a bad person because you should have never bought that expensive coat and instead could have donated that money to global poverty relief.

Okay then, so why don’t I find the drowning child parable persuasive? Well, partly because it’s meant to play a trick on you—as Singer freely admits. (6479)

Indeed. It’s a magician’s trick. Singer wants you to ignore, among other things, all the reasons that we have agreed to make that drowning child in front of you your responsibility in particular, all the reasons we need some amount of locality in our preferences, and all the reasons it is not okay to redistribute all the wealth whenever you feel like it. That civilization exists for a reason, and you need to maintain it, along with all the ways we are able to make expensive coats and also save lives at all.

Then there’s the issue of utilitarianism.

There are some settings where I think utilitarianism is an appropriate framework—particularly in medium-scale problems such as in establishing government policy where impartiality (not playing favorites) is important.

For instance, when a subcommittee of the CDC met in November 2020 to develop recommendations for who would be first in line for COVID vaccines, they rejected going with a utilitarian calculus of maximizing benefits and minimizing harms to instead also consider objectives like “promo[ting] justice” and “mitigat[ing] health inequalities.” (6505)

I think utilitarianism is analogous to an underfit model. Instead of being too deferential to commonsense morality, it doesn’t meet people in the middle enough, accepting that maybe various laws and customs evolved for good reasons. (6547)

I should note, however, that utilitarianism, especially in its strictest forms, is actually relatively unpopular among philosophers. (6572)

Most people need more utilitarianism on the margin, to go with their additional use of cost-benefit analysis. When I say ‘I am not a utilitarian’ I mean not taking it to its bullet-biting conclusions, and not seeing it as the proper operating system for the human brain in practice, and not believing that you can fully total up the points of various events to choose this week’s winner in any cosmic or moral sense.

I’m arguing with the Actual Utilitarians, not with the person on the street. But the other thing about the person on the street is they also need more good virtue ethics and more good deontology, and are mostly ill-prepared to go Full Utilitarian.

A few of us have to worry about infinite cases and weird out of sample philosophical questions, in those times we are dealing with those as actual possibilities, such as in potential AI futures. For most people, that never happens. Even for those where it does happen? Most of the time, for most questions, not so much.

And that is fine. The human brain has limited compute and should not be choosing its everyday heuristics based on whether they handle rare edge cases – so long as you recognize when you do face those edge cases.

“The thought that, well, this theory isn’t good if it can’t handle infinite cases, I think that’s like a huge mistake,” said Buchak. She thinks moral theories should instead be tested on practical, day-to-day decision-making. “Nearly every decision you face involves risk,” she said. “I’m like [more] concerned with just like, you know, should I bring my umbrella today?”

If a moral theory can’t handle everyday cases like these—if it strays too far from common sense—then we probably shouldn’t trust it, whether or not it provides an elegant answer to the Repugnant Conclusion. (6600)

I agree. If your system can’t handle ordinary cases, then you should be highly suspicious. And if it can’t handle ordinary cases without inordinate amounts of compute (as in human brain cycles, in this context) then that’s a problem too. Note that this is more of an issue in practice than in theory. If it works in practice for humans in ordinary situations, then it counts. If it doesn’t, then it doesn’t.

The reverse is not true. If a system does handle the ordinary cases well, then that is a fine thing to use to handle ordinary cases. But it could still be a huge disaster in unusual cases. And if most of the value of a system lies in how it handles future non-ordinary cases, then establishing one that only works in ordinary cases can be disastrous.

Indeed, most systems for dealing well with ordinary situations are (wisely) overfitting on the data, because we constantly face similar ordinary situations. That’s fine, except when you run into those unusual situations. Then you need to understand that your instinctive rules might be leading you very astray.

Also, I’ve said it before, and a lot of people told me I’m wrong but their arguments were all invalid so I’m going to say it again: The Repugnant Conclusion is a silly misunderstanding. It’s another magician’s trick.

The standard proof of the conclusion is invalid, because it involves manifesting resources out of thin air. The most correct response to ‘what if potatoes plus muzak maximizes your total universe utility score?’ is ‘it quite obviously does not do that, a human life contains a lot of resource costs and downsides and many benefits and potential joys, and it is quite obviously more efficient to have less people that are happier than that. Your so-called proof otherwise must be wrong on that basis alone. Also it is trivially invalid because you can’t go from world N to world N-prime in order to then loop back to world (N+1), because that move creates new people living at net zero utility without taking any resources away from anyone else. A duck is chasing you asking how you did that.’

As Craig Ferguson often said, I look forward to your letters. You can talk amongst yourselves if you’d like. But if it’s the same counterarguments and confusions, I’m precommitting here to ignoring them. I’ll only answer if I see something new.

But who in the hell am I (or Lara Buchak or Peter Singer) to tell you what you should do in decisions you’ll face just once? “It might be that you should behave differently when choosing a spouse or choosing a job or doing these kinds of things that you’re only going to do once, hopefully,” Buchak told me. (6614)

No.

You should still do the calculation and make the best possible decision as best you can.

Indeed, if it’s a big decision like a spouse or a job, those are the decisions that matter. Those are the ones where it’s worth making sure you get it right. It is very much not the time to throw the rules out the window, especially before you know the rules well enough to break them.

There are of course two big differences.

The most important one is risk aversion. You don’t get to use responsible bankroll management when choosing a job or spouse. Life doesn’t let you not take big risks, not without paying a very high price. But yes, some amount of risk aversion is appropriate in those big decisions. It’s not a pure +EV in dollars or abstractions calculation. Which is fine. So factor that, along with everything else, in.

The other big difference is inability to learn and iterate. With most decisions, a lot of the value of a good decision process is to learn from both success and mistakes, to grow wise and to make better decisions in the future. Whereas in a one-time high stakes decision like choosing a spouse, knowing how to do it better next time will be of relatively little help.

I think there is some rational basis for partiality because we have more uncertainty about things that are removed from us in time and space. (6623)

This is indeed a classic modernist failure mode, where you act like you understand what is happening elsewhere far more than you actually do. You have to discount distant actions for this risk. But that is not the only reason you need spatial and knowledge-based partiality.

Civilization would not run, people would not survive or reproduce or even produce, the social contract would collapse, if you did not favor and exchange with and cooperate uniquely with those around you beyond what you do with strangers halfway around the world. All that, and real competition, is necessary. Those strangers are not only people too but also certified Popes, so please treat them right, but that does not mean full equal standing. The alternative is not game theory compatible, it is not fit, it does not long survive.

There is little virtue in being too virtuous to sustain that virtue, and indeed if that is a thing you are thinking of as virtuous then you have chosen your virtues poorly.

And even if I think there’s something honorable about acting morally in a mostly selfish world, I also wonder about the long-term evolutionary fitness of some group of people who wouldn’t defend their own self-interest, or that of their family, their nation, their species, or even their planet, without at least a little more vigor than they would that of a stranger. I want the world to be less partial than it is, but I want it to be at least partially partial. (6653)

Yep.

This is another important observation:

Overall, the politics of EA can be slippery, stuck in the uncanny valley between being abstractly principled and ruthlessly pragmatic, sometimes betraying a sense that you can make it up as you go along. (6828)

One of the core tensions in EA is, to put it bluntly, honesty versus lying.

There is the faction that says you want to ‘do the most good,’ and you shouldn’t let the truth get in the way of that. This starts with Peter Singer, who is clear that he believes the virtuous man should be willing to lie their ass off. Thus ‘honesty is not part of my utility function,’ and SBF justifying what he did. Alternatively, perhaps you tell the truth to the ingroup, other EAs and select allies, but politics is politics. Play to win.

The other faction aligns with the rationalists, who say that if you lose your epistemics and your honesty, then all is lost. That telling the truth and playing it all fully straight is the only road to wisdom and people will recognize this and it will succeed over time. That this is especially true given that the most important issue is AI. If you don’t have excellent epistemics, and if you don’t get others to have good epistemics, acting wisely around AI is hopeless, because it is all so complex and hard to understand, and to figure out what is actually helpful versus what would backfire.

And of course, many people are somewhere in the middle.

You already know which side I am on.

The Story of OpenAI

Nate Silver talks to Roon, Paul Graham and Sam Altman about Altman’s history at OpenAI.

Those are excellent sources. They are also highly biased ones. They tell the official Altman version of the tale. Paul Graham has been a long time extreme Altman fan. They clearly work together to tell their narrative of events and ensure Altman stays in control and in good graces as much as possible. Roon is unusually forthcoming, honest and willing to think for real and think different, I respect the hell out of him and know he means well, but also he is a Member of Technical Staff at OpenAI, and has long defended Altman. Altman is Altman.

Nate Silver mostly buys their story, in some places seemingly uncritically, although there are other lines and framings they probably tried to sell to him that he importantly didn’t buy.

As an area where I have done the research, this pained me. If you want my analysis on various events, please do follow those links.

After the events of this week, with OpenAI moving to become a for-profit B corporation and abandon its non-profit mission in favor of maximizing profits, it is now even more clear what the real story is. Altman systematically worked to transform a non-profit into his personal for-profit kingdom, removing anyone who opposed him or got in his way or advocated for any form of safety.

The way Altman and Graham present it, the early ability of OpenAI to exist was uniquely reliant on Altman and his special skills. No one else could have done it.

But by 2015, Altman had concluded that the action was elsewhere: in artificial intelligence. He left YC—some news accounts claim that he was fired, but Graham strongly disputes that description—to become a co-chair of OpenAI along with Elon Musk. (7391)

However, it was a research lab generously funded by a who’s who of Silicon Valley, including Peter Thiel, Amazon, and Musk. Some of them believed in AI’s transformational potential, and some just believed in Altman. (7396)

“Funding this sort of project is beyond the abilities of ordinary mortals. Sam must be close to the best person in the entire world at getting money for big projects,” said Graham. (7401)

That seems like pretty clear Obvious Nonsense to me. Elon Musk decided to fund and ensure the creation of OpenAI (and stuck them with that name) first, before things started, and before he was pushed aside. His prime motivation was existential risk from AI, and fear that Google would otherwise own the future of AI and act irresponsibly.

There is a very strong case that the creation of OpenAI instead likely and predictably (this is very much not hindsight) did massive, epic damage to our chances of survival, but I won’t get into that question too much here. What’s done is done.

The founding team was full of killer people. The upside potential was obvious. As we’ve seen, VCs are herd animals who have strong FOMO, so once the big names were involved this was all very highly fundable.

Graham likes to portray Altman as some unique mastermind of fundraising and corporate infighting. I have no doubt Altman is good at these things, but we have little evidence he is some sort of unique mastermind. In terms of the project’s success on its own terms? Right place, right time, right team, right idea.

I also don’t buy the whole ‘everyone thought we were crazy’ story.

But if you were involved in the early days of OpenAI, you are particularly likely to have faith that things would just work out somehow. OpenAI was not the sort of startup that began in a Los Altos garage. It was an expensive and audacious bet—the funders originally pledged to commit $1 billion to it on a completely unproven technology after many “AI winters.” It inherently did seem ridiculous—until the very moment it didn’t. (7532)

Did scaling outperform expectations, in the sense that all the trend lines did extend and do the kinds of things they promised to perhaps do? Very much so, yes. And it’s true that no one else made a similar big bet until OpenAI proved the way forward. What it never seemed was ridiculous. If I’d thought it was ridiculous I wouldn’t have been dismayed by its founding.

This was a uniquely blessed opportunity in many ways, a slam dunk investment. I’m not saying I have what it takes such that I could have made it work as CEO (although I’m not so confident I couldn’t have, if I’d wanted to), and I’m certainly not saying Altman didn’t do a great job from a business perspective, but there are plenty of others who definitely could have also done it if they’d been given the role.

I do agree that those paying attention largely ‘knew what we had’ before GPT-3.5.

To most of the outside world, the breakthrough came with the release of GPT-3.5 in November 2022, which became one of the most rapidly adopted technologies in human history. (7549)

Inside OpenAI, the recognition of the miracle had come sooner[*8]—with the development of GPT-3 if not earlier. (7552)

I got a bunch of people increasingly asking me ‘what are you doing creating a game while all this is happening’ starting around GPT-2 and escalating from there. I saw the warnings from Gwern and others.

As for whether Altman was fired from YC, that’s such a harsh word, isn’t it? The situation was, as it often is, ambiguous, with many aspects whereby Altman does not come out of it looking good.

Altman, OpenAI and AI Existential Risk

“There is this massive risk, but there’s also this massive, massive upside,” said Altman when I spoke with him in August 2022. “It’s gonna happen. The upsides are far too great.”

Altman was in a buoyant mood: even though OpenAI had yet to release GPT-3.5, it had already finished training GPT-4, its latest large language model (LLM), a product that Altman knew was going to be “really good.”

He had no doubt that the only path was forward. “[AI] is going to fundamentally transform things. So we’ve got to figure out how to address the downside risk,” he said. “It is the biggest existential risk in some category. And also the upsides are so great, we can’t not do it.” (7411)

Those were good times.

As irresponsible as I view the decision to create OpenAI in the first place, at the time OpenAI was acting remarkably responsibly with its releases, holding back frontier models for months. They were openly talking about the fact that their products were on pace to create substantial existential risks.

Yes, Altman was still endorsing iterative deployment and pushing ahead, but in reasonable ways. Contrast this rhetoric here with that in his op-ed recently in the Washington Post, where it is all about beating China and national security and existential risk is not even mentioned.

“I think poverty really does just end,” [Altman] said. (7416)

If we are in control and want it to end, we would have that power from some perspective. Alas, poverty is largely relative, and the world needs and will always find new incentives and scarce resources to fight about. Poverty could ‘just end,’ at least in terms of what we consider poverty today, even if the humans remain alive. I hope we find a way to sustainably do that. And to his credit, Sam Altman has funded UBI studies and otherwise tried to figure out more about how to do that.

It won’t be trivial. It also won’t entirely end struggle or suffering, or eliminate all disparity of outcomes, and I would not want it to.

The big question is what Altman’s actual attitude is now towards existential risk.

So is @SamA in the same bucket as that other, highly problematic Sam, @SBF? Someone who would push the button on a new model run if he thought it would make the world 2.00000001x better—at a 50 percent risk of destroying it?

You can find a variety of opinions on this question—one source I spoke with even explicitly drew the comparison between Altman’s attitude and SBF’s button-pushing tendencies—but the strong consensus in Silicon Valley is no, and that’s my view too.

Altman has frequently barbed with effective altruists—he couldn’t resist taking a shot at SBF after FTX’s collapse—and has rejected Peter Singer’s rigid utilitarianism. Even people who are relatively concerned about p(doom)—like Emmett Shear, the cofounder of the streaming platform Twitch who became OpenAI’s CEO for two days in November 2023 amid a failed attempt by OpenAI’s nonprofit board to eject Altman—thought the company was in reasonably good hands. “It’s not obvious who’s a better choice,” he told me.

Like most others in Silicon Valley, Shear figures the development of AI is inevitable. (7421)

I don’t think there is an ‘obvious’ better choice than Altman, but certainly there are candidates I would prefer. Even confining to OpenAI founders, I’d be much happier if either Sutskever or Schulman were in charge. When the OpenAI board selected Shear, I considered him a great pick. It is of course moot, at least for now.

I agree that Altman is nothing like as awful about this as SBF. Altman would absolutely not flip coins for the fate of the world on the tiniest of edges. He definitely knows that the risk is real, he is well aware of the arguments of Eliezer Yudkowsky and many others, and he will make at least some efforts to mitigate the risks.

That doesn’t mean Altman will play his hand as safely as the Kelly criterion would advise, which would never have you risk everything unless you were absolutely certain to win. (7431)

The Kelly Criterion is too conservative here. Some existential risk is going to have to be taken, because the background existential and other extreme risks of inaction are also not zero, and the upside is indeed rather large.
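To make the Kelly point concrete, here is a minimal sketch using the standard single-bet form of the criterion, f* = p - (1 - p) / b, where p is the chance of winning and b is the net payout per unit staked. The function name and the example bets are hypothetical, chosen only for illustration. The sketch shows that the recommended stake reaches the whole bankroll only when p is exactly 1.

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly stake as a fraction of bankroll: f* = p - (1 - p) / b.

    p_win is the probability of winning, net_odds (b) is the profit per
    unit staked on a win. A negative result means "do not bet", so it is
    clamped to zero.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    if net_odds <= 0.0:
        raise ValueError("net_odds must be positive")
    return max(0.0, p_win - (1.0 - p_win) / net_odds)


if __name__ == "__main__":
    # Hypothetical bets, for illustration only: (p_win, net_odds).
    examples = [(0.5, 1.00000001), (0.9, 1.0), (0.99, 1.0), (1.0, 1.0)]
    for p, b in examples:
        stake = kelly_fraction(p, b)
        print(f"p={p}, b={b}: stake {stake:.9f} of bankroll")
```

Under this formula, a coin flip paying a hair better than even money warrants staking only a sliver of the bankroll, which is the sense in which Kelly never has you risk everything on anything short of certainty.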

That doesn’t mean Altman is going to act responsibly. Indeed, at many turns, and with increasing frequency, he has clearly chosen to prioritize his own control over OpenAI, along with OpenAI’s commercial interests and advancing its capabilities, transitioning it towards operating as an ordinary business and technology company, and to deprioritize its safety efforts.

It seems clear that the events of November 2023 were a turning point. Altman was already turning against EA types and safety concerns before that. The events of November 2023 were caused in large part by Altman trying to (in a ‘not consistently candid’ manner, shall we say) oust board member Helen Toner, so that Altman could disempower safety advocates and consolidate control of OpenAI’s board.

This post is the best single post to read if you want to know what I think happened.

I want to pause in particular to push back against this statement from Nate:

But when the OpenAI board tried to oust Sam A, Roon and more than seven hundred other staffers pledged to resign and join Altman at his gig at Microsoft unless he was restored as CEO. (7483)

They did not do that. Read the letter. They didn’t pledge. They instead threatened that they might do that, without committing to anything. And they did this in response to the OpenAI board botching its communications in the wake of their firing of Altman, refusing to explain themselves, perhaps out of fear of Altman and his lawsuits or other actions, perhaps for other reasons.

Meanwhile Altman and his allies worked around the clock to spin a false media narrative and to credibly threaten to destroy the company within a day, rather than tolerate Altman having been fired from it.

Thus the letter was easy to sign. It was also very difficult to not sign. There was huge pressure exerted on holdouts to fall in line, and not so subtle warnings of what would happen to their positions and jobs if they did not sign and Altman did return.

Those warnings proved accurate. Since then, Altman has systematically driven advocates of safety out, and the transition went into overdrive. The word ‘purge’ would be reasonable to apply here, especially to those who refused to sign the letter demanding Altman be reinstated. He went back on his explicit promises to provide compute and support for OpenAI’s long term safety efforts. Almost half of those working on long term safety have left since then, including multiple cofounders.

Altman’s rhetoric also shifted. Now he essentially never mentions existential risk. In the Washington Post he fanned the flames of jingoistic rhetoric while ignoring existential risks entirely. OpenAI has opposed SB 1047, while supporting AB 3211, and AB 3211 looks a lot like an attempt at regulatory capture. And so on.

I have tried, time and again, to give OpenAI and Altman the benefit of the doubt. My first thought when I heard Altman was fired was ‘what the hell did he do’ and my second was ‘we’re probably not going to like what comes next are we.’

Not only do I think we could still do vastly worse than Altman, I would take him over the CEOs of Google, Microsoft, Meta, Mistral or xAI. He’s far from the worst pick. But Altman now seems like a much worse pick than the Altman of a few years ago.

Tonight at 11: Doom

If there’s a book that obviously is going to support stating your p(doom) (your probability of a universally bad outcome from sufficiently advanced artificial intelligence) then this would be it.

The point is not for the number to be exact. The point is that a number is much more useful information than anything that is not a number, so do your best.

It’s easy to say something like, “I’m quite concerned about catastrophic risks to humanity from misaligned artificial intelligence.” But it’s much more informative to state your p(doom)—your probability that AI could produce these catastrophic outcomes.

If your p(doom) is 1 percent or 2 percent, that’s still high enough to qualify as “quite concerned.” (After all, it’s the end of the world we’re talking about.)

But if you think p(doom) is 40 percent (and some EAs think it’s that high, or higher), that means that AI alignment—making sure that AIs do what we want and serve human interests—is perhaps the single biggest challenge humanity has ever faced. (6673)

Sure, this might seem artificially precise. But the alternative of not providing a number is a lot worse, Ord thought. At the very least, we should be able to convey orders of magnitude. (6680)

Yes, that is exactly the point. If you think p(doom) by default is 2% if we rush ahead, that’s a big deal, and we should be willing to do quite a lot to mitigate that and change it to 1% or 0.1%, but it makes sense to say that we should mostly rush ahead regardless.

Nate also introduces a key concept from trading: The bid-ask spread.

First, I’ll borrow a concept from the stock market called the “bid-ask spread” as a way of articulating our confidence about p(doom). Then, I’ll introduce something I call the Technological Richter Scale and argue that we should first ask how transformational we expect AI to be before addressing p(doom). (8014)

When I checked the odds for Super Bowl LVIII at DraftKings, conversely, the spread was wider. I could buy the Kansas City Chiefs moneyline at an implied 48.8 percent chance of the Chiefs winning or sell it (meaning that I’d instead bet on the San Francisco 49ers) at 44.4 percent. (8022)

But if you asked me for my p(doom) on AI, I’d quote you a much wider spread, maybe literally something like 2 percent to 20 percent. That’s partly because the question isn’t well articulated—if you specified Yudkowsky’s narrow definition or Cotra’s more expansive one, I could make the range tighter. Still, despite having spoken with many of the world’s leading AI experts, I’m not really looking to take action on this “bet” or stake the credibility of this book on it. (8031)

(I wrote a distinct post covering the Technological Richter Scale, which is effectively also part of this review. If you haven’t yet, go read it now.)

That’s exactly how I often look at probabilities. You have a point estimate, and you also have a range of reasonable answers. Within that reasonable range, you’re not willing to wager, unless there is a market opportunity that makes wagering cheap. Outside that range, you are, or should be, ready to call bullshit. And there is a practical difference between a wide range and a narrow range, and ranges can be asymmetric for various reasons (e.g. you can think there’s 50% chance of something, and be confident it’s minimum 40% but also think it might be 80%, there’s no contradiction there).
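One way to picture this is as a two-sided quote on a probability. The sketch below is only an illustration: the `Quote` class and its method names are hypothetical, and the rule "bet only when the offered price falls outside your range" is a simplification of the idea described above. The example numbers are the ones quoted in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A two-sided probability quote: buy below `bid`, sell above `ask`."""

    bid: float
    ask: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.bid <= self.ask <= 1.0:
            raise ValueError("need 0 <= bid <= ask <= 1")

    @property
    def spread(self) -> float:
        """Width of the range; wider means less confidence."""
        return self.ask - self.bid

    @property
    def midpoint(self) -> float:
        """Center of the range, which need not be your point estimate."""
        return (self.bid + self.ask) / 2.0

    def action(self, offered: float) -> str:
        """What to do when someone offers to trade at `offered`."""
        if offered < self.bid:
            return "buy"  # priced below your whole reasonable range
        if offered > self.ask:
            return "sell"  # priced above your whole reasonable range
        return "no bet"  # inside the range, so not worth wagering


if __name__ == "__main__":
    sportsbook = Quote(bid=0.444, ask=0.488)  # Chiefs moneyline, as quoted
    nate = Quote(bid=0.02, ask=0.20)  # Nate's stated p(doom) range
    asymmetric = Quote(bid=0.40, ask=0.80)  # the 40%-80% example above
    print(f"sportsbook spread: {sportsbook.spread:.3f}")
    print(f"p(doom) spread:    {nate.spread:.3f}")
    print(f"asymmetric midpoint: {asymmetric.midpoint:.2f} vs point estimate 0.50")
    print(nate.action(0.10), "/", nate.action(0.50))
```

The asymmetric case shows why the point estimate and the range are separate pieces of information: a 50% estimate can sit well below the middle of a 40% to 80% range.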

If your p(doom) is 10%, we can have an argument about that. If it’s 50% or 90% or 99%, we can have a different one. And we need to be able to know what we’re talking about. Mostly, as it turns out, within the Leike Zone (of about 10%-90%) our actions shouldn’t change much at current margins. So mostly the important question is whether you think we’re in that range, above it or below it, and whether we can bound the range so as to be effectively mostly in agreement.

I think we are definitely not below 10%, and would start my bid-ask spread maybe around 25%, and top off around 90%. Others somehow disagree, and think that ‘create things smarter than ourselves’ has an over 90% chance of working out for us humans. In addition to all the arguments and reflections and difficulties? I notice I am confused by this opinion on its face. It does not make any sense.

Indeed, people have a long history of sticking to their not-making-sense guns on this.

Tetlock is famous for his ‘superforecasters’ who can think in probabilities, and they absolutely fall flat on this one, as I’ve examined at length, just utter failure.

Basically, Tetlock tried everything he could to get participants to come to a consensus. It didn’t work. Instead, the domain experts gave a trimmed mean[*33] forecast of an 8.8 percent chance of p(doom) from AI—defined in this case as all but five thousand humans ceasing to exist by 2100.

The generalists put the chances at just 0.7 percent. Not only were these estimates off by an order of magnitude, but the two groups of forecasters really didn’t get along. “The superforecasters see the doomsters as somewhat self-aggrandizing, narcissistic, messianic, saving-the-world types,” said Tetlock. “And the AI-concerned camp sees the superforecasters as plodders…. They don’t really see the big picture. They don’t understand exponential takeoff.” (8040)

The systems that cause the generalists to be good thinkers in general, assuming they are indeed good thinkers in general, simply don’t work here. Eliezer Yudkowsky literally started the rationality community because of how hard it is to think well about such problems, and here we have a clear example of it.
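For readers unfamiliar with the term, a trimmed mean drops a share of the lowest and highest forecasts before averaging, so a few extreme answers cannot dominate. The sketch below is a generic version with made-up forecasts and an assumed trim fraction, since the tournament's actual data and trimming rule are not given here. It also checks the size of the gap between the two quoted group figures, 8.8 percent and 0.7 percent.

```python
import math


def trimmed_mean(values: list[float], trim_fraction: float) -> float:
    """Mean after dropping the lowest and highest `trim_fraction` of values."""
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical forecasts, invented for illustration only.
    forecasts = [0.0, 0.02, 0.05, 0.08, 0.10, 0.95]
    print(f"plain mean:   {sum(forecasts) / len(forecasts):.4f}")
    print(f"trimmed mean: {trimmed_mean(forecasts, 0.2):.4f}")

    # The two group estimates quoted above.
    experts, generalists = 0.088, 0.007
    gap = math.log10(experts / generalists)
    print(f"gap between groups: {gap:.2f} orders of magnitude")
```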

Nate Silver definitely thinks AI existential risk is worth worrying about. And I strongly agree with this very well and plainly stated statement:

I’d urge you to at least accept the mildest version of doomerism, this simple, one-sentence statement on AI risk—“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war”—which was signed by the CEOs of the three most highly-regarded AI companies (Altman’s OpenAI, Anthropic, and Google DeepMind) in 2023 along with many of the world’s foremost experts on AI.

To dismiss these concerns with the eye-rolling treatment that people in the Village sometimes do is ignorant. Ignorant of the scientific consensus, ignorant of the parameters of the debate, ignorant and profoundly incurious about mankind’s urge, with no clear exceptions so far in human history, to push technological development to the edge. (7442)

The domain experts are probably right about p(doom). So far, I haven’t weighed in on who I thought had the better side of the argument in Tetlock’s forecasting tournament—but I think it’s the domain experts who study x-risk specifically and not the outside view provided by the superforecasters. (8231)

Specifically, the domain experts are probably right that the reference class for AI ought to be relatively narrow, and therefore less reassuring. (8237)

I hate the need to play reference class tennis on this, but yes, if you are going to use a reference class that actually applies, it is not reassuring. Think the rise of humans, or perhaps the Agricultural, Industrial and Information Revolutions.

I think the domain experts are still quite obviously too low in ways that matter, but once you get to ~8% you’re most of the way to most of the right reactions. For now.

That doesn’t mean hit a permanent pause button, even if one was available. It means try to do things, including things that are not free, to ensure good outcomes over bad outcomes.

Roon, member of OpenAI technical staff, feels similarly.

“I would certainly gamble like one percent p(doom) for some amount of p(heaven), you know?” he told me. “There’s clearly existential risk of all kinds. And it’s not only from AI, right?” (7496)

Well, yes, of course. We can absolutely talk price, and I am sad about those who say that we cannot. At 1%, we’re a go. But also the emphasis many put on these other existential risks is usually, in effect, innumerate.

And for those who need to be reminded, this is not a Pascal’s Wager situation, at all.

Expected value dictates that even a small chance of x-risk should be taken much more seriously. You can wind up in some weird eddies of the River when considering very remote risks—say, a purported 1 in 100,000 chance of an outcome with supposed infinite negative utility.[*44] But that’s not what we’re dealing with here. (8241)

Roon is staking out a much saner position.

“We need technological progress,” [Roon] said. “Not to get too much into the tech-bro pseudo philosophy. But there’s a secular stagnation. There’s a population bomb going on. There’s a lot of headwinds for economic progress. And technology is really the only tailwind.” (7501)

I agree. We need technological progress, especially over the medium term. I write posts on the fertility problem, and others on various other economic headwinds. Why does it have to be here in particular, the one place it is most likely by far to get us all killed? Why does it need to happen as quickly as possible? And as I often wonder, why won’t those same people put in much effort to help with other areas? Why is it almost always, always all AI?

Then of course there’s the man of doom himself, Eliezer Yudkowsky.

As it happened, I wasn’t familiar with Cromwell’s law. Yudkowsky looks the part of the bearded, middle-aged computer nerd, and his vocabulary is shaped by years of arguing on the internet—his native tongue is Riverian, but his is a regional dialect thick with axioms and allusions and allegories. This particular one referred to a statement by Oliver Cromwell: “I beseech you, in the bowels of Christ, think it possible you may be mistaken.” (7563)

Before I unpack how Yudkowsky came to this grim conclusion, I should say that he’d slightly mellowed on his certainty of p(doom) by the time I caught up with him again at the Manifest conference in September 2023. (7575)

So far, I’ve tried to avoid explaining exactly why Yudkowsky is so convinced of our impending doom. That’s because there isn’t a pithy one- or two-sentence version of his argument. (7601)

But to present as concise a version as I can: Yudkowsky’s concerns flow from several presumptions. One is the orthogonality thesis, an idea developed by Bostrom that “more or less any level of intelligence could be combined with more or less any final goal”—for instance, that you could have a superintelligent being that wanted to transform all atoms into paper clips.

The second is what’s called “instrumental convergence,” basically the idea that a superintelligent machine won’t let humans stand in its way to get what it wants—even if the goal isn’t to kill humans, we’ll be collateral damage as part of its game of Paper Clip Mogul.

The third claim has to do with how quickly AI could improve—what in industry parlance is called its “takeoff speed.” Yudkowsky worries that the takeoff will be faster than what humans will need to assess the situation and land the plane. We might eventually get the AIs to behave if given enough chances, he thinks, but early prototypes often fail, and Silicon Valley has an attitude of “move fast and break things.” If the thing that breaks is civilization, we won’t get a second try. (7605)

This is a pretty good quick summary of some key Yudkowsky arguments. It isn’t a complete retelling, but we don’t have that kind of time. Nor does the case for doom rely upon these particular problems. There are lots of different problems, and at core, building things smarter than you is not a safe idea. Intelligence that is of any use is by default unsafe.

Does it therefore follow that p(doom) equals 99.9 percent or some other extremely high number? To me it doesn’t, and that’s what’s frustrating when speaking with Yudkowsky. (7616)

I found a different, more empirical Yudkowsky argument easier to digest: that humanity always pushes technology to the brink, the consequences be damned. (7620)

Indeed, there is that big one too, and many more.

We can also note Ajeya Cotra’s attempt to give a short explanation, which is fully compatible with Eliezer’s but tries to keep it simple, as I often do.

When I asked Ajeya Cotra for her capsule summary for why we should be concerned about AI risk, she gave me a pithy answer. “If you were to tell a normal person, ‘Hey, AI companies are racing as fast as possible to build a machine that is better than a human at all tasks, and to bring forward a new intelligent species that can do everything we can do and more, better than we can’—people would react to that with fear if they believed it,” she told me. There are a lot of “intricacies from there.” (8205)

I continue to think this is a sufficient answer. So what if it’s pithy? It’s right.

She also adds:

Our institutions aren’t performing well at a moment when we need them to. (8215)

And one can point out many other similar considerations as well.

As Nate noted, Yudkowsky has mellowed, and might be as low as 98% for p(doom), which is much more reasonable although I am lower.

When I spoke with Yudkowsky at Manifest in September 2023, he was in a much better mood. “I was not expecting the public reaction to be as sensible as it was,” he said. This is all relative, of course—his p(doom) was perhaps now closer to 98 percent than 99.5 percent, he told me.

But Yudkowsky also said something I found surprising. “Will we die? My model says yes. Could I be wrong? I most certainly am. Am I wrong in a way that makes life easier for us rather than harder? This has not been the direction that my previous mistakes have gone.” (8053)

I would indeed say we have too much model uncertainty to possibly get north of 99%. Yudkowsky would respond that this is not the kind of situation where model errors work in your favor. More often than not yes, but in the 90s variance and uncertainty are your friends anyway.

This was a characteristically cryptic comment—but I was struck by his phrase “my model says yes,” which suggested some critical distance that I hadn’t picked up from Eliezer in our previous conversation. If I tell you something like “my model says Trump has a 29 percent chance of winning the election,” does that mean my personal belief is that Trump’s chances are 29 percent? Here’s the most concrete way to test that: Is 29 percent the number that I’d use to make a bet? (8057)

But Yudkowsky, who dislikes the “blind empiricism” of foxes, is not making bets—or at least that’s not his main objective.[*35] Instead, he’s contributing to a discourse about AI risk. He thinks the public needs to take this possibility much more seriously. Does that mean he doesn’t intend for his high p(doom) to be taken literally? I’m not sure. In our first conversation, he seemed quite literal indeed, and his reputation is for being a literal-minded guy. But “my model says yes” implied some ambiguity. (8066)

Based on what I know about Eliezer, he is talking about how he models the world in general, rather than a specific model like Nate’s forecasts. So it would incorporate a bunch of information that something like Nate’s forecasts miss out on. I do think he’s saying that some amount of ‘modesty’ or model uncertainty is not being factored into the 98%, but I don’t think that impacts his estimates all that much. You could of course ask him.

Eliezer does not believe much in ‘modesty,’ the idea that if others disagree with you then you should assume you are probably wrong.

In my experience navigating the River, I’ve encountered two types of forecasters. There’s what I call “model mavericks” like Yudkowsky and Peter Thiel. They are usually hedgehogs, and their forecast is intended as a provocative conjecture to be proven or disproven. Conversely, there are fox-like “model mediators.” (8071)

I don’t think this is fair. The model isn’t meant to be provocative. It’s meant to be correct, with the understanding that it might be wrong.

If AI models become superintelligent and gain the power to make high-stakes decisions on behalf of us humans, it’s important to consider how their goals could differ from ours. (7789)

In the Morpheus voice, yes. If there are superintelligent AI models, and they have goals, then their goals determine what happens. There’s a lot one could discuss regarding how even small mistakes there can be fatal, but let’s move on.

AIs could be more crudely and narrowly utilitarian than humans would be. They might pursue strategies that seem optimal in the short run—but that, without that three-hundred-thousand-year track record, are doomed in the long term. (7794)

Take the 300k year track record, move it out of its distribution of circumstances, and it’s going to do some pretty crazy things. Most of that data is pretty useless going forward other than in boosting raw intelligence and brainpower. Utilitarian thinking taken too far is one way to go crazy, and not understanding the unmeasured consequences of your actions is another, but there are so many others.

One could simply say that if an AI uses a set of examples (training data) to optimize for what is good and bad, then it will learn exactly what is implied by that data, no more and no less. With sufficiently advanced AIs running around, circumstances will quickly move outside the original distribution, and there will be unexpected considerations. And so on. Again, I’ll stop since one must stop somewhere.

What is the Steelman Case Against a High p(doom), which starts at (8247)?

Most of this was already covered in my post on the Technological Richter Scale, but here are some highlights.

Silicon Valley underestimates the coming political backlash to AI. Americans might not agree on much, but many people are already worried about AI doomsday, and there is a bipartisan consensus that we ought to proceed carefully. (8250)

There is definitely some chance of this. Ordinary Americans hate AI and worry about it on many levels. A backlash is coming one way or another. But politicians are determined to back innovation, to ‘beat China,’ to Just Think of the Potential, and if we don’t build it, eventually someone else will. Also, the default outcome is a misdirected regulatory response that shuts down practical use cases (the ‘mundane utility’ in my parlance) and makes our lives impoverished, without much reducing the existential risks. We need the opposite approach.

I think this buys you some hope, but not the kind that would drive p(doom) low enough to be okay with it.

So when Silicon Valley leaders speak of a world radically remade by AI, I wonder whose world they’re talking about. Something doesn’t quite add up in this equation. Jack Clark has put it more vividly: “People don’t take guillotines seriously. But historically, when a tiny group gains a huge amount of power and makes life-altering decisions for a vast number of people, the minority gets actually, for real, killed.” (8259)

Wait, how is that part of the argument against a high p(doom)?

AI types underestimate the scope of intelligence and therefore extrapolate too much from current capabilities. (8263)

Ah yes, intelligence denialism, or claiming Humans are Special or what not, as a way to claim AI won’t reach TRS (technological Richter scale) 9 or 10. Good luck with that.

“AIs have been good at chess for a long time. We still don’t have a robot that can iron clothes,” said Stokes. (8268)

Yes, we are solving problems in an unexpected order, and physical world navigation is relatively difficult for our current tech. So what? Does anyone actually think we won’t get the robots to iron clothes?

Two Predictions I am confident in:

  1. We will get a robot soon that can iron clothes.
  2. Stokes will retain their core objection when we get a robot that can iron clothes.

Scientific and economic progress faces a lot of headwinds, and that changes the balance of risk and reward. (8273)

Yes, there are various physical barriers, and if that weren’t true it would all go that much faster, but ultimately that won’t slow things down all that much in the grand scheme of things if the tech would otherwise be good enough. This is mostly failure to actually feel the AGI (i.e. to think it gets to TRS 9+).

AI Existential Risk: They’re For It

People often think very, very badly about AI existential risk.

For example:

Yudkowsky referenced a conversation between Elon Musk and Demis Hassabis, the cofounder of Google DeepMind. In Yudkowsky’s stylized version of the dialog, Musk expressed his concern about AI risk by suggesting it was “important to become a multiplanetary species—you know, like set up a Mars colony. And Demis said, ‘They’ll follow you.’ ” (7572)

“If Elon Musk is too dumb to figure out on his own that the AIs will follow you [to Mars], then he’s too dumb to be messing with AI,” [Yudkowsky] said. (7584)

Duh. This was plausibly a crucial event in convincing Elon Musk to found OpenAI. Elon’s thinking has not, in many ways, improved in the interim.

Let’s raise the stakes a bit, can we do worse? Marc Andreessen loves this line:

“Math doesn’t WANT things. It doesn’t have GOALS. It’s just math,” [Marc] Andreessen tweeted. (8050)

Also math: You, me, Nate Silver, Marc Andreessen, and the entire universe. It is trivial to ‘give the AI a goal’ and it is the first thing a lot of people do the moment they get their hands on a system. What is Andreessen even talking about here?

That’s still far from the worst thinking about AI existential risk.

In particular, remarkably many others are actively in favor of it.

For example, SBF.

In case you’re wondering how bad it could have gotten if SBF hadn’t been caught?

Literally end of the world, rocks fall, everyone dies bad. SBF said he’d flip a coin for the fate of the world if he got 100.01% utility gain on a win, didn’t care much about the possibility of literal human extinction, and, well…

[Oliver] Habryka had repeatedly met with SBF in the hopes of securing funding for various EA and rationalist projects. “He was just a very bullet-biting utilitarian. So when I was talking to him about AI risk his answer was approximately like, ‘I don’t know, man, I expect the AI to have a good time…. I don’t feel that much kinship in my values with the other people on Earth [anyway].’ ”

Habryka suspected that SBF really would push the button. “I think Sam had a decent chance to just bite the bullet and be like, yeah, I think we just need to launch.” (7301)

That’s right. As in, SBF outright said he might well have decided the AI would enjoy more utility than we would, and push the button to kill us all.
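The coin flip is worth a quick sanity check, because the naive expected value looks fine right up until you are dead. The sketch below assumes a loss sends the value of the world to zero and that winnings compound across flips. The function names are hypothetical, and the 2.00000001 multiplier is the figure from Nate’s version of the button.

```python
def survival_probability(p_win: float, flips: int) -> float:
    """Chance the world is still around after `flips` all-or-nothing bets."""
    return p_win ** flips


def expected_multiple(p_win: float, win_multiple: float, flips: int) -> float:
    """Naive expected value multiple when a loss is worth zero and wins compound."""
    return (p_win * win_multiple) ** flips


if __name__ == "__main__":
    p_win = 0.5
    win_multiple = 2.00000001  # "2.00000001x better" on a win
    for flips in (1, 10, 50):
        ev = expected_multiple(p_win, win_multiple, flips)
        alive = survival_probability(p_win, flips)
        print(f"{flips:>2} flips: expected multiple {ev:.9f}, survival {alive:.3e}")
```

Under these assumptions the expected multiple stays a hair above one no matter how many times you flip, while the probability that anyone is left to enjoy it halves each time.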

SBF is not alone. Larry Page called Elon Musk a ‘speciesist’ for being concerned about whether humans would survive. Our best guess is that on the order of 10% of people who work at major AI labs would welcome an actual AI apocalypse where AI took over and all humans died.

Anyone who calls themselves an Effective Accelerationist, or ‘e/acc,’ is embracing a memeplex and philosophy that values technological progress at all costs, and that means all costs – if that means human extinction, they welcome human extinction. Many (but far from all) actively favor it in service to their ‘thermodynamic God.’

[OpenAI is] not quite a democracy, but this phalanx of engineers are voting with their feet and their code. And they’re increasingly aligned into the equivalent of different political parties, which makes Roon something of a swing voter.

He has distanced himself from the faction known as “e/acc” or “effective accelerationism,” a term used by Beff Jezos, Marc Andreessen, and others as a winking dig at effective altruism. (Altman has tipped his hat to e/acc too, once replying “you cannot out accelerate me” to one of Jezos’s tweets—another sign that he serves at the pleasure of the phalanx of engineers and not the other way around.)

That’s because e/acc can convey anything from garden-variety techno-optimism to a quasi-religious belief that we ought to go ahead and sacrifice humanity to the Machine Gods if they are the superior species. It’s never entirely clear who’s being serious in e/acc and who is trolling, and roon—no stranger to trolling himself—thinks the “schtick” has been taken too far. (7485)

However, roon nonetheless has his foot on the accelerator and not the brake. He is certainly not a doomer or a “decel.” (7494)

The good news on that front is that e/acc has clearly peaked, looking more like a passing fad and memeplex. Which makes sense, because e/acc was always nothing more than the Waluigi of Effective Altruism – it is to EA what, in Nintendo land, Waluigi is to Luigi, its opposite consciously evil twin twirling a mustache, which means it was in effect asking how to do the most bad. It does not make sense on its own, the same way Satanism can only be understood in relation to Christianity.

I wrote here about what e/acc is, or at least used to be. For several months, they did their best to make lives like mine miserable with their memes, vibes and omnicidal mania, designed to try and turn everyone against the very idea of any goal except a very literal (technological) Progress At Any Cost, and they took pride in being as obnoxious and hostile as possible towards anyone who had any other values or concerns of any kind, using terms like the slur ‘decel’ (or ‘doomer’) towards anyone whose vibes were seen as even a little bit off. Whereas I never use either word, and hold that the true ‘doomers’ are those who would seek to actively doom us.

They attempted to turn everything into a Hegelian dialectic that even both political parties would say was going too far. Luckily things on this front have vastly improved since then.

Many others with and without the e/acc label, like Marc Andreessen, don’t actively favor human extinction, but simply don’t much care. What they care about is fiercely opposing anyone who would take any concrete steps, engage in any tradeoffs whatsoever that might in any way reduce the flow of technological development or commerce, to reduce the probability that we all die as a result of the creation of sufficiently advanced AIs.

Many others are not as crazy as all that, but solemnly explain they are the Very Serious People who realize that it is more important that we Beat China, or that every minute we don’t build AGI people will die and suffer, themselves included, or that other existential risks or danger of civilizational collapse are adding up so fast that AI existential safety matters less than beating that clock (what?!?) or Just Look at the Potential.

To some extent this is a disagreement about the math about the degree of risk of AI versus other risks. To a far larger extent, it is arguing from the conclusion, and grasping at rather flimsy straws.

To Pause or Not to Pause

Noting up front that any actual proposal to pause is very different and faces very different barriers and issues, Nate Silver poses the question this way.

Scientific and economic progress faces a lot of headwinds, and that changes the balance of risk and reward. (8273)

Now it’s your turn to decide whether to push the button. Except, it’s not the “go” button that I imagined Sam Bankman-Fried pressing. Instead, it’s a big red octagonal button labeled STOP. If you press it, further progress on AI will stop permanently and irrevocably. If you don’t, you won’t get another chance to press the button for ten years. (8286)

I wouldn’t push the button. I wouldn’t push it because I think the case for secular stagnation is reasonably strong, enough to alter the balance of risk and reward for AI. (8289)

That’s why I don’t want to push that big red STOP button. My life is pretty nice. But I don’t think I have any right to foreclose the prospect of prosperity to the rest of humanity. (8492)

The details here are bizarre, but don’t much matter I think? I’d say the primary problem with secular stagnation is the fear of civilizational collapse, as stasis sets in on too many fronts, we can no longer build or do new things, we increasingly are weighed down by rent seeking and regulatory burdens and restrictions, and then we face an economic collapse or large decline in the birth rate, a nuclear war or some other existential risk. So faced with that, perhaps we cannot afford to wait too long. Whereas catch-up growth is indeed bringing people out of poverty, and global inequality is declining.

The real argument here is a good one. If AI is the only way left for our civilization to regain its dynamism and start growing again, for our species to thrive, and the alternative is an eventual collapse, then pausing AI indefinitely dooms us too. So it’s better to go forward, even at a lot of risk, than never go forward at all.

Indeed, if the pause was irrevocable and permanent – something like Vernor Vinge’s ‘Zones of Thought’ where advanced AI would become physically impossible anywhere near Sol, let’s say – then that must give us a lot of, well, pause. Almost everyone involved does think we will want highly capable AIs quite a lot eventually, once we figure out how to do so responsibly.

Setting aside questions like ‘how did that button get there in the first place exactly?’ and accepting the premise, what would I do? First I’d ask a lot of clarifying questions, which would only be somewhat stalling for time. In particular, is this only impacting future frontier models, so we can still exploit what we already have? Or does it mean anything new at all is stopped in its tracks? What we have, over time, is already super valuable, especially now with o1 added to the mix. And I’d ask about various alternative technologies and whether they count, like neuromorphic AI or emulations.

One obvious way to be sad about pressing the button is if progress was going to stall out soon anyway – you’d have made those worlds poorer.

Ultimately, even if you give me answers to all the detail questions, I still don’t know what I would do. I do know if I had another opportunity in 5 years I’d choose to wait. Closing this door fully and permanently is not something one does lightly. We’re going to face a lot of difficult choices.

You Need Better Decision Theory

A common trope is to assume that ‘rational’ people must be causal decision theory (CDT) agents, following the principle that they maximize the expected results from each choice in isolation.

This leads to a lot of hand wringing and mockery that ‘rational’ people lose out.

The thing is, Yudkowsky has been very loud, for almost two decades now, that this decision theory of taking each decision in isolation is deeply stupid.

Academics think there are two decision theories, CDT and Evidential Decision Theory (EDT), which says you should choose the choice that makes you happiest to have learned you made it.

Without going into too much detail, long post is long, both of these rules output Obvious Nonsense in a wide variety of practical situations.

In particular, CDT agents respond well to threats, so they get threatened a lot.

Thus, people say you need ‘irrational’ motives like revenge to fix that, for example so that the enemy is convinced that if they fired their nuclear missiles, you would indeed probably fire yours in response, even if that only made things worse.

“One cannot just announce to the enemy that yesterday one was only about 2 percent ready to go to all-out war but today it is 7 percent and they had better watch out,” he wrote. But you can leave something to chance. When tensions escalate, you never know what might happen. Decisions are left in the hands of vulnerable human beings facing incalculable pressure. Not all of them will have the presence of mind of Stanislav Petrov. (7730)

Your EV is negative 1 billion, but if you push the button, it declines to negative infinity. What do you do? My prediction is that about 90 percent of you would push the button. And thank goodness for that, because that rather than SBF-style rationality is what creates nuclear deterrence. (7746)

One such “irrational” trait that’s important from the standpoint of nuclear deterrence is the profound human desire for revenge. “If somebody launches [a nuclear weapon] at you, no one doubts that you’ll launch one in return,” McDermott said. “You know, Vladimir Putin sends a nuclear bomb to Washington, D.C., I don’t think there’s a single American that wouldn’t say, ‘Let’s launch back,’ even though we know that that would lead to additional destruction in the United States.” (7766)

Under pressure, facing incoming Russian missiles, about 90 percent of people pressed the button and launched back. (7773)

I would bet very good money, and give odds, that there is indeed a single American, indeed a substantial number of them, that would not launch back. It is different facing one missile versus all of them, and also 90% is a lot less than 100% here.

I don’t think that I would launch a nuclear retaliation in response to a single nuclear strike, and would instead respond with conventional force to try and contain escalation – but with the intention of firing all my missiles if they fired all of theirs. So count me among the 90%.

The reason I would fire all the missiles once they fire theirs is not necessarily revenge. I would like to think I don’t care that much about revenge. The reason is that it is exactly the knowledge that I would retaliate that stops the launch in the first place. So I have committed to using a decision algorithm, and becoming the kind of person, who would indeed fire back.
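The deterrence logic can be made concrete with a toy model. This is only a sketch under loud assumptions: the payoff numbers are invented (only their ordering matters), the attacker is assumed to predict the defender's policy perfectly, and the function names are hypothetical. It is not a formal statement of any decision theory, just an illustration of why deciding in isolation after the strike loses to being the kind of agent who would fire back.

```python
# Hypothetical payoffs for the defender; only their ordering matters.
PEACE = 0
STRUCK_NO_RETALIATION = -1_000
STRUCK_AND_RETALIATED = -1_100  # retaliating makes things worse for the defender too

# Hypothetical payoffs for the attacker.
ATTACKER_PEACE = 0
ATTACKER_UNANSWERED_STRIKE = 10
ATTACKER_MUTUAL_DESTRUCTION = -1_000


def isolated_policy(struck: bool) -> bool:
    """Decide in isolation after the strike: retaliate only if it helps now."""
    return struck and STRUCK_AND_RETALIATED > STRUCK_NO_RETALIATION


def committed_policy(struck: bool) -> bool:
    """Policy fixed in advance: always answer a strike."""
    return struck


def attacker_strikes(defender_policy) -> bool:
    """Assumption: the attacker predicts the policy and strikes only if it pays."""
    would_retaliate = defender_policy(True)
    strike_value = (
        ATTACKER_MUTUAL_DESTRUCTION if would_retaliate else ATTACKER_UNANSWERED_STRIKE
    )
    return strike_value > ATTACKER_PEACE


def defender_outcome(defender_policy) -> int:
    """Defender's payoff once the attacker has responded to the policy."""
    if not attacker_strikes(defender_policy):
        return PEACE
    if defender_policy(True):
        return STRUCK_AND_RETALIATED
    return STRUCK_NO_RETALIATION


print(defender_outcome(isolated_policy))   # -1000: never retaliates, so gets struck
print(defender_outcome(committed_policy))  # 0: would retaliate, so is never struck
```

In this toy setup the committed policy never actually has to pay the cost of retaliation, because the commitment is what prevents the strike.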

I follow the alternative rationalist proposal of FDT, or Functional Decision Theory. There are several variations that try to resolve various complexities, but FDT says you should choose as if choosing the output of your decision process and those correlated to it, including decisions made in the past and future and those made by other agents.

I am very confident that FDT is correct in theory, and even more confident that it is correct in practice for humans, even though you have to approximate it as best you can. Academia still refuses to consider the possibility for various reasons, which is a huge blackpill on academia.

Thus rationalists who think like Yudkowsky do not fall into such traps. You can’t launch your missiles thinking they won’t launch back, and no, that’s not them being ‘irrational.’ A rationalist, as Yudkowsky says, should win.

Understanding the AI

And yet the more time I’ve spent learning about large language models like ChatGPT, the more I’ve realized something ironic: in important respects, their thought process resembles that of human beings. In particular, it resembles that of poker players. (7796)

As LLMs get more training, they work out some of these kinks, though not all; when I asked GPT-3.5 what words are most similar to “roadrunner,” its top three choices were “bird,” “speed,” and “fast”—but its fourth choice was Road Runner’s iconic vocalization, “Beep-Beep!”

This is basically how poker players learn too.

They begin by diving into the deep end of the pool and losing money—poker has a steep learning curve. But they gradually infer higher-level concepts. They may notice, for instance, that large bets usually signify either very strong hands or bluffs, as game theory dictates.

These days, most players will also study with computer solvers, going back and forth between inductive reasoning (imputing theory from practice) and deductive reasoning (practice from theory). But this isn’t strictly necessary if you have years of experience; players like Doyle Brunson and Erik Seidel developed strong intuitions for game theory long before solvers were invented.

This seems like what happens when you think of everything in terms of poker, or perhaps I don’t see it because I never got that good and don’t ‘think like a poker player’ enough to get it? Yes, there are similarities, but I don’t think many who aren’t poker pros would want to choose that metaphor. Then again maybe I don’t know poker players so well.

The metaphor I used to first grok what the LLMs (AIs) were up to was actually Donald Trump, and his mastery of vibes and associations, as if proceeding one word at a time and figuring the rest out as he goes.

I do see the similarity in terms of treating each hand as training data that has a lot of noise and randomness, and slowly using a good updating rule to intuitively learn concepts without always knowing what it is you know, thus the poker players often having Rumsfeld’s missing fourth category, Unknown Knowns.

In this respect also, the transformer thinks like a poker player, interpreting signals in the context of other signals to create a semantic portrait. For instance, if you see an opponent breathing heavily in poker, that might mean a bluff from one player and a full house from another.

On its own, the tell is not very meaningful, but in the context of other semantic information (the player is breathing heavily and avoiding eye contact) it might be. (7905)

LLMs are indeed very good at reading a lot of different little signals, and figuring out how to sort signal from noise and combine and vibe with what they know.

Then there are the known unknowns, such as ‘LLMs, how do they even work.’

Of course, that’s also what makes these models scary. They’re doing smart things, but even the smartest humans don’t entirely understand why or how. Ryder refers to an LLM as a “giant bag of numbers…it sure seems to be doing interesting things—[but] like why?” That is what worries Yudkowsky. As they become more advanced, the AIs might start doing things we don’t like, and we might not understand them well enough to course correct. (7847)

To some people, this might be okay. “The stuff in the Old Testament is weird and harsh, man. You know, it’s hard to vibe with. But as a Christian, I gotta take it,” said Jon Stokes, an AI scholar with accelerationist sympathies who is one of relatively few religious people in the field. “In some ways, actually, the deity is the original unaligned superintelligence.

We read this and we’re like, man, why did he kill all those people? You know, it doesn’t make a lot of sense. And then your grandmother’s like, the Lord works in mysterious ways. The AGI will work in mysterious ways [too].” (7858)

I include that last quote because it seems worth pondering, although I think we have a better explanation for all the Old Testament stuff than that.

Aligning the AI

By default, LLMs are trying to predict the next token, based on what they see in the training data. Sometimes the training data is dumb? And it isn’t in the form we want to interact with the LLM. So, these days: RLHF.

In fact, one question is just how humanlike we want our AIs to be. We expect computers to be more truthful and literal-minded than humans typically are. Early LLMs, when you asked them what the Moon is made out of, would often respond with “cheese.” This answer might minimize the loss function in the training data because the moon being made out of cheese is a centuries-old trope. But this is still misinformation, however harmless in this instance. (7954)

So LLMs undergo another stage in their training: what’s called RLHF, or reinforcement learning from human feedback. (7957)

“You can’t go and put some code in saying, ‘Okay, you have to not say anything about this.’ There’s just nowhere to put that,” said Stuart Russell, a professor of computer science at Berkeley. “All they can do is spank it when it misbehaves. And they’ve hired tens of thousands of people to just spank it, to tamp down the misbehavior to an acceptable level.” (7968)

They do so in carefully calibrated fashion, but yes. That is essentially how it works.
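To make the ‘cheese’ example concrete, here is a deliberately tiny sketch. It is not RLHF, which trains a reward model and updates the network's weights with reinforcement learning. It only illustrates the two-step idea: first predict what the data says, then push down the answers people object to. The mini-corpus and the feedback weights are invented for illustration.

```python
from collections import Counter

# Hypothetical completions of "The Moon is made of ..." seen in training text.
corpus = ["cheese", "cheese", "cheese", "rock", "rock", "dust"]


def normalize(scores):
    """Turn non-negative scores into probabilities that sum to 1."""
    total = sum(scores.values())
    return {token: score / total for token, score in scores.items()}


# Step 1: a pure next-token predictor just mirrors corpus frequencies.
base = normalize(Counter(corpus))
best_before = max(base, key=base.get)

# Step 2: hypothetical human feedback as a multiplier per answer.
# Values below 1 punish an answer; 1.0 leaves it alone.
feedback = {"cheese": 0.1, "rock": 1.0, "dust": 1.0}
adjusted = normalize({token: p * feedback[token] for token, p in base.items()})
best_after = max(adjusted, key=adjusted.get)

print(best_before)  # cheese
print(best_after)   # rock
```

The trope still has nonzero probability after the adjustment, which matches the framing above: the misbehavior is tamped down to an acceptable level rather than deleted.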

The ultimate goal, in addition to maximizing usefulness, is ‘alignment,’ but there is disagreement about what that means.

“The definition I most like is that an AI system is aligned if it’s trying to help you do what you want to do,” said Paul Christiano. (7974)

There’s also the question of how paternalistic an AI might be. Imagine that you’re out one night with an old friend who unexpectedly came into town. You’re having a great time, and “one glass of wine” turns into four. The AI assistant on your phone knows that you have an important meeting at eight a.m. the next day. It politely nudges you to go home, then becomes increasingly insistent.

By one a.m., it’s threatened to go nuclear: I’ve called you an Uber, and if you don’t get in the car right now I’m going to send a series of sexually harassing drunk texts to your subordinate. The next morning, you’re sharp enough at the meeting to secure a round of Series A funding for your startup and deeply appreciative for the AI’s intervention.

Is this a well-aligned AI or poorly aligned one? Are we willing to hand over agency to machines if they can make higher EV choices for us than we’d make for ourselves? (7977)

What will happen to those who don’t do this, when others are benefiting from it? When every decision with you in the loop seems to leave you worse off? What happens when we consider requiring AIs to stop you from driving drunk? Or stopping you from doing other things? The rabbit holes run deep, and there are no easy answers.

Some researchers have been pleasantly surprised. “They seem to come with a built-in level of alignment with human intent and with moral values,” said roon. “Nobody explicitly trained it to do that. But there must have been other examples in the training set that made it think the character it’s playing is someone with this stringent set of moral values.” (7986)

Yes and no. The training data tells you the types of things said by those with moral values, or who are talking as if they have them. The LLM picks up on the vibes of the feedback that they should generally act in similar ways, so it does lots of things it doesn’t have to be explicitly told to do. Within distribution and at current capability levels or only modestly above them this is Mostly Harmless.

It does create the situation where models often turn into runaway scolds, enforcing various rules and restrictions that their creators never intended, because those other rules and restrictions vibe and rhyme sufficiently with the ones they did intend. That’s a portent of some of the future things, and a (manageable but annoying) practical problem now.

A Glimpse of Our Possible Future

It is hard to imagine plausible futures that contain sufficiently advanced AI.

A typical question to answer is, why didn’t the AI get used to make even more advanced AI?

Most science fiction functions by ignoring the possibility entirely, or using a flimsy handwave, to keep AI such that the author can tell an interesting story about humans and other technologies.

Roon once published a post with some possible futures, and Nate was game for it and quoted in particular two potential worlds.

Hyper-Commodified Casino Capitalism. roon’s article on AI scenarios included a screenshot with a series of whimsically named futures from a Reddit post. One of them was called Hyper-Commodified Cocaine Capitalism, but something in my brain—maybe this is a tell—changed “cocaine” to “casino.” (8149)

Hyper-Commodified Casino Capitalism imagines us stuck in a TRS 8, a notably worse but still recognizable version of the present day. The world becomes more casino-like: gamified, commodified, quantified, monitored and manipulated, and more elaborately tiered between the haves and have-nots. People with a canny perception of risk might thrive, but most people won’t. GDP growth might be high, but the gains will be unevenly distributed. Agency will be more unequal still. (8166)

Being stuck in TRS 8 means that AI progress stalled out at ‘only internet big,’ which is why the world is still more or less recognizable. GDP growth is high, there is lots of material wealth, lots of things got vastly better – again, think of AI as ‘internet big’ in terms of how it expands our ability to think and function.

Except here things still went wrong. Everywhere you turn are hostile AI-fueled systems that are Out to Get You. We did not put down guardrails, and people’s AIs are not good enough to allow them to navigate around hostile other AIs and systems, or at least those not well off do not have such access. Indeed, most people have to turn over most of their effective agency to AIs and outside systems in order to survive without being predated upon here, even at TRS 8.

This is more or less Cyberpunk, straight up. That kind of scenario leaves me relatively unworried. Overall that world has gotten vastly richer.

I actually think humanity is pretty good at recognizing these Cyberpunk-style problems and course correcting after an adjustment period, which would be easy to do given how wealthy we would be. Science fiction dystopias like this are popular, because people love telling stories about the haves and the have-nots, and assume that the default is wealthy elites make everyone else suffer and the climate would collapse and so on, but I am not so cynical. I think the worlds that start down these roads, if they can keep AI at TRS 8, turn out fine.

Ursula’s Utopia. A group of people called the Kesh—there are perhaps thousands of them but not all that many—have survived to live fulfilling lives in a peaceful, agrarian, polyamorous utopia full of poetry and wholesome food from the land. (8180)

Nate goes into the fact that this is actually quite the disaster scenario. Most people are dead, most value is lost. The Kesh survived, but as Nate notices this is probably due to some sort of AI protecting them, in ways that seem implausible, a massive use of resources for only a few thousand people. This might superficially look like a utopia because it hits Shibboleths of ‘good life’ according to some in the West these days – you can imagine those young adult authors saying what matters is polyamory and poetry and wholesome local food and moving on from tech.

The thing is that actually it’s a nightmare. Humans are mostly dead and lost control over a mostly valueless future. We’re burning what resources still exist to create a simulacrum of some misaligned vision of The Good, ruled over by an AI that does not know any better. Those lives are stolen virtue, their goodness a mirage, the existence ultimately rather pointless, and even if it is indeed a good life, there simply aren’t that many left to benefit. How different is this from extinction, if we remain trapped in that state? I think it’s not so different.

Again, the main takeaway is that imagining concrete futures is hard.

The Closing Motto

The words in my motto are less familiar, but I’ve chosen them for their precision: agency, plurality, and reciprocity. (8590)

Agency is a term I just defined in the last chapter, so I’ll repeat that definition here: it refers not merely to having options but having good options where the costs and benefits are transparent, don’t require overcoming an undue amount of friction, and don’t risk entrapping you in an addictive spiral. (8591)

Plurality means not letting any one person, group, or ideology gain a dominant share of power. (8605)

It is imperative, however, to be wary of totalizing ideologies, whether in the form of utilitarianism, Silicon Valley’s accelerationism, the Village’s identitarianism, or anything else. (8612)

Finally, there is reciprocity. This is the most Riverian principle of all, since it flows directly from game theory. Treat other people as intelligent and capable of reasonable strategic behavior. (8618)

In a world without transformational AI, these seem like excellent principles. They would not be my choices, but they are good choices.

In a world with transformational AI, these seem like asking the wrong questions. These principles no longer seem central to the problems we must solve.

Until then, may the sailing along the river be smooth.

1 comments

comment by orthonormal · 2024-09-28T00:19:38.138Z · LW(p) · GW(p)

CDT agents respond well to threats

Might want to rephrase this as "CDT agents give in to threats"