[Research log] The board of Alphabet would stop DeepMind to save the world

post by Lucie Philippon (lucie-philippon) · 2024-07-16

Contents

  Are government interventions necessary to stop the suicide race?
  Scenario: The board of Alphabet decides to stop the race
    Would they press the button?
  Why real life will be much riskier than this scenario
  Can risks be reduced by improving the information available to Frontier Lab decision makers?

Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort.

This post is not finished research. I’m not confident in the claims I’m making here, but I thought putting it out there for feedback would help me decide what to focus on next in the program.

Are government interventions necessary to stop the suicide race?

The zeitgeist I have picked up from the AI Safety community since I joined seems to accept as fact that Frontier AI Labs are knowingly locked in a suicidal race towards developing transformative AI, and that any solution will need to involve strong external pressure to stop them, either in the form of an international coalition imposing regulations which shift the incentives of Labs, or even more desperate measures like a pivotal act.

From this point of view, AI risk is mostly driven by game theory. The economic and personal incentives faced by the stakeholders of each Frontier AI Lab determine their actions, and they will proceed this way until AGI is developed, or until a sufficiently large external force changes the incentive landscape. Therefore, the only way to make sure Labs don’t gamble the future of the world when building an AGI is to convince governments to implement policies which shift those incentives.
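To make the game-theoretic framing concrete, here is a minimal Python sketch of the race as a two-lab game. The payoff numbers are invented purely for illustration (they come from me, not from any lab); only their ordering matters. Under this ordering, racing is each lab's best response whatever the other does, even though both labs prefer the world where neither races.

payoffs = {  # (Lab A action, Lab B action) -> (payoff to A, payoff to B); made-up numbers
    ("race", "race"): (-40, -40),   # both race: high shared existential risk
    ("race", "stop"): (10, -50),    # A alone races: A captures the prize, B bears risk with no upside
    ("stop", "race"): (-50, 10),
    ("stop", "stop"): (0, 0),       # nobody races: status quo
}

def best_response_for_a(b_action):
    """Lab A's payoff-maximising action, holding Lab B's action fixed."""
    return max(("race", "stop"), key=lambda a: payoffs[(a, b_action)][0])

for b_action in ("race", "stop"):
    print(f"If Lab B plays {b_action}, Lab A's best response is {best_response_for_a(b_action)}")
# Prints "race" both times: with this payoff ordering, racing is a dominant
# strategy even though both labs prefer (stop, stop) to (race, race).

This is the picture the "only external pressure works" view relies on: nothing any single lab does changes the other's incentive to race.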

I now believe that this view is wrong, and that Frontier AI Labs would get out of the race if they thought the risks were sufficiently large and the consequences sufficiently dire for the world and for themselves.

Claim: If every decision maker in a Frontier AI Lab thought they were in a suicide race, and that their next development would bring about the destruction of humanity with near certainty, they would decide to leave the “AGI at all costs” race, no matter the actions of other actors.

Below, I present a scenario which I find plausible, in which a Frontier Lab decides to drop out of the race because of this.

Scenario: The board of Alphabet decides to stop the race

Disclaimer: Handwaving a lot of details.

Situation: Frontier AI Labs are still locked in a race to be the first to develop AGI. It is widely believed that the coming generation of models might pass the threshold. We are in crunch time. A previous, unsuccessful attempt by a model to take control of key infrastructure has made the possibility of X-risk clear in everyone’s mind. The team at Google DeepMind is hard at work preparing the next training run, which they believe will be the last one.

Demis Hassabis calls a board meeting of Alphabet, where he presents the current tactical situation. All board members become convinced of the following claims:

After receiving this information, the board convenes to decide how to proceed. They consider two options:

I expect that, in this situation, the board of Alphabet would decide to drop out of the race, as continuing would carry such a high probability of death for everyone, including the board members themselves.
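As a back-of-the-envelope check on this intuition, here is a small, purely illustrative expected-value sketch. The probabilities and values are hypothetical numbers I chose, not anything from the scenario itself; the point is only that once near-certain doom is priced at anything like its real cost to the decision makers, continuing is dominated.

# Hedged sketch: the numbers below are mine, chosen only to show the shape of the
# comparison; nothing here comes from the post or from Alphabet.
p_doom = 0.95            # the scenario's "near certainty" that the next run destroys humanity
value_win = 1_000        # private value of controlling AGI if things somehow go well
value_doom = -1_000_000  # how the board values an outcome where everyone, themselves included, dies
value_stop = 0           # baseline: drop out of the race and forgo the prize

ev_continue = p_doom * value_doom + (1 - p_doom) * value_win
print(f"EV(continue) = {ev_continue:,.0f}, EV(stop) = {value_stop}")
# EV(continue) = -949,950: once the board prices in its own death, no plausible
# competitive upside makes continuing look better than stopping.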

Would they press the button?

I see three possible reasons why the board might want to proceed despite the risks:

I think all of those are unlikely. Am I missing some other reason why they would do it?

Why real life will be much riskier than this scenario

Even if, in this scenario, the board of Alphabet can reasonably be expected to make the right call and stop development, I expect that such a clear-cut view of the consequences will never actually be available, and that various forms of imperfect information and changes to the payoff matrix will make it less likely that they drop out of the race before it’s too late.

However, I’m interested in knowing exactly which factors prevent such an ideal scenario from happening, as that could inform my priorities for reducing AI risks. I’m specifically interested in which factors prevent decision makers from having such a complete view of the situation, and which interventions besides policy could improve those decisions.
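To illustrate one way imperfect information could break the ideal scenario, here is a toy Monte Carlo sketch. The true risk, bias, and noise parameters are all hypothetical values I picked; the point is that a board applying a sensible decision rule to a biased, noisy risk estimate can still keep racing in a non-trivial fraction of worlds.

import random

# Toy model of imperfect information (my construction, not the post's): the board
# never sees the true risk, only a noisy and optimistically biased estimate of it.
random.seed(0)
true_risk = 0.9       # assumed actual probability of catastrophe
optimism_bias = 0.3   # systematic underestimate of that risk
noise = 0.15          # spread of the estimates the board actually receives
threshold = 0.5       # simple decision rule: stop only if the estimate exceeds this

trials = 10_000
keeps_racing = sum(
    random.gauss(true_risk - optimism_bias, noise) <= threshold for _ in range(trials)
)
print(f"Board keeps racing in {keeps_racing / trials:.0%} of trials despite a 90% true risk")
# With these made-up parameters the board stops most of the time, but still races
# in roughly a quarter of worlds; better information directly shrinks that fraction.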

A short list of factors which I expect to cause decision-making to be less than ideal:

Can risks be reduced by improving the information available to Frontier Lab decision makers?

My research at MATS will focus on exploring how improving access to decision-relevant information for Frontier Lab stakeholders, including board members and investors, could reduce the risks of AI development.

I'm still unsure which direction to explore specifically, so don't hesitate to give me feedback and ideas on this basic model, and on what kinds of research questions could be impactful in this direction.
