Mitigating extreme AI risks amid rapid progress [Linkpost]

post by Akash (akash-wasil) · 2024-05-21T19:59:21.343Z · LW · GW · 7 comments

Contents

  Explanation of AGI & importance of preparing for AGI risks
  Explanation of misalignment & AI takeover risks
  Calls for governance despite uncertainty
  Government insight
  Safety Cases
  Mitigation measures
7 comments

In a new Science paper, the authors provide concise summaries of AI risks and offer recommendations for governments.

I think the piece is quite well-written. It concisely explains a lot of relevant arguments, including arguments about misalignment and AI takeover. I suspect this is one of the best standalone pieces to help people understand AI risks and some (IMO reasonable) governance interventions. 

The piece also has a very respectable cast of authors, including Bengio and Hinton. (Not to say that this fact should affect your assessment of whether its claims are true; I mention it because it will affect how some audiences, e.g. policymakers, interpret the piece.)

Some relevant quotes below:

Explanation of AGI & importance of preparing for AGI risks

There is no fundamental reason for AI progress to slow or halt at human-level abilities. Indeed, AI has already surpassed human abilities in narrow domains such as playing strategy games and predicting how proteins fold (see SM). Compared with humans, AI systems can act faster, absorb more knowledge, and communicate at a higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.

We do not know for certain how the future of AI will unfold. However, we must take seriously the possibility that highly powerful generalist AI systems that outperform human abilities across many critical domains will be developed within this decade or the next. What happens then?

Explanation of misalignment & AI takeover risks

Without R&D breakthroughs (see next section), even well-meaning developers may inadvertently create AI systems that pursue unintended goals: The reward signal used to train AI systems usually fails to fully capture the intended objectives, leading to AI systems that pursue the literal specification rather than the intended outcome. Additionally, the training data never captures all relevant situations, leading to AI systems that pursue undesirable goals in new situations encountered after training.


Once autonomous AI systems pursue undesirable goals, we may be unable to keep them in check. Control of software is an old and unsolved problem: Computer worms have long been able to proliferate and avoid detection (see SM). However, AI is making progress in critical domains such as hacking, social manipulation, and strategic planning (see SM) and may soon pose unprecedented control challenges. To advance undesirable goals, AI systems could gain human trust, acquire resources, and influence key decision-makers. To avoid human intervention (3), they might copy their algorithms across global server networks (4). In open conflict, AI systems could autonomously deploy a variety of weapons, including biological ones. AI systems having access to such technology would merely continue existing trends to automate military activity. Finally, AI systems will not need to plot for influence if it is freely handed over. Companies, governments, and militaries may let autonomous AI systems assume critical societal roles in the name of efficiency.

Calls for governance despite uncertainty

We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society requires and effectively uses government oversight to reduce risks. However, governance frameworks for AI are far less developed and lag behind rapid technological progress. We can take inspiration from the governance of other safety-critical technologies while keeping the distinctiveness of advanced AI in mind—that it far outstrips other technologies in its potential to act and develop ideas autonomously, progress explosively, behave in an adversarial manner, and cause irreversible damage.

We need governance measures that prepare us for sudden AI breakthroughs while being politically feasible despite disagreement and uncertainty about AI timelines. The key is policies that automatically trigger when AI hits certain capability milestones. If AI advances rapidly, strict requirements automatically take effect, but if progress slows, the requirements relax accordingly. Rapid, unpredictable progress also means that risk-reduction efforts must be proactive—identifying risks from next-generation systems and requiring developers to address them before taking high-risk actions. We need fast-acting, tech-savvy institutions for AI oversight, mandatory and much more rigorous risk assessments with enforceable consequences (including assessments that put the burden of proof on AI developers), and mitigation standards commensurate to powerful autonomous AI.

Government insight

To identify risks, governments urgently need comprehensive insight into AI development. Regulators should mandate whistleblower protections, incident reporting, registration of key information on frontier AI systems and their datasets throughout their life cycle, and monitoring of model development and supercomputer usage (12). Recent policy developments should not stop at requiring that companies report the results of voluntary or underspecified model evaluations shortly before deployment (see SM). Regulators can and should require that frontier AI developers grant external auditors on-site, comprehensive (“white-box”), and fine-tuning access from the start of model development (see SM). This is needed to identify dangerous model capabilities such as autonomous self-replication, large-scale persuasion, breaking into computer systems, developing (autonomous) weapons, or making pandemic pathogens widely accessible.

Safety Cases

Despite evaluations, we cannot consider coming powerful frontier AI systems “safe unless proven unsafe.” With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. By doing so, they would follow best practices for risk management from industries, such as aviation, medical devices, and defense software, in which companies make safety cases [(14, 15); see SM]: structured arguments with falsifiable claims supported by evidence that identify potential hazards, describe mitigations, show that systems will not cross certain red lines, and model possible outcomes to assess risk… Governments are not passive recipients of safety cases: They set risk thresholds, codify best practices, employ experts and third-party auditors to assess safety cases and conduct independent model evaluations, and hold developers liable if their safety claims are later falsified.

Mitigation measures

Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.

To bridge the time until regulations are complete, major AI companies should promptly lay out “if-then” commitments: specific safety measures they will take if specific red-line capabilities (9) are found in their AI systems. These commitments should be detailed and independently scrutinized. Regulators should encourage a race-to-the-top among companies by using the best-in-class commitments, together with other inputs, to inform standards that apply to all players.

7 comments


comment by habryka (habryka4) · 2024-05-21T21:26:41.048Z · LW(p) · GW(p)

I feel kind of conflicted about including Daniel Kahneman on this paper. I feel like by the standards of scientific publication, you can't really have a co-author who wasn't alive or available to give their consent to the contents of the paper when the final version was released. Like, I don't know what edits were made to this paper since Kahneman's death, and it was of course not possible to get his consent for any of these edits.

Plausibly the paper was already mostly written when they got Daniel Kahneman's input, but I do assign some probability to it now saying things that Daniel would not have actually endorsed, and that seems bad.

Replies from: JanBrauner, Chris_Leong, ryan_greenblatt, Sebastian Schmidt
comment by JanB (JanBrauner) · 2024-05-25T16:13:11.485Z · LW(p) · GW(p)

Daniel died only shortly before the paper was finished and had approved the version of the manuscript after peer review (before editorial comments); i.e., he had approved all substantial content. Including him seemed like clearly the right thing to me.

comment by Chris_Leong · 2024-05-23T01:15:43.481Z · LW(p) · GW(p)

Surely this can't be a new issue? There must already exist some norms around this.

Replies from: habryka4
comment by habryka (habryka4) · 2024-05-23T01:23:27.710Z · LW(p) · GW(p)

Pretty plausible! I am not that intimately familiar with academic norms.

comment by ryan_greenblatt · 2024-05-21T21:49:44.841Z · LW(p) · GW(p)

Supposing he was a serious contributor to the paper (which seems unlikely IMO), it seems bad to cut his contribution just because he died.

So, I think the right choice here will depend on how much being an author on this paper is about endorsement or about contribution.

(Even if he didn't contribute much of the content, I still think it might be fine to keep him as an author.)

It's unfortunate that authorship can mean these two pretty different things.

Replies from: habryka4
comment by habryka (habryka4) · 2024-05-21T22:29:43.283Z · LW(p) · GW(p)

Yeah, I agree. I do think it's unlikely he was a major contributor to this paper, so it's more about endorsement. Agree that if someone did serious work on a paper and then died, they should probably still be included (though IMO they should be included with an explicit footnote saying they died during the writing of the paper and might not endorse everything in the final version).

comment by Sebastian Schmidt · 2024-05-25T06:40:57.259Z · LW(p) · GW(p)

This is a valid concern, but I'm fairly certain Science (the journal, not the field) handled this well, largely because they're somewhat incentivized to do so (otherwise it could have very bad optics for them) and must have experienced this several times before. I also happen to know one of the senior authors, who is significantly above average in conscientiousness.