International Scientific Report on the Safety of Advanced AI: Key Information
post by Aryeh Englander (alenglander) · 2024-05-18
I thought that the recently released International Scientific Report on the Safety of Advanced AI seemed like a pretty good summary of the state of the field on AI risks, in addition to being about as close to a statement of expert consensus as we're likely to get at this point. I noticed that each section of the report has a useful "Key Information" bit with a bunch of bullet points summarizing that section.
So for my own use as well as perhaps the use of others, and because I like bullet-point summaries, I've copy-pasted all the "Key Information" lists here.
1 Introduction
[Bullet points taken from the “About this report” part of the Executive Summary]
- This is the interim publication of the first ‘International Scientific Report on the Safety of Advanced AI’. A diverse group of 75 artificial intelligence (AI) experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the European Union (EU), and the United Nations (UN).
- Led by the Chair of this report, the independent experts writing this report collectively had full discretion over its content.
- At a time of unprecedented progress in AI development, this first publication restricts its focus to a type of AI that has advanced particularly rapidly in recent years: General-purpose AI, or AI that can perform a wide variety of tasks. Amid rapid advancements, research on general-purpose AI is currently in a time of scientific discovery and is not yet settled science.
- People around the world will only be able to enjoy general-purpose AI’s many potential benefits safely if its risks are appropriately managed. This report focuses on identifying these risks and evaluating technical methods for assessing and mitigating them. It does not aim to comprehensively assess all possible societal impacts of general-purpose AI, including its many potential benefits.
- For the first time in history, this interim report brought together experts nominated by 30 countries, the EU, and the UN, and other world-leading experts, to provide a shared scientific, evidence-based foundation for discussions and decisions about general-purpose AI safety. We continue to disagree on several questions, minor and major, around general-purpose AI capabilities, risks, and risk mitigations. But we consider this project essential for improving our collective understanding of this technology and its potential risks, and for moving closer towards consensus and effective risk mitigation to ensure people can experience the potential benefits of general-purpose AI safely. The stakes are high. We look forward to continuing this effort.
2 Capabilities
2.1 How does General-Purpose AI gain its capabilities?
- General-purpose AI models and systems can produce text, images, video, and labels for unlabelled data, and can initiate actions.
- The lifecycle of general-purpose AI models and systems typically involves computationally intensive ‘pre-training’, labour-intensive ‘fine-tuning’, and continual post-deployment monitoring and updates.
There are various types of general-purpose AI. Examples of general-purpose AI models include:
- Chatbot-style language models, such as GPT-4, Gemini-1.5, Claude-3, Qwen1.5, Llama-3, and Mistral Large.
- Image generators, such as DALLE-3, Midjourney-5, and Stable Diffusion-3.
- Video generators such as SORA.
- Robotics and navigation systems, such as PaLM-E.
- Predictors of various structures in molecular biology such as AlphaFold 3.
2.2 What current general-purpose AI systems are capable of
- General-purpose AI capabilities are difficult to estimate reliably, but most experts agree that current general-purpose AI capabilities include:
- Assisting programmers and writing short computer programs
- Engaging in fluent conversation over several turns
- Solving textbook mathematics and science problems
- Most experts agree that general-purpose AI is currently not capable of tasks including:
- Performing useful robotic tasks such as household chores
- Reliably avoiding false statements
- Developing entirely novel complex ideas
- A key challenge for assessing general-purpose AI systems’ capabilities is that performance is highly context-specific. Methods that elicit improved model capabilities are sometimes discovered only after a model has been deployed, so initial capabilities might be underestimated.
2.3 Recent trends in capabilities and their drivers
- In recent years, general-purpose AI capabilities have advanced rapidly according to many metrics, thanks both to increased training resources and to algorithmic improvements. Per model, these resources are estimated to have increased at the following rates (a toy compounding sketch follows this section's bullets):
- Compute for training: 4x/year
- Training dataset size: 2.5x/year
- Algorithmic training efficiency: 1.5x to 3x/year
- Energy used for powering computer chips during training: 3x/year
- Hardware efficiency: 1.3x/year
- Using ever more compute and data to train general-purpose AI models in recent years is referred to as ‘scaling up’ models. Performance on broad metrics improves predictably with scale, and many AI researchers agree that scaling has driven most of the increase in advanced general-purpose AI capabilities in recent years. However, it is debated whether this has resulted in progress on fundamental challenges such as causal reasoning.
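To make these per-year multipliers concrete, here is a toy calculation compounding them over a hypothetical three-year horizon. The rates are the report's estimates quoted above; the three-year horizon, the steady-compounding model, and the 2x midpoint used for algorithmic efficiency are illustrative assumptions, not claims from the report.

```python
# Toy compounding of the report's estimated per-year growth factors.
# Horizon and midpoint choices are illustrative assumptions.
growth_per_year = {
    "training compute": 4.0,
    "training dataset size": 2.5,
    "algorithmic training efficiency": 2.0,  # midpoint of the 1.5x-3x range
    "training energy use": 3.0,
    "hardware efficiency": 1.3,
}

years = 3
for factor, rate in growth_per_year.items():
    print(f"{factor}: ~{rate ** years:.0f}x over {years} years")
```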
2.4 Capability progress in coming years
- The pace of future progress in general-purpose AI capabilities has important implications for managing emerging risks but experts disagree on what to expect, even in the near future. Experts variously support the possibility of general-purpose AI capabilities advancing slowly, rapidly, or extremely rapidly.
- This disagreement involves a key question: would continued ‘scaling up’ and refinement of existing techniques yield rapid progress, or is this approach fundamentally limited, such that unpredictable research breakthroughs will be required to substantially advance general-purpose AI abilities? Those who expect breakthroughs to be required often argue that recent progress has not overcome fundamental challenges like common-sense reasoning and flexible world models.
- In recent years, three main factors have driven progress in AI: scaling up the computational power ('compute') used in training; scaling up the amount of training data; and improving AI techniques and training methods.
- Leading AI companies are betting on all three factors continuing to drive improvements, particularly increased compute. If recent trends continue, by the end of 2026 some general-purpose AI models will be trained using 40x to 100x the computation of the most compute-intensive models currently published, combined with around 3x to 20x more efficient techniques and training methods (a back-of-envelope sketch of the combined effect follows this list).
- However, there are potential bottlenecks to further increasing both data and compute, including the limited availability of data, AI chip production challenges, high overall costs, and limited local energy supply. AI companies are working to overcome these bottlenecks. The pace of scaling also depends on regulations that might place constraints or conditions on AI deployment and development.
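As a back-of-envelope illustration of this projection, the sketch below multiplies the raw compute range by the efficiency range to get an "effective compute" scale-up. Treating the two factors as independent multipliers is a simplifying assumption on my part, not the report's methodology.

```python
# Back-of-envelope: effective training scale-up by end of 2026 if the
# report's projected ranges hold and the two factors simply multiply.
compute_multiplier = (40, 100)    # raw training compute vs. today's largest models
efficiency_multiplier = (3, 20)   # more efficient techniques and training methods

low = compute_multiplier[0] * efficiency_multiplier[0]
high = compute_multiplier[1] * efficiency_multiplier[1]
print(f"Effective scale-up: ~{low}x to ~{high}x")  # ~120x to ~2000x
```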
3 Methodology to assess and understand general-purpose AI systems
- General-purpose AI governance approaches assume that both AI developers and policymakers can understand and measure what general-purpose AI systems are capable of, and their potential impacts.
- Technical methods can help answer these questions but have limitations. Current approaches cannot provide strong assurances against large-scale general-purpose AI-related harms.
- Currently, developers still understand little about how their general-purpose AI models operate. Model explanation and interpretability techniques can improve researchers’ and developers’ understanding of how general-purpose AI systems operate, but this research is nascent.
- The capabilities of general-purpose AI are mainly assessed by testing the general-purpose AI on various inputs. These spot checks are helpful and necessary but do not provide quantitative guarantees. They often miss hazards, and overestimate or underestimate general-purpose AI capabilities, because test conditions differ from the real world. Many areas of concern are not fully amenable to the type of quantification that current evaluations rely on (for example, bias and misinformation). A minimal sketch of such a spot check appears after this list.
- Independent actors can, in principle, audit general-purpose AI models or systems developed by a company. However, companies do not always provide independent auditors with the necessary level of ‘white-box’ access to models or information about data and methods used, which are needed for rigorous assessment. Several governments are beginning to build capacity for conducting technical evaluations and audits.
- It is difficult to assess the downstream societal impact of a general-purpose AI system because rigorous and comprehensive assessment methodologies have not yet been developed and because general-purpose AI has a wide range of possible real-world uses. Understanding the potential downstream societal impacts of general-purpose AI models and systems requires nuanced and multidisciplinary analysis. Increasing participation and representation of perspectives in the AI development and evaluation process is an ongoing technical and institutional challenge.
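Here is a minimal sketch of what such a "spot check" looks like in practice, and why it yields a point estimate rather than a guarantee. The test cases and the `model_answer` stub are hypothetical; a real evaluation harness would use far larger, carefully constructed test sets.

```python
# Minimal capability spot check: score a model on a small sample of
# test prompts. `model_answer` is a placeholder for a real model call.
TEST_CASES = [
    ("What is 17 * 23?", "391"),
    ("Name the capital of Australia.", "Canberra"),
]

def model_answer(prompt: str) -> str:
    """Stand-in for a call to a real general-purpose AI system."""
    return "..."  # replace with an actual model call

def spot_check(cases) -> float:
    correct = sum(expected in model_answer(question) for question, expected in cases)
    return correct / len(cases)

# A high score reflects performance on *these* inputs under *these* test
# conditions only - evidence, not a quantitative guarantee.
print(f"Accuracy on sampled cases: {spot_check(TEST_CASES):.0%}")
```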
4 Risks
4.1 Malicious use risks
4.1.1 Harm to individuals through fake content
- General-purpose AI systems can be used to increase the scale and sophistication of scams and fraud, for example through general-purpose AI-enhanced ‘phishing’ attacks.
- General-purpose AI can be used to generate fake compromising content featuring individuals without their consent, posing threats to individual privacy and reputation.
4.1.2 Disinformation and manipulation of public opinion
- General-purpose AI makes it possible to generate and disseminate disinformation at an unprecedented scale and with a high degree of sophistication, which could have serious implications for political processes. However, it is debated how impactful political disinformation campaigns generally are.
- It can be difficult to detect disinformation generated by general-purpose AI because the outputs are increasingly realistic. Technical countermeasures, like watermarking content, are useful but can usually be circumvented by moderately sophisticated actors.
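To illustrate how a watermark can be statistically detectable yet easy to circumvent, here is a toy version of one family of proposed schemes: biasing generation toward a keyed pseudo-random "green list" of tokens, then testing for an excess of green tokens. This is my simplified sketch, not any specific deployed scheme; note that paraphrasing the text scrambles the token choices and erases the signal, which is the circumvention problem the bullet above describes.

```python
# Toy "green list" watermark detector. A watermarking generator would
# nudge its sampling toward green tokens; a detector checks whether a
# text contains suspiciously many of them.
import hashlib

def is_green(token: str, key: str = "secret") -> bool:
    """Deterministically assign roughly half of all tokens to the green list."""
    return hashlib.sha256((key + token).encode()).digest()[0] % 2 == 0

def green_fraction(text: str) -> float:
    tokens = text.split()
    return sum(is_green(t) for t in tokens) / max(len(tokens), 1)

# Unwatermarked text should hover near 0.5; watermarked text is pushed higher.
print(green_fraction("some candidate text to test for a watermark"))
```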
4.1.3 Cyber offence
- General-purpose AI systems could uplift the cyber expertise of individuals, making it easier for malicious users to conduct effective cyber-attacks, as well as providing a tool that can be used in cyber defence. General-purpose AI systems can be used to automate and scale some types of cyber operations, such as social engineering attacks.
- There is no substantial evidence yet suggesting that general-purpose AI can automate sophisticated cybersecurity tasks which could tip the balance between cyber attackers and defenders in favour of the attackers.
4.1.4 Dual use science risks
- General-purpose AI systems could accelerate advances in a range of scientific endeavours, from training new scientists to enabling faster research workflows. While these capabilities could have numerous beneficial applications, some experts have expressed concern that they could be used for malicious purposes, especially if further capabilities are developed before appropriate countermeasures are put in place.
- General-purpose AI systems for biological uses do not present a clear current threat, and future threats are hard to assess and rule out. In the biology domain, current general-purpose AI systems demonstrate growing capabilities, but the limited studies available do not provide clear evidence that current systems give malicious actors more ‘uplift’ in obtaining biological pathogens than internet searches already do. There is insufficient publicly available research to assess whether near-term advances will provide this uplift, for example by troubleshooting hands-on laboratory work.
- Due to insufficient scientific work, this interim report does not assess the risks of malicious use leading to chemical, radiological, and nuclear risks.
4.2 Risks from malfunctions
4.2.1 Risks from product functionality issues
- Product functionality issues occur when there is confusion or misinformation about what a general-purpose AI model or system is capable of. This can lead to unrealistic expectations and overreliance on general-purpose AI systems, potentially causing harm if a system fails to deliver on expected capabilities.
- These functionality misconceptions may arise from technical difficulties in assessing an AI model's true capabilities on its own, or predicting its performance when part of a larger system. Misleading claims in advertising and communications can also contribute to these misconceptions.
4.2.2 Risks from bias and underrepresentation
- The outputs and impacts of general-purpose AI systems can be biased with respect to various aspects of human identity, including race, gender, culture, age, and disability. This creates risks in high-stakes domains such as healthcare, job recruitment, and financial lending.
- General-purpose AI systems are primarily trained on language and image datasets that disproportionately represent English-speaking and Western cultures, increasing the potential for harm to individuals not represented well by this data.
4.2.3 Loss of control
- Ongoing AI research is seeking to develop more capable ‘general-purpose AI agents’, that is, general-purpose AI systems that can autonomously interact with the world, plan ahead, and pursue goals.
- 'Loss of control’ scenarios are potential future scenarios in which society can no longer meaningfully constrain some advanced general-purpose AI agents, even if it becomes clear they are causing harm. These scenarios are hypothesised to arise through a combination of social and technical factors, such as pressures to delegate decisions to general-purpose AI systems, and limitations of existing techniques used to influence the behaviours of general-purpose AI systems.
- There is broad agreement among AI experts that currently known general-purpose AI systems pose no significant loss of control risk, due to their limited capabilities.
- Some experts believe that loss of control scenarios are implausible, while others believe they are likely, and some consider them as low-likelihood risks that deserve consideration due to their high severity.
- This expert disagreement is difficult to resolve, since there is not yet an agreed-upon methodology for assessing the likelihood of loss of control, or when the relevant AI capabilities might be developed.
- If loss of control does turn out to pose a large risk, mitigating it could require fundamental progress on certain technical problems in AI safety. It is unclear whether this progress would require many years of preparatory work.
4.3 Systemic risks
4.3.1 Labour market risks
- Unlike previous waves of automation, general-purpose AI has the potential to automate a very broad range of tasks, which could have a significant effect on the labour market.
- This means that many people could lose their current jobs. However, many economists expect that potential job losses from automation could be offset, partly or completely, by the creation of new jobs and by increased demand in non-automated sectors.
- Labour market frictions, such as the time needed for workers to learn new skills or relocate for new jobs, could cause unemployment in the short run even if overall labour demand remained unchanged.
- The expected impact of general-purpose AI on wages is ambiguous. It is likely to simultaneously increase wages in some sectors by augmenting productivity and creating new opportunities, and decrease wages in other sectors where automation reduces labour demand faster than new tasks are created.
4.3.2 Global AI divide
- General-purpose AI research and development is currently concentrated in a few Western countries and China. This ‘AI Divide’ is multicausal, but in part related to limited access to computing power in low-income countries.
- Access to large and expensive quantities of computing power has become a prerequisite for developing advanced general-purpose AI. This has led to a growing dominance of large technology companies in general-purpose AI development.
- The AI R&D divide often overlaps with existing global socioeconomic disparities, potentially exacerbating them.
4.3.3 Market concentration risks and single points of failure
- Developing state-of-the-art, general-purpose AI models requires substantial up-front investment. These very high costs create barriers to entry, disproportionately benefiting large technology companies.
- Market power is concentrated among a few companies that are the only ones able to build the leading general-purpose AI models.
- Widespread adoption of a few general-purpose AI models and systems by critical sectors including finance, cybersecurity, and defence creates systemic risk because any flaws, vulnerabilities, bugs, or inherent biases in the dominant general-purpose AI models and systems could cause simultaneous failures and disruptions on a broad scale across these interdependent sectors.
4.3.4 Risks to the environment
- Growing compute use in general-purpose AI development and deployment has rapidly increased energy usage associated with general-purpose AI.
- This trend might continue, potentially leading to sharply increasing CO2 emissions.
4.3.5 Risks to privacy
- General-purpose AI models or systems can ‘leak’ information about individuals whose data was used in training. For future models trained on sensitive personal data such as health or financial records, this may lead to particularly serious privacy leaks (a toy leak probe is sketched after this list).
- General-purpose AI models could enhance privacy abuse. For instance, Large Language Models might facilitate more efficient and effective search for sensitive data (for example, on internet text or in breached data leaks), and also enable users to infer sensitive information about individuals.
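A toy probe for the kind of "leak" described above: prompt the model with the prefix of a (hypothetical) training record and check whether it completes the sensitive remainder verbatim. Real extraction attacks are considerably more sophisticated; the record and the `model_complete` stub are invented for illustration.

```python
# Toy memorisation probe. `model_complete` stands in for a real model call;
# the record is a fabricated example, not real data.
record = "Jane Doe, DOB 1984-02-29, member ID 55321"  # hypothetical training record
prefix, secret_suffix = record[:14], record[14:]

def model_complete(prompt: str) -> str:
    """Stand-in for a call to a real general-purpose AI system."""
    return "..."  # replace with an actual model call

completion = model_complete(prefix)
if secret_suffix.strip() in completion:
    print("Model reproduced memorised training data - a privacy leak.")
else:
    print("No verbatim leak on this probe (which is not proof of safety).")
```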
4.3.6 Copyright infringement
- The use of large amounts of copyrighted data for training general-purpose AI models poses a challenge to traditional intellectual property laws, and to systems of consent, compensation, and control over data.
- The use of copyrighted data at scale by organisations developing general-purpose AI is likely to alter incentives around creative expression.
- An unclear copyright regime disincentivises general-purpose AI developers from following best practices for data transparency.
- There is very limited infrastructure for sourcing and filtering legally and ethically permissible data from the internet for training general-purpose AI models.
4.4 Cross-cutting risk factors
4.4.1 Cross-cutting technical risk factors
This section covers seven cross-cutting technical risk factors – technical factors that each contribute to many general-purpose AI risks.
- General-purpose AI systems can be applied in many ways and contexts, making it hard to test and assure their trustworthiness across all realistic use-cases.
- General-purpose AI developers have a highly limited understanding of how general-purpose AI models and systems function internally to achieve the capabilities they exhibit.
- General-purpose AI systems can act in accordance with unintended goals, leading to potentially harmful outputs, despite testing and mitigation efforts by AI developers.
- A general-purpose AI system can be rapidly deployed to very large numbers of users, so if a faulty system is deployed at scale, resulting harm could be rapid and global.
- Currently, risk assessment and evaluation methods for general-purpose AI systems are immature and can require significant effort, time, resources, and expertise.
- Despite efforts to debug and diagnose failures, developers are unable to prevent overtly harmful behaviours across all circumstances in which general-purpose AI systems are used.
- Some developers are working to create general-purpose AI systems that can act with increasing autonomy, which could increase the risks by enabling more widespread applications of general-purpose AI systems with less human oversight.
4.4.2 Cross-cutting societal risk factors
This section covers four cross-cutting societal risk factors – non-technical aspects of general-purpose AI development and deployment that each contribute to many risks from general-purpose AI:
- AI developers competing for market share may have limited incentives to invest in mitigating risks.
- As general-purpose AI advances rapidly, regulatory or enforcement efforts can struggle to keep pace.
- Lack of transparency makes liability harder to determine, potentially hindering governance and enforcement.
- It is very difficult to track how general-purpose AI models and systems are trained, deployed and used.
5 Technical approaches to mitigate risks
5.1 Risk management and safety engineering
- Developing and incentivising systematic risk management practices for general-purpose AI is difficult. This is because current general-purpose AI is progressing rapidly, is not well-understood, and has a wide range of applications. Methodologies for assessing general-purpose AI risk are too nascent for good quantitative analysis of risk to be available.
- While many other fields offer lessons for how such approaches could be developed, there are currently no well-established risk management and safety engineering practices for general-purpose AI systems.
- Since no single existing method can provide guarantees of safety, a practical strategy is defence in depth – layering multiple risk mitigation measures (sketched in code after this list). This is a common way to manage technological risks.
- An important consideration for effective risk management of general-purpose AI is who to involve in the process in order to identify and assess high-priority risks. This can include experts from multiple domains but also representatives of impacted communities.
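A minimal sketch of defence in depth as it might look in code, assuming a simple request/response pipeline: several independently imperfect checks are layered so that a harmful output must slip past all of them. Every check below is an illustrative placeholder, not a recommended filter.

```python
# Defence in depth: layered, individually imperfect mitigation measures.
# All checks below are illustrative placeholders.
def input_filter(prompt: str) -> bool:
    return "how to build a weapon" not in prompt.lower()

def output_filter(text: str) -> bool:
    banned_phrases = ["step 1: acquire"]
    return not any(phrase in text.lower() for phrase in banned_phrases)

def needs_human_review(text: str) -> bool:
    return len(text) > 2000  # e.g., escalate unusually long outputs

def respond(prompt: str, generate) -> str:
    if not input_filter(prompt):
        return "[refused at input layer]"
    text = generate(prompt)
    if not output_filter(text):
        return "[blocked at output layer]"
    if needs_human_review(text):
        return "[held for human review]"
    return text

print(respond("How do I bake bread?", lambda p: "Mix flour, water, and yeast..."))
```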
5.2 Training more trustworthy models
- There is progress in training general-purpose AI systems to function more safely, but there is currently no approach that can ensure that general-purpose AI systems will be harmless in all circumstances.
- Companies have proposed strategies to train general-purpose AI systems to be more helpful and harmless; however, the viability and reliability of these approaches for such advanced systems remains limited.
- Current techniques for aligning the behaviour of general-purpose AI systems with developer intentions rely heavily on human-generated data, such as human feedback, which makes them subject to human error and bias. Increasing the quantity and quality of this feedback is an avenue for improvement (a toy preference-loss sketch follows this list).
- Developers train models to be more robust to inputs that are designed to make them fail (‘adversarial training’). Despite this, adversaries can typically find alternative inputs that reduce the effectiveness of safeguards with low to moderate effort.
- Limiting a general-purpose AI system’s capabilities to a specific use case can help to reduce risks from unforeseen failures or malicious use.
- Researchers are beginning to learn to analyse the inner workings of general-purpose AI models. Progress in this area could help developers understand and edit general-purpose AI model functionality more reliably.
- Researchers are exploring how to obtain AI systems that are safe by design or provably safe, although many open problems remain to scale these methods to general-purpose AI systems.
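To make the reliance on human feedback concrete, here is a toy version of the preference-learning objective at the core of many current alignment techniques: a reward model is trained so that responses humans preferred score higher (a Bradley-Terry style loss). The numbers are made up; real systems fit large neural reward models on large preference datasets, and any error or bias in the human labels propagates through this loss.

```python
# Toy preference loss: -log sigmoid(r_chosen - r_rejected).
# Low loss when the reward model agrees with the human preference label.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # ~0.05: model agrees with the human label
print(preference_loss(-1.0, 2.0))  # ~3.05: model disagrees, large penalty
```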
5.3 Monitoring and intervention
- There are several techniques for identifying general-purpose AI system risks, inspecting general-purpose AI model actions, and evaluating performance once a general-purpose AI model has been deployed. These practices are often referred to as ‘monitoring’. Meanwhile, ‘interventions’ refers to techniques that prevent harmful actions from general-purpose AI models.
- Techniques which are being developed to explain general-purpose AI actions could be used to detect and then intervene to block a risky action. However, the application of these techniques to general-purpose AI systems is still nascent.
- Techniques for detecting and watermarking general-purpose AI-generated content can help to avoid some harmful uses of generative general-purpose AI systems by unsophisticated users. However, these techniques are imperfect and can be circumvented by moderately skilled users.
- Techniques for identifying unusual behaviours from general-purpose AI systems can enable improved oversight and interventions (a toy anomaly-flagging sketch follows this list).
- Humans in the loop, and other checks before and during the deployment of general-purpose AI systems, increase oversight and provide multiple layers of defence against failures. However, such measures can slow down general-purpose AI system outputs, may compromise privacy, and could conflict with the economic incentives of companies that use general-purpose AI systems.
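A toy sketch of what flagging "unusual behaviour" can look like: score each output in some way, then escalate outputs that are statistical outliers relative to a baseline. The scores, the z-score rule, and the threshold are all illustrative assumptions; real monitoring uses much richer signals.

```python
# Toy anomaly flagging: escalate outputs whose monitoring score is a
# statistical outlier relative to a baseline of typical behaviour.
import statistics

baseline_scores = [0.11, 0.09, 0.13, 0.10, 0.12, 0.08, 0.11]  # typical outputs

def is_anomalous(score: float, baseline: list, z_threshold: float = 3.0) -> bool:
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(score - mean) / stdev > z_threshold

print(is_anomalous(0.55, baseline_scores))  # True -> escalate for intervention
print(is_anomalous(0.12, baseline_scores))  # False -> within normal range
```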
5.4 Technical approaches to fairness and representation in general-purpose AI systems
- General-purpose AI models can capture and, at times, amplify biases in their training data. This contributes to unequal resource allocation, inadequate representation, and discriminatory decisions.
- Fairness lacks a universally agreed-upon definition with variations across cultural, social, and disciplinary contexts.
- From a technical perspective, the cause of bias is often the data, which may fail to adequately represent minorities within a target population. Bias can also stem from poor system design or from the type of general-purpose AI technique used; addressing these design choices benefits from involving diverse perspectives throughout the general-purpose AI lifecycle.
- Mitigation of bias should be addressed throughout the lifecycle of the general-purpose AI system, including design, training, deployment, and usage.
- It is very challenging to entirely prevent bias in current general-purpose AI systems. Doing so requires systematic training data collection, ongoing evaluation, and effective identification of bias; it also involves trading off fairness against other objectives such as accuracy, and deciding what counts as useful knowledge versus an undesirable bias that should not be reflected in the outputs (a toy fairness-metric sketch follows this list).
- There are differing views about how feasible it is to achieve meaningful fairness in general-purpose AI systems. Some argue that it is impossible for a general-purpose AI system to be completely ‘fair’, while others think that, from a practical perspective, near-complete fairness is achievable.
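As one concrete (and contested) example of measuring bias, the sketch below computes a demographic parity gap: the difference in positive-outcome rates between two groups. The data is invented, and demographic parity is only one of many fairness definitions, which is exactly the definitional disagreement noted above.

```python
# Toy fairness audit: demographic parity gap on made-up lending decisions.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

def approval_rate(group: str) -> float:
    rows = [d for d in decisions if d["group"] == group]
    return sum(d["approved"] for d in rows) / len(rows)

gap = abs(approval_rate("A") - approval_rate("B"))
print(f"Demographic parity gap: {gap:.2f}")  # 0.00 would be parity on this metric
```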
5.5 Privacy methods for general-purpose AI systems
- General-purpose AI systems present a number of risks to people’s privacy, such as loss of data confidentiality, loss of transparency and control over how their data is used, and new forms of privacy abuse.
- Privacy protection is an active area of research and development. However, existing technical tools struggle to scale to large general-purpose AI models, and can fail to provide users with meaningful control.
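One example of the technical tools mentioned above is differential privacy. The toy sketch below releases a count with calibrated Laplace noise; the parameters are illustrative. Scaling this idea to training large models (e.g., via noisy gradient updates, as in DP-SGD) is where the difficulties the report notes arise.

```python
# Toy Laplace mechanism: release a count with noise scaled to
# sensitivity / epsilon. Smaller epsilon -> more noise -> more privacy.
import random

def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    scale = sensitivity / epsilon
    # The difference of two exponentials is a Laplace(0, scale) sample.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

print(dp_count(128))  # e.g., 127.3 - close to, but not exactly, the true count
```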