News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI
post by Jonathan Claybrough (lelapin) · 2023-07-21T18:00:57.016Z · LW · GW · 10 comments
Here are the extracts naming the companies and the specific commitments:
As part of this commitment, President Biden is convening seven leading AI companies at the White House today – Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI – to announce that the Biden-Harris Administration has secured voluntary commitments from these companies to help move toward safe, secure, and transparent development of AI technology.
Today, these seven leading AI companies are committing to:
Ensuring Products are Safe Before Introducing Them to the Public
- The companies commit to internal and external security testing of their AI systems before their release. This testing, which will be carried out in part by independent experts, guards against some of the most significant sources of AI risks, such as biosecurity and cybersecurity, as well as its broader societal effects.
- The companies commit to sharing information across the industry and with governments, civil society, and academia on managing AI risks. This includes best practices for safety, information on attempts to circumvent safeguards, and technical collaboration.
Building Systems that Put Security First
- The companies commit to investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights. These model weights are the most essential part of an AI system, and the companies agree that it is vital that the model weights be released only when intended and when security risks are considered.
- The companies commit to facilitating third-party discovery and reporting of vulnerabilities in their AI systems. Some issues may persist even after an AI system is released and a robust reporting mechanism enables them to be found and fixed quickly.
Earning the Public’s Trust
- The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.
- The companies commit to publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use. This report will cover both security risks and societal risks, such as the effects on fairness and bias.
- The companies commit to prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy. The track record of AI shows the insidiousness and prevalence of these dangers, and the companies commit to rolling out AI that mitigates them.
- The companies commit to develop and deploy advanced AI systems to help address society’s greatest challenges. From cancer prevention to mitigating climate change to so much in between, AI—if properly managed—can contribute enormously to the prosperity, equality, and security of all.
As we advance this agenda at home, the Administration will work with allies and partners to establish a strong international framework to govern the development and use of AI. It has already consulted on the voluntary commitments with Australia, Brazil, Canada, Chile, France, Germany, India, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, New Zealand, Nigeria, the Philippines, Singapore, South Korea, the UAE, and the UK. The United States seeks to ensure that these commitments support and complement Japan’s leadership of the G-7 Hiroshima Process—as a critical forum for developing shared principles for the governance of AI—as well as the United Kingdom’s leadership in hosting a Summit on AI Safety, and India’s leadership as Chair of the Global Partnership on AI. We also are discussing AI with the UN and Member States in various UN fora.
Comments sorted by top scores.
comment by habryka (habryka4) · 2023-07-22T00:22:51.398Z · LW(p) · GW(p)
This overall seems like good news, with the exception of commitment 8) in the associated fact sheet:
8) Develop and deploy frontier AI systems to help address society’s greatest challenges
Companies making this commitment agree to support research and development of frontier AI systems that can help meet society’s greatest challenges, such as climate change mitigation and adaptation, early cancer detection and prevention, and combating cyber threats. Companies also commit to supporting initiatives that foster the education and training of students and workers to prosper from the benefits of AI, and to helping citizens understand the nature, capabilities, limitations, and impact of the technology.
This reads as a proactive commitment to build frontier models, which to me is the most relevant dimension (I don't think the safety commitments will make a huge difference in whether systems of a certain capability level will actually kill everyone), and it actively commits these companies to keep developing in this space.
My guess is it doesn't make a huge difference, since the other commitments are fuzzy enough that any company that wanted to slow down could do so by saying it needed to in order to meet the other seven commitments. Still, the thing I most want to see is public commitments to slow down, and this feels like a missed opportunity to get them.
Replies from: Zach Stein-Perlman
↑ comment by Zach Stein-Perlman · 2023-07-22T02:42:21.851Z · LW(p) · GW(p)
I largely agree. But note that
frontier AI systems that can help meet society’s greatest challenges, such as climate change mitigation and adaptation, early cancer detection and prevention, and combating cyber threats
doesn't necessarily mean scary general agents. E.g. "early cancer detection and prevention" is clearly non-scary, and in DeepMind's recent blogpost on AI for climate change mitigation the examples are weather forecasting, animal behavior forecasting, and energy efficiency.
comment by TurnTrout · 2023-07-21T19:57:44.584Z · LW(p) · GW(p)
From OpenAI's post:
In designing the regime, [the companies] will ensure that they give significant attention to the following:
- ...
- The capacity for models to make copies of themselves or “self-replicate”
This seems like a big deal, in particular because it seems to foreshadow regulation:
Companies intend these voluntary commitments to remain in effect until regulations covering substantially the same issues come into force.
comment by jbash · 2023-07-22T00:53:27.383Z · LW(p) · GW(p)
I'm sorry, but I don't see anything in there that meaningfully reduces my chances of being paperclipped. Not even if they were followed universally.
I don't even see much that really reduces the chances of people (smart enough to act on them) getting bomb-making instructions almost as good as the ones freely available today, or of systems producing words or pictures that might hurt people emotionally (unless they get paperclipped first).
I do notice a lot of things that sound convenient for the commercial interests and business models of the people who were there to negotiate the list. And I notice that the list is pretty much a license to blast ahead on increasing capability, without any restrictions on how you get there. Including a provision that basically cheerleads for building anything at all that might be good for something.
There's really only one concrete action in there involving the models themselves. The White House calls it "testing", but OpenAI mutates it into "red-teaming", which narrows it quite a bit. Not that anybody has any idea how to test any of this using any approach. And testing is NOT how you create secure, correct, or not-everyone-killing software. The stuff under the heading of "Building Systems that Put Security First"... isn't. It's about building an arbitrarily dangerous system and trying to put walls around it.
Replies from: None
↑ comment by [deleted] · 2023-07-22T02:33:05.290Z · LW(p) · GW(p)
Just to organize this:
Summary: every point hugely advantages a small concentration of wealthy AI companies, and once these become legal requirements they will be entrenched indefinitely. And it in no way slows capabilities; in fact it seems to implicitly give permission to push them as far as possible.
- The companies commit to internal and external security testing of their AI systems before their release. This testing, which will be carried out in part by independent experts, guards against some of the most significant sources of AI risks, such as biosecurity and cybersecurity, as well as its broader societal effects.
Effect on rich companies: they need to do extensive testing anyway to deliver competitive products. Capabilities go hand in hand with reliability; every capable tool humanity uses is highly reliable.
Effect on poor companies: the testing burden prevents them from gaining early revenue on a shoddy product, which keeps them from competing at all.
Effect on advancing capabilities: minimal
- The companies commit to sharing information across the industry and with governments, civil society, and academia on managing AI risks. This includes best practices for safety, information on attempts to circumvent safeguards, and technical collaboration.
Effect on rich companies: they need to pay for another group of staff/internal automation tools to deliver these information-sharing reports, carefully scripted to look good and not reveal more than the legal minimum.
Effect on poor companies: the reporting burden reduces their runway further, preventing all but a few extremely well-funded startups from existing at all.
Effect on advancing capabilities: minimal
- The companies commit to investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights. These model weights are the most essential part of an AI system, and the companies agree that it is vital that the model weights be released only when intended and when security risks are considered.
Effect on rich companies: they already want to do this; it's how they protect their IP.
Effect on poor companies: the security burden reduces their runway further.
Effect on advancing capabilities: minimal
- The companies commit to facilitating third-party discovery and reporting of vulnerabilities in their AI systems. Some issues may persist even after an AI system is released and a robust reporting mechanism enables them to be found and fixed quickly.
This is the same as the reporting case above.
- The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.
Effect on rich companies: they already want to avoid legal responsibility for the use of AI in deception; stripping the watermark puts the liability on the scammer.
Effect on poor companies: watermarks slightly reduce their runway.
Effect on advancing capabilities: minimal
- The companies commit to publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use. This report will cover both security risks and societal risks, such as the effects on fairness and bias.
This is another form of reporting; same effects as above.
- The companies commit to prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy. The track record of AI shows the insidiousness and prevalence of these dangers, and the companies commit to rolling out AI that mitigates them.
Effect on rich companies: now each of them needs another internal group doing this research.
Effect on poor companies: having to pay for another required internal group reduces their runway.
Effect on advancing capabilities: minimal
- The companies commit to develop and deploy advanced AI systems to help address society’s greatest challenges. From cancer prevention to mitigating climate change to so much in between, AI—if properly managed—can contribute enormously to the prosperity, equality, and security of all.
Effect on rich companies: this is carte blanche to do what they were already planning to do. It also says 'fuck AI pauses', direct from the Biden administration. GPT-4 is not nearly capable enough to solve any of "society's greatest challenges": it's missing modalities and the general ability to accomplish anything but the simplest tasks reliably. Adding those components will take far more compute, such as the multi-exaflop AI supercomputers everyone is building, which will obviously allow models that dwarf GPT-4.
Effect on poor companies: well, they aren't competing with megamodels, but they were already screwed by the other points.
Effect on advancing capabilities:
comment by Noosphere89 (sharmake-farah) · 2023-07-22T18:34:36.907Z · LW(p) · GW(p)
My quick take is that this is not that great, and I'll explain why below.
First, the scope is quite bad, and I want to explain why:
Scope: Where commitments mention particular models, they apply only to generative models that are overall more powerful than the current industry frontier (e.g. models that are overall more powerful than any currently released models, including GPT-4, Claude 2, PaLM 2, Titan and, in the case of image generation, DALL-E 2).
I consider this a bad scope because it burdens much safer models with regulation while leaving RL, especially deep RL, mostly unregulated, and I worry this will set a dangerous precedent. The main reason is something important I learned from reading porby's posts: it is actually good that, by and large, generative models/simulators/predictors have come first, or at least it's way easier than RL from a non-misuse extinction-risk perspective, since they have many nice safety features, like a lack of instrumental goals (the training signal being densely informative), and they are in general much easier alignment targets.
I really hope this gets fixed soon, because I fear it will just place adversarial pressure on safe AI by default, especially combined with commitment 8 below.
- Develop and deploy frontier AI systems to help address society’s greatest challenges
Companies making this commitment agree to support research and development of frontier AI systems that can help meet society’s greatest challenges, such as climate change mitigation and adaptation, early cancer detection and prevention, and combating cyber threats. Companies also commit to supporting initiatives that foster the education and training of students and workers to prosper from the benefits of AI, and to helping citizens understand the nature, capabilities, limitations, and impact of the technology.
This seems, at least in part, like throwing a bone to the e/acc faction inside these companies: it lets them justify arbitrary capabilities research as meeting this commitment. It also suggests that any eventual regulation will probably not ban AI progress, or perhaps even slow AI down all that much, because of this passage:
Companies intend these voluntary commitments to remain in effect until regulations covering substantially the same issues come into force.
Condition 5 is impossible, and how they deal with that impossibility will plausibly determine how AI goes.
Here's the condition:
Trust 5) Develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated, including robust provenance, watermarking, or both, for AI-generated audio or visual content
Companies making this commitment recognize that it is important for people to be able to understand when audio or visual content is AI-generated. To further this goal, they agree to develop robust mechanisms, including provenance and/or watermarking systems for audio or visual content created by any of their publicly available systems within scope introduced after the watermarking system is developed. They will also develop tools or APIs to determine if a particular piece of content was created with their system. Audiovisual content that is readily distinguishable from reality or that is designed to be readily recognizable as generated by a company’s AI system—such as the default voices of AI assistants—is outside the scope of this commitment. The watermark or provenance data should include an identifier of the service or model that created the content, but it need not include any identifying user information. More generally, companies making this commitment pledge to work with industry peers and standards-setting bodies as appropriate towards developing a technical framework to help users distinguish audio or visual content generated by users from audio or visual content generated by AI.
The reason I believe this is the following study, which importantly gives various impossibility results:
https://arxiv.org/pdf/2303.11156.pdf
So the AI companies may have committed themselves to an impossible task. The question is, what's going to happen next?
OpenAI post below:
https://openai.com/blog/moving-ai-governance-forward
Replies from: DanArmak
↑ comment by DanArmak · 2023-07-23T13:21:18.751Z · LW(p) · GW(p)
About the impossibility result, if I understand correctly, that paper says two things (I'm simplifying and eliding a great deal):
- You can take a recognizable, possibly watermarked output of one LLM, use a different LLM to paraphrase it, and not be able to detect the second LLM's output as coming from (transforming) the first LLM. (A toy sketch of this follows below.)
- In the limit, any classifier that tries to detect LLM output can be beaten by an LLM that is sufficiently good at generating human-like output. There's evidence that LLMs can soon become that good. And since emulating human output is an LLM's main job, capabilities researchers and model developers will make them that good.
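To make the first point concrete, here is a toy sketch (my own illustration under simplified assumptions, not the paper's construction and not any company's actual scheme): a crude "green list" style watermark, where the previous word pseudo-randomly marks half the vocabulary as "green" and the watermarking sampler only emits green words, plus a stand-in "paraphrase" that substitutes words and so re-rolls those assignments, pushing the detector's statistic back to chance.

```python
# Toy illustration only: a simplified "green list" watermark and a crude
# word-substitution stand-in for paraphrasing. Nothing here is a real
# production scheme; the vocabulary, scoring, and paraphraser are made up.
import hashlib
import random

WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
         "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
         "oscar", "papa", "quebec", "romeo", "sierra", "tango"]

def green_set(prev_word: str) -> set:
    # The previous word pseudo-randomly selects half the vocabulary as "green".
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(WORDS, len(WORDS) // 2))

def generate_watermarked(length: int = 200) -> list:
    # Stand-in for a watermarking sampler: always emit a green word.
    rng = random.Random(0)
    out = [rng.choice(WORDS)]
    for _ in range(length - 1):
        out.append(rng.choice(sorted(green_set(out[-1]))))
    return out

def green_fraction(words: list) -> float:
    # Detector statistic: how often each word is green given its predecessor.
    hits = sum(cur in green_set(prev) for prev, cur in zip(words, words[1:]))
    return hits / (len(words) - 1)

def paraphrase(words: list) -> list:
    # Stand-in for a second model rewriting the text: word substitutions
    # re-roll the green/red assignments, erasing the statistical signal.
    rng = random.Random(1)
    return [rng.choice(WORDS) for _ in words]

watermarked = generate_watermarked()
print(f"watermarked score: {green_fraction(watermarked):.2f}")              # ~1.0 -> flagged
print(f"paraphrased score: {green_fraction(paraphrase(watermarked)):.2f}")  # ~0.5 -> chance
```

A real detector uses a proper statistical test over tokens rather than this toy score, but the failure mode is the same: the signal lives in token-level choices that a rewrite does not preserve.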
The second point is true but not directly relevant: OpenAI et al are committing not to make models whose output is indistinguishable from humans.
The first point is true, BUT the companies have not committed themselves to defeating it. Their own models' output is clearly watermarked, and they will provide reliable tools to identify those watermarks. If someone else then provides a model that is good enough at paraphrasing to remove that watermark, that is that someone else's fault, and they are effectively not abiding by this industry agreement.
If open source / widely available non-API-gated models become good enough at this to render the watermarks useless, then the commitment scheme will have failed. This is not surprising; if ungated models become good enough at anything contravening this scheme, it will have failed.
There are tacit but very necessary assumptions in this approach and it will fail if any of them break:
- The ungated models released so far (e.g. Llama) don't contain forbidden capabilities, including output and/or paraphrasing that's indistinguishable from human, but also of course notkillingeveryone, and they won't be improved to include them by 'open source' tinkering that doesn't come from large industry players.
- No one worldwide will release new, more capable models, or sell ungated access to them, in defiance of this industry agreement; and if they do, it will be enforced (somehow).
- The inevitable use by some governments, militaries, etc. of more capable models, which would be illegal to release publicly, will not result in the public release of such capabilities; and their inevitable use of e.g. indistinguishable-from-human output will not cause such (public) problems that this commitment not to let private actors do it becomes meaningless.
↑ comment by Thomas Kwa (thomas-kwa) · 2024-07-23T23:25:06.373Z · LW(p) · GW(p)
A more recent paper shows that an equally strong model is not needed to break watermarks through paraphrasing. It suffices to have a quality oracle and a model that achieves equal quality with positive probability.
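If I'm reading that claim right, the attack has roughly this shape; a hedged sketch follows, with the perturbation model, quality oracle, and watermark detector as hypothetical stand-ins (the names are mine, not the paper's):

```python
# Hedged sketch of the attack shape described above, not the paper's exact
# algorithm. The three callables are assumed attacker resources; only the
# control flow is the point.
import random
from typing import Callable, List, Optional

def erase_watermark(
    text: str,
    perturb: Callable[[str], List[str]],   # weaker model proposing local rewrites
    quality_ok: Callable[[str], bool],     # quality oracle: "still as good as the original?"
    detects: Callable[[str], bool],        # the watermark detector under attack
    max_steps: int = 10_000,
    rng: Optional[random.Random] = None,
) -> str:
    """Random-walk over quality-preserving rewrites until the detector stops firing.

    The rewriting model only needs to hit an equal-quality rewrite with positive
    probability per step; rejected proposals are discarded, so the walk stays on
    acceptable texts while the watermark signal decays.
    """
    rng = rng or random.Random(0)
    current = text
    for _ in range(max_steps):
        if not detects(current):
            return current            # watermark gone, quality (per the oracle) preserved
        candidates = perturb(current)
        if candidates:
            proposal = rng.choice(candidates)
            if quality_ok(proposal):  # the oracle filters out degraded rewrites
                current = proposal
    return current                    # gave up; the detector may still fire
```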
comment by Jonas Hallgren · 2023-07-21T21:27:47.027Z · LW(p) · GW(p)
Uhm, fuck yeah?
If someone had told me this two years ago I wouldn't have believed it; kinda feels like we're doing a Dr. Strange on the timeline right now.