Ideas for AI labs: Reading list
post by Zach Stein-Perlman · 2023-04-24
Related: AI policy ideas: Reading list.
This document is about ideas for AI labs. It's mostly from an x-risk perspective. Its underlying organization black-boxes technical AI stuff, including technical AI safety.
Lists & discussion
- Towards best practices in AGI safety and governance: A survey of expert opinion (GovAI, Schuett et al. 2023) (LW)
- This excellent paper is the best collection of ideas for labs. See pp. 18–22 for 100 ideas.
- Frontier AI Regulation: Managing Emerging Risks to Public Safety (Anderljung et al. 2023)
- Mostly about government regulation, but its recommendations on safety standards translate into recommendations for labs
- Model evaluation for extreme risks (DeepMind, Shevlane et al. 2023)
- What AI companies can do today to help with the most important century (Karnofsky 2023) (LW)
- Karnofsky nearcasting: How might we align transformative AI if it’s developed very soon?, Nearcast-based "deployment problem" analysis, and Racing through a minefield: the AI deployment problem (LW) (Karnofsky 2022)
- Survey on intermediate goals in AI governance (Räuker and Aird 2023)
- Corporate Governance of Artificial Intelligence in the Public Interest (Cihon, Schuett, and Baum 2021) and The case for long-term corporate governance of AI (Baum and Schuett 2021)
- Three lines of defense against risks from AI (Schuett 2022)
- The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (Brundage et al. 2018)
- Adapting cybersecurity frameworks to manage frontier AI risks: defense in depth (IAPS, Ee et al. 2023)
Levers
- AI developer levers and AI industry & academia levers in Advanced AI governance (LPP, Maas 2023)
- This report is excellent
- "Affordances" in "Framing AI strategy" (Stein-Perlman 2023)
- This list may be more desiderata-y than lever-y
Desiderata
Maybe I should make a separate post on desiderata for labs (for existential safety).
- Six Dimensions of Operational Adequacy in AGI Projects (Yudkowsky 2022)
- "Carefully Bootstrapped Alignment" is organizationally hard (Arnold 2023)
- Slowing AI: Foundations (Stein-Perlman 2023)
- [Lots of ideas implied elsewhere in this doc, like "help others act well" and "minimize diffusion of your capabilities research"]
Ideas
Coordination[1]
See generally The Role of Cooperation in Responsible AI Development (Askell et al. 2019).
- Coordinate to not train or deploy dangerous AI
- Model evaluations
- Model evaluation for extreme risks (DeepMind, Shevlane et al. 2023) (LW)
- ARC Evals
- Safety evaluations and standards for AI (Barnes 2023)
- Update on ARC's recent eval efforts (ARC 2023) (LW)
- Safety standards
- Model evaluations
Transparency
Transparency enables coordination (and some regulation).
- Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (Brundage et al. 2020)
- Followed up by Filling gaps in trustworthy development of AI (Avin et al. 2021)
- Structured transparency
- Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases (Bluemke et al. 2023)
- Beyond Privacy Trade-offs with Structured Transparency (Trask and Bluemke et al. 2020)
- Honest organizations (Christiano 2018)
- Auditing & certification
- Theories of Change for AI Auditing (Apollo 2023) and other Apollo stuff
- What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring (Shavit 2023)
- Auditing large language models: a three-layered approach (Mökander et al. 2023)
- The first two authors have other relevant-sounding work on arXiv
- AGI labs need an internal audit function (Schuett 2023)
- AI Certification: Advancing Ethical Practice by Reducing Information Asymmetries (Cihon et al. 2021)
- Private literature review (2021)
- Model evaluations
- Model evaluation for extreme risks (DeepMind, Shevlane et al. 2023) (LW)
- Safety evaluations and standards for AI (Barnes 2023)
- Update on ARC's recent eval efforts (ARC 2023) (LW)
Publication practices
Labs should minimize/delay the diffusion of their capabilities research.
- Publication decisions for large language models, and their impacts (Cottier 2022)
- Shift AI publication norms toward "don't always publish everything right away" in Survey on intermediate goals in AI governance (Räuker & Aird 2023)
- "Publication norms for AI research" (Aird unpublished)
- Publication policies and model-sharing decisions (Wasil et al. 2023)
Structured access to AI models
- Sharing Powerful AI Models (Shevlane 2022)
- Structured access for third-party research on frontier AI models (GovAI, Bucknall and Trager 2023)
- Compute Funds and Pre-trained Models (Anderljung et al. 2022)
Governance structure
- How to Design an AI Ethics Board (Schuett et al. 2023)
- Ideal governance (for companies, countries and more) (Karnofsky 2022) (LW) has relevant discussion but does not really make recommendations
Miscellanea
- Do more/better safety research; share safety research and safety-relevant knowledge
- Do safety research as a common good
- Do and share alignment and interpretability research
- Help people who are trying to be safe be safe
- Make AI risk and safety more concrete and legible
- See Larsen et al.'s Instead of technical research, more people should focus on buying time and Ways to buy time (2022)
- Pay the alignment tax (if you develop a critical model)
- Do safety research as a common good
- Improve your security (operational security, information security, and cybersecurity)
- There's a private reading list on infosec/cybersec, but it doesn't have much about what labs (or others) should actually do.
- Plan and prepare: ideally, figure out what's good, publicly commit to doing it (e.g., perhaps monitoring for deceptive alignment or supporting external model evals), do it, and demonstrate that you're doing it
- For predicting and avoiding misuse
- For alignment
- For deployment (especially of critical models)
- For coordinating with other labs
- Sharing
- Stopping
- Merging
- More
- For engaging government
- For increasing time 'near the end' and using it well
- For ending risk from misaligned AI
- For how to get from powerful AI to a great long-term future
- Much more...
- Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems (Gray 2023)
- See also this comment
- See OpenAI's bug bounty program
- Report incidents
- The Windfall Clause: Distributing the Benefits of AI for the Common Good (O'Keefe et al. 2020)
- Also sounds relevant: Safe Transformative AI via a Windfall Clause (Bova et al. 2021)
- Watermarking[2]
- Make, share, and improve a safety plan
- Make, share, and improve a plan for the long-term future
- Improve other labs' actions
- Inform, advise, advocate, facilitate, support, coordinate
- Differentially accelerate safer labs
- Improve non-lab actors' actions
- Government
- Support good policy
- See AI policy ideas: Reading list (Stein-Perlman 2023)
- Standards-setters
- Kinda the public
- Kinda the ML community
- Government
- Support miscellaneous other strategic desiderata
- E.g. prevent new leading labs from appearing
See also
- Best Practices for Deploying Language Models (Cohere, OpenAI, and AI21 Labs 2022)
- See also Lessons learned on language model safety and misuse (OpenAI 2022)
- Slowing AI (Stein-Perlman 2023)
- Survey on intermediate goals in AI governance (Räuker and Aird 2023)
Some sources are roughly sorted within sections by a combination of x-risk relevance, quality, and influentialness, but sometimes I didn't bother to try to sort them, and I haven't read all of them.
Please have a low bar for suggesting additions, substitutions, rearrangements, etc.
Current as of: 9 July 2023.
[1] At various levels of abstraction, coordination can look like:
- Avoiding a race to the bottom
- Internalizing some externalities
- Sharing some benefits and risks
- Differentially advancing more prosocial actors?
- More?
[2] Policymaking in the Pause (FLI 2023) cites A Systematic Review on Model Watermarking for Neural Networks (Boenisch 2021); I don't know if that source is good. (Note: this disclaimer does not imply that I know that the other sources in this doc are good!)
I am not excited about watermarking. (Note: this disclaimer does not imply that I am excited about the other ideas in this doc! But I am excited about most of them.)