DeepMind: Model evaluation for extreme risks 2023-05-25T03:00:00.915Z
GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion 2023-05-15T01:42:41.012Z
Stopping dangerous AI: Ideal US behavior 2023-05-09T21:00:55.187Z
Stopping dangerous AI: Ideal lab behavior 2023-05-09T21:00:19.505Z
Slowing AI: Crunch time 2023-05-03T15:00:12.495Z
Ideas for AI labs: Reading list 2023-04-24T19:00:00.832Z
Slowing AI: Interventions 2023-04-18T14:30:35.746Z
AI policy ideas: Reading list 2023-04-17T19:00:00.604Z
Slowing AI: Foundations 2023-04-17T14:30:09.427Z
Slowing AI: Reading list 2023-04-17T14:30:02.467Z
FLI report: Policymaking in the Pause 2023-04-15T17:01:06.727Z
FLI open letter: Pause giant AI experiments 2023-03-29T04:04:23.333Z
Operationalizing timelines 2023-03-10T16:30:01.654Z
Taboo "compute overhang" 2023-03-01T19:15:02.515Z
The public supports regulating AI for safety 2023-02-17T04:10:03.307Z
Framing AI strategy 2023-02-07T19:20:04.535Z
AI safety milestones? 2023-01-23T21:00:24.441Z
Sealed predictions thread 2022-05-07T18:00:04.705Z
Rationalism for New EAs 2021-10-18T16:00:18.692Z
Great Power Conflict 2021-09-17T15:00:17.039Z
Zach Stein-Perlman's Shortform 2021-08-29T18:00:56.148Z
The Governance Problem and the "Pretty Good" X-Risk 2021-08-29T18:00:28.190Z


Comment by Zach Stein-Perlman on The AGI Race Between the US and China Doesn’t Exist. · 2023-06-03T01:50:03.174Z · LW · GW

See also the new The Illusion of China’s AI Prowess: Regulating AI Will Not Set America Back in the Technology Race from Helen Toner, Jenny Xiao, and Jeffrey Ding.

Comment by Zach Stein-Perlman on DeepMind: Model evaluation for extreme risks · 2023-05-27T01:12:59.206Z · LW · GW

To briefly mention one way your skepticism proves too much (or has hidden assumptions?): clearly sufficiently strong capability evals, run during training runs, enforced by governments monitoring training runs, would ~suffice to prevent dangerous training runs.

Comment by Zach Stein-Perlman on GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · 2023-05-25T16:25:10.260Z · LW · GW

Assuming you mean the second 42 ("AGI labs take measures to limit potential harms that could arise from AI systems being sentient or deserving moral patienthood")-- I also don't know what labs should do, so I asked an expert yesterday and will reply here if they know of good proposals...

Comment by Zach Stein-Perlman on GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · 2023-05-25T16:17:55.932Z · LW · GW

I think 19 ideas got >90% agreement.

I agree the top ideas overlap. I think reasons one might support some over others depend on the details. 

Comment by Zach Stein-Perlman on Taboo "compute overhang" · 2023-05-24T06:12:38.602Z · LW · GW

Briefly: with arbitrarily good methods, we could train human-level AI with very little hardware. Assertions about hardware are only relevant in the context of the relevant level of algorithmic progress.

Or: nothing depends on whether sufficient hardware for human-level AI already exists given arbitrarily good methods.

(Also note that what's relevant for forecasting or decisionmaking is facts about how much hardware is being used and how much a lab could use if it wanted, not the global supply of hardware.)

Comment by Zach Stein-Perlman on Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI · 2023-05-22T19:06:22.477Z · LW · GW

I am very glad you did this because in worlds where survey results look like this, I think it's good and important to make that easily legible to AI safety community outsiders. [Edit: and good and important to set a good example for other labs.]

Comment by Zach Stein-Perlman on Weight by Impact · 2023-05-21T17:38:20.880Z · LW · GW

Good post.

Another nuance: not all sets of possible futures count as a "world." E.g. you can't gerrymander some 1%-success-worlds together with some 99%-success-worlds to make a high-tractability 50%-success-world.

Some other discussion on leverage by me here.

Comment by Zach Stein-Perlman on [Interview w/ Zvi Mowshowitz] Should we halt progress in AI? · 2023-05-16T18:41:33.242Z · LW · GW

"What are similar episodes from the past we can learn from?"

See "Technological restraint" in "Slowing AI: Reading list." (But you seem to be most interested in the strategic/military aspect of technology, and most sources on technological restraint don't focus on that.)

Comment by Zach Stein-Perlman on GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · 2023-05-15T07:20:21.434Z · LW · GW

For reference, here are the 50 tested ideas (in descending order of popularity):

  1. Pre-deployment risk assessment. AGI labs should take extensive measures to identify, analyze, and evaluate risks from powerful models before deploying them.
  2. Dangerous capability evaluations. AGI labs should run evaluations to assess their models’ dangerous capabilities (e.g. misuse potential, ability to manipulate, and power-seeking behavior).
  3. Third-party model audits. AGI labs should commission third-party model audits before deploying powerful models.
  4. Safety restrictions. AGI labs should establish appropriate safety restrictions for powerful models after deployment (e.g. restrictions on who can use the model, how they can use the model, and whether the model can access the internet).
  5. Red teaming. AGI labs should commission external red teams before deploying powerful models.
  6. Monitor systems and their uses. AGI labs should closely monitor deployed systems, including how they are used and what impact they have on society.
  7. Alignment techniques. AGI labs should implement state-of-the-art safety and alignment techniques.
  8. Security incident response plan. AGI labs should have a plan for how they respond to security incidents (e.g. cyberattacks).
  9. Post-deployment evaluations. AGI labs should continually evaluate models for dangerous capabilities after deployment, taking into account new information about the model’s capabilities and how it is being used.
  10. Report safety incidents. AGI labs should report accidents and near misses to appropriate state actors and other AGI labs (e.g. via an AI incident database).
  11. Safety vs capabilities. A significant fraction of employees of AGI labs should work on enhancing model safety and alignment rather than capabilities.
  12. Internal review before publication. Before publishing research, AGI labs should conduct an internal review to assess potential harms.
  13. Pre-training risk assessment. AGI labs should conduct a risk assessment before training powerful models.
  14. Emergency response plan. AGI labs should have and practice implementing an emergency response plan. This might include switching off systems, overriding their outputs, or restricting access.
  15. Protection against espionage. AGI labs should take adequate measures to tackle the risk of state-sponsored or industrial espionage.
  16. Pausing training of dangerous models. AGI labs should pause the development process if sufficiently dangerous capabilities are detected.
  17. Increasing level of external scrutiny. AGI labs should increase the level of external scrutiny in proportion to the capabilities of their models.
  18. Publish alignment strategy. AGI labs should publish their strategies for ensuring that their systems are safe and aligned.
  19. Bug bounty programs. AGI labs should have bug bounty programs, i.e. recognize and compensate people for reporting unknown vulnerabilities and dangerous capabilities.
  20. Industry sharing of security information. AGI labs should share threat intelligence and information about security incidents with each other.
  21. Security standards. AGI labs should comply with information security standards (e.g. ISO/IEC 27001 or NIST Cybersecurity Framework). These standards need to be tailored to an AGI context.
  22. Publish results of internal risk assessments. AGI labs should publish the results or summaries of internal risk assessments, unless this would unduly reveal proprietary information or itself produce significant risk. This should include a justification of why the lab is willing to accept remaining risks.
  23. Dual control. Critical decisions in model development and deployment should be made by at least two people (e.g. promotion to production, changes to training datasets, or modifications to production).
  24. Publish results of external scrutiny. AGI labs should publish the results or summaries of external scrutiny efforts, unless this would unduly reveal proprietary information or itself produce significant risk.
  25. Military-grade information security. The information security of AGI labs should be proportional to the capabilities of their models, eventually matching or exceeding that of intelligence agencies (e.g. sufficient to defend against nation states).
  26. Board risk committee. AGI labs should have a board risk committee, i.e. a permanent committee within the board of directors which oversees the lab’s risk management practices.
  27. Chief risk officer. AGI labs should have a chief risk officer (CRO), i.e. a senior executive who is responsible for risk management.
  28. Statement about governance structure. AGI labs should make public statements about how they make high-stakes decisions regarding model development and deployment.
  29. Publish views about AGI risk. AGI labs should make public statements about their views on the risks and benefits from AGI, including the level of risk they are willing to take in its development.
  30. KYC screening. AGI labs should conduct know-your-customer (KYC) screenings before giving people the ability to use powerful models.
  31. Third-party governance audits. AGI labs should commission third-party audits of their governance structures.
  32. Background checks. AGI labs should perform rigorous background checks before hiring/appointing members of the board of directors, senior executives, and key employees.
  33. Model containment. AGI labs should contain models with sufficiently dangerous capabilities (e.g. via boxing or air-gapping).
  34. Staged deployment. AGI labs should deploy powerful models in stages. They should start with a small number of applications and fewer users, gradually scaling up as confidence in the model’s safety increases.
  35. Tracking model weights. AGI labs should have a system that is intended to track all copies of the weights of powerful models.
  36. Internal audit. AGI labs should have an internal audit team, i.e. a team which assesses the effectiveness of the lab’s risk management practices. This team must be organizationally independent from senior management and report directly to the board of directors.
  37. No [unsafe] open-sourcing. AGI labs should not open-source powerful models, unless they can demonstrate that it is sufficiently safe to do so.
  38. Researcher model access. AGI labs should give independent researchers API access to deployed models.
  39. API access to powerful models. AGI labs should strongly consider only deploying powerful models via an application programming interface (API).
  40. Avoiding hype. AGI labs should avoid releasing powerful models in a way that is likely to create hype around AGI (e.g. by overstating results or announcing them in attention-grabbing ways).
  41. Gradual scaling. AGI labs should only gradually increase the amount of compute used for their largest training runs.
  42. Treat updates similarly to new models. AGI labs should treat significant updates to a deployed model (e.g. additional fine-tuning) similarly to its initial development and deployment. In particular, they should repeat the pre-deployment risk assessment.
  43. Pre-registration of large training runs. AGI labs should register upcoming training runs above a certain size with an appropriate state actor.
  44. Enterprise risk management. AGI labs should implement an enterprise risk management (ERM) framework (e.g. the NIST AI Risk Management Framework or ISO 31000). This framework should be tailored to an AGI context and primarily focus on the lab’s impact on society.
  45. Treat internal deployments similarly to external deployments. AGI labs should treat internal deployments (e.g. using models for writing code) similarly to external deployments. In particular, they should perform a pre-deployment risk assessment.
  46. Notify a state actor before deployment. AGI labs should notify appropriate state actors before deploying powerful models.
  47. Notify affected parties. AGI labs should notify parties who will be negatively affected by a powerful model before deploying it.
  48. Inter-lab scrutiny. AGI labs should allow researchers from other labs to scrutinize powerful models before deployment.
  49. Avoid capabilities jumps. AGI labs should not deploy models that are much more capable than any existing models.
  50. Notify other labs. AGI labs should notify other labs before deploying powerful models.

And here are 50 ideas suggested by respondents:

  1. AGI labs should participate in democratic and participatory governance processes (e.g. citizen assemblies). Issues could include the level of risk that is acceptable and preferences for different governance models.
  2. AGI labs should engage the public and civil society groups in determining what risks should be considered and what level of risk is acceptable.
  3. AGI labs should contribute to improving AI and AGI literacy among the public and policymakers.
  4. AGI labs should be transparent about where training data comes from.
  5. AGI labs should use system cards.
  6. AGI labs should report what safety and alignment techniques they used to develop a model.
  7. AGI labs should publish their ethics and safety research.
  8. AGI labs should make capability demonstrations available to policymakers and the public before deployment.
  9. AGI labs should have written deployment plans of what they would do with an AGI or other advanced and powerful AI system.
  10. AGI labs should publicly predict the frequency of harmful AI incidents.
  11. AGI labs should generate realistic catastrophic risk models for advanced AI.
  12. AGI labs should track and report on their models’ capability to automate AI research and development.
  13. AGI labs should engage in efforts to systematically forecast future risks and benefits of the technology they build.
  14. AGI labs should generate realistic catastrophic risk models for advanced AI, potentially making these public or using them to raise awareness.
  15. AGI labs should publish an annual report where they present the predicted and actual impacts of their work, along with the evidence and assumptions these are based on.
  16. AGI labs should pre-register big training runs including the amount of compute used, the data used for training, and how many parameters the model will have.
  17. AGI labs should engage in employee and investor education and awareness on the risks of advanced AI systems and potential mitigating procedures that need to be taken that tradeoff profit for societal benefit.
  18. AGI labs should adequately protect whistleblowers.
  19. AGI labs should have an onboard process for managers and new employees that involves content explaining how the organization believes a responsible AGI developer would behave and how they are attempting to meet that standard.
  20. AGI labs should promote a culture that encourages internal deliberation and critique, and evaluate whether they are succeeding in building such a culture.
  21. AGI labs should have dedicated programs to improve the diversity, equity, and inclusion of their talent.
  22. AGI labs should have independent safety and ethics advisory boards to help with certain decisions.
  23. AGI labs should have internal review boards.
  24. AGI labs should be set up such that their governance structures permit them to tradeoff profits with societal benefit.
  25. AGI labs should have merge and assist clauses.
  26. AGI labs should report to an international non-governmental organization (INGO) that is publicly committed to human rights and democratic values.
  27. AGI labs should have an independent board of directors with technical AI safety expertise who have the mandate to put the benefits for society above profit and shareholder value.
  28. AGI labs should maintain a viable way to divert from building AGI (e.g. to build narrower models and applications), in case building AGI will not be possible to do safely.
  29. AGI labs should use the Three Lines of Defense risk management framework.
  30. AGI labs should take measures to avoid being sued for trading off profits with societal benefit.
  31. AGI labs should be subject to mandatory interpretability standards.
  32. AGI labs should conduct evaluation during training, being prepared to stop and analyze any training run that looks potentially risky or harmful.
  33. AGI labs should save logs of interactions with the AI system.
  34. AGI labs should consider caps on model size.
  35. AGI labs should be forced to have systems that consist of ensembles of capped size models instead of one increasingly large model.
  36. AGI labs should ensure that AI systems in an ensemble communicate in English and that these communications are logged for future analysis if an incident occurs.
  37. AGI labs should limit API access to approved and vetted applications to foreclose potential misuse and dual use risks.
  38. AGI labs should conduct simulated cyber attacks on their systems to check for vulnerabilities.
  39. AGI labs should have internal controls and processes that prevent a single person or group being able to deploy an advanced AI system when governance mechanisms have found this to be potentially harmful or illegal.
  40. AGI labs should disclose the data and labor practices involved in pre-training and training of powerful AI systems.
  41. AGI labs should disclose the environmental costs of developing and deploying powerful AI systems.
  42. AGI labs take measures to limit potential harms that could arise from AI systems being sentient or deserving moral patienthood.
  43. AGI labs should coordinate on self-regulatory best practices they use for safety.
  44. AGI labs should coordinate on best practices for external auditing and red-teaming.
  45. AGI labs should coordinate on best practices for incident reporting.
  46. AGI labs should report cluster sizes and training plans to other AGI labs to avoid incorrect perceptions of current capabilities and compute resources.
  47. AGI labs should have feedback mechanisms with communities that are affected by their models.
  48. AGI labs should have ethical principles and set out “red lines” for their work in advance.
  49. AGI labs should incorporate a privacy-preserving machine learning (PPML) approach to auditing and governing AI models.
  50. AGI labs should use responsible AI licenses (RAIL) and engage in other practices that allow for degrees of openness on the spectrum from closed to open.

Comment by Zach Stein-Perlman on GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · 2023-05-15T04:27:04.719Z · LW · GW

It was posted at I'm not aware of versions other than the pdf.

Comment by Zach Stein-Perlman on GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · 2023-05-15T03:00:30.567Z · LW · GW

I think this is now the canonical collection of ideas for stuff AI labs should do (from an x-risk perspective). (Here are the 50 ideas with brief descriptions, followed by 50 ideas suggested by respondents-- also copied in a comment below.) (The only other relevant collection-y public source I'm aware of is my Ideas for AI labs: Reading list.) So it seems worth commenting with promising ideas not listed in the paper:

  • Alignment (including interpretability) research (as a common good, separate from aligning your own models)
    • Model-sharing: cautiously sharing powerful models with some external safety researchers to advance their research (separately from sharing for the sake of red-teaming)
  • Transparency stuff
  • Coordination stuff
  • Publication stuff
  • Planning stuff
  • [Supporting other labs doing good stuff]
  • [Supporting other kinds of actors (e.g., government) doing good stuff]

(I hope to update this comment with details later.)

Comment by Zach Stein-Perlman on AI policy ideas: Reading list · 2023-05-10T15:58:38.709Z · LW · GW

Thanks-- already have that as "Future of compute review - submission of evidence (CLTR et al. 2022)"

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-05-09T02:00:38.745Z · LW · GW

Slowing AI: Bad takes

This shortform was going to be a post in Slowing AI but its tone is off.

This shortform is very non-exhaustive.

Bad take #1: China won't slow, so the West shouldn't either

There is a real consideration here. Reasonable variants of this take include

  • What matters for safety is not just slowing but also the practices of the organizations that build powerful AI. Insofar as the West is safer and China won't slow, it's worth sacrificing some Western slowing to preserve Western lead.
  • What matters for safety is not just slowing but especially slowing near the end. Differentially slowing the West now would reduce its ability to slow later (or even cause it to speed later). So differentially slowing the West now is bad.

(Set aside the fact that slowing the West generally also slows China, because they're correlated and because ideas pass from the West to China.) (Set aside the question of whether China will try to slow and how correlated that is with the West slowing.)

In some cases slowing the West would be worth burning lead time. But slowing AI doesn't just mean the West slowing itself down. Some interventions would slow both spheres similarly or even differentially slow China, most notably export controls, reducing diffusion of ideas, and improved migration policy.

See West-China relation.


Bad take #2: slowing can create a compute overhang, so all slowing is bad

Taboo "overhang."

Yes, insofar as slowing now risks speeding later, we should notice that. There is a real consideration here.

But in some cases slowing now would be worth a little speeding later. Moreover, some kinds of slowing don't cause faster progress later at all: for example, reducing diffusion of ideas, decreasing hardware progress, and any stable and enforceable policy regimes that slow AI.

See Quickly scaling up compute.


Bad take #3: powerful AI helps alignment research, so we shouldn't slow it

(Set aside the question of how much powerful AI helps alignment research.) If powerful AI is important for alignment research, that means we should aim to increase the time we have with powerful AI, not to make powerful AI appear sooner.


Bad take #4: it would be harder for unaligned AI to take over in a world with less compute available (for it to hijack), and failed takeover attempts would be good, so it's better for unaligned AI to try to take over soon

No, running AI systems seems likely to be cheap and there's already plenty of compute.

Comment by Zach Stein-Perlman on What does it take to ban a thing? · 2023-05-08T16:53:21.310Z · LW · GW

"If someone has more examples, or better explanations, I'd be glad to learn."

See "Technological restraint" in "Slowing AI: Reading list" for several more sources.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-05-04T18:15:29.674Z · LW · GW

Some bad default/attractor properties of cause-focused groups of humans:

  • Bad group epistemics
    • The loudest aren't the most worth listening to
    • People don't dissent even when that would be correct
      • Because they don't naturally even think to challenge group assumptions
      • Because dissent is punished
  • Bad individual epistemics
  • Soldier mindset
    • Excessively focusing on persuasion and how to persuade, relative to understanding the world
    • Lacking vibes like curiosity is cool and changing your mind is cool
  • Feeling like a movement
    • Excessively focusing on influence-seeking for the movement
    • Having enemies and being excessively adversarial to them
    • (Signs/symptoms are people asking the group "what do we believe" and excessive fixation on what defines the group or distinguishes it from adjacent groups)
  • Having group instrumental-goals or plans or theories-of-victory that everyone is supposed to share (to be clear I think it's often fine for a group to share ~ultimate goals but groups often fixate on particular often-suboptimal paths to achieving those goals)
    • Choosing instrumental goals poorly and not revising them
  • Excessive fighting over status/leadership (maybe, sometimes)
  • Maybe... being bad at achieving goals, or bad instrumental rationality (group and individual)
  • Maybe something weird about authority...

(I'm interested in readings on this topic.)

Comment by Zach Stein-Perlman on AI policy ideas: Reading list · 2023-04-30T17:06:24.145Z · LW · GW

Thank you!!

  • Will add this; too bad it's so meta
  • Will read this-- probably it's worth adding and maybe it points to more specific sources also worth adding
  • Will add this; too bad it's so meta
  • Will add this
  • Already have this one

Comment by Zach Stein-Perlman on [deleted post] 2023-04-27T22:13:39.393Z

Please delete this tag; I'll rename "Restrain AI Development" to "Slowing Down AI" and retag what was tagged here as appropriate.

Comment by Zach Stein-Perlman on [deleted post] 2023-04-27T21:49:34.229Z

I propose merging this tag into Maybe change that one's name to this one's though? @Raemon 

Comment by Zach Stein-Perlman on My version of Simulacra Levels · 2023-04-26T16:15:43.067Z · LW · GW

I think there's something weird about your levels applying to statements or things you say rather than something more like internal attitudes. When you say something as opposed to just thinking it, you kind of by definition do so in order to affect others' beliefs about what you believe.

But internal-attitudes aren't a great fit either.

Comment by Zach Stein-Perlman on LessWrong moderation messaging container · 2023-04-22T07:36:09.689Z · LW · GW

Um, no; see "longer than would be optimal."

Comment by Zach Stein-Perlman on LessWrong moderation messaging container · 2023-04-22T02:03:14.755Z · LW · GW

If I was a new user reading "Your first post is a bit like an application," I would delay making a first post longer than would be optimal: I would spend too much time revising my first post or even discard several possibilities until I had a post idea I was ~sure was good.

Comment by Zach Stein-Perlman on Slowing AI: Foundations · 2023-04-18T23:53:20.652Z · LW · GW

  • To some extent, yes, it would need US + Europe (including UK) + China. A strong treaty is necessary for some goals.
  • I'd guess that the US alone could buy a year.
  • One Western government doing something often causes other Western governments to do it.

(Edit in response to reply: I don't think we have important disagreements here, so ending the conversation.)

Comment by Zach Stein-Perlman on Slowing AI: Foundations · 2023-04-18T23:12:04.856Z · LW · GW

I largely agree! Maybe we can get a stable policy regime of tracking hardware and auditing all large training runs with model evals that can identify unsafe systems. Maybe the US government can do intermediate stuff like tracking hardware and restricting training compute.

But mostly this frame is about raising questions or suggesting orientations or helping you notice if something appears.

(By the way, I roughly endorse this frame less than the others, which is why it's at the end.)

Comment by Zach Stein-Perlman on Slowing AI: Foundations · 2023-04-18T22:53:25.999Z · LW · GW

This frame makes the nonobvious/uncertain assertion that "many actors will be able to unilaterally end the world." I'm not interested in arguing here whether by default many actors would be able to unilaterally build an AI that ends the world. Insofar as that assertion is true, one kind of solution or way of orienting to the problem is to see it as a problem of unilateralism.

In response to your last paragraph: yeah, preventing a lab from building dangerous AI requires hard action like effectively monitoring it or controlling a necessary input to dangerous AI.

Comment by Zach Stein-Perlman on Slowing AI: Foundations · 2023-04-18T22:45:17.748Z · LW · GW

Agree in part.

I have the impression that for reasons I don't fully understand, scaling up training compute isn't just a matter of being willing to spend more. One does not simply spend $1B on compute.

Ideas and training compute substitute for each other sufficiently well that I don't think it's useful to talk about "[figuring] out how to make AGI before or after we [have] the compute to implement it." (And when "'hardware overhang' first came about" it had very different usage, e.g. the AI Impacts definition.)

Comment by Zach Stein-Perlman on AI policy ideas: Reading list · 2023-04-18T20:45:04.496Z · LW · GW

(Notes to self)

More sources to maybe integrate:

What should the see-also section do? I'm not sure. Figure that out. Consider adding general AI governance sources.

Maybe I should organize by topic...

Comment by Zach Stein-Perlman on Slowing AI: Foundations · 2023-04-17T21:57:16.669Z · LW · GW

(Not exactly: slowing increases time for making sense of AI, increasing risk awareness, governance, paying the alignment tax, influence-gaining and so forth– but also for misuse, safety-unconcerned actors exerting more influence, maybe multipolarity increasing, and so forth.)

Comment by Zach Stein-Perlman on Who is testing AI Safety public outreach messaging? · 2023-04-17T17:38:11.473Z · LW · GW

The only such thing I'm aware of is The Effectiveness of AI Existential Risk Communication to the American and Dutch Public.

Comment by Zach Stein-Perlman on FLI report: Policymaking in the Pause · 2023-04-15T17:01:11.891Z · LW · GW

From an x-risk perspective, I think this report is good. It's far from shovel-ready policy proposals, but it points in some reasonable directions and might advance the debate.

I think it's wrong on #5 (watermarking)– what could Meta do if LLaMa had been watermarked? And #7 seems to have little x-risk relevance.

Comment by Zach Stein-Perlman on List of requests for an AI slowdown/halt. · 2023-04-15T00:19:57.067Z · LW · GW


Comment by Zach Stein-Perlman on Anthropic is further accelerating the Arms Race? · 2023-04-07T02:14:47.333Z · LW · GW

Minerva (from June 2022) used 3e24; there's no way "several orders of magnitude larger" was right when the article was being written. I think the author just made a mistake.

Comment by Zach Stein-Perlman on Anthropic is further accelerating the Arms Race? · 2023-04-07T02:12:33.730Z · LW · GW

Epoch says 2.2e25. Skimming that page, it seems like a pretty unreliable estimate. They say their 90% confidence interval is about 1e25 to 5e25.

Comment by Zach Stein-Perlman on Anthropic is further accelerating the Arms Race? · 2023-04-06T23:50:19.369Z · LW · GW

"Anthropic estimates its frontier model will require on the order of 10^25 FLOPs, or floating point operations — several orders of magnitude larger than even the biggest models today."

This doesn't make sense. GPT-4 used around 2*10^25 FLOP, someone estimated.
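
A quick arithmetic check of the "several orders of magnitude" claim, as a minimal Python sketch. The two figures are the ones given in this comment (on the order of 10^25 FLOP for the Anthropic model, roughly 2*10^25 FLOP for the GPT-4 estimate); the helper name `orders_of_magnitude` is purely illustrative.

```python
import math


def orders_of_magnitude(a: float, b: float) -> float:
    """Return log10(a / b): by how many orders of magnitude a exceeds b."""
    return math.log10(a / b)


# Figures as quoted in the comment; both are rough estimates.
anthropic_frontier_flop = 1e25  # "on the order of 10^25", per the quoted article
gpt4_estimate_flop = 2e25       # rough third-party estimate of GPT-4 training compute

gap = orders_of_magnitude(anthropic_frontier_flop, gpt4_estimate_flop)
print(f"gap: {gap:.2f} orders of magnitude")
```

Under these figures the gap is negative (about -0.3), i.e. the planned model would be in the same range as the GPT-4 estimate rather than orders of magnitude above it.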

Comment by Zach Stein-Perlman on New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development · 2023-04-05T19:29:22.848Z · LW · GW

Nice. See also YouGov's other question from the same survey, "How likely do you think it is that artificial intelligence (AI) will eventually become more intelligent than people?" See also my collection of public surveys, Surveys of US public opinion on AI.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-30T08:58:43.070Z · LW · GW

I deeply sympathize with the presumptuous philosopher but 1a feels weird.

2a was meant to be conditional on non-simulation.

Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) that there are almost certainly lots of aliens, all of which are quiet, which feels really surprising.

To clarify what I meant on 3b: maybe "you live in a simulation" can explain why the universe looks old better than "uh, I guess all of the aliens were quiet" can.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-30T04:11:30.147Z · LW · GW

Propositions on SIA

Epistemic status: exploring implications, some of which feel wrong.

  1. If SIA is correct, you should update toward the universe being much larger than it naively (i.e. before anthropic considerations) seems, since there are more (expected) copies of you in larger universes.
    a. In fact, we seem to have to update to probability 1 on infinite universes; that's surprising.
  2. If SIA is correct, you should update toward there being more alien civilizations than it naively seems, since in possible-universes where more aliens appear, more (expected) copies of human civilization appear.
    a. The complication is that more alien civilizations makes it more likely that an alien causes you to never have existed, e.g. by colonizing the solar system billions of years ago. So a corollary is that you should update toward human-level civilizations being less likely to be "loud" or tending to affect fewer alien civilizations or something than it naively seems.
      i. So SIA predicts that there were few aliens in the early universe and many aliens around now.
      ii. So SIA predicts that human-level civilizations (more precisely, I think: civilizations whose existence is correlated with your existence) tend not to noticeably affect many others (whether due to their capabilities, motives, or existential catastrophes).
      iii. So SIA retrodicts there being a speed limit (the speed of light) and moreover predicts that noticeable-influence-propagation in fact tends to be even slower.
  3. If SIA is correct, you should update toward the proposition that you live in a simulation, relative to your naive credence.
    a. Because there could be lots more (expected) copies of you in simulations.
    b. Because that can explain why you appear to exist billions of years after billions of alien civilizations could have reached Earth.
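The update in the first proposition can be illustrated with a toy calculation. This is a minimal sketch, assuming the common formalization of SIA in which each hypothesis's prior is reweighted by its expected number of observers in your epistemic situation and then renormalized. The two hypotheses, their priors, and the observer counts are made-up illustrative numbers, and `sia_posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def sia_posterior(priors, observer_counts):
    """Reweight each hypothesis by its expected number of copies of you.

    posterior(h) is proportional to prior(h) * observer_counts[h].
    """
    weights = {h: priors[h] * observer_counts[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical numbers: two equally likely universes (before anthropic
# considerations), where the large one contains 1000x as many copies of you.
priors = {"small": Fraction(1, 2), "large": Fraction(1, 2)}
observer_counts = {"small": 1, "large": 1000}

posterior = sia_posterior(priors, observer_counts)
print(posterior["large"])  # prints 1000/1001
```

As the ratio of observer counts grows without bound, the posterior on the larger universe approaches 1, which is the sense in which SIA seems to force probability 1 on infinite universes.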

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-25T00:05:48.208Z · LW · GW

For example, if a lab is considering deploying a powerful model, it can prosocially show its hand--i.e., demonstrate that it has a powerful model--and ask others to make themselves partially transparent too. This affordance doesn't appear until the endgame. I think a refined version of it could be a big deal.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-24T23:52:26.216Z · LW · GW

Agree. I merely assert that we should be aware of and plan for the possibility of increased strategic clarity, risk awareness, etc. (and planning includes unpacking "etc.").

Probably taking the analogy too far, but: most games-that-can-have-endgames also have instances that don't have endgames; e.g. games of chess often end in the midgame.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-24T23:00:16.087Z · LW · GW

AI endgame

In board games, the endgame is a period generally characterized by strategic clarity, more direct calculation of the consequences of actions rather than evaluating possible actions with heuristics, and maybe a narrower space of actions that players consider.

Relevant actors, particularly AI labs, are likely to experience increased strategic clarity before the end, including AI risk awareness, awareness of who the leading labs are, roughly how much lead time they have, and how threshold-y being slightly ahead is.

There may be opportunities for coordination in the endgame that were much less incentivized earlier, and pausing progress may be incentivized in the endgame despite being disincentivized earlier (from an actor's imperfect perspective and/or from a maximally informed/wise/etc. perspective).

The downside of "AI endgame" as a conceptual handle is that it suggests thinking of actors as opponents/adversaries. Probably "crunch time" is better, but people often use that to gesture at hinge-y-ness rather than strategic clarity.

Comment by Zach Stein-Perlman on Zach Stein-Perlman's Shortform · 2023-03-22T04:30:00.599Z · LW · GW

AI strategy research prompts

Mostly for personal use. Likely to be expanded over time.

Some prompts inspired by Framing AI strategy:

  • Plans
    • What plans would be good?
    • Given a particular plan that is likely to be implemented, what interventions or desiderata complement that plan (by making it more likely to succeed or by being better in worlds where it succeeds)?
  • Affordances: for various relevant actors, what strategically significant actions could they take? What levers do they have? What would it be great for them to do (or avoid)?
  • Intermediate goals: what goals or desiderata are instrumentally useful?
  • Threat modeling: for various threats, model them well enough to understand necessary and sufficient conditions for preventing them.
  • Memes (& frames): what would it be good if people believed or paid attention to?

For forecasting prompts, see List of uncertainties about the future of AI.

Some miscellaneous prompts:

  • Slowing AI
    • How can various relevant actors slow AI?
    • How can the AI safety community slow AI?
    • What considerations or side effects are relevant to slowing AI?
  • How do labs act, as a function of [whatever determines that]? In particular, what's the deal with "racing"?
  • AI risk advocacy
    • How could the AI safety community do AI risk advocacy well?
    • What considerations or side effects are relevant to AI risk advocacy? 
  • What's the deal with crunch time?
  • How will the strategic landscape be different in the future?
  • What will be different near the end, and what interventions or actions will that enable? In particular, is eleventh-hour coordination possible? (Also maybe emergency brakes that don't appear until the end.)
  • What concrete asks should we have for labs? for government?
  • Meta: how can you help yourself or others do better AI strategy research?

Comment by Zach Stein-Perlman on How bad a future do ML researchers expect? · 2023-03-09T12:15:20.239Z · LW · GW

The dataset is public and includes a question "how long have you worked in" the "AI research area [you have] worked in for the longest time," so you could check something related!

Comment by Zach Stein-Perlman on Taboo "compute overhang" · 2023-03-01T20:39:53.163Z · LW · GW

Yeah, these seem like useful concepts in some contexts too.

I don't understand this sentence:

"Post-AGI compute overhang" here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.

It's the gap between the training compute of 'the first AGI' and what?

Comment by Zach Stein-Perlman on AI alignment researchers don't (seem to) stack · 2023-02-21T01:31:08.790Z · LW · GW

I largely agree. But I think not-stacking is only slightly bad because I think the "crappy toy model [where] every alignment-visionary's vision would ultimately succeed, but only after 30 years of study along their particular path" is importantly wrong; I think many new visions have a decent chance of succeeding more quickly and if we pursue enough different visions we get a good chance of at least one paying off quickly.

Edit: even if alignment researchers could stack into just a couple paths, I think we might well still choose to go wide.

Comment by Zach Stein-Perlman on The public supports regulating AI for safety · 2023-02-19T19:28:47.462Z · LW · GW

Combating bad regulation would be the obvious way.

In seriousness, I haven’t focused on interventions to improve regulation yet; I just noticed a thing about public opinion and wrote it up. (And again, some possible regulations would be good.)

Comment by Zach Stein-Perlman on The public supports regulating AI for safety · 2023-02-19T06:47:20.803Z · LW · GW

But please consider, are you calling for regulation because it actually makes sense, or because it's the Approved Answer to problems?

I didn't call for regulation.

Some possible regulations would be good and some would be bad.

I do endorse trying to nudge regulation to be better than the default.

Comment by Zach Stein-Perlman on Framing AI strategy · 2023-02-15T00:00:10.152Z · LW · GW

I expect to update this comment with additional sources—and perhaps new analytic frames—as I become aware of them and they become public. Last updated 23 May 2023.


  • [draft] Matthijs Maas's "Levers of governance" in Transformative AI Governance: A Literature Review
  • Eugene Bardach's Things Governments Do (affordances for states from a non-AI perspective) (thanks to Matthijs Maas for this suggestion)
  • Observation: you get different taxonomies if you start at goals (like "slow China") vs levers (like "immigration policy"). And your uncertainties are like "how can actor X achieve goal Y" vs "how can actor X leverage its ability Z." Maybe you think of different affordances; try both ways.
  • Alex Gray mentions as a motivating/illustrative example (my paraphrasing): windfall clauses (or equity-sharing or other benefit-sharing mechanisms) are unlikely to be created by labs but it's relatively easy for labs to take an existing windfall-clause affordance.

Intermediate goals:

Theories of victory:

Memes & frames:


  • [draft] Alex Lintz's "A simple model for how to prioritize different timelines"
  • Vaniver's Weight by Impact (2023)

If I were rewriting this post today, I would probably discuss something like object-level frames or strategic perspectives. They make aspects of a situation more salient; whether or not they’re true, and whether or not they’re the kind-of-thing that can be true, they can be useful. See Matthijs Maas's Strategic Perspectives on Transformative AI Governance for illustration.

Comment by Zach Stein-Perlman on Important fact about how people evaluate sets of arguments · 2023-02-14T07:34:28.794Z · LW · GW

This largely feels true.

But if someone is disposed to believe P because of strong argument X, the existence of weak arguments for P doesn't feel like it dissuades them.

There's a related phenomenon where--separate from what people believe and how they evaluate arguments--your adversaries will draw attention to the most objectionable things you say, and a movement's adversaries will draw attention to the most objectionable things a member of the movement says.

Comment by Zach Stein-Perlman on Are short timelines actually bad? · 2023-02-06T01:22:37.311Z · LW · GW

I roughly support slowing AI progress (although the space of possibilities has way more dimensions than just slow vs fast). Some takes on "Reasons one might try to accelerate progress":

  • Avoid/delay a race with China + Keep the good guys in the lead. Sure, if you think you can differentially accelerate better actors, that's worth noticing. (And maybe long timelines means more actors in general, which seems bad on net.) I feel pretty uncertain about the magnitude of these factors, though.
  • Smooth out takeoff. Sure, but be careful: this factor suggests faster progress is good insofar as it's due to greater spending. This is consistent with trying to slow timelines by e.g. trying to get labs to publish less.
  • Another factor is non-AI x-risk: if human-level AI solves other risks, and greater exposure to other risks doesn't help with AI, this is a force in favor of rolling the dice on AI sooner. (I roughly believe non-AI x-risk is much smaller than the increase in x-risk from shorter timelines, but I'm flagging this as cruxy; if I came to believe that e.g. biorisk was much bigger, I would support accelerating AI.)

Comment by Zach Stein-Perlman on AI safety milestones? · 2023-01-25T10:19:10.736Z · LW · GW

Interesting, thanks.

(I agree in part, but (1) planning for far/slow worlds is still useful and (2) I meant more that metrics or model evaluations would be part of an intervention (e.g. incorporated into safety standards) than that metrics would inform what we try to do.)

Comment by Zach Stein-Perlman on AI safety milestones? · 2023-01-24T20:46:15.909Z · LW · GW

How is that relevant? It's about whether AI risk will be mainstream. I'm thinking about governance interventions by this community, which doesn't require the rest of the world to appreciate AI risk.