Posts

Mitigating extreme AI risks amid rapid progress [Linkpost] 2024-05-21T19:59:21.343Z
Akash's Shortform 2024-04-18T15:44:25.096Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
OpenAI's Preparedness Framework: Praise & Recommendations 2024-01-02T16:20:04.249Z
Speaking to Congressional staffers about AI risk 2023-12-04T23:08:52.055Z
Navigating emotions in an uncertain & confusing world 2023-11-20T18:16:09.492Z
International treaty for global compute caps 2023-11-09T18:17:04.952Z
Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost] 2023-11-01T13:28:43.723Z
Winners of AI Alignment Awards Research Contest 2023-07-13T16:14:38.243Z
AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI 2023-05-30T11:52:31.669Z
AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI 2023-05-23T21:47:34.755Z
Eisenhower's Atoms for Peace Speech 2023-05-17T16:10:38.852Z
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control 2023-05-16T15:14:45.921Z
AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models 2023-05-09T15:26:55.978Z
AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks 2023-05-02T18:41:43.144Z
Discussion about AI Safety funding (FB transcript) 2023-04-30T19:05:34.009Z
Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous) 2023-04-25T18:49:29.042Z
DeepMind and Google Brain are merging [Linkpost] 2023-04-20T18:47:23.016Z
AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media 2023-04-18T18:44:35.923Z
Request to AGI organizations: Share your views on pausing AI progress 2023-04-11T17:30:46.707Z
AI Safety Newsletter #1 [CAIS Linkpost] 2023-04-10T20:18:57.485Z
Reliability, Security, and AI risk: Notes from infosec textbook chapter 1 2023-04-07T15:47:16.581Z
New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development 2023-04-05T01:26:51.830Z
[Linkpost] Critiques of Redwood Research 2023-03-31T20:00:09.784Z
What would a compute monitoring plan look like? [Linkpost] 2023-03-26T19:33:46.896Z
The Overton Window widens: Examples of AI risk in the media 2023-03-23T17:10:14.616Z
The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments 2023-03-20T20:44:29.445Z
[Linkpost] Scott Alexander reacts to OpenAI's latest post 2023-03-11T22:24:39.394Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
AI Governance & Strategy: Priorities, talent gaps, & opportunities 2023-03-03T18:09:26.659Z
Fighting without hope 2023-03-01T18:15:05.188Z
Qualities that alignment mentors value in junior researchers 2023-02-14T23:27:40.747Z
4 ways to think about democratizing AI [GovAI Linkpost] 2023-02-13T18:06:41.208Z
How evals might (or might not) prevent catastrophic risks from AI 2023-02-07T20:16:08.253Z
[Linkpost] Google invested $300M in Anthropic in late 2022 2023-02-03T19:13:32.112Z
Many AI governance proposals have a tradeoff between usefulness and feasibility 2023-02-03T18:49:44.431Z
Talk to me about your summer/career plans 2023-01-31T18:29:23.351Z
Advice I found helpful in 2022 2023-01-28T19:48:23.160Z
11 heuristics for choosing (alignment) research projects 2023-01-27T00:36:08.742Z
"Status" can be corrosive; here's how I handle it 2023-01-24T01:25:04.539Z
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution 2023-01-21T16:51:09.586Z
Wentworth and Larsen on buying time 2023-01-09T21:31:24.911Z
[Linkpost] Jan Leike on three kinds of alignment taxes 2023-01-06T23:57:34.788Z
My thoughts on OpenAI's alignment plan 2022-12-30T19:33:15.019Z
An overview of some promising work by junior alignment researchers 2022-12-26T17:23:58.991Z
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic 2022-12-20T21:39:41.866Z
12 career-related questions that may (or may not) be helpful for people interested in alignment research 2022-12-12T22:36:21.936Z
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas 2022-11-25T20:47:09.832Z
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility 2022-11-22T22:19:09.419Z
Ways to buy time 2022-11-12T19:31:10.411Z

Comments

Comment by Akash (akash-wasil) on AI #69: Nice · 2024-06-20T23:09:02.392Z · LW · GW

Ah, gotcha– are there more details about which board seats the LTBT will control//how board seats will be added? According to GPT, the current board members are Dario, Daniela, Yasmin, and Jay. (Presumably Dario and Daniela's seats will remain untouched and will not be the ones in LTBT control.)

Also gotcha– removed the claim that he was replaced by Jay.

Comment by Akash (akash-wasil) on AI #69: Nice · 2024-06-20T20:39:20.975Z · LW · GW

Luke Muehlhauser explains he resigned from the Anthropic board because there was a conflict with his work at Open Philanthropy and its policy advocacy. I do not see that as a conflict. If being a board member at Anthropic was a conflict with advocating for strong regulations or considered by them a ‘bad look,’ then that potentially says something is very wrong at Anthropic as well. Yes, there is the ‘behind the scenes’ story but one not behind the scenes must be skeptical. 

I also do not really understand why the COI was considered so strong or unmanageable that Luke felt he needed to resign. Note also that my impression is that OP funds very few "applied policy" efforts, and that the ones they do fund mostly focus on things Anthropic supports (e.g., science of evals, funding for NIST). I also don't get the vibe that Luke leaving the board coincides with any significant changes to OP's approach to governance or policy.

More than that, I think Luke plausibly… chose the wrong role? I realize most board members are very part time, but I think the board of Anthropic was the more important assignment.

I agree with this. (I might be especially inclined to believe it because I haven't been particularly impressed with the output from OP's governance team, but even if I believed it were doing a fairly good job under Luke's leadership, I would still think the Anthropic board role was more valuable. On top of that, it would've been relatively easy for OP to replace Luke with someone who has a very similar set of beliefs.)

Comment by Akash (akash-wasil) on Fabien's Shortform · 2024-06-19T18:45:06.842Z · LW · GW

Makes sense— I think the thing I’m trying to point at is “what do you think better safety research actually looks like?”

I suspect there’s some risk that, absent some sort of pre-registration, your definition of “good safety research” ends up gradually drifting to be more compatible with the kind of research Anthropic does.

Of course, not all of this will be a bad thing— hopefully you will genuinely learn some new things that change your opinion of what “good research” is.

But the nice thing about pre-registration is that you can be more confident that belief changes are stemming from a deliberate or at least self-aware process, as opposed to some sort of “maybe I thought this all along//I didn’t really know what I believed before I joined” vibe. (And perhaps this is sufficiently covered in your doc.)

Comment by Akash (akash-wasil) on Fabien's Shortform · 2024-06-17T01:19:38.240Z · LW · GW

Congrats on the new role! I appreciate you sharing this here.

If you're able to share more, I'd be curious to learn more about your uncertainties about the transition. Based on your current understanding, what are the main benefits you're hoping to get at Anthropic? In February/March, what are the key areas you'll be reflecting on when you decide whether to stay at Anthropic or come back to Redwood?

Obviously, your February/March write-up will not necessarily conform to these "pre-registered" considerations. But nonetheless, I think pre-registering some considerations or uncertainties in advance could be a useful exercise (and I would certainly find it interesting!)

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T17:36:59.681Z · LW · GW

Don’t have time to respond in detail but a few quick clarifications/responses:

— I expect policymakers to have the most relevant/important questions about policy and to be the target audience most relevant for enacting policies. Not solving technical alignment. (Though I do suspect that by MIRI’s lights, getting policymakers to understand alignment issues would be more likely to result in alignment progress than having more conversations with people in the technical alignment space.)

— There are lots of groups focused on comms/governance. MIRI is unique only insofar as it started off as a “technical research org” and has recently pivoted more toward comms/governance.

— I do agree that MIRI has had relatively low output for a group of its size/resources/intellectual caliber. I would love to see more output from MIRI in general. Insofar as it is constrained, I think they should be prioritizing “curious policy newcomers” over people like Matthew and Alex.

— Minor, but I don’t think MIRI is getting “outargued” by those individuals, and I think that frame is a bit too zero-sum.

— Controlling for overall level of output, I suspect I’m more excited than you about MIRI spending less time on LW and more time on comms/policy work with policy communities (EG Malo contributing to the Schumer insight forums, MIRI responding to government RFCs).

— My guess is we both agree that MIRI could be doing more on both fronts and just generally having higher output. My impression is they are working on this and have been focusing on hiring; I think if their output stays relatively the same 3-6 months from now, I will be fairly disappointed.

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T13:49:02.633Z · LW · GW

I think if MIRI engages with “curious newcomers” those newcomers will have their own questions/confusions/objections and engaging with those will improve general understanding.

Based on my experience so far, I don’t expect their questions/confusions/objections to overlap a lot with the questions/confusions/objections that tech-oriented active LW users have.

I also think it’s not accurate to say that MIRI tends to ignore its strongest critics; there’s perhaps more public writing/dialogues between MIRI and its critics than for pretty much any other organization in the space.

My claim is not that MIRI should ignore its critics but more that it should focus on replying to criticisms or confusions from “curious and important newcomers”. My fear is that MIRI might engage too much with criticisms from LW users and other ingroup members and not focus enough on engaging with policy folks, whose cruxes and opinions often differ substantially from those of, e.g., the median LW commentator.

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T13:24:38.108Z · LW · GW

Offering a quick two cents: I think MIRI‘s priority should be to engage with “curious and important newcomers” (e.g., policymakers and national security people who do not yet have strong cached views on AI/AIS). If there’s extra capacity and interest, I think engaging with informed skeptics is also useful (EG big fan of the MIRI dialogues), but on the margin I don’t suspect it will be as useful as the discussions with “curious and important newcomers.”

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-14T16:00:43.410Z · LW · GW

@Ryan Kidd @Lee Sharkey I suspect you'll have useful recommendations here.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-14T15:55:25.855Z · LW · GW

Recommended readings for people interested in evals work?

Someone recently asked: "Suppose someone wants to get into evals work. Is there a good reading list to send to them?" I spent ~5 minutes and put this list together. I'd be interested if people have additional suggestions or recommendations:

I would send them:

I would also encourage them to read stuff more on the "macrostrategy" of evals. Like, I suspect a lot of value will come from people who are able to understand the broader theory of change of evals and identify when we're "rowing" in bad directions. Some examples here might be:

Comment by Akash (akash-wasil) on AI catastrophes and rogue deployments · 2024-06-14T15:37:07.512Z · LW · GW

I think that rogue internal deployment might be a bigger problem than rogue external deployment, mostly because I’m pretty bullish on simple interventions to prevent weight exfiltration.

Can you say more about this? Unless I'm misunderstanding it, it seems like this hot take goes against the current "community consensus" which is something like "on the default AGI development trajectory, it's extremely unlikely that labs will be able to secure weights from China."

Would you say you're simply more bullish about upload limits than others? Or that you think the mainstream security people just haven't thought about some of the ways that securing weights might be easier than securing other things that society struggles to protect from state actors?

Comment by Akash (akash-wasil) on Access to powerful AI might make computer security radically easier · 2024-06-08T22:14:18.805Z · LW · GW

I think this is an interesting line of inquiry and the specific strategies expressed are helpful.

One thing I'd find helpful is a description of the kind of AI system that you think would be necessary to get us to state-proof security. 

I have a feeling the classic MIRI-style "either your system is too dumb to achieve the goal or your system is so smart that you can't trust it anymore" argument is important here. The post essentially assumes that we have a model powerful enough to do impressive things like "accurately identify suspicious actions" yet trusted enough to be widely deployed internally. This seems fine for a brainstorming exercise (and I do think such brainstorming exercises should exist).

But for future posts like this, I think it would be valuable to have a ~1-paragraph description of the AI system that you have in mind. Perhaps noting what its general capabilities and what its security-relevant capabilities are. I imagine this would help readers evaluate whether or not they expect to get a "Goldilocks system" (smart enough to do useful things but not so smart that internally deploying the system would be dangerous, even with whatever SOTA control procedures are applied.)

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-07T23:33:53.558Z · LW · GW

@Peter Barnett @Rob Bensinger @habryka @Zvi @davekasten @Peter Wildeford you come to mind as people who might be interested. 

See also the Wikipedia page about the report (but IMO reading sections of the actual report is worth it).

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-07T23:29:17.922Z · LW · GW

I've started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.

I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.

Comment by Akash (akash-wasil) on Response to Aschenbrenner's "Situational Awareness" · 2024-06-07T23:07:45.382Z · LW · GW

when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.

I would be interested in reading more about the methods that could be used to prohibit the proliferation of this technology (you can assume a "wake-up" from the USG). 

I think one of the biggest fears would be that any sort of international alliance would not have perfect/robust detection capabilities, so there's always the risk that someone is running a rogue AGI project.

Also, separately, there's the issue of "at some point, doesn't it become so trivially easy to develop AGI that we still need the International Community Good Guys to develop AGI [or do something else] that gets us out of the acute risk period?" When you say "prohibit this technology", do you mean "prohibit this technology from being developed outside of the International Community Good Guys Cluster" or do you mean "prohibit this technology in its entirety?" 

Comment by Akash (akash-wasil) on AI #67: Brief Strange Trip · 2024-06-07T22:08:52.607Z · LW · GW

What is up with Anthropic’s public communications?

Once again this week, we saw Anthropic’s public communications lead come out warning about overregulation, in ways I expect to help move the Overton window away from the things that are likely going to become necessary.

Note also that Anthropic recently joined TechNet, an industry advocacy group that is generally considered "anti-regulation" and specifically opposes SB 1047.

I think a responsible AGI lab would be taking a much stronger role in pushing the Overton Window and pushing for strong policies. At the very least, I would hope that the responsible AGI lab has comms that clearly signal dangers from race dynamics, dangers from superintelligence, and the need for the government to be prepared to intervene swiftly in the event of emergencies. 

This is not what I see from Anthropic. I am disappointed in Anthropic. If Anthropic wants me to consider it a "responsible AGI lab", I will need an explanation of why Anthropic has stayed relatively silent, why it is joining groups that oppose SB 1047, and why its policy team seems to have advocated for ~nothing beyond voluntary commitments and optional model evaluations.

(I will note that I thought Dario's Senate Testimony last year included some reasonable things. Although the focus was on misuse threats, he mentions that we may no longer have the ability to control models and calls for legislation that would require that models pass certain standards before deployment).

Comment by Akash (akash-wasil) on Response to Aschenbrenner's "Situational Awareness" · 2024-06-07T19:12:42.355Z · LW · GW

The field is not ready, and it's not going to suddenly become ready tomorrow. We need urgent and decisive action, but to indefinitely globally halt progress toward this technology that threatens our lives and our children's lives, not to accelerate ourselves straight off a cliff.

I think most advocacy around international coordination (that I've seen, at least) has this sort of vibe to it. The claim is "unless we can make this work, everyone will die."

I think this is an important point to be raising– and in particular I think that efforts to raise awareness about misalignment + loss of control failure modes would be very useful. Many policymakers have only or primarily heard about misuse risks and CBRN threats, and the "policymaker prior" is usually to think "if there is a dangerous tech, the most important thing to do is to make sure the US gets it first."

But in addition to this, I'd like to see more "international coordination advocates" come up with concrete proposals for what international coordination would actually look like. If the USG "wakes up", I think we will very quickly see that a lot of policymakers + natsec folks will be willing to entertain ambitious proposals.

By default, I expect a lot of people will agree that international coordination would in principle be safer, but they will fear that in practice it is not going to work. As a rough analogy, I don't think most serious natsec people were like "yes, of course the thing we should do is enter into an arms race with the Soviet Union. This is the safest thing for humanity."

Rather, I think it was much more a vibe of "it would be ideal if we could all avoid an arms race, but there's no way we can trust the Soviets to follow through on this." (In addition to stuff that's more vibesy and less rational than this, but I do think insofar as logic and explicit reasoning were influential, this was likely one of the core cruxes.)

In my opinion, one of the most important products for "international coordination advocates" to produce is some sort of concrete plan for The International Project. And importantly, it would need to somehow find institutional designs and governance mechanisms that would appeal to both the US and China. Answering questions like "how do the international institutions work", "who runs them", "how are they financed", and "what happens if the US and China disagree" will be essential here.

The Baruch Plan and the Acheson-Lilienthal Report (see full report here) might be useful sources of inspiration.

P.S. I might personally spend some time on this and find others who might be interested. Feel free to reach out if you're interested and feel like you have the skillset for this kind of thing.

Comment by Akash (akash-wasil) on Zach Stein-Perlman's Shortform · 2024-06-07T01:29:23.840Z · LW · GW

@Ebenezer Dukakis I would be even more excited about a "how and why" post for internationalizing AGI development and spelling out what kinds of international institutions could build + govern AGI.

Comment by Akash (akash-wasil) on SB 1047 Is Weakened · 2024-06-06T16:42:27.599Z · LW · GW

To what extent do you think the $100M threshold will weaken the bill "in practice"? I feel like "severely weakened" might overstate the degree of weakening. I would probably say "mildly weakened."

I think the logic along the lines of "the frontier models are going to be the ones where the dangerous capabilities are discovered first, so maybe it seems fine (for now) to exclude non-frontier models" makes some amount of sense.

In the long run, this approach fails because you might be able to hit dangerous capabilities with <$100M. But in the short run, it feels like the bill covers the most relevant actors (Microsoft, Meta, Google, OpenAI, Anthropic).

Maybe I always thought the point of the bill was to cover frontier AI systems (which are still covered) as opposed to any systems that could have hazardous capabilities, so I see the $100M threshold as more of a "compromise consistent with the spirit of the bill" as opposed to a "substantial weakening of the bill." What do you think?

See also:

Comment by Akash (akash-wasil) on Thomas Kwa's Shortform · 2024-06-05T15:54:01.038Z · LW · GW

*Quickly checks my ratio*

"Phew, I've survived the Kwa Purge"

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T22:28:12.241Z · LW · GW

@Bogdan, can you spell out a vision for a stably multipolar world with the above assumptions satisfied?

IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely.

@Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it’s like “hey here’s a claim, also read more here” as opposed to links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T19:06:31.780Z · LW · GW

the probable increase in risks of centralization might make it not worth it

Can you say more about why the risk of centralization differs meaningfully between the three worlds?

IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...

Then you are very likely (in the absence of coordination) to end up with centralization no matter what. It's just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) are the ones wielding the superintelligence.

(If anything, it seems like the "international coalition" approach seems less likely to lead to centralization than the other two approaches, since you're more likely to get post-AGI coordination.)

especially if you don't use AI automation (using the current paradigm, probably) to push those forward.

In my vision, the national or international project would be investing in "superalignment"-style approaches; it would just (hopefully) have enough time/resources to invest in other approaches as well.

I typically assume we don't get "infinite time"– i.e., even the international coalition is racing against "the clock" (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can't be prevented, or the amount of time we have until a separate existential catastrophe occurs). So I think it would be unwise for the international coalition to completely abandon DL/superalignment, even if one of the big hopes is that a safer paradigm would be discovered in time.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T17:20:39.019Z · LW · GW

My rough ranking of different ways superintelligence could be developed:

  1. Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
  2. Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or "USG + Western allies" project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest in alternative paradigms ("provably safe" systems, human intelligence enhancement, etc.).
  3. Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.

My own thought is that we should be advocating for option #3 (international coordination) unless/until there is enough evidence suggesting that it's actually not feasible, and then we should settle for option #2. I'm not yet convinced by people who say we have to settle for option #2 just because EG climate treaties have not gone well or international cooperation is generally difficult.

But I also think people advocating #3 should be aware that there are some worlds in which international cooperation will not be feasible, and we should be prepared to do #2 if it's quite clear that the US and China are unwilling to cooperate on AGI development. (And again, I don't think we have that evidence yet– I think there's a lot of uncertainty here.)

Comment by Akash (akash-wasil) on Prometheus's Shortform · 2024-06-04T02:49:51.757Z · LW · GW

Thanks for sharing your experience here. 

One small thought is that things end up feeling extremely neglected once you index on particular subquestions. Like, on a high-level, it is indeed the case that AI safety has gotten more mainstream.

But when you zoom in, there are a lot of very important topics that have <5 people seriously working on them. I work in AI policy, so I'm more familiar with the policy/governance ones, but I imagine this is also true on the technical side (also, maybe consider swapping to governance/policy!).

Also, especially in hype waves, I think a lot of people end up just working on the popular thing. If you're willing to deviate from the popular thing, you can often find important parts of the problem that nearly no one is addressing.

Comment by Akash (akash-wasil) on Seth Herd's Shortform · 2024-06-03T13:32:30.177Z · LW · GW

Second, our different takes will tend to make a lot of our communication efforts cancel each other out. If alignment is very hard, we must Shut It Down or likely die. If it's less difficult, we should primarily work hard on alignment.

I don't think this is (fully) accurate. One could have a high P(doom) but still think that the current AGI development paradigm is best-suited to obtain good outcomes & that government involvement would make things worse in expectation. On the flip side, one could have a low/moderate P(doom) but think that the safest way to get to AGI involves government intervention that ends race dynamics & that government involvement would make P(doom) even lower.

Absolute P(doom) is one factor that might affect one's willingness to advocate for strong government involvement, but IMO it's only one of many factors, and LW folks sometimes tend to make it seem like it's the main/primary/only factor.

Of course, if a given organization says they're supporting X because of their P(doom), I agree that they should provide evidence for their P(doom).

My claim is simply that we shouldn't assume that "low P(doom) means govt intervention bad and high P(doom) means govt intervention good". 

One's views should be affected by a lot of other factors, such as "how bad do you think race dynamics are", "to what extent do you think industry players are able and willing to be cautious", "to what extent do you think governments will end up understanding and caring about alignment", and "to what extent do you think governments would have safety cultures around intelligence enhancement compared to industry players."

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-02T01:07:19.064Z · LW · GW

I found this answer helpful and persuasive– thank you!

Comment by Akash (akash-wasil) on We might be dropping the ball on Autonomous Replication and Adaptation. · 2024-05-31T15:14:47.400Z · LW · GW

Potentially unpopular take, but if you have the skillset to do so, I'd rather you just come up with simple/clear explanations for why ARA is dangerous, what implications this has for AI policy, present these ideas to policymakers, and iterate on your explanations as you start to see why people are confused.

Note also that in the US, the NTIA has been tasked with making recommendations about open-weight models. The deadline for official submissions has ended but I'm pretty confident that if you had something you wanted them to know, you could just email it to them and they'd take a look. My impression is that they're broadly aware of extreme risks from certain kinds of open-sourcing but might benefit from (a) clearer explanations of ARA threat models and (b) specific suggestions for what needs to be done.

Comment by Akash (akash-wasil) on We might be dropping the ball on Autonomous Replication and Adaptation. · 2024-05-31T15:10:01.711Z · LW · GW

Why do you think we are dropping the ball on ARA?

I think many members of the policy community feel like ARA is "weird" and therefore don't want to bring it up. It's much tamer to talk about CBRN threats and bioweapons. It also requires less knowledge and general competence– explaining ARA and autonomous systems risks is difficult, you get more questions, you're more likely to explain something poorly, etc.

Historically, there was also a fair amount of gatekeeping, where some of the experienced policy people were explicitly discouraging people from being explicit about AGI threat models (this still happens to some degree, but I think the effect is much weaker than it was a year ago.)

With all this in mind, I currently think raising awareness about ARA threat models and AI R&D threat models is one of the most important things for AI comms/policy efforts to get right.

In the status quo, even if the evals go off, I don't think we have laid the intellectual foundation required for policymakers to understand why the capabilities those evals detect are dangerous. "Oh interesting– an AI can make copies of itself? A little weird, but I guess we make copies of files all the time, shrug." or "Oh wow– AI can help with R&D? That's awesome– seems very exciting for innovation."

I do think there's a potential to lay the intellectual foundation before it's too late, and I think many groups are starting to be more direct/explicit about the "weirder" threat models. Also, I think national security folks have more of a "take things seriously and worry about things even if there isn't clear empirical evidence yet" mentality than ML people. And I think typical policymakers fall somewhere in between. 

Comment by Akash (akash-wasil) on Non-Disparagement Canaries for OpenAI · 2024-05-31T02:31:11.874Z · LW · GW

Minor note: Paul is at the US AI Safety Institute, while Jade & Geoffrey are at the UK AI Safety Institute. 

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T21:35:00.502Z · LW · GW

@habryka I think you're making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether "an average reasonably smart person out in society" would see the difference as meaningful (IMO they would not). 

(My guess is you interpreted "reasonable people" to mean like "people who are really into reasoning about the world and trying to figure out the truth" and Kaj interpreted reasonable people to mean like "an average person." Kaj should feel free to correct me if I'm wrong.)

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T21:30:03.138Z · LW · GW

My two cents RE particular phrasing:

When talking to US policymakers, I don't think there's a big difference between "causes a national security crisis" and "kills literally everyone." Worth noting that even though many in the AIS community see a big difference between "99% of people die but civilization restarts" vs. "100% of people die", IMO this distinction does not matter to most policymakers (or at least matters way less to them).

Of course, in addition to conveying "this is a big deal" you need to convey the underlying threat model. There are lots of ways to interpret "AI causes a national security emergency" (e.g., China, military conflict). "Kills literally everyone" probably leads people to envision a narrower set of worlds.

But IMO even "kills literally everybody" doesn't really convey the underlying misalignment/AI takeover threat model.

So my current recommendation (weakly held) is probably to go with "causes a national security emergency" or "overthrows the US government" and then accept that you have to do some extra work to actually get them to understand the "AGI --> AI takeover --> lots of people die and we lose control" model.

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T16:15:09.421Z · LW · GW

Valid!

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T16:14:50.134Z · LW · GW

Thanks! Despite the lack of SMART goals, I still feel like this reply gave me a better sense of what your priorities are & how you'll be assessing success/failure.

One failure mode– which I'm sure is already on your radar– is something like: "MIRI ends up producing lots of high-quality stuff but no one really pays attention. Policymakers and national security people are very busy and often only read things that (a) directly relate to their work or (b) are sent to them by someone who they respect."

Another is something like: "MIRI ends up focusing too much on making arguments/points that are convincing to general audiences but fails to understand the cruxes/views of the People Who Matter." (A strawman version of this is something like "MIRI ends up spending a lot of time in the Bay and there's lots of pressure to engage a bunch with the cruxes/views of rationalists, libertarians, e/accs, and AGI company employees. Meanwhile, the kinds of conversations happening among natsec folks & policymakers look very different, and MIRI's materials end up being less relevant/useful to this target audience.")

I'm extremely confident that these are already on your radar, but I figure it might be worth noting that these are two of the failure modes I'm most worried about. (I guess besides the general boring failure mode along the lines of "hiring is hard and doing anything is hard and maybe things just stay slow, and when someone asks what good materials you guys have produced the answer is still 'we're working on it'.")

(Final note: A lot of my questions and thoughts have been critical, but I should note that I appreciate what you're doing & I'm looking forward to following MIRI's work in the space! :D)

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T16:06:36.143Z · LW · GW

Thank you! I still find myself most curious about the "how will MIRI make sure it understands its audience" and "how will MIRI make sure its materials are read by policymakers + natsec people" parts of the puzzle. Feel free to ignore this if we're getting too in the weeds, but I wonder if you can share more details about either of these parts.

There is also an audience-specific component, and to do well on that, we do need to understand our audience better. We are working to recruit beta readers from appropriate audience pools.

There are several approaches here, most of which will not be executed by the comms team directly, we hand off to others

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-05-30T16:01:07.356Z · LW · GW

I'm surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions, etc.

But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, and it would be horrible for Meta and the entire open-source (OS) movement if an OS model were able to cause $1B in damages).

And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)

I think liability also has the "added" problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that they create a moat (only big companies can afford to comply), and argue that it's just not how America ends up regulating things (we don't hold Adobe accountable for someone doing something bad with Photoshop).

To be clear, I don't think "something is politically unpopular" should be a full-stop argument against advocating for it.

But I do think that "liability for AI companies" scores poorly both on "actual usefulness if implemented" and "political popularity/feasibility." I also think the "liability for AI companies" advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities) and ends up avoiding some of the "weirder" points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop.)

I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.

With this in mind, I'm not an expert in liability and admittedly haven't been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing into). I'd be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift incentives of labs. (See also here). 

Stylistic note: I'd prefer replies along the lines of "here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases" rather than replies along the lines of "here is a thing you can read about the general legal/philosophical arguments about how liability is good."

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T15:42:03.558Z · LW · GW

the artifacts we're producing are very big and we want to get them right.

To the extent that this can be shared– What are the artifacts you're most excited about, and what's your rough prediction about when they will be ready?

Moreover, how do you plan to assess the success/failure of your projects? Are there any concrete metrics you're hoping to achieve? What does a "really good outcome" for MIRI's comms team look like by the end of the year, and what does a "we have failed and need to substantially rethink our approach, speed, or personnel" outcome look like?

(I ask partially because one of my main uncertainties right now is how well MIRI will get its materials in front of the policymakers and national security officials you're trying to influence. In the absence of concrete goals/benchmarks/timelines, I could imagine a world where MIRI moves at a relatively slow pace, produces high-quality materials with truthful arguments, but this content isn't getting to the target audience, and the work isn't being informed by the concerns/views of the target audience.)

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T15:34:40.644Z · LW · GW

Got it– thank you! Am I right in thinking that your team intends to influence policymakers and national security officials, though? If so, I'd be curious to learn more about how you plan to get your materials in front of them or ensure that your materials address their core points of concern/doubt.

Put a bit differently– I feel like it would be important for your team to address these questions insofar as your team has the following goals:

The main audience we want to reach is policymakers – the people in a position to enact the sweeping regulation and policy we want – and their staff.

We are hopeful about reaching a subset of policy advisors who have the skill of thinking clearly and carefully about risk, particularly those with experience in national security.

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T01:04:30.672Z · LW · GW

Thank you for this update—I appreciate the clear reasoning. I also personally feel that the AI policy community is overinvested in the "say things that will get you points" strategy and underinvested in the "say true things that help people actually understand the problem" strategy. Specifically, I feel like many US policymakers have heard "be scared of AI because of bioweapons" but have not heard clear arguments about risks from autonomous systems, misalignment, AI takeover, etc. 

A few questions:

  1. To what extent is MIRI's comms team (or technical governance team) going to interact directly with policymakers and national security officials? (I personally suspect you will be more successful if you're having regular conversations with your target audience and taking note of what points they find confusing or unconvincing rather than "thinking from first principles" about what points make a sound argument.)
  2. To what extent is MIRI going to contribute to concrete policy proposals (e.g., helping offices craft legislation or helping agencies craft specific requests)?
  3. To what extent is MIRI going to help flesh out how its policy proposals could be implemented? (e.g., helping iron out the details of what a potential international AI compute governance regime would look like, how it would be implemented, how verification would work, what society would do with the time it buys)
  4. Suppose MIRI has an amazing resource about AI risks. How does MIRI expect to get national security folks and important policymakers to engage with it?

(Tagging @lisathiergart in case some of these questions overlap with the work of the technical governance team.)

Comment by Akash (akash-wasil) on What mistakes has the AI safety movement made? · 2024-05-29T03:32:49.151Z · LW · GW

6 respondents thought AI safety could communicate better with the wider world. The AI safety community do not articulate the arguments for worrying about AI risk well enough, come across as too extreme or too conciliatory, and lean into some memes too much or not enough.

I think this accurately captures a core debate in AI comms/AI policy at the moment. Some groups are worried about folks coming off as too extreme (e.g., by emphasizing AI takeover and loss-of-control risks) and some groups are worried about folks worrying so much about sounding "normal" that they give an inaccurate or incomplete picture of the risks (e.g., by getting everyone worried about AI-generated bioweapons, even if the speaker does not believe that "malicious use from bioweapons" is the most plausible or concerning threat model.) 

My own opinion is that I'm quite worried that some of the "attempts to look normal" have led to misleading/incorrect models of risk. These models of risk (which tend to focus more on malicious use than risks from autonomous systems) do not end up producing reasonable policy efforts.

The tides seem to be changing, though—there have been more efforts to raise awareness about AGI, AGI takeover, risks from autonomous systems, and risks from systems that can produce a decisive strategic advantage. I think these risks are quite important for policymakers to understand, and clear/straightforward explanations of them are rare. 

I also think status incentives are discouraging (some) people from raising awareness about these threat models– people don't want to look silly, dumb, sci-fi, etc. But IMO one of the most important comms/policy challenges will be getting people to take such threat models seriously, and I think there are ways to explain such threat models legitimately. 

Comment by Akash (akash-wasil) on Maybe Anthropic's Long-Term Benefit Trust is powerless · 2024-05-27T20:21:36.958Z · LW · GW

Thanks for looking into this! A few basic questions about the Trust:

1. Do we know if trustees can serve multiple terms? See below for a quoted section from Anthropic's site:

Trustees serve one-year terms and future Trustees will be elected by a vote of the Trustees.

2. Do we know what % of the board is controlled by the trustees, and by when it is expected to be a majority?

The Trust is an independent body of five financially disinterested members with an authority to select and remove a portion of our Board that will grow over time (ultimately, a majority of our Board).

3. Do we know if Paul is still a Trustee, or does his new role at USAISI mean he had to step down?

The initial Trustees are:

Jason Matheny: CEO of the RAND Corporation
Kanika Bahl: CEO & President of Evidence Action
Neil Buddy Shah: CEO of the Clinton Health Access Initiative (Chair)
Paul Christiano: Founder of the Alignment Research Center
Zach Robinson: Interim CEO of Effective Ventures US

Comment by Akash (akash-wasil) on Big Picture AI Safety: Introduction · 2024-05-24T17:07:31.846Z · LW · GW

Which of the institutions would you count as AGI labs? (genuinely curious– usually I don't think about academic labs [relative to like ODA + Meta + Microsoft] but perhaps there are some that I should be counting.)

And yeah, OP funding is a weird metric because there's a spectrum of how closely grantees are tied to OP. It ranges from "I have an independent research group and got 5% of my total funding from OP" all the way to "I get ~all my funding from OP and work in the same office as OP and other OP allies and many of my friends/colleagues are OP etc."

That's why I tried to use the phrase "close allies/grantees", to convey more of this implicit cultural stuff than merely "have you ever received OP $." My strong impression is that the authors of the paper are much more intellectually/ideologically/culturally independent from OP, relative to the list of 17 interviewees presented above. 

Comment by Akash (akash-wasil) on Big Picture AI Safety: Introduction · 2024-05-23T18:27:10.147Z · LW · GW

What do AI safety experts believe about the big picture of AI risk?

I would be careful not to implicitly claim that these 17 people are a "representative sample" of the AI safety community. Or, if you do want to make that claim, I think it's important to say a lot more about how these particular participants were chosen and why you think they are representative.

At first glance, it seems to me like this pool of participants over-represents some worldviews and under-represents others. For example, it seems like the vast majority of the participants work for AGI labs, Open Philanthropy, or close allies/grantees of OP. OP undoubtedly funds a lot of AIS groups, but there are lots of experts who approach AIS from a different set of assumptions and worldviews.

More specifically, I'd say this list of 17 experts over-represents what I might refer to as the "Open Phil + AGI labs + people funded by or close to those entities" cluster of thinkers (who IMO generally are more optimistic than folks at groups like MIRI, Conjecture, CAIS, FLI, etc.) & over-represents people who are primarily focused on technical research (who IMO are generally most optimistic about technical alignment, more likely to believe empirical work is better than conceptual work, and more likely to believe in technical rather than socio-technical approaches.)

To be clear– I still think that work like this is & can be important. Also, there is some representation from people outside of the particular subculture I'm claiming is over-represented.

But I think it is very hard to do a survey that actually meaningfully represents the AI safety community, and I think there are a lot of subjective decisions that go into figuring out who counts as an "expert" in the field. 

Comment by Akash (akash-wasil) on Open Thread Spring 2024 · 2024-05-21T20:02:27.550Z · LW · GW

Great point! (Also oops– I forgot that Irving was formerly OpenAI as well. He worked for DeepMind in recent years, but before that he worked at OpenAI and Google Brain.)

Do we have any evidence that DeepMind or Anthropic definitely do not do non-disparagement agreements? (If so then we can just focus on former OpenAI employees.)

Comment by Akash (akash-wasil) on New voluntary commitments (AI Seoul Summit) · 2024-05-21T19:37:11.464Z · LW · GW

This seems like a solid list. Scaling certainly seems core to the RSP concept.

IMO "red lines, iterative policy updates, and evaluations & accountability" are sort of pointing at the same thing. Roughly something like "we promise not to cross X red line until we have published Y new policy and allowed the public to scrutinize it for Z amount of time."

One interesting thing here is that none of the current RSPs meet this standard. I suppose the closest is Anthropic's, where they say they won't scale to ASL-4 until they publish a new RSP (this would cover "red line" but I don't believe they commit to giving the public a chance to scrutinize it, so it would only partially meet "iterative policy updates" and wouldn't meet "evaluations and accountability.")

They will not suddenly cross any of their red lines before the mitigations are implemented/a new RSP version has been published and given scrutiny, by pointing at specific evaluations procedures and policies

This seems like the meat of an ideal RSP. I don't think it's done by any of the existing voluntary scaling commitments. All of them have this flavor of "our company leadership will determine when the mitigations are sufficient, and we do not commit to telling you what our reasoning is." OpenAI's PF probably comes the closest, IMO (e.g., leadership will evaluate if the mitigations have moved the model from the stuff described in the "critical risk" category to the stuff described in the "high risk" category.)

As long as the voluntary scaling commitments end up having this flavor of "leadership will make a judgment call based on its best reasoning", it feels like the commitments lack most of the "teeth" of the kind of RSP you describe. 

(So back to the original point– I think we could say that something is only an RSP if it has the "we won't cross this red line until we give you a new policy and let you scrutinize it and also tell you how we're going to reason about when our mitigations are sufficient" property, but then none of the existing commitments would qualify as RSPs. If we loosen the definition, then I think we just go back to "these are voluntary commitments that have to do with scaling & how the lab is thinking about risks from scaling.")

Comment by Akash (akash-wasil) on Anthropic: Reflections on our Responsible Scaling Policy · 2024-05-21T17:53:09.331Z · LW · GW

on Earth you don't get sufficient credit for sharing good policies and there's substantial negative EV from misunderstandings and adversarial interpretations, so I guess it's often correct to not share :(

What's the substantial negative EV that would come from misunderstanding or adversarial interpretations? I feel like in this case, the worst case would be something like "the non-compliance reporting policy is actually pretty good, but a few people say mean things about it and say 'see, here's why we need government oversight.'" But this feels pretty minor/trivial IMO.

As an 80/20 of publishing, maybe you could share a policy with an external auditor who would then publish whether they think it's good or have concerns. I would feel better if that happened all the time

This is clever, +1. 

Comment by Akash (akash-wasil) on New voluntary commitments (AI Seoul Summit) · 2024-05-21T17:41:12.179Z · LW · GW

even if you are skeptical of the value of RSPs, I think you should be in favor of a specific name for it so you can distinguish it from other, future voluntary safety policies that you are more supportive of

This is a great point– consider me convinced. Interestingly, it's hard for me to precisely define the things that make something an RSP as opposed to a different type of safety commitment, but there are some patterns in the existing RSP/PF/FSF that do seem to put them in a broader family. (Ex: Strong focus on model evaluations, implicit assumption that AI development should continue until/unless evidence of danger is found, implicit assumption that company executives will decide when safeguards are sufficient).

Comment by Akash (akash-wasil) on Anthropic: Reflections on our Responsible Scaling Policy · 2024-05-21T16:46:46.866Z · LW · GW

That really seems more like a question for governments than for Anthropic

+1. I do want governments to take this question seriously. It seems plausible to me that Anthropic (and other labs) could play an important role in helping governments improve their ability to detect/process information about AI risks, though.

it's not clear why the government would get involved in a matter of voluntary commitments by a private organization

Makes sense. I'm less interested in a reporting system that's like "tell the government that someone is breaking an RSP" and more interested in a reporting system that's like "tell the government if you are worried about an AI-related national security risk, regardless of whether or not this risk is based on a company breaking its voluntary commitments."

My guess is that existing whistleblowing programs are the best bet right now, but it's unclear to me whether they are staffed by people who understand AI risks well enough to know how to interpret/process/escalate such information (assuming the information ought to be escalated).

Comment by Akash (akash-wasil) on New voluntary commitments (AI Seoul Summit) · 2024-05-21T16:42:24.932Z · LW · GW

a pretty specific framework with unique strengths I wouldn't want overlooked.

What are some of the unique strengths of the framework that you think might get overlooked if we go with something more like "voluntary safety commitments" or "voluntary scaling commitments"?

(Ex: It seems plausible to me that you want to keep the word "scaling" in, since there are lots of safety commitments that could plausibly have nothing to do with future models, and "scaling" sort of forces you to think about what's going to happen as models get more powerful.)

Comment by Akash (akash-wasil) on New voluntary commitments (AI Seoul Summit) · 2024-05-21T16:40:25.060Z · LW · GW

You can still have the RSP commitment rule be a foundation for actually effective policies down the line

+1. I do think it's worth noting, though, that RSPs might not be a sensible foundation for effective policies.

One of my colleagues recently mentioned that the voluntary commitments from labs are much weaker than some of the things that the G7 Hiroshima Process has been working on. 

More tangibly, it's quite plausible to me that policymakers who think about AI risks from first principles would produce things that are better and stronger than "codify RSPs." Some thoughts:

  • It's plausible to me that when the RSP concept was first being developed, it was a meaningful improvement on the status quo, but the Overton Window & awareness of AI risk has moved a lot since then.
  • It's plausible to me that RSPs set a useful "floor"– like hey this is the bare minimum.
  • It's plausible to me that RSPs are useful for raising awareness about risk– like hey look, OpenAI and Anthropic are acknowledging that models might soon have dangerous CBRN capabilities. 

But there are a lot of implicit assumptions in the RSP frame like "we need to have empirical evidence of risk before we do anything" (as opposed to an affirmative safety frame), "we just need to make sure we implement the right safeguards once things get dangerous" (as opposed to a frame that recognizes we might not have time to develop such safeguards once we have clear evidence of danger), and "AI development should roughly continue as planned" (as opposed to a frame that considers alternative models, like public-private partnerships). 

More concretely, I would rather see policy based on things like the recent Bengio paper than RSPs. Examples:

Despite evaluations, we cannot consider coming powerful frontier AI systems “safe unless proven unsafe.” With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. By doing so, they would follow best practices for risk management from industries, such as aviation, medical devices, and defense software, in which companies make safety cases

Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.

Sometimes advocates of RSPs say "these are things that are compatible with RSPs", but overall I have not seen RSPs/PFs/FSFs that are nearly this clear about the risks, this clear about the limitations of model evaluations, or this clear about the need for tangible regulations.

I've feared previously (and continue to fear) that there are some motte-and-bailey dynamics at play with RSPs, where proponents of RSPs say privately (and to safety people) that RSPs are meant to have strong commitments and inspire strong regulation, but then in practice the RSPs are very weak and end up conveying an overly rosy picture to policymakers.

Comment by Akash (akash-wasil) on New voluntary commitments (AI Seoul Summit) · 2024-05-21T16:22:45.318Z · LW · GW

I don't think the term is established, except within a small circle of EA/AIS people. The vast majority of policymakers do not know what RSPs are– they do know what "voluntary commitments" are. 

(We might just disagree here, doesn't seem like it's worth a big back-and-forth.)

Comment by Akash (akash-wasil) on Open Thread Spring 2024 · 2024-05-21T12:53:11.486Z · LW · GW

+1. Also curious about Jade Leung (formerly OpenAI)– she's currently the CTO for the UK AI Safety Institute. Also Geoffrey Irving (formerly DeepMind), who is a research director at the UK AISI.