Posts

Advice to junior AI governance researchers 2024-07-08T19:19:07.316Z
Mitigating extreme AI risks amid rapid progress [Linkpost] 2024-05-21T19:59:21.343Z
Akash's Shortform 2024-04-18T15:44:25.096Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
OpenAI's Preparedness Framework: Praise & Recommendations 2024-01-02T16:20:04.249Z
Speaking to Congressional staffers about AI risk 2023-12-04T23:08:52.055Z
Navigating emotions in an uncertain & confusing world 2023-11-20T18:16:09.492Z
International treaty for global compute caps 2023-11-09T18:17:04.952Z
Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost] 2023-11-01T13:28:43.723Z
Winners of AI Alignment Awards Research Contest 2023-07-13T16:14:38.243Z
AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI 2023-05-30T11:52:31.669Z
AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI 2023-05-23T21:47:34.755Z
Eisenhower's Atoms for Peace Speech 2023-05-17T16:10:38.852Z
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control 2023-05-16T15:14:45.921Z
AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models 2023-05-09T15:26:55.978Z
AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks 2023-05-02T18:41:43.144Z
Discussion about AI Safety funding (FB transcript) 2023-04-30T19:05:34.009Z
Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous) 2023-04-25T18:49:29.042Z
DeepMind and Google Brain are merging [Linkpost] 2023-04-20T18:47:23.016Z
AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media 2023-04-18T18:44:35.923Z
Request to AGI organizations: Share your views on pausing AI progress 2023-04-11T17:30:46.707Z
AI Safety Newsletter #1 [CAIS Linkpost] 2023-04-10T20:18:57.485Z
Reliability, Security, and AI risk: Notes from infosec textbook chapter 1 2023-04-07T15:47:16.581Z
New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development 2023-04-05T01:26:51.830Z
[Linkpost] Critiques of Redwood Research 2023-03-31T20:00:09.784Z
What would a compute monitoring plan look like? [Linkpost] 2023-03-26T19:33:46.896Z
The Overton Window widens: Examples of AI risk in the media 2023-03-23T17:10:14.616Z
The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments 2023-03-20T20:44:29.445Z
[Linkpost] Scott Alexander reacts to OpenAI's latest post 2023-03-11T22:24:39.394Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
AI Governance & Strategy: Priorities, talent gaps, & opportunities 2023-03-03T18:09:26.659Z
Fighting without hope 2023-03-01T18:15:05.188Z
Qualities that alignment mentors value in junior researchers 2023-02-14T23:27:40.747Z
4 ways to think about democratizing AI [GovAI Linkpost] 2023-02-13T18:06:41.208Z
How evals might (or might not) prevent catastrophic risks from AI 2023-02-07T20:16:08.253Z
[Linkpost] Google invested $300M in Anthropic in late 2022 2023-02-03T19:13:32.112Z
Many AI governance proposals have a tradeoff between usefulness and feasibility 2023-02-03T18:49:44.431Z
Talk to me about your summer/career plans 2023-01-31T18:29:23.351Z
Advice I found helpful in 2022 2023-01-28T19:48:23.160Z
11 heuristics for choosing (alignment) research projects 2023-01-27T00:36:08.742Z
"Status" can be corrosive; here's how I handle it 2023-01-24T01:25:04.539Z
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution 2023-01-21T16:51:09.586Z
Wentworth and Larsen on buying time 2023-01-09T21:31:24.911Z
[Linkpost] Jan Leike on three kinds of alignment taxes 2023-01-06T23:57:34.788Z
My thoughts on OpenAI's alignment plan 2022-12-30T19:33:15.019Z
An overview of some promising work by junior alignment researchers 2022-12-26T17:23:58.991Z
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic 2022-12-20T21:39:41.866Z
12 career-related questions that may (or may not) be helpful for people interested in alignment research 2022-12-12T22:36:21.936Z
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas 2022-11-25T20:47:09.832Z
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility 2022-11-22T22:19:09.419Z

Comments

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-23T17:44:52.007Z · LW · GW

Thanks! 

(I think "being the kind of agent who survives the selection process" can sometimes be an important epistemic thing to consider, though mostly when thinking about how systems work and what kinds of people/views those systems promote. Agreed that "being informed by many people who Y" is a rather weak one & certainly would not on its own warrant a disclosure.)

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-23T13:19:11.496Z · LW · GW

An elephant in the room (IMO) is that moving forward, OpenAI probably benefits from a world in which the AI safety community does not have much influence. 

There's a fine line between "play nice with others and be more cooperative" and "don't actually advocate for policies that you think would help the world, and only do things that the Big Companies and Their Allies are comfortable with."

Again, I don't think Richard sat in his room and thought "how do I spread a meme that is good for my company." I think he's genuinely saying what he believes and giving advice that he thinks will be useful to the AI safety community and improve society's future.

But I also think that one of the reasons Richard still works at OpenAI is that he's the kind of agent who genuinely believes things that tend to be pretty aligned with OpenAI's interests, and I suspect his perspective is informed by having lots of friends/colleagues at OpenAI.

Someone who works for a tobacco company can still have genuinely useful advice for the community of people concerned about the health effects of smoking. But I still think it's an important epistemic norm that they add (at least) a brief disclaimer acknowledging that they work for a tobacco company. 

(And the case becomes even stronger in the event that they have to get approval from the tobacco company comms team, or they filter out any ideas they have that could get them in trouble with the company. Or perhaps, before writing/publishing a post, they consider the fact that other people have been fired from their company for sharing information that was against company interests, that the CEO attempted to remove a board member under the justification that she published a paper that went against company interests, and that the company has a history of using highly restrictive NDAs to prevent people from saying things that go against company interests.)

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-22T10:29:22.483Z · LW · GW

Separately, do you think "organized opposition" could have ever been avoided? It sounds like you're making two claims:

  • When AI safety folks advocate for specific policies, this gives opponents something to rally around and makes them more likely to organize.
  • There are some examples of specific policies (e.g., restrictions on OS, SB1047) that have contributed to this.

Suppose no one said anything about OS, and also (separately) SB1047 never happened. Presumably, at some point, some groups start advocating for specific policies that go against the e/acc worldview. At that point, it seems like you get the organized resistance.

So I'm curious: What does the Ideal Richard World look like? Does it mean people are just much more selective about which policies to advocate for? Under what circumstances is it OK to advocate for something that will increase the political organization of opposing groups? Are there examples of policies that you think are so important that they're worth the cost (of giving your opposition something to rally around)? To what extent is the deeper crux the fact that you're less optimistic about the policy proposals actually helping?

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-22T10:18:39.502Z · LW · GW

Thanks for this clarification– I understand your claim better now.

Do you have any more examples of evidence that suggests that AI safety caused (or contributed meaningfully to) this shift from "online meme culture" to "organized political force?" This seems like the biggest crux imo.

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-22T00:43:23.294Z · LW · GW

e/acc has coalesced in defense of open-source, partly in response to AI safety attacks on open-source. This may well lead directly to a strongly anti-AI-regulation Trump White House

IMO this overstates the influence of OS stuff on the broader e/acc movement.

My understanding is that the central e/acc philosophy is about tech progress. Something along the lines of "we want to accelerate technological progress and AGI progress as quickly as possible, because we think technology is extremely awesome and will lead to a bunch of awesome+cool outcomes." The support for OS is in service of the ultimate goal of accelerating technological progress.

In a world where AI safety folks didn't say/do anything about OS, I would still expect clashes between e/accs and AI safety folks. AI safety folks generally do not believe that maximally fast/rapid technological progress is good for the world. This would inevitably cause tension between the e/acc worldview and the worldview of many AI safety folks, unless AI safety folks decided never to propose any regulations that could cause us to deviate from the maximally-fast pathways to AGI. This seems quite costly.

(Separately, I agree that "dunking on open-source people" is bad and that people should do less "dunking on X" in general. I don't really see this as an issue with prioritizing short-term wins so much as getting sucked into ingroup vs. outgroup culture wars and losing sight of one's actual goals.)

This may well lead directly to a strongly anti-AI-regulation Trump White House

Similar point here– I think it's extremely likely this would've happened anyways. A community that believes passionately in rapid or maximally-fast AGI progress already has strong motivation to fight AI regulations. 

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-18T23:53:18.197Z · LW · GW

As a datapoint, I think I was likely underestimating the level of adversarialness going on & this thread makes me less likely to lump Lightcone in with other parts of the community.

I have definitely taken actions within the bounds of what seems reasonable that have aimed at getting the EA community to shut down or disappear (and will probably continue to do so).

@habryka are you able to share details/examples RE the actions you've taken to get the EA community to shut down or disappear?

I also personally do straightforwardly think that most of the efforts of the extended EA-Alignment ecosystem are bad, and would give up a large chunk of my resources to reduce their influence on the world

I would also be interested in more of your thoughts on this. (My Habryka sim says something like "the epistemic norms are bad, and many EA groups/individuals are more focused on playing status games. They are spending their effort saying and doing things that they believe will give them influence points, rather than trying to say true and precise things. I think society's chances of getting through this critical period would be higher if we focused on reasoning carefully about these complex domains, making accurate arguments, and helping ourselves & others understand the situation.") Curious if this is roughly accurate or if I'm missing some important bits. Also curious if you're able to expand on this or provide some examples of the things in the category "Many EA people think X action is a good thing to do, but I disagree."

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-18T23:09:06.657Z · LW · GW

I'm not quite sure where we disagree, but if I had to put my finger on it, it's something like "I don't think that people would be put off by Alice going to networking events to try to get a job in housing policy, and I don't think she would trigger any defense mechanisms."

Specific question for you: Would you say that "Alice going to a networking event" (assume she's doing it in socially conventional/appropriate ways) would count as structural power-seeking? And would you discourage her from going?

More generally, there are a lot of things you're labeling as "power-seeking" which feel inaccurate or at least quite unnatural to label as "power-seeking", and I suspect that this will lead to confusion (or at worst, lead to some of the people you want to engage dismissing your valid points).

I think in your frame, Alice going to networking events would be seen as "there are some socially-accepted ways of seeking power" and in my frame this would be seen as "it doesn't really make sense to call this power-seeking, as most people would find it ridiculous/weird to apply the label 'power-seeking' to an action as simple as going to a networking event."

I'm also a bit worried about a motte-and-bailey here. The bold statement is "power-seeking (which I'm kind of defining as anything that increases your influence, regardless of how innocuous or socially accepted it seems) is bad because it triggers defense mechanisms" and the more moderated statement is "there are some specific ways of seeking power that have important social costs, and I think that some/many actors in the community underestimate those costs. Also, there are many strategies for achieving your goals that don't involve seeking power, and I think some/many people in the community are underestimating those."

I agree with the more moderated claims.

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-18T21:31:49.050Z · LW · GW

Influence-seeking activates the same kind of feeling though it's less strong than for "power-seeking."

but if Alice just reasoned from the top down about how to optimize her networking really hard for her career, in a non-socially-skilled way, a friend should pull her aside

+1. I suspect we'd also likely agree that if Alice just stayed in her room all day and only talked to her friends about what ideal housing policy should look like, someone should pull her aside and say "hey, you might want to go to some networking events and see if you can get involved in housing policy, or at least see if there are other things you should be doing to become a better fit for housing policy roles in the future."

In this case, it's not the desire to have influence that is the core problem. The core problem is whether or not Alice is making the right moves to have the kind of influence she wants.

Bringing it back to the post– I think I'd be excited to see you write something more along the lines of "What mistakes do many people in the AIS community make when it comes to influence-seeking?" I suspect this would be more specific and concrete. I think the two suggestions at the end (prioritize legitimacy & prioritize competence) start to get at this.

Otherwise, I feel like the discussion is going to go into less productive directions, where people who already agree with you react like "Yeah, Alice is such a status-seeker! Stick it to her!" and people who disagree with you are like "Wait what? Alice is just trying to network so she can get a job in housing policy– why are you trying to cast that as some shady plot? Should she just stay in her room and write blog posts?" 

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-18T19:53:11.801Z · LW · GW

I agree with many of the points expressed in this post, though something doesn't sit right with me about some of the language/phrasing used.

For example, the terms "power-seeking" and "cooperative" feel somewhat loaded. It's not so much that they're inaccurate (when read in a rather precise and charitable way), but more that they have pretty strong connotations and valences.

Consider:

Alice: I'm going to a networking event tonight; I might meet someone who can help me get a job in housing policy!

Bob: That's a power-seeking move.

Alice: Uh... what?

Bob: Well, you know, if you get a job, then that increases your power. It increases your ability to influence the world.

Alice: I guess me getting a job does technically increase my ability to influence the world, so if that's how you want to define "power-seeking" then you're technically correct, but that's not really the first word that comes to mind here. We usually use the word "power-seeking" to refer to bad people who are overly concerned with acquiring power– usually for personal or selfish gain at the expense of others. And I don't really think that's what I'm doing.

Separately, I'd be curious how you're defining "cooperative" in this context. (Does it mean "not power-seeking" or "strategies that focus more on sharing information with the public and making sure that competent people are in charge regardless of their views on AI safety", or something else?) 

Comment by Akash (akash-wasil) on Towards more cooperative AI safety strategies · 2024-07-18T19:42:17.606Z · LW · GW

Meta: I think these kinds of posts should include some sort of disclaimer acknowledging that you are an OpenAI employee & also mentioning whether or not the post was reviewed by OpenAI staff, OpenAI comms, etc.

I imagine you didn't do this because many people who read this forum are aware of this fact (and it's on your profile– it's not like you're trying to hide it), but I suspect this information could be useful for newcomers who are engaging with this kind of material.

Comment by Akash (akash-wasil) on Neel Nanda's Shortform · 2024-07-12T21:45:09.232Z · LW · GW

Separately, while I think the discussion around "is X net negative" can be useful, I think it ends up implicitly putting the frame on "can X justify that they are not net negative."

I suspect the quality of discourse– and society's chances of having positive futures– would improve if the frame were more commonly something like "what are the best actions for X to take" or "what are reasonable/high-value things that X could be doing."

And I think it's valid to think "X is net positive" while also thinking "I feel disappointed in X because I don't think it's using its power/resources in ways that would produce significantly better outcomes."

IDK what the bar should be for considering X a "responsible actor", but I imagine my personal bar is quite a bit higher than "(barely) net positive in expectation."

P.S. Both of these comments are on the opinionated side, so separately, I just wanted to say thank you Neel for speaking up & for offering your current takes on Anthropic. Strong upvoted!

Comment by Akash (akash-wasil) on Neel Nanda's Shortform · 2024-07-12T21:40:39.736Z · LW · GW

I'm a bit worried about a dynamic where smart technical folks end up feeling like "well, I'm kind of disappointed in Anthropic's comms/policy stuff from what I hear, and I do wish they'd be more transparent, but policy is complicated and I'm not really a policy expert".

To be clear, this is a quite reasonable position for any given technical researcher to have– the problem is that this provides pretty little accountability. In a world where Anthropic was (hypothetically) dishonest, misleading, actively trying to undermine/weaken regulations, or putting its own interests above the interests of the "commons", it seems to me like many technical researchers (even Anthropic staff) would not be aware of this. Or they might get some negative vibes but then slip back into a "well, I'm not a policy person, and policy is complicated" mentality.

I'm not saying there's even necessarily a strong case that Anthropic is trying to sabotage policy efforts (though I am somewhat concerned about some of the rhetoric Anthropic uses, public comments about thinking it's too early to regulate, rumors that they have taken actions to oppose SB 1047, and a lack of any real "positive" signals from their policy team, like EG recommending or developing policy proposals that go beyond voluntary commitments or encouraging people to measure risks).

But I think once upon a time there was some story that if Anthropic defected in major ways, a lot of technical researchers would get concerned and quit/whistleblow. I think Anthropic's current comms strategy, combined with the secrecy around a lot of policy things, combined with a general attitude (whether justified or unjustified) of "policy is complicated and I'm a technical person so I'm just going to defer to Dario/Jack" makes me concerned that safety-concerned people won't be able to hold Anthropic accountable even if it actively sabotages policy stuff.

I'm also not really sure if there's an easy solution to this problem, but I do imagine part of the solution involves technical people (especially at Anthropic) raising questions, asking people like Jack and Dario to explain their takes more, and being more willing to raise public & private discussions about Anthropic's role in the broader policy space.

Comment by Akash (akash-wasil) on An AI Race With China Can Be Better Than Not Racing · 2024-07-03T00:42:28.545Z · LW · GW

A common scheme for a conversation about pausing the development of transformative AI goes like this:

Minor: The first linked post is not about pausing AI development. It mentions various interventions for "buying time" (like evals and outreach) but it's not about an AI pause. (When I hear the phrase "pausing AI development" I think more about the FLI version of this which is like "let's all pause for X months" and less about things like "let's have labs do evals so that they can choose to pause if they see clear evidence of risk".)

At a basic level, we want to estimate how much worse (or, perhaps, better) it would be for the United States to completely cede the race for TAI to the PRC.

My impression is that (most? many?) pause advocates are not talking about completely ceding the race to the PRC. I would guess that if you asked (most? many?) people who describe themselves as "pro-pause", they would say things like "I want to pause to give governments time to catch up and figure out what regulations are needed" or "I want to pause to see if we can develop AGI in a more secure way, such as (but not limited to) something like MAGIC." 

I doubt many of them would say "I would be in favor of a pause if it meant that the US stopped doing AI development and we completely ceded the race to China." I would suspect many of them might say something like "I would be in favor of a pause in which the US sees if China is down to cooperate, but if China is not down to cooperate, then I would be in favor of the US lifting the pause."

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-30T16:43:08.386Z · LW · GW

Recommended reading:  A recent piece argues that the US-China crisis hotline doesn't work & generally raises some concerns about US-China crisis communication.

Some quick thoughts:

  • If the claims in the piece are true, there seem to be some (seemingly tractable) ways of substantially improving US-China crisis communication. 
  • The barriers seem more bureaucratic (understanding how the defense world works and getting specific agencies/people to do specific things) than political (I doubt this is something that requires Congress to pass new legislation to improve).
  • In general, I feel like "how do we improve our communication infrastructure during AI-related crises" is an important and underexplored area of AI policy. This isn't just true for US-China communication but also for "lab-government communication", "whistleblower-government communication", and "junior AI staffer-senior national security advisor" communication. 
    • Example: Suppose an eval goes off that suggests that an AI-related emergency might be imminent. How do we make sure this information swiftly gets to relevant people? To what extent do UKAISI and USAISI folks (or lab whistleblowers) have access to senior national security folks who would actually be able to respond in a quick or effective way?
  • I think IAPS' CDDC paper is a useful contribution here. I will soon be releasing a few papers in this broad space, with a focus on interventions that can improve emergency detection + emergency response.
  • One benefit of workshops/conferences/Track 2 dialogues might simply be that you get relevant people to meet each other, share contact information, build trust/positive vibes, and be more likely to reach out in the event of an emergency scenario.
  • Establishing things like the AI Safety and Security Board might also be useful for similar reasons. I think this has gotten a fair amount of criticism for being too industry-focused, and some of that is justified. Nonetheless, I think interventions along the lines of "make sure the people who might see the first signs of extreme risk have super clear ways of advising/contacting government officials" seem great.

Comment by Akash (akash-wasil) on Fabien's Shortform · 2024-06-30T16:28:20.506Z · LW · GW

Thanks! In general, I like these bite-sized summaries of various things you're reading. Seems like a win for the commons, and I hope more folks engaging with governance/policy stuff do things like this.

Comment by Akash (akash-wasil) on ryan_greenblatt's Shortform · 2024-06-28T21:02:03.310Z · LW · GW

Do you feel like there are any benefits or drawbacks specifically tied to the fact that you’re doing this work as a contractor? (compared to a world where you were not a contractor but Anthropic just gave you model access to run these particular experiments and let Evan/Carson review your docs)

Comment by Akash (akash-wasil) on Buck's Shortform · 2024-06-25T00:48:38.126Z · LW · GW

@Richard_Ngo do you have any alternative approaches in mind that are less susceptible to regulatory capture? At first glance, I think this broad argument can be applied to any situation where the government regulates anything. (There's always some risk that R focuses on the wrong things or R experiences corporate/governmental pressure to push things through).

I do agree that the broader or more flexible the regulatory regime is, the more susceptible it might be to regulatory capture. (But again, this feels like it doesn't really have much to do with safety cases– this is just a question of whether we want flexible or fixed/inflexible regulations in general.)

Comment by Akash (akash-wasil) on Buck's Shortform · 2024-06-25T00:42:41.519Z · LW · GW

Here's how I understand your argument:

  1. Some people are advocating for safety cases– the idea that companies should be required to show that risks drop below acceptable levels.
  2. This approach is used in safety engineering fields.
  3. But AI is different from the safety engineering fields. For example, in AI we have adversarial risks.
  4. Therefore we shouldn't support safety cases.

I think this misunderstands the case for safety cases, or at least only argues against one particular justification for safety cases.

Here's how I think about safety cases (or really any approach in which a company needs to present evidence that their practices keep risks below acceptable levels):

  1. AI systems pose major risks. A lot of risks stem from race dynamics and competitive pressures.
  2. If companies were required to demonstrate that they kept risks below acceptable levels, this would incentivize a lot more safety research and curb some of the dangerous properties of race dynamics.
  3. Other fields also have similar setups, and we should try to learn from them when relevant. Of course, AI development will also have some unique properties so we'll have to adapt the methods accordingly.

I'd be curious to hear more about why you think safety cases fail to work when risks are adversarial (at first glance, it doesn't seem like it should be too difficult to adapt the high-level safety case approach).

I'm also curious if you have any alternatives that you prefer. I currently endorse the claim "safety cases are better than status quo" but I'm open to the idea that maybe "Alternative approach X is better than both safety cases and status quo."

Comment by Akash (akash-wasil) on AI takeoff and nuclear war · 2024-06-21T20:24:51.018Z · LW · GW

Interesting analysis! I think it'll be useful for more folks to think about the nuclear/geopolitical implications of AGI development, especially in worlds where governments are paying more attention & one or more nuclear powers experience a "wakeup" or "sudden increase in situational awareness."

Some specific thoughts:

Of these two risks, it is likely simpler to work to reduce the risk of failure to navigate. 

Can you say more about why you believe this? At first glance, it seems to me like "fundamental instability" is much more tied to how AI development goes, so I would've expected it to be more tractable [among LW users]. Whereas "failure to navigate" seems further outside our spheres of influence– it seems to me like there would be a lot of intelligence agency analysts, defense people, and national security advisors who are contributing to discussions about whether or not to go to war. Seems plausible that maybe a well-written analysis from folks in the AI safety community could be useful, but my impression is that it would be pretty hard to make a splash here since (a) things would be so fast-moving, (b) a lot of the valuable information about the geopolitical scene will be held by people working in government and people with security clearances, making it harder for outside people to reason about things, and (c) even conditional on valuable analysis, the stakeholders who will be deferred to are (mostly) going to be natsec/defense stakeholders.

3) In the aftermath of a nuclear war, surviving powers would be more fearful and hostile.

4) There would be greater incentives to rush for powerful AI, and less effort expended on going carefully or considering pausing.

There are lots of common-sense reasons why nuclear war is bad. That said, I'd be curious to learn more about how confident you are in these statements. In a post-catastrophe world, it seems quite plausible to me that the rebounding civilizations would fear existential catastrophes and dangerous technologies and try hard to avoid technology-induced catastrophes. I also just think such scenarios are very hard to reason about, such that there's a lot of uncertainty around whether AI progress would be faster (because civs are fearful of each other and hostile) or slower (because civs are fearful of technology-induced catastrophes and generally have more of a safety/security mindset).

Comment by Akash (akash-wasil) on Richard Ngo's Shortform · 2024-06-21T13:14:51.318Z · LW · GW

I think part of the disappointment is the lack of communication regarding violating the commitment or violating the expectations of a non-trivial fraction of the community.

If someone makes a promise to you or even sets an expectation for you in a softer way, there is of course always some chance that they will break the promise or violate the expectation.

But if they violate the commitment or the expectation, and they care about you as a stakeholder, I think there's a reasonable expectation that they should have to justify that decision.

If they break the promise or violate the soft expectation, and then they say basically nothing (or they say "well I never technically made a promise– there was no contract!"), then I think you have the right to be upset with them not only for violating your expectation but also for essentially trying to gaslight you afterward.

I think a Responsible Lab would have issued some sort of statement along the lines of "hey, we're hearing that some folks thought we had made commitments to not advance the frontier and some of our employees were saying this to safety-focused members of the AI community. We're sorry about this miscommunication, and here are some steps we'll take to avoid such miscommunications in the future." or "We did in fact intend to follow through on that, but here are some of the extreme events or external circumstances that caused us to change our mind."

In the absence of such a statement, it makes it seem like Anthropic does not really care about honoring its commitments/expectations or generally defending its reasoning on important safety-relevant issues. I think it's reasonable that this disposition harms Anthropic's reputation among safety-conscious people and makes safety-conscious people less excited about voluntary commitments from labs in general.

Comment by Akash (akash-wasil) on AI #69: Nice · 2024-06-20T23:09:02.392Z · LW · GW

Ah, gotcha– are there more details about which board seats the LTBT will control//how board seats will be added? According to GPT, the current board members are Dario, Daniela, Yasmin, and Jay. (Presumably Dario and Daniela's seats will remain untouched and will not be the ones in LTBT control.)

Also gotcha– removed the claim that he was replaced by Jay.

Comment by Akash (akash-wasil) on AI #69: Nice · 2024-06-20T20:39:20.975Z · LW · GW

Luke Muehlhauser explains he resigned from the Anthropic board because there was a conflict with his work at Open Philanthropy and its policy advocacy. I do not see that as a conflict. If being a board member at Anthropic was a conflict with advocating for strong regulations or considered by them a ‘bad look,’ then that potentially says something is very wrong at Anthropic as well. Yes, there is the ‘behind the scenes’ story but one not behind the scenes must be skeptical. 

I also do not really understand why the COI was considered so strong or unmanageable that Luke felt he needed to resign. Note also that my impression is that OP funds very few "applied policy" efforts, and that the ones they do fund are mostly focused on things that Anthropic supports (e.g., science of evals, funding for NIST). I also don't get the vibe that Luke leaving the board coincides with any significant changes to OP's approach to governance or policy.

More than that, I think Luke plausibly… chose the wrong role? I realize most board members are very part time, but I think the board of Anthropic was the more important assignment.

I agree with this. (I might be especially inclined to believe this because I haven't been particularly impressed with the output from OP's governance team, but I think even if I believed it were doing a fairly good job under Luke's leadership, I would still think that the Anthropic board role was more valuable. On top of that, it would've been relatively easy for OP to replace Luke with someone who has a very similar set of beliefs.)

Comment by Akash (akash-wasil) on Fabien's Shortform · 2024-06-19T18:45:06.842Z · LW · GW

Makes sense— I think the thing I’m trying to point at is “what do you think better safety research actually looks like?”

I suspect there’s some risk that, absent some sort of pre-registration, your definition of “good safety research” ends up gradually drifting to be more compatible with the kind of research Anthropic does.

Of course, not all of this will be a bad thing— hopefully you will genuinely learn some new things that change your opinion of what “good research” is.

But the nice thing about pre-registration is that you can be more confident that belief changes are stemming from a deliberate or at least self-aware process, as opposed to some sort of “maybe I thought this all along//I didn’t really know what I believed before I joined” vibe. (And perhaps this is sufficiently covered in your doc.)

Comment by Akash (akash-wasil) on Fabien's Shortform · 2024-06-17T01:19:38.240Z · LW · GW

Congrats on the new role! I appreciate you sharing this here.

If you're able to share more, I'd be curious to learn more about your uncertainties about the transition. Based on your current understanding, what are the main benefits you're hoping to get at Anthropic? In February/March, what are the key areas you'll be reflecting on when you decide whether to stay at Anthropic or come back to Redwood?

Obviously, your February/March write-up will not necessarily conform to these "pre-registered" considerations. But nonetheless, I think pre-registering some considerations or uncertainties in advance could be a useful exercise (and I would certainly find it interesting!)

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T17:36:59.681Z · LW · GW

Don’t have time to respond in detail but a few quick clarifications/responses:

— I expect policymakers to have the most relevant/important questions about policy and to be the target audience most relevant for enacting policies. Not solving technical alignment. (Though I do suspect that by MIRI’s lights, getting policymakers to understand alignment issues would be more likely to result in alignment progress than having more conversations with people in the technical alignment space.)

— There are lots of groups focused on comms/governance. MIRI is unique only insofar as it started off as a “technical research org” and has recently pivoted more toward comms/governance.

— I do agree that MIRI has had relatively low output for a group of its size/resources/intellectual caliber. I would love to see more output from MIRI in general. Insofar as it is constrained, I think they should be prioritizing "curious policy newcomers" over people like Matthew and Alex.

— Minor but I don't think MIRI is getting "outargued" by those individuals and I think that frame is a bit too zero-sum.

— Controlling for overall level of output, I suspect I'm more excited than you about MIRI spending less time on LW and more time on comms/policy work with policy communities (EG Malo contributing to the Schumer insight forums, MIRI responding to government RFCs).

— My guess is we both agree that MIRI could be doing more on both fronts and just generally having higher output. My impression is they are working on this and have been focusing on hiring; I think if their output stayed relatively the same 3-6 months from now I would be fairly disappointed.

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T13:49:02.633Z · LW · GW

I think if MIRI engages with “curious newcomers” those newcomers will have their own questions/confusions/objections and engaging with those will improve general understanding.

Based on my experience so far, I don’t expect their questions/confusions/objections to overlap a lot with the questions/confusions/objections that tech-oriented active LW users have.

I also think it’s not accurate to say that MIRI tends to ignore its strongest critics; there’s perhaps more public writing/dialogues between MIRI and its critics than for pretty much any other organization in the space.

My claim is not that MIRI should ignore its critics, but rather that it should focus on replying to criticisms or confusions from "curious and important newcomers". My fear is that MIRI might engage too much with criticisms from LW users and other ingroup members and not focus enough on engaging with policy folks, whose cruxes and opinions often differ substantially from those of EG the median LW commentator.

Comment by Akash (akash-wasil) on MIRI's June 2024 Newsletter · 2024-06-16T13:24:38.108Z · LW · GW

Offering a quick two cents: I think MIRI's priority should be to engage with "curious and important newcomers" (e.g., policymakers and national security people who do not yet have strong cached views on AI/AIS). If there's extra capacity and interest, I think engaging with informed skeptics is also useful (EG big fan of the MIRI dialogues), but on the margin I don't suspect it will be as useful as the discussions with "curious and important newcomers."

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-14T16:00:43.410Z · LW · GW

@Ryan Kidd @Lee Sharkey I suspect you'll have useful recommendations here.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-14T15:55:25.855Z · LW · GW

Recommended readings for people interested in evals work?

Someone recently asked: "Suppose someone wants to get into evals work. Is there a good reading list to send to them?" I spent ~5 minutes and put this list together. I'd be interested if people have additional suggestions or recommendations:

I would send them:

I would also encourage them to read more on the "macrostrategy" of evals. Like, I suspect a lot of value will come from people who are able to understand the broader theory of change of evals and identify when we're "rowing" in bad directions. Some examples here might be:

Comment by Akash (akash-wasil) on AI catastrophes and rogue deployments · 2024-06-14T15:37:07.512Z · LW · GW

I think that rogue internal deployment might be a bigger problem than rogue external deployment, mostly because I’m pretty bullish on simple interventions to prevent weight exfiltration.

Can you say more about this? Unless I'm misunderstanding it, it seems like this hot take goes against the current "community consensus" which is something like "on the default AGI development trajectory, it's extremely unlikely that labs will be able to secure weights from China."

Would you say you're simply more bullish about upload limits than others? Or that you think the mainstream security people just haven't thought about some of the ways that securing weights might be easier than securing other things that society struggles to protect from state actors?

Comment by Akash (akash-wasil) on Access to powerful AI might make computer security radically easier · 2024-06-08T22:14:18.805Z · LW · GW

I think this is an interesting line of inquiry and the specific strategies expressed are helpful.

One thing I'd find helpful is a description of the kind of AI system that you think would be necessary to get us to state-proof security. 

I have a feeling the classic MIRI-style "either your system is too dumb to achieve the goal or your system is so smart that you can't trust it anymore" argument is important here. The post essentially assumes that we have a powerful model that can do impressive things like "accurately identify suspicious actions" but is also trusted enough to be widely internally deployed. This seems fine for a brainstorming exercise (and I do think such brainstorming exercises should exist).

But for future posts like this, I think it would be valuable to have a ~1-paragraph description of the AI system that you have in mind. Perhaps noting what its general capabilities and security-relevant capabilities are. I imagine this would help readers evaluate whether or not they expect to get a "Goldilocks system" (smart enough to do useful things but not so smart that internally deploying the system would be dangerous, even with whatever SOTA control procedures are applied).

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-07T23:33:53.558Z · LW · GW

@Peter Barnett @Rob Bensinger @habryka @Zvi @davekasten @Peter Wildeford you come to mind as people who might be interested. 

See also the Wikipedia page about the report (but IMO reading sections of the actual report is worth it).

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-07T23:29:17.922Z · LW · GW

I've started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.

I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.

Comment by Akash (akash-wasil) on Response to Aschenbrenner's "Situational Awareness" · 2024-06-07T23:07:45.382Z · LW · GW

when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.

I would be interested in reading more about the methods that could be used to prohibit the proliferation of this technology (you can assume a "wake-up" from the USG). 

I think one of the biggest fears would be that any sort of international alliance would not have perfect/robust detection capabilities, so you're always risking the fact that someone might be running a rogue AGI project.

Also, separately, there's the issue of "at some point, doesn't it become so trivially easy to develop AGI that we still need the International Community Good Guys to develop AGI [or do something else] that gets us out of the acute risk period?" When you say "prohibit this technology", do you mean "prohibit this technology from being developed outside of the International Community Good Guys Cluster" or do you mean "prohibit this technology in its entirety?" 

Comment by Akash (akash-wasil) on AI #67: Brief Strange Trip · 2024-06-07T22:08:52.607Z · LW · GW

What is up with Anthropic’s public communications?

Once again this week, we saw Anthropic’s public communications lead come out warning about overregulation, in ways I expect to help move the Overton window away from the things that are likely going to become necessary.

Note also that Anthropic recently joined TechNet, an industry advocacy group that is generally considered "anti-regulation" and specifically opposes SB 1047.

I think a responsible AGI lab would be taking a much stronger role in pushing the Overton Window and pushing for strong policies. At the very least, I would hope that the responsible AGI lab has comms that clearly signal dangers from race dynamics, dangers from superintelligence, and the need for the government to be prepared to intervene swiftly in the event of emergencies. 

This is not what I see from Anthropic. I am disappointed in Anthropic. If Anthropic wants me to consider it a "responsible AGI lab", I will need an explanation of why Anthropic has stayed relatively silent, why it is joining groups that oppose SB 1047, and why its policy team seems to have advocated for ~nothing beyond voluntary commitments and optional model evaluations.

(I will note that I thought Dario's Senate Testimony last year included some reasonable things. Although the focus was on misuse threats, he mentions that we may no longer have the ability to control models and calls for legislation that would require that models pass certain standards before deployment).

Comment by Akash (akash-wasil) on Response to Aschenbrenner's "Situational Awareness" · 2024-06-07T19:12:42.355Z · LW · GW

The field is not ready, and it's not going to suddenly become ready tomorrow. We need urgent and decisive action, but to indefinitely globally halt progress toward this technology that threatens our lives and our children's lives, not to accelerate ourselves straight off a cliff.

I think most advocacy around international coordination (that I've seen, at least) has this sort of vibe to it. The claim is "unless we can make this work, everyone will die."

I think this is an important point to be raising– and in particular I think that efforts to raise awareness about misalignment + loss of control failure modes would be very useful. Many policymakers have only or primarily heard about misuse risks and CBRN threats, and the "policymaker prior" is usually to think "if there is a dangerous tech, the most important thing to do is to make sure the US gets it first."

But in addition to this, I'd like to see more "international coordination advocates" come up with concrete proposals for what international coordination would actually look like. If the USG "wakes up", I think we will very quickly see that a lot of policymakers + natsec folks will be willing to entertain ambitious proposals.

By default, I expect a lot of people will agree that international coordination in principle would be safer, but they will fear that in practice it is not going to work. As a rough analogy, I don't think most serious natsec people were like "yes, of course the thing we should do is enter into an arms race with the Soviet Union. This is the safest thing for humanity."

Rather, I think it was much more a vibe of "it would be ideal if we could all avoid an arms race, but there's no way we can trust the Soviets to follow through on this." (In addition to stuff that's more vibesy and less rational than this, but I do think insofar as logic and explicit reasoning were influential, this was likely one of the core cruxes.)

In my opinion, one of the most important products for "international coordination advocates" to produce is some sort of concrete plan for The International Project. And importantly, it would need to somehow find institutional designs and governance mechanisms that would appeal to both the US and China. Answering questions like "how do the international institutions work", "who runs them", "how are they financed", and "what happens if the US and China disagree" will be essential here.

The Baruch Plan and the Acheson-Lilienthal Report (see full report here) might be useful sources of inspiration.

P.S. I might personally spend some time on this and find others who might be interested. Feel free to reach out if you're interested and feel like you have the skillset for this kind of thing.

Comment by Akash (akash-wasil) on Zach Stein-Perlman's Shortform · 2024-06-07T01:29:23.840Z · LW · GW

@Ebenezer Dukakis I would be even more excited about a "how and why" post for internationalizing AGI development and spelling out what kinds of international institutions could build + govern AGI.

Comment by Akash (akash-wasil) on SB 1047 Is Weakened · 2024-06-06T16:42:27.599Z · LW · GW

To what extent do you think the $100M threshold will weaken the bill "in practice?" I feel like "severely weakened" might overstate the amount of weakenedness. I would probably say "mildly weakened."

I think the logic along the lines of "the frontier models are going to be the ones where the dangerous capabilities are discovered first, so maybe it seems fine (for now) to exclude non-frontier models" makes some amount of sense.

In the long-run, this approach fails because you might be able to hit dangerous capabilities with <$100M. But in the short-run, it feels like the bill covers the most relevant actors (Microsoft, Meta, Google, OpenAI, Anthropic).

Maybe I always thought the point of the bill was to cover frontier AI systems (which are still covered) as opposed to any systems that could have hazardous capabilities, so I see the $100M threshold as more of a "compromise consistent with the spirit of the bill" as opposed to a "substantial weakening of the bill." What do you think?

See also:

Comment by Akash (akash-wasil) on Thomas Kwa's Shortform · 2024-06-05T15:54:01.038Z · LW · GW

*Quickly checks my ratio*

"Phew, I've survived the Kwa Purge"

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T22:28:12.241Z · LW · GW

@Bogdan, can you spell out a vision for a stably multipolar world with the above assumptions satisfied?

IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely.

@Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it's like "hey here's a claim, also read more here" as opposed to just links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T19:06:31.780Z · LW · GW

the probable increase in risks of centralization might make it not worth it

Can you say more about why the risk of centralization differs meaningfully between the three worlds?

IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...

Then you are very likely (in the absence of coordination) to result in centralization no matter what. It's just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) are the ones wielding the superintelligence.

(If anything, it seems like the "international coalition" approach seems less likely to lead to centralization than the other two approaches, since you're more likely to get post-AGI coordination.)

especially if you don't use AI automation (using the current paradigm, probably) to push those forward.

In my vision, the national or international project would be investing into "superalignment"-style approaches, they would just (hopefully) have enough time/resources to be investing into other approaches as well.

I typically assume we don't get "infinite time"– i.e., even the international coalition is racing against "the clock" (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can't be prevented, or the amount of time we have until a separate existential catastrophe occurs). So I think it would be unwise for the international coalition to completely abandon DL/superalignment, even if one of the big hopes is that a safer paradigm would be discovered in time.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-04T17:20:39.019Z · LW · GW

My rough ranking of different ways superintelligence could be developed:

  1. Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
  2. Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or "USG + Western allies" project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest into alternative paradigms ("provably safe" systems, human intelligence enhancement, etc.).
  3. Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.

My own thought is that we should be advocating for option #3 (international coordination) unless/until there is enough evidence to suggest that it's actually not feasible, and then we should settle for option #2. I'm not yet convinced by people who say we have to settle for option #2 just because EG climate treaties have not gone well or international cooperation is generally difficult.

But I also think people advocating #3 should be aware that there are some worlds in which international cooperation will not be feasible, and we should be prepared to do #2 if it's quite clear that the US and China are unwilling to cooperate on AGI development. (And again, I don't think we have that evidence yet– I think there's a lot of uncertainty here.)

Comment by Akash (akash-wasil) on Prometheus's Shortform · 2024-06-04T02:49:51.757Z · LW · GW

Thanks for sharing your experience here. 

One small thought is that things end up feeling extremely neglected once you index on particular subquestions. Like, on a high-level, it is indeed the case that AI safety has gotten more mainstream.

But when you zoom in, there are a lot of very important topics that have <5 people seriously working on them. I work in AI policy, so I'm more familiar with the policy/governance ones, but I imagine this is also true on the technical side (also, maybe consider swapping to governance/policy!).

Also, especially in hype waves, I think a lot of people end up just working on the popular thing. If you're willing to deviate from the popular thing, you can often find important parts of the problem that nearly no one is addressing.

Comment by Akash (akash-wasil) on Seth Herd's Shortform · 2024-06-03T13:32:30.177Z · LW · GW

Second, our different takes will tend to make a lot of our communication efforts cancel each other out. If alignment is very hard, we must Shut It Down or likely die. If it's less difficult, we should primarily work hard on alignment.

I don't think this is (fully) accurate. One could have a high P(doom) but still think that the current AGI development paradigm is best-suited to obtain good outcomes & that government involvement would make things worse in expectation. On the flipside, one could have a low/moderate P(doom) but think that the safest way to get to AGI involves government intervention that ends race dynamics & that government involvement would make P(doom) even lower.

Absolute P(doom) is one factor that might affect one's willingness to advocate for strong government involvement, but IMO it's only one of many factors, and LW folks sometimes tend to make it seem like it's the main/primary/only factor.

Of course, if a given organization says they're supporting X because of their P(doom), I agree that they should provide evidence for their P(doom).

My claim is simply that we shouldn't assume that "low P(doom) means govt intervention bad and high P(doom) means govt intervention good". 

One's views should be affected by a lot of other factors, such as "how bad do you think race dynamics are", "to what extent do you think industry players are able and willing to be cautious", "to what extent do you think governments will end up understanding and caring about alignment", and "to what extent do you think governments would have stronger safety cultures around intelligence enhancement than industry players."

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-06-02T01:07:19.064Z · LW · GW

I found this answer helpful and persuasive– thank you!

Comment by Akash (akash-wasil) on We might be dropping the ball on Autonomous Replication and Adaptation. · 2024-05-31T15:14:47.400Z · LW · GW

Potentially unpopular take, but if you have the skillset to do so, I'd rather you just come up with simple/clear explanations for why ARA is dangerous, what implications this has for AI policy, present these ideas to policymakers, and iterate on your explanations as you start to see why people are confused.

Note also that in the US, the NTIA has been tasked with making recommendations about open-weight models. The deadline for official submissions has ended but I'm pretty confident that if you had something you wanted them to know, you could just email it to them and they'd take a look. My impression is that they're broadly aware of extreme risks from certain kinds of open-sourcing but might benefit from (a) clearer explanations of ARA threat models and (b) specific suggestions for what needs to be done.

Comment by Akash (akash-wasil) on We might be dropping the ball on Autonomous Replication and Adaptation. · 2024-05-31T15:10:01.711Z · LW · GW

Why do you think we are dropping the ball on ARA?

I think many members of the policy community feel like ARA is "weird" and therefore don't want to bring it up. It's much tamer to talk about CBRN threats and bioweapons. It also requires less knowledge and general competence– explaining ARA and autonomous systems risks is difficult, you get more questions, you're more likely to explain something poorly, etc.

Historically, there was also a fair amount of gatekeeping, where some of the experienced policy people were explicitly discouraging people from being explicit about AGI threat models (this still happens to some degree, but I think the effect is much weaker than it was a year ago.)

With all this in mind, I currently think raising awareness about ARA threat models and AI R&D threat models is one of the most important things for AI comms/policy efforts to get right.

In the status quo, even if the evals go off, I don't think we have laid the intellectual foundation required for policymakers to understand why the evals are dangerous. "Oh interesting– an AI can make copies of itself? A little weird but I guess we make copies of files all the time, shrug." or "Oh wow– AI can help with R&D? That's awesome– seems very exciting for innovation."

I do think there's a potential to lay the intellectual foundation before it's too late, and I think many groups are starting to be more direct/explicit about the "weirder" threat models. Also, I think national security folks have more of a "take things seriously and worry about things even if there isn't clear empirical evidence yet" mentality than ML people. And I think typical policymakers fall somewhere in between. 

Comment by Akash (akash-wasil) on Non-Disparagement Canaries for OpenAI · 2024-05-31T02:31:11.874Z · LW · GW

Minor note: Paul is at the US AI Safety Institute, while Jade & Geoffrey are at the UK AI Safety Institute. 

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T21:35:00.502Z · LW · GW

@habryka I think you're making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether "an average reasonably smart person out in society" would see the difference as meaningful (IMO they would not). 

(My guess is you interpreted "reasonable people" to mean like "people who are really into reasoning about the world and trying to figure out the truth" and Kaj interpreted reasonable people to mean like "an average person." Kaj should feel free to correct me if I'm wrong.)

Comment by Akash (akash-wasil) on MIRI 2024 Communications Strategy · 2024-05-30T21:30:03.138Z · LW · GW

My two cents RE particular phrasing:

When talking to US policymakers, I don't think there's a big difference between "causes a national security crisis" and "kills literally everyone." Worth noting that even though many in the AIS community see a big difference between "99% of people die but civilization restarts" vs. "100% of people die", IMO this distinction does not matter to most policymakers (or at least matters way less to them).

Of course, in addition to conveying "this is a big deal" you need to convey the underlying threat model. There are lots of ways to interpret "AI causes a national security emergency" (e.g., China, military conflict). "Kills literally everyone" probably leads people to envision a narrower set of worlds.

But IMO even "kills literally everybody" doesn't really convey the underlying misalignment/AI takeover threat model.

So my current recommendation (weakly held) is probably to go with "causes a national security emergency" or "overthrows the US government" and then accept that you have to do some extra work to actually get them to understand the "AGI --> AI takeover --> lots of people die and we lose control" model.