Posts

Verification methods for international AI agreements 2024-08-31T14:58:10.986Z
Advice to junior AI governance researchers 2024-07-08T19:19:07.316Z
Mitigating extreme AI risks amid rapid progress [Linkpost] 2024-05-21T19:59:21.343Z
Akash's Shortform 2024-04-18T15:44:25.096Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
OpenAI's Preparedness Framework: Praise & Recommendations 2024-01-02T16:20:04.249Z
Speaking to Congressional staffers about AI risk 2023-12-04T23:08:52.055Z
Navigating emotions in an uncertain & confusing world 2023-11-20T18:16:09.492Z
Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost] 2023-11-01T13:28:43.723Z
Winners of AI Alignment Awards Research Contest 2023-07-13T16:14:38.243Z
AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI 2023-05-30T11:52:31.669Z
AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI 2023-05-23T21:47:34.755Z
Eisenhower's Atoms for Peace Speech 2023-05-17T16:10:38.852Z
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control 2023-05-16T15:14:45.921Z
AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models 2023-05-09T15:26:55.978Z
AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks 2023-05-02T18:41:43.144Z
Discussion about AI Safety funding (FB transcript) 2023-04-30T19:05:34.009Z
Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous) 2023-04-25T18:49:29.042Z
DeepMind and Google Brain are merging [Linkpost] 2023-04-20T18:47:23.016Z
AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media 2023-04-18T18:44:35.923Z
Request to AGI organizations: Share your views on pausing AI progress 2023-04-11T17:30:46.707Z
AI Safety Newsletter #1 [CAIS Linkpost] 2023-04-10T20:18:57.485Z
Reliability, Security, and AI risk: Notes from infosec textbook chapter 1 2023-04-07T15:47:16.581Z
New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development 2023-04-05T01:26:51.830Z
[Linkpost] Critiques of Redwood Research 2023-03-31T20:00:09.784Z
What would a compute monitoring plan look like? [Linkpost] 2023-03-26T19:33:46.896Z
The Overton Window widens: Examples of AI risk in the media 2023-03-23T17:10:14.616Z
The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments 2023-03-20T20:44:29.445Z
[Linkpost] Scott Alexander reacts to OpenAI's latest post 2023-03-11T22:24:39.394Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
AI Governance & Strategy: Priorities, talent gaps, & opportunities 2023-03-03T18:09:26.659Z
Fighting without hope 2023-03-01T18:15:05.188Z
Qualities that alignment mentors value in junior researchers 2023-02-14T23:27:40.747Z
4 ways to think about democratizing AI [GovAI Linkpost] 2023-02-13T18:06:41.208Z
How evals might (or might not) prevent catastrophic risks from AI 2023-02-07T20:16:08.253Z
[Linkpost] Google invested $300M in Anthropic in late 2022 2023-02-03T19:13:32.112Z
Many AI governance proposals have a tradeoff between usefulness and feasibility 2023-02-03T18:49:44.431Z
Talk to me about your summer/career plans 2023-01-31T18:29:23.351Z
Advice I found helpful in 2022 2023-01-28T19:48:23.160Z
11 heuristics for choosing (alignment) research projects 2023-01-27T00:36:08.742Z
"Status" can be corrosive; here's how I handle it 2023-01-24T01:25:04.539Z
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution 2023-01-21T16:51:09.586Z
Wentworth and Larsen on buying time 2023-01-09T21:31:24.911Z
[Linkpost] Jan Leike on three kinds of alignment taxes 2023-01-06T23:57:34.788Z
My thoughts on OpenAI's alignment plan 2022-12-30T19:33:15.019Z
An overview of some promising work by junior alignment researchers 2022-12-26T17:23:58.991Z
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic 2022-12-20T21:39:41.866Z
12 career-related questions that may (or may not) be helpful for people interested in alignment research 2022-12-12T22:36:21.936Z
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas 2022-11-25T20:47:09.832Z
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility 2022-11-22T22:19:09.419Z

Comments

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-21T16:28:58.488Z · LW · GW

it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project

Can you say more about what has contributed to this update?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-21T16:27:52.500Z · LW · GW

Can you say more about scenarios where you envision a later project happening that has different motivations?

I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:

  • A company might develop AGI (or an AI system that is very good at AI R&D that can get to AGI) before a major zeitgeist change.
  • The longer we wait, the more capable the "most capable model that wasn't secured" is. So we could risk getting into a scenario where people want to pause but, since China and the US both have GPT-(N-1), both sides feel compelled to race forward (whereas this wouldn't have happened if security had kicked off sooner).
Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T20:07:18.721Z · LW · GW

If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T19:58:59.946Z · LW · GW

Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV. You might still be worried about EG concentration of power). 

If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.

This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where alignment is hard (or perhaps "somewhat hard but not intractably hard").

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T17:34:37.927Z · LW · GW

What do you think are the most important factors for determining if it results in them behaving responsibly later? 

For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T17:18:51.077Z · LW · GW

Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the "subsidy model", but they were willing to ask for certain concessions from industry.

Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T17:13:29.705Z · LW · GW

My own impression is that this would be an improvement over the status quo. Main reasons:

  • A lot of my P(doom) comes from race dynamics.
  • Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can't do much to end the race. Their main strategy would be to go to the USG.
  • If the USG runs the Manhattan Project (or there's some sort of soft nationalization in which the government ends up having a much stronger role), it's much easier for the USG to see that misalignment risks are concerning & to do something about it.
  • A national project would be more able to slow down and pursue various kinds of international agreements (the national project has more access to POTUS, DoD, NSC, Congress, etc.)
  • I expect the USG to be stricter on various security standards. It seems more likely to me that the USG would EG demand a lot of security requirements to prevent model weights or algorithmic insights from leaking to China. One of my major concerns is that people will want to pause at GPT-X but they won't feel able to because China stole access to GPT-(X-1) (or maybe even a slightly weaker version of GPT-X).
  • In general, I feel like USG natsec folks are less "move fast and break things" than folks in SF. While I do think some of the AGI companies have tried to be less "move fast and break things" than the average company, I think corporate race dynamics & the general cultural forces have been the dominant factors and undermined a lot of attempts at meaningful corporate governance.

(Caveat that even though I see this as a likely improvement over status quo, this doesn't mean I think this is the best thing to be advocating for.)

(Second caveat that I haven't thought about this particular question very much and I could definitely be wrong & see a lot of reasonable counterarguments.)

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T17:00:11.138Z · LW · GW

@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-11-20T16:58:27.334Z · LW · GW

Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)

Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?

Comment by Akash (akash-wasil) on Bogdan Ionut Cirstea's Shortform · 2024-11-19T23:11:21.852Z · LW · GW

We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier everyone would want to do AI safety, and the question is guiding people to the right AI safety policies

I think we're seeing more interest in AI, but I think interest in "AI in general" and "AI through the lens of great power competition with China" has vastly outpaced interest in "AI safety". (Especially if we're using a narrow definition of AI safety; note that people in DC often use the term "AI safety" to refer to a much broader set of concerns than AGI safety/misalignment concerns.)

I do think there's some truth to the quote (we are seeing more interest in AI and some safety topics), but I think there's still a lot to do to increase the salience of AI safety (and in particular AGI alignment) concerns.

Comment by Akash (akash-wasil) on OpenAI Email Archives (from Musk v. Altman) · 2024-11-16T16:06:19.795Z · LW · GW

A few quotes that stood out to me:

Greg:

I hope for us to enter the field as a neutral group, looking to collaborate widely and shift the dialog towards being about humanity winning rather than any particular group or company. 

Greg and Ilya (to Elon):

The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that avoids this possibility.

Greg and Ilya (to Altman):

But we haven't been able to fully trust your judgements throughout this process, because we don't understand your cost function.

We don't understand why the CEO title is so important to you. Your stated reasons have changed, and it's hard to really understand what's driving it.

Is AGI truly your primary motivation? How does it connect to your political goals? How has your thought process changed over time?

Comment by Akash (akash-wasil) on Lao Mein's Shortform · 2024-11-15T20:43:04.046Z · LW · GW

and recently founded another AI company

Potentially a hot take, but I feel like xAI's contributions to race dynamics (at least thus far) have been relatively trivial. I am usually skeptical of the whole "I need to start an AI company to have a seat at the table", but I do imagine that Elon owning an AI company strengthens his voice. And I think his AI-related comms have mostly been used to (a) raise awareness about AI risk, (b) raise concerns about OpenAI/Altman, and (c) endorse SB1047 [which he did even faster and less ambiguously than Anthropic].

The counterargument here is that maybe if xAI was in 1st place, Elon's positions would shift. I find this plausible, but I also find it plausible that Musk (a) actually cares a lot about AI safety, (b) doesn't trust the other players in the race, and (c) is more likely to use his influence to help policymakers understand AI risk than any of the other lab CEOs.

Comment by Akash (akash-wasil) on Making a conservative case for alignment · 2024-11-15T20:35:31.232Z · LW · GW

I agree with many points here and have been excited about AE Studio's outreach. Quick thoughts on China/international AI governance:

  • I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
  • US foreign policy is dominated primarily by concerns about US interests. Other considerations can matter, but they are not the dominant driving force. My impression is that this is true within both parties (with a few exceptions).
  • I think folks interested in international AI governance should study international security agreements and try to get a better understanding of relevant historical case studies. Lots of stuff to absorb from the Cold War, the Iran Nuclear Deal, US-China relations over the last several decades, etc. (I've been doing this & have found it quite helpful.)
  • Strong Republican leaders can still engage in bilateral/multilateral agreements that serve US interests. Recall that Reagan negotiated arms control agreements with the Soviet Union, and the (first) Trump Administration facilitated the Abraham Accords. Being "tough on China" doesn't mean "there are literally no circumstances in which I would be willing to sign a deal with China." (But there likely does have to be a clear case that the deal serves US interests, has appropriate verification methods, etc.)
Comment by Akash (akash-wasil) on Daniel Kokotajlo's Shortform · 2024-11-12T22:43:38.770Z · LW · GW

Did they have any points that you found especially helpful, surprising, or interesting? Anything you think folks in AI policy might not be thinking enough about?

(Separately, I hope to listen to these at some point & send reactions if I have any.)

Comment by Akash (akash-wasil) on dirk's Shortform · 2024-11-02T21:01:07.354Z · LW · GW

you have to spend several years resume-building before painstakingly convincing people you're worth hiring for paid work

For government roles, I think "years of experience" is definitely an important factor. But I don't think you need to have been specializing for government roles specifically.

Especially for AI policy, there are several programs that are basically like "hey, if you have AI expertise but no background in policy, we want your help." To be clear, these are often still fairly competitive, but I think it's much more about being generally capable/competent and less about having optimized your resume for policy roles. 

Comment by Akash (akash-wasil) on The Compendium, A full argument about extinction risk from AGI · 2024-11-01T14:32:26.844Z · LW · GW

I like the section where you list out specific things you think people should do. (One objection I sometimes hear is something like "I know that [evals/RSPs/if-then plans/misc] are not sufficient, but I just don't really know what else there is to do. It feels like you either have to commit to something tangible that doesn't solve the whole problem or you just get lost in a depressed doom spiral.")

I think your section on suggestions could be stronger by presenting more ambitious/impactful stories of comms/advocacy. I think there's something tricky about a document that has the vibe "this is the most important issue in the world and pretty much everyone else is approaching it the wrong way" and then pivots to "and the right way to approach it is to post on Twitter and talk to your friends." 

My guess is that you prioritized listing things that were relatively low friction and accessible. (And tbc I do think that the world would be in better shape if more people were sharing their views and contributing to the broad discourse.)

But I think when you're talking to high-context AIS people who are willing to devote their entire career to work on AI Safety, they'll be interested in more ambitious/sexy/impactful ways of contributing. 

Put differently: Should I really quit my job at [fancy company or high-status technical safety group] to Tweet about my takes, talk to my family/friends, and maybe make some website? Or are there other paths I could pursue?

As I wrote here, I think we have some of those ambitious/sexy/high-impact role models that could be used to make this pitch stronger, more ambitious, and more inspiring. EG:

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).

For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation." 

There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.

And it's not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I think this is true even if I exclude technical people who are working on evals/if-then plans in govt. Like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

I'd also be curious to hear what your thoughts are on people joining government organizations (like the US AI Safety Institute, UK AI Safety Institute, Horizon Fellowship, etc.) Most of your suggestions seem to involve contributing from outside government, and I'd be curious to hear more about your suggestions for people who are either working in government or open to working in government.

Comment by Akash (akash-wasil) on johnswentworth's Shortform · 2024-11-01T14:01:20.395Z · LW · GW

I think there's something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It's higher status and sexier and there's a default vibe that the best way to understand/improve the world is through rigorous empirical research.

I think this is an incorrect (or at least incomplete) frame, and I think on the margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.

I also think there are memes spreading around that you need to be some savant political mastermind genius to do comms/policy, or else you will be net negative. The more I meet policy people (including successful policy people from outside the AIS bubble), the more I think this narrative was, at best, an incorrect model of the world and, at worst, a take that got amplified in order to prevent people from interfering with the AGI race (e.g., by granting excess status+validity to people/ideas/frames that made it seem crazy/unilateralist/low-status to engage in public outreach, civic discourse, and policymaker engagement).

(Caveat: I don't think the adversarial frame explains everything, and I do think there are lots of people who were genuinely trying to reason about a complex world and just ended up underestimating how much policy interest there would be and/or overestimating the extent to which labs would be able to take useful actions despite the pressures of race dynamics.)

Comment by Akash (akash-wasil) on johnswentworth's Shortform · 2024-11-01T14:01:09.926Z · LW · GW

One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative– these are the only things that labs and governments are currently willing to support.

The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:

  • Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct and which are unconvincing. This is a living document, and your suggestions will shape our arguments.
  • Post your views on AGI risk to social media, explaining why you believe it to be a legitimate problem (or not).
  • Red-team companies’ plans to deal with AI risk, and call them out publicly if they do not have a legible plan. 

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).

For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation." 

There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.

And it's not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I'm even excluding people who are working on evals/if-then plans: like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

Comment by Akash (akash-wasil) on johnswentworth's Shortform · 2024-11-01T13:41:55.284Z · LW · GW

I appreciated their section on AI governance. The "if-then"/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I'm a fan of preparedness efforts– especially on the government level– but I think it's worth engaging with the counterarguments.)

Pasting some content from their piece below.

High-level thesis against current AI governance efforts:

The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risk rather than controlling AI development and preventing it.

Critique of reactive frameworks:

1. The reactive framework reverses the burden of proof from how society typically regulates high-risk technologies and industries.

In most areas of law, we do not wait for harm to occur before implementing safeguards. Banks are prohibited from facilitating money laundering from the moment of incorporation, not after their first offense. Nuclear power plants must demonstrate safety measures before operation, not after a meltdown.

The reactive framework problematically reverses the burden of proof. It assumes AI systems are safe by default and only requires action once risks are detected. One of the core dangers of AI systems is precisely that we do not know what they will do or how powerful they will be before we train them. The if-then framework opts to proceed until problems arise, rather than pausing development and deployment until we can guarantee safety. This implicitly endorses the current race to AGI.

This reversal is exactly what makes the reactive framework preferable for AI companies. 

Critique of waiting for warning shots:

3. The reactive framework incorrectly assumes that an AI “warning shot” will motivate coordination.

Imagine an extreme situation in which an AI disaster serves as a “warning shot” for humanity. This would imply that powerful AI has been developed and that we have months (or less) to develop safety measures or pause further development. After a certain point, an actor with sufficiently advanced AI may be ungovernable, and misaligned AI may be uncontrollable. 

When horrible things happen, people do not suddenly become rational. In the face of an AI disaster, we should expect chaos, adversariality, and fear to be the norm, making coordination very difficult. The useful time to facilitate coordination is before disaster strikes. 

However, the reactive framework assumes that this is essentially how we will build consensus in order to regulate AI. The optimistic case is that we hit a dangerous threshold before a real AI disaster, alerting humanity to the risks. But history shows that it is exactly in such moments that these thresholds are most contested – this shifting of the goalposts is known as the AI Effect and common enough to have its own Wikipedia page. Time and again, AI advancements have been explained away as routine processes, whereas "real AI" is redefined to be some mystical threshold we have not yet reached. Dangerous capabilities are similarly contested as they arise, such as how recent reports of OpenAI's o1 being deceptive have been questioned.

This will become increasingly common as competitors build increasingly powerful capabilities and approach their goal of building AGI. Universally, powerful stakeholders fight for their narrow interests, and for maintaining the status quo, and they often win, even when all of society is going to lose. Big Tobacco didn’t pause cigarette-making when they learned about lung cancer; instead they spread misinformation and hired lobbyists. Big Oil didn’t pause drilling when they learned about climate change; instead they spread misinformation and hired lobbyists. Likewise, now that billions of dollars are pouring into the creation of AGI and superintelligence, we’ve already seen competitors fight tooth and nail to keep building. If problems arise in the future, of course they will fight for their narrow interests, just as industries always do. And as the AI industry gets larger, more entrenched, and more essential over time, this problem will grow rapidly worse. 

Comment by Akash (akash-wasil) on What AI companies should do: Some rough ideas · 2024-10-26T21:34:33.673Z · LW · GW

TY. Some quick reactions below:

  • Agree that the public stuff has immediate effects that could be costly. (Hiding stuff from the public, refraining from discussing important concerns publicly, or developing a reputation for being kinda secretive/sus can also be costly; seems like an overall complex thing to model IMO.)
  • Sharing info with government could increase the chance of a leak, especially if security isn't great. I expect the most relevant info is info that wouldn't be all-that-costly if leaked (e.g., the government doesn't need OpenAI to share its secret sauce/algorithmic secrets. Dangerous capability eval results leaking or capability forecasts leaking seem less costly, except from a "maybe people will respond by demanding more govt oversight" POV.)
  • I think all-in-all I still see the main cost as making safety regulation more likely, but I'm more uncertain now, and this doesn't seem like a particularly important/decision-relevant point. Will edit the OG comment to language that I endorse with more confidence.
Comment by Akash (akash-wasil) on What AI companies should do: Some rough ideas · 2024-10-26T20:22:08.815Z · LW · GW

@Zach Stein-Perlman @ryan_greenblatt feel free to ignore but I'd be curious for one of you to explain your disagree react. Feel free to articulate some of the ways in which you think I might be underestimating the costliness of transparency requirements. 

(My current estimate is that whistleblower mechanisms seem very easy to maintain, reporting requirements for natsec capabilities seem relatively easy insofar as most of the information is stuff you already planned to collect, and even many of the more involved transparency ideas (e.g., interview programs) seem like they could be implemented with pretty minimal time-cost.)

Comment by Akash (akash-wasil) on Lab governance reading list · 2024-10-26T19:56:07.260Z · LW · GW

Yeah, I think there's a useful distinction between two different kinds of "critiques:"

  • Critique #1: I have reviewed the preparedness framework and I think the threshold for "high-risk" in the model autonomy category is too high. Here's an alternative threshold.
  • Critique #2: The entire RSP/PF effort is not going to work because [they're too vague//labs don't want to make them more specific//they're being used for safety-washing//labs will break or weaken the RSPs//race dynamics will force labs to break RSPs//labs cannot be trusted to make or follow RSPs that are sufficiently strong/specific/verifiable]. 

I feel like critique #1 falls more neatly into "this counts as lab governance" whereas IMO critique #2 falls more into "this is a critique of lab governance." In practice the lines blur. For example, I think last year there was a lot more "critique #1" style stuff, and then over time as the list of specific object-level critiques grew, we started to see more support for things in the "critique #2" bucket.

Comment by Akash (akash-wasil) on Lab governance reading list · 2024-10-25T23:40:12.512Z · LW · GW

Perhaps this isn’t in scope, but if I were designing a reading list on “lab governance”, I would try to include at least 1-2 perspectives that highlight the limitations of lab governance, criticisms of focusing too much on lab governance, etc.

Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).

(Perhaps such perspectives get covered in other units, but part of me still feels like it’s pretty important for a lab governance reading list to include some of these more “fundamental” critiques of lab governance. Especially insofar as, broadly speaking, I think a lot of AIS folks were more optimistic about lab governance 1-3 years ago than they are now.)

Comment by Akash (akash-wasil) on Anthropic rewrote its RSP · 2024-10-25T14:58:33.115Z · LW · GW

Henry from SaferAI claims that the new RSP is weaker and vaguer than the old RSP. Do others have thoughts on this claim? (I haven't had time to evaluate yet.)

Main Issue: Shift from precise definitions to vague descriptions.
The primary issue lies in Anthropic's shift away from precisely defined capability thresholds and mitigation measures. The new policy adopts more qualitative descriptions, specifying the capability levels they aim to detect and the objectives of mitigations, but it lacks concrete details on the mitigations and evaluations themselves. This shift significantly reduces transparency and accountability, essentially asking us to accept a "trust us to handle it appropriately" approach rather than providing verifiable commitments and metrics.

More from him:

Example: Changes in capability thresholds.
To illustrate this change, let's look at a capability threshold:

1️⃣ Version 1 (V1): AI Security Level 3 (ASL-3) was defined as "The model shows early signs of autonomous self-replication ability, as defined by a 50% aggregate success rate on the tasks listed in [Appendix on Autonomy Evaluations]."

2️⃣ Version 2 (V2): ASL-3 is now defined as "The ability to either fully automate the work of an entry-level remote-only researcher at Anthropic, or cause dramatic acceleration in the rate of effective scaling" (quantified as an increase of approximately 1000x in a year).

In V2, the thresholds are no longer defined by quantitative benchmarks. Anthropic now states that they will demonstrate that the model's capabilities are below these thresholds when necessary. However, this approach is susceptible to shifting goalposts as capabilities advance.

🔄 Commitment Changes: Dilution of mitigation strategies.
A similar trend is evident in their mitigation strategies. Instead of detailing specific measures, they focus on mitigation objectives, stating they will prove these objectives are met when required. This change alters the nature of their commitments.

💡 Key Point: Committing to robust measures and then diluting them significantly is not how genuine commitments are upheld.
The general direction of these changes is concerning. By allowing more leeway to decide if a model meets thresholds, Anthropic risks prioritizing scaling over safety, especially as competitive pressures intensify.

I was expecting the RSP to become more specific as technology advances and their risk management process matures, not the other way around.

Comment by Akash (akash-wasil) on Drake Thomas's Shortform · 2024-10-23T18:27:46.757Z · LW · GW

@Drake Thomas are you interested in talking about other opportunities that might be better for the world than your current position (and meet other preferences of yours)? Or are you primarily interested in the "is my current position net positive or net negative for the world" question?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-10-23T15:27:56.987Z · LW · GW

Does anyone know why Anthropic doesn't want models with powerful cyber capabilities to be classified as "dual-use foundation models?"

In its BIS comment, Anthropic proposes a new definition of dual-use foundation model that excludes cyberoffensive capabilities. This also comes up in TechNet's response (TechNet is a trade association that Anthropic is a part of).

Does anyone know why Anthropic doesn't want the cyber component of the definition to remain? (I don't think they cover this in the comment).

---

More details– the original criteria for "dual-use foundation model" proposed by BIS are:

(1) Substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;
(2) Enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyberattacks; or
(3) Permitting the evasion of human control or oversight through means of deception or obfuscation.

Anthropic's definition includes criteria #1 and #3 in its definition but excludes criterion #2. 

(Separately, Anthropic argues that dual-use foundation models should be defined as those that pose catastrophic risks as opposed to serious risks to national security. This is important too, but I'm less confused about why Anthropic wants this.)

Comment by Akash (akash-wasil) on Shortform · 2024-10-23T13:56:59.921Z · LW · GW

Where does he say this? (I skimmed and didn’t see it.)

Link here: https://www.astralcodexten.com/p/sb-1047-our-side-of-the-story

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-10-22T22:03:23.489Z · LW · GW

48 entities gave feedback on the Department of Commerce AI reporting requirements

Public comments offering feedback on BIS's proposed reporting requirements are now up! The proposal received responses from 48 entities, including OpenAI, Anthropic, and many AI safety groups.

The reporting requirements are probably one of the most important things happening in US AI policy-- I'd encourage folks here to find time to skim some of the comments.

Comment by Akash (akash-wasil) on Sabotage Evaluations for Frontier Models · 2024-10-22T15:44:24.201Z · LW · GW

@ryan_greenblatt can you say more about what you mean by this one?

  1. We can't find countermeasures such that our control evaluations indicate any real safety.
Comment by Akash (akash-wasil) on Sabotage Evaluations for Frontier Models · 2024-10-22T15:43:57.818Z · LW · GW

I appreciate this distinction between the different types of outcomes-- nice. 

I'm pretty interested in people's forecasts for each. So revised questions would be:

  1. Evals are trustworthy but say that the model is no longer safe (or rather something like "there is a non-trivial probability that the model may be dangerous/scheming."). Using your best judgment, at what point do you expect sabotage evaluations will reveal that a model might not be safe (we're not confident that it's not scheming)? At what capability threshold do you expect we would no longer be able to say "yup, we are confident that the model is not scheming"? (Feel free to use heuristics like the OpenAI PF capability thresholds or ASL levels).
  2. Evals are no longer trustworthy and we notice. Using your best judgment, at what point do you expect sabotage evaluations will no longer be useful evals for determining whether or not a model is scheming? At what capability threshold do you expect we would no longer be able to say "yup, we are confident that these evals are trustworthy/useful"?
  3. Evals are no longer trustworthy and we do not notice. Using your best judgment, how likely is it that sabotage evaluations are undermined such that humans think the model can be deployed safely, but actually the humans were misled? At what capability threshold would you worry about this form of undermining?

@David Duvenaud would be curious about your takes, even if they're just speculative guesses/predictions. 

(And @ryan_greenblatt would be curious for your takes but replace "sabotage evaluations" with "control evaluations." In the event that you've written this up elsewhere, feel free to link it– helpful for me if you quote the particular section.)

Comment by Akash (akash-wasil) on What AI companies should do: Some rough ideas · 2024-10-22T13:45:16.154Z · LW · GW

I think I agree with this idea in principle, but I also feel like it misses some things in practice (or something). Some considerations:

  • I think my bar for "how much I trust a lab such that I'm OK with them not making transparency commitments" is fairly high. I don't think any existing lab meets that bar.
  • I feel like a lot of forms of helpful transparency are not that costly. The main 'cost' (EDIT: I think one of the most noteworthy costs) is something like "maybe the government will end up regulating the sector if/when it understands how dangerous industry people expect AI systems to be and how many safety/security concerns they have". But I think things like "report dangerous stuff to the govt", "have a whistleblower mechanism", and even "make it clear that you're willing to have govt people come and ask about safety/security concerns" don't seem very costly from an immediate time/effort perspective.
  • If a Responsible Company implemented transparency stuff unilaterally, it would make it easier for the government to have proof-of-concept and implement the same requirements for other companies. In a lot of cases, showing that a concept works for company X (and that company X actually thinks it's a good thing) can reduce a lot of friction in getting things applied to companies Y and Z.

I do agree that some of this depends on the type of transparency commitment and there might be specific types of transparency commitments that don't make sense to pursue unilaterally. Off the top of my head, I can't think of any transparency requirements that I wouldn't want to see implemented unilaterally, and I can think of several that I would want to see (e.g., dangerous capability reports, capability forecasts, whistleblower mechanisms, sharing if-then plans with govt, sharing shutdown plans with govt, setting up interview program with govt, engaging publicly with threat models, having clear OpenAI-style tables that spell out which dangerous capabilities you're tracking/expecting).

Comment by Akash (akash-wasil) on What AI companies should do: Some rough ideas · 2024-10-22T01:40:46.093Z · LW · GW

Some ideas relating to comms/policy:

  • Communicate your models of AI risk to policymakers
    • Help policymakers understand emergency scenarios (especially misalignment scenarios) and how to prepare for them
    • Use your lobbying/policy teams primarily to raise awareness about AGI and help policymakers prepare for potential AGI-related global security risks.
  • Develop simple/clear frameworks that describe which dangerous capabilities you are tracking (I think OpenAI's preparedness framework is a good example, particularly RE simplicity/clarity/readability.)
  • Advocate for increased transparency into frontier AI development through measures like stronger reporting requirements, whistleblower mechanisms, embedded auditors/resident inspectors, etc.
  • Publicly discuss threat models (kudos to DeepMind)
  • Engage in public discussions/debates with people like Hinton, Bengio, Hendrycks, Kokotajlo, etc.
  • Encourage employees to engage in such discussions/debates, share their threat models, etc.
  • Make capability forecasts public (predictions for when models would have XYZ capabilities)
  • Communicate under what circumstances you think major government involvement would be necessary (e.g., nationalization, "CERN for AI" setups).
Comment by Akash (akash-wasil) on Sabotage Evaluations for Frontier Models · 2024-10-20T14:56:44.238Z · LW · GW

One thing I appreciate about Buck/Ryan's comms around AI control is that they explicitly acknowledge that they believe control will fail for sufficiently intelligent systems. And they try to describe the capability threshold at which they suspect control will stop working (e.g., here).

For those working on sabotage evaluations: At what capability threshold do you think the sabotage/sandbagging evaluations will no longer work? (Or do you think that these sabotage evaluations + modified versions of them will scale to arbitrarily capable systems?)

Comment by Akash (akash-wasil) on The Hopium Wars: the AGI Entente Delusion · 2024-10-14T17:20:40.663Z · LW · GW

@Max Tegmark my impression is that you believe that some amount of cooperation between the US and China is possible. If the US takes steps that show that it is willing to avoid an AGI race, then there's some substantial probability that China will also want to avoid an AGI race. (And perhaps there could be some verification methods that support a "trust but verify" approach to international agreements.)

My main question: Are there circumstances under which you would no longer believe that cooperation is possible & you would instead find yourself advocating for an entente strategy?

When I look at the world right now, it seems to me like there's so much uncertainty around how governments will react to AGI that I think it's silly to throw out the idea of international coordination. As you mention in the post, there are also some signs that Chinese leaders and experts are concerned about AI risks.

It seems plausible to me that governments could– if they were sufficiently concerned about misalignment risks and believed in the assumptions behind calling an AGI race a "suicide race"– end up reaching cooperative agreements and pursuing some alternative to the "suicide race".

But suppose for sake of argument that there was compelling evidence that China was not willing to cooperate with the US. I don't mean the kind of evidence we have now, and I think we both probably agree that many actors will have incentives to say "there's no way China will cooperate with us" even in the absence of strong evidence. But if such evidence emerged, what do you think the best strategy would be from there? If hypothetically it became clear that China's leadership were essentially taking an e/acc approach and were really truly interested in getting AGI ~as quickly as possible, what do you think should be done?

I ask partially because I'm trying to think more clearly about these topics myself. I think my current viewpoint is something like:

  1. In general, the US should avoid taking actions that make a race with China more likely or inevitable.
  2. The primary plan should be for the US to engage in good-faith efforts to pursue international coordination, aiming toward a world where there are verifiable ways to avoid the premature development of AGI.
  3. We could end up in a scenario in which the prospect of international coordination has fallen apart. (e.g., China or some other major US adversary adopts a very "e/acc mindset" and seems to be gunning toward AGI with safety plans that are considerably worse than those proposed by the US.) At this point, it seems to me like the US would either have to (a) try to get to AGI before the adversary [essentially the Entente plan] or (b) give up and just kinda hope that the adversary ends up changing course as they get closer to AGI. Let's call this "world #3"[1].

Again, I think a lot of folks will have strong incentives to try to paint us as being in world #3, and I personally don't think we have enough evidence to say "yup, we're so confident we're in world #3 that we should go with an entente strategy." But I'm curious if you've thought about the conditions under which you'd conclude that we are quite confidently in world #3 and what you think we should do from there.

 

  1. ^

    I sometimes think about the following situations:

    World #1: Status quo; governments are not sufficiently concerned; corporations race to develop AGI

    World #2: Governments become quite concerned about AGI and pursue international coordination

    World #3: Governments become quite concerned about AGI but there is strong evidence that at least one major world power is refusing to cooperate//gunning toward AGI.

Comment by Akash (akash-wasil) on Mark Xu's Shortform · 2024-10-10T17:17:56.353Z · LW · GW

I largely agree with this take & also think that people often aren't aware of some of GDM's bright spots from a safety perspective. My guess is that most people overestimate the degree to which ANT>GDM from a safety perspective.

For example, I think GDM has been thinking more about international coordination than ANT. Demis has said that he supports a "CERN for AI" model, and GDM's governance team (led by Allan Dafoe) has written a few pieces about international coordination proposals.

ANT has said very little about international coordination. It's much harder to get a sense of where ANT's policy team is at. My guess is that they are less enthusiastic about international coordination relative to GDM and more enthusiastic about things like RSPs, safety cases, and letting scaling labs continue unless/until there is clearer empirical evidence of loss of control risks. 

I also think GDM deserves some praise for engaging publicly with arguments about AGI ruin and threat models.

(On the other hand, GDM is ultimately controlled by Google, which makes it unclear how important Demis's opinions or Allan's work will be. Also, my impression is that Google was neutral or against SB1047, whereas ANT eventually said that the benefits outweighed the costs.)

Comment by Akash (akash-wasil) on Zach Stein-Perlman's Shortform · 2024-10-09T15:16:03.154Z · LW · GW

What do you think was underrated about it? I think when I read it I have some sort of "yeah, this makes sense" reaction but am not "wow'd" by it. 

It seems like the deeper challenge is figuring out how to align incentives. Can we find a structure where labs want to EG give white-box access to a bunch of external researchers and give them a long time to red-team models while somehow also maintaining the independence of the white-box auditors? How do you avoid industry capture? 

Same kinds of challenges come up with safety research– how do you give labs the incentive to publish safety research that makes their product or their approach look bad? How do you avoid publication bias and p-hacking-type concerns?

I don't think your post is obligated to get into those concerns, but perhaps a post that grappled with those concerns would be something I'd be "wow'd" by, if that makes sense.

Comment by Akash (akash-wasil) on MATS AI Safety Strategy Curriculum v2 · 2024-10-08T16:09:17.568Z · LW · GW

It's quite hard to summarize AI governance in a few readings. With that in mind, here are some AI governance ideas/concepts/frames that I would add:

  • Emergency Preparedness (Wasil et al; exec summary + policy proposals - 3 mins)

    Governments should invest in strategies that can help them detect and prepare for time-sensitive AI risks. Governments should have ways to detect threats that would require immediate intervention & have preparedness plans for how they can effectively respond to various acute risk scenarios.

  • Safety cases (Irving - 3 mins; see also Clymer et al)

    Labs should present arguments that AI systems are safe within a particular training or deployment context.

(Others that I don't have time to summarize but still want to include:)

Comment by Akash (akash-wasil) on Mark Xu's Shortform · 2024-10-07T00:36:37.274Z · LW · GW

@Buck do you or Ryan have a writeup that includes: (a) a description of the capabilities of a system that you think would be able to do something useful for the sorts of objectives that Habryka talks about and (b) what that something useful is. 

Bonus points if it has (c) the likelihood that you think such a system will be controllable by 20XX and (d) what kind of control setup you think would be required to control it.

Comment by Akash (akash-wasil) on MichaelDickens's Shortform · 2024-10-03T02:57:32.102Z · LW · GW

Adding my two cents as someone who has a pretty different lens from Habryka but has still been fairly disappointed with OpenPhil, especially in the policy domain. 

Relative to Habryka, I am generally more OK with people "playing politics". I think it's probably good for AI safety folks to exhibit socially-common levels of "playing the game"– networking, finding common ground, avoiding offending other people, etc. I think some people in the rationalist sphere have a very strong aversion to some things in this genre, and labels like "power-seeking" and "deceptive" get thrown around too liberally. I also think I'm pretty OK with OpenPhil deciding it doesn't want to fund certain parts of the rationalist ecosystem (and probably less bothered than Habryka about how their comms around this wasn't direct/clear).

In that sense, I don't penalize OP much for trying to "play politics" or for breaking deontological norms. Nonetheless, I still feel pretty disappointed with them, particularly for their impact on comms/policy. Some thoughts here:

  • I agree with Habryka that it is quite bad that OP is not willing to fund right-coded things. Even many of the "bipartisan" things funded by OP are quite left-coded. (As a useful heuristic, whenever you hear of someone launching a bipartisan initiative, I think one should ask "what % of the staff of this organization is Republican?" Obviously just a heuristic– there are some cases in which a 90%-Dem staff can actually truly engage in "real" bipartisan efforts. But in some cases, you will have a 90%-Dem staff claiming to be interested in bipartisan work without any real interest in Republican ideas, few if any Republican contacts, and only a cursory understanding of Republican stances.)
  • I also agree with Habryka that OP seems overly focused on PR risks and not doing things that are weird/controversial. "To be a longtermist grantee these days you have to be the kind of person that OP thinks is not and will not be a PR risk, IE will not say weird or controversial stuff" sounds pretty accurate to me. OP cannot publicly admit this because this would be bad for its reputation– instead, it operates more subtly. 
  • Separately, I have seen OpenPhil attempt to block or restrain multiple efforts in which people were trying to straightforwardly explain AI risks to policymakers. My understanding is that OpenPhil would say that they believed the messengers weren't the right people (e.g., too inexperienced), and they thought the downside risks were too high. In practice, there are some real tradeoffs here: there are often people who seem to have strong models of AGI risk but little/no policy experience, and sometimes people who have extensive policy experience but only recently started engaging with AI/AGI issues. With that in mind, I think OpenPhil has systematically made poor tradeoffs here and failed to invest in (or in some cases, actively blocked) people who were willing to be explicit about AGI risks, loss of control risks, capability progress, and the need for regulation. (I also think the "actively blocking" thing has gotten less severe over time, perhaps in part because OpenPhil changed its mind a bit on the value of direct advocacy or perhaps because OpenPhil just decided to focus its efforts on things like research while advocacy projects found funding elsewhere.)
  • I think OpenPhil has an intellectual monoculture and puts explicit/implicit cultural pressure on people in the OP orbit to "stay in line." There is a lot of talk about valuing people who can think for themselves, but I think the groupthink problems are pretty real. There is a strong emphasis on "checking-in" with people before saying/doing things, and the OP bubble is generally much more willing to criticize action than inaction. I suspect that something like the CAIS statement or even a lot of the early Bengio comms would not have occurred if Dan Hendrycks or Yoshua were deeply ingrained in the OP orbit. It is both the case that they would've been forced to write 10+ page Google Docs defending their theories of change and the case that the intellectual culture simply wouldn't have fostered this kind of thinking.
  • I think the focus on evals/RSPs can largely be explained by a bias toward trusting labs. OpenPhil steered a lot of talent toward the evals/RSPs theory of change (specifically, if I recall correctly, OpenPhil leadership on AI was especially influential in steering a lot of the ecosystem to support and invest in the evals/RSPs theory of change.) I expect that when we look back in a few years, there will be a pretty strong feeling that this was the wrong call & that this should've been more apparent even without the benefit of hindsight.
  • I would be more sympathetic to OpenPhil in a world where their aversion to weirdness/PR risks resulted in them having a strong reputation, a lot of political capital, and real-world influence that matched the financial resources they possess. Sadly, I think we're in a "lose-lose" world: OpenPhil's reputation tends to be poor in many policy/journalism circles even while OpenPhil pursues a strategy that seems to be largely focused on avoiding PR risks. I think some of this is unjustified (e.g., a result of propaganda campaigns designed to paint anyone who cares about AI risk as awful). But then some of it actually is kind of reasonable (e.g., impartial observers viewing OpenPhil as kind of shady, not direct in its communications, not very willing to engage directly or openly with policymakers or journalists, having lots of conflicts of interests, trying to underplay the extent to which its funding priorities are influenced/constrained by a single Billionaire, being pretty left-coded, etc.)

To defend OpenPhil a bit, I do think these trade-offs are quite hard to navigate, and sometimes people don't seem to recognize them. In AI policy, I think the biggest tradeoff is something like "lots of people who have engaged with technical AGI arguments and AGI threat models don't have policy experience, and lots of people who have policy experience don't have technical expertise or experience engaging with AGI threat models" (this is a bit of an oversimplification– there are some shining stars who have both).

I also think OpenPhil folks probably tend to have a different probability distribution over threat models (compared to me and probably also Habryka). For instance, it seems likely to me that OpenPhil employees operate in more of a "there are a lot of ways AGI could play out and a lot of uncertainty– we just need smart people thinking seriously about the problem. And who really knows how hard alignment will be, maybe Anthropic will just figure it out" lens and less of an "ASI is coming and our priority needs to be making sure humanity understands the dangers associated with a reckless race toward ASI, and there's a substantial chance that we are seriously not on track to solve the necessary safety and security challenges unless we fundamentally reorient our whole approach" lens.

And finally, I think that despite these criticisms, OpenPhil is also responsible for some important wins (e.g., building the field, raising awareness about AGI risk on university campuses, funding some people early on before AI safety was a "big deal", and jumpstarting the careers of some leaders in the policy space [especially in the UK]). It's also plausible to me that there are some cases in which OpenPhil gatekeeping was actually quite useful in preventing people from causing harm, even though I probably disagree with OpenPhil about the number and magnitude of these cases.

Comment by Akash (akash-wasil) on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" · 2024-09-29T17:53:12.384Z · LW · GW

Today we have enough bits to have a pretty good guess when where and how superintelligence will happen. 

@Alexander Gietelink Oldenziel can you expand on this? What's your current model of when/where/how superintelligence will happen?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-09-17T21:28:34.279Z · LW · GW

Recent Senate hearing includes testimony from Helen Toner and William Saunders

  • Both statements are explicit about AGI risks & emphasize the importance of transparency & whistleblower mechanisms. 
  • William's statement acknowledges that he and others doubt that OpenAI's safety work will be sufficient.
    • "OpenAI will say that they are improving. I and other employees who resigned doubt they will be ready in time. This is true not just with OpenAI; the incentives to prioritize rapid development apply to the entire industry. This is why a policy response is needed."
  • Helen's statement provides an interesting paragraph about China at the end.
    • "A closing note on China: The specter of ceding U.S. technological leadership to China is often treated as a knock-down argument against implementing regulations of any kind. Based on my research on the Chinese AI ecosystem and U.S.-China technology competition more broadly, I think this argument is not nearly as strong as it seems at first glance. We should certainly be mindful of how regulation can affect the pace of innovation at home, and keep a close eye on how our competitors and adversaries are developing and using AI. But looking in depth at Chinese AI development, the AI regulations they are already imposing, and the macro headwinds they face leaves me with the conclusion that they are far from being poised to overtake the United States.6 The fact that targeted, adaptive regulation does not have to slow down U.S. innovation—and in fact can actively support it—only strengthens this point."

Full hearing here (I haven't watched it yet.)

Comment by Akash (akash-wasil) on AI forecasting bots incoming · 2024-09-11T03:24:07.879Z · LW · GW

Hm, good question. I think it should be proportional to the amount of time it would take to investigate the concern(s).

For this, I think 1-2 weeks seems reasonable, at least for an initial response.

Comment by Akash (akash-wasil) on AI forecasting bots incoming · 2024-09-10T23:23:50.962Z · LW · GW

Hendryks had ample opportunity after initial skepticism to remove it, but chose not to.

IMO, this seems to demand a very immediate/sudden/urgent reaction. If Hendrycks ends up being wrong, I think he should issue some sort of retraction (and I think it would be reasonable to be annoyed if he doesn't.)

But I don't think the standard should be "you need to react to criticism within ~24 hours" for this kind of thing. If you write a research paper and people raise important concerns about it, I think you have a duty to investigate them and respond to them, but I don't think you need to fundamentally change your mind within the first few hours/days.

I think we should afford researchers the time to seriously evaluate claims/criticisms, reflect on them, and issue a polished statement (and potential retraction).

(Caveat that there are some cases where immediate action is needed– like EG if a company releases a product that is imminently dangerous– but I don't think "making an intellectual claim about LLM capabilities that turns out to be wrong" would meet my bar.)

Comment by Akash (akash-wasil) on Zach Stein-Perlman's Shortform · 2024-09-09T04:42:48.345Z · LW · GW

I think it's bad for discourse for us to pretend that discourse doesn't have impacts on others in a democratic society.

I think I agree with this in principle. Possible that the crux between us is more like "what is the role of LessWrong."

For instance, if Bob wrote a NYT article titled "Anthropic is not publishing its safety research", I would be like "meh, this doesn't seem like a particularly useful or high-priority thing to be bringing to everyone's attention– there are like at least 10+ topics I would've much rather Bob spent his points on."

But LW generally isn't a place where you're going to get EG thousands of readers or have a huge effect on general discourse (with the exception of a few things that go viral or AIS-viral).

So I'm not particularly worried about LW discussions having big second-order effects on democratic society. Whereas LW can be a space for people to have a relatively low bar for raising questions, being curious, trying to understand the world, offering criticism/praise without thinking much about how they want to be spending "points", etc.

Comment by Akash (akash-wasil) on Zach Stein-Perlman's Shortform · 2024-09-08T01:32:45.911Z · LW · GW

Is this where we think our pressuring-Anthropic points are best spent?

I think if someone has a 30-minute meeting with some highly influential and very busy person at Anthropic, it makes sense for them to have thought in advance about the most important things to ask & curate the agenda appropriately. 

But I don't think LW users should be thinking much about "pressuring-Anthropic points". I see LW primarily as a platform for discourse (as opposed to a direct lobbying channel to labs), and I think it would be bad for the discourse if people felt like they had to censor questions/concerns about labs on LW unless it met some sort of "is this one of the most important things to be pushing for" bar.

Comment by Akash (akash-wasil) on What is SB 1047 *for*? · 2024-09-06T15:12:59.024Z · LW · GW

It seems to me like the strongest case for SB1047 is that it's a transparency bill. As Zvi noted, it's probably good for governments and for the world to be able to examine the Safety and Security Protocols of frontier AI companies.

But there are also some pretty important limitations. I think a lot of the bill's value (assuming it passes) will be determined by how it's implemented and whether or not there are folks in government who are able to put pressure on labs to be specific/concrete in their SSPs. 

More thoughts below:

Transparency as an emergency preparedness technique

I often think in an emergency preparedness frame– if there were a time-sensitive threat, how would governments be able to detect it & make sure information about it was triaged/handled appropriately? It seems like governments are more likely to notice time-sensitive threats in a world where there's more transparency, and forcing frontier AI companies to write/publish SSPs seems good from that angle.

In my model, a lot of risk comes from the government taking too long to react– either so long that an existential catastrophe actually occurs or so long that by the time major intervention occurs, ASL-4+ models have been developed with poor security, and now it's ~impossible to do anything except continue to race ("otherwise the other people with ASL-4+ models will cause a catastrophe"). Efforts to get the government to understand the state of risks and intervene before ASL-4+ models are developed seem very important from that perspective. It seems to me like SSPs could accomplish this by (a) giving the government useful information and (b) making it "someone's job" to evaluate the state of SSPs + frontier AI risks.

Limitation: Companies can write long and nice-sounding documents that avoid specificity and concreteness

The most notable limitation, IMO, is that it's generally pretty easy for powerful companies to evade being fully transparent. Sometimes, people champion things like RSPs or the Seoul Commitments as these major breakthroughs in transparency. Although I do see these as steps in the right direction, their value should not be overstated. For example, even the "best" RSPs (OpenAI's and Anthropic's) are rather vague about how decisions will actually be made. Anthropic's RSP essentially says "Company leadership will ultimately determine whether something is too risky and whether the safeguards are adequate" (with the exception of some specifics around security). OpenAI's does a bit better IMO (from a transparency perspective) by spelling out the kinds of capabilities that they would consider risky, but they still provide company leadership ~infinite freedom RE determining whether or not safeguards are adequate.

Incentives for transparency are relatively weak, and the costs of transparency can be high. In Sam Bowman's recent post, he mentions that detailed commitments (and we can extend this to detailed SSPs) can commit companies to "needlessly costly busy work." A separate but related frame is that race dynamics mean that companies can't afford to make detailed commitments. If I'm in charge of an AI company, I'd generally like to have some freedom/flexibility/wiggle room in how I make decisions, interpret evidence, conduct evaluations, decide whether or not to keep scaling, and make judgments around safety and security.

In other words, we should expect that at least some (maybe all) of the frontier AI companies will try to write SSPs that sound really nice but provide minimal concrete details. The incentives to be concrete/specific are not strong, and we already have some evidence from seeing RSPs/PFs (and note again that I think the other companies were even less detailed and concrete in their documents).

Potential solutions: Government capacity & whistleblower mechanisms

So what do we do about this? Are there ways to make SSPs actually promote transparency? If the government is able to tell that some companies are being vague/misleading in their SSPs, this could inspire further investigations/inquiries. We've already seen several Congresspeople send letters to frontier AI companies requesting more details about security procedures, whistleblower protections, and other safety/security topics.

So I think there are two things that can help: government capacity and whistleblower mechanisms.  

Government capacity. The FMD was cut, but perhaps the Board of Frontier Models could provide this oversight. At the very least, the Board could provide an audience for the work of people like @Zach Stein-Perlman and @Zvi– people who might actually read through a complicated 50+ page SSP with corporate niceties but be able to distill what's really going on, what's missing, what's misleading, etc.

Whistleblower mechanisms. SB1047 provides a whistleblower mechanism & whistleblower protections (note: I see these as separate things and I personally think mechanisms are more important). Every frontier AI company has to have a platform through which employees (and contractors, I think?) are able to report if they believe the company is being misleading in its SSPs. This seems like a great accountability tool (though of course it relies on the whistleblower mechanism being implemented properly & relies on some degree of government capacity RE knowing how to interpret whistleblower reports.)

The final thing I'll note is that I think the idea of full shutdown protocols is quite valuable. From an emergency preparedness standpoint, it seems quite good for governments to be asking "under what circumstances do you think a full shutdown is required" and "how would we actually execute/verify a full shutdown."

Comment by Akash (akash-wasil) on The Checklist: What Succeeding at AI Safety Will Involve · 2024-09-04T21:01:06.324Z · LW · GW

@ryan_greenblatt one thing I'm curious about is when/how the government plays a role in your plan.

I think Sam is likely correct in pointing out that the influence exerted by you (as an individual), Sam (as an individual), or even Anthropic (as an institution) likely goes down considerably if/once governments get super involved.

I still agree with your point about how having an exit plan is still valuable (and indeed I do expect governments to be asking technical experts about their opinions RE what to do, though I also expect a bunch of DC people who know comparatively little about frontier AI systems but have long-standing relationships in the national security world will have a lot of influence.)

My guess is that you think heavy government involvement should occur before/during the creation of ASL-4 systems, since you're pretty concerned about risks from ASL-4 systems being developed in non-SL5 contexts.

In general, I'd be interested in seeing more about how you (and Buck) are thinking about policy stuff + government involvement. My impression is that you two have spent a lot of time thinking about how AI control fits into a broader strategic context, with that broader strategic context depending a lot on how governments act/react. 

And I suspect readers will be better able to evaluate the AI control plan if some of the assumptions/expectations around government involvement are spelled out more clearly. (Put differently, I think it's pretty hard to evaluate "how excited should I be about the AI control agenda" without understanding who is responsible for doing the AI control stuff, what's going on with race dynamics, etc.)

Comment by Akash (akash-wasil) on The Checklist: What Succeeding at AI Safety Will Involve · 2024-09-03T22:51:23.521Z · LW · GW

I liked this post (and think it's a lot better than official comms from Anthropic). Some things I appreciate about this post:

Presenting a useful heuristic for RSPs

Relatedly, we should aim to pass what I call the LeCun Test: Imagine another frontier AI developer adopts a copy of our RSP as binding policy and entrusts someone who thinks that AGI safety concerns are mostly bullshit to implement it. If the RSP is well-written, we should still be reassured that the developer will behave safely—or, at least, if they fail, we should be confident that they’ll fail in a very visible and accountable way.

Acknowledging the potential for a pause

For our RSP commitments to function in a worst-case scenario where making TAI systems safe is extremely difficult, we’ll need to be able to pause the development and deployment of new frontier models until we have developed adequate safeguards, with no guarantee that this will be possible on any particular timeline. This could lead us to cancel or dramatically revise major deployments. Doing so will inevitably be costly and could risk our viability in the worst cases, but big-picture strategic preparation could make the difference between a fatal blow to our finances and morale and a recoverable one. More fine-grained tactical preparation will be necessary for us to pull this off as quickly as may be necessary without hitting technical or logistical hiccups.

Sam wants Anthropic to cede decision-making to governments at some point

[At ASL-5] Governments and other important organizations will likely be heavily invested in AI outcomes, largely foreclosing the need for us to make major decisions on our own. By this point, in most possible worlds, the most important decisions that the organization is going to make have already been made. I’m not including any checklist items below, because we hope not to have any.

Miscellaneous things I like

  • Generally just providing a detailed overview of "the big picture"– how Sam actually sees Anthropic's work potentially contributing to good outcomes. And not sugarcoating what's going on– being very explicit about the fact that these systems are going to become catastrophically dangerous, and EG "If we haven’t succeeded decisively on the big core safety challenges by this point, there’s so much happening so fast and with such high stakes that we are unlikely to be able to recover from major errors now."
  • Striking a tone that feels pretty serious/straightforward/sober. (In contrast, many Anthropic comms have a vibe of "I am a corporation trying to sell you on the fact that I am a Good Guy.")

Some limitations

  • "Nothing here is a firm commitment on behalf of Anthropic."
  • Not much about policy or government involvement, besides a little bit about scary demos. (To be fair, Sam is a technical person. Though I think the "I'm just a technical person, I'm going to leave policy to the policy people" attitude is probably bad, especially for technical people who are thinking/writing about macrostrategy.)
  • Not much about race dynamics, how to make sure other labs do this, whether Anthropic would actually do things that are costly or if race dynamics would just push them to cut corners. (Pretty similar to the previous concern but a more specific set of worries.)
  • Still not very clear what kinds of evidence would be useful for establishing safety or establishing risk. Similarly, not very clear what kinds of evidence would trigger Sam to think that Anthropic should pause or should EG invest ~all of its capital into getting governments to pause. (To be fair, no one really has great/definitive answers on this. But on the other hand, I think it's useful for people to start spelling out best-guesses RE what this would involve & just acknowledge that our ideas will hopefully get better over time.)

All in all, I think this is an impressive post and I applaud Sam for writing it. 

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-09-03T20:36:00.596Z · LW · GW

Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi's statement didn't come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.)

To me, the most obvious explanation is probably something like "Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders"– is this what's driving your model?

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-09-02T23:39:58.060Z · LW · GW

Why do people think there's a ~50% chance that Newsom will veto SB1047?

The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn't been very controversial among CA politicians.

Is the main idea here that Newsom's incentives are different from those of state politicians because Newsom has national ambitions? So therefore he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like "maybe Newsom doesn't want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies"?)

Or are there other factors that are especially influential in peoples' models here?

(Tagging @ryan_greenblatt, @Eric Neyman, and @Neel Nanda because you three hold the largest No positions. Feel free to ignore if you don't want to engage.)