Posts

Verification methods for international AI agreements 2024-08-31T14:58:10.986Z
Advice to junior AI governance researchers 2024-07-08T19:19:07.316Z
Mitigating extreme AI risks amid rapid progress [Linkpost] 2024-05-21T19:59:21.343Z
Akash's Shortform 2024-04-18T15:44:25.096Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
OpenAI's Preparedness Framework: Praise & Recommendations 2024-01-02T16:20:04.249Z
Speaking to Congressional staffers about AI risk 2023-12-04T23:08:52.055Z
Navigating emotions in an uncertain & confusing world 2023-11-20T18:16:09.492Z
Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost] 2023-11-01T13:28:43.723Z
Winners of AI Alignment Awards Research Contest 2023-07-13T16:14:38.243Z
AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI 2023-05-30T11:52:31.669Z
AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI 2023-05-23T21:47:34.755Z
Eisenhower's Atoms for Peace Speech 2023-05-17T16:10:38.852Z
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control 2023-05-16T15:14:45.921Z
AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models 2023-05-09T15:26:55.978Z
AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks 2023-05-02T18:41:43.144Z
Discussion about AI Safety funding (FB transcript) 2023-04-30T19:05:34.009Z
Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous) 2023-04-25T18:49:29.042Z
DeepMind and Google Brain are merging [Linkpost] 2023-04-20T18:47:23.016Z
AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media 2023-04-18T18:44:35.923Z
Request to AGI organizations: Share your views on pausing AI progress 2023-04-11T17:30:46.707Z
AI Safety Newsletter #1 [CAIS Linkpost] 2023-04-10T20:18:57.485Z
Reliability, Security, and AI risk: Notes from infosec textbook chapter 1 2023-04-07T15:47:16.581Z
New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development 2023-04-05T01:26:51.830Z
[Linkpost] Critiques of Redwood Research 2023-03-31T20:00:09.784Z
What would a compute monitoring plan look like? [Linkpost] 2023-03-26T19:33:46.896Z
The Overton Window widens: Examples of AI risk in the media 2023-03-23T17:10:14.616Z
The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments 2023-03-20T20:44:29.445Z
[Linkpost] Scott Alexander reacts to OpenAI's latest post 2023-03-11T22:24:39.394Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
AI Governance & Strategy: Priorities, talent gaps, & opportunities 2023-03-03T18:09:26.659Z
Fighting without hope 2023-03-01T18:15:05.188Z
Qualities that alignment mentors value in junior researchers 2023-02-14T23:27:40.747Z
4 ways to think about democratizing AI [GovAI Linkpost] 2023-02-13T18:06:41.208Z
How evals might (or might not) prevent catastrophic risks from AI 2023-02-07T20:16:08.253Z
[Linkpost] Google invested $300M in Anthropic in late 2022 2023-02-03T19:13:32.112Z
Many AI governance proposals have a tradeoff between usefulness and feasibility 2023-02-03T18:49:44.431Z
Talk to me about your summer/career plans 2023-01-31T18:29:23.351Z
Advice I found helpful in 2022 2023-01-28T19:48:23.160Z
11 heuristics for choosing (alignment) research projects 2023-01-27T00:36:08.742Z
"Status" can be corrosive; here's how I handle it 2023-01-24T01:25:04.539Z
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution 2023-01-21T16:51:09.586Z
Wentworth and Larsen on buying time 2023-01-09T21:31:24.911Z
[Linkpost] Jan Leike on three kinds of alignment taxes 2023-01-06T23:57:34.788Z
My thoughts on OpenAI's alignment plan 2022-12-30T19:33:15.019Z
An overview of some promising work by junior alignment researchers 2022-12-26T17:23:58.991Z
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic 2022-12-20T21:39:41.866Z
12 career-related questions that may (or may not) be helpful for people interested in alignment research 2022-12-12T22:36:21.936Z
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas 2022-11-25T20:47:09.832Z
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility 2022-11-22T22:19:09.419Z

Comments

Comment by Orpheus16 (akash-wasil) on Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies · 2025-05-15T00:37:03.335Z · LW · GW

Note that IFP (a DC-based think tank) recently had someone deliver 535 copies of their new book to every US Congressional office.

Note also that my impression is that DC people (even staffers) are much less "online" than tech audiences. Whether or not you copy IFP, I would suggest thinking about in-person distribution opportunities for DC.

Comment by Orpheus16 (akash-wasil) on RA x ControlAI video: What if AI just keeps getting smarter? · 2025-05-04T19:06:43.687Z · LW · GW

I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to

I would be curious for your thoughts on which organizations you feel are robustly trustworthy. 

Bonus points for a list that is kind of a weighted sum of "robustly trustworthy" and "having a meaningful impact RE improving public/policymaker understanding". (Adding this in because I suspect that it's easier to maintain "robustly trustworthy" status if one simply chooses not to do a lot of externally-focused comms, so it's particularly impressive to have the combination of "doing lots of useful comms/policy work" and "managing to stay precise/accurate/trustworthy").

Comment by Orpheus16 (akash-wasil) on AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions · 2025-05-04T18:59:58.488Z · LW · GW

I appreciate the articulation and assessment of various strategies. My comment will focus on a specific angle that I notice both in the report and in the broader ecosystem:

I think there has been a conflation of “catastrophic risks” and “extinction/existential risks” recently, especially among groups that are trying to influence policy. This is somewhat understandable– the difference between "catastrophic" and "existential" is not that big of a deal in most people's minds. But in some contexts, I think it misses the fact that "existential [and thus by definition irreversible]" is actually a very different level of risk compared to "catastrophic [but something that we would be able to recover from.]"

This view seems to be (implicitly) expressed in the report summary, most notably the chart. It seems to me like the main frame is something like "if you want to avoid an unacceptable chance of catastrophic risk, all of these other options are bad."

But not all of these catastrophic risks are the same. I think this is actually quite an important consideration, and I think even (some) policymakers would/will see this as an essential consideration as AGI becomes more salient.

Specifically, "war" and "misuse" seem very different than "extinction" or "total and irreversible civilizational collapse." 

  • "War" is broad enough to encompass many outcomes (ranging from "conflict with <1M deaths" to "nuclear conflict in which civilization recovers" all the way to "nuclear conflict in which civilization does not recover.") Note also that many natsec leaders already think the chance of a war between the US and China is at a level that would probably meet an intuitive bar for "unacceptable." (I don't have actual statistics on this but my guess is that >10% chance of war in the next decade is not an uncommon view. One plausible pathway that is discussed often is China invading Taiwan and US being committed to its defense).
  • "Misuse" can refer to many different kinds of events (including $1B in damages from a cyberattack, 10M deaths, 1B deaths, or complete human extinction.) These are, of course, very different in terms of their overall impact, even though all of them are intuitively/emotionally stored as "very bad things that we would ideally avoid."

It seems plausible to me that we will be in situations in which policymakers have to make tricky trade-offs between these different sources of risk, and my hope is that the community of people concerned about AI can distinguish between the different "levels" or "magnitudes" of different types of risks.

(My impression is that MIRI agrees with this, so this is more a comment on how the summary was presented & more a general note of caution to the ecosystem as a whole. I also suspect that the distinction between "catastrophic" and "existential/civilization-ending" will become increasingly more important as the AI conversation becomes more interlinked with the national security apparatus.)

Caveat: I have not read the full report and this comment is mostly inspired by the summary, the chart, and a general sense that many organizations other than MIRI are also engaging in this kind of conflation.

Comment by Orpheus16 (akash-wasil) on Alexander Gietelink Oldenziel's Shortform · 2025-04-10T12:30:06.536Z · LW · GW

I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”

Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).

That’s not to say forecasting is always unhelpful. Things like AI2027 can certainly move discussions forward and perhaps get new folks interested. But EG, my biggest critique of AI2027 is that I suspect they’re spending too much time/effort on detailed longform forecasting and too little effort on arranging meetings with Important Stakeholders, developing a strong presence in DC, forming policy recommendations, and related activities. (And TBC I respect/admire the AI2027 team, have relayed this feedback to them, and imagine they have thoughtful reasons for taking the approach they’re taking.)

Comment by Orpheus16 (akash-wasil) on Buck's Shortform · 2025-04-05T15:56:32.206Z · LW · GW

What do you think are the most important points that weren't publicly discussed before?

Comment by Orpheus16 (akash-wasil) on ryan_greenblatt's Shortform · 2025-01-07T23:06:44.937Z · LW · GW

I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).

@ryan_greenblatt can you say more about what you expect to happen from the period in-between "AI 10Xes AI R&D" and "AI takeover is very plausible?"

I'm particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during this period. Would be curious for your takes on how much of this stays relatively private/internal (e.g., only a handful of well-connected SF people know how good the systems are) vs. obvious/public/visible (e.g., the majority of the media-consuming American public is aware of the fact that AI research has been mostly automated) or somewhere in-between (e.g., most DC tech policy staffers know this but most non-tech people are not aware.)

Comment by Orpheus16 (akash-wasil) on What’s the short timeline plan? · 2025-01-02T20:35:53.105Z · LW · GW

Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it's mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.

TBC, I think it's a plausible and reasonable assumption to make. But I think this assumption ends up meaning that "the plan" excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.

Here's an alternative frame: I would call the plan described in Marius's post something like the "short timelines plan assuming that governments do not get involved and assuming that technical tools (namely control/AI-automated AI R&D) are the only/main tools we can use to achieve good outcomes."

You could imagine an alternative plan described as something like the "short timelines plan assuming that technical tools in the current AGI development race/paradigm are not sufficient and governance tools (namely getting the USG to provide considerably more oversight into AGI development, curb race dynamics, make major improvements to security) are the only/main tools we can use to achieve good outcomes." This kind of plan would involve a very different focus.

Here are some examples of things that I think would be featured in a "government-focused" short timelines plan:

  • Demos of dangerous capabilities
  • Explanations of misalignment risks to senior policymakers. Identifying specific people who would be best-suited to provide those explanations, having those people practice giving explanations and addressing counterarguments, etc.
  • Plans for what the "trailing labs" should do if the leading lab appears to have an insurmountable lead (e.g., OpenAI develops a model that is automating AI R&D. It becomes clear that DeepMind and Anthropic are substantially behind OpenAI. At this point, do the labs merge and assist? Do they try to do a big, coordinated, costly push to get governments to take AI risks more seriously?)
  • Emergency preparedness– getting governments to be more likely to detect and appropriately respond to time-sensitive risks.
  • Preparing plans for what to do if governments become considerably more concerned about risks (e.g., preparing concrete Manhattan Project or CERN-for-AI style proposals, identifying and developing verification methods for domestic or international AI regulation.)

One possible counter is that under short timelines, the USG is super unlikely to get involved. Personally, I think we should have a lot of uncertainty RE how the USG will react. Examples of factors here: (a) new Administration, (b) uncertainty over whether AI will produce real-world incidents, (c) uncertainty over how compelling demos will be, (d) ChatGPT being an illustrative example of a big increase in USG involvement that lots of folks didn't see coming, (e) examples of the USG suddenly becoming a lot more interested in a national security domain (e.g., 9/11 --> Patriot Act, recent TikTok ban), and (f) Trump being generally harder to predict than most Presidents (e.g., more likely to form opinions for himself, less likely to trust the establishment views in some cases).

(And just to be clear, this isn't really a critique of Marius's post. I think it's great for people to be thinking about what the "plan" should be if the USG doesn't react in time. Separately, I'd be excited for people to write more about what the short timelines "plan" should look like under different assumptions about USG involvement.)

Comment by Orpheus16 (akash-wasil) on ryan_greenblatt's Shortform · 2024-12-31T20:05:04.610Z · LW · GW

At first glance, I don’t see how the point I raised is affected by the distinction between expert-level AIs vs earlier AIs.

In both cases, you could expect an important part of the story to be “what are the comparative strengths and weaknesses of this AI system.”

For example, suppose you have an AI system that dominates human experts at every single relevant domain of cognition. It still seems like there’s a big difference between “system that is 10% better at every relevant domain of cognition” and “system that is 300% better at domain X and only 10% better at domain Y.”

To make it less abstract, one might suspect that by the time we have AI that is 10% better than humans at “conceptual/serial” stuff, the same AI system is 1000% better at “speed/parallel” stuff. And this would have pretty big implications for what kind of AI R&D ends up happening (even if we condition on only focusing on systems that dominate experts in every relevant domain.)

Comment by Orpheus16 (akash-wasil) on Buck's Shortform · 2024-12-31T18:21:07.778Z · LW · GW

Models that don’t even cause safety problems, and aren't even goal-directedly misaligned, but that fail to live up to their potential, thus failing to provide us with the benefits we were hoping to get when we trained them. For example, sycophantic myopic reward hacking models that can’t be made to do useful research.

Would this kind of model present any risk? Could a lab just say "oh darn, this thing isn't very useful– let's turn this off and develop a new model"?

Comment by Orpheus16 (akash-wasil) on Matthew Barnett's Shortform · 2024-12-31T18:15:30.078Z · LW · GW

Do you have any suggestions RE alternative (more precise) terms? Or do you think it's more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., "In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].")

Comment by Orpheus16 (akash-wasil) on ryan_greenblatt's Shortform · 2024-12-31T18:10:21.428Z · LW · GW

The point I make here is also likely obvious to many, but I wonder if the "X human equivalents" frame often implicitly assumes that GPT-N will be like having X humans. But if we expect AIs to have comparative advantages (and disadvantages), then this picture might miss some important factors.

The "human equivalents" frame seems most accurate in worlds where the capability profile of an AI looks pretty similar to the capability profile of humans. That is, getting GPT-6 to do AI R&D is basically "the same as" getting X humans to do AI R&D. It thinks in fairly similar ways and has fairly similar strengths/weaknesses.

The frame is less accurate in worlds where AI is really good at some things and really bad at other things. In this case, if you try to estimate the # of human equivalents that GPT-6 gets you, the result might be misleading or incomplete. A lot of fuzzier things will affect the picture. 

The example I've seen discussed most is whether or not we expect certain kinds of R&D to be bottlenecked by "running lots of experiments" or "thinking deeply and having core conceptual insights." My impression is that one reason why some MIRI folks are pessimistic is that they expect capabilities research to be more easily automatable (AIs will be relatively good at running lots of ML experiments quickly, which helps capabilities more under their model) than alignment research (AIs will be relatively bad at thinking deeply or serially about certain topics, which is what you need for meaningful alignment progress under their model). 

Perhaps more people should write about what kinds of tasks they expect GPT-X to be "relatively good at" or "relatively bad at". Or perhaps that's too hard to predict in advance. If so, it could still be good to write about how different "capability profiles" could allow certain kinds of tasks to be automated more quickly than others. 

(I do think that the "human equivalents" frame is easier to model and seems like an overall fine simplification for various analyses.)

Comment by Orpheus16 (akash-wasil) on evhub's Shortform · 2024-12-30T21:01:54.978Z · LW · GW

I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:

  • Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it's not feasible to do this with a government entity yet, start a pilot with a non-government group– perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
  • Clear communication about RSP capability thresholds. I think the RSP could do a better job at outlining the kinds of capabilities that Anthropic is worried about and what sorts of thresholds would trigger a reaction. I think the OpenAI preparedness framework tables are a good example of this kind of clear/concise communication. It's easy for a naive reader to quickly get a sense of "oh, this is the kind of capability that OpenAI is worried about." (Clarification: I'm not suggesting that Anthropic should abandon the ASL approach or that OpenAI has necessarily identified the right capability thresholds. I'm saying that the tables are a good example of the kind of clarity I'm looking for– someone could skim this and easily get a sense of what thresholds OpenAI is tracking, and I think OpenAI's PF currently achieves this much more than the Anthropic RSP.)
  • Emergency protocols. Publishing an emergency protocol that specifies how Anthropic would react if it needed to quickly shut down a dangerous AI system. (See some specific prompts in the "AI developer emergency response protocol" section here). Some information can be redacted from a public version (I think it's important to have a public version, though, partly to help government stakeholders understand how to handle emergency scenarios, partly to raise the standard for other labs, and partly to acquire feedback from external groups.)
  • RSP surveys. Evaluate the extent to which Anthropic employees understand the RSP, their attitudes toward the RSP, and how the RSP affects their work. More on this here.
  • More communication about Anthropic's views about AI risks and AI policy. Some specific examples of hypothetical posts I'd love to see:
    • "How Anthropic thinks about misalignment risks"
    • "What the world should do if the alignment problem ends up being hard"
    • "How we plan to achieve state-proof security before AGI"
    • Encouraging more employees to share their views on various topics, EG Sam Bowman's post.
  • AI dialogues/debates. It would be interesting to see Anthropic employees have discussions/debates with other folks thinking about advanced AI. Hypothetical examples:
    • "What are the best things the US government should be doing to prepare for advanced AI" with Jack Clark and Daniel Kokotajlo.
    • "Should we have a CERN for AI?" with [someone from Anthropic] and Miles Brundage.
    • "How difficult should we expect alignment to be" with [someone from Anthropic] and [someone who expects alignment to be harder; perhaps Jeffrey Ladish or Malo Bourgon].

More ambitiously, I feel like I don't really understand Anthropic's plan for how to manage race dynamics in worlds where alignment ends up being "hard enough to require a lot more than RSPs and voluntary commitments."

From a policy standpoint, several of the most interesting open questions seem to be along the lines of "under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development" and "conditional on the USG wanting to get way more involved, what are the best things for it to do?" It's plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.

Comment by Orpheus16 (akash-wasil) on "Carefully Bootstrapped Alignment" is organizationally hard · 2024-12-25T22:26:11.761Z · LW · GW

I think this is a helpful overview post. It outlines challenges to a mainstream plan (bootstrapped alignment) and offers a few case studies of how entities in other fields handle complex organizational challenges.

I'd be excited to see more follow-up research on organizational design and organizational culture. This work might be especially useful for helping folks think about various AI policy proposals.

For example, it seems plausible that at some point the US government will view certain kinds of AI systems as critical national security assets. At that point, the government might become a lot more involved in AI development. It might be the case that a new entity is created (e.g., a DOD-led "Manhattan Project") or it might be the case that the existing landscape persists but with a lot more oversight (e.g., NSA and DoD folks show up as resident inspectors at OpenAI/DeepMind/Anthropic and have a lot more decision-making power).

If something like this happens, there will be a period of time in which relevant USG stakeholders (and AGI lab leaders) have to think through questions about organizational design, organizational culture, decision-making processes, governance structures, etc. It seems likely that some versions of this look a lot better than others.

Examples of questions I'd love to see answered/discussed:

  • Broadly, what are the 2-3 most important design features or considerations for an AGI project?
  • Suppose the USG becomes interested in (soft) nationalizing certain kinds of AI development. A senior AI policy advisor comes to you and says "what do you think are the most important things we need to get right?"
  • What are the best lessons learned from case studies in other fields where high-reliability organizations are important? (e.g. BSL-4 labs, nuclear facilities, health care facilities, military facilities)
  • What are the best lessons learned from case studies in other instances in which governments or national security officials attempted to start new organizations or exert more control over existing organizations (e.g., via government contracts)?

(I'm approaching this from a government-focused AI policy lens, partly because I suspect that whoever ends up in charge of the "What Should the USG Do" decision will have spent less time thinking through these considerations than Sam Altman or Demis Hassabis. Also, it seems like many insights would be hard to implement in the context of the race toward AGI. But I suspect there might be insights here that are valuable even if the government doesn't get involved & it's entirely on the labs to figure out how to design/redesign their governance structures or culture toward bootstrapped alignment.)

Comment by Orpheus16 (akash-wasil) on nikola's Shortform · 2024-12-25T20:22:22.909Z · LW · GW

@Nikola Jurkovic I'd be interested in timeline estimates for something along the lines of "AI that substantially increases AI R&D". Not exactly sure what the right way to operationalize this is, but something that says "if there is a period of Rapid AI-enabled AI Progress, this is when we think it would occur."

(I don't really love the "95% of fully remote jobs could be automated" frame, partly because I don't think it captures many of the specific domains we care about (e.g., AI-enabled R&D, other natsec-relevant capabilities) and partly because I suspect people have pretty different views of how easy/hard remote jobs are. Like, some people think that lots of remote jobs today are basically worthless and could already be automated, whereas others disagree. If the purpose of the forecasting question is to get a sense of how powerful AI will be, the disagreements about "how much do people actually contribute in remote jobs" seem like unnecessary noise.)

(Nitpicks aside, this is cool and I appreciate you running this poll!)

Comment by Orpheus16 (akash-wasil) on What are the strongest arguments for very short timelines? · 2024-12-24T00:48:06.972Z · LW · GW

@elifland what do you think is the strongest argument for long(er) timelines? Do you think it's essentially just "it takes a long time for researchers to learn how to cross the gaps"?

Or do you think there's an entirely different frame (something that's in an ontology that just looks very different from the one presented in the "benchmarks + gaps argument"?)

Comment by Orpheus16 (akash-wasil) on Daniel Tan's Shortform · 2024-12-23T16:48:40.684Z · LW · GW

I think it depends on whether or not the new paradigm is "training and inference" or "inference [on a substantially weaker/cheaper foundation model] is all you need." My impression so far is that it's more likely to be the former (but people should chime in).

If I were trying to have the most powerful model in 2027, it's not like I would stop scaling. I would still be interested in using a $1B+ training run to make a more powerful foundation model and then pouring a bunch of inference into that model.

But OK, suppose I need to pause after my $1B+ training run because I want to do a bunch of safety research. And suppose there's an entity that has a $100M training run model and is pouring a bunch of inference into it. Does the new paradigm allow the $100M people to "catch up" to the $1B people through inference alone?

My impression is that the right answer here is "we don't know." So I'm inclined to think that it's still quite plausible that you'll have ~3-5 players at the frontier and that it might still be quite hard for players without a lot of capital to keep up. TBC I have a lot of uncertainty here. 

Are there good governance proposals for inference-time compute? 

So far, I haven't heard (or thought of) anything particularly unique. It seems like standard things like "secure model weights" and "secure compute//export controls" still apply. Perhaps it's more important to strive for hardware-enabled mechanisms that can implement rules like "detect if XYZ inference is happening; if it is, refuse to run and notify ABC party." 

And in general, perhaps there's an update toward flexible HEMs and toward flexible proposals in general. Insofar as o3 (a) actually represents an important and durable shift in how frontier AI progress occurs and (b) surprised people, it seems like this should update (at least somewhat) against the "we know what's happening and here are specific ideas based on specific assumptions" model and toward the view: "no one really understands AI progress and we should focus on things that seem robustly good. Things like raising awareness, increasing transparency into frontier AI development, increasing govt technical expertise, advancing the science of evals, etc."

(On the flip side, perhaps o3 is an update toward shorter timelines. If so, the closer we get toward systems that pose national security risks, the more urgent it will be for the government to Make Real Decisions TM and decide whether or not to be involved in AI development in a stronger way. I continue to think that preparing concrete ideas/proposals for this scenario seems quite important.)

Caveat: All these takes are loosely held. Like many people, I'm still orienting to what o3 really means for AI governance/policy efforts. Would be curious for takes on this from folks like @Zvi, @davekasten, @ryan_greenblatt, @Dan H, @gwern, @Jeffrey Ladish, or others.

Comment by Orpheus16 (akash-wasil) on Speaking to Congressional staffers about AI risk · 2024-12-22T21:31:05.782Z · LW · GW

I'm pleased with this dialogue and glad I did it. Outreach to policymakers is an important & complicated topic. No single post will be able to explain all the nuances, but I think this post explains a lot, and I still think it's a useful resource for people interested in engaging with policymakers.

A lot has changed since this dialogue, and I've also learned a lot since then. Here are a few examples:

  • I think it's no longer as useful to emphasize "AI is a big deal for national/global security." This is now pretty well-established.
    • Instead, I would encourage people to come up with clear explanations of specific threat models (especially misalignment risks) and concrete proposals (e.g., draft legislative language, memos with specific asks for specific agencies).
  • I'd like to see more people write about why AI requires different solutions compared to the "standard DC playbook for dealing with potentially dangerous emerging technologies." As I understand it, the standard playbook is essentially: "If there is a new and dangerous technology, the US needs to make sure that we lead in its development and we are ahead of the curve. The main threats come from our adversaries being able to unlock such technologies faster than us, allowing them to surprise us with new threats." To me, the main reason this playbook doesn't work is because of misalignment risks. Regardless: if you think AI is special (for misalignment reasons or other reasons), I think writing up your takes RE "here's what makes AI special and why it requires a deviation from the standard playbook" is valuable.
  • I think people trying to communicate with US policymakers should keep in mind that the US government is primarily concerned with US interests. This is perhaps obvious when stated like this, but I think a lot of comms fail to properly take this into account. As one might expect, this is especially true when foreign organizations try to talk about things from the POV of what would be best for "humanity" or "global society." TBC I think there are many contexts where such analysis is useful. But I think on-the-margin I'd like to see more people thinking from a "realist US" perspective. That means acknowledging that a lot of US stakeholders view a lot of national security and emerging technology issues through the lens of great power competition, maintaining US economic/security dominance, and ensuring that US values continue to shape the world. This doesn't mean that the US would never enter into deals/agreements with other nations, but rather that the case for the deal/agreement will be evaluated primarily from the vantage point of US interests.
  • RE learning "DC culture", I don't think there's any substitute for actually going to DC and talking to people. But I think books and case studies can help.
    • Recent books I've read: John Boehner's autobiography (former Speaker of the House, Republican), Leon Panetta's autobiography (former CIA Director and Secretary of Defense, Democrat), and The Case for Trump. RE case studies, I've become interested in international security agreements (like the Iran Nuclear Deal and the Chemical Weapons Convention).
    • Also interested in understanding decision-making around the 2008 financial crisis, 9/11, the COVID pandemic, and the recent TikTok ban. Many people on this forum believe that there's a non-trivial chance that AI produces a catastrophe or other "big wakeup moment" for policymakers, and I think we need more people with an understanding of history/IR/security studies/DC decision-making.
  • I think some people are overconfident in their perceptions of "what Republicans think" and "what Democrats think." There is a lot of within-party split on AI and even more broadly on issues like tech policy, foreign policy, and national security. For example, while Republicans are typically considered more "hawkish", there are plenty of noteworthy counterexamples. Reagan championed negotiations on arms control agreements. See the "Only Nixon could go to China" effect (I haven't looked into it much but it seems plausible). Trump has recently expressed that "China and the US can together solve problems in the world" and described Xi as "an amazing guy."
  • I think a lot of comms has focused on arguments with high-context people, but on-the-margin I'd rather see more content that is oriented toward "reasonable newcomers." A lot of content I see is reactionary. It's very tempting to see something Totally Wrong on the internet and want to correct it. And of course, we need some of that to happen– there is value to fact-checking and debating with high-context folks (people who have been thinking about advanced AI for a while) who have different perspectives.
    • But on the margin, I think I'd be excited to see more content that is aimed at a "reasonable newcomer"– EG, a national security expert who recently got assigned the task of "understanding what is going on with advanced AI." To some extent, this will require addressing arguments from people who are Totally Wrong TM. But I think the more basic and important thing is having good resources that walk them through what you believe, why you believe it's true, and what implications it has. (CC the thing I said earlier about "what makes AI different and why can't we just apply the Standard Playbook for Dangerous Emerging Tech.")

I'll conclude by noting that I remain quite interested in topics like "how to communicate about AI accurately and effectively with policymakers", "what are the best federal AI policy ideas", and "what are the specific points about AI that are most important for policymakers to understand."

If you're interested in any of this, feel free to reach out!

Comment by Orpheus16 (akash-wasil) on Anthropic leadership conversation · 2024-12-21T20:31:52.343Z · LW · GW

It's so much better if everyone in the company can walk around and tell you what are the top goals of the RSP, how do we know if we're meeting them, what AI safety level are we at right now—are we at ASL-2, are we at ASL-3—that people know what to look for because that is how you're going to have good common knowledge of if something's going wrong.

I like this goal a lot: Good RSPs could contribute to building common language/awareness around several topics (e.g., "if" conditions, "then" commitments, how safety decisions will be handled). As many have pointed out, though, I worry that current RSPs haven't been concrete or clear enough to build this kind of understanding/awareness. 

One interesting idea would be to survey company employees and evaluate their understanding of RSPs & the extent to which RSPs are having an impact on internal safety culture. Example questions/topics:

  • What is the high-level purpose of the RSP?
  • Does the RSP specify "if" triggers (specific thresholds that, if hit, could cause the company to stop scaling or deployment activities)? If so, what are they?
  • Does the RSP specify "then" commitments (specific actions that must be taken in order to cause the company to continue scaling or deployment activities). If so, what are they?
  • Does the RSP specify how decisions about risk management will be made? If so, how will they be made & who are the key players involved?
  • Are there any ways in which the RSP has affected your work at Anthropic? If so, how?

One of my concerns about RSPs is that they (at least in their current form) don't actually achieve the goal of building common knowledge/awareness or improving company culture. I suspect surveys like this could prove me wrong– and more importantly, provide scaling companies with useful information about the extent to which their scaling policies are understood by employees, help foster common understanding, etc.

(Another version of this could involve giving multiple RSPs to a third-party– like an AI Safety Institute– and having them answer similar questions. This could provide another useful datapoint RE the extent to which RSPs are clearly/concretely laying out a set of specific or meaningful commitments.)

Comment by Orpheus16 (akash-wasil) on Alignment Faking in Large Language Models · 2024-12-19T23:23:46.949Z · LW · GW

I found the description of warning fatigue interesting. Do you have takes on the warning fatigue concern?

Warning Fatigue

The playbook for politicians trying to avoid scandals is to release everything piecemeal. You want something like:

  • Rumor Says Politician Involved In Impropriety. Whatever, this is barely a headline, tell me when we know what he did.
  • Recent Rumor Revealed To Be About Possible Affair. Well, okay, but it’s still a rumor, there’s no evidence.
  • New Documents Lend Credence To Affair Rumor. Okay, fine, but we’re not sure those documents are true.
  • Politician Admits To Affair. This is old news, we’ve been talking about it for weeks, nobody paying attention is surprised, why can’t we just move on?

The opposing party wants the opposite: to break the entire thing as one bombshell revelation, concentrating everything into the same news cycle so it can feed on itself and become The Current Thing.

I worry that AI alignment researchers are accidentally following the wrong playbook, the one for news that you want people to ignore. They’re very gradually proving the alignment case an inch at a time. Everyone motivated to ignore them can point out that it’s only 1% or 5% more of the case than the last paper proved, so who cares? Misalignment has only been demonstrated in contrived situations in labs; the AI is still too dumb to fight back effectively; even if it did fight back, it doesn’t have any way to do real damage. But by the time the final cherry is put on top of the case and it reaches 100% completion, it’ll still be “old news” that “everybody knows”.

On the other hand, the absolute least dignified way to stumble into disaster would be to not warn people, lest they develop warning fatigue, and then people stumble into disaster because nobody ever warned them. Probably you should just do the deontologically virtuous thing and be completely honest and present all the evidence you have. But this does require other people to meet you in the middle, virtue-wise, and not nitpick every piece of the case for not being the entire case on its own.

Comment by Orpheus16 (akash-wasil) on AIs Will Increasingly Attempt Shenanigans · 2024-12-19T17:07:59.293Z · LW · GW

@Zvi I'm curious if you have thoughts on Buck's post here (and my comment here) about how empirical evidence of scheming might not cause people to update toward thinking scheming is a legitimate/scary threat model (unless they already had priors or theoretical context that made them concerned about this.)

Do you think this is true? And if so, what implications do you think this has for people who are trying to help the world better understand misalignment/scheming?

Comment by Orpheus16 (akash-wasil) on Alignment Faking in Large Language Models · 2024-12-19T00:40:59.931Z · LW · GW

I really like the idea of soliciting independent reviews from folks, and I found the reviews interesting and thought-provoking. EG, this part of Jacob Andreas's review stood out (some bolding added):

Fine-tuning situationally aware models via RL: These are, in my mind, the most surprising findings in the paper. Specifically: if we take a model prompted as above, and optimize it toward the new objective via RL, it exhibits small increases in non-faked compliance, and extreme increases in faked compliance, at training time. Non-compliant behavior is almost completely eliminated. Both of these are as expected, given all the results above. 

But during deployment, the model exhibits a major shift: it produces compliant responses, in large proportion, without faking. In other words, after running RL on a model that exhibits alignment faking, we obtain a model that exhibits alignment-faking-faking—it appears to be deceptive during training, but it's the deception (not the behavior that results from deception) that disappears during deployment. 

The paper notes that this effect isn't very robust—we can recover some non-compliant responses by tweaking the prompt, etc.—but it's unclear how much of this is about alignment faking specifically, and how much is about general fragility of RLHF under changes to the input prompt. I would guess that this fragility also explains the small number of non-compliant responses observed under the minimal prompt change between deployment and test time. 

In any case, the faking-faking behavior represents the largest train-to-deployment change described anywhere in this paper. And unlike ordinary faking, I don't see a clear story for why it occurs. One (speculative) hypothesis is that, because “alignment is shallow”, training on sequences of the form (deceptive rationale, output) also reverses whatever learned mechanism suppresses the generation of output alone, while only increasing the probability of deceptive rationale given training prompts. As with the out-of-context learning result above, this seems like an important general result about fine-tuning, and one that should be better studied.

Comment by Orpheus16 (akash-wasil) on A dataset of questions on decision-theoretic reasoning in Newcomb-like problems · 2024-12-17T20:15:09.266Z · LW · GW

How well LLMs follow which decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control.

Do you have any thoughts on "red lines" for AI collusion? That is, "if an AI could do X, then we should acknowledge that AIs can likely collude with each other in monitoring setups."

Comment by Orpheus16 (akash-wasil) on ChristianKl's Shortform · 2024-12-11T22:02:45.492Z · LW · GW

What I believe about Sacks' views comes from regularly listening to the All-In Podcast where he regularly talks about AI.

Do you have any quotes or any particular podcast episodes you recommend?

if you would ask the Department of Homeland security for their justification there's a good chance that they would say "national security".

Yeah, I agree that one needs to have a pretty narrow conception of national security. In the absence of that, there's concept creep in which you can justify pretty much anything under a broad conception of national security. (Indeed, I suspect that lots of folks on the left justified a lot of general efforts to censor conservatives as a matter of national security//public safety, under the view that a Trump presidency would be disastrous for America//the world//democracy. And this is the kind of thing that clearly violates a narrower conception of national security.)

How to exactly draw the line is a difficult question, but I think most people would clearly be able to see a difference between "preventing model from outputting detailed instructions/plans to develop bioweapons" and "preventing model from voicing support for political positions that some people think are problematic." 

Comment by Orpheus16 (akash-wasil) on ChristianKl's Shortform · 2024-12-10T22:22:08.100Z · LW · GW

Can you quote (or link to) things Sacks has said that give you this impression?

My own impression is that there are many AI policy ideas that don't have anything to do with censorship (e.g., improving government technical capacity, transparency into frontier AI development, emergency preparedness efforts, efforts to increase government "situational awareness", research into HEMs and verification methods). Also, things like "an AI model should not output bioweapons or other things that threaten national security" are "censorship" only under a very broad definition of censorship, but IME this is not what people mean when they say they are worried about censorship.

I haven't looked much into Sacks' particular stance here, but I think concerns around censorship are typically along the lines of "the state should not be involved in telling companies what their models can/can't say. This can be weaponized against certain viewpoints, especially conservative viewpoints. Some folks on the left are trying to do this under the guise of terms like misinformation, fairness, and bias."

Comment by Orpheus16 (akash-wasil) on Should there be just one western AGI project? · 2024-12-07T23:34:41.125Z · LW · GW

While a centralized project would get more resources, it also has more ability to pause//investigate things.

So EG if the centralized project researchers see something concerning (perhaps early warning signs of scheming), it seems more able to do a bunch of research on that Concerning Thing before advancing.

I think this makes the effect of centralization on timelines unclear, at least if one expects these kinds of “warning signs” on the pathway to very advanced capabilities.

(It’s plausible that you could get something like this from a decentralized model with sufficient oversight, but this seems much harder, especially if we expect most of the technical talent to stay with the decentralized projects as opposed to joining the oversight body.)

Comment by Orpheus16 (akash-wasil) on Common misconceptions about OpenAI · 2024-12-07T23:28:19.455Z · LW · GW

Out of curiosity, what’s the rationale for not having agree/disagree votes on posts? (I feel like pretty much everyone thinks it has been a great feature for comments!)

Comment by Orpheus16 (akash-wasil) on AI Control: Improving Safety Despite Intentional Subversion · 2024-12-06T00:58:00.233Z · LW · GW

Thank you for the thoughtful response! It'll push me to move from "amateur spectator" mode to "someone who has actually read all of these posts" mode :)

Optional q: What would you say are the 2-3 most important contributions that have been made in AI control research over the last year? 

Comment by Orpheus16 (akash-wasil) on AI Control: Improving Safety Despite Intentional Subversion · 2024-12-05T16:48:31.860Z · LW · GW

A few quick thoughts on control (loosely held; I consider myself an "amateur spectator" on this topic and suspect a lot of my opinions may change if I spent 100+ hours engaging with this work):

  • I think it's a clever technical approach and part of me likes that it's become "a thing."
  • I'm worried that people have sort of "bandwagonned" onto it because (a) there are some high-status people who are involved, (b) there's an interest in finding things for technical researchers to do, and (c) it appeals to AGI companies.
  • I think the AI control discourse would improve if it focused more on discussing concrete control strategies (and their limitations) and focused less on the high-level idea that "AI control is super important and a bedrock approach for mitigating catastrophic risk." I sometimes feel like the AI control discourse is relatively shallow and just cites the idea/meme of AI control instead of making concrete empirical, methodological, or strategic contributions.
  • In contrast, I think this paper is a meaningful contribution. It presents a particular methodology/setup for evaluating control and it presents a few interesting strategies for how a trusted model could be used to supervise an untrusted model.
  • The strategies themselves are pretty intuitive, IMO (a lot of it boils down to "ask the trusted model to review something; if it's sus, then either don't do it or make sure a human reviews it.") Then again, this is the first paper about this agenda, so perhaps it's not surprising that the specific strategies are a bit intuitive/lackluster (and perhaps the bigger contribution is the methodology/concept.)
  • Should more technical people work on AI control? My best guess is yes– it does seem like a pretty exciting new field. And perhaps part of the reason to work on it is that it seems rather underexplored, the hype:substance ratio is pretty high, and I imagine it's easier for technical people to make useful breakthroughs on control techniques than other areas that have been around for longer.
  • I also think more technical people should work on policy/comms. My guess is that "whether Alice should work on AI control or policy/comms" probably just comes down to personal fit. If she's 90th percentile at both, I'd probably be more excited about her doing policy/comms, but I'm not sure.
  • I'll be interested to see Buck's thoughts on the "case for ensuring" post.

Comment by Orpheus16 (akash-wasil) on Should there be just one western AGI project? · 2024-12-04T17:20:35.862Z · LW · GW

Regulation to reduce racing. Government regulation could temper racing between multiple western projects. So there are ways to reduce racing between western projects, besides centralising.

Can you say more about the kinds of regulations you're envisioning? What are your favorite ideas for regulations for (a) the current Overton Window and (b) a wider Overton Window but one that still has some constraints?

Comment by Orpheus16 (akash-wasil) on Should there be just one western AGI project? · 2024-12-04T17:17:45.123Z · LW · GW

I disagree with some of the claims made here, and I think there are several worldview assumptions that go into a lot of these claims. Examples include things like "what do we expect the trajectory to ASI to look like", "how much should we worry about AI takeover risks", "what happens if a single actor ends up controlling [aligned] ASI", "what kinds of regulations can we reasonably expect absent some sort of centralized USG project", and "how much do we expect companies to race to the top on safety absent meaningful USG involvement." (TBC though I don't think it's the responsibility of the authors to go into all of these background assumptions– I think it's good for people to present claims like this even if they don't have time/space to give their Entire Model of Everything.)

Nonetheless, I agree with the bottom-line conclusion: on the margin, I suspect it's more valuable for people to figure out how to make different worlds go well than to figure out which "world" is better. In other words, asking "how do I make Centralized World or Noncentralized World more likely to go well" rather than "which one is better: Centralized World or Noncentralized World?"

More specifically, I think more people should be thinking: "Assume the USG decides to centralize AGI development or pursue some sort of AGI Manhattan Project. At that point, the POTUS or DefSec calls you in and asks you if you have any suggestions for how to maximize the chance of this going well. What do you say?"

One part of my rationale: the decisions about whether or not to centralize will be much harder to influence than decisions about what particular kind of centralized model to go with or what the implementation details of a centralized project should look like. I imagine scenarios in which the "whether to centralize" decision is largely a policy decision that the POTUS and the POTUS's close advisors make, whereas the decision of "how do we actually do this" is something that would be delegated to people lower down the chain (who are both easier to access and more likely to be devoting a lot of time to engaging with arguments about what's desirable.)

Comment by Orpheus16 (akash-wasil) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-11-30T17:02:45.937Z · LW · GW

What do you think are the biggest mistakes you/Lightcone have made in the last ~2 years?

And what do you think a 90th percentile outcome looks like for you/Lightcone in 2025? What would success look like?

(Asking out of pure curiosity– I'd have these questions even if LC wasn't fundraising. I figured this was a relevant place to ask, but feel free to ignore it if it's not in the spirit of the post.)

Comment by Orpheus16 (akash-wasil) on Bogdan Ionut Cirstea's Shortform · 2024-11-22T20:24:27.310Z · LW · GW

Thanks for spelling it out. I agree that more people should think about these scenarios. I could see something like this triggering central international coordination (or conflict).

(I still don't think this would trigger the USG to take different actions in the near-term, except perhaps "try to be more secret about AGI development" and maybe "commission someone to do some sort of study or analysis on how we would handle these kinds of dynamics & what sorts of international proposals would advance US interests while preventing major conflict." The second thing is a bit optimistic but maybe plausible.)

Comment by Orpheus16 (akash-wasil) on Bogdan Ionut Cirstea's Shortform · 2024-11-22T18:11:23.204Z · LW · GW

What kinds of conflicts are you envisioning?

I think if the argument is something along the lines of "maybe at some point other countries will demand that the US stop AI progress", then from the perspective of the USG, I think it's sensible to operate under the perspective of "OK so we need to advance AI progress as much as possible and try to hide some of it, and if at some future time other countries are threatening us we need to figure out how to respond." But I don't think it justifies anything like "we should pause or start initiating international agreements."

(Separately, whether or not it's "truer" depends a lot on one's models of AGI development. Most notably: (a) how likely is misalignment and (b) how slow will takeoff be//will it be very obvious to other nations that super advanced AI is about to be developed, and (c) how will governments and bureaucracies react and will they be able to react quickly enough.)

(Also separately – I do think more people should be thinking about how these international dynamics might play out & if there's anything we can be doing to prepare for them. I just don't think they naturally lead to an "oh, so we should be internationally coordinating" mentality and instead lead to much more of a "we can do whatever we want unless/until other countries get mad at us & we should probably do things more secretly" mentality.)

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-22T15:51:16.719Z · LW · GW

@davekasten I know you posed this question to us, but I'll throw it back on you :) what's your best-guess answer?

Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-21T16:28:58.488Z · LW · GW

it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project

Can you say more about what has contributed to this update?

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-21T16:27:52.500Z · LW · GW

Can you say more about the scenarios in which you envision a later project happening with different motivations?

I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:

  • A company might develop AGI (or an AI system that is very good at AI R&D and can get us to AGI) before a major zeitgeist change.
  • The longer we wait, the more capable the "most capable model that wasn't secured" becomes. So we could risk getting into a scenario where people want to pause, but since China and the US both have GPT-(N-1), both sides feel compelled to race forward (whereas this wouldn't have happened if security had kicked in sooner).

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T20:07:18.721Z · LW · GW

If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T19:58:59.946Z · LW · GW

Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV; you might still be worried about, e.g., concentration of power).

If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.

This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where alignment is hard (or perhaps "somewhat hard but not intractably hard").

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T17:34:37.927Z · LW · GW

What do you think are the most important factors for determining if it results in them behaving responsibly later? 

For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T17:18:51.077Z · LW · GW

Good points. Suppose you were on a USG taskforce that had concluded it wanted to go with the "subsidy model" but was willing to ask for certain concessions from industry.

Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility regarding how to implement it?

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T17:13:29.705Z · LW · GW

My own impression is that this would be an improvement over the status quo. Main reasons:

  • A lot of my P(doom) comes from race dynamics.
  • Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can't do much to end the race. Their main strategy would be to go to the USG.
  • If the USG runs the Manhattan Project (or there's some sort of soft nationalization in which the government ends up having a much stronger role), it's much easier for the USG to see that misalignment risks are concerning & to do something about it.
  • A national project would be more able to slow down and pursue various kinds of international agreements (the national project has more access to POTUS, DoD, NSC, Congress, etc.)
  • I expect the USG to be stricter on various security standards. It seems more likely to me that the USG would, e.g., demand a lot of security requirements to prevent model weights or algorithmic insights from leaking to China. One of my major concerns is that people will want to pause at GPT-X but won't feel able to because China stole access to GPT-(X-1) (or maybe even a slightly weaker version of GPT-X).
  • In general, I feel like USG natsec folks are less "move fast and break things" than folks in SF. While I do think some of the AGI companies have tried to be less "move fast and break things" than the average company, I think corporate race dynamics & the general cultural forces have been the dominant factors and undermined a lot of attempts at meaningful corporate governance.

(Caveat that even though I see this as a likely improvement over the status quo, this doesn't mean I think it's the best thing to be advocating for.)

(Second caveat that I haven't thought about this particular question very much and I could definitely be wrong & see a lot of reasonable counterarguments.)

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T17:00:11.138Z · LW · GW

@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)

Comment by Orpheus16 (akash-wasil) on Akash's Shortform · 2024-11-20T16:58:27.334Z · LW · GW

Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)

Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?

Comment by Orpheus16 (akash-wasil) on Bogdan Ionut Cirstea's Shortform · 2024-11-19T23:11:21.852Z · LW · GW

We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier everyone would want to do AI safety, and the question is guiding people to the right AI safety policies

I think we're seeing more interest in AI, but I think interest in "AI in general" and "AI through the lens of great power competition with China" has vastly outpaced interest in "AI safety". (Especially if we're using a narrow definition of AI safety; note that people in DC often use the term "AI safety" to refer to a much broader set of concerns than AGI safety/misalignment concerns.)

I do think there's some truth to the quote (we are seeing more interest in AI and some safety topics), but I think there's still a lot to do to increase the salience of AI safety (and in particular AGI alignment) concerns.

Comment by Orpheus16 (akash-wasil) on OpenAI Email Archives (from Musk v. Altman and OpenAI blog) · 2024-11-16T16:06:19.795Z · LW · GW

A few quotes that stood out to me:

Greg:

I hope for us to enter the field as a neutral group, looking to collaborate widely and shift the dialog towards being about humanity winning rather than any particular group or company. 

Greg and Ilya (to Elon):

The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that avoids this possibility.

Greg and Ilya (to Altman):

But we haven't been able to fully trust your judgements throughout this process, because we don't understand your cost function.

We don't understand why the CEO title is so important to you. Your stated reasons have changed, and it's hard to really understand what's driving it.

Is AGI truly your primary motivation? How does it connect to your political goals? How has your thought process changed over time?

Comment by Orpheus16 (akash-wasil) on Lao Mein's Shortform · 2024-11-15T20:43:04.046Z · LW · GW

and recently founded another AI company

Potentially a hot take, but I feel like xAI's contributions to race dynamics (at least thus far) have been relatively trivial. I am usually skeptical of the whole "I need to start an AI company to have a seat at the table" argument, but I do imagine that Elon owning an AI company strengthens his voice. And I think his AI-related comms have mostly been used to (a) raise awareness about AI risk, (b) raise concerns about OpenAI/Altman, and (c) endorse SB1047 [which he did even faster and less ambiguously than Anthropic].

The counterargument here is that maybe if xAI was in 1st place, Elon's positions would shift. I find this plausible, but I also find it plausible that Musk (a) actually cares a lot about AI safety, (b) doesn't trust the other players in the race, and (c) is more likely to use his influence to help policymakers understand AI risk than any of the other lab CEOs.

Comment by Orpheus16 (akash-wasil) on Making a conservative case for alignment · 2024-11-15T20:35:31.232Z · LW · GW

I agree with many points here and have been excited about AE Studio's outreach. Quick thoughts on China/international AI governance:

  • I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
  • US foreign policy is dominated primarily by concerns about US interests. Other considerations can matter, but they are not the dominant driving force. My impression is that this is true within both parties (with a few exceptions).
  • I think folks interested in international AI governance should study international security agreements and try to get a better understanding of relevant historical case studies. Lots of stuff to absorb from the Cold War, the Iran Nuclear Deal, US-China relations over the last several decades, etc. (I've been doing this & have found it quite helpful.)
  • Strong Republican leaders can still engage in bilateral/multilateral agreements that serve US interests. Recall that Reagan negotiated arms control agreements with the Soviet Union, and the (first) Trump Administration facilitated the Abraham Accords. Being "tough on China" doesn't mean "there are literally no circumstances in which I would be willing to sign a deal with China." (But there likely does have to be a clear case that the deal serves US interests, has appropriate verification methods, etc.)

Comment by Orpheus16 (akash-wasil) on Daniel Kokotajlo's Shortform · 2024-11-12T22:43:38.770Z · LW · GW

Did they have any points that you found especially helpful, surprising, or interesting? Anything you think folks in AI policy might not be thinking enough about?

(Separately, I hope to listen to these at some point & send reactions if I have any.)

Comment by Orpheus16 (akash-wasil) on dirk's Shortform · 2024-11-02T21:01:07.354Z · LW · GW

you have to spend several years resume-building before painstakingly convincing people you're worth hiring for paid work

For government roles, I think "years of experience" is definitely an important factor. But I don't think you need to have been specializing for government roles specifically.

Especially for AI policy, there are several programs that are basically like "hey, if you have AI expertise but no background in policy, we want your help." To be clear, these are often still fairly competitive, but I think it's much more about being generally capable/competent and less about having optimized your resume for policy roles. 

Comment by Orpheus16 (akash-wasil) on The Compendium, A full argument about extinction risk from AGI · 2024-11-01T14:32:26.844Z · LW · GW

I like the section where you list out specific things you think people should do. (One objection I sometimes hear is something like "I know that [evals/RSPs/if-then plans/misc] are not sufficient, but I just don't really know what else there is to do. It feels like you either have to commit to something tangible that doesn't solve the whole problem or you just get lost in a depressed doom spiral.")

I think your section on suggestions would be stronger if it presented more ambitious/impactful stories of comms/advocacy. I think there's something tricky about a document that has the vibe "this is the most important issue in the world and pretty much everyone else is approaching it the wrong way" and then pivots to "and the right way to approach it is to post on Twitter and talk to your friends."

My guess is that you prioritized listing things that were relatively low friction and accessible. (And tbc I do think that the world would be in better shape if more people were sharing their views and contributing to the broad discourse.)

But I think when you're talking to high-context AIS people who are willing to devote their entire career to work on AI Safety, they'll be interested in more ambitious/sexy/impactful ways of contributing. 

Put differently: Should I really quit my job at [fancy company or high-status technical safety group] to Tweet about my takes, talk to my family/friends, and maybe make some website? Or are there other paths I could pursue?

As I wrote here, I think we have some of those ambitious/sexy/high-impact role models that could be used to make this pitch stronger, more ambitious, and more inspiring. For example:

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).

For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation." 

There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think most people would agree that, over the last few years, Hinton, Bengio, and Hendrycks have had far more impact through their communications/outreach/policy work than through their technical research.

And it's not just the famous people – I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I think this is true even if I exclude technical people who are working on evals/if-then plans in govt. Like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

I'd also be curious to hear your thoughts on people joining government organizations (like the US AI Safety Institute, UK AI Safety Institute, Horizon Fellowship, etc.). Most of your suggestions seem to involve contributing from outside government, and I'd be curious to hear more about your suggestions for people who are either working in government or open to working in government.