Posts

Akash's Shortform 2024-04-18T15:44:25.096Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
OpenAI's Preparedness Framework: Praise & Recommendations 2024-01-02T16:20:04.249Z
Speaking to Congressional staffers about AI risk 2023-12-04T23:08:52.055Z
Navigating emotions in an uncertain & confusing world 2023-11-20T18:16:09.492Z
International treaty for global compute caps 2023-11-09T18:17:04.952Z
Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost] 2023-11-01T13:28:43.723Z
Winners of AI Alignment Awards Research Contest 2023-07-13T16:14:38.243Z
AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI 2023-05-30T11:52:31.669Z
AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI 2023-05-23T21:47:34.755Z
Eisenhower's Atoms for Peace Speech 2023-05-17T16:10:38.852Z
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control 2023-05-16T15:14:45.921Z
AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models 2023-05-09T15:26:55.978Z
AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks 2023-05-02T18:41:43.144Z
Discussion about AI Safety funding (FB transcript) 2023-04-30T19:05:34.009Z
Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous) 2023-04-25T18:49:29.042Z
DeepMind and Google Brain are merging [Linkpost] 2023-04-20T18:47:23.016Z
AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media 2023-04-18T18:44:35.923Z
Request to AGI organizations: Share your views on pausing AI progress 2023-04-11T17:30:46.707Z
AI Safety Newsletter #1 [CAIS Linkpost] 2023-04-10T20:18:57.485Z
Reliability, Security, and AI risk: Notes from infosec textbook chapter 1 2023-04-07T15:47:16.581Z
New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development 2023-04-05T01:26:51.830Z
[Linkpost] Critiques of Redwood Research 2023-03-31T20:00:09.784Z
What would a compute monitoring plan look like? [Linkpost] 2023-03-26T19:33:46.896Z
The Overton Window widens: Examples of AI risk in the media 2023-03-23T17:10:14.616Z
The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments 2023-03-20T20:44:29.445Z
[Linkpost] Scott Alexander reacts to OpenAI's latest post 2023-03-11T22:24:39.394Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
AI Governance & Strategy: Priorities, talent gaps, & opportunities 2023-03-03T18:09:26.659Z
Fighting without hope 2023-03-01T18:15:05.188Z
Qualities that alignment mentors value in junior researchers 2023-02-14T23:27:40.747Z
4 ways to think about democratizing AI [GovAI Linkpost] 2023-02-13T18:06:41.208Z
How evals might (or might not) prevent catastrophic risks from AI 2023-02-07T20:16:08.253Z
[Linkpost] Google invested $300M in Anthropic in late 2022 2023-02-03T19:13:32.112Z
Many AI governance proposals have a tradeoff between usefulness and feasibility 2023-02-03T18:49:44.431Z
Talk to me about your summer/career plans 2023-01-31T18:29:23.351Z
Advice I found helpful in 2022 2023-01-28T19:48:23.160Z
11 heuristics for choosing (alignment) research projects 2023-01-27T00:36:08.742Z
"Status" can be corrosive; here's how I handle it 2023-01-24T01:25:04.539Z
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution 2023-01-21T16:51:09.586Z
Wentworth and Larsen on buying time 2023-01-09T21:31:24.911Z
[Linkpost] Jan Leike on three kinds of alignment taxes 2023-01-06T23:57:34.788Z
My thoughts on OpenAI's alignment plan 2022-12-30T19:33:15.019Z
An overview of some promising work by junior alignment researchers 2022-12-26T17:23:58.991Z
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic 2022-12-20T21:39:41.866Z
12 career-related questions that may (or may not) be helpful for people interested in alignment research 2022-12-12T22:36:21.936Z
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas 2022-11-25T20:47:09.832Z
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility 2022-11-22T22:19:09.419Z
Ways to buy time 2022-11-12T19:31:10.411Z
Instead of technical research, more people should focus on buying time 2022-11-05T20:43:45.215Z

Comments

Comment by Akash (akash-wasil) on MATS Winter 2023-24 Retrospective · 2024-05-12T01:14:57.773Z · LW · GW

Thank you for explaining the shift from scholar support to research management; I found that quite interesting, and I don't think I would've intuitively assumed that the research management frame would be more helpful.

I do wonder if as the summer progresses, the role of the RM should shift from writing the reports for mentors to helping the fellows prepare their own reports for mentors. IMO, fellows getting into the habit of providing these updates & learning how to “manage up” when it comes to mentors seems important. I suspect something in the cluster of “being able to communicate well with mentors//manage your mentor+collaborator relationships” is one of the most important “soft skills” for research success. I suspect a transition from “tell your RM things that they include in their report” to “work with your RM to write your own report” would help instill this skill.

Comment by Akash (akash-wasil) on MATS Winter 2023-24 Retrospective · 2024-05-12T01:02:45.972Z · LW · GW

Somewhat striking that the top 3 orgs on the career interest survey are Anthropic, DeepMind, and OpenAI.

I personally suspect that these are not the most impactful places for most MATS scholars to work (relative to, say, UKAISI/USAISI, METR, or starting new orgs/projects).

Regardless, curious if you have any thoughts on this & if it reflects anything about the culture/epistemics in MATS.

(And to be clear, I think the labs do have alignment teams that care about making progress & I suspect that there are some cases where joining a frontier lab alignment team is the most impactful thing for a scholar.)

Comment by Akash (akash-wasil) on Introducing AI Lab Watch · 2024-05-09T20:55:51.265Z · LW · GW

Could consider “frontier AI watch”, “frontier AI company watch”, or “AGI watch.”

Most people in the world (including policymakers) have a much broader conception of AI. AI means machine learning, AI is the thing that 1000s of companies are using and 1000s of academics are developing, etc etc.

Comment by Akash (akash-wasil) on RobertM's Shortform · 2024-05-08T21:29:09.244Z · LW · GW

I haven't followed this in great detail, but I do remember hearing from many AI policy people (including people at the UKAISI) that such commitments had been made.

It's plausible to me that this was an example of "miscommunication" rather than "explicit lying." I hope someone who has followed this more closely provides details.

But note that I personally think that AGI labs have a responsibility to dispel widely-believed myths. It would shock me if OpenAI/Anthropic/Google DeepMind were not aware that people (including people in government) believed that they had made this commitment. If you know that a bunch of people think you committed to sending them your models, and your response is "well technically we never said that but let's just leave it ambiguous and then if we defect later we can just say we never committed", I still think it's fair for people to be disappointed in the labs.

(I do think this form of disappointment should not be conflated with "you explicitly said X and went back on it", though.)

Comment by Akash (akash-wasil) on Introducing AI Lab Watch · 2024-05-05T18:24:04.242Z · LW · GW

There should be points for how the organizations act wrt legislation. In the SB 1047 bill that CAIS co-sponsored, we've noticed that some AI companies are much more antagonistic than others. I think this is probably a larger differentiator for an organization's goodness or badness.

@Dan H are you able to say more about which companies were most/least antagonistic?

Comment by Akash (akash-wasil) on Buck's Shortform · 2024-05-03T15:57:45.015Z · LW · GW

It is pretty plausible to me that AI control is quite easy

I think it depends on how you're defining an "AI control success". If success is defined as "we have an early transformative system that does not instantly kill us– we are able to get some value out of it", then I agree that this seems relatively easy under the assumptions you articulated.

If success is defined as "we have an early transformative system that does not instantly kill us and we have enough time, caution, and organizational adequacy to use that system in ways that get us out of an acute risk period", then this seems much harder.

The classic race dynamic threat model seems relevant here: Suppose Lab A implements good control techniques on GPT-8, and then it's trying very hard to get good alignment techniques out of GPT-8 to align a successor GPT-9. However, Lab B was only ~2 months behind, so Lab A feels like it needs to figure all of this out within 2 months. Lab B– either because it's less cautious or because it feels like it needs to cut corners to catch up– either doesn't want to implement the control techniques at all, or is fine implementing them but plans to be less cautious about when it's ready to scale up to GPT-9.

I think it's fine to say "the control agenda is valuable even if it doesn't solve the whole problem, and yes other things will be needed to address race dynamics otherwise you will only be able to control GPT-8 for a small window of time before you are forced to scale up prematurely or hope that your competitor doesn't cause a catastrophe." But this has a different vibe than "AI control is quite easy", even if that statement is technically correct.

(Also, please do point out if there's some way in which the control agenda "solves" or circumvents this threat model– apologies if you or Ryan has written/spoken about it somewhere that I missed.)

Comment by Akash (akash-wasil) on Questions for labs · 2024-05-02T20:15:27.322Z · LW · GW

Right now, I think one of the most credible ways for a lab to show its commitment to safety is through its engagement with governments.

I didn’t mean to imply that a lab should automatically be considered “bad” if its public advocacy and its private advocacy differ.

However, when assessing how “responsible” various actors are, I think investigating questions relating to their public comms, engagement with government, policy proposals, lobbying efforts, etc would be valuable.

If Lab A had slightly better internal governance but Lab B had better effects on “government governance”, I would say that Lab B is more “responsible” on net.

Comment by Akash (akash-wasil) on Questions for labs · 2024-05-02T16:36:52.877Z · LW · GW

@Zach Stein-Perlman, great work on this. I would be interested in you brainstorming some questions that have to do with the labs' stances toward (government) AI policy interventions.

After a quick 5 min brainstorm, here are some examples of things that seem relevant:

  • I remember hearing that OpenAI lobbied against the EU AI Act– what's up with that?
  • I heard a rumor that Congresspeople and their teams reached out to Sam/OpenAI after his testimony. They allegedly asked for OpenAI's help to craft legislation around licensing, and then OpenAI refused. Is that true? 
  • Sam said we might need an IAEA for AI at some point– what did he mean by this? At what point would he see that as valuable?
  • In general, what do labs think the US government should be doing? What proposals would they actively support or even help bring about? (Flagging ofc that there are concerns about actual and perceived regulatory capture, but there are also major advantages to having industry players support & contribute to meaningful regulation).
  • Senator Cory Booker recently asked Jack Clark something along the lines of "what is your top policy priority right now//what would you do if you were a Senator." Jack responded with something along the lines of "I would make sure the government can deploy AI successfully. We need a testing regime to better understand risks, but the main risk is that we don't use AI enough, and we need to make sure we stay at the cutting edge." What's up with that?
  • Why haven't Dario and Jack made public statements about specific government interventions? Do they believe that there are some circumstances under which a moratorium would need to be implemented, labs would need to be nationalized (or internationalized), or something else would need to occur to curb race dynamics? (This could be asked of any of the lab CEOs/policy team leads– I don't mean to be picking on Anthropic, though I think Sam/OpenAI have had more public statements here, and I think the other labs are scoring more poorly across the board//don't fully buy into the risks in the first place.)
  • Big tech is spending a lot of money on AI lobbying. How much is each lab spending (this is something you can estimate with publicly available data), and what are they actually lobbying for/against?

I imagine there's a lot more in this general category of "labs and how they are interacting with governments and how they are contributing to broader AI policy efforts", and I'd be excited to see AI Lab Watch (or just you) dive into this more.

Comment by Akash (akash-wasil) on Introducing AI Lab Watch · 2024-05-02T00:20:06.907Z · LW · GW

I can imagine this growing into the default reference that people use when talking about whether labs are behaving responsibly.

I hope that this resource is used as a measure of relative responsibleness, and this doesn't get mixed up with absolute responsibleness. My understanding is that the resource essentially says, "here's some things that would be good– let's see how the labs compare on each dimension." The resource is not saying "if a lab gets a score above X% on each metric, then we are quite confident that the lab will not cause an existential catastrophe."

Moreover, my understanding is that the resource is not taking a position on whether or not it is "responsible"– in some absolute sense– for a lab to be scaling toward AGI in our current world. I see the resource as saying "conditional on a lab scaling toward AGI, are they doing so in a way that is relatively more/less responsible compared to the others that are scaling toward AGI."

This might be a pedantic point, but I think it's an important one to emphasize– a lab can score in 1st place and still present a risk to humanity that reasonable people would deem unacceptable & irresponsible (or put differently, a lab can score in 1st place and still produce a catastrophe).

Comment by Akash (akash-wasil) on tlevin's Shortform · 2024-05-01T15:15:17.582Z · LW · GW

Agree with lots of this– a few misc thoughts [hastily written]:

  1. I think the Overton Window frame ends up getting people to focus too much on the dimension "how radical is my ask"– in practice, things are usually much more complicated than this. In my opinion, a preferable frame is something like "who is my target audience and what might they find helpful." If you're talking to someone who makes it clear that they will not support X, it's silly to keep on talking about X. But I think the "target audience first" approach ends up helping people reason in a more sophisticated way about what kinds of ideas are worth bringing up. As an example, in my experience so far, many policymakers are curious to learn more about intelligence explosion scenarios and misalignment scenarios (the more "radical" and "speculative" threat models). 
  2. I don't think it's clear that the more effective actors in DC tend to be those who look for small wins. Outside of the AIS community, there sure do seem to be a lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate. Whether or not these organizations end up having more or less influence than the more "centrist" groups is, in my view, not a settled question & probably varies a lot by domain. In AI safety in particular, I think my main claim is something like "pretty much no group– whether radical or centrist– has had tangible wins. When I look at the small set of tangible wins, it seems like the groups involved were across the spectrum of 'reasonableness'."
  3. The more I interact with policymakers, the more I'm updating toward something like "poisoning the well doesn't come from having radical beliefs– poisoning the well comes from lamer things like being dumb or uninformed, wasting peoples' time, not understanding how the political process works, not having tangible things you want someone to do, explaining ideas poorly, being rude or disrespectful, etc." I've asked ~20-40 policymakers (outside of the AIS bubble) things like "what sorts of things annoy you about meetings" or "what tends to make meetings feel like a waste of your time", and no one ever says "people come in with ideas that are too radical." The closest thing I've heard is people saying that they dislike it when groups fail to understand why things aren't able to happen (like, someone comes in thinking their idea is great, but then they fail to understand that their idea needs approval from committee A and appropriations person B, and then they're upset that things are moving slowly). It seems to me like many policy folks (especially staffers and exec branch subject experts) are genuinely interested in learning more about the beliefs and worldviews that have been prematurely labeled as "radical" or "unreasonable" (or perhaps such labels were appropriate before ChatGPT but no longer are).
  4. A reminder that those who are opposed to regulation have strong incentives to make it seem like basically-any-regulation is radical/unreasonable. An extremely common tactic is for industry and its allies to make common-sense regulation seem radical/crazy/authoritarian & argue that actually the people proposing strong policies are just making everyone look bad & argue that actually we should all rally behind [insert thing that isn't a real policy.] (I admit this argument is a bit general, and indeed I've made it before, so I won't harp on it here. Also I don't think this is what Trevor is doing– it is indeed possible to raise serious discussions about "poisoning the well" even if one believes that the cultural and economic incentives disproportionately elevate such points).
  5. In the context of AI safety, it seems to me like the most high-influence Overton Window moves have been positive– and in fact I would go as far as to say strongly positive. Examples that come to mind include the CAIS statement, FLI pause letter, Hinton leaving Google, Bengio's writings/speeches about rogue AI & loss of control, Ian Hogarth's piece about the race to god-like AI, and even Yudkowsky's TIME article. 
  6. I think some of our judgments here depend on underlying threat models and an underlying sense of optimism vs. pessimism. If one thinks that labs making voluntary agreements/promises and NIST contributing to the development of voluntary standards are quite excellent ways to reduce AI risk, then the groups that have helped make this happen deserve a lot of credit. If one thinks that much more is needed to meaningfully reduce xrisk, then the groups that are raising awareness about the nature of the problem, making high-quality arguments about threat models, and advocating for stronger policies deserve a lot of credit.

I agree that more research on this could be useful. But I think it would be most valuable to focus less on "is X in the Overton Window" and more on "is X written/explained well and does it seem to have clear implications for the target stakeholders?" 

Comment by Akash (akash-wasil) on AI Regulation is Unsafe · 2024-04-25T00:58:41.196Z · LW · GW

I'm not sure who you've spoken to, but at least among the AI policy people who I talk to regularly (which admittedly is a subset of people who I think are doing the most thoughtful/serious work), I think nearly all of them have thought about ways in which regulation + regulatory capture could be net negative. At least to the point of being able to name the relatively "easy" ways (e.g., governments being worse at alignment than companies).

I continue to think people should be forming alliances with those who share similar policy objectives, rather than simply those who belong in the "I believe xrisk is a big deal" camp. I've seen many instances in which the "everyone who believes xrisk is a big deal belongs to the same camp" mentality has been used to dissuade people from communicating their beliefs, communicating with policymakers, brainstorming ideas that involve coordination with other groups in the world, disagreeing with the mainline views held by a few AIS leaders, etc.

The cultural pressures against policy advocacy have been so strong that it's not surprising to see folks say things like "perhaps our groups are no longer natural allies" now that some of the xrisk-concerned people are beginning to say things like "perhaps the government should have more of a say in how AGI development goes than in the status quo, where the government has played ~0 role and ~all decisions have been made by private companies."

Perhaps there's a multiverse out there in which the AGI community ended up attracting govt natsec folks instead of Bay Area libertarians, and the cultural pressures are flipped. Perhaps in that world, the default cultural incentives pushed people toward heavily brainstorming ways that markets and companies could contribute meaningfully to the AGI discourse, and the default position for the "AI risk is a big deal" camp was "well obviously the government should be able to decide what happens and it would be ridiculous to get companies involved– don't be unilateralist by going and telling VCs about this stuff."

I bring up this (admittedly kinda weird) hypothetical to point out just how skewed the status quo is. One might generally be wary of government overinvolvement in regulating emerging technologies yet still recognize that some degree of regulation is useful, and that position would likely still push them to be in the "we need more regulation than we currently have" camp.

As a final note, I'll point out to readers less familiar with the AI policy world that serious people are proposing lots of regulation that is in between "status quo with virtually no regulation" and "full-on pause." Some of my personal favorite examples include: emergency preparedness (akin to the OPPR), licensing (see Romney), reporting requirements, mandatory technical standards enforced via regulators, and public-private partnerships.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-04-24T20:14:37.346Z · LW · GW

I'm interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion. 

If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.

Comment by Akash (akash-wasil) on Express interest in an "FHI of the West" · 2024-04-19T00:46:02.968Z · LW · GW

To what extent would the organization be factoring in transformative AI timelines? It seems to me like the kinds of questions one would prioritize in a "normal period" look very different than the kinds of questions that one would prioritize if they place non-trivial probability on "AI may kill everyone in <10 years" or "AI may become better than humans on nearly all cognitive tasks in <10 years."

I ask partly because I personally would be more excited about a version of this that wasn't ignoring AGI timelines, but I think a version of this that's not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.

More generally, perhaps it would be good for you to describe some ways in which you expect this to be different from FHI. I think that calling it the FHI of the West, the explicit statement that it would have the intellectual tradition of FHI, and the announcement right when FHI dissolves might make it seem like "I want to copy FHI" as opposed to "OK obviously I don't want to copy it entirely, I just want to draw on some of its excellent intellectual/cultural components." If your vision is the latter, I'd find it helpful to see a list of things that you expect to be similar/different.

Comment by Akash (akash-wasil) on peterbarnett's Shortform · 2024-04-19T00:25:23.573Z · LW · GW

I would strongly suggest considering hires who would be based in DC (or who would hop between DC and Berkeley). In my experience, being in DC (or being familiar with DC & having a network in DC) is extremely valuable for being able to shape policy discussions, know what kinds of research questions matter, know what kinds of things policymakers are paying attention to, etc.

I would go as far as to say something like "in 6 months, if MIRI's technical governance team has not achieved very much, one of my top 3 reasons for why MIRI failed would be that they did not engage enough with DC people//US policy people. As a result, they focused too much on questions that Bay Area people are interested in and too little on questions that Congressional offices and executive branch agencies are interested in. And relatedly, they didn't get enough feedback from DC people. And relatedly, even the good ideas they had didn't get communicated frequently enough or fast enough to relevant policymakers. And relatedly... etc etc."

I do understand this trades off against everyone being in the same place, which is a significant factor, but I think the cost is worth it. 

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-04-19T00:18:28.934Z · LW · GW

I do think evaporative cooling is a concern, especially if everyone (or a very significant fraction of people) left. But I think on the margin more people should be leaving to work in govt.

I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more "cool/fun/fast-paced", lots of govt jobs force you to move locations, etc.)

I also think it depends on the specific lab– e.g., in light of the recent OpenAI departures, I suspect there's a stronger case for staying at OpenAI right now than at DeepMind or Anthropic.

Comment by Akash (akash-wasil) on AI #60: Oh the Humanity · 2024-04-18T15:48:33.702Z · LW · GW

Daniel Kokotajlo has quit OpenAI

I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.

I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone on a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."

My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work. 

I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.

There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.

Comment by Akash (akash-wasil) on Akash's Shortform · 2024-04-18T15:44:25.830Z · LW · GW

I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.

I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone on a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."

My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work. 

I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.

There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.

Written on a Slack channel in response to discussions about some folks leaving OpenAI. 

Comment by Akash (akash-wasil) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T14:27:02.108Z · LW · GW

I think this should be broken down into two questions:

  1. Before the EO, if we were asked to figure out where this kind of evals should happen, what institution would we pick & why?
  2. After the EO, where does it make sense for evals-focused people to work?

I think the answer to #1 is quite unclear. I personally think that there was a strong case that a natsec-focused USAISI could have been given to DHS or DoE or some interagency thing. In addition to the point about technical expertise, it does seem relatively rare for Commerce/NIST to take on something that is so natsec-focused. 

But I think the answer to #2 is pretty clear. The EO clearly tasks NIST with this role, and now I think our collective goal should be to try to make sure NIST can execute as effectively as possible. Perhaps there will be future opportunities to establish new places for evals work, alignment work, risk monitoring and forecasting work, emergency preparedness planning, etc etc. But for now, whether we think it was the best choice or not, NIST/USAISI are clearly the folks who are tasked with taking the lead on evals + standards.

Comment by Akash (akash-wasil) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-16T20:48:23.535Z · LW · GW

I'm excited to see how Paul performs in the new role. He's obviously very qualified on a technical level, and I suspect he's one of the best people for the job of designing and conducting evals.

I'm more uncertain about the kind of influence he'll have on various AI policy and AI national security discussions. And I mean uncertain in the genuine "this could go so many different ways" kind of way. 

Like, it wouldn't be particularly surprising to me if any of the following occurred:

  • Paul focuses nearly all of his efforts on technical evals and doesn't get very involved in broader policy conversations
  • Paul is regularly asked to contribute to broader policy discussions, and he advocates for RSPs and other forms of voluntary commitments.
  • Paul is regularly asked to contribute to broader policy discussions, and he advocates for requirements that go beyond voluntary commitments and are much more ambitious than what he advocated for when he was at ARC.
  • Paul is regularly asked to contribute to broader policy discussions, and he's not very good at communicating his beliefs in ways that are clear/concise/policymaker-friendly, so his influence on policy discussions is rather limited.
  • Paul [is/isn't] able to work well with others who have very different worldviews and priorities.

Personally, I see this as a very exciting opportunity for Paul to form an identity as a leader in AI policy. I'm guessing the technical work will be his priority (and indeed, it's what he's being explicitly hired to do), but I hope he also finds ways to just generally improve the US government's understanding of AI risk and the likelihood of implementing reasonable policies. On the flipside, I hope he doesn't settle for voluntary commitments (especially as the Overton Window shifts) & I hope he's clear/open about the limitations of RSPs.

More specifically, I hope he's able to help policymakers reason about a critical question: what do we do after we've identified models with (certain kinds of) dangerous capabilities? I think the underlying logic behind RSPs could actually be somewhat meaningfully applied to USG policy. Like, I think we would be in a safer world if the USG had an internal understanding of ASL levels, took seriously the possibility of various dangerous capabilities thresholds being crossed, took seriously the idea that AGI/ASI could be developed soon, and had preparedness plans in place that allowed them to react quickly in the event of a sudden risk. 

Anyways, a big congratulations to Paul, and definitely some evidence that the USAISI is capable of hiring some technical powerhouses. 

Comment by Akash (akash-wasil) on What convincing warning shot could help prevent extinction from AI? · 2024-04-16T20:22:41.405Z · LW · GW

I'll admit I have only been loosely following the control stuff, but FWIW I would be excited about a potential @peterbarnett & @ryan_greenblatt dialogue in which you two try to identify & analyze any potential disagreements. Example questions:

  • What is the most capable system that you think we are likely to be able to control?
  • What kind of value do you think we could get out of such a system?
  • To what extent do you expect that system to be able to produce insights that help us escape the acute risk period (i.e., get out of a scenario where someone else can come along and build a catastrophe-capable system without implementing control procedures, or someone else comes along and scales to the point where the control procedures are no longer sufficient)?

Comment by Akash (akash-wasil) on Anthropic AI made the right call · 2024-04-15T23:02:10.940Z · LW · GW

Here are three possible scenarios:

Scenario 1, Active Lying– Anthropic staff were actively spreading the idea that they would not push the frontier.

Scenario 2, Allowing misconceptions to go unchecked– Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.

Scenario 3, Not being aware– Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice but it never really seemed like a big deal.

Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?

I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause/resume.

It wouldn't be very surprising to me if we end up seeing a situation where many readers say "hey look, we've reached an ASL-3 system, so now you're going to pause, right?" And then Anthropic says "no no, we have sufficient safeguards– we can keep going now." And then some readers say "wait a second– what? I'm pretty sure you committed to pausing until your safeguards were better than that." And then Anthropic says "no... we never said exactly what kinds of safeguards we would need, and our leadership's opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it's fine to proceed."

In this (hypothetical) scenario, Anthropic never lied, but it benefitted from giving off a more cautious impression, and it didn't take steps to correct this impression.

I think avoiding these kinds of scenarios requires some mix of:

  • Clear, specific, falsifiable statements from labs.
  • Some degree of proactive attempts to identify and alleviate misconceptions

One counterargument is something like "Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high amount of attention-to-detail and proactive communication that is not typically expected of companies." To which my response is something like "yes, but I think it's reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause."

Comment by Akash (akash-wasil) on What convincing warning shot could help prevent extinction from AI? · 2024-04-15T21:32:10.409Z · LW · GW

I like these examples. One thing I'll note, however, is that I think the "warning shot discourse" on LW tends to focus on warning shots that would be convincing to a LW-style audience.

If the theory of change behind the warning shot requires LW-types (for example, folks at OpenAI/DeepMind/Anthropic who are relatively familiar with AGI xrisk arguments) to become concerned, this makes sense.

But usually, when I think about winning worlds that involve warning shots, I think about government involvement as the main force driving coordination, an end to race dynamics, etc.

[Caveating that my models of the USG and natsec community are still forming, so epistemic status is quite uncertain for the rest of this message, but I figure some degree of speculation could be helpful anyways].

I expect the kinds of warning shots that would be concerning to governments/national security folks will look quite different than the kinds of warning shots that would be convincing to technical experts.

LW-style warning shots tend to be more– for lack of a better term– rational. They tend to be rooted in actual threat models (e.g., we understand that if an AI can copy its weights, it can create tons of copies and avoid being easily turned off, and we also understand that its general capabilities are sufficiently strong that we may be close to an intelligence explosion or highly capable AGI).

In contrast, without this context, I don't think that "we caught an AI model copying its weights" would necessarily be a warning shot for USG/natsec folks. It could just come across as "oh something weird happened but then the company caught it and fixed it." Instead, I suspect the warning shots that are most relevant to natsec folks might be less "rational", and by that I mean "less rooted in actual AGI threat models but more rooted in intuitive things that seem scary."

Examples of warning shots that I expect USG/natsec people would be more concerned about:

  • An AI system can generate novel weapons of mass destruction
  • Someone uses an AI system to hack into critical infrastructure or develop new zero-days that impress folks in the intelligence community.
  • A sudden increase in the military capabilities of AI

These don't relate as much to misalignment risk or misalignment-focused xrisk threat models. As a result, a disadvantage of these warning shots is that it may be harder to make a convincing case for interventions that focus specifically on misalignment. However, I think they are the kinds of things that might involve a sudden increase in the attention that the USG places on AI/AGI, the amount of resources it invests into understanding national security threats from AI, and its willingness to take major actions to intervene in the current AI race.

As such, in addition to asking questions like "what is the kind of warning shot that would convince me and my friends that we have something dangerous", I think it's worth separately asking "what is the kind of warning shot that would convince natsec folks that something dangerous or important is occurring, regardless of whether or not it connects neatly to AGI risk threat models." 

My impression is that the policy-relevant warning shots will be the most important ones to be prepared for, and the community may (for cultural/social/psychological reasons) be focusing too little effort on trying to prepare for these kinds of "irrational" warning shots. 

Comment by Akash (akash-wasil) on Speaking to Congressional staffers about AI risk · 2024-02-28T21:28:08.435Z · LW · GW

If memory serves me well, I was informed by Hendrycks' overview of catastrophic risks. I don't think it's a perfect categorization, but I think it does a good job laying out some risks that feel "less speculative" (e.g., malicious use, race dynamics as a risk factor that could cause all sorts of threats) while including those that have been painted as "more speculative" (e.g., rogue AIs).

I've updated toward the importance of explaining & emphasizing risks from sudden improvements in AI capabilities, AIs that can automate AI research, and intelligence explosions. I also think there's more appetite for that now than there used to be. 

Comment by Akash (akash-wasil) on Daniel Kokotajlo's Shortform · 2024-02-23T14:22:31.341Z · LW · GW

What work do you think is most valuable on the margin (for those who agree with you on many of these points)?

Comment by Akash (akash-wasil) on The case for ensuring that powerful AIs are controlled · 2024-01-26T10:48:32.835Z · LW · GW

I'd be curious to hear more about your main disagreements.

Comment by Akash (akash-wasil) on The case for ensuring that powerful AIs are controlled · 2024-01-26T10:47:15.352Z · LW · GW

Good point. I think it's helpful when people working on schemes with the rough flavor of "we do X, and then X helps us get to a useful AI that does not take over" try to specify roughly how capable they expect the "useful AI that does not take over" to be.

Would be curious to hear more about the kinds of tasks that Ryan and Buck expect the first "transformatively useful but still controllable" AI will be able to do (perhaps expressed in multiplier terms, like 30X human SW engineer, perhaps expressed in terms of the kinds of novel reasoning it can perform, and perhaps expressed in terms of the kinds of research tasks or policy tasks it would be able to meaningfully help with).

Comment by Akash (akash-wasil) on The case for ensuring that powerful AIs are controlled · 2024-01-25T12:19:55.079Z · LW · GW

I'm excited to see how the AI control research direction evolves. 

After skimming the post, it seems like it makes a few different claims, some of which seem more intuitive than others. For example:

  • Claim 1: Labs should control powerful AI systems (very intuitive, even obvious)
  • Claim 2: It is (much?) easier to evaluate control than alignment (somewhat intuitive, particularly if one buys the argument that alignment evals are very hard because you always have the "what if the AI is just scheming and subverting the eval" concern.) 
  • Claim 3: We (Buck and Ryan) have identified some specific techniques that we believe are promising for control (not intuitive; technical arguments needed).
  • Claim 4: If we succeed in controlling "transformatively useful AI", then we may be able to stop the race toward superintelligence and get AI labs or governments to agree to not build superintelligence until it can be controlled (not intuitive; non-technical arguments needed).

I'd be excited to see more posts that specifically engage with the strongest counterpoints to claims #2-4. 

Some more on #2 & #4:

The basic problem with evaluating alignment is that no matter what behaviors you observe, you have to worry that your model is just acting that way in order to make you think that it is aligned.

I think those pessimistic about control evals could say something like "the basic problem with evaluating control is that no matter what techniques your red-team uses, you have to worry that your model is better at finding attacks than your red-team." Of course, you note in the post some reason why we should expect our red-team to have advantages over models, but also you recognize that this won't scale toward arbitrarily powerful AIs.

In some ways, this feels analogous to the following situation:

  • Alice from the Scaling Department: Look, our current "alignment" techniques are working for GPT-4. So let's scale up more! I get that at some point we won't be able to align the AI, but we're not there yet, so we should be able to keep going.
  • Bob from the Preparedness Team: Well, the problem is we don't know when our alignment techniques will stop working... like, it's quite plausible that the next AI system will be able to game our evals.
  • Alice from the Scaling Department: OK, well let me know when you have clear evidence that our alignment techniques or dangerous capability evals will fail.

Here's the analogy for control:

  • Alice from the Scaling Department: Look, our current control techniques are working for GPT-4. So let's scale up more! I get that at some point our control evals aren't going to be able to control arbitrarily powerful AI, but we're not there yet, so we should be able to keep going.
  • Bob from the Preparedness Team: Well, the problem is we don't know when our control evals will stop working... like, it's quite plausible that the next AI system will be better at coming up with attacks than our red-team.
  • Alice from the Scaling Department: OK, well let me know when you have clear evidence that our control evals will fail.

I'd be curious to hear more about how you're thinking about this (and apologies if some sections of the post already deal with this– feel free to quote them if I missed them in my initial skim). Specific questions:

  1. How do you react to the analogy above? Do you expect that control evals will do a better job than other types of evals at determining when we have reached the point where scaling further is (directly/imminently) dangerous? Or more simply, will it be clear when we need to stop?
  2. More broadly, what do you think we should do once we're starting to see AI systems that are powerful enough to subvert the control evals (or develop attacks that are better than those of the human red teamers)?
  3. Suppose control evals do allow us to stop at "transformatively useful systems that do not cause a catastrophe." Roughly what kinds of capabilities do you expect this system to have? (My vague MIRI sim says something like "either the system is too weak to get us out of the critical risk window or the system is so strong that it's capable of subverting your control evals").

Comment by Akash (akash-wasil) on TurnTrout's shortform feed · 2024-01-20T11:56:27.137Z · LW · GW

More broadly, TurnTrout, I've noticed you using this whole "look, if something positive happened, LW would totally rip on it! But if something is presented negatively, everyone loves it!" line of reasoning a few times (e.g., I think this logic came up in your comment about Evan's recent paper). And I sort of see you taking on some sort of "the people with high P(doom) just have bad epistemics" flag in some of your comments.

A few thoughts (written quickly, prioritizing speed over precision):

  1. I think that epistemics are hard & there are surely several cases in which people are biased toward high P(doom). Examples: Yudkowsky was one of the first thinkers/writers about AI, some people might have emotional dispositions that lead them toward anxious/negative interpretations in general, some people find it "cool" to think they're one of the few people who are able to accurately identify the world is ending, etc.
  2. I also think that there are plenty of factors biasing epistemics in the "hopeful" direction. Examples: The AI labs have tons of money and status (& employ large fractions of the community's talent), some people might have emotional dispositions that lead them toward overly optimistic/rosy interpretations in general, some people might find it psychologically difficult to accept premises that lead them to think the world is ending, etc.
  3. My impression (which could be false) is that you seem to be exclusively or disproportionately critical of poor arguments when they come from the "high P(doom)" side. 
  4. I also think there's an important distinction between "I personally think this argument is wrong" and "look, here's an example of propaganda + poor community epistemics." In general, I suspect community epistemics are better when people tend to respond directly to object-level points and have a relatively high bar for saying "not only do I think you're wrong, but also here are some ways in which you and your allies have poor epistemics." (IDK though, insofar as you actually believe that's what's happening, it seems good to say aloud, and I think there's a version of this that goes too far and polices speech counterproductively, but I do think that statements like "community epistemics have been compromised by groupthink and fear" are pretty unproductive and could be met with statements like "community epistemics have been compromised by powerful billion-dollar companies that have clear financial incentives to make people overly optimistic about the trajectory of AI progress.")
  5. I am quite worried about tribal dynamics reducing the ability of people to engage in productive truth-seeking discussions. I think you've pointed out how some of the stylistic/tonal things from the "high P(doom)//alignment hard" side have historically made discourse harder, and I agree with several of your critiques. More recently, though, I think that the "low P(doom)//alignment not hard" side seems to be falling into similar traps (e.g., attacking strawmen of those they disagree with, engaging in some sort of "ha, the other side is not only wrong but also just dumb/unreasonable/epistemically corrupted" vibe that predictably makes people defensive & makes discourse harder).

Comment by Akash (akash-wasil) on TurnTrout's shortform feed · 2024-01-20T11:27:06.191Z · LW · GW

My impression is that the Shoggoth meme was meant to be a simple meme that says "hey, you might think that RLHF 'actually' makes models do what we value, but that's not true. You're still left with an alien creature who you don't understand and could be quite scary."

Most of the Shoggoth memes I've seen look more like this, where the disgusting/evil aspects are toned down. They depict an alien that kinda looks like an octopus. I do agree that the picture evokes some sort of "I should be scared/concerned" reaction. But I don't think it does so in a "see, AI will definitely be evil" way– it does so in a "look, RLHF just adds a smiley face to a foreign alien thing. And yeah, it's pretty reasonable to be scared about this foreign alien thing that we don't understand."

To be a bit bolder, I think Shoggoth is reacting to the fact that RLHF gives off a misleading impression of how safe AI is. If I were to use provocative phrasing, I could say that RLHF serves as "propaganda". Let's put aside the fact that you and I might disagree about how much "true evidence" RLHF provides RE how easy alignment will be. It seems pretty clear to me that RLHF [and the subsequent deployment of RLHF'd models] spreads an overly-rosy "meme" that gives people a misleading perspective of how well we understand AI systems, how safe AI progress is, etc.

From this lens, I see Shoggoth as a counter-meme. It basically says "hey look, the default is for people to think that these things are friendly assistants, because that's what the AI companies have turned them into, but we should remember that actually we are quite confused about the alien cognition behind the RLHF smiley face."

Comment by Akash (akash-wasil) on The Plan - 2023 Version · 2024-01-02T14:38:52.253Z · LW · GW

If the timer starts to run out, then slap something together based on the best understanding we have. 18-24 months is about how long I expect it to take to slap something together based on the best understanding we have.

Can you say more about what you expect to be doing after you have slapped together your favorite plans/recommendations? I'm interested in getting a more concrete understanding of how you see your research (eventually) getting implemented.

Suppose after the 18-24 month process, you have 1-5 concrete suggestions that you want AGI developers to implement. Is the idea essentially that you would go to the superalignment team (and the equivalents at other labs) and say "hi, here's my argument for why you should do X?" What kinds of implementation-related problems, if any, do you see coming up?

I ask this partially because I think some people are kinda like "well, in order to do alignment research that ends up being relevant, I need to work at one of the big scaling labs in order to understand the frames/ontologies of people at the labs, the constraints/restrictions that would come up if trying to implement certain ideas, get better models of the cultures of labs to see what ideas will simply be dismissed immediately, identify cruxes, figure out who actually makes decisions about what kinds of alignment ideas will end up being used for GPT-N, etc etc."

My guess is that you would generally encourage people to not do this, because they generally won't have as much research freedom & therefore won't be able to work on core parts of the problem that you see as neglected. I suspect many would agree that there is some "I lose freedom" cost, but that this might be outweighed by the "I get better models of what kind of research labs are actually likely to implement" benefit, and I'm curious how you view this trade-off (or if you don't even see this as a legitimate trade-off). 

Comment by Akash (akash-wasil) on The Plan - 2023 Version · 2023-12-30T11:24:28.772Z · LW · GW

Does the "rater problem" (raters have systematic errors) simply apply to step one in this plan? I agree that once you have a perfect reward model, you no longer need human raters.

But it seems like the "rater problem" still applies if we're going to train the reward model using human feedback. Perhaps I'm too anchored to thinking about things in an RLHF context, but it seems like at some point in the process we need to have some way of saying "this is true" or "this chain-of-thought is deceptive" that involves human raters. 

Is the idea something like:

  • Eliezer: Human raters make systematic errors
  • OpenAI: Yes, but this is only a problem if we have human raters indefinitely provide feedback. If human raters are expected to provide feedback on 10,000,000 responses under time-pressure, then surely they will make systematic errors.
  • OpenAI: But suppose we could train a reward model on a modest number of responses and we didn't have time-pressure. For this dataset of, say, 10,000 responses, we are super careful, we get a ton of people to double-check that everything is accurate, and we are nearly certain that every single label is correct. If we train a reward model on this dataset, and we can get it to generalize properly, then we can get past the "humans make systematic errors" problem.

Or am I totally off//the idea is different than this//the "yet-to-be-worked-out-techniques" would involve getting the reward model to learn stuff without ever needing feedback from human raters?
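
To make the second hypothetical OpenAI bullet a bit more concrete, here is a minimal, purely illustrative sketch of that idea (my own construction with synthetic stand-in data and a generic pairwise preference loss, not anything OpenAI has actually described): train a reward model on a small, carefully vetted preference dataset, then stop relying on live human raters.

```python
# Hypothetical sketch: reward model trained on a small, carefully vetted
# preference dataset (synthetic stand-ins here), then used in place of
# further human raters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Tiny scorer over response embeddings; in practice this would be a fine-tuned LM head."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

# Stand-in for ~10,000 triple-checked comparisons: embeddings of a preferred
# ("chosen") and a dispreferred ("rejected") response for each prompt.
chosen = torch.randn(10_000, 64)
rejected = torch.randn(10_000, 64)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, chosen.size(0), (256,))
    # Pairwise (Bradley-Terry) preference loss: the carefully verified "chosen"
    # response should receive a higher reward than the "rejected" one.
    loss = -F.logsigmoid(model(chosen[idx]) - model(rejected[idx])).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Once trained, the frozen reward model can label arbitrarily many new responses
# with no human time-pressure; whether it generalizes correctly from the small
# vetted set is exactly the open question above.
```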

Comment by Akash (akash-wasil) on NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts · 2023-12-29T08:57:53.380Z · LW · GW

I'd also be curious to know why (some) people downvoted this.

Perhaps it's because you imply that some OpenAI folks were captured, and maybe some people think that that's unwarranted in this case?

Sadly, the more-likely explanation (IMO) is that policy discussions can easily become tribal, even on LessWrong.

I think LW still does better than most places at rewarding discourse that's thoughtful/thought-provoking and resisting tribal impulses, but I wouldn't be surprised if some people were doing something like "ah he is saying something Against AI Labs//Pro-regulation, and that is bad under my worldview, therefore downvote."

(And I also think this happens the other way around as well, and I'm sure people who write things that are "pro AI labs//anti-regulation" are sometimes unfairly downvoted by people in the opposite tribe.)

Comment by Akash (akash-wasil) on EU policymakers reach an agreement on the AI Act · 2023-12-21T14:39:03.530Z · LW · GW

I appreciate the comment, though I think there's a lack of specificity that makes it hard to figure out where we agree/disagree (or more generally what you believe).

If you want to engage further, here are some things I'd be excited to hear from you:

  • What are a few specific comms/advocacy opportunities you're excited about//have funded?
  • What are a few specific comms/advocacy opportunities you view as net negative//have actively decided not to fund?
  • What are a few examples of hypothetical comms/advocacy opportunities you've been excited about?
  • What do you think about EG Max Tegmark/FLI, Andrea Miotti/Control AI, The Future Society, the Center for AI Policy, Holly Elmore, PauseAI, and other specific individuals or groups that are engaging in AI comms or advocacy? 

I think if you (and others at OP) are interested in receiving more critiques or overall feedback on your approach, one thing that would be helpful is writing up your current models/reasoning on comms/advocacy topics.

In the absence of this, people simply notice that OP doesn't seem to be funding some of the main existing examples of comms/advocacy efforts, but they don't really know why, and they don't really know what kinds of comms/advocacy efforts you'd be excited about.

Comment by Akash (akash-wasil) on OpenAI: Preparedness framework · 2023-12-19T20:32:52.308Z · LW · GW

They mention three types of mitigations:

  • Asset protection (e.g., restricting access to models to a limited nameset of people, general infosec)
  • Restricting deployment (only models with a risk score of "medium" or below can be deployed)
  • Restricting development (models with a risk score of "critical" cannot be developed further until safety techniques have been applied that get it down to "high." Although they kind of get to decide when they think their safety techniques have worked sufficiently well.)

My one-sentence reaction after reading the doc for the first time is something like "it doesn't really tell us how OpenAI plans to address the misalignment risks that many of us are concerned about, but with that in mind, it's actually a fairly reasonable document with some fairly concrete commitments."

Comment by Akash (akash-wasil) on OpenAI: Preparedness framework · 2023-12-19T04:44:20.687Z · LW · GW

I believe labels matter, and I believe the label "preparedness framework" is better than the label "responsible scaling policy." Kudos to OpenAI on this. I hope we move past the RSP label.

I think labels will matter most when communicating to people who are not following the discussion closely (e.g., tech policy folks who have a portfolio of 5+ different issues and are not reading the RSPs or PFs in great detail).

One thing I like about the label "preparedness framework" is that it invites the question "prepared for what?", which is exactly the kind of question I want policy people to be asking. PFs imply that there might be something scary that we are trying to prepare for.

Comment by Akash (akash-wasil) on EU policymakers reach an agreement on the AI Act · 2023-12-15T23:45:51.921Z · LW · GW

Thanks for this overview, Trevor. I expect it'll be helpful– I also agree with your recommendations for people to consider working at standard-setting organizations and other relevant EU offices.

One perspective that I see missing from this post is what I'll call the advocacy/comms/politics perspective. Some examples of this with the EU AI Act:

  • Foundation models were going to be included in the EU AI Act, until France and Germany (with lobbying pressure from Mistral and Aleph Alpha) changed their position.
  • This initiated a political/comms battle between those who wanted to exclude foundation models (led by France and Germany) and those who wanted to keep them in (led by Spain).
  • This political fight rallied lots of notable figures, including folks like Gary Marcus and Max Tegmark, to publicly and privately fight to keep foundation models in the act.
  • There were open letters, op-eds, and certainly many private attempts at advocacy.
  • There were attempts to influence public opinion, pieces that accused key lobbyists of lying, and a lot of discourse on Twitter.

It's difficult to know the impact of any given public comms campaign, but it seems quite plausible to me that many readers would have more marginal impact by focusing on advocacy/comms than focusing on research/policy development.

More broadly, I worry that many segments of the AI governance/policy community might be neglecting to think seriously about what ambitious comms/advocacy could look like in the space.

I'll note that I might be particularly primed to bring this up now that you work for Open Philanthropy. I think many folks (rightfully) critique Open Phil for being too wary of advocacy, campaigns, lobbying, and other policymaker-focused activities. I'm guessing that Open Phil has played an important role in shaping both the financial and cultural incentives that (in my view) lead to an overinvestment in research and an underinvestment in policy/advocacy/comms.

(I'll acknowledge these critiques are pretty high-level and I don't claim that this comment provides compelling evidence for them. Also, you only recently joined Open Phil, so I'm of course not trying to suggest that you created this culture, though I guess now that you work there you might have some opportunities to change it).

I'll now briefly try to do a Very Hard thing, which is like "put myself in Trevor's shoes and ask what I actually want him to do." One concrete recommendation I have is something like "try to spend at least 5 minutes thinking about ways in which you or others around you might be embedded in a culture that has blind spots to some of the comms/advocacy stuff." Another is "make a list of the people you read or talked to when writing this post, then ask whether there were any other people/orgs you could've reached out to, particularly those that focus more on comms+advocacy." (Also, to be clear, you might do both of these things and conclude "yea, actually I think my approach was very solid and I just had Good Reasons for writing the post the way I did.")

I'll stop here since this comment is getting long, but I'd be happy to chat further about this stuff. Thanks again for writing the post and kudos to OP for any of the work they supported/will support that ends up increasing P(good EU AI Act goes through & gets implemented). 

Comment by Akash (akash-wasil) on Language models seem to be much better than humans at next-token prediction · 2023-12-13T18:40:35.486Z · LW · GW

What are some examples of situations in which you refer to this point?

Comment by Akash (akash-wasil) on Speaking to Congressional staffers about AI risk · 2023-12-07T00:01:33.817Z · LW · GW

My own model differs a bit from Zach's. It seems to me like most of the publicly-available policy proposals have not gotten much more concrete. It feels a lot more like people were motivated to share existing thoughts, as opposed to people having new thoughts or having more concrete thoughts.

Luke's list, for example, is more of a "list of high-level ideas" than a "list of concrete policy proposals." It has things like "licensing" and "information security requirements"– it's not an actual bill or set of requirements. (And to be clear, I still like Luke's post and it's clear that he wasn't trying to be super concrete).

I'd be excited for people to take policy ideas and concretize them further. 

Aside:  When I say "concrete" in this context, I don't quite mean "people on LW would think this is specific." I mean "this is closer to bill text, text of a section of an executive order, text of an amendment to a bill, text of an international treaty, etc."

I think there are a lot of reasons why we haven't seen much "concrete policy stuff". Here are a few:

  • This work is just very difficult– it's much easier to hide behind vagueness when you're writing an academic-style paper than when you're writing a concrete policy proposal.
  • This work requires people to express themselves with more certainty/concreteness than academic-style research. In a paper, you can avoid giving concrete recommendations, or you can give a recommendation and then immediately mention 3-5 crucial considerations that could change the calculus. In bills, you basically just say "here is what's going to happen" and do much less "and here are the assumptions that go into this and a bunch of ways this could be wrong."
  • This work forces people to engage with questions that are less "intellectually interesting" to many people (e.g., which government agency should be tasked with X, how exactly are we going to operationalize Y?)
  • This work just has a different "vibe" to the more LW-style research and the more academic-style research. Insofar as LW readers are selected for (and reinforced for) liking a certain "kind" of thinking/writing, this "kind" of thinking/writing is different than the concrete policy vibe in a bunch of hard-to-articulate ways.
  • This work often has the potential to be more consequential than academic-style research. There are clear downsides of developing [and advocating for] concrete policies that are bad. Without any gatekeeping, you might have a bunch of newbies writing flawed bills. With excessive gatekeeping, you might create a culture that disincentivizes intelligent people from writing good bills. (And my own subjective impression is that the community erred too far on the latter side, but I think reasonable people could disagree here).

For people interested in developing the kinds of proposals I'm talking about, I'd be happy to chat. I'm aware of a couple of groups doing the kind of policy thinking that I would consider "concrete", and it's quite plausible that we'll see more groups shift toward this over time.  

Comment by Akash (akash-wasil) on Speaking to Congressional staffers about AI risk · 2023-12-05T21:06:30.952Z · LW · GW

Thanks for all of this! Here's a response to your point about committees.

I agree that the committee process is extremely important. It's especially important if you're trying to push forward specific legislation. 

For people who aren't familiar with committees or why they're important, here's a quick summary of my current understanding (there may be a few mistakes):

  1. When a bill gets introduced in the House or the Senate, it gets sent to a committee. The decision is made by the Speaker of the House or the presiding officer in the Senate. In practice, however, they often defer to a non-partisan "parliamentarian" who specializes in figuring out which committee would be most appropriate. My impression is that this process is actually pretty legitimate and non-partisan in most cases(?). 
  2. It takes some degree of skill to be able to predict which committee(s) a bill is most likely to be referred to. Some bills are obvious (like an agriculture bill will go to an agriculture committee). In my opinion, artificial intelligence bills are often harder to predict. There is obviously no "AI committee", and AI stuff can be argued to affect multiple areas. With all that in mind, I think it's not too hard to narrow things down to ~1-3 likely committees in the House and ~1-3 likely committees in the Senate.
  3. The most influential person in the committee is the committee chair. The committee chair is the highest-ranking member from the majority party (so in the House, all the committee chairs are currently Republicans; in the Senate, all the committee chairs are currently Democrats). 
  4. A bill cannot be brought to the House floor or the Senate floor (cannot be properly debated or voted on) until it has gone through committee. The committee is responsible for finalizing the text of the bill and then voting on whether or not they want the bill to advance to the chamber (House or Senate). 
  5. The committee chair typically has a lot of influence over the committee. The committee chair determines which bills get discussed in committee, for how long, etc. Also, committee chairs usually have a lot of "soft power"– members of Congress want to be in good standing with committee chairs. This means that committee chairs often have the ability to prevent certain legislation from getting out of committee.
  6. If you're trying to get legislation passed, it's ideal to have the committee chair think favorably of that piece of legislation. 
  7. It's also important to have at least one person on the committee who is willing to "champion" the bill. This means they view the bill as a priority & are willing to say "hey, committee, I really think we should be talking about bill X." A lot of bills die in committee because they were simply never prioritized. 
  8. If the committee chair brings the bill to a vote, and the majority of committee members vote in favor of the bill moving to the chamber, the bill can be discussed in the full chamber. Party leadership (Speaker of the House, Senate Majority Leader, etc.) typically plays the most influential role in deciding which bills get discussed or voted on in the chambers. 
  9. Sometimes, bills get referred to multiple committees. This generally seems like "bad news" from the perspective of getting the bill passed, because it means that the bill has to get out of multiple committees. (Any single committee could essentially prevent the bill from being discussed in the chamber). 

(If any readers are familiar with the committee process, please feel free to add more info or correct me if I've said anything inaccurate.)

Comment by Akash (akash-wasil) on Speaking to Congressional staffers about AI risk · 2023-12-05T20:31:25.928Z · LW · GW

WTF do people "in AI governance" do?

Quick answer:

  1. A lot of AI governance folks primarily do research. They rarely engage with policymakers directly, and they spend much of their time reading and writing papers.
  2. This was even more true before the release of GPT-4 and the recent wave of interest in AI policy. Before GPT-4, many people believed "you will look weird/crazy if you talk to policymakers about AI extinction risk." It's unclear to me how true this was (in a genuine "I am confused about this & don't think I have good models of this" way). Regardless, there has been an update toward talking to policymakers about AI risk now that AI risk is a bit more mainstream. 
  3. My own opinion is that, even after this update toward policymaker engagement, the community as a whole is still probably overinvested in research and underinvested in policymaker engagement/outreach. (Of course, the two can be complementary, and the best outreach will often be done by people who have good models of what needs to be done & can present high-quality answers to the questions that policymakers have). 
  4. Among the people who do outreach/policymaker engagement, my impression is that there has been more focus on the executive branch (and less on Congress/congressional staffers). The main advantage is that the executive branch can get things done more quickly than Congress. The main disadvantage is that Congress is often required (or highly desired) to make "big things" happen (e.g., setting up a new agency or a licensing regime).

Comment by Akash (akash-wasil) on MATS Summer 2023 Retrospective · 2023-12-04T19:59:10.477Z · LW · GW

I would also suspect that #2 (finding/generating good researchers) is more valuable than #1 (generating or accelerating good research during the MATS program itself).

One problem with #2 is that it's usually harder to evaluate and takes longer to evaluate. #2 requires projections, often over the course of years. #1 is still difficult to evaluate (what is "good alignment research" anyways?) but seems easier in comparison.

Also, I would expect correlations between #1 and #2. Like, one way to evaluate "how good are we doing at training researchers//who are the best researchers" is to ask "how good is the research they are producing//who produced the best research in this 3-month period?"

This process is (of course) imperfect. For example, someone might have great output because their mentor handed them a bunch of ready-to-go-projects, but the scholar didn't actually have to learn the important skills of "forming novel ideas" or "figuring out how to prioritize between many different directions." 

But in general, I think it's a pretty decent way to evaluate things. If someone has produced high-quality and original research during the MATS program, that sure does seem like a strong signal of their future potential. Likewise, at the opposite extreme, if during the entire summer cohort there were 0 instances of useful original work, that doesn't necessarily mean something is wrong, but it would make me go "hmmm, maybe we should brainstorm possible changes to the program that could make it more likely that we see high-quality original output next time, and then see how much those proposed changes trade off against other desiderata."

(It seems quite likely to me that the MATS team has already considered all of this; just responding on the off-chance that something here is useful!)

Comment by Akash (akash-wasil) on MATS Summer 2023 Retrospective · 2023-12-03T06:28:52.341Z · LW · GW

Thanks for writing this! I’m curious if you have any information about the following questions:

  1. What does the MATS team think are the most valuable research outputs from the program?

  2. Which scholars was the MATS team most excited about in terms of their future plans/work?

IMO, these are the two main ways I would expect MATS to have impact: research output during the program and future research output/career trajectories of scholars.

Furthermore, I’d suspect things to be fairly tails-based (where EG the top 1-3 research outputs and the top 1-5 scholars are responsible for most of the impact).

Perhaps MATS as a program feels weird about ranking output or scholars so explicitly, or feels like it’s not their place.

But I think this kind of information seems extremely valuable. If I were considering whether or not I wanted to donate, for instance, my main questions would be “is the research good?” and “is the career development producing impactful people?” (as opposed to things like “what is the average rating on the EOY survey?”, though of course that information may matter for other purposes).

Comment by Akash (akash-wasil) on Deception Chess: Game #2 · 2023-12-03T02:39:14.050Z · LW · GW

I'd expect the amount of time this all takes to be a function of the time-control.

Like, if I have 90 mins, I can allocate more time to all of this. I can consult each of my advisors at every move. I can ask them follow-up questions.

If I only have 20 mins, I need to be more selective. Maybe I only listen to my advisors during critical moves, and I evaluate their arguments more quickly. Also, this inevitably affects the kinds of arguments that the advisors give.

Both of these scenarios seem pretty interesting and AI-relevant. My all-things-considered guess would be that the 20 mins version yields high enough quality data (particularly for the parts of the game that are most critical/interesting & where the debate is most lively) that it's worth it to try with shorter time controls.

(Epistemic status: Thought about this for 5 mins; just vibing; very plausibly underestimating how time pressure could make the debates meaningless).

Comment by Akash (akash-wasil) on Deception Chess: Game #2 · 2023-12-03T00:24:58.348Z · LW · GW

Is there a reason you’re using 3 hour time control? I’m guessing you’ve thought about this more than I have, but at first glance, it feels to me like this could be done pretty well with EG 60-min or even 20-min time control.

I’d guess that having 4-6 games that last 20-30 mins gives is better than having 1 game that lasts 2 hours.

(Maybe I’m underestimating how much time it takes for the players to give/receive advice. And ofc there are questions about the actual situations with AGI that we’re concerned about— EG to what extent do we expect time pressure to be a relevant factor when humans are trying to evaluate arguments from AIs?)

Comment by Akash (akash-wasil) on Integrity in AI Governance and Advocacy · 2023-11-19T20:04:22.052Z · LW · GW

Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?

I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).

Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?

Comment by Akash (akash-wasil) on Integrity in AI Governance and Advocacy · 2023-11-19T10:54:43.798Z · LW · GW

Clarification/history question: How were these board members chosen?

Comment by Akash (akash-wasil) on How much to update on recent AI governance moves? · 2023-11-17T11:28:20.311Z · LW · GW

Thanks for this dialogue. I find Nate and Oliver's "here's what I think will actually happen" thoughts useful.

I also think I'd find it useful for Nate to spell out "conditional on good things happening, here's what I think the steps look like, and here's the kind of work that I think people should be doing right now. To be clear, I think this is all doomed, and I'm only saying this because Akash directly asked me to condition on worlds where things go well, so here's my best shot."

To be clear, I think some people do too much "play to your outs" reasoning. In excess, this can lead to people just being like "well maybe all we need to do is beat China" or "maybe alignment will be way easier than we feared" or "maybe we just need to bet on worlds where we get a fire alarm for AGI."

I'm particularly curious to see what happens if Nate tries to reason in this frame, especially since I expect his "play to your outs" reasoning/conclusions might look fairly different from that of others in the community.

Some examples of questions for Nate (and others who have written more about what they actually expect to happen and less about what happens if we condition on things going well):

  • Condition on the worlds in which we see substantial progress in the next 6 months. What are some things that have happened in those worlds? What does progress look like?
  • Condition on worlds in which the actions of the AIS community end up having a strong positive influence in the next 6 months. What are some wins that the AIS community (or specific actors within it) achieve?
  • Suppose for the sake of this conversation that you are fully adopting a "play to your outs" mentality. What outs do you see? Regardless of the absolute probabilities you assign, which of these outs seem most likely and most promising?
  • All things considered, what do you currently see as the most impactful ways you can spend your time?
  • All things considered, what do you currently see as the most impactful ways that "highly talented comms/governance/policy people can be spending their time?" (can divide into more specific subgroups if useful). 

I'll also note that I'd be open to having a dialogue about this with Nate (and possibly other "doomy" people who have not written up their "play to your outs" thoughts).

Comment by Akash (akash-wasil) on Survey on the acceleration risks of our new RFPs to study LLM capabilities · 2023-11-11T15:01:29.064Z · LW · GW

Thanks for sharing this! I'm curious if you have any takes on Nate's comment or Oliver's comment:

Nate:

I don't think we have any workable plan for reacting to the realization that dangerous capabilities are upon us. I think that when we get there, we'll predictably either (a) optimize against our transparency tools or otherwise walk right off the cliff-edge anyway, or (b) realize that we're in deep trouble, and slow way down and take some other route to the glorious transhumanist future (we might need to go all the way to WBE, or at least dramatically switch optimization paradigms). 

Insofar as this is true, I'd much rather see efforts go _now_ into putting hard limits on capabilities in this paradigm, and booting up alternative paradigms (that aren't supposed to be competitive with scaling, but that are hopefully competitive with what individuals can do on home computers). I could see evals playing a role in that policy (of helping people create sane capability limits and measure whether they're being enforced), but that's not how I expect evals to be used on the mainline.

Oliver:

I have a generally more confident take that slowing things down is good, i.e. don't find arguments that "current humanity is better suited to handle the singularity" very compelling. 

I think I am also more confident that it's good for people to openly and straightforwardly talk about existential risk from AI. 

I am less confident in my answer to the question of "is generic interpretability research cost-effective or even net-positive?". My guess is still yes, but I really feel very uncertain, and feel a bit more robust in my answer to your question than that question.

Comment by Akash (akash-wasil) on Integrity in AI Governance and Advocacy · 2023-11-09T14:33:21.436Z · LW · GW

Adding a datapoint here: I've been involved in the Control AI campaign, which was run by Andrea Miotti (who also works at Conjecture). Before joining, I had heard some integrity/honesty concerns about Conjecture. So when I decided to join, I resolved to be on the lookout for any instances of lying/deception/misleadingness/poor integrity. (Sidenote: At the time, I was also wondering whether Control AI was just essentially a vessel to do Conjecture's bidding. I have updated against this– Control AI reflects Andrea's vision. My impression is that Conjecture folks other than Andrea have basically no influence over what Control AI does, unless they convince Andrea to do something.)

I've been impressed by Andrea's integrity and honesty. I was worried that the campaign might have some sort of "how do we win, even if it misleads people" vibe (in which case I would've protested or left), but there was constantly a strong sense of "are we saying things that are true? Are we saying things that we actually believe? Are we communicating clearly?" I was especially impressed given the high volume of content (it is especially hard to avoid saying untrue/misleading things when you are putting out a lot of content at a fast pace.)

In contrast, integrity/honesty/openness norms feel much less strong in DC. When I was in DC, I think it was fairly common to see people "withhold information for strategic purposes", "present a misleading frame (intentionally)", "focus on saying things you think the other person will want to hear", or "decide not to talk at all because sharing beliefs in general could be bad." It's plausible to me that these are "the highest EV move" in some cases, but if we're focusing on honesty/integrity/openness, I think DC scored much worse. (See also Olivia's missing mood point). 

The Bay Area scores well on honesty/integrity IMO (relative to other spaces), but it has its own problems, especially with groupthink/conformity/curiosity-killing. I think its honesty/integrity norms are enforced in a way that comes with important tradeoffs. For instance, I think the Bay Area tends to punish people for saying things that are imprecise or "seem dumb", which leads to a lot of groupthink/conformity and a lot of "people just withholding their beliefs so that they don't accidentally say something incorrect and get judged for it." Also, I think high-status people in the Bay Area are often able to "get away with" low openness/transparency/clarity. There are lots of cases where people are like "I believe X because Paul believes X" and then when asked "why does Paul believe X" they're like "idk". This seems like an axis separate from honesty/integrity, but it still leads to pretty icky epistemic discourse. 

(This isn't to say that people shouldn't criticize Conjecture– but I think there's a sad thing that happens where it starts to feel like both "sides" are just trying to criticize each other. My current position is much closer to something like "each of these communities has some relative strengths and weaknesses, and each of them has at least 1-3 critical flaws". Whereas in the status quo I think these discussions sometimes end up feeling like members of tribe A calling tribe B low integrity and then tribe B firing back by saying Tribe A is actually low integrity in an even worse way.) 

Comment by Akash (akash-wasil) on Integrity in AI Governance and Advocacy · 2023-11-07T14:59:08.230Z · LW · GW

I don't think aysja was endorsing "hope" as a strategy– at least, that's not how I read it. I read it as "we should hold leaders accountable and make it clear that we think it's important for people to state their true beliefs about important matters."

To be clear, I think it's reasonable for people to discuss the pros and cons of various advocacy tactics, and I think asking "to what extent do I expect X advocacy tactic to affect people's incentives to openly state their beliefs?" makes sense.

Separately, though, I think the "accountability frame" is important. Accountability can involve putting pressure on leaders to express their true beliefs, pushing back when we suspect people are trying to find excuses to hide their beliefs, and making it clear that we think openness and honesty are important virtues even when they might provoke criticism– perhaps especially when they might provoke criticism. I think this is especially important in the case of lab leaders and others who have clear financial interests or power interests in the current AGI development ecosystem.

It's not about hoping that people are honest– it's about upholding standards of honesty, and recognizing that we have some ability to hold people accountable if we suspect that they're not being honest. 

I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things

I'm currently most excited about outside-game advocacy that tries to get governments to implement regulations that make good things happen. I think this technically falls under the umbrella of "controlling the incentives through explicit regulation", but I think it's sufficiently different from outside-game advocacy work that is trying to get labs to do things voluntarily.