Akash's Shortform

akash-wasil

post by Orpheus16 (akash-wasil) · 2024-04-18T15:44:25.096Z · LW · GW · 99 comments

99 comments

Comments sorted by top scores.

comment by Orpheus16 (akash-wasil) · 2024-05-18T16:51:27.153Z · LW(p) · GW(p)

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

Soft power– I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
- People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
- So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
Status games/reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).

Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman [LW · GW], and of course Jan Leike and Daniel K.

Replies from: Zach Stein-Perlman

↑ comment by Zach Stein-Perlman · 2024-05-18T17:00:08.975Z · LW(p) · GW(p)

Sorry for brevity, I'm busy right now.

1. Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
2. It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

Edit:

3. I'm pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don't understand your "good judgment" point. Feedback I've gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there's a selection effect in what I hear, but I'm quite sure most of them support such efforts.

4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being "'unilateralist,' 'not serious,' and 'untrustworthy.'" I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.

5. I do agree on "soft power" and some of "jobs." People often don't criticize the labs publicly because they're worried about negative effects on them, their org, or people associated with them.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-05-18T17:30:07.740Z · LW(p) · GW(p)

RE 1 & 2:

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

RE 3:

Good to hear that responses have been positive to lab watch. My impression is that this is a mix of: (a) lab watch doesn’t really threaten the interests of labs (especially Anthropic, which is currently winning & currently the favorite lab among senior AIS ppl), (b) the tides have been shifting somewhat and it is genuinely less taboo to criticize labs than a year ago, and (c) EAs respond more positively to criticism that feels more detailed/nuanced (look I have these 10 categories, let’s rate the labs on each dimension) than criticisms that are more about metastrategy (e.g., challenging the entire RSP frame or advocating for policymaker outreach).

RE 4: I haven’t heard anything about Conjecture that I’ve found particularly concerning. Would be interested in you clarifying (either here or via DM) what you’ve heard. (And clarification note that my original point was less “Conjecture hasn’t done anything wrong” and more “I suspect Conjecture will be more heavily scrutinized and examined and have a disproportionate amount of optimization pressure applied against it given its clear push for things that would hurt lab interests.”)

comment by Orpheus16 (akash-wasil) · 2024-04-18T15:44:25.830Z · LW(p) · GW(p)

I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.

I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."

My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.

I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.

There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.

Written on a Slack channel in response to discussions about some folks leaving OpenAI.

Replies from: alexander-gietelink-oldenziel, davekasten

↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-04-18T17:25:17.512Z · LW(p) · GW(p)

I'd be worried about evaporative cooling. It seems that the net result of this would be that labs would be almost completely devoid of people earnest about safety.

I agree with you government pathways to impact are most plausible and until recently undervalued. I also agree with you there are weird competitive pressures at labs.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-04-19T00:18:28.934Z · LW(p) · GW(p)

I do think evaporative cooling is a concern, especially if everyone (or a very significant number of people) left. But I think on the margin more people should be leaving to work in govt.

I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more "cool/fun/fast-paced", lots of govt jobs force you to move locations, etc.).

I also think it depends on the specific lab– EG in light of the recent OpenAI departures, I suspect there's a stronger case for staying at OpenAI right now than for DeepMind or Anthropic.

↑ comment by davekasten · 2024-04-18T16:42:56.807Z · LW(p) · GW(p)

I largely agree, but think given government hiring timelines, there's no dishonor in staying at a lab doing moderately risk-reducing work until you get a hiring offer with an actual start date. This problem is often less bad with the special hiring authorities being used for AI stuff, but it's still not ideal.

comment by Orpheus16 (akash-wasil) · 2024-11-20T16:58:27.334Z · LW(p) · GW(p)

Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)

Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?

Replies from: akash-wasil, habryka4, ricraz, Joe_Collman, trevor, Seth Herd, davekasten, akash-wasil, sohaib-imran, Phib, milosal

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T17:13:29.705Z · LW(p) · GW(p)

My own impression is that this would be an improvement over the status quo. Main reasons:

A lot of my P(doom) comes from race dynamics.
Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can't do much to end the race. Their main strategy would be to go to the USG.
If the USG runs the Manhattan Project (or there's some sort of soft nationalization in which the government ends up having a much stronger role), it's much easier for the USG to see that misalignment risks are concerning & to do something about it.
A national project would be more able to slow down and pursue various kinds of international agreements (the national project has more access to POTUS, DoD, NSC, Congress, etc.)
I expect the USG to be stricter on various security standards. It seems more likely to me that the USG would EG demand a lot of security requirements to prevent model weights or algorithmic insights from leaking to China. One of my major concerns is that people will want to pause at GPT-X but they won't feel able to because China stole access to GPT-Xminus1 (or maybe even a slightly weaker version of GPT-X).
In general, I feel like USG natsec folks are less "move fast and break things" than folks in SF. While I do think some of the AGI companies have tried to be less "move fast and break things" than the average company, I think corporate race dynamics & the general cultural forces have been the dominant factors and undermined a lot of attempts at meaningful corporate governance.

(Caveat that even though I see this as a likely improvement over status quo, this doesn't mean I think this is the best thing to be advocating for.)

(Second caveat that I haven't thought about this particular question very much and I could definitely be wrong & see a lot of reasonable counterarguments.)

Replies from: davekasten, o-o

↑ comment by davekasten · 2024-11-20T22:13:13.935Z · LW(p) · GW(p)

As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrence for decades, and 2) the "fuck you, I'm doing Iran-Contra" folks. Which do you expect will get in control of such a program? It's not immediately clear to me which ones would.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-22T15:51:16.719Z · LW(p) · GW(p)

@davekasten [LW · GW] I know you posed this question to us, but I'll throw it back on you :) what's your best-guess answer?

Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?

↑ comment by O O (o-o) · 2024-11-20T18:38:58.477Z · LW(p) · GW(p)

Why is the built-in assumption for almost every single post on this site that alignment is impossible and we need a 100 year international ban to survive? This does not seem particularly intellectually honest to me. It is very possible no international agreement is needed. Alignment may turn out to be quite tractable.

Replies from: Vladimir_Nesov, MondSemmel, Seth Herd

↑ comment by Vladimir_Nesov · 2024-11-20T18:51:18.063Z · LW(p) · GW(p)

A mere 5% chance that the plane will crash during your flight is consistent with considering this extremely concerning and doing anything in your power to avoid getting on it. "Alignment is impossible" is not necessary for great concern, and isn't implied by great concern.

Replies from: ricraz

↑ comment by Richard_Ngo (ricraz) · 2024-11-20T23:41:16.016Z · LW(p) · GW(p)

I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2024-11-20T23:55:25.426Z · LW(p) · GW(p)

Still consistent with great concern. I'm pointing out that O O's point isn't locally valid: observing concern shouldn't translate into observing belief that alignment is impossible.

↑ comment by MondSemmel · 2024-11-20T18:58:03.789Z · LW(p) · GW(p)

Yudkowsky has a pinned tweet that states the problem quite well: it's not so much that alignment is necessarily infinitely difficult, but that it certainly doesn't seem anywhere as easy as advancing capabilities, and that's a problem when what matters is whether the first powerful AI is aligned:

Safely aligning a powerful AI will be said to be 'difficult' if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T19:58:59.946Z · LW(p) · GW(p)

Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV. You might still be worried about EG concentration of power).

If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.

This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where misalignment is hard (or perhaps "somewhat hard but not intractably hard").

↑ comment by Seth Herd · 2024-11-20T21:57:28.354Z · LW(p) · GW(p)

It's not every post, but there are still a lot of people who think that alignment is very hard.

The more common assumption is that we should assume that alignment isn't trivial, because an intellectually honest assessment of the range of opinions suggests that we collectively do not yet know how hard alignment will be.

↑ comment by habryka (habryka4) · 2024-11-20T17:06:58.012Z · LW(p) · GW(p)

If the project was fueled by a desire to beat China, its structure seems unlikely to resemble the parts of the Manhattan Project's structure that seemed maybe advantageous here, like having a single government-controlled centralized R&D effort.

My guess is if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum to push the state of the art forward even when things got dangerous and, inasmuch as there is pushback, to continue dangerous development in secret.

In the case of nuclear weapons the U.S. really went very far under the advisement of Edward Teller, so I think the outside view here really doesn't look good.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T17:18:51.077Z · LW(p) · GW(p)

Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the "subsidy model", but they were willing to ask for certain concessions from industry.

Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?

Replies from: habryka4

↑ comment by habryka (habryka4) · 2024-11-20T18:57:47.835Z · LW(p) · GW(p)

I think "full visibility" seems like the obvious thing to ask for, and something that could maybe improve things. Also, preventing you from selling your products to the public, and basically forcing you to sell your most powerful models only to the government, gives the government more ability to stop things when it comes to it.

I will think more about this, I don't have any immediate great ideas.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T20:07:18.721Z · LW(p) · GW(p)

If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-11-20T22:19:03.543Z · LW(p) · GW(p)

I have an answer to that: making sure that NIST:AISI had at least scores of automated evals for checkpoints of any new large training runs, as well as pre-deployment eval access.

Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn't give away corporate algorithmic secrets.

A higher-cost ask, but still fairly reasonable, is pre-deployment evals which require fine-tuning. You can't have a good sense of what the model would be capable of in the hands of bad actors if you don't test fine-tuning it on hazardous info.

↑ comment by Richard_Ngo (ricraz) · 2024-11-20T23:39:31.945Z · LW(p) · GW(p)

Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.

In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-21T16:27:52.500Z · LW(p) · GW(p)

Can you say more about scenarios where you envision a later project happening that has different motivations?

I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:

A company might develop AGI (or an AI system that is very good at AI R&D that can get to AGI) before a major zeitgeist change.
The longer we wait, the more capable the "most capable model that wasn't secured" is. So we could risk getting into a scenario where people want to pause but since China and the US both have GPT-Nminus1, both sides feel compelled to race forward (whereas this wouldn't have happened if security had kicked off sooner.)

↑ comment by Joe Collman (Joe_Collman) · 2024-11-21T20:59:33.735Z · LW(p) · GW(p)

Some thoughts:

The correct answer is clearly (c) - it depends on a bunch of factors.
My current guess is that it would make things worse (given likely values for the bunch of other factors) - basically for Richard's reasons.
- Given [new potential-to-shift-motivation information/understanding], I expect there's a much higher chance that this substantially changes the direction of a not-yet-formed project, than a project already in motion.
- Specifically:
  - Who gets picked to run such a project? If it's primarily a [let's beat China!] project, are the key people cautious and highly adaptable when it comes to top-level goals? Do they appoint deputies who're cautious and highly adaptable?
    - Here I note that the kind of 'caution' we'd need is [people who push effectively for the system to operate with caution]. Most people who want caution are more cautious.
  - How is the project structured? Will the structure be optimized for adaptability? For red-teaming of top-level goals?
    - Suppose that a mid-to-high-level participant receives information making the current top-level goals questionable - is the setup likely to reward them for pushing for changes? (noting that these are the kind of changes that were not expected to be needed when the project launched)
  - Which external advisors do leaders of the project develop relationships with? What would trigger these to change?
  - ...
I do think that it makes sense to aim for some centralized project - but only if it's the right kind.
- I expect that almost all the directional influence is in [influence the initial conditions].
- For this reason, I expect [push for some kind of centralized project, and hope it changes later] is a bad idea.
- I think [devote great effort to influencing the likely initial direction of any such future project] seems a great idea (so long as you're sufficiently enlightened about desirable initial directions, of course :))
- I'd note that [initial conditions] needn't only be internal to the project - in principle we could have reason to believe that various external mechanisms would be likely to shift the project's motivation sufficiently over time. (I don't know of any such reasons)
I think the question becomes significantly harder once the primary motivation behind a project isn't [let's beat China!], but also isn't [your ideal project motivation (with your ideal initial conditions)].
I note that my p(doom) doesn't change much if we eliminate racing but don't slow down until it's clear to most decision makers that it's necessary.
- Likewise, I don't expect that [focus on avoiding the earliest disasters] is likely to be the best strategy. So e.g. getting into a good position on security seems great, all else equal - but I wouldn't sacrifice much in terms of [odds of getting to a sufficiently cautious overall strategy] to achieve better short-term security outcomes.

↑ comment by tlevin (trevor) · 2024-11-20T20:22:34.761Z · LW(p) · GW(p)

Depends on the direction/magnitude of the shift!

I'm currently feeling very uncertain about the relative costs and benefits of centralization in general. I used to be more into the idea of a national project that centralized domestic projects and thus reduced domestic racing dynamics (and arguably better aligned incentives), but now I'm nervous about the secrecy that would likely entail, and think it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project. Which is to say, even under pretty optimistic assumptions about how much such a project invests in alignment, security, and benefit-sharing, I'm pretty uncertain that this would be good, and with more realistic assumptions I probably lean towards it being bad. But it super depends on the governance, the wider context, how a "Manhattan Project" would affect domestic companies and China's policymaking, etc.

(I think a great start would be not naming it after the Manhattan Project, though. It seems path dependent, and that's not a great first step.)

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-21T16:28:58.488Z · LW(p) · GW(p)

"it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project"

Can you say more about what has contributed to this update?

↑ comment by Seth Herd · 2024-11-20T22:07:20.565Z · LW(p) · GW(p)

One factor is different incentives for decision-makers. The incentives (and the mindset) for tech companies are to move fast and break things. The incentives (and mindset) for government workers are usually vastly more conservative.

So if it is the government making decisions about when to test and deploy new systems, I think we're probably far better off WRT caution.

That must be weighed against the government typically being very bad at technical matters. So even an attempt to be cautious could be thwarted by lack of technical understanding of risks.

Of course, the Trump administration is attempting to instill a vastly different mindset, more like tech companies. So if it's that administration we're talking about, we're probably worse off on net with a combination of lack of knowledge and YOLO attitudes. Which is unfortunate - because this is likely to happen anyway.

As Habryka and others have noted, it also depends on whether it reduces race dynamics by aggregating efforts across companies, or mostly just throws funding fuel on the race fire.

↑ comment by davekasten · 2024-11-20T19:57:42.399Z · LW(p) · GW(p)

I think this is a (c) leaning (b), especially given that we're doing it in public. Remember, the Manhattan Project was a highly-classified effort and we know it by an innocuous name given to it to avoid attention.

Saying publicly, "yo, China, we view this as an all-costs priority, hbu" is a great way to trigger a race with China...

But if it turned out that we knew from ironclad intel with perfect sourcing that China was already racing (I don't expect this to be the case), then I would lean back more towards (c).

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T17:00:11.138Z · LW(p) · GW(p)

@davekasten [LW · GW] @Zvi [LW · GW] @habryka [LW · GW] @Rob Bensinger [LW · GW] @ryan_greenblatt [LW · GW] @Buck [LW · GW] @tlevin [LW · GW] @Richard_Ngo [LW · GW] @Daniel Kokotajlo [LW · GW] I suspect you might have interesting thoughts on this. (Feel free to ignore though.)

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-20T17:25:22.849Z · LW(p) · GW(p)

(c). Like if this actually results in them behaving responsibly later, then it was all worth it.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-11-20T17:34:37.927Z · LW(p) · GW(p)

What do you think are the most important factors for determining if it results in them behaving responsibly later?

For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"

↑ comment by Sohaib Imran (sohaib-imran) · 2024-11-21T09:06:29.578Z · LW(p) · GW(p)

One thing I’d be bearish on is visibility into the latest methods being used for frontier AI systems, which would downstream reduce the relevance of alignment research except for the research within the Manhattan-like project itself. This is already somewhat true of the big labs, e.g., the methods used for o1-like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing methods used for frontier systems.

↑ comment by worse (Phib) · 2024-11-20T22:20:31.247Z · LW(p) · GW(p)

Something I'm worried about now is some RFK Jr/Dr. Oz equivalent being picked to lead on AI...

↑ comment by MiloSal (milosal) · 2025-01-04T13:19:00.533Z · LW(p) · GW(p)

I'm fairly confident that this would be better than the current situation, and primarily because of something that others haven't touched on here.

The reason is that, regardless of who develops them, the first (militarily and economically) transformative AIs will cause extreme geopolitical tension and instability that is challenging to resolve safely. Resolving such a situation safely requires a well-planned off-ramp, which must route through extremely major national- or international-level decisions. Only governments are equipped to make decisions like these; private AGI companies certainly are not.

Therefore, unless development is at some point centralized in a USG project, there is no way to avoid the many paths to catastrophe that threaten the world during the period of extreme tension coinciding with AGI/ASI development.

comment by Orpheus16 (akash-wasil) · 2024-08-14T17:25:54.037Z · LW(p) · GW(p)

Here are some AI governance/policy thoughts that I've found myself articulating at least 3 times over the last month or so:

I think people interested in AI governance/policy should divide their projects into "things that could be useful in the current Overton Window" and "things that would require a moderate or major Overton Window shift to be useful." I think sometimes people end up not thinking concretely about which world they're aiming for, and this makes their work less valuable.
- If you're aiming for the current Overton Window, you need to be brutally honest about what you can actually achieve. There are many barriers to implementing sensible-seeming ideas. You need access to stakeholders who can do something. You should try to fail quickly. If your idea requires buy-in from XYZ folks, and X isn't interested, that's worth figuring out ASAP.
- If you're aiming for something outside the current Overton Window, you often have a lot of room to be imaginative. I think it's very easy to underestimate Overton Window shifts. If policymakers get considerably more concerned about AI risks, there are a lot of things that will be "on the table". People say that AI safety folks were unprepared for the ChatGPT surge– if you think that there will be 1-2 more surges of interest, it might be worth explicitly preparing for ideas that would be considered in those surges.
I think it's pretty essential to be in regular touch with policymakers/staffers if your main TOC is to get things done in the current Overton Window.
A common failure mode for "research types" is to write a 20+ page paper and then ask "ok cool, which policymakers might be interested?" I think usually a better strategy is to try to get in touch with your target audience much earlier on in the process. Present the 1-2 page version of your idea and see if/where the nuance is useful. (To be clear, this is if your TOC involves directly influencing policy. This doesn't apply if your main TOC is to improve everyone's understanding of X topic or improve your own understanding of Y topic.)
On the margin, I think more people who are new to AI governance/policy should be focusing on "things that would require a moderate or major Overton Window shift to be useful." I think there's more low-hanging fruit there that people can contribute to without necessarily having the kinds of networks/access that you often need to know what to do in the current Overton Window.
I think people tend to underestimate how quickly they could become a world expert in a specific area. This is especially true if you're applying it to the intersection of two areas. For example, it's very hard to become a world expert in international governance. But it's relatively easier to become a world expert in the intersection of "international governance" and "AI safety". There will be people who know more about international governance than you and people who know more about AI safety than you, but you might become one of the people who has thought the most rigorously about the intersection of the two topics.

Replies from: davekasten, yanni

↑ comment by davekasten · 2024-08-14T22:32:47.072Z · LW(p) · GW(p)

I think I agree with much-to-all of this. One further amplification I'd make about the last point: the culture of DC policymaking is one where people are expected to be quick studies and it's OK to be new to a topic; talent is much more funged from topic to topic in response to changing priorities than you'd expect. Your LessWrong-informed outside view of how much you need to know on a topic to start commenting on policy ideas is probably wrong.

(Yes, I know, someone is about to say "but what if you are WRONG about the big idea given weird corner case X or second-order effects Y?" Look, reversed stupidity is not wisdom, but also sometimes you can just quickly identify stupid-across-almost-all-possible-worlds ideas and convince people just not to do them rather than having to advocate for an explicit good-idea alternative.)

↑ comment by yanni kyriacos (yanni) · 2024-08-15T03:01:49.181Z · LW(p) · GW(p)

I think how delicately you treat your personal Overton Window should also depend on your timelines.

comment by Orpheus16 (akash-wasil) · 2024-09-17T21:28:34.279Z · LW(p) · GW(p)

Recent Senate hearing includes testimony from Helen Toner and William Saunders.

Both statements are explicit about AGI risks & emphasize the importance of transparency & whistleblower mechanisms.
William's statement acknowledges that he and others doubt that OpenAI's safety work will be sufficient.
- "OpenAI will say that they are improving. I and other employees who resigned doubt they will be ready in time. This is true not just with OpenAI; the incentives to prioritize rapid development apply to the entire industry. This is why a policy response is needed."
Helen's statement provides an interesting paragraph about China at the end.
- "A closing note on China: The specter of ceding U.S. technological leadership to China is often treated as a knock-down argument against implementing regulations of any kind. Based on my research on the Chinese AI ecosystem and U.S.-China technology competition more broadly, I think this argument is not nearly as strong as it seems at first glance. We should certainly be mindful of how regulation can affect the pace of innovation at home, and keep a close eye on how our competitors and adversaries are developing and using AI. But looking in depth at Chinese AI development, the AI regulations they are already imposing, and the macro headwinds they face leaves me with the conclusion that they are far from being poised to overtake the United States. The fact that targeted, adaptive regulation does not have to slow down U.S. innovation—and in fact can actively support it—only strengthens this point."

Full hearing here (I haven't watched it yet.)

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2024-09-17T22:13:40.954Z · LW(p) · GW(p)

I am impressed regarding Helen Toner's China comment!

For a while I have been tracking a hypothesis that nobody working in DC in AI Policy would openly and prominently speak against competition with China being a current priority, but this quote shows that hypothesis does not hold.

Now I will track whether any such person explicitly states that it doesn't matter who gets there first, civilization will most likely end regardless, and that competition shouldn't be a priority even if China were ahead of the US. I haven't seen a prominent instance of this happening yet.

Replies from: gwern

↑ comment by gwern · 2024-09-17T22:30:20.885Z · LW(p) · GW(p)

Toner is one of the only people criticizing the China arms race claims, like last year: https://www.foreignaffairs.com/china/illusion-chinas-ai-prowess-regulation-helen-toner This also earned her some enmity on social media as a Commie stooge last year.

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2024-09-17T22:43:48.903Z · LW(p) · GW(p)

Appreciate the link (and for others, here's an archived version without the paywall.)

I update toward a model of Helen's statements here not being very representative of what people in DC feel comfortable saying aloud, though to me it's still nice to know that literally anyone is able to say these words.

Replies from: davekasten

↑ comment by davekasten · 2024-09-17T23:18:28.311Z · LW(p) · GW(p)

Generally, it is difficult to overstate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than they were in those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered up whether or not they were actually culpable.

(Edit: to be clear, reporting, not endorsing, these claims)

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2024-09-17T23:23:28.581Z · LW(p) · GW(p)

Thanks for the info.

This is an area where I expect a lot of my info sources to be pretty adversarial, and furthermore I haven't looked into these issues a great deal, so I don't have a developed perspective on how bad-faith the Chinese government's agreements and information sources are.

I think I recall pretty adversarial information-sharing behavior from China toward the rest of the world in March 2020 (which I consider a massive deal), though I'd have to re-read Wikipedia and LessWrong to recall what exactly was going on.

comment by Orpheus16 (akash-wasil) · 2024-05-30T16:01:07.356Z · LW(p) · GW(p)

I'm surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.

But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)

And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)

I think liability also has the "added" problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that it creates a moat (only big companies can afford to comply), and argue that it's just not how America ends up regulating things (we don't hold Adobe accountable for someone doing something bad with Photoshop.)

To be clear, I don't think "something is politically unpopular" should be a full-stop argument against advocating for it.

But I do think that "liability for AI companies" scores poorly both on "actual usefulness if implemented" and "political popularity/feasibility." I also think the "liability for AI companies" advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities) and ends up avoiding some of the "weirder" points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop.)

I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.

That said, I'm not an expert in liability and admittedly haven't been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing into). I'd be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift incentives of labs. (See also here).

Stylistic note: I'd prefer replies along the lines of "here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases" rather than replies along the lines of "here is a thing you can read about the general legal/philosophical arguments about how liability is good."

Replies from: habryka4, None, Chris_Leong, LRudL, RedMan

↑ comment by habryka (habryka4) · 2024-05-30T22:53:29.132Z · LW(p) · GW(p)

One reason I feel interested in liability is because it opens up a way to do legal investigations. The legal system has a huge number of privileges that you get to use if you have reasonable suspicion someone has committed a crime or is being negligent. I think it's quite likely that if there were no direct liability, then even if Microsoft or OpenAI caused some huge catastrophe, we would never get a proper postmortem or analysis of the facts, and would never reach high confidence on the actual root causes.

So while I agree that OpenAI and Microsoft of course already want to avoid being seen as responsible for a large catastrophe, having legal liability makes it much more likely there will be an actual investigation where e.g. the legal system gets to confiscate servers and messages to analyze what happened, which in turn makes it more likely that if OpenAI and Microsoft are responsible, they will be found to be responsible.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-06-02T01:07:19.064Z · LW(p) · GW(p)

I found this answer helpful and persuasive– thank you!

↑ comment by [deleted] · 2024-06-06T13:39:55.988Z · LW(p) · GW(p)

I think liability-based interventions are substantially more popular with Republicans than other regulatory interventions - they're substantially more hands-off than, for instance, a regulatory agency. They also feature prominently in the Josh Hawley proposal. I've also been told by a Republican staffer that liability approaches are relatively popular amongst Rs.

An important baseline point is that AI firms (if they're selling to consumers) are probably covered by product liability by default. If they're covered by product liability, then they'll be liable for damages if it can be shown that there was a not excessively costly alternative design that they could have implemented that would have avoided that harm.

If AI firms aren't covered by product liability, they're liable according to standard tort law, which means they're liable if they're negligent under a reasonable person standard.

Liability law also gives (some, limited) teeth to NIST standards. If a firm can show that it was following NIST safety standards, this gives it a strong argument that it wasn't being negligent.

I share your scepticism of liability interventions as mechanisms for making important dents in the AI safety problem. Prior to the creation of the EPA, firms were still in principle liable for the harms their pollution caused, but the tort law system is generically a very messy way to get firms to reduce accident risks. It's expensive and time consuming to go through the court system, courts are reluctant to award punitive damages, which means that externalities aren't internalised even in theory (in expectation for firms), and you need to find a plaintiff with standing to sue firms.

I think there are still some potentially important use cases for liability for reducing AI risks:

Making clear the legal responsibilities of private sector auditors (I'm quite confident that this is a good idea)
Individual liability for individuals with safety responsibilities at firms (although this would be politically unpopular on the right I'd expect)
Creating safe harbours from liability if firms fulfil some set of safety obligations (similarly to the California bill) - ideally safety obligations that are updated over time and tied to best practice
Requiring insurance to cover liability and using this to create better safety practices as firms seek to reduce insurance premiums and satisfy insurers' requirements for coverage
Tying liability to specific failure modes that we expect to correlate with catastrophic failure modes, perhaps tied to a punitive damages regime - for instance, holding a firm liable, including for punitive damages, if a model causes harm via, say, goal misgeneralisation or firms lacking industry-standard risk management practices

To be clear, I'm still sceptical of liability-based solutions and reasonably strongly favour regulatory proposals (where specific liability provisions will still play an important role.)

I'm not a lawyer and have no legal training.

↑ comment by Chris_Leong · 2024-05-30T23:11:22.332Z · LW(p) · GW(p)

I think we should be talking more about potentially denying a frontier AI license to any company that causes a major disaster (within some future licensing regime), where a company’s record before the law passes will be taken into account.

↑ comment by L Rudolf L (LRudL) · 2024-06-01T11:07:35.240Z · LW(p) · GW(p)

One alternative method to liability for the AI companies is strong liability for companies using AI systems. This does not directly address risks from frontier labs having dangerous AIs in-house, but helps with risks from AI system deployment in the real world. It indirectly affects labs, because they want to sell their AIs.

A lot of this is the default. For example, Air Canada recently lost a court case after claiming a chatbot promising a refund wasn't binding on them. However, there could be related opportunities. Companies using AI systems currently don't have particularly good ways to assess risks from AI deployment, and if models continue getting more capable while reliability continues lagging, they are likely to be willing to pay an increasing amount for ways to get information on concrete risks, guard against it, or derisk it (e.g. through insurance against their deployed AI systems causing harms). I can imagine a service that sells AI-using companies insurance against certain types of deployment risk, that could also double as a consultancy / incentive-provider for lower-risk deployments. I'd be interested to chat if anyone is thinking along similar lines.

↑ comment by RedMan · 2024-05-30T21:34:21.727Z · LW(p) · GW(p)

There are analogies here in pollution. Some countries force industry to post bonds for damage to the local environment. This is a new innovation that may be working.

The reason the superfund exists in the US is because liability for pollution can be so severe that a company would simply cease to operate, and the mess would not be cleaned up.

In practice, when it comes to taking environmental risks, better to burn the train cars of vinyl chloride, creating a catastrophe too expensive for anyone to clean up or even comprehend than to allow a few gallons to leak, creating an expensive accident that you can actually afford.

comment by Orpheus16 (akash-wasil) · 2024-08-05T16:58:13.653Z · LW(p) · GW(p)

New Vox article criticizes Anthropic for trying to weaken SB1047 (as well as for some other things). Some notable sections:

Anthropic is lobbying to water down the bill. It wants to scrap the idea that the government should enforce safety standards before a catastrophe occurs. “Instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices)” the company urges, “focus the bill on holding companies responsible for causing actual catastrophes.” In other words, take no action until something has already gone terribly wrong.
“Anthropic is trying to gut the proposed state regulator and prevent enforcement until after a catastrophe has occurred — that’s like banning the FDA from requiring clinical trials,” Max Tegmark, president of the Future of Life Institute, told me.
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it. “This is straight out of Big Tech’s playbook,” he said.
The US has enforceable safety standards in industries ranging from pharma to aviation. Yet tech lobbyists continue to resist such regulations for their own products. Just as social media companies did years ago, they make voluntary commitments to safety to placate those concerned about risks, then fight tooth and nail to stop those commitments being turned into law.
“I am pretty skeptical of things that relate to corporate governance because I think the incentives of corporations are horrendously warped, including ours.” Those are the words of Jack Clark, the policy chief at Anthropic. [Quote is from a year ago]

Replies from: Zach Stein-Perlman, Aidan O'Gara

↑ comment by Zach Stein-Perlman · 2024-08-05T17:45:16.190Z · LW(p) · GW(p)

This article makes some fine points but some misleading ones and its thesis is wrong, I think. Bottom line: Anthropic does lots of good things and is doing much better than being maximally selfish/ruthless. (And of course this is possible, contra the article — Anthropic is led by humans who have various beliefs which may entail that they should make tradeoffs in favor of safety. The space of AI companies is clearly not so perfectly competitive that anyone who makes tradeoffs in favor of safety becomes bankrupt and irrelevant.)

It’s pushing back on a landmark California bill to regulate AI.

Yep, Anthropic's policy advocacy seems bad.

It’s taking money from Google and Amazon in a way that’s drawing antitrust scrutiny. And it’s being accused of aggressively scraping data from websites without permission, harming their performance.

My impression is that these are not big issues. I'm open to hearing counterarguments. [Edit: the scraping is likely a substantial issue for many sites; see comment below. (It is not an x-safety issue, of course.)]

Here’s another tension at the heart of AI development: Companies need to hoover up reams and reams of high-quality text from books and websites in order to train their systems. But that text is created by human beings, and human beings generally do not like having their work used without their consent.

I agree this is not ideal-in-all-ways but I'm not aware of a better alternative.

Web publishers and content creators are angry. Matt Barrie, chief executive of Freelancer.com, a platform that connects freelancers with clients, said Anthropic is “the most aggressive scraper by far,” swarming the site even after being told to stop. “We had to block them because they don’t obey the rules of the internet. This is egregious scraping [that] makes the site slower for everyone operating on it and ultimately affects our revenue.”

This is surprising to me. I'm not familiar with the facts. Seems maybe bad.

Deals like these [investments from Amazon and Google] always come with risks. The tech giants want to see a quick return on their investments and maximize profit. To keep them happy, the AI companies may feel pressure to deploy an advanced AI model even if they’re not sure it’s safe.

Yes there's nonzero force to this phenomenon, but my impression is that Amazon and Google have almost no hard power over Anthropic and no guaranteed access to its models (unlike e.g. how OpenAI may have to share its models with Microsoft, even if OpenAI thinks the model is unsafe), and I'm not aware of a better alternative.

[Edit: mostly I just think this stuff is not-what-you-should-focus-on if evaluating Anthropic on safety — there are much bigger questions.]

There are some things Anthropic should actually do better. There are some ways it's kinda impure, like training on the internet and taking investments. Being kinda impure is unavoidable if you want to be a frontier AI company. Insofar as Anthropic is much better on safety than other frontier AI companies, I'm glad it exists.

[Edit: I'm slightly annoyed that the piece feels one-sided — it's not trying to figure out whether Anthropic makes tradeoffs for safety or how it compares to other frontier AI companies, instead it's collecting things that sound bad. Maybe this is fine since the article's role is to contribute facts to the discourse, not be the final word.]

Replies from: habryka4, akash-wasil

↑ comment by habryka (habryka4) · 2024-08-05T19:58:43.687Z · LW(p) · GW(p)

My impression is that these are not big issues. I'm open to hearing counterarguments.

I think the Anthropic scraper has been causing a non-trivial amount of problems for LW. I am kind of confused because there might be scrapers going around that are falsely under the name "claudebot", but inasmuch as it is Anthropic, it sure has been annoying (like, killed multiple servers and has caused me like 10+ hours of headaches).

The part of the article I actually found most interesting is this:

In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it.

This seems worth looking into and would be pretty bad.

Replies from: mesaoptimizer

↑ comment by mesaoptimizer · 2024-08-06T07:34:20.479Z · LW(p) · GW(p)

I hope you've at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.

Replies from: habryka4

↑ comment by habryka (habryka4) · 2024-08-06T18:30:49.564Z · LW(p) · GW(p)

We complained to them and it's been better in recent months. We didn't want to block them because I do actually want LW to be part of the training set.

↑ comment by Orpheus16 (akash-wasil) · 2024-08-06T20:43:14.845Z · LW(p) · GW(p)

+1 to lots of this.

(Meta: Me posting the article is not an endorsement of the article as a whole. I agree with Zach that lots of sections of it don't seem fair/balanced and don't seem to be critical from an extreme risk perspective.

I think the bullet points I listed above summarize the parts that I think are important/relevant.)

↑ comment by aog (Aidan O'Gara) · 2024-08-06T15:28:01.917Z · LW(p) · GW(p)

I think there's a decent case that SB 1047 would improve Anthropic's business prospects, so I'm not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic's business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.

The good faith interpretation of Anthropic's argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-08-06T20:41:10.425Z · LW(p) · GW(p)

The good faith interpretation of Anthropic's argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.

Some quick thoughts on this:

If SB1047 passes, labs can still do whatever they want to reduce xrisk. This seems additive to me– I would be surprised if a lab was like "we think XYZ is useful to reduce extreme risks, and we would've done them if SB1047 had not passed, but since Y and Z aren't in the FMD guidance, we're going to stop doing Y and Z."
I think the guidance the agency issues will largely be determined by who it employs. I think it's valid to be like "maybe the FMD will just fail to do a good job because it won't employ good people", but to me this is more of a reason to say "how do we make sure the FMD gets staffed with good people who understand how to issue good recommendations", rather than "there is a risk that you issue bad guidance, therefore we don't want any guidance."
I do think that a poorly-implemented FMD could cause harm by diverting company attention/resources toward things that are not productive, but IMO this cost seems relatively small compared to the benefits acquired in the worlds where the FMD issues useful guidance. (I haven't done a quantitative EV calculation on this though, maybe someone should. I would suspect that even if you give FMD like 20-40% chance of good guidance, and 60-80% chance of useless guidance, the EV would still be net positive.)
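A rough sketch of the EV comparison in the last point, under loudly stated assumptions: the outcome is binary (good guidance with probability p, useless guidance otherwise), good guidance yields some benefit B, and useless guidance costs C in diverted attention/resources. B, C, and the function names below are hypothetical placeholders for illustration, not estimates drawn from the bill or from anyone's analysis. Under that toy model, EV = p·B − (1−p)·C, so the guidance is net positive whenever B/C exceeds (1−p)/p.

```python
def expected_value(p_good: float, benefit: float, cost: float) -> float:
    """EV of agency guidance under a toy two-outcome model (an assumption).

    p_good:  probability the agency issues useful guidance
    benefit: value gained if the guidance is useful (hypothetical units)
    cost:    value lost to wasted effort if the guidance is useless
    """
    return p_good * benefit - (1.0 - p_good) * cost


def break_even_ratio(p_good: float) -> float:
    """Benefit/cost ratio at which the EV is exactly zero."""
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    return (1.0 - p_good) / p_good


if __name__ == "__main__":
    # Sweep the 20-40% range mentioned in the comment above.
    for p in (0.2, 0.3, 0.4):
        print(f"p_good={p:.1f}: net positive if benefit/cost > {break_even_ratio(p):.2f}")
```

In this toy model, at p = 0.2 the benefit of good guidance has to be about 4x the cost of useless guidance for the EV to be positive, and at p = 0.4 about 1.5x. Whether the real ratio clears that bar is exactly the empirical question the sketch cannot answer.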

comment by Orpheus16 (akash-wasil) · 2024-08-17T18:47:03.474Z · LW(p) · GW(p)

Why didn't industry succeed in killing SB1047 [so far]?

If someone had told me in 2022 that there would be a bill in CA that the major labs opposed and that the tech industry spent a fair amount of effort lobbying against (to the point of getting Congresspeople and Nancy Pelosi to chime in), I would've been like "that bill seems like it should get killed pretty early on in the process."

Like, if the bill has to go through 5+ committees, I would've predicted that it would die within the first 3 committees. So what's going on? Some plausible explanations:

Industry has less power over AI legislation than I (and maybe some others) thought
Industry has more influence on the federal government than on the CA legislature
Industry underestimated SB1047 early on//didn't pay much attention to it and the opposition came relatively late in the game
Scott Wiener is really good at building coalitions and forming alliances
SB1047 is relatively light-touch and the burden is very high when industry tries to fight light-touch things

What do you think are the most noteworthy explanations for why industry has failed to kill SB1047 so far?

Replies from: davekasten, akash-wasil

↑ comment by davekasten · 2024-08-18T00:28:22.060Z · LW(p) · GW(p)

One question I have is whether Nancy Pelosi was asked and agreed to do this, or whether Nancy Pelosi identified this proactively as an opportunity to try to win back some tech folks to the Dem side. Substantially changes our estimate of how much influence the labs have in this conversation.

↑ comment by Orpheus16 (akash-wasil) · 2024-08-17T18:50:33.075Z · LW(p) · GW(p)

One plausible explanation is that industry still thinks it's likely to kill the bill, and they just didn't feel like they needed to play their cards sooner.

But this still leaves me surprised– I would've expected that it's in industry's interest to kill the bill earlier in the process because:

It might be easier to kill earlier on because it hasn't gained much traction/support
If you want to appear like you're open to regulation (which seems to be the policy of major AI companies), you probably want to kill it in a relatively silent/invisible way. If you have to be very loud and public and you get to the point where there are a bunch of media articles about it, you lose some credibility/reputation/alliances (and indeed I do think industry has lost some of this "plausibility of good will" as a result of the SB1047 saga)

comment by Orpheus16 (akash-wasil) · 2024-06-04T17:20:39.019Z · LW(p) · GW(p)

My rough ranking of different ways superintelligence could be developed:

Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or "USG + Western allies" project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest into alternative paradigms ("provably safe" systems, human intelligence enhancement, etc.)
Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.

My own thought is that we should be advocating for option #3 (international coordination) unless/until there is enough evidence that suggests that it's actually not feasible, and then we should settle for option #2. I'm not yet convinced by people who say we have to settle for option #2 just because e.g. climate treaties have not gone well or international cooperation is generally difficult.

But I also think people advocating #3 should be aware that there are some worlds in which international cooperation will not be feasible, and we should be prepared to do #2 if it's quite clear that the US and China are unwilling to cooperate on AGI development. (And again, I don't think we have that evidence yet– I think there's a lot of uncertainty here.)

Replies from: bogdan-ionut-cirstea, Dagon, davekasten, Oscar Delaney

↑ comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-06-04T18:56:35.200Z · LW(p) · GW(p)

I don't think the risk ordering is obvious at all, especially not between #2 and #3, and especially not if you also took into account tractability concerns and risks separate from extinction (e.g. stable totalitarianism, s-risks). Even if you thought coordinating with China might be worth it, I think it should be at least somewhat obvious why the US government [/ and its allies] might be very uncomfortable building a coalition with, say, North Korea or Russia. Even between #1 and #2, the probable increase in risks of centralization might make it not worth it, at least in some worlds, depending on how optimistic one might be about e.g. alignment or offense-defense balance from misuse of models with dangerous capabilities.

I also don't think it's obvious alternative paradigms would necessarily be both safer and tractable enough, even on 10-year timelines, especially if you don't use AI automation (using the current paradigm, probably) to push those forward.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-06-04T19:06:31.780Z · LW(p) · GW(p)

the probable increase in risks of centralization might make it not worth it

Can you say more about why the risk of centralization differs meaningfully between the three worlds?

IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...

Then you are very likely (in the absence of coordination) to end up with centralization no matter what. It's just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) is the one wielding the superintelligence.

(If anything, the "international coalition" approach seems less likely to lead to centralization than the other two approaches, since you're more likely to get post-AGI coordination.)

especially if you don't use AI automation (using the current paradigm, probably) to push those forward.

In my vision, the national or international project would be investing into "superalignment"-style approaches; they would just (hopefully) have enough time/resources to be investing into other approaches as well.

I typically assume we don't get "infinite time"– i.e., even the international coalition is racing against "the clock" (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can't be prevented, or the amount of time we have until a separate existential catastrophe occurs). So I think it would be unwise for the international coalition to completely abandon DL/superalignment, even if one of the big hopes is that a safer paradigm would be discovered in time.

Replies from: bogdan-ionut-cirstea

↑ comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-06-04T19:10:06.429Z · LW(p) · GW(p)

IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...

I don't think this is obvious; stably-multipolar worlds seem at least plausible to me.

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-06-04T20:35:32.985Z · LW(p) · GW(p)

See also here [LW · GW] and here [LW(p) · GW(p)].

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-06-04T22:28:12.241Z · LW(p) · GW(p)

@Bogdan, can you spell out a vision for a stably multipolar world with the above assumptions satisfied?

IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely.

@Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it’s like “hey here’s a claim, also read more here” as opposed to links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)

Replies from: ryan_greenblatt, bogdan-ionut-cirstea

↑ comment by ryan_greenblatt · 2024-06-05T01:29:27.343Z · LW(p) · GW(p)

(Yeah, I was just literally linking to things people might find relevant to read without making any particular claim. I think this is often slightly helpful, so I do it. Edit: when I do this, I should probably include a disclaimer like "Linking for relevance, not making any specific claim".)

↑ comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-06-05T00:00:34.757Z · LW(p) · GW(p)

Yup, I was thinking about worlds in which there is no obvious DSA, or where the parties involved are risk averse enough (perhaps e.g. for reasons like in this talk)

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-06-06T10:24:16.007Z · LW(p) · GW(p)

My expectation is that DSI can (and will) be achieved before ASI. In fact, I expect ASI to be about as useful as a bomb which has a minimum effect size of destroying the entire solar system if deployed. In other words, useful only for Mutually Assured Destruction. DSI only requires a nuclear-armed state actor to have an effective global missile defense system. Whichever nuclear-armed state actor gets that without any other group having that can effectively demand the surrender and disarmament of all other nations. Including confiscating their compute resources. Do you think missile defense is so difficult that only ASI can manage it? I don't. That seems like a technical discussion which would need more details to hash out. I'm pretty sure an explicitly designed tool AI and a large drone and satellite fleet could accomplish that.

↑ comment by Dagon · 2024-06-05T14:42:36.179Z · LW(p) · GW(p)

Competition is fractal. There are multiple hierarchies (countries/departments/agencies/etc, corporations/divisions/teams/etc), with individual humans acting on their own behalf. Often, individuals have influence and goals in multiple hierarchies.

Your 1/2/3 delineation is not the important part. It’s going to be all 3, with chaotic shifts as public perception, funding, and regulation shift around.

↑ comment by davekasten · 2024-06-04T22:09:34.401Z · LW(p) · GW(p)

Agree -- I think people need to be prepared for "try-or-die" scenarios.

One unfun one I'll toss into the list: "Company A is 12 months from building Cthulhu, and governments truly do not care and there is extremely strong reason to believe that will not change in the next year. All our policy efforts have failed, our existing technical methods are useless, and the end of the world has come. Everyone report for duty at Company B, we're going to try to roll the hard six."

Replies from: mesaoptimizer

↑ comment by mesaoptimizer · 2024-06-04T22:31:07.498Z · LW(p) · GW(p)

If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.

Unsee the frontier lab.

Replies from: davekasten

↑ comment by davekasten · 2024-06-04T22:41:13.855Z · LW(p) · GW(p)

...yes? I think my scenario explicitly assumes that we've fucked up upstream in many, many ways.

Replies from: mesaoptimizer

↑ comment by mesaoptimizer · 2024-06-04T22:44:27.915Z · LW(p) · GW(p)

Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any other outcome. [LW · GW]

↑ comment by Oscar (Oscar Delaney) · 2024-06-04T17:34:04.699Z · LW(p) · GW(p)

You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is, I assume, along the lines of what you are thinking.

Replies from: davekasten

↑ comment by davekasten · 2024-06-04T21:56:17.281Z · LW(p) · GW(p)

Indeed, Akash is familiar: https://arxiv.org/abs/2310.20563 :)

(I think it was a later paper he co-authored than the one you cite)

comment by Orpheus16 (akash-wasil) · 2024-10-22T22:03:23.489Z · LW(p) · GW(p)

48 entities gave feedback on the Department of Commerce AI reporting requirements.

Public comments offering feedback on BIS's proposed reporting requirements are now up! BIS received responses from 48 entities including OpenAI, Anthropic, and many AI safety groups.

The reporting requirements are probably one of the most important things happening in US AI policy-- I'd encourage folks here to find time to skim some of the comments.

comment by Orpheus16 (akash-wasil) · 2024-06-30T16:43:08.386Z · LW(p) · GW(p)

Recommended reading: A recent piece argues that the US-China crisis hotline doesn't work & generally raises some concerns about US-China crisis communication.

Some quick thoughts:

If the claims in the piece are true, there seem to be some (seemingly tractable) ways of substantially improving US-China crisis communication.
The barriers seem more bureaucratic (understanding how the defense world works and getting specific agencies/people to do specific things) than political (I doubt this is something you need Congress to pass new legislation to improve.)
In general, I feel like "how do we improve our communication infrastructure during AI-related crises" is an important and underexplored area of AI policy. This isn't just true for US-China communication but also for "lab-government communication", "whistleblower-government communication", and "junior AI staffer-senior national security advisor" communication.
- Example: Suppose an eval goes off that suggests that an AI-related emergency might be imminent. How do we make sure this information swiftly gets to relevant people? To what extent do UKAISI and USAISI folks (or lab whistleblowers) have access to senior national security folks who would actually be able to respond in a quick or effective way?
I think IAPS' CDDC paper is a useful contribution here. I will soon be releasing a few papers in this broad space, with a focus on interventions that can improve emergency detection + emergency response.
One benefit of workshops/conferences/Track 2 dialogues might simply be that you get relevant people to meet each other, share contact information, build trust/positive vibes, and be more likely to reach out in the event of an emergency scenario.
Establishing things like the AI Safety and Security Board might also be useful for similar reasons. I think this has gotten a fair amount of criticism for being too industry-focused, and some of that is justified. Nonetheless, I think interventions along the lines of "make sure the people who might see the first signs of extreme risk have super clear ways of advising/contacting government officials" seem great.

comment by Orpheus16 (akash-wasil) · 2024-09-02T23:39:58.060Z · LW(p) · GW(p)

Why do people think there's a ~50% chance that Newsom will veto SB1047?

The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn't been very controversial among CA politicians.

Is the main idea here that Newsom's incentives are different than those of state politicians because Newsom has national ambitions? So therefore he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like "maybe Newsom doesn't want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies"?)

Or are there other factors that are especially influential in people's models here?

(Tagging @ryan_greenblatt [LW · GW], @Eric Neyman [LW · GW], and @Neel Nanda [LW · GW] because you three hold the largest No positions. Feel free to ignore if you don't want to engage.)

Replies from: neel-nanda-1, UnexpectedValues

↑ comment by Neel Nanda (neel-nanda-1) · 2024-09-03T00:17:13.479Z · LW(p) · GW(p)

My model is basically just "Newsom likely doesn't want to piss off Big Tech or Pelosi, and the incentive to not veto doesn't seem that high, and so seems highly likely to veto, and 50% veto seems super low". My fair is, like, 80% veto I think?

I'm not that compelled by the base rates argument, because I think the level of controversy over the bill is atypically high, so it's quite out of distribution. E.g., I think Pelosi denouncing it is very unusual for a state bill and a pretty big deal.

Replies from: akash-wasil, UnexpectedValues, Raemon, daniel-samuel

↑ comment by Orpheus16 (akash-wasil) · 2024-09-03T20:36:00.596Z · LW(p) · GW(p)

Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi's statement didn't come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.)

To me, the most obvious explanation is probably something like "Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders"– is this what's driving your model?

Replies from: neel-nanda-1

↑ comment by Neel Nanda (neel-nanda-1) · 2024-09-03T22:12:41.776Z · LW(p) · GW(p)

This is a fair point. I think Newsom is a very visible and prominent target who has more risk here (I imagine people don't pay that much attention to individual California legislators), it's individually his fault if he doesn't veto, and he wants to be President and thus cares much more about national stuff. While the California legislators were probably annoyed at Pelosi butting into state business.

↑ comment by Eric Neyman (UnexpectedValues) · 2024-09-03T05:58:38.118Z · LW(p) · GW(p)

I believe that Pelosi had never once spoken out against a state bill authored by a California Democrat before this.

Replies from: michael-roe

↑ comment by Michael Roe (michael-roe) · 2024-09-03T08:06:14.657Z · LW(p) · GW(p)

A financial conflict of interest is a wondrous thing...

Replies from: UnexpectedValues

↑ comment by Eric Neyman (UnexpectedValues) · 2024-09-03T17:44:35.374Z · LW(p) · GW(p)

For what it's worth, I don't have any particular reason to think that that's the reason for her opposition.

↑ comment by Raemon · 2024-09-03T01:19:09.305Z · LW(p) · GW(p)

Is there some source that particularly indicates this? I get why the 15% base rate might be low, but haven't actually seen evidence apart from this Manifold question that it'd be higher.

↑ comment by Daniel Samuel (daniel-samuel) · 2024-09-03T09:40:26.050Z · LW(p) · GW(p)

Newsom’s stance on Big Tech is a bit murky. He pushed ideas like the Data Dividend but overall, he seems pretty friendly to the industry.

As for Pelosi, she’s still super influential, but she’ll be 88 by the next presidential election. Her long-term influence is definitely something to watch and Newsom probably has a good read on how things will shift.

↑ comment by Eric Neyman (UnexpectedValues) · 2024-09-03T06:00:11.187Z · LW(p) · GW(p)

But it seems like SB1047 hasn't been very controversial among CA politicians.

I think this isn't true. Concretely, I bet that if you looked at the distribution of Democratic No votes among bills that reached Newsom's desk, this one would be among the highest (7 No votes and a bunch of not-voting, which I think is just a polite way to vote No; source). I haven't checked and could be wrong!

My take is basically the same as Neel's, though my all-things-considered guess is that he's 60% or so to veto. My position on Manifold is in large part an emotional hedge. (Otherwise I would be placing much smaller bets in the same direction.)

comment by Orpheus16 (akash-wasil) · 2024-06-07T23:29:17.922Z · LW(p) · GW(p)

I've started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.

I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.

Replies from: akash-wasil

↑ comment by Orpheus16 (akash-wasil) · 2024-06-07T23:33:53.558Z · LW(p) · GW(p)

@Peter Barnett [LW · GW] @Rob Bensinger [LW · GW] @habryka [LW · GW] @Zvi [LW · GW] @davekasten [LW · GW] @Peter Wildeford [LW · GW] you come to mind as people who might be interested.

See also the Wikipedia page about the report (but IMO reading sections of the actual report is worth it).

comment by Orpheus16 (akash-wasil) · 2024-10-23T15:27:56.987Z · LW(p) · GW(p)

Does anyone know why Anthropic doesn't want models with powerful cyber capabilities to be classified as "dual-use foundation models?"

In its BIS comment, Anthropic proposes a new definition of dual-use foundation model that excludes cyberoffensive capabilities. This also comes up in TechNet's response (TechNet is a trade association that Anthropic is a part of).

Does anyone know why Anthropic doesn't want the cyber component of the definition to remain? (I don't think they cover this in the comment).

---

More details– the original criteria for "dual-use foundation model" proposed by BIS are:

(1) Substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;
(2) Enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyberattacks; or
(3) Permitting the evasion of human control or oversight through means of deception or obfuscation.

Anthropic's definition includes criteria #1 and #3 but excludes criterion #2.

(Separately, Anthropic argues that dual-use foundation models should be defined as those that pose catastrophic risks as opposed to serious risks to national security. This is important too, but I'm less confused about why Anthropic wants this.)

Replies from: davekasten

↑ comment by davekasten · 2024-10-23T16:42:31.664Z · LW(p) · GW(p)

Wild speculation: they also have a sort of we're-watching-but-unsure provision about cyber operations capability in their most recent RSP update. In it, they say in part that "it is also possible that by the time these capabilities are reached, there will be evidence that such a standard is not necessary (for example, because of the potential use of similar capabilities for defensive purposes)." Perhaps they're thinking that automated vulnerability discovery is at least plausibly on-net-defensive-balance-favorable*, and so they aren't sure it should be regulated as closely, even if it's still, in some informal sense, "dual use"?

Again, WILD speculation here.

*A claim that is clearly seen as plausible by, e.g., the DARPA AI Grand Challenge effort.

comment by Orpheus16 (akash-wasil) · 2024-06-14T15:55:25.855Z · LW(p) · GW(p)

Recommended readings for people interested in evals work?

Someone recently asked: "Suppose someone wants to get into evals work. Is there a good reading list to send to them?" I spent ~5 minutes and put this list together. I'd be interested if people have additional suggestions or recommendations:

I would send them:

Model evaluations for extreme risks
Evaluating frontier models for dangerous capabilities
METR ARA paper
Recent AI Sandbagging paper [LW · GW]
Anthropic's challenges in evaluating AI systems
Apollo's starter guide for evals
A paper I'm writing on semi-structured interviews as a good complement to formal evaluations (in-progress)

I would also encourage them to read stuff more on the "macrostrategy" of evals. Like, I suspect a lot of value will come from people who are able to understand the broader theory of change of evals and identify when we're "rowing" in bad directions. Some examples here might be:

How evals might (or might not) prevent catastrophic risks from AI [LW · GW] (a bit outdated but still relevant IMO).
Lots of the discussion around RSPs (e.g., RSPs are pauses done right [LW · GW], RSPs are risk management done wrong, OpenAI’s Preparedness Framework: Praise & Recommendations [LW · GW])
A paper I'm writing on emergency preparedness that includes some thoughts on government's "detection capabilities" (in-progress).
Six dimensions of operational adequacy [LW · GW] (relevant for "what happens when the evals go off")
Carefully bootstrapped alignment is organizationally hard [LW · GW] (also relevant for "what happens when the evals go off")

Replies from: Jozdien, akash-wasil

↑ comment by Jozdien · 2024-06-16T09:50:03.070Z · LW(p) · GW(p)

I'm obviously biased, but I would recommend my post on macrostrategy of evals: The case for more ambitious language model evals [LW · GW].

↑ comment by Orpheus16 (akash-wasil) · 2024-06-14T16:00:43.410Z · LW(p) · GW(p)

@Ryan Kidd [LW · GW] @Lee Sharkey [LW · GW] I suspect you'll have useful recommendations here.

comment by Orpheus16 (akash-wasil) · 2024-04-24T20:14:37.346Z · LW(p) · GW(p)

I'm interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion.

If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.
