Posts

How the AI safety technical landscape has changed in the last year, according to some practitioners 2024-07-26T19:06:47.126Z
tlevin's Shortform 2024-04-30T21:32:59.991Z
EU policymakers reach an agreement on the AI Act 2023-12-15T06:02:44.668Z
Notes on nukes, IR, and AI from "Arsenals of Folly" (and other books) 2023-09-04T19:02:58.283Z
Apply to HAIST/MAIA’s AI Governance Workshop in DC (Feb 17-20) 2023-01-31T02:06:54.656Z
Update on Harvard AI Safety Team and MIT AI Alignment 2022-12-02T00:56:45.596Z

Comments

Comment by tlevin (trevor) on Akash's Shortform · 2024-11-20T20:22:34.761Z · LW · GW

Depends on the direction/magnitude of the shift!

I'm currently feeling very uncertain about the relative costs and benefits of centralization in general. I used to be more into the idea of a national project that centralized domestic projects and thus reduced domestic racing dynamics (and arguably better aligned incentives), but now I'm nervous about the secrecy that would likely entail, and think it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project. Which is to say, even under pretty optimistic assumptions about how much such a project invests in alignment, security, and benefit-sharing, I'm pretty uncertain that this would be good, and with more realistic assumptions I probably lean towards it being bad. But it super depends on the governance, the wider context, how a "Manhattan Project" would affect domestic companies and China's policymaking, etc.

(I think a great start would be not naming it after the Manhattan Project, though. Names seem path-dependent, and that's not a great first step.)

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-30T00:27:05.693Z · LW · GW

It's not super clear whether from a racing perspective having an equal number of nukes is bad. I think it's genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing). 


Importantly though, once you have several thousand nukes the strategic returns to more nukes drop pretty close to zero, regardless of how many your opponents have, while if you get the scary model's weights and then don't use them to push capabilities even more, your opponent maybe gets a huge strategic advantage over you. I think this is probably true, but the important thing is whether the actors think it might be true.

In general, I think it's very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn't surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).

Yeah, good point.

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T23:58:01.897Z · LW · GW

Yeah, doing it again it works fine, but it was just creating a long list of empty bullet points (I also have this issue in GDocs sometimes).

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T23:56:41.643Z · LW · GW

Gotcha. A few disanalogies though -- the first two specifically relate to the model theft/shared access point; the third is true even if you had verifiable API access: 

  1. Me verifying how many nukes you have doesn't mean I suddenly have that many nukes, unlike AI model capabilities -- though due to compute differences, having the weights does not mean we suddenly have the same time distance to superintelligence. 
  2. Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
  3. This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they're underestimating their own proximity to superintelligence, until it's way more salient/obvious.
Comment by tlevin (trevor) on Monthly Roundup #21: August 2024 · 2024-08-29T23:49:26.936Z · LW · GW

In general, we should be wary of this sort of ‘make things worse in order to make things better.’ You are making all conversations of all sizes worse in order to override people’s decisions.

Glad to be included in the roundup, but two issues here.

First, it's not about overriding people's decisions; it's a collective action problem. When the room is silent and there's a single group of 8, I don't actually face a choice of a 2- or 3-person conversation; it doesn't exist! The music lowers the costs for people to split into smaller conversations, so the people who prefer those now have better choices, not worse.

Second, this is a Simpson's Paradox-related fallacy: you are indeed making all conversations more difficult, but in my model, smaller conversations are much better, so by making conversations of all sizes slightly to severely worse but moving the population to smaller conversations, you're still improving the conversations on net.

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T23:39:49.408Z · LW · GW

Also - I'm not sure I'm getting the thing where verifying that your competitor has a potentially pivotal model reduces racing?

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T23:37:19.612Z · LW · GW

The "how do we know if this is the most powerful model" issue is one reason I'm excited by OpenMined, who I think are working on this, among other features of external access tools.

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T23:35:01.177Z · LW · GW

If probability of misalignment is low, probability of human+AI coups (including e.g. countries invading each other) is high, and/or there aren't huge offense-dominant advantages to being somewhat ahead, you probably want more AGI projects, not fewer. And if you need a ton of compute to go from an AI that can do 99% of AI R&D tasks to an AI that can cause global catastrophe, then model theft is less of a factor. But the thing I'm worried about re: model theft is a scenario like this, which doesn't seem that crazy:

  • Company/country X has an AI agent that can do 99% [edit: let's say "automate 90%"] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn't know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
  • The weights for Agent-GPT-7 are available (legally or illegally) to company/country Y, which is known to company/country X.
  • Y has, say, a fifth of the compute. So each of those steps will take 20 months. Symmetrically, company/country Y thinks it'll take 10-40 months and company/country X thinks it's 5-80.
  • Once superintelligence is in sight like this, both company/country X and Y become very scared of the other getting it first -- in the country case, they are worried it will undermine nuclear deterrence, upend their political system, basically lead to getting taken over by the other. The relevant decisionmakers think this outcome is better than extinction, but maybe not by that much, whereas getting superintelligence before the other side is way better. In the company case, it's a lot less intense, but they still would much rather get superintelligence than their arch-rival CEO.
  • So, X thinks they have anywhere from 5-80 months before Y has superintelligence, and Y thinks they have 1-16 months. So X and Y both think it's easily possible, well within their 80% CI, that Y beats X.
  • X and Y have no reliable means of verifying a commitment like "we will spend half our compute on safety testing and alignment research."
  • If these weights were not available, Y would have a similarly good system in 18 months, 80% CI 12-24.

So, had the weights not been available to Y, X would be confident that it had 12 + 5 months to manage a capabilities explosion that would have happened in 8 months at full speed; it can spend >half of its compute on alignment/safety/etc, and it has 17 rather than 5 months of serial time to negotiate with Y, possibly develop some verification methods and credible mechanisms for benefit/power-sharing, etc. If various transparency reforms have been implemented, such that the world is notified in ~real-time that this is happening, there would be enormous pressure to do so; I hope and think it will seem super illegitimate to pursue this kind of power without these kinds of commitments. I am much more worried about X not doing this and instead just trying to grab enormous amounts of power if they're doing it all in secret.
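
For concreteness, here is the lower-bound arithmetic of this hypothetical as a minimal sketch; every number is one of the illustrative assumptions above (the 4-month steps, the 5x compute gap, the 5-80 month CI, the 12-24 month reproduction estimate), not a real estimate:

```python
# Minimal sketch of the timeline arithmetic in the hypothetical scenario above.
# Every number is one of the scenario's illustrative assumptions, not a real estimate.

x_step_months = 4                                 # X's full-speed time per step (GPT-7 -> GPT-8 -> GPT-9)
x_time_to_superintelligence = 2 * x_step_months   # 8 months at full speed

# X's 80% CI (months) for Y reaching superintelligence starting from the stolen Agent-GPT-7 weights.
x_ci_y_with_weights = (5, 80)

# Without the weights, Y first needs ~12-24 months (80% CI) to reproduce a GPT-7-level system.
y_reproduce_gpt7_low = 12

margin_if_weights_stolen = x_ci_y_with_weights[0]                         # 5 months of assured lead
margin_if_weights_secure = y_reproduce_gpt7_low + x_ci_y_with_weights[0]  # 12 + 5 = 17 months

print(margin_if_weights_stolen, margin_if_weights_secure)  # -> 5 17
```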

[Also: I just accidentally went back a page by command-open bracket in an attempt to get my text out of bullet format and briefly thought I lost this comment; thank you in your LW dev capacity for autosave draft text, but also it is weirdly hard to get out of bullets]

Comment by tlevin (trevor) on tlevin's Shortform · 2024-08-29T21:43:21.063Z · LW · GW

[reposting from Twitter, lightly edited/reformatted] Sometimes I think the whole policy framework for reducing catastrophic risks from AI boils down to two core requirements -- transparency and security -- for models capable of dramatically accelerating R&D.

If you have a model that could lead to general capabilities much stronger than human-level within, say, 12 months, by significantly improving subsequent training runs, the public and scientific community have a right to know this exists and to see at least a redacted safety case; and external researchers need to have some degree of red-teaming access. Probably various other forms of transparency would be useful too. It feels like this is a category of ask that should unite the "safety," "ethics," and "accelerationist" communities?

And the flip side is that it's very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don't wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.

These have some tradeoffs, especially as you approach AGI -- e.g. if you develop a system that can do 99% of foundation model training tasks and your security is terrible, you do have some good reasons not to immediately announce it -- but not if we make progress on either of these before then, IMO. What the Pareto Frontier of transparency and security looks like, and where we should land on that curve, seems like a very important research agenda.

If you're interested in moving the ball forward on either of these, my colleagues and I would love to see your proposal and might fund you to work on it!

Comment by tlevin (trevor) on tlevin's Shortform · 2024-07-31T07:10:03.915Z · LW · GW

Seems cheap to get the info value, especially for quieter music? Can be expensive to set up a multi-room sound system, but it's probably most valuable in the room that is largest/most prone to large group formation, so maybe worth experimenting with a speaker playing some instrumental jazz or something. I do think the architecture does a fair bit of work already.

Comment by tlevin (trevor) on tlevin's Shortform · 2024-07-31T01:31:32.289Z · LW · GW

I'm confident enough in this take to write it as a PSA: playing music at medium-size-or-larger gatherings is a Chesterton's Fence situation.

It serves the very important function of reducing average conversation size: the louder the music, the more groups naturally split into smaller groups, as people on the far end develop a (usually unconscious) common knowledge that it's too much effort to keep participating in the big one and they can start a new conversation without being unduly disruptive. 

If you've ever been at a party with no music where people gravitate towards a single (or handful of) group of 8+ people, you've experienced the failure mode that this solves: usually these conversations are then actually conversations of 2-3 people with 5-6 observers, which is usually unpleasant for the observers and does not facilitate close interactions that easily lead to getting to know people. 

By making it hard to have bigger conversations, the music naturally produces smaller ones; you can modulate the volume to have the desired effect on a typical discussion size. Quiet music (e.g. at many dinner parties) makes it hard to have conversations bigger than ~4-5, which is already a big improvement. Medium-volume music (think many bars) facilitates easy conversations of 2-3. The extreme end of this is dance clubs, where very loud music (not coincidentally!) makes it impossible to maintain conversations bigger than 2. 

I suspect that high-decoupler hosts are just not in the habit of thinking "it's a party, therefore I should put music on," or even actively think "music makes it harder to talk and hear each other, and after all isn't that the point of a party?" But it's a very well-established cultural practice to play music at large gatherings, so, per Chesterton's Fence, you need to understand what function it plays. The function it plays is to stop the party-destroying phenomenon of big group conversations.

Comment by tlevin (trevor) on How the AI safety technical landscape has changed in the last year, according to some practitioners · 2024-07-28T23:24:09.512Z · LW · GW

I agree that that's the most important change, and that there's reason to think people in Constellation/the Bay Area in general might systematically under-attend to policy developments. But I think the most likely explanation for the responses concentrating on other things is that I explicitly asked about technical developments that I missed because I wasn't in the Bay, and the respondents generally have the additional context that I work in policy and live in DC, so responses that centered policy change would have been off-target.

Comment by tlevin (trevor) on William_S's Shortform · 2024-05-17T18:16:33.324Z · LW · GW

Kelsey Piper now reports: "I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it."

Comment by tlevin (trevor) on tlevin's Shortform · 2024-05-02T01:29:02.053Z · LW · GW

Quick reactions:

  1. Re: over-emphasis on "how radical is my ask" vs "what my target audience might find helpful," and generally the importance of making your case well regardless of how radical it is: that makes sense. Though notably, the more radical your proposal is (or the more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
  2. Re: more effective actors looking for small wins, I agree that it's not clear, but yeah, seems like we are likely to get into some reference class tennis here. "A lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate"? Maybe, but I think of like, the agriculture lobby, who just sort of quietly make friends with everybody and keep getting 11-figure subsidies every year, in a way that (I think) resulted more from gradual ratcheting than making a huge ask. "Pretty much no group– whether radical or centrist– has had tangible wins" seems wrong in light of the EU AI Act (where I think both a "radical" FLI and a bunch of non-radical orgs were probably important) and the US executive order (I'm not sure which strategy is best credited there, but I think most people would have counted the policies contained within it as "minor asks" relative to licensing, pausing, etc). But yeah I agree that there are groups along the whole spectrum that probably deserve credit.
  3. Re: poisoning the well, again, radical-ness and being dumb/uninformed are of course separable but the bar rises the more radical you get, in part because more radical policy asks strongly correlate with more complicated procedural asks; tweaking ECRA is both non-radical and procedurally simple, creating a new agency to license training runs is both outside the DC Overton Window and very procedurally complicated.
  4. Re: incentives, I agree that this is a good thing to track, but like, "people who oppose X are incentivized to downplay the reasons to do X" is just a fully general counterargument. Unless you're talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a "radical" strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints, and cognitive biases against thinking your preferred strategy has big downsides.
  5. I agree that the CAIS statement, Hinton leaving Google, and Bengio and Hogarth's writing have been great. I think that these are all in a highly distinct category from proposing specific actors take specific radical actions (unless I'm misremembering the Hogarth piece). Yudkowsky's TIME article, on the other hand, definitely counts as an Overton Window move, and I'm surprised that you think this has had net positive effects. I regularly hear "bombing datacenters" as an example of a clearly extreme policy idea, sometimes in a context that sounds like it maybe made the less-radical idea seem more reasonable, but sometimes as evidence that the "doomers" want to do crazy things and we shouldn't listen to them, and often as evidence that they are at least socially clumsy, don't understand how politics works, etc, which is related to the things you list as the stuff that actually poisons the well. (I'm confused about the sign of the FLI letter as we've discussed.)
  6. I'm not sure optimism vs pessimism is a crux, except in very short, like, 3-year timelines. It's true that optimists are more likely to value small wins, so I guess narrowly I agree that a ratchet strategy looks strictly better for optimists, but if you think big radical changes are needed, the question remains of whether you're more likely to get there via asking for the radical change now or looking for smaller wins to build on over time. If there simply isn't time to build on these wins, then yes, better to take a 2% shot at the policy that you actually think will work; but even in 5-year timelines I think you're better positioned to get what you ultimately want by 2029 if you get a little bit of what you want in 2024 and 2026 (ideally while other groups also make clear cases for the threat models and develop the policy asks, etc.). Another piece this overlooks is the information and infrastructure built by the minor policy changes. A big part of the argument for the reporting requirements in the EO was that there is now going to be an office in the US government that is in the business of collecting critical information about frontier AI models and figuring out how to synthesize it to the rest of government, that has the legal authority to do this, and both the office and the legal authority can now be expanded rather than created, and there will now be lots of individuals who are experienced in dealing with this information in the government context, and it will seem natural that the government should know this information. I think if we had only been developing and advocating for ideal policy, this would not have happened (though I imagine that this is not in fact what you're suggesting the community do!).
Comment by tlevin (trevor) on tlevin's Shortform · 2024-04-30T21:33:00.502Z · LW · GW

I think some of the AI safety policy community has over-indexed on the visual model of the "Overton Window" and under-indexed on alternatives like the "ratchet effect," "poisoning the well," "clown attacks," and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).

I'm not familiar with a lot of systematic empirical evidence on either side, but it seems to me like the more effective actors in the DC establishment overall are much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some versions of "Overton Window-moving" strategies executed in practice have larger negative effects via associating their "side" with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who strongly lean on signals of credibility and consensus when quickly evaluating policy options, than the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.

In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea "outside the window" and this actually makes the window narrower. But I think the visual imagery of "windows" actually struggles to accommodate this -- when was the last time you tried to open a window and accidentally closed it instead? -- and as a result, people who rely on this model are more likely to underrate these kinds of consequences.

Would be interested in empirical evidence on this question (ideally actual studies from psych, political science, sociology, econ, etc literatures, rather than specific case studies due to reference class tennis type issues).

Comment by tlevin (trevor) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-26T22:45:46.898Z · LW · GW

The "highly concentrated elite" issue seems like it makes it more, rather than less, surprising and noteworthy that a lack of structural checks and balances has resulted in a highly stable and (relatively) individual-rights-respecting set of policy outcomes. That is, it seems like there would thus be an especially strong case for various non-elite groups to have explicit veto power.

Comment by tlevin (trevor) on On green · 2024-03-23T00:29:04.124Z · LW · GW

One other thought on Green in rationality: you mention the yin of scout mindset in the Deep Atheism post, and scout mindset and indeed correct Bayesianism involves a Green passivity and maybe the "respect for the Other" described here. While Blue is agnostic, in theory, between yin and yang -- whichever gives me more knowledge! -- Blue as evoked in Duncan's post and as I commonly think of it tends to lean yang: "truth-seeking," "diving down research rabbit holes," "running experiments," etc. A common failure mode of Blue-according-to-Blue is a yang that projects the observer into the observations: seeing new evidence as tools, arguments as soldiers. Green reminds Blue to chill: see the Other as it is, recite the litanies of Gendlin and Tarski, combine the seeking of truth with receptivity to what you find.

Comment by tlevin (trevor) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-18T20:59:05.723Z · LW · GW

I think this post aims at an important and true thing and misses in a subtle and interesting but important way.

Namely: I don't think the important thing is that one faction gets a veto. I think it's that you just need limitations on what the government can do that ensure that it isn't too exploitative/extractive. One way of creating these kinds of limitations is creating lots of veto points and coming up with various ways to make sure that different factions hold the different veto points. But, as other commenters have noted, the UK government does not have structural checks and balances. In my understanding, what they have instead is a bizarrely, miraculously strong respect for precedent and consensus about what "is constitutional" despite (or maybe because of?) the lack of a written constitution. For the UK, and maybe other, less-established democracies (i.e. all of them), I'm tempted to attribute this to the "repeated game" nature of politics: when your democracy has been around long enough, you come to expect that you and the other faction will share power (roughly at 50-50 for median voter theorem reasons), so voices within your own faction start saying "well, hold on, we actually do want to keep the norms around."

Also, re: the electoral college, can you say more about how this creates de facto vetoes? The electoral college does not create checks and balances; you can win in the electoral college without appealing to all the big factions (indeed, see Abraham Lincoln's 1860 win), and the electoral college puts no restraints on the behavior of the president afterward. It just noisily empowers states that happen to have factional mixes close to the national average, and indeed can create paths to victory that route through doubling down on support within your own faction while alienating those outside it (e.g. Trump's 2016 and 2020 coalitions).

Comment by tlevin (trevor) on EU policymakers reach an agreement on the AI Act · 2024-01-08T23:17:30.165Z · LW · GW

(An extra-heavy “personal capacity” disclaimer for the following opinions.) Yeah, I hear you that OP doesn’t have as much public writing about our thinking here as would be ideal for this purpose, though I think the increasingly adversarial environment we’re finding ourselves in limits how transparent we can be without undermining our partners’ work (as we’ve written about previously).

The set of comms/advocacy efforts that I’m personally excited about is definitely larger than the set of comms/advocacy efforts that I think OP should fund, since 1) that’s a higher bar, and 2) sometimes OP isn’t the right funder for a specific project. That being said:

  • So far, OP has funded AI policy advocacy efforts by the Institute for Progress and Sam Hammond. I personally don’t have a very detailed sense of how these efforts have been going, but the theory of impact for these was that both grantees have strong track records in communicating policy ideas to key audiences and a solid understanding of the technical and governance problems that policy needs to solve.
  • I’m excited about the EU efforts of FLI and The Future Society. In the EU context, it seems like these orgs were complementary, where FLI was willing to take steps (including the pause letter) that sparked public conversation and gave policymakers context that made TFS’s policy conversations more productive (despite raising some controversy). I have much less context on their US work, but from what I know, I respect the policymaker outreach and convening work that they do and think they are net-positive.
  • I think CAIP is doing good work so far, though they have less of a track record. I value their thinking about the effectiveness of different policy options, and they seem to be learning and improving quickly.
  • I don’t know as much about Andrea and Control AI, but my main current takes about them are that their anti-RSP advocacy should have been heavier on “RSPs are insufficient,” which I agree with, instead of “RSPs are counterproductive safety-washing,” which I think could have disincentivized companies from the very positive move of developing an RSP (as you and I discussed privately a while ago). MAGIC is an interesting and important proposal and worth further developing (though as with many clever acronyms I kind of wish it had been named differently).
  • I’m not sure what to think about Holly’s work and PauseAI. I think the open source protest where they gave out copies of a GovAI paper to Meta employees seemed good – that seems like the kind of thing that could start really productive thinking within Meta. Broadly building awareness of AI’s catastrophic potential seems really good, largely for the reasons Holly describes here. Specifically calling for a pause is complicated, both in terms of the goodness of the types of policies that could be called a pause and in terms of the politics (i.e., the public seems pretty on board, but it might backfire specifically with the experts that policymakers will likely defer to, but also it might inspire productive discussion around narrower regulatory proposals?). I think this cluster of activists can sometimes overstate or simplify their claims, which I worry about.

Some broader thoughts about what kinds of advocacy would be useful or not useful:

  • The most important thing, imo, is that whatever advocacy you do, you do it well. This sounds obvious, but importantly differs from “find the most important/neglected/tractable kind of advocacy, and then do that as well as you personally can do it.” For example, I’d be really excited about people who have spent a long time in Congress-type DC world doing advocacy that looks like meeting with staffers; I’d be excited about people who might be really good at writing trying to start a successful blog and social media presence; I’d be excited about people with a strong track record in animal advocacy campaigns applying similar techniques to AI policy. Basically I think comparative advantage is really important, especially in cases where the risk of backfire/poisoning the well is high.
  • In all of these cases, I think it’s very important to make sure your claims are not just literally accurate but also don’t have misleading implications and are clear about your level of confidence and the strength of the evidence. I’m very, very nervous about getting short-term victories by making bad arguments. Even Congress, not known for its epistemic and scientific rigor, has gotten concerned that AI safety arguments aren’t as rigorous as they need to be (even though I take issue with most of the specific examples they provide).
  • Relatedly, I think some of the most useful “advocacy” looks a lot like research: if an idea is currently only legible to people who live and breathe AI alignment, writing it up in a clear and rigorous way, such that academics, policymakers, and the public can interact with it, critique it, and/or become advocates for it themselves is very valuable.
  • This is obviously not a novel take, but I think other things equal advocacy should try not to make enemies. It’s really valuable that the issue remain somewhat bipartisan and that we avoid further alienating the AI fairness and bias communities and the mainstream ML community. Unfortunately “other things equal” won’t always hold, and sometimes these come with steep tradeoffs, but I’d be excited about efforts to build these bridges, especially by people who come from/have spent lots of time in the community to which they’re bridge-building.
Comment by tlevin (trevor) on AI Risk and the US Presidential Candidates · 2024-01-08T19:29:53.231Z · LW · GW

Just being "on board with AGI worry" is so far from sufficient to taking useful actions to reduce the risk that I think epistemics and judgment is more important, especially since we're likely to get lots of evidence (one way or another) about the timelines and risks posed by AI during the term of the next president.

Comment by tlevin (trevor) on AI Risk and the US Presidential Candidates · 2024-01-08T19:26:49.858Z · LW · GW

He has also broadly indicated that he would be hostile to the nonpartisan federal bureaucracy, e.g. by designating way more of them as presidential appointees, allowing him personally to fire and replace them. I think creating new offices that are effectively set up to regulate AI looks much more challenging in a Trump (and to some extent DeSantis) presidency than the other candidates.

Comment by tlevin (trevor) on EU policymakers reach an agreement on the AI Act · 2023-12-20T23:00:31.176Z · LW · GW

Thanks for these thoughts! I agree that advocacy and communications is an important part of the story here, and I'm glad for you to have added some detail on that with your comment. I’m also sympathetic to the claim that serious thought about “ambitious comms/advocacy” is especially neglected within the community, though I think it’s far from clear that the effort that went into the policy research that identified these solutions or work on the ground in Brussels should have been shifted at the margin to the kinds of public communications you mention.

I also think Open Phil’s strategy is pretty bullish on supporting comms and advocacy work, but it has taken us a while to acquire the staff capacity to gain context on those opportunities and begin funding them, and perhaps there are specific opportunities that you're more excited about than we are. 

For what it’s worth, I didn’t seek significant outside input while writing this post and think that's fine (given the alternative of writing it quickly, posting it here, disclaiming my non-expertise, and getting additional perspectives and context from commenters like yourself). However, I have spoken with about a dozen people working on AI policy in Europe over the last couple months (including one of the people whose public comms efforts are linked in your comment) and would love to chat with more people with experience doing policy/politics/comms work in the EU.

We could definitely use more help thinking about this stuff, and I encourage readers who are interested in contributing to OP’s thinking on advocacy and comms to do any of the following:

  • Write up these critiques (we do read the forums!); 
  • Join our team (our latest hiring round specifically mentioned US policy advocacy as a specialization we'd be excited about, but people with advocacy/politics/comms backgrounds more generally could also be very useful, and while the round is now closed, we may still review general applications); and/or 
  • Introduce yourself via the form mentioned in this post.
Comment by tlevin (trevor) on EU policymakers reach an agreement on the AI Act · 2023-12-17T19:06:29.394Z · LW · GW

Thank you! Classic American mistake on my part to round these institutions to their closest US analogies.

Comment by tlevin (trevor) on What I Would Do If I Were Working On AI Governance · 2023-12-09T18:31:51.436Z · LW · GW

I broadly share your prioritization of public policy over lab policy, but as I've learned more about liability, the more it seems like one or a few labs having solid RSPs/evals commitments/infosec practices/etc would significantly shift how courts make judgments about how much of this kind of work a "reasonable person" would do to mitigate the foreseeable risks. Legal and policy teams in labs will anticipate this and thus really push for compliance with whatever the perceived industry best practice is. (Getting good liability rulings or legislation would multiply this effect.)

Comment by tlevin (trevor) on Weighing Animal Worth · 2023-09-29T01:01:50.773Z · LW · GW

"We should be devoting almost all of global production..." and "we must help them increase" are only the case if:

  1. There are no other species whose product of [moral weight] * [population] is higher than bees, and
  2. Our actions only have moral relevance for beings that are currently alive.

(And, you know, total utilitarianism and such.)

Comment by tlevin (trevor) on Commonsense Good, Creative Good · 2023-09-29T00:44:13.591Z · LW · GW

Just want to plug Josh Greene's great book Moral Tribes here (disclosure: he's my former boss). Moral Tribes basically makes the same argument in different/more words: we evolved moral instincts that usually serve us pretty well, and the tricky part is realizing when we're in a situation that requires us to pull out the heavy-duty philosophical machinery.

Comment by tlevin (trevor) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T18:55:16.760Z · LW · GW

I think the main thing stopping the accelerationists and open source enthusiasts from protesting with 10x as many people is that, whether for good reasons or not, there is much more opposition to AI progress and proliferation than support among the general public. (Admittedly this is probably less true in the Bay Area, but I would be surprised if it was even close to parity there and very surprised if it were 10x.)

Comment by tlevin (trevor) on I don't want to talk about AI · 2023-05-24T17:09:57.850Z · LW · GW

My response to both paragraphs is that the relevant counterfactual is "not looking into/talking about AI risks." I claim that there is at least as much social pressure from the community to take AI risk seriously and to talk about it as there is to reach a pessimistic conclusion, and that people are very unlikely to lose "all their current friends" by arriving at an "incorrect" conclusion if their current friends are already fine with the person not having any view at all on AI risks.

Comment by tlevin (trevor) on I don't want to talk about AI · 2023-05-22T22:12:51.610Z · LW · GW

I think it's admirable to say things like "I don't want to [do the thing that this community holds as near-gospel as a good thing to do.]" I also think the community should take it seriously that anyone feels like they're punished for being intellectually honest, and in general I'm sad that it seems like your interactions with EAs/rats about AI have been unpleasant.

That said...I do want to push back on basically everything in this post and encourage you and others in this position to spend some time seeing if you agree or disagree with the AI stuff.

  • Assuming that you think you'd look into it in a reasonable way, then you'd be much more likely to reach a doomy conclusion if it were actually true. If it were true, it would be very much in your interest — altruistically and personally — to believe it. In general, it's just pretty useful to have more information about things that could completely transform your life. If you might have a terminal illness, doesn't it make sense to find out soon so you can act appropriately even if it's totally untreatable?
  • I also think there are many things for non-technical people to do on AI risk! For example, you could start trying to work on the problem, or if you think it's just totally hopeless w/r/t your own work, you could work less hard and save less for retirement so you can spend more time and money on things you value now. 

For the "what if I decide it's not a big deal" conclusion:

  • For points #1 through #3, I'm basically just surprised that you don't already experience this with the take "I don't want to learn about or talk about AI" such that it would get worse if your take was "I have a considered view that AI x-risk is low"! To be honest and a little blunt, I do judge people a bit when they have bad reasoning either for high or low levels of x-risk, but I'm pretty sure I judge them a lot more positively when they've made a good-faith effort at figuring it out.
  • For points #3 and #4, idk, Holden, Joe Carlsmith, Rob Long, and possibly I (among others) are all people who have (hopefully) contributed something valuable to the fight against AI risk with social science or humanities backgrounds, so I don't think this means you wouldn't be persuasive, and it seems incredibly valuable for the community if more people think things through and come to this opinion. The consensus that AI safety is a huge deal currently means we have hundreds of millions of dollars, hundreds of people (many of whom are anxious and/or depressed because of this consensus), and dozens of orgs focused on it. Imagine if this is wrong — we'd be inflicting so much damage!
Comment by tlevin (trevor) on Many AI governance proposals have a tradeoff between usefulness and feasibility · 2023-02-12T21:51:25.644Z · LW · GW

It seems to me like government-enforced standards are just another case of this tradeoff - they are quite a bit more useful, in the sense of carrying the force of law and applying to all players on a non-voluntary basis, and harder to implement, due to the attention of legislators being elsewhere, the likelihood that a good proposal gets turned into something bad during the legislative process, and the opportunity cost of the political capital.

Comment by tlevin (trevor) on Staring into the abyss as a core life skill · 2023-01-09T01:08:22.525Z · LW · GW

This post has already helped me admit that I needed to accept defeat and let go of a large project in a way that I think might lead to its salvaging by others - thanks for writing.

Comment by tlevin (trevor) on College Selection Advice for Technical Alignment · 2022-12-18T01:54:26.284Z · LW · GW

First, congratulations - what a relief to get in (and pleasant update on how other selective processes will go, including the rest of college admissions)!

I lead HAIST and MAIA's governance/strategy programming and co-founded CBAI, which is both a source of conflict of interest and insider knowledge, and my take is that you should almost certainly apply to MIT. MIT is a much denser pool of technical talent, but MAIA is currently smaller and less well-organized than HAIST. Just by being an enthusiastic participant, you could help make it a more robust group, and if you're at all inclined to help organize (which I think would be massively valuable), you could solve an important bottleneck in making MAIA an awesome source of excellent alignment researchers. (If this is the case, would love to chat.) You'd also be in the HAIST/MAIA social community either way, but I think you'd have more of a multiplier effect by engaging on the MAIA side.

As other commenters have noted, I think there are a few reasons to prefer MIT for your own alignment research trajectory, like a significantly stronger CS department (you can cross-register, but save yourself the commute!), a slightly nerdier and more truth-seeking culture, and better signaling value. (To varying degrees including negative values, these are probably also true for Caltech, Mudd, Olin, and Stanford, per John Wentworth's comment, but I'm more familiar with MIT.)

I also think it will just not take that long to do one more application, since you have another couple weeks to do it anyway. I would prioritize getting one last app to MIT over the line, and if you find you still have energy consider doing the same to Caltech, Stanford, maybe others, idk. Not the end of the world to end up at Harvard by any means, but I do think it would be good for both you and humanity if you wound up at MIT!

Comment by tlevin (trevor) on Consider working more hours and taking more stimulants · 2022-12-18T01:13:28.365Z · LW · GW

I don't think this is the right axis on which to evaluate posts. Posts that suggest donating more of your money to charities that save the most lives, causing less animal suffering via your purchases, and considering that AGI might soon end humanity are also "harmful to an average reader" in a similar sense: they inspire some guilt, discomfort, and uncertainty, possibly leading to changes that could easily reduce the reader's own hedonic welfare.

However -- hopefully, at least -- the "average reader" on LW/EAF is trying to believe true things and achieve goals like improving the world, and presenting them arguments that they can evaluate for themselves and might help them unlock more of their own potential seems good.

I also think the post is unlikely to be net-negative given the caveats about trying this as an experiment, the different effects on different kinds of work, etc.

Comment by tlevin (trevor) on Probably good projects for the AI safety ecosystem · 2022-12-05T23:20:22.485Z · LW · GW

Quick note on 2: CBAI is pretty concerned about our winter ML bootcamp attracting bad-faith applicants, and we plan to use a combo of AGISF and references to filter pretty aggressively for alignment interest. Somewhat problematic in the medium term if people find out they can get free ML upskilling by successfully feigning interest in alignment, though...

Comment by tlevin (trevor) on Book Review: The Righteous Mind · 2022-07-07T21:34:32.376Z · LW · GW

Great write-up. Righteous Mind was the first in a series of books that really usefully transformed how I think about moral cognition (including Hidden Games, Moral Tribes, Secret of Our Success, Elephant in the Brain). I think its moral philosophy, however, is pretty bad. In a mostly positive (and less thorough) review I wrote a few years ago (that I don't 100% endorse today), I write:

Though Haidt explicitly tries to avoid the naturalistic fallacy, one of the book’s most serious problems is its tendency to assume that people finding something disgusting implies that the thing is immoral (124, 171-4). Similarly, it implies that because most people are less systematizing than Bentham and Kant, the moral systems of those thinkers must not be plausible (139, 141). [Note from me in 2022: In fact, Haidt bizarrely argues that Bentham and Kant were likely autistic and therefore these theories couldn't be right for a mostly neurotypical world.] Yes, moral feelings might have evolved as a group adaptation to promote “parochial altruism,” but that does not mean we shouldn’t strive to live a universalist morality; it just means it’s harder. Thomas Nagel, in the New York Review of Books, writes that “part of the interest of [The Righteous Mind] is in its failure to provide a fully coherent response” to the question of how descriptive morality theories could translate into normative recommendations.

I became even more convinced that this instinct towards relativism is a big problem for The Righteous Mind since reading Joshua Greene's excellent Moral Tribes, which covers much of the same ground. But Greene shows that this is not just an aversion to moral truth; it stems from Haidt's undue pessimism about the role of reason.

Moral Tribes argues that our moral intuitions evolved to solve the Tragedy of the Commons, but the contemporary world faces the "Tragedy of Commonsense Morality," where lots of tribes with different systems for solving collective-action problems have to get along. Greene dedicates much of the section "Why I'm a Liberal" to his disagreements with Haidt. After noting his agreements — morality evolved to promote cooperation, is mostly implemented through emotions, different groups have different moral intuitions, a source of lots of conflict, and we should be less hypocritical and self-righteous in our denunciations of other tribes' views — Greene says:

These are important lessons. But, unfortunately, they only get us so far. Being more open-minded and less self-righteous should facilitate moral problem-solving, but it's not itself a solution[....]

Consider once more the problem of abortion. Some liberals say that pro-lifers are misogynists who want to control women's bodies. And some social conservatives believe that pro-choicers are irresponsible moral nihilists who lack respect for human life, who are part of a "culture of death." For such strident tribal moralists—and they are all too common—Haidt's prescription is right on time. But what then? Suppose you're a liberal, but a grown-up liberal. You understand that pro-lifers are motivated by genuine moral concern, that they are neither evil nor crazy. Should you now, in the spirit of compromise, agree to additional restrictions on abortion? [...]

It's one thing to acknowledge that one's opponents are not evil. It's another thing to concede that they're right, or half right, or no less justified in their beliefs and values than you are in yours. Agreeing to be less self-righteous is an important first step, but it doesn't answer the all-important questions: What should we believe? and What should we do?

Greene goes on to explain that Haidt thinks liberals and conservatives disagree because liberals have the "impoverished taste receptors" of only caring about harm and fairness, while conservatives have the "whole palette." But, Greene argues, the other tastes require parochial tribalism: you have to be loyal to something, sanctify something, respect an authority, that you probably don't share with the rest of the world. This makes social conservatives great at solving Tragedies of the Commons, but very bad at the Tragedy of Commonsense Morality, where lots of people worshipping different things and respecting different authorities and loyal to different tribes have to get along with each other.

According to Haidt, liberals should be more open to compromise with social conservatives. I disagree. In the short term, compromise might be necessary, but in the long term, our strategy should not be to compromise with tribal moralists, but rather to persuade them to be less tribalistic.

I'm not a social conservative because I do not think that tribalism, which is essentially selfishness at the group level, serves the greater good. [...]

This is not to say that liberals have nothing to learn from social conservatives. As Haidt points out, social conservatives are very good at making each other happy. [...] As a liberal, I can admire the social capital invested in a local church and wish that we liberals had equally dense and supportive social networks. But it's quite another thing to acquiesce to that church's teaching on abortion, homosexuality, and how the world got made.

Greene notes that even Haidt finds "no compelling alternative to utilitarianism" in matters of public policy after deriding it earlier. "It seems that the autistic philosopher [Bentham] was right all along," Greene observes. Greene explains Haidt's "paradoxical" endorsement of utilitarianism as an admission that conscious moral reasoning — like a camera's "manual mode" instead of the intuitive "point-and-shoot" morality — isn't so underrated after all. If we want to know the right thing to do, we can't just assume that all of the moral foundations have a grain of truth, figure we're equally tribalistic, and compromise with the conservatives; we need to turn to reason.

While Haidt is of course right that sound moral arguments often fail to sway listeners, "like the wind and the rain, washing over the land year after year, a good argument can change the shape of things. It begins with a willingness to question one's tribal beliefs. And here, being a little autistic might help." He then cites Bentham criticizing sodomy laws in 1785 and Mill advocating gender equality in 1869. And then he concludes: "Today we, some of us, defend the rights of gays and women with great conviction. But before we could do it with feeling, before our feelings felt like 'rights,' someone had to do it with thinking. I'm a deep pragmatist [Greene's preferred term for utilitarians], and a liberal, because I believe in this kind of progress and that our work is not yet done."