We Should Prepare for a Larger Representation of Academia in AI Safety

post by Leon Lang (leon-lang) · 2023-08-13T18:03:19.799Z · LW · GW · 13 comments

Contents

  Why I think academia's share in AI safety will increase 
  Weak evidence that this is already happening
  What might one do to prepare?
  Uncertainties

Epistemic Status: I had the idea for the post a few days ago and quickly wrote it down while on a train. I'm very curious about other perspectives.

TL;DR: The recent increased public interest in AI Safety will likely lead to more funding for and more researchers from academia. I expect this increase to be larger than that of non-academic AI Safety work. We should prepare for that by thinking about how we "onboard" new researchers and how to allocate resources (time and money) at the margin in the future. 

Why I think academia's share in AI safety will increase 

With the recent public interest in AI (existential) safety, many people will think about how they can help. Among people who think "I might want to do research on AI Safety", most will come from academia, because that's where most research happens. Among people who think "I should fund AI Safety research", most will fund academic-style research, because that's where most research talent sits and because it's the "normal" thing to do. I expect this increase to be larger than the corresponding increase for AI Safety researchers in companies (though I'm less certain of this), in AI Safety orgs, or among independent researchers of, e.g., the "LessWrong / Alignment Forum" style. 

Weak evidence that this is already happening

At the University of Amsterdam, where I'm a PhD student, there has been increased interest in AI Safety recently. In particular, one faculty member has actively started thinking about AI existential safety and wants to design a course that will include scalable oversight, and four other faculty members are at least starting to get informed about AI existential safety with an "open mind".

What might one do to prepare?

Needless to say, I haven't thought about this a lot, so take the following with a grain of salt and add your own ideas.

Uncertainties

I find it plausible that the representation of AI Safety researchers in companies like OpenAI and DeepMind will also grow very fast, though I think the increase will be smaller than in academia. 

13 comments

Comments sorted by top scores.

comment by zoop · 2023-08-14T15:04:08.709Z · LW(p) · GW(p)

Here be cynical opinions with little data to back them.

It's important to point out that "AI Safety" in an academic context usually means something slightly different from typical LW fare. For starters, since most AI work descended from computer science, it's pretty hard [1] to get anything published in a serious AI venue (conference/journal) unless you 

  1. Demonstrate a thing works
  2. Use theory to explain a preexisting phenomenon

Both PhD students and their advisors want to publish in established venues, so by default one should expect academic AI Safety research to prioritize near-term concerns and be less focused on AGI/ex-risk. That isn't to say research can't accomplish both things at once, but it's worth noting.

Because AI Safety in the academic sense hasn't traditionally meant safety from AGI ruin, there is a long history of EA-aligned people not really being aware of or caring about safety research. Safety has been getting funding for a long time, but it looked less like MIRI and more like the University of York's safe autonomy lab [2] or the DARPA Assured Autonomy program [3]. With these dynamics in mind, I fully expect the majority of new AI safety funding to go to one of the following areas:

  • Aligning current-gen AI with the explicit intentions of its trainers in adversarial environments, e.g. making my chatbot not tell users how to make bombs when asked, or reducing the risk of my car hitting pedestrians.
  • Blurring the line between "responsible use" and "safety" (which is a sort of alignment problem), e.g. making my chatbot less xyz-ist, protecting training-data privacy, the ethics of AI use.
  • Old school hazard analysis and mitigation. This is like the hazard analysis a plane goes through before the FAA lets it fly, but now the planes have AI components. 

The thing that probably won't get funding is aligning a fully autonomous agent with the implicit interests of all humans (not just trainers), which generalizes to the ex-risk problem. Perhaps I lack imagination, but with the way things are, I can't really imagine how you'd get enough published in the usual venues about this to build a dissertation out of it. 

 

[1] Yeah, of course you can get it published, but I think most would agree that it's harder to get a pure-theory ex-risk paper published in a traditional CS/AI venue than other types of papers. Perhaps this will change as new tracks open up, but I'm not sure.

[2] https://www.york.ac.uk/safe-autonomy/research/assurance/

[3] https://www.darpa.mil/program/assured-autonomy 

Replies from: Linch
comment by Linch · 2023-08-14T16:36:59.664Z · LW(p) · GW(p)

I expect academia to have more appetite for AI safety work that looks like (adversarial) robustness, mechanistic interpretability, etc, than alignment qua alignment. From the outside, it doesn't seem very unlikely for academia to do projects similar to what Redwood Research does, for example. 

Though typical elite academics might also be distracted by shiny publishable projects rather than being as focused/dedicated on core problems as, e.g., Redwood. This is somewhat counterbalanced by academia's potentially higher quantity, and possibly quality, of high-end intellectual horsepower/rigor/training.

The thing that probably won't get funding is aligning a fully autonomous agent with the implicit interests of all humans (not just trainers), which generalizes to the ex-risk problem.

I think getting agents to robustly do what the trainers want would be a huge win. Instilling the right values conditional upon being able to instill any reasonable values seems like a) plausibly an easier problem, b) not entirely (or primarily?) technical, and c) a reasonable continuation of existing nontechnical work in AI governance, moral philosophy, political science, and well, having a society.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-08-14T17:05:56.529Z · LW(p) · GW(p)

I think getting agents to robustly do what the trainers want would be a huge win.

I want to mention that I sort of conjecture that this is the best result alignment can realistically get, at least without invoking mind-control/controlling of values directly, and that societal alignment is either impossible or trivial, depending on the constraints.

Replies from: Linch
comment by Linch · 2023-08-16T23:54:52.225Z · LW(p) · GW(p)

Hmm, again it depends on whether you're defining "alignment" narrowly (the technical problem of getting superhumanly powerful machines to robustly attempt to do what humans actually want) or more broadly (e.g. the whole scope of navigating the transition from sapiens controlling the world to superhumanly powerful machines controlling the world in a way that helps humans survive and flourish).

If the former, I disagree with you slightly; I think "human values" are possibly broad enough that some recognizably-human values are easier to align than others. Consider the caricature of an amoral businessman vs someone trying to do principled preference utilitarianism for all of humanity.

If the latter, I think I disagree very strongly. There are many incremental improvements short of mind-control that would make the loading of human values go more safely, e.g. having good information security, theoretical work in preference aggregation, increasing certain types of pluralism, basic safeguards in lab and corporate governance, trying to make the subjects of value-loading a larger set of people than a few lab heads and/or gov't leaders, advocacy for moral reflection and moral uncertainty, (assuming slowish takeoff) trying to make sure collective epistemics don't go haywire during the advent of ~human-level or slightly superhuman intelligences, etc.

comment by Nicholas / Heather Kross (NicholasKross) · 2023-08-15T02:48:03.230Z · LW(p) · GW(p)

I anticipate most of the academia-based AI safety research to be:

  1. Safety but not alignment.
  2. Near-term but not extinction-preventing.
  3. Flawed in a way that a LessWrong Sequences reader would quickly notice, but random academics (even at top places) might not.
  4. Accidentally speeding up capabilities.
  5. Heavy on tractability but light on neglectedness and importance.
  6. ...buuuuuut 0.05% of it could turn out to be absolutely critical for "hardcore" AI alignment anyway. This is simply due to the sheer size of academia giving it a higher absolute number of [people who will come up with good ideas] than the EA/rationalist/LessWrong-sphere.

Four recommendations I have (in addition to posting on arXiv!):

  1. It'd be interesting to have a prediction market / measurement on this sort of thing, somehow.
  2. If something's already pretty likely to get a grant within academia, EA should probably not fund it, modulo whatever considerations might override that on a case-by-case basis (e.g. promoting an EA grantor via proximity to a promising mainstream project... As you can imagine, I don't think such considerations move the needle in the overwhelming majority of cases.)
  3. Related to (2): Explicitly look to fund the "uncool"-by-mainstream-standards projects. This is partly due to neglectedness, and partly due to "worlds where iterative design fails [? · GW]" AI-risk-process logic.
  4. The community should investigate and/or set up better infrastructure to filter for the small number of crucial papers noted in my prediction (6).

comment by jacquesthibs (jacques-thibodeau) · 2023-08-14T01:31:15.401Z · LW(p) · GW(p)

Grantmakers should think about how to react to a potentially changing funding landscape, with many more "traditional" grantmakers funding research in academia and more talented academics being open to working on AI existential safety. This could also mean prioritizing work that is substantially different from what will be researched in academia.

I was thinking about this as well. Not sure how much effort is being put into this, but academics should maybe consider first trying to get funding from traditional sources for their alignment work if that means more funding overall. Getting funded by EA/Rat sources means there's less funding for independent researchers, who already have fewer avenues for funding. I'd love to know if this is something grantmakers would adjust on.

Perhaps there should even be someone working part- or full-time on getting alignment academics funded through traditional means rather than through EA/Rat grantmakers.

Of course, if traditional funders won't fund the kind of alignment work the academic wants to do, then it's obviously worth getting funding outside of that. And if EA/Rat sources allow for a productivity boost in ways trad funding doesn't cover, then it's also worth it.

comment by Joshua Clancy (joshua-clancy) · 2023-08-15T23:26:33.888Z · LW(p) · GW(p)

Any and all efforts should be welcome. That being said, I have my qualms with academic research in this field. 

  1. Perhaps the most important thing we need in AI safety is public attention, so as to gain the ability to regulate effectively. Academia is terrible at bringing public attention to complex issues. 
  2. We need big theoretical leaps. Academia tends to make iterative, measurable steps. In the past we saw imaginative figures like Einstein make big theoretical leaps and rise in academia. But I would argue that the combination of how academia works today (it is a dense rainforest of papers where few see the light) and this particular field of AI safety (measurable advancement requires an AGI to test on) is uniquely bad at raising theoretical leaps to the top.
  3. Academia is slooooooow.

I am scared we have moved from a world where we could sketch ideas out in pencil and gather feedback to a world where we are writing in permanent marker. We have to quickly and collaboratively model five moves ahead. In my mind there is a meta-problem of how we effectively mass-collaborate, and academia is currently failing to do this. 

comment by niplav · 2023-08-14T00:10:01.800Z · LW(p) · GW(p)

See also Leech 2020.

comment by Anonymous (currymj) · 2023-08-16T09:34:48.081Z · LW(p) · GW(p)

I think the biggest difference is that this will mean more people with a wider range of personality types, socially interacting in a more arms-length/professionalized way, according to the social norms of academia.

Especially in CS, you can be accepted among academics as a legitimate researcher even without a formal degree, but it would require being able and willing to follow these existing social norms.

And in order to welcome and integrate new AI safety researchers from academia, the existing AI safety scene would have to make some spaces to facilitate this style of interaction, rather than the existing informal/intense/low-social-distance style.

comment by the gears to ascension (lahwran) · 2023-08-15T11:13:49.097Z · LW(p) · GW(p)

Hmm. If academia picks stuff up, perhaps interhuman cosmopolitan alignment could use much more discussion on lesswrong, especially from interdisciplinary perspectives. In other words, if academia picks up short term alignment, long term alignment becomes a question of how to interleave differing value systems while ensuring actual practical mobility for socio emotionally disempowered folks. The thing social justice is trying to be but keeps failing at due to corruption turning attempts at analytical "useful wokeness" into crashed slogan thought; the thing libertarianism is trying to be but keeps failing at due to corruption turning defense of selves into decision theoretically unsound casual selfish reasoning; the thing financial markets keep trying to be but keep failing at due to corruption turning competitive reasoning systems into destructive centralization; etc. Qualitative insights from economics and evolutionary game theory and distributed systems programming and etc about how to see all of these issues as the same one as the utility function question in strongly supercapable superreasoner AI. Etc.

Just some rambling this brought to mind.

edited sep 13, here's what GPT4 wrote when asked to clarify wtf I'm talking about:

It seems like the paragraph discusses the potential of including more conversations about interdisciplinary cosmopolitan alignment in the community at lesswrong, particularly in light of developments in academia. You're recommending the community focus on synthesizing crucial insights from across various disciplines and cultural outlooks, with a view to advancing not just short-term but also long-term alignment efforts. The intent is towards a cosmopolitan harmony, where different value systems can coexist while providing tangible empowerment opportunities for people who are socio-emotionally disadvantaged.

The concept of 'alignment' you're discussing appears to be an application of social justice and economic principles aimed at driving change, though this is often undermined by corruption hindering promising initiatives. It corresponds to the 'useful wokeness', which unfortunately ends up being reduced to mere slogans and uncritical loyalty to ideological authority.

Another suggestion you indicated is found in the ideation behind libertarianism. This political ideology works towards autonomy and personal liberty but fails due to corruption that changes this ideal into impractical self-centeredness and hypocritical rule implementation. The common theme across these diverse though corruption-prone strategies is further amplified in your criticism of financial market dynamics, where the competitive nature of trading based reasoning systems often leads to harmful centralization. The discussed failure here appears to be more significant.

The utility function argument in artificial intelligence (AI) with high potential capabilities and reasoning abilities seem to be at the core of your argument. You suggest drawing qualitative insights from economics, evolutionary game theory, distributed system programming, and other disciplines for a more comprehensive understanding of these issues. By attempting to view these myriad challenges as interrelated, one might recast these into a single question about utility function—which if navigated effectively, could address the question of alignment not only in AIs but also across society at large.

Replies from: Herb Ingram
comment by Herb Ingram · 2023-08-15T23:27:47.651Z · LW(p) · GW(p)

No offense, this reads to me as if it was deliberately obfuscated or AI-generated (I'm sure you didn't do either of these, this is a comment on writing style). I don't understand what you're saying. Is it "LW should focus on topics that academia neglects"?

I also didn't understand at all what the part starting with "social justice" is meant to tell me or has to do with the topic.

comment by Gesild Muka (gesild-muka) · 2023-08-14T01:34:24.217Z · LW(p) · GW(p)

Perhaps standards and practices for who can and should teach AI safety (or new related fields) should be better defined.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-08-14T00:54:03.873Z · LW(p) · GW(p)

Good post. Agree this will happen.