Why I funded PIBBSS

post by Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · LW · GW · 8 comments

Contents

  Main points in favor of this grant
  Donor's main reservations
  Process for deciding amount
  Conflicts of interest
8 comments

I just left a comment on PIBBSS' Manifund grant request (which I funded with $25k) that people might find interesting. PIBBSS needs more funding!

Main points in favor of this grant

  1. My inside view is that PIBBSS mainly supports “blue sky” or “basic” research, some of which has a low chance of paying off, but might be critical in “worst case” alignment scenarios (e.g., where “alignment MVPs” don’t work, “sharp left turns” and “intelligence explosions” are more likely than I expect, or where we have more time before AGI than I expect). In contrast, of the technical research MATS supports, about half is basic research (e.g., interpretability, evals, agent foundations) and half is applied research (e.g., oversight + control, value alignment). I think the MATS portfolio is a better holistic strategy for furthering AI safety and reducing AI catastrophic risk. However, if one takes into account the research conducted at AI labs and supported by MATS, PIBBSS’ strategy makes a lot of sense: they are supporting a wide portfolio of blue sky research that is particularly neglected by existing institutions and might be very impactful in a range of possible “worst-case” AGI scenarios. I think this is a valid strategy in the current ecosystem/market and I support PIBBSS!
  2. In MATS’ recent post, “Talent Needs of Technical AI Safety Teams”, we detail an AI safety talent archetype we name “Connector”. Connectors bridge exploratory theory and empirical science, and sometimes instantiate new research paradigms. As we discussed in the post, finding and developing Connectors is hard, their development time is often on the order of years, and there is little demand on the AI safety job market for this role. However, Connectors can have an outsized impact on shaping the AI safety field, and the few who make it are “household names” in AI safety and usually build organizations, teams, or grant infrastructure around themselves. I think that MATS is far from the ideal training ground for Connectors (although some do pass through!): our program is only 10 weeks long (with an optional 4-month extension) rather than the ideal 12-24 months, we select scholars to fit established mentors’ preferences rather than on the basis of their original research ideas, and our curriculum and milestones generally focus on building object-level scientific/engineering skills rather than research ideation, interdisciplinary knowledge transfer, and “identifying gaps”. It’s thus no surprise that most MATS scholars are “Iterator” archetypes. I think there is substantial value in a program like PIBBSS existing to support the long-term development of “Connectors” and to pursue impact in a higher-variance way than MATS.
  3. PIBBSS seems to have a decent track record for recruiting experienced academics in non-CS fields and helping them repurpose their advanced research skills to develop novel approaches to AI safety. Highlights for me include Adam Shai’s “computational mechanics” approach to interpretability and model cognition, Martín Soto’s “logical updatelessness” approach to decision theory, and Gabriel Weil’s “tort law” approach to making AI labs liable for their potential harms on the long-term future.
  4. I don’t know Lucas Teixeira (Research Director) very well, but I know and respect Dušan D. Nešić (Operations Director) a lot. I also highly endorsed the former Research Director Nora Ammann’s vision (albeit while endorsing a different vision for MATS). I see PIBBSS as a competent and EA-aligned organization, and I would be excited to see them grow!
  5. I think PIBBSS would benefit from funding from diverse sources, as mainstream technical AI safety funders have pivoted more towards applied research (or more governance-relevant basic research like evals). I think Manifund regrantors are well-positioned to endorse more speculative basic research, but I don’t really know how to evaluate such research myself, so I’d rather defer to experts. PIBBSS seems well-positioned to provide this expertise! I know that Nora had quite deep models of this while Research Director, and in talking with Dušan, I have had a similar impression. I hope to talk with Lucas soon!

Donor's main reservations

  1. It seems that PIBBSS might be pivoting away from higher variance blue sky research to focus on more mainstream AI interpretability. While this might create more opportunities for funding, I think this would be a mistake. The AI safety ecosystem needs a home for “weird ideas” and PIBBSS seems the most reputable, competent, EA-aligned place for this! I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
  2. I haven’t examined PIBBSS’ applicant selection process and I’m not entirely confident it is the best version it can be, given how hard MATS has found mentor and applicant selection and my intuitions around the difficulty of choosing a blue sky research portfolio. I strongly encourage PIBBSS to publicly post and seek feedback on their applicant selection and research prioritization processes, so that the AI safety ecosystem can offer useful insight (and benefit from this). I would also be open to discussing these more with PIBBSS, though I expect this would be less useful.
  3. My donation is not very counterfactual here, given PIBBSS’ large budget and track record. However, there has been a trend in typical large AI safety funders away from agent foundations and interpretability, so I think my grant is still meaningful.

Process for deciding amount

I decided to donate the project’s minimum funding ($25k) so that other donors would have time to consider the project’s merits and potentially contribute. Given the large budget and track record of PIBBSS, I think my funds are less counterfactual here than for smaller, more speculative projects, so I only donated the minimum. I might donate significantly more to PIBBSS later if I can’t find better grants, or if PIBBSS is unsuccessful in fundraising.

Conflicts of interest

I don't believe there are any conflicts of interest to declare. 

8 comments


comment by Nora_Ammann · 2024-09-17T21:43:26.312Z

(To clarify: I co-founded PIBBSS and led it from 2021, but stepped down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing I say below should be taken as an authoritative source about what PIBBSS is going to do. I do serve on the board.)

Ryan -  I appreciate the donation, and in particular you sharing your reasoning here. 

I agree with a lot of what you write. In particular, the point about "Connectors" (point 2) and the point about bringing in relatively senior academics from non-CS/non-ML fields (point 3) are, IMO, things that are both valuable and that PIBBSS has a good track record of delivering on.

Re both your point 1 and reservation 1 -- trying to abstract a bit from the fact that terms like 'blue sky' research are somewhat fuzzy, and I expect at least some parts of a disagreement here might resolve when considering concrete examples -- I do think there has been some change in research thinking & prioritization, which has been unfolding in my head since summer 2023, and more seriously since the start of 2024. Lucas is the best person to talk to about this in more detail, but I'll still share a few thoughts that were on my mind back when I was leading PIBBSS.

I continue to believe that there is a lot of value to be had in investigating the underlying principles of intelligent behaviour (what one might refer to as blue sky or basic research), and to do so with a good dose of epistemic pluralism -- studying such intelligent behaviour from/across a range of different systems, substrates and perspectives. However, after the first 2 years of PIBBSS, I also wasn't entirely happy with our concrete research outputs. I thought a lot - first by myself, and later with Lucas - about what's up here, and how to do better -- all the while staying true to the roots/generators/principles of PIBBSS, as I see them.

One of the key axes of improvement we've grown increasingly confident in is what we sometimes refer to as "bridging the theory-practice gap". I'm pretty bullish on theory, but theory alone is not the answer. Theory on its own isn't in a great position to know where to go next/what to prioritise, or whether it's making progress at all. I have seen many theoretical threads that feel intriguing/compelling, but failed to bottom out because they lacked feedback loops that help guide them, and that force them to operationalise abstract notions into something of concrete empirical and interventionist value.

This is not an argument against theory, in my eyes, but an argument that theorizing about intelligent behaviour (or any given phenomenon) will benefit a lot from having productive feedback loops with the empirical. As such, what we've come to want to foster at PIBBSS is (a) ambitious, 'theory-first' AI safety research, (b) with an interdisciplinary angle, (c) that is able to find meaningful and iterative empirical feedback loops. It's not so much that a theoretical project that doesn't yet have a meaningful way of making contact with reality should be disregarded -- but that an important source of theoretical progress is finding ways to articulate/operationalise that theory so that it starts making actual (testable) predictions about a real system.

These updates in our thinking led to a few different downstream decisions. One of them was trying to have our fellowship cohort include some empirical ML profiles/projects (I endorse roughly a 10-15% fraction). The reasons for this are both that we think this work is likely to be useful, and that it changes (and IMO improves) the cohort dynamics (compared to, say, 0-5% ML). That said, I agree that once going above 20%, I would fear something essential about the PIBBSS spirit starts getting lost, and I'm not excited about that from an ecosystem perspective (given that e.g. MATS is doing what it's doing).

Another downstream implication, though it's earlier days for us on that one, is that I've become pretty excited about trying to help move ambitious ideas from (what I call) an 'idea readiness level' (borrowing from the notion of 'technology readiness levels') of 1-3 to an idea readiness level (IDL) of 5-7. (I think the way we were able to support Adam Shai & co in developing the computational mechanics agenda is a pretty great example of this, though notably their ideas were already relatively mature compared to other things PIBBSS might ambitiously help to mature.) My thinking here is that once an idea/research agenda is at IDLs 5-7, it typically has been able to enter the broader epistemic discourse and has some initial legible evidence/output it can rely on to make its own case -- and at that point I would say it is no longer in the area where PIBBSS has the greatest comparative advantage to support it. (Though notably Lucas/Dušan, PIBBSS' new leadership, might disagree & should have a chance to speak for themselves here.) There are currently no (?) notable places where IDL 1-3 ideas get a chance to be stress-tested and quickly iterate towards a more mature & empirically grounded agenda. I'd personally be very excited for a PIBBSS that becomes excellent at filling that gap.

comment by clem_acs · 2024-09-17T12:25:27.649Z

Clem here - I was fellowship lead this year and have been a research affiliate and mentor for PIBBSS in the past. Thanks for posting this. As might be expected in my position, I'm much more bullish than you / most people on what is often called "blue sky" research. Breakthroughs in our fundamental understanding of agency, minds, learning etc. seem valuable in a range of scenarios, not just in worlds dominated by an "intelligence explosion". In particular, I think that this kind of work (a) benefits a huge amount from close engagement with empirical work, and (b) is itself very likely to inform near-future prosaic work. Furthermore, I feel that progress on these questions is genuinely possible, and is made significantly more likely with more people working on it from as many perspectives as possible.

That said, there are two things you say under "reservations" that I strongly agree with, and I have some comments on them.

> I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.

There are worlds where furthering our understanding of deeply confused basic concepts that underpin everything else we do isn't considered "the weird", but given that we're not in these worlds, I have to agree. One big issue I see here is that doing this well requires marrying the better parts of academic culture with the better parts of tech / rationality culture (and yes, for this purpose I place those in the same bucket). Some of the places that I think do this best (e.g. Google's Paradigms of Intelligence team) have a culture / belief system somewhat incompatible with EA. It's worth noting that people often pursue basic questions for very different reasons.

> I strongly encourage PIBBSS to publicly post and seek feedback on their applicant selection and research prioritization processes, so that the AI safety ecosystem can offer useful insight (and benefit from this).

I think this is actually really important, and it's not something that I think PIBBSS does very well currently. One thing I would note is that, for reasons sketched above, I think it's important that the AI safety ecosystem isn't the only group interacting with this. One thing that's holding things back here is, in my view, the lack of a venue for this kind of research whose scope is primarily the basic research. This is not to say that relevance and impact for safety shouldn't be a primary concern in research prioritisation (I think it very much should be), but I do think this can be done in a way that is more compatible with academic norms (at least those academic norms that are worth upholding).

comment by TsviBT · 2024-09-17T07:43:40.075Z

(I have a lot of disagreements with everyone lol, but I appreciate Ryan putting some money where his mouth is re/ blue sky alignment research as a broad category, and the acknowledgement of "rather than the ideal 12-24 months" re/ "connectors".)

comment by Joseph Bloom (Jbloom) · 2024-09-17T08:06:42.803Z

> It seems that PIBBSS might be pivoting away from higher variance blue sky research to focus on more mainstream AI interpretability. While this might create more opportunities for funding, I think this would be a mistake. The AI safety ecosystem needs a home for “weird ideas” and PIBBSS seems the most reputable, competent, EA-aligned place for this! I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.

 

I was a recent PIBBSS mentor, and am a mech interp person who is likely to be considered mainstream by many people, and for this reason I wanted to push back on this concern.

A few thoughts:

  • I don't want to put words in your mouth, but I do want to clarify that we shouldn't conflate having some mainstream mech interp with being only mech interp. Importantly, to my knowledge, there is very little chance of PIBBSS doing mech interp exclusively, and so I think the salient question is whether they should have "a bit" (say 5-10% of scholars) do mech interp (which is likely more than they used to). I would advocate for a steady-state proportion of between 10 and 20% (see further points for detail).
  • In my opinion, the word "mainstream" suggests redundancy and brings to mind the idea that "well this could just be done at MATS". There are two reasons I think this is inaccurate. 
    • First, PIBBSS is likely to accept mentees who may not get into MATS / similar programs: mentees with diverse backgrounds and possibly different skill sets. In my opinion, this kind of diversity can be highly valuable and bring new perspectives to mech interp (which is a pre-paradigmatic field in need of new takes). I'm moderately confident that to the extent existing programs are highly selective, we should expect diversity to suffer in them (if you take the top 10% by metrics like competence, you're less likely to get the top 10% by diversity of intellectual background).
    • Second, I think there's a statistical term for this, but I forget what it is. PIBBSS being a home for weird ideas in mech interp as much as weird ideas in other areas of AI safety seems entirely reasonable to me.
  • I also think that even some mainstream mech interp (and possibly other areas like evals) should be a part of PIBBSS because it enriches the entire program:
    • My experience of the PIBBSS retreat and subsequent interactions suggests that a lot of value is created by having people who do empirical work interact with people who do more theoretical work. Empiricists gain ideas and perspective from theorists and theoretical researchers are exposed to more real world observations second hand. 
    • I weakly suspect that some debates / discussions in AI safety may be lopsided / missing details because some sub-fields are absent from them. In my opinion, it's valuable to sometimes mix up who is in the room, but likely worse in expectation to always remove mech interp people (hence my advocacy for 10-20% empiricists, with half of them being interp).

 

Some final notes:

  • I'm happy to share details of the work my scholar and I did which we expect to publish in the next month.
  • I'll be a bit forward and suggest that if you (Ryan) or any other funders find the arguments above convincing, then you might want to further support PIBBSS and ask Nora how PIBBSS can source a bit more "weird" mech interp, a bit of mainstream mech interp, and some other empirical sub-fields for the program.

I'll share this in the PIBBSS Slack to see if others want to comment :)

comment by james.lucassen · 2024-09-18T21:32:49.676Z

Nice post, glad you wrote up your thinking here.

I'm a bit skeptical of the "these are options that pay off if alignment is harder than my median" story. The way I currently see things going is:

  • a slow takeoff makes alignment MUCH, MUCH easier
  • as a result, all prominent approaches lean very hard on slow takeoff
  • there is uncertainty about takeoff speed, but folks have mostly given up on reducing this uncertainty

I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we're probably fine; if it's fast, we die.

Ways this might not happen:

  • Work of the form "here are ways we could notice we are in a fast takeoff world before actually getting slammed" produces evidence compelling enough to pause, or to cause leading labs to discard plans that rely on slow takeoff
  • agent foundations research aiming to do alignment in faster takeoff worlds finds a method so good it works better than current slow takeoff tailored methods even in the slow takeoff case, and labs pivot to this method

Both strike me as pretty unlikely. TBC, this doesn't mean those types of work are bad; I'm saying low probability, not necessarily low margins.

comment by Ryan Kidd (ryankidd44) · 2024-09-19T00:01:13.177Z

Cheers!

I think you might have implicitly assumed that my main crux here is whether or not take-off will be fast. I actually feel this is less decision-relevant for me than the other cruxes I listed, such as time-to-AGI or "sharp left turns." If take-off is fast, AI alignment/control does seem much harder and I'm honestly not sure what research is most effective; maybe attempts at reflectively stable or provable single-shot alignment seem crucial, or maybe we should just do the same stuff faster? I'm curious: what current AI safety research do you consider most impactful in fast take-off worlds?

To me, agent foundations research seems most useful in worlds where:

  • There is an AGI winter and we have time to do highly reliable agent design; or
  • We build alignment MVPs, institute a moratorium on superintelligence, and task the AIs to solve superintelligence alignment (quickly), possibly building off existing agent foundations work. In this world, existing agent foundations work helps human overseers ground and evaluate AI output.

comment by Lucas Teixeira · 2024-09-18T20:33:07.878Z

Hey Ryan, thank you for your support and for the thoughtful write-up! It’s very useful for us to see what the alignment community at large, and our supporters specifically, think of our work. I’ll respond to the point on “pivoting away from blue sky research” here and let Dušan address the other reservations in a separate comment.

As Nora has already mentioned, different people hold different notions of what it means to “keep it weird” and conduct “blue sky” and/or “non-paradigmatic” research. But insofar as this cluster of terms is pointing at research which is (a) aimed at innovating novel conceptual frames and (b) free from compromising pressures of short-term applications, then I would say that this is still the central focus of PIBBSS and that recent developments should be seen as updates to the founding vision, as opposed to full-on departures.

The main technical bet in my reading of the PIBBSS founding mission (which people are free to disagree with, and I’m curious about the ways in which they do) is that one can overcome the problem of epistemic access by leveraging insights from present-day physically instantiated proxies. Current-day deep learning systems are impressive, and arguably stronger approximations to the kinds of AGI/ASI which we are concerned with, but they’re still proxies nonetheless, and failing to treat them as such tends towards a set of associated failure cases.

Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm. This was most strongly reflected in the first iteration of the affiliateship cohort and is present in the language of the Manifund funding memo.

With that being said, given that PIBBSS, especially the fellowship, is largely a talent intervention aiming at providing a service to the field, I don’t believe its total portfolio should be confined to the limits of my research taste and experience. Especially after MIRI’s recent pivot, I think there’s a case to be made for PIBBSS to host research which doesn’t meet my personal preferences towards quick empirical engagement.

comment by Review Bot · 2024-09-17T07:11:37.410Z

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?