Posts
Comments
How fast should the field of AI safety grow? An attempt at grounding this question in some predictions.
- Ryan Greenblatt seems to think we can get a 30x speed-up in AI R&D using near-term, plausibly safe AI systems; assume every AIS researcher can be 30x’d by Alignment MVPs
- Tom Davidson thinks we have <3 years from 20%-AI to 100%-AI; assume we have ~3 years to align AGI with the aid of Alignment MVPs
- Assume the hardness of aligning TAI is equivalent to the Apollo Program (90k engineer/scientist FTEs x 9 years = 810k FTE-years); therefore, we need ~9k more AIS technical researchers
- The technical AIS field is currently ~500 people; at the current growth rate of 28% per year, it will take 12 years to grow to 9k people (Oct 2036)
- Alternatively, if we bound by the Manhattan Project (25k FTEs x 5 years = 125 FTE-years), this will take 6.5 years (Jul 2031)
- Metaculus predicts weak AGI in 2026 and strong AGI in 2030; clearly, more talent development is needed if we want to make the Nov 2030 AGI deadline!
- If we want to make the 9k researchers goal by Nov 2030 AGI deadline, we need an annual growth rate of 65%, 2.3x the current growth rate of 28%
Ah, that's a mistake. Our bad.
Crucial questions for AI safety field-builders:
- What is the most important problem in your field? If you aren't working on it, why?
- Where is everyone else dropping the ball and why?
- Are you addressing a gap in the talent pipeline?
- What resources are abundant? What resources are scarce? How can you turn abundant resources into scarce resources?
- How will you know you are succeeding? How will you know you are failing?
- What is the "user experience" of my program?
- Who would you copy if you could copy anyone? How could you do this?
- Am I better than the counterfactual?
- Who are your clients? What do they want?
Additional resources, thanks to Avery:
- COT Scaling implies slower takeoff speeds (Zoellner - 10 min)
- o1: A Technical Primer (Hoogland - 20 min)
- Unpacking o1 and the Path to AGI (Brown - up to 8:38)
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Deepmind, Snell - 20 min to skim)
- Speculations on Test-Time Scaling (Rush - from 4:25 to 21:25, 17 min total)
And 115 prospective mentors applied for Summer 2025!
When onboarding advisors, we made it clear that we would not reveal their identities without their consent. I certainly don't want to require that our advisors make their identities public, as I believe this might compromise the intent of anonymous peer review: to obtain genuine assessment, without fear of bias or reprisals. As with most academic journals, the integrity of the process is dependent on the editors; in this case, the MATS team and our primary funders.
It's possible that a mere list of advisor names (without associated ratings) would be sufficient to ensure public trust in our process without compromising the peer review process. We plan to explore this option with our advisors in future.
Not currently. We thought that we would elicit more honest ratings of prospective mentors from advisors, without fear of public pressure or backlash, if we kept the list of advisors internal to our team, similar to anonymous peer review.
I'm tempted to set this up with Manifund money. Could be a weekend project.
How would you operationalize a contest for short-timeline plans?
But who is "MIRI"? Most of the old guard have left. Do you mean Eliezer and Nate? Or a consensus vote of the entire staff (now mostly tech gov researchers and comms staff)?
On my understanding, EA student clubs at colleges/universities have been the main “top of funnel” for pulling people into alignment work during the past few years. The mix people going into those clubs is disproportionately STEM-focused undergrads, and looks pretty typical for STEM-focused undergrads. We’re talking about pretty standard STEM majors from pretty standard schools, neither the very high end nor the very low end of the skill spectrum.
At least from the MATS perspective, this seems quite wrong. Only ~20% of MATS scholars in the last ~4 programs have been undergrads. In the most recent application round, the dominant sources of applicants were, in order, personal recommendations, X posts, AI Safety Fundamentals courses, LessWrong, 80,000 Hours, then AI safety student groups. About half of accepted scholars tend to be students and the other half are working professionals.
You could consider doing MATS as "I don't know what to do, so I'll try my hand at something a decent number of apparent experts consider worthwhile and meanwhile bootstrap a deep understanding of this subfield and a shallow understanding of a dozen other subfields pursued by my peers." This seems like a common MATS experience and I think this is a good thing.
Some caveats:
- A crucial part of the "hodge-podge alignment feedback loop" is "propose new candidate solutions, often grounded in theoretical models." I don't want to entirely focus on empirically fleshing out existing research directions to the exclusion of proposing new candidate directions. However, it seems that, often, new on-paradigm research directions emerge in the process of iterating on old ones!
- "Playing theoretical builder-breaker" is an important skill and I think this should be taught more widely. "Iterators," as I conceive of them, are capable of playing this game well, in addition to empirically testing these theoretical insights against reality. John, to his credit, did a great job of emphasizing the importance of this skill with his MATS workshops on the alignment game tree and similar.
- I don't want to entirely trust in alignments MVPs, so I strongly support empirical research that aims to show the failure modes of this meta-strategy. I additionally support the creation of novel strategic paradigms, though I think this is quite hard. IMO, our best paradigm-level insights as a field largely come from interdisciplinary knowledge transfer (e.g., from economics, game theory, evolutionary biology, physics), not raw-g ideas from the best physics postdocs. Though I wouldn't turn away a chance to create more von Neumann's, of course!
Alice is excited about the eliciting latent knowledge (ELK) doc, and spends a few months working on it. Bob is excited about debate, and spends a few months working on it. At the end of those few months, Alice has a much better understanding of how and why ELK is hard, has correctly realized that she has no traction on it at all, and pivots to working on technical governance. Bob, meanwhile, has some toy but tangible outputs, and feels like he's making progress.
I don't want to respond to the examples rather than the underlying argument, but it seems necessary here: this seems like a massively overconfident claim about ELK and debate that I don't believe is justified by popular theoretical worst-case objections. I think a common cause of "worlds where iterative design fails" is "iterative design seems hard and we stopped at the first apparent hurdle." Sure, in some worlds we can rule out entire classes of solutions via strong theoretical arguments (e.g., "no-go theorems"); but that is not the case here. If I felt confident that the theory-level objections to ELK and debate ruled out hodge-podge solutions, I would abandon hope in these research directions and drop them from the portfolio. But this "flinching away" would ensure that genuine progress on these thorny problems never happened. If Stephen Casper, et al. treated ELK as unsolvable, they would never have published "Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs". If Akbir Khan, et al. treated debate as unsolvable, they would never have published "Debating with More Persuasive LLMs Leads to More Truthful Answers." I consider these papers genuine progress towards hodge-podge alignment, which I consider a viable strategy towards "alignment MVPs." Many more such cases.
A crucial part of my worldview is that good science does not have to "begin with the entire end in mind." Exploratory research pushing the boundaries of the current paradigm might sometimes seem contraindicated by theoretical arguments, but achieve empirical results anyways. The theory-practise gap can cut both ways: sometimes sound-seeming theoretical arguments fail to predict tractable empirical solutions. Just because travelling salesman is NP-hard doesn't mean that a P-time algorithm for a restricted subclass of the problem doesn't exist! I do not feel sufficiently confident in the popular criticisms of dominant prosaic AI safety strategies to rule them out; far from it. I want new researchers to download these criticisms, as there is valuable insight here, but "touch reality" anyways, rather than flinch away from the messy work of iteratively probing and steering vast, alien matrices.
Maybe research directions like ELK and debate sizzle out because the theoretical objections hold up in practise. Maybe they sizzle out for unrelated reasons, such as weird features of neural nets or just because we don't have time for the requisite empirical tinkering. But we would never find out unless we tried! And we would never equip a generation of empirical researchers to iterate on candidate solutions until something breaks. It's this skill that I think is most lacking in alignment: turning theoretical builder-breaker moves into empirical experiments on frontier LLMs (the apparent most likely substrate of first-gen AGI) that move us iteratively closer to a sufficiently scalable hodge-podge alignment MVP.
Obviously I disagree with Tsvi regarding the value of MATS to the proto-alignment researcher; I think being exposed to high quality mentorship and peer-sourced red-teaming of your research ideas is incredibly valuable for emerging researchers. However, he makes a good point: ideally, scholars shouldn't feel pushed to write highly competitive LTFF grant applications so soon into their research careers; there should be longer-term unconditional funding opportunities. I would love to unlock this so that a subset of scholars can explore diverse research directions for 1-2 years without 6-month grant timelines looming over them. Currently cooking something in this space.
@yanni kyriacos when will you post about TARA and Sydney AI Safety Hub on LW? ;)
@abramdemski DM me :)
Can you make a Manifund.org grant application if you need funding?
I'm not sure!
We don't collect GRE/SAT scores, but we do have CodeSignal scores and (for the first time) a general aptitude test developed in collaboration with SparkWave. Many MATS applicants have maxed out scores for the CodeSignal and general aptitude tests. We might share these stats later.
I don't agree with the following claims (which might misrepresent you):
- "Skill levels" are domain agnostic.
- Frontier oversight, control, evals, and non-"science of DL" interp research is strictly easier in practice than frontier agent foundations and "science of DL" interp research.
- The main reason there is more funding/interest in the former category than the latter is due to skill issues, rather than worldview differences and clarity of scope.
- MATS has mid researchers relative to other programs.
I don't think it makes sense to compare Google intern salary with AIS program stipends this way, as AIS programs are nonprofits (with associated salary cut) and generally trying to select against people motivated principally by money. It seems like good mechanism design to pay less than tech internships, even if the technical bar for is higher, given that value alignment is best selected by looking for "costly signals" like salary sacrifice.
I don't think the correlation for competence among AIS programs is as you describe.
I think there some confounders here:
- PIBBSS had 12 fellows last cohort and MATS had 90 scholars. The mean/median MATS Summer 2024 scholar was 27; I'm not sure what this was for PIBBSS. The median age of the 12 oldest MATS scholars was 35 (mean 36). If we were selecting for age (which is silly/illegal, of course) and had a smaller program, I would bet that MATS would be older than PIBBSS on average. MATS also had 12 scholars with completed PhDs and 11 in-progress.
- Several PIBBSS fellows/affiliates have done MATS (e.g., Ann-Kathrin Dombrowski, Magdalena Wache, Brady Pelkey, Martín Soto).
- I suspect that your estimation of "how smart do these people seem" might be somewhat contingent on research taste. Most MATS research projects are in prosaic AI safety fields like oversight & control, evals, and non-"science of DL" interpretability, while most PIBBSS research has been in "biology/physics-inspired" interpretability, agent foundations, and (recently) novel policy approaches (all of which MATS has supported historically).
Also, MATS is generally trying to further a different research porfolio than PIBBSS, as I discuss here, and has substantial success in accelerating hires to AI scaling lab safety teams and research nonprofits, helping scholars found impactful AI safety organizations, and (I suspect) accelerating AISI hires.
Are these PIBBSS fellows (MATS scholar analog) or PIBBSS affiliates (MATS mentor analog)?
Updated figure with LASR Labs and Pivotal Research Fellowship at current exchange rate of 1 GBP = 1.292 USD.
That seems like a reasonable stipend for LASR. I don't think they cover housing, however.
That said, maybe you are conceptualizing of an "efficient market" that principally values impact, in which case I would expect the governance/policy programs to have higher stipends. However, I'll note that 87% of MATS alumni are interested in working at an AISI and several are currently working at UK AISI, so it seems that MATS is doing a good job of recruiting technical governance talent that is happy to work for government wages.
Note that governance/policy jobs pay less than ML research/engineering jobs, so I expect GovAI, IAPS, and ERA, which are more governance focused, to have a lower stipend. Also, MATS is deliberately trying to attract top CS PhD students, so our stipend should be higher than theirs, although lower than Google internships to select for value alignment. I suspect that PIBBSS' stipend is an outlier and artificially low due to low funding. Given that PIBBSS has a mixture of ML and policy projects, and IMO is generally pursuing higher variance research than MATS, I suspect their optimal stipend would be lower than MATS', but higher than a Stanford PhD's; perhaps around IAPS' rate.
That's interesting! What evidence do you have of this? What metrics are you using?
MATS lowered the stipend from $50/h to $40/h ahead of the Summer 2023 Program to support more scholars. We then lowered it again to $30/h ahead of the Winter 2023-24 Program after surveying alumni and determining that 85% would be accept $30/h.
CHAI interns are paid $5k/month for in-person interns and $3.5k/month for remote interns. I used the in-person figure. https://humancompatible.ai/jobs
Yes, this doesn't include those costs and programs differ in this respect.
Hourly stipends for AI safety fellowship programs, plus some referents. The average AI safety program stipend is $26/h.
Edit: updated figure to include more programs.
1% are "Working/interning on AI capabilities."
Erratum: previously, this statistic was "7%", which erroneously included two alumni who did not complete the program before Winter 2023-24, which is outside the scope of this report. Additionally, two of the three alumni from before Winter 2023-24 who selected "working/interning on AI capabilities" first completed our survey in Sep 2024 and were therefore not included in the data used for plots and statistics. If we include those two alumni, this statistic would be 3/74 = 4.1%, but this would be misrepresentative as several other alumni who completed the program before Winter 2023-24 filled in the survey during or after Sep 2024.
Scholars working on safety teams at scaling labs generally selected "working/interning on AI alignment/control"; some of these also selected "working/interning on AI capabilities", as noted. We are independently researching where each alumnus ended up working, as the data is incomplete from this survey (but usually publicly available), and will share separately.
Great suggestion! We'll publish this in our next alumni impact evaluation, given that we will have longer-term data (with more scholars) soon.
Cheers!
I think you might have implicitly assumed that my main crux here is whether or not take-off will be fast. I actually feel this is less decision-relevant for me than the other cruxes I listed, such as time-to-AGI or "sharp left turns." If take-off is fast, AI alignment/control does seem much harder and I'm honestly not sure what research is most effective; maybe attempts at reflectively stable or provable single-shot alignment seem crucial, or maybe we should just do the same stuff faster? I'm curious: what current AI safety research do you consider most impactful in fast take-off worlds?
To me, agent foundations research seems most useful in worlds where:
- There is an AGI winter and we have time to do highly reliable agent design; or
- We build alignment MVPs, institute a moratorium on superintelligence, and task the AIs to solve superintelligence alignment (quickly), possibly building off existent agent foundations work. In this world, existing agent foundations work helps human overseers ground and evaluate AI output.
I just left a comment on PIBBSS' Manifund grant proposal (which I funded $25k) that people might find interesting.
Main points in favor of this grant
- My inside view is that PIBBSS mainly supports “blue sky” or “basic” research, some of which has a low chance of paying off, but might be critical in “worst case” alignment scenarios (e.g., where “alignment MVPs” don’t work, or “sharp left turns” and “intelligence explosions” are more likely than I expect). In contrast, of the technical research MATS supports, about half is basic research (e.g., interpretability, evals, agent foundations) and half is applied research (e.g., oversight + control, value alignment). I think the MATS portfolio is a better holistic strategy for furthering AI alignment. However, if one takes into account the research conducted at AI labs and supported by MATS, PIBBSS’ strategy makes a lot of sense: they are supporting a wide portfolio of blue sky research that is particularly neglected by existing institutions and might be very impactful in a range of possible “worst-case” AGI scenarios. I think this is a valid strategy in the current ecosystem/market and I support PIBBSS!
- In MATS’ recent post, “Talent Needs of Technical AI Safety Teams”, we detail an AI safety talent archetype we name “Connector”. Connectors bridge exploratory theory and empirical science, and sometimes instantiate new research paradigms. As we discussed in the post, finding and developing Connectors is hard, often their development time is on the order of years, and there is little demand on the AI safety job market for this role. However, Connectors can have an outsized impact on shaping the AI safety field and the few that make it are “household names” in AI safety and usually build organizations, teams, or grant infrastructure around them. I think that MATS is far from the ideal training ground for Connectors (although some do pass through!) as our program is only 10 weeks long (with an optional 4 month extension) rather than the ideal 12-24 months, we select scholars to fit established mentors’ preferences rather than on the basis of their original research ideas, and our curriculum and milestones generally focus on building object-level scientific skills rather than research ideation and “gap-identifying”. It’s thus no surprise that most MATS scholars are “Iterator” archetypes. I think there is substantial value in a program like PIBBSS existing, to support the development of “Connectors” and pursue impact in a higher-variance way than MATS.
- PIBBSS seems to have decent track record for recruiting experienced academics in non-CS fields and helping them repurpose their advanced scientific skills to develop novel approaches to AI safety. Highlights for me include Adam Shai’s “computational mechanics” approach to interpretability and model cognition, Martín Soto’s “logical updatelessness” approach to decision theory, and Gabriel Weil’s “tort law” approach to making AI labs liable for their potential harms on the long-term future.
- I don’t know Lucas Teixeira (Research Director) very well, but I know and respect Dušan D. Nešić (Operations Director) a lot. I also highly endorsed Nora Ammann’s vision (albeit while endorsing a different vision for MATS). I see PIBBSS as a highly competent and EA-aligned organization, and I would be excited to see them grow!
- I think PIBBSS would benefit from funding from diverse sources, as mainstream AI safety funders have pivoted more towards applied technical research (or more governance-relevant basic research like evals). I think Manifund regrantors are well-positioned to endorse more speculative basic research, but I don’t really know how to evalutate such research myself, so I’d rather defer to experts. PIBBSS seems well-positioned to provide this expertise! I know that Nora had quite deep models of this while Research Director and in talking with Dusan, I have had a similar impression. I hope to talk with Lucas soon!
Donor's main reservations
- It seems that PIBBSS might be pivoting away from higher variance blue sky research to focus on more mainstream AI interpretability. While this might create more opportunities for funding, I think this would be a mistake. The AI safety ecosystem needs a home for “weird ideas” and PIBBSS seems the most reputable, competent, EA-aligned place for this! I encourage PIBBSS to “embrace the weird”, albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
- I haven’t examined PIBBSS’ applicant selection process and I’m not entirely confident it is the best version it can be, given how hard MATS has found applicant selection and my intuitions around the difficulty of choosing a blue sky research portfolio. I strongly encourage PIBBSS to publicly post and seek feedback on their applicant selection and research prioritization processes, so that the AI safety ecosystem can offer useful insight. I would also be open to discussing these more with PIBBSS, though I expect this would be less useful.
- My donation is not very counterfactual here, given PIBBSS’ large budget and track record. However, there has been a trend in typical large AI safety funders away from agent foundations and interpretability, so I think my grant is still meaningful.
Process for deciding amount
I decided to donate the project’s minimum funding ($25k) so that other donors would have time to consider the project’s merits and potentially contribute. Given the large budget and track record of PIBBSS, I think my funds are less counterfactual here than for smaller, more speculative projects, so I only donated the minimum. I might donate significantly more to PIBBSS later if I can’t find better grants, or if PIBBSS is unsuccessful in fundraising.
Conflicts of interest
I don't believe there are any conflicts of interest to declare.
I don't think I'd change it, but my priorities have shifted. Also, many of the projects I suggested now exist, as indicated in my comments!
More contests like ELK with well-operationalized research problems (i.e., clearly explain what builder/breaker steps look like), clear metrics of success, and have a well-considered target audience (who is being incentivized to apply and why?) and user journey (where do prize winners go next?).
We've seen a profusion of empirical ML hackathons and contests recently.
A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman’s planned AI safety research group.
Based on Bowman's comment, I no longer think this is worthwhile.
Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews
Apart Research runs hackathons, but these are largely empirical in nature (and still valuable).
A talent recruitment and onboarding organization targeting cyber security researchers
Palisade Research now exists and are running the AI Security Forum. However, I don't think Palisade are quite what I envisaged for this hiring pipeline.
IAPS AI Policy Fellowship also exists now!
SPAR exists! Though I don't think it can offer visas.
A London-based MATS clone
This index should include Lighthaven, right?
I interpret your comment as assuming that new researchers with good ideas produce more impact on their own than in teams working towards a shared goal; this seems false to me. I think that independent research is usually a bad bet in general and that most new AI safety researchers should be working on relatively few impactful research directions, most of which are best pursued within a team due to the nature of the research (though some investment in other directions seems good for the portfolio).
I've addressed this a bit in thread, but here are some more thoughts:
- New AI safety researchers seem to face mundane barriers to reducing AI catastrophic risk, including funding, infrastructure, and general executive function.
- MATS alumni are generally doing great stuff (~78% currently work in AI safety/control, ~1.4% work on AI capabilities), but we can do even better.
- Like any other nascent scientific/engineering discipline, AI safety will produce more impactful research with scale, albeit with some diminishing returns on impact eventually (I think we are far from the inflection point, however).
- MATS alumni, as a large swathe of the most talented new AI safety researchers in my (possibly biased) opinion, should ideally not experience mundane barriers to reducing AI catastrophic risk.
- Independent research seems worse than team-based research for most research that aims to reduce AI catastrophic risk:
- "Pair-programming", builder-breaker, rubber-ducking, etc. are valuable parts of the research process and are benefited by working in a team.
- Funding insecurity and grantwriting responsibilities are larger for independent researchers and obstruct research.
- Orgs with larger teams and discretionary funding can take on interns to help scale projects and provide mentorship.
- Good prosaic AI safety research largely looks more like large teams doing engineering and less like lone geniuses doing maths. Obviously, some lone genius researchers (especially on mathsy non-prosaic agendas) seem great for the portfolio too, but these people seem hard to find/train anyways (so there is often more alpha in the former by my lights). Also, I have doubts that the optimal mechanism to incentivize "lone genius research" is via small independent grants instead of large bounties and academic nerdsniping.
- Therefore, more infrastructure and funding for MATS alumni, who are generally value-aligned and competent, is good for reducing AI catastrophic risk in expectation.
Also note that historically many individuals entering AI safety seem to have been pursuing the "Connector" path, when most jobs now (and probably in the future) are "Iterator"-shaped, and larger AI safety projects are also principally bottlenecked by "Amplifiers". The historical focus on recruiting and training Connectors to the detriment of Iterators and Amplifiers has likely contributed to this relative talent shortage. A caveat: Connectors are also critical for founding new research agendas and organizations, though many self-styled Connectors would likely substantially benefit as founders by improving some Amplifier-shaped soft skills, including leadership, collaboration, networking, and fundraising.
In theory, sure! I know @yanni kyriacos recently assessed the need for an ANZ AI safety hub, but I think he concluded there wasn't enough of a need yet?