Long-Term Future Fund: April 2023 grant recommendations
post by abergal, calebp99, Linch, habryka (habryka4), Thomas Larsen (thomas-larsen), Vaniver · 2023-08-02T07:54:49.083Z · LW · GW · 3 comments
(Cross-posted from the EA Forum [EA · GW].)
Introduction
This payout report is meant to cover the Long-Term Future Fund's grantmaking starting January 2022 (after our December 2021 payout report), going through April 2023 (1 January 2022 - 30 April 2023).
- Total funding recommended: $13.0M
- Total funding paid out: $12.16M
- Number of grants paid out: 327
- Acceptance rate (excluding desk rejections): 50.0%
- Acceptance rate (including desk rejections): 37.4%
- Report authors: Asya Bergal (chair), Linchuan Zhang, Oliver Habryka, Caleb Parikh, Thomas Larsen, Matthew Graves
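The two acceptance rates above imply a share of applications that were desk-rejected. The sketch below works that out, assuming (this is not stated in the report) that both rates use the same count of accepted applications and differ only in whether desk rejections are included in the denominator. The function name is illustrative.

```python
def desk_rejection_share(rate_excluding: float, rate_including: float) -> float:
    """Share of all applications that were desk-rejected.

    Assumption: both rates share the same numerator (accepted applications).
      accepted / (total - desk_rejected) = rate_excluding
      accepted / total                   = rate_including
    so (total - desk_rejected) / total   = rate_including / rate_excluding.
    """
    return 1.0 - rate_including / rate_excluding


# Rates quoted in the list above: 50.0% and 37.4%.
share = desk_rejection_share(0.500, 0.374)
print(f"Implied desk-rejection share: {share:.1%}")
```

Under that assumption, roughly a quarter of applications were desk-rejected.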
52 of our grantees, worth $1.41M, requested that we not include public reports for their grants. (You can read our policy on public reporting here [EA · GW].) We referred 2 grants to other funders for evaluation ($0.501M). Our median response time over this period was 29 days.
The rest of our grants are listed below (either in long or short form), as well as in our public grants database.
If you’re interested in receiving funding from the Long-Term Future Fund, apply here.
(Note: The initial sections of this post were written by me, Asya Bergal.)
Other updates
We've had a substantial increase in applications since 2021: we averaged 35 applications per month in the latter half of 2021, 69 applications per month in 2022, and 90 applications per month so far in 2023.
Our funding bar went up at the end of 2022, in response to a decrease in the overall funding available to long-term future-focused projects. If we assume our numerical ratings are consistent, then applying our new bar to our earlier 2022 funding would imply not having funded 28% of earlier grants.
We're looking for more funding. We've spent an average of ~$1M per month across March, April, and May 2023 to maintain our current bar, have $992,870.53 in reserves as of July 3, and are ideally looking to fundraise at least $10M for the coming year.
As described in this post [EA · GW], we're trying to increase our independence from Open Philanthropy, which provided ~45% of our funding in 2022. As a transitional measure, over the next 6 months, Open Philanthropy will be matching funding given to the Long-Term Future Fund by small donors 2:1, for up to $3.5M total, making now a particularly good time to donate. Donate here. (The Long-Term Future Fund is part of EA Funds, which is a fiscally sponsored project of Effective Ventures Foundation (UK) (EV UK) and Effective Ventures Foundation USA Inc. (EV US). Donations to the Long-Term Future Fund are donations to EV US or EV UK.)
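To make the matching arithmetic concrete, here is a minimal sketch. It assumes that the $3.5M figure caps the amount Open Philanthropy contributes (rather than the small-donor donations being matched); the text does not spell this out, and the function and constant names are illustrative only.

```python
MATCH_RATIO = 2.0        # Open Philanthropy dollars per small-donor dollar (2:1, from the text)
MATCH_CAP = 3_500_000.0  # ASSUMPTION: the cap applies to the matched amount


def matched_amount(small_donations: float) -> float:
    """Matching funds unlocked by a given total of small-donor donations."""
    return min(MATCH_RATIO * small_donations, MATCH_CAP)


def total_raised(small_donations: float) -> float:
    """Small-donor donations plus the matching funds they unlock."""
    return small_donations + matched_amount(small_donations)


# Small-donor total at which the match would be fully used, under the assumption above.
donations_to_exhaust_cap = MATCH_CAP / MATCH_RATIO

print(f"A $100 donation unlocks ${matched_amount(100.0):,.0f} in matching funds")
print(f"Match fully used at ${donations_to_exhaust_cap:,.0f} in small donations")
```

If the cap instead applied to the donations being matched, the second figure would be different, so treat the numbers as an illustration of the 2:1 mechanism rather than a statement of the actual terms.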
As a temporary measure in response to uncertainty about our future funding levels, we’ve put the bottom ~40% of grants above our current funding bar on hold. I think we’ll make several of those grants after this round of fundraising is over, but I generally expect our funding bar to vary more over time and to depend more on individual donations than it has historically.
I will be stepping down as chair of the fund by the end of October (and potentially earlier); I've written some reflections on my time on the fund here [EA · GW]. We're looking for additional fund managers (including potential chair candidates); express interest here.
The fund's current fund managers are me (Asya Bergal), Linchuan Zhang, Oliver Habryka, and Caleb Parikh as permanent fund managers, and Thomas Larsen, Daniel Eth, Matthew Gray, Lauro Langosco, and Clara Collier as guest managers [EA · GW].
Our legal team asked us to highlight the eligibility criteria for our grants, which you can find in the appendices.
Highlights
Our grants include:
- $316,000 in June 2022 [EA · GW] to support SERI MATS, an 8-week scholar program that pairs promising alignment researchers with mentors in the alignment field.
- $72,000 in July 2022 [EA · GW] for a research and networking retreat for winners of the Eliciting Latent Knowledge contest.
- $200,000 in February 2022 [EA · GW] to support Stephen Grugett, James Grugett, and Austin Chen for 4 months to build a forecasting platform (Manifold Markets) based on user-created play-money prediction markets.
Payout reports
Longer grant write-ups
Grants evaluated by Linchuan Zhang
Stephen Grugett, James Grugett, Austin Chen ($200,000): 4-month stipend for 3 FTE to build a forecasting platform made available to the public based on user-created play-money prediction markets
- March 2022 Notes by Linch Zhang: This was my first substantive grant investigation. At the time, I felt shaky about it, but now I feel really good about it. The two main reasons I originally recommended this grant:
- 1. It was an investment into the people who wanted to do EA work – getting 3 ~Google-quality engineers to do more EA/longtermist work (as opposed to counterfactuals that were earning to give or worse) seems well worth it at 200k.
- 2. It was an investment into the team specifically. Having cohesive software teams seems like an important component for EA becoming formidable in the future, and is somewhat (surprisingly to me) missing in EA, especially outside of AI safety and crypto trading. I heard really good things about Manifold from early users, and they appeared to be developing at a speed that blew other software projects in forecasting (Metaculus, Foretold, Cultivate, Hypermind, etc) out of the water.
- At the time, it was not an investment into the prediction market itself/theory of change with regards to play-money prediction markets broadly, because the above two factors were sufficient to be decisive.
- At the time, it was also unclear whether they plan to go the for-profit route or the nonprofit route.
- They’ve since decided to go the for-profit route.
- Looking back, it’s still too soon to be sure, but Manifold looks like it is going quite well. They continue to develop features at phenomenal speed, lots of EAs and others in adjacent communities use the product, and the team is still producing fast and is excited for the future.
- From an “investment into team” perspective, I think Manifold now plausibly has the strongest software team in EA outside of AI safety and earning-to-give (not that I’d necessarily have enough visibility to know of all the potentially better teams, especially stealth ones).
- I have a number of disjunctive ToCs for how Manifold (and forecasting in general) can over time make the future better, some of which are implicitly covered here [EA · GW].
- I am still uncertain, though, about whether this particular project is the best use of the cofounders + team’s time. Much of the evidence I have observed is more an update on the team’s overall skill + cohesiveness than an update about their comparative advantage for prediction markets specifically.
- Addendum June 2023:
- I’ve grown more confused about the total impact or value of this grant. On the one hand, I think Manifold is performing at or moderately above expectations in terms of having a cohesive team that’s executing quickly, and many people in the community appear to find their product useful or at least interesting. On the other hand, a) the zero-interest-rate environment and correspondingly high startup valuations that prevailed when I recommended this grant ended in early 2022, and b) recent events have eliminated a substantial fraction of EA funding, which means $200K is arguably much more costly now than it was a year ago.
- Still, I think I’m broadly glad to have Manifold in our ecosystem. I think they’re very helpful for people in our and adjacent communities in training epistemics, and I’m excited to see them branch out into experiments in regranting and retroactive funding projects; from a first-principles perspective, it’d be quite surprising if the current status of EA grantmaking is sufficiently close to optimal.
Solomon Sia ($71,000): 6-month stipend for providing consultation and recommendations on changes to the US regulatory environment for prediction markets.
- Solomon Sia wants to talk to a range of advisers, including industry experts, users, and contacts at the CFTC, to see if there are good improvements in ways to regulate prediction markets in the US, while simultaneously protecting users and reducing regulatory risk and friction.
- This was an exploratory grant to see whether and how the US regulatory environment for prediction markets could be improved, with a resulting written report provided to EA Funds.
- I think this is a reasonable/great option to explore:
- I think my position on prediction markets is somewhat more cynical than that of most EAs in the forecasting space, but still, I’m broadly in favor of them and think they can be a critical epistemic intervention, both for uncovering new information and for legibility/common knowledge reasons.
- It seemed quite plausible to me that the uncertain regulatory environment for prediction markets in the US is impeding the growth of large real-money prediction markets on questions that matter.
- Solomon seemed unusually competent and knowledgeable about the tech regulations space, a skillset very few EAs have.
- Cultivating this skillset and having him think about EA issues seemed valuable.
- A potential new caveat is that in 2023 as AI risk worries heat up, it seems increasingly likely that we might be able to draw from a diverse skillset of experienced and newly interested/worried people.
- The for-profit motivations for this work are there but not very large, as unless a company is trying very hard to do specific regulatory capture for their company (which is bad and also practically very difficult), easing prediction market regulations has collective benefits and individual costs.
- (weakly held) I thought trying to nail this during the Biden administration was good because it seemed plausible that the current CFTC would be more predisposed to liking prediction markets than average for the CFTC.
- One interesting update is that EA connections were likely a mild plus in 2022, and are likely a moderate liability in 2023.
- NB: Solomon and his collaborator think a) that the EA connection is still a mild to moderate positive b) it’s now unclear whether the Biden administration is better or worse than a counterfactual Republican administration.
- I’ve thought about this grant some afterwards, and I think even with the benefit of hindsight, I'm still a bit confused about how happy I should be about this grant ex-post.
- One thing is that I’ve grown a bit more confused about the output and tractability of interventions in this domain.
- The successes(?) Kalshi had confused me and I haven’t had enough time to integrate this into my worldview.
- My current impression is that CFTC is fairly open to informed opinions from others on this matter.
- I continue to believe it’s a good grant ex-ante.
Grants evaluated by Oliver Habryka
Alexander Turner ($220,000): Year-long stipend for shard theory and RL mechanistic interpretability research
This grant has been approved but has not been paid out at the time of writing.
We’ve made grants to Alex to pursue AI Alignment research before:
- 2019: Building towards a “Limited Agent Foundations” thesis on mild optimization and corrigibility
- 2020: Understanding when and why proposed AI designs seek power over their environment ($30,000)
- 2021: Alexander Turner - Formalizing the side effect avoidance problem ($30,000)
- 2022: Alexander Turner - 12-month stipend supplement for CHAI research fellowship ($31,500)
We also made another grant in 2023 to a team led by Alex Turner for their post on steering vectors [LW · GW] for $115,411 (total includes payment to 5 team members, including, without limitation, travel expenses, office space, and stipends).
This grant is an additional grant to Alex, this time covering his full-time stipend for a year to do more research in AI Alignment.
Only the first one has a public grant write-up, and the reasoning and motivation behind all of these grants is pretty similar, so I will try to explain the reasoning behind all of them here.
As is frequently the case with grants I evaluate in the space of AI Alignment, I disagree on an inside-view level pretty strongly with the direction of the research that Alex has been pursuing for most of his AI Alignment career. Historically I have been, on my inside-view, pretty unexcited about Alex’s work on formalizing power-seekingness, and also feel not that excited about his work on shard theory. Nevertheless, I think these are probably among the best grants the LTFF has made in recent years.
The basic reasoning here is that despite me not feeling that excited about the research directions Alex keeps choosing, within the direction he has chosen, Alex has done quite high-quality work, and also seems to often have interesting and useful contributions in online discussions and private conversations. I also find his work particularly interesting, since I think that within a broad approach I often expected to be fruitless, Alex has produced more interesting insight than I expected. This in itself has made me more interested in further supporting Alex, since someone producing work that shows that I was at least partially wrong about a research direction being not very promising is more important to incentivize than work whose effects I am pretty certain of.
I would like to go into more detail on my models of how Alex’s research has updated me, and why I think it has been high quality, but I sadly don’t have the space or time here to go into that much depth. In short, the more recent steering vector work seems like the kind of “obvious thing to try that could maybe help” that I would really like to saturate with work happening in the field, and the work on formalizing power-seeking theorems is also the kind of stuff that seems worth having done, though I do pretty deeply regret the overly academic/formal presentation which has somewhat continuously caused people to overinterpret the strength of its results (which Alex also seems to have regretted [LW(p) · GW(p)], and is also a pattern I have frequently observed in academic work that was substantially motivated by trying to “legitimize the field”).
Another aspect of this grant that I expect to have somewhat wide-ranging consequences is the stipend level we settled on. Some basic principles that have led me to suggest this stipend level:
- I have been using the anchor of “industry stipend minus 30%” as a useful heuristic for setting stipend levels for LTFF grants. The goal in that heuristic was to find a relatively objective standard that would allow grantees to think about stipend expectations on their own without requiring a lot of back and forth, while hitting a middle ground in the incentive landscape between salaries being so low that lots of top talent would just go into industry instead of doing impactful work, and avoiding grifter problems with people asking for LTFF grants because they expect they will receive less supervision and can probably get away without a ton of legible progress.
- In general I think self-employed salaries should be ~20-40% higher, to account for additional costs like health insurance, payroll taxes, administration overhead, and other things that an employer often takes care of.
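As a rough illustration of how these two heuristics might combine, the sketch below applies the 30% discount to an industry figure and then the 20-40% self-employment uplift. Applying them in sequence is an assumption made here for illustration, not a formula stated by the fund, and the $200,000 industry figure and the function name are hypothetical.

```python
def stipend_range(industry_comp: float,
                  discount: float = 0.30,
                  uplift_low: float = 0.20,
                  uplift_high: float = 0.40) -> tuple[float, float]:
    """Return a (low, high) stipend range from an industry compensation figure.

    ASSUMPTION: the "industry minus 30%" anchor is applied first, and the
    ~20-40% self-employment uplift is then applied on top of that anchor.
    """
    anchor = industry_comp * (1.0 - discount)
    return anchor * (1.0 + uplift_low), anchor * (1.0 + uplift_high)


# Hypothetical industry compensation, chosen only to show the arithmetic.
low, high = stipend_range(200_000)
print(f"Stipend range: ${low:,.0f} to ${high:,.0f}")
```

On this reading, the two adjustments partly offset each other: a self-employed grantee ends up somewhat below the industry figure rather than a full 30% below it.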
I have been rethinking stipend policies, as I am sure many people in the EA community have been since the collapse of FTX, and I haven’t made up my mind on the right principles here. It does seem like a pretty enormous number of good projects no longer have the funding to operate at their previous stipend levels, and it’s plausible to me that we should take the hit, lose out on a bunch of talent, and reduce stipend levels to a substantially lower level again to be more capable of handling funding shocks. But I am really uncertain on this, and at least in the space of AI Alignment, I can imagine the recent rise to prominence of AI Risk concerns could potentially alleviate funding shortfalls (or it could increase competition by having more talent flow into the space, which could reduce wages, which would also be great).
See the Stipend Appendix below, “How we set grant and stipend amounts”, for more information on EA Funds’ determination of grant and stipend amounts.
Vanessa Kosoy ($100,000): Working on the learning-theoretic AI alignment research agenda
This is a grant to cover half of Vanessa’s stipend for two years (the other half being paid by MIRI). We also made another grant to Vanessa in Q4 2020 for a similar amount.
My model of the quality of Vanessa’s work is primarily indirect, having engaged relatively little with the central learning-theoretic agenda that Vanessa has worked on. The work is also quite technically dense, and I haven’t found anyone else who could explain the work to me in a relatively straightforward way (I have heard that Daniel Filan’s AXRP podcast with Vanessa is a better way to get started than previous material, but it hadn’t been published when I was evaluating this grant).
I did receive a decent number of positive references for Vanessa’s work, and I have seen her make contributions to other conversations online [? · GW] that struck me as indicative of a pretty deep understanding of the AI Alignment problem.
If I had to guess at the effects of this kind of work, though I should clarify I am substantially deferring to other people here in a way that makes me not particularly trust my specific predictions, I expect that the primary effect would be that the kind of inquiry Vanessa is pursuing highlights important confusions and mistaken assumptions in how we expect machine intelligence to work, which when resolved, will make researchers better at navigating the very large space of potential alignment approaches. I would broadly put this in the category of “Deconfusion Research” [? · GW].
Vanessa’s research resulted in various public blog posts, which can be found here [AF · GW].
Skyler Crossman ($22,000): Support for Astral Codex Ten Everywhere meetups
Especially since the collapse of FTX, I am quite interested in further diversifying the set of communities that are working on things I think are important to the future. AstralCodexTen and SlateStarCodex meetups seem among the best candidates for creating additional thriving communities with overlapping, but still substantially different norms.
I do feel currently quite confused about what a good relationship between adjacent communities like this and Effective Altruism-labeled funders like the Long-Term Future Fund should be. Many of these meetups do not aim to do as much good as possible, or have much of an ambitious aim to affect the long-term future of humanity, and I think pressures in that direction would likely be more harmful than helpful, by introducing various incentives for deception and potentially preventing healthy local communities from forming by creating a misaligned relationship between the organizers (who are paid by EA institutions to produce as much talent as possible for longtermist priorities) and the members (who are interested in learning cool things about rationality and the world and want to meet other people with similar interests).
Since this is a relatively small grant, I didn’t really resolve this confusion, and mostly decided to just go ahead with this. I also talked a bunch to Skyler about this, and currently think we can figure out a good relationship into the future on how it’s best to distribute funding like this, and I expect to think more about this in the coming weeks.
Grants evaluated by Asya Bergal
Any views expressed below are my personal views, and not the views of my employer, Open Philanthropy. (In particular, getting funding from the Long-Term Future Fund should not be read as an indication that the applicant has a greater chance of receiving funding from Open Philanthropy, and not receiving funding from the Long-Term Future Fund [or any risks and reservations noted in the public payout report] should not be read as an indication that the applicant has a smaller chance of receiving funding from Open Philanthropy.)
Alignment Research Center ($54,543): Support for a research & networking event for winners of the Eliciting Latent Knowledge contest
- This was funding a research & networking event for the winners of the Eliciting Latent Knowledge contest [LW · GW] run in early 2022; the plan for the event was mainly for it to be participant-led, with participants sharing what they were working on and connecting with others, along with professional alignment researchers visiting to share their own work with participants.
- I think the case for this grant is pretty straightforward: the winners of this contest are (presumably) selected for being unusually likely to be able to contribute to problems in AI alignment, and retreats, especially those involving interactions with professionals in the space, have a strong track record of getting people more involved with this work.
Daniel Filan ($23,544): Funding to produce 12 more AXRP episodes, the AI X-risk Podcast.
We recommended a grant of $23,544 to pay Daniel Filan for his time making 12 additional episodes of the AI X-risk Research Podcast (AXRP), as well as the costs of hosting, editing, and transcription.
The reasoning behind this grant was similar to the reasoning behind my last grant to AXRP [EA · GW]:
- I’ve listened to or read through several episodes of the podcast; I thought Daniel asked good questions and got researchers to talk about interesting parts of their work. I think having researchers talk about their work informally can provide value not provided by papers (and to a lesser extent, not provided by blog posts). In particular:
- I’ve personally found that talks by researchers can help me understand their research better than reading their academic papers (e.g. Jared Kaplan’s talk about his scaling laws paper). This effect seems to have also held for at least one listener [LW · GW] of Daniel’s podcast.
- Informal conversations can expose motivations for the research and relative confidence level in conclusions better than published work.
Daniel also shared some survey data [LW · GW] in his grant application about how people rated AXRP compared to other AI alignment resources, though I didn't look at this closely when making the grant decision, as I already had a reasonably strong prior towards funding.
Grants evaluated by Caleb Parikh
Conjecture ($72,827): Funding for a 2-day workshop to connect alignment researchers from the US, UK, and AI researchers and entrepreneurs from Japan.
- Conjecture applied for funding to host a two-day AI safety workshop in Japan in collaboration with Araya (a Japanese AI company). They planned to invite around 40 people, with half of the attendees being AI researchers, and half being alignment researchers from the US and UK. Japanese researchers were generally senior, leading labs, holding postdoc positions in academia, or holding senior technical positions at tech companies.
- To my knowledge, there has been very little AI safety outreach conducted amongst strong academic communities in Asia (e.g. in Japan, Singapore, South Korea …). On the current margin, I am excited about more outreach being done in these countries within ultra-high talent groups. The theory of change for the grant seemed fairly straightforward: encourage talented researchers who are currently working in some area of AI to work on AI safety, and foster collaborations between them and the existing alignment community.
- Conjecture shared the invite list with me ahead of the event, and I felt good about the set of alignment researchers invited from the UK and US. I looked into the Japanese researchers briefly, but I found it harder to gauge the quality of invites given my lack of familiarity with the Japanese AI scene. I also trust Conjecture to execute operationally competently on events of this type, as they have assisted other AI safety organisations (such as SERI MATS) in the past.
- On the other hand, I have had some concerns about Conjecture, and I felt confused about whether this conference gave Conjecture more influence in ways that I would feel concerned about given the questionable integrity and judgement of their CEO; see this [EA · GW] and this [EA · GW] section of a critique of their organisation (though note that I don’t necessarily endorse the rest of the post). It was also unclear to me how counterfactual the grant was, and how this traded off against activities that I would be less excited to see Conjecture run. I think this is a general issue with funding projects at organisations with flexible funding, as organisations are incentivised to present their most fundable projects (which they are also the most excited about), and then in cases where the funding request is successful, move funding that they would have spent on this project to other lower impact projects. Overall, I modelled making this grant as being about a quarter as cost-effective as it might have been without these considerations (though I don’t claim this discount factor to be particularly reliable).
- Overall, I thought this grant was pretty interesting, and I think that the ex-ante case for it was pretty solid. I haven’t reviewed the outcomes of this grant yet, but I look forward to reviewing and potentially making more grants in this area.
- Update: Conjecture kindly directed me towards this [LW · GW] retrospective and have informed me that some Japanese attendees of their conference are thinking of creating an alignment org.
SERI MATS program ($316,000): 8-week scholars program to pair promising alignment researchers with renowned mentors. (Originally evaluated by Asya Bergal)
- SERI MATS is a program that helps established AI safety researchers find mentees. The program has grown substantially since we first provided funding, and now supports 15 mentors, but at the time, the mentors were Alex Gray, Beth Barnes, Evan Hubinger, John Wentworth, Leo Gao, Mark Xu, and Stuart Armstrong. Mentors took part in the program in Berkeley in a shared office space.
- When SERI MATS was founded, there were very few opportunities for junior researchers to try out doing alignment research. Many opportunities were informal mentorship positions, sometimes set up through cold emails or after connecting at conferences. The program has generally received many more qualified applicants than they have places for, and the vast majority of fellows report a positive experience of the program. I also believe the program has substantially increased the number of alignment research mentorship positions available.
- I think that SERI MATS is performing a vital role in building the talent pipeline for alignment research. I am a bit confused about why more organisations don’t offer larger internship programs so that the mentors can run their programs ‘in-house’. My best guess is that MATS is much better than most organisations running small internship programs for the first time, particularly in supporting their fellows holistically (often providing accommodation and putting significant effort into the MATS fellows community). One downside of the program relative to an internship at an organisation is that there are fewer natural routes to enter a managed position, though many fellows have gone on to receive LTFF grants for independent projects or continued their mentorship under the same mentor.
Robert Long ($10,840): travel funding for participants in a workshop on the science of consciousness and current and near-term AI systems
Please note this grant has been approved but at the time of writing it has not been paid out.
- We funded Robert Long to run a workshop on the science of consciousness for current and near-term AI systems. Robert and his FHI colleague, Patrick Butlin, began the project on consciousness in near-term AI systems during their time at FHI, where they both worked in the digital minds research group. Since January of this year, Rob has been continuing the project while a philosophy fellow at CAIS. There are surprisingly few people investigating the consciousness of near-term AI systems, which I find pretty worrying given the rapid pace of progress in ML. I think that it’s plausible we end up creating many copies of AI systems and use them in ways that we’d consider immoral given enough reflection, in part due to ignorance about their preferences. The workshop aimed to produce a report applying current theories of consciousness (like integrated information theory and global workspace theory) to current ML systems.
- I think that Rob is an excellent fit for this kind of work; he is one of the few people working in this area and has written quite a lot about AI consciousness on his blog. He has a PhD in philosophy from NYU, where he was advised by David Chalmers, and has experience running workshops (e.g. in 2020, he ran a workshop on philosophy and large language models with Amanda Askell).
Jeffrey Ladish ($98,000): 6-month stipend & operational expenses to start a cybersecurity & alignment risk assessment org
Please note this grant has been approved but at the time of writing it has not been paid out.
- Jeffrey Ladish applied for funding to set up an organisation to do AI risk communications, with a focus on cybersecurity and alignment risks. His organisation, Palisade Research Inc., plans to conduct risk assessments and communicate those risks to the public, labs and the government. The theory of change is that communicating catastrophic risks to the public and key decision makers could increase political support for slowing down AI and other measures that might reduce AI risk. I am particularly excited about Jeffrey’s organisation demonstrating offensive AI cyber capabilities and other demos that help to communicate current risks from advanced AI systems.
- I am pretty excited about Jeffrey’s organisation. He has worked on information security in various organisations (including Anthropic), he seems well-networked amongst people working in think tanks and AI labs, and I like his public writing on AI risk. I am generally sceptical of people doing work related to policy without having first worked in lower-stakes positions in similar areas, but I thought that Jeffrey was orienting to the downsides very reasonably and doing the sensible things, like developing plans with more experienced policy professionals.
Grants evaluated by Matthew Gray
Leap Laboratories ($195,000): One year of seed funding for a new AI interpretability research organisation.
- Jessica Rumbelow applied for seed funding to set up an interpretability research organisation, which hopes to develop a model-agnostic interpretability engine.
- I’m excited about this grant primarily based on the strength of research work she did with Matthew Watkins during SERI-MATS [LW · GW], discovering anomalous tokens like SolidGoldMagikarp.
- I think trends in the AI development space suggest a need for model-agnostic methods.
- More broadly, I think this showcases one of the primary benefits of interpretability research: it’s grounded in a way that makes it easy to verify and replicate.
Daniel Kokotajlo ($10,000): Funding for a research retreat on a decision-theory/cause-prioritisation topic.
- We funded a research retreat run by Daniel Kokotajlo on Evidential Cooperation in Large Worlds. I think research retreats like this are both quite productive and quite cheap; we only have to pay for travel and housing costs, and the attendees are filtered on intrinsic interest in the topic.
Grants evaluated by Thomas Larsen
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklmanos ($167,480): Implementing and expanding on the research methods of the "Discovering Latent Knowledge" paper.
This team formed during SERI MATS and applied for funding to continue their SERI MATS project, which researches ways of checking for dishonesty in advanced AI systems.
My cruxes for this type of grant are:
(1) If done successfully, would this project help with alignment?
(2) How likely is this team to be successful?
My thoughts on (1):
This is meant to build upon Burns et al.'s Discovering Latent Knowledge paper (DLK), which finds a direction in activation space that is supposed to represent the 'truth' of a logical proposition.
I think that Eliciting Latent Knowledge [LW · GW] (ELK) is an important subproblem of alignment, and I think it can be directly applied to combat deceptive alignment. My independent impression is that this specific direction towards solving ELK is not very useful towards a full alignment solution, but that it may lead to slightly better monitoring. (In particular, I think even in a good outcome, this will only lead to an average case solution to ELK, meaning that when we explicitly train against this detector, it will fail.) I expect that AGI projects will be in a position where it's obvious that the systems they are building are capable and dangerous, and it will be apparent that instrumental incentives kick in for e.g. power-seeking and deception. I think that this technique might help us detect this danger, but given that we can't train against it, it doesn't let us actually fix the underlying problem. Thus, the lab will be in the difficult position of continuing on, or having to train against their detection system. I still think that incremental progress on detecting deception is good, because it can help push for a stop in capabilities growth rather than prematurely continuing to AGI.
My thoughts on (2):
They produced reasonable output during SERI MATS, including the beginning of a replication of the DLK paper. They weren't that specific in their grant application, but they wrote a number of ideas for ways to extend the paper in the LW post [LW · GW]. The two ideas that seem best to me are:
- Connecting DLK to mechanistic interpretability. This seems hard, but maybe tinkering around in activation space can be helpful.
- Creating a better confidence loss. In the original paper, only one statement was considered, and so the loss was coming from the constraint that P(q) + P(not q) = 1. They propose evaluating two propositions p & q, and getting more constraints from that.
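To make the single-statement constraint concrete, here is a minimal sketch of a loss of that shape. It is my reading of the objective described in the DLK paper, not the team's code: the function name `ccs_style_loss` and the inputs `p_pos` and `p_neg` are hypothetical, and I assume a probe's outputs for each statement and its negation are already available as plain floats. The two-proposition extension is not sketched, since the constraints it would add are not specified here.

```python
def ccs_style_loss(p_pos, p_neg):
    """Average unsupervised loss over statement/negation pairs.

    p_pos[i] is the probe's probability that statement i is true;
    p_neg[i] is its probability that the negation of statement i is true.
    Both are hypothetical inputs supplied by the caller.
    """
    if len(p_pos) != len(p_neg) or not p_pos:
        raise ValueError("need two non-empty lists of equal length")

    total = 0.0
    for a, b in zip(p_pos, p_neg):
        # Consistency: penalise violations of P(q) + P(not q) = 1.
        consistency = (a - (1.0 - b)) ** 2
        # Confidence: penalise the degenerate answer P(q) = P(not q) = 0.5
        # by pushing the smaller of the two probabilities towards 0.
        confidence = min(a, b) ** 2
        total += consistency + confidence
    return total / len(p_pos)


if __name__ == "__main__":
    # A confident, consistent probe incurs no loss.
    print(ccs_style_loss([1.0], [0.0]))
    # A consistent but uninformative probe is still penalised.
    print(ccs_style_loss([0.5], [0.5]))
```

The consistency term alone is minimised by always answering 0.5, which is why a second term is needed; extra constraints from pairs of propositions would presumably enter as further penalty terms of the same kind.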
These ideas don't seem amazing, but they seem like reasonable things to try. I expect that the majority of the benefit will come from staring at the model internals and the results of the techniques and then iterating. I hope that this process will churn out more and better ideas.
One reservation I have is that none of the applicants have an established research track record, though they have published several papers:
- Walter's Google Scholar Profile
This team did get strong references from Colin Burns and John Wentworth, which makes me a lot more excited about the project. All things considered, I'm excited about giving this team a chance to work on this project, and see how they are doing. I'm also generally enthusiastic about teams trying their hand at alignment research.
Joseph Bloom ($50,000): Funding AI alignment research into circuits in decision transformers.
Joseph applied for independent research funding to continue his research into decision transformer interpretability. I'm happy about Joseph's initial result [LW · GW], which found circuits in a decision transformer in a simple RL environment. I thought the applicant's write up was solid and gave me some updates on what cognitive machinery I expect to be induced by RL. In particular, I was excited about the preference directions in embedding space that they constructed. This seems like a useful initial step for retargeting the search [LW · GW], though more understanding of the circuits that are doing the optimization seems critical for this approach.
I think interpretability on RL models is pretty neglected and very relevant for safety.
According to a reference, the applicant was also in the top 3 ARENA participants, and was very motivated and agentic.
The counterfactual is that Joseph tries to get funding elsewhere, and if that fails, gets a research engineer job at an AI safety org (e.g. Redwood, Conjecture, Ought, etc). I encouraged this person to apply to the AI safety orgs, as I think that working at an org is generally more productive than independent research. These jobs are quite competitive, so it's likely that Joseph won't get hired by any of them, and in this case, it seems great to pay him to do independent alignment research.
Overall, I think that Joseph is a promising researcher, and is working on a useful direction, so I feel excited about supporting this.
Since receiving this grant, Joseph has received some more funding (here), and was mentioned in the Anthropic May Update.
Other grants we made during this period
Applicant Name | Grant Summary | Awarded Amount | Decision Date |
Thomas Woodside | Support to work on research projects relevant to AI alignment | $50,000 | January 2022 |
Anonymous | Support to study and gain a background in technical AI | $3,000 | January 2022 |
Charlie Steiner | Support for researching value learning | $50,000 | January 2022 |
Logan Smith | Support to create language model (LM) tools to aid alignment research through feedback and content generation | $40,000 | January 2022 |
Paul Colognese | Educational scholarship in AI safety | $13,000 | January 2022 |
Anonymous | AI governance PhD | $4,129 | January 2022 |
Ruth Grace Wong | Research paper about the history of philanthropy-driven national-scale movement-building strategy to inform how EA funders might go about building movements for good | $2,000 | February 2022 |
Stephen Grugett, James Grugett, Austin Chen | Support to build a forecasting platform based on user-created play-money prediction markets | $200,000 | February 2022 |
Marius Hobbhahn | Research on AI safety | $30,103 | February 2022 |
JJ Hepburn | Health coaching to optimise the health and wellbeing, and thus capacity/productivity, of those working on AI safety | $80,000 | February 2022 |
Vael Gates | Support for a study on AI researchers’ perceptions of safety | $9,900 | February 2022 |
William Bradshaw | Support to work on biosecurity | $11,400 | February 2022 |
Michael Parker | Catalogue the history of U.S. high-consequence pathogen regulations, evaluate their performance, and chart a way forward | $34,500 | February 2022 |
Stuart Armstrong | Support for setting up a research company in AI alignment | $33,762 | February 2022 |
Anonymous | AI safety field-building | $32,568 | February 2022 |
Anonymous | Travel funds to attend a conference and network with the community at an EA hub | $1,600 | February 2022 |
Timothy Underwood | Write a SF/F novel based on the EA community | $15,000 | February 2022 |
Simon Grimm | Financial support for work on a biosecurity research project and workshop, and travel expenses | $15,000 | February 2022 |
Anonymous | Scholarship/teaching buy-out to finish Master's thesis and commence AI safety research | $10,800 | February 2022 |
Oliver Zhang | Running an alignment theory mentorship program with Evan Hubinger | $3,600 | February 2022 |
Anonymous | A large conference hosting communities working on improving the long-term future | $250,000 | February 2022 |
Anonymous | Recording written materials that are useful for people working on AI governance | $5,100 | March 2022 |
Gavin Leech | Researching and documenting longtermist lessons from COVID | $5,625 | March 2022 |
Anonymous | Support to work on a safe exploration project with an AI research organization | $33,000 | March 2022 |
Anonymous | Support to work on a technical AI safety research project in an academic lab | $45,000 | March 2022 |
Jessica Cooper | Funding to trial a new London organisation aiming to 10x the number of AI safety researchers | $234,121 | March 2022 |
Aaron Bergman | Research on EA and longtermism | $70,000 | March 2022 |
Anonymous | Funding a visit to the Sculpting Evolution group for collaboration | $4,000 | March 2022 |
Jan-Willem van Putten | EU Tech Policy Fellowship with ~10 trainees | $68,750 | March 2022 |
Anonymous | 3-month funding to do an internship to develop career capital in policy advocacy | $12,600 | March 2022 |
Anonymous | Support for equipment for AI Safety and Metascience research | $1,905 | March 2022 |
Darryl Wright | 1-year research stipend (and travel and equipment expenses) for support for work on 2 AI safety projects: 1) Penalising neural networks for learning polysemantic neurons; and 2) Crowdsourcing from volunteers for alignment research. | $150,000 | March 2022 |
Anonymous | Support for travel and equipment expenses for EA work on AI alignment | $5,000 | March 2022 |
Tomáš Gavenčiak | Organise the third Human-Aligned AI Summer School, a 4-day summer school for 150 participants in Prague, summer 2022 | $110,000 | March 2022 |
Anonymous | Independent alignment research at the intersection of computational cognitive neuroscience and AGI safety | $55,000 | March 2022 |
Kai Sandbrink | Starting funds for a DPhil project in AI that addresses safety concerns in ML algorithms and positions | $3,950 | April 2022 |
Maximilian Kaufmann | Support to work on technical AI alignment research | $7,000 | April 2022 |
Anonymous | PhD in Safe and Trusted AI with a focus on inductive biases towards the interpretability of neural networks | $63,259 | April 2022 |
Chloe Lee | Support to study emerging policies in biosecurity for better understanding and global response coordination | $25,000 | April 2022 |
Jack Ryan | Support for alignment theory agenda evaluation | $25,000 | April 2022 |
Isabel Johnson | Support to research, write, and publish a book: a survey on the unknown dangers of a contemporary nuclear strike | $5,000 | April 2022 |
Nicholas Greig | Neural network interpretability research | $12,990 | April 2022 |
Centre for the Governance of AI | GovAI salaries and overheads for new academic AI governance group | $401,537 | April 2022 |
Daniel Skeffington | Research and a report/paper on the role of emergency powers in the governance of X-Risk | $26,000 | April 2022 |
Noga Aharony | Support for PhD developing computational techniques for novel pathogen detection | $20,000 | April 2022 |
Tim Farrelly | Equipment for AI Safety research | $3,900 | April 2022 |
Sasha Cooper | 6 months funding for supervised research on the probability of humanity becoming interstellar given non-existential catastrophe | $36,000 | April 2022 |
Kevin Wang | Support to work on Aisafety.camp project, impact of human dogmatism on training | $2,000 | April 2022 |
Philipp Bongartz | Enabling prosaic alignment research with a multi-modal model on natural language and chess | $25,000 | May 2022 |
Ross Graham | Stipend and research fees for completing dissertation research on public ethical attitudes towards x-risk | $60,000 | May 2022 |
Nikiforos Pittaras | Support and compute expenses for technical AI Safety research on penalising RL agent betrayal | $14,300 | May 2022 |
Josiah Lopez-Wild | Funding a new computer for AI alignment work, specifically a summer PIBBSS fellowship and ML coding | $2,500 | May 2022 |
Theo Knopfer | Support to explore biosecurity policy projects: BWC/ European early detection systems/Deep Vision risk mitigation | $27,800 | May 2022 |
Jan Kirchner | Support for working on "Language Models as Tools for Alignment" in the context of the AI Safety Camp. | $10,000 | May 2022 |
Lucius Bushnaq, Callum McDougall, Avery Griffin | Support to investigate the origins of modularity in neural networks | $125,000 | May 2022 |
Anonymous | Admissions fee for MPA in International Development at a top university | $800 | May 2022 |
Anonymous | Support for research on international standards for AI | $5,250 | May 2022 |
Rory Gillis | Research project designed to map and offer preliminary assessment of AI ideal governance research | $2,000 | May 2022 |
John Bridge | Research into the international viability of FHI's Windfall Clause | $3,000 | May 2022 |
CHERI / Naomi Nederlof | Stipends for students of 2022 CHERI’s summer residence | $134,532 | May 2022 |
Wyatt Tessari | Support to connect, expand and enable the AGI safety community in Canada | $87,000 | May 2022 |
Ondrej Bajgar | Support funding during 2 years of an AI safety PhD at Oxford | $11,579 | May 2022 |
Neil Crawford | Support gatherings during 12 months period for discussion of AI safety | $10,000 | May 2022 |
Anonymous | Support to do AI alignment research on Truthful/Honest AI | $120,000 | May 2022 |
Logan Strohl | Support to further develop a branch of rationality focused on patient and direct observation | $80,000 | May 2022 |
Anonymous | Support for courses and research on AI | $4,000 | May 2022 |
Anonymous | Support to explore the concept of normative risk and its potential practical consequences | $20,000 | May 2022 |
Philippe Rivet | Support for research into applied technical AI alignment work | $10,000 | May 2022 |
Anonymous | Support to extend Udacity Deep Reinforcement Learning Nanodegree | $1,400 | May 2022 |
Cindy Wu | ML security/safety summer research project: model backdooring through pre-processing | $5,000 | May 2022 |
Marius Hobbhahn | Support for Marius Hobbhahn for piloting a program that approaches and nudges promising people to get into AI safety faster | $50,000 | May 2022 |
Conor McGlynn | Up-skill for AI governance work before starting Science and Technology Policy PhD at Harvard | $17,220 | June 2022 |
Anonymous | Support to hire a shared PA for researchers working at two organisations contributing to AI safety and governance | $78,000 | June 2022 |
Nora Ammann | Support the PIBBSS fellowship with more fellows than originally anticipated and to realize a local residency | $180,200 | June 2022 |
Julia Karbing | Paid internships for promising Oxford students to try out supervised AI Safety research projects | $60,000 | June 2022 |
Anonymous | Support to take Sec401 course from SANS for cyber security professionals | $8,589 | June 2022 |
Anonymous | Funding for 1-year executive and research assistance to support 2 researchers working in the longtermist space | $84,000 | June 2022 |
Francis Rhys Ward | Funding to support PhD in AI Safety at Imperial College London, technical research and community building | $6,350 | June 2022 |
Peter Barnett | Equipment for technical AI safety research | $4,099 | June 2022 |
Jacques Thibodeau | 3-month research stipend to continue working on AISC project to build a dataset for alignment and a tool to accelerate alignment | $22,000 | June 2022 |
Chris Patrick | Stipend to produce a guide about AI safety researchers and their recent work, targeted to interested laypeople | $5,000 | June 2022 |
Anonymous | Software engineering to revise and resubmit a multi-objective reinforcement learning paper | $26,000 | June 2022 |
Anonymous | PhD/research stipend for work on key longtermist area | $30,000 | June 2022 |
Jay Bailey | Support for Jay Bailey for work in ML for AI Safety | $79,120 | June 2022 |
Thomas Kehrenberg | 6-month research stipend for AI alignment research | $15,000 | June 2022 |
Solomon Sia | Support to lobby the CFTC and legalise prediction markets | $138,000 | June 2022 |
Bálint Pataki | Support AI Policy studies in the ML Safety Scholars program and at Oxford | $3,640 | June 2022 |
Jade Zaslavsky | 12-month research stipend to work on ML models for detecting genetic engineering in pathogens | $85,000 | June 2022 |
Gergely Szucs | 6-month research stipend to develop an overview of the current state of AI alignment research, and begin contributing | $70,000 | June 2022 |
Anonymous | AI safety PhD funding | $7,875 | June 2022 |
Conor Barnes | Website visualising x-risk as a tree of branching futures per Metaculus predictions | $3,500 | June 2022 |
Jonas Hallgren | 3-month research stipend to set up a distillation course helping new AI safety theory researchers to distil papers | $14,600 | June 2022 |
Victor Warlop | SERI MATS aims at scaling the number of alignment theorists by pairing promising applicants with renowned mentors | $316,000 | June 2022 |
Patrick Gruban | Weekend organised as a part of the co-founder matching process of a group to found a human data collection org | $2,300 | June 2022 |
Victor Warlop Piers de Raveschoot | Retroactive grant for managing the MATS program, 1.0 and 2.0 | $27,000 | June 2022 |
Anonymous | A 1-year research stipend for up-skilling in technical and general AI alignment to prepare for an impactful job in the field | $110,000 | June 2022 |
Anonymous | 7-month research stipend to do independent AI Safety research on interpretability and upskill in ML engineering | $43,600 | June 2022 |
Mario Peng Lee | Stanford Artificial Intelligence Professional Program Tuition | $4,785 | July 2022 |
Conor Sullivan | Develop and market video game to explain the Stop Button Problem to the public & STEM people | $100,000 | July 2022 |
Quinn Dougherty | Short meatspace workshop to hone, criticize, and evaluate hazardousness of a new research programme in alignment | $9,000 | July 2022 |
Viktoria Malyasova | Stipend for up-skilling in infrabayesianism prior to start of SERI MATS program | $4,400 | July 2022 |
Samuel Nellessen | 6-month budget to self-study ML and the possible applications of a Neuro/CogScience perspective for AGI Safety | $4,524 | July 2022 |
Charles Whittaker | Support for academic research projects relating to pandemic preparedness and biosecurity | $8,150 | July 2022 |
Amrita A. Nair | 3-month funding for upskilling in technical AI Safety to test personal fit and potentially move to a career in alignment | $1,000 | July 2022 |
Jeffrey Ohl | Tuition to take one Harvard economics course in fall 2022 to be a more competitive econ graduate school applicant | $6,557 | July 2022 |
Anonymous | Funding to take an online course on public policy to help the applicant transition from Machine Learning to AI-Governance | $2,732 | July 2022 |
Samuel Brown | 6-month research stipend to research AI alignment, specifically the interaction between goal-inference and choice-maximisation | $47,074 | July 2022 |
Anonymous | Support for multiple ML projects to build up skills for AI safety PhD | $1,100 | July 2022 |
Anonymous | 25-month grant funding EA-relevant dissertation that contributes to improved research on rate-limiting steps and constraints in AI research. | $139,000 | July 2022 |
Kyle Scott | A research and networking event for winners of the Eliciting Latent Knowledge contest to encourage collaboration on aligning future machine learning systems with human interests | $72,000 | July 2022 |
Derek Shiller | Support for an academic project evaluating factors relevant to digital consciousness with the aim of better understanding how and how not to create conscious artificial intelligences. | $11,000 | July 2022 |
Anonymous | Funds to attend cybersecurity conferences - defcon.org and blackhat.com | $5,550 | July 2022 |
Max Clarke | Financial support for career exploration and related project in AI alignment | $26,077 | August 2022 |
Anonymous | 2-month research stipend to build skills, and broaden action space for EA related projects to undertake in gap year | $15,320 | August 2022 |
Jonathan Ng | Funding support for MLSS scholar to up-skill in ML for alignment, documenting key learnings, and visit Berkeley in pursuit of a career in technical AI safety. | $16,000 | August 2022 |
Hamza Tariq Chaudhry | Equipment expenses for summer research fellowship at CERI and organising the virtual Future of Humanity Summit | $2,500 | August 2022 |
Anonymous | Research project on strategies to mitigate x-risk in Party Politics | $3,000 | August 2022 |
Anonymous | Funding for administrative support to the CEO for a large team working on research of interest to the longtermist community | $50,847 | August 2022 |
Simon Skade | Funding for 3 months’ independent study to gain a deeper understanding of the alignment problem, publishing key learnings and progress towards finding new insights. | $35,625 | August 2022 |
Ardysatrio Haroen | Support participation in MLSS program working on AI alignment. | $745 | August 2022 |
Antonio Franca | Equipment stipend for MLSS scholar to do research in AI technical research | $2,000 | August 2022 |
Darren McKee | Support for a non-fiction book on threat of AGI for a general audience | $50,000 | August 2022 |
Steve Petersen | Research stipend to work on the foundational issue of *agency* for AI safety | $20,815.20 | August 2022 |
Ross Nordby | Support for AI safety research and concrete research projects | $62,500 | August 2022 |
Leah Pierson | 300-hour research stipend for a research assistant to help implement a survey of 2,250 American bioethicists to lead to more informed discussions about bioethics. | $4,500 | August 2022 |
Luca De Leo | 12-month research stipend to study and get into AI Safety Research and work on related EA projects | $14,000 | August 2022 |
Anonymous | Two months of independent study in alignment to start my career as an alignment researcher | $8,333 | August 2022 |
Robi Rahman | Support for part-time rationality community building | $4,000 | August 2022 |
Lennart Justen | Funding to increase my impact as an early-career biosecurity researcher | $6,000 | September 2022 |
Fabienne Sandkühler | Funding for research on the effect of creatine on cognition | $4,000 | September 2022 |
Chris Leong | Funding for the AI Safety Nudge Competition | $5,200 | September 2022 |
Brian Porter | Independent research and upskilling for one year, to transition from academic philosophy to AI alignment research | $60,000 | September 2022 |
John Wentworth | 1-year research stipend for research in applications of natural abstraction | $180,000 | September 2022 |
Anonymous | 6 month research stipend for SERI MATS scholar to continue working on Alignment and ML Interpretability | $48,000 | September 2022 |
Nicky Pochinkov | 6-month research stipend for SERI MATS scholar to continue working on theoretical AI alignment research, trying to better understand how ML models work to reduce X-risk from future AGI | $50,000 | September 2022 |
David Hahnemann, Luan Ademi | 6-month research stipend for 2 people working on modularity, a subproblem of Selection Theorems and budget for computation | $26,342 | September 2022 |
Dan Valentine | 12-month research stipend to transition career into technical alignment research | $25,000 | September 2022 |
Anonymous | 3-month funding to explore GCBR-focused biosecurity projects after having finished my virology PhD | $25,000 | September 2022 |
Logan Smith | 6-month research stipend for continued work on shard theory: studying how inner values are formed by outer reward schedules | $40,000 | September 2022 |
Gunnar Zarncke | One year grant for a project to reverse-engineer human social instincts by implementing Steven Byrnes' brain-like AGI | $16,600 | September 2022 |
Zach Peck | Supporting participation at the Center for Applied Rationality (CFAR) workshop | $1,800 | September 2022 |
Anonymous | AI master's thesis and research in longtermism | $30,000 | September 2022 |
Anonymous | Upskilling in technical AI Safety Research to contribute to the field through an engineering or research role | $33,000 | September 2022 |
Adam Rutkowski | Piloting an EA hardware lab for prototyping hardware relevant to longtermist priorities | $44,000 | September 2022 |
Anonymous | Setting up experiments with LLM to examine Strategic Instrumental Behavior in real-life setting | $50,000 | September 2022 |
Egor Zverev | PhD program support | $6,500 | September 2022 |
Anonymous | 1-year research stipend to work on alignment research full time | $80,000 | September 2022 |
Shavindra Jayasekera | Research in machine learning and computational statistics | $38,101 | October 2022 |
Hoagy Cunningham | 6-month stipend for research into preventing steganography in interpretable representations using multiple agents | $20,000 | October 2022 |
Joel Becker | 5-month research stipend to support civilizational resilience projects arising from SHELTER Weekend | $27,248 | October 2022 |
Jonas Hallgren | 4 month research stipend to set up AI safety groups at 2 groups covering 3 universities in Sweden with eventual retreat | $10,000 | October 2022 |
Anonymous | 4 month research stipend in technical safety, ML, and AI chip supply chains before participating in an AI governance program | $11,500 | October 2022 |
Anonymous | 8-month research stipend to do research in AI safety | $35,000 | October 2022 |
Anonymous | 3-month research stipend in technical AI safety | $9,750 | October 2022 |
David Udell | One-year full-time research stipend to work on alignment distillation and conceptual research with Team Shard after SERI MATS | $100,000 | October 2022 |
John Burden | Funding 2 years of technical AI safety research to understand and mitigate risk from large foundation models | $209,501 | October 2022 |
Anonymous | AI safety research | $1,500 | October 2022 |
Garrett Baker | 12-month research stipend to work on alignment research | $96,000 | October 2022 |
Magdalena Wache | 9-month part-time research stipend for AI safety, test fit for theoretical research | $62,040 | October 2022 |
Anshuman Radhakrishnan | 6-month stipend to continue upskilling in Machine Learning in order to contribute to Prosaic AI Alignment Research | $55,000 | October 2022 |
Theo Knopfer | Travel Support to BWC RevCon & Side Events | $3,500 | October 2022 |
Daniel Herrmann | Support for PhD on embedded agency, to free up my time from teaching | $64,000 | October 2022 |
Jeremy Gillen | 6-month research stipend to work on the research I started during SERI MATS, solving alignment problems in model based RL | $40,000 | October 2022 |
Anonymous | 3.5 months’ support for ML engineering skill-up | $8,720 | October 2022 |
Edward Saperia | One year of funding to improve an established community hub for EA in London | $50,000 | November 2022 |
Chu Chen | 1-year research stipend for upskilling in technical AI alignment research | $96,000 | November 2022 |
Anonymous | 12-month stipend to research assumptions underlying most existing work on AI alignment and AI forecasting | $7,645 | November 2022 |
Kajetan Janiak | Support for AI safety research. | $4,000 | November 2022 |
Felix Hofstätter | 6-month research stipend for an AI alignment research project on the manipulation of humans by AI | $25,383 | November 2022 |
Maximilian Kaufmann | 4 month research stipend to support an early-career alignment researcher, who is taking a year to pursue research and test fit | $20,000 | November 2022 |
Will Aldred | 6-month research stipend to: 1) Carry out independent research into risks from nuclear weapons, 2) Upskill in AI strategy | $40,250 | November 2022 |
Benjamin Anderson | Support to conduct work in AI safety | $5,000 | November 2022 |
Arun Jose | 4-month funding for Arun Jose's independent alignment research and study | $15,478 | November 2022 |
Anonymous | Professional development grant for independent upskilling in AGI Safety | $3,600 | November 2022 |
Matthias Georg Mayer | 6-months research stipend for upskilling and researching “Framing computational systems such that we can find meaningful concepts." | $24,000 | November 2022 |
Johannes C. Mayer | 6 months research stipend. Turn intuitions, like goals, wanting, abilities, into concepts applicable to computational systems | $24,000 | November 2022 |
Anonymous | Funding for MSc Thesis on Language Models Safety | $28,160 | November 2022 |
Paul Bricman | 1-year stipend and compute for conducting a research project focused on AI safety via debate in the context of LLMs | $50,182 | November 2022 |
Simon Möller | 6-months research stipend to transition into technical AI Safety work by working through Jacob Hilton’s curriculum and a project | $65,000 | November 2022 |
Anonymous | Fall semester stipend to work on AI Safety research, in particular adversarial robustness, monitoring, and trojaning | $7,500 | November 2022 |
Alan Chan | 4-month research stipend for a research visit with David Krueger on evaluating non-myopia in language models and RLHF systems | $12,321 | November 2022 |
Tomislav Kurtovic | 3-month research stipend to skill up in ML and Alignment with goal of developing a streamlined course in Math/AI | $5,500 | November 2022 |
Kadri Reis | Support to participate in Biological Weapons Convention in Geneva | $1,500 | November 2022 |
Skyler Crossman | Twelve month funding for global rationality organization development | $130,000 | December 2022 |
Daniel O'Connell | Investigate AI alignment options | $54,250 | December 2022 |
Remmelt Ellen | Cover participant stipends for AI Safety Camp Virtual 2023 | $72,500 | December 2022 |
Josiah Lopez-Wild | Scholarship for PhD student working on research related to AI Safety | $8,000 | January 2023 |
Zhengbo Xiang (Alana) | Support for 18 months of independent alignment research and upskilling, focusing on developing a research agenda on corrigibility | $30,000 | January 2023 |
Daniel Filan | Funding to make 12 more AXRP episodes, the AI X-risk Research Podcast. | $23,544 | January 2023 |
Sam Marks | 3-week research stipend for three people to review AI alignment agendas | $26,000 | January 2023 |
Robert Kirk | Funding to perform human evaluations for evaluating different machine learning methods for aligning language models | $10,000 | January 2023 |
Jérémy Perret | Support for AI alignment outreach in France (video/audio/text/events) & field-building | $24,800 | January 2023 |
Peter Ruschhaupt | 3 months support for exploring career options in AI governance - upskilling, networking and writing articles summarising present AI governance work and ideas. | $20,000 | January 2023 |
Charlie Griffin | 8 months research stipend for alignment work: assisting academics, skilling up and personal research. | $35,000 | January 2023 |
Alexander Lintz | 6 months research stipend for independent work centred on distillation and coordination in the AI governance & strategy space | $69,940 | January 2023 |
Anonymous | Living cost stipend top up while working on long-term future relevant research at a think tank | $15,000 | January 2023 |
Francis Rhys Ward | Support for PhD in AI safety - technical research and community building work | $2,305 | January 2023 |
Lucius Bushnaq | 6-month research stipend for two people to find formalisms for modularity in neural networks | $72,560 | January 2023 |
David Quarel | Support for a project with the Cambridge AI Safety group. The group will be working on projects related to AI alignment, in particular, setting up experimental demonstrations of deceptive alignment. | $5,613 | January 2023 |
Tim Farkas | Funding to run a 2-3 day retreat for 20-30 people, bringing together key EA thinkers/actors of the mind enhancement cause area | $2,540 | February 2023 |
Wyatt Tessari | 3-month stipend to connect, expand and enable the AGI gov/safety community in Canada | $17,000 | February 2023 |
Anonymous | 14-month research stipend and research costs for 3 research reports on best risk communication practices for longtermist orgs | $96,000 | February 2023 |
Daniel Kokotajlo | Funding for research retreat on a decision-theory / cause-prioritisation topic. | $10,000 | February 2023 |
Alex Altair | Funding for research stipend to develop a framework of optimisation. | $8,000 | February 2023 |
Max Lamparth | Funding for technical AI safety research - using interpretability methods on large language models for AI safety. | $2,500 | February 2023 |
Liam Carroll | 6-week research stipend to publish a series of blogposts synthesising Singular Learning Theory for a computer science audience | $8,000 | February 2023 |
Amrita A. Nair | 3-month scholarship to support Amrita Nair's upskilling in AI Safety working on Evan Hubinger's Reward Side-Channels experiment proposal. | $5,000 | February 2023 |
Gerold Csendes | Funding for project transitioning from AI capabilities to AI Safety research. | $8,200 | February 2023 |
Anonymous | Career transition including but not limited to exploring helping set up an x-risk research institute and working on a research project on AI ethics boards | $30,000 | February 2023 |
Tamsin Leake | 6 months research stipend to do independent AI alignment research focused on formal alignment and agent foundations | $30,000 | February 2023 |
Chris Scammell, Andrea Miotti, Katrina Joslin | A 2-day workshop to connect alignment researchers from the US, UK, and AI researchers and entrepreneurs from Japan | $72,827 | February 2023 |
Joseph Bloom | 6-month research stipend to conduct AI alignment research on circuits in decision transformers | $50,000 | February 2023 |
Carson Jones | 1 year research stipend (or less) to help alignment researchers improve their research ability via 1-on-1 conversations | $10,000 | February 2023 |
Andrei Alexandru | Fine-tuning large language models for an interpretability challenge (compute costs) | $11,300 | February 2023 |
Anonymous | Two month research stipend and bridge funding to complete an AI governance report and produce a related article | $11,560 | February 2023 |
Jacob Mendel | General support to spend 1 month working with Will Bradshaw's team at the Nucleic Acid Observatory producing reports on the merits of alternative sample choices to wastewater for metagenomic sequencing. | $4,910 | February 2023 |
Max Räuker | Funding for Max Räuker's part-time research stipend for a trial and developer costs to maintain and improve the AI governance document sharing hub | $15,000 | March 2023 |
Anonymous | A twelve month research stipend to pursue independent writing on the sociology and philosophy of longtermist effective altruism | $75,346 | March 2023 |
Anonymous | 3-4 month stipend for AI safety upskilling and research | $7,000 | March 2023 |
Fabian Schimpf | 6-month research stipend for AI alignment research and independent research on limits of predictability | $28,875 | March 2023 |
Anonymous | Support for PhD student pursuing research areas that intersect economics and EA | $4,528 | March 2023 |
Kane Nicholson | 6-month research stipend for AI safety upskilling and research projects | $26,150 | March 2023 |
David Lindner | Support for David Lindner and Jeremy Scheurer to participate in Redwood Research's REMIX program on mechanistic interpretability using their new causal scrubbing methodology | $4,300 | March 2023 |
Jessica Rumbelow | One year of seed funding for a new AI interpretability research organisation | $195,000 | March 2023 |
Alexander Large | 1 month general support for projects for small EA-aligned charities. | $3,618 | March 2023 |
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklamanos | 6-month research stipend for Georgios Kaklamanos, Walter Laurito, Kaarel Hänni and Kay Kozaronek to continue their SERI-MATS project on expanding the "Discovering Latent Knowledge" paper | $167,480 | March 2023 |
Matt MacDermott | 3 month research stipend for SERI MATS extension on agent foundations research | $24,000 | March 2023 |
Max Kaufmann | 9 months of funding for an early-career alignment researcher to work with Owain Evans and others | $45,000 | March 2023 |
Anonymous | 40 hours of research stipend for researchers to finish a paper on governing AI via compute | $1,200 | March 2023 |
Robert Miles | Funding for additional fellows for the AISafety.info Distillation Fellowship, improving our single-point-of-access to AI safety | $54,962 | March 2023 |
Alexander Turner | Funding Alexander Turner and team research project - Writing new motivations into a policy network by understanding and controlling its internal decision-influences | $115,411 | March 2023 |
Anonymous | 3-month stipend for upskilling in ML to transition from mathematics (at PhD level) to AI safety work. During the grant period, project goals include replicating an interpretability paper, with the longer-term goal of publishing project write-ups. | $5,300 | March 2023 |
Anonymous | 3-4 month salary to help set up a new division at a US think tank doing AI governance research | $26,800 | March 2023 |
Anonymous | 2-month living expenses while waiting to join a US think tank | $12,000 | March 2023 |
Andrey Tumas | 4-month research stipend for conceptual/theoretical research towards perfect world-model interpretability. | $30,000 | March 2023 |
Nora Ammann | Funding for PIBBSS research fellowship to host 6 additional fellows | $100,000 | March 2023 |
David Staley | Support to maintain a copy of the alignment research dataset etc in the Arctic World Archive for 5 years | $3,000 | March 2023 |
Wesley Fenza | One-year funding of Astral Codex Ten meetup in Philadelphia | $5,000 | March 2023 |
Matthew MacInnes | 8 months support to test fit for social scientific research related to AI governance, preparing for MPhil proposal. | $9,000 | March 2023 |
Anonymous | 3 months of funding for upskilling in AI Safety and research on hardware-enabled mechanisms for AI Governance. | $48,000 | March 2023 |
Anonymous | Support for PhD Track in Health and Security at a top US university | $9,800 | March 2023 |
Nicholas Kees Dupuis | 12-month research stipend to continue developing research agenda on new ways to make LLMs directly useful for alignment research without advancing capabilities | $120,000 | March 2023 |
Anonymous | Scholarship for taking the Offsec Certified Professional (OSCP) certification - the industry leading Penetration Testing with Kali Linux course and online lab before taking the OSCP certification exam. | $2,000 | March 2023 |
Jingyi Wang | Organising OPTIC: in-person, intercollegiate forecasting tournament. Boston, Apr 22. Funding is for prizes, venue, etc. | $2,100 | March 2023 |
Rusheb Shah | 6 months research stipend to upskill on technical AI safety through collaboration with researchers and self-study. | $50,000 | March 2023 |
Alfred Harwood | 6-month research stipend to research geometric rationality, ergodicity economics and their applications to decision theory and AI | $11,000 | April 2023 |
Alexander Turner | Year-long research stipend for shard theory and RL mech int research | $220,000 | April 2023 |
Said Achmiz | 1 year support for developing and maintaining projects/resources used by the EA and rationality communities | $60,000 | April 2023 |
Skyler Crossman | Support Astral Codex Ten Everywhere meetups | $22,000 | April 2023 |
Vanessa Kosoy | 2-year research stipend for work on the learning-theoretic AI alignment research agenda | $100,000 | April 2023 |
Robert Long | Support participants in a workshop on the science of consciousness and current and near-term AI systems | $10,840 | April 2023 |
Mateusz Bagiński | 6-month research stipend to upskill in maths, ML and AI alignment, and to work on non-profit projects beneficial for AI safety, in pursuit of a research career. | $14,136 | April 2023 |
Quentin Feuillade--Montixi | Funding for Quentin Feuillade-Montixi's 4 month SERI MATS extension in London, mentored by Janus and Nicholas Kees Dupuis to work on cyborgism | $32,000 | April 2023 |
Anonymous | 3 month research stipend for independent research into and articles on large language models, agent foundations, and AI alignment | $14,019 | April 2023 |
Smitha Milli | Support to participate in the Symposium on AGI Safety at Oxford | $1,500 | April 2023 |
Anonymous | 6-month research stipend and course funding to upskill in AI safety before entering the Civil Service Fast Stream in September 2023 (Data & Tech) | $14,488 | April 2023 |
Anonymous | Support for independent projects & upskilling for AI safety work | $18,000 | April 2023 |
Sage Bergerson | 5-month part time research stipend for collaborating on a research paper analysing the implications of compute access | $2,500 | April 2023 |
Iván Godoy | 6-month research stipend to dedicate full-time to upskilling/AI alignment research tentatively focused on agent foundations and start a MIRIx group in Buenos Aires. | $6,000 | April 2023 |
Naoya Okamoto | Support for Mathematics of Machine Learning course offered by the University of Illinois at Urbana-Champaign. | $7,500 | April 2023 |
Joshua Reiners | 4-month research stipend to work on a project finding the most interpretable directions in gpt2-small's early residual stream to better understand contemporary AI systems | $16,300 | April 2023 |
Appendix: How we set grant and stipend amounts
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Over the last year, we have directed a significant portion of our grants toward supporting individuals in the field of AI safety research. When compared to much of the non-profit sector, some of our grants may seem large. However, I believe there are strong justifications for this approach.
Our grantees often have excellent earning potential
Our grantees often exhibit extraordinary earning potential due to their skills and qualifications. Many of them are excellent researchers (or have the potential to become excellent researchers within a few years) and could easily take jobs in big tech or finance, and some could command high salaries (over $400k/year) while conducting similar research at AI labs. I expect that offering lower grants would push some grantees to take higher-earning options in private industry, creating less altruistic value. My impression is that our grants are not larger than comparable grants or salaries offered by many established AI safety organizations. In fact, I expect our grants are often lower.
Grants have substantive downsides relative to working in an organisation
Grants, while helpful, do have some drawbacks compared to conventional employment. We do not provide additional benefits often found in organizations, such as health insurance, office spaces, or operations support, and our stipends often offer less financial security than full-time employment. Often, a portion of a grant is designed to support grantees’ operational and living expenses while they pursue their research projects.
Generally, we expect our grantees to work full-time on their projects, with similar intensity to the work they’d do at other organizations within EA and AI safety, and we structure our grants to account for this amount of work. There are, of course, benefits too, such as grantees having more flexibility than they would in many organizations.
How we decide on personal stipend size
The fund operates as a collection of fund managers who sometimes have differing views on how much funding a grantee should receive.
Our general process is:
- The fund manager assigned to a grant reviews the budget provided by the grantee and makes adjustments based on their understanding of the grant, the market rate for similar work and other factors.
- The grant size is then reviewed by the fund chair (Asya Bergal) and the director of EA Funds (Caleb Parikh).
One heuristic we commonly use (especially for new, unproven grantees) is to offer roughly 70% of what we anticipate the grantee would earn in an industry role. We want to compensate people fairly and allow them to transition to impactful work without making huge sacrifices, while conserving our funding and discouraging grifters. A relatively common procedure that fund managers use to decide how much to fund a grantee (assuming the fund manager has already decided the grantee is worth funding overall) is to:
- Calculate what we expect the grantee would earn for similar work in an industry role (in the location they’re planning on performing the grant activity).
- Look at the amount of funding the applicant has requested, and see if that amount differs significantly from 70% of their expected industry salary.
- If it doesn't differ significantly, make the grant with the requested number.
- If it does differ significantly, consider adjusting the grant upwards or downwards, taking into account other factors that would affect what an appropriate funding ask would be, e.g. their pre-existing track record. (We’re more likely to adjust a grant downwards if we think the requested amount is too high than upwards if we think the requested amount is too low.)
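As a rough illustration only, the comparison step in the procedure above could be sketched in Python as follows. The post does not define what counts as "differs significantly", so the 20% tolerance here is an assumption made purely for the example, and the names `check_stipend_request` and `StipendCheck` are hypothetical. The sketch only flags a request for further review; the adjustment itself depends on judgment calls (such as track record) that are not modelled here.

```python
from dataclasses import dataclass

# The 70% figure comes from the heuristic described in the text.
INDUSTRY_FRACTION = 0.70
# ASSUMPTION: the post does not say what "differs significantly" means;
# a 20% relative tolerance is used here purely for illustration.
TOLERANCE = 0.20


@dataclass
class StipendCheck:
    benchmark: float    # 70% of the estimated industry salary
    requested: float    # amount the applicant asked for
    needs_review: bool  # True if the request differs significantly from the benchmark
    direction: str      # "none", "consider lowering", or "consider raising"


def check_stipend_request(industry_salary: float, requested: float) -> StipendCheck:
    """Compare a requested stipend with 70% of an estimated industry salary.

    Both amounts must cover the same period (e.g. both annualised).
    """
    if industry_salary <= 0:
        raise ValueError("industry_salary must be positive")
    benchmark = INDUSTRY_FRACTION * industry_salary
    relative_gap = (requested - benchmark) / benchmark
    if abs(relative_gap) <= TOLERANCE:
        # Close enough to the benchmark: grant the requested amount as-is.
        return StipendCheck(benchmark, requested, False, "none")
    # Otherwise flag the request; a fund manager weighs other factors by hand.
    direction = "consider lowering" if relative_gap > 0 else "consider raising"
    return StipendCheck(benchmark, requested, True, direction)
```

For example, `check_stipend_request(100_000, 70_000)` would not be flagged, while a request of 120,000 against the same estimated salary would be flagged with the direction "consider lowering".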
Appendix: Eligibility criteria for LTFF grants
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Career Stage: Our interest lies in assisting grantees who are at the beginning of their careers, are contemplating a career shift towards an area of higher impact, or have accumulated several years of experience in their respective fields.
Demonstrated Skills: We require that prospective grantees exhibit evidence of possessing the skills necessary for the type of work or study they plan to undertake. This evidence could come from previous experiences, credentials, or a particularly remarkable application.
Generally, our grants fulfil one of the following additional criteria:
High-Impact Projects: The central aim of the Long-Term Future Fund is to improve humanity’s odds of a long and flourishing future. We assess proposed projects based on their potential to contribute to this goal. However, it is not mandatory for grantees to share this specific objective or to be entirely focused on improving the long-term future.
Empowering people pursuing impactful work: Grants related to career support (e.g. travel grants for conferences, scholarships for online courses, or funding to allow time for skill development) can enable grantees to increase their positive impact over the course of their careers. Grantees should demonstrate a strong interest in a priority area for the long-term future, such as biosecurity or mitigating risks from advanced AI. This could be evidenced by past experiences, credentials, or an application that shows familiarity with the field they intend to study.
Appendix: Special note on upskilling grants
(Our legal team requested that we include this section.)
One of LTFF’s overall charitable purposes is to encourage qualified and thoughtful individuals to think about and find solutions for global catastrophic risks, such as advanced artificial intelligence. We do this by funding such individuals to research issues like AI alignment so that they become more knowledgeable in and/or potentially change their career path to fully invest in these issues.
3 comments
comment by johnswentworth · 2023-08-02T19:59:00.549Z · LW(p) · GW(p)
One downside of [MATS] relative to an internship at an organisation is that there are fewer natural routes to enter a managed position...
I think you misspelled "upside".
(Also useful post, thankyou for publishing it.)
comment by Leon Lang (leon-lang) · 2023-08-02T09:41:43.743Z · LW(p) · GW(p)
This is very helpful, thanks! Actually, the post includes several sections, including in the appendix, that might be more interesting to many readers than the grant recommendations themselves. Maybe it would be good to change the title a bit so that people also expect other updates.
Replies from: MondSemmel↑ comment by MondSemmel · 2023-08-02T11:00:10.530Z · LW(p) · GW(p)
I also found parts of this post surprisingly interesting, given the ultra-dry title and intimidating reading time.
To present this kind of content in a way more readers could benefit from, another option would be to post it as a small sequence, so people could vote and comment on separate sections.