Could We Automate AI Alignment Research? 2023-08-10T12:17:05.194Z
An Overview of the AI Safety Funding Situation 2023-07-12T14:54:36.732Z
Retrospective on ‘GPT-4 Predictions’ After the Release of GPT-4 2023-03-17T18:34:17.178Z
GPT-4 Predictions 2023-02-17T23:20:24.696Z
Stephen McAleese's Shortform 2023-01-08T21:46:25.888Z
AGI as a Black Swan Event 2022-12-04T23:00:53.802Z
Estimating the Current and Future Number of AI Safety Researchers 2022-09-28T21:11:33.703Z
How Do AI Timelines Affect Existential Risk? 2022-08-29T16:57:44.107Z
Summary of "AGI Ruin: A List of Lethalities" 2022-06-10T22:35:48.500Z


Comment by Stephen McAleese (stephen-mcaleese) on There should be more AI safety orgs · 2023-09-22T12:12:34.949Z · LW · GW

Thanks for the post! I think it does a good job of describing key challenges in AI field-building and funding.

The talent gap section describes a lack of positions in industry organizations and independent research groups such as SERI MATS. However, there doesn't seem to be much content on the state of academic AI safety research groups. So I'd like to emphasize the current and potential importance of academia for doing AI safety research and absorbing talent. The 80,000 Hours AI risk page says that there are several academic groups working on AI safety including the Algorithmic Alignment Group at MIT, CHAI in Berkeley, the NYU Alignment Research Group, and David Krueger's group in Cambridge.

The AI field as a whole is already much larger than the AI safety field so I think analyzing the AI field is useful from a field-building perspective. For example, about 60,000 researchers attended AI conferences worldwide in 2022. There's an excellent report on the state of AI research called Measuring Trends in Artificial Intelligence. The report says that most AI publications come from the 'education' sector which is probably mostly universities. 75% of AI publications come from the education sector and the rest are published by non-profits, industry, and governments. Surprisingly, the top 9 institutions by annual AI publication count are all Chinese universities and MIT is in 10th place. Though the US and industry are still far ahead in 'significant' or state-of-the-art ML systems such as PaLM and GPT-4.

What about the demographics of AI conference attendees? At NeurIPS 2021, the top institutions by publication count were Google, Stanford, MIT, CMU, UC Berkeley, and Microsoft which shows that both industry and academia play a large role in publishing papers at AI conferences.

Another way to get an idea of where people work in the AI field is to find out where AI PhD students go after graduating in the US. The number of AI PhD students going to industry jobs has increased over the past several years and 65% of PhD students now go into industry but 28% still go into academic jobs.

Only a few academic groups seem to be working on AI safety, and many of them are at highly selective universities, but AI safety could become more popular in academia in the near future. And if the breakdown of contributions and demographics in AI safety comes to resemble that of AI in general, then we should expect academia to play a major role in AI safety in the future. Long-term AI safety may actually be more academic than AI since universities are the largest contributor to basic research whereas industry is the largest contributor to applied research.

So in addition to founding an industry org or facilitating independent research, another path to field-building is to increase the representation of AI safety in academia by founding a new research group though this path may only be tractable for professors.

Comment by Stephen McAleese (stephen-mcaleese) on AI romantic partners will harm society if they go unregulated · 2023-09-06T12:58:05.654Z · LW · GW

Thanks for the post. It's great that people are discussing some of the less-frequently discussed potential impacts of AI.

I think a good example to bring up here is video games which seem to have similar risks. 

When you think about it, video games seem just as compelling as AI romantic partners. Many video games such as Call of Duty, Civilization, or League of Legends involve achieving virtual goals, leveling up, and improving skills in a way that's often more fulfilling than real life. Realistic 3D video games have been widespread since the 2000s but I don't think they have negatively impacted society all that much. Though some articles claim that video games are having a significant negative effect on young men.

Personally, I spent quite a lot of time playing video games during my childhood and teenage years but I mostly stopped playing them once I went to college. But why replace an easy and fun way to achieve things with reality which is usually less rewarding and more frustrating? My answer is that achievements in reality are usually much more real, persistent, and valuable than achievements in video games. You can achieve a lot in video games but those achievements are unlikely to raise your status with as many people, or for as long, as achievements in real life can.

A relevant quote from the article I linked above:

"After a while I realized that becoming master of a fake world was not worth the dozens of hours a month it was costing me, and with profound regret I stashed my floppy disk of “Civilization” in a box and pushed it deep into my closet. I hope I never get addicted to anything like “Civilization” again."

Similarly, in the near term at least, AI romantic partners could be competitive with real relationships over short periods, but I doubt it will be possible to have AI relationships that are as fulfilling and realistic as a marriage that lasts several decades.

And as with video games, status will probably favour real relationships: people will value them because they offer more status than virtual ones. One possible reason is that status depends on scarcity. Just as being a real billionaire offers much more status than being a virtual one, having a real high-quality romantic partner will probably yield much more status than a virtual one and as a result, people will be motivated to have real partners.

Comment by Stephen McAleese (stephen-mcaleese) on Could We Automate AI Alignment Research? · 2023-08-28T21:58:14.049Z · LW · GW

Some related posts on automating alignment research I discovered recently:

Comment by Stephen McAleese (stephen-mcaleese) on Could We Automate AI Alignment Research? · 2023-08-28T21:42:27.552Z · LW · GW

I agree that the difficulty of the alignment problem can be thought of as a diagonal line on the 2D chart above as you described.

This model may make having two axes instead of one unnecessary. If capabilities and alignment scale together predictably, then high alignment difficulty is associated with high capabilities, and therefore the capabilities axis could be unnecessary.

But I think there's value in having two axes. Another way to think about your AI alignment difficulty scale is like a vertical line in the 2D chart: for a given level of AI capability (e.g. pivotal AGI), there is uncertainty about how hard it would be to align such an AGI because the gradient of the diagonal line intersecting the vertical line is uncertain.

Instead of a single diagonal line, I now think the 2D model describes alignment difficulty in terms of the gradient of the line. An optimistic scenario is one where AI capabilities are scaled and few additional alignment problems arise or existing alignment problems do not become more severe because more capable AIs naturally follow human instructions and learn complex values. A highly optimistic possibility is that increased capabilities and alignment are almost perfectly correlated and arbitrarily capable AIs are no more difficult to align than current systems. Easy worlds correspond to lines in the 2D chart with low gradients and low-gradient lines intersect the vertical line corresponding to the 1D scale at a low point.

A pessimistic scenario can be represented in the chart as a steep line where alignment problems rapidly crop up as capabilities are increased. For example, in such hard worlds, increased capabilities could make deception and self-preservation much more likely to arise in AIs. Problems like goal misgeneralization might persist or worsen even in highly capable systems. Therefore, in hard worlds, AI alignment difficulty increases rapidly with capabilities and increased capabilities do not have helpful side effects such as the formation of natural abstractions that could curtail the increasing difficulty of the AI alignment problem. In hard worlds, since AI capabilities gains cause a rapid increase in alignment difficulty, the only way to ensure that alignment research keeps up with the rapidly increasing difficulty of the alignment problem is to limit progress in AI capabilities.

Comment by Stephen McAleese (stephen-mcaleese) on Neuromorphic AI · 2023-08-21T17:32:48.243Z · LW · GW

What you're describing above sounds like an aligned AI and I agree that convergence to the best-possible values over time seems like something an aligned AI would do.

But I think you're mixing up intelligence and values. Sure, maybe an ASI would converge on useful concepts in a way similar to humans. For example, AlphaZero rediscovered some human chess concepts. But because of the orthogonality thesis, intelligence and goals are more or less independent: you can increase the intelligence of a system without its goals changing.

The classic thought experiment illustrating this is Bostrom's paperclip maximizer which continues to value only paperclips even when it becomes superintelligent.

Also, I don't think neuromorphic AI would reliably lead to an aligned AI. Maybe an exact whole-brain emulation of some benevolent human would be aligned but otherwise, a neuromorphic AI could have a wide variety of possible goals and most of them wouldn't be aligned.

I suggest reading The Superintelligent Will to understand these concepts better.

Comment by Stephen McAleese (stephen-mcaleese) on Try to solve the hard parts of the alignment problem · 2023-08-19T14:45:05.158Z · LW · GW

If you don’t know where you’re going, it’s not helpful enough not to go somewhere that’s definitely not where you want to end up; you have to differentiate paths towards the destination from all other paths, or you fail.

I'm not exactly sure what you meant here but I don't think this claim is true in the case of RLHF because, in RLHF, labelers only need to choose which option is better or worse between two possibilities, and these choices are then used to train the reward model. A binary feedback style was chosen specifically because it's usually too difficult for labelers to choose between multiple options.

A similar idea is comparison sorting where the algorithms only need the ability to compare two numbers at a time to sort a list of numbers.
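As a rough illustration of the comparison-sorting analogy, here is a minimal Python sketch. The response names, the `HIDDEN_QUALITY` scores, and the `labeler_prefers` function are all hypothetical stand-ins for a human labeler; the point is only that the sort never sees an absolute score, just the answer to "which of these two is better?". This is not how RLHF actually uses comparisons (it fits a reward model to them rather than sorting), so treat it as an analogy only.

```python
from functools import cmp_to_key

# Hypothetical "true" quality of each response. The sort below never reads
# these numbers directly; only the pairwise labeler does.
HIDDEN_QUALITY = {"response_a": 0.2, "response_b": 0.9, "response_c": 0.5}


def labeler_prefers(x, y):
    """Pairwise judgment: -1 if x is preferred, 1 if y is preferred, 0 if tied."""
    qx, qy = HIDDEN_QUALITY[x], HIDDEN_QUALITY[y]
    return (qx < qy) - (qx > qy)


def rank_by_pairwise_choice(items, compare=labeler_prefers):
    """Produce a full ranking (best first) using only two-way comparisons."""
    return sorted(items, key=cmp_to_key(compare))


ranking = rank_by_pairwise_choice(["response_a", "response_b", "response_c"])
print(ranking)
```

A full ordering comes out even though each individual judgment only distinguishes better from worse between two options.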

Comment by Stephen McAleese (stephen-mcaleese) on Could We Automate AI Alignment Research? · 2023-08-19T14:31:11.271Z · LW · GW

Thanks for the comment.

I think there's a possibility that there could be dangerous emergent dynamics from multiple interacting AIs but I'm not too worried about that problem because I don't think you can increase the capabilities of an AI much simply by running multiple copies of it. You can do more work this way but I don't think you can get qualitatively much better work.

OpenAI created GPT-4 by training a brand new model not by running multiple copies of GPT-3 together. Similarly, although human corporations can achieve more than a single person, I don't consider them to be superintelligent. I'd say GPT-4 is more capable and dangerous than 10 copies of GPT-3.

I think there's more evidence that emergent properties come from within the AI model itself and therefore I'm more worried about bigger models than problems that would occur from running many of them. If we could solve a task using multiple AIs rather than one highly capable AI, I think that would probably be safer and I think that's part of the idea behind iterated amplification and distillation.

There's value in running multiple AIs. For example, OpenAI used multiple AIs to summarize books recursively. But even if we don't run multiple AI models, I think a single AI running at high speed would also be highly valuable. For example, you can paste a long text into GPT-4 today and it will summarize it in less than a minute.

Comment by Stephen McAleese (stephen-mcaleese) on Against Almost Every Theory of Impact of Interpretability · 2023-08-18T10:25:44.565Z · LW · GW

In my opinion, much of the value of interpretability is not related to AI alignment but to AI capabilities evaluations instead.

For example, the Othello paper shows that a transformer trained to predict the next move in Othello game transcripts learns a world model of the board rather than just statistics of the training text. This knowledge is useful because it suggests that transformer language models are more capable than they might initially seem.

Comment by Stephen McAleese (stephen-mcaleese) on AGI is easier than robotaxis · 2023-08-13T20:22:28.861Z · LW · GW

I highly recommend this interview with Yann LeCun which describes his view on self-driving cars and AGI.

Basically, he thinks that self-driving cars are possible with today's AI but would require immense amounts of engineering (e.g. hard-wired behavior for corner cases) because today's AI (e.g. CNNs) tends to be brittle and lacks an understanding of the world.

My understanding is that Yann thinks we basically need AGI to solve autonomous driving in a reliable and satisfying way because the car would need to understand the world like a human to drive reliably.

Comment by Stephen McAleese (stephen-mcaleese) on Could We Automate AI Alignment Research? · 2023-08-12T13:51:12.375Z · LW · GW

I think it's probably true today that LLMs are better at doing subtasks than doing work over long time horizons.

On the other hand, I think human researchers are also quite capable of incrementally advancing their own research agendas and there's a possibility that AI could be much more creative than humans at seeing patterns in a huge corpus of input text and coming up with new ideas from that.

Comment by Stephen McAleese (stephen-mcaleese) on What's A "Market"? · 2023-08-12T13:46:33.980Z · LW · GW

Thanks for the post. I like how it gives several examples and then aims to find what's in common between them.

Recently I've been thinking that research can be seen as a kind of market where researchers specialize in research they have a comparative advantage in and trade insights by publishing and reading other researchers' work.

Comment by Stephen McAleese (stephen-mcaleese) on is a major AGI lab · 2023-08-10T13:57:42.505Z · LW · GW

10,000 teraFLOPS

Each H100 will be closer to 1,000 teraFLOPS or less. For reference, the A100 generally produces 150 teraFLOPS in real-world systems.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-08-07T13:28:57.553Z · LW · GW

Today I received a rejection email from Oliver Habryka on behalf of Lightspeed Grants. It says Lightspeed Grants received 600 applications, ~$150M in default funding requests, and ~$350M in maximum funding requests. Since the original amount to be distributed was $5M, only ~3% of applications could be funded.

I knew they received more applicants than they could fund but I'm surprised by how much was requested and how large the gap was.

Comment by Stephen McAleese (stephen-mcaleese) on The Control Problem: Unsolved or Unsolvable? · 2023-08-02T20:25:50.561Z · LW · GW

I meant that I see most humans as aligned with human values such as happiness and avoiding suffering. The point I'm trying to make is that human minds are able to represent these concepts internally and act on them in a robust way and therefore it seems possible in principle that AIs could too.

I'm not sure whether humans are aligned with evolution. Many humans do want children but I don't think many are fitness maximizers who want as many as possible.

Comment by Stephen McAleese (stephen-mcaleese) on The Control Problem: Unsolved or Unsolvable? · 2023-08-02T19:31:07.231Z · LW · GW

I think AI alignment is solvable for the same reason AGI is solvable: humans are an existence-proof for both alignment and general intelligence.

Comment by Stephen McAleese (stephen-mcaleese) on AXRP Episode 24 - Superalignment with Jan Leike · 2023-08-01T10:33:29.319Z · LW · GW

My summary of the podcast


The superalignment team is OpenAI’s new team, co-led by Jan Leike and Ilya Sutskever, for solving alignment for superintelligent AI. One of their goals is to create a roughly human-level automated alignment researcher. The idea is that creating an automated alignment researcher is a kind of minimum viable alignment project where aligning it is much easier than aligning a more advanced superintelligent sovereign-style system. If the automated alignment researcher creates a solution to aligning superintelligence, OpenAI can use that solution for aligning superintelligence.

The automated alignment researcher

The automated alignment researcher is expected to be used for running and evaluating ML experiments, suggesting research directions, and helping with explaining conceptual ideas. The automated researcher needs two components: a model capable enough to do alignment research, which will probably be some kind of advanced language model, and alignment for the model. Initially, they’ll probably start with relatively weak systems with weak alignment methods like RLHF and then scale up both using a bootstrapping approach where the model increases alignment and then the model can be scaled as it becomes more aligned. The end goal is to be able to convert compute into alignment research so that alignment research can be accelerated drastically. For example, if 99% of tasks were automated, research would be ~100 times faster.
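The "~100 times faster" figure can be reproduced with an Amdahl's-law-style calculation. The sketch below is my own reading of the arithmetic, not something stated in the podcast: it assumes all tasks take equal time when done by humans and that automated tasks take negligible time by default. The function name and the `automation_speedup` parameter are hypothetical.

```python
def research_speedup(automated_fraction, automation_speedup=float("inf")):
    """Overall speedup when a fraction of the work is automated.

    Assumes the remaining (1 - automated_fraction) of the work still runs at
    human speed, and the automated part runs automation_speedup times faster.
    """
    remaining_human_work = 1.0 - automated_fraction
    automated_work = automated_fraction / automation_speedup
    return 1.0 / (remaining_human_work + automated_work)


# 99% automated, automated tasks effectively instant: roughly 100x.
print(research_speedup(0.99))

# If the automated tasks are only 10x faster than humans, the gain is much smaller.
print(research_speedup(0.99, automation_speedup=10))
```

Under these assumptions the remaining 1% of human-speed work becomes the bottleneck, which is why the speedup is capped near 100x no matter how fast the automated part runs.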

The superalignment team

The automated alignment researcher will be built by the superalignment team which currently has 20 people and could have 30 people by the end of the year. OpenAI also plans to allocate 20% of its compute to the superalignment team. Apart from creating the automated alignment researcher, the superalignment team will continue doing research on feedback-based approaches and scalable oversight. Jan emphasizes that the superalignment team will still be needed even if there is an automated alignment researcher because he wants to keep humans in the loop. He also wants to avoid the risk of creating models that seek power, self-improve, deceive human overseers, or exfiltrate (escape).

Why Jan is optimistic

Jan is generally optimistic about the plan succeeding and estimates that it has an ~80% chance of succeeding even though Manifold only gives the project a 22% chance of success. He gives 5 reasons for being optimistic:

  • LLMs understand human intentions and morality much better than other kinds of agents such as RL game-playing agents. For example, often you can simply ask them to behave a certain way.
  • Seeing how well RLHF worked. For example, training agents to play Atari games from human feedback works almost as well as training on the true reward signal. RLHF-aligned LLMs are much more aligned than base models.
  • It’s possible to iterate and improve alignment solutions using experiments and randomized controlled trials.
  • Evaluating research is easier than generating it.
  • The last reason is a bet on language models. Jan thinks many alignment tasks can be formulated as text-in-text-out tasks.

Controversial statements/criticisms

Jan criticizing interpretability research:

I think interpretability is neither necessary, nor sufficient. I think there is a good chance that we could solve alignment purely behaviorally without actually understanding the models internally. And I think, also, it’s not sufficient where: if you solved interpretability, I don’t really have a good story of how that would solve superintelligence alignment, but I also think that any amount of non-trivial insight we can gain from interpretability will be super useful or could potentially be super useful because it gives us an avenue of attack.

Jan criticizing alignment theory research:

I think there’s actually a lot more scope for theory work than people are currently doing. And so I think for example, scalable oversight is actually a domain where you can do meaningful theory work, and you can say non-trivial things. I think generalization is probably also something where you can say… formally using math, you can make statements about what’s going on (although I think in a somewhat more limited sense). And I think historically there’s been a whole bunch of theory work in the alignment community, but very little was actually targeted at the empirical approaches we tend to be really excited [about] now. And it’s also a lot of… Theoretical work is generally hard because you have to… you’re usually either in the regime where it’s too hard to say anything meaningful, or the result requires a bunch of assumptions that don’t hold in practice. But I would love to see more people just try … And then at the very least, they’ll be good at evaluating the automated alignment researcher trying to do it.

Ideas for complementary research

Jan also gives some ideas that would be complementary to OpenAI’s alignment research agenda:

  • Creating mechanisms for eliciting values from society.
  • Solving current problems like hallucinations, jailbreaking, or mode-collapse (repetition) in RLHF-trained models.
  • Improving model evaluations (evals) to measure capabilities and alignment.
  • Reward model interpretability.
  • Figuring out how models generalize. For example, figuring out how to generalize alignment from easy-to-supervise tasks to hard tasks.

Comment by Stephen McAleese (stephen-mcaleese) on Open Problems and Fundamental Limitations of RLHF · 2023-07-31T21:29:34.779Z · LW · GW

Thanks for writing the paper! I think it will be really impactful and I think it fills a big gap in the literature.

I've always wondered what problems RLHF had, and mostly I've seen only short informal answers about how it incentivizes deception or how humans can't provide a scalable signal for superhuman tasks. That's odd given that it's one of the most commonly used AI alignment methods.

Before your paper, this post was the most in-depth analysis of problems with RLHF I'd seen, so I think your paper is now probably the best resource on the topic. Apart from that post, the List of Lethalities post has a few related sections and this post by John Wentworth has a section on RLHF.

I'm sure your paper will spark future research on improving RLHF because it lists several specific discrete problems that could be tackled!

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-28T14:42:02.375Z · LW · GW

At first, I predicted you were going to say that public funding would accelerate capabilities research over alignment but it seems like the gist of your argument is that lots of public funding would muddy the water and sharply reduce the average quality of alignment research.

That might be true for theoretical AI alignment research but I'd imagine it's less of a problem for types of AI alignment research that have decent feedback loops like interpretability research and other kinds of empirical research like experiments on RL agents.

One reason that I'm skeptical is that there doesn't seem to be a similar problem in the field of ML which is huge and largely publicly funded to the best of my knowledge and still makes good progress. Possible reasons why the ML field is still effective despite its size include sufficient empirical feedback loops and the fact that top conferences reject most papers (~25% is a typical acceptance rate for papers at NeurIPS).

Comment by Stephen McAleese (stephen-mcaleese) on Think carefully before calling RL policies "agents" · 2023-07-27T20:41:07.642Z · LW · GW

According to the LessWrong concepts page for agents, an agent is an entity that perceives its environment and takes actions to maximize its utility.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T23:10:32.257Z · LW · GW

Context of the post: funding overhang

The post was written in 2021 and argued that there was a funding overhang in longtermist causes (e.g. AI safety) because the amount of funding had grown faster than the number of people working.

The amount of committed capital increased by ~37% per year and the amount of deployed funds increased by ~21% per year since 2015 whereas the number of engaged EAs only grew ~14% per year.

The introduction of the FTX Future Fund around 2022 caused a major increase in longtermist funding which further increased the funding overhang.

Benjamin linked a Twitter update in August 2022 saying that the total committed capital was down by half because of a stock market and crypto crash. Then FTX went bankrupt a few months later.

The current situation

The FTX Future Fund no longer exists and Open Phil AI safety spending seems to have been mostly flat for the past 2 years. The post mentions that Open Phil is doing this to evaluate impact and increase capacity before possibly scaling more.

My understanding (based on this spreadsheet) is that the current level of AI safety funding has been roughly the same for the past 2 years whereas the number of AI safety organizations and researchers has been increasing by ~15% and ~30% per year respectively. So the funding overhang could be gone by now or there could even be a funding underhang.
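To make the "underhang" intuition concrete, here is a small back-of-the-envelope sketch in Python. It assumes funding is exactly flat for two years and that the ~15% and ~30% annual growth rates compound; those simplifications and the function name are mine, not from the spreadsheet.

```python
def relative_funding_per_unit(funding_growth, headcount_growth, years):
    """Funding per organization/researcher relative to the starting year (1.0 = unchanged)."""
    funding = (1 + funding_growth) ** years
    headcount = (1 + headcount_growth) ** years
    return funding / headcount


# Flat funding, ~30% more researchers per year, over 2 years.
per_researcher = relative_funding_per_unit(0.0, 0.30, 2)  # about 0.59

# Flat funding, ~15% more organizations per year, over 2 years.
per_org = relative_funding_per_unit(0.0, 0.15, 2)  # about 0.76

print(f"Funding per researcher: {per_researcher:.2f}x the starting level")
print(f"Funding per organization: {per_org:.2f}x the starting level")
```

On these assumptions, funding per researcher would fall to roughly 60% of its earlier level in two years, which is the sense in which an overhang could disappear.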

Comparing talent vs funding

The post compares talent and funding in two ways:

  • The lifetime value of a researcher (e.g. $5 million) vs total committed funding (e.g. $1 billion)
  • The annual cost of a researcher (e.g. $100k) vs annual deployed funding (e.g. $100 million)

A funding overhang occurs when the total committed funding is greater than the lifetime value of all the researchers or the amount of funding that could be deployed per year is greater than the annual cost of all researchers.

Then the post says:

“Personally, if given the choice between finding an extra person for one of these roles who’s a good fit or someone donating $X million per year, to think the two options were similarly valuable, X would typically need to be over three, and often over 10 (where this hugely depends on fit and the circumstances).”

I forgot to mention that this statement was applied to leadership roles like research leads, entrepreneurs, and grantmakers who can deploy large amounts of funds or have a large impact and therefore can have a large amount of value. Ordinary employees probably have less financial value.

Assuming there is no funding overhang in AI safety anymore, the marginal value of funding over more researchers is higher today than it was when the post was written.

The future

If total AI safety funding does not increase much in the near term, AI safety could continue to be funding-constrained or become more funding-constrained as the number of people interested in working on AI safety increases.

However, the post explains some arguments for expecting EA funding to increase:

  • There’s some evidence that Open Philanthropy plans to scale up its spending over the next several years. For example, this post says, “We gave away over $400 million in 2021. We aim to double that number this year, and triple it by 2025”. Though the post was written in 2022 so it could be overoptimistic.
  • According to Metaculus, there is a ~50% chance of another Good Ventures / Open Philanthropy-sized fund being created by 2026 which could substantially increase funding for AI safety.

My mildly optimistic guess is that as AI safety becomes more mainstream there will be a symmetrical effect where both more talent and funding are attracted to the field.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T13:02:22.940Z · LW · GW

I have a similar story. I left my job at Amazon this year because there were layoffs there. Also, the release of GPT-4 in March made working on AI safety seem more urgent.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T12:11:45.204Z · LW · GW

In this 80,000 Hours post (written in 2021), Benjamin Todd says "I’d typically prefer someone in these roles to an additional person donating $400,000–$4 million per year (again, with huge variance depending on fit)." This seems like an argument against earning to give for most people.

On the other hand, this post emphasizes the value of small donors.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-19T22:02:11.481Z · LW · GW

That seems like a better split and there are outliers of course. But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output, more marketing, and the ability to organize events. An individual is like an organization with one employee.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-19T21:44:32.593Z · LW · GW

Based on what I've written here, my verdict is that AI safety seems more funding constrained for small projects and individuals than it is for organizations for the following reasons:
- The funds that fund smaller projects such as LTFF tend to have less money than other funds such as Open Phil which seems to be more focused on making larger grants to organizations (Open Phil spends 14x more per year on AI safety).
- Funding could be constrained by the throughput of grant-makers (the number of grants they can make per year). This seems to put funds like LTFF at a disadvantage since they tend to make a larger number of smaller grants so they are more constrained by throughput than the total amount of money available. Low throughput incentivizes making a small number of large grants which favors large existing organizations over smaller projects or individuals.
- Individuals or small projects tend to be less well-known than organizations so grants for them can be harder to evaluate or might be more likely to be rejected. On the other hand, smaller grants are less risky.
- The demand for funding for individuals or small projects seems like it could increase much faster than it could for organizations because new organizations take time to be created (though maybe organizations can be quickly scaled).

Some possible solutions:
- Move more money to smaller funds that tend to make smaller grants. For example, LTFF could ask for more money from Open Phil.
- Hire more grant evaluators or hire full-time grant evaluators so that there is a higher ceiling on the total number of grants that can be made per year.
- Demonstrate that smaller projects or individuals can be as effective as organizations to increase trust.
- Seek more funding: half of LTFF's funds come from direct donations so they could seek more direct donations.
- Existing organizations could hire more individuals rather than the individuals seeking funding themselves.
- Individuals (e.g. independent researchers) could form organizations to reduce the administrative load on grant-makers and increase their credibility.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-19T20:22:54.473Z · LW · GW

This sounds more or less correct to me. Open Philanthropy (Open Phil) is the largest AI safety grant maker and spent over $70 million on AI safety grants in 2022 whereas LTFF only spent ~$5 million. In 2022, the median Open Phil AI safety grant was $239k whereas the median LTFF AI safety grant was only $19k.

Open Phil and LTFF made 53 and 135 AI safety grants respectively in 2022. This means the average Open Phil AI safety grant in 2022 was ~$1.3 million whereas the average LTFF AI safety grant was only $38k. So the average Open Phil AI safety grant is ~30 times larger than the average LTFF grant.
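
The averages above can be reproduced with a short Python sketch. It uses only the rounded totals quoted in this comment ($70 million and ~$5 million), so the results differ slightly from the figures in the text, which were presumably computed from exact totals. The function name is just illustrative.

```python
def average_grant(total_usd: float, num_grants: int) -> float:
    """Mean grant size: total spending divided by the number of grants."""
    return total_usd / num_grants

# Rounded 2022 figures quoted in the comment (assumed inputs, not exact totals).
open_phil_avg = average_grant(70e6, 53)
ltff_avg = average_grant(5e6, 135)
ratio = open_phil_avg / ltff_avg

print(f"Open Phil average grant: ${open_phil_avg:,.0f}")
print(f"LTFF average grant:      ${ltff_avg:,.0f}")
print(f"Ratio:                   {ratio:.0f}x")
```

With these rounded inputs the ratio comes out in the 30-40x range, which is consistent with the "~30 times larger" estimate.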

These calculations imply that Open Phil and LTFF make a similar number of grants (LTFF actually makes more) and that Open Phil spends much more simply because its grants tend to be much larger (~30x larger). So it seems like funds may be more constrained by their ability to evaluate and fulfill grants rather than having a lack of funding. This is not surprising given that the LTFF grantmakers apparently work part-time.

Counterintuitively, it may be easier for an organization (e.g. Redwood Research) to get a $1 million grant from Open Phil than it is for an individual to get a $10k grant from LTFF. The reason why is that both grants probably require a similar amount of administrative effort and a well-known organization is probably more likely to be trusted to use the money well than an individual so the decision is easier to make. This example illustrates how decision-making and grant-making processes are probably just as important as the total amount of money available.

LTFF specifically could be funding-constrained though given that it only spends ~$5 million per year on AI safety grants. Since ~40% of LTFF's funding comes from Open Phil and Open Phil has much more money than LTFF, one solution is for LTFF to simply ask for more money from Open Phil.

I don't know why Open Phil spends so much more on AI safety than LTFF (~14x more). Maybe it's simply because of some administrative hurdles that LTFF has when requesting money from Open Phil or maybe Open Phil would rather make grants directly.

Here is a spreadsheet comparing how much Open Phil, LTFF, and the Survival and Flourishing Fund (SFF) spend on AI safety per year.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-19T19:21:36.888Z · LW · GW

Plug: I recently published a long post on the EA Forum on AI safety funding: An Overview of the AI Safety Funding Situation.

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-19T19:11:21.032Z · LW · GW

I created a Guesstimate model that estimates that $2-3 million (range: $25k - $25 million) in high-quality grants could have been requested for the Lightspeed grant ($5 million was available).

Comment by Stephen McAleese (stephen-mcaleese) on Alignment Megaprojects: You're Not Even Trying to Have Ideas · 2023-07-13T11:57:48.772Z · LW · GW

Thanks for the post. I think it's a valuable exercise to think about how AI safety could be accelerated with unlimited money.

I think the Manhattan Project idea is interesting but I see some problems with the analogy:

  • The Manhattan Project was originally a military project and to this day, the military is primarily funded and managed by the government. But most progress in AI today is made by companies such as OpenAI and Google and universities like the University of Toronto. I think a more relevant project is CERN because it's more recent and focused on the non-military development of science.
  • The Manhattan Project happened a long time ago and the world has changed a lot since then. The wealth and influence of tech companies and universities is probably much greater today than it was then.
  • It's not obvious that a highly centralized effort is needed. The Alignment Forum, open source developers, and the academic research community (e.g. the ML research community) are examples of decentralized research communities that seem to be highly effective at making progress. This probably wasn't possible in the past because the internet didn't exist.

I highly doubt that it's possible to recreate the Bay Area culture in a top-down way. I'm pretty sure China has tried this and I don't think they've succeeded.

Also, I think your description overemphasizes the importance of geniuses like Von Neumann because 130,000 other people worked on the Manhattan Project too. I think something similar has happened at Google today where Jeff Dean is revered but in reality, I think most progress at Google is made by the tens of thousands of smart but not genius 'dark matter' developers there.

Anyway, let's assume that we have a giant AI alignment project that would cost billions. To fund this, we could:

  1. Expand EA funding substantially using community building.
  2. Ask the government to fund the project.

The government has a lot of money but it seems challenging to convince the government to fund AI alignment compared to getting funding from EA. So maybe some EAs with government expertise could work with the government to increase AI safety investment.

If the AI safety project gets EA funding, I think it needs to be cost-effective. The reality is that only ~12% of Open Phil's money is spent on AI safety. The reason why is that there is a triage situation with other cause areas like biosecurity, farm animal welfare, and global health and development so the goal is to find cost-effective ways to spend money on AI safety. The project needs to be competitive and have more value on the margin than other proposals.

In my opinion, the government projects that are most likely to succeed are those that build on or are similar to recent successful projects and are in the Overton window. For example:

My guess is that leveraging academia would be effective and scalable because you can build on the pre-existing talent, leadership, culture, and infrastructure. Alternatively, governments could create new regulations or laws to influence the behavior of companies (e.g. GDPR). Or they could found new think tanks or research institutes possibly in collaboration with universities or companies.

As for the school ideas, I've heard that Lee Sedol went to a Go school and as you mentioned, Soviet chess was fueled by Soviet chess programs. China has intensive sports schools but I doubt these kinds of schools would be considered acceptable in Western countries which is an important consideration given that most AI safety work happens in Western countries like the US and UK.

In science fiction, there are even more extreme programs like the Spartan program in Halo where children were kidnapped and turned into super soldiers, or Star Wars where clone soldiers were grown and trained in special facilities.

I don't think these kinds of extreme programs would work. Advanced technologies like human cloning could take decades to develop and are illegal in many countries. Also, they sound highly unethical which is a major barrier to their success in modern developed countries like the US and especially EA-adjacent communities like AI safety.

I think a more realistic idea is something like the Atlas Fellowship or SERI MATS which are voluntary programs for aspiring researchers in their teens or twenties.

The geniuses I know of that were trained from an early age in Western-style countries are Mozart (music), Von Neumann (math), John Stuart Mill (philosophy), and Judit Polgár (chess). In all these cases, they were gifted children who lived in normal nuclear families and had ambitious parents and extra tutoring.

Comment by Stephen McAleese (stephen-mcaleese) on The virtue of determination · 2023-07-11T09:58:37.160Z · LW · GW

Thanks for writing this. I thought it was really enlightening.

I think longtermism and the idea that we could influence a vast future is a really interesting and important idea. As you said, future people will probably see us as both incredibly incompetent and influential. Maybe the most rational response to this situation is to be determined to make the future go well.

I also thought it was really insightful how you mentioned that discovering past truths must have been hard. We take science and technology for granted but much of it is the result of cumulative hard work over time by scientists and engineers.

Comment by Stephen McAleese (stephen-mcaleese) on Lessons On How To Get Things Right On The First Try · 2023-07-07T10:34:33.625Z · LW · GW

Some lessons from the exercise:
- All models and beliefs are wrong to some extent and the best way to find out how wrong they are is to put them to the test in experiments. The map is not the territory.
- Be careful when applying existing knowledge and models to new problems or using analogies. The problem might require fresh thinking, new concepts, or a new paradigm.
- It's good to have a lot of different people working on a problem because each person can attack the problem in their own unique way from a different angle. A lot of people may fail but some could succeed.
- Don't flinch away from anomalies. A lot of scientific progress has resulted from changing models to account for anomalies (see The Structure of Scientific Revolutions).

Comment by Stephen McAleese (stephen-mcaleese) on When do "brains beat brawn" in Chess? An experiment · 2023-07-02T18:39:06.216Z · LW · GW

Thanks for the post! It was a good read. One point I don't think was brought up is the fact that chess is turn-based whereas real life is continuous.

Consequently, the huge speed advantage that AIs have is not that useful in chess because the AI still has to wait for you to make a move before it can move.

But since real life is continuous, if the AI is much faster than you, it could make 1000 'moves' for every move you make and therefore speed is a much bigger advantage in real life.

Comment by Stephen McAleese (stephen-mcaleese) on Lightcone Infrastructure/LessWrong is looking for funding · 2023-06-16T13:55:15.967Z · LW · GW

I like LessWrong and I visit the site quite often. I would be willing to pay a monthly subscription for the site especially if the subscription included extra features.

Maybe LessWrong could raise money in a similar way to Twitter: by offering a paid premium version.

A Fermi estimate of how much revenue LessWrong could generate:

  • As far as I know, the site gets about 100,000 monthly visitors.
  • If 10,000 of them signed up for the premium subscription at $10 per month, then LessWrong could generate $100,000 in revenue per month or $1.2 million per year which would cover 20-40% of the $3-6 million funding gap mentioned.
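
The Fermi estimate above can be written out as a few lines of Python. All inputs are the rough guesses from this comment (visitor count, subscriber count, price, and the $3-6 million gap), not measured data, and the variable names are only illustrative.

```python
# Rough inputs from the comment (guesses, not measured values).
monthly_visitors = 100_000
subscribers = 10_000          # implies a 10% conversion rate
price_per_month_usd = 10

monthly_revenue = subscribers * price_per_month_usd
annual_revenue = monthly_revenue * 12

# Funding gap range mentioned in the post.
gap_low_usd, gap_high_usd = 3_000_000, 6_000_000
share_of_high_gap = annual_revenue / gap_high_usd
share_of_low_gap = annual_revenue / gap_low_usd

print(f"Conversion rate: {subscribers / monthly_visitors:.0%}")
print(f"Annual revenue:  ${annual_revenue:,}")
print(f"Gap covered:     {share_of_high_gap:.0%} to {share_of_low_gap:.0%}")
```

The result is most sensitive to the assumed 10% conversion rate, so changing `subscribers` is the quickest way to test other scenarios.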

Comment by Stephen McAleese (stephen-mcaleese) on GPT-4 Predictions · 2023-06-16T09:50:47.957Z · LW · GW

I used the estimate from a document named What's In My AI? which estimates that the GPT-2 training dataset contains 15B tokens.

A quick way to estimate the total number of training tokens is to multiply the training dataset size in bytes by the number of tokens per byte which is typically about 0.25 according to the Pile paper. So 40 billion bytes (40 GB) x 0.25 = 10 billion tokens.
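
As a minimal sketch, the rule of thumb can be expressed as a one-line Python function. The default of 0.25 tokens per byte is the approximate figure cited above and will vary with the tokenizer and the text; the function name is hypothetical.

```python
def estimate_tokens(dataset_bytes: float, tokens_per_byte: float = 0.25) -> float:
    """Rough token count: dataset size in bytes times tokens per byte."""
    return dataset_bytes * tokens_per_byte

# 40 GB of text, treated as 40 billion bytes.
gpt2_tokens = estimate_tokens(40e9)
print(f"Estimated tokens: {gpt2_tokens:.2e}")
```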

Comment by Stephen McAleese (stephen-mcaleese) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-05-28T12:51:15.053Z · LW · GW

For context, I have a very similar background to you - I'm a software engineer with a computer science degree interested in working on AI alignment.

LTFF granted about $10 million last year. Even if all that money were spent on independent AI alignment researchers, if each researcher costs $100k per year, then there would only be enough money to fund about 100 researchers in the world per year so I don't see LTFF as a scalable solution.

Unlike software engineering, AI alignment research tends to be neglected and underfunded because it's not an activity that can easily be made profitable. That's one reason why there are far more software engineers than AI alignment researchers.

Work that is unprofitable but beneficial such as basic science research has traditionally been done by university researchers who, to the best of my knowledge, are mainly funded by government grants.

I have also considered becoming independently wealthy to work on AI alignment in the past but that strategy seems too slow if AGI will be created relatively soon.

So my plan is to apply for jobs at organizations like Redwood Research or apply for funding from LTFF and if those plans fail, I will consider getting a PhD and getting funding from the government instead which seems more scalable.

Comment by Stephen McAleese (stephen-mcaleese) on Worlds Where Iterative Design Fails · 2023-05-27T14:53:54.744Z · LW · GW

One more reason why iterative design could fail is if we build AI systems with low corrigibility. If we build a misaligned AI with low corrigibility that isn't doing what we want, we might have difficulty shutting it down or changing its goal. I think that's one of the reasons why Yudkowsky believes we have to get alignment right on the first try.

Comment by Stephen McAleese (stephen-mcaleese) on SERI MATS - Summer 2023 Cohort · 2023-05-06T16:48:27.914Z · LW · GW

Does anyone know roughly how many candidates typically apply or what the acceptance rate is?

Comment by Stephen McAleese (stephen-mcaleese) on Sama Says the Age of Giant AI Models is Already Over · 2023-04-18T21:55:09.687Z · LW · GW

Maybe Sam knows a lot I don't know but here are some reasons why I'm skeptical about the end of scaling large language models:

  • From scaling laws we know that more compute and data reliably lead to better performance and therefore scaling seems like a low-risk investment.
  • I'm not sure how much GPT-4 cost but GPT-3 only cost $5-10 million which isn't much for large tech companies (e.g. Meta spends billions on the metaverse every year).
  • There are limits to how big and expensive supercomputers can be but I doubt we're near them. I've heard that GPT-4 was trained on ~10,000 GPUs which is a lot but not an insane amount (~$300m worth of GPUs). If there were 100 GPUs/m^2, all 10,000 GPUs could fit in a 10m x 10m room. A model trained with millions of GPUs is not inconceivable and is probably technically and economically possible today.

Because scaling laws are power laws (they appear as straight lines only when both axes are logarithmic), there are diminishing returns to resources like more compute but I doubt we've reached the point where the marginal cost of training larger models exceeds the marginal benefit. Think of a company like Google: building the biggest and best model is immensely valuable in a global, winner-takes-all market like search.

Comment by Stephen McAleese (stephen-mcaleese) on GPT-4 Specs: 1 Trillion Parameters? · 2023-04-10T18:56:03.281Z · LW · GW

My estimate is about 400 billion parameters (100 billion - 1 trillion) based on EpochAI's estimate of GPT-4's training compute and scaling laws which can be used to calculate the optimal number of parameters and training tokens that should be used for language models given a certain compute budget.

Although 1 trillion sounds impressive and bigger models tend to achieve a lower loss given a fixed amount of data, an increased number of parameters is not necessarily more desirable because a bigger model uses more compute and therefore can't be trained on as much data.

If the model is made too big, the decrease in training tokens actually exceeds the benefit of the larger model leading to worse performance.
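
To make the trade-off concrete, here is a rough Python sketch of a compute-optimal calculation. It relies on two common approximations associated with the Chinchilla paper rather than the paper's exact fits: training compute C ≈ 6·N·D FLOPs, and roughly 20 training tokens per parameter at the optimum. The compute budget of 2e25 FLOPs is a hypothetical placeholder for illustration, not a confirmed figure for GPT-4.

```python
import math

def compute_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a compute budget into parameters N and tokens D.

    Assumes C = 6 * N * D and D = tokens_per_param * N (both approximations).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

def flops_for_params(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute budget at which a model of n_params parameters is optimal."""
    return 6.0 * n_params * (tokens_per_param * n_params)

# Hypothetical compute budget, used only for illustration.
n, d = compute_optimal(2e25)
print(f"Optimal parameters: {n:.2e}")
print(f"Optimal tokens:     {d:.2e}")
print(f"Budget for 1T params: {flops_for_params(1e12):.1e} FLOPs")
```

Under these assumptions, a budget of 2e25 FLOPs gives roughly 4e11 parameters, and a 1 trillion parameter model only becomes optimal at a budget several times larger.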

Extract from the Training Compute-Optimal Language Models paper:

"our analysis clearly suggests that given the training compute budget for many current LLMs, smaller models should have been trained on more tokens to achieve the most performant model."

Another quote from the paper:

"Unless one has a compute budget of $10^{26}$ FLOPs (over 250× the compute used to train Gopher), a 1 trillion parameter model is unlikely to be the optimal model to train."

So unless the EpochAI estimate is too low by about an order of magnitude [1] or OpenAI has discovered new and better scaling laws, the number of parameters in GPT-4 is probably lower than 1 trillion.

My Twitter thread estimating the number of parameters in GPT-4. 


  1. ^

    I don't think it is but it could be.

Comment by Stephen McAleese (stephen-mcaleese) on Pausing AI Developments Isn't Enough. We Need to Shut it All Down · 2023-04-08T15:17:55.311Z · LW · GW

I generally agree with the points made in this post.

Points I agree with

Slowing down AI progress seems rational conditional on there being a significant probability that AGI will cause extinction.

Generally, technologies are accepted only when their expected benefit significantly outweighs their expected harms. Consider flying as an example. Let’s say the benefit of each flight is +10 and the harm of getting killed is -1000. If x is the probability of surviving then the net utility equation is $U = 10x - 1000(1 - x)$.

Solving for x, the utility is 0 when $x = 1000/1010 \approx 0.99$. In other words, the flight would only be worth it if there was at least a 99% chance of survival which makes intuitive sense.

If we use the same utility function for AI and assume that Eliezer believes that creating AGI will have a 50% chance of causing human extinction then the outcome would be strongly net negative for humanity and one should agree with this sentiment unless one's P(extinction) is less than 1%.
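
The expected-utility argument can be checked with a small Python sketch. It assumes one reading of the setup above: the benefit (+10) is received only on survival and the harm (-1000) only on death, so the net utility is p·benefit + (1 - p)·harm. The function names are illustrative.

```python
def net_utility(p_survive: float, benefit: float = 10.0, harm: float = -1000.0) -> float:
    """Expected utility when survival yields `benefit` and death yields `harm`."""
    return p_survive * benefit + (1.0 - p_survive) * harm

def break_even_survival(benefit: float = 10.0, harm: float = -1000.0) -> float:
    """Survival probability at which the expected utility is exactly zero."""
    return -harm / (benefit - harm)

p_star = break_even_survival()
print(f"Break-even survival probability: {p_star:.4f}")
print(f"Net utility at 50% survival:     {net_utility(0.5):.1f}")
```

The break-even point is 1000/1010, so the maximum tolerable probability of catastrophe under these numbers is just under 1%.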

Eliezer is saying that we can in principle make AI safe but argues that it could take decades to advance AI safety to the point where we can be sufficiently confident that creating an AGI would have net positive utility.

If slowing down AI progress is the best course of action, then achieving a good outcome for AGI seems more like an AI governance problem than a technical AI safety research problem.

Points I disagree with

"Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment or even progress in understanding what the hell is going on inside those systems. If we actually do this, we are all going to die."

I think Evan Hubinger has said before that if this were the case, GPT-4 would be less aligned than GPT-3 but the opposite is true in reality (GPT-4 is more aligned according to OpenAI). Still, I think we ideally want a scalable AI alignment solution long before the level of capabilities is reached where it’s needed. A similar idea is how Claude Shannon conceived of a minimax chess algorithm decades before we had the compute to implement it.

Other points

Eliezer has been sounding the alarm for some time and it’s easy to get alarm fatigue and become complacent. But the fact that a leading member of the AI safety research community has a message as extreme as this is alarming.

Comment by Stephen McAleese (stephen-mcaleese) on [April Fools'] Definitive confirmation of shard theory · 2023-04-01T11:52:05.928Z · LW · GW

I forgot today was 1 April and thought this post was serious until I saw the first image!

Comment by Stephen McAleese (stephen-mcaleese) on Retrospective on ‘GPT-4 Predictions’ After the Release of GPT-4 · 2023-03-18T18:44:29.468Z · LW · GW

In this case, the percent error is 8.1% and the absolute error is 8%. If one student gets 91% on a test and another gets 99% they both get an A so the difference doesn't seem large to me.

The article linked seems to be missing. Can you explain your point in more detail?

Comment by Stephen McAleese (stephen-mcaleese) on Some Thoughts on Singularity Strategies · 2023-03-18T17:34:37.305Z · LW · GW

It's not obvious to me that creating super smart people would have a net positive effect because motivating them to decrease AI risk is itself an alignment problem. What if they instead decide to accelerate AI progress or do nothing at all?

Comment by Stephen McAleese (stephen-mcaleese) on Why I think strong general AI is coming soon · 2023-03-18T14:24:43.989Z · LW · GW

in order for us to hit that date things have to start getting weird now.

I don't think this is necessary. Isn't the point of exponential growth that a period of normalcy can be followed by rapid dramatic changes? Example: the area of lilypads doubles on a pond and only becomes noticeable in the last several doublings.
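
A tiny Python sketch of the lilypad example: if the covered area doubles each period and the pond is fully covered at the end, then the fraction covered k doublings earlier is 2^-k. The function name is illustrative.

```python
def coverage_before_full(doublings_remaining: int) -> float:
    """Fraction of the pond covered when this many doublings remain."""
    return 0.5 ** doublings_remaining

for k in (10, 7, 3, 1, 0):
    print(f"{k:>2} doublings before full coverage: {coverage_before_full(k):.2%}")
```

Seven doublings before the end, less than 1% of the pond is covered, so the growth can look negligible for most of the process.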

Comment by Stephen McAleese (stephen-mcaleese) on Why I think strong general AI is coming soon · 2023-03-18T14:22:06.273Z · LW · GW

Epic post. It reminds me of "AGI Ruin: A List of Lethalities" except it's more focused on AI timelines rather than AI risk.

Comment by Stephen McAleese (stephen-mcaleese) on Retrospective on ‘GPT-4 Predictions’ After the Release of GPT-4 · 2023-03-18T10:50:47.918Z · LW · GW

At 86.4%, GPT-4's accuracy is now approaching 100% but GPT-3's accuracy, which was my prior, was only 43.9%. Obviously one would expect GPT-4's accuracy to be higher than GPT-3's since it wouldn't make sense for OpenAI to release a worse model but it wasn't clear ex-ante that GPT-4's accuracy would be near 100%.

I predicted that GPT-4's accuracy would fall short of 100% accuracy by 20.6% when the true value was 13.6%. Using this approach, the error would be $\frac{20.6 - 13.6}{13.6} \times 100\% \approx 51\%$.

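Strictly speaking, the formula for percent error according to Wikipedia is the relative error expressed as a percentage:

$\text{percent error} = \frac{|v_{\text{predicted}} - v_{\text{true}}|}{|v_{\text{true}}|} \times 100\%$

I think this is the correct formula to use because what I'm trying to measure is the deviation of the true value from the regression line (predicted value).

Using the formula, the percent error is $\frac{|79.4 - 86.4|}{86.4} \times 100\% \approx 8.1\%$.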

I updated the post to use the term 'percent error' with a link to the Wikipedia page and a value of 8.1%.
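
For reference, both ways of measuring the error can be compared in a short Python sketch. It assumes the standard definition of percent error, |predicted - true| / |true| × 100, and derives the predicted accuracy of 79.4% from the 20.6% shortfall mentioned above.

```python
def percent_error(predicted: float, actual: float) -> float:
    """Relative error expressed as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100.0

# Error measured on accuracy: predicted 79.4% vs. actual 86.4%.
accuracy_error = percent_error(100.0 - 20.6, 86.4)

# Error measured on the shortfall from 100%: predicted 20.6% vs. actual 13.6%.
shortfall_error = percent_error(20.6, 13.6)

print(f"Percent error on accuracy:  {accuracy_error:.1f}%")
print(f"Percent error on shortfall: {shortfall_error:.1f}%")
```

The same 7 percentage point gap gives a very different relative error depending on which quantity is treated as the true value.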

Comment by Stephen McAleese (stephen-mcaleese) on A concrete bet offer to those with short AGI timelines · 2023-03-18T00:03:20.006Z · LW · GW

"Having thought about each of these milestones more carefully, and having already updated towards short timelines months ago"

You said that you updated and shortened your median timeline to 2047 and mode to 2035. But it seems to me that you need to shorten your timelines again.

The post It's time for EA leadership to pull the short-timelines fire alarm says:

"it seems very possible (>30%) that we are now in the crunch-time section of a short-timelines world, and that we have 3-7 years until Moore's law and organizational prioritization put these systems at extremely dangerous levels of capability."

It seems that the purpose of the bet was to test this hypothesis:

"we are offering to bet up to $1000 against the idea that we are in the “crunch-time section of a short-timelines"

My understanding is that if AI progress occurred slowly and no more than one of the advancements listed were made by 2026-01-01 then this short timelines hypothesis would be proven false and could then be ignored.

However, the bet was conceded on 2023-03-16 which is much earlier than the deadline and therefore the bet failed to prove the hypothesis false.

It seems to me that the rational action is to now update toward believing that this short timelines hypothesis is true and 3-7 years from 2022 is 2025-2029 which is substantially earlier than 2047.

Comment by Stephen McAleese (stephen-mcaleese) on A proposed method for forecasting transformative AI · 2023-03-17T23:33:51.910Z · LW · GW

Strong upvote. I think the methods used in this post are very promising for accurately forecasting TAI for the reasons explained below. 

While writing GPT-4 Predictions I spent a lot of time playing around with the parametric scaling law L(N, D) from Hoffmann et al. 2022 (the Chinchilla paper). In the post, I showed that scaling laws can be used to calculate model losses and that these losses seem to correlate well with performance on the MMLU benchmark. My plan was to write a post extrapolating the progress further to TAI until I read this post which has already done that!
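
For readers who want to experiment with it, here is a minimal Python sketch of the parametric scaling law, L(N, D) = E + A/N^α + B/D^β. The constants are the fitted values reported by Hoffmann et al. 2022 as I recall them, so treat them as assumptions and check them against the paper. The model sizes used below are the commonly cited figures for Chinchilla and Gopher.

```python
# Fitted constants from Hoffmann et al. 2022 (assumed; verify against the paper).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Chinchilla: 70B parameters, 1.4T tokens. Gopher: 280B parameters, 300B tokens.
chinchilla = chinchilla_loss(70e9, 1.4e12)
gopher = chinchilla_loss(280e9, 300e9)

print(f"Chinchilla predicted loss: {chinchilla:.3f}")  # about 1.94
print(f"Gopher predicted loss:     {gopher:.3f}")      # about 1.99
```

The smaller model trained on more tokens gets the lower predicted loss, which is the main point of the paper. The irreducible term E sets a floor that no amount of scaling can go below.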

Scaling laws for language models seem to me like possibly the most effective option we have for forecasting TAI accurately for several reasons:

  • It seems as though the closest ML models to TAI that currently exist are language models and therefore predictive uncertainty should be lower for forecasting TAI from language models than from other types of less capable models.
  • A lot of economically valuable work such as writing and programming involves text and therefore language models tend to excel at these kinds of tasks. 
  • The simple training objective of language models makes it easier to reason about their properties and capabilities. Also, despite their simple training objective, large language models demonstrate impressive levels of generalization and even reasoning (e.g. chain-of-thought prompting).
  • Language model scaling laws are well-studied and highly accurate for predicting language model losses.
  • There are many existing examples of language models and their capabilities. Previous capabilities can be used as a baseline for predicting future capabilities.

Overall my intuition is that language model scaling laws require far fewer assumptions and less guesswork for forecasting TAI and therefore should allow narrower and more confident predictions which your post seems to show (<10 OOM vs 20 OOM for the bio anchors method).

As I mentioned in this post there are limitations to using scaling laws such as the possibility of sudden emergent capabilities and the difficulty of predicting algorithmic advances.

  1. ^

    Exceptions include deep RL work by DeepMind such as AlphaTensor.

Comment by Stephen McAleese (stephen-mcaleese) on A concrete bet offer to those with short AGI timelines · 2023-03-17T20:34:28.779Z · LW · GW

I don't agree with the first point:

"a score of 80% would not even indicate high competency at any given task"

Although the MMLU task is fairly straightforward given that there are only 4 options to choose from (25% accuracy for random choices) and experts typically score about 90%, getting 80% accuracy still seems quite difficult for a human given that average human raters only score about 35%. Also, GPT-3 only scores about 45% (GPT-3 fine-tuned still only scores 54%), and GPT-2 scores just 32% even when fine-tuned.

One of my recent posts has a nice chart showing different levels of MMLU performance.

Extract from the abstract of the paper (2021):

"To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over random chance by almost 20 percentage points on average."

Comment by Stephen McAleese (stephen-mcaleese) on GPT-4: What we (I) know about it · 2023-03-16T08:36:41.218Z · LW · GW

Note that unlike GPT-2, GPT-3 does use some sparse attention. The GPT-3 paper says the model uses “alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer”.

Comment by Stephen McAleese (stephen-mcaleese) on $20 Million in NSF Grants for Safety Research · 2023-03-06T22:36:49.598Z · LW · GW

Wow, this is an incredible achievement given that AI safety is still a relatively small field. For example, this post by 80,000 Hours said that $10 - $50 million was spent globally on AI safety in 2020 according to The Precipice. Therefore this grant is roughly equivalent to an entire year of global AI safety funding!