The impact you might have working on AI safety

post by Fabien Roger (Fabien) · 2022-05-29T16:31:48.005Z

Contents

  The main steps of the estimation
  Main simplifying assumptions
  Mathematical model of the problem
    Model of when a world-ending AGI might happen, if there has been no AI safety progress
    Model of when the AI alignment problem might be solved
    Modeling your impacts on timelines
    Computing how your intervention affects the odds that humanity survives
  Results of this model

This is an attempt at building a rough guess of the impact you might expect to have with your career in AI safety.

Your impact will depend on the choices you make, and the estimation of their consequences also depends on your beliefs about AI safety, so I built an online tool that lets you input your own parameters into the model.

The main steps of the estimation

Main simplifying assumptions

These assumptions are very strong, and wrong. I’m unsure in which direction the results would move if the assumptions were more realistic. I am open to suggestions for replacing parts of the model with ones that rest on assumptions that are less wrong. Please tell me if I introduced a strong assumption not mentioned above.

Mathematical model of the problem

Model of when a world-ending AGI might happen, if there has been no AI safety progress

Without your intervention, and if there is no AI safety progress, AGI will happen at time $T_{AGI}$, where $T_{AGI}$ follows a distribution of density $f_{AGI}$ over $[t_0, +\infty)$, where $t_0$ is the current time[1]. AGI kills humanity with probability $p$. This is assumed to be independent of when AGI happens. Therefore, without you, AGI will kill humanity at time $T_{doom}$, where $T_{doom}$ follows a distribution of density $p \, f_{AGI}(t)$ (for all $t \geq t_0$).
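To make this concrete, here is a minimal sketch (in Python with numpy, which is not necessarily what the website uses) of how $f_{AGI}$ and the doom density $p \, f_{AGI}$ could be discretized on a yearly grid. The grid, the shape of the distribution and the value of $p$ below are placeholder guesses of mine, not the tool's defaults.

```python
# Placeholder discretization of f_AGI and of the doom density p * f_AGI on a yearly grid.
import numpy as np

t0 = 2022                        # current year (t_0)
years = np.arange(t0, t0 + 100)  # yearly grid covering [t_0, t_0 + 100)

# Placeholder belief about AGI timelines: a lognormal-shaped bump a few decades out.
raw = np.exp(-0.5 * ((np.log(years - t0 + 1) - np.log(25)) / 0.8) ** 2)
f_agi = raw / raw.sum()          # normalized so it sums to 1 over the grid

p = 0.6                          # placeholder P(AGI kills humanity | alignment unsolved)
f_doom = p * f_agi               # discretized density of T_doom; sums to p, not to 1
```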

Model of when the AI alignment problem might be solved

Without you, AI Alignment will be solved at time $T_A$, where $T_A$ follows a distribution of density $f_A$ over $[t_0, +\infty)$. Its cumulative distribution function is $F_A$.
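The alignment side can be discretized the same way, with the cumulative distribution function obtained as a running sum. Again, the numbers below are placeholders of mine:

```python
# Placeholder discretization of f_A and its CDF F_A on the same yearly grid.
import numpy as np

t0 = 2022
years = np.arange(t0, t0 + 100)

raw = np.exp(-0.5 * ((np.log(years - t0 + 1) - np.log(35)) / 0.9) ** 2)
f_a = raw / raw.sum()            # placeholder belief about when alignment is solved
F_a = np.cumsum(f_a)             # F_A(t) = P(T_A <= t)
```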

Modeling your impacts on timelines

With your intervention, AI alignment will be solved at time $T'_A$, where $T'_A$ follows a distribution of density $f'_A$. Its cumulative distribution function is $F'_A$.

Between time $t_1$ and $t_2$, you increase the speed at which AI Alignment research is done by a factor of $1+s$. That is modeled by saying that $T'_A = u^{-1}(T_A)$, where $u$ is a continuous piecewise linear function with slope $1$ in $[t_0, t_1]$, slope $1+s$ in $[t_1, t_2]$, and slope $1$ in $[t_2, +\infty)$: between $t_1$ and $t_2$, you make “time pass at the rate of $1+s$”.
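Since $T'_A = u^{-1}(T_A)$, the cumulative distribution function with your intervention is $F'_A(t) = F_A(u(t))$. Here is a sketch of that time-warp on a discretized grid; the values of $t_1$, $t_2$, $s$ and the belief about $T_A$ are placeholders of mine:

```python
# Sketch of the piecewise linear time-warp u and of F'_A(t) = F_A(u(t)).
import numpy as np

def u(t, t1, t2, s):
    """Accumulated research time at calendar time t: slope 1, then 1 + s on [t1, t2], then 1."""
    t = np.asarray(t, dtype=float)
    return t + s * (np.clip(t, t1, t2) - t1)

t0 = 2022
years = np.arange(t0, t0 + 100, dtype=float)
raw = np.exp(-0.5 * ((np.log(years - t0 + 1) - np.log(35)) / 0.9) ** 2)
F_a = np.cumsum(raw / raw.sum())             # placeholder F_A, as in the previous sketch

t1, t2, s = 2025, 2065, 5e-4                 # placeholder career window and speedup
F_a_prime = np.interp(u(years, t1, t2, s), years, F_a)   # F'_A evaluated on the same grid
```

The same warp, with its own speedup factor, is what turns the AGI timeline distribution into its with-intervention counterpart below.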

$s$ can be broken down in several ways, one of which is $s = \alpha \delta$, where $\alpha$ is the fraction of the progress in AI Alignment that the organization at which you work is responsible for, and $\delta$ is how much you speed up the speed at which your organization is making progress[2]. This is only approximately true, and only relevant if you don’t speed up your organization’s progress too much. Otherwise, effects like your organization depending on the work of others come into play, and $s$ becomes a complicated function of $\alpha$ and $\delta$.
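As a purely illustrative example (these numbers are mine, not the tool's defaults): if your organization is responsible for $\alpha = 5\%$ of the progress in AI Alignment and you speed it up by $\delta = 1\%$, then $s = 0.05 \times 0.01 = 5 \times 10^{-4}$, i.e. you speed up the field by about 0.05%.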

Similar work can be done to compute $f'_{AGI}$: with you, AGI will happen at time $T'_{AGI}$, where $T'_{AGI}$ follows a distribution of density $f'_{AGI}$ obtained from $f_{AGI}$ in the same way as we obtained $f'_A$ from $f_A$.

Computing how your intervention affects the odds that humanity survives

The probability of doom without your intervention is $D = P(T_{doom} < T_A) = \int_{t_0}^{+\infty} p \, f_{AGI}(t) \, \big(1 - F_A(t)\big) \, dt$, where $p$ is the probability that an AGI built before alignment is solved kills humanity (which does not depend on your intervention).

The probability of doom with your intervention is $D' = \int_{t_0}^{+\infty} p \, f'_{AGI}(t) \, \big(1 - F'_A(t)\big) \, dt$.

Hence, you save the world with probability $D - D'$. From there, you can also compute the expected number of lives saved.
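For completeness, here is a self-contained sketch chaining the pieces above into the two integrals. Every numerical input is a placeholder guess of mine (including the assumption that your work leaves AGI timelines unchanged), so only the structure of the computation, not its outputs, reflects the model:

```python
# End-to-end sketch:
#   D  = integral of p * f_AGI(t)  * (1 - F_A(t))  dt   (without you)
#   D' = integral of p * f'_AGI(t) * (1 - F'_A(t)) dt   (with you)
import numpy as np

t0 = 2022
years = np.arange(t0, t0 + 300, dtype=float)

def density(median_offset, sigma):
    """Placeholder lognormal-shaped belief over the yearly grid, normalized to sum to 1."""
    raw = np.exp(-0.5 * ((np.log(years - t0 + 1) - np.log(median_offset)) / sigma) ** 2)
    return raw / raw.sum()

def warp(t, t1, t2, s):
    """u(t): slope 1, then 1 + s on [t1, t2], then 1 again."""
    return t + s * (np.clip(t, t1, t2) - t1)

p = 0.6                                   # placeholder P(doom | AGI arrives before alignment)
f_agi, f_a = density(25, 0.8), density(35, 0.9)
F_agi, F_a = np.cumsum(f_agi), np.cumsum(f_a)

t1, t2 = 2025, 2065                       # placeholder career window
s_align, s_agi = 5e-4, 0.0                # placeholder speedups (alignment sped up, AGI unchanged)

F_a_prime = np.interp(warp(years, t1, t2, s_align), years, F_a)     # F'_A(t)  = F_A(u(t))
F_agi_prime = np.interp(warp(years, t1, t2, s_agi), years, F_agi)   # F'_AGI(t) = F_AGI(u(t))
f_agi_prime = np.diff(F_agi_prime, prepend=0.0)                     # back to a density

D = np.sum(p * f_agi * (1 - F_a))                     # P(doom) without you
D_prime = np.sum(p * f_agi_prime * (1 - F_a_prime))   # P(doom) with you
print(D, D_prime, D - D_prime)   # D - D' is roughly the probability that you "save the world"
```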

Results of this model

This Fermi estimation takes as input $f_{AGI}$ (your belief about AGI timelines), $p$ (your belief about how AGI would go by default), $f_A$ (your belief about AI alignment timelines), $\alpha$ (your belief about your organization's role in AI Alignment) and $\delta$ (your belief about how much you help your organization). The results are a quite concrete measure of impact.

You can see a crude guess of what the results might look like if you work as a researcher in a medium-sized AI safety organization for your entire life here. With my current beliefs about AGI and AI alignment, humanity is doomed with probability , and you save the world[3] with probability .

I don’t have a very informed opinion about the inputs I put into the estimation. I would be curious to know what results you would get with better-informed estimates of the inputs. The website also contains other ways of computing the speedup $s$, and it is easy for me to add more, so feel free to ask for modifications!

Note: the website is still a work in progress, and I’m not sure that what I implemented is a correct way of discretizing the model above. The code is available on GitHub (link on the website), and I would appreciate it if you double-checked what I did and added more tests. If you want to use this tool to make important decisions, please contact me so that I increase its reliability.

  1. ^

  2. ^

     $\delta$ should take into account that your absence would probably not remove your job from your organization. In most cases, it would result in someone else doing your job, but slightly worse.

  3. ^

    You “save humanity” in the same sense that you make your favorite candidate win if the election is perfectly balanced: the reasoning I used is in this 80,000 Hours article. In particular, the reasoning is done with causal decision theory and does not take into account the implications of your actions on your beliefs about other actors' actions.

1 comment


comment by Evan R. Murphy · 2022-05-31T04:42:34.662Z

Neat tool, it looks like you put a lot of work into this. I think the harder part for me is determining the parameters that need to be input into your model (e.g. % likelihood that AGI will go wrong). But this could be an interesting way to explore the ramifications of different views on AGI and safety research.