AI Probability Trees - Joe Carlsmith (2022)

post by Nathan Young · 2023-09-08T15:40:24.892Z · 1 comment

Contents

  Longer explanations
    What kind of AI is Carlsmith forecasting? What is APS?
  How this falls out in terms of possible worlds
    We can’t build APS by 2070 (35%)
    We can build APS but we choose not to (13%)
    It isn’t harder to deploy PS-aligned systems so we probably do (31%)
    Misaligned systems don’t cause $1tr of damage before 2070 (7%)
    Huge damage doesn’t lead to permanent disempowerment (8%)
    Permanent disempowerment somehow isn’t an existential catastrophe (0.2%)
    Existential catastrophe (5%)
  How could these overviews be better?

I am reviewing the work of AI experts on what they think will happen with AI. This is a summary of Joe Carlsmith’s thoughts from his paper. AI risk scares me, but often I feel pretty disconnected from it. This has helped me think about it.

Here are Carlsmith’s thoughts in brief (he no longer fully endorses these):

Here is an interactive version of his probability tree: 


You can see all graphs I’ve done here: https://estimaker.app/ai

You can watch a video he did here (unrelated to me; I haven't talked to Joe about this):

https://www.youtube.com/watch?v=UbruBnv3pZU 

Or 80,000 Hours have done a similar write-up here:

https://80000hours.org/problem-profiles/artificial-intelligence/#how-likely-is-an-AI-related-catastrophe 

Longer explanations

What kind of AI is Carlsmith forecasting? What is APS?

APS stands for Advanced, Planning, Strategically aware. Carlsmith is forecasting systems with all three of the following properties:

Advanced capability: they outperform the best humans on some set of tasks which, when performed at advanced levels, grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation).

Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world. 

Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining power over humans and the real-world environment.

How this falls out in terms of possible worlds

We can’t build APS by 2070 (35%)

Many systems are possible in these worlds, including far more powerful ones than we have now, just not the ones that Carlsmith describes. Perhaps the last few percentage points of human capability are very hard to train, or perhaps LLMs don’t have world models capable of leading them to think strategically in new environments.

We can build APS but we choose not to (13%)

For some reason we choose not to build such systems, maybe because such powerful systems aren’t economically useful to us or because we ban their use.

It isn’t harder to deploy PS-aligned systems so we probably do (31%)

Here PS-aligned systems (systems that don’t engage in misaligned power-seeking) are as easy to build as those that aren’t. This is a great world. I hope we live in it. We build PS-aligned systems and they do what we want. Though we still have to deal with human misalignment, at least misaligned humans can’t leverage AI that recursively compounds toward bad ends.

This means that probably any person can call on systems more powerful than the best humans, with strategic planning. I guess it depends on how much they cost to run, but that would turn any reader into one of the most productive people they currently know. I find it hard to imagine how quickly the world would change (or how much new regulation would be created that only AIs could navigate).

Misaligned systems don’t cause $1tr of damage before 2070 (7%)

It is unclear in such worlds why misaligned systems don’t spin out of control. Perhaps there is some deeper alignment (i.e. the orthogonality thesis is false), smaller crises prompt regulation, or APS systems counteract one another.

These worlds are hard for me to imagine.

Huge damage doesn’t lead to permanent disempowerment (8%)

$1tr of damage is a worldwide crisis (a quick Google suggests that the 2008 financial crisis caused a comparable loss of growth). I guess that here the crisis causes people to take the risk from APS systems seriously, and we then avoid permanent disempowerment. Alternatively, such systems somehow were never on track to permanently disempower us, but just caused massive damage without extinction.

I guess these worlds look like the two above sets but just with an additional massive disaster at the front.

Permanent disempowerment somehow isn’t an existential catastrophe (0.2%)

I find this hard to imagine, but I guess these are worlds where AI keeps us around in a way that isn’t a catastrophe for us. I wonder if Carlsmith would include a sort of benevolent AI dictator in this bucket. Dogs have a pretty good life, right?

Existential catastrophe (5%)

Whelp. Again, it’s worth noting that Joe no longer fully endorses this.
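
For anyone who wants to check the arithmetic, here is a minimal sketch of how the leaf percentages above fall out of a chain of conditional probabilities. I’m assuming the six conditional premise probabilities from Carlsmith’s report (65%, 80%, 40%, 65%, 40%, 95%); each branch peels off the (1 − p) worlds as a leaf and carries the rest forward.

```python
# Minimal sketch: multiply Carlsmith's conditional premise probabilities
# (as I understand them from his report) into the leaf probabilities above.

p_aps_feasible  = 0.65  # APS systems possible & financially feasible by 2070
p_incentives    = 0.80  # strong incentives to build them, given feasibility
p_hard_to_align = 0.40  # much harder to build PS-aligned systems, given the above
p_high_impact   = 0.65  # misaligned systems cause >$1tr of damage, given the above
p_disempower    = 0.40  # that damage scales to permanent human disempowerment
p_catastrophe   = 0.95  # disempowerment constitutes an existential catastrophe

leaves = {
    "Can't build APS by 2070":                 1 - p_aps_feasible,
    "Can build but choose not to":             p_aps_feasible * (1 - p_incentives),
    "PS-aligned systems aren't harder":        p_aps_feasible * p_incentives * (1 - p_hard_to_align),
    "No $1tr of damage before 2070":           p_aps_feasible * p_incentives * p_hard_to_align
                                               * (1 - p_high_impact),
    "Damage but no permanent disempowerment":  p_aps_feasible * p_incentives * p_hard_to_align
                                               * p_high_impact * (1 - p_disempower),
    "Disempowerment but not existential":      p_aps_feasible * p_incentives * p_hard_to_align
                                               * p_high_impact * p_disempower * (1 - p_catastrophe),
    "Existential catastrophe":                 p_aps_feasible * p_incentives * p_hard_to_align
                                               * p_high_impact * p_disempower * p_catastrophe,
}

for name, p in leaves.items():
    print(f"{name}: {p:.1%}")

# The leaves are mutually exclusive and exhaustive, so they should sum to 1.
assert abs(sum(leaves.values()) - 1.0) < 1e-9
```

Running this reproduces the figures above up to rounding (the last two leaves come out at roughly 0.3% and 5.1%).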

How could these overviews be better?

We are still in the early stages, so I appreciate a lot of nitpicky feedback.


 

1 comment


comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-09-08T17:49:35.805Z

By 2070, it will become possible and financially feasible to build APS (Advanced, Planning, Strategically-aware) systems (65%)

For what it's worth, my estimate for APS by 2035 is ~95%. I think there are a lot of gradually accelerating processes underway affecting AI R&D which have positive feedback cycles. Trying to estimate how fast AI R&D will lead to a given capability threshold without taking positive feedback loops into account is almost certainly going to result in a huge underestimate of the speed.