james oofou's Shortform
post by james oofou (james-oofou) · 2024-09-12T11:50:35.703Z · LW · GW · 16 comments
Comments sorted by top scores.
comment by james oofou (james-oofou) · 2024-09-12T11:50:36.010Z · LW(p) · GW(p)
Is there a one-stop-shop type article presenting the AI doomer argument? I read the sequence posts related to AI doom, but they're very scattered and more tailored toward, I guess, exploring ideas than presenting a solid, cohesive argument. Of course, I'm sure that was the approach that made sense at the time. But I was wondering whether, since then, some kind of canonical presentation of the AI doom argument has been made? Something on the "attempts to be logically sound" side of things.
Replies from: jeremy-gillen, Mitchell_Porter, Seth Herd, sharmake-farah, Raemon, TAG, jam_brand, Dagon
↑ comment by Jeremy Gillen (jeremy-gillen) · 2024-09-12T12:13:03.400Z · LW(p) · GW(p)
If you're looking for a recent, canonical one-stop shop, the answer is List of Lethalities [LW · GW].
Replies from: lombertini
↑ comment by titotal (lombertini) · 2024-09-12T13:23:45.859Z · LW(p) · GW(p)
List of Lethalities is not by any means a "one stop shop". If you don't agree with Eliezer on 90% of the relevant issues, it's completely unconvincing. For example, in that article he takes as an assumption that an AGI will be godlike level omnipotent, and that it will default to murderism.
Replies from: jeremy-gillen
↑ comment by Jeremy Gillen (jeremy-gillen) · 2024-09-12T14:17:46.439Z · LW(p) · GW(p)
If you don't agree with Eliezer on 90% of the relevant issues, it's completely unconvincing.
Of course. What kind of miracle are you expecting?
It also doesn't go into much depth on many of the main counterarguments. And doesn't go into enough detail that it even gets close to "logically sound". And it's not as condensed as I'd like. And it skips over a bunch of background. Still, it's valuable, and it's the closest thing to a one-post summary of why Eliezer is pessimistic about the outcome of AGI.
The main value of list of lethalities as a one-stop shop is that you can read it and then be able to point to roughly where you disagree with Eliezer. And this is probably what you want if you're looking for canonical arguments for AI risk. Then you can look further into that disagreement if you want.
Reading the rest of your comment very charitably: It looks like your disagreements are related to where AGI capability caps out, and whether default goals involve niceness to humans. Great!
If I read your comment more literally, my guess would be that you haven't read list of lethalities, or are happy misrepresenting positions you disagree with.
he takes as an assumption that an AGI will be godlike level omnipotent
He specifically defines a dangerous intelligence level as around the level required to design and build a nanosystem capable of building a nanosystem (or any of several alternative example capabilities) (In point 3). Maybe your omnipotent gods are lame.
and that it will default to murderism
This is false. Maybe you are referring to how there isn't any section justifying instrumental convergence? But it does have a link, and it notes that it's skipping over a bunch of background in that area (point -3). That would be a different assumption, but if you're deliberately misrepresenting it, then that might be the part you are misrepresenting.
↑ comment by Mitchell_Porter · 2024-09-12T18:35:45.232Z · LW(p) · GW(p)
David Chalmers asked for one last year, but there isn't one.
I might give the essence of the assumptions as something like: you can't beat superintelligence; intelligence is independent of value; and human survival and flourishing require specific complex values that we don't know how to specify.
But further pitfalls reveal themselves later, e.g. you may think you have specified human-friendly values correctly, but the AI may then interpret the specification in an unexpected way.
What is clearer than doom, is that creation of superintelligent AI is an enormous gamble, because it means irreversibly handing control of the world to something non-human. Eliezer's position is that you shouldn't do that unless you absolutely know what you're doing. The position of the would-be architects of superintelligent AI is that hopefully they can figure out everything needed for a happy ending, in the course of their adventure.
One further point I would emphasize, in the light of the last few years of experience with generative AI, is the unpredictability of the output of these powerful systems. You can type in a prompt, and get back a text, an image, or a video, which is like nothing you anticipated, and sometimes it is very definitely not what you want. "Generative superintelligence" has the potential to produce a surprising and possibly "wrong" output that will transform the world and be impossible to undo.
↑ comment by Seth Herd · 2024-09-12T17:13:08.308Z · LW(p) · GW(p)
I'd actually recommend Zvi's On A List of Lethalities [LW · GW] over the original, as a more readily understandable version that covers the same arguments.
↑ comment by Noosphere89 (sharmake-farah) · 2024-09-12T17:44:15.183Z · LW(p) · GW(p)
I think this post is an excellent distillation of the AI doomer argument, and it importantly helps me understand why people think AI alignment is going to be difficult:
https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult [LW · GW]
↑ comment by Raemon · 2024-09-12T18:21:06.261Z · LW(p) · GW(p)
I think AGI Safety From First Principles [? · GW] by Richard Ngo is probably good.
I think AGI Ruin: A List of Lethalities [LW · GW] is comprehensive but also sort of advanced and skips over the two basic bits.
Replies from: Zolmeister
↑ comment by Zolmeister · 2024-09-12T19:15:03.281Z · LW(p) · GW(p)
Superintelligence FAQ [LW · GW] [1] [LW(p) · GW(p)] as well.
↑ comment by TAG · 2024-09-14T10:42:52.898Z · LW(p) · GW(p)
What I have noticed is that while there are cogent overviews of AI safety that don't come to the extreme conclusion that we are all going to be killed by AI with high probability... and there are articles that do come to that conclusion without being at all rigorous or cogent... there aren't any that do both. From that I conclude there aren't any good reasons to believe in extreme AI doom scenarios, and you should disbelieve them. Others use more complicated reasoning, like "Yudkowsky is too intelligent to communicate his ideas to lesser mortals, but we should believe him anyway".
(See @DPiepgrass saying something similar [LW · GW] and of course getting downvoted).
@Mitchell_Porter supplies us with some examples of gappy arguments.
human survival and flourishing require specific complex values that we don't know how to specify
There's no evidence that "human values" are even a coherent entity, and no reason to believe that any AI of any architecture would need them.
But further pitfalls reveal themselves later, e.g. you may think you have specified human-friendly values correctly, but the AI may then interpret the specification in an unexpected way.
What is clearer than doom, is that creation of superintelligent AI is an enormous gamble, because it means irreversibly handing control of the world
Hang on a minute. Where does control of the world come from? Do we give it to the AI? Does it take it?
to something non-human. Eliezer's position is that you shouldn't do that unless you absolutely know what you're doing. The position of the would-be architects of superintelligent AI is that hopefully they can figure out everything needed for a happy ending, in the course of their adventure.
One further point I would emphasize, in the light of the last few years of experience with generative AI, is the unpredictability of the output of these powerful systems. You can type in a prompt, and get back a text, an image, or a video, which is like nothing you anticipated, and sometimes it is very definitely not what you want. "Generative superintelligence" has the potential to produce a surprising and possibly "wrong" output that will transform the world and be impossible to undo.
Current generative AI has no ability to directly affect anything. Where would that come from?
↑ comment by jam_brand · 2024-09-14T23:06:00.355Z · LW(p) · GW(p)
Perhaps see https://homosabiens.substack.com/p/deadly-by-default by Duncan Sabien [LW · GW].
↑ comment by Dagon · 2024-09-12T15:28:00.620Z · LW(p) · GW(p)
I don't know that "the AI doomer argument" is a coherent thing. At least I haven't seen an attempt to gather or summarize it in an authoritative way. In fact, it's not really an argument (as far as I've seen), it's somewhere between a vibe and a prediction.
For me, when I'm in a doomer mood, it's easy to give a high probability to the idea that humanity will be extinct fairly soon (it may take centuries to fully die out, but it will be on a fully irreversible path within 10-50 years, if it's not already). Note that this has been a common belief since long before AI was a thing - nuclear war/winter, ecological collapse, pandemic, etc. are pretty scary, and humans are fragile.
My optimistic "argument" is really not better-formed. Humans are clever, and when they can no longer ignore a problem, they solve it. We might lose 90%+ of the current global population, and a whole lot of supply-chain and tech capability, but that's really only a few doublings lost, maybe a millennium to recover, and maybe we'll be smarter/luckier in the next cycle.
From your perspective, what do you think the argument is, in terms of thesis and support?
Replies from: Seth Herd
↑ comment by Seth Herd · 2024-09-12T17:10:29.768Z · LW(p) · GW(p)
There are a lot of detailed arguments for doom by misaligned AGI.
Coming to grips with them, and with the counterarguments in actual proposals for aligning AGI and managing the political and economic fallout, is a herculean task. I feel it's taken me about two years of spending the majority of my work time on this to even have my head mostly around most of the relevant arguments. Having done that, my p(doom) is still roughly 50%, with wide uncertainty for unknown unknowns still to be revealed or identified.
So if someone isn't going to do that, I think the above summary is pretty accurate. Alignment and managing the resulting shifts in the world is not easy, but it's not impossible. Sometimes humans do amazing things. Sometimes they do amazingly stupid things. So again, roughly 50% from this much rougher method.
comment by james oofou (james-oofou) · 2024-10-10T20:09:29.217Z · LW(p) · GW(p)
Here's some near-future fiction:
In 2027 the trend that began in 2024 with OpenAI's o1 reasoning system has continued. The compute required to run AI is no longer negligible compared to the compute required to train it. Models reason over long periods of time. Their effective context windows are massive, they update their underlying models continuously, and they break tasks down into sub-tasks to be carried out in parallel. The base LLM they are built on is two generations ahead of GPT-4.
These systems are language model agents. They are built with self-understanding and can be configured for autonomy. They constitute proto-AGI: artificial intelligences that can perform much but not all of the intellectual work that humans can do (although even what these AIs can do, they cannot necessarily do more cheaply than a human could).
In 2029 people have spent over a year working hard to improve the scaffolding around proto-AGI to make it as useful as possible. Presently, the next generation of LLM foundational model is released. Now, with some further improvements to the reasoning and learning scaffolding, this is true AGI. It can perform any intellectual task that a human could (although it's very expensive to run at full capacity). It is better at AI research than any human. But it is not superintelligence. It is still controllable and its thoughts are still legible. So, it is put to work on AI safety research. Of course, by this point much progress has already been made on AI safety - but it seems prudent to get the AGI to look into the problem and get its go-ahead before commencing with the next training run. After a few months the AI declares it has found an acceptable safety approach. It spends some time on capabilities research then the training run for the next LLM begins.
In 2030 the next LLM is completed, and improved scaffolding is constructed. Now human-level AI is cheap, better-than-human-AI is not too expensive, and the peak capabilities of the AI are almost alien. For a brief period of time the value of human labour skyrockets, workers acting as puppets as the AI instructs them over video-call to do its bidding. This is necessary due to a major robotics shortfall. Human puppet-workers work in mines, refineries, smelters, and factories, as well as in logistics, optics, and general infrastructure. Human bottlenecks need to be addressed. This takes a few months, but the ensuing robotics explosion is rapid and massive.
2031 is the year of the robotics explosion. The robots are physically optimised for their specific tasks, coordinate perfectly with other robots, are able to sustain peak performance, do not require pay, and are controlled by cleverer-than-human minds. These are all multiplicative factors for the robots' productivity relative to human workers. Most robots are not humanoid, but let's say a humanoid robot would cost $x. Per $x, robots in 2031 are 10,000 times as productive as a human. This might sound like a ridiculously high number: one robot the equivalent of 10,000 humans? But let's do some rough math:
Advantage | Productivity Multiplier (relative to skilled human)
Physically optimised for their specific tasks | 5
Coordinate perfectly with other robots | 10
Able to sustain peak performance | 5
Do not require pay | 2
Controlled by cleverer-than-human minds | 20
5*10*5*2*20 = 10,000
Suppose that a human can construct one robot per year (taking into account mining and all the intermediary logistics and manufacturing). With robots 10^4 times as productive as humans, each robot will construct an average of 10^4 robots per year. This is the robotics explosion. By the end of the year there will be 10^11 robots (more precisely, a number of robots that is cost-equivalent to 10^11 humanoid robots).
By 2032 there are 10^11 robots, each with the productivity of 10^4 skilled human workers. That is a total productivity equivalent to 10^15 skilled human workers. This is roughly 10^5 times the productivity of humanity in 2024. At this point trillions of advanced processing units have been constructed and are online. Industry expands through the Solar System. The number of robots continues to balloon. The rate of research and development accelerates rapidly. Human mind upload is achieved.
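Here's a quick sanity check of that arithmetic (a rough back-of-the-envelope sketch, not a model; the ~8 billion world population figure is an assumption I'm adding, and the robot count is taken as stated rather than derived from a growth curve):

```python
# Rough sanity check of the robotics-explosion figures above.

# Productivity multipliers from the table above
multipliers = {
    "physically optimised for their specific tasks": 5,
    "coordinate perfectly with other robots": 10,
    "able to sustain peak performance": 5,
    "do not require pay": 2,
    "controlled by cleverer-than-human minds": 20,
}

per_robot = 1
for m in multipliers.values():
    per_robot *= m
print(per_robot)              # 10000 skilled-human equivalents per $x of robot

robots_2031 = 1e11            # robot count as stated (humanoid-cost equivalents)
total = robots_2031 * per_robot
print(f"{total:.0e}")         # 1e+15 skilled-human equivalents

humans_2024 = 8e9             # assumed 2024 world population, ~8 billion
print(total / humans_2024)    # ~1.25e5, i.e. roughly 10^5 times humanity's 2024 workforce
```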
Replies from: Seth Herd
↑ comment by Seth Herd · 2024-10-11T17:51:35.180Z · LW(p) · GW(p)
This sounds highly plausible. There are some other dangers your scenario leaves out, which I tried to explore in If we solve alignment, do we die anyway? [LW · GW]
comment by james oofou (james-oofou) · 2024-12-11T04:03:50.838Z · LW(p) · GW(p)
Here is an experiment that demonstrates the unlikelihood of one potential AI outcome.
The outcome shown to be unlikely:
Aligned ASI is achieved sometime in the next couple of decades and each person is apportioned a sizable amount of compute to do with as they wish.
The experiment:
I have made a precommitment that I will, conditional on the outcome described above occurring, simulate billions of lives for myself - each indistinguishable from the life I have lived so far. By "indistinguishable" I do not necessarily mean identical (which might be impossible or expensive). All that is necessary is that each has similar amounts of suffering, scale, detail, imminent AGI, etc. I'll set up these simulations so that in each of these simulated lives I will be transported at 4:00 pm on Dec 11 '24 to a virtual personal utopia. Having precommitted to simulating these worlds, I should now expect to be transported into a personal utopia in three minutes' time if this future is likely. And if I am not transported into a personal utopia, I should conclude that this future is unlikely.
Let's see what happens...
It's 4:00 pm and I didn't get transported into utopia.
So, this outcome is unlikely.
QED
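To make the update explicit, here is a rough Bayesian sketch of the reasoning (my own formalisation, not a canonical one; the 50% prior and the billion-copy count are placeholder numbers):

```python
# Sketch of the update behind the experiment (an assumed formalisation).
# O = the outcome above occurs (aligned ASI, personal compute budget) and the
# precommitment is honoured, so N simulated copies of this life exist alongside
# the one original, and every simulated copy is transported to utopia at 4:00 pm.

def posterior_given_not_transported(prior: float, n_copies: float) -> float:
    """P(O | not transported at 4:00 pm), assuming I am equally likely to be any
    observer whose experience up to 4:00 pm matches mine."""
    # Under O, only the single original (out of n_copies + 1 observers) is not transported.
    p_not_transported_given_o = 1 / (n_copies + 1)
    # If O is false, there is just the one original, who is never transported.
    p_not_transported_given_not_o = 1.0
    numerator = prior * p_not_transported_given_o
    return numerator / (numerator + (1 - prior) * p_not_transported_given_not_o)

# Example: even a 50% prior collapses to ~1e-9 once 4:00 pm passes uneventfully,
# if a billion indistinguishable simulated lives would have been run under O.
print(posterior_given_not_transported(prior=0.5, n_copies=1e9))
```

On these placeholder numbers, not being transported is overwhelming evidence against the outcome, which is what the QED above is gesturing at.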
Potential weak points
I do see a couple of potential weak points in the logic of this experiment. Firstly, it might be the case that I'll have reason to simulate many indistinguishable lives in which I do not get transported to utopia, which would throw off the math. But I can't see why I'd choose to create simulations of myself in not-optimally-enjoyable lives unless I had good reason to, so I don't think that objection holds.[1]
The other potential weak point is that perhaps I wouldn't be willing to pay the opportunity cost of billions of years of personal utopia. Although billions of years of simulation is just a tiny proportion of my compute budget, it's still billions of years that could otherwise have been spent in perfect virtual utopia. I think this is potentially a serious issue with the argument, although I will note that I don't actually have to simulate an entire life for the experiment to work, just a few minutes around 4:00 pm on Dec 11 '24, minutes which were vaguely enjoyable. To address this objection, the experiment could be carried out while euphoric (since the opportunity cost would then be lower).
[1] Perhaps, as a prank response to this post, someone could use some of their compute budget to simulate lives in which I don't get transported to utopia. But I think that there would be restrictions in place against running other people as anything other than p-zombies.