Dario Amodei — Machines of Loving Grace

post by Matrice Jacobine · 2024-10-11T21:43:31.448Z · LW · GW · 19 comments

This is a link post for https://darioamodei.com/machines-of-loving-grace#basic-assumptions-and-framework


I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks. Because of this, people sometimes draw the conclusion that I’m a pessimist or “doomer” who thinks AI will be mostly bad or dangerous. I don’t think that at all. In fact, one of my main reasons for focusing on risks is that they’re the only thing standing between us and what I see as a fundamentally positive future. I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right. Of course no one can know the future with any certainty or precision, and the effects of powerful AI are likely to be even more unpredictable than past technological changes, so all of this is unavoidably going to consist of guesses. But I am aiming for at least educated and useful guesses, which capture the flavor of what will happen even if most details end up being wrong. I’m including lots of details mainly because I think a concrete vision does more to advance discussion than a highly hedged and abstract one.

19 comments

Comments sorted by top scores.

comment by ryan_greenblatt · 2024-10-12T00:00:16.768Z · LW(p) · GW(p)

This essay doesn't seem very good in terms of making accurate predictions about the future.

I wish Dario answered questions like "if we let things go as fast as possible, what will the curve of energy production or industrial production look like" or "how plausible is it that AIs can quickly bootstrap to nanotech and how impressive would this nanotech be". I think these questions are upstream of all of the things Dario tries to address and might imply radically different conclusions.

For instance, if you can quickly multiply energy production by a factor of 1e15 (Dyson sphere level) and use nanotech to make much better computers (say >1e6 times more efficient and vastly higher quantities to consume most energy), then a huge number of previously intractable computational problems can become tractable just via scale.

(If we condition on AIs which beat top human experts in 2027, I expect >1e10 additional energy production by 2035 if humanity (or whoever is in control) decides to go as fast as possible.)

More generally, Dario appears to assume that for 5-10 years after powerful AI we'll just have a million AIs which are a bit smarter than the smartest humans and perhaps 100x faster, rather than AIs which are radically smarter, faster, and more numerous than humans. I don't see any argument that AI progress will stop at the level of top humans rather than continuing much further. This is even putting aside the potential for vastly higher scale with increased energy production.

Dario tries to argue against "AIs as a tool to analyze data" and notes that AIs will match or exceed human researchers, but seems to neglect that AIs could become vastly more powerful beyond this and greatly expand available compute.

If Dario thinks that progress will cap out at some level due to humans intentionally slowing down, it seems good to say this.

Replies from: Lanrian, T3t, leon-lang, anaguma, sharmake-farah
comment by Lukas Finnveden (Lanrian) · 2024-10-13T22:37:05.094Z · LW(p) · GW(p)

More generally, Dario appears to assume that for 5-10 years after powerful AI we'll just have a million AIs which are a bit smarter than the smartest humans and perhaps 100x faster, rather than AIs which are radically smarter, faster, and more numerous than humans. I don't see any argument that AI progress will stop at the level of top humans rather than continuing much further.

Well, there's footnote 10:

Another factor is of course that powerful AI itself can potentially be used to create even more powerful AI. My assumption is that this might (in fact, probably will) occur, but that its effect will be smaller than you might imagine, precisely because of the “decreasing marginal returns to intelligence” discussed here. In other words, AI will continue to get smarter quickly, but its effect will eventually be limited by non-intelligence factors, and analyzing those is what matters most to the speed of scientific progress outside AI.

So his view seems to be that even significantly smarter AIs just wouldn't be able to accomplish that much more than what he's discussing here, such that they're not very relevant.

(I disagree. Maybe there are some hard limits here, but maybe there aren't. For most of the bottlenecks that Dario discusses, I don't know how you become confident that there are 0 ways to speed them up or circumvent them. We're talking about putting in many times more intellectual labor than our whole civilization has spent on any topic to date.)

comment by RobertM (T3t) · 2024-10-12T00:23:23.079Z · LW(p) · GW(p)

Yeah, the essay (I think correctly) notes that the most significant breakthroughs in biotech come from the small number of "broad measurement tools or techniques that allow precise but generalized or programmable intervention", which "are so powerful precisely because they cut through intrinsic complexity and data limitations, directly increasing our understanding and control".

Why, then, are such systems limited to the biological domain?  Even if it does end up being true that scientific and technological progress is substantially bottlenecked on real-life experimentation, where even AIs that can extract many more bits from the same observations than humans still suffer from substantial serial dependencies with no meaningful "shortcuts", it still seems implausible that we don't get to nanotech relatively quickly, if it's physically realizable.  And then that nanotech unblocks the rate of experimentation.  (If you're nanotech-skeptical, human-like robots seem sufficient as actuators to speed up real-life experimentation by at least an order of magnitude compared to needing to work through humans, and work on those is making substantial progress.)

If Dario thinks that progress will cap out at some level due to humans intentionally slowing down, it seems good to say this.

Footnote 2 maybe looks like a hint in this direction if you squint, but Dario spent a decent chunk of the essay bracketing outcomes he thought were non-default and would need to be actively steered towards, so it's interesting that he didn't explicitly list those (non-tame futures) as a type of outcome that he'd want to actively steer away from.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-10-12T01:26:31.260Z · LW(p) · GW(p)

My answer to this question of why Dario thought this:

Yeah, the essay (I think correctly) notes that the most significant breakthroughs in biotech come from the small number of "broad measurement tools or techniques that allow precise but generalized or programmable intervention", which "are so powerful precisely because they cut through intrinsic complexity and data limitations, directly increasing our understanding and control".

Why, then, are such systems limited to the biological domain?

is that this is the area Dario has the most experience in, being a biologist, and he freely admits to having limited expertise here:

I am fortunate to have professional experience in both biology and neuroscience, and I am an informed amateur in the field of economic development, but I am sure I will get plenty of things wrong. One thing writing this essay has made me realize is that it would be valuable to bring together a group of domain experts (in biology, economics, international relations, and other areas) to write a much better and more informed version of what I’ve produced here. It’s probably best to view my efforts here as a starting prompt for that group.

I also believe this set of reasons comes into play here:

  • Avoid grandiosity. I am often turned off by the way many AI risk public figures (not to mention AI company leaders) talk about the post-AGI world, as if it’s their mission to single-handedly bring it about like a prophet leading their people to salvation. I think it’s dangerous to view companies as unilaterally shaping the world, and dangerous to view practical technological goals in essentially religious terms.
  • Avoid “sci-fi” baggage. Although I think most people underestimate the upside of powerful AI, the small community of people who do discuss radical AI futures often does so in an excessively “sci-fi” tone (featuring e.g. uploaded minds, space exploration, or general cyberpunk vibes). I think this causes people to take the claims less seriously, and to imbue them with a sort of unreality. To be clear, the issue isn’t whether the technologies described are possible or likely (the main essay discusses this in granular detail)—it’s more that the “vibe” connotatively smuggles in a bunch of cultural baggage and unstated assumptions about what kind of future is desirable, how various societal issues will play out, etc. The result often ends up reading like a fantasy for a narrow subculture, while being off-putting to most people.
comment by Leon Lang (leon-lang) · 2024-10-12T13:12:30.113Z · LW(p) · GW(p)

My impression is that Dario (somewhat intentionally?) plays the game of saying things he believes to be true about the 5-10 years after AGI, conditional on AI development not continuing beyond that point.

What happens after those 5-10 years, or if AI gets even vastly smarter? That seems out of scope for the article. I assume he's doing that since he wants to influence a specific set of people, maybe politicians, to take a radical future more seriously than they currently do. Once a radical future is more viscerally clear in a few years, we will likely see even more radical essays. 

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-13T07:51:29.920Z · LW(p) · GW(p)

It's tricky to pin down from this what he believes at a gut level versus what he thinks it expedient to publish.

Consider this passage and its footnote:

Thus, we should imagine a picture where intelligence is initially heavily bottlenecked by the other factors of production, but over time intelligence itself increasingly routes around the other factors, even if they never fully dissolve (and some things like physical laws are absolute)[10]. The key question is how fast it all happens and in what order.

10 Another factor is of course that powerful AI itself can potentially be used to create even more powerful AI. My assumption is that this might (in fact, probably will) occur, but that its effect will be smaller than you might imagine, precisely because of the “decreasing marginal returns to intelligence” discussed here. In other words, AI will continue to get smarter quickly, but its effect will eventually be limited by non-intelligence factors, and analyzing those is what matters most to the speed of scientific progress outside AI.

The two implied assumptions I note relevant to this:

  1. AI will only get a bit smarter (2-3x) than the smartest human, not a lot smarter (100x).

  2. Algorithmic advances won't make it vastly cheaper to train AI. The picture is one of datacenters with oversight and compute governance, control of AGI by a small number of responsible parties, and defense-dominant technology outcomes: an imagined future without radical changes in world governments, but also with everything staying neat and tidy and controlled.

comment by anaguma · 2024-10-12T07:08:11.958Z · LW(p) · GW(p)

Maybe he thinks that much faster technological progress would cause social problems and thus wouldn’t be implemented by an aligned AI, even if it were possible. Footnote 2 points at this:

“I do anticipate some minority of people’s reaction will be ‘this is pretty tame’… But more importantly, tame is good from a societal perspective. I think there’s only so much change people can handle at once, and the pace I’m describing is probably close to the limits of what society can absorb without extreme turbulence.”

A separate part of the introduction argues that causing this extreme societal turbulence would be unaligned:

“Many things cannot be done without breaking laws, harming humans, or messing up society. An aligned AI would not want to do these things (and if we have an unaligned AI, we’re back to talking about risks). Many human societal structures are inefficient or even actively harmful, but are hard to change while respecting constraints like legal requirements on clinical trials, people’s willingness to change their habits, or the behavior of governments.”

comment by Noosphere89 (sharmake-farah) · 2024-10-12T00:23:25.218Z · LW(p) · GW(p)

Yeah, I think this is a big flaw of the current analysis; right now it seems to assume more physical-world limitations than I expect to actually hold.

For example, this:

(say >1e6 times more efficient and vastly higher quantities to consume most energy),

will be achieved via reversible computation, since we will likely have hit the Landauer limit for non-reversible computation by then. In principle there is basically no limit to how far you can optimize reversible computation, which leads to massive energy savings and means you don't have to consume as much energy as current AIs or brains do today.

And yeah, I think far more radically positive futures are possible assuming AI alignment has been solved.

To give credit to Dario Amodei though, I like that he thought about positive scenarios for AI at all, because I view the creation of positive stories about AI as a seriously undersupplied public good, one that helps keep us from being eternally pessimistic without any grounding in reality.

Replies from: ryan_greenblatt, LosPolloFowler
comment by ryan_greenblatt · 2024-10-12T01:20:13.338Z · LW(p) · GW(p)

Actually >1e6 was my conservative guess even without reversible computing.

Claude claimed that the Landauer limit is 2e8 times more efficient than current GPUs (A100). (I didn't check Claude; I just asked 3.5 Sonnet "How far are modern GPUs from theoretical energy efficiency for non-reversible computing?" and got this answer.) Edit: if someone knows the actual answer, please let me know.

So, even without reversible computing, you can go very far. Idk how much further reversible computing gets you, though I think it is probably possible.

Replies from: adam-jermyn
comment by Adam Jermyn (adam-jermyn) · 2024-10-16T00:06:45.708Z · LW(p) · GW(p)

I get 1e7 using 16 bit-flips per bfloat16 operation, 300K operating temperature, and 312Tflop/s (from Nvidia's spec sheet). My guess is that this is a little high because a float multiplication involves more operations than just flipping 16 bits, but it's the right order-of-magnitude.
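
For anyone who wants to check the arithmetic, here is a minimal back-of-envelope sketch (the ~400 W board power figure for the A100 SXM is my assumption; the comment above doesn't say which power figure it used):

```python
import math

# Back-of-envelope: how far is an A100 from the Landauer limit for bfloat16 ops?
# Assumed numbers (not stated in the comment above): ~400 W board power for the
# A100 SXM, 312 Tflop/s bfloat16 from Nvidia's spec sheet, 16 bit erasures per
# operation, and a 300 K operating temperature.
k_B = 1.380649e-23                   # Boltzmann constant, J/K
T = 300.0                            # operating temperature, K
bits_per_op = 16                     # treat one bfloat16 op as 16 bit erasures
landauer_j_per_op = bits_per_op * k_B * T * math.log(2)    # ~4.6e-20 J

power_w = 400.0                      # assumed A100 SXM board power, W
ops_per_s = 312e12                   # bfloat16 throughput, op/s
actual_j_per_op = power_w / ops_per_s                      # ~1.3e-12 J

print(f"Landauer energy per op: {landauer_j_per_op:.2e} J")
print(f"Actual energy per op:   {actual_j_per_op:.2e} J")
print(f"Gap: ~{actual_j_per_op / landauer_j_per_op:.1e}x")  # ~2.8e7, i.e. order 1e7
```

Using the 250 W PCIe figure instead gives roughly 2e7; either way the gap lands at the ~1e7 order of magnitude, consistent with the estimate above and within an order of magnitude of the 2e8 figure Claude gave.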

comment by Stephen Fowler (LosPolloFowler) · 2024-10-12T12:16:05.837Z · LW(p) · GW(p)

will be achieved via reversible computation, since we will likely have hit the Landauer limit for non-reversible computation by then. In principle there is basically no limit to how far you can optimize reversible computation, which leads to massive energy savings and means you don't have to consume as much energy as current AIs or brains do today.

With respect, I believe this to be overly optimistic about the benefits of reversible computation. 

Reversible computation means you aren't erasing information, so you don't lose energy in the form of heat (per Landauer[1][2]). But if you don't erase information, you are faced with the issue of where to store it.

If you are performing a series of computations and only have a finite memory to work with, you will eventually need to reinitialise your registers and empty your memory, at which point you incur the energy cost that you had been trying to avoid. [3] 

Epistemics:
I'm quite confident (95%+) that the above is true. (Edit: RogerDearnaley's comment has convinced me I was overconfident.) Any substantial errors would surprise me.

I'm less confident in the footnotes.

  1. ^

  2. ^

    A cute, non-rigorous intuition for Landauer's Principle:
    The process of losing track of (deleting) 1 bit of information means your uncertainty about the state of the environment has increased by 1 bit. You must see entropy increase by at least 1 bit's worth of entropy.

    Proof:
    Rearrange the Landauer Limit, $E \geq k_B T \ln 2$, to

    $$\frac{E}{T} \geq k_B \ln 2$$

    Now, when you add a small amount of heat to a system, the change in entropy is given by:

    $$\Delta S = \frac{\delta Q}{T}$$

    But the $E$ occurring in Landauer's formula is not the total energy of a system, it is a small amount of energy required to delete the information. When it all ends up as heat, we can replace it with $\delta Q$ and we have:

    $$\Delta S \geq k_B \ln 2$$

    Compare this expression with the physicist's definition of entropy. The entropy of a system is a scaling factor, $k_B$, times the logarithm of the number of micro-states that the system might be in, $\Omega$:

    $$S = k_B \ln \Omega$$

    The choice of units obscures the meaning of the final term. $k_B \ln 2$ converted from nats to bits is just 1 bit.

  3. ^

    Splitting hairs, some setups will allow you to delete information with a reduced or zero energy cost, but the process is essentially just "kicking the can down the road". You will incur the full cost during the process of re-initialisation. 

    For details, see equation (4) and fig (1) of Sagawa, Ueda (2009).

Replies from: roger-d-1
comment by RogerDearnaley (roger-d-1) · 2024-10-13T02:00:06.179Z · LW(p) · GW(p)

Reversible computation means you aren't erasing information, so you don't lose energy in the form of heat (per Landauer[1][2]). But if you don't erase information, you are faced with the issue of where to store it.

If you are performing a series of computations and only have a finite memory to work with, you will eventually need to reinitialise your registers and empty your memory, at which point you incur the energy cost that you had been trying to avoid. [3] 

Generally, reversible computation allows you to avoid wasting energy on deleting the memory used for intermediate answers, and to pay that cost only for final results. It does require that you have enough memory to store all those intermediate answers until you finish the calculation and then run it in reverse. If you don't have that much memory, you can divide your calculation into steps, connected by the final results of each step being fed into the next, and save the energy cost of all the intermediate results within each step, paying it only for data passed from one step to the next or output from the last step. Or, for a 4x slowdown rather than the usual 2x slowdown for reversible computation, you can have two sizes of step, and have some intermediate results that last only during a small step, and others that are retained for a large step before being uncomputed.

Memory/energy loss/speed trade-off management for reversible computation is a little more complex than conventional memory management, but is still basically simple, and for many computational tasks you can achieve excellent tradeoffs.
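
To make the compute-then-uncompute pattern concrete, here is a toy sketch (a minimal illustration, not a physics model; the function name and the particular steps are just mine):

```python
# Toy sketch of Bennett-style compute / copy-out / uncompute (not a physics model).
# Each forward step fills a fresh "ancilla" register with an intermediate answer;
# after the final answer is copied out, the inverse steps empty those registers
# again, so only the copied-out result would ever need to be erased.

def bennett_compute(x, steps, inverses):
    tape = []                         # ancilla registers holding intermediate answers
    value = x
    for f in steps:                   # forward pass: compute and remember
        value = f(value)
        tape.append(value)
    result = tape[-1]                 # copy out only the final answer
    for g in reversed(inverses):      # reverse pass: uncompute the intermediates
        tape.pop()
        value = g(value)
    assert value == x and not tape    # working memory is back to its blank initial state
    return result

# Usage: two invertible steps; roughly 2x the work of the forward pass alone.
print(bennett_compute(5, [lambda v: v + 3, lambda v: v * 2],
                         [lambda v: v - 3, lambda v: v // 2]))  # -> 16
```

Splitting a long computation into chunks, as described above, just applies this pattern within each chunk and keeps only the boundary values passed between chunks.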

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-10-13T04:29:04.943Z · LW(p) · GW(p)

Yeah, I was thinking of uncomputing strategies that reverse the computation from an error-prone state back to an error-free state without consuming energy or doing work, and it turns out that you can uncompute a result by running the computation backwards rather than deleting it, which is what would release waste heat.

comment by RobertM (T3t) · 2024-10-12T07:44:10.401Z · LW(p) · GW(p)

Credit where credit is due: this is much better in terms of sharing one's models than one could say of Sam Altman, in recent days. 

As noted above the footnotes, many people at Anthropic reviewed the essay.  I'm surprised that Dario would hire so many people he thinks need to "touch grass" (because they think the scenario he describes in the essay sounds tame), as I'm pretty sure that describes a very large percentage of Anthropic's first ~150 employees (certainly over 20%, maybe 50%).

My top hypothesis is that this is a snipe meant to signal Dario's (and Anthropic's) factional alliance with Serious People; I don't think Dario actually believes that "less tame" scenarios are fundamentally implausible[1].  Other possibilities that occur to me, with my not very well considered probability estimates:

  • I'm substantially mistaken about how many early Anthropic employees think "less tame" outcomes are even remotely plausible (20%), and Anthropic did actively try to avoid hiring people with those models early on (1%).
  • I'm not mistaken about early employee attitudes, but Dario does actually believe AI is extremely likely to be substantially transformative, and extremely unlikely to lead to the "sci-fi"-like scenarios he derides (20%). Conditional on that, either he didn't think it mattered whether his early employees had those models (20%), or he might have slightly preferred they didn't, all else equal, but wasn't that fussed about it compared to recruiting strong technical talent (60%).

I'm just having a lot of trouble reconciling what I know of the beliefs of Anthropic employees, and the things Dario says and implies in this essay.  Do Anthropic employees who think less tame outcomes are plausible believe Dario when he says they should "touch grass"?  If you don't feel comfortable answering that question in public, or can't (due to NDA), please consider whether this is a good situation to be in.

  1. ^

    He has not, as far as I know, deigned to offer any public argument on the subject.

Replies from: Benito, RavenclawPrefect
comment by Ben Pace (Benito) · 2024-10-14T21:57:16.136Z · LW(p) · GW(p)

Credit where credit is due: this is much better in terms of sharing one's models than one could say of Sam Altman, in recent days. 

I mean I guess this is literally true, but to be clear I think it's broadly not much less deceptive (edit: or at least, 'filtered [LW · GW]').

I remind you of this Thiel quote [LW · GW]:

I think the pro-AI people in Silicon Valley are doing a pretty bad job on, let’s say, convincing people that it’s going to be good for them, that it’s going to be good for the average person, that it’s going to be good for our society. And if it all ends up being of some version where humans are headed toward the glue-factory like a horse... man, that probably makes me want to become a luddite too.

I think Amodei did not ask himself "What about my models of the situation would be most relevant to people trying to understand the world and the AI industry?" but rather "What about my models of the situation would be most helpful in building a positive narrative for AI?" I imagine this is roughly the same algorithm that Altman is running, but Amodei is a much stronger intellectual, so he is able to write an essay this detailed and thoughtful.

Replies from: adam-jermyn, sharmake-farah
comment by Adam Jermyn (adam-jermyn) · 2024-10-16T00:11:27.619Z · LW(p) · GW(p)

He does start out by saying he thinks & worries a lot about the risks (first paragraph):

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks... I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

He then explains (second paragraph) that the essay is meant to sketch out what things could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.

I think this is a coherent thing to do?

Replies from: Benito
comment by Ben Pace (Benito) · 2024-10-18T00:24:11.697Z · LW(p) · GW(p)

My current belief is that this essay is optimized to be understandable by a much broader audience than any comparable public writing from Anthropic on extinction-level risk. 

For instance, did you know that the word 'extinction' doesn't appear anywhere on Anthropic's or Dario's websites? Nor do 'disempower' or 'disempowerment'. The words 'existential' and 'existentially' only come up three times: when describing the work of an external organization (ARC), in one label in a paper, and in one mention related to Constitutional AI. In their place they always talk about 'catastrophic' risk, which of course for most readers spans a range many orders of magnitude less serious (e.g. damages of $100M).

If Amodei doesn't believe that existential threats are legitimate, then I think there are many people at his organization who have gone there on the trust that it is indeed a primary concern of his, and who will be betrayed in that. If he does, how has he managed to ensure basically no discussion of it on the company website or in its research, while publishing a long narrative of how AI can help with "poverty", "inequality", "peace", "meaning", "health", and other broad positives? This seems to me very likely to be heavily filtered sharing of his models and beliefs, with highly distortionary impacts on the rest of the world's models of AI in the positive direction. That is what you would expect from a $10B+ company that sells AI products, rather than (say) a situation where Amodei merely got to things out of order and will of course soon follow up with just as thorough an account of his models of the existential threats he believes are on the horizon, and where the rest of the organization just never thought to write about it in their various posts and papers.

(I would be interested in a link to whatever the best and broadly-readable piece by Anthropic or its leadership on existential risk from AI is. Some chance it is better than I am modeling it as. I have not listened to any of Amodei's podcasts; perhaps he speaks more straightforwardly there.)

Added: As a small contrast, OpenAI mentions extinction and human disempowerment directly, in the 2nd paragraph on their Superalignment page, and an OpenAI blogpost by Altman links to a Karnofsky Cold Takes piece titled "AI Could Defeat All Of Us Combined". Altman also wrote two posts in 2014 on the topic of existential threats from Machine Intelligence. I would be interested to know the most direct things that Amodei has published about the topic.

comment by Noosphere89 (sharmake-farah) · 2024-10-15T02:58:34.716Z · LW(p) · GW(p)

This is explainable by the fact that the essay is a weird mix: both a call to action to bring about a positive vision of an AI future, and a set of claims/predictions about some important things he thinks AI could do.

He is both doing important prediction/model sharing in the essay, and also shaping the prediction/scenario to make the positive vision more likely to come true (more cynically, one could argue that it's merely a narrative optimized for consumption by the broader public, in which case the essay isn't really meant to be truth-tracking).

It's a confusing essay, ultimately.

comment by Drake Thomas (RavenclawPrefect) · 2024-10-15T23:38:07.217Z · LW(p) · GW(p)

(I work at Anthropic.) My read of the "touch grass" comment is informed a lot by the very next sentences in the essay:

But more importantly, tame is good from a societal perspective. I think there's only so much change people can handle at once, and the pace I'm describing is probably close to the limits of what society can absorb without extreme turbulence.

which I read as saying something like "It's plausible that things could go much faster than this, but as a prediction about what will actually happen, humanity as a whole probably doesn't want things to get incredibly crazy so fast, and so we're likely to see something tamer." I basically agree with that.

Do Anthropic employees who think less tame outcomes are plausible believe Dario when he says they should "touch grass"?

FWIW, I don’t read the footnote as saying “if you think crazier stuff is possible, touch grass” - I read it as saying “if you think the stuff in this essay is ‘tame’, touch grass”. The stuff in this essay is in fact pretty wild! 

That said, I think I have historically underrated questions of how fast things will go given realistic human preferences about the pace of change, and that I might well have updated more in the above direction if I'd chatted with ordinary people about what they want out of the future, so "I needed to touch grass" isn't a terrible summary. But IMO believing “really crazy scenarios are plausible on short timescales and likely on long timescales” is basically the correct opinion, and to the extent the essay can be read as casting shade on such views it's wrong to do so. I would have worded this bit of the essay differently.

Re: honesty and signaling, I think it's true that this essay's intended audience is not really the crowd that's already gamed out Mercury disassembly timelines, and its focus is on getting people up to shock level 2 or so rather than SL4, but as far as I know everything in it is an honest reflection of what Dario believes. (I don't claim any special insight into Dario's opinions here, just asserting that nothing I've seen internally feels in tension with this essay.) Like, it isn't going out of its way to talk about the crazy stuff, but I don't read that omission as dishonest.

For my own part:

  • I think it's likely that we'll get nanotech, von Neumann probes, Dyson spheres, computronium planets, acausal trade, etc in the event of aligned AGI.
  • Whether that stuff happens within the 5-10y timeframe of the essay is much less obvious to me - I'd put it around 30-40% odds conditional on powerful AI from roughly the current paradigm, maybe?
  • In the other 60-70% of worlds, I think this essay does a fairly good job of describing my 80th percentile expectations (by quality-of-outcome rather than by amount-of-progress).
  • I would guess that I'm somewhat more Dyson-sphere-pilled than Dario.
  • I’d be pretty excited to see competing forecasts for what good futures might look like! I found this essay helpful for getting more concrete about my own expectations, and many of my beliefs about good futures look like “X is probably physically possible; X is probably usable-for-good by a powerful civilization; therefore probably we’ll see some X” rather than having any kind of clear narrative about how the path to that point looks.