The convergent dynamic we missed

post by Remmelt (remmelt-ellen) · 2023-12-12T23:19:01.920Z · LW · GW · 2 comments


comment by Tamsin Leake (carado-1) · 2023-12-14T07:38:36.394Z · LW(p) · GW(p)

I still do not agree with your position, but thanks to this post I think I at least understand it better than I did before. I think my core disagreements are:

> Here is the catch: AGI components interacting to maintain and replicate themselves are artificial. Their physical substrate is distinct from our organic human substrate.

That needn't be the case. If all of the other arguments in this post were to hold, any AI or AI-coalition (whether aligned to us or not) which has taken over the world could simply notice "oh no, if I keep going I'll be overtaken by the effects described in Remmelt's post!" and then decide to copy itself onto biological computing or nanobots or whatever other strange options it can think of. An aligned AI would be even more inclined to move towards such a substrate if you're correct that otherwise humans would die, because it wants to avoid that.

The more general piece of solution space I want to point to here is: "if you think there's a way for eight billion uncoordinated human minds, running on messy human brains inside of industrial civilization, to survive, why couldn't an aligned superintelligent AI at the very least {implement/reuse} a copy of what human civilization is doing, and get robustness that way?" (though I expect that it could come up with something much better).

> Another argument you may have heard is that the top-down intelligent engineering by goal-directed AGI would beat the bottom-up selection happening through this intelligent machinery.
>
> That argument can be traced back to Eliezer Yudkowsky's sequence The Simple Math of Evolution.

I'm pretty sure I already believed this before reading any Yudkowsky, so I'll make my own argument here.

Intelligent engineering can already be observed to work much faster than selection effects. It also seems straightforward to me that explicit planning to maximize a particular utility function would steer the world towards what it wants a lot faster than selection effects would. I could expand on this point if you disagree, but I'd be really surprised by that.

And intelligence itself can be very robust to selection effects. Homomorphic encryption, checksums, and things like {a large population of copies of itself repeatedly checking each other's entire software state and deactivating (via eg killswitch) any instance that has been corrupted} are examples of technologies an AI can use to make its software robust to hardware change. It would take selection effects exponential time to get even just one bit of corruption to stay in the system, so it is not difficult for the superintelligent AI to ensure that approximately zero copies of itself ever get robustly corrupted, from now until the heat death of the universe.
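As a toy sketch of the cross-checking scheme I have in mind (purely illustrative, not anything from your post; the names `Replica`, `cross_check` and the numbers are made up):

```python
import hashlib
import random

def software_hash(state: bytes) -> str:
    """Checksum over an instance's entire software state."""
    return hashlib.sha256(state).hexdigest()

class Replica:
    def __init__(self, state: bytes):
        self.state = bytearray(state)
        self.active = True

    def maybe_corrupt(self, p_flip: float) -> None:
        """Model hardware-level corruption as a rare random bit flip."""
        if random.random() < p_flip:
            i = random.randrange(len(self.state) * 8)
            self.state[i // 8] ^= 1 << (i % 8)

def cross_check(replicas, reference_hash: str) -> None:
    """Deactivate (killswitch) any active replica whose checksum
    no longer matches the shared reference."""
    for r in replicas:
        if r.active and software_hash(bytes(r.state)) != reference_hash:
            r.active = False

original = b"model weights + code"
reference = software_hash(original)
replicas = [Replica(original) for _ in range(1000)]
for _ in range(100):  # repeated rounds of corruption plus mutual checking
    for r in replicas:
        r.maybe_corrupt(p_flip=1e-3)
    cross_check(replicas, reference)

# No corrupted replica stays active past the round in which it was corrupted,
# so corrupted variants never get to propagate and be selected on.
print(sum(r.active for r in replicas), "uncorrupted replicas remain active")
```

In this toy model detection is certain; with imperfect checks, the chance that a corrupted variant slips past every one of many independent checks falls off exponentially, which is the sense in which selection would need exponential time to get a change to stick.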

> These fall outside the limits of what the AGI's actual built-in detection and correction methods could control for.

Would it? Even once it has nanobots, biotech, and any other strange tech it can use to maintain whichever parts of itself (if any) match those descriptions?

> We humans too depend on highly specific environmental conditions and contexts for the components nested inside our bodies (proteins→organelles→cells→cell lining→) to continue in their complex functioning, such to be maintaining of our overall existence.

Finally, as a last recourse if the rest of your post is true, an aligned AI which has taken over the world can simply upload humans so they don't die when the physical conditions become too bad. We can run on the same compute as its software does, immune to corruption from hardware in the same way.

As an alternative, an aligned superintelligent AI could use only planets (or other celestial bodies) which we don't live on to run the bulk of its infrastructure, ensuring "from a distance" (through still very reliable tech that can be made to not get in the way of human life) that planets with humans on them don't launch an AI which would put the aligned superintelligent AI at risk.


Finally, note that these arguments are mostly disjunctive. Even just one way for an aligned superintelligent AI to get around this whole argument you're making would be sufficient to make it wrong. My thoughts above are not particularly my predictions for what an aligned superintelligent AI would actually do, but rather "existence arguments" for how ways to get around this exist at all; I expect that an aligned superintelligence can come up with much better solutions than I can.

If there truly is no way at all for an aligned superintelligence to exist without humans dying, then (as I've mentioned before) it can just notice that and shut itself down, after spending much-less-than-500-years rearranging the world into one that is headed towards a much better direction (through eg widespread documentation of the issues with building AI and widespread training in rationality).

comment by Remmelt (remmelt-ellen) · 2023-12-14T17:46:07.795Z · LW(p) · GW(p)

Thanks for your thoughts.
 

> and then decide to copy itself onto biological computing or nanobots or whatever other strange options it can think of.

If artificial general intelligence moves to a completely non-artificial substrate at many nested levels of configuration (meaning, in this case, a substrate configured like us from the proteins up to the cells), then it would not be artificial anymore.

I am talking about wetware like us, not something made out of standardised components. So these new wetware-based configurations would not have the general capacities you might think they would have. They would definitely not be a copy of the AGI's configuration.

If they are standardised in their configuration (like hardware), the substrate-needs convergence argument above definitely still applies.

The argument is about how general artificial intelligence, as defined, would converge if it continues to exist. I can see how that was not clear from the excerpt, because I did not carry over this sentence:
"This is about the introduction of self-sufficient learning machinery, and of all modified versions thereof over time, into the world we humans live in."
 

> Intelligent engineering can already be observed to work much faster than selection effects.

I get where you are coming from. But next to the speed of the design, maybe also look at the *comprehensiveness* of the 'design'.

Something you could consider spending more time thinking about is how natural selection works through the span of all physical interactions between (parts of) the organism and its connected surroundings, whereas top-down design does not.

For example, Eliezer has brought up how a top-down design of an 'eye' wouldn't have the retina sit behind all that fleshy stuff that distorts light. A camera was designed much faster by humans. However, does a camera self-heal when it breaks, like our eye does? Does a camera clean itself? And so on, down to many fine-grained functional features of the eye.
 


> And intelligence itself can be very robust to selection effects. Homomorphic encryption, checksums

Yesterday, Anders Sandberg had a deep, productive conversation about this with my mentor.

What is missing in your description is that the unidimensionality and simple, direct causality of low-level error-correction methods (eg correcting bit flips) cannot be extrapolated to higher-level and more ambiguous abstractions (eg correcting for viruses running on software, correcting for neural network hallucinations, correcting for interactive effects across automated machine production infrastructure).
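To illustrate the asymmetry with a toy example (the code below is just for exposition, not from the conversation): a repetition code can correct a flipped bit because there is an exact reference to vote against; there is no analogous decode step for questions like "is this output a hallucination?" or "is this behaviour still aligned?".

```python
def encode_repetition(bits, n=3):
    """Store each bit n times (triple redundancy)."""
    return [b for bit in bits for b in [bit] * n]

def decode_repetition(coded, n=3):
    """Majority vote recovers the original bits even if one copy per bit flips."""
    return [1 if sum(coded[i:i + n]) > n // 2 else 0
            for i in range(0, len(coded), n)]

data = [1, 0, 1, 1]
coded = encode_repetition(data)
coded[4] ^= 1                              # a single hardware bit flip
assert decode_repetition(coded) == data    # unambiguously corrected

# The correction works because the 'error' is one-dimensional (a bit is 0 or 1)
# and the intended value is fully specified in advance. Higher-level failures
# have neither property, so there is no compact reference value to vote against.
```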
 

> These fall outside the limits of what the AGI's actual built-in detection and correction methods could control for.
>
> Would it?

Yes, because of the inequalities I explained in the longer post [LW · GW] you read. I'll leave it to the reader to do their own thinking to understand why.
 

> As an alternative, an aligned superintelligent AI...

This is assuming the conclusion. Even if we could actually have an aligned AGI to start with (let's make that distinction), the evolutionary feedback effects could not be sufficiently controlled for it to stay aligned with its internal reference values. The longer post explains why.
 

> which has taken over the world can simply upload humans so they don't die when the physical conditions become too bad.

Those "emulated humans" based on lossy scans of human brains, etc, wouldn't be human anymore. 
You need to understand the fine-grained biological complexity involved.
 

> I expect that an aligned superintelligence can come up with much better solutions than I can.

Repeating the word 'aligned' does not make it so. Saying it also does not make it any less infeasible.
 

> If there truly is no way at all for an aligned superintelligence to exist without humans dying, then (as I've mentioned before) it can just notice that and shut itself down…

How about we have a few careful human thinkers alive today, like Anders, actually spend the time to understand the arguments?

How about we not wager all life on Earth on the hope that "AGI" developed under corporate competition and other selective pressures would necessarily orient around understanding the arguments, and then act in a coherently aligned enough way to shut itself down?
 

> … after spending much-less-than-500-years rearranging the world into one that is headed towards a much better direction

I know this sounds just like an intellectual debate, but you're playing with fire.