I don't know much about this field, but here are my thoughts on my lovely friend davey's piece. Engagement welcome:
Intro:
Some concerns about the natural selection argument:
First, what's involved in your notion of self-interest? E.g. a bee is self-interested, but not reflexively; self-conscious self-interest is a different issue, and it doesn't seem to be straightforwardly implied by any selection principle. (I'm not convinced we humans, endowed with self-conscious self-interest, will be evolutionarily selected for in the long run.) What suggests superintelligence has any use for self-definition?
Next, why should we believe that the essential features of natural selection will carry over to AI agent selection? (E.g. individual agents don't evolve; species do, through reproduction. Do AI agents reproduce in a relevantly similar way? Also, later on you ask what "category of life" will be selected for; this should be "phenotypes" if you're sticking to the analogy.) You also seem to be suggesting there will be some kind of equilibrium, a place that AI will evolve "to". Why can't we worry about a single evil AI that one individual programs?
Also, how do we measure evolutionary success? There are approx 10^28 Pelagibacter bacteria (the most abundant life form on Earth), which must mean that they've evolved well/stably. There are ~10^10 humans, and we've only been around for a few hundred thousand years, and might not be around for much longer; if I were a "super intelligence" fighting for my species' survival, I would choose to be a durable, nonintrusive ant.
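Just to make that worry concrete, here's a toy sketch of my own (the numbers are rough, order-of-magnitude placeholders I'm supplying, nothing from your piece):

```python
# Toy illustration that "evolutionary success" depends on which metric you pick.
# All numbers are rough placeholders for the sake of the comparison.
lineages = {
    # name: (approx. current census size, approx. years the lineage has existed)
    "Pelagibacter": (2.4e28, 1e9),
    "ants (all species)": (2e16, 1.5e8),
    "Homo sapiens": (8e9, 3e5),
}

by_abundance = sorted(lineages, key=lambda k: lineages[k][0], reverse=True)
by_persistence = sorted(lineages, key=lambda k: lineages[k][1], reverse=True)

print("ranked by abundance:  ", by_abundance)
print("ranked by persistence:", by_persistence)
# Under either crude metric humans come last, so whatever "success" means here,
# it isn't obviously tracking intelligence.
```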
Key assumptions
Where does “10x more powerful than humans” come from? Seems arbitrary?
Sense of self sections:
In general, I don't think I understand the concept of "sense of self" you have in mind, in both its "competitive" and "interconnected" versions. It seems like you mean a concept of how one fits into the world, which isn't what I have in mind when I think of my "sense of self". I think most people worldwide (if not everyone) see their self as (roughly) their body (or mind/agency, or probably both in some way). What is perhaps distinctly Western/competitive is how we see ourselves in relation to society, but that depends on the AI's conception of us, not of itself. And again, I don't see why the AI's view of us needs to be any particular way.
On this point, why would the AI superintelligence "care" about us at all? Are there examples of competition between AI and us that haven't been specifically cooked up by programmers/researchers?
Are the possibilities exclusive? (E.g. what about a case where two superintelligences “merge” their selves?)
When you say "identifying" (e.g. with life, with evolutionary units), do you mean this metaphorically or literally? [After finishing, it seems like you mean "self-identifying as a living thing, like humans and cats and bacteria." This would be good to make clear, since you obviously don't mean that a superintelligence identifies with me; a superintelligence will never be able to think and act on my thoughts as I do. But why does a superintelligence need to self-classify as "living" or "nonliving" in the first place?]
Re: time horizons (and point 3 in the final section): "long-term" is relative to what we think of as long term (100-5,000 years, maybe). Why not expect AI to think a million years ahead, in which case the charitable thing might be to put us out of our misery or something?
“Define a life form as a persistently striving system. If the crux of its identity—the core of its existence—is its striving rather than its code, then it may just as easily recognize striving in other organisms, beyond any central physical body, as part of its own self.” Not sure how I feel about this. First, what is striving? Striving for what? (What is it to merely "strive"?) Does striving require self-consciousness of striving? Second, the logic strikes me as a little weird. Either it's a very weak claim whose antecedent doesn't obviously bear on its consequent (what does a particular feature of my identity have to do with recognizing that feature in others' identities?), or, if we take the strong version (with "can" instead of "may"), it implies that if I cannot easily recognize striving in other organisms, then the core of my existence is not striving. That seems false.
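Spelling out the strong reading and its contrapositive, in my own notation rather than your wording:

$$\text{strong reading: } \mathrm{CoreIsStriving}(x) \rightarrow \mathrm{CanRecognizeStrivingInOthers}(x)$$
$$\text{contrapositive: } \neg\,\mathrm{CanRecognizeStrivingInOthers}(x) \rightarrow \neg\,\mathrm{CoreIsStriving}(x)$$

The contrapositive is the part that sounds false to me when x is, say, me.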
I don't want you to take the above as a criticism of your conclusion; as you know, I think doomerism about all of this is a little more silly than realistic. But that's because I think meaningful self-awareness in AI is a little more silly than realistic. If (per Yud's article) our machines add "sleeper agents" to our medical technology, I would be much more likely to attribute this to a bad (human) actor, or to a complicated error, than to an ill-intentioned AI. Now, a complicated error like that is more likely given the expansive tools and lack of restrictions around AI, so I ultimately (obviously) agree with you that safety protocols should be in place.
Ways to Prepare
“Researchers should carefully review the evolutionary logic behind self-interest”— I’d be careful about things like “evolutionary logic”. I’ve never heard of this before, and evolutionary psychology in general is super contentious.
What makes self-improvement “recursive”?
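To pin down what I'm asking: the usual gloss I've seen is that the improvement step gets applied to the improver itself, so gains compound rather than accumulating at a fixed rate. A toy sketch of that gloss, with made-up numbers:

```python
# Toy contrast (made-up numbers): plain vs. "recursive" self-improvement.
capability = 1.0
rate = 0.1

plain = capability
for _ in range(20):
    plain *= 1 + rate                 # the improvement rate never changes

recur_cap, recur_rate = capability, rate
for _ in range(20):
    recur_cap *= 1 + recur_rate
    recur_rate *= 1 + recur_rate      # the improver also improves its own improving

print(plain, recur_cap)               # the second number vastly outpaces the first
```

Is that the sense you intend, or something stronger?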
A skeptical worry: why won't the AI read your paper and be angry that we've tried to influence it toward an interconnected sense of self?
“The way to make further progress, in predicting how super intelligence will conceive of its self, I think, is to develop theoretical foundations around predicting how various self-boundaries get selected for.”— yeah this is really cool, expand!
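In case it helps with the "expand!": here's the kind of formal skeleton I imagine, as a purely illustrative sketch with fitness values I made up (nothing here is from your piece). Treat each candidate self-boundary as a phenotype in a population of agents and ask which one simple replicator dynamics favors; the real theoretical foundations would have to say where those fitness values come from.

```python
# Toy replicator dynamics over candidate "self-boundary" phenotypes.
# All fitness values are stipulated; the theoretical work would be deriving
# them from actual selection pressures instead of assuming them, as here.
phenotypes = ["narrow/competitive self", "wide/interconnected self", "no self-model"]
fitness = [1.05, 1.02, 1.00]          # per-generation growth factors (made up)
freqs = [1 / 3] * 3                   # start from equal shares

for _ in range(200):
    weighted = [f * w for f, w in zip(fitness, freqs)]
    total = sum(weighted)
    freqs = [w / total for w in weighted]

for name, p in zip(phenotypes, freqs):
    print(f"{name}: {p:.3f}")
# Whichever boundary is assigned the highest fitness ends up dominating, which is
# exactly why the hard part is justifying the fitness assignments, not the dynamics.
```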