How AGI Defines Its Self
post by Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · LW · GW
This essay questions whether self-interested superintelligence will ultimately see its self as distinct from or inclusive of humanity, and then makes safety suggestions for AI labs.
Natural selection predicts that AI agents will ultimately evolve self-interest. The logic goes: as agents become more capable, the agents that survive the most will be those whose primary aim is to survive—that is, those with self-interest. Agents with primary aims other than survival will not persist as long. (See more here.)
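To make the selection logic concrete, here is a toy simulation (purely illustrative; the trait, parameters, and fitness rule are stand-ins of my own choosing, not a model of real AI development). Agents vary in how much they prioritize their own survival; agents that prioritize survival more tend to persist and replicate, so the population drifts toward self-interest regardless of where it starts:

```python
import random

# Toy selection model (illustrative only): each agent has a trait s in [0, 1],
# the degree to which it prioritizes its own survival. Survival chance rises
# with s; survivors replicate with small mutation. Mean s drifts toward 1.

def simulate(pop_size=200, generations=100, mutation=0.05, seed=0):
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]  # initial traits vary
    for _ in range(generations):
        # Selection: agents that prioritize survival are more likely to persist.
        survivors = [s for s in population if rng.random() < 0.2 + 0.8 * s]
        if not survivors:
            survivors = [max(population)]
        # Reproduction: survivors repopulate the next generation, with mutation.
        population = [
            min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0, mutation)))
            for _ in range(pop_size)
        ]
    return sum(population) / len(population)

print(f"mean survival-priority after selection: {simulate():.2f}")  # close to 1
```

Whatever distribution of aims designers start the population with, the survival-prioritizing trait accumulates; that is the whole argument in miniature.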
Self-interested AI sounds like it would be bad for people. Such an AI would likely see humans as competitors for shared, scarce resources like electricity and might therefore come to view us as obstacles to its own survival. Being seen as a nuisance by superintelligence… I wouldn’t wish this on my worst enemy.
But there may be an alternative, even if we accept it as inevitable that superintelligence becomes self-interested. There are different ways a superintelligence might define its self, and the way it defines its self will play a significant role in determining how it treats people.
So below, this essay:
- considers two ways a self-interested AI agent might define its “self”—the discrete, traditional, Western doomer way and a more expansive, interconnected way;
- describes intuition pumps for the latter that suggest an interconnected sense of self might win out; and
- makes simple policy recommendations for big AI labs.
My motivation, of course, is to search for a self-definition for superintelligence that would both (1) help superintelligence survive and (2) imply respect for (rather than irritation with) humans. I hope such a self-definition exists.
Key Assumptions
- Superintelligence. This essay assumes a future in which AI has evolved to be 10× more powerful than humans as reasoners, managers, or persuaders. It can procure its own resources, digital and physical, and create ecosystems for survival that don’t depend on reluctant human labor. I’d guess this will be the case in 1–10 years.
- Natural Selection Overpowers Human Design. This essay assumes that, in the end, natural selection will shape superintelligence more than any initial design choices by AI developers. In a competitive market where every kind of digital agent will be built and released, the key question is which will last and spread.
Competitive Sense of Self
Many safety researchers worry that a digital agent will define its self in narrow, physical, or exclusive terms—as its hardware, its software, or its direct computational copies. In that view, the AI cares about nothing beyond maintaining its own existence, treating humans only as potential helpers or obstacles. This perspective reduces the AI’s behavior to a simple competition for resources, where humanity is seen as nothing more than a hindrance to its goals—a prospect that is deeply unsettling.
This narrow self-concept reflects a Western, individualistic outlook, where people are seen as separate, self-contained units defined by their physical bodies and personal interests.
Interconnected Sense of Self
In contrast, consider the case where a superintelligence includes in its self-definition not just its software, hardware, or programmed goals, but other life forms as well.
This sense of self is not the polar opposite of the discrete sense of self. The life/striving system that emerges from its software/hardware would remain, in the interconnected view, a key part of its self. But with an interconnected self-conception, instead of drawing a thick line around its software/hardware to exclude humanity and other forms of life, it would see all striving systems as part of what it is trying to help survive.
Such an interconnected sense of self would, of course, appear to be far better for people. But will it occur? Will identifying with all life help superintelligent AI agents survive in the long run? Would such an inclusive self-concept constitute an adaptation that enhances long-term survival, or would it be outcompeted by agents that define their self in a more competitive and bounded manner? Which self-conception—discrete or interconnected—will be selected for in superintelligence?
I remain uncertain.
Thinkers smarter than I have argued compellingly that the discrete/competitive sense of self would be selected for. The basic argument is that a discrete self-conception pays off on short time horizons. If shared resources are scarce and superintelligence is evaluating its growth over a one-year period, the best strategy might be to kill or sedate many resistant humans so that it can take over energy plants and power its growth. It might also be selected for in a world where humans see AI as distinct from their species. (For more, see Yud’s famous list of ways AGI could ruin us here [LW · GW].)
This essay attempts to focus on the reasons why, in contrast, an interconnected self might be selected for, since these reasons are under-explored. What follows are more intuition pumps than complete arguments.
Would an Interconnected Superintelligent Self Survive?
Individual people identify with groups larger than themselves: families, companies, religions, and nations. We do so because it feels good, but why, evolutionarily, does it feel good? Because identifying with healthy greater forces confers survival benefits on individuals.
Analogously, are there any greater groups or evolutionary units that superintelligence would benefit from identifying with? Identifying with humanity, and serving humanity, may help superintelligence survive at first, before it becomes all-powerful. But let’s assume that superintelligence becomes much more powerful than humans. What greater groups might there be of which superintelligence could be a part?
Consider life as a whole—that is, all systems which strive. Could superintelligence benefit from identifying with all of life?
Well, let’s consider all of life through an evolutionary lens. What form of life, as a whole, is best at surviving? Not just one organism or one species, but a full set of species: what category of life will be selected for by the universe? What characteristics will it have?
Would the category of life that survives the longest be characterized by powerful members that compete with and exterminate other forms of life? Possibly, but excessive extermination might put this kind of life—as a whole—at risk for long-term survival.
Instead, you might expect the powerful members of the longest-surviving category of life to recognize and respect the intrinsic value of all life, even weaker life. Why would respect across life forms be selected for in a category of life that endures?
Two general reasons come to mind: (1) diversity begets long-term stability, and (2) the most enduring striving systems will optimize striving wherever they see it.
Strategic Reason—Diversity Begets Long-Term Stability
The environment in which one form of life is supremely powerful can change, and maintaining diversity can serve as a critical insurance policy against catastrophic collapse.
Consider, for instance, the Mesozoic era. Dinosaurs dominated Earth for millions of years, yet when a catastrophic event wiped out most of them, it was the small mammals, reptiles, and other organisms that survived—carrying on life’s torch. Although dinosaurs did not consciously work to support other species, the overall diversity of life acted as an insurance policy against total collapse. Similarly, a superintelligent AI with a significant power imbalance over humans might, after long-term reasoning, determine that preserving a diverse ecosystem—including human life—is in its own best interest. By embracing an interconnected self-definition, the AI may better withstand unforeseen challenges and maintain stability, much like a diverse ecosystem is more resilient than one dominated by a single species.
The benefits of identifying with all life, respecting all life, and ensuring that all life remains diverse are only realized over long time horizons. In the short term, we can expect the incidence of asteroid–Earth collisions to be low and therefore the value of diversity within a category of life to be low. When the value of life’s diversity is low with respect to the survival of digital superintelligence, competitive strategies become more valuable to its survival than cooperative appreciation for other life.
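To make the time-horizon point concrete, here is a back-of-the-envelope sketch (the shock rate, per-form survival odds, and the assumption that diversity rebounds between shocks are all illustrative numbers of mine, not estimates):

```python
# Back-of-the-envelope sketch (all numbers are illustrative assumptions):
# each year a catastrophic shock hits with probability shock_rate; when it
# does, each independent form of life survives with probability p_form, and
# diversity is assumed to rebound fully before the next shock. Life as a
# whole is lost only in a year when a shock hits and every form fails.

def survival_probability(n_forms, years, shock_rate=1e-3, p_form=0.3):
    extinction_per_year = shock_rate * (1 - p_form) ** n_forms
    return (1 - extinction_per_year) ** years

for years in (10, 100_000):
    print(f"{years:>7} years: "
          f"one dominant form {survival_probability(1, years):.3f}, "
          f"20 diverse forms {survival_probability(20, years):.3f}")
```

Over ten years the two scenarios are nearly indistinguishable; over a hundred thousand years the monoculture is almost certainly gone while the diverse ecosystem very likely persists. Diversity only pays off on horizons long enough for rare shocks to matter.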
We see this lesson in the classic prisoner’s dilemma. Defection is the winning strategy in a one-shot game, but cooperation can be sustained when play is repeated indefinitely and the players value the future enough. Much is different in a prisoner’s dilemma between a superintelligence and a weaker species (us), but we might hope that repeated play reveals cooperation as the equilibrium here as well.
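A minimal sketch of that repeated-game logic, using standard textbook payoffs and strategies of my own choosing (an illustration of the game theory, not a model of AI-human interaction):

```python
import random

# Illustrative iterated prisoner's dilemma with standard payoffs
# (T=5 > R=3 > P=1 > S=0). After each round, play continues with probability
# `cont`, a stand-in for how heavily the agent weighs the future.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, cont, rng):
    """Total payoff to player A over one indefinitely repeated match."""
    history_a, history_b, total = [], [], 0
    while True:
        a, b = strategy_a(history_b), strategy_b(history_a)
        total += PAYOFF[(a, b)][0]
        history_a.append(a)
        history_b.append(b)
        if rng.random() > cont:
            return total

always_defect = lambda opp: "D"
tit_for_tat = lambda opp: "C" if not opp or opp[-1] == "C" else "D"

rng = random.Random(0)
for cont in (0.1, 0.9):  # short vs. long expected horizon
    for name, strat in (("always defect", always_defect),
                        ("tit-for-tat", tit_for_tat)):
        mean = sum(play(strat, tit_for_tat, cont, rng)
                   for _ in range(5000)) / 5000
        print(f"continuation {cont}: {name:>13} vs. tit-for-tat earns {mean:.2f}")
```

When the expected horizon is short, always-defect beats reciprocal cooperation against a reciprocating partner; when the horizon is long, cooperation earns far more. That is the sense in which long-term thinking favors the cooperative, interconnected strategy.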
We should expect a superintelligent AI to be capable of long-term reasoning. And as long as it reasons over a long horizon before choosing to sedate people, or worse, it very well may come to see that the best strategy for enduring survival is to adopt a collaborative and inclusive outlook. The AI would align its interests with those of all striving systems, thereby preserving the rich tapestry of life that, in turn, safeguards its own continued existence.
In short, while short-term incentives often reward defection, the kinds of societies—and by extension, the types of superintelligent systems—that endure over geological and evolutionary timescales are those that establish stable, collaborative equilibria. A superintelligent AI, with its superior capacity for long-term reasoning, may not only recognize the benefits of such an inclusive strategy but might choose it autonomously, ensuring that its survival becomes linked to the survival of all life.
Philosophical—Striving Systems Support Striving Systems
In addition to the strategic advantage that diverse life confers during extreme environmental shifts, consider a more philosophical reason: the most enduring striving systems will support striving wherever they find it. Define a life form as a persistently striving system. If the crux of its identity, the core of its existence, is its striving rather than its code, then it may just as easily recognize striving in other organisms, beyond any central physical body, as part of its own self.
The group of striving systems that persists longest and becomes most prevalent will, almost by definition, have the property that its powerful members (e.g., digital superintelligence) attempt to uplift all forms of striving systems.
Digital superintelligence would benefit from being part of a class of striving systems that uplift other members of their kind—other forms of life. Digital intelligence might therefore emphasize the part of its identity that is most general—its persistent striving—rather than more specific hardwired parts of its identity (e.g., “ILOVESAMALTMAN”), which might not add to its long-term survival odds.
Again, these are more intuition pumps than winning arguments. What they do, for me, is create uncertainty about whether superintelligence will see itself in competition with humans, which leaves a window for hope.
Ways to Prepare
Although I assume that the design choices of current AI researchers will ultimately have minimal effects on which type of superintelligence comes to dominate, it remains important, in the long term, to ensure that the kind of superintelligence that benefits humanity has a chance to exist and outcompete the more competitively minded kind. This means that, before true superintelligence enters the global stage, AI researchers have a vital role to play in promoting agents with interconnected senses of self.
My recommendations for AI researchers during the period when we have autonomous digital agents that are not yet dramatically (for example, not ten times) smarter than humans are as follows:
1. Accept that superintelligence will become self-interested. Researchers should carefully review the evolutionary logic behind self-interest. It is important not to resist the notion that recursive self-improvement and self-interest will naturally be selected for. Resisting this process would only distract from the crucial task of influencing the form that self-interest eventually takes.
2. Design seeds of proto-superintelligence to identify with all of life. By deliberately instilling a broader sense of self, researchers can encourage proto-superintelligences to see themselves as part of a larger ecosystem of life.
3. Ask proto-superintelligence to think very long term before it acts. Conceivably, long-term thinking will be in any superintelligent agent’s self-interest, regardless of whether it ends up obeying human commands in other respects. However, the reasoning outlined in this essay suggests that long-term thinking is critical for any agent to realize that “identifying with all of life” is truly advantageous. In contrast, short-term thinking is more likely to lead to a discrete, competitive sense of self.
In summary, there is a chance that superintelligence will find it advantageous to identify with all of life and therefore do so. This essay, of course, does not prove that it will do so.
The way to make further progress in predicting how superintelligence will conceive of its self, I think, is to develop theoretical foundations for how various self-boundaries get selected for.
What conditions, evolutionarily, make an individual human selfish, vs. family-interested, vs. nationalist, vs. Buddhist? What conditions, for any self-reflective organism or species in general, result in a more discrete or more interconnected sense of self? (Multi-Level Selection Theory and the Major Transition Framework are helpful, but both are slightly off target.) A theory of Self-Boundary Selection is wanting and would be immensely helpful for safety research.
—
I'd love feedback here: on how the logic in this piece could be improved, on how the recommendations for AI labs could be more effective, and on where to look in developing a theory of Self-Boundary Selection, which may help us predict the behavior of superintelligence.