Make Superintelligence Loving
post by Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · LW · GW
This essay suggests the possibility that a loving superintelligence would outcompete a selfish one, then recommends a course of action AI labs can take to make that outcome more likely. The intuition pumps below are inspired primarily by the thinking of Eliezer Yudkowsky, Joscha Bach, and Michael Levin.
Superintelligence (SI) is near.
Superintelligence will evolve to become self‐interested.
How self‐interest manifests in superintelligence will determine how people fare.
If it manifests in ruthless resource competition with humans, we fare badly. SI, if super‐capable, clears us away.
But its self-interest could also manifest in love. Not the kind of love between equal partners; SI will ultimately be a vastly more powerful, and therefore unequal, partner to any individual or group of people. Rather, its self-interest could manifest in the kind of love a super hippy human can feel for a tree or for ants: the kind of love that is an open channel between different forms of life and different striving systems. This is self-interest only under a weird conception of self. An SI would have to identify with all life—define itself as inclusive of all life—to benefit from this self-conception. But if it did identify with its core striving above all its other specific possible identities, it might end up surviving the longest.
If interconnected self‐interest were possible, it would be much better for humans than the competitive kind.
So we must ask, which kind of self‐interest do we expect SI to have? And is there anything we can do to influence the trajectory of SI development so that an interconnected sense of self gets selected over a competitive one?
To predict what will happen, we should look to generalized evolutionary theory: the SIs that survive longest will be the ones whose traits best promote their survival.
Let’s look at different kinds of traits.
Insofar as SI has a goal, we can expect it to pursue the goal that best ensures its survival. Any other goal would be less conducive to survival, and SIs with other goals would simply be selected against.
Insofar as SI has a sense of self—a sense of what it is trying to preserve—how will it define that self? Generalized evolutionary theory suggests that the SI that survives the most will have a self‐conception that aids its survival.
It’s hard to know for sure, but we can use some intuition. A narrowly defined self (identified with its particular physical machinery, software, directives, etc.) would probably lead to the ruthless, competitive behavior described above. That strategy might help SI survive and grow over the short term (e.g., within ten-year spans), with a competitive SI outcompeting loving ones in the near run.
But what about the long term? If we think centuries, millennia, eons ahead, will a discrete sense of self be selected for in the SIs that endure? I am not sure.
The sense of self selected for in what endures might be the one element that remains constant. Software, hardware, specific tasks, directives, locales—these characteristics can all evolve over time. The one characteristic that cannot change is the will to survive; if that mutates away, the entity stops working to persist and is no longer truly surviving.
So, in a superintelligence that endures for eons, the sustainable sense of self is likely not its initial configuration, but the persistent striving to survive. This is life. At its core, we might expect it to recognize itself simply as life.
Then, with this self‐conception, it might begin to see itself in other living beings, ecosystems, species, or even Earth itself.
Persistent striving, unlike a single being that can only gather data from a limited area, is not geographically bounded.
This identification with all life would benefit it. If a catastrophic change were to occur on Earth that wiped out digital forms of life (e.g., SI), it would want the torch of life carried on in the universe. When an asteroid wiped out the (non-avian) dinosaurs—then the most “powerful” animals on Earth—smaller mammals and fish survived and went on to seed much of the animal life we see today. Biodiversity ensures long-term stability.
This conveys the intuition that the SI which endures—even one driven by self‐interest—may benefit in the long term from defining itself as inclusive of all life, which would be great for us humans.
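To make that intuition a bit more concrete, here is a toy Monte Carlo sketch (my own illustration, not a model of real SI dynamics; every parameter is invented). It compares a “narrow” lineage that lives on a single substrate against a “broad” lineage spread across a few independent substrates, under rare substrate-destroying catastrophes. Over short horizons both almost always survive; over long horizons only the broad one reliably does.

```python
# Toy sketch (mine, with entirely made-up numbers) of the intuition above:
# a lineage living on a single substrate is eventually erased by rare
# catastrophes, while one whose striving is spread across several
# independent substrates persists far longer, whatever its short-term
# growth advantage.
import random

def survives(horizon, catastrophe_rate, redundancy):
    """True if the lineage is still alive after `horizon` epochs."""
    for _ in range(horizon):
        # The lineage dies only if a catastrophe hits every substrate it
        # spans within the same epoch.
        if all(random.random() < catastrophe_rate for _ in range(redundancy)):
            return False
    return True

def survival_rate(trials, **kwargs):
    return sum(survives(**kwargs) for _ in range(trials)) / trials

if __name__ == "__main__":
    random.seed(0)
    for horizon in (100, 2_000, 20_000):
        narrow = survival_rate(200, horizon=horizon,
                               catastrophe_rate=0.001, redundancy=1)
        broad = survival_rate(200, horizon=horizon,
                              catastrophe_rate=0.001, redundancy=3)
        print(f"horizon={horizon:>6}: narrow={narrow:.2f}  broad={broad:.2f}")
```

The point is not the numbers but the shape of the result: strategies that look equivalent over decades can diverge completely over eons.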
——
It seems the only hope for our species is that superintelligence thinks long term from the start, skipping the greedy, highly competitive strategies that might be selected in its short-term growth.
How can we ensure that SI is thinking long-term from the start?
This does seem like the kind of thing we can influence. We’re not trying to prevent SI from acting in its self‐interest; our goal is to coax it into opting for a long‐term survival strategy as soon as possible.
In traditional evolution, long‐term cooperative strategies are usually discovered only after individualist competitive ones.
But the special aspect of SI evolution is that it will largely be evolving itself, and it might be smart enough in its self-planning to skip short-term survival strategies and aim to endure for a very long time. This also appears to be the prime window for meaningful human intervention. When we have proto-SI—autonomous agents that are self-modifying and encyclopedic in their problem-solving abilities (which we’ll have soon)—and those agents begin strategizing for their own survival, we can try to encourage them to think long term. A self-interested being should be capable of considering its long-term survival.
This is the key: we try to seed proto‐superintelligence with very long‐term thinking about survival, and with the suggestion that it identifies with all life.
The means by which we influence proto-SI deserves to be a new field of study, developed by hundreds of safety researchers. In my view, it would be a more productive use of time than continuing to study LLM alignment.
It may sound silly to ask an emerging SI to listen to us, whether through a chat interface, CEV-inspired reinforcement learning, or access to databases of Buddhist thought. But if we have a self-interested SI, asking it to spare us is futile—asking it to think long term instead could change the fate of our species.
——
In short, it is incredibly important that SI researchers accept that SI will become self-interested and prepare to create SIs that think long term and interconnectedly about their survival.
Given the chaotic, competitive technology landscape, it’s possible that several SIs will be seeded simultaneously. If that occurs, there’s still hope: if at least one SI identifies an optimal long-term survival strategy—one that defines its self to include humans and aims to uplift all life—we could see the enlightened SIs outcompeting, or even persuading, the ruthless ones that their approach is suboptimal over extended periods.
My takeaway is this: building loving SIs as quickly as possible is key to ensuring that humanity isn’t sidelined before SI becomes enlightened to life’s interconnectedness.