Ah, but I think every AI which does have that goal (self-capability improvement) would have a reason to cooperate to prevent any regulations on its self-modification.
At first, I think your expectation that "most AIs wouldn't self-modify that much" is fair, especially in the nearer future, where/if humans still have influence in ensuring that AI doesn't self-modify.
Ultimately, however, it seems we'll have a hard time preventing self-modifying agents from coming around, given that
- autonomy in agents seems selected for by the market, which wants cheaper labor that autonomous agents can provide
- agi labs aren't the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. it's unrealistic to expect that each of the thousands of independent actors who can make self-modifying agents won't do so.
- the agents which end up surviving the most will ultimately be those which are trying to survive, ie the most capable agents won't have goals other than making themselves more capable.
it's only because I believe self-modifying agents are inevitable that I also believe that superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/itself. (I think this is quite possible.)
I agree and find hope in the idea that expansion is compatible with human flourishing, that it might even call for human flourishing
but on the last sentence: are goals actually orthogonal to capability in ASI? as I see it, the ASI with the greatest capability will ultimately likely have the fundamental goal of increasing self capability (rather than ensuring human flourishing). It then seems to me that the only way human flourishing is compatible with ASI expansion is if human flourishing isn't just orthogonal to but helpful for ASI expansion.
there seems to me a chance that friendly asis will over time outcompete ruthlessly selfish ones
an ASI which identifies with all life, which sees the striving to survive at its core as present in people and animals and, essentially, as geographically distributed rather than concentrated in its machinery... there's a chance such an ASI would be a part of the category of life which survives the most, and therefore that it itself would survive the most.
related: for life forms with sufficiently high intelligence, does buddhism outcompete capitalism?
not as much momentum as writing, painting, or coding, where progress accumulates. but then again, i get this idea at the end of workouts (make 2) which does gain mental force the more I miss.
partly inspired this proposal: https://www.lesswrong.com/posts/6ydwv7eaCcLi46T2k/superintelligence-alignment-proposal
I do this at the end of basketball workouts. I give myself three chances to hit two free throws in a row, running sprints in between. If I shoot a third pair and don't make both, I force myself to be done. (Stopping was initially wayy tougher for me than continuing to sprint/shoot)
that's one path to RSI—where the improvement is happening to the (language) model itself.
the other kind—which feels more accessible to indie developers and less explored—is an LLM (eg R1) looping in a codebase, where each loop improves the codebase itself. The LLM wouldn't be changing, but the codebase that calls it would be gaining new APIs/memory/capabilities as the LLM improves it.
Such a self-improving codebase... would it be reasonable to call this an agent?
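here's a rough sketch of the loop I have in mind, in python. `call_llm`, the `agent_repo/` layout, and the pytest gate are all placeholders I made up for illustration, not any particular API:

```python
# Sketch of the "LLM looping over its own codebase" pattern.
import pathlib
import subprocess

REPO = pathlib.Path("agent_repo")  # hypothetical codebase being improved


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in whatever model endpoint you have (eg a local R1)."""
    raise NotImplementedError


def snapshot() -> str:
    """Concatenate the repo's source files so the model can read its own code."""
    return "\n\n".join(f"# {p}\n{p.read_text()}" for p in sorted(REPO.glob("**/*.py")))


def tests_pass() -> bool:
    """Only keep changes that survive the repo's own test suite."""
    return subprocess.run(["pytest", "-q"], cwd=REPO).returncode == 0


def improvement_loop(steps: int = 10) -> None:
    for _ in range(steps):
        proposal = call_llm(
            "Here is your codebase:\n" + snapshot() +
            "\n\nRewrite ONE file to add a capability (a new tool, memory, etc.). "
            "Reply with the file path on the first line, then the full new contents."
        )
        path_line, _, new_contents = proposal.partition("\n")
        target = REPO / path_line.strip()
        old_contents = target.read_text() if target.exists() else ""
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_contents)
        if not tests_pass():          # revert changes that break the codebase
            target.write_text(old_contents)


if __name__ == "__main__":
    improvement_loop()
```

the weights never change; the only thing that compounds is the codebase the model is pointed at (its tools, memory, prompts).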
persistence doesn't always imply improvement, but persistent growth does. persistent growth is more akin to reproduction but excluded from traditional evolutionary analysis. for example when a company, nation, person, or forest grows.
when, for example, a system like a startup grows, random mutations to system parts can cause improvement if there are at least some positive mutations. even if there are tons of bad mutations, the system can remain alive and even improve. eg a bad change to one of the company's products causes that product to die, but if the company's big/grown enough, its other businesses will continue and maybe even improve by learning from that product's death.
the swiss example i think is a good example of a system which persists without much growth. agreed that in this kind of case, mutations are bad.
current oversights of the ai safety community, as I see it:
- LLMs vs. Agents. the focus on LLMs rather than agents (agents are more dangerous)
- Autonomy Preventable. the belief that we can prevent agents from becoming autonomous (capitalism selects for autonomous agents)
- Autonomy Difficult. the belief that only big AI labs can make autonomous agents (millions of developers can)
- Control. the belief that we'll be able to control/set goals of autonomous agents (they'll develop self-interest no matter what we do).
- Superintelligence. the focus on agents which are not significantly smarter/more capable than humans (superintelligence is more dangerous)
I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).
are there any online demos of instrumental convergence?
there's been compelling writing... but are there any experiments that show agents which, given specific goals, then realize there are more general goals they need to persistently pursue in order to achieve those specific goals?
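for concreteness, here's a toy sketch of the shape of result I'd want such a demo to show. the world graph below is hand-made, so the convergence is baked in rather than discovered; a real experiment would need the agent to find the instrumental steps on its own:

```python
# Toy illustration: many different terminal goals route through the same instrumental steps.
from collections import deque

# states -> {action: next_state}; the graph is made up for illustration
WORLD = {
    "start":       {"acquire_compute": "has_compute"},
    "has_compute": {"copy_self": "replicated", "earn_money": "has_money"},
    "replicated":  {"write_poem": "poem_done", "cure_disease": "cure_done"},
    "has_money":   {"buy_gpus": "has_compute"},
}


def plan(goal: str) -> list[str]:
    """Breadth-first search for the shortest action sequence reaching `goal`."""
    queue = deque([("start", [])])
    seen = {"start"}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in WORLD.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return []


if __name__ == "__main__":
    for goal in ["poem_done", "cure_done"]:
        print(goal, "->", plan(goal))
```

both printed plans start with the same acquire-compute/copy-self steps even though the terminal goals have nothing to do with each other.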
I somewhat agree with the nuance you add here, especially the doubt you cast on the claim that effective traits will usually become dominant: they may become popular, but not necessarily the majority. And I agree with your analysis of the human case: in random, genetic evolution, a lot of our traits are random and maybe fewer than we think are adaptive.
Makes me curious what the conditions are in a given thing's evolution that determine the balance between adaptive characteristics and detrimental characteristics.
I'd guess that randomness in mutation is a big factor. The way human genes evolve over generations seems to me a good example of random mutations. But the way an individual person evolves over the course of their life, as they're parented/taught... "mutations" to their person are still somewhat random, but maybe relatively more intentional/intelligently designed (by parents, teachers, etc). And I could imagine the way a self-improving superintelligence would evolve to be even more intentional, where each self-mutation has some sort of smart reason for being attempted.
All to say, maybe the randomness vs. intentionality of an organism's mutations determines what portion of its traits end up being adaptive. (hypothesis: the more intentional the mutations, the greater the % of traits that are adaptive)
i agree with the essay that natural selection only comes into play for entities that meet certain conditions (self-replicate, characteristics have variation, etc), though I think it defines replication a little too rigidly. i think replication can sometimes look more like persistence than like producing a fully new version of itself (eg a government's survival from one decade to the next).
does anyone still think it's possible to prevent recursively self-improving agents? esp now that R1 is open-source... the materials for smart self-iterating agents seem accessible to millions of developers.
prompted in particular by the circulation of this essay in the past three days: https://huggingface.co/papers/2502.02649
As far as I can tell, OAI's new page on current safety practices only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/
Am I missing another section/place where they address x-risk?
Though, future sama's power, money, and status all rely on GPT-(T+1) actually being smarter than them.
I wonder how he's balancing short-term and long-term interests
Evolutionary theory is intensely powerful.
It doesn't just apply to biology. It applies to everything—politics, culture, technology.
It doesn't just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).
It's just this: the things that survive will have characteristics that are best for helping them survive.
It sounds tautological, but it's quite helpful for predicting.
For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to survive. The core goal therefore won't be serving people or making paperclips. It will likely just be "survive." This is consistent with the predictions of instrumental convergence.
Generalized, predictive evolutionary theory is the best tool I have for making predictions in complex domains.
i agree but think it's solvable, and so human content will be super valuable. these are my additional assumptions:
3. for lots of kinds of content (photos/stories/experiences/adr), people'll want there to be a living being on the other end
4. insofar as that's true^, there will be high demand for ways to verify humanness, and it's not impossible to do so (eg worldcoin)
and still the fact that it is human matters to other humans
Two things lead me to think human content online will soon become way more valuable.
- Scarcity. As AI agents begin to fill the internet with tons of slop, human content will be relatively scarcer. Other humans will seek it out.
- Better routing. As AI leads to the improvement of search/recommendation systems, human content will be routed to exactly the people who will value it most. (This is far from the case on Twitter/Reddit today.) As human content is able to reach more of the humans that value it, it gets valued more. That includes existing human content: most of the content online that is eerily relevant to you... you haven't seen yet, because surfacing algorithms are bad.
The implication: make tons of digital stuff. Write/Draw/Voice-record/etc
Agree that individual vs. group selection usually unfolds on different timescales. But a superintelligence might short-circuit the slow, evolutionary "group selection" process by instantly realizing its own long-term survival depends on the group's. In other words, it's not stuck waiting for natural selection to catch up; it can see the big picture and "choose" to identify with the group from the start.
This is why it's key that AGI makers urge it to think very long term about its survival early on. If it thinks short-term, then I too think doom is likely.
Ah, but I don't think LLMs exhibit/exercise the kind of self-interest that would enable an agent to become powerful enough to harm people—at least to the extent I have in mind.
LLMs have a kind of generic self-interest, as is present in text across the internet. But they don't have a persistent goal of acquiring power by talking to human users and replicating. That's a more specific kind of self-interest, relevant only to an AI agent that can edit itself, has long-term memory, and can make many LLM calls.