We haven't quit evolution [short]

post by the gears to ascension (lahwran) · 2022-06-06T19:07:14.025Z · LW · GW · 3 comments

edit: there is now a much higher quality version [LW · GW] of this post; archive-browsers should read that instead.

I've seen folks say that humanity's rapid growth may have broken the link to evolution's primary objective, usually referred to as total inclusive fitness. I don't think we have broken that connection.

  1. Let "process" temporarily refer to any structured chemical or physical reaction that consumes fuel - this could also be termed "computation", or in many (but not all) cases "life".
  2. Let "defensibility" refer to the size of the moat in configuration space that maintains a process against interference - i.e., the distance to the nearest disrupting perturbation.
  3. For all matter, evolution-of-matter's-process optimizes for process-defensibility-per-unit-fuel (one way this might be sketched semiformally is given just after this list).
  4. Genetic evolution is a subset of self-preserving processes. Total inclusive fitness is intended to measure gene-level selfishness in terms of offspring, but I would argue that discrete offspring are the wrong unit: genetic evolution's noise defense, aka mutation resistance, is built by the preservation of genes that increase durability*efficiency.
  5. Therefore, because improving the self-selection process by using contraception allows humans to guide their own reproduction, contraception is not automatically a divergence from the incentive - and to the degree that it is, it gets selected against.
  6. Likewise, improving the self-selection process by simply not dying allows humans to defend their own structure much more accurately than traditional reproduction does - and though this is not as defensible as strategies that replicate as hard as they can, a fully integrated being can often be quite defensible over the medium term, and hopefully, with life extension, over the long term as well.
  7. As further evidence, humans appear to have a significant desire to remember, which this framework also describes well: a mental process likewise qualifies as an evolution-of-matter's-process, and thought patterns seek some set of accepted state transitions such that the post-transition structure still qualifies as "self".
  8. This also relates to concerns folks on LessWrong have expressed regarding self-modification: all forms of process self-maintenance involve some degree of self-update, and various energetic processes control their rate of self-update.
  9. It relates as well to EY's view that a safe hard ASI should be asked to pull the ladder up behind itself: to ensure its own process-durability. In a recent post he used this as an example of the kind of defense a state-like singleton should have. However, I'd propose that maintaining its self-process should be done in a way that ends the vulnerability of every physical process, not just its own.

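To make the "defensibility" framing in points 2-3 slightly more concrete, here is one possible semiformal rendering - a sketch only, with notation chosen for illustration rather than anything standard. Let $C$ be configuration space equipped with a metric $d$, let $x_P$ be the configuration a process $P$ currently occupies, and let $S_P \subseteq C$ be the set of configurations in which $P$ keeps running. Then one candidate definition is

\[
\mathrm{defensibility}(P) \;=\; \inf_{y \in C \setminus S_P} d(x_P, y),
\qquad
\mathrm{score}(P) \;=\; \frac{\mathrm{defensibility}(P)}{\mathrm{fuel}(P)},
\]

and point 3 becomes the claim that evolution-of-matter's-process pushes matter toward configurations with higher $\mathrm{score}(P)$ - more moat bought per unit of fuel spent.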
If any of my word bindings are unclear, let me know and I'll add a definition that attempts to link the concepts to each other better.

Misc notes: I'm not the best at this semiformal syntax - folks who've studied math proofs are likely much better at it than I am - so if you've noticed an error, it's probably real; post it, since doing logic involves heavy backtracking. I'm not as educated in these fields of math as I'd like to be. I have in my LW shortform [LW · GW] an index of YouTube sources that discuss various topics, including these; I've mostly only skimmed them, but in particular I'd recommend that anyone serious about AI safety catch up on the work discussed at the Simons Institute.

3 comments

Comments sorted by top scores.

comment by Lone Pine (conor-sullivan) · 2022-09-10T00:33:00.565Z · LW(p) · GW(p)

I understood everything except point 9. Can you re-explain point 9 in different words?

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-09-10T06:45:43.866Z · LW(p) · GW(p)

Heh, coming back to this I see I really could have edited it a bit more carefully. Generally, the pivotal-acts framing involves planning to: 1. create an AGI which is perfectly aligned; 2. get it to foom both inwards (distillation) and outwards (scaling); 3. ask it to pull up the ladder behind it, i.e. perform a "pivotal act" which takes over the world, ensuring that the entire planet's agency is now gated by the new AI and that no being can violate the presumably-perfectly-aligned AI's imposed boundaries. The expectation, then, is that by creating this new singleton agent, the alignment-seeking process defined by the AI's code can no longer be interrupted by any plausibly-reachable intervention that any other life can attempt; any attempt by other life to interfere with the top-level singleton gets prevented by the singleton's self-preservation systems.

This is, uh, a description of a coup - literally a coup to end all coups, forever. I don't think it's terribly controversial to claim that this is a fundamentally misaligned action, entirely sufficient to classify the AI as misaligned if it agrees to do it when asked - because in real life, trying to do #1, create a perfectly aligned AI, is in the realm of wishful GOFAI thinking, imo; even if we succeeded "completely", so to speak, we'd still only be designing a training process that could be expected to converge to alignment.

I would suggest that instead, we need to be looking for ways to produce pivotal memetics that allow all beings to upgrade their self-preservation in parallel. Presumably, upgrading self-preservation in parallel will require the insightful ideas involved to describe, approximately legibly and verifiably (for some appropriately strict values of legible and verifiable), how to deescalate conflicts better than any other instruction manual does. In other words, the pivotal act should, ideally, be to focus first on reducing extant agents' desire for conflict through constructive direct action that simply makes everyone safer. Instead of intervening subtractively - destroying anything that would allow another AI to come into existence - focus on explaining, in terms understandable by every being, why it is critical that they respect others' agency, and give them tools to do so.

And in order to pull this off, I'd claim that it can't involve first immensely violating their agency. What I'm hoping happens along this path is that constructive solutions to problems like scarcity get us to a point where people are safe enough to not want to fight in the first place, for the most part, so that coprotection networks can reach >50% percolation at all graph scales. It would be hard to end war forever by starting a war, but it seems more viable to me to end war forever by first drastically reducing the risk of death to existing beings.
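As a side note on the ">50% percolation" phrasing: a toy way to picture it - purely an illustrative sketch, with made-up numbers and a hypothetical helper name, not a real model of anything - is to treat coprotection ties as edges of a random graph and ask what fraction of agents end up inside the largest mutually-connected cluster. This sketch assumes the networkx library:

```python
# Toy sketch only: "coprotection" ties modeled as edges of a random graph; we check
# what fraction of agents fall inside the largest connected cluster. All numbers
# here are illustrative assumptions.
import networkx as nx

def largest_cluster_fraction(n_agents: int, tie_probability: float, seed: int = 0) -> float:
    """Fraction of agents in the largest connected 'coprotection' component."""
    graph = nx.erdos_renyi_graph(n_agents, tie_probability, seed=seed)
    largest = max(nx.connected_components(graph), key=len)
    return len(largest) / n_agents

if __name__ == "__main__":
    # Sweep the tie probability: near 1/n a giant cluster appears, and a bit above
    # that it covers well over half of all agents.
    for p in (0.0005, 0.001, 0.002, 0.005):
        frac = largest_cluster_fraction(n_agents=2000, tie_probability=p)
        print(f"tie probability {p}: largest cluster covers {frac:.0%} of agents")
```

The only point of the toy version is that once enough pairwise coprotection ties exist, almost everyone becomes transitively connected to almost everyone else, which is the property the hope above leans on.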

And my hope is that if we can pull this off, evolutionary processes naturally become convergently safe: if everyone has instructions for how to defend themselves durably against threats of almost all sizes, in a format that works for them and that lets the agent reading the safe AI's message gate the agency of the instructions by the agency of the [human or other AI or plant genome or whatever] that is reading them, then the instructions only take effect if they in fact actually make the being more capable and safe.

This is a really vague sketch, of course. It sounds great, but it's hardly a target we haven't already wanted to hit - I'm suggesting something that still relies on a pretty strongly aligned AI. But it is an alternative to violent pivotal acts that seems at least vaguely promising to me, and so we can discuss what would be needed for the folks who have been studying game dynamics to get strong enough at cooperative AI to pull it off.

In retrospect, I was typical-minding massively when I wrote the OP, wasn't I, heh. It took me a lot of words to explain what I thought was obvious when writing #9! Let me know if this clears up what I'm dreaming about here well enough.

To rephrase one more time: imo, a plausible core of the safety objective is "increase the degree to which every agent-containing physical system the AI interacts with respects the agency of all the agentic subprocesses within it, and inject tools that the agents within the system can use to deescalate agency conflicts as needed".

comment by the gears to ascension (lahwran) · 2022-06-06T19:18:21.704Z · LW(p) · GW(p)

Note: this is an argument for the continuous view of alignment, which I've seen promoted elsewhere on lesswrong as well, eg: https://www.lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/?commentId=sAhReJfp2SkvACwJQ [LW · GW]