human intelligence may be alignment-limited

post by bhauth · 2023-06-15T22:32:14.685Z · 3 comments

Previously, I argued that human mental development implies that AI self-improvement from sub-human capabilities is possible, and that human intelligence comes at the cost of a longer childhood and greater divergence from evolutionarily-specified goals.

In that post, I raised 2 hypotheses for what limits human intelligence:

  1. H:mutational_load: human intelligence is limited by mutational load degrading the precise genetic specification that high intelligence would require.
  2. H:drift_bound: human intelligence is limited by how much drift from evolutionarily-specified goals is tolerable, since more mental self-improvement means more drift.

Humans have a lot of mental variation. Some people can't visualize 3d objects. Some people can't remember faces. Some people have synaesthesia. Such variation also exists among very smart people; there isn't convergence to a single intellectual archetype. You could argue that what's needed genetically is precise specification of something lower-level that underlies all that variation, but I don't think that's correct.

So, I don't think H:mutational_load is right. That leaves H:drift_bound as the only hypothesis that seems plausible to me.

Suppose that I'm correct that human intelligence comes at the cost of a longer childhood. The disadvantages of a long childhood vary depending on social circumstances. Humans may have some control mechanism which modifies the amount of mental self-improvement and thus the length of childhood depending on the surrounding environment. Certain environments - probably safe ones with ample food - would then be associated with both longer childhoods and a one-time increase in average intelligence. That would also cause greater divergence from evolutionarily-specified goals, which may show up as a decrease in fertility rates, or an increased rate of obsession with hobbies. That can obviously be pattern-matched to the situation in some countries today, but I don't mean to say that it's definitely true; I just want to raise it as a hypothesis.

If H:drift_bound is correct, it would be an example of an optimized system having a strong and adjustable tradeoff between capabilities and alignment, which would be evidence for AI systems also tending to have such a tradeoff.
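To make the shape of that claim concrete, here's a minimal numerical sketch - purely illustrative, with made-up gain, drift, and dimension parameters rather than anything derived from this post or from biology: each self-improvement step adds capability but randomly perturbs the agent's goal vector, so the value delivered to the original goal rises with depth for a while and then falls.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_to_original_goal(depth, capability_step=1.0, drift=0.15, dim=16):
    """Each self-improvement step adds capability but randomly perturbs the
    agent's goal vector, reducing alignment with the original goal.
    All parameters here are arbitrary assumptions for illustration."""
    original_goal = np.ones(dim) / np.sqrt(dim)
    goal = original_goal.copy()
    capability = 1.0
    for _ in range(depth):
        capability += capability_step
        goal = goal + drift * rng.normal(size=dim)
        goal /= np.linalg.norm(goal)
    alignment = float(goal @ original_goal)   # cosine similarity to the original goal
    return capability * max(alignment, 0.0)   # value delivered to the original goal

# Averaged over runs, the value delivered to the original goal rises and then
# falls: capability keeps growing, but alignment decays faster past some depth.
for depth in range(0, 13, 2):
    avg = np.mean([value_to_original_goal(depth) for _ in range(500)])
    print(f"depth={depth:2d}  expected value to original goal ~ {avg:.2f}")
```

The only point of the sketch is that the tradeoff is adjustable: the chosen self-improvement depth plays the role that childhood length plays in the human case.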

Agents are adaptation-executors with adaptations that accomplish goals, not goal-maximizers. Modeling agents as goal-maximizers is a simplification humans use to make them easier to reason about. This is as true when the goal is self-improvement as it is for any other goal.

"Creation of a more-intelligent agent" involves actions that are different at each step. I consider it an open question whether intelligent systems applying recursive self-improvement tend to remain oriented towards creating more-intelligent agents more than they remain oriented towards non-instrumental specified goals. My view is that one of the following is true:

  1. Instrumental convergence is correct, and can maintain creation of more-intelligent agents as a goal during recursive self-improvement despite the actions/adaptations involved being very different.
  2. Self-improvement has a fixed depth set by the initial design, rather than unlimited potential depth. Drift would then be a similarly limiting factor for both humans and AI, which may cap AI at approximately human-level intelligence. That said, many humans do seem to have self-improvement as a goal, and some have, as a goal, the creation of a more-intelligent but different self, or even of a more-intelligent, completely separate agent. (A toy sketch contrasting these two branches follows below.)
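
Here is that toy sketch - again purely hypothetical, with arbitrary noise and `correction` parameters, and not a claim about how real recursive self-improvement works: the specified goal is a vector handed from each agent to the successor it builds, every handoff adds noise (since the actions involved differ at each step), and `correction` stands in for how strongly instrumental convergence re-anchors the successor's goal to the original specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def goal_retention(steps, correction, dim=16, noise=0.3):
    """Toy model: each handoff to a more-intelligent successor perturbs the
    goal vector; `correction` is how strongly the successor re-anchors to the
    original specification (branch 1: instrumental convergence holds;
    branch 2: no re-anchoring). All parameters are arbitrary assumptions."""
    original = np.ones(dim) / np.sqrt(dim)
    goal = original.copy()
    retention = []
    for _ in range(steps):
        goal = goal + noise * rng.normal(size=dim)               # drift at the handoff
        goal = (1 - correction) * goal + correction * original   # partial re-anchoring
        goal /= np.linalg.norm(goal)
        retention.append(float(goal @ original))                 # cosine to original goal
    return retention

# Branch 1: with strong re-anchoring, the goal stays recognizable at any depth.
# Branch 2: with none, the goal is mostly lost after a few steps, so useful
# self-improvement depth is effectively bounded by drift.
for label, c in [("branch 1, correction=0.8:", 0.8), ("branch 2, correction=0.0:", 0.0)]:
    print(label, " ".join(f"{x:+.2f}" for x in goal_retention(steps=10, correction=c)))
```

In this toy, branch 1 is instrumental convergence maintaining the goal at any depth, and branch 2 is the drift-limited case described in option 2.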

3 comments

comment by romeostevensit · 2023-06-15T23:39:03.281Z

I don't think drift would necessarily be the same for humans and a wildly different intelligence architecture, but it's an interesting way to think about it.

comment by bhauth · 2023-06-16T00:05:23.035Z

Why do you think AGI would have a very different architecture from the one humans have? I'd expect a lot of similarities, just with different hardware.

comment by romeostevensit · 2023-06-16T00:26:33.700Z

different constraints, different development search algo.