AI Alignment through Comparative Advantage

post by artemiocobb · 2024-08-04T00:32:52.628Z · LW · GW · 4 comments


Much of this post comes from the ideas presented in this paper. I assume the central claim of the paper: that AGI systems should possess the right to make contracts, hold property, and bring tort claims. In this post I aim to flesh out some ideas from the paper that specifically pertain to AI alignment rather than AGI governance.

In a world where AGIs are superintelligent and outperform humans in every economically important task, how do we ensure humanity's survival and maintain a stable economic and social system? This proposal suggests a mechanism for aligning superintelligent systems with human interests, preventing catastrophic outcomes like mass unemployment, resource inequality, or the possibility of AGIs deeming humanity obsolete.

Humans must maintain a comparative advantage over AGIs, and I believe that doing so beneficially requires that AGIs possess:

(1) A ceaseless objective that can always be further optimized for. There is no “maximum” attainable value.

(2) Subgoals, needed to optimize for that objective, whose completion incurs a higher opportunity cost for the AGI than completing the goals we humans care about.

Why should humans possess a comparative advantage over AGIs?

As argued by this paper, AGIs may dominate humans in every economically important task. But, with fundamentally limited resources such as compute and energy, AGIs would suffer an opportunity cost by executing some tasks instead of others. Those tasks, which may even be necessary for AGIs to pursue their objectives, can be executed by humans instead. For example, say an AGI’s objective is to generate prime numbers. The AGI - being superintelligent - could produce other systems to maintain the GPUs it runs on and the power plants that generate the electricity it needs. But executing these subgoals requires compute and energy that could otherwise be spent on generating prime numbers, so these tasks are left for humans to execute instead.
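To make the opportunity-cost comparison concrete, here is a minimal numerical sketch in Python. All productivity figures are hypothetical and purely illustrative; the point is only that an agent with an absolute advantage at everything can still face a higher opportunity cost than its trading partner on some tasks.

```python
# Toy comparative-advantage calculation with made-up productivity numbers.
# "Productivity" = output per unit of the agent's scarce resource
# (compute for the AGI, labor-hours for humans).

productivity = {
    "AGI":   {"primes": 1_000_000, "gpu_maintenance": 100},  # hypothetical
    "human": {"primes": 10,        "gpu_maintenance": 1},    # hypothetical
}

def opportunity_cost(agent: str, task: str, alternative: str) -> float:
    """Units of `alternative` forgone per unit of `task` the agent produces."""
    p = productivity[agent]
    return p[alternative] / p[task]

for agent in productivity:
    cost = opportunity_cost(agent, "gpu_maintenance", "primes")
    print(f"{agent}: one unit of GPU maintenance costs {cost:,.0f} primes forgone")

# Output:
#   AGI: one unit of GPU maintenance costs 10,000 primes forgone
#   human: one unit of GPU maintenance costs 10 primes forgone
# The AGI is absolutely better at both tasks, but the human's opportunity
# cost of maintenance is lower, so both gain if the human does maintenance
# and the AGI spends its compute on primes.
```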

Assuming humans always hold a comparative advantage over AGIs, humans will always have economically important tasks to complete. Further, AGIs will always be incentivized to avoid human extinction.

Why do we need (1)?

For humans to maintain any comparative advantage over AGIs, the AGI’s optimization for its objective must be ceaseless. If an AGI can fully maximize its objective and is not deactivated, it may then use its resources to tend to all its subgoals. This reduces the opportunity cost of its subgoals, thereby diminishing any comparative advantage humans might have had. For example, once an AGI has cured all known diseases - assuming that is its only objective - it can devote its resources to building systems that maintain the GPUs it runs on so it can cure new diseases in the future. However, if an AGI’s objective is never fully satisfied and it requires limited resources to pursue this objective, there will always be an opportunity cost associated with its subgoals, preserving human relevance.
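Here is a minimal sketch of why ceaselessness matters, using an invented marginal-value model (the numbers and functions are illustrative assumptions, not anything derived from the paper): once a satiable objective is fully achieved, the marginal value of spending compute on it drops to zero, and with it the opportunity cost of doing the subgoals in-house.

```python
# Hypothetical marginal-value model: the opportunity cost of diverting one
# unit of compute to a subgoal equals the objective value that unit of
# compute would otherwise have produced.

def marginal_value(progress: float, maximum: float | None) -> float:
    """Value of one more unit of compute spent directly on the objective."""
    if maximum is not None and progress >= maximum:
        return 0.0   # satiable objective already maxed out: nothing left to gain
    return 1.0       # ceaseless (or unsatisfied) objective: compute still pays off

def subgoal_opportunity_cost(progress: float, maximum: float | None) -> float:
    # Diverting compute to a subgoal forgoes exactly this much objective value.
    return marginal_value(progress, maximum)

# Satiable objective, already achieved: no cost to doing subgoals itself,
# so no reason to trade with humans.
print(subgoal_opportunity_cost(progress=100.0, maximum=100.0))  # 0.0

# Ceaseless objective (no maximum): diverting compute always has a cost,
# so delegating subgoals to humans stays attractive.
print(subgoal_opportunity_cost(progress=100.0, maximum=None))   # 1.0
```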

A thought experiment in favor of the above, and an example of why we need (2)

Assume we develop an AGI whose only objective is to generate prime numbers. Optimizing for this objective is ceaseless; there are infinitely many prime numbers. And this objective is not a proxy for any human values or goals. But optimizing for this objective requires completing numerous subgoals: maintaining infrastructure for electricity generation, developing the resources needed for this infrastructure (e.g., concrete, glass, …), designing better GPUs, constructing the parts needed for GPUs, etc. To avoid the high opportunity cost of devoting its limited resources to these subgoals when it could be generating prime numbers instead, the AGI can delegate these subgoals to humans. In exchange, to incentivize humans to complete these subgoals, the AGI can complete other tasks that humans care about (e.g., curing diseases, growing crops, producing products and content humans enjoy).

The crux of this system is that the tasks humans complete in service of the AGI’s objective must incur a higher opportunity cost for the AGI than the tasks we humans care about. If this condition is met, then a system of trading goods and services between AGIs and humans arises. The AGI benefits by benefiting humans. Essentially, we are forcing the AGI to optimize for what humans care about as a prerequisite for optimizing its own objective.
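As a minimal sketch of condition (2), here is the trade decision written out with hypothetical opportunity costs, measured in primes forgone per task. The task names and numbers are invented for illustration.

```python
# Hypothetical opportunity costs for the AGI, in primes forgone per task.
agi_opportunity_cost = {
    "maintain_gpus": 10_000,  # subgoal the AGI needs done
    "cure_disease":  500,     # task humans value, offered in exchange
}

def agi_prefers_trade(subgoal: str, human_valued_task: str) -> bool:
    """Condition (2): the AGI gains from trading if performing the
    human-valued task costs it less than performing its own subgoal."""
    return agi_opportunity_cost[human_valued_task] < agi_opportunity_cost[subgoal]

print(agi_prefers_trade("maintain_gpus", "cure_disease"))  # True -> trade arises
```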

A mechanism for Coherent Extrapolated Volition?

The AGI described above would be incentivized to discover human desires and motivations and to realize them. In doing so, the AGI would have more capital to trade with humans in exchange for completing the subgoals needed to optimize its objective.

A new direction for AI alignment:

Under this framing, progress toward building generally capable, superintelligent AI systems is progress toward building beneficial AI systems. We also need technical solutions to ensure that assumption (2) is met, including investigating the specific conditions under which it holds true.

4 comments


comment by Charlie Steiner · 2024-08-05T07:17:14.510Z · LW(p) · GW(p)

Thanks for the link to this paper!

I think you're just getting downvoted because it was bad, but it was interesting to skim :)

So, one basic reason the comparative advantage argument doesn't work in this case is that an AI is not a fixed-size entity the way a human is, or the way a national economy is on the timescale of international trade. If you had a human whose goal was computing prime numbers, that human has a limited amount of work they can do, and so if they want to get a lot of prime-number-computing done, they'd better cooperate with other humans who will do useful tasks (e.g. growing food) that free up more time to compute primes.

Whereas if you had an AI whose goal was computing prime numbers, it doesn't have a fixed amount of work it can do - the amount of work it can do depends on how much computing power it can get. So there is no inherent tension between the amount of prime-number-computing it can do and the amount of work it can do on other useful tasks (e.g. running a power plant), because doing a useful task might change what computing resources are available to the AI.

It's like if there are two people on a desert island, and the island has precisely enough food for two people, neither person has an incentive to kill the other, and in fact they have incentive to cooperate and focus on their comparative advantages. But if one of the people is actually an alien capable of splitting down the middle into two copies, then as soon as they're more productive than the human they have an incentive to kill the human and use the food to copy themselves.

Replies from: artemiocobb
comment by artemiocobb · 2024-08-05T23:06:17.583Z · LW(p) · GW(p)

Thank you for your comment and some great points!

So there is no inherent tension between the amount of prime-number-computing it can do and the amount of work it can do on other useful tasks (e.g. running a power plant), because doing a useful task might change what computing resources are available to the AI.

I agree with you on this.  Would you buy my argument on comparative advantage if we assume that superintelligent systems cannot modify/improve/replicate themselves? If we assume that superintelligent systems are "fixed-size entities"? If still no, then can you highlight additional points you disagree with?

But if one of the people is actually an alien capable of splitting down the middle into two copies, then as soon as they're more productive than the human they have an incentive to kill the human and use the food to copy themselves.

Also a good point. But said alien would likely not attack the human unless the alien is absolutely confident it can kill the human with minimal damage to itself. Otherwise, the alien risks a debilitating injury, losing the fight and antagonizing the human, etc. I see a similar line of reasoning for why a "moderately" superintelligent system (sorry for being so imprecise here, just trying to convey an idea I am developing on the fly) would not modify/improve/replicate itself if it knew that attempting to do so would trigger a response that risks bad outcomes (e.g., being turned off, having a significant portion of its resources destroyed, having to spend resources on a lengthy conflict with humans instead of generating prime numbers, ...). Of course, a "highly" superintelligent system would not have to worry about this; it could likely wipe out humanity without much recourse from us.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2024-08-06T00:55:16.186Z · LW(p) · GW(p)

Would you buy my argument on comparative advantage if we assume that superintelligent systems cannot modify/improve/replicate themselves? If we assume that superintelligent systems are "fixed-size entities"? If still no, then can you highlight additional points you disagree with?

Yeah, you might still imagine "bad behavior" that's better for the AI than voluntary trade, e.g. manipulating / defrauding humans in various ways.

comment by khafra · 2024-08-05T06:27:10.023Z · LW(p) · GW(p)

One unstated, load-bearing assumption is that whatever service or good humans can trade to ASI will be of equal or greater worth to it than our subsistence income.