What is instrumental convergence?

post by Vishakha (vishakha-agrawal), Algon · 2025-03-12T20:28:35.556Z · LW · GW · 0 comments

This is a link post for https://aisafety.info/questions/897I/What-is-instrumental-convergence

This is an article in the featured articles series from AISafety.info. AISafety.info writes AI safety intro content. We'd appreciate any feedback.

The most up-to-date version of this article is on our website, along with 300+ other articles on AI existential safety.

Instrumental convergence is the idea that sufficiently advanced intelligent systems with a wide variety of terminal goals would pursue very similar instrumental goals.

A terminal goal (also referred to as an "intrinsic goal" or "intrinsic value") is something that an agent values for its own sake (an "end in itself"), while an instrumental goal is something that an agent pursues to make it more likely that it will achieve its terminal goals (a "means to an end").

For instance, you might donate to an organization that helps the poor in order to improve people’s well-being. Here, “improve well-being” is a terminal goal that you value for its own sake, whereas “donate” is an instrumental goal that you value because it helps you achieve your terminal goal: if you found out that your money wasn’t making people better off, you’d stop donating.

While certain instrumental goals are particular to specific ends (e.g., filling a cup of water to quench your thirst), other instrumental goals are broadly useful. For example, if we imagine an AI with a very specific (and weird) terminal goal — to create as many paperclips as possible — we can see why this goal might lead to the AI pursuing a number of instrumental goals:[1]

- Self-preservation: the AI can't make paperclips if it's shut down.
- Goal-content integrity: the AI will resist having its goal modified, since a changed goal would mean fewer paperclips.
- Cognitive enhancement: a smarter AI can devise better ways to make paperclips.
- Technological perfection: better manufacturing technology means more paperclips.
- Resource acquisition: more matter and energy can be converted into more paperclips.

These instrumental goals are "convergent" because they would be useful for almost any terminal goal, not just paperclip-making.
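The convergence can be sketched in a few lines of code. The sketch below is purely illustrative and not from the article: it assumes a hypothetical payoff model in which goal-specific actions only help their own terminal goal, while broadly useful actions raise the odds of success for any goal. Two agents with different terminal goals then pick the same instrumental action.

```python
# Toy illustration of instrumental convergence (hypothetical payoff model,
# not from the article): each action's value is how much it raises the
# agent's chance of eventually achieving its terminal goal.

def action_value(action, terminal_goal):
    # Goal-specific actions only help their own terminal goal.
    direct = {
        "make_paperclips": "maximize_paperclips",
        "cure_disease": "improve_health",
    }
    if direct.get(action) == terminal_goal:
        return 0.3
    # Broadly useful (instrumentally convergent) actions help any goal more.
    if action in {"acquire_resources", "preserve_self", "improve_capabilities"}:
        return 0.5
    return 0.0

actions = [
    "make_paperclips",
    "cure_disease",
    "acquire_resources",
    "preserve_self",
    "improve_capabilities",
]

# Agents with very different terminal goals rank the same instrumental
# action highest.
for goal in ["maximize_paperclips", "improve_health"]:
    best = max(actions, key=lambda a: action_value(a, goal))
    print(goal, "->", best)
```

Under these made-up numbers, both agents choose `acquire_resources` first, despite valuing entirely different ends: the instrumental goal dominates because it serves either terminal goal.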

  1. ^

    Nick Bostrom, "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents" (2012).
