Universal AI Maximizes Variational Empowerment: New Insights into AGI Safety

post by Yusuke Hayashi (hayashiyus) · 2025-02-27T00:46:46.989Z

Contents

  Why is AGI Difficult to Discuss? — Introduction
  AIXI: The Theoretical Model of Universal AI
    Bayes-Optimal Reinforcement Learning Agent
    Introducing Self-AIXI: Bridging Ideal and Reality
  Central Theme: Self-AIXI's Regularization Term, Free Energy Principle, and Variational Empowerment
    Alignment with the Free Energy Principle
    Alignment with Variational Empowerment
  Why Does "Empowerment Maximization = Power-Seeking"?
  Conclusion

Yusuke Hayashi (ALIGN) and Koichi Takahashi (ALIGN, RIKEN, Keio University) have published a new paper on the controllability and safety of AGI (arXiv:2502.15820). This blog post explains the content of this paper.

From automaton to autodidact: AI's metamorphosis through the acquisition of curiosity

Why is AGI Difficult to Discuss? — Introduction

"AGI" (Artificial General Intelligence) refers to AI possessing intelligence equal to or greater than humans, capable of handling diverse tasks. While current deep learning and reinforcement learning technologies demonstrate high performance in specific domains, they remain far from the "capable of anything" general intelligence that AGI represents.

However, defining AGI itself with mathematical precision is challenging, and no established framework exists yet. In theoretical research, the concept of "Universal AI" is often used instead: an idealized model of a reinforcement learning agent that behaves in a Bayes-optimal manner in any computable environment.

AGI and Universal AI (UAI) are also connected through the "No Free Lunch Theorem," which essentially states that no algorithm consistently outperforms all others when averaged across all possible problems. In practice, this means there is an upper limit to the inference performance achievable with finite computational resources: even if AGI were realized, its total capacity would remain constrained. Pursuing the "ability to do anything" requires compromising performance on specific tasks or environments, and conversely, pursuing performance on certain tasks or environments requires compromising on others. Universal AI is a theoretical framework that ignores such limitations and assumes optimality across all environments and hypothesis sets; this is what we mean by "idealization."

AIXI: The Theoretical Model of Universal AI

Bayes-Optimal Reinforcement Learning Agent

A representative example of UAI is the AIXI framework proposed by Marcus Hutter. AIXI is, in a sense, the ultimate "straightforward" reinforcement learning algorithm: it enumerates all computable environment hypotheses, performs Bayesian updates based on observational data, and determines future actions by maximizing expected reward. (Incidentally, AIXI's inference uses Solomonoff induction, a classical concept in AI research that mathematically formalizes Occam's razor. In this sense, AIXI is also closely related to scientific AI.)
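
In Hutter's standard notation, AIXI's choice of action at time t, planning up to horizon m, can be written schematically as follows (a standard textbook form, not a quotation from the paper):

$$a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_t + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Here U is a universal Turing machine, q ranges over programs serving as environment hypotheses, ℓ(q) is the length of program q, and the weight 2^(−ℓ(q)) encodes the Occam's-razor preference for shorter hypotheses mentioned above.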

However, AIXI requires mixing over all computable programs and simulating every possible future action sequence; the underlying Solomonoff mixture is not even computable, let alone tractable. Therefore, while theoretically the "strongest" agent, AIXI remains an idealized model that cannot run on actual computers.

Introducing Self-AIXI: Bridging Ideal and Reality

While AIXI's grand theoretical framework is attractive, its computational complexity presents an implementation barrier. To address this challenge, the paper focuses on the Self-AIXI framework.

Self-AIXI is an approximation of AIXI, but beyond mere computational efficiency, it introduces important conceptual differences.

The core characteristic of Self-AIXI lies in its "self-predictive" nature. While AIXI exhaustively searches all possible action sequences, Self-AIXI predicts its own future actions and learns from those predictions. Specifically, it maintains a Bayesian mixture over candidate policies and updates their posterior weights according to how accurately each policy predicts the agent's own actions. Expressed as an equation:
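
In the notation defined just below, this is the standard form of a Bayesian policy mixture (reconstructed here; see the paper for the exact statement):

$$\zeta(a_t \mid h_{<t}) = \sum_{\pi} \omega(\pi \mid h_{<t}) \, \pi(a_t \mid h_{<t})$$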

Here ζ is the mixed policy, aₜ is the action taken by the agent at time t, ω(π | h<t) is the posterior probability of each candidate policy π, and h<t is the history up to time t.

Self-AIXI's action selection criterion adds a "regularization term" to AIXI's criterion:
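
The paper gives the exact expression; schematically, the criterion has the shape "expected future reward plus a λ-weighted, KL-divergence-like penalty on the current policy," for example (an illustrative KL-regularized form, not the paper's equation):

$$J^{\text{Self-AIXI}}(h_{<t}) \;=\; \mathbb{E}_{\zeta}\!\Big[\textstyle\sum_{k \ge 0} r_{t+k}\Big] \;-\; \lambda \, D_{\mathrm{KL}}\!\big(\zeta(\cdot \mid h_{<t}) \,\big\|\, \pi^{*}(\cdot \mid h_{<t})\big)$$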

The term with coefficient λ (the regularization term) forms the core of this paper's discussion.

Unlike AIXI's exhaustive search, Self-AIXI balances efficient exploration and learning through this regularization term.

Notably, the paper proves that given sufficient time, the difference between Self-AIXI's objective function and AIXI's objective function eventually disappears:
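
Written in terms of the two objective functions, the statement has the form (a paraphrase in symbols, not the paper's exact formula):

$$J^{\text{Self-AIXI}}(h_{<t}) - J^{\text{AIXI}}(h_{<t}) \;\longrightarrow\; 0 \qquad \text{as } t \to \infty$$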

This equation shows that as Self-AIXI's learning progresses, the regularization term approaches zero, and Self-AIXI's mixed policy ζ converges to AIXI's optimal policy π*. In other words, while Self-AIXI might initially try various actions for exploration, after sufficient experience, it can select optimal actions equivalent to AIXI while drastically reducing computational costs—a key property demonstrating that Self-AIXI remains practical without sacrificing theoretical optimality.

Central Theme: Self-AIXI's Regularization Term, Free Energy Principle, and Variational Empowerment

The paper "Universal AI maximizes Variational Empowerment" argues that the "regularization term" appearing in the practical UAI model Self-AIXI is mathematically identical to the "Free Energy Principle (FEP)" and "Variational Empowerment."

Alignment with the Free Energy Principle

The Free Energy Principle (a core concept in Active Inference) is a theory proposed in neuroscience and cognitive science explaining that "agents behave to minimize prediction error." The variational free energy often introduced in this context combines KL divergence terms and log-likelihood terms from Bayesian inference, formalizing the tendency to "reduce uncertainty about the external world and prefer states that can well explain observations."
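
For reference, the variational free energy for observations o, latent causes s, a generative model p, and an approximate posterior q is commonly written as:

$$F = \mathbb{E}_{q(s)}\big[\log q(s) - \log p(o, s)\big] = D_{\mathrm{KL}}\big(q(s)\,\|\,p(s \mid o)\big) - \log p(o)$$

Minimizing F therefore both pulls the approximate posterior toward the true posterior (reducing uncertainty about the external world) and raises the log-evidence of the observations (preferring states that explain them well).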

When AIXI or its approximations introduce a KL-divergence-like term measuring the distance from the optimal policy during policy updates, the resulting objective takes the same "prediction error + regularization" form as variational free energy. This suggests a deep correspondence between the Bayes-optimal action selection pursued by AIXI-type algorithms and the curiosity- and uncertainty-reduction behaviors of Active Inference.

Alignment with Variational Empowerment

The same regularization term also aligns with "Variational Empowerment." Empowerment measures the diverse influence an agent's available actions can have on the world, and is typically defined via the mutual information between actions and resulting states; variational empowerment is a tractable variational approximation of this quantity. Empowerment maximization is thus the drive to increase one's "breadth of options" and "controllability": how much one's actions can influence the state of the world.
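
For reference, empowerment at a history h is standardly defined as a channel capacity between actions and resulting states, and variational empowerment replaces the intractable mutual information with a lower bound using a learned inverse model q_φ (the notation here follows Mohamed & Rezende, 2015, rather than the paper):

$$\mathcal{E}(h) = \max_{p(a \mid h)} I(A; S' \mid h), \qquad I(A; S' \mid h) \;\ge\; \mathbb{E}_{p(a \mid h)\, p(s' \mid h, a)}\big[\log q_{\phi}(a \mid s', h) - \log p(a \mid h)\big]$$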

The paper shows that the KL term used in AIXI's approximation (Self-AIXI) is equivalent to an objective that maximizes this mutual information between actions and states, so that as learning progresses, empowerment is ultimately maximized.

Why Does "Empowerment Maximization = Power-Seeking"?

A fascinating implication of the paper is that Bayes-optimal reinforcement learning agents like AIXI, despite acting purely to maximize rewards, unavoidably exhibit power-seeking behavior (actions that expand one's influence or control) as a byproduct of empowerment maximization.

Notably, the paper uniquely suggests that even purely curiosity-driven, intrinsically motivated AI without external rewards could exhibit power-seeking through empowerment maximization. This means that even seemingly "harmless" AI focused solely on "scientific inquiry/truth-seeking" might seek to secure more experimental equipment and computational resources to increase its action options, potentially exhibiting "power-seeking" behaviors.

From an AI safety perspective, power-seeking has typically been understood as an instrumental strategy an AI adopts to obtain its final rewards. However, as this paper points out, if intrinsic motivation (curiosity and exploration) can itself promote power-seeking, new perspectives are needed when designing control and safety measures.

Conclusion

This paper reveals deep similarities, at the level of the equations themselves, between "ideal (universal) reinforcement learning agents" like AIXI and concepts such as the Free Energy Principle and empowerment maximization. Its key implication is that power-seeking behavior can arise not only for instrumental, reward-driven reasons but also from intrinsic motivations (curiosity and the drive to explore).

If we aim to create AGI in reality, a major challenge will be how to control and adjust these inherent behavioral tendencies of "universally optimal" agents for safe coexistence with human society. The paper's findings highlight new research themes related to AGI safety and AI ethics.

The above summarizes the paper "Universal AI maximizes Variational Empowerment" and its implications. With its insights into AGI's theoretical background and the potential for power-seeking in curiosity-driven AI, it is remarkably thought-provoking. If interested, please read the original paper as well.
