"Pick Two" AI Trilemma: Generality, Agency, Alignment.
post by Black Flag (robert-shala-1) · 2025-01-15T18:52:00.780Z
Introduction
The conjecture is that an AI system can fully excel along any two of three dimensions (generality, agency, and alignment) only by compromising the third.
In other words, a system that is extremely general and highly agentic will be hard to align; one that is general and aligned must limit its agency; and an agentic, aligned system must remain narrow. Below, I discuss how today’s AI designs implicitly “pick two.”
This is a useful mental model for looking at AI systems because it clarifies fundamental tensions in contemporary AI design and highlights how and where compromises typically arise.
Generality + Agency ⇒ Alignment sacrificed.
An AI that is both very general and truly agentic, selecting and pursuing open-ended goals, poses the classic alignment problem. This is a much-discussed topic, and it suffices to say that, absent new breakthroughs, highly general and agentic AI systems will require stringent constraints (on objectives, actions, or knowledge) to remain aligned.
Generality + Alignment ⇒ Agency curtailed.
One path to aligned general intelligence is to remove persistent agency. A well-known concept is the Oracle or Tool AI: a super-intelligent system designed only for answering questions, with no ability to act in the world. By confining a generally intelligent AI to providing information or predictions on request (like a highly advanced question-answering system or an LLM “simulator” of possible responses), we can leverage its broad knowledge while keeping it from executing plans autonomously. This setup encourages the AI to defer to human input rather than seize agency.
Modern large language models, which exhibit considerable generality, are deployed as assistants with carefully constrained actions. They are aligned via techniques like instruction tuning and RLHF, but notably, these techniques limit the AI’s “will” to do anything outside the user’s query or the allowed policies. They operate in a text box rather than roaming the internet autonomously (except via cautious tool use, and always under user instruction). As a result, we get generally knowledgeable, helpful systems that lack independent agency, effectively sacrificing agency to maintain alignment.
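To make the “agency curtailed” pattern concrete, here is a minimal Python sketch. All names, tools, and the approval flow are hypothetical illustrations, not any particular vendor’s API: the model may only propose actions against a whitelist, and nothing executes without an explicit human approval step.

```python
# Minimal sketch (hypothetical names) of an "agency-curtailed" assistant:
# the model may *propose* tool calls, but nothing executes without an
# explicit human approval step, keeping the system in tool/oracle mode.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    name: str        # which whitelisted tool the model wants to use
    argument: str    # single string argument, for simplicity

# Whitelist of side-effect-light tools; anything else is rejected outright.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
}

def model_propose(query: str) -> ToolCall:
    """Stand-in for the LLM: it only *suggests* an action, it never acts."""
    return ToolCall(name="calculator", argument=query)

def run_with_human_gate(query: str) -> str:
    proposal = model_propose(query)
    if proposal.name not in TOOLS:
        return "Refused: tool not on the whitelist."
    # The agency bottleneck: a human must approve every individual action.
    answer = input(f"Approve call {proposal.name}({proposal.argument!r})? [y/N] ")
    if answer.strip().lower() != "y":
        return "Action declined by the user; no result returned."
    return TOOLS[proposal.name](proposal.argument)

if __name__ == "__main__":
    print(run_with_human_gate("2 + 2"))
```

The design choice doing the work here is that every step with side effects routes through the human; the system stays general in what it can answer but has no loop in which it acts on its own.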
Agency + Alignment ⇒ Limited Generality.
The third combination is building AI agents that are strongly aligned within a narrow domain or limited capability level. Narrow AI agents (e.g. a chess engine or an autonomous vehicle) have specific goals and can act autonomously, but their generality is bounded. This limit on scope simplifies alignment: designers can more exhaustively specify objectives and safety constraints for a confined task environment.
Systems in many specialized roles (drones, industrial robots, recommendation algorithms) operate with agency but within narrow scopes and with heavy supervision or “tripwires” to stop errant behavior. For example, an autonomous driving system is designed with explicit safety rules and operates only in the driving context. It has agency (controls a vehicle) and is intended to be aligned with human safety values, but it certainly cannot write a novel or manipulate the stock market. In essence, we pay for alignment by constraining the AI’s generality.
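As a rough illustration of the tripwire pattern, here is a toy Python sketch. The policy, thresholds, and state variables are invented for illustration: a narrow controller acts autonomously within a single confined task, and a supervisor overrides it the moment a hard safety constraint is violated.

```python
# Toy sketch (hypothetical thresholds and policy) of the "tripwire" pattern:
# a narrow controller acts autonomously in one confined domain, and a
# supervisor forces a stop whenever a hard safety constraint is violated.

from dataclasses import dataclass

@dataclass
class State:
    speed_kmh: float
    distance_to_obstacle_m: float

SPEED_LIMIT_KMH = 120.0   # explicit, exhaustively specifiable constraint
MIN_GAP_M = 5.0           # hard tripwire: never get closer than this

def narrow_policy(state: State) -> float:
    """Toy driving policy: accelerate when the road is clear, brake otherwise.
    Its entire action space is one number (acceleration); nothing else."""
    return 1.0 if state.distance_to_obstacle_m > 50.0 else -2.0

def supervisor(state: State) -> bool:
    """Tripwire check: returns False to demand an emergency stop."""
    return (state.speed_kmh <= SPEED_LIMIT_KMH
            and state.distance_to_obstacle_m >= MIN_GAP_M)

def step(state: State) -> State:
    if not supervisor(state):
        # Errant behaviour detected: override the policy and stop the vehicle.
        return State(speed_kmh=0.0,
                     distance_to_obstacle_m=state.distance_to_obstacle_m)
    accel = narrow_policy(state)
    return State(
        speed_kmh=max(0.0, state.speed_kmh + accel),
        distance_to_obstacle_m=state.distance_to_obstacle_m - state.speed_kmh / 3.6,
    )

if __name__ == "__main__":
    s = State(speed_kmh=100.0, distance_to_obstacle_m=80.0)
    for _ in range(5):
        s = step(s)
        print(s)
```

Because the controller’s entire world is one state vector and one action, the supervisor’s constraints can be enumerated and checked exhaustively, which is exactly what broad generality would make impossible.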
Conclusion
Achieving two out of the triad is feasible: we can build very general tools (GPT-style oracles) that remain aligned by shunning agentic autonomy, or highly agentic systems aligned to human-specified tasks (like AlphaGo) that aren’t generally intelligent. We do not yet know how to build a generally intelligent, autonomous agent that we can trust with arbitrary decisions in the open world.
This doesn’t mean the trilemma is insurmountable in principle – ongoing research in value learning, transparency, and control theory aims to bend these trade-offs. But until alignment techniques reliably scale with capability, prudent AI development will “pick two.”
The hope is that with new alignment paradigms (e.g. scalable oversight or provably beneficial AI), future AI can expand toward full generality and agency without sacrificing safety – but until then, any claim of achieving all three should be met with healthy skepticism.