Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)
post by Ronen Bar (ronen-bar) · 2025-04-20T04:40:42.983Z
This is a link post for https://forum.effectivealtruism.org/posts/FREY5kyC8mWr5ow4S/developing-ai-safety-bridging-the-power-ethics-gap
Contents
- TLDR
- My Point of View
- Human History Trends
- The Focus of the AI Safety Space
- Suggesting New Concepts, Redefining or Highlighting Existing Ones
TLDR
- One of the most significant challenges facing humanity is the widening gap between our rapidly increasing power and our slower progress (if there is any) in ethical understanding and application. I term this the power-ethics gap. As history progresses, it has produced increasing suffering and death, to animals and humans alike, from human actions.
- AI threatens to dramatically expand the power-ethics gap.
- This creates an urgent imperative for humanity to cultivate better ethics.
- Currently, the AI safety field predominantly focuses on the critical task of maintaining human control over powerful AI systems, often overlooking the vital ethical dimension. Consequently, philosophical inquiry is essential to re-examine the field, introducing new concepts (or refining existing ones) that properly emphasize this ethical component.
- I suggest several new concepts in this post.
- More resources should be dedicated to the crucial question of AI value selection and to related work aimed at ensuring the values embedded within AI systems align with an encompassing ethical view.
This post can be seen as a continuation of this [EA · GW] post.
(To further explore this topic, you can watch a 34-minute video outlining a concept map of the AI space and potential additions. Recommended viewing speed: 1.25x).
This post drew some insights from the Sentientism podcast and the Buddhism for AI course.
My Point of View
I look at the AI safety space mainly through three fundamental questions: What is? What is good? How do we get there?
- "What is?" relates to understanding reality through data and intelligence. I define intelligence as the capacity to make accurate predictions across increasingly complex counterfactual scenarios, given specific data and environmental context. The combination of data and intelligence constitutes power. Greater power enables a deeper understanding of "what is" and improved predictive capabilities across various scenarios.
- "What is good?" pertains to ethics – the principles guiding right action.
- "How to get there?" pertains to wisdom, the effective integration of power and ethics. Consider an analogy: power is the speed of a car, while ethics is the driver's skill. Low speed with an expert driver poses little risk. However, high speed with an unskilled driver will probably end in disaster. Wisdom, therefore, represents the effective application of power guided by ethical understanding.
Human History Trends
Historically, human power – driven by increasing data and intelligence – has been scaling rapidly, even exponentially. Our ability to understand and predict "what is" continues to grow. However, our ethical development (understanding "what is good") is not keeping pace. In the car analogy, the power-ethics gap is a car driving ever faster while the driver's skill improves only slightly as the ride goes on. This arguably represents one of the most critical problems globally. The imbalance has contributed significantly to increasing suffering and killing throughout history, potentially more in recent times than ever before. The widening power-ethics gap appears correlated with large-scale, human-caused harm.
The Focus of the AI Safety Space
Eliezer Yudkowsky, who describes himself as 'the original AI alignment person,' is one of the most prominent figures in the AI safety space. His philosophical work, the many concepts he created, and his discussion forum and organizations have significantly shaped the AI safety field. I am in awe of his tremendous work and contribution to humanity, but he has a significant blind spot in his understanding of "what is". Yudkowsky claims to operate within a framework in which (almost) only humans are sentient, whereas scientific evidence suggests that probably all vertebrates, and possibly many invertebrates, are sentient. This discrepancy is crucial: one of the key founders of the AI safety space has built his perspective on an unscientific assumption that limits his view to a tiny fraction of the world's sentience.
The potential implications are profound, and they highlight the necessity of re-evaluating AI safety from a broader ethical perspective encompassing all sentient beings, both present and future. This requires introducing new concepts and potentially redefining existing ones. The work is critical because the pursuit of artificial intelligence is focused primarily on increasing power (capabilities), and hence risks further widening the existing power-ethics gap within humanity.
Since advanced AI threatens to take control and mastery away from humans, two crucial pillars of AI safety emerge: maintaining meaningful human control (power) and ensuring ethical alignment (ethics). Currently, the field heavily prioritizes the former, while the latter remains underdeveloped. From an ethical perspective, particularly one concerned with the well-being of sentientkind ('sentientkind' being analogous to 'humankind' but inclusive of all feeling beings), AI safety and alignment could play a greater role. Given that AI systems may eventually surpass human capabilities, their embedded values will have immense influence.
We must strive to prevent an AI-driven power-ethics gap far exceeding the one already present in humans.
Suggesting New Concepts, Redefining or Highlighting Existing Ones
- Value Selection: AI alignment involves understanding "what is" (related to AI capabilities), determining "what is good" (related to value selection), and figuring out "how to get there" (related to technical alignment). "Value Selection", the core ethical question of which values AI should pursue, receives insufficient focus. The lack of a universally accepted term is indicative of this (Paul Christiano's "The steering problem [LW · GW]" post on LessWrong revolves around this question, and different scholars have used other names for it). A clear, established term for this crucial ethical component is needed for productive discourse.
- Alignment: Should 'alignment' solely refer to maintaining human control? A broader definition could encompass AI acting in accordance with a robust understanding of both reality ("what is") and ethics ("what is good"). This also suggests differentiating types of alignment, such as:
- Human-Centric Alignment: Focused primarily on aligning AI with human interests and control.
- Sentientkind Alignment: A broader goal of aligning AI with the well-being and interests of all sentient beings.
- Misaligned AI: Similarly, does a 'misaligned AI' refer only to loss of human control, or does it also imply ethical failure? We likely need sub-terms to distinguish scenarios: AI systems might be controllable but ethically misaligned, ethically aligned but uncontrollable, aligned on both fronts, or deficient in both.
- Human-Centric AI: AI that is aligned with the values chosen by humans, regardless of what those values are.
- Omnikind AI / Sentientkind AI: Proposed terms for a model that considers the well-being of all sentient beings, reflecting a broader ethical foundation.
- Buddha AI: Reflecting a specific ethical framework advocated by the Monastic Academy for the Preservation of Life on Earth (MAPLE): developing AI that will "walk the path" of the Buddha, deeply understanding the world and becoming compassionate toward all beings.
- Human Alignment: Considering the causal chain (Evolution → Humans → AI → Superintelligence), achieving beneficial outcomes might require alignment at each stage: aligned humans create aligned AI, which in turn creates aligned superintelligence. However, humans are not ethically aligned; evolutionary pressures prioritized survival and replication (by increasing power) over ethics, making complete alignment much harder. Hence I would argue we are currently the smart human, not necessarily the wise human.
- There seems to be insufficient focus on this 'human alignment' prerequisite. While one can argue this falls under broader societal efforts and not AI safety/alignment ones, the urgency created by AI's power scaling suggests it might warrant dedicated attention, perhaps even a distinct movement focused on human alignment to close the power-ethics gap and create wisdom.
- Value Alignment Strategies: An encompassing term for the diverse set of technical methods (e.g., RLHF), frameworks (e.g., Constitutional AI, Coherent Extrapolated Volition), and approaches used for value alignment, i.e., making AI act according to human-chosen values, whatever those values are (a minimal code sketch follows this list).
- Moral Alignment [EA · GW]: A proposed broader concept, somewhat overlapping with AI safety, but explicitly emphasizing the moral imperative to scale human and artificial ethics in response to escalating power. It encompasses the goals of maintaining human control (technical alignment), fostering human ethical development ('human alignment'), and ensuring AI systems are ethically sound ('AI ethical alignment').
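To make the "value alignment strategies" bullet above more concrete, here is a minimal sketch of the pairwise-preference (Bradley-Terry) loss commonly used to train reward models in RLHF-style pipelines. It is a simplified toy, not any particular lab's implementation; the reward scores below are placeholders for what a learned reward model would output.

```python
import numpy as np

def preference_loss(reward_chosen: np.ndarray, reward_rejected: np.ndarray) -> float:
    """
    Bradley-Terry style loss used in RLHF reward modeling: the reward model is
    trained so that human-preferred responses score higher than rejected ones.
    Inputs are reward scores for paired responses (toy placeholders here).
    """
    # Probability the model assigns to the human preference being correct.
    prob_prefer_chosen = 1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected)))
    # Negative log-likelihood of the human preference labels.
    return float(-np.mean(np.log(prob_prefer_chosen)))

# Toy example: scores a reward model might assign to three preference pairs.
chosen = np.array([2.0, 0.5, 1.2])     # responses humans preferred
rejected = np.array([1.0, 0.7, -0.3])  # responses humans rejected
print(preference_loss(chosen, rejected))  # lower is better; ~0.44 here
```

The point of the sketch is only to show what "making AI act according to human-chosen values" looks like at the technical level: the values enter the system through human preference labels, which is exactly why the separate value selection question above matters.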
People use AI safety terms in different ways, so I would love to hear any thoughts on what I got wrong in my understanding of these concepts.