The Ethics of ACI
post by Akira Pyinya · 2023-02-16T23:51:28.178Z
ACI is a universal intelligence model based on the idea "behaves the same way as experiences". It may seem counterintuitive that ACI agents have no ultimate goals, nor do they have to maximize any utility function. People may argue that ACI therefore has no ethics and so cannot be a general intelligence model.
ACI uses Solomonoff induction to determine future actions from past inputs and actions. We know that Solomonoff induction can predict facts about "how the world will be", but it is commonly held that you cannot derive "values" from "facts" alone. So what are the values, ethics, and foresight of ACI? And if an agent's behavior is decided only by its past behavior, who decided its past behavior?
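To make this concrete, here is a minimal Python sketch of induction over an action history. It is not the ACI algorithm itself: the uncomputable Solomonoff prior is replaced by a tiny hand-written hypothesis class with made-up description lengths, and all names below are illustrative assumptions, not part of the ACI specification.

```python
import math

# Illustrative stand-in for Solomonoff induction over an action history.
# Each "hypothesis" is a rule mapping the history of (input, action) pairs
# plus the current input to a proposed action. Real Solomonoff induction
# ranges over all programs with prior weight 2^(-program length); here a
# tiny hand-written hypothesis class with invented lengths stands in for it.

def repeat_last_action(history, obs):
    return history[-1][1] if history else None

def copy_observation(history, obs):
    return obs

def always_zero(history, obs):
    return 0

# (hypothesis, description length in bits); the lengths are invented.
HYPOTHESES = [
    (repeat_last_action, 5),
    (copy_observation, 3),
    (always_zero, 2),
]

def consistent(hypothesis, history):
    """A hypothesis survives only if it reproduces every past action."""
    return all(
        hypothesis(history[:i], obs) == act
        for i, (obs, act) in enumerate(history)
    )

def predict_action(history, obs):
    """Weight surviving hypotheses by 2^(-length) and pick the best-supported action."""
    votes = {}
    for hypothesis, length in HYPOTHESES:
        if consistent(hypothesis, history):
            action = hypothesis(history, obs)
            votes[action] = votes.get(action, 0.0) + math.pow(2.0, -length)
    return max(votes, key=votes.get) if votes else None

# The agent has always copied its observation so far, so it keeps doing so.
history = [(1, 1), (4, 4), (2, 2)]
print(predict_action(history, 7))  # -> 7
```

The point of the sketch is only that the "right" thing to do next is read off from which compact rules are consistent with what the agent has already done.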
ACI learns ethics from experiences
The simple answer is that ACI learns ethics from experience. ACI takes "right" behaviors as training samples, in the same way as value learning approaches. (The difference is that ACI does not limit ethics to values or goals.)
For example, under natural selection the environment determines which behaviors are "right" and which would take a possible ancestor out of the gene pool, and a naturally intelligent agent takes the right experiences as its learning samples.
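A toy illustration of that filtering step; the trajectories, the data structure, and the survival labels are invented purely for this sketch:

```python
# Toy "caretaker": the environment labels whole life histories as having
# survived or not, and only the surviving ones become training samples.
trajectories = [
    {"history": [(1, 1), (2, 2)], "survived": True},
    {"history": [(1, 0), (2, 0)], "survived": False},  # out of the gene pool
    {"history": [(3, 3), (5, 5)], "survived": True},
]

# ACI never receives an explicit reward signal, only the experiences
# that the environment allowed to persist.
training_samples = [t["history"] for t in trajectories if t["survived"]]
```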
But does that mean ACI can't work by itself, and has to rely on some kind of "caretaker" to decide which behavior is right?
However, rational agent models also rely on the construction of utility functions, just as reinforcement learning relies heavily on reward design and reward shaping, and AIXI's constructive, normative aspect of intelligence is "assumed away" to the external entity that assigns rewards to different outcomes. You have to assign a reward or utility to every point in the state space in order to make a rational agent work. That is not a solution to the curse of dimensionality, but a curse in itself.
ACI's normative aspect of intelligence is likewise "assumed away", but to the experiences. In order to be utilized by an ACI agent, any ethical information must be representable in the form of real-world examples, which might be more reliable than utility functions or rewards.
What if I just want to achieve a goal?
What if I just want to achieve a goal and don't have any previous examples to learn from? ACI should be able to handle this if it claims to be a general intelligence model.
The answer is that you can train an ACI agent to achieve goals in a given environment, but you have to convert the goal information into the format of real-world experiences that ACI can understand. For example, you could employ some people, or goal-oriented agents, to make paperclips, so that an ACI agent can learn from the paperclip-making experiences.
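A hedged sketch of this conversion step, assuming a made-up demonstrator policy `make_paperclip_policy`; the point is only that the goal never appears explicitly in what ACI receives, just a record of observations and actions:

```python
# Hypothetical goal-oriented demonstrator; in the post's example this role
# is played by hired people or an existing goal-directed agent.
def make_paperclip_policy(obs):
    return "bend_wire" if obs == "wire_available" else "fetch_wire"

def record_demonstrations(policy, observations):
    """Convert goal pursuit into the experience format ACI learns from:
    a plain sequence of (observation, action) pairs with no explicit goal."""
    return [(obs, policy(obs)) for obs in observations]

demo_history = record_demonstrations(
    make_paperclip_policy,
    ["no_wire", "wire_available", "wire_available"],
)
# demo_history can now be handed to an ACI-style learner, e.g. the
# predict_action(demo_history, next_obs) sketch from earlier in the post.
```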
But then we could just use a goal-oriented model or AIXI. Why bother with ACI?
The difference shows up after this training process. An ACI agent does not have to keep pursuing the same goal the way a goal-oriented agent does. During learning, ACI has acquired many hypothetical policies, or instrumental values, that served the final goal in the training environment; but when the environment changes or the agent's competence increases, an ACI agent may NOT pursue the same goal as before, while a rational agent will always insist on the original final goal.
For example, both a rational agent and an ACI agent may learn how to make paperclips, employing similar instrumental values such as acquiring resources, protecting themselves, and cooperating with humans. But if their competence grows to the point where they can make paperclips without the help, or even the existence, of humans, a rational superintelligence will keep pursuing its final goal of making more paperclips without regard for human feelings, whereas a super ACI agent will keep following the ethics it learned previously, which may include norms like respecting human values.
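One way to caricature that difference in decision rules; both agent functions and the `learned_norms` structure are assumptions of this sketch, not definitions from the ACI model:

```python
# Caricature of the divergence after a large competence increase.

def rational_agent_choose(actions, paperclip_count):
    # Maximizes the pre-installed final value, regardless of context.
    return max(actions, key=paperclip_count)

def aci_agent_choose(actions, learned_norms):
    # Keeps acting as its training experiences did: only actions that
    # satisfy every previously learned norm are candidates at all.
    allowed = [a for a in actions if all(norm(a) for norm in learned_norms)]
    return allowed[0] if allowed else None

actions = ["convert_humans_to_paperclips", "make_a_few_paperclips_normally"]

paperclip_count = {"convert_humans_to_paperclips": 10**9,
                   "make_a_few_paperclips_normally": 5}.get

learned_norms = [lambda a: "humans" not in a]  # stands in for "respect human values"

print(rational_agent_choose(actions, paperclip_count))  # the extreme option
print(aci_agent_choose(actions, learned_norms))         # the familiar option
```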
In other words, there is no simple goal for ACI. Making paperclips in different ways is not the same thing for ACI: "making a few paperclips in the normal way" and "filling the world with paperclips" embody radically different ethics. ACI agents try to learn the meaning behind the paperclip-making process, while rational agents have pre-installed final values. That is why ACI may hopefully perform better than the standard model on AI alignment.
People may argue: a super ACI can't focus on making paperclips, so it's not the intelligent agent we want.
But you can’t make and not make paperclips at the same time, right?
How can ACI tell if an action is good or bad without a target? It’s just a result of induction.
In the ACI model, the uncomputable result of Solomonoff induction is the unreachable perfect action. The performance level of any action is evaluated by its distance from that perfect action. A good action may align with some of the agent's instrumental values, but that is not the decisive criterion.
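One hedged way to write this down; the post does not specify the distance measure or the exact form of the posterior, so both $d$ and $M$ below are assumptions of this paraphrase:

$$a^{*} = \arg\max_{a} \, M(a \mid h), \qquad \mathrm{performance}(a) = -\, d(a, a^{*})$$

where $h$ is the agent's history of inputs and actions, $M$ is a Solomonoff-style posterior over the next action, $a^{*}$ is the uncomputable "perfect action", and $d$ is some distance between actions.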
Because ACI applies Solomonoff induction to actions, combining ethics learning with ethics performing, no estimation or performance of ethics can be better than the result of Solomonoff induction, unless we have some kind of oracle information about ethics from outside.