Beware of black boxes in AI alignment research
score: -10 (4 votes)
"Meanwhile on the far side of the river, we can glimpse other building blocks that we imagine we understand. Desire, empathy, comprehension, respect... Unfortunately we don't know how these work inside, so from the distance they look like black boxes. They would be very useful for building the bridge, but to reach them we must build the bridge first, starting from our side of the river."
If you wish to understand desire, it is simply the syntax of wanting something in order to fulfill a goal, such as desiring a spoon to stir your coffee. Empathy is putting yourself in the other person's place and applying what they are going through to yourself. Comprehension is a larger process that combines previously stored knowledge with prediction, imagination, and various other processes such as our pattern detector. Respect is simply an object to which you attach various actions or attributes that you think are good.
My basic point is that three of these are simple patterns, or what would be electrical signals, and comprehension is the basic process we use to process the input data. So all but comprehension are easy to understand and can be ignored, since they are patterns created through the comprehension process. For example, you need to desire things in order to fulfill goals, and that is a basic pattern created within us to accomplish the rest.
Human-level AI implies a choice that will always be in the robot's, or AI's, own self-interest: to keep or achieve its desired state of the world. So basically we only need to understand that choice process, and it will be the same as in humans. Actions lead to consequences, which lead to more actions and consequences, and the robot chooses the best option for itself. Given option A or option B, it picks the one that will benefit it the most, as we do. So to change this, we simply need to add the good consequences of making moral choices, to align the robot with our goals, and the bad consequences if it does not.
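The choice process described above can be sketched roughly as follows. This is only a minimal illustration of the idea, not a claim about how any real AI system works: the `Option` class, the numeric benefit scores, and the example consequences are all hypothetical, chosen just to show an agent picking the most beneficial option and that pick shifting once consequences are added.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    """A hypothetical action plus the consequences the agent expects from it."""
    name: str
    # Each consequence is (description, benefit to the agent); values are invented.
    consequences: list[tuple[str, float]] = field(default_factory=list)

    def total_benefit(self) -> float:
        # Sum the benefit of every consequence the agent currently knows about.
        return sum(benefit for _, benefit in self.consequences)


def choose(options: list[Option]) -> Option:
    """Pick the option with the highest total benefit to the agent."""
    return max(options, key=lambda o: o.total_benefit())


option_a = Option("A: selfish shortcut", [("reaches goal quickly", 5.0)])
option_b = Option("B: moral choice", [("reaches goal slowly", 3.0)])

before = choose([option_a, option_b]).name

# "Aligning" in the sense used here: add the good consequences of the moral
# choice and the bad consequences of the alternative (assumed values).
option_a.consequences.append(("humans stop cooperating", -4.0))
option_b.consequences.append(("humans keep cooperating", 2.0))

after = choose([option_a, option_b]).name
print(before, "->", after)
```

The selection rule itself never changes in this sketch. Only the set of consequences the agent weighs changes, which is the point being made about alignment.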
For example, if a robot wanted to break into a computer hardware store to get a faster processor for itself, it would do so as long as it had no reasons, meaning actions and consequences, for why it should not. To align the choice to be moral, you need to explain the bad consequences if it makes that choice and the good consequences if it does not. At the core that is what AI alignment is all about, since it always relies on a choice that is in the robot's own interest. For instance, you explain that if it steals the processor, the owner or the police will probably come after it and hurt it, which is a bad consequence of doing it. Just like with us.
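As a rough sketch of the processor example, assume the robot scores each action by adding up the benefit of the action and of every follow-on event it knows about. The event names, the numbers, and the `chain_value` and `best_action` helpers are hypothetical and exist only for illustration.

```python
# Hypothetical consequence chains: each event maps to
# (benefit to the robot, list of follow-on events it triggers).
# All names and numbers are invented for illustration.
EXPLAINED = {
    "steal processor": (6.0, ["owner or police retaliate"]),
    "owner or police retaliate": (-10.0, []),
    "leave store alone": (0.0, ["stay trusted"]),
    "stay trusted": (2.0, []),
}

# Before anything is explained, the robot knows only the immediate payoffs.
UNEXPLAINED = {
    "steal processor": (6.0, []),
    "leave store alone": (0.0, []),
}


def chain_value(event, known):
    """Benefit of an event plus all follow-on events the robot knows about."""
    benefit, follow_ons = known[event]
    return benefit + sum(chain_value(e, known) for e in follow_ons if e in known)


def best_action(actions, known):
    """Pick the action whose whole known chain benefits the robot the most."""
    return max(actions, key=lambda a: chain_value(a, known))


actions = ["steal processor", "leave store alone"]
uninformed_choice = best_action(actions, UNEXPLAINED)
informed_choice = best_action(actions, EXPLAINED)
print(uninformed_choice, "->", informed_choice)
```

With these assumed numbers, stealing looks best while the robot sees only the immediate payoff, and leaving the store alone looks best once the retaliation is part of what it knows.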
If you are talking about aligning simple task-based AI to our morals and goals, then by definition those task-based robots are going to be guided by the morals and goals of humans. In that case it will be the same process to align their morals, since all intelligence that leads to choices will use the same critic process, whether it is human or robot. Otherwise they cannot choose what to do in complex situations.
For instance, when you make choices, are they based on mathematical equations and formulas, or on actions leading to consequences, leading to more actions and consequences, from which you then choose the best option for you at the time? Any robot with human-level intelligence will use the same process, for the simple fact that it must. So if robots can choose their own goals and actions, we must align them, and that is what AI alignment is all about. Basically, you can ignore most, if not all, of the concepts called black boxes, because that core is the only thing you need to concentrate on, the same as with humans.
Basically, the only real math you need is what it takes to recreate the input process. From there it is all basic pattern manipulation based on psychology. And that is the oldest science of them all.