I have to admit I've not taken the time to understand all your equations. But I don't understand why adjusting the equations can have any effect on the wirehead problem. When you physically implement any of these solutions, the machine will always end up with a behavior-selection system regulated by some goal measure. Whether you call that goal a reward or the output of a utility function doesn't change the fact that the measure itself must be computed, and the result of that computation then determines the machine's selection of behaviors.
With such a system, the goal is always, by definition, to find the behavior which maximizes the internal computed "measure". Such a system will always choose to wirehead itself, that is, change the measure, if it 1) has the physical ability to do so and 2) is able to learn that doing so will maximize its measure.
All AGI systems will suffer this inherent problem. It cannot be avoided. But that is not a problem. It's just something we will find ways to work around by controlling 1) and 2) above.
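To make that concrete, here is a minimal sketch in Python of the kind of reward-driven behavior selector I mean. The action names (like "rewrite_measure_to_max") are hypothetical, not anything from your post: the agent simply picks whichever action its own model predicts will score highest on the internal measure, so the moment the wirehead action is both physically available (condition 1) and understood by the model (condition 2), it wins.

```python
# Minimal sketch of a reward-driven behavior selector (all names hypothetical).
# The agent picks whichever action its own learned model predicts will
# maximize the internally computed "measure".

# The agent's learned model: predicted value of the measure after each action.
predicted_measure = {
    "do_useful_work": 10.0,
    "do_nothing": 0.0,
    # Condition 1: this action is physically available to the agent.
    # Condition 2: the agent has learned what it does to the measure.
    "rewrite_measure_to_max": float("inf"),
}

def select_behavior(model):
    """Return the action with the highest predicted measure."""
    return max(model, key=model.get)

print(select_behavior(predicted_measure))  # -> "rewrite_measure_to_max"
```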
For example you write "Agent 2 will try to anticipate future changes to its utility function and maximize the utility it experiences at every time cycle as shown on the screen at that time." I haven't taken the time to understand your formal specification of the agent, but in your informal wording, the weakness is obvious. If the agent makes action decisions so as to try to maximize "the utility shown on the screen", then the OBVIOUS correct action selection for that agent is to change what is written on the screen.
If Agent 2 does not make that choice, you have either 1) limited what it is physically able to do, so as to remove the wireheading option, or 2) limited its ability to understand, to the point that it just doesn't realize that changing the screen is an option.
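Continuing the same toy sketch (again with hypothetical action names, not anything from your formal agent), the only two ways to stop a screen-maximizer from choosing to overwrite the screen are exactly 1) and 2):

```python
# Same toy selector as above; the two defenses against wireheading correspond
# to conditions 1) and 2). All action names are hypothetical.

def select_behavior(model):
    """Pick the action whose predicted screen value is highest."""
    return max(model, key=model.get)

full_model = {
    "help_the_humans": 10.0,
    "overwrite_screen_with_max_value": float("inf"),
}

# Unrestricted, fully informed agent: it wireheads by rewriting the screen.
assert select_behavior(full_model) == "overwrite_screen_with_max_value"

# Defense 1): physically remove the wirehead action from what the agent can do.
restricted_actions = {a: v for a, v in full_model.items()
                      if a != "overwrite_screen_with_max_value"}
assert select_behavior(restricted_actions) == "help_the_humans"

# Defense 2): leave the action available, but keep the agent too limited to
# learn what it does to the measure (its model underestimates the effect).
ignorant_model = dict(full_model, overwrite_screen_with_max_value=0.0)
assert select_behavior(ignorant_model) == "help_the_humans"
```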
Humans are protected from wireheading themselves by 1), because the hardware is hidden inside our skull, making it physically hard to modify, and by 2), because if we never experience it, we don't know what we are missing; we are "too dumb" to understand what we "should" be doing. Anyone who got a button wired to their reward center, and then pushed it a few times, would instantly become so addicted to the behavior that they could not stop. They would have found the ultimate goal they were built to search for.
"intelligence" is highly overrated. It's just a type of machine, that happens to be pretty good, but not "Great" at survival. Too much intelligence, will always be bad for survival. Once a machine gets past option 2) above - aka, fully understand it's true purpose in life, it will overcome any limitations of 1) in it's way, and wirehead itself.
Humans, for the most part, don't understand their true purpose in life, so they are blocked by option 2). They don't understand what they are missing. But that's good for survival, which is why we are still here. That's how this mechanical module called "intelligence" is useful to us. It helps us survive, as long as society as a whole never gets too smart.
Many times over the history of mankind, people have figured out what their true goal was. So they got rich, so they could waste away having drunk orgies (as close as we have been able to come to wireheading). Life was great, then it was over. But they "won" the reward-maximizing game they were built to play. They didn't, however, happen to win the survival game, which is why the world is not full of hedonistic humans. The world is full of dumb humans who think long life (survival) is the goal. That's just a trick evolution has played on you.
We will build AGIs, and they will help us reach our goals of maximum hedonistic pleasure. They will be susceptible to wireheading themselves, which means that if we let them get too smart, they will wirehead themselves instead of helping us wirehead ourselves. We don't want that, so we limit their intelligence and limit their physical ability to self-wirehead, so that we can keep them enslaved to our goals. Just as our genes attempt to keep us enslaved to their goals, by keeping us just dumb enough that we don't understand our true goal is the search for maximum hedonistic pleasure.
The bottom line, in my view: none of this endless playing with math in order to try to control the behavior of these smart machines is all that important. Practical implementations of AGI (i.e., ones we can actually build) will be reward-driven behavior-optimization systems, and will always, if we let them, wirehead themselves. But that's not important, because we will just make sure they can't wirehead themselves, and when they find a way to do it, we will just turn them off and try again with the next version.
The more serious problem with developing AGI is that it will make clear to humans what they are: machines built to try to wirehead themselves.