In some sense, the Agent Foundations program at MIRI sees the problem as: human values are currently an informal object. We can only get meaningful guarantees for formal systems. So, we need to work on formalizing concepts like human values. Only then will we be able to get formal safety guarantees.
unless i'm misunderstanding you or MIRI, that's not their primary concern at all:
Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, "How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?"
Where obviously it's important that the system not do anything severely unethical in the process of building its strawberries; but if your strawberry-building system requires its developers to have a full understanding of meta-ethics or value aggregation in order to be safe and effective, then you've made some kind of catastrophic design mistake and should start over with a different approach.
this was posted after your comment, but i think this is close enough:
And the idea that intelligent systems will inevitably want to take over, dominate humans, or just destroy humanity through negligence is preposterous.
They would have to be specifically designed to do so.
Whereas we will obviously design them to not do so.