William D'Alessandro (william-d-alessandro)
Is Deontological AI Safe? [Feedback Draft]
Lots of good stuff here, thanks. I think most of this is right.
- Agreed about powerful AI being prone to unpredictable rules-lawyering behavior. I touch on this a little in the post, but I think it's really important that it's not just the statements of the rules that determine how a deontological agent acts, but also how the relevant (moral and non-moral) concepts are operationalized, how different shapes and sizes of rule violation are weighted against each other, how risk and probability are taken into account, and so on. With all those parameters in play, we should have a high prior on getting weird and unforeseen behavior.
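To make the point concrete, here's a toy sketch (not from the post; all rule names, weights, and actions are hypothetical) of a minimal rule-following chooser. The same two rules yield different behavior depending on how violations are weighted against each other and how probability of violation is handled:

```python
# Each action maps rule -> probability that taking it violates the rule.
ACTIONS = {
    "act_a": {"no_deception": 0.0, "no_harm": 0.3},
    "act_b": {"no_deception": 0.2, "no_harm": 0.0},
}

def choose(weights, risk_threshold=None):
    """Pick the action with the lowest expected weighted violation.

    If risk_threshold is set, any action with a per-rule violation
    probability above it is ruled out first -- a cruder, more
    'absolutist' way of handling risk.
    """
    candidates = {}
    for name, probs in ACTIONS.items():
        if risk_threshold is not None and any(p > risk_threshold for p in probs.values()):
            continue  # action categorically excluded under this risk policy
        candidates[name] = sum(weights[r] * p for r, p in probs.items())
    return min(candidates, key=candidates.get) if candidates else None

# Same rules, different parameterizations, different behavior:
print(choose({"no_deception": 1.0, "no_harm": 1.0}))        # act_b (0.2 < 0.3)
print(choose({"no_deception": 2.0, "no_harm": 1.0}))        # act_a (0.3 < 0.4)
print(choose({"no_deception": 1.0, "no_harm": 1.0}, 0.25))  # act_b (act_a excluded)
```

Even in this two-rule, two-action toy, the chosen action flips with the weighting scheme and the risk policy, which is the sense in which the stated rules alone underdetermine the agent's behavior.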
- Also agreed that you can mitigate many of these risks if you've got a weak deontological agent with only a few behavior-guiding parameters and a limited palette of available actions.
- My impression of the AI value alignment literature is that it's actually quite diverse. There are some people looking at deontological approaches using top-down rules, and some who take moral uncertainty or pluralism seriously and think we should at least include deontology in our collection of potential moral alignment targets. (Some of @Dan H's work falls into that second category, e.g. this paper and this one.) In general, I think the default to utilitarianism probably isn't as automatic among AI safety and ethics researchers as it is in LW/EA circles.