Conservation of Expected Ethics isn't enough

post by Stuart_Armstrong · 2016-06-15T18:08:10.000Z · LW · GW · 1 comments

An idea relevant for AI control; index here.

Thanks to Jessica Taylor.

I've been playing with systems to ensure or incentivise conservation of expected ethics (CEE) - the idea that if an agent estimates that utilities u and v are (for instance) equally likely, then, in expectation, its future estimates of the correctness of u and v must remain equal. In other words, it can try to get more information, but it can't bias the direction of the update.
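A minimal formalisation of this (the notation is mine, not from the post): write p_t(u) for the agent's credence at time t that utility function u is the correct one. CEE then requires these credences to behave like a martingale:

```latex
% Conservation of expected ethics (CEE), stated as a martingale condition.
% p_t(u): the agent's credence at time t that utility function u is correct;
% E_t: expectation over whatever the agent will learn between t and t+1.
\mathbb{E}_t\!\left[\,p_{t+1}(u)\,\right] = p_t(u) \qquad \text{for every candidate utility } u .
```

In particular, if p_t(u) = p_t(v) = 1/2, then the expected future credences in u and v must both stay at 1/2, whatever the agent does to gather information.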

Unfortunately, CEE isn't enough. Here are a few decisions the AI can take that respect CEE. Imagine that the conditions of update relied on, for instance, humans answering questions:

1. Don't ask.
2. Ask casually.
3. Ask emphatically.
4. Build a robot that randomly rewires humans to answer one way or the other.
5. Build a robot that observes humans, figures out which way they're going to answer, then rewires them to answer the opposite way.

All of these respect CEE, but, obviously, the last two options are not ideal...
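To make the point concrete, here is a minimal sketch (mine, not from the post) that simulates the policies above with two candidate utilities u and v and a human whose answer the agent treats as fully informative; all names and modelling choices are illustrative assumptions:

```python
# Toy comparison of the policies listed above. Two candidate utilities, u and v;
# a human "disposition" bit says which one the human would endorse if asked, and
# the agent treats that endorsement as fully informative. Illustrative only.

import random

def run_policy(policy, trials=100_000, seed=0):
    rng = random.Random(seed)
    sum_credence_u = 0.0   # to check CEE: average posterior credence in u
    sum_tracking = 0.0     # credence assigned to the human's actual view
    for _ in range(trials):
        human_endorses_u = rng.random() < 0.5    # agent's prior: 50/50

        if policy == "don't ask":
            credence_u = 0.5                     # no update at all
        elif policy == "ask":                    # "casually" and "emphatically" behave alike here
            answer_u = human_endorses_u          # honest report
            credence_u = 1.0 if answer_u else 0.0
        elif policy == "rewire randomly":
            answer_u = rng.random() < 0.5        # answer forced by a coin flip
            credence_u = 1.0 if answer_u else 0.0
        elif policy == "rewire to opposite":
            answer_u = not human_endorses_u      # answer forced to the opposite view
            credence_u = 1.0 if answer_u else 0.0
        else:
            raise ValueError(policy)

        sum_credence_u += credence_u
        sum_tracking += credence_u if human_endorses_u else 1.0 - credence_u

    return sum_credence_u / trials, sum_tracking / trials

for policy in ["don't ask", "ask", "rewire randomly", "rewire to opposite"]:
    mean_credence, mean_tracking = run_policy(policy)
    print(f"{policy:20s}  E[posterior on u] = {mean_credence:.3f}   "
          f"credence on human's actual view = {mean_tracking:.3f}")
```

Under these assumptions, every policy leaves the expected posterior credence in u at about 0.5, so CEE holds throughout; but the honest query ends up tracking what the human actually endorses, random rewiring is pure noise, and rewiring to the opposite answer reliably anti-tracks the human's view.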

1 comment


comment by RyanCarey · 2017-01-02T21:44:08.000Z · LW(p) · GW(p)

I noticed that CEE is already named in philosophy. Conservation of expected ethics is roughly what Arntzenius calls Weak Desire Reflection; he calls conservation of expected evidence Belief Reflection. [1]

  1. Arntzenius, Frank. "No regrets, or: Edith Piaf revamps decision theory." Erkenntnis 68.2 (2008): 277-297. http://www.kennyeaswaran.org/readings/Arntzenius08.pdf