I'm not suggesting that RL is the only, or even the best, way to develop AGI. But this is the approach advocated by Silver et al., and given their standing in the research community and the resources available to them at DeepMind, it seems likely that they, and others, will try to develop AGI in this way.
Therefore I think a multiobjective approach is essential if this AGI is to have any chance of actually being aligned with our best interests. If conventional RL based on a scalar reward is used, then:
(a) it is very difficult to specify a suitable scalar reward which accounts for all of the many factors required for alignment (so reward misspecification becomes more likely),
(b) it is very difficult, or perhaps impossible, for the RL agent to learn the policy which represents the optimal trade-off between those factors, and
(c) the agent will be unable to learn about rewards other than those currently provided, meaning it will lack flexibility in adapting to changes in values (our own or society's). A rough sketch of the vector-valued alternative follows this list.
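To make points (b) and (c) concrete, here is a minimal sketch (my own illustration, not anything from Silver et al.) of a tabular multiobjective Q-learning agent. The state/action/objective sizes, learning parameters, and the linear utility are all assumptions chosen for brevity; the point is simply that the per-objective value estimates are kept separate from the trade-off, so the trade-off can be revised without relearning the rewards.

```python
# Sketch: vector-valued Q-learning with the objective trade-off factored out.
# Sizes and hyperparameters below are purely illustrative.
import numpy as np

N_STATES, N_ACTIONS, N_OBJECTIVES = 10, 4, 3   # hypothetical problem sizes
ALPHA, GAMMA = 0.1, 0.95

# One value estimate per objective, stored as a single vector-valued Q-table.
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJECTIVES))

def utility(q_vector, weights):
    """Scalarise a vector of per-objective values. A linear utility is the
    simplest choice; the MOMEU framework permits other utility functions."""
    return np.dot(q_vector, weights)

def select_action(state, weights):
    # Greedy action under the *current* trade-off between objectives.
    return int(np.argmax([utility(Q[state, a], weights) for a in range(N_ACTIONS)]))

def update(state, action, reward_vec, next_state, weights):
    # Standard Q-learning update, applied component-wise to the reward vector.
    best_next = select_action(next_state, weights)
    td_target = reward_vec + GAMMA * Q[next_state, best_next]
    Q[state, action] += ALPHA * (td_target - Q[state, action])

# If our values change, we change the weights, not the learned Q-values:
initial_weights = np.array([0.6, 0.3, 0.1])
revised_weights = np.array([0.2, 0.4, 0.4])   # no relearning of rewards needed
```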
The multiobjective maximum expected utility (MOMEU) model is a general framework, and can be used in conjunction with other approaches to aligning AGI. For example, if we encode an ethical system as a rule-base, then the output of those rules can be used to derive one of the elements of the vector utility provided to the multiobjective agent. We also aren't constrained to a single set of ethics: we could implement many different frameworks, treat each as a separate objective, and then, when the frameworks disagree, the agent would aim to find the best compromise between them.
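As a toy illustration of that idea (again my own sketch, with hypothetical rule-bases, scores, and compromise function), each ethical framework scores candidate actions, each score becomes one element of the vector utility, and the agent picks the action that best reconciles them when they disagree:

```python
# Sketch: multiple ethical frameworks as separate objectives in a vector
# utility, with a simple compromise rule applied when they disagree.
# The frameworks, actions, scores and compromise function are illustrative only.
import numpy as np

def consequentialist_score(action):
    # Hypothetical rule-base output in [0, 1], e.g. expected wellbeing produced.
    return {"tell_truth": 0.6, "white_lie": 0.8}[action]

def deontological_score(action):
    # Hypothetical rule-base output in [0, 1], e.g. compliance with duties.
    return {"tell_truth": 1.0, "white_lie": 0.2}[action]

FRAMEWORKS = [consequentialist_score, deontological_score]

def vector_utility(action):
    # One element per ethical framework.
    return np.array([f(action) for f in FRAMEWORKS])

def compromise(u):
    # One possible choice of utility over the vector: a worst-case
    # (egalitarian) criterion that favours actions no framework strongly objects to.
    return u.min()

def choose_action(candidates):
    return max(candidates, key=lambda a: compromise(vector_utility(a)))

print(choose_action(["tell_truth", "white_lie"]))  # -> "tell_truth"
```

A linear weighting, a lexicographic ordering, or any other utility over the vector could be substituted for `compromise` without changing how the per-framework objectives are produced.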
While I didn't touch on it in this post, other desirable aspects of beneficial AI (such as fairness) can also be naturally represented and implemented within a multiobjective framework.