What is the relationship between Preference Learning and Value Learning?

post by Riccardo Volpato (riccardo-volpato) · 2020-01-13T21:08:40.334Z

This is a question post.


It appears that in the last few years the AI Alignment community has devoted considerable attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me.

Concurrently, over the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz have produced a significant body of work on preference learning [2] and preference-based reinforcement learning [3].
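For readers more familiar with one of the two literatures, here is a minimal sketch of the simplest preference learning setting: fitting a linear utility function to pairwise comparisons with a Bradley-Terry style model. This is my own illustration (the toy data, feature vectors, and function names are made up for the example), not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_preferences(pairs, n_features, lr=0.1, steps=1000):
    """Fit linear utility weights w from pairwise preferences.

    pairs: list of (x_preferred, x_other) feature-vector tuples.
    Uses gradient ascent on the Bradley-Terry log-likelihood,
    where P(x_preferred over x_other) = sigmoid(w @ (x_preferred - x_other)).
    """
    w = np.zeros(n_features)
    for _ in range(steps):
        grad = np.zeros(n_features)
        for x_win, x_lose in pairs:
            diff = x_win - x_lose
            p_win = 1.0 / (1.0 + np.exp(-w @ diff))  # predicted preference probability
            grad += (1.0 - p_win) * diff             # gradient of log-likelihood w.r.t. w
        w += lr * grad / len(pairs)
    return w

# Toy example: a hidden "true" utility generates the observed comparisons.
true_w = np.array([2.0, 1.0])
items = rng.normal(size=(20, 2))
pairs = []
for _ in range(100):
    i, j = rng.choice(len(items), size=2, replace=False)
    if items[i] @ true_w >= items[j] @ true_w:
        pairs.append((items[i], items[j]))
    else:
        pairs.append((items[j], items[i]))

w_hat = fit_preferences(pairs, n_features=2)
print("recovered utility direction:", w_hat / np.linalg.norm(w_hat))
```

Much of the preference learning and preference-based RL literature builds on variations of this idea: the learner never observes utilities directly, only comparisons, and infers a latent utility or policy ranking from them.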

While I am not highly familiar with the Value Learning literature, the two fields seem to me closely related, if not overlapping. Yet I have rarely seen the Value Learning literature reference the Preference Learning work, or vice versa.

Is this because the two fields are less related than I think? And more specifically, how do the two fields relate to each other?


References

[1] - Soares, Nate. "The value learning problem." Machine Intelligence Research Institute, Berkeley (2015).

[2] - Fürnkranz, Johannes, and Eyke Hüllermeier. Preference learning. Springer US, 2010.

[3] - Fürnkranz, Johannes, et al. "Preference-based reinforcement learning: a formal framework and a policy iteration algorithm." Machine learning 89.1-2 (2012): 123-156.

Answers

answer by Gordon Seidoh Worley (G Gordon Worley III) · 2020-01-14T20:23:20.655Z

The short answer is that yes, they are related and basically about the same thing. However, the approaches individual researchers take vary a lot.

Relevant considerations that come to mind:

  • The extent to which values/preferences are legible
  • The extent to which they are discoverable
  • The extent to which they are hidden variables
  • The extent to which they are normative
  • How important immediate implementability is
  • How important extreme optimization is
  • How important safety concerns are

The result is that I think there is something of a divide between safety-focused researchers and capabilities-focused researchers in this area, driven by different assumptions, which makes each cluster's work not very interesting or relevant to the other.

comment by Riccardo Volpato (riccardo-volpato) · 2020-01-15T09:20:20.548Z

Interesting points. The distinctions you mention could equally apply in distinguishing narrow from ambitious value learning. In fact, I think preference learning is pretty much the same as narrow value learning. Could it be, then, that ambitious value learning researchers are not very interested in preference learning for much the same reasons they are not very interested in narrow value learning?

"How important safety concerns" is certainly right, but the story of science teaches us that taking something from a domain with different concerns to another domain has often proven extremely useful.
