An algorithm with preferences: from zero to one variable

post by Stuart_Armstrong · 2016-11-15T15:14:01.313Z · 2 comments

A simple way of thinking that I feel clarifies a lot of issues (related to Blue-Minimizing Robot):

Suppose you have an entity H that follows the algorithm alH. Then define:

- H is, at least in part, the algorithm alH.
- H does whatever alH outputs.
- H wants u if alH can be usefully interpreted as maximising the utility function u.

The interpretation part of wants is crucial, but it is often obscured in practice in value learning. That's because we often start with assumptions like "H is a boundedly rational agent that maximises u...", or we set the agent up in such a way that this is clearly the case.

What we're doing there is writing the entity as alH(u): an algorithm with a special variable u that tracks what the entity wants. In the case of cooperative inverse reinforcement learning, this is explicit, as the human's values are given by a parameter θ known to the human. Thus the human's true algorithm is the one-variable alH(.); the human observes θ, which is an objective fact about the universe, and then follows alH(θ).

Note here that knowing what the human is in the one-variable sense (i.e. knowing alH(.)) allows a correct deduction about what they want, while simply knowing the composite zero-variable algorithm alH(θ) does not.
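
To make this concrete, here is a toy sketch in Python; the function names, the two-state/two-action setup and the payoff tables are all invented for illustration rather than taken from CIRL. The one-variable algorithm takes θ as an explicit argument, fixing θ gives the zero-variable algorithm the human actually runs, and an observer who knows the one-variable form can recover θ from behaviour. An observer handed only the input-output behaviour of alH(θ), with no one-variable template, has no θ to solve for.

```python
# A toy sketch (hypothetical names and payoffs, not CIRL itself):
# al_H_one_var is the one-variable algorithm al_H(.), taking the value
# parameter theta as an explicit argument; fixing theta gives the
# zero-variable algorithm al_H(theta) that the human actually runs.

ACTIONS = ("left", "right")

def al_H_one_var(theta, state):
    # act to maximise the payoff theta assigns to each action in this state
    return max(ACTIONS, key=lambda a: theta[a][state])

# The human observes theta (here just written down as a payoff table)...
true_theta = {"left": {0: 1.0, 1: 0.0}, "right": {0: 0.0, 1: 1.0}}

def al_H(state):
    # ...and then follows the zero-variable algorithm al_H(theta)
    return al_H_one_var(true_theta, state)

# An observer who knows the one-variable form al_H(.) can recover theta
# from behaviour, by keeping only the candidates that reproduce every action.
observed = {s: al_H(s) for s in (0, 1)}
candidates = [true_theta,
              {"left": {0: 0.0, 1: 1.0}, "right": {0: 1.0, 1: 0.0}}]
recovered = [t for t in candidates
             if all(al_H_one_var(t, s) == observed[s] for s in observed)]
print(recovered == [true_theta])  # True: only the theta the human saw fits
```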

In contrast, an interpretation starts with a zero-variable algorithm and attempts to construct a one-variable one. Therefore, given alH, it constructs (one or more) alHi(.) and ui such that

alHi(ui) = alH

(i.e. the one-variable algorithm, with ui plugged in, reproduces alH).

This illustrates the crucial role of interpretation, especially if alH is highly complex.
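
As a toy illustration of this construction (again with invented names, utilities and a deliberately tiny state space), we can treat alH as a black box and search over candidate one-variable templates and candidate utilities for pairs that reproduce its behaviour on every state:

```python
# A toy sketch of interpretation (all names and utilities invented for
# illustration): al_H is the zero-variable algorithm, treated as a black box.
# We search over candidate one-variable templates al_H^i(.) and candidate
# utilities u_i for pairs that reproduce al_H's behaviour on every state,
# i.e. pairs with al_H^i(u_i) = al_H.

STATES = (0, 1)
ACTIONS = ("left", "right")

def al_H(state):
    # the zero-variable algorithm we are trying to interpret
    return "left" if state == 0 else "right"

def maximiser(u):
    # candidate template: a one-variable algorithm that picks the u-best action
    return lambda state: max(ACTIONS, key=lambda a: u(a, state))

def minimiser(u):
    # candidate template: a one-variable algorithm that picks the u-worst action
    return lambda state: min(ACTIONS, key=lambda a: u(a, state))

candidate_utilities = {
    "matching the state": lambda a, s: 1.0 if (a == "left") == (s == 0) else 0.0,
    "mismatching the state": lambda a, s: 0.0 if (a == "left") == (s == 0) else 1.0,
    "always going right": lambda a, s: 1.0 if a == "right" else 0.0,
}

for template_name, template in (("maximiser", maximiser), ("minimiser", minimiser)):
    for u_name, u in candidate_utilities.items():
        if all(template(u)(s) == al_H(s) for s in STATES):
            print(f"al_H can be read as a {template_name} of '{u_name}'")
```

Even in this tiny example, two different (alHi, ui) pairs reproduce alH exactly, which is the sense in which the construction can yield "one or more" interpretations; choosing between them is where the interpretive work lies.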

 

2 comments


comment by entirelyuseless · 2016-11-15T15:56:04.307Z

If H is an entity, it is not an algorithm. The algorithm is just one aspect of the thing. You can also see that in terms of what it does; any entity will affect its environment in many ways that have nothing to do with any algorithm which it has.

comment by Stuart_Armstrong · 2016-11-15T16:56:32.816Z

Yeah. I'm not too fussed about the definitions of doing and being (you can restrict down to the entity itself if you want). It's the "wanting" that I'm focusing on here.