AvE: Assistance via Empowerment

post by FactorialCode · 2020-06-30T22:07:50.220Z · LW · GW · 1 comment

This is a link post for https://arxiv.org/abs/2006.14796

This might be relevant to the AI safety crowd. Key quote:

"Our key insight is that agents can assist humans without inferring their goals or limiting their autonomy by instead increasing the human’s controllability of their environment – in other words, their ability to affect the environment through actions. We capture this via empowerment, an information-theoretic quantity that is a measure of the controllability of a state through calculating the logarithm of the number of possible distinguishable future states that are reachable from the initial state [41]. In our method, Assistance via Empowerment (AvE), we formalize the learning of assistive agents as an augmentation of reinforcement learning with a measure of human empowerment. The intuition behind our method is that by prioritizing agent actions that increase the human’s empowerment, we are enabling the human to more easily reach whichever goal they want. Thus, we are assisting the human without information about their goal[...]Without any information or prior assumptions about the human’s goals or intentions, our agents can still learn to assist humans."[Emphasis and omissions are mine]

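For concreteness, the empowerment quantity in the quote is usually defined as the channel capacity between a sequence of (here, the human's) actions and the resulting future state; the log-count description above is the deterministic-dynamics special case. Below is a minimal sketch of that definition and of the reward augmentation the quote describes; the additive form and the weight α are my schematic reading, not necessarily the paper's exact objective.

```latex
% Empowerment of a state s: channel capacity from an n-step action sequence
% to the resulting state (standard information-theoretic definition).
\[
\mathcal{E}_n(s) = \max_{p(a_{1:n})} I\!\left(A_{1:n};\, S_{t+n} \mid S_t = s\right)
\]

% Under deterministic dynamics this reduces to the log-count form in the quote:
\[
\mathcal{E}_n(s) = \log \bigl|\{\, s' : s' \text{ reachable from } s \text{ in } n \text{ steps} \,\}\bigr|
\]

% AvE-style reward shaping, schematically (weighting and estimator are assumptions):
\[
r_{\text{total}}(s, a) = r_{\text{task}}(s, a) + \alpha \,\hat{\mathcal{E}}(s_{\text{human}})
\]
```
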
From the abstract: One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding its use in real time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training.
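
The abstract does not spell out the empowerment-inspired proxy. Purely for illustration (this is my sketch, not the paper's metric), a cheap stand-in for empowerment in a continuous domain could roll out a few random action sequences from a state and score how dispersed the resulting final states are; the environment interface used here (`reset_to`, `step`, `action_low`/`action_high`) is an assumption.

```python
import numpy as np

def empowerment_proxy(env, state, horizon=10, n_rollouts=32, seed=0):
    """Crude Monte Carlo stand-in for empowerment: sample random action
    sequences from `state` and score the spread of the final states reached.
    Illustrative only -- not the proxy metric proposed in the AvE paper."""
    rng = np.random.default_rng(seed)
    finals = []
    for _ in range(n_rollouts):
        env.reset_to(state)  # assumed helper: restore the simulator to `state`
        s = np.asarray(state, dtype=float)
        for _ in range(horizon):
            a = rng.uniform(env.action_low, env.action_high)  # random action
            s = env.step(a)  # assumed to return the next state
        finals.append(np.asarray(s, dtype=float))
    finals = np.stack(finals)
    # Greater dispersion among reachable final states ~ a more controllable state.
    return float(np.log(1e-8 + finals.var(axis=0).sum()))
```

An assistive agent could then be rewarded for keeping such a score high for the human's state, on top of whatever task reward it receives.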

How does this fit in with other control problem approaches [LW · GW]? What is the relationship between this and Turner's power formalism [LW · GW]?
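
For comparison with the empowerment definition above: Turner's power formalism does not compute a channel capacity; roughly, it averages optimal attainable value over a distribution of reward functions. A rough statement (normalization conventions vary across his write-ups):

```latex
% Turner-style POWER: expected optimal value over a distribution D of reward
% functions -- "average goal achievement ability" rather than channel capacity.
\[
\mathrm{POWER}_{\mathcal{D}}(s) \;\propto\; \mathbb{E}_{R \sim \mathcal{D}}\!\left[\, V^{*}_{R}(s) \,\right]
\]
```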

They also carried out a survey that doesn't appear to have made it into the paper, but it shows up on the project web page: https://sites.google.com/berkeley.edu/ave/home

1 comment


comment by TurnTrout · 2020-07-01T03:24:56.612Z · LW(p) · GW(p)

I'm a big fan of this, conceptually (will read the paper tomorrow morning). Attainable utility preservation is secretly trying to preserve human power. [LW · GW] As a nitpick, though, they should probably approximate "average goal achievement ability" instead of empowerment (for formal reasons outlined in Appendix A of Optimal Farsighted Agents Tend to Seek Power). [LW(p) · GW(p)]

As I've written previously [LW(p) · GW(p)], if we could build competitive agents which reliably increased human control-over-the-future, I think that would be pretty damn good. Don't worry about CEV for now - let's just get into a stable future. 

But, getting accurate models of humans seems difficult, and human power is best measured with respect to the policies which our cognitive algorithms can actually discover (I recently gave a curated talk on this - transcript coming soon). Assuming optimality could create weird incentives, but maybe the paper has something to say about that.

All in all, I don't feel optimistic about AvE-like approaches actually scaling to superhuman, if they need to explicitly pick out a human from the environment.