TASP Ep 3 - Optimal Policies Tend to Seek Power

post by Quinn (quinn-dougherty) · 2021-03-11T01:44:02.814Z · LW · GW · 0 comments

This is a link post for https://technical-ai-safety.libsyn.com/3-optimal-policies-tend-to-seek-power

Welcome to the Technical AI Safety Podcast, the show where I interview computer scientists about their papers. This month I covered Optimal Policies Tend to Seek Power, which is closely related to Seeking Power is Often Robustly Instrumental in MDPs [LW · GW], a post in the Reframing Impact [? · GW] sequence that was recently included in the 2019 review [LW · GW].

The point of the show is to make papers more parsable: each interview features a detailed walkthrough, padded on either side by discussion of where the work came from and where it's going.

I had a lot of fun doing this month's episode; the paper was tricky to wrap my head around, but very rewarding. Do let me know if you have trouble finding the show on your favorite podcast app. Thanks!

Show notes:

With Alex Turner

Feedback form

Request an episode

Optimal Policies Tend to Seek Power

by Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Abstract:

Some researchers have speculated that capable reinforcement learning agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: human power-seeking instincts seem idiosyncratic, and these urges need not be present in reinforcement learning agents. We formalize a notion of power within the context of Markov decision processes. With respect to a class of neutral reward function distributions, we provide sufficient conditions for when optimal policies tend to seek power over the environment.
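For readers who want a feel for the formalization before listening: roughly speaking, the paper defines the POWER of a state as the agent's average optimal value there, taken over a distribution of reward functions. The Monte Carlo sketch below estimates this for a tiny toy MDP under IID-uniform state rewards. The formula in the comment is my paraphrase of the paper's definition, and the toy MDP, function names, and parameters are illustrative, not taken from the paper.

```python
# Toy Monte Carlo estimate of POWER at a state, using (my paraphrase of) the definition
#   POWER_D(s, gamma) = (1 - gamma)/gamma * E_{R ~ D}[ V*_R(s, gamma) - R(s) ],
# with D the IID-uniform distribution over state rewards in [0, 1].
# The MDP and all names here are illustrative, not from the paper.
import numpy as np

def optimal_values(P, R, gamma, iters=1000):
    """Value iteration. P[a, s, s'] are transition probabilities, R[s] is the state reward."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[a, s] = R(s) + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R[None, :] + gamma * (P @ V)
        V = Q.max(axis=0)
    return V

def estimate_power(P, state, gamma, n_samples=2000, seed=0):
    """Monte Carlo estimate of POWER at `state` under IID-uniform state rewards."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[1]
    total = 0.0
    for _ in range(n_samples):
        R = rng.uniform(0.0, 1.0, size=n_states)  # sample a reward function from D
        V = optimal_values(P, R, gamma)
        total += V[state] - R[state]              # optimal value of the future, minus current reward
    return (1 - gamma) / gamma * total / n_samples

# A 3-state toy MDP: state 1 is a dead end that only loops on itself,
# while state 2 can either stay put or move to state 1 (more options).
P = np.array([
    #        to: s0   s1   s2
    [[0.0, 1.0, 0.0],   # action 0 from s0 -> s1
     [0.0, 1.0, 0.0],   # from s1, stuck
     [0.0, 1.0, 0.0]],  # action 0 from s2 -> s1
    [[0.0, 0.0, 1.0],   # action 1 from s0 -> s2
     [0.0, 1.0, 0.0],   # from s1, stuck
     [0.0, 0.0, 1.0]],  # action 1 from s2 -> s2
])

gamma = 0.9
print("POWER(s1) ~", estimate_power(P, 1, gamma))  # dead end: lower power
print("POWER(s2) ~", estimate_power(P, 2, gamma))  # keeps options open: higher power
```

In this toy environment the dead-end state comes out with lower estimated POWER than the state that keeps its options open, which is the intuition the paper makes precise and then connects to optimal policies tending to seek high-POWER states.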

What Counts as Defection? [LW · GW]

Non-Obstruction [LW · GW]
