Comments

Comment by Stephen Zhao (stephen-zhao) on Rock, Paper and Scissors: A Game Theory View · 2023-01-22T02:38:11.319Z · LW · GW

The first game is the prisoner's dilemma if you read the payoffs as player A/B, which is a bit different from how it's normally presented.

And yes, prisoner's dilemma is not zero sum.

Comment by Stephen Zhao (stephen-zhao) on Empowerment is (almost) All We Need · 2022-11-12T19:25:36.843Z · LW · GW

Short comment on the last point: euthanasia is legal in several countries (so wanting to die is not always prevented, and is even socially accepted), and in my opinion it is the moral course of action in certain situations.

Comment by Stephen Zhao (stephen-zhao) on Empowerment is (almost) All We Need · 2022-11-12T19:04:58.989Z · LW · GW

Thanks for your response - good points and food for thought there. 

One of my points is that this is a problem which arises depending on your formulation of empowerment, and so you have to be very careful with the way in which you mathematically formulate and implement empowerment. If you use a naive implementation I think it is very likely that you get undesirable behaviour (and that's why I linked the AvE paper as an example of what can happen). 

Also related: it's tricky to define what the "reasonable future time cutoff" should be, and I don't think this is trivial to solve. Use too short a cutoff, and your empowerment is too myopic. Use too long a cutoff, and your model stops you from ever spending your money and always gets you to hoard more. If you use a hard-coded cutoff of x time steps, then you have edge cases around that cutoff. You might need a dynamic time cutoff instead, and I don't think that's trivial to implement.
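To make the cutoff issue concrete, here is a toy sketch of my own (not from the post or any paper, and the world and action set are invented for illustration): with deterministic dynamics, n-step empowerment reduces to the log of the number of distinct states reachable in n steps, so the measured empowerment of the very same state depends directly on the chosen horizon.

```python
from math import log2

def reachable_states(state, step, horizon):
    """Distinct states reachable from `state` via all action
    sequences of length `horizon` (deterministic dynamics)."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a) for s in frontier for a in (0, 1)}
    return frontier

def empowerment(state, step, horizon):
    # With deterministic dynamics, the n-step channel capacity
    # collapses to log2 of the number of distinct reachable states.
    return log2(len(reachable_states(state, step, horizon)))

# Toy 1-D world: action 1 moves +1, action 0 moves -1, clipped to [0, 10].
step = lambda s, a: max(0, min(10, s + (1 if a else -1)))

# The same state scores differently under different horizons, so the
# choice of cutoff materially changes what the agent optimizes for.
print(empowerment(5, step, 1))  # 1.0 (reachable: {4, 6})
print(empowerment(5, step, 3))  # 2.0 (reachable: {2, 4, 6, 8})
```

The point of the sketch is just that there is no horizon-free notion of empowerment to fall back on: any implementation must pick a cutoff, and that pick has behavioural consequences.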

I also disagree with characterizing the issue in the AvE paper as just a hyperparameter issue. Correct me if I am wrong here (as I may have misrepresented/misinterpreted the general gist of ideas and comments on this front), but I believe a key idea around human empowerment is that we can focus on maximally empowering humans, almost as if human empowerment were a "safe" target for optimization in some sense. I disagree with this idea, precisely because examples like the one in AvE show that too much human empowerment can be bad. The critical point I wanted to get across here is that human empowerment is not a safe target for optimization.

Also, the other key point, related to the submarine, protest, and suicide examples, is that empowerment can sometimes conflict with our reward/utility/desires. The suicide example is the clearest illustration of this (and it seems not too far-fetched to imagine someone who wants to die, but can't, and then feels increasingly worse, which seems like quite a nightmare scenario to me). Again, empowerment by itself isn't enough to produce desirable outcomes; you need some tradeoff with human utility/reward/desires. Empowerment is hardly all (or almost all) that you need.

To summarize the points I wanted to get across:

  1. Unless you are very careful with the specifics of your formulation of human empowerment, it very likely will result in bad outcomes. There are lots of implementation details to be considered (even beyond everything you mentioned in your post).
  2. Human empowerment is not a safe target for optimization/maximization. I think this holds even if you have a careful definition of human empowerment (though I would be very happy to be proven wrong on this).
  3. Human empowerment can be in conflict with human utility/desires, best illustrated by the suicide example. Therefore, I think human empowerment could be helpful for alignment, but am very skeptical it is almost all you need. 

Edit: I just realized there are some other comments by other commenters that point out similar lines of reasoning to my third point. I think this is a critical issue with the human empowerment framework and want to highlight it a bit more, specifically highlighting JenniferRM's suicide example which I think is the example that most vividly demonstrates the issue (my scenarios also point to the same issue, but aren't as clear of a demonstration of the problem).

Comment by Stephen Zhao (stephen-zhao) on Empowerment is (almost) All We Need · 2022-11-11T01:29:49.072Z · LW · GW

I think JenniferRM's comment regarding suicide raises a critical issue with human empowerment, one that I thought of before and talked with a few people about but never published. I figure I may as well write out my thoughts here since I'm probably not going to do a human empowerment research project (I almost did; this issue is one reason I didn't). 

The biggest problem I see with human empowerment is that humans do not always want to be maximally empowered at every point in time. The suicide example is a great example, but not the only one. Other examples I came up with include tourists who go on a submarine trip deep in the ocean, and environmentalists who volunteer to be tied to a tree as part of a protest. Fundamentally, the issue is that at some point, we want to be able to commit to a decision and its associated consequences, even if it comes at the cost of our empowerment.

There is even empirical support for this issue with human empowerment. In the paper Assistance Via Empowerment (https://proceedings.neurips.cc/paper/2020/file/30de9ece7cf3790c8c39ccff1a044209-Paper.pdf), the authors use a reinforcement learning agent trained with a mix of the original RL reward and a human empowerment term as a co-pilot on LunarLander, to help human agents land the LunarLander craft without crashing. They find that if the coefficient on the human empowerment term is too high, "the copilot tends to override the pilot and focus only on hovering in the air". This is exactly the problem above; focusing only on empowerment (in a naive empowerment formulation) can easily lead to the AI preventing us from achieving certain goals we may wish to achieve. In the case of LunarLander in the paper, we want to land, but the AI may stop us, because by getting closer to the ground for landing, we've reduced our empowerment. 
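The coefficient tradeoff the AvE authors describe can be sketched in a toy form (my own construction with made-up numbers; the paper's actual setup is a learned copilot policy, not this two-action choice): with a task reward for landing and an empowerment bonus for staying airborne, a large enough weight on the empowerment term flips the preferred action from descending to hovering.

```python
def preferred_action(lam):
    """Toy choice between two actions under r = r_task + lam * empowerment.
    The payoff numbers are illustrative, not taken from the AvE paper."""
    # Descending makes progress toward landing but sheds future options;
    # hovering keeps options open but never completes the task.
    scores = {
        "descend": 1.0 + lam * 0.2,  # task reward 1.0, low empowerment
        "hover":   0.0 + lam * 1.0,  # no task reward, high empowerment
    }
    return max(scores, key=scores.get)

print(preferred_action(0.5))  # "descend" (1.1 vs 0.5)
print(preferred_action(2.0))  # "hover"   (1.4 vs 2.0)
```

With any fixed payoffs of this shape there is some threshold coefficient past which hovering dominates, which is the qualitative failure the paper reports when the empowerment weight is set too high.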

It may be that current formulations of empowerment are too naive, and could possibly be reworked or extended to deal with this issue. E.g. you might try to have a human empowerment mode, and then a human assistance mode that focuses not on empowerment but on inferring the human's goal and trying to assist with it; and then some higher level module detects when a human intends to commit to a course of action. But this seems problematic for many other reasons (including those covered in other discussions about alignment).

Overall, I like the idea of human empowerment, but greatly disagree with the idea that human empowerment (especially using the current simple math formulations I've seen) is all we need.