LDT (and everything else) can be irrational
post by Christopher King (christopher-king) · 2024-11-06T04:05:36.932Z · LW · GW · 6 comments
you should not reject the 'offer' of a field that yields an 'unfair' amount of grain! - Ultimatum Game (Arbital)
In this post, I demonstrate a problem in which there is an agent that outperforms Logical Decision Theory, and show how for any agent you can construct a problem and competing agent that outperforms it. Defining rationality as winning, this means that no agent is rational in every problem.
Symmetrical Ultimatum Game
We consider a slight variation on the ultimatum game to make it completely symmetrical. The symmetrical ultimatum game is a two-player game in which each player says how much money they want. The amount must be a positive integer number of dollars. If the sum is ≤$10, both players get the amount of money they chose. Otherwise, they both get nothing.
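(As a minimal sketch, here is the payoff rule in code; the function name and representation are mine, not part of the game's definition.)

```python
def symmetric_ultimatum(demand_a: int, demand_b: int) -> tuple[int, int]:
    """Payoffs for the symmetrical ultimatum game.

    Each player names a positive whole number of dollars. If the demands sum
    to at most $10, each player gets what they asked for; otherwise both get $0.
    """
    assert demand_a >= 1 and demand_b >= 1
    if demand_a + demand_b <= 10:
        return demand_a, demand_b
    return 0, 0

# For example: symmetric_ultimatum(9, 1) == (9, 1), symmetric_ultimatum(9, 9) == (0, 0).
```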
Now consider the decision problem of playing the symmetrical ultimatum game against a logical decision theorist.
A causal decision theorist does particularly poorly in this problem, since the LDT agent always chooses $9, leaving the causal decision theorist with $1.
How does a LDT agent fare? Well, logical decision theory is still a bit underspecified. However, notice that this question reduces to "how does a LDT agent do against a LDT agent in a symmetrical game?". Without knowing any details about LDT, we must conclude that the expected value is at most $5.
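Spelled out, the symmetry argument is one line: both players' expected payoffs are equal, and the total payout never exceeds $10, so

```latex
\mathbb{E}[A] = \mathbb{E}[B], \qquad \mathbb{E}[A] + \mathbb{E}[B] \le 10 \quad\Longrightarrow\quad \mathbb{E}[A] \le 5.
```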
What about a rock with $9 painted on it? The LDT agent in the problem reasons that the best action is to choose $1, so the rock gets $9.
Thus, $9 rock is more rational than LDT in this problem. □
You can't become $9 rock
Now, what makes this problem particularly difficult is how picky the LDT agent in the problem is. If, based on the previous section, you decide to "become $9 rock", the LDT agent will defect against you. If, based on the previous section, you build a robot that always chooses $9, the LDT agent will defect against that robot. Only a truly natural $9 rock can win.
No agent is rational in every problem
Consider an agent X. There are two cases:
- Against $9 rock, X always chooses $1. Consider the problem "symmetrical ultimatum game against X". By symmetry, X on average can get at most $5. But $9 rock always gets $9. So $9 rock is more rational than X.
- Against $9 rock, X sometimes chooses more than $1 (thus getting nothing). Consider the problem "symmetrical ultimatum game against $9 rock". X on average gets less than $1. But an agent that always picks $1 (that is, a $1 rock) always gets $1. So $1 rock is more rational than X.
□
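As a toy illustration of the two cases (the agent definitions are mine and purely hypothetical, reusing symmetric_ultimatum from the sketch above):

```python
def rock_9(_opponent):
    return 9  # always demands $9

def rock_1(_opponent):
    return 1  # always demands $1

def conceder(opponent):
    """Case 1 stand-in: an agent X that always chooses $1 against $9 rock."""
    return 1 if opponent is rock_9 else 9

def play(agent_a, agent_b):
    return symmetric_ultimatum(agent_a(agent_b), agent_b(agent_a))

print(play(conceder, rock_9))  # (1, 9): $9 rock out-earns any X that concedes to it
print(play(rock_9, rock_9))    # (0, 0): an X that demands more than $1 against $9 rock gets nothing
print(play(rock_1, rock_9))    # (1, 9): ...while $1 rock still collects its dollar
```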
Implications
I still have an intuition that LDT is the "best" decision theory so far. See Integrity for consequentialists [EA · GW] for practical benefits of a LDT style of decision making.
However, there can be no theorem that LDT is always rational, since it isn't. And replacing LDT with a different agent cannot fix the problem. Notice that, as a special case, humans can never be rational.
This seems to suggest some sort of reformulation of rationality is needed. For example, given LDT's reasonableness, one option is to violate the thesis of Newcomb's Problem and Regret of Rationality [LW · GW] and simply define rationality to be LDT.
6 comments
comment by quetzal_rainbow · 2024-11-06T06:16:57.023Z · LW(p) · GW(p)
Isn't this just a no-free-lunch theorem? For every computable decision procedure, you can construct an environment that predicts that procedure's exact output and reacts so as to do maximum damage, making the decision procedure perform worse than random action selection.
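(A minimal sketch of that construction, with all names mine: the environment simulates the deterministic procedure and pays $0 for whatever it would pick and $1 for anything else, so a uniformly random choice earns more in expectation.)

```python
def adversarial_environment(decision_procedure, actions):
    """Predict the procedure's output by running it, then punish exactly that action."""
    predicted = decision_procedure(actions)
    return {a: 0 if a == predicted else 1 for a in actions}

def always_first(actions):  # some fixed computable decision procedure
    return actions[0]

payoffs = adversarial_environment(always_first, ["a", "b", "c"])
print(payoffs[always_first(["a", "b", "c"])])  # 0: the procedure earns nothing
print(sum(payoffs.values()) / len(payoffs))    # 0.66...: a uniformly random pick does better
```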
↑ comment by Christopher King (christopher-king) · 2024-11-06T16:40:01.071Z · LW(p) · GW(p)
Yes, this would be a no-free-lunch theorem for decision theory.
It is different from the "No free lunch in search and optimization" theorem though. I think people had an intuition that LDT will never regret its decision theory, because if there is a better decision theory then LDT will just copy it. You can think of this as LDT acting as though it could self-modify. So the belief (which I am debunking) is that the environment can never punish the LDT agent; it just pretends to be the environment's favorite agent.
The issue with this argument is that in the problem I posed above, the problem itself contains a LDT agent, and that LDT agent can "punish" the first agent for acting like $9 rock, or even pre-committing to become it, or even literally self-modifying to become it. It knows that the first agent didn't have to do that.
So the first LDT agent will literally regret not being hardcoded to "output $9".
This is very robust to what we "allow" agents to do (can they predict each other, how accurately can they predict each other, which counterfactuals are legit, etc.), because no matter what the rules are, you can't get more than $5 in expectation in a mirror match.
comment by Mikhail Samin (mikhail-samin) · 2024-11-07T10:52:45.140Z · LW(p) · GW(p)
Playing ultimatum game against an agent that gives in to $9 from rocks but not from us is not in the fair problem class, as the payoffs depend directly on our algorithm and not just on our choices and policies.
https://arbital.com/p/fair_problem_class/
A simpler game is “if you implement or have ever implemented LDT, you get $0; otherwise, you get $100”.
LDT decision theories are probably the best decision theories for problems in the fair problem class.
(Very cool that you’ve arrived at the idea of this post independently!)
↑ comment by Christopher King (christopher-king) · 2024-11-07T14:40:08.077Z · LW(p) · GW(p)
Regarding "LDT decision theories are probably the best decision theories for problems in the fair problem class": the post demonstrates why this statement is misleading.
If "play the ultimatum game against a LDT agent" is not in the fair problem class, I'd say that LDT shouldn't be in the "fair agent class". It is like saying that in a tortoise-only race, the best racer is a hare because a hare can beat all the tortoises.
So based on the definitions you gave I'd classify "LDT is the best decision theory for problems in the fair problem class" as not even wrong.
In particular, consider a class of allowable problems S, but then also say that an agent X is allowable only if "play a given game with X" is in S. Then the proof in the No agent is rational in every problem section of my post goes through for allowable agents. (Note that the argument in that section is general enough to apply to agents that don't give in to $9 rock.)
Practically speaking: if you're trying to follow decision theory X, then playing against other agents following X is a reasonable problem.
↑ comment by Mikhail Samin (mikhail-samin) · 2024-11-07T15:56:33.803Z · LW(p) · GW(p)
It’s reasonable to consider two agents playing against each other. “Playing against your copy” is a reasonable problem. ($9 rocks get $0 in this problem, LDTs probably get $5.)
Newcomb, Parfit’s hitchhiker, smoking, etc. are all very reasonable problems that essentially depend on the buttons you press when you play the game. It is important to get these problems right.
But playing against LDT is not necessarily in the “fair problem class” because the game might behave differently depending on your algorithm/on how you arrive at taking actions, and not just depending on your actions.
Your version of it, playing against an LDT agent, is indeed different from playing against a game that checks whether we’re an alphabetizing agent that picks X instead of Y because X<Y rather than because we looked at the expected utility: we would want LDT to perform optimally in this game. But the reason an LDT-created rock loses to a natural rock here isn’t fundamentally different from the reason LDT loses to an alphabetizing agent in the other game, and it is known that you can construct a game like that where LDT will lose to something else. You can make the game description sound more natural, but I feel like there’s a sharp divide between the “fair problem class” problems and the others.
(I also think that in real life, where this game might play out, there isn’t really a choice to make our AI a $9 rock instead of an LDT agent; if we did that because of the rock’s better performance in this game, our rock would get slightly less than $5 in EV instead of $9, so LDT doesn’t perform worse than other agents we could’ve chosen for this game.)
comment by Dagon · 2024-11-06T17:04:59.692Z · LW(p) · GW(p)
This is an important theorem. There is no perfect decision theory, especially against equal-or-better opponents. I tend to frame it as "the better predictor wins". Almost all such adversarial/fixed-sum cases are about power, not fairness or static strategy/mechanism.
We (humans, including very smart theorists) REALLY want to frame it as clever ways to get outcomes that fit our intuitions. But it's still all about "who goes first (in the logical/credible-commitment sense)".