comment by Chantiel ·
2021-08-30T19:56:57.788Z · LW(p) · GW(p)
I'm questioning whether we would actually want to use Updateless Decision Theory, Functional Decision Theory, or future decision theories like them.
I think that in sufficiently extreme cases, I would act according to Evidential Decision Theory and not according to something like UDT, FDT, or any similar successor. And I think I would continue to want to take the action EDT recommends even if I had arbitrarily high intelligence and willpower and infinitely long to think about it. And, though I'd like to hear others' thoughts on this, I suspect others would do the same.
I'll provide an example of when this would happen.
Before that, consider regular XOR extortion: You get a message from a trustworthy predictor that says, "I will send you this message if you send me $10, or if your house is about to be ruined by carpenter ants, but not if both happen." UDT and FDT recommend not paying them money. And if I were in that situation, I bet I wouldn't pay, either.
However, imagine changing the XOR extortion to be as follows: the message now says, "I will send you this message if you send me $10, or if you and all your family and friends will be severely tortured until heat death, but not both."
In that situation, I'd pay the $10, assuming the probability of the torture actually happening is significant. But FDT and UDT would, I think, recommend not paying it.
And I don't think it's irrational that I'd pay.
Feel free to correct me, but the main reason people seem to like UDT and FDT is that agents that use them would "on average" perform better than those using other decision theories, in fair circumstances. And sure, the average agent implementing a decision policy that says not to pay would probably get higher utility in expectation than the average agent who would pay, because it spends less money paying off extortion. And by giving in to the extortion, agents that implement approximately the same decision procedure I do would on average get less utility.
And I think the fact that UDT and FDT agents systematically outperform arbitrary EDT agents is something that matters to me. But still, I only care about my actions conforming to the best-performing decision theories to a limited extent. What I really, really care about is not having me, the actual, current me, be sent to a horrible fate filled with eternal agony. I think my dread of this would be enough to make me pay the $10, despite any sort of abstract argument in favor of not paying.
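To make the two ways of scoring the decision concrete, here is a minimal sketch. It assumes a perfectly accurate predictor, a prior probability `p_disaster` for the bad outcome, and utilities measured in dollars; the function names and numbers are illustrative assumptions of mine, not anything taken from the UDT or FDT papers. It compares the a priori value of each policy (the "on average" view) with the value of each action conditional on having received the message (the EDT view).

```python
def policy_value(pays: bool, p_disaster: float, disaster_cost: float,
                 payment: float = 10.0) -> float:
    """A priori expected utility of a policy, before any message is seen.

    Assumes the message is sent iff exactly one of
    (agent pays, disaster occurs) holds, and the predictor is never wrong.
    """
    # The disaster happens with probability p_disaster whatever the policy is.
    expected = -p_disaster * disaster_cost
    if pays:
        # A payer only ever gets the message when there is no disaster,
        # and then hands over the payment.
        expected -= (1.0 - p_disaster) * payment
    return expected


def conditional_value(pays: bool, disaster_cost: float,
                      payment: float = 10.0) -> float:
    """Expected utility conditional on having received the message."""
    # Given the message: paying implies no disaster, refusing implies disaster.
    return -payment if pays else -disaster_cost


if __name__ == "__main__":
    p, cost = 0.01, 1_000_000.0  # hypothetical numbers
    print("a priori, refuse:", policy_value(False, p, cost))
    print("a priori, pay:   ", policy_value(True, p, cost))
    print("given message, refuse:", conditional_value(False, cost))
    print("given message, pay:   ", conditional_value(True, cost))
```

Under these assumptions the refusing policy is ahead a priori by `(1 - p_disaster) * payment` no matter how large `disaster_cost` is, while conditional on the message, paying looks better whenever `disaster_cost` exceeds `payment`. That gap is the disagreement this comment is about.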
So I wouldn't take the action UDT or FDT would recommend, and would just use evidential decision theory. This makes me question whether we should use something like UDT or FDT when actually making AI. Suppose UDT recommended the AI take some action a. And suppose it was foreseeable that, though such a percept-action mapping would perform well in general, it would totally give us the short end of the stick. For example, suppose it said to not give in to some form of extortion, even though if we didn't we would all get tortured until heat death. Would we really want the AI to not pay up, and then get us all tortured?
I've talked previously [LW(p) · GW(p)] about how evidential decision theory can be used to emulate the actions of an arbitrary agent using a more "advanced" decision theory by just defining terminal values on the truth value of mathematical objects representing answers to the question of what would have happened in other hypothetical situations. For example, you could make an Evidential Decision Theory agent act similarly to a UDT agent in non-extreme cases by making its utility function place high value on the answer to a question something like, "if you imagine a formal reasoning system and you have it condition on the statement <insert mathematical description of my decision procedure> results in recommending the percept-action mapping m, then a priori agents in general with my utility function would get expected utility of x".
This way, we can still make decisions that would score reasonably highly according to UDT and FDT, while not being obligated to get ourselves tortured.
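As a rough illustration of that idea, here is a toy sketch under strong assumptions of my own: the "terminal value on the hypothetical" is modeled as a fixed `weight` on the a priori expected utility of the chosen policy, the predictor is perfect, and utilities are in dollars. The function name, the weight, and the numbers are all hypothetical; this is one possible reading of the proposal, not a formalization of it.

```python
def blended_choice(p_disaster: float, disaster_cost: float, weight: float,
                   payment: float = 10.0) -> str:
    """Action of an EDT-style agent that also values a priori policy performance.

    score(action) = utility conditional on the message and the action
                    + weight * a priori expected utility of that policy
    """
    def score(pays: bool) -> float:
        # Conditional on the message: paying implies no disaster,
        # refusing implies the disaster happens.
        direct = -payment if pays else -disaster_cost
        # A priori value of the policy, before any message is seen.
        a_priori = -p_disaster * disaster_cost
        if pays:
            a_priori -= (1.0 - p_disaster) * payment
        return direct + weight * a_priori

    return "pay" if score(True) > score(False) else "refuse"


if __name__ == "__main__":
    w = 1_000_000.0  # hypothetical weight on the a priori term
    # Mundane stakes (a ruined house): the a priori term dominates.
    print(blended_choice(0.01, 1_000_000.0, w))
    # Extreme stakes (stand-in number for torture until heat death):
    # the direct term dominates.
    print(blended_choice(0.01, 1e15, w))
```

With a fixed weight, the agent refuses when the stakes are ordinary and pays once `disaster_cost - payment` exceeds `weight * (1 - p_disaster) * payment`, which is the "UDT-like in non-extreme cases, EDT-like in extreme ones" behavior described above.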
Also, it seems to me that UDT and FDT are all about, basically, in some situations making yourself knowably worse off than you could have been, roughly because agents in general who would take the action in that situation would get higher utility in expectation. I want to say that these sorts of procedures seem concerningly hackable. In principle, other opportunistic civilizations could create agents in any circumstances they like in order to change the best percept-action mapping to use a priori and thus change what AIs on Earth would use.
I provide a method to "hack" UDT here [LW(p) · GW(p)]. Wei Dai agreed that it was a reasonable concern in private conversation.
This is why I'm skeptical about the value of UDT, FDT, and related theories, and think that perhaps we would be best off just sticking with EDT but with terminal values that can be used to approximately emulate the other decision theories when we would like to.
I haven't heard these considerations mentioned before, so I'm interested in links to any previous discussion or comments explaining what you think of them.