Subagent perfect minimax

post by Vanessa Kosoy (vanessa-kosoy) · 2017-01-06T13:47:12.000Z · LW · GW · 0 comments

This post continues the study of minimax forecasting.

The minimax decision rule has the pathology that, when events are sufficiently "optimistic", behavior can become highly suboptimal. This is analogous to off-policy irrational behavior in Nash equilibria of games in extensive form. In order to remedy the problem, we introduce a refinement called "subagent perfect minimax," somewhat analogous to subgame perfect equilibria and other related solution concepts. It is possible to prove existence and, when the model factorizes, dynamic consistency.

The proofs are omitted, but we can easily provide them if necessary.

##Motivation

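Consider the following class of environments, parameterized by a probability \(p \in [0,1]\): during the first step, the agent gains \$1 with probability \(p\) and \$0 otherwise; during the second step, the agent chooses between an action which leads to gaining \$1 and an action which leads to gaining \$0.

The minimax payoff of this class is \$1, attained against the environment with \(p=0\). Let \(\pi^*\) be the policy that always chooses the \$1 action. A policy that chooses the \$1 action if you gained \$0 in the first step, but chooses the \$0 action if you gained \$1, is also a minimax policy, since it still achieves the minimax (\(p=0\)) payoff of \$1, even though it is clearly inferior to \(\pi^*\).

Note that the minimax environment has \(p=0\), and such a policy only differs from \(\pi^*\) on histories which are impossible in that environment. Moreover, if we considered the class restricted to \(p \geq \epsilon\) for some \(\epsilon > 0\), the minimax payoff would be \$\((1+\epsilon)\), and \(\pi^*\) would be the sole minimax policy. Therefore, in order to eliminate the pathology, we need to formulate a stability condition that ensures any admissible history is treated as occurring with at least infinitesimal probability.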
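The pathology can be checked numerically with a minimal sketch. It rests on assumptions that are not spelled out in the text: the first-step gain is taken to be \$1 with probability `p` and \$0 otherwise, the two second-step actions are given the hypothetical labels `"a"` (pays \$1) and `"b"` (pays \$0), and only deterministic policies are enumerated. The function names are illustrative as well.

```python
from itertools import product

# Assumed toy model (a hypothetical reading of the example above):
#   step 1: the agent gains 1 with probability p, otherwise 0
#   step 2: action "a" pays 1, action "b" pays 0 (labels are illustrative)
PAYOFF = {"a": 1.0, "b": 0.0}


def expected_payoff(policy, p):
    """Expected total gain; policy maps the first-step gain (0 or 1) to an action."""
    after_gain_1 = 1.0 + PAYOFF[policy[1]]
    after_gain_0 = PAYOFF[policy[0]]
    return p * after_gain_1 + (1.0 - p) * after_gain_0


def worst_case(policy, p_min, p_max=1.0):
    # The expected payoff is affine in p, so its minimum over
    # [p_min, p_max] is attained at one of the two endpoints.
    return min(expected_payoff(policy, p_min), expected_payoff(policy, p_max))


def minimax_policies(p_min):
    """Return the minimax value and all deterministic policies attaining it.

    Policies are reported as (action after gaining 0, action after gaining 1).
    """
    policies = [{0: x, 1: y} for x, y in product("ab", repeat=2)]
    values = {(pol[0], pol[1]): worst_case(pol, p_min) for pol in policies}
    best = max(values.values())
    winners = sorted(k for k, v in values.items() if abs(v - best) < 1e-12)
    return best, winners


if __name__ == "__main__":
    print(minimax_policies(0.0))  # full class, p ranges over [0, 1]
    print(minimax_policies(0.1))  # restricted class, p ranges over [0.1, 1]
```

Under these assumptions, the full class admits two minimax policies, one of which wastes the second step after a first-step gain of \$1. Restricting to `p >= 0.1` leaves only the policy that always takes the \$1 action, which is the behavior the stability condition is meant to select.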

##Results

Now consider the forecasting setting, with action set , observation set , time discount function and reward function . As before, and define the utility function .

Consider and denote

#Definition 1

Consider finite. is called an -stable minimax policy for when there are sequences and s.t. , and is a minimax policy for

In particular, the definition assumes is non-empty.

#Proposition 1

For any finite, there exists which is an -stable minimax policy for .

#Proposition 2

For any finite and an -stable minimax policy for , is in particular a minimax policy for .

#Proposition 3

Consider finite, an -stable minimax policy for and s.t. . Assume factorizes at into . Denote and the restriction mappings. Define and define by

Then


Note that, thanks to -stability, Proposition 3 doesn't require the condition , as opposed to the case for arbitrary minimax policies (see Proposition 1 here).

#Definition 2

is called a subagent perfect minimax policy for when there is a sequence s.t. each is finite, and , and a sequence s.t. and is an -stable minimax policy for .

#Proposition 4

There exists which is a subagent perfect minimax policy for .

#Proposition 5

For a subagent perfect minimax policy for , is in particular a minimax policy for .

#Proposition 6

Consider a subagent perfect minimax policy for and . Assume factorizes at into . Then
