Subagent perfect minimax


post by Vanessa Kosoy (vanessa-kosoy) · 2017-01-06T13:47:12.000Z · LW · GW · 0 comments


This post continues the study of minimax forecasting.

The minimax decision rule has the pathology that, when events are sufficiently "optimistic", behavior can become highly suboptimal. This is analogous to off-policy irrational behavior in Nash equilibria of games in extensive form. In order to remedy the problem, we introduce a refinement called "subagent perfect minimax," somewhat analogous to subgame perfect equilibria and other related solution concepts. It is possible to prove existence and, when the model factorizes, dynamic consistency.

The proofs are omitted, but we can easily provide them if necessary.

## Motivation

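Consider the following class of environments: during the first step, the agent gains $\$1$ with probability $p$ and $\$0$ with probability $1-p$, where $p \in [0, 1]$ is a parameter of the class. During the second step, the agent chooses between action $a$, which leads to gaining $\$1$, and action $b$, which leads to gaining $\$0$.

The minimax payoff of this class is $\$1$, and it is attained by the policy $\pi^*$ that always chooses $a$, which is optimal in every environment of the class. However, the policy $\pi$ "choose $a$ if you gained $\$0$ in the first step, choose $b$ if you gained $\$1$" has expected payoff $p \cdot (\$1 + \$0) + (1-p) \cdot (\$0 + \$1) = \$1$ for every $p$, which is the same as the worst-case ($p=0$) payoff of $\pi^*$. Hence $\pi$ is also a minimax policy, even though it is strictly worse than $\pi^*$ whenever $p > 0$.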

Note that the minimax environment has $p = 0$, and $\pi$ differs from $\pi^*$ only on histories which are impossible in that environment. Moreover, if we considered the class $p \in [\epsilon, 1]$ for some $\epsilon > 0$, the minimax payoff would be $\$(1+\epsilon)$, and $\pi^*$ would be the sole minimax policy. Therefore, in order to eliminate the pathology, we need to formulate a stability condition that ensures any admissible history is treated as occurring with at least infinitesimal probability.
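The arithmetic of this example can be checked with a small script. The sketch below is a hypothetical illustration, not part of the original analysis: it assumes undiscounted rewards, replaces the continuum of environments and of stochastic policies by finite grids, and uses exact rational arithmetic so that ties are detected exactly. A policy is encoded as the pair $(q_0, q_1)$ of probabilities of choosing $a$ after gaining $\$0$ and after gaining $\$1$, so $\pi^* = (1, 1)$ and $\pi = (1, 0)$; the helper names are invented for this sketch.

```python
from fractions import Fraction as F
from itertools import product

def payoff(p, q0, q1):
    """Expected undiscounted reward (assumption) of the stochastic policy that
    chooses a with probability q0 after gaining $0 and q1 after gaining $1."""
    return p * (1 + q1) + (1 - p) * q0

def minimax_policies(ps, qs):
    """Best worst-case value over the environment grid ps, together with all
    grid policies (q0, q1) that attain it."""
    worst = {(q0, q1): min(payoff(p, q0, q1) for p in ps)
             for q0, q1 in product(qs, qs)}
    best = max(worst.values())
    return best, sorted(pol for pol, v in worst.items() if v == best)

qs = [F(k, 4) for k in range(5)]         # choice probabilities 0, 1/4, ..., 1
full = [F(i, 100) for i in range(101)]   # grid over p in [0, 1]
eps = F(1, 10)
trimmed = [p for p in full if p >= eps]  # grid over p in [eps, 1]

print(minimax_policies(full, qs))     # value 1: every policy with q0 = 1 ties
print(minimax_policies(trimmed, qs))  # value 11/10: only q0 = q1 = 1, i.e. pi*
```

On the full grid every policy with $q_0 = 1$ ties at the minimax value, including $\pi$, while on the trimmed grid the worst case sits at $p = \epsilon$ and $\pi^*$ is the unique maximizer.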

## Results

Now consider the forecasting setting, with action set $A$, observation set $O$, time discount function $\gamma: \mathbb{N} \to \mathbb{R}^{\geq 0}$ and reward function $r: (A \times O)^* \to \mathbb{R}$. As before, $\gamma$ and $r$ define the utility function $u: (O^* \to A) \times O^\omega \to \mathbb{R}$.

Consider $\Phi \in \mathcal{P}_C(O^\omega)$ and denote

$$O_\Phi^+ := \{x \in O^+ \mid \exists \mu \in \Phi: \mu(xO^\omega) > 0\}$$
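For intuition about $O_\Phi^+$, here is a sketch that enumerates its short elements when $\Phi$ is replaced by a finite list of environments. The encoding of the Motivation example as a measure on observation strings (first observation `'1'`, meaning "gained $\$1$", with probability $p$; all later observations `'0'`) and the function names are assumptions made for illustration only.

```python
from fractions import Fraction as F
from itertools import product

def toy_env(p):
    """Hypothetical encoding of the Motivation example: the first observation is
    '1' with probability p and '0' otherwise; every later observation is assumed
    to be '0'. Returns the map x -> mu_p(x O^omega)."""
    def mu(x):
        if x == "":
            return F(1)
        if any(c != "0" for c in x[1:]):
            return F(0)
        return p if x[0] == "1" else 1 - p
    return mu

def positive_prefixes(phi, max_len, alphabet="01"):
    """Elements of O_Phi^+ of length at most max_len, for a finite list phi
    standing in for Phi: non-empty strings with positive mass under some mu."""
    words = ("".join(w) for n in range(1, max_len + 1)
             for w in product(alphabet, repeat=n))
    return {x for x in words if any(mu(x) > 0 for mu in phi)}

phi = [toy_env(F(0)), toy_env(F(1, 2)), toy_env(F(1))]
print(sorted(positive_prefixes(phi, 2)))      # ['0', '00', '1', '10']
print(sorted(positive_prefixes(phi[:1], 2)))  # ['0', '00']: p = 0 alone
```

The string `'1'` lies in $O_\Phi^+$ even though it has probability zero in the minimax environment $p = 0$; this is exactly the kind of history the stability condition is meant to protect.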

### Definition 1

Consider $X \subseteq O_\Phi^+$ finite. $\pi^* \in \mathcal{P}(O^* \to A)$ is called an $X$-stable minimax policy for $\Phi$ when there are sequences $\{\epsilon^{(n)}: X \to (0, 1)\}_{n \in \mathbb{N}}$ and $\{\pi^{(n)} \in \mathcal{P}(O^* \to A)\}_{n \in \mathbb{N}}$ s.t. $\epsilon^{(n)} \to 0$, $\pi^{(n)} \to \pi^*$, and $\pi^{(n)}$ is a minimax policy for

$$\Phi^{(n)} := \{\mu \in \Phi \mid \forall x \in O^*, o \in O: xo \in X \implies \mu(xoO^\omega) \geq \epsilon^{(n)}(xo)\,\mu(xO^\omega)\}$$

In particular, the definition assumes $\Phi^{(n)}$ is non-empty.
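As a sketch of how the constraint defining $\Phi^{(n)}$ acts, the code below checks it for environments given by their cylinder probabilities $x \mapsto \mu(xO^\omega)$. It reuses the first step of the Motivation example under a hypothetical encoding (observation `'1'` means the agent gained $\$1$), takes $X = \{$`'1'`$\}$, and picks the schedule $\epsilon^{(n)} = 1/(2n)$, an arbitrary choice made here. Since the only constraint is $\mu(1\,O^\omega) \geq \epsilon^{(n)} \cdot \mu(O^\omega) = \epsilon^{(n)}$, $\Phi^{(n)}$ is the class $p \in [\epsilon^{(n)}, 1]$ from the Motivation section.

```python
from fractions import Fraction as F

def satisfies_stability(mu, eps):
    """True iff mu(xo O^omega) >= eps(xo) * mu(x O^omega) for every xo in X.

    mu:  callable returning the cylinder probability mu(x O^omega) of a string x
    eps: dict whose keys form X (non-empty strings), values in (0, 1)
    """
    return all(mu(xo) >= e * mu(xo[:-1]) for xo, e in eps.items())

def toy_env(p):
    """Hypothetical encoding of the first step of the Motivation example:
    observation '1' (gained $1) with probability p, '0' otherwise."""
    table = {"": F(1), "0": 1 - p, "1": p}
    return lambda x: table[x]

# Finite grid standing in for Phi = {p in [0, 1]}.
phi = {F(i, 20): toy_env(F(i, 20)) for i in range(21)}

def stable_params(n):
    """Parameters p of the grid environments in Phi^(n), with X = {'1'} and
    the hypothetical schedule eps^(n)('1') = 1 / (2n), which tends to 0."""
    eps_n = {"1": F(1, 2 * n)}
    return sorted(p for p, mu in phi.items() if satisfies_stability(mu, eps_n))

for n in (1, 2, 4):
    print(n, min(stable_params(n)))  # smallest surviving p: 1/2, 1/4, 3/20
```

The environment $p = 0$ is excluded from every $\Phi^{(n)}$, and by the Motivation section each class $p \in [\epsilon, 1]$ has $\pi^*$ as its sole minimax policy, so in this example the only possible limit of the $\pi^{(n)}$ is $\pi^*$.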

### Proposition 1

For any finite $X \subseteq O_\Phi^+$, there exists $\pi^* \in \mathcal{P}(O^* \to A)$ which is an $X$-stable minimax policy for $\Phi$.

### Proposition 2

For any finite $X \subseteq O_\Phi^+$ and $\pi^* \in \mathcal{P}(O^* \to A)$ an $X$-stable minimax policy for $\Phi$, $\pi^*$ is in particular a minimax policy for $\Phi$.

### Proposition 3

Consider $X \subseteq O_\Phi^+$ finite, $\pi^* \in \mathcal{P}(O^* \to A)$ an $X$-stable minimax policy for $\Phi$, and $x \in O^*$ s.t. $\forall y \sqsubseteq x: y \in X \cup \{\lambda_O\}$. Assume $\Phi$ factorizes at $x$ into $\Phi_1, \Phi_2 \in \mathcal{P}_C(O^\omega)$. Denote by $\mathrm{pr}_1: (O^* \to A) \to (\overline{xO^*} \to A)$ and $\mathrm{pr}_2: (O^* \to A) \to (xO^* \to A)$ the restriction mappings. Define $\pi_1^* := \mathrm{pr}_{1*}\pi^*$ and define $\pi_2^*: A^{|x|} \to (xO^* \to A)$ by

$$\pi_2^*(t) := \mathrm{pr}_{2*}\left(\pi^* \mid \{s: O^* \to A \mid t_i = s(x_{<i})\}\right)$$

Then

$$\pi_2^* \in \underset{\pi_2: A^{|x|} \to (xO^* \to A)}{\arg\max}\; \min_{\mu \in \Phi_2} \mathrm{E}_{(\pi_1^* \otimes_x \pi_2) \times \mu}\left[u_{>|x|}\right]$$

Note that, thanks to $X$-stability, Proposition 3 doesn't require the condition $\min_{\mu \in \Phi_1} \mu(xO^\omega) > 0$, as opposed to the case for arbitrary minimax policies (see Proposition 1 here).
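The two policies $\pi_1^*$ and $\pi_2^*$ are concrete enough to compute when everything is finite. The sketch below assumes, none of this being from the source, a finite horizon in which policies act only on histories of length at most 1, a binary observation alphabet, actions `'a'` and `'b'`, a stochastic policy given as a finite mixture of deterministic policies, and a 0-based reading of the index in $t_i = s(x_{<i})$, i.e. $i$ ranges over $0, \ldots, |x|-1$. When the conditioning event has probability zero, $\pi_2^*(t)$ is left undefined and the function returns `None`. All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

# A stochastic policy is modeled (assumption) as a list of (weight, s) pairs,
# where s is a deterministic policy: a dict from histories (strings) to actions.

def restrict(s, keep):
    """Restriction of s to the histories h with keep(h) true, made hashable."""
    return tuple(sorted((h, a) for h, a in s.items() if keep(h)))

def pi1_star(pi, x):
    """pr_{1*} pi: pushforward along restriction to histories outside x O^*."""
    out = defaultdict(F)
    for w, s in pi:
        out[restrict(s, lambda h: not h.startswith(x))] += w
    return dict(out)

def pi2_star(pi, x, t):
    """pr_{2*}(pi | {s : t_i = s(x_{<i}) for 0 <= i < |x|})."""
    consistent = [(w, s) for w, s in pi
                  if all(s[x[:i]] == t[i] for i in range(len(x)))]
    total = sum(w for w, _ in consistent)
    if total == 0:
        return None  # conditioning event has probability zero
    out = defaultdict(F)
    for w, s in consistent:
        out[restrict(s, lambda h: h.startswith(x))] += w / total
    return dict(out)

always_a = {"": "a", "0": "a", "1": "a"}
mixed = {"": "b", "0": "a", "1": "b"}
pi = [(F(1, 2), always_a), (F(1, 2), mixed)]
print(pi2_star(pi, "1", "a"))  # continuation on x O^* given the agent played a
print(pi1_star(pi, "1"))       # marginal on histories outside x O^*
```

Conditioning on $t$ keeps only the deterministic policies that would have produced the actions $t$ along the proper prefixes of $x$, and the pushforward then forgets everything outside $xO^*$.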

### Definition 2

$\pi^* \in \mathcal{P}(O^* \to A)$ is called a subagent perfect minimax policy for $\Phi$ when there is a sequence $\{X^{(n)} \subseteq O_\Phi^+\}_{n \in \mathbb{N}}$ s.t. each $X^{(n)}$ is finite, $X^{(n)} \subseteq X^{(n+1)}$ and $\bigcup_{n \in \mathbb{N}} X^{(n)} = O_\Phi^+$, and a sequence $\{\pi^{(n)} \in \mathcal{P}(O^* \to A)\}_{n \in \mathbb{N}}$ s.t. $\pi^{(n)} \to \pi^*$ and $\pi^{(n)}$ is an $X^{(n)}$-stable minimax policy for $\Phi$.

### Proposition 4

There exists $\pi^* \in \mathcal{P}(O^* \to A)$ which is a subagent perfect minimax policy for $\Phi$.

### Proposition 5

For $\pi^* \in \mathcal{P}(O^* \to A)$ a subagent perfect minimax policy for $\Phi$, $\pi^*$ is in particular a minimax policy for $\Phi$.

### Proposition 6

Consider $\pi^* \in \mathcal{P}(O^* \to A)$ a subagent perfect minimax policy for $\Phi$ and $x \in O_\Phi^+$. Assume $\Phi$ factorizes at $x$ into $\Phi_1, \Phi_2 \in \mathcal{P}_C(O^\omega)$, and define $\pi_1^*$ and $\pi_2^*$ as in Proposition 3. Then

$$\pi_2^* \in \underset{\pi_2: A^{|x|} \to (xO^* \to A)}{\arg\max}\; \min_{\mu \in \Phi_2} \mathrm{E}_{(\pi_1^* \otimes_x \pi_2) \times \mu}\left[u_{>|x|}\right]$$
