Optimal predictors for global probability measures

post by Vanessa Kosoy (vanessa-kosoy) · 2015-10-06T17:40:19.000Z · LW · GW · 0 comments

There are two commonly used formalisms in average-case complexity theory: problems with a single global probability measure on the parameter space, and problems with a family of such probability measures. Previous posts about optimal predictors focused on the family approach. In this post we give the formalism for global measures.

Results

New notation

$1$ will denote the one-element set.

Given $μ$ a probability measure on ${0, 1}^{*}$ , $X$ a set and $Q : ({0, 1}^{*})^{2} \xrightarrow{alg} X$ , $T_{Q}^{μ} (s)$ stands for the maximal runtime of $Q (x, y)$ for $x \in supp μ$ , $y \in {0, 1}^{s}$ .

Definition 1

A unidistributional estimation problem is a pair $(f, μ)$ where $μ$ is a probability measure on ${0, 1}^{*}$ and $f : supp μ \to [0, 1]$ is an arbitrary function.

Definition 2

Given appropriate sets $X$ , $Y$ , consider $P : N \times X \times ({0, 1}^{*})^{2} \xrightarrow{alg} Y$ , $r : N \to N$ polynomial and $a : N \to {0, 1}^{*}$ . The triple $^P = (P, r, a)$ is called a $(p o l y, l o g)$ -scheme of signature $X \to Y$ when

(i) The runtime of $P (j, x, y, z)$ is bounded by $p (j)$ with $p$ polynomial.

(ii) $| a (j) | \leq c_{1} + c_{2} log (j + 1)$ for some $c_{1}, c_{2} \in N$ .

A $(p o l y, l o g)$ -scheme of signature ${0, 1}^{*} \to [0, 1]$ will also be called a $(p o l y, l o g)$ -predictor.

Given $j \in N$ , $x \in X$ , ${^P}^{j} (x)$ will denote the $Y$ -valued random variable $P (j, x, y, a (j))$ where $y$ is sampled from the probability measure $U^{r (j)}$ . We will also use the notation ${^P}^{j} (x, y) := P (j, x, y, a (j))$ .

Fix $Δ$ an error space of rank 1.

Definition 3

Consider $(f, μ)$ a unidistributional estimation problem and $^P = (P, r, a)$ a $(p o l y, l o g)$ -predictor. $^P$ is called a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ when for any $(p o l y, l o g)$ -predictor $^Q = (Q, s, b)$ , there is $δ \in Δ$ s.t.

$E_{μ \times U^{r (j)}} [({^P}^{j} (x) - f (x))^{2}] \leq E_{μ \times U^{s (j)}} [({^Q}^{j} (x) - f (x))^{2}] + δ (j)$
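To make the quantity compared in Definition 3 concrete, here is a minimal sketch that computes the expected squared error exactly for a finitely supported measure. The toy problem (parity on 2-bit strings) and the three candidate predictors are hypothetical illustrations, and the advice string and the index $j$ are suppressed for brevity.

```python
from itertools import product

def expected_squared_error(mu, f, predictor, r):
    """Exact E_{mu x U^r}[(predictor(x, y) - f(x))^2] for a finitely supported mu."""
    total = 0.0
    n_coins = 2 ** r
    for x, weight in mu.items():
        # Enumerate every coin string y of length r (the uniform measure U^r).
        for bits in product("01", repeat=r):
            y = "".join(bits)
            total += weight * (predictor(x, y) - f(x)) ** 2 / n_coins
    return total

# Hypothetical toy problem: mu is uniform on 2-bit strings, f is parity.
mu = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
f = lambda x: float(x.count("1") % 2)

constant_half = lambda x, y: 0.5       # ignores both the input and the coins
coin_guess = lambda x, y: float(y[0])  # outputs its first random bit
exact = lambda x, y: f(x)              # computes f directly

err_half = expected_squared_error(mu, f, constant_half, r=1)
err_coin = expected_squared_error(mu, f, coin_guess, r=1)
err_exact = expected_squared_error(mu, f, exact, r=1)
```

In this toy case the constant predictor beats the coin flip, and only a predictor that actually computes $f$ reaches zero error; optimality in Definition 3 asks for this ordering to hold, up to $δ$ , against every efficient competitor.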

$Δ (p o l y, l o g)$ -optimal predictors for unidistributional problems have properties and existence theorems analogous to those of $Δ (p o l y, l o g)$ -optimal predictor schemes, where the role of the rank 2 error space $Δ_{a v g}^{2}$ is taken by the rank 1 error space $Δ_{l l}^{1}$ (see Definition 8). The theorems are listed below. The proofs of Theorem 4, Theorem 8 (which is stated in a stronger form than its previous analogues) and Theorem 10 are given in the appendix. Adapting the other proofs is straightforward.

Theorem 1

Suppose $(j + 1)^{- 1} \in Δ$ . Consider $(f, μ)$ a unidistributional estimation problem and $^P = (P, r, a)$ a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ . Suppose ${p_{j} \in [0, 1]}_{j \in N}$ , ${q_{j} \in [0, 1]}_{j \in N}$ are s.t.

$\exists ϵ > 0 \, \forall j : (μ \times U^{r (j)}) \{ (x, y) \in ({0, 1}^{*})^{2} ∣ p_{j} \leq {^P}^{j} (x, y) \leq q_{j} \} \geq ϵ$

Define

$ϕ_{j} := E_{μ \times U^{r (j)}} [f (x) - {^P}^{j} (x, y) ∣ p_{j} \leq {^P}^{j} (x, y) \leq q_{j}]$

Assume that either $p_{j}, q_{j}$ have a number of digits logarithmically bounded in $j$ or ${^P}^{j}$ produces outputs with a number of digits logarithmically bounded in $j$ (by Theorem A.7 if any $Δ (p o l y, l o g)$ -optimal predictor exists for $(f, μ)$ then a $Δ (p o l y, l o g)$ -optimal predictor with this property exists as well). Then, $| ϕ | \in Δ$ .

Theorem 2

Consider $μ$ a probability measure on ${0, 1}^{*}$ and $f_{1}, f_{2} : supp μ \to [0, 1]$ s.t. $f_{1} + f_{2} \leq 1$ . Suppose ${^P}_{1}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{1}, μ)$ and ${^P}_{2}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{2}, μ)$ . Then $η ({^P}_{1} + {^P}_{2})$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{1} + f_{2}, μ)$ .

Theorem 3

Consider $μ$ a probability measure on ${0, 1}^{*}$ and $f_{1}, f_{2} : supp μ \to [0, 1]$ s.t. $f_{1} + f_{2} \leq 1$ . Suppose ${^P}_{1}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{1}, μ)$ and $^P$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{1} + f_{2}, μ)$ . Then, $η (^P - {^P}_{1})$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{2}, μ)$ .

Definition 4

Fix $Δ$ an error space of rank 1. A probability measure $μ$ on ${0, 1}^{*}$ is called $Δ (l o g)$ -sampleable when there is a $(p o l y, l o g)$ -scheme $^S$ of signature $1 \to {0, 1}^{*}$ such that

$\sum_{x \in {0, 1}^{*}} | μ (x) - P r [{^S}^{k} = x] | \in Δ$

$^S$ is called a $Δ (l o g)$ -sampler for $μ$ .
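The sampler condition bounds the $L^{1}$ distance between $μ$ and the output distribution of the sampler. The sketch below computes that distance exactly for a toy measure; the target measure, the threshold sampler and the choice of coin lengths are all hypothetical, with the coin length standing in for $r (k)$ .

```python
from collections import Counter
from itertools import product

def sampler_distribution(sampler, r):
    """Exact output distribution of `sampler` when fed uniform r-bit coin strings."""
    counts = Counter(sampler("".join(bits)) for bits in product("01", repeat=r))
    return {x: c / 2 ** r for x, c in counts.items()}

def l1_distance(mu, nu):
    """sum_x |mu(x) - nu(x)| over the union of both supports."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

# Hypothetical target: mu("0") = 1/3, mu("1") = 2/3.
mu = {"0": 1 / 3, "1": 2 / 3}

def make_sampler(r):
    # Output "0" when the coin string, read as an integer, falls below ~2^r / 3.
    threshold = round(2 ** r / 3)
    return lambda coins: "0" if int(coins, 2) < threshold else "1"

errors = [l1_distance(mu, sampler_distribution(make_sampler(r), r)) for r in (2, 4, 8)]
```

The error shrinks as more coins are allowed; Definition 4 asks that this error, as a function of $k$ , lie in $Δ$ .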

Definition 5

Consider $Δ$ an error space of rank 1. A unidistributional estimation problem $(f, μ)$ is called $Δ (l o g)$ -generatable when there is a $(p o l y, l o g)$ -scheme $^G$ of signature $1 \to {0, 1}^{*} \times [0, 1]$ such that

(i) ${^G}_{1}$ is a $Δ (l o g)$ -sampler for $μ$ .

(ii) $E [({^G}_{2}^{k} - f ({^G}_{1}^{k}))^{2}] \in Δ$

$^G$ is called a $Δ (l o g)$ -generator for $(f, μ)$ .

Theorem 4

Consider $(f_{1}, μ_{1})$ , $(f_{2}, μ_{2})$ unidistributional estimation problems with respective $Δ (p o l y, l o g)$ -optimal predictors ${^P}_{1}$ and ${^P}_{2}$ . Assume $μ_{1}$ is $Δ (l o g)$ -sampleable and $(f_{2}, μ_{2})$ is $Δ (l o g)$ -generatable. Define $^P$ by ${^P}^{j} ((x_{1}, x_{2})) := {^P}_{1}^{j} (x_{1}) {^P}_{2}^{j} (x_{2})$ . Then, $^P$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f_{1} \times f_{2}, μ_{1} \times μ_{2})$ .

Theorem 5

Consider $μ$ a probability measure on ${0, 1}^{*}$ and $f : supp μ \to [0, 1]$ , $D \subseteq supp μ$ . Assume ${^P}_{D}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(χ_{D}, μ)$ and ${^P}_{f ∣ D}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ ∣ D)$ . Define $^P$ by ${^P}^{j} (x) := {^P}_{D}^{j} (x) {^P}_{f ∣ D}^{j} (x)$ . Then, $^P$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(χ_{D} f, μ)$ .

Theorem 6

Fix $h$ a polynomial s.t. $2^{- h} \in Δ$ . Consider $μ$ a probability measure on ${0, 1}^{*}$ , $f : supp μ \to [0, 1]$ and $D \subseteq supp μ$ non-empty. Assume ${^P}_{D}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(χ_{D}, μ)$ and ${^P}_{χ_{D} f}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(χ_{D} f, μ)$ . Define ${^P}_{f ∣ D}$ by

${^P}_{f ∣ D}^{j} (x) := \begin{cases} 1 & \text{if } {^P}_{D}^{j} (x) = 0 \\ η (\frac{{^P}_{χ_{D} f}^{j} (x)}{{^P}_{D}^{j} (x)}) \text{ rounded to } h (j) \text{ binary places} & \text{if } {^P}_{D}^{j} (x) > 0 \end{cases}$

Then, ${^P}_{f ∣ D}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ ∣ D)$ .
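The case distinction in Theorem 6 can be sketched on plain numbers as follows. This is only an illustration: `eta` is assumed here to be the clamp of a real number to $[0, 1]$ (the symbol $η$ is defined in earlier posts, not in this one), and the function names and the round-to-nearest convention are hypothetical choices.

```python
import math

def eta(t):
    """Clamp t to [0, 1] (assumed meaning of the symbol eta)."""
    return min(1.0, max(0.0, t))

def round_binary(t, places):
    """Round t to the given number of digits after the binary point."""
    scale = 2 ** places
    return math.floor(t * scale + 0.5) / scale

def conditional_prediction(p_d, p_chi_f, h):
    """Combine an estimate p_d of chi_D and p_chi_f of chi_D * f into an estimate of f given D."""
    if p_d == 0:
        return 1.0  # conventional value from the first case of the definition
    return round_binary(eta(p_chi_f / p_d), h)
```

For instance, `conditional_prediction(0.5, 0.25, 4)` gives `0.5`, while an inconsistent pair such as `(0.25, 0.5)` is clamped to `1.0` instead of producing a value above 1.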

Definition 6

Consider $μ$ a probability measure on ${0, 1}^{*}$ and ${^Q}_{1} = (Q_{1}, s_{1}, b_{1})$ , ${^Q}_{2} = (Q_{2}, s_{2}, b_{2})$ $(p o l y, l o g)$ -predictors. We say ${^Q}_{1}$ is $Δ$ -similar to ${^Q}_{2}$ relative to $μ$ (denoted ${^Q}_{1} \overset{μ}{\underset{Δ}{≃}} {^Q}_{2}$ ) when $E_{μ \times U^{s_{1} (j)} \times U^{s_{2} (j)}} [({^Q}_{1}^{j} (x, y_{1}) - {^Q}_{2}^{j} (x, y_{2}))^{2}] \in Δ$ .

Theorem 7

Consider $(f, μ)$ a unidistributional estimation problem, $^P$ a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ and $^Q$ a $(p o l y, l o g)$ -predictor. Then, $^Q$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ if and only if $^P \overset{μ}{\underset{Δ}{≃}} {^Q}$ .

Definition 7

Consider $(f, μ)$ , $(g, ν)$ unidistributional estimation problems, $^ζ = (ζ, r_{ζ}, a_{ζ})$ a $(p o l y, l o g)$ -scheme of signature ${0, 1}^{*} \to {0, 1}^{*}$ . $^ζ$ is called a $Δ$ -pseudo-invertible reduction of $(f, μ)$ to $(g, ν)$ when the following conditions hold:

(i) $E_{μ \times U^{r_{ζ} (j)}} [(g ({^ζ}^{j} (x)) - f (x))^{2}] \in Δ$

(ii) $P r_{μ \times U^{r_{ζ} (j)}} [ν ({^ζ}^{j} (x)) = 0] \in Δ$

(iii) There is $M > 0$ and $^R = (R, r_{R}, a_{R})$ a $(p o l y, l o g)$ -scheme of signature ${0, 1}^{*} \to Q \cap [0, M]$ s.t.

$E_{ν \times U^{r_{R} (j)}} [({^R}^{j} (y) - \frac{P r_{μ \times U^{r_{ζ} (j)}} [{^ζ}^{j} (x) = y]}{ν (y)})^{2}] \in Δ$

(iv) There is a $(p o l y, l o g)$ -scheme $^ξ = (ξ, r_{ξ}, a_{ξ})$ of signature ${0, 1}^{*} \to {0, 1}^{*}$ s.t.

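$E_{μ \times U^{r_{ζ} (k)}} [\sum_{x^{'} \in {0, 1}^{*}} | P r_{U^{r_{ξ} (k)}} [{^ξ}^{k} ({^ζ}^{k} (x, z), w) = x^{'}] - P r_{μ \times U^{r_{ζ} (k)}} [x^{''} = x^{'} ∣ {^ζ}^{k} (x^{''}, z^{'}) = {^ζ}^{k} (x, z)] |] \in Δ$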

Such $^ξ$ is called a $Δ$ -pseudo-inverse of $^ζ$ .

Note 1

The conditions of Definition 7 are weaker than corresponding definitions in previous posts in the sense that exact equalities were replaced by approximate equalities with error bounds related to $Δ$ . However, analogous relaxations can be done in the "multidistributional" theory too.

Theorem 8

Suppose $(j + 1)^{- 1} \in Δ$ . Consider $(f, μ)$ , $(g, ν)$ unidistributional estimation problems, $^ζ$ a $Δ$ -pseudo-invertible reduction of $(f, μ)$ to $(g, ν)$ and ${^P}_{g}$ a $Δ (p o l y, l o g)$ -optimal predictor for $(g, ν)$ . Define ${^P}_{f}$ by ${^P}_{f}^{j} (x) := {^P}_{g}^{j} ({^ζ}^{j} (x))$ . Then, ${^P}_{f}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

Definition 8

$Δ_{l l}^{1}$ is the set of bounded functions $δ : N \to R^{\geq 0}$ s.t.

$\exists ϵ > 0 : \lim_{j \to \infty} (log log j)^{ϵ} δ (j) = 0$

It is easily seen that $Δ_{l l}^{1}$ is a rank 1 error space.
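As a quick illustration of the definition, here is a worked example that is not part of the original post. The three rates are chosen only for illustration, and their values at small $j$ are left unspecified because only the limit matters.

```latex
% Membership in \Delta^1_{ll} for three sample rates (all taken for large j).
\begin{align*}
\delta_1(j) &= \frac{1}{j}:
  & (\log\log j)^{1}\,\delta_1(j) &\to 0
  &&\Rightarrow\ \delta_1 \in \Delta^1_{ll} \\
\delta_2(j) &= \frac{1}{\log\log j}:
  & (\log\log j)^{1/2}\,\delta_2(j) &= (\log\log j)^{-1/2} \to 0
  &&\Rightarrow\ \delta_2 \in \Delta^1_{ll} \\
\delta_3(j) &= \frac{1}{\log\log\log j}:
  & (\log\log j)^{\epsilon}\,\delta_3(j) &\to \infty \ \text{for every } \epsilon > 0
  &&\Rightarrow\ \delta_3 \notin \Delta^1_{ll}
\end{align*}
```

So the error space admits very slowly vanishing errors, but not arbitrarily slow ones.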

Theorem 9

Consider $(f, μ)$ a unidistributional estimation problem. Define $Υ : N \times ({0, 1}^{*})^{3} \xrightarrow{alg} [0, 1]$ by

$Υ^{j} (x, y, Q) := β (e v^{j} (Q, x, y))$

Define $υ_{f, μ} : N \to {0, 1}^{*}$ by

$υ_{f, μ}^{j} := \underset{| Q | \leq log j}{argmin} \, E_{μ \times U^{j}} [(Υ^{j} (x, y, Q) - f (x))^{2}]$

Then, $(Υ, j, υ_{f, μ})$ is a $Δ_{l l}^{1} (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

Theorem 10

There is an oracle machine $Λ$ that accepts an oracle of signature $O : N \times {0, 1}^{*} \to {0, 1}^{*} \times [0, 1]$ and a polynomial $r : N \to N$ where the allowed oracle calls are $O^{k} (x)$ for $| x | = r (k)$ and computes a function of signature $N \times ({0, 1}^{*})^{2} \to [0, 1]$ s.t. for any $(f, μ)$ a unidistributional estimation problem and $^G$ a corresponding $Δ_{l l}^{1} (l o g)$ -generator, $Λ [^G]$ is a $Δ_{l l}^{1} (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

The following is the description of $Λ$ . Consider $O : N \times {0, 1}^{*} \to {0, 1}^{*} \times [0, 1]$ and a polynomial $r : N \to N$ . We describe the computation of $Λ [O, r]^{j} (x)$ where the extra argument of $Λ$ is regarded as internal coin tosses.

We loop over the first $j$ words in lexicographic order. Each word is interpreted as a program $Q : ({0, 1}^{*})^{2} \xrightarrow{alg} [0, 1]$ . We loop over $j^{2}$ "test runs". At test run $i$ , we generate $(x_{i} \in {0, 1}^{*}, t_{i} \in [0, 1])$ by evaluating $O^{j} (y_{i})$ for $y_{i}$ sampled from $U^{r (j)}$ . We then sample $z_{i}$ from $U^{j}$ and compute $s_{i} := e v^{j} (Q, x_{i}, z_{i})$ . At the end of the test runs, we compute the average error $ϵ (Q) := \frac{1}{j^{2}} \sum_{i} (s_{i} - t_{i})^{2}$ . At the end of the loop over programs, the program $Q^{*}$ with the lowest error is selected and the output $e v^{j} (Q^{*}, x)$ is produced.
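The selection loop described above can be sketched as follows. This is a toy model under stated assumptions, not the machine $Λ$ itself: the list `candidates` stands in for the first $j$ programs in lexicographic order (with the time-bounded evaluation $e v^{j}$ replaced by a direct Python call), `generator` stands in for the oracle call $O^{j} (y_{i})$ , and all function names are hypothetical.

```python
import random

def select_and_predict(j, x, candidates, generator, rng):
    """Pick the candidate with the lowest empirical squared error over j^2 test runs, then apply it to x."""
    best_program, best_error = None, float("inf")
    for program in candidates[:j]:
        total = 0.0
        for _ in range(j * j):
            x_i, t_i = generator(rng)   # stands in for O^j(y_i) with y_i ~ U^{r(j)}
            s_i = program(x_i, rng)     # stands in for ev^j(Q, x_i, z_i) with z_i ~ U^j
            total += (s_i - t_i) ** 2
        error = total / (j * j)         # the average error epsilon(Q)
        if error < best_error:
            best_program, best_error = program, error
    return best_program(x, rng)

# Hypothetical toy problem: x is a uniform 2-bit string and the target is its parity.
parity = lambda x, rng: float(x.count("1") % 2)
constant_half = lambda x, rng: 0.5
coin = lambda x, rng: float(rng.random() < 0.5)

def generator(rng):
    x = "".join(rng.choice("01") for _ in range(2))
    return x, parity(x, rng)

candidates = [constant_half, parity, coin]
prediction = select_and_predict(3, "10", candidates, generator, random.Random(0))
```

Here the parity program has empirical error exactly zero on every test run, so it is always selected. In the actual construction the guarantee is only probabilistic, which is what the Chebyshev argument in the proof of Theorem 10 controls.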

Appendix

Definition 9

Given $n \in N$ , a function $δ : N^{n + 1} \to R^{\geq 0}$ is called $Δ$ -moderate when

(i) $δ$ is non-decreasing in arguments $2$ to $n + 1$ .

(ii) For any collection of polynomials ${p_{i} : N \to N}_{i < n}$ , $δ (j, p_{0} (j) \dots p_{n - 1} (j)) \in Δ$

Lemma 1

Fix $(f, μ)$ a unidistributional estimation problem and $^P := (P, r, a)$ a $(p o l y, l o g)$ -predictor. Then, $^P$ is $Δ (p o l y, l o g)$ -optimal iff there is a $Δ$ -moderate function $δ : N^{3} \to [0, 1]$ s.t. for any $j, s \in N$ , $Q : ({0, 1}^{*})^{2} \xrightarrow{alg} [0, 1]$

$E_{μ \times U^{r (j)}} [({^P}^{j} (x, y) - f (x))^{2}] \leq E_{μ \times U^{s}} [(Q (x, y) - f (x))^{2}] + δ (j, T_{Q}^{μ} (s), 2^{| Q |})$

The proof is analogous to the case of $Δ (p o l y, l o g)$ -optimal predictor schemes and we omit it.

Lemma 2

Suppose $(j + 1)^{- 1} \in Δ$ . Fix $(f, μ)$ a unidistributional estimation problem and $^P = (P, r, a)$ a corresponding $Δ (p o l y, l o g)$ -optimal predictor. Consider $^Q = (Q, s, b)$ a $(p o l y, l o g)$ -predictor, $M > 0$ and $^w = (w, r_{w}, a_{w})$ a $(p o l y, l o g)$ -scheme of signature ${0, 1}^{*} \to Q \cap [0, M]$ . Assume $r_{w} (j) \geq max (r (j), s (j))$ . Then there is $δ \in Δ$ s.t.

$E_{μ \times U^{r_{w} (j)}} [{^w}^{j} (x, y) ({^P}^{j} (x, y_{\leq r (j)}) - f (x))^{2}] \leq E_{μ \times U^{r_{w} (j)}} [{^w}^{j} (x, y) ({^Q}^{j} (x, y_{\leq s (j)}) - f (x))^{2}] + δ (j)$

The proof is analogous to the case of $Δ (p o l y, l o g)$ -optimal predictor schemes and we omit it.

Lemma 3

Consider $(f, μ)$ a unidistributional estimation problem and $^P = (P, r, a)$ a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ . Then there are $c_{1}, c_{2} \in R$ and a $Δ$ -moderate function $δ : N^{3} \to [0, 1]$ s.t. for any $j, s \in N$ , $Q : ({0, 1}^{*})^{2} \xrightarrow{alg} [0, 1]$

$| E_{μ \times U^{s} \times U^{r (j)}} [Q ({^P}^{j} - f)] | \leq (c_{1} + c_{2} E_{μ \times U^{s}} [Q^{2}]) δ (j, T_{Q}^{μ} (s), 2^{| Q |})$

Conversely, consider $M \in Q$ and $^P$ a $(p o l y, l o g)$ -scheme of signature ${0, 1}^{*} \to Q \cap [- M, + M]$ . Suppose that for any $(p o l y, l o g)$ -scheme $^Q = (Q, s, b)$ of signature ${0, 1}^{*} \to Q \cap [- M - 1, + M]$ we have

$| E_{μ \times U^{s (j)} \times U^{r (j)}} [{^Q}^{j} ({^P}^{j} - f)] | \in Δ$

Define $\tilde{P}$ to be a $(p o l y, l o g)$ -predictor s.t. computing ${\tilde{P}}^{j}$ is equivalent to computing $η ({^P}^{j})$ rounded to $h (j)$ digits after the binary point, where $2^{- h} \in Δ$ . Then, $\tilde{P}$ is a $Δ (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

The proof is analogous to the case of $Δ (p o l y, l o g)$ -optimal predictor schemes and we omit it.

Lemma 4

Consider $μ$ a probability measure on ${0, 1}^{*}$ . Suppose $^S$ is a $Δ (l o g)$ -sampler for $μ$ . Then, $\exists δ \in Δ$ s.t. for any bounded function $f : supp μ \to R$

$| E_{μ} [f] - E [f ({^S}^{k})] | \leq (sup | f |) δ (k)$

Proof of Lemma 4

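$E_{μ} [f] - E [f ({^S}^{k})] = \sum_{x \in {0, 1}^{*}} μ (x) f (x) - \sum_{x \in {0, 1}^{*}} P r [{^S}^{k} = x] f (x)$

$E_{μ} [f] - E [f ({^S}^{k})] = \sum_{x \in {0, 1}^{*}} (μ (x) - P r [{^S}^{k} = x]) f (x)$

$| E_{μ} [f] - E [f ({^S}^{k})] | \leq \sum_{x \in {0, 1}^{*}} | (μ (x) - P r [{^S}^{k} = x]) f (x) |$

$| E_{μ} [f] - E [f ({^S}^{k})] | \leq (sup | f |) \sum_{x \in {0, 1}^{*}} | μ (x) - P r [{^S}^{k} = x] |$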

Since $^S$ is a $Δ (l o g)$ -sampler, we get the desired result.
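The inequality in Lemma 4 is easy to check numerically on a small example. The sketch below uses hypothetical numbers: `mu` is the true measure, `sampler` is the output distribution of an imperfect sampler, and `f` is an arbitrary bounded function on the support.

```python
def expectation(dist, f):
    """E_{x ~ dist}[f(x)] for a finitely supported distribution given as a dict."""
    return sum(p * f[x] for x, p in dist.items())

# Hypothetical measure, sampler output distribution and bounded function.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
sampler = {"a": 0.45, "b": 0.35, "c": 0.2}
f = {"a": 1.0, "b": -2.0, "c": 0.5}

gap = abs(expectation(mu, f) - expectation(sampler, f))
l1 = sum(abs(mu[x] - sampler[x]) for x in mu)   # the sampler's L1 error
sup_f = max(abs(v) for v in f.values())
bound = sup_f * l1
```

Here the gap is about 0.15 while the bound is about 0.2, consistent with the lemma.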

Lemma 5

Consider a family of sets ${X^{k}}_{k \in N}$ and a family of probability measures ${μ^{k}}_{k \in N}$ on $X^{k}$ . Denote $Y := ⊔_{k} supp μ^{k}$ . Consider $f : Y \to R$ a function and $g : Y \to R$ a bounded function. Suppose that

$E_{μ^{k}} [f (x)^{2}] \in Δ$

Then it follows that

$| E_{μ^{k}} [g (x) f (x)] | \in Δ$

Proof of Lemma 5

$| E_{μ^{k}} [g (x) f (x)] | \leq E_{μ^{k}} [| g (x) f (x) |]$

$| E_{μ^{k}} [g (x) f (x)] | \leq (sup | g |) E_{μ^{k}} [| f (x) |]$

$| E_{μ^{k}} [g (x) f (x)] | \leq (sup | g |) \sqrt{E_{μ^{k}} [f (x)^{2}]}$

Proposition 1

Consider $(f, μ)$ a unidistributional estimation problem, $^G$ a $Δ (l o g)$ -generator for $(f, μ)$ and $g : supp μ \to R$ a bounded function. Then

$| E_{μ} [g f] - E [g ({^G}_{1}^{k}) {^G}_{2}^{k}] | \in Δ$

Proof of Proposition 1

By Lemma 4 we have

$| E_{μ} [g f] - E [g ({^G}_{1}^{k}) f ({^G}_{1}^{k})] | \in Δ$

By property (ii) of generators and Lemma 5 we have

$| E [g ({^G}_{1}^{k}) {^G}_{2}^{k}] - E [g ({^G}_{1}^{k}) f ({^G}_{1}^{k})] | \in Δ$

Combining the two we get the desired result.

Proof of Theorem 4

We have

$^P ((x_{1}, x_{2})) - (f_{1} \times f_{2}) (x_{1}, x_{2}) = ({^P}_{1} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2}) + {^P}_{1} (x_{1}) ({^P}_{2} (x_{2}) - f_{2} (x_{2}))$

Therefore, for any $(p o l y, l o g)$ -scheme $^Q = (Q, s, b)$ of signature ${0, 1}^{*} \to Q \cap [- 1, + 1]$

$| E [^Q (^P - f_{1} \times f_{2})] | \leq | E [^Q ((x_{1}, x_{2})) ({^P}_{1} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | + | E [^Q ((x_{1}, x_{2})) {^P}_{1} (x_{1}) ({^P}_{2} (x_{2}) - f_{2} (x_{2}))] |$

By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. Suppose $^G = (G, r_{G}, a_{G})$ is a $Δ (l o g)$ -generator for $(f_{2}, μ_{2})$ . Applying Proposition 1 to the first term, we get

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) ({^P}_{1}^{j} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | \leq | E_{μ_{1} \times U^{r_{G} (j)} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, {^G}_{1}^{j})) ({^P}_{1}^{j} (x_{1}) - f_{1} (x_{1})) {^G}_{2}^{j}] | + δ_{2}^{1} (j)$

where $δ_{2}^{1} \in Δ^{1}$ .

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) ({^P}_{1}^{j} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | \leq E_{μ_{1}} [| E_{U^{r_{G} (j) + s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, {^G}_{1}^{j})) {^G}_{2}^{j} ({^P}_{1}^{j} (x_{1}) - f_{1} (x_{1}))] |] + δ_{2}^{1} (j)$

Applying Lemma 3 for ${^P}_{1}$ , we get

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) ({^P}_{1}^{j} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | \leq E_{μ_{1}} [δ_{1} (j)] + δ_{2}^{1} (j) \leq δ_{1} (j) + δ_{2}^{1} (j)$

where $δ_{1} \in Δ$ .

Suppose $(S, r_{S}, a_{S})$ is a $Δ (l o g)$ -sampler for $μ_{1}$ . Applying Lemma 4 to the second term, we get

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) {^P}_{1} (x_{1}) ({^P}_{2}^{j} (x_{2}) - f_{2} (x_{2}))] | \leq | E_{U^{r_{S} (j)} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} (({^S}^{j}, x_{2})) {^P}_{1} ({^S}^{j}) ({^P}_{2}^{j} (x_{2}) - f_{2} (x_{2}))] | + δ_{1}^{1} (j)$

where $δ_{1}^{1} \in Δ^{1}$ .

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) {^P}_{1} (x_{1}) ({^P}_{2}^{j} (x_{2}) - f_{2} (x_{2}))] | \leq E_{μ_{2}} [| E_{U^{r_{S} (j) + s (j) + r_{1} (j)}} [{^Q}^{j} (({^S}^{j}, x_{2})) {^P}_{1} ({^S}^{j}) ({^P}_{2}^{j} (x_{2}) - f_{2} (x_{2}))] |] + δ_{1}^{1} (j)$

Applying Lemma 3 for ${^P}_{2}$ , we get

$| E_{μ_{1} \times μ_{2} \times U^{s (j) + r_{1} (j)}} [{^Q}^{j} ((x_{1}, x_{2})) {^P}_{1} (x_{1}) ({^P}_{2}^{j} (x_{2}) - f_{2} (x_{2}))] | \leq E_{μ_{2}} [δ_{2} (j)] + δ_{1}^{1} (j) \leq δ_{2} (j) + δ_{1}^{1} (j)$

where $δ_{2} \in Δ$ . Again, we got the required bound.

Lemma 6

Consider a family of sets ${X^{k}}_{k \in N}$ and a family of probability measures ${μ^{k}}_{k \in N}$ on $X^{k}$ . Denote $Y := ⊔_{k} supp μ^{k}$ . Consider $f_{1}, f_{2} : Y \to R$ bounded functions and ${g_{α} : Y \to R}_{α \in I}$ a uniformly bounded family of functions indexed by some set $I$ . Suppose that

$E_{μ^{k}} [(f_{1} (x) - f_{2} (x))^{2}] \in Δ$

Then there is $δ \in Δ$ s.t.

$\forall α \in I : | E_{μ^{k}} [(g_{α} (x) - f_{1} (x))^{2} - (g_{α} (x) - f_{2} (x))^{2}] | \leq δ (k)$

Proof of Lemma 6

$(g_{α} (x) - f_{1} (x))^{2} - (g_{α} (x) - f_{2} (x))^{2} = (2 g_{α} (x) - f_{1} (x) - f_{2} (x)) (f_{2} (x) - f_{1} (x))$

$| E_{μ^{k}} [(2 g_{α} (x) - f_{1} (x) - f_{2} (x)) (f_{2} (x) - f_{1} (x))] | \leq sup (| 2 g_{α} - f_{1} - f_{2} |) \sqrt{E_{μ^{k}} [(f_{1} (x) - f_{2} (x))^{2}]}$

Proposition 2

Consider $(f, μ)$ , $(g, ν)$ unidistributional estimation problems. Suppose $^ζ = (ζ, r_{ζ}, a_{ζ})$ is a $Δ$ -pseudo-invertible reduction of $(f, μ)$ to $(g, ν)$ and $^ξ = (ξ, r_{ξ}, a_{ξ})$ is its $Δ$ -pseudo-inverse. Then there is $δ \in Δ$ s.t. for any bounded function $h : ({0, 1}^{*})^{2} \to R$

$| E_{μ \times U^{r_{ζ} (j)} \times U^{r_{ξ} (j)}} [h ({^ξ}^{j} ({^ζ}^{j} (x, z), w), {^ζ}^{j} (x, z))] - E_{μ \times U^{r_{ζ} (j)}} [h (x, {^ζ}^{j} (x, z))] | \leq (sup | h |) δ (j)$

Proof of Proposition 2

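Denote $μ_{ζ}^{j} := μ \times U^{r_{ζ} (j)}$ . According to the defining property of $^ξ$

$E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} | P r_{U^{r_{ξ} (j)}} [{^ξ}^{j} ({^ζ}^{j} (x, z), w) = x^{'}] - P r_{μ_{ζ}^{j}} [x^{''} = x^{'} ∣ {^ζ}^{j} (x^{''}, z^{'}) = {^ζ}^{j} (x, z)] |] = δ (j)$

where $δ \in Δ$ . Therefore

$E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} | (P r_{U^{r_{ξ} (j)}} [{^ξ}^{j} ({^ζ}^{j} (x, z), w) = x^{'}] - P r_{μ_{ζ}^{j}} [x^{''} = x^{'} ∣ {^ζ}^{j} (x^{''}, z^{'}) = {^ζ}^{j} (x, z)]) h (x^{'}, {^ζ}^{j} (x, z)) |] \leq (sup | h |) δ (j)$

$| E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} (P r_{U^{r_{ξ} (j)}} [{^ξ}^{j} ({^ζ}^{j} (x, z), w) = x^{'}] - P r_{μ_{ζ}^{j}} [x^{''} = x^{'} ∣ {^ζ}^{j} (x^{''}, z^{'}) = {^ζ}^{j} (x, z)]) h (x^{'}, {^ζ}^{j} (x, z))] | \leq (sup | h |) δ (j)$

$| E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} P r_{U^{r_{ξ} (j)}} [{^ξ}^{j} ({^ζ}^{j} (x, z), w) = x^{'}] h (x^{'}, {^ζ}^{j} (x, z))] - E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} P r_{μ_{ζ}^{j}} [x^{''} = x^{'} ∣ {^ζ}^{j} (x^{''}, z^{'}) = {^ζ}^{j} (x, z)] h (x^{'}, {^ζ}^{j} (x, z))] | \leq (sup | h |) δ (j)$

$| E_{μ_{ζ}^{j} \times U^{r_{ξ} (j)}} [h ({^ξ}^{j} ({^ζ}^{j} (x, z), w), {^ζ}^{j} (x, z))] - E_{μ_{ζ}^{j}} [\sum_{x^{'} \in {0, 1}^{*}} P r_{μ_{ζ}^{j}} [x^{''} = x^{'} ∣ {^ζ}^{j} (x^{''}, z^{'}) = {^ζ}^{j} (x, z)] h (x^{'}, {^ζ}^{j} (x, z))] | \leq (sup | h |) δ (j)$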

$| E_{μ_{ζ}^{j} \times U^{r_{ξ} (j)}} [h ({^ξ}^{j} ({^ζ}^{j} (x, z), w), {^ζ}^{j} (x, z))] - E_{μ_{ζ}^{j}} [E_{μ_{ζ}^{j}} [h (x^{'}, {^ζ}^{j} (x, z)) ∣ {^ζ}^{j} (x^{'}, z^{'}) = {^ζ}^{j} (x, z)]] | \leq (s u p | h |) δ (j)$

$| E_{μ_{ζ}^{j} \times U^{r_{ξ} (j)}} [h ({^ξ}^{j} ({^ζ}^{j} (x, z), w), {^ζ}^{j} (x, z))] - E_{μ_{ζ}^{j}} [E_{μ_{ζ}^{j}} [h (x^{'}, {^ζ}^{j} (x^{'}, z^{'})) ∣ {^ζ}^{j} (x^{'}, z^{'}) = {^ζ}^{j} (x, z)]] | \leq (s u p | h |) δ (j)$

$| E_{μ_{ζ}^{j} \times U^{r_{ξ} (j)}} [h ({^ξ}^{j} ({^ζ}^{j} (x, z), w), {^ζ}^{j} (x, z))] - E_{μ_{ζ}^{j}} [h (x, {^ζ}^{j} (x, z))] | \leq (s u p | h |) δ (j)$

Proof of Theorem 8

Consider ${^Q}_{f} = (Q_{f}, s_{f}, b_{f})$ a $(p o l y, l o g)$ -predictor. Let ${^Q}_{g} = (Q_{g}, s_{g}, b_{g})$ be the $(p o l y, l o g)$ -predictor defined by

${^Q}_{g}^{j} (x) := {^Q}_{f}^{j} ({^ξ}^{j} (x))$

Applying Lemma 2 we get

$E_{ν \times U^{r_{R} (j)} \times U^{r_{g} (j)}} [{^R}^{j} (y) ({^P}_{g}^{j} (y) - g (y))^{2}] \leq E_{ν \times U^{r_{R} (j)} \times U^{s_{g} (j)}} [{^R}^{j} (y) ({^Q}_{g}^{j} (y) - g (y))^{2}] + δ (j)$

where $δ \in Δ$ .

Using the defining property of $^R$ we can apply Lemma 5 to the left-hand side and get

$E_{ν \times U^{r_{R} (j)} \times U^{r_{g} (j)}} [{^R}^{j} (y) ({^P}_{g}^{j} (y) - g (y))^{2}] = E_{ν \times U^{r_{g} (j)}} [\frac{P r_{μ \times U^{r_{ζ} (j)}} [{^ζ}^{j} (x) = y]}{ν (y)} ({^P}_{g}^{j} (y) - g (y))^{2}] + γ_{R} (j)$

where $| γ_{R} | \in Δ$ . Using property (ii) of pseudo-invertible reductions, we get

$E_{ν \times U^{r_{R} (j)} \times U^{r_{g} (j)}} [{^R}^{j} (y) ({^P}_{g}^{j} (y) - g (y))^{2}] = E_{μ \times U^{r_{g} (j)} \times U^{r_{ζ} (j)}} [({^P}_{g}^{j} ({^ζ}^{j} (x)) - g ({^ζ}^{j} (x)))^{2}] + γ_{R} (j)$

Using the definition of ${^P}_{f}$ and Lemma 6 applied via property (i) of pseudo-invertible reductions, we get

$E_{ν \times U^{r_{R} (j)} \times U^{r_{g} (j)}} [{^R}^{j} (y) ({^P}_{g}^{j} (y) - g (y))^{2}] = E_{μ \times U^{r_{g} (j)} \times U^{r_{ζ} (j)}} [({^P}_{f}^{j} (x) - f (x))^{2}] + γ_{ζ} (j) + γ_{R} (j)$

where $| γ_{ζ} | \in Δ$ .

Using the defining property of $^R$ , Lemma 5 and property (ii) of pseudo-invertible reductions on the right-hand side, we get

$E_{ν \times U^{r_{R} (j)} \times U^{s_{g} (j)}} [{^R}^{j} (y) ({^Q}_{g}^{j} (y) - g (y))^{2}] = E_{μ \times U^{s_{f} (j)} \times U^{r_{ζ} (j)} \times U^{r_{ξ} (j)}} [({^Q}_{f}^{j} ({^ξ}^{j} ({^ζ}^{j} (x))) - g ({^ζ}^{j} (x)))^{2}] + γ_{R}^{'} (j)$

where $| γ_{R}^{'} | \in Δ$ . Applying Proposition 2

$E_{ν \times U^{r_{R} (j)} \times U^{s_{g} (j)}} [{^R}^{j} (y) ({^Q}_{g}^{j} (y) - g (y))^{2}] = E_{μ \times U^{s_{f} (j)} \times U^{r_{ζ} (j)}} [({^Q}_{f}^{j} (x) - g ({^ζ}^{j} (x)))^{2}] + γ_{ξ} (j) + γ_{R}^{'} (j)$

where $| γ_{ξ} | \in Δ$ . Applying Lemma 6 via property (i) of pseudo-invertible reductions

$E_{ν \times U^{r_{R} (j)} \times U^{s_{g} (j)}} [{^R}^{j} (y) ({^Q}_{g}^{j} (y) - g (y))^{2}] = E_{μ \times U^{s_{f} (j)}} [({^Q}_{f}^{j} (x) - f (x))^{2}] + γ_{ζ}^{'} (j) + γ_{ξ} (j) + γ_{R}^{'} (j)$

Putting everything together

$E_{μ \times U^{r_{g} (j)} \times U^{r_{ζ} (j)}} [({^P}_{f}^{j} (x) - f (x))^{2}] \leq E_{μ \times U^{s_{f} (j)}} [({^Q}_{f}^{j} (x) - f (x))^{2}] + δ^{'} (j)$

for $δ^{'} \in Δ$ .

Proposition 3

Consider $a > 1$ and $δ : [a, \infty) \to R^{\geq 0}$ a non-increasing function. Suppose that

$\int_{a}^{\infty} \frac{δ (x)}{x log x} d x < \infty$

Then

$\lim_{x \to \infty} (log log x) δ (x) = 0$

Proof of Proposition 3

Assume to the contrary that there is $ϵ > 0$ and an unbounded sequence ${x_{i} \in [a, \infty)}_{i \in N}$ s.t.

$(log log x_{i}) δ (x_{i}) \geq ϵ$

Define $y_{i} := 2^{\sqrt{log x_{i}}}$ . For any $x \in [y_{i}, x_{i}]$ we have

$δ (x) \geq δ (x_{i}) \geq \frac{ϵ}{log log x_{i}} = \frac{ϵ}{2 log log y_{i}} \geq \frac{ϵ}{2 log log x}$

Therefore

$\int_{y_{i}}^{x_{i}} \frac{δ (x)}{x log x} d x \geq \frac{ϵ}{2} \int_{y_{i}}^{x_{i}} \frac{d x}{x log x log log x} = \frac{ϵ}{2} (log log log x_{i} - log log log y_{i}) = \frac{ϵ}{2}$

Since we can choose an infinite number of non-overlapping intervals of the form $[y_{i}, x_{i}]$ , we reach the contradiction

$\int_{a}^{\infty} \frac{δ (x)}{x log x} d x = \infty$

Proposition 4

Consider a polynomial $q : N \to N$ . There is a function $λ_{q} : N^{2} \to [0, 1]$ s.t.

(i) $\forall j \in N : \sum_{i \in N} λ_{q} (j, i) = 1$

(ii) For any non-increasing function $ϵ : N \to [0, 1]$ we have

$ϵ (j) - \sum_{i \in N} λ_{q} (j, i) ϵ (q (j) + i) \in Δ_{l l}^{1}$

Proof of Proposition 4

Given polynomials $q_{1}, q_{2} : N \to N$ s.t. $q_{1} (j) \geq q_{2} (j)$ for $j ≫ 0$ , the proposition for $q_{1}$ implies the proposition for $q_{2}$ by setting

$λ_{q_{2}} (j, i) := \begin{cases} λ_{q_{1}} (j, i - q_{1} (j) + q_{2} (j)) & \text{if } i - q_{1} (j) + q_{2} (j) \geq 0 \\ 0 & \text{if } i - q_{1} (j) + q_{2} (j) < 0 \end{cases}$

Therefore, it is enough to prove the proposition for polynomials of the form $q (j) = j^{m}$ for $m > 0$ .

We have

$\int_{x = 3}^{3^{m}} ϵ (⌊ x ⌋) d (log log x) \leq \int_{x = 3}^{3^{m}} d (log log x) = log m$

For any $M > 0$

$\int_{x = 3}^{3^{m}} ϵ (⌊ x ⌋) d (log log x) - \int_{x = M}^{M^{m}} ϵ (⌊ x ⌋) d (log log x) \leq log m$

$\int_{x = 3}^{M} ϵ (⌊ x ⌋) d (log log x) - \int_{x = 3^{m}}^{M^{m}} ϵ (⌊ x ⌋) d (log log x) \leq log m$

$\int_{x = 3}^{M} ϵ (⌊ x ⌋) d (log log x) - \int_{x = 3}^{M} ϵ (⌊ x^{m} ⌋) d (log log x) \leq log m$

$\int_{x = 3}^{M} (ϵ (⌊ x ⌋) - ϵ (⌊ x^{m} ⌋)) d (log log x) \leq log m$

$\int_{x = 3}^{\infty} (ϵ (⌊ x ⌋) - ϵ (⌊ x^{m} ⌋)) d (log log x) \leq log m$

Since $⌊ x^{m} ⌋ \geq ⌊ x ⌋^{m}$ we can choose $λ_{q}$ satisfying condition (i) so that

$\int_{x = j}^{j + 1} ϵ (⌊ x^{m} ⌋) d (log log x) = (log log (j + 1) - log log j) \sum_{i} λ_{q} (j, i) ϵ (j^{m} + i)$

It follows that

$\int_{x = 3}^{\infty} (ϵ (⌊ x ⌋) - \sum_{i} λ_{q} (⌊ x ⌋, i) ϵ (⌊ x ⌋^{m} + i)) d (log log x) \leq log m$

$\int_{3}^{\infty} \frac{ϵ (⌊ x ⌋) - \sum_{i} λ_{q} (⌊ x ⌋, i) ϵ (⌊ x ⌋^{m} + i)}{x log x} d x \leq log m$

Applying Proposition 3, we get the desired result.

Lemma 7

Consider $(f, μ)$ a unidistributional estimation problem, $^P = (P, r, a)$ , $^Q = (Q, s, b)$ $(p o l y, l o g)$ -predictors. Suppose $p : N \to N$ a polynomial and $δ \in Δ_{l l}^{1}$ are s.t.

$\forall i, j \in N : E [({^P}^{p (j) + i} - f)^{2}] \leq E [({^Q}^{j} - f)^{2}] + δ (j)$

Then $\exists δ^{'} \in Δ_{l l}^{1}$ s.t.

$E [({^P}^{j} - f)^{2}] \leq E [({^Q}^{j} - f)^{2}] + δ^{'} (j)$

Proof of Lemma 7

Define $ϵ (j) := {sup}_{k \geq j} E [({^P}^{k} - f)^{2}]$ .

By Proposition 4 we have

$\tilde{δ} (j) := ϵ (j) - \sum_{i} λ_{p} (j, i) ϵ (p (j) + i) \in Δ_{l l}^{1}$

$ϵ (j) = \sum_{i} λ_{p} (j, i) ϵ (p (j) + i) + \tilde{δ} (j)$

$ϵ (j) \leq \sum_{i} λ_{p} (j, i) (E [({^Q}^{j} - f)^{2}] + δ (j)) + \tilde{δ} (j)$

$ϵ (j) \leq E [({^Q}^{j} - f)^{2}] + δ (j) + \tilde{δ} (j)$

$E [({^P}^{j} - f)^{2}] \leq E [({^Q}^{j} - f)^{2}] + δ (j) + \tilde{δ} (j)$

Proposition 5

Consider $δ \in Δ_{l l}^{1}$ . Define $δ^{'} (j) := {sup}_{k \geq j} δ (k)$ . Then, $δ^{'} \in Δ_{l l}^{1}$ .

Proof of Proposition 5

Suppose $α$ is s.t. ${lim}_{j \to \infty} (log log j)^{α} δ (j) = 0$ . We claim that ${lim}_{j \to \infty} (log log j)^{α} δ^{'} (j) = 0$ . Assume to the contrary that for some $ϵ > 0$ we have an unbounded sequence ${n_{i}}_{i \in N}$ s.t. $(log log n_{i})^{α} δ^{'} (n_{i}) \geq ϵ$ . We can then choose ${m_{i}}_{i \in N}$ s.t. $m_{i} \geq n_{i}$ and $(log log n_{i})^{α} δ (m_{i}) \geq \frac{ϵ}{2}$ . But this implies $(log log m_{i})^{α} δ (m_{i}) \geq \frac{ϵ}{2}$ which is a contradiction.

Proof of Theorem 10

Consider $^P = (P, r, a)$ a $(p o l y, l o g)$ -predictor. Choose $p : N \to N$ a non-constant polynomial s.t. evaluating $Λ [G]^{p (j)}$ involves running ${^P}^{j}$ until it halts "naturally" (such $p$ exists because $^P$ runs in at most polynomial time and has at most logarithmic advice). Given $i, j \in N$ , consider the execution of $Λ [G]^{p (j) + i}$ . The standard deviation of $ϵ ({^P}^{j})$ with respect to the internal coin tosses of $Λ$ is at most $(p (j) + i)^{- 1}$ . Applying Lemma 6 followed by Lemma 4, the expectation value is $E [({^P}^{j} - f)^{2}] + γ_{P}$ where $| γ_{P} | \leq δ (p (j) + i)$ for $δ \in Δ_{l l}^{1}$ . By Chebyshev's inequality,

$P r [ϵ ({^P}^{j}) \geq E [({^P}^{j} - f)^{2}] + δ (p (j) + i) + (p (j) + i)^{- \frac{1}{2}}] \leq (p (j) + i)^{- 1}$

Hence

$P r [ϵ (Q^{*}) \geq E [({^P}^{j} - f)^{2}] + δ (p (j) + i) + (p (j) + i)^{- \frac{1}{2}}] \leq (p (j) + i)^{- 1}$

The standard deviation of $ϵ (Q)$ for any $Q$ is also at most $(p (j) + i)^{- 1}$ . The expectation value is $E [(e v^{p (j) + i} (Q) - f)^{2}] + γ_{Q}$ where $| γ_{Q} | \leq δ (p (j) + i)$ . Therefore

$P r [\exists Q < p (j) + i : ϵ (Q) \leq E [(e v^{p (j) + i} (Q) - f)^{2}] - δ (p (j) + i) - (p (j) + i)^{- \frac{1}{4}}] \leq (p (j) + i) (p (j) + i)^{- \frac{3}{2}} = (p (j) + i)^{- \frac{1}{2}}$

The extra $p (j) + i$ factor comes from summing probabilities over $p (j) + i$ programs. Combining we get

$P r [E [(e v^{p (j) + i} (Q^{*}) - f)^{2}] \geq E [({^P}^{j} - f)^{2}] + 2 δ (p (j) + i) + (p (j) + i)^{- \frac{1}{2}} + (p (j) + i)^{- \frac{1}{4}}] \leq (p (j) + i)^{- 1} + (p (j) + i)^{- \frac{1}{2}}$

$E [(Λ [G]^{p (j) + i} - f)^{2}] \leq E [({^P}^{j} - f)^{2}] + 2 δ (p (j) + i) + (p (j) + i)^{- 1} + 2 (p (j) + i)^{- \frac{1}{2}} + (p (j) + i)^{- \frac{1}{4}}$

By Proposition 5, we can assume $δ$ is non-increasing without loss of generality. Therefore

$E [(Λ [G]^{p (j) + i} - f)^{2}] \leq E [({^P}^{j} - f)^{2}] + 2 δ (p (j)) + p (j)^{- 1} + 2 p (j)^{- \frac{1}{2}} + p (j)^{- \frac{1}{4}}$

Applying Lemma 7 we get the desired result.
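The exponents in the two tail bounds used in this proof follow from Chebyshev's inequality. The short computation below spells this out under the shorthand $n := p (j) + i$ , which is introduced here only for readability and is not notation from the post; $X$ stands for the empirical error $ϵ (Q)$ of a fixed program.

```latex
% Chebyshev: Pr[|X - E X| >= t] <= sigma^2 / t^2, with sigma <= n^{-1}.
\begin{align*}
t = n^{-1/2}: &\quad \Pr\big[\,|X - \mathrm{E}X| \ge n^{-1/2}\big]
   \le \frac{n^{-2}}{n^{-1}} = n^{-1} \\
t = n^{-1/4}: &\quad \Pr\big[\,|X - \mathrm{E}X| \ge n^{-1/4}\big]
   \le \frac{n^{-2}}{n^{-1/2}} = n^{-3/2} \\
\text{union over } n \text{ programs}: &\quad n \cdot n^{-3/2} = n^{-1/2}
\end{align*}
```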
