Quasi-optimal predictors
post by Vanessa Kosoy (vanessa-kosoy) · 2015-12-25T14:17:05.000Z · LW · GW · 2 commentsContents
Definition 1 Theorem 1 Theorem 2 Theorem 3 Theorem 4 Theorem 5 Theorem 6 Definition 2 Theorem 7 Definition 3 Theorem 8 Theorem 9 Appendix Lemma 1 Proof of Lemma 1 Lemma 2 Proof of Lemma 2 Proof of Theorem 1 Lemma 3 Proof of Lemma 3 Proof of Theorem 2 Proof of Theorem 4 Proof of Theorem 7 Lemma 4 Proof of Lemma 4 Proof of Theorem 6 Proof of Theorem 8 Proof of Theorem 9 None 2 comments
In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.
All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.
Definition 1
Given a distributional decision problem, a quasi-optimal predictor for is a family of polynomial size Boolean circuits s.t. for any family of polynomial size Boolean circuits we have
where .
Theorem 1
Consider a distributional decision problem and a quasi-optimal predictor for . Suppose , are s.t.
Then:
Theorem 2
Consider a word ensemble and , disjoint languages. Suppose is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then, is a quasi-optimal predictor for .
Theorem 3
Consider a word ensemble and , disjoint languages. Suppose is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then, is a quasi-optimal predictor for .
Theorem 4
Consider , distributional decision problems with respective quasi-optimal predictors and . Define as the family of circuits computing . Then, is a quasi-optimal predictor for .
Theorem 5
Consider and a word ensemble. Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then is a quasi-optimal predictor for
Theorem 6
Consider and a word ensemble. Assume . Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Define as the circuit family computing
Then, is a quasi-optimal predictor for .
Definition 2
Consider a word ensemble and two circuit families. We say is quasisimilar to relative to (denoted ) when .
Theorem 7
Consider a distributional decision problem, a quasi-optimal predictor for and a polynomial size family. Then, is a quasi-optimal predictor for if and only if .
Definition 3
Consider , distributional decision problems, a polynomial size family of circuits. is called a (non-uniform) strong pseudo-invertible reduction of to when there is a polynomial s.t. the following conditions hold:
(i)
(ii) There is s.t.
(iii) There is a polynomial and a family of polynomial size circuits s.t.
(iv) There are polynomial size circuits s.t.
Theorem 8
Consider , distributional decision problems, a strong pseudo-invertible reduction of to and a quasi-optimal predictor for . Define as the family of circuits computing . Then, is a quasi-optimal predictor for .
Theorem 9
Consider a one-to-one non-uniformly hard one-way function. Define . Then, is a quasi-optimal predictor for .
Appendix
Lemma 1
Consider a distributional decision problem and a family of polynomial size. Then, is a quasi-optimal predictor if and only if there is a function s.t.
(i) is non-decreasing in the second argument.
(ii) For any polynomial :
In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.
(iii) for any we have
Proof of Lemma 1
Define
Lemma 2
Consider a distributional decision problem and a corresponding quasi-optimal predictor. Then, there is a function s.t.
(i) is non-decreasing in the second and third arguments.
(ii) For all polynomials :
(iii) for all , and we have
Proof of Lemma 2
Given , denote
Consider circuit computing the following function:
There is a polynomial s.t. . By Lemma 1,
for quasinegligible.
Integrating the inequality with respect to from to , we get
Proof of Theorem 1
Define
Assume to the contrary that there is and an infinite set s.t.
Define as the circuits computing
is bounded by a polynomial since produces binary fractions of polynomial size therefore it is possible to compare them to the fixed numbers using a polynomial size circuit even if the latter have infinite binary expansions.
We have
Define to be truncated to the first significant binary digit. Define as the circuits computing
By the assumption, has binary notation of bounded size, therefore is bounded by a polynomial.
Applying Lemma 2 we get
for vanishing at infinity.
Obviously , therefore
The expression on the left hand side is a quadratic polynomial in which attains its maximum at and has roots at and . is between and , but not closer to than . Therefore, the inequality is preserved if we replace by .
Substituting the equation for we get
Thus vanishes at infinity on , which is a contradiction.
Lemma 3
Consider a distributional decision problem. If is a quasi-optimal predictor for then there are and a quasinegligible function s.t. for any we have
Conversely, suppose and is a polynomial size family for which there is a quasinegligible function s.t. for any we have
Define to be s.t. computing is equivalent to computing rounded to digits after the binary point. Then, is a quasi-optimal predictor.
Proof of Lemma 3
Assume is an optimal predictor. Consider and where and . The function can be approximated by a circuit of size for some fixed polynomial , within rounding error s.t. . By Lemma 1,
where is quasinegligible. is bounded by a negligible function and therefore can be ignored by redefining . As in the proof of Theorem 1, can be dropped.
The expression on the left hand side is a quadratic polynomial in . Explicitly:
Moving to the right hand side and dividing both sides by we get
Take where is the rounding error. We get
Conversely, assume that for any
Consider . We have
can be computed by a circuit of size polynomial in and . Applying the assumption we get
where is quasinegligible. Noting that and we get
Observing that is bounded by a negligible function, we get the desired result.
Proof of Theorem 2
Consider . We have
Using Lemma 3:
Therefore
Using Lemma 3 again we get the desired result.
Proof of Theorem 4
We have
Therefore, for any
By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have
For any given , can be computed by a circuit with input of size polynomial in and . Applying Lemma 3 to , we get
where is a polynomial and is quasinegligible. Since is bounded by a polynomial in for , we get the bound we need.
For the second term, we have
For any given , can be computed by a circuit with input of size polynomial in , and . Applying Lemma 3 to , we get
Again, we got the required bound.
Proof of Theorem 7
Assume is a quasi-optimal predictor. Applying Lemma 3 to predictor and circuits computing , we get
for some vanishing at infinity. Applying Lemma 3 to predictor and circuits computing , we get
for some vanishing at infinity. We have
Conversely, assume . Consider some . We have
for some vanishing at infinity, since .
for some quasinegligible , using Lemma 3. Combining both inequalities we get
Using Lemma 3 again we conclude is a quasi-optimal predictor.
Lemma 4
Consider and a word ensemble. Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Define
Then, .
Proof of Lemma 4
By Theorem 3 and Lemma 3 there is a quasinegligible function such that for any we have
Take to be the circuit computing . Its size is polynomial in therefore
where vanishes at infinity.
Since both terms inside the absolute value are non-negative we get the desired result.
Proof of Theorem 6
When we have
Define to be the circuit computing . Since , Lemma 4 implies that . This implies and by Theorem 7 is a quasi-optimal predictor for .
We have (whether or ) and therefore
Consider .
By Lemma 3 it is sufficient to prove appropriate bounds on and . Both bounds follow from Lemma 3 using the facts and are quasi-optimal predictors and is bounded by a polynomial.
Proof of Theorem 8
Consider , . Define to be the circuit computing . Applying Lemma 2, treating as a constant and using as the weight circuit, we get
where is quasinegligible. We used condition (ii) to get a constant bound on and condition (iv) to get a polynomial bound on .
We take the expectation value of both sides with respect to the uniform measure over :
The left hand side can be rewritten as follows
Grouping the sum by , we get
The first term on the right hand side can be rewritten as
Grouping the sum by we get:
Condition (iii) tells us that is only non-vanishing when and that in this case it equals . Therefore
Putting everything together, we get
Proof of Theorem 9
Assume to the contrary that is not quasi-optimal. Then there is an infinite set , a polynomial size family of circuits and s.t.
Define the functions by . We have
Substituting into the inequality above
For every given , is either non-negative for all or non-positive for . Hence we can move the absolute value inside the integral:
This implies that we can choose s.t.
Using the fact that the graph of the square root lies below its tangent at any point, this leads to
Define as the circuits computing . The definitions of and imply that is bounded by a polynomial. The inequality above and the definitions of and imply
But this contradicts the assumption on .
Note that this argument doesn't show is optimal since while the averaging over preserves the property of vanishing at infinity, it doesn't preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for exists.
2 comments
Comments sorted by top scores.
comment by orthonormal · 2015-07-05T23:52:44.000Z · LW(p) · GW(p)
So, to be clear, the difference between an optimal predictor and a quasi-optimal predictor is as follows: is the amount by which some other polynomial-size circuit family is able to beat the current predictor. An optimal predictor cannot be beaten by any more than a such that for any , while a quasi-optimal predictor can only assert it cannot be beaten by any more than a such that . Yes?
Replies from: vanessa-kosoy↑ comment by Vanessa Kosoy (vanessa-kosoy) · 2015-07-06T05:04:05.000Z · LW(p) · GW(p)
Exactly. It's such a simple tweak that it's embarrassing I haven't noticed it in the first place. It's just that in average complexity theory negligible errors seem much more popular. The price to pay is that some theorems require stronger assumptions namely something that had to be bounded by a polynomial now has to be bounded by a constant. On the other hand Theorems 9 demonstrates that there is no chance optimal predictors cover all of (for example) whereas quasi-optimal predictors might work. Also, I think that in there is a universal construction for uniform quasi-optimal predictors (as opposed to optimal predictors) similar to Levin's universal search although I haven't fleshed out the details yet (anyhow such a construction would be theoretically valid but highly inefficient in practice).