Predictor schemes with logarithmic advice

post by Vanessa Kosoy (vanessa-kosoy) · 2016-03-27T08:41:23.000Z · LW · GW · 2 comments

Contents

    Results
  New notation
  Definition 1
  Definition 2
  Definition 3
  Note 1
  Theorem 1
  Note 2
  Definition 4
  Definition 5
  Note 3
  Theorem 2
    Appendix A
  Theorem A.1
  Theorem A.2
  Theorem A.3
  Theorem A.4
  Theorem A.5
  Theorem A.6
  Definition A.1
  Theorem A.7
  Note A.1
    Appendix B
  Definition B.1
  Lemma B.1
  Proof of Lemma B.1
  Note B.1
  Lemma B.2
  Proof of Lemma B.2
  Proof of Theorem A.1
  Lemma B.3
  Proof of Lemma B.3
  Proof of Theorem A.4
    Appendix C
  Proposition C.1
  Proof of Proposition C.1
  Lemma C.1
  Proof of Lemma C.1
  Proof of Theorem 1
  Proof of Theorem 2

We introduce a variant of optimal predictor schemes where optimality holds within the space of random algorithms with logarithmic advice. These objects are also guaranteed to exist for the error space . We introduce the class of generatable problems and construct a uniform universal predictor scheme for this class which is optimal in the new sense with respect to the error space. This is achieved by a construction similar to Levin's universal search.

Results

New notation

Given , is the following algorithm. When is computed, is interpreted as a program and is executed for time . The resulting output is produced.

The notation means .

is the mapping from a binary expansion to the corresponding real number.

Given a word ensemble, a set, , stands for the maximal runtime of for , .
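As an informal illustration of this notation, here is a minimal Python sketch of time-bounded evaluation and of the binary-expansion map. The representation of programs as step-yielding generators is an assumption made for the sake of the example, standing in for a fixed universal machine; the helper names are mine.

```python
def real_of_binary(word: str) -> float:
    """Map a binary expansion w_0 w_1 ... to the real number
    sum_i w_i * 2^-(i+1) in [0, 1]."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(word))

def ev(Q, x: str, t: int) -> str:
    """Run the "program" Q on input x for at most t steps.  Programs are
    modelled here as generator functions that yield once per computation
    step and return their output; on timeout a default (empty) output
    is produced."""
    computation = Q(x)
    try:
        for _ in range(t):
            next(computation)        # advance one computation step
    except StopIteration as halt:
        return halt.value or ""      # halted within the time budget
    return ""                        # time budget exhausted

# Example: a "program" that doubles its input in |x| steps.
def double(x):
    for _ in x:
        yield
    return x + x

assert ev(double, "01", 10) == "0101"   # halts within budget
assert ev(double, "01", 1) == ""        # budget too small
assert real_of_binary("11") == 0.75
```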


Previous posts focused on prediction of distributional decision problems, which is the "computational uncertainty" analogue of probability. Here, we use the broader concept of predicting distributional estimation problems (functions), which is analogous to expectation values.

Definition 1

A distributional estimation problem is a pair where is an arbitrary function (even irrational values are allowed) and is a word ensemble.

Definition 2

Given an appropriate set , consider , polynomial and . The triple is called an -valued -bischeme when

(i) The runtime of is bounded by with polynomial.

(ii) for some .

A -valued -bischeme will also be called a -predictor scheme.


We think of as a random algorithm where the second word parameter represents its internal coin tosses. The third word parameter represents the advice and we usually substitute there.

We will use the notations , .
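The calling convention can be pictured as follows. This is a sketch under assumptions, since the exact indexing is elided above: taking `k` to index the word ensemble and `j` a precision parameter (matching the two arguments of a rank-2 error space) is my reading, and the names are illustrative.

```python
from typing import Protocol

class PredictorScheme(Protocol):
    """Assumed shape of a (poly, log)-predictor scheme: x is the input
    word, coins is a polynomially long word of internal coin tosses,
    advice is a logarithmically long advice word, and the output is a
    word encoding a rational number (the prediction)."""
    def __call__(self, k: int, j: int, x: str, coins: str, advice: str) -> str: ...
```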

Definition 3

Fix an error space of rank 2 and a distributional estimation problem. Consider a -predictor scheme. is called a -optimal predictor scheme for when for any -predictor scheme , there is s.t.

Note 1

The notation is meant to remind us that we allow a polynomial quantity of random bits and a logarithmic quantity of advice . In fact, the definitions and some of the theorems can be generalized to other quantities of randomness and advice (see also Note B.1). Thus, predictor schemes from previous posts are -predictor schemes, -predictor schemes are limited to O(1) advice, -predictor schemes use a logarithmic number of random bits and no advice, and so on. As usual in complexity theory, it is redundant to consider more advice than random bits, since advice is strictly more powerful.


-optimal predictor schemes satisfy properties analogous to those of -optimal predictor schemes. These properties are listed in Appendix A. The proofs of Theorems A.1 and A.4 are given in Appendix B. The other proofs are straightforward adaptations of the corresponding proofs for polynomial advice.

We also have the following existence result:

Theorem 1

Consider a distributional estimation problem. Define by

Define by

Then, is a -optimal predictor scheme for .

Note 2

Consider a distributional decision problem . Assume admits , , and a function s.t.

(i) runs in quasi-polynomial time ().

(ii)

(iii)

Then it is easy to see that we can construct a -predictor scheme taking values in s.t. . The implication doesn't work for larger amounts of time or advice. Therefore, the uncertainty represented by -optimal predictor schemes is associated with the resource gap between quasi-polynomial time plus advice and the resources needed to (heuristically) solve the problem in question.


The proof of Theorem 1 is given in Appendix C: it is a straightforward adaptation of the corresponding proof for polynomial advice. Evidently, the above scheme is non-uniform. We will now describe a class of problems which admits uniform -optimal predictor schemes.

Definition 4

Consider an error space of rank 1. A word ensemble is called -sampleable when there is that runs in polynomial time in the first argument, of logarithmic size, and a polynomial such that

is called a -sampler for .

Definition 5

Consider an error space of rank 1. A distributional estimation problem is called -generatable when there are and that run in polynomial time in the first argument, of logarithmic size, and a polynomial such that

(i) is a -sampler for .

(ii)

is called a -generator for .

When is the empty string, is called a -generator for . Such is called -generatable.
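Schematically, and under the same illustrative conventions as the sketch after Definition 2, a sampler produces a word approximately distributed according to the ensemble, while a generator additionally labels its sample with an approximate value of the function. The exact signatures are my assumption:

```python
from typing import Protocol, Tuple

class Sampler(Protocol):
    """Assumed shape of a sampler: from the ensemble index k, coin
    tosses, and logarithmic advice, produce a sample word."""
    def __call__(self, k: int, coins: str, advice: str) -> str: ...

class Generator(Protocol):
    """Assumed shape of a generator: a sampler that also returns an
    approximate value of f at the sampled word."""
    def __call__(self, k: int, coins: str, advice: str) -> Tuple[str, float]: ...
```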

Note 3

The class of -generatable problems can be regarded as an average-case analogue of . If is a decision problem (i.e. its range is {0,1}), words s.t. , can be regarded as "proofs" of and words s.t. , can be regarded as "proofs" of .

Theorem 2

There is an oracle machine that accepts an oracle of signature and a polynomial where the allowed oracle calls are for and computes a function of signature s.t. for any distributional estimation problem and corresponding -generator, is a -optimal predictor scheme for .


In particular if is -generatable, we get a uniform -optimal predictor scheme.

The following is the description of . Consider and a polynomial . We describe the computation of where the extra argument of is regarded as internal coin tosses.

We loop over the first words in lexicographic order. Each word is interpreted as a program . We loop over "test runs". At test run , we generate by evaluating for sampled from . We then sample from and compute . At the end of the test runs, we compute the average error . At the end of the loop over programs, the program with the lowest error is selected and the output is produced.
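The loop can be sketched in Python as follows. This is a schematic rendering, not the literal construction: the generator interface follows the sketch after Definition 5, squared difference stands in for the error computed in the construction, and the coin lengths are placeholders for the polynomial bounds above.

```python
import itertools
import random

def first_words(n):
    """The first n binary words in lexicographic (shortlex) order."""
    words = itertools.chain.from_iterable(
        (''.join(bits) for bits in itertools.product('01', repeat=length))
        for length in itertools.count(0))
    return itertools.islice(words, n)

def universal_predictor(G, run, k, x, n_programs, n_runs, rng=random):
    """Levin-search-like selection: score candidate programs on labelled
    samples drawn via the generator G, then run the best candidate on x.

    G(k, coins) -> (sample, label): a word approximately distributed
    according to mu^k together with an approximate value of f on it.
    run(Q, input) -> float: time-bounded evaluation of the word Q as a
    program (e.g. the `ev` sketch above composed with real_of_binary)."""
    best_program, best_error = None, float("inf")
    for Q in first_words(n_programs):
        total = 0.0
        for _ in range(n_runs):                      # test runs
            coins = ''.join(rng.choice('01') for _ in range(64))
            sample, label = G(k, coins)              # generated labelled sample
            guess = run(Q, sample)                   # candidate's prediction
            total += (guess - label) ** 2            # per-run error
        if total / n_runs < best_error:              # keep lowest average error
            best_program, best_error = Q, total / n_runs
    return run(best_program, x)
```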

The proof that this construction is -optimal is given in Appendix C.

Appendix A

Fix an error space of rank 2.

Theorem A.1

Suppose there is a polynomial s.t. . Consider a distributional estimation problem and a -optimal predictor scheme for . Suppose , are s.t.

Define

Assume that either have a number of digits logarithmically bounded in or produces outputs with a number of digits logarithmically bounded in (by Theorem A.7, if any -optimal predictor scheme exists for , then a -optimal predictor scheme with this property exists as well). Then, .

Theorem A.2

Consider a word ensemble and s.t. . Suppose is a -optimal predictor scheme for and is a -optimal predictor scheme for . Define by for . Then, is a -optimal predictor scheme for .

Theorem A.3

Consider a word ensemble and s.t. . Suppose is a -optimal predictor scheme for and is a -optimal predictor scheme for . Define by for . Then, is a -optimal predictor scheme for .

Theorem A.4

Fix an error space of rank 1 s.t. given , the function lies in . Consider , distributional estimation problems with respective -optimal predictor schemes and . Assume is -sampleable and is -generatable. Define by and for not of this form. Define by for . Then, is a -optimal predictor scheme for .

Theorem A.5

Consider , and a word ensemble. Assume is a -optimal predictor scheme for and is a -optimal predictor scheme for . Define by for . Then is a -optimal predictor scheme for .

Theorem A.6

Fix a polynomial s.t. . Consider , and a word ensemble. Assume . Assume is a -optimal predictor scheme for and is a -optimal predictor scheme for . Define by

Then, is a -optimal predictor scheme for .

Definition A.1

Consider a word ensemble and , -predictor schemes. We say is -similar to relative to (denoted ) when .

Theorem A.7

Consider a distributional estimation problem, a -optimal predictor scheme for and a -predictor scheme. Then, is a -optimal predictor scheme for if and only if .

Note A.1

-similarity is not an equivalence relation on the set of arbitrary -predictor schemes. However, it is an equivalence relation on the set of -predictor schemes satisfying (i.e. the -expectation value of the intrinsic variance of is in ). In particular, for any , any -optimal predictor scheme for has this property.

Appendix B

Definition B.1

Given , a function is called -moderate when

(i) is non-decreasing in arguments to .

(ii) For any collection of polynomials ,

Lemma B.1

Fix a distributional estimation problem and a -predictor scheme. Then, is -optimal iff there is a -moderate function s.t. for any ,

Proof of Lemma B.1

Define

Note B.1

Lemma B.1 shows that the error bound for a -optimal predictor scheme is in some sense uniform with respect to . This doesn't generalize to e.g. -optimal predictor schemes. The latter still admit a weaker version of Theorem A.1 and direct analogues of Theorems A.2, A.3, A.5, A.6 and A.7. Theorem A.4 doesn't seem to generalize.

Lemma B.2

Suppose there is a polynomial s.t. . Fix a distributional estimation problem and a corresponding -optimal predictor scheme. Consider a -predictor scheme, , with runtime bounded by a polynomial in the first two arguments, and of logarithmic size. Then there is s.t.

Proof of Lemma B.2

Given , define to be rounded within error . Thus, the number of digits in is logarithmic in and . Denote . Consider the -predictor scheme defined by

satisfies bounds on runtime and advice size uniform in . Therefore, Lemma B.1 implies that there is s.t.


In the following proofs we will use shorthand notations that omit most of the symbols that are clear from the context. That is, we will use to mean , to mean , to mean etc.

Proof of Theorem A.1

Define and by

We have

Define to be truncated to the first significant binary digit. Denote the set of for which . Consider a -predictor scheme satisfying

Such exists since for , has binary notation of logarithmically bounded size.

Applying Lemma B.2 we get

for .

Obviously , therefore

The expression on the left hand side is a quadratic polynomial in which attains its maximum at and has roots at and . is between and , but not closer to than . Therefore, the inequality is preserved if we replace by .

Substituting the equation for we get

Thus for all we have

In particular, .

Lemma B.3

Consider a distributional estimation problem and a -optimal predictor scheme for . Then there are and a -moderate function s.t. for any ,

Conversely, consider and a -valued -bischeme. Suppose that for any -valued -bischeme we have .

Define to be s.t. computing is equivalent to computing rounded to digits after the binary point, where . Then, is a -optimal predictor scheme for .
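Rounding to a given number of binary digits, as used here, can be pictured as follows (a trivial sketch; the helper name is mine):

```python
def round_to_binary_digits(p: float, m: int) -> float:
    """Round p to m binary digits after the binary point,
    i.e. to the nearest multiple of 2^-m."""
    return round(p * 2 ** m) / 2 ** m

assert round_to_binary_digits(0.3, 2) == 0.25
```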

Proof of Lemma B.3

Assume is a -optimal predictor scheme. Consider , . Define where and . Define to compute rounded within error . By Lemma B.1

where is -moderate. It follows that

where is -moderate ( doesn't enter the error bound because of the rounding). As in the proof of Theorem A.1, can be dropped.

The expression on the left hand side is a quadratic polynomial in . Explicitly:

Moving to the right hand side and dividing both sides by we get

Take where is the rounding error. We get

Conversely, assume that for any -valued -bischeme

Consider a -predictor scheme. We have

Taking to be we get

where . Noting that and we get

Observing that is bounded by a function in , we get the desired result.


Theorems A.2 and A.3 follow trivially from Lemma B.3, so we omit the proofs.

Proof of Theorem A.4

We have

Therefore, for any -valued -bischeme

By Lemma B.3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. Suppose is a -generator for . For the first term, we have

where . Applying Lemma B.3 for , we get

where .

Suppose is a -sampler for . For the second term, we have

where . Applying Lemma B.3 for , we get

where . Again, we get the required bound.

Appendix C

Proposition C.1

Consider a polynomial . There is a function s.t.

(i)

(ii) For any function we have

Proof of Proposition C.1

Given functions s.t. for , the proposition for implies the proposition for by setting

Therefore, it is enough to prove the proposition for functions of the form for .

Consider s.t.

Observe that

Since takes values in

Similarly

The last two equations imply that

Raising to a power is equivalent to adding a constant to , therefore

Since we can choose satisfying condition (i) so that

It follows that

Lemma C.1

Consider a distributional estimation problem, , -predictor schemes. Suppose a polynomial and are s.t.

Then there is s.t.

Proof of Lemma C.1

By Proposition C.1 we have

Proof of Theorem 1

Define by

It is easily seen that

Therefore, there is a polynomial s.t. for any -predictor scheme

Applying Lemma C.1, we get the desired result.

Proof of Theorem 2

Consider a -predictor scheme. Choose a polynomial s.t. evaluating involves running until it halts "naturally" (such a polynomial exists because runs in at most polynomial time and uses at most logarithmic advice). Given , consider the execution of . The standard deviation of with respect to the internal coin tosses of is at most . The expectation value is where for which doesn't depend on . By Chebyshev's inequality,
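For reference, the general form of Chebyshev's inequality being applied: for a random variable $X$ with standard deviation $\sigma$ and any $a > 0$,

$$\Pr\big[\,|X - \mathrm{E}[X]| \geq a\,\big] \leq \frac{\sigma^2}{a^2}.$$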

Hence

The standard deviation of for any is also at most . The expectation value is where . Therefore

The extra factor comes from summing probabilities over programs. Combining we get

Applying Lemma C.1 we get the desired result.

2 comments


comment by Vanessa Kosoy (vanessa-kosoy) · 2015-09-28T19:39:18.000Z · LW(p) · GW(p)

EDIT: Deleted examples of generatable problems since they are wrong. They are in fact examples of a weaker notion which I think also admits uniform OPS but this should be explored elsewhere.

comment by Vanessa Kosoy (vanessa-kosoy) · 2015-08-04T08:56:58.000Z · LW(p) · GW(p)

EDIT: Corrected Example 5.2 and added Note 2 (previous Note 2 renamed to Note 3)