Implementing CDT with optimal predictor systems

post by Vanessa Kosoy (vanessa-kosoy) · 2015-12-20T12:58:44.000Z · LW · GW · 2 comments

Contents

    Preliminaries
  Construction 1
  Proposition 1
  Notation
    Results
  Theorem 1
  Definition 1
  Corollary 1
  Definition 2
  Construction 2
  Corollary 2
  Theorem 2
  Corollary 3
  Construction 3
  Corollary 4
  Definition 3
  Theorem 3
  Theorem 4
    Appendix A
  Proposition A.1
  Proof of Proposition A.1
  Proof of Proposition 1
  Proposition A.2
  Proof of Proposition A.2
  Lemma A.1
    Appendix B
  Lemma B.1
  Lemma B.2
  Proof of Lemma B.2
  Proof of Theorem 1
  Proposition B.1
  Proof of Proposition B.1
  Proof of Corollary 1
  Proof of Corollary 2
  Proposition B.2
  Proof of Proposition B.2
  Proposition B.3
  Proof of Proposition B.3
  Proposition B.4
  Proof of Proposition B.4
  Proof of Theorem 2
  Proof of Corollary 3
  Proof of Corollary 4
  Proposition B.5
  Proof of Proposition B.5
  Proof of Theorem 3
  Proposition B.6
  Proof of Proposition B.6
  Proof of Theorem 4

We consider transparent games between bounded CDT agents ("transparent" meaning that each player has a model of the other players). The agents compute the expected utility of each possible action by executing an optimal predictor of a causal counterfactual, i.e. an optimal predictor for a function that evaluates the other players and computes the utility for the selected action. Since the agents simultaneously attempt to predict each other, the optimal predictors form an optimal predictor system for the reflective system composed of the causal counterfactuals of all agents. We show that for strict maximizers, the resulting outcome is a bounded analogue of an approximate Nash equilibrium, i.e. a strategy which is an optimal response within certain resource constraints up to an asymptotically small error. For "thermalizers" (agents that choose each action with probability proportional to the exponential of its expected utility), we get a similar result with expected utility replaced by "free utility" (expected utility plus an entropy term). Thus, such optimal predictor systems behave like bounded counterparts of reflective oracles.
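For concreteness, the two decision rules can be written informally as follows; the inverse-temperature parameter $\beta$ is an illustrative convention here rather than notation taken from the constructions below:

$$\Pr[a] \;\propto\; e^{\beta\,\mathrm{EU}(a)}, \qquad F(\mu) \;=\; \operatorname{E}_{a \sim \mu}[\mathrm{EU}(a)] \;+\; \beta^{-1} H(\mu),$$

where $H(\mu)$ is the Shannon entropy of the mixed action $\mu$ and $F$ is the corresponding "free utility".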

Preliminaries

The proofs for this section are given in Appendix A.

We redefine and to be somewhat smaller proto-error spaces which nevertheless yield the same existence theorems as before. This is thanks to Lemma A.1.

Construction 1

Given , denote the set of bounded functions s.t.

Denote

Proposition 1

If is s.t. , is a proto-error space.

For any , is an ample proto-error space. When is non-decreasing, is stable.

is a stable ample proto-error space.

Notation

We denote . We allow the same abuse of notation for this symbol as for the usual big-$O$ notation.

For any -bischeme we use to denote .

For reflective systems we write indices in as superscripts rather than subscripts as before.

Given , we denote and .

Results

The proofs for this section are given in Appendix B.

We start by showing that choosing an action by maximizing over optimally predicted utility is an optimal strategy in some sense.
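Informally, and ignoring the precision and advice bookkeeping, the decision rule analyzed in Theorem 1 below can be sketched as follows. All names are illustrative placeholders rather than the formal bischeme definitions: we assume the per-action utility estimates have already been sampled from the corresponding optimal predictors.

```python
import random

def maximizer_action(utility_estimates, rng=None):
    """Strict-maximizer sketch: given sampled utility estimates for each action
    (e.g. outputs of the optimal predictors on the current input), return an
    action whose estimate is maximal, breaking ties arbitrarily."""
    rng = rng or random.Random(0)
    best = max(utility_estimates.values())
    return rng.choice([a for a, v in utility_estimates.items() if v == best])

# Hypothetical usage: estimates sampled from per-action predictors.
print(maximizer_action({"cooperate": 0.61, "defect": 0.58}))  # -> "cooperate"
```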

Theorem 1

Fix a proto-error space . Consider a finite set, a word ensemble and . Suppose are -optimal predictors for . Assume that and don't depend on (this doesn't limit generality since we can replace by and by a polynomial ). Consider an -valued -bischeme. Define another -valued -bischeme where is computed by sampling from and choosing arbitrarily from s.t. is maximal. Then, there is s.t.

Definition 1

Given suitable sets and , a -scheme of signature is s.t.

(i) The runtime of is bounded by for some polynomial .

(ii) for some polynomial .

(iii) is a probability measure on and there is s.t. .

A -scheme of signature is also called a -valued -scheme.

As usual, we sometimes use as implicit notation in which receives a value sampled from for the second argument and/or a value sampled from for the third argument.

Corollary 1

Fix . Consider a finite set, a word ensemble and . Suppose are -optimal predictors for . Consider and an -valued -scheme. Define as in Theorem 1. Then


We now introduce the game theoretical setting.

Definition 2

A distributional game is a quadruple where is a word ensemble, is a finite set representing players, are finite sets representing the possible actions of each player and represent the utility functions.


For each , defines an ordinary game in normal form. We think of as a single game where the strategies are (some class of) functions and the payoffs are .
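As a minimal sketch (the field names are hypothetical, not part of the formal definition), a distributional game packages together a word ensemble to sample from, a finite set of players, a finite action set per player, and per-player utilities that may depend on the sampled word and the joint action:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DistributionalGame:
    """Illustrative container for a distributional game: for every sampled
    word x, (players, actions, utility(x, .)) is an ordinary normal-form game."""
    sample_word: Callable[[int], str]                 # word ensemble, indexed by k
    players: List[str]
    actions: Dict[str, List[str]]                     # finite action set per player
    utility: Dict[str, Callable[[str, Dict[str, str]], float]]  # (word, joint action) -> payoff
```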

Construction 2

Consider a distributional game and . We construct the reflective system .

is defined by

Define by

Denote to be the space of mixed actions of player , i.e. the space of probability measures on .

Define by

Define by

Define by

Finally, define by


For any -predictor we will use the notation to mean

For a family of -predictors, we will use the notations and to mean

Corollary 2

Fix a distributional game, and an -optimal predictor system for . Consider , and an -valued scheme. Then


We now move from studying strict maximizers to studying thermalizers.

Theorem 2

Fix a proto-error space . Consider a finite set, a word ensemble and . Suppose are -optimal predictors for and is some function. Assume that and don't depend on . Consider an -valued -bischeme. Suppose is another -valued -bischeme satisfying

Here is sampled from , is the normalization factor and . Then, there is s.t.

Here stands for the (base ) Shannon entropy of a probability measure on .
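Ignoring the finite-precision rounding in the formal construction, the thermalizer sampling rule can be sketched as follows; the inverse-temperature parameter beta is an illustrative assumption, and the names are placeholders rather than the formal bischeme definitions.

```python
import math
import random

def thermalizer_action(utility_estimates, beta=1.0, rng=None):
    """Thermalizer sketch: choose each action with probability proportional to
    exp(beta * estimated utility); beta is an illustrative inverse temperature."""
    rng = rng or random.Random(0)
    weights = {a: math.exp(beta * u) for a, u in utility_estimates.items()}
    z = sum(weights.values())  # normalization factor
    actions = list(weights)
    return rng.choices(actions, weights=[weights[a] / z for a in actions], k=1)[0]

# Hypothetical usage: as beta grows, this approaches the strict maximizer.
print(thermalizer_action({"cooperate": 0.61, "defect": 0.58}, beta=10.0))
```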

Corollary 3

Fix . Consider a finite set, a word ensemble and . Suppose are -optimal predictors for and a bounded function. Consider and an -valued -scheme. Assume is as in Theorem 2. Then

Here stands for averaging the probability measure on with respect to parameter using probability distribution .

Construction 3

Consider a distributional game, and . We construct the reflective system .

Define by

Here is the normalization factor.

Define by

Define by

Finally, define by


We define the notations , and analogously to before.

Corollary 4

Fix a distributional game, , bounded and an -optimal predictor system for . Consider , and an -valued scheme. Then


Finally, we show that assuming doesn't depend on and doesn't fall to zero too fast, deterministic advice is sufficient to construct an optimal predictor system for .

Definition 3

Consider a reflective system and . is said to be -Hoelder when there are , , , and s.t.

(i)

(ii)

(iii)

(iv)

(v)

Theorem 3

Consider a finite set , a reflective system and . If is -Hoelder then it has an -optimal predictor system.

Theorem 4

Consider a distributional game, and . Assume for some . Then is -Hoelder.

Appendix A

Proposition A.1

Suppose is a non-decreasing function s.t. . Define by

Then, .

Proof of Proposition A.1

Consider . We have

Proof of Proposition 1

Mostly obvious modulo Proposition A.1. To see is stable for non-decreasing , consider a non-constant polynomial and . Define . To get the desired condition for and with , consider any s.t. and for sufficiently large we have . Such exists since for sufficiently large . We have

In particular

Proposition A.2

Consider a polynomial . There is a function s.t.

(i)

(ii) For any function we have

Proof of Proposition A.2

Given functions s.t. for , the proposition for implies the proposition for by setting

Therefore, it is enough to prove the proposition for functions of the form for .

Consider any . We have

Since takes values in

Similarly

The last two equations imply that

Raising to a power is equivalent to adding a constant to , therefore

Since we can choose satisfying condition (i) so that

It follows that

Lemma A.1

Consider a distributional estimation problem, , -predictors. Suppose a polynomial, and are s.t.

Then s.t.


The proof of Lemma A.1 is analogous to before and we omit it.

Appendix B

The following is a slightly stronger version of one direction of the orthogonality lemma.

Lemma B.1

Consider a distributional estimation problem and an -optimal predictor for . Then there are and an -moderate function s.t. for any , a probability measure on and that runs in time on any valid input


The proof is analogous to the original and we omit it.

Lemma B.2

Fix a proto-error space . Consider a finite set, a word ensemble and . Suppose are -optimal predictors for . Assume that and don't depend on . Consider an -valued -bischeme. Assume factors as . Then

Proof of Lemma B.2

Applying Lemma B.1 we get the desired result.

Proof of Theorem 1

Lemma B.2 implies

Combining the two

The construction of implies that for every we have

Combining with the above we get the desired result.

Proposition B.1

Consider , , bounded below and bounded. Assume that . Then .

Proof of Proposition B.1

Proof of Corollary 1

Choose some . Define the -valued -scheme by

Applying Theorem 1 we get s.t.

Proposition B.1 implies

Also

Putting everything together we get the desired result.

Proof of Corollary 2

For each define and . It is easy to see is an -optimal predictor for . We have

Here selects an element of uniformly at random, up to a small rounding error which yields the term. Also

Applying Corollary 1 we get the desired result.

Proposition B.2

Consider a finite set, and . Define by where is the normalization constant . Consider with . Then

Proof of Proposition B.2

On the other hand

Combining the two we get the desired result.

Proposition B.3

Consider a finite set, and . Define the probability measures by where are normalization factors. Then

Proof of Proposition B.3

Proposition B.4

Consider a finite set. There is a s.t. for any probability measures on we have

Here is the total variation distance.
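Recall the standard expression for the total variation distance between probability measures $P, Q$ on a finite set $\mathcal{A}$:

$$d_{\mathrm{TV}}(P,Q) \;=\; \max_{S \subseteq \mathcal{A}} \bigl|P(S) - Q(S)\bigr| \;=\; \tfrac{1}{2}\sum_{a \in \mathcal{A}} \bigl|P(a) - Q(a)\bigr|.$$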

Proof of Proposition B.4

Given we have

For some we have for hence

Using the concavity of the square root we complete the proof.

Proof of Theorem 2

For any , and define and probability measures on by

Here is the normalization factor. By the convexity of Kullback-Leibler divergence
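The convexity property invoked here is, in its general form (a standard fact about the Kullback-Leibler divergence, recalled for the reader): for probability measures $P_1, P_2, Q_1, Q_2$ on a finite set and $\lambda \in [0,1]$,

$$D_{\mathrm{KL}}\bigl(\lambda P_1 + (1-\lambda) P_2 \,\big\|\, \lambda Q_1 + (1-\lambda) Q_2\bigr) \;\le\; \lambda\, D_{\mathrm{KL}}(P_1 \,\|\, Q_1) + (1-\lambda)\, D_{\mathrm{KL}}(P_2 \,\|\, Q_2).$$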

Applying Proposition B.3

Applying the uniqueness theorem, we get

We also have

for some . Applying Propositions B.2 and B.4 we get that

for some . Combining with the above, we get

In particular, for some

Applying Proposition B.2

Applying Lemma B.2 we get the desired result.

Proof of Corollary 3

Choose some . Define the -valued -scheme by

Applying Theorem 2 we get s.t.

Using the concavity of entropy and the property of

Applying Proposition B.1 we get the desired result.

Proof of Corollary 4

For each define and . It is easy to see is an -optimal predictor for . Construct the -bischeme to satisfy the condition of Theorem 2 with . We have (with the help of Proposition B.4)

Applying Corollary 3 we get the desired result.

Proposition B.5

Consider a finite set , a -Hoelder reflective system and two collections of -predictors and . Assume that . Then

Proof of Proposition B.5

Using the similarity of and there are s.t.

Using we get the desired result.

Proof of Theorem 3

Fix a finite set and a collection . Consider a -Hoelder reflective system. By the general existence theorem, there is an -optimal predictor system for . For each we can choose , an -optimal predictor for . By the uniqueness theorem, we have . By Proposition B.5 this implies . This means is an -optimal predictor for and is an -optimal predictor system for .

Proposition B.6

Fix a finite set and a collection . Consider a reflective system. Assume there are , , and s.t.

(i)

(ii)

(iii)

(iv) For any and s.t.

Then, is -Hoelder.

Proof of Proposition B.6

Identify with where . Consider any . Define recursively for all by , and . In particular, it follows that . We have

Denoting , we get

Proof of Theorem 4

Consider and s.t. .

Slightly abusing notation, define by

We have

Using Pinsker's inequality
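Recall Pinsker's inequality in its standard form (with the divergence measured in nats; other logarithm bases introduce a constant factor):

$$d_{\mathrm{TV}}(P,Q) \;\le\; \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, Q)}.$$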

Using Proposition B.3

Applying Proposition B.6 we get the desired result.

2 comments


comment by paulfchristiano · 2015-12-20T20:42:57.000Z · LW(p) · GW(p)

I find these theorem statements quite hard to follow. It looks like this is doing roughly the right thing and would be better to work with than reflective oracles (i.e. to the extent that there were new complications, those complications would represent real and important phenomena that were simply covered up by the simplifying assumptions). In that case I am quite interested in AIXI-with-optimal-predictors (though probably I'm most interested in a more accessible presentation, and I really feel like it should be possible to have a cleaner formalism).

As an example, it seems like this result might be of broad interest to computer scientists, since e.g. there is no polynomial time algorithm that finds approximate Nash equilibria in general games, even apparently simple ones. An extremely natural open question is "so what should we actually expect to happen when extremely rational players play such games?" It looks like this approach may be able to give some kind of an answer via an appropriate notion of bounded optimality, but it is pretty hard to understand whether this approach does give a satisfying answer, and if so what it is.

I think that the relevant take-away is in Theorem 1. My impression is that you are holding the strategy sets constant while effectively taking a limit over computational resources of the players (and potentially the difficulty of understanding the payoff matrix). Is that right? In that case, I expect convergence to the N.E. to only occur once the computational resources are exponential in the size of the game.

(Of course that would still be a big improvement over uncomputability, I'm just trying to understand what is going on.)

comment by Vanessa Kosoy (vanessa-kosoy) · 2015-12-22T08:02:00.000Z · LW(p) · GW(p)

AIXI-with-optimal-predictors: I believe this is relatively straightforward. However, my plan for the next step was adapting these results to a decision rule based on logical counterfactuals in a way which produces metathreat equilibria.

Bounded Nash equilibria: I don't think the concept is entirely novel. I've seen some papers which discuss Nash-like equilibria with computational resource bounds, although the area seems to remain largely unexplored. The particular setting I use here is not very relevant to what you're suggesting, since finding Nash equilibria is non-polynomial in the number of strategies whereas here I keep the number of strategies constant. Instead, the complexity comes from the dependence of the payoff tensor on the parameter sampled from .

Your description of Theorem 1 is more or less correct, except there's only a "payoff vector" here since this is a 1-player setting. The multiplayer setting is used in Corollary 2.

Regarding dependence on game size, it is not as bad as exponential. The Lipton-Markakis-Mehta algorithm finds $\epsilon$-equilibria in time $n^{O(\log n / \epsilon^2)}$, where $n$ is the number of strategies.