Two Questions about Solomonoff Induction

post by Scott Garrabrant · 2015-10-04T05:15:55.000Z · LW · GW · 4 comments

In this post, I ask two questions about Solomonoff Induction. I am not sure if these questions are open or not. If you know the answer to either of them, please let me know. I think that the answers may be very relevant to stuff I am currently working on in Asymptotic Logical Uncertainty.

Consider a universal Turing machine which takes in an infinite stream of random bits, and outputs a finite or infinite string of bits. For any bit string , let denote the probability that is a prefix of the output of .

Let be an probabilistic environment which outputs an infinite bit string, and set be the first bits output by .

If assigns outputs according to any computable probability distribution, then with probability 1, quickly converges to the actual probability that outputs a 1 at location given that its first bits were .

In particular, if outputs an infinite stream of iid bits which are each 1 with some rational probability , then will converge to with probability 1.

However I am curious what happens if is not computable. In the simplest case outputs an infinite stream of iid bits which are each 1 with some uncomputable real number probability . In this case, will converge to with probability 1?

To make things slightly more complicated, assume that the odd index bits of come from a fair coin, but the even bits come from an adversary who chooses the bits in an uncomputable way. (This adversary can see all the bits output by so far, but not future bits.) In this case, will converge to 1/2?

4 comments

Comments sorted by top scores.

comment by SamEisenstat · 2017-07-05T22:20:44.000Z · LW(p) · GW(p)

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don't see a proof for either question yet.

comment by jessicata (jessica.liu.taylor) · 2015-10-12T01:12:54.000Z · LW(p) · GW(p)

My intuition is that, for the iid bits with uncomputable probability r, there will exist an optimal (up to a constant) solution of the form:

  1. compute some real number by running the shortest program computing this
  2. generate bits iid from

If this is the case, then the optimal algorithm will choose to minimize if is the number of bits observed. As goes to infinity, minimizing this objective causes to converge to .

Now I'm not sure how to prove that an optimal (up to a constant) hypothesis of the form above exists. Basically this is saying that for some constant with high probability. Possibly the argument goes something like: for any non-conditionally-iid way of assigning probabilities to the bits, there is a better iid way (by averaging the probabilities). And if you are flipping the bits iid, then at some point you compute the probability of each flip, so you might as well just optimally compress some . But I couldn't quite formalize this argument well, because of the fact that hypotheses can encode correct correlations through overfitting.

comment by Vanessa Kosoy (vanessa-kosoy) · 2016-10-12T09:23:19.000Z · LW(p) · GW(p)

Can you provide links to the solutions, I am curious? (especially regarding the second question)

comment by cousin_it · 2015-12-11T16:15:32.000Z · LW(p) · GW(p)

Well, Laplace's rule of succession is a computable prior that will almost certainly converge to your uncomputable probability value, and I think the difference in log scores from the "true" prior will be finite too. Since Laplace's rule is included in the Solomonoff mixture, I suspect that things should work out nicely. I don't have a proof, though.