Help request: What is the Kolmogorov complexity of computable approximations to AIXI?

post by AnnaSalamon · 2010-12-05T10:23:55.626Z


Does anyone happen to know the Kolmogorov complexity (in some suitable, standard UTM -- or, failing that, in lines of Python or something) of computable approximations of AIXI?

I'm writing a paper on how simple or complicated intelligence is, and what implications that has for AI forecasting.  In that context: adopt Shane Legg's measure of intelligence (i.e., let "intelligence" measure a system's average goal-achievement across the different "universe" programs that might be causing it to win or not win reward at each time step, weighted according to the universe program's simplicity).
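
In symbols, Legg and Hutter's universal intelligence measure (which the above paraphrases) is

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where E is their class of computable environments, K(mu) is the Kolmogorov complexity of environment mu, and V^pi_mu is the expected total reward policy pi earns in mu.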

Let k(x, y) denote the length of the shortest program that attains an intelligence of at least x, when allowed an amount y of computation (i.e., of steps it gets to run on our canonical UTM).  Then, granting certain caveats, AIXI and approximations thereto tell us that the limit of k(x, y) as y approaches infinity is pretty small for any computably attainable value of x.  (Right?)
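
One way to formalize that (my own notation, extending the definitions above, so treat it as a sketch):

```latex
k(x, y) \;=\; \min \bigl\{\, |p| \;:\; \Upsilon_y(p) \ge x \,\bigr\},
\qquad
\lim_{y \to \infty} k(x, y) \;\le\; c
\;\;\text{for every computably attainable } x,
```

where Upsilon_y(p) is the intelligence p attains within y UTM steps, and c is (roughly) the length of a computable approximation to AIXI such as AIXItl. The question is what c actually is.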

What I'd like is to stick an actual number, or at least an upper bound, on "pretty small".

If someone could help me out, I'd be much obliged.

9 comments


comment by timtyler · 2010-12-05T11:24:25.136Z

Not your question, but great intelligence is necessarily highly complex, according to:

"Is there an Elegant Universal Theory of Prediction?" - Shane Legg.

Replies from: AnnaSalamon
comment by AnnaSalamon · 2010-12-05T12:14:32.904Z

Thanks. This looks helpful.

Replies from: Will_Sawin
comment by Will_Sawin · 2011-01-04T21:54:26.828Z

If there's some set S of computer programs, and the algorithm A solves the halting problem for S, then A plus a small amount of code predicts all sequences generated by programs in S.

Method: Go through S in some order. Skip over each program that doesn't halt. Skip over each program once it's falsified by the data. Stop when you reach a program that fits the data, and use its predictions until it's falsified, at which point you move on to the next program that fits...
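
A minimal Python sketch of that method, where `halts` plays the role of the algorithm A and `run(p, n)` is a hypothetical helper assumed to return the first n output bits of program p (neither is a real API):

```python
def next_bit(S, halts, run, data):
    """Predict the bit after `data` (a list of bits), assuming some program
    in S generated it.  `halts(p)` is the assumed halting-solver A for S,
    and `run(p, n)` returns the first n output bits of p -- both are
    hypothetical helpers standing in for a real UTM interface."""
    for p in S:                              # go through S in some order
        if not halts(p):                     # skip programs that don't halt
            continue
        out = run(p, len(data) + 1)
        if out[:len(data)] == data:          # not yet falsified by the data
            return out[len(data)]            # use this program's prediction
    return None                              # no program in S fits the data
```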

Humans are computable. There exists a set of programs whose outputs humans cannot predict, and a (strongly related) set whose halting we cannot decide. "Cannot" here means constitutionally incapable.

Clearly, we should not expect an AI we build to be able to predict these programs.

What if our world is one of those programs? That sounds scary.

comment by rwallace · 2010-12-05T11:57:45.877Z

If I understand your question correctly, it boils down to asking how long a minimal implementation of the AIXItl algorithm is. The answer to that is: pretty short; it would fit in one page of code. (This fact is of little practical importance, of course, since running it would still require much more computation than is possible according to our current understanding of physics.)
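
To make "one page" concrete, here is a heavily simplified Python sketch of the brute-force structure. This is not Hutter's actual AIXItl, which plans to a horizon and adds a proof-search step; `run_env` and the bit-string program encoding are hypothetical stand-ins:

```python
from itertools import product

def programs_up_to(l):
    """All bit-strings of length <= l, standing in for programs on a fixed UTM."""
    for n in range(1, l + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def run_env(program, history, action, t):
    """Hypothetical helper: run a candidate environment program for at most
    t UTM steps on the interaction history plus the proposed action, and
    return the reward it outputs, or None if it produces nothing in time."""
    raise NotImplementedError

def choose_action(history, actions, l, t):
    """Pick the action whose 2^-length-weighted reward, summed over all
    environment programs of length <= l run for <= t steps, is greatest.
    One-step lookahead only, to keep the sketch short."""
    def expected_reward(action):
        total = 0.0
        for p in programs_up_to(l):
            r = run_env(p, history, action, t)
            if r is not None:
                total += 2.0 ** (-len(p)) * r   # simplicity-weighted reward
        return total
    return max(actions, key=expected_reward)
```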

It's possible I'm not understanding your question correctly, in which case please point out what I'm missing.

Replies from: Roko, AnnaSalamon
comment by Roko · 2010-12-05T20:16:55.619Z

I agree with this.

K(AIXItl) <= the length of the code Hutter actually wrote, which is ~300 lines.

With respect to a minimal state-symbol Turing machine, it would be those ~300 lines plus however much it takes to translate common math concepts like "argmax" into that machine's language.
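
That translation overhead is the constant from the invariance theorem: for any two universal machines U and V there is a constant c_{U,V}, independent of the string being described, with

```latex
K_U(x) \;\le\; K_V(x) + c_{U,V} \quad \text{for all } x.
```

Here V is whatever language Hutter wrote in, U is the minimal Turing machine, and c_{U,V} absorbs the cost of translating constructs like "argmax".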

comment by AnnaSalamon · 2010-12-05T12:16:48.972Z

I think that's the question, unless there are some shorter or in some way better computable approximations to AIXI out there. I am not as familiar with this field as I would like, so someone please correct me if I'm making a silly mistake somewhere.

If anyone has gone to the trouble of actually counting the number of bits in AIXItl (in some particular, simple programming language / UTM), that would be nice to quote.

Replies from: rwallace
comment by rwallace · 2010-12-05T12:21:20.114Z

One variant somebody actually implemented is called Monte Carlo AIXI. Granted, it won't have been optimized for minimum code size, but a Google search might find you a copy of the code; you could download it and look at its size for an upper bound.

comment by Louie · 2010-12-09T13:26:57.515Z

Monte-Carlo AIXI compiles down to 3MB. This is almost entirely due to having a bunch of fast math libraries compiled into it, but it's definitely a reasonable measure of its real-world Kolmogorov complexity. Or at least an upper bound.
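
Spelling that bound out in bits:

```latex
3\,\text{MB} \;=\; 3 \times 2^{20}\ \text{bytes} \;\approx\; 2.5 \times 10^{7}\ \text{bits}.
```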

Although, that is MC-AIXI(FAC-CTW). It may not have the same strong optimality guarantees as computing power goes to infinity. Perhaps check the paper about it to make sure?

My experience with testing this implementation of AIXI myself is that it can only function on toy domains and only then when the data has been heavily curated to the point where it is basically only able to recognize the simplest of patterns. The only really cool thing it's doing is recognizing them in a general way. But it doesn't scale to actually do anything interesting on real data no matter how much computing power you have in practice. 100 years of Moore's law won't make this implementation of AIXI able to distinguish 2 faces from each other given two jpg images.

I can look into it more if you want. I'm not sure I'm clear on what upper bound you want.

Replies from: jveness
comment by jveness · 2010-12-18T10:15:21.305Z

Hi.

A couple of comments from the author of mc-aixi(fac-ctw).

There is a massive difference in generalisation ability between Solomonoff Induction and the scaled-down Bayes mixture used by mc-aixi(fac-ctw). The mixture used in this approximation scales with more CPU power; however, it will never get anywhere near the full power of Solomonoff Induction. The class of environments the agent supports is described in more detail in the paper Louie references.
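
To give a feel for how scaled-down that mixture is: the base predictor at each leaf of a CTW tree is the Krichevsky-Trofimov estimator, essentially a counting rule. A minimal Python sketch:

```python
def kt_probability(zeros, ones, next_bit):
    """Krichevsky-Trofimov estimate of P(next_bit) from the counts of 0s and
    1s seen so far in one context -- the leaf predictor that CTW mixes over.
    Compare this with a Solomonoff-style mixture over all programs."""
    count = ones if next_bit == 1 else zeros
    return (count + 0.5) / (zeros + ones + 1.0)
```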

Let's be clear though: 100 years of Moore's law would significantly improve the performance of even the current implementation of the agent. However, it is important not to forget about data efficiency. mc-aixi(fac-ctw) can only exploit very simple structure given reasonable amounts of data. If this structure doesn't exist, the performance will not be much better than learning by rote. Thus Louie's comment about image recognition is spot on. Future versions will be able to generalise more effectively for certain kinds of data; however, expect progress to be very incremental.

For what it's worth, the next version of mc-aixi will contain about 3x more lines of source code. It will also be restricted to toy domains, though significantly more interesting ones than what is currently published.

Happy to answer any other questions.

Cheers, Joel