Estimating the Kolmogorov complexity of the known laws of physics?

post by Strilanc · 2013-07-08T04:30:31.764Z · LW · GW · Legacy · 46 comments

In the post Complexity and Intelligence, Eliezer says that the Kolmogorov Complexity (length of shortest equivalent computer program) of the laws of physics is about 500 bits:

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe - not just the region of space we see around us, but all regions of space and all quantum branches. [...]

Then the "Kolmogorov complexity" of that entire universe [...] would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

Where did this 500 come from?

I googled around for estimates on the Kolmogorov Complexity of the laws of physics, but didn't find anything. Certainly nothing as concrete as 500.

I asked about it on the Physics Stack Exchange, but haven't received any answers yet.

I considered estimating it myself, but doing that well involves a significant time investment. I'd need to learn the Standard Model well enough to write a computer program that simulated it (however inefficiently or intractably; it's the program length that matters, not its time or memory performance).

Based on my experience programming, I'm sure it wouldn't take a million bits. Probably less than ten thousand. The demo scene does some pretty amazing things with 4096 bits. But 500 sounds like a teeny tiny amount to mention offhand for fitting the constants, the forces, the particles, and the mathematical framework for doing things like differential equations. The fundamental constants alone are going to consume ~20-30 bits each.
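To spell out where that 20-30 comes from (assuming each dimensionless constant only needs to be specified to roughly its measured precision of 6 to 9 significant digits, which is an assumption, not a known fact):

```python
import math

# ~20-30 bits per constant, assuming each one only needs to be pinned down
# to roughly 6 to 9 significant decimal digits.
print(math.log2(10**6))  # ~19.9 bits for 6 digits
print(math.log2(10**9))  # ~29.9 bits for 9 digits
```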

Does anyone have a reference, or even a more worked-through example of an estimate?

46 comments

Comments sorted by top scores.

comment by passive_fist · 2013-07-08T05:28:22.639Z · LW(p) · GW(p)

First, remember that Kolmogorov complexity is only well-defined up to a constant, which is determined by your model of computation. So saying that something has a Kolmogorov complexity of 500 bits only makes sense if the model of computation has been defined. In this case, it has - the Universal Turing Machine model. The reason I'm mentioning this is that a particular simulation might have a wildly differing complexity when specified on a Turing Machine as opposed to, say, x86 machine code (which is typically very complex and contains a lot of redundancies, greatly inflating bit counts).

Second, the article addresses why you can have the paradoxical situation of the Universe being low-complexity while specific things in the Universe are of high complexity (using the Earth as an example):

If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.

If you want to print out just Earth by itself, then it's not enough to specify the laws of physics. You also have to point to just Earth within the universe.

This is why you can have computer programs, like from the demo scene, that seemingly can't be compressed smaller than thousands of bits, despite existing in a low-complexity Universe. To specify a program, you have to give enough information to pick it out from the sea of other allowed programs.

To specify the Universe, you only have to specify enough information to pick it out from the landscape of all possible Universes. According to string theory (which is a Universal theory in the sense that it is Turing-complete) the landscape of possible Universes is 2^500 or so, which leads to 500 bits of information. Perhaps this is where Eliezer got the figure from (though I admit that I don't exactly know where he got it from either). Note that this would be an upper bound. It's possible that the Universe is much simpler than what string theory suggests.

That said, that's the complexity under the string theory model, not the Turing machine model. So the Kolmogorov complexity under the Turing machine model would be less than 500 + (the amount of information needed to specify string theory under the Turing machine model). The latter would also probably be a few hundred bits (string theory's theoretical core is quite simple once you strip away all the maths that is needed for doing calculations).

So Eliezer might be wrong about that particular figure but it would surprise me if it were many orders of magnitude off the mark.

Replies from: MrMind, Strilanc
comment by MrMind · 2013-07-08T07:26:28.030Z · LW(p) · GW(p)

First, remember that Kolmogorov complexity is only well-defined up to a constant, which is determined by your model of computation.

And that is why it makes little sense to talk about the complexity of one object, as the OP asked, without comparing it to or using it in relation to other objects. If you implement the laws of physics in Brainfuck, on an x86 instruction set, or on a machine specifically dedicated to having the laws of physics as a single command, you can get wildly different complexities, even down to a few bits (think of a UTM that has a specific instruction that emulates the laws of physics).
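To see why the gap between any two such models is "only" a constant: the constant is roughly bounded by the length of an interpreter for one model written in the other. A minimal sketch, using a bare-bones Brainfuck interpreter in Python (the test program at the end is just illustrative):

```python
def run_brainfuck(code, input_bytes=b""):
    """Minimal Brainfuck interpreter. Its own length (in bits) is, roughly, an
    upper bound on the invariance-theorem constant between 'Python programs'
    and 'Brainfuck programs' as description languages."""
    tape = [0] * 30000
    ptr = 0      # data pointer
    pc = 0       # program counter
    inp = 0      # input position
    out = []
    # Pre-compute matching bracket positions.
    stack, jumps = [], {}
    for i, ch in enumerate(code):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(code):
        ch = code[pc]
        if ch == '>':
            ptr += 1
        elif ch == '<':
            ptr -= 1
        elif ch == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == '.':
            out.append(tape[ptr])
        elif ch == ',':
            tape[ptr] = input_bytes[inp] if inp < len(input_bytes) else 0
            inp += 1
        elif ch == '[' and tape[ptr] == 0:
            pc = jumps[pc]
        elif ch == ']' and tape[ptr] != 0:
            pc = jumps[pc]
        pc += 1
    return bytes(out)

# Toy test: 65 increments then output, i.e. the single byte 'A'.
print(run_brainfuck("+" * 65 + "."))  # b'A'
```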

Replies from: pragmatist
comment by pragmatist · 2013-07-08T19:11:14.025Z · LW(p) · GW(p)

Does it even make sense to talk of relative Kolmogorov complexity when we're dealing with finite strings? For any two finite strings A and B, it is always possible to find some model of computation in which A is less complex than B (and vice versa, of course).

Replies from: passive_fist
comment by passive_fist · 2013-07-08T20:41:13.471Z · LW(p) · GW(p)

It does make sense to talk of relative KC, actually, but it's not trivial to see why: http://en.wikipedia.org/wiki/Kolmogorov_complexity#Invariance_theorem

In particular, "If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c – which depends only on the languages L1 and L2 chosen – such that

\forall s\ |K_1(s) - K_2(s)| \leq c. "
Replies from: pragmatist
comment by pragmatist · 2013-07-09T05:46:16.412Z · LW(p) · GW(p)

Yes, I'm aware of this. It's still true, I believe, that for any two finite strings s1 and s2 one can find description languages L1 and L2 (with complexity functions K1 and K2) such that

K1(s1) > K1(s2)

and

K2(s1) < K2(s2).

So there is no language-independent sense in which s1 is more complex than s2 (or vice versa). To make the claim more concrete, consider the fact that for any finite string, one could construct a Universal Turing Machine that outputs that string when given the input 0 (the string is essentially hard-coded into the structure of the machine). This corresponds to a description language in which that string has minimal K-complexity.

This is all compatible with the invariance theorem. As a simple illustration, let the constant c associated with L1 and L2 be 5, let K1(s1) be 10, K1(s2) be 9, K2(s1) be 6 and K2(s2) be 8. In this example, both the inequalities I've given above are true, and the invariance theorem isn't violated.

comment by Strilanc · 2013-07-08T05:41:27.781Z · LW(p) · GW(p)

First, remember that Kolmogorov complexity is only well-defined up to a constant, which is determined by your model of computation. [...]

Right, if you do something like (say) take away the addition instruction, then the shortest program might get longer because it has to emulate addition using subtraction and negation (or use a totally different approach, or include a translation pass that expands the additions into negate+subtract, or include a universal Turing machine that runs the original machine, or... yeah).
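A trivial sketch of that first workaround (addition recovered from only negation and subtraction, at the cost of a slightly longer program):

```python
def neg(x):
    return -x

def sub(x, y):
    return x - y

def add(x, y):
    # a + b == a - (-b): addition emulated with subtraction and negation.
    return sub(x, neg(y))

assert add(2, 3) == 5
```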

Second, the article addresses why you can have the paradoxical situation of the Universe being low-complexity while specific things in the Universe are of high complexity.

In this case I care about the complexity of the laws that govern the change in state (positions of atoms or values of the wave functions) with respect to time. I will not make you pay for the presumably absurd amount of data required for the initial state of >10^82 atoms; that would make the complexity of the laws themselves laughably negligible by comparison. I realize that this is, in a sense, an arbitrary distinction but... that's the question I want to know the answer to.

According to string theory (which is a Universal theory in the sense that it is Turing-complete) the landscape of possible Universes is 2^500 or so, which leads to 500 bits of information. Perhaps this is where Eliezer got the figure from (though I admit that I don't exactly know where he got it from either).

Interesting guess at where the number could have come from. I just assumed he tallied up various aspects of the standard model somehow and got an answer between 450 and 510.

Replies from: Anatoly_Vorobey
comment by Anatoly_Vorobey · 2013-07-08T07:03:19.172Z · LW(p) · GW(p)

The standard model is over here, see page 36: http://arxiv.org/pdf/hep-th/0610241v1.pdf

I will not make you pay for the presumably absurd amount of data required for the initial state of >10^(82) atoms.

You don't have any atoms in the initial state - nor anything that can reasonably be called matter. My (very ignorant) guess is that we won't even know what it takes to specify the initial state before we have a unified GR+QFT theory.

According to string theory (which is a Universal theory in the sense that it is Turing-complete) the landscape of possible Universes is 2^500 or so, which leads to 500 bits of information.

If one wishes to describe oneself in a particular universe, then, assuming MWI, fixing the universe is peanuts, complexity-wise, compared to fixing the appropriate Everett branch. The number of bits there is just astounding, it would seem to me.

Replies from: Strilanc
comment by Strilanc · 2013-07-08T08:15:06.601Z · LW(p) · GW(p)

The standard model is over here, see page 36: http://arxiv.org/pdf/hep-th/0610241v1.pdf

Uh, wow, that's a somewhat large equation. It has like 500 terms. Seems... inconsistent with physicists seeing beauty in physics.

Replies from: Mitchell_Porter
comment by Mitchell_Porter · 2013-07-08T11:27:14.822Z · LW(p) · GW(p)

You can see it in just five lines here, page 1. And an even more compact formulation would just list the symmetry groups, the various fields and how they transform under each group, and would then stipulate that the Lagrangian contains every possible renormalizable term (which is a principle in the construction of such theories, since renormalizable terms that aren't included get generated anyway).

comment by solipsist · 2013-07-08T04:45:17.777Z · LW(p) · GW(p)

The fundamental constants alone are going to consume ~20-30 bits each.

Not if those constants are free variables, or can be computed in some way we don't understand. Frankly, I'd find it a bit ugly if those constants were hard-coded into the fabric of the universe.

Replies from: rocurley, Strilanc
comment by rocurley · 2013-07-08T21:43:55.425Z · LW(p) · GW(p)

Well, several of the universal constants arguably define our units. For every base type of physical quantity (things like distance, time, temperature, and mass, but not, for example, speed, which can be constructed out of distance and time), you can set a physical constant to 1 if you're willing to change how you measure that property. For example, you can express distance in terms of time (measuring distance in light-seconds or light-years). By doing so, you can discard the speed of light: set it to 1. Speeds are now ratios of time to time: something moving at 30% of the speed of light would move 0.3 light-seconds per second, so its speed would be the dimensionless quantity 0.3. You can drop many other physical constants in this fashion: offhand, the speed of light, the gravitational constant, Planck's constant, the Coulomb constant, and the Boltzmann constant can all be set to 1 without any trouble, and therefore don't count against your complexity budget.
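A small sketch of what "setting c = 1" means in practice (the helper names are just for illustration):

```python
# Measure time in seconds and distance in light-seconds, so speeds become
# dimensionless ratios: the value of c disappears into the choice of units
# rather than into the program.
C = 299_792_458.0  # speed of light in m/s

def to_natural_speed(v_m_per_s):
    """Express a speed as a fraction of c (dimensionless)."""
    return v_m_per_s / C

def metres_to_light_seconds(d_m):
    return d_m / C

print(to_natural_speed(0.3 * C))        # 0.3
print(metres_to_light_seconds(C * 60))  # 60.0 light-seconds = 1 light-minute
```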

Replies from: Dre
comment by Dre · 2013-07-10T21:59:01.556Z · LW(p) · GW(p)

First, a note: I'm not disagreeing with you so much as just giving more information.

This might buy you a few bits (and lots of high-energy physics is done this way, with powers of electronvolts as the only units). But there will still be free variables that need to be set. Wikipedia claims (with a citation to this John Baez post) that there are 26 fundamental dimensionless physical constants. These, as far as we know right now, have to be hard-coded in somewhere: maybe in units, maybe in equations, but somewhere.
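For scale, combining this with the OP's ~20-30 bits per constant (a rough assumption, since the precision actually required varies from constant to constant):

```python
# 26 dimensionless constants at roughly 20-30 bits each is already in the
# ballpark of, or above, the 500-bit figure before any laws are encoded.
n_constants = 26
print(n_constants * 20, "to", n_constants * 30, "bits")  # 520 to 780 bits
```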

Replies from: MarsColony_in10years
comment by MarsColony_in10years · 2015-04-27T05:45:37.526Z · LW(p) · GW(p)

As a reference for anyone encountering this discussion, I thought I'd mention Natural Units explicitly. Basically, they are the systems of units that particle physicists use. They are attempts to normalize out as many fundamental constants as possible, exactly as you discuss.

Unfortunately, you can't build a system that gets them all. You are always left with some fraction over pi, or the square root of the fine-structure constant, or something.

comment by Strilanc · 2013-07-08T05:21:07.318Z · LW(p) · GW(p)

I agree that the constants might be related in ways we don't know, which would allow compression. I'm more interested in an upper bound on the complexity than an exact value (which is likely incomputable for halting problem reasons), so I'm willing to be over by 100 bits because we fail to see a pattern.

As for variable constants: sure, we can estimate the Kolmogorov complexity where the "constants" are inputs. Or we can estimate the Kolmogorov complexity of the current laws plus the state 13.7 billion years ago at the Big Bang. Or we can estimate the complexity of a program that runs all programs. All of these questions are interesting. But right now I want the answer to the one I asked, not the others.

Edit: clarified response

comment by The_Duck · 2013-07-08T04:45:31.361Z · LW(p) · GW(p)

Computer simulation of the strong interaction part of the Standard Model is a big research area: you may want to read about lattice QCD. I've written a simple lattice QCD simulation in a few hundred lines of code. If you Google a bit you can probably find some example code. The rest of the Standard Model has essentially the same structure and would only be a few more lines of code.
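Not lattice QCD itself, but to give a feel for how compact the core of a lattice gauge simulation can be, here is a toy Z2 lattice gauge theory with a Metropolis update. Real lattice QCD replaces the ±1 link variables with SU(3) matrices and adds fermions, but the skeleton is similar; the lattice size and coupling below are arbitrary illustrative choices.

```python
import math
import random

L = 4        # lattice extent in each of the 4 directions
BETA = 0.9   # inverse coupling
DIMS = 4

# One +-1 link variable per site and direction, periodic boundaries.
links = {(x, y, z, t, mu): 1
         for x in range(L) for y in range(L) for z in range(L)
         for t in range(L) for mu in range(DIMS)}

def shift(site, mu, d):
    """Step from `site` in direction mu (d = +1 or -1), wrapping periodically."""
    s = list(site)
    s[mu] = (s[mu] + d) % L
    return tuple(s)

def staple_sum(site, mu):
    """Sum over the 'staples': the product of the other three links of every
    plaquette containing the link (site, mu)."""
    total = 0
    for nu in range(DIMS):
        if nu == mu:
            continue
        # Forward plaquette in the (mu, nu) plane.
        total += (links[shift(site, mu, +1) + (nu,)]
                  * links[shift(site, nu, +1) + (mu,)]
                  * links[site + (nu,)])
        # Backward plaquette, based at site - nu.
        back = shift(site, nu, -1)
        total += (links[shift(back, mu, +1) + (nu,)]
                  * links[back + (mu,)]
                  * links[back + (nu,)])
    return total

def metropolis_sweep():
    """One Metropolis update of every link. The action is
    S = -BETA * sum over plaquettes of the product of their four links,
    so flipping a link changes S by 2 * BETA * link * staple_sum."""
    for key in list(links):
        site, mu = key[:4], key[4]
        old = links[key]
        delta_s = 2 * BETA * old * staple_sum(site, mu)
        if delta_s <= 0 or random.random() < math.exp(-delta_s):
            links[key] = -old

for sweep in range(10):
    metropolis_sweep()
print("ran 10 sweeps of a toy 4^4 gauge lattice")
```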

Replies from: Strilanc
comment by Strilanc · 2013-07-08T05:31:06.121Z · LW(p) · GW(p)

Assuming you're right about it only being a few more lines of code, and that you didn't use a lot of external libraries, that puts an upper bound at... a few dozen kilobits? I'm guessing that could be made a lot smaller, since you were likely not focused on minimizing the lines of code at all costs.

comment by Risto_Saarelma · 2013-07-08T07:25:33.188Z · LW(p) · GW(p)

Attacking physics head on is going to be a bit hand-wavy, since we don't have the complete picture of a formal system that encompasses all physics yet. What do we know about estimating actual, stated-as-an-approximate-number-of-bits Kolmogorov complexities for formal systems which we can describe completely? Can we give an informed estimate about the number of bits needed for the rules of chess, or Newtonian-only toy physics?

Replies from: Strilanc
comment by Strilanc · 2013-07-08T08:07:49.532Z · LW(p) · GW(p)

Attacking physics head on is going to be a bit hand-wavy, since we don't have the complete picture of a formal system that encompasses all physics yet.

How about for "the standard model" instead of "known physics"? That's a much better defined problem.

What do we know about estimating actual, stated-as-an-approximate-number-of-bits Kolmogorov complexities for formal systems which we can describe completely? Can we give an informed estimate about the number of bits needed for the rules of chess, or Newtonian-only toy physics?

Yes, we can (as long as you assume a particular instruction set). I don't know if anyone's done it, though. It's also important to note that we can give upper bounds, but lower bounds are extremely difficult because of the inability to prove, in general, properties like "this program's output matches physics."
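One crude but concrete way to get an upper bound (relative to "Python plus a decompressor" as the description language) is to write the rules as a short program, compress the source, and count bits. The toy Newtonian N-body step below is made up for illustration; the resulting number is only an upper bound, and the true minimum could be far smaller.

```python
import zlib

newtonian_toy = b"""
def step(positions, velocities, masses, dt, G=6.674e-11):
    n = len(positions)
    for i in range(n):
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dz = positions[j][2] - positions[i][2]
            r2 = dx*dx + dy*dy + dz*dz
            f = G * masses[j] / (r2 ** 1.5)
            ax += f * dx; ay += f * dy; az += f * dz
        velocities[i] = (velocities[i][0] + ax*dt,
                         velocities[i][1] + ay*dt,
                         velocities[i][2] + az*dt)
    return [(p[0] + v[0]*dt, p[1] + v[1]*dt, p[2] + v[2]*dt)
            for p, v in zip(positions, velocities)]
"""

compressed = zlib.compress(newtonian_toy, 9)
print(len(compressed) * 8, "bits (upper bound, in this description language)")
```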

comment by DanielLC · 2013-07-08T05:12:31.733Z · LW(p) · GW(p)

or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

500 bits was a guess. It was not intended to be accurate. It just looks better if he uses an actual number.

Replies from: Strilanc
comment by Strilanc · 2013-07-08T05:28:17.188Z · LW(p) · GW(p)

I'm fine with a guess being included in the text, even one that isn't elaborated on. The post wasn't about making that particular estimate, and elaborating on it there would have been counter-productive.

However, that doesn't really answer my question. What factors need to be accounted for when estimating the Kolmogorov complexity of physics? Which parts cost the most? Which parts cost almost nothing?

comment by shminux · 2013-07-08T04:45:00.058Z · LW(p) · GW(p)

Start small. Think F=ma first. Then build up.

EDIT: what's with the downvotes, people? "Start small" is the first rule of any complex undertaking. Once you master the Newtonian physics estimates (not as straightforward a task as it might seem), you can see how to work up from there.

comment by ThisSpaceAvailable · 2013-07-15T01:07:49.420Z · LW(p) · GW(p)

If we're talking about the size of a program to simulate the universe, isn't there good evidence that it's not Turing computable? My understanding is that solving Newton's equations for more than two bodies is not computable, to take just one example.

Replies from: Strilanc, gwern
comment by Strilanc · 2013-07-15T06:40:01.903Z · LW(p) · GW(p)

You're confusing chaos with incomputability. Chaos has to do with mixing, error amplification, and imperfect initial conditions. Incomputability has to do with abstract problems and whether or not they can be solved by computers.
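To make the distinction concrete, here's a minimal sketch: the logistic map below is about as computable as it gets, yet it shows the same sensitive dependence on initial conditions that makes long-term N-body prediction hard.

```python
# Chaos is not incomputability: each step here is trivially computable, but
# two nearby starting points diverge rapidly.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

a, b = 0.2, 0.2 + 1e-10
for step in range(60):
    a, b = logistic(a), logistic(b)
print(abs(a - b))  # typically of order 0.1-1, despite the 1e-10 initial gap
```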

In any case, even if the standard model was an approximation of truly Turing-incomputable physics laws... I still want to know what its Kolmogorov complexity is.

comment by gwern · 2013-07-15T01:24:00.123Z · LW(p) · GW(p)

My understanding is that solving Newton's equations for more than two bodies is not computable, to take just one example.

What do you mean?

Replies from: Strilanc
comment by Strilanc · 2013-07-15T06:40:49.515Z · LW(p) · GW(p)

It's a reference to the three body problem, confusing incomputability with chaos.

comment by Douglas_Knight · 2013-07-08T17:38:44.710Z · LW(p) · GW(p)

The video you link is 4k bytes, not 4k bits.

comment by Kawoomba · 2013-07-08T17:10:17.303Z · LW(p) · GW(p)

It's that time again!

Time to link slide 11 of this presentation!

comment by ESRogs · 2013-07-08T07:55:40.218Z · LW(p) · GW(p)

The demo scene does some pretty amazing things with 4096 bits.

Clicked a couple links and got to this presentation, but I still wasn't able to figure out -- what exactly are they producing that's under 4k? Is it a media file with video and audio (and if so, what format is it in), or is it a small program that in turn produces a larger media file?

Replies from: Strilanc
comment by Strilanc · 2013-07-08T08:02:48.039Z · LW(p) · GW(p)

They typically produce a program that, when run, plays video and audio in real time after a loading period.

Replies from: ESRogs
comment by ESRogs · 2013-07-08T09:18:19.001Z · LW(p) · GW(p)

So, I take it the uncompiled source would be what's under 4k, rather than the resulting executable. Is that right?

Replies from: None
comment by [deleted] · 2013-07-08T09:25:55.251Z · LW(p) · GW(p)

It's the executable size that counts, generally.

Replies from: ESRogs
comment by ESRogs · 2013-07-08T20:48:16.442Z · LW(p) · GW(p)

Ah, cool. Thanks!

comment by Baughn · 2013-07-08T12:23:09.686Z · LW(p) · GW(p)

It's the program length that matters, not its time or memory performance

That's a common assumption, but does this really make sense?

It seems to me that if you have a program doing two things - simulating two people, say - then, if it has twice the overhead for the second one, the first should in some sense exist twice as much.

The same argument could be applied to universes.

Replies from: Baughn, gwern, Strilanc
comment by Baughn · 2013-07-09T01:07:18.757Z · LW(p) · GW(p)

Okay, I wouldn't normally do this, but.. what's with the downvote? I honestly have no idea what the problem is, which makes avoiding it hard. Please explain.

comment by gwern · 2013-07-08T15:54:54.138Z · LW(p) · GW(p)

https://en.wikipedia.org/wiki/Speed_prior / http://www.idsia.ch/~juergen/speedprior.html

Replies from: Baughn
comment by Baughn · 2013-07-09T01:14:09.420Z · LW(p) · GW(p)

"Use of the Speed Prior has the disadvantage of leading to less optimal predictions"

Unless I'm misunderstanding something, doesn't this imply we've already figured out that that's not the true prior? Which would be very interesting indeed.

Replies from: Eugine_Nier, gwern
comment by Eugine_Nier · 2013-07-09T04:02:29.802Z · LW(p) · GW(p)

Would someone mind explaining what "the true prior" is? Given that probability is in the mind, I don't see how the concept makes sense.

Replies from: Baughn, JoshuaZ
comment by Baughn · 2013-07-09T09:33:16.913Z · LW(p) · GW(p)

I was going for "Matches what the universe is actually doing", whether that means setting it equal to the apparent laws of physics or to something like a dovetailer.

Sure, there's no way of being sure we've figured out the correct rule; doesn't mean there isn't one.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2013-07-10T02:23:54.415Z · LW(p) · GW(p)

In other words it's the prior that's 1 on the actual universe and 0 on everything else.

Replies from: Baughn
comment by Baughn · 2013-07-10T10:58:15.656Z · LW(p) · GW(p)

Sure. A little hard to determine, fair enough.

comment by JoshuaZ · 2013-07-09T04:31:13.621Z · LW(p) · GW(p)

I'm confused also. I think they may mean something like "empirically not the optimal prior we can use with a small amount of computation" but that doesn't seem consistent with how it is being used.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2013-07-09T04:50:59.130Z · LW(p) · GW(p)

"empirically not the optimal prior we can use with a small amount of computation"

I'm not even sure that makes sense, since if this is based on empirical observations, presumably there was some prior prior that was updated based on those observations.

Replies from: JoshuaZ
comment by JoshuaZ · 2013-07-09T04:55:45.143Z · LW(p) · GW(p)

Well, they could be using a set of distinct priors (say 5 or 6 of them) and then noting over time which set required less major updating in general, but I don't think this is what is going on either. We may need to just wait for Baughn to clarify what they meant.

comment by gwern · 2013-07-09T02:01:00.219Z · LW(p) · GW(p)

has the disadvantage of leading to less optimal predictions

As far as I know, any computable prior will have that disadvantage relative to full uncomputable SI.

comment by Strilanc · 2013-07-08T12:32:00.859Z · LW(p) · GW(p)

You're saying a Solomonoff Inductor would be outperformed by a variant that weighted quick programs more favorably, I think. (At the very least, it makes approximations computable.)

Whether or not penalizing for space/time cost increases the related complexity metric of the standard model is an interesting question, and there's a good chance it's a large penalty since simulating QM seems to require exponential time, but for starters I'm fine with just an estimate of the Kolmogorov Complexity.

Replies from: Baughn
comment by Baughn · 2013-07-09T01:02:10.114Z · LW(p) · GW(p)

Well, I'm saying the possibility is worth considering. I'm hardly going to claim certainty in this area.

As for QM...

The metric I think makes sense is, roughly, observer-moments divided by CPU time. Simulating QM takes exponential time, yes, but there's an equivalent exponential increase in the number of observer-moments. So QM shouldn't have a penalty vs. classical.

On the flip side this type of prior would heavily favor low-fidelity simulations, but I don't know if that's any kind of strike against it.