An Undergraduate Reading Of: Macroscopic Prediction by E.T. Jaynes

post by ryan_b · 2018-04-19T17:30:39.893Z · score: 38 (9 votes) · LW · GW · 13 comments



I stumbled across a 1996 paper, Macroscopic Prediction by E.T. Jaynes, which interested me; I thought I would document my reading in the style I recommended in this post. I don't have any expertise in Jaynes' fields, so it will serve as a good check for intuitions. It may also be of historical interest to the community. Lastly, I don't call it an undergraduate reading for nuthin', so it may be directly informational for people with less mathematical or scientific background.

This paper is organized a little differently, with the following sections:









I will match these sections up to the divisions I used originally, and go from there.


Abstract

There isn't one in the version I have, which makes sense on account of this paper not proving a particular result.


Introduction

This includes sections 1 and 2. The question is why macrophenomena are difficult to predict. The goal is to find a principle sufficiently general that it can be used for physics questions and for questions of biology and economics. The latter is an example where predictions are very poor, but even physics has difficulties with the relationship between macrophenomena and microphenomena, e.g. lasers. Some key points:

We enunciate a rather basic principle, which might be dismissed as an obvious triviality were it not for the fact that it is not recognized in any of the literature known to this writer:

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter's control are very likely not to be reproduced, and therefore are very likely not to be relevant.


Body

This includes sections 3-6. Skipped.


Conclusion

The conclusion clarifies the relationship between this idea and what is currently (as of 1996) being done on similar problems.

Return to Body

Picking back up with section 3, and carrying through.

Gibbs' variational principle is, therefore, so simple in rationale that one almost hesitates to utter such a triviality; it says "predict that final state that can be realized by Nature in the greatest number of ways, while agreeing with your macroscopic information."
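As a rough illustration of that counting rule (a toy sketch, not anything from the paper): assume a hypothetical system of 100 coins, where a microstate is the exact face of every coin and a macrostate is the number of heads. The number of ways Nature can realize a given heads count is a binomial coefficient, and the log of that count plays the role of entropy (Boltzmann's S = k_B log W). The function name `predict_macrostate` and the constraint "at most 30 heads" are made up for the example.

```python
from math import comb, log

N_COINS = 100  # hypothetical system size, chosen only for illustration


def multiplicity(n, heads):
    """Number of microstates (exact coin sequences) with this many heads."""
    return comb(n, heads)


def predict_macrostate(n, allowed):
    """Pick the allowed heads count that can be realized in the most ways."""
    return max(allowed, key=lambda heads: multiplicity(n, heads))


# No macroscopic information: every heads count from 0 to 100 is allowed.
unconstrained = predict_macrostate(N_COINS, range(0, N_COINS + 1))

# Assumed macroscopic information: we somehow know there are at most 30 heads.
constrained = predict_macrostate(N_COINS, range(0, 31))

# Entropy in units of k_B for the predicted macrostate.
entropy = log(multiplicity(N_COINS, constrained))

print(unconstrained, constrained, round(entropy, 2))
```

With no constraint the rule picks 50 heads; with the constraint it picks 30, the allowed value closest to 50, because that is where the count of compatible microstates is largest.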

[I haven't figured out how to work LaTeX in this interface, so I am skipping the bulk of the Mathematical Formalism section. It is also freely available in the link above]

So completes reading a Jaynes paper from about an undergraduate level.


Comments sorted by top scores.

comment by ryan_b · 2018-04-19T19:28:00.850Z · score: 27 (7 votes) · LW(p) · GW(p)

Naive Comments

I find this paper pretty inspirational. I've been playing with the intuitions he lays out in the first two sections for days.

It was written in 1996, and I am not altogether sure where the Principle of Maximum Entropy fits in - he uses the phrase 'maximum entropy' a lot. It occurs to me the Principle of Maximum Caliber may have a relationship with MaxEnt similar to that between Gibbs and Clausius' statements of the Second Law of Thermodynamics, but this isn't clear to me in the main because I know almost nothing about MaxEnt.

I was also reading the reply to Francois Chollet where the improvement of AlphaGo Zero over AlphaGo was being given as an example. In thinking about that in relation to this paper, I have two feelings:

1) I notice not a lot of coverage of what AlphaGo Zero was actually doing during the three day training period, and I really should look that up specifically.

2) What I suspect happened is AlphaGo Zero brute-force mapped the "phase-space" of Go for three days. The possible combinations of piece positions (microstates) are computationally intractable, so I read - AlphaGo Zero went to work on a different level of macrophenomena. So given the rules of Go and current position A, and virtually all I, it confidently predicts the winning end-game positions B.

This makes me think that the real trick to good predictions is making the optimal choice of macrophenomena. In the paper Jaynes consistently highlights nuance related to distinguishing his method from statistical mechanics, which makes sense as that is otherwise how people know him. It seems pretty clear to me that his generalizations liberate us from the traditional associations, which opens up a lot of room for new categories of macrophenomena. For example, consider this from the paper:

On a different plane, we feel that we understand the general thinking and economic motivations of the individual people who are the micro-elements of a society; yet millions of those people combine to make a macroeconomic system whose oscillations and unstable behavior, in defiance of equilibrium theory, leave us bewildered.

So we have:

humans (microphenomena) -> economy (macrophenomena)

But suppose we find humans computationally intractable and the macrophenomena of the economy too imprecise. We could add a middle layer of institutions, like firms and governments, which are also made up of humans. So now we have:

humans (microphenomena) -> institutions (macrophenomena)


institutions (microphenomena) -> economy (macrophenomena)

So if it happens that institutions are something you can get a good grip on, no one else will be able to significantly out-predict you about the economy unless they can get a better grip than you on institutions, or they find a new macrophenomenon above humans that they can master comparably well and that contains more information about the economy than institutions do.

Since we are starting from the perspective of macrophenomena, I keep wanting to say resolution. So if we have our microphenomena on the bottom and the macrophenomena at the top, one strategy might be to look 'down' from the macrophenomena and try to identify the lowest-level intermediate phenomenon that can be reasonably computed, and then get a decisive description of that phenomenon before returning to predicting the macrophenomena.

Sort of the same way a Fast Fourier Transform works: by cleverly choosing intermediate steps, we can get to the answer we want faster (or in this case, more accurately).

comment by habryka (habryka4) · 2018-04-20T03:09:58.483Z · score: 12 (3 votes) · LW(p) · GW(p)

To use LaTeX in our editor: Press CTRL+4 (or cmd-4 on a Mac) and you will enter LaTeX mode.

comment by ryan_b · 2018-04-20T12:31:02.115Z · score: 11 (2 votes) · LW(p) · GW(p)

Superb! I will fiddle with this and start adding the key equations over the coming days.

comment by johnswentworth · 2018-04-19T23:55:08.978Z · score: 11 (4 votes) · LW(p) · GW(p)

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter's control are very likely not to be reproduced, and therefore are very likely not to be relevant.

I'm having trouble expressing in words just how useful that is. It clarifies a whole range of questions and topics I think about regularly. Thank you for sharing!

comment by romeostevensit · 2018-04-20T22:47:49.906Z · score: 10 (3 votes) · LW(p) · GW(p)

Extremely useful and conveniently timely for my own work. Thanks for writing this up.

comment by JeremyHussell · 2018-04-21T20:10:51.867Z · score: 8 (4 votes) · LW(p) · GW(p)

Note that this paper was first published in 1985, not 1996. The full source is in a footnote at the bottom of the first page.

comment by ryan_b · 2018-04-22T01:43:03.605Z · score: 6 (2 votes) · LW(p) · GW(p)

Well spotted! This helps with putting the maximum entropy comments in context.

comment by Ben Pace (Benito) · 2018-04-19T22:49:03.239Z · score: 8 (3 votes) · LW(p) · GW(p)

Promoted to frontpage.

comment by ryan_b · 2018-04-19T17:53:52.330Z · score: 7 (3 votes) · LW(p) · GW(p)

A word on the better methods Jaynes refers to: these are [1] and [2] in the References. I actually encountered [2], Truesdell's Rational Thermodynamics, previously as a consequence of this community.

The pitch here is basically tackling thermodynamics from axioms with field equations. In particular Truesdell advocated a method of moments, which is to say keep adding fields of the behavior you are concerned with to refine the answer.

Truesdell was important to the field of Continuum Mechanics, which is still used in engineering. The idea here is that rather than calculating what happens to each particle of a material, the material is treated as a continuum, and then you calculate the change in the particular property you are concerned with. This is an efficient way to get numerical answers about stress, shear, heat conduction, memory effects, tearing, etc. Rational Thermodynamics generalizes the continuum method. They have shown Navier-Stokes to be a special case of the method of moments, although they also demonstrated that adding many more moments does not significantly outperform Navier-Stokes. The suggestion is that it may take hundreds or thousands of moments to get an improvement here, which was impractical at the time Truesdell was writing. However, that was before we had really powerful computing tools for addressing problems like this.

The way I got to Truesdell was through an anonymous blog from a poster on either the old LessWrong or possibly even the joint Overcoming Bias posts, and in that blog they referenced a review Jaynes did of one of Truesdell's papers. Jaynes was impressed that Truesdell had arrived at virtually the same formalism that Jaynes himself had, through purely mathematical means.

This is not important, except for being cool and motivating me to look into Rational Thermodynamics.

comment by Hazard · 2020-05-04T21:34:31.105Z · score: 4 (2 votes) · LW(p) · GW(p)

I liked this paper and summary, and was able to follow most of it except for the actual physics :)

I feel like I missed something important though:

If we are trying to judge B, what's the use of knowing the entropy of state A? The thrust I got was "Give weight to possible B in accordance with their entropy, and somehow constrain that with info from A", but I didn't get a sense of what using A as constraints looked like (I expect that it would make more sense if I could do the physics examples).

comment by ryan_b · 2020-05-05T19:34:22.403Z · score: 4 (2 votes) · LW(p) · GW(p)

I think this is captured in Section 5, the Maximum Caliber Principle:

We are given macroscopic information A which might consist of values of several physical quantities . . . such as distribution of stress, magnetization, concentration of various chemical components, etc. in various space time regions. This defines a caliber . . . which measures the number of time dependent microstates consistent with the information A.

So the idea is that you take the macro information A and use it to identify the space of possible microstates. For maximum rigor you do this independently for A and B, and if they do not share any microstates then B is impossible. When we make a prediction about B, we choose the value of B that has the biggest overlap with the possible microstates of A.
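A toy sketch of that overlap counting, under loud assumptions: this is not Jaynes' formalism, just a made-up system of 8 two-state sites small enough to enumerate. The macro variable (number of 1s in the left half), the permutation used as "dynamics", and the names `macro`, `step` and `predict_counts` are all hypothetical. A is the macro value before the dynamics run, B is the macro value afterwards, and the prediction is the B reached by the most microstates compatible with A.

```python
from collections import Counter
from itertools import product

N_SITES = 8  # hypothetical toy system, small enough to enumerate fully

# Assumed microscopic dynamics: a fixed, reversible shuffle of the sites.
PERM = (4, 0, 5, 1, 6, 2, 7, 3)


def macro(state):
    """Macroscopic variable: number of 1s in the left half of the system."""
    return sum(state[: N_SITES // 2])


def step(state):
    """Apply the assumed dynamics once: new site j takes old site PERM[j]."""
    return tuple(state[i] for i in PERM)


def predict_counts(a_value):
    """For every microstate compatible with A, record which B it leads to."""
    counts = Counter()
    for state in product((0, 1), repeat=N_SITES):
        if macro(state) == a_value:       # microstate agrees with A
            counts[macro(step(state))] += 1  # macro value B after one step
    return counts


counts = predict_counts(4)                 # A: left half starts fully occupied
prediction = max(counts, key=counts.get)   # B with the biggest overlap
print(dict(counts), prediction)
```

Here 16 microstates agree with A; of those, 4 end at B = 2, 8 at B = 3 and 4 at B = 4, so the sketch predicts B = 3. Values of B that never appear in `counts` share no microstates with A and are impossible in this toy model.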

He talks a little bit more about the motivation for doing this in the Conclusion, here:

We should correct a possible misconception that the reader may have gained. Most recent discussions of macrophenomena outside of physical chemistry concentrate entirely on the dynamics (microscopic equations of motion or an assumed dynamical model at a higher level, deterministic or stochastic) and ignore the entropy factors of macrostates altogether. Indeed, we expect that such efforts will succeed fairly well if the macrostates of interest do not differ greatly in entropy.

Emphasis mine. So the idea here is that if you don't need to account for the entropy of A, you will be able to tackle the problem using normal methods. If the normal methods fail, it's a sign that we need to account for the entropy of A, and therefore to use this method.

I can't do the physics examples either except in very simple cases. I am comforted by this line:

Although the mathematical details needed to carry it out can become almost infinitely complicated...

comment by Hazard · 2020-05-05T23:13:22.633Z · score: 4 (2 votes) · LW(p) · GW(p)

Thanks! In my head, I was using the model of "flip 100 coins, exact value of all coins is micro states, heads-tails count is macro state". In that model, the macro states form disjoint sets, so it's probably not a good example.

I think I get your point in abstract, but I'm struggling to form an example model that fits it. Any suggestions?

comment by ryan_b · 2020-06-16T21:05:55.600Z · score: 9 (2 votes) · LW(p) · GW(p)

Apologies for this being late; I also struggled to come up with an example model. Checking the references, he talks about A more thoroughly in the paper where the idea was originally presented.

I strongly recommend taking a look at page 5 of the PDF, which is where he starts a two page section clarifying the meaning of entropy in this context. I think this will help a lot...once I figure it out.