post by ryan_b · 2018-04-19T17:30:39.893Z · score: 38 (9 votes) · LW · GW · 13 comments

## Contents

  Abstract
Introduction
Body
Conclusion


I stumbled across a paper from 1996 on Macroscopic Prediction by E.T. Jaynes which interested me; I thought I would document my reading in the style I recommended in this post. I don't have any expertise in Jaynes' fields, so it will serve as a good check for intuitions. It also may be of historical interest to the community. Lastly, I don't call it an undergraduate reading for nuthin', so it may be directly informational for people with less mathematical or scientific background.

This paper is organized a little differently, with the following sections:

1. INTRODUCTION

2. HISTORICAL BACKGROUND

3. THE BASIC IDEA

4. MATHEMATICAL FORMALISM

5. THE MAXIMUM CALIBER PRINCIPLE

6. BUBBLE DYNAMICS

7. CONCLUSION

8. REFERENCES

I will match these sections up to the divisions I used originally, and go from there.

Abstract

There isn't one in the version I have, which makes sense on account of this paper not proving a particular result.

Introduction

This includes sections 1 and 2. The question is why macrophenomena are difficult to predict. The goal is to find a principle sufficiently general that it can be used for physics questions and for questions of biology and economics. Economics is an example where predictions are very poor, but even physics has difficulties with the relationship between macrophenomena and microphenomena, e.g. lasers. Some key points:

• Microphenomena and macrophenomena are defined in relation to each other: in general, understanding the elements that make up something else is insufficient for understanding the something else. An additional principle is needed.
• Jaynes argues that the Gibbs entropy from thermodynamics is that principle.
• Statistical mechanics does not work for this because its theorems treat the microstate as getting close to all possible states allowed by the total energy. This does not match observations, e.g. solids and organisms.
• Over human-relevant timescales, we see far fewer configurations of macrophenomena than allowed by their energy.
• Given information about macroscopic quantities A, other relevant information I, what can we say about other macroscopic quantities B?
• Not enough information for deduction, therefore inference
• Carnot -> Kelvin -> Clausius
• Clausius' statement of the Second Law of Thermodynamics gives little information about future macrostates, and only says entropy trends toward increasing. Intermediate states undefined.
• Enter Gibbs, with a variational principle for determining the final equilibrium state.
• Nobody seems to have noticed until G. N. Lewis in 1923, 50 years later.
• 50 years after G. N. Lewis, Jaynes thought we had only about half of the available insight from Gibbs.
• This is probably because Gibbs died young, without time for expository work and students to carry on. Therefore re-discovery was necessary.
• A quote:

We enunciate a rather basic principle, which might be dismissed as an obvious triviality were it not for the fact that it is not recognized in any of the literature known to this writer:

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter's control are very likely not to be reproduced, and therefore are very likely not to be relevant.

• Control of a few macroscopic quantities is often enough for a reproducible macroscopic result, e.g. heat conduction, viscous laminar flow, shockwaves, lasers.
• DNA determines most things about the organism; this is highly reproducible; should be predictable.
• We should expect that progress since Clausius deals with how to recognize and deal with I. Gibbs does this. Physics remains stuck with Clausius' formulation, despite better alternatives being available. [See a bit more on this in the comments]
• Physical chemists have used Gibbs through G.N. Lewis for a long time, but rule-of-thumb extensions to cover non-equilibrium cases are numerous and unsatisfactory.

Body

This includes sections 3-6. Skipped.

Conclusion

The conclusion clarifies the relationship between this idea and what is currently (as of 1996) being done on similar problems.

• Possible misconception: recent work on macrostates is about dynamics, like microscopic equations of motion or higher-level dynamical models; they ignore entropy.
• If the macrostates differ little in entropy, then entropy-less solutions are expected to be successful. Areas where they do not work are a good candidate for this entropy method.
• It is expected that dynamics will reappear automatically when using the entropy method on realistic problems, through the Heisenberg operator.

Picking back up with section 3, and carrying through.

• First thought: the macrostate is only a projection of the microstate with less detail, ergo microbehavior determines macrobehavior. There are no other considerations.
• This is wrong. We have to consider that we never know the microstate, only about the macrostate.
• Reproducibility means that should be enough, if we can use the information right.
• Gibbs and Heterogeneous Equilibrium: given a few macrovariables in non-equilibrium, predict the final equilibrium macrostate.
• To solve this, Gibbs made the Second Law a stronger statement: entropy will increase, to the maximum allowed by experimental conditions and conservation laws.
• This makes the Second Law weaker than conservation laws: there are microstates allowed by the data for which the system will not go to the macrostate of maximum entropy.
• If reproducible, then Gibbs' rule predicts quantitatively.
• Entropy is only a property of the macrostate. Unfortunately, Gibbs did not elucidate entropy itself.
• From Boltzmann, Einstein, and Planck: the thermodynamic entropy is basically the logarithm of the phase volume; the number of ways it can be realized.
• Quote:

Gibbs' variational principle is, therefore, so simple in rationale that one almost hesitates to utter such a triviality; it says "predict that final state that can be realized by Nature in the greatest number of ways, while agreeing with your macroscopic information."

• Generalizes: predict the behavior that can happen in the greatest number of ways, while agreeing with whatever information you have.
• From simplicity, generality. Then Jaynes chides scientists for demanding complexity to accept things.
• Reproducibility means that we have all the required information.
• Macrostate information A means some class of microstates C, the majority of which have to agree for reproducibility to happen.
• A subset of microstates in C would not lead to the predicted result, therefore it is inference rather than deduction.
• In thermodynamics a small increase in the entropy of a macrostate leads to an enormous increase in the number of ways to realize it; this is why Gibbs' rule works.
• We cannot expect as large a ratio in other fields, but that is not necessary to be useful and can be compensated for with more information.
• The information is useful insofar as it shrinks class C; how useful is how much entropy reduction it achieves.
• We need to locate C and determine which macrostate is consistent with most of them. Enter probability theory.
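The counting rule in the bullets above can be sketched with a toy model. This is only an illustration under my own assumptions, not anything from the paper: N particles that can each sit in the left or right half of a box, where the macrostate is the number in the left half and the multiplicity is a binomial coefficient. The function names are hypothetical.

```python
from math import comb, log


def multiplicity(n_particles, n_left):
    """Number of microstates with exactly n_left particles in the left half."""
    return comb(n_particles, n_left)


def predict_macrostate(n_particles, allowed):
    """Gibbs-style rule: among the macrostates allowed by our information,
    pick the one that can be realized in the greatest number of ways."""
    return max(allowed, key=lambda n_left: multiplicity(n_particles, n_left))


def log_ratio(n_particles, frac_a, frac_b):
    """log of W(frac_a) / W(frac_b), where each frac is the share of
    particles in the left half."""
    w_a = multiplicity(n_particles, round(frac_a * n_particles))
    w_b = multiplicity(n_particles, round(frac_b * n_particles))
    return log(w_a) - log(w_b)


if __name__ == "__main__":
    # With no constraint, the even split wins.
    print(predict_macrostate(100, range(0, 101)))
    # If our information says at most 30 particles are on the left,
    # the prediction moves to the edge of what is allowed.
    print(predict_macrostate(100, range(0, 31)))
    # The same 50/50 versus 40/60 comparison becomes far more lopsided
    # as the system gets larger.
    for n in (100, 1000, 10000):
        print(n, log_ratio(n, 0.5, 0.4))
```

The last loop is the point of the bullet about thermodynamics: the log of the ratio grows roughly in proportion to N, so for anything macroscopic the most-ways macrostate wins by an enormous margin, while in small systems the margin is modest and extra information matters more.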

[I haven't figured out how to work LaTeX in this interface, so I am skipping the bulk of the Mathematical Formalism section. It is also freely available in the link above]

• We use probability distributions over microstates. This being the early stages of the Bayesian Wars, obligatory frequentism sux.
• Asymptotic equipartition theorem of information theory, using von Neumann-Shannon information entropy from quantum theory.
• From experimentation, we see W = exp(H) is valid.
• Equilibrium statistical mechanics is contained in the rule as a special case.
• There is a problem of "induction time"; if all we have is t = 0, then our predictions are already as good as possible.
• Real phenomena have already persisted from some time in the past, so induction time problem is resolved. Values of A at t != 0 improve predictions.
• This motivates the interpretation of probabilities.
• The probability density matrix, with maximum entropy, for one moment of time, assigns equal probability to every compatible state regardless of history.
• Fading memory effects are characteristic of irreversible processes; behavior depends on history.
• There is an extension to allow time-dependent information; this includes the dreaded "extended in the obvious way."
• At this point if you use regular thermodynamic parameters, the Clausius experimental entropy falls out.
• Maximum information entropy as a function of the entire space-time history of the macroscopic process: the caliber.
• There's a Maximum Caliber Principle which flat defeats me because I don't know anything about the Fokker-Planck and Onsager work it makes reference to.
• In the Bubble Dynamics section they offer a sketch of using short term memory effects.
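The W = exp(H) relation in the list above can be checked numerically in a classical stand-in. This is a sketch under my own assumptions: the paper works with quantum states and the von Neumann entropy, whereas here I use N biased coin flips, count the sequences with the typical number of heads, and compare the log of that count per flip with the Shannon entropy of a single flip. The function names are hypothetical.

```python
from math import comb, log


def shannon_entropy(probabilities):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * log(p) for p in probabilities if p > 0)


def log_multiplicity_per_flip(n_flips, p_heads):
    """log(W) / N, where W counts the sequences of n_flips coins
    containing the typical number of heads, round(n_flips * p_heads)."""
    n_heads = round(n_flips * p_heads)
    return log(comb(n_flips, n_heads)) / n_flips


if __name__ == "__main__":
    p = 0.3
    h = shannon_entropy([p, 1 - p])
    print("entropy per flip:", h)
    # The per-flip log-count approaches H as N grows, i.e. W ~ exp(N * H).
    for n in (100, 1000, 10000):
        print(n, log_multiplicity_per_flip(n, p))
```

The gap between the two numbers shrinks as N grows, which is the sense in which the entropy measures the number of ways a macrostate can be realized.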

comment by ryan_b · 2018-04-19T19:28:00.850Z · score: 27 (7 votes) · LW(p) · GW(p)

I find this paper pretty inspirational. I've been playing with the intuitions he lays out in the first two sections for days.

It was written in 1996, and I am not altogether sure where the Principle of Maximum Entropy fits in - he uses the phrase 'maximum entropy' a lot. It occurs to me the Principle of Maximum Caliber may have a relationship with MaxEnt similar to that between Gibbs and Clausius' statements of the Second Law of Thermodynamics, but this isn't clear to me in the main because I know almost nothing about MaxEnt.

I was also reading the reply to Francois Chollet where the improvement of AlphaGo Zero over AlphaGo was being given as an example. In thinking about that in relation to this paper, I have two feelings:

1) I notice not a lot of coverage of what AlphaGo Zero was actually doing during the three day training period, and I really should look that up specifically.

2) What I suspect happened is AlphaGo Zero brute-force mapped the "phase-space" of Go for three days. The possible combinations of piece positions (microstates) are computationally intractable, so I read - AlphaGo Zero went to work on a different level of macrophenomena. So given the rules of Go and current position A, and virtually all I, it confidently predicts the winning end-game positions B.

This makes me think that the real trick to good predictions is making the optimal choice of macrophenomena. In the paper Jaynes consistently highlights nuance related to distinguishing his method from statistical mechanics, which makes sense as that is otherwise how people know him. It seems pretty clear to me that his generalizations liberate us from the traditional associations, which opens up a lot of room for new categories of macrophenomena. For example, consider this from the paper:

On a different plane, we feel that we understand the general thinking and economic motivations of the individual people who are the micro-elements of a society; yet millions of those people combine to make a macroeconomic system whose oscillations and unstable behavior, in defiance of equilibrium theory, leave us bewildered.

So we have:

humans (microphenomena) -> economy (macrophenomena)

But suppose we find humans computationally intractable and the macrophenomena of the economy too imprecise. We could add a middle layer of institutions, like firms and governments, which are also made up of humans. So now we have:

humans (microphenomena) -> institutions (macrophenomena)

AND

institutions (microphenomena) -> economy (macrophenomena)

So if it happens that institutions are something you can get a good grip on, no one else will be able to significantly out-predict you about the economy unless they can get a better grip than you on institutions, or they find a new macrophenomenon above humans that they can master comparably well and that contains more information about the economy than institutions do.

Since we are starting from the perspective of macrophenomena, I keep wanting to say resolution. So if we have our microphenomena on the bottom and the macrophenomena at the top, one strategy might be to look 'down' from the macrophenomena and try to identify the lowest-level intermediate phenomena that can be reasonably computed, and then get a decisive description of those phenomena before returning to predicting the macrophenomena.

Sort of the same way a Fast Fourier Transform works; by cleverly choosing intermediate steps, we can get to the answer we want faster (or in this case, more accurately).

comment by habryka (habryka4) · 2018-04-20T03:09:58.483Z · score: 12 (3 votes) · LW(p) · GW(p)

To use LaTeX in our editor: Press CTRL+4 (or cmd-4 on a Mac) and you will enter LaTeX mode.

comment by ryan_b · 2018-04-20T12:31:02.115Z · score: 11 (2 votes) · LW(p) · GW(p)

Superb! I will fiddle with this and start adding the key equations over the coming days.

comment by johnswentworth · 2018-04-19T23:55:08.978Z · score: 11 (4 votes) · LW(p) · GW(p)

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter's control are very likely not to be reproduced, and therefore are very likely not to be relevant.

I'm having trouble expressing in words just how useful that is. It clarifies a whole range of questions and topics I think about regularly. Thank you for sharing!

comment by romeostevensit · 2018-04-20T22:47:49.906Z · score: 10 (3 votes) · LW(p) · GW(p)

Extremely useful and conveniently timely for my own work. Thanks for writing this up.

comment by JeremyHussell · 2018-04-21T20:10:51.867Z · score: 8 (4 votes) · LW(p) · GW(p)

Note that this paper was first published in 1985, not 1996. The full source is in a footnote at the bottom of the first page.

comment by ryan_b · 2018-04-22T01:43:03.605Z · score: 6 (2 votes) · LW(p) · GW(p)

Well spotted! This helps with putting the maximum entropy comments in context.

comment by Ben Pace (Benito) · 2018-04-19T22:49:03.239Z · score: 8 (3 votes) · LW(p) · GW(p)

Promoted to frontpage.

comment by ryan_b · 2018-04-19T17:53:52.330Z · score: 7 (3 votes) · LW(p) · GW(p)

A word on the better methods Jaynes refers to: these are [1] and [2] in the References. I actually encountered [2], Truesdell's Rational Thermodynamics, previously as a consequence of this community.

The pitch here is basically tackling thermodynamics from axioms with field equations. In particular Truesdell advocated a method of moments, which is to say keep adding fields of the behavior you are concerned with to refine the answer.

Truesdell was important to the field of Continuum Mechanics, which is still used in engineering. The idea here is that rather than calculating what happens to each particle of a material, the material is treated as a continuum, and then you calculate the change in the particular property you are concerned with. This is an efficient way to get numerical answers about stress, shear, heat conduction, memory effects, tearing, etc. Rational Thermodynamics generalizes the continuum method. They have proved Navier-Stokes as a special case of the method of moments, although they also demonstrated that adding many more moments does not significantly outperform Navier-Stokes. The suggestion is that it may take hundreds or thousands of moments to get an improvement here, which was impractical at the time Truesdell was writing. However, that was before we had really powerful computing tools for addressing problems like this.

The way I got to Truesdell was through an anonymous blog from a poster on either the old LessWrong or possibly even the joint Overcoming Bias posts, and in that blog they referenced a review Jaynes did of one of Truesdell's papers. Jaynes was impressed that Truesdell had arrived at virtually the same formalism that Jaynes himself had, through purely mathematical means.

This is not important, except for being cool and motivating me to look into Rational Thermodynamics.

comment by Hazard · 2020-05-04T21:34:31.105Z · score: 4 (2 votes) · LW(p) · GW(p)

I liked this paper and summary, and was able to follow most of it except for the actual physics :)

I feel like I missed something important though:

If we are trying to judge B, what's the use of knowing the entropy of state A? The thrust I got was "Give weight to possible B in accordance with their entropy, and somehow constrain that with info from A", but I didn't get a sense of what using A as constraints looked like (I expect that it would make more sense if I could do the physics examples).

comment by ryan_b · 2020-05-05T19:34:22.403Z · score: 4 (2 votes) · LW(p) · GW(p)

I think this is captured in Section 5, the Maximum Caliber Principle:

We are given macroscopic information A which might consist of values of several physical quantities . . . such as distribution of stress, magnetization, concentration of various chemical components, etc. in various space time regions. This defines a caliber . . . which measures the number of time dependent microstates consistent with the information A.

So the idea is that you take the macro information A, use that to identify the space of possible microstates. For maximum rigor you do this independently for A and B, and if they do not share any microstates then B is impossible. When we make a prediction about B, we choose the value of B that has the biggest overlap with the possible microstates of A.

He talks a little bit more about the motivation for doing this in the Conclusion, here:

We should correct a possible misconception that the reader may have gained. Most recent discussions of macrophenomena outside of physical chemistry concentrate entirely on the dynamics (microscopic equations of motion or an assumed dynamical model at a higher level, deterministic or stochastic) and ignore the entropy factors of macrostates altogether. Indeed, we expect that such efforts will succeed fairly well if the macrostates of interest do not differ greatly in entropy.

Emphasis mine. So the idea here is that if you don't need to account for the entropy of A, you will be able to tackle the problem using normal methods. If the normal methods fail, it's a sign that we need to account for the entropy of A, and therefore to use this method.

I can't do the physics examples either except in very simple cases. I am comforted by this line:

Although the mathematical details needed to carry it out can become almost infinitely complicated...

comment by Hazard · 2020-05-05T23:13:22.633Z · score: 4 (2 votes) · LW(p) · GW(p)

Thanks! In my head, I was using the model of "flip 100 coins, exact value of all coins is micro states, heads-tails count is macro state". In that model, the macro states form disjoint sets, so it's probably not a good example.

I think I get your point in abstract, but I'm struggling to form an example model that fits it. Any suggestions?

comment by ryan_b · 2020-06-16T21:05:55.600Z · score: 9 (2 votes) · LW(p) · GW(p)

Apologies for this being late; I also struggled to come up with an example model. Checking the references, he talks about A more thoroughly in the paper where the idea was originally presented.

I strongly recommend taking a look at page 5 of the PDF, which is where he starts a two page section clarifying the meaning of entropy in this context. I think this will help a lot...once I figure it out.
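In the meantime, here is one toy model that may fit the "biggest overlap" description earlier in this thread. It is a sketch only, and every detail is an assumption for illustration rather than something from the paper: 12 coins, where A is the number of heads among the first 6 coins and B is the total number of heads. Unlike the plain heads-tails count, the microstate sets for A and B are not disjoint, so we can count overlaps directly.

```python
from collections import Counter
from itertools import product

N_COINS = 12     # hypothetical toy size, small enough to enumerate
N_OBSERVED = 6   # A only looks at the first 6 coins


def macro_a(micro):
    """Macroscopic information A: heads among the observed coins."""
    return sum(micro[:N_OBSERVED])


def macro_b(micro):
    """Macroscopic quantity B we want to predict: total heads."""
    return sum(micro)


def overlap_counts(a_value):
    """For each value of B, count the microstates that are consistent
    with both A = a_value and that value of B."""
    counts = Counter()
    for micro in product((0, 1), repeat=N_COINS):
        if macro_a(micro) == a_value:
            counts[macro_b(micro)] += 1
    return counts


def predict_b(a_value):
    """Predict the B with the biggest overlap with the microstates of A."""
    counts = overlap_counts(a_value)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    counts = overlap_counts(5)
    for b_value in sorted(counts):
        print(b_value, counts[b_value])
    print("prediction for B given A = 5:", predict_b(5))
```

Values of B that share no microstates with A (for example B = 2 when A = 5) never show up in the count, which is the "B is impossible" case. Among the rest, the prediction is the B that the unobserved coins can produce in the most ways.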