# [LINK] The Bayesian Second Law of Thermodynamics

post by shminux · 2015-08-12T16:52:48.556Z · score: 8 (9 votes) · LW · GW · Legacy · 16 comments

Sean Carroll et al. posted a preprint with the above title. Sean also has a discussion of it in his blog.

While I am a physicist by training, statistical mechanics and thermodynamics are not my strong suit, and I hope someone with expertise in the area can give their perspective on the paper. For now, here is my summary, with apologies for any errors:

There is a tension between different definitions of entropy: Boltzmann entropy, which counts macroscopically indistinguishable microstates, always increases, except for extremely rare decreases. Gibbs/Shannon entropy, which counts our knowledge of a system, can decrease if an observer examines the system and learns something new about it. Jaynes had a paper on that topic, Eliezer discussed this in the Sequences, and spxtr recently wrote a post about it. Now Carroll and collaborators propose the "Bayesian Second Law" that quantifies this decrease in Gibbs/Shannon entropy due to a measurement:

[...] we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times. That relationship makes use of the cross entropy between two distributions [...]

[...] the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)

This last point seems to resolve the tension between the two definitions of entropy, and has applications to non-equilibrium processes, where an observer is replaced with an outcome of some natural process, such as RNA self-assembly.
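The cross entropy in the quoted passage can be illustrated on a toy discrete system. The sketch below is not taken from the paper: the four-state system, the prior, the likelihood, and the function names are all assumptions chosen for illustration. It performs a Bayesian update and then computes the cross entropy of the updated distribution with respect to the un-updated one, i.e. the average surprise of someone who keeps using the un-updated distribution while the state actually follows the updated one.

```python
import math


def shannon_entropy(p):
    """Shannon entropy -sum p_i ln p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy -sum p_i ln q_i: average surprisal under q when states follow p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def bayes_update(prior, likelihood):
    """Posterior proportional to prior times the likelihood of the observed outcome."""
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical four-state system with a non-uniform prior (un-updated distribution).
prior = [0.4, 0.3, 0.2, 0.1]
# Hypothetical measurement: probability of the observed outcome in each state.
likelihood = [0.9, 0.5, 0.1, 0.1]

posterior = bayes_update(prior, likelihood)

h_prior = shannon_entropy(prior)
h_post = shannon_entropy(posterior)
h_cross = cross_entropy(posterior, prior)

print(f"H(prior)            = {h_prior:.4f} nats")
print(f"H(posterior)        = {h_post:.4f} nats")
print(f"H(posterior, prior) = {h_cross:.4f} nats")
```

By Gibbs' inequality the cross entropy is never smaller than the Shannon entropy of the updated distribution; the gap between them is the relative entropy (KL divergence) between the updated and un-updated distributions.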

## 16 comments


There is so much confusion surrounding the topic of entropy. Which is somewhat sad, since it's fundamentally a very well-defined and useful concept. Entropy *is* my strong suit, and I'll try to see if I can help.

There are no 'different definitions' of entropy. Boltzmann and Shannon Entropy are the same concept. The problem is that information theory by itself doesn't give the complete *physical* picture of entropy. Shannon entropy only tells you what the entropy of a given distribution is, but it doesn't tell you what the distribution of states for a physical system is. This is the root of the 'tension' that you're describing. Many of the problems in reconciling information theory with statistical mechanics arise because we often don't have a clear idea what the distribution of states of a given system is.

which counts macroscopically indistinguishable microstates always increases, except for extremely rare decreases.

The 2nd law is never violated, not even a little. Unfortunately the idea that entropy itself can decrease in a closed system is a misconception which has become very widespread. *Disorder* can sometimes decrease in a closed system, but disorder has nothing to do with entropy!

Gibbs/Shannon entropy, which counts our knowledge of a system, can decrease if an observer examines the system and learns something new about it.

This is exactly the same as Boltzmann entropy. This is the origin of Maxwell's demon, and it doesn't violate the 2nd law.

the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)

This is precisely correct and is the proper way to view entropy. Ideas similar to this have been floating around for quite some time and this work doesn't seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However if it can help people understand entropy then I think it's a quite valuable rephrasing.

I was thinking about writing a series of blog posts explaining entropy in a rigorous yet simple way, and got to the draft level before real-world commitments caught up with me. But if anyone is interested and knows about the subject and is willing to offer their time to proofread, I'm willing to have a go at it again.

this work doesn't seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However if it can help people understand entropy then I think it's a quite valuable rephrasing.

Sean Carroll seems to think otherwise, judging by the abstract:

We derive a generalization of the Second Law of Thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution.

[...]

We also derive refined versions of the Second Law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of the Jarzynski equality.

This seems to imply that this is a genuine research result, not just a didactic exposition. Do you disagree?

The 2nd law is never violated, not even a little. Unfortunately the idea that entropy itself can decrease in a closed system is a misconception which has become very widespread. Disorder can sometimes decrease in a closed system, but disorder has nothing to do with entropy!

Could you elaborate on this further? Order implies regularity which implies information that I can burn to extract useful work. I think I agree with you, but I'm not sure that I understand all the implications of what I'm agreeing with.

A simple example is that in a closed container filled with gas it's possible for all the gas molecules to spontaneously move to one side of the container. This temporarily increases the order but has nothing to do with entropy.

I think you're ignoring the difference between the Boltzmann and Gibbs entropy, both here and in your original comment. This is going to be long, so I apologize in advance.

Gibbs entropy is a property of ensembles, so it doesn't change when there is a spontaneous fluctuation towards order of the type you describe. As long as the gross constraints on the system remain the same, the ensemble remains the same, so the Gibbs entropy doesn't change. And it is the Gibbs entropy that is most straightforwardly associated with the Shannon entropy. If you interpret the ensemble as a probability distribution over phase space, then the Gibbs entropy of the ensemble is just the Shannon entropy of the distribution (ignoring some irrelevant and anachronistic constant factors). Everything you've said in your comments is perfectly correct, if we're talking about Gibbs entropy.

Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it's true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere. But the converse isn't true. If you're given a generic ensemble or distribution over phase space and also some partition of phase space into regions, it need not be the case that the Shannon entropy of the distribution is identical to the Boltzmann entropy of *any* of the regions.

So I don't think it's accurate to say that Boltzmann and Shannon entropy are the same concept. Gibbs and Shannon entropy are the same, yes, but Boltzmann entropy is a less general concept. Even if you interpret Boltzmann entropy as a property of distributions, it is only identical to the Shannon entropy for a subset of possible distributions, those that are uniform in some region and zero elsewhere.
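The relationship described above can be checked on a toy discretized phase space. This is a sketch under stated assumptions: the 8-cell phase space, the two-region partition, and the example distributions are hypothetical, and Boltzmann's constant is set to 1.

```python
import math


def shannon_entropy(p):
    """Shannon entropy -sum p_i ln p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def boltzmann_entropy(region):
    """ln(number of cells in the region), with k_B = 1."""
    return math.log(len(region))


def uniform_on(region, n_cells):
    """Distribution that is uniform on the region and zero elsewhere."""
    return [1.0 / len(region) if i in region else 0.0 for i in range(n_cells)]


N_CELLS = 8
# Hypothetical partition of the cells into two Boltzmann macrostates.
partition = [{0, 1}, {2, 3, 4, 5, 6, 7}]

# For a distribution uniform on a region, the two entropies coincide.
for region in partition:
    p = uniform_on(region, N_CELLS)
    print(f"region size {len(region)}: "
          f"Boltzmann = {boltzmann_entropy(region):.4f}, "
          f"Shannon = {shannon_entropy(p):.4f}")

# A generic (non-uniform) distribution need not match any region's entropy.
generic = [0.5, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]
h_generic = shannon_entropy(generic)
print(f"generic distribution: Shannon = {h_generic:.4f}")
```

The generic distribution's Shannon entropy falls strictly between the Boltzmann entropies of the two regions in this partition, so it equals neither of them.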

As for the question of whether Boltzmann entropy can decrease spontaneously in a closed system -- it really depends on how you partition phase space into Boltzmann macro-states (which are just regions of phase space, as opposed to Gibbs macro-states, which are ensembles). If you define the regions in terms of the gross experimental constraints on the system (e.g. the volume of the container, the external pressure, the external energy function, etc.), then it will indeed be true that the Boltzmann entropy can't change without some change in the experimental constraints. Trivially true, in fact. As long as the constraints remain constant, the system remains within the same Boltzmann macro-state, and so the Boltzmann entropy must remain the same.

However, this wasn't how Boltzmann himself envisioned the partitioning of phase space. In his original "counting argument" he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints. So from his point of view, the particles all being scrunched up in one corner of the container is not the same macro-state as the particles being uniformly spread throughout the container. It is a macro-state (region) of smaller volume, and therefore of lower Boltzmann entropy. So if you partition phase space in this manner, the entropy of a closed system *can* decrease spontaneously. It's just enormously unlikely. It's worth noting that subsequent work in the Boltzmannian tradition, ranging from the Ehrenfests to Penrose, has more or less adopted Boltzmann's method of delineating macrostates in terms of the collective properties of the particles, rather than the external constraints on the system.
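A toy version of this partitioning makes "enormously unlikely" concrete. The sketch below assumes distinguishable particles, each independently equally likely to be in the left or right half of the container, and defines a macrostate only by how many particles are on the left. The particle number and function names are illustrative, and Boltzmann's constant is set to 1.

```python
import math


def multiplicity(n_particles, n_left):
    """Number of left/right assignments with exactly n_left particles on the left."""
    return math.comb(n_particles, n_left)


def boltzmann_entropy(n_particles, n_left):
    """ln of the macrostate's multiplicity, with k_B = 1."""
    return math.log(multiplicity(n_particles, n_left))


def macrostate_probability(n_particles, n_left):
    """Probability of the macrostate when every assignment is equally likely."""
    return multiplicity(n_particles, n_left) / 2 ** n_particles


N = 100  # hypothetical particle number, tiny compared with a real gas
for n_left in (50, 75, 100):
    print(f"n_left = {n_left:3d}: "
          f"S = {boltzmann_entropy(N, n_left):7.3f}, "
          f"P = {macrostate_probability(N, n_left):.3e}")
```

With all particles on one side the multiplicity is 1 and the Boltzmann entropy is zero, while the probability of that macrostate is 2^-N, which shrinks exponentially with the particle number.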

Boltzmann's manner of talking about entropy and macro-states seems necessary if you want to talk about the entropy of the universe as a whole increasing, which is something Carroll definitely wants to talk about. The increase in the entropy of the universe is a consequence of spontaneous changes in the configuration of its constituent particles, not a consequence of changing external constraints (unless you count the expansion of the universe, but that is not enough to fully account for the change in entropy on Carroll's view).

This is going to be a somewhat technical reply, but here goes anyway.

Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it's true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere.

You cannot calculate the Shannon entropy of a continuous distribution so this doesn't make sense. However I see what you're getting at here - if we assume that all parts of the phase space have equal probability of being visited, then the 'size' of the phase space can be taken as proportional to the 'number' of microstates (this is studied under ergodic theory). But to make this argument work for actual physical systems where we want to calculate real quantities from theoretical considerations, the phase space must be 'discretized' in some way. A very simple way of doing this is the Sackur-Tetrode formulation which discretizes a continuous space based on the Heisenberg uncertainty principle ('discretize' is the best word I can come up with here -- what I mean is not listing the microstates but instead giving the volume of the phase space in terms of some definite elementary volume). But there's a catch here. To be able to use the HUP, you have to formulate the phase space in terms of complementary parameters. For instance, momentum+position, or time+energy.
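For reference, the Sackur-Tetrode expression for a monatomic ideal gas can be evaluated directly. The sketch below uses the standard form written in terms of the thermal de Broglie wavelength, where Planck's constant supplies the elementary phase-space volume. The choice of helium at 298.15 K and 1 bar is an assumption made for illustration, as are the function and variable names.

```python
import math

K_B = 1.380649e-23               # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34        # Planck constant, J s
N_A = 6.02214076e23              # Avogadro constant, 1/mol
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def sackur_tetrode_per_particle(mass_kg, temperature_k, pressure_pa):
    """Dimensionless entropy per particle S/(N k_B) of a monatomic ideal gas."""
    # Thermal de Broglie wavelength sets the size of the elementary cell.
    thermal_wavelength = H_PLANCK / math.sqrt(
        2.0 * math.pi * mass_kg * K_B * temperature_k
    )
    # Ideal gas law gives the volume available per particle.
    volume_per_particle = K_B * temperature_k / pressure_pa
    return math.log(volume_per_particle / thermal_wavelength ** 3) + 2.5


# Illustrative choice: helium-4 near room temperature at 1 bar.
helium_mass = 4.002602 * ATOMIC_MASS_UNIT
s_per_particle = sackur_tetrode_per_particle(helium_mass, 298.15, 1.0e5)
molar_entropy = s_per_particle * K_B * N_A

print(f"S/(N k_B) = {s_per_particle:.3f}")
print(f"molar entropy = {molar_entropy:.2f} J/(mol K)")
```

For these inputs the result is about 126 J/(mol K). The entropy is finite only because the phase-space volume is measured in units of a definite elementary volume; letting that unit go to zero makes the logarithm diverge.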

However, this wasn't how Boltzmann himself envisioned the partitioning of phase space. In his original "counting argument" he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints.

My previous point illustrates why this naive view is not physical - you can't discretize just any kind of system. With some systems - like a box full of particles that can have arbitrary position and momentum - you get infinite (non-physical) values for entropy. It's easy to see why you can now get a fluctuation in entropy - infinity 'minus' some number is still infinity!

I tried re-wording this argument several times but I'm still not satisfied with my attempt at explaining it. Nevertheless, this is how it is. Looking at entropy based on models of collective properties of particles may be interesting theoretically but it may not always be a physically realistic way of calculating the entropy of the system. If you go through something like the Sackur-Tetrode way, though, you see that Boltzmann entropy is the same thing as Shannon entropy.

Boltzmann's original combinatorial argument already presumed a discretization of phase space, derived from a discretization of single-molecule phase space, so we don't need to incorporate quantum considerations to "fix" it. The combinatorics relies on dividing single-particle state space into tiny discrete boxes, then looking at the number of different ways in which particles could be distributed among those boxes, and observing that there are more ways for the particles to be spread out evenly among the boxes than for them to be clustered. Without discretization the entire argument collapses, since no more than one particle would be able to occupy any particular "box", so clustering would be impossible.
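The counting step can be sketched in a few lines. This is an illustration of the combinatorics only, not a reconstruction of Boltzmann's own calculation: it assumes distinguishable particles, and the box count and occupation numbers are made up.

```python
import math


def arrangements(occupancy):
    """Ways to place distinguishable particles into boxes with the given
    occupation numbers: N! / (n_1! n_2! ... n_m!)."""
    count = math.factorial(sum(occupancy))
    for n_i in occupancy:
        # Each intermediate quotient is an integer, so floor division is exact.
        count //= math.factorial(n_i)
    return count


# Hypothetical example: 12 particles distributed over 4 single-particle boxes.
spread_evenly = [3, 3, 3, 3]
half_clustered = [6, 6, 0, 0]
fully_clustered = [12, 0, 0, 0]

for occupancy in (spread_evenly, half_clustered, fully_clustered):
    w = arrangements(occupancy)
    print(f"{occupancy}: W = {w}, ln W = {math.log(w):.3f}")
```

Even with only 12 particles the evenly spread occupation has several hundred thousand arrangements while the fully clustered one has exactly one, and the gap grows rapidly with particle number.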

So Boltzmann *did* successfully discretize a box full of particles with arbitrary position and momentum, and using his discretization he derived (discrete approximations of) the Maxwell-Boltzmann distribution and the Boltzmann formula for entropy. And he did all this without invoking (or, indeed, being aware of) quantum considerations. So the Sackur-Tetrode route is not a requirement for a discretized Boltzmann-esque argument. I guess you could argue that in the absence of quantum considerations there is no way to justify the discretization, but I don't see why not. The discretization need not be interpreted as ontological, emerging from the Uncertainty Principle. It could be interpreted as merely epistemological, a reflection of limits to our abilities of observation and intervention.

Incidentally, none of these derivations require the assumption of ergodicity in the system. The result that the size of a macrostate in phase space is proportional to the number of microstates emerges purely from the combinatorics, with no assumptions about the system's dynamics (other than that they are Hamiltonian). Ergodicity, or something like it, is only required to establish that the time spent by a system in a particular macrostate is proportional to the size of the macrostate, and that is used to justify probabilistic claims about the system, such as the claim that a closed system observed at an arbitrary time is overwhelmingly likely to be in the macrostate of maximum Boltzmann entropy.

So ultimately, I do think the point Carroll was making is valid. The Boltzmann entropy -- as in, the actual original quantity defined by Boltzmann and refined by the Ehrenfests, not the modified interpretation proposed by people like Jaynes -- is distinct from the Gibbs entropy. The former can increase (or decrease) in a closed system, the latter cannot.

To put it slightly more technically, the Gibbs entropy, being a property of a distribution that evolves according to Hamiltonian laws, is bound to stay constant by Liouville's theorem, unless there is a geometrical change in the accessible phase space or we apply some coarse-graining procedure. Boltzmann entropy, being a property of macrostates, not of distributions, is not bound by Liouville's theorem. Even if you interpret the Boltzmann entropy as a property of a distribution, it is not a distribution that evolves in a Hamiltonian manner. It evolves discontinuously when the system moves from one Boltzmann macrostate to the next.
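A discrete analogue may help here. The sketch below is an assumption-laden toy, not Hamiltonian dynamics: reversible evolution is modeled as a permutation of 16 microstates (the discrete counterpart of a volume-preserving flow), and coarse-graining replaces the probabilities inside each cell of 4 states by their average. All names and sizes are illustrative.

```python
import math
import random


def shannon_entropy(p):
    """Shannon entropy -sum p_i ln p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def evolve(p, permutation):
    """Reversible toy dynamics: probability at state i moves to permutation[i]."""
    out = [0.0] * len(p)
    for i, target in enumerate(permutation):
        out[target] = p[i]
    return out


def coarse_grain(p, cell_size):
    """Replace the probabilities inside each cell by the cell average."""
    out = []
    for start in range(0, len(p), cell_size):
        cell = p[start:start + cell_size]
        average = sum(cell) / len(cell)
        out.extend([average] * len(cell))
    return out


random.seed(0)
n_states = 16
# Start with the probability concentrated uniformly on the first 4 microstates.
p0 = [0.25] * 4 + [0.0] * 12

permutation = list(range(n_states))
random.shuffle(permutation)

p1 = evolve(p0, permutation)
p1_coarse = coarse_grain(p1, 4)

print(f"fine-grained entropy before:  {shannon_entropy(p0):.4f}")
print(f"fine-grained entropy after:   {shannon_entropy(p1):.4f}")
print(f"coarse-grained entropy after: {shannon_entropy(p1_coarse):.4f}")
```

The permutation only relabels where the probability sits, so the fine-grained entropy is unchanged. Averaging within cells can only leave the entropy the same or raise it, which is the sense in which coarse-graining lets a Gibbs-style entropy increase.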

But if I know that all the gas molecules are in one half of the container, then I can move a piston for free and then as the gas expands to fill the container again I can extract useful work. It seems like if I know about this increase in order it definitely constitutes a decrease in entropy.
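The amount of work at stake can be estimated with the textbook result for a reversible isothermal expansion of an ideal gas. The sketch below assumes an ideal gas in contact with a heat bath at 300 K; the temperature, particle numbers, and function name are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
N_A = 6.02214076e23  # Avogadro constant, 1/mol


def isothermal_expansion_work(n_particles, temperature_k, volume_ratio):
    """Work extracted when an ideal gas expands reversibly and isothermally:
    W = N k_B T ln(V_final / V_initial)."""
    return n_particles * K_B * temperature_k * math.log(volume_ratio)


# Gas known to be in one half of the container, expanding to fill all of it.
work_one_particle = isothermal_expansion_work(1, 300.0, 2.0)
work_one_mole = isothermal_expansion_work(N_A, 300.0, 2.0)

print(f"one particle: {work_one_particle:.3e} J")
print(f"one mole:     {work_one_mole:.1f} J")
```

This comes to roughly 2.87e-21 J per particle, or about 1.7 kJ per mole. Dividing by the temperature gives N k_B ln 2, which is the entropy difference between "somewhere in the container" and "somewhere in one known half".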

If you know precisely when this increase in order will occur then your knowledge about the system is necessarily very high and your entropy is necessarily very low (probably close to zero) to begin with.

I feel like this may be a semantics issue. I think that order implies information. To me, saying that a system becomes more ordered implies that I know about the increased order somehow. Under that construction, disorder (i.e. the absence of detectable patterns) is a measure of ignorance and disorder then is closely related to entropy. You may be preserving a distinction between the map and territory (i.e. between the system and our knowledge of the system) that I'm neglecting. I'm not sure which framework is more useful/productive.

I think it's definitely an important distinction to be aware of either way.

'order' is not a well-defined concept. One person's order is another's chaos. Entropy, on the other hand, is a well-defined concept.

Even though entropy depends on the information you have about the system, the *way* that it depends on that is not subjective, and any two observers with the same amount of information about the system must come up with the exact same quantity for entropy.

All of this might seem counter-intuitive at first but it makes sense when you realize that Entropy(system) isn't well-defined, but Entropy(system, model) is precisely defined. The 'model' is what Bayesians would call the prior. It is always there, either implicitly or explicitly.

I'd be interested in this.

Also, I have a background in thermodynamics (took three courses and TAed one course), but from an engineering perspective. I'd be happy to proofread if you want someone to, though my statistical mechanics background is probably weaker than yours.

I'm very surprised E.T. Jaynes isn't cited anywhere in the bibliography, given he wrote tons of articles about that kind of thing.

I'm not a physicist, but aren't this and the linked Quanta article on Prof. England's work bad news (Great Filter-wise)?

If this implies self-assembly is much more common in the universe, then that makes it worse for the latter proposed filters (i.e. EDIT: makes them higher probability)