Thermodynamic entropy = Kolmogorov complexity

post by Aram Ebtekar (EbTech) · 2025-02-17T05:56:06.960Z · LW · GW · 11 comments

This is a link post for https://doi.org/10.1103/PhysRevE.111.014118


Direct PDF link for non-subscribers

Information theory must precede probability theory, and not be based on it. By the
very essence of this discipline, the foundations of information theory have a finite combinatorial character.

-  Andrey Kolmogorov

Many alignment researchers borrow intuitions from thermodynamics: entropy relates to information, which relates to learning and epistemology. These connections were first revealed by Szilárd's resolution of Maxwell's famous thought experiment. However, the classical tools of equilibrium thermodynamics are not ideally suited to studying information processing far from equilibrium.

This new work reframes thermodynamics in terms of the algorithmic entropy. It takes an information-first approach, delaying the introduction of physical concepts such as energy and temperature until after the foundations are set. I find this approach more conceptually principled and elegant than the traditional alternatives.

It's based on a 30-year-old workshop paper, which until now was largely abandoned. Roughly speaking, the algorithmic entropy of a physical state is its Kolmogorov complexity; that is, the length of the shortest program that outputs a precise description of its microscopic configuration (to some bounded precision). This definition does away with probability distributions and macrovariables, and satisfies very general laws!
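Kolmogorov complexity itself is uncomputable, but any lossless compressor yields a computable upper bound on it: a compressed file plus a fixed decompressor is one particular program that outputs the string. A minimal illustrative sketch (using `zlib` as a stand-in for the shortest program; the specific strings are my own examples, not from the paper):

```python
import random
import zlib

def complexity_upper_bound(s: bytes) -> int:
    """Bits in a zlib-compressed description of s.

    This upper-bounds Kolmogorov complexity up to an additive constant
    (the length of a fixed zlib decompressor), since "decompress this
    archive" is itself a program that prints s.
    """
    return 8 * len(zlib.compress(s, 9))

# A highly regular string: a short program ("print 'ab' 1000 times") suffices.
regular = b"ab" * 1000

# A pseudorandom string of the same length: no short description is apparent.
random.seed(0)
noisy = bytes(random.getrandbits(8) for _ in range(2000))

# The regular string's bound is far smaller than the noisy string's.
print(complexity_upper_bound(regular) < complexity_upper_bound(noisy))  # → True
```

The gap between the two bounds is the compression-based intuition behind algorithmic entropy: orderly microstates admit short descriptions, disordered ones do not.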

The paper is long, in part because I tried to make it self-contained. If you find yourself using entropy in a setting that is not described by a large number of identically distributed variables, then consider reframing your intuitions in terms of the algorithmic entropy!

11 comments


comment by Yoav Ravid · 2025-02-17T14:13:18.718Z

How does this relate to cross-entropy?

Replies from: EbTech
comment by Aram Ebtekar (EbTech) · 2025-02-17T16:15:35.957Z

They are equal! As discussed in the comments to that post, the difference is at most a constant; it's possible to make that constant vanish by an appropriate choice of reference universal Turing machine.
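For concreteness, the constant-difference claim is the standard coding theorem (see Li & Vitanyi): the prefix complexity $K$ and the universal semimeasure $\mathbf{m}$ agree up to an additive constant, and cross-entropy code lengths under any computable distribution $P$ upper-bound $K$ up to the cost of describing $P$:

```latex
K(x) \;=\; -\log_2 \mathbf{m}(x) + O(1),
\qquad
K(x) \;\le\; -\log_2 P(x) + K(P) + O(1).
```

The $O(1)$ terms depend on the reference universal machine, which is why a suitable choice of machine can make the gap vanish.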

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-02-17T17:04:32.863Z

Notably, the difference can become non-constant if we allow for semi-measures on infinite strings:

https://www.lesswrong.com/posts/KcvJXhKqx4itFNWty/k-complexity-is-silly-use-cross-entropy-instead#LNgzYraTeGcEyry9a

Replies from: Yoav Ravid
comment by Yoav Ravid · 2025-02-17T19:00:14.539Z

Isn't that significant?

Replies from: EbTech
comment by Aram Ebtekar (EbTech) · 2025-02-17T19:33:51.121Z

It is, when dealing with sequences that go on to infinity. In that case you get the "KM complexity", from Definition 4.5.8 of Li & Vitanyi (2019). For online sequence prediction, Solomonoff's prior needs to sum the weights from every program.
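The summed-weights construction mentioned here is Solomonoff's a priori semimeasure. Writing $U$ for a monotone universal machine, the standard definitions (notation as in Li & Vitanyi) are:

```latex
\mathbf{M}(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs a string beginning with } x} 2^{-|p|},
\qquad
KM(x) \;=\; -\log_2 \mathbf{M}(x).
```

Because the sum ranges over every program consistent with the prefix $x$, rather than just the shortest one, $KM$ can differ from the plain shortest-program complexity by more than a constant on infinite sequences; this is the non-constant gap raised in the parent comment.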

No such complications appear in the entropy of a bounded system at a fixed precision. And ultimately, it appears that for entropy to increase, you need some kind of coarse-graining, leading us to finite strings. I discuss this in the Background section and around Corollary 1.

comment by Stephen Fowler (LosPolloFowler) · 2025-02-17T11:43:36.109Z

A while ago, I watched recordings of the lectures given by Wolpert and Kardes at the Santa Fe Institute, and I am extremely excited to see you and Marcus Hutter working in this area.

Could you speculate on if you see this work having any direct implications for AI Safety?

Replies from: EbTech, matthias-dellago
comment by Aram Ebtekar (EbTech) · 2025-02-17T16:38:15.698Z

Were those recorded!?

For direct implications, I'd like to speak with the alignment researchers who use ideas from thermodynamics. While Shannon's probabilistic information theory is suited to settings where the law of large numbers holds, algorithmic information theory should bring more clarity in messier settings that are relevant for AGI.

Less directly, I used physics as a testing ground to develop some intuitions on how to apply algorithmic information theory. The follow-up agenda is to develop a theory of generalization (i.e., inductive biases) using algorithmic information theory. A lot of AI safety concerns depend on the specific ways that AIs (mis)generalize beliefs and objectives, so I'd like us to have more precise ideas about which generalizations are likely to occur.

comment by Matthias Dellago (matthias-dellago) · 2025-02-19T10:42:35.453Z

I would be interested in seeing those talks; could you share links to the recordings?

comment by Matthias Dellago (matthias-dellago) · 2025-02-19T10:36:58.601Z

Very good work, thank you for sharing!

Intuitively speaking, the connection between physics and computability arises because the coarse-grained dynamics of our Universe are believed to have computational capabilities equivalent to a universal Turing machine [19–22].

I can see how this is a reasonable and useful assumption, but the universe seems to be finite in both space and time and therefore not a UTM. What convinced you otherwise?

Replies from: EbTech
comment by Aram Ebtekar (EbTech) · 2025-02-19T16:08:32.827Z

Thanks. Obviously this claim needs some interpretation, but a UTM still seems a better model of the Universe than, say, any lower automaton in the Chomsky hierarchy. For the purposes of defining entropy, it's important that we can use a small base machine, plus a memory tape that we may think of as expanding in an online fashion.

comment by UnderTruth · 2025-02-17T18:07:19.053Z

I am reminded of this paper, on the equivalence of information-theoretic formalisms and those of physics. I am linking the paper here not as an endorsement, but because it may provide some unusual, but useful, lines of thought.