Boltzmann Brains and Within-model vs. Between-models Probability

post by Charlie Steiner · 2018-07-14T09:52:41.107Z · LW · GW · 12 comments

Contents

  Why Boltzmann brains?
  What now?
  Changing Occam's Razor?
12 comments

Why Boltzmann brains?

Back in the days of Ludwig Boltzmann (before the ~1912 discovery of galactic redshift), it seemed like the universe could be arbitrarily old. Since the second law of thermodynamics says that everything tends to a state of high entropy, a truly old universe would have long ago sublimated into a lukewarm bath of electromagnetic radiation (perhaps with a few enormous dead stars slowly drifting towards each other). If this were true, our solar system and the stars we see would be just a bubble of order in a vast sea of chaos - perhaps spontaneously arising, like how if you keep shuffling a deck of cards, eventually it will pass through all arrangements, even the ordered ones.

The only problem is, bubbles of spontaneous order are astoundingly unlikely. However long it takes for one iron star to spontaneously reverse fusion and become hydrogen, it takes that length of time squared for two stars to do it. So if we're trying to explain our subjective experience within a long-lived universe that might have these bubbles of spontaneous order, the most likely guesses are those that involve the smallest amount of matter. The classic example is an isolated brain with your memories and current experience, briefly congealing out of the vacuum before evaporating again.
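To see where the squaring comes from (my gloss on the standard fluctuation argument, not anything specific to this post): if a fluctuation that assembles some structure requires an entropy decrease of $\Delta S$, its probability per unit time scales like $e^{-\Delta S/k_B}$, so two independent fluctuations of the same size scale like

$$P_{\text{two}} \approx e^{-2\Delta S/k_B} = \left(e^{-\Delta S/k_B}\right)^2,$$

and the corresponding waiting time is roughly the square of the single-fluctuation waiting time (in units of the relevant recurrence timescale). Smaller structures mean smaller $\Delta S$, hence exponentially shorter waits.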

What now?

The rarest, if sort of admirable, response is to agree you're probably a Boltzmann brain, and at all times do the thing that gets you the most pleasure in the next tenth of a second. But maybe you subscribe to Egan's law ("Everything adds up to normality"), in which case there are basically two responses. Either you are probably a Boltzmann brain, but shouldn't act like it because you only care about long-lived selves, or you think that the universe genuinely is long-lived.

Within our best understanding of the apparent universe, there shouldn't be any Boltzmann brains. Accelerating expansion will drive the universe closer and closer to its lowest energy eigenstate, which suppresses time evolution. Change in a quantum system comes from energy differences between states, and in a fast-expanding universe, those energy differences get redshifted away.
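The quantum mechanics being invoked here is standard (a textbook identity, not something specific to this post): for a state $|\psi\rangle = \sum_n c_n |E_n\rangle$, the expectation value of any observable evolves as

$$\langle A(t)\rangle = \sum_{m,n} c_m^* c_n\, A_{mn}\, e^{-i(E_n - E_m)t/\hbar},$$

so all time dependence comes from the energy differences $E_n - E_m$. A state concentrated near a single energy eigenstate has essentially no such differences left, and so essentially nothing happens in it - Boltzmann brains included.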

But that's only the within-model understanding. Given our sense-data, we should assign a probability distribution over many different possible laws of physics that could explain that sense-data. And if some of those laws of physics result in a very large number of Boltzmann brains of you, does this mean that you should agree with Bostrom's Presumptuous Philosopher [LW · GW], and therefore assign very high probability that you're a Boltzmann brain, nearly regardless of the prior over laws?

In short, suppose that the common-sense universe is the simplest explanation of your experiences but contains only one copy of them, while a slightly different universe is more complex (say the physical laws take 20 more bits to write) but contains 10^100 copies of your experiences. Should you act as if you're in the more complicated universe?
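As a toy calculation (the 20 bits and the 10^100 copies come from the paragraph above; the two weighting rules and their labels are my framing, not the author's), here is how starkly the two answers differ:

```python
import math

# Toy comparison of the two universes described above. "Pure prior" weights
# hypotheses only by the complexity of their laws; "copy-counting" (SIA-style)
# also multiplies in the number of copies of your experiences.

extra_bits = 20          # the complicated universe's laws take 20 more bits
copies_big = 10**100     # copies of your experiences in the complicated universe
                         # (the simple universe has exactly 1)

# Rule 1: pure complexity prior.
pure_ratio = 2.0 ** (-extra_bits)                                  # ~ 1e-6

# Rule 2: copy-counting. Work in log10 to avoid overflow.
log10_ratio = math.log10(copies_big) - extra_bits * math.log10(2)  # ~ 94

print(f"pure prior:    P(big)/P(simple) ~ {pure_ratio:.1e}")
print(f"copy-counting: P(big)/P(simple) ~ 10^{log10_ratio:.0f}")
```

Under the first rule you comfortably ignore the big universe; under the second it is about 10^94 times more likely, which is the Presumptuous Philosopher's position.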

Bostrom and FHI have written some interesting things about this, and coined some TLAs, but I haven't read anything that really addresses what feels like the fundamental tension: between believing you're a Boltzmann brain on the one hand, and accepting an ad-hoc distinction between within-model and between-model probabilities on the other.

Changing Occam's Razor?

Here's an idea that's probably not original: what would a Solomonoff inductor think? A Solomonoff inductor doesn't directly reason about physical laws, it just tries to find the shortest program that reproduces its observations so far. Different copies of it within the same universe actually correspond to different programs - each consisting of a simple core that specifies the physical laws, plus a complicated indexical parameter that tells you where in this universe to find that copy's observations so far.

If we pick only the simplest program making each future prediction (e.g. about the Presumptuous Philosopher's science experiment), then the number of copies per universe doesn't matter at all. Even if we are a little bit more general and consider a prior over the different generating prefix-free programs, the inclusion of this indexical parameter in the complexity of the program means that even if a universe has infinitely many copies of you, there's still only a limited amount of probability assigned to it, and the harder the copies are to locate (say, by virtue of being extremely ridiculously rare), the more bits of penalty they pay.
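Here's a sketch of why the location penalty keeps the total bounded (my formalization of the paragraph above; the laws' complexity and the choice of integer code for the copy index are made-up illustrations): each copy contributes 2^-(K(laws) + bits to locate it), and if the location descriptions form a prefix-free code, those extra factors sum to at most 1, so the whole universe never contributes more than 2^-K(laws), no matter how many copies it contains.

```python
import math

def elias_gamma_length(i: int) -> int:
    """Bit length of the Elias gamma code for a positive integer i,
    a standard self-delimiting (prefix-free) code: 2*floor(log2 i) + 1."""
    return 2 * int(math.log2(i)) + 1

K_LAWS = 400   # hypothetical complexity of the physical laws, in bits

# One universe's contribution to the universal prior, summed over copies of
# you, where locating copy i costs elias_gamma_length(i) extra bits.
total = 0.0
for i in range(1, 10**6 + 1):          # a million copies; the sum converges
    total += 2.0 ** -(K_LAWS + elias_gamma_length(i))

bound = 2.0 ** -K_LAWS
print(f"sum over copies = {total:.3e}  <=  2^-K(laws) = {bound:.3e}")
```

The sum approaches but never exceeds 2^-K(laws); adding more (and harder-to-find) copies only nibbles at the remaining slack.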

All of this sounds like a really appealing resolution. Except... it's contrary to our current usage of Occam's razor. Under the current framework, the prior probability of a hypothesis about physics depends only on the complexity of the physical laws - no terms for initial conditions, and certainly no dependence on who exactly you are and how computationally easy it is to specify you in those laws. We even have lesswrong posts correcting people who think Occam's razor penalizes physical theories with billions and billions of stars. But if we now have a penalty for the difficulty of locating ourselves within the universe, part of this penalty looks like the log of the number of stars! I'm not totally sure whether this is a bug or a feature yet.
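For a sense of scale (my numbers; the star count is only a rough order-of-magnitude estimate), picking out one star among all the stars in the observable universe costs about 76 bits:

```python
import math

n_stars = 1e23   # rough estimate of the number of stars in the observable universe
bits_to_pick_one = math.log2(n_stars)
print(f"~{bits_to_pick_one:.0f} bits, i.e. a prior factor of about 2^-{bits_to_pick_one:.0f}")
```

Noticeable, but small next to the bits needed to pin down a particular planet, brain, and moment around that star.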

12 comments


comment by cousin_it · 2018-07-14T11:17:16.776Z · LW(p) · GW(p)

Yeah, using a simplicity prior is appealing for this kind of question. A related idea, which I haven't really seen anywhere, is that it might solve Loschmidt's paradox.

The paradox comes from realizing that, since the laws of physics are time-symmetric and our most likely future has higher entropy, our most likely past (conditional only on the present) also has higher entropy than the present. All your memories are lies, and the glass of milk just assembled itself from a broken state a second ago! The usual solution is to condition on both the present and a hypothetical low-entropy distant past, which is known as the "past hypothesis". Then the mystery is why the distant past had such low entropy, which is neatly resolved by a simplicity prior saying some highly ordered states have very short descriptions. For example, a uniform density of gas in space, which can collapse into stars etc., would be a nice starting point.

Your last paragraph is interesting. Maybe the simplicity prior indeed leads to a penalty for larger universes! Or maybe not, if the idea of finding observers in a universe is itself simple, and we just need to say "find observer #2345671 in universe #3". This whole thing makes me very confused.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2018-07-14T13:27:02.129Z · LW(p) · GW(p)

I'm not sure my memories being lies is actually all that likely even taking Loschmidt into account. I don't expect my memories to get scrambled in the near future, so I also shouldn't expect my memories to recently have spontaneously unscrambled. Is that how it works? Hm, but on the other hand, does the asymmetry of my memories provide sufficient evidence that the past was low-entropy? If you have a max-entropy prior over states of matter in the universe, no, absolutely not. You'd need some kind of simplicity prior or similar. So I guess you're totally right :)

comment by Pattern · 2018-07-15T05:25:59.112Z · LW(p) · GW(p)

Why would the simplest program that describes all your observations so far be a specification of the entire universe along with your location?

Replies from: Charlie Steiner
comment by Charlie Steiner · 2018-07-16T08:21:20.033Z · LW(p) · GW(p)

I think I misspoke a little. A more natural division is between physical law, knowledge of the universe's state (which can be localized around you; you don't have to start at the big bang), and miscellaneous correction terms. Of course there are programs that don't look anything like this, but the idea is that if we really are in a universe governed by simple laws, then in the limit of lots of observations this sort of hypothesis will become a good one. This is also a requirement for using a Solomonoff inductor to talk about physical laws in the first place.

comment by avturchin · 2018-07-14T10:00:07.114Z · LW(p) · GW(p)

There is one more possible view on BBs: that they are the only type of minds that exist, but that this produces a reality and predictions almost like the observed world. This is because for any random mind N there exists another random mind N+1, which is very similar to N but differs by just one bit, which - under some assumptions - creates an illusory chain of BBs; these chains could be called trajectories in the space of possible minds.

comment by Stephan Banev (stephan-banev) · 2019-03-13T17:43:19.334Z · LW(p) · GW(p)

Well, a chain of Boltzmann brain snapshots may very well be my brain. It includes ALL aspects of the brain that constitute me as me: my childhood memories, the hot cup of coffee in my hand right now, the stars in the sky I observe, my current thinking about this text... anything my brain perceives as reality, the reality it is part of... Apparently, it does not require the reality to actually exist; it only requires the Boltzmann brain to be in such a state that it perceives itself to be part of some reality. An assembly of identical Boltzmann brains is essentially not an assembly of many but a single sample, since identical copies cannot be distinguished; therefore, it is the collection of different/unique snapshots of a Boltzmann brain that constitutes such an assembly. A subset of this assembly composed of snapshots that differ from each other only "slightly", together with an order on that subset, may represent the time evolution of one Boltzmann brain (for example, my brain). The open question is: in what way should the snapshots differ, at what magnitude of "slightness", and under what order on the subset, to yield a "sane" Boltzmann brain along its timeline/evolution?

Two of Kurt Gödel's theorems:

1) Any consistent axiomatic system is incomplete. (*)

2) The consistency of an axiomatic system is not provable within that system. (**)

Any observation is a true predicate (observations represented by false predicates are not observable). A sane Boltzmann brain must be capable of making observations. Some subsets of the Boltzmann brain assembly, and the orders on those subsets, may represent Boltzmann brains capable of making observations along their timeline/evolution; therefore, all Boltzmann brain snapshots which violate this ability get excluded/go extinct, since they cannot constitute a "sane" Boltzmann brain. According to the first Gödel theorem (*), there is no way to have/predefine a complete subset and order of Boltzmann brain snapshots representing a given "sane" Boltzmann brain along its timeline/evolution - such a subset must be open and redefine itself along its time evolution to comply with the second Gödel theorem (**). This process resembles natural selection, where the environment is represented by Gödel's theorems (*/**) and the ability to make observations represents survival. I would call it: Gödel's natural selection of Boltzmann brains ;o) Apparently, the neighborhood of a particular sane Boltzmann brain may represent the superposition of quantum states of a macroscopic body (shalom to Everett's relative state).

comment by interstice · 2018-09-26T15:26:28.038Z · LW(p) · GW(p)

The penalty for specifying where you are in space and time is dwarfed by the penalty for specifying which Everett branch you're in.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2018-09-27T00:00:40.279Z · LW(p) · GW(p)

True - which would lead a Solomonoff inductor to look harder for theories that are simple even including initial conditions :)

Replies from: interstice
comment by interstice · 2018-09-27T16:29:38.356Z · LW(p) · GW(p)

How do the simple initial conditions relate to the branching? Our universe seems to have had simple initial conditions but there's still been a lot of random branching, right? That is, the universe from our perspective is just one branch of a quantum state evolving simply from simple conditions, so you need O(#branching events) bits to describe it. Incidentally this undermines Eliezer's argument for MWI [LW · GW] based on Solomonoff induction, though MWI is probably still true.

[EDITED: Oh, from one of your other comments I see that you aren't saying the shortest program involves beginning at the start of the universe. That makes sense]

comment by TheMajor · 2018-08-14T10:48:11.256Z · LW(p) · GW(p)

Isn't it more natural to let all programs that explain your observations contribute to the total probability, a la 'which hypotheses are compatible with the data'? This method works well in worlds with lots of observers similar to you - on the one hand it takes a lot of bits to specify which observer you are, but on the other hand all the other observers (actually the programs describing the experiences of those observers) contribute to the total posterior probability of that world.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2018-08-14T16:31:47.473Z · LW(p) · GW(p)

Yes, they can all contribute, though they can't all contribute equally, because your total probability has to sum to one, there are an infinite number of possible explanations, and there is no uniform distribution on the integers.
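One concrete way to see it (my illustration, using the standard 2^-length weighting from Solomonoff induction rather than anything specific to this thread): if each program gets weight 2^-(its length) and the programs form a prefix-free set, the weights automatically sum to at most one, so infinitely many hypotheses can all contribute, just with rapidly shrinking shares.

```python
# Toy illustration: uniform weights over infinitely many hypotheses can't be
# normalized, but 2^-length weights over a prefix-free set sum to at most 1.
# The "programs" here are just the strings 1, 01, 001, ... (my stand-in example).
programs = ["1".rjust(k, "0") for k in range(1, 30)]
weights = [2.0 ** -len(p) for p in programs]
print(sum(weights))   # creeps up toward 1 as more programs are added, never past it
```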

Also, apologies if some of my jargon was confusing. Most of it's in the second chapter (I think?) of Li and Vitanyi's textbook, if you're interested, but really I should have just written with less jargon.

Replies from: TheMajor
comment by TheMajor · 2018-08-16T07:27:01.109Z · LW(p) · GW(p)

Isn't that exactly what the 'takes lots of bits to specify which observer you are' part can take care of, though? Also I'm not sure what it means for a world to contain literally infinite copies of something - do you mean that on average there is some fixed (non-zero) density over a finite volume, and the universe is infinitely large? I think this issue with infinities is unrelated to the core point of this post.