Insights from the randomness/ignorance model are genuine

sil-ver


post by Rafael Harth (sil-ver) · 2019-11-13T16:18:55.544Z · LW · GW · 23 comments


(Based on the randomness/ignorance model proposed in 1 [LW · GW] $\to$ 2 [LW · GW] $\to$ 3 [LW · GW].)

The bold claim of this sequence thus far is that the randomness/ignorance model solves a significant part of the anthropics puzzle. (Not everything, since it's still incomplete.) In this post I argue that this "solution" is genuine, i.e. it does more than just redefine terms. In particular, I argue that my definition of probability for randomness is the only reasonable choice.

The only axiom I need for this claim is that probability must be consistent with betting odds in all cases: if $H$ comes true in two of three situations where $O$ is observed, and this is known, then $P (H | O)$ needs to be $\frac{2}{3}$, and no other answer is acceptable. This idea isn't new; the problem with it is that it doesn't actually produce a definition of probability, because we might not know how often $H$ comes true if $O$ is observed. It cannot define probability in the original Presumptuous Philosopher problem, for example.
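To make the betting-odds criterion concrete, here is a minimal sketch. It assumes a bet that costs `price` and pays 1 if $H$ is true, offered once in every situation where $O$ is observed. The three-situation list is a made-up instance of the "two of three" case, and `average_profit_buying` is a hypothetical helper, not anything from the post.

```python
from fractions import Fraction

# Hypothetical example: three situations in which O is observed; H comes true in two.
situations = [True, True, False]  # True means H came true in that situation


def average_profit_buying(price: Fraction, outcomes: list) -> Fraction:
    """Average profit per situation from buying, at `price`, a bet that pays 1 if H."""
    return sum((1 if h else 0) - price for h in outcomes) / len(outcomes)


frequency = Fraction(sum(situations), len(situations))  # relative frequency of H: 2/3
for price in (Fraction(1, 2), frequency, Fraction(3, 4)):
    # Positive: the buyer wins on average; negative: the seller does.
    print(price, average_profit_buying(price, situations))
```

Only the price $\frac{2}{3}$ leaves neither side with an average profit; any other price can be exploited by whoever takes the opposite side.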

But in the context of the randomness/ignorance model, the approach becomes applicable. Stating my definition for when uncertainty is random in one sentence, we get

Your uncertainty about $H$, given observation $O$, is random iff you know the relative frequency with which $H$ happens, evaluated across all observations $O'$ that, for you, are indistinguishable from $O$ with regard to $H$.

Here "relative frequency" means the frequency of $H$ compared to $\neg H$, i.e. you know that $H$ happens in $n$ out of $m$ cases. A close look at this definition shows that it is precisely the condition needed to apply the betting odds criterion. So the model simply divides everything into those cases where you can apply betting odds and those where you can't.

If the Sleeping Beauty experiment is repeated sufficiently often using a fair coin, then roughly half of all experiments will run in the 1-interview version, and the other half will run the 2-interview version. In that case, Sleeping Beauty's uncertainty is random and the reasoning from 3 [LW · GW] goes through to output $\frac{2}{3}$ for it being Monday. The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times. Given that Sleeping Beauty is still controversial, the model must thus be either nontrivial or wrong, hence "genuine".
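One way to see the $\frac{2}{3}$ figure is to count awakenings across many repetitions. This is a sketch under stated assumptions (a fair coin, and each awakening counted once); `monday_fraction` is a hypothetical name, and the simulation only illustrates the relevant frequency rather than reproducing the reasoning from post 3.

```python
import random


def monday_fraction(runs: int, seed: int = 0) -> float:
    """Repeat the Sleeping Beauty experiment `runs` times with a fair coin and
    return the fraction of all awakenings that happen on a Monday."""
    rng = random.Random(seed)
    monday_awakenings = 0
    total_awakenings = 0
    for _ in range(runs):
        heads = rng.random() < 0.5
        # Heads: 1-interview version (Monday only).
        # Tails: 2-interview version (Monday and Tuesday).
        days = ["Monday"] if heads else ["Monday", "Tuesday"]
        total_awakenings += len(days)
        monday_awakenings += days.count("Monday")
    return monday_awakenings / total_awakenings


print(monday_fraction(100_000))  # approximately 2/3
```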

Here is an alternative justification for my definition of random probability. Suppose $H$ is the hypothesis we want to evaluate (like "today is Monday") and $O$ is the full set of observations we currently have (formally, the full brain state of Sleeping Beauty). Then what we care about is the value of $P (H | O)$. Now consider the term $\frac{P (H | O)}{P (\neg H | O)}$; let's call it $λ$. If $λ$ is known, then $P (H | O)$ can be computed as $P (H | O) = (1 + λ^{-1})^{-1}$; conversely, $λ = \frac{P (H | O)}{1 - P (H | O)}$, so knowledge of $λ$ implies knowledge of $P (H | O)$ and vice versa. But $λ$ is more "fundamental" than $P (H | O)$, in the sense that it can be defined as the ratio of two frequencies. Take all situations in which $O$, or any other set of observations $O'$ which, from your perspective, is indistinguishable from $O$, is observed, and count in how many of those $H$ is true vs. false. The ratio of these two values is $λ$.
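The counting definition of $λ$ and the conversion to $P (H | O)$ can be sketched as follows. The function names, the observation strings, and the use of string equality as the "indistinguishable from $O$" test are all hypothetical simplifications; the data is a made-up pair of Sleeping Beauty runs (one heads, one tails) with $H$ = "today is Monday".

```python
from collections import Counter
from fractions import Fraction


def odds_lambda(situations, indistinguishable_from_o) -> Fraction:
    """λ = (# situations where H is true) / (# where H is false), counted only over
    situations whose observation is indistinguishable from O. Assumes H is false
    at least once among them (otherwise λ is undefined)."""
    counts = Counter(h for obs, h in situations if indistinguishable_from_o(obs))
    return Fraction(counts[True], counts[False])


def prob_from_lambda(lam: Fraction) -> Fraction:
    """P(H | O) = (1 + λ^-1)^-1."""
    return 1 / (1 + 1 / lam)


def lambda_from_prob(p: Fraction) -> Fraction:
    """The inverse map: λ = P(H | O) / (1 - P(H | O))."""
    return p / (1 - p)


# Hypothetical data: one heads run and one tails run, H = "today is Monday".
situations = [
    ("awake, no memory of earlier awakenings", True),   # heads, Monday
    ("awake, no memory of earlier awakenings", True),   # tails, Monday
    ("awake, no memory of earlier awakenings", False),  # tails, Tuesday
    ("experiment is over", False),                      # distinguishable, so not counted
]
lam = odds_lambda(situations, lambda obs: obs == "awake, no memory of earlier awakenings")
print(lam, prob_from_lambda(lam))  # 2 2/3
```

The two conversion functions are inverses, which is the "knowledge of $λ$ implies knowledge of $P (H | O)$ and vice versa" step.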

A look at the above criterion for randomness shows that it's just another way of saying that the value of $λ$ is known. Since, again, the value of $λ$ determines the value of $P (H | O)$ , this means that the definition of probability as betting odds, in the case that the relevant uncertainty is random, falls almost directly out of the formula.

23 comments


comment by interstice · 2019-11-13T22:57:51.918Z · LW(p) · GW(p)

This seems like a step backwards from UDASSA [LW · GW], another potential solution to many anthropic problems. UDASSA has a completely formal specification, while this model relies on a somewhat unclear verbal definition. So you need to know the 'relative frequency' with which H happens. But what are we averaging over here? Our universe? All possible universes? If uncertain about which universe we are in, how should we average over the different universes? What if we are reasoning about an event which, as far as we know, will only happen once?

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T00:02:00.742Z · LW(p) · GW(p)

I have answers to all of these questions! I just haven't posted them yet. If I present an entirely new theory in one super long post, then obviously no-one reads it. In fact, it would be irrational to read it because the prior that I'm onto something is just too low to invest the time. A sequence of short posts where each post makes a point which can be understood by anyone having read up to that post – that's not optimal, but how else could you do it? This is a completely genuine question if you have an answer.

So the structure I've chosen is to first state the distinction, then lay out the model that deals with randomness only (because that already does some stuff which SIA and SSA can't), then explain how to deal with ignorance, which makes the model complete, and then present a formalized version. The questions you just listed all deal with the ignorance part, the part that's still in the pipeline.

Well, and I didn't know I was competing with UDASSA [LW · GW], because I didn't know it existed. For some reason it's sitting at 38 karma, which makes it easy to miss, and you're the first to bring it up. I'll read it before I post anything else.

Replies from: interstice

↑ comment by interstice · 2019-11-14T00:13:11.480Z · LW(p) · GW(p)

It's true that UDASSA is tragically underrated, given that (it seems to me) it provides a satisfactory resolution to all anthropic problems. I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position, like how most LW people don't bother arguing about whether god exists anymore.

Replies from: Wei_Dai, sil-ver

↑ comment by Wei Dai (Wei_Dai) · 2019-11-14T03:03:41.879Z · LW(p) · GW(p)

I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position

Well not exactly, I came up with UDASSA originally but found it not entirely satisfactory, so I moved on to something that eventually came to be called UDT. I wrote down my reasons at against UD+ASSA [LW · GW] and under Paul's post [LW(p) · GW(p)].

Perhaps it would be good to have this history be more readily available to people looking for solutions to anthropic reasoning though, if you guys have suggestions on how to do that.

Replies from: sil-ver, interstice

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T09:59:41.884Z · LW(p) · GW(p)

The solution to this kind of thing should be a wiki, I think. If the LessWrong wiki were kept up to date enough to have a page on anthropics, that would have solved the issue in this case and should work for many similar cases.

↑ comment by interstice · 2019-11-14T03:35:06.331Z · LW(p) · GW(p)

Right, I knew that many people had since moved on to UDT due to limitations of UDASSA for decision-making. What I meant was that UDASSA seems to be satisfactory at resolving the typical questions about anthropic probabilities, setting aside decision theory/noncomputability issues.

I agree it would be nice to have all this information in a readily accessible place. Maybe the posts setting out the ideas and later counter-arguments could be put in a curated sequence.

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T10:01:04.016Z · LW(p) · GW(p)

I actually knew about UDT. Enough to understand how it wins in Transparent Newcomb, but not enough to understand that it extends to anthropic problems.

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T00:30:30.759Z · LW(p) · GW(p)

The ASSA is the Absolute Self Selection Assumption. It is a variant on the Self Selection Assumption (SSA) of Nick Bostrom. The SSA says that you should think of yourself as being a randomly selected conscious entity (aka "observer") from the universe. The Absolute SSA extends this concept to "observer moments" (OMs). An observer moment is one moment of existence of an observer's consciousness. If we think of conscious experience as a process, the OM is created by dividing this process up into small units of time such that no perceptible change occurs within that unit. The ASSA then says that you should think of the OM you are presently experiencing as being randomly selected from among all OMs in the universe.

This is what I'm doing. I haven't read the entire thing yet, but this paragraph basically explains the key idea of my model. I was going to address how to count instances eventually (near the end), and it bottoms out at observer moments. The full idea, abbreviated, is "start with a probability distribution over different universes, in each one apply the randomness thing via counting observer moments, then weight those results with your distribution". This gives you intuitive results in Doomsday (no update), Presumptuous Philosopher (some bias towards the larger universe depending on how strongly you believe in other universes), Sleeping Beauty (basically 1/3) and the "how do we update on X-risk given that we're still alive" question (complicated).

It appears that I independently came up with ASSA, plus a different way of presenting it. And probably a weaker formalism.

I'm obviously unhappy about this, but thank you for bringing it to my attention now rather than later.

One reason I was assuming there couldn't be other theories I was unaware of is that Stuart Armstrong was posting about anthropics and he seemed totally unaware.

Replies from: interstice

↑ comment by interstice · 2019-11-14T02:28:18.342Z · LW(p) · GW(p)

Yeah, I also had similar ideas for solving anthropics a few years ago, and was surprised when I learned that UDASSA had been around for so long. At least you can take pride in having found the right answer independently.

I think that UDASSA gives P(heads) = 1/2 on the Sleeping Beauty problem due to the way it weights different observer-moments, proportional to 2^(-description length). This might seem a bit odd, but I think it's necessary to avoid problems with Boltzmann brains and the like.

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T10:04:27.941Z · LW(p) · GW(p)

You mean P(Monday)? In that case it would be different, although with some similarity. Why is the description length of the Monday observer moment longer than the Tuesday one?

Replies from: interstice

↑ comment by interstice · 2019-11-14T17:47:33.460Z · LW(p) · GW(p)

No, I mean Beauty's subjective credence that the coin came up heads. That should be 1/2 by the nature of a coin flip. Then, if the coin comes up tails, you need 1 bit to select between the subjectively identical states of waking up on Monday or Tuesday. So in total:

P(heads, Monday) = 1/2,

P(tails, Monday) = 1/4

P(tails, Tuesday) = 1/4

(EDIT: actually this depends on how difficult it is to locate memories on Monday vs. Tuesday, which might be harder given that your memory has been erased. I think that for 'natural' ways of locating your consciousness it should be close to $\frac{1}{2}$ / $\frac{1}{4}$ / $\frac{1}{4}$ though)

(DOUBLE EDIT, MUCH LATER: actually it now seems to me like the thirder position might apply here, since the density of spacetime locations with the right memories is higher in the tails branch than the heads)
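The numbers in the comment (before the edits) can be reproduced with a toy version of description-length weighting. This sketch assumes, as the comment does, that the coin outcome costs one bit and that picking the day in the tails branch costs one more; the bit strings are hypothetical, and this is not UDASSA's actual universal-prior computation.

```python
from fractions import Fraction

# Hypothetical encoding of how many bits it takes to locate each observer-moment:
# one bit for the coin outcome, plus one more bit in the tails branch to pick the day.
locations = {
    ("heads", "Monday"): "0",
    ("tails", "Monday"): "10",
    ("tails", "Tuesday"): "11",
}

# Weight each observer-moment by 2^(-description length).
weights = {om: Fraction(1, 2 ** len(bits)) for om, bits in locations.items()}

p_heads = sum(w for (coin, _day), w in weights.items() if coin == "heads")
p_monday = sum(w for (_coin, day), w in weights.items() if day == "Monday")
print(p_heads, p_monday)  # 1/2 3/4
```

Because the three bit strings form a prefix-free code, the weights sum to exactly 1 without renormalizing.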

comment by Gordon Seidoh Worley (gworley) · 2019-11-13T20:23:49.239Z · LW(p) · GW(p)

I guess I'm a bit out of the loop on questions about how to define uncertainty, so I'm a bit confused about what position you are against or how this is different from what others do. That is, it seems to me like you are trying to fix a problem you perceive in the way people currently think about uncertainty, but I'm not sure what that problem is, so I can't really understand how this framing might fix it. I've been reading this sequence of posts thinking "yeah, sure, this all sounds reasonable" but also without really understanding the context for it. I know you did the post on anthropics, but even there it wasn't really that clear to me how this framing helps us over what is perhaps otherwise normally done, although perhaps that reflects my ignorance of existing arguments about what methods of anthropic reasoning are correct.

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-13T22:27:58.804Z · LW(p) · GW(p)

Yeah, I wrote this assuming people have the context.

So there's a class of questions where standard probability theory doesn't give clear answers. This was dubbed anthropics or anthropic probability. To deal with this, two principles were worked out, SSA and SIA, which are well-defined and produce answers. But for both of them, there are problems where their answers seem absurd.

I think the best way to understand the problem of anthropics is by looking at the Doomsday argument as an example. Consider all humans who will ever live (assuming there are finitely many), and say there are $N$ of them. For simplicity, we assume that there are only two cases: either humanity goes extinct tomorrow, in which case $N$ is about sixty billion (but let's make that $10^{11}$ for simplicity), or humanity flourishes and expands through the cosmos, in which case $N$ is, say, $10^{18}$. Let's call $S$ the hypothesis that humans go extinct, and $L$ the hypothesis that they don't (for "short" and "long" human history). Now we want to update $P (L)$ on the observation that you are human number $n$ (so $n$ will be about 30 billion). Let's call that observation $O$. Also let $p$ be your prior on $L$, so $P (L) = p$.

The Doomsday argument now goes as follows. The term $P (O | L)$ is $10^{-18}$, because if $L$ is true then there are a total of $10^{18}$ people, each position is equally likely, so $10^{-18}$ is just the chance to get your particular one. On the other hand, $P (O | S)$ is $10^{-11}$, because if $S$ is true there are only $10^{11}$ people total. So we simply apply Bayes' theorem to the observation $O$, and then use the law of total probability in the denominator to obtain

$P (L | O) = P (O | L) \frac{P (L)}{P (O)} = 10^{-18} \frac{p}{P (O | L) P (L) + P (O | \neg L) P (\neg L)} = \frac{10^{-18} p}{10^{-18} p + 10^{-11} (1 - p)}$

If $p = 0.999$, this term equals about $9.99 \times 10^{-5}$. So even if you were 99.9% confident that humanity would make it, you should assign only about 0.01% to that after updating. If you want to work it out yourself, this is where you should pause and think about what part of this is wrong.
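The Bayes update for the Doomsday setup can be checked with exact fractions. A minimal sketch, assuming the two-hypothesis setup and the likelihoods $P (O | L) = 10^{-18}$ and $P (O | S) = 10^{-11}$ used in the argument; `doomsday_posterior` is a hypothetical helper.

```python
from fractions import Fraction


def doomsday_posterior(p_long: Fraction, n_short: int, n_long: int) -> Fraction:
    """P(L | O) via Bayes' theorem, with P(O | L) = 1/n_long and P(O | S) = 1/n_short
    (every birth rank equally likely among the humans who actually exist)."""
    like_long = Fraction(1, n_long)
    like_short = Fraction(1, n_short)
    evidence = like_long * p_long + like_short * (1 - p_long)  # law of total probability
    return like_long * p_long / evidence


posterior = doomsday_posterior(Fraction(999, 1000), n_short=10**11, n_long=10**18)
print(float(posterior))
```

Multiplying numerator and denominator by $10^{21}$ shows the exact value is $\frac{999}{10^{7} + 999}$.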

So the problematic part is the value of $P (O | L)$. It rests on a hidden assumption: that you had to be one of the humans who were actually born. This assumption was then dubbed the Self-Sampling Assumption (SSA), namely

All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future) in their reference class.

So SSA endorses the Doomsday argument. The principled way to debunk this is the Self-Indication Assumption (SIA), which says

All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.

If you apply SIA, then $P (O | L) = P (O | S)$ and hence $P (L | O) = P (L)$. Updating on $O$ no longer does anything.
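Why SIA cancels the Doomsday update can be seen in a short sketch. It assumes one common formalization of SIA (weight each hypothesis's prior by its number of observers, then apply the likelihood $\frac{1}{N}$ of having a particular birth rank); `sia_posterior` is a hypothetical helper, and the post itself states the result directly as $P (O | L) = P (O | S)$.

```python
from fractions import Fraction


def sia_posterior(p_long: Fraction, n_short: int, n_long: int) -> Fraction:
    """P(L | O) under one common formalization of SIA: weight each hypothesis's prior
    by its number of observers, then apply the likelihood 1/N of having birth rank n."""
    weight_long = p_long * n_long * Fraction(1, n_long)          # observer count cancels 1/N
    weight_short = (1 - p_long) * n_short * Fraction(1, n_short)
    return weight_long / (weight_long + weight_short)


p = Fraction(999, 1000)
print(sia_posterior(p, n_short=10**11, n_long=10**18) == p)  # True
```

The observer-count weight and the $\frac{1}{N}$ likelihood cancel for each hypothesis, so the posterior equals the prior whatever the population sizes.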

So this is the problem where SSA gives a stupid answer. The problem where SIA gives the stupid answer is the Presumptuous Philosopher problem: there are two theories of how large the universe is, and according to one it's $10^{9}$ times as large as it is according to the other. If you apply the SIA rule, you get that the probability of living in the small universe is $\frac{1}{1 + 10^{9}}$ (if the prior was $\frac{1}{2}$ on both).
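Spelled out, and assuming the number of observers scales with the size of the universe (so the small universe contains some number $N$ of observers and the large one $10^{9} N$), SIA weights each theory's prior by its number of observers:

$P (\text{small}) = \frac{\frac{1}{2} N}{\frac{1}{2} N + \frac{1}{2} \cdot 10^{9} N} = \frac{1}{1 + 10^{9}}$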

There is also Full Non-indexical Conditioning, which is technically a different theory and argues differently, but it outputs the same as SIA in every case, so basically there are just the two. And that, as far as I know, is the state of the art. No-one has come up with a theory that can't be made to look ridiculous. Stuart Armstrong has made a bunch of LW posts about this recently-ish, but he hasn't proposed a solution; he has pointed out that existing theories are problematic. This one [LW · GW], for example.

I've genuinely spent a lot of time thinking really hard about this stuff, and my conclusion is that the "reason as if you're randomly selected from a set of observers" thing is the key problem here. I think that's the reason why this still hasn't been worked out. It's just not the right way to look at it. I think the relevant variable which everyone is missing is that there are two fundamentally different kinds of uncertainty, and if you structure your theory around that, everything works out. And I think I do have a theory where everything works out. It doesn't update on Doomsday and it doesn't say the large universe is $10^{9}$ times as likely as the small one. It doesn't give a crazy answer anywhere. And it does it all based on simple principles.

Does that answer the question? It's possible that I should have started the sequence with a post that states the problem; I just assumed everyone would know the problem without ever thinking about whether that's actually the case.

Replies from: clone of saturn, gworley

↑ comment by clone of saturn · 2019-11-14T01:24:03.508Z · LW(p) · GW(p)

Could you explain why the Doomsday argument answer seems absurd, or why I don't have to be a human who was actually born?

↑ comment by Gordon Seidoh Worley (gworley) · 2019-11-13T22:50:58.137Z · LW(p) · GW(p)

I think so, thanks.

comment by Thelo · 2019-11-13T21:39:14.915Z · LW(p) · GW(p)

"The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times."

Why is that a given? The set of integers is very large, but the number 3 only appears once in it.

Replies from: sil-ver, shirisaya

↑ comment by Rafael Harth (sil-ver) · 2019-11-13T22:49:48.503Z · LW(p) · GW(p)

I think the relevant difference is that, in the set of integers, each element is strictly more complex than the previous one, but in the universe, you can probably upper bound the complexity (that's what I'm assuming, anyway). So eventually stuff should repeat, and then anything that has a nonzero probability of appearing will appear arbitrarily often as you increase the size. For example, if there's an upper bound to the complexity of a planet, then there are only finitely many distinct possible planets, so once there are more planets than that, you must get a repeat.
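The repetition step is the pigeonhole principle. In this sketch a "complexity bound" is modeled, as an assumption, by describing every possible planet with exactly `k` bits; `k = 8` and `has_repeat` are arbitrary, hypothetical choices for illustration.

```python
import random


def has_repeat(items: list) -> bool:
    """True if at least one item occurs more than once."""
    return len(set(items)) < len(items)


k = 8  # hypothetical complexity bound: every possible planet is described by k bits
rng = random.Random(1)
# 2**k + 1 planets, each with a k-bit description: since only 2**k distinct
# descriptions exist, the pigeonhole principle forces a repeat.
planets = [rng.getrandbits(k) for _ in range(2**k + 1)]
print(has_repeat(planets))  # True
```

The result does not depend on the seed or on how planets are sampled; only the count exceeding $2^{k}$ matters.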

Replies from: Thelo, TAG

↑ comment by Thelo · 2019-11-14T17:51:12.725Z · LW(p) · GW(p)

That doesn't seem to follow, actually. You could easily have a very large universe that's almost entirely empty space (which does "repeat"), plus a moderate amount of structures that only appear once each.

And as a separate argument, plenty of processes are irreversible in practice. For instance, consider a universe where there's a "big bang" event at the start of time, like an ordinary explosion. I'd expect that universe to never return to that original intensely-exploding state, because the results of explosions don't go backwards in time, right?

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T18:10:48.639Z · LW(p) · GW(p)

That doesn't seem to follow, actually. You could easily have a very large universe that's almost entirely empty space (which does "repeat"), plus a moderate amount of structures that only appear once each.

Yeah, nonemptiness was meant to be part of the assumption in the phrase you quoted.

And as a separate argument, plenty of processes are irreversible in practice. For instance, consider a universe where there's a "big bang" event at the start of time, like an ordinary explosion. I'd expect that universe to never return to that original intensely-exploding state, because the results of explosions don't go backwards in time, right?

We're getting into territory where I don't feel qualified to argue – although it seems like that objection only applies to some very specific things, and probably not to most Sleeping Beauty like scenarios.

↑ comment by TAG · 2019-11-14T10:05:02.748Z · LW(p) · GW(p)

the set of integers, each element is strictly more complex than the previous one

Not by algorithmic complexity. The integer consisting of a million 3s in a row is quite compressible.

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2019-11-14T10:09:59.541Z · LW(p) · GW(p)

But by number of bits, which is what you need to avoid repetition.
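The two measures in this exchange can be compared directly on the "million 3s" example. This sketch uses zlib's compressed size as a rough, assumed stand-in for algorithmic complexity (which is uncomputable; compression only gives an upper bound), and uses `bit_length` for the plain number of bits.

```python
import zlib

n = 1_000_000
# The integer written as n threes, 333...3, equals (10**n - 1) // 3. It is built
# arithmetically because recent Python versions limit int/str conversions this long.
million_threes = (10**n - 1) // 3
raw_bits = million_threes.bit_length()  # size of the plain binary representation

# Compressed size of the decimal digits: a rough, computable upper-bound proxy
# for algorithmic (Kolmogorov) complexity, which is itself uncomputable.
compressed_bits = 8 * len(zlib.compress(b"3" * n, 9))

print(raw_bits, compressed_bits)
```

The raw bit count grows with the size of the number, while the compressed description stays far smaller, which is the distinction between the two comments.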

↑ comment by shirisaya · 2019-11-13T21:48:08.418Z · LW(p) · GW(p)

The typical answer is that this is a result of the Poincaré recurrence theorem.

Replies from: Thelo

↑ comment by Thelo · 2019-11-14T17:42:57.365Z · LW(p) · GW(p)

Thanks for the mention, I had never heard of that concept before.

I have strong reflexes of revulsion against this idea that everything must reoccur (aren't plenty of processes irreversible in our world?), but it's getting too off-topic for the original article, and I need to think more about this.
