Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events


post by Ape in the coat · 2024-01-21T15:58:14.236Z · LW · GW · 16 comments

  Introduction
  The Impossibility of Random Number Generators
  Intuition vs Math
  Even More Extreme Version of the Problem
  Solving the Paradox
  Conclusion

This is the fifth post in my series on Anthropics. The previous one is Anthropical Paradoxes are Paradoxes of Probability Theory. The next one is Why Two Valid Answers Approach is not Enough for Sleeping Beauty.

Introduction

As all anthropic problems are just probability theory problems, let's look deeper at probability theory itself. In my previous post I claimed that there are a lot of paradoxes^[1] there, and that people just don't pay attention to them until they become relevant to anthropical reasoning, at which point these paradoxes are mistakenly attributed to anthropics. So let's highlight one more such paradox and solve it fully within the realm of probability theory. In a future post, this result will turn out to be helpful for deconfusing an anthropical problem.

The Impossibility of Random Number Generators

It's well known that rare events are rare. If an event has 1/n probability then we should expect it to happen about 1 time per n tries. Thus, if you observe some event that you believe to be rare, it should be surprising, proportionate to the rarity of the event. If you keep observing that events you believe to be low probability keep happening, then it's a clear signal that something is wrong with your model of the world.

So far so good. But then, how are we not constantly mind blown by the existence of random number generators?

After all, it's a device that can produce events of arbitrary rarity at will. Toss a coin ten times and you get a sequence of Heads and Tails which has only 1/2^10 probability. Run the Python function randint(1, 1000) ten times and you've just witnessed an event that has only 1/1000^10 probability. And it's not that the function was called so many times that some of the outcomes happened to be low probability. Every series of calls leads to an improbable outcome! What is going on?
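To make the arithmetic concrete, here is a minimal Python sketch. The helper names `draw_sequence` and `exact_sequence_probability` are made up for illustration. Note that `random.randint(a, b)` includes both endpoints, so a range of exactly 1000 equally likely values is `randint(1, 1000)`, while `randint(0, 1000)` has 1001.

```python
import random
from fractions import Fraction


def draw_sequence(n_calls=10, low=1, high=1000, seed=None):
    """Call randint n_calls times and return the resulting sequence."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n_calls)]


def exact_sequence_probability(n_calls=10, low=1, high=1000):
    """Probability of one specific sequence, assuming a uniform generator."""
    n_values = high - low + 1  # randint is inclusive on both ends
    return Fraction(1, n_values ** n_calls)


seq = draw_sequence(seed=0)
p = exact_sequence_probability()
print(seq)
print(p)
```

Whatever sequence gets printed, its probability under this model is the same tiny number, which is exactly the puzzle described above.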

Intuition vs Math

Let's notice that some outcomes of random number generators intuitively surprise us much more than others. If you throw a coin ten times and you get a sequence of ten Heads - that would be more surprising than a sequence of $H H T H T T H T T H$ . And yet, according to probability theory $P (H H T H T T H T T H) = P (H H H H H H H H H H)$ . So we shouldn't really be surprised more. Is it just a bug of our human psyche that some sequences feel more random to us than others? Surely it has to be! We are biased but math can't be^[2].

On the other hand, there is something right about our naive human intuition. As we've already noticed, if we consider every outcome of ten coin tosses to be an improbable event, we would have to be constantly surprised and would soon start doubting our reality. But if we are surprised only by the outcome of 10 Heads in a row, the situation adds up to normality! After all, 10 Heads in a row will happen only in a rare subset of all outcomes. In this sense our biased human intuition satisfies the Law of Conservation of Expected Evidence.

Doesn't this contradict the fact that $P (H H T H T T H T T H) = P (H H H H H H H H H H)$ ? Not at all! There are creatures in the possible mind space^[3] whose intuition works in the opposite way. They are surprised specifically by the sequence of $H H T H T T H T T H$ and do not mind the sequence of $H H H H H H H H H H$ . As a result, they would also satisfy the Law of Conservation of Expected Evidence, as they are surprised in a similarly rare, though different, subset of all possible outcomes.
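A small sketch of this point, assuming a fair coin and ten tosses: two observers who track different single sequences are surprised equally rarely. The names `surprise_probability`, `human_tracked` and `alien_tracked` are hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import product

N = 10
# All 2**10 equally likely outcomes of ten fair coin tosses.
OUTCOMES = ["".join(tosses) for tosses in product("HT", repeat=N)]


def surprise_probability(tracked):
    """Probability that the realized outcome falls into the tracked set."""
    hits = sum(1 for outcome in OUTCOMES if outcome in tracked)
    return Fraction(hits, len(OUTCOMES))


human_tracked = {"HHHHHHHHHH"}   # what our intuition flags by default
alien_tracked = {"HHTHTTHTTH"}   # what the other kind of mind flags

print(surprise_probability(human_tracked))
print(surprise_probability(alien_tracked))
```

Both observers end up surprised in 1 out of 1024 runs on average, just not in the same runs.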

We may say that the mathematical model here is not describing one specific intuition but a general principle. And that there is actually another coherent mathematical model describing specifically our intuition. These two models do not contradict each other, they are just applicable in different circumstances.

Okay, so our intuition is pointing to something true. That's all fine and good. Still, how can it all work out together? Both of these statements can't be true at once:

We have observed an event whose probability is 1/2^10
We are not supposed to be surprised the way we would be by an event whose probability is 1/2^10

Even More Extreme Version of the Problem

Before I reveal the solution to the initial problem, let's look into a more radical version of it. What about random number generators that produce real numbers? There are uncountably many of them, so the probability of seeing a particular real number produced by a random generator should be zero! We can witness literally impossible events at any moment we want and yet, we are not surprised by that at all.

Let's notice our confusion and apply the standard technique for resolving it.

Our strength as rationalists is our ability to be more confused by fiction than by reality. If we are confused it means that something we believe in is false. We couldn't witness an impossible event. So, it means that we didn't. What event did we observe, then?

Random number generators do not actually produce real numbers. They produce floating-point numbers with finite precision. So, the probability of each particular value isn't zero.
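As an illustration, assuming CPython's `random.random()`, which is documented to produce floats with 53 bits of precision: every value it returns is an integer multiple of 2^-53, so there are only finitely many possible outputs. Under an idealized uniform generator each of them has a small but non-zero probability. The helper `grid_index` is hypothetical.

```python
import random

GRID = 2 ** 53  # random.random() returns k / 2**53 for an integer 0 <= k < 2**53


def grid_index(x):
    """Return the integer k with x == k / 2**53, or raise if x is off the grid."""
    scaled = x * GRID  # scaling by a power of two is exact for these floats
    if not scaled.is_integer():
        raise ValueError("value is not a multiple of 2**-53")
    return int(scaled)


rng = random.Random(0)
samples = [rng.random() for _ in range(5)]
indices = [grid_index(x) for x in samples]

# Idealized probability of any single value: tiny, but not zero.
probability_of_each_value = 1 / GRID
print(indices)
print(probability_of_each_value)
```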

The same principle applies every time we deal with continuous distributions. When we check a thermometer, we do not actually observe an event "a specific real value of temperature was shown", which would've been impossible according to our mathematical model. Thermometers can only show us an interval: value +/- measurement error. Each such interval has a non-zero probability.

Now we are back to square one. The probability of a specific floating-point number being generated is not zero, but it's still very small. But now I think we have a pretty good hint of what's going on. Neither math nor our intuition is wrong. We just do not observe the event we thought we observed.

Solving the Paradox

Let's apply the confusion resolving procedure one more time. It's extremely unlikely that we observed an extremely unlikely event. Then, with all likelihood, we didn't. What event did we observe, though?

In the case of ten coin tosses in a row, what we actually observe most of the time is an event which can be called "any non-specific combination of Heads and Tails with length 10 was produced". But there are also specific combinations which particularly capture our attention, forming a separate event that encompasses such elementary outcomes as "10 Heads" and "10 Tails". Very rarely, such combinations are produced. And then we are lawfully surprised, as we should be.

What I've just described seems to be the usual default setting of the human psyche. But we can easily change it.

All it takes is just committing to track a specific set of combinations in your mind. And when the generator produces it and not any other combination - you will be rightfully surprised. The more elementary outcomes are being tracked, the more likely that one of them will be produced by a random generator, and thus the less surprising it is.
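The relation between the number of tracked outcomes and the resulting surprise can be sketched as follows, assuming a fair coin. Measuring surprise as -log2(p) bits is an assumption of this sketch, not something the post commits to, and the function names are hypothetical.

```python
import math


def tracked_event_probability(n_tosses, n_tracked):
    """Probability that one of n_tracked distinct sequences of length n_tosses occurs."""
    total = 2 ** n_tosses
    if not 1 <= n_tracked <= total:
        raise ValueError("n_tracked must be between 1 and 2**n_tosses")
    return n_tracked / total


def surprisal_bits(n_tosses, n_tracked):
    """Surprise, in bits, of seeing one of the tracked sequences."""
    return -math.log2(tracked_event_probability(n_tosses, n_tracked))


for k in (1, 2, 16, 1024):
    print(k, tracked_event_probability(10, k), surprisal_bits(10, k))
```

Tracking every possible sequence gives an event of probability 1 and no surprise at all, which matches the "any non-specific combination" event from above.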

Does it mean that we can manipulate probabilities with our minds? Isn't it an example of a weird anthropic psychic power, against the existence of which I've been arguing the whole time?

Well, there is clearly nothing anthropical about this power. Not only are we not talking about self-location, this power is not exclusive to minds at all.

Yes, your mind does have some settings allowing you to regulate which event can be observed. But so does an electric thermometer with a customizable measurement error. Or a line of code, changing the number of digits that are shown for float numbers.
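The "line of code" case can be sketched like this: the same underlying values become the same or different observed readings depending only on how many digits are displayed. The function names and the sample values are made up for illustration.

```python
def displayed_reading(x, digits):
    """What the observer actually sees: x formatted to a fixed number of digits."""
    return f"{x:.{digits}f}"


def group_by_reading(values, digits):
    """Group underlying values by the reading they produce on the display."""
    groups = {}
    for value in values:
        groups.setdefault(displayed_reading(value, digits), []).append(value)
    return groups


values = [0.12346, 0.12351, 0.12499, 0.13001]
coarse = group_by_reading(values, 2)  # a two-digit display
fine = group_by_reading(values, 4)    # a four-digit display

print(coarse)
print(fine)
```

With the coarser display, several distinct underlying values collapse into one observed event, so each observed event is correspondingly more probable.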

You can also just shut your eyes, or disable the random number generator - does it also feel counterintuitive that these actions affect which events you can observe?

The opposite would be weird. If you could observe events regardless of the sharpness of your senses, the state of your mind or the accuracy of your tools, that would be a quite literal psychic power, contradicting the Second Law of Thermodynamics: the ability to apply any map to any territory and get an accurate result.

Conclusion

So, it turns out, rare events are indeed rare. Quite unsurprisingly so. In this particular case our naive intuitions seem to be quite on point. Where our intuitions lead us astray, however, is in the assumption that the event space always contains the singleton set of every elementary outcome from the sample space:

$\forall (Ω, F, P), E \in Ω \Rightarrow \{E\} \in F$

This assumption is wrong. The definition of a probability space doesn't prevent us from using a less rich σ-algebra. For example, for two coin tosses we can have this setup:

$Ω = \{H H, T T, H T, T H\}, F = \{\emptyset, \{H T, T H\}, \{H H, T T\}, \{H H, T T, H T, T H\}\}$

$P (\{H T, T H\}) = 1 / 2, P (\{H H, T T\}) = 1 / 2$
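A quick check of this setup, as a sketch assuming a fair coin: the four-element family above is closed under complement and union, so it is a valid σ-algebra, and the singleton containing only HH is simply not in the domain of P. The helper names `is_sigma_algebra` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({"HH", "TT", "HT", "TH"})
F = {
    frozenset(),
    frozenset({"HT", "TH"}),
    frozenset({"HH", "TT"}),
    OMEGA,
}


def is_sigma_algebra(omega, events):
    """Finite case: contains omega, closed under complement and pairwise union."""
    if omega not in events:
        return False
    if any(omega - event not in events for event in events):
        return False
    return all(a | b in events for a, b in combinations(events, 2))


def prob(event):
    """P is defined only on members of F (fair coin assumed)."""
    if event not in F:
        raise ValueError("not an event of this probability space")
    return Fraction(len(event), len(OMEGA))


print(is_sigma_algebra(OMEGA, F))
print(prob(frozenset({"HH", "TT"})))
```

Asking for `prob(frozenset({"HH"}))` raises an error in this sketch, mirroring the fact that the model assigns no probability to that set.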

And yet, when we see the outcome $H H$ we intuitively assume that it necessarily has to be its own event with probability 1/4, instead of just part of a larger event with probability 1/2. As if the rareness of an event were solely a property of the sequence of coin tosses.

This is also related to the way humans use words. We often use "event" and "outcome" as synonyms. But mathematically, they are quite different. An event is a set of outcomes. And as the domain of the probability function $P$ is the event space $F$ , not the sample space $Ω$ , there is no such thing as the probability of an outcome.^[4]

This is not a mistake that often reminds us about itself. And so, it's easy to keep making it without noticing. Even now when specifically highlighted it may look like a completely niche thing, an irrelevant nitpick. Most of the time we don't need to rigorously construct probability space to find a correct answer to a decision theory problem.

But then, occasionally, we do need to be rigorous. And if we are not used to it, we fail to arrive at a correct answer. And, in case we are particularly unlucky, we also create a decades-long philosophical dispute with different new schools of thought, persuading otherwise reasonable people to believe in ridiculous things. And if you know anything about philosophical disputes, then you understand that untangling them is a much more complicated task than not making the initial mistake in the first place.

But, as this ship has long sailed, my next couple of posts will be dedicated to this untangling procedure as we've finally done all the groundwork to solve the infamous Sleeping Beauty Paradox.

The next post in the series is Why Two Valid Answers Approach is not Enough for Sleeping Beauty.

^[1] I do not actually mean that probability theory itself is unsound and should be abandoned. By "paradox" here I mean an apparent contradiction that people are confused about because they misinterpret probability theory. As always, it's not the art that has failed us, but it is we who have failed the art.
^[2] However, math can be inapplicable to a specific real-world situation or be applicable differently than we originally thought.
^[3] We do not even need to postulate a weird alien psyche here. These creatures can very well be humans - more on that below.
^[4] Sadly, this confusion is so pervasive that, at the moment of writing, the Wikipedia article on outcome talks about their probabilities, despite specifically mentioning that outcomes shouldn't be confused with events. The article on probability space, however, correctly states that the probability function assigns values to events.

16 comments

Comments sorted by top scores.

comment by Throwaway2367 · 2024-01-21T18:18:57.607Z · LW(p) · GW(p)

I don't know... not using the whole powerset when $Ω$ is finite kinda rubs me the wrong way. (EDIT: correction: what clashes with my aesthetic sense isn't that it's not the whole powerset, rather that I instinctively want to have random variables denoting any coinflip when presented with a list of coinflips yet I can't have that if the set of events is not the powerset because in that case those wouldn't be measurable functions. I think the following expands on this same intuition without the measure-theoretic formalism.)

Consider the situation where I'm flipping the coin and I keep getting heads, I imagine I get more and more surprised as I'm flipping.

Consider now that I am at the moment when I've already flipped $n$ coins, but before flipping the $(n + 1)$ th one. I'm thinking about the next flip: To model the situation in my mind, there clearly should be an event where the $(n + 1)$ th coin is heads and another event where the $(n + 1)$ th coin is tails. Furthermore, these events should have equal (possibly conditional) probabilities yet I will be much more surprised if I get heads again.

This makes me think that the key isn't that I didn't actually observe a low probability event (because in my opinion it does not make sense to model the situation above with a $σ$ -algebra where the $(n + 1)$ th coin being tails is grouped with the $(n + 1)$ th coin being heads because in that case I wouldn't be able to calculate separate probabilities for those events) rather the key is that I feel surprise when one of my assumptions about the world has become too improbable compared to an alternative: in this case, the assumption that the coin is unbiased. After observing lots of heads the probability that the coin is biased in favor of heads gets much greater than that of it being unbiased, even if we started out with a high prior that it's unbiased.
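The update described here can be sketched numerically. This is a toy model with only two hypotheses, a fair coin and a two-headed coin, and the prior of one in a million for the two-headed coin is an arbitrary assumption chosen for illustration.

```python
from fractions import Fraction


def posterior_two_headed(prior_two_headed, n_heads):
    """Posterior that the coin is two-headed after n_heads Heads in a row."""
    likelihood_two_headed = Fraction(1)            # always lands Heads
    likelihood_fair = Fraction(1, 2 ** n_heads)    # fair and independent tosses
    numerator = prior_two_headed * likelihood_two_headed
    return numerator / (numerator + (1 - prior_two_headed) * likelihood_fair)


prior = Fraction(1, 1_000_000)  # assumed prior, for illustration only
for n in (0, 10, 20, 50):
    print(n, float(posterior_two_headed(prior, n)))
```

Under these assumptions the two hypotheses become roughly equally credible at around 20 Heads in a row, long before the run reaches 50.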

Replies from: Radford Neal, Ape in the coat, polytope

↑ comment by Radford Neal · 2024-01-22T01:11:34.078Z · LW(p) · GW(p)

Yes, this is the right view.

In real life we never know for sure that coin tosses are independent and unbiased. If we flip a coin 50 times and get 50 heads, we are not actually surprised at the level of an event with probability 2 to the -50 (about 10 to the -15). We are instead surprised at the level of our subjective probability that the coin is grossly biased (for example, it might have a head on both sides), which is likely much greater than that.

But in any case, it is not rare for rare events to occur, for the simple reason that the total probability of a set of mutually-exclusive rare events need not be low. That is the case with 50 coin tosses that we do assume are unbiased and independent. Any given result is very rare, but of course the total probability for all possible results is one. There's nothing puzzling about this.

Trying to avoid rare events by choosing a restrictive sigma algebra is not a viable approach. In the sigma algebra for 50 coin tosses, we would surely want to include events for "1st toss is a head", "2nd toss is a head", ..., "50th toss is a head", which are all not rare, and are the sort of event one might want to refer to in practice. But sigma algebras are closed under complement and intersection, so if these events are in the sigma algebra, then so are all the events like "1st toss is a head, 2nd toss is a tail, 3rd toss is a head, ..., 50th toss is a tail", which all have probability 2 to the -50.
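The closure argument can be checked by brute force on a small case. The sketch below uses 3 tosses instead of 50 (an assumption made purely to keep the enumeration small) and a hypothetical helper `generate_sigma_algebra` that closes a family of events under complement and intersection.

```python
from itertools import product


def generate_sigma_algebra(omega, generators):
    """Smallest family containing the generators, closed under complement and intersection."""
    events = {frozenset(), omega} | set(generators)
    while True:
        closed = set(events)
        closed |= {omega - event for event in events}
        closed |= {a & b for a in events for b in events}
        if closed == events:
            return events
        events = closed


N = 3
OMEGA = frozenset("".join(tosses) for tosses in product("HT", repeat=N))
# Generator events: "the i-th toss is a head" for each position i.
HEADS_AT = [frozenset(o for o in OMEGA if o[i] == "H") for i in range(N)]

sigma = generate_sigma_algebra(OMEGA, HEADS_AT)
print(len(sigma))
print(frozenset({"HTH"}) in sigma)
```

In this small case the generated family is the full power set, so every individual sequence ends up as its own event.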

↑ comment by Ape in the coat · 2024-01-22T06:13:01.502Z · LW(p) · GW(p)

I instinctively want to have random variables denoting any coinflip when presented with a list of coinflips yet I can't have that if the set of events is not the powerset because in that case those wouldn't be measurable functions.

No problem, you just explicitly use two different mathematical models at the same time, modelling different aspects of your problem. One for the whole series of the coin tosses and the other for the i-th coin toss.

$Ω_{i} = \{H, T\}, F_{i} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$

Notice that using a powerset

$F = \{\emptyset, \{H T\}, \{T H\}, \{H T, T H\}, \{H H, T T\}, \{H H, T T, H T, T H\}\}$

doesn't allow you to express individual coin tosses anyway - you need a different sample space for it.

Consider the situation where I'm flipping the coin and I keep getting heads, I imagine I get more and more surprised as I'm flipping.

Likewise, consider situations where:

1. You've written a specific non-trivial long combination of Heads and Tails and then, as you flip a coin, this particular combination is being produced. All the same logic for n-th flip. You are not much surprised to see every individual coin toss outcome, but are more and more surprised that they end up into a specific sequence that you've written beforehand.
2. Same as 1. but you've written a different combination of Heads and Tails and thus you are neither surprised to see every individual outcome, nor the total result.
3. Same as 1. but you haven't written any combination in advance. Once again you are not surprised.

In 1. you've observed a rare event and are surprised because of it. In 2. and 3. you didn't and thus you are not. Even though you've observed the same outcome - a sequence of Heads and Tails - in all of 1., 2. and 3., the events that you've observed are quite different. And if you do a simple sanity check, you will notice that, indeed, it's very easy to replicate situations 2. and 3. but very hard to replicate 1.

Situation 1. is similar to observing many Heads in a row. The difference is that your brain is wired to track for many Heads/many Tails by default, but you being able to track the specific non-trivial combination requires an active precommitment.

rather the key is that I feel surprise when one of my assumptions about the world has become too improbable

This is not an alternative explanation. This is restating the same fact in different terms. If your assumption about the world has become too improbable it means that you've accumulated enough evidence against this assumption. Strength of the evidence against an assumption is literally how improbable encountering such an event is according to this assumption. It's not one way or the other. It's always both.

Replies from: Throwaway2367

↑ comment by Throwaway2367 · 2024-01-22T07:50:02.574Z · LW(p) · GW(p)

As I understand (but correct me if I am wrong), your claim is that we don't feel surprise when observing what is commonly thought of as a rare event, because we don't actually observe a rare event, because of one quirk of our human psychology we implicitly use a non-maximal event space. But you now seem to allow for another probability space which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously or am I misunderstanding you?

Relatedly, the power set does allow me to express individual coin tosses. Let $X_{1}$ be the following function on $Ω$ :

$X_{1} (ω) = \begin{cases} 1 & \text{if } ω \in \{H H, H T\} \\ 0 & \text{otherwise} \end{cases}$

In this case $X_{1}$ is measurable, because $X_{1}^{- 1} [\{1\}] = \{H H, H T\} \in P (Ω)$ (minor point: Your $F$ is not the powerset of $Ω$ ), same for $X_{1}^{- 1} [\{0\}]$ . Therefore $X_{1}$ is actually a random variable modeling that the first throw is head.
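The measurability claim can be checked mechanically for two tosses. In the sketch below, `COARSE_F` is the four-element event space from the post's conclusion, and `power_set` and `is_measurable` are hypothetical helpers; for a function with finitely many values it is enough to test the preimage of each value.

```python
from itertools import combinations

OMEGA = frozenset({"HH", "HT", "TH", "TT"})


def power_set(omega):
    """All subsets of omega, as frozensets."""
    items = list(omega)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


COARSE_F = {
    frozenset(),
    frozenset({"HT", "TH"}),
    frozenset({"HH", "TT"}),
    OMEGA,
}


def x1(outcome):
    """Indicator that the first toss is Heads."""
    return 1 if outcome[0] == "H" else 0


def is_measurable(func, omega, events):
    """True if the preimage of every value of func is an event."""
    values = {func(outcome) for outcome in omega}
    return all(
        frozenset(o for o in omega if func(o) == value) in events
        for value in values
    )


print(is_measurable(x1, OMEGA, power_set(OMEGA)))
print(is_measurable(x1, OMEGA, COARSE_F))
```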

Regarding your examples, I'm not sure I'm understanding you: Is your claim that the eventspace is different in the three cases leading to different probabilities for the events observed? I thought your theory said that our human psychology works with non-maximal eventspaces, but it seems it also works with different event spaces in different situations? (EDIT: Rereading the post, it seems you've addressed this part: if I understand correctly, one can influence their event space by way of focusing on specific outcomes?)

Wouldn't it be much simpler to say that in 1, your previous assumption that the coinflips are independent from what you write on a paper became too low probability after observing the coinflips and that caused the feeling of surprise?

I'm afraid I don't understand your last paragraph, to me it clearly seems an alternative explanation. Please, elaborate. It's not true that any time I observe a low-probability event, one of my assumptions gets low-prob. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn't have a previous assumption that I will get coinflips different from HHTHTTHHTTHT. An assumption is not just any statement/proposition/event, it's a belief about the world which is actually assumed beforehand.

To me your explanation leaves some things unexplained: for example: In what situation will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn't being surprised in the all heads case rational in an objective sense? Should we expect an alien species to be or not be surprised?

For my proposed explanation these are easy questions to answer: We are not surprised because of the non-maximal event spaces, rather, we are surprised if one of our assumptions loses a lot of probability. The evolutionary reason is that the feeling of surprise caused us to investigate and in cases when one of our assumptions got too improbable, we should actually investigate the alternatives. Yes, being surprised in these cases is objectively rational and we should expect an alien species to do the same on all-heads throw and not do the same on some random string of H/T.

Replies from: Ape in the coat

↑ comment by Ape in the coat · 2024-01-24T10:56:08.779Z · LW(p) · GW(p)

As I understand (but correct me if I am wrong), your claim is that we don't feel surprise when observing what is commonly thought of as a rare event, because we don't actually observe a rare event, because of one quirk of our human psychology we implicitly use a non-maximal event space.

Yes, this is correct. The general principle is "You can observe only what you are paying attention to". And the human quirk is paying attention, by default, to many Heads/Tails in a row.

But you now seem to allow for another probability space which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously or am I misunderstanding you?

It's not I who allow stuff. The point is that there is nothing in probability theory that forbids us from doing it. It's not some new radical idea either. Solomonoff inductor is supposed to track all the models at the same time, for instance.

Another point is that, in fact, our minds (not necessarily subconscious) are indeed able to use multiple mathematical models at the same time, as long as the sample spaces are different. This is an empirical claim, which you may check yourself.

The question of elegance is less important to me, it's a matter of taste, essentially. Personally, I think using two different models for their specific tasks and nothing else is a more elegant design than trying to stick all the required functionality and then some more into one bigger model. Actually I still don't think that just one model would be enough for what you want it to do, anyway.

minor point: Your $F$ is not the powerset of $Ω$

Yes, you are completely correct, I forgot to add triplets and a couple of pairs. Anyway, let's explore this kind of modeling:

Therefore $X_{1}$ is actually a random variable modeling that the first throw is head.

So suppose you make a series of n coin tosses, your sample space is the set of all possible combinations of Heads and Tails with length n, and the event space is its power set. Let's define event $H_{i}$ as the set of all possible combinations of Heads and Tails with length n where Heads is in the i-th place. You toss a coin the first time and get Heads. Has the event $H_{1}$ just happened?

No, because $H_{1}$ is realized only when one of its outcomes is realized, and its outcomes are series of coin tosses with length n. So you can only say that $H_{1}$ happened after all the coin tossing is done.

So if you want to update in the process, you need a model for the i-th coin toss, whose sample space is the set of all possible combinations of Heads and Tails with length i and whose event space is its power set. And then with every coin toss this model changes. So in the end you will have n different models.

Also I think you will have to use a model for the current coin toss result anyway, so that the switch from i to i+1 can properly be implemented. Maybe there is some clever way around this problem. In any case, human minds seem to be working the obvious way: notice that the outcome of the current coin toss is Heads/Tails, add it to the list of all the previous coin tosses with length i and thus be able to say which outcome in the (i+1)-th model has been realized.

And, of course, if you want to compare different assumptions about a coin you will have to track even more models in your mind.

(EDIT: Rereading the post, it seems you've addressed this part: if I understand correctly, one can influence their event space by way of focusing on specific outcomes?)

Yes, your edit is correct. We can change what we are paying attention to and thus observe different events, which mathematically can be described as having different event spaces. There are some potential issues here, like whether you really made yourself pay attention only to the specific combination you've selected and thus are not surprised at all by ten Heads in a row, or whether you are just adding a new combination to the list of specific combinations which includes all Heads and all Tails, thus becoming only about 50% less surprised when observing all Heads.

But this doesn't matter much in the realm of decision making. If you want to do some action with only 1/2^n probability you can commit to a specific outcome with length n, toss a coin n times and do the action only if this particular outcome is realized.
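This commitment procedure can be sketched as a simulation, assuming a fair coin modelled by Python's pseudo-random generator. The names `toss_sequence` and `should_act`, the committed sequence "HTH" and the seed are all arbitrary choices made for illustration.

```python
import random


def toss_sequence(n, rng):
    """Toss a fair coin n times and return the result as a string of H and T."""
    return "".join(rng.choice("HT") for _ in range(n))


def should_act(committed, rng):
    """Act only if the tosses reproduce the sequence committed to in advance."""
    return toss_sequence(len(committed), rng) == committed


rng = random.Random(12345)
committed = "HTH"  # chosen before tossing; any sequence of length 3 works
trials = 20_000
frequency = sum(should_act(committed, rng) for _ in range(trials)) / trials

print(frequency)         # empirical frequency of acting
print(1 / 2 ** len(committed))  # target probability 1/2^n
```

The empirical frequency should come out close to 1/8 for any committed sequence of length 3, including "HHH".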

Wouldn't it be much simpler to say that in 1, your previous assumption that the coinflips are independent from what you write on a paper became too low probability after observing the coinflips and that caused the feeling of surprise?

Strictly speaking, no, because now you have to add a whole new level of multiple alternative hypotheses, each with its own probability space, which you are also tracking in your mind and prioritizing between.

I have a simple rule: "surprise is proportional to the improbability of the event observed" and then use the already existing distinction between events and outcomes to explain why observing every outcome of a random number generator is not surprising.

You add an extra distinction between "observed events" and "assumption invalidating observed events". And I don't see what it brings to the table. Seems to be a clear case of an extra entity. You can just reduce the three-entity model (assumption invalidating events, events, outcomes) to a two-entity model (events, outcomes), without losing anything.

It's not true that any time I observe a low-probability event, one of my assumptions gets low-prob. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn't have a previous assumption that I will get coinflips different from HHTHTTHHTTHT.

If you didn't have an assumption that observing HHTHTTHHTTHT is improbable then in what sense did you observe an improbable event when you saw the outcome HHTHTTHHTTHT?

Your assumptions can be described as a probability space with less rich sigma-algebra in which outcome HHTHTTHHTTHT isn't an event in itself. Let's call it model A. Observing an improbable event in model A equals your assumption becoming improbable and vice versa.

On the other hand, you are also trying to keep a probability space with a power set in your mind as well. And there {HHTHTTHHTTHT} is an event with low probability. This is model B.

What you are saying is that if you observed an outcome that corresponds to a low probable event in model B, it doesn't mean that you've observed a low probable event in model A. And I completely agree. What I'm saying is that you do not need to talk about model B in the first place, as it doesn't actually correspond to what you are able to observe and just adds extra confusion.

To me your explanation leaves some things unexplained: for example: In what situation will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn't being surprised in the all heads case rational in an objective sense? Should we expect an alien species to be or not be surprised?

Naturally, it depends on our assumptions, what we are paying attention to. A person who is tracking a specific outcome and sees it being realized observes a much less probable event than a person who is tracking a dozen different outcomes, this one included.

There are some built-in intuitions about what feels more or less random, and it's possible to speculate about the evolutionary reasons for them and for our ability to modify what we are paying attention to. There are, indeed, more things to be said on these topics. But they are beside the point of what I wanted to communicate in this post - probability theory and one of its apparent paradoxes, which is quite relevant to the anthropic reasoning that I'm trying to solve. The idea that our brain is a pattern seeking machine is already quite popular and I doubt that I have much new to add here.

↑ comment by polytope · 2024-01-22T02:26:10.398Z · LW(p) · GW(p)

Yes, rather than resolving the surprise of "the exact sequence HHTHTTHTTH" by declaring that it shouldn't be part of the set of events, I would prefer to resolve it via something like:

It should be part of the set of events I'm allowed to consider just like any other subset of all 10-flip sequences.
We do observe events (or outcomes that could be constructed as singleton events) all the time that we would have predicted to be exceedingly improbable (while they may be improbable individually, a union of them may not be).
Observing some particular unlikely event like "the exact sequence HHTHTTHTTH occurs" should in fact raise my relative belief in any hypothesis by a large factor if that hypothesis would have uniquely predicted that to occur, as compared to others that would have made a far more non-specific prediction. (up to a factor of at most 2^10 unless the other hypothesis considered that sequence to be unlikelier than uniform)
Even if all this is true, I still do not and should not feel surprised in such a case because I think surprise has more to do with the amount by which something shifts the beliefs I have that my brain intuits to be important for various reasons. It has little to do with the likelihood of events I observe, other than how it affects those beliefs. I didn't have any prior reason to assign any meaningful weight to hypotheses about the coin that would predict that exact sequence and no others, such that even after scaling them by a large factor, my overall beliefs about the coin and the distribution of likely future flips should remain very similar to before, therefore I feel little surprise.
By contrast I might feel a little more surprise seeing "HHHHHHHHHH". And again the reason is not really because of the likelihood or unlikelihood of that sequence, and it also has little to do with which sequences I'm being told I can define to be a mathematical event or not. Rather I think it's closer to something like "this coin is biased heads" or "this coin always flips heads" are competing hypotheses to "this coin is fair" that while initially extremely unlikely would not be outlandish to consider, and if true it would affect my conception of the coin and predictions of its future flips. So this time the large relative boost would come closer to shifting my beliefs in a way that would impact how I think about the coin and make future predictions, therefore I feel more surprise.
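The contrast drawn in the points above can be put into numbers with the odds form of Bayes' rule. This is only a sketch: it compares each hypothesis against a fair coin alone, and both priors are arbitrary assumptions picked for illustration.

```python
from fractions import Fraction


def posterior_from_bayes_factor(prior, bayes_factor):
    """Posterior of a hypothesis versus a single alternative, via posterior odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# A hypothesis that predicts one exact 10-flip sequence with certainty gains
# a factor of 2**10 over the fair-coin hypothesis when that sequence occurs.
BAYES_FACTOR = 2 ** 10

exact_sequence_prior = Fraction(1, 10 ** 9)  # assumed: "this coin always gives HHTHTTHTTH"
always_heads_prior = Fraction(1, 1000)       # assumed: "this coin always flips heads"

print(float(posterior_from_bayes_factor(exact_sequence_prior, BAYES_FACTOR)))
print(float(posterior_from_bayes_factor(always_heads_prior, BAYES_FACTOR)))
```

With these assumed priors the same factor of 1024 leaves the first hypothesis negligible, while it lifts the second to roughly even odds, which is the kind of shift that would change predictions about future flips.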

comment by Adam Kaufman (Eccentricity) · 2024-01-21T16:39:56.611Z · LW(p) · GW(p)

We aren’t surprised by HHTHHTTTHT or whatever because we perceive it as the event “a sequence containing a similar number of heads and tails in any order, ideally without a long subsequence of H or T”, which occurs frequently.

Replies from: Ape in the coat

↑ comment by Ape in the coat · 2024-01-21T16:49:18.374Z · LW(p) · GW(p)

Yep. This is essentially the point of the post.

comment by Duschkopf · 2024-04-07T12:50:06.941Z · LW(p) · GW(p)

Yes. Our human mind is obviously biased to detect patterns. And people tend to react with surprise if they observe patterns where they did not expect to find them. If someone has a specific sequence of coin toss results in her mind (e.g. „HHTHTHHT“) and she is able to reproduce it with an actual coin on her first try, then she will likely be surprised. What she is really surprised about however, is not that she has observed an unlikely event ({HHTHTHHT}), but that she has observed an unexpected pattern. In this case, the coincidence of the sequence she had in mind and the sequence produced by the coin tosses constitutes a symmetry which our mind readily detects and classifies as such a pattern. One could also say that she has not just observed the event {HHTHTHHT} alone, but also the coincidence which can be regarded as an event, too. Both events, the actual coin toss sequence and the coincidence, are unlikely events and both become extremely unlikely with longer sequences. My reasoning is that the coincidence is not more surprising than the actual sequence because it is an even more unlikely event than the sequence. Though both events are unlikely and therefore unexpected, the coincidence is more surprising to us simply because it looks like a pattern.

Replies from: Ape in the coat

↑ comment by Ape in the coat · 2024-04-08T05:05:09.340Z · LW(p) · GW(p)

What she is really surprised about however, is not that she has observed an unlikely event ({HHTHTHHT}), but that she has observed an unexpected pattern.

Why do you oppose these two things to each other? Talking about patterns is just another way to describe the same fact.

In this case, the coincidence of the sequence she had in mind and the sequence produced by the coin tosses constitutes a symmetry which our mind readily detects and classifies as such a pattern.

Well, yes. Or you can say that having a specific combination in mind allowed her to observe the event "this specific combination" instead of "any combination". Once again, this is just using different language to talk about the same thing.

One could also say that she has not just observed the event {HHTHTHHT} alone, but also the coincidence which can be regarded as an event, too. Both events, the actual coin toss sequence and the coincidence, are unlikely events and both become extremely unlikely with longer sequences.

Oh! Are you saying that she has observed the intersection of two rare events, "HHTHTHHT was produced by coin tossing" and "HHTHTHHT was the sequence that I came up with in my mind", both of which have probability 1/2^8, so now she is surprised as if she had observed an event with probability (1/2^8)^2?

That's not actually the case. If the person had come up with some other combination and then it was realized on the coin tosses, the surprise would be the same. There are 2^8 degrees of freedom here, one for every possible combination of Heads and Tails of length 8. So the probability of the observed event is still 1/2^8.
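This count can be checked by brute force. The sketch below assumes, purely for simplicity, that the imagined sequence is itself uniformly distributed over all combinations; the function name `match_probability` is made up for this example.

```python
from fractions import Fraction
from itertools import product

def match_probability(n):
    """P(tossed sequence equals the imagined one) for n fair, independent tosses."""
    sequences = list(product("HT", repeat=n))
    p_each = Fraction(1, len(sequences))  # probability of any single sequence
    # Add up the probability of every (imagined, tossed) pair that coincides.
    return sum(
        p_each * p_each
        for imagined in sequences
        for tossed in sequences
        if imagined == tossed
    )

print(match_probability(8))  # 1/256, i.e. 1/2^8 rather than (1/2^8)^2
```

Each specific pair has probability (1/2^8)^2, but there are 2^8 matching pairs, so the event "the coin reproduced whatever I had in mind" has probability 1/2^8.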

Replies from: Duschkopf

↑ comment by Duschkopf · 2024-04-08T19:29:14.384Z · LW(p) · GW(p)

Maybe I expressed myself somewhat misleadingly. I am not saying that she is surprised because the coincidence is more unlikely than the sequence. You are absolutely right in correcting me that the latter isn't even the case (also since P(HHTHTHHT/„HHTHTHHT“)=P(HHTHTHHT)=1/2^8). What I was trying to say is that her surprise about the coincidence arises from the circumstance that the coincidence is both unlikely and looks like a pattern. The fact that an event is unlikely is a necessary condition for being surprised about its occurrence, but not a sufficient one.

I agree with you when you say that how we structure our perception of the world is biased in some way towards what we are „tracking“ in our minds. And I also agree that this bias could be mathematically modelled by the event spaces you are proposing. But I would not go so far as to say that we only observe the events we are currently tracking (please let me know if I misread you or you feel strawmanned here, since it is absolutely not my intention to annoy you!). If this was true, then we could not observe the event „any other coin sequence“ as well since this event is by definition not being tracked. In fact, in order to detect a correspondence between a coin sequence that we have in mind and the actual sequence, our brain has to compare them to decide if there is a match. I can hardly imagine how this comparison could work without observing the specific actual sequence in the first place. That we classify and perceive a specific sequence as „any other sequence“ can be the result of the comparison, but is not its starting point.

In conclusion, I do not see a contradiction in not being surprised to observe an extremely unlikely event.

Replies from: Ape in the coat

↑ comment by Ape in the coat · 2024-05-10T09:33:06.179Z · LW(p) · GW(p)

If this was true, then we could not observe the event „any other coin sequence“ as well since this event is by definition not being tracked.

When you are tracking event A you are automatically tracking its complement.

In fact, in order to detect a correspondence between a coin sequence that we have in mind and the actual sequence, our brain has to compare them to decide if there is a match. I can hardly imagine how this comparison could work without observing the specific actual sequence in the first place. That we classify and perceive a specific sequence as „any other sequence“ can be the result of the comparison, but is not its starting point.

Oh sure, you are of course completely correct here. But this doesn't contradict what I'm saying.

The thing is, we observe a particular outcome and then we see which event(s) it corresponds to. Let's take an example: a series of 3 coin tosses.

So, in the beginning you have a sample space which consists of all the elementary outcomes:

$\{H H H, T T T, H H T, T T H, H T H, T H T, T H H, H T T\}$

And an event space, some sigma-algebra of the sample space, which depends on your precommitments. Normally, it would look something like this:

$\{\emptyset, \{H H H, T T T\}, \{H H T, T T H, H T H, T H T, T H H, H T T\}, \{H H H, T T T, H H T, T T H, H T H, T H T, T H H, H T T\}\}$

Because you are intuitively paying attention to whether there are all Heads or all Tails in a row. So your event space groups individual outcomes in this particular way, separating the event you are tracking from its complement.

When a particular combination, say $T H H$, is realized in an iteration of the experiment, your mind works like this:

Outcome $T H H$ is realized
Therefore every event from the event space which includes $T H H$ is realized.
Events $\{H H T, T T H, H T H, T H T, T H H, H T T\}$ and $\{H H H, T T T, H H T, T T H, H T H, T H T, T H H, H T T\}$ are realized.
$P(H H T \text{ or } T T H \text{ or } H T H \text{ or } T H T \text{ or } T H H \text{ or } H T T) = 6/8 = 3/4$
This isn't a rare event and so you are not particularly surprised

So, as you see, you do indeed observe an actual sequence; it's just that observing this sequence isn't necessarily an event in itself.
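The walk-through above can be sketched in code. This is only an illustration under the stated setup (three fair tosses, tracking "all Heads or all Tails"); the variable and function names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for three fair coin tosses: 8 equally likely outcomes.
sample_space = frozenset("".join(p) for p in product("HT", repeat=3))

tracked = frozenset({"HHH", "TTT"})   # the event being tracked
complement = sample_space - tracked   # tracked automatically along with it
event_space = [frozenset(), tracked, complement, sample_space]

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def realized_events(outcome):
    """Every event of the event space that contains the realized outcome."""
    return [event for event in event_space if outcome in event]

realized = realized_events("THH")
most_specific = min(realized, key=len)  # the narrowest event that was realized

print(sorted(most_specific))       # the six mixed outcomes
print(probability(most_specific))  # 6 of 8 outcomes, so not a rare event
print(frozenset({"THH"}) in event_space)  # False: "exactly THH" is not an event here
```

The outcome $T H H$ is observed, but the narrowest event it realizes in this event space covers 6 of the 8 outcomes, which is why it carries little surprise.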

comment by red75prime · 2024-01-22T10:41:53.566Z · LW(p) · GW(p)

There are creatures in the possible mind space^[3] [LW(p) · GW(p)] whose intuition works in the opposite way. They are surprised specifically by the sequence of $H H T H T T H T T H$ and do not mind the sequence of $H H H H H H H H H H$

That is, creatures who aren't surprised by outcomes of lower Kolmogorov complexity, or who aren't surprised by the fact that the language they use for estimating Kolmogorov complexity has a special compact case for producing "HHTHTTHTTH".

Looks possible, but not probable.

comment by quetzal_rainbow · 2024-01-22T07:09:56.811Z · LW(p) · GW(p)

I have a suggestion: maybe you could join your anthropics posts into one sequence?

Replies from: Ape in the coat

↑ comment by Ape in the coat · 2024-01-22T07:59:36.758Z · LW(p) · GW(p)

I'm going to do it eventually. Are there any reasons to do it earlier instead of later?

Replies from: quetzal_rainbow

↑ comment by quetzal_rainbow · 2024-01-22T08:30:41.668Z · LW(p) · GW(p)

Purely for navigation convenience.