Sleeping Beauty: an Accuracy-based Approach

post by glauberdebona · 2025-02-10T15:40:29.619Z · LW · GW · 2 comments

This post does not propose a solution to the Sleeping Beauty problem, but presents accuracy-based arguments for thirders, halfers and double-halfers. A more detailed draft paper can be found here.

Summary

Accuracy-based arguments claim that one should plan to adopt posterior credences that maximize the expected (according to prior credences) accuracy. Applying this approach to the Sleeping Beauty problem, the solution depends on how we aggregate the accuracy of the credences held in the two indistinguishable awakenings when the coin lands Tails. Kierland and Monton (2005) have shown that, employing the Brier score, averaging accuracy yields halving ( $p = 1 / 2$ ), and summing leads to thirding ( $p = 1 / 3$ ). We generalize that result to any strictly proper scoring rule. With multiple, repeated experiments, we show that accuracy can also be averaged in a different way, with the optimal credence ranging from $p = 1 / 2$ for a single experiment down to $p = 1 / 3$ in the limit as the number of experiments increases indefinitely.

Introduction

The Sleeping Beauty problem^[1] is about updating credences and can be approached via arguments based on accuracy (epistemic utility). When an agent is about to learn a proposition from a partition, it is well known that they should plan to update via conditionalization in order to minimize the expected inaccuracy of their posterior credences. Similarly, before being put to sleep on Sunday, Sleeping Beauty (SB) can plan to update her credence in the coin landing Heads upon waking on Monday in a way that minimizes the expected inaccuracy of that credence.

In the SB setup, we are interested in her credence in the coin outcome being Heads upon waking on Monday. However, if the coin lands Tails, her awakening on Tuesday is indistinguishable from the one on Monday, and presumably she would assign the same credences as on Monday. In the Tails-world, thus, there are two identical credences to take into account when computing inaccuracy, one from Monday and one from Tuesday. The question is then how SB should aggregate the inaccuracies of her two indistinguishable temporal parts (awakenings) in the possible world where the coin lands Tails. Kierland and Monton (2005) showed that, when the Brier (quadratic) score is employed, averaging the inaccuracies yields the halfers' solution, while summing leads to the thirders'. These results can actually be extended to any strictly proper scoring rule.

More generally, we explore a repeated version^[2] of the SB problem where the experiment is conducted $n$ times, over $n$ weeks, with $n$ coin tosses. SB is awakened only on Mondays and (possibly) on Tuesdays, and does not remember previous awakenings. For instance, if the coin lands Heads in the first experiment, she will be awakened and put to sleep on the first Monday, then she will be awakened on the next Monday to start the second experiment, with another coin toss. All awakenings in the series of experiments are thus indistinguishable to her. Upon each awakening, we seek the credence she should assign to the coin landing Heads for that experiment. In this setting, we again can sum the inaccuracies in a possible world, but there are two ways to average. We can consider the mean inaccuracy per experiment and then average those means. Alternatively, the average can be taken across all awakenings in a possible world, regardless of the experiments. We present the credences SB should assign to minimize the expected inaccuracy for each of these three aggregation methods.

Formal Setting

With $n$ experiments, we have a set $W$ with $2^{n}$ possible worlds, since each coin toss can result in either Heads or Tails. A credence function is a mapping $c : 2^{W} \to [0, 1]$ from propositions to real numbers. We use $H_{i}$ to denote the set of worlds (proposition) where the coin toss for the $i^{th}$ experiment landed Heads.

The prior credences assigned by SB, just before she is first put to sleep, are denoted by the credence function $c_{0}$ . As the coin is fair and the coin tosses are independent, we assume that all possible worlds are a priori equiprobable to SB; formally, $c_{0} (\{w\}) = 1 / 2^{n}$ for all $w \in W$ , implying $c_{0} (H_{i}) = 1 / 2$ for all $i$ .

The posterior credences assigned upon each awakening during the $i^{th}$ experiment are denoted by the credence function $c_{i}$ ^[3]. In the $i^{th}$ experiment, the credence of interest, which answers the SB problem's question, is $c_{i} (H_{i})$ ^[4]. Since all awakenings are indistinguishable to SB, we assume that, for some $p \in [0, 1]$ , $c_{i} (H_{i}) = p$ for all $i$ . The (repeated) Sleeping Beauty problem is to determine the rational value for $p$ .

To measure inaccuracy, we use a scoring rule $s : [0, 1] \times \{0, 1\} \to \mathbb{R}$ that maps the numeric credence in a proposition and its truth value (0 = FALSE, 1 = TRUE) to a non-negative real number. We assume $s$ is strictly proper^[5], such as the Brier (quadratic) score^[6] or the logarithmic score. Each awakening corresponds to a credence of interest (in the $i^{th}$ experiment, $c_{i} (H_{i}) = p$ ), and the random variable $I : W \to \mathbb{R}$ denotes some aggregation of their inaccuracies in a possible world $w \in W$ .
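
As a concrete illustration of these definitions, here is a minimal Python sketch of the two scoring rules named above, with a numerical check of strict propriety (the expected score $q s (x, 1) + (1 - q) s (x, 0)$ should be minimized only at $x = q$ ). The function names and the grid resolution are assumptions made for this example. A grid search only illustrates the property and does not prove it.

```python
import math

def brier(q, v):
    """Brier (quadratic) inaccuracy of credence q when the truth value is v (0 or 1)."""
    return (q - v) ** 2

def log_score(q, v):
    """Logarithmic inaccuracy: minus the log of the credence given to the true outcome."""
    return -math.log(q if v == 1 else 1.0 - q)

def argmin_expected(s, q, steps=1000):
    """Grid-search the credence x in (0, 1) minimizing q*s(x, 1) + (1 - q)*s(x, 0)."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda x: q * s(x, 1) + (1 - q) * s(x, 0))

# For a strictly proper rule, the minimizer should sit at the probability q itself.
for rule in (brier, log_score):
    print(rule.__name__, argmin_expected(rule, 0.3))
```

The endpoints 0 and 1 are left out of the grid because the logarithmic score is unbounded there.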

Single Experiment

Considering only one experiment, the set $W$ of possible worlds contains only two worlds, which are equiprobable to SB on Sunday: $w_{1}$ has one awakening (Heads) and $w_{2}$ has two (Tails). In $w_{1}$ , the inaccuracy of $c_{1} (H_{1}) = p$ (the credence of interest) is simply $s (p, 1)$ . In $w_{2}$ , there are two awakenings, each yielding inaccuracy $s (p, 0)$ to the credence of interest. To pick an optimal $p$ , one has to aggregate the inaccuracies for the two awakenings in $w_{2}$ somehow. We assume this aggregation is done either via the sum (resulting in an aggregate inaccuracy of $2 s (p, 0)$ in $w_{2}$ ) or via the arithmetical mean (which yields $s (p, 0)$ ), which are probably the two simplest choices among many^[7].

When the aggregate inaccuracy $I (w)$ is computed by summing the inaccuracies across indistinguishable awakenings, the expected inaccuracy is:

$E_{c_{0}} [I] = \sum_{w \in W} c_{0} (w) I (w) = 0.5 s (p, 1) + s (p, 0) = \frac{3}{2} \left( \frac{1}{3} s (p, 1) + \frac{2}{3} s (p, 0) \right)$

For a strictly proper $s$ , only $p = 1 / 3$ minimizes this expression.

Alternatively, if $I (w)$ is the arithmetic mean of the inaccuracies of $c_{1} (H_{1})$ held in different, indistinguishable awakenings in the same possible world, the expected inaccuracy of $c_{1} (H_{1})$ is given by:

$E_{c_{0}} [I] = \sum_{w \in W} c_{0} (w) I (w) = \frac{1}{2} s (p, 1) + \frac{1}{2} s (p, 0)$

Since $s$ is strictly proper, only $p = 1 / 2$ minimizes the expression above.
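
The two single-experiment results can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: the helper names (`expected_inaccuracy`, `best_p`) and the grid of 1200 steps (chosen so that both 1/3 and 1/2 lie on the grid) are hypothetical. It computes the expected aggregate inaccuracy over the two equiprobable worlds and searches for the minimizing $p$ under each aggregation.

```python
import math

def brier(q, v):
    return (q - v) ** 2

def log_score(q, v):
    return -math.log(q if v == 1 else 1.0 - q)

def mean(xs):
    return sum(xs) / len(xs)

def expected_inaccuracy(p, s, aggregate):
    """Expectation over the two equiprobable worlds of the aggregated inaccuracy."""
    heads_world = aggregate([s(p, 1)])           # w1: a single awakening
    tails_world = aggregate([s(p, 0), s(p, 0)])  # w2: two awakenings
    return 0.5 * heads_world + 0.5 * tails_world

def best_p(s, aggregate, steps=1200):
    """Grid-search the credence p in (0, 1) with the lowest expected inaccuracy."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda p: expected_inaccuracy(p, s, aggregate))

for rule in (brier, log_score):
    # Summing should land near 1/3, averaging near 1/2, for either rule.
    print(rule.__name__, best_p(rule, sum), best_p(rule, mean))
```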

These results generalize the findings from Kierland and Monton, who considered only the Brier score. They see both aggregation methods as permissible, but attaining different goals: averaging minimizes the expected inaccuracy on Monday; summing considers equally all possible awakenings (Monday-Heads, Monday-Tails and Tuesday-Tails).

Multiple Experiments

When the experiment is repeated $n > 1$ times, in each possible world there are multiple awakenings, from different experiments, all of which must be considered when measuring inaccuracy. In each awakening, we have a credence of interest $(c_{i} (H_{i}) = p)$ , whose inaccuracy we want to minimize. Thus, to measure inaccuracy in a possible world, we need to aggregate the inaccuracies corresponding to different awakenings, possibly from different experiments and different credences of interest.

To illustrate three possible aggregation approaches, we will consider as an example the world $w_{12}$ , where the first experiment has one awakening (Heads) and the second, two (Tails). In the first experiment's awakening, the credence $c_{1} (H_{1}) = p$ has inaccuracy $s (p, 1)$ . In the second experiment, there are two awakenings, and, in each, the credence $c_{2} (H_{2}) = p$ has inaccuracy $s (p, 0)$ .

Minimizing the Expected Total Inaccuracy

One way to measure how inaccurate Sleeping Beauty is in the world $w_{12}$ is to sum the three inaccuracy measurements, for $c_{1} (H_{1}) = p$ , $c_{2} (H_{2}) = p$ and $c_{2} (H_{2}) = p$ , which yields $s (p, 1) + 2 s (p, 0)$ . In general, the total inaccuracy in a possible world is the sum of the total inaccuracies per experiment. If $t_{j} : W \to \mathbb{R}$ is a random variable denoting the total inaccuracy of $c_{j} (H_{j}) = p$ in the $j^{th}$ experiment, the expected total inaccuracy can be written as $E_{c_{0}} [I] = E_{c_{0}} [\sum_{j} t_{j}]$ . By the linearity of expectation, this becomes $\sum_{j} E_{c_{0}} [t_{j}]$ . Any experiment has one awakening in half of the possible worlds, where $t_{j} = s (p, 1)$ , and has two awakenings in the other half, where $t_{j} = 2 s (p, 0)$ . As possible worlds are equiprobable according to $c_{0}$ , we have $E_{c_{0}} [t_{j}] = 0.5 s (p, 1) + s (p, 0)$ for all $j$ , yielding:

$E_{c_{0}} [I] = E_{c_{0}} \left[ \sum_{j = 1}^{n} t_{j} \right] = \sum_{j = 1}^{n} E_{c_{0}} [t_{j}] = n (0.5 s (p, 1) + s (p, 0)) = \frac{3 n}{2} \left( \frac{1}{3} s (p, 1) + \frac{2}{3} s (p, 0) \right)$

For any positive $n$ , only $p = 1 / 3$ minimizes this expression, for $s$ is strictly proper.

Minimizing the Expected Sum of Experiments' Mean Inaccuracies

A second aggregation approach would be to sum (or average^[8]) the mean inaccuracies of the experiments. In $w_{12}$ , the second experiment's mean inaccuracy is $(s (p, 0) + s (p, 0)) / 2 = s (p, 0)$ , resulting in $s (p, 1) + s (p, 0)$ when summed with the first experiment's awakening inaccuracy. In general, this sum can again be decomposed per experiment, and linearity of expectation can be applied as before. If $m_{j} : W \to \mathbb{R}$ is the random variable denoting the mean inaccuracy of $c_{j} (H_{j}) = p$ in the $j^{th}$ experiment, the expected sum of experiments' mean inaccuracies is $E_{c_{0}} [I] = E_{c_{0}} [\sum_{j} m_{j}] = \sum_{j} E_{c_{0}} [m_{j}]$ . Since $m_{j}$ is $s (p, 1)$ in exactly half of the worlds, and $s (p, 0)$ in the other half, we have $E_{c_{0}} [m_{j}] = 0.5 s (p, 1) + 0.5 s (p, 0)$ . Summing over all experiments, the expected sum of experiments' mean inaccuracies is:

$E_{c_{0}} [I] = E_{c_{0}} \left[ \sum_{j = 1}^{n} m_{j} \right] = \sum_{j = 1}^{n} E_{c_{0}} [m_{j}] = n \left( \frac{1}{2} s (p, 1) + \frac{1}{2} s (p, 0) \right)$

For any positive $n$ , this expression is minimized only at $p = 1 / 2$ , for $s$ is strictly proper.
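
Both per-experiment decompositions (total inaccuracy and mean inaccuracy per experiment) can be cross-checked by brute force, enumerating all $2^{n}$ equiprobable worlds rather than relying on linearity of expectation. This is a hedged sketch: the function names are hypothetical, the Brier score is used only as one example of a strictly proper rule, and $n = 3$ is an arbitrary choice.

```python
from itertools import product

def brier(q, v):
    return (q - v) ** 2

def total_in_experiment(toss, p, s):
    """Summed inaccuracy within one experiment: one awakening on Heads, two on Tails."""
    return s(p, 1) if toss == "H" else 2 * s(p, 0)

def mean_in_experiment(toss, p, s):
    """Mean inaccuracy within one experiment: the two Tails awakenings average to s(p, 0)."""
    return s(p, 1) if toss == "H" else s(p, 0)

def expected_aggregate(p, n, per_experiment, s=brier):
    """Average, over all 2**n equiprobable worlds, of the per-experiment values summed."""
    worlds = product("HT", repeat=n)
    return sum(sum(per_experiment(t, p, s) for t in w) for w in worlds) / 2 ** n

def best_p(n, per_experiment, steps=600):
    """Grid-search the credence p in (0, 1) minimizing the expected aggregate."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda p: expected_aggregate(p, n, per_experiment))

# With n = 3, the minimizers should be close to 1/3 and 1/2 respectively.
print(best_p(3, total_in_experiment), best_p(3, mean_in_experiment))
```

The enumeration is exponential in $n$ , so it is only practical as a sanity check for small $n$ .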

Minimizing the Expected Average Inaccuracy across Awakenings

A third aggregation option is to arithmetically average the inaccuracies of all awakenings' credences of interest ( $c_{i} (H_{i}) = p$ in the $i^{th}$ experiment) in a possible world. For $w_{12}$ , this approach yields an average inaccuracy of $[s (p, 1) + 2 s (p, 0)] / 3$ . Note that minimizing the expected value of this average is not equivalent to minimizing the expected total inaccuracy, as the denominator (the number of awakenings) varies across worlds. In fact, in this case the expected inaccuracy can no longer be decomposed across experiments and we need a different approach.

Given an $n$ and a $p$ , the average inaccuracy of the credences of interest in a possible world depends only on how many experiments have one awakening and how many have two. For any world where there are exactly $k$ experiments with two awakenings (Tails), there are $2 k$ awakenings occurring in pairs in the same experiment, each yielding inaccuracy $s (p, 0)$ for the credence of interest. In each of the remaining $n - k$ experiments, there is a single awakening, and the inaccuracy of the credence of interest is $s (p, 1)$ . In other words, the aggregate inaccuracy $I$ can be written using the random variable $T : W \to \mathbb{R}$ that maps worlds to the number of experiments with two awakenings:

$I = \frac{(n - T) s (p, 1) + 2 T s (p, 0)}{n + T}$

Note that $T$ follows a binomial distribution with $n$ trials and probability $1 / 2$ . That is, according to $c_{0}$ , the probability that there are exactly $T = k$ experiments with two awakenings (coin tosses landing Tails) is $\frac{1}{2^{n}} \binom{n}{k}$ . Therefore, the expected inaccuracy is given by:

$E_{c_{0}} [I] = \sum_{k = 0}^{n} \frac{1}{2^{n}} \binom{n}{k} \frac{(n - k) s (p, 1) + 2 k s (p, 0)}{n + k}$

Replacing $n$ by $1$ , we have an expression minimized only at $p = 1 / 2$ :

$\frac{1}{2} s (p, 1) + \frac{1}{2} s (p, 0)$

This is no surprise: when $n = 1$ , averaging across awakenings coincides with taking the mean inaccuracy per experiment. For $n = 2$ , the expected inaccuracy is:

$\frac{5}{12} s (p, 1) + \frac{7}{12} s (p, 0)$

That is, only $p = 5 / 12$ minimizes the expected inaccuracy. This aligns with the result from Bostrom's hybrid model.
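
The coefficients for small $n$ can be reproduced exactly. In the expected average inaccuracy, the coefficient of $s (p, 1)$ is $\sum_{k} \frac{1}{2^{n}} \binom{n}{k} \frac{n - k}{n + k}$ and the coefficient of $s (p, 0)$ is its complement, since $(n - k) + 2 k = n + k$ . By strict propriety, the optimal $p$ equals the coefficient of $s (p, 1)$ . The sketch below computes it with exact rational arithmetic; the function name `heads_weight` is an assumption made for this example.

```python
from fractions import Fraction
from math import comb

def heads_weight(n):
    """Exact coefficient of s(p, 1) in the expected average inaccuracy across awakenings.

    This is sum_k C(n, k) / 2**n * (n - k) / (n + k); the coefficient of s(p, 0)
    is one minus this value, so a strictly proper rule is minimized at p = heads_weight(n).
    """
    return sum(
        Fraction(comb(n, k), 2 ** n) * Fraction(n - k, n + k)
        for k in range(n + 1)
    )

for n in (1, 2, 3):
    w = heads_weight(n)
    print(n, w, 1 - w)  # n = 2 gives the 5/12 and 7/12 of the text
```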

One can see that the optimal $p$ decreases as $n$ grows (from $1 / 2$ at $n = 1$ to $5 / 12$ at $n = 2$ ) and ask what happens in the limit. Informally, we can argue that the binomial random variable $T$ concentrates around $n / 2$ as $n$ tends to infinity^[9]. Replacing $T$ by $n / 2$ in the expression for $I$ , we obtain:

$\frac{(n - \frac{n}{2}) s (p, 1) + 2 \frac{n}{2} s (p, 0)}{n + \frac{n}{2}} = \frac{1}{3} s (p, 1) + \frac{2}{3} s (p, 0)$

That is, when exactly half of the experiments have two awakenings, $p = 1 / 3$ is optimal. The limit when $n$ tends to infinity can indeed be formally proven^[10], so we can write:

$\lim_{n \to \infty} E_{c_{0}} [I] = \lim_{n \to \infty} \sum_{k = 0}^{n} \frac{1}{2^{n}} \binom{n}{k} \frac{(n - k) s (p, 1) + 2 k s (p, 0)}{n + k} = \frac{1}{3} s (p, 1) + \frac{2}{3} s (p, 0)$

Consequently, when $n \to \infty$ , only $p = 1 / 3$ minimizes the expected average inaccuracy across awakenings. Again, this result aligns with Bostrom's hybrid model.
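
The limiting behaviour can also be observed numerically. The following sketch evaluates the optimal credence, that is, the coefficient of $s (p, 1)$ in the expected average inaccuracy, for increasing $n$ using floating-point arithmetic. The function name `optimal_p` and the sample values of $n$ are assumptions for illustration, and the output only suggests the limit; it is not a substitute for the proof.

```python
from math import comb

def optimal_p(n):
    """Coefficient of s(p, 1): the expectation of (n - T) / (n + T) for T ~ Binomial(n, 1/2)."""
    total = 2 ** n
    return sum(comb(n, k) / total * (n - k) / (n + k) for k in range(n + 1))

# The values start at 1/2 and shrink toward 1/3 as n grows.
for n in (1, 2, 10, 100, 1000):
    print(n, optimal_p(n))
```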

Concluding Remarks

For the repeated Sleeping Beauty problem, we presented three ways of aggregating the inaccuracies of credences assigned in different, indistinguishable awakenings in a given possible world. Minimizing total inaccuracy supports the thirders' position. Considering first the mean inaccuracy per experiment, then minimizing their sum or average, leads to the halfers' solution, where there is no update on the credences (as when conditionalizing on a tautology). Taking the average across awakenings agrees with Bostrom's hybrid model, which is a double-halfer solution^[11]. Although these results by themselves do not point to a solution, they give another battleground for the dispute, where maximizing accuracy (epistemic utility) is the agent's goal.

^[1] Interesting discussion on the problem can be found, for instance, here, here or here.
^[2] Our repeated version is equivalent to Bostrom's N-fold version, being different from the Repeated (and improved) Sleeping Beauty problem.
^[3] Note that each $c_{i}$ , for $i \geq 1$ , does not necessarily refer to a unique point in time, but possibly to a pair of instants, when SB is awakened during the $i^{th}$ experiment. As these instants are indistinguishable for her, we assume the corresponding credence functions are all equal.
^[4] In our setup, SB does not know which experiment she is in; upon each awakening in the $i^{th}$ experiment, she can think of $H_{i}$ as "The coin toss landed Heads for this experiment".
^[5] A scoring rule $s$ is strictly proper if $p s (x, 1) + (1 - p) s (x, 0)$ is minimized only at $x = p$ .
^[6] Defined via $s (q, v) = (q - v)^{2}$ . For instance, with $n = 1$ , if Sleeping Beauty assigns $c_{1} (H_{1}) = 0.4$ upon awakening, but the coin lands Tails (the proposition $H_{1}$ is false), the Brier score yields $s (0.4, 0) = 0.4^{2} = 0.16$ .
^[7] There are infinitely many choices for this aggregation. For instance, one can consider the inaccuracies on Monday and Tuesday as components of a vector $(s_{M}, s_{T}) \in \mathbb{R}^{2}$ and then pick a norm (e.g. the Euclidean) in the corresponding vector space.
^[8] We could equivalently consider the average of the means, but, as $n$ is fixed across all worlds, this does not change the $p$ minimizing the aggregate inaccuracy.
^[9] Bostrom employs this informal argument to show that his hybrid model yields optimal $p = 1 / 3$ for large $n$ .
^[10] A proof sketch is given in the paper draft.
^[11] Another double-halfer solution can be found here.

2 comments

Comments sorted by top scores.

comment by SMK (Sylvester Kollin) · 2025-02-11T12:45:19.963Z · LW(p) · GW(p)

I think it would be good if you made clear in the abstract what your contributions to the literature are, and how your results relate to those of e.g. Kierland and Monton (2005).

Replies from: glauberdebona

↑ comment by glauberdebona · 2025-02-11T14:09:46.182Z · LW(p) · GW(p)

Thanks for the feedback! The summary here and the abstract in the draft paper have been updated; I hope it is clearer now.
