I think the temptation is very strong to notice the distinction between the elemental nature of raw sensory inputs and the cognitive significance they bear. And it is useful to do so, precisely to the extent that cognitive significance varies with context and background knowledge--light levels, perspective, etc.--because those serve as dynamically updated calibrations of cognitive significance. But these calibrations become transparent with use, so that we see, hear and feel vividly and directly in three dimensions because we have learned that that is the cognitive significance of what we see, hear, feel and navigate through. Subjective experience comes cooked and raw in the same dish. It then takes an analytic effort of abstraction--a painter's eye--to notice that it takes an elliptical shape on a focal plane to induce the visual experience of a round coin on a tabletop. Thus ambiguities, ambivalences and confusions abound about what constitutes the contents of subjective experience.
I'm reminded of an experiment I read about quite some time ago in a very old Scientific American I think, in which (IIRC) psychology subjects were fitted with goggles containing prisms that flipped their visual fields upside down. They wore them for upwards of a month during all waking hours. When they first put them on, they could barely walk at all without collapsing in a heap because of the severe navigational difficulties. After some time, the visuomotor circuits in their brains adapted, and some were even able to re-learn how to ride a bike with the goggles on. After they could navigate their world more or less normally, they were asked whether at any time their visual field ever "flipped over" so that things started looking "right side up" again. No, there was no change, things looked the same as when they first put the goggles on. So then things still looked "upside down"? After a while, the subjects started insisting that the question made no sense, and they didn't know how to answer it. Nothing changed about their visual fields, they just got used to it and could successfully navigate in it; the effect became transparent.
(Until they took the goggles off after the experiment ended. And then they were again seriously disoriented for a time, though they recovered quickly.)
I'm what David Chalmers would call a "Type-A materialist" which means that I deny the existence of "subjective facts" which aren't in some way reducible to objective facts.
The concerns Chalmers wrote about focused on the nature of phenomenal experience, and the traditional dichotomy between subjective and objective in human experience. That distinction draws a dividing line way off to the side of what I'm interested in. My main concern isn't with ineffable consciousness, it's with cognitive processing of information, information defined as that which distinguishes possibilities, reduces uncertainty and can have behavioral consequences. Consequences for what/whom? Situated epistemic agents, which I take as ubiquitous constituents of the world around us, and not just sentient life-forms like ourselves. Situated agents that process information don't need to be very high on the computational hierarchy in order to be able to interact with the world as it is, use representations of the world as they take it to be, and entertain possibilities about how well their representations conform to what they are intended to represent. The old 128MB 286 I had in the corner that was too underpowered to run even a current version of Linux was powerful enough to implement an instantiation of a situated Bayesian agent. I'm completely fine with stipulating that it had about as much phenomenal or subjective experience as a chunk of pavement. But I think there are useful distinctions totally missed by Chalmers' division (which I'm sure he's aware of, but not concerned with in the paper you cite), between what you might call objective facts and what you might call "subjective facts", if by the latter you include essentially indexical and contextual information, such as de se and de dicto information, as well as de re propositions.
Therefore, I think that centered worlds can be regarded in one of two ways: (i) as nonsense or (ii) as just a peculiar kind of uncentered world: a "centered world" really just means an "uncentered world that happens to contain an ontologically basic, causally inert 'pointer' towards some being and an ontologically basic, causally inert catalogue of its 'mental facts'". However, because a "center" is causally inert, we can never acquire any evidence that the world has a "center".
(On Lewis's account, centered worlds are generalizations of uncentered ones, which are contained in them as special cases.) From the point of view of a situated agent, centered worlds are epistemologically prior, about as patently obvious as the existence of "True", "False" and "Don't Know", and the uncentered worlds are secondary, synthesized, hypothesized and inferred. The process of converting limited indexical information into objective, universally valid knowledge is where all the interesting stuff happens. That's what the very idea of "calibration" is about. As to whether they (centered worlds or the other kind) are ontologically prior, it's just too soon for me to tell, and I feel uncomfortable prejudging the issue on such strict criteria without a more detailed exploration of the territory on the outside of the walled garden of God's Own Library of Eternal Verity. In other words, with respect to that wall, I don't see warrant flowing from inside out, I see it flowing from outside in. I suppose that's in danger of making me an idealist, but I'm trying to be a good empiricist.
The Bayesian calculation only needs to use the event "Tuesday exists"
I can't follow this. If "Tuesday exists" isn't indexical, then it's exactly as true on Monday as it is on Tuesday, and furthermore as true everywhere and for everyone as it is for anyone.
there doesn't seem to be any non-arbitrary way of deriving a distribution over centered worlds from a distribution over uncentered ones.
Indeed, unless you work within the confines of a finite toy model. But why go in that direction? What non-arbitrary reason is there not to start with centered worlds and try to derive a distribution over uncentered ones? In fact, isn't that the direction scientific method works in?
I suppose I'm being obtuse about this, but please help me find my way through this argument.
- The event "it is Monday today" is indexical. However, an "indexical event" isn't strictly speaking an event. (Because an event picks out a set of possible worlds, whereas an indexical event picks out a set of possible "centered worlds".) Since it isn't an event, it makes no sense to treat it as 'data' in a Bayesian calculation.
Isn't this argument confounded by the observation that an indexical event "It is Tuesday today", in the process of ruling out several centered possible worlds--the ones occurring on Monday--also happens to rule out an entire uncentered world? If it's not an event, how does it make sense to treat it as data in a Bayesian calculation that rules out Heads? If that wasn't the event that entered into the Bayesian calculation, what was?
On further reflection, both Ancestor and each Descendant can consider the proposition P(X) = "X is a descendant & X is a lottery winner". Given the setup, Ancestor can quantify over X, and assign probability 1/N to each instance. That's how the statement {"I" will win the lottery with probability 1} is to be read, in conjunction with a particular analysis of personal identity that warrants it. This would be the same proposition each descendant considers, and also assigns probability 1/N to. On this way of looking at it, both Ancestor and each descendant are in the same epistemic state, with respect to the question of who will win the lottery.
Ok, so far so good. This same way of looking at things, and the prediction about probability of descendants, is a way of looking at the Sleeping Beauty problem I tried to explain some months ago, and from what I can see is an argument for why Beauty is able to assert on Sunday evening what the credence of her future selves should be upon awakening (which is different from her own credence on Sunday evening), and therefore has no reason to change it when she later awakens on various occasions. It didn't seem to get much traction then, probably because it was also mixed in with arguments about expected frequencies.
There need be no information transferred.
I didn't quite follow this. From where to where?
But anyway, yes, that's correct that the referents of the two claims aren't the same. This could stand some further clarification as to why. In fact, Descendant's claim makes a direct reference to the individual who uttered it at the moment it's uttered, but Ancestor's claim is not about himself in the same way. As you say, he's attempting to refer to all of his descendants, and on that basis claim identity with whichever particular one of them happens to win the lottery, or not, as the case may be. (As I note above, this is not your usual equivalence relation.) This is an opaque context, and Ancestor's claim fails to refer to a particular individual (and not just because that individual exists only in the future). He can only make a conditional statement: given that X is whoever it is that will win the lottery (or not), the probability that that person will win the lottery (or not) is trivial. He lacks something that allows him to refer to Descendant outside the scope of the quantifier. Descendant does not lack this, he has what Ancestor did not have--the wherewithal to refer to himself as a definite individual, because he is that individual at the time of the reference.
But a puzzle remains. On this account, Ancestor has no credence that Descendant will win the lottery, because he doesn't have the means to correctly formulate the proposition in which he is to assert a credence, except from inside the scope of a universal quantifier. Descendant does have the means, can formulate the proposition (a de se proposition), and can now assert a credence in it based on his understanding of his situation with respect to the facts he knows. And the puzzle is, Descendant's epistemic state is certainly different from Ancestor's, but it seems it didn't happen through Bayesian updating. Meanwhile, there is an event that Descendant witnessed that served to narrow the set of possible worlds he situates himself in (namely, that he is now numerically distinct from any of the other descendants), but, so the argument goes, this doesn't count as any kind of evidence of anything. It seems to me the basis for requiring diachronic consistency is in trouble.
I don't think personal identity is a mathematical equivalence relation. Specifically, it's not symmetric: "I'm the same person you met yesterday" actually needs to read "I was the same person you met yesterday"; "I will be the same person tomorrow" is a prediction that may fail (even assuming I survive that long). This yields failures of transitivity: "Y is the same person as X" and "Z is the same person as X" doesn't get you "Y is the same person as Z".
Given that you know there will be a future stage of you that will win the lottery how can that copy (the copy that is the future stage of you that has won the lottery) be surprised?
It's not the ancestor--he who is certain to have a descendant that wins the lottery--who wins the lottery, it's that one descendant of him who wins it, and not his other one(s). Once a descendant realizes he is just one of the many copies, he then becomes uncertain whether he is the one who will win the lottery, so will be surprised when he learns whether he is. I think the interesting questions here are
1) Consider the epistemic state of the ancestor. He believes he is certain to win the lottery. There is an argument that he's justified in believing this.
2) Now consider the epistemic state of a descendant, immediately after discovering that he is one of several duplicates, but before he learns anything about which one. There is some sense in which his (the descendant's) uncertainty about whether he (the descendant) will win the lottery has changed from what it was in 1). Aside: in a Bayesian framework, this means having received some information, some evidence on which to update. But the only plausible candidate in sight is the knowledge that he is now just one particular one of the duplicates, not the ancestor anymore (e.g., because he has just awoken from the procedure). But of course, he knew that was going to happen with certainty before, so some deny that he learns anything at all. This seems directly analogous to Sleeping Beauty's predicament.
3) Descendant now learns whether he's the one who's won the lottery. Descendant could not have claimed that with certainty before, so he definitely does receive new information, and updates accordingly (all of them do). There is some sense in which the information received at this point exactly cancels out the information(?) in 2).
A couple points:
Of course, Bayesians can't revise certain knowledge, so the standard analysis gets stuck on square 1. But I don't see that the story changes in any significant way if we substitute "reasonable certainty(epsilon)" throughout, so I'm happy to stipulate if necessary.
Bayesians have a problem with de se information: "I am here now". The standard framework on which Bayes' Theorem holds deals with de re information. De se and de dicto statements have to be converted into de re statements before they can be processed as evidence. This has to be done via various calibrations that adequately disambiguate possibilities and interpret contexts and occasions: who am I, what time is it, and where am I. This process is often taken for granted, because it usually happens transparently and without error. Except when it doesn't.
I may need to be providing a more extensive philosophical context about personal identity for this to make sense, I'm not sure.
I hope you do.
I wonder if this can stand in for the Maher?
Did I accuse someone of being incoherent? I didn't mean to do that, I only meant to accuse myself of not being able to follow the distinction between a rule of logic (oh, take the Rule of Detachment for instance) and a syntactic elimination rule. In virtue of what do the latter escape the quantum of sceptical doubt that we should apply to other tautologies? I think there clearly is a distinction between believing a rule of logic is reliable for a particular domain, and knowing with the same confidence that a particular instance of its application has been correctly executed. But I can't tell from the discussion if that's what's at play here, or if it is, whether it's being deployed in a manner careful enough to avoid incoherence. I just can't tell yet. For instance,
Conditioning on this tiny credence would produce various null implications in my reasoning process, which end up being discarded as incoherent
I don't know what this amounts to without following a more detailed example.
It all seems to be somewhat vaguely along the lines of what Hartry Field says in his Locke lectures about rational revisability of the rules of logic and/or epistemic principles; his arguments are much more detailed, but I confess I have difficulty following him too.
Ah, thanks for the pointer. Someone's tried to answer the question about the reliability of Bayes' Theorem itself too I see. But I'm afraid I'm going to have to pass on this, because I don't see how calling something a syntactic elimination rule instead of a law of logic saves you from incoherence.
Probabilities of 1 and 0 are considered rule violations and discarded.
What should we take for P(X|X) then?
And then what can I put you down for the probability that Bayes' Theorem is actually false? (I mean the theorem itself, not any particular deployment of it in an argument.)
We're getting ahead of the reading, but there's a key distinction between the plausibility of a single proposition (i.e. a probability) and the plausibilities of a whole family of related plausibilities (i.e. a probability distribution).
Ok, that sounds helpful. But then my question is this--if we have a whole family of mutually exclusive propositions, with varying real numbers for plausibilities, about the plausibility of one particular proposition, then the assumption that that one proposition can have one specific real number as its plausibility is cast in doubt. I don't yet see how we can have all those plausibility assignments in a coherent whole. But I'm happy to leave my question on the table if we'll come to that part later.
Perhaps it would be wiser to use complex numbers for instance.
Perhaps it might be wiser to use measures (distributions), or measures on spaces of measures, or iterate that construction indefinitely. (The concept of hyperpriors seems to go in this direction, for example.)
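As a minimal sketch of that direction: the two Beta distributions below are my own illustrative choices, standing in for very different states of knowledge about an unknown chance that nevertheless share the same point estimate of 1/2.

```python
# Sketch: a distribution over the chance parameter (a simple hyperprior)
# distinguishes a well-calibrated 1/2 from an ignorant 1/2.
# Beta(a, b): mean = a/(a+b), variance = ab / ((a+b)^2 (a+b+1)).
def beta_mean_var(a, b):
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

coin = beta_mean_var(500, 500)  # heavy evidence, sharply peaked at 1/2
urn = beta_mean_var(1, 1)       # no evidence: uniform over [0, 1]

# Same point estimate, very different spreads:
assert abs(coin[0] - 0.5) < 1e-12 and abs(urn[0] - 0.5) < 1e-12
assert coin[1] < urn[1]
```

The single real number 1/2 discards exactly the difference the two variances capture.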
But intuitively it seems very likely that if you tell me two different propositions, that I can say either that one is more likely than the other, or that they are the same. Are there any special cases where one has to answer "the probabilities are uncomparable" that makes you doubt that it is so?
Consider the following propositions.
P1: The recently minted U.S. quarter I just vigorously flipped into the air landed heads on the floor.
P2: A ball pulled from an unspecified urn containing an unspecified number of balls is white.
P3(x): The probability of P2 is x
Part of the problem is the laxness in specifying the language, as I mentioned. For example, if the language we use is rich enough to support self-referring interpretations, then it may not even be possible to coherently assign a truth value--or any probability--or to know whether that is possible.
But even ruling out Goedelian potholes in the landscape and uncountably infinite families of propositions, the contrast between P1 and P2 is problematic. P1 is backed up by a vast trove of background knowledge and evidence, and our confidence in asserting Prob(P1) = 1/2 is very strong. On the other hand, background knowledge and evidence about P2 is virtually nil. It is reasonable as a matter of customary usage to assume the number of balls in the urn is finite, and thus the probability of P2 is a rational number, but until you start adding in more assumptions and evidence, one's confidence in Prob(P2) < x for any particular real number x seems typically to be very much lower than for P1. Summarizing one's state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot that we know about the relative state of knowledge vs. ignorance with respect to P1 and P2. An awful lot of knowledge is being jettisoned because it won't fit into this scheme of definite real numbers. To make the claim Prob(P2) = 1/2 (or any other definite real number you want to name) just does not seem like the same kind of thing as the claim Prob(P1) = 1/2. It feels like a category mistake.
Jaynes addresses this to some degree in Appendix A4 "Comparative Probability". He presents an argument that seems to go like this. It hardly matters very much what real number we use to start with for a statement without much background evidence, because the more evidence we accumulate, the more our assignments are coordinated with other statements into a comprehensive picture, and the probabilities eventually converge to true and correct values. That's a heartening way to look at it, but it also goes to show that many of the assignments of specific real numbers we make, such as for P2 or P3, are largely irrelevancies that are right next door to meaningless. And in the end he reiterates his initial argument that the benefits of being able to have a real number to calculate with are irresistible. This comes at the price of helping ourselves to the illusion of more precision than our state of ignorance seems to entitle us to. This is why the axiom of comparability seems to me to make an unnatural correspondence to the way we could or should think about these things.
- Can you think of further desiderata for plausible inference, or find issues with the one Jaynes lays out?
I find desideratum 1) to be poorly motivated, and a bit problematic. This is urged upon us in Chapter 1 mainly by considerations of convenience: a reasoning robot can't calculate without numbers. But just because a calculator can't calculate without numbers doesn't seem a sufficient justification to assume those numbers exist, i.e., that a full and coherent mapping from statements to plausibilities exists. This doesn't seem the kind of thing we can assume is possible, it's the kind of thing we need to investigate to see if it's possible.
This of course will depend on what class of statements we'll allow into our language. I can see two ways forward on this: 1) we can assume that we have a language of statements for which desideratum 1) is true. But then we need to understand what restrictions we've placed on the kinds of statements that can have numerical plausibilities. Or 2) we can pick a language that we want to use to talk about the world, and then investigate whether desideratum 1) can be satisfied by that language. I don't see that this issue is touched on in Chapter 1.
There is further discussion of this in Appendix C; will this be discussed in connection with Chapter 1, or at some later time in the sequence? For example, in Appendix C, it turns out that desideratum 1 subdivides into two other axioms: transitivity, and universal comparability. The first one makes sense, but the second one doesn't seem as compelling to me.
Perhaps this is beating a dead horse, but here goes. Regarding your two variants:
1 Same as SSB except If heads, she is interviewed on Monday, and then the coin is turned over to tails and she is interviewed on Tuesday. There is amnesia and all of that. So, it's either the sequence (heads on Monday, tails on Tuesday) or (tails on Monday, tails on Tuesday). Each sequence has a 50% probability, and she should think of the days within a sequence as being equally likely. She's asked about the current state of the coin. She should answer P(H)=1/4.
I agree. When iterated indefinitely, the Markov chain transition matrix is:
[ 0 1 0 0 ]
[ 1/2 0 1/2 0 ]
[ 0 0 0 1 ]
[ 1/2 0 1/2 0 ]
acting on state vector [ H1 H2 T1 T2 ], where H,T are coin toss outcomes and 1,2 label Monday,Tuesday. This has probability eigenvector [ 1/4 1/4 1/4 1/4 ]; 3 out of 4 states show Tails (as opposed to the coin having been tossed Tails). By the way, we have unbiased sampling of the coin toss outcomes here.
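As a numerical sanity check (a sketch only, using the state ordering given above), the claimed eigenvector can be verified directly:

```python
import numpy as np

# Variant 1 chain over [H1, H2, T1, T2]; the row vector acts on the left.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],   # H1 -> H2 (coin turned over to tails)
    [0.5, 0.0, 0.5, 0.0],   # H2 -> fresh toss: H1 or T1
    [0.0, 0.0, 0.0, 1.0],   # T1 -> T2
    [0.5, 0.0, 0.5, 0.0],   # T2 -> fresh toss: H1 or T1
])

pi = np.full(4, 0.25)
assert np.allclose(pi @ P, pi)   # [1/4 1/4 1/4 1/4] is stationary
# The coin shows Tails in states H2, T1 and T2, so P(showing Heads) = 1/4.
```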
If the Markov chain model isn't persuasive, the alternative calculation is to look at the branching probability diagram
[http://entity.users.sonic.net/img/lesswrong/sbv1tree.png (SB variant 1)]
and compute the expected frequencies of letters in the result strings at each leaf on Wednesdays. This is
0.5 * ( H + T ) + 0.5 * ( T + T ) = 0.5 * H + 1.5 * T.
2 Same as SSB except If heads, she is interviewed on Monday, and then the coin is flipped again and she is interviewed on Tuesday. There is amnesia and all of that. So, it's either the sequence (heads on Monday, tails on Tuesday), (heads on Monday, heads on Tuesday) or (tails on Monday, tails on Tuesday). The first 2 sequences have a 25% chance each and the last one has a 50% chance. When asked about the current state of the coin, she should say P(H)=3/8
I agree. Monday-Tuesday sequences occur with the following probabilities:
HH: 1/4
HT: 1/4
TT: 1/2
Also, the Markov chain model for the iterated process agrees:
[ 0 1/2 0 1/2 ]
[ 1/2 0 1/2 0 ]
[ 0 0 0 1 ]
[ 1/2 0 1/2 0 ]
acting on state vector [ H1 H2 T1 T2 ] gives probability eigenvector [ 1/4 1/8 1/4 3/8 ]
Alternatively, use the branching probability diagram
[http://entity.users.sonic.net/img/lesswrong/sbv2tree.png (SB variant 2)]
to compute expected frequencies of letters in the result strings,
0.25 * ( H + H ) + 0.25 * ( H + T ) + 0.5 * ( T + T ) = 0.75 * H + 1.25 * T
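The same sanity check works for variant 2 (again a sketch, with my reading of the state labels as the coin's current face):

```python
import numpy as np

# Variant 2 chain over [H1, H2, T1, T2].
P = np.array([
    [0.0, 0.5, 0.0, 0.5],   # H1 -> Tuesday re-toss: H2 or T2
    [0.5, 0.0, 0.5, 0.0],   # H2 -> next cycle's toss: H1 or T1
    [0.0, 0.0, 0.0, 1.0],   # T1 -> T2
    [0.5, 0.0, 0.5, 0.0],   # T2 -> next cycle's toss: H1 or T1
])

pi = np.array([1/4, 1/8, 1/4, 3/8])
assert np.allclose(pi @ P, pi)          # the claimed eigenvector is stationary
assert np.isclose(pi[0] + pi[1], 3/8)   # P(coin currently heads) = 3/8
```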
Because of the extra coin toss on Tuesday after Monday Heads, these are biased observations of coin tosses. (Are these credences?) But neither of these two variants is equivalent to Standard Sleeping Beauty or its iterated variants ISB and ICSB.
The 1/2 solution to SSB results from similar reasoning. 50% chance for the sequence (Monday and heads). 50% chance for the sequence (Monday and tails, Tuesday and tails). P(H)=1/2
(Sigh). I don't think your branching probability diagram is correct. I don't know what other reasoning you are using. This is the diagram I have for Standard Sleeping Beauty
[http://entity.users.sonic.net/img/lesswrong/ssbtree.png (Standard SB)]
And this is how I use it, using exactly the same method as in the two examples above. With probability 1/2 the process accumulates 2 Tails observations per week, and with probability 1/2 accumulates 1 Heads observation. The expected number of observations per week is 1.5, the expected number of Heads observations per week is 0.5, the expected number of Tails observations is 1 per week.
0.5 * ( H ) + 0.5 * ( T + T ) = 0.5 * H + 1.0 * T
Likewise when we record Monday/Tuesday observations per week instead of Heads/Tails, the expected number of Monday observations is 1, expected Tuesday observations 0.5, for a total of 1.5. But in both of your variants above, the expected number of Monday observations = expected number of Tuesday observations = 1.
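The per-week bookkeeping above is simple enough to encode directly; this sketch just tabulates the two equally likely weeks of Standard Sleeping Beauty:

```python
# Expected per-week observation counts for Standard Sleeping Beauty.
# A heads-week yields one interview (Monday); a tails-week yields two.
weeks = [
    ("H", ["Mon"]),          # heads: interviewed Monday only
    ("T", ["Mon", "Tue"]),   # tails: interviewed Monday and Tuesday
]
p = 0.5  # fair coin

exp_heads = sum(p * len(days) for side, days in weeks if side == "H")
exp_tails = sum(p * len(days) for side, days in weeks if side == "T")
exp_monday = sum(p * days.count("Mon") for _, days in weeks)
exp_tuesday = sum(p * days.count("Tue") for _, days in weeks)

assert (exp_heads, exp_tails) == (0.5, 1.0)     # 0.5 * H + 1.0 * T
assert (exp_monday, exp_tuesday) == (1.0, 0.5)  # vs. 1.0 each in both variants
```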
Thanks for your response. I should have been clearer in my terminology. By "Iterated Sleeping Beauty" (ISB) I meant to name the variant that we here have been discussing for some time, that repeats the Standard Sleeping Beauty problem some number, say 1000, of times. In 1000 coin tosses over 1000 weeks, the number of Heads awakenings is 1000 and the number of Tails awakenings is 2000. I have no catchy name for the variant I proposed, but I can make up an ugly one if nothing better comes to mind; it could be called Iterated Condensed Sleeping Beauty (ICSB). But I'll assume you meant this particular variant of mine when you mention ISB.
You say
Q3. ISB is different from SSB as follows: more than one coin toss; same number of interviews regardless of result of coin toss
"More than one coin toss" is the iterated part. As far as I can see, and I've argued it a couple times now, there's no essential difference between SSB and ISB, so I meant to draw a comparison between my variant and ISB.
"Same number of interviews regardless of result of coin toss" isn't correct. Sorry if I was unclear in my description. Beauty is interviewed once per toss when Heads, twice when Tails. This is the same in ICSB as in Standard and Iterated Sleeping Beauty. Is there an important difference between Standard Sleeping Beauty and Iterated Sleeping Beauty, or is there an important difference between Iterated Sleeping Beauty and Iterated Condensed Sleeping Beauty?
Q4. It makes a big difference. She has different information to condition on. On a given coin flip, the probability of heads is 1/2. But, if it is tails we skip a day before flipping again. Once she has been woken up a large number of times, Beauty can easily calculate how likely it is that heads was the most recent result of a coin flip.
We not only skip a day before tossing again, we interview on that day too! I see how over time Beauty gains evidence corroborating the fairness of the coin (that's exactly my later rhetorical question), but assuming it's a fair coin, and barring Type I errors, she'll never see evidence to change her initial credence in that proposition. In view of this, can you explain how she can use this information to predict with better than initial accuracy the likelihood that Heads was the most recent outcome of the toss? I don't see how.
In SSB, Tuesday&heads doesn't exist, for example.
After relabeling Monday and Tuesday to Day 1 and Day 2 following the coin toss, Tuesday&Heads (H2) exists in none of these variants. So what difference is there?
Q1: I agree with you: 1/3, 1/3, 2/3
Good and well, but--are these legitimate credences? If not, why not? And if so, why aren't they also in the following:
Iterated Sleeping Beauty is isomorphic to the following Markov chain, which just subdivides the Tails state in my condensed variant into Day 1 and Day 2:
[1/2, 1/2, 0]
[0, 0, 1]
[1/2, 1/2, 0]
operating on row vector of states [ Heads&Day1 Tails&Day1 Tails&Day2 ], abbreviated to [ H1 T1 T2 ]
When I say isomorphic, I mean the distinct observable states of affairs are the same, and the possible histories of transitions from awakening to next awakening are governed by the same transition probabilities.
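A quick numerical check of that three-state chain (a sketch, using the state ordering given above):

```python
import numpy as np

# The claimed ISB chain over [H1, T1, T2] (toss outcome x day after toss).
P = np.array([
    [0.5, 0.5, 0.0],   # H1 -> new toss: H1 or T1
    [0.0, 0.0, 1.0],   # T1 -> T2
    [0.5, 0.5, 0.0],   # T2 -> new toss: H1 or T1
])

pi = np.full(3, 1/3)
assert np.allclose(pi @ P, pi)   # uniform over the three awakening-states
# Hence P(H1) = 1/3 across awakenings, and P(T1 or T2) = 2/3.
```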
So either there's a reason why my 2-state Markov chain correctly models my condensed variant that allows you to accept the 1/3 answers it computes, that doesn't apply to the three-state Markov chain and its 1/3 answers (perhaps you came to those answers independently of my model), or else there's some reason why the three-state Markov chain doesn't correctly model the Iterated Sleeping Beauty process. Can you help me see where the difficulty may lie?
I'm struggling to see how ISB isn't different from SSB in meaningful ways.
I assume you are referring to my variant, not what I'm calling Iterated Sleeping Beauty. If so, I'm kind of baffled by this statement, because under similarities, you just listed
1) fair coin
2) woken twice if Tails, once if Heads
3) epistemic state reset each day
With the emendation that 2) is per coin toss, and in 3) "each day" = "each awakening", you have just listed three essential features that SSB, ISB and ICSB all have in common. It's exactly those three things that define the SSB problem. I'm claiming that there aren't any others. If you disagree, then please tell me what they are. Or if parts of my argument remain unclear, I can try to go into more detail.
Yet one more variant. On my view it's structurally and hence statistically equivalent to Iterated Sleeping Beauty, and I present an argument that it is. This one has the advantage that it does not rely on any science fictional technology. I'm interested to see if anyone can find good reasons why it's not equivalent.
The Iterated Sleeping Beauty problem (ISB) is the original Standard Sleeping Beauty (SSB) problem repeated a large number N of times. People always seem to want to do this anyway with all the variations, to use the Law of Large Numbers to gain insight into what they should do in the single-shot case.
The Setup
- As before, Sleeping Beauty is fully apprised of all the details ahead of time.
- The experiment is run for N consecutive days (N is a large number).
- At midnight 24 hours prior to the start of the experiment, a fair coin is tossed.
- On every subsequent night, if the coin shows Heads, it is tossed again; if it shows Tails, it is turned over to show Heads.
(This process is illustrated by a discrete-time Markov chain with transition matrix:
P = [ 1/2 1/2 ]
    [  1   0  ]
and the state vector is the row
x = [ Heads Tails ],
with consecutive state transitions computed as x * P^k.)
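As a side check (my sketch; the iteration count is chosen only to ensure convergence), the long-run occupancy of this chain can be computed by iterating any starting distribution:

```python
import numpy as np

P = np.array([
    [0.5, 0.5],   # showing Heads: tossed again at night
    [1.0, 0.0],   # showing Tails: turned over to Heads
])

x = np.array([1.0, 0.0])   # any initial distribution converges
for _ in range(60):
    x = x @ P
assert np.allclose(x, [2/3, 1/3])   # the coin shows Heads 2/3 of mornings
```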
Each morning when Sleeping Beauty awakes, she is asked each of the following questions:
- "What is your credence that the most recent coin toss landed Heads?"
- "What is your credence that the coin was tossed last night?"
- "What is your credence that the coin is showing Heads now?"
The first question is the equivalent of the question that is asked in the Standard Sleeping Beauty problem. The second question corresponds to the question "what is your credence that today is Monday?" (which should also be asked and analyzed in any treatment of the Standard Sleeping Beauty problem.)
Note: in this setup, 3) is different from 1) only because of the operation of turning the coin over instead of tossing it. This is just a perhaps too clever mechanism to count down the days (awakenings, actually) to the point when the coin should be tossed again. It may very well make a better example if we never touch the coin except to toss it, and use some other deterministic countdown mechanism to count repeated awakenings per coin toss. That allows easier generalization to the case where the number of days to awaken when Tails is greater than 2. It also makes 3) directly equivalent to the standard SB question, and also 1) and 3) have the same answers. You decide which mechanism is easier to grasp from a didactic point of view, and analyze that one.
- After that, Beauty goes about her daily routine, takes no amnesia drugs, sedulously avoids all matter duplicators and transhuman uploaders, and otherwise lives a normal life, on one condition: she is not allowed to examine the coin or discover its state (or the countdown timer) until the experiment is over.
Analysis
- Q1: How should Beauty answer?
- Q2: How is this scenario similar in key respects to the SSB/ISB scenario?
- Q3: How does this scenario differ in key respects from the SSB/ISB scenario?
- Q4: How would those differences, if any, make a difference to how Beauty should answer?
My answers:
Q1: Her credence that the most recent coin toss landed Heads should be 1/3. Her credence that the coin was tossed last night should be 2/3 (this is the analogue of the thirder's 2/3 credence that today is Monday). Her credence that the coin shows Heads should be 2/3. (Her credence that the coin shows Heads should be 1/3 if we never turn it over, only toss, and 1/(K+1) if the countdown timer counts K awakenings per Tails toss.)
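These per-awakening frequencies can be checked by brute force. A Monte Carlo sketch of one long run, tallying the three questions each morning, under my reading of the nightly toss/turn rule:

```python
import random

random.seed(0)

# Simulate one long run of the iterated experiment. Track, each morning:
#   q1) did the most recent toss land Heads?
#   q2) was the coin tossed (rather than turned over) last night?
#   q3) is the coin showing Heads now?
face = 'H' if random.random() < 0.5 else 'T'   # the pre-experiment midnight toss
last_toss, tossed_last_night = face, True

n = q1 = q2 = q3 = 0
for _ in range(1_000_000):
    # morning: Beauty awakes and is asked the three questions
    n += 1
    q1 += (last_toss == 'H')
    q2 += tossed_last_night
    q3 += (face == 'H')
    # night: toss if the coin shows Heads, otherwise turn Tails over to Heads
    if face == 'H':
        face = 'H' if random.random() < 0.5 else 'T'
        last_toss, tossed_last_night = face, True
    else:
        face, tossed_last_night = 'H', False

print(q1 / n, q2 / n, q3 / n)  # approaches 1/3, 2/3, 2/3
```

Note also that the three morning states (Heads-just-tossed, Tails-just-tossed, Heads-turned-over) each occur with long-run frequency 1/3, mirroring the Heads&Monday, Tails&Monday, Tails&Tuesday awakenings of the standard problem.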
Q2: Note that Beauty's epistemic state regarding the state of the coin, or whether it was tossed the previous midnight, is exactly the same on every morning, but without the use of drugs or other alien technology. She awakens and is asked the questions once every time the coin toss lands Heads, and twice every time it lands tails. In Standard Sleeping Beauty, her epistemic state is reset by the amnesia drugs. In this setup, her epistemic state never needs to be reset because it never changes, simply because she never receives any new information that could change it, including the knowledge of when the coin has been tossed to start a new cycle.
Q3: In ISB, a new experimental cycle is initiated at fixed times--Monday (or Sunday midnight). Here the start of a new "cycle" occurs with random timing. The question arises: does this difference in timing make any difference at the moments of awakening when the question is asked? Changing labels from "Monday" and "Tuesday" to "First Day After Coin Toss" and "Second Day After Coin Toss" respectively makes no structural change to the operation of the process. Discrete-time Markov chains have no timing, only sequence.
In the standard ISB, there seems to be a natural unit of replication: the coin toss on Sunday night followed by whatever happens through the rest of the week. Here, that unit doesn't seem so prominent, though it still exists as a renewal point of the chain. In a recurrent Markov chain, the natural unit of replication seems to be the state transition. Picking a renewal point is also an option, but only as a matter of convenience of calculation; it doesn't change the analysis.
Q4: I don't see how. The events, and the processes which drive their occurrence, haven't changed that I can see, just our perspective in looking at them. What am I overlooking?
Iteration
I didn't tell you yet how N is determined and how the experiment is terminated. Frankly, I don't think it matters all that much as N gets large, but let's remove all ambiguity.
Case A: N is a fixed large number. The experiment is terminated on the first night after the Nth on which the coin shows Heads.
Case B: N is not fixed in advance, but is guaranteed to be larger than some other large fixed number N', such that the coin has been tossed at least N' times. Once N' tosses have been counted, the experiment is terminated on any following night on which the coin shows Heads, at the whim of the Lab Director.
Q5: If N (or N') is large enough, does the difference between Case A and B make a difference to Beauty's credence? (To help sharpen your answer, consider Case C: Beauty dies of natural causes before the experiment terminates.)
Note that in view of the discussion under Q3 above, we are picking some particular state in the transition diagram and thinking about recurrence to and from that state. We could pick any other state too, and the analysis wouldn't change in any significant way. It seems more informative (to me at any rate) to think of this as an ongoing process that converges to stable behavior at equilibrium.
Extra Credit:
This gets right to the heart of what a probability could mean, what things can count as probabilities, and why we care about Sleeping Beauty's credence.
Suppose Beauty is sent daily reports showing cumulative counts of the nightly heads/tails observations. The reports are sufficiently old as not to give any information about the current state of the coin or when it was last tossed. (E.g., the data in the report are from at least two coin tosses ago.) Therefore Beauty's epistemic state about the current state of the coin always remains in its initial/reset state, with the following exception. Discuss how Beauty could use this data to--
- corroborate that the coin is in fact fair as she has been told.
- update her credences, in case she accrues evidence that shows the coin is not fair.
For me this is the main attraction of this particular model of the Sleeping Beauty setup, so I'm very interested in any possible reasons why it's not equivalent.
Two ways to iterate the experiment:
- Replicate the entire experiment 1000 times. That is, there will be 1000 independent tosses of the coin. This will lead to between 1000 and 2000 awakenings, with an expected value of 1500 awakenings.
and
- Replicate her awakening-state 1000 times. Because her epistemic state is always the same on an awakening, from her perspective, it could be Monday or Tuesday, it could be heads or tails.
The distinction between 1 and 2 is that, in 2, we are trying to repeatedly sample from the joint probability distributions that she should have on an awakening. In 1, we are replicating the entire experiment, with the double counting on tails.
This seems a distinction without a difference. The longer the iterated SB process continues, the less important the distinction between counting tosses and counting awakenings becomes. This distinction is only about a stopping criterion, not about the convergence of observations or coin tosses to expected values while the process is ongoing. Considered as an ongoing process of indefinite duration, the expected numbers of tosses and of observations of each type are well-defined, easily computed, and well-behaved with respect to each other. Over the long run, #awakenings accumulates 1.5 times faster than #tosses. Beauty is never more than two awakenings away from starting a new coin toss, so whether you choose to stop as soon as an awakening has completed or to wait until you finish a coin-toss cycle, the relative perturbation in the statistics collected so far goes to zero. Briefly, there is no "natural" unit of replication independent of observer interest.
She knows that it was a fair coin. She knows that if she's awake it's definitely Monday if heads, and could be either Monday or Tuesday if tails. She knows that 50% of coin tosses would end up heads, so we assign 0.5 to Monday&heads.
This would be an error. You are assigning a 50% probability to an observation (that it is Heads&Monday) without taking into account the bias that's built into the process by which Beauty makes observations. Alternatively, if you are uncertain whether Monday is true or not--you know it might be Tuesday--then you should be uncertain that P(Heads)=P(Heads&Monday).
You the outside observer know the chance of observing that the coin lands Heads is 50%. You presumably know this because you have corroborated it through an unbiased observation process: look at the coin exactly once per toss. Once Beauty is put to sleep and awoken, she is no longer an outside observer; she is a participant in a biased observation process, so she should update her expectation about what her observation process will show. Different observation process, different observations, different likelihoods of what she can expect to see.
Of course, as a card-carrying thirder, I'm assuming that the question about credence is about what Beauty is likely to see upon awakening. That's what the carefully constructed wording of the question suggests to me.
She knows that 50% of coin tosses would end up tails,
except that as we agreed, she's not observing coin tosses, she's observing biased samples of coin tosses. The connection between what she observes and the objective behavior of the coin is just what's at issue here, so you can't beg the question.
In 1, people are using these ratios of expected counts to get the 1/3 answer. 1/3 is the correct answer to the question about the long-run frequencies of awakenings preceded by heads to awakenings preceded by tails. But I do not think it is the answer to the question about her credence of heads on an awakening.
Agreed, but for this: it all depends on what you want credence to mean, and what it's good for; see discussion below.
In 2, the joint probabilities are determined ahead of time based on what we know about the experiment.
Let n2 and n3 be counts, in repeated trials, of tails&Monday and tails&Tuesday, respectively. You will of course see that n2=n3. They are the same random variable. tails&Monday and tails&Tuesday are the same.
Let me uphold a distinction that's continually skated over, but which is a crucial point of disagreement here. I think you're mistaking your evidence for the thing evidenced. And you are selectively filtering your evidence, which amounts to throwing away information. Tails&Monday and Tails&Tuesday are not the same; they are distinct observations of the same state of the coin, and thus are perfectly correlated in that regard. Aside from the coin, they observe distinct days of the week, and thus different states of affairs. By a state of affairs I mean the conjunction of all the observable properties of interest at the moment of observation.
It's like what Jack said about types and tokens. It's like Vladimir_Nesov said:
The distinction between types and tokens is only relevant when you want to interpret your tokens as being about something else, their types, rather than about themselves. But types are carved out of observers' interests in their significance, which are non-objective, observer-dependent if anything is. Their variety and fineness of distinction is potentially infinite. As I mentioned above, a state of affairs is a conjunction of observable properties of interest. This Boolean lattice has exactly one top: Everything, and unknown atoms if any at bottom. Where you choose to carve out a distinction between type and token is a matter of observer interest.
Two subsequent states of a given dynamical system make for poor distinct elements of a sample space: when we've observed that the first moment of a given dynamical trajectory is not the second, what are we going to do when we encounter the second one? It's already ruled "impossible"! Thus, Monday and Tuesday under the same circumstances shouldn't be modeled as two different elements of a sample space.
I'll certainly agree it isn't desirable, but oughtn't isn't the same as isn't, and in the Sleeping Beauty problem we have no choice. Monday and Tuesday just are different elements in a sample space, by construction.
if she starts out believing that heads has probability 1/2, but learns something about the coin toss, her probability might go up a little if heads and down a little if tails.
What you seem to be talking about is using evidence that observations provide to corroborate or update Beauty's belief that the coin is in fact fair. Is that a reasonable take? But due to the epistemic reset between awakenings, there is never any usable input to this updating procedure. I've already stipulated this is impossible. This is precisely what the epistemic reset assumption is for. I thought we were getting off this merry-go-round.
Suppose, for example, she is informed of a variable X. If P(heads|X)=P(tails|X), then why is she updating at all? Meaning, why is P(heads)=/=P(heads|X)? This would be unusual. It seems to me that the only reason she changes is because she knows she'd essentially be 'betting' twice on tails, but that really is distinct from credence for tails.
Ok, I guess it depends on what you want the word "credence" to mean, and what you're going to use it for. If you're only interested in some updating process that digests incoming information-theoretic quanta, like you would get if you were trying to corroborate that the coin was indeed a fair one to within a certain standard error, you don't have it here. That's not Sleeping Beauty, that's her faithful but silent, non-memory-impaired lab partner with the log book. If Beauty herself is to have any meaningful notion of credence in Heads, it's pointless for it to be about whether the coin is indeed fair. That's a separate question, which in this context is a boring thing to ask her about, because it's trivially obvious: she's already accepted the information going in that it is fair, and she will never get new information from anywhere regarding that belief. And, while she's undergoing the process of being awoken inside the experimental setup, a value of credence that's not connected to her observations is not useful for any purpose that I can see, other than perhaps to maintain her membership in good standing in the Guild of Rational Bayesian Epistemologists. It doesn't connect to her experience, it doesn't predict frequencies of anything she has any access to; it's gone completely metaphysical. Ok, what else is there to talk about? On my view, the only thing left is Sleeping Beauty's phenomenology when awakened. On Bishop Berkeley's view, that's all you ever have.
Beauty gets usable, useful information (I guess it depends on what you want "information" to mean, too) once, on Sunday evening, and she never forgets it thereafter. This information is separate from, in addition to the information that the coin itself is fair. This other information allows her to make a more accurate prediction about the likelihood that, each time she is awoken, the coin is showing heads. Or whether it's Monday or Tuesday. The information she receives is the details of the sampling process, which has been specifically constructed to give results that are biased with respect to the coin toss itself, and the day of the week. Directly after being informed of the structure of the sampling process, she knows it is biased and therefore ought to update her prediction about what relative frequencies per observation will be of each observable aspect of the possible state of affairs she's awoken into-- Heads vs. Tails, Monday vs. Tuesday.
I think I might understand the interpretation that a halfer puts on the question. I'm just doubtful of its interest or relevance. Do you see any validity (I mean logical coherence, as opposed to wrong-headedness) to this interpretation? Is this just a turf war over who gets to define a coveted word for their purposes?
This sounds like the continuity argument, but I'm not quite clear on how the embedding is supposed to work, can you clarify? Instead of telling me what the experimenter rightly or wrongly believes to be the case, spell out for me how he behaves.
If the coin comes up Heads, there is a tiny but non-zero chance that the experimenter mixes up Monday and Tuesday.
What does this mean operationally? Is there a nonzero chance, let's call it epsilon or e, that the experimenter will incorrectly behave as if it's Tuesday when it's Monday? I.e., with probability e, Beauty is not awoken on Monday, the experiment ends, or is awoken and sent home, and we go on to next Sunday evening without any awakenings that week? Then Heads&Tuesday still with certainty does not occur. So maybe you meant that on Monday, he doesn't awaken Beauty at all, but awakens her on Tuesday instead? Is this confusion persistent across days, or is it a random confusion that happens each time he needs to examine the state of the coin to know what he should do?
And on Tuesday
If the coin comes up Tails, there is a tiny but non-zero chance that the experimenter mixes up Tails and Heads.
So when the coin comes up Tails, there is a nonzero probability, let's call it delta or d, that the experimenter will incorrectly behave as if it's Heads? I.e., on Tuesday morning, he will not awaken Beauty or will wake her and send her home until next Sunday? Then Tails&Tuesday is a possible nonoccurrence.
Your argument is, I take it, that these counts of observations are irrelevant, or at best biased.
No, I was just saying that this, lim N-> infinity n1/(n1+n2+n3), is not actually a probability in the sleeping beauty case.
I maintain that it is. I can guarantee you that it is. What obstacle do you see to accepting that? You've made noises that this is because the counts are correlated, but I haven't seen any argument for this beyond bare assertion. Do you want to claim it is impossible for some reason, or are you just saying you haven't seen a persuasive argument yet?
The disagreement seems to center on the denominator; it should count not awakenings, but coin-tosses.
No, I wouldn't say that. My argument is that you should use probability laws to get the answer. If you take ratios of expected counts, well, you have to show that what you get is actually a probability.
What would you require for proof? If I could show you a Markov chain whose behavior is isomorphic to iterated Sleeping Beauty, would that convince you?
I also am not sure what you mean when you say "use probability laws". Is there a failure to comport with the Kolmogorov axioms? Is there a problem with the definition of the events? Do you mean Bayes' Theorem, or some other law(s)? I also am deeply suspicious of the phrase "get the answer". I will have no idea what this could mean until we can eliminate ambiguity about what the question is (there seems to be a lot of that going around), or what class of questions you'll admit as legitimate.
defining feature of Sleeping Beauty--all Sleeping Beauty's awakenings are epistemically indistinguishable. She has no choice but to treat them all identically.
Hm, I think that is what I'm saying. She does have to treat them all identically. They are the same variable. That's why she has to say the same thing on Monday and Tuesday.
Up to this point, I see we are actually in strenuous agreement on this aspect, so I can stop belaboring it.
That's why an awakening contains no new info. If she had new evidence at an awakening, she'd give different answers under heads and tails.
I don't mean to claim that as soon as Beauty awakes, new evidence comes to light that she can add to her store of bits in additive fashion, and thereby update her credence from 1/2 to 1/3 along the way. If this is the only kind of evidence that your theory of Bayesian updating will acknowledge, then it is too restrictive. Since Beauty is apprised of all the relevant details of the experimental process on Sunday evening, she can (and should) use the fact that the predicted frequency of awakenings into a reset epistemic state is dependent on the state of the coin toss to change the credence she reports on such awakenings from 1/2 to 1/3. She can tell you this on Sunday night, just as I can tell you now, before any of us enter into any such experimental procedure. So her prediction about what she should answer on an awakening does not change from Sunday evening to Monday morning.
The key pieces of information she uses to arrive at this revised estimate are:
- That the questions will be asked in a reset epistemic state. This requires her to give the same answer on all awakenings.
- That the frequency of awakenings is dependent in a specific way on the result of the coin toss. This requires her to update the credence she'll report on awakenings from 1/2 to 1/3.
The 1/3 solution makes the assumption that the probability of heads given an awakening is:
lim N-> infinity n1/(n1+n2+n3)
I'd quibble about calling it an assumption. The 1/3 solution notes that this is the ratio of observations upon awakening of heads to the total number of observations, which is one of the problematic facts about the experimental setup. The 1/3 solution assumes that this is relevant to what we should mean by "credence", and makes an argument that this is a justification for the claim that Sleeping Beauty's credence should be 1/3.
Your argument is, I take it, that these counts of observations are irrelevant, or at best biased. Something else should be counted, or should be counted differently. The disagreement seems to center on the denominator; it should count not awakenings, but coin-tosses. Then there is a difference in the definition of the relevant events and the probabilities that get calculated from them.
- Thirders: An event is an awakening.
- The question asks about # awakenings with heads / total awakenings.
- This ratio is an estimate of a fraction that can be used to predict frequencies of something of interest.
- Halfers: An event is a coin-toss.
- The question asks about # tosses with heads / total tosses.
- This ratio is an estimate of a fraction which is universally agreed to be a probability, and can be used to predict frequencies of something of interest.
Did I get that right? Is this a fair description?
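If that description is fair, the two ratios can be read off the same simulated run of the standard setup. A minimal sketch, counting one awakening per Heads toss and two per Tails toss:

```python
import random

random.seed(1)

heads_awakenings = total_awakenings = 0
heads_tosses = total_tosses = 0

for _ in range(100_000):              # one coin toss per experimental run
    heads = random.random() < 0.5
    total_tosses += 1
    heads_tosses += heads
    awakenings = 1 if heads else 2    # Monday only, or Monday and Tuesday
    total_awakenings += awakenings
    heads_awakenings += awakenings if heads else 0

print(heads_awakenings / total_awakenings)  # thirder ratio, approaches 1/3
print(heads_tosses / total_tosses)          # halfer ratio, approaches 1/2
```

Both ratios are computed from the very same sequence of tosses; the disagreement is entirely over which denominator answers the question Beauty is asked.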
I think a key difference between halfers and thirders is that for thirders, the occurrence of an awakening constitutes evidence of the current state of the system that's being asked about--whether the coin shows heads or tails, because the frequency with which the state of the system is asked about (or, equivalently, an observation is made) is influenced by the current state of the system. To ward off certain objections, it is of no consequence whether this influence is deterministic, probabilistic or mixed in nature, the mere fact that it exists can and should be exploited. I don't think there's disagreement that it exists, but there is over how it's relevant.
Halfers deny that any new evidence becomes available on awakening, because the operation of the process is completely known ahead of time. (Alternatively, if any new evidence could be said to become available, it cannot be exploited.) From what I can tell, and my understanding is surely imperfect, there is some kind of cognitive dissonance about what kinds of things can constitute evidence in some epistemological theory, such that drawing a distinction between the actual occurrence of an event and the knowledge that at least one such event will surely occur is illegitimate for halfers. Is this a fair description?
Suppose half of the population are male and the other half are female. Also, suppose that only females have ovaries.
Suppose I record 3 variables: indicator that the person is male, indicator that the person is female, and indicator that the person has ovaries.
I sample N people, and get counts for those 3 variables of n1, n2 and n3. Given that we recorded a variable for a randomly selected person, is the probability that they are male equal to
lim N->infinity n1/(n1+n2+n3) ?
That's as may be, but it doesn't help Sleeping Beauty in her quandary. If you think this example helps to prove your point, I think it helps to prove the opposite. Although she knows, in this variation, that a randomly selected person will be tested, the random person-selection process is not accessible to her, only the opportunity to know that one of three possible test results has been collected. She knows very well, given a randomly selected person (resp. a coin toss), what the probability is that they are male (resp. that the given coin toss came up Heads). She isn't being asked about that conditional probability. (Or maybe you think she is? Please clarify.) To follow your analogy, upon being awakened, she's informed that a test result has been collected from an unknown person, and now, given that a test result has been collected, what are the chances it came from a male?
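To make the two observation processes concrete, here is a quick sketch under an assumed bookkeeping rule: every positive indicator becomes its own anonymous record, so a male logs one record ('male') and a female logs two ('female' and 'ovaries'), just as a Tails toss logs two awakenings:

```python
import random

random.seed(2)

# Person-level counting vs record-level counting.
males = people = 0
records_from_males = total_records = 0
for _ in range(100_000):
    male = random.random() < 0.5
    people += 1
    males += male
    n_records = 1 if male else 2      # assumed rule: one record per positive indicator
    total_records += n_records
    records_from_males += n_records if male else 0

print(males / people)                      # per-person ratio: approaches 1/2
print(records_from_males / total_records)  # per-record ratio: approaches 1/3
```

Both numbers are legitimate long-run frequencies of the same process; they answer different questions, depending on whether a person or a record is handed to you.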
Clearly the selection process for asking Sleeping Beauty questions is biased. If bias had not been introduced by an extra awakening on Tuesday, the problem would collapse into triviality. The puzzle asks how this sampling bias should affect Sleeping Beauty's calculations of what to answer on awakening, if at all. One of the reasons for doing statistical analysis of sampling schemes is to quantify how the mechanism that's introducing bias changes the expected values of observations. In the SB case, the biased selection process is a mixture of random and deterministic mechanisms. Untangling the random from the deterministic parts is difficult enough for the participants in this discussion-- they can't even agree on a forking path diagram! Untangling it for Sleeping Beauty while she's in the experiment is epistemically impossible. She has no basis whatsoever inside the game for saying, "this one is randomly different from the last one" versus "this one is deterministically identical to the last one, therefore this one doesn't count."
The same considerations apply to the case of the cancer test. Let me elaborate on your scenario to see if I understand it, and let me know if I'm mischaracterizing the test protocol in any material way. There is a test for a disease condition. Every person knows they have a 50% chance going in of testing positive for the disease. We'll stipulate that the repeatability of the test is perfect, though in real life this is achieved only within epsilon of certainty. (Btw, here's where the continuity argument enters in: how crucial is the assumption of absolute certainty versus near certainty? What hinges on that?) In this protocol, if the initial test result is positive, then the test is repeated k times (k=2 or 10, or whatever you deem necessary), either with a new sample or from an aliquot of the original sample, I don't think it matters which. Here the repetition is because of the obstinacy of the head of the test lab and their predilection for amnesia drugs; in real life the reasons would be something like the very high cost in anguish and/or money of a false positive, however unlikely. You, as a recorder of test results, see a certain number of test samples come through the lab. The identities of the samples are encrypted, so your epistemic state with regard to any particular test result is identical to that for any other test sample and its result.
So now the question comes down to this: upon any particular awakening, how is the test subject's epistemic state significantly different from the lab tech's epistemic state regarding any particular test sample? There is a one-to-one correspondence between test samples being evaluated and questions to the patient about their prognosis. Should they give the same answer, or is there a reason why they should give different answers? Just as with the patient, the lab tech knows that any randomly chosen individual has a 50% chance of giving a positive test result, but does she give the same answer to that question as to a different question: given that she has a particular sample in her hands, what is the probability that the person it belongs to will test positive? She knows that she has k times as many samples in her lab that will test positive as otherwise, but she has no way of knowing whether the sample in her hands is an initial sample or a replicate. It seems to me that halfers might be claiming these two questions are the same question, while thirders claim that they are different questions with different answers. Is this a fair description? If not, please clarify.
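For the lab-tech version, the expected-count arithmetic is short enough to write out. A sketch, under my assumption that a negative patient contributes one sample while a positive patient contributes m samples in total (how m relates to the k repeats depends on how the protocol is counted):

```python
# Expected-count sketch: half of patients test negative (one sample each),
# half test positive (m samples each, m being an assumed protocol parameter).
def p_positive_given_sample(m: int) -> float:
    # Out of every two patients, the lab holds m positive samples
    # and 1 negative sample.
    return m / (m + 1)

print(p_positive_given_sample(2))  # m=2 mirrors Sleeping Beauty's two awakenings
```

With m=2 the per-sample probability is 2/3, the exact analogue of the thirder's 2/3 credence in Tails on an awakening; with m=1 (no repeats) the bias vanishes and both questions collapse to 1/2.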
Of course not. It’s lim N->infinity n1/(n1+n2).
Even though n2 and n3 are counts of something different, in a sense, they are really the same variable. Just like Beauty waking up on Monday is the same as Beauty waking up on Tuesday. There is no justification for treating them as separate variables.
What you say is true for any outside observers, and for Sleeping Beauty after the experiment is over and the logbooks analyzed. But while Sleeping Beauty is in the experiment, this option is simply not available to her. The scenario has been carefully constructed to make this so, that's what makes it an interesting problem. The whole point of the amnesia drug in the SB setup (or downloadable avatars, or forking universes, random passersby, whatever) is that she has NO justification nor even a method for NOT treating any of her awakenings as separate variables, because the information that could allow her to do this is unavailable to her. By construction--and this is the defining feature of Sleeping Beauty--all Sleeping Beauty's awakenings are epistemically indistinguishable. She has no choice but to treat them all identically.
This phenomenon is a common occurrence in queueing systems where there's a very definite and well-understood difference between omniscient "outside observers" and epistemically indistinguishable "arriving customers", who can have different values for the probability of observing the system in state X, where the system is executing a well-defined random process, or even a combination random-deterministic process.
I don't follow your latest argument against thirders. You claim that the denominator
#(heads & monday) + #(tails & monday) + #(tails & tuesday)
counts events that are not mutually exclusive. I don't see this. They look mutually exclusive to me--heads is exclusive of tails, and monday is exclusive of tuesday. Could you elaborate this argument? Where does exclusivity fail? Are you saying tails&monday is not distinct from tails&tuesday, or that all three overlap, or something else?
You also assert that the denominator is not determined by n. (I assume by n you mean replications of the SB experiment, where each replication has a randomly varying number of awakenings.) That's true in a way--the particular values that you will see in particular replications will vary, because the denominator is a random variable with a definite distribution (Bernoulli, in fact). But that's not a problem when computing expected values for random processes in general; they often have perfectly definite and easily computed expected values. Are you arguing that this makes the ratio undefined, or problematic in some way? I can tell easily what this ratio converges to, but you won't like it.