Anthropics and the Universal Distribution

post by Joe Carlsmith (joekc) · 2021-11-28T20:35:06.737Z · LW · GW · 8 comments

Contents

  I. Preliminaries
  II. World and claw
  III. Are the world and claw assumptions true?
  IV. What’s up with the ASSA?
  V. Weird SIA
  VI. Soul magnetism
  VII. What’s to like about soul magnets?
  VIII. Soul-magnetism silliness
  IX. Care magnetism
  X. Do we have any clue what Just Doing Solomonoff Induction actually implies?
  XI. Are you more likely to run on thicker wires?
  XII. Which claw?
  XIII. Just guess about what the UD would say?
  XIV. A few other arguments for UDASSA I’ve seen floating around
  XV. Why aren’t I a quark?
  XVI. Wrapping up

(Cross-posted from Hands and Cities. Content warning: especially niche topic.)

Some readers of my recent sequence on anthropics [LW · GW] suggested that I consider an approach that they find especially plausible — namely, UDASSA (or the “Universal Distribution” plus the “Absolute Self-Selection Assumption”). So, partly on this prompting, and partly from pre-existing interest, I spent some time learning about UDASSA, and talking to people who like it.

What is UDASSA? Roughly, UDASSA is anthropics for people who really like the Universal Distribution (UD), which is a way of assigning prior probabilities to hypotheses (see my previous post [LW · GW] for details). UDASSA’s hope is that, having discovered the UD, it has done a lot to solve anthropics, too: all you need to do is to apply the UD (plus some Bayesianism) to hypotheses about who you are (and in particular, which observer-moment you are).

In practice, and granted some further assumptions (what I call the “world and claw assumptions”), this means that UDASSA expects to find itself in “simpler” combinations of (a) worlds, and (b) “locations” in a world (where the relevant notion of “location” is importantly up for grabs, and itself biased towards simplicity). We can think of this move as an extension of Occam’s Razor from objective worlds to centered worlds [LW · GW]: “you know how you wanted to weight hypotheses about the world according to some notion of simplicity? Well, I want to do that with hypotheses about who I am, too.”

Is UDASSA a good approach? I think it’s important to distinguish between two claims:

  1. The UD is cool.
  2. The UD helps a lot with anthropics.

As I indicated in my previous post, I’m open to (1): I think the UD might well be cool, despite various issues (e.g., uncomputability, the arbitrary choice of UTM, ruling out uncomputable worlds, and so on).

But I’m more skeptical of (2). Granted the world and claw assumptions, I think the UD helps with some problems in anthropics: in particular, relative to SIA and SSA, it seems better positioned – at least in principle — to deal with infinities, and with cases that make it unclear how to count observers (some other candidate pluses, like “there are limits to my dogmatism,” seem to me less exciting).

But these benefits come with serious costs. Applying the UD to anthropics leads to unappealing combinations of: (a) extreme disconnection from normal patterns of reasoning (and/or ethical concern), and (b) cluelessness about what conclusions are actually implied.

In particular, my impression is that people tend to assume that UD-ish anthropics entails something that I call “weird SIA.” Weird SIA is supposed to act broadly like SIA, but with the addition of what I call “soul magnetism”: namely, the hypothesis that in a world where multiple people make your observations, you are more likely to be some of those people than others, because some of their “locations” are “simpler.” Soul magnetism is key to the UD’s ability to handle infinities, but it’s also really epistemically (and/or ethically) unappealing in its own right. And to the extent we try to water the UD’s soul-magnetism down (it’s not clear that we can), weird SIA starts to take on more of normal SIA’s presumptuousness problem.

Beyond this, though, it’s not at all clear to me that the UD actually does imply weird SIA – or, indeed, that it implies any of the things advocates seem to hope it implies (for example, that you’re more likely to be running on a computer with thicker wires [LW · GW] – that oh-so-clear intuitive datum). To the contrary, as far as I can tell, we’re basically clueless about what the UD says about anthropics — it all depends on which ways of extracting our observations end up coded for by the shortest programs input to our arbitrary UTM. Maybe we can guess wildly about these programs, but doing so feels to me a lot like just making stuff up.

This isn’t necessarily a reason to reject the UD, but it’s a reason to tread very cautiously in appealing to its object-level verdicts as a reason to feel good about it. And we shouldn’t use our cluelessness about these verdicts to smuggle in whatever anthropic conclusions we were hoping for anyway.

(It’s also unclear to me, on UDASSA, why I am not a quark, or something else much easier to specify in the language of physics than an “observer.” I’m not sure this is an additional objection, though.)

Overall, I think that the UD is plausibly cool (though far from problem-free), and its applications to anthropics worth exploring in more detail. But I don’t see it as any sort of panacea for anthropic gnarly-ness. If anything, it adds to the gnarl.

My thanks to Amanda Askell, Paul Christiano, Tom Davidson, Katja Grace, Jacob Hilton, Evan Hubinger, Buck Shlegeris, Carl Shulman, Bastian Stern, Ben Weinstein-Raun, and Mark Xu for discussion.

I. Preliminaries

Public writing on UDASSA is scant — and writing that sets out the view’s motivations in any detail, even more so. Hal Finney offers brief summaries here and here (he also links to a history of the concept, but the link is broken); Paul Christiano has a few [LW · GW] blog [LW · GW] posts on the subject; and Issa Rice collects some information and commentary on a wiki. Finney, Christiano, and Rice all credit Wei Dai with originating the idea on his mailing list in the late 90s (see links from Rice) — though Dai has since rejected UDASSA (he points to some reasons here [LW(p) · GW(p)]). Finney also credits Jürgen Schmidhuber and Max Tegmark as influences (this piece from Schmidhuber, for example, seems like it’s up a similar alley).

These writers (and those who, in my experience, adopt their views) tend to share various additional assumptions and inclinations, for example:

In my experience, writing and discussion in the vicinity of UDASSA sometimes mixes these assumptions and inclinations together in a way that makes it hard to isolate the distinctive claims and motivations for the approach to anthropics in particular. Indeed, reading Finney and Schmidhuber, it can feel a bit like one shows up in the midst of a “computer scientists try to settle a whole lot of philosophical questions very fast” festival. One is told to grab hold of an arbitrary Universal Turing Machine and hold on tight: we’re going to construct a “theory of everything.” One is tempted to ask: “Wait, what are we even doing here? Why am I talking about the minimum number of characters it takes to write a python program that ‘finds’ the n-th waking Sleeping Beauty?” Or worse, perhaps one doesn’t ask: one simply nods along, vaguely confused, but numb from a barrage of unfamiliar abstraction — or worse still, not even aware of one’s own confusion. Or perhaps one simply bounces off entirely.

I tried, in my last post, to do a bit more to set up and examine the backdrop headspace that UDASSA inhabits, which I see as centrally animated by excitement about the UD, and about the machinery of Solomonoff Induction. And I tried, as well, to present this headspace in a manner compatible with what I think of as an everyday person’s background ontology: an ontology on which there are physical objects, rather than just “information structures”; concrete worlds, rather than just “programs” or “simplest descriptions”; probabilities, rather than just “bets” or “levels of concern.” I’ll aim, here, to preserve such possible normality — or at least, to flag any deviations very explicitly. But I think it possible that at the end of the day, UDASSA makes the most sense if you move further in the direction of “Everything Is An Abstract Object And All Such Objects Exist/Don’t And You Don’t Need To Exist To Be Conscious And Beliefs Are Just Love.”

I’ll also assume that readers are broadly familiar with the UD. If you aren’t, see the first few sections of my previous post.

II. World and claw

OK, so how is the UD supposed to help with anthropics?

To get the most common approach up and running, we’re going to need a further pair of assumptions, which I’ll call the “world and claw” assumptions.

The first assumption, which I’ll call the “work tape assumption,” is that the shortest programs that output your observations cause your UTM to do some combination of (a) simulating a world, and (b) extracting from that world some set of observations that some observer within that world makes. Thus, on a basic version of this picture, the UTM starts by writing into its work tape some world of e.g. fragrant valleys and golden wheat, in which Bob sees red, and Sally sees blue; and then, as a next step, it extends some computational claw into that world, and extracts either the red, or the blue, either from Bob’s head (retina? something something? [LW · GW]), or from Sally’s. The machine then writes whatever its claw grabbed onto the output tape.

An arbitrary UTM extracting bits from a world simulated on its work tape. Source here.

The second assumption, which I’ll call the “input tape assumption,” is that this division between simulation and extraction is reflected on the input tape as well, such that one part of the input tape codes for the simulation (e.g., the world), and one part codes for the extraction protocol (e.g., the claw). This division allows us to hold fixed the length of the “world” part of the program, and reason independently about the length of the “claw” part.

Note that the input tape assumption doesn’t follow from the work-tape assumption: your UTM’s internal processing could involve some combination of simulation and extraction, without these being cleanly separable as distinct strings on the input tape.

Equipped with these assumptions, we arrive at the following extension of UD-style reasoning to anthropic questions:

  1. You are more likely to be an observer whose combination of (a) world and (b) extraction protocol are coded for by a shorter program input to your arbitrary UTM.
  2. In a given world, you’re more likely to be an observer in that world whose extraction protocol is coded for (relative to that UTM) by a shorter “Extraction Protocol” part of an overall World + Extraction Protocol program.

The work-tape assumption gets us (1): regardless of what’s happening with the input tape, the work tape process involves some sort of world and claw dynamic, so on the UD, shorter programs for that dynamic are more likely. And the input-tape assumption gets us (2): simulating a world is a fixed and separate cost of input-string length, so in trying to figure out who you are most likely to be, we can just focus on whose observations would be extracted by the shortest-to-code-for extraction procedures.
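To make (2) concrete, here is a minimal sketch, assuming the usual convention that a program of n bits gets prior weight proportional to 2^-n. The bit lengths, the observer names as dictionary keys, and the function names are all hypothetical, chosen only for illustration. Nothing here computes the actual UD, which is uncomputable.

```python
from fractions import Fraction


def ud_weight(program_bits: int) -> Fraction:
    """Assumed prior weight of one program: each extra bit halves it."""
    return Fraction(1, 2 ** program_bits)


def within_world_odds(world_bits: int, claw_bits: dict[str, int]) -> dict[str, Fraction]:
    """Normalized weight of each observer in one fixed world.

    Under the input tape assumption, a program is a world part followed by
    a claw part, so its length is world_bits + claw length.
    """
    weights = {who: ud_weight(world_bits + bits) for who, bits in claw_bits.items()}
    total = sum(weights.values())
    return {who: w / total for who, w in weights.items()}


# Hypothetical claw lengths: Bob's extraction protocol is 2 bits shorter.
claws = {"Bob": 10, "Sally": 12}

short_world = within_world_odds(100, claws)
long_world = within_world_odds(500, claws)

# The world part is a fixed cost, so it cancels out of the comparison.
print(short_world == long_world)  # True
print(short_world["Bob"])         # 4/5
```

The point of the sketch is only that, if world and claw really are separable strings, the length of the world part drops out when comparing observers within a world. Whether they are separable is exactly what the input tape assumption asserts.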

We can see in these assumptions something reminiscent of the distinction between “objective worlds” and “centered worlds,” where the latter contain a reference to a particular person/location/time in a world, and the former do not. That is, the work tape assumption imagines that the inner dynamic of the UTM’s information-processing mirrors the terms in the centered-world’s specification of {world, location}. And the input tape assumption imagines that this specification is literally written onto the input tape in two distinct stages: you write down the world (I think people generally imagine something like “fundamental physics and initial conditions,” here), and separately, you write down how to find who you’re looking for, in that world.

If this all sounds pretty specific and alien to normal reasoning, and like it doesn’t obviously have anything to do with anything: yep. We can make it sound smoother, though, if we run with the structural analogies with “centered worlds,” embrace the world and claw assumptions, and smoosh all this stuff about arbitrary UTMs and input program lengths and extraction protocols etc into that more comforting and familiar (and wooshy) term, “simplicity.” With such unsightly baggage in the background, we can then say something much shorter, like: “Simpler centered worlds are more likely.” Ta-da! Look at how short a sentence that is. (Probably it’s true, given how short it is. And this one is truer. This one more. This most.)

On this framing, we are basically extending Occam’s Razor from objective worlds to centered worlds, using the UD’s operationalization of “simplicity,” and conditioned on assumptions that play nice with a “centered worlds” ontology. And note that granted such assumptions, this sort of extension isn’t some ad hoc additional move that UD-adherents can choose to make if they’re interested in anthropics. Rather, it’s an unavoidable consequence of using the UD at all. After all, the UD is supposed to be a prior over computable ways of generating your observations in particular — and a prior over objective worlds isn’t enough to do this. If Bob sees red, and Sally sees blue, and they both live in World 1, then assigning probability p to World 1 won’t get you to an answer on “will I see red or blue?” — you also have to decide the probability of seeing what Bob sees, vs. seeing what Sally sees. If the world and claw assumptions hold, then, the UD was about centered worlds all along.

III. Are the world and claw assumptions true?

The UDASSA literature I’ve read (e.g., the handful of blog posts etc cited above) is pretty short on explicit argument for the world and claw assumptions. Dai just writes that they are “probably true” (perhaps there is more argument elsewhere). Finney writes that he “strongly suspects” that something in the vicinity is true, and that he has “argued elsewhere” that a world-and-claw type program is simpler than a “program which was hard-wired to produce a specific observer and had all the information necessary to do so.” My guess is that the thought here is supposed to be something like “your observations are a lot of bits, surely it’s easier to write down physics and then find your observations than it is to code for your observations directly”, in the same sense in which it is easier to write down the code for the Mandelbrot set, and the location of a specific (sufficiently large/information-dense) region within it, than it is to code for that region’s contents directly.
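The Mandelbrot analogy can be sketched in a few lines. This is an illustration of the intuition only, not a measurement of program length in the UD’s sense: the function name, the coordinates, and the grid size are my own hypothetical choices. The generator plays the role of the “world,” and the (center, width, size) arguments play the role of the “claw.”

```python
def mandelbrot_region(center: complex, width: float, size: int, max_iter: int = 50) -> list[str]:
    """Render a size-by-size grid of membership bits for one square region.

    A point counts as "inside" (bit "1") if its orbit has not escaped
    |z| > 2 after max_iter iterations; this is the usual approximation.
    """
    rows = []
    for j in range(size):
        row = []
        for i in range(size):
            c = center + complex((i / size - 0.5) * width, (j / size - 0.5) * width)
            z = 0j
            inside = True
            for _ in range(max_iter):
                z = z * z + c
                if abs(z) > 2:
                    inside = False
                    break
            row.append("1" if inside else "0")
        rows.append("".join(row))
    return rows


# A hypothetical "location": three short numbers pick out the region.
region = mandelbrot_region(center=-0.75 + 0.1j, width=0.5, size=64)

# Writing the same region down directly means writing every bit of it.
raw_bits = sum(len(row) for row in region)
print(raw_bits)  # 4096
```

Doubling `size` quadruples the bits you would have to write down directly, while the generator and the location stay about the same length. That is the sense in which “physics plus a locator” is supposed to beat hard-coding, though it does nothing to show that these are the only two options.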

This seems plausible to me: but note that “world and claw” and “code for it directly” (whatever that means) aren’t obviously the only options for generating your observations. Indeed, naively, it seems like there might be a lot of ways of generating/compressing a given set of bits, and I worry a bit about saying “I dunno, seems hard to write down bit-by-bit, or to do X-particular-thing, so probably it’s something about a physics and a locator?” In particular, I worry that this suffers from some conflation between “options that immediately come to mind” and “all computable options.”

Some comments from Paul Christiano [LW · GW] suggest a different argument for the work tape assumption, at least. I’ll flag up front, though, that Paul indicated in conversation that he doesn’t actually endorse world and claw in a straightforward sense, and didn’t mean for his post to be read as doing so (a comment that prompted a UDASSA-influenced person in the conversation to exclaim something like: “I based my entire worldview on a misreading of a 2011 blog post by Paul Christiano?!”). So rather than attempt exegesis, I’ll just give the argument that the post’s comments make me think of, namely:

1. Intuitively, the simplest explanation of our experience is that we are particular observers embedded in an external world.

2. Thus, Solomonoff Induction will converge on this as the highest likelihood hypothesis.

3. Thus, the highest likelihood programs will involve (a) a simulation of our world, and (b) a procedure for extracting observations from some observer within that world.

4. Thus, the work-tape assumption.

This feels to me a bit shaky, though. In particular, I feel more confident that we are particular observers embedded in an external world than that is “the intuitively simplest explanation” of our experience. I think this is partly because I feel less allegiance to intuitive “simplicity” as an epistemic (/ethical?) desideratum than some UD-ish folks, and partly because I don’t feel like I have a particularly clear grip on what our intuitive notion of “simple” is supposed to imply about questions like “which is simpler, solipsism or non-solipsism?” Of course, if we assume that the UD accurately captures our intuitive notion of simplicity, then we can ask what the UD (together with an arbitrary UTM) says about such a question — but in that case we’re back to just asking, directly, whether something like 3 is true. And if we don’t assume that the UD accurately captures our intuitive notion of simplicity, then even if 1 is true, it’s not clear that 2 and 3 follow.

Could we run a less simplicity-focused argument? Consider:

5. We are, in fact, particular observers embedded in an external world.

6. Solomonoff Induction converges on the truth.

7. Thus, 3 and 4 above.

In some moods, this feels to me a bit question-begging. In particular, if we’re wondering whether Solomonoff Induction says that our observations are generated via our being observers in an external world, rather than via our being in e.g. some hard-coded floating solipsist God-knows-what situation, it seems like we ought to assume either that we are in fact observers in external worlds (but Solomonoff Induction might lead us astray), or that Solomonoff Induction always converges on the truth about our situation (but we might in fact be in some hard-coded solipsistic God-knows-what situation) — but it feels maybe a bit cheatsy to assume both. That said: whatever, maybe it’s fine, one person’s “question-begging” is often another’s “sound argument.” (Though re: soundness, note that “we are particular observers embedded in an external world” is also the type of thing that people I know who like UDASSA will suddenly start denying — you never really know.)

Maybe, then, we can get to the work tape assumption, if we’re willing to be suitably confident about our actual situation, and about Solomonoff Induction’s reliability (or perhaps, about what’s intuitively simple, and how much Solomonoff Induction tracks intuitive simplicity). But where does the input tape assumption come from? After all, we can grant that the shortest programs that generate our observations will in some sense create worlds and extract observations from them, while still denying that this bifurcation will show up in some cleanly separable way on the input program itself. Of course, it’s presumably possible to write down the physics on the first part of the tape, and the extraction procedure on the second; and it would be convenient for our ability to reason about the UD’s implications if this were always the shortest way of doing it. But where do we get the assumption that it actually is?

Indeed, naively, it seems like you’d get a shorter program if you can find a way to code for both the world and the claw at the same time. Maybe you somehow interleave the specifications for world and claw into the same string of bits? I’m not sure exactly how this would go, but again, we’re talking about “all computable options,” here.

(Here’s an example that might give a flavor of why world and claw might not always be cleanly separable parts of a program. Suppose that my world is coded for by some not-that-long string s, and that I have to pick out one amongst a zillion locations in that world. Perhaps, then, the location coded for by string s itself (e.g., the location you’d get if your extraction procedure was “go to the location picked out by treating the world part of the program as a location coordinate instead”) will be super extra likely, relative to other locations you can’t get to by re-using the world bits. Should we then expect to live at such a “world-program coordinate”? Am I, somehow, what you get if you treat the true physics, written in some arbitrary language, as a place to drop a pin into the world that the true physics creates? If so, this would complicate aspirations to ignore the world part of the program, and just reason independently about the extraction procedure — though it’s still in some sense an input-tape-ish situation; e.g., the “go to the world program coordinate” bits of the tape are still separable.)

Overall, the work tape assumption currently seems to me a better bet than the input tape assumption. And I have some ongoing questions about both.

IV. What’s up with the ASSA?

Thus far, I’ve been focusing on the “UD” part of UDASSA. But what about the ASSA — that is, the so-called “Absolute Self-Selection Assumption”?

The “ASSA” part of the UDASSA discourse seems like a bit of a mess to me. I’ll focus on two presentations: Finney’s, and Christiano’s [LW · GW].

As Finney presents it, the ASSA is supposed to be an extension of Nick Bostrom’s “Self-Sampling Assumption” or “SSA” (Finney calls Bostrom’s principle the “Self-Selection Assumption,” but I think the name difference is either a mistake, or a reflection of Bostrom’s having changed the name later). Finney treats SSA as saying that “you should think of yourself as being a randomly selected conscious entity (aka ‘observer’) from the universe,” but strictly speaking, this isn’t right: SSA (at least on the definition Bostrom eventually settled on) says that you should think of yourself as randomly sampled from the observers in your reference class (see here [LW · GW] for more on reference classes). This difference matters. On Bostrom’s view, SSA is compatible with thinking that you “couldn’t” have been e.g. a conscious chimp, or a person wearing a red jacket, or whatever; whereas Finney’s presentation of Bostrom builds in a specific reference class: namely, conscious observers.

For Finney, the addition of “Absolute” to SSA extends the concept from observers to “observer moments” or “OMs” — that is, “small units of time such that no perceptible change occurs within that unit.” Bostrom makes this move, too, but he uses the word “strong” instead of “absolute,” yielding the “Strong Self-Sampling Assumption” or “SSSA.” Thus, Finney’s formulation of the ASSA ends up equivalent to Bostrom’s formulation of the SSSA, except with a specific reference class — e.g., “all conscious observer-moments in the universe” — built in.

Christiano, by contrast, seems to treat the ASSA as some more substantive shift away from standard reasoning. He writes [LW · GW]:

A thinker using Solomonoff induction searches for the simplest explanation for its own experiences. It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

As humans using Solomonoff induction, we go on to argue that this external lawful universe is real, and that our conscious experience is a consequence of the existence of certain substructure in that universe. The absolute self-selection assumption discards this additional step. Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

It’s not fully clear to me what’s going on here, but my sense is that the basic vibe is supposed to be something like “let’s skip this talk of worlds and just talk about experiences.” On a minimal interpretation, this is just a way of saying that ultimately, what we care about is centered worlds (and the observations they imply), rather than objective worlds, and that we should use the complexity prior for the former. But Christiano explicitly denies [LW(p) · GW(p)] this sort of interpretation: “My approach is defined by solipism. I don’t use the complexity prior for ‘centered worlds’ I just use the complexity prior for ‘subjective state.’” Maybe this is just a way of saying that he’s not taking for granted the world and claw assumptions, and that he’s open to experiences getting produced in some way that doesn’t involve a “world” existing at all? Regardless, comments to the effect that “normally, we think that ‘the universe is real’, but the ASSA discards that ‘additional step’” are suggestive of something metaphysically revisionary.

What’s more, such comments make it sound like this metaphysical revisionism is what the ASSA, in Christiano’s view, is all about. This makes pretty clear that we’re no longer just talking about Bostrom’s Strong Self-Sampling Assumption, but using a different name and a specific reference class. After all, even with a reference class of “conscious observers,” the SSSA is not at all about discarding the idea that the world exists: it’s about a certain way of updating your prior over worlds (namely, in proportion to the fraction of the observer moments in your reference class that are in your epistemic situation). Apparently, though, we’re doing something weirder than that, something about … solipsism? There being no fact of the matter about whether the world exists? Probability distributions over possible experiences as a “primitive object”?

I think part of what might be going on here is that UDASSA was developed in a context excited about something like Max Tegmark’s Mathematical Universe Hypothesis: namely, the view that our universe is a mathematical object, and that all possible mathematical objects are equally real. Thus, Christiano titles an early post on UDASSA “Anthropics in a Tegmark Multiverse [LW · GW],” and writes that:

I believe that the execution of a certain computation is a necessary and sufficient condition for my conscious experience. Following Tegmark, by ‘execution’ I don’t refer to any notion of physical existence—I suspect that the mathematical possibility of my thoughts implies conscious experience.

And Finney writes:

I am therefore implicitly assuming that only information objects exist … The UD defines a probability or ‘measure’ for every information object. This is the basic ontology which I assume exists. It is the beginning and the ending of my ontology.

That is, I think the picture here is supposed to be something like: “look, get rid of any contrast between existing vs. not existing, possible vs. actual, concrete vs. abstract, and so on: just imagine the set of all math/information things, some of those will be conscious observer moments, use the UD to assign probabilities to their getting printed out by an arbitrary UTM, and then expect to be a given observer moment in proportion to these probabilities – not because those probabilities are related to whether that OM gets “made real,” but because… well… just because. That’s just the whole thing. It’s a primitive.” (Or maybe: the probabilities indicate how much reality-ness each OM has. Or maybe: the probabilities indicate how much I care about each OM [LW · GW].)

This, though, is a lot of extra metaphysical baggage. We’re not, now, just talking about an interesting type of prior, and about the possibility of using that prior, in conjunction with a version of Bostrom’s Strong Self-Sampling Assumption, to address questions in anthropics. Rather, we’re also talking about an entire (revisionary and inflationary) backdrop metaphysics – and one for which Finney and Christiano, in the posts I’m referencing, give ~zero argument. If UDASSA means the UD, plus Bostrom’s SSSA with conscious observer moments as the reference class, plus the Mathematical Universe Hypothesis, it should be renamed to make this clearer (UDSSSAWCOMARCMUH?). And it shouldn’t be pitched as “one approach to anthropics.” Rather, it should be pitched as “an extremely substantive metaphysical hypothesis, and also some applications to anthropics on the side.”

Indeed, it’s not actually clear to me what work the most traditionally anthropics-ish bit — namely SSSA with conscious observer moments as the reference class — is doing, for Finney/Christiano. As far as I can tell, what Finney and Christiano really want to do is just use the UD to give a probability distribution over mathematical objects, and then to distribute subjective credence according to this probability distribution. But to do that, we don’t, actually, need to talk about conscious observer moments, in particular, at all – and still less, do we need to use them as a Bostromian reference class.

What’s more, in my opinion, we don’t need the Mathematical Universe part, either. As I tried to emphasize in my “pitch for the UD [LW · GW],” you don’t need to be Tegmarkian to get interested in the UD: all you need to do is think that your observations are being produced by a computable process, and to get interested in predicting them using an arbitrary UTM and some prior over inputs. Finney and Christiano seem interested in making a bunch of extra metaphysical claims, too, about how only information objects exist, something something solipsism, and so on – but this, in my view, is muddying the waters.

For these reasons, my own thinking about UDASSA tends to downplay the “ASSA” part, and the commentary about it. Indeed, I think that the cleanest and most interesting thing in the vicinity of UDASSA is just: “whatever approach to anthropics falls out of just doing Solomonoff Induction” (or “Just Do Solomonoff Induction” (“JDSI”) for short). And I think the interesting question is whether “just doing Solomonoff Induction” actually helps with the types of problems anthropics has to deal with. Let’s turn to that question now.

V. Weird SIA

Let’s suppose, then, that we are going to Just Do Solomonoff Induction. What will we end up saying about all the questions that plague the anthropicists? 

The UDASSA-ish stuff I’ve read is pretty short on analyses of the flavor “I went through and tried to actually apply UDASSA to a bunch of gnarly anthropic questions, and here’s what I got.” My impression, though, is that people generally hope that UDASSA will act in rough accordance with what I’ll call “weird SIA.” I’ll start, then, by describing weird SIA, a few of its putative benefits, and also what’s so weird about it. I’ll then go on to question whether Just Doing Solomonoff Induction actually implies weird SIA at all.

What is weird SIA? As I’ll present it, weird SIA is an attempt to reason in a broadly SIA-ish way, but with the addition of what I’ll call “soul magnetism”: namely, the view that in a world where multiple people make your observations, you are more likely to be some of those people than others, in virtue of differences in the length of the shortest “claw” program that extracts those observations. The hope is that this addition allows us to capture (at least roughly) the stuff that SIA gets right, but in a way that avoids some of its problem cases (and in particular, issues with infinities).

What kind of stuff does SIA get right? Well, consider:

Sleeping Beauty: Beauty goes to sleep on Sunday night. After she goes to sleep, a fair coin is flipped. If heads, she is woken up once, on Monday. If tails, she is woken up twice: first on Monday, then on Tuesday. However, if tails, Beauty’s memories are altered on Monday night, such that her awakening on Tuesday is subjectively indistinguishable from her awakening on Monday. When Beauty wakes up, what should her credence be that the coin landed heads?

1/3rd really looks to me — and to many — like the right answer, here. For relevant arguments, see e.g. the Dorr-Arntzenius argument here [LW · GW], Elga’s original argument for 1/3rd here, and problems for 1/2 like the Red-Jacketed-High-Roller, and related forms of Telekinesis, here [LW · GW]. Weird SIA hopes to play nice with such arguments.

How does it do so? Broadly, the idea is that if there are more copies of you in a given world, there are going to be more programs that output your observations by simulating that world — and thus, that world will be more likely (that is, you’ll be more likely to go on to observe it). Thus, to take a toy example, suppose that the experimenter flipped a coin, and if heads, he created one Joe in a white room labeled “0”, and if tails, he created two Joes, in rooms labeled “0” and “1”.  Suppose you wake up as a Joe in a white room, and suppose, further, that:

  1. the world and claw assumptions are true,
  2. the shortest way to print out a given Joe’s observations is via a program consisting of (a) a physics (used for simulating the objective world), (b) an “extraction protocol” that somehow says to “extract the observations of the observer in the room with the following number,” and (c) a number written in binary, and
  3. specifying the physics in which the coin landed heads, vs. the one in which the coin landed tails, requires a program of the same length.

Thus, because the length of (a) and (b) is the same in both worlds, and the binary representation of “0” and “1” is the same length, all three “centered worlds,” here, are coded for by programs with the same length. Thus, the UD gives equal prior probability to each of them, and after updating on your observations, their posterior probability remains equal as well. If we ignore all the longer programs that generate these observations (or if we assume that the same sort of dynamic applies to them), then we get 1/3rd that, upon leaving your room, you will observe a heads-up coin; and conditional on tails, you’re equally likely to be in room 0, or room 1. So far, then, we’ve reproduced SIA’s verdicts about the case.
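As a rough sketch of the arithmetic in this toy example (not a real implementation of the UD): assume each centered world is picked out by exactly one program, and that a program of length n gets prior weight 3^(-n), as with the three-sided-coin prior. The names `posterior`, `BASE`, and `centered_worlds` are hypothetical, and `BASE` is an arbitrary placeholder for the combined length of the physics and the extraction protocol.

```python
from fractions import Fraction

ALPHABET_SIZE = 3  # assumption: three-symbol input alphabet (three-sided coin)


def posterior(lengths):
    """Normalize ALPHABET_SIZE**(-length) weights over the candidate programs."""
    weights = {name: Fraction(1, ALPHABET_SIZE ** n) for name, n in lengths.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical lengths: physics + extraction protocol cost the same in every case,
# and the room numbers "0" and "1" each take one extra symbol.
BASE = 100
centered_worlds = {
    ("heads", "room 0"): BASE + 1,
    ("tails", "room 0"): BASE + 1,
    ("tails", "room 1"): BASE + 1,
}

post = posterior(centered_worlds)
p_heads = sum(p for (coin, _), p in post.items() if coin == "heads")
print(p_heads)  # 1/3
```

Because only the differences in length matter after normalization, the value chosen for `BASE` drops out; any shared constant gives the same 1/3.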

Of course, this particular set up, where the extraction protocol proceeds via the binary representation of your room number, presumably isn’t actually the shortest way of coding for these observations. But weird SIA’s hope is that, for whatever the shortest way actually is, something like this dynamic will apply: that is, specifying the observations in Heads-Room 0, Tails-Room 0, and Tails-Room 1 will all require programs of ~equal length.

VI. Soul magnetism

OK, then: so what makes this weird SIA, as opposed to just SIA proper?

The weirdness comes from the soul-magnetism thing. Basically, the case above, where the programs specifying Tails-Room 0 and Tails-Room 1 are precisely the same length, is a special case. In most other cases, it will be easier to specify some observers than others — even if they have the exact same observations, and even if they live in the same objective world.

Suppose, for example, that God makes a world with a million Joes in different white rooms, each labeled 1-1,000,000. You know that you’re one of these Joes, but you don’t know which one. The standard move here, for both SIA and SSA, is to distribute your credence equally amongst each Joe, such that you’re one-in-a-million on Joe 1, one-in-a-million on Joe 2, and so forth. But soul magnetism says: nope. In fact, you’re more likely to be some of these Joes than others, because the claw programs for these Joes are different lengths.

Or at least, this is what would fall out of the assumptions and extraction procedure I sketched in the previous section. Thus, for different input programs that print out Joe-observations, the length of the physics and “extract the observations of the observer in the room with the following number” are held constant, and all we need to do, to compare the overall program lengths, is to compare the lengths of the binary representation of the room number. Thus, extracting the observations of the Joe in room 1 only takes 1 extra bit, used for writing the binary number 1; room 10, by contrast, takes four extra bits (1010). Thus, if we were using the type of UTM I described in my previous post [LW · GW], which starts with a prior over programs corresponding to the probability of three-sided-coin-flipping your way to a given program, then your probability of being Joe 1 is 27x higher than your probability of being Joe 10; and Joe 1,000,000 is basically a write-off.
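The 27x figure can be checked with a small sketch. It assumes, as in the toy set up above, that only the binary room number varies between programs and that each extra symbol costs a factor of 3. The helper `claw_weight` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction


def claw_weight(room: int, alphabet_size: int = 3) -> Fraction:
    """Relative prior weight for a room under the toy extraction procedure.

    Assumption: the physics and the extraction protocol are held constant, so
    only the length of the binary room number matters.
    """
    bits = len(format(room, "b"))  # format(10, "b") == "1010", i.e. four bits
    return Fraction(1, alphabet_size ** bits)


ratio_1_vs_10 = claw_weight(1) / claw_weight(10)
print(ratio_1_vs_10)  # 27

# Room 1,000,000 takes 20 bits, so Joe 1 is favored over him by a factor of 3**19.
ratio_1_vs_million = claw_weight(1) / claw_weight(1_000_000)
print(ratio_1_vs_million == 3 ** 19)  # True
```

Note that on this scheme rooms 0 and 1 get the same weight, since both are one bit long.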

On this picture, then, some Joes are quite a bit special-er than others, just in virtue of how efficiently their room numbers can be written in binary. It is as though the Joes in shorter-numbered rooms have a special glow, which pulls hard on whatever forces determine who “you” end up “being.” The image that comes to mind, for me, is of God first creating the world of Joes, then grabbing a soul out of the spirit realm and throwing it randomly into the world, to be inserted into whichever Joe it runs into first. On the standard picture, this hapless soul has an equal chance of ending up inside any given Joe; but this, says weird SIA, neglects the unique metaphysical power of your room number’s binary representation. Because of this power, Joe 1 sucks the souls towards himself with extra force; he gets twenty-seven souls, for every one soul that Joe 10 gets — and Joe 1,000,000, poor fellow, barely gets any.

(Though we might wonder: how many souls are there supposed to be in this world, anyway? Does everyone get at least one soul, or does Joe 1,000,000 only get some tiny fraction of one? Does Joe 1 have 27x more souls than Joe 10? Is the relevant thing more like “soul-force,” or “soul-density”? What in the heck are we talking about?)

In a sense, that is, Joe 1 is like Buzz Lightyear in the picture from section II. Because he’s taller/bigger/special-er (e.g., more soul-magnetic), the claw grabs him more easily.

An arbitrary UTM responding to the special metaphysical power of Buzz's binary representation. Source here.

VII. What’s to like about soul magnets?

Soul magnetism, at a glance, looks pretty silly. And in many ways, I think, it is (more on this below). But it’s also worth flagging why one might tolerate it.

The strongest reason, I think, is that soul magnetism gives weird SIA a way of dealing with infinities – a topic that standard SIA struggles with [LW · GW]. Thus, suppose that if heads, God creates a million Joes, in a million numbered white rooms, but if tails, God creates an infinite number of Joes, in an infinite number of numbered white rooms. Standard SIA freaks out about cases like this. In particular, first it becomes certain of tails; but then, in assigning probabilities to centered tails-worlds, it tries to split its probability equally between all the infinite Joes, and it doesn’t know how (one option here is to appeal to infinitesimals, but this, I gather, has problems). What’s more, it struggles to assign probabilities to different objective worlds with infinite Joes — for example, infinite worlds with Joes packed into every nook and cranny, vs. infinite worlds where Joes are much more sparse (we can hope the density of Joes converges as we take the limit over some process, like an expanding volume of space, but we have to pick such a process, and we’d still need ways of handling worlds where the process does not converge).

Weird SIA, by contrast, is much more cool-headed about infinities, and more able to accommodate them within a standard type of Bayesianism. In particular, if we assume the same set up and extraction procedure re: room numbers as above, the probability that weird SIA puts on the finite world will only be a tiny bit smaller than the probability it puts on the infinite world. After all, by the time you’re adding people in rooms with labels higher than a million, you’re talking about people with basically no soul-juice anyway — and in particular, soul-juice that declines exponentially as their room number increases in size. You’re basically never going to end up as one of those losers, so a given world hosting tons of them, even an infinite number, isn’t much extra reason to think that you’re in it. And conditional on a given world with infinite losers, you’re still probably a winner — one of the cool kids, with the small-numbered rooms.
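To make "only a tiny bit smaller" concrete, here is a hedged sketch under the same toy assumptions: rooms are numbered from 1, a room whose number takes k bits gets relative weight 3^(-k), and the heads and tails physics are equally simple to specify. With infinitely many rooms, the 2^(k-1) rooms of each bit length k contribute (1/2)(2/3)^k in total, and these contributions sum to 1. The function `total_weight` is a hypothetical helper, not anything from the UDASSA literature.

```python
def total_weight(num_rooms: int, alphabet_size: int = 3) -> float:
    """Sum alphabet_size**(-bit_length) over rooms 1..num_rooms, grouped by bit length."""
    total, bits = 0.0, 1
    while 2 ** (bits - 1) <= num_rooms:
        first = 2 ** (bits - 1)               # smallest room number with this many bits
        last = min(2 ** bits - 1, num_rooms)  # largest such room that actually exists
        total += (last - first + 1) * alphabet_size ** (-bits)
        bits += 1
    return total


finite = total_weight(10 ** 6)  # the million-Joe world
infinite = 1.0                  # limit of the same sum with infinitely many rooms

# Assumption: equal prior on the two physics, so the worlds compare by total weight.
p_finite_world = finite / (finite + infinite)
print(finite, p_finite_world)
```

On these assumptions the million-Joe world carries nearly all the weight that the infinite world does, so the posterior on the finite world sits just under one half.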

Comparing different infinite worlds is somewhat trickier, and will depend on the specific set up and assumptions about the most efficient extraction procedure — but at least one expects a determinate output, along with nice, well-behaved probabilities both on objective worlds, and on being any given inhabitant.

We can make similar moves in an effort to ward off finite “presumptuous philosopher”-type cases: e.g., cases where standard SIA becomes highly confident that it is in a world with (finitely) more observers in its epistemic situation (see here [LW · GW] and here [LW · GW] for more). As above, if the loser-dom of these extra observers increases exponentially as their population increases, this puts a cap on how excited you get about believing in the world they inhabit. The first extra person in your epistemic situation matters a lot to this excitement; the second, less; the third, even less — and it’s a fast drop-off.

Finally, my impression is that some people hope to use soul-magnetism to ward off worries about Boltzmann brains — e.g., disembodied brains that make your observations, arising due to random fluctuations in suitably long-lasting universes, and then immediately disintegrating (stable ones are much less likely). The classic worry is that if Boltzmann brains with your observations vastly (infinitely?) outnumber your embodied self — as they would on some cosmological hypotheses — then you are almost certainly a Boltzmann brain. There are various ways of responding to this. For example, we can try to appeal to cosmological updates we should make from the fact that, contra the prediction of Boltzmann brain hypotheses we could’ve believed in the past, we did not disintegrate (thanks to Carl Shulman for discussion) — though this gets into questions about how to think about the evidence your memories provide, if you’re wondering about Boltzmann brains with false memories. In the context of weird SIA, though, my impression is that some hope that the Boltzmann brains are losers — e.g., that the programs required to extract the observations of a Boltzmann brain are sufficiently long that even a vastly larger population of Boltzmann brains still leaves your embodied self/selves with most of the soul-juice.

Thus, as an analogy: if your world consists of (a) a copy of Hamlet, sitting on a big pedestal, and then (b) the library of Babel (e.g., a giant library of all possible books of a certain longer-than-Hamlet length), then if a UTM extracting bits from this world is outputting Hamlet, the thought would be that it’s probably outputting the Hamlet from the pedestal, rather than the Hamlet buried in the library, because “output the book on the pedestal” is a much shorter-to-write-on-the-input-tape way of finding Hamlet than whatever would be required to find it in the library. And the Boltzmann brains, the thought goes, are buried in the stacks.

I haven’t thought much about this, but not being a Boltzmann brain does seem like a plus, if you can get it.

Before dismissing soul-magnetism as silly, then, these benefits — with respect to infinities, presumptuous philosophers, and Boltzmann brains — are worth noting and reflecting on. In particular, it seems plausible to me that some form of soul-magnetism is basically required if you want to assign well-behaved, real-numbered probabilities to being particular Joes in an infinite-Joes world: trying to be “uniform” over the Joes won’t cut it (though note that non-UDASSA anthropic theories can go non-uniform in infinite cases, too).

And more generally, conditional on the world and claw assumptions, it’s not clear that users of the UD can avoid soul magnetism, even in finite cases. After all, granted that you’re simulating a given world with multiple people making observations O, you need some way of deciding which of those people’s observations to extract – and it seems plausible that some people will, in fact, end up more efficiently claw-able than others. So people who like the UD and the world and claw assumptions may be in for soul-magnetism whether they like it or not.

VIII. Soul-magnetism silliness

That said, soul magnetism sure does seem pretty silly, especially in finite cases.

At a high-level, the strangest thing about it is just that it doesn’t seem like it reflects any deep differences between the people in question. If there are a million Joes in white rooms, then modulo Tegmarkian-God-Knows-What, all of these Joes are in fact equally real inhabitants of the real world. There just aren’t, actually, differences in their amount of “soul-juice,” or “soul-attraction” — or at least, not on our current best philosophy of mind. Rather, on our current best philosophy of mind, whatever’s going on with minds, consciousness, observation, and so on, arises in virtue of (or is identical with) the existence of physical structures in the real world, and these physical structures are, by hypothesis, identical amongst the Joes. To the extent these Joes are equally real, then, and equally ensouled: in virtue of what, pray, should someone with Joe-like observations expect so strongly to be a Joe in a room with a smaller number on the door?

Of course, the answer here is supposed to be: well, smaller numbers can be represented by shorter binary strings. But, like: what? What on earth does that have to do with anything? Yes, we can just posit that this matters: but why would you think that this is actually a good guide as to who you are likely to be?

We can say the same about various other candidate sensitivities people throw around, sometimes jokingly, in the context of UDASSA: for example, you’re more likely to be someone who is unusually tall or fat, or standing next to an especially large sign, or living at an especially compressible time-step, or reading an especially compressible number out of the corner of your eye. All these properties, the thought goes, are ways in which you might become easier to “extract” from the world; and all of them seem notably bad from a “this is a serious view about how you should form beliefs” perspective. At the very least, if this is where we have ended up, it feels like we need to really go back and make sure we know exactly why we’re playing this weird game in the first place.

Maybe talking about bets can bring the strangeness out. Suppose that God puts you to sleep, then wakes you up, Sleeping-Beauty style, once in Room 1, and once in Room 2 (with no memory of previous wakings). You’re one of these wakings, and you don’t care about the other. Remembering your binary and your three-sided-coin-flipping-UTM, you realize that, strange as it may seem, you’re 3x more likely to be in Room 1, so 75% vs. 25% (though if it’s room 0 vs. room 1, it’s a different story — another weirdness). So, when someone offers you (along with the other waking) a deal that gives you $30 if you’re in Room 1, at the cost of $70 if you’re in Room 2, you take it: after all, it’s worth 5 bucks ($30*75% – $70*25%) in expectation.

But do you really expect to come out ahead, here? I don’t. When I actually ask myself: “am I actually going to walk out and find that I was in Room 1 three times out of four?”, I’m like: uh, no. My room number doesn’t have that kind of power. And of course (not that it matters to you, as a waking-egoist), your policy is terrible for the observer-moment team as a whole; if God repeats the stunt over and over, the team will end up broke.
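The gap between the waking-egoist's expectation and the team's actual takings can be spelled out in a few lines. This is only a sketch of the bet described above: the 75%/25% split is the one implied by the toy three-symbol prior (room "1" is one bit, room "10" is two), and the variable names are hypothetical.

```python
from fractions import Fraction

p_room1 = Fraction(3, 4)  # assumption: the 3:1 weighting from the toy prior
p_room2 = 1 - p_room1

WIN_IF_ROOM1 = 30
LOSS_IF_ROOM2 = 70

# What a single waking expects if it trusts the soul-magnetic credences.
subjective_ev = p_room1 * WIN_IF_ROOM1 - p_room2 * LOSS_IF_ROOM2
print(subjective_ev)  # 5

# What actually happens each round: both wakings accept, so one wins $30
# and the other loses $70.
team_total_per_round = WIN_IF_ROOM1 - LOSS_IF_ROOM2
print(team_total_per_round)  # -40
```

So each waking expects to gain $5, while the pair of wakings loses $40 every time the stunt is repeated.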

Imagine it from the perspective of the waking in room number 2. Why, exactly, are we dooming that guy to be wrong a bunch, and to lose lots of money betting he’s in room 1? Because he gets fewer souls? Doesn’t seem that way to him, I expect – what with the whole “actually-having-just-as-much-of-a-soul-as-the-room-1-waking” thing.

I think part of what’s driving my rejection, here, is that the exponential cost of having a higher room number is quite dramatic. The difference between room number 1 and 2 alone is already a factor of 3; and as I noted above, by the time you’re at room 10, you’re already 27 times more of a loser. That’s a big hit, for a relatively small number of additional rooms — and of course, as ever, a hit weirdly sensitive to the particular UTM, type of representation, and so on (and weirdly discontinuous as well: e.g., no difference between rooms 0 and 1, but then you add an extra bit, and bam, a factor of 3).

And here, perhaps an advocate for soul-magnetism would argue that my presentation has been a caricature. Obviously, it won’t really be door numbers (or physical size, time-step compressibility, etc) that drive the most efficient extraction procedure. It’ll be, well, something else — and presumably, something more respectable-seeming. And this something else will, let’s hope, result in some much smaller (and hopefully more continuous?) change to your probability of being any given Joe, as we proceed on down the line of rooms. Of course, there do have to be differences, since we have to deal with the infinite-rooms cases, and we can’t have equal probabilities on each of the infinite Joes (nor, it seems, should we somehow try to finagle uniformity in all finite cases, but non-uniformity in the infinite ones: this would imply, for example, that in order to compare the probabilities of any two rooms, we need to know whether we’re in an infinite case or not; and it would seem objectionably ad hoc regardless). But perhaps the soul-attraction differences between Joes can be milder, and hence only relevant in cases where you have a truly wacky number of Joes — e.g., a Graham’s number, or some such, where it feels like all bets are more likely to be off. In more standard cases, the hope is, you’re basically uniform, and hence basically doing standard SIA.

I have three responses to this. First: I grant that even granted the world and claw assumptions, “binary representation of your room number” is extremely unlikely to drive the actually shortest extraction procedure, and that it is in this sense a caricature. But as far as I can tell, most of the discourse about UDASSA is either (a) a caricature in this sense, or (b) is left so unspecified as to be unhelpful (more on (b) below). That is, specifying what a world-and-claw-ish UD actually implies about a given case requires making assumptions about what sorts of claws are in play; and any given claw people throw around seems unlikely to be the true claw. (We can, if we wish, attempt a sort of meta-UDASSA, which proceeds by attempting to have a distribution over possible outputs of UDASSA, and hence over possible claws – I discuss this route below).

Second, any given claw plausibly will result in pretty dramatic penalties, once the relevant property-that-increases-input-program-length has been identified. After all, we’re talking about a factor-of-3 hit just from adding an extra bit — and when we’re trying to distinguish between lots of different Joes, it seems like bits have to get added pretty willy nilly. And recall, as well, that the shortest world+claw combo is going to dominate the probability distribution, on the UD — it’s not like other claws, reflecting other types of soul-magnetism, will balance it out. So it’s unclear what this “basically uniform unless it’s a truly ridiculous number of Joes” picture actually involves (if some reader has a concrete proposal, I’d be interested to hear it).

Third, there’s a tension between trying to make weird SIA more like standard SIA (at least in finite cases), and continuing to do as much to avoid Presumptuous Philosopher-type problems. After all, the problem with the Presumptuous Philosopher was centrally that she made big updates about cosmology, because some cosmologies have extra people in her epistemic situation. The size of these updates shrinks dramatically if we posit that once we’re adding extras with room numbers above 10, it has become way less likely that the presumptuous philosopher is, in fact, one of those extras. But if we instead weaken the penalty for being an extra-in-a-bigger-numbered-room (or whatever the relevant property is), we also leave more room for presumptuousness of the type we were aiming to avoid.

Overall, I don’t currently see a lot of reason to expect UDASSA (plus the world-and-claw assumptions) to yield forms of soul-magnetism that are, as it were, “not so bad.” Rather, I expect that at bottom, UDASSA’s true view is that in fact, even in finite cases, some people making observations O are metaphysically special, relative to whatever arbitrary UTM you picked, in a way significantly relevant to how you should form beliefs. Some people are just more likely to get inhabited by souls, and you should expect, strongly, to be one of them. This, it seems to me, is an extremely substantive metaphysical hypothesis – one that dramatically revises, rather than innocuously “reframes,” our basic predicament. It would be a big discovery if true (and these arbitrary UTMs would have such power!). But have we really made such a discovery?

IX. Care magnetism

It is at this point, I think, that advocates of “epistemology is just love” step in and say: “Joe, you’re thinking about this the wrong way. It’s not that I think the guy in room 1 is metaphysically special, such that he “pulls souls” towards him harder. Rather, he’s ethically special: he pulls at my heartstrings harder. He doesn’t have more “soul-juice”; he has more “cared-about-by-me” juice (or perhaps, more “I-care-about-myself-more-if-I’m-him” juice). And cared-about-by-me-juice is what the UD has been about all along.”

Of course, attempting to reduce epistemology to different arbitrary levels of care implicates its own bucket of issues (do advocates of this view actually ever go all the way with it?), but if you could pull it off in a worked-out way, it would help a bit with soul-magnetism’s metaphysical strangeness. In particular, it seems strange for the ghostly forces of “where does your soul end up” to be sensitive to binary representations of room numbers (or whatever); but on subjectivism, at least, we’re a lot more used to people caring about whatever weird arbitrary thing they want. 

That said, in the absence of any metaphysical differences, caring 27x more about the person in room 1, vs. room 10 – or about the next-to-the-big-sign person vs. the further-away-from-it person, or the person-living-at-an-easily-compressible-time-step vs. the person-living-at-a-less-compressible one, or whatever – seems pretty ethically dubious. In particular, it looks a bit like some weird form of CS-inflected prejudice: an ignoring of the fundamental equality of these people’s reality — their suffering and happiness — in order to discriminate on the basis of whatever inequality in length-of-program-for-extracting-their-observations happened to fall out of your arbitrary choice of UTM. It is coherent, in some sense, to think like this; but it is also coherent, in some sense, to give more weight to the interests of people with green hats, or blond hair (not to mention more politically charged examples). But the suffering of the non-blonds, and the red-hatters, will persist in its reality: and those who actually care about suffering should keep their eyes on the ball.

“Sorry, Joe 100, I know I gave Joe 1 a lollypop instead of saving your arm. It’s nothing personal. It’s just that, well, that room number…”

And the purely prudential version of this discrimination is strange as well. If you think you’re equally likely to be in any of the rooms, are you really so much more indifferent to your own welfare, if your room number is higher? I wonder, for example, how you’ll feel about this policy when you actually are in room 100, and then you find yourself losing your arm, while the Joe in room 1 gets some crappy candy. (That said, “epistemology is just love” presumably rejects traditional notions of assigning probabilities to being different people? So I’m not actually sure what a standardly prudential gloss on this form of discrimination looks like.)

X. Do we have any clue what Just Doing Solomonoff Induction actually implies?

I conclude, then, that weird SIA is in fact very weird (and attempts to reduce its weirdness, and to make it more like standard SIA, will also increase its vulnerability to some of standard SIA’s problems, like presumptuousness). Perhaps, though, if such weirdness in fact falls out of Just Doing Solomonoff Induction, in all its formal glory, some will be inclined to say “OK, so be it. I love Solomonoff Induction enough to just bite whatever bullets it spits out, including this. Also, hey, it helps with infinities and stuff, right?”

But now I want to step back, and ask: do we actually have any clue what Just Doing Solomonoff Induction spits out? Here I’m quite skeptical – especially in practice.

I noted one potential source of skepticism earlier – namely, skepticism about the world and claw assumptions (I’m particularly unsure about the input-tape assumption). But even granted such assumptions, it’s quite unclear that the shortest world+claw combo that extracts your observations is going to imply something akin to SIA + soul magnetism.

Thus, for example, consider the claw: “sample randomly from the members of X reference class.” Such a claw would result in quite SSA-like behavior. In Sleeping Beauty, for example, if the reference class was “non-God-person-moments,” and Beauty was the only person either way, then it seems like this claw would end up a halfer.

(Or maybe not? Maybe here we appeal to details about the extra bits required to feed in the random seed, as one person I spoke to suggested? Or something? If that’s the route back towards thirding, though, it feels like one that took us pretty far into the weeds on the technical details of our UTM’s set up.)

Of course, this SSA-ish claw, too, is just one claw – the truly shortest claw is likely different. But it gives some flavor for why SIA-like anthropic behavior can’t just be taken for granted as an output of Solomonoff Induction: rather, you have to argue for it. The main argument I’ve heard thus far is something like: “well, presumably it’s roughly equally easy to pull out any of the person-moments, regardless of heads or tails?” – but on the SSA-like claw above, that’s not true (an individual person-moment in tails-worlds is less likely to get pulled out than the one in heads, because the random sampling has more options to choose from). Is there anything more we can say? Maybe just: “I dunno, SSA-like claws seem kind of specific?” Maybe that’s OK for now, but I find myself wanting something more principled – something that feels like it’s really orienting towards the space of all possible claws, rather than just fixating on the ones that pop to mind first.
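A toy comparison of the two kinds of claw in Sleeping Beauty may help here. This sketch sets aside the random-seed question raised above and simply assumes that the heads and tails worlds are equally likely a priori. One claw samples uniformly within whichever world it is in; the other treats every person-moment as equally easy to pull out. All names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # assumption: heads and tails physics are equally simple
wakings = {"heads": ["Monday"], "tails": ["Monday", "Tuesday"]}

# SSA-like claw: pick a world by its prior, then sample uniformly within it.
sampling_claw = {
    (coin, day): HALF * Fraction(1, len(days))
    for coin, days in wakings.items()
    for day in days
}

# Weird-SIA-like claw: every person-moment is equally easy to extract.
n_total = sum(len(days) for days in wakings.values())
equal_claw = {key: Fraction(1, n_total) for key in sampling_claw}


def p_heads(dist):
    """Total probability on person-moments that live in the heads world."""
    return sum(p for (coin, _), p in dist.items() if coin == "heads")


print(p_heads(sampling_claw), p_heads(equal_claw))  # 1/2 1/3
```

Under the sampling claw each tails person-moment gets 1/4 against 1/2 for the heads one, which is the sense in which a tails person-moment is less likely to get pulled out.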

Indeed, one conversation I had about UDASSA seemed to me to focus unduly on what seemed to me an artificially limited menu of possible claws: in particular, something akin to (a) “sample randomly from the physical locations,” and (b) “sample randomly from the observer moments.” These two options bear some (presumably non-accidental) resemblance to SIA-like and SSA-like behavior, but a priori, in the space of all possible ways of extracting observations from a world, why should we think these are the most efficiently encodable by our arbitrary UTM? (Have we even picked our UTM yet?). And more generally, I feel some worry about jumping from “Just Do Solomonoff Induction” to “Also, The World and Claw Assumptions” to “This View Basically Says What I Wanted To Say Anyway”, without grappling with the Eldritch craziness and uncertainty implied by just the first step, and even including the second.

XI. Are you more likely to run on thicker wires?

Here’s an example of the type of jump I’m suspicious of (thanks to Evan Hubinger, in particular, for discussion).

In his blog post on UDASSA, Christiano argues that UDASSA says the right thing about a problem I associate with Bostrom (2005): namely, the question of how to understand the metaphysics of “splitting simulations.” In brief, the problem is that it seems possible to (a) run a mind/observer/consciousness on a computer, and then (b) to take a series of steps that splits this computer into two separable (and eventually separate) computers, in a way that makes it unclear whether you have created a new “copy” mind/observer/consciousness, and if so, when (see Bostrom’s paper for more).

The particular example Christiano considers is: turning a two-atom-thick computer into two, one-atom-thick computers. Christiano wants to avoid saying that this transition has any moral significance, because he thinks that this will leave it unclear what aspect of the transition was the morally significant one. And he claims that UDASSA gets him this verdict:

Given a description of one of the 1 atom thick computers, then there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn’t change.

One thing I’ll note, here, is that none of the options for what to say about splitting simulations seem to me especially awesome. Christiano’s view, for example, naively implies that if you’re a simulation, you should pay a lot to be transferred to a thicker-wired computer, or to avoid having your wires thinned, despite the fact that these procedures would (presumably) be introspectively undetectable (see Wei Dai’s comment here [LW · GW], and his post here [LW · GW]). What’s more, Christiano’s argument against wire-thickness-indifference is basically just “this implies vagueness about something that matters,” which is something many philosophers have gotten used to in lots of contexts, and which computationalists about stuff may need to accept regardless, given ambiguities about which things implement which computations. And if you really love thicker wires, I expect you can find ways to love them that don’t involve UDASSA-like machinery (perhaps, for example, Bostrom’s appeals to different amounts/degrees of consciousness could be helpful). In these senses, even if UDASSA gets Christiano’s favored verdict here, this doesn’t seem like an especially strong positive argument in its favor.

But my main point here is: wait, since when are we assuming that ease of specifiability goes in proportion to number of atoms or thickness of wires? This sounds to me like an extremely specific “claw.” Maybe it falls out of something like “sample from all arrangements of atoms” or “sample from all sets of space-time points.” But why should we think that these are the shortest claws, relative to our UTM? What about claws like “sample from the causal structures,” or “sample from the computers,” or whatever? Whence this level of clarity about the claws that count?

XII. Which claw?

This kind of “which claw?” problem currently leaves me basically at a loss as to what sorts of verdicts JDSI+world-and-claw would actually yield even about very toy anthropics questions. Consider:

God’s weird coin toss: God flips a coin. If heads, he creates ten identical people in ten rooms. From left to right, the rooms are numbered with the first 10 primes. From right to left, by contrast, the rooms are given progressively larger hats. However, perched on every second hat is a pigeon singing a progressively less popular Britney Spears song, and the third pigeon from the right will one day invent AGI. If tails, by contrast, God creates 90 rooms with a similar set up, except that the pigeons have double-thick neurons made of gold.

What’s the chance of heads? SIA answers immediately: 1/10th. SSA (assuming no God in the reference class) answers almost as quickly: ½. But JDSI? I’m not sure. It depends, presumably, on whether the shortest claw is something about primes, hats, pigeons, songs, world-historical something somethings, or God knows what else (and on how the relevant property is encoded, and how large the input alphabet is for the arbitrary UTM, and on whether the physics of the “heads” and the “tails” objective worlds are in fact equally simple to encode, and …).
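For concreteness, here is a minimal sketch of how the SIA and SSA numbers above fall out. The function names are mine, and the SSA function assumes that every person in the reference class has the same evidence, so it simply stays at the prior over worlds that contain at least one observer. It is an illustration of the arithmetic, not a general implementation of either view.

```python
from fractions import Fraction

def sia_credence(prior, observers):
    """SIA-style update: weight each world by prior * number of observers, then normalize."""
    weights = {world: prior[world] * observers[world] for world in prior}
    total = sum(weights.values())
    return {world: weight / total for world, weight in weights.items()}

def ssa_credence(prior, observers):
    """SSA-style update (assumption: no evidence distinguishes the worlds):
    keep the prior, restricted to worlds that contain at least one observer."""
    weights = {world: prior[world] for world in prior if observers[world] > 0}
    total = sum(weights.values())
    return {world: weight / total for world, weight in weights.items()}

# God's weird coin toss: ten people if heads, ninety if tails.
prior = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
observers = {"heads": 10, "tails": 90}

sia = sia_credence(prior, observers)
ssa = ssa_credence(prior, observers)
print(sia["heads"], ssa["heads"])  # 1/10 1/2
```

Note what is missing: neither function takes primes, hats, pigeons, or a UTM as input. A JDSI version would need all of those, plus a way to find the shortest claw, which is exactly the part nobody knows how to write.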

Similarly: conditional on heads, what’s the chance of being the person one pigeon to the right of the AGI pigeon? SIA and SSA just say 10%: they don’t stop to count primes, or to weigh hats, or to check how fast the anthropic magnetism of the AGI pigeon drops off with distance (are you sure you’re not the AGI pigeon? On priors, you really should’ve been…). But JDSI? Again, I’m not sure. It all depends on which people are how efficiently claw-able. And we just don’t know.

I expect that problems in this vein will get even more gnarly when we move from toy cases to real-world anthropic problems. What, for example, does JDSI+world-and-claw say about the Doomsday Argument, or the location of the Great Filter, or about whether we’re likely to live in a universe obsessed with simulating us in particular? I am not aware of any writing that goes through and attempts to work out what something UDASSA-ish actually says about these issues, even in very rough terms (readers, do you know of examples I’m missing?). Indeed, if some UDASSA fan out there is interested in creating such an analysis, I would love to see it; I expect it would make clearer what UDASSA looks like in action. And a persistent absence of such analysis, despite people maintaining theoretical interest in the view, would seem to me an instructive data point with respect to the view’s usefulness.

Note, though, that any such analysis shouldn’t just assume without argument that some particular claw (e.g., “sample from the space-time points”) is the shortest claw. And granted this constraint, I currently feel pessimistic about getting much in the way of object-level conclusions out of something UDASSA-ish, even given the world and claw assumptions. In particular, it seems to me extremely hard to actually guess the shortest claw, relative to some UTM (unless you’ve rigged your UTM specifically for this purpose); and thus, extremely hard to know what UDASSA’s verdict about a particular case would be.

XIII. Just guess about what the UD would say?

Perhaps, though, we should embrace this uncertainty. After all, if the UD does in fact determine our anthropic fate, then whether it’s easy to think about or not, we just have to deal with it, and to get along with whatever best guesses about claws we’re able to cobble together.

Indeed, we can imagine ways in which uncertainty about claws could help restore some semblance of normality to our calculations of soul-magnetism, at least in finite cases. Thus, faced with the ten people in the heads worlds of God’s weird coin toss, perhaps we can say something like: “Look, I don’t know what the truly shortest claw is, but it probably isn’t something about pigeons or Britney Spears songs or inventing AGI – or at least, I give extremely low credence to these specific claws. True, according to the UD, some of these people are, in fact, much more soul-magnetic than others – but I, a mere human without access to the UD’s probabilities (weren’t they uncomputable [LW · GW] anyway?), cannot tell the metaphysical winners from the losers. So, seeing no reason to privilege any of these people over the others, I’ll go basically uniform between them.”

That is, this approach introduces a new layer of abstraction. We’re still assuming that the UD is the authoritative guide to who and where we are likely to be, and how likely; but we’re abandoning hope of actually using the UD to reason. Instead, we’re trying to take some more normal-subjective-Bayesian approach to guessing at what the UD would say, if only we could hear its weird voice.
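To illustrate how ignorance about claws can wash out into uniformity, here is a toy sketch. Everything in it is an assumption made for illustration: the “claw” is a hypothetical rule that ranks people in some order and gives the person at rank k a weight proportional to 2^-(k+1), and the reasoner is uniformly uncertain over all possible rankings. The real UD is not computable and need not look anything like this.

```python
from fractions import Fraction
from itertools import permutations

def claw_distribution(order):
    """Hypothetical claw: the person at rank k (0-based) in `order`
    gets weight 2**-(k+1); weights are then normalized."""
    weights = {person: Fraction(1, 2 ** (rank + 1)) for rank, person in enumerate(order)}
    total = sum(weights.values())
    return {person: weight / total for person, weight in weights.items()}

# Four people rather than ten, to keep the number of orderings small (4! = 24).
people = ["A", "B", "C", "D"]
orders = list(permutations(people))
credence_in_order = Fraction(1, len(orders))  # uniform ignorance about which claw is "real"

# Marginal probability of being each person, averaging over claw hypotheses.
marginal = {person: Fraction(0) for person in people}
for order in orders:
    distribution = claw_distribution(order)
    for person in people:
        marginal[person] += credence_in_order * distribution[person]

print(claw_distribution(people)["A"])  # 8/15 under any single claw that ranks A first
print(marginal["A"])                   # 1/4 once we average over all rankings
```

The uniformity here comes entirely from the symmetry of the reasoner’s ignorance, not from anything the UD says. If you had reason to think some rankings were more likely than others, the marginal would tilt accordingly.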

What would this approach say about heads vs. tails, in God’s weird coin toss? Or about infinite versions of the case, where you have to start privileging some people over others in order to keep your probabilities well-behaved? I’m not sure: unlike “which of these ten people in the same world am I likely to be?”, these questions don’t come with a naturally agnostic “default” like uniformity. So people trying to guess at what the UD says will have to go out on more of a limb – guessing, for example, that the UD behaves more like SIA vs. SSA, or that it tends to order infinities in some ways rather than others.

As far as I can tell, this kind of guesswork is basically the best we can do at the moment in trying to get anything anthropically useful out of the UD. Maybe that’s what UD fans have to settle for, but to me, at least, it reframes the most salient route to a UDASSA-ish anthropic approach. In particular, such a route won’t feel like: “Oh boy! If you use the UD, you clearly get XYZ object-level verdicts that were independently attractive.” To my mind, object-level verdicts really aren’t the UD’s strong suit. Rather, it will feel more like: “ugh, applying the UD to anthropics is a nightmare of made-up guess-work and lurking weirdness, but apparently there are really strong grounds for using the UD in general, so we have to make the best of it.”

Perhaps it feels like I’m asking too much of the UD, here. After all, aren’t I ok with e.g. Bayesianism, or Expected Utility Theory, as models of ideal reasoning, even if, in practice, you don’t know what the fully ideal thing would say, and you have to use short-cuts and heuristics instead? Is uncertainty about what the UD says any worse?

To me, it feels worse. In particular, when I make a quick Bayesian model of how much to update based on a negative Covid test, or when I do a quick EV calculation about whether to turn around and get something I forgot, I feel like I am doing a condensed version of a bigger ideal thing, such that (a) the practical version is actually useful, (b) the practical version is closely related in structure to the ideal version, and (c) the practical version’s usefulness is importantly connected to the ideal version’s usefulness.
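As an example of the kind of “condensed version” I have in mind, here is a sketch of the negative-Covid-test update. The prior, sensitivity, and specificity are made-up numbers chosen for illustration, not real test characteristics.

```python
def posterior_infected_given_negative(prior, sensitivity, specificity):
    """Bayes' rule for P(infected | negative test)."""
    p_negative_given_infected = 1 - sensitivity  # false-negative rate
    p_negative_given_healthy = specificity       # true-negative rate
    numerator = prior * p_negative_given_infected
    evidence = numerator + (1 - prior) * p_negative_given_healthy
    return numerator / evidence

# Illustrative (assumed) inputs: 10% prior, 80% sensitivity, 99% specificity.
posterior = posterior_infected_given_negative(prior=0.10, sensitivity=0.80, specificity=0.99)
print(f"{posterior:.3f}")  # roughly 0.022
```

The point is that this five-line calculation has the same structure as the ideal Bayesian update it approximates: a prior, likelihoods, and a normalization. I don’t know how to write the analogous five lines for “which claw is shortest.”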

When I try to apply the UD to the probability of heads in God’s weird coin toss, by contrast, I feel like I’m basically lost in the woods. It’s not that I have a condensed version of Solomonoff Induction that I can do to figure out which claws are likely to be shortest, despite the fact that there is, out there, some shortest claw that totally dominates my probability distribution. Ignorant of this claw, though, I end up saying things like: “I dunno, maybe lots of claws are SIA-like, which would mean 1/10th, but maybe lots of those penalize extras a lot, and there’s already ten people in heads, so that puts you at more like ½, plus I guess some claws are more SSA-like…” That is, I feel like I’m just making something up. I’m not doing a toy version of the real thing. I’m guessing wildly about what the real thing does.

XIV. A few other arguments for UDASSA I’ve seen floating around

I want to briefly mention a few other arguments for UDASSA that I’ve seen floating around.

One argument is that even beyond specific verdicts about wire-thickness, UDASSA is better positioned than both SIA and SSA to handle situations in which it becomes unclear how to count the number of observers (for example: vagueness, quantum splitting, etc). And it does seem like a fully developed version of SIA and SSA will indeed have to figure out what to say about cases that don’t seem to fit well into a “there is always an integer number of observers in any given situation” paradigm. I don’t feel I’ve yet heard strong reason to think that something UDASSA-ish, in particular, is required in order to handle such cases decently, but I haven’t thought about it much, and in the meantime, I’ll grant that avoiding questions about “how many observers are there” is a plus for UDASSA, at least in principle (and assuming that the relevant claw doesn’t itself build in some notion of counting observers).

Another argument, discussed by Christiano here [LW · GW], is something to do with the Born probabilities. I’m not going to get into this here: a few brief conversations on the topic made it seem like a big additional can of worms, and at this point I’m not even sure what kind of argument it’s supposed to be (e.g., an argument that UDASSA is at least compatible with the Born rule? An argument that UDASSA predicts or explains the Born rule? An argument that in light of the Born rule, UDASSA is less weird than you might’ve thought?). Those interested can check out Christiano’s discussion for more.

A third argument is that UDASSA allows you to be less dogmatic about anthropics than picking a view like SIA or SSA does. I think the idea here is supposed to be that the UD has non-zero probability on all anthropic theories (I think “claws” and “anthropic theories” end up kind of similar here), and so as you go through life, if you started off with highest credence on the wrong anthropics, you’ll make bad predictions (e.g., “I’m about to encounter a zillion copies of myself”), and you’ll update over time towards a better view. And in the meantime, your credence on e.g. SSA-like anthropic theories will cap your SIA-ish obsession with lots-of-people worlds [LW · GW]; your credence on e.g. SIA-like anthropic theories will cap your SSA-ish obsession with few-people worlds where you know who you are [LW · GW]; and hopefully you’ll emerge as the type of well-rounded, inoffensive anthropic reasoner that we all must surely aspire to be.

I’m a bit unsure how the set-up here is supposed to actually work, but regardless, my current take is that non-dogmatism about anthropics doesn’t require UDASSA: non-UDASSA-fans can distribute their overall credence amongst multiple anthropic theories, too, and get roughly the same benefits.
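Here is a rough sketch of what spreading credence across anthropic theories could look like without any UD machinery. The 50/50 starting weights and the likelihoods in the update step are hypothetical numbers, and averaging verdicts by theory weight is only one (contestable) way of handling uncertainty between anthropic theories.

```python
from fractions import Fraction

def mixture_credence(theory_weights, verdicts):
    """Average each theory's verdict, weighted by credence in that theory."""
    return sum(theory_weights[theory] * verdicts[theory] for theory in theory_weights)

def update_theory_weights(theory_weights, likelihoods):
    """Bayes' rule over theories: reweight by how well each predicted the observation."""
    unnormalized = {t: theory_weights[t] * likelihoods[t] for t in theory_weights}
    total = sum(unnormalized.values())
    return {t: weight / total for t, weight in unnormalized.items()}

# Hypothetical starting credences in each theory.
weights = {"SIA": Fraction(1, 2), "SSA": Fraction(1, 2)}

# Each theory's verdict on P(heads) in God's weird coin toss.
heads_verdicts = {"SIA": Fraction(1, 10), "SSA": Fraction(1, 2)}
print(mixture_credence(weights, heads_verdicts))  # 3/10

# Hypothetical later observation to which SIA assigned 1/5 and SSA assigned 4/5.
likelihoods = {"SIA": Fraction(1, 5), "SSA": Fraction(4, 5)}
updated = update_theory_weights(weights, likelihoods)
print(updated["SIA"], updated["SSA"])  # 1/5 4/5
```

Each theory’s verdict gets capped by the credence on its rivals, and bad predictions shift weight away from the theory that made them, which is roughly the non-dogmatism benefit described above.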

A final consideration I heard mentioned is that UDASSA (or at least, the UD) fits well with a worldview on which there are lots of nested simulations happening (e.g., simulations making further simulations, which make further simulations), because in those worlds, constraints on the difficulty of specifying worlds will in fact be determining which simulations get run. My take here is:

  1. First, when did “this fits well with a nested simulations worldview” become a philosophical plus? Was there some independent reason to get into nested simulations?
  2. Even in nested-simulations worlds, the UD’s set-up seems to reflect poorly the trade-offs that the simulators would face. In particular, program length doesn’t seem like the sole relevant factor: computational burden, for example, would surely also play a role, not to mention e.g. whatever reasons you might have for running the simulations in the first place.
  3. Also, isn’t the UD supposed to apply to basement worlds, too?
  4. Also, if we’re just talking about running the simulations, rather than extracting observations from them, then wouldn’t all the observers in the simulation be making observations, regardless of how easy it is to claw them out? So even if we care about ease of specifying a simulation, why would we care about ease-of-clawing?

Perhaps there are more arguments for UDASSA swimming around – readers, I’d love to hear your favorites.

XV. Why aren’t I a quark?

I also want to note one additional source of hesitation I feel about UDASSA-ish views: namely, that the type of “claw” required to extract my observations involves complex high-level concepts that are hard to write down on the input tape. Shouldn’t we have expected simpler claws a priori?

This comes out, for example, when people start to imagine that the relevant claw is something like “sample from the space-time points,” or “just grab bits at random from the simulation and print them out.” Here I feel like: dude, if that’s your claw, no way you’re getting out this specific stream of sensory data from this specific ever-shifting monkey-shaped cloud of atoms and stuff. Rather, you’re just going to get some random bits from some random part of the universe. It seems strange to assume that that’s the claw, and that it just so happened to light upon this monkey cloud’s sense-data. Rather, it seems much more likely that the claw in question is in some sense aiming at something monkey-ish directly.

But what kind of claw is that? Once we start talking about claws like “grab the observations made by the observer at location X,” we’re writing high-level, philosophically-fraught concepts like “observations” and “observers” onto our input tape – and this seems pretty expensive in terms of bits (not to mention philosophical progress). Wouldn’t we expect, instead, some claw that hews more naturally to the language of our physics, which the input tape (we’re assuming) already includes? E.g., some claw specified in terms of something like quarks? But then how are we getting observer-ish stuff out of that? Wouldn’t we expect, instead, to just see bits from the life of some low-numbered quark, or whatever?

One response, here, is to start thinking that somehow, claws that involve high level philosophically fraught concepts like observers (or maybe, “sources of causal influence?”) are in fact easier to write down than you might’ve thought – since apparently, those are the ones that are in some sense “operative.” But here I feel like we should distinguish between the updates we make about which claws are operative, conditional on accepting some UDASSA-ish theory, and the updates we make about whether a UDASSA-ish theory is true, given its initial predictions about what bits would get pulled and printed. If you’re wedded to a UDASSA-ish theory, then once you observe that you’re not a quark, maybe you should start changing your views about which claws are shortest; but if you admit that we should’ve expected a more quark-ish claw a priori, then in that sense, your UDASSA-ish theory gave a bad prediction, and is now back-pedaling.

That said, I acknowledge that I may be asking more of UDASSA, here, than I would ask of other anthropic theories. That is, it’s not necessarily the job of a theory of anthropics to answer the question “why am I an observer at all?” – and SIA and SSA don’t have ready answers, either. Rather, all anthropic theories seem to grant themselves the fact that we are, for whatever reason, these strange monkey creatures with this strange “observer-nature.” Perhaps, if we condition on predicting observations at all, or if we try to pull some “if you weren’t an observer, we wouldn’t be having this conversation” type of move, this isn’t so surprising. Or perhaps, it is surprising (you shouldn’t have expected to be an observer-like thing), but we feel the surprise before we set off on our anthropic project (e.g., “Huh, I guess I’m a vague cloud-like monkey-thing instead of a quark. Wild. But given that, what’s the shortest claw that would claw me?”).

Nevertheless, somehow this issue comes up for me with UDASSA in particular as a source of puzzlement, perhaps because UDASSA has a stronger “theory of everything” vibe, and a better-specified set of tools for making predictions. Comparisons to SIA and SSA aside, if we end up having to appeal to claws that involve concepts like “observers” to make UDASSA work, then I feel like: what’s up with these weird claws? 

XVI. Wrapping up

This has been a long, weedsy post about a very obscure topic. I wanted to write it partly because I feel like the ratio of “casual talk about (and endorsement of) UDASSA amongst a very specific set of people I know in the Bay Area” vs. “written-down discussion of UDASSA” is currently tilted notably far in the “casual talk” direction, and I want to push for better balance.

At present, my overall take is that we should distinguish between the following claims: (a) there are good reasons to treat Solomonoff Induction as a model of ideal reasoning, and (b) doing this helps a lot with anthropics. As I indicated in my previous post, I’m open to (a) – though perhaps less sold than some. But granted (a), I’m skeptical of (b) — and skeptical, as well, that (b) is much of an argument for (a), if we weren’t on board with (a) already. In particular, it seems to me that at this point, we are close to clueless about what Solomonoff Induction (relative to some particular UTM) would actually say about anthropic questions; that trying to guess is an unmoored nightmare; and that to the extent that weird SIA-ish guesses about e.g. soul-magnetism are correct, they imply pretty unappetizing epistemic (/ethical) conclusions.

What’s more, I want to guard against using our current cluelessness about Solomonoff Induction’s anthropic implications as an excuse to sneak in whatever anthropic conclusions we were hoping for (e.g., thirding, non-presumptuousness, no telekinesis, etc), via vague gestures at possible claws that could in principle lead to those conclusions. Indeed, at this point, I trust other, more philosophically agnostic reasoning (e.g., the Dorr-Arntzenius argument for thirding [LW · GW], the basic sense in which telekinesis seems silly [LW · GW], etc) much more than speculation about whatever claws happen to fall out most easily of my arbitrary UTM. I am not, like Christiano, ready to declare UDASSA “the only framework which I would feel comfortable using to make a real decision” (I barely know what making decisions using UDASSA as a framework would even look like) – and I would encourage others not to make high-impact decisions that turn centrally on UDASSA-ish considerations, but that would look strange/bad by the lights of common sense.

Indeed, stepping back and looking at this UDASSA stuff from afar, I’m left with some feeling that it’s all a bit … brittle and made-up. It feels like the thing moved too fast, and abstractly; like it brushed off too many questions, and accepted too many assumptions, along the way, without an adequate grip on what it was doing and why. Maybe it’s interesting; maybe it’s fun party chat. But it’s not real earth, real ground. Or at least, not yet. Maybe for some. Not for me.

That said, I think the topic may still well be worth more investigation, and I can imagine starting to get more excited about the tools UDASSA makes available – especially in the context of infinities, or in dealing with problems counting observers. I encourage UDASSA-fans to do more to write down and make public the arguments that persuade them, and to lay out what they think these arguments imply; and I encourage anyone interested in UDASSA-ish lines of thought to explore where they lead. Perhaps, for example, there are better arguments for the world and claw assumptions, or for specific forms of “weird SIA”/soul-magnetism, than I’m aware of; better ways of guessing about Solomonoff Induction’s outputs given what we know so far; better ways of motivating this whole weird game in the first place.

And what’s more, it’s not like there’s some comfortable “anthropic default” that we can just go back to if we want to avoid all this weirdness. Rather, the anthropic options most salient to me all seem pretty unappealing [LW · GW]; and “agnosticism,” as always, is its own type of bet. We should be wary, then, of doing too much complaining about the downsides of some positive proposal, and not enough reckoning with the difficulty of saying anything plausible at all. Appealing to the UD does, at least, add something to the menu of anthropic options – and we do need a better menu.

8 comments


comment by justinpombrio · 2021-11-29T16:22:40.555Z · LW(p) · GW(p)

There's a background assumption in these discussions about anthropics, that there is a single correct answer, but I think that the correct probability distribution depends on what your aim is.

Say you're living in a civilization on a continent, and you're not sure whether there's another civilization on a faraway continent. God speaks, and tells you that before He created the world, He wasn't sure whether to make one populated continent or two, so He flipped a coin to decide. Heads one continent, tails two. What is the probability that there is a second civilization on your world?

Say your government is deciding whether to send a sailing expedition to search for the second civilization. If you're alone, then the fruitless expedition costs $3 million. If you're not alone, you find a trading partner, and net +$2 million.

There are two possible worlds: should the possible single civilization lose $3 million, in order for the possible two civilizations to each gain $2 million? If you want to maximize expected average wealth, the answer is no, and if you want to maximize expected total wealth, the answer is yes. This preference induces a probability distribution: either SIA or SSA, depending on whether you care about the average or total.
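The arithmetic behind those two answers can be sketched as follows. This assumes a fair coin and that, in the two-civilization world, both civilizations send an expedition and each nets $2 million; the function and parameter names are illustrative.

```python
def expedition_expected_values(p_heads=0.5, cost_alone=-3.0, gain_trade=2.0):
    """Return (expected average wealth change, expected total wealth change), in $ millions.

    Heads: one civilization, whose expedition finds nothing.
    Tails: two civilizations, each of which gains from trade.
    """
    p_tails = 1 - p_heads
    expected_average = p_heads * cost_alone + p_tails * gain_trade
    expected_total = p_heads * (1 * cost_alone) + p_tails * (2 * gain_trade)
    return expected_average, expected_total

expected_average, expected_total = expedition_expected_values()
print(expected_average, expected_total)  # -0.5 0.5
```

The expected average is negative (don't send the expedition), while the expected total is positive (send it).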

What I don't get, is what the answer is if you want to maximize expected personal wealth. (That is, the wealth of your civilization, ignoring others.) I notice I am confused. I almost feel like the question is ill-defined, though I don't know why it would be. I guess this question is what anthropics is about, and I just answered an easier question above. Maybe we should be looking for the gap between the two?

(I made this point before [LW(p) · GW(p)], though less straightforwardly.)

Replies from: Wei_Dai, samuel-shadrach
comment by Wei_Dai · 2021-11-30T00:23:41.270Z · LW(p) · GW(p)

Some possibly relevant/interesting links for you:

I've stopped following most anthropics discussions in recent years, so I'm not sure how much subsequent progress there has been on "selfish anthropics", but I guess not much, judging from the backlinks to Stuart's post?

comment by acylhalide (samuel-shadrach) · 2021-11-30T04:29:43.809Z · LW(p) · GW(p)

There's a background assumption in these discussions about anthropics, that there is a single correct answer, but I think that the correct probability distribution depends on what your aim is.

I echo this intuition weakly - and also if you replace "anthropic theories" with "decision theories".

Anthropic theories or decision theories are said to be "better" if they are in some sense - more intuitive or more intelligent. Often we are implicitly assuming a notion of intelligence under which all agents ( / Turing machines / physical structures / toy models) can be partially ordered. I'm yet to see a sufficiently convincing structure-independent goal-independent world-independent formal definition of intelligence.* I'd be keen to know if anyone has one. If not, maybe there should be more focus on defining this rigorously rather than relying on intuitions regarding its existence. Especially when discussing philosophical examples that are designed to cause a lot of intuitions to break down.

 

*[Structure-independent = ordering should not be humans grading other machines on "whether this machine's physical structure or reasoning looks like mine".

Goal-independent = ordering should not be humans grading other machines on "whether this machine is optimised for the problems that I care about" or some other narrow set of problems.

World-independent = ordering should not strongly depend on capabilities as measured in "worlds that I happen to belong to". I = Human.

Apologies if all this terminology has already been defined differently, I'm somewhat new to this topic.]

comment by Wei_Dai · 2021-11-30T03:27:07.416Z · LW(p) · GW(p)

This seems like a good overview of UDASSA and its problems. One consideration you didn't touch on is that the universal distribution is in some sense a good approximation [LW · GW] of any computable distribution. (Apparently that's what the "universal" in UD means, as opposed to meaning that it's based on a universal Turing machine.) So an alternative way to look at UD is we can use it as a temporary stand-in, until we figure out what the actually right prior is, or what the real distribution of "reality-fluid" is, or how we should really distribute our "care" over the infinite number of "individuals" in the multiverse. This is how I'm mostly viewing UDASSA now (but haven't really talked about it except in scattered comments).

That UDASSA probably isn't the final right answer to anthropics, along with the opportunity cost involved in investigating any object-level philosophical problem (cf https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy [LW · GW]) and the slow progress of investigation (where applying effort at any current margin seems to only cause a net increase in open problems), I think explains a lot of why there's not much research/writings about UDASSA.

comment by Charlie Steiner · 2021-12-11T23:17:00.354Z · LW(p) · GW(p)

I like this article, but I think you are trying to hold on to a physics-based interpretation even when it makes less sense.

I think the zenith of this was when you say something like "I'm much more confident in the existence of an external world than I am that an external world is used in the simplest explanation of my sense data." To you, the method by which you get legitimate information about the world is disconnected from Solomonoff induction to a potentially arbitrary degree, and we need to make sure that Solomonoff induction is "kept on the straight and narrow" of only considering physical hypotheses. But the more "Solomonoff native" perspective is that sometimes you'll consider non-physical hypotheses, and that's totally okay, and if there is an external world out there you'll probably pick up on the fact pretty quickly, but even if there is an external world and there's also some other simpler hypothesis that explains what you know better, such that you don't end up believing in the external world, that's actually okay and not some sort of failure that must be avoided at any cost.

I think whether you're not a quark is intimately tied up with whether an external world is a good explanation of your sense data. So it makes sense that if one is unclear the other is unclear.

The same perspective issues color the discussions of infinities and soul magnetism. You start by considering a fixed physical world and then ask which copy within that world you expect to be. But you never consider a fixed set of memories and feelings and then ask what physical world you expect to be around you. What would SIA or SSA say about this latter case - what's the problem here?

comment by avturchin · 2021-11-29T13:22:31.453Z · LW(p) · GW(p)

Mueller, in his article "Law without law: from observer states to physics via algorithmic information theory", suggested using Solomonoff induction to go directly from one observer-state to another.

Thus he probably escapes the "world and claws" problem, but ends up with a variant of Egan's dust theory in a mathematical world.