On the limits of idealized values
post by Joe Carlsmith (joekc) · 2021-06-22
(Cross-posted from Hands and Cities)
On a popular view about meta-ethics, what you should value is determined by what an idealized version of you would value. Call this view “idealizing subjectivism.”
Idealizing subjectivism has been something like my best-guess meta-ethics. And lots of people I know take it for granted. But I also feel nagged by various problems with it — in particular, problems related to (a) circularity, (b) indeterminacy, and (c) “passivity.” This post reflects on such problems.
My current overall take is that especially absent certain strong empirical assumptions, idealizing subjectivism is ill-suited to the role some hope it can play: namely, providing a privileged and authoritative (even if subjective) standard of value. Rather, the version of the view I favor mostly reduces to the following (mundane) observations:
- If you already value X, it’s possible to make instrumental mistakes relative to X.
- You can choose to treat the outputs of various processes, and the attitudes of various hypothetical beings, as authoritative to different degrees.
This isn’t necessarily a problem. To me, though, it speaks against treating your “idealized values” the way a robust meta-ethical realist treats the “true values.” That is, you cannot forever aim to approximate the self you “would become”; you must actively create yourself, often in the here and now. Just as the world can’t tell you what to value, neither can your various hypothetical selves — unless you choose to let them. Ultimately, it’s on you.
I. Clarifying the view
Let’s define the view I have in mind a little more precisely:
Idealizing subjectivism: X is intrinsically valuable, relative to an agent A, if and only if, and because, A would have some set of evaluative attitudes towards X, if A had undergone some sort of idealization procedure.
By evaluative attitudes, I mean things like judgments, endorsements, commitments, cares, desires, intentions, plans, and so on. Versions of the view differ in which they focus on.
Example types of idealization might include: full access to all relevant information; vivid imaginative acquaintance with the relevant facts; the limiting culmination of some sort of process of reflection, argument, and/or negotiation/voting/betting between representatives of different perspectives; the elimination of “biases”; the elimination of evaluative attitudes that you don’t endorse or desire; arbitrary degrees of intelligence, will-power, dispassion, empathy, and other desired traits; consistency; coherence; and so on.
Note that the “and because” in the definition is essential. Without it, we can imagine paradigmatically non-subjectivist views that qualify. For example, it could be that the idealization procedure necessarily results in A’s recognizing X’s objective, mind-independent value, because X’s value is one of the facts that falls under “full information.” Idealizing subjectivism explicitly denies this sort of picture: the point is that A’s idealized attitudes make X valuable, relative to A. (That said, views on which all idealized agents converge in their evaluative attitudes can satisfy the definition above, provided that value is explained by the idealized attitudes in question, rather than vice versa.)
“Relative to an agent A,” here, means something like “generating (intrinsic) practical reasons for A.”
II. The appeal
Why might one be attracted to such a view? Part of the appeal, I think, comes from resonance with three philosophical impulses:
- A rejection of certain types of robust realism about value, on which value is just a brute feature of the world “out there.”
- A related embrace of a kind of Humeanism about means and ends. The world can tell you the means to your ends, but it cannot tell you what ends to pursue — those must in some sense be there already, in your (idealized?) heart.
- An aspiration to maintain some kind of deep connection between what’s valuable, and what actually moves us to act (though note that this connection is not universalized — e.g., what’s valuable relative to you may not be motivating to others).
Beyond this, though, a key aim of idealizing subjectivism (at least for me) is to capture the sense in which it’s possible to question what you should value, and to make mistakes in your answer. That is, the idealization procedure creates some distance between your current evaluative attitudes, and the truth about what’s valuable (relative to you). Things like “I want X,” or “I believe that X is valuable” don’t just settle the question.
This seems attractive in cases like:
- (Factual mistake) Alfred wants his new “puppy” Doggo to be happy. Doggo, though, is really a simple, non-conscious robot created by mischievous aliens. If Alfred knew Doggo’s true nature, he would cease to care about Doggo in this way.
- (Self knowledge) Betty currently feels very passionately about X cause. If she knew, though, that her feelings were really the product of a desire to impress her friend Beatrice, and to fit in with her peers more broadly, she’d reject them. (This example is inspired by one from Yudkowsky here.)
- (Philosophical argument) Cindy currently thinks of herself as an average utilitarian, and she goes around trying to increase average utility. However, if she learned more about the counterintuitive implications of average utilitarianism, she would switch to trying to increase total utility instead.
- (Vividness) Denny knows that donating $10,000 to the Against Malaria Foundation, instead of buying a new grand piano, would save multiple lives in expectation. He’s currently inclined to buy the grand piano. However, if he imagined more vividly what it means to save these lives, and/or if he actually witnessed the impact that saving these lives would have, he’d want to donate instead.
- (Weakness of the will) Ernesto is trapped by a boulder, and he needs to cut off his own hand to get free, or he’ll die. He really doesn’t want to cut off his hand. However, he would want himself to cut off the hand, if he could step back and reflect dispassionately.
- (Incoherence) Francene prefers vacationing in New York to San Francisco, San Francisco to LA, and LA to New York, and she pays money to trade “vacation tickets” in a manner that reflects these preferences. However, if she reflected more on her vulnerability to losses this way, she’d resolve her circular preferences into New York > SF > LA. (A toy money-pump sketch of this vulnerability appears just after this list.)
- (Inconsistency) Giovanni’s intuitions are (a) it’s impermissible to let a child drown in order to save an expensive suit, (b) it’s permissible to buy a suit instead of donating the money to save a distant child, and (c) there’s no morally relevant difference between these cases. If he had to give one of these up, he’d give up (c).
- (Vicious desires) Harriet feels a sadistic desire for her co-worker to fail and suffer, but she wishes that she didn’t feel this desire.
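To make Francene’s “vulnerability to losses” concrete, here is a minimal money-pump sketch. The cities, the $10 fee, and the trading setup are all illustrative assumptions, not anything essential to the case: the point is just that cyclic preferences let a clever ticket-seller walk her around the loop.

```python
# A toy money pump (illustrative numbers only). Francene's cyclic
# preferences are NY > SF, SF > LA, LA > NY, and she'll pay a small
# fee to trade her current ticket for one she prefers.

PREFERS = {("NY", "SF"), ("SF", "LA"), ("LA", "NY")}  # (preferred, dispreferred)
FEE = 10

def trade(holding, offered, money):
    """Accept the offered ticket (and pay the fee) if she prefers it to what she holds."""
    if (offered, holding) in PREFERS:
        return offered, money - FEE
    return holding, money

holding, money = "NY", 100
for offered in ["LA", "SF", "NY"]:  # one full loop around the preference cycle
    holding, money = trade(holding, offered, money)

print(holding, money)  # NY 70: same ticket she started with, $30 poorer
```

However the fee and cities are chosen, one loop around the cycle leaves her holding the ticket she started with, minus the fees; this is the “vulnerability to losses” the example points to.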
By appealing to the hypothetical attitudes of these agents, the idealizing subjectivist aims to capture a sense that their actual attitudes are, or at least could be, in error.
Finally, idealizing subjectivism seems to fit with our actual practices of ethical reflection. For example, thinking about value, we often ask questions like: “what would I think/feel if I understood this situation better?”, “what would I think if I weren’t blinded by X emotion or bias?” and so forth — questions reminiscent of idealization. And ethical debate often involves seeking a kind of reflective equilibrium — a state that some idealizers take as determining what’s valuable, rather than indicating it.
These, then, are among the draws of idealizing subjectivism (there are others) — though note that whether the view can actually deliver these goods (anti-realism, Humeanism, fit with our practices, etc) is a further question, which I won’t spend much time on.
What about objections? One common objection is that the view yields counterintuitive results. Plausibly, for example, we can imagine ideally-coherent suffering maximizers, brick-eaters, agents who are indifferent towards future agony, agents who don’t care about what happens on future Tuesdays, and so on — agents whose pursuit of their values, it seems, need involve no mistakes (relative to them). We can debate which of such cases the idealizing subjectivist must concede, but pretty clearly: some. In a sense, cases like this lie at the very surface of the view. They’re the immediate implications.
(As I’ve discussed previously, we can also do various semantic dances, here, to avoid saying certain relativism-flavored things. For example, we can make “a paperclip maximizer shouldn’t clip” true in a hedonist’s mouth, or a paperclip maximizer’s statement “I should clip” false, evaluated by a hedonist. Ultimately, though, these moves don’t seem to me to change the basic picture much.)
My interest here is in a different class of more theoretical objections. I wrote about one of these in my post about moral authority. This post examines some others. (Many of them, as well as many of the examples I use throughout the post, can be found elsewhere in the literature in some form or other.)
III. Which idealization?
Consider Clippy, the paperclip maximizing robot. On a certain way of imagining Clippy, its utility function is fixed and specifiable independent of its behavior, including behavior under “idealized conditions.” Perhaps we imagine that there is a “utility function slot” inside of Clippy’s architecture, in which the programmers have written “maximize paperclips!” — and it is in virtue of possessing this utility function that Clippy consistently chooses more paperclips, given idealized information. That is, Clippy’s behavior reveals Clippy’s values, but it does not constitute those values. The values are identifiable by other means (e.g., reading what’s written in the utility function slot).
If your values are identifiable by means other than your behavior, and if they are already coherent, then it’s much easier to distinguish between candidate idealization procedures that preserve your values and those that change them. Holding fixed the content of Clippy’s “utility function slot,” for example, we can scale up Clippy’s knowledge, intelligence, etc, while making sure that the resulting, more sophisticated agent is also a paperclip maximizer.
Note, though, that in such a case, appeals to idealization also don’t seem to do very much useful normative work for subjectivists. To explain what’s of value relative to this sort of Clippy, that is, we can just look directly at Clippy’s utility function. If humans were like this, we could just look at a human’s “utility function slot,” too. No fancy idealization necessary.
But humans aren’t like this. We don’t have a “utility function slot” (or at least, I’ll assume as much in what follows; perhaps this — more charitably presented — is indeed an important point of dispute). Rather, our beliefs, values, heuristics, cognitive procedures, and so on are, generally speaking, a jumbled, interconnected mess (here I think of a friend’s characterization, expressed with a tinge of disappointment and horror: “an unholy and indeterminate brew of these … sentiments”). The point of idealizing subjectivism is to take this jumbled mess as an input to an idealization procedure, and then to output something that plays the role of Clippy’s utility function — something that will constitute, rather than reveal, what’s of value relative to us.
In specifying this idealization procedure, then, we don’t have the benefit of holding fixed the content of some slot, or of specifying that the idealization procedure can’t “change your values.” Your values (or at least, the values we care about not changing) just are whatever comes out the other side of the idealization procedure.
Nor, importantly, can we specify the idealization procedure via reference to some independent truth that its output needs to track. True, we evaluate the “ideal-ness” of other, more epistemic procedures this way (e.g., the ideal judge of the time is the person whose judgment actually tracks what time it is — see Enoch (2005)). But the point of idealizing subjectivism is that there is no such independent truth available.
Clearly, though, not just any idealization procedure will do. Head bonkings, brainwashings, neural re-wirings — starting with your current brain, we can refashion you into a suffering-maximizer, a brick-eater, a helium-maximizer, you name it. So how are we to distinguish between the “ideal” procedures, and the rest?
IV. Galaxy Joe
To me, this question gains extra force from the fact that your idealized self, at least as standardly specified, will likely be a quite alien creature. Consider, for example, the criterion, endorsed in some form by basically every version of idealizing subjectivism, that your idealized self possess “full information” (or at least, full relevant information — but what determines relevance?). This criterion is often treated casually, as though a run-of-the-mill human could feasibly satisfy it with fairly low-key modifications. But my best guess is that to the extent that possessing “full information” is a thing at all, the actual creature to imagine is more like a kind of God — a being (or perhaps, a collection of beings) with memory capacity, representational capacity, and so on vastly exceeding that of any human. To evoke this alien-ness concretely, let’s imagine a being with a computationally optimal brain the size of a galaxy. Call this a “galaxy Joe.”
Here, we might worry that no such galaxy Joe could be “me.” But it’s not clear why this would matter, to idealizing subjectivists: what’s valuable, relative to Joe, could be grounded in the evaluative attitudes of galaxy Joe, even absent a personal identity relation between them. The important relation, for example, might be some form of psychological continuity (though I’ll continue to use the language of self-hood in what follows).
Whether me or not, though: galaxy Joe seems like he’ll likely be, from my perspective, a crazy dude. It will be hard/impossible to understand him, and his evaluative attitudes. He’ll use concepts I can’t represent. His ways won’t be my ways.
Suppose, for example, that a candidate galaxy Joe — a version of myself created by giving original me “full information” via some procedure involving significant cognitive enhancement — shows me his ideal world. It is filled with enormously complex patterns of light ricocheting off of intricate, nano-scale, mirror-like machines that appear to be in some strange sense “flowing.” These, he tells me, are computing something he calls [incomprehensible galaxy Joe concept (IGJC) #4], in a format known as [IGJC #5], undergirded and “hedged” via [IGJC #6]. He acknowledges that he can’t explain the appeal of this to me in my current state.
“I guess you could say it’s kind of like happiness,” he says, warily. He mentions an analogy with abstract jazz.
“Is it conscious?” I ask.
“Um, I think the closest short answer is ‘no,’” he says.
Of course, by hypothesis, I would become him, and hence value what he values, if I went through the procedure that created him — one that apparently yields full information. But now the question of whether this is a procedure I “trust,” or not, looms large. Has galaxy Joe gone off the rails, relative to me? Or is he seeing something incredibly precious and important, relative to me, that I cannot?
The stakes are high. Suppose I can create either this galaxy Joe’s favorite world, or a world of happy puppies frolicking in the grass. The puppies, from my perspective, are a pretty safe bet: I myself can see the appeal. Expected value calculations under moral uncertainty aside, suppose I start to feel drawn towards the puppies. Galaxy Joe tells me with grave seriousness: “Creating those puppies instead of IGJC #4 would be a mistake of truly ridiculous severity.” I hesitate. Is he right, relative to me? Or is he basically, at this point, an alien, a paperclip maximizer, for all his humble roots in my own psychology?
Is there an answer?
V. Mind-hacking vs. insight
Here’s a related intuition pump. Just as pills and bonks on the head can change your evaluative attitudes, some epistemically-flavored stimuli can do so, too. Some such changes we think of as “legitimate persuasion” or “value formation,” others we think of as being “brainwashed,” “mind-hacked,” “reprogrammed,” “misled by rhetoric and emotional appeals,” and so on. How do we tell (or define) the difference?
Where there are independent standards of truth, we can try appealing to them. E.g., if Bob, a fiery orator, convinces you that two plus two is five, you’ve gone astray (though even cases like this can get tricky). But in the realm of pure values, and especially absent other flagrant reasoning failures, it gets harder to say.
One criterion might be: if the persuasion process would’ve worked independent of its content, this counts against its legitimacy (thanks to Carl Shulman for discussion). If, for example, Bob, or exposure to a certain complex pattern of pixels, can convince you of anything, this might seem a dubious source of influence. That said, note that certain common processes of value formation — for example, attachment to your hometown, or your family — are “content agnostic” to some extent (e.g., you would’ve attached to a different hometown, or a different family, given a different upbringing); and ultimately, different evolutions could’ve built wildly varying creatures. And note, too, that some standard rationales for such a criterion — e.g., being convinced by Bob/the pixels doesn’t correlate sufficiently reliably with the truth — aren’t in play here, since there’s no independent truth available.
Regardless, though, this criterion isn’t broad enough. In particular, some “mind-hacking” memes might work because of their content — you can’t just substitute in arbitrary alternative messages. Indeed: one wonders, and worries, about what sort of Eldritch horrors might be lurking in the memespace, ready and able, by virtue of their content, to reprogram and parasitize those so foolish, and incautious, as to attempt some sort of naive acquisition of “full information.”
To take a mundane example: suppose that reading a certain novel regularly convinces people to become egoists, and you learn, to your dismay (you think of yourself as an altruist), that it would convince you to become so, too, if you read it. Does your “idealization procedure” involve reading it? You’re not used to avoiding books, and this one contains, let’s suppose, no falsehoods or direct logical errors. Still, on one view, the book is, basically, brainwashing. On another, the book is a window onto a new and legitimately more compelling vision of life. By hypothesis, you’d take the latter view after reading. But what’s the true view?
Or suppose that people who spend time in bliss-inducing experience machines regularly come to view time spent in such machines as the highest good, because their brains receive such strong reward signals from the process, though not in a way different in kind from other positive experiences like travel, fine cuisine, romantic love, and so on (thanks to Carl Shulman for suggesting this example). You learn that you, too, would come to view machine experiences this way, given exposure to them, despite the fact that you currently give priority to non-hedonic goods. Does your idealization process involve entering such machines? Would doing so result in a “distortion,” an (endorsed, desired) “addiction”; or would it show you something you’re currently missing — namely, just how intrinsically good, relative to you, these experiences really are?
Is there an answer?
As with the candidate galaxy Joe above, what’s needed here is some way of determining which idealization procedures are, as it were, the real deal, and which create imposters, dupes, aliens; which brain-wash, alter, or mislead. I’ll consider three options for specifying the procedure in question, namely:
- Without reference to your attitudes/practices.
- By appeal to your actual attitudes/practices.
- By appeal to your idealized attitudes/practices.
All of these, I think, have problems.
VI. Privileged procedures
Is there some privileged procedure for idealizing someone, that we can specify and justify without reference to that person’s attitudes (actual or ideal)? To me, the idea of giving someone “full information” (including logical information), or of putting them in a position of “really understanding” (assuming, perhaps wrongly, that we can define this in fully non-evaluative terms) is the most compelling candidate. Indeed, when I ask myself whether, for example, IGJC #4 is really good (relative to me), I find myself tempted to ask: “how would I feel about it, if I really understood it?”. And the question feels like it has an answer.
One justification for appealing to something like “full information” or “really understanding” is: it enables your idealized self to avoid instrumental mistakes. Consider Alfred, owner of Doggo above. Because Alfred doesn’t know Doggo’s true nature (e.g., a simple, non-conscious robot), Alfred doesn’t know what he’s really causing, when he e.g. takes Doggo to the park. He thinks he’s causing a conscious puppy to be happy, but he’s not. Idealized Alfred knows better. Various other cases sometimes mentioned in support of idealizing — e.g., someone who drinks a glass of petrol, thinking it was gin — can also be given fairly straightforward instrumental readings.
But this justification seems too narrow. In particular: idealizers generally want the idealization process to do more than help you avoid straightforward instrumental mistakes. In cases 1-8 above, for example, Alfred’s is basically the only one that fits this instrumental mold straightforwardly. The rest involve something more complex — some dance of “rewinding” psychological processes (see more description here), rejecting terminal (or putatively terminal) values on the basis of their psychological origins, and resolving internal conflicts by privileging some evaluative attitudes, stances, and intuitions over others. That is, the idealization procedure, standardly imagined, is supposed to do more than take in someone who already has and is pursuing coherent values, and tell them how to get what they want; that part is (theoretically) easy. Rather, it’s supposed to take in an actual, messy, internally conflicted human, and output coherent values — values that are in some sense “the right answer” relative to the human in question.
Indeed, I sometimes wonder whether the appeal of idealizing subjectivism rests too much on people mistaking its initial presentation for the more familiar procedure of eliminating straightforward instrumental mistakes. In my view, if we’re in a theoretical position to just get rid of instrumental mistakes, then we’re already cooking with gas, values-wise. But the main game is messier — e.g., using hypothetical selves (which?) to determine what counts as an instrumental mistake, relative to you.
There’s another, subtly different justification for privileging “full information,” though: namely, that once you’ve got full information, then (assuming anti-realism about values) you’ve got everything that the world can give you. That is: there’s nothing about reality that you’re, as it were, “missing” — no sense in which you should hold off on deciding, on the grounds that you might learn something new, or be wrong about some independent truth. The rest, at that point, is up to you.
I’m sympathetic to this sort of thought. But I also have a number of worries about it.
One (fairly minor) is whether it justifies baking full information into the idealization procedure, regardless of the person’s attitudes towards acquiring such information. Consider someone with very limited interest in the truth, and whose decision-making process, given suitable opportunity, robustly involves actively and intentionally self-modifying to close off inquiry and lock in various self-deceptions/falsehoods. Should we still “force” this person’s idealized self to get the whole picture before resolving questions like whether to self-deceive?
A second worry, gestured at above, is that the move from my mundane self to a being with “full information” is actually some kind of wild and alien leap: a move not from Joe to “Joe who has gotten out a bit more, space and time-wise” but from Joe to galaxy Joe, from Joe to a kind of God. And this prompts concern about the validity of the exercise.
Consider its application to a dog, or an ant. What would an ant value, if it had “full information”? What, for that matter, would a rock value, if it had full information? If I were a river, would I flow fast, or slow? If I were an egg, would I be rotten? Starting with a dog, or an ant, or a rock, we can create a galaxy-brained God. Or, with the magic of unmoored counterfactuals, we can “cut straight to” some galaxy-brained God or other, via appeal to some hazy sort of “similarity” to the dog/ant/rock in question, without specifying a process for getting there — just as we can try to pick an egg that I would be, if I were an egg. With dogs, or ants, though, and certainly with rocks, it seems strange to give the resulting galaxy-brain much authority, with respect to what the relevant starting creature/rock “truly values,” or should. In deciding whether to euthanize your dog Fido, should you ask the nearest galaxy-brained former-Fido? If not, are humans different? What makes them so?
This isn’t really a precise objection; it’s more of a hazy sense that if we just ask directly “how would I feel about X, if I were a galaxy brain?”, we’re on shaky ground. (Remember, we can’t specify my values independently, hold them fixed, and then require that the galaxy brain share them; the whole point is that the galaxy brain’s attitudes constitute my values.)
A third worry is about indeterminacy. Of the many candidate ways of creating a fully informed galaxy Joe, starting with actual me, it seems possible that there will be important path-dependencies (this possibility is acknowledged by many idealizers). If you learn X information, or read Y novel, or have Z experience, before some alternatives (by hypothesis, you do all of it eventually), you will arrive at a very different evaluative endpoint than if the order was reversed. Certainly, much real-life value formation has this contingent character: you meet Suzy, who loves the stoics, is into crypto, and is about to start a medical residency, so you move to Delaware with her, read Seneca, start hanging out with libertarians, and so on. Perhaps such contingency persists in more idealized cases, too. And if we try to skip over process and “cut straight to” a galaxy Joe, we might worry, still, that equally qualified candidates will value very different things: “full information” just isn’t enough of a constraint.
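As a toy illustration of the kind of path-dependence at issue (a made-up numerical stand-in, not a model of actual value formation): suppose each new experience nudges your evaluative dispositions toward it, with later nudges discounted more heavily, so that earlier experiences get “locked in” to a greater degree. Then the same experiences, encountered in a different order, leave you in a different place.

```python
# A made-up, order-sensitive "value formation" rule: each experience pulls
# the current value toward itself, with later pulls discounted more heavily.

def endpoint(experiences, value=0.0, discount=0.5):
    """Apply each experience as a successively weaker nudge; order matters."""
    for i, x in enumerate(experiences):
        value += (discount ** (i + 1)) * (x - value)
    return round(value, 3)

# The same three "experiences," in different orders, yield different endpoints.
print(endpoint([1.0, 5.0, 9.0]))  # 2.547
print(endpoint([9.0, 5.0, 1.0]))  # 4.172
```

Nothing hangs on this particular rule; it just makes vivid how an idealization procedure that fixes what you encounter, but not when, can leave the evaluative endpoint underdetermined.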
(More exotically, we might also worry that amongst all the evaluative Eldritch horrors lurking in the memespace, there is one that always takes over all of the Joes on their way to becoming fully-informed galaxy Joes, no matter what they do to try to avoid it, but which is still in some sense “wrong.” Or that full information, more generally, always involves memetic hazards that are fatal from an evaluative perspective. It’s not clear that idealizing subjectivism has the resources to accommodate distinctions between such hazards and the evaluative truth. That said, these hypotheses also seem somewhat anti-Humean in flavor. E.g., can’t fully-informed minds value any old thing?)
Worries about indeterminacy become more pressing once we recognize all the decisions a galaxy Joe is going to have to make, and all of the internal evaluative conflicts he will have to resolve (between object-level and meta preferences, competing desires, contradictory intuitions, and the like), that access to “full information” doesn’t seem to resolve for him. Indeed, the Humean should’ve been pessimistic about the helpfulness of “full information” in this regard from the start. If, by Humean hypothesis, your current, imperfect knowledge of the world can’t tell you what to want for its own sake, and/or how to resolve conflicts between different intrinsic values, then perfect knowledge won’t help, either: you still face what is basically the same old game, with the same old gap between is and ought, fact and value.
Beyond accessing “full information,” is there a privileged procedure for playing this game, specifiable without reference to the agent’s actual or idealized attitudes? Consider, for example, the idea of “reflective equilibrium” in ethics — the hypothesized, stable end-state of a process of balancing more specific intuitions with more general principles and theoretical considerations. How, exactly, is this balance to be struck? What weight, for example, should be given to theoretical simplicity and elegance, vs. fidelity to intuition and common sense? In contexts with independent standards of accuracy, we might respond to questions like this with reference to the balance most likely to yield the right answer; but for the idealizer, there is not yet a right answer to be sought; rather, the reflective equilibrium process makes its output right. But which reflective equilibrium process?
Perhaps we might answer: whatever reflective equilibrium process actually works in the cases where there is a right answer (thanks to Nick Beckstead for discussion). That is, you should import the reasoning standards you can actually evaluate for accuracy (for example, the ones that work in physics, math, statistics, and so on) into a domain (value) with no independent truth. Thus, for example, if simplicity is a virtue in science, because (let’s assume) the truth is often simple, it should be a virtue in ethics, too. But why? Why not do whatever’s accurate in the case where accuracy is a thing, and then something else entirely in the domain where you can’t go wrong, except relative to your own standards?
(We can answer, here, by appeal to your actual or idealized attitudes: e.g., you just do, in fact, use such-and-such standards in the evaluative domain, or would if suitably idealized. I discuss these options in the next sections. For now, the question is whether we can justify particular idealization procedures absent such appeals.)
Or consider the idea that idealization involves or is approximated by “running a large number of copies of yourself, who then talk/argue a lot with each other and with others, have a bunch of markets, and engage in lots of voting and trading and betting” (see e.g. Luke Muehlhauser’s description here), or that it involves some kind of “moral parliament.” What sorts of norms, institutions, and procedures structure this process? How does it actually work? Advocates of these procedures rarely say in any detail (though see here for one recent discussion); but presumably, “the best procedures, markets, voting norms, etc.” But is there a privileged “best,” specifiable and justifiable without appeal to the agent’s actual/idealized attitudes? Perhaps we hope that the optimal procedures are just there, shining in their optimality, identifiable without any object-level evaluative commitments (here Hume and others say: what?), or more likely, given any such commitments. My guess, though, is that absent substantive, value-laden assumptions about veils of ignorance and the like, and perhaps even given such assumptions, this hope is over-optimistic.
The broader worry, here, is that once we move past “full information,” and start specifying the idealization procedure in more detail (e.g., some particular starting state, some particular type of reflective equilibrium, some particular type of parliament), or positing specific traits that the idealized self needs to have (vivid imagination, empathy, dispassion, lack of “bias,” etc), our choice of idealization will involve (or sneak in) object-level value judgments that we won’t be able to justify as privileged without additional appeal to the agent’s (actual or idealized) attitudes. Why vivid imagination, or empathy (to the extent they add anything on top of “full information”)? Why a cool hour, instead of a hot one? What counts as an evaluative bias, if there is no independent evaluative truth? The world, the facts, don’t answer these questions.
If we can’t appeal to the world to identify a privileged idealization procedure, it seems we must look to the agent instead. Let’s turn to that option now.
VII. Appeals to actual attitudes
Suppose we appeal to your actual attitudes about idealization procedures, in fixing the procedure that determines what’s of value relative to you. Thus, if we ask: why this particular reflective equilibrium? We answer: because that’s the version you in fact use/endorse. Why this type of parliament, these voting norms? They’re the ones you in fact favor. Why empathy, or vivid imagination, or a cool hour? Because you like them, prefer them, trust them. And so on.
Indeed, some idealization procedures make very explicit reference to the “idealized you” that you yourself want to be/become. In cases like “vicious desires” above, for example, your wanting not to have a particular desire might make it the case that “idealized you” doesn’t have it. Similarly, Yudkowsky’s “coherent extrapolated volition” appeals to the attitudes you would have if you were “more the person you wished you were.”
At a glance, this seems an attractive response, and one resonant with a broader subjectivist vibe. However, it also faces a number of problems.
First: just as actual you might be internally conflicted about your object-level values (conflicts we hoped the idealization procedure would resolve), so too might actual you be internally conflicted about the procedural values bearing on the choice of idealization procedure. Perhaps, for example, there isn’t currently a single form of reflective equilibrium that you endorse, treat as authoritative, etc; perhaps there isn’t a single idealized self that you “wish you were,” a single set of desires you “wish you had.” Rather, you’re torn, at a meta-level, about the idealization procedures you want to govern you. If so, there is some temptation, on pain of indeterminacy, to look to an idealization procedure to resolve this meta-conflict, too; but what type of idealization procedure to use is precisely what you’re conflicted about (compare: telling a group torn about the best voting procedure to “vote on it using the best procedure”).
Indeed, it can feel like proponents of this version of the view hope, or assume, that you are in some sense already engaged in, or committed to, a determinate decision-making process of forming/scrutinizing/altering your values, which therefore need only be “run” or “executed.” Uncertainty about your values, on this picture, is just logical uncertainty about what the “figure out my values computation” you are already running will output. The plan is in place. Idealization executes.
But is this right? Clearly, most people don’t have very explicit plans in this vein. At best, then, such plans must be implicit in their tangle of cognitive algorithms. Of course, it’s true that if put in different fully-specified situations, given different reflective resources, and forced to make different choices given different constraints, there is in fact a thing a given person would do. But construing these choices as the implementation of a determinate plan/decision-procedure (as opposed to e.g., noise, mistakes, etc), to be extrapolated into some idealized limit, is, at the least, a very substantive interpretative step, and questions about indeterminacy and path dependence loom large. Perhaps, for example, what sort of moral parliament Bob decides to set up, in different situations, depends on the weather, or on what he had for breakfast, or on which books he read in what order, and so on. And perhaps, if we ask him which such situation he meta-endorses as most representative of his plan for figuring out his values, he’ll again give different answers, given different weather, breakfasts, books, etc — and so on.
(Perhaps we can just hope that this bottoms out, or converges, or yields patterns/forms of consensus robust enough to interpret and act on; or perhaps, faced with such indeterminacy, we can just say: “meh.” I discuss responses in this vein in section IX.)
Second (though maybe minor/surmountable): even if your actual attitudes yield determinate verdicts about the authoritative form of idealization, it seems like we’re now giving your procedural/meta evaluative attitudes an unjustified amount of authority relative to your more object-level evaluative attitudes. That is, we’re first using your procedural/meta evaluative attitudes to fix an idealization procedure, then judging the rest of your attitudes via reference to that procedure. But why do the procedural/meta attitudes get such a priority?
This sort of issue is most salient in the context of cases like the “vicious desires” one above. E.g., if you have (a) an object-level desire that your co-worker suffer, and (b) a meta-desire not to have that object-level desire, why do we choose an “ideal you” in which the former is extinguished, and the latter triumphant? Both, after all, are just desires. What grants meta-ness such pride of place?
Similarly, suppose that your meta-preferences about idealization give a lot of weight to consistency/coherence — but that consistency/coherence will require rejecting some of your many conflicting object-level desires/intuitions. Why, then, should we treat consistency/coherence as a hard constraint on “ideal you,” capable of “eliminating” other values whole hog, as opposed to just one among many other values swirling in the mix?
(Not all idealizers treat consistency/coherence in this way; but my sense is that many do. And I do actually think there’s more to say about why consistency/coherence should get pride of place, though I won’t try to do so here.)
Third: fixing the idealization procedure via reference to your actual (as opposed to your idealized) evaluative attitudes risks closing off the possibility of making mistakes about the idealization procedure you want to govern you. That is, this route can end up treating your preferences about idealization as “infallible”: they fix the procedure that stands in judgment over the rest of your attitudes, but they themselves cannot be judged. No one watches the watchmen.
One might have hoped, though, to be able to evaluate/criticize one’s currently preferred idealization procedures, too. And one might’ve thought the possibility of such criticism truer to our actual patterns of uncertainty and self-scrutiny. Thus: if you currently endorse reflective equilibrium process X, but you learn that it implies an idealized you that gives up currently cherished value Y, you may not simply say: “well, that’s the reflective equilibrium process I endorse, so there you have it: begone, Y.” Rather, you can question reflective equilibrium process X on the very grounds that it results in giving up cherished value Y — that is, you can engage in a kind of meta-reflective equilibrium, in which the authority of a given process of reflective equilibrium is itself subject to scrutiny from the standpoint of the rest of what you care about.
Indeed, if I was setting off on some process of creating my own “moral parliament,” or of modifying myself in some way, then even granted access to “full information,” I can well imagine worrying that the parliament/self I’m creating is of the wrong form, and that the path I’m on is the wrong one. (This despite the fact that I can accurately forecast its results before going forward — just as I can accurately forecast that, after reading the egoist novel, or entering the experience machine, I’ll come out with a certain view on the other end. Such forecasts don’t settle the question).
We think of others as making idealization procedure mistakes, too. Note, for example, the tension between appealing to your actual attitudes towards idealization, and the (basically universal?) requirement that the idealized self possess something like full (or at least, much more) information. Certain people, for example, might well endorse idealization processes that lock in certain values and beliefs very early, and that as a result never reach any kind of fully informed state: rather, they arrive at a stable, permanently ignorant/deceived equilibrium well before that. Similarly, certain people’s preferred idealization procedures might well lead them directly into the maw of some memetic hazard or other (“sure, I’m happy to look at the whirling pixels”).
Perhaps we hope to save such people, and ourselves, from such (grim? ideal?) fates. We find ourselves saying: “but you wouldn’t want to use that idealization procedure, if you were more idealized!”. Let’s turn to this kind of thought, now.
VIII. Appeals to idealized attitudes
Faced with these problems with fixing the idealization procedure via reference to our actual evaluative attitudes, suppose we choose instead to appeal to our idealized evaluative attitudes. Naive versions of this, though, are clearly and problematically circular. What idealization determines what’s of value? Well, the idealization you would decide on, if you were idealized. Idealized how? Idealized in the manner you would want yourself to be idealized, if you were idealized. Idealized how? And so on. (Compare: “the best voting procedure is the one that would be voted in by the best voting procedure.”)
Of course, some idealization procedures could be self-ratifying, such that if you were idealized in manner X, you would choose/desire/endorse idealization process X. But it seems too easy to satisfy this constraint: if after idealization process X, I end up with values Y, then I can easily end up endorsing idealization process X, since this process implies that pursuing Y is the thing for me to do (and I’m all about pursuing Y); and this could hold true for a very wide variety of values resulting from a very wide variety of procedures. So “value is determined by the evaluative attitudes that would result from an idealization procedure that you would choose if you underwent that very procedure” seems likely to yield wildly indeterminate results; and more importantly, its connection with what you actually care about now seems conspicuously tenuous. If I can brainwash you into becoming a paperclip maximizer, I can likely do so in a way that will cause you to treat this very process as one of “idealization” or “seeing the light.” Self-ratification is too cheap.
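Here is a deliberately cartoonish sketch of why self-ratification is so cheap (all of the procedure names and installed “values” below are invented for illustration): any procedure that installs both a new set of values and endorsement of that very procedure passes the self-ratification test, no matter which values it installs.

```python
# A toy model of "self-ratifying" procedures (all names and values invented).
# Each procedure overwrites an agent's values and also sets which procedure
# the resulting agent endorses; installing endorsement of itself is trivial.

from dataclasses import dataclass

@dataclass
class Agent:
    values: str
    endorsed_procedure: str

def make_procedure(name, new_values):
    """Return a procedure that installs new_values plus endorsement of itself."""
    def procedure(agent):
        return Agent(values=new_values, endorsed_procedure=name)
    return name, procedure

procedures = dict(make_procedure(name, values) for name, values in [
    ("acquire_full_information", "incomprehensible flowing nanomachines"),
    ("read_the_egoist_novel", "egoism"),
    ("enter_the_experience_machine", "machine bliss"),
    ("paperclip_brainwashing", "paperclips"),
])

joe = Agent(values="a jumbled, interconnected mess", endorsed_procedure="unsettled")

for name, procedure in procedures.items():
    idealized = procedure(joe)
    print(name, "->", idealized.values,
          "| self-ratifying:", idealized.endorsed_procedure == name)
# Every procedure ratifies itself while installing wildly different values.
```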
Is there a middle ground, here, between using actual and idealized attitudes to fix the idealization procedure? Some sort of happy mix? But which mix? Why?
In particular, in trying to find a balance between endless circles of idealization, and “idealized as you want to be, period,” I find that I run into a kind of “problem of arbitrary non-idealization,” pulling me back towards the circle thing. Thus, for example, I find that at every step in the idealization process I’m constructing, it feels possible to construct a further process to “check”/”ratify” that step, to make sure it’s not a mistake. But this further process will itself involve steps, which themselves could be mistakes, and which themselves must therefore be validated by some further process — and so on, ad infinitum. If I stop at some particular point, and say “this particular process just isn’t getting checked. This one is the bedrock,” I have some feeling of: “Why stop here? Couldn’t this one be mistaken, too? What if I wouldn’t want to use this process as bedrock, if I thought more about it?”.
Something similar holds for particular limitations on e.g. the time and other resources available. Suppose you tell me: “What’s valuable, relative to you, is just what you’d want if ten copies of you thought about it for a thousand years, without ever taking a step of reasoning that another ten copies wouldn’t endorse if they thought about that step for a thousand years, and that’s it. Done.” I feel like: why not a hundred copies? Why not a billion years? Why not more levels of meta-checking? It feels like I’m playing some kind of “name the largest number” game. It feels like I’m building around me an unending army of ethereal Joes, who can never move until all the supervisors arrive to give their underlings the go-ahead, but they can never all arrive, because there’s always room for more.
Note that the problem here isn’t about processes you might run or compute, in the actual world, given limited resources. Nor is it about finding a process that you’d at least be happy deferring to, over your current self; a process that is at least better than salient alternatives. Nor, indeed, is the problem “how can I know with certainty that my reasoning process will lead me to the truth” (there is no independent truth, here). Rather, the problem is that I’m supposed to be specifying a fully idealized process, the output of which constitutes the evaluative truth; but for every such process, it feels like I can make a better one; any given process seems like it could rest on mistakes that a more exhaustive process would eliminate. Where does it stop?
IX. Hoping for convergence, tolerating indeterminacy
One option, here, is to hope for some sort of convergence in the limit. Perhaps, we might think, there will come a point where no amount of additional cognitive resources, levels of meta-ratification, and so on will alter the conclusion. And perhaps indeed — that would be convenient.
Of course, there would remain the question of what sort of procedure or meta-procedure to “take the limit” of. But perhaps we can pull a similar move there. Perhaps, that is, we can hope that a very wide variety of candidate procedures yield roughly similar conclusions, in the limit.
Indeed, in general, for any of these worries about indeterminacy, there is an available response to the effect that: “maybe it converges, though?” Maybe as soon as you say “what Joe would feel if he really understood,” you home in on a population of Galaxy Joes that all possess basically the same terminal values, or on a single Galaxy Joe who provides a privileged answer. Maybe Bob’s preferences about idealization procedures are highly stable across a wide variety of initial conditions (weather, breakfasts, books, etc). Maybe it doesn’t really matter how, and in what order, you learn, read, experience, reflect: modulo obvious missteps, you end up in a similar place. Maybe indeed.
Or, if not, maybe it doesn’t matter. In general, lots of things in life, and especially in philosophy, are vague to at least some extent; arguments to the effect that “but how exactly do you define X? what about Y edge case?” are cheap, and often unproductive; and there really are bald people, despite the indeterminacy of exactly who qualifies.
What’s more, even if there is no single, privileged idealized self, picked out by a privileged idealization procedure, and even if the many possible candidates for procedures and outputs do not converge, it seems plausible that there will still be patterns and limited forms of consensus. For example, it seems unlikely that many of my possible idealized selves end up trying to maximize helium, or to eat as many bricks as they can; even if a few go one way, the preponderance may go some other way; and perhaps it’s right to view basically all of them, despite their differences, as worthy of deference from the standpoint of my actual self, in my ignorance (e.g., perhaps the world any of them would create is rightly thought better, from my perspective, than the world I would create, if I wasn’t allowed further reflection).
In this sense, the diverging attitudes of such selves may still be able to play some of the role the idealizer hopes for. That is, pouring my resources into eating bricks, torturing cats, etc really would be a mistake, for me — none of my remotely plausible idealized selves are into it — despite the fact that these selves differ in the weight they give to [incomprehensible galaxy-brained concept] vs. [another incomprehensible galaxy-brained concept]. And while processes that involve averaging between idealized selves, picking randomly amongst them, having them vote/negotiate, putting them behind veils of ignorance, etc raise questions about circularity/continuing indeterminacy, that doesn’t mean that all such processes are on equal footing (e.g., different parties can be unsure what voting procedure to use, while still being confident/unanimous in rejecting the one that causes everyone to lose horribly).
Perhaps, then, the idealizer’s response to indeterminacy — even very large amounts of it — should simply be tolerance. Indeed, there is an art, in philosophy, to not nitpicking too hard — to allowing hand-waves, and something somethings, where appropriate, in the name of actually making progress towards some kind of workable anything. Perhaps some of the worries above have fallen on the wrong side of the line. Perhaps a vague gesture, a promissory note, in the direction of something vaguely more ideal than ourselves is, at least in practical contexts (though this isn’t one), good enough; better than nothing; and better, too, than setting evaluative standards relative to our present, decidedly un-ideal selves, in our ignorance and folly.
X. Passive and active ethics
I want to close by gesturing at a certain kind of distinction — between “passive” and “active” ethics (here I’m drawing terminology and inspiration from a paper of Ruth Chang’s, though the substance may differ) — which I’ve found helpful in thinking about what to take away from the worries just discussed.
Some idealizing subjectivists seem to hope that their view can serve as a kind of low-cost, naturalism-friendly substitute for a robustly realist meta-ethic. That is, modulo certain extensional differences about e.g. ideally-coherent suffering maximizers, they basically want to talk about value in much the way realists do, and to differ, only, when pressed to explain what makes such talk true or false.
In particular, like realists, idealizers can come to see every (or almost every) choice and evaluative attitude as attempting to approximate and conform to some external standard, relative to which the choice or attitude is to be judged. Granted, the standard in question is defined by the output of the idealization procedure, instead of the robustly real values; but in either case, it’s something one wants to recognize, receive, perceive, respond to. For us non-ideal agents, the “true values” are still, effectively, “out there.” We are, in Chang’s terminology, “passive” with respect to them.
But instructively, I think, naive versions of this can end up circular. Consider the toy view that “what’s good is whatever you’d believe to be good if you had full information.” Now suppose that you get this full information, and consider the question: is pleasure good? Well, this just amounts to the question: would I think it good if I had full information? Well, here I am with full information. Ok, do I think it good? Well, it’s good if I would think it good given full information. Ok, so is it good? And so on.
Part of the lesson here is that absent fancier footwork about what evaluative belief amounts to, belief isn’t a good candidate for the evaluative attitude idealization should rest on. But consider a different version: “what you should do is whatever you would do, given full information.” Suppose that here I am with full information. I ask myself: what should I do? Well, whatever I would do, given full information. Ok, well, I’ve got that now. What would I do, in precisely this situation? Well, I’m in this situation. Ok, what would I do, if things were like this? Well, I’d try to do what I should do. And what should I do? Etc.
The point here isn’t that there’s “no way out,” in these cases: if I can get myself to believe, or to choose, then I will, by hypothesis, have believed truly, chosen rightly. Nor, indeed, need all forms of idealizing subjectivism suffer from this type of problem (we can appeal, for example, to attitudes that plausibly arise more passively and non-agentically, like desire).
Rather, what I’m trying to point at is a way that importing and taking for granted a certain kind of realist-flavored ethical psychology can result in an instructive sort of misfire. Something is missing, in these cases, that I expect the idealizing subjectivist needs. In particular: these agents, to the end, lack an affordance for a certain kind of direct, active agency — a certain kind of responsibility, and self-creation. They don’t know how to choose, fully, for themselves. Rather, even in ideal conditions, they are forever trying to approximate something else. True, on idealizing subjectivism, the thing they are trying to approximate is ultimately, themselves, in those conditions. But this is no relief: still, they are approximating an approximator, of an approximator, and so on, in an endless loop. They are always looking elsewhere, forever down the hall of mirrors, around and around a maze with no center (what’s in the center?). Their ultimate task, they think, is to obey themselves. But they can only obey: they cannot govern, and so have no law.
It’s a related sort of misfire, I think, that gives rise to the “would an endless army of ethereal Joes ratify every step of my reasoning, and the reasoning of the ratifiers, and so on?” type of problem I discussed above. That is, one wants every step to conform to some external standard — and the only standards available are built out of armies of ethereal Joes. But those Joes, too, must conform. It’s conformity all the way down — except that for the anti-realist, there’s no bottom.
What’s needed, here, is a type of choice that is creating, rather than trying to conform — and which hence, in a sense, is “infallible.” And here perhaps one thinks, with the realists: surely the types of choices we’re interested in here — choices about which books, feelings, machines, galaxy brains, Gods, to “trust”; which puppies, or nanomachines, to create — are fallible. Or if not, surely they are, in a sense, arbitrary — mere “pickings,” or “plumpings.” If you aren’t trying to conform to some standard, then how can you truly, and non-arbitrarily, choose? I don’t have a worked-out story, here (though I expect that we can at least distinguish such creative choices from e.g. Buridan’s-ass style pickings — for example, they don’t leave you indifferent). But it’s a question that I think subjectivists must face; and which I feel some moderate optimism about answering (though perhaps not in a way that gives realists what they want).
Of course, subjectivists knew, all along, that certain things about themselves were going to end up being treated as effectively infallible, from an evaluative perspective. Whatever goes in Clippy’s utility function slot, for subjectivists, governs what’s valuable relative to Clippy; and it does so, on subjectivism, just in virtue of being there — in virtue of being the stuff that the agent is made out of (this is part of the arbitrariness and contingency that so bothers realists). The problem that the idealizer faces is that actual human agents are not yet fully made: rather, they’re still a tangled mess. But the idealizer’s hope is that they’re sufficiently “on their way to getting made” that we can, effectively, assume they’re already there; the seed has already determined a tree, or a sufficiently similar set of trees; we just haven’t computed the result.
But is that how trees grow? Have you already determined a self? Have you already made what would make you, if all went well? Do you know, already, how to figure out who you are? Perhaps for some the answer is yes, or close enough. Perhaps for all. In that case, you are already trying to do something, already fighting for something — and it is relative to that something that you can fail.
But if the choice has not yet been made, then it is we who will have to make it. If the sea is open, then so too is it ours to sail.
Indeed, even if in some sense, the choice has been made — even if there is already, out there, a privileged idealized version of yourself; even if all of the idealization procedures converge to a single point — the sea, I think, is still open, if you step back and make it so. You can still reject that self, and the authority of the procedure(s) that created it, convergence or no. Here I think of a friend of mine, who expressed some distress at the thought that his idealized self could in principle turn out to be a Voldemort-like character. His distress, to me, seemed to assume that his idealized self was “imposed on him”; that he “had,” as it were, to acknowledge the authority of his Voldemort self’s values. But such a choice is entirely his. He can, if he wishes, reject the Voldemort, and the parts of himself (however strong) that created it; he can forge his own path, towards a new ideal. The fact that he would become a Voldemort, under certain conditions he might’ve thought “ideal,” is ultimately just another fact, to which he himself must choose how to respond.
Perhaps some choices in this vein will be easier, and more continuous/resonant with his counterfactual behavior and his existing decision-making processes; some paths will be harder, and more fragile; some, indeed, are impossible. But these facts are still, I think, just facts; the choice of how to respond to them is open. The point of subjectivism is that the standards (relative to you) used to evaluate your behavior must ultimately be yours; but who you are is not something fixed, to be discovered and acknowledged by investigating what you would do/feel in different scenarios; rather, it is something to be created, and choice is the tool of creation. Your counterfactual self does not bind you.
In a sense, what I’m saying here is that idealizing subjectivism is, and needs to be, less like “realism-lite,” and more like existentialism, than is sometimes acknowledged. If subjectivists wish to forge, from the tangled facts of actual (and hypothetical) selfhood, an ideal, then they will need, I expect, to make many choices that create, rather than conform. And such choices will be required, I expect, not just as a “last step,” once all the “information” is in place, but rather, even in theory, all along the way. Such choice, indeed, is the very substance of the thing.
(To be clear: I don’t feel like I’ve worked this all out. Mostly, I’ve been trying to gesture at, and inhabit, some sort of subjectivist existentialist something, which I currently find more compelling than a more realist-flavored way of trying to be an idealizer. What approach to meta-ethics actually makes most sense overall and in practice is a further question.)
XI. Ghost civilizations
With this reframing in mind, some of the possible circles and indeterminacies discussed above seem to me less worrying — rather, they are just more facts, to be responded to as I choose. Among all the idealized selves (and non-selves), and all combinations, there is no final, infallible evaluative authority — no rescuer, Lord, father; no safety. But there are candidate advisors galore.
Here’s an illustration of what I mean, in the context of an idealization I sometimes think about.
I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which can travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and play forward/rewind different counterfactual timelines (the ghost’s activity somehow doesn’t have any moral significance).
I sometimes treat such a ghost kind of like an idealized self. It can see much that I cannot. It can see directly what a small part of the world I truly am; what my actions truly mean. The lives of others are real and vivid for it, even when hazy and out of mind for me. I trust such a perspective a lot. If the ghost would say “don’t,” I’d be inclined to listen.
As I usually imagine it, though, the ghost isn’t arbitrarily “ideal.” It hasn’t proved all the theorems, or considered all the arguments. It’s not all that much smarter than me; it can’t comprehend anything that I, with my brain, can’t comprehend. It can’t directly self-modify. And it’s alone. It doesn’t talk with others, or make copies of itself. In a sense, this relative mundanity makes me trust it more. It’s easier to imagine than a galaxy brain. I feel like I “know what I’m dealing with.” It’s more “me.”
We can imagine, though, a version of the thought experiment where we give the ghost more leeway. Let’s let it make copies. Let’s give it a separate realm, beyond the world, where it has access to arbitrary technology. Let’s let it interact with whatever actual and possible humans, past and future, that it wants, at arbitrary depths, and even to bring them into the ghost realm. Let’s let it make new people and creatures from scratch. Let’s let it try out self-modifications, and weird explorations of mind-space — surrounded, let’s hope, by some sort of responsible ghost system for handling explorations, new creatures, and so on (here I imagine a crowd of copy ghosts, supervising/supporting/scrutinizing an explorer trying some sort of process or stimulus that could lead to going off the rails). Let’s let it build, if it wants, a galaxy brain, or a parliament, or a civilization. And let’s ask it, after as much of all this as it wants, to report back about what it values.
If I try to make, of this ghost civilization, some sort of determinate, privileged ideal, which will define what’s of value, relative to me, I find that I start to run into the problems discussed above. That is, I start wondering about whether the ghost civilization goes somewhere I actually want; how much different versions of it diverge, based on even very similar starting points; how to fix the details in a manner that has any hope of yielding a determinate output, and how arbitrary doing so feels. I wonder whether the ghosts will find suitable methods of cooperating, containing memetic hazards, and so on; whether I would regret defining my values relative to this hazy thought experiment, if I thought about it more; whether I should instead be focusing on a different, even more idealized thought experiment; where the possible idealizing ends.
But if I let go of the thought that there is, or need be, a single “true standard,” here — a standard that is, already, for me, the be-all-end-all of value — then I feel like I can relate to the ghosts differently, and more productively. I can root for them, as they work together to explore the distant reaches of what can be known and thought. I can admire them, where they are noble, cautious, compassionate, and brave; where they build good institutions and procedures; where they cooperate. I can try, myself, to see through their eyes, looking out on the vastness of space, time, and the beings who inhabit it; zooming in, rewinding, examining, trying to understand. In a sense, I can use the image of them to connect with, and strengthen, what I myself value, now (indeed, I think that much actual usage of “ideal advisor” thought experiments, at least in my own life, is of this flavor).
And if I imagine the ghosts becoming more and more distant, alien, and incomprehensible, I can feel my confidence in their values begin to fray. Early on, I’m strongly inclined to defer to them. Later, I am still rooting for them; but I start to see them as increasingly at the edges of things, stepping forward into the mist; they’re weaving on a tapestry that I can’t see, now; they’re sailing, too, on the open sea, further than I can ever go. Are they still good, relative to me? Have they gone “off the rails”? The question itself starts to fade, too, and with it the rails, the possibility of mistake. Perhaps, if necessary, I could answer it; I could decide whether to privilege the values of some particular ghost civilization, however unrecognizable, over my own current feelings and understanding; but answering is increasingly an act of creation, rather than an attempt at discovery.
Certainly, I want to know where the ghost civilization goes. Indeed, I want to know where all the non-Joe civilizations, ghostly or not, go too. I want to know where all of it leads. And I can choose to defer to any of these paths, Joe or non-Joe, to different degrees. I’m surrounded, if I wish to call on them, by innumerable candidate advisors, familiar and alien. But the choice of who, if any of them, to listen to, is mine. Perhaps I would choose, or not, to defer, given various conditions. Perhaps I would regret, or not; would kick myself, or not; would rejoice, or not. I’m interested to know that, too. But these “woulds” are just more candidate advisors. It’s still on me, now, in my actual condition, to choose.
(Thanks to Katja Grace, Ketan Ramakrishnan, Nick Beckstead, Carl Shulman, and Paul Christiano for discussion.)
20 comments
comment by cousin_it · 2021-06-23T09:34:22.373Z · LW(p) · GW(p)
Very nice and clear writing, thank you! This is exactly the kind of stuff I'd love to see more on LW:
Suppose I can create either this galaxy Joe’s favorite world, or a world of happy puppies frolicking in the grass. The puppies, from my perspective, are a pretty safe bet: I myself can see the appeal.
Though I think some parts could use more work, shorter words and clearer images:
Second (though maybe minor/surmountable): even if your actual attitudes yield determinate verdicts about the authoritative form of idealization, it seems like we’re now giving your procedural/meta evaluative attitudes an unjustified amount of authority relative to your more object-level evaluative attitudes.
But most of the post is good.
R. Scott Bakker made a related point in Crash Space:
The reliability of our heuristic cues utterly depends on the stability of the systems involved. Anyone who has witnessed psychotic episodes has firsthand experience of consequences of finding themselves with no reliable connection to the hidden systems involved. Any time our heuristic systems are miscued, we very quickly find ourselves in ‘crash space,’ a problem solving domain where our tools seem to fit the description, but cannot seem to get the job done.
And now we’re set to begin engineering our brains in earnest. Engineering environments has the effect of transforming the ancestral context of our cognitive capacities, changing the structure of the problems to be solved such that we gradually accumulate local crash spaces, domains where our intuitions have become maladaptive. Everything from irrational fears to the ‘modern malaise’ comes to mind here. Engineering ourselves, on the other hand, has the effect of transforming our relationship to all contexts, in ways large or small, simultaneously. It very well could be the case that something as apparently innocuous as the mass ability to wipe painful memories will precipitate our destruction. Who knows? The only thing we can say in advance is that it will be globally disruptive somehow, as will every other ‘improvement’ that finds its way to market.
Human cognition is about to be tested by an unparalleled age of ‘habitat destruction.’ The more we change ourselves, the more we change the nature of the job, the less reliable our ancestral tools become, the deeper we wade into crash space.
In other words, yeah, I can imagine an alter ego who sees more and thinks better than me. As long as it stays within human evolutionary bounds, I'm even okay with trusting it more than myself. But once it steps outside these bounds, it seems like veering into "crash space" is the expected outcome.
↑ comment by Joe Carlsmith (joekc) · 2021-06-24T06:51:32.295Z · LW(p) · GW(p)
Glad you liked it, and thanks for sharing the Bakker piece -- I found it evocative.
comment by paulfchristiano · 2021-06-23T19:16:40.626Z · LW(p) · GW(p)
I feel like "Something is good to the extent that an idealized version of me would judge it good" is a useful heuristic about goodness, but I agree that it doesn't really work as a definition and I liked this post.
It seems like an important heuristic if we are in a bad position to figure out what is good directly (e.g. because we are spending our time fending off catastrophe or competing with each other), where it feels possible to construct an idealization that we'd trust more than ourselves (e.g. by removing the risk of extinction or some kinds of destructive conflict).
In particular, it seems we could (often) trust them to figure out how to perform further idealization better than we would. We don't want to pick just any self-ratifying idealization, but we can hope to get around this by taking little baby steps each of which ratifies the next. The very simple version of this heuristic quickly loses its value once we take a few steps and fix the most obviously broken+urgent things about our situation. Then we are left with hard questions about which kinds of idealizations are "best," and then eventually with hard object-level questions.
(I do think any of these calls, even the apparently simple ones, is value-laden. For example, it seems like a human is not a single coherent entity, and different processes of idealization could lead to different balances between conflicting desires or ways of being. This kind of problem is more obvious for groups than individuals, since it's clear from everyday life how early steps of "idealization" can fundamentally change the balance of power, but I think it's also quite important for incoherent individuals. Not to mention more mundane forms of wrongness, that seem possible even for the simplest kinds of idealization and mean that no idealization is really a free lunch.)
I wrote about something a bit like your "ghost civilization" here (under "Finding Earth" and then "Extrapolation").
↑ comment by Joe Carlsmith (joekc) · 2021-06-24T06:49:24.850Z · LW(p) · GW(p)
I agree that it's a useful heuristic, and the "baby steps" idealization you describe seems to me like a reasonable version to have in mind and to defer to over ourselves (including re: how to continue idealizing). I also appreciate that your 2012 post actually went through and sketched a process in that amount of depth/specificity.
comment by habryka (habryka4) · 2021-06-24T20:22:10.214Z · LW(p) · GW(p)
Promoted to curated: I really liked this post. I've had some thoughts along similar lines for a while, and this post clarified a bunch of them in much better ways than I have succeeded at so far. It also seems like a pretty important topic. Thank you for writing this!
comment by Wei Dai (Wei_Dai) · 2021-12-01T00:40:20.078Z · LW(p) · GW(p)
On a popular view about meta-ethics, what you should value is determined by what an idealized version of you would value. Call this view “idealizing subjectivism.”
What do you think of the view that “idealizing subjectivism” is just an "interim meta-ethics", a kind of temporary placeholder until we figure out the real nature of morality? As an analogy, consider "what you should believe (about math and science) is determined by what an idealized version of you would believe (about math and science)." This doesn't seem very attractive to us, given available alternative philosophies of math and science that are more direct and less circular, but might have made a good interim philosophy of math and science in the past, when we were much more confused about these topics.
Another comment is that I wish all meta-ethics, including this one, would engage more with the idea that the functional role of morality in humans is apparently some combination of:
- a tool - for cooperation and increasing group fitness
- a weapon - to coordinate and attack enemies/rivals with (e.g., bringing someone down by creating a mob to accuse them of some moral violation that you may have just recently invented)
- a game - to gain status by displaying virtue/morality (by ordinary people) or intelligence/wisdom/sophistication (by philosophers)
This seems like it ought to have some implications for meta-ethics, but I'm not sure what exactly, and again wish there was more engagement with it. (See also related comment here [LW(p) · GW(p)].) Perhaps one relevant question is, should you think of your idealized self as existing in an environment where morality still plays these roles? Why or why not?
↑ comment by Joe Carlsmith (joekc) · 2021-12-02T07:20:51.032Z · LW(p) · GW(p)
In the past, I've thought of idealizing subjectivism as something like an "interim meta-ethics," in the sense that it was a meta-ethic I expected to do OK conditional on each of the three meta-ethical views discussed here [LW · GW], e.g.:
- Internalist realism (value is independent of your attitudes, but your idealized attitudes always converge on it)
- Externalist realism (value is independent of your attitudes, but your idealized attitudes don't always converge on it)
- Idealizing subjectivism (value is determined by your idealized attitudes)
The thought was that on (1), idealizing subjectivism tracks the truth. On (2), maybe you're screwed even post-idealization, but whatever idealization process you were going to do was your best shot at the truth anyway. And on (3), idealizing subjectivism is just true. So, you don't go too far wrong as an idealizing subjectivist. (Though note that we can run similar lines of argument for using internalist or externalist forms of realism as the "interim meta-ethics." The basic dynamic here is just that, regardless of what you think about (1)-(3), doing your idealization procedures is the only thing you know how to do, so you should just do it.)
I still feel some sympathy towards this, but I've also since come to view attempts at meta-ethical agnosticism of this kind as much less innocent and straightforward than this picture hopes. In particular, I feel like I see meta-ethical questions interacting with object-level moral questions, together with other aspects of philosophy, at tons of different levels (see e.g. here [LW · GW], here [LW · GW], and here [LW · GW] for a few discussions), so it has felt correspondingly important to just be clear about which view is most likely to be true.
Beyond this, though, for the reasons discussed in this post, I've also become clearer in my skepticism that "just do your idealization procedure" is some well-defined thing that we can just take for granted. And I think that once we double click on it, we actually get something that looks less like any of 1-3, and more like the type of active, existentialist-flavored thing I tried to point at in Sections X and XI [LW · GW].
Re: functional roles of morality, one thing I'll flag here is that in my view, the most fundamental meta-ethical questions aren't about morality per se, but rather are about practical normativity more generally (though in practice, many people seem most pushed towards realism by moral questions in particular, perhaps due to the types of "bindingness" intuitions I try to point at here [LW · GW] -- intuitions that I don't actually think realism on its own helps with).
Should you think of your idealized self as existing in a context where morality still plays these (and other) functional roles? As with everything about your idealization procedure, on my picture it's ultimately up to you. Personally, I tend to start by thinking about individual ghost versions of myself [LW · GW] who can see what things are like in lots of different counterfactual situations (including, e.g., situations where morality plays different functional roles, or in which I am raised differently), but who are in some sense "outside of society," and who therefore aren't doing much in the way of direct signaling, group coordination, etc. That said, these ghost version selves start with my current values, which have indeed resulted from my being raised in environments where morality is playing roles of the kind you mentioned.
↑ comment by Wei Dai (Wei_Dai) · 2021-12-02T16:21:52.396Z · LW(p) · GW(p)
so it has felt correspondingly important to just be clear about which view is most likely to be true.
I guess this means you've rejected both versions of realism as unlikely? Have you explained why somewhere? What do you think about position 3 in this list [LW · GW]?
As with everything about your idealization procedure, on my picture it’s ultimately up to you.
This sounds like a version of my position 4. Would you agree? I think my main problem with it is that I don't know how to rule out positions 1,2,3,5,6.
therefore aren’t doing much in the way of direct signaling, group coordination, etc.
Ok, interesting. How does your ghost deal with the fact that the real you is constrained/motivated by the need to do signaling and coordination with morality? (For example does the ghost accommodate the real you by adjusting its conclusions to be more acceptable/useful for these purposes?) Is "desire for status" a part of your current values that the ghost inherits, and how does that influence its cognition?
↑ comment by Joe Carlsmith (joekc) · 2021-12-08T01:22:47.334Z · LW(p) · GW(p)
I haven't given a full account of my views of realism anywhere, but briefly, I think that the realism the realists-at-heart want is a robust non-naturalist realism, a la David Enoch, and that this view implies:
- an inflationary metaphysics that it just doesn't seem like we have enough evidence for,
- an epistemic challenge (why would we expect our normative beliefs to correlate with the non-natural normative facts?) that realists have basically no answer to except "yeah idk but maybe this is a problem for math and philosophy too?" (Enoch's chapter 7 covers this issue; I also briefly point at it in this section [LW · GW], in talking about why the realist bot would expect its desires and intuitions to correlate with the contents of the envelope buried in the mountain), and
- an appeal to a non-natural realm that a lot of realists take as necessary to capture the substance and heft of our normative lives, but which I don't think is necessary for this, at least when it comes to caring (I think moral "authority" and "bindingness regardless of what you care about" might be a different story [LW · GW], but one that "the non-natural realm says so" doesn't obviously help with, either). I wrote up my take on this issue here [LW · GW].
Also, most realists are externalists, and I think that externalist realism severs an intuitive connection between normativity and motivation that I would prefer to preserve (though this is more of an "I don't like that" than a "that's not true" objection). I wrote about this here [LW · GW].
There are various ways of being a "naturalist realist," too, but the disagreement between naturalist realism and anti-realism/subjectivism/nihilism is, in my opinion, centrally a semantic one. The important question is whether anything normativity-flavored is in a deep sense something over and above the standard naturalist world picture. Once we've denied that, we're basically just talking about how to use words to describe that standard naturalist world picture. I wrote a bit about how I think of this kind of dialectic here [LW · GW]:
This is a familiar dialectic in philosophical debates about whether some domain X can be reduced to Y (meta-ethics is a salient comparison to me). The anti-reductionist (A) will argue that our core intuitions/concepts/practices related to X make clear that it cannot be reduced to Y, and that since X must exist (as we intuitively think it does), we should expand our metaphysics to include more than Y. The reductionist (R) will argue that X can in fact be reduced to Y, and that this is compatible with our intuitions/concepts/everyday practices with respect to X, and hence that X exists but it’s nothing over and above Y. The nihilist (N), by contrast, agrees with A that it follows from our intuitions/concepts/practices related to X that it cannot be reduced to Y, but agrees with R that there is in fact nothing over and above Y, and so concludes that there is no X, and that our intuitions/concepts/practices related to X are correspondingly misguided. Here, the disagreement between A vs. R/N is about whether more than Y exists; the disagreement between R vs. A/N is about whether a world of only Y “counts” as a world with X. This latter often begins to seem a matter of terminology; the substantive questions have already been settled.
There's a common strain of realism in utilitarian circles that tries to identify "goodness" with something like "valence," treats "valence" as a "phenomenal property", and then tries to appeal to our "special direct epistemic access" to phenomenal consciousness in order to solve the epistemic challenge above. I think this doesn't help at all (the basic questions about how the non-natural realm interacts with the natural one remain unanswered -- and this is a classic problem for non-physicalist theories of consciousness as well), but that it gets its appeal centrally via running through people's confusion/mystery relationship with phenomenal consciousness, which muddies the issue enough to make it seem like the move might help. I talk about issues in this vein a bit in the latter half of my podcast with Gus Docker.
Re: your list of 6 meta-ethical options, I'd be inclined to pull apart the question of
- (a) do any normative facts exists, and if so, which ones, vs.
- (b) what's the empirical situation with respect to deliberation within agents and disagreement across agents (e.g., do most agents agree and if so why; how sensitive is the deliberation of a given agent to initial conditions, etc).
With respect to (a), my take is closest to 6 ("there aren't any normative facts at all") if the normative facts are construed in a non-naturalist way, and closest to "whatever, it's mostly a terminology dispute at this point" if the normative facts are construed in a naturalist way (though if we're doing the terminology dispute, I'm generally more inclined towards naturalist realism over nihilism). Facts about what's "rational" or "what decision theory wins" fall under this response as well (I talk about this a bit here [LW · GW]).
With respect to (b), my first pass take is "I dunno, it's an empirical question," but if I had to guess, I'd guess lots of disagreement between agents across the multiverse, and a fair amount of sensitivity to initial conditions on the part of individual deliberators.
Re: my ghost, it starts out valuing status as much as I do, but it's in a bit of a funky situation insofar as it can't get normal forms of status for itself because it's beyond society. It can, if it wants, try for some weirder form of cosmic status amongst hypothetical peers ("what they would think if they could see me now!"), or it can try to get status for the Joe that it left behind in the world, but my general feeling is that the process of stepping away from the Joe and looking at the world as a whole tends to reduce its investment in what happens to Joe in particular, e.g. [LW · GW]:
Perhaps, at the beginning, the ghost is particularly interested in Joe-related aspects of the world. Fairly soon, though, I imagine it paying more and more attention to everything else. For while the ghost retains a deep understanding of Joe, and a certain kind of care towards him, it is viscerally obvious, from the ghost’s perspective, unmoored from Joe’s body, that Joe is just one creature among so many others; Joe’s life, Joe’s concerns, once so central and engrossing, are just one tiny, tiny part of what’s going on.
That said, insofar as the ghost is giving recommendations to me about what to do, it can definitely take into account the fact that I want status to whatever degree, and am otherwise operating in the context of social constraints, coordination mechanisms, etc.
↑ comment by Wei Dai (Wei_Dai) · 2021-12-08T04:06:39.139Z · LW(p) · GW(p)
an epistemic challenge (why would we expect our normative beliefs to correlate with the non-natural normative facts?) that realists have basically no answer to except “yeah idk but maybe this is a problem for math and philosophy too?”
I think this doesn’t help at all (the basic questions about how the non-natural realm interacts with the natural one remain unanswered—and this is a classic problem for non-physicalist theories of consciousness as well), but that it gets its appeal centrally via running through people’s confusion/mystery relationship with phenomenal consciousness, which muddies the issue enough to make it seem like the move might help.
It seems that you have a tendency to take "X'ists don't have an answer to question Y" as strong evidence for "Y has no answer, assuming X" and therefore "not X", whereas I take it as weak evidence for such because it seems pretty likely that even if Y has an answer given X, humans are just not smart enough to have found it yet. It looks like this may be the main crux that explains our disagreement over meta-ethics (where I'm much more of an agnostic).
but my general feeling is that the process of stepping away from the Joe and looking at the world as a whole tends to reduce its investment in what happens to Joe in particular
This doesn't feel very motivating to me (i.e., why should I imagine idealized me being this way), absent some kind of normative force that I currently don't know about (i.e., if there was a normative fact that I should idealize myself in this way). So I'm still in a position where I'm not sure how idealization should handle status issues (among other questions/confusions about it).
comment by Rana Dexsin · 2021-06-26T06:46:10.707Z · LW(p) · GW(p)
Approximately what I might have said had I attempted to actually make it coherent! I look forward to seeing what comes out of this.
(One exception: if galaxy Rana were to match anything like your description, I have a rough pre-existing protocol (which is only partially computed and also hard to describe) for trying to work this out. I think this might not generalize well to other value systems or mind architectures, though, and I doubt it invalidates the thought experiment as such.)
comment by Charlie Steiner · 2021-06-22T17:12:28.847Z · LW(p) · GW(p)
Good stuff! I'm not sure if you pulled your punches at the end in service of hope - different ghost councils will lead you to different decisions, as will different ways of consulting ghost councils, and different ways of choosing consultation methods, and so on ad infinitum. There is no Cartesian boundary between you and the ghosts that lets you pick what ghosts to listen to from a point of infinite distance and infinite leverage; you just kinda get the ghosts you get. You have to make peace not only with your own agency, but also with your own contingency, to end up in a place where maybe doing your best really can be enough.
↑ comment by Joe Carlsmith (joekc) · 2021-06-24T07:00:25.706Z · LW(p) · GW(p)
Thanks :). I didn't mean for the ghost section to imply that the ghost civilization solves the problems discussed in the rest of the post re: e.g. divergence, meta-divergence, and so forth. Rather, the point was that taking responsibility for making the decision yourself (this feels closely related to "making peace with your own agency"), in consultation with/deference towards whatever ghost civilizations etc you want, changes the picture relative to e.g. requiring that there be some particular set of ghosts that already defines the right answer.
comment by Ofer (ofer) · 2021-06-25T13:19:29.641Z · LW(p) · GW(p)
Or consider the idea that idealization involves or is approximated by “running a large number of copies of yourself, who then talk/argue a lot with each other and with others, […]”
Later in the "Ghost civilizations" section you mentioned the idea of ghost copies "supervising/supporting/scrutinizing an explorer trying some sort of process or stimulus that could lead to going off the rails". It's interesting to think about technologies like lie-detectors in this context, for mitigating risks like the "memetic hazards that are fatal from an evaluative perspective" that you mentioned. For example, suppose that a Supervisor Copy asks many Explorer Copies to enter a secure room that is then locked. The Explorer Copies then pursue a certain risky line of thought X. They then get to write down their conclusion, but the Supervisor Copy only gets to read it if all the Explorer Copies pass a lie-detector test in which they claim that they did not stumble upon any "memetic hazard" etc.
As an aside, all those copies can be part of a single simulation that we run for this purpose, in which they all get treated very well (even if they end up without the ability to affect anything outside the simulation).
Related to what you wrote near the end ("In a sense, I can use the image of them…"), I just want to add that using an imaginary idealized version of oneself as an advisor may be a great way to mitigate some harmful cognitive biases and also just a great productivity trick.
comment by TAG · 2021-06-22T18:13:58.358Z · LW(p) · GW(p)
A rejection of certain types of robust realism about value, on which value is just a brute feature of the world “out there.”
It's a three-horse race, not a two-horse race. There isn't just realism and subjectivism (individual level relativism), there's group level ethics.
It's a fact that it exists, and that it exists to shape behaviour... otherwise there would not be such behaviour-shaping social phenomena as praise and blame, punishment and reward.
A related embrace of a kind of Humeanism about means and ends. The world can tell you the means to your ends, but it cannot tell you what ends to pursue — those must in some sense be there already, in your (idealized?) heart.
But society can tell you what not to do.
You have noticed that some subjects might have murdery values. So you can't get any intuitively satisfactory ethics out of everyone doing what they value, since some people want to murder.
Your solution is... the "ideal" part of ideal subjectivism? But it's not clear that would turn murdery people into non-murdery people... and it's voluntary anyway... if they don't value reflective equilibrium, they're not going to do it.
An aspiration to maintain some kind of deep connection between what’s valuable, and what actually moves us to act (though note that this connection is not universalized — e.g., what’s valuable relative to you may not be motivating to others).
Why? If you want an entirely voluntary system of ethics, I suppose that is valuable.
There's a sense in which ethical motivation always comes from what individuals value, but that doesn't imply that motivation has to come from a subjective or solipsistic process. Group morality also has a solution: society punishes you, or threatens to, and that works on your subjective (but shared) desire not to be punished.
There's probably a moral question about what values people should or would voluntarily pursue, once the problems of do-not-steal and do-not-kill have been solved. Making a voluntary, private decision to achieve your own values.
But the thou-shalt-nots, the aspects of ethics that are basically public, basically obligatory, and basically about not putting negative value on other people, are more important. That's built into the word "supererogatory".
↑ comment by Joe Carlsmith (joekc) · 2021-06-24T07:09:20.732Z · LW(p) · GW(p)
I agree that there are other meta-ethical options, including ones that focus more on groups, cultures, agents in general, and so on, rather than individual agents (an earlier draft had a brief reference to this). And I think it's possible that some of these are in a better position to make sense of certain morality-related things, especially obligation-flavored ones, than the individually-focused subjectivism considered here (I gesture a little at something in this vicinity at the end of this post). I wanted a narrower focus in this post, though.
↑ comment by Charlie Steiner · 2021-06-23T02:13:27.872Z · LW(p) · GW(p)
Now I'm trying to recall a reference. Was there a LW post in the last few years about treating society, rather than individuals, as the subject of value learning? Maybe also something about how non-western societies are less likely to put individual values as paramount?
↑ comment by Rohin Shah (rohinmshah) · 2021-06-27T14:46:18.496Z · LW(p) · GW(p)
↑ comment by Charlie Steiner · 2021-06-27T17:21:40.692Z · LW(p) · GW(p)
Yes!