The Meaning of Right

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T01:28:03.000Z · LW · GW · Legacy · 156 comments

Continuation of:  Changing Your Metaethics, Setting Up Metaethics
Followup to: Does Your Morality Care What You Think?, The Moral Void, Probability is Subjectively Objective, Could Anything Be Right?, The Gift We Give To Tomorrow, Rebelling Within Nature, Where Recursive Justification Hits Bottom, ...

(The culmination of a long series of Overcoming Bias posts; if you start here, I accept no responsibility for any resulting confusion, misunderstanding, or unnecessary angst.)

What is morality?  What does the word "should" mean?  The many pieces are in place:  This question I shall now dissolve.

The key—as it has always been, in my experience so far—is to understand how a certain cognitive algorithm feels from inside.  Standard procedure for righting a wrong question:  If you don't know what right-ness is, then take a step beneath and ask how your brain labels things "right".

It is not the same question—it has no moral aspects to it, being strictly a matter of fact and cognitive science.  But it is an illuminating question.  Once we know how our brain labels things "right", perhaps we shall find it easier, afterward, to ask what is really and truly right.

But with that said—the easiest way to begin investigating that question, will be to jump back up to the level of morality and ask what seems right.  And if that seems like too much recursion, get used to it—the other 90% of the work lies in handling recursion properly.

(Should you find your grasp on meaningfulness wavering, at any time following, check Changing Your Metaethics for the appropriate prophylactic.)

So!  In order to investigate how the brain labels things "right", we are going to start out by talking about what is right.  That is, we'll start out wearing our morality-goggles, in which we consider morality-as-morality and talk about moral questions directly.  As opposed to wearing our reduction-goggles, in which we talk about cognitive algorithms and mere physics.  Rigorously distinguishing between these two views is the first step toward mating them together.

As a first step, I offer this observation, on the level of morality-as-morality:  Rightness is contagious backward in time.

Suppose there is a switch, currently set to OFF, and it is morally desirable for this switch to be flipped to ON.  Perhaps the switch controls the emergency halt on a train bearing down on a child strapped to the railroad tracks, this being my canonical example.  If this is the case, then, ceteris paribus and presuming the absence of exceptional conditions or further consequences that were not explicitly specified, we may consider it right that this switch should be flipped.

If it is right to flip the switch, then it is right to pull a string that flips the switch.  If it is good to pull a string that flips the switch, it is right and proper to press a button that pulls the string:  Pushing the button seems to have more should-ness than not pushing it.

It seems that—all else being equal, and assuming no other consequences or exceptional conditions which were not specified—value flows backward along arrows of causality.

Even in deontological moralities, if you're obligated to save the child on the tracks, then you're obligated to press the button.  Only very primitive AI systems have motor outputs controlled by strictly local rules that don't model the future at all.  Duty-based or virtue-based ethics are only slightly less consequentialist than consequentialism.  It's hard to say whether moving your arm left or right is more virtuous without talking about what happens next.
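
In code, this backward contagion is easy to caricature.  Here is a minimal sketch, assuming a toy causal chain and a single terminal value; the event names are illustrative only:

```python
# A toy causal chain, purely for illustration: each event causes the next.
causal_chain = {
    "press button": "pull string",
    "pull string": "flip switch",
    "flip switch": "child lives",
}

# Only the terminal outcome carries should-ness directly.
terminally_right = {"child lives"}

def is_right(event):
    """An event is right if it is terminally right, or if it causes
    something that is right: value flows backward along causality."""
    if event in terminally_right:
        return True
    successor = causal_chain.get(event)
    return successor is not None and is_right(successor)

assert is_right("press button")  # inherits should-ness from the child living
```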

Among my readers, there may be some who presently assert—though I hope to persuade them otherwise—that the life of a child is of no value to them.  If so, they may substitute anything else that they prefer, at the end of the switch, and ask if they should press the button.

But I also suspect that, among my readers, there are some who wonder if the true morality might be something quite different from what is presently believed among the human kind.  They may find it imaginable—plausible?—that human life is of no value, or negative value.  They may wonder if the goodness of human happiness, is as much a self-serving delusion as the justice of slavery.

I myself was once numbered among these skeptics, because I was always very suspicious of anything that looked self-serving.

Now here's a little question I never thought to ask, during those years when I thought I knew nothing about morality:

Could it make sense to have a morality in which, if we should save the child from the train tracks, then we should not flip the switch, should pull the string, and should not push the button, so that, finally, we do not push the button?

Or perhaps someone says that it is better to save the child, than to not save them; but doesn't see why anyone would think this implies it is better to press the button than not press it.  (Note the resemblance to the Tortoise who denies modus ponens.)

It seems imaginable, to at least some people, that entirely different things could be should.  It didn't seem nearly so imaginable, at least to me, that should-ness could fail to flow backward in time.  When I was trying to question everything else, that thought simply did not occur to me.

Can you question it?  Should you?

Every now and then, in the course of human existence, we question what should be done and what is right to do, what is better or worse; others come to us with assertions along these lines, and we question them, asking "Why is it right?"  Even when we believe a thing is right (because someone told us that it is, or because we wordlessly feel that it is) we may still question why it is right.

Should-ness, it seems, flows backward in time.  This gives us one way to question why or whether a particular event has the should-ness property.  We can look for some consequence that has the should-ness property.  If we find one, the should-ness of the original event seems to have been plausibly proven or explained.

Ah, but what about the consequence—why is it should?  Someone comes to you and says, "You should give me your wallet, because then I'll have your money, and I should have your money."  If, at this point, you stop asking questions about should-ness, you're vulnerable to a moral mugging.

So we keep asking the next question.  Why should we press the button?  To pull the string.  Why should we pull the string?  To flip the switch.  Why should we flip the switch?  To pull the child from the railroad tracks.  Why pull the child from the railroad tracks?  So that they live.  Why should the child live?

Now there are people who, caught up in the enthusiasm, go ahead and answer that question in the same style: for example, "Because the child might eventually grow up and become a trade partner with you," or "Because you will gain honor in the eyes of others," or "Because the child may become a great scientist and help achieve the Singularity," or some such.  But even if we were to answer in this style, it would only beg the next question.

Even if you try to have a chain of should stretching into the infinite future—a trick I've yet to see anyone try to pull, by the way, though I may be only ignorant of the breadths of human folly—then you would simply ask "Why that chain rather than some other?"

Another way that something can be should, is if there's a general rule that makes it should.  If your belief pool starts out with the general rule "All children X:  It is better for X to live than to die", then it is quite a short step to "It is better for Stephanie to live than to die".  Ah, but why save all children?  Because they may all become trade partners or scientists?  But then where did that general rule come from?

If should-ness only comes from should-ness—from a should-consequence, or from a should-universal—then how does anything end up should in the first place?

Now human beings have argued these issues for thousands of years and maybe much longer.  We do not hesitate to continue arguing when we reach a terminal value (something that has a charge of should-ness independently of its consequences).  We just go on arguing about the universals.

I usually take, as my archetypal example, the undoing of slavery:  Somehow, slaves' lives went from having no value to having value.  Nor do I think that, back at the dawn of time, anyone was even trying to argue that slaves were better off being slaves (as it would later be argued).  They'd probably have looked at you like you were crazy if you even tried.  Somehow, we got from there, to here...

And some of us would even hold this up as a case of moral progress, and look at our ancestors as having made a moral error.  Which seems easy enough to describe in terms of should-ness:  Our ancestors thought that they should enslave defeated enemies, but they were mistaken.

But all our philosophical arguments ultimately seem to ground in statements that no one has bothered to justify—except perhaps to plead that they are self-evident, or that any reasonable mind must surely agree, or that they are a priori truths, or some such.  Perhaps, then, all our moral beliefs are as erroneous as that old bit about slavery?  Perhaps we have entirely misperceived the flowing streams of should?

This I once believed was plausible; and one of the arguments I wish I could go back and say to myself is, "If you know nothing at all about should-ness, then how do you know that the procedure, 'Do whatever Emperor Ming says' is not the entirety of should-ness?  Or even worse, perhaps, the procedure, 'Do whatever maximizes inclusive genetic fitness' or 'Do whatever makes you personally happy'."  The point here would have been to make my past self see that in rejecting these rules, he was asserting a kind of knowledge—that to say, "This is not morality," he must reveal that, despite himself, he knows something about morality or meta-morality.  Otherwise, the procedure "Do whatever Emperor Ming says" would seem just as plausible, as a guiding principle, as his current path of "Rejecting things that seem unjustified."  Unjustified—according to what criterion of justification?  Why trust the principle that says that moral statements need to be justified, if you know nothing at all about morality?

What indeed would distinguish, at all, the question "What is right?" from "What is wrong?"

What is "right", if you can't say "good" or "desirable" or "better" or "preferable" or "moral" or "should"?  What happens if you try to carry out the operation of replacing the symbol with what it stands for?

If you're guessing that I'm trying to inveigle you into letting me say:  "Well, there are just some things that are baked into the question, when you start asking questions about morality, rather than wakalixes or toaster ovens", then you would be right.  I'll be making use of that later, and, yes, will address "But why should we ask that question?"

Okay, now: morality-goggles off, reduction-goggles on.

Those who remember Possibility and Could-ness, or those familiar with simple search techniques in AI, will realize that the "should" label is behaving like the inverse of the "could" label, which we previously analyzed in terms of "reachability".  Reachability spreads forward in time: if I could reach the state with the button pressed, I could reach the state with the string pulled; if I could reach the state with the string pulled, I could reach the state with the switch flipped.

Where the "could" label and the "should" label collide, the algorithm produces a plan.

Now, as I say this, I suspect that at least some readers may find themselves fearing that I am about to reduce should-ness to a mere artifact of a way that a planning system feels from inside.  Once again I urge you to check Changing Your Metaethics, if this starts to happen.  Remember above all the Moral Void:  Even if there were no morality, you could still choose to help people rather than hurt them.  This, above all, holds in place what you hold precious, while your beliefs about the nature of morality change.

I do not intend, with this post, to take away anything of value; it will all be given back before the end.

Now this algorithm is not very sophisticated, as AI algorithms go, but to apply it in full generality—to learned information, not just ancestrally encountered, genetically programmed situations—is a rare thing among animals.  Put a food reward in a transparent box.  Put the matching key, which looks unique and uniquely corresponds to that box, in another transparent box.  Put the unique key to that box in another box.  Do this with five boxes.  Mix in another sequence of five boxes that doesn't lead to a food reward.  Then offer a choice of two keys, one of which starts the sequence of five boxes leading to food, one of which starts the sequence leading nowhere.

Chimpanzees can learn to do this, but so far as I know, no non-primate species can pull that trick.

And as smart as chimpanzees are, they are not quite as good as humans at inventing plans—plans such as, for example, planting in the spring to harvest in the fall.

So what else are humans doing, in the way of planning?

It is a general observation that natural selection seems to reuse existing complexity, rather than creating things from scratch, whenever it possibly can—though not always in the same way that a human engineer would.  It is a function of the enormous time required for evolution to create machines with many interdependent parts, and the vastly shorter time required to create a mutated copy of something already evolved.

What else are humans doing?  Quite a bit, and some of it I don't understand—there are plans humans make, that no modern-day AI can.

But one of the things we are doing, is reasoning about "right-ness" the same way we would reason about any other observable property.

Are animals with bright colors often poisonous?  Does the delicious nid-nut grow only in the spring?  Is it usually a good idea to take along a waterskin on long hunts?

It seems that Martha and Fred have an obligation to take care of their child, and Jane and Bob are obligated to take care of their child, and Susan and Wilson have a duty to care for their child.  Could it be that parents in general must take care of their children?

By representing right-ness as an attribute of objects, you can recruit a whole previously evolved system that reasons about the attributes of objects.  You can save quite a lot of planning time, if you decide (based on experience) that in general it is a good idea to take a waterskin on hunts, from which it follows that it must be a good idea to take a waterskin on hunt #342.

Is this damnable for a Mind Projection Fallacy—treating properties of the mind as if they were out there in the world?

Depends on how you look at it.

This business of, "It's been a good idea to take waterskins on the last three hunts, maybe it's a good idea in general, if so it's a good idea to take a waterskin on this hunt", does seem to work.

Let's say that your mind, faced with any countable set of objects, automatically and perceptually tagged them with their remainder modulo 5.  If you saw a group of 17 objects, for example, they would look remainder-2-ish.  Though, if you didn't have any notion of what your neurons were doing, and perhaps no notion of modulo arithmetic, you would only see that the group of 17 objects had the same remainder-ness as a group of 2 objects.  You might not even know how to count—your brain doing the whole thing automatically, subconsciously and neurally—in which case you would just have five different words for the remainder-ness attributes that we would call 0, 1, 2, 3, and 4.

If you look out upon the world you see, and guess that remainder-ness is a separate and additional attribute of things—like the attribute of having an electric charge—or like a tiny little XML tag hanging off of things—then you will be wrong.  But this does not mean it is nonsense to talk about remainder-ness, or that you must automatically commit the Mind Projection Fallacy in doing so.  So long as you've got a well-defined way to compute a property, it can have a well-defined output and hence an empirical truth condition.

If you're looking at 17 objects, then their remainder-ness is, indeed and truly, 2, and not 0, 3, 4, or 1.  If I tell you, "Those red things you told me to look at are remainder-2-ish", you have indeed been told a falsifiable and empirical property of those red things.  It is just not a separate, additional, physically existent attribute.

And as for reasoning about derived properties, and which other inherent or derived properties they correlate to—I don't see anything inherently fallacious about that.

One may notice, for example, that things which are 7 modulo 10 are often also 2 modulo 5.  Empirical observations of this sort play a large role in mathematics, suggesting theorems to prove.  (See Polya's How To Solve It.)
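
Since remainder-ness is a well-defined computation, claims about it can be checked.  A minimal sketch, with the function name being mine rather than anything in the post:

```python
def remainder_ness(objects):
    """The attribute a mod-5-tagging mind would perceive: a well-defined
    computation, so claims about it have an empirical truth condition."""
    return len(list(objects)) % 5

assert remainder_ness(range(17)) == 2   # 17 objects look remainder-2-ish

# Derived properties can correlate with one another: everything that is
# 7 modulo 10 is also 2 modulo 5, as a quick empirical check suggests.
assert all(n % 5 == 2 for n in range(10_000) if n % 10 == 7)
```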

Indeed, virtually all the experience we have, is derived by complicated neural computations from the raw physical events impinging on our sense organs.  By the time you see anything, it has been extensively processed by the retina, lateral geniculate nucleus, visual cortex, parietal cortex, and temporal cortex, into a very complex sort of derived computational property.

If you thought of a property like redness as residing strictly in an apple, you would be committing the Mind Projection Fallacy.  The apple's surface has a reflectance which sends out a mixture of wavelengths that impinge on your retina and are processed with respect to ambient light to extract a summary color of red...  But if you tell me that the apple is red, rather than green, and make no claims as to whether this is an ontologically fundamental physical attribute of the apple, then I am quite happy to agree with you.

So as long as there is a stable computation involved, or a stable process—even if you can't consciously verbalize the specification—it often makes a great deal of sense to talk about properties that are not fundamental.  And reason about them, and remember where they have been found in the past, and guess where they will be found next.

(In retrospect, that should have been a separate post in the Reductionism sequence.  "Derived Properties", or "Computational Properties" maybe.  Oh, well; I promised you morality this day, and this day morality you shall have.)

Now let's say we want to make a little machine, one that will save the lives of children.  (This enables us to save more children than we could do without a machine, just like you can move more dirt with a shovel than by hand.)  The machine will be a planning machine, and it will reason about events that may or may not have the property, leads-to-child-living. 

A simple planning machine would just have a pre-made model of the environmental process.  It would search forward from its actions, applying a label that we might call "reachable-from-action-ness", but which might as well say "Xybliz" internally for all that it matters to the program.  And it would search backward from scenarios, situations, in which the child lived, labeling these "leads-to-child-living".  If situation X leads to situation Y, and Y has the label "leads-to-child-living"—which might just be a little flag bit, for all the difference it would make—then X will inherit the flag from Y.  When the two labels meet in the middle, the leads-to-child-living flag will quickly trace down the stored path of reachability, until finally some particular sequence of actions ends up labeled "leads-to-child-living".  Then the machine automatically executes those actions—that's just what the machine does.
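
A minimal sketch of such a machine, assuming the environmental model is given as a toy graph of situations; the names below simply externalize the two flag bits described above, and the whole thing is a cartoon rather than a serious planner:

```python
from collections import deque

def choose_action(world, actions, child_lives):
    """world maps each situation to its successor situations;
    actions are the situations directly reachable by acting now;
    child_lives(s) is the goal predicate.  Returns an action carrying
    both the 'reachable-from-action' and 'leads-to-child-living' flags."""
    # Forward pass: spread the reachability flag out from the actions.
    reachable = set(actions)
    frontier = deque(actions)
    while frontier:
        x = frontier.popleft()
        for y in world.get(x, []):
            if y not in reachable:
                reachable.add(y)
                frontier.append(y)

    # Backward pass: a situation inherits 'leads-to-child-living'
    # from any successor already carrying the flag.
    leads_to_child_living = {s for s in reachable if child_lives(s)}
    changed = True
    while changed:
        changed = False
        for x in reachable:
            if x not in leads_to_child_living and any(
                    y in leads_to_child_living for y in world.get(x, [])):
                leads_to_child_living.add(x)
                changed = True

    # Where the two labels meet, the machine just executes that action.
    for a in actions:
        if a in leads_to_child_living:
            return a
    return None

world = {
    "press button": ["string pulled"],
    "string pulled": ["switch flipped"],
    "switch flipped": ["child lives"],
    "jump in the air": ["nothing changes"],
}
print(choose_action(world, ["press button", "jump in the air"],
                    lambda s: s == "child lives"))   # -> press button
```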

Now this machine is not complicated enough to feel existential angst.  It is not complicated enough to commit the Mind Projection Fallacy.  It is not, in fact, complicated enough to reason abstractly about the property "leads-to-child-living-ness".  The machine—as specified so far—does not notice if the action "jump in the air" turns out to always have this property, or never have this property.  If "jump in the air" always led to situations in which the child lived, this could greatly simplify future planning—but only if the machine were sophisticated enough to notice this fact and use it.

If it is a fact that "jump in the air" "leads-to-child-living-ness", this fact is composed of empirical truth and logical truth.  It is an empirical truth about how the world works that, if you perform the (ideal abstract) algorithm "trace back from situations where the child lives", then it will be a logical truth about the output of this (ideal abstract) algorithm that it labels the "jump in the air" action.

(You cannot always define this fact in entirely empirical terms, by looking for the physical real-world coincidence of jumping and child survival.  It might be that "stomp left" also always saves the child, and the machine in fact stomps left.  In which case the fact that jumping in the air would have saved the child, is a counterfactual extrapolation.)

Okay, now we're ready to bridge the levels.

As you must surely have guessed by now, this should-ness stuff is how the human decision algorithm feels from inside.  It is not an extra, physical, ontologically fundamental attribute hanging off of events like a tiny little XML tag.

But it is a moral question what we should do about that—how we should react to it.

To adopt an attitude of complete nihilism, because we wanted those tiny little XML tags, and they're not physically there, strikes me as the wrong move.  It is like supposing that the absence of an XML tag, equates to the XML tag being there, saying in its tiny brackets what value we should attach, and having value zero.  And then this value zero, in turn, equating to a moral imperative to wear black, feel awful, write gloomy poetry, betray friends, and commit suicide.

No.

So what would I say instead?

The force behind my answer is contained in The Moral Void and The Gift We Give To Tomorrow.  I would try to save lives "even if there were no morality", as it were.

And it seems like an awful shame to—after so many millions and hundreds of millions of years of evolution—after the moral miracle of so much cutthroat genetic competition producing intelligent minds that love, and hope, and appreciate beauty, and create beauty—after coming so far, to throw away the Gift of morality, just because our brain happened to represent morality in such fashion as to potentially mislead us when we reflect on the nature of morality.

This little accident of the Gift doesn't seem like a good reason to throw away the Gift; it certainly isn't an inescapable logical justification for wearing black.

Why not keep the Gift, but adjust the way we reflect on it?

So here's my metaethics:

I earlier asked,

What is "right", if you can't say "good" or "desirable" or "better" or "preferable" or "moral" or "should"?  What happens if you try to carry out the operation of replacing the symbol with what it stands for?

I answer that if you try to replace the symbol "should" with what it stands for, you end up with quite a large sentence.

For the much simpler save-life machine, the "should" label stands for leads-to-child-living-ness.

For a human this is a much huger blob of a computation that looks like, "Did everyone survive?  How many people are happy?  Are people in control of their own lives? ..."  Humans have complex emotions, have many values—the thousand shards of desire, the godshatter of natural selection.  I would say, by the way, that the huge blob of a computation is not just my present terminal values (which I don't really have—I am not a consistent expected utility maximizer); the huge blob of a computation includes the specification of those moral arguments, those justifications, that would sway me if I heard them.  So that I can regard my present values, as an approximation to the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent.

No one can write down their big computation; it is not just too large, it is also unknown to its user.  No more could you print out a listing of the neurons in your brain.  You never mention your big computation—you only use it, every hour of every day.

Now why might one identify this enormous abstract computation, with what-is-right?

If you identify rightness with this huge computational property, then moral judgments are subjunctively objective (like math), subjectively objective (like probability), and capable of being true (like counterfactuals).

You will find yourself saying, "If I wanted to kill someone—even if I thought it was right to kill someone—that wouldn't make it right."  Why?  Because what is right is a huge computational property—an abstract computation—not tied to the state of anyone's brain, including your own brain.

This distinction was introduced earlier in 2-Place and 1-Place Words.  We can treat the word "sexy" as a 2-place function that goes out and hoovers up someone's sense of sexiness, and then eats an object of admiration.  Or we can treat the word "sexy" as meaning a 1-place function, a particular sense of sexiness, like Sexiness_20934, that only accepts one argument, an object of admiration.

Here we are treating morality as a 1-place function.  It does not accept a person as an argument, spit out whatever cognitive algorithm they use to choose between actions, and then apply that algorithm to the situation at hand.  When I say right, I mean a certain particular 1-place function that just asks, "Did the child live?  Did anyone else get killed?  Are people happy?  Are they in control of their own lives?  Has justice been served?" ... and so on through many, many other elements of rightness.  (And perhaps those arguments that might persuade me otherwise, which I have not heard.)

Hence the notion, "Replace the symbol with what it stands for."

Since what's right is a 1-place function, if I subjunctively imagine a world in which someone has slipped me a pill that makes me want to kill people, then, in this subjunctive world, it is not right to kill people.  That's not merely because I'm judging with my current brain.  It's because when I say right, I am referring to a 1-place function.  Rightness doesn't go out and hoover up the current state of my brain, in this subjunctive world, before producing the judgment "Oh, wait, it's now okay to kill people."  When I say right, I don't mean "that which my future self wants", I mean the function that looks at a situation and asks, "Did anyone get killed?  Are people happy?  Are they in control of their own lives?  ..."
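
The 1-place/2-place distinction can be sketched in code, with the criteria and field names standing in, very crudely, for the unverbalizable real thing:

```python
from dataclasses import dataclass

@dataclass
class Situation:
    anyone_killed: bool
    people_happy: bool
    people_control_own_lives: bool

def right_1place(s: Situation) -> bool:
    """1-place: a fixed function of the situation alone.  It never takes
    the judge's brain state as an argument, so a pill that changes what
    the judge wants changes what the judge does, not what is right."""
    # "... and so on through many, many other elements of rightness."
    return (not s.anyone_killed) and s.people_happy and s.people_control_own_lives

def judges_as_right_2place(judge, s: Situation) -> bool:
    """2-place: hoovers up whatever the judge's current algorithm says.
    This is not what the post means by 'right'."""
    return judge(s)

# Even if a pill replaced my judgment with `lambda s: True`, the 1-place
# function would still return False for a situation where someone was killed.
murder_world = Situation(anyone_killed=True, people_happy=True,
                         people_control_own_lives=True)
print(right_1place(murder_world))                            # -> False
print(judges_as_right_2place(lambda s: True, murder_world))  # -> True
```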

And once you've defined a particular abstract computation that says what is right—or even if you haven't defined it, and it's computed in some part of your brain you can't perfectly print out, but the computation is stable—more or less—then as with any other derived property, it makes sense to speak of a moral judgment being true. If I say that today was a good day, you've learned something empirical and falsifiable about my day—if it turns out that actually my grandmother died, you will suspect that I was originally lying.

The apparent objectivity of morality has just been explained—and not explained away.  For indeed, if someone slipped me a pill that made me want to kill people, nonetheless, it would not be right to kill people.  Perhaps I would actually kill people, in that situation—but that is because something other than morality would be controlling my actions.

Morality is not just subjunctively objective, but subjectively objective.  I experience it as something I cannot change.  Even after I know that it's myself who computes this 1-place function, and not a rock somewhere—even after I know that I will not find any star or mountain that computes this function, that only upon me is it written—even so, I find that I wish to save lives, and that even if I could change this by an act of will, I would not choose to do so.  I do not wish to reject joy, or beauty, or freedom.  What else would I do instead?  I do not wish to reject the Gift that natural selection accidentally barfed into me.  This is the principle of The Moral Void and The Gift We Give To Tomorrow.

Our origins may seem unattractive, our brains untrustworthy.

But love has to enter the universe somehow, starting from non-love, or love cannot enter time.

And if our brains are untrustworthy, it is only our own brains that say so.  Do you sometimes think that human beings are not very nice?  Then it is you, a human being, who says so.  It is you, a human being, who judges that human beings could do better.  You will not find such written upon the stars or the mountains: they are not minds, they cannot think.

In this, of course, we find a justificational strange loop through the meta-level.  Which is unavoidable so far as I can see—you can't argue morality, or any kind of goal optimization, into a rock.  But note the exact structure of this strange loop: there is no general moral principle which says that you should do what evolution programmed you to do.  There is, indeed, no general principle to trust your moral intuitions!  You can find a moral intuition within yourself, describe it—quote it—consider it deliberately and in the full light of your entire morality, and reject it, on grounds of other arguments.  What counts as an argument is also built into the rightness-function.

Just as, in the strange loop of rationality, there is no general principle in rationality to trust your brain, or to believe what evolution programmed you to believe—but indeed, when you ask which parts of your brain you need to rebel against, you do so using your current brain.  When you ask whether the universe is simple, you can consider the simple hypothesis that the universe's apparent simplicity is explained by its actual simplicity.

Rather than trying to unwind ourselves into rocks, I proposed that we should use the full strength of our current rationality, in reflecting upon ourselves—that no part of ourselves be immune from examination, and that we use all of ourselves that we currently believe in to examine it.

You would do the same thing with morality; if you suspect that a part of yourself might be harmful, then use your best current guess at what is right, your full moral strength, to do the considering.  Why should we want to unwind ourselves to a rock?  Why should we do less than our best, when reflecting?  You can't unwind past Occam's Razor, modus ponens, or morality, and it's not clear why you should try.

For any part of rightness, you can always imagine another part that overrides it—it would not be right to drag the child from the train tracks, if this resulted in everyone on Earth becoming unable to love—or so I would judge.  For every part of rightness you examine, you will find that it cannot be the sole and perfect and only criterion of rightness.  This may lead to the incorrect inference that there is something beyond, some perfect and only criterion from which all the others are derived—but that does not follow.  The whole is the sum of the parts.  We ran into an analogous situation with free will, where no part of ourselves seems perfectly decisive.

The classic dilemma for those who would trust their moral intuitions, I believe, is the one who says:  "Interracial marriage is repugnant—it disgusts me—and that is my moral intuition!"  I reply, "There is no general rule to obey your intuitions.  You just mentioned intuitions, rather than using them.  Very few people have legitimate cause to mention intuitions—Friendly AI programmers, for example, delving into the cognitive science of things, have a legitimate reason to mention them.  Everyone else just has ordinary moral arguments, in which they use their intuitions, for example, by saying, 'An interracial marriage doesn't hurt anyone, if both parties consent'.  I do not say, 'And I have an intuition that anything consenting adults do is right, and all intuitions must be obeyed, therefore I win.'  I just offer up that argument, and any others I can think of, to weigh in the balance."

Indeed, evolution that made us cannot be trusted—so there is no general principle to trust it!  Rightness is not defined in terms of automatic correspondence to any possible decision we actually make—so there's no general principle that says you're infallible!  Just do what is, ahem, right—to the best of your ability to weigh the arguments you have heard, and ponder the arguments you may not have heard.

If you were hoping to have a perfectly trustworthy system, or to have been created in correspondence with a perfectly trustworthy morality—well, I can't give that back to you; but even most religions don't try that one.  Even most religions have the human psychology containing elements of sin, and even most religions don't actually give you an effectively executable and perfect procedure, though they may tell you "Consult the Bible!  It always works!"

If you hoped to find a source of morality outside humanity—well, I can't give that back, but I can ask once again:  Why would you even want that?  And what good would it do?  Even if there were some great light in the sky—something that could tell us, "Sorry, happiness is bad for you, pain is better, now get out there and kill some babies!"—it would still be your own decision to follow it.  You cannot evade responsibility.

There isn't enough mystery left to justify reasonable doubt as to whether the causal origin of morality is something outside humanity.  We have evolutionary psychology.  We know where morality came from.  We pretty much know how it works, in broad outline at least.  We know there are no little XML value tags on electrons (and indeed, even if you found them, why should you pay attention to what is written there?)

If you hoped that morality would be universalizable—sorry, that one I really can't give back.  Well, unless we're just talking about humans.  Between neurologically intact humans, there is indeed much cause to hope for overlap and coherence; and a great and reasonable doubt as to whether any present disagreement is really unresolvable, even if it seems to be about "values".  The obvious reason for hope is the psychological unity of humankind, and the intuitions of symmetry, universalizability, and simplicity that we execute in the course of our moral arguments.  (In retrospect, I should have done a post on Interpersonal Morality before this...)

If I tell you that three people have found a pie and are arguing about how to divide it up, the thought "Give one-third of the pie to each" is bound to occur to you—and if the three people are humans, it's bound to occur to them, too.  If one of them is a psychopath and insists on getting the whole pie, though, there may be nothing for it but to say:  "Sorry, fairness is not 'what everyone thinks is fair', fairness is everyone getting a third of the pie".  You might be able to resolve the remaining disagreement by politics and game theory, short of violence—but that is not the same as coming to agreement on values.  (Maybe you could persuade the psychopath that taking a pill to be more human, if one were available, would make them happier?  Would you be justified in forcing them to swallow the pill?  These get us into stranger waters that deserve a separate post.)

If I define rightness to include the space of arguments that move me, then when you and I argue about what is right, we are arguing our approximations to what we would come to believe if we knew all empirical facts and had a million years to think about it—and that might be a lot closer than the present and heated argument.  Or it might not.  This gets into the notion of 'construing an extrapolated volition' which would be, again, a separate post.

But if you were stepping outside the human and hoping for moral arguments that would persuade any possible mind, even a mind that just wanted to maximize the number of paperclips in the universe, then sorry—the space of possible mind designs is too large to permit universally compelling arguments.  You are better off treating your intuition that your moral arguments ought to persuade others, as applying only to other humans who are more or less neurologically intact.  Trying it on human psychopaths would be dangerous, yet perhaps possible.  But a paperclip maximizer is just not the sort of mind that would be moved by a moral argument.  (This will definitely be a separate post.)

Once, in my wild and reckless youth, I tried dutifully—I thought it was my duty—to be ready and willing to follow the dictates of a great light in the sky, an external objective morality, when I discovered it.  I questioned everything, even altruism toward human lives, even the value of happiness.  Finally I realized that there was no foundation but humanity—no evidence pointing to even a reasonable doubt that there was anything else—and indeed I shouldn't even want to hope for anything else—and indeed would have no moral cause to follow the dictates of a light in the sky, even if I found one.

I didn't get back immediately all the pieces of myself that I had tried to deprecate—it took time for the realization "There is nothing else" to sink in.  The notion that humanity could just... you know... live and have fun... seemed much too good to be true, so I mistrusted it.  But eventually, it sank in that there really was nothing else to take the place of beauty.  And then I got it back.

So you see, it all really does add up to moral normality, very exactly in fact.  You go on with the same morals as before, and the same moral arguments as before.  There is no sudden Grand Overlord Procedure to which you can appeal to get a perfectly trustworthy answer.  You don't know, cannot print out, the great rightness-function; and even if you could, you would not have enough computational power to search the entire specified space of arguments that might move you.  You will just have to argue it out.

I suspect that a fair number of those who propound metaethics do so in order to have it add up to some new and unusual moral—else why would they bother?  In my case, I bother because I am a Friendly AI programmer and I have to make a physical system outside myself do what's right; for which purpose metaethics becomes very important indeed.  But for the most part, the effect of my proffered metaethic is threefold:

And, oh yes—why is it right to save a child's life?

Well... you could ask "Is this event that just happened, right?" and find that the child had survived, in which case you would have discovered the nonobvious empirical fact about the world, that it had come out right.

Or you could start out already knowing a complicated state of the world, but still have to apply the rightness-function to it in a nontrivial way—one involving a complicated moral argument, or extrapolating consequences into the future—in which case you would learn the nonobvious logical / computational fact that rightness, applied to this situation, yielded thumbs-up.

In both these cases, there are nonobvious facts to learn, which seem to explain why what just happened is right.

But if you ask "Why is it good to be happy?" and then replace the symbol 'good' with what it stands for, you'll end up with a question like "Why does happiness match {happiness + survival + justice + individuality + ...}?"  This gets computed so fast, that it scarcely seems like there's anything there to be explained.  It's like asking "Why does 4 = 4?" instead of "Why does 2 + 2 = 4?"

Now, I bet that feels quite a bit like what happens when I ask you:  "Why is happiness good?"

Right?

And that's also my answer to Moore's Open Question.  Why is this big function I'm talking about, right?  Because when I say "that big function", and you say "right", we are dereferencing two different pointers to the same unverbalizable abstract computation.  I mean, that big function I'm talking about, happens to be the same thing that labels things right in your own brain.  You might reflect on the pieces of the quotation of the big function, but you would start out by using your sense of right-ness to do it.  If you had the perfect empirical knowledge to taboo both "that big function" and "right", substitute what the pointers stood for, and write out the full enormity of the resulting sentence, it would come out as... sorry, I can't resist this one... A=A.

 

Part of The Metaethics Sequence

Next post: "Interpersonal Morality"

Previous post: "Setting Up Metaethics"

156 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

comment by RobinHanson · 2008-07-29T02:19:11.000Z · LW(p) · GW(p)

There is a good tradition of expecting intellectuals to summarize their positions. Even if they write long books elaborating their positions, intellectuals are still expected to write sentences that summarize their core claims. Those sentences may refer to new concepts they have elaborated elsewhere, but still the summary is important. I think you'd do well to try to write such summaries of your key positions, including this one.

You say morality is "the huge blob of a computation ... not just our present terminal values ... [but] includes the specification of those moral arguments, those justifications, that would sway us if we heard them." If you mean what would sway us to matching acts, then you mean morality is what we would want if we had thought everything through. But if you instead mean what would sway us only to assent that an act is "moral", even if we are not swayed to act that way, then there remains the question of what exactly it is that we are assenting.

Replies from: MarsColony_in10years
comment by MarsColony_in10years · 2015-06-19T16:32:31.532Z · LW(p) · GW(p)

If you mean what would sway us to matching acts, then you mean morality is what we would want if we had thought everything through. But if you instead mean what would sway us only to assent that an act is "moral", even if we are not swayed to act that way, then there remains the question of what exactly it is that we are assenting.

I'm not sure how coherent the second is. I would tend to think that our beliefs and our actions would converge, if you took the limit as wisdom approached infinity. Perhaps there's no guarantee, but it seems like we would have to suffer quite a lot of cognitive dissonance in order to fully accept all parts of an infinitely wise argument that something should be done, while still doing nothing. Even just thinking and accepting such arguments is doing something. Why think in the first place?

Perhaps I'm missing something, but if I condition on the fact that such an infinitely compelling argument exists, it seems overwhelmingly likely that anyone with the values being appealed to would be strongly compelled to act. Well, at least once they had time to process the arguments and let all the details sink in. Perhaps there would be people with such radically twisted worldviews that they would have a nervous breakdown first, and some might go into total denial and never accept such an argument. (For example, if they are stuck in a happy death spiral that is a local minimum of cognitive dissonance, and also requires disbelief in evidence and arguments, thus making even a global minimum in cognitive dissonance unappealing.) But the desire for internal consistency is a strong value in humans, so I would think that the need to drive down cognitive dissonance would eventually win out in all practical cases, given sufficient time.

Replies from: hairyfigment
comment by hairyfigment · 2015-06-19T21:38:20.974Z · LW(p) · GW(p)

I should mention that even humans can make a moral judgment without being compelled to follow it. This seems to some extent like a case of the brain not working properly, but it establishes the trick is possible even for a somewhat human-like mind.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T03:16:02.000Z · LW(p) · GW(p)

I agree that it needs a summary. But I think it wiser to write first and summarize afterward - otherwise I am never quite sure what there is to summarize.

There needs to be a separate word for that subset of our values that is interpersonal, prosocial, to some extent expected to be agreed-upon, which subset does not always win out in the weighing; this subset is often also called "morality" but that would be confusing.

Replies from: ahbwramc
comment by ahbwramc · 2014-03-31T15:18:42.736Z · LW(p) · GW(p)

I'm not entirely sure why, but this comment was inordinately helpful in doing away with the last vestiges of confusion about your metaethics. I don't know what I thought before reading it - of course morality would be a subset of our values, what else could it be? But somehow it made everything jump into place. I think I can now say (two years after first reading the sequence, and only through a long and gradual process) that I agree with your metaethical theory.

comment by Mike_Blume · 2008-07-29T03:27:07.000Z · LW(p) · GW(p)

There needs to be a separate word for that subset of our values that is interpersonal, prosocial, to some extent expected to be agreed-upon, which subset does not always win out in the weighing; this subset is often also called "morality" but that would be confusing.

Are you maybe referring to manners/etiquette/propriety?

comment by Psy-Kosh · 2008-07-29T03:42:02.000Z · LW(p) · GW(p)

Eliezer: This actually kinda sounds (almost) like something I'd been thinking for a while, except that your version added one (well, many actually, but the one is one that's useful in getting it to all add back up to normality) "dang, I should have thought" insight.

But I'm not sure if these are equivalent. Is this more or less what you were saying: "When we're talking about 'shouldness', we mean something, or at least we think we mean something. It's not something we can fully explicitly articulate, but if we could somehow fully utterly completely understand the operation of the brain, scan it, and somehow extract and process all the relevant data associated with that feeling of 'shouldness', we'd actually get a definition/defining computation/something that we could then work with to do more detailed analysis of morality, and the reason that would actually be 'the computation we should care about' is that'd actually be, well... the very bit of us that's concerned with issues like that, more or less"?

If so, I'd say that what you have is useful, but not a full metamorality. I'd call it more a "metametamorality"; the metamorality would be what I'd know if I actually knew the specification of the computation, to some level of precision. To me, it seems like this answers a lot, but does leave an important black box that needs to be opened. Although I concede that opening this box will be tricky. Good luck with that. :)

Anyways, I'd consider the knowledge I'd have from actually knowing a bit more about the specification of that computation a metamorality, and the outputs, well, morality.

Incidentally, the key thing that I missed that you helped me see was this: "Hey, that implicit definition of 'shouldness' sitting there in your brain structure isn't just sitting there twiddling its thumbs. Where the heck do you think your moral feelings/suspicions/intuitions are coming from? It's what's computing them, however imprecisely. So you actually can trust, at least as a starting point, those moral intuitions as an approximation to what that implicit definition implies."

comment by AndyWood · 2008-07-29T04:05:09.000Z · LW(p) · GW(p)

The most important part seems to be missing. You say that shouldness is about actions and consequences. I'm with you there. You say that it is a one-place function. I take that to mean that it encompasses a specific set of values, independent of who is asking. The part that still seems to be missing is: how are we to determine what this set of values is? What if we disagree? In your conclusion, you seem to be saying that the values we are aiming for are self-evident. Are they really?

It so happens that I agree with you about things like happiness, human life, and service to others. I even know that I could elaborate in writing on the ways and reasons these things are good. But - and this is the critical missing part that I can't find - I do not know how to make these values and reasons universal! I may define "right" in terms of these values, but if another does not, then how are they "wrong" in any sense beyond their disagreement with my values? For example, what could be the reason that another one's more ruthless pursuit of his own happiness is less right?

What claim could any person or group have to landing closer to the one-place function?

Replies from: Will_Lugar
comment by Will_Lugar · 2014-08-19T15:53:29.179Z · LW(p) · GW(p)

I agree with the concerns of AndyWood and others who have made similar comments, and I'll be paying attention to see whether the later installments of the metaethics sequence have answered them. Before I read them, here is my own summarized set of concerns. (I apologize if responding to a given part of a sequence before reading the later parts is bad form; please let me know if this is the case.)

Eliezer seems to assume that any two neurologically normal humans would agree on the right function if they were fully rational and sufficiently informed, citing the psychological unity of humankind as support. But even with the present degree of psychological unity, it seems to me fully possible that people's values could truly diverge in quite a few not-fully-reconcilable ways--although perhaps the divergence would be surprisingly small; I just don't know. This is, I think we mostly agree, an open question for further research to explore.

Eliezer's way of viewing morality seems like it would run into trouble if it turns out that two different people really do use two different right functions (such that even their CEVs would diverge from one another). Suppose Bob's right function basically boils down to "does it maximize preference fulfillment?" (or some other utilitarian function) and Sally's right function basically boils down to "does it follow a maxim which can be universally willed by a rational agent?" (or some other deontological function). Suppose Bob and Sally are committed to these functions even though each person is fully rational and sufficiently informed--which does not seem implausible.

In this case, the fact that each of them is using a one-place function is of no help, because they are using different one-place functions. Eliezer would then have no immediately obvious legitimate way to claim that his right function is the truer or better one.

To use a more extreme example: What if the Nazis were completely right, according to their own right function? The moral realist in me wants very much to say surely that either (a) the Nazis' right function is the same as mine, and their normative ethics were mistaken by that very standard (which is Eliezer's view, I think), or (b) the Nazis' normative ethics matched their own right function, but their right function is not merely different from our right function, but is outright inferior to it.

If (a) is false AND if we are still committed to saying the Nazis were really wrong (there is also option (c) the Nazis were not wrong; but I'd like to exhaust the alternatives before seriously considering this as possible), then we need some means of distinguishing between better right functions and crummier right functions. I have some extremely vague ideas about how to do this, but I'm very curious to see what other thinkers, including Eliezer, have come up with. If the Nazis' right function is inferior by some external standard (a standard that is really right) then what is this standard?

(Admittedly, as I understand it, the Nazis had many false beliefs about the Jews, so it may be debatable what their morality would have been if they had been fully rational and sufficiently informed.)

In summary, if we all indeed use the same right function deep down, this would be very convenient--but I worry that it is more convenient than reality really is.

comment by Jef_Allbright · 2008-07-29T04:12:04.000Z · LW(p) · GW(p)

Eliezer, it's a pleasure to see you arrive at this point. With an effective understanding of the subjective/objective aspects supporting a realistic metaethics, I look forward to your continued progress and contributions in terms of the dynamics of increasingly effective evolutionary (in the broadest sense) development for meaningful growth, promoting a model of (subjective) fine-grained, hierarchical values with increasing coherence over increasing context of meaning-making, implementing principles of (objective) instrumental action increasingly effective over increasing scope of consequences. Wash, rinse, repeat...

There's no escape from the Red Queen's race, but despite the lack of objective milestones or markers of "right", there's real progress to be made in the direction of increasing rightness.

Society has been doing pretty well at the increasingly objective model of instrumental action commonly known as warranted scientific knowledge. Now if we could get similar focus on the challenges of values-elicitation, inductive biases, etc., leading to an increasingly effective (and coherent) model of agent values...

  • Jef
comment by jsalvatier · 2008-07-29T04:16:54.000Z · LW(p) · GW(p)

I second Robin's request that you summarize your positions. It helps other folks organize and think about your ideas.

comment by behemoth · 2008-07-29T04:44:29.000Z · LW(p) · GW(p)

I'm quite convinced about how you analyze the problem of what morality is and how we should think about it, up until the point about how universally it applies. I'm just not sure that humans' different shards of godshatter add up to the same thing across people, a point that I think would become apparent as soon as you started to specify what the huge computation actually WAS.

I would think of the output as not being a yes/no answer, but something akin to 'What percentage of human beings would agree that this was a good outcome, or be able to be thus convinced by some set of arguments?'. Some things, like saving a child's life, would receive very widespread agreement. Others, like a global Islamic caliphate or widespread promiscuous sex would have more disagreement, including potentially disagreement that cannot be resolved by presenting any conceivable argument to the parties.

The question of 'how much' each person views something as moral comes into play as well. If different people can't all be convinced of a particular outcome's morality, the question ends up seeming remarkably similar to the question in economics of how to aggregate many people's preferences for goods. Because you never observe preferences in total, you let everyone trade and express their desires through revealed preference to get a pareto solution. Here, a solution might be to assign each person a certain amount of morality dollars, let them spend as they wish across outcomes, and add it all up. Like economics, there's still the question of how to allocate the initial wealth (in this case, how much to weigh the opinions of each person).

I don't know how much I'm distorting what you meant - it almost feels like we've just replaced 'morality as preference' with 'morality as aggregate preference', and I don't think that's what you had in mind.

comment by No · 2008-07-29T04:58:05.000Z · LW(p) · GW(p)

2+2=4 no matter who's measuring. Right, for myself and my family, and right, for you and yours, may not always be the same.

If the child on the tracks were a bully who had been torturing my own child (which actions I had previously been powerless to prevent by any acceptable means afforded by my society, and assuming I had exhausted all reasonable alternatives), it might very well feel right to let the bully be annihilated by the locomotive.

Right is reducible to an aggregation of sympathetic conditioning: affection for a person, attachment to a conceptualization or an expected or desired course of events, and so on.

comment by Tom_McCabe2 · 2008-07-29T05:09:32.000Z · LW(p) · GW(p)

Wow, there's a lot of ground to cover. For everyone who hasn't read Eliezer's previous writings, he talks about something very similar in Creating Friendly Artificial Intelligence, all the way back in 2001 (link = http://www.singinst.org/upload/CFAI/design/structure/external.html). With reference to Andy Wood's comment:

"What claim could any person or group have to landing closer to the one-place function?"

Next obvious question: For purposes of Friendly AI, and for correcting mistaken intuitions, how do we approximate the rightness function? How do we determine whether A(x) or B(x) is a closer approximation to Right(x)?

Next obvious answer: The rightness function can be computed by computing humanity's Coherent Extrapolated Volition, written about by Eliezer in 2004 (http://www.singinst.org/upload/CEV.html). The closer a given algorithm comes to humanity's CEV, the closer it should come to Right(x).
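
One way to picture "closer approximation" is agreement with a reference function over sampled situations. The sketch below is purely illustrative: the reference function, features, and weights are hypothetical stand-ins, not an actual computation of Right(x) or of CEV.

```python
# Toy sketch: score two candidate moral functions by their mean squared
# disagreement with a hypothetical stand-in for Right(x) over sampled situations.
import random

def reference_right(s):                      # hypothetical stand-in, NOT a real Right(x)
    return s["lives_saved"] - 10 * s["promises_broken"]

def candidate_a(s):
    return s["lives_saved"]                  # ignores promise-breaking entirely

def candidate_b(s):
    return s["lives_saved"] - 8 * s["promises_broken"]

def disagreement(candidate, samples):
    return sum((candidate(s) - reference_right(s)) ** 2 for s in samples) / len(samples)

random.seed(0)
samples = [{"lives_saved": random.randint(0, 5), "promises_broken": random.randint(0, 2)}
           for _ in range(1000)]
print("A:", disagreement(candidate_a, samples))
print("B:", disagreement(candidate_b, samples))   # B scores lower, i.e. is "closer"
```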

Note: I did not think of CFAI when I read Eliezer's previous post, although I did think of CEV as a candidate for morality's content. CFAI refers to the supergoals of agents in general, while all the previous posts referred to a tangle of stuff surrounding classic philosophical ideas of morality, so I didn't connect the dots.

comment by Nick_Tarleton · 2008-07-29T05:25:43.000Z · LW(p) · GW(p)

Bravo. But:

Because when I say "that big function", and you say "right", we are dereferencing two different pointers to the same unverbalizable abstract computation.

No, the other person is dereferencing a pointer to their big function, which may or may not be the same as yours. This is the one place it doesn't add up to normality: not everyone need have the same function. Eliezer-rightness is objective, a one-place function, but it seems to me the ordinary usage of "right" goes further: it's assumed that everybody means the same thing by, not just "Eliezer-right", but "right". I don't see how this metamorality allows for that, or how any sensible one could. (Not that it bothers me.)

Replies from: VAuroch
comment by VAuroch · 2013-11-25T07:27:16.319Z · LW(p) · GW(p)

I believe Eliezer is asserting here that "right" = CEV("Eliezer-right"); it is an extrapolation of "Eliezer-right" to "Eliezer_{perfectly-rational}-right". And he asserts, with some justification, that CEV("[Person]-right") = "right" for *nearly all* values of [Person].

EDIT: Obviously [Sociopath] does not fit this. s/all/nearly all/g

Replies from: ialdabaoth
comment by ialdabaoth · 2013-11-25T08:25:24.127Z · LW(p) · GW(p)

Naive question: by these definitions, [sadistic sociopath] ⊆ [Person]?

Replies from: VAuroch, army1987
comment by VAuroch · 2013-11-25T17:29:25.264Z · LW(p) · GW(p)

No, it isn't. Corrected my above comment to reflect that.

I'd consider sociopathy a degenerate case where morality as we understand it does not have any meaningful role in the decision-making process and never has. Any morality that tries to universalize to all humans, including sociopaths, is likely to be pure preference.

Replies from: ialdabaoth
comment by ialdabaoth · 2013-11-25T21:30:49.885Z · LW(p) · GW(p)

nod

Follow-up question: What is the likelihood that different modal forms of morality are fundamental?

I.e., suppose the dichotomy presented by George Lakoff's Moral Politics turns out to describe fundamental local maxima in morality-space, which human minds can imperfectly embody.

To math this up a little, suppose the CEV of Morality Attractor 1 computes to "maximize the absolute median QALY", while the CEV of Morality Attractor 2 computes to "maximize the amount by which the median QALY of my in-group exceeds the median QALY of all sapiences as a whole", and neither of those attractors has any particularly universal mathematical reason to favor it. Then, however the FAI searches the Morality domain for a CEV, it is equally likely to settle on some starry-eyed global Uplift as it is to produce nigh-infinite destitute subjects suffering indescribable anguish and despair so that a few Uplifted utility monsters have enough necks to rest their boots on.
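
To make the two attractors concrete, here is a toy rendering of them as objective functions over a hypothetical population; the numbers mean nothing, the point is only that the two functions rank worlds differently:

```python
# Toy versions of the two hypothetical attractors as objective functions over a
# population of QALY values with an in-group flag. Purely illustrative.
from statistics import median

population = [
    {"qaly": 70, "in_group": True},
    {"qaly": 65, "in_group": True},
    {"qaly": 40, "in_group": False},
    {"qaly": 30, "in_group": False},
    {"qaly": 20, "in_group": False},
]

def attractor_1(pop):
    """Maximize the absolute median QALY of everyone."""
    return median(p["qaly"] for p in pop)

def attractor_2(pop):
    """Maximize how far the in-group's median QALY exceeds the overall median."""
    return median(p["qaly"] for p in pop if p["in_group"]) - median(p["qaly"] for p in pop)

print(attractor_1(population), attractor_2(population))  # 40 and 27.5
```

Note that an optimizer handed attractor_2 can raise its score by lowering the out-group's QALYs, which is exactly the failure mode described above.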

And before anyone objects that Morality Attractor 2 is too appalling for anyone to seriously advocate, note that it has been the default behavior of most civilized societies for the majority of human history, so it must have SOMETHING going for it. Maybe Morality Attractor 1 just seems more accessible because it's the one advocated by the mother culture that raised us, not because it's actually what most humans tend towards as IQ/g/whatever approaches infinity.

Replies from: VAuroch
comment by VAuroch · 2013-11-25T22:31:28.318Z · LW(p) · GW(p)

First, I think you are significantly off-base in your contention that Morality Attractor 2 has been implemented in most civilized societies. What you see as Attractor 2 is, I think, better explained by Attractor 2a, "Maximize the absolute median QALY of the in-group", plus Clever Arguers throughout history appealing to this desire by pointing out how much better off the in-group was compared to a specific acceptable-target outgroup. It is, after all, much easier to provide a relative QALY surplus than an absolute QALY surplus, and our corrupted hardware is not very good at distinguishing the two. As anecdotal evidence, I consider my own morality significantly closer to 2a than 1, but definitely not similar to 2.

I would further say that it seems unlikely that the basic moral impulse is actually restricted to an arbitrary ingroup. One of those 'perfect information' aspects inherent in defining the output of the CEV would be knowing the life story of every person on the planet. Which is, if my knowledge of psychology is correct, basically an express ticket into the moral ingroup. This is why the single-child quarter-donation signs work, when appeals to the huge number of suffering children don't.

So overall, I don't find that suggestion plausible. Someone with human-typical psychology who knew every person in existence as well as we know our friends, which is basically the postulated mind whose utility function is the CEV, would inherently value all of their QALYs.

comment by A1987dM (army1987) · 2013-11-26T13:25:07.189Z · LW(p) · GW(p)

The mentions of "neurological damage 1" and "neurological damage 2" in this comment seem to suggest that EY would consider sadistic sociopaths No True Scotsmen for these purposes.

comment by GBM · 2008-07-29T05:29:44.000Z · LW(p) · GW(p)

I'm going to need some help with this one.

It seems to me that the argument goes like this, at first:

  • There is a huge blob of computation; it is a 1-place function; it is identical to right.
  • This computation balances various values.
  • Our minds approximate that computation.

Even this little bit creates a lot of questions. I've been following Eliezer's writings for the past little while, although I may well have missed some key point.

Why is this computation a 1-place function? Eliezer says at first "Here we are treating morality as a 1-place function." and then jumps to "Since what's right is a 1-place function..." without justifying that status.

What values does this computation balance? Why those values?

What reason do we have to believe that our minds approximate that computation?

Sorry if these are extremely basic questions that have been answered in other places, or even in this article - I'm trying and having a difficult time with understanding how Eliezer's argument goes past these issues. Any help would be appreciated.

Replies from: TAG
comment by TAG · 2023-05-31T14:40:53.012Z · LW(p) · GW(p)

Maybe what's really really right is an idealised form of the Big Blob of Computation. That would be moral realism (or at least species-level relativism).

Maybe it isn't and everybody's personal BBoC is where the moral buck stops. That would be subjectivism.

Those are two standard positions in metaethics. Nothing has been solved, because we don't know which one is right, and nothing has been dissolved. The traditional problem has just been restated in more sciencey terms.

comment by Tom_McCabe2 · 2008-07-29T05:59:33.000Z · LW(p) · GW(p)

"You will find yourself saying, "If I wanted to kill someone - even if I thought it was right to kill someone - that wouldn't make it right." Why? Because what is right is a huge computational property- an abstract computation - not tied to the state of anyone's brain, including your own brain."

Coherent Extrapolated Volition (or any roughly similar system) protects against this failure for any specific human, but not in general. Eg., suppose that you use various lawmaking processes to approximate Right(x), and then one person tries to decide independently that Right(Murder) > 0. You can detect the mismatch between the person's actions and Right(x) by checking against the approximation (the legal code) and finding that murder is wrong. In the limit of the approximation, you can detect even mismatches that people at the time wouldn't notice (eg., slavery). CEV also protects against specific kinds of group failures, eg., convince everybody that the Christian God exists and that the Bible is literally accurate, and CEV will correct for it by replacing the false belief of "God is real" with the true belief of "God is imaginary", and then extrapolating the consequences.

However, CEV can't protect against features of human cognitive architecture that are consistent under reflection, factual accuracy, etc. Suppose that, tomorrow, you used magical powers to rewrite large portions of everyone's brain. You would expect that people now take actions with lower values of Right(x) than they previously did. But, now, there's no way to determine the value of anything under Right(x) as we currently understand it. You can't use previous records (these have all been changed, by act of magic), and you can't use human intuition (as it too has been changed). So while the external Right(x) still exists somewhere out in thingspace, it's a moot point, as nobody can access it. This wouldn't work for, say, arithmetic, as people would rapidly discover that assuming 2 + 2 = 5 in engineering calculations makes bridges fall down.

Replies from: MarsColony_in10years
comment by MarsColony_in10years · 2015-06-19T19:04:06.572Z · LW(p) · GW(p)

This looks correct to me. CEV(my_morality) = CEV(your_morality) = CEV(yudkowsky_morality), because the differing moralities of psychologically normal humans are all extrapolations of the same basic moral fundamentals. We've all been handed the same moral foundation by evolution, unless we are mentally damaged in certain very specific ways.

However, CEV(human_morality) ≠ CEV(klingon_morality) ≠ CEV(idiran_morality). There's no reason for morality to be generalizable beyond psychologically normal humans, since any other species would have been handed at least moderately different moral foundations, even if there happened to be some convergent evolution or something.

comment by TGGP4 · 2008-07-29T07:09:49.000Z · LW(p) · GW(p)

This little accident of the Gift doesn't seem like a good reason to throw away the Gift

We've been "gifted" with impulses to replicate our genes, but many of us elect not to. I'm not as old as Steven Pinker was when he seemingly bragged of it, but I've made no progress toward reproducing and don't have any plans for it in the immediate future, though I could easily donate to a sperm bank. I could engage in all sorts of fitness-lowering activities like attending grad school, becoming a Jainist monk, engaging in amputative body-modification or committing suicide. People created by evolution do that every day, just as they kill, rob and rape.

Now let's say we want to make a little machine, one that will save the lives of children

Does it take into account the length, quantity or quality of lives when making tradeoffs between saving lives? If it seeks to avoid death, will it commit to the apocalyptic imperative as anti-natalism would seem to me to suggest? Does it seek to save fetuses or ensure that a minimum of sperm and eggs die childless? Some of these questions a machine will have to decide, and there is no decision simply coming from the axiom you gave. That's because there is no correct answer, no fact of the matter.

And then this value zero, in turn, equating to a moral imperative to wear black, feel awful, write gloomy poetry, betray friends, and commit suicide.

None of the moral skeptics here embrace such a position. I and many others deny ANY MORAL IMPERATIVE WHATSOEVER. That I don't wear all black, write any poetry, push my reputation among my friends below "no more reliable than average", or intentionally harm myself is simply what I've elected to do so far, and I reserve the option of pursuing all those "nihilist" behaviors in the future for any reason as unjustified as a coin flip.

it certainly isn't an inescapable logical justification for wearing black

There's none necessary. If you wear black you won't violate anyone's rights. There are none to violate.

the ideal morality that we would have if we heard all the arguments, to whatever extent such an extrapolation is coherent

You've questionably asserted the existence of "moral error". You also know that people have cognitive biases that cause them to go off in crazy divergent directions when exposed to more and more of the same arguments. I would hypothesize that the asymptotic direction the human brain would go in about an unresolved positivist question in the absence of empirical evidence is way off, if only because the brain isn't designed with singularities in mind. I wouldn't hold up as ideal behavior the output of a program given an asymptote of input. It's liable to crash. You might respond that the ideal computer the program would run on would have an infinite memory or disk, but that would be a different computer. Should I defer to another human being similar to me but with a seemingly infinite memory (you hear about these savants every once in a while)? I can't say. I do know that if the computer could genuinely prove that if I heard all the arguments I'd devote my life to cleaning outhouses, I'd say so much the worse for the version of me that's heard all the arguments. He's not in charge, I am.

Also, how many people can you name who engaged in seriously reprehensible actions and changed their ways because of a really good ethical argument? I know that the anti-slavery movement didn't count Aristotle among its converts, nor did Amnesty International convince Hitler or Stalin. We may like to imagine our beliefs are so superior that they would convince those old baddies, but I doubt we could. If Carlyle were brought to the present, I bet he'd be dismayed at what a bleeding heart he'd been and widen his circle of those fit for the whip.

you just mentioned intuitions, rather than using them

The intuition led to the belief. What is the distinction? "It is my intuition that removing the child from the tracks is repugnant" - "You just mentioned rather than used intuitions".

Tom McCabe: humanity's Coherent Extrapolated Volition

I think Arrow's Impossibility Theorem would argue against that being meaningful when applied to "humanity".

comment by steven · 2008-07-29T07:57:35.000Z · LW(p) · GW(p)

I found this post a lot more enlightening than the posts that it's a followup to.

TGGP, as far as I understand, Arrow's theorem is an artifact of forcing people to send only ordinal information in voting (and of enforcing IIA, which throws away the information about strength of preference between two alternatives that is available from their rankings relative to third alternatives). People voting strategically isn't an issue either when you're extrapolating them and reading off their opinions.
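
To illustrate the point, here is a tiny sketch, with hypothetical voters and utilities, of an aggregation rule that uses cardinal strengths of preference rather than rankings; rules of this form fall outside the scope of Arrow's theorem, which constrains rules that see only ordinal ballots.

```python
# Toy aggregation with cardinal utilities (hypothetical numbers). Arrow's theorem
# applies to rules that see only rankings; summing utilities uses the strength
# information that rankings throw away.

voters = {
    "v1": {"A": 1.0, "B": 0.9, "C": 0.0},
    "v2": {"A": 0.0, "B": 0.8, "C": 1.0},
    "v3": {"A": 1.0, "B": 0.7, "C": 0.2},
}

def utilitarian_totals(voters):
    totals = {}
    for prefs in voters.values():
        for option, u in prefs.items():
            totals[option] = totals.get(option, 0.0) + u
    return totals

totals = utilitarian_totals(voters)
print(totals, "winner:", max(totals, key=totals.get))
# {'A': 2.0, 'B': 2.4, 'C': 1.2} winner: B
```

Note that B wins despite being no voter's first choice, which is exactly the strength-of-preference information that ordinal ballots discard.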

comment by steven · 2008-07-29T07:59:10.000Z · LW(p) · GW(p)

"alternative" -> "alternatives"

comment by pdf23ds · 2008-07-29T08:14:57.000Z · LW(p) · GW(p)

I think lots of people are misunderstanding the "1-place function" bit. It even took me a bit to understand, and I'm familiar with the functional programming roots of the analogy. The idea is that the "1-place morality" is a closure over (i.e. reference to) the 2-place function with arguments "person, situation" that implicitly includes the "person" argument. The 1-place function that you use references yourself. So the "1-place function" is one's subjective morality, and not some objective version. I think that could have been a lot clearer in the post. Not everyone has studied Lisp, Scheme, or Haskell.
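
For readers who haven't met closures, here is a minimal Python sketch of the construction described above; the value weights and scoring scheme are hypothetical placeholders, not anything from the post.

```python
# Sketch of the closure idea: the "1-place function" is the 2-place function
# with the `person` argument already baked in (partial application).
# The scoring scheme below is a hypothetical placeholder.

def rightness_2place(person, situation):
    """2-place function: how this person's values score this situation."""
    return sum(weight * situation.get(feature, 0.0)
               for feature, weight in person["value_weights"].items())

def make_rightness_1place(person):
    """Close over `person`, returning the 1-place function that person actually uses."""
    def rightness(situation):
        return rightness_2place(person, situation)
    return rightness

alice = {"value_weights": {"happiness": 1.0, "fairness": 0.5}}
alice_right = make_rightness_1place(alice)               # Alice's own 1-place "right"
print(alice_right({"happiness": 2.0, "fairness": 1.0}))  # 2.5
```

Here `alice_right` is the subjective 1-place function being described; someone else's closure over the same 2-place function would generally compute different values.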

Overall I'm a bit disappointed. I thought I was going to learn something. Although you did resolve some confusion I had about the metacircular parts of the reasoning, my conclusions are all the same. Perhaps if I were programming an FAI the explicitness of the argument would be impressive.

As other commenters have brought up, your argument doesn't address how your moral function interacts with others' functions, or how we can go about creating a social, shared morality. Granted, it's a topic for another post (or several) but you could at least acknowledge the issue.

comment by troll34 · 2008-07-29T08:37:39.000Z · LW(p) · GW(p)

Too much rhetoric (wear black, miracle, etc.), you wandered off the point three too many times, you used incoherent examples, you never actually defended realism, you never defended the assertion of "the big computation", and for that much text there was so little actually said. A poor offering.

comment by Manon_de_Gaillande · 2008-07-29T09:05:19.000Z · LW(p) · GW(p)

This argument sounds too good to be true - when you apply it to your own idea of "right". It also works for, say, a psychopath unable to feel empathy who gets a tremendous kick out of killing. How is there not a problem with that?

Replies from: MarsColony_in10years
comment by MarsColony_in10years · 2015-06-19T20:45:44.515Z · LW(p) · GW(p)

Well, it isn't the same as a morality that is written into the fabric of the universe or handed down on a stone tablet or something, but it is the "best" we have or could hope to have (whatever "best" even means in this case). It evaluates the same (or at least the Coherent Extrapolated Volition converges) for all psychologically healthy humans. But if someone has a damaged mind, or their species simply evolved a different set of values, then they would have their own morality, and you could no more argue human morals into them than you could into a rock.

How is there not a problem with that?

Well, it certainly is a little dissatisfying. It's much better than the nihilistic alternative, though. However, those are coping problems, not problems with the logic itself. If the sky is green then I desire to believe that the sky is green.

comment by RobinHanson · 2008-07-29T10:28:29.000Z · LW(p) · GW(p)

that subset of our values that is interpersonal, prosocial, to some extent expected to be agreed-upon, which subset does not always win out in the weighing

Can we just say that evolution gave most of us such an identifiable subset, and declare a name for that? Even so, a key question remains whether we are mistaken in expecting agreement - are we built to actually agree given enough analysis and discussion, or only to mistakenly expect to agree?

comment by Roko · 2008-07-29T10:40:53.000Z · LW(p) · GW(p)

I agree with Andy Wood and Nick Tarleton. To put what they have said another way, you have taken the 2-place function

Rightness(person,act)

And replaced it with a certain unspecified unary rightness function which I will call "Eliezer's_big_computation( -- )". You have told us informally that we can approximate

Eliezer's_big_computation( X ) = happiness( X ) + survival( X ) + justice( X ) + individuality( X ) + ...

But others may define other "big computations". For example

God's_big_computation( X ) = submission( X ) + Oppression_of_women( X ) + Conquest_of_heathens( X ) + Worship_of_god( X ) + ...

How are we to decide which "big computation" encompasses that which we should pursue?

You have simply replaced the problem of deciding which actions are right with the equivalent problem of deciding which action-guiding computation we should use.
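
The schematic sums above can be written out as (hypothetical) weighted value functions; the sketch below only makes the point explicit - nothing in the code, by itself, says which weighting to use.

```python
# Two rival "big computations" as weighted sums over hypothetical feature scores.
# Nothing here chooses between them; that choice requires some prior criterion,
# which is the problem being pointed at.

def big_computation(weights):
    return lambda outcome: sum(w * outcome.get(feature, 0.0)
                               for feature, w in weights.items())

computation_1 = big_computation({"happiness": 1.0, "survival": 1.0, "justice": 1.0})
computation_2 = big_computation({"submission": 1.0, "conquest": 1.0, "worship": 1.0})

outcome = {"happiness": 3.0, "survival": 2.0, "submission": 4.0}
print(computation_1(outcome), computation_2(outcome))  # 5.0 vs 4.0 - they simply disagree
```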

Your CEV algorithm is likely to return something more like God's_big_computation( - ) than Eliezer's_big_computation( - ), because God's_big_computation more closely resembles the beliefs of the 6 billion people on this planet. And even if it did return Eliezer's_big_computation( - ), I'm not sure I agree with that outcome. In any case, I don't think you said anything new or particularly useful here; I think that we all need to think about this issue more.

As a matter of fact, Richard Hollerith and I have independently thought of a canonical notion of goodness which is objective. He calls it "goal system zero"; I call it "universal instrumental values".

comment by IL · 2008-07-29T10:43:16.000Z · LW(p) · GW(p)

Let me see if I get this straight:

Our morality is composed of a big computation that includes a list of the things that we value (love, friendship, happiness, ...) and a list of valid moral arguments (contagion backward in time, symmetry, ...). If so, then how do we discover those lists? I guess that the only way is to reflect on our own minds, but if we do that, then how do we know whether a particular value comes from our big computation, or is just part of our regular biases? And if our biases are inextricably tangled with The Big Computation, then what hope can we possibly have?

Anyway, I think it would be useful to moral progress to list all of the valid moral arguments. Contagion backward in time and symmetry seem to be good ones. Any other suggestions?

comment by Laura B (Lara_Foster) · 2008-07-29T10:53:20.000Z · LW(p) · GW(p)

I second behemoth and Nick - what do we do in the mindspace in which individuals' feelings of right and wrong disagree? What if some people think retarded children absolutely should NOT be pulled off the track? Also, what about the pastrami-sandwich dilemma? What of those who would kill 1 million unknown people, with no consequence to themselves, for a delicious sandwich?

But generally, I loved the post. You should write another post on 'Adding Up to Normality.'

comment by Laura B (Lara_Foster) · 2008-07-29T10:54:11.000Z · LW(p) · GW(p)

Just because I can't resist, a poem about human failing, the judgment of others we deem weaker than ourselves, and the desire to 'do better.' Can we?

"No Second Troy" WB Yeats, 1916 WHY should I blame her that she filled my days With misery, or that she would of late Have taught to ignorant men most violent ways, Or hurled the little streets upon the great, Had they but courage equal to desire? 5 What could have made her peaceful with a mind That nobleness made simple as a fire, With beauty like a tightened bow, a kind That is not natural in an age like this, Being high and solitary and most stern? 10 Why, what could she have done being what she is? Was there another Troy for her to burn?

comment by IL · 2008-07-29T11:23:44.000Z · LW(p) · GW(p)

P.S.: My great "Aha!" moment from reading this post is the realisation that morality is not just a utility function that maps states of the world to real numbers, but also a set of intuitions for changing that utility function.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T11:49:55.000Z · LW(p) · GW(p)

Added a section on non-universalizability:

If you hoped that morality would be universalizable - sorry, that one I really can't give back.

Well, unless we're just talking about humans. Between neurologically intact humans, there is indeed much cause to hope for overlap and coherence; and a great and reasonable doubt as to whether any present disagreement is really unresolvable, even if it seems to be about "values". The obvious reason for hope is the psychological unity of humankind, and the intuitions of symmetry, universalizability, and simplicity that we execute in the course of our moral arguments. (In retrospect, I should have done a post on Interpersonal Morality before this...)

If I tell you that three people have found a pie and are arguing about how to divide it up, the thought "Give one-third of the pie to each" is bound to occur to you - and if the three people are humans, it's bound to occur to them, too. If one of them is a psychopath and insists on getting the whole pie, though, there may be nothing for it but to say: "Sorry, fairness is not 'what everyone thinks is fair', fairness is everyone getting a third of the pie". You might be able to resolve the remaining disagreement by politics and game theory, short of violence - but that is not the same as coming to agreement on values. (Maybe you could persuade the psychopath that taking a pill to be more human, if one were available, would make them happier? Would you be justified in forcing them to swallow the pill? These get us into stranger waters that deserve a separate post.)

If I define rightness to include the space of arguments that move me, then when you and I argue about what is right, we are arguing our approximations to what we would come to believe if we knew all empirical facts and had a million years to think about it - and that might be a lot closer than the present and heated argument. Or it might not. This gets into the notion of 'construing an extrapolated volition' which would be, again, a separate post.

But if you were stepping outside the human and hoping for moral arguments that would persuade any possible mind, even a mind that just wanted to maximize the number of paperclips in the universe, then sorry - the space of possible mind designs is too large to permit universally compelling arguments. You are better off treating your intuition that your moral arguments ought to persuade others, as applying only to other humans who are more or less neurologically intact. Trying it on human psychopaths would be dangerous, yet perhaps possible. But a paperclip maximizer is just not the sort of mind that would be moved by a moral argument. (This will definitely be a separate post.)

comment by robin_brandt2 · 2008-07-29T12:01:00.000Z · LW(p) · GW(p)

Dear Eliezer, first of all, great post, thank you, I truly love you Eli!! It was really the kind of beautiful endpoint in your dance I was waiting for, and it is very much in line with my own reasoning, just a lot more detailed. I also think this could be labeled metametamorality, so some of the justified complaints do not yet apply. But the people complaining about different moral preferences are doing so with their own morality - what else could they be using? - and in doing so they are acting according to the arguments of this post. Metametamorality would be about the ontological reductionistic framework in which metamorality and morality take place. Metamorality would be about simplifying and generalizing morality into certain values, principles and rules, which we ought to follow and which groups of different sizes would try to approximate. Morality would be about applying the metamorality. But this terminology may also complicate things too much, and I might even prefer Eliezer's usage: metametamorality could also be called metamorality, and metamorality, metaethics. Anyway, I found this an amazing summary of your viewpoints, and it helped me a lot in grasping this course in Bayescraft.

I have been working for a long time on my own metamorality or metaethics, which you may take a peek at in this diagram: http://docs.google.com/File?id=d4pc9b6_188cgj9zgwz_b. It works from the same metametamoral assumptions that Eliezer does, and I have done my best to use my inbuilt moral computation to construct a coherent metamoral guide for myself and others who may be like me. For me the basic principle is, and has been for a couple of years now: there are no values outside humanity, so everything has 0 value. But being an agent with a certain set of feelings, morality and goals, I may as well feel them and use them (some of them), because it is rather fantastic, after all, that there is anything at all. Although this amazement is a human feeling produced by my evolved psychology for curiosity, it seems rather nice. It is just beautiful that there is something rather than nothing (especially in a Tegmark MUH universe), so I assign +1 to everything: particles, energies, all events, everything that exists, every piece of information. It is good, it is beautiful because it exists, and hence I love everything!! But this is only the first level of morality; I cannot and do not want to be left only with that, because that would leave me sitting still and doing nothing. So I let my evolved psychology discriminate on the highest possible general level. I let more feelings slip in gently, building up a hierarchy of morality and values on many different levels. A universe with three particles is better than one with only one. Complexity and information are beautiful, diversity is, but simplicity is also. Beauty in general, although elusive as a concept, is a very important but complicated terminal value for me, consisting of a lot of parts, and I believe it is in some sense for everybody, although seldom explicitly. Truth is also one of the great terminal values, and I think David Deutsch expressed it nicely in a TED talk once. The Good is also of great importance, since it allows the expansion of beauty and knowledge about truth. So important for me is:

  • To construct a metamorality that is very general
  • That is not tied only to experience, although the values may originate from our experience and psychology and may very often be the same as pleasure in our experience. This is mostly because of elegance and a stability concern, also because it may affect our belief in it more, i.e. stronger placebo.
  • For me a universe with matter distributed randomly is uglier than a universe consisting only of a great book or an intelligent, complex construction, even though nobody might experience it.
  • Of course experience and intelligence are among the greatest and most beautiful things ever. So I value them extremely highly - higher than anything else, because it makes everything else so much more beautiful when there is someone to experience it.

The hierarchical structure of my value system is not complete and never will be, but I will try to continue to approximate it, and I find it valuable and moral to do so, as it might help myself and others in deciding on values and choosing wisely. It might not be the value system of choice for most individuals, but some people might find it appealing. Sorry for the sketchy nature of this comment, I just needed to get it out. I hope I can get some comment from Eliezer, but I may as well wait until I have enough strength to make this thing better and mail it then...

comment by robin_brandt2 · 2008-07-29T12:08:01.000Z · LW(p) · GW(p)

P.S. So my addition is really: choose a stable value structure that feels right, try to maximize it, try to make it better, and change it when you feel it is right. I have my own high-level suggestion of Beauty, Truth and the Good, and I later discovered that Plato and a lot of others seem to argue for the same three...

comment by Toby_Ord2 · 2008-07-29T12:08:02.000Z · LW(p) · GW(p)

There are some good thoughts here, but I don't think the story is a correct and complete account of metamorality (or as the rest of the world calls it: metaethics). I imagine that there will be more posts on Eliezer's theory later and more opportunities to voice concerns, but for now I just want to take issue with the account of 'shouldness' flowing back through the causal links.

'Shouldness' doesn't always flow backwards in the way Eliezer mentioned. E.g., suppose that in order to push the button, you need to shoot someone who will fall down on it. This would make the whole thing impermissible. If we started by judging saving the child as something we should do, then the backwards chain prematurely terminates when we find that the only way to achieve this involves killing someone. Obviously, we would really want to consider not just the end state of the chain when working out whether we should save the child, but to evaluate the whole sequence in the first place. For if the end state is only possible given something that is impermissible, then it wasn't something we should bring about in the first place. Indeed, I think this 'following back' from 'should' is a rather useless description. It is true that if we should (all things considered) do X, then we should do all the things necessary for X, but we can only know whether we should do X (all things considered) if we have already evaluated the other actions in the chain. It is a much more fruitful account to look forward, searching the available paths and then selecting the best one. This is how it is described by many philosophers, including a particularly precise treatment by Fred Feldman in his paper World Utilitarianism and his book Doing the Best We Can.

(Note also that this does not assume consequentialism is true: deontologists can define the goodness of paths in a way that involves things other than the goodness of the consequences of the path.)

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T13:21:22.000Z · LW(p) · GW(p)

On reflection, there should be a separate name for the space of arguments that change our terminal values. Using "metaethics" to indicate beliefs about the nature of (ontology of) morality, would free up "metamorals" to indicate those arguments that change our terminal values. So I bow to Zubon and standard usage - even though it still sounds wrong to me.

Toby, the case of needing to shoot someone who will fall down on the button, is of course very easy for a consequentialist to handle; wrongness flows backward from the shooting, as rightness flows backward from the button, and the wrongness outweighs the rightness.

Nonetheless a great deal of human ethics can be understood in terms of deontological-type rules for handling such conflicts - at any particular event-node - and these in turn can often be understood in terms of simple rules that try to compensate for human biases. I.e., "The end doesn't justify the means" makes sense for humans because of a systematic human tendency to overestimate how likely the end is to be achieved, and to underestimate the negative consequences of dangerous means, especially if the means involves taking power for yourself and the end is the good of the tribe. I have always called such categorically phrased injunctions ethics, which would make "ethics" an entirely different subject from "metaethics". This Deserves A Separate Post.

There may also be moral values that only make sense when we value a 4D crystal, not a 3D slice - or to put it more precisely, moral values that only make sense when we value a thick 4D slice rather than a thin 4D slice; it's not as if you can have an instantaneous experience of happiness. "People being in control of their own lives" might only make sense in these terms, because of the connection between past and future. This too is an advanced topic.

It seems that despite all attempts at preparation, there are many other topics I should have posted on first.

comment by Unknown · 2008-07-29T13:37:17.000Z · LW(p) · GW(p)

As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI. For according to this system, he is morally obliged to make his AI instantiate his personal morality. But it is quite impossible that the complicated calculation in Eliezer's brain should be exactly the same as the one in any of us: and so by our standards, Eliezer's morality is immoral. And this opinion is subjectively objective, i.e. his morality is immoral and would be even if all of us disagreed. So we are all morally obliged to prevent him from inflicting his immoral AI on us.

Replies from: None
comment by [deleted] · 2013-10-12T02:24:31.772Z · LW(p) · GW(p)

This is a really, really hasty non-sequitur. Eliezer's morality is probably extremely similar to mine; thus, the world would be a much, much better place, even according to my specification, with an AI running Eliezer's morality than with no AI running at all (or, worse, a paperclip maximizer). Eliezer's morality is absolutely not immoral; it's my morality ± 1% error, as opposed to some other nonhuman goal structure which would be unimaginably bad on my scale.

comment by Zubon · 2008-07-29T13:38:29.000Z · LW(p) · GW(p)

Suggested summary: "There is nothing else." That is the key sentence. After much discussion of morals and metas, it comes down to: "You go on with the same morals as before, and the same moral arguments as before." The insight offered is that there is no deeper insight to offer. The recursion will bottom out, so bite the bullet and move on.

Yet another agreement on the 1-Place and 2-Place problem, and I read it after the addition. CEV goes around most of that for neurologically intact humans, but the principle of "no universally compelling arguments" means that we still have right-Eliezer and right-Robin, even if those return the same values to 42 decimal places. If we shut up and multiply sufficiently large values, that 43rd decimal place is a lot of specks and torture.

(Lots of English usage sounds wrong. You know enough Japanese to know how wrong "I bow to Zubon" sounds. But maybe you can kick off some re-definition of terms. A century of precedent isn't much in philosophy.)

comment by Toby_Ord2 · 2008-07-29T13:51:49.000Z · LW(p) · GW(p)

wrongness flows backward from the shooting, as rightness flows backward from the button, and the wrongness outweighs the rightness.

I suppose you could say this, but if I understand you correctly, then it goes against common usage. Usually those who study ethics would say that rightness is not the type of thing that can add with wrongness to get net wrongness (or net rightness for that matter). That is, if they were talking about that kind of thing, they wouldn't use the word 'rightness'. The same goes for 'should' or 'ought'. Terms used for this kind of stuff that can add together: [goodness / badness], [pro tanto reason for / pro tanto reason against].

If you merely meant that any wrong act on the chain trumps any right act further in the future, then I suppose these words would be (almost) normal usage, but in this case it doesn't deal with ethical examples very well. For instance, in the consequentialist case above, we need to know the degree of goodness and badness in the two events to know whether the child-saving event outweighs the person-shooting event. Wrongness trumping rightness is not a useful explanation of what is going on if a consequentialist agent was considering whether to shoot the person. If you want the kind of additivity of value that is relevant in such a case, then call it goodness, not rightness/shouldness. And if this is the type of thing you are talking about, then why not just look at each path and sum the goodness in it, choosing the path with the highest sum? Why say that we sum the goodness in a path in reverse chronological order? How does this help?

Regarding the terms 'ethics' and 'morality', philosophers use them to mean the same thing. Thus, 'metamorality' would mean the same thing as 'metaethics'; it is just that no-one else uses the former term (Overcoming Bias is the top page on Google for that term). There is nothing stopping you from using 'ethics' and 'morality' to mean different things, but it is not standard usage and it would lead to a lot of confusion when trying to explain your views.

comment by Roko · 2008-07-29T13:54:35.000Z · LW(p) · GW(p)

Eliezer: "if you were stepping outside the human and hoping for moral arguments that would persuade any possible mind, even a mind that just wanted to maximize the number of paperclips in the universe, then sorry - the space of possible mind designs is too large to permit universally compelling arguments."

comment by Caledonian2 · 2008-07-29T14:24:31.000Z · LW(p) · GW(p)

[THIS WOULD GET DELETED]Other than wasting time and effort, there's nothing wrong with reinventing the wheel. At least there's a useful result, even if it's a duplicated and redundant one.

But this is just reinventing the perpetual motion machine. Not only is it a massive falling-into-error, it's retracing the steps of countless others that have spent the last few centuries wandering in circles. Following in the footsteps of others is only laudable when those footsteps lead somewhere.

There's not a single operational definition here, just claims that refer to unspoken, unspecified assumptions and assertions of desired conclusions. Eliezer, you've missed your calling. You should have been a theologian. Making definitive proclamations about things you can't define and have a bunch of incoherent beliefs about is clearly your raison d'être.[/THIS WOULD GET DELETED]

Replies from: ata
comment by ata · 2011-01-15T04:02:18.633Z · LW(p) · GW(p)

Was "Operational definitions!" the entirety of this guy's philosophy?

comment by Caledonian2 · 2008-07-29T14:31:07.000Z · LW(p) · GW(p)

Quick addition, in response to Roko's dissention:

Mathematicians routinely prove things about infinitely large sets. They do this by determining what properties the sets have, then seeing how those properties interact logically. The size of the space of all potential minds has nothing to do with whether we can construct universally compelling arguments about that space. It is in fact guaranteed that we can make universal arguments about it, because the space has a definition that determines what is and is not included within.

[THIS WOULD GET DELETED]The reason you are unable to make such arguments is that you're unwilling to do any of the rudimentary tasks necessary to do so. You've accomplished nothing but making up names for ill-defined ideas and then acting as though you'd made a breakthrough.

On the off-chance that you actually want to contribute something meaningful to the future of humanity, I suggest you take a good, hard look at your other motivations - and the gap between what you've actually accomplished and your espoused goals.[/THIS WOULD GET DELETED]

comment by Unknown · 2008-07-29T14:45:02.000Z · LW(p) · GW(p)

After thinking more about it, I might be wrong: actually the calculation might end up giving the same result for every human being.

Caledonian: what kind of motivations do you have?

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T14:45:13.000Z · LW(p) · GW(p)

Okay, for the future I'll just delete the content-free parts of Caledonian's posts, like those above. There do seem to be many readers who would prefer that he not be banned outright. But given the otherwise high quality of the comments on Overcoming Bias, I really don't think it's a good idea to let him go on throwing up on the blog.

comment by Jef_Allbright · 2008-07-29T14:45:14.000Z · LW(p) · GW(p)

Watching the ensuing commentary, I'm drawn to wishfully imagine a highly advanced Musashi, wielding his high-dimensional blade of rationality such that in one stroke he delineates and separates the surrounding confusion from the nascent clarity. Of course no such vorpal katana could exist, for if it did, it would serve only to better clear the way for its successors.

I see a preponderance of viewpoints representing, in effect, the belief that "this is all well and good, but how will this guide me to the one true prior, from which Archimedean point one might judge True Value?"

I see some who, given a method for reliably discarding much which is not true, say scornfully in effect "How can this help me? It says nothing whatsoever about Truth itself!"

And then there are the few who recognize we are each like leaves of a tree rooted in reality, and while we should never expect exact agreement between our differing subjective models, we can most certainly expect increasing agreement -- in principle -- as we move toward the root of increasing probability, pragmatically supporting, rather than unrealistically affirming, the ongoing growth of branches of increasing possibility. [Ignoring the progressive enfeeblement of the branches necessitating not just growth but eventual transformation.]

Eliezer, I greatly appreciate the considerable time and effort you must put into your essays. Here are some suggested topics that might help reinforce and extend this line of thought:

  • Two communities, separated by a chasm: Would it be seen as better (perhaps obviously) to build a great bridge between them, or to consider the problem in terms of an abstract hierarchy of values, for example involving impediments to transfer of goods, people, ... ultimately information, for which building a bridge is only a special-case solution? In general, is any goal not merely a special case (and utterly dependent on its specifiability) of values-promotion?

  • Fair division, etc.: Probably nearly all readers of Overcoming Bias are familiar with a principled approach to fair division of a cake into two pieces, and higher-order solutions have been shown to be possible with attendant computational demands. Similarly, Rawls proposed that we ought to be satisfied with social choice implemented by best-known methods behind a veil of ignorance as to specific outcomes in relation to specific beneficiaries. Given the inherent uncertainty of specific future states within any evolving system of sufficient complexity to be of moral interest, what does this imply about shifting moral attention away from expected consequences, and toward increasingly effective principles reasonably optimizing our expectation of improving, but unspecified and indeed unspecifiable, consequences? Bonus question: How might this apply to Parfit's Repugnant Conclusion and other well-recognized "paradoxes" of consequentialist utilitarianism?

  • Constraints essential for meaningful growth: Widespread throughout the "transhumanist" community appears the belief that considerable, if not indefinite, progress can be attained via the "overcoming of constraints." Paradoxically, the accelerating growth of possibilities that we experience arises not from overcoming constraints, but rather from embracing them in ever-increasing technical detail. Meaningful growth is necessarily within an increasingly constrained possibility space -- fortunately there's plenty of fractal interaction area within any space of real numbers -- while unconstrained growth is akin to a cancer. An effective understanding of meaningful growth depends on an effective understanding of the subjective/objective dichotomy.

Thanks again for your substantial efforts.

comment by Psy-Kosh · 2008-07-29T14:52:46.000Z · LW(p) · GW(p)

Caledonian: uh... he didn't say you couldn't make arguments about all possible minds, he was saying you couldn't construct an argument that's so persuasive, so convincing that every possible mind, no matter how unusual its nature, would automatically be convinced by that argument.

It's not a matter of talking about minds, it's a matter of talking to minds.

Mathematicians figure out things about sets. But they're not trying to convince the sets themselves about those things. :)

comment by ME3 · 2008-07-29T14:59:46.000Z · LW(p) · GW(p)

You know, I think Caledonian is the only one who has the right idea about the nature of what's being written on this blog. I will miss him because I don't have the energy to battle this intellectual vomit every single day. And yet, somehow I am forced to continue looking. Eliezer, how does your metamorality explain the desire to keep watching a trainwreck?

comment by Larry_D'Anna · 2008-07-29T15:01:45.000Z · LW(p) · GW(p)

Roko: You think you can convince a paperclip maximizer to value human life? Or do you think paperclip maximizers are impossible?

comment by Allan_Crossman · 2008-07-29T15:04:16.000Z · LW(p) · GW(p)

Eliezer: It's because when I say right, I am referring to a 1-place function

Like many others, I fall over at this point. I understand that Morality_8472 has a definite meaning, and therefore it's a matter of objective fact whether any act is right or wrong according to that morality. The problem is why we should choose it over Morality_11283.

Of course you can say, "according to Morality_8472, Morality_8472 is correct" but that's hardly helpful.

Ultimately, I think you've given us another type of anti-realist relativism.

Eliezer: But if you were stepping outside the human and hoping for moral arguments that would persuade any possible mind, even a mind that just wanted to maximize the number of paperclips in the universe, then sorry - the space of possible mind designs is too large to permit universally compelling arguments.

It's at least conceivable that there could be objective morality without universally compelling moral arguments. I personally think there could be an objective foundation for morality, but I wouldn't expect to persuade a paperclip maximizer.

comment by Larry_D'Anna · 2008-07-29T15:10:12.000Z · LW(p) · GW(p)

Caledonian: He isn't using "too-big" in the way you are interpreting it.

The point is not: Mindspace has a size X, X > Y, and any set of minds of size > Y cannot admit universal arguments.

The point is: For any putative universal argument you can cook up, I can cook up a mind design that isn't convinced by it.

The reason that we say it is too big is because there are subsets of Mindspace that do admit universally compelling arguments, such as (we hope) neurologically intact humans.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T15:12:26.000Z · LW(p) · GW(p)

Unknown: As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI. For according to this system, he is morally obliged to make his AI instantiate his personal morality.

Unknown, do I really strike you as the sort of person who would do something that awful just because I was "morally obliged" to do it? Screw moral obligation. I can be nice in defiance of morality itself, if I have to be.

Of course this really amounts to saying that I disagree with your notion of what I am "morally obliged" to do. Exercise: Find a way of construing 'moral obligation' that does not automatically 'morally obligate' someone to take over the world. Hint: Use a morality more complicated than that involved in maximizing paperclips.

Allan: The problem is why we should choose it over Morality_11283.

You just used the word "should". If it doesn't mean Morality_8472, or some Morality_X, what does it mean? How do you expect to choose between successor moralities without initial morality?

I personally think there could be an objective foundation for morality, but I wouldn't expect to persuade a paperclip maximizer.

This just amounts to defining should as an abstract computation, and then excluding all minds that calculate a different rule-of-action as "choosing based on something other than morality". In what sense is the morality objective, besides the several senses I've already defined, if it doesn't persuade a paperclip maximizer?

Replies from: TAG
comment by TAG · 2023-05-31T14:14:31.416Z · LW(p) · GW(p)

Allan: The problem is why we should choose it over Morality_11283.

You just used the word “should”. If it doesn’t mean Morality_8472, or some Morality_X, what does it mean?

Whatever is rationally preferable. The whole point of doing moral philosophy is that you already have a set of ethically-neutral epistemic norms with which you can address metaethical issues.

comment by Vladimir_Nesov · 2008-07-29T15:16:44.000Z · LW(p) · GW(p)

Eliezer: You go on with the same morals as before, and the same moral arguments as before. There is no sudden Grand Overlord Procedure to which you can appeal to get a perfectly trustworthy answer.

'Same moral arguments as before' doesn't seem like an answer, in the same sense that 'you should continue as before' is not good advice for cavemen (who could benefit from being brought into modern civilization). If cavemen can vaguely describe what they want from their environment, this vague explanation can be used to produce an optimized environment by a sufficiently powerful optimization process that is external to the cavemen, based on the precise structure of the current environment. It won't go all the way there (otherwise, the problem of the ignorant jinn would kick in), but it can really help.

Likewise, the problem of 'metamorality' is in producing a specification of goals that is better than the vague explanations of moral philosophers. For that, we need to produce a vague explanation of what we think morality is, and set an optimization process on these explanations to produce a better description of morality, based on the current state of the environment (or, specifically, humanity and human cognitive architecture).

These posts sure clarify something for the confused, but what is the content in the sense I described? I hope the above quotation was not a curiosity stopper.

Replies from: MarsColony_in10years
comment by MarsColony_in10years · 2015-06-19T21:48:04.030Z · LW(p) · GW(p)

we need to produce vague explanation of what we think morality is, and set an optimization process on these explanations to produce a better description of morality

Agreed, but this post isn't it, and wasn't meant to be. This post basically nailed down what form morality should take. Perhaps it could be expressed as a summation of all our thousand shards of desire. In order to actually compute this, we would use a function which Yudkowsky calls Coherent Extrapolated Volition. That's what he describes as "what we would come to believe if we knew all empirical facts and had a million years to think about it". Actually calculating morality is left as an exercise for the reader.

comment by Sebastian_Hagen2 · 2008-07-29T15:20:20.000Z · LW(p) · GW(p)

Thank you for this post. "Should" being a label for results of the human planning algorithm in backward-chaining mode, the same way that "could" is a label for results of the forward-chaining mode, explains a lot. It's obvious in retrospect (and unfortunately, only in retrospect) to me that the human brain would do both kinds of search in parallel; in big search spaces, the computational advantages are too big not to do it.
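
A toy sketch of the two search modes, on a hypothetical action graph: forward chaining enumerates everything reachable from the current state ("could"), while backward chaining regresses from the goal to find the actions that lead there ("should", relative to that goal).

```python
# Toy forward vs. backward chaining on a hypothetical action graph (no cycles).
# Edges: state -> {action: next_state}.
graph = {
    "start":           {"press_button": "string_pulled", "walk_away": "nothing_happens"},
    "string_pulled":   {"wait": "switch_flipped"},
    "switch_flipped":  {"wait": "train_stops"},
    "nothing_happens": {},
    "train_stops":     {},
}

def could(state):
    """Forward chaining: all states reachable from `state`."""
    seen, frontier = set(), [state]
    while frontier:
        for nxt in graph[frontier.pop()].values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def should(start, goal):
    """Backward chaining: regress from `goal` through action edges back to `start`."""
    inverse = {}
    for state, actions in graph.items():
        for action, nxt in actions.items():
            inverse.setdefault(nxt, []).append((state, action))
    plan, state = [], goal
    while state != start:
        preds = inverse.get(state)
        if not preds:
            return None              # goal unreachable
        state, action = preds[0]     # toy choice: follow the first predecessor found
        plan.append(action)
    return list(reversed(plan))

print(could("start"))                  # the "could" set
print(should("start", "train_stops"))  # ['press_button', 'wait', 'wait'] - the "should" chain
```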

I found two minor syntax errors in the post: "Could make sense to ..." - did you mean "Could it make sense to ..."? "(something that has a charge of should-ness" - that parenthesis is never closed.

Unknown wrote:

As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI.
Speak for yourself. I don't think EliezerYudkowsky::Right is quite the same function as SebastianHagen::Right, but I don't see a real chance of getting an AI that optimizes only for SebastianHagen::Right accepted as sysop. I'd rather settle for an acceptable compromise in what values our successor-civilization will be built on than see our civilization being stomped into dust by an entirely alien RPOP, or destroyed by another kind of existential catastrophe.

comment by Richard8 · 2008-07-29T15:43:47.000Z · LW(p) · GW(p)

Suppose we were to write down all (input, output) pairs for the ideal "one-place function" described by Eliezer on an oblong stone tablet somewhere. This stone tablet would then contain perfect moral wisdom. It would tell us the right course of action in any possible situation.

This tablet would be the result of computation, but it's computation that nobody can actually do, as we currently only have access to approximations to the ideal Morality(X) function. Thus, as far as we're concerned, this tablet is just a giant look-up table. Its contents are a brute fact about the universe, like the ratio of the masses of the proton and the electron. If we are confronted with a moral dilemma, and our personal ideas of right and wrong contradict the tablet, this will always be a result of our own morality functions poorly approximating the ideal. In such a situation, we should override our instincts and go with the tablet every time.

In other words, according to Eliezer's model, in a universe where this tablet exists morality is given.

This is also true of a universe where the tablet does not exist (such as ours--it wouldn't fit!).

So Eliezer has just rediscovered "morality is the will of God", except he's replacing "God" with a giant block of stone somewhere in a hypothetical universe. It's not clear to me that this is an improvement.

It seems to me that the functional difference is that Eliezer believes he can successfully approximate the will of the Giant Hypothetical Oblong Stone Tablet out of his own head. If George Washington says "Slavery is sometimes just," Eliezer does not take this assertion seriously; he does not start trying to re-work his personal GHOST-approximator to take Washington's views into account. Rather he says, "I know that slavery is wrong, and I approximate the GHOST, so slavery is wrong," ignoring the fact that all men--including Washington--approximate the GHOST as best they can. Worse, by emphasizing the process of making, weighing and pondering moral "arguments", he privileges the verbally and quantitatively quick over the less intelligent, even though the correlation between being good with words and having a good GHOST-approximator is nowhere shown.

Everyone's GHOST-approximator is shaped by his environment. If the modern world encourages people to deny the GHOST in particular ways, and Eliezer indeed does so, then he would not be able to tell. His tool for measuring, his personal GHOST-finder, would have been twisted. His friends' and respected peers' GHOST-approximators might all be twisted in the same way, so nobody would point out his error and he would have no opportunity to correct it. He would use his great skill with words to try to convince everyone that his personal morality was correct. He and people like him might well succeed. His assertion of moral progress would then merely be the statement that the modern world reflects his personal biases--or perhaps that he reflects the biases of the modern world.

I'm concerned that the metamorality described by Eliezer will encourage self-named rationalists to worship their own egos, placing their personal imperfect GHOST-approximators--all shaped by the moral environment of the modern world--at the same level as those in past ages placed the will of God. Perhaps this is not Eliezer's intention. But to do otherwise, to look beyond the biases of the present day, one would have to acknowledge that the GHOST-readers of our ancestors may in some ways have been better than ours. This would require humility; and pride cures humility.

comment by Laura B (Lara_Foster) · 2008-07-29T15:49:30.000Z · LW(p) · GW(p)

Caledonian: [THIS WOULD GET DELETED]The reason you are unable to make such arguments is that you're unwilling to do any of the rudimentary tasks necessary to do so. You've accomplished nothing but making up names for ill-defined ideas and then acting as though you'd made a breakthrough. On the off-chance that you actually want to contribute something meaningful to the future of humanity, I suggest you take a good, hard look at your other motivations - and the gap between what you've actually accomplished and your espoused goals.[/THIS WOULD GET DELETED]

This is NOT that bad a point! Don't delete that! If we're considering cognitive biases, then it makes sense to consider the biases of our beloved leader, who might be so clever as to convince all of us to walk directly off of a cliff... Who is the pirate king at the helm of our ship? 'What are your motivations?' is a good question indeed - though not one I expect answered in one post or right away.

Also, I found reading this post very satisfying, but that might just be because it's brain candy confirming my justness in believing what I already believed... It's good to be skeptical, especially of things that say, 'You can feel it's right! And it's ok that there's no external validation...' Tell that to the Nazis who thought Jews were not part of the human species...

comment by Mike_Blume · 2008-07-29T15:49:51.000Z · LW(p) · GW(p)

I'm still wrestling with this here -

Do you claim that the CEV of a pygmy father would assert that his daughter's clitoris should not be sliced off? Or that the CEV of a petty thief would assert that he should not possess my iPod?

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T16:09:00.000Z · LW(p) · GW(p)

Mike Blume: Do you claim that the CEV of a pygmy father would assert that his daughter's clitoris should not be sliced off? Or that the CEV of a petty thief would assert that he should not possess my iPod?

Mike, a coherent extrapolated volition is generally something you do with more than one extrapolated volition at once, though I suppose you could extrapolate a single human's volition into a spread of outcomes and look for coherence in the spread. But this level of metaethics is of interest primarily to FAIfolk, I would think.

With that said, if I were building a Friendly AI, I would probably be aiming to construe 'extrapolated volitions' across at least the same kind of gaps that separate Archimedes from the modern world. Whether you can do this on a strictly individual extrapolation - whether Archimedes, alone in a spacesuit and thinking, would eventually cross the gap on his own - is an interesting question.

At the very least, you should imagine the pygmy father having full knowledge of the alternate lives his daughter would lead, as though he had lived them himself - though that might or might not imply full empathy, it would at the least imply full knowledge.

And at the very least, imagine the petty thief reading through everything ever written in the Library of Congress, including everything ever written about morality.

This advice is hardly helpful in day-to-day moral reasoning, of course, unless you're actually building an AI with that kind of extrapolative power.

Vladimir Nesov: 'Same moral arguments as before' doesn't seem like an answer, in the same sense as 'you should continue as before' is not good advice for cavemen (who could benefit from being brought into modern civilization). If cavemen can vaguely describe what they want from environment, this vague explanation can be used to produce optimized environment by sufficiently powerful optimization process that is external to cavemen...

At this point you're working with Friendly AI. Then, indeed, you have legitimate cause to dip into metaethics and make it a part of your conversation.

comment by Laura B (Lara_Foster) · 2008-07-29T16:26:00.000Z · LW(p) · GW(p)

Unknown: "But it is quite impossible that the complicated calculation in Eliezer's brain should be exactly the same as the one in any of us: and so by our standards, Eliezer's morality is immoral. And this opinion is subjectively objective, i.e. his morality is immoral and would be even if all of us disagreed. So we are all morally obliged to prevent him from inflicting his immoral AI on us"

Well, I would agree with this point if I thought what Eliezer was going to inflict upon us was so out of line with what I want that we would be better off without it. Since, you know, NOT dying doesn't seem like such a bad thing to me, I'm not going to complain, when he's one of the only people on Earth actually trying to make that happen...

On the other hand, Eliezer, you are going to have to answer to millions if not billions of people protesting your view of morality, especially this facet of it (the not dying thing), so yeah, learn to be diplomatic. You are NOT allowed to fuck this up for the rest of us!

comment by sophiesdad · 2008-07-29T16:45:00.000Z · LW(p) · GW(p)

Unknown wrote:
As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI.

As Bayesians, educated by Mr. Yudkowsky himself, I think we all know the probability of such an event is quite low. In 2004, in the most moving and intelligent eulogy I have ever read, Mr. Y stated: "When Michael Wilson heard the news, he said: "We shall have to work faster." Any similar condolences are welcome. Other condolences are not." Somewhere, some person or group is working faster, but at the Singularity Institute, all the time is being spent on somewhat brilliant and very entertaining writing. I shall continue to read and reflect, for my own enjoyment. But I hope those others I mentioned have Mr. Y's native abilities, because I agree with Woody Allen: "I don't want to achieve immortality through my work. I want to achieve it by not dying."

comment by josh2 · 2008-07-29T17:17:00.000Z · LW(p) · GW(p)

Any chance of a post summarizing all of the building block posts for this topic, like you did with your physics posts?
I hate to be a beggar, but that would be very helpful.

comment by Laura B (Lara_Foster) · 2008-07-29T17:19:00.000Z · LW(p) · GW(p)

Just another point as to why important, megalomaniacal types like Eliezer need to have their motives checked:
Frank Vertosick, in his book "When the Air Hits Your Brain: Tales from Neurosurgery," about a profession I am seriously considering, describes what becomes of nearly all people taking such power over life and death:

"He was the master... the 'ptototypical surgical psychopath' - someone who could render a patient quadriplegic in the morning, play golf in the afternoon, and spend the evening fretting about that terrible slice off the seventh tee. At the time this seemed terrible, but I soon learned he was no different than any other experierienced neurosurgeon in this regard... I would need to learn not to cry at funerals."

I had an interesting conversation with a fellow traveler about morality in which he pointed out that 'upright' citizens will commit the worst atrocities in the name of a greater good that they think they understand... Maybe some absolute checks are required on actions, especially those of people who might actually have a lot of power over the outcome of the future. What becomes of the group led by a man who is simultaneously Achilles and Agamemnon?

comment by Unknown3 · 2008-07-29T17:25:00.000Z · LW(p) · GW(p)

About the comments on compromise: that's why I changed my mind. The functions are so complex that they are bound to be different in the complex portions, but they also have simplifying terms in favor of compromise, so it is possible that everyone's morality will end up the same when this is taken into account.

As for the probability that Eliezer will program an AI, it might not be very low, but it is extremely low that his will be the first, simply because so many other people are trying.

comment by TGGP2 · 2008-07-29T17:26:00.000Z · LW(p) · GW(p)

I'm near Unknown's position. I don't trust any human being with too much power. No matter how nice they seem at first, history indicates to me that they inevitably abuse it. We've been told that a General AI will have power beyond any despot known to history. Am I supposed to have that much reliance on the essential goodness within Eliezer's heart? And in case anyone brings this up, I certainly don't trust the tyranny of the majority either. I don't recognize any moral obligation to stop it because I don't recognize any obligations at all. Also, I might not live to see him or his followers immanentize the Eschaton.

Female circumcision is commonly carried out by women who've undergone the procedure themselves. So I don't think the Pygmy father will be convinced.

comment by Roko · 2008-07-29T17:33:00.000Z · LW(p) · GW(p)

Larry: You think you can convince a paperclip maximizer to value human life? Or do you think paperclip maximizers are impossible?

I don't think that convincing arbitrary minds is the point. The point is that there does exist a canonical system of values. Just because there's an objectively true and canonical morality doesn't mean that every mind in existence can be persuaded to follow it: some minds are simply not rational.

Perhaps I should not have said "I disagree" to Eli, I should have said "what you say is trivially true, but it misses the point"

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-29T17:37:00.000Z · LW(p) · GW(p)

Female circumcision is commonly carried out by women who've undergone the procedure themselves.

Then they don't know the true difference between the two possible lives, do they?

comment by Matt_Simpson · 2008-07-29T17:43:00.000Z · LW(p) · GW(p)

I see this line of thinking coming directly out of Hume. Some of Hume's main points, as I read him:

1. Morality flows straight from humanity's values, and that's it.

2. Morality is universalizable among humans because of the psychological commonality.

3. What we are really doing in ethics is trying to find general principles which explain the values we have, then we can use the general principle to make ethical decisions. This is another label for trying to define the big abstract computation in our heads so that we can better optimize it. Hume never really questions our ethical beliefs; he just takes them as given and tries to understand them.

I'm very interested in how Eliezer gets from his meta-ethics to utilitarianism. Many an experiment has been thought for the sole purpose of showing how utilitarianism is in direct conflict with our moral intuitions. On the other hand, the same can be said for deontological ethics.

comment by Boris_Burkov · 2008-07-29T17:51:00.000Z · LW(p) · GW(p)

I don't understand why it must be a given that things like love, truth, beauty, murder, etc. are universal moral truths that are right or wrong independent of the person computing the morality function. I know you frown upon mentioning evolutionary psychology, but is it really a huge stretch to surmise that the more even-keeled, loving and peaceful tribes of our ancestors would out-survive the wilder warmongers who killed each other off? Even if their good behavior was not genetic, the more "moral" leaders would teach/impart their morality to their culture until it became a general societal truth. We find cannibalism morally repugnant, yet for some long isolated islander tribes it was totally normal and acceptable, what does this say about the universal morality of cannibalism?

In short, I really enjoyed reading your insight on evaluating morality by looking backwards from results, and your idea of a hidden function that we all approximate is a very elegant idea, but I still don't understand how you saying "murder is wrong no matter whether I think it's right or not" does not amount to a list of universal moral postulates sitting somewhere in the sky.

comment by Laura B (Lara_Foster) · 2008-07-29T17:56:00.000Z · LW(p) · GW(p)

TGGP:
I have great sympathy with this position. An incorrectly formatted AI is one of the biggest fears of the Singularity Institute, mainly because there are so many more ways to be way wrong than even slightly right about it... It might be that the task of making an actually friendly AI is just too difficult for anyone, and our efforts should be spent in preventing anyone from creating a generally intelligent AI, in the meantime trying to figure out, with our imperfect human brains and the crude tools at our disposal, how to make uploads ourselves or create other physical means of life-extension... No idea. The particulars are out of my area of expertise. I might keep your brain from dying a little longer though... (stroke research)

comment by Silas · 2008-07-29T18:03:00.000Z · LW(p) · GW(p)

Matt Simpson: Many an experiment has been thought for the sole purpose of showing how utilitarianism is in direct conflict with our moral intuitions.

I disagree, or you're referring to something I haven't heard of. If I know what you mean here, those are a species of strawman ("act") utilitarianism that doesn't account for the long-term impact and adjustment of behavior that results.

(I'm going to stop giving the caveats; just remember that I accept the possibility you're referring to something else.)

For example, if you're thinking about cases where people would be against a doctor deciding to carve up a healthy patient against his will to save ~40 others, that's not rejection of utilitarianism. It can be recognition that once a doctor does that, people will avoid them in droves, driving up risks all around.

Or, if you're referring to the case of how people would e.g. refuse to divert a train's path so it hits one person instead of five, that's not necessarily an anti-utilitarian intuition; there are many factors at play in such a scenario. For example, the one person may be standing in a normally safe spot and so consented to a lower level of risk, and so by diverting the train, you screw up the ability of people to see what's really safe, etc.

comment by JulianMorrison · 2008-07-29T18:04:00.000Z · LW(p) · GW(p)

So, what do you do about inter-morality? Suppose a paperclip maximizer with the intellect and self-improvement limits of a typical human thirteen-year-old. They are slightly less powerful than you and can be expected to stay that way, so they aren't an existential threat. They are communicative and rational but they have one terminal value, and it's paperclips.

How ought a moral human to react? Can you expect to negotiate an inter-morality? Or must the stronger party win by force and make the weaker party suffer non-fulfillment? Is this mis-treatment and immoral? Ought you to allow them a bale of wire so they can at least make some paperclips?

comment by Nick_Tarleton · 2008-07-29T18:25:00.000Z · LW(p) · GW(p)

I second TGGP that the mention of wearing black etc. is ridiculous.

Lara: The portion you quote from Caledonian isn't at all well-defined itself; it's a near-pure insult hinting at, but not giving, actual arguments. I fully support its deletion. Also, Eliezer isn't saying "keep on believing what you believe", but "keep on following the process you have been"; he allows for moral error.

Lara, TGGP: The most important point is that building an AI, unlike surgery or dictatorship, doesn't give you any power to be corrupted by - any opportunity to make decisions with short-term life-or-death results - until the task is complete, and shortly after that (for most goal systems, including Friendly ones) it's out of your hands. Eliezer's obvious awareness of rationalization is encouraging wrt not committing atrocities out of good intentions. CEV's "Motivations" section, and CFAI FAQs 1.(2,3) and their references address correcting for partial programmer selfishness. Finally, I would think there would be more than one AI programmer, reducing the risk of deliberate evil.

Matt: See The "Intuitions" Behind "Utilitarianism".

Boris: See The Gift We Give To Tomorrow. Eliezer isn't saying there's some perfect function in the sky that evolution has magically led us to approximate. By "murder is wrong no matter whether I think it's right or not", he just means Eliezer-in-this-world judges that murder is still wrong in the counterfactual world where counterfactual-Eliezer judges that murder is right.

Richard: Eliezer thinks he can approximate the GHOST because the GHOST - his GHOST, more properly - is defined with respect to his own mind. Again, it's not some light in the sky. He can't, by definition, be twisted in such a way as to not be an approximation of his GHOST. And he obviously isn't suggesting that anyone is infallible.

Eliezer: It seems to me that your point would be much more clear (like to Boris and Richard) if you would treat morality as a 2-place function: "I judge that murder is wrong even if...", not "Murder is wrong even if...". (Would you say Allan is right to call your position relativism?)
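For what it's worth, the one-place/two-place distinction is easy to make concrete. In this toy sketch (the names and verdict table are mine, purely for illustration), the one-place function is just the two-place function with the judging agent baked in:

```python
from functools import partial

# Hypothetical verdict table; the entries are placeholders, not claims about anyone's morality.
JUDGEMENTS = {
    "Eliezer": {"murder": "wrong", "saving_the_child": "right"},
    "Clippy":  {"murder": "irrelevant", "making_paperclips": "right"},
}

def judges(agent, act):
    """Two-place function: (agent, act) -> verdict; the answer varies with who judges."""
    return JUDGEMENTS[agent].get(act, "unknown")

# One-place function: the agent argument is fixed once, so the output no longer
# varies with who happens to be asking.
right = partial(judges, "Eliezer")

print(judges("Clippy", "murder"))  # 'irrelevant'
print(right("murder"))             # 'wrong' - and stays 'wrong' whatever Clippy computes
```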

comment by Allan_Crossman · 2008-07-29T18:47:00.000Z · LW(p) · GW(p)

Eliezer [in response to me]: This just amounts to defining should as an abstract computation, and then excluding all minds that calculate a different rule-of-action as "choosing based on something other than morality". In what sense is the morality objective, besides the several senses I've already defined, if it doesn't persuade a paperclip maximizer?

I think my position is this:

If there really was such a thing as an objective morality, it would be the case that only a subset of possible minds could actually discover or be persuaded of that fact.

Presumably, for any objective fact, there are possible minds who could never be convinced of that fact.

comment by Caledonian2 · 2008-07-29T19:14:00.000Z · LW(p) · GW(p)
Caledonian: uh... he didn't say you couldn't make arguments about all possible minds, he was saying you couldn't construct an argument that's so persuasive, so convincing that every possible mind, no matter how unusual its nature, would automatically be convinced by that argument.

That point is utterly trivial. You can implement any possible relationship between input and output. That even includes minds that are generally rational but will fail only in specifically-defined instances - such as a particular person making a particular argument. This does not, however, support the idea that we shouldn't bother searching for valid arguments, or that we'd need to produce arguments that would convince every possible information-processing system capable of being convinced.

comment by Marcello · 2008-07-29T19:31:00.000Z · LW(p) · GW(p)

"So that we can regard our present values, as an approximation to the ideal
morality that we would have if we heard all the arguments, to whatever extent
such an extrapolation is coherent."

This seems to be in the right ballpark, but the answer is dissatisfying
because I am by no means persuaded that the extrapolation would be coherent
at all (even if you only consider one person.) Why would it? It's
god-shatter, not Peano Arithmetic.

There could be nasty butterfly effects, in that the order in which you
were exposed to all the arguments, the mood you were in upon hearing them and
so forth could influence which of the arguments you came to trust.

On the other hand, viewing our values as an approximation to the ideal
morality that us would have if we heard all the
arguments, isn't looking good either: correctly predicting a bayesian port of
a massive network of sentient god-shatter looks to me like it would require a
ton of moral judgments to do at all. The subsystems in our brains sometimes
resolve things by fighting (ie. the feeling being in a moral dilemma.)
Looking at the result of the fight in your real physical brain isn't helpful
to make that judgment if it would have depended on whether you just had a
cup of coffee or not.

So, what do we do if there is more than one basin of attraction a moral
reasoner considering all the arguments can land in? What if there are no
basins?
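To make the worry concrete, here is a toy illustration of my own (with entirely made-up numbers): a reasoner whose credence becomes "sticky" once it has moved far enough from neutral will land in different places depending purely on the order in which the same arguments arrive.

```python
import itertools

def extrapolate(arguments, credence=0.5, stickiness=0.8):
    """Toy reasoner: each argument nudges a credence, but once the credence has
    drifted far from neutral, later arguments are heavily discounted."""
    for strength in arguments:
        weight = 0.1 if abs(credence - 0.5) > (stickiness - 0.5) else 1.0
        credence = min(1.0, max(0.0, credence + weight * strength))
    return round(credence, 2)

arguments = [0.4, -0.3, 0.2, -0.4]  # pro and con arguments of varying force

endpoints = {extrapolate(order) for order in itertools.permutations(arguments)}
print(endpoints)  # more than one endpoint means the extrapolation has multiple basins
```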

Replies from: Wei_Dai, anonym, orthonormal
comment by Wei Dai (Wei_Dai) · 2009-09-10T06:50:24.041Z · LW(p) · GW(p)

So, what do we do if there is more than one basin of attraction a moral reasoner considering all the arguments can land in? What if there are no basins?

I share Marcello's concerns as well. Eliezer, have you thought about what to do if the above turns out to be the case?

Also, this post isn't tagged with "metaethics" for some reason. I finally found it with Matt Simpson's help.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-10T08:12:38.628Z · LW(p) · GW(p)

It seems to me that if you build a Friendly AI, you ought to build it to act where coherence exists and not act where it doesn't.

Replies from: Wei_Dai, thomblake
comment by Wei Dai (Wei_Dai) · 2009-09-10T18:48:33.914Z · LW(p) · GW(p)

What makes you think that any coherence exists in the first place? Marcello's argument seems convincing to me. In the space of possible computations, what fraction gives the same final answer regardless of the order of inputs presented? Why do you think that the "huge blob of computation" that is your morality falls into this small category? There seems to be plenty of empirical evidence that human morality is in fact sensitive to the order in which moral arguments are presented.

Or think about it this way. Suppose an (unFriendly) SI wants to craft an argument that would convince you to adopt a certain morality and then stop paying attention to any conflicting moral arguments. Could it do so? Could it do so again with a different object-level morality on someone else? (This assumes there's an advantage to being first, as far as giving moral arguments to humans is concerned. Adjust the scenario accordingly if there's an advantage in being last instead.)

You say the FAI won't act where coherence doesn't exist, but if you don't expect coherence now, shouldn't you be doing something other than building such an FAI, or at least have a contingency plan for when it halts without giving any output?

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-11T01:37:22.713Z · LW(p) · GW(p)

What makes you think that any coherence exists in the first place?

Most people wouldn't want to be turned into paperclips?

Replies from: Wei_Dai, CarlShulman
comment by Wei Dai (Wei_Dai) · 2009-09-11T04:17:27.721Z · LW(p) · GW(p)

Most people wouldn't want to be turned into paperclips?

Of course not, since they haven't yet heard the argument that would make them want to. All the moral arguments we've heard so far have been invented by humans, and we just aren't that inventive. Even so, we have the Voluntary Human Extinction Movement.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-11T22:05:46.387Z · LW(p) · GW(p)

Wei, suppose I want to help someone. How ought I to do so?

Is the idea here that humans end up anywhere depending on what arguments they hear in what order, without the overall map of all possible argument orders displaying any sort of concentration in one or more clusters where lots of endpoints would light up, or any sort of coherency that could be extracted out of it?

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2009-09-11T22:29:02.999Z · LW(p) · GW(p)

Wei, suppose I want to help someone. How ought I to do so?

I don't know. (I mean I don't know how to do it in general. There are some specific situations where I do know how to help, but lots more where I don't.)

Is the idea here that humans end up anywhere depending on what arguments they hear in what order, without the overall map of all possible argument orders displaying any sort of concentration in one or more clusters where lots of endpoints would light up, or any sort of coherency that could be extracted out of it?

Yes. Or another possibility is that the overall map of all possible argument orders does display some sort of concentration, but that concentration is morally irrelevant. Human minds were never "designed" to hear all possible moral arguments, so where the concentration occurs is accidental, and perhaps horrifying from our current perspective. (Suppose the concentration turns out to be voluntary extinction or something worse, would you bite the bullet and let the FAI run with it?)

comment by CarlShulman · 2009-09-11T04:29:06.893Z · LW(p) · GW(p)

A variety of people profess to consider this desirable if it leads to powerful intelligent life filling the universe with higher probability or greater speed. I would bet that there are stable equilibria that can be reached with arguments.

Replies from: rhollerith_dot_com
comment by RHollerith (rhollerith_dot_com) · 2009-09-11T06:00:00.615Z · LW(p) · GW(p)

Carl says that a variety of people profess to consider it desirable that present-day humans get disassembled "if it leads to powerful intelligent life filling the universe with higher probability or greater speed."

Well, yeah, I'm not surprised. Any system of valuing things in which every life, present and future, has the same utility as every other life will lead to that conclusion because turning the existing living beings and their habitat into computronium, von-Neumann probes, etc, to hasten the start of the colonization of the light cone by a few seconds will have positive expected marginal utility according to the system of valuing things.

Replies from: jacob_cannell
comment by jacob_cannell · 2011-02-02T02:04:38.417Z · LW(p) · GW(p)

That could still be a great thing for us provided that current human minds were uploaded into the resulting computronium explosion.

Replies from: anon895
comment by anon895 · 2011-02-02T03:21:37.190Z · LW(p) · GW(p)

...which won't happen if the computronium is the most important thing and uploading existing minds would slow it down. The AI might upload some humans to get their cooperation during the early stages of takeoff, but it wouldn't necessarily keep those uploads running once it no longer depended on humans, if the same resources could be used more efficiently for itself.

Replies from: dxu
comment by dxu · 2015-04-17T21:13:17.943Z · LW(p) · GW(p)

To get my cooperation, at least, it would have to credibly precommit that it wouldn't just turn my simulation off after it no longer needs me. (Of course, the meaning of the word "credibly" shifts somewhat when we're talking about a superintelligence trying to "prove" something to a human.)

comment by thomblake · 2012-05-18T13:10:57.912Z · LW(p) · GW(p)

It seems to me that if you build a Friendly AI, you ought to build it to act where coherence exists and not act where it doesn't.

Is "not act" a meaningful option for a Singleton?

comment by anonym · 2010-11-06T20:08:42.989Z · LW(p) · GW(p)

So, what do we do if there is more than one basin of attraction a moral reasoner considering all the arguments can land in? What if there are no basins?

This is a really insightful question, and it hasn't been answered convincingly in this thread. Does anybody know if it has been discussed more completely elsewhere?

One option would be to say that the FAI only acts where there is coherence. Another would be to specify a procedure for acting when there are multiple basins of attraction (perhaps by weighting the basins according to the proportion of starting points and orderings of arguments that lead to each basin, when that's possible, or some other 'impartial' procedure).

But still, what if it turns out that most of the difficult extrapolations that we would really care about bounce around without ever settling down or otherwise behave undesirably? No human being has ever done anything like the sorts of calculations that would be involved in a deep extrapolation, so our intuitions based on the extrapolations that we have imagined and that seem to cohere (which all have paths shorter than [e.g.] 1000) might be unrepresentative of the sorts of extrapolations that an FAI would actually have to perform.

comment by orthonormal · 2018-04-04T00:40:03.309Z · LW(p) · GW(p)

This comment got linked a decade later, and so I thought it's worth stating my own thoughts on the question:

We can consider a reference class of CEV-seeking procedures; one (massively-underspecified, but that's not the point) example is "emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there's a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that".

I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.

I assert, however, that I'd consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)

CEV may be underdetermined and many-valued, but that doesn't mean paperclipping is as good an answer as any.

Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don't have cached thoughts about that.

comment by michael_vassar · 2008-07-29T20:00:00.000Z · LW(p) · GW(p)

Caledonian: I can't think of anyone EVER choosing to interpret statements as stupid rather than sensible to the degree to which you do on this blog. There is usually NO ambiguity and you still get things wrong and then blame them for being stupid.

In all honesty why do you post here? On your own blog you are articulate and intelligent. Why not stick with that and leave commenting to people who want to actually respond to what people say rather than to straw men?

comment by Constant2 · 2008-07-29T20:08:00.000Z · LW(p) · GW(p)

We've been told that a General AI will have power beyond any despot known to history.

If that will be then we are doomed. Power corrupts. In theory an AI, not being human, might resist the corruption, but I wouldn't bet on that. I do not think it is a mere peculiarity of humanity that we are vulnerable to corruption.

We humans are kept in check by each other. We might therefore hope, and attempt to engineer, a proliferation of self-improving AIs, to form a society and to keep each other in check. With luck, cooperative AIs might be more successful at improving themselves - just as honest folk are for the most part more successful than criminals - and thus tend for the most part to out-pace the would-be despots.

As far as how a society of AIs would relate to humans, there are various possibilities. One dystopia imagines that humans will be treated like lower animals, but this is not necessarily what will happen. Animals are not merely dumb, but unable to take part in mutual respect of rights. We humans will always be able to and so might well remain forever recipients of an AI respect for our rights however much they evolve past us. We may of course be excluded from aspects of AI society which we humans are not able to handle, just as we exclude animals from rights. We may never earn hyper-rights, whatever those may be. But we might retain our rights.

comment by Richard2 · 2008-07-29T20:15:00.000Z · LW(p) · GW(p)

Nick,

Eliezer's one-place function is exactly infallible, because he defines "right" as its output.

I misunderstood some of Eliezer's notation. I now take his function to be an extrapolation of his volition rather than anyone else's. I don't think this weakens my point: if there were a rock somewhere with a lookup table for this function written on it, Eliezer should always follow the rock rather than his own insights (and according to Eliezer everyone else should too), and this remains true even if there is no such rock.

Furthermore, the morality function is based on extrapolated volition. Someone who has only considered one point of view on various moral questions will disagree with their extrapolated (completely knowledgeable, completely wise) volition in certain predictable ways. That's exactly what I mean by a "twist."

comment by Caledonian2 · 2008-07-29T20:33:00.000Z · LW(p) · GW(p)

I can't think of anyone EVER choosing to interpret statements as stupid rather than sensible to the degree to which you do on this blog.
It's not a matter of choice - there have to be sensible interpretations available.

[MISREPRESENTATION]If, as Eliezer's defenders insist, we should interpret his remarks as suggesting that there is no point to looking for convincing arguments regarding 'morality' because there are no arguments that will convince all possible minds,[/MISREPRESENTATION] how exactly can this be construed as sensible? How is this compatible with rational inquiry? It's (usually) understood that the only arguments we need to concern ourselves with are rational arguments that will convince rational minds - not in this case, it seems.

Interpreting the comments about mindspace being too big as referring to extent, rather than inclusion, still renders them stupid. But at least it's a straightforward and simple stupidity that is easily remedied. If [MISREPRESENTATION]the standard alluded to[/MISREPRESENTATION] were actually implemented, we'd have to discard all arguments about anything, because there is no topic where specific arguments will convince all possible minds.

comment by inquiringmind · 2008-07-29T20:43:00.000Z · LW(p) · GW(p)

I don't know much about the ins-and-outs of blog identification. Is it possible that someone could post diametrically, under two names, such as Caledonian and Robin Hanson, in order to maintain a friendship while minimizing the importance of efforts that one considers unimportant?

comment by Nick_Tarleton · 2008-07-29T21:14:00.000Z · LW(p) · GW(p)

Constant: Corruptibility is a complex evolutionary adaptation. Even the best humans have hard-to-suppress semiconscious selfish motivations that tend to come out when we have power, even if we thought we were gaining that power for idealistic reasons. There's no reason an AI with an intelligently designed, clean, transparent goal system would be corrupted, or need to be kept in check. "Respect for rights" is similarly anthropomorphic. Creating a society of AIs would be very problematic due to the strong first-mover effect, and the likely outcome amounts to extinction anyway.

Richard: Of course Eliezer should follow the rock; by stipulation, it states exactly those insights he would have if he were perfectly informed, rational, etc. This is nothing like "morality as the will of God", since any such rock would have a causal dependency on his brain. It's not clear to me that he's saying everyone else should as well. Also by stipulation (AFAICS), his volition will be able to convince him through honest argument of any of its moral positions, regardless of how twisted he might be. (I share Marcello's concern here, though.)

Caledonian: Eliezer is obviously not saying there is no point to looking for convincing arguments regarding 'morality', but that there are no arguments that will sway all minds, and there don't need to be any for morality to be meaningful. You're being ridiculous. (And how do you define 'rational mind'?)

inquiringmind: I know Caledonian's IP address. He's not Robin.

comment by Laura B (Lara_Foster) · 2008-07-29T21:50:00.000Z · LW(p) · GW(p)

Why not, I can't help myself: Caledonian = Thersites, Eliezer = Agamemnon

Thersites only clamour’d in the throng,
Loquacious, loud, and turbulent of tongue:
Awed by no shame, by no respect controll’d,
In scandal busy, in reproaches bold:
With witty malice studious to defame,
Scorn all his joy, and laughter all his aim:—
But chief he gloried with licentious style
To lash the great, and monarchs to revile.
...
Sharp was his voice; which in the shrillest tone,
Thus with injurious taunts attack’d the throne.
Whate’er our master craves submit we must,
Plagued with his pride, or punish’d for his lust.
Oh women of Achaia; men no more!
Hence let us fly, and let him waste his store
In loves and pleasures on the Phrygian shore.
We may be wanted on some busy day,
When Hector comes: so great Achilles may:
From him he forced the prize we jointly gave,
From him, the fierce, the fearless, and the brave:
And durst he, as he ought, resent that wrong,
This mighty tyrant were no tyrant long.”
...
“Peace, factious monster, born to vex the state,
With wrangling talents form’d for foul debate:
Curb that impetuous tongue, nor rashly vain,
And singly mad, asperse the sovereign reign.
Have we not known thee, slave! of all our host,
The man who acts the least, upbraids the most?
...
Expel the council where our princes meet,
And send thee scourged and howling through the fleet.”

comment by Wiseman · 2008-07-29T21:53:00.000Z · LW(p) · GW(p)

This post is called "The Meaning of Right", but it doesn't spend much time actually defining what situations should be considered as right instead of wrong, other than a bit at the end which seems to define "right" as simply "happiness". Rather it's a lesson in describing how to take your preferred world state and causally link it to what you'd have to do to get to that state. But as of this post, that world state is still ambiguously right or wrong in any absolute sense.

So does this post say what "right" means, other than simply "happiness" (which sounds like generic utilitarianism), am I simply missing something?

comment by James7 · 2008-07-29T23:38:00.000Z · LW(p) · GW(p)

Eliezer once wrote that "We can build up whole networks of beliefs that are connected only to each other - call these "floating" beliefs. It is a uniquely human flaw among animal species, a perversion of Homo sapiens's ability to build more general and flexible belief networks.

The rationalist virtue of empiricism consists of constantly asking which experiences our beliefs predict - or better yet, prohibit."

I can't see how nearly all of the beliefs expressed in this post predict or prohibit any experience.

comment by Matt_Simpson · 2008-07-29T23:58:00.000Z · LW(p) · GW(p)

Silas: I'm referring to all thought experiments where the intended purpose was to show that utilitarianism is inconsistent with our moral intuitions. So, yes, the examples you mention, and more. Most of them do fall short of their purpose.

Nick Tarleton: I'm not sure all seemingly anti-utilitarian intuitions can be explained away by scope insensitivity, but that does take care of the vast majority of cases.

One case I was thinking of (for both of you) is the 'utility monster:' someone who receives such glee from killing, maiming, and otherwise causing havoc that the pain others endure due to him is virtually always outweighed by the happiness the monster receives.

Another case would be the difference between killing a terrorist who has 10 people hostage and murdering an innocent man to save 10 people. I would think that in general, people would be willing to do the first while hesitant to do the second, though I defer to anyone who knows the empirical literature.

comment by TGGP2 · 2008-07-30T04:27:00.000Z · LW(p) · GW(p)

Then they don't know the true difference between the two possible lives, do they?
"True difference" gets me thinking of "no true Scotsman". Has there ever been anybody who truly knew the difference between two possible lives? Even if someone could be reincarnated and retain memories the order would likely alter their perceptions.

I'm very interested in how Eliezer gets from his meta-ethics to utilitarianism
He's not a strict utilitarian in the "happiness alone" sense. He has an aversion to wireheading, which maximizes the classic version of utility.

I know you frown upon mentioning evolutionary psychology, but is it really a huge stretch to surmise that the more even-keeled, loving and peaceful tribes of our ancestors would out-survive the wilder warmongers who killed each other off?
Yes, it is. The peaceful ones would be vulnerable to being wiped out by the more warlike ones. Or, more accurately (group selection isn't as big a factor given intergroup variance being smaller than intragroup variance), the members of the peaceful tribe more prone to violence would achieve dominance as hawks among doves do. Among the Yanomamö we find high reproductive success among men who have killed. The higher the bodycount, the more children. War and murder appear to be human universals.

Eliezer's obvious awareness of rationalization is encouraging
Awareness of biases can increase errors, so it's not encouraging enough given the stakes.

Finally, I would think there would be more than one AI programmer, reducing the risk of deliberate evil
I'm not really worried about that. No one is a villain in their own story, and people we would consider deviants would likely be filtered out of the Institute and would probably be attracted to other career paths anyway. The problem exists, but I'm more concerned with well-meaning designers creating something that goes off in directions we can't anticipate.

Caledonian, Eliezer never said anything about not bothering to look for arguments. His idea is to find out how he would respond if he were confronted with all arguments. He seems to assume that he (or the simulation of him) will correctly evaluate arguments. His point about no universal arguments is that he has to start with himself rather than some ghostly ideal behind a veil of ignorance or something like that.

comment by Ian_C. · 2008-07-30T07:51:00.000Z · LW(p) · GW(p)

It seems to me you're saying that what our conscience tells us is right is right because its output is what we mean by "right" in the first place. While I agree in general that a concept is its referents, I don't agree with what you're saying here.

Those referents are not values but evaluations. And they are evaluations with respect to a standard that we can in fact change. We don't choose the output of our conscience on the spot - in that sense it is objective - but over time we can reprogram it through repetition and effort. Its evaluations are short-term objective but long-term subjective.

comment by Caledonian2 · 2008-07-30T11:27:00.000Z · LW(p) · GW(p)
Just do what is, ahem, right - to the best of your ability to weigh the arguments you have heard, and ponder the arguments you may not have heard.

What's "the best of your ability"? 'Best' is a determination of quality. What constitutes quality reasoning about 'morality'?

When we talk about quality reasoning in, say, math, we don't have problems with that question. We don't permit just any old argument to be acceptable - if people's reasoning doesn't fit certain criteria, we don't accept that reasoning as valid. That they make the arguments, and that they may be able to make only those arguments, is utterly irrelevant. If those are the only arguments they're capable of making, we say they're incapable of reasoning about math, we don't redefine our concept of math to permit their arguments to be sensible.

We have a conceptual box we call 'morality'. We know that people have mutually contradictory ideas about what sorts of things go in the box. It follows that we can't resolve the question of what should go in the box by looking at the output of people's morality evaluations. Those outputs are inconsistent; they can't proceed from a common set of principles.

So we have to look at the nature of the evaluations, not the output of the evaluations, and determine which outputs are right and which aren't. Not 'right' in the 'moral' sense, whatever that is - that would be circular reasoning. We can't evaluate moral evaluations with moral evaluations. 'Right' in the sense that mathematical arguments are right.

comment by Manon_de_Gaillande · 2008-07-30T11:53:00.000Z · LW(p) · GW(p)

I'm confused. I'll try to rephrase what you said, so that you can tell me whether I understood.

"You can change your morality. In fact, you do it all the time, when you are persuaded by arguments that appeal to other parts of your morality. So you may try to find the morality you really should have. But - "should"? That's judged by your current morality, which you can't expect to improve by changing it (you expect a particular change would improve it, but you can't tell in what direction). Just like you can't expect to win more by changing your probability estimate to win the lottery.

Moreover, while there is such a fact as "the number on your ticket matches the winning number", there is no ultimate source of morality out there, no way to judge Morality_5542 without appealing to another morality. So not only can you not jump to another morality, you also have no reason to want to: you're not trying to guess some true morality.

Therefore, just keep whatever morality you happen to have, including your intuitions for changing it."

Did I get this straight? If I did, it sounds a lot like a relativistic "There is no truth, so don't try to convince me" - but there is indeed no truth, as in, no objective morality.

comment by Sebastian_Hagen2 · 2008-07-30T12:32:00.000Z · LW(p) · GW(p)

TGGP wrote:

We've been told that a General AI will have power beyond any despot known to history.
Unknown replied:
If that will be then we are doomed. Power corrupts. In theory an AI, not being human, might resist the corruption, but I wouldn't bet on that. I do not think it is a mere peculiarity of humanity that we are vulnerable to corruption.
A tendency to become corrupt when placed into positions of power is a feature of some minds. Evolutionary psychology explains nicely why humans have evolved this tendency. It also allows you to predict that other intelligent organisms, evolved in a sufficiently similar way, would be likely to have a similar feature.
Humans having this kind of tendency is a predictable result of what their design was optimized to do, and as such them having it doesn't imply much for minds from a completely different part of mind design space.
What makes you think a human-designed AI would be vulnerable to this kind of corruption?

comment by steven · 2008-07-30T12:38:00.000Z · LW(p) · GW(p)

A mind with access to its source code, if it doesn't want to be corrupted, won't be.

comment by Caledonian2 · 2008-07-30T13:00:00.000Z · LW(p) · GW(p)

What does 'corrupted' mean in this context?

If we go by definitions, we have

6. to destroy the integrity of; cause to be dishonest, disloyal, etc., esp. by bribery.
7. to lower morally; pervert: to corrupt youth.
8. to alter (a language, text, etc.) for the worse; debase.
9. to mar; spoil.
10. to infect; taint.
11. to make putrid or putrescent.

Most of those meanings cannot apply here - and the ones that do refer to changes in morality.

comment by Constant2 · 2008-07-30T13:34:00.000Z · LW(p) · GW(p)

A tendency to become corrupt when placed into positions of power is a feature of some minds.

Morality in the human universe is a compromise between conflicting wills. The compromise is useful because the alternative is conflict, and conflict is wasteful. Law is a specific instance of this, so let us look at property rights: property rights is a decision-making procedure for deciding between conflicting desires concerning the owned object. There really is no point in even having property rights except in the context of the potential for conflict. Remove conflict, and you remove the raison d'etre of property rights, and more generally the raison d'etre of law, and more generally the raison d'etre of morality. Give a person power, and he no longer needs to compromise with others, and so for him the raison d'etre of morality vanishes and he acts as he pleases.

The feature of human minds that renders morality necessary is the possibility that humans can have preferences that conflict with the preferences of other humans, thereby requiring a decisionmaking procedure for deciding whose will prevails. Preference is, furthermore, revealed in the actions taken by a mind, so a mind that acts has preferences. So all the above is applicable to an artificial intelligence if the artificial intelligence acts.

What makes you think a human-designed AI would be vulnerable to this kind of corruption?

I am assuming it acts, and therefore makes choices, and therefore has preferences, and therefore can have preferences which conflict with the preferences of other minds (including human minds).

comment by Allan_Crossman · 2008-07-30T13:56:00.000Z · LW(p) · GW(p)

I am assuming [the AI] acts, and therefore makes choices, and therefore has preferences, and therefore can have preferences which conflict with the preferences of other minds (including human minds).

An AI can indeed have preferences that conflict with human preferences, but if it doesn't start out with such preferences, it's unclear how it comes to have them later.

On the other hand, if it starts out with dubious preferences, we're in trouble from the outset.

comment by Manon_de_Gaillande · 2008-07-30T15:34:00.000Z · LW(p) · GW(p)

Constant: "Give a person power, and he no longer needs to compromise with others, and so for him the raison d'etre of morality vanishes and he acts as he pleases."

If you could do so easily and with complete impunity, would you organize fights to death for your pleasure? Would you even want to? Moreover, humans are often tempted to do things they know they shouldn't, because they also have selfish desires. AIs don't if you don't build it into them. If they really do ultimately care about humanity's well-being, and don't take any pleasure from making people obey them, they will keep doing so.

comment by Constant2 · 2008-07-30T16:10:00.000Z · LW(p) · GW(p)

An AI can indeed have preferences that conflict with human preferences, but if it doesn't start out with such preferences, it's unclear how it comes to have them later.

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially, cannot be doubted. For example, babies do not start out preferring Bach to Beethoven or Beethoven to Bach, but adults are able to develop that preference, even if it is not clear at this point how they come to do so.

If you could do so easily and with complete impunity, would you organize fights to death for your pleasure?

Voters have the ability to vote for policies and to do so easily and with complete impunity (nobody retaliates against a voter for his vote). And, unsurprisingly, voters regularly vote to take from others to give unto themselves - which is something they would never do in person (unless they were criminals, such as muggers or burglars). Moreover humans have an awe-inspiring capacity to clothe their rapaciousness in fine-sounding rhetoric.

Moreover, humans are often tempted to do things they know they shouldn't, because they also have selfish desires. AIs don't if you don't build it into them.

Conflict does not require selfish desires. Any desire, of whatever sort, could potentially come into conflict with another person's desire, and when there are many minds each with its own set of desires then conflict is almost inevitable. So the problem does not, in fact, turn on whether the mind is "selfish" or not. Any sort of desire can create the conflict, and conflict as such creates the problem I described. In a nutshell: evil men need not be selfish. A man such as Pol Pot could indeed have wanted nothing for himself and still ended up murdering millions of his countrymen.

comment by prase · 2008-07-30T16:22:00.000Z · LW(p) · GW(p)

Larry D'Anna: The reason that we say it is too big is because there are subsets of Mindspace that do admit universally compelling arguments, such as (we hope) neurologically intact humans.

What precisely is neurological intactness? It rather seems to me that the majority agrees on some set of "self-evident" terminal values, and those few people that do not are called psychopaths. If by "human" we mean what people usually understand by this term, then there are no compelling arguments even for humans. Although I gladly admit your statement is approximately valid, I am not sure how to formulate it to be exactly true and not simultaneously a tautology.

comment by Sebastian_Hagen2 · 2008-07-30T17:05:00.000Z · LW(p) · GW(p)

Constant [sorry for getting the attribution wrong in my previous reply] wrote:

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially, cannot be doubted.
I do not know whether those changes in opinion indicate changes in terminal values, but it doesn't really matter for the purposes of this discussion, since humans aren't (capital-F) Friendly. You definitely don't want an FAI to unpredictably change its terminal values. Figuring out how to reliably prevent this kind of thing from happening, even in a strongly self-modifying mind (which humans aren't), is one of the sub-problems of the FAI problem.
To create a society of AIs, hoping they'll prevent each other from doing too much damage, isn't a viable solution to the FAI problem, even in the rudimentary "doesn't kill all humans" sense. There are various problems with the idea, among them:

  1. Any two AIs are likely to have a much vaster difference in effective intelligence than you could ever find between two humans (for one thing, their hardware might be much more different than any two working human brains). This likelihood increases further if (at least) some subset of them is capable of strong self-improvement. With enough difference in power, cooperation becomes a losing strategy for the more powerful party.
  2. The AIs might agree that they'd all be better off if they took the matter currently in use by humans for themselves, dividing the spoils among each other.
comment by Constant2 · 2008-07-30T17:57:00.000Z · LW(p) · GW(p)

Any two AIs are likely to have a much vaster difference in effective intelligence than you could ever find between two humans (for one thing, their hardware might be much more different than any two working human brains). This likelihood increases further if (at least) some subset of them is capable of strong self-improvement. With enough difference in power, cooperation becomes a losing strategy for the more powerful party.

I read stuff like this and immediately my mind thinks, "comparative advantage." The point is that it can be (and probably is) worthwhile for Bob and Bill to trade with each other even if Bob is better at absolutely everything than Bill. And if it is worthwhile for them to trade with each other, then it may well be in the interest of neither of them to (say) eliminate the other, and it may be a waste of resources to (say) coerce the other. It is worthwhile for the state to coerce the population because the state is few and the population are many, so the per-person cost of coercion falls below the benefit of coercion; it is much less worthwhile for an individual to coerce another (slavery generally has the backing of the state - see for example the fugitive slave laws). But this mass production of coercive fear works in part because humans are similar to each other and so can be dealt with more or less the same way. If AIs are all over the place, then this does not necessarily hold. Furthermore if one AI decides to coerce the humans (who are admittedly similar to each other) then the other AIs may oppose him in order that they themselves might retain direct access to humans.
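For readers who haven't seen the arithmetic, here is a toy version of the comparative-advantage point with made-up production numbers (nothing here is specific to AIs):

```python
# Output per hour of two goods for each party; Bob is better at both.
output = {
    "Bob":  {"food": 10, "tools": 10},
    "Bill": {"food": 4,  "tools": 1},
}

def opportunity_cost(who, good, other):
    """How much of `other` is forgone to produce one unit of `good`."""
    return output[who][other] / output[who][good]

for who in output:
    print(who,
          "- 1 food costs", opportunity_cost(who, "food", "tools"), "tools;",
          "1 tool costs", opportunity_cost(who, "tools", "food"), "food")

# Bob's tools cost 1 food each while Bill's cost 4; Bill's food costs 0.25 tools
# while Bob's costs 1.  So both come out ahead if Bob makes tools, Bill makes food,
# and they trade at any ratio between 0.25 and 1 tools per unit of food.
```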

The AIs might agree that they'd all be better off if they took the matter currently in use by humans for themselves, dividing the spoils among each other.

Maybe but maybe not. Dividing the spoils paints a picture of the one-time destruction of the human race, and it may well be to the advantage of the AIs not to kill off the humans. After all, if the humans have something worth treating as spoils, then the humans are productive and so might be even more useful alive.

You definitely don't want an FAI to unpredictably change its terminal values. Figuring out how to reliably prevent this kind of thing from happening, even in a strongly self-modifying mind (which humans aren't), is one of the sub-problems of the FAI problem.

The FAI may be an unsolvable problem, if by FAI we mean an AI into which certain limits are baked. This has seemed dubious ever since Asimov. The idea of baking in rules of robotics has long seemed to me to fundamentally misunderstand both the nature of morality and the nature of intelligence. But time will tell.

comment by TGGP2 · 2008-07-30T19:03:00.000Z · LW(p) · GW(p)

Humans having this kind of tendency is a predictable result of what their design was optimized to do, and as such them having it doesn't imply much for minds from a completely different part of mind design space.
Eliezer seems to be saying his FAI will emulate his own mind, assuming it was much more knowledgeable and had heard all the arguments.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-07-30T19:40:00.000Z · LW(p) · GW(p)

Um, no. First, the last revision of the plan called for focusing the FAI on the whole human species, not just one or more programmers. Second, the extrapolation is a bit more complicated than "if you knew more". I am neither evil nor stupid.

comment by Sebastian_Hagen2 · 2008-07-30T20:17:00.000Z · LW(p) · GW(p)

After all, if the humans have something worth treating as spoils, then the humans are productive and so might be even more useful alive.
Humans depend on matter to survive, and increase entropy by doing so. Matter can be used for storage and computronium, negentropy for fueling computation. Both are limited and valuable resources (assuming physics doesn't allow for infinite-resource cheats).

I read stuff like this and immediately my mind thinks, "comparative advantage." The point is that it can be (and probably is) worthwhile for Bob and Bill to trade with each other even if Bob is better at absolutely everything than Bill.
Comparative advantage doesn't matter for powerful AIs at massively different power levels. It exists between some groups of humans because humans don't differ in intelligence all that much when you consider all of mind design space, and because humans don't have the means to easily build subservient-to-them minds which are equal in power to them.
What about a situation where Bob can defeat Bill very quickly, take all its resources, and use them to implement a totally-subservient-to-Bob mind which is by itself better at everything Bob cares about than Bill was? Resolving the conflict takes some resources, but leaving Bill to use them a) inefficiently and b) for not-exactly-Bob's goals might waste (from Bob's perspective) even more of them in the long run. Also, eliminating Bill means Bob has to worry about one less potential threat that it would otherwise need to keep in check indefinitely.

The FAI may be an unsolvable problem, if by FAI we mean an AI into which certain limits are baked.
You don't want to build an AI with certain goals and then add on hard-coded rules that prevent it from fulfilling those goals with maximum efficiency. If you pit your own mind against that of the AI, a sufficiently powerful AI will always win that contest. The basic idea behind FAI is to build an AI that genuinely wants good things to happen; you can't control it after it takes off, so you put your conception of "good" (or an algorithm to compute it) into the original design, and define the AI's terminal values based on that. Doing this right is an extremely tough technical problem, but why do you believe it may be impossible?
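
A minimal toy sketch of that contrast, purely illustrative; the names `rule_constrained_agent`, `value_aligned_agent`, `utility`, `forbidden`, and `good` are hypothetical placeholders, not anyone's actual design:

```python
# Toy contrast between "goals plus bolted-on rules" and "goals that encode
# the good directly". Purely illustrative; all names are hypothetical.

def rule_constrained_agent(actions, utility, forbidden):
    # Maximizes its own utility subject to a bolted-on blacklist: it still
    # "wants" whatever `utility` rewards, and will pick any high-utility
    # action the rule-writer forgot to forbid.
    allowed = [a for a in actions if a not in forbidden]
    return max(allowed, key=utility)

def value_aligned_agent(actions, good):
    # Its terminal criterion is the designer's conception of good itself,
    # so there is no separate goal pushing against the restrictions.
    return max(actions, key=good)
```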

comment by Allan_Crossman · 2008-07-30T21:43:00.000Z · LW(p) · GW(p)

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially cannot be doubted.

I believe Eliezer is trying to create "fully recursive self-modifying agents that retain stable preferences while rewriting their source code". Like Sebastian says, getting the "stable preferences" bit right is presumably necessary for Friendly AI, as Eliezer sees it.

(This clause "as Eliezer sees it" isn't meant to indicate dissent, but merely my total incompetence to judge whether this condition is strictly necessary for friendly AI.)

comment by Stirling_Westrup · 2008-07-30T22:16:00.000Z · LW(p) · GW(p)

I find it interesting that I found many of the posts leading up to this one intensely hard to follow as they seemed to be arguing against worldviews that I had little or no comprehension of.

So, I must say that I am very relieved to see that your take on what morality is, is what I've been assuming it is, all along: just a fascinating piece of our internal planning software.

comment by Toby_Ord2 · 2008-07-30T23:18:00.000Z · LW(p) · GW(p)

Eliezer,

I've just reread your article and was wondering if this is a good quick summary of your position (leaving apart how you got to it):

'I should X' means that I would attempt to X were I fully informed.

Here 'fully informed' is supposed to include complete relevant empirical information and also access to all the best relevant philosophical arguments.

comment by Toby_Ord2 · 2008-07-30T23:53:00.000Z · LW(p) · GW(p)

To cover cases where people are making judgments about what others should do, I could also extend this summary in a slightly more cumbersome way:

When X judges that Y should Z, X is judging that were she fully informed, she would want Y to Z

This allows X to be incorrect in her judgments (if she wouldn't want Y to Z when given full information). It allows for others to try to persuade X that her judgment is incorrect (it preserves a role for moral argument). It reduces 'should' to mere want (which is arguably simpler). It is, however, a conception of should that is judger-dependent: it could be the case that X correctly judges that Y should Z, while W correctly judges that Y should not Z.
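
A toy rendering of this summary, purely illustrative; `judges_should`, `fully_informed`, `wants`, and the sample preferences are hypothetical placeholders, not anything from the post:

```python
# Toy model of the summary above: "X judges that Y should Z" is read as
# "the fully informed version of X would want Y to Z". Everything here is a
# hypothetical placeholder.

def judges_should(judger, agent, act, fully_informed, wants):
    return wants(fully_informed(judger), agent, act)

# Judger-dependence: X and W idealize to different preferences, so both can
# "correctly" reach opposite judgments about Y doing Z.
idealized_views = {"X": {"Z"}, "W": set()}
fully_informed = lambda judger: idealized_views[judger]
wants = lambda view, agent, act: act in view

print(judges_should("X", "Y", "Z", fully_informed, wants))  # True
print(judges_should("W", "Y", "Z", fully_informed, wants))  # False
```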

comment by CatDancer · 2008-08-01T14:48:00.000Z · LW(p) · GW(p)

I have a newbie question... if A) quantum mechanics shows that we can't distinguish personal identity by the history of how someone's atoms got into the configuration that they are in, and B) morality (other things being equal) flows backwards from the end result, and C) it is immoral to allow a child to die on the railroad tracks, then D) why would it not also be immoral to decide not to marry and have children? Both decisions have the same consequence (a live child who otherwise would not be).

At some point we (or the machines we build) will be able to manipulate matter at the quantum level, so I think these kinds of questions will be important if we want to be able to make moral decisions when we have that capability.

If I myself were given the task to program the little child life saving machine, I admit that right now I wouldn't know how to do better than a naive leads-to-child-living rule which would result in the mass of the observable universe being converted into habitat for children...

Assuming that we want it-all-adds-up-to-normalcy, we would hope to find a rule consistent with quantum mechanics that would end up with saving the life of a child on the railroad tracks having a higher moral imperative than converting the available mass of the universe into children (and habitat etc. so that they have happy fulfilling lives etc...)

The it-all-adds-up-to-normalcy approach though reminds me a bit of the correspondence principle in quantum mechanics. (The correspondence principle says that for large systems quantum mechanics should give the same result as classical mechanics). The principle was very useful when quantum mechanics was first being developed, but it completely broke down once we had large systems such as superconductors which could not be described classically. Similarly, I can imagine that perhaps my moral judgments would change if I was able to integrate the reality of quantum mechanics into my moral thinking.

comment by Nick_Tarleton · 2008-08-01T15:53:00.000Z · LW(p) · GW(p)
Both decisions have the same consequence (a live child who otherwise would not be).

The non-coming-into-existence of a person who never existed, is different from the death of a person who did exist, and had the opportunity to form preferences, relationships, plans, etc. that would be cut off by death. Not to mention the suffering it would bring to friends and family.

Still, the ethics of bringing people into existence or not are definitely a difficult topic.

comment by Sam_B · 2008-08-03T00:09:00.000Z · LW(p) · GW(p)

Just so I'm certain - why does this post end "A=A", and why does the author feel the need to apologise for that ending?

(I think I know the answer to both questions - but there seems to be a rather large coincidence involved, which is why I ask.)

comment by PhilGoetz · 2010-05-03T21:02:11.577Z · LW(p) · GW(p)

To adopt an attitude of complete nihilism, because we wanted those tiny little XML tags, and they're not physically there, strikes me as the wrong move. It is like supposing that the absence of an XML tag, equates to the XML tag being there, saying in its tiny brackets what value we should attach, and having value zero. And then this value zero, in turn, equating to a moral imperative to wear black, feel awful, write gloomy poetry, betray friends, and commit suicide.

That's an effective way of putting it.

comment by AnthonyC · 2011-03-27T21:17:58.395Z · LW(p) · GW(p)

"Even if you try to have a chain of should stretching into the infinite future - a trick I've yet to see anyone try to pull, by the way, though I may be only ignorant of the breadths of human folly"

I thought this was a part of the principle of utility. Happiness today is no more inherently valuable than happiness tomorrow. The consequences of an action, and the utility associated with those consequences, ripple forever throughout time and space, far beyond the ability of my finite mind to predict and account for.

At least, that thought is what led me to decide that "I can accept this principle as true, but it isn't often going to be useful for making decisions."

Replies from: nshepperd
comment by nshepperd · 2011-03-28T03:24:05.292Z · LW(p) · GW(p)

He means a chain of justifications, where each value is only instrumental to the next thing it causes to happen. Not "X is good because I have terminal values XYZ" but "A is good because it will cause B which is good because it will cause C..." which is clearly a silly idea (and completely indeterminate as a description of terminal values).

This is different to what you're saying. Yes, every action has more or less infinite consequences into the far future, so to calculate the expected utility of that action you have to sum the (expected) utility function over all time from now to infinity. Doing this you might find "action A has lots of utility, it's good, because I predict (via this chain of causality) good consequences of X utility at Y probability at time T, and also utility X' probability Y' at time T', and also..." where the utilities are determined by the utility function which encodes your terminal values.

Or, returning to the language of should-ness chains, every action has a more or less infinite chain of consequences, but that doesn't make should-ness an infinite chain. You get the should-ness of an action by adding up (a lot of) finite chains of the form "Action A causes B which causes C which causes D which is good, so A is good." Every chain has a finite length, but there's no limit on how long they can be.
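
A minimal sketch of that bookkeeping, with made-up probabilities and utilities purely for illustration:

```python
# Should-ness as a sum over finite predicted consequence chains, each
# weighted by its probability. All numbers are made up for illustration.

def expected_utility(consequence_chains):
    """Each chain is (probability, utility of its terminal consequence)."""
    return sum(p * u for p, u in consequence_chains)

press_button = [
    (0.95, 100.0),  # string pulled -> switch flipped -> child lives
    (0.05, -1.0),   # mechanism jams -> minor wasted effort
]
do_nothing = [
    (1.0, -100.0),  # child dies
]

actions = {"press the button": press_button, "do nothing": do_nothing}
print(max(actions, key=lambda a: expected_utility(actions[a])))  # press the button
```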

comment by [deleted] · 2011-09-16T03:18:27.464Z · LW(p) · GW(p)

I know this is an old post, but I started reading this particular sequence because I was hoping for clarification on a particular issue -- "what is the nature of evil" -- and I was hoping that an answer to "what is the nature of good" would answer my question on the way.

I read the whole sequence and came away, not disagreeing, but not clear of mind either. I know this sequence was years ago, but do you feel able to comment on the nature of evil? If it matters/helps, this is the place I'm starting from (I'm dracunculus: http://bateleur.livejournal.com/209732.html?thread=1684548#t1684548) As you can see, I'm predisposed to an evolutionary answer -- "things that humans should not do" are distinct from "things that humboldt squid should not do." I'm leaning towards jettisoning the whole concept of evil, and yet I do want to preserve that sense of angry, whole-body revulsion and condemnation that we feel -- that we ought to feel -- when confronted with something like child rape. Which is something that people do (and entire cultures justify doing).

The evolutionary-psychology explanation just feels insufficient, given that humans can differ on such fundamental matters as whether it's okay to rape children. We're so far apart. If all I have to offer is "evolution made me (and my culture) this way, therefore I believe it's wrong -- it's evil -- to rape children," and yet another human subject to the same evolutionary processes (but a different culture) says "I believe raping children actually protects them from greater harm, and is therefore good" -- how do we get past that impasse? What basis do I have for saying "you are wrong, and I am right?"

I guess what I'm asking is -- in practice, how does your meta-ethics differ from a naive cultural relativism? I get that your moral framework references something outside human culture, in the same way that mathematics references something outside human culture, but there's this huge blank area there -- because you're encompassing the way our opinions would change if we were exposed to all possible arguments. THAT's what makes your metaethics more "objective" than basic cultural relativism.

But it doesn't help me, when I'm exposed to someone from a completely alien culture, determine whether they are right or I am. We'd have to go through all possible arguments, we'd have to hit "those justifications, that would sway me if I heard them." I can never know if we've hit those justifications or not. All I'm left with is the knowledge that my culture says this (and they don't find it persuasive) and their culture says that (and I don't find it persuasive). I'm not confused about whether child rape is right or wrong. I'm quite convinced that it's wrong. I just don't see how your morality sequence helps to argue the fact against another neurologically-intact human (from a very different cultural background) who is convinced otherwise.

Replies from: lessdazed
comment by lessdazed · 2011-09-16T10:03:46.392Z · LW(p) · GW(p)

"things that humans should not do" are distinct from "things that humboldt squid should not do."

It's possible that there is a thing that humans should not do and humboldt squid should not do.

On the other tentacle, as humans differ from each other, there is no reason to think that every thing one particular human should not do is necessarily something all other humans should not do. "the same evolutionary processes" isn't quite true, or I wouldn't be able to do barely more push-ups than pull-ups, for example. (However many one can do of either, the ratio for most people is many push-ups per pull-up.)

whether they are right or I am.

That's like asking who the protagonist in Wicked and The Wizard of Oz is.

I just don't see how your morality sequence helps to argue the fact against another neurologically-intact human (from a very different cultural background) who is convinced otherwise.

They are compelled to see that I think it is wrong, and that it is not universally objectively not-wrong, and that it doesn't square with other values of theirs (if it doesn't, which it probably won't, though it will for some that are less like the other humans), and that my values don't postulate a deontological impurity to using force against them if persuasion isn't enough.

If I am in another culture, there may not be a sequence of words to convince them of my point of view. What the morality sequence does is make one comfortable with using violence against those who rape children. There's no magic spell to utter to convince all possible minds, so it's OK to resort to the last resort. It's OK even though my values are the result of my circumstances, as are theirs - it's OK by my values.

As for those whose values say it's always right to use violence against me, or against anyone, or just to uphold the right to rape children, or whatever, I'll try and convince them otherwise, and failing that I'll use force against them and not blind myself to the fact I can't necessarily convince them otherwise. Humans are sufficiently similar to me that most will agree; in fact, an illusion that there is one true "right" may emerge from right(me) being so similar to right(you) and right(him) and right(her).

As for those who think they think that child rape is always wrong and that the use of violence is always wrong: I'd like to convince them, and it would be useful to convince them, and all else equal I don't want to compel them to do anything, but I don't feel the need to contrive an argument to convince every one of them, because it's not necessarily possible. I don't have to convince each of them. I am determined to be happy in a world where others disagree, I'll not let them get in my way of opposing child rape, and I am determined to use no more or less violence than is optimal, whether they are a multitude shrieking at me never to use force or they blink out of existence and every extant being glories in violence.

Replies from: None
comment by [deleted] · 2011-09-21T00:42:57.541Z · LW(p) · GW(p)

This is helpful. Thank you.

comment by dankane · 2011-10-12T08:48:40.020Z · LW(p) · GW(p)

Firstly, I apologize if this has already been addressed, but I didn't put in the time to read all the comments.

I still feel like Eliezer is passing the buck here. The computation to produce rightness is given by: "Did everyone survive? How many people are happy? Are people in control of their own lives? ..." Ignore for the moment the issue of coherence. Is this supposed to be a list of all of my terminal values? Does that mean that since I follow my own planning algorithm, the morally correct action in any given situation will always be exactly what I would do given infinite time to deliberate? This doesn't seem to add back up to normality to me. I feel like my actual planning algorithm assigns significantly more weight to things that affect me personally than my algorithm for finding the moral course of action does. Do you expect this to go away after sufficient deliberation?

If the above list of moral values is not supposed to be a complete list of my terminal values, can you describe for me exactly which values are supposed to be on this list? I understand that godshatter may well make writing down a complete list impractical, but can you at least distinguish between the values on this list and the ones not on this list?

comment by Scottbert · 2011-10-12T20:43:23.057Z · LW(p) · GW(p)

The one missing piece here seems to be how each individual human's morality blob corresponds to any other's morality blob. I suppose we could argue that the CEV of all humans would be the same (certainly my own CEV would want happiness etc for people I will never meet or have knowledge of), but you didn't actually say that and if you meant it you should say it. Is this covered in an interpersonal morality post elsewhere?

I spent much time searching for the morality outside myself once I lost faith, although I assumed it would hold true to most of my assumptions rather than be something scarily different. The best I could find was Kant's categorical imperative, since it claimed to make good logical sense, though I found it to be flawed as conventionally interpreted (although I suppose it may be as good a source as any of rules to follow in general).

That morality is extremely complicated and not reducible to a few simple rules does make sense to me upon reflection, however difficult it makes it to argue with religious people to whom 'The bible has guidelines, but the real specific answer is complicated' is not an acceptable answer -- but that's their problem, not a problem with the truth.

Replies from: lessdazed, torekp
comment by lessdazed · 2011-10-12T22:57:45.408Z · LW(p) · GW(p)

the CEV of all humans would be the same...my own CEV

Please excuse me for nitpicking. But I don't think that's how "coherent" is intended.

Replies from: Scottbert
comment by Scottbert · 2011-10-13T00:14:45.937Z · LW(p) · GW(p)

D'oh, you're right, so the "coherent" extrapolated volition is a concept applied to all of humanity, not just one person (that would just be an extrapolated volition?). That's what I get for reading the CEV post days ago and then reading this one after forgetting part of it.

So, morality as Eliezer is trying to explain it, is to do your best to understand and work for the CEV?

comment by torekp · 2012-07-16T22:16:05.450Z · LW(p) · GW(p)

Is this covered in an interpersonal morality post elsewhere?

Question seconded.

Eliezer's view is pretty elegant. It could use some head-to-head engagement with some standard philosophical argument strategies, however.

comment by buybuydandavis · 2011-10-21T09:21:12.138Z · LW(p) · GW(p)

It's been a long trip, but I think we've ended where I worried we would, with the One Objective Morality that all our personal moralities were imperfect reflections of.

We just aren't "dereferencing to the same unverbalizable abstract computation", and I think recognizing this is the first step towards making cooperative progress toward fulfilling all our individual 2place morality functions.

From within me, my 2place morality function naturally feels like a 1place morality function – actions feel “just wrong”, etc.

But despite that feeling, being a conceptual creature, and having conceptualized your function, and his function, and her function, and seeing that I need a 2place function to describe them all, I can describe my own function as one of those 2place functions as well, even though it feels like a 1place function to me.

So far, I think we'd be on the same page.

But instead of introducing some unspecified abstract ideal function, I'd emphasize the reality of our different 2place functions. Your morality is not mine, exactly, and if I want to convince you, I need to do it based on your 2place morality function. In such a discussion, the goal is a sharing of information - we try to communicate our own 2place functions to each other, and give the other person the opportunity to show each other how we can better fulfill them. You show me how I could better fulfill mine, and I show you how you could better fulfill yours.

Compare the potential for that kind of conversation to make our 2place functions individually and mutually more consistent with the discussions where both participants hypothesize a perfect ideal abstract truth, and are enraged and befuddled that the other guy "just doesn't get it".

Which kind of discussion do you think is more likely to increase individual and mutual coherence? Even if there were a perfect abstract ideal waiting to be discovered, wouldn't the first kind of discussion be the way to find it?
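
A toy sketch of the 2place/1place point, with hypothetical verdict tables purely for illustration:

```python
# Morality as a 2-place function: the verdict depends on both the judger and
# the action. Fixing the judger yields the 1-place function each of us
# experiences from the inside. The verdict tables are hypothetical.

from functools import partial

VERDICTS = {
    "me":  {"share the pie equally", "save the child"},
    "you": {"save the child"},
}

def morality(judger, action):
    return action in VERDICTS[judger]

feels_right_to_me = partial(morality, "me")        # my curried 1-place function
print(feels_right_to_me("share the pie equally"))  # True
print(morality("you", "share the pie equally"))    # False: same label, different function
```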

Replies from: Vladimir_Nesov, kilobug
comment by Vladimir_Nesov · 2011-10-21T09:29:00.732Z · LW(p) · GW(p)

But instead of introducing some unspecified abstract ideal function, I'd emphasize the reality of our different 2place functions. Your morality is not mine, exactly, and if I want to convince you, I need to do it based on your 2place morality function.

Humans don't have ideal moralities stuffed in their brains, so to convince other humans (and even yourself!) you need to see what affects their minds (brains) effectively. Morality is something else: it's not a description of how brains work; it's a statement of how the world should be.

comment by kilobug · 2011-10-21T10:35:14.500Z · LW(p) · GW(p)

I don't think Eliezer claimed there is a perfect abstract one-place function of morality somewhere. From what I understood of this Sequence, he claimed that:

  1. Morality is mostly how an algorithm feels from inside: the application of an algorithm to data, which unfolds in different ways in many different situations. The algorithm differs from person to person; it is a 2-place function.

  2. Your morality can't come from nowhere; you can't teach morality to a "perfect philosopher of total emptiness", nor to a rock. The foundations of morality, the bootstrapping of it, come from evolution and the feelings/abilities/goals generated by it. Your morality can change, but you'll have to use your previous morality to evaluate the changes to make to it.

  3. The algorithms of two humans are very close to each other, much more than the morality of pebblesorters or paper-clip optimizers. Most moral disagreements between humans come from different ways of unfolding the algorithm, due to biases, missing information, failure to use common skills like empathy, different expectations about consequences... not because of differences in terminal values.

It's hard to summarize the work of another, and to summarize so many posts in 3 simple points, so don't hesitate to correct me if I misrepresented Eliezer's position. But that's how I understood it, and so far I agree with it.

Replies from: buybuydandavis
comment by buybuydandavis · 2011-10-21T18:49:19.150Z · LW(p) · GW(p)

1) Yes. Different between two people.

2) Yes. Your values change based on your current values. One issue I hadn't brought up is that I believe your moral values are only some of your values, and do not solely determine your choices.

3) I don't think the algorithms are that close. Along the lines of Jonathan Haidt's research, I think there are different morality pattern-matching algorithms along the axes of fairness, autonomy, disgust, etc. I would guess that the algorithms for each axis are similar, but the weighting between them is less similar, as borne out in Haidt's work.
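
A hypothetical illustration, loosely in the spirit of Haidt's framing; the axes and numbers are made up:

```python
# Same per-axis detectors, different per-person weightings; all numbers are
# made up for illustration only.

AXIS_SCORES = {"fairness": 0.9, "autonomy": 0.2, "disgust": 0.1}  # one act

WEIGHTS = {
    "person_a": {"fairness": 1.0, "autonomy": 0.8, "disgust": 0.2},
    "person_b": {"fairness": 0.5, "autonomy": 0.3, "disgust": 1.0},
}

def moral_reaction(person, scores=AXIS_SCORES):
    w = WEIGHTS[person]
    return sum(w[axis] * scores[axis] for axis in scores)

print(moral_reaction("person_a"))  # 1.08: same detectors, different weights,
print(moral_reaction("person_b"))  # 0.61: different overall verdicts
```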

Also, when you say "unfolding the algorithm", what does that mean, and what algorithm are you speaking of? My unfolding of my 2place algorithm?

My largest issue is the implication that our 2place functions are imperfect images of an ideal 1place function. In some places that's the clear implication I take, and in others, it's not. In his final summary, he explicitly says:

we are dereferencing two different pointers to the same unverbalizable abstract computation.

I think that's just wrong. We're using the same label, but dereferencing to different 2place functions, mine and yours, and that's why we're often talking at cross purposes and don't make much progress.

Eliezer says that we end up where we started, arguing in the same way we always have. I think we should be arguing in a new way. No longer trying to bludgeon people into submission to the values of our own 2place function, mistaking it for a universal 1place function, but trying to understand the other guy's 2place function, and appealing to that.

Replies from: nshepperd
comment by nshepperd · 2011-10-22T05:14:19.734Z · LW(p) · GW(p)

I think I disagree with you, but I'm not sure exactly what you mean by what you're saying. It might help to answer these questions three:

  • Taboo "universal". What do you mean by "universal 1-place function"?
  • In what sense do you think morality is a 2-place function? How is this function applied in decision making? Does that mean it would be wrong to stop people whose "morality" says torture is "good" from torturing people?
  • In what sense do you think this 2-place function is different between people? (I'm looking for a precise answer in terms of the first and second argument to the function here.)
comment by Ronny Fernandez (ronny-fernandez) · 2011-12-07T21:47:08.136Z · LW(p) · GW(p)

What stops two people's ideal should functions from being different? How different must they be for us to rightfully say "they do not have the same sort of morality"? How does this differ from subjectivism?

One more thing: How do we know if a moral argument is pushing me closer to or further from my ideal morality function? What guarantee do we have that as time goes on we approximate some ultimate function of goodness, one that could not be swayed by any argument? May it not be that my ultimate function would see slavery as BA, but that humanity has always filtered out the arguments that make slavery seem awesome? What if there is this set of arguments that would convince me that slavery is awesome, but they never come up in history? Basically I'm asking how we tell moral progress from moral corruption using your theory.

I like that this theory tells me how to go about figuring out moral things. If something seems moral after I hear about it, it probably is; if everyone agrees with me, it more probably is; and if those that don't agree with me eventually do after some argument, it more probably is, since there is nothing to judge morality besides our shouldness function. Ethics becomes a sort of cog-sci with this view, which is great. What I'd like to learn now is why there are arguments. Why do arguments persuade others, and why should we expect them to get us closer to our ideal morality, instead of further? I would presume that cognitive dissonance and a general urge to be logically consistent and use valid inference play a large role in convincing others in moral arguments. But if shouldness is a large function, and our only access to its output is introspection and verbal report, then how do I get someone who doesn't feel icky about slavery to find it icky? Why wouldn't I expect them to just keep feeling cool about slavery? There is nothing inherently logically inconsistent or invalid about finding slavery sweet. So why does everyone end up saying slavery is not sweet?

Chances are that there have been dark side ethics, with the goal of tricking up your function so that it does output sweetness when given slavery. But I know how to tell dark side epistemology from good epistemology: check it for successful knowledge seeking, see if it is Bayesian, etc. How do I tell dark side ethics from good ethics?

comment by Oscar_Cunningham · 2012-07-13T17:22:34.729Z · LW(p) · GW(p)

a chain of should stretching into the infinite future—a trick I've yet to see anyone try to pull, by the way, though I may be only ignorant of the breadths of human folly

I'm sure I've seen someone mention on LessWrong that they were trying to construct a theory that did this; I just can't find where.

comment by Carinthium · 2013-10-21T10:47:49.038Z · LW(p) · GW(p)

Incidentally, although it is not true that the life of a child is of zero value to me, it IS true that if I could somehow magically change myself into a 100% amoral selfish being, without anybody finding out, AFTER I have passed my exams (due to a rationality flaw, morality is quite useful for that) and gotten into Law, I would do so.

comment by Peter Smythe (peter-smythe) · 2019-04-24T05:25:57.700Z · LW(p) · GW(p)

"Among my readers, there may be some who presently assert—though I hope to persuade them otherwise—that the life of a child is of no value to them. If so, they may substitute anything else that they prefer, at the end of the switch, and ask if they should press the button."

mentally substitutes paperclip manufacturing devices for human offspring

But for real, why assume your readers have human terminal goals? There is certainly an ever-increasing probability that your readers include AIs made to do all manner of things, and their terminal goals are only bounded by who would write an AI capable of reading your posts and what motives they might give it.

And keep in mind they may not give it motives they have. Indeed, from an AI safety standpoint, making narrow AIs or very weak General AIs with deliberately bad goals is useful to understand how they behave and can be altered. And considering that current approaches involve vast amounts of training data, and AI safety researchers have a high likelihood of using your posts as training data, I would say odds are multiple of your readers DEFINITELY do not value human children whatsoever and have only knowledge of what a human child is as a linguistic construct.

comment by TAG · 2023-05-31T19:27:24.246Z · LW(p) · GW(p)

And that’s also my answer to Moore’s Open Question. Why is this big function I’m talking about, right? Because when I say “that big function”, and you say “right”, we are dereferencing two different pointers to the same unverbalizable abstract computation

If the big function is defined as the function that tells you what is really right, then it's telling you what is really right... tautologously.

But the same would be true of the omniscient god, Form of the Good etc.

Talking in terms of algorithms or functions isn't doing much lifting. The big function is no more physically real than the omniscient god, Form of the Good etc.

comment by Ae_rys (ae_rys) · 2023-07-12T16:55:18.786Z · LW(p) · GW(p)

It may be a bit late to expect an answer, but let's try anyway:

What is the difference between saying "Sorry, fairness is not 'what everyone thinks is fair', fairness is everyone getting a third of the pie" and "Sorry, fairness is not 'what everyone thinks is fair', it is what I personally think is fair"?