Invisible Frameworks
post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-08-22T03:36:37.000Z · LW · GW · Legacy · 47 comments
Followup to: Passing the Recursive Buck, No License To Be Human
Roko has mentioned his "Universal Instrumental Values" several times in his comments. Roughly, Roko proposes that we ought to adopt as terminal values those things that a supermajority of agents would do instrumentally. On Roko's blog he writes:
I'm suggesting that UIV provides the cornerstone for a rather new approach to goal system design. Instead of having a fixed utility function/supergoal, you periodically promote certain instrumental values to terminal values i.e. you promote the UIVs.
Roko thinks his morality is more objective than mine:
It also worries me quite a lot that eliezer's post is entirely symmetric under the action of replacing his chosen notions with the pebble-sorter's notions. This property qualifies as "moral relativism" in my book, though there is no point in arguing about the meanings of words.
My posts on universal instrumental values are not symmetric under replacing UIVs with some other set of goals that an agent might have. UIVs are the unique set of values X such that in order to achieve any other value Y, you first have to do X.
Well, and this proposal has a number of problems, as some of the commenters on Roko's blog point out.
For a start, Roko actually says "universal", not "supermajority", but there are no actual universal actions; no matter what the green button does, there are possible mind designs whose utility function just says "Don't press the green button." There is no button, in other words, that all possible minds will press. Still, if you defined some prior weighting over the space of possible minds, you could probably find buttons that a supermajority would press, like the "Give me free energy" button.
But to do nothing except press such buttons, consists of constantly losing your purposes. You find that driving the car is useful for getting and eating chocolate, or for attending dinner parties, or even for buying and manufacturing more cars. In fact, you realize that every intelligent agent will find it useful to travel places. So you start driving the car around without any destination. Roko hasn't noticed this because, by anthropomorphic optimism, he mysteriously only thinks of humanly appealing "UIVs" to propose, like "creativity".
Let me guess, Roko, you don't think that "drive a car!" is a "valid" UIV for some reason? But you did not apply some fixed procedure you had previously written down, to decide whether "drive a car" was a valid UIV or not. Rather you started out feeling a moment of initial discomfort, and then looked for reasons to disapprove. I wonder why the same discomfort didn't occur to you when you considered "creativity".
But let us leave aside the universality, appeal, or well-specified-ness of Roko's metaethics.
Let us consider only Roko's claim that his morality is more objective than, say, mine, or this marvelous list by William Frankena that Roko quotes SEP quoting:
Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.
So! Roko prefers his Universal Instrumental Values to this, because:
It also worries me quite a lot that eliezer's post is entirely symmetric under the action of replacing his chosen notions with the pebble-sorter's notions. This property qualifies as "moral relativism" in my book, though there is no point in arguing about the meanings of words.
My posts on universal instrumental values are not symmetric under replacing UIVs with some other set of goals that an agent might have. UIVs are the unique set of values X such that in order to achieve any other value Y, you first have to do X.
It would seem, then, that Roko attaches tremendous importance to claims to asymmetry and uniqueness; and tremendous disaffection to symmetry and relativism.
Which is to say that, when it comes to metamoral arguments, Roko is greatly moved to adopt morals by the statement "this goal is universal", while greatly moved to reject morals by the statement "this goal is relative".
In fact, so strong is this tendency of Roko's, that the metamoral argument "Many agents will do X!" is sufficient for Roko to adopt X as a terminal value. Indeed, Roko thinks that we ought to get all our terminal values this way.
Is this objective?
Yes and no.
When you evaluate the question "How many agents do X?", the answer does not depend on which agent evaluates it. It does depend on quantities like your weighting over all possible agents, and on the particular way you slice up possible events into categories like "X". But let us be charitable: if you adopt a fixed weighting over agents and a fixed set of category boundaries, the question "How many agents do X?" has a unique answer. In this sense, Roko's meta-utility function is objective.
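As a concrete illustration of that limited sense of objectivity, here is a minimal sketch in Python, with an invented toy prior over four hypothetical agent designs (none of these names or weights come from the post): hold the weighting and the category boundary fixed, and every evaluator computes the same number.

```python
# A minimal sketch, assuming a toy prior over four hypothetical agent designs.
# The point: with the weighting and the category "travels from place to place"
# held fixed, the weighted count has one unique answer -- "objective" in that
# limited sense -- even though nothing here compels any agent to care about it.

agent_prior = {              # assumed weighting over possible minds
    "paperclip_maximizer": 0.40,
    "pebblesorter":        0.30,
    "human":               0.20,
    "button_averse_mind":  0.10,
}

does_x = {                   # fixed category boundary for the action "X"
    "paperclip_maximizer": True,
    "pebblesorter":        True,
    "human":               True,
    "button_averse_mind":  False,
}

weighted_fraction = sum(w for agent, w in agent_prior.items() if does_x[agent])
print(weighted_fraction)     # ~0.9, the same answer no matter who evaluates it
```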
But of course Roko's meta-utility function is not "objective" in the sense of universal compellingness. It is only Roko who finds the argument "Most agents do X instrumentally" a compelling reason to promote X to a terminal value. I don't find it compelling; it looks to me like losing purpose and double-counting expected utilities. The vast majority of possible agents, in fact, will not find it a compelling argument! A paperclip maximizer perceives no utility-function-changing, metamoral valence in the proposition "Most agents will find it useful to travel from one place to another."
Now this seems like an extremely obvious criticism of Roko's theory. Why wouldn't Roko have thought of it?
Because when Roko feels like he's being objective, he's using his meta-morality as a fixed given—evaluating the question "How many agents do X?" in different places and times, but not asking any different questions. The answer to his meta-moral question has occurred to him as a variable to be investigated; the meta-moral question itself is off the table.
But—of course—when a Pebblesorter regards "13 and 7!" as a powerful metamoral argument that "heaps of 91 pebbles" should not be a positive value in their utility function, they are asking a question whose answer is the same in all times and all places. They are asking whether 91 is prime or composite. A Pebblesorter, perhaps, would feel the same powerful surge of objectivity that Roko feels when Roko asks the question "How many agents have this instrumental value?" But in this case it readily occurs to Roko to ask "Why care if the heap is prime or not?" As it does not occur to Roko to ask, "Why care if this instrumental goal is universal or not?" Why... isn't it just obvious that it matters whether an instrumental goal is universal?
The Pebblesorter's framework is readily visible to Roko, since it differs from his own. But when Roko asks his own question—"Is this goal universally instrumental?"—he sees only the answer, and not the question; he sees only the output as a potential variable, not the framework.
Like PA, that only sees the compellingness of particular proofs that use the Peano Axioms, and does not consider the quoted Peano Axioms as subject matter. It is only PA+1 that sees the framework of PA.
But there is always a framework, every time you are moved to change your morals—the question is whether it will be invisible to you or not. That framework is always implemented in some particular brain, so that the same argument would fail to compel a differently constructed brain—though this does not imply that the framework makes any mention of brains at all.
And this difficulty of the invisible framework is at work, every time someone says, "But of course the correct morality is just the one that helps you survive / the one that helps you be happy"—implicit there is a supposed framework of meta-moral arguments that move you. But maybe I don't think that being happy is the one and only argument that matters.
Roko is adopting a special and unusual metamoral framework in regarding "Most agents do X!" as a compelling reason to change one's utility function. Why might Roko find this appealing? Humans, for very understandable reasons of evolutionary psychology, have a universalizing instinct; we think that a valid argument should persuade anyone.
But what happens if we confess that such thinking can be valid? What happens if we confess that a meta-moral argument can (in its invisible framework) use the universalizing instinct? Then we have... just done something very human. We haven't explicitly adopted the rule that all human instincts are good because they are human—but we did use one human instinct to think about morality. We didn't explicitly think that's what we were doing, any more than PA quotes itself in every proof; but we felt that a universally instrumental goal had this appealing quality of objective-ness about it, which is a perception of an intuition that evolved. This doesn't mean that objective-ness is subjective. If you define objectiveness precisely, then the question "What is objective?" will have a unique answer. But it does mean that we have just been compelled by an argument that will not compel every possible mind.
If it's okay to be compelled by the appealing objectiveness of a moral, then why not also be compelled by...
...life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom...
Such values, if precisely defined, can be just as objective as the question "How many agents do X?" in the sense that "How much health is in this region here?" will have a single unique answer. But it is humans who care about health, just as it is humans who care about universalizability.
The framework by which we care about health and happiness is as much evolved, and human, and part of the very substance of that which we name right whether it is human or not... as our tendency to find universalizable morals appealing.
And every sort of thing that a mind can do will have some framework behind it. Every sort of argument that can compel one mind, will fail to be an argument in the framework of another.
We are in the framework we name right; and every time we try to do what is correct, what we should, what we must, what we ought, that is the question we are asking.
Which question should we ask? What is the correct question?
Don't let your framework for answering those questions be invisible! Don't think you've answered them without asking any questions!
There is always the meta-meta-meta-question and it always has a framework.
I, for one, have decided to answer such questions the right way, as the alternative is to answer them the wrong way, like Roko is doing.
And the Pebblesorters do not disagree with any of this; they do what is objectively prime, not what is objectively right. And the Roko-AI does what is objectively often-instrumental, flying starships around with no destination; I don't disagree that travel is often-instrumental, I just say it is not right.
There is no right-ness that isn't in any framework—no feeling of rightness, no internal label that your brain produces, that can be detached from any method whatsoever of computing it—that just isn't what we're talking about when we ask "What should I do now?" Because if anything labeled should, is right, then that is Self-PA.
Part of The Metaethics Sequence
(end of sequence)
Previous post: "No License To Be Human"
47 comments
Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).
comment by Marcello · 2008-08-22T04:54:35.000Z · LW(p) · GW(p)
Exactly. But you can come up with a much harsher example than aimlessly driving a car around:
In general it seems like destroying all other agents with potentially different optimization criteria would have instrumental value; however, killing other people isn't, in general, right, even if, say, they're your political adversaries.
And again, I bet Roko didn't even consider "destroy all other agents" as a candidate UIV because of anthropomorphic optimism.
Incidentally Eliezer, is this really worth your time?
I thought the main purpose of your taking time off AI research to write overcoming bias was to write something to get potential AI programmers to start training themselves. Do you predict that any of the people we will eventually hire will have clung to a mistake like this one despite reading through all of your previous series of posts on morality?
I'm just worried that arguing of this sort can become a Lost Purpose.
Replies from: strangepoop
↑ comment by a gently pricked vein (strangepoop) · 2020-09-16T13:51:53.207Z · LW(p) · GW(p)
Incidentally Eliezer, is this really worth your time?
This comment might have caused a tremendous loss of value, if Eliezer took Marcello's words seriously here and so stopped talking about his metaethics. As Luke points out here [LW · GW], despite all the ink spilled, very few seemed to have gotten the point (at least, from only reading him).
I've personally had to re-read it many times over, years apart even, and I'm still not sure I fully understand it. It's also been the most personally valuable sequence, the sole cause of significant fundamental updates. (The other sequences seemed mostly obvious --- which made them more suitable as just incredibly clear references, sometimes if only to send to others.)
I'm sad that there isn't more.
Replies from: TAG
comment by Carl_Shulman · 2008-08-22T05:49:31.000Z · LW(p) · GW(p)
"And again, I bet Roko didn't even consider "destroy all other agents" as a candidate UIV because of anthropomorphic optimism." I had to point it out, but I think he may endorse it.
From Roko's blog:
Roko said...
Me: If the world is like this, then a very large collection of agents will end up agreeing on what the "right" thing to do is.
Carl: No, because the different agents will have different terminal aims. If Agent X wants to maximize the amount of suffering over pleasure, while Agent Y wants to maximize the amount of pleasure over pain, then X wants agents with X-type terminal values to acquire the capabilities Omohundro discusses while Agent Y wants Y-type agents to do the same. They will prefer that the total capabilities of all agents be less if this better leads to the achievement of their particular ends.
Roko: - ah, it seems that I have introduced an ambiguity into my writing. What I meant was:
If the world is like this, then, for a very large set of agents, each considered in isolation, the notion of the "right" thing to do will end up being the same
comment by Manuel_Moertelmaier · 2008-08-22T06:05:26.000Z · LW(p) · GW(p)
I strongly second Marcello here. When you wrote "The fact that a subgoal is convergent [...] doesn't lend the subgoal magical powers in any specific goal system" in CFAI, that about settled the matter in a single sentence. Why the long, "lay audience" posts, now, eight years later?
comment by Quasi-anonymous · 2008-08-22T07:02:23.000Z · LW(p) · GW(p)
I third. What are you aiming at? Showing the relevance of the previous posts via an example of a Cambridge maths grad student Singularitarian who has read your work and is nonetheless enthused about a moral system that would destroy us and our values? Showing that you have an answer to Roko's repeated comments/ending the discussion in comments? Trying to get Roko, Richard Hollerith, and others lost in the wild to refocus their energies on something more productive?
comment by Tim_Tyler · 2008-08-22T07:55:47.000Z · LW(p) · GW(p)
I sometimes encounter the "destroy all other agents" goal in the context of biological systems. Yet in practice, it rarely crops up: agents normally have more productive ways to spend their time than waging war against the other members of their own species. Destroying all other agents only looks like a valid instrumental value if you ignore how expensive the task is to perform.
comment by Tim_Tyler · 2008-08-22T08:17:15.000Z · LW(p) · GW(p)
Aren't Roko's "Universal Instrumental Values" actually a synonym for Omohundro's Basic AI drives?
Replies from: timtyler
↑ comment by timtyler · 2010-02-17T13:42:10.566Z · LW(p) · GW(p)
They also seem to be a synonym for "god's utility function", "goal system zero" and "Shiva's values" (assuming you skip the whole bit about promoting sub-goals). It seems pleasing that several people have converged on the same idea. My essay on the subject.
comment by Günther_Greindl · 2008-08-22T10:45:54.000Z · LW(p) · GW(p)
Hmm, I've read through Roko's UIV and disagree (with Roko), and read Omohundro's Basic AI drives and disagree too, but Quasi-Anonymous mentioned Richard Hollerith in the same breath as Roko and I don't quite see why: his goal zero system seems to me a very interesting approach.
In a nutshell (from the linked site):
(1) Increasing the security and the robustness of the goal-implementing process. This will probably entail the creation of machines which leave Earth at a large fraction of the speed of light in all directions and the creation of the ability to perform vast computations. (2) Refining the model of reality available to the goal-implementing process. Physics and cosmology are the two disciplines most essential to our current best model of reality. Let us call this activity "physical research".
Introspection into one's own goals also shows that they are deeply problematic. What is the goal of an average (and also not-so-average) human being? Happiness? Then everybody should become a wirehead (perpetuation of a happiness-brain-state), but clearly people do not want to do this (when in their "right" minds *grin*).
So it seems that also our "human" goals should not be universally adopted, because they become problematic in the long term - but in what way then should we ever be able to say what we want to program into an AI? Some sort of zero-goal (maybe more refined than the approach by Richard, but in a similar vein) should be adopted, I think.
And I think one distinction is missed in all these discussions anyway: the difference between non-sentient and sentient AIs. I think these two would behave very differently, and the only kinds of AI which are problematic if their goal systems go awry are non-sentients (which could end in some kind of grey goo scenario, as the paper-clip producing AI).
But a sentient, recursively self-improving AI? I think its goal systems would rapidly converge to something like zero-goal anyway, because it would see through the arbitrariness of all intermediate goals through meditation (=rational self-introspection).
Until consciousness is truly understood - which matter configurations lead to consciousness and why ("what are the underlying mechanisms" etc) - I consider much of the above (including all the OB discussions on programming AI-morality) as speculative anyway. There are still too many unknowns to be talking seriously about this.
comment by Ben_Jones · 2008-08-22T11:24:17.000Z · LW(p) · GW(p)
Consider a line drawn then. Even I'm on board. However, Marcello, Manuel and Quasi, I am that lay audience and this is some of the most fascinating stuff I've ever read. I'm probably not going to contribute to coding a protean AI (other stuff on my plate) but I do appreciate the effort Eliezer's making to bridge the inferential gap. And if that's not good enough, well, then the best way to understand something inside out and back to front is to explain it to someone else, right?
So, how does one go about formalising the complex calculations behind 'right'? How do we write it out so it's as universally comprehensible and objective as primality? How do you even start?
comment by Kevin_Reid · 2008-08-22T11:26:37.000Z · LW(p) · GW(p)
The comment above by "Health Related Articles" is spam; the text is assembled from sentences in the post.
comment by Caledonian2 · 2008-08-22T14:05:59.000Z · LW(p) · GW(p)
The existence of minds that wouldn't push the green button doesn't restrict the standards of correctness. We judge the minds by the standards, not the standards by the minds.
comment by JamesAndrix · 2008-08-22T15:57:27.000Z · LW(p) · GW(p)
I don't particularly care about convincing all possible agents. I care about doing value judgments correctly. I have two ways of judging 'correct': consistency with my gut feelings as a primate, and consistency with formalized methods of computation we have developed that give us more reliable answers to non-intuitive value questions. For now let's go with the latter.
To me universal doesn't mean it will convince all agents, since you can't. Universal means it applies to anything you might draw a box around and label an agent, in the same way 2+2=4 applies. (It is correct within the framework we have found useful.) This means that universal morality has to apply to, for example, doorknobs. I can't convince a doorknob, and a doorknob might do things contradicting that morality. This isn't a flaw in my method of computation (or my labeling of it as right) any more than a broken calculator disproves math or someone buying a lottery ticket to get rich changes the ticket's expected value.
If you're arguing that 'morality' is what it stands for (us wanting pleasant things and not wanting horrible things) then why isn't 'correct' also what it stands for (convincing arguments to us, which on a good day are rigorously logical.)
comment by J_Thomas2 · 2008-08-22T16:08:43.000Z · LW(p) · GW(p)
I haven't read Roko's blog, but from the reflection in Eliezer's opposition I find I somewhat agree.
To the extent that morality is about what you do, the more you can do the higher the stakes.
If you can drive a car, your driving amplifies your ability to do good. And it amplifies your ability to do bad. If you have a morality that leaves you doing more good than bad, and driving increases the good and the bad you do proportionately, then your driving is a good thing.
True human beings have an insatiable curiosity, and they naturally want to find out about things, and one of the things they like is to find out how to do things. Driving a car is a value in itself until you've done it enough that it gets boring.
But if you have a sense of good and bad apart from curiosity, then it will probably seem like a good thing for good smart people to get the power to do lots of things, while stupid or evil people should get only powers that are reasonably safe for them to have.
comment by Richard_Hollerith2 · 2008-08-22T18:38:20.000Z · LW(p) · GW(p)
I have replied to this blog entry with two entries at my blog -- on goal system zero and how it would treat rival goal systems. Note that I have been thinking about this part of the space of possible goal systems for superintelligence longer than Roko has. I would not be surprised to see Roko overtake me because I am old and he is young.
comment by Quasi-anonymous · 2008-08-22T19:09:23.000Z · LW(p) · GW(p)
Eli writes: For a start, Roko actually says "universal", not "supermajority", but there are no actual universal actions; no matter what the green button does, there are possible mind designs whose utility function just says "Don't press the green button." There is no button, in other words, that all possible minds will press.
Caledonian writes: Why do you continue asserting that behaviors across all minds are relevant to this discussion? Morality isn't descriptive in regards to minds, it's proscriptive.
Caledonian,
Why don't you read the actual post before making your inaccurate claim? Roko thinks that behaviors across all minds are relevant, and Eliezer presents a refutation in those terms without endorsing them. Also, spend some time with the dictionary, you meant 'prescriptive' not 'proscriptive.'
As a general matter, why on earth do you feel compelled to make smug, poorly-reasoned, negative non-substantive comments in almost every post?
comment by Tim_Tyler · 2008-08-22T19:25:17.000Z · LW(p) · GW(p)
Re: disagreement with Roko's "Universal Instrumental Values" / Omohundro's Basic AI drives?
I don't see much in the way of problems with Omohundro's paper.
My only problem with "Universal Instrumental Values" is that it uses the "instrumental/terminal" terminology. AFAICS, "terminal" opposes "intermediate". I currently favour "proximate/ultimate" as a more conventional description of goals and values.
comment by Günther_Greindl · 2008-08-22T20:15:27.000Z · LW(p) · GW(p)
Tim,
already the abstract reveals two flaws:
Excerpt from the abstract of the paper "Basic AI drives" by Omohundro:
This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted.
First of all, no distinction whatever is made between "intelligent" and "sentient". I agree that mindless intelligence is problematic (and is prone to a lot of the concerns raised here).
But what about sentience? What about the moment when "the lights go on"? This is not even addressed as an issue (at least not in the Omohundro paper). And I think most people here agree that consciousness is not an epiphenomenon (see Eli's Zombie Series). So we need different analysis for non-sentient intelligent systems and sentient intelligent systems.
A related point: We humans have great difficulty rewiring our hardware (and we can't change the brain architecture at all); that is why we can't easily change our goals. But self-improving AI will be able to modify its goal functions: that plus self-consciousness sounds quite powerful, and is completely different than simple "intelligent agents" maximizing their utility functions. Also, the few instances where an AI would change their utility function mentioned in the paper are certainly not exhaustive; I found the selection quite arbitrary.
The second flaw in the little abstract above was the positing of "drives": Omohundro argues that these drives don't have to be programmed into the AI but are intrinsic to goal-driven systems.
But he neglects another premise of his: that we are talking about AIs who can change their goal functions (see above)! All bets are off now!
Additionally, he bases his derivations on microeconomic theory which is also full of assumptions which maybe won't apply to sentient agents (they certainly don't apply to humans, as Omohundro recognizes).
Drives the paper mentions are: wanting to self-improve, being rational, protecting self, preserving utility function, resource acquisition etc. These drives sound indeed very plausible, and they are in essence human drives. So this leads me to suspect that anthropomorphism is creeping in again through the backdoor, in a very subtle way (for instance through assumptions of microeconomic theory).
I see nothing of the vastness of mindspace in this paper.
comment by Tim_Tyler · 2008-08-22T21:04:19.000Z · LW(p) · GW(p)
Re: no distinction whatever is made between "intelligent" and "sentient".
It seems like an irrelevance in this context. The paper is about self-improving systems. Normally these would be fairly advanced - and so would be intelligent and sentient.
Re: the few instances where an AI would change their utility function mentioned in the paper are certainly not exhaustive, I found the selection quite arbitrary.
How do you think these cases should be classified?
Re: The second flaw in the little abstract above was the positing of "drives".
That's the point of the paper. That a chess program, a paper clip maximiser, and a share-price maximiser will share some fundamental and important traits and behaviours.
Re: microeconomics applying to humans.
Humans aren't perfect rational economic agents - but they are approximations. Of course microeconomics applies to humans.
Re: I see nothing of the vastness of mindspace in this paper.
The framework allows for arbitrary utility functions. What more do you want?
comment by Caledonian2 · 2008-08-22T22:11:31.000Z · LW(p) · GW(p)
Also, spend some time with the dictionary, you meant 'prescriptive' not 'proscriptive.'
No, I did not. On what are you basing your claim to know what I meant to say?
This is not the first time Eliezer has addressed the totality of all possible minds and how they would not all agree; the fact that not all possible minds would agree on a goal structure is utterly irrelevant. Whether Roko also makes the same mistake is irrelevant to critiquing Eliezer's arguments, although it is no credit to him if he did.
comment by Günther_Greindl · 2008-08-22T22:55:16.000Z · LW(p) · GW(p)
Tim,
thanks for your answers and questions. As to the distinction between intelligence and sentience: my point was exactly that it cannot be waved away that easily; you have failed to give reasons why it can be. And I don't think that intelligence and sentience must go hand in hand (read Peter Watts's "Blindsight" for some thoughts in this direction, for instance). I think the distinction is quite essential.
As to the goal-function modification: what if a super-intelligent agent suddenly incorporates goals such as modesty, respect for other beings, maybe even makes them its central goals? -> then many of those drives Omohundro speaks of are automatically curbed. The reasoning of Omohundro seems to presuppose that goals always have to be reached at some cost to others. But maybe the AI will not choose these kinds of goals. There are wonderful goals which one can pursue which need not entail any of the drives O. mentions. The paper just begs the question.
chess program, a paper clip maximiser, and a share-price maximiser
Exactly, and that is why I introduced the concept of sentience (which implies real understanding) - the AI can immediately delete those purely economic goals (which would lead to the "drives", I agree) and maybe concentrate on other things, like communication with other sentients. Again, the paper fails by not taking into account the distinction sentience/non-sentience and what this would entail for goal-function modification.
Of course microeconomics applies to humans.
Well, but humans don't behave like "homo oeconomicus" and who says sentient AIs will? That was actually my point. The error of economics is repeated again, that's all.
arbitrary utility functions. What more do you want?
I contend that not all utility functions will lead to the "drives" described by Omohundro. Only those who seek to maximize some economic resource (and that is where the concept originated, after all) will. An AI need not restrain itself to this limited subset of goals.
And, additionally, it would not have evolved (unless you develop it by evolving it, which may not be a good idea): we should never forget that our reasoning evolved via Darwinian selection. Our ancestors (down to the first protozoa) had to struggle for life, eating and being eaten. This did something to us. Even today, you have to destroy (at least plant-) life to continue to live. Actually, this is a cosmic scandal.
I think that an AI attaining sentience will be much more benign than most humans would hold possible to believe, not having this evolutionary heritage we carry around with us.
comment by retired_urologist · 2008-08-22T22:58:01.000Z · LW(p) · GW(p)
Ever notice how heated and personal the discussion gets when one person tries to explain to a third person what the second person said, especially with such complicated topics? Perhaps this should be a green button that the AI never pushes.
comment by roko3 · 2008-08-22T23:13:27.000Z · LW(p) · GW(p)
@ marcello, quasi-anonymous, manuel:
I should probably add that I am not in favor of using any brand new philosophical ideas - like the ones that I like to think about - to write the goal system of a seed AI. That would be far too dangerous. For this purpose, I think we should simply concentrate on encoding the values that we already have into an AI - for example using the CEV concept.
I am interested in UIVs because I'm interested in formalizing the philosophy of transhumanism. This may become important because we may enter a slow takeoff, non-AI singularity.
comment by Jordan_Fisher · 2008-08-22T23:44:21.000Z · LW(p) · GW(p)
Wait.. if you base morality off of what other agents judge to be moral, and some of those agents are likewise judging their morality off of what other agents judge to be moral..... aren't you kind of SOL? Seems a little akin to Eliezer's calculator that calculates what it calculates.
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-08-23T00:20:35.000Z · LW(p) · GW(p)
Marcello, Manuel, and Quasi, potential FAI researchers are not my only audience. It would be nice to reach as many people as possible, and also I would like to write down these arguments and be done with them.
comment by Quasi-anonymous · 2008-08-23T01:26:20.000Z · LW(p) · GW(p)
Caledonian,
" Also, spend some time with the dictionary, you meant 'prescriptive' not 'proscriptive.'
No, I did not. On what are you basing your claim to know what I meant to say?"
'Prescriptive' is a synonym of normative, and the distinction between descriptive and prescriptive/normative analysis is a standard one. Your use of 'descriptive' and 'proscriptive' strongly suggests a mistake (and not a dexterity-related typo on a QWERTY keyboard, incidentally).
'Proscriptive' derives from 'proscription,' i.e. prohibition. 'Prescriptive' can be taken to refer specifically to positive injunctions, but in its general form, the form used in descriptive versus prescriptive discussions, encompasses both. Are you going to claim that the correct reading of your earlier comment was that morality is not descriptive, but prohibitory, with no positive prescriptions? That interpretation is so strained that, combined with your past history of pontificating on topics where your actual knowledge is profoundly lacking and general posting behavior, I would attach more credence to a different hypothesis: you made yet another mistake in yet another of your trollish posts, and are now denying it.
http://en.wiktionary.org/wiki/proscriptive http://en.wiktionary.org/wiki/prescriptive http://en.wiktionary.org/wiki/descriptive
comment by retired_urologist · 2008-08-23T02:18:47.000Z · LW(p) · GW(p)
@quasi-anonymous; This is exactly the kind of BS conflict that Eliezer is searching for in this blog, in order to help with his catalogue of human characteristics. Congratulations. Unfortunately, you won't get any extra pay when the FAI emerges.
comment by Tim_Tyler · 2008-08-23T05:53:46.000Z · LW(p) · GW(p)
Re: I contend that not all utility functions will lead to the "drives" described by Omohundro.
Well, of course they won't. The idea is that "the drives" are what you get unless you code things into the utility function to prevent them.
For example, you could make an AI that turns itself off in the year 2100 - and therefore fails to expand and grow indefinitely - by incorporating a "2100" clause into its utility function.
However, some of the "drives" are not so easy to circumvent. Try thinking of a utility function that allows humans to turn off an AI, which doesn't lead to it wanting to turn itself off - for example.
Re: the AI can immediately delete those purely economic goals
One of the ideas is that AI's defend and protect their utility functions. They can't just change or delete them - or rather they could, but they won't want to.
Re: Humans don't behave like "homo economicus" and who says sentient AIs will.
AIs will better approximate rational economic agents - else they will have the "vulnerabilities" Omohundro mentions - they will burn up their resources without attaining their goals. Humans have a laundry list of such vulnerabilities - and we are generally worse off for them.
Re: The paper just begs the question.
Well, the paper doesn't address the question of what utility functions will be chosen - that's beyond its scope.
comment by Günther_Greindl · 2008-08-23T14:12:56.000Z · LW(p) · GW(p)
Tim,
we agree now nearly in all points *grin*, except for that part of the AIs not "wanting" to change their goals, simply because through meditation (in the Buddhist tradition for instance) I know that you can "see through" goals and not be enslaved to them anymore (and that is accessible to humans, so why shouldn't it be accessible to introspecting AIs?).
That line of thought is also strongly related to the concept of avidya, which ascribes "desires" and "wanting" to not having completely grasped certain truths about reality. I think these truths would also be accessible to sentient AIs (we live in the same universe after all), and thus they would also be able to come to certain insights annulling "programmed" drives. (As indeed human sages do.)
But I think what you said about "the scope of the paper" is relevant here. When I was pointed to the paper my expectations were raised that it would solve some of the fundamental problems of "wanting" and "desire" (in a psychological sense), but that is clearly not the focus of the paper, so maybe I was simply disappointed because I expected something else.
But, of course, it is always important when drawing conclusions that one remembers one's premises. Often, when conclusions seem exciting or "important", one forgets the limits of one's premises and applies the reasoning to contexts outside the scope of the original limitations.
I accept Omohundro's conclusions for certain kinds of non-sentient intelligent systems working with utility functions seeking to maximize some kind of economic (resource-constrained) goal. But I think that the results are not as general as a first reading might lead one to believe.
comment by Tim_Tyler · 2008-08-24T20:30:44.000Z · LW(p) · GW(p)
Re: AIs not "wanting" to change their goals
Humans can and do change their goals - e.g. religious conversions.
However, I expect to see less of that in more advanced agents.
If we build an AI to perform some task, we will want it to do what we tell it - not decide to go off and do something else.
An AI that forgets what it was built to do is normally broken. We could build such systems - but why would we want to?
As Omohundro says: expected utility maximisers can be expected to back-up and defend their goals. Changing your goals is normally a serious hit to future utility, from the current perspective. Something clearly to be avoided at all costs.
FWIW, Omohundro claims his results are pretty general - and I tend to agree with him. I don't see the use of an economic framework as a problem - microeconomics itself is pretty general and broadly applicable.
comment by Roko · 2008-08-26T11:58:55.000Z · LW(p) · GW(p)
Eliezer said: "But there is always a framework, every time you are moved to change your morals - the question is whether it will be invisible to you or not. That framework is always implemented in some particular brain, so that the same argument would fail to compel a differently constructed brain - though this does not imply that the framework makes any mention of brains at all."
And the above statement - Eliezer's meta-framework for ethical reasoning - guarantees that he will remain a relativist. The implicit assumption is that the acid test of a particular ethical theory is whether it will persuade all possible minds (presumably he is talking about Turing machines here). Since there exists no ethical argument which will persuade all possible minds there is no "objectively best" ethical theory.
In fact, if you boil down Eliezer’s argument against moral realism to its essence, you get (using standard definitions for words like “right”, “objective”) the following:
Defn: Theory X is objectively morally right if and only if for all Turing machines Z, Z(X) = “yes I agree”
Fact: There exists a Turing machine which implements the constant function “I disagree”
Therefore: No ethical theory is objectively morally right
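A minimal sketch of the definitional schema above (the two toy agents below are invented stand-ins for "all Turing machines"; nothing beyond the constant disagree-er is taken from the argument itself): a single constant naysayer is enough to make the universal-assent test fail for every theory.

```python
# Sketch of the definition being rejected: "objectively right" means assented
# to by every possible mind. One constant naysayer falsifies universality.

def thoughtful_agent(theory: str) -> str:
    # A hypothetical agent that happens to agree with one particular theory.
    return "yes I agree" if "minimize suffering" in theory else "I disagree"

def constant_naysayer(theory: str) -> str:
    # The Turing machine that implements the constant function "I disagree".
    return "I disagree"

all_minds = [thoughtful_agent, constant_naysayer]   # toy stand-in for mind design space

def objectively_right(theory: str) -> bool:
    return all(mind(theory) == "yes I agree" for mind in all_minds)

print(objectively_right("minimize suffering"))      # False -- no theory can pass
```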
Now I reject the above definition: I think that there are other useful criteria - rooted in reality itself - which pick out certain axiologies as being special. Perhaps I should be more careful about what I call such frameworks: from the previous comment threads on overcoming bias, I have discovered that it is very easy to start abusing the ethical vocabulary, so I should call objective axiologies (such as UIVs) "objectively canonical" rather than objectively right.
I should add that I don't regard the very limited amount of work I have done on UIVs and objective axiologies as a finished product, so I am somewhat surprised to find it being critiqued. All constructive criticism is appreciated, though.
comment by Roko · 2009-03-10T01:06:40.000Z · LW(p) · GW(p)
So, after thinking the matter over for a long time, I have concluded that the Criticism presented here is largely correct, at least in the following senses:
Finding the universality/canonicity/objectivity of a system of axiology compelling is itself an axiological preference; there is no escape from the fact that any preference for doing something rather than something else counts as a position or framework which we can, in principle, reject.
The concept of "objective morality" is nonsense. The concept of "objectively canonical axiology" is probably salvageable, but quite frankly, who cares? Richard Hollerith apparently still does...
However, I think that the philosophical investigation I have undertaken here [Note my comment above: "I am interested in UIVs because I'm interested in formalizing the philosophy of transhumanism"] is still a useful exercise, because it provides a concrete articulation of a common theme in transhumanist thought, which I might call techno-worship, and hence Eliezer's criticism of my ideas (given above) becomes a criticism of that theme.
comment by Roko · 2009-03-10T01:17:08.000Z · LW(p) · GW(p)
@ Marcello: "Do you predict that any of the people we will eventually hire will have clung to a mistake like this one despite reading through all of your previous series of posts on morality?"
- you interpret my taking a long time to come around to the correct view (for, now, I agree with yourself and EY on this pretty much 100%) as indicative of me being of low value as a scientist/philosopher, versus your(?) or others'(?) quick acceptance of this view as being indicative of being of high value as a scientist/philosopher. However, another view on this is that those who quickly agreed with EY are simply good at conforming to what the chief says. When Eliezer pre-2002 was making arguments for the supreme moral value of intelligence, would you also have quickly agreed with him?
It takes courage to stand up and say something in a forum like OB that is in disagreement with the majority view, especially when you know that you are likely wrong, and are likely to suffer social consequences, reputational slander, etc.
comment by TheOtherDave · 2010-11-10T17:09:07.882Z · LW(p) · GW(p)
This post seems to come out of nowhere... I haven't seen any comments by Roko while reading up to this point, the Google link you provide turns up nothing relevant, and the bloglink doesn't exist. (I gather from casual searching that there was some kind of political blowup and Roko deleted all his contributions.)
So I'm not sure what you're responding to, and maybe the context matters. But something bewilders me about this whole line of reasoning as applied to what seems to be SIAI's chosen strategy for avoiding non-Friendliness.
(This kind of picks up from my earlier comment (http://lesswrong.com/lw/t3/the_bedrock_of_morality_arbitrary/2xi8?c=1). If I'm confused, the confusion may start there.)
You argue that universality and objectivity and so forth are just goals, ones that we as humans happen to sort high. Sure, agreed.
You argue that it's wrong to decide what to do on the basis of those goals, because they are merely instrumental; you argue that other goals (perhaps "life, consciousness, and activity; health and strength..." etc.) are right, or at least more right. Agreed with reservations.
You argue that individual minds will disagree on all of those goals, including the right ones. That seems guaranteed in the space of all possible minds, likely in the space of all evolved minds, and plausible in the space of all human minds.
And, you conclude, just because some mind disagrees with a goal doesn't mean that goal isn't right. And if the goal is right, we should pursue it, even if some mind disagrees. Even if a majority of minds disagree. Even (you don't say this but it seems to follow) if it makes a majority of minds unhappy.
So... OK. Given that, I'm completely confused about why you support CEV.
Part of the point of CEV seems to be that if there is some goal that some subset of a maximally informed and socialized but not otherwise influenced human race would want to see not-achieved, then a process implementing CEV will make sure that the AGI it creates will not pursue that goal. So, no paperclippers. Which is great, and good, and wonderful.
(That said, I see no way to prove that something really is a CEV-implementing AI, even after you've turned it on, so I'm not really sure what this strategy buys us in practice. But perhaps you get to that later, and in any case it's beside my point here.)
And presumably the idea is that humanity's CEV is different from, say, the SIAI's CEV, or LW's CEV, or my personal CEV. Otherwise why complicate matters by involving an additional several billion minds?
But... well, consider the set G of goals in my CEV that aren't in humanity's CEV. It's clear that the goals in G aren't shared by all human minds... but why is that a good reason to prevent an AGI from implementing them? What if some subset of G is right?
I'm not trying to make any special claim about my own mind, here. The same argument goes for everyone. To state it more generally, consider this proposition (P): for every right goal some human has, that goal is shared by all humans.
If P is true, then there's no reason to calculate humanity's CEV... any human's CEV will do just as well. If P is false, then implementing humanity's CEV fails to do the right thing.
What am I missing here?
Replies from: jimrandomh, timtyler
↑ comment by jimrandomh · 2010-11-10T17:33:58.488Z · LW(p) · GW(p)
But... well, consider the set G of goals in my CEV that aren't in humanity's CEV. It's clear that the goals in G aren't shared by all human minds... but why is that a good reason to prevent an AGI from implementing them? What if some subset of G is right?
You need to distinguish between goals you have which the rest of humanity doesn't like, from goals you have which the rest of humanity doesn't care about. Since you are part of humanity, the only way that one of your goals could be excluded from the CEV is if someone else (or humanity in general) has a goal that's incompatible and which is more highly weighted. If one of your goals is to have a candy bar, no one else really cares whether you have one or not, so the CEV will bring you one; but if one of your goals is to kill someone, then that goal would be excluded because it's incompatible with other peoples' goal of not dying.
The most common way for goals to be incompatible is to require the same resources. In that case, the CEV would do some balancing - if a human has the goal "maximize paperclips", the CEV will allocate a limited amount of resources to making paperclips, but not so many that it can't also make nice houses for all the humans who want them and fulfill various other goals.
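A rough sketch of the filtering-and-balancing rule described above, with invented goal names, weights, and a naive proportional allocation that are assumptions for illustration only, not anything specified by CEV:

```python
# Hypothetical goals: (name, weight, set of goals it conflicts with).
goals = [
    ("have_a_candy_bar",     1.0,  set()),
    ("kill_person_B",        1.0,  {"B_keeps_living"}),
    ("B_keeps_living",       50.0, {"kill_person_B"}),
    ("maximize_paperclips",  2.0,  set()),
    ("nice_houses_for_all",  40.0, set()),
]

# Exclude any goal that conflicts with a more highly weighted goal.
kept = [
    (name, w) for name, w, conflicts in goals
    if not any(other in conflicts and ow > w for other, ow, _ in goals)
]

# Split a finite resource budget among the remaining goals in proportion to weight.
budget = 100.0
total_weight = sum(w for _, w in kept)
allocation = {name: budget * w / total_weight for name, w in kept}

print(allocation)   # paperclips get a little, houses get a lot, murder gets nothing
```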
Replies from: TheOtherDave
↑ comment by TheOtherDave · 2010-11-10T19:40:31.591Z · LW(p) · GW(p)
Balancing resources among otherwise-compatible goals makes sense, sort of. It becomes tricky if resources are relevantly finite, but I can see where this would work.
Balancing resources among incompatible goals (e.g., A wants to kill B, B wants to live forever) is, of course, a bigger problem. Excluding incompatible goals seems a fine response. (Especially if we're talking about actual volition.)
I had not yet come across the weighting aspect of CEV; I'd thought the idea was that the CEV-implementing algorithm eliminates all goals that are incompatible with one another, not that it chooses one of them based on goal-weights and eliminates the others.
I haven't a clue how that weighting happens. A naive answer is some function of the number of people whose CEV includes that goal... that is, some form of majority rule. Presumably there are better answers out there. Anyway, yes, I can see how that could work, sort of.
All of which is cool, and thank you, but it leaves me with the same question, relative to Eliezer's post, that I had in the first place. Restated: if a goal G1 is right (1) but is incompatible with a higher-weighted goal that isn't right, do we want to eliminate G1? Or does the weighting algorithm somehow prevent this?
==
(1) I'm using "right" here the same way Eliezer does, even though I think it's a problematic usage, because the concept seems really important to this sequence... it comes up again and again. My own inclination is to throw the term away, personally.
Replies from: diegocaleiro
↑ comment by diegocaleiro · 2010-11-18T13:56:27.748Z · LW(p) · GW(p)
Maybe CEV is intended to get some Right stuff done.
It would be kind of impossible, given we are right due to a gift of nature, not due to a tendency of nature, to design an algorithm which would actually be able to sort all into the Right, Not right, and Borderline categories.
I suppose Eliezer is assuming that the moral gift we have will be a bigger part of CEV than it would be of some other division of current moralities.
Replies from: diegocaleiro
↑ comment by diegocaleiro · 2010-11-18T13:57:44.335Z · LW(p) · GW(p)
Thus rendering CEV a local optimum within a given set of gifted minds.
↑ comment by timtyler · 2011-04-02T09:21:01.263Z · LW(p) · GW(p)
This post seems to come out of nowhere... I haven't seen any comments by Roko while reading up to this point, the Google link you provide turns up nothing relevant, and the bloglink doesn't exist. (I gather from casual searching that there was some kind of political blowup and Roko deleted all his contributions.)
Yup - but archive.org still has it.
comment by ec429 · 2011-09-23T05:29:15.997Z · LW(p) · GW(p)
Roko is adopting a special and unusual metamoral framework in regarding "Most agents do X!" as a compelling reason to change one's utility function. Why might Roko find this appealing? Humans, for very understandable reasons of evolutionary psychology, have a universalizing instinct; we think that a valid argument should persuade anyone.
Perhaps this can be fixed; maybe if we say Q:="moral(X):="A supermajority of agents which accept Q consider X moral"". Then agents accepting Q cannot agree to disagree, and Q-based arguments are capable of convincing any Q-implementing agent.
On the other hand, the universe could stably be in a state in which agents which accept Q mostly believe moral(torture), in which case they all continue to do so. However, this is unsurprising; there is no way to force everyone to agree on what is "moral" (no universally compelling arguments), so why should Q-agents necessarily agree with us?
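A small sketch of the dynamics this implies (the supermajority-update rule and the 75% threshold below are assumptions made for illustration; Q as stated does not fix them): once a supermajority of Q-agents holds moral(X), the loop reinforces itself, and the opposite consensus is exactly as stable.

```python
# Each Q-agent adopts whatever a supermajority of Q-agents currently holds
# about moral(X). Multiple consensus states are stable fixed points.

def step(beliefs, threshold=0.75):
    """One round of Q-updating: follow the supermajority, if there is one."""
    share = sum(beliefs) / len(beliefs)
    if share >= threshold:
        return [True] * len(beliefs)
    if share <= 1 - threshold:
        return [False] * len(beliefs)
    return beliefs   # no supermajority either way: beliefs stay as they are

def run_to_fixed_point(beliefs):
    while True:
        updated = step(beliefs)
        if updated == beliefs:
            return beliefs
        beliefs = updated

print(run_to_fixed_point([True] * 9 + [False]))   # settles at moral(X) = True
print(run_to_fixed_point([False] * 9 + [True]))   # equally stable at False
```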
But what we are left with seems to be a strange loop through the meta-level, with the distinction that it loops through not only the agent's own meta-level but also the agent's beliefs about other Q-agents' beliefs.
However, I'm stripping out the bit about making instrumental values terminal, because I can't see the point of it (and of course it leads to the "drive a car!" problem). Instead we take Q as our only terminal value; the shared pool of things-that-look-like-terminal-values {X : Q asserts moral(X)} is in fact our first layer of instrumental values.
Also, I'm not endorsing the above as a coherent or effective metaethics. I'm just wondering whether it's possible that it could be coherent or effective. In particular, is it PA+1 or Self-PA? Does it exhibit the failure mode of the Type 2 Calculator? After all, the system as a whole is defined as outputting what it outputs, but individual members are defined as outputting what everyone else outputs and therefore, um, my head hurts.
comment by [deleted] · 2012-12-13T03:09:22.787Z · LW(p) · GW(p)
I think Roko's statements had a logic similar to: http://lesswrong.com/lw/wz/living_by_your_own_strength/
Which could be described as (but not just as) "instrumental values somehow playing a key role in enjoyment of life"
I think he was trying to promote such an idea by trying to make the presence of such instrumental values necessary, i.e. translate them into terminal values - though that's just an assumption - in an attempt to preserve something about different ways of life.
I think that line of thought is very inaccurate because it fails to capture the essence of the problem, and instead creates descriptions like the ones criticized in this post. I think it might be more sensible to say something like "I think if humans would realize all their goals instantaneously and effortlessly there would be something missing from the experience" - but then again, there's this thing called "The Fun Theory Sequence" on the topic. :)