Intuitive supergoal uncertainty

post by JustinShovelain · 2009-12-04T05:21:03.942Z · LW · GW · Legacy · 27 comments

There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition? For this topic I need a term that picks out one’s top-level goals (roughly, one’s context-insensitive utility function rather than some task-specific utility function) without implying that those top-level goals can be interpreted in the form of a utility function. Following Eliezer’s CFAI paper, I thus choose the word “supergoal” (sorry Eliezer, but I am fond of that old document and its tendency to coin new vocabulary). In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

To posit a model: goal uncertainty (with supergoal uncertainty as an instance) means that you have a weighted distribution over a set of possible goals and a mechanism by which that weight may be redistributed. If we take away the distribution of weights, how can we compare the goals and choose actions coherently? If we take away the weight redistribution mechanism, we end up with a single goal whose state utilities may be defined as the weighted sum of the constituent goals’ utilities; the weight redistribution mechanism is thus necessary for goal uncertainty to be a distinct concept.
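As a rough illustration of this model, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than part of the post: the names `GoalMixture`, `comfort` and `knowledge`, the two outcomes, the numbers, and in particular the choice of a multiplicative reweight-and-renormalize rule as the redistribution mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Utility = Callable[[str], float]


@dataclass
class GoalMixture:
    """A weighted distribution over candidate goals (hypothetical model)."""
    goals: Dict[str, Utility]
    weights: Dict[str, float]

    def utility(self, outcome: str) -> float:
        # With fixed weights this is just one goal: the weighted sum
        # of the constituent goals' utilities.
        return sum(self.weights[name] * u(outcome)
                   for name, u in self.goals.items())

    def best(self, outcomes: List[str]) -> str:
        # Pick the outcome with the highest weighted utility.
        return max(outcomes, key=self.utility)

    def reweight(self, factors: Dict[str, float]) -> None:
        # Assumed redistribution mechanism: scale each weight by a
        # factor (default 1.0), then renormalize so weights sum to 1.
        raw = {name: w * factors.get(name, 1.0)
               for name, w in self.weights.items()}
        total = sum(raw.values())
        self.weights = {name: w / total for name, w in raw.items()}


# Two made-up candidate goals scoring two made-up outcomes.
comfort = {"rest": 1.0, "study": 0.2}.__getitem__
knowledge = {"rest": 0.0, "study": 1.0}.__getitem__

mixture = GoalMixture(
    goals={"comfort": comfort, "knowledge": knowledge},
    weights={"comfort": 0.7, "knowledge": 0.3},
)
choice_before = mixture.best(["rest", "study"])

# Shift weight toward "knowledge" and choose again.
mixture.reweight({"knowledge": 4.0})
choice_after = mixture.best(["rest", "study"])
```

Without `reweight`, the object is indistinguishable from a single goal with the summed utility; only the possibility of the weights moving makes the uncertainty do any work, which is the point of the paragraph above.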

(ps I may soon post and explore the effects of supergoal uncertainty in its various reifications on making decisions. For instance, what implications, if any, does it have on bounded utility functions (and actions that depend on those bounds) and negative utilitarianism (or symmetrically positive utilitarianism)? Also, if anyone knows of related literature I would be happy to check it out.)

(pps Dang, the concept of supergoal uncertainty is surprisingly beautiful and fun to explore, and I now have a vague wisp of an idea of how to integrate a subset of these with TDT/UDT)

27 comments

Comments sorted by top scores.

comment by Mitchell_Porter · 2009-12-05T04:28:15.445Z · LW(p) · GW(p)

It's a total digression from this post, but: it occurs to me that someone ought to try to figure out what the "supergoal" or utility function of C. elegans is, or what the coherent extrapolated volition of the C. elegans species might be. That organism's nervous system has been mapped down to every last neuron (not so hard since there are only about 300 of them). If we can't make a C. elegans-Friendly AI given that information, we certainly can't do it for H. sapiens.

Replies from: Eliezer_Yudkowsky, wedrifid, timtyler
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-12-05T08:22:43.259Z · LW(p) · GW(p)

My understanding is that we have a connection map but have not successfully simulated the behavior.

comment by wedrifid · 2009-12-05T04:53:54.725Z · LW(p) · GW(p)

If we can't make a C. elegans-Friendly AI given that information, we certainly can't do it for H. sapiens.

I like the suggestion you make. (But) I would perhaps fall just short of certainty. It is not unreasonable to suppose that a supergoal or utility function is something that was evolved alongside higher level adaptations like, say, an executive function and goal directed behaviour. C. elegans just wouldn't get much benefit from having a supergoal encoded in its nervous system.

Looking at the difficulty of creating a C. elegans-FAI would highlight one of the difficulties with FAI in general. There is the inevitable and somewhat arbitrary decision on just how much weight we want to give to the implicit goals of humanity. The line between terminal and instrumental values is somewhat dependent on one's perspective.

comment by timtyler · 2009-12-09T15:59:01.121Z · LW(p) · GW(p)

In a nutshell, it's to make more copies of the C. elegans genome:

http://en.wikipedia.org/wiki/God's_utility_function

comment by Psychohistorian · 2009-12-04T17:15:27.979Z · LW(p) · GW(p)

There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense.

In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

These are entirely too representative of this post. I admit it's possible I lack adequate background, but this post seems incredibly dense and convoluted. I literally do not know what you're talking about, and I have enough external evidence of my reading comprehension to conclude that it's significantly the author's fault. The idea may be clear in your mind, but you need to spell it out in clear and simple terms if you want others to follow you. Defining "supergoal uncertainty" would be a necessary step, though it would still be well short of sufficient.

Replies from: Jack, JustinShovelain, rhollerith_dot_com
comment by Jack · 2009-12-04T22:55:06.504Z · LW(p) · GW(p)

There also appear to be outright misuses of vocabulary, unless there are technical meanings I am unaware of. I.e. "I may soon post and explore the effects of supergoal uncertainty in its various reifications on making decisions."

Not even the most obscure continental philosophy gets away with using 'reify' that way.

Still, it looks like there might be some interesting ideas somewhere in there.

Replies from: JustinShovelain
comment by JustinShovelain · 2009-12-05T02:04:18.007Z · LW(p) · GW(p)

Addressing your reification point:

"By means of reification something that was previously implicit, unexpressed and possibly unexpressible is explicitly formulated and made available to conceptual (logical or computational) manipulation." - Reification (computer science), from Wikipedia.

I don't think I did abuse vocabulary outside of possibly generalizing meanings in straightforward ways and taking words and meanings common in one topic and using them in a context where they are rather uncommon (e.g. computer science to philosophy). I rely on context to refine and imbue words with meaning instead of focusing on dictionary definitions (to me all sentences take the form of puzzles and words are the pieces; I've written more words in proofs than in all other contexts combined). I will try to pay more attention to context invariant meanings in the future. Thanks for the criticism.

comment by JustinShovelain · 2009-12-05T02:46:26.327Z · LW(p) · GW(p)

Hmm, darn. When I write I do have a tendency to see what ideas I meant to describe instead of seeing my actual exposition. I don't like grammar checking my writing until I've had some time to forget details, because I read right over my errors unless I pay special attention.

I did have three LWers look over the article before I sent it and got the general criticism that it was a bit obscure and dense but understandable and interesting. I was probably too ambitious in trying to include everything within one post, though (a length vs. clarity tradeoff).

To address your points:

Have you not felt or encountered people who have the opinion that our life goals may be uncertain, something to have opinions about, and are valid targets for argument? Also, is not uncertainty of our most fundamental goals something we must consider and evaluate (explicitly or implicitly) in order to verify that an artificial intelligence is provably Friendly?

Elaborating on the second statement: when I used "naturalistically" I wished to invoke the idea that the exploration I was doing was similar to classifying animals before we had taxonomies. We look around with our senses (or imagination and inference in this case), see what we observe, and lay no claim to systematic search or analysis. In this context I did a kind of imagination-limited shallow search without trying to systematically relate the concepts (there is a combinatorial explosion, and I'm not yet sure how to condense and analyze supergoal uncertainty).

As to the third point, what I did in this article is allocate a name "supergoal uncertainty", roughly described it in the first paragraph and hopefully brought up the intuition, and then subsequently considered various definitions of "supergoal uncertainty" following from this intuition.

In retrospect, I probably erred on the clarity versus writing time trade-off and was perhaps biased in trying to get this uncomfortable writing task (I'm not a natural writer) off my plate so I can do other things.

comment by RHollerith (rhollerith_dot_com) · 2009-12-04T18:05:51.660Z · LW(p) · GW(p)

this post seems incredibly dense and convoluted. I literally do not know what you're talking about

That was not my experience. I understood everything in the first five paragraphs without having to reflect or even read a second time except that I did have to reflect for a few minutes on the last sentence of paragraph four. Although I am still less confident that I know what Justin intended there than I am with the other sentences, I am 72% confident I know. I think he meant that even if we are not religious, society tends to pull us into moral realism even though of course moral realism is an illusion. (Time constraints prevent me from reading the rest now.)

Defining "supergoal uncertainty" would be a necessary step

Oh, he did that. And the definition was quite clear to me on first reading, but then I have done a lot of math, and a lot of math in which I attempt my own definitions.

Replies from: Cyan, JustinShovelain, bgrah449
comment by Cyan · 2009-12-04T18:19:37.535Z · LW(p) · GW(p)

72% confident

Two sig figs? Really?

Replies from: rhollerith_dot_com, wedrifid
comment by RHollerith (rhollerith_dot_com) · 2009-12-04T18:57:27.365Z · LW(p) · GW(p)

Well, I am relatively new at assigning my beliefs numerical probabilities, so if Eliezer or E.T. Jaynes says different, believe them, but here is my reply.

72% confident

Two sig figs? Really?

Note that if I had said .7 that does not mean that my probability will not go to .4 or .9 tomorrow. On the other hand, if I say the doo-dad is .7 meters long, I am implying that if I re-measure the doo-dad tomorrow, the result will be somewhere in the range .65 to .75 (or to .8). In summary, significant figures do not seem a worthwhile way to communicate how much evidence is required to move a probability by a certain amount. What I suggest people do instead is communicate somehow the nature of the evidence used to arrive at the number. In this case, I left implied that my evidence comes from squishy introspective considerations. Also, note that the fact that Justin will be checking frequently for comments (because it is his post) and Justin can very easily drive my probability to close to 1 or close to 0 with a reply that takes him only 10 seconds to make means that it does not serve the "vericidal" interests of the community for me to spend more than a few seconds in arriving at my numerical probability. I could have mentioned these considerations of the cost of updating my probability and the implications that cost structure has for how much effort I put into my number, but I considered them so obvious that the reader would take them into consideration without my having to say anything.

Look: there is a cost to the experimentalist's tradition by which .7 means that tomorrow the number will not change to anything lower than .65 or higher than .75 (or .7999), and that cost is that the only numbers available to the writer are .1, .2, .3, .4, .5, .6, .7, .8 and .9. The previous paragraph explains why I consider that cost not worth paying for subjective probabilities.

Replies from: Cyan
comment by Cyan · 2009-12-04T19:30:30.963Z · LW(p) · GW(p)

Jaynes does have something to say on this, which I will summarize thus: you get to (ought to, even) put credible-interval-type bounds on a stated probability (that is, you could have said, e.g., "between 50% and 90%"). The central location of the interval tells us what you now think of your probability (~70%), and the width of the interval tells us how apt your estimate is to move in the face of new evidence.

The above is an approximation; there are lots of refinements. One I will mention right off is that the scheme will break down for probabilities near 0 or 1, because the implied distribution is no longer symmetric around the center of the interval.
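One way to make the "width tells us how apt the estimate is to move" idea concrete is the following minimal Python sketch. It assumes a Beta distribution over the probability and simple success/failure evidence; that modeling choice, the function names, and the numbers are assumptions made for illustration and are not claimed to be the construction in PT:LOS.

```python
import math


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution: the stated probability."""
    return a / (a + b)


def beta_sd(a: float, b: float) -> float:
    """Standard deviation of Beta(a, b): a proxy for interval width."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def update(a: float, b: float, successes: int, failures: int):
    """Conjugate update of a Beta(a, b) after binary observations."""
    return a + successes, b + failures


# Two states of knowledge with the same central estimate (0.7)
# but different widths.
loose = (7.0, 3.0)
firm = (70.0, 30.0)

# The same hypothetical evidence for both: five failures in a row.
loose_after = update(*loose, successes=0, failures=5)
firm_after = update(*firm, successes=0, failures=5)

loose_shift = beta_mean(*loose) - beta_mean(*loose_after)
firm_shift = beta_mean(*firm) - beta_mean(*firm_after)
```

Under these assumptions both estimates start at 0.7, but the wider (`loose`) one moves much further on the same evidence than the narrower (`firm`) one, which is the information a bare point estimate leaves out.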

Replies from: Peter_de_Blanc
comment by Peter_de_Blanc · 2009-12-04T19:55:30.464Z · LW(p) · GW(p)

Can you give a reference? Because that strikes me as rather un-Jaynesian.

You say that the interval tells us something about how apt the estimate is to move in the face of new evidence. What does it tell us about that? Doesn't it depend on which piece of evidence we're talking about? Do you have to specify a prior over which variables you are likely to observe next?

Replies from: Cyan, Eliezer_Yudkowsky, timtyler
comment by Cyan · 2009-12-04T21:00:34.688Z · LW(p) · GW(p)

The material I have in mind is Chapter 18 of PT:LOS. You can see the section headings on page 8 (numbered vii because the title page is unnumbered) here. One of the section titles is "Outer and Inner Robots"; when rhollerith says 72%, he's giving the outer robot answer. To give an account of how unstable your probability estimates are, you need to give the inner robot answer.

What does it tell us about that? Doesn't it depend on which piece of evidence we're talking about?

When we receive new evidence, we assign a likelihood function for the probability. (We take the perspective of the inner robot reasoning about what the outer robot will say.) The width of the interval for the probability tells us how narrow the likelihood function has to be to shift the center of that interval by a non-negligible amount.

Do you have to specify a prior over which variables you are likely to observe next?

No.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-12-05T08:42:18.438Z · LW(p) · GW(p)

That is a strange little chapter, but I should note that if you talk about the probability that you will make some future probability estimate, then the distribution of a future probability estimate does make a good way of talking about the instability of a state of knowledge. As opposed to the notion of talking about the probability of a current probability estimate, which sounds much more like you're doing something wrong.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-12-04T20:50:16.445Z · LW(p) · GW(p)

Second the question, it doesn't sound Jaynesian to me either.

Replies from: wedrifid
comment by wedrifid · 2009-12-05T03:14:06.069Z · LW(p) · GW(p)

Second the question, it doesn't sound Jaynesian to me either.

I'm relieved that I'm not the only one who thought that. I was somewhat aghast to hear Jaynes recommend something that is so, well, obviously a bull@# hack.

Replies from: Cyan
comment by Cyan · 2009-12-05T03:48:22.486Z · LW(p) · GW(p)

It's curious to me that you'd write this even after I cited chapter and verse. Do you have a copy of PT:LOS?

Replies from: wedrifid
comment by wedrifid · 2009-12-05T04:45:40.597Z · LW(p) · GW(p)

It's curious to me that you'd write this even after I cited chapter and verse. Do you have a copy of PT:LOS?

I do have a copy but I will take your word for it. I am shocked and amazed that Jaynes would give such a poor recommendation. It doesn't sound Jaynesian to me either and I rather hope he presents a variant that is sufficiently altered as to not be this suggestion at all. You yourself gave the reason why it doesn't work and I am sure there is a better approach than just hacking the scale when it is near 1 or 0. (I am hoping your paraphrase sounds worse than the original.)

Replies from: timtyler
comment by timtyler · 2009-12-09T16:04:04.730Z · LW(p) · GW(p)

Best to give a probability density function - but two 2-S-F probabilities typically give more information than one.

comment by timtyler · 2009-12-09T16:12:23.731Z · LW(p) · GW(p)

It is good to indicate the strength of your priors. Perhaps one could indicate how much you think your opinion is likely to change over some specified timescale - or in response to the next set of pertinent data points.

comment by wedrifid · 2009-12-04T19:14:57.405Z · LW(p) · GW(p)

Two sig figs? Really?

For significant figures to be at all applicable you would need to express confidence with a completely different kind of scale. I am not going to round off 96% to "not even a probability".

Replies from: Cyan
comment by Cyan · 2009-12-04T19:22:13.718Z · LW(p) · GW(p)

express confidence with a completely different kind of scale

I like the odds scale, myself.

Replies from: wedrifid
comment by wedrifid · 2009-12-04T19:31:24.090Z · LW(p) · GW(p)

I like the odds scale, myself.

For my part I find it irritating. But it would certainly work better for 1 significant figure expressions. Although I suppose you could say it kind of relies on two significant figures (one on either side) to work at all.
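For readers unfamiliar with the odds scale under discussion, here is a small Python sketch of the conversion. The helper names `probability_to_odds` and `format_odds` are made up for this example, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction


def probability_to_odds(p: str) -> Fraction:
    """Convert a probability written as a decimal string to odds in favour."""
    prob = Fraction(p)
    if not 0 < prob < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    # Odds in favour are p : (1 - p), reduced to lowest terms.
    return prob / (1 - prob)


def format_odds(odds: Fraction) -> str:
    """Render odds as 'for:against'."""
    return f"{odds.numerator}:{odds.denominator}"


odds_96 = format_odds(probability_to_odds("0.96"))  # "24:1"
odds_70 = format_odds(probability_to_odds("0.7"))   # "7:3"
odds_72 = format_odds(probability_to_odds("0.72"))  # "18:7"
```

The output has one number on each side of the colon, which is the sense in which the odds scale could be said to carry a figure "on either side".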

comment by JustinShovelain · 2009-12-05T02:09:29.165Z · LW(p) · GW(p)

I think he meant that even if we are not religious, society tends to pull us into moral realism even though of course moral realism is an illusion.

You are correct, though I don't go as far as calling moral realism an illusion because of unknown unknowns (though I would be very surprised to find it isn't illusionary).

comment by timtyler · 2009-12-09T15:58:05.041Z · LW(p) · GW(p)

Re: "There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition?"

  • Rapidly-changing memetic infections;

  • Pleiotropic side effects of a flexible brain;

  • Other malfunctions.