Fake Fake Utility Functions
post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2007-12-06T06:30:26.000Z
Followup to: Most of my posts over the last month...
Every now and then, you run across someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.
I run across more of these people than you do. Only in my case, it's people who know the amazingly simple utility function that is all you need to program into an artificial superintelligence and then everything will turn out fine...
It's incredible how one little issue can require so much prerequisite material. My original schedule called for "Fake Utility Functions" to follow "Fake Justification" on Oct 31.
You see, before I wrote this post, it occurred to me that if I wanted to properly explain the problem of fake utility functions, it would be helpful to illustrate a mistake about what a simple optimization criterion implied. The strongest real-world example I knew was the Tragedy of Group Selectionism. At first I thought I'd mention it in passing, within "Fake Utility Functions", but I decided the Tragedy of Group Selectionism was a long enough story that it needed its own blog post...
So I started to write "The Tragedy of Group Selectionism". A few hours later, I noticed that I hadn't said anything about group selectionism yet. I'd been too busy introducing basic evolutionary concepts. Select all the introductory stuff, cut, Compose New Post, paste, title... "An Alien God". Then keep writing until the "Alien God" post gets too long, and start taking separate subjects out into their own posts: "The Wonder of Evolution", "Evolutions Are Stupid", and at this point it became clear that, since I was planning to say a few words on evolution anyway, that was the time. Besides, a basic familiarity with evolution would help to shake people loose of their human assumptions when it came to visualizing nonhuman optimization processes.
So, finally I posted "The Tragedy of Group Selectionism". Now I was ready to write "Fake Utility Functions", right? The post that was supposed to come immediately afterward? So I thought, but each time I tried to write the post, I ended up recursing on a prerequisite post instead. Such as "Fake Selfishness", "Fake Morality", and "Fake Optimization Criteria".
When I got to "Fake Optimization Criteria", I really thought I could do "Fake Utility Functions" the next day. But then it occurred to me that I'd never explained why a simple utility function wouldn't be enough. We are a thousand shards of desire, as I said in "Thou Art Godshatter". Only that first required discussing "Evolutionary Psychology", which required explaining that human minds are "Adaptation-Executers, not Fitness-Maximizers", plus the difference between "Protein Reinforcement and DNA Consequentialism".
Furthermore, I'd never really explained the difference between "Terminal Values and Instrumental Values", without which I could hardly talk about utility functions.
Surely now I was ready? Yet I thought about conversations I'd had over the years, and how people seem to think a simple instruction like "Get my mother out of that burning building!" contains all the motivations that shape a human plan to rescue her, so I thought that first I'd do "The Hidden Complexity of Wishes". But, really, the hidden complexity of planning, and all the special cases needed to patch the genie's wish, was part of the general problem of recording outputs without absorbing the process that generates the outputs - as I explained in "Artificial Addition" and "Truly Part Of You". You don't want to keep the local goal description and discard the nonlocal utility function: "Leaky Generalizations" and "Lost Purposes".
Plus it occurred to me that evolution itself made an interesting genie, so before all that, came "Conjuring An Evolution To Serve You".
One kind of lost purpose is artificial pleasure, and "happiness" is one of the Fake Utility Functions I run into more often: "Not for the Sake of Happiness (Alone)". Similarly, it was worth taking the time to establish that fitness is not always your friend ("Evolving to Extinction") and that not everything in the universe is subject to significant selection pressures ("No Evolutions for Corporations or Nanodevices"), to avoid the Fake Utility Function of "genetic fitness".
Into the home stretch! No, wait, this would be a good time to discuss "Affective Death Spirals", since that's one of the main things that goes wrong when someone discovers The One True Valuable Thingy - they keep finding nicer and nicer things to say about it. Well, you can't discuss affective death spirals unless you first discuss "The Affect Heuristic", but I'd been meaning to do that for a while anyway. "Evaluability" illustrates the affect heuristic and leads to an important point about "Unbounded Scales and Futurism". The second key to affective death spirals is "The Halo Effect", which we can see illustrated in "Superhero Bias" and "Mere Messiahs". Then it's on to affective death spirals and how to "Resist the Happy Death Spiral" and "Uncritical Supercriticality".
A bonus irony is that "Fake Utility Functions" isn't a grand climax. It's just one of many Overcoming Bias posts relevant to my AI work, with plenty more scheduled. This particular post simply turned out to require a little more prerequisite material which - I thought on each occasion - I would have to write anyway, sooner or later.
And that's why blogging is difficult, and why it is necessary, at least for me. I would have been doomed, yea, utterly doomed, if I'd tried to write all this as one publication rather than as a series of blog posts. One month is nothing for this much material.
But now, it's done! Now, after only slightly more than an extra month of prerequisite material, I can do the blog post originally scheduled for November 1st!
Now that I think about it...
This post is pretty long already, right?
So I'll do the real "Fake Utility Functions" tomorrow.