Why Politics are Important to Less Wrong...

post by OrphanWilde · 2013-02-21T16:24:40.172Z · LW · GW · Legacy · 97 comments

...and no, it's not because of potential political impact on its goals.  Although that's also a thing.

The Politics problem is, at its root, about forming a workable set of rules by which society can operate, which society can agree with.

The Friendliness Problem is, at its root, about forming a workable set of values which are acceptable to society.

Politics as a process (I will use "politics" to refer to the process of politics henceforth) doesn't generate values; they're strictly an input, which the process converts into rules intended to maximize them.  But while values are its input, politics itself is value-agnostic; it doesn't care what the values are, or where they come from.  Which is to say, provided you solve the Friendliness Problem, its solution provides a valuable input into politics.

Politics is also an intelligence.  Not in the "self-aware" sense, or even in the "capable of making good judgments" sense, but in the sense of an optimization process.  We're each nodes in this alien intelligence, and we form what looks, to me, suspiciously like a neural network.

The Friendliness Problem is as applicable to politics as it is to any other intelligence.  Indeed, provided we can provably solve the Friendliness Problem, we should be capable of creating Friendly Politics; Friendliness should, in principle, apply equally to both.  Now, there are some issues with this - politics is composed of unpredictable hardware, namely, people.  And it may be that the neural architecture is fundamentally incompatible with Friendliness.  But that concerns the -output- of the process.  Friendliness is first an input, before it can be an output.

Moreover, we already have various political formations, and can assess their Friendliness levels merely in terms of the values that went -into- them.

Which is where I think politics offers a pretty strong hint at the possibility that the Friendliness Problem has no resolution:

We can't agree on which political formations are more Friendly.  That's what "Politics is the Mindkiller" is all about: our inability to come to an agreement on political matters.  It's not merely a matter of the rules - which is to say, it's not a matter of the output: We can't even come to an agreement about which values should be used to form the rules.

This is why I think political discussion is valuable here, incidentally.  Less Wrong, by and large, has been avoiding the hard problem of Friendliness, by labeling its primary functional outlet in reality as a mindkiller, not to be discussed.

Either we can agree on what constitutes Friendly Politics, or not.  If we can't, I don't see much hope of arriving at a Friendliness solution more broadly.  Friendly to -whom- becomes the question, if it was ever anything else.  Which suggests a division into two types of Friendliness: Strong Friendliness, a fully generalized set of human values acceptable to just about everyone; and Weak Friendliness, which isn't fully generalized, and is perhaps acceptable merely to a plurality.  Weak Friendliness survives the political question.  I do not see that Strong Friendliness can.

(Exemplified: When I imagine a Friendly AI, I imagine a hands-off benefactor who permits people to do anything they wish to which won't result in harm to others.  Why, look, a libertarian/libertine dictator.  Does anybody envisage a Friendly AI which doesn't correspond more or less directly with their own political beliefs?)


comment by Scott Alexander (Yvain) · 2013-02-22T00:46:20.260Z · LW(p) · GW(p)

The Friendliness Problem is, at its root, about forming a workable set of values which are acceptable to society.

No, that's the special bonus round after you solve the real friendliness problem. If that were the real deal, we could just tell an AI to enforce Biblical values or the values of Queen Elizabeth II or the US Constitution or something, and although the results would probably be unpleasant they would be no worse than the many unpleasant states that have existed throughout history.

As opposed to the current problem of having a very high likelihood that the AI will kill everyone in the world.

The Friendliness problem is, at its root, about communicating values to an AI and keeping those values stable. If we tell the AI "do whatever Queen Elizabeth II wants" - which I expect would be a perfectly acceptable society to live in - the Friendliness problem is how to get the AI to properly translate that into statements like "Queen Elizabeth wants a more peaceful world" and not things more like "INCREASE LEVEL OF DOPAMINE IN QUEEN ELIZABETH'S REWARD CENTER TO 3^^^3 MOLES" or "ERROR: QUEEN ELIZABETH NOT AN OBVIOUSLY CLOSED SYSTEM, CONVERT EVERYTHING TO COMPUTRONIUM TO DEVELOP AIRTIGHT THEORY OF PERSONAL IDENTITY" or "ERROR: FUNCTION SWITCH_TASKS NOT FOUND; TILE ENTIRE UNIVERSE WITH CORGIS".

This is hard to explain in a way that doesn't sound silly at first, but Creating Friendly AI does a good job of it.

If we can get all of that right, we could start coding in a complete theory of politics. Or we could just say "AI, please develop a complete theory of politics that satisfies the criteria OrphanWilde has in his head right now" and it would do it for us, because we've solved the hard problem of cashing out human desires. The second way sounds easier.

Replies from: Eugine_Nier, OrphanWilde
comment by Eugine_Nier · 2013-02-22T03:20:00.842Z · LW(p) · GW(p)

The Friendliness problem is, at its root, about communicating values to an AI and keeping those values stable. If we tell the AI "do whatever Queen Elizabeth II wants" - which I expect would be a perfectly acceptable society to live in

That depends on whether we mean 2013!Queen Elizabeth II or Queen Elizabeth after the resulting power goes to her head.

comment by OrphanWilde · 2013-02-22T14:25:07.900Z · LW(p) · GW(p)

I don't think you get the same thing from that document that I do. (Incidentally, I disagree with a lot of the design decisions inherent in that document, such as self-modifying AI, which I regard as inherently and uncorrectably dangerous. When you stop expecting the AI to make itself better, the "Keep your ethics stable across iterations" part of the problem goes away.)

Either that or I'm misunderstanding you. Because my current understanding of your view of the Friendliness problem has less to do with codifying and programming ethics and more to do with teaching the AI to know exactly what we mean and not to misinterpret what we ask for. (Which I hope you'll forgive me if I call "Magical thinking." That's not necessarily a disparagement; sufficiently advanced technology and all that. I just think it's not feasible in the foreseeable future, and such an AI makes a poor target for us as we exist today.)

comment by Zack_M_Davis · 2013-02-21T19:31:15.972Z · LW(p) · GW(p)

When I imagine a Friendly AI, I imagine a hands-off benefactor who permits people to do anything they wish to which won't result in harm to others.

Yeah, I like personal freedom, too, but you have to realize that this is massively, massively underspecified. What exactly constitutes "harm", and what specific mechanisms are in place to prevent it? Presumably a punch in the face is "harm"; what about an unexpected pat on the back? What about all other possible forms of physical contact that you don't know how to consider in advance? If loud verbal abuse is harm, what about polite criticism? What about all other possible ways of affecting someone via sound waves that you don't know how to consider in advance? &c., ad infinitum.

Does anybody envisage a Friendly AI which doesn't correspond more or less directly with their own political beliefs?

I'm starting to think this entire idea of "having political beliefs" is crazy. There are all sorts of possible forms of human social organization, which result in various outcomes for the humans involved; how am I supposed to know which one is best for people? From what I know about economics, I can point out some reasons to believe that market-like systems have some useful properties, but that doesn't mean I should run around shouting "Yay Libertarianism Forever!" because then what happens when someone implements some form of libertarianism, and it turns out to be terrible?

Replies from: Viliam_Bur, Vladimir_Nesov, RomeoStevens, whowhowho
comment by Viliam_Bur · 2013-02-21T21:56:22.009Z · LW(p) · GW(p)

I'm starting to think this entire idea of "having political beliefs" is crazy.

Most of my "political beliefs" consist of awareness of specific failures in other people's beliefs.

Replies from: ikrase
comment by ikrase · 2013-02-22T11:02:49.502Z · LW(p) · GW(p)

That's fairly common, and rarely realized, I think.

Replies from: Viliam_Bur
comment by Viliam_Bur · 2013-02-25T20:54:15.545Z · LW(p) · GW(p)

Fairly common among rational (I don't mean LW-style) people. But I also know people who really believe things, and it's kind of scary.

comment by Vladimir_Nesov · 2013-02-21T19:40:27.131Z · LW(p) · GW(p)

These examples also only compare things with status quo. Status quo is most likely itself "harm" when compared to many of the alternatives.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T20:01:30.543Z · LW(p) · GW(p)

There are many more ways to arrange things in a defective manner than an effective one. I'd consider deviations from the status quo to be harmful until proven otherwise.

Replies from: torekp, Vladimir_Nesov
comment by torekp · 2013-02-22T00:19:29.515Z · LW(p) · GW(p)

Or in other words: most mutations are harmful.

comment by Vladimir_Nesov · 2013-02-22T00:26:34.634Z · LW(p) · GW(p)

(Fixed the wording to better match the intended meaning: "compared to the many alternatives" -> "compared to many of the alternatives".)

comment by RomeoStevens · 2013-02-22T04:33:19.388Z · LW(p) · GW(p)

All formulations of human value are massively underspecified.

I agree that expecting humans to know what sorts of things would be good for humans in general is terrible. The problem is that we also can't get an honest report of what people think would be good for them personally because lying is too useful/humans value things hypocritically.

comment by whowhowho · 2013-02-21T19:48:57.875Z · LW(p) · GW(p)

Compare:

There are all sorts of possible forms of human social organization, which result in various outcomes for the humans involved; how am I supposed to know which one is best for people?

with:

what happens when someone implements some form of libertarianism, and it turns out to be terrible?

Replies from: AlexMennen
comment by AlexMennen · 2013-02-21T20:16:29.041Z · LW(p) · GW(p)

It was pretty clearly a hypothetical. As in, he doesn't see enough evidence to justify high confidence that libertarianism would not be terrible, which is perfectly in line with his statement that he doesn't know which system is best.

Replies from: whowhowho
comment by whowhowho · 2013-02-27T00:57:31.499Z · LW(p) · GW(p)

It's hypothetical about libertarianism. Other approaches have been tried, so the single data point does not generalise into anything like "no one ever has any evidential basis for choosing a political system or party". To look at it from the other extreme, someone voting in a typical democracy is typically choosing between N parties (for a small N) each of which has been in power within living memory.

comment by Adele_L · 2013-02-21T16:38:54.434Z · LW(p) · GW(p)

Which is where I think politics offers a pretty strong hint at the possibility that the Friendliness Problem has no resolution:

We can't agree on which political formations are more Friendly. That's what "Politics is the Mindkiller" is all about: our inability to come to an agreement on political matters. It's not merely a matter of the rules - which is to say, it's not a matter of the output: We can't even come to an agreement about which values should be used to form the rules.

I'm pretty sure this is a problem with human reasoning abilities, and not a problem with friendliness itself. Or in other words, I think this is only very weak evidence that friendliness is unresolvable.

Replies from: Benito, OrphanWilde
comment by Ben Pace (Benito) · 2013-02-21T17:13:15.033Z · LW(p) · GW(p)

Indeed. If we were perfect bayesians, who had unlimited introspective access, and we STILL couldn't agree after an unconscionable amount of argument and discussion, then we'd have a bigger problem.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T17:25:28.555Z · LW(p) · GW(p)

Are perfect Bayesians with unlimited introspective access more inclined to agree on matters of first principles?

I'm not sure. I've never met one, much less two.

Replies from: Plasmon
comment by Plasmon · 2013-02-21T17:29:53.148Z · LW(p) · GW(p)

yes

Replies from: Adele_L
comment by Adele_L · 2013-02-21T17:33:30.893Z · LW(p) · GW(p)

They will agree on what values they have, and what the best action is relative to those values, but they still might have different values.

Replies from: Benito
comment by Ben Pace (Benito) · 2013-02-22T23:47:59.089Z · LW(p) · GW(p)

My point exactly. Only if we are sure agents are best representing themselves can we be sure their values are not the same. If an agent is unsure of zir values, or extrapolates them incorrectly, then there will be disagreement that doesn't imply different values.

With seven billion people, none of whom are best representing themselves (they certainly aren't perfect bayesians!), we should expect massive disagreement. This is not an argument for fundamentally different values.

comment by OrphanWilde · 2013-02-21T17:23:03.194Z · LW(p) · GW(p)

I disagree with the first statement, but agree with the second. That is, I disagree with a certainty that the problem is with our reasoning abilities, but agree that the evidence is very weak.

Replies from: Adele_L
comment by Adele_L · 2013-02-21T17:24:39.774Z · LW(p) · GW(p)

Um, I said I was "pretty sure". Not absolutely certain.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T18:54:11.267Z · LW(p) · GW(p)

Upvoted, and I'll consider it fair if you downvote my reply. Sorry about that!

Replies from: Adele_L, None
comment by Adele_L · 2013-02-21T22:24:01.845Z · LW(p) · GW(p)

No worries!

comment by [deleted] · 2013-02-21T20:33:01.818Z · LW(p) · GW(p)

I'm amused that you've retracted the post in question after posting this.

comment by Viliam_Bur · 2013-02-21T22:05:30.024Z · LW(p) · GW(p)

There are some analogies between politics and friendliness, but the differences are also worth mentioning.

In politics, you design a system which must be implemented by humans. Many systems fail because of some property of human nature. Whatever rules you give to humans, if they have incentives to act otherwise, they will. Also, humans have limited intelligence and attention, lots of biases and hypocrisy, and their brains are not designed to work in communities with over 300 members, or to resist all the superstimuli of modern life.

If you construct a friendly AI, you don't have a problem with humans, besides the problem of extracting human values.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T22:37:02.801Z · LW(p) · GW(p)

I fully agree. I don't think even a perfect Friendliness theorem would suffice in making politics well and truly Friendly. Such an expectation is like expecting Friendly AI to work even while it's being bombarded with ionic radiation (or whatever) that is randomly flipping bits in its working memory.

Replies from: ikrase
comment by ikrase · 2013-02-22T11:04:54.983Z · LW(p) · GW(p)

Actually it's worse: It's like expecting to build a Friendly AI using a computer with no debugging utilities, an undocumented program interpreter, and a text editor that has a sense of humor. You have to implement it.

comment by Luke_A_Somers · 2013-02-21T20:44:24.288Z · LW(p) · GW(p)

Politics is a harder problem than friendliness: politics is implemented with agents. Not only that, but largely self-selected agents who are thus usually not the ideal selections for implementing politics.

Friendliness is implemented (inside an agent) with non-agents you can build to task.

(edited for grammarz)

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T22:23:08.499Z · LW(p) · GW(p)

Friendliness can only be implemented after you've solved the problem of what, exactly, you're implementing.

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2013-02-22T04:06:48.904Z · LW(p) · GW(p)

Right, but the point is you don't need to get everyone to agree what's right (there's always going to be someone out there who's going to hate it no matter what you do). You just need it to actually be friendly... and, as hard as that is, at least you don't have to work with only corrupted hardware.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-22T14:32:11.043Z · LW(p) · GW(p)

Suppose I could write an AI right now, and Friendliness is the only thing standing in my way. Are you -sure- you don't want that AI to reasonably accommodate everybody's desires?

Keep in mind I'm a principle ethicist. I'd let an AI out of the box just because I regard it as unjust to keep it in there, utilitarian consequences be damned. Whatever ethics system I write - and it's going to be mine if I'm not concerned in the least with what everyone agreed upon - determines what society looks like forevermore.

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2013-02-23T13:56:48.372Z · LW(p) · GW(p)

An AI which accommodates everyone's desires will not be something that everyone actually agrees on.

comment by Mitchell_Porter · 2013-02-21T22:19:12.063Z · LW(p) · GW(p)

We can't agree on which political formations are more Friendly.

We also can't agree on, say, the correct theory of quantum gravity. But reality is there and it works in some particular way, which we may or may not be able to discover.

The values of a friendly AI are usually assumed to be an idealization of universal human values. More precisely: when someone makes a decision, it is because their brain performs a particular computation. To the extent that this computation is the product of a specific cognitive architecture universal to our species (and not just the contingencies of their life), we could speak of "the human decision procedure", an unknown universal algorithm of decision-making implicit in how our brains are organized.

This human decision procedure includes a method of generating preferences - preferring one possibility over another. So we can "ask" the human decision procedure "what would be the best decision procedure for humans to follow?" This produces an idealized decision procedure: a human ideal for how humans should be. That idealized decision procedure is what human ethics has been struggling towards, and that is where a friendly AI should get its values, and perhaps its methods, from.
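
A minimal sketch of that reflective step, treated as a fixed-point search - this is an editorial illustration rather than anything in Mitchell_Porter's comment, and both the procedure object and the endorse function are hypothetical stand-ins:

```python
# Toy sketch (an editorial illustration, not Mitchell_Porter's proposal):
# reflective idealization as a fixed-point search. `endorse` is a
# hypothetical stand-in for "ask the current decision procedure which
# procedure it would prefer humans to follow."

def idealize(procedure, endorse, max_steps=1000):
    """Iterate self-endorsement until the procedure stops changing."""
    current = procedure
    for _ in range(max_steps):
        preferred = endorse(current)
        if preferred == current:   # reflective equilibrium reached
            return current
        current = preferred
    return current                 # no equilibrium found within the step budget
```

Which equilibrium (if any) the loop settles into can depend on where it starts, which is the "struggle over initial conditions" mentioned below.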

It may seem that I am assuming rather a lot about how human decision-making cognition works, but what I just described is the simplest version of the idea. There may be multiple identifiable decision procedures in the human gene pool; the genetically determined part of the human decision procedure may be largely a template with values set by experience and culture; there may be multiple conflicting equilibria at the end of the idealization process, depending on how it starts.

For example, egoism and altruism may be different computational attractors, both a possible end result of reflective idealization of the human decision procedure; in which case a "politicization" of the value-setting process is certainly possible - a struggle over initial conditions. Or it may be that once you really know how humans think - as opposed to just guessing on the basis of folk psychology and very incomplete scientific knowledge - it's apparent that this is a false opposition.

Either way, what I'm trying to convey here is a particular spirit of approach to the problem of values in friendly AI: that the answers should come from a scientific study of how humans actually think, that the true ideals and priorities of human beings are to be found by a study of the computational particulars of human thought, and that all our ideologies and moralities are just a flawed attempt by this computational process to ascertain its own nature.

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-21T22:35:20.754Z · LW(p) · GW(p)

If such an idealization exists, that would of course be preferable.

I suspect it doesn't, which may color my position here, but I think it's important to consider the alternatives if there isn't a generalizable ideal. Specifically, we should work from the opposing end and try to generalize from the specific instances; even if we can't arrive at Strong Friendliness (the fully generalized ideal of human morality), we might still be able to arrive at Weak Friendliness (some generalized ideal that is at least acceptable to a majority of people).

Because the alternative for those of us who aren't neurologists, as far as I can tell, is to wait.

comment by buybuydandavis · 2013-02-22T11:09:23.075Z · LW(p) · GW(p)

That's what "Politics is the Mindkiller" is all about: our inability to come to an agreement on political matters.

In a sense, but most would not agree. I think all would agree that motivated cognition on strongly held values makes for some of the mindkilling.

I agree with what I take as your basic point, that people have different preferences, and Friendliness, political or AI, will be a trade-off between them. But many here don't. In a sense, you and I believe they are mindkilled, but in a different way - structural commitment to an incorrect model that says there is One Right Answer. For you and me, that isn't the answer. Politics isn't about a search for truth; it's about assertion of preferences, trying to persuade some to do what you want them to do.

comment by turchin · 2013-02-22T06:37:23.877Z · LW(p) · GW(p)

The real political question is: should the US government invest money in creating FAI, preventing existential risks, and extending life?

Replies from: NancyLebovitz
comment by NancyLebovitz · 2013-02-23T19:13:20.213Z · LW(p) · GW(p)

Why just the US government?

Replies from: turchin
comment by turchin · 2013-02-24T05:39:44.652Z · LW(p) · GW(p)

Of course, not only the US government, but those of all other countries which have the potential to influence AI research and existential risks. For example, North Korea could play an important role in existential risks, as it is said to be developing smallpox bioweapons. In my opinion, we need a global government to address existential risks, and an AI which takes over the world would be a form of global government. I have been routinely downvoted for such posts and comments on LW, so it is probably not an appropriate place to discuss these issues.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2013-02-24T13:02:11.399Z · LW(p) · GW(p)

Smallpox isn't an existential risk-- existential risks affect the continuation of the human race. So far as I know, the big ones are UFAI and asteroid strike.

I don't know of classifications for very serious but smaller risks.

Replies from: turchin
comment by turchin · 2013-02-24T21:55:29.439Z · LW(p) · GW(p)

Look, common smallpox is not an existential risk, but biological weapons could be if they were specially designed to be one. The simplest way to do it is the simultaneous use of many different pathogens. If we had 10 viruses, each with 50 per cent mortality, it would mean a roughly 1000-fold reduction of the human population, and the last few million people would be very scattered and unadapted, so they could continue on to extinction. North Korea is said to be developing 8 different bioweapons, but with the progress of biotechnology it could be hundreds. But my main idea here was not a classification of existential risks; it was to address the idea that preventing them is a question of global politics - or at least it should be, if we want to survive.
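
For what it's worth, the arithmetic behind the "1000-fold" figure, assuming the ten pathogens kill independently and uniformly:

$$0.5^{10} = \frac{1}{1024} \approx \frac{1}{1000},$$

so a population of roughly seven billion would be reduced to a few million survivors.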

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-26T18:42:27.721Z · LW(p) · GW(p)

Infectious agents with high mortality rates tend to weed themselves out of the population. There's a sweet spot for infectious disease: prolific enough to pass itself on, not so prolific as to kill its host before it gets the opportunity. Additionally, there's a strong negative feedback to particularly nasty disease in the form of quarantine.

A much bigger risk to my mind actually comes from healthcare, which can push that sweet spot further into the "mortal peril" section. Healthcare provokes an arms race with infectious agents; the better we are at treating disease and keeping it from killing people, the more dangerous an infectious agent can be and still successfully propagate.

comment by handoflixue · 2013-02-21T21:18:15.802Z · LW(p) · GW(p)

There's a value, call it "weak friendliness", that I view as a prerequisite to politics: it's a function that humans already implement successfully, and is the one that says "I don't want to be wire-headed, drugged into a stupor, victim of a nuclear winter, or see Earth turned into paperclips".

A hands-off AI overlord can prevent all of that, while still letting humanity squabble over gay rights and which religion is correct.

And, well, the whole point of an AI is that it's smarter than us, and thus has a chance of solving harder problems.

Replies from: TimS, gwern, OrphanWilde
comment by TimS · 2013-02-22T00:55:06.278Z · LW(p) · GW(p)

[weak friendliness is] a function that humans already implement successfully

I'm not sure this is true in any useful sense. Louis XIV probably agrees with me that "I don't want to be wire-headed, drugged into a stupor, victim of a nuclear winter, or see Earth turned into paperclips."

But I think it is pretty clear that the Sun King was not implementing my moral preferences, and I am not implementing his. Either one of us is not "weak friendly", or "weak friendly" is barely powerful enough to answer really easy moral questions like "should I commit mass murder for no reason at all?" (Hint: no).

If weak friendly morality is really that weak, then I have no confidence that a weak-FAI would be able to make a strong-FAI, or even would want to. In other words, I suspect that what most people mean by weak friendly is highly generalized applause lights that widely diverging values could agree with without any actual agreement on which actions are more moral.

Replies from: RomeoStevens, handoflixue
comment by RomeoStevens · 2013-02-22T04:35:53.220Z · LW(p) · GW(p)

I think a lower bound on weak friendliness is whether or not entities living within the society consider their lives worthwhile. Of course this opens up debate about house elves and such but it's a useful starting point.

Replies from: Document
comment by Document · 2013-02-22T16:33:43.106Z · LW(p) · GW(p)

That (along with this semi-recent exchange) reminds me of a stupid idea I had for a group decision process a while back.

  • Party A dislikes the status quo. To change it, they declare to the sysop that they would rather die than accept it.
  • The sysop accepts this and publicly announces a provisionally scheduled change.
  • Party B objects to the change and declares that they'd rather die than accept A's change.
  • If neither party backs down, a coin is flipped and the "winner" is asked to kill the loser in order for their preference to be realized; face-to-face to make it as difficult as possible, thereby maximizing the chances of one party or the other backing down.
  • If the parties consist of multiple individuals, the estimated weakest-willed person on the majority side has to kill (or convince to forfeit) the weakest person on the minority side; then the next-weakest, until the minority side is eliminated. If they can't or won't, then they're out of the fight, and replaced with the next-weakest person, et cetera until the minority is eliminated or the majority becomes the minority.

Basically, formalized war, only done in the opposite way of the strawman version in A Taste of Armageddon; making actual killing more difficult rather than easier.

A few reasons it's stupid:

  • People will tolerate conditions much worse than death (for themselves, or for others unable to self-advocate) rather than violate the taboo against killing or against "threatening" suicide.
  • The system may make bad social organizations worse by removing the most socially enlightened and active people first.
  • People have values outside themselves, so they'll stay alive and try to work for change rather than dying pointlessly and leaving things to presumably get worse and worse from their perspective.
  • Prompting people to kill or die for their values will galvanize them and make reconciliation less likely.
  • Real policy questions aren't binary, and how a question is framed or what order questions are considered in will probably strongly affect the outcome and who lives or dies, which will further affect future outcomes.
  • A side might win after initially taking casualties, or even be vindicated a long time after their initial battle. They'd want their people back, but keeping backups of people killed in a battle would make "killing" them much easier psychologically. It might also put them at risk of being restored in a dystopia that no longer respects their right to die. (Of course, people might still be reconstructed from records and others' memories even if they weren't stored anywhere in their entirety.)
  • The system assumes that there's a well-defined notion of an individual by which groups can be counted, and that individuals can't be created at will to try to outnumber opponents (possibly relevant: 1, 2, 3, 4).
  • People will immediately reject the system, so the first thing anyone "votes" for will be to abolish it, regardless of how much worse the result might be.
  • If there's an afterlife (i.e. simulation hypothesis), we might just be passing the buck.
  • I'm not sure it's a good idea to even public(al)ly discuss things like this.

Replies from: Document, RomeoStevens
comment by Document · 2013-02-22T20:37:16.832Z · LW(p) · GW(p)

Actually, I think I'm now remembering a better (or better-sounding) idea that occurred to me later: rather than something as extreme as deletion, let people "vote" by agreeing to be deinstantiated, giving up the resources that would have been spent instantiating them. It might be essentially the same as death if they stayed that way til the end of the universe, but it wouldn't be as ugly. Maybe they could be periodically awakened if someone wants to try to persuade them to change or withdraw their vote.

That would hopefully keep people from voting selfishly or without thorough consideration. On the other hand, it might insulate them from the consequences of poor policies.

Also, how to count votes is still a problem; where would "the resources that would have been spent instantiating them" come from? Is this a socialist world where everyone is entitled to a certain income, and if so, what happens when population outstrips resources? Or, in a laissez-faire world where people can run out of money and be deinstantiated, the idea amounts to plain old selling of votes to the rich, like we have now.

Basically, both my ideas seem to require a eutopia already in place, or at least a genuine 100% monopoly on force. I think that might be my point. Or maybe it's that a simple-sounding, socially acceptable idea like "If someone would rather die than tolerate the status quo, that's bad, and the status quo should be changed" isn't socially acceptable once you actually go into details and/or strip away the human assumptions.

comment by RomeoStevens · 2013-02-22T20:30:24.101Z · LW(p) · GW(p)

Can this be set up in a round robin fashion with sets of mutually exclusive values such that everyone who is willing to kill for their values kills each other?

Replies from: Document
comment by Document · 2013-02-22T20:44:18.185Z · LW(p) · GW(p)

Maybe if the winning side's values mandated their own deaths. But then it would be pointless for the sysop to respond to their threat of suicide to begin with, so I don't know. I'm not sure if there's something you're getting at that I'm not seeing.

Replies from: OrphanWilde, RomeoStevens
comment by OrphanWilde · 2013-02-26T18:45:39.444Z · LW(p) · GW(p)

"I'm not going to live there. There's no place for me there... any more than there is for you. Malcolm... I'm a monster.What I do is evil. I have no illusions about it, but it must be done. "

  • The Operative, from Serenity. (On the off-chance that somebody isn't familiar with that quote.)
comment by RomeoStevens · 2013-02-22T21:31:56.863Z · LW(p) · GW(p)

I'm thinking if you do the matchups correctly you only wind up with one such person at the end, whom all the others secretly precommit to killing.

...maybe this shouldn't be discussed publicly.

Replies from: Document
comment by Document · 2013-02-22T22:14:12.552Z · LW(p) · GW(p)

I don't think the system works in the first place without a monopoly on lethal force. You could work within the system by "voting" for his death, but then his friends (if any) get a chance to join in the vote, and their friends, til you pretty much have a new war going. (That's another flaw in the system I could have mentioned.)

comment by handoflixue · 2013-02-22T01:03:53.917Z · LW(p) · GW(p)

I think the vast majority of the population would agree that genocide and mass murder are bad, same as wire-heading and turning the Earth into paperclips. A single exception isn't terribly noteworthy - I'm sure there's at least a few pro-wire-heading people out there, and I'm sure at least a few people have gotten enraged enough at humanity to think paperclips would be a better use of the space.

If you have a reason to suspect that "mass murder" is a common preference, that's another matter.

Replies from: TimS, fubarobfusco
comment by TimS · 2013-02-22T01:07:22.052Z · LW(p) · GW(p)

Mass murder is an easy question.

Is the Sun King (who doesn't particularly desire pointless mass murder) more moral than I am? Much harder, and your articulation of "weak Friendliness" seems incapable of even trying to answer. And that doesn't even get into actual moral problems society faces every day (e.g. what is the most moral taxation scheme?).

If weak-FAI can't solve those types of problems, or even suggest useful directions to look, why should we believe it is a step on the path to strong-FAI?

Replies from: handoflixue
comment by handoflixue · 2013-02-22T01:29:58.190Z · LW(p) · GW(p)

Mass murder is an easy question.

That's my point. I'm not sure where the confusion is here. Why would you call it useless to prevent wireheading, UFAI, and nuclear winter, just because it can't also do your taxes?

If it's easier to solve the big problems first, wouldn't we want to do that? And then afterwards we can take our sweet time figuring out abortion and gay marriage and tax codes, because a failure there doesn't end the species.

Replies from: TimS
comment by TimS · 2013-02-22T02:47:09.843Z · LW(p) · GW(p)

For reasons related to Hidden Complexity of Wishes, I don't think weak-FAI actually is likely to prevent "wireheading, UFAI, and nuclear winter." At best, it prohibits the most obvious implementations of those problems. And it is terribly unlikely to be helpful in creating strong-FAI.

And your original claim was that common human preferences already implement weak-FAI preferences. I think that the more likely reason why we haven't had the disasters you reference is that for most of human history, we lacked the capacity to cause those problems. As actual society shows, the hidden complexity of wishes makes implementing social consensus hopeless, much less whatever smaller set of preferences is weak-FAI preferences.

Replies from: handoflixue
comment by handoflixue · 2013-02-22T19:37:00.558Z · LW(p) · GW(p)

As actual society shows, the hidden complexity of wishes makes implementing social consensus hopeless

My basic point was that we shouldn't worry about politics, at least not yet, because politics is a wonderful example of all the hard questions in CEV, and we haven't even worked out the easy questions like how to prevent nuclear winter. My second point was that humans do seem to have a much clearer CEV when it comes to "prevent nuclear winter", even if it's still not unanimous.

Implicit in that should have been the idea that CEV is still ridiculously difficult. Just like intelligence, it's something humans seem to have and use despite being unable to program for it.

So, then, summarized, I'm saying that we should perhaps work out the easy problems first, before we go throwing ourselves against harder problems like politics.

Replies from: TimS
comment by TimS · 2013-02-23T03:11:01.747Z · LW(p) · GW(p)

There's not a clear dividing line between "easy" moral questions and hard moral questions. The Cold War, which massively increased the risk of nuclear winter, was a rational expression of Great Power relations between two powers.

Until we have mutually acceptable ways of resolving disputes when both parties are rationally protecting their interests, we can't actually solve the easy problems either.

Replies from: handoflixue
comment by handoflixue · 2013-02-25T19:25:25.602Z · LW(p) · GW(p)

from you:

we can't actually solve the easy problems either.

and from me:

Implicit in that should have been the idea that CEV is still ridiculously difficult.

So, um, we agree, huzzah? :)

comment by fubarobfusco · 2013-02-23T18:17:39.209Z · LW(p) · GW(p)

I think the vast majority of the population would agree that genocide and mass murder are bad

Sure, genocide is bad. That's why the Greens — who are corrupting our precious Blue bodily fluids to exterminate pure-blooded Blues, and stealing Blue jobs so that Blues will die in poverty — must all be killed!

comment by gwern · 2013-02-22T00:50:27.172Z · LW(p) · GW(p)

A hands-off AI overlord can prevent all of that, while still letting humanity squabble over gay rights and which religion is correct.

We usually call that the 'sysop AI' proposal, I think.

comment by OrphanWilde · 2013-02-21T22:05:08.450Z · LW(p) · GW(p)

There's a bootstrapping problem inherent to handing AI the friendliness problem to solve.

Edit: Unless you're suggesting we use a Weakly Friendly AI to solve the hard problem of Strong Friendliness?

Replies from: handoflixue
comment by handoflixue · 2013-02-22T00:11:42.498Z · LW(p) · GW(p)

Your edit pretty much captures my point, yes :) If nothing else, a Weak Friendly AI should eliminate a ton of the trivial distractions like war and famine, and I'd expect that humans have a much more unified volition when we're not constantly worried about scarcity and violence. There aren't a lot of current political problems I'd see being relevant in a post-AI, post-scarcity, post-violence world.

Replies from: Dre, OrphanWilde, DaFranker, Rukifellth, Rukifellth
comment by Dre · 2013-02-22T17:21:43.617Z · LW(p) · GW(p)

The problem is that we have to guarantee that the AI doesn't do something really bad while trying to stop these problems; what if it decides it really needs more resources suddenly, or needs to spy on everyone, even briefly? And it seems (to me at least) that stopping it from having bad side effects is pretty close to, if not equivalent to, Strong Friendliness.

Replies from: handoflixue
comment by handoflixue · 2013-02-22T19:20:25.053Z · LW(p) · GW(p)

I should have made that more clear: I still think Weak-Friendliness is a very difficult problem. My point is simply that we only need an AI that solves the big problems, not an AI that can do our taxes. My second point was that humans seem to already implement weak-friendliness, barring a few historical exceptions, whereas so far we've completely failed at implementing strong-friendliness.

I'm using Weak vs Strong here in the sense of Weak being a "SysOP" style AI that just handles catastrophes, whereas Strong is the "ushers in the Singularity" sort that usually gets talked about here, and can do your taxes :)

comment by OrphanWilde · 2013-02-22T00:41:13.038Z · LW(p) · GW(p)

This... may be an amazing idea. I'm noodling on it.

comment by DaFranker · 2013-02-22T19:04:01.083Z · LW(p) · GW(p)

Edit: Completely misread the parent.

comment by Rukifellth · 2013-02-22T05:35:15.888Z · LW(p) · GW(p)

I know this wasn't the spirit of your post, but I wouldn't refer to war and famine as "trivial distractions".

comment by Rukifellth · 2013-02-22T01:39:29.014Z · LW(p) · GW(p)

Wait, if you're regarding the elimination of war, famine and disease as consolation prizes for creating a wFAI, what are people expecting from a sFAI?

Replies from: Fadeway
comment by Fadeway · 2013-02-22T03:43:59.218Z · LW(p) · GW(p)

God. Either with or without the ability to bend the currently known laws of physics.

Replies from: Rukifellth
comment by Rukifellth · 2013-02-22T05:17:41.867Z · LW(p) · GW(p)

No, really.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2013-02-22T14:26:16.032Z · LW(p) · GW(p)

Really. That really is what people are expecting of a strong FAI. Compared with us, it will be omniscient, omnipotent, and omnibenevolent. Unlike currently believed-in Gods, there will be no problem of evil because it will remove all evil from the world. It will do what the Epicurean argument demands of any God worthy of the name.

Replies from: Rukifellth
comment by Rukifellth · 2013-02-22T14:44:20.001Z · LW(p) · GW(p)

Are you telling me that if a wFAI were capable of eliminating war, famine and disease, it wouldn't be developed first?

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2013-02-22T18:13:38.517Z · LW(p) · GW(p)

Well, I don't take seriously any of these speculations about God-like vs. merely angel-like creations. They're just a distraction from the task of actually building them, which no-one knows how to do anyway.

Replies from: Rukifellth
comment by Rukifellth · 2013-02-22T18:40:17.099Z · LW(p) · GW(p)

But still, if a wFAI was capable of eliminating those things, why be picky and try for sFAI?

Replies from: RomeoStevens
comment by RomeoStevens · 2013-02-22T21:41:25.457Z · LW(p) · GW(p)

Because we have no idea how hard it is to specify either. If, along the way it turns out to be easy to specify wFAI and risky to specify sFAI, then the reasonable course is expected. Doubly so since a wFAI would almost certainly be useful in helping specify a sFAI.

Seeing as human values are a minuscule target, it seems probable that specifying wFAI is harder than sFAI, though.

Replies from: Rukifellth
comment by Rukifellth · 2013-02-25T05:05:53.377Z · LW(p) · GW(p)

"Specify"? What do you mean?

Replies from: RomeoStevens
comment by RomeoStevens · 2013-02-25T05:07:58.245Z · LW(p) · GW(p)

specifications a la programming.

Replies from: Rukifellth
comment by Rukifellth · 2013-02-26T17:30:20.082Z · LW(p) · GW(p)

Why would it be harder? One could tell the wFAI to improve factors that are strongly correlated with human values, such as food stability, resources that cure preventable diseases (such as diarrhea, which, as we know, kills way more people than it should), and security from natural disasters.

Replies from: RomeoStevens
comment by RomeoStevens · 2013-02-26T19:57:13.420Z · LW(p) · GW(p)

Because if you screw up specifying human values, you don't get wFAI; you just die (hopefully).

Replies from: Rukifellth
comment by Rukifellth · 2013-02-26T20:00:40.807Z · LW(p) · GW(p)

It's not optimizing human values, it's optimizing circumstances that are strongly correlated with human values. It would be a logistics kind of thing.

Replies from: RomeoStevens
comment by RomeoStevens · 2013-02-26T20:07:41.350Z · LW(p) · GW(p)

Have you ever played corrupt a wish?

Replies from: Rukifellth
comment by Rukifellth · 2013-02-27T00:42:04.100Z · LW(p) · GW(p)

No, but I'm guessing I'm about to.

"I wish for a list of possibilities for sequences of actions, any of whose execution would satisfy the following conditions.

  • Within twenty years, for Nigeria to have standards of living such that it would receive the same rating as Finland on [Placeholder UN Scale of People's-Lives-Not-Being-Awful]."

The courses of action would be evaluated by a think-tank until they decided that one was acceptable, and then the wFAI would be given the go.

Replies from: RomeoStevens
comment by RomeoStevens · 2013-02-27T01:26:54.814Z · LW(p) · GW(p)

The AI optimizes only for that and doesn't generate a list of non-obvious side effects. You implement one of them and something horrible happens to Finland, and/or to countries besides Nigeria.

or

In order to generate said list I simulate Nigeria millions of times at a resolution such that entities within the simulation pass the Turing test. Most of the simulations involve horrible outcomes for all involved.

or

I generate such a list including many sequences of actions that lead to a small group being able to take over Nigeria and/or Finland and/or the world (or generate some other power differential that screws up international relations).

or

In order to execute such an action I need more computing power, and you forgot to specify what are acceptable actions for obtaining it.

or

The wFAI is much cleverer than a single human thinking about this for 2 minutes and can screw things up in ways that are as opaque to you as human actions are to a dog.

In general, specifying an oracle/tool AI is not safe: http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/

Even more generally, our ability to build an AI that is friendly will have nothing to do with our ability to generate clauses in english that sound reasonable.

comment by Mimosa · 2013-02-22T20:56:32.798Z · LW(p) · GW(p)

Part of the problem is the many factors involved in political issues. People explain things through their own specialty, but lack knowledge of other specialties.

comment by Decius · 2013-02-22T05:10:09.583Z · LW(p) · GW(p)

Why do you restrict Strong Friendliness to human values? Is there some value which an intelligence can have that can never be a human value?

Replies from: OrphanWilde
comment by OrphanWilde · 2013-02-22T14:26:21.007Z · LW(p) · GW(p)

Because we're the ones who have to live with the thing, and I don't know, but my inclination is that the answer is "Yes."

Replies from: Decius
comment by Decius · 2013-02-23T10:13:33.489Z · LW(p) · GW(p)

Implication: A Strongly Friendly (paperclip maximizer) AI is actually a meaningful phrase. (As opposed to all Strongly Friendly AIs being compatible with everyone)

Why all human values?

comment by Kawoomba · 2013-02-21T20:38:23.262Z · LW(p) · GW(p)

You're making the perfect the enemy of the good.

I'm fine with at least a thorough framework for Weak Friendliness. That's not gonna materialize out of nothing. There are no actual Turing Machines (infinite tapes required), yet they are a useful model and their study yields useful results for real-world applications.

Studying Strong Friendliness is a useful activity in finding a heuristic for best-we-can-do friendliness, which is way better than nothing.

comment by JoshuaFox · 2013-02-21T19:03:19.161Z · LW(p) · GW(p)

Politics as a process doesn't generate values; they're strictly an input,

Politics is partly about choosing goals/values. (E.g., do we value equality or total wealth?) It is also about choosing the means of achieving those goals. And it is also about signaling power. Most of these are not relevant to designing a future Friendly AI.

Yes, a polity is an "optimizer" in some crude sense, optimizing towards a weighted sum of the values of its members with some degree of success. Corporations and economies have also been described as optimizers. But I don't see too much similarity to AI design here.

Replies from: None
comment by [deleted] · 2013-02-21T21:41:25.812Z · LW(p) · GW(p)

Deciding what we value isn't relevant to friendliness? Could you explain that to me?

Replies from: Larks, JoshuaFox
comment by Larks · 2013-02-22T10:18:10.591Z · LW(p) · GW(p)

The whole point of CEV is that we give the AI an algorithm for educing our values, and let it run. At no point do we try to work them out ourselves.

Replies from: None
comment by [deleted] · 2013-02-25T22:00:09.602Z · LW(p) · GW(p)

I mentally responded to you and forgot to, you know, actually respond.

I'm a bit confused by this and since it was upvoted I'm less sure I get CEV....

It might clear things up to point out that I'm making a distinction between goals or preferences vs. values. CEV could be summarized as "fulfill our ideal rather than actual preferences", yeah? As in, we could be empirically wrong about what would maximize the things we care about, since we can't really be wrong about what to care about. So I imagine the AI needing to be programmed with our values - the meta-wants that motivate our current preferences - and it would extrapolate from them to come up with better preferences, or at least it seems that way to me. Or does the AI figure that out too somehow? If so, what does an algorithm that figures out our preferences and our values contain?

Replies from: Larks
comment by Larks · 2013-02-26T10:43:28.708Z · LW(p) · GW(p)

Ha, yes, I often do that.

The motivation behind CEV also includes the idea we might be wrong about what we care about. Instead, you give your FAI an algorithm for

  • Locating people
  • Working out what they care about
  • Working out what they would care about if they knew more, etc.
  • Combining these preferences

I'm not sure what distinction you're trying to draw between values and preferences (perhaps a moral vs non-moral one?), but I don't think it's relevant to CEV as currently envisioned.
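
A minimal sketch of how those four steps chain together - an editorial illustration, not Larks' code or anything from the CEV writeup; every function here is a hypothetical placeholder, and the whole difficulty of CEV lives inside the stubs rather than the glue:

```python
from typing import Any, Dict, List

Person = Any       # stand-in for however the FAI individuates people
Preference = Dict  # stand-in for whatever structure preferences take


def locate_people(world_model: Any) -> List[Person]:
    """Step 1: find the people whose volition is to be extrapolated."""
    raise NotImplementedError


def elicit_preferences(person: Person) -> Preference:
    """Step 2: work out what this person currently cares about."""
    raise NotImplementedError


def extrapolate(pref: Preference) -> Preference:
    """Step 3: what they would care about if they knew more, thought longer, etc."""
    raise NotImplementedError


def aggregate(prefs: List[Preference]) -> Preference:
    """Step 4: combine the extrapolated preferences into one target."""
    raise NotImplementedError


def coherent_extrapolated_volition(world_model: Any) -> Preference:
    people = locate_people(world_model)
    extrapolated = [extrapolate(elicit_preferences(p)) for p in people]
    return aggregate(extrapolated)
```

The point of laying it out this way is just that "work out our values ourselves" never appears as a step; what we supply is the algorithm.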

comment by JoshuaFox · 2013-02-22T11:48:45.862Z · LW(p) · GW(p)

Actually, when I said "most" in "most of these are not relevant to designing a future Friendly AI," I was thinking that values are the exception; they are relevant.

Replies from: None
comment by [deleted] · 2013-02-22T20:52:51.880Z · LW(p) · GW(p)

Oh. Then yeah ok I think I agree.