Friendly AI ideas needed: how would you ban porn?

post by Stuart_Armstrong · 2014-03-17T18:00:27.069Z · LW · GW · Legacy · 80 comments

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then build a system that implements that cut.

There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?

Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.

The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?

To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?

80 comments

Comments sorted by top scores.

comment by Kaj_Sotala · 2014-03-17T18:27:51.443Z · LW(p) · GW(p)

But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?

Not saying that I would endorse this as a regulatory policy, but it's my understanding that the strategy used by e.g. the Chinese government is to not give any explicit guidelines. Rather, they ban things which they consider to be out of line and penalize the people who produced/distributed them, but only give a rough reason. The result is that nobody tries to pull tricks like obeying the letter of the regulations while avoiding the spirit of them. Quite the opposite, since nobody knows what exactly is safe, people end up playing it as safe as possible and avoiding anything that the censors might consider a provocation.

Of course this errs on the side of being too restrictive, which is a problem if the eroticism is actually something that you'd want to encourage. But then you always have to err in one of the directions. And if you do a good enough job at picking the people who you make public examples of, maybe most people will get the right message. You know it when you see it, so punish people when you do see something wrong, and let them off when you don't see anything wrong: eventually people will be able to learn the pattern that your judgments follow. Even if they couldn't formulate an explicit set of criteria for porn-as-defined-by-you, they'll know it when they see it.

Replies from: Risto_Saarelma, Gunnar_Zarncke
comment by Risto_Saarelma · 2014-03-17T20:33:05.656Z · LW(p) · GW(p)

If the problem is about classifier design, I suppose that in the least convenient possible world both the porn and the high culture media were being beamed from Alpha Centauri. Instead of being able to affect the production in any way, all you could do was program the satellite relay that propagates the stuff to terrestrial networks to filter the porny bits while feeding the opera bits to lucrative pay-per-view, while trying not to think too hard about just what is going on at Alpha Centauri and why it results in pitch-perfect human porn movies and opera performances being narrowcast at Earth in high resolution video.
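A toy sketch of that relay-as-classifier framing. Everything here is illustrative: `extract_features` stands in for whatever audio/video feature extraction the relay could actually do, and the labelled corpus is assumed to come from human annotators on Earth.

```python
def extract_features(broadcast):
    # Hypothetical: assume each intercepted broadcast already carries a
    # pre-computed numeric feature vector.
    return broadcast["features"]


def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labelled_broadcasts):
    """Nearest-centroid classifier: one prototype feature vector per label."""
    by_label = {}
    for broadcast, label in labelled_broadcasts:
        by_label.setdefault(label, []).append(extract_features(broadcast))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def route(broadcast, prototypes):
    """Feed opera to pay-per-view, drop anything classified as porn."""
    features = extract_features(broadcast)
    label = min(prototypes, key=lambda l: squared_distance(features, prototypes[l]))
    return "pay_per_view" if label == "opera" else "filtered"
```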

comment by Gunnar_Zarncke · 2014-03-17T20:37:53.207Z · LW(p) · GW(p)

Which is exactly "refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts".

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2014-03-18T11:21:38.202Z · LW(p) · GW(p)

Hmm, I read the original post rather quickly, so I actually missed the fact that the analogy was supposed to map to value loading. I mistakenly assumed that this was about how to ban/regulate AGI while still allowing more narrow AI.

comment by Nornagest · 2014-03-17T21:10:04.371Z · LW(p) · GW(p)

Short answer: Mu.

Longer answer: "Porn" is clearly underspecified, and to make matters worse there's no single person or interest group that we can try to please with our solution: many different groups (religious traditionalists, radical feminists, /r/nofap...) dislike it for different and often conflicting reasons. This wouldn't be such a problem -- it's probably possible to come up with a definition broad enough to satisfy all parties' appetites for social control, distasteful as such a thing is to me -- except that we're also trying to leave "eroticism" alone. Given that additional constraint, we can't possibly satisfy everyone; the conflicting parties' decision boundaries differ too much.

We could then come up with some kind of quantification scheme -- show questionable media to a sample of the various stakeholders, for example -- and try to satisfy as many people as possible. That's probably the least-bad way of solving the problem as stated, and we can make it as finely grained as we have money for. It's also one that's actually implemented in practice -- the MPAA ratings board works more or less like this. Note however that it still pisses a lot of people off.
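A toy sketch of that quantification scheme, under the simplifying assumption that each sampled stakeholder just votes "ban" or "allow" on each questionable item, and that we resolve each item by whichever choice leaves fewer people dissatisfied:

```python
# Aggregate per-item stakeholder votes; the rule "minimize dissatisfied voters"
# is only one possible choice, and a weighted or finer-grained scheme would
# work the same way.

def decide(votes):
    """votes: list of 'ban' / 'allow' judgements from the sampled stakeholders."""
    ban_votes = sum(1 for v in votes if v == "ban")
    allow_votes = len(votes) - ban_votes
    # Banning dissatisfies the 'allow' voters and vice versa.
    return "ban" if ban_votes > allow_votes else "allow"


# Example: one questionable scene shown to five sampled stakeholders.
print(decide(["ban", "allow", "ban", "ban", "allow"]))  # -> ban
```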

I think a better approach, however, would be to abandon the question as stated and try to solve the problem behind it. None of the stakeholders actually care about banning media-labeled-porn (unless they're just trying to win points by playing on negative emotional valence, a phenomenon I'll contemptuously ignore); instead, they have different social agendas that they're trying to serve by banning some subset of media with that label. Social conservatives want to limit perceived erosion of traditional propriety mores and may see open sexuality as sinful; radical feminists want to reduce what they see as exploitative conditions in the industry and to eliminate media they perceive as objectifying women; /r/nofap wants what it says on the tin.

Depending on the specifics of these objections, we can make interventions a lot more effective and less expensive than varying the exact criteria of a ban: we might be able to satisfy /r/nofap and some conservatives, for example, by instituting an opt-out process by which individuals could voluntarily and verifiably bar themselves from purchasing prurient media (or accessing websites, with the help of a friendly ISP). If we have a little more latitude, we could even look at these agendas and the reasoning behind them, see if they're actually well-founded and well-targeted, and ignore them if not.

Replies from: Oscar_Cunningham, itaibn0
comment by Oscar_Cunningham · 2014-03-17T21:50:10.185Z · LW(p) · GW(p)

I think a better approach, however, would be to abandon the question as stated and try to solve the problem behind it.

Note that this is the general method for dealing with confused concepts.

Replies from: Nornagest
comment by Nornagest · 2014-03-17T22:47:05.265Z · LW(p) · GW(p)

Yeah. An earlier version of my post started by saying so, but I decided that the OP had been explicit enough in asking for an object-level solution that I'd be better off laying out more of the reasoning behind going meta.

comment by itaibn0 · 2014-03-18T22:01:07.775Z · LW(p) · GW(p)

This all sounds reasonable to me. Now what happens when you apply the same reasoning to Friendly AI?

Replies from: Nornagest
comment by Nornagest · 2014-03-18T22:47:39.601Z · LW(p) · GW(p)

Nothing particularly new or interesting, as far as I can tell. It tells us that defining a system of artificial ethics in terms of the object-level prescriptions of a natural ethic is unlikely to be productive; but we already knew that. It also tells us that aggregating people's values is a hard problem and that the best approaches to solving it probably consist of trying to satisfy underlying motivations rather than stated preferences; but we already knew that, too.

comment by Lumifer · 2014-03-17T18:29:49.542Z · LW(p) · GW(p)

distinguish between pornography and eroticism

Aren't you assuming these two are at different sides of a "reality joint"?

I tend to treat these words as more or less synonyms, in that they refer to the same thing but express different attitudes on the part of the speaker.

Replies from: Stuart_Armstrong, Gunnar_Zarncke
comment by Stuart_Armstrong · 2014-03-18T10:38:01.460Z · LW(p) · GW(p)

I chose those examples because the edge cases seem distinct, but the distinction seems very hard to formally define.

Replies from: Lumifer
comment by Lumifer · 2014-03-18T16:11:41.355Z · LW(p) · GW(p)

I don't think the edge cases are as distinct as they seem to you to be.

Generally speaking, pornography and eroticism are two-argument things, the first argument is the object (the text/image/movie), and the second argument is the subject (the reader/watcher) together with all his cultural and personal baggage.

Trying to assume away the second argument isn't going to work. The cultural and individual differences are too great.

comment by Gunnar_Zarncke · 2014-03-17T20:38:32.342Z · LW(p) · GW(p)

Lawmakers don't seem to see it the same way, though.

comment by Oscar_Cunningham · 2014-03-17T19:52:12.840Z · LW(p) · GW(p)

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then build a system that implements that cut.

I don't think that this is true. Reductionist solutions to philosophical problems typically pick some new concepts which can be crisply defined, and then rephrase the problem in terms of those, throwing out the old fuzzy concepts in the process. What they don't do is to take the fuzzy concepts and try to rework them.

For example, nowhere in the "Free Will Sequence" does Eliezer give a new clear definition of "free will" by which one may decide whether something has free will or not. Instead he just explains all the things that you might want to explain with "free will" using concepts like "algorithm".

For another example, pretty much all questions of epistemic rationality are settled by Bayesianism. Note that Bayesianism doesn't contain anywhere a definition of "knowledge". So we've successfully dodged the "problem of knowledge".

So the answer to the title question is to ask what you want to achieve by banning porn, and then ban precisely the things such that banning them helps you achieve that aim. Less tautologically, my point is that the correct way of banning porn isn't to make a super precise definition of "porn" and then implement that definition.

comment by lucidian · 2014-03-17T20:58:09.598Z · LW(p) · GW(p)

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then build a system that implements that cut.

Strongly disagree. The whole point of Bayesian reasoning is that it allows us to deal with uncertainty. And one huge source of uncertainty is that we don't have precise understandings of the concepts we use. When we first learn a new concept, we have a ton of uncertainty about its location in thingspace. As we collect more data (either through direct observation or indirectly through communication with other humans), we are able to decrease that uncertainty, but it never goes away completely. An AI which uses human concepts will have to be able to deal with concept-uncertainty and the complications that arise as a result.

The fact that humans can't always agree with each other on what constitutes porn vs. erotica demonstrates that we don't all carve reality up in the same places (and therefore there's no "objective" definition of porn). The fact that individual humans often have trouble classifying edge cases demonstrates that even when you look at a single person's concept, it will still contain some uncertainty. The more we discuss and negotiate the meanings of concepts, the less fuzzy the boundaries will become, but we can't remove the fuzziness completely. We can write out a legal definition of porn, but it won't necessarily correspond to the black-box classifiers that real people are using. And concepts change - what we think of as porn might be classified differently in 100 years. An AI can't just find a single carving of reality and stick with it; the AI needs to adapt its knowledge as the concepts mutate.

So I'm pretty sure that what you're asking is impossible. The concept-boundaries in thingspace remain fuzzy until humans negotiate them by discussing specific edge cases. (And even then, they are still fuzzy, just slightly less so.) So there's no way to find the concept boundaries without asking people about it; it's the interaction between human decision makers that define the concept in the first place.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2014-03-18T11:32:10.932Z · LW(p) · GW(p)

The fact that individual humans often have trouble classifying edge cases demonstrates that even when you look at a single person's concept, it will still contain some uncertainty.

Related paper. Also sections 1 and 2 of this paper.

comment by diegocaleiro · 2014-03-17T19:08:39.003Z · LW(p) · GW(p)

This is a sorites problem and you want to sort some pebbles into kinds. You've made clear that externalism about porn may be true (something may begin or stop being porn in virtue of properties outside its own inherent content, such as where it is and contextual features).

It seems to me that you have to prioritize your goals in this case. So the goal "ban porn" is much more important than the goal "leave eroticism alone". My response would be to play it safe and ban all footage including genitals, similar to what the Japanese already do with pubes etc...

This response is analogous to your Oracle AI paper suggestion, only mine is less sophisticated.

Now, I can steelman your request: you must ban all porn and, with the exact same importance, you must endorse that eroticism take place. Then I'd invoke the works of skeptical philosophers about the sorites problem, like Peter Unger, who doesn't think that people - or he himself - exist (while still thinking you should not "live high and let die").

I'd argue that these theorists are right, and that thus there is no matter of fact as to where porn stops and eroticism begins.

The goal is still there though - the goal is embedded in Nature, regardless of what we think about it? Ok. Now if we know there is no fact of the matter, and erring in both directions would be equally bad, we should run polls to find out the general opinion on the porn/eroticism divide (it seems safe to bet that porn is porn in virtue of its relation to human minds, since chimp porn is not attractive to us). Those polls would be compulsory, like voting is in some places, so we'd have as good an idea about what the human mind considers porn as we can get. Like Hot or Not, there would be worldwide porn-or-not websites.

This would go on (causing a lot of damage every time someone had to detect porn) for a while, until the best algorithms of the time could do the detecting indistinguishably from the average of all humans to some approximation. When that was done, anyone attempting to create eroticism would have their files automatically scanned for porn-ness by the narrow AI.
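One rough way to cash that out (my own construction, assuming the polls give a per-item fraction of "porn" votes and that some feature model already produces a raw porn-ness score):

```python
# Fit a decision threshold on the raw scores so that the model's verdicts agree
# with the human majority as often as possible, and flag drift for retraining.

def human_average(poll_votes):
    """poll_votes: list of 0/1 judgements ('not porn' / 'porn') for one item."""
    return sum(poll_votes) / len(poll_votes)


def fit_threshold(items):
    """items: list of (raw_score, porn_fraction) pairs from the polled corpus."""
    best_threshold, best_agreement = 0.0, -1.0
    for threshold in (i / 100 for i in range(101)):
        agreement = sum(
            1 for score, frac in items if (score >= threshold) == (frac >= 0.5)
        ) / len(items)
        if agreement > best_agreement:
            best_threshold, best_agreement = threshold, agreement
    return best_threshold, best_agreement


def needs_retraining(agreement_with_humans, required=0.95):
    """Re-run the compulsory polls and refit whenever agreement has drifted."""
    return agreement_with_humans < required
```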

We would have the best system possible to avoid infractions in both directions of the sorites problem. For every doubling of the human population from then on, re-check to see if our senses have shifted drastically from the previous ones and adapt the algorithm.

It seems desirable, over the long run, to make sure cultural globalization of morals is very intense, so I'd ban immigration laws to make sure humanity clusters around a narrow subset of mindspace as regards porn or not.

Depending on how that goal factors in with every single other goal we have ever had, it may be good to destroy lots of resources after the moral globalization process is done, so that people spend more resources on survival and have fewer chances of drifting morally apart. Following the same reasoning, it may be desirable to progressively eliminate males, since males are less biologically necessary, and porn judgement varies significantly by gender.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-18T10:54:49.167Z · LW(p) · GW(p)

For every doubling of the human population from then on, re-check to see if our senses have shifted drastically from the previous ones and adapt the algorithm.

I feel this kind of idea could have some AI potential in some form or other. Let me think about it...

comment by jimrandomh · 2014-03-17T18:47:42.365Z · LW(p) · GW(p)

The distinction between eroticism and pornography is that it's porn if a typical viewer wanks to it. Like the question of whether something is art, the property is not intrinsic to the thing itself.

That this question was so easy very slightly decreases my difficulty estimate for Friendliness.

Replies from: CronoDAS, Stuart_Armstrong
comment by CronoDAS · 2014-03-17T19:06:39.741Z · LW(p) · GW(p)

So if you only show your porn somewhere that makes it inconvenient to masturbate to (such as at the Met) then it's no longer porn? ;)

Replies from: jimrandomh, knb
comment by jimrandomh · 2014-03-17T19:43:47.632Z · LW(p) · GW(p)

Yes.

comment by knb · 2014-03-18T01:37:15.464Z · LW(p) · GW(p)

In the same sense that a broken toilet in an art gallery is a powerful dadaist work of art, but a broken toilet in an alley is a broken toilet.

comment by Stuart_Armstrong · 2014-03-18T10:47:58.337Z · LW(p) · GW(p)

If I port this type of idea over to AI, I would get things like "the definition of human pain is whether the typical sufferer desires to scream or not". Those definitions can be massively gamed, of course; but it does hint that if we define a critical mass of concepts correctly (typical, desires...) we can ground some undefined concepts in those ones. It probably falls apart the more we move away from standard human society (eg will your definition of porn work for tenth generation uploads?).

So in total, if we manage to keep human society relatively static, and we have defined a whole host of concepts, we may be able to ground extra ambiguous concepts using what we've already defined. The challenge seems to be keeping human society (and humans!) relatively static.

comment by diegocaleiro · 2014-03-18T02:49:15.926Z · LW(p) · GW(p)

I'll cite the comment section on this post to friends whenever I need to say: "And this, my friends, is why you don't let rationalists discuss porn" http://imgs.xkcd.com/comics/shopping_teams.png

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-03-17T20:10:25.438Z · LW(p) · GW(p)

In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you

But it is, and the contrary approach of teaching humans to recognize things doesn't have an obvious relation to FAI, unless we think that the details of teaching human brains by instruction and example are relevant to how you'd set up a similar training program for an unspecified AI algorithm. If this is the purported connection to FAI it should be spelled out explicitly, and the possible failure of the connection spelled out explicitly. I'm also not sure the example is a good one for the domain. Asking how to distinguish happiness from pleasure, what people really want from what they say they want, the difference between panic and justified fear? Or maybe if we want to start with something more object-level, what should be tested and when you should draw confident conclusions about what someone's taste buds will like (under various circumstances?), i.e., how much do you need to know to decide that someone will like the taste of a cinnamon candy if they've never tried anything cinnamon before? Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking - if each of these aspects is meant to be relevant, then can the relevance of each aspect be spelled out?

I like the "What does it take to predict taste buds?" question, of those I brainstormed above, because it's something we could conceivably test in practice. Or maybe an even more practical conjugate would be Netflix-style movie score prediction, only you can ask the subject whatever you like, have them rate particular other movies, etc., all to predict the rating on that one movie.
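A bare-bones sketch of that query-driven variant, assuming some pre-existing similarity measure between movies (hypothetical here): ask the subject about the movies most similar to the target, then predict the target rating by similarity-weighted averaging.

```python
# Choose which movies to query and predict the target rating from the answers.
# `similarity(a, b)` is assumed to return a non-negative number; how it is
# obtained (content features, other users' ratings, ...) is left open.

def pick_queries(target, candidates, similarity, k=3):
    """Ask about the k movies most similar to the target - the most informative ones."""
    return sorted(candidates, key=lambda m: similarity(m, target), reverse=True)[:k]


def predict_rating(target, answered, similarity):
    """answered: dict mapping each queried movie to the rating the subject gave."""
    weights = {m: similarity(m, target) for m in answered}
    total = sum(weights.values())
    if total == 0:
        return 3.0  # neutral fallback when nothing similar was rated
    return sum(rating * weights[m] / total for m, rating in answered.items())
```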

Replies from: itaibn0, Stuart_Armstrong, Gunnar_Zarncke
comment by itaibn0 · 2014-03-18T22:04:56.726Z · LW(p) · GW(p)

Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking - if each of these aspects is meant to be relevant, then can the relevance of each aspect be spelled out?

Well, conflicting values are obviously relevant, and disagreements seem so as well, to a lesser extent (consider the problem of choosing priors for an AI), for starters.

comment by Stuart_Armstrong · 2014-03-18T11:02:56.136Z · LW(p) · GW(p)

If this is the purported connection to FAI it should be spelled out explicitly, and the possible failure of the connection spelled out explicitly.

I'm just fishing in random seas for new ideas.

comment by Gunnar_Zarncke · 2014-03-17T20:50:43.266Z · LW(p) · GW(p)

The movie prediction question is complicated because it includes feedback cycles over styles and tastes and is probably cross-linked to other movies airing at the same time. See e.g. http://www.stat.berkeley.edu/~aldous/157/Old_Projects/kennedy.pdf

The "predict taste buds?" question is better. But even that one contains feedback cycles over tastes, at least in some domains like wine, and probably cigarettes and other expensive socially consumed goods.

comment by Lumifer · 2014-03-17T19:09:10.575Z · LW(p) · GW(p)

This thread should provide interesting examples of the Typical Mind Fallacy... :-D

comment by [deleted] · 2014-03-17T19:06:30.745Z · LW(p) · GW(p)

An alternative to trying to distinguish between porn and erotica on the basis of content or user attitudes: teach the AI to detect infrastructures of privacy and subterfuge, and to detect when people are willing to publicly patronize and self-identify with something. Most people don't want others to know that they enjoy porn. You could tell your boss about the nude Pirates you saw last weekend, but probably not the porn. Nude Pirates shows up on the Facebook page, but not so much the porn. An online video with naked people that has half a million views, but is discussed nowhere where one's identity is transparent, is probably porn. It's basic to porn that it's enjoyed privately, erotica publicly.
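A crude sketch of that signal, with made-up field names: compare how much an item is consumed against how often it is discussed under real identities, and flag items that are widely consumed but never publicly owned up to.

```python
# High consumption plus near-zero identity-linked discussion is the proposed
# marker of "enjoyed privately", i.e. porn. The threshold is arbitrary and
# would need calibration against labelled examples.

def privacy_score(item):
    views = item["view_count"]
    public_mentions = item["mentions_under_real_identity"] + 1  # avoid division by zero
    return views / public_mentions


def looks_like_porn(item, threshold=10_000):
    return privacy_score(item) > threshold
```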

Replies from: Gunnar_Zarncke, Stuart_Armstrong
comment by Gunnar_Zarncke · 2014-03-17T20:42:21.648Z · LW(p) · GW(p)

Except that this doesn't hold in all social circles. Once there is a distinction, people will start to use it to make a difference.

Replies from: None
comment by [deleted] · 2014-03-17T20:56:50.016Z · LW(p) · GW(p)

Well, it's sufficient for our purposes that it holds in most. Proud and public porn consumers are outliers, and however an AI might make ethical distinctions, there will always be a body of outliers to ignore. But I grant that my approach is culturally relative. In my defense, a culture ordered such that this approach wouldn't work at all probably wouldn't seek a ban on porn anyway, and might not even be able to make much sense of the distinction we're working with here.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2014-03-17T22:00:19.831Z · LW(p) · GW(p)

Handling subcultures is difficult. We can ignore outliers because our rules are weak and don't reach those who make their own local rules and accept the price of violating (or bending) some larger society's rules. But an AI may not treat them the same. An AI will be able to enforce the rules on the outliers and effectively kill those subcultures. Do we want this? One size fits all? I don't think so. The complex value function must also allow 'outliers' - only the concept must be made stronger.

comment by Stuart_Armstrong · 2014-03-18T10:50:24.673Z · LW(p) · GW(p)

I like this kind of indirect approach. I wonder if such ideas could be ported to AI...

comment by [deleted] · 2014-03-17T19:11:48.631Z · LW(p) · GW(p)

I would guess that eroticism is supposed to inspire creativity while pornography supposedly replaces it. So if the piece in question were to be presented to people while their brain activity is being monitored, I would expect to see an increase of activity throughout the brain for eroticism, while I'd expect a decrease or concentration of activity for pornography. Although I have no idea if that is actually the case.

Without reference to sexual stimulation this would include a lot of things that are not currently thought of as pornography, but that might actually be intentional depending on the reason why someone would want to ban pornography.

comment by Wei Dai (Wei_Dai) · 2014-03-18T22:45:24.740Z · LW(p) · GW(p)

I think there is no fundamental difference between porn and erotica, it's just that one is low status and the other is high status (and what's perceived as high status depends greatly on the general social milieu, so it's hard to give any kind of stand-alone definition to delineate the two). It only seems like there are two "clusters in thingspace" because people tend to optimize their erotic productions to either maximize arousal or maximize status, without much in between (unless there is censorship involved, in which case you might get shows that are optimized to just barely pass censorship). Unfortunately I don't think this answer helps much with building FAI.

comment by drnickbone · 2014-03-18T08:30:01.206Z · LW(p) · GW(p)

A couple of thoughts here:

  1. Set a high minimum price for anything arousing (say $1000 a ticket). If it survives in the market at that price, it is erotica; if it doesn't, it was porn. This also works for $1000 paintings and sculptures (erotica) compared to $1 magazines (porn).

  2. Ban anything that is highly arousing for males but not generally liked by females. Variants on this: require an all-female board of censors; or invite established couples to view items together, and then question them separately (if they both liked it, it's erotica). Train the AI on examples until it can classify independently of the board or couples.

Replies from: Nornagest, mwengler
comment by Nornagest · 2014-03-18T17:35:56.971Z · LW(p) · GW(p)

Ban anything that is highly arousing for males but not generally liked by females.

I can see the headline now: "Yaoi Sales Jump on Controversial FCC Ruling".

comment by mwengler · 2014-03-18T19:45:37.621Z · LW(p) · GW(p)

Set a high minimum price for anything arousing (say $1000 a ticket). If it survives in the market at that price, it is erotica; if it doesn't, it was porn. This also works for $1000 paintings and sculptures (erotica) compared to $1 magazines (porn).

I doubt that that works. What makes you think there are no rich guys who want to see pornography? They will simply buy it at the $1000 price.

I can think of no reason why price discrimination would favor "art" over porn.

Replies from: drnickbone
comment by drnickbone · 2014-03-18T21:39:49.700Z · LW(p) · GW(p)

A "few" rich guys buying (overpriced) porn is unlikely to sustain a real porn industry. Also, using rich guy logic, it is probably a better investment to buy the sculptures, paintings, art house movies etc, amuse yourself with those for a while, then sell them on. Art tends to appreciate over time.

comment by someonewrongonthenet · 2014-03-21T04:30:46.331Z · LW(p) · GW(p)

Just criminalize porn, and leave it to the jury to decide whether or not it's porn. That's how we handle most moral ambiguities, isn't it?

I will assume that the majority of the population shares my definition of porn and is on board with this, creating low risk of an activist jury (otherwise this turns into the harder problem of "how to seize power from the people".)

Edit: On more careful reading, I guess that's not allowed since it would fall in the "I know it when I see it" category. But then, since we obviously are not going to write an actual algorithm, how specific does the answer need to be?

Would "It is pornography if the intention is primarily to create sexual arousal, and it's up to the jury to decode intention" be an acceptably well-defined answer? Would "I'm going to use theoretically possible mind-reading technology to determine whether or not the viewer / creator of the pornography were primarily intending to view / create sexually arousing stimuli" be an acceptably well defined answer? Do I have to define the precise threshold upon which something is "primarily" about a factor with neuron-level accuracy, or can I just approximately define the threshold of "primarily" via a corpus of examples?

I guess what I'm saying is... "how to ban pornography" seems to be "solved" in the abstract as soon as you adequately define pornography, and the rest is all implementation.

comment by Squark · 2014-03-23T21:09:11.481Z · LW(p) · GW(p)

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then build a system that implements that cut.

I don't think that's what the solution to FAI will look like. I think the solution to FAI will look like "Look, this is a human (or maybe an uploaded human brain), it is an agent, it has a utility function. You should be maximizing that."

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-24T10:29:20.846Z · LW(p) · GW(p)

The clearer we make as many concepts as we can, the more likely it is that "look, this is..." is going to work.

Replies from: Squark
comment by Squark · 2014-03-25T12:33:11.588Z · LW(p) · GW(p)

Well, I think the concept we have to make clear is "agent with given utility function". We don't need any human-specific concepts, and they're hopelessly complex anyway: let the FAI figure out the particulars on its own. Moreover, the concept of an "agent with given utility function" is something I believe I'm already relatively near to formalizing.

Replies from: Eugine_Nier, Stuart_Armstrong
comment by Eugine_Nier · 2014-03-27T05:17:27.616Z · LW(p) · GW(p)

If the agent in question has a well-defined utility function, why is he deferring to the FAI to explain it to him?

Replies from: Squark
comment by Squark · 2014-03-27T19:03:55.471Z · LW(p) · GW(p)

Because he is bad at introspection and his only access to the utility function is through a noisy low-bandwidth sensor called "intuition".

comment by Stuart_Armstrong · 2014-03-25T13:24:25.284Z · LW(p) · GW(p)

let the FAI figure out the particulars on its own.

Again, the more we can do ahead of time, the more likely it is that the FAI will figure these things out correctly.

Replies from: Squark
comment by Squark · 2014-03-25T19:18:15.929Z · LW(p) · GW(p)

Why do you think the FAI can figure these things out incorrectly, assuming we got "agent with given utility function" right? Maybe we can save it time by providing it with more initial knowledge. However, since the FAI has superhuman intelligence, it would probably take us much longer to generate that knowledge than it would take the FAI. I think that to generate an amount of knowledge which would be non-negligible from the FAI's point of view would take a timespan large with respect to the timescale on which UFAI risk becomes significant. Therefore in practice I don't think we can wait for it before building the FAI.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-26T13:43:22.354Z · LW(p) · GW(p)

Why do you think the FAI can figure these things out incorrectly

Because values are not physical facts, and cannot be deduced from mere knowledge.

Replies from: Squark
comment by Squark · 2014-03-26T20:51:30.752Z · LW(p) · GW(p)

I'm probably explaining myself poorly.

I'm suggesting that there should be a mathematical operator which takes a "digitized" representation of an agent, either in white-box form (e.g. uploaded human brain) or in black-box form (e.g. chatroom logs) and produces a utility function. There is nothing human-specific in the definition of the operator: it can as well be applied to e.g. another AI, an animal or an alien. It is the input we provide the operator that selects a human utility function.

Replies from: asr, Stuart_Armstrong, pengvado
comment by asr · 2014-03-31T14:55:06.447Z · LW(p) · GW(p)

I don't understand how such an operator could work.

Suppose I give you a big messy data file that specifies neuron state and connectedness. And then I give you a big complicated finite-element simulator that can accurately predict what a brain would do, given some sensory input. How do you turn that into a utility function?

I understand what it means to use utility as a model of human preference. I don't understand what it means to say that a given person has a specific utility function. Can you explain exactly what the relationship is between a brain and this abstract utility function?

Replies from: Squark
comment by Squark · 2014-03-31T20:14:23.887Z · LW(p) · GW(p)

See the last paragraph in this comment.

Replies from: asr
comment by asr · 2014-04-01T04:22:39.426Z · LW(p) · GW(p)

I don't see how that addresses the problem. You're linking to a philosophical answer, and this is an engineering problem.

The claim you made, some posts ago, was "we can set an AI's goals by reference to a human's utility function." Many folks objected that humans don't really have utility functions. My objection was "we have no idea how to extract a utility function, even given complete data about a human's brain." Defining "utility function" isn't a solution. If you want to use "the utility function of a particular human" in building an AI, you need not only a definition, but a construction. To be convincing in this conversation, you would need to at least give some evidence that such a construction is possible.

You are trying to use, as a subcomponent, something we have no idea how to build and that seems possibly as hard as the original problem. And this isn't a good way to do engineering.

Replies from: Squark
comment by Squark · 2014-04-01T06:08:14.723Z · LW(p) · GW(p)

The way I expect AGI to work is receiving a mathematical definition of its utility function as input. So there is no need to have a "construction". I don't even know what a "construction" is, in this context.

Note that in my formal definition of intelligence, we can use any appropriate formula* in the given formal language as a utility function, since it all comes down to computing logical expectation values. In fact I expect a real seed AGI to work through computing logical expectation values (by an approximate method, probably some kind of Monte Carlo).

Of course, if the AGI design we will come up with is only defined for a certain category of utility functions then we need to somehow project into this category (assuming the category is rich enough for the projection not to lose too much information). The construction of this projection operator indeed might be very difficult.

  • In practice, I formulated the definition with utility = Solomonoff expectation value of something computable. But this restriction isn't necessary. Note that my proposal for defining logical probabilities admits self reference in the sense that the reasoning system is allowed to speak of the probabilities it assigns (like in Christiano et al).
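For concreteness, a toy version of "computing logical expectation values by some kind of Monte Carlo" might look like the sketch below; the world model and the utility function are stand-ins, not the formalism linked above.

```python
# Estimate the expected utility of a policy by sampling outcomes from a
# (placeholder) world model and averaging the utility over the samples.

import random


def sample_outcome(policy, rng):
    # Placeholder world model: acting shifts the outcome distribution upward.
    return rng.gauss(1.0 if policy == "act" else 0.0, 1.0)


def utility(outcome):
    return outcome  # stand-in utility function


def expected_utility(policy, n_samples=10_000, seed=0):
    rng = random.Random(seed)
    return sum(utility(sample_outcome(policy, rng)) for _ in range(n_samples)) / n_samples


print(expected_utility("act"), expected_utility("wait"))
```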
comment by Stuart_Armstrong · 2014-03-31T11:02:03.965Z · LW(p) · GW(p)

Humans don't follow anything like a utility function, which is a first problem, so you're asking the AI to construct something that isn't there. Then you have to knit this together into a humanity utility function, which is very non trivial (this is one feeble and problematic way of doing this: http://lesswrong.com/r/discussion/lw/8qb/cevinspired_models/).

The other problem is that you haven't actually solved many of the hard problems. Suppose the AI decides to kill everyone, then replay, in an endless loop, the one upload it has, having a marvellous experience. Why would it not do that? We want the AI to correctly balance our higher order preferences (not being reduced to a single mindless experience) with our lower order preferences (being happy). But that desire is itself a higher order preference - it won't happen unless the AI already decides that higher order preferences trump lower ones.

And that was one example I just thought of. It's not hard to come up with cases where the AI does something stupid in this model (eg: replaces everyone with chatterbots that describe their ever increasing happiness and fulfilment) - something that is compatible with the original model but clearly stupid; clearly stupid to our own judgement, though, not to the AI's.

You may object that these problems won't happen - but you can't be confident of this, as you haven't defined your solution formally, and are relying on common sense to reject those pathological solutions. But nowhere have you assumed the AI has common sense, or how it will use it. The more details you put in your model, I think, the more the problems will become apparent.

Replies from: Squark
comment by Squark · 2014-03-31T11:37:12.089Z · LW(p) · GW(p)

Thank you for the thoughtful reply!

Deducing the correct utility of a utility maximiser is one thing (which has a low level of uncertainty, higher if the agent is hiding stuff).

In the white-box approach it can't really hide. But I guess it's rather tangential to the discussion.

Assigning a utility to an agent that doesn't have one is quite another... Humans don't follow anything like a utility function, which is a first problem, so you're asking the AI to construct something that isn't there.

What do you mean by "follow a utility function"? Why do you think humans don't do it? If it isn't there, what does it mean to have a correct solution to the FAI problem?

The robot is a behavior-executor, not a utility-maximizer.

The main problem with Yvain's thesis is in the paragraph:

Again, give the robot human level intelligence. Teach it exactly what a hologram projector is and how it works. Now what happens? Exactly the same thing - the robot executes its code, which says to scan the room until its camera registers blue, then shoot its laser.

What does Yvain mean by "give the robot human level intelligence"? If the robot's code remained the same, in what sense does it have human level intelligence?

Then you have to knit this together into a humanity utility function, which is very non trivial.

This is the part of the CEV proposal which always seemed redundant to me. Why should we do it? If you're designing the AI, why wouldn't you use your own utility function? At worst, an average utility function of the group of AI designers? Why do we want / need the whole humanity there? Btw, I would obviously prefer my utility function in the AI but I'm perfectly willing to settle on e.g. Yudkowsky's.

Suppose the AI decides to kill everyone, then replay, in an endless loop, the one upload it has, having a marvellous experience... the AI does something stupid in this model (eg: replaces everyone with chatterbots that describe their ever increasing happiness and fulfilment)...

It seems that you're identifying my proposal with something like "maximize pleasure". The latter is a notoriously bad idea, as was discussed endlessly. However, my proposal is completely different. The AI wouldn't do something the upload wouldn't do because such an action is opposed to the upload's utility function.

You may object that these problems won't happen - but you can't be confident of this, as you haven't defined your solution formally...

Actually, I'm not far from it (at least I don't think I'm further than CEV). Note that I have already defined formally I(A, U) where I=intelligence, A=agent, U=utility function. Now we can do something like "U(A) is defined to be U s.t. the probability that I(A, U) > I(R, U) for random agent R is maximal". Maybe it's more correct to use something like a thermal ensemble with I(A, U) playing the role of energy: I don't know, I don't claim to have solved it all already. I just think it's a good research direction.
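Read literally, that definition could be transcribed as something like the following (one possible formalization; \mu is an assumed reference distribution over random agents R, and the second line is the thermal-ensemble variant mentioned above):

```latex
% One possible transcription of "U(A) is the U for which A most outperforms a random agent":
U(A) \;:=\; \operatorname*{arg\,max}_{U} \; \Pr_{R \sim \mu}\bigl[\, I(A,U) > I(R,U) \,\bigr]

% Thermal-ensemble variant, with I(A,U) playing the role of (negative) energy:
\Pr(U \mid A) \;\propto\; \exp\bigl( \beta \, I(A,U) \bigr)
```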

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-31T12:39:20.132Z · LW(p) · GW(p)

What do you mean by "follow a utility function"? Why do you think humans don't do it?

Humans are neither independent nor transitive. Human preferences change over time, depending on arbitrary factors, including how choices are framed. Humans suffer because of things they cannot affect, and humans suffer because of details of their probability assessment (eg ambiguity aversion). That bears repeating - humans have preference over their state of knowledge. The core of this is that "assessment of fact" and "values" are not disconnected in humans, not disconnected at all. Humans feel good when a team they support wins, without them contributing anything to the victory. They will accept false compliments, and can be flattered. Social pressure changes most values quite easily.

Need I go on?

If it isn't there, what does it mean to have a correct solution to the FAI problem?

A utility function which, if implemented by the AI, would result in a positive, fulfilling, worthwhile existence for humans. Even if humans had a utility, it's not clear that a ruling FAI should have the same one, incidentally. The utility is for the AI, and it aims to capture as much of human value as possible - it might just be the utility of a nanny AI (make reasonable efforts to keep humanity from developing dangerous AIs, going extinct, or regressing technologically, otherwise, let them be).

Replies from: Squark
comment by Squark · 2014-03-31T13:18:33.748Z · LW(p) · GW(p)

What do you mean by "follow a utility function"? Why do you think humans don't do it?

Humans are neither independent nor transitive...

You still haven't defined "follow a utility function". Humans are not ideal rational optimizers of their respective utility functions. It doesn't mean they don't have them. Deep Blue often plays moves which are not ideal; nevertheless, I think it's fair to say it optimizes winning. If you make intransitive choices, it doesn't mean your terminal values are intransitive. It means your choices are not optimal.

Human preferences change over time...

This is probably the case. However, the changes are slow, otherwise humans wouldn't behave coherently at all. The human utility function is only defined approximately, but the FAI problem only makes sense in the same approximation. In any case, if you're programming an AI you should equip it with the utility function you have at that moment.

...humans have preference over their state of knowledge...

Why do you think it is inconsistent with having a utility function?

...what does it mean to have a correct solution to the FAI problem?

A utility function which, if implemented by the AI, would result in a positive, fulfilling, worthwhile existence for humans.

How can you know that a given utility function has this property? How do you know the utility function I'm proposing doesn't have this property?

Even if humans had a utility, it's not clear that a ruling FAI should have the same one, incidentally.

Isn't it? Assume your utility function is U. Suppose you have the choice to create a superintelligence optimizing U or a superintelligence optimizing something other than U, let say V. Why would you choose V? Choosing U will obviously result in an enormous expected increase of U, which is what you want to happen, since you're a U-maximizing agent. Choosing V will almost certainly result in a lower expectation value of U: if the V-AI chooses strategy X that leads to higher expected U than the strategy that would be chosen by a U-AI then it's not clear why the U-AI wouldn't choose X.

Replies from: Stuart_Armstrong, Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-31T14:32:15.073Z · LW(p) · GW(p)

Humans are not ideal rational optimizers of their respective utility functions.

Then why claim that they have one? If humans have intransitive preferences (A>B>C>A), as I often do, then why claim that actually their preferences are secretly transitive but they fail to act on them properly? Nothing we know about the brain points to there being a hidden box with a pristine and pure utility function, that we then implement poorly.

comment by Stuart_Armstrong · 2014-03-31T14:33:11.787Z · LW(p) · GW(p)

...humans have preference over their state of knowledge...

Why do you think it is inconsistent with having a utility function?

They have preferences like ambiguity aversion, eg being willing to pay to find out, during a holiday, whether they were accepted for a job, while knowing that they can't make any relevant decisions with that early knowledge. This is not compatible with following a standard utility function.

Replies from: Squark
comment by Squark · 2014-03-31T17:41:33.708Z · LW(p) · GW(p)

They have preferences like ambiguity aversion, eg being willing to pay to find out, during a holiday, whether they were accepted for a job, while knowing that they can't make any relevant decisions with that early knowledge. This is not compatible with following a standard utility function.

I don't know what you mean by "standard" utility function. I don't even know what you mean by "following". People want to find out because uncertainty makes them nervous, being nervous is unpleasant, and pleasure is a terminal value. It is entirely consistent with having a utility function, and with my formalism in particular.

Humans are not ideal rational optimizers of their respective utility functions.

Then why claim that they have one? If humans have intransitive preferences (A>B>C>A), as I often do, then why claim that actually their preferences are secretly transitive but they fail to act on them properly?

In what epistemology are you asking this question? That is, what is the criterion according to which the validity of answer would be determined?

If you don't think human preferences are "secretly transitive", then why do you suggest the following:

Whenever revealed preferences are non-transitive or non-independent, use the person's stated meta-preferences to remove the issue. The AI thus calculates what the person would say if asked to resolve the transitivity or independence (for people who don't know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them).

What is the meaning of asking a person to resolve intransitivities if there are no transitive preferences underneath?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-31T19:13:34.876Z · LW(p) · GW(p)

I don't even know what you mean by "following".

That is, what is the criterion according to which the validity of answer would be determined?

Those are questions for you, not for me. You're claiming that humans have a hidden utility function. What do you mean by that, and what evidence do you have for your position?

Replies from: Squark
comment by Squark · 2014-03-31T19:52:16.953Z · LW(p) · GW(p)

I'm claiming that it is possible to define the utility function of any agent. For unintelligent "agents" the result is probably unstable. For intelligent agents the result should be stable.

The evidence is that I have a formalism which produces this definition in a way compatible with intuition about "agent having a utility function". I cannot present evidence which doesn't rely on intuition since that would require having another more fundamental definition of "agent having a utility function" (which AFAIK might not exist). I do not consider this to be a problem since all reasoning falls back to intuition if you ask "why" sufficiently many times.

I don't see any meaningful definition of intelligence or instrumental rationality without a utility function. If we accept humans are (approximately) rational / intelligent, they must (in the same approximation) have utility functions.

It also seems to me (again, intuitively) that the very concept of "preference" is incompatible with e.g. intransitivity. In the approximation it makes sense to speak of "preferences" at all, it makes sense to speak of preferences compatible with the VNM axioms ergo utility function. Same goes for the concept of "should". If it makes sense to say one "should" do something (for example build a FAI), there must be a utility function according to which she should do it.

Bottom line, eventually it all hits philosophical assumptions which have no further formal justification. However, this is true of all reasoning. IMO the only valid method to disprove such assumptions is either by reductio ad absurdum or by presenting a different set of assumptions which is better in some sense. If you have such an alternative set of assumption for this case or a wholly different way to resolve philosophical questions, I would be very interested to know.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-01T12:21:04.221Z · LW(p) · GW(p)

I'm claiming that it is possible to define the utility function of any agent.

It is trivially possible to do that. Since no choice is strictly identical, you just add enough details to make each choice unique, and then choose a utility function that will always reach that choice ("subject has a strong preference for putting his left foot forwards when seeing an advertisement for deodorant on Tuesday morning that are the birthdays of prominent Dutch politicians").

A good simple model of human behaviour is that of different modules expressing preferences and short-circuiting the decision making in some circumstances, and a more rational system ("system 2") occasionally intervening to prevent loss through money pumps. So people are transitive in their ultimate decisions, often and to some extent, but their actual decisions depend strongly on which choices are presented first (ie their low level preferences are intransitive, but the rational part of them prevents loops). Would you say these beings have no preferences?

Replies from: Squark
comment by Squark · 2014-04-01T13:24:07.163Z · LW(p) · GW(p)

I'm claiming that it is possible to define the utility function of any agent.

It is trivially possible to do that. Since no choice is strictly identical, you just add enough details to make each choice unique, and then choose a utility function that will always reach that choice

My formalism doesn't work like that since the utility function is a function over possible universes, not over possible choices. There is no trivial way to construct a utility function wrt which the given agent's intelligence is close to maximal. However it still might be the case we need to give larger weight to simple utility functions (otherwise we're left with selecting a maximum in an infinite set and it's not clear why it exists). As I said, I don't have the final formula.

A good simple model of human behaviour is that of different modules expressing preferences and short-circuiting the decision making in some circumstances, and a more rational system ("system 2") occasionally intervening to prevent loss through money pumps. So people are transitive in their ultimate decisions, often and to some extent, but their actual decisions depend strongly on which choices are presented first (ie their low level preferences are intransitive, but the rational part of them prevents loops). Would you say these beings have no preferences?

I'd say they have a utility function. Imagine a chess AI that selects moves by one of two strategies. The first strategy ("system 1") uses simple heuristics like "check when you can" that produce an answer quickly and save precious time. The second strategy ("system 2") runs a minimax algorithm with a 10-move deep search tree. Are all of the agent's decisions perfectly rational? No. Does it have a utility function? Yes: winning the game.
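A toy illustration of that two-strategy design, in which a cheap heuristic and a full minimax search serve the same objective (here a trivial take-the-last-stone game rather than chess; all of it is illustrative):

```python
# "System 1" answers quickly from a heuristic; "system 2" searches the game
# tree; both optimize the same utility: winning the game.

class Nim:
    """Players alternately remove 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones):
        self.stones = stones

    def legal_moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def play(self, move):
        return Nim(self.stones - move)


def system1(game):
    """Fast heuristic: if we can take the last stones right now, do it."""
    return game.stones if 1 <= game.stones <= 3 else None


def system2(game, our_turn=True):
    """Slow exhaustive minimax; utility is +1 for a win, -1 for a loss."""
    if game.stones == 0:
        # The previous mover took the last stone; if it is now "our" turn, we lost.
        return (-1 if our_turn else 1), None
    best_value, best_move = (-2, None) if our_turn else (2, None)
    for move in game.legal_moves():
        value, _ = system2(game.play(move), not our_turn)
        if (our_turn and value > best_value) or (not our_turn and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move


def choose_move(game):
    """Not every decision is computed the same way, but both systems serve one utility."""
    quick = system1(game)
    if quick is not None:
        return quick
    _, move = system2(game)
    return move


print(choose_move(Nim(10)))  # optimal play: take 2, leaving a multiple of 4
```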

comment by pengvado · 2014-03-27T00:10:21.583Z · LW(p) · GW(p)

There are many such operators, and different ones give different answers when presented with the same agent. Only a human utility function distinguishes the right way of interpreting a human mind as having a utility function from all of the wrong ways of interpreting a human mind as having a utility function. So you need to get a bunch of Friendliness Theory right before you can bootstrap.

Replies from: Squark
comment by Squark · 2014-03-27T19:02:20.011Z · LW(p) · GW(p)

Why do you think there are many such operators? Do you believe the concept of "utility function of an agent" is ill-defined (assuming the "agent" is actually an intelligent agent rather than e.g. a rock)? Do you think it is possible to interpret a paperclip maximizer as having a utility function other than maximizing paperclips?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-03-31T11:09:41.342Z · LW(p) · GW(p)

Deducing the correct utility of a utility maximiser is one thing (which has a low level of uncertainty, higher if the agent is hiding stuff). Assigning a utility to an agent that doesn't have one is quite another.

See http://lesswrong.com/lw/6ha/the_blueminimizing_robot/ Key quote:

The robot is a behavior-executor, not a utility-maximizer.

Replies from: Squark
comment by Squark · 2014-03-31T11:58:18.918Z · LW(p) · GW(p)

Replied in the other thread.

comment by [deleted] · 2014-03-19T16:03:51.893Z · LW(p) · GW(p)

After refining my thoughts, I think I see the problem:

1: The Banner AI must ban all transmissions of naughty Material X.

1a: Presumably, the Banner must also ban all transmissions of encrypted naughty Material X.

2: The people the Banner AI is trying to ban from sending naughty transmissions have an entire field of thought (knowledge of human values) the AI is not allowed to take into account: It is secret.

3: Presumably, the Banner AI has to allow some transmissions. It can't just shut down all communications.

Edit: 4: The Banner AI needs a perfect success rate. High numbers like 97.25% recognition are not sufficient. I was previously presuming this without stating it, hence my edit.

I think fulfilling all of these criteria with sufficiently clever people is impossible or nearly so. If it is possible, it strikes me as highly difficult, unless I'm making a fundamental error in my understanding of encryption theory.

comment by Error · 2014-03-17T18:57:20.760Z · LW(p) · GW(p)

what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other

There's a heuristic I use to distinguish between the two that works fairly well: in erotica, the participants are the focus of the scene. In pornography, the camera (and by implication the viewer) is the true focus of the scene.

That being said, I have a suspicion that trying to define the difference explicitly is a wrong question. People seem to use a form of fuzzy logic[1] when thinking about the two. What we're really looking at is gradations; a better question might be "how far does something have to be along a pornographic axis before being banned, and what factors determine a position on that axis?"

[1]: Damn I hope I'm using that term right....

Replies from: Stuart_Armstrong, Gunnar_Zarncke
comment by Stuart_Armstrong · 2014-03-18T10:49:17.964Z · LW(p) · GW(p)

This seems like a very high level solution - I don't think "where is the real focus of the scene (in a very abstract sense)" is simpler than "is this pornography".

comment by Gunnar_Zarncke · 2014-03-17T20:40:59.105Z · LW(p) · GW(p)

Your heuristic is bound to be gamed. But that is a problem of any definition that isn't true to the underlying complex value function.

Replies from: Error
comment by Error · 2014-03-18T00:03:51.388Z · LW(p) · GW(p)

I agree. I wasn't suggesting it for serious, literal use; that's why I specified that it was a heuristic.

comment by Eugine_Nier · 2014-03-19T04:38:32.007Z · LW(p) · GW(p)

Well, there's Umberto Eco's famous essay on the subject. (The essay is not long so read the whole thing.)

One notable thing about his criterion is that it makes no reference to nudity; thus it's a horrendous predictor on the set of all possible movies - it just happens to work well on the subset of possible movies a human would actually want to watch.

comment by Eugine_Nier · 2014-03-18T02:53:10.165Z · LW(p) · GW(p)

Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.

I have no idea what distinction you're trying to draw here. And I say this as someone who opposes porn but is willing to permit artistic nudes. (I'm not sure what you mean by "erotica", but your example is something I'd probably classify as porn.) I say "probably" because it's theoretically possible to do it artistically, but given the state of the current art world the chances that such a performance would be done artistically are low.

comment by [deleted] · 2014-03-17T18:10:11.474Z · LW(p) · GW(p)

To ban anything, by law or less formally, is to use might in the service of right. For an AI to ban anything, it would need to have the ability to cause physical harm or death to humans. Just like we humans do to ourselves. All law is rooted in enfranchising a minority to commit violence and disenfranchising the majority to do the same. This is part of what it would mean for an AI to ban porn.

Replies from: polymathwannabe, itaibn0
comment by polymathwannabe · 2014-03-17T20:51:23.111Z · LW(p) · GW(p)

All law is rooted in enfranchising a minority to commit violence and disenfranchising the majority to do the same.

I guess you don't think any people can peacefully agree to rule itself?

Replies from: Viliam_Bur
comment by Viliam_Bur · 2014-03-18T12:30:11.760Z · LW(p) · GW(p)

They usually also peacefully agree on punishment for other people if those break the law. Even if the latter disagree about whether they should be punished. And if they try to resist the punishment, it usually results in more punishment.

The larger a group of people, the less likely it is that they would all agree on something. Ten people can agree on something; ten millions agreeing on something is extremely improbable.

Trevor's comment is irrelevant to the thought experiment (thus the downvotes), but it is technically correct.

comment by itaibn0 · 2014-03-17T18:19:59.712Z · LW(p) · GW(p)

The question is not how to get an AI to ban porn, it's how to get a pre-existing legal system, which presumably is able and willing to use coercive violence, to ban porn.