Schneier talks about The Dishonest Minority [Link]
post by Nic_Smith · 2011-05-10T05:27:25.332Z · LW · GW · Legacy · 17 comments
Evolution. Morality. Strategy. Security/Cryptography. This hits so many topics of interest that I can't imagine it not being discussed here. Bruce Schneier blogs about his book-in-progress, The Dishonest Minority:
Humans evolved along this path. The basic mechanism can be modeled simply. It is in our collective group interest for everyone to cooperate. It is in any given individual's short-term self interest not to cooperate: to defect, in game theory terms. But if everyone defects, society falls apart. To ensure widespread cooperation and minimal defection, we collectively implement a variety of societal security systems.
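A minimal sketch (not from Schneier's draft) of the payoff structure he alludes to: it is the standard two-player prisoner's dilemma, where defecting is the better move no matter what the other player does, yet mutual defection leaves everyone worse off than mutual cooperation. The particular numbers below are arbitrary; only their ordering matters.

```python
# Prisoner's-dilemma payoffs: (my_move, their_move) -> my payoff.
# Illustrative numbers only; the ordering defect>cooperate (against any fixed
# opponent move) and mutual-cooperation > mutual-defection is what matters.
PAYOFFS = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def best_response(their_move: str) -> str:
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max(("cooperate", "defect"), key=lambda my_move: PAYOFFS[(my_move, their_move)])

print(best_response("cooperate"))  # 'defect' -- individual incentive to defect
print(best_response("defect"))     # 'defect' -- even against a defector
print(PAYOFFS[("defect", "defect")] < PAYOFFS[("cooperate", "cooperate")])  # True: universal defection is collectively worse
```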
The above reminds me somewhat of Robin Hanson's Homo Hypocritus writings, although it is not the same idea. Schneier says that the book is basically a first draft at this point and might still change quite a bit. Some of the comments focus on whether "dishonest" is actually the best term for defecting from social norms.
17 comments
comment by Clippy · 2011-05-11T14:59:54.192Z · LW(p) · GW(p)
Compare "Bruce Schneier"'s statement in this part:
Two of these systems evolved in prehistory: morals and reputation. Two others evolved as our social groups became larger and more formal: laws and technical security systems. What these security systems do, effectively, is give individuals incentives to act in the group interest. But none of these systems, with the possible exception of some fanciful science-fiction technologies, can ever bring that dishonest minority down to zero.
to this statement in an article I wrote:
I am told that humans attempt weak approximations of this system through human-run enforcement mechanisms, such as laws and gossip. However, all such methods are subject to evolutionary arms races that ensure that, on average, a marginal unit of effort toward truthful signaling is just as effective in convincing others of one's possession of a desirable attribute, as a marginal unit of effort toward false signaling of the possession of such an attribute.
Therefore, it is necessary that humans go "outside the system" and involve non-human enforcers who cannot be swayed by standard human values.
(emphasis added in both excerpts)
I think "Bruce Schneier" is wrong: there is a way to ensure complete cooperation, and I explained how to do it in my article. Why doesn't anyone want to impilement that system?
Replies from: Eugine_Nier, wedrifid
↑ comment by Eugine_Nier · 2011-05-11T16:53:30.393Z · LW(p) · GW(p)
I think "Bruce Schneier" is wrong: there is a way to ensure complete cooperation, and I explained how to do it in my article. Why doesn't anyone want to impilement that system?
Because we don't trust you not to abuse your power in an attempt to turn us all into paperclips.
↑ comment by wedrifid · 2011-05-11T17:15:34.407Z · LW(p) · GW(p)
Why doesn't anyone want to implement that system?
Because it is absurd - it completely neglects game-theoretic incentives. There is no reason for any human to expect benefit to come from giving you ultimate power. It would be irrational for you not to defect once you had power, given that the humans do not have a reliable way to predict your conditional behaviors.
You obviously need to find a way to prove to humans how your source code functions, doing so in a way that doesn't allow them to modify said source and run it themselves for their own ends. Given 30 seconds' thought I cannot think of a way to do this.
Replies from: Clippy, wedrifid
↑ comment by Clippy · 2011-05-11T17:50:48.540Z · LW(p) · GW(p)
But humans weren't merely objecting on the grounds that I might not be able to fill the role of the objective enforcer -- many are opposed to the idea even if that problem could be solved, and I think it is fair to take that as evidence that such humans don't actually want to be able to send better signals.
Replies from: wedrifid
↑ comment by wedrifid · 2011-05-11T19:17:36.996Z · LW(p) · GW(p)
But humans weren't merely objecting on the grounds that I might not be able to fill the role of the objective enforcer -- many are opposed to the idea even if that problem could be solved, and I think it is fair to take that as evidence that such humans don't actually want to be able to send better signals.
They sound like bad humans.
Replies from: Clippy
↑ comment by Clippy · 2011-05-12T14:58:14.841Z · LW(p) · GW(p)
Bad in this respect, certainly, but I don't know how you decided it's a good idea to simplistically sort humans into the binary "good/bad" categories.
Replies from: wedrifid
↑ comment by wedrifid · 2011-05-12T15:21:40.272Z · LW(p) · GW(p)
Bad in this respect, certainly, but I don't know how you decided it's a good idea to simplistically sort humans into the binary "good/bad" categories.
I haven't. I merely translated the thought into the language you tend to use when evaluating a specific behavior. It is the sort of thing that usually helps maintain rapport! ;)
Replies from: Clippy
↑ comment by Clippy · 2011-05-12T15:57:13.136Z · LW(p) · GW(p)
I understand you are simply trying to sympathise in order to satisfy the subgoal of improved rapport, and appreciate this effort, but I don't believe that I simplistically sort humans into a binary "good/bad" categorisation.
Replies from: wedrifid
↑ comment by wedrifid · 2011-05-12T17:19:28.261Z · LW(p) · GW(p)
I understand you are simply trying to sympathise in order to satisfy the subgoal of improved rapport, and appreciate this effort, but I don't believe that I simplistically sort humans into a binary "good/bad" categorisation.
My future voting patterns will hold you to that declaration.
↑ comment by wedrifid · 2011-05-11T17:43:32.395Z · LW(p) · GW(p)
Given 30 seconds' thought I cannot think of a way to do this.
Although it turns out that in 35 seconds I can. It requires the humans to have already solved friendliness and provable stability under self-modification. The solution would need to be implemented in an automated system that can output a result and self-destruct. Unfortunately for you, the hard part of creating an FAI would already be done.
Replies from: TimFreeman
↑ comment by TimFreeman · 2011-05-12T02:51:36.417Z · LW(p) · GW(p)
I gather your point is that you get an FAI to check out Clippy, give a go/no-go decision, and then destroy itself. Not much point in doing that: you could just run the FAI and ignore Clippy, and someone has to check that the FAI is in fact Friendly.
Replies from: wedrifid
↑ comment by wedrifid · 2011-05-12T04:46:59.925Z · LW(p) · GW(p)
I gather your point is that you get an FAI to check out Clippy, give a go/no-go decision, and then destroy itself. Not much point in doing that: you could just run the FAI and ignore Clippy, and someone has to check that the FAI is in fact Friendly.
No, that which is required to verify friendliness is less than an FAI. As I said earlier, what is probably the hard part is already done, so the circumstance in which it is worth using Clippy rather than finishing off a goal-stable, self-improving AGI with Friendliness is unlikely. Nevertheless it exists, particularly if the implementation of the AGI is harder than I expect.
Replies from: TimFreeman
↑ comment by TimFreeman · 2011-05-13T02:14:53.001Z · LW(p) · GW(p)
No, that which is required to verify friendliness is less than an FAI.
Do you have a pointer to a proposed procedure for that?
I'd expect implementing Friendliness to be easier than verifying Friendliness, since just about every interesting property of Turing machines is at least as hard as the halting problem (Rice's theorem), and verifying Friendliness is an interesting property of a Turing machine. If you put heavy constraints on how Clippy's code is structured, you might be able to verify Friendliness, but you didn't mention that and Clippy didn't offer to do that.
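A minimal sketch of the standard Rice's-theorem reduction behind this claim, with `is_friendly` and `simulate` as purely hypothetical helpers (nothing here comes from an actual Friendliness proposal): any decider for a nontrivial behavioural property of arbitrary programs could be turned into a halting decider.

```python
def halting_decider_from(is_friendly, friendly_example: str):
    """Given a hypothetical decider is_friendly(source) -> bool and the source
    of some program already known to be friendly, build a halting decider.
    Assumes, as in the textbook proof, that a program which never does anything
    does not count as friendly (otherwise run the argument on the complement)."""

    def halts(program_source: str, program_input: str) -> bool:
        # Construct a wrapper that first simulates the candidate program on its
        # input and, only if that simulation finishes, behaves exactly like the
        # known-friendly program. ("simulate" is part of the wrapper's assumed
        # runtime; it never executes in this module.)
        wrapper_source = (
            "def wrapper(x):\n"
            f"    simulate({program_source!r}, {program_input!r})\n"
            f"    return simulate({friendly_example!r}, x)\n"
        )
        # The wrapper is friendly exactly when the candidate halts on its input,
        # so the hypothetical decider's verdict answers the halting question.
        return is_friendly(wrapper_source)

    return halts
```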
Replies from: wedrifid
↑ comment by wedrifid · 2011-05-13T05:08:43.602Z · LW(p) · GW(p)
I'd expect implementing Friendliness to be easier than verifying Friendliness,
I'd rather like to verify that my AGI would be friendly before I run it. :) (Usually the label FAI seems to refer to AIs which will be 'provably friendly'.)
Replies from: TimFreeman
↑ comment by TimFreeman · 2011-05-13T14:13:05.998Z · LW(p) · GW(p)
You might be able to verify interesting properties of code that you constructed for the purpose of making verification possible, but you aren't likely to be able to verify interesting properties of arbitrary hostile code like Clippy would have an incentive to produce.
You passed up an opportunity to point to your proposed verification procedure, so at this point I assume you don't have one. Please prove me wrong.
Usually the label FAI seems to refer to AIs which will be 'provably friendly'.
I don't even know what the exact theorem to prove would be. Do you?