Work on Security Instead of Friendliness?

post by Wei Dai (Wei_Dai) · 2012-07-21T18:28:44.692Z · LW · GW · Legacy · 107 comments

  1. The economics of security seems very unfavorable to the defense, in every field except cryptography.
  2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.
107 comments

So I submit the only useful questions we can ask are not about AGI, "goals", and other such anthropomorphic, infeasible, irrelevant, and/or hopelessly vague ideas. We can only usefully ask computer security questions. For example some researchers I know believe we can achieve virus-safe computing. If we can achieve security against malware as strong as we can achieve for symmetric key cryptography, then it doesn't matter how smart the software is or what goals it has: if one-way functions exist no computational entity, classical or quantum, can crack symmetric key crypto based on said functions. And if NP-hard public key crypto exists, similarly for public key crypto. These and other security issues, and in particular the security of property rights, are the only real issues here and the rest is BS.

-- Nick Szabo

Nick Szabo and I have very similar backgrounds and interests. We both majored in computer science at the University of Washington. We're both very interested in economics and security. We came up with similar ideas about digital money. So why don't I advocate working on security problems while ignoring AGI, goals and Friendliness?

In fact, I once did think that working on security was the best way to push the future towards a positive Singularity and away from a negative one. I started working on my Crypto++ Library shortly after reading Vernor Vinge's A Fire Upon the Deep. I believe it was the first general purpose open source cryptography library, and it's still one of the most popular. (Studying cryptography led me to become involved in the Cypherpunks community with its emphasis on privacy and freedom from government intrusion, but a major reason for me to become interested in cryptography in the first place was a desire to help increase security against future entities similar to the Blight described in Vinge's novel.)

I've since changed my mind, for two reasons.

1. The economics of security seems very unfavorable to the defense, in every field except cryptography.

Studying cryptography gave me hope that improving security could make a difference. But in every other security field, both physical and virtual, little progress is apparent, certainly not enough that humans might hope to defend their property rights against smarter intelligences. Achieving "security against malware as strong as we can achieve for symmetric key cryptography" seems quite hopeless in particular. Nick links above to a 2004 technical report titled "Polaris: Virus Safe Computing for Windows XP", which is strange considering that it's now 2012 and malware has little trouble with the latest operating systems and their defenses. Also striking to me has been the fact that even dedicated security software like OpenSSH and OpenSSL has had design and coding flaws that introduced security holes to the systems that run it.

One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.

2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.

What does it mean to have "secure property rights", anyway? If I build an impregnable fortress around me, but an Unfriendly AI causes me to give up my goals in favor of its own by crafting a philosophical argument that is extremely convincing to me but wrong (or more generally, subverts my motivational system in some way), have I retained my "property rights"? What if it does the same to one of my robot servants, so that it subtly starts serving the UFAI's interests while thinking it's still serving mine? How does one define whether a human or an AI has been "subverted" or is "secure", without reference to its "goals"? It became apparent to me that fully solving security is not very different from solving Friendliness.

I would be very interested to know what Nick (and others taking a similar position) thinks after reading the above, or if they've already had similar thoughts but still came to their current conclusions.

107 comments

comment by nickLW · 2012-07-21T22:09:52.087Z · LW(p) · GW(p)

I only have time for a short reply:

(1) I'd rephrase the above to say that computer security is among the two most important things one can study with regard to this alleged threat.

(2) The other important thing is law. Law is the "offensive approach to the problem of security" in the sense I suspect you mean it (unless you mean something more like the military). Law is very highly evolved, the work of millions of people as smart as or smarter than Yudkowsky over more than a millennium, and tested empirically against the real world of real agents with a real diversity of values every day. It's not something you can ever come close to competing with by a philosophy invented from scratch.

(3) I stand by my comment that "AGI" and "friendliness" are hopelessly anthropomorphic, infeasible, and/or vague.

(4) Computer "goals" are only usefully studied against actual algorithms, or clearly defined mathematical classes of algorithms, not vague and imaginary concepts. Perhaps you can make some progress by, for example, advancing the study of postconditions, which seem to be the closest analog to goals in the software engineering world. One can imagine a world where postconditions are always checked, for example, and other software ignores the output of software that has violated one of its postconditions.
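
A minimal sketch of the postcondition idea in point (4), in Python. Every name here (`Checked`, `with_postcondition`, `consume`, the two sort functions) is a hypothetical illustration, not an existing library and not something proposed in the comment itself. It assumes a component's postcondition can be written as a predicate over its result and inputs, and that downstream code agrees to look only at results that passed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Checked:
    """A component's output plus whether its postcondition held."""
    value: Any
    ok: bool


def with_postcondition(post: Callable[..., bool]):
    """Wrap fn so every result is checked with post(result, *args)."""
    def decorate(fn):
        def wrapper(*args):
            result = fn(*args)
            if post(result, *args):
                return Checked(result, True)
            # Postcondition violated: withhold the value from consumers.
            return Checked(None, False)
        return wrapper
    return decorate


def is_sorted_permutation(result, xs) -> bool:
    """Postcondition for sorting: nondecreasing and same multiset as the input."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(result) == Counter(xs)


@with_postcondition(is_sorted_permutation)
def good_sort(xs):
    return sorted(xs)


@with_postcondition(is_sorted_permutation)
def buggy_sort(xs):
    return sorted(set(xs))  # bug: silently drops duplicates


def consume(checked: Checked):
    """Downstream software: ignore output whose postcondition failed."""
    return checked.value if checked.ok else None
```

With the input `[3, 1, 2, 1]`, `consume(good_sort(...))` passes the list through, while `consume(buggy_sort(...))` yields `None` because the duplicate was lost. The check only catches violations of the property that was actually written down, which is the limitation the surrounding discussion of "goals" turns on.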

Replies from: TimS, CarlShulman, Wei_Dai

↑ comment by TimS · 2012-07-22T02:59:37.965Z · LW(p) · GW(p)

The other important thing is law. Law is the "offensive approach to the problem of security" in the sense I suspect you mean it (unless you mean something more like the military). Law is very highly evolved, the work of millions of people as smart as or smarter than Yudkowsky over more than a millennium, and tested empirically against the real world of real agents with a real diversity of values every day. It's not something you can ever come close to competing with by a philosophy invented from scratch.

As a lawyer, I strongly suspect this statement is false. As you seem to be using the term, Law is society's organizational rules about how and when to implement coercive violence. In the abstract, this is powerful, but concretely, this power is implemented by individuals. Some of them (e.g., police officers) care relatively little about the abstract issues; in other words, they aren't careful about the issues that are relevant to AI.

Further, law is filled with backdoors - they are called legislators. In the United States, Congress can make almost any judicially announced rule irrelevant by passing a statute. If you call that process "Law," then you aren't talking about the institution that draws on "the work of millions of smart people" over time.

Finally, individual lawyers' day-to-day work has almost no relationship to the parts of Law that you are suggesting are relevant to AI. Worse for your point, lawyers don't even engage with the policy issues of law with any frequency. For example, a lawyer litigating contracts might never engage with what promises should be enforced in her entire career.

In short, your paragraph about law is misdirected and misleading.

↑ comment by CarlShulman · 2012-07-22T02:35:49.942Z · LW(p) · GW(p)

Law is very highly evolved, the work of millions of people as smart as or smarter than Yudkowsky over more than a millennium,

That seems pretty harsh! The Bureau of Labor Statistics reports 728,000 lawyers in the U.S., a notably attorney-heavy society within the developed world. The SMPY study of kids with 1 in 10,000 cognitive test scores found (see page 722) only a small minority studying law. The 90th percentile IQ for "legal occupations" in this chart is a little over 130. Historically populations were much lower, nutrition was worse, legal education or authority was only available to a small minority, and the Flynn Effect had not occurred. Not to mention that law is disproportionately made by politicians who are selected for charisma and other factors in addition to intelligence.

and tested empirically against the real world of real agents with a real diversity of values every day. It's not something you can ever come close to competing with by a philosophy invented from scratch.

It's hard to know what to make of this.

Perhaps that the legal system is good at creating incentives that closely align the interests of those it governs with the social good, and that this will work on new types of being without much dependence on their decisionmaking processes?

Contracts and basic property rights certainly do seem to help produce wealth. On the other hand, financial regulation is regularly adjusted to try to nullify new innovation by financiers that poses systemic risks or exploits government guarantees, but the financial industry still frequently outmaneuvers the legal system. And of course the legal system depends on the loyalty of the security forces for enforcement, and makes use of ideological agreement among the citizenry that various things are right or wrong.

Restraining those who are much weaker is easier than restraining those who are strong. A more powerful analogy would be civilian control over military and security forces. There do seem to have been big advances in civilian control over the military in the developed countries (fewer coups, etc), but they seem to reflect changes in ideology and technology more than law.

If it is easy to enforce laws on new AGI systems, then the situation seems fairly tractable, even for AGI systems with across-the-board superhuman performance which take action based on alien and inhumane cost functions. But it doesn't seem guaranteed that it will be easy to enforce such laws on smart AGIs, or that the trajectory of development will be "all narrow AI, all the time," given the great economic value of human generality.

Replies from: private_messaging, nickLW

↑ comment by private_messaging · 2012-07-22T16:06:59.277Z · LW(p) · GW(p)

That seems pretty harsh!

There's a 0.0001 prior for a 1 in 10000 intelligence level. It's a low prior; you need a genius detector with an incredibly low false positive rate before most of your 'geniuses' are actually smart. A very well defined problem with a very clear 'solved' condition (such as multiple novel mathematical proofs, or a novel algorithmic solution to a hard problem that others have tried to solve) would maybe suffice, but 'he seems smart' certainly would not. This also goes for IQ tests themselves: while a genius would have a high IQ score, a person with a high IQ score would most likely be someone somewhat smart slipping through the crack between what the IQ test measures and what intelligence is (case examples: Chris Langan, Keith Raniere, or other high-IQ 'geniuses' we would never suspect of being particularly smart if not for IQ tests).

Weak and/or subjective evidence of intelligence, especially given lack of statistical independence of evidence, should not get your estimate of intelligence of anyone very high.
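
The arithmetic behind the "genius detector" point can be sketched with Bayes' theorem. This is an illustration under stated assumptions, not a calculation from the comment: the detector is assumed never to miss a real genius (true positive rate 1), the 1% false positive rate is a made-up example, and the function names are hypothetical.

```python
from fractions import Fraction


def posterior(prior, true_positive_rate, false_positive_rate):
    """P(genius | detector fires), by Bayes' theorem."""
    hit = prior * true_positive_rate
    false_alarm = (1 - prior) * false_positive_rate
    return hit / (hit + false_alarm)


def max_false_positive_rate(prior, true_positive_rate, target):
    """Largest false positive rate for which the posterior still reaches target."""
    return prior * true_positive_rate * (1 - target) / (target * (1 - prior))


PRIOR = Fraction(1, 10_000)  # the 0.0001 prior from the comment

# Hypothetical detector: never misses a genius, wrongly fires on 1% of others.
sloppy = posterior(PRIOR, Fraction(1), Fraction(1, 100))

# False positive rate needed before half of the flagged people are geniuses.
needed = max_false_positive_rate(PRIOR, Fraction(1), Fraction(1, 2))
```

Under these assumptions `sloppy` is 100/10099, so fewer than 1% of the people flagged are at the 1 in 10 000 level, and `needed` is 1/9999: the false positive rate has to fall to about the same order as the prior before most flagged people are genuine.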

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-22T21:33:10.882Z · LW(p) · GW(p)

This is rather tangential, but I'm curious, out of those who score 1 in 10000 on a standard IQ test, what percentage is actually at least, say, 1 in 5000 in actual intelligence? Do you have a citation or personal estimate?

Replies from: David_Gerard, private_messaging

↑ comment by David_Gerard · 2012-07-22T23:59:03.308Z · LW(p) · GW(p)

Depends what you call "actual intelligence" as distinct from what IQ tests measure. private_messaging talks a lot in terms of observable real-world achievements, so presumably is thinking of something along those lines.

Replies from: evand

↑ comment by evand · 2012-07-23T00:42:27.130Z · LW(p) · GW(p)

The easiest interpretation to measure would be a regression toward the mean effect. Putting a lower bound on the IQ scores in your sample means that you have a relevant fraction of people who tested higher than their average test score. I suspect that at the high end, IQ tests have few enough questions scored incorrectly that noise can let some < 1 in 5000 IQ test takers into your 1 in 10000 cutoff.
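
One way to put a number on this regression effect is Kelley's formula from classical test theory, in which a person's expected long-run average score is pulled toward the mean in proportion to the test's reliability. The sketch below is only an illustration: the reliability of 0.9 is an assumed value, scores are assumed normally distributed, and the rarity is read off the same scale as a single test score.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rarity_to_z(one_in: float) -> float:
    """z-score exceeded by 1 person in `one_in` under a normal distribution."""
    return STANDARD_NORMAL.inv_cdf(1 - 1 / one_in)


def z_to_rarity(z: float) -> float:
    """Inverse of rarity_to_z: N such that 1 in N people score above z."""
    return 1 / (1 - STANDARD_NORMAL.cdf(z))


def expected_true_z(observed_z: float, reliability: float) -> float:
    """Kelley's formula: the expected true score regresses toward the mean."""
    return reliability * observed_z


RELIABILITY = 0.9  # hypothetical test reliability, not a measured value

cutoff_z = rarity_to_z(10_000)                       # score at the 1 in 10000 cutoff
true_z = expected_true_z(cutoff_z, RELIABILITY)      # expected average score
true_rarity = z_to_rarity(true_z)                    # rarity of that average score
```

With these assumed inputs, someone who scores exactly at the 1 in 10000 cutoff has an expected average score that is only a few-thousand-to-one rarity on the same scale, which is the sense in which noise can let test takers below 1 in 5000 into the sample.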

Replies from: David_Gerard

↑ comment by David_Gerard · 2012-07-23T07:30:28.132Z · LW(p) · GW(p)

I also didn't note the other problem: 1 in 10,000 is around IQ=155; the ceiling of most standardized (validated and normed) intelligence tests is around 1 in 1000 (IQ~=149). Tests above this tend to be constructed by people who consider themselves in this range, to see who can join their high IQ society and not substantially for any other purpose.

↑ comment by private_messaging · 2012-07-23T06:36:31.180Z · LW(p) · GW(p)

It would depend on how you evaluate actual intelligence. An IQ test, at the high range, measures reliability in solving simple problems (combined with, maybe, similarity of environmental exposure to the test maker's when it comes to progressive matrices and other 'continue the sequence' cases; the predictions of Solomonoff induction depend on the machine and prior exposure, too). As an extreme example, consider an intelligence test made of very many very simple and straightforward logical questions. It will correlate with IQ, but at the high range it will clearly measure something different from intelligence. All the intelligent individuals will score highly on that test, but so will a lot of people who are simply very good at simple questions.

A thought experiment: picture a classroom of mind uploads, set the procedural skills of half of them to read-only, and teach them an algebra class. Same IQ, utterly different outcome.

I would expect that if actual intelligence correlates with IQ with a coefficient of 0.9 (a VERY generous assumption), IQ could easily become non-predictive at as low as the 99th percentile without creating any contradiction with the observed general correlation. edit: that would make about one out of 50 people with an IQ of one in 10 000 (or one in 1000 or 1 in 1000 0000 for that matter) be intelligent at the level of 1 in 5 000. That seems kind of low, but then, we mostly don't hear of high-IQ people for IQ alone. edit: and high-IQ organizations like Mensa are hopelessly unremarkable, rather than some ultra powerful groups of super-intelligences.

In any case the point is that the higher is the percentile the more confident you must be that you have no common failure mode between parts of your test.

edit: and for the record my IQ is 148 as measured on a (crappy) test in English which is not my native tongue. I also got very high percentile ratings in programming contest, and I used to be good at chess. I have no need to rationalize something here. I feel that a lot of this sheepish innumerate assumption that you can infer one in 10 000 level performance from a test in absence of failure modes of which you are far less certain than 99.99% , comes simply from signalling - to argue against applicability of IQ test in the implausibly high percentiles lets idiots claim that you must be stupid. When you want to select one in 10 000 level of performance in running 100 meters you can't do it by measuring performance at a standing jump.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T07:20:43.709Z · LW(p) · GW(p)

There are longitudinal studies showing that people with 99.99th percentile performance on cognitive tests have substantially better performance (on patents, income, tenure at top universities) than those at the 99.9th or 99th percentiles. More here.

and the high IQ organizations like Mensa and the like are hopelessly unremarkable, rather than some ultra powerful groups of super-intelligences.

Mensa is less selective than elite colleges or workplaces for intelligence, and much less selective for other things like conscientiousness, height, social ability, family wealth, etc. Far more very high IQ people are in top academic departments, Wall Street, and Silicon Valley than in high-IQ societies more selective than Mensa. So high-IQ societies are a very unrepresentative sample, selected to be less awesome in non-IQ dimensions.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-23T08:50:45.291Z · LW(p) · GW(p)

There are longitudinal studies showing that people with 99.99th percentile performance on cognitive tests

Uses other tests than IQ test, right? I do not dispute that a cognitive test can be made which would have the required reliability for detecting the 99.99th percentile. The IQ tests, however, are full of 'continue a short sequence' tests that are quite dubious even in principle. It is fundamentally difficult to measure up into 99.99th percentile, you need a highly reliable measurement apparatus, carefully constructed in precisely the way in which IQ tests are not. Extreme rarities like one in 10 000 should not be thrown around lightly.

Mensa is less selective than elite colleges or workplaces for intelligence

There are other societies. They all are not very selective for intelligence either, though, because they all rely on dubious tests.

and much less selective for other things like conscientiousness, height, social ability, family wealth, etc.

I would say that this makes those other places be an unrepresentative sample of the "high IQ" individuals. Even if those individuals who pass highly selective requirements on something else rarely enter mensa, they are rare (tautology on highly selective) and their relative under representation in mensa doesn't sway mensa's averages.

edit: for example consider the Nobel Prize winners. They all have high IQs but there is considerable spread and the IQ doesn't seem to correlate well with the estimate of "how many others worked on this and did not succeed".

Note: I am using "IQ" in the narrow sense of "what IQ tests measure", not as shorthand for intelligence. Intelligence has a capacity-to-learn component which IQ tests do not measure, but which tests of mathematical aptitude (with hard problems) or verbal aptitude do.

note2: I do not believe that the correlation entirely disappears even for IQ tests past 99th percentile. My argument is that for the typical IQ tests it well could. It's just that the further you get up the smaller fraction of the excellence is actually being measured.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T09:13:33.611Z · LW(p) · GW(p)

Uses other tests than IQ test, right?

Administering SATs to younger children, to raise the ceiling.

I would say that this makes those other places be an unrepresentative sample of the "high IQ" individuals.

Well Mensa is ~0 selectivity beyond the IQ threshold, and is a substitute good for other social networks, leaving it with the dregs. "Much more" is poor phrasing here, they're not rejecting 90%. If you look at the linked papers you'll see that a good majority of those at the 1 in 10,000 level on those childhood tests wind up with elite university/alumni or professional networks with better than Mensa IQ distributions.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-23T09:38:31.173Z · LW(p) · GW(p)

Administering SATs to younger children, to raise the ceiling.

Ghmmm. I'm sure this measures plenty of highly useful personal qualities that correlate with income. E.g. rate of learning. Or inclination to pursue intellectual work.

Well Mensa is ~0 selectivity beyond the IQ threshold, and is a substitute good for other social networks, leaving it with the dregs. "Much more" is poor phrasing here, they're not rejecting 90%. If you look at the linked papers you'll see that a good majority of those at the 1 in 10,000 level on those childhood tests wind up with elite university/alumni or professional networks with better than Mensa IQ distributions

Well, yes. I think we agree on all substantial points here but disagree on interpretation of my post. I referred specifically to "IQ tests" not to SAT, as lacking the rigour required for establishing 1 in 10 000 performance with any confidence, to balance on my point that e.g. 'that guy seems smart' shouldn't possibly result in estimate of 1 in 10 000 , and neither could anything that relies on rather subjective estimate of the difficulty of the accomplishments in the settings where you can't e.g. reliably estimate from number of other people who try and don't succeed.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T10:09:05.022Z · LW(p) · GW(p)

I referred specifically to "IQ tests" not to SAT, as lacking the rigour required for establishing 1 in 10 000 performance with any confidence, to balance on my point that e.g. 'that guy seems smart' shouldn't possibly result in estimate of 1 in 10 000

Note that these studies use the same tests (childhood SAT) that Eliezer excelled on (quite a lot higher than the 1 in 10,000 level), and that I was taking into account in my estimation.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-23T13:33:49.535Z · LW(p) · GW(p)

Sources?

Also,

a: while that'd be fairly impressive, keep in mind that if it is quite a lot higher than 1 in 10 000 then my prior for it is quite a lot lower than 0.0001 with only minor updates up for 'seeming clever' , and my prior for someone being a psychopath/liar is 0.01, with updates up for talking other people into giving you money.

b: not having something else likewise concrete to show off (e.g. contest results of some kind and the like) will at most make me up-estimate him into the same bin as someone like Keith Raniere or Chris Langan (they did well on the SAT too), which is already the bin that he's significantly in. Especially as he had been interested in programming, and the programming is the area where you can literally make a LOT of money in just a couple years while gaining the experience and gaining much better cred than childhood SAT. But also an area that heavily tasks general ability to think right and deal with huge amounts of learned information. My impression is that he's a spoiled 'math prodigy' who didn't really study anything beyond fairly elementary math, and my impression is that it's his own impression except he thinks he can do advanced math with little effort using some intuition while I'm pretty damn skeptical of such stuff unless well tested.

Replies from: CarlShulman, David_Gerard, None

↑ comment by CarlShulman · 2012-07-23T20:15:13.629Z · LW(p) · GW(p)

and the programming is the area where you can literally make a LOT of money in just a couple years while gaining the experience and gaining much better cred than childhood SAT

I don't think the childhood SAT gives that much "cred" for real-world efficacy, and I don't conflate intelligence with "everything good a person can be." Obviously, Eliezer is below average in the combination of conscientiousness, conformity, and so forth that causes most smart people to do more schooling. So I would expect lower performance on any given task than from a typical person of his level of intelligence. But it's not that surprising that he would, say, continue popular blogging with significant influence on a sizable audience, rather than stop that (which he values for its effects) to work as a Google engineer to sock away a typical salary, or to do a software startup (which the stats show is pretty uncertain even for those with VC backing and previous successful startups).

'math prodigy' who didn't really study anything beyond fairly elementary math, and my impression is that it's his own impression

I agree on not having deep math knowledge, and this being reason to be skeptical of making very unusual progress in AI or FAI. However, while his math scores were high, "math prodigy" isn't quite right, since his verbal scores were even higher. There are real differences in what you expect to happen depending on the "top skill." In the SMPY data such people often take up professions like science (or science fiction) writer (or philosopher) that use the verbal skills too, even when they have higher raw math performance than others who go on to become hard science professors. It's pretty mundane when such a person leans towards being a blogger rather than an engineer, especially when they are doing pretty well as the former. Eliezer has said that if not worried about x-risk he would want to become a science fiction writer, as opposed to a scientist.

↑ comment by David_Gerard · 2012-07-23T13:43:42.938Z · LW(p) · GW(p)

Hey, Raniere was smart enough to get his own cult going.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-23T13:59:56.018Z · LW(p) · GW(p)

Or old enough and disillusioned enough not to fight the cultist's desire to admire someone.

↑ comment by [deleted] · 2012-07-23T14:57:09.326Z · LW(p) · GW(p)

Especially as he had been interested in programming, and the programming is the area where you can literally make a LOT of money in just a couple years while gaining the experience and gaining much better cred than childhood SAT.

What salary level is good enough evidence for you to consider someone clever?

Notice that your criteria for impressive cleverness exclude practically every graduate student -- the vast majority make next to nothing, have few "concrete" things to show off, etc.

My impression is that he's a spoiled 'math prodigy' who didn't really study anything beyond fairly elementary math, and my impression is that it's his own impression except he thinks he can do advanced math with little effort using some intuition while I'm pretty damn skeptical of such stuff unless well tested.

Except the interview you quoted says none of that.

JB: I can think of lots of big questions at this point, and I’ll try to get to some of those, but first I can’t resist asking: why do you want to study math?

EY: A sense of inadequacy.

[...]

[EY:] Even so, I was a spoiled math prodigy as a child—one who was merely amazingly good at math for someone his age, instead of competing with other math prodigies and training to beat them. My sometime coworker Marcello (he works with me over the summer and attends Stanford at other times) is a non-spoiled math prodigy who trained to compete in math competitions and I have literally seen him prove a result in 30 seconds that I failed to prove in an hour.

This is substantially different from EY currently being a math prodigy.

[EY:] I’ve come to accept that to some extent [Marcello and I] have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong.

In other words, he's no better than random chance, which is vastly different from "[thinking] he can do advanced math with little effort using some intuition." By the same logic, you'd accept P=NP trivially.

Replies from: CarlShulman, wedrifid, private_messaging

↑ comment by CarlShulman · 2012-07-23T20:18:32.192Z · LW(p) · GW(p)

[EY:] I’ve come to accept that to some extent [Marcello and I] have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong.

In other words, he's no better than random chance, which is vastly different from "[thinking] he can do advanced math with little effort using some intuition." By the same logic, you'd accept P=NP trivially.

I don't understand. The base rate for Marcello being right is greater than 0.5.

Replies from: gwern, None

↑ comment by gwern · 2012-07-23T22:49:54.894Z · LW(p) · GW(p)

Maybe EY meant that, on the occasions that Eliezer objected to the final result, he was correct to object half the time. So if Eliezer objected to just 1% of the derivations, on that 1% our confidence in the result of the black box would suddenly drop down to 50% from 99.5% or whatever.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T22:52:20.150Z · LW(p) · GW(p)

Yes, but that's not "no better than random chance."

Replies from: gwern

↑ comment by gwern · 2012-07-23T22:58:01.899Z · LW(p) · GW(p)

Sure. I was suggesting a way in which an objection which is itself only 50% correct could be useful, contra Dmytry.

↑ comment by [deleted] · 2012-07-23T22:54:04.708Z · LW(p) · GW(p)

I don't understand. The base rate for Marcello being right is greater than 0.5.

Oh, right. The point remains that even a perfect Oracle isn't an efficient source of math proofs.

↑ comment by wedrifid · 2012-07-25T02:04:08.307Z · LW(p) · GW(p)

[EY:] I’ve come to accept that to some extent [Marcello and I] have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong.

In other words, he's no better than random chance, which is vastly different from "[thinking] he can do advanced math with little effort using some intuition." By the same logic, you'd accept P=NP trivially.

You do not understand how basic probability works. I recommend An Intuitive Explanation of Bayes' Theorem.

If a device gives a correct diagnosis 999,999 times out of 1,000,000 and is applied to a population that has about 1 in 1,000,000 chance of being positive then a positive diagnosis by the device has approximately 50% chance of being correct. That doesn't make it "no better than random chance". It makes it amazingly good.
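
The 50% figure can be checked by counting cases in a large hypothetical population. This is a sketch under two assumptions that the comment does not spell out: the population size of 10^12 is chosen only so that every count is a whole number, and the 999,999 in 1,000,000 accuracy is taken to apply equally to people who are positive and people who are negative.

```python
from fractions import Fraction

POPULATION = 10**12                    # hypothetical size, chosen so counts are whole
BASE_RATE = Fraction(1, 10**6)         # 1 in 1,000,000 are actually positive
ACCURACY = Fraction(999_999, 10**6)    # assumed for both positives and negatives


def diagnosis_counts(population, base_rate, accuracy):
    """Return (true positives, false positives) as exact counts."""
    positives = population * base_rate
    negatives = population - positives
    true_positives = positives * accuracy
    false_positives = negatives * (1 - accuracy)
    return true_positives, false_positives


true_pos, false_pos = diagnosis_counts(POPULATION, BASE_RATE, ACCURACY)

# P(actually positive | device says positive)
chance_correct = true_pos / (true_pos + false_pos)

# How far the device moved us from the 1 in 1,000,000 prior
improvement = chance_correct / BASE_RATE
```

Both counts come out to 999,999, so `chance_correct` is exactly 1/2, and `improvement` is 500,000: a positive reading raises the probability half a million times over the base rate, which is why "50% correct" and "no better than chance" are different claims.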

↑ comment by private_messaging · 2012-07-23T16:51:06.115Z · LW(p) · GW(p)

Notice that your criteria for impressive cleverness exclude practically every graduate student -- the vast majority make next to nothing, have few "concrete" things to show off, etc.

It's not a criterion for cleverness; it is a criterion for evidence when the prior is 0.0001 (for 1 in 10 000). One can be clever at the one in 7 billion level and never have done anything of interest, but I can't identify such a person as clever at the one in 10 000 level with any confidence without seriously strong evidence.

This is substantially different from EY currently being a math prodigy.

I meant, a childhood math prodigy.

In other words, he's no better than random chance

If Marcello failed one time out of ten and Eliezer detected it half of the time, that would be better than chance. Without knowing Marcello's failure rate (or how the failures are detected besides being pointed out by EY), one can't say whether it is better than chance or not.

↑ comment by nickLW · 2012-07-22T17:27:37.558Z · LW(p) · GW(p)

The Bureau of Labor Statistics reports 728,000 lawyers in the U.S

I would have thought it obvious that I was talking about lawyers who have been developing law for at least a millennium, not merely currently living lawyers in one particular country. Oh well.

Since my posts seem to be being read so carelessly, I will no longer be posting on this thread. I highly recommend folks who want to learn more about where I'm coming from to visit my blog, Unenumerated. Also, to learn more about the evolutionary emergence of ethical and legal rules, I highly recommend Hayek -- Fatal Conceit makes a good starting point.

Replies from: CarlShulman, Wei_Dai

↑ comment by CarlShulman · 2012-07-22T18:03:29.711Z · LW(p) · GW(p)

I would have thought it obvious that I was talking about lawyers who have been developing law for at least a millennium, not merely currently living lawyers in one particular country. Oh well.

Since my posts seem to be being read so carelessly, I will no longer be posting on this thread.

A careful reading of my own comment would have revealed my references to the US as only one heavily lawyered society (useful for an upper bound on lawyer density, and representing a large portion of the developed world and legal population), and to the low population of past centuries (which make them of lesser importance for a population estimate), indicating that I was talking about the total over time and space (above some threshold of intelligence) as well.

I was presenting figures as the start of an estimate of long term lawyer population, and to indicate that to get "millions" one could not pick a high percentile within the population of lawyers, problematic given the intelligence of even 90th percentile attorneys.

Replies from: TimS, private_messaging

↑ comment by TimS · 2012-07-23T00:53:25.231Z · LW(p) · GW(p)

Is it really so hard to believe that there have been more than a million highly intelligent judges and influential lawyers since the Magna Carta was issued? (In my mind, the reference is to English Common Law - Civil Law works differently enough that counting participants is much harder).

As I said, I don't think this proves what nickLW asserts follows from it, but I think the statement "More than a million fairly intelligent individuals have put in substantial amounts of work to make the legal system capable of solving social problems decently well" is true, if mostly irrelevant to AI.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T01:42:47.208Z · LW(p) · GW(p)

since the Magna Carta was issued? (In my mind, the reference is to English Common Law

Limiting to the common law tradition makes it even more dubious. Today, the population of England and Wales is around 60 million. Wikipedia says:

March 2006 there were 1,825 judges in post in England and Wales, most of whom were Circuit Judges (626) or District Judges (572)

On the number of solicitors (barristers are much less numerous):

The number of solicitors qualified to work in England and Wales has rocketed over the past 30 years, according to new figures from the Law Society. The number holding certificates - which excludes retired lawyers and those no longer following a legal career - are at nearly 118,000, up 36% on ten years ago.

Or this:

There were 2,500 barristers and 32,000 solicitors in England and Wales in the early 1970s. Now there are 15,000 barristers and 115,000 solicitors.

And further in the past the overall population was much smaller, as well as poorer and with fewer lawyers (who were less educated, and more impaired by lead, micronutrient deficiencies, etc):

1315 – Between 4 and 6 million.[3]
1350 – 3 million or less.[4]
1541 – 2,774,000 [note 1][5]
1601 – 4,110,000 [5]
1651 – 5,228,000 [5]
1701 – 5,058,000 [5]
1751 – 5,772,000 [5]
1801 – 8,308,000 at the time of the first census. Census officials estimated at the time that there had been an increase of 77% in the preceding 100 years. In each county women were in the majority.[6] Wrigley and Schofield estimate 8,664,000 based on birth and death records.[5]
1811 – 9,496,000

"More than a million fairly intelligent individuals have put in substantial amounts of work

If we count litigating for particular clients on humdrum matters (the great majority of cases) in all legal systems everywhere, I would agree with this.

"have put in substantial amounts of work to make the legal system capable of solving social problems decently well"

It seems almost all the work is not directed at that task, or duplicative, or specialized to particular situations in ways that obsolesce. I didn't apply much of this filter in the initial comment, but it seems pretty intense too.

Replies from: TimS

↑ comment by TimS · 2012-07-23T02:26:50.679Z · LW(p) · GW(p)

Ok, you've convinced me that millions is an overestimate.

Summing the top 60% of judges, top 10% of practicing lawyers, and the top 10% of legal thinkers who were not practicing lawyers - since 1215, that's more than 100,000 people. What other intellectual enterprise has that commitment for that period of time? The military has more people total, but far fewer deep thinkers. Religious institutions, maybe? I'd need to think harder about how to appropriately play reference class tennis - the whole Catholic Church is not a fair comparison because it covers more people than the common law.

Stepping back for a moment, I still think your particular criticism of nickLW's point is misplaced. Assuming that he's referencing the intellectual heft and success of the common law tradition, he's right that there's a fair amount of heft there, regardless of his overestimate of the raw numbers.

The existence of that heft doesn't prove what he suggests, but your argument seems to be assaulting the strongest part of his argument by asserting that there has not been a relatively enormous intellectual investment in developing the common law tradition. There has been a very large investment, and the investment has created a powerful institution.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-23T02:40:13.551Z · LW(p) · GW(p)

I agree that the common law is a pretty effective legal system, reflecting the work of smart people adjudicating particular cases, and feedback over time (from competition between courts, reversals, reactions to and enforcement difficulties with judgments, and so forth). I would recommend it over civil law for a charter city importing a legal system.

But that's no reason to exaggerate the underlying mechanisms and virtues. I also think that there is an active tendency in some circles to overhype those virtues, as they are tied to ideological disputes. [Edited to remove political label.]

but your argument seems to be assaulting the strongest part of his argument

Perhaps a strong individual claim, but I didn't see it clearly connected to a conclusion.

Replies from: TimS

↑ comment by TimS · 2012-07-23T12:57:22.338Z · LW(p) · GW(p)

Perhaps a strong individual claim, but I didn't see it clearly connected to a conclusion.

I agree with you that it isn't connected at all with his conclusions. Therefore, challenging it doesn't challenge his conclusion. Nitpicking something that you think is irrelevant to the opposing side's conclusion in a debate is logically rude.

↑ comment by private_messaging · 2012-07-22T19:29:14.386Z · LW(p) · GW(p)

And why one should pick a high percentile, exactly, if the priors for high percentiles are proportionally low and strong evidence is absent? What's wrong with assuming 'somewhat above median', i.e. close to 50th percentile? Why is that even really harsh?

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-22T20:00:40.802Z · LW(p) · GW(p)

Extreme standardized testing (after adjusting for regression to the mean), successful writer (by hits, readers, reviews; even vocabulary, which is fairly strongly associated with intelligence in large statistical samples), impressing top philosophers with his decision theory work, impressing very smart and influential people (e.g. Peter Thiel) in real-time conversation.

Why is that even really harsh?

It would be harsh to a graduate student from a top hard science program or law school. The median attorney figure in the US today, let alone over the world and history, is just not that high.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-25T14:59:14.765Z · LW(p) · GW(p)

impressing top philosophers with his decision theory work,

The TDT paper from 2012 reads like popularization of something, not like normal science paper on some formalized theory. I don't think impressing 'top philosophers' is impressive.

It would be harsh to a graduate student from a top hard science program or law school.

Or to a writer that gets royalties larger than typical lawyer. Or a smart and influential person, e.g. Peter Thiel.

But a blogger who successfully talked a small-ish percentage of the people he could reach into giving him money for work on AI? That's hardly the evidence to sway a 0.0001 prior. I do concede though that the median lawyer might be unable to do that (but I dunno - only a small percentage would be self-deluded or bad enough to try). The world is full of pseudoscientists, cranks, and hustlers who manage this, and more, and who do not seem to be particularly bright.

↑ comment by Wei Dai (Wei_Dai) · 2012-07-23T12:13:20.997Z · LW(p) · GW(p)

Nick, do you see a fault in how I've been carrying on our discussions as well? Because you've also left several of our threads dangling, including:

1. How likely is it that an AGI will be created before all of its potential economic niches have been filled by more specialized algorithms?
2. How much hope is there for "security against malware as strong as we can achieve for symmetric key cryptography"?
3. Does "hopelessly anthropomorphic and vague" really apply to "goals"?

(Of course it's understandable if you're just too busy. If that's the case, what kind of projects are you working on these days?)

Replies from: nickLW

↑ comment by nickLW · 2012-07-27T03:36:30.499Z · LW(p) · GW(p)

Wei, you and others here interested in my opinions on this topic would benefit from understanding more about where I'm coming from, which you can mainly do by reading my old essays (especially the three philosophy essays I've just linked to on Unenumerated). It's a very different world view than the typical "Less Wrong" worldview: based far more on accumulated knowledge and far less on superficial hyper-rationality. You can ask any questions that you have of me there, as I don't typically hang out here. As for your questions on this topic:

(1) There is insufficient evidence to distinguish it from an arbitrarily low probability.

(2) To state a probability would be an exercise in false precision, but at least it's a clearly stated goal that one can start gathering evidence for and against.

(3) It depends on how clearly and formally the goal is stated, including the design of observations and/or experiments that can be done to accurately (not just precisely) measure progress towards and attainment or non-attainment of that goal.

As for what I'm currently working on, my blog Unenumerated is a good indication of my publicly accessible work. Also feel free to ask any follow-up questions or comments you have stemming from this thread there.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-27T05:11:18.867Z · LW(p) · GW(p)

I've actually already read those essays (which I really enjoyed, BTW), but still often cannot see how you've arrived at your conclusions on the topics we've been talking about recently.

For the rest of your comment, you seem to have misunderstood my grandparent comment. I was asking you to respond to my arguments on each of the threads we were discussing, not just to tell me how you would answer each of my questions. (I was using the questions to refer to our discussions, not literally asking them. Sorry if I didn't make that clear.)

↑ comment by Wei Dai (Wei_Dai) · 2012-07-22T04:34:10.960Z · LW(p) · GW(p)

It's not something you can ever come close to competing with by a philosophy invented from scratch.

I don't understand what you mean by this. Are you saying something like if a society was ever taken over by a Friendly AI, it would fail to compete against one ruled by law, in either a military or economic sense? Or do you mean "compete" in the sense of providing the most social good. Or something else?

I stand by my comment that "AGI" and "friendliness" are hopelessly anthropomorphic, infeasible, and/or vague.

I disagree with "hopelessly" "anthropomorphic" and "vague", but "infeasible" I may very well agree with, if you mean something like it's highly unlikely that a human team would succeed in creating a Friendly AGI before it's too late to make a difference and without creating unacceptable risk, which is why I advocate more indirect methods of achieving it.

Computer "goals" are only usefully studied against actual algorithms, or clearly defined mathematical classes of algorithms, not vague and imaginary concepts.

People are trying to design such algorithms, things like practical approximations to AIXI, or better alternatives to AIXI. Are you saying they should refrain from using the word "goals" until they have actually come up with concrete designs, or what? (Again I don't advocate people trying to directly build AGIs, Friendly or otherwise, but your objection doesn't seem to make sense.)

Replies from: Steve_Rayhawk

↑ comment by Steve_Rayhawk · 2012-07-22T23:54:54.387Z · LW(p) · GW(p)

It's not something you can ever come close to competing with by a philosophy invented from scratch.

I don't understand what you mean by this.

A sufficient cause for Nick to claim this would be that he believed that no human-conceivable AI design would be able to incorporate by any means, including by reasoning from first principles or even by reference, anything functionally equivalent to the results of all the various dynamics of updating that have (for instance) made present legal systems as (relatively) robust (against currently engineerable methods of exploitation) as they are.

This seems somewhat strange to you, because you believe humans can conceive of AI designs that could reason some things from first principles (given observations of the world that the reasoning needed to be relevant to, plus reasonably anticipatable advantages of computing power over single humans) or incorporate results by reference.

One possible reason he might believe this would be that he believed that, whenever a human reasons about history or evolved institutions, there are something like two distinct levels of a computational complexity hierarchy at work, and that the powers of the greater level (history and the evolution of institutions) are completely inaccessible to the powers of the lesser level (the human). (The machines representing the two levels in this case might be "the mental states accessible to a single armchair philosophy community", or, alternatively, "fledgling AI which, per a priori economic intuition, has no advantage over a few philosophers", versus "the physical states accessible in human history".)

This belief of his might be charged with a sort of independent half-intuitive aversion to making the sorts of (frequently catastrophic) mistakes that are routinely made by people who think they can metaphorically breach this complexity barrier. One effect of such an aversion would be that he would intuitively anticipate that he would always be, at least in expected value, wrong to agree with such people, no matter what arguments they could turn out to have. That is, it wouldn't increase his expected rightness to check to see if they were right about some proposed procedure to get around the complexity barrier, because, intuitively, the prior probability that they were wrong, the conditional probability that they would still be wrong despite being persuasive by any conventional threshold, and the wrongness of the cost that had empirically been inflicted on the world by mistakes of that sort, would all be so high. (I took his reference to Hayek's Fatal Conceit, and the general indirect and implicitly argued emotional dynamic of this interaction, to be confirmation of this intuitive aversion.) By describing this effect explicitly, I don't mean to completely psychologize here, or make a status move by objectification. Intuitions like the one I'm attributing can (and very much should!), of course, be raised to the level of verbally presented propositions, and argued for explicitly.

(For what it's worth, the most direct counter to the complexity argument expressed this way is: "with enough effort it is almost certainly possible, even from this side of the barrier, to formalize how to set into motion entities that would be on the other side of the barrier". To cover the pragmatics of the argument, one would also need to add: "and agreeing that this amount of effort is possible can even be safe, so long as everyone who heard of your agreement was sufficiently strongly motivated not to attempt shortcuts".)

Another, possibly overlapping reason would have to do with the meta level that people around here normally imagine approaching AI safety problems from -- that being, "don't even bother trying to invent all the required philosophy yourself; instead do your best to try to formalize how to mechanically refer to the process that generated, and could continue to generate, something equivalent to the necessary philosophy, so as to make that process happen better or at least to maximally stay out of its way" ("even if this formalization turns out to be very hard to do, as the alternatives are even worse"). That meta level might be one that he doesn't really think of as even being possible. One possible reason for this would be that he weren't aware that anyone actually ever meant to refer to a meta level that high, so that he never developed a separate concept for it. Perhaps when he first encountered e.g. Eliezer's account of the AI safety philosophy/engineering problem, the concept he came away with was based on a filled-in assumption about the default mistake that Eliezer must have made and the consequent meta level at which Eliezer meant to propose that the problem should be attacked, and that meta level was far too low for success to be conceivable, and he didn't afterwards ever spontaneously find any reason to suppose you or Eliezer might not have made that mistake. Another possible reason would be that he disbelieved, on the above-mentioned a priori grounds, that the proposed meta level was possible at all. (Or, at least, that it could ever be safe to believe that it were possible, given the horrors perpetrated and threatened by other people who were comparably confident in their reasons for believing similar things.)

comment by jimrandomh · 2012-07-22T03:51:41.390Z · LW(p) · GW(p)

If anyone's looking for a computer security project that could be relevant to AI risk scenarios, consider looking at the architecture that an AI box would have, and the software at its surface. While boxing is not sufficient, it is useful as part of a defense-in-depth strategy, and it's a relatively cheap and good precaution for groups that don't think their creations will become generally intelligent with much probability. A box would offer some protection against AI escaping before its birth has been noticed. I'm particularly concerned about terminals (iTerm, xterm, PuTTY, etc), which would most likely interact with the contents of the box in any boxed AI setup, and all seem to be neglected targets written in unsafe languages (ie, C and C++).

It would also be nice to have hardware and software for fake (alarmed) network interfaces, soundcard antennas, and pretend VM hosts. Rather than close off the tricks that an AI of unrecognized generality might use to escape its box, we should replace them all with traps; and since AI researchers probably won't make these things themselves, we should provide ready-made solutions for them.

Replies from: CarlShulman

↑ comment by CarlShulman · 2012-07-22T04:06:17.234Z · LW(p) · GW(p)

These things all seem positive (which doesn't mean near the frontier of effectiveness) and helpful as far as they go.

comment by lukeprog · 2012-07-21T22:06:11.244Z · LW(p) · GW(p)

I find it odd that Nick refers to "AGI goals" as an "anthropomorphic [and] hopelessly vague" idea. One model for AGI goals, for example, is the utility function, which is neither anthropomorphic (since humans don't have them) nor vague.

Replies from: Vaniver

↑ comment by Vaniver · 2012-07-21T23:18:50.333Z · LW(p) · GW(p)

nor vague.

It seems somewhat vague to me in the sense that the domain of the function is underspecified. Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality? Is that at all related to what humans would recognize as "goals" (say, the goal of visiting London)?

Replies from: Wei_Dai, DanielLC

↑ comment by Wei Dai (Wei_Dai) · 2012-07-22T02:05:35.177Z · LW(p) · GW(p)

It seems to me that vagueness is different from having competing definitions (e.g., AIXI's notion of utility function vs UDT's) that may turn out to be wrong. In cryptography there are also competing formal definitions of "secure", and for many of them it turns out they don't coincide with our intuitive ideas of "secure", so that a cryptographic scheme can satisfy some formal definition of security while still allowing attackers to "break" the scheme and steal information through ways not anticipated by the designer. Note that this is after several decades of intensive research by hundreds of cryptologists world-wide. Comparatively the problem of "AGI goals" has just begun to be studied. What is it that makes "hopelessly anthropomorphic and vague" apply to "AGI goals", but not to "cryptographic security" as of, say, 1980?
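As one concrete illustration of a scheme that meets a formal definition yet leaks in a way the definition does not cover, consider the one-time pad. It is perfectly secret for a single message, but that guarantee says nothing about encrypting two messages under the same pad. The sketch below is an example chosen for this illustration, not one taken from the comment; the messages and the helper `xor_bytes` are hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


m1 = b"attack at dawn!!"
m2 = b"retreat at noon!"
pad = secrets.token_bytes(len(m1))  # uniformly random pad, same length as each message

c1 = xor_bytes(m1, pad)  # covered by the single-message definition
c2 = xor_bytes(m2, pad)  # pad reused: outside what the definition promises

# An eavesdropper who sees only c1 and c2 learns m1 XOR m2 without the pad.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(m1, m2))
```

The pad cancels out of `c1 XOR c2`, so the attacker obtains the XOR of the two plaintexts even though each ciphertext on its own reveals nothing.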

Replies from: Vladimir_Nesov, Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2012-07-22T03:53:30.182Z · LW(p) · GW(p)

It seems to me that vagueness is different from having competing definitions (e.g., AIXI's notion of utility function vs UDT's) that may turn out to be wrong.

AIXI's utility function is useless, the fact that it can be called "utility function" notwithstanding. UDT's utility function is not defined formally (its meaning depends on "math intuition"). For any real-world application of a utility function, we don't have a formal notion of its domain. These definitions are somewhat vague, even if not hopelessly so. They are hopelessly vague for the purpose of building a FAI.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-22T05:26:06.987Z · LW(p) · GW(p)

These definitions are somewhat vague, even if not hopelessly so.

Perhaps I shouldn't have implied or given the impression that we have fully non-vague definitions of "utility function". What if I instead said that our notions of utility function are not as vague as Vaniver makes them out to be? That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain, and that we can see some past ideas (e.g., just over sensory inputs) as definitely wrong?

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2012-07-22T05:43:30.910Z · LW(p) · GW(p)

That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain

Given that the standard of being "fairly clear" is rather vague, I don't know if I disagree, but at the moment I don't know of any approach to a potentially FAI-grade notion of preference of any clarity. Utility functions seem to be a wrong direction, since they don't work in the context of the idea of control based on resolution of logical uncertainty (structure). (UDT's "utility function" is more of a component of definition of something that is not a utility function.)

ADT utility value (which is a UDT-like goal definition) is somewhat formal, but only applies to toy examples, it's not clear what it means even in these toy examples, it doesn't work at all when there is uncertainty or incomplete control over that value on part of the agent, and I have no idea how to treat physical world in its context. (It also doesn't have any domain, which seems like a desirable property for a structuralist goal definition.) This situation seems like the opposite of "clear" to me...

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-23T13:09:08.804Z · LW(p) · GW(p)

In my original UDT post, I suggested

In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.

Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I've not seen), Vaniver's characterization of how much the domain of the utility function is underspecified ("Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?") is just wrong.

Replies from: Vladimir_Nesov, private_messaging

↑ comment by Vladimir_Nesov · 2012-07-23T14:46:59.669Z · LW(p) · GW(p)

Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take physical world to be a certain collection of mathematical structures, possibly heuristically selected based on observations according with being controllable and morally relevant in a tractable way.

The tricky thing is that we are not choosing a structure among some collection of structures (a preferred possible world from a collection of possible worlds), but instead we are choosing which properties a given fixed class of structures will have, or alternatively we are choosing which theories/definitions are consistent or inconsistent, which defined classes of structures exist vs. don't exist. Since the alternatives that are not chosen are therefore made inconsistent, it's not clear how to understand them as meaningful possibilities, they are the mysterious logically impossible possible worlds. And there we have it, the mystery of the domain of preference.

↑ comment by private_messaging · 2012-07-23T14:12:06.702Z · LW(p) · GW(p)

Well, the publicly visible side of work does not seem to refer to specifically this, hence it is not so specified.

With regards to your idea, if you have to make that sort of preference function to really motivate the AI in sufficiently general manner that it may just be hell bent on killing everyone, I think the 'risk' is very far fetched exactly per my suspicion that viable super-intelligent UFAI is much too hard to worry about it (while all sorts of AIs that work on well specified mathematical problems would be much much simpler). If it is the only way to motivate AI that is effective over evolution from seed to super-intelligence, then the only people who are working on something like UFAI are the FAI crowd. Keep in mind that if I want to cure cancer via the 'software tools' route and I am not signed up for cryonics then I'll just go for the simplest solution that works, which will be some sort of automated reasoning over formal systems (not over real world states). Especially as the general AI would require the technologies from the former anyway.

↑ comment by Vladimir_Nesov · 2012-07-22T03:01:32.243Z · LW(p) · GW(p)

It's somewhat vague, not necessarily hopelessly so. The question of the domain of utility functions seems important and poorly understood, not to mention the possible inadequacy of the idea of utility functions over worlds, as opposed to something along the lines of a fixed utility value definition that doesn't explicitly refer to any worlds.

↑ comment by DanielLC · 2012-07-22T04:58:12.598Z · LW(p) · GW(p)

It's valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.

It would have a utility function, in which it assigns value to possible futures. It's not really a "goal" per se unless it's a satisficer. Otherwise, it's more of a general idea of what's better or worse. It would want to make as many paperclips as it can, rather than build a billion of them.
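The maximizer/satisficer distinction can be sketched in a few lines. This is a toy illustration under stated assumptions: the plan names, their paperclip counts, and the functions `maximizer` and `satisficer` are all hypothetical, and real agent designs are not this simple.

```python
from typing import Callable, Iterable, Optional


def maximizer(options: Iterable[str], utility: Callable[[str], float]) -> str:
    """Pick the option with the highest utility, however extreme."""
    return max(options, key=utility)


def satisficer(options: Iterable[str], utility: Callable[[str], float],
               threshold: float) -> Optional[str]:
    """Pick the first option whose utility meets the threshold, then stop looking."""
    for option in options:
        if utility(option) >= threshold:
            return option
    return None


# Hypothetical plans and the number of paperclips each is predicted to yield.
PAPERCLIPS = {
    "build a thousand": 1_000,
    "build a billion": 1_000_000_000,
    "convert everything reachable": 10**20,
}

print(maximizer(PAPERCLIPS, PAPERCLIPS.get))
print(satisficer(PAPERCLIPS, PAPERCLIPS.get, threshold=1_000_000_000))
```

In this sketch the satisficer stops at the billion-paperclip plan, while the maximizer keeps going to whichever plan scores highest.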

Replies from: JaneQ, timtyler

↑ comment by JaneQ · 2012-07-22T07:02:31.085Z · LW(p) · GW(p)

It's valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.

Mathematically any value that AI can calculate from external anything is a function of sensory input.

'Vague' presumes the level of precision that is not present here. It is not even vague. It's incoherent.

Replies from: Wei_Dai, TheOtherDave, DanielLC

↑ comment by Wei Dai (Wei_Dai) · 2012-07-23T17:08:25.848Z · LW(p) · GW(p)

Mathematically any value that AI can calculate from external anything is a function of sensory input.

Given the same stream of sensory inputs, external reality may be different depending on the AI's outputs, and the AI can prefer one output to another based on their predicted effects on external reality even if they make no difference to its future sensory inputs.
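A toy sketch of this point follows, assuming a hypothetical world model `predict` in which two actions lead to identical future sensory input but different unobserved states. All names and the garden scenario are invented for illustration. The sketch takes the model as given and does not address how such a model is built from past input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class World:
    observation: str  # what the agent will sense afterwards
    hidden: str       # part of reality the agent will never sense


def predict(action: str) -> World:
    """Hypothetical world model: both actions yield the same future observation."""
    hidden = "garden kept" if action == "water" else "garden dies"
    return World(observation="blank wall", hidden=hidden)


def utility_over_senses(w: World) -> float:
    return 1.0 if w.observation == "blank wall" else 0.0


def utility_over_world(w: World) -> float:
    return 1.0 if w.hidden == "garden kept" else 0.0


def best_actions(utility: Callable[[World], float]) -> List[str]:
    """Return every action whose predicted outcome has maximal utility."""
    actions = ["water", "idle"]
    scores = {a: utility(predict(a)) for a in actions}
    top = max(scores.values())
    return [a for a in actions if scores[a] == top]


print(best_actions(utility_over_senses))  # indifferent between the two actions
print(best_actions(utility_over_world))   # prefers one despite identical observations
```

The agent scoring only predicted observations is indifferent, while the agent scoring the predicted world state has a strict preference between outputs that look the same to its sensors.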

'Vague' presumes the level of precision that is not present here. It is not even vague. It's incoherent.

Even if you were right that valuing external reality is equivalent to valuing sensory input, how would that make it incoherent? Or are you saying that the idea of "external reality" is inherently incoherent?

Replies from: JaneQ, timtyler

↑ comment by JaneQ · 2012-07-24T08:36:02.900Z · LW(p) · GW(p)

The 'predicted effects on external reality' is a function of prior input and internal state.

The idea of external reality is not incoherent. The idea of valuing external reality with a mathematical function is.

Note, by the way, that valuing 'wire in the head' is also a type of 'valuing external reality', not in the sense of 'external' as in wire being outside the box that runs AI, but external in sense of wire being outside the algorithm of the AI. When that point is being discussed here, SI seem to magically acquire an understanding of distinction between outside an algorithm and inside of algorithm to argue that wireheading won't happen. The confusion between model and reality appears and disappears at most convenient moments.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-24T16:36:24.674Z · LW(p) · GW(p)

I think I'm getting a better idea of where our disagreement is coming from. You think of external reality as some particular universe, and since we don't have direct knowledge of what that universe is, we can only apply our utility function to models of it that we build using sensory input, and not to external reality itself. Is this close to what you're thinking?

If so, I suggest that "valuing external reality" makes more sense if you instead think of external reality as the collection of all possible universes. I described this idea in more detail in my post introducing UDT.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-26T04:03:06.715Z · LW(p) · GW(p)

How would this assign utility to performing an experiment to falsify (drop probability of) some of the 'possible worlds' ? Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.

Please note that the "utility function" to which Nick Szabo refers is the notion that is part of the SI marketing pitch, and therein it alludes to the concept of utility from economics - which does actually make the agent value gathering information - and creates impression that this is a general concept applicable to almost any AI and something likely to be created by an AGI team unaware of the whole 'friendliness' idea; something that would be simple to make for paperclips; the world's best technological genius of the future AGI creators being just a skill of making real the stupid wishes which need to be corrected by SI.

Meanwhile, in the non-vague sense that you outline here, it appears much more dubious that anyone who does not believe in feasibility of friendliness would want to build this; it's not even clear that anyone could. Meanwhile, an AI whose goal is only defined within a model based on physics as we know it, and lacking any sort of tie of that model to anything real - no value to keeping the model in sync with the world - is sufficient to build all that we need for mind uploading. Sensing is a very hard problem in AI, much more so for AGI.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-26T19:14:55.889Z · LW(p) · GW(p)

How would this assign utility to performing an experiment to falsify (drop probability of) some of the 'possible worlds' ?

UDT would want to perform experiments so that it can condition its future outputs on the results of those experiments (i.e., give different outputs depending on how the experiments come out). This gives it higher utility without "falsifying" any of the possible worlds.
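The point about conditioning outputs on experiment results can be illustrated with a toy calculation. This is only a minimal sketch, not UDT itself: the two worlds, their weights, the "red"/"blue" experiment results, and the payoff table are all hypothetical, and the experiment is assumed to be perfectly informative. It shows a policy (a mapping from experiment result to action) being scored over every possible world with the original weights left untouched.

```python
from itertools import product

# Hypothetical toy setup: two possible worlds with fixed prior weights.
PRIOR = {"world_a": 0.5, "world_b": 0.5}

# Result the experiment shows in each world (assumed perfectly informative).
OBSERVATION = {"world_a": "red", "world_b": "blue"}

ACTIONS = ("left", "right")


def utility(world, action):
    """Hypothetical payoff: 'left' suits world_a, 'right' suits world_b."""
    best = {"world_a": "left", "world_b": "right"}
    return 1.0 if best[world] == action else 0.0


def expected_utility(policy):
    """Sum over ALL worlds using the original, never-updated weights."""
    return sum(
        weight * utility(world, policy[OBSERVATION[world]])
        for world, weight in PRIOR.items()
    )


def all_policies():
    """Enumerate every mapping from experiment result to action."""
    results = sorted(set(OBSERVATION.values()))
    for choice in product(ACTIONS, repeat=len(results)):
        yield dict(zip(results, choice))


# The best policy gives different outputs for different experiment results.
best_policy = max(all_policies(), key=expected_utility)

# A policy that ignores the experiment does strictly worse in this toy case.
constant_policy = {"red": "left", "blue": "left"}
```

In this toy case the policy that varies with the experiment result scores higher than the constant one, even though no world's weight is ever changed.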

Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.

The reason UDT is called "updateless" is that it doesn't eliminate or change weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.

The rest of your comment makes some sense, but is your argument that without SI (if it didn't exist), nobody else would try to make an AGI with senses and real-world goals? What about those people (like Ben Goertzel) who are currently trying to build such AGIs? Or is your argument that such people have no chance of actually building such AGIs at least until mind uploading happens first? What about the threat of neuromorphic (brain-inspired) AGIs as we get closer to achieving uploading?

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-26T20:24:01.923Z · LW(p) · GW(p)

The reason UDT is called "updateless" is that it doesn't eliminate or change weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.

A particular instance of UDT running a particular execution history has to condition on this execution history; you can say that you call conditioning what I call updates; in practice you will not want to run the computations irrelevant to the particular machine, and you will have strictly less computing power in the machine than in the universe it inhabits, including the machine itself. It would be good if you could provide an example of experiments it might perform, somewhat formally derived. It feels to me that while it is valuable that you formalized some of the notions, you have largely shifted/renamed all the actual problems.

E.g. it is problematic to specify a utility function on reality; it's incoherent. In your case the utility function is specified on all mathematically representable theories, which may well not allow it to actually value a paperclip. Plus the number of potential paperclips within a theory would grow larger than any computable function of the size of the theory, and the actions may well be dominated by relatively small, but absolutely enormous, differences between huge theories. Can you give an actual example of some utility function? It doesn't have to correspond to paperclips - anything so that UDT with this plugged in would actually do something to our reality rather than the imaginary BusyBeaver(100) beings with imaginary dustspecks in their eyes which might be running a boxed sim of our world.

With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours? The marketing spiel in question is, indeed, that Ben Goertzel's AI (or someone else's) would maximize a utility function and kill everyone or something, which leads me to assume that they are not talking of your utility function.

With regards to neuromorphic AGIs, I think there's far too much science fiction and far too little understanding of neurology in the rationalization of 'why am I getting paid'. While I do not doubt that the brain does implement some sort of 'master trick' in, perhaps, every cortical column, there is an elaborate system for motivating this whole, and that system quite thoroughly fails to care about the real world state, in deed. And once again, why do you think neuromorphic AGIs would have the sort of values of real world as per UDT?

edit: furthermore it seems fairly preposterous to assume high probability that your utility function will actually be implemented in a working manner - say, paperclip maximizing manner - by people who really need SI to tell them to beware of creating skynet. SI is the typical 'high level idea guys' with a belief that the tech guys much smarter than them in fact are specialized in lowly stuff and need the high level idea guys to provide philosophical guidance or else we all die. Incredibly common sight in startups that should never have started up (and fail invariably).

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-27T06:22:23.543Z · LW(p) · GW(p)

With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours?

You seem to think that I'm claiming that UDT's notion of utility function is the only way real-world goals might be implemented in an AGI. I'm instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn't say that only AIs using UDT can be said to have real-world goals.

At this point I'm wondering if Nick's complaint of vagueness was about this more general usage of "goals". It's unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.
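A minimal sketch of an agent that fits this definition, under heavily simplified assumptions: the "environment" is a single room temperature, and the `predict` model, the `score` function, and the action names are all hypothetical, invented here for illustration. The agent picks whichever action its model predicts will leave the environment closest to a target.

```python
def predict(state, action):
    """Hypothetical world model: predicted room temperature after an action."""
    effect = {"heat": 2, "cool": -2, "wait": 0}
    return state + effect[action]


def score(predicted_state, target=21):
    """The goal is stated over the predicted environment: closer to target is better."""
    return -abs(predicted_state - target)


def choose_action(state, actions=("heat", "cool", "wait")):
    """Pick the action whose predicted effect on the environment scores best."""
    return max(actions, key=lambda action: score(predict(state, action)))
```

For instance, `choose_action(17)` selects `"heat"` and `choose_action(25)` selects `"cool"`, because those are the actions the model predicts will move the room toward the target.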

Goals in this sense seem to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel's papers, he certainly talks about "goals", "perceptions", "actions", "movement commands", etc.

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-27T08:42:39.918Z · LW(p) · GW(p)

You seem to think that I'm claiming that UDT's notion of utility function is the only way real-world goals might be implemented in an AGI. I'm instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn't say that only AIs using UDT can be said to have real-world goals.

Then your having formalized your utility function has nothing to do with allegations of vagueness when it comes to defining the utility in the argument of how utility maximizers are dangerous. With regards to it being 'the most promising approach', I think it is a very, very silly idea to have an approach so general that we all may well end up sacrificed in the name of a huge number of imaginary beings that might exist, an AI pascal-wagering itself on its own. It looks like a dead end, especially for friendliness.

At this point I'm wondering if Nick's complaint of vagueness was about this more general usage of "goals". It's unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.

This does not necessarily work like 'I want most paperclips to exist therefore I will talk my way into controlling the world, then kill everyone and make paperclips', though.

Goals in this sense seem to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel's papers, he certainly talks about "goals", "perceptions", "actions", "movement commands", etc.

They also don't try to make goals that couldn't be outsmarted into nihilism. We humans sort-of have a goal of reproduction, except we're too clever, and we use birth control.

In your UDT, the actual intelligent component is this mathematical intuition that you'd use to process this theory in reasonable time. The rest is optional and highly difficult (if not altogether impossible) icing, even for the most trivial goal such as paperclips, which may well in principle never work.

And the technologies employed in the intelligent component are, without any of those goals, and with much less intelligence (as in computing power and their optimality) requirement, sufficient for e.g. using them to design machinery for mind uploading.

Furthermore, and that is the most ridiculous thing, there is this 'oracle AI' being talked about, where an answering system is modelled as based on real world goals and real world utilities, as if those were somehow primal and universally applicable.

It seems to me that the goals and utilities are just a useful rhetorical device used to trigger the anthropomorphization fallacy at will (in a selective way), so as to solicit donations.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-27T16:22:19.765Z · LW(p) · GW(p)

They also don't try to make goals that couldn't be outsmarted into nihilism.

They're not explicitly trying to solve this problem because they don't think it's going to be a problem with their current approach of implementing goals. But suppose you're right and they're wrong, and somebody that wants to build an AGI ends up implementing a motivational system that outsmarts itself into nihilism. Well such an AGI isn't very useful so wouldn't they just keep trying until they stumble onto a motivational system that isn't so prone to nihilism?

We humans sort-of have a goal of reproduction, except we're too clever, and we use birth control.

Similarly, if we let evolution of humans continue, wouldn't humans pretty soon have a motivational system for reproduction that we won't want to cleverly work around?

Replies from: private_messaging

↑ comment by private_messaging · 2012-07-27T17:09:20.879Z · LW(p) · GW(p)

They're not explicitly trying to solve this problem because they don't think it's going to be a problem with their current approach of implementing goals.

They do not expect foom either.

Well such an AGI isn't very useful

You can still have formally defined goals - satisfy conditions on equations, et cetera. Defined internally, without the problematic real world component. Use this for e.g. designing reliable cellular machinery ('cure cancer and senescence'). Seems very useful to me.

so wouldn't they just keep trying until they stumble onto a motivational system that isn't so prone to nihilism?

How long would it take you to 'stumble' upon some goal for the UDT that translates to something actually real?

Similarly, if we let evolution of humans continue, wouldn't humans pretty soon have a motivational system for reproduction that we won't want to cleverly work around?

Evolution destructively tests designs against reality. Humans do have various motivational systems there, such as religion, btw.

I am not sure how you think a motivational system for reproduction could work, so that we would not embrace a solution that actually does not result in reproduction. (Given sufficient intelligence)

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-07-27T18:14:58.238Z · LW(p) · GW(p)

They do not expect foom either.

Goertzel does, or at least thinks it's possible. See http://lesswrong.com/lw/aw7/muehlhausergoertzel_dialogue_part_1/ where he says "GOLEM is a design for a strongly self-modifying superintelligent AI system". Also http://novamente.net/AAAI04.pdf where he talks about Novamente potentially being "thoroughly self-modifying and self-improving general intelligence".

You can still have formally defined goals - satisfy conditions on equations, et cetera.

As I mentioned, there are AGI researchers trying to implement real-world goals right now. If they build an AGI that turns nihilistic, do you think they will just give up and start working on equation solvers instead, or try to "fix" their AGI?

How long would it take you to 'stumble' upon some goal for the UDT that translates to something actually real?

I guess probably not very long, if I had a working solution to "math intuition", a sufficiently powerful computer to experiment with, and no concerns for safety...

↑ comment by timtyler · 2012-07-24T01:47:30.630Z · LW(p) · GW(p)

Mathematically any value that AI can calculate from external anything is a function of sensory input.

Given the same stream of sensory inputs, external reality may be different depending on the AI's outputs

Actions are the product of sensory input and existing state - but the basic idea withstands this, I think.

↑ comment by TheOtherDave · 2012-07-22T07:35:31.087Z · LW(p) · GW(p)

Mathematically any value that AI can calculate from external anything is a function of sensory input.

Sure, but the kind of function matters for our purposes. That is, there's a difference between an optimizing system that is designed to optimize for sensory input of a particular type, and a system that is designed to optimize for something that it currently treats sensory input of a particular type as evidence of, and that's a difference I care about if I want that system to maximize the "something" rather than just rewire its own perceptions.

Replies from: JaneQ

↑ comment by JaneQ · 2012-07-22T08:22:06.859Z · LW(p) · GW(p)

Be specific as to what the input domain of the 'function' in question is.

And yes, there is a difference: one is well defined and is what AI research works towards, and the other is part of an extensive AI fear rationalization framework, where it is confused with the notion of generality of intelligence, so as to presume that the practical AIs will maximize the "somethings", followed by the notion that pretty much all "somethings" would be dangerous to maximize. The utility is a purely descriptive notion; the AI that decides on actions is a normative system.

edit: To clarify, the intelligence is defined here as a 'cross domain optimizer' that would therefore be able to maximize something vague without it having to be coherently defined. It is similar to knights of the round table worrying that the AI would literally search for the holy grail, because to said knights, the abstract and ill defined goal of the holy grail appears entirely natural; meanwhile for systems more intelligent than said knights such a confused goal, due to its incoherence, is impossible to define.

Replies from: TheOtherDave

↑ comment by TheOtherDave · 2012-07-22T08:52:43.964Z · LW(p) · GW(p)

(shrug)

It seems to me that even if I ignore everything SI has to say about AI and existential risk and so on, ignore all the fear-mongering and etc., the idea of a system that attempts to change its environment so as to maximize the prevalence of some X remains a useful idea.

And if I extend the aspects of its environment that the system can manipulate to include its own hardware or software, or even just its own tuning parameters, it seems to me that there exists a perfectly crisp, measurable distinction between a system A that continues to increase the prevalence of X in its environment, and a system B that instead manipulates its own subsystems for measuring X.
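The distinction between system A and system B can be sketched in a toy model. Everything here is a hypothetical illustration: the `World` fields, the two actions, and the assumption that the system's model contains a variable (`true_x`) standing for the prevalence of X separate from what its sensor reports. Both systems run the same greedy loop and differ only in which quantity they score.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class World:
    true_x: int = 0       # modeled prevalence of X in the environment
    sensor_bias: int = 0  # distortion the system has added to its own sensor

    @property
    def reading(self):
        """What the system's X-measuring subsystem reports."""
        return self.true_x + self.sensor_bias


def step(world, action):
    """Hypothetical dynamics: making X is slow, tampering is cheap and large."""
    if action == "make_x":
        return replace(world, true_x=world.true_x + 1)
    if action == "tamper":
        return replace(world, sensor_bias=world.sensor_bias + 100)
    raise ValueError(action)


def run(objective, steps=5):
    """Greedy system: each step, take the action whose predicted result scores best."""
    world = World()
    for _ in range(steps):
        world = max((step(world, a) for a in ("make_x", "tamper")), key=objective)
    return world


system_a = run(lambda w: w.true_x)   # scores the modeled prevalence of X
system_b = run(lambda w: w.reading)  # scores its own measurement of X
```

In this toy model, system A ends with more X and an untouched sensor, while system B ends with no additional X and a heavily biased sensor, which makes the difference measurable from the outside.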

If any part of that is as incoherent as you suggest, and you're capable of pointing out the incoherence in a clear fashion, I would appreciate that.

Replies from: JaneQ

↑ comment by JaneQ · 2012-07-22T11:44:06.054Z · LW(p) · GW(p)

the idea of a system that attempts to change its environment so as to maximize the prevalence of some X remains a useful idea.

The prevalence of X is defined how?

And if I extend the aspects of its environment that the system can manipulate to include its own hardware or software, or even just its own tuning parameters, it seems to me that there exists a perfectly crisp, measurable distinction between a system A that continues to increase the prevalence of X in its environment, and a system B that instead manipulates its own subsystems for measuring X.

In A, you confuse your model of the world with the world itself; in your model of the world you have a possible item 'paperclip', and you can therefore easily imagine maximization of the number of paperclips inside your model of the world, complete with the AI necessarily trying to improve its understanding of the 'world' (your model). With B, you construct a falsely singular alternative of a rather broken AI, and see a crisp distinction between two irrelevant ideas.

The practical issue is that the 'prevalence of some X' cannot be specified without the model of the world; you cannot have a function without specifying its input domain, and 'reality' is never an input domain of mathematical functions; the notion is not only incoherent but outright nonsensical.

If any part of that is as incoherent as you suggest, and you're capable of pointing out the incoherence in a clear fashion, I would appreciate that.

Incoherence of such poorly defined concepts cannot be demonstrated when no attempt has been made to make the notions specific enough to even rationally assert coherence in the first place.

Replies from: TheOtherDave

↑ comment by TheOtherDave · 2012-07-22T16:43:06.826Z · LW(p) · GW(p)

OK. Thanks for your time.

↑ comment by DanielLC · 2012-07-23T02:48:53.295Z · LW(p) · GW(p)

It can only be said to be powerful if it will tend to do something significant regardless of how you stop it. If what it does has anything in common, even if it's nothing beyond "significant", it can be said to value that.

Replies from: JaneQ

↑ comment by JaneQ · 2012-07-23T07:32:42.319Z · LW(p) · GW(p)

Actually, this is an example of something incredibly irritating about this entire singularity topic: verbal sophistry of no consequence. What you call 'powerful' has absolutely zero relation to anything. A powerful drill doesn't tend to do something significant regardless of how you stop it. Neither does a powerful computer. Nor should a powerful intelligence.

Replies from: DanielLC

↑ comment by DanielLC · 2012-07-23T18:34:15.095Z · LW(p) · GW(p)

A powerful drill doesn't tend to do something significant regardless of how you stop it. Neither does a powerful computer. Nor should a powerful intelligence.

In this case, I'm defining a powerful intelligence differently. An AI that is powerful in your sense is not much of a risk. It's basically the kind of AI we have now. It's neither highly dangerous, nor highly useful (in a singularity-inducing sense).

Building an AGI may not be feasible. If it is, it will be far more effective than a narrow AI, and far more dangerous. That's why it's primarily what SIAI is worried about.

Replies from: JaneQ

↑ comment by JaneQ · 2012-07-24T08:18:16.275Z · LW(p) · GW(p)

nor highly useful (in a singularity-inducing sense).

I'm not clear what we mean by singularity here. If we had an algorithm that works on well defined problems we could solve practical problems. edit: Like improving that algorithm, mind uploading, etc.

Building an AGI may not be feasible. If it is, it will be far more effective than a narrow AI,

Effective at what? Would it cure cancer sooner? I doubt it. An "AGI" with a goal it wants to pursue, resisting any control, is a much more narrow AI than the AI that basically solves systems of equations. Who would I rather hire: an impartial math genius who solves the tasks I specify for him, or a brilliant murderous sociopath hell bent on doing his own thing? The latter's usefulness (to me, that is) is incredibly narrow.

and far more dangerous.

Besides being effective at being worse than useless?

That's why it's primarily what SIAI is worried about.

I'm not quite sure that there's 'why' and 'what' in that 'worried'.

Replies from: DanielLC

↑ comment by DanielLC · 2012-07-24T20:42:59.628Z · LW(p) · GW(p)

I'm not clear what we mean by singularity here.

If we have an AGI, it will figure out what problems we need solved and solve them. It may not beat a narrow AI (ANI) in the latter, but it will beat you in the former. You can thus save on the massive losses due to not knowing what you want, politics, not knowing how to best optimize something, etc. I doubt we'd be able to do 1% as well without an FAI as with one. That's still a lot, but that means that a 0.1% chance of producing an FAI and a 99.9% chance of producing a UAI is better than a 100% chance of producing a whole lot of ANIs.
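The comparison in this paragraph is an expected-value calculation, and it can be checked directly. This is a minimal sketch under two assumptions that the comment does not state: the value of an FAI outcome is normalized to 1, and a UAI outcome is taken to be worth 0 rather than negative. The function names are hypothetical.

```python
def expected_value(outcomes):
    """Expected value of a lottery given (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)


FAI_VALUE = 1.0  # assumption: normalize the value of an FAI outcome to 1
UAI_VALUE = 0.0  # assumption: a UAI outcome is worth 0, not negative

# The gamble described above: 0.1% chance of FAI, 99.9% chance of UAI.
gamble = expected_value([(0.001, FAI_VALUE), (0.999, UAI_VALUE)])


def gamble_beats_ani(ani_fraction):
    """True if the gamble beats certain ANIs worth ani_fraction of an FAI."""
    return gamble > expected_value([(1.0, ani_fraction * FAI_VALUE)])
```

Under these assumptions the gamble is worth 0.001 of an FAI, so it comes out ahead only when a world of ANIs is worth less than 0.1% of an FAI. That is a stricter bound than the 1% figure, and a negative value for UAI would tighten it further.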

The latter's usefulness (to me, that is) is incredibly narrow.

Only if his own thing isn't also your own thing.

Replies from: JaneQ

↑ comment by JaneQ · 2012-07-25T10:38:10.499Z · LW(p) · GW(p)

If we have an AGI, it will figure out what problems we need solved and solve them.

Only a friendly AGI would. The premise for funding SI is not that they will build friendly AGI. The premise is that there is an enormous risk that someone else would, for no particular reason, add this whole 'valuing real world' thing into an AI, without adding any friendliness, actually restricting its generality when it comes to doing something useful.

Ultimately, the SI position is: input from us, the idea guys with no achievements (outside philosophy), is necessary for the team competent enough to build a full AGI to not kill everyone, and therefore you should donate. (Previously, the position was that you should donate so we build FAI before someone builds UFAI, but Luke Muehlhauser has been generalizing to non-FAI solutions.) That notion is rendered highly implausible when you pin down the meaning of AGI, as we did in this discourse. For the UFAI to happen and kill everyone, a potentially vastly more competent and intelligent team than SI has to fail spectacularly.

Only if his own thing isn't also your own thing.

That will require a simulation of me or a brain implant that effectively makes it an extension of me. I do not want the former, and the latter is IA.

↑ comment by timtyler · 2012-07-24T01:51:16.607Z · LW(p) · GW(p)

It's valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.

Thinking they are valuing "external reality" probably doesn't really protect agents from wireheading. The agents just wind up with delusional ideas about what "external reality" consists of - built of the patchwork of underspecification left by the original programmers of this concept.

Replies from: DanielLC

↑ comment by DanielLC · 2012-07-24T03:47:59.084Z · LW(p) · GW(p)

I know that it's possible for an agent that's created with a completely underspecified idea of reality to nonetheless value external reality and avoid wireheading. I know this because I am such an agent.

Everything humans can do, an AI could do. There's little reason to believe humans are remotely optimum, so an AI could likely do it better.

Replies from: timtyler

↑ comment by timtyler · 2012-07-25T00:34:14.066Z · LW(p) · GW(p)

The "everything humans can do, an AI could do better" argument cuts both ways. Humans can wirehead - machines may be able to wirehead better. That argument is pretty symmetric with the "wirehead avoidance" argument. So: I don't think either argument is worth very much. There may be good arguments that illuminate the future frequency of wireheading, but these don't qualify. It seems quite possible that our entire civilization could wirehead itself - along the lines suggested by David Pearce.

Replies from: DanielLC

↑ comment by DanielLC · 2012-07-25T00:44:02.936Z · LW(p) · GW(p)

Everything a human can do, a human cannot do in the most extreme possible manner. An AI could be made to wirehead easier or harder. It could think faster or slower. It could be more creative or less creative. It could be nicer or meaner.

I wouldn't begin to know how to build an AI that's improved in all the right ways. It might not even be humanly possible. If it's not humanly possible to build a good AI, it's likely impossible for the AI to be able to improve on itself. There's still a good chance that it would work.

Replies from: timtyler

↑ comment by timtyler · 2012-07-25T09:52:28.571Z · LW(p) · GW(p)

An AI could be made to wirehead easier or harder.

Probably true - and few want wireheading machines - but the issues are the scale of the technical challenges, and - if these are non-trivial - how much folk will be prepared to pay for the feature. In a society of machines, maybe the occasional one that turns Buddhist - and needs to go back to the factory for psychological repairs - is within tolerable limits.

Many apparently think that making machines value "external reality" fixes the wirehead problem - e.g. see "Model-based Utility Functions" - but it leads directly to the problems of what you mean by "external reality" and how to tell a machine that that is what it is supposed to be valuing. It doesn't look much like solving the problem to me.

comment by Epiphany · 2012-08-25T23:40:47.587Z · LW(p) · GW(p)

FAI is a security risk not a fix:

"One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one."

Not if the AI itself is vulnerable to penetration. By your own reasoning, we have no reason to think it won't be. It may turn out to be one of the biggest security liabilities, because the way it executes tasks may be very intelligent, and there's no reason to believe it won't be reprogrammed to do unfriendly things.

Friendly AI is only friendly until a human figures out how to abuse it.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2012-08-26T06:16:23.039Z · LW(p) · GW(p)

An FAI would have some security advantages. It can achieve physical security by taking over the world and virtualizing everyone else, and ought to also have enough optimization power to detect and fix all the "low level" information vulnerabilities (e.g., bugs in its CPU design or network stack). That still leaves "high level" vulnerabilities, which are sort of hard to distinguish from "failures of Friendliness". To avoid these, what I've advocated in the past is that FAI shouldn't be attempted until its builders have already improved beyond human intelligence via other seemingly safer means.

BTW, you might enjoy my Hacking the CEV for Fun and Profit.

(Edit to add some disclaimers, since Epiphany expressed a concern about PR consequences of this comment: Here I was implicitly assuming that virtualizing people is harmless, but I'm not sure about this, and if it's not, I would prefer the FAI not to virtualize people. Also, I do not work for SIAI nor am I affiliated with them.)

Replies from: Epiphany, Epiphany, Epiphany

↑ comment by Epiphany · 2012-08-31T07:10:44.378Z · LW(p) · GW(p)

No go. Four reasons.

One:

If the builders have increased their intelligence levels that high, then other people of that time will be able to do the same and therefore potentially crack the AI.

Two:

Also, I may as well point out that your argument is based on the assumption that enough intelligence will make for perfect security. It may be that no matter how intelligent the designers are, their security plans are not perfect. Perfect security looks to be about as likely, to me, as perpetual motion is. No matter how much intelligence you throw at it, you won't get a perpetual motion machine. We'd need to discover some paradigm shattering physics information for that to be so. I suppose it is possible that someone will shatter the physics paradigms by discovering new information, but that's not something to count on to build a perpetual motion machine, especially when you're counting on the perpetual motion machine to keep the world safe.

Three:

Whenever humans have tried to collect too much power into one place, it has not worked out for them. For instance, communism in Russia. They thought they'd share all the money by letting one group distribute it. That did not work.

The founding fathers of the USA insisted on checking and balancing the government's power. Surely you are aware of the reasons for that.

If the builders are the only ones in the world with intelligence levels that high, the power of that may corrupt them, and they may make a pact to usurp the AI themselves.

Four:

There may be unexpected thoughts you encounter in that position that seem to justify taking advantage of the situation. For instance, before becoming a jailor, you would assume you're going to be ethical and fair. In that situation, though, people change. (See also: Zimbardo's Stanford prison experiment).

Why do they change? I imagine the reasoning goes a little like this: "Great I'm in control. Oh, wait. Everyone wants to get out. Okay. And they're a threat to me because I'm keeping them in here. I'm going to get into a lot of power struggles in this job. Even if I fail only 1% of the time, the consequences of failing at a power struggle are very dire, so I should probably err on the side of caution - use too much force rather than too little. And if it's okay to use physical force, then how bad is using a little psychological oppression as a deterrent? That will be a bit of extra security for me and help me maintain order in this jail. Considering the serious risk, and the high chance of injury, it's necessary to use everything I've got."

We don't know what kinds of reasoning processes the AI builders will get into at that time. They might be thinking like this:

"We're going to make the most powerful thing in the world, yay! But wait, everyone else wants it. They're trying to hack us, spy on us... there are people out there who would kidnap us and torture us to get a hold of this information. They might do all kinds of horrible things to us. Oh my goodness and they're not going to stop trying to hack us when we're done. Our information will still be valuable. I could get kidnapped years from now and be tortured for this information then. I had better give myself some kind of back door into the AI, something that will make it protect me when I need it. (A month later) Well... surely it's justified to use the back door for this one thing... and maybe for that one thing, too... man I've got threats all over me, if I don't do this perfectly, I'll probably fail ... even if I only make a mistake 1 in 100 times, that could be devastating. (Begins using the back door all the time.) And I'm important. I'm working on the most powerful AI. I'm needed to make a difference in the world. I had better protect myself and err on the side of caution. I could do these preventative things over here... people won't like the limits I place on them, but the pros outweigh the cons, so: oppress." The limits may be seen as evidence that the AI builders cannot be trusted (regardless of how justified they are, there will be some group of people who feels oppressed by new limits, possibly irrational people or possibly people who see a need for the freedom that the AI builders don't) and if a group of people are angry about the limits, they will then be opposed to the AI builders. If they begin to resist the AI builders, the AI builders will be forced to increase security, which may oppress them further. This could be a feedback loop that gets out of hand: Increasing resistance to the AI builders justifies increasing oppression, and increasing oppression justifies increasing resistance.

This is how an AI builder could turn into a jailor.

If part of the goal is to create an AI that will enforce laws, the AI researchers will be part of the penal system, literally. We could be setting ourselves up for the world's most spectacular prison experiment.

Checks and balances, Wei_Dai.

↑ comment by Epiphany · 2012-09-01T21:19:41.847Z · LW(p) · GW(p)

Whoever it is that keeps thumbing down my posts in this thread is invited to bring brutal honesty down onto my ideas, I am not afraid.

If "virtualizing everyone" means what I think you mean by that, that's a euphemism. That it will achieve physical security implies that the physical originals of those people would not exist after the process - otherwise you'd just have two copies of every person which, in theory, could increase their chances of cracking the AI. It sounds like what you're saying here is that the "friendly" AI would copy everyone's mind into a computer system and then kill them.

Maybe it seems to some people like copying your mind will preserve you, but imagine this: Ten copies are made. Do you, the physical original person, experience what all ten copies of you are experiencing at once? No. And if you, the physical original person, ceased to exist, would you continue by experiencing what a copy of you is experiencing? Would you have control over their actions? No.

You'd be dead.

Making copies of ourselves won't save our lives - that would only preserve our minds.

Now, if you meant something else by "virtualize" I'd be happy to go read about it. After turning up with absolutely no instances of the terms "virtualize people" or "virtualize everyone" on the internet (barring completely different uses like "blah blah blah virtualize. Everyone is blah blah.") I have no idea what you mean by "virtualize everyone" if it isn't "copy their minds and then kill their bodies."

Replies from: Vladimir_Nesov, Wei_Dai, Risto_Saarelma

↑ comment by Vladimir_Nesov · 2012-09-01T21:36:40.087Z · LW(p) · GW(p)

You'd be dead.

The Worst Argument in the World. This is not a typical instance of "dead", so the connotations of typical examples of "dead" don't automatically apply.

Replies from: Richard_Kennaway, Epiphany

↑ comment by Richard_Kennaway · 2012-09-01T22:53:36.376Z · LW(p) · GW(p)

Tabooing the word "dead", I ask myself, if a copy of myself was made, and ran independently of the original, the original continuing to exist, would either physical copy object to being physically destroyed provided the other continued in existence? I believe both of us would. Even the surviving copy would object to the other being destroyed.

But that's just me. How do other people feel?

Replies from: Epiphany

↑ comment by Epiphany · 2012-09-01T23:37:12.849Z · LW(p) · GW(p)

Assuming the copy had biochemistry, or some other way of experiencing emotions, the cop(ies) of me would definitely object to what had happened. Alternately, if a virtual copy of me was created and was capable of experiencing, I would feel that it was important for the copy to have the opportunity to make a difference in the world - that's why I live - so, yes, I would feel upset about my copy being destroyed.

You know, I think this problem has things in common with the individualism vs. communism debate. Do we view the copies as parts of a whole, unimportant in and of themselves, or do we view them all as individuals?

If we were to view them as parts of a whole, then what is valued? We don't feel pain or pleasure as a larger entity made up of smaller entities. We feel it individually. If happiness for as many life forms as possible is our goal, both the originals and the copies should have rights. If the copies are capable of experiencing pain and pleasure, they need to have human rights the same as ours. I would not see it as ethical to let myself be copied if my copies would not have rights.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2012-09-01T23:40:00.635Z · LW(p) · GW(p)

Do we view the copies as parts of a whole, unimportant in and of themselves, or do we view them all as individuals?

We should view them as what they actually are, parts of the world with certain morally relevant structure.

↑ comment by Epiphany · 2012-09-01T22:39:11.410Z · LW(p) · GW(p)

Thank you, Vladimir, for your honest criticism, and more is invited. However, this "Worst argument in the world" comparison is not applicable here. In Yvain's post, he explains:

The opponent is saying "Because you don't like criminals, and Martin Luther King is a criminal, you should stop liking Martin Luther King." But King doesn't share the important criminal features of being driven by greed, preying on the innocent, or weakening the fabric of society that made us dislike criminals in the first place. Therefore, even though he is a criminal, there is no reason to dislike King.

If we do an exercise where we substitute the words "criminal" and "Martin Luther King" with "virtualization" and "death", and read the sentence that results, I think you'll see my point:

The opponent is saying "Because you don't like death, and being virtualized will cause death, you should stop liking the idea of being virtualized by an AGI." But virtualization doesn't share the important features of death like not being able to experience anymore and the inability to enjoy the world that made us dislike death in the first place. Therefore, even though being virtualized by an AGI will cause death, there is no reason to dislike virtualization.

Not being able to experience anymore and not being able to enjoy the world are unacceptable results of "being virtualized". Therefore, we should not like the idea of being virtualized by AGI.

↑ comment by Wei Dai (Wei_Dai) · 2012-09-02T07:28:34.150Z · LW(p) · GW(p)

That it will achieve physical security implies that the physical originals of those people would not exist after the process

Yes, that's what I was thinking when I wrote that, but if the FAI concludes that replacing a physical person with a software copy isn't a harmless operation, it could instead keep physical humans around and place them into virtual environments Matrix-style.

Replies from: Epiphany

↑ comment by Epiphany · 2012-09-02T07:41:00.381Z · LW(p) · GW(p)

Um. Shouldn't we be thinking "how will we get the FAI to conclude that replacing people with software is not harmless" not "If the FAI concludes that this is harmless..."

After all, if it's doing something that kills people, it isn't friendly.

To place people into a virtual environment would be to take away their independence. Humans have a need for dignity, and I think that would be bad for them. I think FAI should know better than to do that, too.

Replies from: Vladimir_Nesov, Risto_Saarelma

↑ comment by Vladimir_Nesov · 2012-09-02T16:43:53.719Z · LW(p) · GW(p)

Um. Shouldn't we be thinking "how will we get the FAI to conclude that replacing people with software is not harmless" not "If the FAI concludes that this is harmless..."

If it's actually a FAI, you should approve of what it decides, not of what people (including yourself) currently believe. If it can't be relied upon in this manner, it's not (known to be) a FAI. You know whether it's a FAI based on its design, not based on its behavior, which it won't generally be possible to analyze (or do something about).

You shouldn't be sure about correct answers to object level (very vaguely specified) questions like "Is replacing people with software harmless?". A FAI should use a procedure that's more reliable in answering such questions than you or any other human is. If it's using such a procedure, then what it decides is a more reliable indicator of what the correct decision is than what you (or I) believe. It's currently unclear how to produce such a procedure.

Demanding that FAI conforms to moral beliefs currently held by people is also somewhat pointless in the sense that FAI has to be able to make decisions about much more precisely specified decision problems, such that humans won't be able to analyze those decision problems in any useful way, so there are decisions where moral beliefs currently or potentially held by humans don't apply. If it's built so as to be able to make such decisions, it will also be able to answer the questions about which there are currently held beliefs, as a special case. If the procedure for answering moral questions is more reliable in general, it'll also be more reliable for such questions.

See Complex Value Systems are Required to Realize Valuable Futures for some relevant arguments.

↑ comment by Risto_Saarelma · 2012-09-02T15:08:34.431Z · LW(p) · GW(p)

Um. Shouldn't we be thinking "how will we get the FAI to conclude that replacing people with software is not harmless" not "If the FAI concludes that this is harmless..."

A lot of people here think that the quick assumption that unusual substrate changes necessarily imply death isn't well-founded. Arguing with the assumption that it is obviously true will not be helpful.

There should be a good single post or wiki page to link to about this debate (it also comes up constantly in cryonics discussions), but I don't think there is one.

↑ comment by Risto_Saarelma · 2012-09-02T05:38:48.605Z · LW(p) · GW(p)

This is a pretty long-running debate on LW and you've just given the starting argument yet again. One recent response is asking how can you tell sleeping won't kill you?

↑ comment by Epiphany · 2012-08-31T07:24:35.961Z · LW(p) · GW(p)

Physical walls are superior to logical walls according to what I've read. Turning everything into logic won't solve the largest of your security problems, and could exacerbate them.

That's five.

comment by Epiphany · 2012-08-31T19:57:54.888Z · LW(p) · GW(p)

Security is solving the problem after the fact, and I think that is totally the wrong approach here. We should be asking if something can be designed into the AI that prevents people from wanting to take the AI over or prevents takeovers from being disastrous (three suggestions for that are included in this comment).

Perhaps the best approach to security is to solve the problems humans are having that cause them to commit crimes. Of course this appears to be a chicken-or-egg proposition: "Well, the AI can't solve the problems until it's securely built, and it won't be securely built until it's solved the problems." But we could look at this problem in several other ways. Each of these assumes that it is safest to deem the security problem impossible to solve after the fact:

Make a lesser AI that is just powerful enough to address the root reasons humans cause security issues - the fact that it is less powerful may also make it a lot less dangerous in the event that it's taken over.
Give everyone access to AI. If everyone already has access, then there will be no big jackpot to steal. If everyone else has access to AI, too, then taking one over wouldn't buy you any power. This is a paradigm shift from what you guys are suggesting, and would require that much of this is gone about in a different way. At first it looks like this would exacerbate security problems - won't people use them for bad purposes? Of course. But if EVERYONE has access, that won't be any scarier than a situation in which a gunman wants to take over, and everyone else has a gun. You can't take over a group of equally armed people. Imagine taking out a gun in a room full of people with guns. You'll end up getting shot - you won't be taking over. One AI can't take over a world full of AI. Distributing the power everywhere would address many security problems: it would deter criminals from committing crimes (and limit the damage criminals can do), check and balance big powers to prevent tyranny, prevent the corruption of good people that may otherwise happen if they're given too much power, and prevent a situation similar to Zimbardo's prison experiment where powerful AI builders feel a need to defend themselves from hackers, criminals and tyrants and therefore begin using the power they have in an oppressive way. {reason four}
Focus on enlightenment. The human race has far too much power and far too little wisdom. Increasing the power of toddlers won't solve the toddlers' issues but only increase the damage the toddlers do to one another because of their issues. And so, increasing the power of an unenlightened human race will only amplify their pain. If we all practice non-attachment, we will be strong enough to heed "those who sacrifice freedom for security deserve neither" and hopefully cease the addiction to violating freedom, either by taking advantage of others or sacrificing our own, in order to gain more security. Might it make more sense to focus on enlightenment than on throwing power at an unenlightened species in an attempt to solve their problems? Maybe we should be asking "How can technology help us become more enlightened?"

"Intelligent people solve problems, geniuses prevent them." - Einstein

I think AI builders need to do as the geniuses do. Prevent people from WANTING to take over the AI or prevent a takeover from being a big deal. There is nothing you can do to guarantee it won't be taken over, not with the level of certainty that we'd need to have for a problem that could be this devastating to the entire world. The only way this much power can exist, and be safe from humans, is to address the human element itself.

comment by Benquo · 2012-07-23T02:15:14.969Z · LW(p) · GW(p)

This argument seems to be following a common schema:

To understand X, it is necessary to understand its relations to other things in the world.

But to understand its relations to each of the other things that exist, it is necessary to understand each of those things as well.

Y describes many of the things that commonly interact with X.

Therefore, the best way to advance our understanding of X, is to learn about Y.

Is that a fair description of the structure of the argument? If so, are you arguing that our understanding of superintelligence needs to be advanced through better understanding of security, or that our understanding of security needs to be advanced by better understanding of superintelligence?

comment by billswift · 2012-07-21T20:04:57.145Z · LW(p) · GW(p)

"One Way Functions" aren't strictly one-way; they are just much harder to calculate in one direction than the other. A breakthrough in algorithms, or a powerful enough computer, can solve the problem.
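To make the asymmetry concrete, here is a rough sketch in Python. It uses SHA-256 as a stand-in for a one-way function, which is an assumption for illustration only: nobody has proven that SHA-256 (or any function) is truly one-way. The helper names `forward` and `brute_force_invert` and the three-letter secret are made up for this example.

```python
import hashlib
from itertools import product


def forward(message: bytes) -> str:
    """Easy direction: one hash computation."""
    return hashlib.sha256(message).hexdigest()


def brute_force_invert(target: str, alphabet: bytes, max_len: int):
    """Hard direction: try every candidate up to max_len bytes.

    Returns (preimage or None, number of forward evaluations used).
    """
    attempts = 0
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            attempts += 1
            candidate = bytes(combo)
            if forward(candidate) == target:
                return candidate, attempts
    return None, attempts


# Hypothetical toy secret: short enough that the search finishes quickly.
secret = b"dog"
digest = forward(secret)
recovered, attempts = brute_force_invert(
    digest, b"abcdefghijklmnopqrstuvwxyz", max_len=3
)
print(recovered, attempts)
```

Going forward costs one hash; going backward here costs thousands, and the search space grows exponentially with the length of the input. That is the sense in which the function is "one-way": a better algorithm or a much faster computer shrinks the gap, which is exactly the caveat above.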

comment by TheOtherDave · 2012-07-21T20:00:47.105Z · LW(p) · GW(p)

Not really an answer to your question, but it seems to me a lot depends on what position I take wrt value drift and the subject-dependence of values.

At one extreme: if I believe that whatever I happen to value right now is what I value, and what I value tomorrow is what I value tomorrow, and it simply doesn't matter how those things relate to each other, I just want to optimize my environment for what I value at any given moment, then it makes sense to concentrate on security without reference to goals. More precisely, it makes sense to concentrate on mechanisms for optimizing my environment for any given value, and security is a very important part of that.

At another extreme: if I believe that there is One True Value Set that ought to be optimized for (even if I don't happen to know what that is, or even if I don't particularly value it [1] ), thinking about goals is valuable only insofar as it leads to systems better able to implement the OTVS.

Only if I believe that my values are the important ones, believe my values can change, and endorse my current values over my values at other times, does working out a way to preserve my current values against value-shifts (either intentionally imposed shifts, as in your examples, or natural drift) start to seem important.

I know lots of people who don't seem to believe that their current values are more important than their later values, at least not in any way that consistently constrains their planning. That is, they seem to prefer to avoid committing to their current values, and to instead keep their options open.

And I can see how that sort of thinking leads to the idea that "secure property rights" (and, relatedly, reliably enforced consensual contracts) are the most important thing.

[1] EDIT: in retrospect, this is a somewhat confused condition; what I really mean is more like "even if I'm not particularly aware of myself valuing it", or "even if my valuation of it is not reflectively consistent" or something of that sort.

Replies from: DanielLC, torekp

↑ comment by DanielLC · 2012-07-22T05:00:39.596Z · LW(p) · GW(p)

More precisely, it makes sense to concentrate on mechanisms for optimizing my environment for any given value, and security is a very important part of that.

Wouldn't that be a bad idea? If you change your mind as to what you value, then Future!you will optimize for something Present!you doesn't want. Since you're only worried about Present!you's goals, that would be bad.

Replies from: TheOtherDave

↑ comment by TheOtherDave · 2012-07-22T05:06:19.668Z · LW(p) · GW(p)

Sure, if I'm only worried about Present!me's goals, then the entire rest of the paragraph you didn't bother quoting is of course false, and the sentence you quote for which that paragraph was intended as context is also false.

Replies from: DanielLC

↑ comment by DanielLC · 2012-07-22T05:59:40.163Z · LW(p) · GW(p)

At one extreme: if

Sorry. I missed a word when I read it the first time.

↑ comment by torekp · 2012-07-21T23:51:27.050Z · LW(p) · GW(p)

I don't understand how your hypothetical beliefs of paragraph two differ from those of paragraph four. Or don't they? Please elaborate. Are you saying that Nick Szabo's position depends on (or at least is helped by) viewing one's later values as quite possibly better than current ones?

Replies from: TheOtherDave

↑ comment by TheOtherDave · 2012-07-22T03:24:28.206Z · LW(p) · GW(p)

What I'm referring to in paragraphs 2 and 4 are similar enough that what differences may exist between them don't especially matter to any point I'm making.

Are you saying that Nick Szabo's position depends on (or at least is helped by) viewing one's later values as quite possibly better than current ones?

No, and in fact I don't believe that. Better to say that, insofar as an important component of Friendliness research is working out ways to avoid value drift, the OP's preference for Friendliness research over security research is reinforced by a model of the world in which value drift is a problem to be avoided/fixed rather than simply a neutral feature of the world.
