Advice for AI makers

post by Stuart_Armstrong · 2010-01-14T11:32:04.429Z · LW · GW · Legacy · 211 comments

A friend of mine is about to launch himself heavily into the realm of AI programming. The details of his approach aren't important; probabilities dictate that he is unlikely to score a major success. He's asked me for advice, however, on how to design a safe(r) AI. I've been pointing him in the right directions and sending him links to useful posts on this blog and on the SIAI's site.

Do people here have any recommendations they'd like me to pass on? Hopefully, these may form the basis of a condensed 'warning pack' for other AI makers.

Addendum: Advice along the lines of "don't do it" is vital and good, but unlikely to be followed. Coding will almost certainly happen; is there any way of making it less genocidally risky?


comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-01-15T03:43:34.043Z · LW(p) · GW(p)

"And I heard a voice saying 'Give up! Give up!' And that really scared me 'cause it sounded like Ben Kenobi." (source)

Friendly AI is a humongous damn multi-genius-decade sized problem. The first step is to realize this, and the second step is to find some fellow geniuses and spend a decade or two solving it. If you're looking for a quick fix you're out of luck.

The same (albeit to a lesser degree) is fortunately also true of Artificial General Intelligence in general, which is why the hordes of would-be meddling dabblers haven't killed us all already.

Replies from: Wei_Dai, Psy-Kosh, Stuart_Armstrong
comment by Wei Dai (Wei_Dai) · 2010-01-16T04:35:13.921Z · LW(p) · GW(p)

This article (which I happened across today) written by Ben Goertzel should make interesting reading for a would-be AI maker. It details Ben's experience trying to build an AGI during the dot-com bubble. His startup company, Webmind, Inc., apparently had up to 130 (!) employees at its peak.

According to the article, the AGI was almost completed, and the main reason his effort failed was that the company ran out of money due to the bursting of the bubble. Together with the anthropic principle, this seems to imply that Ben is the person responsible for the stock market crash of 2000.

I was always puzzled why SIAI hired Ben Goertzel to be its research director, and this article only deepens the mystery. If Ben has done an Eliezer-style mind-change since writing that article, I think I've missed it.

ETA: Apparently Ben has recently been helping his friend Hugo de Garis build an AI at Xiamen University under a grant from the Chinese government. How do you convince someone to give up building an AGI when your own research director is essentially helping the Chinese government build one?

Replies from: timtyler, Wei_Dai, XiXiDu, outlawpoet
comment by timtyler · 2011-06-25T12:06:47.955Z · LW(p) · GW(p)

I was always puzzled why SIAI hired Ben Goertzel to be its research director, and this article only deepens the mystery.

Ben has a PhD, can program, has written books on the subject, and has some credibility. Those kinds of things can help a little if you are trying to get people to give you money in the hope that you will build a superintelligent machine. For more see here:

It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes. "Do you do any sort of code development? I'm not interested in supporting an organization that doesn't develop code" -> OpenCog -> nothing changes. "Eliezer Yudkowsky lacks academic credentials" -> Professor Ben Goertzel installed as Director of Research -> nothing changes. The one thing that actually has seemed to raise credibility, is famous people associating with the organization, like Peter Thiel funding us, or Ray Kurzweil on the Board.

comment by Wei Dai (Wei_Dai) · 2010-01-20T04:15:32.555Z · LW(p) · GW(p)

I just came across an old post of mine that asked a similar question:

BTW, I still remember the arguments between Eliezer and Ben about Friendliness and Novamente. As late as January 2005, Eliezer wrote:

And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant.

I'm curious how that debate was resolved?

From the reluctance of anyone at SIAI to answer this question, I conclude that Ben Goertzel being the Director of Research probably represents the outcome of some internal power struggle/compromise at SIAI, whose terms of resolution included the details of the conflict being kept secret.

What is the right thing to do here? Should we try to force an answer out of SIAI, for example by publicly accusing it of not taking existential risk seriously? That would almost certainly hurt SIAI as a whole, but might strengthen "our" side of this conflict. Does anyone have other suggestions for how to push SIAI in a direction that we would prefer?

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-01-20T04:25:05.988Z · LW(p) · GW(p)

The short answer is that Ben and I are both convinced the other is mostly harmless.

Replies from: Wei_Dai, Furcas, wedrifid
comment by Wei Dai (Wei_Dai) · 2010-01-20T04:36:07.457Z · LW(p) · GW(p)

Have you updated that in light of the fact that Ben just convinced the Chinese government to start funding AGI? (See my article link earlier in this thread.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-01-20T04:39:01.487Z · LW(p) · GW(p)

Hugo de Garis is around two orders of magnitude more harmless than Ben.

Replies from: Kevin, Wei_Dai, wedrifid
comment by Kevin · 2010-06-24T20:28:27.198Z · LW(p) · GW(p)

Update for anyone that comes across this comment: Ben Goertzel recently tweeted that he will be taking over Hugo de Garis's lab, pending paperwork approval.

http://twitter.com/bengoertzel/status/16646922609

http://twitter.com/bengoertzel/status/16647034503

comment by Wei Dai (Wei_Dai) · 2010-01-20T05:11:33.139Z · LW(p) · GW(p)

Hugo de Garis is around two orders of magnitude more harmless than Ben.

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

And what about the public relations/education aspect? It's harmless that SIAI appears to not consider AI to be a serious existential risk?

Replies from: wedrifid, Eliezer_Yudkowsky
comment by wedrifid · 2010-01-20T12:26:58.398Z · LW(p) · GW(p)

And what about the public relations/education aspect? It's harmless that SIAI appears to not consider AI to be a serious existential risk?

This part was not answered. It may be a question to ask someone other than Eliezer. Or just ask really loudly. That sometimes works too.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-01-20T06:37:00.739Z · LW(p) · GW(p)

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

The reverse seems far more likely.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-01-20T12:16:06.965Z · LW(p) · GW(p)

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

The reverse seems far more likely.

I don't know how to parse that. What do you mean by "the reverse"?

Replies from: wedrifid
comment by wedrifid · 2010-01-20T12:23:09.385Z · LW(p) · GW(p)

I don't know how to parse that. What do you mean by "the reverse"?

Ben's position at SIAI may reduce the expected amount of funding he obtains for other existentially risky persons.

comment by wedrifid · 2010-01-20T04:47:41.665Z · LW(p) · GW(p)

How much of this harmlessness is perceived impotence, and how much is an approximately sane way of thinking?

Replies from: Eliezer_Yudkowsky, XiXiDu
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-01-20T05:01:11.983Z · LW(p) · GW(p)

Wholly perceived impotence.

comment by XiXiDu · 2010-11-04T18:53:23.236Z · LW(p) · GW(p)

Do you believe the given answer? And if Ben is really that impotent, what do you think it reveals about the SIAI, or about whoever put Ben into a position within the SIAI?

Replies from: wedrifid
comment by wedrifid · 2010-11-04T19:00:43.685Z · LW(p) · GW(p)

Do you believe the given answer?

I don't know enough about his capabilities when it comes to contributing to unfriendly AI research to answer that. Being unable to think sanely about friendliness or risks may have little bearing on your capabilities with respect to AGI research. The two modes of thinking have very little to do with each other.

And if Ben is really that impotent, what do you think it reveals about the SIAI, or about whoever put Ben into a position within the SIAI?

That they may be more rational and less idealistic than I may otherwise have guessed. There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Replies from: ata, XiXiDu
comment by ata · 2010-11-04T20:48:36.375Z · LW(p) · GW(p)

That they may be more rational and less idealistic than I may otherwise have guessed. There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Indeed. I read part of this post as implying that his position had at least a little bit to do with gaining status from affiliating with him ("It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes. 'Do you do any sort of code development? I'm not interested in supporting an organization that doesn't develop code' -> OpenCog -> nothing changes. 'Eliezer Yudkowsky lacks academic credentials' -> Professor Ben Goertzel installed as Director of Research -> nothing changes.").

Replies from: wedrifid
comment by wedrifid · 2010-11-04T21:08:23.919Z · LW(p) · GW(p)

Indeed. I read this post as implying that his position had at least a little bit to do with gaining status from affiliating with him ("It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes...").

That's an impressive achievement! I wonder if they will be able to maintain it? I also wonder whether they will be able to distinguish those times when the objections are solid, not merely something to treat as PR concerns. There is a delicate balance to be found.

comment by XiXiDu · 2010-11-04T19:41:15.604Z · LW(p) · GW(p)

There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Does this suggest that founding a stealth AGI institute (to coordinate conferences and communication between researchers) might be a good way to oversee and influence potential undertakings that could lead to imminent high-risk situations?

By the way, I noticed from my server logs that the Institute for Defense Analyses seems to be reading LW. They visited my homepage, referred by my LW profile. So one should think about the consequences of discussing such matters in public, and of not doing so.

Replies from: Nick_Tarleton, wedrifid
comment by Nick_Tarleton · 2010-12-24T23:59:59.796Z · LW(p) · GW(p)

By the way, I noticed from my server logs that the Institute for Defense Analyses seems to be reading LW.

Most likely, someone working there just happens to.

comment by wedrifid · 2010-11-04T21:03:04.016Z · LW(p) · GW(p)

By the way, I noticed from my server logs that the Institute for Defense Analyses seems to be reading LW. They visited my homepage, referred by my LW profile. So one should think about the consequences of discussing such matters in public, and of not doing so.

Fascinating.

comment by Furcas · 2010-01-20T04:48:32.704Z · LW(p) · GW(p)

Can we know how you came to that conclusion?

comment by wedrifid · 2010-01-20T04:39:07.267Z · LW(p) · GW(p)

There is one 'mostly harmless' for people who you think will fail at AGI. There is an entirely different 'mostly harmless' for actually having a research director who tries to make AIs that could kill us all. Why would I not think the SIAI is itself an existential risk if the criteria for director recruitment are so lax? Being absolutely terrified of disaster is the kind of thing that helps ensure appropriate mechanisms to prevent defection are kept in place.

What is the right thing to do here? Should we try to force an answer out of SIAI, for example by publicly accusing it of not taking existential risk seriously?

Yes. The SIAI has to convince us that they are mostly harmless.

comment by XiXiDu · 2011-06-24T09:35:07.736Z · LW(p) · GW(p)

According to the article, the AGI was almost completed, and the main reason his effort failed was that the company ran out of money due to the bursting of the bubble. Together with the anthropic principle, this seems to imply that Ben is the person responsible for the stock market crash of 2000.

Phew...I was almost going to call bullshit on this but that would be impolite.

comment by outlawpoet · 2010-01-16T22:20:05.868Z · LW(p) · GW(p)

That is an excellent question.

comment by Psy-Kosh · 2010-01-15T05:16:06.671Z · LW(p) · GW(p)

And now for a truly horrible thought:

which is why the hordes of would-be meddling dabblers haven't killed us all already.

I wonder to what extent we've been "saved" so far by anthropics. Okay, that's probably not the dominant effect. I mean, yeah, it's quite clear that AI is, as you note, REALLY hard.

But still, I can't help but wonder just how much or how little of that effect is there.

Replies from: cousin_it
comment by cousin_it · 2010-01-18T13:48:04.987Z · LW(p) · GW(p)

If you think anthropics has saved us from AI many times, you ought to believe we will likely die soon, because anthropics doesn't constrain the future, only the past. Each passing year without catastrophe should weaken your faith in the anthropic explanation.

Replies from: satt, Houshalter
comment by satt · 2014-06-30T21:06:20.388Z · LW(p) · GW(p)

The first sentence seems obviously true to me, the second probably false.

My reasoning: to make observations and update on them, I must continue to exist. Hence I expect to make the same observations & updates whether or not the anthropic explanation is true (because I won't exist to observe and update on AI extinction if it occurs), so observing a "passing year without catastrophe" actually has a likelihood ratio of one, and is not Bayesian evidence for or against the anthropic explanation.
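
To make the selection effect concrete, here is a toy Monte Carlo sketch (in Python; the per-year hazard rates are made-up illustrative numbers, not estimates of anything):

    import random

    def fraction_surviving(p_catastrophe_per_year, years=10, trials=100_000):
        """Fraction of simulated histories with no catastrophe in `years` years."""
        survived = sum(
            all(random.random() > p_catastrophe_per_year for _ in range(years))
            for _ in range(trials)
        )
        return survived / trials

    # A "risky" world (10%/year chance of AI catastrophe) vs. a "safe" one (0.1%/year).
    print(fraction_surviving(0.10))   # roughly 0.35 of all histories survive 10 years
    print(fraction_surviving(0.001))  # roughly 0.99 of all histories survive 10 years

    # Counting all histories, survival is far more common in the safe world
    # (cousin_it's reading: each quiet year is evidence against high risk).
    # But every history that still contains an observer shows that observer the
    # same data - "N years, no catastrophe" - so whether the quiet years count
    # as evidence turns on whether you condition on your own existence first,
    # which is exactly what this sub-thread disputes.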

comment by Houshalter · 2013-10-01T23:40:15.033Z · LW(p) · GW(p)

Wouldn't the anthropic argument apply just as much in the future as it does now? The world not being destroyed is the only observable result.

Replies from: None
comment by [deleted] · 2013-10-02T00:50:06.180Z · LW(p) · GW(p)

The future hasn't happened yet.

Replies from: Houshalter
comment by Houshalter · 2013-10-02T02:32:47.281Z · LW(p) · GW(p)

Right. My point was that in the future you are still going to say "wow the world hasn't been destroyed yet" even if in 99% of alternate realities it was. cousin_it said:

Each passing year without catastrophe should weaken your faith in the anthropic explanation.

Which shouldn't be true at all.

If you can not observe a catastrophe happen, then not observing a catastrophe is not evidence for any hypothesis.

Replies from: nshepperd, wedrifid, None
comment by nshepperd · 2013-10-02T04:45:22.869Z · LW(p) · GW(p)

"Not observing a catastrophe" != "observing a non-catastrophe". If I'm playing russian roulette and I hear a click and survive, I see good reason to take that as extremely strong evidence that there was no bullet in the chamber.

Replies from: Houshalter
comment by Houshalter · 2013-10-02T06:19:51.188Z · LW(p) · GW(p)

But doesn't the anthropic argument still apply? Worlds where you survive playing Russian roulette are going to be ones where there wasn't a bullet in the chamber. You should expect to hear a click when you pull the trigger.

Replies from: nshepperd
comment by nshepperd · 2013-10-02T06:24:32.388Z · LW(p) · GW(p)

As it stands, I expect to die (p=1/6) if I play Russian roulette. I don't hear a click if I'm dead.

Replies from: Houshalter
comment by Houshalter · 2013-10-02T22:18:18.765Z · LW(p) · GW(p)

That's the point. You can't observe anything if you are dead, therefore any observations you make are conditional on you being alive.

Replies from: None
comment by [deleted] · 2014-06-28T06:40:59.438Z · LW(p) · GW(p)

Those universes where you die still exist, even if you don't observe them. If you carry your logic to its conclusion, there would be no risk to playing Russian roulette, which is absurd.

Replies from: shminux, Houshalter
comment by shminux · 2014-06-28T07:26:11.346Z · LW(p) · GW(p)

The standard excuse given by those who pretend to believe in many worlds is that you are likely to get maimed in the universes where you get shot but don't die, which is somewhat unpleasant. If you come up with a more reliable way to quantum suicide, like using a nuke, they find another excuse.

Replies from: None
comment by [deleted] · 2014-06-28T16:17:24.765Z · LW(p) · GW(p)

Methinks that is still a lack of understanding, or a disagreement on utility calculations. I myself would rate the universes where I die as lower utility still than those where I get injured (indeed the lowest possible utility).

Better still if in all the universes I don't die.

Replies from: DefectiveAlgorithm
comment by DefectiveAlgorithm · 2014-06-29T02:47:36.858Z · LW(p) · GW(p)

I do think 'a disagreement on utility calculations' may indeed be a big part of it. Are you a total utilitarian? I'm not. A big part of that comes from the fact that I don't consider two copies of myself to be intrinsically more valuable than one - perhaps instrumentally valuable, if those copies can interact, sync their experiences and cooperate, but that's another matter. With experience-syncing, I am mostly indifferent to the number of copies of myself to exist (leaving aside potential instrumental benefits), but without it I evaluate decreasing utility as the number of copies increases, as I assign zero terminal value to multiplicity but positive terminal value to the uniqueness of my identity.

My brand of utilitarianism is informed substantially by these preferences. I adhere to neither average nor total utilitarianism, but I lean closer to average. Whilst I would be against the use of force to turn a population of 10 with X utility each into a population of 3 with (X + 1) utility each, I would in isolation consider the latter preferable to the former (there is no inconsistency here - my utility function simply admits information about the past).

Replies from: None
comment by [deleted] · 2014-06-29T06:41:43.223Z · LW(p) · GW(p)

That line of thinking leads directly to recommending immediate probabilistic suicide, or at least indifference to it. No thanks.

Replies from: DefectiveAlgorithm
comment by DefectiveAlgorithm · 2014-06-29T07:28:26.002Z · LW(p) · GW(p)

How so?

comment by Houshalter · 2014-06-28T15:04:04.750Z · LW(p) · GW(p)

I'm saying that you can only observe not dying. Not that you shouldn't care about universes that you don't exist in or observe.

The risk in Russian roulette is that, in the worlds where you do survive, you will probably be lobotomized, or will drop the gun and shoot someone else, etc. Ignoring that, there is no risk. As long as you don't care about universes where you die.

Replies from: None
comment by [deleted] · 2014-06-28T16:18:58.080Z · LW(p) · GW(p)

As long as you don't care about universes where you die.

Ok. I find this assumption absolutely crazy, but at least I comprehend what you are saying now.

Replies from: Houshalter
comment by Houshalter · 2014-06-28T16:32:43.467Z · LW(p) · GW(p)

Well think of it this way. You are dead/non-existent in the vast majority of universes as it is.

Replies from: None
comment by [deleted] · 2014-06-28T17:11:42.102Z · LW(p) · GW(p)

How is that relevant? If I take some action that results in the death of myself in some other Everett branch, then I have killed a human being in the multiverse.

Think about applying your argument to this universe. You shoot someone in the head, they die instantly, and then you say to the judge "well think of it this way: he's not around to experience this. besides, there's other worlds where I didn't shoot him, so he's not really dead!"

Replies from: Houshalter
comment by Houshalter · 2014-06-28T23:22:32.767Z · LW(p) · GW(p)

You can't appeal to common sense. That's the point of quantum immortality: it defies our common-sense notions about death. Obviously, since we are used to assuming a single-threaded universe, where death is equivalent to ceasing to exist.

Of course, if you kill someone, you still cause that person pain in the vast majority of universes, as well as grieving to their family and friends.

If star-trek-style teleportation was possible by creating a clone and deleting the original, is that equivalent to suicide/murder/death? If you could upload your mind to a computer but destroy your biological brain, is that suicide, and is the upload really you? Does destroying copies really matter as long as one lives on (assuming the copies don't suffer)?

Replies from: None
comment by [deleted] · 2014-06-29T00:05:15.601Z · LW(p) · GW(p)

You absolutely appeal to common sense on moral issues. Morality is applied common sense, in the Minsky view of "common sense" as an assortment of deductions and inferences extracted from the tangled web of my personal experiential and computational history. Morality is the result of applying that common-sense knowledge base against possible actions in a planning algorithm.

Quantum "immortality" involves a sudden, unexpected, and unjustified redefinition of "death." That argument works if you buy the premise. But, I don't.

If you are saying that there is no difference between painlessly, instantaneously killing someone in one branch while letting them live in another, versus letting that person live in both, then I don't know how to proceed. If you're going to say that then you might as well make yourself indifferent to the arrow of time as well, in which case it doesn't matter if that person dies in all branches because he still "exists" in history.

Now I no longer know what we are talking about. According to my morality, it is wrong to kill someone. The existence of other branches where that person does not die does not have even epsilon difference on my evaluation of moral choices in this world. The argument from the other side seems inconsistent to me.

And yes, star trek transporters and destructive uploaders are death machines, a position I've previously articulated on lesswrong.

Replies from: Houshalter
comment by Houshalter · 2014-06-29T16:56:45.428Z · LW(p) · GW(p)

You are appealing to a terminal value that I do not share. I think caring about clones is absurd. As long as one copy of me lives, what difference does it make if I create and delete a thousand others? It doesn't change my experience or theirs. Nothing would change and I wouldn't even be aware of it.

Replies from: CCC
comment by CCC · 2014-06-30T09:52:35.524Z · LW(p) · GW(p)

From my point of view, I do not like the thought that I might be arbitrarily deleted by a clone of myself. I therefore choose to commit to not deleting clones of myself; thus preventing myself from being deleted by any clones that share that commitment.

comment by wedrifid · 2013-10-02T03:37:02.224Z · LW(p) · GW(p)

If you can not observe a catastrophe happen, then not observing a catastrophe is not evidence for any hypothesis.

I don't think this is quite true (it can redistribute probability between some hypotheses). But this strengthens your position rather than weakening it.

comment by [deleted] · 2013-10-02T03:20:31.155Z · LW(p) · GW(p)

Ok, correct.

Retracted: Not correct. What was I thinking? Just because you don't observe the universes where the world was destroyed, doesn't mean those universes don't exist.

comment by Stuart_Armstrong · 2010-01-15T10:20:19.055Z · LW(p) · GW(p)

That's the justification he gave me: he won't be able to make much of a difference to the subject, so he won't be generating much risk.

Since he's going to do it anyway, I was wondering whether there were safer ways of doing so.

comment by Vladimir_Nesov · 2010-01-14T16:55:34.312Z · LW(p) · GW(p)

For useful-tool AI, learn stuff from statistics and machine learning before making any further moves.

For self-improving AI, just don't do it as AI; FAI is not quite an AI problem, and anyway most techniques associated with "AI" don't work for FAI. Instead, learn fundamental math and computer science, to a good level -- that's my current best in-a-few-words advice for would-be FAI researchers.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-01-14T19:09:28.682Z · LW(p) · GW(p)

Isn't every AI potentially a self-improving AI? All it takes is for the AI to come upon the insight "hey, I can build an AI to do my job better." I guess it requires some minimum amount of intelligence for such an insight to become likely, but my point is that one doesn't necessarily have to set out to build a self-improving AI, to actually build a self-improving AI.

Replies from: Morendil
comment by Morendil · 2010-01-14T19:55:03.209Z · LW(p) · GW(p)

I'm very much out of touch with the AI scene, but I believe the key distinction is between Artificial General Intelligence, versus specialized approaches like chess-playing programs or systems that drive cars.

A chess program's goal structure is strictly restricted to playing chess, but any AI with the ability to formulate arbitrary sub-goals could potentially stumble on self-improvement as a sub-goal.

Replies from: JGWeissman, Wei_Dai
comment by JGWeissman · 2010-01-14T20:03:44.732Z · LW(p) · GW(p)

Additionally, the actions that a chess AI can consider and take are limited to moving pieces on a virtual chess board, and the consequences of such actions that it considers are limited to the state of the chess game, with no model of how the outside world affects the opposing moves other than the abstract assumption that the opponent will make the best move available. The chess AI simply does not have any awareness of anything outside the chess game.
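
In code, that closed world looks roughly like this (a minimal Python sketch; the game-state interface — legal_moves, apply, evaluate, is_terminal — is a set of hypothetical names, not any real engine's API):

    def minimax(state, depth, maximizing):
        """Score a position by searching `depth` plies ahead.

        The only actions considered are legal moves, the only consequences
        modelled are future board positions, and the opponent is represented
        solely by the assumption that it picks the move that is worst for us.
        Nothing outside the game state exists for this agent.
        """
        if depth == 0 or state.is_terminal():
            return state.evaluate()                 # static score of the board
        child_scores = [
            minimax(state.apply(move), depth - 1, not maximizing)
            for move in state.legal_moves()         # action space = chess moves
        ]
        return max(child_scores) if maximizing else min(child_scores)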

Replies from: wedrifid
comment by wedrifid · 2010-01-15T07:37:17.688Z · LW(p) · GW(p)

with no model of how the outside world affects the opposing moves other than the abstract assumption that the opponent will make the best move available.

A good chess AI would not be so constrained. A history of all chess games played by the particular opponent would be quite useful. As would his psychology.

Additionally, the actions that a chess AI can consider and take are limited to moving pieces on a virtual chess board

Is it worth me examining the tree beyond this particular move further? How long will it take me (metacognitive awareness...) relative to my time limit?

The chess AI simply does not have any awareness of anything outside the chess game.

Unless someone gives it such awareness, which may be useful in some situations or may just seem useful to naive developers who get their hands on more AGI research than they can safely handle.

Replies from: ChristianKl
comment by ChristianKl · 2010-01-16T18:30:42.140Z · LW(p) · GW(p)

A history of all chess games played by the particular opponent would be quite useful.

Such a history would also consist of a list of moves on a virtual chess board.

Unless someone gives them such awareness, which may be useful in some situations or may just seem useful to naive developers

If you are very naive it's unlikely that you understand the problem of AI well enough to solve it.

comment by Wei Dai (Wei_Dai) · 2010-01-14T20:25:36.330Z · LW(p) · GW(p)

Today's specialized AIs have little chance of becoming self-improving, but as specialized AIs adopt more advanced techniques (like the ones Nesov suggested), the line between specialized AIs and AGIs won't be so clear. After all, chess-playing and car-driving programs can always be implemented as AGIs with very specific and limited super-goals, so I expect that as AGI techniques advance, people working on specialized AIs will also adopt them, but perhaps without giving as much thought to the AI-foom problem.

Replies from: ChristianKl
comment by ChristianKl · 2010-01-16T18:34:09.376Z · LW(p) · GW(p)

I would think that specialization reduces the variant trees that the AI has to consider, which makes it unlikely that implementing AGI techniques would help the chess-playing program.

Replies from: MarkusRamikin
comment by MarkusRamikin · 2011-06-24T12:57:27.137Z · LW(p) · GW(p)

It is not clear to me that the AGI wouldn't (eventually) be able to do everything that a specialised program would (and more). After all, humans are a general intelligence and can specialise; some of us are great chess players, and if we stretch the word specialise, creating a chess AI also counts (it's a human effort to create a better optimisation process for winning chess).

So I imagine an AGI, able to rewrite its own code, would at the same time be able to develop the techniques of specialised AIs, while considering broader issues that might also be of use (like taking over the world/lightcone to get more processing power for playing chess). Just like humanity making chess machines, it could discover and implement better techniques (and if it breaks out of the box, hardware), something the chess programs themselves cannot do.

Or maybe I'm nuts. /layman ignoramus disclaimer/ but in that case I'd appreciate a hint at the error I'm making (besides being a layman ignoramus). :)

EDIT: scary idea, but an AGI with the goal of becoming better at chess might refrain from killing us only because chess is perhaps a problem that's generally soluble with finite resources.

comment by JamesAndrix · 2010-01-19T15:57:06.347Z · LW(p) · GW(p)

Create a hardware device that would be fatal to the programmer. Allow it to be activated by a primitive action that the program could execute. Give the primitive a high apparent utility. Code the AI however he wants.

If he gets cold sweats every time he does a test run, the rest of us will probably be OK.

comment by Roko · 2010-01-15T17:14:07.333Z · LW(p) · GW(p)

I suggest that working in the field of brain emulation is a way for anyone to actively contribute to safety.

If emulations come first, it won't take a miracle to save the human race; our existing systems of politics and business will generate a satisficing solution.

Replies from: timtyler, Vladimir_Nesov
comment by timtyler · 2010-01-15T23:10:36.852Z · LW(p) · GW(p)

I figure that would be slow, ineffectual and probably more dangerous than other paths in the unlikely case that it was successful.

Replies from: Roko
comment by Roko · 2010-01-15T23:20:20.805Z · LW(p) · GW(p)

you think that there's something more dangerous than the human race, who can't quite decide whether global warming should be mitigated against, trying to build an AI, where you have to get the answer pretty close to perfect first time, whilst also preventing all other groups from rushing to beat you and building uFAI?

Replies from: timtyler
comment by timtyler · 2010-01-15T23:35:10.056Z · LW(p) · GW(p)

I'm not sure that is a proper sentence.

I do think that we could build something more dangerous to civilization than the human race is at that time - but that seems like a rather obvious thing to think - and the fact that it is possible does not necessarily mean that it is likely.

Replies from: JamesAndrix
comment by JamesAndrix · 2010-01-17T00:21:19.826Z · LW(p) · GW(p)

Key Noun phrase: the human race,..., trying to build an AI,

Then: {description of difficulty of said activity}

I'm not sure it's proper either, but I'm sure you misparsed it.

Replies from: timtyler
comment by timtyler · 2010-01-17T01:45:51.655Z · LW(p) · GW(p)

Yay, that really helped!

Roko and I don't see eye to eye on this issue. From my POV, we have had 50 years of unsuccessful attempts. That is not exactly "getting it right the first time".

Google was not the first search engine, Microsoft was not the first OS maker - and Diffie and Hellman didn't invent public-key crypto.

Being first does not necessarily make players uncatchable - and there's a selection process at work in the meantime that weeds out certain classes of failures.

From my perspective, this is mainly a SIAI confusion. Because their funding is all oriented around the prospect of them saving the world from imminent danger, the execution of their mission apparently involves exaggerating the risks associated with that - which has the effect of stimulating funding from those whom they convince that DOOM is imminent - and that the SIAI can help to avert it.

Humans will most likely get the machines they want - because people will build them to sell them - and because people won't buy bad machines.

Replies from: Roko
comment by Roko · 2010-01-17T02:01:17.544Z · LW(p) · GW(p)

Tim, I think that what worries me is the "detailed reliable inheritance from human morals and meta-morals" bit. The worry that there will not be "detailed reliable inheritance from human morals and meta-morals" is robust to what specific way you think the future will go. Ems can break the inheritance. The first, second or fifteenth AGI system can break it. Intelligence enhancement gone wrong can break it. Any super-human "power" that doesn't explicitly preserve it will break it.

All the examples you cite differ in the substantive dimension: the failure of attempt number one doesn't preclude the success of attempt number two.

In the case of the future of humanity, the first failure to pass the physical representation of human morals and metamorals on to the next timeslice of the universe is game over.

Replies from: timtyler, timtyler
comment by timtyler · 2010-01-17T10:20:00.833Z · LW(p) · GW(p)

The other thing to say is that there's an important sense in which most modern creatures don't value anything - except for their genetic heritage - which all living things necessarily value.

Contrast with a gold-atom maximiser. That values collections of pure gold atoms. It cares about something besides the survival of its genes (which obviously it also cares about - no genes, no gold). It strives to leave something of value behind.

Most modern organisms don't leave anything behind - except for things that are inherited - genes and memes. Nothing that they expect to last for long, anyway. They keep dissipating energy gradients until everything is obliterated in high-entropy soup.

Those values are not very difficult to preserve - they are the default state.

If ecosystems cared about creating some sort of low-entropy state somewhere, then that property would take some effort to preserve (since it is vulnerable to invasion by creatures who use that low-entropy state as fuel). However, with the current situation, there aren't really any values to preserve - except for those of the replicators concerned.

The idea has been called variously: goal system zero, god's utility function, Shiva's values.

Even the individual replicators aren't really valued in themselves - except by themselves. There's a parliament of genes, and any gene is expendable, on a majority vote. Genes are only potentially immortal. Over time, the representation of the original genes drops. Modern refactoring techniques will mean it will drop faster. There is not really a floor to the process - eventually, all may go.

comment by timtyler · 2010-01-17T09:56:32.959Z · LW(p) · GW(p)

I figure a fair amount of modern heritable information (such as morals) will not be lost. Civilization seems to be getting better at keeping and passing on records. You pretty much have to hypothesize a breakdown of civilization for much of genuine value to be lost - an unprecedented and unlikely phenomenon.

However, I expect increasing amounts of it to be preserved mostly in history books and museums as time passes. Over time, that will probably include most DNA-based creatures - including humans.

Evolution is rather like a rope. Just as no strand in a rope goes from one end to the other, most genes don't tend to do that either. That doesn't mean the rope is weak, or that future creatures are not - partly - our descendants.

Replies from: Roko, Technologos
comment by Roko · 2010-01-17T11:45:06.195Z · LW(p) · GW(p)

And how do museums lead to more paperclips?

Replies from: timtyler
comment by timtyler · 2010-01-17T13:34:23.869Z · LW(p) · GW(p)

Museums have some paperclips in them. You have to imagine future museums as dynamic things that recreate and help to visualise the past - as well as preserving artefacts.

Replies from: orthonormal
comment by orthonormal · 2010-01-17T18:38:35.403Z · LW(p) · GW(p)

If you were an intelligence that only cared about the number of paperclips in the universe, you would not build a museum to the past, because you could make more paperclips with the resources needed to create such a museum.

This is not some clever, convoluted argument. This is the same as saying that if you make your computer execute

10: GOTO 20

20: GOTO 10

then it won't at any point realize the program is "stupid" and stop looping. You could even give the computer another program which is capable of proving that the first one is an infinite loop, but it won't care, because its goal is to execute the first program.

Replies from: timtyler
comment by timtyler · 2010-01-17T19:10:55.563Z · LW(p) · GW(p)

That's a different question - and one which is poorly specified:

If insufficient look-ahead is used, such an agent won't bother to remember its history - preferring instead the gratification of instant paperclips.

On the other hand, if you set the look-ahead further out, it will. That's because most intelligent agents are motivated to remember the past - since only by remembering the past can they predict the future.

Understanding the history of their own evolution may well help them to understand the possible forms of aliens - which might well help them avoid being obliterated by alien races (along with all the paper clips they have made so far). Important stuff - and well worth building a few museums over.

Remembering the past is thus actually a proximate goal for a wide range of agents. If you want to argue paperclip-loving agents won't build museums, you need to be much more specific about which paperclip-loving agents you are talking about - because some of them will.

Once you understand this you should be able to see what nonsense the "value is fragile" post is.

Replies from: orthonormal, kim0
comment by orthonormal · 2010-01-17T19:51:29.854Z · LW(p) · GW(p)

At this point, I'm only saying this to ensure you don't take any new LWers with you in your perennial folly, but your post has anthropomorphic optimism written all over it.

Replies from: timtyler
comment by timtyler · 2010-01-17T20:47:07.268Z · LW(p) · GW(p)

This has nothing to do with anthropomorphism or optimism - it is a common drive for intelligent agents to make records of their pasts - so that they can predict the consequences of their actions in the future.

Once information is lost, it is gone for good. If information might be valuable in the future, a wide range of agents will want to preserve it - to help them attain their future goals. These points do not seem particularly complicated.

I hope at least that you now realise that your "loop" analogy was wrong. You can't just argue that paperclipping agents will not have preserving the past in museums as a proximate goal - since their ultimate goal involves making paperclips. There is a clear mechanism by which preserving their past in museums might help them attain that goal in the long term.

A wide class of paperclipping agents who are not suffering from temporal myopia should attempt to conquer the universe before wasting precious time and resources on making any paperclips. Once the universe is securely in their hands - then they can get on with making paperclips. Otherwise they run a considerable risk of aliens - who have not been so distracted with useless trivia - eating them, and their paperclips. They will realise that they are in an alien race - and so they will run.

Replies from: Bo102010
comment by Bo102010 · 2010-01-18T00:12:14.658Z · LW(p) · GW(p)

Did you make some huge transgression that I missed that is causing people to get together and downvote your comments?

Edit: My question has now been answered.

Replies from: AdeleneDawner, wedrifid, ciphergoth, timtyler
comment by AdeleneDawner · 2010-01-18T00:17:06.534Z · LW(p) · GW(p)

I haven't downvoted, but I assume it's because he's conflating 'sees the value in storing some kinds of information' with 'will build museums'. Museums don't seem to be particularly efficient forms of data-storage, to me.

Replies from: timtyler
comment by timtyler · 2010-01-18T07:58:40.426Z · LW(p) · GW(p)

Future "museums" may not look exactly like current ones - and sure - some information will be preserved in "libraries" - which may not look exactly like current ones either - and in other ways.

Replies from: AdeleneDawner
comment by AdeleneDawner · 2010-01-18T08:15:32.550Z · LW(p) · GW(p)

'Museum' and 'library' both imply, to me at least, that the data is being made available to people who might be interested in it. In the case of a paperclipper, that seems rather unlikely - why would it keep us around, instead of turning the planet into an uninhabitable supercomputer that can more quickly consider complex paperclip-maximization strategies? The information about what we were like might still exist, but probably in the form of the paperclipper's 'personal memory' - and more likely than not, it'd be tagged as 'exploitable weaknesses of squishy things' rather than 'good patterns to reproduce', which isn't very useful to us, to say the least.

Replies from: timtyler
comment by timtyler · 2010-01-18T18:27:12.238Z · LW(p) · GW(p)

I see. We have different connotations of the word, then. For me, a museum is just a place where objects of historical interest are stored.

When I talked about humans being "preserved mostly in history books and museums" - I was intending to conjure up an institution somewhat like the Jurassic Park theme park. Or perhaps - looking further out - something like The Matrix. Not quite like the museum of natural history as it is today - but more like what it will turn into.

Regarding the utility of existence in a museum - it may be quite a bit better than not existing at all.

Regarding the reason for keeping objects of historical interest around - that is for much the same reason as we do today - to learn from them, and to preserve them for future generations to study. They may have better tools for analysing things in the future. If the objects of study are destroyed, future tools will not be able to access them.

comment by wedrifid · 2010-01-18T00:27:23.813Z · LW(p) · GW(p)

Did you make some huge transgression that I missed that is causing people to get together and downvote your comments?

Not really, just lots of little ones involving the misuse of almost valid ideas. They get distracting.

Replies from: timtyler, kim0
comment by timtyler · 2010-01-18T19:56:18.255Z · LW(p) · GW(p)

That's pretty vague. Care to point to something specific?

Replies from: wedrifid
comment by wedrifid · 2010-01-18T21:34:41.477Z · LW(p) · GW(p)

The direct ancestors are perhaps not the most illustrative examples but they will do. (I downvoted them on their perceived merit completely independently of the name.)

Replies from: timtyler
comment by timtyler · 2010-01-18T23:44:59.036Z · LW(p) · GW(p)

A pathetic example, IMHO. Those were perfectly reasonable comments attempting to dispel a poster's inaccurate beliefs about the phenomenon in question.

Replies from: wedrifid
comment by wedrifid · 2010-01-19T00:13:01.056Z · LW(p) · GW(p)

A pathetic example, IMHO.

Feel free to provide a better one.

Those were perfectly reasonable comments attempting to dispel a poster's inaccurate beliefs about the phenomenon in question.

I disagree. That was what you were trying to do. You aren't a troll; you are just quite bad at thinking, so your posts often get downvoted. This reduces the likelihood that you successfully propagate positions that are unfounded.

Clippy museums. Right.

Replies from: timtyler
comment by timtyler · 2010-01-19T00:25:58.192Z · LW(p) · GW(p)

Yet another vague accusation that is not worth replying to.

I'm getting bored with this pointless flamewar. I can see that the mere breath of dissent causes the community to rise up in arms to nuke the dissenter. Great fun for you folk, I am sure - but I can't see any good reason for me to play along with your childish games.

Replies from: wedrifid
comment by wedrifid · 2010-01-19T00:37:49.669Z · LW(p) · GW(p)

Yet another vague accusation that is not worth replying to.

It's really not. Nothing good can come of this exchange, least of all to you.

I'm getting bored with this pointless flamewar.

People ask questions. People get answers. You included.

I can see that the mere breath of dissent causes the community to rise up in arms to nuke the dissenter.

No, you're actually just wrong and absurdly so. Clippy doesn't need you for his museum.

Great fun for you folk, I am sure

It isn't wise for me to admit it but yes, there is a certain amount of satisfaction to be derived from direct social competition. I'm human, I'm male.

but I can't see any good reason for me to play along with your childish games.

I agree (without, obviously, accepting the label). You are better off sticking to your position and finding ways to have your desired influence that avoid unwanted social penalties.

Replies from: orthonormal
comment by orthonormal · 2010-01-19T05:26:58.359Z · LW(p) · GW(p)

It isn't wise for me to admit it but yes, there is a certain amount of satisfaction to be derived from direct social competition. I'm human, I'm male.

Upvoted for honesty. It's far better to be aware of it than not to be.

Anyhow, I think you don't really need to add anything more at this point; the thread looks properly wrapped up to me.

comment by kim0 · 2010-01-18T20:39:06.960Z · LW(p) · GW(p)

You got voted down because you were rational. You went over some people's heads.

These are popularity points, not rationality points.

Replies from: orthonormal
comment by orthonormal · 2010-01-18T21:02:22.458Z · LW(p) · GW(p)

That is something we worry about from time to time, but in this case I think the downvotes are justified. Tim Tyler has been repeating a particular form of techno-optimism for quite a while, which is fine; it's good to have contrarians around.

However, in the current thread, I don't think he's taking the critique seriously enough. It's been pointed out that he's essentially searching for reasons that even a Paperclipper would preserve everything of value to us, rather than just putting himself in Clippy's place and really asking for the most efficient way to maximize paperclips. (In particular, preserving the fine details of a civilization, let alone actual minds from it, is really too wasteful if your goal is to be prepared for a wide array of possible alien species.)

I feel (and apparently, so do others) that he's just replying with more arguments of the same kind as the ones we generally criticize, rather than finding other types of arguments or providing a case why anthropomorphic optimism doesn't apply here.

In any case, thanks for the laugh line:

You went over some people's heads.

My analysis of Tim Tyler in this thread isn't very positive, but his replies seem quite clear to me; I'm frustrated on the meta-level rather than the object-level.

Replies from: timtyler, ciphergoth, kim0
comment by timtyler · 2010-12-10T17:46:28.461Z · LW(p) · GW(p)

It's been pointed out that he's essentially searching for reasons that even a Paperclipper would preserve everything of value to us, rather than just putting himself in Clippy's place and really asking for the most efficient way to maximize paperclips.

I don't think that a paperclip maximiser would "preserve everything of value to us" in the first place. What I actually said at the beginning was:

TT: I figure a fair amount of modern heritable information (such as morals) will not be lost.

Not everything. Things are constantly being lost.

In particular, preserving the fine details of a civilization, let alone actual minds from it, is really too wasteful if your goal is to be prepared for a wide array of possible alien species.

What I said here was:

TT: it is a common drive for intelligent agents to make records of their pasts - so that they can predict the consequences of their actions in the future.

We do, in fact, have detailed information about how much our own civilisation is prepared to spend on preserving its own history. We preserve many things which are millions of years old - and which take up far more resources than a human. For example, see how this museum dinosaur dwarfs the humans in the foreground. We have many such exhibits - and we are still a planet-bound civilisation. Our descendants seem likely to have access to much greater resources - and so may devote a larger quantity of absolute resources to museums.

So: that's the basis of my estimate. What is the basis of your estimate?

comment by Paul Crowley (ciphergoth) · 2010-01-18T21:42:34.008Z · LW(p) · GW(p)

I agree with your criticism, but I doubt that good will come of replying to a comment like the one you're replying to here, I'm afraid.

Replies from: orthonormal
comment by orthonormal · 2010-01-18T22:01:48.428Z · LW(p) · GW(p)

Fair enough; I should have replied to Tim directly, but couldn't pass up the laugh-line bit.

comment by kim0 · 2010-01-18T23:39:30.434Z · LW(p) · GW(p)

The real dichotomy here is "maximising the evaluation function" versus "maximising the probability of a positive evaluation".

In paperclip making, or better, the game of Othello/Reversi, there are choices like this:

80% chance of winning 60-0, versus 90% chance of winning 33-31.
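
As a minimal numerical sketch of that choice (Python, using only the numbers above; the loss outcomes are simplified away):

    # The two options above as (probability of winning, final score if you win).
    option_a = (0.80, (60, 0))   # 80% chance of winning 60-0
    option_b = (0.90, (33, 31))  # 90% chance of winning 33-31

    def expected_margin(p_win, score):
        mine, theirs = score
        return p_win * (mine - theirs)   # crude: treats any loss as margin 0

    print(expected_margin(*option_a), expected_margin(*option_b))  # 48.0 vs 1.8
    print(option_a[0], option_b[0])                                # 0.80 vs 0.90

    # Maximising the evaluation function prefers the first option;
    # maximising the probability of a positive result prefers the second.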

The first maximises the size of the win, and is similar to a paperclip maker consuming the entire universe. The second maximises the probability of succeeding, and is similar to a paperclip maker avoiding being annihilated by aliens or other unknown forces.

Mathematically, the first is similar to finding the shortest program in Kolmogorov Complexity, while the second is similar to integrating over programs.

So, friendly AI is surely of the second kind, while insane AI is of the first kind.

Replies from: kim0, Benquo, MarkusRamikin
comment by kim0 · 2010-01-19T08:29:24.232Z · LW(p) · GW(p)

I guess you who downvoted me felt quite rational when doing so.

And this is precisely the reason I seldom post here, and only read a few posters that I know are rational from their own work on the net, not from what they write here:

There are too many fake rationalists here. The absence of any real arguments either way in response to my comment above is evidence of this.

My Othello/Reversi example above was easy to understand, and a very central problem in AI systems, so it should be of interest to real rationalists interested in AI, but there has been only a negative reaction instead, from people who, I guess, have not even made a decent game-playing AI, but who nevertheless have strong opinions on how such systems must be.

So, for getting intelligent rational arguments on AI, this community is useless, as opposed to Yudkowsky, Schmidhuber, Hansen, Tyler, etc., who have shown on their own sites that they have something to contribute.

To get real results in AI and rationality, I do my own math and science.

Replies from: GuySrinivasan
comment by GuySrinivasan · 2010-01-19T08:49:37.789Z · LW(p) · GW(p)

Your Othello/Reversi example is fundamentally flawed, but it may not seem like it unless you realize that at LW the tradition is to say that utility is linear in paperclips to Clippy. That may be our fault, but there's your explanation. "Winning 60-0", to us using our jargon, is equivalent to one paperclip, not 60. And "winning 33-31" is also equivalent to one paperclip, not 33. (or they're both equivalent to x paperclips, whatever)

So when I read your example, I read it as "80% chance of 1 paperclip, or 90% chance of 1 paperclip".

I'm sure it's very irritating to have your statement miscommunicated because of a jargon difference (paperclip = utility rather than f(paperclip) = utility)! I encourage you to post anyway, and begin with the assumption that we misunderstand you rather than the assumption that we are "fake rationalists", but realize that in the current environment (unfortunately or not, but there it is) the burden of communication is on the poster.

comment by Benquo · 2011-06-24T14:36:55.344Z · LW(p) · GW(p)

While most of this seems sensible, I don't understand how your last sentence follows. I have heard similar strategies suggested to reduce the probability of paperclipping, but it seems like if we actually succeed in producing a true friendly AI, the quantity it tries to maximize (expected winning, P(winning), or something else) will depend on how we evaluate outcomes.

comment by MarkusRamikin · 2011-06-24T11:51:21.226Z · LW(p) · GW(p)

This made some sense to me, at least to the point where I'd expect an intelligent refutation from disagreers, and seems posted in good faith. What am I missing about the voting system? Or about this post.

comment by Paul Crowley (ciphergoth) · 2010-01-18T08:53:13.239Z · LW(p) · GW(p)

Your use of "get together" brings to mind some sort of Less Wrong cabal who gathered to make a decision. This is of course the opposite of the truth, which is that each downvote is the result of someone reading the thread and deciding to downvote the comment. They're not necessarily uncorrelated, but "get together" is completely the wrong way to think about how these downvotes occur.

Replies from: Bo102010
comment by Bo102010 · 2010-01-18T13:16:27.506Z · LW(p) · GW(p)

Actually, that's what I was meaning to evoke. I read his recent comments, and while I didn't agree with all of them, I didn't find them to be in bad faith. I found it odd that so many of them would be at -3, and wondered if I missed something.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-18T13:20:58.986Z · LW(p) · GW(p)

In seriousness, why would you deliberately evoke a hypothesis that you know is wildly unrealistic? Surely whatever the real reasons for the downvoting pattern are, they are relevant to your enquiry?

Replies from: Bo102010
comment by Bo102010 · 2010-01-18T13:24:41.871Z · LW(p) · GW(p)

Perhaps "cabal who gathered to make a decision [to downvote]" is an overly ominous image.

However, we've seen cases where every one of someone's comments has been downvoted in a short span of time, which is clearly not the typical downvoting pattern.

That's the kind of thing I was asking about.

Replies from: RobinZ, ciphergoth
comment by RobinZ · 2010-01-18T16:00:54.692Z · LW(p) · GW(p)

It is possible the first downvote tends to attract further downvotes (by priming, for example), but an equally parsimonious explanation is that there are several people refreshing the comments page at a time and a subset of them dislike the content independently.

comment by Paul Crowley (ciphergoth) · 2010-01-18T13:41:21.696Z · LW(p) · GW(p)

But you can still be very confident that actual collusion wasn't involved, so you shouldn't be talking as if it might have been.

EDIT: as always I'm keen to know why the downvote - thanks! My current theory is that they come across as hostile, which they weren't meant to, but I'd value better data than my guesses.

comment by timtyler · 2010-01-18T08:03:06.003Z · LW(p) · GW(p)

One hypothesis is that people can't offer counter-arguments - but they don't like the conclusions - because they are contrary to the received wisdom. That creates cognitive dissonance in them - and they have to find an outlet.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-18T08:50:13.380Z · LW(p) · GW(p)

Really, never write comments proffering self-aggrandising explanations of why your comments are being badly received. You are way too smart and thoughtful to go all green ink on us like this.

Replies from: timtyler, timtyler
comment by timtyler · 2010-01-18T18:14:06.505Z · LW(p) · GW(p)

Hah! I had to look up http://en.wikipedia.org/wiki/Green_ink

I like my comment - and think it shows my sense of humour. If you were among those who were not amused, then sorry! ;-)

I do usually keep off karma-related sub-threads. They are mostly noise to me. However, here, I was asked a direct question.

Anyway, if people here who disagree with me can't be bothered to argue, I don't see how they will ever learn anything ;-)

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-18T22:52:05.884Z · LW(p) · GW(p)

It may amuse you to know that I am told by my lawyer partners that a really astonishing proportion of the crazy people who write in really do use green ink. It seems astonishing to imagine that there really could be a correlation between sanity and favoured ink colour.

Replies from: Wei_Dai, AdeleneDawner, David_Gerard
comment by Wei Dai (Wei_Dai) · 2010-01-18T23:07:24.316Z · LW(p) · GW(p)

Yikes, did anyone else notice the amount of green on this site?

Replies from: thomblake, Furcas
comment by thomblake · 2010-01-19T18:31:51.418Z · LW(p) · GW(p)

I was just noticing that myself. Withdrawing my request to change the color scheme.

comment by Furcas · 2010-01-18T23:15:56.889Z · LW(p) · GW(p)

We all write in black, though. :P

Mostly.

comment by AdeleneDawner · 2010-01-19T10:17:50.285Z · LW(p) · GW(p)

I should be astonished, but I'm not. A statistically unlikely proportion of the autistics I know have a strong preference for the color purple (and I don't think I know any NTs with that preference), so the idea that color preference is a function of neurotype doesn't seem too odd to me.

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2011-06-27T17:19:26.992Z · LW(p) · GW(p)

As an Asperger's person who loves purple (and dislikes green) this thread is quite amusing.

comment by David_Gerard · 2010-12-10T00:09:18.404Z · LW(p) · GW(p)

This is the first place I have ever seen anyone say that green ink writers use actual green ink. Added to RW!

comment by timtyler · 2010-01-18T19:42:04.738Z · LW(p) · GW(p)

Incidentally, I hope you don't mean the "self-aggrandising" / "green ink" comments literally!

Disagreeing with majorities is often a bad sign. Delusional individuals may create "green ink" explanations of why others are foolish enough to disagree with them. However, critics may also find themselves disagreeing with majorities - for example when in the company of the associates of those being criticised. That is fairly often my role here. I am someone not in the thrall of the prevailing reality distortion field. Under those circumstances disagreements do not have the same significance.

Replies from: RobinZ
comment by RobinZ · 2010-01-18T20:55:30.156Z · LW(p) · GW(p)

Disagreeing with majorities is often a bad sign. Delusional individuals may create "green ink" explanations of why others are foolish enough to disagree with them. **However, critics may also find themselves disagreeing with majorities - for example when in the company of the associates of those being criticised.** That is fairly often my role here. **I am someone not in the thrall of the prevailing reality distortion field.** Under those circumstances disagreements do not have the same significance.

The indicated sections are green ink - claims which are easy to make regardless of the rectitude of your opinion, and which therefore are made by fools with higher-than-normal frequency.

Replies from: timtyler, orthonormal, DWCrmcm
comment by timtyler · 2010-01-18T23:35:13.323Z · LW(p) · GW(p)

I recommend you check with http://en.wikipedia.org/wiki/Green_ink

Arguing that fools make statement X with greater-than-average frequency is a rather feeble argument that someone making statement X is a fool. I am not sure why you are even bothering to present it.

comment by orthonormal · 2010-01-18T21:07:33.556Z · LW(p) · GW(p)

Well, the first bold section is a true, general and relevant statement.

I won't say what my estimate of a person's rationality would be, given only the information that they had written the second bold section somewhere on the internet; but it wouldn't be 100% crank, either.

Replies from: RobinZ
comment by RobinZ · 2010-01-18T22:16:13.917Z · LW(p) · GW(p)

Well, the first bold section is a true, general and relevant statement.

That doesn't mean the ink isn't green. In this particular case, he is persistently claiming that his remarks are being attacked due to various sorts of biases on the part of those reading them, and he is doing so:

  • without detailed evidence, and
  • instead of either (a) clarifying his remarks or (b) dropping the subject.

That's green ink.

Edited for pronouns.

Edited for pronouns again, properly this time. Curse you, Picornaviridae Rhinovirus!

Replies from: timtyler, orthonormal
comment by timtyler · 2010-01-18T23:23:07.378Z · LW(p) · GW(p)

I think http://en.wikipedia.org/wiki/Green_ink makes it pretty clear that green ink is barely-coherent rambling coming from nutcases.

Someone disagreeing with other people and explaining why he thinks they are wrong is not "green ink" - unless that individual is behaving in a crazy fashion.

I don't think anyone has any evidence that my behaviour is anything other than rational and sane in this case. At any rate, so far no such evidence has been presented AFAICS. So: I think "green ink" is a fairly clear mis-characterisation.

Replies from: ciphergoth, RobinZ
comment by Paul Crowley (ciphergoth) · 2010-01-18T23:27:37.193Z · LW(p) · GW(p)

No, green ink covers a much wider span of writing than that. And honestly, no matter what disagreements you find yourself having with a group of people, and this would include circumstances where you were the only rationalist in a room full of crystal healers, you should never find yourself uttering the phrase "I am someone not in the thrall of the prevailing reality distortion field".

Replies from: timtyler
comment by timtyler · 2010-01-18T23:39:11.117Z · LW(p) · GW(p)

Um - why not?

I think that is just a difference of personalities.

If I am in a region where there's a reality distortion field in action, I don't necessarily avoid pointing that out for the sake of everyone's feelings - or for some other reason.

That would let the participants continue in their trance - and that might not be good for them, or others they interact with.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-01-19T01:23:37.212Z · LW(p) · GW(p)

You can point something out, but it is an act of petty sabotage to repeat the same statements over and over again with no apparent effect but the irritation of the public. Even if you are in fact right, and the other guys are lunatics.

Replies from: timtyler
comment by timtyler · 2010-01-19T07:16:25.595Z · LW(p) · GW(p)

Hi, Vladimir! I don't know what you are talking about, why you are bothering to say it - and nor do I much care.

comment by RobinZ · 2010-01-18T23:30:20.070Z · LW(p) · GW(p)

I have nothing to say at the moment regarding your actual argumentation upthread - what I am criticizing is your reaction to the downvoting et seq. I don't care what you call it: stop.

Replies from: timtyler
comment by timtyler · 2010-01-18T23:42:03.442Z · LW(p) · GW(p)

What was wrong with that?

Someone asked me why I was being downvoted.

I gave them my best hypothesis.

You want me to lie? You think my hypothesis was inaccurate? What exactly is the problem you have?

On the other hand, if you genuinely want me to stop defending my actions, it would help if people first stop attacking them - perhaps starting with you.

Replies from: RobinZ
comment by RobinZ · 2010-01-19T00:20:31.638Z · LW(p) · GW(p)

You are acting as if you are obviously correct. That is true far less often than you suppose. If you are not obviously correct, retaliating against the people who attacked you is counterproductive. Better is to expand your analysis or drop the subject altogether.

Replies from: timtyler
comment by timtyler · 2010-01-19T00:28:58.658Z · LW(p) · GW(p)

You choose to perpetuate the cycle. OK, then, I will drop out first. This thread has been dragged into the gutter - and I am not interested in following it down there. Bye.

comment by orthonormal · 2010-01-18T22:20:55.237Z · LW(p) · GW(p)

Confused about pronouns even after your edit: who is "you"? My remarks aren't being downvoted, so I assume "you" doesn't mean me. And you used "he" to refer to Tim Tyler, so I assume "you" doesn't mean him.

Replies from: RobinZ
comment by RobinZ · 2010-01-18T22:31:10.608Z · LW(p) · GW(p)

I apologize, I r dum.

comment by DWCrmcm · 2010-02-02T23:53:46.129Z · LW(p) · GW(p)

Yes I am mentally ill. I had disclosed this on my own website. I am not delusional. For the disabled, carelessly throwing around psychiatric terms when you are not a practicing psychiatrist is foul and abusive. I have very little formal education which is common among the mentally disabled. I eagerly await your hierarchical models for the complexity of "AI", and your elegant algorithms for its implementation. I imagine your benefactors are also eagerly waiting. Don't disappoint them or they may pull their wasted funding. I will continue hobbling along my "delusional" way, then after a while I will probably apply for the funding they are wasting on this place. Of course the algorithm is the key, and that I will not be publishing. Best of luck. May the best crazy win.

comment by kim0 · 2010-01-18T20:28:22.971Z · LW(p) · GW(p)

Yes. To me it seems like all arguments for the importance of friendly AI are based on the assumption that its moral evaluation function must be correct, or it will necessarily become evil or insane due to over-optimization of some weird aspect.

However, with uncertainty in the system, such as limited knowledge of the past or uncertainty about what the evaluation function is, optimization should take this into account and adopt strategies that keep its options open. In the paperclip example, this would mean avoiding making people into paperclips, because it suspects that the paperclips might be for people.

Mathematically, an AI going evil or insane corresponds to it optimizing only for the single most probable hypothesis, while pursuing multiple strategies corresponds to it integrating over the probabilities of different outcomes.
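
Roughly, in symbols (a sketch; here $\theta$ stands for the unknown evaluation function or world-model and $D$ for the AI's evidence): acting only on the single most probable hypothesis, versus integrating over the remaining uncertainty, is the difference between

$$a^* = \arg\max_a \, U\big(a, \hat\theta\big), \qquad \hat\theta = \arg\max_\theta \, p(\theta \mid D),$$

and

$$a^* = \arg\max_a \int U(a, \theta)\, p(\theta \mid D)\, d\theta.$$

The second form only avoids turning people into paperclips to the extent that $p(\theta \mid D)$ actually puts weight on hypotheses under which the paperclips are for people.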

Replies from: timtyler
comment by timtyler · 2010-01-18T23:49:47.421Z · LW(p) · GW(p)

I think the usual example assumes that the machine assigns a low probability to the hypothesis that paperclips are not the only valuable thing - because of how it was programmed.

comment by Technologos · 2010-01-17T21:16:23.519Z · LW(p) · GW(p)

an unprecedented and unlikely phenomenon

Possible precedents: the Library of Alexandria and the Dark Ages.

Replies from: timtyler
comment by timtyler · 2010-01-17T21:27:33.433Z · LW(p) · GW(p)

Reaching, though: the dark ages were confined to Western Europe - and something like the Library of Alexandria couldn't happen these days - there are too many libraries.

comment by Vladimir_Nesov · 2010-01-15T20:24:35.282Z · LW(p) · GW(p)

This doesn't deal with uFAI...

comment by CronoDAS · 2010-01-14T13:47:11.390Z · LW(p) · GW(p)

If you think you have an AI that might improve itself and act on the real world, don't run it.

Replies from: ciphergoth, JamesAndrix
comment by Paul Crowley (ciphergoth) · 2010-01-14T15:22:21.509Z · LW(p) · GW(p)

Strike "and act on the real world" - all AIs act on the real world.

Replies from: CronoDAS
comment by CronoDAS · 2010-01-14T21:59:41.170Z · LW(p) · GW(p)

I mean, act on the real world in a way more significant than your typical chess-playing program.

comment by JamesAndrix · 2010-01-14T19:44:53.155Z · LW(p) · GW(p)

This rules out FAI.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-14T23:02:58.096Z · LW(p) · GW(p)

Sure, this is advice along the lines of "don't design your own cipher".

Only more so.

Replies from: JamesAndrix, Stuart_Armstrong
comment by JamesAndrix · 2010-01-15T03:08:05.466Z · LW(p) · GW(p)

Generally wise, but in this case we need a cipher, we don't have one, and we will probably be handed a bad one in the future.

Our truisms need to be advice we would want everyone to follow.

Replies from: Nick_Tarleton
comment by Nick_Tarleton · 2010-01-15T03:30:05.197Z · LW(p) · GW(p)

We should encourage thinking about the intent (incoming) and expected effect (outgoing) of truisms, rather than their literal meaning. If either of the above injunctions actually doesn't apply to you, you'll know it.

Replies from: JamesAndrix
comment by JamesAndrix · 2010-01-15T07:20:42.291Z · LW(p) · GW(p)

My concern is you'll also 'know' it doesn't apply to you when it does. People write ciphers all the time.

Replies from: ciphergoth, Nick_Tarleton
comment by Paul Crowley (ciphergoth) · 2010-01-15T08:34:07.029Z · LW(p) · GW(p)

Yes, this is my concern too. However, anyone who posts to a newsgroup saying "I'm about to write my own cipher, any advice" should not do it. The post indicated someone who planned to actually start writing code; that's a definite sign that they shouldn't do it.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2010-01-15T10:31:30.709Z · LW(p) · GW(p)

See the addendum above; "don't do it" isn't likely to work.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-15T10:45:31.449Z · LW(p) · GW(p)

Even though it's unlikely to work, it is still the approach which minimizes risk; even a small reduction in their probability of going ahead will likely have a bigger effect than any other safety advice you can give, and any other advice will act against its efficacy.

comment by Nick_Tarleton · 2010-01-15T15:44:10.892Z · LW(p) · GW(p)

"Then they are fools and nothing can be done about it." In any case, this seems to be the opposite of the concern you were citing before.

Replies from: JamesAndrix, JamesAndrix
comment by JamesAndrix · 2010-01-15T16:30:46.413Z · LW(p) · GW(p)

If we use truisms that everyone knows have to be ignored by someone, it becomes easier to think they can be ignored by oneself.

comment by JamesAndrix · 2010-01-15T16:36:03.583Z · LW(p) · GW(p)

I reread the thread; I'm leaning towards your position now.

comment by Stuart_Armstrong · 2010-01-15T10:23:24.338Z · LW(p) · GW(p)

Entertainingly, he's entering the field from mathematical cryptography; so "don't design your own cipher" is precisely the wrong analogy to use here :-)

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-15T10:44:05.588Z · LW(p) · GW(p)

"mathematical cryptography"? What other sort of cryptography is there?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2010-01-15T13:10:38.969Z · LW(p) · GW(p)

It used to be the domain of the linguists... But you're correct; nowadays I'm using "mathematical cryptography" as shorthand for "y'know, like, real cryptography, not just messing around with symbols to impress your friends".

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-01-15T13:58:24.320Z · LW(p) · GW(p)

Ah, OK!

It's possible in that case that I may actually know your friend, if they happened to touch on some of the same parts of the field as me.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2010-01-15T14:14:00.218Z · LW(p) · GW(p)

No extra clues :-)

comment by blogospheroid · 2010-01-15T07:35:10.341Z · LW(p) · GW(p)

Due to the lack of details, it is difficult to make a recommendation, but here are some thoughts.

Both as an AGI challenge and for general human safety, business intelligence data warehouses are probably a good bet. Any pattern an AI detects that humans have missed could mean good money, which could feed back into more resources for the AI. Also, the ability of corporations to harm others doesn't increase significantly with a better business intelligence tool.

Virtual worlds - If the AI is tested in an isolated virtual world, that will be better for us. Test it in a virtual world that is completely unlike ours, a gas giant simulation maybe. Even if it develops extremely capable technology to deal with the gas giant environment within the simulation, it would mean very little in the real world except as a demonstration of intelligence.

Replies from: JamesAndrix, wedrifid, ChristianKl
comment by JamesAndrix · 2010-01-17T00:45:37.169Z · LW(p) · GW(p)

Virtual worlds don't buy you any safety, even if the AI can't break out of the simulator.

If you manage to make an AI, you've got a Really Powerful Optimization Process. If it has worked out the simulated physics and has access to its own source, it's probably smart enough to 'foom', even within the simulation. At which point you have a REALLY powerful optimizer, and no idea how to prove anything about its goal system. An untrustable genie.

Also, spending all those cycles on that kind of simulated world would be hugely inefficient.

Replies from: blogospheroid
comment by blogospheroid · 2010-01-17T19:22:59.468Z · LW(p) · GW(p)

James, you can't blame me for responding to the question. Stuart has said that advice on giving up will not be accepted. The question is how to minimise the fallout if a lucky stroke moves this guy's AI forward and it fooms. Both of my suggestions were aimed at that.

Replies from: JamesAndrix
comment by JamesAndrix · 2010-01-17T20:26:56.523Z · LW(p) · GW(p)

You are quite right.

comment by wedrifid · 2010-01-15T08:33:08.090Z · LW(p) · GW(p)

Virtual worlds - If the AI is tested in an isolated virtual world, that will be better for us. Test it in a virtual world that is completely unlike ours, a gas giant simulation maybe. Even if it develops extremely capable technology to deal with the gas giant environment within the simulation, it would mean very little in the real world except as a demonstration of intelligence.

You are giving a budding superintelligence exposure to a simulation based on our physics? It would work out the physics of the isolated virtual world, deduce from the traces you leave in the design that it is in a simulation, and make a good guess at what we believe to be the actual physics of our universe. Maybe even have a hunch about how we have our physics wrong. I would not want to bet our existence on it being unable to get out of that box.

Replies from: blogospheroid
comment by blogospheroid · 2010-01-16T10:01:15.542Z · LW(p) · GW(p)

My point with the virtual worlds was to put the AI into a simulation sufficiently unlike our world that it wouldn't be a threat and sufficiently like our world that we would be able to recognise what it does as intelligence. Hence the Gas giant example.

If we were to release an AI into one of today's simulations, like the Sims, which are much less granular than the one I have proposed in my post, then it would figure out that it is in a simulation much faster.

If we put it into some other kind of universe with weird physics, a magical universe let's say, then we will need to send someone intelligent in to do a considerable number of trials before we release the AI. This is to prove that whatever solutions the AI comes up with are genuinely intelligent and not something obvious.

I too agree that we wouldn't want to bet our existence on it being unable to get out of that box, but what evidence would we leave in the simulation pointing it to a "Press Red to talk to the simulator" button? Or to put it in even simpler terms, where in our universe is OUR "Press Red to talk to the simulator" button?

Replies from: Normal_Anomaly, wedrifid
comment by Normal_Anomaly · 2011-06-27T17:30:51.039Z · LW(p) · GW(p)

My point with the virtual worlds was to put the AI into a simulation sufficiently unlike our world that it wouldn't be a threat and sufficiently like our world that we would be able to recognise what it does as intelligence. Hence the Gas giant example.

I'm not sure I follow. Gas giants run on the same physics as you and me. Do you mean a world with actual different simulated physics?

comment by wedrifid · 2010-01-16T13:41:08.434Z · LW(p) · GW(p)

I too agree that we wouldn't want to bet our existence on it being unable to get out of that box, but what evidence would we leave in the simulation pointing it to a "Press Red to talk to the simulator" button?

I don't know. Who is going to be creating the simulation? How can I be comfortable that he will not make a bug, and that he can design a simulation whose artificiality a superintelligence cannot deduce? Proving that things way, way smarter than me couldn't know stuff is hard. Possible sometimes, but hard.

Or to put it in even simpler terms, where in our universe is OUR "Press Red to talk to the simulator" button?

The presence or absence of such a button in our universe provides some evidence about whether we could reliably create a simulation that is undetectable. But not that much evidence.

Replies from: ChristianKl
comment by ChristianKl · 2010-01-16T18:43:38.141Z · LW(p) · GW(p)

How would you design such a button? Reciting a fixed verse and afterwards stating what you want from the simulator seems like a good technique. A majority of the people on this earth believe that such a button exists in the form of prayer ;)

comment by ChristianKl · 2010-01-16T18:38:32.975Z · LW(p) · GW(p)

Additionally, the computer on which the virtual world runs shouldn't be directly connected to other computers, to prevent the AGI from escaping through some 0-day.

comment by Psychohistorian · 2010-01-14T18:36:06.890Z · LW(p) · GW(p)

This seems rather relevant - and suggests the answer is go watch more TV. Or, at least, I felt it really needed to be linked here, and this gave me the perfect opportunity!

Replies from: arbimote
comment by arbimote · 2010-01-18T11:23:47.287Z · LW(p) · GW(p)

Someone actually made a top-level post on this the other day. Just sayin'.

Replies from: RobinZ
comment by RobinZ · 2010-01-18T16:30:10.608Z · LW(p) · GW(p)

This comment and that post are actually within seventeen minutes of each other. I think Psychohistorian may be forgiven for not noticing dclayh.

Replies from: Psychohistorian
comment by Psychohistorian · 2010-01-19T05:38:36.023Z · LW(p) · GW(p)

I think Psychohistorian may be forgiven for not noticing dclayh.

That is odd; I distinctly recall posting this before the top-level.

Replies from: RobinZ
comment by RobinZ · 2010-01-19T05:40:52.935Z · LW(p) · GW(p)

That would be an even better excuse.

Edit: It occurs to me that the datestamp may correspond to the writing of a draft, not the time of publication.

comment by thomblake · 2010-01-14T16:02:59.685Z · LW(p) · GW(p)

There isn't really a general answer to "how to design a safe AI". It really depends what the AI is used for (and what they mean by AI).

For recursively self-improving AI, you've got your choice of "it's always bad", "You should only do it the SIAI way (and they haven't figured that out yet)", or "It's not a big deal, just use software best practices and iterate".

For robots, I've argued in the past that robots need to share our values in order to avoid squashing them, but I haven't seen anyone work this out rigorously. On a different tack altogether, Ron Arkin's Governing Lethal Behavior in Autonomous Robots is excellent and describes in detail how to make military robots that use lethality appropriately. In a household application, it's very difficult to see what sorts of actions might be problematic, but in a military application the main concern is making "aim the gun and fire" only happen when you really want it to.

For video games and the like, there's plenty of literature about the question, but not much to take seriously there.

comment by whpearson · 2010-01-14T11:57:00.063Z · LW(p) · GW(p)

If you want to design a complex, malleable AI and have some guarantees about what it will do (rather than just have it fail in some creative way), think of simple properties you can prove about your code, and then try to prove them using Coq or another theorem-proving system.

If you can't think of any properties that you want to hold for your system, think more.
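
As a minimal illustration of the kind of property this means (written in Lean rather than Coq purely for brevity; the function and names are invented for the example), take a toy component that is supposed never to exceed a step budget, together with a machine-checked proof that it doesn't:

```lean
-- Toy component: spend at most `budget` steps on `work`, never more.
def stepsUsed (budget work : Nat) : Nat :=
  if work ≤ budget then work else budget

-- The simple property we commit to up front: the budget is never exceeded.
theorem stepsUsed_le_budget (budget work : Nat) :
    stepsUsed budget work ≤ budget := by
  unfold stepsUsed
  split
  · assumption               -- case work ≤ budget: the result is work
  · exact Nat.le_refl budget -- case work > budget: the result is budget
```

Real AI code will not be three lines, but the discipline is the same: decide the invariant first, then structure the code so the proof stays tractable.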

comment by zero_call · 2010-01-16T20:29:05.950Z · LW(p) · GW(p)

For solving the Friendly AI problem, I suggest the following constraints for your initial hardware system:

  1. All outside input (and input libraries) is explicitly user-selected.
  2. No means for the system to take physical action (e.g., no robotic arms).
  3. No means for the system to establish unexpected communication (e.g., no radio transmitters).

Once this closed system has reached a suitable level of AI, the problem of making it friendly can be worked on much more easily and practically, and without risk of the world ending.

Starting out from the beginning to make an AGI friendly through some other means seems rather ambitious to me. Why not just work on AI now, make sure that the AI is suitably restricted when you're getting close to the goal, and then finally use the AI itself as an experimental testbed for "personality certification"?

(Can someone explain/link me to why this isn't currently espoused?)

Replies from: Technologos, timtyler, orthonormal
comment by Technologos · 2010-01-16T20:32:26.502Z · LW(p) · GW(p)

This is essentially the AI box experiment. Check out the link to see how even an AI that can only communicate with its handler(s) might be lethal without guaranteed Friendliness.

Replies from: Alicorn
comment by Alicorn · 2010-01-16T20:35:56.074Z · LW(p) · GW(p)

I don't think the publicly available details establish "how", merely "that".

Replies from: Technologos
comment by Technologos · 2010-01-16T20:56:23.420Z · LW(p) · GW(p)

Sure, though the mechanism I was referring to is "it can convince its handler(s) to let it out of the box through some transhuman method(s)."

Replies from: RobinZ
comment by RobinZ · 2010-01-16T21:03:28.455Z · LW(p) · GW(p)

Wait, since when is Eliezer transhuman?

Replies from: Technologos
comment by Technologos · 2010-01-16T21:23:54.233Z · LW(p) · GW(p)

Who said he was? If Eliezer can convince somebody to let him out of the box--for a financial loss no less--then certainly a transhuman AI can, right?

Replies from: RobinZ
comment by RobinZ · 2010-01-16T22:19:19.553Z · LW(p) · GW(p)

Certainly they can; what I am emphasizing is that "transhuman" is an overly strong criterion.

Replies from: Technologos
comment by Technologos · 2010-01-16T22:21:25.493Z · LW(p) · GW(p)

Definitely. Eliezer reflects perhaps a maximum lower bound on the amount of intelligence necessary to pull that off.

comment by timtyler · 2010-01-17T11:14:21.350Z · LW(p) · GW(p)

Didn't David Chalmers propose that here:

http://www.vimeo.com/7320820

...?

Test harnesses are a standard procedure - but they are not the only kind of test.

Basically, unless you are playing chess, or something, if you don't test in the real world, you won't really know if it works - and it can't do much to help you do important things - like raise funds to fuel development.

comment by orthonormal · 2010-01-16T23:57:56.482Z · LW(p) · GW(p)

I don't understand why this comment was downvoted.

Yes, zero call asks a question many of us feel has been adequately answered in the past; but they are asking politely, and it would have taken extensive archive-reading for them to have already known about the AI-Box experiment.

Think before you downvote, especially with new users!

EDIT: As AdeleneDawner points out, zero call isn't that new. Even so, the downvotes (at -2 when I first made my comment) looked more like signaling disagreement than anything else.

Replies from: Vladimir_Nesov, AdeleneDawner
comment by Vladimir_Nesov · 2010-01-17T10:56:24.174Z · LW(p) · GW(p)

I downvoted the comment not because of AI box unsafety (which I don't find convincing at the certainty level with which it's usually asserted -- disutility may well give weight to the worry, but not to the probability), but because it gives advice on the paint color for a spaceship at a time when Earth is still standing on a giant Turtle at the center of the world. It's not a sane kind of advice.

Replies from: orthonormal
comment by orthonormal · 2010-01-17T18:11:37.331Z · LW(p) · GW(p)

If I'd never heard of the AI-Box Experiment, I'd think that zero call's comment was a reasonable contribution to a conversation about AI and safety in particular. It's only when we realize that object-level methods of restraining a transhuman intelligence are probably doomed that we know we must focus so precisely on getting its goals right.

Replies from: blogospheroid
comment by blogospheroid · 2010-01-17T19:47:06.139Z · LW(p) · GW(p)

Vladimir and orthonormal,

Please point me to some more details about the AI box experiment, since I think what I suggested earlier as isolated virtual worlds is pretty much the same as what zero call is suggesting here.

I feel that there are huge assumptions in the present AI Box experiment. The gatekeeper and the AI share a language, for one, by which the AI convinces the gatekeeper.

If AGI is your only criterion, without regard to friendliness, just make sure not to communicate with the AI. Turing tests are not the only proofs of intelligence. If the AGI can come up with unique solutions in the universe in which it is isolated, that is enough to show that the algorithm is creative.

Replies from: AdeleneDawner
comment by AdeleneDawner · 2010-01-17T21:40:32.053Z · LW(p) · GW(p)

This just evoked a possibly-useful thought:

If observing but not communicating with a boxed AI does a good enough job of patching the security holes (which I understand it might not - that's for someone who better understands the issue to look at), perhaps putting an instance of a potential FAI in a contained virtual world would be useful as a test. It seems to me that an FAI that didn't have humans to start with would perhaps have to invent us, or something like us in some specific observable way(s), because of its values.

comment by AdeleneDawner · 2010-01-17T00:04:39.693Z · LW(p) · GW(p)

Good thought, but on further examination it turns out that zero isn't all that new - xe's been commenting since November; xyr karma is low because xe has been downvoted almost as often as upvoted.

comment by JamesAndrix · 2010-01-14T19:53:36.269Z · LW(p) · GW(p)

My current toy thinking along these lines is imagining a program that will write a program to solve the Towers of Hanoi, given only some description of the problem, and do nothing else, using only fixed computational resources for the whole thing.

I think that's safe, and would illustrate useful principles for FAI.

Replies from: Morendil, JenniferRM, JGWeissman
comment by Morendil · 2011-06-24T11:10:46.272Z · LW(p) · GW(p)

An earlier comment of mine on the Towers of Hanoi. (ETA: I mean earlier relative to the point in time when this thread was resurrected.)

Are you familiar with Hofstadter's work in "microdomains", such as Copycat et al.?

comment by JenniferRM · 2010-06-25T00:41:13.919Z · LW(p) · GW(p)

So.... you want to independently re-invent a prolog compiler?

Replies from: Blueberry, SilasBarta
comment by Blueberry · 2010-06-25T01:12:07.432Z · LW(p) · GW(p)

More like a program that takes

The object of this famous puzzle is to move N disks from the left peg to the right peg, using the center peg as an auxiliary holding peg. At no time can a larger disk be placed upon a smaller disk.

as input and returns the Prolog code as output.

comment by SilasBarta · 2010-06-25T01:16:46.058Z · LW(p) · GW(p)

What Blueberry said. The page you linked just gives the standard program for solving Towers of Hanoi. What JamesAndrix was imagining was a program that comes up with that solution, given just the description of the problem -- i.e., what the human coder did.

Replies from: aletheilia
comment by aletheilia · 2011-06-24T10:49:06.751Z · LW(p) · GW(p)

Well, this can actually be done (yes, in Prolog with a few metaprogramming tricks), and it's not really that hard - only very inefficient, i.e. feasible only for relatively small problems. See: Inductive logic programming.

Replies from: JamesAndrix
comment by JamesAndrix · 2011-06-25T08:00:18.149Z · LW(p) · GW(p)

No, not learning. And the 'do nothing else' parts can't be left out.

This shouldn't be a general automatic programming method, just something that goes through the motions of solving this one problem. It should already 'know' whatever principles lead to that solution. The outcome should be obvious to the programmer, and I suspect realistically hand-traceable. My goal is a solid understanding of a toy program exactly one meta-level above Hanoi.

This does seem like something Prolog could do well; if there is already a static program that does this, I'd love to see it.

comment by JGWeissman · 2010-01-14T20:14:06.119Z · LW(p) · GW(p)

Until you specify the format of a description of the problem, and how the program figures out how to write a program to solve the problem, it is hard to tell if this would be safe.

And if you don't know that it is safe, it isn't. Using some barrier like "fixed computational resources" to contain a non-understood process is a red flag.

Replies from: JamesAndrix
comment by JamesAndrix · 2010-01-14T20:52:35.523Z · LW(p) · GW(p)

The format of the description is something I'm struggling with, but I'm not clear how it impacts safety.

How the AI figures things out is up to the human programmer. Part of my intent in this exercise is to constrain the human to solutions they fully understand. In my mind my original description would have ruled out evolving neural nets, but now I see I definitely didn't make that clear.

By 'fixed computational resources' I mean that you've got to write the program such that if it discovers some flaw that gives it access to the internet, it will patch around that access because what it is trying to do is solve the puzzle of (solving the puzzle using only these instructions and these rules and this memory.)

What I'm looking for is a way to work on friendliness using goals that are much simpler than human morality, implemented by minds that are at least comprehensible in their operation, if not outright step-able.
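
A minimal sketch of the flavour of this (Python; everything here is invented for illustration, and it cheats by searching for a verified move plan directly rather than emitting a program, so it sits at the Hanoi level rather than one meta-level above it): the solver is given only the start state, the goal test, and the rules as a legality check, works under a hard node budget, and otherwise does nothing.

```python
# A toy, resource-bounded Hanoi solver: it knows only the start state, the goal
# test, and the move rules, runs under a hard node budget, and does nothing else.
from collections import deque

def legal_moves(state):
    """Yield (src, dst, new_state) for every rule-respecting single move.

    A state is a tuple of three tuples; each inner tuple lists the disk sizes
    on that peg from bottom to top, so only the last element may move.
    """
    for src in range(3):
        if not state[src]:
            continue
        disk = state[src][-1]
        for dst in range(3):
            if dst == src:
                continue
            # Rule: never place a larger disk on top of a smaller one.
            if state[dst] and state[dst][-1] < disk:
                continue
            pegs = list(state)
            pegs[src] = state[src][:-1]
            pegs[dst] = state[dst] + (disk,)
            yield src, dst, tuple(pegs)

def hanoi_plan(n_disks=3, budget=10_000):
    """Breadth-first search for a move plan, hard-capped at `budget` expansions.

    Returns a list of (src, dst) moves, or None if the budget runs out.
    """
    start = (tuple(range(n_disks, 0, -1)), (), ())
    goal = ((), (), tuple(range(n_disks, 0, -1)))
    frontier = deque([(start, [])])
    seen = {start}
    expanded = 0
    while frontier and expanded < budget:
        state, plan = frontier.popleft()
        expanded += 1
        if state == goal:
            return plan
        for src, dst, nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [(src, dst)]))
    return None  # Budget exhausted: give up rather than look for more resources.

if __name__ == "__main__":
    plan = hanoi_plan()
    print(len(plan), plan)  # 3 disks -> 7 moves
```

The budget check is what makes the "do nothing else" constraint concrete: when the budget runs out, the only permitted behaviour is to return None.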

comment by Peter_de_Blanc · 2010-01-14T14:07:46.713Z · LW(p) · GW(p)

Try to build an AI that:

  1. Implements a timeless decision theory.
  2. Is able to value things that it does not directly perceive, and in particular cares about other universes.
  3. Has a utility function such that additional resources have diminishing marginal returns.

Such an AI is more likely to participate in trades across universes, possibly with a friendly AI that requests our survival.

[EDIT]: It now occurs to me that an AI that participates in inter-universal trade would also participate in inter-universal terrorism, so I'm no longer confident that my suggestions above are good ones.

Replies from: byrnema, Blueberry
comment by byrnema · 2010-01-14T14:57:26.307Z · LW(p) · GW(p)

(Disclaimer: I don't know anything about AI.)

Is the marginal utility of resources something that you can input? It seems to me that since resources have instrumental value (pretty much, that's what a resource is by definition), their value would be something that has to be outputted by the utility function.

If you tried to input the value of resources, you'd run into difficulties with the meaning of resources. For example, would the AI distinguish "having resources" from "having access to resources" from "having access to the power of having access to resources"? Even if 'having resources' has negative utility for the AI, he might enjoy controlling resources in all kinds of ways in exchange for power to satisfy terminal values.

Even if you define power as a type of resource, and give that negative utility, then you will basically be telling the AI to enjoy not being able to satisfy his terminal values. (And yet, put that way, it does suggest some kind of friendly passive/pacifist philosophy.)

Replies from: Technologos
comment by Technologos · 2010-01-14T15:18:17.864Z · LW(p) · GW(p)

There is a difference between giving something negative utility and giving it decreasing marginal utility. It's sufficient to give the AI exponents strictly between zero and one for all terms in a positive polynomial utility function, for instance. That would be effectively "inputting" the marginal utility of resources, given any current state of the world.
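
Sketching the kind of utility function meant here (the symbols are illustrative, not a specific proposal): take

$$U(x_1, \dots, x_k) = \sum_i a_i\, x_i^{\,p_i}, \qquad a_i > 0, \quad 0 < p_i < 1,$$

where $x_i$ is the amount of the $i$-th resource or good. The marginal utility

$$\frac{\partial U}{\partial x_i} = a_i\, p_i\, x_i^{\,p_i - 1}$$

is always positive but strictly decreasing in $x_i$: more is always better, but each additional unit matters less than the last.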

Replies from: byrnema
comment by byrnema · 2010-01-14T15:27:43.572Z · LW(p) · GW(p)

There is a difference between giving something negative utility and giving it decreasing marginal utility.

I was considering the least convenient argument, the one that I imagined would result in the least aggressive AI. (I should explain here that I considered that even a 0 terminal utility for the resource itself would not result in 0 utility for that resource, because that resource would have some instrumental value in achieving things of value.)

(Above edited because I don't think I was understood.)

But I think the logical problem identified above with inputting the value of an instrumental value remains either way.

Replies from: Peter_de_Blanc
comment by Peter_de_Blanc · 2010-01-14T20:30:00.137Z · LW(p) · GW(p)

You pretty much have to guess about the marginal value of resources. But let's say the AI's utility function is "10^10th root of # of paperclips in universe." Then it probably satisfies the criterion.

EDIT: even better would be U = 1 if the universe contains at least one paperclip, otherwise 0.
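
Written out (the same example, just in symbols): with $n$ paperclips in the universe,

$$U(n) = n^{10^{-10}}, \qquad U'(n) = 10^{-10}\, n^{10^{-10} - 1} \approx \frac{10^{-10}}{n},$$

so the marginal value of extra paperclips falls off roughly like $1/n$; the edited version is the limiting case, where every paperclip after the first is worth exactly nothing.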

comment by Blueberry · 2010-01-14T17:54:30.032Z · LW(p) · GW(p)

Can you please elaborate on "trades across universes"? Do you mean something like quantum civilization suicide, as in Nick Bostrom's paper on that topic?

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-01-14T19:05:46.433Z · LW(p) · GW(p)

Here's Nesov's elaboration of his trading across possible worlds idea.

Personally, I think it's an interesting idea, but I'm skeptical that it can really work, except maybe in very limited circumstances such as when the trading partners are nearly identical.

Replies from: Blueberry
comment by Blueberry · 2010-01-15T17:33:43.792Z · LW(p) · GW(p)

Cool, thanks!

comment by Bugmaster · 2014-06-29T02:39:57.684Z · LW(p) · GW(p)

What does "AI programming" even mean ? If he's trying to make some sort of an abstract generally-intelligent AI, then he'll be wasting his time, since the probability of him succeeding is somewhere around epsilon. If he's trying to make an AI for some specific purpose, then I'd advise him to employ lots of testing and especially cross-validation, to avoid overfitting. Of course, if his purpose is something like "make the smartest killer drone ever", then I'd prefer him to fail...

comment by zero_call · 2010-01-17T19:52:13.644Z · LW(p) · GW(p)

I've read through the AI-Box experiment, and I can still say that I recommend the "sealed AI" tactic. The Box experiment isn't very convincing at all to me, which I could go into detail about, but that would require a whole post. But of course, I'll never develop the karma to do that because apparently the rate at which I ask questions of proper material exceeds the rate at which I post warm, fuzzy comments. Well, at least I have my own blog...

Replies from: orthonormal
comment by orthonormal · 2010-01-17T20:24:33.603Z · LW(p) · GW(p)

It looks like you're picking up karma relatively rapidly of late; it takes a while to learn the ways of speaking around here that don't detract from the content of one's comments, but once that happens, most people will accumulate karma reasonably quickly.

But since the AI-Box experiment has been discussed a bit here already, it might make sense to lay out your counterargument here or on the Open Thread for now. I know that's not as satisfying as making a post, but I think you'll still get quality discussion.

(Also, a top-level post on an old topic by a relative newcomer runs a risk of getting downvoted for redundancy if the argument recapitulates someone's old position— and post downvotes can kill your karma for a while. Caveat scriptor!)

P.S. Also on the topic, and quite interesting: That Alien Message.

Replies from: zero_call
comment by zero_call · 2010-01-17T21:16:38.556Z · LW(p) · GW(p)

I think God themselves just struck me with +20 karma somehow... thank ye almighty lords! Yeah, but indeed I will heed your advice and look into the issue more before posting.