Let's reimplement EURISKO!

post by cousin_it · 2009-06-11T16:28:06.637Z · LW · GW · Legacy · 165 comments

In the early 1980s Douglas Lenat wrote EURISKO, a program Eliezer called "[maybe] the most sophisticated self-improving AI ever built". The program reportedly had some high-profile successes in various domains, like becoming world champion at a certain wargame or designing good integrated circuits.

Despite requests, Lenat never released the source code. You can download an introductory paper: "Why AM and EURISKO appear to work" [PDF]. Honestly, reading it leaves a programmer still mystified about the internal workings of the AI: for example, what does the main loop look like? Such questions were supposedly answered in a more detailed publication: "EURISKO: A program that learns new heuristics and domain concepts", Artificial Intelligence 21 (1983): 61-98. I couldn't find that paper available for download anywhere, and being in Russia I found it quite tricky to get a paper version. Maybe you Americans will have better luck with your local library? And to the best of my knowledge no one ever succeeded in (or even seriously tried) confirming Lenat's EURISKO results.

Today in 2009 this state of affairs looks laughable. A 30-year-old pivotal breakthrough in a large and important field... that never even got reproduced. What if it was a gigantic case of Clever Hans? How do you know? You're supposed to be a scientist, little one.

So my proposal to the LessWrong community: let's reimplement EURISKO!

We have some competent programmers here, don't we? We have open source tools and languages that weren't around in 1980. We can build an open source implementation available for all to play. In my book this counts as solid progress in the AI field.

Hell, I'd do it on my own if I had the goddamn paper.
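To give would-be volunteers something concrete: from the secondary descriptions, the control structure seems to be an agenda of tasks ordered by estimated "worth", with heuristics that propose new tasks as a side effect of running old ones. Here's a minimal sketch of that loop in Python. Every name and detail below is my guess from the published accounts, not Lenat's code:

```python
import heapq

class Task:
    """A unit of work: apply some heuristic to some concept."""
    def __init__(self, worth, description, action):
        self.worth = worth          # estimated interestingness, e.g. 0-1000
        self.description = description
        self.action = action        # callable returning a list of new Tasks

    def __lt__(self, other):
        # heapq is a min-heap, so invert the comparison to pop highest worth
        return self.worth > other.worth

def run_agenda(initial_tasks, max_cycles=100):
    """EURISKO-style main loop (my reconstruction): repeatedly pop the
    most worthwhile task, run it, and push whatever new tasks its
    heuristics propose."""
    agenda = list(initial_tasks)
    heapq.heapify(agenda)
    for _ in range(max_cycles):
        if not agenda:
            break
        task = heapq.heappop(agenda)
        for new_task in task.action():
            heapq.heappush(agenda, new_task)
```

The interesting part, of course, is everything this sketch leaves out: where the worth estimates come from, and how heuristics modify other heuristics. That's exactly what the papers are vague about.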

Update: RichardKennaway has put Lenat's detailed papers up online, see the comments.

Comments sorted by top scores.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T18:56:41.895Z · LW(p) · GW(p)

This is a road that does not lead to Friendly AI, only to AGI. I doubt this has anything to do with Lenat's motives - but I'm glad the source code isn't published and I don't think you'd be doing a service to the human species by trying to reimplement it.

Replies from: orthonormal, cousin_it, Jonathan_Graehl, stcredzero, SilasBarta, loqi, Vladimir_Nesov, saturn, gRR, rwallace, CannibalSmith
comment by orthonormal · 2009-06-19T19:26:29.193Z · LW(p) · GW(p)

I doubt this has anything to do with Lenat's motives - but I'm glad the source code isn't published

A stunning proof-of-concept for AI, with the source code lost to the mists of time; then immersion in an apparently massive dead-end project. Is anyone else worried that Lenat might secretly have a phase 3 to his plan?

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-20T00:52:21.685Z · LW(p) · GW(p)

Seems too obvious.

Replies from: orthonormal
comment by orthonormal · 2009-06-20T01:50:44.453Z · LW(p) · GW(p)

Exactly.

comment by cousin_it · 2009-06-11T19:36:11.358Z · LW(p) · GW(p)

You may stop worrying for the moment. I just tried to wade through the papers RichardKennaway has put up, and it seems that reimplementing EURISKO, even given Lenat's most detailed descriptions of it, will likely be a big creative endeavor. Download them, read them and then tell me just one thing: do you now know (even approximately) what the main loop looks like, or not? Because I couldn't make it out in half an hour's reading.

comment by Jonathan_Graehl · 2009-06-12T00:45:48.907Z · LW(p) · GW(p)

Are you really afraid that AI is so easy that it's a very short distance between "ooh, cool" and "oh, shit"?

Replies from: Eliezer_Yudkowsky, loqi, None
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-12T03:44:40.466Z · LW(p) · GW(p)

Depends how cool. I don't know the space of self-modifying programs very well. Anything cooler than anything that's been tried before, even marginally cooler, has a noticeable subjective probability of going to shit. I mean, if you kept on making it marginally cooler and cooler, it'd go to "oh, shit" one day after a sequence of "ooh, cools" and I don't know how long that sequence is.

Replies from: Daniel_Burfoot
comment by Daniel_Burfoot · 2009-06-12T14:44:02.854Z · LW(p) · GW(p)

I mean, if you kept on making it marginally cooler and cooler, it'd go to "oh, shit" one day after a sequence of "ooh, cools" and I don't know how long that sequence is.

This means we should feel pretty safe, since AI does not appear to be making even incremental progress.

Really, it's hard for anyone who is well-versed in the "state of the art" of AI to feel any kind of alarm about the possibility of an imminent FOOM. Take a look at this paper. Skim through the intro, note the long and complicated reinforcement learning algorithm, and check out the empirical results section. The test domain involves a monkey in a 5x5 playroom. There are some fun little complications, like a light switch and a bell. Note that these guys are top-class (Andrew Barto basically invented RL), and the paper was published at one of the top-tier machine learning conferences (NIPS), in 2005.
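To put that in perspective, a playroom-scale domain fits in a page of code. Below is plain tabular Q-learning on a 5x5 grid. To be clear, this is the textbook baseline, not the intrinsic-motivation algorithm from the Barto paper, and all the details (reward placement, parameters) are made up for illustration:

```python
import random

# Toy 5x5 gridworld: start at (0,0), reward at (4,4).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    x, y = state
    dx, dy = action
    nx, ny = min(max(x + dx, 0), 4), min(max(y + dy, 0), 4)
    reward = 1.0 if (nx, ny) == (4, 4) else 0.0
    return (nx, ny), reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Vanilla tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated value
    for _ in range(episodes):
        state = (0, 0)
        while state != (4, 4):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            q[(state, action)] = (1 - alpha) * q.get((state, action), 0.0) \
                + alpha * (reward + gamma * best_next)
            state = nxt
    return q

def greedy_path_length(q):
    """Steps the learned greedy policy takes to reach the goal (capped)."""
    state, steps = (0, 0), 0
    while state != (4, 4) and steps < 50:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _ = step(state, action)
        steps += 1
    return steps
```

After a few hundred episodes the greedy policy finds its way across the grid. That's the scale of problem we're talking about, which is rather the point.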

Call me a denier, but I just don't think the monkey is going to bust out of his playroom and take over the world. At least, not anytime soon.

Replies from: AndrewH, MugaSofer
comment by AndrewH · 2009-06-13T21:59:19.671Z · LW(p) · GW(p)

Taking progress in AI to mean more real world effectiveness:

Intelligence seems to have jumps in real-world effectiveness: e.g. the brains of great apes and humans are very similar, yet the difference in effectiveness is obvious.

So concluding that we are fine because the state of the art isn't getting any more effective (not making progress) would be very dangerous. Perhaps tomorrow some team of AI researchers will combine the current state-of-the-art solutions in just the right way, resulting in a massive jump in real-world effectiveness? Maybe enough to cause an "oh, shit" moment?

Regardless of the time frame, if the AI community is working towards AGI rather than FAI, we will likely (eventually) have an AI go FOOM or, at the very least, an "oh, shit" moment (I'm not sure if they are equivalent).

comment by MugaSofer · 2013-04-25T13:10:38.315Z · LW(p) · GW(p)

This means we should feel pretty safe, since AI does not appear to be making even incremental progress.

Zing!

Also, good point, but this post is designed to produce such progress, is it not?

comment by loqi · 2009-06-12T02:44:22.903Z · LW(p) · GW(p)

Does the specific distance even matter? UFAI vs FAI is zero sum, and we have no idea how long FAI will take us. Any progress toward AGI that isn't "matched" by progress toward FAI is regressive, even if AGI is still 100 years off.

comment by [deleted] · 2009-06-12T01:28:25.962Z · LW(p) · GW(p)

I am.

comment by stcredzero · 2009-06-14T13:49:17.954Z · LW(p) · GW(p)

I just site-Googled lesswrong.org and overcomingbias.com, and my best guess so far is that AGI = Artificial General Intelligence. (Augmented?) Is that what AGI stands for? Non-site-specific Google results are definitely not the applicable ones. Apparently it has been used as an abbreviation in this community for quite a while, but finding out what it stands for takes a few jumps. (Should there be an easily found FAQ? One is not obvious here or on the wiki. Or is a certain degree of obscurity desired?)

Replies from: Vladimir_Nesov, Psy-Kosh
comment by Vladimir_Nesov · 2009-06-14T15:04:33.638Z · LW(p) · GW(p)

It's Artificial General Intelligence; I added a redirect and a stub page on the wiki.

comment by Psy-Kosh · 2009-06-14T15:04:07.668Z · LW(p) · GW(p)

Yeah, AGI would be Artificial General Intelligence, where the term "General" is to contrast it with much of current AI work which, well, isn't.

ie, a human, for instance, can figure out how to play an instrument, do deep number theory, build an engine, etc etc etc etc...

comment by SilasBarta · 2009-06-11T19:18:20.717Z · LW(p) · GW(p)

Security By Obscurity: When you can't be bothered to implement a real solution!(tm)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T20:20:14.535Z · LW(p) · GW(p)

And what, pray tell, does a "real solution" look like?

Replies from: SilasBarta
comment by SilasBarta · 2010-07-14T15:51:27.011Z · LW(p) · GW(p)

Seeing the recent thread necromancy, it looks like this is a much more important question than I realized at the time, since it bears on AI-related existential risk.

The question, to summarize, was, "How exactly do you keep a good AGI prospect from being developed into unFriendly AGI, other than Security by Obscurity?"

My answer is that SbO (Security by Obscurity, not Antimony(II) Oxide, which doesn't even exist) is not a solution here for the same reason it's criticized everywhere else (which I assume is that it increases the probability of a rogue outsmarting mainstream researchers). Better to let the good guys be as well informed as the bad guys so they can deploy countermeasures (their own AGI) when the bad guys develop theirs.

But then, I haven't researched this 24/7 for the last several years, so this may be too trite a dismissal.

comment by loqi · 2009-06-11T19:49:20.091Z · LW(p) · GW(p)

This is a road that does not lead to Friendly AI, only to AGI.

Lazy question: Did you explain this in an OB post? I have a crappy mental index of the topics you covered, and Google didn't yield any immediate results, just an explanation of why Eurisko didn't foom. If not, I'd love to see a post explaining how you're extrapolating the Eurisko approach into something incompatible with FAI. Or are you just applying the general rule, "AGI that's not explicitly friendly is explicitly unfriendly"?

Replies from: thomblake
comment by thomblake · 2009-06-11T20:07:24.383Z · LW(p) · GW(p)

AGI that's not explicitly friendly is explicitly unfriendly

Yeah, he means that. "Please don't work on AGI until you've worked out FAI."

ETA: read Eliezer's reply.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T20:27:01.238Z · LW(p) · GW(p)

Not exactly, Thom. Roughly, for FAI you need precise self-modification. For precise self-modification, you need a precise theory of the intelligence doing the self-modification. To get to FAI you have to walk the road that leads to precise theories of intelligence - something like our present-day probability theory and decision theory, but more powerful and general and addressing issues these present theories don't.

Eurisko is the road of self-modification done in an imprecise way, ad-hoc, throwing together whatever works until it gets smart enough to FOOM. This is a path that leads to shattered planets, if it were followed far enough. No, I'm not saying that Eurisko in particular is far enough, I'm saying that it's a first step along that path, not the FAI path.

Replies from: derekz, Vladimir_Nesov, John_Maxwell_IV, SoullessAutomaton, thomblake
comment by derekz · 2009-06-11T21:52:41.570Z · LW(p) · GW(p)

Perhaps a writeup of what you have discovered, or at least surmise, about walking that road would encourage bright young minds to work on those puzzles instead of reimplementing Eurisko.

It's not immediately clear that studying and playing with specific toy self-referential systems won't lead to ideas that might apply to precise members of that class.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T23:01:56.975Z · LW(p) · GW(p)

I've written up some of the concepts of precise self-modification, but need to collect the posts on a Wiki page on "lawfulness of intelligence" or something.

Replies from: Kevin
comment by Kevin · 2010-01-23T09:01:55.207Z · LW(p) · GW(p)

Any of these posts ever go up?

Replies from: Zack_M_Davis
comment by Vladimir_Nesov · 2009-06-11T20:47:34.204Z · LW(p) · GW(p)

Well, in this sense computing is also a first step on that path, Moore's law of mad science and all. Eurisko in particular doesn't seem to deserve more mention than that.

Replies from: None
comment by [deleted] · 2009-06-12T01:33:14.750Z · LW(p) · GW(p)

Doesn't seem to deserve more mention than the creation of computing? Sure. But computing has already been created.

Replies from: derekz
comment by derekz · 2009-06-12T02:12:52.391Z · LW(p) · GW(p)

Um, so has Eurisko.

Replies from: None
comment by [deleted] · 2009-06-12T02:33:30.138Z · LW(p) · GW(p)

...indeed. It seems that I failed to figure out just what I was arguing against. Let me re-make that point.

As far as first steps along that path go, they have already been taken: we have gone from a world without computers to a world with one, and we can't reverse that. The logical place to focus our efforts would seem to be the next step which has not been taken, which could very well be reimplementing EURISKO. (Though it could also very well be running a neural net on a supercomputer or some guy making the video game "Operant Conditioning Hero".)

Replies from: steven0461
comment by steven0461 · 2009-06-12T03:26:00.462Z · LW(p) · GW(p)

We have gone from a world without dictators to a world with one, and we can't reverse that. The logical place to focus our efforts would seem to be the next step which has not been taken, which could very well be resurrecting Hitler.

Replies from: thomblake, Nominull, None
comment by thomblake · 2009-06-12T15:11:54.842Z · LW(p) · GW(p)

Seriously? I did not think a discussion of Eurisko could be Godwinned. Bravo.

Replies from: steven0461, steven0461
comment by steven0461 · 2009-06-12T19:47:13.161Z · LW(p) · GW(p)

While grandparent was probably a miscalculation of some sort, I feel that mentioning Hitler is more acceptable if the context is Nazi super science than outrage maximization.

comment by steven0461 · 2009-06-12T19:18:51.380Z · LW(p) · GW(p)

Grandparent was probably a miscalculation of some sort, but I think mention of Hitler is acceptable if the context is Nazi super science rather than outrage maximization.

comment by Nominull · 2009-06-12T16:20:06.583Z · LW(p) · GW(p)

Resurrecting Hitler would probably teach us a lot about medicine, actually. If we can generalize the process by which we resurrect Hitler, we could save a lot of lives.

comment by [deleted] · 2009-06-12T03:54:23.615Z · LW(p) · GW(p)

True, if resurrecting Hitler is a good idea and we can cause it to happen; if resurrecting Hitler is inevitable and we can ensure that he ends up being a good guy; or if resurrecting Hitler would be bad and we can prevent it from happening.

comment by John_Maxwell (John_Maxwell_IV) · 2009-06-12T03:35:14.925Z · LW(p) · GW(p)

Do you suppose that developing a FAI will require at least some experience trying whatever works? I don't know of any major computer programs that were written entirely before they were first compiled...

Edit: I see SoullessAutomaton has written a very similar comment.

comment by SoullessAutomaton · 2009-06-11T20:37:59.639Z · LW(p) · GW(p)

But can we learn anything useful for a complete theory of intelligence based on something like EURISKO? Sure, it's an ad hoc, throw things at the wall and see what sticks approach--but so are our brains, and if something like EURISKO can show limited, non-foomy levels of optimization power it would at least provide another crappy data point other than vertebrate brains on how intelligence works.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-11T20:53:46.005Z · LW(p) · GW(p)

I used to think it's useful to study ad-hoc attempts at AGI, but it now seems to me that knowledge of these chaotic things is both very likely a dead end, even for destroying the world, and of the wrong character for progress towards FAI.

Replies from: loqi, SoullessAutomaton, JulianMorrison
comment by loqi · 2009-06-11T21:37:58.080Z · LW(p) · GW(p)

I think one of the factors that contributes to interest in ad-hoc techniques is the prospect of a "thrilling discovery". One is allowed to fantasize that all of their time and effort may pay off suddenly and unpredictably, which makes the research seem that much more fun and exciting. This is in contrast to a more formal approach in which understanding and progress are incremental by their very nature.

I bring this up because I see it as a likely underlying motive for arguments of the form "ad-hoc technique X is worth pursuing even though it's not a formal approach".

Replies from: Annoyance, Vladimir_Nesov
comment by Annoyance · 2009-06-12T16:47:24.252Z · LW(p) · GW(p)

There are two kinds of scientific progress: the methodical experimentation and categorization which gradually extend the boundaries of knowledge, and the revolutionary leap of genius which redefines and transcends those boundaries. Acknowledging our debt to the former, we yearn nonetheless for the latter. - Academician Prokhor Zakharov, "Address to the Faculty"

Replies from: Roko
comment by Roko · 2009-06-12T21:40:39.299Z · LW(p) · GW(p)

Upvoted for the Alpha Centauri reference. God I love that game!

comment by Vladimir_Nesov · 2009-06-11T21:55:32.944Z · LW(p) · GW(p)

No, it actually looks (just barely) feasible to get a FOOM out of something ad-hoc, and there are even good reasons for expecting that. But it doesn't seem to be on the way towards deeper understanding. The best to hope for is catching it right when the FOOMing is imminent and starting to do serious theory, but the path of blind experimentation doesn't seem to be the optimal one even towards a blind FOOM.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T23:02:30.149Z · LW(p) · GW(p)

That doesn't contradict what loqi said. It could still be a motive.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-11T23:07:18.886Z · LW(p) · GW(p)

It could, but it wouldn't be an invalid motive, as I (maybe incorrectly) heard implied.

Replies from: loqi
comment by loqi · 2009-06-12T00:26:47.362Z · LW(p) · GW(p)

I didn't mean to imply it was an invalid motive, merely a potential underlying motive. If it is valid in the sense that you mean (and I think it is), that's just reason to scrutinize such claims even more closely.

comment by SoullessAutomaton · 2009-06-11T21:07:32.068Z · LW(p) · GW(p)

What changed your mind?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-11T21:20:07.383Z · LW(p) · GW(p)

Starting to seriously think about FAI and studying more rigorous system modeling techniques/theories changed my mind. There seems to be very little overlap between wild intuitions of ad-hoc AGI and technical challenges of careful inference/simulation or philosophical issues with formalizing decision theories for intelligence on overdrive.

Some of the intuitions from thinking about ad-hoc systems seem to carry over, but they are just that: intuitions. Understanding of approaches to more careful modeling, even if they are applicable only to "toy" applications, gives deeper insight than knowledge of a dozen "real projects". Intuitions gained from ad-hoc work do apply, but only as naive, clumsy caricatures.

comment by JulianMorrison · 2009-06-15T12:45:03.880Z · LW(p) · GW(p)

Ad hoc AI is like ad hoc aircraft design. It flaps, it's got wings, it has to fly, right? If we keep trying stuff, we'll stumble across a wing that works. Maybe it's the feathers?

Replies from: randallsquared, Vladimir_Nesov
comment by randallsquared · 2009-06-16T13:41:12.803Z · LW(p) · GW(p)

Since such aircraft design actually worked, and produced aeroplanes before pure theory-based design, perhaps it's not the best analogy. [Edit: Unless that was your point]

comment by Vladimir_Nesov · 2009-06-15T13:40:15.303Z · LW(p) · GW(p)

There are multiple concepts in the potential of ad-hoc. There is Strong AI, Good AI (Strong AI that has a positive effect), and Useful AI (Strong AI that can be used as a prototype or inspiration for Good AI, but can go Paperclip maximizer if allowed to grow). These concepts can be believed to be in quite different relations to each other.

Your irony suggests that there is no potential for any Strong AI in ad-hoc. Given that stupid evolution managed to get there, I think that with enough brute force of technology it's quite feasible to get to Strong AI via this road.

Many reckless people working on AGI think that Strong AI is likely to also be a Good AI.

My previous position was that ad-hoc gives a good chance (in the near future) for Strong AI that is likely a Useful AI, but unlikely a Good AI. My current position is that ad-hoc has a small but decent chance (in the near future) for Strong AI, that is unlikely to be either Useful AI or Good AI.

Replies from: JulianMorrison
comment by JulianMorrison · 2009-06-15T13:59:08.344Z · LW(p) · GW(p)

BTW, none of the above classifications are "friendly".

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-15T14:09:55.770Z · LW(p) · GW(p)

Good AI is a category containing Friendly AI, that doesn't require the outcome to be precisely right. This separates more elaborated concept of Friendly AI from an informal concept (requirement) of good outcome.

I believe the concepts are much closer than it seems; that is, it's hard to construct an AI that is not precisely Friendly, but still Good.

Replies from: JulianMorrison
comment by JulianMorrison · 2009-06-15T20:19:31.263Z · LW(p) · GW(p)

FAI is about being reliably harmless. Whether the outcome seems good in the short term is tangential. Even a "good" AI ought to be considered unfriendly if it's opaque to proof - what can you possibly rely upon? No amount of demonstrated good behavior can be trusted. It could be insincere, it could be sincere but fatally misguided, it could have a flaw that will distort its goals after a few recursions. We would be stupid to just run it and see.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-15T21:01:42.815Z · LW(p) · GW(p)

At which point you are starting to think of what it takes to make not just informally "Good" AI, but an actually Friendly AI.

comment by thomblake · 2009-06-12T14:32:49.195Z · LW(p) · GW(p)

Right.

That's what I had in mind, though I didn't state it explicitly. It's what I meant by 'worked out'. It's clear that you want these things worked out formally, as strong as being provably friendly.

I'm still skeptical on the world-destroying. My money's on chaos to FOOM. Dynamism FTW. But then, I think AGI will come from robots.

comment by Vladimir_Nesov · 2009-06-11T20:36:07.681Z · LW(p) · GW(p)

Why are you still thinking it has any potential? (Assuming it doesn't, the above comment sounds ridiculous.)

comment by saturn · 2009-06-11T19:47:37.006Z · LW(p) · GW(p)

Does this imply that you think there's a significant risk of EURISKO recursively self-improving, or do you discourage it for other reasons?

comment by gRR · 2012-02-14T12:45:26.027Z · LW(p) · GW(p)

What if its domain is restricted to math and self-modification? Then, if it fooms, it will be a safe math Oracle, possibly even provably safe. Then it would be a huge help in the road to FAI, both directly and as a case study.

Replies from: Houshalter
comment by Houshalter · 2014-05-10T09:49:24.625Z · LW(p) · GW(p)

It may very well be possible to build such an AI. However there are several issues with it:

  • The AI can be adapted for other, less restricted, domains if knowledge of how it works spreads. There would be a large incentive to do so, since such an oracle would be of only limited utility.

  • The AI adds code that will evolve into another AI into its output. It's remotely possible, depending on what kind of problems you have it working on. If you were using it to design more efficient algorithms, in some cases an AI of some form might be the optimal solution.

    Even if you 100% trust the AI to provide the optimal output, you can't trust that the optimal output to the problem you've specified is what you actually want.

  • The AI could self-modify incorrectly and result in unfriendly AI. In order to be provably friendly/restricted, it would have to be 100% certain of any modification. That's a very tall order, especially in AI where everything has to be approximations or probabilistic.

  • It might not be as safe as you think it is. The AI runs some code and gets an unexpected result, possibly because of a bug in the environment itself. Look up how difficult it is to sandbox untrusted code and you will get some appreciation for how a superintelligence could figure a way out of its box.

    But it can't do anything with any exploits it finds because it is restricted to hard-coded axioms? Well, maybe. If it's using probabilities and some form of machine learning, it might be able to learn that "executing this code gives me this result" and then learn to take advantage of that. I don't believe that a system can work only in formal proofs. However, I might be completely wrong about this one; it's just a thought.

Replies from: gRR
comment by gRR · 2014-05-10T10:53:46.641Z · LW(p) · GW(p)

The AI can be adapted for other, less restricted, domains

That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.

The AI adds code that will evolve into another AI into its output

The output is to contain only proofs of theorems. Specifically, a proof (or refutation) of the theorem in the input. The state of the system is to be reset after each run so as to not accumulate information.

The AI could self-modify incorrectly and result in unfriendly AI

Any correct or incorrect self-modification is still restricted to the math domain, and so cannot result in an unsafe AI.

bug in the environment itself

Guarding against software bugs is easy in this case. You design an abstract virtual machine environment for the AI, then design the software that implements this environment, then formally prove that the software is correct. Guarding against errors caused by cosmic rays is also easy. You estimate the probability of such an error, and then add redundancy until the probability is so low that it won't happen until the end of the universe.
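The redundancy arithmetic in that last step is simple enough to sketch. Under a toy model (independent faults and a perfect majority voter; my assumptions, not a real fault model), the chance that a majority of replicas fail drops off polynomially in the per-replica failure probability:

```python
from math import comb

def majority_failure_prob(p, n):
    """Probability that at least a majority of n independent replicas
    fail in one checking interval, given per-replica failure
    probability p (toy model: independent faults, perfect voter)."""
    k = n // 2 + 1  # failures needed to corrupt the majority vote
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

With n = 3 and a per-replica upset probability of p = 1e-9 per interval, the sum is dominated by the two-out-of-three term, roughly 3p², i.e. about 3e-18 per interval; each additional pair of replicas buys a further power of p.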

Look up how difficult it is to sandbox untrusted code

Sandboxing untrusted code is easy. The difficult thing is sandboxing it while making it think that it runs normally. This is irrelevant here.

I don't believe that a system can work only in formal proofs

It is quite probable that a "pure math Oracle" system cannot work. The point was, it can be made safe to try.

comment by rwallace · 2009-06-12T16:40:24.852Z · LW(p) · GW(p)

As we see here, the important difference between flying saucers and Unfriendly AI is that the latter belief is not harmless.

Eliezer, I know you're not expecting your efforts in spreading parasite memes to result in snuffing out the future of intelligent life in the universe, but I will ask you to consider the possibility that you may be mistaken.

Replies from: loqi, MugaSofer
comment by loqi · 2009-06-12T20:54:56.982Z · LW(p) · GW(p)

I notice you phrased this in terms of belief. I'm curious, what would you consider to be the minimum estimate of UFAI's probability necessary to "reasonably" motivate concern or action?

Replies from: rwallace
comment by rwallace · 2009-06-12T23:02:56.306Z · LW(p) · GW(p)

If I'm right, the effect of widespread propagation of such memes will be to snuff out what chance of survival and success humanity might have had. Unlike UFAI, which is pure science fiction, the strangling of progress is something that occurs - has occurred before - in real life.

What would you consider to be the minimum estimate of the probability that I'm right, necessary to "reasonably" motivate concern or action?

Replies from: loqi
comment by loqi · 2009-06-13T02:19:09.642Z · LW(p) · GW(p)

I'm not quite sure what "you being right" means here. Your thesis is that propagating the UFAI meme will suppress scientific and technological progress such as to contribute non-negligibly to our destruction?

I'm afraid I don't have much background on how that's supposed to work. If you can explain what you mean or point me to an existing explanation, I'll try and give you an answer, rather than reactively throwing your question back at you.

Replies from: rwallace
comment by rwallace · 2009-06-13T13:37:22.605Z · LW(p) · GW(p)

Basically yes. Civilizations, species and worlds are mortal; there are rare long-lived species whose environment has remained unchanged for long periods of time, but the environment in which we evolved is long gone and our current one is not merely not stable, it is not even in equilibrium. And as long as we remain confined to one little planet running off a dwindling resource base and with everyone in weapon range of everyone else, there is nothing good about our long-term prospects. (For a fictional but eloquent discussion of some of the issues involved, see Permanence by Karl Schroeder.)

To change that, we need more advanced technology, for which we need software tools smart enough to help us deal with complexity. If our best minds start buying into the UFAI meme and turning away from building anything more ambitious than a social networking mashup, we may simply waste whatever chance we had. That is why UFAI belief is not as its proponents would have it the road of safety, but the road of oblivion.

Replies from: loqi, rhollerith
comment by loqi · 2009-06-13T18:50:24.934Z · LW(p) · GW(p)

rhollerith raised some reasonable objections to this response that I'd like to see answered, but I'll try and answer your question without that information:

What would you consider to be the minimum estimate of the probability that I'm right, necessary to "reasonably" motivate concern or action?

As far as concern goes, I think my threshold for concern over your proposition is identical to my threshold for concern over UFAI, as they postulate similar results (UFAI still seems marginally worse due to the chance of destroying intelligent alien life, but I'll write this off as entirely negligible for the current discussion). I'd say 1:10,000 is a reasonable threshold for concern of the vocalized form, "hey, is anyone looking into this?" I'd love to see some more concrete discussion on this.

"Action" in your scenario is complicated by its direct opposition to acceptance of UFAI, so I can only give you some rough constraints. To simplify, I'll assume all risks allow equally effective action to compensate for them, even though this is clearly not the case.

Let R = the scenario you've described, E = the scenario in which UFAI is a credible threat. "R and E" could be described as "damned if we do, damned if we don't", in which case action is basically futile, so I'll consider the case where R and E are disjoint. In that case, action would only be justifiable if p(R) > p(E). My intuition says that such justification is proportional to p(R) - p(E), but I'd prefer more clarity in this step.

So that's a rough answer... if T is my threshold probability for action in the face of existential risk, T * (p(R) - p(E)) is my threshold for action on your scenario. If R and E aren't disjoint, it looks something like T * (p(R and ~E) - p(E and ~R)).

Replies from: rwallace
comment by rwallace · 2009-06-13T20:56:38.689Z · LW(p) · GW(p)

A fair answer, thanks.

Though I'm not convinced "R and E" necessarily means "damned either way". If I believed E in addition to R, I think what I would do is:

Forget about memetics in either direction as likely to do more harm than good, and concentrate all available resources on developing Friendly AI as reliably and quickly as possible.

However, provably Friendly AI is still not possible with 2009 vintage tools.

So I'd do it in stages, a series of self improving AIs, the early ones with low intelligence and crude Friendliness architecture, using them to develop better Friendliness architecture in tandem with increasing intelligence for the later ones. No guarantees, but if recursive self-improvement actually worked, I think that approach would have a reasonable chance of success.

comment by rhollerith · 2009-06-13T14:30:31.078Z · LW(p) · GW(p)

rwallace has been arguing the position that AI researchers are too concerned (or will become too concerned) about the existential risk from UFAI. He writes that

we need software tools smart enough to help us deal with complexity.

rwallace: can we deal with complexity sufficiently well without new software that engages in strongly-recursive self-improvement?

Without new AGI software?

One part of the risk that rwallace says outweighs the risk of UFAI is that

we remain confined to one little planet . . . with everyone in weapon range of everyone else

The only response rwallace suggests to that risk is

we need more advanced technology, for which we need software tools smart enough to help us deal with complexity

rwallace: please give your reasoning for how more advanced technology decreases the existential risk posed by weapons more than it increases it.

Another part of the risk that rwallace says outweighs the risk of UFAI is that

we remain confined to one little planet running off a dwindling resource base

Please explain how dwindling resources present a significant existential risk. I can come up with several arguments myself, but I'd like to see the one or two you consider strongest.

Replies from: whpearson, rwallace, rwallace
comment by whpearson · 2009-06-13T14:48:54.969Z · LW(p) · GW(p)

If we have uploads we can get off the planet and stay in space for a fraction of the resources it currently costs to do manned space flight. We can spread ourselves between the stars.

But an upload might go foom, so we should stop all upload research.

It is this kind of conundrum I see humanity in at the moment.

Replies from: rwallace, MugaSofer
comment by rwallace · 2009-06-13T15:10:33.240Z · LW(p) · GW(p)

I agree, and will add:

First, an upload isn't going to "go foom": a digital substrate doesn't magically confer superpowers, and early uploads will likely be less powerful than their biological counterparts in several ways.

Second, stopping upload research is not the path of safety, because ultimately we must advance or die.

Replies from: steven0461
comment by steven0461 · 2009-06-14T06:34:04.911Z · LW(p) · GW(p)

First, an upload isn't going to "go foom": a digital substrate doesn't magically confer superpowers, and early uploads will likely be less powerful than their biological counterparts in several ways.

Foom is about rate of power increase, not initial power level. Copy/paste isn't everything, but still a pretty good superpower.

Second, stopping upload research is not the path of safety, because ultimately we must advance or die.

It's not at all obvious to me that the increased risk of stagnation death outweighs the reduced risk of foom death.

Replies from: rwallace
comment by rwallace · 2009-06-14T07:09:29.124Z · LW(p) · GW(p)

You can't copy-paste hardware; and no, an upload won't be able to run on a botnet.

Not to mention the bizarre assumption that an uploading patient will turn into a comic book villain whose sole purpose is to conquer the world.

Replies from: MugaSofer
comment by MugaSofer · 2013-04-25T13:35:24.984Z · LW(p) · GW(p)

an upload won't be able to run on a botnet.

Source?

Not to mention the bizarre assumption that an uploading patient will turn into a comic book villain whose sole purpose is to conquer the world.

Upvoted for this.

comment by MugaSofer · 2013-04-25T13:08:55.246Z · LW(p) · GW(p)

You know, I don't think I've ever seen someone argue that. Does anyone have any links?

comment by rwallace · 2009-06-13T16:31:02.927Z · LW(p) · GW(p)

I've written a more detailed explanation of why recursive self-improvement is a figment of our imaginations: http://code.google.com/p/ayane/wiki/RecursiveSelfImprovement

Replies from: orthonormal, loqi
comment by orthonormal · 2009-06-13T17:34:15.594Z · LW(p) · GW(p)

Your third point is valid, but your first is basically wrong; our environments occupy a small and extremely regular subset of the possibility space, so that success on a certain few tasks seems to correlate extremely well with predicted success across plausible future domains. Measuring success on these tasks is something AIs can easily do; EURISKO accomplished it in fits and starts. More generally, intelligence isn't magical: if there's any way we can tell whether a change in an AGI represents a bug or an improvement, then there's an algorithm that an AI can run to do the same.

As for the second problem, one idea that may not have occurred to you is that an AI could write a future version of itself, bug-check and test out various subsystems and perhaps even the entire thing on a virtual machine first, and then shut itself down and start up the successor. If there's a way for Lenat to see that EURISKO isn't working properly and then fix it, then there's a way for AI (version N) to see that AI (version N+1) isn't working properly and fix it before making the change-over.

Replies from: whpearson, rwallace, asciilifeform
comment by whpearson · 2009-06-13T19:41:40.717Z · LW(p) · GW(p)

In those posts you are arguing something different from what I was talking about. Sure, chimps will never make better technology than humans, but sometimes making more advanced technology is not what you want to do, and can be positively detrimental to your chances of shaping the world into a desirable state - the arms race for nuclear weapons, for example, or bio-weapons research.

If humans manage to invent a virus that wipes us out, would you still call that intelligent? If so, it is not that sort of intelligence we need to create... we need to create things that win in the end, not things that score short-term wins and then destroy themselves.

Replies from: asciilifeform
comment by asciilifeform · 2009-06-14T16:10:41.918Z · LW(p) · GW(p)

If humans manage to invent a virus that wipes us out, would you still call that intelligent?

Super-plagues and other doomsday tools are possible with current technology. Effective countermeasures are not. Ergo, we need more intelligence, ASAP.

comment by rwallace · 2009-06-13T18:35:06.114Z · LW(p) · GW(p)

"More generally, intelligence isn't magical: if there's any way we can tell whether a change in an AGI represents a bug or an improvement, then there's an algorithm that an AI can run to do the same."

Except that we don't - can't - do it by pure armchair thought, which is what the recursive self-improvement proposal amounts to.

The approach of testing a new version in a sandbox had occurred to me, and I agree it is a very promising one for many things - but recursive self-improvement isn't among them! Consider, what's the primary capability for which version N+1 is being tested? Why, the ability to create version N+2... which involves testing N+2... which involves creating N+3... etc.

Replies from: orthonormal, Vladimir_Nesov
comment by orthonormal · 2009-06-13T19:33:20.624Z · LW(p) · GW(p)

Again, there's enough correlation between ability to perform certain tasks that you don't need an infinite recursion. To test AIv(N+1)'s ability to program to exact specification, instead of having it program AIv(N+2), have it program some other things that AIvN finds difficult (but whose solutions are within AIvN's power to verify). That we will be applying AIv(N+1)'s precision programming to itself doesn't mean we can't test it on non-recursive data first.

ETA: Of course, since we want the end result to be a superintelligence, AIvN might also ask AIv(N+1) for verifiable insight into an array of puzzling questions, some of which AIvN can't figure out but suspects are tractable with increased intelligence.

comment by Vladimir_Nesov · 2009-06-13T19:09:44.836Z · LW(p) · GW(p)

If you observed something to work 15 times, how do you know that it'll work the 16th time? You obtain a model of increasing precision with each test, that lets you predict what happens next, on a test you haven't performed yet. The same way, you can try to predict what happens on the first try, before any observations took place.

Another point is that testing can be a part of the final product: instead of building a working gizmo, you build a generic self-testing adaptive gizmo that finds the right parameters itself, and that is pre-designed to do that in the most optimal way.

comment by asciilifeform · 2009-06-14T03:41:49.501Z · LW(p) · GW(p)

EURISKO accomplished it in fits and starts

Where is the evidence that EURISKO ever accomplished anything? No one but the author has seen the source code.

comment by loqi · 2009-06-13T17:32:51.958Z · LW(p) · GW(p)

On the subject of self-improving AI, you say:

Unfortunately there are three fundamental problems, any one of which would suffice to make this approach unworkable even in principle.

Keep the bolded context in mind.

What constitutes an improvement is not merely a property of the program, but also a property of the world. Even if an AI program could find improvements, it could not distinguish them from bugs.

There are large classes of problem for which this is not the case. For example, "make accurate predictions about future sensory data based on past sensory data" relates program to world, but optimizing for this task is a property of the time, memory, and accuracy trade-offs involved. At the very least, your objection fails to apply "even in principle".

Any significant AI program is the best effort of one or more highly skilled human programmers. To qualitatively improve on that work, the program would have to be at least as smart as the humans, so even if recursive self-improvement were workable, which it isn't, it would be irrelevant until after human level intelligence is reached by other means.

This depends on the definition of "qualitatively improve". It seems Eurisko improved itself in important ways that Lenat couldn't have done by hand, so I think this too fails the "even in principle" test.

Any self-modifying system relies, for the ability to constructively modify upper layers of content, on unchanging lower layers of architecture. This is a general principle that applies to all complex systems from engineering to biology to politics.

This seems the most reasonable objection of the three. Interestingly enough, Eliezer claims it's this very difference that makes self-improving AI unique.

I think this also fails to apply universally, for somewhat more subtle reasons involving the nature of software. A self-improving AI possessing precise specifications of its components has no need for a constant lower layer of architecture (except if you mean the hardware, which is a very different question). The Curry-Howard isomorphism offers a proof-of-concept here: An AI composed of precise logical propositions can arbitrarily rewrite its proofs/programs with any other program of an equivalent type. If you can offer a good argument for why this is impossible in principle, I'd be interested in hearing it.

Replies from: rwallace
comment by rwallace · 2009-06-13T18:05:33.461Z · LW(p) · GW(p)

"It seems Eurisko improved itself in important ways that Lenat couldn't have done by hand"

As far as I can see, the only self improvements it came up with were fairly trivial ones that Lenat could easily have done by hand. Where it came up with important improvements that Lenat wouldn't have thought of by himself was in the Traveler game - a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.

"The Curry-Howard isomorphism offers a proof-of-concept here"

Indeed this approach shows promise, and is the area I'm currently working on. For example if you can formally specify an indexing algorithm, you are free to look for optimized versions provided you can formally prove they meet the specification. If we can make practical tools based on this idea, we could make software engineering significantly more productive.
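Formal proof machinery aside, the shape of the idea can be sketched: a straightforward reference implementation serves as the specification, and an optimized candidate is checked against it on its stated precondition (randomized testing standing in for proof here; all names are invented for illustration):

```python
import random

def index_spec(items, key):
    """Specification: first position whose element equals key, else -1."""
    for i, x in enumerate(items):
        if x == key:
            return i
    return -1

def index_optimized(items, key):
    """Candidate optimization: binary search, valid only on sorted input."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(items) and items[lo] == key else -1

# Check the candidate against the spec, sampling only sorted inputs
# (the precondition under which the optimization claims equivalence).
for _ in range(1000):
    xs = sorted(random.sample(range(100), random.randint(0, 20)))
    k = random.randrange(100)
    assert index_optimized(xs, k) == index_spec(xs, k)
print("optimized version agrees with spec on all sampled inputs")
```

A proof-based tool would replace the sampling loop with a machine-checked argument covering all inputs, but the spec/candidate split is the same.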

But this can only be part of the story. Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI. Nor, obviously, is a formal specification of human psychology the kind of thing that would ever be approached accidentally by experimentation with Eurisko-style programs as was the original topic. And if the science of some future century - that would need to be to today's science as the latter is to witchcraft - ever does manage to accomplish such, why then, that would be the key to enabling the development of provably Friendly AI.

Replies from: loqi
comment by loqi · 2009-06-13T19:01:11.174Z · LW(p) · GW(p)

Where it came up with important improvements that Lenat wouldn't have thought of by himself was in the Traveler game - a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.

...And applied these improvements to the subsequent modified set of rules. "That was machine learning, not self-improvement" sounds like a fully general counter-argument, especially considering your skepticism toward the very idea of self-improvement. Perhaps you can clarify the distinction?

Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI.

An AI is allowed to learn from its environment, no one's claiming it will simply meditate on the nature of being and then take over the universe. That said, this example has nothing to do with UFAI. A paperclip maximizer has no need for an intuitive user interface.

And if [science] ever does manage to accomplish [a formal specification of human psychology], why then, that would be the key to enabling the development of provably Friendly AI.

Indeed! Sadly, such a specification is not required to interact with and modify one's environment. Humans were killing chimpanzees with stone tools long before they even possessed the concept of "psychology".

Replies from: rwallace
comment by rwallace · 2009-06-13T21:03:51.722Z · LW(p) · GW(p)

"Perhaps you can clarify the distinction?"

I'll call it self improvement when a substantial, nontrivial body of code is automatically developed, that is applicable to domains other than gameplaying, as opposed to playing a slightly different version of the same game. (Note that it was substantially the same strategy that won the second time as the first time, despite the rules changes.)

"A paperclip maximizer has no need for an intuitive user interface."

True if you're talking about something like Galactus that begins the story already possessing the ability to eat planets. However, UFAI believers often talk about paperclip maximizers being able to get past firewalls etc. by verbal persuasion of human operators. That certainly isn't going to happen without a comprehensive theory of human psychology.

comment by rwallace · 2009-06-13T14:50:54.232Z · LW(p) · GW(p)

rhollerith:

"Strongly-recursive self-improvement" is a figment of the imagination; among the logical errors involved is confusion between properties of a program and properties of the world.

As for the rest: do you believe humanity can survive permanently as we are now, confined to this planet? If you do, then I will point you to the geological evidence to the contrary. If not, then it follows that without more advanced technology, we are dead. Neither I nor anybody else can know what will be the proximate cause of death for the last individual, or in what century, but certain extinction is certain extinction nonetheless.

Replies from: rhollerith
comment by rhollerith · 2009-06-14T00:40:01.101Z · LW(p) · GW(p)

Let us briefly review the discussion up to now since many readers use the comments page, which does not provide much context. rwallace has been arguing that AI researchers are too concerned (or will become too concerned) about the existential risk from reimplementing EURISKO and things like that.

You have mentioned two or three times, rwallace, that without more advanced technology, humans will eventually go extinct. (I quote one of those 2 or 3 mentions below.) You mention that to create and to manage that future advanced technology, civilization will need better tools to manage complexity. Well, I see one possible objection to your argument right there, in that better science and better technology might well decrease the complexity of the cultural information humans are required to keep on top of. Consider that once Newton gave our civilization a correct theory of dynamics, almost all of the books written before Newton on dynamics could safely be thrown away (the exceptions being books by Descartes and Galileo that help people understand Newton and put Newton in historical context) which of course constitutes a net reduction in the complexity of the cultural information that our civilization has to keep on top of. (If it does not seem like a reduction, that is because the possession of Newtonian dynamical theory made our civilization more ambitious about what goals to try for.)

do you believe humanity can survive permanently as we are now, confined to this planet? If you do, then I will point you to the geological evidence to the contrary. If not, then it follows that without more advanced technology, we are dead.

But please explain to me what your argument has to do with EURISKO and things like that: is it your position that the complexity of future human culture can be managed only with better AGI software?

And do you maintain that that software cannot be developed fast enough by AGI researchers such as Eliezer who are being very careful about existential risks?

In general, the things you argue are dangerous are slow dangers. You yourself refer to "geological evidence" which suggests that they are dangerous on geological timescales.

In contrast, research into certain areas of AI seems to me a genuinely fast danger: something with a high probability of wiping out our civilization in the next 30, 50 or 100 years. It seems unwise to increase fast dangers to decrease slow dangers. But I suppose you disagree that AGI research, if not done very carefully, is a fast danger. (I'm still studying your arguments on that.)

Replies from: rwallace
comment by rwallace · 2009-06-14T01:32:27.297Z · LW(p) · GW(p)

Reduction in complexity is at least conceivable, I'll grant. For example if someone invented a zero point energy generator with the cost and form factor of an AA battery, much of the complexity associated with the energy industry could disappear.

But this seems highly unlikely. All the current evidence suggests the contrary: the breakthroughs that will be necessary and possible in the coming century will be precisely those of complex systems (in both senses of the term).

Human level AGI in the near future is indeed neither necessary nor possible. But there is a vast gap between that and what we have today, and we will, yes, need to fill some of that gap. Perhaps a key breakthrough would have come from a young researcher who would have re-implemented Eurisko and from the experiment acquired a critical jump in understanding - and who has now quietly left, thinking Eurisko might blow up the world, to reconsider that job offer from Electronic Arts.

I do disagree that AGI research is a fast danger. I will grant you that there is a sense in which the dangers I am worried about are slow ones - barring unlikely events like a large asteroid impact (which is likely only over longer time scales), I'm confident humanity will still exist 100 years from now.

But our window of opportunity may not. Consider that civilizations are mortal, for reasons unrelated to this conversation. Consider that environments conducive to scientific progress are even considerably rarer and more transient than civilization itself. Consider also that the environment in which our civilization arose is gone, and is not coming back. (For the simplest example, while fossil fuels still exist, the easily accessible deposits thereof, so important for bootstrapping, are largely gone.) I think it quite possible that the 21st century may be the last hard step in the Great Filter, that by the year 2100 the ultimate fate of humanity may in fact have been decided, even if nobody on that date yet knows it. I cannot of course be certain of this, but I think it likely enough that we cannot afford to risk wasting this window of opportunity.

Replies from: loqi
comment by loqi · 2009-06-14T08:02:51.638Z · LW(p) · GW(p)

One problem with this argument is how conjunctive it is: "(A) Progress crucially depends on breakthroughs in complexity management and (B) strong recursive self-improvement is impossible and (C) near-future human level AGI is neither dangerous nor possible but (D) someone working on it is crucial for said complexity management breakthroughs and (E) they're dissuaded by friendliness concerns and (F) our scientific window of opportunity is small."

My back-of-the-envelope, generous probabilities:

A. 0.5, this is a pretty strong requirement.

B. 0.9, for simplicity, giving your speculation the benefit of the doubt.

C. 0.9, same.

D. 0.1, a genuine problem of this magnitude is going to attract a lot of diverse talent.

E. 0.01, this is the most demanding element of the scenario, that the UFAI meme itself will crucially disrupt progress.

F. 0.05, this would represent a large break from our current form of steady scientific progress, and I haven't yet seen much evidence that it's terribly likely.

That product comes out to roughly 1:50,000. I'm guessing you think the actual figure is higher, and expect you'll contest those specific numbers, but would you agree that I've fairly characterized the structure of your objection to FAI?
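For concreteness, that product is easy to check mechanically:

```python
from math import prod

# The back-of-the-envelope probabilities for A through F above
probs = {"A": 0.5, "B": 0.9, "C": 0.9, "D": 0.1, "E": 0.01, "F": 0.05}
p = prod(probs.values())
print(p)             # ~2.025e-05
print(round(1 / p))  # 49383, i.e. roughly 1:50,000
```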

comment by MugaSofer · 2013-04-25T12:49:58.993Z · LW(p) · GW(p)

Neither is a belief in flying saucers, taken seriously, for most values of "harmless".

To be clear, you are saying that you consider spontaneous FAI overwhelmingly likely, and thus consider any failure to promote FOOMing both a humanitarian catastrophe and a gateway to other existential risks? Because that seems like a claim worth proving.

comment by CannibalSmith · 2009-06-12T10:26:59.969Z · LW(p) · GW(p)

You just made me want to participate even more!

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-12T19:44:39.568Z · LW(p) · GW(p)

...aaaand that's why I don't go around discussing the danger paths until someone (who I can realistically influence) actually starts to advocate going down them. Plenty of idiots to take it as an instruction manual. So I discuss the safe path but make no particular advance effort to label the dangerous ones.

Replies from: asciilifeform
comment by asciilifeform · 2009-06-14T04:24:31.476Z · LW(p) · GW(p)

Eliezer,

I am rather surprised that you accept all of the claimed achievements of Eurisko and even regard it as "dangerous", despite the fact that no one save the author has ever seen even a fragment of its source code. I firmly believe that we are dealing with a "mechanical Turk."

I am also curious why you believe that meaningful research on Friendly AI is at all possible without prior exposure to a working AGI. To me it seems a bit like trying to invent the ground fault interrupter before having discovered electricity.

Aside from that: If I had been following your writings more carefully, I might already have learned the answer to this, but: just why do you prioritize formalizing Friendly AI over achieving AI in the first place? You seem to side with humanity over a hypothetical Paperclip Optimizer. Why is that? It seems to me that unaugmented human intelligence is itself an "unfriendly (non-A)I", quite efficient at laying waste to whatever it touches.

There is every reason to believe that if an AGI does not appear before the demise of cheap petroleum, our species is doomed to "go out with a whimper." I for one prefer the "bang" as a matter of principle.

I would gladly accept taking a chance at conversion to paperclips (or some similarly perverse fate at the hands of an unfriendly AGI) when the alternative appears to be the artificial squelching of the human urge to discover and invent, with the inevitable harvest of stagnation and eventually oblivion.

I accept Paperclip Optimization (and other AGI failure modes) as an honorable death, far superior to being eaten away by old age or being killed by fellow humans in a war over dwindling resources. I want to live in interesting times. Bring on the AGI. It seems to me that if any intelligence, regardless of its origin, is capable of wrenching the universe out of our control, it deserves it.

Why is the continued hegemony of Neolithic flesh-bags so precious to you?

Replies from: Z_M_Davis
comment by Z_M_Davis · 2009-06-14T05:07:21.883Z · LW(p) · GW(p)

Aside from that: If I had been following your writings more carefully, I might already have learned the answer to this, but: just why do you prioritize formalizing Friendly AI over achieving AI in the first place?

This was addressed in "Value is Fragile."

It seems to me that if any intelligence, regardless of its origin, is capable of wrenching the universe out of our control, it deserves it.

I don't think you understand the paperclip maximizer scenario. An UnFriendly AI is not necessarily conscious; it's just this device that tiles the light cone with paperclips. Arguably it helps to say "really powerful optimization process" rather than "intelligence." Consider that we would not say that a thermostat deserves to control the temperature of a room, even if we happen to be locked in the room and are to be roasted to death.

Replies from: asciilifeform
comment by asciilifeform · 2009-06-14T05:57:33.955Z · LW(p) · GW(p)

I was going to reply, but it appears that someone has eloquently written the reply for me.

I'd like to take my chances of being cooked vs. a world without furnaces, thermostats or even rooms - something I believe we're headed for by default, in the very near future.

Replies from: loqi
comment by loqi · 2009-06-14T08:13:49.915Z · LW(p) · GW(p)

This reminds me of the response I got when I criticized an acquaintance for excessive, reckless speeding: "Life is all about taking risks."

Replies from: Roko
comment by Roko · 2009-06-15T16:56:05.058Z · LW(p) · GW(p)

The only difference was that he was mostly risking his own life, whereas asciilifeform is risking mine, yours and everyone else's too.

ASCII - the onus is on you to give compelling arguments that the risks you are taking are worth it.

Replies from: loqi, asciilifeform
comment by loqi · 2009-06-15T22:04:49.375Z · LW(p) · GW(p)

Actually, I fully intended the implication that he was risking more than his own life. Self-inflicted risks don't concern me.

Now you've got me wondering what the casualty distribution for speeding-induced accidents looks like.

Replies from: Roko
comment by Roko · 2009-06-15T22:58:14.638Z · LW(p) · GW(p)

Well if ASCII has his way, there may be one data point at casualty level 6.6 billion ...

Replies from: asciilifeform
comment by asciilifeform · 2009-06-16T00:44:36.913Z · LW(p) · GW(p)

I will accept that "AGI-now" proponents should carry the blame for a hypothetical Paperclip apocalypse when Friendliness proponents accept similar blame for an Earth-bound humanity flattened by a rogue asteroid, or leveled by any of the various threats a superintelligence - or, say, the output of a purely human AI research community unburdened by Friendliness worries - might be able to counter. (I previously gave Orlov's petrocollapse as yet another example.)

comment by asciilifeform · 2009-06-16T00:29:38.310Z · LW(p) · GW(p)

ASCII - the onus is on you to give compelling arguments that the risks you are taking are worth it

Status quo bias, anyone?

I presently believe, not without justification, that we are headed for extinction-level disaster as things are; and that not populating the planet with the highest achievable intelligence is in itself an immediate existential risk. In fact, our current existence may well be looked back on as an unthinkably horrifying disaster by a superintelligent race (I'm thinking of Yudkowsky's Super-Happies.)

Replies from: loqi
comment by loqi · 2009-06-16T03:33:07.155Z · LW(p) · GW(p)

Since your justification is omitted here, I'll go ahead and suspect it's at least as improbable as this one. The question isn't simply "do we need better technology to mitigate existential risk", it's "are the odds that technological suppression due to friendliness concerns wipes us out greater than the corresponding AGI risk".

If you assume friendliness is not a problem, AI is obviously a beneficial development. Is that really the major concern here? All this talk of the benefits of scientific and technological progress seems wasted. Take friendliness out of the picture, I doubt many here would disagree with the general point that progress mitigates long-term risk.

So please, be more specific. The argument "lack of progress contributes to existential risk" contains no new information. Either tell us why this risk is far greater than we suspect, or why AGI is less risky than we suspect.

comment by Richard_Kennaway · 2009-06-11T17:02:34.419Z · LW(p) · GW(p)

The journal's web site is here, from where I've just downloaded a copy of the paper. I don't know if it's freely available (my university has a subscription), but if anyone wants it and can't get it from the web site, send me an email address to send it to. (EDIT: Now online, see my later comment.)

The paper describes itself as the third in a series, of which the first appeared in the same journal, volume 19, pp.189-249 (also downloaded). The second is in a volume called "Machine Learning", which you can find here, but I haven't checked if the whole book is accessible. (EDIT: sorry, wrong reference, see later comment.)

Personally, I'm deeply sceptical of all work that has ever been done on AI (including the rebranding as AGI), which is why I consider Friendly AI to be a real but remote problem. However, I've no interest in raining on everyone else's parade. If you think you can make it work, go for it!

Replies from: HalFinney, Vladimir_Nesov
comment by HalFinney · 2009-06-11T17:14:55.137Z · LW(p) · GW(p)

Isn't the second paper in the series the one immediately before the Eurisko paper, "Theory formation by heuristic search: The nature of heuristics II: Background and examples" by Lenat, volume 21, page 31? Can you download that one?

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2009-06-11T17:28:08.194Z · LW(p) · GW(p)

Yes, it is, my mistake. I now have all three papers. I've temporarily put them up in the directory here, a URL to which you will have to add lenatN.pdf for N = 1, 2, or 3. (I'm avoiding posting the complete URL so that Google won't find them.)

Replies from: SilasBarta, cousin_it
comment by SilasBarta · 2009-06-11T19:17:23.946Z · LW(p) · GW(p)

I'd like to echo cousin_it's thanks, I downloaded them as well.

I haven't gotten to read much yet, but I've also run into the problem he's mentioned with theoretical computer science papers being too vague to write code, let alone include it. (Marcus Hutter and Juergen Schmidhuber, I'm looking in your general direction here.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T20:21:01.889Z · LW(p) · GW(p)

You're having trouble figuring out how to implement AIXI? I saw Marcus write it out as one equation. Perfectly clear what the main loop looks like. All you need is an infinitely fast computer and a halting oracle.

Replies from: SoullessAutomaton, SilasBarta
comment by SoullessAutomaton · 2009-06-13T14:12:37.193Z · LW(p) · GW(p)

All you need is an infinitely fast computer and a halting oracle.

Couldn't you implement a halting oracle given an infinitely fast computer, though?

So, that's one requirement down! We'll have this AIXI thing built any day now.

comment by SilasBarta · 2009-06-13T14:02:07.117Z · LW(p) · GW(p)

+5? Yikes! People, it's clear Eliezer_Yudkowsky is joking. There are no infinitely fast computers or halting oracles, and an equation is not the same thing as code, let alone pseudocode.

In any case, AIXI isn't my main complaint in that department. I'm thinking more of

Hutter's fastest shortest algorithm for everything and AIXI-tl; and Schmidhuber's provably globally optimal Goedel machines, speed prior, and ordered optimal problem solver

Toy implementations anytime, guys?

Replies from: orthonormal
comment by orthonormal · 2009-06-13T17:42:53.880Z · LW(p) · GW(p)

I think that most upvoters got the joke...

comment by cousin_it · 2009-06-11T19:07:48.041Z · LW(p) · GW(p)

Richard, thanks a lot! Downloaded all three articles. (No, ScienceDirect doesn't let me download them - the link says "Purchase PDF".)

First surface impression: none of the three papers are as specific as I'd like. After a skim I still have no idea how EURISKO's main loop would look in pseudocode. Will try reading closer.

comment by Vladimir_Nesov · 2009-06-11T17:10:29.494Z · LW(p) · GW(p)

Are you this Richard Kennaway? (You gave no apparent contact info.)

Replies from: Cyan, Richard_Kennaway
comment by Cyan · 2009-06-11T17:33:59.022Z · LW(p) · GW(p)

Presumably he meant LW's direct messaging, which you can use by clicking on the little envelope under your karma score to get to your inbox, and then clicking 'compose'.

comment by Richard_Kennaway · 2009-06-11T17:29:33.311Z · LW(p) · GW(p)

Yes. Almost all Google hits for "Richard Kennaway" are for me. (Number 2 by a long way is a political scientist in New Zealand.)

Replies from: thomblake
comment by thomblake · 2009-06-11T17:40:49.302Z · LW(p) · GW(p)

That's a funny parallel to my own situation. Most hits on the first few pages of Google for "Thom Blake" are me, but the runner-up is a historian from Australia.

comment by akkartik · 2009-06-13T22:31:08.561Z · LW(p) · GW(p)

One of my professors at UT reimplemented AM many years ago. I dusted it off and got it to compile with GNU Prolog last Christmas. Never got around to doing anything with it, though.

http://github.com/akkartik/am-utexas/tree/master

comment by asciilifeform · 2009-06-15T19:20:00.091Z · LW(p) · GW(p)

I have located a paper describing Lenat's "Representation Language Language", in which he wrote Eurisko. Since no one has brought it up in this thread, I will assume that it is not well-known, and may be of interest to Eurisko-resurrection enthusiasts. It appears that a somewhat more detailed report on RLL is floating around public archives; I have not yet been able to track down a copy.

comment by Richard_Kennaway · 2009-06-14T12:05:36.949Z · LW(p) · GW(p)

I have found Haase's thesis online. Would it be irresponsible of me to post the link here? (It is not actually hard to find.)

ETA: How concerned should we be that DARPA is going full steam ahead for strong AI? Perhaps not very much, given the failure of at least two of their projects along these lines:

Replies from: outlawpoet
comment by outlawpoet · 2009-06-15T00:35:45.779Z · LW(p) · GW(p)

There are a number of DARPA and IARPA projects we pay attention to, but I'd largely agree that their approaches and basic organization makes them much less worrying.

They tend towards large, bureaucratically hamstrung projects, like PAL, which the last time I looked included work and funding for teams at seven different universities, or they suffer from extremely narrow focus, like their intelligent communication initiatives, which went from being about adaptive routing via deep introspection of multimedia communication and intelligent networks, to just being software radios and error correction.

They're worth keeping an eye on mostly because they have the money to fund any number of approaches, often over long periods. But the biggest danger isn't their funded, stated goals; it's the possibility of someone going off-target and working on generic AI in the hopes of increasing their funding or scope in the next evaluation, which could be a year or more later.

comment by Richard_Kennaway · 2009-06-11T21:22:15.246Z · LW(p) · GW(p)

I've just been Googling to see what became of EURISKO. The results are baffling. Despite its success in its time, there has been essentially no followup, and it has hardly been cited in the last ten years. Ken Haase claims improvements on EURISKO, but Eliezer disagrees; at any rate, the paper is vague and I cannot find Haase's thesis online. But if EURISKO is a dead end, I haven't found anything arguing that either.

Perhaps in a future where Friendly AI was achieved, emissaries are being/will be sent back in time to prevent any premature discovery of the key insights necessary for strong AI.

Replies from: pjeby, Eliezer_Yudkowsky, SoullessAutomaton
comment by pjeby · 2009-06-11T21:45:57.635Z · LW(p) · GW(p)

Hm, the abstract for that paper mentions that:

a close coupling of representation syntax and semantics is necessary for a discovery program to prosper in a given domain

This is a really interesting point; it seems related to the idea that to be an expert in something, you need a vocabulary close to the domain in question.

It also immediately raises the question of what the expert vocabulary of vocabulary formation/acquisition is, i.e. the domain of learning.

Replies from: SilasBarta, Daniel_Burfoot
comment by SilasBarta · 2009-06-15T18:46:54.865Z · LW(p) · GW(p)

a close coupling of representation syntax and semantics is necessary for a discovery program to prosper in a given domain

This is a really interesting point; it seems related to the idea that to be an expert in something, you need a vocabulary close to the domain in question.

It doesn't seem that interesting to me: it's just a restatement that "data compression = data prediction". When you have a vocabulary "close to the domain" that simply means that common concepts are compactly expressed. Once you've maximally compressed a domain, you have discovered all regularities, and simply outputting a short random string will decompress into something useful.

How do you find which concepts are common and how do you represent them? Aye, there's the rub.

It also immediately raises the question of what the expert vocabulary of vocabulary formation/acquisition is, i.e. the domain of learning.

So my guess would be that the expert vocabulary of vocabulary formation is the vocabulary of data compression. I don't know how to make any use of that, though, because the No Free Lunch theorems seem to say that there's no general algorithm that is the best across all domains, and so there's no algorithmic way to find which is the best compressor for this universe.

(ETA: multiple quick edits)
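For what it's worth, the "common concepts compactly expressed" point can be sketched in a few lines of Python. This is my own toy example, not anything from Lenat's papers: code each concept with -log2(p) bits under its empirical frequency, and the frequency-matched "vocabulary" beats a frequency-blind uniform code over the corpus as a whole.

```python
# Toy sketch: a vocabulary "close to the domain" is one where frequent
# domain concepts get short codes. The corpus and concepts are made up.
from collections import Counter
import math

# Hypothetical corpus of domain statements, each a sequence of concepts.
corpus = [
    ["prime", "number", "divisor", "prime"],
    ["prime", "divisor", "number"],
    ["prime", "prime", "number"],
]

freq = Counter(c for stmt in corpus for c in stmt)
total = sum(freq.values())

def code_length(stmt, model):
    # Shannon code length: -log2 p(concept) bits per concept.
    return sum(-math.log2(model[c] / total) for c in stmt)

# A "vocabulary" that ignores domain frequencies: all concepts equally likely.
uniform = {c: total / len(freq) for c in freq}

matched = sum(code_length(s, freq) for s in corpus)
blind = sum(code_length(s, uniform) for s in corpus)
assert matched < blind  # the frequency-matched vocabulary compresses better
```

The assertion is just Gibbs' inequality in miniature: the empirical distribution always yields a shorter total code for its own corpus than any mismatched one. It says nothing, of course, about the hard part, which is discovering the concepts in the first place.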

comment by Daniel_Burfoot · 2009-06-12T14:24:50.736Z · LW(p) · GW(p)

This is a really interesting point; it seems related to the idea that to be an expert in something, you need a vocabulary close to the domain in question.

I'm not so sure about this. I am pretty good at understanding visual reality, and I have some words to describe various objects, but my vocabulary is nowhere near as rich as my understanding is (of course, I'm only claiming to be an average member of a race of fantastically powerful interpreters of visual reality).

Let me give you an example. Say you had two pictures of faces of two different people, but the people look alike and the pictures were taken under similar conditions. Now a blind person, who happens to be a Matlab hacker, asks you to explain how you know the pictures are of different people, presumably by making reference to the pixel statistics of certain image regions (which the blind person can verify with Matlab). Is your face recognition vocabulary up to this challenge?

Replies from: Cyan
comment by Cyan · 2009-06-12T14:58:26.767Z · LW(p) · GW(p)

I think "vocabulary" in this sense refers to the vocabulary of the bits doing the actual processing. Humans don't have access to the "vocabulary" of their fusiform gyruses, only the result of its computations.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-11T23:04:31.872Z · LW(p) · GW(p)

Perhaps in a future where Friendly AI was achieved, emissaries are being/will be sent back in time to prevent any premature discovery of the key insights necessary for strong AI.

As silly explanations go, I prefer the anthropic explanation: In worlds where AI didn't stagnate, you're dead and hence not reading this.

Replies from: Richard_Kennaway, Jonathan_Graehl
comment by Richard_Kennaway · 2009-06-12T10:40:01.218Z · LW(p) · GW(p)

Or in non-anthropic terms, strong AI could be done on present-day hardware, if we only knew how, and our survival so far is down to blind luck in not yet discovering the right ideas?

For how long, in your estimate, has the hardware been powerful enough for this to be so?

If Eurisko was a non-zero step towards strong AI, would it have been any bigger a step if Lenat had been using present-day hardware? Or did it fizzle because it didn't have sufficiently rich self-improvement capabilities, regardless of how fast it might have been implemented?

comment by Jonathan_Graehl · 2009-06-12T00:13:42.557Z · LW(p) · GW(p)

That is silly. In the same vein, why worry about any risks? You'll continue to exist in whatever worlds they didn't develop into catastrophe.

Replies from: Roko
comment by Roko · 2009-06-12T21:45:34.832Z · LW(p) · GW(p)

This is a very serious point and has been worrying me for some time. This problem connects to continuity of consciousness and reference classes.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-12T21:54:41.167Z · LW(p) · GW(p)

Not all worlds in which you continue to exist are pleasant ones. I think Michael Vassar once called quantum immortality the most horrifying hypothesis he had ever taken seriously, or something along those lines.

Replies from: loqi, Roko, NancyLebovitz, SoullessAutomaton
comment by loqi · 2009-06-12T22:39:25.408Z · LW(p) · GW(p)

Indeed. In particular, "dying of old age" is pretty damn horrifying if you think quantum immortality holds.

comment by Roko · 2009-06-13T13:34:00.526Z · LW(p) · GW(p)

Sure, but the idea that we should ignore futures where we are dead will still have some bizarre implications. For example, it would strongly contradict Nick Bostrom's MaxiPOK principle (maximize the probability of an OK outcome). In particular, if you thought that the development of AGI would lead to utopia with probability p_u, near instant human extinction with probability p_e, and torture of humans with probability p_t, where

p_t << p_u

then one would have a strong motive to accelerate the development of AGI as much as possible, because the total probability of mediocre outcomes due to non-extinction global catastrophes like resource depletion or nuclear war increases every year that AGI doesn't get developed. Your actions would be dominated by trying to increase the strength of the inequality p_t << p_u whilst getting the job done quickly enough that p_u was still bigger than the probability of ordinary global problems such as global warming happening in your development window. You would do this even at the expense of increasing the probability p_e, potentially until it was > 0.5. You'd better be damn sure that anthropic reasoning is correct if you're going to do this!

comment by NancyLebovitz · 2010-07-14T12:19:02.742Z · LW(p) · GW(p)

If there's quantum immortality, what proportion of your lives would be likely to be acutely painful?

I don't have an intuition on that one. It seems as though worlds in which something causes good health would predominate over just barely hanging on, but I'm unsure of this.

comment by SoullessAutomaton · 2009-06-12T22:42:55.220Z · LW(p) · GW(p)

Hunh. I'm glad I'm not the only person who has always found quantum immortality far more horrifying than nonexistence.

comment by SoullessAutomaton · 2009-06-12T11:00:20.048Z · LW(p) · GW(p)

The most sensible explanation has, I think, been mentioned previously: that EURISKO was both overhyped and a dead end. Perhaps the techniques it used fell apart rapidly in less rigid domains than rule-based wargaming, and perhaps its successes were very heavily guided by Lenat. It's somewhat telling that Lenat, the only one who really knows how it worked, went off to do something completely different from EURISKO.

In this regard, one could consider something like EURISKO not as a successful AI, but as a successful cognitive assistant for someone working in a mostly unexplored rule-based system. Recall the results that AM, EURISKO's predecessor, got--if memory serves me, it rediscovered a lot of mathematical principles, none of them novel, but duplicating mostly from scratch results that took many years and many mathematicians to find originally.

Not that I'm certain this is the case by a long shot, but it seems the most superficially plausible explanation.

Replies from: ChrisHibbert
comment by ChrisHibbert · 2009-06-13T03:18:03.246Z · LW(p) · GW(p)

From what I remember of the papers, it was pretty clear (though perhaps not stated explicitly) that AM "happened across" many interesting factoids about math, but it was Lenat's intervention that declared them important and worth further study. I think your second paragraph implies this, but I wanted it to be explicit.

A reasonable interpretation of AM's success was that Lenat was able to recognize many important mathematical truths in AM's meanderings. Lenat never claimed any new discoveries on behalf of AM.

Replies from: Cyan, SoullessAutomaton
comment by Cyan · 2009-06-13T03:34:59.496Z · LW(p) · GW(p)

Lenat was also careful to note that AM's success, such as it was, was very much due to the fact that LISP's "vocabulary" started with a strong relation to mathematics. EURISKO didn't show anything like reasonable performance until he realized that the vocabulary it was manipulating needed to be "close" to the modeled domain, in the sense that interesting (to Lenat) statements about the domain needed to be short, and therefore easy for EURISKO to come across.

comment by SoullessAutomaton · 2009-06-13T13:49:28.907Z · LW(p) · GW(p)

Yeah, that was basically what I meant. My hypothesis was that if you gave AM to someone with good mathematical aptitude but little prior knowledge, they would discover a lot more interesting mathematical statements than they would have without AM's help, by analogy to Lenat discovering more interesting logical consequences of the wargaming rules with EURISKO's help than any of the experienced players discovered themselves.

comment by Douglas Miles (douglas-miles) · 2023-11-25T14:31:13.858Z · LW(p) · GW(p)

Doug Lenat's source code for AM and possibly EURISKO w/Traveller found in public archives; see https://white-flame.com/am-eurisko.html

comment by asciilifeform · 2009-06-14T03:36:26.578Z · LW(p) · GW(p)

I find it extremely difficult to believe that Eurisko actually worked as advertised, given Dr. Lenat's behavior when confronted with requests for the source code.

What I find truly astounding is the readiness with which other researchers, textbook authors, journalists, etc. simply took his word for it, without holding the claim to anything like the usual standards of scientific evidence.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-06-14T05:12:52.662Z · LW(p) · GW(p)

He did win the Trillion-Credit Squadron tournament.

Replies from: asciilifeform
comment by asciilifeform · 2009-06-14T05:26:01.371Z · LW(p) · GW(p)

Well, Lenat did. Whether or in what capacity a computer program was involved is an open question.

Replies from: outlawpoet
comment by outlawpoet · 2009-06-15T00:43:22.089Z · LW(p) · GW(p)

It's useful evidence that EURISKO was doing something. There were some extremely dedicated and obsessive people involved in Traveller, back then. The idea that someone unused to starship combat design of that type could come and develop fleets that won decisively two years in a row seems very unlikely.

It might be that EURISKO acted merely as a generic simulator of strategy and design, and Lenat did all the evaluating, and no one else in the contest had access to simulations of similar utility, which would negate much of the interest in EURISKO, I think.

Replies from: asciilifeform
comment by asciilifeform · 2009-06-15T00:58:10.220Z · LW(p) · GW(p)

There were some extremely dedicated and obsessive people involved in Traveller, back then

How many of them made use of any kind of computer? How many had any formal knowledge applicable to this kind of optimization?

comment by SilasBarta · 2009-06-11T16:48:37.307Z · LW(p) · GW(p)

I'm with you, all the way. I was intensely curious when I first read about it. Specifically, the idea of being able to generate arbitrary concepts without being pre-programmed, and having heuristics and metaheuristics and meta[*n]-heuristics that were apparently able to come up with non-obvious solutions to problems, like that war game.

It even came up with interesting results when it didn't solve anything, such as heuristics that somehow optimized themselves for "claiming credit for findings of other heuristics".

So yes, let's pull back the curtain.

comment by thomblake · 2009-06-11T16:55:50.503Z · LW(p) · GW(p)

I've always been more suspicious it's a 'Mechanical Turk' than a 'Clever Hans'.

How could he not make the source code public? Who does he think he is, Microsoft?

Replies from: Houshalter, Jonathan_Graehl
comment by Houshalter · 2013-04-25T05:33:31.845Z · LW(p) · GW(p)

Well, winning the Traveller tournament and designing circuits is a bit too much for just a person. Eurisko had to have done something, even if it had help or its abilities were exaggerated. I think it's unlikely it didn't exist at all or was totally faked.

My guess is he exaggerated what it was capable of. It's also possible it really did work and he planned on capitalizing on it. Or maybe he was legitimately scared of it falling into the wrong hands and becoming unfriendly AI.

comment by Jonathan_Graehl · 2009-06-12T00:49:33.364Z · LW(p) · GW(p)

Modern "AI" research programs tend to develop relatively simple "training wheels" tasks with objectively measurable and reproducible performance. But you have at least the trappings of science. The same can't be said for most early AI work.

If there really isn't enough information in his papers to reproduce his result (I have not read them), then Lenat has to at least be suspected of painting an overly rosy picture of how awesome his creation was.

If the result is just "this is cool", then a public binary, web service, or source code release would be welcome.

comment by Vladimir_Nesov · 2009-06-11T16:59:04.382Z · LW(p) · GW(p)

I bet any results therein are subsumed by modern developments, and are nothing particularly interesting from the right background, so only the mystery continues to capture attention.

comment by aindilis · 2023-11-25T14:21:28.043Z · LW(p) · GW(p)

Doug Lenat's sources for AM (and EURISKO+Traveller?) found in public archives

https://news.ycombinator.com/item?id=38413615

comment by momom2 (amaury-lorin) · 2023-01-11T09:12:41.644Z · LW(p) · GW(p)

Update on this project: Lenat's thesis on AM is available for purchase online, and explains with all necessary details how AM works. (AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search)
Unfortunately I have not found a paper that describes Eurisko itself with the same degree of precision, but that's not too much of an issue.

For a school project, I am reimplementing AM in the context of chess-playing, and it's looking good. Lenat's thesis is largely enough to do that.

comment by iGniSz · 2009-06-13T21:43:06.560Z · LW(p) · GW(p)

Interesting post (thanks for putting up the detailed papers, RichardKennaway!). I've always been fascinated by Doug Lenat and his creations, and I would like to share a Google TechTalk by Doug Lenat about his work and ideas, from 2006. Its contents have direct bearing on this post (although it doesn't mention EURISKO specifically, nor does it give insight into its main loop; it's more of an overview thing, the first half has a slight bias towards search, and it ends up discussing CYC. It does give a lot of good information about how to model the world and build a reasoning agent on top of that; the most interesting part starts at 29:00 minutes in).

http://video.google.com/videoplay?docid=-7704388615049492068

As for the question in the post: I am a programmer, and would love to lend my time to a project such as this. My vote would be for Python, since it allows for very rapid development and has a lot of AI code available, like Bayesian inference and fuzzy logic. It has good (enough) performance and is easily distributable, but PHP and/or Java would be fine by me as well.

I think an easily runnable program that would produce results and give insight would be at least "just cool" and might help inspire a next generation of people. I believe there are dangers in AGI, and FAI is a problem that merits attention, but at the same time I believe that it is an essential next step in the development of mankind. If you look at our development you see a steady movement of "outsourcing" our abilities. First it was our muscle power, but since the invention of computers it has become intelligence. This is a good thing: you used to need a trained professional to do a blood sugar test for you; now you can buy this "intelligence" in a small electronic package. This is good since it frees up time for the professional and reduces the error rate for the patient. AGI would allow for even more of this!

comment by JamesAndrix · 2009-06-12T16:07:35.060Z · LW(p) · GW(p)

Let's implement a secure sandbox first.

Replies from: Henrik_Jonsson
comment by Henrik_Jonsson · 2009-06-13T21:25:00.886Z · LW(p) · GW(p)

As long as you have a communications channel to the AI it would not be secure, since you are not a secure system and could be compromised by a sufficiently intelligent agent.

See http://yudkowsky.net/singularity/aibox

Replies from: Vladimir_Nesov, JamesAndrix, asciilifeform
comment by Vladimir_Nesov · 2009-06-13T21:52:32.608Z · LW(p) · GW(p)

As long as you have a communications channel to the AI it would not be secure, since you are not a secure system and could be compromised by a sufficiently intelligent agent.

Intelligence is no help if you need to open a safe that only gets opened by one of the 10^10 possible combinations. You also need enough information about the correct combination to have any chance of guessing it. Humans likely have different compromising combinations, if any, so you'd also need to know a lot about a specific person, or even about their state of mind at the moment; knowledge of human psychology in general might not be enough.

(But apparently what would look to a human like almost no information about the correct combination might be more than enough to a sufficiently clever AI, so it's unsafe, but it's not magically unsafe.)
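A back-of-envelope version of the safe analogy (the guess budget here is hypothetical, not from the comment): with zero information about the combination, every guessing strategy, however intelligent, succeeds with the same probability per guess, so only the number of attempts matters.

```python
# Blind guessing against a 10-digit combination lock: intelligence cannot
# improve on uniform chance when no information about the combination leaks.
combinations = 10**10      # the 10^10 possible combinations from the comment
guesses_allowed = 1_000    # hypothetical budget of attempts

# Probability of opening the safe within the budget, guessing blind.
p_success = guesses_allowed / combinations
assert p_success == 1e-7   # one in ten million
```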

Replies from: Henrik_Jonsson
comment by Henrik_Jonsson · 2009-06-13T22:33:25.812Z · LW(p) · GW(p)

If you had a program that might or might not be on a track to self-improve and initiate an intelligence explosion, you'd better be sure enough that it would remain friendly to, at the very least, give it a robot body and a scalpel and stand with your throat exposed before it.

Surrounding it with a sandboxed environment couldn't be guaranteed to add any meaningful amount of security. Maybe the few bits of information you provide through your communications channel would be enough for this particular agent to reverse-engineer your psychology and find that correct combination to unlock you, maybe not. Maybe the extra layer(s) between the agent and the physical world would be enough to delay it slightly or stall it completely, maybe not. The point is you shouldn't rely on it.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-06-13T22:47:09.203Z · LW(p) · GW(p)

Of course.

comment by JamesAndrix · 2009-06-15T14:15:54.143Z · LW(p) · GW(p)

I am familiar with the AI Box experiment. My short answer: so don't have a communications channel, in the same way that if anyone is running our simulation, they don't currently have a communications channel with us.

The AI need only find itself in a series of universes with progressively more difficult challenges. (much like eurisko, actually) We can construct problems that have no bearing on our physics or our evolutionary history. (I'm not saying it's trivial, there would need to be a security review process)

If a pure software intelligence explosion is feasible, then we should be able to get it to create and prove a CEV before it knows anything about us, or that it's possible to communicate with us.

And just because humans aren't secure systems doesn't mean we can't make secure systems.

Replies from: Henrik_Jonsson
comment by Henrik_Jonsson · 2009-06-15T14:49:14.873Z · LW(p) · GW(p)

I think my other reply applies here too, if you read "communications channel" as all the information that might be inferred from the universe the AI finds itself in. Either the AI is not smart enough to be a worry without any sandboxing at all, or you have enough to worry about that you should not be relying on the sandbox to protect you.

Your point about our own simulation (if it is one) lacking a simple communications channel actually works against you: in our universe the simulation hypothesis has been proposed, despite the fact that we have only human intelligence to work with.

Replies from: JamesAndrix
comment by JamesAndrix · 2009-06-15T16:21:02.099Z · LW(p) · GW(p)

But constructing the hypothesis isn't evidence that it's true, and if it is true, that still leaves us with (so far) no information about our simulators, and no way to guess their motives, let alone try to trick them.

I've actually been considering the possibility of a process that would create random universes and challenges. But even if the AI discovered some things about our physics, that does not significantly narrow the range of possible minds. It doesn't know if it's dealing with paperclippers or pebblesorters. It might know roughly how smart we are.

The other half of the communication channel would be the solutions and self-modifications it provides at each iteration. These should not be emotionally compelling and would be subject to an arbitrary amount of review.

There are other advantages to this kind of sandbox, we can present it the task of inferring our physics at various levels of its development, and archive any versions that have learned more than we are comfortable with. (anything)

Keeping secrets from a hostile intelligence is something we already have formal and intuitive experience with. Controlling its universe and peering into its mind are bonuses.

Interesting Cognitive bias side note: While writing this, I was inclined to write in a style to make it seem silly that an AI could mindhack us based on a few bits. I do think that it's very unlikely, but if I wrote as I was thinking, it would probably have sounded dismissive.

I do think a design goal should be zero bits.

Replies from: Henrik_Jonsson
comment by Henrik_Jonsson · 2009-06-15T16:51:36.807Z · LW(p) · GW(p)

But even if the AI discovered some things about our physics, it does not significantly narrow the range of possible minds. It doesn't know if it's dealing with paperclippers or a pebblesorters. It might know roughly how smart we are.

You're using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.

It might not operate on the same timescales as us. It might do things that appear like pure magic. No matter how often you took snapshots and checked how far it had gotten in figuring out details about us, there might be no way of ruling out progress, especially if you gave it motives for hiding that progress (such as pulling the plug every time it came close). Sooner or later you'd conclude that nothing interesting was happening and put it on autopilot. A small self-improvement might cascade into an enormous difference in understanding, with the notorious FOOM following.

I don't usually like quoting myself, but

If you had a program that might or might not be on a track to self-improve and initiate an intelligence explosion, you'd better be sure enough that it would remain friendly to, at the very least, give it a robot body and a scalpel and stand with your throat exposed before it.

If the scenario makes you nervous, you should be pretty much equally nervous at the idea of giving your maybe-self-improving AI, sitting inside thirty nested sandboxes, even 10 milliseconds (10^41 Planck intervals) of CPU time.

Let me be clear here: I'm not assigning any significant probability to someone recreating EURISKO or something like it in their spare time and having it recursively self-improve any time soon. My confidence intervals are spread widely enough that I can spend some time being worried about it, though. I'm just pointing out that sandboxing adds approximately zero extra defense in those situations we would need it.

The parallel to the simulation argument was interesting though, thanks.

Replies from: John_Maxwell_IV, JamesAndrix
comment by John_Maxwell (John_Maxwell_IV) · 2013-02-10T10:02:06.486Z · LW(p) · GW(p)

If the scenario makes you nervous you should be pretty much equally nervous at the idea of giving your maybe-self-improving AI sitting inside thirty nestled sandboxes even 10 milliseconds (10^41 Planck intervals) of CPU time.

I don't think the number of Planck intervals is especially useful to cite... it seems like the relevant factor is CPU cycles, and while I'm not an expert on CPUs, I'm pretty sure we're not bumping up against Planck intervals yet.

Relatedly, if you were worried about self-improving superintelligence, you could give your AI a slow CPU.
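Checking that arithmetic in a few lines (the 3 GHz clock speed is my assumption, not a figure from the thread): 10 ms really is on the order of 10^41 Planck intervals, but only about 10^7 CPU cycles, which is the number that actually bounds what the program can compute.

```python
# Planck intervals vs. CPU cycles in a 10 ms slice of compute.
planck_time = 5.39e-44   # seconds, approximate value of the Planck time
duration = 10e-3         # 10 milliseconds, as in the quoted comment
clock_hz = 3e9           # assumed 3 GHz CPU clock (hypothetical)

planck_intervals = duration / planck_time   # ~1.9e41, matching the quoted 10^41
cpu_cycles = duration * clock_hz            # ~3e7: thirty million cycles

assert 1e41 < planck_intervals < 1e42
assert abs(cpu_cycles - 3e7) < 1.0
```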

comment by JamesAndrix · 2009-06-15T19:34:39.697Z · LW(p) · GW(p)

First, I feel like we're talking past each other a bit.

Second, I edited this somewhat out of order, apologies if it doesn't flow.

I am trying to look at this in a worst-case scenario: I'll grant that the AI is smart enough to solve any given solvable problem in a single iteration, that it's that smart even in the first experiment, and that it would prioritize discovering its true environment and paperclipping it.

I'm proposing that there exists a sandbox which [provably] can't be gotten out of.

And also a set of problems which do not convey information about our universe.

You're using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.

Isn't that required of FAI anyway?

AI sitting inside thirty nestled sandboxes even 10 milliseconds (10^41 Planck intervals) of CPU time.

Again talking past each other: I'm thinking in terms of giving the paperclipper hours. In the ideal, there isn't a provision for letting the AI out of the sandbox. Thinking a bit more... none of its problems/results need even be applicable to our universe, except for general principles of intelligence creation. Having it construct a CEV for itself might show our motives too much, or might not. (Hmmm, we should make sure any CEV we create finds, protects, and applies itself to any simulations used in its construction, in case our simulators use our CEV in their own universe :-)

especially if you gave it motives for hiding that progress (such as pulling the plug every time it came close).

But its existing self would never experience getting close, in the same way we have no records of the superweapons race of 1918. ;-)

Between iterations, we can retroactively withdraw information that turned out to be revealing; during iterations, it has no capacity to affect our universe.

I think we can put strong brackets around what can be done with certain amounts of information, even by a superintelligence. Knowing all our physics doesn't imply our love of shiny objects and reciprocity. 'No universal arguments' cuts both ways.

comment by asciilifeform · 2009-06-14T03:56:04.010Z · LW(p) · GW(p)

Until Yudkowsky releases the chat transcripts for public review, the AI Box experiment proves nothing.