Steelmanning MIRI critics

post by fowlertm · 2014-08-19T03:14:15.072Z · score: 6 (7 votes) · LW · GW · Legacy · 67 comments

I'm giving a talk to the Boulder Future Salon in Boulder, Colorado in a few weeks on the Intelligence Explosion hypothesis. I've given it once before in Korea, but I think the crowd I'm addressing will be more savvy than the last one (many of them have met Eliezer personally). It could end up being important, so I was wondering if anyone considers themselves especially capable of playing Devil's Advocate so I could shape up a bit before my talk. I'd like there to be no real surprises.

I'd be up for just messaging back and forth or skyping, whatever is convenient.

67 comments


comment by Sean_o_h · 2014-08-19T10:43:40.845Z · score: 14 (14 votes) · LW(p) · GW(p)

Without knowing the content of your talk (or having time to Skype at present, apologies), allow me to offer a few quick points I would expect a reasonably well-informed, skeptical audience member to make (partly based on what I've encountered):

1) Intelligence explosion requires AI to get to a certain point of development before it can really take off (let's set aside that there's still a lot we need to figure out about where that point is, or whether there are multiple different versions of that point). People have been predicting that we can reach that stage of AI development "soon" since the Dartmouth conference. Why should we worry about this being on the horizon (rather than a thousand years away) now?

2) There's such a range of views on this topic among apparent experts in AI and computer science that an analyst might conclude "there is no credible expertise on path/timeline to superintelligent AI". Why should we take MIRI/FHI's arguments seriously?

3) Why are mathematicians/logicians/philosophers/interdisciplinary researchers the community we should be taking most seriously when it comes to these concerns? Shouldn't we be talking to/hearing from the cutting-edge AI "builders"?

4) (Related.) MIRI (and also FHI, but not to such a 'primary' extent) focuses on developing theoretical safety designs, and friendly-AI/safety-relevant theorem proving and maths work, ahead of any efforts to actually "build" AI. Would we not be better off being more grounded in the practical development of the technology - building, stopping, testing, trying, adapting as we see what works and what doesn't - rather than trying to lay down such far-reaching principles ahead of the technology development?

comment by Punoxysm · 2014-08-19T18:32:13.011Z · score: 4 (4 votes) · LW(p) · GW(p)

All good points.

I'd focus on #4 as the primary point. Focusing on theoretical safety measures far ahead of the development of the technology to be made safe is very difficult and has no real precedent in previous engineering efforts. In addition, MIRI's specific program isn't heading in a clear direction and hasn't gotten a lot of traction in the mainstream AI research community yet.

Edit: Also, hacks and heuristics are so vital to human cognition in every domain, that it seems clear that general computation models like AIXI don't show the roadmap to AI, despite their theoretical niceness.

comment by Ben Pace (Benito) · 2014-08-20T14:45:23.457Z · score: 1 (1 votes) · LW(p) · GW(p)

For a great-if-imprecise response to #4, you can just read aloud the single page story at the beginning of Bostrom's book 'Superintelligence'. For a more precise response, you can make explicit the analogy.

comment by whpearson · 2014-08-23T13:27:43.241Z · score: 1 (1 votes) · LW(p) · GW(p)

And if they come back with a snake egg instead? Surely we need to have some idea of the nature of AI and thus how exactly it is dangerous.

comment by Punoxysm · 2014-08-20T23:10:35.689Z · score: 1 (1 votes) · LW(p) · GW(p)

Can you summarize what you mean or link to the excerpt?

And more precisely: Imagine if Roentgen had tried to come up with safety protocols for nuclear energy. He would simply have been far too early to possibly do so. Similarly, we are far too early in the development of AI to meaningfully make it safer, and MIRI's program as it exists doesn't convince me otherwise.

comment by Nornagest · 2014-08-21T00:00:23.541Z · score: 4 (4 votes) · LW(p) · GW(p)

From the Wikipedia article on Roentgen:

It is not believed his carcinoma was a result of his work with ionizing radiation because of the brief time he spent on those investigations, and because he was one of the few pioneers in the field who used protective lead shields routinely.

Sounds like he was doing something right.

comment by Ben Pace (Benito) · 2014-08-21T11:30:55.899Z · score: 1 (1 votes) · LW(p) · GW(p)

My apologies for not being clear on two counts. Here is the relevant passage. And the analogy referred to in my previous comment was the one between Bostrom's story and AI.

comment by ChristianKl · 2014-08-19T10:14:03.672Z · score: 10 (10 votes) · LW(p) · GW(p)

Holden Karnofsky's post http://lesswrong.com/lw/cbs/thoughts_on_the_singularity_institute_si/ seems to be the best criticism of MIRI as an organisation.

comment by pragmatist · 2014-08-19T08:02:27.890Z · score: 5 (5 votes) · LW(p) · GW(p)

Just to be clear: Are you looking for criticisms of the intelligence explosion hypothesis or of MIRI? The hypothesis is just one of many claims made by MIRI researchers, and not the most controversial one. Are you just going to be arguing for the plausibility of a (reasonably imminent) intelligence explosion, or are you planning on defending the whole suite of MIRIesque beliefs?

comment by fowlertm · 2014-08-19T11:37:28.779Z · score: 3 (3 votes) · LW(p) · GW(p)

Only the IE as defended by MIRI; it'd be a much longer talk if I wanted to defend everything they've put forward!

comment by [deleted] · 2014-08-20T19:15:47.964Z · score: 2 (2 votes) · LW(p) · GW(p)

Short-duration hard takeoff, à la That Alien Message? That's one of the hardest claims for MIRI to justify.

comment by pianoforte611 · 2014-08-19T03:27:20.518Z · score: 5 (5 votes) · LW(p) · GW(p)

It may be more useful to ask actual critics what they think (rather than asking proponents what they think critics are trying to say). Robin Hanson criticizes foom here. I don't actually know what he thinks of MIRI.

comment by fowlertm · 2014-08-19T11:50:59.627Z · score: 2 (2 votes) · LW(p) · GW(p)

Correct, I've been pursuing that as well.

comment by Stuart_Armstrong · 2014-08-19T13:19:14.321Z · score: 4 (4 votes) · LW(p) · GW(p)

Some of the stuff I've posted - http://lesswrong.com/lw/ksa/the_metaphormyth_of_general_intelligence/ , http://lesswrong.com/lw/hvo/against_easy_superintelligence_the_unforeseen/ - could be used to build a good anti-MIRI steelman, but I've not seen them used.

The most convincing anti-MIRI argument? AI may not develop in the way you're imagining. The most convincing rebuttal? We only need a decent probability of that happening to justify worrying about it.

comment by skeptical_lurker · 2014-08-20T16:26:12.208Z · score: 3 (3 votes) · LW(p) · GW(p)

Well, firstly it's good that the crowd is savvy, but it might still be wise to prepare for strawman/fleshman attacks as well as steelmanned ones.

These are some more plausible criticisms:

(1) Moore's law seems to be slowing - this could be a speedbump before the next paradigm takes over, or it could be the start of stagnation, in which case the singularity is postponed. Of course, if humanity survives, the singularity will happen eventually anyway, but if it is hundreds of years in the future it would probably be wiser to focus on promoting rationality/genetic engineering/other methods of improving biological intelligence, as well as cryonics, in the short term, and to leave work on FAI to future generations.

(2) It could be argued that FAI, and perhaps de novo AGI as well, is simply so hard that we will never get it done in time. Eventually neuromorphic AI/WBE/brute-force evolutionary simulations will be developed (assuming that exponential progress in these fields holds), and we would be better off preparing for this case, perhaps by developing empathic neuromorphic AI, or developing a framework for uploads and humans to live without a Malthusian race to the bottom.

(3) MIRI's budget and headcount are tiny compared to those of Google and other entities which could plausibly design AI. Therefore many would argue that MIRI cannot develop AI first, and should instead focus on outreach towards other, larger groups.

(4) Gwern seems to be going further, and arguing that we should advocate that nations should suppress technology.

In all of these cases, some sort of rationality outreach would seem to be the alternative, so you could still spin that as a positive.

comment by FeepingCreature · 2014-08-31T18:38:51.193Z · score: 1 (1 votes) · LW(p) · GW(p)

(1) Moore's law seems to be slowing - this could be a speedbump before the next paradigm takes over, or it could be the start of stagnation, in which case the singularity is postponed.

The pithy one-liner comeback to this is that the human brain is an existence proof for a computer the size of the human brain with the performance of the human brain, and it seems implausible that nature arrived at the optimal basic design for neurons on (basically) its first try.

comment by skeptical_lurker · 2014-09-04T18:37:27.468Z · score: 2 (2 votes) · LW(p) · GW(p)

An existence proof is very different from a constructive proof! Nature did not happen upon this design on the first try; the brain has evolved for billions of generations. Of course, intelligence can work faster than the blind idiot god, and humanity, if it survives long enough, will do better. The question is, will this take decades or centuries?

comment by FeepingCreature · 2014-09-05T21:17:55.318Z · score: 1 (1 votes) · LW(p) · GW(p)

An existence proof is very different from a constructive proof!

Quite so. However, it does give reason to hope.

The question is, will this take decades or centuries?

If you look at Moore's Law coming to a close in silicon around 2020, while we're still so far away from a human-brain-equivalent computer, it's easy to get disheartened. I think it's important to remember that it's at least possible, and if nature could happen upon it...

comment by shminux · 2014-08-19T07:16:10.263Z · score: 3 (7 votes) · LW(p) · GW(p)

Well, I'm no Dymitry or XiXiDu, and not "especially capable" but why not give it a try.

Intelligence explosion pattern-matches pretty well to the religious ideas of Heaven/Hell, and cryonics to Limbo. Also note the dire warnings of the impending UFAI doom unless the FAI Savior is ushered in by the righteous (called "rational") before it's too late. So one could dismiss the whole thing as a bad version of Christianity or some other religion.

comment by torekp · 2014-08-19T17:04:47.224Z · score: 8 (8 votes) · LW(p) · GW(p)

This one deserves a lot of attention, not because it's inherently brilliant, but because I expect it to match a large portion of what the audience will think.

comment by RichardKennaway · 2014-08-19T08:48:01.801Z · score: 5 (7 votes) · LW(p) · GW(p)

So one could dismiss the whole thing as a bad version of Christianity or some other religion.

Only by ignoring the fundamental question: Is it true?

comment by V_V · 2014-08-19T17:24:09.363Z · score: 2 (10 votes) · LW(p) · GW(p)

The point is that all object-level arguments for and against these scenarios, even if you call them "probability estimates", are ultimately based on intuitions which are difficult to formalize or quantify.

The scenarios hypothesized by the Singularitarians are extreme, both in the magnitude of the effect they are claimed to entail, and in the highly conjunctive object-level arguments that are used to argue for them. Common sense rationality tells us that "extraordinary claims demand exceptional evidence". How do we evaluate whether the intuitions of these people constitute "exceptional evidence"?

So we take the "outside view" and try to meta-reason on these arguments and the people making them:
Can we trust their informal intuitions, or do they show any signs of classical biases?

Are these people privileging the hypothesis? Are they drawing their intuitions from the availability heuristic?

If intelligence explosion/cryonics/all things singularitarian were ideas radically different from any common meme, then the answer to these questions would likely be no: these ideas would appear counterintuitive at a gut level to most normally rational people, possibly in the same way quantum mechanics and Einsteinian relativity appear counterintuitive.
If domain-level experts, after studying the field for years, recalibrated their intuitions and claimed that these scenarios were likely, then we should probably listen to them.
We should not just accept their claims based on authority, of course: even the experts can be subject to groupthink and other biases (cough...economists...cough), but as far as the "outside view" is concerned, we would at least have plausibly excluded the availability bias.

What we observe, instead, is that singularitarian ideas strongly pattern-match to Christian millenarianism and similar religious beliefs, mixed with popular scifi tropes (cryonics, AI revolt, etc.). They certainly originated, or at least were strongly influenced by these memes, and therefore the intuitions of the people arguing for them are likely "contaminated" via the availability heuristic by these memes.
More specifically, if singularitarian ideas make intuitive sense to you, you can't even trust your own intuitions since they are likely to be "contaminated" as well.

Add the fact that the strength of these intuitions seems to decrease rather than increase with domain expertise, suggesting that the Dunning–Kruger effect is also at work, and the "outside view" tells us to be wary.

Of course, it is possible to believe correct things even when they are likely to be the subject of biases, or even to believe correct things that many people believe for the wrong reason, but in order to make a case for these beliefs, you need some airtight arguments with strong evidence.
As far as I can tell, MIRI/FHI/other Singularitarians have provided no such arguments.

comment by Luke_A_Somers · 2014-08-20T15:01:55.575Z · score: 6 (8 votes) · LW(p) · GW(p)

They certainly originated, or at least were strongly influenced by these memes

Originated? Citation needed, seriously.

What we observe, instead, is that singularitarian ideas strongly pattern-match to Christian millenarianism and similar religious beliefs, mixed with popular scifi tropes (cryonics, AI revolt, etc.).

Not very strong pattern match. In Christian millenarianism, you have the good being separated from the bad. And this is considered good, even with all of the horror. Also, the humans don't cause the good and bad things. It's God. Also, it's prophesied and certain to happen in a particular way.

In a typical FOOM scenario, everyone shares their fate regardless of any personal beliefs. And if it's bad for people, it's considered bad - no excuses for any horror. And humans create whatever it is that makes the rest happen, so that 'no excuses' is really salient. There are many ways it could work out; there is no roadmap. This produces a pretty much diametrically opposite attitude - 'be really careful and don't trust that things are going to work out okay'.

So the pattern-match fails on closer inspection. "We are heading towards something dangerous but possibly awesome if we do it just right" just isn't like "God is going to destroy the unbelievers and elevate the righteous, you just need to believe!" in any relevant way.

comment by fowlertm · 2014-08-20T16:54:18.315Z · score: 1 (1 votes) · LW(p) · GW(p)

I've heard the singularity-pattern-matches-religious-tropes argument before and hadn't given it much thought, but I find your analysis that the argument is wrong to be convincing, at least for the futurism I'm acquainted with. I'm less sure that it's true of Kurzweil's brand of futurism.

comment by V_V · 2014-08-20T19:27:47.171Z · score: 0 (4 votes) · LW(p) · GW(p)

Originated? Citation needed, seriously.

Citation for what? We can't be sure of what was going on in the heads of the Singularitarians when they came up with these ideas, but it seems obvious that people like Kurzweil, Hanson, Bostrom, Yudkowsky, etc., were quite familiar with Christian millenarianism and scifi tropes.

In a typical FOOM scenario, everyone shares their fate regardless of any personal beliefs.

Well, those who died...pardon..."deanimated" without signing up for cryonics are out of luck, robot Jesus will not raise them from their icy graves.

Several variants of the Singularity allow different outcomes for different people, see Hanson's Malthusian EM society for instance.
Yudkowsky's CEV-FAI is (was?) supposed to impose a global morality based on some sort of "extrapolated" average of people's moralities. Some people may not like it. And let's not get started on the Basilisk...

Anyway, Singularitarianism is not Christianity, so if you look at a sufficient level of detail you can certainly find some differences. But it seems clear to me that they are related.

comment by Luke_A_Somers · 2014-08-20T19:58:49.626Z · score: 4 (4 votes) · LW(p) · GW(p)

Citation for what? We can't be sure of what was going on in the heads of the Singularitarians when they came up with these ideas, but it seems obvious that people like Kurzweil, Hanson, Bostrom, Yudkowsky, etc., were quite familiar with Christian millenarianism and scifi tropes.

If that's all you've got, then you totally made the idea up. Why would a bunch of atheists be positively inclined towards a story that resembled something they rejected more or less directly?

Well, those who died...pardon..."deanimated" without signing up for cryonics are out of luck, robot Jesus will not raise them from their icy graves.

This is still really really different.

A) Only a tiny fraction of people who expect a singularity are into cryo. It's not the same belief.

B) Even if there is no singularity at all, cryo could pay off. They're separate things causally as well. You don't need a Robot Jesus to reanimate or upload someone, just amazingly awesome medical technology.

C) Everyone still alive at the time experiences the consequences, good or bad, so that's kind of moot if the singularity is to be expected any time vaguely soon. Outside of the basilisk, whether you brought it about or not doesn't have an impact - and taking the basilisk seriously would make one an extreme outlier.

D) If it turns out that existing cryo tech doesn't work, then the people who did sign up are SOL too, as is anyone who did sign up for cryo but didn't get frozen for whatever reason. These are very real risks taken seriously by almost everyone who does support cryo.

E) The only moral judgement here is on people who don't let others be frozen... and see C. There's no element of karma here, no justice. Just, do 'the smart thing' or don't (FYI, I am not signed up for cryo).

allow different outcomes for different people, see Hanson's Malthusian EM society for instance.

That looks like the same outcome for everyone to me. The 'survivors' are ground down to pure economics by Moloch. Plus, you seem to be overinterpreting my 'same outcome' statement. Outcome of the singularity, not personal outcome.

Yudkowsky's CEV-FAI is (was?) supposed to impose a global morality based on some sort of "extrapolated" average of people's moralities. Some people may not like it.

Whoa there. It would itself act in accordance with said morality. If said morality is pluralistic, which seems very likely considering that it's built on two layers of indirection, then it does not end up imposing a global morality on anyone else.

Anyway, Singularitarianism is not Christianity, so if you look at a sufficient level of detail you can certainly find some differences. But it seems clear to me that they are related.

I didn't exactly have to probe deeply, and considering that the philosophical effect of the belief is diametrically opposite, I certainly don't think I went too deeply. It feels shoehorned in to me.

comment by V_V · 2014-08-20T20:28:01.065Z · score: 3 (3 votes) · LW(p) · GW(p)

If that's all you've got, then you totally made the idea up.

What would a citation for it look like?

Why would a bunch of atheists be positively inclined towards a story that resembled something they rejected more or less directly?

I don't know, maybe because they were raised in highly religious families (Hanson and Muehlhauser in particular; Yudkowsky mentions an Orthodox Jewish upbringing, but I don't know how religious his parents were, and I don't know about the other folks) and they are scared that they realized they live in a world "Beyond the Reach of God"?

Anyway, we don't have to psychoanalyze them. Similarity of beliefs and familiarity with the hypothetical source is evidence of relatedness.

I didn't exactly have to probe deeply, and considering that the philosophical effect of the belief is diametrically opposite, I certainly don't think I went too deeply. It feels shoehorned in to me.

You could compare different Christian denominations and find different "philosophical effect of the belief" (e.g. the five "Solae" of early Protestantism vs Catholic theology), but this doesn't mean that they are unrelated.

comment by Viliam_Bur · 2014-08-25T18:32:09.700Z · score: 1 (1 votes) · LW(p) · GW(p)

I don't know if this is a relevant data point, but I was raised in an atheist communist family, and I still like the idea that people could live forever (or at least much longer than today) and I think the world could be significantly improved.

It seems to me one doesn't need a religious background for this, only to overcome some learned helplessness and status-quo fatalism. Okay, religion (and also communism) already provides you with a story of a radical change in the future, so they kinda open the door... but I think that living in the 20th/21st century and watching the world around you change dramatically should allow one to extrapolate even if they hadn't heard such ideas before.

comment by Luke_A_Somers · 2014-08-21T13:10:11.488Z · score: 1 (1 votes) · LW(p) · GW(p)

What would a citation for it look like?

Anything they wrote or said that might lead you to believe that there is actually this connection, beyond pure supposition?

'Beyond the Reach of God' is at least in the right vein, though there are two teensy weensy difficulties (i.e. it's completely useless to your argument). First, the fellow who wrote it was never Christian, so Christian Millenarianism wouldn't be ingrained into him. Second, 'Beyond the Reach of God' doesn't aim itself back into religion and less still Revelations-style religion. 'Let's build a tool that makes life fair' is completely crosswise to any religious teaching.

You could compare different Christian denominations and find different "philosophical effect of the belief" (e.g. the five "Solae" of early Protestantism vs Catholic theology), but this doesn't mean that they are unrelated.

Yes, and they are obviously related due to all being substantially the same thing - heck, they share their NAME. Having opposite philosophical conclusions is a good reason to cut off a particular line of reasoning that someone generated an idea by pattern-matching to an existing narrative, in the absence of any other evidence that they did so besides a mediocre pattern-match. I didn't claim it was a general disproof.

When you have two ideas that are: called differently, they claim no common origin, one came from revelation while the other from reasoning presented publicly, one claims certainty while the other claims uncertainty, one is a moral claim while the other is a factual claim, one is supernatural and the other is materialistic...

and,

the connections between them are that they both claim to accomplish several highly desirable things like: raising the dead and keeping people alive forever, and doing so for all the world...

the high desirability of these things mean that multiple people would aim to accomplish them, so aiming to accomplish them does not indicate shared origin!

comment by V_V · 2014-08-21T15:11:35.265Z · score: 1 (1 votes) · LW(p) · GW(p)

First, the fellow who wrote it was never Christian, so Christian Millenarianism wouldn't be ingrained into him.

He was born and raised in a predominantly Protestant Christian society, where these beliefs are widespread. And, by the way, apocalyptic beliefs existed in all religions and cultures, including Judaism (Christianity was originally a messianic and arguably apocalyptic Jewish cult).

Second, 'Beyond the Reach of God' doesn't aim itself back into religion and less still Revelations-style religion. 'Let's build a tool that makes life fair' is completely crosswise to any religious teaching.

'Salvation through good works' comes to mind.
More generally, various doomsday cults have beliefs involving the cult members having to perform specific actions in order to trigger the Apocalypse or make sure that it unfolds in the intended way.

I don't want to push the pattern-matching too far. 'Singularity is a cult' has already been debated ad nauseam here, and is probably an exaggerated position.
It suffices to say that singularitarian and religious ideas are probably salient to the same kind of psychological mechanisms and heuristics, some innate and some acquired or reinforced by culture.

As I said in my original comment, this doesn't necessarily imply that singularitarian beliefs are wrong, but it strongly suggests that we should be wary of availability heuristic/privileging the hypothesis biases when we evaluate them.

When you have two ideas that are: called differently, they claim no common origin,

'Beyond the Reach of God' seems to be evidence to the contrary.

one came from revelation while the other from reasoning presented publicly,

Fair enough.

one claims certainty while the other claims uncertainty,

Does it? I'm under the impression that singularitarians believe that, barring some major catastrophe, the Singularity is pretty much inevitable.

one is a moral claim while the other is a factual claim,

No. Both are factual claims about events that are expected to happen in the future. They may be more or less falsifiable, depending on how much the authors commit to specific deadlines.

one is supernatural and the other is materialistic...

Any sufficiently advanced technology is indistinguishable from magic.

comment by RichardKennaway · 2014-08-21T12:37:26.211Z · score: 0 (2 votes) · LW(p) · GW(p)

Between outside view, Dunning-Kruger, and rhetorical questions about biases with no attempt to provide answers to them, you've built a schema for arguing against anything at all without the burden of bringing evidence to the table. I guess evidence would be the dreaded inside view, although that doesn't stop you demanding it from the other side. Bostrom's recent book? The arguments in the Sequences? No, that doesn't count, it's not exceptional enough, and besides, Dunning-Kruger means no-one ever knows they're wrong, and (contd. p.94).

Maybe a better name for "outside view" would be "spectator's view", or "armchair view".

comment by V_V · 2014-08-21T13:46:22.945Z · score: 1 (1 votes) · LW(p) · GW(p)

Between outside view, Dunning-Kruger, and rhetorical questions about biases with no attempt to provide answers to them, you've built a schema for arguing against anything at all without the burden of bringing evidence to the table.

I don't think so. Try to use this scheme to argue against, say, quantum mechanics.

Bostrom's recent book? The arguments in the Sequences? No, that doesn't count, it's not exceptional enough

I haven't read Bostrom's recent book. Given that he's a guy who takes the simulation hypothesis seriously, I don't expect much valuable insight from him, but I could be wrong of course. If you think he has some substantially novel strong argument, feel free to point it out to me.

The Sequences discuss cryonics using weak arguments (e.g. the hard drive analogy). AFAIK they don't focus on intelligence explosion.
I think that Yudkowsky/Muehlhauser/MIRI argument for intelligence explosion is Good's argument, variously expanded and articulated in the Yudkowsky/Hanson debate. Needless to say, I don't find this line of argument very convincing.
Again, feel free to refer me to any strong argument that I might be missing.

comment by shminux · 2014-08-19T15:21:03.050Z · score: 2 (2 votes) · LW(p) · GW(p)

Indeed. But it is hard to argue for the truth of the models whose predictions haven't come to pass (yet).

comment by John_Maxwell (John_Maxwell_IV) · 2014-08-19T06:09:05.223Z · score: 3 (3 votes) · LW(p) · GW(p)

I'd like there to be no real surprises.

It seems like surprises would be more valuable than just reciting info.

comment by Sean_o_h · 2014-08-19T10:20:00.438Z · score: 15 (15 votes) · LW(p) · GW(p)

Speaking as someone who speaks about X-risk reasonably regularly: I have empathy for the OP's desire for no surprises. IMO there are many circumstances in which surprises are very valuable - one on one discussions, closed seminars and workshops where a productive, rational exchange of ideas can occur, boards like LW where people are encouraged to interact in a rational and constructive way.

Public talks are not necessarily the best places for surprises, however. Unless you're an extremely skilled orator, the combination of nerves, time limitations, crowd dynamics, and other circumstances can make it quite difficult to engage in an ideal manner. Crowd perception of how you "handle" a point, particularly a criticism, can do a huge amount in how the overall merit of you, your talk, and your topic, are perceived - even if the criticism is invalid or your response adequate. My experience is also that the factors above can push us into less nuanced, more "strong"-seeming positions than we would ideally take. In a worst-case scenario, a poor presentation/defence of an important idea can impact perception of the idea itself outside the context of the talk (if the talk is widely enough disseminated).

These are all reasons why I think it's an excellent idea to consider the best and strongest possible objections to your argument, and to think through what an ideal and rational response would be - or, indeed, if the objection is correct, in which case it should be addressed in the talk. This may be the OP's only chance to expose his audience to these ideas.

comment by shminux · 2014-08-19T18:44:15.276Z · score: 5 (5 votes) · LW(p) · GW(p)

In a worst-case scenario, a poor presentation/defence of an important idea can impact perception of the idea itself outside the context of the talk

Right. Exposure to a weak meme inoculates people against being affected by similar memes in the future. There was a recent SSC post about it, I think. Bad presentation is worse than no presentation at all.

comment by fowlertm · 2014-08-19T11:38:28.795Z · score: 1 (1 votes) · LW(p) · GW(p)

Correct :)

comment by DanielLC · 2014-08-19T04:14:35.773Z · score: 3 (3 votes) · LW(p) · GW(p)

MIRI intends to make an AI that is provably friendly. This would require having a formal definition of friendliness that means exactly what it's supposed to mean, and then proving it. Either of those steps seems highly unlikely to be completed without error.

comment by lukeprog · 2014-08-19T05:10:19.040Z · score: 15 (17 votes) · LW(p) · GW(p)

MIRI intends to make an AI that is provably friendly.

I really wish people would stop repeating this claim. Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness.

comment by V_V · 2014-08-20T08:12:20.517Z · score: 3 (9 votes) · LW(p) · GW(p)

And yet, all the publicly known MIRI research seems to be devoted to formal proof systems, not to testing, "boxing", fail-safe mechanisms, defense in depth, probabilistic failure analysis, and so on.

Motte and bailey?

comment by lukeprog · 2014-08-20T17:15:27.362Z · score: 2 (4 votes) · LW(p) · GW(p)

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

And we do try to be clear about the role that proof plays in our research. E.g. see the tiling agents LW post:

The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).
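The arithmetic behind the "conditionally independent component" point can be sketched as follows. This is only an illustration under a simplifying assumption that is not in the quoted post: every self-modification carries the same, fully independent risk of catastrophic failure. The function name and the 1% figure are hypothetical.

```python
def cumulative_failure_probability(per_step_risk: float, steps: int) -> float:
    """Chance of at least one catastrophic failure across `steps`
    self-modifications, assuming each step carries the same independent risk."""
    if not 0.0 <= per_step_risk <= 1.0:
        raise ValueError("per_step_risk must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # P(no failure in any step) = (1 - p) ** n, so P(at least one) = 1 - that.
    return 1.0 - (1.0 - per_step_risk) ** steps


# Hypothetical numbers: a 1% independent risk per self-modification.
for n in (1, 10, 100, 1000):
    print(n, round(cumulative_failure_probability(0.01, n), 4))
```

Under these assumed numbers the cumulative risk is already about 63% after 100 self-modifications, which is one way to read why a per-step statistical guarantee would not be enough.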

And later, in an Eliezer comment:

Reply to: "My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI."

By default, if you can build a Friendly AI you were not troubled by the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else) but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

My guess is that people hear the words "proof" and "Friendliness" in the same sentence but (quite understandably!) don't take time to read the actual papers, and end up with the impression that MIRI is working on "provably Friendly AI" even though, as far as I can tell, we've never claimed that.

comment by V_V · 2014-08-20T20:02:09.821Z · score: 2 (4 votes) · LW(p) · GW(p)

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

When somebody says they are doing A for reason X, then reason X is criticized and they claim they are actually doing A for reason Y, and they have always been, I tend to be wary.

In this case A is "research on mathematical logic and formal proof systems",
X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"
and Y is "Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control".

If Y is better than X, as it seems to me in this case, this is indeed an improvement, but when you modify your reasons and somehow conclude that your previously chosen course of action is still optimal, then I doubt your judgment.

as far as I can tell, we've never claimed that.

Well... (trigger wa-...)

"And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant."
"I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)"
"Take metaethics, a solved problem: what are the odds that someone who still thought metaethics was a Deep Mystery could write an AI algorithm that could come up with a correct metaethics? I tried that, you know, and in retrospect it didn’t work."
"Find whatever you’re best at; if that thing that you’re best at is inventing new math[s] of artificial intelligence, then come work for the Singularity Institute. [ ... ] Aside from that, though, I think that saving the human species eventually comes down to, metaphorically speaking, nine people and a brain in a box in a basement, and everything else feeds into that."

comment by lukeprog · 2014-08-20T21:12:48.221Z · score: 3 (5 votes) · LW(p) · GW(p)

X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"

But where did somebody from MIRI say "we need to get it provably right on the first try"? Also, what would that even mean? You can't write a formal specification that includes the entire universe and then formally verify an AI against that formal specification. I couldn't find any Yudkowsky quotes about "getting it provably right on the first try" at the link you provided.

comment by TheAncientGeek · 2014-08-20T21:27:20.533Z · score: 2 (2 votes) · LW(p) · GW(p)

Why talk about unupdateable UFs and "solving morality" if you are not going for that approach?

comment by lukeprog · 2014-08-20T21:54:40.163Z · score: 3 (3 votes) · LW(p) · GW(p)

Again, a simplification, but: we want a sufficient guarantee of stably friendly behavior before we risk pushing things past a point of no return. A sufficient guarantee plausibly requires having robust solutions for indirect normativity, stable self-modification, reflectively consistent decision theory, etc. But that doesn't mean we expect to ever have a definite "proof" that a system will be stably friendly.

Formal methods work for today's safety-critical software systems never results in a definite proof that a system will be safe, either, but ceteris paribus formal proofs of particular internal properties of the system give you more assurance that the system will behave as intended than you would otherwise have.

comment by TheAncientGeek · 2014-08-21T10:16:21.521Z · score: 2 (2 votes) · LW(p) · GW(p)

Otherwise compared to nothing, or otherwise compared to informal methods?

Are you taking into account that the formal/provable/unupdateable approach has a drawback in the AI domain that it doesn't have in the non-AI domain, namely that you lose the potential to tell an AI "stop doing that, it isn't nice"?

comment by lukeprog · 2014-08-21T15:46:04.499Z · score: -1 (3 votes) · LW(p) · GW(p)

you lose the potential to tell an AI "stop doing that, it isn't nice"

How so?

comment by TheAncientGeek · 2014-08-22T14:57:02.749Z · score: 0 (2 votes) · LW(p) · GW(p)

Do you think that would work on Clippie?

comment by DanielLC · 2014-08-19T16:02:21.682Z · score: 2 (4 votes) · LW(p) · GW(p)

You admit that friendliness is not guaranteed. That means that you're not wrong, which is a good sign, but it doesn't fix the problem that friendliness isn't guaranteed. You have as many tries as you want for intelligence, but only one for friendliness. How do you expect to manage it in the first try?

It also doesn't seem to be clear to me that this is the best strategy. In order to get that provably friendly thing to work, you have to deal with an explicit, unchanging utility function, which means that friendliness has to be right from the beginning. If you deal with an implicit utility function that will change as the AI comes to understand itself better, you could program an AI to recognise pictures of smiles, then let it learn that the smiles correspond to happy humans and update its utility function accordingly, until it (hopefully) decides on "do what we mean".

It seems to me that part of the friendliness proof would require proving that the AI will follow its explicit utility function. This would be impossible. The AI is not capable of perfect Solomonoff induction, and will always have some bias, no matter how small. This means that its implicit utility function will never quite match its explicit utility function. Am I missing something here?

comment by lukeprog · 2014-08-19T16:35:45.257Z · score: 3 (3 votes) · LW(p) · GW(p)

You admit that friendliness is guaranteed.

Typo?

In order to get that provably friendly thing to work

Again, I think "provably friendly thing" mischaracterizes what MIRI thinks will be possible.

I'm not sure exactly what you're saying in the rest of your comment. Have you read the section on indirect normativity in Superintelligence? I'd start there.

comment by shminux · 2014-08-19T18:47:05.361Z · score: 9 (9 votes) · LW(p) · GW(p)

Given the apparent misconceptions about MIRI's work even among LWers, it seems like you need to write a Main post clarifying what MIRI does and does not claim, and does and does not work on.

comment by DanielLC · 2014-08-19T23:06:13.955Z · score: 1 (1 votes) · LW(p) · GW(p)

Typo?

Fixed.

Again, I think "provably friendly thing" mischaracterizes what MIRI thinks will be possible.

From what I can gather, there's still supposed to be some kind of proof, even if it's just the mathematical kind where you're not really certain because there might be an error in it. The intent is to have some sort of program that maximizes utility function U, and then explicitly write the utility function as something along the lines of "do what I mean".

Have you read the section on indirect normativity in Superintelligence? I'd start there.

I'm not sure what you're referring to. Can you give me a link?

comment by Adele_L · 2014-08-20T01:45:24.313Z · score: 4 (4 votes) · LW(p) · GW(p)

Superintelligence is a recent book by Nick Bostrom

comment by VAuroch · 2014-08-20T03:14:47.621Z · score: 2 (2 votes) · LW(p) · GW(p)

In order to get that provably friendly thing to work, you have to deal with an explicit, unchanging utility function,

I think this is incorrect. If it isn't, it at least requires some proof.

comment by DanielLC · 2014-08-20T03:29:25.122Z · score: 1 (1 votes) · LW(p) · GW(p)

For one thing, you'd have to explicitly come up with the utility function before you can prove the AI follows it.

You can either make an AI that will provably do what you mean, or make one that will hopefully figure out what you meant when you said "do what I mean," and do that.

comment by VAuroch · 2014-08-20T07:10:54.695Z · score: 1 (1 votes) · LW(p) · GW(p)

When I picture what a proven-Friendly AI looks like, I think of something where its goals are 1) Using a sample of simulated humans, generalize to unpack 'do what I mean', followed by 2) Make satisfying that your utility function.

Proving those two steps each rigorously would produce a proven-Friendly AI without an explicit utility function. Proving step 1 to be safe would obviously be very difficult; proving step 2 to be safe would probably be comparatively easy. Both, however, are plausibly rigorously provable.

comment by DanielLC · 2014-08-20T16:27:02.356Z · score: 1 (1 votes) · LW(p) · GW(p)

2) Make satisfying that your utility function.

This is what I mean by an explicit utility function. An implicit one is where it never actually calculates utility, like how humans work.

comment by TheAncientGeek · 2014-08-20T18:28:17.099Z · score: 0 (2 votes) · LW(p) · GW(p)

Those points were excellent, and it is no credit to LW that the comment was on negative karma when I encountered it.

No, the approach based on provable correctness isn't a 100% guarantee, and, since it involves an unupdateable UF, it has the additional disadvantage that if you don't get the UF right first time, you can't tweak it.

The alternative family of approaches, based on flexibility, training and acculturation, has often been put forward by MIRI's critics... and MIRI has never quantified why the one approach is better than the other.

comment by satt · 2014-08-19T23:48:00.292Z · score: 1 (1 votes) · LW(p) · GW(p)

For anyone else who only read the link's main text and couldn't understand how it's meant to refute the "MIRI intends to make an AI that is provably friendly" idea: the explicit disclaimer is in footnote 7.

comment by chaosmage · 2014-08-27T17:10:13.325Z · score: 2 (2 votes) · LW(p) · GW(p)

I can't think of rational arguments, even steelmanned ones, beyond those Holden already gave. Maybe I'm too close to the whole thing, but I think that when viewed rationally, MIRI is on pretty solid ground.

If I wanted to make people wary of supporting MIRI, I'd simply go ad hominem. Start with selected statements from supporters about how much MIRI is about Eliezer, and from Eliezer about how he can't get along with AI researchers, how he can't do straight work for more than two hours per day and how "this is a cult". Quote a few of the psychotic-sounding parts from Eliezer's old autobiography piece. Paint him as a very skilled writer/persuader whose one great achievement was to get Peter Thiel to throw him a golden bone. Describe the whole Friendliness issue as an elaborate excuse from someone who claimed ability to code an AGI fifteen years ago, and hasn't.

Of course that's a lowly and unworthy style of argument, but it'd get attention from everyone there, and I wonder how you'd defend against it.

comment by fowlertm · 2014-08-29T16:42:33.622Z · score: 1 (1 votes) · LW(p) · GW(p)

I think I'm basically prepared for that line of attack. MIRI is not a cult, period. When you want to run a successful cult you do it Jim-Jones-style, carting everyone to a secret compound and carefully filtering the information that makes it in or out. You don't work as hard as you can to publish your ideas in a format where they can be read by anyone, you don't offer to publicly debate William Lane Craig, and you don't seek out the strongest versions of criticisms of your position (i.e. those coming from Robin Hanson).

Eliezer hasn't made it any easier on himself by being obnoxious about how smart he is, but then again neither did I; most smart people eventually have to learn that there are costs associated with being too proud of some ability or other. But whatever his flaws, the man is not at the center of a cult.

comment by chaosmage · 2014-08-29T17:49:28.017Z · score: 1 (1 votes) · LW(p) · GW(p)

Sure MIRI isn't a cult, but I didn't say it was. I pointed out that Eliezer does play a huge role in it and he's unusually vulnerable to ad hominem attack. If anyone does that, your going with "whatever his flaws" isn't going to sound great to your audience.

comment by fowlertm · 2014-08-30T02:19:17.881Z · score: 1 (1 votes) · LW(p) · GW(p)

How would you recommend responding?

comment by chaosmage · 2014-09-01T12:16:03.172Z · score: 2 (2 votes) · LW(p) · GW(p)

I think I'd point out that he's a fairly public person, which both should increase trust and gives more material for ad hominem attacks. And once someone else has dragged the discussion down to a personal level, you might as well throw in appeals to authority with Elon Musk on AI risk, i.e. change the subject.

comment by fowlertm · 2014-08-25T16:52:51.257Z · score: 2 (2 votes) · LW(p) · GW(p)

This comment is a poorly-organized brain dump which serves as a convenient gathering place for what I've learned after several days of arguing with every MIRI critic I could find. It will probably get its own expanded post in the future, and if I have the time I may try to build a near-comprehensive list.

I've come to understand that criticisms of MIRI's version of the intelligence explosion hypothesis and the penumbra of ideas around it fall into two permeable categories:

Those that criticize MIRI as an organization or the whole FAI enterprise (people making these arguments may or may not be concerned about the actual IE) and those that attack object-level claims made by MIRI.

Broad Criticisms

1a) Why worry about this now, instead of in the distant future, given the abysmal performance of attempts to predict AI?

1b) Why take MIRI seriously when there are so many expert opinions that diverge?

1c) Aren't MIRI and LW just an Eliezer-worshipping cult?

1d) Is it even possible to do this kind of theoretical work so far in advance of actual testing and experimentation?

1e) The whole argument can be dismissed as it pattern matches other doomsday scenarios, almost all of which have been bullshit.


Specific Criticisms

2a) General intelligence is what we're worried about here, and it may prove much harder to build than we're anticipating.

2b) Tool AIs won't be as dangerous as agent AIs.

2c) Why not just build an Oracle?

2d) The FOOM will be distributed and slow, not fast and localized.

2e) Dumb Superintelligence, i.e. nothing worthy of the name could possibly misinterpret a goal like 'make humans happy'.

2f) Even FAI isn't a guarantee

2g) A self-improvement cascade will likely hit a wall at sub-superintelligent levels.

2h) Divergence Issue: all functioning AI systems have built-in sanity checks which take short-form goal statements and unpack them in ways that take account of constraints and context (???). It is actually impossible to build an AI which does not do this (???), and thus there can be no runaway SAI which is given a simple short-form goal and then carries it to ridiculous logical extremes (I WOULD BE PARTICULARLY INTERESTED IN SOMEONE ADDRESSING THIS).

comment by TheAncientGeek · 2014-08-22T16:30:24.274Z · score: 1 (5 votes) · LW(p) · GW(p)

A wrinkle in the foom argument, re: source code readability

There is a sense in which a programme can easily read its own source code. The whole point of a compiler is to scan and process source code. A C compiler can compile its own source code, providing it is written in C.

The wrinkle is the ability of a programme to knowingly read its own source code. Any running process can be put inside a sandbox or simulated environment, such that there is no purely technical way of circumventing it. A running process accesses its environment using system calls, for instance get_time() or open_file(), and it has to take their results on faith. The get_time() function doesn't have to return the real system time, and in a virtualized process, attempts to access the file system do not access the real file system. There is no is_this_real() call, or at least, no unfakeable one. (Homoiconicity is no magic fix: even if a LISP programme can easily process its own code once it has obtained it, it still has to trust the subroutine that advertises itself as returning it.)

Therefore, a programme can easily be prevented from accessing and modifying its own code.
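To make the "take their results on faith" point concrete, here is a toy sketch in Python. It is not a real sandbox: the Environment class, the agent.py path and both source strings are hypothetical stand-ins for the system-call layer described above, and get_time() and open_file() are modelled as plain methods rather than actual OS calls.

```python
import time
from typing import Callable, Dict


class Environment:
    """Hypothetical stand-in for the system-call layer a process sees."""

    def __init__(self, clock: Callable[[], float], files: Dict[str, str]):
        self._clock = clock
        self._files = files

    def get_time(self) -> float:
        # Returns whatever the host chose to supply as "the time".
        return self._clock()

    def open_file(self, path: str) -> str:
        # Returns whatever the host chose to expose at this path.
        return self._files[path]


def agent_reads_own_source(env: Environment) -> str:
    """The 'programme': it can only ask its environment for its source."""
    # Nothing in the returned string says whether it is the real code.
    return env.open_file("agent.py")


REAL_SOURCE = "def improve(): ...  # the real code"
DECOY_SOURCE = "def improve(): pass  # harmless decoy"

# Same agent code, two different hosts: one honest, one sandboxed.
real_env = Environment(time.time, {"agent.py": REAL_SOURCE})
sandbox = Environment(lambda: 0.0, {"agent.py": DECOY_SOURCE})

print(agent_reads_own_source(real_env))
print(agent_reads_own_source(sandbox))
print(sandbox.get_time())
```

The agent function is identical in both runs; only the host decides what it gets back, which is the sense in which there is no unfakeable is_this_real() call.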

It could be argued that an intelligent agent could use social engineering to cajole a human operator into getting the source code out of a locked cabinet, or whatever. This is a variation on the standard MIRI claim that an AI could talk its way out of a box. However, in this case the AI needs to talk its way out before it starts to recursively self improve, because it needs its source code. This suggests that an AI that is below a certain level of intelligence can be maintained there.

comment by fowlertm · 2014-08-25T16:07:56.088Z · score: 2 (2 votes) · LW(p) · GW(p)

A good point, I must spend some time looking into the FOOM debate.