Raising safety-consciousness among AGI researchers

post by lukeprog · 2012-06-02T21:39:04.860Z · LW · GW · Legacy · 32 comments

Series: How to Purchase AI Risk Reduction

Another method for purchasing AI risk reduction is to raise the safety-consciousness of researchers doing work related to AGI.

The Singularity Institute is conducting a study of scientists who either (1) stopped researching some topic after realizing it might be dangerous, or (2) shifted their careers into advocacy, activism, or ethics because they became concerned about the potential negative consequences of their work. From this historical inquiry we hope to learn what causes scientists to become so concerned about the consequences of their work that they take action. Some examples we've found so far: Michael Michaud (resigned from SETI in part due to worries about the safety of trying to contact ET), Joseph Rotblat (resigned from the Manhattan Project before the end of the war due to concerns about the destructive impact of nuclear weapons), and Paul Berg (joined a self-imposed moratorium on recombinant DNA research back when it was still unknown how dangerous the new technology could be).

What else can be done?

Naturally, these efforts should be directed toward researchers who are both highly competent and whose work is very relevant to development toward AGI: researchers like Josh Tenenbaum, Shane Legg, and Henry Markram.

32 comments

Comments sorted by top scores.

comment by JGWeissman · 2012-06-03T03:39:07.289Z · LW(p) · GW(p)

For example, Moshe Looks (head of Google's AGI team) is now quite safety conscious, and a Singularity Institute supporter.

Has he done anything to make the work of Google's AGI team less dangerous?

Replies from: lukeprog
comment by lukeprog · 2012-10-19T00:54:27.026Z · LW(p) · GW(p)

Update: please see here.

comment by ChrisHallquist · 2012-06-03T10:07:20.406Z · LW(p) · GW(p)

Given that I think "Google develops powerful AI" is much more likely than "SIAI develops powerful AI," I think this effort is a very good idea.

comment by XiXiDu · 2012-06-03T09:23:28.901Z · LW(p) · GW(p)

...Moshe Looks (head of Google's AGI team) is now quite safety conscious, and a Singularity Institute supporter.

A supporter? Interesting. In January he told me that he is merely aware of SIAI.

ETA He's the head of Google's AGI team? Did he say that?

Replies from: lukeprog
comment by lukeprog · 2012-10-19T00:54:38.163Z · LW(p) · GW(p)

Update: please see here.

comment by private_messaging · 2012-06-03T15:24:43.992Z · LW(p) · GW(p)

Ohh, that's easily the area where you guys can do the most harm: associating the safety concern with crankery, for as long as you look like cranks without realizing it.

Speaking of which, using complicated things you poorly understand is a surefire way to make it clear you don't know what you're talking about. It's great for impressing people who understand those things even more poorly, or who are very unconfident in their understanding, but it won't work on competent experts.

A simple example [of how not to promote beliefs]: the idea that Kolmogorov complexity or Solomonoff probability favours the many-worlds interpretation because it is 'more compact' [without having any 'observer']. Why it's wrong: if you are seeking the lowest-complexity description of your input, your theory also needs to somehow locate you within whatever stuff it generates (hence an appropriate discount for something really huge like MWI). Why it's stupid: because if you drop that requirement, then an iterator through all possible physical theories is the lowest-complexity 'explanation', and we're back to square one. How it affects other people's opinion of your relevance: very negatively, in my case. edit: To clarify, the argument is bad, and I'm not even getting into details such as non-computability, our inability to represent theories in the most compact form (so we are likely to pick not the most probable theory but the one we can compress more easily), machine/language dependence, etc.
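The "back to square one" point above can be sketched with a toy description-length calculation. This is purely illustrative: real Kolmogorov complexity is uncomputable, and the program-size constants below are invented for the example.

```python
# Toy description-length accounting (illustrative only; true Kolmogorov
# complexity is uncomputable, and these overhead constants are made up).

PRINT_OVERHEAD = 10    # pretend fixed size of a "print this literal" program
ENUMERATOR_SIZE = 15   # pretend fixed size of "enumerate all bit strings"

def cost_direct(observed: str) -> int:
    # A program that prints the observation verbatim: overhead plus the data.
    return PRINT_OVERHEAD + len(observed)

def cost_enumerate_only(observed: str) -> int:
    # "Generate everything" with no requirement to locate the observer.
    # Constant size regardless of what you observed -- it 'explains' anything,
    # which is exactly why it explains nothing.
    return ENUMERATOR_SIZE

def cost_enumerate_located(observed: str) -> int:
    # The same enumerator, now charged for the ~n bits of index needed to
    # pick an n-bit observation out of everything it generates.
    return ENUMERATOR_SIZE + len(observed)

obs = "0110100110010110"  # stand-in for your sensory input (16 bits)
print(cost_direct(obs))             # 26
print(cost_enumerate_only(obs))     # 15 -- suspiciously cheap
print(cost_enumerate_located(obs))  # 31 -- the cheapness was an accounting error
```

Once the locating cost is charged, the enumerate-everything "theory" loses its apparent advantage, which is the discount the comment says a huge-ontology theory must pay.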

edit: Another issue: there was the mistake about phases in the interferometer. A minor mistake, maybe (or maybe the i was confused with a phase of 180 degrees, in which case it is a major misunderstanding). But it's the kind of mistake that people who refrain from talking about topics they don't understand are exceedingly unlikely to make (it's precisely the thing you double-check). Not being sloppy with MWI, Kolmogorov complexity, etc. is easy: you just need to study what others have concluded. Not being sloppy with AI is a lot harder. Being less biased won't in itself make you significantly less sloppy.

Replies from: albeola, D2AEFEA1, John_Maxwell_IV, khafra
comment by albeola · 2012-06-04T19:16:00.771Z · LW(p) · GW(p)

if you are seeking lowest complexity description of your input, your theory needs to also locate yourself within what ever stuff it generates somehow (hence appropriate discount for something really huge like MWI)

It seems to me that such a discount exists in all interpretations (at least those that don't successfully predict measurement outcomes beyond predicting their QM probability distributions). In Copenhagen, locating yourself corresponds to specifying random outcomes for all collapse events. In hidden variables theories, locating yourself corresponds to picking arbitrary boundary conditions for the hidden variables. Since MWI doesn't need to specify the mechanism for the collapse or hidden variables, it's still strictly simpler.

Replies from: private_messaging
comment by private_messaging · 2012-06-04T20:48:45.138Z · LW(p) · GW(p)

Well, the goal is to predict your personal observations; in MWI you have a huge wavefunction on which you need to somehow select the subjective you. The predictor will need code for this, whether you call it a mechanism or not. Furthermore, you need to actually derive the Born probabilities from some first principles if you want to make a case for MWI. Deriving those is what would be interesting, and what would actually make it more compact (if the stuff you're adding as extra 'first principles' is smaller than collapse). Also, by the way, CI doesn't have any actual mechanism for collapse; it's strictly a very un-physical trick.

Much more interestingly, Solomonoff probability hints that one should really search for something that would predict beyond probability distributions, i.e. search for objective collapse of some kind. Another issue: QM actually has a problem at macroscopic scales; it doesn't add up to general relativity (without nasty hacks), so as a matter of fact we are missing something, and this whole issue is really a silly argument over nothing, since what we have is just a calculation rule that happens to work but that we know is wrong somewhere anyway. I think that's the majority opinion on the issue. Postulating a zillion worlds based on a model known to be broken would be a tad silly. I think most physicists believe neither in collapse as in CI (beyond believing it's a trick that works) nor in many worlds, because forming either belief would be wrong.

Replies from: Will_Sawin, Kaj_Sotala, albeola
comment by Will_Sawin · 2012-06-12T04:54:15.324Z · LW(p) · GW(p)

Much more interestingly, Solomonoff probability hints that one should try really to search for something that would predict beyond probability distributions. I.e. search for objective collapse of some kind.

We face logical uncertainty here. We do not know if there is a theory of objective collapse that describes our current universe more compactly than MWI or random collapse does. I am inclined to believe the answer is "no". This issue seems very subtle, and differences on it do not seem clear enough to damn an entire organization.

because forming either belief would be wrong.

This is not really a Bayesian standard of evidence. Do you also believe that, in a Bayesian sense, it is wrong to believe those theories?

Replies from: private_messaging
comment by private_messaging · 2012-06-12T06:34:32.859Z · LW(p) · GW(p)

Bayesian sense as in Bayesian probability, or Bayesian sense as in local dianetics style stuff?

In the Bayesian sense you have to stay with your priors and not update them, because none of the 'evidence' actually links to either (humans have a general meta-faculty for saying 'I don't know' when it's pure prior). In the local dianetics-like trope, you start updating any time anyone claims their argument favours either side, whenever you come up with a vague and likely (extremely likely) incorrect handwave 'argument', or you perform other nearly-guaranteed-to-be-faulty updates, which happens when you don't consider all possible interpretations but just two, and end up treating evidence that should update something else as updating MWI. Yes, I think it is wrong to do faulty updates.

I used MWI as an example of local arguing that tends to aggravate the experts. Maybe it shouldn't damn an entire organization in your view, because MWI may be correct; but in the view of an AI researcher who is presented with a similarly faulty argument regarding AI, yes, the use of faulty argumentation is sufficient to deem SI cranks/pseudo-scientists, regardless of the truth value of the thing being argued about and regardless of his opinion on AI risk. A believer in AI danger would still deem SI to be cranks if SI argues this way.

There's other glaring errors as well: http://www.ex-parrot.com/~pete/quantum-wrong.html

edit: actually, you should re-read the MWI arguments in question. This is a good example: http://lesswrong.com/lw/qa/the_dilemma_science_or_bayes/ From this text one would deduce that EY's knowledge of Bayes, Solomonoff induction, Kolmogorov complexity, quantum mechanics, and the scientific method was much, much lower than he believed it to be. SI does the exact same thing when it makes and presents bad AI-danger arguments. As an extreme example: suppose you said that you believe in AI risk because 3+7+12=23. There's no logical connection from that formula to AI risk, and there's an arithmetical mistake in the formula. That sort of 'argument' is easy to make when you build your beliefs out of handwaves about topics you poorly understand.

comment by Kaj_Sotala · 2012-06-04T21:01:11.593Z · LW(p) · GW(p)

I don't really know Solomonoff induction or MWI on a formal level, but... If I know that the universe seems to obey rule X everywhere, and I know what my local environment is like and how applying rule X to that local environment would affect it, isn't that enough? Why would I need to include in my model a copy of the entire wavefunction that made up the universe, if having a model of my local environment is enough to predict how my local environment behaves? In other words, I don't need to spend a lot of effort selecting the subjective me, because my model is small enough to mostly only include the subjective me in the first place.

(I acknowledge that I don't know these topics well, and might just be talking nonsense.)

Replies from: private_messaging
comment by private_messaging · 2012-06-05T06:14:43.893Z · LW(p) · GW(p)

I don't really know Solomonoff induction or MWI on a formal level

You know more about it than most of the people talking about it: you know you don't know it. They don't. That is the chief difference. (I also don't know it all that well, but at least I can look at an argument that it favours something, and check whether it favours the iterator over all possible worlds even more.)

If I know that the universe seems to obey rule X everywhere, and I know what my local environment is like and how applying rule X to that local environment would affect it, isn't that enough?

Formally, there's no distinction between the rules you know and the environment. You are to construct the shortest self-contained piece of code that will predict the experiment. You will have to include any local environment data as well.

If you follow this approach to its logical end, you get the Copenhagen Interpretation in its shut-up-and-calculate form: you don't need to predict the outcomes you'll never see. So you are on the right track.

Replies from: Will_Sawin
comment by Will_Sawin · 2012-06-12T04:57:37.608Z · LW(p) · GW(p)

It doesn't take any extra code to predict all the outcomes that you'll never see, just extra space/time. But those are not the minimized quantity. In fact, predicting all the outcomes you'll never see is exactly the sort of wasteful space/time usage that programmers engage in when they want to minimize code length: it's hard to write code telling your processor to abandon certain threads of computation when they are no longer relevant.
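The space/time-versus-code-length trade-off can be made concrete with a toy branching process (a sketch for intuition only, not a claim about actual physics simulators):

```python
# Toy sketch: simulate every branch of a 3-step binary process versus
# following a single branch.  The exhaustive version is not longer in
# code -- it just does exponentially more work -- illustrating the point
# that unseen outcomes cost space/time, not description length.

def all_branches(steps: int) -> list:
    # Exhaustively generate every outcome sequence: 2**steps of them.
    outcomes = [""]
    for _ in range(steps):
        outcomes = [o + b for o in outcomes for b in "01"]
    return outcomes

def one_branch(choices: str) -> str:
    # Follow a single branch, specified bit by bit.  Note the extra
    # input -- the choices string -- which plays the role of the
    # "locating" information the rest of the thread argues about.
    path = ""
    for b in choices:
        path += b
    return path

print(len(all_branches(3)))  # 8 outcomes computed, 7 of them never observed
print(one_branch("010"))     # 010: one outcome, but 3 extra bits supplied
```

The counterpoint in the replies is visible here too: the single-branch program is only shorter if you don't charge for the `choices` argument it needs as input.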

Replies from: private_messaging
comment by private_messaging · 2012-06-12T06:27:13.898Z · LW(p) · GW(p)

You missed the point. You need code for picking the outcome that you see out of the outcomes that you didn't see, if you calculated those. It does take extra code to predict the outcome you did see if you actually calculated extra outcomes you didn't see; and then it's hard to tell which requires less code, since neither piece of code is a subset of the other, and the difference likely depends on the encoding of programs.

comment by albeola · 2012-06-04T22:38:50.524Z · LW(p) · GW(p)

The problem of locating "the subjective you" seems to me to have two parts: first, to locate a world, and second, to locate an observer in that world. For the first part, see the grandparent; the second part seems to me to be the same across interpretations.

Replies from: private_messaging
comment by private_messaging · 2012-06-05T06:02:09.883Z · LW(p) · GW(p)

The point is, the code of a theory has to produce output matching your personal subjective input. The objective view doesn't suffice (and if you drop that requirement, you are back to square one, because you can iterate over all physical theories). CI has that as part of the theory; MWI doesn't, so you need extra code.

The complexity argument for MWI that was presented doesn't favour MWI; it favours iteration over all possible physical theories, because that key requirement was omitted.

And my original point is not that MWI is false, or that MWI has higher or equal complexity. My point is that the argument is flawed. I don't care whether MWI is false or true; I am using the argument for MWI as an example of the sloppiness SI should try not to have (hopefully without this kind of sloppiness they will also be far less sure that AIs are so dangerous).

comment by D2AEFEA1 · 2012-06-04T10:28:47.334Z · LW(p) · GW(p)

Most of this seems unrelated to what the OP says. Are you sure you posted this in the right place?

Replies from: private_messaging
comment by private_messaging · 2012-06-04T16:44:27.292Z · LW(p) · GW(p)

Yup. The MWI stuff is just a good local example of how not to justify what you believe. They're doing the same with AI that Eliezer did with MWI: trying to justify things they believe for not very rational reasons using advanced concepts they poorly understand, which works only on non-experts.

comment by John_Maxwell (John_Maxwell_IV) · 2012-06-04T07:58:01.619Z · LW(p) · GW(p)

In my opinion, instead of trying to spread awareness broadly, SI should focus on persuading/learning from just a few AI researchers who are most sympathetic to its current position. Those researchers will be able to inform their position and tell them how to persuade the rest most effectively.

Replies from: private_messaging
comment by private_messaging · 2012-06-04T09:17:09.969Z · LW(p) · GW(p)

learning from just a few AI researchers who are most sympathetic to its current position

That's some very serious bias, and circular updating on cherry-picked evidence.

Actually, you know what's worst? Say you discovered that your truth-finding method shows both A and ~A. The normal reaction is to consider the truth-finding method flawed: some of the premises are contradictory, the set of axioms is flawed, the method is not rigorous enough, the understanding of the concepts is too fuzzy, etc. If I were working on an automatic proof system, or any automated reasoning really, and it generated both A and ~A depending on the order of the search, I'd know I had a bug to fix (even if it normally only outputs A). The reaction here is instead to proudly announce a refusal to check whether your method also gives ~A when you have shown it gives A, and to proudly announce that you won't give up on a method that is demonstrably flawed (normally you move on to something less flawed, like being more rigorous).

On top of this, the Dunning-Kruger effect being what it is, we should expect very irrational people to be irrational enough to believe themselves very rational. So if you claim to be very rational, there are naturally two categories, with an excluded middle: very rational and know it, and very irrational and too irrational to know it. A few mistakes incompatible with the former go a long way.

comment by khafra · 2012-06-04T19:01:35.005Z · LW(p) · GW(p)

Why wrong: if you are seeking lowest complexity description of your input, your theory needs to also locate yourself within what ever stuff it generates somehow (hence appropriate discount for something really huge like MWI). Why stupid: because if you don't require that, then the iterator through all possible physical theories is the lowest complexity 'explanation' and we're back to square 1.

Are you saying that MWI proponents need to explain why observers like themselves are more likely across the entire wavefunction than other MWI-possible observers? That's an interesting perspective, and a question I'd like to see addressed by smart people.

Replies from: private_messaging
comment by private_messaging · 2012-06-05T05:54:39.777Z · LW(p) · GW(p)

That too.

My main point, though, is that you can't dispose of the code for generating the subjective view, complete with some code for collapsing the observer (and subsequently collapsing the stuff entangled with the observer). The 'objective' viewpoint doesn't suffice. It does not suffice to output something from which an intelligent observer will figure out the rest. With Solomonoff induction you are to predict your input, not some 'objective' something, and if you drop that requirement, the whole thing falls apart. It is unclear whether the shortest subjective-experience-generating code on top of MWI will be simpler than what you have in CI, or even distinct from it.

Replies from: CarlShulman
comment by CarlShulman · 2012-06-13T23:28:18.465Z · LW(p) · GW(p)

I agree that MWI doesn't help much in explaining our sensory strings in a Solomonoff induction framework, relative to "compute the wave function, sample experiences according to some anthropic rule and weighted by squared amplitude." This argument is known somewhat widely around here; e.g. see this Less Wrong post by Paul Christiano, under "Born probabilities," and discussions of MWI and anthropic reasoning going back to the 1990s (on the everything-list, in Nick Bostrom's dissertation, etc.).

MWI would help in Solomonoff induction if there was some way of deriving the Born probabilities directly from the theory. Thus Eliezer's praise of Robin Hanson's mangled worlds idea. But at the moment there is no well-supported account of that type, as Eliezer admitted.

It's also worth distinguishing between complexity of physical laws, and anthropic penalties. Accounts of the complexity/prior of anthropic theories and measures to use in cosmology are more contested than simplicity of physical law. The Solomonoff prior implies some contested views about measure.

comment by CWG · 2012-06-04T05:50:16.248Z · LW(p) · GW(p)

I don't know how many LessWrongers knew what AGI meant. (Apparently it's artificial general intelligence, aka Strong AI.)

Replies from: wedrifid
comment by wedrifid · 2012-06-04T06:16:05.414Z · LW(p) · GW(p)

I don't know how many LessWrongers knew what AGI meant.

Greater than 90%.

comment by CWG · 2012-06-04T05:58:20.810Z · LW(p) · GW(p)

Just looking at Wikipedia: artificial general intelligence redirects to Strong AI.

I'm concerned that there's no mention of dangers, risks, or caution in the Wikipedia article.* Is there any "notable" information on the topic that could be added to the article? E.g. discussion of the subject in a publication of some kind (book or magazine/newspaper - not a self-published book).

*haven't read the whole thing - just did a search.

Replies from: Colby, John_Maxwell_IV, Colby
comment by Colby · 2012-07-05T06:26:20.544Z · LW(p) · GW(p)

See Wikilink

comment by John_Maxwell (John_Maxwell_IV) · 2012-06-04T16:42:50.347Z · LW(p) · GW(p)

http://singinst.org/upload/artificial-intelligence-risk.pdf appears to have been published in Global Catastrophic Risks, by Oxford University Press.

comment by Colby · 2012-07-05T06:23:22.113Z · LW(p) · GW(p)

I have made the addition you suggested; this is a good time for suggestions or improvements...

comment by timtyler · 2012-06-03T02:43:13.077Z · LW(p) · GW(p)

Propaganda - to encourage competitors to slow down.

However, is there a good reason to think that such propaganda would be effective?

IMO, a more obvious approach would be to go directly for public opinion.

Replies from: CWG
comment by CWG · 2012-06-04T05:38:54.485Z · LW(p) · GW(p)

Would negative public opinion do much more than (a) force such research underground, or (b) lead to researchers being more circumspect?

(Not a rhetorical question - just unsure whether focusing on public opinion is a useful approach.)

Replies from: timtyler
comment by timtyler · 2012-06-04T09:54:04.850Z · LW(p) · GW(p)

Organisations seem more likely to take advice from their customers than their competitors.

Terminator, Matrix (and soon Robopocalypse) have already had a good go at public opinion on AI, though.