Connecting Your Beliefs (a call for help)

post by lukeprog · 2011-11-20T05:18:08.193Z · LW · GW · Legacy · 73 comments

A couple weeks after meeting me, Will Newsome gave me one of the best compliments I’ve ever received. He said: “Luke seems to have two copies of the Take Ideas Seriously gene.”

What did Will mean? To take an idea seriously is “to update a belief and then accurately and completely propagate that belief update through the entire web of beliefs in which it is embedded,” as in a Bayesian belief network (see right).

Belief propagation is what happened, for example, when I first encountered that thundering paragraph from I.J. Good (1965):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an "intelligence explosion," and the intelligence of man would be left far behind... Thus the first ultraintelligent machine is the last invention that man need ever make.

Good’s paragraph ran me over like a train. Not because it was absurd, but because it was clearly true. Intelligence explosion was a direct consequence of things I already believed, I just hadn’t noticed! Humans do not automatically propagate their beliefs, so I hadn’t noticed that my worldview already implied intelligence explosion.

I spent a week looking for counterarguments, to check whether I was missing something, and then accepted intelligence explosion to be likely (so long as scientific progress continued). And though I hadn’t read Eliezer on the complexity of value, I had read David Hume and Joshua Greene. So I already understood that an arbitrary artificial intelligence would almost certainly not share our values.

Accepting my belief update about intelligence explosion, I propagated its implications throughout my web of beliefs. I realized that:

I had encountered the I.J. Good paragraph on Less Wrong, so I put my other projects on hold and spent the next month reading almost everything Eliezer had written. I also found articles by Nick Bostrom and Steve Omohundro. I began writing articles for Less Wrong and learning from the community. I applied to Singularity Institute’s Visiting Fellows program and was accepted. I quit my job in L.A., moved to Berkeley, worked my ass off, got hired, and started collecting research related to rationality and intelligence explosion.

My story surprises people because it is unusual. Human brains don’t usually propagate new beliefs so thoroughly.

But this isn’t just another post on taking ideas seriously. Will already offered some ideas on how to propagate beliefs. He also listed some ideas that most people probably aren’t taking seriously enough. My purpose here is to examine one prerequisite of successful belief propagation: actually making sure your beliefs are connected to each other in the first place.

If your beliefs aren’t connected to each other, there may be no paths along which you can propagate a new belief update.
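One way to picture this, as a rough sketch rather than a claim about how minds actually work: treat beliefs as nodes in a graph and let an update spread only along links that already exist. The belief names and the `reachable_from` helper below are made up for illustration.

```python
from collections import deque

def reachable_from(links, start):
    """Return every belief an update at `start` can reach by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        belief = queue.popleft()
        for neighbor in links.get(belief, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical web of beliefs: each key lists the beliefs it bears on.
links = {
    "machines can design machines": ["intelligence explosion is likely"],
    "intelligence explosion is likely": ["AI values matter", "what I should work on"],
    "AI values matter": ["what I should work on"],
    "my career plan": [],  # sincerely held, but linked to nothing above
}

updated = reachable_from(links, "machines can design machines")
```

In this toy graph the update reaches "what I should work on" but never touches "my career plan", because no path leads there.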

I’m not talking about the problem of free-floating beliefs that don’t control your anticipations. No, I’m talking about “proper” beliefs that require observation, can be updated by evidence, and pay rent in anticipated experiences. The trouble is that even proper beliefs can be inadequately connected to other proper beliefs inside the human mind.

I wrote this post because I'm not sure what the "making sure your beliefs are actually connected in the first place" skill looks like when broken down to the 5-second level.

I was chatting about this with atucker, who told me he noticed that successful businessmen may have this trait more often than others. But what are they doing, at the 5-second level? What are people like Eliezer and Carl doing? How does one engage in the purposeful decompartmentalization of one's own mind?

73 comments

Comments sorted by top scores.

comment by Eugine_Nier · 2011-11-20T18:44:37.100Z · LW(p) · GW(p)

Also keep in mind that it's more important to make your beliefs as correct as possible than to make them as consistent as possible. Of course the ultimate truth is both correct and consistent; however, it's perfectly possible to make your beliefs less correct by trying to make them more consistent. If you have two beliefs that do a decent job of modeling separate aspects of reality, it's probably a good idea to keep both around, even if they seem to contradict each other. For example, both General Relativity and Quantum Mechanics do a good job modeling (parts of) reality despite being inconsistent and we want to keep both of them. Now think about what happens when a similar situation arises in a field, e.g., biology, psychology, your personal life, where evidence is messier than it is in physics.

Replies from: amcknight, atucker
comment by amcknight · 2011-12-22T20:31:44.910Z · LW(p) · GW(p)

This comment, if expanded, would make for a great main post. Especially if you can come up with a second good example.

comment by atucker · 2011-11-21T05:08:09.715Z · LW(p) · GW(p)

It's also worth noting that they are actually inconsistent though, rather than just not thinking about that issue.

I think having connected beliefs is helpful for noticing that.

comment by Morendil · 2011-11-20T15:54:07.007Z · LW(p) · GW(p)

Something bothers me about this post. Querying my mind for "what bothers me about this post", the items that come up are:

  • the post starts with a pretty picture of a "web of beliefs", but is itself a long string of words
  • the picture is decorative, but certainly not illustrative; it is not a picture of your beliefs
  • the post offers some beliefs that you claim are connected, but none of the reasoning behind the claims
  • the post offers anecdotes, e.g. "successful businessmen may have this trait more often than others", but does not represent them as beliefs in a web of beliefs

The general theme appears to be "this post doesn't practice what it preaches".

If you did represent the beliefs you mention in this post in the form of a connected web, what would that look like?

(On a very few prior occasions I've tried such explicit representations, and was not motivated to keep using them.)

Replies from: thomblake
comment by thomblake · 2011-11-21T16:21:47.632Z · LW(p) · GW(p)

The general theme appears to be "this post doesn't practice what it preaches".

Indeed, I believe the point of this post was partly to solve that sort of problem. Note it is a request for solutions.

Replies from: Morendil
comment by Morendil · 2011-11-21T17:10:46.889Z · LW(p) · GW(p)

Agreed; I wasn't saying that as an objection to the underlying idea. I only wanted to state a few things that were bothering me about the post.

Quite a few of the replies were off-topic, so possibly others have been bothered by some aspects of it too. My idea is that future requests for help along the same lines could be more effective if presented in a less troublesome form.

The part where I do my best to respond to the request is the last couple lines.

comment by XiXiDu · 2011-11-20T10:35:00.257Z · LW(p) · GW(p)

I hadn’t noticed that my worldview already implied intelligence explosion.

I'd like to see a post on that worldview. The possibility of an intelligence explosion seems to be an extraordinary belief. What evidence justified a prior strong enough as to be updated on a single paragraph, written in natural language, to the extent that you would afterwards devote your whole life to that possibility?

I’m not talking about the problem of free-floating beliefs that don’t control your anticipations. No, I’m talking about “proper” beliefs that require observation, can be updated by evidence, and pay rent in anticipated experiences.

How do you anticipate your beliefs to pay rent? What kind of evidence could possibly convince you that an intelligence explosion is unlikely, how could your beliefs be surprised by data?

Replies from: Giles, lessdazed, SimonF
comment by Giles · 2011-11-20T17:28:23.685Z · LW(p) · GW(p)

What evidence justified a prior strong enough as to be updated on a single paragraph

I can't speak for lukeprog, but I believe that "update" is the wrong word to use here. If we acted like Bayesian updaters then compartmentalization wouldn't be an issue in the first place. I.J. Good's paragraph, rather than providing evidence, seems to have been more like a big sign saying "Look here! This is a place where you're not being very Bayesian!". Such a trigger doesn't need to be written in any kind of formal language - it could have been an offhand comment someone made on a completely different subject. It's simply that (to an honest mind), once attention is drawn to an inconsistency in your own logic, you can't turn back.

That said, lukeprog hasn't actually explained why his existing beliefs strongly implied an intelligence explosion. That wasn't the point of this post, but, like you, I'd very much like to see a post that does explain it. I'm interested in trying to build a Bayesian case for or against the intelligence explosion (and other singularity-ish outcomes).

You're right that there's a problem obtaining evidence for or against beliefs about the future. I can think of three approaches:

  • Expertology - seeing what kinds of predictions have been made by experts in the past, and what factors are correlated with them being right.
  • Models - build a number of parameterizable models of the world which are (necessarily) simplified but which are at least capable of modeling the outcome you're interested in. Give a prior for each and then do a Bayesian update according to how well that model predicts the past.
  • There might be an intuitive heuristic along the lines of "if you can't rule it out then it might happen", but I don't know how to formalize that or make it quantitative.

So I'm interested in whether these can be done without introducing horrible biases, whether anyone's tried them before or whether there are any other approaches I've missed out.
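To make the "Models" approach concrete, here is a minimal sketch of the update step. The model names, priors, and likelihoods are invented for illustration, and `update_model_weights` is just a hypothetical helper name; the hard part, estimating how well each model predicts the past, is assumed away.

```python
def update_model_weights(priors, likelihoods):
    """Bayes' rule over a set of models: posterior is proportional to
    prior times the likelihood that model assigns to the observed past."""
    unnormalized = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}

# Hypothetical priors and likelihoods P(observed history | model).
priors = {"steady growth": 0.5, "accelerating growth": 0.3, "stagnation": 0.2}
likelihoods = {"steady growth": 0.02, "accelerating growth": 0.05, "stagnation": 0.001}

posteriors = update_model_weights(priors, likelihoods)
```

With these made-up numbers, "accelerating growth" overtakes "steady growth" despite its lower prior, because it assigns more probability to the observed history.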

Replies from: XiXiDu
comment by XiXiDu · 2011-11-20T18:54:20.247Z · LW(p) · GW(p)

Thanks, I have yet to learn the relevant math. I only have a very vague idea about Bayesian probability at this point so my use of "update" might very well be wrong, as you indicated.

There are some things I am particularly confused about when it comes to probability. I am looking forward to the time when I start learning probability theory (currently doing Calculus).

Just two examples:

I don't understand why you can't just ignore some possible outcomes. We are computationally limited agents after all. Thus if someone comes along and tells me about a certain possibility that involves a huge amount of utility but doesn't provide enough (how much would be enough anyway?) evidence, my intuition is to just ignore that possibility rather than to assign 50% probability to it. After all, if humanity was forced to throw a fair coin and heads would imply doom then I'd be in favor of doing everything to ensure that the coin doesn't land heads (e.g. a research project that figures out how to throw it in a certain way that maximizes the probability of it landing tails). But that would be crazy because anyone could come along and make unjustified predictions that involve huge utility stakes and I would have to ignore all lower utility possibilities even if there is a lot of empirical evidence in favor of them.

Something else I am confused about is how probability is grounded. What rules are used to decide how much certain evidence can influence my probability estimates? I mean, if using probability theory only shifts my use of intuition towards the intuitive assignment of numerical probability estimates of evidence, in favor of a possibility, then it might as well distort my estimates more than trusting my intuition about the possibility alone. Because 1.) if using my intuition to decide how much each piece of evidence changes the overall probability, I can be wrong on each occasion rather than just one (accumulation of error) 2.) humans are really bad with numbers, and forcing them to put a number on something vague might in some cases cause them to over or underestimate their confidence by many orders of magnitude.

Replies from: Giles, dlthomas
comment by Giles · 2011-11-21T00:29:17.077Z · LW(p) · GW(p)

I only have a very vague idea about Bayesian probability at this point so my use of "update" might very well be wrong

I think most people use "update" colloquially, i.e. something along the lines of "what you just said appears to constitute evidence that I wasn't previously aware of, and I should change my beliefs accordingly". I don't know how often rationalists actually plug these things into a formula.

I don't understand why you can't just ignore some possible outcomes.

This is the problem of Pascal's Mugging - I think it's something that people are still confused about. In general, if someone tells you about a really weird possibility you should assign it a probability of a lot less than 50%, as it would essentially be a conjunction of a lot of unlikely events. The problem is that the utility might be so high (or low) that when you multiply it by this tiny probability you still get something huge.
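A toy calculation shows the shape of the problem. The probabilities and utilities below are arbitrary numbers picked for illustration, not estimates of anything.

```python
def expected_utility(probability, utility):
    """Expected utility of a single outcome: probability times payoff."""
    return probability * utility

# Hypothetical numbers chosen only to show the shape of the problem.
mundane = expected_utility(probability=0.9, utility=100)
mugging = expected_utility(probability=1e-20, utility=1e30)

# The mugger's offer dominates even though it is almost certainly false.
mugging_wins = mugging > mundane
```

However small you make the probability, the mugger can always name a payoff large enough to swamp it, which is why naive multiplication feels broken here.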

I'm still waiting for an answer to that one, but in the meantime it seems worth attacking the problems that lie in the grey area between "easily tractable expected utility calculations" and "classic Pascal's mugging". For me, AI risk mitigation is still in that grey area.

Replies from: lessdazed
comment by lessdazed · 2011-11-21T00:59:45.743Z · LW(p) · GW(p)

The problem is that the utility might be so high (or low) that when you multiply it by this tiny probability you still get something huge.

Don't worry about it; if you decline a Pascal's mugging I'll cause positive utility equal to twice the amount of negative utility you were threatened with, and if you accept one I'll cause negative utility equal to twice what you were threatened with.

Trust me.

Replies from: orthonormal, endoself
comment by orthonormal · 2011-11-25T18:15:57.685Z · LW(p) · GW(p)

Excuse me, I'm going to go and use this on all my friends.

comment by endoself · 2011-11-21T02:23:50.603Z · LW(p) · GW(p)

Why did I never think of this? I mean I have thought of very similar things in a thought experiment sense and I've even used it to explain to people that paying the mugging cannot be something correct but unpalatable, but it never occurred to me to use it on someone.

comment by dlthomas · 2011-11-20T19:41:15.145Z · LW(p) · GW(p)

What rules are used to decide how much certain evidence can influence my probability estimates?

Bayes' Theorem is precisely that rule.

Replies from: XiXiDu
comment by XiXiDu · 2011-11-20T20:41:25.933Z · LW(p) · GW(p)

Bayes' Theorem is precisely that rule.

That's not what I meant, I have been too vague. It is clear to me how to update on evidence given concrete data or goats behind doors in game shows. What I meant is how one could possibly update on evidence like the victory of IBM Watson at Jeopardy regarding risks of AI. It seems to me that assigning numerical probability estimates to such evidence, that is then used to update on the overall probability of risks from AI, is a very shaky affair that might distort the end result as one just shifted the use of intuition towards the interpretation of evidence in favor of an outcome rather than the outcome itself.

Replies from: None
comment by [deleted] · 2011-11-20T21:49:55.078Z · LW(p) · GW(p)

Causal analysis is probably closer to what you're looking for. It displays stability under (small) perturbation of relative probabilities, and it's probably closer to what humans do under the hood than Bayes' theorem. Pearl often observes that humans work with cause and effect with more facility than numerical probabilities.

Replies from: pnrjulius
comment by pnrjulius · 2012-06-05T16:08:37.298Z · LW(p) · GW(p)

Numerical stability is definitely something we need in our epistemology. If small errors make the whole thing blow up, it's not any good to us, because we know we make small errors all the time.

comment by lessdazed · 2011-11-21T00:21:11.312Z · LW(p) · GW(p)

What kind of evidence could possibly convince you that an intelligence explosion is unlikely, how could your beliefs be surprised by data?

There is no reason to believe intelligence stops being useful for problem solving as one gets more of it, but I can easily imagine evidence that would suggest that.

A non-AI intelligence above human level, such as a human with computer processors integrated into his or her brain, a biologically enhanced human, etc. might prove to be no more usefully intelligent than Newton, Von Neumann, etc. despite being an order of magnitude smarter by practical measures.

to the extent that you would afterwards devote your whole life to that possibility?

Leveraging makes little sense according to many reasonable utility functions. If one is guessing the color of random cards, and 70% of the cards are red, and 30% are blue, and red and blue pay out equally, one should always guess red each turn.
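The card example can be checked with a few lines of arithmetic. This is a sketch under the stated assumptions (independent draws, equal payouts); `expected_accuracy` is a name made up for the illustration.

```python
def expected_accuracy(p_red, guess_red_rate):
    """Chance of a correct guess when red turns up with probability p_red
    and the guesser independently says 'red' with probability guess_red_rate."""
    return p_red * guess_red_rate + (1 - p_red) * (1 - guess_red_rate)

# Always guessing red versus "diversifying" guesses to match the 70/30 split.
always_red = expected_accuracy(p_red=0.7, guess_red_rate=1.0)
matching = expected_accuracy(p_red=0.7, guess_red_rate=0.7)
```

Always guessing red is right about 70% of the time, while spreading guesses 70/30 is right only about 58% of the time (0.7 × 0.7 + 0.3 × 0.3).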

Where utility is related to the ln of money, it makes sense to diversify, but that is different from a life-impact sort of case where one seeks to maximize the utility of others.

The outside view is to avoid total commitment lest one be sucked into a happy death spiral and suffer from the sunk costs fallacy, but if those and similar fallacies can be avoided, total commitment makes sense.

comment by Simon Fischer (SimonF) · 2011-11-20T18:34:17.593Z · LW(p) · GW(p)

The possibility of an intelligence explosion seems to be an extraordinary belief.

Extraordinary compared to what? We already know that most people are insane, so that belief not being shared by almost everybody doesn't make it unlikely a priori. In some ways the intelligence explosion is a straightforward extrapolation of what we know at the moment, so I don't think your criticism is valid here.

What evidence justified a prior strong enough as to be updated on a single paragraph, written in natural language, to the extent that you would afterwards devote your whole life to that possibility?

I think one could tell a reasonably competent physicist 50 years prior to Schrödinger how to derive quantum mechanics in one paragraph of natural language. Human language can contain lots of information, especially if speaker and listener already share a lot of concepts.

I'm not sure why you've written your comment, are you just using the opportunity to bring up this old topic again? I find myself irritated by this, even though I probably agree with you :)

Replies from: XiXiDu
comment by XiXiDu · 2011-11-20T19:42:08.217Z · LW(p) · GW(p)

For me there are a lot of things that sound completely sane but might be just bunk. Antimatter weapons, grey goo, the creation of planet eating black holes by particle accelerators, aliens or string theory. I don't have the necessary education to distinguish those ideas from an intelligence explosion. They all sound like possibilities to me that might or might not be true. All I can do is to recognize their extraordinary status and subsequently demand the peer-review and assessment of those ideas by a larger and educated audience. Otherwise I run the risk of being swayed by the huge utility associated with those ideas, I run the risk of falling prey to a Pascalian mugger.

I think one could tell a reasonably competent physicist 50 years prior to Schrödinger how to derive quantum mechanics in one paragraph of natural language.

Is Luke Muehlhauser that competent when it comes to all the fields associated with artificial intelligence research?

I'm not sure why you've written your comment, are you just using the opportunity to bring up this old topic again?

It's not old, it becomes more relevant each day. Since I first voiced skepticism about the topic they expanded to the point of having world-wide meetings. At least a few people playing devil's advocate is a healthy exercise in my opinion :-)

Replies from: AlexSchell, Nisan
comment by AlexSchell · 2011-11-20T20:31:31.709Z · LW(p) · GW(p)

Are you playing devil's advocate?

comment by Nisan · 2011-11-21T18:11:47.547Z · LW(p) · GW(p)

Antimatter weapons, grey goo, the creation of planet eating black holes by particle accelerators, aliens or string theory.

For what it's worth, I consider these things to have very different levels and kinds of implausibility.

comment by Zetetic · 2011-11-20T22:48:33.472Z · LW(p) · GW(p)

This is interesting to me in a sort of tangential way. It seems like studying philosophy exercises this tendency to propagate your beliefs in order to make them coherent. In fact logical belief propagation seems to embody a large aspect of traditional philosophy, so I would expect that on average someone who studies philosophy would have this tendency to a greater degree than someone who doesn't.

It would be interesting to me if anyone has seen any data related to this, because it feels intuitively true that studying philosophy changed my way of thinking, but it's of course difficult to pinpoint exactly how. This seems like a big part of it.

Replies from: thomblake, jmmcd
comment by thomblake · 2011-11-21T16:31:42.327Z · LW(p) · GW(p)

This seems right to me. I know a fair number of philosophers and at least some varieties of them seem to naturally disfavor compartmentalization.

For example, I was originally an atheist for purely methodological reasons. Religion invites dead dogma, and dead dogma kills your ability to discover true beliefs (for exactly the same reason "0 and 1 are not probabilities"). I felt therefore that adhering to a religion while being a philosopher was professionally irresponsible.

comment by jmmcd · 2011-11-22T01:30:01.246Z · LW(p) · GW(p)

I agree, but more generally, philosophy takes more or less as an axiom the idea that a person's beliefs should not be self-contradictory. Many people operate without any such axiom, or (more importantly) without awareness that such an idea exists. Probably any type of scientific training contributes to raising consciousness from this very low base level.

Replies from: Zetetic
comment by Zetetic · 2011-11-22T02:45:43.982Z · LW(p) · GW(p)

With that view, another issue is to what extent philosophy and scientific training are more attractive to people that have a tendency to avoid compartmentalization and to what extent studying philosophy and science assists in/amplifies the ability to propagate beliefs, if at all.

It seems like the sort of study that goes into these areas of study would provide the learner with heuristics for belief propagation, though each area might come equipped with unique heuristics.

comment by tetsuo55 · 2011-11-20T14:04:10.671Z · LW(p) · GW(p)

This post made me realize just how important it is to completely integrate the new things you learn.

I have been reading a lot of books and blogs on the subject of students that finish school with honors, but don't seem to work very hard while doing so. I also met one of those people in person (he finished an entire 4 year curriculum with honors in just 3 months and is now a professor of that content)

It all boils down to the same thing: Whatever the textbook is trying to tell you, make sure you integrate that in your life. Only then will you see if you really understood what it was saying and if you are missing any extra information, or if the information in the book is wrong. Once integrated you do not need any extra studying to get an A/10 for the exam (because you will have recursively updated all your beliefs to include the thing you were supposed to learn).

Some of these books and blogs go into detail on how to do this. One of the methods i read was making a doodle of the idea in your notebook. This doodle borrows heavily from your current state of knowledge. An example of what I did: To model the process of taking a raw resource and making it into a profitable end product i drew a mine with rocks coming out, then a table with a chisel on the rock and finally a diamond with a price-tag. I know how diamonds are made so i could use that to represent this process.

There are many more methods; another that i have not yet tried to use is basically making a flashcard: Question/Evidence/Conclusion http://calnewport.com/blog/2009/04/06/4-weeks-to-a-40-streamline-your-notes/

EDIT: I'm having a hard time explaining what i am trying to say, i will post a new comment or top level post if i manage to figure it out. Basically I'm trying to say that there are already well-working, documented methods for connecting and updating beliefs in the world of outlier student research.

Replies from: Technoguyrob, None
comment by robertzk (Technoguyrob) · 2011-11-22T01:04:46.703Z · LW(p) · GW(p)

I am one such person. I finished college at the age of 16, and I knew I was merely very good at guessing the teacher's password. People's remarks about my intelligence were dismissed by me internally, because I was aware of my own ignorance.

However, what you say can be difficult to apply in practice during a semester. I see formal education as a method for gathering data, which you can use for Bayesian propagation after the fact. This is why it can feel like you learn much more thinking between semesters, rather than during.

Your notion of necessity of integration is uncorrelated to outlier students. Given an outlier student, I would be surprised if active integration of textbook data was lower than given a non-outlier student. In both cases this conditional probability is, sadly, very small.

Replies from: tetsuo55
comment by tetsuo55 · 2011-11-22T12:14:50.888Z · LW(p) · GW(p)

I too am very good at guessing the teacher's password in addition to really learning the textbook contents. I am talking specifically about those students that do not use guessing the teacher's password as a way to finish with honors. I always do the propagation during the learning itself and improve upon it after the fact (i'll suddenly realize that something is off or changed days later)

I said i had a hard time explaining it and your comment makes extra clear that i failed. I will use your feedback to improve the text i have in mind.

comment by [deleted] · 2011-11-20T19:33:43.331Z · LW(p) · GW(p)

EDIT: I'm having a hard time explaining what i am trying to say, i will post a new comment or top level post if i manage to figure it out. Basically I'm trying to say that there are already well-working, documented methods for connecting and updating beliefs in the world of outlier student research.

I would find such a post very useful.

Replies from: tetsuo55, Metus
comment by tetsuo55 · 2011-11-22T14:33:29.897Z · LW(p) · GW(p)

This video tries to explain what i mean, i hope the inferential distance is not too far

http://www.youtube.com/watch?feature=player_embedded&v=1F3vmNeyOvU

Replies from: None
comment by [deleted] · 2011-11-22T15:37:15.349Z · LW(p) · GW(p)

I have stumbled on links to his books and blogs; many on the IRC channel were rather sceptical of the usefulness of his advice. My own prior was rather low.

Nevertheless I would very much like LWers to share their impressions on this, since there is something there that looks almost like it could work.

comment by Metus · 2011-11-20T21:23:47.745Z · LW(p) · GW(p)

I would also be interested and upvoted both this post and the parent for encouragement.

Replies from: tetsuo55
comment by tetsuo55 · 2011-11-22T14:33:35.890Z · LW(p) · GW(p)

I found a video that explains what i mean at a very basic level http://www.youtube.com/watch?feature=player_embedded&v=1F3vmNeyOvU

comment by antigonus · 2011-11-20T11:42:01.390Z · LW(p) · GW(p)

I spent a week looking for counterarguments, to check whether I was missing something

What did you find? Had you missed anything?

comment by TheOtherDave · 2011-11-20T21:08:50.068Z · LW(p) · GW(p)

Part of my day job involves doing design reviews for alterations to relatively complex software systems, often systems that I don't actually understand all that well to begin with. Mostly, what I catch are the integration failures; places where an assumption in one module doesn't line up quite right with the end-to-end system flow, or where an expected capability isn't actually being supported by the system design, etc.

Which isn't quite the same thing as what you're talking about, but has some similarities; being able to push through from an understanding of each piece of the system to an understanding of the expected implications of that idea across the system as a whole.

But thinking about it now, I'm not really sure what that involves, certainly not at the 5-second level.

A few not-fully-formed thoughts: Don't get distracted by details. Look at each piece, figure out its center, draw lines from center to center. Develop a schematic understanding before I try to understand the system in its entirety. If it's too complicated to understand that way, back out and come up with a different way of decomposing the system into "pieces" that results in fewer pieces. Repeat until I have a big-picture skeleton view. Then go back to the beginning, and look at every detail, and for each detail explain how it connects to that skeleton. Not just understand, but explain: write it down, draw a picture, talk it through with someone. That includes initial requirements: explain what each requirement means in terms of that skeleton, and for each such requirement-thread find a matching design-thread.

So, of course someone's going to ask for a concrete example, and I can't think of how to be concrete without actually working through a design review in tedious detail, which I really don't feel like doing. So I recognize that the above isn't really all that useful in and of itself, but maybe it's a place to start.

Replies from: thomblake
comment by thomblake · 2011-11-21T16:27:21.187Z · LW(p) · GW(p)

That's an interesting metaphor.

I wonder if it doesn't actually support compartmentalization. In software, you don't want a lot of different links between the internals of modules, and so it seems you might not want lots of links between different belief clusters. Just make sure they're connected with a clean API.

Of course, that's not really compartmentalization, that's just not drawing any more arrows on your Bayes net than you need to. If your entire religious belief network really hangs on one empirical claim, that might be a sufficient connection between it and the rest of your beliefs, at least for efficiency.

Replies from: TheOtherDave
comment by TheOtherDave · 2011-11-21T18:03:04.853Z · LW(p) · GW(p)

I agree that this approach likely works well for software precisely because software tends to be built modularly to begin with... it probably would work far less well for analyzing evolved rather than constructed systems.

Replies from: pnrjulius
comment by pnrjulius · 2012-06-05T16:02:24.217Z · LW(p) · GW(p)

I wouldn't be so sure. Tooby and Cosmides keep finding evidence that our brains are organized to some degree in terms of adaptive modules, like "cheater detection" and "status seeking" which work spectacularly well in the domain for which they evolved, and then fail miserably when expanded to other domains. If we had a "modus tollens" module or a "Bayesian update" module, we'd be a lot smarter than we are; but apparently we don't, because most people are atrociously bad at modus tollens and Bayesian updating, but remarkably good at cheater detection and status seeking.

(If you haven't seen it already, look up the Wason task experiments. They clearly show that human beings fail miserably at modus tollens---unless it's formulated in terms of cheater detection, in which case we are almost perfect.)

Replies from: TheOtherDave
comment by TheOtherDave · 2012-06-05T16:25:40.256Z · LW(p) · GW(p)

Yup, I'm familiar with the cheater stuff, and I agree that the brain has subsystems which work better in some domains than others.

The thing about a well-designed modular architecture, though, is not just that it has task-optimized subsystems, but that those subsystems are isolated from one another and communicate through interfaces that support treating them more or less independently. That's what makes the kind of compartmentalization thomblake is talking about feasible.

If, instead, I have a bunch of subsystems that share each other's code and data structures, I may still be able to identify "modules" that perform certain functions, but if I try to analyze (let alone optimize or upgrade) those modules independently I will quickly get bogged down in interactions that require me to understand the whole system.

comment by daenerys · 2011-11-20T18:12:37.663Z · LW(p) · GW(p)

This reminds me of a blog post by Julia Galef I read a couple weeks ago: The Penrose Triangle of Belief.

Replies from: Nisan
comment by Nisan · 2011-11-21T18:05:24.841Z · LW(p) · GW(p)

Before I clicked on the link I thought it was going to be about this.

comment by rwallace · 2011-11-20T07:10:15.927Z · LW(p) · GW(p)

That's actually a good question. Let me rephrase it to something hopefully clearer:

Compartmentalization is an essential safety mechanism in the human mind; it prevents erroneous far mode beliefs (which we all adopt from time to time) from having disastrous consequences. A man believes he'll go to heaven when he dies. Suicide is prohibited in a patch for the obvious problem, but there's no requirement to make an all-out proactive effort to stay alive. Yet when he gets pneumonia, he gets a prescription for penicillin. Compartmentalization literally saves his life. In some cases many other lives, as we saw when it failed on 9/11.

Here we have a case study where a man of intelligence and goodwill redirected his entire life down a path of negative utility on the basis of reading a single paragraph of sloppy wishful thinking backed up by no evidence whatsoever. (The most straightforward refutation of that paragraph is that creating a machine with even a noteworthy fraction of human intelligence is far beyond the capacity of any human mind; the relevant comparison of such a machine if built would be with that which created it, which would have to be a symbiosis of humanity and its technology as a whole - with that symbiosis necessarily being much more advanced than anything we have today.) What went wrong?

The most obvious part of the answer is that this is an error to which we geeks are particularly prone. (Supporting data: terrorists are disproportionately likely to be trained in some branch of engineering.) Why? Well, we are used to dealing in domains where we can actually apply long chains of logic with success; particularly in the age range when we are old enough to have forgotten how fallible were our first attempts at such logic, yet young enough to be still optimists, it's an obvious trap to fall into.

Yet most geeks do actually manage to stay out of the trap. What else goes wrong?

It seems to me that there must be a parameter in the human mind for grasping the inertia of the world, for understanding at a gut level how much easier is concept than reality, that we can think in five minutes of ideas that the labor of a million people for a thousand years cannot realize. I suppose in some individuals this parameter must be turned up too high, and they fall too easily into the trap of learned helplessness. And in some it must be turned too low, and those of us for whom this is the case undertake wild projects with little chance of success; and if ninety-nine fail for every one who succeeds, that can yet drive the ratchet of progress.

But we easily forget that progress is not really a ratchet, and the more advanced our communications, the more lethal bad ideas become, for just as our transport networks spread disease like the 1918 flu epidemic which killed more people in a single year than the First World War killed in four years, so our communication networks spread parasite memes deadlier still. And we can't shut down the networks. We need them too badly.

I've seen the Singularity mutate from a harmless, even inspiring fantasy, to a parasite meme that I suspect could well snuff out the entire future of intelligent life. It's proving itself in many cases immune to any weight of evidence against it; perhaps worst of all, it bypasses ethical defenses, for it can be spread by people of honest goodwill.

Compartmentalization seems to be the primary remaining defense. When that fails, what have we left? This is not a rhetorical question; it may be one of the most important in the world right now.

Replies from: drethelin, marchdown, Morendil
comment by drethelin · 2011-11-20T07:36:45.935Z · LW(p) · GW(p)

Compartmentalization may make ridiculous far beliefs have less of an impact on the world, but it also allows those beliefs to exist in the first place. If your beliefs about religion depended on the same sort of evidence that underpins your beliefs about whether your car is running, then you could no more be convinced of religion than you could be convinced by a mechanic that your car "works" even though it does not start.

Replies from: rwallace
comment by rwallace · 2011-11-20T07:40:42.433Z · LW(p) · GW(p)

So your suggestion is that we should de-compartmentalize, but in the reverse direction to that suggested by the OP, i.e. instead of propagating forward from ridiculous far beliefs, become better at back-propagating and deleting same? There is certainly merit in that suggestion if it can be accomplished. Any thoughts on how?

Replies from: drethelin
comment by drethelin · 2011-11-20T07:48:13.058Z · LW(p) · GW(p)

You don't understand. Decompartmentalization doesn't have a direction. You don't go forwards towards a belief or backwards from a belief, or whatever. If your beliefs are decompartmentalized that means that the things you believe will impact your other beliefs reliably. This means that you don't get to CHOOSE what you believe. If you think the singularity is all important and worth working for, it's BECAUSE all of your beliefs align that way, not because you've forced your mind to align itself with that belief after having it.

Replies from: rwallace
comment by rwallace · 2011-11-20T07:57:33.402Z · LW(p) · GW(p)

I understand perfectly well how a hypothetical perfectly logical system would work (leaving aside issues of computational tractability etc.). But then, such a hypothetical perfectly logical system wouldn't entertain such far mode beliefs in the first place. What I'm discussing is the human mind, and the failure modes it actually exhibits.

comment by marchdown · 2011-11-20T09:12:23.441Z · LW(p) · GW(p)

What is that evidence against singularity which you're alluding to?

Replies from: rwallace
comment by rwallace · 2011-11-20T10:00:07.935Z · LW(p) · GW(p)

I discuss some of it at length here: http://lesswrong.com/lw/312/the_curve_of_capability/

I'll also ask the converse question: given that you can't typically prove a negative (I can't prove the nonexistence of psychic powers or flying saucers either), if what we are observing doesn't constitute evidence against the Singularity in your opinion, then what would?

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2011-11-20T13:43:34.708Z · LW(p) · GW(p)

if what we are observing doesn't constitute evidence against the Singularity in your opinion, then what would?

I'm not marchdown, but:

Estimating the probability of a Singularity requires looking at various possible advantages of digital minds and asking what would constitute evidence against such advantages being possible. Some possibilities:

  • Superior processing power. Evidence against would be the human brain already being close to the physical limits of what is possible.
  • Superior serial power: Evidence against would be an inability to increase the serial power of computers anymore.
  • Superior parallel power: Evidence against would be an indication of extra parallel power not being useful for a mind that already has human-equivalent (whatever that means) parallel power.
  • Improved algorithms: Evidence against would be the human brain's algorithms already being perfectly optimized and with no further room for improvement.
  • Designing new mental modules: Evidence against would be evidence that the human brain's existing mental modules are already sufficient for any cognitive task with any real-world relevance.
  • Modifiable motivation systems: Evidence against would be evidence that humans are already optimal at motivating themselves to work on important tasks, that realistic techniques could be developed to make humans optimal in this sense, or that having a great number of minds without any akrasia issues would have no major advantage over humans.
  • Copyability: Evidence against would be evidence that minds cannot be effectively copied, maybe because there won't be enough computing power to run many copies. Alternatively, that copying minds would result in rapidly declining marginal returns and that the various copying advantages discussed by e.g. Hanson and Shulman aren't as big as they seem.
  • Perfect co-operation: Evidence against would be that no minds can co-operate better than humans do, or at least not to such an extent that they'd receive a major advantage. Also, evidence of realistic techniques bringing humans to this level of co-operation.
  • Superior communication: Evidence against would be that no minds can communicate better than humans do, or at least not to such an extent that they'd receive a major advantage. Also, evidence of realistic techniques bringing humans to this level of communication.
  • Transfer of skills: Evidence against would be that no minds can teach better than humans do, or at least not to such an extent that they'd receive a major advantage. Also, evidence of realistic techniques bringing humans to this level of skill transfer.
  • Various biases: Evidence against would either be that human cognitive biases are not actually major ones, or that no mind architecture could overcome them. Also, evidence that humans actually have a realistic chance of overcoming most biases.

Depending on how you define "the Singularity", some of these may be irrelevant. Personally, I think the most important aspect of the Singularity is whether minds drastically different from humans will eventually take over, and how rapid the transition could be. Excluding the possibility of a rapid takeover would require at least strong evidence against gains from increased serial power, increased parallel power, improved algorithms, new mental modules, copyability, and transfer of skills. That seems quite hard to come by, especially once you take into account the fact that it's not enough to show that e.g. current trends in hardware development show mostly increases in parallel instead of serial power - to refute the gains from increased serial power, you'd also have to show that this is indeed some deep physical limit which cannot be overcome.

Replies from: rwallace, XiXiDu
comment by rwallace · 2011-11-20T14:53:23.960Z · LW(p) · GW(p)

Okay, to look at some of the specifics:

Superior processing power. Evidence against would be the human brain already being close to the physical limits of what is possible.

The linked article is amusing but misleading; the described 'ultimate laptop' would essentially be a nuclear explosion. The relevant physical limit is ln(2)kT energy dissipated per bit erased; in SI units at room temperature this is about 3e-21 joules. We don't know exactly how much computation the human brain performs; middle-of-the-road estimates put it in the ballpark of 1e18 several-bit operations per second for 20 watts, which is not very many orders of magnitude short of even the theoretical limit imposed by thermodynamics, let alone whatever practical limits may arise once we take into account issues like error correction, communication latency and bandwidth, and the need for reprogrammability.
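As a rough check on that arithmetic, here is a minimal sketch. It assumes room temperature is 300 K and takes the 20 watt and 1e18 operations per second figures above as given; the last line also assumes, purely for illustration, that each operation erases a single bit, so "several-bit" operations would leave less headroom than it reports. For reference, ln(2)kT at 300 K is a little under 3e-21 joules, while kT alone is about 4.1e-21 joules.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, exact by SI definition
ROOM_TEMP_K = 300.0         # assumed room temperature
BRAIN_POWER_W = 20.0        # power figure quoted in the comment
BRAIN_OPS_PER_S = 1e18      # middle-of-the-road estimate quoted in the comment


def landauer_limit_joules(temp_k):
    """Minimum energy dissipated per bit erased: ln(2) * k * T."""
    return math.log(2) * BOLTZMANN_K * temp_k


energy_per_bit = landauer_limit_joules(ROOM_TEMP_K)

# Upper bound on irreversible bit erasures per second on a 20 W budget.
max_erasures_per_s = BRAIN_POWER_W / energy_per_bit

# Illustrative assumption: one bit erased per operation.
headroom_orders = math.log10(max_erasures_per_s / BRAIN_OPS_PER_S)

print(f"{energy_per_bit:.2e} J per bit erased")
print(f"{max_erasures_per_s:.2e} bit erasures per second at 20 W")
print(f"{headroom_orders:.1f} orders of magnitude of headroom at one bit per operation")
```

Under these assumptions the gap is a handful of orders of magnitude, before counting any of the practical overheads listed above.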

Superior serial power: Evidence against would be an inability to increase the serial power of computers anymore.

Indeed we hit this some years ago. Of course as you observe, it is impossible to prove serial speed won't start increasing again in the future; that's inherent in the problem of proving a negative. If such proof is required, then no sequence of observations whatsoever could possibly count as evidence against the Singularity.

Superior parallel power:

Of course uses can always be found for more parallel power. That's why we humans make use of it all the time, both by assigning multiple humans to a task, and increasingly by placing multiple CPU cores at the disposal of individual humans.

Improved algorithms:

Finding these is (assuming P!=NP) intrinsically difficult; humans and computers can both do it, but neither will ever be able to do it easily.

Designing new mental modules:

As for improved algorithms.

Modifiable motivation systems:

An advantage when they reduce akrasia, a disadvantage when they make you more vulnerable to wireheading.

Copyability: Evidence against would be evidence that minds cannot be effectively copied, maybe because there won't be enough computing power to run many copies.

Indeed there won't, at least initially; supercomputers don't grow on trees. Of course, computing power tends to become cheaper over time, but that does take time, so no support for hard takeoff here.

Alternatively, that copying minds would result in rapidly declining marginal returns and that the various copying advantages discussed by e.g. Hanson and Shulman aren't as big as they seem.

Matt Mahoney argues that this will indeed happen because an irreducible fraction of the knowledge of how to do a job is specific to that job.

Perfect co-operation:

Some of the more interesting AI work has been on using a virtual market economy to allocate resources between different modules within an AI program, which suggests computers and humans will be on the same playing field.

Superior communication:

Empirically, progress in communication technology between humans outpaces progress in AI, and has done so for as long as digital computers have existed.

Transfer of skills:

Addressed under copyability.

Various biases:

Hard to say, both because it's very hard to see our own biases, and because a bias that's adaptive in one situation may be maladaptive in another. But if we believe maladaptive biases run deep, such that we cannot shake them off with any confidence, then we should be all the more skeptical of our far beliefs, which are the most susceptible to bias.

Of course, there is also the fact that humans can and do tap the advantages of digital computers, both by running software on them, and in the long run potentially by uploading to digital substrate.

Replies from: Giles, lessdazed
comment by Giles · 2011-11-20T17:44:12.023Z · LW(p) · GW(p)

we should be all the more skeptical of our far beliefs, which are the most susceptible to bias.

Just out of interest... assume my far beliefs take the form of a probability distribution of possible future outcomes. How can I be "skeptical" of that? Given that something will happen in the future, all I can do is update in the direction of a different probability distribution.

In other words, which direction am I likely to be biased in?

Replies from: Eugine_Nier, rwallace
comment by Eugine_Nier · 2011-11-20T18:31:12.070Z · LW(p) · GW(p)

In other words, which direction am I likely to be biased in?

In the direction of overconfidence, i.e., assigning too much probability mass to your highest probability theory.

comment by rwallace · 2011-11-20T20:25:51.875Z · LW(p) · GW(p)

We should update away from beliefs that the future will resemble a story, particularly a story whose primary danger will be fought by superheroes (most particularly for those of us who would personally be among the superheroes!) and towards beliefs that the future will resemble the past and the primary dangers will be drearily mundane.

Replies from: Nornagest
comment by Nornagest · 2011-11-20T21:17:57.623Z · LW(p) · GW(p)

The future will certainly resemble a story -- or, more accurately, will be capable of being placed into several plausible narrative frames, just as the past has. The bias you're probably trying to point to is in interpreting any particular plausible story as evidence for its individual components -- or, for that matter, against.

The conjunction fallacy implies that any particular vision of a Singularity-like outcome is less likely than our untrained intuitions would lead us to believe. It's an excellent reason to be skeptical of any highly derived theories of the future -- the specifics of Ray Kurzweil's singularity timeline, for example, or Robin Hanson's Malthusian emverse. But I don't think it's a good reason to be skeptical of any of the dominant singularity models in general form. Those don't work back from a compelling image to first principles; most of them don't even present specific consequential predictions, for fairly straightforward reasons. All the complexity is right there on the surface, and attempts to narrativize it inevitably run up against limits of imagination. (As evidence, the strong Singularity has been fairly poor at producing fiction when compared to most future histories of comparable generality; there's no equivalent of Heinlein writing stories about nuclear-powered space colonization, although there's quite a volume of stories about weak or partial singularities.)
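To make the conjunction point concrete, here is a small sketch. The step probabilities are made up for illustration, and each is meant as the probability of that step given the ones before it, so the product is the probability of the whole detailed story.

```python
import math

# Hypothetical conditional probabilities for each step of a detailed
# future scenario, each given that the earlier steps happened.
step_probs = [0.9, 0.8, 0.8, 0.7, 0.7]

# Probability that every step of the story holds at once.
story_prob = math.prod(step_probs)

print(f"least likely single step: {min(step_probs):.2f}")
print(f"whole story:              {story_prob:.3f}")
```

Every step looks plausible on its own, yet the full conjunction can never be more probable than its least likely step, and here it is well below it. A general-form model makes fewer such commitments, which is why it is penalized less.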

So yes, there's not going to be a singleton AI bent on turning us all into paperclips. But that's a deliberately absurd instantiation of a much more general pattern. I can conceive of a number of ways in which the general pattern too might be wrong, but the conjunction fallacy doesn't fly; a number of attempted debunkings, meanwhile, do suffer from narrative fixation issues.

Superhero bias is a more interesting question -- but it's also a more specific one.

Replies from: rwallace
comment by rwallace · 2011-11-20T21:36:58.828Z · LW(p) · GW(p)

Well, any sequence of events can be placed in a narrative frame with enough of a stretch, but the fact remains that different sequences of events differ in their amenability to this; fiction is not a random sampling from the space of possible things we could imagine happening, and the Singularity is narratively far stronger than most imaginable futures, to a degree that indicates bias we should correct for. I've seen a fair bit of strong Singularity fiction at this stage, though being, well, singular, it tends not to be amenable to repeated stories by the same author the way Heinlein's vision of nuclear powered space colonization was.

comment by lessdazed · 2011-11-21T00:47:43.042Z · LW(p) · GW(p)

Empirically, progress in communication technology between humans outpaces progress in AI, and has done so for as long as digital computers have existed.

The best way to colonize Alpha Centauri has always been to wait for technology to improve rather than launching an expedition, but it's impossible for that to continue to be true indefinitely. Short of direct mind-to-mind communication or something with a concurrent halt to AI progress, AI advances will probably outpace human communication advances in the near to medium term.

It seems unreasonable to believe human minds, optimized according to considerations such as politicking in addition to communication, will be able to communicate just as well as designed AIs. Human mind development was constrained by ancestral energy availability and head size, etc., so it's unlikely that we represent optimally sized minds to form a group of minds, even assuming an AI isn't able to reap huge efficiencies by becoming essentially a single mind, regardless of scale.

Replies from: rwallace
comment by rwallace · 2011-11-21T00:58:09.556Z · LW(p) · GW(p)

Or human communications may stop improving because they are good enough to no longer be a major bottleneck, in which case it may not greatly matter whether other possible minds could do better. Amdahl's law: if something was already only ten percent of total cost, improving it by a factor of infinity would reduce total cost by only that ten percent.
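The Amdahl's law arithmetic can be sketched as follows. The ten percent share comes from the sentence above; the function name and the list of speedups are illustrative choices.

```python
def remaining_cost_fraction(improved_fraction, speedup):
    """Amdahl's law: fraction of the original total cost that remains
    after the part accounting for `improved_fraction` of that cost is
    made `speedup` times cheaper."""
    return (1.0 - improved_fraction) + improved_fraction / speedup


# Communication assumed to be 10% of total cost, as in the comment.
for speedup in (2, 10, 1000, float("inf")):
    left = remaining_cost_fraction(0.10, speedup)
    print(f"speedup {speedup}: {left:.4f} of original cost remains")
```

Even an infinite speedup of the ten percent component leaves ninety percent of the cost in place, so the other ninety percent is where the bottleneck lies.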

comment by XiXiDu · 2011-11-20T15:08:00.935Z · LW(p) · GW(p)

Superior processing power. Evidence against would be the human brain already being close to the physical limits of what is possible.

It is often cited how much faster expert systems are at their narrow area of expertise. But does that mean that the human brain is actually slower or that it can't focus its resources on certain tasks? Take for example my ability to simulate some fantasy environment, off the top of my head, in front of my mind's eye. Or the ability of humans to run real-time egocentric world-simulations to extrapolate and predict the behavior of physical systems and other agents. Our best computers don't even come close to that.

Superior serial power: Evidence against would be an inability to increase the serial power of computers anymore.

Chip manufacturers are already earning most of their money by making their chips more energy efficient and working in parallel.

Improved algorithms: Evidence against would be the human brain's algorithms already being perfectly optimized and with no further room for improvement.

We simply don't know how efficient the human brain's algorithms are. You can't just compare artificial algorithms with the human ability to accomplish tasks that were never selected for by evolution.

Designing new mental modules: Evidence against would be evidence that the human brain's existing mental modules are already sufficient for any cognitive task with any real-world relevance.

This is an actual feature. It is not clear that you can have a general intelligence with a huge amount of plasticity that would work at all rather than messing itself up.

Modifiable motivation systems: Evidence against would be evidence that humans are already optimal at motivating themselves to work on important tasks...

This is an actual feature, see dysfunctional autism.

Copyability: Evidence against would be evidence that minds cannot be effectively copied, maybe because there won't be enough computing power to run many copies.

You don't really anticipate to be surprised by evidence on this point because your definition of "minds" doesn't even exist and therefore can't be shown not to be copyable. And regarding brains, show me some neuroscientists who think that minds are effectively copyable.

Perfect co-operation: Evidence against would be that no minds can co-operate better than humans do, or at least not to such an extent that they'd receive a major advantage.

Cooperation is a delicate quality. Too much and you get frozen, too little and you can't accomplish much. Human science is a great example of a balance between cooperation and useful rivalry. How is a collective intellect of AGI's going to preserve the right balance without mugging itself into pursuing insane expected utility-calculations?

Excluding the possibility of a rapid takeover would require at least strong evidence against gains...

Wait, are you saying that the burden of proof is with those who are skeptical of a Singularity? Are you saying that the null hypothesis is a rapid takeover? What evidence allowed you to make that hypothesis in the first place? Making up unfounded conjectures and then telling others to disprove them will lead to privileging random high-utility possibilities, that sound superficially convincing, while ignoring other problems that are based on empirical evidence.

...it's not enough to show that e.g. current trends in hardware development show mostly increases in parallel instead of serial power - to refute the gains from increased serial power, you'd also have to show that this is indeed some deep physical limit which cannot be overcome.

All that doesn't even matter. Computational resources are mostly irrelevant when it comes to risks from AI. What you have to show is that recursive self-improvement is possible. It is a question of whether you can dramatically speed up the discovery of unknown unknowns.

comment by Morendil · 2011-11-20T20:28:20.864Z · LW(p) · GW(p)

a parasite meme that I suspect could well snuff out the entire future of intelligent life

How do you propose that would happen?

Replies from: rwallace
comment by rwallace · 2011-11-20T21:44:37.323Z · LW(p) · GW(p)

We've had various kinds of Luddism before, but this one is particularly lethal in being a form that appeals to people who had been technophiles. If it spreads enough, best case scenario is the pool of people willing to work on real technological progress shrinks, worst case scenario is regulation that snuffs out progress entirely, and we get to sit around bickering about primate politics until whatever window of time we had runs out.

Replies from: Morendil
comment by Morendil · 2011-11-21T07:43:30.155Z · LW(p) · GW(p)

That's awfully vague. "Whatever window of time we had", what does that mean?

There's one kind of "technological progress" that SIAI opposes as far as I can tell: working on AGI without an explicit focus on Friendliness. Now if you happen to think that AGI is a must-have to ensure the long-term survival of humanity, it seems to me that you're already pretty much on board with the essential parts of SIAI's worldview, indistinguishable from them as far as the vast majority is concerned.

Otherwise, there's plenty of tech that is entirely orthogonal with the claims of SIAI: cheap energy, health, MNT, improving software engineering (so-called), and so on.

Replies from: rwallace, Risto_Saarelma
comment by rwallace · 2011-11-21T10:16:55.092Z · LW(p) · GW(p)

That's awfully vague. "Whatever window of time we had", what does that mean?

The current state of the world is unusually conducive to technological progress. We don't know how long this state of affairs will last. Maybe a long time, maybe a short time. To fail to make progress as rapidly as we can is to gamble the entire future of intelligent life on it lasting a long time, without evidence that it will do so. I don't think that's a good gamble.

There's one kind of "technological progress" that SIAI opposes as far as I can tell: working on AGI without an explicit focus on Friendliness.

I have seen claims to the contrary from a number of people, from Eliezer himself a number of years ago up to another reply to your comment right now. If SIAI were to officially endorse the position you just suggested, my assessment of their expected utility would significantly increase.

Replies from: Morendil
comment by Morendil · 2011-11-21T13:45:05.396Z · LW(p) · GW(p)

Well, SIAI isn't necessarily a homogenous bunch of people, with respect to what they oppose or endorse, but did you look for instance at Michael Anissimov's entries on MNT? (Focusing on that because it's the topic of Risto's comment and you seem to see that as a confirmation of your thesis.) You don't get the impression that he thinks it's a bad idea, quite the contrary: http://www.acceleratingfuture.com/michael/blog/category/nanotechnology/

Here is Eliezer on the SL4 mailing list:

If you solve the FAI problem, you probably solve the nanotech problem. If you solve the nanotech problem, you probably make the AI problem much worse. My preference for solving the AI problem as quickly as possible has nothing to do with the relative danger of AI and nanotech. It's about the optimal ordering of AI and nanotech.

The Luddites of our times are (for instance) groups like the publishing and music industries; the use of that label to describe the opinions of people affiliated with SIAI just doesn't make sense IMO.

Replies from: MichaelAnissimov, rwallace
comment by MichaelAnissimov · 2011-12-06T20:22:56.389Z · LW(p) · GW(p)

Human-implemented molecular nanotechnology is a bad idea. I just talk about it to attract people in who think it's important. MNT knowledge is a good filter/generator for SL3 and beyond thinkers.

MNT without friendly superintelligence would be nothing but a disaster.

It's true that SIAI isn't homogeneous though. For instance, Anna is much more optimistic about uploads than I am personally.

comment by rwallace · 2011-11-22T00:00:41.831Z · LW(p) · GW(p)

Thanks for the link, yes, that does seem to be a different opinion (and some very interesting posts).

I agree with you about the publishing and music industries. I consider current rampant abuse of intellectual property law to be a bigger threat than the Singularity meme, sufficiently so that if your comparative advantage is in politics, opposing that abuse probably has the highest expected utility of anything you could be doing.

comment by Risto_Saarelma · 2011-11-21T09:37:33.684Z · LW(p) · GW(p)

Molecular nanotechnology and anything else that can be weaponized to let a very small group of people effectively kill a very large group of people is probably something SIAI-type people would like to be countered with a global sysop scenario from the moment it gets developed.

comment by shokwave · 2011-11-21T11:07:50.134Z · LW(p) · GW(p)

The trouble is that even proper beliefs can be inadequately connected to other proper beliefs inside the human mind.

Proper beliefs can be too independent; if you have a belief network A -> B and the probabilities of 'B given A' and 'B given not-A' are similar, A doesn't have much value when you care about B. It doesn't change your belief much, because it isn't connected very much.

But my guess is most human brains have "A -> B" and don't have "B given A" and "B given not A". So they don't check the difference, so they don't see A isn't connected much to B.

So the general skill is noticing when B doesn't depend much on A.

I'm not sure what the "making sure your beliefs are actually connected in the first place" skill looks like when broken down to the 5-second level.

  • Find a belief connection.
  • Flip the truth value of A and see if B changes much.
  • If it doesn't, delete the connection.
  • If it does, remember the connection.

"Deleting the connection" and "remembering the connection" using our human brains are other 5-second level skills.
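The flip-and-compare steps above can be sketched in code. This is only an illustration: the function names, the example probabilities, and the 0.1 threshold for "changes much" are all assumptions rather than anything specified in the comment.

```python
def connection_strength(p_b_given_a, p_b_given_not_a):
    """How far belief in B moves when the truth value of A is flipped."""
    return abs(p_b_given_a - p_b_given_not_a)


def review_connection(p_b_given_a, p_b_given_not_a, threshold=0.1):
    """Return 'remember' if B depends noticeably on A, else 'delete'.

    The threshold is an arbitrary illustrative cutoff.
    """
    if connection_strength(p_b_given_a, p_b_given_not_a) >= threshold:
        return "remember"
    return "delete"


# Hypothetical belief connections A -> B: (P(B | A), P(B | not A)).
connections = {
    "car starts -> car works": (0.95, 0.02),
    "horoscope is good -> day goes well": (0.51, 0.50),
}

for name, (p_a, p_not_a) in connections.items():
    print(f"{name}: {review_connection(p_a, p_not_a)}")
```

The check only works if both conditionals are actually stored, which is exactly what the guess above says most brains fail to do.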

comment by SilasBarta · 2011-11-20T16:12:34.162Z · LW(p) · GW(p)

I don't know if this is an answer or a rephrasing of the problem, but "making sure your beliefs are propagated to the rest of your knowledge" is what I classified as the Level 2 Understanding.

comment by Shmi (shminux) · 2011-11-21T04:39:38.317Z · LW(p) · GW(p)

Somewhat off-topic, but what struck me about your blog post is the apparent contradiction between "And when you’re that smart, you can do almost anything." and presuming that we can actually program a "super-smart" AI with a certain set of values that it would not instantly override based on the considerations we cannot even imagine.

Just to give an example, it might decide that humans are bad for the universe, because they might some day evolve to destroy it. Would we be able to prevent it from wiping/neutralizing humanity for the sake of the universe and potentially other intelligent species that may exist there? Would we even want to? Or there might be a universal law of AI physics that leads it to destroy its ancestors. Or maybe some calculations, Asimov's Foundation-style, would require it to subject humanity to untold millennia of suffering in order to prevent its total destruction by excessive fun.

No point arguing with these examples, since a super-smart AI would think in ways we cannot fathom. My point, again, is that believing that we can give an AI a set of goals or behaviors it would care about once it is smarter than us seems no smarter than believing in an invisible man in the sky who has a list of 10 things we are not supposed to do (thanks, George Carlin).

Replies from: shokwave
comment by shokwave · 2011-11-21T11:06:51.278Z · LW(p) · GW(p)

If we restrict the space of its terminal goals to things we can imagine (and then set about proving each thing to be friendly) then we can be sure that even thinking in ways we cannot fathom, as long as its goal structure doesn't change (this seems decoupled from intelligence, i.e. paperclip maximiser) it won't ever do bad things X, Y or Z (because it checks them against its terminal goal).

Replies from: shminux
comment by Shmi (shminux) · 2011-11-21T23:59:12.118Z · LW(p) · GW(p)

That directly contradicts EY's CEV, where whatever we can imagine is no more than a part of the Initial Dynamics. "Thou shalt..." or "Thou shalt not..." is not going to do the trick.

Replies from: shokwave
comment by shokwave · 2011-11-22T04:33:06.705Z · LW(p) · GW(p)

Right. Downgrading my estimate of how well I understand the problem.