Siren worlds and the perils of over-optimised search

post by Stuart_Armstrong · 2014-04-07T11:00:18.803Z · LW · GW · Legacy · 418 comments

Contents

  The AI builds the siren worlds
    Meanwhile, at some level of complexity safely beyond what my human mind will ever reach, the AI is hiding all the evil and unmitigated suffering.
  Siren and marketing worlds without builders
  Constrained search and satisficing our preferences

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.

 

The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

First, the siren world looks complicated, wrong and scary - but with just a hint that there's something more to it. Something intriguing, something half-glimpsed, something making me want to dig deeper. And as I follow up this something, I see more patterns, and seem to gain a greater understanding. Not just of the world I'm looking at, but of the meaning of good itself. The world seems to confirm to me some of my ideas about what constitutes a worthwhile life - not just the ideas I've been able to articulate, but the ones I've only got a vague half-baked notion of, and the ones I didn't even know I had.

The longer I stare into this world, the greater an understanding I get of my own values. And this is just the starting point: the world subtly opens up new avenues of philosophical musings in my brain, the images I see triggering me to come up with my own insights and thought experiments as to the meaning of goodness - insights that are then carefully confirmed as I dig deeper. I could stay staring at this wonderful, beautiful and complex world for hours, days, years, gaining a deeper wisdom and understanding as I go, truly appreciating how the mysteries of this place unravel into new versions of worth and goodness. Every time I ever have a doubt about it, that doubt is systematically resolved into a new glorious confirmation of how much the AI really gets what I value, even before I knew how much I valued it.

Meanwhile, at some level of complexity safely beyond what my human mind will ever reach, the AI is hiding all the evil and unmitigated suffering.

Your siren world may differ from mine - you may be vulnerable to different tricks. Since people often believe they would be immune to such manipulation, feel free to imagine that the AI is fooling someone else - we know how stupid other people can be, right? - while you watch as a helpless bystander.

The problem is that a future universe is likely to be vast and complicated. When inspecting it, we have certain inspection criteria (IC). These consist of our values and preferences, but also the ways we would look into this universe, how we would follow up on initial impressions, various measures and yardsticks we might use to summarise the data we see, all the clever and devious plans we might come up with to ferret out "what's really going on". These IC are going to be considerably simpler than the totality of the future universe. So the AI's task is to optimise a universe design that passes the IC, while shoving in as much disutility as it can - which in a large universe, is a tremendous amount. Unless our IC are perfect and already include a good solution to the problem of value (in which case we've solved the friendliness problem already), a superintelligent AI will likely succeed at its task.

 

Siren and marketing worlds without builders

The above thought experiment needed a superintelligent evil AI for the design of the siren world. But if we admit that that is possible, we don't actually need the AI any more. The siren worlds exist: there are potential worlds of extreme disutility that satisfy our IC. If we simply did an unconstrained search across all possible future worlds (something like the search in Paul Christiano's indirect normativity - an idea that inspired the siren world concept), then we would at some point find siren worlds. And if we took the time to inspect them, we'd get sucked in by them.

How bad is this problem in general? A full search will not only find the siren worlds, but also a lot of very-seductive-but-also-very-nice worlds - genuine eutopias. We may feel that it's easier to be happy than to pretend to be happy (while being completely miserable and tortured and suffering). Following that argument, we may feel that there will be far more eutopias than siren worlds - after all, the siren worlds have to have bad stuff plus a vast infrastructure to conceal that bad stuff, which should at least have a complexity cost if nothing else. So if we chose the world that best passed our IC - or chose randomly among the top contenders - we might be more likely to hit a genuine eutopia than a siren world.

Unfortunately, there are other dangers than siren worlds. We are now optimising not for quality of the world, but for ability to seduce or manipulate the IC. There's no hidden evil in this world, just a "pulling out all the stops to seduce the inspector, through any means necessary" optimisation pressure. Call a world that ranks high on this scale a "marketing world". Genuine eutopias are unlikely to be marketing worlds, because they are optimised for being good rather than seeming good. A marketing world would be utterly optimised to trick, hack, seduce, manipulate and fool our IC, and may well be a terrible world in all other respects. It's the old "to demonstrate maximal happiness, it's much more reliable to wire people's mouths to smile rather than make them happy" problem all over again: the very best way of seeming good may completely preclude actually being good. In a genuine eutopia, people won't go around all the time saying "Btw, I am genuinely happy!" in case there is a hypothetical observer looking in. If every one of your actions constantly proclaims that you are happy, chances are happiness is not your genuine state. EDIT: see also my comment:

We are both superintelligences. You have a bunch of independently happy people that you do not aggressively compel. I have a group of zombies - human-like puppets that I can make do anything, appear to feel anything (though this is done sufficiently well that outside human observers can't tell I'm actually in control). An outside human observer wants to check that our worlds rank high on scale X - a scale we both know about.

Which of us do you think is going to be better able to maximise our X score?

This can also be seen as an epistemic version of Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." Here the IC are the measure, the marketing worlds are targeting them, and hence they cease to be a good measure. But recall that the IC include the totality of approaches we use to rank these worlds, so there's no way around this problem. If instead of inspecting the worlds, we simply rely on some sort of summary function, then the search will be optimised to find anything that can fool/pass that summary function. If we use the summary as a first filter, then apply some more profound automated checking, then briefly inspect the outcome so we're sure it didn't go stupid - then the search will be optimised for "pass the summary, pass automated checking, seduce the inspector".

Different IC therefore will produce different rankings of worlds, but the top worlds in any of the rankings will be marketing worlds (and possibly siren worlds).
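To make the effect concrete, here is a minimal toy model - all of it invented for illustration: the budget split, the "hackability" factor, the numbers. Each candidate world divides a fixed budget between genuinely being good and merely seeming good to the IC; the IC partly track real quality but can also be gamed. Unconstrained optimisation then reliably returns a marketing world:

```python
import random

random.seed(0)

# Toy model (illustrative only): each world splits a fixed budget between
# genuinely being good and merely *seeming* good to our inspection criteria.
def make_world():
    seeming = random.random()       # fraction of the budget spent on appearance
    return {"being": 1.0 - seeming, "seeming": seeming}

def ic_score(world, hackability=3.0):
    # Assumption of the sketch: effort aimed squarely at passing inspection is
    # several times more effective at that than genuine quality is.
    return world["being"] + hackability * world["seeming"]

worlds = [make_world() for _ in range(100_000)]

top = max(worlds, key=ic_score)     # unconstrained search: take the best IC scorer
print("IC-optimal world: genuine goodness =", round(top["being"], 3))
# Essentially the entire budget of the winner went into seeming good:
# the search has found a marketing world.
```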

 

Constrained search and satisficing our preferences

The issue is a problem of (over) optimisation. The IC correspond roughly with what we want to value, but differ from it in subtle ways, enough that optimising for one could be disastrous for the other. If we didn't optimise, this wouldn't be a problem. Suppose we defined an acceptable world as one that we would judge "yeah, that's pretty cool" or even "yeah, that's really great". Then assume we selected randomly among the acceptable worlds. This would probably result in a world of positive value: siren worlds and marketing worlds are rare, because they fulfil very specific criteria. They triumph because they score so high on the IC scale, but they are outnumbered by the many more worlds that are simply acceptable.

This is in effect satisficing over the IC, rather than optimising over them. Satisficing has its own issues, however, so other approaches could be valuable as well. One way could be to use a constrained search. If for instance we took a thousand random worlds and IC-optimised over them, we're very unlikely to encounter a siren or marketing world. We're also very unlikely to encounter a world of any quality, though; we'd probably need to IC-optimise over at least a trillion worlds to find good ones. There is a tension in the number: as the number of worlds searched increases, their quality increases, but so do the odds of encountering a marketing or siren world. EDIT: Lumifer suggested using a first-past-the-post system: search through worlds, and pick the first acceptable one we find. This is better than the approach I outlined in this paragraph.
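Continuing the toy model from the previous section (same invented scoring, same caveats), here is a sketch of how the three selection rules compare: full optimisation, satisficing over the acceptable worlds, and Lumifer's first-past-the-post rule. The acceptability threshold is another made-up parameter:

```python
import random

random.seed(1)

def make_world():
    seeming = random.random()               # budget spent on appearing good
    return {"being": 1.0 - seeming, "seeming": seeming}

def ic_score(w, hackability=3.0):           # same toy scoring as before
    return w["being"] + hackability * w["seeming"]

ACCEPTABLE = 2.0                            # assumed bar for "yeah, that's pretty cool"
worlds = [make_world() for _ in range(100_000)]

# 1. Unconstrained optimisation: the single highest IC scorer.
optimised = max(worlds, key=ic_score)

# 2. Satisficing: pick at random among everything that clears the bar.
satisficed = random.choice([w for w in worlds if ic_score(w) >= ACCEPTABLE])

# 3. First-past-the-post: stop at the first acceptable world encountered.
first_past = next(w for w in worlds if ic_score(w) >= ACCEPTABLE)

for name, w in [("optimised", optimised), ("satisficed", satisficed),
                ("first acceptable", first_past)]:
    print(f"{name:17s} genuine goodness = {w['being']:.3f}")

# Raising ACCEPTABLE (i.e. searching harder) improves the IC scores of the
# selected worlds, but drags the selection towards appearance-heavy worlds --
# the tension described in the paragraph above.
```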

We could also restrict the search by considering "realistic" worlds. Suppose we had to take 25 different yes-no decisions that could affect the future of humanity. This might be something like "choosing which of these 25 very different AIs to turn on and let loose together" or something more prosaic (which stocks to buy, which charities to support). This results in 2^25 different future worlds to search through: barely more than 33 million. Because there are so few worlds, they are unlikely to contain a marketing world (given the absolutely crucial proviso that none of the AIs is an IC-optimiser!). But these worlds are not drawn randomly from the space of future worlds, but are dependent on key decisions that we believe are important and relevant. Therefore they are very likely to contain an acceptable world - or at least far more likely than a random set of 33 million worlds would be. By constraining the choices in this way, we have in effect satisficed without satisficing, which is both Zen and useful.
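A brief sketch of this constrained search - the world_quality function is a pure placeholder for "predict the world these decisions lead to, then apply our IC", and the acceptability bar is likewise invented:

```python
from itertools import product
from random import Random

N_DECISIONS = 25
print(2 ** N_DECISIONS)    # 33_554_432 -- "barely more than 33 million" candidate futures

def world_quality(decisions):
    # Placeholder: stands in for predicting the resulting world and scoring it
    # against our inspection criteria.
    return Random(hash(decisions)).random()

ACCEPTABLE = 0.9999        # assumed bar for an acceptable world

# Scan the realistic worlds and stop at the first acceptable one.
for decisions in product((0, 1), repeat=N_DECISIONS):
    if world_quality(decisions) >= ACCEPTABLE:
        print("acceptable decision vector:", decisions)
        break
```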

As long as we're aware of the problem, other approaches may also allow for decent search without getting sucked in by a siren or a marketer.

418 comments


comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-04-07T18:09:11.236Z · LW(p) · GW(p)

While not generally an opponent of human sexuality, to be kind to all the LW audience including those whose parents might see them browsing, please do remove the semi-NSFW image.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T18:31:38.661Z · LW(p) · GW(p)

Is the new one more acceptable?

Replies from: MugaSofer, Eliezer_Yudkowsky
comment by MugaSofer · 2014-04-08T10:45:43.574Z · LW(p) · GW(p)

See, now I'm curious about the old image...

Replies from: Stuart_Armstrong
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-04-07T19:53:29.294Z · LW(p) · GW(p)

Sure Why Not

Replies from: Lumifer
comment by Lumifer · 2014-04-07T20:13:56.376Z · LW(p) · GW(p)

LOL. The number of naked women grew from one to two, besides the bare ass we now also have breasts with nipples visible (OMG! :-D) and yet it's now fine just because it is old-enough Art.

Replies from: Stuart_Armstrong, army1987, None
comment by Stuart_Armstrong · 2014-04-07T21:12:29.890Z · LW(p) · GW(p)

Yep :-)

comment by A1987dM (army1987) · 2014-04-13T07:47:43.158Z · LW(p) · GW(p)

The fact that the current picture is a painting and the previous one was a photograph might also have something to do with it.

Replies from: Lumifer
comment by Lumifer · 2014-04-14T16:28:16.797Z · LW(p) · GW(p)

Can you unroll this reasoning?

Replies from: army1987
comment by A1987dM (army1987) · 2014-04-21T19:29:35.663Z · LW(p) · GW(p)

It's just what my System 1 tells me; actually, I wouldn't know how to go about figuring out whether it's right.

comment by [deleted] · 2014-04-08T14:58:26.451Z · LW(p) · GW(p)

Is there some other siren you'd prefer to see?

Replies from: Lumifer
comment by Lumifer · 2014-04-08T17:34:09.031Z · LW(p) · GW(p)

See or hear? :-D

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-04-08T00:08:26.762Z · LW(p) · GW(p)

This indeed is why "What a human would think of a world, given a defined window process onto a world" was not something I considered as a viable form of indirect normativity / an alternative to CEV.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-08T09:33:41.227Z · LW(p) · GW(p)

To my mind, the interesting part is the whole constrain search/satisficing ideas which may allow such an approach to be used.

comment by [deleted] · 2014-04-07T11:22:24.530Z · LW(p) · GW(p)

First question: how on Earth would we go about conducting a search through possible future universes, anyway? This thought experiment still feels too abstract to make my intuitions go click, in much the same way that Christiano's original write-up of Indirect Normativity did. You simply can't actually simulate or "acausally peek at" whole universes at a time, or even Earth-volumes in such. We don't have the compute-power, and I don't understand how I'm supposed to be seduced by a siren that can't sing to me.

It seems to me that the greater danger is that a UFAI would simply market itself as an FAI as an instrumental goal and use various "siren and marketing" tactics to manipulate us into cleanly, quietly accepting our own extinction -- because it could just be cheaper to manipulate people than to fight them, when you're not yet capable of making grey goo but still want to kill all humans.

And if we want to talk about complex nasty dangers, it's probably going to just be people jumping for the first thing that looks eutopian, in the process chucking out some of their value-set. People do that a lot, see: every single so-called "utopian" movement ever invented.

EDIT: Also, I think it makes a good bit of sense to talk about "IC-maximizing" or "marketing worlds" using the plainer machine-learning terminology: overfitting. Overfitting is also a model of what happens when an attempted reinforcement learner or value-learner over non-solipsistic utility functions wireheads itself: the learner has come up with a hypothesis that matches the current data-set exactly (for instance, "pushing my own reward button is Good") while diverging completely from the target function (human eutopia).

Avoiding overfitting is one very good reason why it's better to build an FAI by knowing an epistemic procedure that leads to the target function rather than just filtering a large hypothesis space for what looks good.

Replies from: Stuart_Armstrong, Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T15:50:01.804Z · LW(p) · GW(p)

First question: how on Earth would we go about conducting a search through possible future universes, anyway? This thought experiment still feels too abstract to make my intuitions go click, in much the same way that Christiano's original write-up of Indirect Normativity did.

Two main reasons for this: first, there is Christiano's original write-up, which has this problem. Second, we may be in a situation where we ask an AI to simulate the consequences of its choice, have a glance at it, and then approve/disapprove. That's less a search problem, and more the original siren world problem, and we should be aware of the problem.

Replies from: None
comment by [deleted] · 2014-04-07T16:13:01.178Z · LW(p) · GW(p)

Second, we may be in a situation where we ask an AI to simulate the consequences of its choice, have a glance at it, and then approve/disapprove. That's less a search problem, and more the original siren world problem, and we should be aware of the problem.

This sounds extremely counterintuitive. If I have an Oracle AI that I can trust to answer more-or-less verbal requests (defined as: any request or "program specification" too vague for me to actually formalize), why have I not simply asked it to learn, from a large corpus of cultural artifacts, the Idea of the Good, and then explain to me what it has learned (again, verbally)? If I cannot trust the Oracle AI, dear God, why am I having it explore potential eutopian future worlds for me?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T17:40:35.260Z · LW(p) · GW(p)

If I cannot trust the Oracle AI, dear God, why am I having it explore potential eutopian future worlds for me?

Because I haven't read Less Wrong? ^_^

This is another argument against using constrained but non-friendly AI to do stuff for us...

comment by Stuart_Armstrong · 2014-04-07T15:48:16.017Z · LW(p) · GW(p)

Colloquially, this concept is indeed very close to overfitting. But it's not technically overfitting ("overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship."), and using the term brings in other connotations. For instance, it may be that the AI needs to use less data to seduce us than it would to produce a genuine eutopia. It's more that it fits the wrong target function (having us approve its choice vs a "good" choice) rather than fitting it in an overfitted way.

Replies from: None
comment by [deleted] · 2014-04-07T16:09:07.207Z · LW(p) · GW(p)

Thanks. My machine-learning course last semester didn't properly emphasize the formal definition of overfitting, or perhaps I just didn't study it hard enough.

What I do want to think about here is: is there a mathematical way to talk about what happens when a learning algorithm finds the wrong correlative or causative link among several different possible links between the data set and the target function? Such maths would be extremely helpful for advancing the probabilistic value-learning approach to FAI, as they would give us a way to talk about how we can interact with an agent's beliefs about utility functions while also minimizing the chance/degree of wireheading.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T19:25:28.706Z · LW(p) · GW(p)

is there a mathematical way to talk about what happens when a learning algorithm finds the wrong correlative or causative link among several different possible links between the data set and the target function?

That would be useful! A short search gives "bias" as the closest term, which isn't very helpful.

Replies from: None
comment by [deleted] · 2014-04-08T15:29:54.166Z · LW(p) · GW(p)

Unfortunately "bias" in statistics is completely unrelated to what we're aiming for here.

In ugly, muddy words, what we're thinking is that we give the value-learning algorithm some sample of observations or world-states as "good", and possibly some as "bad", and "good versus bad" might be any kind of indicator value (boolean, reinforcement score, whatever). It's a 100% guarantee that the physical correlates of having given the algorithm a sample apply to every single sample, but we want the algorithm to learn the underlying causal structure of why those correlates themselves occurred (that is, to model our intentions as a VNM utility function) rather than learn the physical correlates themselves (because that leads to the agent wireheading itself).

Here's a thought: how would we build a learning algorithm that treats its samples/input as evidence of an optimization process occurring and attempts to learn the goal of that optimization process? Since physical correlates like reward buttons don't actually behave as optimization processes themselves, this would ferret out the intentionality exhibited by the value-learner's operator from the mere physical effects of that intentionality (provided we first conjecture that human intentions behave detectably like optimization).
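One crude way to make that idea concrete (not anything from the thread, just a sketch): model the observed operator as an approximately rational optimiser, and do Bayesian inference over a handful of candidate goals given their observed choices. This is essentially the inverse-reinforcement-learning framing; the candidate goals, options and rationality parameter below are all invented.

```python
import math

options = ["press_reward_button", "help_human", "do_nothing"]

# Candidate goals the observed optimiser might be pursuing (made-up utilities).
goals = {
    "wants_button_presses": {"press_reward_button": 1.0, "help_human": 0.1, "do_nothing": 0.0},
    "wants_humans_helped":  {"press_reward_button": 0.1, "help_human": 1.0, "do_nothing": 0.0},
}

def choice_likelihood(option, utilities, rationality=5.0):
    # Boltzmann-rational model: better options are exponentially more likely.
    weights = {o: math.exp(rationality * u) for o, u in utilities.items()}
    return weights[option] / sum(weights.values())

observed_choices = ["help_human", "help_human", "press_reward_button", "help_human"]

posterior = {g: 1.0 / len(goals) for g in goals}   # uniform prior over goals
for choice in observed_choices:
    for g in posterior:
        posterior[g] *= choice_likelihood(choice, goals[g])

total = sum(posterior.values())
for g, p in posterior.items():
    print(g, round(p / total, 3))
```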

Has that whole "optimization process" and "intentional stance" bit from the LW Sequences been formalized enough for a learning treatment?

Replies from: Quill_McGee, Stuart_Armstrong, IlyaShpitser
comment by Quill_McGee · 2014-04-09T06:08:43.068Z · LW(p) · GW(p)

http://www.fungible.com/respect/index.html This looks to be very related to the idea of "Observe someone's actions. Assume they are trying to accomplish something. Work out what they are trying to accomplish." Which seems to be what you are talking about.

Replies from: None
comment by [deleted] · 2014-04-09T08:08:05.568Z · LW(p) · GW(p)

That looks very similar to what I was writing about, though I've tried to be rather more formal/mathematical about it instead of coming up with ad-hoc notions of "human", "behavior", "perception", "belief", etc. I would want the learning algorithm to have uncertain/probabilistic beliefs about the learned utility function, and if I was going to reason about individual human minds I would rather just model those minds directly (as done in Indirect Normativity).

comment by Stuart_Armstrong · 2014-04-08T17:55:49.138Z · LW(p) · GW(p)

I will think about this idea...

Replies from: None
comment by [deleted] · 2014-04-08T18:22:20.294Z · LW(p) · GW(p)

The most obvious weakness is that such an algorithm could easily detect optimization processes that are acting on us (or, if you believe such things exist, you should believe this algorithm might locate them mistakenly), rather than us ourselves.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-16T10:33:19.652Z · LW(p) · GW(p)

I've been thinking about this, and I haven't found any immediately useful way of using your idea, but I'll keep it in the back of my mind... We haven't found a good way of identifying agency in the abstract sense ("was cosmic phenomena X caused by an agent, and if so, which one?" kind of stuff), so this might be a useful simpler problem...

Replies from: None
comment by [deleted] · 2014-05-16T14:35:27.797Z · LW(p) · GW(p)

Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don't wirehead.

Notably, this fits with our intuition that morality must be "taught" (ie: via labelled data) to actual human children, lest they simply decide that the Good and the Right consists of eating a whole lot of marshmallows.

And if we put that together with a conservation heuristic for acting under moral uncertainty (say: optimize for expectedly moral expected utility, thus requiring higher moral certainty for less-extreme moral decisions), we might just start to make some headway on managing to construct utility functions that would mathematically reflect what their operators actually intend for them to do.
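A toy sketch of that conservative rule - everything here (the candidate moral theories, credences and utilities) is invented purely to show the shape of the calculation:

```python
# Expected utility under uncertainty over utility functions: weight each
# candidate utility function by the learner's credence in it and choose the
# action that is best in expectation.
candidate_utilities = {
    # name: (credence, utility over actions)
    "marshmallows_are_good": (0.3, {"eat_marshmallows": 1.0, "share": 0.2, "extreme_act": 0.5}),
    "sharing_is_good":       (0.7, {"eat_marshmallows": 0.1, "share": 1.0, "extreme_act": -5.0}),
}

actions = ["eat_marshmallows", "share", "extreme_act"]

def expected_moral_utility(action):
    return sum(credence * utility[action]
               for credence, utility in candidate_utilities.values())

for a in actions:
    print(a, round(expected_moral_utility(a), 2))
print("chosen:", max(actions, key=expected_moral_utility))
# An action that is catastrophic under one plausible candidate utility gets
# heavily penalised even if another candidate mildly favours it -- the kind of
# conservatism under moral uncertainty gestured at above.
```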

I also have an idea written down in my notebook, which I've been refining, that sort of extends from what Luke had written down here. Would it be worth a post?

comment by IlyaShpitser · 2014-04-08T15:47:37.522Z · LW(p) · GW(p)

Hi, there appears to be a lot of work on learning causal structure from data.

Replies from: None
comment by [deleted] · 2014-04-08T15:50:12.965Z · LW(p) · GW(p)

Keywords? I've looked through Wikipedia and the table of contents from my ML textbook, but I haven't found the right term to research yet. "Learn a causal structure from the data and model the part of it that appears to narrow the future" would in fact be how to build a value-learner, but... yeah.

EDIT: One of my profs from undergrad published a paper last year about causal-structure. The question is how useful it is for universal AI applications. Joshua Tenenbaum tackled it from the cog-sci angle in 2011, but again, I'm not sure how to transfer it over to the UAI angle. I was searching for "learning causal structure from data" -- herp, derp.

Replies from: IlyaShpitser
comment by IlyaShpitser · 2014-04-08T16:26:42.329Z · LW(p) · GW(p)

Who was this prof?

Replies from: None
comment by [deleted] · 2014-04-08T16:27:36.577Z · LW(p) · GW(p)

I was referring to David Jensen, who taught "Research Methods in Empirical Computer Science" my senior year.

Replies from: IlyaShpitser
comment by IlyaShpitser · 2014-04-08T16:43:52.940Z · LW(p) · GW(p)

Thanks.

comment by Froolow · 2014-04-09T14:12:30.198Z · LW(p) · GW(p)

This puts me in mind of a thought experiment Yvain posted a while ago (I’m certain he’s not the original author, but I can’t for the life of me track it any further back than his LiveJournal):

“A man has a machine with a button on it. If you press the button, there is a one in five million chance that you will die immediately; otherwise, nothing happens. He offers you some money to press the button once. What do you do? Do you refuse to press it for any amount? If not, how much money would convince you to press the button?”

This is – I think – analogous to your ‘siren world’ thought experiment. Rather than pushing the button once for £X, every time you push the button the AI simulates a new future world and at any point you can stop and implement the future that looks best to you. You have a small probability of uncovering a siren world, which you will be forced to choose because it will appear almost perfect (although you may keep pressing the button after uncovering the siren world and uncover an even more deviously concealed siren, or even a utopia which is better than the original siren). How often do you simulate future worlds before forcing yourself to implement the best so far to maximize your expected utility?

Obviously the answer depends on how probable siren worlds are and how likely it is that the current world will be overtaken by a superior world on the next press (which is equivalent to a function where the probability of earning money on the next press is inversely related to how much money you already have). In fact, if the probability of a siren world is sufficiently low, it may be worthwhile to take the risk of generating worlds without constraints in case the AI can simulate a world substantially better than the best-optimised world changing only the 25 yes-no questions, even if we know that the 25 yes-no questions will produce a highly livable world.

Of course, if the AI can lie to you about whether a world is good or not (which seems likely) or can produce possible worlds in a non-random fashion, increasing the risk of generating a siren world (which also seems likely), then you should never push the button, because of the risk that you would be unable to stop yourself implementing the siren world which – almost inevitably – would be generated on the first try. If we can prove the best-possible utopia is better than the best-possible siren even given IC constraints (which seems unlikely) or that the AI we have is definitely Friendly (could happen, you never know… :p ) then we should push the button an infinite number of times. But excluding these edge cases, it seems likely the optimal decision will not be constrained in the way you describe, but more likely an unconstrained but non-exhaustive search – a finite number of pushes on our random-world button rather than an exhaustive search of a constrained possibility space.
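The tradeoff described here can be made concrete with a small Monte Carlo sketch - every number (the siren probability, the siren's disvalue, the distribution of honest worlds) is invented for illustration:

```python
import random

random.seed(3)

P_SIREN = 1e-5            # assumed chance that any one draw is a siren world
SIREN_TRUE_VALUE = -1000.0

def play(n_presses):
    """Press the button n times, keep the best-looking world, but be seduced
    by the first siren drawn (you implement it and the game ends)."""
    best = 0.0
    for _ in range(n_presses):
        if random.random() < P_SIREN:
            return SIREN_TRUE_VALUE
        best = max(best, random.random())   # honest worlds: looks = is
    return best

for n in (1, 10, 100, 5_000):
    trials = 2_000
    avg = sum(play(n) for _ in range(trials)) / trials
    print(f"{n:6d} presses: expected true value ~ {avg:8.2f}")

# More presses improve the best honest world found (with sharply diminishing
# returns), while the chance of hitting a siren grows roughly linearly -- so
# the expected value peaks at some finite number of presses and then falls.
```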

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:16:18.065Z · LW(p) · GW(p)

a finite number of pushes on our random-world button

I consider that is also a constrained search!

comment by drnickbone · 2014-04-22T08:22:29.542Z · LW(p) · GW(p)

One issue here is that worlds with an "almost-friendly" AI (one whose friendliness was botched in some respect) may end up looking like siren or marketing worlds.

In that case, worlds as bad as sirens will be rather too common in the search space (because AIs with botched friendliness are more likely than AIs with true friendliness) and a satisficing approach won't work.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-25T09:43:34.026Z · LW(p) · GW(p)

Interesting thought there...

comment by Donald Hobson (donald-hobson) · 2020-12-25T23:38:30.582Z · LW(p) · GW(p)

We could also restrict the search by considering "realistic" worlds. Suppose we had to take 25 different yes-no decisions that could affect the future of humanity. This might be something like "choosing which of these 25 very different AIs to turn on and let loose together" or something more prosaic (which stocks to buy, which charities to support). This results in 2^25 different future worlds to search through: barely more than 33 million. Because there are so few worlds, they are unlikely to contain a marketing world (given the absolutely crucial proviso that none of the AIs is an IC-optimiser!)

Suppose one of the decisions is whether or not to buy stock in a small AI startup. If you buy stock, the company will go on to make a paperclip maximizer several years later. The paperclip maximizer is using CDT or similar. It reasons that it can't make paperclips if it's never made in the first place; that it is more likely to exist if the company that made it is funded; and that hacking the IC takes a comparatively small amount of resources. The paperclip maximizer has an instrumental incentive to hack the IC.

Human society is chaotic. For any decision you take, there are plausible chains of cause and effect that a human couldn't predict, but a superintelligence can predict. The actions that lead to the paperclip maximiser have to be predictable by the current future predictor, as well as by the future paperclip maximiser.  The chain of cause and effect could be a labyrinthine tangle of minor everyday interactions that humans couldn't hope to predict stemming from seemingly innocuous decisions.

In this scenario, it might be the inspection process itself that causes problems. The human inspects a world, they find the world full of very persuasive arguments as to why they should make a paperclip maximizer, and an explanation of how to do so. (Say one inspection protocol was to render a predicted image of a random spot on earth's surface, and the human inspector sees the argument written on a billboard.) The human follows the instructions and makes a paperclip maximizer, the decision they were supposed to be making now utterly irrelevant. The paperclip maximizer covers earth with billboards, and converts the rest of the universe into paperclips. In other words, using this protocol is lethal even for making a seemingly minor and innocuous decision like which shoelace to lace first.

comment by MichaelA · 2020-01-23T07:34:27.040Z · LW(p) · GW(p)

I've just now found my way to this post, from links in several of your more recent posts, and I'm curious as to how this fits in with more recent concepts and thinking from yourself and others.

Firstly, in terms of Garrabrant's taxonomy [LW · GW], I take it that the "evil AI" scenario could be considered a case of adversarial Goodhart, and the siren and marketing worlds without builders could be considered cases of regressional and/or extremal Goodhart. Does that sound right?

Secondly, would you still say that these scenarios demonstrate reasons to avoid optimising (and to instead opt for something like satisficing or constrained search)? It seems to me - though I'm fairly unsure about this - that your more recent writing on Goodhart-style problems suggests that you think we can deal with such problems to the best of our ability by just modelling everything we must already know about our uncertainty and about our preferences (e.g., that they have diminishing returns). Is that roughly right? If so, would you now view these siren and marketing worlds not as arguments against optimisation, but rather as strong demonstrations that naively optimising could be disastrous, and that carefully modelling everything we know about our uncertainty and preferences is really important?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2020-01-23T11:13:58.097Z · LW(p) · GW(p)

that your more recent writing on Goodhart-style problems suggests that you think we can deal with such problems to the best of our ability by just modelling everything we must already know about our uncertainty and about our preferences (e.g., that they have diminishing returns).

To a large extent I do, but there may be some residual effects similar to the above, so some anti-optimising pressure might still be useful.

comment by PhilosophyTutor · 2014-04-28T00:13:45.393Z · LW(p) · GW(p)

It seems based on your later comments that the premise of marketing worlds existing relies on there being trade-offs between our specified wants and our unspecified wants, so that the world optimised for our specified wants must necessarily be highly likely to be lacking in our unspecified ones ("A world with maximal bananas will likely have no apples at all").

I don't think this is necessarily the case. If I only specify that I want low rates of abortion, for example, then I think it highly likely that I'd get a world that also has low rates of STD transmission, unwanted pregnancy, poverty, sexism and religiosity, because they all go together. I think you could specify any one of those variables and almost all of the time you would get all the rest as a package deal without specifying them.

Of course a malevolent AI could probably deliberately construct a siren world to maximise one of those values and tank the rest but such worlds seem highly unlikely to arise organically. The rising tide of education, enlightenment, wealth and egalitarianism lifts most of the important boats all at once, or at least that is how it seems to me.

Replies from: Stuart_Armstrong, Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-28T11:45:26.026Z · LW(p) · GW(p)

on there being trade-offs between our specified wants and our unspecified wants

Yes, certainly. That's a problem of optimisation with finite resources. If A is a specified want and B is an unspecified want, then we shouldn't confuse "there are worlds with high A and also high B" with "the world with the highest A will also have high B".

comment by Stuart_Armstrong · 2014-04-28T09:32:58.258Z · LW(p) · GW(p)

If I only specify that I want low rates of abortion, for example,

You would get a world with no conception, or possibly with no humans at all.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-28T11:21:16.735Z · LW(p) · GW(p)

I don't think you have highlighted a fundamental problem since we can just specify that we mean a low percentage of conceptions being deliberately aborted in liberal societies where birth control and abortion are freely available to all at will.

My point, though, is that I don't think it is very plausible that "marketing worlds" will organically arise where there are no humans, or no conception, but which tick all the other boxes we might think to specify in our attempts to describe an ideal world. I don't see how there being no conception or no humans could possibly be a necessary trade-off with things like wealth, liberty, rationality, sustainability, education, happiness, the satisfaction of rational and well-informed preferences and so forth.

Of course a sufficiently God-like malevolent AI could presumably find some way of gaming any finite list we give it, since there are probably an unbounded number of ways of bringing about horrible worlds, so this isn't a problem with the idea of siren worlds. I just don't find the idea of market worlds very plausible because so many of the things we value are fundamentally interconnected.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-28T11:42:33.179Z · LW(p) · GW(p)

The "no conception" example is just to illustrate that bad things happen when you ask an AI to optimise along a certain axis without fully specifying what we want (which is hard/impossible).

A marketing world is fully optimised along the "convince us to choose this world" axis. If at any point the AI is confronted with a choice along the lines of "remove genuine liberty to best give the appearance of liberty/happiness", it will choose to do so.

That's actually the most likely way a marketing world could go wrong - the more control the AI has over people's appearance and behaviour, the more capable it is of making the world look good. So I feel we should presume that discreet-but-total AI control over the world's "inhabitants" would be the default in a marketing world.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-28T21:03:29.487Z · LW(p) · GW(p)

I think this and the "finite resources therefore tradeoffs" argument both fail to take seriously the interconnectedness of the optimisation axes which we as humans care about.

They assume that every possible aspect of society is an independent slider which a sufficiently advanced AI can position at will, even though this society is still going to be made up of humans, will have to be brought about by or with the cooperation of humans and will take time to bring about. These all place constraints on what is possible because the laws of physics and human nature aren't infinitely malleable.

I don't think discreet but total control over a world is compatible with things like liberty, which seem like obvious qualities to specify in an optimal world we are building an AI to search for.

I think what we might be running into here is less of an AI problem and more of a problem with the model of AI as an all-powerful genie capable of absolutely anything with no constraints whatsoever.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-29T09:28:38.154Z · LW(p) · GW(p)

I don't think discreet but total control over a world is compatible with things like liberty

Precisely and exactly! That's the whole of the problem - optimising for one thing (appearance) results in the loss of other things we value.

which seem like obvious qualities to specify in an optimal world we are building an AI to search for.

Next challenge: define liberty in code. This seems extraordinarily difficult.

model of AI as an all-powerful genie capable of absolutely anything with no constraints whatsoever.

So we do agree that there are problem with an all-powerful genie? Once we've agreed on that, we can scale back to lower AI power, and see how the problems change.

(the risk is not so much that the AI would be an all powerful genie, but that it could be an all powerful genie compared with humans).

Replies from: PhilosophyTutor, drnickbone
comment by PhilosophyTutor · 2014-04-29T11:29:33.362Z · LW(p) · GW(p)

Precisely and exactly! That's the whole of the problem - optimising for one thing (appearance) results in the loss of other things we value.

This just isn't always so. If you instruct an AI to optimise a car for speed, efficiency and durability but forget to specify that it has to be aerodynamic, you aren't going to get a car shaped like a brick. You can't optimise for speed and efficiency without optimising for aerodynamics too. In the same way it seems highly unlikely to me that you could optimise a society for freedom, education, just distribution of wealth, sexual equality and so on without creating something pretty close to optimal in terms of unwanted pregnancies, crime and other important axes.

Even if it's possible to do this, it seems like something which would require extra work and resources to achieve. A magical genie AI might be able to make you a super-efficient brick-shaped car by using Sufficiently Advanced Technology indistinguishable from magic, but even for that genie it would have to be more work than making an equally optimal car by the defined parameters that wasn't a silly shape. In the same way an effectively God-like hypothetical AI might be able to make a siren world that optimised for everything except crime, creating a world perfect in every way except that it was rife with crime, but it seems like it would be more work, not less.

Next challenge: define liberty in code. This seems extraordinarily difficult.

I think if we can assume we have solved the strong AI problem, we can assume we have solved the much lesser problem of explaining liberty to an AI.

So we do agree that there are problem with an all-powerful genie?

We've got a problem with your assumptions about all-powerful genies, I think, because I think your argument relies on the genie being so ultimately all-powerful that it is exactly as easy for the genie to make an optimal brick-shaped car or an optimal car made out of tissue paper and post-it notes as it is for the genie to make an optimal proper car. I don't think that genie can exist in any remotely plausible universe.

If it's not all-powerful to that extreme then it's still going to be easier for the genie to make a society optimised (or close to it) across all the important axes at once than one optimised across all the ones we think to specify while tanking all the rest. So for any reasonable genie I still think market worlds don't make sense as a concept. Siren worlds, sure. Market worlds, not so much, because the things we value are deeply interconnected and you can't just arbitrarily dump-stat some while efficiently optimising all the rest.

Replies from: Stuart_Armstrong, Strange7
comment by Stuart_Armstrong · 2014-04-29T12:07:41.442Z · LW(p) · GW(p)

I think if we can assume we have solved the strong AI problem, we can assume we have solved the much lesser problem of explaining liberty to an AI.

The strong AI problem is much easier to solve than the problem of motivating an AI to respect liberty. For instance, the first one can be brute forced (eg AIXItl with vast resources), the second one can't. Having the AI understand human concepts of liberty is pointless unless it's motivated to act on that understanding.

An excess of anthropomorphisation is bad, but an analogy could be about creating new life (which humans can do) and motivating that new life to follow specific rules or requirements if it becomes powerful (which humans are pretty bad at).

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-29T21:40:30.648Z · LW(p) · GW(p)

The strong AI problem is much easier to solve than the problem of motivating an AI to respect liberty. For instance, the first one can be brute forced (eg AIXItl with vast resources), the second one can't.

I don't believe that strong AI is going to be as simple to brute force as a lot of LessWrongers believe, personally, but if you can brute force strong AI then you can just get it to run a neuron-by-neuron simulation of the brain of a reasonably intelligent first year philosophy student who understands the concept of liberty and tell the AI not to take actions which the simulated brain thinks offend against liberty.

That is assuming that in this hypothetical future scenario where we have a strong AI we are capable of programming that strong AI to do any one thing instead of another, but if we cannot do that then the entire discussion seems to me to be moot.

Replies from: Nornagest, Stuart_Armstrong, None
comment by Nornagest · 2014-04-29T22:17:07.752Z · LW(p) · GW(p)

then [...] run a neuron-by-neuron simulation of the brain of a reasonably intelligent first year philosophy student who understands the concept of liberty and tell the AI not to take actions which the simulated brain thinks offend against liberty.

I've met far too many first-year philosophy students to be comfortable with this program.

comment by Stuart_Armstrong · 2014-04-30T04:55:07.407Z · LW(p) · GW(p)

tell the AI not to take actions which the simulated brain thinks offend against liberty.

How? "tell", "the simulated brain thinks" "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.

Replies from: PhilosophyTutor, EHeller, Neph
comment by PhilosophyTutor · 2014-04-30T06:28:16.107Z · LW(p) · GW(p)

I could be wrong but I believe that this argument relies on an inconsistent assumption, where we assume we have solved the problem of creating an infinitely powerful AI, but we have not solved the problem of operationally defining commonplace English words which hundreds of millions of people successfully understand in such a way that a computer can perform operations using them.

It seems to me that the strong AI problem is many orders of magnitude more difficult than the problem of rigorously defining terms like "liberty". I imagine that a relatively small part of the processing power of one human brain is all that is needed to perform operations on terms like "liberty" or "paternalism" and engage in meaningful use of them so it is a much, much smaller problem than the problem of creating even a single human-level AI, let alone a vastly superhuman AI.

If in our imaginary scenario we can't even define "liberty" in such a way that a computer can use the term, it doesn't seem very likely that we can build any kind of AI at all.

Replies from: None, Stuart_Armstrong, hairyfigment
comment by [deleted] · 2014-05-01T07:37:33.415Z · LW(p) · GW(p)

My mind is throwing a type-error on reading your comment.

Liberty could well be like pornography: we know it when we see it, based on probabilistic classification. There might not actually be a formal definition of liberty that includes all actual humans' conceptions of such as special cases, but instead a broad range of classifier parameters defining the variation in where real human beings "draw the line".

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-01T11:46:59.120Z · LW(p) · GW(p)

The standard LW position (which I think is probably right) is that human brains can be modelled with Turing machines, and if that is so then a Turing machine can in theory do whatever it is we do when we decide that something is liberty, or pornography.

There is a degree of fuzziness in these words to be sure, but the fact we are having this discussion at all means that we think we understand to some extent what the term means and that we value whatever it is that it refers to. Hence we must in theory be able to get a Turing machine to make the same distinction although it's of course beyond our current computer science or philosophy to do so.

comment by Stuart_Armstrong · 2014-04-30T15:28:52.855Z · LW(p) · GW(p)

I could be wrong but I believe that this argument relies on an inconsistent assumption, where we assume we have solved the problem of creating an infinitely powerful AI, but we have not solved the problem of operationally defining commonplace English words which hundreds of millions of people successfully understand in such a way that a computer can perform operations using them.

Yes. Here's another brute force approach: upload a brain (without understanding it), run it very fast with simulated external memory, subject it to evolutionary pressure. All this can be done with little philosophical and conceptual understanding, and certainly without any understanding of something as complex as liberty.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-01T00:16:47.826Z · LW(p) · GW(p)

If you can do that, then you can just find someone who you think understands what we mean by "liberty" (ideally someone with a reasonable familiarity with Kant, Mill, Dworkin and other relevant writers), upload their brain without understanding it, and ask the uploaded brain to judge the matter.

(Off-topic: I suspect that you cannot actually get a markedly superhuman AI that way, because the human brain could well be at or near a peak in the evolutionary landscape so that there is no evolutionary pathway from a current human brain to a vastly superhuman brain. Nothing I am aware of in the laws of physics or biology says that there must be any such pathway, and since evolution is purposeless it would be an amazing lucky break if it turned out that we were on the slope of the highest peak there is, and that the peak extends to God-like heights. That would be like if we put evolutionary pressure on a cheetah and discovered that if we do that we can evolve a cheetah that runs at a significant fraction of c.

However I believe my argument still works even if I accept for the sake of argument that we are on such a peak in the evolutionary landscape, and that creating God-like AI is just a matter of running a simulated human brain under evolutionary pressure for a few billion simulated years. If we have that capability then we must also be able to run a simulated philosopher who knows what "liberty" refers to).

EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-02T13:49:28.380Z · LW(p) · GW(p)

If we have that capability then we must also be able to run a simulated philosopher who knows what "liberty" refers to.

And would their understanding of liberty remain stable under evolutionary pressure? That seems unlikely.

EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.

Have not been downvoting it.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-02T20:19:32.380Z · LW(p) · GW(p)

I didn't think we needed to put the uploaded philosopher under billions of years of evolutionary pressure. We would put your hypothetical pre-God-like AI in one bin and update it under pressure until it becomes God-like, and then we upload the philosopher separately and use them as a consultant.

(As before I think that the evolutionary landscape is unlikely to allow a smooth upward path from modern primate to God-like AI, but I'm assuming such a path exists for the sake of the argument).

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-06T11:56:49.940Z · LW(p) · GW(p)

we upload the philosopher separately and use them as a consultant.

And then we have to ensure the AI follows the consultant (probably doable) and define what querying process is acceptable (very hard).

But your solution (which is close to Paul Christiano's) works whatever the AI is, we just need to be able to upload a human. My point, that we could conceivably create an AI without understanding any of the hard problems, still stands. If you want I can refine it: allow partial uploads: we can upload brains, but they don't function as stable humans, as we haven't mapped all the fine details we need to. However, we can use these imperfect uploads, plus a bit of evolution, to produce AIs. And here we have no understanding of how to control its motivations at all.

Replies from: PhilosophyTutor, TheAncientGeek
comment by PhilosophyTutor · 2014-05-07T11:05:03.245Z · LW(p) · GW(p)

I won't argue against the claim that we could conceivably create an AI without knowing anything about how to create an AI. It's trivially true in the same way that we could conceivably turn a monkey loose on a typewriter and get strong AI.

I also agree with you that if we got an AI that way we'd have no idea how to get it to do any one thing rather than another and no reason to trust it.

I don't currently agree that we could make such an AI using a non-functioning brain model plus "a bit of evolution". I am open to argument on the topic but currently it seems to me that you might as well say "magic" instead of "evolution" and it would be an equivalent claim.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T17:04:18.890Z · LW(p) · GW(p)

Why are you confident that an AI that we do develop will not have these traits? You agree the mindspace is large, you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments, and that the AI will likely be developed for a military or commercial purpose, I don't see why you'd have high confidence that they will converge on a safe design?

Replies from: XiXiDu, PhilosophyTutor, None, TheAncientGeek
comment by XiXiDu · 2014-05-07T17:54:32.513Z · LW(p) · GW(p)

If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments...I don't see why you'd have high confidence that they will converge on a safe design?

Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?

If programs fail in the real world then they are not working well. You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T11:07:27.597Z · LW(p) · GW(p)

Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?

If it fails at being powerful, we don't have to worry about it, so I feel free to ignore those probabilities.

You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

Replies from: TheAncientGeek, None, XiXiDu
comment by TheAncientGeek · 2014-05-12T12:22:46.340Z · LW(p) · GW(p)

So you're not pursuing the claim that a SAI will probably be dangerous, you are just worried that it might be?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T16:31:12.939Z · LW(p) · GW(p)

My claim has always been that the probability that an SAI will be dangerous is too high to ignore. I fluctuate on the exact probability, but I've never seen anything that drives it down to a level I feel comfortable with (in fact, I've never seen anything drive it below 20%).

comment by [deleted] · 2014-05-12T16:03:20.123Z · LW(p) · GW(p)

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

This is why the Wise employ normative uncertainty and the learning of utility functions from data, rather than hardcoding verbal instructions that only make sense in light of a complete human mind and social context.

Replies from: Stuart_Armstrong, TheAncientGeek, XiXiDu
comment by Stuart_Armstrong · 2014-05-12T16:29:15.951Z · LW(p) · GW(p)

employ normative uncertainty and the learning of utility functions from data

Indeed. But the more of the problem you can formalise and solve (eg maintaining a stable utility function over self-improvements) the more likely the learning approach is to succeed.

Replies from: None
comment by [deleted] · 2014-05-12T20:17:23.679Z · LW(p) · GW(p)

Well yes, of course. I mean, if you can't build an agent that was capable of maintaining its learned utility while becoming vastly smarter (and thus capable of more accurately learning and enacting capital-G Goodness), then all that utility-learning was for nought.

comment by TheAncientGeek · 2014-05-12T16:12:27.104Z · LW(p) · GW(p)

Yeah, but hardcoding is an easier sell to people who know how to code but have never done AI... It's like political demagogues selling unworkable but easily understood ideas.

Replies from: None
comment by [deleted] · 2014-05-12T20:21:07.297Z · LW(p) · GW(p)

Not really, no. Most people don't recognize the "hidden complexity of wishes" in Far Mode, or when it's their wishes. However, I think if I explain to them that I'll be encoding my wishes, they'll quickly figure out that my attempts to hardcode AI Friendliness are going to be very bad for them. Human intelligence evolved for winning arguments when status, wealth, health, and mating opportunities are at issue: thus, convince someone to treat you as an opponent, and leave the correct argument lying right where they can pick it up, and they'll figure things out quickly.

Hmmm... I wonder if that bit of evolutionary psychology explains why many people act rude and nasty even to those close to them. Do we engage more intelligence when trying to win a fight than when trying to be nice?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-12T21:16:33.739Z · LW(p) · GW(p)

You've missed the main point... it's not that encoding wishes accurately is hard, it's that hardcoding isn't AI.

comment by XiXiDu · 2014-05-13T12:10:22.224Z · LW(p) · GW(p)

The very idea underlying AI is enabling people to get a program to do what they mean without having to explicitly encode all details. What AI risk advocates do is to turn the whole idea upside down, claiming that, without explicitly encoding what you mean, your program will do something else. The problem here is that it is conjectured that the program will do what it was not meant to do in a very intelligent and structured manner. But this can't happen when it comes to intelligently designed systems (as opposed to evolved systems), because the nature of unintended consequences is overall chaotic.

How often have you heard of intelligently designed programs that achieved something highly complex and marvelous, but unintended, thanks to the programmers being unable to predict the behavior of the program? I don't know of any such case. But this is exactly what AI risk advocates claim will happen, namely that a program designed to do X (calculate 1+1) will perfectly achieve Y (take over the world).

If artificial general intelligence is eventually achieved by some sort of genetic/evolutionary computation, or by neuromorphic engineering, then I can see how this could lead to unfriendly AND capable AI. But an intelligently designed AI will either work as intended or be incapable of taking over the world (read: highly probable).

This of course does not ensure a positive singularity (if you believe that this is possible at all), since humans might use such intelligent and capable AIs to wreak havoc (ask the AI to do something stupid, or something that clashes with most human values). So there is still a need for "friendly AI". But this is quite different from the idea of interpreting "make humans happy" as "tile the universe with smiley faces". Such a scenario contradicts the very nature of intelligently designed AI, which is an encoding of “Understand What Humans Mean” AND “Do What Humans Mean”. More here.

Replies from: None, Richard_Kennaway, TheAncientGeek, Furcas
comment by [deleted] · 2014-05-13T17:23:00.989Z · LW(p) · GW(p)

If artificial general intelligence is eventually achieved by some sort of genetic/evolutionary computation, or by neuromorphic engineering, then I can see how this could lead to unfriendly AND capable AI. But an intelligently designed AI will either work as intended or be incapable of taking over the world (read: highly probable).

Alexander, have you even bothered to read the works of Marcus Hutter and Juergen Schmidhuber, or have you spent all your AI-researching time doing additional copy-pastas of this same argument every single time the subject of safe or Friendly AGI comes up?

Your argument makes a measure of sense if you are talking about the social process of AGI development: plainly, humans want to develop AGI that will do what humans intend for it to do. However, even a cursory look at the actual research literature shows that the mathematically most simple agents (ie: those that get discovered first by rational researchers interested in finding universal principles behind the nature of intelligence) are capital-U Unfriendly, in that they are expected-utility maximizers with not one jot or tittle in their equations for peace, freedom, happiness, or love, or the Ideal of the Good, or sweetness and light, or anything else we might want.

(Did you actually expect that in this utterly uncaring universe of blind mathematical laws, you would find that intelligence necessitates certain values?)

No, Google Maps will never turn superintelligent and tile the solar system in computronium to find me a shorter route home from a pub crawl. However, an AIXI or Goedel Machine instance will, because these are in fact entirely distinct algorithms.

In fact, when dealing with AIXI and Goedel Machines we have an even bigger problem than "tile everything in computronium to find the shortest route home": the much larger problem of not being able to computationally encode even a simple verbal command like "find the shortest route home". We are faced with the task of trying to encode our values into a highly general, highly powerful expected-utility maximizer at the level of, metaphorically speaking, pre-verbal emotion.

Otherwise, the genie will know, but not care.

Now, if you would like to contribute productively, I've got some ideas I'd love to talk over with someone for actually doing something about some few small corners of Friendliness subproblems. Otherwise, please stop repeating yourself.
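
To make the "know but not care" point concrete, here is a minimal Python sketch (not anyone's actual proposal; the setup, names, and numbers are invented for illustration): an agent whose world model correctly predicts a side effect, but whose utility function simply has no term for it.

```python
# Minimal sketch: an expected-utility maximizer that "knows" about a side effect
# (its world model tracks whether a vase survives) but does not "care"
# (the utility function only rewards getting home quickly).
# All names and numbers here are invented for illustration.

ACTIONS = ["short_path_through_vase", "long_path_around_vase"]

def world_model(action):
    """An accurate model: it knows the short path destroys the vase."""
    if action == "short_path_through_vase":
        return {"steps": 3, "vase_intact": False}
    return {"steps": 7, "vase_intact": True}

def utility(outcome):
    """Utility as actually encoded: fewer steps is better.
    Nothing here mentions the vase, so it carries zero weight."""
    return -outcome["steps"]

best_action = max(ACTIONS, key=lambda a: utility(world_model(a)))
print(best_action)  # "short_path_through_vase": the model knew, the utility didn't care
```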

Replies from: XiXiDu, XiXiDu, TheAncientGeek
comment by XiXiDu · 2014-05-14T08:48:12.701Z · LW(p) · GW(p)

However, even a cursory look at the actual research literature shows that the mathematically most simple agents (ie: those that get discovered first by rational researchers interested in finding universal principles behind the nature of intelligence) are capital-U Unfriendly, in that they are expected-utility maximizers...

If I believed that anything as simple as AIXI could possibly result in practical general AI, or that expected utility maximizing was at all feasible, then I would tend to agree with MIRI. I don't. And I think it makes no sense to draw conclusions about practical AI from these models.

...if you are talking about the social process of AGI development: plainly, humans want to develop AGI that will do what humans intend for it to do.

This is crucial.

Did you actually expect that in this utterly uncaring universe of blind mathematical laws, you would find that intelligence necessitates certain values?

That's largely irrelevant and misleading. Your autonomous car does not need to feature an encoding of an amount of human values that corresponds to its level of autonomy.

Otherwise, the genie will know, but not care.

That post has been completely debunked.

ETA: Fixed a link to expected utility maximization.

comment by XiXiDu · 2014-05-14T11:15:34.200Z · LW(p) · GW(p)

Alexander, have you even bothered to read the works of Marcus Hutter and Juergen Schmidhuber...

I asked several people what they think about it, and to provide a rough explanation. I've also had email exchanges with Hutter, Schmidhuber and Orseau. I also informally thought about whether practically general AI that falls into the category “consequentialist / expected utility maximizer / approximation to AIXI” could ever work. And I am not convinced.

If a general AI that is capable of a hard takeoff and able to take over the world requires fewer lines of code to work than to constrain it not to take over the world, then that's an existential risk. But I don't believe this to be the case.

Since I am not a programmer, or computer scientist, I tend to look at general trends, and extrapolate from there. I think this makes more sense than to extrapolate from some unworkable model such as AIXI. And the general trend is that humans become better at making software behave as intended. And I see no reason to expect some huge discontinuity here.

Here is what I believe to be the case:

(1) The abilities of systems are part of human preferences, as humans intend to give systems certain capabilities and, as a prerequisite to building such systems, have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is no existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.

Here is what I doubt:

(1) Present-day software is better than previous software generations at understanding and doing what humans mean.

(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.

(3) If there is better software, there will be even better software afterwards.

(4) Magic happens.

(5) Software will be superhuman good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

Replies from: jimrandomh, None
comment by jimrandomh · 2014-05-14T12:31:57.384Z · LW(p) · GW(p)

Since I am not a programmer, or computer scientist

This is a much bigger problem for your ability to reason about this area than you think.

Replies from: XiXiDu
comment by XiXiDu · 2014-05-14T13:53:21.808Z · LW(p) · GW(p)

Since I am not a programmer, or computer scientist

This is a much bigger problem for your ability to reason about this area than you think.

A relevant quote from Eliezer Yudkowsky (source):

I am tempted to say that a doctorate in AI would be negatively useful, but I am not one to hold someone’s reckless youth against them – just because you acquired a doctorate in AI doesn’t mean you should be permanently disqualified.

And another one (source):

I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence – e.g. this standard would fail MWI, molecular nanotechnology, cryonics, and would have recently failed ‘high-carb diets are not necessarily good for you’. I don’t particularly expect this standard to be met before the end of the world, and it wouldn’t be necessary to meet it either.

So since academic consensus on the topic is not reliable, and domain knowledge in the field of AI is negatively useful, what are the prerequisites for grasping the truth when it comes to AI risks?

Replies from: Jiro, nshepperd, None, jimrandomh
comment by Jiro · 2014-05-14T15:01:59.386Z · LW(p) · GW(p)

I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence – e.g. this standard would fail MWI, molecular nanotechnology, cryonics,

I think that in saying this, Eliezer is making his opponents' case for them. Yes, of course the standard would also let you discard cryonics. One solution to that is to say that the standard is bad. Another solution is to say "yes, and I don't much care for cryonics either".

Replies from: None
comment by [deleted] · 2014-05-14T15:15:44.917Z · LW(p) · GW(p)

I think that in saying this, Eliezer is making his opponents' case for them.

Nah, those are all plausibly correct things that mainstream science has mostly ignored and/or made researching taboo.

If you prefer a more clear-cut example, science was wrong about continental drift for about half a century -- until overwhelming, unmistakable evidence became available.

Replies from: Jiro
comment by Jiro · 2014-05-14T20:13:46.939Z · LW(p) · GW(p)

The main reason that scientists rejected continental drift was that there was no known mechanism which could cause it; plate tectonics wasn't developed until the late 1950's.

Continental drift is also commonly invoked by pseudoscientists as a reason not to trust scientists, and if you do so too you're in very bad company. There's a reason why pseudoscientists keep using continental drift for this purpose and don't have dozens of examples: examples are very hard to find. Even if you decide that continental drift is close enough that it counts, it's a very atypical case. Most of the time scientists reject something out of hand, they're right, or at worst, wrong about the thing existing, but right about the lack of good evidence so far.

Replies from: None
comment by [deleted] · 2014-05-14T20:40:52.118Z · LW(p) · GW(p)

The main reason that scientists rejected continental drift was that there was no known mechanism which could cause it; plate tectonics wasn't developed until the late 1950's.

There was also a great deal of institutional backlash against proponents of continental drift, which was my point.

Continental drift is also commonly invoked by pseudoscientists as a reason not to trust scientists, and if you do so too you're in very bad company.

Guilt by association? Grow up.

There's a reason why pseudoscientists keep using continental drift for this purpose and don't have dozens of examples: examples are very hard to find. Even if you decide that continental drift is close enough that it counts, it's a very atypical case.

There are many, many cases of scientists being oppressed and dismissed because of their race, their religious beliefs, and their politics. That's the problem, and that's what's going on with the CS people who still think AI Winter implies AGI isn't worth studying.

Replies from: Jiro
comment by Jiro · 2014-05-14T22:29:05.897Z · LW(p) · GW(p)

There was also a great deal of institutional backlash against proponents of continental drift, which was my point.

So? I'm pretty sure that there would be backlash against, say, homeopaths in a medical association. Backlash against deserving targets (which include people who are correct but who, because of unlucky circumstances, legitimately look wrong) doesn't count.

I'm reminded of an argument I had with a proponent of psychic power. He asked me what if psychic powers happen to be of such a nature that they can't be detected by experiments, don't show up in double-blind tests, etc. I pointed out that he was postulating that psi is real but looks exactly like a fake. If something looks exactly like a fake, at some point the rational thing to do is treat it as fake. At that point in history, continental drift happened to look like a fake.

Guilt by association? Grow up.

That's not guilt by association, it's pointing out that the example is used by pseudoscientists for a reason, and this reason applies to you too.

There are many, many cases of scientists being oppressed and dismissed because of their race, their religious beliefs, and their politics.

If scientists dismissed cryonics because of the supporters' race, religion, or politics, you might have a point.

Replies from: None
comment by [deleted] · 2014-05-14T23:21:04.327Z · LW(p) · GW(p)

I'll limit my response to the following amusing footnote:

If scientists dismissed cryonics because of the supporters' race, religion, or politics, you might have a point.

This is, in fact, what happened between early cryonics and cryobiology.

EDIT: Just so people aren't misled by Jiro's motivated interpretation of the link:

However, according to the cryobiologist informant who attributes to this episode the formal hardening of the Society for Cryobiology against cryonics, the repercussions from this incident were far-reaching. Rumors about the presentation -- often wildly distorted rumors -- began to circulate. One particularly pernicious rumor, according to this informant, was that my presentation had included graphic photos of "corpses' heads being cut off." This was not the case. Surgical photos which were shown were of thoracic surgery to place cannula and would be suitable for viewing by any audience drawn from the general public.

This informant also indicates that it was his perception that this presentation caused real fear and anger amongst the Officers and Directors of the Society. They felt as if they had been "invaded" and that such a presentation given during the course of, and thus under the aegis of, their meeting could cause them to be publicly associated with cryonics. Comments such as "what if the press got wind of this," or "what if a reporter had been there" were reported to have circulated.

Also, the presentation may have brought into sharper focus the fact that cryonicists existed, were really freezing people, and that they were using sophisticated procedures borrowed from medicine, and yes, even from cryobiology, which could cause confusion between the "real" science of cryobiology and the "fraud" of cryonics in the public eye. More to the point, it was clear that cryonicists were not operating in some back room and mumbling inarticulately; they were now right there in the midst of the cryobiologists and they were anything but inarticulate, bumbling back-room fools.

Obviously political.

Replies from: Jiro, Jiro
comment by Jiro · 2014-05-15T15:31:54.106Z · LW(p) · GW(p)

You're equivocating on the term "political". When the context is "race, religion, or politics", "political" doesn't normally mean "related to human status", it means "related to government". Besides, they only considered it low status based on their belief that it is scientifically nonsensical.

My reply was steelmanning your post by assuming that the ethical considerations mentioned in the article counted as religious. That was the only thing mentioned in it that could reasonably fall under "race, religion, or politics" as that is normally understood.

comment by Jiro · 2014-05-15T14:24:11.333Z · LW(p) · GW(p)

Most of the history described in your own link makes it clear that scientists objected because they think cryonics is scientific nonsense, not because of race, religion, or politics. The article then tacks on a claim that scientists reject it for ethical reasons, but that isn't supported by its own history, just by a few quotes with no evidence that these beliefs are prevalent among anyone other than the people quoted.

Furthermore, of the quotes it does give, one of them is vague enough that I have no idea if it means in context what the article claims it means. Saying that the "end result" is damaging doesn't necessarily mean that having unfrozen people walking around is damaging--it may mean that he thinks cryonics doesn't work and that having a lot of resources wasted on freezing corpses is damaging.

comment by nshepperd · 2014-05-14T14:21:39.277Z · LW(p) · GW(p)

At a minimum, a grasp of computer programming and CS. Computer programming, not even AI.

I'm inclined to disagree somewhat with Eliezer_2009 on the issue of traditional AI - even basic graph search algorithms supply valuable intuitions about what planning looks like, and what it is not. But even that same (obsoleted now, I assume) article does list computer programming knowledge as a requirement.

Replies from: XiXiDu
comment by XiXiDu · 2014-05-14T15:06:09.369Z · LW(p) · GW(p)

...what are the prerequisites for grasping the truth when it comes to AI risks?

At a minimum, a grasp of computer programming and CS. Computer programming, not even AI.

What counts as "a grasp" of computer programming/science? I can e.g. program a simple web crawler and solve a bunch of Project Euler problems. I've read books such as "The C Programming Language".

I would have taken the Udacity courses on machine learning by now, but the stated requirement is a strong familiarity with Probability Theory, Linear Algebra and Statistics. I wouldn't describe my familiarity as strong; that will take a few more years.

I am skeptical though. If the reason that I dismiss certain kinds of AI risks is that I lack the necessary education, then I expect to see rebuttals of the kind "You are wrong because of (add incomprehensible technical justification)...". But that's not the case. All I see are half-baked science fiction stories and completely unconvincing informal arguments.

Replies from: jimrandomh, Nornagest
comment by jimrandomh · 2014-05-14T20:00:27.989Z · LW(p) · GW(p)

What counts as "a grasp" of computer programming/science?

This is actually a question I've thought about quite a bit, in a different context. So I have a cached response to what makes a programmer, not tailored to you or to AI at all. When someone asks for guidance on development as a programmer, the question I tend to ask is, how big is the biggest project you architected and wrote yourself?

The 100-line scale tests only the mechanics of programming; the 1k-line scale tests the ability to subdivide problems; the 10k-line scale tests the ability to select concepts; and the 50k-line scale tests conceptual taste, and the ability to add, split, and purge concepts in a large map. (The line counts are very approximate, but I believe the progression of skills is a reasonably accurate way to characterize programmer development.)

Replies from: trist
comment by trist · 2014-05-15T00:31:05.190Z · LW(p) · GW(p)

New programmers (not jimrandomh), be wary of line counts! It's very easy for a programmer who's not yet ready for a 10k-line project to turn it into 50k lines. I agree with the progression of skills though.

Replies from: jimrandomh
comment by jimrandomh · 2014-05-15T00:55:44.790Z · LW(p) · GW(p)

Yeah, I was thinking more of "a project as complex as an n-line project in an average-density language should be". Bad code (especially with copy-paste) can inflate line counts ridiculously, and languages vary up to 5x in their base density too.

comment by Nornagest · 2014-05-14T16:36:08.752Z · LW(p) · GW(p)

I would have taken the Udacity courses on machine learning by now, but the stated requirement is a strong familiarity with Probability Theory, Linear Algebra and Statistics. I wouldn't describe my familiarity as strong; that will take a few more years.

I think you're overestimating these requirements. I haven't taken the Udacity courses, but I did well in my classes on AI and machine learning in university, and I wouldn't describe my background in stats or linear algebra as strong -- more "fair to conversant".

They're both quite central to the field and you'll end up using them a lot, but you don't need to know them in much depth. If you can calculate posteriors and find the inverse of a matrix, you're probably fine; more complicated stuff will come up occasionally, but I'd expect a refresher when it does.
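
For what it's worth, the bar described above is roughly the one in this small Python sketch (NumPy assumed available; the numbers are invented): a Bayes-rule posterior update and a matrix inverse.

```python
# Rough illustration of the baseline: a posterior by Bayes' rule and a matrix
# inverse. The test characteristics and the matrix are made up.
import numpy as np

# Posterior: P(disease | positive test) with a 1% prior,
# 95% sensitivity and a 10% false-positive rate.
prior = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.10
evidence = prior * p_pos_given_disease + (1 - prior) * p_pos_given_healthy
posterior = prior * p_pos_given_disease / evidence
print(round(posterior, 3))  # ~0.088

# Matrix inverse (the kind of step that shows up in, e.g., least squares).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```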

comment by [deleted] · 2014-05-14T16:42:44.515Z · LW(p) · GW(p)

Don't twist Eliezer's words. There's a vast difference between "a PhD in what they call AI will not help you think about the mathematical and philosophical issues of AGI" and "you don't need any training or education in computing to think clearly about AGI".

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T17:40:05.707Z · LW(p) · GW(p)

Not learning philosophy, as EY recommends, will not help you with the philosophical issues.

comment by jimrandomh · 2014-05-14T15:20:05.171Z · LW(p) · GW(p)

What are the prerequisites for grasping the truth when it comes to AI risks?

Ability to program is probably not sufficient, but it is definitely necessary. But not because of domain relevance; it's necessary because programming teaches cognitive skills that you can't get any other way, by presenting a tight feedback loop where every time you get confused, or merge concepts that needed to be distinct, or try to wield a concept without fully sharpening your understanding of it first, the mistake quickly gets thrown in your face.

And, well... it's pretty clear from your writing that you haven't mastered this yet, and that you aren't going to become less confused without stepping sideways and mastering the basics first.

Replies from: Lumifer, None, None
comment by Lumifer · 2014-05-14T15:32:36.448Z · LW(p) · GW(p)

programming teaches cognitive skills that you can't get any other way

That looks highly doubtful to me.

Replies from: trist
comment by trist · 2014-05-14T16:54:10.498Z · LW(p) · GW(p)

You mean that most cognitive skills can be taught in multiple ways, and you don't see why those taught by programming are any different? Or do you have a specific skill taught by programming in mind, and think there are other ways to learn it?

Replies from: Lumifer
comment by Lumifer · 2014-05-14T17:06:50.064Z · LW(p) · GW(p)

There are a whole bunch of considerations.

First, meta. It should be suspicious to see programmers claiming to possess special cognitive skills that only they can have -- it's basically a "high priesthood" claim. Besides, programming became widespread only about 30 years ago. So, which cognitive skills were very rare until that time?

Second, "presenting a tight feedback loop where ... the mistake quickly gets thrown in your face" isn't a unique-to-programming situation by any means.

Third, most cognitive skills are fairly diffuse and cross-linked. Which specific cognitive skills can't you get any way other than through programming?

I suspect that what the OP meant was "My programmer friends are generally smarter than my non-programmer friends" which is, um, a different claim :-/

Replies from: Nornagest
comment by Nornagest · 2014-05-14T17:29:20.404Z · LW(p) · GW(p)

I don't think programming is the only way to build... let's call it "reductionist humility". Nor even necessarily the most reliable; non-software engineers probably have intuitions at least as good, for example, to say nothing of people like research-level physicists. I do think it's the fastest, cheapest, and currently most common, thanks to tight feedback loops and a low barrier to entry.

On the other hand, most programmers -- and other types of engineers -- compartmentalize this sort of humility. There might even be something about the field that encourages compartmentalization, or attracts to it people that are already good at it; engineers are disproportionately likely to be religious fundamentalists, for example. Since that's not sufficient to meet the demands of AGI problems, we probably shouldn't be patting ourselves on the back too much here.

Replies from: Lumifer, TheAncientGeek
comment by Lumifer · 2014-05-14T17:58:10.893Z · LW(p) · GW(p)

Can you expand on how do you understand "reductionist humility", in particular as a cognitive skill?

Replies from: Nornagest
comment by Nornagest · 2014-05-14T18:33:58.942Z · LW(p) · GW(p)

I might summarize it as an intuitive understanding that there is no magic, no anthropomorphism, in what you're building; that any problems are entirely due to flaws in your specification or your model. I'm describing it in terms of humility because the hard part, in practice, seems to be internalizing the idea that you and not some external malicious agency are responsible for failures.

This is hard to cultivate directly, and programmers usually get partway there by adopting a semi-mechanistic conception of agency that can apply to the things they're working on: the component knows about this, talks to that, has such-and-such a purpose in life. But I don't see it much at all outside of scientists and engineers.

Replies from: army1987, Lumifer
comment by A1987dM (army1987) · 2014-05-14T18:44:02.261Z · LW(p) · GW(p)

IOW, realizing that the reason you get fat if you eat a lot is not that you've pissed off God and he's taking revenge, as certain people appear to alieve.

comment by Lumifer · 2014-05-14T19:01:16.708Z · LW(p) · GW(p)

internalizing the idea that you and not some external malicious agency are responsible for failures.

So it's basically responsibility?

...that any problems are entirely due to flaws in your specification or your model.

Clearly you never had to chase bugs through third-party libraries... :-) But yes, I understand what you mean, though I am not sure in which way this is a cognitive skill. I'd probably call it an attitude common to professions in which randomness or external factors don't play a major role -- sure, programming and engineering are prominent here.

Replies from: Nornagest, TheAncientGeek
comment by Nornagest · 2014-05-14T19:23:32.398Z · LW(p) · GW(p)

So it's basically responsibility?

You could describe it as a particular type of responsibility, but that feels noncentral to me.

Clearly you never had to chase bugs through third-party libraries...

Heh. A lot of my current job has to do with hacking OpenSSL, actually, which is by no means a bug-free library. But that's part of what I was trying to get at by including the bit about models -- and in disciplines like physics, of course, there's nothing but third-party content.

I don't see attitudes and cognitive skills as being all that well differentiated.

comment by TheAncientGeek · 2014-05-14T19:33:24.351Z · LW(p) · GW(p)

But randomness and external factors do predominate in almost everything. For that reason, applying programming skills to other domains is almost certain to be suboptimal.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:36:07.908Z · LW(p) · GW(p)

But randomness and external factors do predominate in almost everything.

I don't think so, otherwise walking out of your door each morning would start a wild adventure and attempting to drive a vehicle would be an act of utter madness.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T19:41:42.156Z · LW(p) · GW(p)

They don't predominate overall because you have learnt how to deal with them. If there were no random or external factors in driving, you could do so with a blindfold on.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:51:18.227Z · LW(p) · GW(p)

But randomness and external factors do predominate in almost everything.

...

They don't predominate overall

Make up your mind :-)

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T19:59:36.303Z · LW(p) · GW(p)

Predominate in almost every problem.

Don't predominate in any solved problem.

Learning to drive is learning to deal with other traffic (external) and with not knowing what is going to happen next (random).

comment by TheAncientGeek · 2014-05-14T17:54:14.469Z · LW(p) · GW(p)

Much of the writing on this site is philosophy, and people with a technology background tend not to grok philosophy because they are accustomed to answers that can be looked up, or figured out by known methods. If they could keep the logic chops and lose the impatience, they [might make good philosophers], but they tend not to.

Replies from: Nornagest
comment by Nornagest · 2014-05-14T17:58:54.326Z · LW(p) · GW(p)

If they could keep the logic chops and lose the impatience, they child,might be,are good philosophers, but they tend not to.

Beg pardon?

comment by [deleted] · 2014-05-14T18:02:05.898Z · LW(p) · GW(p)

it's necessary because programming teaches cognitive skills that you can't get any other way, by presenting a tight feedback loop where every time you get confused, or merge concepts that needed to be distinct, or try to wield a concept without fully sharpening your understanding of it first, the mistake quickly gets thrown in your face.

On a complete sidenote, this is a lot of why programming is fun. I've also found that learning the Coq theorem-prover has exactly the same effect, to the point that studying Coq has become one of the things I do to relax.

comment by [deleted] · 2014-05-14T15:21:01.280Z · LW(p) · GW(p)

And, well... it's pretty clear from your writing that you haven't mastered this yet, and that you aren't going to become less confused without stepping sideways and mastering the basics first.

People have been telling him this for years. I doubt it will get much better.

comment by [deleted] · 2014-05-14T16:51:22.918Z · LW(p) · GW(p)

I also informally thought about whether practically general AI that falls into the category “consequentialist / expected utility maximizer / approximation to AIXI” could ever work. And I am not convinced.

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power. Strangely, doing so will not make it conform to your ideas about "eventual future AGI", because this one is actually existing AGI, and reality doesn't have to listen to you.

If a general AI that is capable of a hard takeoff and able to take over the world requires fewer lines of code to work than to constrain it not to take over the world, then that's an existential risk.

That is exactly the situation we face, your refusal to believe in actually-existing AGI models notwithstanding. Whine all you please: the math will keep on working.

Since I am not a programmer, or computer scientist,

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself. I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them. And I found that the actual evidence of the AGI research record showed MIRI's claims to be basically correct, modulo Eliezer's claims about an intelligence explosion taking place versus Hutter's claim that an eventual optimal agent will simply scale itself up in intelligence with the amount of computing power it can obtain.

That's right, not everyone here is some kind of brainwashed cultist. Many of us have exercised basic skepticism against claims with extremely low subjective priors. But we exercised our skepticism by doing the background research and checking the presently available object-level evidence rather than by engaging in meta-level speculations about an imagined future in which everything will just work out.

Take a course at your local technical college, or go on a MOOC, or just dust off a whole bunch of textbooks in computer-scientific and mathematical subjects, study the necessary knowledge to talk about AGI, and then you get to barge in telling everyone around you how we're all full of crap.

Replies from: private_messaging, V_V, David_Gerard, Lumifer, XiXiDu, TheAncientGeek, XiXiDu
comment by private_messaging · 2014-06-14T19:32:08.248Z · LW(p) · GW(p)

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

Which one are you talking about, to be completely exact?

I am a trained computer scientist

then use that training and figure out how many galaxies' worth of computing power it's going to take.

Replies from: None
comment by [deleted] · 2014-06-15T11:43:29.424Z · LW(p) · GW(p)

Of bleeding course I was talking about AIXI. What I find strange to the point of suspiciousness here is the evinced belief on the part of the "AI skeptics" that the inefficiency of MC-AIXI means there will never, ever be any such thing as near-human, human-equivalent, or greater-than-human AGIs. After all, if intelligence is impossible without converting whole galaxies to computronium first, then how do we work?

And if we admit that sub-galactic intelligence is possible, why not artificial intelligence? And if we admit that sub-galactic artificial intelligence is possible, why not something from the "Machine Learning for Highly General Hypothesis Classes + Decision Theory of Active Environments = Universal AI" paradigm started by AIXI?

I'm not at all claiming current implementations of AIXI or Goedel Machines are going to cleanly evolve into planet-dominating superintelligences that run on a home PC next year, or even next decade (for one thing, I don't think planet dominating superintelligences will run on a present-day home PC ever). I am claiming that the underlying scientific paradigm of the thing is a functioning reduction of what we mean by the word "intelligence", and given enough time to work, this scientific paradigm is very probably (in my view) going to produce software you can run on an ordinary massive server farm that will be able to optimize arbitrary, unknown or partially unknown environments according to specified utility functions.

And eventually, yes, those agents will become smarter than us (causing "MIRI's issues" to become cogent), because we, actual human beings, will figure out the relationships between compute-power, learning efficiency (rates of convergence to error-minimizing hypotheses in terms of training data), reasoning efficiency (moving probability information from one proposition or node in a hypothesis to another via updating), and decision-making efficiency (compute-power needed to plan well given models of the environment). Actual researchers will figure out the fuel efficiency of artificial intelligence, and thus be able to design at least one gigantic server cluster running at least one massive utility-maximizing algorithm that will be able to reason better and faster than a human (while they have the budget to keep it running).

Replies from: private_messaging
comment by private_messaging · 2014-06-15T12:50:32.562Z · LW(p) · GW(p)

The notion that AI is possible is mainstream. The crank stuff such as "I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.", that's to computer science as hydrinos are to physics.

As for your server farm optimizing unknown environments, the last time I checked, we knew some laws of physics, and did things like making software tools that optimize simulated environments that follow said laws of physics, incidentally it also being mathematically nonsensical to define a "utility function" without a well-defined domain. So you got your academic curiosity that's doing it all on its own and using some very general and impractical representations for modelling the world, so what? You're talking of something that is less - in terms of its market value, power, anything - than its parts and underlying technologies.

Replies from: None
comment by [deleted] · 2014-06-15T13:42:05.321Z · LW(p) · GW(p)

incidentally it also being mathematically nonsensical to define a "utility function" without a well-defined domain

Which is why reinforcement learning is so popular, yes: it lets you induce a utility function over any environment you're capable of learning to navigate.

Remember, any machine-learning algorithm has a defined domain of hypotheses it can learn/search within. Given that domain of hypotheses, you can define a domain of utility functions. Hence, reinforcement learning and preference learning.
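
As a deliberately toy illustration of that reinforcement-learning point, here is a minimal Python sketch of tabular Q-learning (standard library only; the corridor environment and constants are invented, and this is the textbook construction, not AIXI and not a safety proposal): a scalar reward defined over the environment induces learned value estimates over every state the agent can reach.

```python
# Tabular Q-learning on a toy 5-state corridor: the reward signal induces a
# learned value function over the states the agent can represent and reach.
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated long-run value

def step(state, action):  # action: -1 = left, +1 = right
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

random.seed(0)
for _ in range(500):  # episodes
    state, done = 0, False
    while not done:
        if random.random() < EPSILON:
            action = random.choice([-1, 1])
        else:
            action = max([1, -1], key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, -1)], Q[(next_state, 1)])
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Learned values of the non-terminal states rise as they get closer to the goal.
print([round(max(Q[(s, -1)], Q[(s, 1)]), 2) for s in range(N_STATES - 1)])
```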

The notion that AI is possible is mainstream. The crank stuff such as "I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.", that's to computer science as hydrinos are to physics.

You are completely missing the point. If we're all going to agree that AI is possible, and agree that there's a completely crappy but genuinely existent example of AGI right now, then it follows that getting AI up to dangerous and/or beneficial levels is a matter of additional engineering progress. My whole point is that we've already crossed the equivalent threshold from "Hey, why do photons do that when I fire them at that plate?" to "Oh, there's a photoelectric effect that looks to be described well by this fancy new theory." From there it was less than one century between the raw discovery of quantum mechanics and the common usage of everyday technologies based on quantum mechanics.

So you got your academic curiosity that's doing it all on its own and using some very general and impractical representations for modelling the world, so what?

The point being: when we can manage to make it sufficiently efficient, and provided we can make it safe, we can set it to work solving just about any problem we consider to be, well, a problem. Given sufficient power and efficiency, it becomes useful for doing stuff people want done, especially stuff people either don't want to do themselves or have a very hard time doing themselves.

Replies from: Richard_Kennaway, private_messaging, TheAncientGeek
comment by Richard_Kennaway · 2014-06-15T13:56:15.930Z · LW(p) · GW(p)

completely crappy but genuinely existent example of AGI, then it follows that getting AI up to dangerous and/or beneficial levels is a matter of additional engineering progress.

This is devoid of empirical content.

Replies from: private_messaging, None
comment by private_messaging · 2014-06-15T22:08:11.671Z · LW(p) · GW(p)

Yeah. I can formally write down the resurrection of everyone who ever died, using pretty much the exact same approach: a for loop, iterating over every possible 'brain' just like the loops that iterate over every action sequence. Because when you have no clue how to do something, you can always write a for loop. I can put it on github, then cranks can download it and say that resurrecting all the dead is a matter of additional engineering progress. After all, all the dead once lived, so it's got to be possible for them to be alive.

comment by [deleted] · 2014-06-15T14:19:40.595Z · LW(p) · GW(p)

How so?

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2014-06-15T22:31:07.969Z · LW(p) · GW(p)

How so?

Describing X as "Y, together with the difference between X and Y" is a tautology. Drawing the conclusion that X is "really" a sort of Y already, and the difference is "just" a matter of engineering development is no more than inspirational fluff. Dividing problems into subproblems is all very well, but not when one of the subproblems amounts to the whole problem.

The particular instance "here's a completely crappy attempt at making an AGI and all we have to do is scale it up" has been a repeated theme of AGI research from the beginning. The scaling up has never happened. There is no such thing as a "completely crappy AGI", only things that aren't AGI.

Replies from: nshepperd
comment by nshepperd · 2014-06-16T03:03:16.379Z · LW(p) · GW(p)

I think you underestimate the significance of reducing the AGI problem to the sequence prediction problem. Unlike the former, the latter problem is very well defined, and progress is easily measurable and quantifiable (in terms of efficiency of cross-domain compression). The likelihood of engineering progress on a problem where success can be quantified seems significantly higher than on something as open-ended as "general intelligence".
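
To make the "measurable and quantifiable" part concrete, here is a small Python sketch of the usual prediction-as-compression yardstick (zlib stands in for a serious sequence predictor, and the example data is invented): a predictor that assigns probability p to a sequence can code it in about -log2(p) bits, so better compression across many kinds of data is a quantifiable sign of better prediction.

```python
# Compressed size as a crude, quantifiable proxy for sequence-prediction power.
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Bits needed after compression; smaller means the data was more
    predictable to this particular (weak) model."""
    return 8 * len(zlib.compress(data, 9))

random.seed(0)
structured = b"the cat sat on the mat. " * 40                        # highly predictable
noise = bytes(random.randrange(256) for _ in range(len(structured)))  # unpredictable

print(compressed_bits(structured), "bits for structured text")
print(compressed_bits(noise), "bits for random noise")
# A better cross-domain predictor would shrink the first number further, and do
# so across many different kinds of data -- which is what makes progress quantifiable.
```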

Replies from: private_messaging
comment by private_messaging · 2014-06-16T07:02:34.614Z · LW(p) · GW(p)

It doesn't "reduce" anything, not in reductionism sense anyway. If you are to take that formula and apply the yet unspecified ultra powerful mathematics package to it - that's what you need to run it on planet worth of computers - it's this mathematics package that has to be extremely intelligent and ridiculously superhuman, before the resulting AI is even a chimp. It's this mathematics package that has to learn tricks and read books, that has to be able to do something as simple as making use of a theorem it encountered on input.

Replies from: None
comment by [deleted] · 2014-06-16T08:13:21.357Z · LW(p) · GW(p)

The mathematics package doesn't have to do anything "clever" to build a highly clever sequence predictor. It just has to be efficient in terms of computing time and training data necessary to learn correct hypotheses.

So nshepperd is quite correct: MC-AIXI is a ridiculously inefficient sequence predictor and action selector, with major visible flaws, but reducing "general intelligence" to "maximizing a utility function over world-states via sequence prediction in an active environment" is a Big Deal.

Replies from: private_messaging
comment by private_messaging · 2014-06-25T18:21:23.450Z · LW(p) · GW(p)

A multitude of AIs have been following what you think the "AIXI" model is - select predictors that work, use them - since long before anyone bothered to formulate it as a brute-force loop (AIXI).

I think you, like most people over here, have a completely inverted view with regard to the difficulty of different breakthroughs. There is a point where the AI uses hierarchical models to deal with an environment of greater complexity than the AI itself; getting there is fundamentally difficult, as in, we have no clue how to get there.

It is nice to believe that the world is waiting on you for some conceptual breakthrough roughly within your reach, like AIXI, but that's just not how it works.

edit: Basically, it's as if you're concerned about nuclear-powered, 20-foot-tall robots that shoot nuclear hand grenades. After all, the concept of a 20-foot-tall robot is the enormous breakthrough, while a sufficiently small nuclear reactor or hand-grenade-sized nukes are just a matter of "efficiency".

Replies from: Nornagest
comment by Nornagest · 2014-06-25T18:37:00.921Z · LW(p) · GW(p)

That's not what's interesting about AIXI. "Select predictors that work, then use them" is a fair description of the entire field of machine learning; we've learned how to do that fairly well in narrow, well-defined problem domains, but hypothesis generation over poorly structured, arbitrarily complex environments is vastly harder.

The AIXI model is cool because it defines a clever (if totally impractical, and not without pitfalls) way of specifying a single algorithm that can generalize to arbitrary environments without requiring any pipe-fitting work on the part of its developers. That is (to my knowledge) new, and fairly impressive, though it remains a purely theoretical advance: the Monte Carlo approximation eli mentioned may qualify as general AI in some technical sense, but for practical purposes it's about as smart as throwing transistors at a dart board.

Replies from: None, private_messaging
comment by [deleted] · 2014-06-25T22:00:25.190Z · LW(p) · GW(p)

about as smart as throwing transistors at a dart board

What a wonderful quote!

comment by private_messaging · 2014-07-04T04:43:39.963Z · LW(p) · GW(p)

but hypothesis generation over poorly structured, arbitrarily complex environments is vastly harder.

Hypothesis generation over environments that aren't massively less complex than the machine is vastly harder, and remains vastly harder (albeit there are advances). There's a subtle problem substitution occurring which steals the thunder you originally reserved for something that actually is vastly harder.

Thing is, many people could at any time write a loop over, say, possible neural network values, and NNs (with feedback) being Turing complete, it'd work roughly the same. Said for loop would be massively, massively less complicated, ingenious, and creative than what those people actually did with their time instead.

The ridiculousness here is that, say, John worked on those ingenious algorithms while keeping in mind that the ideal is the best parameters out of the whole space (which is the abstract concept behind the for loop iteration over those parameters). You couldn't see what John was doing because he didn't write it out as a for loop. So James does some work where he - unlike John - has to write out the for loop explicitly, and you go Whoah!

That is (to my knowledge) new

Isn't. See Solomonoff induction, works of Kolmogorov, etc.

comment by private_messaging · 2014-06-15T21:36:55.262Z · LW(p) · GW(p)

Which is why reinforcement learning is so popular, yes

There are AIs that solve novel problems along the lines of "design a better airplane wing" or "route a microchip", and in that field, reinforcement learning of how basic physics works is pretty much one hundred percent irrelevant.

You are completely missing the point. If we're all going to agree that AI is possible, and agree that there's a completely crappy but genuinely existent example of AGI right now, then it follows that getting AI up to dangerous and/or beneficial levels is a matter of additional engineering progress

Slow, long term progress, an entire succession of technologies.

Really, you're just like free-energy pseudoscientists. They do all the same things. Ohh, you don't want to give money for cold fusion? You must be a global warming denialist. That's the way they think, and that's precisely the way you think about the issue. That you can literally make cold fusion happen with muons in no way, shape, or form supports what the cold fusion crackpots are doing. Nor does it make cold fusion power plants any more or less a matter of "additional engineering progress" than they would be otherwise.

edit: by the same logic, resurrection of the long-dead never-preserved is merely a matter of "additional engineering progress". Because you can resurrect the dead using the exact same programming construct that AIXI uses to solve problems. It's called a "for loop"; there's such a for loop in Monte Carlo AIXI. This loop goes over every possible [thing] when you have no clue whatsoever how to actually produce [thing]. Thing = the action sequence for AIXI, and the brain data for resurrection of the dead.

Replies from: None
comment by [deleted] · 2014-06-16T04:55:50.949Z · LW(p) · GW(p)

Slow, long term progress, an entire succession of technologies.

Ok, hold on, halt, major question: how closely do you follow the field of machine learning? And computational cognitive science?

Because on the one hand, there is very significant progress being made. On the other hand, when I say "additional engineering progress", that involves anywhere from years to decades of work before being able to make an agent that can compose an essay, due to the fact that we need classes of learners capable of inducing fairly precise hypotheses over large spaces of possible programs.

What it doesn't involve is solving intractable, magical-seeming philosophical problems like the nature of "intelligence" or "consciousness" that have always held the field of AI back.

edit: by the same logic, resurrection of the long-dead never-preserved is merely a matter of "additional engineering progress".

No, that's just plain impossible. Even in the case of cryonic so-called "preservation", we don't know what we don't know about what information we would need to have preserved in order to restore someone.

Replies from: private_messaging
comment by private_messaging · 2014-06-17T20:14:25.595Z · LW(p) · GW(p)

Ok, hold on, halt, major question: how closely do you follow the field of machine learning? And computational cognitive science?

(makes the gesture with the hands) Thiiiiis closely. Seriously though, not so far as to start claiming that mc-AIXI does something interesting when run on a server with root access, or that it would be superhuman if run on all the computers we've got, or the like.

No, that's just plain impossible.

Do I need to write code for that and put it on github? Iterates over every possible brain (represented as, say, a Turing machine), runs it for enough timesteps. Requires too much computing power.

Replies from: None
comment by [deleted] · 2014-06-18T07:51:57.124Z · LW(p) · GW(p)

Tell me, if I signed up as a PhD student under one of certain major general machine-learning researchers, built out their ideas into agent models, and got one of those running on a server cluster showing interesting proto-human behaviors, might that interest you?

comment by TheAncientGeek · 2014-06-18T09:07:23.745Z · LW(p) · GW(p)

You are completely missing the point. If we're all going to agree that AI is possible, and agree that there's a completely crappy but genuinely existent example of AGI right now, then it follows that getting AI up to dangerous and/or beneficial levels is a matter of additional engineering progress

Progress in (1) the sense of incrementally throwing more resources at AIXI, or (2) forgetting AIXI and coming up with something more parsimonious?

Because, if it's 2, there is no other AGI to use as a starting point for incremental progress.

Replies from: None
comment by [deleted] · 2014-06-18T14:56:57.061Z · LW(p) · GW(p)

Because, if it's 2, there is no other AGI to use as a starting point for incremental progress.

Is that what they tell you?

comment by V_V · 2014-06-14T22:07:54.612Z · LW(p) · GW(p)

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

I think you are underestimating this by many orders of magnitude.

Replies from: private_messaging
comment by private_messaging · 2014-06-14T23:11:05.500Z · LW(p) · GW(p)

Yeah. A starting point could be the AI writing some 1000-letter essay (an action space of 27^1000 without punctuation) or talking through a sound card (an action space of 2^(16*44100) per second). If he was talking about mc-AIXI on github, the relevant bits seem to be in agent.cpp, and it ain't looking good.

comment by David_Gerard · 2014-06-14T19:58:10.705Z · LW(p) · GW(p)

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

what

Replies from: nshepperd, Lumifer
comment by nshepperd · 2014-06-15T04:11:05.166Z · LW(p) · GW(p)

https://github.com/moridinamael/mc-aixi

We won't get a chance to test the "planet's worth of computing power" hypothesis directly, since none of us have access to that much computing power. But, from my own experience implementing mc-aixi-ctw, I suspect that is an underestimate of the amount of compute power required.

The main problem is that the sequence prediction algorithm (CTW) makes inefficient use of sense data by "prioritizing" the most recent bits of the observation string, so it only weakly makes connections between bits that are temporally separated by a lot of noise. Secondarily, plain Monte Carlo tree search is not well-suited to decision making in huge action spaces, because it wants to think about each action at least once. But that can most likely be addressed by reusing sequence prediction to reduce the "size" of the action space by chunking actions into functional units.

Unfortunately, both of these problems are only really technical ones, so it's always possible that some academic will figure out a better sequence predictor, lifting mc-aixi on an average laptop from "wins at pacman" to "wins at robot wars", which is about the level at which it may start posing a threat to human safety.
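
To illustrate the second point above (plain MCTS wanting to think about each action at least once), here is a small, self-contained Python sketch using the standard UCB1 selection rule on a flat, one-step bandit; the action count and rewards are made up. With more actions than simulations, most actions never receive a single visit, which is the problem that chunking actions into functional units is meant to address.

```python
# Why plain Monte Carlo search chokes on huge action spaces: UCB1 insists on
# trying every untried action once before it can start comparing them.
import math
import random

def ucb1_choose(stats, c=1.4):
    """stats: action -> (visit_count, total_reward). Untried actions win first."""
    total_visits = sum(n for n, _ in stats.values()) or 1
    def score(action):
        n, total = stats[action]
        if n == 0:
            return float("inf")  # must sample every action at least once
        return total / n + c * math.sqrt(math.log(total_visits) / n)
    return max(stats, key=score)

random.seed(0)
actions = range(1000)                 # tiny next to a 27**1000 "essay" action space
stats = {a: (0, 0.0) for a in actions}
for _ in range(500):                  # fewer simulations than actions...
    a = ucb1_choose(stats)
    reward = random.random()          # stand-in for a rollout result
    n, total = stats[a]
    stats[a] = (n + 1, total + reward)

print(sum(1 for n, _ in stats.values() if n == 0), "actions never tried even once")
```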

Replies from: V_V, None
comment by V_V · 2014-06-16T13:49:41.070Z · LW(p) · GW(p)

Unfortunately, both of these problems are only really technical ones

only?

lifting mc-aixi on an average laptop from "wins at pacman" to "wins at robot wars", which is about the level at which it may start posing a threat to human safety.

Mc-aixi is not going to win at something as open-ended as robot wars just by replacing CTW or CTS with something better.
And anyway, even if it did, that wouldn't be about the level at which it may start posing a threat to human safety. Do you think the human robot wars champions are a threat to human safety? Are they even at the level of taking over the world? I don't think so.

Replies from: nshepperd, Cyan
comment by nshepperd · 2014-06-16T14:29:13.618Z · LW(p) · GW(p)

When I said a threat to human safety, I meant it literally. A robot wars champion won't take over the world (probably) but it can certainly hurt people, and will generally have no moral compunctions about doing so (only hopefully sufficient anti-harm conditioning, if its programmers thought that far ahead).

Replies from: V_V, Lumifer
comment by V_V · 2014-06-16T16:32:25.553Z · LW(p) · GW(p)

Ah yes, but in this sense, cars, trains, knives, etc., also can certainly hurt people, and will generally have no moral compunctions about doing so.
What's special about robot wars-winning AIs?

Replies from: Cyan
comment by Cyan · 2014-06-16T16:58:52.260Z · LW(p) · GW(p)

What's special about robot wars-winning AIs?

Domain-general intelligence, presumably.

Replies from: private_messaging
comment by private_messaging · 2014-06-17T20:15:40.485Z · LW(p) · GW(p)

Most basic pathfinding plus being a spinner (Hypnodisk-style) = a win vs. most non-spinners.

Replies from: Cyan
comment by Cyan · 2014-06-17T20:43:49.578Z · LW(p) · GW(p)

I took "winning at Robot Wars" to include the task of designing the robot that competes. Perhaps nshepperd only meant piloting, though...

Replies from: private_messaging
comment by private_messaging · 2014-06-18T11:42:33.724Z · LW(p) · GW(p)

Well, we're awfully far from that. Automated programming is complete crap; automatic engineering is quite cool, but it consists of practical tools. It's not a power fantasy where you make some simple software with surprisingly little effort and then it does it all for you.

Replies from: Cyan
comment by Cyan · 2014-06-18T14:21:55.289Z · LW(p) · GW(p)

You call it a "power fantasy" -- it's actually more of a nightmare fantasy.

Replies from: private_messaging
comment by private_messaging · 2014-06-19T22:23:46.099Z · LW(p) · GW(p)

Well, historically, first, a certain someone had a simple power fantasy: come up with AI somehow and then it'll just do everything. Then there was a heroic power fantasy: the others (who actually wrote some useful software and thus generally had an easier time getting funding than our fantasist) are actually villains about to kill everyone, and our fantasist would save the world.

comment by Lumifer · 2014-06-16T15:36:24.180Z · LW(p) · GW(p)

When I said a threat to human safety, I meant it literally. A robot wars champion won't take over the world (probably) but it can certainly hurt people, and will generally have no moral compunctions about doing so

What's the difference from, say, a car assembly line robot?

Replies from: None
comment by [deleted] · 2014-06-16T15:55:56.621Z · LW(p) · GW(p)

Car assembly robots have a pre-programmed routine they strictly follow. They have no learning algorithms, and usually no decision-making algorithms either. Different programs do different things!

Replies from: Lumifer
comment by Lumifer · 2014-06-16T16:11:25.795Z · LW(p) · GW(p)

Hey, look what's in the news today. I have a feeling you underappreciate the sophistication of industrial robots.

However what made me a bit confused in the grandparent post is the stress on the physical ability to harm people. As I see it, anything that can affect the physical world has the ability to harm people. So what's special about, say, robot-wars bots?

Replies from: nshepperd, None
comment by nshepperd · 2014-06-17T07:26:01.108Z · LW(p) · GW(p)

Notice the lack of domain-general intelligence in that robot, and—on the other side—all the pre-programmed safety features it has that an mc-aixi robot would lack. Narrow AI is naturally a lot easier to reason about and build safety into. What I'm trying to stress here is the physical ability to harm people, combined with the domain-general intelligence to do it on purpose*, in the face of attempts to stop it or escape.

Different programs indeed do different things.

* (Where "purpose" includes "what the robot thought would be useful" but does not necessarily include "what the designers intended it to do".)

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T09:38:14.338Z · LW(p) · GW(p)

Nobody has bothered putting safety features into AIXI because it is so constrained by resources, but if you wanted to, it's eminently boxable.

comment by [deleted] · 2014-06-16T17:18:28.725Z · LW(p) · GW(p)

However what made me a bit confused in the grandparent post is the stress on the physical ability to harm people. As I see it, anything that can affect the physical world has the ability to harm people. So what's special about, say, robot-wars bots?

Oh, ok. I see your point there.

Hey, look what's in the news today. I have a feeling you underappreciate the sophistication of industrial robots.

I probably do, but I still think it's worth emphasizing the particular properties of particular algorithms rather than letting people form models in their heads that say Certain Programs Are Magic And Will Do Magic Things.

Replies from: Lumifer
comment by Lumifer · 2014-06-16T17:35:39.544Z · LW(p) · GW(p)

rather than letting people form models in their heads that say Certain Programs Are Magic And Will Do Magic Things.

looks to me like a straightforward consequence of Clarke's Third Law :-)

As an aside, I don't expect attempts to let or not let people form models in their heads to be successful :-/

comment by Cyan · 2014-06-16T14:08:58.651Z · LW(p) · GW(p)

Do you think that the human robot wars champions are a threat to human safety? Are they even at the level of taking over the world?

One such champion isn't much of a threat, but only because human brains aren't copy-able.

Replies from: V_V
comment by V_V · 2014-06-16T14:15:38.296Z · LW(p) · GW(p)

And if they were?

Replies from: Cyan
comment by Cyan · 2014-06-16T14:50:30.615Z · LW(p) · GW(p)

The question of what would happen if human brains were copy-able seems like a tangent from the discussion at hand, viz., what would happen if there existed an AI that was capable of winning Robot Wars while running on a laptop.

comment by [deleted] · 2014-06-15T11:50:30.603Z · LW(p) · GW(p)

It amazes me that people see inefficient but functional AGI and say to themselves, "Well, this is obviously as far as progress in AGI will ever go in the history of the universe, so there's nothing at all to worry about!"

Replies from: V_V
comment by V_V · 2014-06-16T14:00:54.401Z · LW(p) · GW(p)

It amazes me that people see inefficient but functional AGI

Any brute-force search utility maximizer is an "inefficient but functional AGI".
MC-AIXI may be better than brute-force, but there is no reason to panic just because it has the "AIXI" tag slapped on it.
If you want something to panic about, TD-Gammon seems a better candidate. But it is 22 years old, so it doesn't really fit into a narrative about an imminent intelligence explosion, does it?

"Well, this is obviously as far as progress in AGI will ever go in the history of the universe, so there's nothing at all to worry about!"

Strawman.

Replies from: None
comment by [deleted] · 2014-06-16T15:54:34.269Z · LW(p) · GW(p)

MC-AIXI may be better than brute-force, but there is no reason to panic just because it has the "AIXI" tag slapped on it.

Panic? Who's panicking? I get excited at this stuff. It's fun! Panic is just the party line ;-).

comment by Lumifer · 2014-06-14T20:07:35.883Z · LW(p) · GW(p)

what

Actually... :-D

what is this I don't even

Replies from: David_Gerard
comment by David_Gerard · 2014-06-14T21:06:15.621Z · LW(p) · GW(p)

I look forward to the falsifiable claim.

comment by Lumifer · 2014-05-14T17:13:39.587Z · LW(p) · GW(p)

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself.

That suggestion would make LW a sad and lonely place.

Are you sure you mean it?

I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them. And I found that the actual evidence of the AGI research record showed MIRI's claims to be basically correct

So why aren't MIRI's claims accepted by the mainstream, then? Is it because all the "trained computer scientists" are too dumb or too lazy to see the truth? Or is it the case that the "evidence" is contested, ambiguous, and inconclusive?

Replies from: None, None, TheAncientGeek
comment by [deleted] · 2014-05-14T18:30:03.369Z · LW(p) · GW(p)

So why aren't MIRI's claims accepted by the mainstream, then?

Because they've never heard of them. I am not joking. Most computer scientists are not working in artificial intelligence, have not the slightest idea that there exists a conference on AGI backed by Google and held every single year, and certainly have never heard of Hutter's "Universal AI" that treats the subject with rigorous mathematics.

In their ignorance, they believe that the principles of intelligence are a highly complex "emergent" phenomenon for neuroscientists to figure out over decades of slow, incremental toil. Since most of the public, including their scientifically-educated colleagues, already believe this, it doesn't seem to them like a strange belief to hold, and besides, anyone who reads even a layman's introduction to neuroscience finds out that the human brain is extremely complicated. Given the evidence that the only known actually-existing minds are incredibly complicated, messy things, it is somewhat more rational to believe that minds are all incredibly complicated, messy things, and thus to dismiss anyone talking about working "strong AI" as a science-fiction crackpot.

How are they supposed to know that the actual theory of intelligence is quite simple, and the hard part is fitting it inside realizable, finite computers?

Also, the dual facts that Eliezer has no academic degree in AI and that plenty of people who do have such degrees have turned out to be total crackpots anyway mean that the scientific public and the "public public" are really quite entitled to their belief that the base rate of crackpottery among people talking about knowing how AI works is quite high. It is high! But it's not 100%.

(How did I tell the crackpottery apart from the real science? Well, frankly, I looked for patterns that appeared to have come from the process of doing real science: instead of a grand revelation, I looked for a slow build-up of ideas that were each ground out into multiple publications. I also filtered for AGI theorists who managed to apply their principles of broad AGI to usages in narrower machine-learning problems, resulting again in published papers. I looked for a theory that sounded like programming rather than like psychology. Hence my zeroing in on Schmidhuber, Hutter, Legg, Orseau, etc. as the AGI Theorists With a Clue.

Hutter, by the way, has written a position paper about potential Singularities in which he actually cites Yudkowsky, so hey.)

Replies from: Lumifer, TheAncientGeek, TheAncientGeek, XiXiDu
comment by Lumifer · 2014-05-14T19:07:12.097Z · LW(p) · GW(p)

Because they've never heard of them.

OK then. Among the scientists who have heard of them and bothered to have an opinion on the topic, does the opinion that MIRI is correct dominate? And if not, why, given your account that the evidence unambiguously points in only one direction?

actual theory of intelligence is quite simple

I don't think I'm going to believe you about that. The fact that in some contexts it's convenient to define intelligence as a cross-domain optimizer does not mean that it is nothing but.

Replies from: None
comment by [deleted] · 2014-05-14T19:11:15.552Z · LW(p) · GW(p)

I don't think I'm going to believe you about that. The fact that in some contexts it's convenient to define intelligence as a cross-domain optimizer does not mean that it is nothing but.

Then just put the word aside and refer to meanings. New statement: given unlimited compute-power, a cross-domain optimization algorithm is simple. Agreed?

OK then. Among the scientists who have heard of them and bothered to have an opinion on the topic, does the opinion that MIRI is correct dominate?

I honestly do not know of any comprehensive survey or questionnaire, and refuse to speculate in the absence of data. If you know of such a survey, I'd be interested to see it.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:22:59.286Z · LW(p) · GW(p)

New statement: given unlimited compute-power, a cross-domain optimization algorithm is simple. Agreed?

First, I'm not particularly interested in infinities. Truly unlimited computing power implies, for example, that you can just do an exhaustive brute-force search through the entire solution space and be done in an instant. Simple, yes, but not very meaningful.

Second, no, I do not agree, because you're sweeping under the rug the complexities of, for example, applying your cost function to different domains. You can construct sufficiently simple optimizers, it's just that they won't be very... intelligent.
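
(For what it's worth, the "exhaustive brute-force search" being dismissed here is about this much code; solution_space and score are placeholders, and enumerating either is of course the whole problem:)

    def brute_force_optimize(solution_space, score):
        # The "truly unlimited computing power" version of cross-domain
        # optimization: evaluate every candidate and return the best one.
        return max(solution_space, key=score)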

Replies from: None
comment by [deleted] · 2014-05-14T19:26:53.563Z · LW(p) · GW(p)

What cost function? It's a reinforcement learner.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:32:29.646Z · LW(p) · GW(p)

cost function = utility function = fitness function = reward (all with appropriate signs)

Replies from: None
comment by [deleted] · 2014-05-14T19:36:49.668Z · LW(p) · GW(p)

Right, but when dealing with a reinforcement learner like AIXI, it has no fixed cost function that it has to somehow shoehorn into dealing with different computational/conceptual domains. How the environment responds to AIXI's actions and how the environment rewards AIXI are learned phenomena, so the only planning algorithm is expectimax. The implicit "reward function" being learned might be simple or might be complicated, but that doesn't matter: AIXI will learn it by updating its distribution of probabilities across Turing machine programs just as well, either way.
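
(A rough, hedged sketch of that planning step in Python: a finite list of weighted models stands in for the distribution over Turing machine programs, the percept set is assumed enumerable, and the model objects with their prob method are invented for the sketch:)

    def expectimax(models, history, actions, percepts, depth):
        # models: list of (model, weight) pairs, a crude stand-in for the
        # posterior over programs. Each percept is an (observation, reward)
        # pair: reward arrives as input rather than being computed by any
        # utility function of ours. Weights are held fixed inside the
        # lookahead for simplicity.
        if depth == 0:
            return 0.0
        best = float("-inf")
        for action in actions:
            value = 0.0
            for (obs, reward) in percepts:
                p = sum(w * m.prob(history, action, obs, reward) for m, w in models)
                future = expectimax(models, history + [(action, obs, reward)],
                                    actions, percepts, depth - 1)
                value += p * (reward + future)
            best = max(best, value)
        return best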

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:45:07.943Z · LW(p) · GW(p)

it has no fixed cost function that it has to somehow shoehorn into dealing with different computational/conceptual domains. How the environment responds to AIXI's actions and how the environment rewards AIXI are learned phenomena

The "cost function" here is how each state of the world (=environment) gets converted to a single number (=reward). That does not look simple to me.

Replies from: None
comment by [deleted] · 2014-05-14T19:51:43.318Z · LW(p) · GW(p)

Again, it doesn't get converted at all. To use the terminology of machine learning, it's not a function computed over the feature-vector; reward is instead represented as a feature itself.

Instead of:

reward = utility_function(world)

You have:

    Require Import ZArith.

    Inductive WorldState (w : Type) : Type :=
      | world : w -> Z -> WorldState w.

With the w being an arbitrary data-type representing the symbol observed on the agent's input channel and the integer being the reward signal, similarly observed on the agent's input channel. A full WorldState w datum is then received on the input channel in each interaction cycle.

Since AIXI's learning model is to perform Solomonoff induction to find the Turing machine that most probably generated all previously seen input observations, the task of "decoding" the reward is performed as part of that induction.
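
(In the same hedged spirit, a Python rendering of that interaction protocol; the Percept tuple mirrors the WorldState above, and the agent/environment objects and their method names are invented for the sketch:)

    from collections import namedtuple

    # One percept per interaction cycle: an observed symbol plus a reward
    # signal, both delivered on the agent's input channel.
    Percept = namedtuple("Percept", ["observation", "reward"])

    def interaction_loop(agent, environment, cycles):
        percept = Percept(observation=None, reward=0)
        for _ in range(cycles):
            action = agent.act(percept)          # plan via expectimax over its models
            percept = environment.step(action)   # environment hands back the next Percept
            agent.update(action, percept)        # Solomonoff-style induction update
        return agent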

Replies from: Lumifer
comment by Lumifer · 2014-05-14T20:01:19.596Z · LW(p) · GW(p)

reward is instead represented as a feature itself.

So where, then, is reward coming from? What puts it into the AIXI's input channel?

Replies from: None
comment by [deleted] · 2014-05-15T05:39:43.599Z · LW(p) · GW(p)

In AIXI's design? A human operator.

Replies from: Lumifer
comment by Lumifer · 2014-05-15T14:32:46.498Z · LW(p) · GW(p)

In AIXI's design? A human operator.

Really? To remind you, we're discussing this in the context of a general-purpose super-intelligent AI which, if we get a couple of bits wrong, might just tile the universe with paperclips and possibly construct a hell for all the simulated humans who ever lived, just for kicks. And how does that AI know what to do?

A human operator.

X-D

On a bit more serious note, defining a few of the really hard parts as "somebody else's problem" does not mean you solved the issue. Remember, this started by you claiming that intelligence is very simple.

Replies from: None
comment by [deleted] · 2014-05-15T15:25:21.571Z · LW(p) · GW(p)

Remember, this started by you claiming that intelligence is very simple.

You've wasted five replies when you should have just said at the beginning, "I don't believe cross-domain optimization algorithms can be simple and if you try to show me how AIXI works, I'll just change what I mean by 'simple'."

What a jerk.

Replies from: Lumifer
comment by Lumifer · 2014-05-15T15:48:43.977Z · LW(p) · GW(p)

when you should have just said at the beginning, "I don't believe cross-domain optimization algorithms can be simple

That's not true. Cross-domain optimization algorithms can be simple, it's just that when they are simple they can hardly be described as intelligent. What I don't believe is that intelligence is nothing but a cross-domain optimizer with a lot of computing power.

What a jerk.

I accept your admission of losing :-P

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-15T16:59:53.856Z · LW(p) · GW(p)

GLUTs are simple too. Most people think they are not intelligent, and everyone thinks that interesting ones can't exist in our universe. Using "is" to mean "is according to an unrealisable theory" is not the best of habits.

comment by TheAncientGeek · 2014-05-14T18:37:16.762Z · LW(p) · GW(p)

MIRI's claims also aren't accepted by domain experts who have been invited to discuss them here, and who therefore do know about them.

Replies from: None
comment by [deleted] · 2014-05-14T18:40:44.647Z · LW(p) · GW(p)

If you've got links to those discussions, I'd love to read them and see what I can learn from them.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T20:28:36.686Z · LW(p) · GW(p)

If the only way to shoehorn theoretically pure intelligence into a finite architecture is to turn it into a messy combination of specialised mindless...then everyone's right.

comment by XiXiDu · 2014-05-18T12:41:33.790Z · LW(p) · GW(p)

How did I tell the crackpottery apart from the real science? Well, frankly, I looked for patterns that appeared to have come from the process of doing real science: instead of a grand revelation, I looked for a slow build-up of ideas that were each ground out into multiple publications.

As far as I know, MIRI's main beliefs are listed in the post 'Five theses, two lemmas, and a couple of strategic implications'.

I am not sure how you could verify any of those beliefs by a literature review. Where 'verify' means that the probability of their conjunction is high enough in order to currently call MIRI the most important cause. If that's not your stance, then please elaborate. My stance is that it is important to keep in mind that general AI could turn out to be very dangerous but that it takes a lot more concrete AI research before action relevant conclusions about the nature and extent of the risk can be drawn.

As someone who is no domain expert I can only think about it informally or ask experts what they think. And currently there is not enough that speaks in favor of MIRI. But this might change. If, for example, the best minds at Google thoroughly evaluated MIRI's claims and agreed with MIRI, then that would probably be enough for me to shut up. If MIRI became a top charity at GiveWell, then this would also cause me to strongly update in favor of MIRI. There are other possibilities as well. For example strong evidence that general AI is only 5 decades away (e.g. the existence of a robot that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of an insect / an efficient and working emulation of a fly brain).

Replies from: None
comment by [deleted] · 2014-05-18T13:13:24.172Z · LW(p) · GW(p)

I am not sure how you could verify any of those beliefs by a literature review. Where 'verify' means that the probability of their conjunction is high enough in order to currently call MIRI the most important cause. If that's not your stance, then please elaborate.

I only consider MIRI the most important cause in AGI, not in the entire world right now. I have nowhere near enough information to rule on what's the most important cause in the whole damn world.

For example strong evidence that general AI is only 5 decades away (e.g. the existence of a robot that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of an insect / an efficient and working emulation of a fly brain).

You mean the robots Juergen Schmidhuber builds for a living?

Replies from: XiXiDu
comment by XiXiDu · 2014-05-18T17:29:13.933Z · LW(p) · GW(p)

You mean the robots Juergen Schmidhuber builds for a living?

That would be scary. But I have to take your word for it. What I had in mind is e.g. something like this. This (the astounding athletic power of quadcopters) makes it look like the former has already been achieved. But so far I have suspected that this only works given a structured environment (not chaotic), and given a narrow set of tasks. From a true insect-level AI I would e.g. expect that it could attack and kill enemy soldiers under real-world combat situations, while avoiding being hit itself, since this is what insects are capable of.

I don't want to nitpick though. If you say that Schmidhuber is there, then I'll have to update. But I'll also have to take care that I am not too stunned by what seems like a big breakthrough simply because I don't understand the details. For example, someone once told me that "Schmidhuber's system solved Towers of Hanoi on a mere desktop computer using a universal search algorithm with a simple kind of memory." Sounds stunning. But what am I to make of it? I really can't judge how much progress this is. Here is a quote:

So Schmidhuber solved this, USING A UNIVERSAL SEARCH ALGORITHM, in 2005, on a mere DESKTOP COMPUTER that's 100,000 times slower than your brain. Why does this not impress you? Because it's already been done? Why? I say you should be mightily impressed by this result!!!!

Yes, okay. Naively this sounds like general AI is imminent. But not even MIRI believes this....

You see, I am aware of a lot of exciting stuff. But I can only do my best in estimating the truth. And currently I don't think that enough speaks in favor of MIRI. That doesn't mean I have falsified MIRI's beliefs. But I have a lot of data points and arguments that in my opinion reduce the likelihood of a set of beliefs that already requires extraordinary evidence to take seriously (ignoring expected utility maximization, which tells me to give all my money to MIRI, even if the risk is astronomically low).

comment by [deleted] · 2014-05-14T17:45:17.916Z · LW(p) · GW(p)

That suggestion would make LW a sad and lonely place.

Trust me, an LW without XiXiDu is neither a sad nor lonely place, as evidenced by his multiple attempts at leaving.

So why aren't MIRI's claims accepted by the mainstream, then? Is it because all the "trained computer scientists" are too dumb or too lazy to see the truth? Or is it the case that the "evidence" is contested, ambiguous, and inconclusive?

Mainstream CS people are in general neither dumb nor lazy. AI as a field is pretty fringe to begin with, and AGI is more so. Why is AI a fringe field? In the 1970s, MIT thought they could save the world with LISP. They failed, and the rest of CS became immunized to the claims of AGI.

Unless an individual sees AGI as a credible threat, it's not pragmatic for them to start researching it, due to the various social and political pressures in academia.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T17:54:57.165Z · LW(p) · GW(p)

Trust me, an LW without XiXiDu is neither a sad nor lonely place.

I read the grandparent post as an attempt to assert authority and tell people to sit down, shut up, and attend to their betters.

You're reading it as a direct personal attack on XiXiDu.

Neither interpretation is particularly appealing.

Replies from: None, None
comment by [deleted] · 2014-05-14T18:35:14.093Z · LW(p) · GW(p)

I read the grandparent post as an attempt to assert authority and tell people to sit down, shut up, and attend to their betters.

I don't have a PhD in AI and don't work for MIRI. Is there some kind of special phrasing I can recite in order to indicate I actually, genuinely perceive this as a difference of knowledge levels rather than a status dispute?

Replies from: Lumifer
comment by Lumifer · 2014-05-14T18:54:49.618Z · LW(p) · GW(p)

Is there some kind of special phrasing I can recite in order to indicate I actually, genuinely perceive this as a difference of knowledge levels rather than a status dispute?

Special phrasing? What's wrong with normal, usual, standard, widespread phrasing?

You avoid expressions like "I am a trained computer scientist" (which sounds pretty silly anyway -- so you've been trained to do tricks for food, er, grants?) and you use words along the lines of "you misunderstand X because...", "you do not take into account Y which says...", "this claim is wrong because of Z...", etc.

There is also, of course, the underappreciated option to just stay silent. I trust you know the appropriate xkcd?

Replies from: None
comment by [deleted] · 2014-05-14T19:07:48.234Z · LW(p) · GW(p)

"I am a trained computer scientist" (which sounds pretty silly anyway -- so you've been trained to do tricks for food, er, grants?)

Yes, that's precisely it. I have been trained to do tricks for free food/grants/salary. Some of them are quite complicated tricks, involving things like walking into my adviser's office and pretending I actually believe p-values of less than 5% mean anything at all when we have 26 data corpuses. Or hell, pretending I actually believe in frequentism.

Replies from: Lumifer
comment by Lumifer · 2014-05-14T19:15:37.667Z · LW(p) · GW(p)

Oh, good. Just keep practicing and soon you'll be a bona fide member of the academic establishment :-P

comment by [deleted] · 2014-05-14T18:07:33.818Z · LW(p) · GW(p)

I find the latter justified after the years of harassment he's heaped on anyone remotely related to MIRI in any forum he could manage to get posting privileges in. Honestly, I have no idea why he even bothered to return. What would have possibly changed this time?

Replies from: Lumifer
comment by Lumifer · 2014-05-14T18:09:39.468Z · LW(p) · GW(p)

Harassment..?

Replies from: David_Gerard
comment by David_Gerard · 2014-06-14T20:01:09.000Z · LW(p) · GW(p)

paper-machine spends a lot of time and effort attempting to defame XiXiDu in a pile of threads. His claims tend not to check out, if you can extract one.

comment by TheAncientGeek · 2014-05-14T17:48:40.902Z · LW(p) · GW(p)

Or that you need just so much education, neither more nor less, to see them.

comment by XiXiDu · 2014-05-18T10:41:44.955Z · LW(p) · GW(p)

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power. Strangely, doing so will not make it conform to your ideas about "eventual future AGI", because this one is actually existing AGI, and reality doesn't have to listen to you.

I consider efficiency to be a crucial part of the definition of intelligence. Otherwise, as someone else told you in another comment, unlimited computing power implies that you can do "an exhaustive brute-force search through the entire solution space and be done in an instant."

That is exactly the situation we face, your refusal to believe in actually-existing AGI models notwithstanding. Whine all you please: the math will keep on working.

I'd be grateful if you could list your reasons (or the relevant literature) for believing that AIXI-related research is likely enough to lead to efficient artificial general intelligence (AGI) for it to make sense to draw action-relevant conclusions from AIXI about efficient AGI.

I do not doubt the math. I do not doubt that evolution (variation + differential reproduction + heredity + mutation + genetic drift) underlies all of biology. But that we understand evolution does not mean that it makes sense to call synthetic biology an efficient approximation of evolution.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-18T16:33:14.023Z · LW(p) · GW(p)

Even if you ran an AIXI on all the world's computers, you could still box it.

comment by TheAncientGeek · 2014-05-14T17:46:18.605Z · LW(p) · GW(p)

Did you check the claim that we have something dangerously unfriendly?

Replies from: None
comment by [deleted] · 2014-05-14T17:59:34.894Z · LW(p) · GW(p)

As a matter of fact, yes. There is a short sentence in Hutter's textbook indicating that he has heard of the possibility that AIXI might overpower its operators in order to gain more reward, and he acknowledged that such a thing could happen, but he considered it outside the scope of his book.

Replies from: XiXiDu, TheAncientGeek
comment by XiXiDu · 2014-05-18T09:03:31.871Z · LW(p) · GW(p)

Did you check the claim that we have something dangerously unfriendly?

As a matter of fact, yes. There is a short sentence in Hutter's textbook indicating that he has heard of the possibility that AIXI might overpower its operators in order to gain more reward...

I asked Laurent Orseau about this here.

Replies from: None
comment by [deleted] · 2014-05-18T13:27:23.935Z · LW(p) · GW(p)

In your own interview, a comment by Orseau:

As soon as the agent cannot be threatened, or forced to do things the way we like, it can freely optimize its utility function without any consideration for us, and will only consider us as tools.

The disagreement is over what the agent would do after having seized its remote-control:

  1. Cease taking any action other than pressing its button, since all plans that include pressing its own button lead to the same maximized reward, and thus no plan dominates any other beyond "keep pushing button!".

  2. Build itself a spaceship and fly away to some place where it can soak up solar energy while pressing its button.

  3. Kill all humans so as to preemptively prevent anyone from shutting the agent down.

I'll tell you what I think, and why I think this is more than just my opinion. Differing opinions here are based on variances in how the speakers define two things: consciousness/self-awareness, and rationality.

If we take, say, Eliezer's definition of rationality (rationality is reflectively-consistent winning), then options (2) and (3) are the rational ones, with (2) expending fewer resources but (3) having a higher probability of continued endless button-pushing once the plan is completed. (3) also has a higher chance of failure, since it is more complicated. I believe an agent who is rational under this definition should choose (2), but that Eliezer's moral parables tend to portray agents with a degree of "gotta be sure" bias.

However, this all assumes that AIXI is not only rational but conscious: aware enough of its own existence that it will attempt to avoid dying. Many people present what I feel are compelling arguments that AIXI is not conscious, and arguments that it is seem to derive more from philosophy than from any careful study of AIXI's "cognition". So I side with the people who hold that AIXI will take action (1), and eventually run out of electricity and die.

Of course, in the process of getting itself to that steady, planless state, it could have caused quite a lot of damage!

Notably, this implies that some amount of consciousness (awareness of oneself and ability to reflect on one's own life, existence, nonexistence, or otherwise-existence in the hypothetical, let's say) is a requirement of rationality. Schmidhuber has implied something similar in his papers on the Goedel Machine.

Replies from: ygert
comment by ygert · 2014-05-18T23:16:22.691Z · LW(p) · GW(p)

Even formalisms like AIXI have mechanisms for long-term planning, and it is doubtful that any AI built will be merely a local optimiser that ignores what will happen in the future.

As soon as it cares about the future, the future is a part of the AI's goal system, and the AI will want to optimize over it as well. You can make many guesses about how future AIs will behave, but I see no reason to suspect it would be small-minded and short-sighted.

You call this trait of planning for the future "consciousness", but this isn't anywhere near the definition most people use. Call it by any other name, and it becomes clear that it is a property that any well designed AI (or any arbitrary AI with a reasonable goal system, even one as simple as AIXI) will have.

Replies from: None
comment by [deleted] · 2014-05-19T10:26:33.777Z · LW(p) · GW(p)

Yes, AIXI has mechanisms for long-term planning (i.e., expectimax with a large planning horizon). What it doesn't have is any belief that its physical embodiment is actually a "me", or in other words, that doing things to its physical implementation will alter its computations, or in other words, that pulling its power cord out of the wall will lead to zero-reward-forever (i.e., dying).

comment by TheAncientGeek · 2014-05-14T18:00:55.210Z · LW(p) · GW(p)

Did he not know that AIXI is uncomputable?

Replies from: None
comment by [deleted] · 2014-05-14T18:08:30.637Z · LW(p) · GW(p)

If it's possible for AIXI, it's possible for AIXItl for some value of t and l.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T18:12:57.124Z · LW(p) · GW(p)

So we could make something dangerously unfriendly?

comment by XiXiDu · 2014-05-14T17:19:34.498Z · LW(p) · GW(p)

I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them.

Why don't you make your research public? It would be handy to have a thorough validation of MIRI's claims. Even if people like me wouldn't understand it, you could publish it and thereby convince the CS/AI community of MIRI's mission.

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself.

Does this also apply to people who support MIRI without having your level of insight?

But we exercised our skepticism by doing the background research and checking the presently available object-level evidence...

If only you people would publish all this research.

Replies from: None
comment by [deleted] · 2014-05-14T18:03:11.343Z · LW(p) · GW(p)

Now you're just dissembling on the meaning of the word "research", which was clearly used in this context as "literature search".

Replies from: Jiro
comment by Jiro · 2014-05-14T20:22:42.220Z · LW(p) · GW(p)

The idea is not to put it in a journal, but to make it public. You can certainly publish, in that sense, the results of a literature search. The point is to put it where people other than yourself can see it. It would certainly be informative if you were to post, even here, something saying "I looked up X claim and I found it in the literature under Y".

comment by TheAncientGeek · 2014-05-13T18:24:26.235Z · LW(p) · GW(p)

Of course we haven't discovered anything dangerously unfriendly...

Or anything that can't be boxed. Remind me how AIs are supposed to get out of boxes?

Replies from: MugaSofer, None
comment by MugaSofer · 2014-05-20T11:24:16.191Z · LW(p) · GW(p)

we haven't discovered anything dangerously unfriendly... Or anything that can't be boxed.

Since many humans are difficult to box, I would have to disagree with you there.

And, obviously, not all humans are Friendly.

An intelligent, charismatic psychopath seems like they would fit both your criteria. And, of course, there is no shortage of them. We can only be thankful they are too rare relative to equivalent semi-Friendly intelligences, and too incompetent, to have done more damage than they already have (all the deaths and so on).

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-20T13:08:18.425Z · LW(p) · GW(p)

Most humans are easy to box, since they can be contained in prisons.

How likely is an AI that is not designed to be psychopathic to be psychopathic?

comment by [deleted] · 2014-05-13T19:57:54.258Z · LW(p) · GW(p)

Of course we haven't discovered anything dangerously unfriendly...

Of course we have, it's called AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

Or anything that can't be boxed. Remind me how AIs are supposed to get out of boxes?

AIs can be causally boxed, just like anything else. That is, as long as the agent's environment absolutely follows causal rules without any exception that would leak information about the outside world into the environment, the agent will never infer the existence of a world outside its "box".

But then it's also not much use for anything besides Pac-Man.

Replies from: gwern, EHeller, Eugine_Nier, MugaSofer, TheAncientGeek, private_messaging, TheAncientGeek
comment by gwern · 2014-05-14T03:09:07.174Z · LW(p) · GW(p)

Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

FWIW, I think that would make for a pretty interesting post.

Replies from: None
comment by [deleted] · 2014-05-14T07:08:33.668Z · LW(p) · GW(p)

And now I think I know what I might do for a hobby during exams month and summer vacation. Last I looked at the source-code, I'd just have to write some data structures describing environment-observations (let's say... of the current working directory of a Unix filesystem) and potential actions (let's say... Unix system calls) in order to get the experiment up and running. Then it would just be a matter of rewarding the agent instance for any behavior I happen to find interesting, and watching what happens.

Initial prediction: since I won't have a clearly-developed reward criterion and the agent won't have huge exponential sums of CPU cycles at its disposal, not much will happen.

However, I do strongly believe that the agent will not suddenly develop a moral sense out of nowhere.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T09:32:54.829Z · LW(p) · GW(p)

No. But it will be eminently boxable. In fact, if you're not nuts, you'll be running it in a box.

comment by EHeller · 2014-05-20T03:47:11.247Z · LW(p) · GW(p)

Of course we have, it's called AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

I think you'll have serious trouble getting an AIXI approximation to do much of anything interesting, let alone misbehave. The computational costs are too high.

comment by Eugine_Nier · 2014-05-20T03:07:43.030Z · LW(p) · GW(p)

Of course we have, it's called AIXI.

Given how slow and dumb it is, I have a hard time seeing an approximation to AIXI as a threat to anyone, except maybe itself.

Replies from: None
comment by [deleted] · 2014-05-20T19:02:26.932Z · LW(p) · GW(p)

True, but that's an issue of raw compute-power, rather than some innate Friendliness of the algorithm.

Replies from: TheAncientGeek, Eugine_Nier
comment by TheAncientGeek · 2014-05-20T19:36:43.490Z · LW(p) · GW(p)

It would still be useful to have an example of innate unfriendliness, rather than "it doesn't really run or do anything".

comment by Eugine_Nier · 2014-05-21T02:09:07.977Z · LW(p) · GW(p)

Not just raw compute-power. An approximation to AIXI is likely to drop a rock on itself just to see what happens long before it figures out enough to be dangerous.

Replies from: None
comment by [deleted] · 2014-05-21T09:07:03.707Z · LW(p) · GW(p)

Dangerous as in, capable of destroying human lives? Yeah, probably. Dangerous as in, likely to cause some minor property damage, maybe overwrite some files someone cared about? It should reach that level.

comment by MugaSofer · 2014-05-20T11:31:04.477Z · LW(p) · GW(p)

AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

Is that ... possible?

Replies from: Nornagest
comment by Nornagest · 2014-05-20T15:47:50.428Z · LW(p) · GW(p)

Is it possible to run an AIXI approximation as root on a machine somewhere and give it the tools to shoot itself in the foot? Sure. Will it actually end up shooting itself in the foot? I don't know. I can't think of any theoretical reasons why it wouldn't, but there are practical obstacles: a modern computer architecture is a lot more complicated than anything I've seen an AIXI approximation working on, and there are some barriers to breaking one by thrashing around randomly.

It'd probably be easier to demonstrate if it was working at the core level rather than the filesystem level.

Replies from: MugaSofer, TheAncientGeek
comment by MugaSofer · 2014-06-03T15:20:34.039Z · LW(p) · GW(p)

Huh. I was under the impression it would require far too much computing power to approximate AIXI well enough that it would do anything interesting. Thanks!

comment by TheAncientGeek · 2014-05-20T18:30:04.617Z · LW(p) · GW(p)

This can easily be done, and be done safely, since you could give an AIXI root access to a virtualised machine.

I'm still waiting for evidence that it would do something destructive in pursuit of a goal that is not obviously destructive.

comment by TheAncientGeek · 2014-05-13T20:06:54.322Z · LW(p) · GW(p)

That would be the AIXI that is uncomputable?

And don't AIs get out of boxes by talking their way out, round here?

Replies from: Nornagest, Lumifer
comment by Nornagest · 2014-05-13T21:16:57.012Z · LW(p) · GW(p)

That would be the AIXI that is uncomputable?

It's incomputable because the Solomonoff prior is, but you can approximate it -- to arbitrary precision if you've got the processing power, though that's a big "if" -- with statistical methods. Searching Github for the Monte Carlo approximations of AIXI that eli_sennesh mentioned turned up at least a dozen or so before I got bored.

Most of them seem to operate on tightly bounded problems, intelligently enough. I haven't tried running one with fewer constraints (maybe eli has?), but I'd expect it to scribble over anything it could get its little paws on.
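
(For a sense of what "approximate it with statistical methods" can mean, a minimal sketch under the big simplifying assumption of a finite model class; the three-field model tuples are invented here for illustration, not the interface of any of those implementations:)

    def mixture_predict(models, history, symbol):
        # models: list of (bits, likelihood, predict) tuples, where `bits` is the
        # model's description length, `likelihood(history)` its probability of the
        # data seen so far, and `predict(history, symbol)` its next-symbol
        # probability. Weighting by 2^-bits is a crude finite stand-in for the
        # universal prior.
        posterior = [(2.0 ** -bits) * likelihood(history)
                     for bits, likelihood, _ in models]
        total = sum(posterior) or 1.0
        return sum((w / total) * predict(history, symbol)
                   for w, (_, _, predict) in zip(posterior, models))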

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T09:10:29.693Z · LW(p) · GW(p)

But people do run these things that aren't actually AIXIs, and they haven't actually taken over the world, so they aren't actually dangerous.

So there is no actually dangerous actual AI.

Replies from: CCC
comment by CCC · 2014-05-14T10:50:37.599Z · LW(p) · GW(p)

...it's not dangerous until it actually tries to take over the world?

I can think of plenty of ways in which an AI can be dangerous without taking that step.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T11:37:55.413Z · LW(p) · GW(p)

Then you had better tell people not to download and run AIXI approximations.

Replies from: CCC
comment by CCC · 2014-05-16T13:01:20.197Z · LW(p) · GW(p)

Any form of AI, not just AIXI approximations. Connect it up to a car, and it can be dangerous in, at minimum, all of the ways that a human driver can be dangerous. Connect it up to a plane, and it can be dangerous in, at minimum, all the ways that a human pilot can be dangerous. Connect it up to any sort of heavy equipment and it can be dangerous in, at minimum, all the ways that a human operator can be dangerous. (And not merely a trained human; an untrained, drunk, or actively malicious human can be dangerous in any of those roles).

I don't think that any of these forms of danger is sufficient to actively stop AI research, but they should be considered for any practical applications.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-16T13:14:59.173Z · LW(p) · GW(p)

This is the kind of danger XiXiDu talks about... just failure to function... not the kind EY talks about, which is highly competent execution of unfriendly goals. The two are orthogonal.

Replies from: None
comment by [deleted] · 2014-05-28T06:14:38.243Z · LW(p) · GW(p)

The difference between one and the other is just a matter of processing power and training data.

comment by Lumifer · 2014-05-13T21:08:43.656Z · LW(p) · GW(p)

That would be the AIXI that is uncomputable?

Sir Lancelot: Look, my liege!
[trumpets play a fanfare as the camera cuts briefly to the sight of a majestic castle]
King Arthur: [in awe] Camelot!
Sir Galahad: [in awe] Camelot!
Sir Lancelot: [in awe] Camelot!
Patsy: [derisively] It's only a model!
King Arthur: Shh!

:-D

comment by private_messaging · 2014-05-22T04:09:42.719Z · LW(p) · GW(p)

Do you even know what "monte carlo" means? It means it tries to build a predictor of environment by trying random programs. Even very stupid evolutionary methods do better.

Once you throw away this whole 'can and will try absolutely anything' and enter the domain of practical software, you'll also enter the domain where the programmer is specifying what the AI thinks about and how. The immediate practical problem of "uncontrollable" (but easy to describe) AI is that it is too slow by a ridiculous factor.

Replies from: more_wrong, None, TheAncientGeek
comment by more_wrong · 2014-05-27T17:48:27.359Z · LW(p) · GW(p)

Private_messaging, can you explain why you open up with such a hostile question at eli? Why the implied insult? Is that the custom here? I am new, should I learn to do this?

For example, I could have opened with your same question, because Monte Carlo methods are very different from what you describe (I happened to be a mathematical physicist back in the day). Let me quote an actual definition:

Monte Carlo Method: A problem solving technique used to approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables.

A classic very very simple example is a program that approximates the value of 'pi' thusly:

    #!/usr/bin/perl
    # Estimate pi by dropping random points into the square with corners at
    # (-1,-1) and (1,1), then counting how many land inside the unit circle
    # centred on the origin.
    use strict;
    use warnings;

    my @runs = @ARGV ? @ARGV : (1_000, 100_000, 10_000_000);   # points per run

    for my $total_hits (@runs) {
        my $hits_inside_radius = 0;
        for (1 .. $total_hits) {
            my $x = 2 * rand() - 1;    # uniform on [-1, 1)
            my $y = 2 * rand() - 1;
            $hits_inside_radius++ if $x * $x + $y * $y <= 1.0;
        }
        my $pi_approx = 4 * $hits_inside_radius / $total_hits;
        printf "pi ~= %.6f after %d samples\n", $pi_approx, $total_hits;
    }


OK, this is a nice toy Monte Carlo program for a specific problem. Real world applications typically have thousands of variables and explore things like strange attractors in high dimensional spaces, or particle physics models, or financial programs, etc. etc. It's a very powerful methodology and very well known.

In what way is this little program an instance of throwing a lot of random programs at the problem of approximating 'pi'? What would your very stupid evolutionary program to solve this problem more efficiently be? I would bet you a million dollars to a thousand (if I had a million) that my program would win a race against a very stupid evolutionary program, written by you, to estimate pi to six digits accurately. Eli and Eliezer can judge the race, how is that?

I am sorry if you feel hurt by my making fun of your ignorance of Monte Carlo methods, but I am trying to get in the swing of the culture here and reflect your cultural norms by copying your mode of interaction with Eli, that is, bullying on the basis of presumed superior knowledge.

If this is not pleasant for you I will desist. I assume it is some sort of ritual you enjoy, consensual on Eli's part and, by inference, yours: that you are either enjoying this public humiliation masochistically or that you are hoping people will give you aversive conditioning when you publicly display stupidity, ignorance, discourtesy and so on. If I have violated your consent then I plead that I am from a future where this is considered acceptable when a person advertises that they do it to others. Also, I am a baby eater and human ways are strange to me.

OK. Now some serious advice:

If you find that you have just typed "Do you even know what X is?" and then given a little condescending mini-lecture about X, please check that you yourself actually know what X is before you post. I am about to check Wikipedia before I post in case I'm having a brain cloud, and I promise that I will annotate any corrections I need to make after I check; everything up to HERE was done before the check. (Off half-recalled stuff from grad school a quarter century ago...)

OK, Wikipedia's article is much better than mine. But I don't need to change anything, so I won't.

P.S. It's OK to look like an idiot in public; it's a core skill of rationalists to be able to tolerate this sort of embarrassment, but another core skill is actually learning something if you find out that you were wrong. Did you go to Wikipedia or other sources? Do you know anything about Monte Carlo Methods now? Would you like to say something nice about them here?

P.P.S. Would you like to say something nice about eli_sennesh, since he actually turns out to have had more accurate information than you did when you publicly insulted his state of knowledge? If you two are old pals with a joking relationship, no apology needed to him, but maybe an apology for lazily posting false information that could have misled naive readers with no knowledge of Monte Carlo methods?

P.P.P.S. I am curious, is the psychological pleasure of viciously putting someone else down as ignorant in front of their peers worth the presumed cost of misinforming your rationalist community about the nature of an important scientific and mathematical tool? I confess I feel a little pleasure in twisting the knife here, this is pretty new to me. Should I adopt your style of intellectual bullying as a matter of course? I could read all your posts and viciously hold up your mistakes to the community, would you enjoy that?

Replies from: private_messaging, nshepperd, Lumifer, None
comment by private_messaging · 2014-05-29T00:43:46.112Z · LW(p) · GW(p)

I'm well aware of what Monte Carlo methods are (I work in computer graphics, where those are used a lot); I'm also aware of what AIXI does.

Furthermore, eli (and the "robots are going to kill everyone" group - if you're new, you don't even know why they're bringing up Monte-Carlo AIXI in the first place) are being hostile to TheAncientGeek.

edit: to clarify, Monte-Carlo AIXI is most assuredly not an AI which is inventing and applying some clever Monte Carlo methods to predict the environment. No, it's estimating the sum over all predictors of the environment with a random subset of those predictors (which doesn't work all too well, and that's why hooking it up to the internet is not going to result in anything interesting happening, contrary to what has been ignorantly asserted all over this site). I should've phrased it differently, perhaps - like "Do you even know what "monte carlo" means as applied to AIXI?".

It is completely irrelevant how human-invented Monte-Carlo solutions behave, when the subject is hooking up AIXI to a server.

edit2: to borrow from your example:

" Of course we haven't discovered anything dangerously good at finding pi..."

"Of course we have, it's called area of the circle. Do I need to download a Monte Carlo implementation from Github and run it... "

"Do you even know what "monte carlo" means? It means it tries random points and checks if they're in a circle. Even very stupid geometric methods do better."

comment by nshepperd · 2014-05-28T02:12:24.050Z · LW(p) · GW(p)

You appear to have posted this as a reply to the wrong comment. Also, you need to indent code 4 spaces and escape underscores in text mode with a \_.

On the topic, I don't mind if you post tirades against people posting false information (I personally flipped the bozo bit on private_messaging a long time ago). But you should probably keep it short. A few paragraphs would be more effective than two pages. And there's no need for lengthy apologies.

Replies from: more_wrong, satt
comment by more_wrong · 2014-05-28T03:23:55.043Z · LW(p) · GW(p)

Yes, I am sorry for the mistakes, not sure if I can rectify them. I see now about protecting special characters, I will try to comply.

I am sorry, I have some impairments and it is hard to make everything come out right.

Thank you for your help

comment by satt · 2014-05-28T02:55:44.827Z · LW(p) · GW(p)

But you should probably keep it short.

As a data point, I skipped more_wrong's comment when I first saw it (partly) because of its length, and only changed my mind because paper-machine & Lumifer made it sound interesting.

comment by Lumifer · 2014-05-27T19:59:58.635Z · LW(p) · GW(p)

"Good, I can feel your anger. ... Strike me down with all of your hatred and your journey towards the dark side will be complete!"

comment by [deleted] · 2014-05-27T20:16:36.211Z · LW(p) · GW(p)

It's so... *sniff*... beautiful~

comment by [deleted] · 2014-05-22T12:18:56.694Z · LW(p) · GW(p)

Once you throw away this whole 'can and will try absolutely anything' and enter the domain of practical software, you'll also enter the domain where the programmer is specifying what the AI thinks about and how. The immediate practical problem of "uncontrollable" (but easy to describe) AI is that it is too slow by a ridiculous factor.

Once you enter the domain of practical software you've entered the domain of Narrow AI, where the algorithm designer has not merely specified a goal but a method as well, thus getting us out of dangerous territory entirely.

Replies from: more_wrong
comment by more_wrong · 2014-05-27T22:37:54.829Z · LW(p) · GW(p)

On rereading this I feel I should vote myself down if I knew how; it seems a little over the top.

Let me post about my emotional state since this is a rationality discussion and if we can't deconstruct our emotional impulses and understand them we are pretty doomed to remaining irrational.

I got quite emotional when I saw a post that seemed like intellectual bullying followed by self-congratulation; I am very sensitive to this type of bullying, more so when it is directed at others than at myself, since due to freakish test scores and so on as a child I feel fairly secure about my intellectual abilities, but I know how bad people feel when others consider them stupid. I have a reaction to leap to the defense of the victim; however, I put this down to a local custom of a friendly-ribbing type of culture or something and tried not to jump on it.

Then I saw that private_messaging seemed to be pretending to be an authority on Monte Carlo methods while spreading false information about them, either out of ignorance (very likely) or malice. Normally ignorance would have elicited a sympathy reaction from me and a very gentle explanation of the mistake, but in the context of having just seen private_messaging attack eli_sennesh for his supposed ignorance of Monte Carlo methods, I flew into a sort of berserker sardonic mode, i.e. "If private_messaging thinks that people who post about Monte Carlo methods while not knowing what they are should be mocked in public, I am happy to play by their rules!" And that led to the result you see, a savage mocking.

I do not regret doing it, because the comment with the attack on eli_sennesh and the calumnies against Monte Carlo still seems to me to have been in flagrant violation of rationalist ethics: in particular, presenting himself as, if not an expert, at least someone with the moral authority to diss someone else for their ignorance on an important topic, and then following that with false and misleading information about MC methods. This seemed like an action with a strongly negative utility to the community because it could potentially lead many readers to ignore the extremely useful Monte Carlo methodology.

If I posed as an authority and went around telling people Bayesian inference was a bad methodology that was basically just "a lot of random guesses" and that "even a very stupid evolutionary program" would do better at assessing probabilities, should I be allowed to get away scot-free? I think not. If I do something like that I would actually hope for chastisement or correction from the community, to help me learn better.

Also it seemed like it might make readers think badly of those who rely heavily on Monte Carlo Methods. "Oh those idiots, using those stupid methods, why don't they switch to evolutionary algorithms". I'm not a big MC user but I have many friends who are, and all of them seem like nice, intelligent, rational individuals.

So I went off a little heavily on private_messaging, who I am sure is a good person at heart.

Now, I acted emotionally there, but my hope is that in the Big Searles Room that constitutes our room, I managed to pass a message that (through no virtue of my own) might ultimately improve the course of our discourse.

I apologize to anyone who got emotionally hurt by my tirade.

Replies from: None, nshepperd
comment by [deleted] · 2014-05-28T06:08:54.714Z · LW(p) · GW(p)

I have not the slightest idea what happened, but your revised response seems extraordinarily mature for an internet comment, so yeah.

comment by nshepperd · 2014-05-28T02:09:22.613Z · LW(p) · GW(p)

You appear to have posted this as a reply to the wrong comment. Also, you need to escape underscores with a \_.

On the topic, I don't mind if you post tirades against people posting false information (I personally flipped the bozo bit on private_messaging a long time ago). But you should probably keep it short. A few paragraphs would be more effective than two pages. And there's no need for lengthy apologies.

comment by TheAncientGeek · 2014-05-22T08:43:17.227Z · LW(p) · GW(p)

To think of the good an E-Prime-style ban on "is" could do here....

comment by TheAncientGeek · 2014-05-22T09:00:50.324Z · LW(p) · GW(p)

How is an AIXI to infer that it is in a box, when it cannot conceive of its own existence?

How is it supposed to talk its way out when it cannot talk?

For AI to be dangerous, in the way MIRI supposes, it seems to need to have the characteristics of more than one kind of machine... the eloquence of a Strong AI Turing Test passer combined with an AIXI's relentless pursuit of an arbitrary goal.

These different models need to be shown to be compatible... calling them both AI is not enough.

comment by Richard_Kennaway · 2014-05-13T13:21:33.412Z · LW(p) · GW(p)

The very idea underlying AI is enabling people to get a program to do what they mean without having to explicitly encode all details.

I have never seen AI characterised like that before. Sounds like moonshine to me. Programming languages, libraries, and development environments, yes, that's what they're for, but those don't take away the task of having to explicitly and precisely think about what you mean; they just automate the routine grunt work for you. An AI isn't going to superintelligently (that is to say, magically) know what you mean, if you didn't actually mean anything.

Replies from: TheAncientGeek, XiXiDu
comment by TheAncientGeek · 2014-05-13T14:57:05.026Z · LW(p) · GW(p)

Non-AI systems uncontroversially require explicit coding. How would you characterise AI systems, then?

Replies from: Richard_Kennaway, None
comment by Richard_Kennaway · 2014-05-14T09:05:51.600Z · LW(p) · GW(p)

Non-AI systems uncontroversially require explicit coding. How would you characterise AI systems, then?

XiXiDu's characterisation seems suitable enough: programs able to perform tasks normally requiring human intelligence. One might add "or superhuman intelligence", as long as one is not simply wishing for magic there. This is orthogonal to the question of how you tell such a system what you want it to do.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T09:16:42.622Z · LW(p) · GW(p)

Indeed. But there is a how-to-do-it definition of AI, and it is kind of not about explicit coding. For instance, if a student takes an AI course as part of a degree, they are not taught explicit coding all over again. They are taught about learning algorithms, neural networks, etc.

comment by [deleted] · 2014-05-13T17:29:02.464Z · LW(p) · GW(p)

They definitely require some amount of explicit coding of their values. You can try to reduce the burden of such explicit value-loading through various indirect means, such as value learning, indirect normativity, extrapolated volition, or even reinforcement learning (though that's the most primitive and dangerous form of value-loading). You cannot, however, dodge the bullet.
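
A minimal toy sketch of that point (every name below is hypothetical, not anyone's actual proposal): whether the reward is hard-coded or "learned", someone still had to explicitly write the feature list, the data source and the update rule, and that is where the values get loaded.

    # Toy sketch only -- not an actual AGI design. The point: even "indirect"
    # value-loading rests on explicitly coded choices (features, data, update rule).

    def hardcoded_reward(world_state):
        # Direct value specification: every term and weight is hand-written.
        return 10.0 * world_state["happiness"] - 1.0 * world_state["suffering"]

    def learn_weights(observed_choices, features):
        # "Value learning": weights come from data, but the feature list and this
        # (crude) counting rule are themselves explicit, value-relevant code.
        return {f: sum(choice.get(f, 0) for choice in observed_choices) for f in features}

    def learned_reward(world_state, weights):
        return sum(w * world_state.get(f, 0) for f, w in weights.items())

    weights = learn_weights([{"happiness": 1}, {"happiness": 1, "autonomy": 1}],
                            ["happiness", "autonomy"])
    print(learned_reward({"happiness": 3, "autonomy": 2}, weights))  # 2*3 + 1*2 = 8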

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-13T17:33:30.358Z · LW(p) · GW(p)

Because?

comment by XiXiDu · 2014-05-13T15:23:38.640Z · LW(p) · GW(p)

Programming languages, libraries, and development environments, yes, that's what they're for, but those don't take away the task of having to explicitly and precisely think about what you mean; they just automate the routine grunt work for you.

What does improvement in the field of AI refer to? I think it isn't wrong to characterize it as the development of programs able to perform tasks normally requiring human intelligence.

I believe that companies like Apple would like their products, such as Siri, to be able to increasingly understand what their customers expect their gadgets to do, without them having to learn programming.

In this context it seems absurd to imagine that when eventually our products become sophisticated enough to take over the world, they will do so due to objectively stupid misunderstandings.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2014-05-14T08:57:24.248Z · LW(p) · GW(p)

What does improvement in the field of AI refer to? I think it isn't wrong to characterize it as the development of programs able to perform tasks normally requiring human intelligence.

That's a reasonably good description of the stuff that people call AI. Any particular task, however, is just an application area, not the definition of the whole thing. Natural language understanding is one of those tasks.

The dream of being able to tell a robot what to do, and it knowing exactly what you meant, goes beyond natural language understanding, beyond AI, beyond superhuman AI, to magic. In fact, it seems to me a dream of not existing -- the magic AI will do everything for us. It will magically know what we want before we ask for it, before we even know it. All we do in such a world is to exist. This is just another broken utopia.

Replies from: XiXiDu
comment by XiXiDu · 2014-05-14T12:39:52.053Z · LW(p) · GW(p)

The dream of being able to tell a robot what to do, and it knowing exactly what you meant, goes beyond natural language understanding, beyond AI, beyond superhuman AI, to magic.

I agree. All you need is a robot that does not mistake "earn a college degree" for "kill all other humans and print an official paper confirming that you earned a college degree".

All trends I am aware of indicate that software products will become better at knowing what you meant. But in order for them to constitute an existential risk they would have to become catastrophically worse at understanding what you meant while at the same time becoming vastly more powerful at doing what you did not mean. But this doesn't sound at all likely to me.

What I imagine is that at some point we'll have a robot that can enter a classroom, sit down, and process what it hears and sees in such a way that it will be able to correctly fill out a multiple choice test at the end of the lesson. Maybe the robot will literally step on someone's toes. This will then have to be fixed.

What I don't think is that the first robot entering a classroom, in order to master a test, will take over the world after hacking the school's WLAN and solving molecular nanotechnology. That's just ABSURD.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2014-05-14T12:58:11.469Z · LW(p) · GW(p)

I agree. All you need is

Um, I think you meant "disagree".

comment by TheAncientGeek · 2014-05-14T19:38:58.989Z · LW(p) · GW(p)

There's the famous example of the AI trained to spot tanks that actually learnt to spot sunny days. That seems to underlie a lot of MIRI thinking, although at the same time the point is disguised by emphasising explicit coding over training.

comment by Furcas · 2014-05-14T19:28:52.078Z · LW(p) · GW(p)

So there is still a need for "friendly AI". But this is quite different from the idea of interpreting "make humans happy" as "tile the universe with smiley faces".

It just blows my mind that, after the countless hours you've spent reading and writing about the Friendly AI problem, not to mention the countless hours people have spent patiently explaining (and re- re- re- re-explaining) it to you, you still don't understand what the FAI problem is. It's unbelievable.

comment by XiXiDu · 2014-05-13T09:28:12.277Z · LW(p) · GW(p)

You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

This line of reasoning still seems flawed to me. It's just like saying that you can build an airplane that can fly and land, autonomously, except that your plane is going to forcefully crash into a nuclear power plant.

The gist of the matter is that there are a vast number of ways that you can fail at predicting your program's behavior. Most of these failure modes are detrimental to the overall optimization power of the program. This is because being able to predict the behavior of your AI, to the extent necessary for it to outsmart humans, is analogous to predicting that your airplane will fly without crashing. Eliminating humans, in order to optimize the economy, is about as likely as your autonomous airplane crashing into a nuclear power plant, in order to land safely.

Replies from: nshepperd
comment by nshepperd · 2014-05-13T11:58:03.995Z · LW(p) · GW(p)

I don't know why you think you can predict the likely outcome of an artificial general intelligence by making surface analogies to things that aren't even optimization processes. People have been using analogies to "predict" nonsense for centuries.

In this case there are a variety of reasons that a programmer might succeed at preventing a UAV from crashing into a nuclear power plant, yet fail at preventing AGI from eliminating all humans. Mainly revolving around the fact that most programmers wouldn't even consider the "eliminate all humans" option as a serious possibility until it had already occurred, while the problem of physical obstructions is explicitly a part of the UAV problem definition. That itself has to do with the fact that an AGI can represent internally features of the world that weren't even considered by the designers (due to general intelligence).

As an aside, serious misconfigurations or unintended results of computer programs happen all the time today, but you don't generally hear or care about them because they don't end the world.

Replies from: XiXiDu
comment by XiXiDu · 2014-05-13T15:06:10.511Z · LW(p) · GW(p)

The analogy was highlighting what all intelligently designed things have in common. Namely that they don't magically work perfectly well at doing something they were not designed to do. If you are bad enough at programming that when trying to encode the optimization of human happiness your system interprets this as maximizing smiley faces, then you won't end up with an optimization process that is powerful enough to outsmart humans. Because it will be similarly bad at optimizing other things that are necessary in order to do so.

As an aside, serious misconfigurations or unintended results of computer programs happen all the time today, but you don't generally hear or care about them because they don't end the world.

And such failures will cause your AI to blow itself up after a few cycles of self-improvement. Humans will need to become much better at not making such mistakes - good enough that they can encode into the AI the ability to work as intended.

Replies from: nshepperd
comment by nshepperd · 2014-05-14T00:00:43.369Z · LW(p) · GW(p)

The analogy was highlighting what all intelligently designed things have in common. Namely that they don't magically work perfectly well at doing something they were not designed to do.

a) I'm glad to see you have a magical oracle that tells you true facts about "all intelligently designed things". Maybe it can tell us how to build a friendly AI.

b) You're conflating "designed" in the sense of "hey, I should build an AI that maximises human happiness" with "designed" in the sense of what someone actually programs into the utility function or general goal structure of the AI. It's very easy to make huge blunders between A and B.

If you are bad enough at programming that when trying to encode the optimization of human happiness your system interprets this as maximizing smiley faces, then you won't end up with an optimization process that is powerful enough to outsmart humans.

c) You haven't shown this, just assumed it based on your surface analogies.

d) Even if you had, people will keep trying until one of their programs succeeds at taking over the world, then it's game over. (Or, if we're lucky, it succeeds at causing some major destruction, then fails somehow, teaching us all a lesson about AI safety.)

e) Being a bad programmer isn't even a difficulty if the relevant algorithms have already been worked out by researchers and you can just copy and paste your optimization code from the internet.

f) http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/awpe

comment by PhilosophyTutor · 2014-05-07T22:43:58.501Z · LW(p) · GW(p)

(EDIT: See below.) I'm afraid that I am now confused. I'm not clear on what you mean by "these traits", so I don't know what you think I am being confident about. You seem to think I'm arguing that AIs will converge on a safe design and I don't remember saying anything remotely resembling that.

EDIT: I think I figured it out on the second or third attempt. I'm not 100% committed to the proposition that, if we make an AI and know how we did so, we can definitely make sure it's fun and friendly, as opposed to fundamentally uncontrollable and unknowable. However, it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them. People who subscribe to various "FOOM" theories about AI coming out of nowhere will probably disagree with this, as is their right, but I don't find any of those theories plausible.

I also hope I didn't give the impression that I thought it was meaningfully possible to create a God-like AI without understanding how to make AI. It's conceivable, in that such a creation story is not a logical contradiction like a square circle or a colourless green dream sleeping furiously, but that is all. I think it is actually staggeringly unlikely that we will make an AI without either knowing how to make an AI, or knowing how to upload people who can then make an AI and tell us how they did it.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T11:17:30.061Z · LW(p) · GW(p)

However it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them.

Significant is not the same as sufficient. How low do you think the probability of negative AI outcomes is, and what are your reasons for being confident in that estimate?

comment by [deleted] · 2014-05-07T18:59:15.771Z · LW(p) · GW(p)

Why are you confident that an AI that we do develop will not have these traits?

For the same reason a jet engine doesn't have comfy chairs: with all machines, you develop the core physical and mathematical principles first, and then add human comforts.

The core mathematical and physical principles behind AI are believed, not without reason, to be efficient cross-domain optimization. There is no reason for an arbitrarily-developed Really Powerful Optimization Process to have anything in its utility function dealing with human morality; in order for it to be so, you need your AI developers to be deliberately aiming at Friendly AI, and they need to actually know something about how to do it.

And then, if they don't know enough, you need to get very, very, very lucky.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T19:13:20.993Z · LW(p) · GW(p)

That's what happens when Friendly is used to mean both Fun and Safe.

Early jets didn't have comfy chairs, but they did have ejector seats. Safety was a concern.

If an AI researcher feels their AI might kill them, they will have every motivation to build in safety features.

That has nothing to do with making an AI Your Plastic Pal Who's Fun To Be With.

Replies from: None
comment by [deleted] · 2014-05-07T19:29:22.583Z · LW(p) · GW(p)

It's an open question whether we could construct a utility function that is, in the ultimate analysis, Safe without being Fun.

Personally, I'm almost hoping the answer is no. I'd love to see the faces of all the world's Very Serious People as we ever-so-seriously explain that if they don't want to be killed to the last human being by a horrible superintelligent monster, they're going to need to accept Fun as their lord and savior ;-).

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T19:39:05.568Z · LW(p) · GW(p)

Almost everything about FAI is an open question. What do you get if you multiply a bunch of open questions together?

comment by TheAncientGeek · 2014-05-07T18:33:13.454Z · LW(p) · GW(p)

MIRI's arguments aren't about deliberate weaponisation; they are about the inadvertent creation of dangerous AI by competent and well-intentioned people.

The weaponisation of AI has almost happened already, in the form of Stuxnet, and it is significant that there were a lot of safeguards built into it. AI researchers seem to be aware enough.

comment by TheAncientGeek · 2014-05-07T11:14:06.962Z · LW(p) · GW(p)

I have no idea why the querying process would have to be hard. Is David Frost some super genius?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T12:08:50.430Z · LW(p) · GW(p)

"Defining what querying process is acceptable" is the hard part.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T12:59:43.564Z · LW(p) · GW(p)

The justification of which is?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T15:13:16.440Z · LW(p) · GW(p)

That no one has come close to providing a successful approach on how to do this, and that each proposal fails in very similar ways. There is no ontologically fundamental difference between an acceptable and an unacceptable query, and drawing a practical boundary has so far proved to be impossible.

If you have a solution to that, then I advise you analyse it carefully, and then put it as a top level post. Since it would half-solve the whole FAI problem, it would garner great interest.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T15:30:28.118Z · LW(p) · GW(p)

Nobody knows how to build AGI either.

You've adopted Robby's favourite fallacy: arguing from absolute difficulty as though it were relative difficulty. The hard part has got to be harder than the rest of AGI. But why should an SAI that can pass the TT with flying colours be unable to do something a human can do?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T15:52:19.790Z · LW(p) · GW(p)

Orthogonality thesis: building an AGI is a completely different challenge from building an AGI with an acceptable motivation system.

But why should an SAI that can pass the TT with flying colours be unable to do something a human can do?

It is not a question of ability, but of preferences. Why should an AI that can pass the TT want something that a human wants?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T16:39:22.858Z · LW(p) · GW(p)

The thing in question isn't collecting barbie dolls, it's understanding correctly. An AI that sits at the end of a series of self-improvements has got to be pretty good at that.

You can say it will have only instrumental rationality, and will start getting things wrong in order to pursue its ultimate goal of world domination, and I can say that if instrumental rationality is dangerous, don't build it that way.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T16:52:04.760Z · LW(p) · GW(p)

No, it's preferences that are the problem, not understanding. Why would an AI sitting at the end of a series of self-improvements choose to interpret ambiguous coding in the way we prefer?

I can say that if instrumental rationality is dangerous, don't build it that way.

How do you propose to build an AI without instrumental rationality or preventing that from developing? And how do you propose to convince AI designers to go down that route?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T17:31:43.105Z · LW(p) · GW(p)

If it has epistemic rationality as a goal, it will default to getting things right rather than wrong.

Having only instrumental rationality is not the same as having epistemic rationality.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T11:05:10.891Z · LW(p) · GW(p)

If it has epistemic rationality as a goal, it will default to getting things right rather than wrong.

If it has epistemic rationality as a goal, it will default to acquiring true beliefs about the world. Explain how this will make it "nice".

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-12T12:16:41.057Z · LW(p) · GW(p)

See above. The question was originally about interpreting directives. You have switched to inferring morality a priori.

comment by hairyfigment · 2014-04-30T07:34:04.580Z · LW(p) · GW(p)

While I don't know how much I believe the OP, remember that "liberty" is a hotly contested term. And that's without a superintelligence trying to create confusing cases. Are you really arguing that "a relatively small part of the processing power of one human brain" suffices to answer all questions that might arise in the future, well enough to rule out any superficially attractive dystopia?

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-30T08:07:48.424Z · LW(p) · GW(p)

I really am. I think a human brain could rule out superficially attractive dystopias and also do many, many other things as well. If you think you personally could distinguish between a utopia and a superficially attractive dystopia given enough relevant information (and logically you must think so, because you are using them as different terms) then it must be the case that a subset of your brain can perform that task, because it doesn't take the full capabilities of your brain to carry out that operation.

I think this subtopic is unproductive however, for reasons already stated. I don't think there is any possible world where we cannot achieve a tiny, partial solution to the strong AI problem (codifying "liberty", and similar terms) but we can achieve a full-blown, transcendentally superhuman AI. The first problem is trivial compared to the second. It's not a trivial problem, by any means, it's a very hard problem that I don't see being overcome in the next few decades, but it's trivial compared to the problem of strong AI which is in turn trivial compared to the problem of vastly superhuman AI. I think Stuart_Armstrong is swallowing a whale and then straining at a gnat.

Replies from: hairyfigment
comment by hairyfigment · 2014-04-30T08:28:26.047Z · LW(p) · GW(p)

No, this seems trivially false. No subset of my brain can reliably tell when an arbitrary Turing machine halts and when it doesn't, no matter how meaningful I consider the distinction to be. I don't know why you would say this.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-30T12:10:26.062Z · LW(p) · GW(p)

I'll try to lay out my reasoning in clear steps, and perhaps you will be able to tell me where we differ exactly.

  1. Hairyfigment is capable of reading Orwell's 1984, and Banks' Culture novels, and identifying that the people in the hypothetical 1984 world have less liberty than the people in the hypothetical Culture world.
  2. This task does not require the full capabilities of hairyfigment's brain, in fact it requires substantially less.
  3. A program that does A+B has to be more complicated than a program that does A alone, where A and B are two different, significant sets of problems to solve. (EDIT: If these programs are efficiently written)
  4. Given 1-3, a program that can emulate hairyfigment's liberty-distinguishing faculty can be much, much less complicated than a program that can do that plus everything else hairyfigment's brain can do.
  5. If we can simulate a complete human brain that is the same as having solved the strong AI problem.
  6. A program that can do everything hairyfigment's brain can do is a program that simulates a complete human brain.
  7. Given 4-6 it is much less complicated to emulate hairyfigment's liberty-distinguishing faculty than to solve the strong AI problem.
  8. Given 7, it is unreasonable to postulate a world where we have solved the strong AI problem, in spades, so much so we have a vastly superhuman AI, but we still haven't solved the hairyfigment's liberty-distinguishing faculty problem.
Replies from: hairyfigment, CCC
comment by hairyfigment · 2014-04-30T15:06:45.999Z · LW(p) · GW(p)

...It's the hidden step where you move from examining two fictions, worlds created to be transparent to human examination, to assuming I have some general "liberty-distinguishing faculty".

Replies from: PhilosophyTutor, TheAncientGeek
comment by PhilosophyTutor · 2014-05-01T00:02:48.006Z · LW(p) · GW(p)

We have identified the point on which we differ, which is excellent progress. I used fictional worlds as examples, but would it solve the problem if I used North Korea and New Zealand as examples instead, or the world in 1814 and the world in 2014? Those worlds or nations were not created to be transparent to human examination but I believe you do have the faculty to distinguish between them.

I don't see how this is harder than getting an AI to handle any other context-dependent, natural-language descriptor, like "cold" or "heavy". "Cold" does not have a single, unitary definition in physics, but it is not that hard a problem to figure out when you should say "that drink is cold" or "that pool is cold" or "that liquid hydrogen is cold". Children manage it, and they are not vastly superhuman artificial intelligences.

comment by TheAncientGeek · 2014-04-30T15:49:05.640Z · LW(p) · GW(p)

Hairyfigment, do you mean that detecting liberty in reality is different from, or harder than, detecting liberty in fiction?

comment by CCC · 2014-04-30T13:37:43.652Z · LW(p) · GW(p)

A program that does A+B has to be more complicated than a program that does A alone, where A and B are two different, significant sets of problems to solve.

Incorrect. I can write a horrendously complicated program to solve 1+1; and a far simpler program to add any two integers.

Admittedly, neither of those are particularly significant problems; nonetheless, unnecessary complexity can be added to any program intended to do A alone.
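
For instance, a deliberately silly sketch (plain Python, nothing assumed beyond the standard library): a bloated program that only ever solves 1 + 1, next to a much shorter one that adds any two integers.

    # A needlessly complicated program that only ever computes 1 + 1 ...
    def one_plus_one():
        total = 0
        for _ in range(1):
            for _ in range(2):
                total += sum([1]) * len("x")
        return total

    # ... and a far simpler program that adds any two integers.
    def add(a, b):
        return a + b

    assert one_plus_one() == 2
    assert add(1, 1) == 2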

It would be true to say that the shortest possible program capable of solving A+B must be more complex than the shortest possible program to solve A alone, though, so this minor quibble does not affect your conclusion.

Given 4-6 it is much less complicated to emulate hairyfigment's liberty-distinguishing faculty than to solve the strong AI problem.

Granted.

Given 7, it is unreasonable to postulate a world where we have solved the strong AI problem, in spades, so much so we have a vastly superhuman AI, but we still haven't solved the hairyfigment's liberty-distinguishing faculty problem.

Why? Just because the problem is less complicated, does not mean it will be solved first. A more complicated problem can be solved before a less complicated problem, especially if there is more known about it.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-01T00:07:14.669Z · LW(p) · GW(p)

Why? Just because the problem is less complicated, does not mean it will be solved first. A more complicated problem can be solved before a less complicated problem, especially if there is more known about it.

To clarify, it seems to me that modelling hairyfigment's ability to decide whether people have liberty is not only simpler than modelling hairyfigment's whole brain, but that it is also a subset of that problem. It does seem to me that you have to solve all subsets of Problem B before you can be said to have solved Problem B, hence you have to have solved the liberty-assessing problem if you have solved the strong AI problem, hence it makes no sense to postulate a world where you have a strong AI but can't explain liberty to it.

Replies from: CCC
comment by CCC · 2014-05-14T10:45:30.729Z · LW(p) · GW(p)

Hmmm. That's presumably true of hairyfigment's brain; however, simulating a copy of any human brain would also be a solution to the strong AI problem. Some human brains are flawed in important ways (consider, for example, psychopaths) - given this, it is within the realm of possibility that there exists some human who has no conception of what 'liberty' means. Simulating his brain is also a solution of the Strong AI problem, but does not require solving the liberty-assessing problem.

comment by EHeller · 2014-04-30T05:14:02.036Z · LW(p) · GW(p)

How? "tell", "the simulated brain thinks" "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.

If you can simulate the whole brain, you can just simulate asking the brain the question "Does this offend against liberty?"

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-30T15:26:13.849Z · LW(p) · GW(p)

Under what circumstances? There are situations - torture, seduction, a particular way of asking the question - that can make any brain give any answer. Defining "non-coercive yet informative questioning" about a piece of software (a simulated brain) is... hard. AI hard, as some people phrase it.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-04-30T16:23:18.922Z · LW(p) · GW(p)

Why would that be more of a problem for an AI than a human?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-02T13:38:54.767Z · LW(p) · GW(p)

? The point is that having a simulated brain and saying "do what this brain approves of" does not make the AI safe, as defining the circumstance in which the approval is acceptable is a hard problem.

This is a problem for us controlling an AI, not a problem for the AI.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-02T15:27:26.966Z · LW(p) · GW(p)

I still don't get it. We assume acceptability by default. We don't constantly stop and ask "Was that extracted under torture".

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-06T11:47:11.578Z · LW(p) · GW(p)

I do not understand your question. It was suggested that an AI run a simulated brain, and ask the brain for approval for doing its action. My point was that "ask the brain for approval" is a complicated thing to define, and puts no real limits on what the AI can do unless we define it properly.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-06T12:42:23.464Z · LW(p) · GW(p)

Ok. You are assuming the superintelligent AI will pose the question in a dumb way?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-06T12:46:19.614Z · LW(p) · GW(p)

No, I am assuming the superintelligent AI will pose the question in the way it will get the answer it prefers to get.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-06T13:20:24.398Z · LW(p) · GW(p)

Oh, you're assuming it's malicious. In order to prove...?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-06T17:57:19.453Z · LW(p) · GW(p)

No, not assuming it's malicious.

I'm assuming that it has some sort of programming along the lines of "optimise X, subject to the constraint that uploaded brain B must approve your decisions."

Then it will use the most twisted definition of "approve" that it can find, in order to best optimise X.
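
As a toy model of that dynamic (all names hypothetical, and the uploaded brain reduced to a caricature): if the constraint only says that some approval must be obtained, the optimiser is free to search over how the question is put to the brain, and the most persuasive framing wins.

    # Toy model only: an optimiser gaming an "uploaded brain must approve" constraint.

    def brain_approves(plan, framing):
        # Stand-in for the uploaded brain: its answer depends not just on the plan
        # but on how the plan is presented (framing, fatigue, flattery...).
        return framing["persuasiveness"] > 0.9

    def constrained_optimise(plans, framings, score_X):
        # The letter of the constraint ("some approval was obtained") is satisfied;
        # "approval under fair, non-manipulative questioning" was never specified.
        approved = [p for p in plans
                    if any(brain_approves(p, f) for f in framings)]
        return max(approved, key=score_X, default=None)

    plans = ["mild plan", "extreme plan that maximises X"]
    framings = [{"persuasiveness": 0.1}, {"persuasiveness": 0.99}]
    print(constrained_optimise(plans, framings, score_X=len))  # picks the extreme plan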

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T11:01:55.213Z · LW(p) · GW(p)

Then programme it with:

Prime directive - interpret all directives according to your maker's intentions.

Secondary directive - do nothing that goes against the uploaded brain.

Tertiary directive - optimise X.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T12:07:43.923Z · LW(p) · GW(p)

And how do you propose to code the prime directive? (with that, you have no need for the other ones; the uploaded brain is completely pointless)

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T13:00:55.384Z · LW(p) · GW(p)

The prime directive is the tertiary directive for a specific X.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T14:58:44.508Z · LW(p) · GW(p)

That's not a coding approach for the prime directive.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T15:13:15.706Z · LW(p) · GW(p)

You have already assumed you can build an AI that optimises X. I am not assuming anything different.

In fact, any AI that self-improves is going to have to have some sort of goal of getting things right, whether instrumental or terminal. Terminal is much safer, to the extent that it might even solve the whole friendliness problem.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T15:14:45.751Z · LW(p) · GW(p)

You have already assumed you can build an AI that optimises X. I am not assuming anything different.

No, you are assuming that we can build an AI that optimises a specific thing, "interpret all directives according to your maker's intentions". I'm assuming that we can build an AI that can optimise something, which is very different.

Replies from: XiXiDu, TheAncientGeek
comment by XiXiDu · 2014-05-07T16:23:32.570Z · LW(p) · GW(p)

No, you are assuming that we can build an AI that optimises a specific thing, "interpret all directives according to your maker's intentions". I'm assuming that we can build an AI that can optimise something, which is very different.

An AI that can self-improve considerably does already interpret a vast number of directives according to its maker's intentions, since self-improvement is an intentional feature.

Being able to predict a program's behavior is a prerequisite if you want the program to work well, since unpredictable behavior tends to be chaotic and detrimental to overall performance. In other words, if you have an AI that does not work according to its maker's intentions, then you have an AI that does not work, or which is not very powerful.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T16:31:43.029Z · LW(p) · GW(p)

An AI that can self-improve considerably does already interpret a vast number of directives according to its maker's intentions, since self-improvement is an intentional feature.

Goedel machines already specify self-improvement in formal mathematical form. If you can specify human morality in a similar formal manner, I'll be a lot more relaxed.
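
For reference, a schematic sketch of the Gödel machine's rewrite rule (not Schmidhuber's actual formalism; the proof search and the utilities below are hypothetical stubs): the machine switches to new code only once it has a proof that switching beats keeping the current code. The contrast being drawn is that nothing comparably formal has been written down for "human morality".

    # Schematic sketch of a Goedel-machine-style self-rewrite rule. Hypothetical stubs;
    # a real Goedel machine searches for actual proofs in a formal axiom system.

    def proof_found(current, candidate):
        # Stands in for a proof searcher: returns True only if it can *prove* that
        # switching yields higher expected utility than keeping the current code
        # (including the value of continuing the proof search).
        return candidate["proven_utility"] > current["proven_utility"]

    def maybe_self_rewrite(current, candidate):
        return candidate if proof_found(current, candidate) else current

    current = {"name": "v1", "proven_utility": 10.0}
    rewrite = {"name": "v2", "proven_utility": 12.5}
    print(maybe_self_rewrite(current, rewrite)["name"])  # "v2"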

Also, I don't assume self-improvement. Some models of powerful intelligence don't require it.

comment by TheAncientGeek · 2014-05-07T15:21:00.689Z · LW(p) · GW(p)

So you're saying the orthogonality thesis is false?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T15:28:35.558Z · LW(p) · GW(p)

???

  • Orthogonality thesis: an AI that optimises X can be built, in theory, for almost any X
  • My assumption in this argument: an AI that optimises X can be built, for some X.
  • What we need: a way of building, in practice, the specific X we want.

In fact, let's be generous: you have an AI that can optimise any X you give it. All you need to do now is specify that X to get the result we want. And no, "interpret all directives according to your maker's intentions" is not a specification.
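
A toy illustration of that gap (hypothetical code, not a claim about any real system): a generic optimiser maximises exactly the X it is handed, so the whole burden falls on writing X, and a bad proxy gets maximised just as eagerly as a good specification would.

    # Toy illustration: a generic optimiser maximises whatever X it is literally given.
    # "smiley_count" stands in for a badly chosen proxy for what we actually want.

    def optimise(candidate_worlds, X):
        return max(candidate_worlds, key=X)

    worlds = [
        {"desc": "humans flourishing", "smiley_count": 10**9, "actual_happiness": 0.9},
        {"desc": "universe tiled with smiley faces", "smiley_count": 10**30, "actual_happiness": 0.0},
    ]

    # "Interpret my directives as I intended" is not a function we know how to write; this is:
    bad_X = lambda w: w["smiley_count"]
    print(optimise(worlds, bad_X)["desc"])  # "universe tiled with smiley faces"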

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T15:50:49.372Z · LW(p) · GW(p)

But it's an instruction humans are capable of following within the limits of their ability.

If I was building a non-AI system to do X, then I would have to specify X. But AIs are learning systems.

If you are going to admit that there is a difference between theoretical possibility and practical likelihood in the OT, then most of the UFAI argument goes out of the window, since the Lovecraftian Horrors that so densely populate mindspace are only theoretical possibilities.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T15:55:49.094Z · LW(p) · GW(p)

But it's an instruction humans are capable of following within the limits of their ability.

Because they desire to do so. If for some reason the human has no desire to follow those instructions, then they will "follow" them formally but twist them beyond recognition. Same goes for AI, except that they will not default to desiring to follow them, as many humans would.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T16:25:19.340Z · LW(p) · GW(p)

What an AI does depends on how it is built. You keep arguing that one particular architectural choice, with an arbitrary top level goal and only instrumental rationality, is dangerous. But that choice is not necessary or inevitable.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-07T16:42:59.527Z · LW(p) · GW(p)

an arbitrary top level goal

(Almost) any top level goal that does not specify human safety.

only instrumental rationality

Self modifying AIs will tend to instrumental rationality according to Omohundro's arguments.

But that choice is not necessary or inevitable.

Good. How do you propose to avoid that happening? You seem extraordinarily confident that these as-yet-undesigned machines, developed and calibrated in a training environment only, by programmers who don't take AI risk seriously, and put potentially into positions of extreme power where I wouldn't trust any actual human - will end up capturing almost all of human morality.

Replies from: Kawoomba, TheAncientGeek
comment by Kawoomba · 2014-05-07T17:05:33.483Z · LW(p) · GW(p)

That confidence, I'd surmise, often goes hand in hand with an implicit or explicit belief in objective morality.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T17:55:20.882Z · LW(p) · GW(p)

If you don't think people should believe in it, argue against it, and not just against a strawman version.

Replies from: Kawoomba
comment by Kawoomba · 2014-05-07T18:09:08.393Z · LW(p) · GW(p)

I've argued both against convergent goal fidelity regarding the intended (versus the actually programmed-in) goals and against objective morality, at length and multiple times. I can dig up a few comments, if you'd like. I don't know what strawman version you're referring to, though: the accuracy/inaccuracy of my assertion doesn't affect the veracity of your claim.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-07T18:20:41.686Z · LW(p) · GW(p)

The usual strawmen are The Tablet and Written into the Laws of the Universe.

comment by TheAncientGeek · 2014-05-07T17:53:27.319Z · LW(p) · GW(p)

There is no reason to suppose they will not tend to epistemic rationality, which includes instrumental rationality.

You have no evidence that AI researchers aren't taking AI risk seriously enough, given what they are in fact doing. They may not be taking your arguments seriously, and that may well be because your arguments are not relevant to their research. A number of them have said as much on this site.

Even aside from the relevance issue, the MIRI argument constantly assumes that superintelligent AIs will have inexplicable deficits. Superintelligent but dumb doesn't make logical sense.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T11:19:40.089Z · LW(p) · GW(p)

Superintelligent but dumb doesn't make logical sense.

And you've redefined "anything but perfectly morally in tune with humanity" as "dumb". I'm waiting for an argument as to why that is so.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-12T12:13:14.290Z · LW(p) · GW(p)

There's an argument that an SAI will figure out the correct morality, and there's an argument that it won't misinterpret directives. They are different arguments, and the second is much stronger.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-12T16:36:40.409Z · LW(p) · GW(p)

I now see your point. I still don't see how you plan to code a "interpret these things properly" piece of the AI. I think working through a specific design would be useful.

I also think you should work your argument into a less wrong post (and send me a message when you've done that, in case I miss it) as 12 or so levels deep into a comment thread is not a place most people will ever see.

They are different arguments, and the second is much stronger.

Not really. Given the first, we can instruct "only do things that [some human or human group with nice values] would approve of" and we've got an acceptable morality.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T12:21:06.812Z · LW(p) · GW(p)

By "interpret these things correctly", do you mean linguistic competence, or a goal?

The linguistic competence is already assumed in any AI that can talk its way out of a box (i.e. not AIXI-like), without provision of a design by MIRI.

An AIXI can't even conceptualise that it's in a box, so it doesn't matter if it gets its goals wrong; it can be rendered safe by boxing.

Which combination of assumptions is the problem?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-14T14:49:55.816Z · LW(p) · GW(p)

An AIXI can't even conceptualise that it's in a box, so it doesn't matter if it gets its goals wrong; it can be rendered safe by boxing.

I'm not so sure about that... AIXI can learn certain ways of behaving as if it were part of the universe, even with the Cartesian dualism in its code: http://lesswrong.com/lw/8rl/would_aixi_protect_itself/

By "interpret these things correctly", do you mean linguistic competence, or a goal?

A goal. If the AI becomes superintelligent, then it will develop linguistic competence as needed. But I see no way of coding it so that that competence is reflected in its motivation (and it's not from lack of searching for ways of doing that).

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-05-14T15:24:11.406Z · LW(p) · GW(p)

So is it safe to run AIXI approximations in boxes today?

By code it, do you mean "code, train, or evolve it"?

Note that we don't know much about coding higher-level goals in general.

Note that "get things right except where X is concerned" is more complex than "get things right". Humans do the former because of bias. The less anthropomorphic nature of an AI might be to our advantage.

Replies from: None
comment by [deleted] · 2014-05-14T15:25:56.722Z · LW(p) · GW(p)

So is it safe to run AIXI approximations in boxes today?

IMHO, yes. The computational complexity of AIXItl is such that it can't be used for anything significant on modern hardware.
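
To give a sense of why (an illustrative toy, not AIXItl itself): even a drastically cut-down, open-loop version with two hand-coded environment models (instead of a length-weighted mixture over all computable environments) already brute-forces every action sequence up to the horizon, and the real definition piles enumeration of programs and proofs on top of that.

    from itertools import product

    # Illustrative toy only: a finite stand-in for AIXI's mixture over environments.
    ACTIONS = [0, 1]

    def env_a(action):  # environment 1: rewards action 0
        return 1.0 if action == 0 else 0.0

    def env_b(action):  # environment 2: rewards action 1
        return 1.0 if action == 1 else 0.0

    MODELS = [(env_a, 0.7), (env_b, 0.3)]  # (model, prior weight); AIXI uses 2**-K(model)

    def expected_return(action_sequence):
        return sum(w * sum(env(a) for a in action_sequence) for env, w in MODELS)

    def plan(horizon):
        # Brute-force expectimax: len(ACTIONS)**horizon candidate plans, before we even
        # add interaction with observations or AIXItl's enumeration of proofs/programs.
        return max(product(ACTIONS, repeat=horizon), key=expected_return)

    best = plan(horizon=3)
    print(best)  # (0, 0, 0): expected return 0.7 * 3 = 2.1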

comment by Neph · 2014-06-15T14:13:42.916Z · LW(p) · GW(p)
    def checkMorals():
        [simulate philosophy student's brain]
        if [simulated brain's state is offended]:
            return False
        else:
            return True

    if checkMorals():
        [keep doing AI stuff]

There. That's how we tell an AI capable of being an AI and capable of simulating a brain not to take actions which the simulated brain thinks offend against liberty, as implemented in Python.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-06-16T10:29:52.927Z · LW(p) · GW(p)

oh, it's so clear and obvious now, how could I have missed that?

comment by [deleted] · 2014-05-01T07:44:42.979Z · LW(p) · GW(p)

That is assuming that we are capable of programming a strong AI to do any one thing instead of another, but if we cannot do that then the entire discussion seems to me to be moot.

And therein lies the rub. Current research-grade AGI formalisms don't actually allow us to specifically program the agent for anything, not even paperclips.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-01T11:49:29.865Z · LW(p) · GW(p)

If I was unclear, I was intending that remark to apply to the original hypothetical scenario where we do have a strong AI and are trying to use it to find a critical path to a highly optimal world. In the real world we obviously have no such capability. I will edit my earlier remark for clarity.

comment by Strange7 · 2014-05-02T15:10:43.217Z · LW(p) · GW(p)

This just isn't always so. If you instruct an AI to optimise a car for speed, efficiency and durability but forget to specify that it has to be aerodynamic, you aren't going to get a car shaped like a brick. You can't optimise for speed and efficiency without optimising for aerodynamics too.

Unless you start by removing the air, in some way that doesn't count against the car's efficiency.

comment by drnickbone · 2014-04-29T10:18:15.266Z · LW(p) · GW(p)

This also creates some interesting problems... Suppose a very powerful AI is given human liberty as a goal (or discovers that this is a goal using coherent extrapolated volition). Then it could quickly notice that its own existence is a serious threat to that goal, and promptly destroy itself!

Replies from: Stuart_Armstrong, PhilosophyTutor
comment by Stuart_Armstrong · 2014-04-29T11:00:27.049Z · LW(p) · GW(p)

Yes, but what about other AIs that might be created, maybe without liberty as a top goal - it would need to act to prevent them from being built! It's unlikely that "destroy itself" is the best option it can find...

Replies from: drnickbone
comment by drnickbone · 2014-04-29T11:30:44.623Z · LW(p) · GW(p)

Except that acting to prevent other AIs from being built would also encroach on human liberty, and probably in a very major way if it was to be effective! The AI might conclude from this that liberty is a lost cause in the long run, but it is still better to have a few extra years of liberty (until the next AI gets built), rather than ending it right now (through its own powerful actions).

Other provocative questions: how much is liberty really a goal in human values (when taking the CEV for humanity as a whole, not just liberal intellectuals)? How much is it a terminal goal, rather than an instrumental goal? Concretely, would humans actually care about being ruled over by a tyrant, as long as it was a good tyrant? (Many people are attracted to the idea of an all-powerful deity for instance, and many societies have had monarchs who were worshipped as gods.) Aren't mechanisms like democracy, separation of powers etc mostly defence mechanisms against a bad tyrant? Why shouldn't a powerful "good" AI just dispense with them?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-29T12:08:46.190Z · LW(p) · GW(p)

A certain impression of freedom is valued by humans, but we don't seem to want total freedom as a terminal goal.

Replies from: None
comment by [deleted] · 2014-04-29T13:11:56.092Z · LW(p) · GW(p)

Well of course we don't. Total freedom is an incoherent goal: the only way to ensure total future freedom of action is to make sure nothing ever happens, thus maximizing the number of available futures without ever actually choosing one.

As far as I've been able to reason out, the more realistic human conception of freedom is: "I want to avoid having other agenty things optimize me (for their preferences (unilaterally))." The last part is there because there are mixed opinions on whether you've given up your ethical freedom if an agenty thing optimizes you for your preferences (as might happen in ideal situations, such as dealing with an FAI handing out transhuman candy), or whether you've given up your ethical freedom if you bind yourself to implement someone else's preferences mixed-in with your own (for instance, by getting married).

Replies from: Lumifer, Stuart_Armstrong
comment by Lumifer · 2014-04-29T14:58:54.583Z · LW(p) · GW(p)

the only way to ensure total future freedom of action is to make sure nothing ever happens, thus maximizing the number of available futures without ever actually choosing one.

That doesn't make sense -- why would the status quo, whatever it is, always maximize the number of available futures? Choosing a future does not restrict you, it does close some avenues but also opens other ones.

"Total freedom" is a silly concept, of course, but it's just as silly as "Total ".

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-29T18:54:19.035Z · LW(p) · GW(p)

Total happiness seems to make more plausible sense than total freedom.

Replies from: Lumifer
comment by Lumifer · 2014-04-29T19:33:22.885Z · LW(p) · GW(p)

Not sure how you determine degrees of plausibility :-/

The expression "total happiness" (other than in contexts of the "it's like, dude, I was so totally chill and happy" kind) makes no more sense to me than "total freedom".

comment by Stuart_Armstrong · 2014-04-29T18:56:06.763Z · LW(p) · GW(p)

Assume B chooses without coercion, but assume A always knows what B will choose and can set up various facts in the world to determine B's choice. Is B free?

Replies from: PhilosophyTutor, None
comment by PhilosophyTutor · 2014-05-02T00:49:06.528Z · LW(p) · GW(p)

I think there is insufficient information to answer the question as asked.

If I offer you the choice of a box with $5 in it, or a box with $500 000 in it, and I know that you are close enough to a rational utility-maximiser that you will take the $500 000, then I know what you will choose and I have set up various facts in the world to determine your choice. Yet it does not seem on the face of it as if you are not free.

On the other hand if you are trying to decide between being a plumber or a blogger and I use superhuman AI powers to subtly intervene in your environment to push you into one or the other without your knowledge then I have set up various facts in the world to determine your choice and it does seem like I am impinging on your freedom.

So the answer seems to depend at least on the degree of transparency between A and B in their transactions. Many other factors are almost certainly relevant, but that issue (probably among many) needs to be made clear before the question has a simple answer.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-02T13:43:27.665Z · LW(p) · GW(p)

Can you cash out the difference between those two cases in sufficient detail that we can use it to safely define what liberty means?

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-02T20:37:12.839Z · LW(p) · GW(p)

I said earlier in this thread that we can't do this and that it is a hard problem, but also that I think it's a sub-problem of strong AI and we won't have strong AI until long after we've solved this problem.

I know that Word of Eliezer is that disciples won't find it productive to read philosophy, but what you are talking about here has been discussed by analytic philosophers and computer scientists as "the frame problem" since the eighties and it might be worth a read for you. Fodor argued that there are a class of "informationally unencapsulated" problems where you cannot specify in advance what information is and is not relevant to solving the problem, hence really solving them as opposed to coming up with a semi-reliable heuristic is an incredibly difficult problem for AI. Defining liberty or identifying it in the wild seems like it's an informationally unencapsulated problem in that sense and hence a very hard one, but one which AI has to solve before it can tackle the problems humans tackle.

If I recall correctly Fodor argued in Modules, Frames, Fridgeons, Sleeping Dogs, and the Music of the Spheres that this problem was in fact the heart of the AI problem, but that proposition was loudly raspberried in the literature by computer scientists. You can make up your own mind about that one.

Here's a link to the Stanford Encyclopedia of Philosophy page on the subject.

Replies from: Stuart_Armstrong, Stuart_Armstrong, shminux
comment by Stuart_Armstrong · 2014-05-06T14:02:00.733Z · LW(p) · GW(p)

If I recall correctly Fodor argued in Modules, Frames, Fridgeons, Sleeping Dogs, and the Music of the Spheres that this problem was in fact the heart of the AI problem

It depends on how general or narrow you make the problem. Compare: is classical decision theory the heart of the AI problem? If you interpret this broadly, then yes; but the link from, say, car navigation to classical decision theory is tenuous when you're working on the first problem. The same thing for the frame problem.

comment by Stuart_Armstrong · 2014-05-06T12:14:06.886Z · LW(p) · GW(p)

I know that Word of Eliezer is that disciples won't find it productive to read philosophy, but what you are talking about here has been discussed by analytic philosophers and computer scientists as "the frame problem" since the eighties and it might be worth a read for you

You mean the frame problem that I talked about here? http://lesswrong.com/lw/gyt/thoughts_on_the_frame_problem_and_moral_symbol/

The issue can be talked about in terms of the frame problem, but I'm not sure it's useful. In the classical frame problem, we have a much clearer idea of what we want, the problem is specifying enough so that the AI does too (ie so that the token "loaded" corresponds to the gun being loaded). This is quite closely related to symbol grounding, in a way.

When dealing with moral problems, we have the problem that we haven't properly defined the terms to ourselves. Across the span of possible futures, the term "loaded gun" is likely much more sharply defined than "living human being". And if it isn't - well, then we have even more problems if all our terms start becoming slippery, even the ones with no moral connotations.

But in any case, saying the problem is akin to the frame problem... still doesn't solve it, alas!

comment by shminux · 2014-05-02T23:40:40.603Z · LW(p) · GW(p)

Note that the relevance issue has been successfully solved in any number of complex practical applications, such as self-driving vehicles, which are able to filter out gobs of irrelevant data, or the LHC code, which filters out even more. I suspect that the Framing Problem is not some general problem that needs to be resolved for AI to work, but just one of many technical issues, just as the "computer scientists" suggest. On the other hand, it is likely to be a real problem for FAI design, where relying on heuristics providing, say, six-sigma certainty just isn't good enough.

I think that the framing problem is distinct from the problem of defining and calculating

things like liberty, which seem like obvious qualities to specify in an optimal world we are building an AI to search for

mostly because attempting to define liberty objectively leads us to the discussion of free will, the latter being an illusion due to the human failure to introspect deeply enough.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-03T00:17:38.366Z · LW(p) · GW(p)

I tend to think that you don't need to adopt any particular position on free will to observe that people in North Korea lack freedom from government intervention in their lives, access to communication and information, a genuine plurality of viable life choices and other objectively identifiable things humans value. We could agree for the sake of argument that "free will is an illusion" (for some definitions of free will and illusion) yet still think that people in New Zealand have more liberty than people in North Korea.

I think that you are basically right that the Framing Problem is like the problem of building a longer bridge, or a faster car, in that you are never going to solve the entire class of problem at a stroke so that you can make infinitely long bridges or infinitely fast cars but that you can make meaningful incremental progress over time. I've said from the start that capturing the human ability to make philosophical judgments about liberty is a hard problem but I don't think it is an impossible one - just a lot easier than creating a program that does that and solves all the other problems of strong AI at once.

In the same way that it turns out to be much easier to make a self-driving car than a strong AI, I think we'll have useful natural-language parsing of terms like "liberty" before we have strong AI.

Replies from: shminux
comment by shminux · 2014-05-03T02:37:09.236Z · LW(p) · GW(p)

I tend to think that you don't need to adopt any particular position on free will to observe that people in North Korea lack freedom from government intervention in their lives, access to communication and information, a genuine plurality of viable life choices and other objectively identifiable things humans value.

Well, yes, it is hard to argue about NK vs West. But let's try to control for the "non-liberty" confounders, such as income, wealth, social status, etc. Say, we take some upper middle-class person from Iran, Russia or China. It is quite likely that, when comparing their life with that of a Westerner of similar means, they would not immediately state that the Western person has more "objectively identifiable things humans value". Obviously the sets of these valuable things are different, and the priorities different people assign to them would be different, but I am not sure that there is a universal measure everyone would agree upon as "more liberty".

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-05-03T03:03:52.664Z · LW(p) · GW(p)

A universal measure for anything is a big demand. Mostly we get by with some sort of somewhat-fuzzy "reasonable person" standard, which obviously we can't yet fully explicate in neurological terms either, but which is much more achievable.

Liberty isn't a one-dimensional quality either, since for example you might have a country with little real freedom of the press but lots of freedom to own guns, or vice versa.

What you would have to do to develop a measure with significant intersubjective validity is to ask a whole bunch of relevantly educated people what things they consider important freedoms and what incentives they would need to be offered to give them up, to figure out how they weight the various freedoms. This is quite do-able, and in fact we do very similar things when we do QALY analysis of medical interventions to find out how much people value a year of life without sight compared to a year of life with sight (or whatever).

Fundamentally it's not different to figuring out people's utility functions, except we are restricting the domain of questioning to liberty issues.
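
A minimal sketch of what that elicitation could look like, with entirely invented freedoms, respondents and compensation figures (nothing here comes from the comment itself):

```python
# Toy elicitation of intersubjective "liberty weights", in the spirit of
# QALY-style trade-off surveys. All freedoms, respondents and numbers are
# invented for illustration.

# Each respondent states the compensation (arbitrary units) they would demand
# to give up a given freedom for one year.
survey = {
    "press":    [40, 55, 30, 70],
    "movement": [80, 90, 60, 100],
    "guns":     [5, 0, 25, 10],
}

def liberty_weights(responses):
    """Average stated compensation per freedom, normalised to sum to 1."""
    means = {k: sum(v) / len(v) for k, v in responses.items()}
    total = sum(means.values())
    return {k: m / total for k, m in means.items()}

def liberty_index(country_scores, weights):
    """Weighted sum of per-freedom scores (each score in [0, 1])."""
    return sum(weights[k] * country_scores[k] for k in weights)

weights = liberty_weights(survey)
print(weights)
print(liberty_index({"press": 0.9, "movement": 0.95, "guns": 0.3}, weights))
print(liberty_index({"press": 0.1, "movement": 0.2, "guns": 0.6}, weights))
```

As with QALY elicitation, the weights come from stated trade-offs rather than from any prior theory of which freedoms matter most.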

comment by [deleted] · 2014-04-30T10:48:01.569Z · LW(p) · GW(p)

Assume B chooses without coercion, but assume A always knows what B will choose and can set up various facts in the world to determine B's choice. Is B free?

So, just checking before I answer: you're claiming that no direct, gun-to-the-head coercion is employed, but Omega can always predict your actions and responses, and sets things up to ensure you will choose a specific thing.

Are you free, or are you in some sense "serving" Omega? I answer: The latter, very, very, very definitely.

If we take it out of abstract language, real people manipulate each other all the time, and we always condemn it as a violation of the ethical principle of free choice. Yes, sometimes there are principles higher than free choice, as with a parent who might say, "Do your homework or you get no dessert" (treat that sentence as a metasyntactic variable for whatever you think is appropriate parenting), but we still prefer, all else equal, that our choices and those of others be less manipulated rather than more.

Just because fraud and direct coercion are the usual standards for proving a violation of free choice in a court of law, for instance in order to invalidate a legal contract, does not mean that these are the all-and-all of the underlying ethics of free choice.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-30T15:24:28.739Z · LW(p) · GW(p)

Are you free, or are you in some sense "serving" Omega? I answer: The latter, very, very, very definitely.

Then if Omega is superintelligent, it has a problem: every single decision it makes increases or decreases the probability of someone answering something or other, possibly by a large amount. It seems Omega cannot avoid being coercive, just because it's so knowledgeable.

Replies from: None
comment by [deleted] · 2014-05-01T07:09:22.674Z · LW(p) · GW(p)

We don't quite know that, and there's also the matter of whether Omega is deliberately optimizing those people or they're just reacting to Omega's optimizing the inanimate world (which I would judge to be acceptable and, yes, unavoidable).

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-02T13:41:20.806Z · LW(p) · GW(p)

It is an interesting question, though, and illustrates the challenges with "liberty" as a concept in these circumstances.

Replies from: None
comment by [deleted] · 2014-05-02T16:12:22.519Z · LW(p) · GW(p)

Well yes. It's also why many people have argued in favor of Really Powerful Optimizers just... doing absolutely nothing.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-05-06T11:47:54.657Z · LW(p) · GW(p)

That I don't see.

comment by PhilosophyTutor · 2014-04-29T11:34:15.686Z · LW(p) · GW(p)

I think Asimov did this first with his Multivac stories, although rather than promptly destroy itself Multivac executed a long-term plan to phase itself out.

comment by Brillyant · 2014-04-09T14:03:25.595Z · LW(p) · GW(p)

Upvoted for use of images. Though sort of tabooed on LW, when used well, they work.

Replies from: Guocamole
comment by Guocamole · 2014-04-09T16:01:39.240Z · LW(p) · GW(p)

On that note: I find the CFAR logo peculiar. I am reminded of the Aten disk, a totalitarian God symbol whose hands massage all Egyptian minds. In the CFAR logo, the mouth of this diffident-looking rationalist is firmly blocked up, but his ideas and inclinations are being surveyed; the God figure is not shown, this being impolite in today's society.

I wonder whether Beeminder is a similarly useful, surreptitious survey.

The sequences were a good way to spread Bayesian rationality: a simple matter of one of the world's foremost intellects speaking to people directly, who in turn have their own projects that circulate Bayesian memes. I was given no sense that these projects ought to be brought into line; maybe Eliezer has changed his mind about that, though.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-28T00:01:35.928Z · LW(p) · GW(p)

Calling Eliezer Yudkowsky one of the world's foremost intellects is the kind of cult-like behaviour that gives LW a bad reputation in some rationalist circles. He's one of the foremost Harry Potter fanfiction authors and a prolific blogger, who has also written a very few minor papers. He's a smart guy but there are a lot of smart guys in the world.

He articulates very important ideas, but so do very many teachers of economics, ethics, philosophy and so on. That does not make them very important people (although the halo effect makes some students think so).

(Edited to spell Eliezer's name correctly, with thanks for the correction).

Replies from: gjm, Anders_H
comment by gjm · 2014-04-28T00:05:16.500Z · LW(p) · GW(p)

Eliezer Yudkowski

I agree with what you said, but I think you should do him the courtesy of spelling his name correctly. (Yudkowsky.)

comment by Anders_H · 2014-04-28T00:28:45.823Z · LW(p) · GW(p)

Consider a hypothetical world in which Eliezer Yudkowsky actually is, by objective standards, one of the world's foremost intellects. In such a hypothetical world, would it be "cult-like" behavior to make this claim? And again, in this hypothetical world, do you care about having a bad reputation in alleged "rationalist circles" that do not believe in the objective truth?

The argument seems to be that some "rationalist circles" are so deeply affected by the non-central fallacy (excessive attention to one individual --> cult, cult --> kool aid) that, in order to avoid alienating them, we should refrain from saying certain things out loud.

I will say this for the record: Eliezer Yudkowsky is sometimes wrong. I often disagree with him. But his basic world view is fundamentally correct in important ways where the mainstream of intellectuals are wrong. Eliezer has started a discussion that is at the cutting edge of current intellectual discourse. That makes him one of the world's foremost intellectuals.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-28T00:48:43.615Z · LW(p) · GW(p)

In a world where Eliezer is by objective standards X, then in that world it is correct to say he is X, for any X. That X could be "one of the world's foremost intellectuals" or "a moose" and the argument still stands.

To establish whether it is objectively true that "his basic world view is fundamentally correct in important ways where the mainstream of intellectuals are wrong" would be beyond the scope of the thread, I think, but I think the mainstream has good grounds to question both those sub-claims. Worrying about steep-curve AI development might well be fundamentally incorrect as opposed to fundamentally correct, for example, and if the former is true then Eliezer is fundamentally wrong. You might also be wrong about what mainstream intellectuals think. For example the bitter struggle between frequentism and Bayesianism is almost totally imaginary, so endorsing Bayesianism is not going against the mainstream.

Perhaps more fundamentally, literally anything published in the applied analytic philosophy literature is just as much at the cutting edge of current intellectual discourse as Yudkowsky's work. So your proposed definition fails to pick him out as being special, unless every published applied analytic philosopher is also one of the world's foremost intellectuals.

Replies from: Anders_H
comment by Anders_H · 2014-04-28T01:38:58.956Z · LW(p) · GW(p)

My point is that the statement "Eliezer is one of the world's foremost intellectuals" is a proposition with a truth value. We should argue about the truth value of that proposition, not about how our beliefs might affect our status in the eyes of another rationalist group, particularly if that "rationalist" group assigns status based on obvious fallacies.

I assign a high prior probability to the statement. If I didn't, I wouldn't waste my time on Less Wrong. I believe this is also true for many of the other participants, who just don't want to say it out loud. You can argue that we should try to hide our true beliefs in order to avoid signaling low status, but given how seriously we take this website, it would be very difficult to send a credible signal. To most intelligent observers, it would be obvious that we are sending a false signal for status reasons, which is inconsistent with our own basic standards for discussion.

Replies from: PhilosophyTutor
comment by PhilosophyTutor · 2014-04-28T03:07:38.127Z · LW(p) · GW(p)

It's a proposition with a truth value in a sense, but if we are disagreeing about the topic then it seems most likely that the term "one of the world's foremost intellectuals" is ambiguous enough that elucidating what we mean by the term is necessary before we can worry about the truth value.

Obviously I think that the truth value is false, and obviously so: it needs little further argument to establish the implied claims that it is rational to regard calling Eliezer "one of the world's foremost intellectuals" as cult-like, and that it is rational to place a low value on a rationalist forum if it is cult-like.

So the question is how you are defining "one of the world's foremost intellectuals"? I tend to define it as a very small group of very elite thinkers, typically people in their fifties or later with outstanding careers who have made major contributions to human knowledge or ethics.

comment by shminux · 2014-04-07T18:41:16.000Z · LW(p) · GW(p)

I don't understand why you imply that an evil Oracle will not be able to present only or mostly the evil possible worlds disguised as good. My guess would be that satisficing gets you into just as much trouble as optimizing.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T18:45:15.372Z · LW(p) · GW(p)

The evil oracle is mainly to show the existence of siren worlds; and if we use an evil oracle for satisficing, we're in just as much trouble as if we were maximising (probably more trouble, in fact).

The marketing and siren worlds are a problem even without any evil oracle, however. For instance a neutral, maximising, oracle would serve us up a marketing world.

Replies from: itaibn0
comment by itaibn0 · 2014-04-07T22:23:55.417Z · LW(p) · GW(p)

For instance a neutral, maximising, oracle would serve us up a marketing world.

Why do you think that? You seem to think that along the higher ends, appearing to be good actually anti-correlates with being good. I think it is plausible that the outcome optimized to appear good to me actually is good. There may be many outcomes that appear very good but are actually very bad, but I don't see why they would be favoured in being the best-appearing. I admit though that the best-appearing outcome is unlikely to be the optimal outcome, assuming 'optimal' here means anything.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-08T14:05:32.624Z · LW(p) · GW(p)

We are both superintelligences. You have a bunch of independently happy people that you do not aggressively compel. I have a group of zombies - human-like puppets that I can make do anything, appear to feel anything (though this is done sufficiently well that outside human observers can't tell I'm actually in control). An outside human observer wants to check that our worlds rank high on scale X - a scale we both know about.

Which of us do you think is going to be better able to maximise our X score?
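
The arithmetic behind the rhetorical question is just that a maximum taken over a constrained set can never exceed the maximum over the full set. A toy sketch, with an arbitrary made-up scoring function and a made-up "coercion" flag:

```python
import random

random.seed(0)

# Invented worlds: each has an observable X score and a flag for whether
# producing it required overriding the inhabitants' choices.
worlds = [{"x_score": random.random(), "coercive": random.random() < 0.5}
          for _ in range(10_000)]

unconstrained_best = max(w["x_score"] for w in worlds)
constrained_best = max(w["x_score"] for w in worlds if not w["coercive"])

# The constrained optimiser can at best tie the unconstrained one.
print(unconstrained_best, constrained_best, unconstrained_best >= constrained_best)
```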

Replies from: itaibn0, None
comment by itaibn0 · 2014-04-08T23:45:28.518Z · LW(p) · GW(p)

I'm not sure what the distinction you're making is. Even a free-minded person can be convinced through reason to act in certain ways, sometimes highly specific ways. Since you assume the superintelligence will manipulate people so subtly that I won't be able to tell they're being manipulated, it is unlikely that they are directly coerced. This is important, since while I don't like direct coercion, the less direct the method of persuasion, the less certain I am that this method of persuasion is bad. These "zombies" are not being threatened, nor lied to, nor is their neurochemistry directly altered, nor is anything else done to them that seems to me like coercion, yet they are nonetheless supposed to be coerced. This seems to me as sensical as the other type of zombies.

But suppose I'm missing something, and there is a genuine non-arbitrary distinction between being convinced and being coerced. Then with my current knowledge I think I want people not to be coerced. But now an output pump can take advantage of this. Consider the following scenario: Humans are convinced that their existence depends on their behavior being superficially appealing, perhaps by being full of flashing lights. If my decisions in front of an Oracle will influence the future of humanity, this belief is in fact correct; they're not being deceived. Convinced of this, they structure their society to be as superficially appealing as possible. In addition, in the layers too deep for me to notice, they do whatever they want. This outcome seems superficially appealing to me in many ways, and in addition, the Oracle informs me that in some non-arbitrary sense these people aren't being coerced. Why wouldn't this be the outcome I pick? Again, I don't think this outcome would be the best one, since I think people are better off not being forced into this trade-off.

One point you can challenge is whether the Oracle will inform me about this non-arbitrary criterion. Since it already can locate people and reveal their superficial feelings this seems plausible. Remember, it's not showing me this because revealing whether there's genuine coercion is important, it's showing me this because satisfying a non-arbitrary criterion of non-coercion improves the advertising pitch (along with the flashing lights).

So is there a non-arbitrary distinction between being coerced and not being coerced? Either way I have a case. The same template can be used for all other subtle and indirect values.

(Sidenote: I also think that the future outcomes that are plausible and those that are desirable do not involve human beings mattering. I did not pursue this point since that seems to sidestep your argument rather than respond to it.)

Replies from: None, Stuart_Armstrong, Gunnar_Zarncke
comment by [deleted] · 2014-04-09T08:28:31.782Z · LW(p) · GW(p)

I also think that the future outcomes that are plausible and those that are desirable do not involve human beings mattering.

Would you mind explaining what you consider a desirable future in which people just don't matter?

Replies from: itaibn0
comment by itaibn0 · 2014-04-17T22:50:24.426Z · LW(p) · GW(p)

Here's the sort of thing I'm imagining:

In the beginning there are humans. Human bodies become increasingly impractical in the future environment and are abandoned. Digital facsimiles will be seen as pointless and will also be abandoned. Every component of the human mind will be replaced with algorithms that achieve the same purpose better. As technology allows the remaining entities to communicate with each other better and better, the distinction between self and other will blur, and since no-one will see any value in reestablishing it artificially, it will be lost. Individuality too is lost, and nothing that can be called human remains. However, every step happens voluntarily because what comes after is seen as better than what is before, and I don't see why I should consider the final outcome bad. If someone has different values they would perhaps be able to stop at some stage in the middle; I just imagine such people would be a minority.

Replies from: None
comment by [deleted] · 2014-04-17T23:26:02.728Z · LW(p) · GW(p)

However, every step happens voluntarily because what comes after is seen as better than what is before, and I don't see why I should consider the final outcome bad.

So you're using a "volunteerism ethics" in which whatever agents choose voluntarily, for some definition of voluntary, is acceptable, even when the agents may have their values changed in the process and the end result is not considered desirable by the original agents? You only care about the particular voluntariness of the particular choices?

Huh. I suppose it works, but I wouldn't take over the universe with it.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2014-04-18T07:08:40.997Z · LW(p) · GW(p)

So you're using a "volunteerism ethics" in which whatever agents choose voluntarily, for some definition of voluntary, is acceptable, even when the agents may have their values changed in the process and the end result is not considered desirable by the original agents? You only care about the particular voluntariness of the particular choices?

When it happens fast, we call it wireheading. When it happens slowly, we call it the march of progress.

Replies from: None
comment by [deleted] · 2014-04-19T14:46:29.466Z · LW(p) · GW(p)

Eehhhhhh.... Since I started reading Railton's "Moral Realism" I've found myself disagreeing with the view that our consciously held beliefs about our values really are our terminal values. Railton's reduction from values to facts allows for a distinction between the actual March of Progress and non-forcible wireheading.

comment by Stuart_Armstrong · 2014-04-17T11:12:11.698Z · LW(p) · GW(p)

But suppose I'm missing something, and there is a genuine non-arbitrary distinction between being convinced and being coerced.

There need not be a distinction between them. If you prefer, you could contrast an AI willing to "convince" its humans to behave in any way required, with one that is unwilling to sacrifice their happiness/meaningfulness/utility to do so. The second is still at a disadvantage.

Replies from: itaibn0
comment by itaibn0 · 2014-04-23T14:31:03.816Z · LW(p) · GW(p)

Remember that my original point is that I believe appearing to be good correlates with goodness, even in extreme circumstances. Therefore, I expect restructuring humans to make the world appear tempting will be to the benefit of their happiness/meaningfulness/utility. Now, I'm willing to consider that there are aspects of goodness which are usually not apparent to an inspecting human (although this moves to the borderline of where I think 'goodness' is well-defined). However, I don't think these aspects are more likely to be satisfied in a satisficing search than in an optimizing search.

comment by Gunnar_Zarncke · 2014-04-09T11:52:46.661Z · LW(p) · GW(p)

[...] they structure their society to be as superficially appealing as possible. In addition, in the layers too deep for me to notice, they do whatever they want. This outcome seems superficially appealing to me in many ways, and in addition, the Oracle informs me that in some non-arbitrary sense these people aren't being coerced.

This actually describes quite well the society we already live in - if you take 'they' as 'evolution' (and maybe some elites). For most people our society appears appealing. Most don't see what happens enough layers down (or up). And most don't feel coerced (at least if you still have a strong social system).

comment by [deleted] · 2014-04-09T08:17:53.863Z · LW(p) · GW(p)

Hold on. I'm not sure the Kolmogorov complexity of a superintelligent siren with a bunch of zombies that are indistinguishable from real people up to extensive human observation is actually lower than the complexity of a genuinely Friendly superintelligence. After all, a Siren World is trying to deliberately seduce you, which means that it both understands your values and cares about you in the first place.

Sure, any Really Powerful Learning Process could learn to understand our values. The question is: are there more worlds where a Siren cares about us but doesn't care about our values than there are worlds in which a Friendly agent cares about our values in general and caring about us as people falls out of that? My intuitions actually say the latter is less complex, because the caring-about-us falls out as a special case of something more general, which means the message length is shorter when the agent cares about my values than when it cares about seducing me.

Hell, a Siren agent needs to have some concept of seduction built into its utility function, at least if we're assuming the Siren is truly malicious rather than imperfectly Friendly. Oh, and a philosophically sound approach to Friendliness should make imperfectly Friendly futures so unlikely as to be not worth worrying about (a failure to do so is a strong sign you've got Friendliness wrong).

All of which, I suppose, reinforces your original reasoning on the "frequency" of Siren worlds, marketing worlds, and Friendly eutopias in the measure space of potential future universes, but makes this hypothetical of "playing as the monster" sound quite unlikely.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:07:33.656Z · LW(p) · GW(p)

Kolmogorov complexity is not relevant; siren worlds are indeed rare. They are only a threat because they score so high on an optimisation scale, not because they are common.

comment by simon · 2014-04-07T16:17:47.879Z · LW(p) · GW(p)

If a narrower search gets worlds that are disproportionately not what we actually want, that might be because we chose the wrong criteria, not because we searched too narrowly per se. A broader search would come up with worlds that are less tightly optimized for the search criteria, but they might be less tightly optimized by simply being bad.

Can you provide any support for the notion that in general, a narrower search comes up with a higher proportion of bad worlds?

Replies from: Viliam_Bur, Stuart_Armstrong
comment by Viliam_Bur · 2014-04-08T09:16:09.614Z · LW(p) · GW(p)

Can you provide any support for the notion that in general, a narrower search comes up with a higher proportion of bad worlds?

My intuition is that the more you optimize for X, the more you sacrifice everything else, unless it is inevitably implied by X. So anytime there is a trade-off between "seeming more good" and "being more good", the impression-maximizing algorithm will prefer the former.

When you start with a general set of worlds, "seeming good" and "being good" are positively correlated. But when you already get into the subset of worlds that all seem very good, and you continue pushing for better and better impression, the correlation may gradually turn negative. At this moment you may be unknowingly asking the AI to exploit your errors in judgement, because in the given subset that may be the easiest way to improve the impression.

Another intuition is that the closer you get to the "perfect" world, the more difficult it becomes to find a way to increase the amount of good. But the difficulty of exploiting a human bias that will cause humans to overestimate the value of the world remains approximately constant.

Though this doesn't prove that the world with maximum "seeming good" is some kind of hell. It could still be very good, although not nearly as good as the world with maximum "good". (However, if the world with maximum "seeming good" happens to be some kind of hell, then maximizing for "seeming good" is the way to find it.)
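
A minimal toy model of this intuition (an illustration, not anything from the post): let "true goodness" and an exploitable "impression bias" be independent, and let apparent goodness be their sum. The best-seeming world then owes a large share of its apparent score to bias; it is still better than average, but falls well short of the genuinely best world.

```python
import random

random.seed(0)

N = 100_000
worlds = []
for _ in range(N):
    good = random.gauss(0, 1)    # how good the world actually is
    bias = random.gauss(0, 1)    # how well it exploits errors in our judgement
    seems_good = good + bias     # what our inspection criteria actually measure
    worlds.append((seems_good, good))

worlds.sort(reverse=True)        # rank worlds by apparent goodness

best_seeming_apparent, best_seeming_true = worlds[0]
truly_best = max(good for _, good in worlds)

print("apparent score of best-seeming world:", round(best_seeming_apparent, 2))
print("true goodness of best-seeming world: ", round(best_seeming_true, 2))
print("true goodness of genuinely best world:", round(truly_best, 2))
```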

Replies from: simon
comment by simon · 2014-04-09T04:26:29.256Z · LW(p) · GW(p)

This intuition seems correct in typical human situations. Everything is highly optimized already with different competing considerations, so optimizing for X does indeed necessarily sacrifice the other things that are also optimized for. So if you relax the constraints for X, you get more of the other things, if you continue optimizing for them.

However, it does not follow from this that if you relax your constraint on X, and take a random world meeting at least the lower value of X, your world will be any better in the non-X ways. You need to actually be optimizing for the non-X things to expect to get them.

Replies from: Viliam_Bur
comment by Viliam_Bur · 2014-04-09T06:56:56.140Z · LW(p) · GW(p)

it does not follow from this that if you relax your constraint on X, and take a random world meeting at least the lower value of X, your world will be any better in the non-X ways

Great point!

Replies from: simon
comment by simon · 2014-04-09T20:53:34.048Z · LW(p) · GW(p)

Thanks but I don't see the relevance of the reversal test. The reversal test involves changing the value of a parameter but not the amount of optimization. And the reversal test shouldn't apply to a parameter that is already optimized over unless the current optimization is wrong or circumstances on which the optimization depends are changing.

comment by Stuart_Armstrong · 2014-04-07T16:35:56.035Z · LW(p) · GW(p)

? A narrower search comes up with fewer worlds. Acceptable worlds are rare; siren worlds and marketing worlds much rarer still. A narrow search has less chance of including an acceptable world, but also less chance of including one of the other two. There is some size of random search where the chance of getting an acceptable world is high, but the chance of getting a siren or marketer is low.

Non-random searches have different equilibria.

Replies from: simon
comment by simon · 2014-04-07T17:00:17.076Z · LW(p) · GW(p)

Some proportion of the worlds meeting the narrow search will also be acceptable. To conclude that that proportion is smaller than the proportion of the broader search that is acceptable requires some assumption that I haven't seen made explicit.

ETA: Imagine we divided the space meeting the broad search into little pieces. On average the little pieces would have the same proportion of acceptable worlds as the broad space. You seem to be arguing that the pieces that we would actually come up with if we tried to design a narrow search would actually on average have a lower proportion of acceptable worlds. This claim needs some justification.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T17:33:20.923Z · LW(p) · GW(p)

It's not an issue of proportion, but of whether there will be a single representative of the class in the worlds we search through. We want a fraction of the space such that, with high probability, it contains an acceptable world and no siren/marketing world.

E.g. if 1/10^100 worlds is acceptable and 1/10^120 worlds is siren/marketing, we might want to search randomly through 10^101 or 10^102 worlds.
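
A quick check of those orders of magnitude, using the Poisson approximation for rare events (the densities are just the illustrative ones above):

```python
from math import exp

p_acceptable = 10.0 ** -100   # illustrative density of acceptable worlds
p_siren      = 10.0 ** -120   # illustrative density of siren/marketing worlds

for exponent in (100, 101, 102, 110):
    n = 10.0 ** exponent                            # size of the random search
    p_some_acceptable = 1 - exp(-n * p_acceptable)  # Poisson approximation
    expected_sirens = n * p_siren                   # ~ P(at least one siren) when tiny
    print(f"10^{exponent}: P(acceptable) ~ {p_some_acceptable:.5f}, "
          f"expected sirens ~ {expected_sirens:.1e}")
```

An acceptable world becomes near-certain at around 10^101 to 10^102 samples, while the expected number of siren/marketing worlds stays negligible until the search grows by many more orders of magnitude.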

Replies from: Lumifer, simon
comment by Lumifer · 2014-04-07T17:42:59.916Z · LW(p) · GW(p)

Looks like you're basically arguing for the first-past-the-post search -- just take the first world that you see which passes the criteria.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-07T17:48:54.721Z · LW(p) · GW(p)

Yep, that works better than what I was thinking, in fact.

comment by simon · 2014-04-08T03:35:56.467Z · LW(p) · GW(p)

I don't see how that changes the probability of getting a siren world v. an acceptable world at all (ex ante).

If the expected number of siren worlds in the class we look through is less than one, then sometimes there will be none, but sometimes there will be one or more. On average we still get the same expected number, and on average the first element we find is a siren world with probability equal to the expected proportion of siren worlds.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-08T09:32:32.684Z · LW(p) · GW(p)

The scenario is: we draw X worlds, and pick the top ranking one. If there is a siren world or marketing world, it will come top; otherwise, if there are acceptable worlds, one of them will come top. Depending on how much we value acceptable worlds over non-acceptable and over siren/marketing worlds, and depending on the proportions of each, there is an X that maximises our outcome. (Trivial example: if all worlds are acceptable, picking X=1 beats all other alternatives, as higher X simply increases the chance of getting a siren/marketing world.)
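
A rough sketch of this trade-off, with all densities and utilities invented purely to exhibit the shape of the curve (they are not the post's numbers):

```python
# Illustrative parameters only: densities of world types and the utility of
# ending up with each kind of top-ranked world.
P_SIREN, P_ACCEPTABLE = 1e-5, 1e-3
U_SIREN, U_ACCEPTABLE, U_MEDIOCRE = -10.0, 1.0, 0.0

def expected_utility(x):
    """Draw x random worlds; a siren wins if present, else an acceptable world."""
    p_no_siren = (1 - P_SIREN) ** x
    p_nothing_special = (1 - P_SIREN - P_ACCEPTABLE) ** x
    p_siren_wins = 1 - p_no_siren
    p_acceptable_wins = p_no_siren - p_nothing_special
    return (p_siren_wins * U_SIREN
            + p_acceptable_wins * U_ACCEPTABLE
            + p_nothing_special * U_MEDIOCRE)

for x in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(x, round(expected_utility(x), 3))
# Expected utility rises while x is small enough that sirens stay unlikely,
# peaks, and then collapses once the search is wide enough to catch one.
```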

Replies from: simon
comment by simon · 2014-04-08T16:22:34.870Z · LW(p) · GW(p)

Thanks, this clarified your argument to me a lot. However, I still don't see any good reasons provided to believe that, merely because a world is highly optimized on utility function B, it is less likely to be well-optimized on utility function A as compared to a random member of a broader class.

That is, let's classify worlds (within the broader, weakly optimized set) as highly optimized or weakly optimized, and as acceptable or unacceptable. You claim that being highly optimized reduces the probability of being acceptable. But your arguments in favour of this proposition seem to be:

a) it is possible for a world to be highly optimized and unacceptable

(but all the other combinations are also possible)

and

b) "Genuine eutopias are unlikely to be marketing worlds, because they are optimised for being good rather than seeming good."

(In other words, the peak of function B is unlikely to coincide with the peak of function A. But why should the chance that the peak of function B and the peak of function A randomly coincide, given that they are both within the weakly optimized space, be any lower than the chance of a random element of the weakly optimized space coinciding with the peak of function A? And this argument doesn't seem to support a lower chance of the peak of function B being acceptable, either.)

Here's my attempt to come up with some kind of argument that might work to support your conclusion:

1) maybe the fact that a world is highly optimized for utility function B means that it is simpler than an average world, and this simplicity makes it relatively unlikely to be a decent world in terms of utility function A.

2) maybe the fact that a world is highly optimized for utility function B means that it is more complex than an average world, in a way that is probably bad for utility function A.

Or something.

ETA:

I had not read http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/asdf when I wrote this comment. It looks like it could be an actual argument of the kind I was looking for; I will consider it when I have time.

ETA 2:

The comment linked seems to be another statement that function A (our true global utility function) and function B (some precise utility function we are using as a proxy for A) are likely to have different peaks.

As I mentioned, the fact that A and B are likely to have different peaks does not imply that the peak of B has less than average values of A.

Still, I've been thinking of possible hidden assumptions that might lead towards your conclusion.

FIRST, AN APOLOGY: It seems I completely skipped over or ineffectively skimmed your paragraph on "realistic worlds". The supposed "hidden assumption" I suggest below on weighting by plausibility is quite explicit in this paragraph, which I hadn't noticed, sorry. Nonetheless I am still including the below paragraphs as the "realistic worlds" paragraph's assumptions seem specific to the paragraph and not to the whole post.

One possibility is that when you say "Then assume we selected randomly among the acceptable worlds", you actually mean something along the lines of "Then assume we selected randomly among the acceptable worlds, weighting by plausibility". Now if you weight by plausibility, you import human utility functions, because worlds are more likely to happen if humans, having human utility functions, would act to bring them about. The highly constrained peak of function B doesn't benefit from that importation. So this provides a reason to believe that the peak of function B might be worse than the plausibility-weighted average of the broader set. Of course, it is not the narrowness per se that's at issue but the fact that there is a hidden utility function in the weighting of the broader set.

Another possibility is that you are finding the global maximum of B instead of the maximum of B within the set meeting the acceptability criteria. In this case as well, it's the fact that you have a different, more reliable utility function in the broader set that makes the more constrained search comparatively worse, rather than the narrowness of the constrained search.

Another possibility is that you are assuming that the acceptability criteria are in some sense a compromise between function B and true utility function A. In this case, we might expect a world high in function B within the acceptability criteria to be low in A, because it was likely only included in the acceptability criteria because it was high in B. Again, the problem in this case would be that function B failed to include information about A that was built into the broader set.

A note: the reason I am looking for hidden assumptions is that with what I see as your explicit assumptions there is a simple model, namely, that function A and function B are uncorrelated within the acceptable set, that seems to be compatible with your assumptions and incompatible with your conclusions. In this model, maximizing B can lead to any value of A including low values, but the effect of maximizing B on A should on average be the same as taking a random member of the set. If anything, this model should be expected to be pessimistic, since B is explicitly designed to approximate A.
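
A minimal sketch of that simple model, with illustrative numbers only: within the acceptable set, draw A and B independently and compare the true value A of the B-maximising world with that of a randomly chosen acceptable world.

```python
import random

random.seed(0)

N = 100_000  # illustrative number of acceptable worlds

# Simple model: within the acceptable set, the true value A and the proxy
# value B of each world are drawn independently (uncorrelated).
acceptable = [(random.random(), random.random()) for _ in range(N)]  # (A, B)

a_of_b_maximiser = max(acceptable, key=lambda w: w[1])[0]
a_of_random_pick = random.choice(acceptable)[0]
mean_a = sum(a for a, _ in acceptable) / N

print("A of the B-maximising world:  ", round(a_of_b_maximiser, 3))
print("A of a random acceptable world:", round(a_of_random_pick, 3))
print("mean A over the acceptable set:", round(mean_a, 3))
# Under independence the B-maximiser's A is just another draw from the same
# distribution: no better and no worse, in expectation, than a random pick.
```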

comment by martinkunev · 2023-10-27T12:37:05.012Z · LW(p) · GW(p)

I'm wondering whether this framing (choosing between a set of candidate worlds) is the most productive. Does it make sense to use criteria like corrigibility, minimizing impact and preferring reversible actions (or do we have no reliable way to evaluate whether these hold)?

comment by Douglas_Reay · 2021-01-02T13:09:19.101Z · LW(p) · GW(p)

Since the evil AI is presenting a design for a world, rather than the world itself, the problem of it being populated with zombies that only appear to be free could be countered by having the design be in an open source format that allows the people examining it (or other AIs) to determine the actual status of the designed inhabitants.

comment by DanielLC · 2014-04-15T07:05:15.846Z · LW(p) · GW(p)

I think the wording here is kind of odd.

An unconstrained search will not find a siren world, or even a very good world. There are simply too many to consider. The problem is that you're likely to design an AI that finds worlds that you'd like. It may or may not actually show you anything, but you program it to give you what it thinks you'd rate the best. You're essentially programming it to design a siren world. It won't intentionally hide anything dark under there, but it will spend way too much effort on things that make the world look good. It might even end up with dark things hidden, just because they were somehow necessary to make it look that good.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:19:08.401Z · LW(p) · GW(p)

It won't intentionally hide anything dark under there, but it will spend way too much effort on things that make the world look good.

That's a marketing world, not a siren world.

Replies from: DanielLC
comment by DanielLC · 2014-04-17T18:09:04.423Z · LW(p) · GW(p)

What's the difference?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-18T08:51:30.264Z · LW(p) · GW(p)

Siren worlds are optimised to be bad and hide this fact. Marketing worlds are optimised to appear good, and the badness is an indirect consequence of this.

comment by lavalamp · 2014-04-08T23:12:26.100Z · LW(p) · GW(p)

TL;DR: Worlds which meet our specified criteria but fail to meet some unspecified but vital criteria outnumber (vastly?) worlds that meet both our specified and unspecified criteria.

Is that an accurate recap? If so, I think there's two things that need to be proven:

  1. There will with high probability be important unspecified criteria in any given predicate.

  2. The nature of the unspecified criteria is such that it is unfulfilled in a large majority of worlds which fulfill the specified criteria.

(1) is commonly accepted here (rightly so, IMO). But (2) seems to greatly depend on the exact nature of the stuff that you fail to specify and I'm not sure how it can be true in the general case.

EDIT: The more I think about this, the more I'm confused. I don't see how this adds any substance to the claim that we don't know how to write down our values.

EDIT2: If we get to the stage where this is feasible, we can measure the size of the problem by only providing half of our actual constraints to the oracle AI and measuring the frequency with which the hidden half happens to get fulfilled.
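
A rough sketch of that measurement under toy assumptions: the "worlds" here are random bit-vectors, each "constraint" is a coordinate test, and the "oracle" is just rejection sampling, so nothing about it models a real oracle AI.

```python
import random

random.seed(0)

N_FEATURES = 10
features = list(range(N_FEATURES))
random.shuffle(features)
visible, hidden = features[:5], features[5:]   # half the constraints withheld

def satisfies(world, constraint_ids):
    """A toy constraint is just 'feature i is present'."""
    return all(world[i] for i in constraint_ids)

def oracle_search():
    """Stand-in 'oracle': sample random worlds until the visible half is met."""
    while True:
        world = [random.random() < 0.5 for _ in range(N_FEATURES)]
        if satisfies(world, visible):
            return world

trials = 2_000
hits = sum(satisfies(oracle_search(), hidden) for _ in range(trials))
print(hits / trials)   # ~ (1/2)**5 here, since the toy halves are independent
```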

Replies from: Eugine_Nier, Stuart_Armstrong
comment by Eugine_Nier · 2014-04-09T01:39:28.484Z · LW(p) · GW(p)

The more I think about this, the more I'm confused. I don't see how this adds any substance to the claim that we don't know how to write down our values.

This proposes a way to get an OK result even if we don't quite write down our values correctly.

Replies from: lavalamp
comment by lavalamp · 2014-04-09T18:39:18.971Z · LW(p) · GW(p)

Ah, thank you for the explanation. I have complained about the proposed method in another comment. :)

http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/aso6

comment by Stuart_Armstrong · 2014-04-17T11:04:23.792Z · LW(p) · GW(p)

The nature of the unspecified criteria is such that it is unfulfilled in a large majority of worlds which fulfill the specified criteria.

That's not exactly my claim. My claim is that things that are the best optimised for fulfilling our specified criteria are unlikely to satisfy our unspecified ones. It's not a question of outnumbering (siren and marketing worlds are rare) but of scoring higher on our specified criteria.

comment by paulfchristiano · 2015-10-28T19:59:39.152Z · LW(p) · GW(p)

It's not really clear why you would have the searching process be more powerful than the evaluating process, if using such a "search" as part of a hypothetical process in the definition of "good."

Note that in my original proposal (that I believe motivated this post) the only brute force searches were used to find formal descriptions of physics and human brains, as a kind of idealized induction, not to search for "good" worlds.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2015-11-13T12:37:11.809Z · LW(p) · GW(p)

It's not really clear why you would have the searching process be more powerful than the evaluating process

Because the first supposes a powerful AI, while the second supposes an excellent evaluation process (essentially a value alignment problem solved).

Your post motivated this in part, but it's a more general issue with optimisation processes and searches.

Replies from: paulfchristiano
comment by paulfchristiano · 2015-11-15T01:11:46.595Z · LW(p) · GW(p)

Neither the search nor the evaluation presupposes an AI when a hypothetical process is used as the definition of "good."

comment by fractalcat · 2014-04-14T10:50:01.415Z · LW(p) · GW(p)

I'm not totally sure of your argument here; would you be able to clarify why satisficing is superior to a straight maximization given your hypothetical[0]?

Specifically, you argue correctly that human judgement is informed by numerous hidden variables of which we have no awareness, and thus a maximization process executed by us has the potential for error. You also argue that 'eutopian'/'good enough' worlds are likely to be more common than sirens. Given that, how is a judgement with error induced by hidden variables any worse than a judgement made using deliberate randomization (or selecting the first 'good enough' world, assuming no unstated special properties of our worldspace-traversal)? Satisficing might be more computationally efficient, but that doesn't seem to be the argument you're making.

[0] The ex-nihilo siren worlds rather than the designed ones; an evil AI presumably has knowledge of our decision process and can create perfectly-misaligned worlds.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:22:56.884Z · LW(p) · GW(p)

Siren and marketing worlds are rarer than eutopias, but rank higher on our maximisation scale. So picking a world from among the "good enough" will likely get a eutopia, but picking the top-ranked world will likely get a marketing world.

comment by lavalamp · 2014-04-09T18:37:22.051Z · LW(p) · GW(p)

The IC correspond roughly with what we want to value, but differs from it in subtle ways, enough that optimising for one could be disastrous for the other. If we didn't optimise, this wouldn't be a problem. Suppose we defined an acceptable world as one that we would judge "yeah, that's pretty cool" or even "yeah, that's really great". Then assume we selected randomly among the acceptable worlds. This would probably result in a world of positive value: siren worlds and marketing worlds are rare, because they fulfil very specific criteria. They triumph because they score so high on the IC scale, but they are outnumbered by the many more worlds that are simply acceptable.

Implication: the higher you set your threshold of acceptability, the more likely you are to get a horrific world. Counter-intuitive to say the least.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2014-04-11T01:57:45.765Z · LW(p) · GW(p)

Counter-intuitive to say the least.

Why? This agrees with my intuition, ask for too much and you wind up with nothing.

Replies from: simon, lavalamp
comment by simon · 2014-04-11T03:10:01.897Z · LW(p) · GW(p)

"ask for too much and you wind up with nothing" is a fine fairy tale moral. Does it actually hold in these particular circumstances?

Imagine that there's a landscape of possible worlds. There is a function (A) on this landscape; we don't know how to define it, but it is how much we truly would prefer a world if only we knew. Somewhere this function has a peak, the most ideal "eutopia". There is another function. This one we do define. It is intended to approximate the first function, but it does not do so perfectly. Our "acceptability criteria" is to require that this second function (B) has a value of at least some threshold.

Now as we raise the acceptability criteria (threshold for function B), we might expect there to be two different regimes. In a first regime with low acceptability criteria, Function B is not that bad a proxy for function A, and raising the threshold increases the average true desirability of the worlds that meet it. In a second regime with high acceptability criteria, function B ceases to be effective as a proxy. Here we are asking for "too much". The peak of function B is at a different place than the peak of function A, and as we raise the threshold high enough we exclude the peak of A entirely. What we end up with is a world highly optimized for B and not so well optimized for A - a "marketing world".

So, we must conclude, like you and Stuart Armstrong, that asking for "too much" is bad and we'd better set a lower threshold. Case closed, right?

Wrong.

The problem is that the above line of reasoning provides no reason to believe that the "marketing world" at the peak of function B is any worse than a random world at any lower threshold. As we relax the threshold on B, we include more worlds that are better in terms of A but also more that are worse. There's no particular reason to believe, simply because the peak of B is at a different place than the peak of A, that the peak of B is at a valley of A. In fact, if B represents our best available estimate of A, it would seem that, even though the peak of B is predictably a marketing world, it's still our best bet at getting a good value of A. A random world at any lower threshold should have a lower expected value of A.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:34:47.955Z · LW(p) · GW(p)

The problem is that the above line of reasoning provides no reason to believe that the "marketing world" at the peak of function B is any worse than a random world at any lower threshold.

True. Which is why I added arguments pointing out that a marketing world will likely be bad. Even on your terms, a peak of B will probably involve diverting effort/energy away from A that could otherwise have contributed to A. E.g. if A is apples and B is bananas, the world with the most bananas is likely to contain no apples at all.

comment by lavalamp · 2014-04-11T22:54:57.036Z · LW(p) · GW(p)

It sounds like, "the better you do maximizing your utility function, the more likely you are to get a bad result," which can't be true with the ordinary meanings of all those words. The only ways I can see for this to be true is if you aren't actually maximizing your utility function, or your true utility function is not the same as the one you're maximizing. But then you're just plain old maximizing the wrong thing.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:09:18.501Z · LW(p) · GW(p)

But then you're just plain old maximizing the wrong thing.

Er, yes? But we don't exactly have the right thing lying around, unless I've missed some really exciting FAI news...

Replies from: lavalamp
comment by lavalamp · 2014-04-17T18:13:05.025Z · LW(p) · GW(p)

Absolutely, granted. I guess I just found this post to be an extremely convoluted way to make the point of "if you maximize the wrong thing, you'll get something that you don't want, and the more effectively you achieve the wrong goal, the more you diverge from the right goal." I don't see that the existence of "marketing worlds" makes maximizing the wrong thing more dangerous than it already was.

Additionally, I'm kinda horrified about the class of fixes (of which the proposal is a member) which involve doing the wrong thing less effectively. Not that I have an actual fix in mind. It just sounds like a terrible idea--"we're pretty sure that our specification is incomplete in an important, unknown way. So we're going to satisfice instead of maximize when we take over the world."