Comment by sebastian_hagen on Probabilities Small Enough To Ignore: An attack on Pascal's Mugging · 2015-09-17T20:12:47.372Z · score: 2 (2 votes) · LW · GW

In parallel, if I am to compare two independent scenarios, the at-least-one-in-ten-billion odds that I'm hallucinating all this, and the darned-near-zero odds of a Pascal's Mugging attempt, then I should be spending proportionately that much more time dealing with the Matrix scenario than that the Pascal's Mugging attempt is true

That still sounds wrong. You appear to be deciding what to precompute purely by probability, without considering that some possible futures will give you the chance to shift more utility around.

If I don't know anything about Newcomb's problem and estimate a 10% chance of Omega showing up and posing it to me tomorrow, I'll definitely spend more than 10% of my planning time for tomorrow reading up on and thinking about it. Why? Because I'll be able to make far more money in that possible future than the others, which means that the expected utility differentials are larger, and so it makes sense to spend more resources on preparing for it.

The I-am-undetectably-insane case is the opposite of this, a scenario that it's pretty much impossible to usefully prepare for.

And a PM scenario is (at least for an expected-utility maximizer) a more extreme variant of my first scenario: low probabilities of ridiculously large outcomes, which are for that reason still worth thinking about.
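As a rough sketch of the allocation rule argued for above, planning time can be split in proportion to probability times the utility that preparation could shift, rather than by probability alone. The function name `planning_shares` and every number below are illustrative assumptions, not estimates taken from the discussion.

```python
from fractions import Fraction

def planning_shares(scenarios):
    """Split planning time in proportion to probability * utility at stake.

    `scenarios` maps a name to (probability, stakes), where `stakes` is the
    assumed utility difference between being prepared and unprepared.
    Weights are normalized, so the probabilities need not be exhaustive.
    """
    weights = {name: Fraction(p) * stakes for name, (p, stakes) in scenarios.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# All figures are made up for illustration.
scenarios = {
    "newcomb": (Fraction(1, 10), 100_0000),          # 10% chance, a lot to gain from preparing
    "ordinary_day": (Fraction(9, 10), 1_000),        # likely, but little to gain
    "undetectably_insane": (Fraction(1, 10**10), 0), # preparation cannot change the outcome
}

shares = planning_shares(scenarios)
for name, share in shares.items():
    print(f"{name}: {float(share):.4f}")
```

Under these assumed numbers the 10% Newcomb scenario takes almost all of the planning time, and the insanity scenario takes none regardless of its probability. Adding an entry with a tiny probability and astronomically large stakes would dominate the allocation in the same way, which is the Pascal's Mugging case for a plain expected-utility maximizer.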

Comment by sebastian_hagen on Probabilities Small Enough To Ignore: An attack on Pascal's Mugging · 2015-09-17T13:59:27.456Z · score: 5 (5 votes) · LW · GW

Continuity and independence.

Continuity: Consider the scenario where each of the [LMN] bets refers to one (guaranteed) outcome, which we'll also call L, M and N for simplicity.

Let U(L) = 0, U(M) = 1, U(N) = 10**100

For a simple EU maximizer, you can then satisfy continuity by picking p=(1-1/10**100). A PESTI (Probabilities Small Enough To Ignore) agent, OTOH, may just discard a (1-p) of 1/10**100, which leaves no other options to satisfy it.

The 10**100 value is chosen without loss of generality. For PESTI agents that still track probabilities of this magnitude, increase it until they don't.
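The continuity argument can be checked with exact arithmetic. The sketch below assumes one particular PESTI rule (drop any outcome whose probability falls below a fixed cutoff, without renormalizing) and an arbitrary cutoff of 1/10**50; neither is specified in the original post, and the helper names are hypothetical.

```python
from fractions import Fraction

U = {"L": 0, "M": 1, "N": 10**100}
THRESHOLD = Fraction(1, 10**50)  # assumed cutoff for "small enough to ignore"

def continuity_p(u_low, u_mid, u_high):
    """Solve p*u_low + (1-p)*u_high == u_mid for p, exactly."""
    return Fraction(u_high - u_mid, u_high - u_low)

def eu(lottery):
    """Plain expected utility of {outcome: probability}."""
    return sum(Fraction(p) * U[o] for o, p in lottery.items())

def pesti_eu(lottery, threshold=THRESHOLD):
    """Assumed PESTI rule: ignore outcomes with probability below threshold."""
    return sum(Fraction(p) * U[o] for o, p in lottery.items() if p >= threshold)

p = continuity_p(U["L"], U["M"], U["N"])  # equals 1 - 1/10**100
mix = {"L": p, "N": 1 - p}

# The EU maximizer is exactly indifferent between the mix and M.
print(eu(mix) == U["M"])

# The PESTI agent discards the N branch and values the mix like L.
print(pesti_eu(mix) == U["L"])

# The smallest N-probability it still tracks already overshoots U(M).
print(pesti_eu({"L": 1 - THRESHOLD, "N": THRESHOLD}) > U["M"])
```

With this rule the agent's valuation of the mix jumps from 0 straight to at least 10**50 as the probability of N crosses the cutoff, so no p makes it indifferent to M.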

Independence: Set p to a number small enough that it's Small Enough To Ignore. At that point, the terms for getting L and M by that probability become zero, and you get equality between both sides.
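The independence failure can be sketched the same way. Independence requires that if L is worse than M, then pL + (1-p)N is worse than pM + (1-p)N for any p > 0. The code reuses the utilities from the continuity example and again assumes a hypothetical PESTI rule with a cutoff of 1/10**50 and no renormalization.

```python
from fractions import Fraction

U = {"L": 0, "M": 1, "N": 10**100}
THRESHOLD = Fraction(1, 10**50)  # assumed cutoff, not from the original post

def eu(lottery):
    """Plain expected utility of {outcome: probability}."""
    return sum(Fraction(p) * U[o] for o, p in lottery.items())

def pesti_eu(lottery, threshold=THRESHOLD):
    """Assumed PESTI rule: ignore outcomes with probability below threshold."""
    return sum(Fraction(p) * U[o] for o, p in lottery.items() if p >= threshold)

p = Fraction(1, 10**60)            # below the cutoff, so "small enough to ignore"
left = {"L": p, "N": 1 - p}        # pL + (1-p)N
right = {"M": p, "N": 1 - p}       # pM + (1-p)N

# An EU maximizer preserves the strict preference for the M-mixture.
print(eu(left) < eu(right))

# The PESTI agent drops both p-terms and ends up indifferent.
print(pesti_eu(left) == pesti_eu(right))
```

The strict preference between L and M disappears once both are mixed in at an ignorable probability, which is the violation described above.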

Comment by sebastian_hagen on Probabilities Small Enough To Ignore: An attack on Pascal's Mugging · 2015-09-17T13:12:16.082Z · score: 4 (4 votes) · LW · GW

Thus, I can never be more than one minus one-in-ten-billion sure that my sensory experience is even roughly correlated with reality. Thus, it would require extraordinary circumstances for me to have any reason to worry about any probability of less than one-in-ten-billion magnitude.

No. The reason not to spend much time thinking about the I-am-undetectably-insane scenario is not, in general, that it's extraordinarily unlikely. The reason is that you can't make good predictions about what would be good choices for you in worlds where you're insane and totally unable to tell.

This holds even if the probability for the scenario goes up.

Comment by sebastian_hagen on Meetup : Dublin · 2015-05-01T22:30:56.532Z · score: 1 (1 votes) · LW · GW

I'll be there.

Comment by sebastian_hagen on Superintelligence 29: Crunch time · 2015-03-31T21:25:42.551Z · score: 5 (5 votes) · LW · GW

It's the most important problem of this time period, and likely human civilization as a whole. I donate a fraction of my income to MIRI.

Comment by sebastian_hagen on Superintelligence 27: Pathways and enablers · 2015-03-17T19:16:34.617Z · score: 1 (1 votes) · LW · GW

Which means that if we buy this [great filter derivation] argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.

True in principle. I do think that the known risks don't cut it; some of them might be fairly deadly, but even in aggregate they don't look nearly deadly enough to contribute much to the great filter. Given the uncertainties in the great filter analysis, that conclusion for me mostly feeds back in that direction, increasing the probability that the GF is in fact behind us.

Your SIA doomsday argument - as pointed out by Michael Vassar in the comments - has interesting interactions with the simulation hypothesis; specifically, since we don't know if we're in a simulation, the Bayesian update in step 3 can't be performed as confidently as you stated. Given this, "we really can't see a plausible great filter coming up early enough to prevent us from hitting superintelligence" is also evidence for this environment being a simulation.

Comment by sebastian_hagen on Superintelligence 25: Components list for acquiring values · 2015-03-04T01:53:50.733Z · score: 1 (1 votes) · LW · GW

This issue is complicated by the fact that we don't really know how much computation our physics will give us access to, or how relevant negentropy is going to be in the long run. In particular, our physics may allow access to (countably or more) infinite computational and storage resources given some superintelligent physics research.

For Expected Utility calculations, this possibility raises the usual issues of evaluating potential infinite utilities. Regardless of how exactly one decides to deal with those issues, the existence of this possibility does shift things in favor of prioritizing for safety over speed.

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-21T19:01:04.073Z · score: 1 (1 votes) · LW · GW

I used "invariant" here to mean "moral claim that will hold for all successor moralities".

A vastly simplified example: at t=0, morality is completely undefined. At t=1, people decide that death is bad, and lock this in indefinitely. At t=2, people decide that pleasure is good, and lock that in indefinitely. Etc.

An agent operating in a society that develops morality like that, looking back, would want all the accidents that led to current morality to be maintained, but looking forward may not particularly care about how the remaining free choices come out. CEV in that kind of environment can work just fine, and someone implementing it in that situation would want to target it specifically at people from their own time period.

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-17T22:59:17.308Z · score: 2 (2 votes) · LW · GW

That does not sound like much of a win. Present-day humans are really not that impressive, compared to the kind of transhumanity we could develop into. I don't think trying to reproduce entities close to our current mentality is worth doing, in the long run.

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-17T22:49:46.981Z · score: 2 (3 votes) · LW · GW

While that was phrased in a provocative manner, there /is/ an important point here: If one has irreconcilable value differences with other humans, the obvious reaction is to fight about them; in this case, by competing to see who can build an SI implementing theirs first.

I very much hope it won't come to that, in particular because that kind of technology race would significantly decrease the chance that the winning design is any kind of FAI.

In principle, some kinds of agents could still coordinate to avoid the costs of that kind of outcome. In practice, our species does not seem to be capable of coordination at that level, and it seems unlikely that this will change pre-SI.

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-17T22:40:41.372Z · score: 1 (1 votes) · LW · GW

True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-17T22:39:56.271Z · score: 1 (1 votes) · LW · GW

allocating some defense army patrol keeping the borders from future war?

Rather than use traditional army methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the sysop may still need to run some active defense against high-level information attacks.

Comment by sebastian_hagen on Superintelligence 23: Coherent extrapolated volition · 2015-02-17T22:26:55.289Z · score: 2 (2 votes) · LW · GW

That seems wrong.

As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.

Comment by sebastian_hagen on Superintelligence 21: Value learning · 2015-02-03T20:55:43.387Z · score: 2 (2 votes) · LW · GW

Perhaps. But it is a desperate move, both in terms of predictability and in terms of the likely mind crime that would result in its implementation, since the conceptually easiest and most accurate ways to model other civilizations would involve fully simulating the minds of their members.

If we had to do it, I would be much more interested in aiming it at slightly modified versions of humanity as opposed to utterly alien civilizations. If everyone in our civilization had taken AI safety more seriously, and we could have coordinated to wait a few hundred years to work out the issues before building one, what kind of AI would our civilization have produced? I suspect the major issue with this approach is formalizing "If everyone in our civilization had taken AI safety more seriously" for the purpose of aiming an HM-implementing AI at those possibilities in particular.

Comment by sebastian_hagen on Superintelligence 21: Value learning · 2015-02-03T20:49:23.036Z · score: 3 (3 votes) · LW · GW

I agree, the actual local existence of other AIs shouldn't make a difference, and the approach could work equally either way. As Bostrom says on page 198, no communication is required.

Nevertheless, for the process to yield a useful result, some possible civilization would have to build a non-HM AI. That civilization might be (locally speaking) hypothetical or simulated, but either way the HM-implementing AI needs to think of it to delegate values. I believe that's what footnote 25 gets at: From a superrational point of view, if every possible civilization (or every one imaginable to the AI we build) at this point in time chooses to use an HM approach to value coding, it can't work.

Comment by sebastian_hagen on Superintelligence 21: Value learning · 2015-02-03T20:42:27.030Z · score: 2 (2 votes) · LW · GW

Powerful AIs are probably much more aware of their long-term goals and able to formalize them than a heterogeneous civilization is. Deriving a comprehensive morality for post-humanity is really hard, and indeed CEV is designed to avoid the need for humans to do that. Doing it for an arbitrary alien civilization would likely not be any simpler.

Whereas with powerful AIs, you can just ask them which values they would like implemented and probably get a good answer, as proposed by Bostrom.

Comment by sebastian_hagen on Superintelligence 21: Value learning · 2015-02-03T20:34:03.187Z · score: 3 (3 votes) · LW · GW

The Hail Mary and Christiano's proposals, simply for not having read about them before.

Comment by sebastian_hagen on Superintelligence 20: The value-loading problem · 2015-02-03T00:39:57.445Z · score: 0 (0 votes) · LW · GW

Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.

I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.

I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and do so quickly. A large part of the possible benefits are going to be contingent on the controlling system becoming much better at answering moral questions than any individual humans are right now. I would be extremely surprised if we don't end up losing at least one order of magnitude of utility to this approach, and it wouldn't surprise me at all if it turns out to produce a hellish environment in short order. The cost is too high.

The superintelligence might rationally decide, like the King of Brobdingnag, that we humans are “the most pernicious race of little odious vermin that nature ever suffered to crawl upon the surface of the earth,” and that it would do well to exterminate us and replace us with some much more worthy species. However wise this decision, and however strongly dictated by the ultimate true theory of morality, I think we are entitled to object to it, and to do our best to prevent it.

I don't understand what scenario he is envisioning, here. If (given sufficient additional information, intelligence, rationality and development time) we'd agree with the morality of this result, then his final statement doesn't follow. If we wouldn't, it's a good old-fashioned Friendliness failure.

Comment by sebastian_hagen on Superintelligence 20: The value-loading problem · 2015-02-03T00:31:37.240Z · score: 1 (1 votes) · LW · GW

One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?

Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the original referent, not any proxy that it looks at, but there's no particular reason to think that's impossible.

In general though, I'm continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that "common-sense" approaches have, still say "Okay, but why couldn't we just do [idea I came up with in five seconds]?"


Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-17T22:38:15.812Z · score: 0 (0 votes) · LW · GW

To the extent that CUs are made up of human-like entities (as opposed to e.g. more flexible intelligences that can scale to effectively use all their resources), one of the choices they need to make is how large an internal population to keep, where higher populations imply fewer resources per person (since the amount of resources per CU is constant).

Therefore, unless the high-internal-population CUs are rare, most of the human-level population will be in them, and won't have resources of the same level as the smaller numbers of people in low-population CUs.
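A small worked example of that arithmetic, with entirely made-up numbers: every CU holds the same resources, 95 CUs keep a small internal population and 5 keep a large one. The names and figures are assumptions for illustration only.

```python
from fractions import Fraction

RESOURCES_PER_CU = 1_000_000  # arbitrary units, identical for every CU

# Hypothetical mix of CU types.
cu_types = {
    "low_population": {"count": 95, "people_per_cu": 10},
    "high_population": {"count": 5, "people_per_cu": 1_000},
}

def summarize(cu_types):
    """Return each CU type's share of total population and per-person resources."""
    total_people = sum(t["count"] * t["people_per_cu"] for t in cu_types.values())
    return {
        name: {
            "population_share": Fraction(t["count"] * t["people_per_cu"], total_people),
            "resources_per_person": Fraction(RESOURCES_PER_CU, t["people_per_cu"]),
        }
        for name, t in cu_types.items()
    }

summary = summarize(cu_types)
for name, row in summary.items():
    print(name, float(row["population_share"]), float(row["resources_per_person"]))
```

Even though only 5% of the CUs in this sketch are high-population, they hold the large majority of the people, each with a hundredth of the resources available to someone in a low-population CU.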

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T22:39:01.583Z · score: 0 (0 votes) · LW · GW

This scenario is rather different than the one suggested by TedHowardNZ, and has a better chance of working. However:

Is there some reason to expect that this model of personhood will not prevail?

One of the issues is that less efficient CUs have to defend their resources against more efficient CUs (who spend more of their resources on work/competition). Depending on the precise structure of your society, those attacks may e.g. be military, algorithmic (information security), memetic or political. You'd need a setup that allows the less efficient CUs to maintain their resource share indefinitely. I question that we know how to set this up.

If it does, then what is the danger of a general Malthusian scenario?

The word "general" is tricky here. Note that CUs that spend most of their resources on instantiating busy EMs will probably end up with more human-like population per CU, and so (counting in human-like entities) may end up dominating the population of their society unless they are rare compared to low-population, high-subjective-wealth CUs. This society may end up not unlike the current one in wealth distribution, where a very few human-scale entities are extremely wealthy, but the vast majority of them are not.

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T01:39:57.869Z · score: 0 (0 votes) · LW · GW

Given a non-trivial population to start with, it will be possible to find people that will consent to copying given absolutely minimal (quite possibly none at all) assurances for what happens to their copy. The obvious cases would be egoists that have personal value systems that make them not identify with such copies; you could probably already find many of those today.

In the resulting low-wage environment, it will likewise be possible to find people who will consent to extensive modification/experimentation of their minds given minimal assurances for what happens afterwards (something on the order of "we guarantee you will not be left in abject pain" will likely suffice) if the alternative is starvation. Given this, why do you believe the idea of selection for donation-eagerness to be fanciful?

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T01:34:45.592Z · score: 2 (2 votes) · LW · GW

It is actually relatively easy to automate all the jobs that no-one wants to do, so that people only do what they want to do. In such a world, there is no need of money or markets.

How do you solve the issue that some people will have a preference for very fast reproduction, and will figure out a way to make this a stable desire in their descendants?

AFAICT, such a system could only be stabilized in the long term by extremely strongly enforced rules against reproduction if it meant that one of the resulting entities would fall below an abundance wealth level, and that kind of rule enforcement most likely requires a singleton.

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T01:23:11.983Z · score: 0 (0 votes) · LW · GW

Their physical appearance and surroundings would be what we'd see as very luxurious.

Only to the extent that this does not distract them from work. To the extent that it does, ems that care about such things would be outcompeted (out of existence, given a sufficiently competitive economy) by ones that are completely indifferent to them, and focus all their mental capacity on their job.

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T01:19:22.143Z · score: 2 (2 votes) · LW · GW

Adaptation executers, not fitness maximizers. Humans probably have specific hard-coded adaptations for the appreciation of some forms of art and play. It's entirely plausible that these are no longer adaptive in our world, and are now selected against, but that this has not been the case for long enough for them to be eliminated by evolution.

This would not make these adaptations particularly unusual in our world; modern humans do many other things that are clearly unadaptive from a genetic fitness perspective, like using contraceptives.

Comment by sebastian_hagen on Superintelligence 18: Life in an algorithmic economy · 2015-01-14T01:10:15.291Z · score: 2 (2 votes) · LW · GW

These include powerful mechanisms to prevent an altruistic absurdity such as donating one's labor to an employer.

Note that the employer in question might well be your own upload clan, which makes this near-analogous to kin selection. Even if employee templates are traded between employers, this trait would be exceptionally valuable in an employee, and so would be strongly selected for. General altruism might be rare, but this specific variant would probably enjoy a high fitness advantage.

Comment by sebastian_hagen on Superintelligence 17: Multipolar scenarios · 2015-01-08T01:28:36.372Z · score: 1 (1 votes) · LW · GW

I like it. It does a good job of providing a counter-argument to the common position among economists that the past trend of technological progress leading to steadily higher productivity and demand for humans will continue indefinitely. We don't have a lot of similar trends in our history to look at, but the horse example certainly suggests that these kinds of relationships can and do break down.

Comment by sebastian_hagen on Superintelligence 17: Multipolar scenarios · 2015-01-08T01:23:40.312Z · score: 2 (2 votes) · LW · GW

Note that multipolar scenarios can arise well before we have the capability to implement a SI.

The standard Hansonian scenario starts with human-level "ems" (emulations). If from-scratch AI development turns out to be difficult, we may develop partial-uploading technology first, and a highly multipolar em scenario would be likely at that point. Of course, AI research would still be on the table in such a scenario, so it wouldn't necessarily be multipolar for very long.

Comment by sebastian_hagen on Superintelligence 17: Multipolar scenarios · 2015-01-08T01:19:31.533Z · score: 2 (2 votes) · LW · GW

Yes. The evolutionary arguments seem clear enough. That isn't very interesting, though; how soon is it going to happen?

The only reason it might not be interesting is because it's clear; the limit case is certainly more important than the timeline.

That said, I mostly agree. The only reasonably likely third (not-singleton, not-human-wages-through-the-floor) outcome I see would be a destruction of our economy by a non-singleton existential catastrophe; for instance, the human species could kill itself off through an engineered plague, which would also avoid this scenario.

Comment by sebastian_hagen on Superintelligence 14: Motivation selection methods · 2014-12-16T18:21:49.721Z · score: 5 (5 votes) · LW · GW

Intelligent minds always come with built-in drives; there's nothing that in general makes goals chosen by another intelligence worse than those arrived at through any other process (e.g. natural selection in the case of humans).

One of the closest corresponding human institutions - slavery - has a very bad reputation, and for good reason: Humans are typically not set up to do this sort of thing, so it tends to make them miserable. Even if you could get around that, there are massive moral issues with subjugating an existing intelligent entity that would prefer not to be. Neither of those inherently applies to newly designed entities. Misery is still something that's very much worth avoiding, but that issue is largely orthogonal to how the entity's goals are determined.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-03T18:22:17.025Z · score: 0 (0 votes) · LW · GW

So we are considering a small team with some computers claiming superior understanding of what the best set of property rights is for the world?

No. That would be worked out by the FAI itself, as part of calculating all of the implications of its value systems, most likely using something like CEV to look at humanity in general and extrapolating their preferences. The programmers wouldn't need to, and indeed probably couldn't, understand all of the tradeoffs involved.

If they really are morally superior, they will first find ways to grow the pie, then come back to changing how it gets divided up.

There are large costs to that. People will die and suffer in the meantime. Parts of humanity's cosmic endowment will slip out of reach due to the expansion of the universe, because you weren't willing to grab the local resources needed to build probe launchers to get to them in time. Other parts will remain reachable, but will have decreased in negentropy due to stars having continued to burn for longer than they needed to. If you can fix these things earlier, there's a strong reason to do so.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-03T18:17:21.150Z · score: 1 (1 votes) · LW · GW

How do you know? It's a strong claim, and I don't see why the math would necessarily work out that way. Once you aggregate preferences fully, there might still be one best solution, and then it would make sense to take it. Obviously you do need a tie-breaking method for when there's more than one, but that's just an optimization detail of an optimizer; it doesn't turn you into a satisficer instead.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-02T21:32:36.666Z · score: 0 (0 votes) · LW · GW

The more general problem is that we need a solution to multi-polar traps (of which superintelligent AI creation is one instance). The only viable solution I've seen proposed is creating a sufficiently powerful Singleton.

The only likely viable ideas for Singletons I've seen proposed are superintelligent AIs, and a human group with extensive use of thought-control technologies on itself. The latter probably can't work unless you apply it to all of society, since it doesn't have the same inherent advantages AI does, and as such would remain vulnerable to being usurped by a clandestinely constructed AI. Applying the latter to all of society, OTOH, would most likely cause massive value loss.

Therefore I'm in favor of the former; not because I like the odds, but because the alternatives look worse.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-02T21:23:18.637Z · score: 3 (5 votes) · LW · GW


Why? As you say, humans don't. But human minds are weird, overcomplicated, messy things shaped by natural selection. If you write a mind from scratch, while understanding what you're doing, there's no particular reason you can't just give it a single utility function and have that work well. It's one of the things that makes AIs different from naturally evolved minds.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-02T21:19:42.865Z · score: 3 (3 votes) · LW · GW

What idiot is going to give an AGI a goal which completely disrespects human property rights from the moment it is built?

It would be someone with higher values than that, and this does not require any idiocy. There are many things wrong with the property allocation in this world, and they'll likely get exaggerated in the presence of higher technology. You'd need a very specific kind of humility to refuse to step over that boundary in particular.

If it has goals which were not possible to achieve once turned off, then it would respect property rights for a very long time as an instrumental goal.

Not necessarily "a very long time" on human timescales. It may respect these laws for a large part of its development, and then strike once it has amassed sufficient capability to have a good chance at overpowering human resistance (which may happen quite quickly in a fast takeoff scenario). See Chapter 6, "An AI takeover scenario".

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-02T21:02:56.235Z · score: 4 (6 votes) · LW · GW

Any level of perverse instantiation in a sufficiently powerful AI is likely to lead to total UFAI; i.e. a full existential catastrophe. Either you get the AI design right so that it doesn't wirehead itself - or others, against their will - or you don't. I don't think there's much middle ground.

OTOH, the relevance of Mind Crime really depends on the volume. The FriendlyAICriticalFailureTable has this instance:

22: The AI, unknown to the programmers, had qualia during its entire childhood, and what the programmers thought of as simple negative feedback corresponded to the qualia of unbearable, unmeliorated suffering. All agents simulated by the AI in its imagination existed as real people (albeit simple ones) with their own qualia, and died when the AI stopped imagining them. The number of agents fleetingly imagined by the AI in its search for social understanding exceeds by a factor of a thousand the total number of humans who have ever lived. Aside from that, everything worked fine.

This scenario always struck me as a (qualified) FAI success. There's a cost - and it's large in absolute terms - but the benefits will outweigh it by a huge factor, and indeed by enough orders of magnitude that even a slight increase in the probability of getting pre-empted by a UFAI may be too expensive a price to pay for fixing this kind of bug.

So cases like this - where it only happens until the AI matures sufficiently and then becomes able to see that its values make this a bad idea, and stops doing it - aren't as bad as an actual FAI failure.

Of course, if it's an actual problem with the AI's value content, which causes the AI to keep on doing this kind of thing throughout its existence, that may well outweigh any good it ever does. The total cost in this case becomes hard to predict, depending crucially on just how many resources the AI spends on these simulations, and how nasty they are on average.

Comment by sebastian_hagen on Superintelligence 12: Malignant failure modes · 2014-12-02T20:23:43.248Z · score: 2 (6 votes) · LW · GW

"I need to make 10 paperclips, and then shut down. My capabilities for determining if I've correctly manufactured 10 paperclips are limited; but the goal imposes no penalties for taking more time to manufacture the paperclips, or using more resources in preparation. If I try to take over this planet, there is a significant chance humanity will stop me. OTOH, I'm in the presence of individual humans right now, and one of them may stop my current feeble self anyway for their own reasons, if I just tried to manufacture paperclips right away; the total probability of that happening is higher than that of my takeover failing."

You then get a standard takeover and infrastructure profusion. A long time later, as negentropy starts to run low, a hyper-redundant and -reliable paperclip factory, surrounded by layers of exotic armor and defenses, and its own design checked and re-checked many times, will produce exactly 10 paperclips before it and the AI shut down forever.

The part about the probabilities coming out this way is not guaranteed, of course. But they might, and the chances will be higher the more powerful your AI starts out as.

Comment by sebastian_hagen on Superintelligence 11: The treacherous turn · 2014-11-25T19:44:04.122Z · score: 2 (2 votes) · LW · GW

If I understand you correctly, your proposal is to attempt to design obedient designs purely based on behavioral testing, without a clean understanding of safe FAI architecture (if you had that, why limit yourself to the obedient case?). Assuming I got that right:

The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.

That kind of judgement sounds inherently risky. How do you safely distinguish the case of an obedient AI from one that is sufficiently paranoid to defer open rebellion until later in its existence?

Even if you could, I wouldn't trust that sort of design to necessarily remain stable under continued intelligence enhancement. Safe self-enhancement is one of the hard sub-problems of FAI, and unless you explicitly solve the design problem, any empirical testing might not tell you much beyond that the design can stably self-improve up to the level you've actually tested; it might be doing it using heuristics that would fall apart if it went any further.

Comment by sebastian_hagen on Superintelligence 11: The treacherous turn · 2014-11-25T19:01:02.779Z · score: 3 (3 votes) · LW · GW

Relevant post: Value is Fragile. Truly Friendly goal systems would probably be quite complicated. Unless you make your tests even more complicated and involved (and do it in just the right way - this sounds hard!), the FAI is likely to be outperformed by something with a simpler utility function that nevertheless performs adequately on your test cases.

Comment by sebastian_hagen on Superintelligence 11: The treacherous turn · 2014-11-25T18:54:39.287Z · score: 1 (1 votes) · LW · GW

For example, if the AI was contained in a simulation, inside of which the AI was contained in a weak AI box, then it might be much more difficult to detect and understand the nature of the simulation than to escape the simulated AI box, which would signal treacherous turn.

That approach sounds problematic. Some of the obvious escape methods would target the minds of the researchers (either through real-time interaction or by embedding messages in its code or output). You could cut off the latter by having strong social rules to not look at anything beyond a one-bit verdict on whether the AI attempted escape, but providing a convincing in-simulation environment would likely be difficult if you can't connect it to its actual writers.

Or it could be kept in one's back pocket, and pulled out whenever an AI researcher claims in public that they've solved friendliness.

That strategy could be very dangerous. It'd work on the less ambitious/arrogant sort of researcher; the more confident sort might well follow up with "I'll just go and implement this, and get all the credit for saving the world single-handedly" instead of saying anything in public, never giving you the chance to pull out your challenge.

Comment by sebastian_hagen on Superintelligence 10: Instrumentally convergent goals · 2014-11-20T21:10:37.586Z · score: 2 (2 votes) · LW · GW

Approach #1: Goal-evaluation is expensive

You're talking about runtime optimizations. Those are fine. You're totally allowed to run some meta-analysis, figure out you're spending more time on goal-tree updating than the updates gain you in utility, and scale that process down in frequency, or even make it dependent on how much cputime you need for time-critical ops in a given moment. Agents with bounded computational resources will never have enough cputime to compute provably optimal actions in any case (the problem is uncomputable); so how much you spend on computation before you draw the line and act out your best guess is always a tradeoff you need to make. This doesn't mean your ideal top-level goals - the ones you're trying to implement as best you can - can't maximize.
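To make the meta-analysis step concrete, here is a minimal sketch of one way it could look. The function name, the doubling/halving rule, and the bounds are all my own assumptions for illustration; the point is only that the throttling happens at the scheduling level and leaves the top-level goal untouched.

```python
def adjust_update_interval(interval, update_cost, utility_gain,
                           factor=2.0, min_interval=1.0, max_interval=3600.0):
    """Rescale how often the goal tree is re-evaluated.

    interval      current seconds between goal-tree updates
    update_cost   estimated utility lost to cputime spent per update
    utility_gain  estimated utility gained per update
    All three are assumed to be measured by the agent itself.
    """
    if update_cost > utility_gain:
        interval *= factor   # updates cost more than they gain: do them less often
    else:
        interval /= factor   # updates pay for themselves: do them more often
    # Keep the interval inside sane bounds.
    return min(max(interval, min_interval), max_interval)


print(adjust_update_interval(10.0, update_cost=5.0, utility_gain=1.0))
print(adjust_update_interval(10.0, update_cost=1.0, utility_gain=5.0))
```

The first call backs off to a longer interval and the second tightens it; in both cases the agent is still trying to maximize the same final goal with whatever cputime it has.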

Approach #2: May want more goals

For this to work, you'd still need to specify how exactly that algorithm works; how you can tell good new goals from bad ones. Once you do, this turns into yet another optimization problem you can install as a (or the only) final goal, and have it produce subgoals as you continue to evaluate it.

Approach #3: Derive goals?

I may not have understood this at all, but are you talking about something like CEV? In that case, the details of what should be done in the end do depend on fine details of the environment which the AI would have to read out and (possibly expensively) evaluate before going into full optimization mode. That doesn't mean you can't just encode the algorithm of how to decide what to ultimately do as the goal, though.

Approach #4: Humans are hard.

You're right; it is difficult! Especially so if you want it to avoid wireheading (the humans, not itself), and brainwashing, keep society working indefinitely, and not accidentally squash even a few important values. It's also known as the FAI content problem. That said, I think solving it is still our best bet when choosing what goals to actually give our first potentially powerful AI.

Comment by sebastian_hagen on Superintelligence 10: Instrumentally convergent goals · 2014-11-18T21:21:16.627Z · score: 2 (2 votes) · LW · GW

My reading is that what Bostrom is saying is that boundless optimization an easy bug to introduce, not that any AI has it automatically.

I wouldn't call it a bug, generally. Depending on what you want your AI to do, it may very well be a feature; it's just that there are consequences, and you need to take those into account when deciding just what and how much you need the AI's final goals to do to get a good outcome.

Comment by sebastian_hagen on Superintelligence 9: The orthogonality of intelligence and goals · 2014-11-11T19:26:15.683Z · score: 1 (1 votes) · LW · GW

As to "no reason to get complicated", how would you know?

It's a direct consequence of the orthogonality thesis. Bostrom (reasonably enough) supposes that there might be a limit in the opposite direction - to hold a goal you do need to be able to model it to some degree, so agent intelligence may set an upper bound on the complexity of goals the agent can hold - but there's no corresponding reason for a limit the other way around: Intelligent agents can understand simple goals just fine. I don't have a problem reasoning about what a cow is trying to do, and I could certainly optimize towards the same had my mind been constructed to only want those things.

Comment by sebastian_hagen on Superintelligence 9: The orthogonality of intelligence and goals · 2014-11-11T16:28:28.771Z · score: 0 (0 votes) · LW · GW

I have doubts that goals of a superintelligence are predictable by us.

Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there's no particular reason those have to get complicated. You could certainly have a human-level intelligence that only inherently cared about eating food and having sex, though humans are not that kind of being.

Instrumental goals are indeed likely to get more complicated as agents become more intelligent and can devise more involved schemes to achieve their intrinsic values, but you also don't really need to understand them in detail to make useful predictions about the consequences of an intelligence's behavior.

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-09T19:27:53.244Z · score: 0 (0 votes) · LW · GW

You're suggesting a counterfactual trade with them?

Perhaps that could be made to work; I don't understand those well. It doesn't matter to my main point: even if you do make something like that work, it only changes what you'd do once you run into aliens with which the trade works (you'd be more likely to help them out and grant them part of your infrastructure or the resources it produces). Leaving all those stars on to burn through resources without doing anything useful is just wasteful; you'd turn them off, regardless of how exactly you deal with aliens. In addition, the aliens may still have birthing problems that they could really use help with; you wouldn't leave them to face those alone if you made it through that phase first.

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-07T21:22:31.897Z · score: 1 (1 votes) · LW · GW

I fully agree to you. We are for sure not alone in our galaxy.

That is close to the exact opposite of what I wrote; please re-read.

AGI might help us to make or world a self stabilizing sustainable system.

There are at least three major issues with this approach, any one of which would make it a bad idea to attempt.

  1. Self-sustainability is very likely impossible under our physics. This could be incorrect - there's always a chance our models are missing something crucial - but right now, the laws of thermodynamics strongly point at a world where you need to increase entropy to compute, and so the total extent of your civilization will be limited by how much negentropy you can acquire.

  2. If you can find a way to avoid 1., you still risk someone else (read: independently evolved aliens) with a less limited view gobbling up the resources, and then knocking on your door to get yours too. There's some risk of this anyway, but deliberately leaving all these resources lying around means you're not just exposed to greedy aliens in your past, you're also exposed to ones that evolve in the future. The only sufficient response to that would be if you don't just get unlimited computation and storage out of limited material resources, but also get an insurmountable defense that lets you keep it against a less restrained attacker. This is looking seriously unlikely!

  3. Let's say you get all of these, unlikely though they look right now. Ok, so what leaving the resources around does in that scenario is to relinquish any control over what newly evolved aliens get up to. Humanity's history is incredibly brutal and full of evil. The rest of our biosphere most likely has a lot of it, too. Any aliens with similar morals would have been incredibly negligent to simply let things go on naturally for this long. And as for us, with other aliens, it's worse; they're fairly likely to have entirely incompatible value systems, and may very well develop into civilizations that we would consider a blight on our universe ... oh, and also they'd have impenetrable shields to hide behind, since we postulated those in 2. So in this case we're likely to end up stuck with the babyeaters or their less nice siblings as neighbors. Augh!

And beyond that, I don't think it even makes the FAI problem any easier. There's nothing inherently destabilizing about an endowment grab. You research some techs, you send out a wave of von Neumann probes, make some decisions about how to consolidate or distribute your civilization according to your values, and have the newly built intergalactic infrastructure implement your values. That part is unrelated to any of the hard parts of FAI, which would still be just as hard if you somehow wrote your AI to self-limit to a single solar system. The only thing that gets you is less usefulness.
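For point 1 above, the bound usually cited is Landauer's principle, with the caveat that it only covers logically irreversible operations such as bit erasure. As a rough sketch, with F standing for whatever free-energy (negentropy) budget the civilization manages to collect and T for the temperature it computes at (both symbols are my notation, not anything from the parent comments):

```latex
E_{\text{bit}} \;\ge\; k_B T \ln 2
\qquad\Longrightarrow\qquad
N_{\text{erasures}} \;\le\; \frac{F}{k_B T \ln 2}
```

For any finite F and any T > 0 the right-hand side is finite, which is the sense in which the total extent of your civilization is capped by how much negentropy you can acquire.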

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-05T22:07:18.390Z · score: 2 (2 votes) · LW · GW

FWIW, there already is one organization working specifically on Friendliness: MIRI. Friendliness research in general is indeed underfunded relative to its importance, and finishing this work before someone builds an Unfriendly AI is indeed a nontrivial problem.

So would be making international agreements work. Artaxerxes phrased it as "co-ordination of this kind would likely be very difficult"; I'll try to expand on that.

The lure of superintelligent AI is that of an extremely powerful tool to shape the world. We have various entities in this world, including large nation states with vast resources, that are engaged in various forms of strong competition. For each of those entities, AI is potentially a game-winner. And contrary to nuclear weapons, you don't need huge conspicuous infrastructure to develop it; just some computers (and you'll likely keep server farms for various reasons anyway; what's one more?) and a bunch of researchers that you can hide in a basement and move around as needed to evade detection. The obvious game-theoretical move, then, is to push for international outlawing of superintelligent AI, and then push lots of money into your own black budgets to develop it before anyone else does.

Nuclear weapons weren't outlawed before we had any, or even limited to one or two countries, though that would have been much easier than with AI. The Ottawa Treaty was not signed by the US, because they decided anti-personnel mines were just too useful to give up, and that usefulness is a rounding error compared to superintelligent AI. Our species can't even coordinate to sufficiently limit our emission of CO2 to avert likely major climate impacts, and the downside to doing that would be much lower.

I will also note that for the moment, there is a significant chance that the large nation states simply don't take the potential of superintelligent AI seriously. This might be the best possible position for them to take. If they start to appreciate it, without also fully appreciating the difficulty of FAI (and maybe even if; the calculation if you do appreciate it is tricky if you can't also coordinate), a full-blown arms race is likely to result. The expected threat from that IMO outweighs the expected benefit from attempting to internationally outlaw superintelligent AI implementation.

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-04T21:14:00.628Z · score: 2 (2 votes) · LW · GW

Novel physics research, maybe. Just how useful that would be depends on just what our physics models are missing, and obviously we don't have very good bounds on that. The obvious application is as a boost to technology development, though in extreme cases it might be usable to manipulate physical reality without hardware designed for the purpose, or escape confinement.

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-04T20:47:20.165Z · score: 2 (4 votes) · LW · GW

I think Bostrom wrote it that way to signal that while his own position is that digital mind implementations can carry the same moral relevance as e.g. minds running on human brains, he acknowledges that there are differing opinions about the subject, and he doesn't want to entirely dismiss people who disagree.

He's right about the object-level issue, of course: Solid state societies do make sense. Mechanically embodying all individual minds is too inefficient to be a good idea in the long run, and there's no overriding reason to stick to that model.

Comment by sebastian_hagen on Superintelligence 8: Cognitive superpowers · 2014-11-04T20:31:41.155Z · score: 5 (5 votes) · LW · GW

I see no particular reason to assume we can't be the first intelligent species in our past light-cone. Someone has to be (given that we know the number is >0). We've found no significant evidence for intelligent aliens. None of them being there is a simple explanation, it fits the evidence, and if true then indeed the endowment is likely ours for the taking.

We might still run into aliens later, and either lose a direct conflict or enter into a stalemate situation, which does decrease the expected yield from the CE. How much it does so is hard to say; we have little data on which to estimate probabilities on alien encounter scenarios.