Computation Hazards

alex_altair

post by Alex_Altair · 2012-06-13T21:49:19.986Z · 58 comments

This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist of running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.
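To make the brute-force idea concrete, here is a minimal sketch of Solomonoff-style prediction under a loudly stated simplification: the "machine" is a hypothetical toy, `periodic_output`, that just repeats its program's bits. It is not a universal Turing machine. Every program up to `max_len` bits is enumerated, programs inconsistent with the data are discarded, and the survivors are weighted by 2^-length. With a universal machine in place of the toy, the same enumeration loop would execute every algorithm up to the length bound, which is exactly where the hazard lies.

```python
from itertools import product


def periodic_output(program: str, n: int) -> str:
    """Hypothetical toy machine (NOT universal): repeat the program's bits, truncated to n."""
    return (program * (n // len(program) + 1))[:n]


def predict_next_bit(data: str, max_len: int = 12) -> float:
    """Probability that the next bit is '1', Solomonoff-style, over the toy machine.

    Brute force: enumerate every program of 1..max_len bits, keep those whose
    output reproduces the observed data, and weight each by 2**-len(program).
    """
    weight_one = 0.0
    weight_total = 0.0
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            program = "".join(bits)
            output = periodic_output(program, len(data) + 1)
            if output[:-1] == data:  # consistent with every observation so far
                w = 2.0 ** -length
                weight_total += w
                if output[-1] == "1":
                    weight_one += w
    return weight_one / weight_total if weight_total else 0.5


print(predict_next_bit("010101"))
```

Here `predict_next_bit("010101")` returns 1/9, so the sketch strongly expects a 0 next. Most of the weight sits on the two-bit program `01`, because shorter programs receive exponentially more weight.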

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough for a computation to run an enormous number of algorithms that act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.
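The "random sample" argument can be made quantitative under an assumption the post does not state: that sampled algorithms are independent, and each is a hazard with some fixed probability p. The chance of hitting at least one hazard in n samples is then 1 - (1 - p)^n, which becomes substantial once n approaches 1/p. A minimal sketch:

```python
import math


def prob_at_least_one_hazard(p: float, n: int) -> float:
    """Chance that n independently sampled algorithms include at least one hazard,
    assuming each one is a hazard with probability p (0 <= p <= 1)."""
    if p >= 1.0:
        return 1.0 if n > 0 else 0.0
    # 1 - (1 - p)**n, written with log1p/expm1 so that tiny p does not round to zero
    return -math.expm1(n * math.log1p(-p))


# Even if only one algorithm in a trillion is a hazard, a trillion samples
# give roughly a 63% chance of running at least one.
print(round(prob_at_least_one_hazard(1e-12, 10**12), 3))  # prints 0.632
```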

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.
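The definition above fits in a few lines. This is an illustrative toy, not an architecture from the post: `predict` (a model of consequences) and `utility` are hypothetical callables supplied by the caller, and the agent simply selects the action whose predicted outcome scores highest.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # world state
A = TypeVar("A")  # action


def choose_action(state: S, actions: Iterable[A],
                  predict: Callable[[S, A], S],
                  utility: Callable[[S], float]) -> A:
    """Return the action whose predicted consequence has the highest utility."""
    return max(actions, key=lambda a: utility(predict(state, a)))


# Hypothetical example: steer a temperature of 18 towards a target of 21.
print(choose_action(18, [-1, 0, 1], lambda t, a: t + a, lambda t: -abs(t - 21)))  # prints 1
```

All of the "steering" lives in the `max` over predicted futures. Everything that makes a real agent dangerous is hidden inside how good `predict` is and what `utility` rewards.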

Sufficiently capable agents can become extremely powerful because they constitute a feedback loop. Feedback loops, well known from physics, often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations and nuclear chain reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, thereby improving her ability to decide what to study next. But at present, humans cannot directly improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain so that she learns subjects more quickly.
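A toy model, assumed here purely for illustration, shows why "improving the ability to improve" matters. If each step adds a fixed increment, capability grows linearly. If the increment is proportional to current capability (a feedback loop), growth compounds exponentially.

```python
from typing import List


def capability_over_time(steps: int, rate: float, compounding: bool) -> List[float]:
    """Return capability after each step, starting from 1.0."""
    c = 1.0
    history = [c]
    for _ in range(steps):
        # Without feedback the gain is fixed; with feedback it scales with capability.
        c += rate * c if compounding else rate
        history.append(c)
    return history


linear = capability_over_time(50, 0.1, compounding=False)
with_feedback = capability_over_time(50, 0.1, compounding=True)
print(round(linear[-1], 2), round(with_feedback[-1], 2))  # 6.0 117.39
```

With the same per-step rate, the feedback version ends nearly twenty times higher after 50 steps, and the gap keeps widening.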

Predictors

A predictor is a computation which takes data as input and predicts what data will come next. Examples would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by humans.
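As a far simpler stand-in for the predictors named above, an order-k frequency model shows the basic interface: data in, predicted next symbol out. Nothing in the code chooses actions or pursues a goal. The function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Callable, Optional


def train_predictor(data: str, k: int = 2) -> Callable[[str], Optional[str]]:
    """Return a function that predicts the next symbol from the last k symbols seen."""
    counts = defaultdict(Counter)  # context (last k symbols) -> counts of the following symbol
    for i in range(k, len(data)):
        counts[data[i - k:i]][data[i]] += 1

    def predict(history: str) -> Optional[str]:
        context = history[-k:]
        if context not in counts:
            return None  # this context never appeared in the training data
        return counts[context].most_common(1)[0][0]

    return predict


predict = train_predictor("abcabcabc")
print(predict("xab"))  # prints c
```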

Oracles

An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or perhaps oracles would investigate agents in order to answer questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or the agent might somehow distort the oracle's output so that acting on the prediction leads us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be correct, a prediction must not cause the receiver to act in a way that negates it. The predictor may then have to choose among the predictions that cause the receiver to act in a way that fulfills them. This is a type of control over the user. Since the predictor is superintelligent, any such control may rapidly optimize the universe towards some unknown goal.
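The self-fulfilling-prophecy problem can be framed as a fixed-point search. In this sketch, `outcome_given_announcement` is a hypothetical model of how the world, including the listener, reacts to hearing a prediction. Only predictions that reproduce themselves once announced count as correct. When several fixed points exist, the predictor's choice among them is the lever of control described above.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")


def self_consistent_predictions(candidates: List[P],
                                outcome_given_announcement: Callable[[P], P]) -> List[P]:
    """Keep only predictions that come true after being announced (fixed points)."""
    return [p for p in candidates if outcome_given_announcement(p) == p]


def depositors(prediction: str) -> str:
    # Hypothetical listener: depositors withdraw if and only if told a run is coming.
    return "bank run" if prediction == "bank run" else "no bank run"


def contrarians(prediction: str) -> str:
    # Hypothetical listener who always does the opposite of what is predicted.
    return "no bank run" if prediction == "bank run" else "bank run"


options = ["bank run", "no bank run"]
print(self_consistent_predictions(options, depositors))   # ['bank run', 'no bank run']
print(self_consistent_predictions(options, contrarians))  # []
```

With the depositors, both answers are "correct", so the predictor effectively decides whether the bank fails. With the contrarians, no correct answer exists at all.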

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Besides the above possibilities, some worry that intelligence is inherently an optimization process, and therefore that oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI takeoff.

Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity or the size of computations. In general, this will also keep the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, all current prediction or AI systems are clearly too simple either to be general intelligences or to pose a computational hazard.
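One crude way to limit the size of computations is a step budget. This sketch assumes the computation is written as a Python iterator that yields once per step. `run_with_budget` is a hypothetical helper rather than an existing safety tool, and it bounds running time only, not program complexity.

```python
import itertools
from typing import Iterator, Tuple


def run_with_budget(computation: Iterator[object], max_steps: int) -> Tuple[bool, int]:
    """Advance `computation` one step per next() call, at most max_steps times.

    Returns (halted, steps_completed). If the budget runs out first, the
    computation is simply abandoned.
    """
    for completed in range(max_steps):
        try:
            next(computation)
        except StopIteration:
            return True, completed
    return False, max_steps


print(run_with_budget(iter(range(5)), 100))     # (True, 5)
print(run_with_budget(itertools.count(), 100))  # (False, 100)
```

A computation that would have halted on the step just after the budget is reported as unfinished, since finding out would require running that extra step.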

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.
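A minimal sketch of a safety indicator used as a gate before execution. The pattern list and the `guarded_run` wrapper are hypothetical. Exact substring search is also much weaker than the isomorphism check the genome example calls for, since a trivially re-encoded copy would slip through.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def safety_indicator(program: bytes, forbidden_patterns: List[bytes]) -> bool:
    """Return True if the program is flagged as potentially dangerous.

    Crude illustration only: flags verbatim occurrences of a forbidden byte
    pattern, which is far weaker than detecting isomorphic structure.
    """
    return any(pattern in program for pattern in forbidden_patterns)


def guarded_run(program: bytes, forbidden_patterns: List[bytes],
                run: Callable[[bytes], T]) -> T:
    """Refuse to execute any program the indicator flags."""
    if safety_indicator(program, forbidden_patterns):
        raise PermissionError("refused: safety indicator flagged this program")
    return run(program)


# `len` stands in for a real executor here.
print(guarded_run(b"print('hi')", [b"GATTACA"], len))  # prints 11
```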

But one should run an algorithm only if one is sure it is not a person. A function that evaluates an algorithm and returns 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, as is nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete, in the sense that if we solve them, we will have discovered friendliness theory. Furthermore, some attempts to evaluate whether an algorithm is a person may themselves simulate a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, attempts to investigate the friendliness of a particular agent may cause that agent to unbox itself.
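The asymmetry of a nonperson predicate (0 means "definitely not a person", 1 means "cannot rule it out") can be sketched for the easy cases mentioned above. The sketch makes two assumptions. First, the algorithm is given as a deterministic `step` function over hashable states that returns `None` on halting. Second, "halts, or settles into a short cycle, within a fixed budget" is taken as the hypothetical criterion for "obviously not a person". Note that this predicate works by running the algorithm, which is precisely the risk the paragraph raises for anything less trivial.

```python
from typing import Callable, Hashable, Optional


def nonperson_predicate(step: Callable[[Hashable], Optional[Hashable]],
                        initial_state: Hashable,
                        max_steps: int = 10_000,
                        max_period: int = 100) -> int:
    """Return 0 only when the algorithm is confirmed trivial; otherwise return 1."""
    first_seen = {}  # state -> step index at which it first occurred
    state: Optional[Hashable] = initial_state
    for i in range(max_steps):
        if state is None:
            return 0  # halted within the budget
        if state in first_seen:
            period = i - first_seen[state]
            return 0 if period <= max_period else 1  # only short cycles count as trivial
        first_seen[state] = i
        state = step(state)
    return 1  # not ruled out, so err on the side of caution


def square_step(s):
    # State is (n, n squared); halt after reaching 100.
    return None if s[0] == 100 else (s[0] + 1, (s[0] + 1) ** 2)


print(nonperson_predicate(square_step, (1, 1)))                 # 0: halts
print(nonperson_predicate(lambda s: (s + 1) % 7, 0))            # 0: period 7
print(nonperson_predicate(lambda s: s + 1, 0, max_steps=1000))  # 1: not ruled out
```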

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from having to make predictions that stay consistent with our reaction to them. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction may now be inaccurate: because it has been told to us, we will act on it, possibly voiding any part of it.
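A small sketch of the counterfactual query, using a hypothetical traffic model in which drivers avoid whichever route they are told will jam. In this model no announcement is self-fulfilling. Asking "what would happen if we had not asked?" still gives a well-defined answer, but acting on that answer then falsifies it, which is the second problem noted above.

```python
from typing import Callable, Optional


def counterfactual_prediction(world: Callable[[Optional[str]], str]) -> str:
    """Answer 'what would happen if we had not asked?' by evaluating the
    world model with no announcement, instead of searching for a fixed point."""
    return world(None)


def traffic_world(announcement: Optional[str]) -> str:
    # Hypothetical model: drivers avoid whichever route they are told will jam.
    if announcement == "jam on route A":
        return "jam on route B"
    if announcement == "jam on route B":
        return "jam on route A"
    return "jam on route A"  # what happens when nobody is told anything


answer = counterfactual_prediction(traffic_world)
actual = traffic_world(answer)  # the world after we hear the answer and react
print(answer, "->", actual)     # jam on route A -> jam on route B
```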

A very plausible but informal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways to advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

References

Armstrong, S., Sandberg, A., & Bostrom, N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff, R. (March 1964). “A Formal Theory of Inductive Inference, Part I”. Information and Control, 7(1), 1-22.

Solomonoff, R. (June 1964). “A Formal Theory of Inductive Inference, Part II”. Information and Control, 7(2), 224-254.

58 comments

comment by evand · 2012-06-14T03:26:47.449Z

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

Are you including anything in this beyond the hazards of accidental simulation? It sounds to me like you aren't.

computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

I can't imagine any computation hazard arising from a computer that runs most algorithms, i.e. Solomonoff induction, actually being a hazard on any size computer and timescale that is commensurate with, say, turning the solar system into computronium and burning all the energy of the Sun. I don't think the selection power present in "run most algorithms" actually simulates anything sentient for a sufficient length of time for me to care. Now, a computer program that selectively simulates algorithms likely to be sentient might be a different matter, but then you're having a discussion about the ethics of simulation, not accidental computation hazards or "most algorithms". In other words, I think the claim expressed above stems from a fundamental misunderstanding of exactly how strong the term "uncomputably complex" is. I suspect you have not yet truly understood the growth curve of the busy beaver function.

Imagine a supercomputer’s power is being tested on a simple game, like chess or Go

I've written programs to play games. I cannot possibly see where such a hazard comes from. Have you looked at the proofs that such games can be Turing complete? That is, not just noted their existence, but actually examined the proofs. The level of machine that can be built on a standard size Go or Chess board is trivial. I think the accidental simulation hazard of such a computer is far less than that from noise errors on 8-bit microcontrollers running industrial control programs. Turing completeness relies on assumptions of infinite or very large (for merely approximate completeness) board sizes. And once you let the board size scale like that, you either have something that is exceedingly selective about which paths it explores (and therefore isn't "accidental" in the sense of running most algorithms), or doesn't simulate anything for long enough to exhibit sentient behavior, or even to have a very good chance of producing something with anything close to the potential of an unhatched ant.

In other words, I fail to see why this line of inquiry is deserving of anything more than an off-hand assessment of "not a hazard", let alone the first quarter of the post.

The remainder of the post seems to me to do a poor job of separating out the hazards of simulating people inside an AGI, vs the hazards presented by simply having the AGI. The latter hazards are thoroughly discussed elsewhere, and I don't find that this discussion adds anything. By mixing those hazards in with computation hazards, you make it very unclear whether those are intended to be included in your definition (which I originally thought excluded the results of the computation), and you make the rest of the post much less clear and much less useful.

↑ comment by Alex_Altair · 2012-06-14T04:11:37.802Z

I can't imagine any computation hazard arising from a computer that runs most algorithms actually being a hazard on any size computer and timescale that is commensurate with, say, turning the solar system into computronium and burning all the energy of the Sun.

I'm not sure about this. It's difficult for me to do the order-of-magnitude calculation, because I don't know how many flops we can get from using the solar system as substrate, and because I have no idea how small a program can be before it has moral status.

I suspect you have not yet truly understood the growth curve of the busy beaver function.

From my understanding, the busy beaver function is about the maximum number of steps an n-state Turing machine can take before you know it won't halt. This doesn't seem to have anything to do with the probability of simulating people; both halting and non-halting programs could have moral status.

Overall, I agree that the "runs most algorithms" category is not realistic. It's mostly just for philosophical interest, and completeness.

The remainder of the post seems to me to do a poor job of separating out the hazards of simulating people inside an AGI, vs the hazards presented by simply having the AGI.

Yeah, I'm worried that the concept of "computational hazard" as I've used it here isn't very useful, or rather, doesn't carve reality at its joints.

The latter hazards are thoroughly discussed elsewhere, and I don't find that this discussion adds anything.

I wasn't really trying to add original material to the discussion. This post was supposed to be a summary of ideas already discovered on LW, and I wanted to know if I was missing any, or if this was a good summary of the ideas.

Thanks for all your feedback!

↑ comment by evand · 2012-06-14T04:46:08.871Z

The busy beaver function is the lower bound on how fast a function has to grow before it counts as "uncomputable". When people say that the Solomonoff induction or Kolmogorov complexity is uncomputable, that's what they mean. When you say you're having trouble with an order of magnitude estimate, I have to wonder whether you attempted to come up with an order of magnitude estimate for the exponent. For example, I'm pretty sure that no conceivable amount of computronium will get to 10^10000 territory, and that I can safely ignore ethical problems that only arise once we're in that territory. I'm rather skeptical of the idea that we could even get to 10^1000 territory by brute force, even if we turned the galaxy into computronium.

Therefore, when you say you're concerned about computing Solomonoff induction or Kolmogorov priors, I'm left wondering whether you're worried about 5- or 6-state machines, because there is no conceivable way the program would ever get to the 7-state machines.
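For scale, the number of machines alone can be counted. Under the usual busy beaver convention (n states, 2 symbols, a separate halt state), each of the 2n table entries chooses a symbol to write, a head direction, and one of n + 1 next states, giving (4(n + 1))^(2n) machines. That is roughly 6.3 × 10^13 machines at five states and 2^70 ≈ 1.2 × 10^21 at seven. This counts tables only. The obstacle described above is running time, since the longest-running halting machines of each size run for a busy-beaver number of steps.

```python
def count_turing_machines(n: int) -> int:
    """Count n-state, 2-symbol Turing machine tables (busy beaver convention).

    Each of the 2n (state, read-symbol) entries chooses a symbol to write (2),
    a head direction (2), and a next state (n states or halt): 4(n + 1) options.
    """
    return (4 * (n + 1)) ** (2 * n)


for n in (5, 6, 7):
    print(n, f"{count_turing_machines(n):.2e}")
```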

And you also can't simply dismiss the longest-running small Turing machines, in my opinion. If there exists an accidental computational hazard worth worrying about, I would assume it's in the form of something analogous to "run Conway's life until the board is empty" -- in other words, precisely a small Turing machine that is likely to run a very long time before halting, and whose halting status is difficult to determine.

I wasn't really trying to add original material to the discussion. This post was supposed to be a summary of ideas already discovered on LW, and I wanted to know if I was missing any, or if this was a good summary of the ideas.

I think the summary is overly confused. If the accidental hazards are of purely philosophical interest, they shouldn't occupy such a large fraction of the post, and shouldn't be the first thing discussed. I almost didn't bother reading the rest.

I think the non-simulation hazards have little in common with the simulation hazards from an ethics standpoint, and less from a practical FAI programming view. If you're having trouble separating them, then I would take that as a strong clue that you've chosen the wrong points to carve at.

Thanks for all your feedback!

You're welcome! There's some interesting stuff here, though I'm skeptical that there's much of interest in anything except the intentional, directed simulations questions.

↑ comment by Alex_Altair · 2012-06-14T05:09:19.307Z

The busy beaver function is the lower bound on how fast a function has to grow before it counts as "uncomputable".

Ah, I see. That's not the definition, but it is a fact about it. Although I think you might be confused about the definition of "uncomputable". It doesn't have to do with functions growing. It's just a separate, awesome fact that all computable functions grow slower than the (uncomputable) busy beaver function. There are many uncomputable functions that grow slower than computable functions.

↑ comment by evand · 2012-06-14T07:42:09.870Z

Hmm. Seems I've confused some stuff. What you've said is correct, but I think my point is still valid.

The Kolmogorov prior is uncomputable. The time required to approximate it by simulating Turing machines to size n (in other words, the naive brute force approach in question) grows at busy beaver speeds, because it requires simulating all non-halting machines within the relevant size, and there is no generalizable way to shortcut those simulations.

Now, there are ways to approximate Solomonoff / Kolmogorov / AIXI with sane computational limits. However, once you start doing that, you can no longer claim that you run the risk of "running most algorithms", at least by the mathematical definition of "most" that I assumed you were using. Or rather, you will eventually, as you wait for infinite computation to be expended on the problem. But I'd say that either you're being selective to a degree that the hazard lies in selecting for simulation, rather than in running "most" algorithms, or you're in no more danger than in the brute force case. This is related to the point I made above: I think the accidentally dangerous portion of the search space is likely to lie in the algorithms whose runtime is long relative to their complexity, which are precisely the ones that will be avoided by approximations intended to be tractable.

comment by Viliam_Bur · 2012-06-14T11:40:53.397Z

I am not sure that simulating people is the same as creating people. Or more generally, that simulating a universe is the same as creating the universe, and stopping the simulation is the same as destroying the universe.

Even if we accept that the simulated people are real, they are real even if we don't simulate them -- they already exist somewhere in the multiverse. (They may have very low prior probability, but so do we, right?) Your metaphor for simulation is "creating a copy", my metaphor is "looking through a window". Which one is correct? They have different ethical consequences, because creating a copy of suffering means increasing suffering, but looking through a window at suffering does not. In other words, not simulating suffering is just as helpful as closing your eyes; it does not remove the suffering from the world.

So the question is, if the universe A contains a simulation of a universe B, does this increase the "existence" of the universe B? If the universe A runs the simulation of the universe B thousand times, is the increase thousand times greater? What if the simulation runs only once, but the data are copied thousand times to a redundant disk array? What if we write the algorithm, but don't actually run it? The calculation "2+2=4" may be also part of many universes, some of them including sentient beings; does writing it have ethical consequences too? If we can simulate a universe by billion computations of Life, is it also ethically wrong to make billion Life computations on random boards? (The individual steps of Life don't have an identity, do they?) Every configuration of atoms around us is some kind of computation, we just don't have the means to extract the data; compared with this, are our computer computations even significant? How about approximations, are they simulations too?

↑ comment by Alex_Altair · 2012-06-14T15:57:55.880Z

Many Worlds does not say that everything you can imagine exists in some universe. What does exist in some universe is determined by the Schrödinger equation, which is specific and limiting.

↑ comment by Viliam_Bur · 2012-06-14T16:54:42.949Z

You are right. Many Worlds says that our universe exists in a superposition of many states, all of them governed by the same physical laws.

But if we assume the possibility of other universes with different physical laws (which I did implicitly), Solomonoff prior provides a framework for reasoning about them. Simply said, every universe exists, but some of them "exist more" and others "exist less", whatever that means. Simpler universes "exist more", complex universes "exist less"; each additional bit of description reduces the "existence" in half. Therefore very complicated universes have so little "existence" that we don't have to care about them.

This hypothesis feels even more weird than the Many Worlds hypothesis, but it explains some things that are otherwise difficult to explain, such as why our universe is fine-tuned for us. Without the hypothesis of multiple universes, the anthropic principle provides only a partial answer. It explains why we can't exist where we can't exist, but it does not explain why there is a universe where we can exist. On the other hand, if everything exists, why does our universe follow any laws? Solomonoff prior says that universes which follow laws "exist more", because it is easier to describe them (you only have to describe the initial state and the laws, not every possible exception). Thus, the anthropic principle + multiverse + Solomonoff prior together say that we do most probably exist in the simplest universe where we can exist; where simplest does not mean smallest in space and time, but easiest to describe fully in mathematics. (Though I am not really sure if this universe really is simpler than other possible intelligent-life-containing universes. Maybe something is wrong with my explanation.)

↑ comment by FeepingCreature · 2012-06-15T20:09:17.059Z

I'm not sure about this, but I think if even some of the more complex universes run enumerative Turing simulations (basically, run every possible Turing machine, in order), one might expect most of our "real-ness" to come from complex universes simulating simple ones. Eliezer touches on this in Finale.

↑ comment by Alex_Altair · 2012-06-15T23:00:14.507Z

Are you saying that complex universes can run more simulations? Don't forget that complexity refers to Kolmogorov complexity, so simple universes can have tons of particles, but they all have the same properties. A complex universe would have just as many particles, but they would all have different physics. I'm not sure which of those universes is more capable of computation.

↑ comment by FeepingCreature · 2012-06-16T00:57:28.230Z

No, my point is that there are a lot of complex universes but the Kolmogorov ordering of Turing machines is universal, so universe complexity isn't transitive - a complex universe that starts a Kolmogorov search still runs the simple ones first.

↑ comment by Alex_Altair · 2012-06-14T17:01:04.749Z

That... is interesting.

↑ comment by Alex_Altair · 2012-06-19T01:02:00.165Z

Just out of curiosity, have you considered how this belief pays rent? I can see how it pays utils by letting us simulate people in this situation, but I wouldn't know how to determine whether it really paid utils.

↑ comment by Viliam_Bur · 2012-06-19T07:59:58.909Z

The only way this belief is useful to me, is that it provides explanations to a few questions I would otherwise spend time answering; plus a wrong answer on them might make my real life worse.

First, avoiding generalized Pascal mugging: Yeah, everything is possible, including the chance that if I don't give you $1000 now, I will be tortured forever by an omnipotent sadist; but the probability is epsilon, so I won't give you those $1000 anyway.

Second, avoiding generalized quantum suicide: Yeah, whatever I do, in some universe it will have good consequences. And in some universe it will have bad consequences. But I should focus on whether the average (expected) results are positive or negative. For example, in case of a quantum suicide, the average result is me dead; in case of a lottery, the average result is not winning; in case of religion, the average result is no afterlife. On the other hand, when rationally doing useful things, the average result is more utilons.

The line between MWI and Tegmark Multiverse is not very clear, some of my arguments could be used for both. Using only MWI can answer questions about quantum randomness or generally about lawful randomness (which is probably on some level fueled by a quantum randomness: for example if I throw a coin, the exact movement of my muscles is determined by exact firing of my neurons, and a quantum event can make this signal a little bit weaker or stronger). But mere MWI cannot answer questions like "what if this universe is just a simulation?", because that is outside of its framework (a simulation in what? possibly in a universe with different laws of physics? how do I calculate a probability of that?).

comment by Gastogh · 2012-06-14T11:51:40.871Z

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

Nitpick: we can date this concern at least as far back as Vernor Vinge's A Fire Upon the Deep:

Pham Nuwen's ticket to the Transcend was based on a Power's sudden interest in the Straumli perversion. This innocent's ego might end up smeared across a million death cubes, running a million million simulations of human nature.

comment by lavalamp · 2012-06-14T01:46:07.082Z · LW(p) · GW(p)

For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer.

This is not clear to me at all.

Replies from: Nornagest

↑ comment by Nornagest · 2012-06-14T02:34:18.881Z · LW(p) · GW(p)

Well, modern computers have word sizes much larger than eight bits, so it's not possible for a valid eight-bit program to exist on such a platform. That's a degenerate case, though.

comment by DanielLC · 2012-06-14T06:09:15.401Z · LW(p) · GW(p)

Some algorithms are obviously not people.

I disagree. I don't think sentience is all-or-nothing. Given that, I'd expect that it would be almost impossible (in the mathematical one-in-infinity sense) for a given system to have exactly zero sentience. Some algorithms are just not very much people. Some algorithms will produce less sentience in a thousand years than you will in a microsecond.

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-14T06:18:39.392Z · LW(p) · GW(p)

I don't think sentience is all-or-nothing.

Fascinating! I can imagine this being true. So maybe I should say, "Some algorithms are obviously not in the utility function of pretty much anybody." But then again, I don't think "people" means "sentience". I don't care about simulating rocks, whether or not they have 0.001 sentience.

comment by Thomas · 2012-06-14T04:54:28.795Z · LW(p) · GW(p)

Agree with this. Besides, there is vast natural computation going on in the wild, as an aspect of physics. I'd like to believe it is trivial, but I can't. Some of it could be quite unsafe, I am afraid, before it goes in some trivial direction.

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-14T04:56:50.455Z · LW(p) · GW(p)

Interesting point. I considered mentioning that human brains are computers, and therefore there is a potential for computational hazards within human brains. In fact, humans are the result of the universe computing, so really, every hazard is a computation hazard.

Replies from: DanielLC, Thomas

↑ comment by DanielLC · 2012-06-14T06:10:59.661Z · LW(p) · GW(p)

Case in point: xkcd: Nightmares.

Try not to dream about and/or imagine sad people.

↑ comment by Thomas · 2012-06-14T05:39:45.083Z · LW(p) · GW(p)

I considered mentioning that human brains are computers, and therefore there is a potential for computational hazards within human brains.

Those already have a name: mental problems, be they hallucinations or something else.

Replies from: DanielLC

↑ comment by DanielLC · 2012-06-14T06:13:34.638Z · LW(p) · GW(p)

Mental problems are when something goes wrong. It's entirely possible that computational hazards happen in normal thought processes. It's perfectly normal to imagine people that can pass the Turing test. Are they real? Do they die when you stop thinking about them?

comment by amit · 2012-06-16T08:35:23.607Z · LW(p) · GW(p)

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction.

Solomonoff induction is uncomputable, so it's not a computation. It would be correct if you had written

An example of a computation that runs most algorithms could be some program that approximates a mathematical formalism called Solomonoff induction.

Also, strictly speaking no real-world computation could run "most" algorithms, since there are infinitely many and it could only run a finite number. It would make more sense to use an expression like "computations that search through the space of all possible algorithms".
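To make amit's point concrete: a practical search "through the space of all possible algorithms" usually interleaves programs (dovetailing), so that no single non-halting program blocks the others, and any finite budget reaches only finitely many programs. The sketch below is illustrative only; the toy encoding (`toy_program`, where program k is a Python generator that yields once per step) and the function names are hypothetical and are not taken from any real Solomonoff approximation.

```python
from typing import Callable, Dict, Iterator

# Hypothetical toy encoding: program number k is a generator that yields once
# per computation step and stops when the program halts.
def toy_program(k: int) -> Iterator[None]:
    if k % 2 == 1:
        while True:  # odd-numbered programs never halt
            yield
    for _ in range(k):  # even-numbered program k halts after k steps
        yield

def dovetail(make_program: Callable[[int], Iterator[None]],
             total_steps: int) -> Dict[int, int]:
    """Interleave programs 0, 1, 2, ... within a finite step budget.

    In phase n, program n is started and every still-running program gets one
    more call to next(). Each call costs one unit of budget, including the call
    that discovers a program has halted. Returns, for every program started,
    how many steps it actually executed.
    """
    running: Dict[int, Iterator[None]] = {}
    steps: Dict[int, int] = {}
    used = 0
    phase = 0
    while used < total_steps:
        running[phase] = make_program(phase)
        steps[phase] = 0
        for k in list(running):
            if used >= total_steps:
                break
            try:
                next(running[k])
                steps[k] += 1
            except StopIteration:
                del running[k]  # program k has halted
            used += 1
        phase += 1
    return steps
```

For example, `dovetail(toy_program, 10)` starts only programs 0 through 4 before the budget runs out; a larger budget reaches more programs, but never all of them.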

comment by amit · 2012-06-16T08:04:05.946Z · LW(p) · GW(p)

A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. Some algorithms are obviously not people. For example, any algorithm whose output is repeating with a period less than gigabytes...

Is this supposed to be about avoiding the algorithms simulating suffering people, or avoiding them doing something dangerous to the outside world? Obviously an algorithm could simulate a person while still having a short output, so I'm thinking it has to be about the second one. But then the notion of nonperson predicates doesn't apply, because it's about avoiding simulating people (that might suffer and that will die when the simulation ends). Also, a dangerous algorithm could probably do some serious damage with under a gigabyte of output. So having less than a gigabyte output doesn't really protect you from anything.

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-17T07:00:42.140Z · LW(p) · GW(p)

I meant the first one. I was thinking that extremely brief "experiences" repeated over and over wouldn't constitute a person, and so I called it periodic output, but obviously that was wrong. I changed it for clarity.
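Alex's correction suggests a predicate should inspect what happens inside the computation rather than its output. One way to get the "return 0 only if it is not a person" guarantee is a one-sided check: return 0 only when the computation is certifiably trivial, and 1 ("cannot rule it out") otherwise. The criterion below, that a deterministic computation visiting at most `max_states` distinct internal states cannot be a person, is a hypothetical placeholder chosen only to make the one-sided structure concrete; the function name and parameters are likewise illustrative.

```python
from typing import Any, Callable, Optional

def nonperson_predicate(step: Callable[[Any], Optional[Any]],
                        initial: Any,
                        max_states: int = 1000) -> int:
    """Return 0 only for computations certified trivial; 1 means 'unknown'.

    HYPOTHETICAL criterion: a deterministic computation whose entire internal
    state takes at most `max_states` distinct values is assumed too simple to
    be a person. `step` returns the next state, or None when the program halts.
    States must be hashable.
    """
    seen = set()
    state = initial
    while state is not None:
        if state in seen:
            return 0  # deterministic repeat: the rest of the run cycles through seen states
        seen.add(state)
        if len(seen) > max_states:
            return 1  # too many distinct states to certify; treat as possibly a person
        state = step(state)
    return 0  # halted after visiting at most max_states distinct states
```

The error is deliberately one-sided: many harmless programs get a 1, but under the stated assumption no person ever gets a 0.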

comment by yli · 2012-06-16T08:02:50.832Z · LW(p) · GW(p)

A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. Some algorithms are obviously not people. For example, any algorithm whose output is repeating with a period less than gigabytes...

comment by billswift · 2012-06-14T01:55:11.160Z · LW(p) · GW(p)

The first link in the reference section is wrong, it should be to http://www.aleph.se/papers/oracleAI.pdf

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-14T03:43:28.310Z · LW(p) · GW(p)

Thanks, fixed.

comment by DanielLC · 2012-06-14T00:37:10.137Z · LW(p) · GW(p)

This seems to be talking about preventing the accidental creation of people in general. This has no net effect. You need to prevent the creation of suffering people, and encourage the creation of happy people.

Replies from: gwern, Alex_Altair

↑ comment by gwern · 2012-06-14T01:16:52.577Z · LW(p) · GW(p)

This has no net effect.

You believe there are as many possible happy people as people suffering? That seems a little unlikely...

Replies from: DanielLC

↑ comment by DanielLC · 2012-06-14T03:09:01.890Z · LW(p) · GW(p)

I don't believe the amounts are exactly the same, but I don't know which is more common/useful.

Also, I believe the amounts are probably similar. As such, you'd get significantly more utility if you try to only create happy people vs. creating more/fewer people.

↑ comment by Alex_Altair · 2012-06-14T03:48:21.587Z · LW(p) · GW(p)

In the situations above, the people will be created and, happy or not, eliminated as soon as they are no longer needed.

Also, I think it's not obvious whether we should create more happy people, or just improve the lives of the currently existing people. I kind of get the idea that post-singularity, we will all be combined into One Big Super Person, like reverse Ebborians, and it won't end up mattering.

Replies from: DanielLC

↑ comment by DanielLC · 2012-06-14T06:06:55.134Z · LW(p) · GW(p)

In the situations above, the people will be created and, happy or not, eliminated as soon as they are no longer needed.

I don't mean that they'll exist permanently. It's good for a happy person to exist, even if it's only for a little while.

Also, I think it's not obvious whether we should create more happy people, or just improve the lives of the currently existing people.

You shouldn't go out of your way to avoid running programs that create happy people. More generally, if it would be helpful to run such a program, but not quite worth the resources on its own, it may be worthwhile if it's a happy person. That will happen about as often as a program being worthwhile on its own, but not worth running because it creates a sad person.
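One way to read DanielLC's argument is as a decision rule in which the simulated person's welfare is just another term. The function and its parameters are hypothetical, a sketch under the assumption that instrumental value, cost, and the simulated person's welfare can all be put on one utility scale.

```python
def should_run(instrumental_value: float, cost: float,
               welfare_value: float = 0.0) -> bool:
    """Hypothetical rule: run a program if its total value exceeds its cost.

    welfare_value is an assumed utility for the simulated person's experience:
    positive for a happy person, negative for a suffering one, and 0 when no
    person is simulated.
    """
    return instrumental_value + welfare_value > cost
```

With a cost of 10, a program worth 9 is run only if its welfare term exceeds 1, and a program worth 11 is skipped if its welfare term is -1 or lower; DanielLC's claim is that these two cases come up about equally often.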

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-14T06:20:13.214Z · LW(p) · GW(p)

It's good for a happy person to exist, even if it's only for a little while.

I see that we have different utility functions.

comment by JGWeissman · 2012-06-13T22:32:20.628Z · LW(p) · GW(p)

There are other hazards that may arise in the course of running large-scale computations.

Every computational hazard I can come up with involves simulating people. Do you have examples of any that don't?

Replies from: Alex_Altair, gwern

↑ comment by Alex_Altair · 2012-06-13T22:56:42.247Z · LW(p) · GW(p)

Yeah, the "self-improving agent", "simulate powerful agents", "self-fulfilling prophecies" and "oracles or predictors may become agents" were all meant to be examples of computation hazards, and those don't necessarily involve simulating people.

Replies from: JGWeissman

↑ comment by JGWeissman · 2012-06-13T23:18:39.130Z · LW(p) · GW(p)

Ah, I was thinking of "computational hazard" as meaning the computation itself is bad, not its consequences on the computing substrate or outside environment. I thought a "self-improving agent" was an example of something that might compute a hazard as a result of computing lots of stuff, some of which turns out to be hazardous. But short of instantiating that particular computational hazard, I don't think it does bad merely by computation, rather the computation helps it direct its actions to achieve bad consequences.

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-14T00:19:07.434Z · LW(p) · GW(p)

I think I agree.

↑ comment by gwern · 2012-06-14T01:15:38.670Z · LW(p) · GW(p)

If your consequentialist ethics cares only about suffering sentient beings, then unless the simulations can affect the simulating agent in some way and render its actions less optimal, creating suffering beings is the only way there can be computation hazards.

If your ethics cares about other things like piles made of prime-numbered rocks, then that's a computation hazard; or if the simulations can affect the simulator, that obviously opens a whole kettle of worms.

(For example, there's apparently a twisty problem of 'false proofs' in the advanced decision theories where simulating a possible proof makes the agent decide to take a suboptimal choice; or the simulator could stumble upon a highly optimized program which takes it over. I'm sure there are other scenarios like that I haven't thought of.)

Replies from: JGWeissman

↑ comment by JGWeissman · 2012-06-14T01:32:37.092Z · LW(p) · GW(p)

If your consequentialist ethics cares only about suffering sentient beings, then unless the simulations can affect the simulating agent in some way and render its actions less optimal, creating suffering beings is the only way there can be computation hazards.

Agreed. The sentence I quoted seemed to indicate that Alex thought he had a counterexample, but it turns out we were just using different definitions of "computational hazards".

Replies from: Alex_Altair

↑ comment by Alex_Altair · 2012-06-20T20:06:35.889Z · LW(p) · GW(p)

The only counterexample I can think of is where the computation invents cures or writes symphonies and, in the course of computation, indifferently disposes of them. This could be considered a large negative consequence of "mere" computation, but yeah, not really.

comment by Shmi (shminux) · 2012-06-13T23:14:58.996Z · LW(p) · GW(p)

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering.

Presumably one will have enough power to detect suffering in a simulated human and block this simulated emotion (it's already possible in real humans with drugs, hypnosis or surgery).

Replies from: gwern, DanArmak

↑ comment by gwern · 2012-06-14T01:07:58.240Z · LW(p) · GW(p)

Any such algorithm for detecting suffering in arbitrary Turing machines would seem to run afoul of Turing/Rice; a heuristic algorithm could probably either reject most suffering Turing machines (but is that acceptable enough?) or reject all suffering Turing machines (but does that cripple the agent from a practical standpoint?), and such a heuristic might take more processing power than running the Turing machines in question...
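gwern's point rests on the standard reduction from the halting problem (the same move behind Rice's theorem). The sketch shows the construction in Python. `program`, `simulate_suffering`, and `detects_suffering` are hypothetical stand-ins, and it assumes `program` itself simulates no suffering, so that the combined program contains suffering exactly when `program` halts.

```python
from typing import Callable

Program = Callable[[], None]

def make_reduction(program: Program, simulate_suffering: Program) -> Program:
    """Build a program that reaches a suffering simulation iff `program` halts.

    Assumes `program` itself simulates no suffering.
    """
    def combined() -> None:
        program()             # if this never returns, the next line never runs
        simulate_suffering()  # reached if and only if `program` halts
    return combined

def halts(program: Program,
          simulate_suffering: Program,
          detects_suffering: Callable[[Program], bool]) -> bool:
    # If `detects_suffering` were correct on every program, this function would
    # decide the halting problem, which no program can do. So no perfect
    # detector exists; any real detector must be wrong on some inputs.
    return detects_suffering(make_reduction(program, simulate_suffering))
```

A practical detector therefore has to accept some error, for example by rejecting everything it cannot certify, which is the trade-off gwern describes.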

Replies from: shminux

↑ comment by Shmi (shminux) · 2012-06-14T02:03:13.412Z · LW(p) · GW(p)

A few points:

Simulated humans are not arbitrary Turing machines.
To make any progress toward FAI, one has to figure out how to define human suffering, including simulated human suffering. It might not be easy, but I see it as an unavoidable step. (Which also means that if you can prove that human suffering is non-computable, you basically prove that FAI is impossible.)
Analogous to pain asymbolia, it should be possible to modify the simulated human to report (and possibly block) potential "suffering" without feeling it.
Real humans don't take a lot of CPU cycles to identify and report suffering, so neither should simulated humans.
A non-suffering agent might not be as good as one which had loved and lost, but it is certainly much more useful than a blanket prohibition against simulating humans, as proposed in the OP.

Replies from: gwern, Zaine, None

↑ comment by gwern · 2012-06-14T20:50:47.064Z · LW(p) · GW(p)

Simulated humans are not arbitrary Turing machines.

Arbitrary Turing machines are arbitrary simulated humans. If you want to cut the knot with a 'human' predicate, that's just as undecidable.

Which also means that if you can prove that human suffering is non-computable, you basically prove that FAI is impossible.

There we have more strategies. For example, 'prevent any current human from suffering or creating another human which might then suffer'.

Analogous to pain asymbolia, it should be possible to modify the simulated human to report (and possibly block) potential "suffering" without feeling it.

Is there a way to do this perfectly without running into undecidability? Even if you had the method, how would you know when to apply it...

↑ comment by Zaine · 2012-06-14T03:24:45.257Z · LW(p) · GW(p)

I can't help but think of TRON 2 when considering the ethics of creating simulated humans that are functionally identical to biological humans. For those unfamiliar with the film, a world comprised of data is inherently sufficient to enable the spontaneous generation of human-like entities. The creator of the data world finds the entities too imperfect, and creates a data world version of himself tasked with making the data world perfect according to an arcane definition for 'perfection' the creator himself has not fully formed. The data world version of the creator then begins mass genocide of the entities, creating human-like programs that are merely perfect executions of crafted code to replace them; if the programs exhibit individuality, they are deleted. The movie asserts this genocide is wrong.

If an AI is powerful enough to be capable of mass-generating simulations that are functionally identical to a biological human, such that they are capable of original ideas, compassion, and suffering; if an AI can create simulated humans unique enough that their thoughts and actions over thousands of iterations of the same event are not predictable with 100% accuracy; then would it not be generating Homo sapiens sapiens en masse?

If indeed not, then I fail to see why mass creation and subsequent genocide over many iterations is the sort of behaviour mitigators of computational hazards wish to encourage.

Replies from: Kyre

↑ comment by Kyre · 2012-06-14T05:28:34.019Z · LW(p) · GW(p)

Off topic, but the TRON sequel has at least two distinct friendly AI failures.

Flynn creates CLU and gives him simple-sounding goals, which ends badly.

Flynn's original creation of the grid gives rise to unexpected and uncontrolled intelligence of at least human level.

↑ comment by [deleted] · 2012-06-14T05:34:16.465Z · LW(p) · GW(p)

Simulated humans are not arbitrary Turing machines.

We still don't have guaranteed decidability for properties of simulations.

To make any progress toward FAI, one has to figure out how to define human suffering,

There are so many problems in FAI that have nothing to do with defining human suffering or any other object level moral terms. Metaethics, goal invariant self modification, value learning and extrapolation, avoiding wireheading, self deception, blackmail, self fulfilling prophecies, representing logical uncertainty correctly, finding a satisfactory notion of truth, and many more.

Which also means that if you can prove that human suffering is non-computable, you basically prove that FAI is impossible

This sounds like an appeal to consequences, but putting that aside: Undecidability is a limitation of minds in general, not just FAI, and yet, behold!, quite productive, non-oracular AI researchers exist. Do you know that we can compute uncomputable information? Don't declare things impossible so quickly. We know that friendlier-than-torturing-everyone AI is possible. No dream of FAI should fall short of that, even if FAI is "impossible".

Real humans don't take a lot of CPU cycles to identify and report suffering, so neither should simulated humans.

Even restricting simulated minds to things that look like present humans, what makes you think that humans have any general capacity to recognize their own suffering? Most mental activity is not consciously perceived.

↑ comment by DanArmak · 2012-06-17T06:45:38.398Z · LW(p) · GW(p)

If the simulated person is suffering, presumably that is part of what the simulation is meant to analyze. If you change it so they don't suffer, they'll behave differently, and the simulation will not answer the original question it was meant to.

comment by Pentashagon · 2012-06-14T01:18:38.916Z · LW(p) · GW(p)

Mathematically, every algorithm halts and has a well-defined deterministic sequence of operations that results in a final outcome. Therefore every algorithm that simulates a conscious being is already mathematically well-defined and everything that happens to the simulated being is equally well defined, including the internal relationships in the simulated being that represent thoughts, emotions, feelings, etc. In the mathematical sense every possible algorithmic simulation already exists regardless of whether we perform the full computation on some physical hardware. Therefore the suffering and happiness of every possible simulated being also exists, mathematically.

Does physically executing a particular algorithm fundamentally affect the real existence of a being simulated by that algorithm? To the ones running the simulation it becomes obvious what happens to the being whereas it may not be obvious otherwise, but to the simulated being itself it is indistinguishable from simply existing as a mathematical result of the definition of an algorithm and a particular input to that algorithm.

That said, it's obvious that which algorithms we allow to interact with our universe is indeed important. It could be that simulating a suffering being would have negative consequences for us, similar to how observing real suffering can have negative consequences. Similarly, simulating an AGI that can interact with our universe (choosing input based upon our universe or especially upon previous executions of the algorithm) pulls that mathematically well-defined AGI out of algorithm-space and into real-space, allowing its computation to affect the world. It is conceivable that certain algorithms and inputs could even escape the most strictly controlled physical computer due to flaws in design or manufacturing.

Replies from: moridinamael, Randaly

↑ comment by moridinamael · 2012-06-14T04:00:21.293Z · LW(p) · GW(p)

I'm going to shamelessly quote myself from a previous discussion on waterfall ethics,

I don't think our intuitions about what "really happens" (versus what is "mathematically well defined") are useful. I think we have to zoom out at least one level and realize that our moral and ethical intuitions only mean anything within our particular instantiation of our causal framework. We can't be morally responsible for the notional space of computable torture simulations because they exist whether or not we "carry them out." But perhaps we are morally responsible for particular instantiations of those algorithms.

I also want to draw attention to the statement in the original post,

If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

This usage of "killing" is conceptually very distant from the intuitive notion for the reasons you (Pentashagon) indicate. I don't feel that the matter of how to handle moral culpability for events occurring in causally disconnected algorithms is sufficiently settled that we can meaningfully have this conversation.

↑ comment by Randaly · 2012-06-14T03:39:22.508Z · LW(p) · GW(p)

Mathematically, every algorithm halts and has a well-defined deterministic sequence of operations that results in a final outcome.

This is not true.

Replies from: Zack_M_Davis, asr

↑ comment by Zack_M_Davis · 2012-06-14T04:16:00.715Z · LW(p) · GW(p)

It depends on what you mean by the word algorithm; there are contexts where many authors find it useful to reserve the word for processes that do, in fact, halt. (Example citations: the definition of algorithm in Schneider et al.'s Invitation to Computer Science includes the phrase "halts in a finite amount of time"; Hopcroft et al.'s Introduction to Automata Theory, Languages, and Computation says "Turing machines that always halt [...] are a good model of an 'algorithm.'")

Replies from: Randaly

↑ comment by Randaly · 2012-06-14T05:21:35.798Z · LW(p) · GW(p)

Then Pentashagon is arguing by definition. The article discusses (and its arguments are relevant to) algorithms in general, not necessarily those that halt; I don't believe that it has ever been proven that any algorithm representing a person necessarily halts.

Replies from: Pentashagon

↑ comment by Pentashagon · 2012-06-15T19:50:07.915Z · LW(p) · GW(p)

In this universe all the computations we perform will halt either because the Turing Machine they represent halts or because we interrupt the computation. Therefore every computation we will perform can be represented by a halting algorithm (even if it's only limited by "execute only up to 10^100 operations"). I don't see that as a limitation on the kinds of simulations that we perform, and I don't think I treated it as such. If you'd like my same argument for Turing Machines, here it is:

A Turing Machine is a set of symbols (with a blank), states (with an initial state and a possibly empty set of accepting/halting states), a transition function, and a tape with input on it. For this purpose I will define the result of applying the transition function to a Turing Machine in a halting/accepting state to be the same state with no changes to the tape, rather than leaving it undefined as some definitions do. Let TM_i represent the entire state of the Turing Machine TM at any point of its execution, so that TM_0 = TM and TM_i is the result of applying the transition function to the tape and state of TM_{i-1}. By induction, TM_n exists for all n>=0.
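Pentashagon's construction can be sketched directly. In this minimal Python version, a configuration (state, head position, tape) stands for TM_i, halting states map to themselves as in the definition above, and `configuration(..., n)` returns TM_n for any n >= 0. The encoding (blank symbol `_`, head moves of -1/+1, a transition table assumed defined for every non-halting case actually reached), the names, and the example machine are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

BLANK = "_"
# (state, symbol read) -> (next state, symbol written, head move of -1 or +1).
Transition = Dict[Tuple[str, str], Tuple[str, str, int]]
# A configuration TM_i: (state, head position, tape as sorted (position, symbol) pairs).
# Immutable tuples make configurations comparable values.
Config = Tuple[str, int, Tuple[Tuple[int, str], ...]]

def step(delta: Transition, halting: Set[str], config: Config) -> Config:
    """Apply the transition function once: TM_i -> TM_{i+1}."""
    state, head, tape_items = config
    if state in halting:
        return config  # halting states are fixed points, so TM_{i+1} == TM_i
    tape = dict(tape_items)
    # Assumes delta has an entry for every (non-halting state, symbol) pair reached.
    new_state, write, move = delta[(state, tape.get(head, BLANK))]
    tape[head] = write
    return (new_state, head + move, tuple(sorted(tape.items())))

def configuration(delta: Transition, halting: Set[str], tm0: Config, n: int) -> Config:
    """Return TM_n by applying `step` n times to TM_0 = tm0."""
    config = tm0
    for _ in range(n):
        config = step(delta, halting, config)
    return config

# Example machine: write "1" three times, moving right, then halt.
DELTA: Transition = {
    ("a", BLANK): ("b", "1", 1),
    ("b", BLANK): ("c", "1", 1),
    ("c", BLANK): ("halt", "1", 1),
}
HALTING = {"halt"}
TM0: Config = ("a", 0, ())
```

For the example machine, TM_3 and TM_100 are the same configuration, which is the fixed-point convention at work: every TM_n is defined, whether or not the machine "really" halted earlier.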

Given any simulation of a sentient being, S_b, that can be fully represented by a Turing machine, there exists a well-defined state of that being at any discrete step of the simulation given by TM_i. Therefore all suffering and happiness (and everything else) the being experiences is defined by the set of TM_i's for all i>=0. It is not necessary for another agent to compute any TM_i for those relationships to exist; the only effect is that the agent is made aware of the events represented by TM_i. I like the "looking through a window" analogy of Viliam_Bur which captures the essence of this argument quite well. What is mathematically certain to exist does not require any outside action to give it more existence. In that sense there is no inherent danger in computing any particular simulation because what we compute does not change the predetermined events in that simulation. What does pose a potential danger is the effect we allow those simulations to have on us, and by extension what we choose to simulate can pose a danger. Formally if P(U|C) is lower than P(U|~C) where U is something of desirable utility and C is the execution of a computation then C poses a danger to us. If we simulate an AGI and then alter the world based on its output we are probably in danger.

Aside from theoretical dangers there are practical dangers arising from the fact that our theoretical computations are actually performed by real matter in our universe. For instance I once saw a nice demonstration of displaying a sequence of black and white line segments on a CRT monitor to generate radio waves that could be audibly received on a nearby AM radio. CPUs generate a lower amount of RF but theoretically a simulation could abuse this ability to interact with the real world in unexpected and uncontrolled ways.

↑ comment by asr · 2012-06-14T04:27:50.061Z · LW(p) · GW(p)

I suspect that Pentashagon is not using "algorithm" as a synonym for "Turing machine" -- I've often seen the word used to mean a deterministic computation that always halts.
