Limits on self-optimisation

rolfandreassen

Limits on self-optimisation

post by RolfAndreassen · 2012-01-20T21:58:05.271Z · LW · GW · Legacy · 37 comments

37 comments

Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.

Abstract: A Turing machine capable of saying whether two arbitrary Turing machines have the same output for all inputs is equivalent to solving the Halting Problem. To optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.

Context: FOOM, Friendliness, optimisation processes.

Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)

For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.

Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.

It seems to follow that you cannot build a fully general proper-optimisation detector.

However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.

37 comments

Comments sorted by top scores.

comment by JGWeissman · 2012-01-20T22:06:38.056Z · LW(p) · GW(p)

Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

The halting problem and similar are generally not practical problems when you are deliberately constructing your programs with analyzable structure, and you are willing to require that programs not just have a desired property, but that they provably have the desired property.

Replies from: RolfAndreassen, billswift

↑ comment by RolfAndreassen · 2012-01-21T05:20:27.845Z · LW(p) · GW(p)

Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

This is interesting. Do you have any sources that discuss why this is so?

Replies from: asr, saturn, JGWeissman, timtyler

↑ comment by asr · 2012-01-22T02:16:14.604Z · LW(p) · GW(p)

Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

This is interesting. Do you have any sources that discuss why this is so?

I have a short proof: suppose we have some program, A. And some program that provably does nothing, B.

Run first B, then A. Presto, a new program with output identical to A. Depending on whether A is a black box or some structure you can reason about, you can also hoist B up into the body of A. Again, provably output identical to A.

The underlying reason this is easier than proving general equivalence of programs is that we picked specific program B with a well-known behavior. Trying to prove two programs equivalent would be trying to reason about all possible modifications, rather than some one specific patch.

↑ comment by saturn · 2012-01-22T00:54:22.972Z · LW(p) · GW(p)

You can prove that a smaller set of transformations preserve the behavior of subsections of a program, then you can combine arbitrarily many of those transformations and preserve the input to output map of a larger program.

↑ comment by JGWeissman · 2012-01-21T06:06:33.087Z · LW(p) · GW(p)

This comes from my direct experience programming computers.

Making small changes to a program which do not affect its input output map is usually trivially easy, and IDEs such as Eclipse and Visual Studios include tools to make such changes for you, such as extracting a section of code into a method, and replacing that section of code with a call to that method. In general this is called refactoring, and its common forms tend to be useful preparations for more substantial changes.

With respect to optimization, a program that is well organized into methods which hide implementation details can be changed by replacing a method with one that performs the same function with a more efficient implementation. Sometimes a strict refactoring is not required, you just have to preserve the properties that callers of the method rely on. For example, if you have a method which sorts a list, and you only care that the sorted list reflects the ordering, you could replace it using an algorithm with different behavior with respect to elements that are equivalent by the ordering. More to the point, if you have a chess AI that searches the game tree as deep as it can in a certain time, and you optimize the search, you are happy that the AI produces different output as long as it still outputs the best move it finds according to its deeper search of the game tree. For similar reasons, an AGI should produce better output in real time if it substantially optimizes its own efficiency.

Asking if you can determine if arbitrary have the same output is really a wrong question. Ask if you can construct a better program that has the same properties you care about.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T21:58:40.625Z · LW(p) · GW(p)

I also have some experience in programming, but I think you are focusing too narrowly on human techniques. Humans have discovered a certain toolkit for optimisations, and they proceed as you say. But note that we usually optimise for readability and maintainability of the code, not for speed. Refactorings such as putting some code in a separate method will actually slow you down slightly, since you get the overhead of an extra stack frame - unless of course the compiler inlines the method for you!

A useful AI, however, is not constrained by human limitations; if it were, we wouldn't build it. It can construct completely bizarre code; it doesn't necessarily make small changes to existing code. Think of the evolved circuit that discriminates between high and low frequencies: No human engineer would ever build such a thing, and certainly you would never arrive at it by deliberately making small working modifications to an existing, working circuit. But an AI might arrive at such a thing, or its equivalent in code, by trial and error, and then need to prove that the output is still the same.

Replies from: JGWeissman, timtyler

↑ comment by JGWeissman · 2012-01-21T23:18:30.862Z · LW(p) · GW(p)

An AI is not constrained to write human maintainable code because the AI can maintain more complicated code than humans. But the AI's code will still have structure that the AI understands and uses to manage the complexity.

I do not expect an AI to write code by trial and error, so I am not worried about difficulties with that approach.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T23:48:32.275Z · LW(p) · GW(p)

I respectfully suggest that you are not thinking Weirdly enough. Notice that the evolved circuit still has structure that the laws of physics understand! An AI needn't operate directly at that level to make intuitive leaps beyond the capability of any human; and needn't operate by trial and error, precisely, (although, notice, we don't know the internal structure of generating an insight in human brains; for all we know it involves subconsciously trying a hundred different paths and discarding most of them) to generate stuff that's very different from the code it starts with.

Replies from: JGWeissman

↑ comment by JGWeissman · 2012-01-22T01:32:18.951Z · LW(p) · GW(p)

I respectfully suggest that you are not thinking Weirdly enough.

Thinking Weirdly has nothing to do with it. I expect an AI to not use programming techniques it doesn't expect to be able to use effectively, and I expect the AI's expectations to use techniques effectively to be accurate. So, given that an AI is using a technique, even if it is Weird, I expect the AI to be effective using it. If you have an argument that a certain technique, like random guessing and checking, has insurmountable problems, then you have an argument that an AI will not use that technique. Given that the AI is using a Weird technique, I expect the AI to be advanced enough to cope with, and benefit from, the Weirdness.

Notice that the evolved circuit still has structure that the laws of physics understand!

The laws of physics can only understand the structure in a poetic sense. When I say that an AI understands the structure of its code, I mean that it has a map of the code, organized into logical components with information (not required to actually run the program) about high level properties components have and how other components rely on those properties, and this information is available and useful to modifying the code in good ways.

An AI needn't operate directly at that level to make intuitive leaps beyond the capability of any human; and needn't operate by trial and error, precisely, (although, notice, we don't know the internal structure of generating an insight in human brains; for all we know it involves subconsciously trying a hundred different paths and discarding most of them) to generate stuff that's very different from the code it starts with.

It doesn't matter if the AI makes leaps beyond the capability of any human as long as it doesn't make leaps beyond its own capability. You seem much more eager to apply Weirdness to difficulty of the problem than to capability of solving the problem. It doesn't matter that understanding the AI's Weird code is too hard for humans, because it's not too hard for the Weird AI. The AI may "generate stuff that's very different from the code it starts with", but it won't generate anything so different the AI can't verify it is a good change.

↑ comment by timtyler · 2012-01-21T22:58:39.192Z · LW(p) · GW(p)

If you don't want to face that problem, surely you can just constrain the machine to write readable code.

A complicated refeactoring that is difficult to show whether it does the same thing could be discarded. What is needed most is a path forward, not the ability to traverse any possible path that leads forwards.

It seems likely that there will be a tradeoff between progress speed and safety, with "looking for a proof" being the slowest approach. Such a technique seems relatively unlikely to be effective if applied during a race.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T23:45:35.629Z · LW(p) · GW(p)

What's the use of an AI that writes code I could have written myself? If that's the case, cut out the middleman and just write the damn stuff! I specifically want an AI that's smarter than me, otherwise I have no use for it.

It seems likely that there will be a tradeoff between progress speed and safety, with "looking for a proof" being the slowest approach. Such a technique seems relatively unlikely to be effective if applied during a race.

Yes, this is true. That's a problem. We don't want the first AI that FOOMs effectively to win. We want a provably Friendly AI to win. If we demonstrate that proving Friendliness has constraints that don't apply generally, we still cannot abandon the constraints! That defeats the entire purpose!

Replies from: thomblake, timtyler

↑ comment by thomblake · 2012-01-23T19:50:02.584Z · LW(p) · GW(p)

What's the use of an AI that writes code I could have written myself?

I'm confused. Isn't this sort of thing what most machines are for? You might just as well ask, "What's the use of a clothes-washing machine, when I could've just washed all of the clothes by hand?"

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-23T21:45:51.648Z · LW(p) · GW(p)

This is a good point, but there are two objections. First, the washing machine lets you substitute one kind of effort for another and thus use comparative advantage. I can earn enough money for a washing machine in much less time than the total time saved over the lifetime of the washing machine. With an AI, code is code; there's no comparative advantage in writing code to instead of writing code.

Second, in referring to "code I could have written myself", I was referring to qualitative advantages rather than time saved. To make the washing-machine analogy work with this, postulate a washing machine that doesn't actually save you any time - maybe you have to sit about cranking a driving shaft for two hours - but produces much cleaner clothes, or a smaller chance of ripping, or some other qualitative measure.

I note that automatic code generators we have already, in some cases built into the language features, like templates in C++. They're occasionally useful but not likely to FOOM on us.

Replies from: thomblake

↑ comment by thomblake · 2012-01-23T21:56:57.673Z · LW(p) · GW(p)

First, the washing machine lets you substitute one kind of effort for another and thus use comparative advantage.

'Comparative advantage' needs exchanging one kind of effort for another because it's a law regarding trade amongst humans. What you're looking for is mechanical advantage, which often involves trading work for more work.

With an AI, code is code; there's no [mechanical] advantage in writing code to instead of writing code.

No. If you spend 1 day writing code for an AI and then it writes all your code from now on, you've saved an arbitrarily large amount of time writing code.

Second, in referring to "code I could have written myself", I was referring to qualitative advantages rather than time saved.

Well your question was "What's the use of an AI...?", to which I could legitimately bring up all sorts of advantages you hadn't been referring to. If you had said, "What's the use of a chicken when it can't even dance?" I could respond "I could eat it" and presumably that would be an answer to your question.

To make the washing-machine analogy work with this, postulate a washing machine that doesn't actually save you any time - maybe you have to sit about cranking a driving shaft for two hours - but produces much cleaner clothes, or a smaller chance of ripping, or some other qualitative measure.

Your washing machine does still sound preferable to hand-washing, so I'm not sure what the point was.

I'm terribly confused as to what your point was, or why you think a code-writing machine would be useless. I want one!

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-24T02:20:32.343Z · LW(p) · GW(p)

Ok, yes, an AI that saves me the effort of writing code would be useful, fair enough. I think, however, that in the context of writing a FOOMing Friendly AI, code that I could have written is not going to be sufficient.

Replies from: thomblake

↑ comment by thomblake · 2012-01-24T14:48:46.158Z · LW(p) · GW(p)

We're in agreement.

↑ comment by timtyler · 2012-01-22T01:12:35.388Z · LW(p) · GW(p)

We don't want the first AI that FOOMs effectively to win. We want a provably Friendly AI to win.

This seems as though it is framing the problem incorrectly to me. Today's self-improving systems are corporations. They are a mix of human and machine components. Nobody proves anything about their self-improvement trajectories - but that doesn't necessarily mean that they are destined to go off the rails. The idea that growth will be so explosive that it can't be dynamically steered neglects the possibility of throttles.

A "provably-Friendly AI" doesn't look very likely to win - so due attention should be give to all the other possibilities with the potential to produce a positive outcome.

↑ comment by timtyler · 2012-01-21T22:46:48.832Z · LW(p) · GW(p)

Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

This is interesting. Do you have any sources that discuss why this is so?

The second problem is equivalent to the halting problem. The first problem isn't.

↑ comment by billswift · 2012-01-20T23:29:02.941Z · LW(p) · GW(p)

You might want to edit your second paragraph. The grammar seems to be a bit garbled, to the extent that I can't figure out what you are trying to say.

Replies from: JGWeissman

↑ comment by JGWeissman · 2012-01-20T23:40:17.128Z · LW(p) · GW(p)

Yes, there were missing words. Fixed.

comment by roystgnr · 2012-01-21T16:00:40.904Z · LW(p) · GW(p)

We don't need to compare two arbitrary Turing machines to verify an optimization, but we don't even need to compare two arbitrary Turing machines with an enormous-but-finite arbitrary input space; we just need to compare unoptimized-version-of-code to optimized-version-of-code. That's a much narrower task. Compilers arguably have a harder task, of comparing human-readable-code with machine-executable-code, and there have been provably correct compilers created even for inelegant languages like C.

I'm not sure about the entire direction of "prove that optimization won't change our output", however. A practical AI is likely to be limited by computational resources, right? So if we want to get answers out of it at all, there's a good chance that we're going to have to use approximate algorithms such as iterative methods that never reach exact convergence. But then what should a better-optimized AI do with it's effectively increased computational resources? Terminate the functions at the same, equally inaccurate point, then while(no_new_input()) twiddle_thumbs()? Or use more accurate approximations and/or more iterations to get to a better, but different output?

(edited to fix markup; thanks army1987)

Replies from: army1987, RolfAndreassen

↑ comment by A1987dM (army1987) · 2012-01-22T00:18:12.916Z · LW(p) · GW(p)

Underscores are for italics in Markdown. To get “no_new_input”, write no\_new\_input.

↑ comment by RolfAndreassen · 2012-01-21T23:41:00.355Z · LW(p) · GW(p)

we just need to compare unoptimized-version-of-code to optimized-version-of-code. That's a much narrower task.

I don't think that's true. The optimised and unoptimised code can be considered as the start of the tape of a fixed Turing machine, or as the operating instructions of different Turing machines. It is important not to conceptualise Turing machines too much in terms of our experience with desktop computers, which are only one particular implementation of the general concept. A computer has hardware, code, and input; a Turing machine just has input and rules. There is an exact equivalence between "What is the output of Turing machine X with input A+B", and "What is the output of Turing machine X' with input B", where A has been absorbed into the machine.

Terminate the functions at the same, equally inaccurate point, then while(nonewinput()) twiddle_thumbs()? Or use more accurate approximations and/or more iterations to get to a better, but different output?

I was speaking of optimisations for speed; presumably you always optimise a bottleneck, so you terminate at the same point and then go on to deal with some other input that was waiting for the response from this particular calculation.

Replies from: roystgnr

↑ comment by roystgnr · 2012-01-24T04:53:22.064Z · LW(p) · GW(p)

I never claimed that "optimized code" and "unoptimized code" can't be considered as two different generalized Turing machines; I claimed that two arbitrary different generalized Turing machines might not always be possible outputs of an optimization process. You're correctly pointing out that "all before-and-after-optimization code pairs are equivalent to pairs of Turing machines", but that is not the opposite of "not all pairs of Turing machines are before-and-after-optimization codes". And once you start looking at strict subsets all sorts of impossibilities become possibilities; I can solve the halting problem if you let me pick a sufficiently restricted subset of Turing machines as admissible inputs!

I was also speaking of optimizations for speed, to ask the question: why are you bothering to optimize for speed? The answer isn't "I want the code to be able to produce the same output in 1/Nth of the time so my computer can idle for (N-1)/Nths of the time"; what we're really looking to do is let the code produce different, better output. So "will my optimized version produce the exact same output for the exact same input" is only half of the problem; it still isn't a proof of recursive stability in a realistic use case where after each improvement we're also changing the input.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-24T18:34:23.481Z · LW(p) · GW(p)

You're correctly pointing out that "all before-and-after-optimization code pairs are equivalent to pairs of Turing machines", but that is not the opposite of "not all pairs of Turing machines are before-and-after-optimization codes".

A useful distinction. Thank you.

So "will my optimized version produce the exact same output for the exact same input" is only half of the problem; it still isn't a proof of recursive stability in a realistic use case where after each improvement we're also changing the input.

Suppose we have an AI that does two things: Calculate ballistics trajectories for its army of killer robots, and coherently extrapolate human volition to guide its overall goals. If it optimises the ballistics calculation, it can spend more time thinking about the CEV; this will produce a different result (unless it was already at the point of reflective stability), but in this case that's a good thing. However, the optimised ballistics calculation had better be yielding the same results or it will start losing the war. So I distinguish between two outputs: The output of the specific function being optimised must be the same. The output of the AI as a whole can differ.

comment by DuncanS · 2012-01-22T00:28:55.253Z · LW(p) · GW(p)

I think it's right to say that it's not always possible to show that two programs are equivalent.

However, it is very often possible to do so for real functions - in fact far more often than not. So real options for optimization don't appear to be much constrained by this.

comment by Anatoly_Vorobey · 2012-01-21T15:13:01.994Z · LW(p) · GW(p)

Suppose the Universe is finite in volume and amount of matter. It follows that all your algorithms are actually finite state machines, not Turing machines. How does that affect your argument? Do you believe that the choice between the Universe being infinite vs it having at most 10^117 atoms directly bears on the possibility of constructing an AGI within the next 1000 years?

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T23:41:58.191Z · LW(p) · GW(p)

No, but that's not necessarily the correct upper bound to use. The difference between infinity and, say, having only a few dozen terabytes of storage space might be important.

comment by XiXiDu · 2012-01-21T10:14:32.658Z · LW(p) · GW(p)

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification;

Isn't this also true for unfriendly AI? Any AI has to ensure that improved versions of itself are friendly with respect to its initial values. So for each modification, or successor, it has to find a proof that it will not only respect its values but that it will do so in a way that more effectively maximizes expected utility.

Replies from: RolfAndreassen, timtyler, TheOtherDave

↑ comment by RolfAndreassen · 2012-01-21T21:50:43.275Z · LW(p) · GW(p)

Ah no. Friendliness is a special category of AIs, and as such is more restrictive: No AI can be Friendly whose output changes under optimisation, but an Unfriendly AI is still Unfriendly if its output changes.

↑ comment by timtyler · 2012-01-21T22:50:30.318Z · LW(p) · GW(p)

Not really. For example, you could have a "sloppy" superintelligence that traded short term gain over the future of the universe by giving it a short planning horizon.

↑ comment by TheOtherDave · 2012-01-21T18:19:00.760Z · LW(p) · GW(p)

The phrase "has to" is a little confusing here. Sure, any AI that doesn't reliably preserve its value structure under self-modification risks destroying value when it self-modifies. But something can be an AI without preserving its value structure, just like we can be NIs without preserving our value structures.

comment by DanielLC · 2012-01-21T03:54:05.087Z · LW(p) · GW(p)

It seems to follow that you cannot build a fully general proper-optimisation detector.

So don't make it fully general. It's not like that means it's some near-useless thing that can only run in certain special cases. It's fine if it works most of the time.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T04:14:29.943Z · LW(p) · GW(p)

Only if you can reliably tell when it didn't work. :)

Replies from: DanielLC

↑ comment by DanielLC · 2012-01-21T20:25:18.913Z · LW(p) · GW(p)

Make it so that if they give the same results, it can usually prove it. If they don't give the same results, it will never prove it. As such, it will work fine.

Replies from: RolfAndreassen

↑ comment by RolfAndreassen · 2012-01-21T21:51:46.237Z · LW(p) · GW(p)

Well, ok, just terminate the search for proof after some defined cutoff time, ok. But that may drastically limit the kinds of proof you are able to find.

Replies from: DanielLC

↑ comment by DanielLC · 2012-01-22T00:27:30.531Z · LW(p) · GW(p)

It may. It may not. I don't know. It may be impossible to optimize a system that much. That's not my field.

Limits on self-optimisation

Contents

37 comments