AI cooperation in practice

post by cousin_it · 2010-07-30T16:21:50.871Z · LW · GW · Legacy · 166 comments

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?

Wait, that was the easy version. Here's a harder puzzle:

Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?
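
Both puzzles share the same control-flow skeleton, which can be sketched in Python. This is only a toy model: `check_proof` is a hypothetical stand-in for a real proof verifier over the programs' actual source code, and the bound is tiny because nothing can literally iterate to 3^^^^3.

```python
# Toy skeleton of the proof-searching programs. `check_proof(p)` stands in
# for "p is a valid proof of the target statement": in the first puzzle,
# "A returns 1"; in the second, "A returns the same value as B".

PROOF_BOUND = 10**4  # a tiny stand-in for 3^^^^3

def proof_searcher(check_proof):
    """Enumerate candidate proofs; return 1 on success, 0 on exhaustion."""
    for p in range(PROOF_BOUND):
        if check_proof(p):
            return 1
    return 0

print(proof_searcher(lambda p: False))    # a checker accepting nothing: returns 0
print(proof_searcher(lambda p: p == 42))  # a checker accepting some proof: returns 1
```

The interesting part of the puzzles is, of course, what happens when the proof being searched for quantifies over this very program's behavior; the sketch above only fixes the surrounding loop.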

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only if I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

166 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2010-07-31T10:47:09.959Z · LW(p) · GW(p)

I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.

ETA: Mostly corrected now.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T10:55:18.195Z · LW(p) · GW(p)

I wanted to say that yesterday, but chose instead to stay silent on the meta-issue, try as best I can to clarify the technical issues (though having to explain that there are actual technical issues was a nasty surprise for me), and refrain from downvoting illiterate critiques because I could be biased.

So thanks for saying what I wanted to say - it sounds better when someone else says it.

comment by Wei Dai (Wei_Dai) · 2010-07-30T21:46:05.397Z · LW(p) · GW(p)

Cousin_it, do you have an argument that n1 ... n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T02:35:46.427Z · LW(p) · GW(p)

Only a very rough one, I'm afraid. The number n1 is chosen to "contain" the blow-up from the short proof of 10 to the proof of 11, which grows with the computational complexities of n1,...,nmax, not their values. These computational complexities can be made way smaller than n1.

Replies from: Wei_Dai, Nisan
comment by Wei Dai (Wei_Dai) · 2010-07-31T06:35:13.178Z · LW(p) · GW(p)

Ok, I was checking if there was an obvious flaw that you might have missed. Now I'm trying to actually understand your proof, and I'm getting stuck on the first step, where you invoke the Diagonal Lemma. Now a typical statement of the Diagonal Lemma is:

Diagonal Lemma: If T is a theory in which diag is representable, then for any formula B(x) with exactly one free variable x there is a formula G such that G <=> B(g(G)) is a theorem in T. (g(G) is the Gödel number of G. This was copied from page 2 of http://www.cs.cornell.edu/courses/cs4860/2009sp/lec-23.pdf)

If you look at page 3 of that PDF, it gives an explicit formula for G, which happens to contain an existential quantifier. Now my question is, how do I translate that G into a code string (the one named "box" in your proof)? Also, is this supposed to be obvious, or are you assuming some background knowledge beyond the standard expositions of the Diagonal Lemma and Löb's Theorem?

Replies from: cousin_it, Vladimir_Nesov
comment by cousin_it · 2010-07-31T09:02:13.653Z · LW(p) · GW(p)

You can build "box" by quining, I think. Like this:

var s = "...";
return implies(
           proves(s, n1),
           eval(outputs(1)));

Where s contains the source code of the entire snippet. I'd kinda assumed that quining and the diagonal lemma were the same thing; I guess that was wrong.
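
For readers unfamiliar with quining, here is a minimal concrete quine (Python, purely for illustration; it is not the "box" construction itself, just the self-reproduction trick it relies on):

```python
# The two code lines below form a quine: the string s, substituted into
# itself with the % operator, reconstructs exactly those two lines.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints its own two lines of source, which is the sense in which a program can "contain" its own code.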

Replies from: apophenia
comment by apophenia · 2010-08-03T04:00:32.481Z · LW(p) · GW(p)

Quining is implied by the diagonal lemma, but you can use the diagonal lemma the way you've written. See this webpage on quines to understand the distinction. Wei Dai, it's easy to convert between code strings and Gödel numbers, at least in theory. I don't know about your particular example's numbering, because that link isn't working for me. You may want to find something specific to complexity theory, instead of math, which I suspect you (personally) would find more understandable. The quine page has a brief intro, which includes a proof of the fixed-point theorem.

In this case I think we have B(x) = implies(proves(x, n1), eval(outputs(1))) which is fine, but doesn't bound the size of BOX. However, cousin_it specifies a way to keep BOX small using quining, so this proof seems legit to me. Its conclusion is just as unintuitive as Löb's Theorem, but the steps all look right.

Edit: For an explicit way to compute fixed points, see the Y combinator.
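
For instance, the strict-evaluation variant of the Y combinator (usually called the Z combinator) can be written directly in Python; it produces a fixed point of a functional, which is the computational analogue of the diagonal lemma's fixed-point construction:

```python
# Z combinator: a fixed-point operator for strictly evaluated languages.
# Z(f) behaves like f(Z(f)), so f can "call itself" even though its
# definition contains no explicit self-reference.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Example: factorial obtained as the fixed point of a non-recursive functional.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))  # -> 120
```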

comment by Vladimir_Nesov · 2010-07-31T10:24:36.396Z · LW(p) · GW(p)

You don't need that to be a program for the proof to go through.

comment by Nisan · 2010-09-05T19:31:06.005Z · LW(p) · GW(p)

The number n1 [...] grows with the computational complexities of n1,...,nmax

Why is this true? I am trying to understand. An obvious way to prove that eval(box) returns true is to go through the execution step by step. But when I write down what box is, it seems to call proves(box, n1) and eval(outputs(1)), which take much more than n1 steps.

Replies from: cousin_it
comment by cousin_it · 2010-09-06T05:36:50.224Z · LW(p) · GW(p)

n1 is chosen so as to contain the size of the "box" statement, not its execution time.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-07-30T19:47:15.821Z · LW(p) · GW(p)

Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense "cooperating if and only if you cooperate". So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.

Replies from: cousin_it, Vladimir_Nesov
comment by cousin_it · 2010-07-30T20:00:02.102Z · LW(p) · GW(p)

My algorithm defects against the Defection Rock just fine :-)

Replies from: Unknowns
comment by Unknowns · 2010-07-30T20:09:09.612Z · LW(p) · GW(p)

Does your proof of that take more than 3^^^^3 steps? If not, then doesn't your algorithm see that it defects against Defection Rock, and therefore it cooperates, which is inconsistent? If it does, when did you do that proof?

In other words, it seems to me that played against Defection Rock, your algorithm freezes-- it doesn't output either 1 or 0.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:17:11.958Z · LW(p) · GW(p)

Here's my proof that A defects against Defection Rock: by assumption, A's proof checker is correct. If A cooperates against the Defection Rock, A must have arrived at a proof that A's choice is equal to B's choice. But A's choice is not equal to Rock's choice. Therefore A's proof checker is incorrect, contradiction, QED.

This proof doesn't take more than 3^^^^3 steps, but it doesn't matter. A can't use it anyway because it can't assume (or prove) that its proof checker is correct. Gödel's second incompleteness theorem: if a formal system asserts or proves its own consistency, it is inconsistent. That's why all proof systems that people actually use cannot prove their own consistency.

Replies from: DanielVarga, orthonormal, Unknowns
comment by DanielVarga · 2010-07-31T10:35:19.995Z · LW(p) · GW(p)

This comment, and another one ("No, A doesn't know that because it doesn't know B's proof checker is correct.") are buried deep down in the comment thread, but I think they deserve their own mentions in the main post, because they are extremely helpful for the inexperienced reader to understand what the real issues are here.

comment by orthonormal · 2010-07-31T18:27:39.972Z · LW(p) · GW(p)

Nice separation of levels in this comment. It's essential to point out the difference between what can be done by the automated proof-checker and what can be shown 'from the outside' about the operation of the proof-checker.

I second DanielVarga: you should incorporate this discussion into the post itself, since it's exactly the part that people get stuck on.

comment by Unknowns · 2010-07-30T20:23:17.513Z · LW(p) · GW(p)

That proves that A can't cooperate, but it doesn't prove that it defects, since there remains the possibility that the program doesn't output anything.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:24:14.336Z · LW(p) · GW(p)

The program is guaranteed to finish because there's a finite number of proofs to check, and checking every proof takes finite time (a proof checker cannot loop forever on a given proof).
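
To make the finiteness concrete: over a k-symbol alphabet there are at most k + k^2 + ... + k^L candidate proofs of length up to L, so the search space is finite, and each individual check terminates. A toy count (k and L are made-up small values standing in for a real proof alphabet and the 3^^^^3 bound):

```python
# Count the candidate strings of length 1..L over a k-symbol alphabet.
# The total is finite, so a loop that checks each one must terminate.
k, L = 2, 10

total = sum(k ** n for n in range(1, L + 1))
print(total)  # -> 2046
```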

Replies from: Unknowns
comment by Unknowns · 2010-07-30T20:30:59.271Z · LW(p) · GW(p)

So with an argument like the one that A defects, I can prove that your mind is inconsistent:

"Cousin_it will never prove this statement to be true."

I know that this is true, and presumably you know it too... which is a contradiction.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:32:46.848Z · LW(p) · GW(p)

I don't know that it's true.

And I don't see how this makes the proof wrong.

Replies from: Unknowns
comment by Unknowns · 2010-07-30T20:35:41.110Z · LW(p) · GW(p)

It doesn't make the proof wrong.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:36:38.325Z · LW(p) · GW(p)

Oh, sorry then :-)

"Lucas cannot consistently assert this sentence" is an old philosophers' joke, actually.

Replies from: LucasSloan
comment by LucasSloan · 2010-07-30T20:39:10.128Z · LW(p) · GW(p)

Yes, I can.

comment by Vladimir_Nesov · 2010-07-31T10:43:23.620Z · LW(p) · GW(p)

The informal principle can still be formalized, and this post bears witness to that.

comment by Unknowns · 2010-07-30T16:52:49.399Z · LW(p) · GW(p)

In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.

In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T17:04:06.075Z · LW(p) · GW(p)

In the first version, are you sure? I gave an argument that A will output 1, have you looked at it?

In the second version, you confuse truth and provability. If both programs output 0, that doesn't imply they can prove that their output will be the same.

Replies from: Unknowns, timtyler
comment by Unknowns · 2010-07-30T19:27:17.310Z · LW(p) · GW(p)

I've looked at the argument, and it seems to me that Tim Tyler is right, the first case depends on the implementation of "proves (code, maxsteps)."

Regarding the second case, suppose A runs through checking for proofs, and doesn't find any. It then is about to output a 0. But in this case, it is impossible that B should output a 1, because then B would have a proof that A and B output the same thing, and this would be false. Therefore B also must output a 0. Therefore A and B output the same thing. So we should add a line of code to A to express this: "If A is about to output 0, then output 1, because it is proved (by the above reasoning) that A and B output the same thing." The same code should be included in B, and so both will output 1.

Replies from: cousin_it, Vladimir_Nesov, Vladimir_Nesov
comment by cousin_it · 2010-07-30T19:37:57.383Z · LW(p) · GW(p)

Your proposed change will make A always output 1, and thus become a rather poor player in the Prisoner's Dilemma. You don't want to cooperate with everybody.

Replies from: Unknowns
comment by Unknowns · 2010-07-30T19:39:24.863Z · LW(p) · GW(p)

However, B will as well, and you do want to cooperate with everybody who cooperates with you, so my proposed change works very well.

Replies from: JGWeissman
comment by JGWeissman · 2010-07-30T19:43:21.785Z · LW(p) · GW(p)

you do want to cooperate with everybody who cooperates with you

Not if they will cooperate regardless of what you do. Then you can win more by defecting.

Replies from: Unknowns
comment by Unknowns · 2010-07-30T19:44:35.017Z · LW(p) · GW(p)

A knows B's source code, and so it knows that B will only output a 1 if A and B output the same thing, i.e. only if A and B both output 1.

Replies from: cousin_it, cousin_it, cousin_it
comment by cousin_it · 2010-07-30T20:29:26.105Z · LW(p) · GW(p)

No, A doesn't know that because it doesn't know B's proof checker is correct.

You really have to be triple extra careful when talking about this stuff. I'm trying to.

Replies from: Unknowns
comment by Unknowns · 2010-07-30T20:32:57.406Z · LW(p) · GW(p)

Yes, I see that.

comment by cousin_it · 2010-07-30T20:10:50.128Z · LW(p) · GW(p)

Nope, this is another case of confusing truth with provability. A doesn't know B's proof checker is correct, so I don't think it can make that inference.

comment by cousin_it · 2010-07-30T19:53:20.045Z · LW(p) · GW(p)

Actually, no. You confuse truth and provability for the second (third?) time. A doesn't have a proof that B's proof checker is correct, so it cannot deduce what you just deduced. Please please please, just try to parse any technical material associated with the topic. People have learned to avoid this particular pitfall decades ago. We have tools for that. I am using them heavily here.

comment by Vladimir_Nesov · 2010-07-31T10:38:26.015Z · LW(p) · GW(p)

I've looked at the argument, and it seems to me that Tim Tyler is right, the first case depends on the implementation of "proves (code, maxsteps)."

I replied here.

Replies from: Unknowns
comment by Unknowns · 2010-07-31T13:14:22.742Z · LW(p) · GW(p)

I changed my mind anyway.

comment by Vladimir_Nesov · 2010-07-31T10:34:05.892Z · LW(p) · GW(p)

Nothing depends on the details of the proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it's even a primitive recursive function).

comment by timtyler · 2010-07-30T17:20:34.923Z · LW(p) · GW(p)

You don't specify the implementation of "proves(code, maxsteps)".

The output would appear to depend on that.

Of course, there's the whole business of universal heat death once again...

Replies from: Vladimir_Nesov, cousin_it
comment by Vladimir_Nesov · 2010-07-31T10:37:40.766Z · LW(p) · GW(p)

You don't specify the implementation of "proves(code, maxsteps)".

Nothing depends on the details of the proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it's even a primitive recursive function).

Replies from: red75, red75, timtyler
comment by red75 · 2010-07-31T18:04:11.088Z · LW(p) · GW(p)

On the other hand, a pathological formal language can be constructed that makes the proof of "main()==1" arbitrarily large (e.g. where 1 is represented by 3^^^3 symbols). Thus it would be better to prove that the formal language isn't pathological, i.e. that there is no prover proves' satisfying the condition

\exists n1 \exists n2 \forall code (n1<n2 & proves'(code,n1)=proves(code,n2))

comment by red75 · 2010-07-31T16:31:16.818Z · LW(p) · GW(p)

Enumeration order of proofs is not specified, so we can construct a pathological prover that reorders proofs such that all proofs of "main()==1" require more steps than maxsteps.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-07-31T16:32:55.320Z · LW(p) · GW(p)

You can't influence the length of a proof by placing it at a different position in some list of proofs. Parameter 'maxsteps' specifies the maximum length (or Gödel number) of the proofs that get checked, not the maximum number of steps performed by the proof checker algorithm.

Replies from: red75
comment by red75 · 2010-07-31T16:38:23.157Z · LW(p) · GW(p)

Ouch. maxsteps means maxprooflength. My inattention, sorry.

comment by timtyler · 2010-07-31T17:51:41.570Z · LW(p) · GW(p)

...but the "logical language" is not specified! That is one of the implementation details that is missing from the problem specification.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-07-31T17:53:27.658Z · LW(p) · GW(p)

...but the "logical language" is not specified! That is one of the implementation details that is missing from the problem specification.

You know, there are always unstated assumptions, like sanity of the reader.

Replies from: timtyler
comment by timtyler · 2010-07-31T19:15:48.591Z · LW(p) · GW(p)

Usually, when posing puzzles, it is best to try to minimise the number of unstated assumptions the solution depends upon - or there won't be a unique answer.

comment by cousin_it · 2010-07-30T17:29:01.804Z · LW(p) · GW(p)

The argument is admittedly sketchy, but I don't think it depends much on the implementation of "proves". Of course I'd want the proof system to have the same properties that are used in the proof of Löb's Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you're guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That's a really big deal, if my argument is indeed correct.

About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they'd need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but that's a long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute-force search if heuristics fail), the proofs can be found quicker still.

Replies from: None, timtyler
comment by [deleted] · 2010-08-01T00:14:04.410Z · LW(p) · GW(p)

It's a nice result, but I doubt it'll ever be practically relevant.

You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that a limit of about 50 would be workable at home, 100 for a worldwide distributed computing effort, and 500 for a universe-sized computer that does one operation per bit per Planck time. In all cases, nowhere near 10000 (let alone 3^^^3).

Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.

Replies from: cousin_it
comment by cousin_it · 2010-08-01T05:59:04.892Z · LW(p) · GW(p)

If program A uses heuristics, program B doesn't need to prove that these heuristics are correct. It only needs to see that A's use of heuristics makes A either output 1 or move on to the brute-force search.

Agreed with the rest of your comment. An easier way to reach the same conclusion: if program B is implemented as "always defect", program A has to run the brute-force search to the end, there's no shortcut (because having any shortcut in the code would make the proofs invalid if B were a copy of A).

comment by timtyler · 2010-07-30T17:52:07.205Z · LW(p) · GW(p)

Re: "apart from some extremely weak ones"

It does depend on the implementation - because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.

Replies from: fiddlemath
comment by fiddlemath · 2010-07-31T01:23:43.047Z · LW(p) · GW(p)

timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily checkable application of an axiom or inference rule from a finite set.

Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof is wrong.

So the problem that proof verifiers solve is computable, and we need only assume that the proof verifier these programs have is correctly implemented; we don't really need to know its internal details.

(If you already understand this, though, then I'm confused about your confusion. Could happen.)

[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.

Replies from: timtyler
comment by timtyler · 2010-07-31T08:15:59.169Z · LW(p) · GW(p)

An automated proof-checker can identify some correct proofs - but not all of them.

That's a consequence of "Tarski's Undefinability of Truth Theorem":

http://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem

You can make a Godel sentence for any supplied finite proof checker that essentially says: "This proof checker can never prove this statement to be true". Sure enough, the proof checker won't approve any proofs that the statement is correct - but the statement is true - and that can be proven.

Replies from: Vladimir_Nesov, JoshuaZ, cousin_it
comment by Vladimir_Nesov · 2010-07-31T10:14:29.576Z · LW(p) · GW(p)

Checking a proof in a recursively axiomatizable theory for correctness is decidable. The programs in the post only check proofs up to some length in, say, Peano arithmetic, which is recursively axiomatizable. Tarski's undefinability theorem is about complete theories of arithmetic.

Replies from: timtyler
comment by timtyler · 2010-07-31T11:48:32.652Z · LW(p) · GW(p)

There is no such thing as a "complete theory of arithmetic" - see Godel's incompleteness theorems.

The post says it is about "programs" - a fairly general class. Where is Peano arithmetic coming into the issue?

Replies from: Vladimir_Nesov, JoshuaZ
comment by Vladimir_Nesov · 2010-07-31T12:00:31.542Z · LW(p) · GW(p)

Please give up.

comment by JoshuaZ · 2010-07-31T15:12:25.044Z · LW(p) · GW(p)

There is no such thing as a "complete theory of arithmetic" - see Godel's incompleteness theorems.

The above statement isn't true. What Godel's theorem says is that any effectively generatable theory which is strong enough to model PA is either incomplete or inconsistent. Effectively generatable for our purposes can be taken to mean that valid proofs in the system are enumerable and that there's a primitive recursive procedure to determine whether a given proof is valid.

If one removes the effectively generatable condition then one can easily produce a complete, consistent system. Consider for example the following system: Consider some lexicographic ordering of well-formed statements in the language of PA, and let S(i) be the ith statement in this list. Let P(0) be PA, and generate P(n) by asking whether S(n) or ~S(n) is a theorem in P(n-1). If so, then set P(n) = P(n-1). If not, then let P(n) be P(n-1) with the additional axiom of S(n). Now, consider P(infinity) defined as the union of all P(n). This theory is complete and consistent but it is very much not effectively generatable since we can't even recursively enumerate the system's axioms.

It might help for you to take a model theory or logic course since there are a lot of subtle issues here that you seem to be missing.

Replies from: timtyler
comment by timtyler · 2010-07-31T16:53:21.834Z · LW(p) · GW(p)

I suppose there are complete inconsistent theories of arithmetic. However, here the discussion concerns whether a machine returns 1 - or not. Systems of math that are useless, or have infinitely many axioms and so can't be implemented don't really enter into the picture.

Replies from: JoshuaZ, Vladimir_Nesov
comment by JoshuaZ · 2010-07-31T17:06:09.827Z · LW(p) · GW(p)

Tim, you're making false statements. You've now misapplied both Godel's theorem and Tarski's theorem. You also seem to be under the impression that proof checking is not generally doable in an axiomatic system, which is also wrong. Let me suggest that you simply don't have the background expertise to discuss these issues. These are subtle issues. Indeed, even your above reply runs into new issues (for example, having infinitely many axioms doesn't prevent a system of axioms from being effectively generatable; indeed, many well-behaved axiomatic systems have axiom schemas which do just this). You may be running into a Dunning-Kruger type problem. I'm a professional mathematician (ok, a grad student but close enough for this purpose), and I find these issues difficult. Vladimir, I and others have pointed out what is wrong with your reasoning but you don't seem to be following the explanations. Please take a logic course or read a textbook.

Replies from: cousin_it, timtyler
comment by cousin_it · 2010-07-31T17:35:22.834Z · LW(p) · GW(p)

Seconded.

comment by timtyler · 2010-07-31T17:36:48.016Z · LW(p) · GW(p)

Re: "You also seem to be under the impression that proof checking is not generally doable in an axiomatic system, which is also wrong."

You are claiming that every possible set of axioms allows for easy automation of proof checking?!?

If so, then we do have a genuine disagreement - and I think you are mistaken.

Replies from: Blueberry, JoshuaZ
comment by Blueberry · 2010-07-31T17:58:47.635Z · LW(p) · GW(p)

Not every possible set. All you need is a primitive recursive procedure to decide whether a given statement is an axiom, and primitive recursive procedures to check whether each rule of inference is correctly followed in going from one step to another.
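
To illustrate how little is needed, here is a toy proof checker for a made-up system with one axiom ("A") and one inference rule ("from X infer XB"); nothing about the verification loop is specific to PA or any real theory:

```python
# Toy proof checker: a decidable axiom test plus a decidable rule test give
# a terminating, line-by-line verification loop. (A made-up system, not PA.)

def is_axiom(s):
    return s == "A"                      # the single axiom

def follows_by_rule(premise, conclusion):
    return conclusion == premise + "B"   # the single rule: from X infer XB

def check_proof(lines):
    """Every line must be an axiom or follow from some earlier line."""
    for i, line in enumerate(lines):
        if not (is_axiom(line) or
                any(follows_by_rule(prev, line) for prev in lines[:i])):
            return False
    return True

print(check_proof(["A", "AB", "ABB"]))  # -> True  (a valid derivation)
print(check_proof(["A", "ABB"]))        # -> False (a skipped step)
```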

Replies from: timtyler
comment by timtyler · 2010-07-31T19:06:58.635Z · LW(p) · GW(p)

Right: automated proof checking is not really "generally doable" in an axiomatic system - how hard it is depends entirely on the axiomatic system under discussion.

comment by JoshuaZ · 2010-08-01T04:37:54.981Z · LW(p) · GW(p)

It is possible that I wasn't clear enough (my phrasing could certainly use work). I was thinking of statements such as your remark that "An automated proof-checker can identify some correct proofs - but not all of them", whereas we generally don't call something an automated proof checker unless it terminates on any input and correctly reports whether the input is a valid proof in the axiomatic system. Sometimes we insist that this be a primitive recursive process. Sometimes this also generalizes naturally so that the proof checker can do this given some specific oracle (there are results about how these sorts of things behave if they have access to a Halting oracle, for example; I don't fully understand these except in vague generality).

Replies from: timtyler
comment by timtyler · 2010-08-01T07:43:02.069Z · LW(p) · GW(p)

FWIW, I was thinking of unprovable truths (wrt some chosen axioms) that are provable with stronger axioms. There, a proof checker based on a finite axiom set would return "false" for statements that are true, and provable in a stronger system.

Replies from: Vladimir_Nesov, JoshuaZ
comment by Vladimir_Nesov · 2010-08-01T09:31:45.801Z · LW(p) · GW(p)

Many statements aren't simply "true"; if a statement isn't logically valid, it's only true in a specific theory, or about a specific structure. A given satisfiable non-valid statement is true about one structure, and false about another, true in one theory, false in another theory, independent in the third one.

Replies from: timtyler
comment by timtyler · 2010-08-01T09:40:07.484Z · LW(p) · GW(p)

Sure. In this problem, we at least have a specified problem domain - whether a machine will return "1" or "0".

comment by JoshuaZ · 2010-08-01T13:10:23.042Z · LW(p) · GW(p)

That's irrelevant to a proof-checker. Proof-checkers reside in a specific axiomatic system. It doesn't matter if there's an axiomatic system that would give stronger results for the purpose that was under discussion, it just matters if there's one of a given length in whatever formalism we want.

comment by Vladimir_Nesov · 2010-07-31T17:00:20.602Z · LW(p) · GW(p)

You keep talking with confidence about something you clearly don't understand at all.

comment by JoshuaZ · 2010-07-31T13:40:11.747Z · LW(p) · GW(p)

An automated proof-checker can identify some correct proofs - but not all of them.

That's a consequence of "Tarski's Undefinability of Truth Theorem":

No. That's not what Tarski's theorem says. Tarski's theorem is a statement about first-order arithmetic descriptions of truth (it can be generalized slightly beyond that, but this is the easiest way of thinking about it). It asserts that there's no first-order definition, within the system itself, of which statements of the system are true.

Proof checking is a very different issue. Proof checking is straightforward. Indeed, most of the time we insist that axiomatic systems have proofs checkable by a primitive recursive function. We can do this in Peano Arithmetic, for example, and in PA every proof can be verified by a straightforward algorithm.

Replies from: Vladimir_Nesov, timtyler
comment by Vladimir_Nesov · 2010-07-31T13:54:08.508Z · LW(p) · GW(p)

[Tarski's theorem] asserts that there's no first-order definition, within the system itself, of which statements of the system are true.

You'd need to be careful about what "system" or "description" you are talking about, since you can weakly represent, say, the set of statements provable in Robinson arithmetic (which is recursively enumerable) in Robinson arithmetic itself. There is a formula s(-) with one free variable such that

Q |- s(gn(f)) iff Q |- f

where gn(f) is the Gödel number of (arbitrary) statement f and Q is Robinson arithmetic.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-07-31T14:01:32.894Z · LW(p) · GW(p)

Yes, I know that. I was being deliberately imprecise because it is very easy in this case to get lost in the technical details. My intention was not to convey the theorem's precise statement but rather convey to Tim why it doesn't apply in the context he's trying to use it.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-07-31T14:04:24.000Z · LW(p) · GW(p)

I know you know that, but I couldn't quite naturally interpret your informal statement as saying the formally correct thing, so had to make the clarification.

comment by timtyler · 2010-07-31T16:45:30.872Z · LW(p) · GW(p)

Here we are not talking about Peano Arithmetic. We are talking about whether a computer program returns 1 - or not. If you want to talk about how easy it is to verify proofs based on the axioms of Peano Arithmetic, I think you first need to explain why you think that is relevant to the problem - which says it is talking about "all possible proofs" and doesn't refer to a specified axiom set.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-07-31T16:48:09.969Z · LW(p) · GW(p)

Tim, please reread what I wrote. PA is an example. The relevant point is that Tarski's theorem doesn't say what you seem to think it says.

Replies from: timtyler
comment by timtyler · 2010-07-31T17:05:43.479Z · LW(p) · GW(p)

It's an example - but one that doesn't relate to the problem at hand, which concerns whether a machine returns 1 - or not. If you ASSUME some particular axiom and proof system, the problem becomes easier. However, with no specified axiom system for generating proofs - and no specified proof checker - then the problem becomes poorly-specified.

Regarding Tarski's theorem - you seem to be misinterpreting me. I am not saying that checking a proof with respect to an axiom system is necessarily difficult. I am saying that truth is not contained by any finite axiom system. So, there will be true statements that are unprovable within the system - but can still be proven in a larger system. That is also what Tarski's theorem happens to say. Check it out:

"The undefinability theorem does not prevent truth in one theory from being defined in a stronger theory. For example, the set of (codes for) formulas of first-order Peano arithmetic that are true in N is definable by a formula in second order arithmetic. Similarly, the set of true formulas of the standard model of second order arithmetic (or n-th order arithmetic for any n) can be defined by a formula in first-order ZFC."

Replies from: JoshuaZ
comment by JoshuaZ · 2010-07-31T17:14:15.999Z · LW(p) · GW(p)

Regarding Tarski's theorem - you seem to be misinterpreting me. I am not saying that checking a proof with respect to an axiom system is necessarily difficult. I am saying that truth is not contained by any finite axiom system. So, there will be true statements that are unprovable within the system - but can still be proven in a larger system. Some of those unprovable truths will be of the form that I mentioned.

If that's what you mean, then you seem to be even more confused than I realized. If you are just talking about the original context, then a statement about the set of proofs below some fixed length cannot be true and unprovable. The proof itself self-verifies: if the statement in question is true, there must exist a proof of the statement that is of the desired bounded size. This follows from consistency and the fact that there is only a finite collection of such proofs.

Replies from: timtyler
comment by timtyler · 2010-07-31T17:27:00.029Z · LW(p) · GW(p)

With a finite limit on proof length, the Godel sentence I mentioned might be longer than the limit - and so not fall within the set of proofs under consideration.

However, the point of mentioning the "Undefinability of Truth Theorem" was to illustrate how there might be truths that would be provably correct - under a more sophisticated set of axioms - but which are not captured by the axiom system being used. That can still be true if the limit on proof length is finite.

Incidentally, your rhetoric about me is getting out of control. Please calm down!

comment by cousin_it · 2010-07-31T08:53:08.651Z · LW(p) · GW(p)

The proof checker checks proofs within some formal theory. The Godel sentence for the checker is certainly true and provable by us, given the consistency of that formal theory. (If the theory were inconsistent, the checker would be able to prove any sentence.) But this doesn't work as a proof within the theory! The theory cannot believe its own consistency (Godel's second incompleteness theorem), so the checker cannot assume it when checking proofs. So your argument doesn't actually give an example of a valid proof rejected by the checker.

Replies from: Jonathan_Graehl, Vladimir_Nesov, timtyler
comment by Jonathan_Graehl · 2010-08-01T07:51:53.012Z · LW(p) · GW(p)

the checker would be able to prove any sentence

I think you mean that there would be some proof (that checks) for any sentence.

Replies from: cousin_it
comment by cousin_it · 2010-08-01T07:54:53.840Z · LW(p) · GW(p)

Yes. Thanks.

comment by Vladimir_Nesov · 2010-08-01T09:48:54.328Z · LW(p) · GW(p)

Worse, it's only "true" that a consistent theory is consistent in some unclear sense, because you can extend the theory with a statement that asserts inconsistency of the original theory, and the resulting theory will remain consistent.

Replies from: cousin_it
comment by cousin_it · 2010-08-01T10:20:13.315Z · LW(p) · GW(p)

What you're saying is certainly true (onlookers, see pages 5-6 of this pdf for an especially clear explanation), but I like to think that you can't actually exhibit a proof string that shows the inconsistency of PA. If you could, we'd all be screwed!

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-08-01T10:39:17.744Z · LW(p) · GW(p)

I like to think that you can't actually exhibit a proof string that shows the inconsistency of PA.

Proof in what theory, "can't" by what definition of truth? In the extension of PA with inconsistency-of-PA axiom, it's both provable and true that PA is inconsistent.

Replies from: cousin_it, Wei_Dai
comment by cousin_it · 2010-08-01T10:44:36.693Z · LW(p) · GW(p)

A proof in PA that 1+1=3 would suffice. Or, if you will, the Goedel number of this proof: an integer that satisfies some equations expressible in ordinary arithmetic. I agree that there's something Platonic about the belief that a system of equations either has or doesn't have an integer solution, but I'm not willing to give up that small degree of Platonism, I guess.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-08-01T10:56:24.427Z · LW(p) · GW(p)

You would demand that particular proof, but why? PA+~Con(PA) doesn't need such eccentricities. You already believe Con(PA), so you can't start from ~Con(PA) as an axiom. Something in your mind makes that choice.

comment by Wei Dai (Wei_Dai) · 2010-08-01T10:52:57.662Z · LW(p) · GW(p)

For those curious what Nesov is talking about:

I promised to explain why the Incompleteness Theorem doesn't contradict the Completeness Theorem. The easiest way to do this is probably through an example. Consider the "self-hating theory" PA+Not(Con(PA)), or Peano Arithmetic plus the assertion of its own inconsistency. We know that if PA is consistent, then this strange theory must be consistent as well -- since otherwise PA would prove its own consistency, which the Incompleteness Theorem doesn't allow. It follows, by the Completeness Theorem, that PA+Not(Con(PA)) must have a model. But what could such a model possibly look like? In particular, what would happen if, within that model, you just asked to see the proof that PA was inconsistent?

I'll tell you what would happen: the axioms would tell you that a proof of PA's inconsistency is encoded by a positive integer X. And then you would say, "but what is X?" And the axioms would say, "X." And you would say, "But what is X, as an ordinary positive integer?"

"No, no, no! Talk to the axioms."

"Alright, is X greater or less than 10^500,000?"

"Greater." (The axioms aren't stupid: they know that if they said "smaller", then you could simply try every smaller number and verify that none of them encode a proof of PA's inconsistency.)

"Alright then, what's X+1?"

"Y."

And so on. The axioms will keep cooking up fictitious numbers to satisfy your requests, and assuming that PA itself is consistent, you'll never be able to trap them in an inconsistency. The point of the Completeness Theorem is that the whole infinite set of fictitious numbers the axioms cook up will constitute a model for PA -- just not the usual model (i.e., the ordinary positive integers)! If we insist on talking about the usual model, then we switch from the domain of the Completeness Theorem to the domain of the Incompleteness Theorem.

-- Scott Aaronson

comment by timtyler · 2010-07-31T11:24:36.541Z · LW(p) · GW(p)

Let's say we are trying to prove statements within ZFC.

"ZFC can never prove this statement to be true"

...is one thing and...

"This proof checker can never prove this statement to be true"

...is another.

Neither can be proved by the specified proof checker - but the second statement can be proved by another, better proof checker - still working within ZFC - so it can be seen that it is true.

Replies from: Jonathan_Graehl
comment by Jonathan_Graehl · 2010-08-01T07:49:29.325Z · LW(p) · GW(p)

"proved by a proof checker" - huh?

Replies from: timtyler
comment by timtyler · 2010-08-01T08:44:40.657Z · LW(p) · GW(p)

"Asserted", "approved" - whatever.

comment by Vladimir_Nesov · 2010-07-30T22:12:44.307Z · LW(p) · GW(p)

Edit: The following is not obviously possible, see this comment.

This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for the cooperation problem, with bounds omitted (players are A and B, our player is A):

function main1() {
    if (proves("A()==B()"))
        return "Cooperate";
    return "Defect";
}

The proof that the outcome is "Cooperate" basically hinges on the provability of the Löb statement

proves("A()==B()") => (A()==B())

from which we conclude that A()==B() and so A()=="Cooperate". The statement can be constructed from

proves("A()==B()") => (A()=="Cooperate")
proves("A()==B()") => (B()=="Cooperate")

and so

proves("A()==B()") => (A()==B())
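
For readers following along, the Löbian step above can be written in standard provability-logic notation (□ for "provable"; the length bounds are elided here, so this is a sketch of the unbounded argument only):

```latex
\begin{align*}
&\vdash \Box(A = B) \rightarrow (A = \mathrm{Cooperate})
  && \text{by inspection of } A\text{'s code}\\
&\vdash \Box(A = B) \rightarrow (B = \mathrm{Cooperate})
  && \text{by inspection of } B\text{'s code}\\
&\vdash \Box(A = B) \rightarrow (A = B)
  && \text{combining the two}\\
&\vdash A = B
  && \text{by L\"ob's theorem}
\end{align*}
```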

Now, we can get basically the same proof if we use a simpler algorithm for the players:

function main2() {
    if (proves("A()=='Cooperate' && B()=='Cooperate'"))
        return "Cooperate";
    return "Defect";
}

This gets rid of the more general relation, but achieves basically the same result. We can improve on this by defecting against cooperating rocks:

function main3() {
    if (proves("A()=='Defect' && B()=='Cooperate'"))
        return "Defect";
    if (proves("A()=='Cooperate' && B()=='Cooperate'"))
        return "Cooperate";
    return "Defect";
}

This suggests a general scheme for construction of decision-making algorithms: for all possible actions a and b of players A and B, sort the pairs by utility for A, starting from highest, and implement the Löb algorithm for each possibility in turn:

function main4() {
    for(action pairs <a,b> in order of descending utility for A) {
        if (proves("A()==a && B()==b"))
            return a;
    }
    return something;
}

This plays against arbitrary opponents quite well.
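
As a sanity check of the structure (not of the Löbian proof search itself, which is infeasible to run), here is a Python sketch of main4 in which proves() is a stub: it treats "A()==a && B()==b" as provable exactly when both players run identical code and the outcome is symmetric, standing in for the symmetry/Löb argument. All names and the payoff table are invented for the example.

```python
def make_player(utilities):
    """utilities maps (my_action, their_action) -> payoff for me."""
    def player(my_source, their_source):
        def proves(a, b):
            # Stub for the proof search: with identical programs, only the
            # symmetric claims "A()==a && B()==a" count as provable.
            return my_source == their_source and a == b
        # main4: try action pairs in order of descending utility for me.
        for a, b in sorted(utilities, key=utilities.get, reverse=True):
            if proves(a, b):
                return a
        return "Defect"
    return player

# Prisoner's Dilemma payoffs for the row player.
pd = {("Defect", "Cooperate"): 3, ("Cooperate", "Cooperate"): 2,
      ("Defect", "Defect"): 1, ("Cooperate", "Defect"): 0}
A = make_player(pd)
B = make_player(pd)

print(A("same-src", "same-src"), B("same-src", "same-src"))  # both cooperate
print(A("same-src", "rock-src"))  # defects against a program it can't crack
```

The stub is doing the work that quining or the Löbian machinery does in the real construction; the sketch only shows how main4's ordering makes a player cooperate with a copy of itself while defecting against a cooperating rock.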

Replies from: AlephNeil, cousin_it, Wei_Dai, cousin_it
comment by AlephNeil · 2010-08-01T09:56:11.816Z · LW(p) · GW(p)
function main3() {
    if (proves("A()=='Defect' && B()=='Cooperate'"))
        return "Defect";
    if (proves("A()=='Cooperate' && B()=='Cooperate'"))
        return "Cooperate";
    return "Defect";
}

Sorry if I'm being naive, but if we amend the program that way then doesn't it lose the ability to prove the Löb statement implies(proves(P), P) where P = "A()=='Cooperate' && B()=='Cooperate'"? (Due to the unprovability of unprovability.)

Replies from: red75, Vladimir_Nesov
comment by red75 · 2010-08-01T10:29:55.139Z · LW(p) · GW(p)

Can this problem be solved by adjusting the maximal length of checked proofs, such that the unprovability of proves("A()=='Defect' && B()=='Cooperate'", maxDC) is provable by proves(..., maxsteps)?

Replies from: AlephNeil
comment by AlephNeil · 2010-08-01T11:26:08.249Z · LW(p) · GW(p)

Yes, that occurred to me too, and I think it does the job.

I wonder if there are theorems of the form "There's no significantly better way to show that P isn't provable in N steps than by listing all proofs of length N and observing that P isn't proved". Then of course maxsteps would need to be exponential in maxDC.
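
There is at least the brute-force way. In a toy system (the K/S fragment of implicational propositional logic over one variable; everything here is invented for illustration and is nothing like PA), one can compute every theorem derivable using formulas up to a size bound by closing the bounded set of axiom instances under modus ponens. "P is (or isn't) provable within the bound" then becomes a finite check:

```python
from itertools import product

P = "p"  # one propositional variable keeps the search space small

def formulas(depth):
    """All formulas over P built with -> up to the given nesting depth."""
    fs = {P}
    for _ in range(depth):
        fs = fs | {("->", a, b) for a, b in product(fs, repeat=2)}
    return fs

def is_axiom(f):
    if not (isinstance(f, tuple) and f[0] == "->"):
        return False
    _, a, b = f
    # K: a -> (b -> a)
    if isinstance(b, tuple) and b[0] == "->" and b[2] == a:
        return True
    # S: (x -> (y -> z)) -> ((x -> y) -> (x -> z))
    if (isinstance(a, tuple) and a[0] == "->"
            and isinstance(a[2], tuple) and a[2][0] == "->"
            and isinstance(b, tuple) and b[0] == "->"):
        x, y, z = a[1], a[2][1], a[2][2]
        return b == ("->", ("->", x, y), ("->", x, z))
    return False

def theorems(depth, rounds):
    """Close the bounded axiom set under modus ponens for `rounds` steps."""
    known = {f for f in formulas(depth) if is_axiom(f)}
    for _ in range(rounds):
        known = known | {f[2] for f in known
                         if isinstance(f, tuple) and f[1] in known}
    return known

thms = theorems(depth=4, rounds=3)
print(("->", P, P) in thms)  # True: p -> p is provable within the bound
print(P in thms)             # False: p alone is not
```

Even in this toy, the formula space explodes as the depth bound grows, which is the analogue of the exponential cost in the proof-length bound that makes this strategy impractical in general.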

Replies from: red75, apophenia
comment by red75 · 2010-08-01T12:09:54.087Z · LW(p) · GW(p)

On the other hand, the fact that proves("A()=='Cooperate' && B()=='Cooperate'") was called means that A()=='Defect' && B()=='Cooperate' is unprovable within the maxDC proof length. The question is whether this fact can be used in the proof. The proof checker can be programmed to add such self-evident propositions to its set of axioms, but I can't clearly see the consequences of this. And I have a bad feeling about mixing meta- and object levels.

Replies from: cousin_it, AlephNeil
comment by cousin_it · 2010-08-01T14:42:26.854Z · LW(p) · GW(p)

I'm not sure if your idea will work, but have some notation to help you avoid mixing the levels. Here's how you say "try to prove this fact conditional on some other fact":

function main()
{
    if (proves(outputs(1), n1))
        return 1;
    else if (proves("implies(!proves(outputs(1), n1), eval(outputs(2)))", n2))
        return 2;
    else
        return 0;
}
comment by AlephNeil · 2010-08-01T12:38:04.731Z · LW(p) · GW(p)

OK, so first we try to prove P1 = "A()=='Defect' && B()=='Cooperate'" using some theory T, then on failing that, we try to prove P2 = "A()=='Cooperate' && B()=='Cooperate'" using the augmented theory T' = T + "T cannot prove P1 in maxDC steps".

And then T' can prove that "If T' proves P2 then P2 is true".

And note that the number of steps doesn't matter any more.

So perhaps Nesov's idea was OK all along?

comment by apophenia · 2010-08-03T02:43:13.712Z · LW(p) · GW(p)

"P isn't provable in N steps" is in Co-NP. Since in general it's an open question whether P=Co-NP, there might be a polynomial-time algorithm to solve it, or there might not, respectively.

On the other hand, if you 1) showed some large class of particular problems (ones that can be reduced to from [no typo there] nonequivalent SAT instances, which I suspect is most of them) weren't provable in N=cP steps IF they're unprovable, then 2) you would have shown a sufficiently large class of SAT problems weren't easily solvable in N=kP steps, 3) which would imply P!=NP (and also P!=Co-NP).

If on the other hand your whole class of problems is unprovable, then you can prove P isn't provable by your theorem, which is presumably smaller than N in this context.

So, in summary, my guess is that for interesting classes of P, your question is equivalent to "Does P=NP?", which we know is a tough nut to crack.

comment by Vladimir_Nesov · 2010-08-01T10:20:07.744Z · LW(p) · GW(p)

This is true, and it's a bit disturbing that it took so long for someone to point that out. I feel that there should be a way out, but don't see what it is (and finding it might suggest interesting limitations).

comment by cousin_it · 2010-07-31T02:42:50.821Z · LW(p) · GW(p)

According to my experience, the study of game-playing programs has two independent components: fairness/bargaining and enforcement. If you have a good method of enforcement (quining, löbbing, Wei Dai's joint construction), you can implement almost any concept of fairness. Even though Freaky Fairness relied on quining, I feel you can port it to löbbing easily enough. But fairness in games with non-transferable utility is so difficult that it should really be studied separately, without thinking of a specific enforcement mechanism.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-07-31T10:42:28.876Z · LW(p) · GW(p)

Bargaining is about logical uncertainty, something we avoid in this post by using sufficiently high upper bounds.

comment by Wei Dai (Wei_Dai) · 2010-07-30T22:31:23.704Z · LW(p) · GW(p)

Suppose we have two possible actions X,Y and A prefers to to to and B prefers to to to . What will be the outcome if both of them use main4() with their respective orderings and the same proves() function?

Replies from: Vladimir_Nesov, Vladimir_Nesov
comment by Vladimir_Nesov · 2010-07-30T22:50:01.998Z · LW(p) · GW(p)

Probably , which is not quite optimal. This doesn't solve the bargaining problem, which is exactly what your example is.

comment by Vladimir_Nesov · 2010-07-30T22:44:57.398Z · LW(p) · GW(p)

Most likely , but possibly or if one of the agents manages to "trick" the other.

comment by cousin_it · 2010-07-31T11:23:24.280Z · LW(p) · GW(p)

Your idea is similar to preferential voting.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-07-30T19:50:14.975Z · LW(p) · GW(p)

Does the obvious Lobian proof for 1 still go through if there's a bound on the proof length built into all the sentences?

Replies from: cousin_it
comment by cousin_it · 2010-07-30T19:57:45.533Z · LW(p) · GW(p)

In the linked text file I tried to do just that, carefully tracking proof lengths and making sure the bounds stay bounded. But this could really use some checking.

The second problem uses the same trick, of course: "if we both prove that we output the same thing, then we output the same thing", then move ahead using "we both prove" instead of "I prove" everywhere.

comment by Vladimir_Nesov · 2010-08-01T08:17:53.733Z · LW(p) · GW(p)

The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don't fully understand, but expect to produce good results. That they expect a good result is reflected, as in the previous construction, in the order in which they check for the possibility:

function main_A() {
    if (proves("A()==Foo() && B()==Bar()"))
        return Foo();
    ...
}

function main_B() {
    if (proves("A()==Foo() && B()==Bar()"))
        return Bar();
    ...
}

Here, the programs might be unable to prove anything specific about the output of Foo() and Bar(), but can still use the trick to coordinate their simultaneous invocation. This is also one way to look at how to solve the coordination problem.

Note that Foo() and Bar() might as well implement algorithms similar to what we have in main(), so choosing them is a high-level decision, while the low-level decisions still remain to be made by them. Deciding hierarchically makes individual proofs, and individual decisions, easier to perform, so that you don't need a ridiculously big upper bound on which proofs to consider. That we can forget how the hierarchical code is written (say, by inlining most functions) also shows the potential for finding similarly simplifying patterns in more monolithic code.

comment by AlephNeil · 2010-07-31T19:32:47.768Z · LW(p) · GW(p)

This is awesome - great post!

I've scanned briefly through the comments, and didn't see anyone ask this, but apologies if I missed it:

You use the diagonal lemma to establish that there is a piece of code called "box" such that the following is provable:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

But couldn't there likewise be a "box2" such that:

eval(box2) == implies( proves(box2, n1), eval(outputs(2)))

And then couldn't you patch up the proof somehow to show that the program has to output 2? Specifically, to get the analogue of step 2, can't you prove that proves(outputs(2), n) implies the negation of proves(outputs(1), n), and hence prove that proves(outputs(2), n) implies outputs(2)?

Even if this is correct, it only applies to the program you use in your proof, where if proves(outputs(2), n) then 2 is returned; whereas the puzzle above doesn't allow for 2 to be returned at all. However, if I'm right then presumably something must be going wrong in the proof?

Replies from: cousin_it
comment by cousin_it · 2010-07-31T19:38:55.700Z · LW(p) · GW(p)

can't you prove that proves(outputs(2), n) implies the negation of proves(outputs(1), n)

To prove that, the proof system would need to know its own consistency, because an inconsistent proof system could prove both of these statements and indeed any statement at all. But a proof system cannot know its own consistency without actually being inconsistent (Goedel's second theorem).

Replies from: Vladimir_Nesov, AlephNeil
comment by Vladimir_Nesov · 2010-07-31T20:30:53.529Z · LW(p) · GW(p)

For a finite set of proofs, that a given statement is not provable by a proof from that set, is provable, possibly by a proof from that set.

Replies from: cousin_it, cousin_it
comment by cousin_it · 2010-07-31T21:14:30.722Z · LW(p) · GW(p)

Technically true. You could always just evaluate the two statements and there you have your proof, with length exponential in n. I very much doubt the proof could be made shorter than n though, because the statement is equivalent to "you can't find a contradiction in these axioms using less than n steps", the difficulty of which should grow way faster than linearly in n.

comment by cousin_it · 2010-07-31T20:43:43.593Z · LW(p) · GW(p)

Good catch! Yes, my argument only shows that there's no short easy proof. You can always just evaluate the two statements and there you have your proof. But that has length exponential in nmax. I'd be amazed if "1=2 is unprovable with length < nmax" were provable with length < nmax, which is pretty much the same statement. This is a very very strong statement. It can also be reformulated as "you cannot find a contradiction in the theory using less than nmax steps".

comment by AlephNeil · 2010-07-31T19:41:38.603Z · LW(p) · GW(p)

Aha - very cunning.

comment by AnotherIdiot · 2012-07-31T10:23:00.048Z · LW(p) · GW(p)

Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading one extra comment after posting) that by Lob's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.

Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.

In fact, I think I totally misunderstand the Diagonal Lemma, because it seems to me that you could use it to prove anything. If you replace "outputs(1)" by "2==3", the proof still seems to hold. Statement 2 would still be true with "2==3" (it is true about any statement, after all), and all the logic that follows from those two statements would be true. In fact, by an unclearly written chain of reasoning which I originally intended to post before realizing that it would be much simpler to just say this, all you seem to require is the first statement to be able to prove anything. If I am mistaken, which is probable, then I expect my error lies in the assumption that "outputs(1)" could be replaced by any string of code.

For my original unclear explanation of why the first statement in the proof seems to allow anything to be proven, in case anyone cares for it:

Also, I must be an idiot, but it seems to me that you can prove pretty much anything using 1. As far as I can tell, the "eval(outputs(1))" could be "2==3" or any other statement, and the only reason "eval(outputs(1))" is used is because it's useful in the proof. Given that statement 1 is proven, "eval(box)" must return true, because if it returns false, then "proves(box,n1)" cannot return true, and therefore "implies(proves(box, n1),eval(outputs(1)))" must return true, and false!=true. Assuming that my reasoning is correct (a very questionable assumption, I know), I have just proven that "eval(box)" is true, therefore "prove(box,n1)" is also true for some arbitrarily large n1. Therefore, the "implies" will return "eval(outputs(1))", which must be true for "eval(box)" to equal it. So given my earlier assumption (which I suspect is my error) that "eval(outputs(1))" can be replaced by anything, then I can replace it by any statement, which must return true to equal "eval(box)". So through this, I seem to have proven that "2==3".

Anyone care to point me to my mistake (or, to satisfy my wildest of dreams, say that I appear to be right ;) )?

Replies from: cousin_it
comment by cousin_it · 2012-07-31T12:15:01.074Z · LW(p) · GW(p)

I don't understand how the first statement can be used to prove anything...

The second statement might be true for every statement, but it's not necessarily provable for every statement, which is required in the proof. In fact, the second statement is provable for "outputs(1)" by inspection of the program (because the program searches for proofs of "outputs(1)"), but not provable for "2==3" (because then "2==3" would be true, by Lob's theorem).

Replies from: AnotherIdiot
comment by AnotherIdiot · 2012-07-31T15:22:25.506Z · LW(p) · GW(p)

I'm sorry, my comment grew into a mess, I should have cleaned it up a bit before posting. Anyway, I agree fully about the second statement only applying to this program, that's what I realized in the edit.

But for the first statement, I'll try to be a bit more clear.

My first claim is that "eval(box) == implies(proves(box, n1), eval('2==3'))" is a true statement, proven by the Diagonal Lemma. I'll refer to it as "statement 1", or "the first statement".

If "eval(box)" returns false, then for the first statement to be true, "implies(proves(box, n1), eval('2==3'))" must return false. "implies" only returns false if "proves(box, n1)" is true and "eval('2==3')" is false. Therefore for statement 1 to be true when "eval(box)" is false, then "proves(box, n1)" has to be true, which is a contradiction: "eval(box)" can't be false and also provably true. Therefore, "eval(box)" must be true.

So let's say "eval(box)" is true, that means that "implies(proves(box, n1), eval('2==3'))" must also return true for statement 1 to be true. One way for the "implies" statement to return true is for "proves(box, n1)" to return false. Since I have proven above that "eval(box)" is true, any good definition of the "proves" function will also return true, because if it finds no other, it will at least find my proof. Therefore, "proves(box, n1)" will return true.

So there is only one other way for the "implies" statement to return true: "eval('2==3')" must return true. Therefore, "eval('2==3')" returns true, and it follows from this that 2=3.

So, where did I go wrong?

Replies from: cousin_it
comment by cousin_it · 2012-07-31T15:35:01.311Z · LW(p) · GW(p)

Your proof of eval(box) relied on the fact that eval(box) can't be false and also provably true. But that fact is equivalent to consistency of the formal theory, so it can't be proven within the formal theory itself, by Godel's theorem. Of course, since eval(box) is a terminating computation that returns true (your proof of that is correct), the formal theory can eventually prove that by step-by-step simulation. But that proof will be much longer than your proof above, and much longer than n1. In fact, proves(box,n1) will return false, and your comment serves as a nice proof of that by reductio.

Don't get discouraged, it's a very common mistake =)

comment by gRR · 2012-02-18T05:50:02.065Z · LW(p) · GW(p)

I think it's possible to coordinate without the huge computational expense if the programs directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.

The input to the programs would be a pair (opponent's source, opponent's proof), and the output would be the decision: cooperate or defect.

The algorithm for A would be: check B's proof, which must prove that B would cooperate if the proof it gets in its input checks out. If the proof checks out - cooperate, otherwise - defect.

Replies from: cousin_it
comment by cousin_it · 2012-02-18T06:04:14.438Z · LW(p) · GW(p)

If A needs to give B the proof of some theorem that involves both A and B (as in the post), how does it obtain such a proof quickly without knowing the source code of B in advance? And if it's a proof of some theorem involving only A, then what is that theorem and how does the whole setup work?

Replies from: gRR
comment by gRR · 2012-02-18T07:21:51.086Z · LW(p) · GW(p)

Let ChecksOut(code, proof) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns "C" if ChecksOut(PCode, PProof).

Def A(code, proof): if(ChecksOut(code, proof)) return "C", otherwise return "D".

The definition of ChecksOut is kinda circular, but it should be fixable with some kind of diagonalization. Like:

SeedChecksOut(code, proof, X) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns "C" if eval(X(PCode, PProof)),

ChecksOut(code, proof) = SeedChecksOut(code, proof, #SeedChecksOut)

Replies from: cousin_it
comment by cousin_it · 2012-02-18T08:14:36.302Z · LW(p) · GW(p)

Hmm, that's pretty cool, thanks! But it's still not obvious to me that it solves the same problem as the algorithm in the post. There are probably many mutually incompatible implementations of ChecksOut, because there are many ways to diagonalize that differ only syntactically. Do you have an argument why they will end up cooperating with each other? It seems to me that they won't, just like different implementations of quining cooperation.

Replies from: gRR
comment by gRR · 2012-02-18T09:21:08.141Z · LW(p) · GW(p)

Hmm, yes, you're right, it's quining cooperation all over again - as defined, the programs would only cooperate with opponents that are sufficiently similar syntactically. And they need compatible proof systems in any case. So it's not a solution to the same problem, only a possible optimization.

comment by rwallace · 2010-07-30T20:15:10.314Z · LW(p) · GW(p)

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

It's important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:

main() {return 1;}

Not valid because it didn't find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on every run.

The point being that, while you can use concepts like proof to draw rigorous conclusions, in the face of counterintuitive things like self reference, you need some experience actually nailing things down before you can be sure you've closed all the loopholes.

Replies from: cousin_it, whpearson, rwallace
comment by cousin_it · 2010-07-31T02:48:39.434Z · LW(p) · GW(p)

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

I don't understand this. Do you mean the proof checker fails to check anything? Wouldn't that be a wrong implementation, rather than an obvious one?

Replies from: rwallace
comment by rwallace · 2010-07-31T03:17:03.145Z · LW(p) · GW(p)

No, I mean the obvious implementation would have the program basically asking "what does this program return? Well, it tries to find out what this program returns. What does this program return? Well, it tries to find out..." until it hits the length limit and comes back with no solution found (or, if the length limit is lifted, keeps going forever). This is true even if the proof checker always returns quickly.

Replies from: cousin_it, orthonormal
comment by cousin_it · 2010-07-31T08:38:52.189Z · LW(p) · GW(p)

This is absolutely not the obvious implementation of the algorithm in the post. My program does not "try to find out" something. It mechanically checks all proofs up to some maximum length. You can't just replace a piece of code with another piece that has a vaguely similar intent, but fails to halt!

Replies from: rwallace
comment by rwallace · 2010-07-31T08:54:42.141Z · LW(p) · GW(p)

The 'fails to halt' part in either case would be only if the length limit were removed. But you are right that mechanically checking all proofs up to a maximum length is a mathematically valid implementation and can only be replaced with another piece of code that is guaranteed to give an identical result.

However, that implementation can still be reasonably described as trying to find out something. The intent can be seen not in the simple loop that iterates through all proofs up to a certain length, but in the choice of logic and axioms to encode the problem. The proofs generated from suitably chosen axioms are isomorphic to execution paths of an appropriately written program (as whpearson observed, a way to see this is to consider the Curry-Howard correspondence).

comment by orthonormal · 2010-07-31T18:35:55.629Z · LW(p) · GW(p)

This isn't the same as the calculator that asks itself what value it will return, because there's a subtle separation of levels here.

comment by whpearson · 2010-07-30T23:30:24.422Z · LW(p) · GW(p)

The Curry-Howard correspondence is what you are referring to right?

Replies from: mkehrt, rwallace
comment by mkehrt · 2010-07-31T10:35:32.988Z · LW(p) · GW(p)

That program does not prove its return value under the Curry-Howard Correspondence without a lot of footwork (see first caveat). Under the CH interpretation, programs are proofs that their types are inhabited, not that they can produce their return values. Thus, the canonical thing one could say the above program proves is "int".

Some caveats. First, one could imagine a sufficiently complex type system where there is a type of all integers which are 1. Then, one could say that the above program is a proof of the type "type-of-ints-that-are-1". However, when adding this much specificity to one's type system, one has to be careful that the resulting system can decidably type terms; adding richness often leads to undecidability.
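
In a sufficiently rich (dependently typed) system such a type is easy to write down; a sketch in Lean 4, where the subtype below plays the role of "type-of-ints-that-are-1":

```lean
-- The type of integers that are equal to 1.
def IntEqOne := { n : Int // n = 1 }

-- Under Curry-Howard, exhibiting this term proves the type is inhabited:
-- the value 1 paired with a proof (by rfl) that 1 = 1.
def witness : IntEqOne := ⟨1, rfl⟩
```

The decidability worry above is exactly what Lean trades away: type checking remains decidable, but type *inference* for such rich types requires the programmer to supply proof terms.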

Second, there is a simple constructive proof that the above program produces 1. Just run it and see; if it returns 1 in finite time, that's a proof.

comment by rwallace · 2010-07-30T23:36:42.932Z · LW(p) · GW(p)

That is certainly one way to look at it.

Another way to look at it is to consider purely computational proofs, such as the four color theorem, or the solutions to checkers and 5 x 5 Go.

comment by rwallace · 2010-07-30T23:01:09.765Z · LW(p) · GW(p)

I'm curious, can the downvoters give any reasons for disagreeing with me?

Replies from: JenniferRM, Blueberry
comment by JenniferRM · 2010-07-30T23:54:36.046Z · LW(p) · GW(p)

I did not downvote (you were at -3 and -2 when I wrote this), but I think there are many possible reasons to downvote you, depending on the voter's implicit or explicit theory of voting.

This might be seen as a pure "puzzle" post, and you aren't offering a puzzle solution that takes the question at face value. Your program doesn't do anything quine-like (like putting its own source code into an input variable), nor does it have any logic engine (even by implication).

This is the kind of solution evolution might work out in certain specific environments where it happened to "just work". But suppose you put this code somewhere it actually had to "work for a living": analyzing other programs using roughly the same routines it uses to analyze itself, gaining or losing by modulating its behavior toward them, with some programs being suckers, some cheaters, and some open to implicit bargaining, so there were real opportunities to gain or lose. Then this code would not give you "optimal play ignoring computation costs": it is either a sucker or a cheater, depending on what "1" means, and in some contexts either of those strategies loses.

Maybe another way of putting the criticism is that your "clever trick" is noticing that the problem can sort of be solved in practice by collapsing down to no extra levels of self-awareness. But for theoretical robustness in other circumstances it would be preferable to have a system that scales to infinity, so that agents capable of modeling other agents' models of their own attempts to model other agents can still be "trusted by bad modelers". (Remember that part of the unspoken motivation here is to build skills towards building a verifiably friendly AGI.)

Also, the people here are all super smart, so telling them "just be dumber and people will trust you" isn't an answer they want to hear, and saying this (even indirectly) could maybe be seen as an attempt to troll, or to drag people into a topic area that is too difficult to discuss, given LW's current sanity waterline (which is the only reason I personally vote things below zero).

Another possibility is that people don't see the reversed Tortoise and Achilles joke and just think (1) you said something dumb and (2) they should "punish dumbness" in order to maintain LW's purity (i.e. the signal-to-noise ratio).

The more reasons to downvote you expose, the more chance you have to accumulate a really low score, and that comment exposed several.

Replies from: rwallace
comment by rwallace · 2010-07-31T01:54:32.996Z · LW(p) · GW(p)

Thanks for the analysis. I'll try to clarify those points, then.

Of course, I'm not suggesting my answer as a solution to the prisoner's dilemma. I actually don't think this kind of mathematical analysis is even relevant to the prisoner's dilemma in practice -- I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off -- which it typically isn't anyway. (Note that in cases involving actual criminals, we need witness protection programs to make it even approximately one-off.)

As for the logic engine, hey, I just unrolled the search loop then deleted 3^^^^3 - 1 lines of unreachable code. No different than optimizing compilers do every day :-)

If we look at the logic puzzle aspect, though, this is in a similar vein to "this statement is true" (the reverse of the classic "this statement is false" liar paradox -- it has two consistent solutions instead of none -- note that return 0 is just as valid a solution as return 1).

And if you ask what sort of proof system permits that, you start investigating higher-order logic, and self-reference, and the classic paradoxes thereof, and type theory and other possible solutions thereto, which are topics that are of very real significance to software verification and other branches of AI.

In summary, I think this puzzle goes places interesting enough that it's actually worth taking apart and analyzing.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T09:37:13.574Z · LW(p) · GW(p)

Actually, this statement is provable. This is the "Henkin sentence" and it is indeed provable in Peano arithmetic, so no need for stronger things like type theory etc. My proof in 1 is basically a translation of the proofs of the Henkin sentence and Löb's theorem (which is used to prove the former).

Replies from: rwallace
comment by rwallace · 2010-07-31T11:29:13.490Z · LW(p) · GW(p)

Yes, in that sense you're right, this is isomorphic to the Henkin sentence, which is provable in first-order Peano arithmetic, if you set everything up just right in terms of precisely chosen axioms and their correspondence to the original code. (Note that there is no guarantee of such correspondence -- return 0 is, after all, also a correct solution to the problem.)

But the metaproof in which you showed this, and the sketch you gave of how you would go about implementing it, are not written in first-order PA. And by the time you have actually written out all the axioms, implemented the proof checker etc., it will be apparent that almost all the real work is not being done on the computer, but in your brain, using methods far more powerful than first-order PA.

If we don't regard this as a problem, then we should be willing to accept return 1 as a solution. But if we do regard this as a problem (and I think we should; apart from anything else, it means nearly all problems that would benefit from formal solution, don't actually receive such, because there just isn't that much skilled human labor to go around -- this problem is itself an example of that; you didn't have time to more than sketch the implementation) then we have to climb the ladder of more powerful formal techniques.

Replies from: rwallace
comment by rwallace · 2010-07-31T14:23:52.477Z · LW(p) · GW(p)

And once again I will ask the downvoters, which part of my comment do you disagree with?

comment by Blueberry · 2010-07-31T00:06:34.010Z · LW(p) · GW(p)

I upvoted the grandparent and parent, because what you said seems right to me. I wish people wouldn't downvote someone asking why e was downvoted.

Replies from: Cameron_Taylor
comment by Cameron_Taylor · 2010-07-31T01:39:14.238Z · LW(p) · GW(p)

Upvoted entire ancestor tree, for similar reasons.

comment by DuncanS · 2010-07-30T19:21:58.219Z · LW(p) · GW(p)

Without the very long limit you've added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and..... You'll use up your 3^^...^^3 iterations and return 0.

It's a bit like this problem. http://en.wikipedia.org/wiki/Halting_problem

Replies from: cousin_it
comment by cousin_it · 2010-07-30T19:30:01.828Z · LW(p) · GW(p)

Without the limit, the first program would return 1. My proof keeps working almost without modifications. The only change will be that main() will use a while loop instead of an if, which is only used in deriving (2) and doesn't change its validity. Your argument doesn't show that the program will never find a proof - it only shows that it can never find the specific proof that you outlined, because it leads to infinite regress or whatever. It will just find another proof instead.

Replies from: DuncanS, Unknowns
comment by DuncanS · 2010-07-31T06:48:04.119Z · LW(p) · GW(p)

OK. Let's suppose you successfully write an algorithm a() that can do everything that you said.

Now let's make some changes. Suppose you were, in the course of your search of proof space, to come across a proof that a() returned 0? You could simply return 0 at that point rather than continuing to search for proofs that a() returned 1. So let's make that change.

Now comes the fun bit. Change your algorithm a() such that it returns 1 when it hits the iteration limit instead of 0. Not that hard.

Now comes the really fun bit. Invert the output of this algorithm. Let's call it b(). This function now has the interesting property that it returns 1 when a() returns 0, and vice versa -- except in the case where no proofs can be found of a value for a() short of the termination condition, where both now return 0. It's also interesting that it has exactly the same definition as a() does....
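
A control-flow sketch of the two changes (the `search` oracle here is a hypothetical stand-in for the proof enumeration, returning the value a proof establishes or None; the genuine self-reference, which is where the paradox lives, is not modeled by the stub):

```python
def a(search, limit):
    """Modified a(): stop early on a proof of either value; return 1 at the limit."""
    for step in range(limit):
        verdict = search(step)      # 0 or 1 if a proof was found at this step, else None
        if verdict is not None:
            return verdict          # return whichever value was proved
    return 1                        # changed termination case: 1 instead of 0

def b(search, limit):
    """b() is a() with its output inverted."""
    return 1 - a(search, limit)
```

With the stub, the two functions simply disagree everywhere; the interesting question is what happens when `search` is a real proof enumeration over statements about a() and b() themselves.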

Replies from: Wei_Dai, cousin_it
comment by Wei Dai (Wei_Dai) · 2010-07-31T06:55:19.887Z · LW(p) · GW(p)

Now let's make some changes. Suppose you were, in the course of your search of proof space, to come across a proof that a() returned 0? You could simply return 0 at that point rather than continuing to search for proofs that a() returned 1. So let's make that change.

If you make this change, then step 2 of cousin_it's proof won't go through anymore, I think.

comment by cousin_it · 2010-07-31T08:45:08.008Z · LW(p) · GW(p)

What Wei Dai said. The first change you propose breaks A's ability to obtain step 2 by simple reasoning about itself. Originally I came up with a different problem: systematically search all proofs up to a large length, and if we find a proof that we output some number, then output that number. That problem is way more crazy than mine, I don't know what the answer is, it might actually depend on the proof checker's implementation.

comment by Unknowns · 2010-07-30T19:36:31.668Z · LW(p) · GW(p)

So do you also think that this statement is true: "This statement is true if and only if it can be proven to be true"?

Replies from: cousin_it
comment by cousin_it · 2010-07-30T19:41:37.023Z · LW(p) · GW(p)

See this PDF, paragraph 44.3. It constructs exactly the statement you gave (Henkin's statement) and proves it's provable.

Replies from: Unknowns, Unknowns
comment by Unknowns · 2010-07-30T20:00:52.943Z · LW(p) · GW(p)

Ok, you've convinced me. Saying that this statement is true is basically the same as saying program A outputs 1.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:03:00.435Z · LW(p) · GW(p)

Yes, exactly. The proof is a little fussy because it has to keep track of bounded proof lengths everywhere, but it works basically the same.

Does it help you understand the second problem better? It uses the same trick, but with a twist. (You need to get creative when you talk about two proof systems instead of one.)

Replies from: Unknowns
comment by Unknowns · 2010-07-30T20:19:46.563Z · LW(p) · GW(p)

I still agree with what I said originally about the second problem. You could compare it to these two statements:

A) A is true if and only if A and B are either both provable, or both non-provable.

B) B is true if and only if B and A are either both provable, or both non-provable.

It is much more obvious that both of these statements are necessarily true and provable (by symmetry), than it is that "this is true if and only if it is provable" is true or provable. This is why at first I only accepted the second case.

comment by Unknowns · 2010-07-31T01:44:29.620Z · LW(p) · GW(p)

Actually, after thinking about it more, the statement is obviously provable, without the long derivation.

If "this statement is true iff it is provable" is false, then it naturally cannot be provable. But then it is both false and not provable, which logically implies that it is true iff provable. Therefore it is both true and provable.

So the first program has to output 1 for the same reason.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T10:07:40.626Z · LW(p) · GW(p)

When you have a self-referential sentence, it does no good to prove it in the "real world", because the commonsense logic we use in the "real world" becomes inconsistent when applied to self-referential statements. ("This statement is false.") You really need a proof of your statement within the formal system. And within the system, the proof you gave doesn't work: the system cannot prove that falsity implies non-provability, because that would be the system assuming its own consistency all over again. So the long derivation is needed after all.

Replies from: Unknowns
comment by Unknowns · 2010-07-31T13:18:43.015Z · LW(p) · GW(p)

I agree that the system can't use this argument, but that doesn't mean the argument doesn't work, just like your argument for why your algorithm must defect against a program that always defects.

The difference between my argument and "this statement is false" is that by directly depending on its own truth, "this statement is false" leaves its truth or falsity insufficiently defined. But "this statement is true iff it is provable" depends not directly on its truth but on provability, which is already defined. So I think that my "real world" argument does establish that the statement will be true also within a formal system, even though a formal system cannot establish it with this argument.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T13:32:06.135Z · LW(p) · GW(p)

Um, "this statement is true iff it is provable" directly depends on both truth and provability. It does refer to its own "platonic" truth value in the "real world".

Replies from: Unknowns
comment by Unknowns · 2010-07-31T13:42:25.401Z · LW(p) · GW(p)

It refers to its truth value but not in a paradox generating way.

comment by RobinZ · 2010-07-30T19:04:25.658Z · LW(p) · GW(p)

I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:

  • If A can be proven to return 1 in under n steps, then A returns 1.
  • If A cannot be proven to return 1 in under n steps, then A returns 0.

However n is defined (the post proposes n = 3^^^^3), it can be shown by the definition of the word "proof" that:

  • If A can be proven to return 1 in under n steps, then A returns 1.

and therefore the first proposition holds for every program, and cannot be used to show that A returns 1.

However, the second proposition also cannot be used to show that A returns 1. If the given condition holds, A does not return 1; if the given condition does not hold, the second proposition demonstrates nothing.

Therefore no property of the program can be used to demonstrate that the program must return 1. Therefore no proof can demonstrate that the program must return 1. Therefore the program will find no proof that the program returns 1. Therefore the program will return 0.

Q.E.D.

Replies from: orthonormal, cousin_it
comment by orthonormal · 2010-07-31T18:19:05.696Z · LW(p) · GW(p)

IIRC, the modification of Gödel's statement which instead has the interpretation "I can be proved in this formal system" is called a Henkin sentence, and does in fact have a finite proof in that system. This seems weird in the intuitive sense you're talking about, but it's actually the case.

Replies from: cousin_it
comment by cousin_it · 2010-07-31T18:56:48.417Z · LW(p) · GW(p)

Yep. The Henkin sentence has already come up multiple times in the comments here. Understanding the proof of the Henkin sentence takes you about 95% of the way to understanding my original argument, I think.

comment by cousin_it · 2010-07-30T19:19:19.753Z · LW(p) · GW(p)

...Nope.

This logic stuff is pretty tricky, but I'll try to explain how exactly the distinction between truth and provability matters in this problem. It was quite an epiphany when it hit me.

The first proposition indeed holds for every program. But it is not what is used to show that A returns 1. What is really used is this: A can prove that the first proposition holds for A by looking at the source code of A. This is definitely not the case for every program! (Provability-of-provability does not imply provability-of-truth. This is the whole point of Löb's Theorem.) A's proof of the first proposition relies on how the code of A looks - specifically, that it says explicitly "if provable then return 1". This proof would break if the structure of A were changed. For example, if A were searching for both proofs and disproofs that it would output 1, it wouldn't be able to prove the first proposition quite so easily, and maybe not at all.
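
For reference, the schema doing the work here is Löb's theorem, in standard provability-logic notation:

```latex
% Löb's theorem: if the system proves that provability of P implies P,
% then the system proves P outright.
\vdash \Box(\Box P \to P) \to \Box P
% Instantiated with P = "A returns 1": because A's code says explicitly
% "if provable then return 1", the hypothesis Box(Box P -> P) is provable,
% so the conclusion Box P, i.e. "it is provable that A returns 1", follows.
```

(This is the bounded-proof-length version in the actual argument, but the shape is the same.)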

Replies from: RobinZ, AlephNeil
comment by RobinZ · 2010-07-30T19:37:15.408Z · LW(p) · GW(p)

I'm still confused, sadly.

Replies from: cousin_it
comment by cousin_it · 2010-07-30T19:47:11.781Z · LW(p) · GW(p)

Can you follow Eliezer's cartoon proof of Löb's Theorem, linked from the post? It's kind of a prerequisite to understand what's happening here.

Replies from: RobinZ
comment by RobinZ · 2010-07-30T20:15:08.585Z · LW(p) · GW(p)

I believe I follow Eliezer's proof - should I look over your proof again?

Replies from: cousin_it
comment by cousin_it · 2010-07-30T20:19:47.968Z · LW(p) · GW(p)

Well, my proof is basically a rewrite of Eliezer's proof in another notation and with tracking of proof lengths. The exact same steps in the exact same sequence. Even the name of the "box" statement is taken from there. So you can try, yeah.

comment by AlephNeil · 2010-08-01T09:21:25.032Z · LW(p) · GW(p)

For example, if A were searching for both proofs and disproofs that it would output 1, it wouldn't be able to prove the first proposition quite so easily, and maybe not at all.

Yeah - I suspect the answer is "not at all" because this 'searching for proofs or disproofs' seems to be exactly what's needed to unbreak the reasoning in my comment above.

comment by Clippy · 2010-07-30T17:20:50.145Z · LW(p) · GW(p)

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2010-07-31T07:12:15.832Z · LW(p) · GW(p)

And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?

Your original formulation did not include the "I expect you to" part.