# No coinductive datatype of integers

post by cousin_it · 2011-05-04T16:37:30.333Z · LW · GW · Legacy · 147 commentsFollowup to: What's a "natural number"?

While thinking about how to make machines understand the concept of "integers", I accidentally derived a tiny little math result that I haven't seen before. Not sure if it'll be helpful to anyone, but here goes:

You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that will make your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

To clarify, let's work through an example. Consider an unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.

The proof is obvious. (If it isn't obvious to you, work it out!) But it seems to strike at the heart of the issue why we can't naively explain to computers what a "standard integer" is, what a "terminating computation" is, etc. Namely, if you try to define an integer as some observable interface (get first bit, get last bit, get CRC, etc.), then you inevitably invite some "nonstandard integers" into your system.

This idea must be already well-known and have some standard name, any pointers would be welcome!

## 147 comments

Comments sorted by top scores.

## comment by Emile · 2011-05-05T14:05:04.238Z · LW(p) · GW(p)

An alternative phrasing:

Replies from: cousin_itIn the Atlas Mountains, one can meet two species of Sphinxes: True Sphinxes and Hollow Sphinxes. Each True Sphinx has a secret number, and it will truthfully answer any question about it, provided it can do so in less than a hundred words. A Hollow Sphinx, however, just enjoys wasting people's time : it has no secret number but will pretend it does, and answers questions so as to never contradict itself and maintain uncertainty in the asker's mind.

While you can be sure you are speaking to a True Sphinx (for example, by guessing it's number), you can never be sure that you are speaking to a Hollow Sphinx - it might be a True Sphinx whose number is very large. In fact, no-one has been able to determine whether any Hollow Sphinxes still exist.

## ↑ comment by cousin_it · 2011-05-05T14:16:53.820Z · LW(p) · GW(p)

I wonder if your explanation will make sense to people who didn't understand the original problem... but it's awesome!

Replies from: Emile## ↑ comment by Emile · 2011-05-05T15:05:28.588Z · LW(p) · GW(p)

Seems that the best strategy would "merely" be to try to find an upper bound, like "Is the number smaller than "BusyBeaver(BusyBeaver(BusyBeaver(BusyBeaver(3^^^3))))?", and a Hollow Sphinx would just answer "no, it's bigger", so presented like that, it doesn't look like a very interesting problem.

Though there *may* be a series of question such that a Hollow Sphinx can't keep up the masquarade unless it has a Halting Oracle (say, a yes-or-no question such that it can't compute which answer could apply to an infinite amount of numbers), but I have no idea which questions those might be.

(The same applies to the original problem: there might be some exotic encoding such that an "infinitely confusing" input exists, determining say it's first 100 bits is uncomputable.)

Replies from: cousin_it, None## ↑ comment by cousin_it · 2011-05-05T17:08:22.430Z · LW(p) · GW(p)

BusyBeaver(BusyBeaver(BusyBeaver(BusyBeaver(3^^^3))))

This is the moral equivalent of saying BB(3^^^3)+1. Mere recursion is puny. At least jump from BB to BB_2!

determining say it's first 100 bits is uncomputable

Every sequence of 100 bits is computable.

Here's an encoding where the "infinitely confusing" input is uncomputable: take the unary encoding from my post and XOR it with some infinite uncomputable sequence, e.g. the halting oracle sequence. Unfortunately, you can't use this test in practice because you can't compute the required bits either.

Replies from: ciphergoth, Emile, wedrifid## ↑ comment by Paul Crowley (ciphergoth) · 2011-05-05T18:32:22.623Z · LW(p) · GW(p)

Mere recursion may be puny, but given the "fifteen-second challenge" and that BB_2 etc aren't well known enough to use there, it seems like the best bet, doesn't it?

Replies from: cousin_it## ↑ comment by Emile · 2011-05-05T19:19:18.423Z · LW(p) · GW(p)

To put it more formally: we want an encoding algorithm E and a decoding algorithm D such that for any natural number n, a turing machine can compute E(n) and D(E(n)) in finite time, but that there exists a k such that "determining a string s of length k so that the set {n such that E(n) starts with s} is infinite" is an undecidable problem. Do D and E exist fulfilling those conditions?

I admit my working knowledge of what is decidable/computable is somewhat limited; maybe the condition should be to find E and D such that no algorithm can return s given k, or whether there exists an algorithm that given E and D, can return a "confusing" sequence of arbitrary length.

Replies from: cousin_it, cousin_it## ↑ comment by cousin_it · 2011-05-12T14:40:07.045Z · LW(p) · GW(p)

I asked MathOverflow to solve this and got a complete solution within an hour. Just like last time. Wow, just wow.

Replies from: Emile## ↑ comment by cousin_it · 2011-05-12T08:58:46.261Z · LW(p) · GW(p)

I think I have a partial solution for the case where the "infinitely confusing" string is unique. Namely, if there is a unique infinite string S on which D doesn't terminate, then S is computable.

Here's an algorithm for computing S by using D. Assume that the first bit of S is 0. That means D terminates on all strings whose first bit is 1. By the argument from the original post, that implies D's running time on strings starting with 1 is uniformly bounded. (Otherwise we would be able to generate an "infinitely confusing" string for D that starts with 1 by using the proof technique from my post, which can't happen because S is unique and starts with 0 by assumption.) Therefore, by feeding D all possible finite inputs in turn while allowing it gradually increasing running times ("dovetailing"), we will eventually get a "covering set" that proves D always terminates on strings starting with 1. So we determine that the first bit of S must be 0. In the same way you can determine the second bit of S, and so on.

Let's see how it works on unary notation. We run D on all finite inputs in turn. First we notice that D("0") terminates without trying to read further, therefore the first bit of S is 1. Then we notice that D("10") terminates, therefore the second bit of S is also 1. And so on.

Replies from: Emile## ↑ comment by wedrifid · 2011-05-05T18:00:33.790Z · LW(p) · GW(p)

At least jump from BB to BB_2!

BB_2?

Replies from: Plasmon## ↑ comment by Plasmon · 2011-05-05T18:04:32.080Z · LW(p) · GW(p)

The BusyBeaver function for Turing machines with a halting oracle.

Replies from: benelliott, wedrifid## ↑ comment by benelliott · 2011-05-05T20:05:13.923Z · LW(p) · GW(p)

I'm sorry to ask this question, because it seems very stupid, but how exactly would one create a set-up where a Turing machine, as in head-and-tape style thing, actually interacts with a halting oracle?

I don't doubt that its possible but I can't think of an elegant way to do it.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-05T20:33:25.595Z · LW(p) · GW(p)

Wikipedia has a detailed explanation of the setup.

## ↑ comment by wedrifid · 2011-05-05T19:16:23.335Z · LW(p) · GW(p)

Ahh, so *that's* what the <Can someone recall the title of Eliezer's parable in which the genius level humans spend thousands of years deciphering the messages sent by the not-so-smart universe simulators?> were using!

## ↑ comment by Zack_M_Davis · 2011-05-05T20:00:57.219Z · LW(p) · GW(p)

<Can someone recall the title of Eliezer's parable in which the genius level humans spend thousands of years deciphering the messages sent by the not-so-smart universe simulators?>

You may be thinking of "That Alien Message." Best wishes, the *Less Wrong* Reference Desk.

## ↑ comment by **[deleted]** ·
2011-05-05T20:51:17.334Z · LW(p) · GW(p)

Try asking the sphinx "Is your number less than 10 if and only if P=NP?"

Replies from: Emile## ↑ comment by Emile · 2011-05-05T21:11:52.326Z · LW(p) · GW(p)

That wouldn't work for distinguishing the two kinds of sphinxes, unless you suppose that one kind knows whether P=NP and the other doesn't.

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-05T21:27:52.902Z · LW(p) · GW(p)

True, but it would still yield an interesting answer :)

ETA: Also, one could give the sphinx a Turing machine and ask if it halts within N steps or fewer, where N is that sphinx's number. After a number of such questions, you'd feel pretty confident that either the sphinx is a True Sphinx or it has a halting oracle.

Replies from: Emile## ↑ comment by Emile · 2011-05-06T10:17:50.121Z · LW(p) · GW(p)

Huh, that looks like it could work, neat :)

So to bring it back to cousin_it's post, one could have a "busy beaver candidate" (a turing machines where we don't know whether it goes on for ever or eventually halt or start repeating), and the encoding of a natural number n would be that the first bit is whether or not that candidate halts before n steps, followed by the unary encoding of n.

"Encoding" or decoding n would be easy (but long when n is big), but "breaking" that encoding for any usy beaver candidate would require a halting oracle.

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-06T18:10:09.650Z · LW(p) · GW(p)

Suppose I pass you the bit that says the candidate does not halt, followed by an infinite string of 1s. Then to decode this (by which I mean reject it as invalid) you would need to know whether the busy beaver candidate halts, which we've rejected as hard.

This is a problem with the Sphinxes, too, in retrospect. A Hollow Sphinx could just keep answering "yes" if it's hard to check whether the Turing machine halts, making you do the hard work.

Replies from: Emile## ↑ comment by Emile · 2011-05-06T18:48:07.872Z · LW(p) · GW(p)

Agreed, but a non-oracle Sphinx wouldn't be impossible to recognize any more, it would just be very hard to recognize (you would need the Sphinx to guess wrong, *and* to be patient enough to figure out that it did).

In summary, whoever has the halting oracle wins, and if nobody does, it's a patience contest.

## comment by Clippy · 2011-05-04T23:20:49.417Z · LW(p) · GW(p)

This phenomenon is noted in any handling of information theory that discusses "stream arithmetic coding". Those codes can be interpreted as a fraction written in binary, where each bit narrows down a region of the [0,1) numberline, with earlier-terminating bitstreams reserved for more likely source strings.

Any probability distribution on integers has a corresponding (optimal) arithmetic stream coder, and you can always make it indefinitely expect more input by feeding it the bit that corresponds to the least likely next character (which means it's not finished decompressing).

The human David MacKay discusses such encodings and their implicit unbounded length potential in Chapter 6 of his/her book which can be accessed without spending USD or violating intellectual property rights.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-05T08:10:02.262Z · LW(p) · GW(p)

Thanks for pointing out the connection!

Replies from: Clippy## ↑ comment by Clippy · 2011-05-05T16:56:19.737Z · LW(p) · GW(p)

You're welcome! I'm always glad to learn when knowledge I've gained through paperclip maximization has value to humans (though ideally I'd want to extract USD when such value is identified).

I should add (to extend this insight to some ot the particulars of your post) that the probability distribution on the integers implicitly assumed by the unary encoding you described is that smaller numbers are more likely (in proportion to their smallness), as do all n-ary number systems. So-called "scientific" notation instead favors "round" numbers, i.e. those padded with zeros the soonest in the least-significant-digit direction.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-05T17:15:28.975Z · LW(p) · GW(p)

though ideally I'd want to extract USD when such value is identified

Your comments are often pleasant to read, but I don't pay USD for comments that are pleasant to read, and don't know anyone who does. Sorry.

Replies from: Clippy## ↑ comment by Clippy · 2011-05-05T17:16:58.130Z · LW(p) · GW(p)

Thanks. I didn't mean charging for comments, just that if I identified major insights, I could sell consulting services or something. Or become a professor at a university teaching the math I've learned from correct reasoning and paperclip maximizing. (Though my robot would need a lot of finishing touches to pass.)

## comment by Emile · 2011-05-04T21:24:40.341Z · LW(p) · GW(p)

You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that makes your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

I find it annoying how my brain keeps saying "hah, I bet I could" even though I explained to it that it's mathematically provable that such an input always exists. It still keeps coming up with "how about this clever encoding?, blablabla" ... I guess that's how you get cranks.

Replies from: cousin_it## comment by **[deleted]** ·
2011-05-04T17:17:59.474Z · LW(p) · GW(p)

Agree about the theorem. Not sure about the implication for explaining integers, since a human who has been assigned the task of decoding the string should have the same difficulty with that infinite input stream. His understanding of what an integer is doesn't protect him. What seems to me to protect the human from being mentally trapped by an endless input is that his tolerance for the task is finite and eventually he'll quit. If that's all it is, then what you need to teach the computer is impatience.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-04T17:28:11.908Z · LW(p) · GW(p)

It still feels very weird to me that we cannot explain our intuitive notion of "finiteness" to a machine. Maybe this shows that we don't understand finiteness very well yet.

Replies from: Perplexed, None, endoself, CronoDAS, None## ↑ comment by Perplexed · 2011-05-05T01:14:16.708Z · LW(p) · GW(p)

*Of course* we can explain it to a machine, just as we explain it to a person. By using second-order concepts (like "*smallest set* of thingies closed under zero and successor").
Of course then we need to leave some aspects of those second-order concepts unexplained and ambiguous - *for both machines and humans*.

## ↑ comment by jsalvatier · 2011-05-05T07:21:56.691Z · LW(p) · GW(p)

I don't understand what you're referring to in your second sentence. Can you elaborate? What sorts of things need to be ambiguous?

Replies from: Perplexed## ↑ comment by Perplexed · 2011-05-05T14:36:17.003Z · LW(p) · GW(p)

By 'ambiguous', I meant to suggest the existence of multiple non-isomorphic models.

The thing that puzzled cousin_it was that the axioms of first-order Peano arithmetic can be satisfied by non-standard models of arithmetic, and that there is no way to add additional first-order axioms to exclude these unwanted models.

The solution I proposed was to use a second-order axiom of induction - working with *properties* (i.e.sets) of numbers rather than the first-order *predicates* over numbers. This approach successfully excludes all the non-standard models of arithmetic, leaving only the desired standard model of cardinality aleph nought. But it extends the domain of discourse from simply numbers to both numbers and sets of numbers. And now we are left with the ambiguity of what model of *sets of numbers* we want to use.

It is mildly amusing that in the case of arithmetic, the unwanted non-standard models were all too big, but in the case of set theory, people seem to prefer to think of the large models as standard and dismiss Godel's constructive set theory as an aberation.

Replies from: AlephNeil, jsalvatier## ↑ comment by AlephNeil · 2011-05-06T02:36:25.114Z · LW(p) · GW(p)

prefer to think of the large models as standard

Depends what you mean by 'large' I suppose. A non-well founded model of ZFC is 'larger' than the well-founded submodel it contains (in the sense that it properly contains its well-founded submodel), but it certainly isn't "standard".

By Gödel's constructive set theory are you talking about set theory plus the axiom of constructibility (V=L)? V=L is hardly 'dismissed as an aberration' any more than the field axioms are an 'aberration' but just as there's more scope for a 'theory of rings' than a 'theory of fields', so adding V=L as an axiom (and making a methodological decision to refrain from exploring universes where it fails) has the effect of truncating the hierarchy of large cardinals. Everything above zero-sharp becomes inconsistent.

Furthermore, the picture of L sitting inside V that emerges from the study of zero-sharp is so stark and clear that set theorists will never want to let it go. ("No one will drive us from the paradise which Jack Silver has created for us".)

## ↑ comment by jsalvatier · 2011-05-05T15:22:40.336Z · LW(p) · GW(p)

Thank you for the clarification.

## ↑ comment by **[deleted]** ·
2011-05-04T17:37:34.049Z · LW(p) · GW(p)

You've thought more about explaining things to machines than I have, but I'm not sure what you consider to count as having successfully explained a concept to a machine. For example, if you tell the machine that any string with both a beginning and an end is finite, then you've given the machine at least some sort of explanation of finiteness - one which I presume you consider to be inadequate. But when I try to imagine what you might find to be inadequate about it, my naive guess is that you're thinking something like this: "even if you tell a machine that a finite string has an end, that won't help the machine to decide whether a given input has reached its end". In other words, the definition doesn't give the machine the ability to identify infinite strings as such, or to identify finite strings as such in a time shorter than the length of the string itself. But my response to this is that our own human understanding of finiteness doesn't give *us* that ability either. Just for starters, we don't know whether the universe goes on forever in all directions or closes in on itself as a four-dimensional ball - and we might never know.

## ↑ comment by cousin_it · 2011-05-04T17:49:01.265Z · LW(p) · GW(p)

I'm not sure what you consider to count as having successfully explained a concept to a machine

I want it to be able to do some of the things that humans can do, e.g. arrive at the conclusion that the Paris-Harrington theorem is "true" even though it's independent of PA. Humans manage that by having a vague unformalized concept of "standard integers" over which it is true, and then inventing axioms that fit their intuition. So I'm kicking around different ways to make the "standard integers" more clear.

Replies from: Zetetic, None## ↑ comment by Zetetic · 2011-05-04T18:18:41.132Z · LW(p) · GW(p)

Considering that Paris-Harrington is derivable from second order arithmetic, do you think any of the ideas from reverse mathematics might come into play? This paper might of interest if you aren't very familiar with reverse mathematics, but would like to know more.

Also, it's my intuition that a good deal of mathematics has more to say about human cognition than it does about anything else. That said, this seems like the sort of problem that should be tackled from a computational neuroscience angle first and foremost. I'm likely wrong about the 'first and foremost' part, but it seems like something on numerical cognition could help out.

Also, have you looked at this work? I don't care for the whole 'metaphor' camp of thought (I view it as sort of a ripoff of the work on analogies), but there seem to be a few ideas that could be distilled here.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-04T18:27:31.157Z · LW(p) · GW(p)

I'm familiar with reverse mathematics and it is indeed very relevant to what I want. Not so sure about Lakoff. If you see helpful ideas in his paper, could you try to distill them?

Replies from: Zetetic## ↑ comment by Zetetic · 2011-05-04T18:44:51.362Z · LW(p) · GW(p)

I could give it a shot. Technically I think they are Rafael Nunez's ideas more than Lakoff's (though they are framed in Lakoff's metaphorical framework). The essential idea is that mathematics is built directly from certain types of embodied cognition, and that the feeling of intuitiveness for things like limits comes from the association of the concept with certain types of actions/movements. Nunez's papers seem to have the central goal of framing as much mathematics as possible into an embodied cognition framework.

I'm really not sure how useful these ideas are, but I'll give it another look through. I think that at most there might be the beginnings of some useful ideas, but I get the impression that Nunez's mathematical intuition is not top notch, which makes his ideas difficult to evaluate when he tries to go further than calculus.

Fortunately, his stuff on arithmetic appears to be the most developed, so if there is something there I think I should be able to find it.

## ↑ comment by **[deleted]** ·
2011-05-04T19:50:07.167Z · LW(p) · GW(p)

Having now looked up the PH theorem, I don't understand what you mean. Do you disagree with any of the following?

Computers can prove Paris-Harrington just as easily as humans can. They can also prove the strong Ramsey result that is the subject of PH as easily as humans can.

Both humans and computers are incapable of proving the Ramsey result within Peano arithmetic. Both are capable of proving it in some stronger formal systems.

Both humans and computers can "see that the Ramsey result is true" in the sense that they can verify that a certain string of symbols is a valid proof in a formal system. They are both equally capable of verifying that the Ramsey result (which concerns finite sets of integers) is true by experiment. Neither a human nor a computer can "see that the Ramsey result is true" in any stronger sense.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-04T20:54:37.564Z · LW(p) · GW(p)

I agree with everything in your comment except the last sentence. Sorry for being cryptic, I think this still gets the point across :-)

Replies from: JoshuaZ, None## ↑ comment by JoshuaZ · 2011-05-05T02:48:22.195Z · LW(p) · GW(p)

Can I attempt a translation/expansion for Sewing-Machine of why you disagree with the last sentence?

It seems that there's an intuition among humans that the Ramsey result is true, in the sense that PA + PH captures our intuition of the integers more closely than PA + ~PH given the second order result. What you want is a computer to be able to make that sort of intuitive reasoning or to make it more precise. Is that more or less the point?

Replies from: None, Sniffnoy## ↑ comment by **[deleted]** ·
2011-05-05T04:09:42.350Z · LW(p) · GW(p)

We can all agree that human intuition is grand but not magical, I hope? Here is my point of view: you are having difficulty teaching a computer to "make that sort of intuitive reasoning" because that sort of reasoning is not quite right.

"That sort of reasoning" is a good heuristic for discovering true facts about the world (for instance, discovering interesting sequences of symbols that constitute a formal proof of the Paris-Harrington theorem), and to that extent it surely can be taught to a computer. But it does not itself express a true fact about the world, and because of that you are limited in your ability to make it part of the premises on which a computer operates (such as the limitation discussed in the OP).

So I've been thinking lately, anyway.

## ↑ comment by Sniffnoy · 2011-05-05T03:03:19.195Z · LW(p) · GW(p)

I'm really at a loss as to why such a thing should be intuitive. The additional condition seems to me to be highly unnatural; Ramsey's theorem is a purely graph-theoretic result, and this strengthened version involves comparing the number of vertices used to numbers that the vertices happen to correspond to, a comparison we would ordinarily consider meaningless.

Replies from: JoshuaZ## ↑ comment by JoshuaZ · 2011-05-05T03:14:00.587Z · LW(p) · GW(p)

If I'm following cousin it, the idea doesn't have anything really to do with the statement about Ramsey numbers. As I understand it, if in some system that is only slightly stronger than PA we can show some statement S of the form A x in N, P(x), then we should believe that the correct models of PA are those which have S being true. Or to put it a different way, we should think PA + S will do a better job telling us about reality than PA + ~S would. I'm not sure this can be formalized beyond that. Presumably if it he had a way to formalize this, cousin it wouldn't have an issue with it.

## ↑ comment by **[deleted]** ·
2011-05-05T02:11:35.185Z · LW(p) · GW(p)

Shades of Penrosian nonsense.

## ↑ comment by endoself · 2011-05-05T03:13:08.155Z · LW(p) · GW(p)

A finite number is one that cannot be the cardinality of a set that has a subset with an equal cardinality.

Replies from: cousin_it, None, Sniffnoy## ↑ comment by cousin_it · 2011-05-05T08:15:52.895Z · LW(p) · GW(p)

This reduces the problem of explaining "standard integers" to the problem of explaining "subsets", which is not easier. I don't think there's any good first-order explanation of what a "subset" is. For example, your definition fails to capture "finiteness" in some weird models of ZFC. More generally, I think "subsets" are a much more subtle concept than "standard integers". For example, a human can hold the position that all statements in the arithmetical hierarchy have well-defined (though unknown) truth values over the "standard integers", and at the same time think that the continuum hypothesis is "neither true nor false" because it quantifies over all subsets of the same integers. (Scott Aaronson defends something like this position here.)

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-05T14:21:19.652Z · LW(p) · GW(p)

I don't think there's any good first-order explanation of what a "subset" is.

Well, ZFC is a first-order theory...

Replies from: MrMind## ↑ comment by MrMind · 2011-05-06T07:57:46.512Z · LW(p) · GW(p)

Yes, but Subsets(x,y) is a primitive relationship in ZFC. I don't really know what cousin_it means by an explanation, but assuming it's something like a first-order definition formula, nothing like that exists in ZFC that doesn't subsume the concept in the first place.

Replies from: Sniffnoy## ↑ comment by Sniffnoy · 2011-05-06T08:12:01.577Z · LW(p) · GW(p)

No, it isn't. The only primitive relations in ZFC are set membership and possibly equality (depending on how you prefer it). "x is a subset of y" is defined to mean "for all z, z in x implies z in y".

Replies from: MrMind## ↑ comment by MrMind · 2011-05-06T13:22:31.376Z · LW(p) · GW(p)

Can I downvote myself? Somehow my mind switched "subset" and "membership", and by the virtue of ZFC being a one-sorted theory, lo and behold, I wrote the above absurdity. Anyway, to rewrite the sentence and make it less wrong: subsets(x,y) is defined by the means of a first-order formula through the membership relation, which in a one-sorted theory already pertains the idea of 'subsetting'. x E y --> {x} <= y. So subsetting can be seen as a transfinite extension of the membership relation, and in ZFC we get no more clarity or computational intuition from the first than from the second.

## ↑ comment by **[deleted]** ·
2011-05-05T03:35:30.283Z · LW(p) · GW(p)

Set theory is not easier than arithmetic! Zero is a finite number, and N+1 is a finite number if and only if N is.

Replies from: endoself## ↑ comment by endoself · 2011-05-05T19:25:49.351Z · LW(p) · GW(p)

Yes, that is a much better definition. I don't know why this one occurred to me first.

Replies from: AlephNeil## ↑ comment by AlephNeil · 2011-05-06T02:06:23.375Z · LW(p) · GW(p)

Sewing Machine's previous comment isn't really a definition, but it leads to the following:

"n is a finite ordinal if and only if for all properties P such that P(0) and P(k) implies P(k+1), we have P(n)."

In other words, the finite numbers are "the smallest" collection of objects containing 0 and closed under successorship.

(If "properties" means predicates then our definition uses second-order logic. Or it may mean 'sets' in which case we're using set theory.)

## ↑ comment by CronoDAS · 2011-05-06T20:20:14.215Z · LW(p) · GW(p)

It still feels very weird to me that we cannot explain our intuitive notion of "finiteness" to a machine. Maybe this shows that we don't understand finiteness very well yet.

There's actually a reason for that - it's impossible to define "finite" in first-order logic. There's no set of first-order statements that's true in all finite models and false in all infinite models.

Replies from: cousin_it, rhollerith_dot_com## ↑ comment by rhollerith_dot_com · 2011-05-06T20:22:38.319Z · LW(p) · GW(p)

How's that relevant?

For present purposes, why should we care whether the logic we use is *first-order*?

## ↑ comment by CronoDAS · 2011-05-06T20:47:48.095Z · LW(p) · GW(p)

In first-order logic, if a statement is logically valid - is true in all models - then there exists a (finite) proof of that statement. Second-order logic, however, is incomplete; there is no proof system that can prove all logically valid second-order statements.

So if you can reduce something to first-order logic, that's a lot better than reducing it to second-order logic.

Replies from: rhollerith_dot_com, rhollerith_dot_com## ↑ comment by rhollerith_dot_com · 2011-05-06T20:55:32.608Z · LW(p) · GW(p)

Well, now you are exhibiting the IMHO regrettable tendency that the more mathematical LWers exhibit of putting way too much focus on incompleteness uncomputability and other negative results that have negligible chance of actually manifesting unless you are specifically looking for pathological cases or negative results.

I use second-order logic all the time. Will work fine for this purpose.

Replies from: CronoDAS## ↑ comment by rhollerith_dot_com · 2011-05-07T11:38:52.089Z · LW(p) · GW(p)

I still do not see how first-order logic relates in any way to cousin_it's statement in grandparent of grandparent.

Just because second-order logic is incomplete does not mean we must restrict ourselves to first-order logic.

Replies from: CronoDAS, rhollerith_dot_com## ↑ comment by CronoDAS · 2011-05-08T05:04:07.350Z · LW(p) · GW(p)

I think I recall reading somewhere that you only need first-order logic to define Turing machines and computer programs in general, which seems to suggest that "not expressible in first order logic" means "uncomputable"... I could just be really confused about this though...

Anyway, for some reason or other I had the impression that "not expressible in first order logic" is a property that might have something in common with "hard to explain to a machine".

## ↑ comment by rhollerith_dot_com · 2011-05-07T17:55:39.388Z · LW(p) · GW(p)

ADDED. relates -> supports or helps to illuminate

## ↑ comment by **[deleted]** ·
2011-05-04T18:37:38.008Z · LW(p) · GW(p)

If we could explain what "finite" meant, then we could explain what "infinite" meant. More and more I'm starting to believe "infinite" doesn't mean anything.

Replies from: Zetetic## ↑ comment by Zetetic · 2011-05-04T21:15:08.415Z · LW(p) · GW(p)

How about 'not finite'? Even better, 'non-halting' or maybe even...'not step-wise completable/obtainable'? Something is 'in-finite'ly far away if there is no set of steps that I can execute to obtain it within some determinate amount of time.

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-05T02:09:43.667Z · LW(p) · GW(p)

You're repeating my point. It's no harder to explain what "finite" means than it is to explain what "not finite" means. You take this to mean that the concept of "not finite" is easy, and I take it to mean that concept of "finite" is hard. Cousin it's recent experience lends credence to my point of view.

For any N, it's easy to explain to a computer what "cardinality smaller than N" means. It's hard to explain to a computer what "exists N such that cardinality is smaller than N" means, and this is the source of cousin it's trouble. The next logical step is to realize for a cognitive illusion the idea that humans have a special insight into what this means, that computers can't have. For some reason most people aren't willing to take that step.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-05T14:32:47.272Z · LW(p) · GW(p)

If it's just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA?

We seem to have a superpower for inventing apparently consistent axiom systems that make "intuitive sense" to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-06T03:34:06.401Z · LW(p) · GW(p)

We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russel's paradox) by abandoning the axioms. I wouldn't call this a superpower but a bias.

Russel's paradox is a cliche but it's really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn't shave himself) some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it *possibly mean* that this premise is false? Evidently it means that the everyday meaning of words like "property," "set," and "element" is not clear enough to avoid contradictions. Why are you so sure that the everyday meaning of words like "number," "successor," and "least element" are so much clearer?

Here's another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: they cannot be bounded above by any computable function. i.e. there is an *inconsistent* formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn't PA be such a system?

## ↑ comment by cousin_it · 2011-05-06T08:54:25.132Z · LW(p) · GW(p)

there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn't PA be such a system?

I suspect that for most "naive" ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most "informal" reasoning that you used to construct X in the first place. The existence of such a proof wouldn't imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above.

But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con's on top and still get a system that's weaker than ZFC. (I'm not 100% confident of that, would be nice if someone corrected me!)

So I'm pretty confident that PA is consistent, conditional on the statement "PA is consistent" being meaningful. Note that you need a notion of "standard integers" to make sense of the latter statement too, because integers encode proofs.

## ↑ comment by JoshuaZ · 2011-05-06T04:16:27.450Z · LW(p) · GW(p)

You've made good points (and I've voted up your remark) but I'd like to note a few issues:

First we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen's theorem shows that consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that there are theorems in each system which are not theorems in the other).

they cannot be bounded above by any computable function. i.e. there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other.

This isn't necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn't impact your point in a substantial fashion.

Second, it isn't clear how long we expect an average output of a Turing machine to be. I don't know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states, run it on the blank tape, I should expect with a high probability that the Turing machine will halt well before BB(n) or will never halt. Let's make this claim more precise:

Definition: Let g(f)(n) be the (# of Turing machines with n states which when run on the blank tape either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).

Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.

If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I'm not sure one way or another whether or not this function exists, but my naive guess is that it does.

Replies from: AlephNeil, AlephNeil, None## ↑ comment by AlephNeil · 2011-05-06T06:58:28.578Z · LW(p) · GW(p)

Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.

Well, let's say we want to know whether Turing machine T halts. Say T has k states. Can't we use T's program to find a family of programs T' which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k' be the number of states in the 'prefix'. The proportion of programs with <= n states that belong to family T' must be at least 2^(-k'-1) for all n >= k'.

Now let's start running all programs and as we go along let's keep track of whether, for each value of n, a proportion greater than or equal to 1 - 2^(-k'-1) of programs of length <= n have halted in fewer than f(n) steps. Eventually this will be true for some n >= k'. At that point, we check to see whether one of the programs in T' has halted. If not, then none of the programs in T' can halt, and neither can T.

Therefore, such a function f could be used to solve the halting problem.

## ↑ comment by AlephNeil · 2011-05-06T06:30:00.282Z · LW(p) · GW(p)

incomparable in strength

But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.)

Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.

Well, let's say we want to know whether Turing machine T halts. Say T has k states. Can't we use T's program to find a family of programs T' which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k' be the number of states in the 'prefix'. The proportion of programs with less than n states that belong to family T' must tend to 2^(-k') as n goes to infinity. If we choose n such that g(f)(n) > 1 - 2^(-k') then we just need to run all members of T' for f(n) steps. If none of them have halted then T does not halt.

Therefore, such a function f could be used to solve the halting problem.

## ↑ comment by **[deleted]** ·
2011-05-06T05:10:56.018Z · LW(p) · GW(p)

If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I'm not sure one way or another whether or not this function exists, but my naive guess is that it does.

You must be aware that such halting probabilities are usually uncomputable, right? In any case you're not going to be surprised that I wouldn't find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.

Replies from: JoshuaZ## ↑ comment by JoshuaZ · 2011-05-06T05:21:41.934Z · LW(p) · GW(p)

You must be aware that such halting probabilities are usually uncomputable, right?

Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here?

In any case you're not going to be surprised that I wouldn't find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.

My prior that a random integer is prime is 1/log n . If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn't prime. I'm not sure why you think that's not a good argument, at least in the context when I can't verify it (if say the number is too large).

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-06T05:36:49.291Z · LW(p) · GW(p)

But 1/log(n) takes a long time to get small, so that the argument "15 is not prime because most numbers are not prime" is not very good. It seems even more specious in settings where we have less of a handle on what's going on at all, such as with halting probabilities.

Are you trying to make a probability argument like this because you scanned my argument as saying "PA is likely inconsistent because a random axiom system is likely inconsistent?" That's not what I'm trying to say at all.

Replies from: JoshuaZ## ↑ comment by JoshuaZ · 2011-05-06T13:15:34.480Z · LW(p) · GW(p)

But 1/log(n) takes a long time to get small, so that the argument "15 is not prime because most numbers are not prime" is not very good. It seems even more specious in settings where we have less of a handle on what's going on at all, such as with halting probabilities

Sure, in the case of n=15, that's a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.

Are you trying to make a probability argument like this because you scanned my argument as saying "PA is likely inconsistent because a random axiom system is likely inconsistent?" That's not what I'm trying to say at all.

No. I'm confused as to what I said that gives you that impression. If you had said that I'd actually disagree strongly (since what it is a reasonable distribution for "random axiomatic system" is not at all obvious). My primary issue again was with the Turing machine statement, where it isn't at all obvious how frequently a random Turing machine behaves like a Busy Beaver.

Replies from: None## ↑ comment by **[deleted]** ·
2011-05-06T14:00:26.626Z · LW(p) · GW(p)

And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.

I think you are being way to glib about the possibility of analyzing these foundational issues with probability. But let's take for granted that it makes sense -- the strength of this "Bayesian evidence" is

P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1)

Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size.

We can replace those "events" by predictions that I'm more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are *definitely* about the same size.

## ↑ comment by JoshuaZ · 2011-05-06T16:48:20.515Z · LW(p) · GW(p)

Sure. There's an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like.

But yes, I agree that using conventional probability in this way is fraught with difficulty.

## comment by Emile · 2011-05-16T12:51:02.302Z · LW(p) · GW(p)

OK, here's my encoding (thanks Misha):

N is encoded by N triplets of bits of the form 1??, followed by a zero.

Each triplet is of the form (1, a(N, k), b(N, k)) (with 1 <= k <= N), where

a(N, k) means "Turing machine number k halts in N steps or less, and it's final state has an

*even*number of ones on the tape", andb(N, k) means "Turing machine number k halts in N steps or less, and it's final state has an

*odd*number of ones on the tape".

By "Turing machine number k" I mean we have an infinite list containing all possible Turing machines, sorted by description length.

So to encode N, the encoder "just" simulates the N first turing machines for N steps, and notes which of them stops.

The decoder reads his input triplet-by-triplet. When reading input k, it simulates the k first Turing machines for k steps, and checks whether the results are consistent with all that it has read in the input so far : if one of the simulated machines halts in a way not announced in the input (or if a triplet is impossible like "111"), it stops reading.

While an "infinitely confusing" input stream does exist, generating it requires solving the halting problem, which is generally considered pretty hard.

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-16T13:21:52.808Z · LW(p) · GW(p)

Nice! But why do you need the parity of the output tape? Can't you use groups of 2 bits instead of 3 and just specify whether machine number k halts within N steps? (Edit: sorry, now I see why. You could confuse the decoder by telling it that all machines halt.)

Thanks a lot! Your solution feels more intuitive to me than the one offered on MathOverflow.

Replies from: Emile## ↑ comment by Emile · 2011-05-16T13:49:58.565Z · LW(p) · GW(p)

Well you could, but then you'd need a way to protect against an input of only ones - i.e. "all Turing machines halt in N steps or less". You could make a decoder that handles cases like that by checking whether a Turing machine eventually starts looping, but that makes it's logic less straightforward, and it isn't clear whether an infinite input couldn't be found without solving the halting problem.

Turing machines could be classified in three categories:

- H: those that halt
- L: those that eventually start looping
- F : those that go on forever

The halting problem is checking for membership in H, and tricking the "loop-detecting" 2-bit decoder would "only" require never returning '1' for a member of L. There may be an algorithm A that returns 1 for membership in H, 0 for membership of L, and either 1 or 0 for membership of F. That algorithm wouldn't be equivalent to a halting oracle, so it might not be undecidable, I don't know.

So having 3 bits just makes checking simpler, since you can't pretend to know that an algorithm will halt, you have to know something about the state it halts in too.

## comment by Alicorn · 2011-05-06T00:25:58.032Z · LW(p) · GW(p)

The proof is obvious. (If it isn't obvious to you, work it out!)

I deem this obnoxious.

Replies from: Skatche## ↑ comment by Skatche · 2011-05-06T01:09:38.825Z · LW(p) · GW(p)

A proof in ROT13:

Gb rnpu cbffvoyr rapbqvat bs n pbhagnoyr frg ol svavgr ovg fgevatf gurer pbeerfcbaqf n ovanel gerr: ng gur ebbg abqr lbh tb yrsg (fnl) vs gur svefg ovg vf n mreb naq evtug vs vg'f n bar, gura ng gur arkg abqr lbh tb yrsg be evtug vs gur frpbaq ovg vf n mreb be bar, naq fb sbegu. Gur npghny vagrtref pbeerfcbaq gb grezvany abqrf, ernpurq ol n svavgr ahzore bs oenapuvatf. Znysbezrq ovg fgevatf, gbb, pbeerfcbaq gb grezvany abqrf, ng gur rneyvrfg cbvag gung jr pna gryy gurl'er znysbezrq. Ubjrire, fvapr gurer ner vasvavgryl znal vagrtref, naq bayl svavgryl znal cbffvoyr ovg fgevatf bs nal tvira yratgu, gurer zhfg or ng yrnfg bar vasvavgryl ybat cngu va gur gerr, naq vs jr srrq guvf frdhrapr bs mrebf naq barf vagb bhe pbzchgre, gur pbzchgre jvyy unat.

Replies from: Sniffnoy## ↑ comment by Sniffnoy · 2011-05-06T01:57:36.232Z · LW(p) · GW(p)

You seem to have left out the part of the proof where you do the actual work! ROT13'd:

Fxngpur fubjf gung gurer zhfg or neovgenevyl ybat cnguf; gur ceboyrz abj vf gb fubj gung gurer vf na *vasvavgryl* ybat cngu. Jr pbafgehpg guvf cngu nf sbyybjf: Fvapr gurer ner vasvavgryl znal cnguf, gurer ner rvgure vasvavgryl znal cnguf fgnegvat jvgu 0, be vasvavgryl znal cnguf fgnegvat jvgu 1. Vs gur sbezre ubyqf, fgneg jvgu 0, bgurejvfr fgneg jvgu 1. Fvapr juvpurire qvtvg jr cvpxrq, gurer ner vasvavgryl znal fgnegvat jvgu vg, jr pna ybbx ng gur arkg qvtvg naq ercrng guvf cebprff. Fvapr jr pna ercrng guvf neovgenevyl, guvf trgf hf na vasvavgryl ybat cngu. Va trareny guvf nethzrag trgf lbh jung'f xabja nf Xbavt'f Yrzzn.

## ↑ comment by Skatche · 2011-05-06T02:39:31.580Z · LW(p) · GW(p)

Sorry, I have a bit of a skewed perspective about what's obvious. :P Once I perceived the connection to binary trees it seemed plain as day.

Replies from: jschulter## ↑ comment by jschulter · 2011-05-06T06:14:35.245Z · LW(p) · GW(p)

I find that for me, and many other people I know in the mathematics department of my university, once infinities, uncountability, and such enter the picture, the accuracy of intuition quickly starts to diminish, so it's wise to be careful and make sure the proof is complete before declaring it obvious. As a good example, note how surprising and notable Cantor's diagonal argument seemed the first time you heard it- it isn't obvious that the reals aren't countable when you don't already know that, so you might start trying to construct a counting scheme and end up with one that "obviously" works.

## comment by **[deleted]** ·
2011-05-04T19:01:25.786Z · LW(p) · GW(p)

Here is a related well-known idea: in nonstandard analysis, a subset B of the real numbers is infinite if and only if its extension B* contains nonstandard numbers.

It might be equivalent, actually. Think of infinite bit streams as hyperintegers (e.g. the infinite sequence of 1s is the hyperinteger (1, 11, 111, 1111...), in binary if you like). Let E be the set of encodings of all integers, and all their prefixes. E is infinite, so E* contains a nonstandard element. I believe (here is where my nonstandard analysis fails me) that such a nonstandard element gives us an infinite bit stream which can't be rejected by the decoder.

A less serious but also related idea is the Frivolous Theorem of Arithmetic, which states that almost all positive integers are really, really, really, really large.

## comment by Paul Crowley (ciphergoth) · 2011-05-04T16:58:51.061Z · LW(p) · GW(p)

Nice problem! I think this is a proof, but I don't know if it's your proof:

Vs K unf vasvavgryl znal inyvq fhssvkrf, ng yrnfg bar bs K0 be K1 zhfg unir vasvavgryl znal inyvq fhssvkrf. Fb yrg k_{x+1} = 0 vs k_1 ... k_x 0 unf vasvavgryl znal inyvq fhssvkrf, naq 1 bgurejvfr; gura k_1 gb k_x unf vasvavgryl znal inyvq fhssvkrf sbe nal x.

Replies from: Cyan, Tyrrell_McAllister, cousin_it## ↑ comment by Tyrrell_McAllister · 2011-05-05T17:06:45.043Z · LW(p) · GW(p)

Vs K unf vasvavgryl znal inyvq fhssvkrf, ng yrnfg bar bs K0 be K1 zhfg unir vasvavgryl znal inyvq fhssvkrf. Fb yrg k_{x+1} = 0 vs k_1 ... k_x 0 unf vasvavgryl znal inyvq fhssvkrf, naq 1 bgurejvfr; gura k_1 gb k_x unf vasvavgryl znal inyvq fhssvkrf sbe nal x.

Ner gurer frgf bs inyvq fgevatf fhpu gung ab Ghevat znpuvar *N* pbhyq vzcyrzrag guvf "nggnpx" ba gur bevtvany znpuvar *G*? Gung vf, qbrf gurer rkvfg n *G* fhpu gung, rira vs *N* unf npprff gb *G*'f fbhepr pbqr, *N* pnaabg gryy pbapyhfviryl juvpu bs K0 be K1 unf vasvavgryl znal inyvq fhssvkrf? Vs fb, pna lbh npghnyyl pbafgehpg n *G* gung *cebinoyl* unf guvf cebcregl?

ETA: Looks like cousin_it says that the answer to my existence question is "Yes."

Replies from: ciphergoth## ↑ comment by Paul Crowley (ciphergoth) · 2011-05-05T18:29:22.197Z · LW(p) · GW(p)

The "original Turing machine" decides whether a prefix is valid? Then yes definitely; bear in mind Rice's theorem, which basically says that *no* non-trivial property of a Turing machine is computable.

## ↑ comment by Tyrrell_McAllister · 2011-05-05T18:37:36.623Z · LW(p) · GW(p)

The "original Turing machine" decides whether a prefix is valid?

If I understand you, yes. By the "original Turing machine", I meant the machine *T* that putatively can interpret an input string as an integer or, alternatively, reject the input string as not corresponding to any integer.

So, can we actually construct such a machine that is *provably* immune to the "attack" in your proof, in the sense that no Turing machine could implement the attack? Or are you saying that Rice's theorem (with which I will have to acquaint myself) says that pretty much *any* Turing machine *T* that maps strings to integers will fit the bill? (Though, the one in cousin_it's OP doesn't . . .)

## ↑ comment by Paul Crowley (ciphergoth) · 2011-05-06T07:49:08.465Z · LW(p) · GW(p)

Hmm. So T is an acceptor, A is an attacker. A(T) is an infinite sequence of symbols produced after examining T's source code, and T(A(T)) = ⊥ means T never rejects the sequence. Then what I was asserting is ¬∃A:∀T:T(A(T)) = ⊥. What you're asking now is whether ∀T:∃A:T(A(T)) = ⊥ and I can't immediately work out the answer.

## comment by jsalvatier · 2011-05-04T18:36:01.384Z · LW(p) · GW(p)

This seems like a decent place to ask: I'm slowly trying to learn Type Theory. I haven't seen a place where (Co)Inductive datatypes (as in the Calculus of (Co)Inductive Constructions) are explained formally (though preferably for novice readers); does anyone have a a suggestion?

## comment by jsalvatier · 2011-05-20T04:29:31.594Z · LW(p) · GW(p)

After learning a bit more about inductive and coinductive types from Robert Harper's book in progress(which I found more lucid on the topic than Types and Programming Languages), I don't quite understand why it's important for the datatype to be coinductive. You can clearly encode the integers as inductive types, why doesn't that suffice for 'explaining to a computer what a "terminating computation" is'? Are computers 'naturally coinductive' in some sense?

Replies from: jsalvatier## ↑ comment by jsalvatier · 2011-05-20T04:54:49.474Z · LW(p) · GW(p)

Actually, since computers are turing complete, perhaps it's more like 'computers are naturally recursively typed' and there is an analogous impossibility theorem for encoding some coinductive type in the same way? Perhaps I just don't understanding enough yet.

## comment by wedrifid · 2011-05-05T08:05:05.678Z · LW(p) · GW(p)

To clarify, let's work through an example. Consider an unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.

I'm not sure this is particularly significant. If you haven't finished feeding in the input you can hardly expect the machine to give a complete output unless you have given a specific specification as to how to simplify or truncate. I perhaps would avoid saying that the decoder 'hung'. 'Hung' implies some sort of pathological error in the decoder, not just a decoder that is processing new input as fast as is possible in exactly the way it is supposed to.

That said, many things that seem bloody obvious and trivial are considered to be important mathematical principles once phrased in maths language. Maybe what you are describing is significant. It does seem like the sort of thing that usually gets a mathsy name. "Infinite Garbage In, Garbage Out" may not cut it. :)

## comment by **[deleted]** ·
2015-01-07T14:46:41.824Z · LW(p) · GW(p)

Encoding is not definition. While it is true that we cannot decide the integer value of "1111..." as you describe, the reason behind it is obvious: you are trying to use an inductive algorithm on a co-inductive data structure.

Clearly, your proposed bits-to-ints encoding scheme is isomorphic to lists of unit, and it so happens that there is a one-to-one mapping between list-of-unit and inductively defined natural numbers, *exactly so long as the lists in question are inductive and thefore finite.* This isomorphism is made up of `length :: list unit -> int`

and `repeat unit :: int -> list unit`

, according to their usual definition. Proof of that this is isomorphic is left as an exercise to the reader.

## comment by jsalvatier · 2011-05-20T04:19:27.787Z · LW(p) · GW(p)

Have you considered posting your observation/question to cstheory.stackexchange.com? I'd be interested in seeing the response there.

Replies from: cousin_it## comment by AlephNeil · 2011-05-06T03:09:07.645Z · LW(p) · GW(p)

Curiously, if you replace "finite" with "countable" and "infinite" with "uncountable" then it is possible to devise such an encoding scheme. (This is connected with the existence of "Aronszajn trees".)

Suppose we have a countable alphabet of symbols A, and we want to encode countable ordinals using functions alpha -> A for countable ordinals alpha, then there exists an encoding scheme such that, after only countably many symbols have been received, you can either say "yes, this represents the countable ordinal x" or "no, this isn't a valid encoding".

(Consider the set of all increasing functions from alpha into the rationals, where alpha is a finite or countable ordinal.)

## comment by Clippy · 2011-05-04T23:17:33.854Z · LW(p) · GW(p)

I'm pretty sure information theory books mention this phenomenon (though not explicitly as a theorem) in discussions of stream arithmetic coding. Those can be interpreted as a fraction written in binary, where each bit narrows down a region of the [0,1) numberline, with earlier-terminating bitstreams reserved for more likely source strings.

Any probability distribution on integers has a corresponding (optimal) arithmetic stream coder and you can always make it indefinitely expect more input by feeding it the bit that corresponds to the least likely next character (which means it's not finished decompressing).

MacKay discusses this in Chapter 6 of his free book.

## comment by jsalvatier · 2011-05-04T18:48:29.944Z · LW(p) · GW(p)

As for your problem, Robert Harper (posts as 'Abstract Type') frequently notes that you cannot define the natural numbers or any inductive type in haskell because it is lazy so bottom (non-termination) is a member of all types and therefore requires coinductive types (for example see also the reddit discussion).

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-04T20:52:16.750Z · LW(p) · GW(p)

Right. In fact, this phrase from Harper's blog was one of the inspirations for my post:

In Carollian style there are types called naturals and lists, but that’s only what they’re

called, it’s not what theyare.

**ETA:** it seems you slightly misunderstood Harper's point. The problem with defining "data Nat = Zero | Succ Nat" is not that bottom is in Nat, but rather that laziness allows you to write the circular definition "omega = Succ omega" (note that pattern-matching on omega *doesn't* hang), which is similar to the unary example in my post.

## ↑ comment by jsalvatier · 2011-05-04T23:35:22.837Z · LW(p) · GW(p)

Thanks for correcting me :) I imagine there's a lot of Harper I don't understand correctly.

## comment by Daniel_Burfoot · 2011-05-04T17:57:11.369Z · LW(p) · GW(p)

Whatever encoding you invent, I can give you an infinite input stream of bits that makes your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

What if my encoding is: return TRUE(=is an integer) for all inputs?

That's not such a dumb thing to do. In fact, any encoding that didn't assign some integer outcome to any sequence of bits would be trivially suboptimal.

Replies from: cousin_it## comment by **[deleted]** ·
2011-05-04T17:23:35.144Z · LW(p) · GW(p)

How would an infinite stream of bits possibly encode an integer in the first place? All integers are finite, so unless you did something stupid like assign the infinite sequence 11111111... to the integer 37, an infinite number of bits should never correspond to any integer.

Replies from: benelliott, cousin_it## ↑ comment by benelliott · 2011-05-04T17:43:49.467Z · LW(p) · GW(p)

The idea is that you invent a system where each integer corresponds to a finite bit-string, but the lengths of these strings must be unbounded. Unary is one such code.

Then you could set up a computer program which decodes these strings, so you feed it one bit at a time, each time asking it 'has an integer been uniquely specified yet?' The OP's claim is that whatever encoding method you come up with, he can come up with an infinite string that will make your program keep saying "no integer has been uniquely defined yet" forever.

So, nobody is encoding integers as infinite strings, although there's no particular why you can't, in fact the overwhelming majority of possible encodings use infinite strings.

## ↑ comment by cousin_it · 2011-05-04T17:33:14.357Z · LW(p) · GW(p)

The decoder doesn't have access to the fact that the stream is infinite. It can only query individual bits.

Replies from: None, None, thakil## ↑ comment by **[deleted]** ·
2011-05-04T17:51:52.364Z · LW(p) · GW(p)

I guess my objection is that your task is *too* obviously impossible. If I tell you that I can color any page in any coloring book, and you give me an infinite coloring book, I won't be able to finish, even though I know how to color. Similarly, if we have a decoder that can interpret infinitely many finite bit strings, it can be stumped by an infinite one, by virtue of its being infinite.

## ↑ comment by wmorgan · 2011-05-04T18:08:07.534Z · LW(p) · GW(p)

Agreed. There's an even easier way to confound the machine, really: when it asks you for the bit string, walk away from the computer!

Note that there is no way to feed infinite input to a Turing machine. The halting problem isn't demonstrated by a program of infinite length. It's demonstrated by a finite program that resists static analysis. The analogous integer is not the one represented by an infinite bit stream, but it might be something like the Berry number.

## ↑ comment by **[deleted]** ·
2011-05-04T18:19:16.673Z · LW(p) · GW(p)

The thing is, not just any infinite input stream will do. It has to at all times look properly encoded to the decoder. You're allowed to come up with any encoding scheme that you like, so what you can try to do is to come up with an encoding scheme that an infinite input stream must violate in finite time. This turns out to be impossible, and the proof is fairly easy, but it is not as obvious as your description characterizes it here.

Replies from: None, wedrifid## ↑ comment by **[deleted]** ·
2011-05-04T18:52:26.354Z · LW(p) · GW(p)

Granted. But even so, the primary problem is that the input is infinite: we can deal with any finite input stream, no problem.

ETA: Also, the obvious way to approach the problem is to have a terminator character (this is how the unary representation works, for instance). In that case, what this result says is "if your decoder expects a terminator, then if you feed it an infinite string without a terminator, it'll get stuck." This seems obvious.

## ↑ comment by wedrifid · 2011-05-05T08:18:42.851Z · LW(p) · GW(p)

This turns out to be impossible, and the proof is fairly easy, but it is not as obvious as your description characterizes it here.

It is very nearly as obvious as Misha's description characterizes... and Misha's analogy could be extended to 'color encoding schemes' in the same way.

## ↑ comment by thakil · 2011-05-04T17:38:48.675Z · LW(p) · GW(p)

But the stream isnt an integer. If you tell me I must assign this number to an integer and give me no options to leave then I will be forced to spend the rest of mylife decoding it. Ignoring that we can't give an infinite sequence in the first place I am not sure this is a meaningful problem

## comment by JohnH · 2011-05-04T17:58:20.936Z · LW(p) · GW(p)

Isn't this what Godel said?

Replies from: cousin_it, RichardKennaway## ↑ comment by cousin_it · 2011-05-04T18:02:31.776Z · LW(p) · GW(p)

The informal part is kinda sorta reminiscent, but the formal part is too simple to be analogous to Gödel's theorem.

Replies from: David_Allen## ↑ comment by David_Allen · 2011-05-04T18:11:17.460Z · LW(p) · GW(p)

But is it analogous to the halting problem?

Replies from: cousin_it## ↑ comment by cousin_it · 2011-05-04T18:13:08.352Z · LW(p) · GW(p)

The halting problem and Gödel's first incompleteness theorem are pretty much the same thing, and proofs of both involve self-reference. The proof of my thingy is much simpler and doesn't involve self-reference, so it seems to be unrelated.

Replies from: JohnH, JohnH## ↑ comment by JohnH · 2011-05-05T04:34:33.848Z · LW(p) · GW(p)

Apparently pointing out that things like the axiom of choice, the GCH, or other statements fall under Godel's proof and are not self referential is not popular around here.

Replies from: Sniffnoy, JoshuaZ## ↑ comment by Sniffnoy · 2011-05-05T05:19:23.479Z · LW(p) · GW(p)

How on earth do those at all "fall under Gödel's proof"? Satisfying the *conclusion* of a theorem is not a meaningful relation to that theorem.

## ↑ comment by JohnH · 2011-05-05T05:32:52.566Z · LW(p) · GW(p)

The theory is that there are statements that are true but undecidable with any sufficiently strong decider. Those statements are proofs of that theory, they (or their negation) are true but undecidable.

Replies from: benelliott## ↑ comment by benelliott · 2011-05-05T20:18:38.149Z · LW(p) · GW(p)

You are right, in that those statements do satisfy the conclusion, and I don't see why you're being downvoted. The difference between them and the original, self referential, statement is generality. For example:

You: I know that ZF set theory is incomplete because the axiom of choice cannot be proven within it.

Some other guy: Okay then, how about we add the axiom of choice as another axiom, maybe this new system will be complete?

You: Nope, it still can't prove or disprove the continuum hypothesis.

SUG: So I'll add that in as another axiom, maybe that will finally patch it?

You: Nope, still doesn't work because ...

SUG: But if I add *that* as an axiom as well...

Hopefully you can see why this might keep going for quite a while, and given that its quite difficult to prove a statement is undecidable you will run out eventually. There's nothing you can do to convince him those undecidable statements are symptoms of a general problem rather than just one-time flaws.

Compare to this:

Godel: ZF set theory is not complete, because I have this self-referential construction of an unprovable statement.

SUG: But what if I add your statement as another axiom?

Godel: Then my proof still applies to this new system you created, and I can construct another, similar statement.

SUG: But what if I...

Godel: There's no point in you trying to continue this, whatever system you create my theorem will always apply.

SUG: Drat! Foiled again!

Replies from: Cyan## ↑ comment by Cyan · 2011-05-06T00:35:16.045Z · LW(p) · GW(p)

(Why is some other guy SUG and not SOG? Oversight? Euphony?)

Replies from: benelliott## ↑ comment by benelliott · 2011-05-06T07:35:10.591Z · LW(p) · GW(p)

Because I evidently can't spell :(

## ↑ comment by JoshuaZ · 2011-05-05T04:48:32.308Z · LW(p) · GW(p)

Apparently pointing out that things like the axiom of choice, the GCH, or other statements fall under Godel's proof and are not self referential is not popular around here.

Let's say for now that the proofs that AC and CH being undecidable in ZF - AC don't have much to do with self-reference. (This seems sort of true to me but my understanding of the forcing technique is very poor.) How does this at all relate to the comments which got downvoted?

Replies from: JohnH## ↑ comment by JohnH · 2011-05-05T05:24:36.512Z · LW(p) · GW(p)

Those are statements that fall under what Godel proved, that is they are statements that are unprovable in ZF. So even though his statement doesn't include self-reference it can still fall under Godel's proof if his decoder is strong enough to determine what is an integer and what is not an integer. Self-referencing has nothing to do with it at all.

The best response that anyone has given to me about being wrong itself references back to the halting problem which is itself another formulation of Godel's proof.

The correct response would be to point out that deciding if something is an integer can be accomplished with just addition and the example decoder proceeds in that manner rather than using any form of multiplication to determine what is an integer and what is not. Such a system is decidable but also infinite, so the string given is indeed decidable and given infinite time the decoder will halt (at infinity which isn't part of the axiomization, which is where the problem lies). However, what throws me is "we can find such pathological inputs for any other encoding system," which to me implies a stronger system is being thought of which would cause the system to hang for some inputs as it would fall under Godel's proof.

It is very funny to me that my most downvoted comment isn't about religion but is about Godel's proof and no one gave a decent refutation of what I said.

Replies from: JoshuaZ## ↑ comment by JoshuaZ · 2011-05-05T14:56:02.790Z · LW(p) · GW(p)

Those are statements that fall under what Godel proved, that is they are statements that are unprovable in ZF. So even though his statement doesn't include self-reference it can still fall under Godel's proof if his decoder is strong enough to determine what is an integer and what is not an integer. Self-referencing has nothing to do with it at all.

The existence of specific undecidable statements in ZF or ZF - AC is a different sort of result than what Godel showed. That for example the continuum hypothesis is undecidable in ZFC is interesting because the continuum hypothesis is interesting. However, Godel's theorems show that any consistent, axiomatizable systemn with all valid proofs recursively enumerable, that is strong enough to model a large chunk of N, must be incomplete. Exhibiting results like Cohen's results about choice and the continuum hypothesis don't give you the full result, they just show specific things about the system ZF. Indeed, if one didn't know Godel's theorems, and just knew about Cohen's forcing results, you might be tempted to just throw in AC and GH as additional axioms if you didn't have them already and then wonder if that system is complete.

It is very funny to me that my most downvoted comment isn't about religion but is about Godel's proof and no one gave a decent refutation of what I said.

People here have a very complicated set of attitudes. Even as many don't want to bother talking about irrational aspects of religion, or engaging in religious individuals who are convinced that their religious view is somehow different from all the other religious views, people who are religious are that way due to a variety of very strong cognitive biases. So there's some sympathy there. Getting math wrong and then insisting one is correct is just simple arrogance or at least, only Dunning-Kruger. There's much less sympathy there. And yes, people have tried to explain why you were wrong albeit fairly succinctly.

Replies from: JohnH## ↑ comment by JohnH · 2011-05-05T16:55:09.013Z · LW(p) · GW(p)

Dunning-Kruger.

Possibly.

However, what throws me is "we can find such pathological inputs for any other encoding system," which to me implies a stronger system is being thought of which would cause the system to hang for some inputs as it would fall under Godel's proof.

## ↑ comment by JohnH · 2011-05-04T20:03:29.411Z · LW(p) · GW(p)

No, it is directly related and is exactly what Godel's incompleteness theorem says without the self referential part. That is, this covers precisely the idea that if ones system is strong enough to determine the natural numbers then it will have statements that hang. The halting problem and the proof given are proofs that the theorem is true, not that those are the only examples of the theorem.

Replies from: benelliott## ↑ comment by benelliott · 2011-05-04T21:29:06.766Z · LW(p) · GW(p)

I really don't think it is analogous. This result doesn't exploit any interesting property that the integers have, unlike Godel, it just uses with their size. In addition, this one says 'certain *integers* cannot be represented' whereas Godel says 'certain *statements* about integers cannot be proven'.

## ↑ comment by JohnH · 2011-05-05T00:30:02.193Z · LW(p) · GW(p)

Godel was working under a philosophical system that assumed that everything could be deconstructed into first order logic. He disagreed and essentially wrote "this statement is false" as a proof that it can not be done. However, he also points out in his proof that there is an infinite number of such statements and that they do not have to be self-referential. Nor was what he was doing exploiting an interesting property of the integers but exploiting an interesting property of the system that creates the integers. This decoder can tell if something is an integer and tell one what integer it is, therefore it is able to evaluate everything needed for Godels proof, therefore it must have not only one but an infinite number of cases for which it hangs.

Replies from: benelliott## ↑ comment by benelliott · 2011-05-05T15:36:54.078Z · LW(p) · GW(p)

You're wrong. You're arguing from surface similarity rather than detailed internal workings, which is a big no-no in maths. I could have spent about five paragraphs breaking down your argument point by point, but you've made my life easy with that final line:

therefore it is able to evaluate everything needed for Godel's proof, therefore it must have not only one but an infinite number of cases for which it hangs.

This is not the case, there is a simple counter-example. Unary code, in which 0 is 0, 1 is 10, 2 is 110, 3 is 1110, 15 is 1111111111111110, hangs on the string 111111.... and no other. The fact your argument led to this conclusion is a demonstration of how completely wrong it is.

Replies from: JohnH## ↑ comment by RichardKennaway · 2011-05-04T20:24:48.642Z · LW(p) · GW(p)

No, it's completely different. The representation of integers as bit strings doesn't even have to be computable. There will always be an infinite string of bits of which no prefix represents any integer. This can be derived from König's Lemma, which says that if a finitely branching tree has paths of all finite lengths, it has an infinite path. Consider the infinite binary tree with the out-edges of each node labelled 0 and 1. Chop off all the descendants of every node that represents an integer and consider the tree that remains. Since there are infinitely many integers, there must be such nodes of arbitrarily large depth. Therefore the tree contains an infinite path.

Replies from: JohnH## ↑ comment by JohnH · 2011-05-04T23:19:36.188Z · LW(p) · GW(p)

Interesting, however it is my understanding that Konig's lemma is related to the halting problem and the Wiki article agrees with me:

"Any such tree has a path computable from 0', the canonical Turing complete set that can decide the halting problem."

Replies from: RichardKennaway## ↑ comment by RichardKennaway · 2011-05-05T06:24:50.967Z · LW(p) · GW(p)

There are certainly connections. König's Lemma isn't constructively true, for example. That is, given a computable description of an infinite tree, the problem of finding an infinite path may not be computable. Imagine walking down the tree node by node -- how can you tell whether you have stepped into a merely finite but enormously large subtree and have missed the infinite branch? Obviously, an oracle for the halting problem will solve that problem. I don't know off-hand, or from thinking about it for five minutes, whether the reverse is the case, i.e. if the halting problem can be encoded as the problem of finding an infinite branch in some computable infinite tree.