Non-axiomatic math reasoning with naive Bayes

post by cousin_it · 2011-05-30T10:26:49.552Z · LW · GW · Legacy · 4 comments

Contents

  5 > 4
  For any x, x+1 > x
  There exists x such that x*x = 9
  For any x, there exists y such that x ≥ y*y and x < (y+1)*(y+1)
None
4 comments

ETA: this post is WRONG. Read on only if you like to find mistakes in other people's reasoning. Sorry.


So I've been thinking how to teach machines to reason about integers without running into Goedelian limitations. A couple days ago I suddenly got this painfully obvious idea that I'm sure others have already developed, but I don't know how to search for prior work, so I'm posting it here.

Here's a simple statement about the integers:

5 > 4

It's easy enough to check by direct calculation if your CPU can do arithmetic. Here's a more complex statement:

For any x, x+1 > x

How do you check this? Let's denote the whole statement as S, and the individual statements "x + 1 > x" as S(x). The statements S(1), S(2) and so on are all directly checkable like the first one. For simplicity we can make the "naive Bayesian" assumption that they're all independent, so P(S) = P(S(1))*P(S(2))*... (ETA: this is shady, though it doesn't break the construction. See AlephNeil's comment and my reply for a slightly better way.) After manually checking that S(1) holds, we update its probability to 1, so our estimate of P(S) increases. Then we try S(2), etc. After a while P(S) will become as close to 1 as we want, if S is true.

We can also use a different quantifier:

There exists x such that x*x = 9

You can check this one by naive Bayesian reasoning too, except "exists" corresponds to probabilistic "or", whereas "for any" corresponded to probabilistic "and". So you check S(1), S(2) and so on, revising P(S) downward at each step, until you stumble on the right x and P(S) jumps to 1.

Here's a still more complex statement:

For any x, there exists y such that x ≥ y*y and x < (y+1)*(y+1)

How do you check this one? If we again denote it as S and the individual parts as S(x), we already know how to approximately check S(x) for each value of x, because it's a statement of the previous "level"! So we know how to check the whole S approximately, too.

Hopefully you get the general idea by now: add as many quantifiers as you wish. The fancy math name for this method of organizing statements is the arithmetical hierarchy. The calculations become expensive, but in the limit of enough computing power the system will get closer and closer to correct reasoning about the integers. As time passes, it will come to believe pretty strongly in all the axioms of PA (or at least the correct ones :-)), the consistency of PA (if it's consistent), P≠NP (if it's true), etc. The Goedelian limitation doesn't apply because we don't get proofs, only ever-stronger corroborations. The arithmetical hierarchy doesn't cover tricky second-order questions like the continuum hypothesis, but perhaps we can live without those?

The flip side is that you need a whole lot of computational resources, in fact you need so much that no computable function can estimate how long it'll take our system to converge to the "actual truth". Given an arithmetical statement that's guaranteed to have a counterexample, the number of CPU cycles required to find the counterexample grows uncomputably fast with the size of the statement, I believe. So it's pretty far from a practical AI, and more like Solomonoff induction - something to be approximated.

Some optimizations immediately spring to mind. If you spend some time calculating and come to strongly believe a bunch of statements involving quantifiers, you can adopt them as heuristics (prospective axioms) while investigating other statements, search for proofs or contradictions using ordinary first-order logic, etc. Such tricks seem to fit nicely into the Bayesian framework as long as you're willing to introduce some "naivety" here and there.

This sort of non-axiomatic reasoning looks like a good approximation of our math intuitions. The axioms of PA sound reasonable to me only because I can try some examples in my mind and everything works out. So I was pretty excited to find a simple computational process that, given enough time, settles on the axioms of PA and much more besides. What do you think?

4 comments

Comments sorted by top scores.

comment by AlephNeil · 2011-05-30T11:17:49.844Z · LW(p) · GW(p)

For simplicity we can make the "naive Bayesian" assumption that they're all independent,

But then for that to work, your prior belief that x + 1 > x, for really large x, has to begin very close to 1. If there was some delta > 0 such that the prior beliefs were bounded above by 1 - delta then the infinite product would always be zero even after Bayesian updates.

How would you know to have a prior belief that reallybignumber + 1 > reallybignumber even in advance of noticing the universal generalization?

Replies from: cousin_it
comment by cousin_it · 2011-05-30T11:32:50.191Z · LW(p) · GW(p)

I cheat by assigning beliefs this way only in the "context" of checking this specific large statement :-) Maybe you could do it smarter by making the small statements non-independent. Will think.

ETA: yeah, sure. What we actually need is P(S1)*P(S2|S1)*P(S3|S1,S2)*... The convergence of this infinite product seems to be a much easier sell.

comment by prase · 2011-05-30T11:30:26.542Z · LW(p) · GW(p)

Doesn't the non-monotonous convergence of the existential statements pose a problem for practical application? The process eventually converges to truth, even in finite time, but before it does, the estimates move in the wrong direction.

As for Goedelian limitations: is it correct to suppose that the statements which are undecidable in an axiomatic system would have oscillating (i.e. non-convergent) probabilities in this probabilistic system? Is it guaranteed that contradictions don't arise even here? (By a contradicition, I mean a statement X whose probability converges at p while simultaneously the probability of non-X converge at anything different from 1-p. Of course this can be prevented by implementing an updating rule which, after each update, ensures that the probabilities of X and non-X add up to 1. But this doesn't seem particularly elegant: it would be analogous to declaring non-X as non-theorem after proving X in a formal system.)

Replies from: cousin_it
comment by cousin_it · 2011-05-30T11:36:49.763Z · LW(p) · GW(p)

Doesn't the non-monotonous convergence of the existential statements pose a problem for practical application? The process eventually converges to truth, even in finite time, but before it does, the estimates move in the wrong direction.

This is right. Actually "forall" statements can converge non-monotonously too, if they're false. I believe Goedel's theorem implies you can't make a monotonously converging algorithm.

is it correct to suppose that the statements which are undecidable in an axiomatic system would have oscillating (i.e. non-convergent) probabilities in this probabilistic system?

Hmm. This seems to be wrong because my construction doesn't use any single axiomatic system.

ETA: I thought a little more and it seems we can't get rid of "oscillating" statements. Thanks! My whole post is wrong :-)