# You only need faith in two things

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2013-03-10T23:45:19.782Z · score: 33 (39 votes) · LW · GW · Legacy · 86 comments

You only need faith in two things: That "induction works" has a non-super-exponentially-tiny prior probability, and that some single large ordinal is well-ordered. Anything else worth believing in is a deductive consequence of one or both.

(Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works, even if you started by assigning it very tiny prior probability, so long as that prior probability is not super-exponentially tiny. Then induction on sensory data gives you all empirical facts worth believing in. Believing that a mathematical system has a model usually corresponds to believing that a certain computable ordinal is well-ordered (the proof-theoretic ordinal of that system), and large ordinals imply the well-orderedness of all smaller ordinals. So if you assign non-tiny prior probability to the idea that induction might work, and you believe in the well-orderedness of a single sufficiently large computable ordinal, all of empirical science, and all of the math you will actually believe in, will follow without any further need for faith.)

(The reason why you need faith for the first case is that although the fact that induction works can be readily observed, there is also some anti-inductive prior which says, 'Well, but since induction has worked all those previous times, it'll probably fail next time!' and 'Anti-induction is bound to work next time, since it's never worked before!' Since anti-induction objectively gets a far lower Bayes-score on any ordered sequence and is then demoted by the logical operation of Bayesian updating, to favor induction over anti-induction it is not necessary to start out believing that induction works better than anti-induction, it is only necessary *not* to start out by being *perfectly* confident that induction won't work.)
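A minimal numerical sketch of the Bayes-score claim above. The two predictors are illustrative assumptions, not from the post. Laplace's rule of succession stands in for "induction". Its mirror image, which expects whatever has happened *least* often so far, stands in for "anti-induction". The Bayes-score is the sum of the log2-probabilities each predictor assigned to what actually happened.

```python
import math


def laplace(ones: int, zeros: int, bit: int) -> float:
    """Inductive stand-in: Laplace's rule of succession."""
    n = ones + zeros
    return ((ones if bit else zeros) + 1) / (n + 2)


def anti_laplace(ones: int, zeros: int, bit: int) -> float:
    """Anti-inductive stand-in: expects whatever has happened least so far."""
    n = ones + zeros
    return ((zeros if bit else ones) + 1) / (n + 2)


def bayes_score(predictor, bits) -> float:
    """Sum of log2-probabilities the predictor gave to the observed bits."""
    ones = zeros = 0
    score = 0.0
    for bit in bits:
        score += math.log2(predictor(ones, zeros, bit))
        if bit:
            ones += 1
        else:
            zeros += 1
    return score
```

On twenty 1s in a row, the inductive predictor scores log2(1/21), about -4.4 bits. The anti-inductive one scores log2(1/21!), about -65.5 bits. The difference, log2(20!) or about 61 bits, is the log2 likelihood ratio between them. In this two-hypothesis toy, any prior on induction above roughly 2^-61 therefore ends up favored after only twenty observations.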

(The reason why you need faith for the second case is that although more powerful proof systems - those with larger proof-theoretic ordinals - can prove the consistency of weaker proof systems, or equivalently prove the well-ordering of smaller ordinals, there's no known perfect system for telling which mathematical systems are consistent just as (equivalently!) there's no way of solving the halting problem. So when you reach the strongest math system you can be convinced of and further assumptions seem dangerously fragile, there's some large ordinal that represents all the math you believe in. If this doesn't seem to you like faith, try looking up a Buchholz hydra and then believing that it can always be killed.)

(Work is ongoing on eliminating the requirement for faith in these two remaining propositions. For example, we might be able to describe our increasing confidence in ZFC in terms of logical uncertainty and an inductive prior which is updated as ZFC passes various tests that it would have a substantial subjective probability of failing, even given all other tests it has passed so far, if ZFC were inconsistent.)
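The idea in the paragraph above can be made concrete as a toy model. These are illustrative assumptions, not something the post commits to. First, if ZFC were inconsistent, each new test would fail with probability at least q, even given that all earlier tests passed. Second, a consistent system never fails a test. Under those assumptions, P(all n tests pass | inconsistent) is at most (1 - q)^n, which gives a lower bound on the posterior credence in consistency. The function name and parameters are hypothetical.

```python
def consistency_lower_bound(prior_consistent: float, q: float, n_tests: int) -> float:
    """Lower bound on P(consistent | passed n_tests tests).

    Toy assumptions: an inconsistent system fails each new test with
    probability >= q given all earlier passes; a consistent one never fails.
    The posterior p / (p + (1 - p) * L) decreases in L, so plugging in the
    upper bound L <= (1 - q) ** n_tests yields a lower bound.
    """
    pass_all_if_inconsistent = (1 - q) ** n_tests  # upper bound on that likelihood
    return prior_consistent / (
        prior_consistent + (1 - prior_consistent) * pass_all_if_inconsistent
    )
```

With a prior of 0.5 and q = 0.1, fifty passed tests already push the bound above 0.99. Everything hinges on q being genuinely substantial. The Goodstein example in the comments shows how an inconsistency can hide behind enormously long proofs, which would make q tiny.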

(No, this is *not* the "tu quoque!" moral equivalent of starting out by assigning probability 1 that Christ died for your sins.)

## 86 comments


This phrase confuses me:

and that some single large ordinal is well-ordered.

Every definition I've seen of ordinal either includes well-ordered or has that as a theorem. I'm having trouble imagining a situation where it's necessary to use the well-orderedness of a larger ordinal to prove it for a smaller one.

*Edit:* Did you mean well-founded instead of well-ordered?

Every ordinal (in the sense I use the word[1]) is both well-founded and well-ordered.

If I assume what you wrote makes sense, then you're talking about a different sort of ordinal. I've found a paper[2] that talks about proof-theoretic ordinals, but it doesn't talk about this in the same language you're using. Their definition of ordinal matches mine, and there is no mention of an ordinal that might not be well-ordered.

Also, I'm not sure I should care about the consistency of some model of set theory. The parts of math that interact with reality and the parts of math that interact with irreplaceable set theoretic plumbing seem very far apart.

[1] An ordinal is a transitive set well-ordered by "is an element of".

[2] www.icm2006.org/proceedings/Vol_II/contents/ICM_Vol_2_03.pdf

This argument strikes me as boiling down to: "I can't think of any bad attractors besides the anti-inductive prior, therefore I'm going to assume I don't need to worry about them".

Nevertheless, the lack of exposure to such attractors is quite relevant: if there was any, you'd expect some scientist to encounter it.

Why would one expect scientists to have encountered such attractors before, even if they exist? As far as I know there hasn't been much effort to systematically search for them, and even if there has been some effort in that direction, Eliezer didn't cite any.

being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Promote it how? By way of inductive reasoning, to which Bayesian inference belongs. It seems like there's a contradiction between the initially small prior of "induction works" (which is different from inductive reasoning, but still related) and "promote that low-probability hypothesis (that induction works) by way of inductive reasoning".

If you see no tension there, wouldn't you still need to state the basis for "inductive reasoning works", at least such that its use can be justified (initially)?

Consider the following toy model. Suppose you are trying to predict a sequence of zeroes and ones. The stand-in for "induction works" here will be Solomonoff induction (the sequence is generated by an algorithm and you use the Solomonoff prior). The stand-in for "induction doesn't work" here will be the "binomial monkey" prior (the sequence is an i.i.d. sequence of Bernoulli random variables with p = 1/2, so it is not possible to learn anything about future values of the sequence from past observations). Suppose you initially assign some nonzero probability to Solomonoff induction working and the rest of your probability to the binomial monkey prior. If the sequence of zeroes and ones isn't completely random (in the sense of having high Kolmogorov complexity), Solomonoff induction will quickly be promoted as a hypothesis.
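Here is a runnable version of this toy model, with one substitution. Solomonoff induction is uncomputable, so the sketch swaps in a small, finite stand-in: a mixture over repeating bit patterns of length at most 8. Each pattern of length L gets weight 4^-L, so shorter patterns are more probable. The binomial monkey is the i.i.d. fair coin. All names and the 10^-6 prior are illustrative assumptions.

```python
import itertools


def periodic_mixture_prob(bits, max_period=8):
    """Probability of `bits` under a toy, finite stand-in for the Solomonoff prior.

    Hypotheses: every bit pattern of length L <= max_period, repeated forever.
    Each pattern gets weight 4**-L, so all patterns of length L together get
    2**-L ("shorter programs are more probable"). Weights are normalized.
    """
    total_weight = 0.0
    consistent_weight = 0.0
    for length in range(1, max_period + 1):
        for pattern in itertools.product((0, 1), repeat=length):
            weight = 4.0 ** -length
            total_weight += weight
            # A deterministic hypothesis gives probability 1 to data it
            # reproduces exactly, and 0 otherwise.
            if all(b == pattern[i % length] for i, b in enumerate(bits)):
                consistent_weight += weight
    return consistent_weight / total_weight


def posterior_induction(bits, prior_induction=1e-6):
    """P("induction works" | bits), with the binomial monkey as the only rival."""
    p_induction = periodic_mixture_prob(bits)
    p_monkey = 0.5 ** len(bits)  # i.i.d. Bernoulli(1/2)
    numerator = prior_induction * p_induction
    return numerator / (numerator + (1 - prior_induction) * p_monkey)
```

With a prior of only 10^-6, forty bits of 0101... push the posterior on "induction" above 0.9999. The mixture assigns those bits probability about 0.067, while the monkey assigns 2^-40. On a 16-bit sequence that no pattern of period 8 or less reproduces, the posterior drops to exactly 0. Every toy hypothesis here is deterministic, so one mismatch rules it out. The real Solomonoff prior also contains monkey-like programs and never assigns zero.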

Not all Bayesian inference is inductive reasoning in the sense that not all priors allow induction.

To amplify on Qiaochu's answer, the part where you promote the Solomonoff prior is Bayesian *deduction*, a matter of logic - Bayes's Theorem follows from the axioms of probability theory. It doesn't proceed by saying "induction worked, and my priors say that if induction worked it should go on working" - that part is actually implicit in the Solomonoff prior itself, and the rest is pure Bayesian deduction.

Doesn't this add "the axioms of probability theory" ie "logic works" ie "the universe runs on math" to our list of articles of faith?

Edit: After further reading, it seems like this is entailed by the "Large ordinal" thing. I googled well-orderedness, encountered the Wikipedia article, and promptly shat a brick.

What sequence of maths do I need to study to get from Calculus I to set theory and what the hell well-orderedness means?

Solomonoff induction will quickly be promoted as a hypothesis

Again, promoted how? *All* you know is "induction is very, very unlikely to work" (low prior, non 0), and "some single large ordinal is well-ordered". That's it. How can you deduce an inference system from that that would allow you to promote a hypothesis based on it being consistent with past observations?

It seems like putting the hoversled before the bantha (= assuming the explanandum).

Promoted by Bayesian inference. Again, not all Bayesian inference is inductive reasoning. Are you familiar with Cox's theorem?

Only in passing. However, why would you assume those postulates that Cox's theorem builds on?

You'd have to construct and argue for those postulates out of (sorry for repeating) "induction is very, very unlikely to work" (low prior, non 0), and "some single large ordinal is well-ordered". How?

Wouldn't it be: large ordinal -> ZFC consistent -> Cox's theorem?

Maybe you then doubt that consequences follow from valid arguments (like Carroll's Tortoise in his dialogue with Achilles). We could add a third premise that logic works, but I'm not sure it would help.

**[deleted]**· 2013-03-13T20:47:10.859Z · score: 1 (1 votes) · LW · GW

Can you elaborate on the first step?

Believing that a mathematical system has a model usually corresponds to believing that a certain computable ordinal is well-ordered (the proof-theoretic ordinal of that system), and large ordinals imply the well-orderedness of all smaller ordinals.

I'm no expert in this -- my comment is based just on reading the post, but I take the above to mean that there's some large ordinal for ZFC whose existence implies that ZFC has a model. And if ZFC has a model, it's consistent.

that some single large ordinal is well-ordered

An ordinal is well-ordered by definition, is it not?

Did you mean to say "some single large ordinal *exists*"?

Yeah, it's hard to phrase this well and I don't know if there's a standard phrasing. What I was trying to get at was the idea that some computable ordering is total and well-ordered, and therefore an ordinal.

Well, supposing that a large ordinal *exists* is equivalent to supposing a form of Platonism about mathematics (that a colossal infinity of other objects exist). So that is quite a large statement of faith!

All maths really needs is for a large enough ordinal to be logically possible, in that it is not self-contradictory to suppose that a large ordinal exists. That's a much weaker statement of faith. Or it can be backed by an inductive argument in the way Eliezer suggests.

I'm a bit skeptical of this minimalism (if "induction works" needs to get explicitly stated, I'm afraid all sorts of other things---like "deduction works"---also do).

But while we're at it, I don't think you need to take any mathematical statements on faith. To the extent that a mathematical statement does any useful predictive work, it too can be supported by the evidence. Maybe you could say that we should include it on a technicality (we don't yet know how to do induction on mathematical objects), but if you don't think that you can do induction over mathematical facts, you've got more problems than not believing in large ordinals!

My guess is that deduction, along with Bayesian updating, is being considered part of our rules of inference, rather than axioms.

Oh, like Achilles and the tortoise. Thanks, this comment clarified things a bit.

To get to Bayes, don't you also need to believe not just that probability theory is internally consistent (your well-ordered ordinal gives you that much) but also that it is the correct system for deducing credences from other credences? That is, you need to believe Cox's assumptions, or equivalently (I think) Jaynes's desiderata (consistent, non-ideological, quantitative). Without these, you can do all the probability theory you want but you'll never be able to point at the number at the end of a calculation and say "that is now my credence for the sun rising tomorrow".

If you believe in a prior, you believe in probability, right?

(No, this is *not* the "tu quoque!" moral equivalent of starting out by assigning probability 1 that Christ died for your sins.)

Can someone please explain this?

I understand many religious people claim to just 'have faith' in Christ, with absolute certainty. I think the standard argument would run "well, you say I shouldn't have faith in Christ, but you have faith in 'science' / 'non-negligible probability on induction and some single well-ordered large ordinal', so you can't argue against faith".

What is Eliezer saying here?

Addendum: By which I mean, can someone give a clear explanation of why they are not the same?

Presuppositionalism is a school of Christian apologetics that believes the Christian faith is the only basis for rational thought. It presupposes that the Bible is divine revelation and attempts to expose flaws in other worldviews. It claims that apart from presuppositions, one could not make sense of any human experience, and there can be no set of neutral assumptions from which to reason with a non-Christian.

You pretty much got it. Eliezer's predicting that response and saying, no, they're really not the same thing. (Tu quoque)

EDIT: Never mind, I thought it was a literal question.

I see. Could you articulate how exactly they're not the same thing please?

**[deleted]**· 2013-03-11T11:44:23.324Z · score: 7 (7 votes) · LW · GW

For instance: nowhere above did EY claim anything had probability one.

I don't think this got mentioned, but I assume that it's really difficult (as in, nobody has done it yet) to go from "induction works" to "a large ordinal is well-ordered". That would reduce the number of things you have faith in from two to one.

I was staring at this thinking "Didn't I *just say that* in the next-to-last paragraph?" and then I realized that to a general audience it is not transparent that adducing the consistency of ZFC by induction corresponds to inducing the well-ordering of some large ordinal by induction.

to a general audience it is not transparent

Not transparent? This general audience has no idea what all this even means.

I was at least familiar with the concepts involved and conflated mathematical induction and evidential inductive reasoning anyways.

I can't even understand if the post is about pure math or about the applicability of certain mathematical models to the physical world.

From what I understand it's along the same lines as Bertrand Russell's search for the smallest set of axioms to form mathematics, except for *everything* and not just math.

except for everything and not just math.

If so, it makes little sense to me. Math is one tool for modeling and accurately predicting the physical world, and it is surely nice to minimize the number of axioms required to construct an accurate model, but it is still about the model, there is no well-ordering and no ordinals in the physical world, these are all logical constructs. It seems that there is something in EY's epistemology I missed.

being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

...unless you are dealing with phenomena where it doesn't, like stock markets? Or is this a statement about the general predictability of the world, i.e. that models are useful? Then it is pretty vacuous, since otherwise what point would be there in trying to model the world?

there's some large ordinal that represents all the math you believe in.

"Believe" in what sense? That it is self-consistent? That it enables accurate modeling of physical systems?

I figured it out from context. But, sure, that could probably be clearer.

Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Not if the alternative hypothesis assigns about the same probability to the data up to the present. For example, an alternative hypothesis to the standard "the sun rises every day" is "the sun rises every day, until March 22, 2015", and the alternative hypothesis assigns the same probability to the data observed until the present as the standard one does.

You also have to trust your memory and your ability to compute Solomonoff induction, both of which are demonstrably imperfect.

There's an infinite number of alternative hypotheses like that and you need a new one every time the previous one gets disproven; so assigning so much probability to all of them that they went on dominating Solomonoff induction on every round, even after being exposed to large quantities of sensory information, would require that the remaining probability mass assigned to the prior for Solomonoff induction be less than exp(-(amount of sensory information)), that is, super-exponentially tiny.

My brain parsed "super-exponentially tiny" as "arbitrarily low" or somesuch. I did not wonder why it specifically needed to be super-exponential. Hence this post served both to point out that I should have been confused, (I wouldn't have understood why) and to dispel the confusion.

Something about that amuses me.

You could choose to single out a single alternative hypothesis that says the sun won't rise some day in the future. The ratio between P(sun rises until day X) and P(sun rises every day) will not change with any evidence before day X. If initially you believed a 99% chance of "the sun rises every day until day X" and a 1% chance of Solomonoff induction's prior, you would end up assigning more than a 99% probability to "the sun rises every day until day X".

Solomonoff induction itself will give some significant probability mass to "induction works until day X" statements. The Kolmogorov complexity of "the sun rises until day X" is about the Kolmogorov complexity of "the sun rises every day" plus the Kolmogorov complexity of X (approximately log2(X) + 2 log2(log2(X))). Therefore, even according to Solomonoff induction, the "sun rises until day X" hypothesis will have a probability approximately proportional to P(sun rises every day) / (X log2(X)^2). This decreases subexponentially with X, and even slower if you sum this probability for all Y >= X.
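Here is the arithmetic behind this comment, written out. It assumes the usual Solomonoff weighting of a hypothesis by 2^{-K} (up to a constant factor) and approximates the tail sum by an integral:

```latex
\begin{aligned}
K(\text{rises until } X) &\approx K(\text{rises every day}) + \log_2 X + 2\log_2\log_2 X \\
2^{-K(\text{rises until } X)} &\approx \frac{2^{-K(\text{rises every day})}}{X\,(\log_2 X)^2} \\
\sum_{Y \ge X} \frac{1}{Y\,(\log_2 Y)^2} &\approx \int_X^{\infty} \frac{dy}{y\,(\log_2 y)^2} = \frac{\ln 2}{\log_2 X}
\end{aligned}
```

The last line substitutes u = log2 y, so dy/y = ln 2 du. The total weight on "the sun stops on some day at or after X" therefore shrinks only like 1/log X, far more slowly than any exponential.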

In order to get exponential change in the odds, you would need to have repeatable independent observations that distinguish between Solomonoff induction and some other hypothesis. You can't get that in the case of "sun rises every day until day X" hypotheses.

If you only assign significant probability mass to one changeover day, you behave inductively on almost all the days up to that point, and hence make relatively few epistemic errors. To put it another way, unless you assign superexponentially-tiny probability to induction ever working, the number of anti-inductive errors you make over your lifespan will be bounded.

If you only assign significant probability mass to one changeover day, you behave inductively on almost all the days up to that point, and hence make relatively few epistemic errors.

But even one epistemic error is enough to cause an arbitrarily large loss in utility. Suppose you think that with 99% probability, unless you personally join a monastery and stop having any contact with the outside world, God will put everyone who ever existed into hell on 1/1/2050. So you do that instead of working on making a positive Singularity happen. Since you can't update away this belief until it's too late, it does seem important to have "reasonable" priors instead of just a non-superexponentially-tiny probability to "induction works".

But even one epistemic error is enough to cause an arbitrarily large loss in utility.

This is always true.

Since you can't update away this belief until it's too late, it does seem important to have "reasonable" priors instead of just a non-superexponentially-tiny probability to "induction works".

I'd say more that besides your one reasonable prior you also need to not make various sorts of specifically harmful mistakes, but this only becomes true when instrumental welfare as well as epistemic welfare are being taken into account. :)

Do you think it's useful to consider "epistemic welfare" independently of "instrumental welfare"? To me it seems that approach has led to a number of problems in the past.

- Solomonoff Induction was historically justified in a way similar to your post: you should use the universal prior, because whatever the "right" prior is, if it's computable then substituting the universal prior will cost you only a limited number of epistemic errors. I think this sort of argument is more impressive/persuasive than it should be (at least for some people, including myself when I first came across it), and makes them erroneously think the problem of finding "the right prior" or "a reasonable prior" is already solved or doesn't need to be solved.
- Thinking that anthropic reasoning / indexical uncertainty is clearly an epistemic problem and hence ought to be solved within epistemology (rather than decision theory), leading for example to dozens of papers arguing over what is the right way to do Bayesian updating in the Sleeping Beauty problem.

Ok, I agree with this interpretation of "being exposed to ordered sensory data will rapidly promote the hypothesis that induction works".

Yep! And for the record, I agree with your above paragraphs given that.

I would like to note explicitly for other readers that probability goes down exponentially in Kolmogorov complexity (it is proportional to 2^-K), not in proportion to Kolmogorov complexity. So the probability of the Sun failing to rise the next day really is going down at a noticeable rate, as jacobt calculates (1 / (x log(x)^2) on day x). You can't repeatedly have large likelihood ratios against a hypothesis or mixture of hypotheses and not have it be demoted exponentially fast.

But... no.

"The sun rises every day" is much simpler information and computation than "the sun rises every day until Day X". To put it in caricature, if hypothesis "the sun rises every day"is:

XXX1XXXXXXXXXXXXXXXXXXXXXXXXXX

(reading from the left)

then the hypothesis "the sun rises every day until Day X" is:

XXX0XXXXXXXXXXXXXXXXXXXXXX1XXX

And I have no idea if that's even remotely the right order of magnitude, simply because I have no idea how many possible-days or counterfactual days we need to count, nor of how exactly the math should work out.

The important part is that for every possible Day X, it is equally balanced by the "the sun rises every day" hypothesis, and AFAICT this is one of those things implied by the axioms. So because of complexity giving you base rates, most of the evidence given by sunrise accrues to "the sun rises every day", and the rest gets evenly divided over all non-falsified "Day X" (also, induction by this point should let you induce that Day X hypotheses will continue to be falsified).

You're making the argument that Solomonoff induction would select "the sun rises every day" over "the sun rises every day until day X". I agree, assuming a reasonable prior over programs for Solomonoff induction. However, if your prior is 99% "the sun rises every day until day X", and 1% "Solomonoff induction's prior" (which itself might assign, say, 10% probability to the sun rising every day), then you will end up believing that the sun rises every day until day X. Eliezer asserted that in a situation where you assign only a small probability to Solomonoff induction, it will quickly dominate the posterior. This is false.

most of the evidence given by sunrise accrues to "the sun rises every day", and the rest gets evenly divided over all non-falsified "Day X"

Not sure exactly what this means, but the ratio between the probabilities "the sun rises every day" and "the sun rises every day until day X" will not be affected by any evidence that happens before day X.
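Here is a minimal sketch of the point in this reply; the names are illustrative. There are two deterministic hypotheses, "the sun rises every day" and "the sun rises every day before day X, then never again", updated on daily sunrises. Both predict every observation before day X with probability 1, so Bayes's theorem leaves their ratio untouched until day X itself.

```python
def posterior_sun_stops(prior_stop: float, day_x: int, days_observed: int) -> float:
    """P(H_x | sun seen rising on days 0 .. days_observed - 1), two hypotheses only.

    H_all: the sun rises every day.
    H_x:   the sun rises every day before day_x and never from day_x on.
    Both are deterministic, so each day's likelihood is 1 or 0.
    """
    p_all, p_x = 1.0 - prior_stop, prior_stop
    for day in range(days_observed):
        # Observation: the sun rose on `day`.
        like_all = 1.0
        like_x = 1.0 if day < day_x else 0.0
        p_all, p_x = p_all * like_all, p_x * like_x
        total = p_all + p_x
        p_all, p_x = p_all / total, p_x / total
    return p_x
```

Start from a 99% / 1% split in favor of "until day X". After a thousand sunrises, posterior_sun_stops(0.99, 1000, 1000) is still 0.99. It drops to 0 the moment the sunrise on day 1000 is observed. Before that day, only the prior can demote such a hypothesis, which is why the complexity penalty discussed above matters.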

In the real world induction seems to work for some problems but not for others.

The turkey who gets fed by humans can update every day he's fed on the thesis that humans are benevolent. When he gets slaughtered at Thanksgiving, he's out of luck.

I feel like this is more of a problem with your optimism than with induction. You should really have a hypothesis set that says "humans want me to be fed for some period of time" and the evidence increases your confidence in that, not just some subset of it. After that, you can have additional hypotheses about, for example, their possible motivations, that you could update on based on whatever other data you have (e.g. you're super-induction-turkey, so you figured out evolution). Or, more trivially, you might notice that sometimes your fellow turkeys disappear and don't come back (if that happens). You would then predict the future based on all of these hypotheses, not just one linear trend you detected.

I'm not sure why, but now I want Super-induction-turkey to be the LW mascot.

If you have a method of understanding the world that works for all problems, I would love to hear it.

If you have a method of understanding the world that works for all problems, I would love to hear it.

Acknowledging that you can't solve them?

In what sense does that "work"?

Being able to predict the results of giving up on a problem does not imply that giving up is superior to tackling a problem that I don't know I'll be able to solve.

How do you know which ones are the ones you can't solve?

So induction gives the right answer 100s of times, and then gets it wrong once. Doesn't seem too bad a ratio.

I've long claimed to not have faith in *anything*. I certainly don't have "faith" in inductive inference. I don't see why anyone would have "faith" in something which they are uncertain about. The need for lack of certainty about induction has long been understood.

I don't have *faith* in induction. I happen to be the kind of monster who *does* induction.

But only sometimes. The more toothpaste I have extracted from the tube so far, the more likely it is to be empty.

I've long claimed to not have faith in anything. I certainly don't have "faith" in inductive inference. I don't see why anyone would have "faith" in something which they are uncertain about. The need for lack of certainty about induction has long been understood.

Which of these seven models of faith do you have in mind when you use that word (if any)?

Bah, philosophy. I essentially mean belief not justified by the evidence.

I think you also need faith in "wanting something". I mean, it's not absolutely essential, but if you don't want anything, it's unlikely that you'll make any use of your shiny induction and ordinal.

Nicely timed with this sequence rerun.

I actually thought this was part of a sequence, because I'm missing some context along the lines of "I vaguely remember this being discussed but now I can't remember why the topic was on the table in the first place." Initially I thought it was part of the Epistemology sequence but I can't figure out where it follows from. Someone enlighten me?

Does induction state a fact about the territory or the map? Is it more akin to "The information processing influencing my sensory inputs *actually* runs on a processor in which P(0) & [P(0) & P(1) & ... & P(n) -> P(n+1)] for all propositions P and natural n"? Or is it "My *own* information processor is one for which P(0) & [P(0) & P(1) & ... & P(n) -> P(n+1)] for all propositions P and natural n"?

It seems like the second option is true by definition (by the authoring of the AI, we simply make it so because we suppose that is the way to author an AI to map territories). This supposition itself would be more like the first option.

I'm guessing I'm probably just confused here. Feel free to dissolve the question.

This question seems to confuse mathematical induction with inductive reasoning.

So I have. Mathematical induction is, so I see, actually a form of deductive reasoning because its conclusions necessarily follow from its premises.

Mathematical induction is more properly regarded as an axiom. It is accepted by a vast majority of mathematicians, but not all.

How should I think about the terminologies "faith" and "axiom" in this context? Is this "faith in two things" more fundamental than belief in some or all mathematical axioms?

For example, if I understand correctly, mathematical induction is equivalent to the well-ordering principle (pertaining to subsets of the natural numbers, which have a quite low ordinal). Does this mean that this axiom is subsumed by the second faith, which deals with the well-ordering of a single much higher ordinal?

Or, as above, did Eliezer mean "well-founded?" In which case, is he taking well-ordering as an axiom to prove that his faiths are enough to believe all that is worth believing?

It may be better to just point me to resources to read up on here than to answer my questions. I suspect I may still be missing the mark.

I'm not sure how to answer your specific question; I'm not familiar with proof-theoretic ordinals, but I think that's the keyword you want. I'm not sure what your general question means.

Utter pedantry: or rather an axiom schema, in first-order languages.
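For readers following the thread on mathematical induction and well-ordering, here are the two first-order schemas side by side, one instance for each formula φ. Over the remaining axioms of Peano Arithmetic, each schema can be derived from the other; this is a standard textbook result. The order type of the natural numbers under < is ω, the smallest infinite ordinal.

```latex
% Induction schema (one axiom per formula \varphi)
\bigl[\varphi(0) \land \forall n\,(\varphi(n) \to \varphi(n+1))\bigr] \to \forall n\,\varphi(n)

% Least-number principle: well-ordering of the naturals (one axiom per \varphi)
\exists n\,\varphi(n) \to \exists n\,\bigl[\varphi(n) \land \forall m\,(m < n \to \lnot\varphi(m))\bigr]
```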

You only need faith in two things: ...that some single large ordinal is well-ordered.

I'm confused. What do you mean by *faith* in... well, properties of abstract formal systems? That some single large ordinal must exist in at least one of your models for it to usefully model reality (or other models)?

Work is ongoing on eliminating the requirement for faith in these two remaining propositions. For example, we might be able to describe our increasing confidence in ZFC in terms of logical uncertainty and an inductive prior which is updated as ZFC passes various tests that it would have a substantial subjective probability of failing, even given all other tests it has passed so far, if ZFC were inconsistent.

Would using the length of the demonstration of a contradiction work? Under the Curry-Howard correspondence, a lengthy proof should correspond to a lengthy program, which under Solomonoff induction should have less and less credit.

Unless I've missed something, it is easy to exhibit small formal systems such that the minimum proof length of a contradiction is unreasonably large. E.g. Peano Arithmetic plus the axiom "Goodstein(Goodstein(256)) does not halt" can prove a contradiction but only after some very, very large number of proof steps. Thus failure to observe a contradiction after merely huge numbers of proof steps doesn't provide very strong evidence.

Given that we can't define that function in PA, what do you mean by Goodstein(256)?

Goodstein is definable, it just can't be proven total. If I'm not mistaken, all Turing machines are definable in PA (albeit they may run at nonstandard times).

So I gather we define a Goodstein relation G such that [xGy] in PA if [y = Goodstein(x)] in ZFC; then you're saying PA plus the axiom [not(exists y, (256Gy and exists z, (yGz)))] is inconsistent, but the proof of that is huge because the proof basically has to write an execution trace of Goodstein(Goodstein(256)). That's interesting!
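To give a feel for why a contradiction in that system is so far away, here is a sketch that computes Goodstein sequences. The thread does not define "Goodstein(n)". Reading it as being about the Goodstein sequence that starts at n (for example, its length) is an assumption, and the function names below are illustrative. Each step writes the current value in hereditary base-b notation, replaces every b by b+1, subtracts 1, and increments b. Goodstein's theorem says the sequence always reaches 0, and Kirby and Paris showed that PA cannot prove this.

```python
def bump(n: int, base: int) -> int:
    """Write n in hereditary base-`base` notation, then replace every `base` by `base + 1`."""
    result = 0
    exponent = 0
    while n > 0:
        digit = n % base
        if digit:
            # Exponents are themselves rewritten hereditarily.
            result += digit * (base + 1) ** bump(exponent, base)
        n //= base
        exponent += 1
    return result


def goodstein(n: int, max_steps: int) -> list:
    """The Goodstein sequence starting at n, cut off after max_steps steps."""
    seq = [n]
    base = 2
    while n > 0 and len(seq) <= max_steps:
        n = bump(n, base) - 1
        base += 1
        seq.append(n)
    return seq
```

goodstein(3, 10) returns [3, 3, 3, 2, 1, 0]. Starting from 4, the sequence begins 4, 26, 41, 60, 83 and keeps growing for astronomically many steps before it finally falls to 0, which is why the sketch needs a max_steps cutoff.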

Isn't it possible to trivially generate an order of arbitrary size that is well-ordered?

How?

You can do it with the axiom of choice, but beyond that I'm pretty sure you can't.

If "arbitrary size" means "arbitrarily large size," see Hartogs numbers. On the other hand, the well-ordering principle is equivalent to AC.

Take the empty set. Add an element. Preserving the order of existing elements, add a greatest element. Repeat.

That sounds like it would only work for countable sets.

Is the single large ordinal which must be well-ordered uncountable? I had figured that simply unbounded was good enough for this application.
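The "add a greatest element and repeat" recipe above can be made concrete with von Neumann ordinals. In that encoding, each ordinal is the set of all smaller ones, and the successor of k is k ∪ {k}; the encoding is an illustrative choice, not something the commenters specified. The check below tests the ordinal definition quoted in the first comment (a transitive set well-ordered by "is an element of"), for finite sets only.

```python
def von_neumann_ordinals(n: int) -> list:
    """The finite von Neumann ordinals 0, 1, ..., n, built by k + 1 = k ∪ {k}."""
    ordinals = [frozenset()]  # 0 is the empty set
    for _ in range(n):
        k = ordinals[-1]
        ordinals.append(k | {k})  # add k itself as the new greatest element
    return ordinals


def is_transitive(s: frozenset) -> bool:
    """Every element of s is also a subset of s."""
    return all(x <= s for x in s)


def is_finite_ordinal(s: frozenset) -> bool:
    """Transitive, and "is an element of" is a strict total order on s.

    For a finite set, a strict total order is automatically a well-order,
    so this check matches the definition only when s is finite.
    """
    elems = list(s)
    irreflexive = all(x not in x for x in elems)
    transitive_rel = all(
        x in z for x in elems for y in elems for z in elems if x in y and y in z
    )
    total = all(x in y or y in x or x == y for x in elems for y in elems)
    return is_transitive(s) and irreflexive and transitive_rel and total
```

Repeating the step once per natural number yields only the finite ordinals, whose union is ω. So this recipe on its own stays countable, as the reply above suspected.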