## Posts

## Comments

**oscar_cunningham**on Are index funds still a good investment? · 2020-12-03T09:14:41.262Z · LW · GW

Passive investors own the same proportion of each stock (to a first approximation). Therefore the remaining stocks, which are held by active investors, also consist of the same proportion of every stock. So if stocks go down then this will reduce the value of the stocks held by the average active investor by the same amount as those of passive investors.

If you think stocks will go down across the market then the only way to avoid your investments going down is to not own stocks.
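A quick numerical check of the argument above, as a minimal sketch. The market values, the 40% passive share and the per-stock price moves are all made-up numbers, and `split_holdings` and `portfolio_return` are hypothetical helper names.

```python
# Hypothetical market: total value of each stock (made-up numbers).
market_caps = {"A": 600.0, "B": 300.0, "C": 100.0}
passive_share = 0.4  # assumed fraction of every stock held by index funds


def split_holdings(caps, share):
    """Passive investors hold `share` of each stock; active investors hold the rest."""
    passive = {s: share * v for s, v in caps.items()}
    active = {s: (1 - share) * v for s, v in caps.items()}
    return passive, active


def portfolio_return(holdings, price_changes):
    """Fractional change in portfolio value for the given per-stock price changes."""
    before = sum(holdings.values())
    after = sum(v * (1 + price_changes[s]) for s, v in holdings.items())
    return after / before - 1


passive, active = split_holdings(market_caps, passive_share)
crash = {"A": -0.30, "B": -0.10, "C": -0.50}  # arbitrary per-stock moves
print(portfolio_return(passive, crash), portfolio_return(active, crash))
```

Whatever moves are plugged into `crash`, the two printed returns agree, because both groups hold the same mix of stocks. Individual active investors can still do better or worse than this average.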

**oscar_cunningham**on Scoring 2020 U.S. Presidential Election Predictions · 2020-11-11T10:47:27.146Z · LW · GW

I just did the calculations. Using the interactive forecast from 538 gives them a score of -9.027; using the electoral_college_simulations.csv data from The Economist gives them a score of -7.841. So The Economist still wins!

**oscar_cunningham**on Did anybody calculate the Briers score for per-state election forecasts? · 2020-11-11T09:26:21.534Z · LW · GW

**oscar_cunningham**on Scoring 2020 U.S. Presidential Election Predictions · 2020-11-11T09:25:52.540Z · LW · GW

Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations.

I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently you could add the scores for each state, but for each use the probability conditional on the states you've already scored. For 538 these probabilities are available via their interactive forecast.

Otherwise you're counting the correlated part of the outcomes multiple times. So it's not surprising that The Economist does best overall, because they had the highest probability for a Biden win and that did in fact occur.

My suggested method has the nice property that if you score two perfectly correlated events then the second one always gives exactly 0 points.
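Here is a small sketch of the difference between the two ways of scoring, assuming the logarithmic scoring rule. The joint forecast over two states is invented for illustration, and `naive_score` and `sequential_terms` are hypothetical names.

```python
import math

# Hypothetical joint forecast over two states (True = candidate wins that state).
joint = {
    (True, True): 0.55,
    (True, False): 0.10,
    (False, True): 0.05,
    (False, False): 0.30,
}
outcome = (True, True)


def naive_score(joint, outcome):
    """Sum of log scores of the marginals, as if the events were independent."""
    total = 0.0
    for i, result in enumerate(outcome):
        marginal = sum(p for k, p in joint.items() if k[i] == result)
        total += math.log(marginal)
    return total


def sequential_terms(joint, outcome):
    """Log score of each event, conditional on the events already scored."""
    terms = []
    prev = 1.0
    for i in range(len(outcome)):
        # Probability that the first i + 1 events all match the outcome.
        prefix = sum(p for k, p in joint.items() if k[: i + 1] == outcome[: i + 1])
        terms.append(math.log(prefix / prev))
        prev = prefix
    return terms


print(naive_score(joint, outcome))            # double counts the correlation
print(sum(sequential_terms(joint, outcome)))  # equals log of P(exact outcome)

# Two perfectly correlated events: the second one scores exactly 0.
correlated = {(True, True): 0.7, (False, False): 0.3}
print(sequential_terms(correlated, (True, True)))
```

The sequential total does not depend on the order in which the states are scored, since the conditional probabilities always multiply out to the probability of the exact outcome.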

**oscar_cunningham**on Did anybody calculate the Briers score for per-state election forecasts? · 2020-11-11T07:19:45.261Z · LW · GW

Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations.

I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently you could add the scores for each state, but for each use the probabilities conditional on the states you've already scored. For 538 these probabilities are available via their interactive forecast.

Otherwise you're counting the correlated part of the outcomes multiple times. So it's not surprising that The Economist does best overall, because they had the highest probability for a Biden win and that did in fact occur.

EDIT: My suggested method has the nice property that if you score two perfectly correlated events then the second one always gives exactly 0 points.

**oscar_cunningham**on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-19T06:19:16.587Z · LW · GW

Maybe there's just some new information in Trump's favour that you don't know about yet?

**oscar_cunningham**on Classifying games like the Prisoner's Dilemma · 2020-07-04T19:17:59.974Z · LW · GW

I've been wanting to do something like this for a while, so it's good to see it properly worked out here.

If you wanted to expand this you could look at games which weren't symmetrical in the players. So you'd have eight variables, W, X, Y and Z, and w, x, y and z. But you'd only have to look at the possible orderings within each set of four, since it's not necessarily valid to compare utilities between people. You'd also be able to reduce the number of games by using the swap-the-players symmetry.

**oscar_cunningham**on Wolf's Dice · 2019-07-17T13:06:40.118Z · LW · GW

Right. But also we would want to use a prior that favoured biases which were near fair, since we know that Wolf at least thought they were a normal pair of dice.

**oscar_cunningham**on Open Thread April 2019 · 2019-04-04T07:04:15.002Z · LW · GW

Suppose I'm trying to infer probabilities about some set of events by looking at betting markets. My idea was to visualise the possible probability assignments as a high-dimensional space, and then for each bet being offered remove the part of that space for which the bet has positive expected value. The region remaining after doing this for all bets on offer should contain the probability assignment representing the "market's beliefs".

My question is about the situation where there is no remaining region. In this situation for every probability assignment there's some bet with a positive expectation. Is it a theorem that there is always an arbitrage in this case? In other words, can one switch the quantifiers from "for all probability assignments there exists a positive expectation bet" to "there exists a bet such that for all probability assignments the bet has positive expectation"?

**oscar_cunningham**on The Kelly Criterion · 2018-10-17T00:38:14.810Z · LW · GW

I believe you missed one of the rules of Gurkenglas' game, which was that there are at most 100 rounds. (Although it's possible I misunderstood what they were trying to say.)

If you assume that play continues until one of the players is bankrupt then in fact there are lots of winning strategies. In particular betting any constant proportion less than 38.9%. The Kelly criterion isn't unique among them.

My program doesn't assume anything about the strategy. It just works backwards from the last round and calculates the optimal bet and expected value for each possible amount of money you could have, on the basis of the expected values in the next round which it has already calculated. (Assuming each bet is a whole number of cents.)
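This is not my actual program, just a minimal sketch of the same backward-induction idea on a stand-in game. The rules here are assumptions: even-money bets that win with probability `p`, whole-unit stakes, wealth capped at `cap`, and utility linear in final wealth. The function name `solve` is hypothetical.

```python
def solve(p, rounds, cap):
    """Return (value, best_bet) tables for a capped even-money betting game.

    value[t][w] is the expected final wealth with t rounds left and w units
    in hand under optimal play; best_bet[t][w] is a stake that achieves it.
    """
    value = [[float(w) for w in range(cap + 1)]]  # t = 0: you keep what you have
    best_bet = [[0] * (cap + 1)]
    for t in range(1, rounds + 1):
        prev = value[t - 1]  # already computed: values with one fewer round left
        row, bets = [], []
        for w in range(cap + 1):
            best_v, best_b = prev[w], 0  # betting nothing is always allowed
            for b in range(1, w + 1):
                win = min(w + b, cap)  # winnings beyond the cap are lost
                v = p * prev[win] + (1 - p) * prev[w - b]
                if v > best_v:
                    best_v, best_b = v, b
            row.append(best_v)
            bets.append(best_b)
        value.append(row)
        best_bet.append(bets)
    return value, best_bet


value, best_bet = solve(p=0.6, rounds=3, cap=20)
print(value[3][5], best_bet[3][5])
```

Nothing about the strategy is assumed; the optimal stake at each wealth level falls out of the values for the following round.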

**oscar_cunningham**on The Kelly Criterion · 2018-10-16T18:39:28.929Z · LW · GW

> If you wager one buck at a time, you win almost certainly.

But that isn't the Kelly criterion! Kelly would say I should open by betting *two* bucks.

In games of that form, it seems like you should be more-and-more careful as the amount of bets gets larger. The optimal strategy doesn't tend to Kelly in the limit.

EDIT: In fact my best opening bet is $0.64, leading to expected winnings of $19.561.

EDIT2: I reran my program with higher precision, and got the answer $0.58 instead. This concerned me so I reran again with infinite precision (rational numbers) and got that the best bet is $0.21. The expected utilities were very similar in each case, which explains the precision problems.

EDIT3: If you always use Kelly, the expected utility is only $18.866.

**oscar_cunningham**on The Kelly Criterion · 2018-10-16T16:33:39.971Z · LW · GW

Can you give a concrete example of such a game?

**oscar_cunningham**on The Kelly Criterion · 2018-10-16T14:06:19.033Z · LW · GW

> even if your utility outside of the game is linear, inside of the game it is not.

Are there any games where it's a wise idea to use the Kelly criterion even though your utility outside the game is linear?

**oscar_cunningham**on The Kelly Criterion · 2018-10-16T13:34:57.081Z · LW · GW

> Marginal utility is decreasing, but in practice falls off far less than geometrically.

I think this is only true if you're planning to give the money to charity or something. If you're just spending the money on yourself then I think marginal utility is literally zero after a certain point.

**oscar_cunningham**on Open Thread September 2018 · 2018-09-26T12:34:52.460Z · LW · GW

Yeah, I think that's probably right.

I thought of that before but I was a bit worried about it because Löb's Theorem says that a theory can never prove this axiom schema about itself. But I think we're safe here because we're assuming "If T proves φ, then φ" while not actually working in T.

**oscar_cunningham**on Open Thread September 2018 · 2018-09-26T11:00:25.121Z · LW · GW

I'm arguing that, for a theory T and Turing machine P, "T is consistent" and "T proves that P halts" aren't together enough to deduce that P halts. And as a counterexample I suggested T = PA + "PA is inconsistent" and P = "search for an inconsistency in PA". This P doesn't halt even though T is consistent and proves it halts.

So if it doesn't work for that T and P, I don't see why it would work for the original T and P.

**oscar_cunningham**on Open Thread September 2018 · 2018-09-25T18:07:25.927Z · LW · GW

Consistency of T isn't enough, is it? For example the theory (PA + "The program that searches for a contradiction in PA halts") is consistent, even though that program doesn't halt.

**oscar_cunningham**on Quantum theory cannot consistently describe the use of itself · 2018-09-25T09:53:31.606Z · LW · GW

https://www.scottaaronson.com/blog/?p=3975

**oscar_cunningham**on Open Thread September 2018 · 2018-09-20T20:11:55.066Z · LW · GW

This is a good point. The Wikipedia pages for other sites, like Reddit, also focus unduly on controversy.

**oscar_cunningham**on Zut Allais! · 2018-09-06T20:35:20.377Z · LW · GW

And the fact that situations like that occurred in humanity's evolution explains why humans have the preference for certainty that they do.

**oscar_cunningham**on Open Thread September 2018 · 2018-09-03T09:10:35.767Z · LW · GW

As well as ordinals and cardinals, Eliezer's construction also needs concepts from the areas of computability and formal logic. A good book to get introduced to these areas is Boolos' "Computability and Logic".

**oscar_cunningham**on Open Thread September 2018 · 2018-09-03T08:08:29.260Z · LW · GW

> being unable to imagine a scenario where something is possible

This isn't an accurate description of the mind projection fallacy. The mind projection fallacy happens when someone thinks that some phenomenon occurs in the real world but in fact the phenomenon is a part of the way their mind works.

But yes, it's common to almost all fallacies that they are in fact weak Bayesian evidence for whatever they were supposed to support.

**oscar_cunningham**on Open Thread September 2018 · 2018-09-01T17:12:14.313Z · LW · GW

Eliezer made this attempt at naming a large number computable by a small Turing machine. What I'm wondering is exactly what axioms we need to use in order to prove that this Turing machine does indeed halt. The description of the Turing machine uses a large cardinal axiom ("there exists an I0 rank-into-rank cardinal"), but I don't think that assuming this cardinal is enough to prove that the machine halts. Is it enough to assume that this axiom is consistent? Or is something stronger needed?

**oscar_cunningham**on You Play to Win the Game · 2018-08-31T19:10:06.529Z · LW · GW

> games are a specific case where the utility (winning) is well-defined

Lots of board games have badly specified utility functions. The one that springs to mind is Diplomacy; if a stalemate is negotiated then the remaining players "share equally in a draw". I'd take this to mean that each player gets utility 1/n (where there are n players, and 0 is a loss and 1 is a win). But it could also be argued that they each get 1/(2n), sharing a draw (1/2) between them (to get 1/n each wouldn't they have to be "sharing equally in a win"?).

Another example is Castle Panic. It's allegedly a cooperative game. The players all "win" or "lose" together. But in the case of a win one of the players is declared a "Master Slayer". It's never stated how much the players should value being the Master Slayer over a mere win.

Interesting situations occur in these games when the players have different opinions about the value of different outcomes. One player cares more about being the Master Slayer than everyone else, so everyone else lets them be the Master Slayer. They think that they're doing much better than everyone else, but everyone else is happy so long as they all keep winning.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-16T19:31:54.613Z · LW · GW

I actually learnt quantum physics from that sequence, and I'm now a mathematician working in Quantum Computing. So it can't be too bad!

The explanation of quantum physics is the best I've seen anywhere. But this might be because it explained it in a style that was particularly suited to me. I really like the way it explains the underlying reality first and only afterwards explains how this corresponds with what we perceive. A lot of other introductions follow the historical discovery of the subject, looking at each of the famous experiments in turn, and only building up the theory in a piecemeal way. Personally I hate that approach, but I've seen other people say that those kinds of introductions were the only ones that made sense to them.

The sequence is especially good if you don't want a math-heavy explanation, since it manages to explain exactly what's going on in a technically correct way, while still not using any equations more complicated than addition and multiplication (as far as I can remember).

The second half of the sequence talks about interpretations of quantum mechanics, and advocates for the "many-worlds" interpretation over "collapse" interpretations. Personally I found it sufficient to convince me that collapse interpretations were bullshit, but it didn't quite convince me that the many-worlds interpretation is obviously true. I find it plausible that the true interpretation is some third alternative. Either way, the discussion is very interesting and worth reading.

As far as "holding up" goes, I once read through the sequence looking for technical errors and only found one. Eliezer says that the wavefunction can't become more concentrated because of Liouville's theorem. This is completely wrong (QM is time-reversible, so if the wavefunction can become more spread out it must also be able to become more concentrated). But I'm inclined to be forgiving to Eliezer on this point because he's making exactly the mistake that he repeatedly warns us about! He's confusing the distribution described by the wavefunction (the uncertainty that we *would* have if we performed a measurement) with the uncertainty we *do* have *about* the wavefunction (which is what Liouville's theorem actually applies to).

**Oscar_Cunningham**on [deleted post] 2018-08-15T13:34:41.626Z

Really, the fact that different sizes of moral circle can incentivize coercion is just a trivial corollary of the fact that value differences in general can incentivize coercion.

**Oscar_Cunningham**on [deleted post] 2018-08-15T07:04:31.147Z

> When people have a wide circle of concern and advocate for its widening as a norm, this makes me nervous because it implies huge additional costs forced on me, through coercive means like taxation or regulations

At the moment I (and many others on LW) are experiencing the opposite. We would prefer to give money to people in Africa, but instead we are forced by taxes to give to poor people in the same country as us. Since charity to Africa is much more effective, this means that (from our point of view) 99% of the taxed money is being wasted.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-14T13:06:38.396Z · LW · GW

Okay, sure. But an idealized rational reasoner wouldn't display this kind of uncertainty about its own beliefs, but it would still have the phenomenon you were originally asking about (where statements assigned the same probability update by different amounts after the introduction of evidence). So this kind of second-order probability can't be used to answer the question you originally asked.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-13T11:22:04.778Z · LW · GW

> It seems like you're describing a Bayesian probability distribution over a frequentist probability estimate of the "real" probability.

Right. But I was careful to refer to f as a frequency rather than a probability, because f isn't a description of our beliefs but rather a physical property of the coin (and of the way it's being thrown).

> Agreed that this works in cases which make sense under frequentism, but in cases like "Trump gets reelected" you need some sort of distribution over a Bayesian credence, and I don't see any natural way to generalise to that.

I agree. But it seems to me like the other replies you've received are mistakenly treating all propositions as though they do have an f with an unknown distribution. Unnamed suggests using the beta distribution; the thing which it's the distribution of would have to be f. Similarly rossry's reply, containing phrases like "something in the ballpark of 50%" and "precisely 50%", talks as though there is some unknown percentage to which 50% is an estimate.

A lot of people (like in the paper Pattern linked to) think that our distribution over f is a "second-order" probability describing our beliefs about our beliefs. I think this is wrong. The number f doesn't describe our beliefs at all; it describes a physical property of the coin, just like mass and diameter.

In fact, any kind of second-order probability must be trivial. We have introspective access to our own beliefs. So given any statement about our beliefs we can say for certain whether or not it's true. Therefore, any second-order probability will either be equal to 0 or 1.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-13T10:06:29.757Z · LW · GW

The Open Thread appears to no longer be stickied. Try pushing the pin in harder next time.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-12T06:43:20.191Z · LW · GW

It doesn't really matter for the point I was making, so long as you agree that the probability moves further for the second coin.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-11T22:13:19.688Z · LW · GW

This is related to the problem of predicting a coin with an unknown bias. Consider two possible coins: the first which you have inspected closely and which looks perfectly symmetrical and feels evenly weighted, and the second which you haven't inspected at all and which you got from a friend who you have previously seen cheating at cards. The second coin is much more likely to be biased than the first.

Suppose you are about to toss one of the coins. For each coin, consider the event that the coin lands on heads. In both cases you will assign a probability of 50%, because you have no knowledge that distinguishes between heads and tails.

But now suppose that before you toss the coin you learn that the coin landed on heads for each of its 10 previous tosses. How does this affect your estimate?

- In the case of the first coin it doesn't make very much difference. Since you see no way in which the coin could be biased you assume that the 10 heads were just a coincidence, and you still assign a probability of 50% to heads on the next toss (maybe 51% if you are beginning to be suspicious despite your inspection of the coin).
- But when it comes to the second coin, this evidence would make you very suspicious. You would think it likely that the coin had been tampered with. Perhaps it simply has two heads. But it would also still be possible that the coin was fair. Two headed coins are pretty rare, even in the world of degenerate gamblers. So you might assign a probability of around 70% to getting heads on the next toss.

This shows the effect that you were describing; both events had a prior probability of 50%, but the probability changes by different amounts in response to the same evidence. We have a lot of knowledge about the first coin, and compared to this knowledge the new evidence is insignificant. We know much less about the second coin, and so the new evidence moves our probability much further.

Mathematically, we model each coin as having a fixed but unknown frequency with which it comes up heads. This is a number 0 ≤ f ≤ 1. If we knew f then we would assign a probability of f to any coin-flip except those about which we have direct evidence (i.e. those in our causal past). Since we don't know f we describe our knowledge about it by a probability distribution P(f). The probability of the next coin-flip coming up heads is then the expected value of f, the integral of P(f)f.

Then in the above example our knowledge about the first coin would be described by a function P(f) with a sharp peak around 1/2 and almost zero probability everywhere else. Our knowledge of the second coin would be described by a much broader distribution. When we find out that the coin has come up heads 10 times before, our probability distribution updates according to Bayes' rule. It changes from P(f) to P(f)f^10 (or rather the normalisation of P(f)f^10). This doesn't affect the sharply peaked distribution very much because the function f^10 is approximately constant over the sharp peak. But it pushes the broad distribution strongly towards 1 because 1^10 is 1024 times larger than (1/2)^10 and P(f) isn't 1024 times taller near 1/2 than near 1.
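The update from P(f) to P(f)f^10 can be checked numerically. The following is a rough sketch on a grid; the two prior shapes are assumptions chosen for illustration (a tight peak at 1/2 and a flat prior), so the resulting numbers need not match the 51% and 70% used in the story above.

```python
def prob_next_heads(prior, heads, grid_size=20001):
    """Probability of heads on the next toss after seeing `heads` heads in a row.

    `prior` is an unnormalised density P(f) on [0, 1]. The posterior is
    proportional to P(f) * f**heads, evaluated here on a midpoint grid, and
    the answer is the posterior expected value of f.
    """
    fs = [(i + 0.5) / grid_size for i in range(grid_size)]
    weights = [prior(f) * f**heads for f in fs]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, fs)) / total


def sharp_prior(f):
    # Assumed shape for the inspected coin: tightly peaked around 1/2.
    return (f * (1 - f)) ** 200


def broad_prior(f):
    # Assumed shape for the suspicious coin: flat, no idea what the bias is.
    return 1.0


for name, prior in [("first coin", sharp_prior), ("second coin", broad_prior)]:
    print(name, prob_next_heads(prior, 0), prob_next_heads(prior, 10))
```

Both coins start at 50%, but after ten heads the sharply peaked prior barely moves while the flat one moves a long way. A prior for the second coin that mixes "probably fair" with "possibly two-headed" would land somewhere in between.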

So this is a nice case where it is possible to compare between two cases how much a given piece of evidence moves our probability estimate. However I'm not sure whether this can be extended to the general case. A proposition like "Trump gets reelected" can't be thought of as being like a flip of a coin with a particular frequency. Not only are there no "previous flips" we can learn about, it's not clear what another flip would even look like. The election that Trump won doesn't count, because we had totally different knowledge about that one.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-04T19:28:29.654Z · LW · GW

I see, thanks. I had been looking at the page https://www.lesswrong.com/daily, linked to from the sidebar under the same phrase "All Posts".

**oscar_cunningham**on Open Thread August 2018 · 2018-08-04T11:59:37.965Z · LW · GW

I don't see it there. Have you done the update yet?

**oscar_cunningham**on Open Thread August 2018 · 2018-08-03T20:59:30.993Z · LW · GW

What does "stickied" do?

**oscar_cunningham**on What are your plans for the evening of the apocalypse? · 2018-08-02T09:42:00.526Z · LW · GW

The financial effects would be immediate and extreme. All sorts of mad things would happen to stock prices, inflation, interest rates, etc. The people who quit their jobs to live off their savings might well find that their savings don't stretch as far as they thought, which is probably a good thing since the whole system would collapse much faster than five years if a significant proportion of people were to quit their jobs.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-01T20:20:49.510Z · LW · GW

Okay, great.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-01T11:45:15.744Z · LW · GW

Is it possible to subscribe to a post so you get notifications when new comments are posted? I notice that individual *comments* have subscribe buttons.

**oscar_cunningham**on Open Thread August 2018 · 2018-08-01T11:16:46.942Z · LW · GW

Old LW had a link to the open thread in the sidebar. Would it be good to have that here so that comments later in the month still get some attention?

**oscar_cunningham**on Applying Bayes to an incompletely specified sample space · 2018-07-30T22:01:44.566Z · LW · GW

I've always thought that chapter was a weak point in the book. Jaynes doesn't treat probabilities of probabilities in quite the right way (for one thing they're really probabilities of frequencies). So take it with a grain of salt.

**oscar_cunningham**on Bayesianism (Subjective or Objective) · 2018-07-30T14:42:59.792Z · LW · GW

I'm not quite sure what you mean here, but I don't think the idea of calibration is directly related to the subjective/objective dichotomy. Both subjective and objective Bayesians could desire to be well calibrated.

**oscar_cunningham**on Bayesianism (Subjective or Objective) · 2018-07-30T12:48:53.195Z · LW · GW

Also, here's Eliezer on the subject: Probability is Subjectively Objective

Under his definitions he's subjective. But he would definitely say that agents with the same state of knowledge must assign the same probabilities, which rules him out of the very subjective camp.

**oscar_cunningham**on Bayesianism (Subjective or Objective) · 2018-07-30T12:38:21.300Z · LW · GW

I think everyone agrees on the directions "more subjective" and "more objective", but they use the words "subjective"/"objective" to mean "more subjective/objective than me".

A very subjective position would be to believe that there are no "right" prior probabilities, and that it's okay to just pick any prior depending on personal choice. (i.e. Agents with the same knowledge can assign different probabilities)

A very objective position would be to believe that there are some probabilities that must be the same even for agents with different knowledge. For example they might say that you must assign probability 1/2 to a fair coin coming up heads, no matter what your state of knowledge is. (i.e. Agents with different knowledge must (sometimes) assign the same probabilities)

Jaynes and Yudkowsky are somewhere in between these two positions (i.e. agents with the same knowledge must assign the same probabilities, but the probability of any event can vary depending on your knowledge of it), so they get called "objective" by the maximally subjective folk, and "subjective" by the maximally objective folk.

The definitions in the SEP above would definitely put Jaynes and Yudkowsky in the objective camp, but there's a lot of room on the scale past the SEP definition of "objective".

**oscar_cunningham**on Bayesianism (Subjective or Objective) · 2018-07-29T14:41:41.883Z · LW · GW

The SEP is quite good on this subject:

Subjective and Objective Bayesianism. Are there constraints on prior probabilities other than the probability laws? Consider a situation in which you are to draw a ball from an urn filled with red and black balls. Suppose you have no other information about the urn. What is the prior probability (before drawing a ball) that, given that a ball is drawn from the urn, that the drawn ball will be black? The question divides Bayesians into two camps:

(a) Subjective Bayesians emphasize the relative lack of rational constraints on prior probabilities. In the urn example, they would allow that any prior probability between 0 and 1 might be rational (though some Subjective Bayesians (e.g., Jeffrey) would rule out the two extreme values, 0 and 1). The most extreme Subjective Bayesians (e.g., de Finetti) hold that the only rational constraint on prior probabilities is probabilistic coherence. Others (e.g., Jeffrey) classify themselves as subjectivists even though they allow for some relatively small number of additional rational constraints on prior probabilities. Since subjectivists can disagree about particular constraints, what unites them is that their constraints rule out very little. For Subjective Bayesians, our actual prior probability assignments are largely the result of non-rational factors—for example, our own unconstrained, free choice or evolution or socialization.

(b) Objective Bayesians (e.g., Jaynes and Rosenkrantz) emphasize the extent to which prior probabilities are rationally constrained. In the above example, they would hold that rationality requires assigning a prior probability of 1/2 to drawing a black ball from the urn. They would argue that any other probability would fail the following test: Since you have no information at all about which balls are red and which balls are black, you must choose prior probabilities that are invariant with a change in label (“red” or “black”). But the only prior probability assignment that is invariant in this way is the assignment of prior probability of 1/2 to each of the two possibilities (i.e., that the ball drawn is black or that it is red).

In the limit, an Objective Bayesian would hold that rational constraints uniquely determine prior probabilities in every circumstance. This would make the prior probabilities logical probabilities determinable purely a priori.

Under these definitions, Eliezer and LW in general fall under the Objective category. We tend to believe that two agents with the same knowledge should assign the same probability.

**oscar_cunningham**on Open Thread July 2018 · 2018-07-17T15:54:16.771Z · LW · GW

Sure, the inductor doesn't know which systems are consistent, but nevertheless it eventually starts believing the proofs given by any system which is consistent.

**oscar_cunningham**on Open Thread July 2018 · 2018-07-11T12:43:15.305Z · LW · GW

Is there a preferred way to flag spam posts like this one: https://www.lesswrong.com/posts/g7LgqmEhaoZnzggzJ/teaching-is-everything-and-more ?

**oscar_cunningham**on Open Thread July 2018 · 2018-07-10T21:41:21.485Z · LW · GW

Could logical inductors be used as a partial solution to Hilbert's Second Problem (of putting mathematics on a sure footing)? Thanks to Gödel we know that there are lots of things that any given theory can't prove. But by running a logical inductor we could at least say that these things are true with some probability. Of course a result proved in the "Logical Induction" paper is that the probability of an undecidable statement tends to a value that is neither 0 nor 1, so we can't use this approach to justify belief in a stronger theory. But I noticed a weaker result that does hold. There's a certain class of statements such that (assuming ZF is consistent) an inductor over PA will think that they're very likely as soon as it finds a proof for them in ZF.

This class of statements is those with only bounded quantifiers; those where every "∀" and "∃" is restricted to a predefined range. This class of statements is decidable, meaning that there's a Turing machine that will take a bounded sentence and will always halt and tell you whether or not it holds in *ℕ*. Because of this every bounded sentence has a proof (or a proof of its negation) in both PA and ZF (and PA and ZF agree which it is).
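To make the decidability claim concrete, here is a minimal sketch of evaluating bounded sentences by brute force. The helper names `for_all` and `exists` are hypothetical, and the example sentences are my own. This only illustrates why such sentences can always be checked in finite time; it says nothing about proof lengths.

```python
def for_all(bound, pred):
    """Bounded universal quantifier: pred(x) holds for every x in 0..bound-1."""
    return all(pred(x) for x in range(bound))


def exists(bound, pred):
    """Bounded existential quantifier: pred(x) holds for some x in 0..bound-1."""
    return any(pred(x) for x in range(bound))


# "For all x < 50 and all y < 50, x + y = y + x"
commutes = for_all(50, lambda x: for_all(50, lambda y: x + y == y + x))

# "For all x < 100 there exists y < 100 with x = 2y or x = 2y + 1"
even_or_odd = for_all(
    100, lambda x: exists(100, lambda y: x == 2 * y or x == 2 * y + 1)
)

# "There exists x < 10 with x * x = 2" (false, and the search still halts)
root_two = exists(10, lambda x: x * x == 2)

print(commutes, even_or_odd, root_two)
```

Every quantifier ranges over a finite set, so the nested loops always terminate with a definite answer, true or false.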

But the proofs of a bounded sentence in PA and ZF can have very different lengths. Consider the self-referential bounded sentence "PA cannot prove this sentence in fewer than 1000000 symbols". This must have a proof in PA, since we can just check all candidate proofs with fewer than 1000000 symbols by brute force, but its proof must be longer than 1000000 symbols, or else we would get a contradiction. But the preceding sentences constitute a proof in ZF with far fewer than 1000000 symbols. So the sentence is provable in both PA and ZF, but the ZF proof is much shorter.

It might seem like the bounded sentences can't express many interesting concepts. But in fact I'd contend that they can express most (if not all) things that you might actually need to know. For example, it seems like the fact "For all x and y, x + y = y + x" is a useful unbounded sentence. But whenever you face a situation where you would want to use it there are always some particular x and y that apply in that situation. So then we can use the bounded sentence "x + y = y + x" instead, where x and y stand for whichever values actually occurred.

Now I'll show that logical inductors over PA eventually trust proofs in ZF of bounded sentences (assuming ZF is consistent). Consider the Turing machine that takes as input a number n and searches through all strings in length order, keeping track of any that are a ZF proof of a bounded sentence. When it's been searching for n timesteps it stops and outputs whichever bounded sentence it found a proof for last. Call this sentence ϕ_n. Now let P be a logical inductor over PA. Assuming that ZF is consistent, the sentences ϕ_n are all theorems of PA, and by construction there's a polynomial time Turing machine that outputs them. So by a theorem in the logical inductor paper, we have that P_n(ϕ_n) tends to 1 as n goes to infinity, meaning that for large n the logical inductor becomes confident in ϕ_n sometime around day n. If a bounded statement ϕ has a ZF proof in m symbols then it's equal to ϕ_n for n ~ exp(m). So P begins to think that ϕ is very likely from day exp(m) onward.

Assuming that the logical inductor is working with a deductive process that searches through PA proofs in length order, this can occur long before the deductive process actually proves that ϕ is true. The exponential doesn't really make a difference here, since we don't know exactly how fast the deductive process is working. But it hardly matters, because the length of ZF proofs can be arbitrarily better than those in PA. For example the shortest proof of the sentence "PA cannot prove this sentence in fewer than exp(exp(exp(n))) symbols" in PA is longer than exp(exp(exp(n))) symbols, whereas the length of the shortest proof in ZF is about log(n).

So in generality what we have proved is that weak systems will accept as very good evidence proofs given in stronger theories, so long as the target of the proof is a bounded sentence, and so long as the stronger theory is in fact consistent. This is an interesting partial answer to Hilbert's question, since it explains why we would care about proofs in ZF, even if we only believe in PA.

**oscar_cunningham**on What could be done with RNA and DNA sequencing that's 1000x cheaper than it's now? · 2018-06-26T19:17:14.270Z · LW · GW

If we can do testing quickly then we could use it for security. Perhaps (further into the future) your phone will test your DNA when you try to use it?

**oscar_cunningham**on UDT can learn anthropic probabilities · 2018-06-25T19:39:02.806Z · LW · GW

Can I actually do this experiment, and thereby empirically determine (for myself but nobody else) which of SIA and SSA is true?

**oscar_cunningham**on Set Up for Success: Insights from 'Naïve Set Theory' · 2018-02-28T09:00:45.042Z · LW · GW

> This was valuable feedback for calibration, and I intend to continue this practice. I'm still worried that down the line and in the absence of teachers, I may believe that I've learnt the research guide with the necessary rigor, go to a MIRIx workshop, and realize I hadn't been holding myself to a sufficiently high standard. Suggestions for ameliorating this would be welcome.

I think if you read more textbooks you'll naturally get used to the correct level of rigour.