## Posts

## Comments

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-06-02T07:50:18.088Z · LW · GW

The link still works for me. Perhaps you must first become a member of that discord? Invite link: https://discord.gg/nZ9JV5Be (valid for 7 days)

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-14T20:55:56.026Z · LW · GW

The weird thing is that there are two metrics involved: information can propagate through a nonempty universe at 1 cell per generation in the sense of the l_infinity metric, but it can only propagate into empty space at 1/2 a cell per generation in the sense of the l_1 metric.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-14T20:50:42.473Z · LW · GW

You're probably right, but I can think of the following points.

Its rule is more complicated than Life's, so it's worse as an example of emergent complexity from simple rules (which was Conway's original motivation).

It's also a harder location to demonstrate self replication. Any self replicator in Critters would have to be fed with some food source.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-14T15:54:52.893Z · LW · GW

Yeah, although probably you'd want to include a 'buffer' at the edge of the region to protect the entity from gliders thrown out from the surroundings. A 1,000,000 cell thick border filled randomly with blocks at 0.1% density would do the job.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-14T12:32:38.359Z · LW · GW

This is very much a heuristic, but good enough in this case.

Suppose we want to know how many times we expect to see a pattern with n cells in a random field of area A. Ignoring edge effects, there are A different offsets at which the pattern could appear. Each of these has a 1/2^n chance of being the pattern. So we expect at least one copy of the pattern if n < log_2(A).

In this case the area is (10^60)^2, so we expect patterns of size up to 398.631. In other words, we expect the ash to contain any pattern you can fit in a 20 by 20 box.
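
The arithmetic behind this heuristic can be sketched in a few lines of Python. The function name is made up for illustration, and the calculation assumes what the heuristic itself assumes (independent 50% cells, edge effects ignored).

```python
import math

def max_expected_pattern_cells(area):
    """Largest n with A / 2^n >= 1, i.e. n = log2(A).

    Assumes each cell is independently alive with probability 1/2
    and ignores edge effects, as in the heuristic above.
    """
    return math.log2(area)

side = 10**60                      # side length of the random region
n_max = max_expected_pattern_cells(side**2)
box_cells = 20 * 20                # cells in a 20 by 20 box, for comparison

print(round(n_max, 3), box_cells)
```

The 20 by 20 box has 400 cells, which is just above the threshold, so "any pattern you can fit in a 20 by 20 box" is a rounding of the same estimate.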

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-14T08:07:14.620Z · LW · GW

The glider moves at c/4 *diagonally*, while the c/2 ships move horizontally. A c/2 ship moving right and then down will reach its destination at the same time the c/4 glider does. In fact, gliders travel at the empty space speed limit.
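
As a quick check of the c/4 diagonal claim, here is a minimal sketch of a Life step on a set of live cells. The coordinate convention (x to the right, y downwards) and the particular glider phase are my own choices for illustration.

```python
from collections import Counter

def step(cells):
    """Advance a set of live (x, y) cells by one generation of Life (B3/S23)."""
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next generation with exactly 3 live neighbours,
    # or with 2 live neighbours if it is already alive.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

# One phase of the glider:
#   .O.
#   ..O
#   OOO
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

after4 = glider
for _ in range(4):
    after4 = step(after4)

# After 4 generations the same shape reappears, shifted one cell diagonally.
shifted = {(x + 1, y + 1) for (x, y) in glider}
print(after4 == shifted)
```

One diagonal cell in 4 generations is a distance of 2 in the l_1 metric, i.e. 1/2 a cell per generation, the same l_1 speed as a c/2 orthogonal ship.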

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-13T13:16:28.646Z · LW · GW

Most glider guns in random ash will immediately be destroyed by the chaos they cause. Those that don't will eventually reach an eater which will neutralise them. But yes, such things could pose a nasty surprise for any AI trying to clean up the ash. When it removes the eater it will suddenly have a glider stream coming towards it! But this doesn't prove it's impossible to clear up the ash.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-13T08:04:32.634Z · LW · GW

Making a 'ship sensor' is tricky. If it collides with something unexpected it will create more chaos that you'll have to clear up.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-13T08:01:51.957Z · LW · GW

This sounds like you're treating the area as empty space, whereas the OP specifies that it's filled randomly outside the area where our AI starts.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-13T08:00:42.223Z · LW · GW

My understanding was that we just want to succeed with high probability. The vast majority of configurations will not contain enemy AIs.

**Oscar_Cunningham**on Agency in Conway’s Game of Life · 2021-05-13T07:58:11.100Z · LW · GW

See here https://conwaylife.com/forums/viewtopic.php?f=7&t=1234&sid=90a05fcce0f1573af805ab90e7aebdf1 and here https://discord.com/channels/357922255553953794/370570978188591105/834767056883941406 for discussion of this topic by Life hobbyists who have a good knowledge of what's possible and not in Life.

What we agree on is that the large random region will quickly settle down into a field of 'ash': small stable or oscillating patterns arranged at random. We wouldn't expect any competitor AIs to form in this region since an area of 10^120 will only be likely to contain arbitrary patterns of sizes up to log_2(10^120) ≈ 399 cells, which almost certainly isn't enough area to do anything smart.

So the question is whether our AI will be able to cut into this ash and clear it up, leaving a blank canvas for it to create the target pattern. Nobody knows a way to do this, but it's also not known to be impossible.

Recently I tried an experiment where I slowly fired gliders at a field of ash, along twenty adjacent lanes. My hope had been that each collision of a glider with the ash would on average destroy more ash than it created, thus carving a diagonal path of width 20 into the ash. Instead I found that the collisions created more ash, and so a stalagmite of ash grew towards the source at which I was creating the gliders.

**Oscar_Cunningham**on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-23T14:04:52.911Z · LW · GW

Another good one is the spell 'Assume for contradiction!', which when you are trying to prove p gives you the lemma ¬p.

**Oscar_Cunningham**on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T12:15:33.192Z · LW · GW

The rule in modal logic is that we can get ⊢□p from ⊢p, not that we can get □p from p.

True:

If PA proves p, then PA proves that PA proves p.

False:

If p, then PA proves p.

EDIT: Maybe it would clarify to say that '⊢p' and '□p' both mean 'PA (or whichever theory) can prove p', but '⊢' is used when talking *about* PA, whereas '□' is used when talking *within* PA.

**Oscar_Cunningham**on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T11:08:34.645Z · LW · GW

From our vantage point of ZFC, we can see that PA is in fact consistent. But we know that PA can't prove its own consistency or inconsistency. So the classic example of a system which is consistent but unsound is PA + ¬Con(PA). This system is consistent since deriving a contradiction in it would amount to a proof by contradiction of consistency in PA, which we know is impossible. But it's unsound since it falsely believes that PA is not consistent.

Your proof of 'consistency → soundness' goes wrong in the following way:

> Suppose no soundness: ¬(□p→p); then □p∧¬p.

This is correct. But to be more clear, a theory being unsound would mean that there was some p for which the sentence '□p∧¬p' was **true**, not that there was some p for which the sentence '□p∧¬p' was **provable in that theory**. So then in the next line

> From ¬p, by necessitation □¬p

we can't apply necessitation, because we don't know that our theory proves ¬p, only that p is false.

**Oscar_Cunningham**on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T10:51:02.818Z · LW · GW

You can't write down '∀p: Provable(p)→p' in PA, because in order to quantify over sentences we have to encode them as numbers (Gödel numbering).

We do have a formula Provable, such that when you substitute in the Gödel number p you get a sentence Provable(p) which is true if and only if the sentence p represents is provable. But we don't have a formula True, such that True(p) is true if and only if the sentence p represents is true. So the unadorned p in '∀p: Provable(p)→p' isn't possible. No such formula is possible since otherwise you could use diagonalization to construct the Liar Paradox: p⟷¬True(p) (Tarski's undefinability theorem).

What we can do is write down the sentence 'Provable(p)→p' for any particular Gödel number p. This is possible because when p is fixed we don't need True(p), we can just directly use the sentence p represents. I think of this as a restricted version of soundness: 'Soundness at p'. Then Löb's theorem tells us precisely which p PA is sound at. It's precisely the p which PA can prove.

**Oscar_Cunningham**on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-21T20:34:23.604Z · LW · GW

I think you've got one thing wrong. The statement isn't *consistency*, it's a version of *soundness*. Consistency says that you can't prove a contradiction, in symbols simply ¬□⊥. Whereas soundness is the stronger property that the things you prove are actually true, in symbols ∀s: □s→s. Of course first order logic can't quantify over sentences, so you can't even ask the question of whether PA can prove itself sound. But what you can ask is whether PA can prove itself sound for particular statements, i.e. whether it can prove □s→s for some s.

What Löb's theorem says is that it can only do this for a trivial class of s, the ones that PA can prove outright. Obviously if PA can prove s then it can prove □s→s (or indeed q→s for any q). Löb's theorem tells you that these obvious cases are the *only* ones for which you can prove PA sound.

**Oscar_Cunningham**on The average North Korean mathematician · 2021-03-10T16:35:55.271Z · LW · GW

Tails can alternate between fat and thin as you go further out. If heights were normally distributed with the same mean and variance then there would be fewer people above 7ft than there are now, but the tallest man would be taller than the tallest man now.

**Oscar_Cunningham**on The average North Korean mathematician · 2021-03-08T06:45:34.458Z · LW · GW

North Korea were caught cheating in 1991 and given a 15 year ban until 2007. They were also disqualified from the 2010 IMO because of weaker evidence of cheating. Given this, an alternative hypothesis is that they have also been cheating in other years and weren't caught. The adult team leaders at the IMO do know the problems in advance, so cheating is not too hard.

**Oscar_Cunningham**on Kelly *is* (just) about logarithmic utility · 2021-03-04T21:37:12.865Z · LW · GW

One other argument I've seen for Kelly is that it's optimal if you start with $a and you want to get to $b as quickly as possible, in the limit of b >> a. (And your utility function is linear in time, i.e. -t.)

You can see why this would lead to Kelly. All good strategies in this game will have somewhat exponential growth of money, so the time taken will be proportional to the logarithm of b/a.
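
To spell that step out, here is a heuristic derivation. It assumes a fixed strategy whose per-bet wealth multipliers R_1, R_2, ... are i.i.d., and it uses the law of large numbers informally; the symbols W_T and R_i are introduced only for this sketch.

```latex
W_T = a \prod_{i=1}^{T} R_i
\quad\Longrightarrow\quad
\log W_T = \log a + \sum_{i=1}^{T} \log R_i \approx \log a + T\,\mathbb{E}[\log R]

W_T = b
\quad\Longrightarrow\quad
T \approx \frac{\log(b/a)}{\mathbb{E}[\log R]}
```

Under these assumptions, minimising the time T for large b/a amounts to maximising E[log R], which is the quantity the Kelly criterion maximises.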

So this is a way in which a logarithmic utility might arise as an instrumental value while optimising for some other goal, albeit not a particularly realistic one.

**Oscar_Cunningham**on Kelly isn't (just) about logarithmic utility · 2021-02-23T13:29:20.879Z · LW · GW

Why teach about these concepts in terms of the Kelly criterion, if the Kelly criterion isn't optimal? You could just teach about repeated bets directly.

**Oscar_Cunningham**on Are index funds still a good investment? · 2020-12-03T09:14:41.262Z · LW · GW

Passive investors own the same proportion of each stock (to a first approximation). Therefore the remaining stocks, which are held by active investors, also consist of the same proportion of every stock. So if stocks go down then this will reduce the value of the stocks held by the average active investor by the same amount as those of passive investors.

If you think stocks will go down across the market then the only way to avoid your investments going down is to not own stocks.

**Oscar_Cunningham**on Scoring 2020 U.S. Presidential Election Predictions · 2020-11-11T10:47:27.146Z · LW · GW

I just did the calculations. Using the interactive forecast from 538 gives them a score of -9.027; using the electoral_college_simulations.csv data from The Economist gives them a score of -7.841. So The Economist still wins!

**Oscar_Cunningham**on Did anybody calculate the Briers score for per-state election forecasts? · 2020-11-11T09:26:21.534Z · LW · GW

**Oscar_Cunningham**on Scoring 2020 U.S. Presidential Election Predictions · 2020-11-11T09:25:52.540Z · LW · GW

Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations.

I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently you could add the scores for each state, but for each use the probability conditional on the states you've already scored. For 538 these probabilities are available via their interactive forecast.

Otherwise you're counting the correlated part of the outcomes multiple times. So it's not surprising that The Economist does best overall, because they had the highest probability for a Biden win and that did in fact occur.

My suggested method has the nice property that if you score two perfectly correlated events then the second one always gives exactly 0 points.
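
A minimal sketch of this method, assuming the score is the logarithmic score (natural log of the probability given to what happened). The function name and the joint forecasts below are invented for illustration and are not taken from either model.

```python
import math

def sequential_log_scores(joint, outcome):
    """Score each event using its probability conditional on the events already scored.

    `joint` maps full outcome tuples to probabilities; `outcome` is what occurred.
    """
    scores = []
    for i in range(len(outcome)):
        # Probability of everything scored so far (equals 1 for the first event).
        p_prefix = sum(p for o, p in joint.items() if o[:i] == outcome[:i])
        # Probability of everything scored so far plus the next event.
        p_next = sum(p for o, p in joint.items() if o[:i + 1] == outcome[:i + 1])
        scores.append(math.log(p_next / p_prefix))
    return scores

# Hypothetical forecast over two correlated states.
joint = {("D", "D"): 0.5, ("D", "R"): 0.1, ("R", "D"): 0.1, ("R", "R"): 0.3}
scores = sequential_log_scores(joint, ("D", "D"))

# Hypothetical forecast where the two states are perfectly correlated.
correlated = {("D", "D"): 0.7, ("R", "R"): 0.3}
correlated_scores = sequential_log_scores(correlated, ("D", "D"))

print(sum(scores), math.log(joint[("D", "D")]))
print(correlated_scores)
```

By the chain rule the per-state scores sum to the log of the single probability given to the exact outcome, and in the perfectly correlated case the second state contributes log(1) = 0.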

**Oscar_Cunningham**on Did anybody calculate the Briers score for per-state election forecasts? · 2020-11-11T07:19:45.261Z · LW · GW

Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations.

I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently you could add the scores for each state, but for each use the probabilities conditional on the states you've already scored. For 538 these probabilities are available via their interactive forecast.

Otherwise you're counting the correlated part of the outcomes multiple times. So it's not surprising that The Economist does best overall, because they had the highest probability for a Biden win and that did in fact occur.

EDIT: My suggested method has the nice property that if you score two perfectly correlated events then the second one always gives exactly 0 points.

**Oscar_Cunningham**on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-19T06:19:16.587Z · LW · GW

Maybe there's just some new information in Trump's favour that you don't know about yet?

**Oscar_Cunningham**on Classifying games like the Prisoner's Dilemma · 2020-07-04T19:17:59.974Z · LW · GW

I've been wanting to do something like this for a while, so it's good to see it properly worked out here.

If you wanted to expand this you could look at games which weren't symmetrical in the players. So you'd have eight variables, W, X, Y and Z, and w, x, y and z. But you'd only have to look at the possible orderings within each set of four, since it's not necessarily valid to compare utilities between people. You'd also be able to reduce the number of games by using the swap-the-players symmetry.

**Oscar_Cunningham**on Wolf's Dice · 2019-07-17T13:06:40.118Z · LW · GW

Right. But also we would want to use a prior that favoured biases which were near fair, since we know that Wolf at least thought they were a normal pair of dice.

**Oscar_Cunningham**on Open Thread April 2019 · 2019-04-04T07:04:15.002Z · LW · GW

Suppose I'm trying to infer probabilities about some set of events by looking at betting markets. My idea was to visualise the possible probability assignments as a high-dimensional space, and then for each bet being offered remove the part of that space for which the bet has positive expected value. The region remaining after doing this for all bets on offer should contain the probability assignment representing the "market's beliefs".

My question is about the situation where there is no remaining region. In this situation for every probability assignment there's some bet with a positive expectation. Is it a theorem that there is always an arbitrage in this case? In other words, can one switch the quantifiers from "for all probability assignments there exists a positive expectation bet" to "there exists a bet such that for all probability assignments the bet has positive expectation"?

**Oscar_Cunningham**on The Kelly Criterion · 2018-10-17T00:38:14.810Z · LW · GW

I believe you missed one of the rules of Gurkenglas' game, which was that there are at most 100 rounds. (Although it's possible I misunderstood what they were trying to say.)

If you assume that play continues until one of the players is bankrupt then in fact there are lots of winning strategies. In particular betting any constant proportion less than 38.9%. The Kelly criterion isn't unique among them.
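
The 38.9% figure can be reproduced with a short sketch, assuming an even-money bet that is won with probability 0.6. Those odds are my assumption (they are consistent with the figures in this thread but not stated in this comment), and the function names are made up.

```python
import math

def growth_rate(f, p=0.6):
    """Expected log growth per bet when staking a fraction f on an even-money bet."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def break_even_fraction(p=0.6, tol=1e-12):
    """Bisect for the fraction above the Kelly bet at which growth falls to zero."""
    lo, hi = 2 * p - 1, 1 - 1e-9   # growth is positive at Kelly, negative near 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if growth_rate(mid, p) > 0:
            lo = mid
        else:
            hi = mid
    return lo

kelly = 2 * 0.6 - 1
limit = break_even_fraction()
print(kelly, limit)
```

Any constant fraction between 0 and this break-even point has positive expected log growth, with the Kelly fraction giving the fastest growth among them.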

My program doesn't assume anything about the strategy. It just works backwards from the last round and calculates the optimal bet and expected value for each possible amount of money you could have, on the basis of the expected values in the next round which it has already calculated. (Assuming each bet is a whole number of cents.)
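
This is not the original program, only a sketch of the same backward-induction idea. It assumes an even-money bet, utility linear in final money, money counted in whole units, and a hypothetical payout cap that keeps the state space finite; `solve`, `p_win` and `cap` are names invented here.

```python
from fractions import Fraction
from functools import lru_cache

def solve(start, rounds, p_win, cap):
    """Return (expected final money, optimal first bet) by working backwards.

    Money is an integer number of units; `cap` is an assumed ceiling on money.
    """
    @lru_cache(maxsize=None)
    def value(rounds_left, money):
        money = min(money, cap)
        # Nothing left to decide: out of rounds, broke, or already at the cap.
        if rounds_left == 0 or money == 0 or money == cap:
            return Fraction(money), 0
        best_ev, best_bet = Fraction(-1), 0
        for bet in range(money + 1):
            win_ev, _ = value(rounds_left - 1, money + bet)
            lose_ev, _ = value(rounds_left - 1, money - bet)
            ev = p_win * win_ev + (1 - p_win) * lose_ev
            if ev > best_ev:          # ties keep the smaller bet
                best_ev, best_bet = ev, bet
        return best_ev, best_bet

    return value(rounds, start)

# Tiny example: 5 units, one round left, 60% win chance, cap of 10 units.
ev, bet = solve(5, 1, Fraction(3, 5), 10)
print(ev, bet)
```

Using `Fraction` keeps the expected values exact, which matters when several bets have nearly identical expected value.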

**Oscar_Cunningham**on The Kelly Criterion · 2018-10-16T18:39:28.929Z · LW · GW

> If you wager one buck at a time, you win almost certainly.

But that isn't the Kelly criterion! Kelly would say I should open by betting *two* bucks.

In games of that form, it seems like you should be more and more careful as the number of bets gets larger. The optimal strategy doesn't tend to Kelly in the limit.

EDIT: In fact my best opening bet is $0.64, leading to expected winnings of $19.561.

EDIT2: I reran my program with higher precision, and got the answer $0.58 instead. This concerned me so I reran again with infinite precision (rational numbers) and got that the best bet is $0.21. The expected utilities were very similar in each case, which explains the precision problems.

EDIT3: If you always use Kelly, the expected utility is only $18.866.

**Oscar_Cunningham**on The Kelly Criterion · 2018-10-16T16:33:39.971Z · LW · GW

Can you give a concrete example of such a game?

**Oscar_Cunningham**on The Kelly Criterion · 2018-10-16T14:06:19.033Z · LW · GW

> even if your utility outside of the game is linear, inside of the game it is not.

Are there any games where it's a wise idea to use the Kelly criterion even though your utility outside the game is linear?

**Oscar_Cunningham**on The Kelly Criterion · 2018-10-16T13:34:57.081Z · LW · GW

> Marginal utility is decreasing, but in practice falls off far less than geometrically.

I think this is only true if you're planning to give the money to charity or something. If you're just spending the money on yourself then I think marginal utility is literally zero after a certain point.

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-26T12:34:52.460Z · LW · GW

Yeah, I think that's probably right.

I thought of that before but I was a bit worried about it because Löb's Theorem says that a theory can never prove this axiom schema about itself. But I think we're safe here because we're assuming "If T proves φ, then φ" while not actually working in T.

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-26T11:00:25.121Z · LW · GW

I'm arguing that, for a theory T and Turing machine P, "T is consistent" and "T proves that P halts" aren't together enough to deduce that P halts. And as a counterexample I suggested T = PA + "PA is inconsistent" and P = "search for an inconsistency in PA". This P doesn't halt even though T is consistent and proves it halts.

So if it doesn't work for that T and P, I don't see why it would work for the original T and P.

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-25T18:07:25.927Z · LW · GW

Consistency of T isn't enough, is it? For example the theory (PA + "The program that searches for a contradiction in PA halts") is consistent, even though that program doesn't halt.

**Oscar_Cunningham**on Quantum theory cannot consistently describe the use of itself · 2018-09-25T09:53:31.606Z · LW · GW

https://www.scottaaronson.com/blog/?p=3975

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-20T20:11:55.066Z · LW · GW

This is a good point. The Wikipedia pages for other sites, like Reddit, also focus unduly on controversy.

**Oscar_Cunningham**on Zut Allais! · 2018-09-06T20:35:20.377Z · LW · GW

And the fact that situations like that occurred in humanity's evolution explains why humans have the preference for certainty that they do.

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-03T09:10:35.767Z · LW · GW

As well as ordinals and cardinals, Eliezer's construction also needs concepts from the areas of computability and formal logic. A good book to get introduced to these areas is Boolos' "Computability and Logic".

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-03T08:08:29.260Z · LW · GW

> being unable to imagine a scenario where something is possible

This isn't an accurate description of the mind projection fallacy. The mind projection fallacy happens when someone thinks that some phenomenon occurs in the real world but in fact the phenomenon is a part of the way their mind works.

But yes, it's common to almost all fallacies that they are in fact weak Bayesian evidence for whatever they were supposed to support.

**Oscar_Cunningham**on Open Thread September 2018 · 2018-09-01T17:12:14.313Z · LW · GW

Eliezer made this attempt at naming a large number computable by a small Turing machine. What I'm wondering is exactly what axioms we need to use in order to prove that this Turing machine does indeed halt. The description of the Turing machine uses a large cardinal axiom ("there exists an I0 rank-into-rank cardinal"), but I don't think that assuming this cardinal is enough to prove that the machine halts. Is it enough to assume that this axiom is consistent? Or is something stronger needed?

**Oscar_Cunningham**on You Play to Win the Game · 2018-08-31T19:10:06.529Z · LW · GW

> games are a specific case where the utility (winning) is well-defined

Lots of board games have badly specified utility functions. The one that springs to mind is Diplomacy; if a stalemate is negotiated then the remaining players "share equally in a draw". I'd take this to mean that each player gets utility 1/n (where there are n players, and 0 is a loss and 1 is a win). But it could also be argued that they each get 1/(2n), sharing a draw (1/2) between them (to get 1/n each wouldn't they have to be "sharing equally in a win"?).

Another example is Castle Panic. It's allegedly a cooperative game. The players all "win" or "lose" together. But in the case of a win one of the players is declared a "Master Slayer". It's never stated how much the players should value being the Master Slayer over a mere win.

Interesting situations occur in these games when the players have different opinions about the value of different outcomes. One player cares more about being the Master Slayer than everyone else, so everyone else lets them be the Master Slayer. They think that they're doing much better than everyone else, but everyone else is happy so long as they all keep winning.

**Oscar_Cunningham**on Open Thread August 2018 · 2018-08-16T19:31:54.613Z · LW · GW

I actually learnt quantum physics from that sequence, and I'm now a mathematician working in Quantum Computing. So it can't be too bad!

The explanation of quantum physics is the best I've seen anywhere. But this might be because it explained it in a style that was particularly suited to me. I really like the way it explains the underlying reality first and only afterwards explains how this corresponds with what we perceive. A lot of other introductions follow the historical discovery of the subject, looking at each of the famous experiments in turn, and only building up the theory in a piecemeal way. Personally I hate that approach, but I've seen other people say that those kinds of introductions were the only ones that made sense to them.

The sequence is especially good if you don't want a math-heavy explanation, since it manages to explain exactly what's going on in a technically correct way, while still not using any equations more complicated than addition and multiplication (as far as I can remember).

The second half of the sequence talks about interpretations of quantum mechanics, and advocates for the "many-worlds" interpretation over "collapse" interpretations. Personally I found it sufficient to convince me that collapse interpretations were bullshit, but it didn't quite convince me that the many-worlds interpretation is obviously true. I find it plausible that the true interpretation is some third alternative. Either way, the discussion is very interesting and worth reading.

As far as "holding up" goes, I once read through the sequence looking for technical errors and only found one. Eliezer says that the wavefunction can't become more concentrated because of Liouville's theorem. This is completely wrong (QM is time-reversible, so if the wavefunction can become more spread out it must also be able to become more concentrated). But I'm inclined to be forgiving to Eliezer on this point because he's making exactly the mistake that he repeatedly warns us about! He's confusing the distribution described by the wavefunction (the uncertainty that we *would* have if we performed a measurement) with the uncertainty we *do* have *about* the wavefunction (which is what Liouville's theorem actually applies to).

**Oscar_Cunningham**on [deleted post] 2018-08-15T13:34:41.626Z

Really, the fact that different sizes of moral circle can incentivize coercion is just a trivial corollary of the fact that value differences in general can incentivize coercion.

**Oscar_Cunningham**on [deleted post] 2018-08-15T07:04:31.147Z

> When people have a wide circle of concern and advocate for its widening as a norm, this makes me nervous because it implies huge additional costs forced on me, through coercive means like taxation or regulations

At the moment I (and many others on LW) am experiencing the opposite. We would prefer to give money to people in Africa, but instead we are forced by taxes to give to poor people in the same country as us. Since charity to Africa is much more effective, this means that (from our point of view) 99% of the taxed money is being wasted.

**Oscar_Cunningham**on Open Thread August 2018 · 2018-08-14T13:06:38.396Z · LW · GW

Okay, sure. But an idealized rational reasoner wouldn't display this kind of uncertainty about its own beliefs, but it would still have the phenomenon you were originally asking about (where statements assigned the same probability update by different amounts after the introduction of evidence). So this kind of second-order probability can't be used to answer the question you originally asked.

**Oscar_Cunningham**on Open Thread August 2018 · 2018-08-13T11:22:04.778Z · LW · GW

> It seems like you're describing a Bayesian probability distribution over a frequentist probability estimate of the "real" probability.

Right. But I was careful to refer to f as a frequency rather than a probability, because f isn't a description of our beliefs but rather a physical property of the coin (and of the way it's being thrown).

> Agreed that this works in cases which make sense under frequentism, but in cases like "Trump gets reelected" you need some sort of distribution over a Bayesian credence, and I don't see any natural way to generalise to that.

I agree. But it seems to me like the other replies you've received are mistakenly treating all propositions as though they do have an f with an unknown distribution. Unnamed suggests using the beta distribution; the thing which it's the distribution of would have to be f. Similarly rossry's reply, containing phrases like "something in the ballpark of 50%" and "precisely 50%", talks as though there is some unknown percentage to which 50% is an estimate.

A lot of people (like in the paper Pattern linked to) think that our distribution over f is a "second-order" probability describing our beliefs about our beliefs. I think this is wrong. The number f doesn't describe our beliefs at all; it describes a physical property of the coin, just like mass and diameter.
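
To make the coin case concrete, here is a small sketch with a Beta(a, b) distribution over the frequency f. The two priors and the function names are illustrative assumptions; the point is only that two coins can both get probability 1/2 for heads and still update by different amounts.

```python
from fractions import Fraction

def predictive_heads(a, b):
    """Probability of heads on the next throw under a Beta(a, b) distribution over f."""
    return Fraction(a, a + b)

def update_on_heads(a, b):
    """Beta parameters after observing one head."""
    return a + 1, b

unknown_coin = (1, 1)          # uniform over f: we know nothing about the coin
trusted_coin = (1000, 1000)    # f tightly concentrated around 1/2

before = [predictive_heads(*c) for c in (unknown_coin, trusted_coin)]
after = [predictive_heads(*update_on_heads(*c)) for c in (unknown_coin, trusted_coin)]
print(before, after)
```

Both coins start at 1/2, but after one head the first moves to 2/3 while the second barely moves. The distribution being updated is over the physical frequency f, not over our own beliefs.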

In fact, any kind of second-order probability must be trivial. We have introspective access to our own beliefs. So given any statement about our beliefs we can say for certain whether or not it's true. Therefore, any second-order probability will either be equal to 0 or 1.