## Posts

## Comments

**jonathan_lee**on I Want To Live In A Baugruppe · 2017-03-18T19:29:23.680Z · score: 4 (4 votes) · LW · GW

Even if the statistics are as suggested, a highly effective strategy would seem to be ensuring that there are multiple adults around all the time. I'll accept your numbers arguendo (though I think they're relevantly wrong).

If there's a 4% chance that one adult is an abuser, there's a 1/625 chance that two independent ones are, and one might reasonably assume that the other 96% of adults are unlikely to let abuse slide if they see any evidence of it. The failure modes are then things like abusers being able to greenbeard well enough that multiple abusers identify each other and then proceed to be all the adults in a given situation. Which is pretty conjunctive as failures go, especially in a world where you insist that you know all the adults personally from before you started a baugruppe rather than letting Bob (and his 5 friends who are new to you) all join.
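The arithmetic is easy to check; a minimal sketch, using the 4% figure accepted arguendo above:

```python
# If each adult independently has a 4% chance of being an abuser,
# the chance that two independently chosen adults both are is 0.04^2.
p_single = 0.04
p_both = p_single ** 2   # ~0.0016, i.e. 1 in 625

print(p_both)
print(1 / p_both)        # ~625
```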

You also mention "selection for predators", but that seems to run against the (admittedly folk) wisdom that children at risk of abuse are those that are isolated and vulnerable. Daycare centres are not the central tendency of abuse; quiet attics are.

**jonathan_lee**on Stationary algorithmic probability · 2015-06-12T07:32:49.000Z · score: 2 (2 votes) · LW · GW

It looks like Theorem 1 can be improved slightly, by dropping the "only if" direction of the condition. We can then code up something like Kolmogorov complexity by adding a probability transition from every site to our chosen UTM.

If you only want the weaker statement that there is no stationary distribution, it looks like there's a cheaper argument: since the chain is aperiodic and irreducible, the hypothetical stationary distribution μ is unique. The state space is closed under the action of the symmetry group G, and (2) implies that for any g in G, the map x ↦ gx is an automorphism of the Markov chain. If the (infinite) transition matrix is T, then g can be considered as a permutation matrix with (abusing notation) gT = Tg. Then T(gμ) = g(Tμ) = gμ, and so gμ = μ by uniqueness. So μ is constant on orbits of G, which are all countably infinite. Hence μ is everywhere 0, a contradiction.

The above still holds if (2) is restricted to only hold for a group G such that every orbit under G is infinite.

I think the above argument shows why (2) is too strong; we shouldn't expect the world to look the same if you pick a "wrong" (ie. complicated) UTM to start off with. Weakening (2) might mean asserting only an approximate version of the symmetry. To do this, we might define the stationary measure and the transition probabilities together (ie. finding a fixed point of a map from pairs to pairs). In such a model, the measure constrains the transition probabilities and is stationary; it's not clear how one might formalise a derivation of the transition probabilities from the measure, but it seems plausible that there is a canonical way to do it.

**jonathan_lee**on The Galileo affair: who was on the side of rationality? · 2015-02-18T10:50:12.288Z · score: 3 (3 votes) · LW · GW

That sounds a rather odd argument to make, even at the time. Astronomy from antiquity was founded on accurate observations.

Astronomy and epistemology aren't quite the same. Predicting where Saturn would be on a given date requires accurate observation, and nobody objected to Copernicus as a calculational tool. For example, the Jesuits are teaching Copernicus in China, in Chinese, about 2 years after he publishes, which implies they translated and shipped it with some alacrity.

The heavens were classically held to be made of different stuff; quintessence (later called aether) was not like regular matter -- this is obvious from the inside, because it maintains perpetual motion where normal matter does not. A lot of optical phenomena (eg. twinkling stars, the surface of the moon) were not seen as properties of the objects in question but as properties of the regular 4-element matter between us and them.

By a modern standard, the physics is weird and disjointed... but that is historically how it was seen.

**jonathan_lee**on The Galileo affair: who was on the side of rationality? · 2015-02-17T02:54:47.714Z · score: 5 (5 votes) · LW · GW

The precise phrasing is deliberately a little tendentious, but the issue of the epistemological status of the telescope was raised by loads of people at the time. For a modern review with heavy footnotes, see eg. *Galileo, Courtier: The Practice of Science in the Culture of Absolutism*, pp 95-100 (though the whole chapter is good).

For example, the first anti-Galilean tract, by Horky in 1610, focussed mostly on the lack of reliability of the telescope. For another, Magini's letters (confirmed in Kepler and Galileo) write of a "star party" in 1610 where Galileo attempted to convince a number of astronomers of the discovery of the Medicean (now Galilean) moons; no one else could see the moons, and additionally the telescope produced doubled images of everything more distant than the moon.

There wasn't much dispute about terrestrial applications. Under Aristotle's physics, everything above the moon is made of different stuff with different physics anyway, so any amount of accuracy when looking at stuff of the four elements doesn't allow one to induct to accuracy in observations of the heavens.

**jonathan_lee**on The Galileo affair: who was on the side of rationality? · 2015-02-16T12:52:39.435Z · score: 41 (41 votes) · LW · GW

tl;dr: The side of rationality during Galileo's time would be to recognise one's confusion and recognise that the models did not yet cash out in terms of a difference in expected experiences. That situation arguably holds until Newton's Principia; prior to that no one has a working physics for the heavens.

The initial heliocentric models weren't more accurate by virtue of being heliocentric; they were better by virtue of having had their parameters updated with an additional 400 years of observational data over the previous best-fit model (the Alfonsine tables from the 1250s). The geometry was similarly complicated; there was still a strong claim that only circular motions could be maintained indefinitely, and so you have to toss 60 or so circular motions in to get the full solar system on either model.

Basically everyone was already using the newer tables as calculational tools, and it had been known from ancient times that you could fix any point you wanted in an epicyclic model and get the same observational results. The dispute was about which object *was in fact* fixed. Kepler dates to the same time, and will talk about ellipses (and dozens of other potential curves) in place of circular motion from 1610, but he cannot predict where a planet will be efficiently. He's also not exactly a paragon of rationality; astrology and numerology drive most of his system, and he quite literally ascribes his algebraic slips to god.

A brief but important digression into Aristotle is needed; he saw as key that the motion of the planets is unceasing but changing, whereas all terrestrial motions cease eventually. He held that circular motions were the only kind of motion that could be sustained indefinitely, and even then only by a certain special kind of perfect matter. The physics of this matter fundamentally differed from the physics of normal stuff in Aristotle. Roughly and crudely, if it can change then it has to have some kind of dissipative / frictional physics and so will run down.

Against that backdrop, Galileo's key work wasn't the Dialogue, but the Sidereus Nuncius. There had been two novae observed in the 40 years prior, and this had been awkward because a whole bunch of (mostly neo-Platonists) were arguing that this showed the heavens changed, which is a problem for Aristotle. Now Galileo shows up and, using a device which distorts his vision, claims to be able to deduce:

- That there are mountains on the moon (so that it is not a perfect sphere, contra Aristotle)
- That there are invisible objects orbiting Jupiter
- That the planets show disks
- That the Sun has spots, which move across its face and separately change with time
- That Venus has phases (which essentially require that it orbit the Sun)
- That Saturn has lumps on it (and thus is not a sphere -- he's seeing the rings)

As an observational program, this is picked up and deeply explored by loads of people (inc. Jesuits like Riccioli). But to emphasise: Galileo is using a device which distorts his vision, and which can only be tested on terrestrial objects, and claiming to use it to find out stuff about the heavens, which contemporary physics says are grossly different. Every natural philosopher who's read Aristotle recognises that this kind of procedure hasn't historically been useful.

From a viewpoint which sees a single unified material physics, these observations kill Aristotelian cosmology. You've got at least three centers of circular-ish motion, which means you can't mount the planets on transparent spheres to actually move them around. You have an indication that the Sun might be rotating, and is certainly dynamic. If you kill Aristotle's cosmology, you have to kill most of his physics, and thus a good chunk of his philosophy. That's a problem, because since Aquinas the Catholic church had been deriving theology as a natural consequence of Aristotle in order to secure themselves against various heresies. And now some engineer with pretensions is turning up, distorting his vision and claiming to upend the cart.

What Galileo does not have is a coherent alternative package of physics and cosmology. He claims to be able to show a form of circular inertia from first principles. He claims that this yields a form of relativity in motion which makes it difficult to discern your true motion without reference to the fixed stars. He claims that physics is kinda-sorta universal, based on his experience with cannon (which Aristotelian physics would dismiss because [using modern terminology] experiments where you apply forces yourself are not reproducible and so cannot yield knowledge). This means his physics has real issues explaining dissipative effects. He doesn't have action at a distance, so he can't explain why the planets do their thing (whereas there are physical models of Aristotelian / Ptolemaic models).

He gets into some pro forma trouble over the book, because he doesn't put a disclaimer on it saying that he'll retract it if it's found to be heretical. Which is silly and it gets his knuckles rapped over it. The book is "banned", which means two things, for there are two lists of banned books. One is "burn before reading" and the other is more akin to being in the Restricted Section; Galileo's work is the latter.

Then he's an ass in the Dialogue. Even that would not have been an issue, but at the time he's the court philosopher of the Grand Duke of Tuscany, Cosimo II de' Medici. This guy is a secular problem for the Pope; he has an army, he's not toeing the line, there's a worry that he'll annex the Papal states. So there's a need to pin his ears back, and Galileo is a sufficiently senior member of the court that Cosimo won't ignore his arrest nor will he go to war over it.

So the Inquisition cooks up a charge for political purposes, has him "tortured" (which is supposed to mean they /show/ him the instruments of torture, but they actually forget to), get him to recant (in particular get Cosimo to come beg for his release), and release him to "house arrest" (where he is free to come, go, see whoever, write, etc). The drama is politics, rather than anything epistemological.

As to the disputes you mention, some had been argued through by the ancient Greeks. For example, everyone knew that measurements were imprecise, and so moving the earth merely required that the stars were *distant*. It was also plain that *if* you accepted Galileo's observations as being indicative of truth, then Aristotelian gravity was totally dead, because some stuff did not strive to fall (cometary tails were also known to be... problematic).

Now, Riccioli is writing 20 years later, in an environment where heliocentrism has become a definite *thing* with political and religious connotations, associated with neo-Platonist, anti-Aristotelian, anti-Papal thinking. This is troublesome because it strikes at the foundational philosophy underpinning the Church, and secular rulers in Europe are trying to strategically leverage this. Much like Aquinas, Riccioli's bottom line is /written/ already. He has to mesh this new stack of observational data with something which looks at least somewhat like Aristotle. Descartes is contracted at about the same time to attempt to rederive Catholicism from a new mixed Aristotelian / Platonist basis.

As a corollary, he's being quite careful to list every argument which anyone has made, and every refutation (there's a comparatively short summary here). Most of the arguments presented have counterpoints from the other side, however strained they might seem from a modern view. It's more akin to having 126 phenomena which need to be explained than anything else. They don't touch on the apparently changing nature of the planets (by this point cloud bands on Jupiter could be seen) and restrict themselves mostly to the physics of motion. There's a lot of duplication of the same fundamental point, and it's not a quantitative discussion. There are some "in principle" experiments discussed, but a fair few had been considered by Galileo and calculated to be infeasible (eg. observing 1 inch deflections in cannon shot at 500 yards, when the accuracy is more like a yard).

Obviously Newton basically puts a stop to the whole thing, because (modulo a lack of mechanism) he can give you a calculational tool which spits out Kepler and naturally fixes the center of mass. There are still *huge* problems; the largest is that even point-like stars appear to have small disks from diffraction, and until you know this you end up thinking every other star has to be larger than the entire solar system. And the apparent madness of a universal law is almost impossible to overstate. It's really ahistorical to think that a very modern notion of parsimony in physics could have been applied to Galileo and his contemporaries.

**jonathan_lee**on Probability, knowledge, and meta-probability · 2013-09-15T01:18:46.772Z · score: 1 (1 votes) · LW · GW

So, my observation is that without meta-distributions (or A_p), or conditioning on a pile of past information (and thus tracking /more/ than just a probability distribution over current outcomes), you don't have the room in your knowledge to be able to even talk about sensitivity to new information coherently. Once you can talk about a complete state of knowledge, you can begin to talk about the utility of long term strategies.

For instance, one would have the same *probability* of being paid today if 20% of employers actually paid you every day, whilst 80% of employers never paid you. But in such an environment, it would not make sense to work a second day in 80% of cases. The optimal strategy depends on what you *know*, and representing that in general requires more than a straight probability.
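A sketch of the point, with the numbers from the example above; the second environment (every employer pays independently each day) is an invented contrast case showing that the same day-one probability can demand different strategies:

```python
# Environment A: 20% of employers pay every day, 80% never pay.
# Environment B: every employer pays independently with probability 0.2 each day.
p_payer = 0.2

# Day one: P(paid) is 0.2 in both environments.
p_day1_A = p_payer * 1.0 + (1 - p_payer) * 0.0
p_day1_B = 0.2
assert p_day1_A == p_day1_B == 0.2

# Suppose day one passes unpaid. In A, payers always pay, so by Bayes the
# employer must be a never-payer -- working a second day is pointless:
p_payer_given_unpaid = (p_payer * 0.0) / (p_payer * 0.0 + (1 - p_payer) * 1.0)
p_day2_A = p_payer_given_unpaid * 1.0   # 0.0 -> don't work day two

# In B, days are independent, so day two still pays with probability 0.2.
p_day2_B = 0.2

print(p_day2_A, p_day2_B)   # 0.0 vs 0.2
```

Both environments are indistinguishable if all you track is the scalar P(paid today); they diverge as soon as you condition on an observation.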

There *are* different problems coming from the distinction between choosing a long term policy to follow, and choosing a one shot action. But we can't even approach this question in general unless we can talk sensibly about a sufficient set of information to keep track of. There are two distinct problems, one prior to the other.

Jaynes does discuss a problem which is closer to your concerns (that of estimating neutron multiplication in a 1-d experiment; §18.15, p. 579). He's comparing two approaches, which for my purposes differ in their prior A_p distribution.

**jonathan_lee**on Probability, knowledge, and meta-probability · 2013-09-14T20:40:02.062Z · score: 4 (4 votes) · LW · GW

The substantive point here isn't about EU calculations per se. Running a full analysis of everything that might happen and doing an EU calculation on that basis is fine, and I don't think the OP disputes this.

The subtlety is about what numerical data can formally represent your full state of knowledge. The claim is that a mere probability of getting the $2 payout does not. It's the case that on the first use of a box, the probability of the payout given its colour is 0.45 regardless of the colour.

However, if you merely hold onto that probability, then if you put in a coin and so learn something about the boxes you can't update that probability to figure out what the probability of payout for the second attempt is. You need to go back and also remember whether the box is green or brown. The point of Jaynes and the A_p distribution is that it actually does screen off all other information. If you keep track of it you never need to worry about remembering the colour of the box, or the setup of the experiment. Just this "meta-distribution".
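A sketch of why the meta-distribution screens off the colour. The payout propensities below are illustrative assumptions, chosen only so that both colours start at the same marginal probability of 0.45: a green box is equally likely to pay with chance 0.9 or to be a dud, while a brown box always pays with chance 0.45.

```python
# Distributions over the payout propensity p (the "meta-distribution").
green_meta = {0.9: 0.5, 0.0: 0.5}   # half good machines, half duds
brown_meta = {0.45: 1.0}            # a flat 45% payer

def p_payout(meta):
    """Marginal probability of a payout given a distribution over p."""
    return sum(p * w for p, w in meta.items())

def update(meta, paid):
    """Bayes-update the distribution over p after one observed trial."""
    posterior = {p: w * (p if paid else 1 - p) for p, w in meta.items()}
    norm = sum(posterior.values())
    return {p: w / norm for p, w in posterior.items()}

# Before any trial, the single-number probability is identical:
assert p_payout(green_meta) == p_payout(brown_meta) == 0.45

# After one winning trial the predictions diverge:
print(p_payout(update(green_meta, paid=True)))   # 0.9 -- the dud is ruled out
print(p_payout(update(brown_meta, paid=True)))   # 0.45 -- nothing learned
```

Once you carry the meta-distribution, you never need to re-consult the colour or the experimental setup: updating it is all the bookkeeping required.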

**jonathan_lee**on An attempt at a short no-prerequisite test for programming inclination · 2013-06-30T09:16:30.432Z · score: 4 (4 votes) · LW · GW

Concretely, I have seen this style of test (for want of better terms, natural language code emulation) used as a screening test by firms looking to find non-CS undergraduates who would be well suited to develop code.

In as much as this test targets indirection, it is comparatively easy to write tests which target data driven flow control or understanding state machines. In such a case you read from a fixed sequence and emit a string of outputs. For a plausible improvement, get the user to log the full sequence of writes, so that you can see on which instruction things go wrong.
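As an illustration, here is a toy rule set (invented for this sketch) in the read-a-fixed-sequence, emit-outputs style, where logging every output shows exactly which instruction an emulation gets wrong:

```python
# Invented rules for a screening question: start in state 'low';
# on reading a number >= 5, toggle state; emit 'L' in 'low', 'H' in 'high'.
def emulate(sequence):
    state = "low"
    outputs = []
    for x in sequence:
        if x >= 5:
            state = "high" if state == "low" else "low"
        outputs.append("L" if state == "low" else "H")
    return "".join(outputs)

print(emulate([1, 7, 2, 9, 3]))   # LHHLL
```

A candidate hand-emulates the rules on paper; comparing their full output string against the reference pinpoints the first step where their mental state machine diverged.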

There also seem to be aspects of coding which are not simply being technically careful about the formal function of code. The most salient to me would be taking an informally specified natural language problem and reducing it to operations one can actually do. Algorithmic / architectural thinking seems at least as rare as fastidiousness about code.

**jonathan_lee**on Model Stability in Intervention Assessment · 2013-06-12T23:21:58.765Z · score: 2 (2 votes) · LW · GW

To my knowledge, it's not discussed explicitly in the wider literature. I'm not a statistician by training though, so my knowledge of the literature is not brilliant.

On the other hand, talking to working Bayesian statisticians about "what do you do if we don't know what the model should be" seems to reliably return answers of broad form "throw that uncertainty into a two-level model, run the update, and let the data tell you which model is correct". Which is the less formal version of what Jaynes is doing here.

This seems to be a reasonable discussion of the same basic material, though in a setting of finitely many models rather than the continuum of models (one per p) that Jaynes uses.

**jonathan_lee**on The Use of Many Independent Lines of Evidence: The Basel Problem · 2013-06-08T20:59:08.296Z · score: 1 (1 votes) · LW · GW

Thank you for calling out a potential failure mode. I observe that my style of inquisition can come across as argumentative, in that I do not consistently note when I have shifted my view (instead querying other points of confusion). This is unfortunate.

To make my object level opinion changes more explicit:

I have had a weak shift in opinion towards the value of attempting to quantify and utilise weak arguments in internal epistemology, after our in person conversation and the clarification of what you meant.

I have had a much lesser shift in opinion of the value of weak arguments in rhetoric, or other discourse where I cannot assume that my interlocutor is entirely rational and truth-seeking.

I have not had a substantial shift in opinion about the history of mathematics (see below).

As regards the history of mathematics, I do not *know* our relative expertise, but my background prior for most mathematicians (including JDL_{2008}) has a measure >0.99 cluster that finds true results obvious in hindsight and counterexamples to false results obviously natural. My background prior also suggests that those who have spent time thinking about mathematics *as it was done at the time* fairly reliably do not have this view. It further suggests that on this metric, I have done more thinking than the median mathematician (against a background of Cantab. mathmos, I would estimate I'm somewhere above the 5th centile of the distribution). The upshot of this is that your recent comments have not substantively changed my views about the relative merit of Cauchy and Euler's arguments at the time they were presented; my models of historians of mathematics who have studied this do not reliably make statements that look like your claims wrt. the Basel problem.

I do not know what your priors look like on this point, but it seems highly likely that our difference in views on the mathematics factor through to our priors, and convergence will likely be hindered by being merely human and having low baud channels.

**jonathan_lee**on The Use of Many Independent Lines of Evidence: The Basel Problem · 2013-06-08T19:26:16.577Z · score: 1 (1 votes) · LW · GW

> Fermat considered the sequence of functions f(n,x) = x^n for n = 0, 1, 2, 3, ....

Only very kind of. Fermat didn't have a notion of function in the sense meant later, and showed geometrically that the area under certain curves could be computed by something akin to Archimedes' method of exhaustion, if you dropped the geometric rigour and worked algebraically. He wasn't looking at a limit of functions in any sense; he showed that the integral could be computed in general.

The counterexample is only "very simple" in the context of *knowing* that the correct condition is uniform convergence, and *knowing* that the classical counterexamples look like x^n, n->\infty or bump functions. Counterexamples are not generally obvious upfront; put another way, it's really easy to engage in Whig history in mathematics.

**jonathan_lee**on The Use of Many Independent Lines of Evidence: The Basel Problem · 2013-06-04T19:14:58.214Z · score: 1 (1 votes) · LW · GW

It is possible that "were known in general to lead to paradoxes" would be a more historically accurate phrasing than "without firm foundation".

For easy-to-cite examples, there's "The Analyst" (1734, Berkeley). The basic issue was that infinitesimals needed to be 0 at some points in a calculation and non-0 at others. For a general overview, this seems reasonable. Grandi noticed in 1703 that infinite series did not need to give determinate answers; this was widely known by the 1730s. Reading the texts, it's fairly clear that the mathematicians working in the field were aware of the issues; they would dress up the initial propositions of their calculi in lots of metaphysics, and then hurry to examples to prove out their methods.

**jonathan_lee**on The Use of Many Independent Lines of Evidence: The Basel Problem · 2013-06-03T21:25:07.371Z · score: 1 (1 votes) · LW · GW

That it worked in every instance of continuous functions that had been considered up to that point, seemed natural, and extended many existing demonstrations that a specific sequence of continuous functions had a continuous limit.

A need for lemmas of the latter form is endemic; for a concrete class of examples, any argument via a Taylor series on an interval implicitly requires such a lemma to transfer continuity, integrals and derivatives over. In just this class, numerical evidence came from the success of perturbative solutions to Newtonian mechanics, and theoretical evidence from the existence of well-behaved Taylor series for most functions.

**jonathan_lee**on The Use of Many Independent Lines of Evidence: The Basel Problem · 2013-06-03T19:32:19.300Z · score: 12 (12 votes) · LW · GW

Observationally, the vast majority of mathematical papers do not make claims that are non-rigorous but as well supported as the Basel problem. They split into rigorous proofs (potentially conditional on known additional hypotheses eg. Riemann), or they offer purely heuristic arguments with substantially less support.

It should also be noted that Euler was working at a time when it was widely known that the behaviour of infinite sums, products and infinitesimal analysis (following Newton or Leibniz) was without any firm foundation. So analysis of these objects at that time was generally flanked with "sanity check" demonstrations that the precise objects being analysed did not trivially cause bad behaviour. Essentially everyone treated these kinds of demonstrations as highly suspect until the 1830s and a firm foundation for analysis (cf. Weierstrass and Riemann). Today we grandfather these demonstrations in as proofs because we can show proper behaviour of these objects.

On the other hand, there were a great many statements made at that time which later turned out to be false, or to require additional technical assumptions once we understood analysis, as distinct from a calculus of infinitesimals. The most salient to me would be Cauchy's 1821 "proof" that the pointwise limit of continuous functions is continuous; counterexamples were not constructed until 1826 (by which time functions were better understood) and it took until 1853 for the actual condition (uniform convergence) to be developed properly. This statement was at least as well supported in 1821 as Euler's was in 1735.

As to confidence in modern results: Looking at the Web of Science data collated here for retractions in mathematical fields suggests that around 0.15% of current papers are retracted.

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-23T09:55:44.950Z · score: -1 (1 votes) · LW · GW

I do not think that they are "making it up"; that phrase to me carries connotations of deliberate malfeasance that I do not wish to suggest. I think that to an *outside observer* the estimate is optimistic to the point of being incredible, and reflects poorly on CEA for that.

These 291 people haven't pledged dollar values. They've pledged percentages of income. To turn that into a dollar value you need to estimate whole-life incomes. Reverse-engineering an estimate of income (assuming that most people pledge 10%, and a linear drop-off in pledgers, with 50% donating for 40 years) yields mean annual earnings of ~£100K. That's about the 98th centile for earnings in the UK.
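The reverse-engineering can be sketched directly; this simplified version drops the retention adjustment and just spreads each pledge over 40 years of donating (the 10% pledge and 40-year horizon are the stated assumptions):

```python
# Fermi estimate: what mean income does the headline pledge figure imply?
pledged_total = 112.8e6    # GBP, the figure on the GWWC site
n_pledgers = 291
pledge_fraction = 0.10     # assume most pledge 10% of income
years = 40                 # assumed donating lifetime

per_person_pledge = pledged_total / n_pledgers   # ~GBP 388K each
implied_annual_income = per_person_pledge / (pledge_fraction * years)

print(round(implied_annual_income))   # ~97,000 GBP/year, i.e. ~GBP 100K
```

Any retention discount only pushes the implied income higher, which is the direction of the incredulity above.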

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-22T00:38:56.645Z · score: 5 (17 votes) · LW · GW

Hi Will,

I'm glad to hear that a general response is being collated; if there are things where CEA can improve it would seem like a good idea to do them, and if I'm wrong I would like to know that. Turning to the listed points:

I went into that conversation with a number of questions I sought answers to, and either asked them or saw the data coming up from other questions. I knew your time was valuable and mostly targeted at other people there.

Adam explicitly signed off on my comment to Luke. He saw the draft post, commented on it, recommended it be put here and received the original string of emails in the context of being a friend, and person I knew would have a closer perspective on the day to day running of CEA than myself.

£1700 came from Jacob (Trefethen), in conversation shortly after you were in Cambridge, and purporting to be from internal numbers. I had asked whether CEA has an internal price at which new pledges would be bought, on the basis that one should exist, and it would be important for valuing a full-time Cambridge position.

~4K is 1/3 of the Oxford undergrad population, which was the figure I had heard quoted in the discussion in Cambridge.

GWWC lists 8 people as a sample of past-and-present researchers, plus a research manager and a research director. I estimated that half of the former set would have moved on, and thus that 6 people were engaged in at least part-time research for GWWC.

I am concerned both about utility-maximisation and the ROI. It seems easier to fix efficiency problems whilst institutions are still small, or create alternate more efficient institutions if need be; ideally groups akin to CEA's projects are going to move budgets of O(10^9 / year), and I want to see that used as effectively as possible.

In terms of ROI, I don't put large weight in the estimated returns absent a calculation or substantial trust in the instrumental rationality of the organisation making the claims. To take the canonical example, GiveWell provides some measure of each; CEA's projects need to be at least as credible.

Thanks again for taking the critique in the spirit that was intended.

Best wishes,

Jonathan

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-21T21:33:44.052Z · score: -3 (15 votes) · LW · GW

The primary source of the post was an extensive email exchange with Adam Casey (currently working full time at CEA). Since we are friends, this was not primarily in an official capacity. I also asked Adam to cross check the numbers whilst wearing a more official hat.

I was encouraged by him and Alexey Morgunov (Cambridge LWer) to make the substance of this public immediately after Will Crouch came up to Cambridge.

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-21T11:45:18.146Z · score: 0 (0 votes) · LW · GW

Whose status ordering are you using? Getting someone who is *not* a mathematician to TMS is harder; within the Natural Sciences it is possible, and there are O(1) Computer Scientists, philosophers or others. For the historians, classicists or other subjects, mathmos are not high status. In terms of EtG, these groups are valuable - most traders are not quants.

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-21T11:40:35.051Z · score: 2 (6 votes) · LW · GW

In that case, having a claim on every page of the GWWC site claiming that £112.8M have been pledged seems deceptive. 291 people have pledged, and [by a black box that doesn't trivially correspond to reality] that's become £112.8M. I know that at least 3 people in Cambridge have seen that statistic and promptly *laughed* at GWWC. The numbers are implausible enough that <5s Fermi estimates seem to refute it, and then the status of GWWC as somewhat effective rational meta-charity is destroyed. Why would someone trust GWWC's assessment of charities or likely impact over, say, GiveWell, if the numbers GWWC display are so weird *and* lacking in justification?

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-21T11:34:41.920Z · score: -2 (2 votes) · LW · GW

Talking about effective altruism is a constraint, as is talking about mathematics. Being a subject society makes it easier to get people from that subject to attend; it also makes it harder to convince people from outside that subject to even consider coming.

TMS pulls 80+ people to most of its talks, which are not generally from especially famous mathematicians. TCSS got 600 people for a Penrose-Rees event. Both TCSS and TMS have grown rapidly in 18-24 months, having existed for far longer. This seems to indicate that randomly selected student societies have low-hanging fruit. It doesn't seem incongruous to suggest that OUIS, OUSS and GWWC have the capacity to at least double their attendances -- the TMS did in one term, and doubled the number of events (so a 4x in person-talks).

**jonathan_lee**on CEA does not seem to be credibly high impact · 2013-02-21T11:06:22.070Z · score: 0 (2 votes) · LW · GW

This holds for graduates who earn less than average as well. Is there data showing that the predominant source of career changes are people who would otherwise earn substantially less than mean? Is there data suggesting that the career changes are increasing incomes substantially?

**jonathan_lee**on Proofs, Implications, and Models · 2012-10-29T11:11:27.453Z · score: 2 (2 votes) · LW · GW

We might mean many things by "2 + 2 = 4". In PA: "PA |- SS0 + SS0 = SSSS0", and so by soundness "PA |= SS0 + SS0 = SSSS0". In that sense, it is a logical truism independent of people counting apples. Of course, this is clearly not what most people mean by "2+2=4", if for no other reason than that people did number theory before Peano.
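The PA claim can be mirrored as a computation on unary numerals; a sketch that checks the equation itself (not, of course, the formal derivation):

```python
# Unary numerals: "0" is zero, S(n) is the successor of n.
def S(n):
    return ("S", n)

ZERO = "0"

# Addition by the PA recursion:  n + 0 = n,  n + S(m) = S(n + m).
def add(n, m):
    if m == ZERO:
        return n
    _tag, pred = m
    return S(add(n, pred))

TWO = S(S(ZERO))
FOUR = S(S(S(S(ZERO))))
assert add(TWO, TWO) == FOUR   # SS0 + SS0 = SSSS0
```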

When applied to apples, "2 + 2 = 4" probably is meant as: "apples + the world |= 2 apples + 2 apples = 4 apples", the truth of which depends on the nature of "the world". It seems to be a correct statement about apples. Technically I have not checked this property of apples recently, but when I consider placing 2 apples on a table, and then 2 more, I think I can remove 4 apples and have none left. It seems that if I require 4 apples, it suffices to find 2 and then 2 more. This is also true of envelopes, paperclips, M&M's and other objects I use. So I generalise a law-like behaviour of the world: "2 things + 2 things makes 4 things, for ordinary sorts of things (eg. apples)".

At some level, this is part of why I care about things that PA entails, rather than an arbitrary symbol game; it seems that PA is a logical structure that extracts lawlike behaviour of the world. If I assumed a different system, I might get "2+2=5", but then I don't think the system would correspond to the behaviours of apples and M&M's that I want to generalise.

(On the other hand, PA clearly isn't enough; it seems to me that strengthened finite Ramsey is true, but PA doesn't show it. But then we get into ZFC / second order arithmetic, and then systems at least as strong as PA_ordinal, and still lose because there are no infinite descending chains in the ordinals)

**Jonathan_Lee**on [deleted post] 2012-05-07T01:13:47.564Z

I want to note that I may be confused: I have multiple hypotheses fitting some fraction of the data presented.

- Supergoals and goals known, but unconscious affective death spirals or difficulties in actioning a far goal are interfering with the supergoals.
- Supergoals and goals known, goal is suboptimal.
- Supergoals not known consciously, subgoal known but suboptimal given knowledge of supergoals.

The first is what seems to be in the example. The second is what the strategy handles. The third is what I get when I try to interpret:

This technique is about finding concrete things that make you think "hey, that's awesome, how can I get that?"

The third is a call for more luminosity; the second is bad goal choice. The first is more awkward to handle. You need to operationally notice which goals are not useful and which are. That means noticing surface level features of your apparent goals that are not optimal.

As I see it, speaking of an "intuitive notion" of "perfectly honed instrument for realizing your goals", or merely stopping at "particular patterns of reality" is the warning signal of this failure mode. Taboo these terms, make them *operationally* defined. If you have a sequence of definite concrete statements about what the world would look like if you were this kind of entity, then you have a functional definition of what you want from the goal.

Of course, the imprecise goal may shatter into a large number of actionable goals. It may be that the skills needed to achieve these subgoals contain a larger-scale skill worth learning. Functionally, if that high-level skill can't be stated with sufficient precision to go out and know success when it's seen, then more data is needed about this possible high-level skill before we can be confident it's there in a form matching the imprecise goal. So note it, do the concrete things now, and look again when there is a better sense of the potential high-level problem to solve.

The bit of the post that I find most awesome is the couple of days taken to audit your goals, and the noticing that achievement of your goals was being hindered by this urge. I am aware that when I noticed how badly broken my goal structures were, I had to call "halt and catch fire" and keep a diary for a couple of months. Being able to perform an audit in a few days would be incredibly useful.

**Jonathan_Lee**on [deleted post] 2012-05-06T20:55:05.394Z

So, it seems to me that what you describe here is not moving up a hierarchy of goals, unless there are serious issues with the mechanisms used to generate subgoals. It seems like slogans more appropriate to avoiding the demonstrated failure mode are:

"Beware affective death spirals on far-mode (sub)goals" or "Taboo specific terms in your goals to make them operationally useful" or possibly even "Check that your stated goals are not semantic stop-signs"

As presented, you are claiming that:

I wanted to be a perfectly honed instrument for realizing my goals, similar to the hyper-competent characters in my favorite fictions

was generated as a subgoal of specific concrete goals (you mention programming and business). This seems to be a massive failure of planning. I would compare it to stating you would develop calculus to solve a constant speed distance-time problem, having never solved any of the latter sort of question. There is no shape to such a goal; to such an individual "calculus" is a term without content. Similarly, unless you have already developed high competence in many concrete tasks, how would you recognise a mind that was a perfectly honed instrument for realizing your goals? Taboo "perfectly honed instrument", "hyper-competent" etc., and the goal dissolves.

On the other hand, going up the pyramid of goals seems more likely to induce this error. Generally my high-level goals are in far mode and less concrete. Certainly "acquire awesome skills" is not something that I have generated as a subgoal of other goals; I have it as a generalisation of past methods of success, in the (inductive) belief that acquiring such skills will be useful in general. As subgoals to that I attempt general self-improvement, for example learning to code in new languages or pushing other skillsets. Going up the pyramid of goals in such a context is an active hindrance, because the higher goals are harder to make operational.

**jonathan_lee**on Meetup Feedback: Topic selection and precommittments · 2012-05-06T16:59:55.957Z · score: 1 (1 votes) · LW · GW

Thanks. Definite typo; fixed.

**jonathan_lee**on Pre-commitment and meta at the Cambridge UK meetup · 2012-04-30T10:27:34.389Z · score: 2 (2 votes) · LW · GW

Better directions to the JCR (with images) are here.

ETA: Also fixed the list of meetups to link there.

**jonathan_lee**on 'Is' and 'Ought' and Rationality · 2011-07-05T10:09:37.663Z · score: 1 (5 votes) · LW · GW

The foundational problem in your thesis is that you have grounded "rationality" as a normative "ought" on beliefs or actions. I dispute that assertion.

Rationality is more reasonably grounded as selecting actions so as to satisfy your explicit or implicit desires. There is no normative force to statements of the form "action X is not rational", unpacked as "If your values fall into {large set of human-like values}, then action X is not optimal, choosing for all similar situations where the algorithm you use is run".

There may or may not be general facts about what it is "rational" for "people" to do; it depends rather crucially on how consistent terminal values are across the set of "people". Neglecting trade with Clippy, it is (probably) not rational for humans to convert Jupiter to paperclips. Clippy might disagree.

It should be clear that rational actions are predicated on terminal values, and do not carry normative connotations. Given terminal values, your means of selecting actions may be rational or otherwise. Again, this is not normative; it may be suboptimal.

**jonathan_lee**on The Problem With Trolley Problems · 2010-10-24T09:13:16.868Z · score: 2 (2 votes) · LW · GW

From your own summary:

I think that trolley problems contain perfect information about outcomes in advance of them happening, ignore secondary effects, ignore human nature, and give artificially false constraints.

Which is to say they are idealised problems; they are *trued* dilemmas. Your remaining argument is fully general against any idealisation or truing of a problem that can also be used rhetorically. This is (I think) what Tordmor's summary is getting at; mine is doing the same.

Now, I think that's bad. Agree/disagree there?

So, I clearly disagree, and further you fail to actually establish this "badness". It is not problematic to think about simplified problems. The trolley problems demonstrate that instinctual ethics are sensitive to whether you have to "act" in some sense. I consider that a bug. The problem is that finding these bugs is harder in "real world" situations; people can avoid the actual point of the dilemma by appealing for more options.

In the examples you give, there is no similar pair of problems. The point isn't the utilitarianism in a single trolley problem; it's that when two tracks are replaced by a (canonically larger) person on the bridge and 5 workers further down, people change their answers.

Okay, finally, I think this kind of thinking seeps over into politics, and it's likewise bad there. Agree/disagree?

You don't establish this claim (I disagree). It is worth observing that the standard third "trolley" problem is 5 organ recipients and one healthy potential donor for all. The point is to establish that real world situations have more complexity -- your four problems.

The point of the trolley problems is to draw attention to the fact that the H.Sap inbuilt ethics is distinctly suboptimal in some circumstances. Your putative "better" dilemmas don't make that clear. Failing to note and account for these bugs is *precisely* "sloppy thinking". Being inconsistent in action on the basis of the varying descriptions of identical situations seems to be "sloppy thinking". Failing on Newcomb's problem is "sloppy thinking". Taking an "Activists" hypothetical as a true description of the world is "sloppy thinking". Knowing that the hardware you use is buggy? Not so much.

**jonathan_lee**on The Problem With Trolley Problems · 2010-10-23T08:31:47.787Z · score: 8 (12 votes) · LW · GW

The thrust of your argument appears to be:

1. Trolley problems are idealised.
2. Idealisation can be a dark-arts rhetorical technique in discussion of the real world.
3. Boo trolley problems!

There are a number of issues.

First and foremost, reversed stupidity is not intelligence. Even if you are granted the substance of your criticisms of the activists position, this does not argue per se against trolley problems as dilemmas. The fact that they share features with a "Bad Thing" does not inherently make them bad.

Secondly, the whole point of considering trolley problems is to elucidate human nature and give some measure of training in cognition in stressful edge cases. The observation that humans freeze or behave inconsistently is important. This is why the trolley problems have to be trued in the sense that you object to - if they are not, many humans will avoid thinking about the ethical question being posed. In essence "I don't like your options, give me a more palatable one" is a fully general and utterly useless answer; it must be excluded.

Thirdly, your argument turns on the claim that merely admitting trolley problems as objects of thought somehow makes people more likely to accept dichotomies that "justify tyranny and oppression". This is risible. Even if the dichotomy is a false one, you surely should find one or the other branch preferable. It is perfectly admissible to say:

"I prefer this option (implicitly you presume that will be the taxation), but that if this argument is to be the basis for policy, then there are better alternatives foo, bar, etc., and that various important real world effects have been neglected."

Those familiar with the trolley problems and general philosophical dilemmas are more likely to be aware of the idealisations and voice these concerns cogently if idealisations are used in rhetoric or politics.

Fourthly, in terms of data, I would challenge you to find evidence suggesting that study of trolley problems leads to acceptance of tyranny. I would note (anecdotally) that communities where one can say "trolley problem" without needing to explain further seem to have a higher density of the libertarians and anarchists than the general population.

So in rough summary:

1. Your conclusion does not follow from the argument.
2. Trolley problems are idealised because if they aren't, humans evade rather than engage.
3. Noting and calling out dark-arts rhetoric is roughly orthogonal to thinking about trolley problems (conditional on thinking).
4. Citation needed wrt. increased tyranny in those who consider trolley problems.

**jonathan_lee**on Significance of Compression Rate Method · 2010-05-31T07:48:26.228Z · score: 0 (0 votes) · LW · GW

In the wider sense, MML still works on the dataset {stock prices, newspapers, market fear}. Regardless of what work has presently been done to compress newspapers and market fear, if your hypothesis is efficient then you can produce the stock price data for a very low marginal message length cost.

You'd write up the hypothesis as a compressor-of-data; the simplest way being to produce a distribution over stock prices and apply arithmetic coding, though in practice you'd tweak whatever state of the art compressors for stock prices exist.

Of course the side effect of this is that your code references more data, and will likely need longer internal identifiers on it, so if you just split the cost of code across the datasets being compressed, you'd punish the compressors of newspapers and market fear. I would suggest that the solution is to deploy the Shapley value, with the value being the number of bits saved overall by a single compressor working on all the data sets in a given pool of cooperation.
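The suggested Shapley split could be sketched as follows; the datasets and the bit-savings numbers for each coalition are invented purely for illustration:

```python
from itertools import permutations

# Hypothetical total bits saved by a joint compressor on each coalition
# of datasets (all numbers invented for illustration).
savings = {
    frozenset(): 0,
    frozenset({'prices'}): 100,
    frozenset({'news'}): 300,
    frozenset({'fear'}): 50,
    frozenset({'prices', 'news'}): 500,
    frozenset({'prices', 'fear'}): 180,
    frozenset({'news', 'fear'}): 380,
    frozenset({'prices', 'news', 'fear'}): 620,
}

def shapley(players, value):
    # Average each player's marginal contribution over all join orders.
    shares = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            shares[p] += value[coalition | {p}] - value[coalition]
            coalition = coalition | {p}
    return {p: s / len(perms) for p, s in shares.items()}

shares = shapley(['prices', 'news', 'fear'], savings)
# Shares sum to the grand-coalition savings (efficiency property).
assert abs(sum(shares.values()) - 620) < 1e-9
```

The efficiency property means no bits saved are double-counted or lost, which is what makes this a fair way to apportion the joint compressor's code cost.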

**jonathan_lee**on Conditioning on Observers · 2010-05-13T12:22:26.999Z · score: 0 (0 votes) · LW · GW

You and I both agree on Bayes implying 1/21 in the single constant case. Considering the 2 constant game as 2 single constant games in series, with uncertainty over which one (k1 and k2 the mutually exclusive "this is the k1/k2 game")

P(H | W) = P(H ∩ k1|W) + P(H ∩ k2|W) = P(H | k1 ∩ W)P(k1|W) + P(H|k2 ∩ W)P(k2|W) = 1/21 . 1/2 + 1/21 . 1/2 = 1/21
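The two-constant computation can be checked numerically; a small Monte Carlo sketch under my reading of the setup (heads: woken once iff the d20 roll hits one of the two constants; tails: woken twice, once per constant). The constants, trial count and seed are arbitrary choices:

```python
import random

random.seed(0)
k1, k2 = 7, 13  # arbitrary choice of the two constants
heads_wakings = tails_wakings = 0
for _ in range(200_000):
    heads = random.random() < 0.5
    roll = random.randint(1, 20)
    if heads:
        # Woken once iff the roll hits one of the constants (prob 1/10).
        heads_wakings += roll in (k1, k2)
    else:
        # Woken twice regardless of the roll (once per constant).
        tails_wakings += 2

# Fraction of wakings that follow heads; should be close to 1/21 ≈ 0.0476.
p_heads_given_woken = heads_wakings / (heads_wakings + tails_wakings)
```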

This is the logic that to me drives PSB to SB and the 1/3 solution. I worked it through in SB by conditioning on the day (slightly different but not substantially).

I have had a realisation. You work directly with W, I work with subsets of W that can only occur at most once in each branch and apply total probability.

Formally, I think what is going on is this: (Working with simple SB) We have a sample space S = {H,T}

"You have been woken" is not an event, in the sense of being a *set* of experimental outcomes. "You will be woken at least once" is, but these are *not* the same thing.

"You will be woken at least once" is a nice straightforward event, in the sense of being a set of experimental outcomes {H,T}. "You have been woken" should be considered formally as the multiset {H,T,T}. Formally just working thorough with multisets wherever sets are used as events in probability theory, we recover all of the standard theorems (including Bayes) without issue.

What changes is that since P(S) = 1, and there are multisets X such that X contains S, P(X) > 1.

Hence P({H,T,T}) = 3/2; P({H}|{H,T,T}) = 1/3.
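The multiset bookkeeping can be sanity-checked by counting wakings over simulated runs of standard Sleeping Beauty (trial count and seed are arbitrary choices of mine):

```python
import random

random.seed(1)
trials = 100_000
wakings = []  # one entry per waking, recording the coin result
for _ in range(trials):
    coin = random.choice('HT')
    # Heads: woken once (Monday); tails: woken twice (Monday, Tuesday).
    wakings.extend([coin] * (1 if coin == 'H' else 2))

# "P({H,T,T})" read as expected wakings per experiment, ~3/2:
per_experiment = len(wakings) / trials
# Credence in heads on a randomly selected waking, ~1/3:
p_heads = wakings.count('H') / len(wakings)
```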

In the 2 constant PSB setup you suggest, we have S = {H,T} x {1,..,20} W = {(H,k1),(H,k2), (T,1),(T,1),(T,2),(T,2),....,(T,20),(T,20)}

And P(H|W) = 1/21 without issue.

My statement is that this more accurately represents the experimental setup; when you wake, conditioned on all background information, you don't know how many times you've been woken before, but this *changes* the conditional probabilities of H and T. If you merely use background knowledge of "You have been woken at least once", and squash all of the *events* "You are woken for the nth time" into a single event by using union on the events, then you discard information.

This is closely related to my earlier (intuition) that the problem was something to do with linearity.

In sets, union and intersection are only linear when working on some collection of atomic sets, but they are generally linear in multisets. [eg. (A υ B) \ B ≠ A in general in sets]

Observe that the approach I take of splitting "events" down to disjoint things that occur at most once is *precisely* taking a multiset event apart into well behaved events and then applying probability theory.

What was concerning me is that the true claim that P({H,T}|T) = 1 seemed to discard pertinent information (ie the potential for waking on the second day). With W as the multiset {H,T,T}, P(W|T) = 2. You can regard this as expectation number of times you see Tails, or the extension of probability to multisets.

The difference in approach is that you have to put the double counting of waking given tails in as a boost to payoffs given Tails, which seems odd as from the point of view of you having just been woken you are being offered immediate take-it-or-leave-it odds. This is made clearer by looking at the twins scenario; each person is offered at most one bet.

**jonathan_lee**on Conditioning on Observers · 2010-05-13T08:29:48.300Z · score: 0 (0 votes) · LW · GW

Continuity problem is that the 1/2 answer is independent of the ratio of expected number of wakings in the two branches of the experiment

Why is this a problem?

The next clause of the sentence is the problem

unless the ratio is 0 (or infinite) at which point special case logic is invoked to prevent the trivially absurd claim that credence of Heads is 1/2 when you are never woken under Heads.

The problem is special-casing out the absurdity, and thus getting credences that are discontinuous in the ratio. On the other hand, you seem to take 1/21 in PSB (ie you do let it depend on the ratio) but deviate from 1/21 when multiple runs of PSB aggregate, which is not what I had expected...

D was used in the comment I was replying to as an "event" that was studiously avoiding being W.

http://lesswrong.com/lw/28u/conditioning_on_observers/201l shows multiple ways I get the 1/3 solution; alternatively betting odds taken on awakening or the long run frequentist probability, they all cohere, and yield 1/3.

The problem as I see it with W is that it's not a set of outcomes; it's really a multiset. That's fine in its way, but it gets confusing because it no longer bounds probabilities to [0,1]. Your approach is to quash multiple membership to get a set back.

**jonathan_lee**on Conditioning on Observers · 2010-05-13T08:19:50.026Z · score: 0 (0 votes) · LW · GW

No; P(H|W) = 1/21

Multiple ways to see this: 1) Under heads, I expect to be woken 1/10 of the time. Under tails, I expect to be woken twice. Hence on the average, for every waking after a head I am woken 20 times after a tail. Ergo 1/21.

2) Internally split the game into 2 single constant games, one for k1 and one for k2. We can simply play them sequentially (with the same die roll). When I am woken I do not know which of the two games I am playing. We both agree that in the single constant game P(H|W) = 1/21.

It's reasonably clear that playing two single constant games in series (with the same die roll and coin flip) reproduces the 2 constant game. The correlation between the roll and flip in the two games doesn't affect the expectations, and since you have complete uncertainty over which game you're in (c/o amnesia), the correlation of your current state with a state you have no information on is irrelevant.

P(H|W ∩ game i) = 1/21, so P(H|W) = 1/21, as the union over all i of (W ∩ game i) is W. At some level this is why I introduced PSB, it seems clearer that this should be the case when the number of wakings is bounded to 1.

3) Being woken implies either W1 or W2 (currently being woken for the first time or the second time) has occurred. In general, note that the expected count of something is a probability (and vice versa) if the number of times the event occurs is in {0,1} (trivial using the frequentist definition of probability; under the credence view it's true for betting reasons).

P(W1 | H) = 1/10, P(W2 | H) = 0 P(W1 | T) = 1, P(W2 | T) = 1, from the experimental setup.

Hence P(H|W1) = 1/11, P(H|W2) = 0. You're woken in 11/20 of experiments for the first time and in 1/2 of experiments for the second, so P(W1 | I am woken) = 11/21.

P(H | I am woken ) = P(H ∩ W1 | I am woken ) + P(H ∩ W2 | I am woken ) = P(H | W1 ∩ I am woken).P(W1 | I am woken) + 0 = 1/11 . 11/21 = 1/21.
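The same decomposition can be checked with exact rational arithmetic; a short sketch using Python's fractions module, following the conditional probabilities stated above:

```python
from fractions import Fraction as F

# Conditional waking probabilities from the setup.
p_w1_h, p_w2_h = F(1, 10), F(0)  # heads: first waking w.p. 1/10, never a second
p_w1_t, p_w2_t = F(1), F(1)      # tails: always woken twice

p_h = p_t = F(1, 2)
p_w1 = p_w1_h * p_h + p_w1_t * p_t                       # 11/20
p_h_given_w1 = p_w1_h * p_h / p_w1                       # 1/11
expected_wakings = p_w1 + (p_w2_h * p_h + p_w2_t * p_t)  # 21/20
p_w1_given_woken = p_w1 / expected_wakings               # 11/21
p_h_given_woken = p_h_given_w1 * p_w1_given_woken
assert p_h_given_woken == F(1, 21)
```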

The issues you've raised with this seem to be that you would either: set P(W1 | I am woken) = 1, or set P(W1 | T) = P(W2 | T) = 1/2 [so P(H|W1) = 1/6] and set P(W1 | I am woken) = 6/11.

My problem with this is that if P(W1 | I am woken) =/= 11/21, you're poorly calibrated. Your position appears to be that this is because you're being "forced to make the bet twice in some circumstances but not others". Hence what you're doing is clipping the number of times a bet is made to {0,1}, at which point expectation counts of number of outcomes are probabilities of outcomes. I think such an approach is wrong, because the underlying problem is that the counts of event occurences conditional on H or T aren't constrained to be in {0,1} anymore. This is why I'm not concerned about the "probabilities" being over-unity. Indeed you'd expect them to be over-unity, because the long run number of wakings exceeds the long run number of experiments. In the limit you get well defined over unity probability, under the frequentist view. Betting odds aren't constrained in [0,1] either, so again you wouldn't expect credence to stay in [0,1]. It is bounded in [0,2] in SB or your experiment, because the maximum number of winning events in a branch is 2.

As I see it, the 1/21 answer (or 1/3 in SB) is the only plausible answer because it holds when we stack up multiple runs of the experiment in series or equivalently have uncertainty over which constant is being used in PSB. The 1/11 (equiv. 1/2) answer doesn't have this property, as is seen from 1/21 going to 1/11 from nothing but running two experiments of identical expected behaviour in series...

**jonathan_lee**on Conditioning on Observers · 2010-05-12T19:39:20.286Z · score: 1 (1 votes) · LW · GW

Continuity problem is that the 1/2 answer is independent of the ratio of expected number of wakings in the two branches of the experiment, unless the ratio is 0 (or infinite) at which point special case logic is invoked to prevent the trivially absurd claim that credence of Heads is 1/2 when you are never woken under Heads.

If you are put through multiple sub-experiments in series, or probabilistically through some element of a set of sub-experiments, then the Expected number of times you are woken is linearly dependent on the distribution of sub-experiments. The probability that you are woken *ever* is not.

So the problem is that it's not immediately clear what D should be. If D is split by total probability to be Heads or Tails, and the numbers worked separately in both cases, then to get 1/2 requires odd conditional probabilities, but 1/3 does not. If you don't split D, and calculate back from 1/3, you get 3/2 as the "probability" of D. It isn't. What's happened is closer to E(H|D) = E(D|H) E(H) / E(D), over one run of the experiment, and this yields 1/3 immediately.
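The expectation-ratio reading can be spelled out for standard Sleeping Beauty in a tiny exact-arithmetic sketch:

```python
from fractions import Fraction as F

# Expected number of wakings ("D") per run, conditional on the coin.
e_d_given_h, e_d_given_t = F(1), F(2)
p_h = p_t = F(1, 2)
e_d = e_d_given_h * p_h + e_d_given_t * p_t  # 3/2 wakings per run

# E(H|D) = E(D|H) E(H) / E(D), the analogue of Bayes over expectations.
credence_h = e_d_given_h * p_h / e_d
assert credence_h == F(1, 3)
```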

The issue is that certain values of "D" occur multiple times in some branches, and allowing those scenarios to be double counted leads to oddness. I second the observation that caution is generally required.

**jonathan_lee**on Conditioning on Observers · 2010-05-12T19:27:42.446Z · score: 0 (0 votes) · LW · GW

You're woken with a big sign in front of you saying "the experiment is over now", or however else you wish to allow sleeping beauty to distinguish the experimental wakings from being allowed to go about her normal life.

Failing that, you are never woken; it shouldn't make any difference, as long as waking to leave is clearly distinguished from being woken for the experiment.

**jonathan_lee**on Conditioning on Observers · 2010-05-12T19:23:20.294Z · score: 0 (0 votes) · LW · GW

No. I assert P(H|W) = 1/21 in this case.

Two ways of seeing this: Either calculate the expected number of wakings conditional on the coin flip (m/20 and m for H and T). [As in SB]

Alternatively consider this as m copies of the single constant game, with uncertainty on each waking as to which one you're playing. All m single constant games are equally likely, and all have P(H|W) = 1/21. [The hoped for PSB intuition-pump]

**jonathan_lee**on Conditioning on Observers · 2010-05-12T07:35:00.090Z · score: 0 (0 votes) · LW · GW

*Before* I am woken up, my prior belief is that I spend 24 hours on Monday and 24 on Tuesday regardless of the coin flip. Hence *before* I condition on waking, my probabilities are 1/4 in each cell.

When I wake, one cell is driven to 0, and there is no information to distinguish the remaining 3. This is the point that the sleeping twins problem was intended to illuminate.

Given awakenings that I know to be on Monday, there are two histories with the same measure. They are equally likely. If I run the experiment and count the number of events Monday ∩ H and Monday ∩ T, I will get the same numbers (mod. epsilon errors). Your assertion that it's H/T with probability 0.5 is false given that you have woken. Hence sleeping twins.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T18:00:50.871Z · score: 1 (1 votes) · LW · GW

As I see it, initially (as a prior, before considering that I've been woken up), both Heads and Tails are equally likely, and it is equally likely to be either day. Since I've been woken up, I know that it's not (Tuesday ∩ Heads), but I gain no further information.

Hence the 3 remaining probabilities are renormalised to 1/3.

Alternatively: I wake up; I know from the setup that I will be in this subjective state once under Heads and twice under Tails, and they are a priori equally likely. I have no data that can distinguish between the three states of identical subjective state, so my posterior is uniform over them.

If she knows it's Tuesday then it's Tails. If she knows it's Monday then she learns nothing of the coin flip. If she knows the flip was Tails then she is indifferent to Monday and Tuesday. 1/3 drops out as the only consistent answer at that point.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T17:35:22.661Z · score: 0 (0 votes) · LW · GW

The reason it corresponds to Sleeping Beauty is that in the limit of a large number of trials, we can consider blocks of 20 trials where heads was the flip and all values of the die roll occurred, and similar blocks for tails, and have some epsilon proportion left over. (WLLN)

Each of those blocks corresponds to Sleeping Beauty under heads/tails.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T16:57:21.285Z · score: 0 (0 votes) · LW · GW

No; between sedation and amnesia you know nothing but the fact that you've been woken up, and that 20 runs of this experiment are to be performed.

Why would an earlier independent trial have any impact on you or your credences, when you can neither remember it nor be influenced by it?

**jonathan_lee**on Conditioning on Observers · 2010-05-11T16:40:12.821Z · score: 0 (0 votes) · LW · GW

It isn't a probability; the only use of it was to note the method leading to a 1/2 solution and where I consider it to fail: because the number of times you are woken is not bounded to {0,1}, the "P(W)" used in the 1/2 conditioning is malformed, as it doesn't keep track of when you're actually woken up. In as much as it is anything, using the 1/2 argumentation, "P(W)" is the expected number of wakings.

No. You will wake on Monday with probability one. But, on a randomly selected awakening, it is more likely that it's Monday&Heads than Monday&Tails, because you are on the Heads path on 50% of experiments

Sorry, but if we're randomly selecting a *waking* then it is not true that you're on the heads path 50% of the time. In a pair of runs, one head, one tail, you are woken 3 times, twice on the tails path.

On a randomly selected run of the experiment, there is a 1/2 chance of being in either branch, but:
Choose a uniformly random waking in a uniformly chosen random run
is *not* the same as
Choose a uniformly random waking.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T15:38:12.896Z · score: 0 (0 votes) · LW · GW

Of course P(W) isn't bounded within [0,1]; W is one of any number of events, in this case 2: P(You will be woken for the first time) = 1; P(You will be woken a second time) = 1/2. The fact that natural language and the phrasing of the problem attempt to hide this as "you wake up" is not important. That is why P(W) is apparently broken; it double counts some futures: it is the expected number of wakings. This is why I split into conditioning on waking on Monday or Tuesday.

(Tuesday, tails) is not the same event as (Monday, tails). They are distinct queries to whatever decision algorithm you implement; there are any number of trivial means to distinguish them without altering the experiment (say, "we will keep you in a red room on one day and a blue one on the other, with the order determined by a random coin flip").

They are strongly correlated events, granted. If either occurs, so will the other. That does not make them the same event. On your argumentation, you would confidently assert beforehand that the coin is fair, yet also assert that the conditional probability that you wake on Monday depends on the coin flip, when in either branch you are woken then with probability 1.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T15:22:23.592Z · score: 0 (0 votes) · LW · GW

The point of the PSB problem is that the approach you've just outlined is indefensible.

You agree that for each single constant k_i P(H|W) = 1/21. Uncertainty over which constant k_i is used does not alter this.

So if I run PSB 20 times, you would assert in each run that P(H|W) = 1/21. So now I simply keep you sedated between experiments. Statistically, 20 runs yield you SB, and each time you answered with 1/21 as your credence. Does this not faze you at all?

You have a scenario A where you assert foo with credence P, and scenario B where you also assert foo with credence P, yet if I put you in scenario A and then scenario B, keeping you sedated in the meantime, you do not assert foo with credence P...

**jonathan_lee**on Conditioning on Observers · 2010-05-11T15:12:07.655Z · score: 0 (0 votes) · LW · GW

The claim is implied by your logic; the fact that you don't engage with it does not prevent it from being a consequence that you need to deal with. Furthermore it appears to be the intuition by which you are constructing your models of Sleeping Beauty.

Imagine we repeat the sleeping beauty experiment many times. On half of the experiments, she'd be on the heads path. On half of the experiments, she'd be on the tails path.

Granted; no contest

If she is on the tails path, it could be either monday or tuesday.

And assuredly she will be woken on both days in any given experimental run. She will be woken twice. Both events occur whenever tails comes up. P(You will be woken on Monday | Tails) = P(You will be woken on Tuesday | Tails) = 1.

The arrangement that you are putting forward as a model is that Sleeping Beauty is to be woken once and only once regardless of the coin flip, and thus if she could wake on Tuesday given Tails occurred, then that must reduce the chance of her waking on Monday given that Tails occurred. However, in the Sleeping Beauty problem the number of wakings is not constant. This is the fundamental problem in your approach.

**jonathan_lee**on Conditioning on Observers · 2010-05-11T12:22:15.164Z · score: 1 (1 votes) · LW · GW

P(Monday ∩ H | W) = P(Monday ∩ T | W). Regardless of whether the coin came up heads or tails you will be woken on Monday precisely once.

P(Monday ∩ T | W) = P(Tuesday ∩ T | W), because if tails comes up you are surely woken on both Monday and Tuesday.

You still seem to be holding on to the claim that there are as many observations after a head as after a tail; this is clearly false. There isn't a half measure of observation to spread across the tails branch of the experiment; this is made clearer in Sleeping Twins and the Probabilistic Sleeping Beauty problems.

Once Sleeping Beauty is normalised so that there is at most one observation per "individual" in the experiment, it seems far harder to justify the 1/2 answer. The fact of the matter is that your use of P(W) = 1 is causing grief; on these problems you should consider the expected number of wakings E(#W) instead, because P(W), unlike expectation, is not linear.
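To make the counting argument concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the thread): heads yields one awakening, tails yields two, and we count awakenings rather than runs, precisely because the number of wakings differs between branches.

```python
import random

random.seed(0)
runs = 100_000
awakenings_heads = 0
awakenings_tails = 0
for _ in range(runs):
    if random.random() < 0.5:   # heads: woken once, on Monday
        awakenings_heads += 1
    else:                       # tails: woken on both Monday and Tuesday
        awakenings_tails += 2

total = awakenings_heads + awakenings_tails
print(awakenings_heads / total)  # fraction of awakenings following heads, about 1/3
```

The heads fraction converges to 1/3, with the remaining 2/3 split evenly between Monday-tails and Tuesday-tails awakenings, matching P(Monday ∩ T | W) = P(Tuesday ∩ T | W) above.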

What is your credence in the Probabilistic Sleeping Beauty problem?

**jonathan_lee**on What is Bayesianism? · 2010-02-26T09:42:30.551Z · score: 3 (3 votes) · LW · GW

It seems there are a few meta-positions you have to hold before taking Bayesianism as talked about here; you need the concept of Winning first. Bayes is not sufficient for sanity, if you have, say, an anti-Occamian or anti-Laplacian prior.

What this site is for is to help us be good rationalists; to win. Bayesianism is the best candidate methodology for dealing with uncertainty. We even have theorems showing that in its domain it is uniquely good. My understanding of what we mean by Bayesianism is updating in the light of new evidence, and updating correctly within the constraints of sanity (cf. Dutch books).

**jonathan_lee**on A problem with Timeless Decision Theory (TDT) · 2010-02-05T13:53:46.760Z · score: 0 (0 votes) · LW · GW

The same one that you're currently seeing; for all values of E there is a value of F such that this is consistent, i.e. that D has actually predicted you in the scenario you currently find yourself in.

**jonathan_lee**on A problem with Timeless Decision Theory (TDT) · 2010-02-04T23:55:31.996Z · score: 1 (1 votes) · LW · GW

The game is to pick a box numbered from 0 to 2; there is a hidden logical computation E yielding another value 0 to 2. Omega has a perfect predictor D of you. You choose C.

The payout is 10^((E+C) mod 3), and there is a display showing the value of F = (E-D) mod 3.

If F = 0, then:

- D = 0 implies E = 0 implies optimal play is C = 2; contradiction
- D = 1 implies E = 1 implies optimal play is C = 1; no contradiction
- D = 2 implies E = 2 implies optimal play is C = 0; contradiction

And similarly for F = 1, F = 2, play C = (F+1) mod 3 as the only stable solution (which nets you 100 per play).

If you're not allowed to infer anything about E from F, then you're faced with a random pick from winning 1, 10 or 100, and can't do any better...
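The case analysis above can be brute-forced. This is my own verification sketch (not from the original comment): for each display value F, we enumerate candidate predictions D, infer the E consistent with F, compute the payout-maximising C, and keep only the fixed points where the perfect predictor's D equals the actual play C.

```python
def stable_plays(F):
    """Return the (C, payout) pairs consistent with display F."""
    plays = []
    for D in range(3):          # candidate prediction by Omega's predictor D
        E = (D + F) % 3         # the E consistent with F = (E - D) mod 3
        C = (2 - E) % 3         # optimal play maximises 10^((E+C) mod 3)
        if C == D:              # D is a perfect predictor, so D must equal C
            plays.append((C, 10 ** ((E + C) % 3)))
    return plays

for F in range(3):
    print(F, stable_plays(F))   # a single stable play per F: C = (F+1) mod 3, payout 100
```

Each F admits exactly one stable solution, C = (F+1) mod 3, and it always pays 100, confirming the case analysis.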

**jonathan_lee**on A problem with Timeless Decision Theory (TDT) · 2010-02-04T22:37:54.026Z · score: 2 (2 votes) · LW · GW

This ad-hoc fix breaks as soon as Omega makes a slightly messier game, wherein you receive a physical clue as to a computation output, and this computation and your decision determine your reward.

Suppose that for any output of the computation there is a unique best decision, and that furthermore this set of (computation output, predicted decision) pairs is mapped to distinct physical clues. Then given the clue you can infer what decision to make and the logical computation, but this **requires** that you infer from a logical fact (the predictor of you) to the physical state to the clue to the logical fact of the computation.

**jonathan_lee**on Logical Rudeness · 2010-01-29T09:41:57.270Z · score: 4 (4 votes) · LW · GW

The underlying issue is what we take the purpose of debate or discussion to be. Here we consider discourse to be prior to justified belief; the intent is to reveal the reasonable views to hold, and then update our beliefs.

If there is a desire to justify some specific belief as an end in itself, then the rules of logical politeness are null; they have no meaning to you as you're not looking to find truth, per se, but to defend an existing position. You have to admit that you could in principle be wrong, and that is a step that, by observation, most people do not make on most issues.

This is clearly exacerbated on issues where beliefs have been pre-formed by the point at which people learn the trappings of logical argument; heuristic internal beliefs are most likely to be defended in this fashion. The only community standard that seems to be required is that people are willing, or preferably eager, to update if the evidence or balance of argument (given limited cognition) changes.