Stuff That Makes Stuff Happen

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-18T10:49:26.516Z · LW · GW · Legacy · 128 comments


Followup to: Causality: The Fabric of Real Things

Previous meditation:

"You say that a universe is a connected fabric of causes and effects. Well, that's a very Western viewpoint - that it's all about mechanistic, deterministic stuff. I agree that anything else is outside the realm of science, but it can still be real, you know. My cousin is psychic - if you draw a card from his deck of cards, he can tell you the name of your card before he looks at it. There's no mechanism for it - it's not a causal thing that scientists could study - he just does it. Same thing when I commune on a deep level with the entire universe in order to realize that my partner truly loves me. I agree that purely spiritual phenomena are outside the realm of causal processes that can be studied by experiments, but I don't agree that they can't be real."

Reply:

Fundamentally, a causal model is a way of factorizing our uncertainty about the universe. One way of viewing a causal model is as a structure of deterministic functions plus uncorrelated sources of background uncertainty.

Let's use the Obesity-Exercise-Internet model (reminder: which is totally made up) as an example again:

$p(x_1, x_2, x_3) = p(x_1)p(x_2)p(x_3|x_1, x_2)$

We can also view this as a set of deterministic functions F_i, plus uncorrelated background sources of uncertainty U_i:

$x_1 = F_1(u_1)$

$x_2 = F_2(u_2)$

$x_3 = F_3(x_1, x_2, u_3)$

This says that the value x₃ - how much someone exercises - is a function of how obese they are (x₁), how much time they spend on the Internet (x₂), plus some other background factors U₃ which don't correlate to anything else in the diagram, all of which collectively determine, when combined by the mechanism F₃, how much time someone spends exercising.

There might be any number of different real factors involved in the possible states of U₃ - like whether someone has a personal taste for jogging, whether they've ever been to a trampoline park and liked it, whether they have some gene that affects exercise endorphins. These are all different unknown background facts about a person, which might affect whether or not they exercise, above and beyond obesity and Internet use.

But from the perspective of somebody building a causal model, so long as we don't have anything else in our causal graph that correlates with these factors, we can sum them up into a single factor of subjective uncertainty, our uncertainty U₃ about all the other things that might add up to a force for or against exercising. Once we know that someone isn't overweight and that they spend a lot of time on the Internet, all our uncertainty about those other background factors gets summed up with those two known factors and turned into a 38% conditional probability that the person exercises frequently.

And the key condition on a causal graph is that if you've properly described your beliefs about the connective mechanisms F_i, all your remaining uncertainty U_i should be mutually independent:

$p(u_1, u_2, u_3) = p(u_1)p(u_2)p(u_3)$

or more generally

$p(\mathbf U) = \prod p(U_i)$

And then plugging those probable U_i into the strictly deterministic F_i should give us back out our whole causal model - the same joint probability table over the observable X_i.
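As a rough illustration of "plugging the U_i into the F_i", here is a minimal Python sketch. The background distributions `P_U1`, `P_U2`, `P_U3` and the mechanisms `f1`, `f2`, `f3` are invented for this example (just like the Obesity-Exercise-Internet model itself); the only point is that independent U_i plus deterministic F_i give back a joint table over the X_i that factorizes as p(x₁)p(x₂)p(x₃|x₁, x₂).

```python
from collections import defaultdict
from itertools import product

# Hypothetical, made-up background distributions; each U_i is independent of the others.
P_U1 = {0: 0.7, 1: 0.3}                  # background factors behind obesity
P_U2 = {0: 0.4, 1: 0.6}                  # background factors behind Internet use
P_U3 = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}  # everything else that pushes toward exercise


def f1(u1):
    """Deterministic mechanism for x1 (obese or not)."""
    return u1


def f2(u2):
    """Deterministic mechanism for x2 (heavy Internet use or not)."""
    return u2


def f3(x1, x2, u3):
    """Deterministic mechanism for x3: exercise only if the background push beats x1 + x2."""
    return 1 if u3 > x1 + x2 else 0


def joint_over_x():
    """Enumerate every (u1, u2, u3), weight it by p(u1)p(u2)p(u3), and push it through the F_i."""
    joint = defaultdict(float)
    for (u1, p1), (u2, p2), (u3, p3) in product(P_U1.items(), P_U2.items(), P_U3.items()):
        x1 = f1(u1)
        x2 = f2(u2)
        x3 = f3(x1, x2, u3)
        joint[(x1, x2, x3)] += p1 * p2 * p3
    return dict(joint)


def marginal(joint, keep):
    """Sum the joint table down to the variables whose positions are listed in `keep`."""
    out = defaultdict(float)
    for xs, p in joint.items():
        out[tuple(xs[i] for i in keep)] += p
    return dict(out)


JOINT = joint_over_x()
P_X1 = marginal(JOINT, [0])
P_X2 = marginal(JOINT, [1])
P_X12 = marginal(JOINT, [0, 1])


def factorized(x1, x2, x3):
    """p(x1) * p(x2) * p(x3 | x1, x2), computed from the marginals of the joint table."""
    p_x3_given_parents = JOINT.get((x1, x2, x3), 0.0) / P_X12[(x1, x2)]
    return P_X1[(x1,)] * P_X2[(x2,)] * p_x3_given_parents
```

Comparing `JOINT[x]` with `factorized(*x)` for each of the eight outcomes should show agreement up to floating-point error, which is the sense in which the two descriptions are the same model.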

Hence the idea that a causal model factorizes uncertainty. It factorizes out all the mechanisms that we believe connect variables, and all remaining uncertainty should be uncorrelated so far as we know.

To put it another way, if we ourselves knew about a correlation between two U_i that wasn't in the causal model, our own expectations for the joint probability table couldn't match the model's product

$p(\mathbf x) = \prod p(x_i|\mathbf{pa_i})$

and all the theorems about causal inference would go out the window. Technically, the idea that the U_i are uncorrelated is known as the causal Markov condition.

What if you realize that two variables actually are correlated more than you thought? What if, to make the diagram correspond to reality, you'd have to hack it to make some U_a and U_b correlated?

Then you draw another arrow from X_a to X_b, or from X_b to X_a; or you make a new node representing the correlated part of U_a and U_b, X_c, and draw arrows from X_c to X_a and X_b.

[Diagrams: X_a → X_b  vs.  X_b → X_a  vs.  X_a ← X_c → X_b]

(Or you might have to draw some extra causal arrows somewhere else; but those three changes are the ones that would solve the problem most directly.)
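To make the third repair concrete, here is a small Python sketch under made-up numbers. It assumes, purely for illustration, that the correlated part of U_a and U_b is a single hidden bit (which becomes the new node X_c), and that each of X_a and X_b copies that bit unless a private noise bit flips it. Leaving X_c out of the model makes X_a and X_b look correlated; conditioning on X_c removes the correlation.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_C = {0: 0.5, 1: 0.5}    # the shared part of U_a and U_b, promoted to a node X_c
P_EA = {0: 0.8, 1: 0.2}   # what remains of U_a once X_c is factored out
P_EB = {0: 0.9, 1: 0.1}   # what remains of U_b once X_c is factored out


def build_joint():
    """Joint table over (x_c, x_a, x_b), built from three independent background sources."""
    joint = {}
    for (c, pc), (ea, pa), (eb, pb) in product(P_C.items(), P_EA.items(), P_EB.items()):
        xa = c ^ ea  # X_a copies X_c unless its private noise flips it
        xb = c ^ eb  # likewise for X_b
        key = (c, xa, xb)
        joint[key] = joint.get(key, 0.0) + pc * pa * pb
    return joint


def prob(joint, pred):
    """Total probability of the rows (x_c, x_a, x_b) for which pred is true."""
    return sum(p for key, p in joint.items() if pred(*key))


JOINT = build_joint()

# With X_c left out of the diagram, X_a and X_b fail to factorize.
P_A = prob(JOINT, lambda c, xa, xb: xa == 1)
P_B = prob(JOINT, lambda c, xa, xb: xb == 1)
P_AB = prob(JOINT, lambda c, xa, xb: xa == 1 and xb == 1)
MARGINAL_GAP = P_AB - P_A * P_B


def conditional_gap(c_value):
    """p(x_a, x_b | x_c) - p(x_a | x_c) p(x_b | x_c) for x_a = x_b = 1."""
    p_c = prob(JOINT, lambda c, xa, xb: c == c_value)
    p_a = prob(JOINT, lambda c, xa, xb: c == c_value and xa == 1) / p_c
    p_b = prob(JOINT, lambda c, xa, xb: c == c_value and xb == 1) / p_c
    p_ab = prob(JOINT, lambda c, xa, xb: c == c_value and xa == 1 and xb == 1) / p_c
    return p_ab - p_a * p_b
```

With these numbers `MARGINAL_GAP` is 0.12 rather than zero, while `conditional_gap(0)` and `conditional_gap(1)` vanish up to rounding: once X_c is an explicit node, the leftover uncertainty is uncorrelated again.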

There was apparently at one point - I'm not sure if it's still going on or not - this big debate about the true meaning of randomization in experiments, and what counts as 'truly random'. Is your randomized experiment invalidated, if you use a merely pseudo-random algorithm instead of a thermal noise generator? Is it okay to use pseudo-random algorithms? Is it okay to use shoddy pseudo-randomness that a professional cryptographer would sneer at? Clearly, using 1-0-1-0-1-0 on a list of patients in alphabetical order isn't random enough... or is it? What if you pair off patients in alphabetical order, and flip a coin to assign one member of each pair to the experimental group and the control? How random is random?

Understanding that causal models factorize uncertainty leads to the realization that "randomizing" an experimental variable means using randomness, a U_x for the assignment, which doesn't correlate with your uncertainty about any other U_i. Our uncertainty about a thermal noise generator seems strongly guaranteed to be uncorrelated with our uncertainty about a subject's economic status, their upbringing, or anything else in the universe that might affect how they react to Drug A...

...unless somebody wrote down the output of the thermal noise generator, and then used it in another experiment on the same group of subjects to test Drug B. It doesn't matter how "intrinsically random" that output was - whether it was the XOR of a thermal noise source, a quantum noise source, a human being's so-called free will, and the world's strongest cryptographic algorithm - once it ends up correlated to any other uncertain background factor, any other U_i, you've invalidated the randomization. That's the implicit problem in the XKCD cartoon above.

But picking a strong randomness source, and using the output only once, is a pretty solid guarantee this won't happen.
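A toy Python sketch of the Drug A / Drug B problem, with every number invented for the example. It assumes additive, lasting effects `EFFECT_A` and `EFFECT_B`, eight subjects with hypothetical baseline scores, and a naive difference-of-means estimate of Drug B's effect. Reusing the Drug A assignment for Drug B makes the estimate absorb Drug A's effect, whereas averaging over every balanced fresh assignment recovers `EFFECT_B`.

```python
from itertools import combinations
from statistics import mean

# All values below are hypothetical and chosen only to keep the arithmetic visible.
BASELINE = [3.0, 5.0, 4.0, 6.0, 4.0, 7.0, 7.0, 4.0]  # health score before either trial
ASSIGN_A = [1, 0, 0, 1, 1, 0, 1, 0]                  # noise-generator output used for Drug A
EFFECT_A = 2.0                                       # lasting effect of Drug A
EFFECT_B = 1.0                                       # true effect of Drug B


def estimate_b(assign_b):
    """Naive estimate of Drug B's effect: mean outcome of treated minus mean of controls."""
    outcomes = [
        base + EFFECT_A * a + EFFECT_B * b
        for base, a, b in zip(BASELINE, ASSIGN_A, assign_b)
    ]
    treated = [y for y, b in zip(outcomes, assign_b) if b == 1]
    control = [y for y, b in zip(outcomes, assign_b) if b == 0]
    return mean(treated) - mean(control)


def all_balanced_assignments(n):
    """Every way of putting exactly half of n subjects into the treated group."""
    for chosen in combinations(range(n), n // 2):
        yield [1 if i in chosen else 0 for i in range(n)]


# Reusing the written-down output: Drug B's assignment is perfectly correlated with Drug A's.
REUSED_ESTIMATE = estimate_b(ASSIGN_A)

# Fresh randomness: average the estimate over every balanced assignment it could have produced.
FRESH_AVERAGE = mean(estimate_b(b) for b in all_balanced_assignments(len(BASELINE)))
```

Here `REUSED_ESTIMATE` comes out as 3.0, which is `EFFECT_A + EFFECT_B`, while `FRESH_AVERAGE` is 1.0. The bits in `ASSIGN_A` are no less "intrinsically random" in the first case; they are merely correlated with another background factor.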

Unless, ya know, you start out with a list of subjects sorted by income, and the randomness source randomly happens to put out 111111000000. Whereupon, as soon as you look at the output and are no longer uncertain about it, you might expect correlation and trouble. But that's a different and much thornier issue in Bayesianism vs. frequentism.

If we take frequentist ideas about randomization at face value, then the key requirement for theorems about experimental randomization to be applicable, is for your uncertainty about patient randomization to not correlate with any other background facts about the patients. A double-blinded study (where the doctors don't know patient status) ensures that patient status doesn't correlate with the doctor's beliefs about a patient leading them to treat patients differently. Even plugging in the fixed string "1010101010" would be sufficiently random if that pattern wasn't correlated to anything important; the trouble is that such a simple pattern could very easily correlate with some background effect, and we can believe in this possible correlation even if we're not sure what the exact correlation would be.

(It's worth noting that the Center for Applied Rationality ran the June minicamp experiment using a standard but unusual statistical method of sorting applicants into pairs that seemed of roughly matched prior ability / prior expected outcome, and then flipping a coin to pick one member of each pair to be admitted or not admitted that year. This procedure means you never randomly improbably get an experimental group that would, once you actually looked at the random numbers, seem much more promising or much worse than the control group in advance - where the frequentist guarantee that you used an experimental procedure where this usually doesn't happen 'in the long run', might be cold comfort if it obviously had happened this time once you looked at the random numbers. Roughly, this choice reflects a difference between frequentist ideas about procedures that make it hard for scientists to obtain results unless their theories are true, and then not caring about the actual random numbers so long as it's still hard to get fake results on average; versus a Bayesian goal of trying to get the maximum evidence out of the update we'll actually have to perform after looking at the results, including how the random numbers turned out on this particular occasion. Note that frequentist ethics are still being obeyed - you can't game the expected statistical significance of experimental vs. control results by picking bad pairs, so long as the coinflips themselves are fair!)
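The pairing procedure can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure CFAR actually ran: the function name `matched_pair_assignment`, the score dictionary, and the choice to drop an unpaired leftover applicant are all assumptions made for the example.

```python
import random


def matched_pair_assignment(scores, rng):
    """Pair applicants with adjacent prior scores, then flip a fair coin within each pair.

    scores: dict mapping applicant name -> prior expected outcome (hypothetical numbers).
    rng:    a random.Random instance supplying the coinflips.
    Returns (admitted, control). With an odd number of applicants the
    highest-scoring one is left unassigned in this sketch.
    """
    ranked = sorted(scores, key=scores.get)
    admitted, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        if rng.random() < 0.5:  # the coinflip decides which member is admitted
            pair.reverse()
        admitted.append(pair[0])
        control.append(pair[1])
    return admitted, control


# Usage with made-up applicants and scores:
example_scores = {"ann": 62.0, "bob": 35.0, "cat": 58.0, "dan": 41.0, "eve": 77.0, "fay": 80.0}
example_admitted, example_control = matched_pair_assignment(example_scores, random.Random(0))
```

Whatever the coins do, the two groups have equal size and their average prior scores cannot differ by more than the largest within-pair gap, which is the "never randomly improbably get a lopsided group" property described above.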

Okay, let's look at that meditation again:

Well, you know, you can stand there all day, shouting all you like about how something is outside the realm of science, but if a picture of the world has this...

...then we're either going to draw an arrow from the top card to the prediction; an arrow from the prediction to the top card (the prediction makes it happen!); or arrows from a third source to both of them (aliens are picking the top card and using telepathy on your cousin... or something; there's no rule you have to label your nodes).

More generally, for me to expect your beliefs to correlate with reality, I have to either think that reality is the cause of your beliefs, expect your beliefs to alter reality, or believe that some third factor is influencing both of them.

This is the more general argument that "To draw an accurate map of a city, you have to open the blinds and look out the window and draw lines on paper corresponding to what you see; sitting in your living-room with the blinds closed, making stuff up, isn't going to work."

Correlation requires causal interaction; and expecting beliefs to be true means expecting the map to correlate with the territory. To open your eyes and look at your shoelaces is to let those shoelaces have a causal effect on your brain - in general, looking at something, gaining information about it, requires letting it causally affect you. Learning about X means letting your brain's state be causally determined by X's state. The first thing that happens is that your shoelace is untied; the next thing that happens is that the shoelace interacts with your brain, via light and eyes and the visual cortex, in a way that makes your brain believe your shoelace is untied.

p(Shoelace=tied, Belief="tied")	0.931
p(Shoelace=tied, Belief="untied")	0.003
p(Shoelace=untied, Belief="untied")	0.053
p(Shoelace=untied, Belief="tied")	0.012
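To see what "the map correlates with the territory" means numerically, the table above can be fed to a short Python sketch. The helper names are made up for this example, and the table is renormalized because its printed entries sum to 0.999. The sketch computes the mutual information between shoelace and belief, and the same quantity for a table with identical marginals but no causal link (the product of the marginals).

```python
from math import log2

# The joint table from the text: (shoelace state, belief) -> probability.
JOINT = {
    ("tied", "tied"): 0.931,
    ("tied", "untied"): 0.003,
    ("untied", "untied"): 0.053,
    ("untied", "tied"): 0.012,
}


def normalize(table):
    """Rescale so the entries sum to 1 (the printed values sum to 0.999)."""
    total = sum(table.values())
    return {key: value / total for key, value in table.items()}


def marginals(table):
    """Return (p(shoelace), p(belief)) as two dictionaries."""
    shoe, belief = {}, {}
    for (s, b), p in table.items():
        shoe[s] = shoe.get(s, 0.0) + p
        belief[b] = belief.get(b, 0.0) + p
    return shoe, belief


def mutual_information(table):
    """Mutual information in bits between shoelace state and belief."""
    shoe, belief = marginals(table)
    return sum(
        p * log2(p / (shoe[s] * belief[b]))
        for (s, b), p in table.items()
        if p > 0
    )


def independent_version(table):
    """Same marginals, but belief formed with the blinds closed: p(s, b) = p(s) p(b)."""
    shoe, belief = marginals(table)
    return {(s, b): shoe[s] * belief[b] for s in shoe for b in belief}


TABLE = normalize(JOINT)
MI_LOOKING = mutual_information(TABLE)
MI_NOT_LOOKING = mutual_information(independent_version(TABLE))
P_UNTIED = marginals(TABLE)[0]["untied"]
P_UNTIED_GIVEN_BELIEF_UNTIED = TABLE[("untied", "untied")] / marginals(TABLE)[1]["untied"]
```

`MI_LOOKING` is positive and `MI_NOT_LOOKING` is zero up to rounding. Equivalently, believing "untied" raises the probability of an untied shoelace from about 6.5% to about 95%, which is only possible because the shoelace had a causal route into the belief.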

This is related in spirit to the idea seen earlier on LW that having knowledge materialize from nowhere directly violates the second law of thermodynamics because mutual information counts as thermodynamic negentropy. But the causal form of the proof is much deeper and more general. It applies even in universes like Conway's Game of Life where there's no equivalent of the second law of thermodynamics. It applies even if we're in the Matrix and the aliens can violate physics at will. Even when entropy can go down, you still can't learn about things without being causally connected to them.

The fundamental question of rationality, "What do you think you know and how do you think you know it?", is on its strictest level a request for a causal model of how you think your brain ended up mirroring reality - the causal process which accounts for this supposed correlation.

You might not think that this would be a useful question to ask - that when your brain has an irrational belief, it would automatically have irrational beliefs about process.

But "the human brain is not illogically omniscient", we might say. When our brain undergoes motivated cognition or other fallacies, it often ends up strongly believing in X, without the unconscious rationalization process having been sophisticated enough to also invent a causal story explaining how we know X. "How could you possibly know that, even if it was true?" is a more skeptical form of the same question. If you can successfully stop your brain from rationalizing-on-the-spot, there actually is this useful thing you can sometimes catch yourself in, wherein you go, "Oh, wait, even if I'm in a world where AI does get developed on March 4th, 2029, there's no lawful story which could account for me knowing that in advance - there must've been some other pressure on my brain to produce that belief."

Since it illustrates an important general point, I shall now take a moment to remark on the idea that science is merely one magisterium, and there's other magisteria which can't be subjected to standards of mere evidence, because they are special. That seeing a ghost, or knowing something because God spoke to you in your heart, is an exception to the ordinary laws of epistemology.

That exception would be convenient for the speaker, perhaps. But causality is more general than that; it is not excepted by such hypotheses. "I saw a ghost", "I mysteriously sensed a ghost", "God spoke to me in my heart" - there's no difficulty drawing those causal diagrams.

The methods of science - even sophisticated methods like the conditions for randomizing a trial - aren't just about atoms, or quantum fields.

They're about stuff that makes stuff happen, and happens because of other stuff.

In this world there are well-paid professional marketers, including philosophical and theological marketers, who have thousands of hours of practice convincing customers that their beliefs are beyond the reach of science. But those marketers don't know about causal models. They may know about - know how to lie persuasively relative to - the epistemology used by a Traditional Rationalist, but that's crude by the standards of today's rationality-with-math. Highly Advanced Epistemology hasn't diffused far enough for there to be explicit anti-epistemology against it.

And so we shouldn't expect to find anyone with a background story which would justify evading science's skeptical gaze. As a matter of cognitive science, it seems extremely likely that the human brain natively represents something like causal structure - that this native representation is how your own brain knows that "If the radio says there was an earthquake, it's less likely that your burglar alarm going off implies a burglar." People who want to evade the gaze of science haven't read Judea Pearl's book; they don't know enough about formal causality to not automatically reason this way about things they claim are in separate magisteria. They can say words like "It's not mechanistic", but they don't have the mathematical fluency it would take to deliberately design a system outside Judea Pearl's box.

So in all probability, when somebody says, "I communed holistically and in a purely spiritual fashion with the entire universe - that's how I know my partner loves me, not because of any mechanism", their brain is just representing something like this:

Partner loves	Universe knows	I hear universe	Probability
p	u	h	0.44
p	u	¬h	0.023
p	¬u	h	0.01
p	¬u	¬h	0.025
¬p	u	h	0.43
¬p	u	¬h	0.023
¬p	¬u	h	0.015
¬p	¬u	¬h	0.035
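Once the belief is written down as a table, ordinary probability arithmetic applies to it. The Python sketch below is one possible reading, not the only one: it assumes h means "I hear the universe say my partner loves me", and it normalizes inside each conditional because the printed entries sum to 1.001. It then compares the probability of p before and after conditioning on h.

```python
# Rows of the table above: (partner_loves, universe_knows, i_hear_universe, probability).
ROWS = [
    (True, True, True, 0.44),
    (True, True, False, 0.023),
    (True, False, True, 0.01),
    (True, False, False, 0.025),
    (False, True, True, 0.43),
    (False, True, False, 0.023),
    (False, False, True, 0.015),
    (False, False, False, 0.035),
]


def prob(pred):
    """Unnormalized probability mass of the rows satisfying pred(p, u, h)."""
    return sum(w for p, u, h, w in ROWS if pred(p, u, h))


def conditional(event, given):
    """p(event | given); dividing by the mass of `given` also handles the 1.001 total."""
    return prob(lambda p, u, h: event(p, u, h) and given(p, u, h)) / prob(given)


PRIOR_LOVES = conditional(lambda p, u, h: p, lambda p, u, h: True)
POSTERIOR_LOVES = conditional(lambda p, u, h: p, lambda p, u, h: h)
P_HEAR_GIVEN_LOVES = conditional(lambda p, u, h: h, lambda p, u, h: p)
P_HEAR_GIVEN_NOT_LOVES = conditional(lambda p, u, h: h, lambda p, u, h: not p)
```

Under this reading the table gives p(p) of roughly 0.50 and p(p|h) of roughly 0.50 as well, because h is about 90% likely whether or not p holds. The exact figures matter less than the fact that the question can be asked and answered with the standard machinery.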

True, false, or meaningless, this belief isn't beyond investigation by standard rationality.

Because causality isn't a word for a special, restricted domain that scientists study. 'Causal process' sounds like an impressive formal word that would be used by people in lab coats with doctorates, but that's not what it means.

'Cause and effect' just means "stuff that makes stuff happen and happens because of other stuff". Any time there's a noun, a verb, and a subject, there's causality. If the universe spoke to you in your heart - then the universe would be making stuff happen inside your heart! All the standard theorems would still apply.

Whatever people try to imagine that science supposedly can't analyze, it just ends up as more "stuff that makes stuff happen and happens because of other stuff".

Mainstream status.

Part of the sequence Highly Advanced Epistemology 101 for Beginners

Next post: "Causal Reference"

Previous post: "Causal Diagrams and Causal Models"

128 comments

Comments sorted by top scores.

comment by gjm · 2012-10-18T00:37:06.159Z · LW(p) · GW(p)

p(a,b,c) = p(a)p(b)p(c) isn't a statement of uncorrelatedness but of independence. Using the term "uncorrelated" with that meaning might be defensible but probably merits mention as something not-mainstream.

Replies from: wnoise, IlyaShpitser

↑ comment by wnoise · 2012-10-23T05:58:58.328Z · LW(p) · GW(p)

It's helpful to go a bit further for these corrections. What's the reason not to use "uncorrelated" here?

In ordinary English, "uncorrelated" is indeed used for this (and a host of other things, because ordinary English is very vague). The problem is that it means something else in probability theory, namely the much weaker statement E(a-E(a)) E(b-E(b)) = E((a-E(a)(b-E(b)), which is implied by independence (p(a,b) = p(a)p(b)), but does not imply independence. If we want to speak to those who know some probability theory, this clash of meaning is a problem. If we want to educate those who don't know probability theory to understand the literature and be able to talk with those who do know probability theory, this is also a problem.

(Note too that uncorrelatedness is only invariant under affine remappings: X and Y chosen as the coordinates of a random point on the unit circle are uncorrelated, yet X^2 and Y^2 are perfectly correlated. Nor does correlation directly make any sense for non-numerical variables (though you could probably lift to the simplex and use homogeneous coordinates to get a reasonable meaning).)
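The unit-circle example is easy to check numerically. The Python sketch below is an approximation under a stated assumption: it replaces "a random point on the unit circle" with 360 equally spaced points, each equally likely, and uses a hand-written `pearson` helper rather than any library routine.

```python
from math import cos, pi, sin, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally weighted samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Assumption: 360 equally spaced angles stand in for a uniformly random point on the circle.
ANGLES = [2 * pi * k / 360 for k in range(360)]
XS = [cos(t) for t in ANGLES]
YS = [sin(t) for t in ANGLES]

CORR_XY = pearson(XS, YS)                                    # X and Y themselves
CORR_SQUARES = pearson([x * x for x in XS], [y * y for y in YS])  # after the non-affine map
```

`CORR_XY` is zero up to floating-point error even though X and Y are far from independent (knowing X pins Y down to two values), and `CORR_SQUARES` is -1 because X^2 + Y^2 = 1: the correlation is perfect in magnitude, with a negative sign.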

Replies from: gjm, Richard_Kennaway

↑ comment by gjm · 2012-10-23T10:57:58.953Z · LW(p) · GW(p)

I know that Eliezer knows quite a lot of mathematics. His article was clearly written for people who are at least a bit comfortable with mathematics. So it's reasonable to suppose (1) that a substantial fraction of readers will have encountered something like the mathematical notion of "uncorrelated" and might therefore be confused by having the word used to denote something else, and (2) that in notifying Eliezer of this it's OK to be pretty terse about it.

For the avoidance of doubt, I'm not disagreeing with anything you said, just explaining why I just made the brief statement I did rather than offering more explanation.

↑ comment by Richard_Kennaway · 2012-10-23T10:53:34.198Z · LW(p) · GW(p)

The problem is that it means something else in probability theory, namely the much weaker statement E(a-E(a)) E(b-E(b)) = E((a-E(a)(b-E(b))

E(a-E(a)) and E(b-E(b)) are both identically zero, so this would be more simply put (and restoring some missing parentheses) as E((a-E(a))(b-E(b))) = 0. Or after shifting the means of both variables to zero, E(ab) = 0.

↑ comment by IlyaShpitser · 2012-10-18T00:53:48.013Z · LW(p) · GW(p)

Don't bother, he's "write-only."

edit: There is stuff in the original 'causal diagrams' post from nearly two weeks ago that is factually wrong (not a minor nitpick either), was pointed out as such, and is still uncorrected. "Write-only."

comment by satt · 2012-10-17T23:39:55.986Z · LW(p) · GW(p)

It's worth noting that the Center for Applied Rationality ran the June minicamp experiment using a standard but unusual statistical method of sorting applicants into pairs that seemed of roughly matched prior ability / prior expected outcome, and then flipping a coin to pick one member of each pair to be admitted or not

As an aside, if you're interested in looking up more about this nifty experimental design trick, the magic keyword is "blocking". The idea of randomized block designs dates back to Fisher.

Replies from: gwern

↑ comment by gwern · 2012-10-18T17:53:06.860Z · LW(p) · GW(p)

I've found blocking to be really useful for my small-scale experiments for 2 different reasons:

1. Often, I'm worried about simple randomization leading to an imbalance in sample vs experimental; if I'm only getting 20 total datapoints on something, then randomization could easily lead to something like 14 control and 6 experimental datapoints - throwing out a lot of statistical power compared to 10 control and 10 experimental. If I pair days, then I know I will get 10/10, without worrying about breaking blinding.
2. Blocking is the natural way to handle multiple-day effects or trends: if I think lithium operates slowly, I will pair entire weeks or months, rather than days and hoping enough experimental and control days form runs which will reveal any trend rather than wash it out in averaging.
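How easily simple randomization produces a split like 14/6 can be worked out exactly. The Python sketch below assumes each of the 20 datapoints is assigned by an independent fair coinflip; the function name is made up for the example.

```python
from math import comb


def prob_split_at_least_as_lopsided(n, smaller_arm):
    """P(the smaller arm gets at most `smaller_arm` of n datapoints), fair coin per datapoint.

    Counts both directions (too few controls or too few experimentals),
    so it is only valid when smaller_arm < n / 2.
    """
    one_tail = sum(comb(n, k) for k in range(smaller_arm + 1))
    return 2 * one_tail / 2 ** n


P_14_6_OR_WORSE = prob_split_at_least_as_lopsided(20, 6)
```

Under that assumption a split of 14/6 or worse happens a bit more than 11% of the time, whereas pairing days guarantees 10/10 every time.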

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-17T06:46:24.515Z · LW(p) · GW(p)

Mainstream status:

As previously stated, the take on causality's math is meant to be academically standard; this includes the idea of decomposing the X(i) into deterministic F(i) and uncorrelated U(i).

I haven't particularly seen anyone else observe that claiming you know about X without X affecting you, you affecting X, or X and your belief having a common cause, violates the Markov condition on causal graphs.

I haven't actually seen anyone cite the Markov condition as a reply to the old "What constitutes randomization?" debates I've glimpsed, but I would be genuinely surprised if Pearl & co. hadn't pointed it out by now - my understanding is that he's spending most of his time evangelizing causality to experimental statisticians these days. It seems pretty obvious once you have causal models as a background.

The concept of "separate magisteria" is as old as scientific critique of religion, but the actual phrase was coined by Stephen Jay Gould (speaking favorably of the separation, natch). So far as I know, the concept of anti-epistemology is an LW original; likewise the view that causality is more general than anyone trying to separate their magisterium would have the mathematical competence to successfully escape, even as an attempted excuse. In general, when I write about the skeptical applications, I'm usually writing things I haven't read before and that wouldn't be expected to appear somewhere like Pearl's Causality book - which doesn't imply that nobody else has written about them, of course. If you know of similar theses, comment here.

Replies from: bogus

↑ comment by bogus · 2012-10-17T18:32:56.770Z · LW(p) · GW(p)

I haven't actually seen anyone cite the Markov condition as a reply to the old "What constitutes randomization?" debates I've glimpsed

Isn't this essentially implied by the well-known ideas of "natural experiments" and "instrumental variables"? Pearl does deal with these ideas in Causality.

comment by [deleted] · 2012-10-18T02:00:20.942Z · LW(p) · GW(p)

which is totally made up

you can't just do that without consequences

Replies from: incariol

↑ comment by incariol · 2012-10-18T09:09:48.940Z · LW(p) · GW(p)

Look at it as an exercise for the actively disbelieving mini-skill. :)

Replies from: loup-vaillant

↑ comment by loup-vaillant · 2012-10-18T12:01:05.804Z · LW(p) · GW(p)

Mini-trick for the mini-skill: Pretend he's talking about a fictional universe where anything explicitly mentioned is arbitrary.

comment by Matt_Simpson · 2012-10-17T14:55:08.852Z · LW(p) · GW(p)

Whatever people try to imagine that science supposedly can't analyze, it just ends up as more "stuff that makes stuff happen and happens because of other stuff".

I think said people would object to this - e.g. "God certainly isn't stuff, God is metaphysical!" This, of course, is not a problem for causal diagrams. The math allows you to have arrows from metaphysical stuff to physical stuff, which allows you to see occam's razor visually. But it's interesting to think about how to best counter this argument when you're trying to convince your opponent and not just yourself.

Replies from: scav, DaFranker, Manfred

↑ comment by scav · 2012-10-18T15:04:23.432Z · LW(p) · GW(p)

Upvoted for:

The math allows you to have arrows from metaphysical stuff to physical stuff, which allows you to see occam's razor visually.

↑ comment by DaFranker · 2012-10-18T15:10:04.669Z · LW(p) · GW(p)

Fun gamble: Make a huge causal diagram as part of the discussion, and once people bring up the metaphysical God argument, point at the whole diagram and say "Okay, if God is metaphysical, then he's the rules by which the diagrams operate. There, look at this diagram. You're looking at God."

I doubt it'd work, but the thought made me chuckle.

↑ comment by Manfred · 2012-10-17T20:00:38.246Z · LW(p) · GW(p)

I suspect the best counter would have been to have seen more steps ahead and given them some abstract causal diagram practice.

comment by gwern · 2012-10-18T17:43:24.733Z · LW(p) · GW(p)

Clearly, using 1-0-1-0-1-0 on a list of patients in alphabetical order isn't random enough... or is it?

It's not, if only because the people implementing it can guess it: a textbook I read on doing medical trials mentioned that this procedure was done in medical trials, and it led to tampering where doctors would send the patients they liked better or were sicker or whatever to the 'right' trial arm.

Replies from: thomblake

↑ comment by thomblake · 2012-10-18T18:35:35.289Z · LW(p) · GW(p)

it led to tampering where doctors would send the patients they liked better or were sicker or whatever to the 'right' trial arm.

So they changed the person's name, or what?

Replies from: gwern

↑ comment by gwern · 2012-10-18T18:53:44.809Z · LW(p) · GW(p)

Something like that. There are a lot of ways to tamper with this: participation is voluntary, of course, so if a patient would 'benefit' from being randomized to the 'right' arm, you'd encourage them to do it, while if they weren't, you'd encourage them to drop out (and maybe get the tested treatment themselves!). You'd filter the list in the first place, or use alternative names (My legal first name starts with one letter but I always go by a version of my middle name which starts with a different letter: which version does the doctor write down?). And so on.

One interesting example, from a retrospective:

It took only a few months to accumulate the required experience in the two hospitals (Reese et al. 1952). Allocation to ACTH or no ACTH was decided by drawing marbles from a jar containing an equal number of white and blue marbles: one morning, when a new infant became eligible for enrollment, I noticed that our head nurse shook the jar vigorously, turned her head away, pulled a marble out (just as she had been instructed); but because she didn't like the 'assignment', she put the marble back, shook the jar again, and pulled out the color that agreed with her bias! The importance of Bradford Hill's precaution in Britain's famous streptomycin trial to conceal the order of assignment in sealed envelopes was immediately obvious!

Replies from: thomblake

↑ comment by thomblake · 2012-10-18T18:55:47.163Z · LW(p) · GW(p)

thanks!

comment by Johnicholas · 2012-10-17T19:39:14.702Z · LW(p) · GW(p)

Twice in this article, there are tables of numbers. They're clearly made-up, not measured from experiment, but I don't really understand exactly how made-up they are - are they carefully or casually cooked?

Could people instead use letters (variables), with relations like 'a > b', 'a >> b', 'a/b > c' and so on in the future? Then I could understand better what properties of the table are intentional.

Replies from: Kindly, MaoShan

↑ comment by Kindly · 2012-10-17T20:31:11.845Z · LW(p) · GW(p)

In my experience, using variables instead of numbers when it's not absolutely necessary makes things ridiculously harder to understand for someone not comfortable with abstract math.

Replies from: None

↑ comment by [deleted] · 2012-10-18T02:19:12.848Z · LW(p) · GW(p)

we are talking about the mathematics of causality. I would expect people to be familiar with free variables and algebra.

I for one would find explicit algebraic expressions much clearer than a bunch of meaningless numbers.

Replies from: Ezekiel

↑ comment by Ezekiel · 2012-10-18T11:40:32.182Z · LW(p) · GW(p)

Depends what you mean by "familiar". I'd imagine anyone reading the essay can do algebra, but that they're still likely to be more comfortable when presented with specific numbers. People are weird like that - we can learn general principles from examples more easily than from having the general principles explained to us explicitly.

Exceptions abound, obviously.

↑ comment by MaoShan · 2012-10-18T02:24:46.126Z · LW(p) · GW(p)

There's nothing about the tables that was not explained in the previous installment of this series; click the links if you're still confused. I came to this knowing nothing about that type of notation, but the tables told me even more than the bubble diagrams--and here's the secret. Looking at the table tells you next to nothing. It's only when you think about the situations that the probabilities quantify that they make sense. As an additional step, he could have explained each of the situations in sentence form in a paragraph, but he probably felt the table spoke for itself.

The second table, for instance, (if I am interpreting correctly) can be paraphrased as:

I believe that my partner loves me, and that the universe knows it, and I can get this answer from the universe. I would also know that if my partner didn't love me, because the universe would know it and I would hear that. It's probably one of those two. Of course it could be that I don't hear the universe, or the universe is lying to me, or that the universe doesn't magically pick up our thoughts (how unromantic!), but I really don't believe that to be true, I only admit that it's possible. I am rational, after all.

Replies from: Johnicholas

↑ comment by Johnicholas · 2012-10-18T15:27:59.745Z · LW(p) · GW(p)

I agree that if you don't look at the numbers, but at the surrounding text, you get the sense that the numbers could be paraphrased in that way.

So does h, labeled "I hear universe" mean "I hear the universe tell me something at all", or "I hear the universe tell me that they love me" or "I hear the universe tell me what it knows, which (tacitly according to the meaning of knows) is accurate"?

I thought it meant "I have a sensation as if the universe were telling me that they love me", but the highest probability scenarios are p&u&h, and -p&u&h, which would suggest that regardless of their love, I'm likely to experience a sensation as if the universe were telling me that they love me. That seems reasonable from a skeptical viewpoint, but not from a believer's viewpoint.

Replies from: DaFranker, MaoShan

↑ comment by DaFranker · 2012-10-18T15:44:55.234Z · LW(p) · GW(p)

Congratulations, you've cleared the hidden test of making sure that this isn't all just a password in your head!

IMO, which one it was intended to be is irrelevant as long as you understand both cases. Understanding these things enough to be able to untangle them like this sounds like it's really the whole point of the article.

↑ comment by MaoShan · 2012-10-19T03:30:11.760Z · LW(p) · GW(p)

I took h to mean "I accurately receive the information that the universe conveys", which in this case regarding the state of my partner loving me or not, I would still accurately hear the universe, otherwise it would be not-h. Since I am considering possible states, partner-not-loving-me/universe-tells-me/me-hearing-that would be the second most likely possibility, because the other two variables are less in doubt (for the person in the example).

If this person existed in real life, they would probably be frustrated, wondering why on earth it feels like their partner is trying to drive a wedge into the relationship, when obviously they are in love, because the universe can magically read their minds and the crystal auras never lie.

comment by Doriana_Mandrelli · 2012-10-17T13:45:17.069Z · LW(p) · GW(p)

Commenter HistoricalLing does have a point. Katsuki Sekida explains:

"Now, 'Mu' means 'nothing' and is the first koan in Zen. You might suppose that, as you sit saying 'Mu', you are investigating the meaning of nothingness. But that is quite incorrect. It is true that your teacher, who has instructed you to work on Mu, may repeatedly say to you, 'What is Mu?' 'Show me Mu,' and so on, but he is not asking you to indulge in conceptual speculation. He wants you to experience Mu. And in order to do this, technically speaking, you have to take Mu simply as the sound of your own breath and entertain no other idea."

In Zen practice, the purpose of a "koan" is to occupy the mind with a fruitless question (or in LW parlance, a wrong question). (Although "Mu" isn't even a question!) This helps the meditator to maintain concentration, since by dwelling on a dead-end like "What is the samadhi in particle after particle?" he isn't distracted by the normal flux of flitting thoughts.

The student is still expected to provide an answer, eventually, but not one arrived at by rational thought—rather, it is supposed to strike him spontaneously. Of course, this isn't a generally wise approach to answering questions; but if the Zen master were to tell his student that the koan can't be answered, he might not take the exercise seriously. (I expect that Bayesians find it difficult to meditate using koans, since they are so keenly aware of wrong questions.)

A koan is a deliberately futile question, generally short and intended to obscure thought. To use this word also to refer to puzzles which are not skew to reality and which are intended to be answered sensibly, is likely to cause bad inferences about the purpose of koans in Zen—and is jarring in this context!

Replies from: Eliezer_Yudkowsky, thomblake, Luke_A_Somers

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-17T20:13:19.726Z · LW(p) · GW(p)

Suggest a better word? Keep in mind that words which are not better will be rejected (people often seem to forget this while making alternate suggestions).

Replies from: wuncidunci, TheOtherDave

↑ comment by wuncidunci · 2012-10-17T21:55:50.883Z · LW(p) · GW(p)

I think the division into problems and exercises usually seen in mathematics texts would be useful: A task is considered an exercise if it's routine application of previous material, it's a problem if it requires some kind of insight or originality. So far most of the Koans have seemed more like problems than like exercises, but depending on content both may be useful. I might be slightly biased towards this as I greatly enjoy mathematics texts and am used to that style.

Replies from: fubarobfusco, Eliezer_Yudkowsky

↑ comment by fubarobfusco · 2012-10-17T23:51:03.570Z · LW(p) · GW(p)

"Problem" suggests something different in philosophy than in math. A philosophy "problem" is a seeming dilemma, e.g. Gettier, Newcomb's, or Trolley. So I'd suggest "exercise" here.

"Exercise" dominates "kōan" in that both have the sense of something to stop and think about and try to solve, but ① "exercise" avoids the misconstrual of Zen practice (the purpose of a Zen kōan is not to come up with a solution, nor to set up for an explanation), ② the Orientalism (the dubiosity of saying something in Japanese to make it sound 20% cooler), and ③ the distraction of having to explain what a kōan is to those who don't know the word.

EDIT: The claim that a purpose of a Zen kōan is not to come up with a solution appears to be a matter of disagreement, so discount ①. I think ② and ③ stand, though.

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2012-10-18T09:05:40.268Z · LW(p) · GW(p)

the purpose of a Zen kōan is not to come up with a solution, nor to set up for an explanation

The account in the Wikipedia article says differently:

However, in Zen practice, a kōan is not meaningless, and not a riddle or a puzzle. Teachers do expect students to present an appropriate response when asked about a kōan.

According to the history of the word given there, it originally meant accounts of legal decisions (and literally, a magistrate's bench). In Chinese Buddhism it came to refer to snippets of dialogue between masters. From there it mutated to the contemplation of mysterious sayings, and eventually to what looks very like an exercise in guessing the teacher's password, with authorised answers that were specifically taught and had to be given to acquire promotion in the Japanese monastery system. (I have this book, which is subtitled "281 Zen Koans with Answers".)

The modern meaning of "koan" dealt with in the section "Koan-practice" describes what looks very like Eliezer's intention in using the word here: a problem that cannot be answered by merely applying known rules to new examples, but requires new thoughts and ideas: a problem that begins by seeming impossible: a problem that cannot be solved without in the process learning something that one has not been taught.

Perhaps there is, somewhere, a better word, but I think "koan" will be hard to beat.

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-18T01:03:12.130Z · LW(p) · GW(p)

So... the main thing I want to convey over and above "exercise" is that rather than there being a straightforward task-to-solve, you're supposed to ponder the statement and say, "What do I think of this?"

A word other than "koan" which conveys this intent-to-ponder would indeed be appreciated.

Replies from: Kindly, MaoShan, Doriana_Mandrelli, shminux, lsparrish

↑ comment by Kindly · 2012-10-18T02:04:23.932Z · LW(p) · GW(p)

What about "riddle" or "puzzle"?

Replies from: Manfred, Jayson_Virissimo

↑ comment by Manfred · 2012-10-18T19:47:54.477Z · LW(p) · GW(p)

The only trouble I see is that "koan" makes it totally okay to think about it for a while without finding the answer, while "puzzle" might cause people to propose solutions.

Replies from: Kaj_Sotala, Kindly

↑ comment by Kaj_Sotala · 2012-10-19T05:59:07.727Z · LW(p) · GW(p)

Given that most people seem likely to look at the koan and think "yeah, I could solve that if I thought about it for a while" and then move on without actually thinking about it, anything that actually gets people to think about it seems like a good thing.

Replies from: Manfred

↑ comment by Manfred · 2012-10-19T15:18:55.238Z · LW(p) · GW(p)

The only trouble is if people then have to unthink things, which humans are notoriously bad at :P

↑ comment by Kindly · 2012-10-18T21:21:24.378Z · LW(p) · GW(p)

People have already been proposing solutions to the "koans", and I don't understand why that's a bad thing.

Replies from: DaFranker

↑ comment by DaFranker · 2012-10-18T21:35:48.213Z · LW(p) · GW(p)

The goal is to apply those algorithms we call "rationality" towards solving the koan, one of which involves withholding even just mentally formulating solutions as much as possible, and instead just thinking about the elements and properties of the problem properly without subjecting oneself to hack heuristics.

The word puzzle is, for most people, loaded with a trained impulse to shoot the first solution-sounding thing that pops to mind so that you can see whether you get a hedon / tribal status coin for a good answer or not.

Replies from: Kindly

↑ comment by Kindly · 2012-10-18T22:23:15.484Z · LW(p) · GW(p)

Alright. I see where you're coming from, though I doubt that "puzzle" and "koan" have as many deep connotations as you claim.

Maybe the right thing to do is to actually write something to the effect of "Here is how you should be approaching these puzzles/koans"?

↑ comment by Jayson_Virissimo · 2012-10-18T12:15:09.988Z · LW(p) · GW(p)

What about "riddle" or "puzzle"?

"Puzzle" is good because it suggests that there is a solution, whereas some "problems" don't have solutions, because they are simply confused.

Replies from: DaFranker

↑ comment by DaFranker · 2012-10-18T20:18:40.039Z · LW(p) · GW(p)

However, the trained behavior of most people when facing a puzzle is to look at it for a few seconds and then throw the first good-sounding solution you can think of.

Replies from: Kaj_Sotala

↑ comment by Kaj_Sotala · 2012-10-19T06:01:06.968Z · LW(p) · GW(p)

Which isn't necessarily a bad thing. Either they'll get a right answer despite throwing the first possible solution at it, or they'll widely miss the mark, in which case they might actually realize that they've learned something by the time that the right answer is demonstrated.

Replies from: DaFranker

↑ comment by DaFranker · 2012-10-19T14:24:05.824Z · LW(p) · GW(p)

You have a point. My (subconscious) priors on that end are skewed towards "Never, ever throw out solutions before you've laid things out properly" because of lots and lots of little personal experiences with complete failure modes due to stopping with the first solution I found.

↑ comment by MaoShan · 2012-10-18T02:28:12.687Z · LW(p) · GW(p)

I don't think "noodle" is taken, yet.

Replies from: Eliezer_Yudkowsky

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-18T02:40:22.772Z · LW(p) · GW(p)

Hm. I like the direction this points. Any similar suggestions?

Replies from: None, MaoShan

↑ comment by [deleted] · 2012-10-18T04:13:03.719Z · LW(p) · GW(p)

Replies from: Eliezer_Yudkowsky

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-22T05:31:57.144Z · LW(p) · GW(p)

Your suggestion has been... accepted!

Replies from: None

↑ comment by [deleted] · 2012-10-22T06:39:35.844Z · LW(p) · GW(p)

↑ comment by MaoShan · 2012-10-19T03:08:03.194Z · LW(p) · GW(p)

You could use "udon" instead of "noodle" to make it sound foreign and mysterious ;)

↑ comment by Doriana_Mandrelli · 2012-10-18T12:51:27.264Z · LW(p) · GW(p)

The word "pabulum" (from Latin for "fodder") was once used in English to mean "food for thought". However, it (or "pablum") is now more likely to denote insipid fare. We could reclaim the original meaning—in which case these statements-to-be-pondered are "pabula".

↑ comment by Shmi (shminux) · 2012-10-19T03:37:28.233Z · LW(p) · GW(p)

Consider adding "straightforward" exercises for the lesser mortals, and mark the harder ones, (koans?) with stars, like the standard textbooks do.

↑ comment by lsparrish · 2012-10-20T19:01:57.226Z · LW(p) · GW(p)

"Pondering exercise" maybe?

Interesting that "pondering" is a cognitive skill that needs to be exercised. The term derives from a Latin term for "weight". Perhaps this can be thought of as something analogous to barbells or dumbbells for epistemological strength-training.

Replies from: None

↑ comment by [deleted] · 2012-10-20T20:20:41.818Z · LW(p) · GW(p)

Perhaps this can be thought of as something analogous to barbells or dumbbells for epistemological strength-training.

I like the way you think. Care to elaborate?

Replies from: lsparrish

↑ comment by lsparrish · 2012-10-20T22:56:49.196Z · LW(p) · GW(p)

Pondering means thinking about something in a way that makes it "heavy" or difficult for the mind to process (just as heavy objects are difficult to lift). Like the metaphorical "burden of proof," it likens the mental difficulty of processing ideas to the physical difficulty of lifting objects. The way this happens involves increasing the complexity of your mental instantiation of an idea, thereby bringing more cognitive algorithms to bear on it.

The strength-training metaphor only works if it can be contrasted with endurance-training. Otherwise it would just be a generic kind of training. Strength training involves shorter bursts of focused effort followed by a recovery period. These koans are short and intended for 5-15 minutes of focused thought, so they are probably more on that end of the spectrum than lengthier articles that describe complex concepts.

Epistemological endurance training (assuming there is such a thing) would be where you spend longer periods of time thinking about a problem that requires a fair, but not overwhelming, degree of mental effort. That would be analogous to running, biking, and so forth, where rather than doing the hardest thing you can do, you are doing something rather hard for a longer time.

Replies from: None

↑ comment by [deleted] · 2012-10-21T01:58:51.662Z · LW(p) · GW(p)

Oops, I miscommunicated. I think the surface analogy isn't the most interesting part of this.

I was more interested in what ideas you had for training epistemological ability. The burst vs endurance thing could be interesting if it could be detailed on its own terms (i.e., from the inside view instead of by analogy).

I've been thinking a lot about rationality training recently, so anything that looks like a possible exercise really catches my attention.

Replies from: lsparrish

↑ comment by lsparrish · 2012-10-21T03:23:35.771Z · LW(p) · GW(p)

So it must have been "pondering as a rationality skill" which got your attention. Sorry for misinterpreting. :)

For me it's not hard to ponder. I do that naturally. But I don't always ponder exactly what I'm told to ponder, even when I have every reason to think the person who told me to ponder something knows what they're talking about and this is something that if I ponder it I will benefit from the resulting enlightenment. It's like there is something in the nature of pondering that is perverse and rebellious (at least for the way my mind works, some of the time).

Perhaps a good exercise would be to deliberately ponder specific things that you aren't (yet) naturally curious about. Maybe set a timer and commit to only focus on that particular topic until the timer goes off. I wonder what an optimal time length would be? Also, what kinds of topics could/should be used for the exercise?

↑ comment by TheOtherDave · 2012-10-17T20:20:00.307Z · LW(p) · GW(p)

Whether it's better or not for your purposes is of course your call, but as I said to chaosmosis above, I resolve this tension in my own mind by understanding "koan" as you use it to mean "exercise."

Then again, I also replace all of your Japanese phrases in my head with their corresponding English.

I suspect this just reflects my not valuing a particular kind of myth-building very much in this context, so I just experience it as a mildly annoying distraction.
If you find it valuable, by all means continue with it.

Replies from: shminux

↑ comment by Shmi (shminux) · 2012-10-17T20:44:46.166Z · LW(p) · GW(p)

I resolve this tension in my own mind by understanding "koan" as you use it to mean "exercise."

I do the same. I could find no deeper meaning in EY's use of "koan". Maybe I'm missing something.

I also replace all of your Japanese phrases in my head with their corresponding English.

Same here, except I have to look up this annoying pseudo-Japanese in-group slang almost every time. Is using it intended as some kind of status signaling?

↑ comment by thomblake · 2012-10-17T14:23:41.599Z · LW(p) · GW(p)

A koan is a deliberately futile question, generally short and intended to obscure thought. To use this word also to refer to puzzles which are not skew to reality and which are intended to be answered sensibly, is likely to cause bad inferences about the purpose of koans in Zen—and is jarring in this context!

I don't think repurposing the word 'koan' is that terrible. We are not going to do Zen koans in this context, and I would not be surprised to find that many here are more familiar with things such as Ruby koans.

Also, there is some disagreement about the meaning and use of koans - Zen (and Chan, Seon) Buddhism has many flavors. Notably, historically koans (and the Chinese sayings they were based on) did not necessarily have the character you attribute to them above; they were originally just teachings passed down in the form of sayings.

Replies from: chaosmosis

↑ comment by chaosmosis · 2012-10-17T15:11:14.869Z · LW(p) · GW(p)

The origins of the word aren't very relevant to its current meaning; almost no one on this site would have known those origins before now and so those origins don't have much influence on the way we think about the word now. The standard understanding of koans that dominates pretty much everywhere is in line with what Doriana quotes.

Using the word koan is inaccurate. I think Yudkowsky is either trying to do it to associate feelings of mystic power with rationality, or to attack feelings of mystic power by setting up expectations and then destroying those; I don't have any idea which. But it somewhat annoys me. It's not a huge deal, but it's annoying.

I'm all for repurposing words, but only if there's a decent justification to do so and I don't see one here.

Replies from: wedrifid, Manfred, thomblake

↑ comment by wedrifid · 2012-10-17T15:28:45.068Z · LW(p) · GW(p)

Using the word koan is inaccurate. I think Yudkowsky is either trying to do it to associate feelings of mystic power with rationality, or to attack feelings of mystic power by setting up expectations and then destroying those; I don't have any idea which. But it somewhat annoys me. It's not a huge deal, but it's annoying.

The first of those two hypotheses but yes, it's annoying and jarring. I had kind of hoped Eliezer got the mystic zen martial arts nonsense out of his system years ago and could start talking plain sense now.

Replies from: Kaj_Sotala, chaosmosis

↑ comment by Kaj_Sotala · 2012-10-18T09:38:17.339Z · LW(p) · GW(p)

I like the mystic Zen martial arts nonsense. Looks like it's the time for a poll.

Eliezer's mystic Zen martial arts nonsense is...

[pollid:182]

Replies from: Emile, drethelin, roystgnr, Risto_Saarelma

↑ comment by Emile · 2012-10-18T17:16:04.978Z · LW(p) · GW(p)

I voted "Don't care", whereas in reality it's more that I like things like the cult koans and Tsuyoku Naritai, but find the current use of "Koan" so-so (I like the questions, the term "koan" is a bit jarring, but I can get used to it).

↑ comment by drethelin · 2012-10-18T12:29:17.253Z · LW(p) · GW(p)

I find it super obnoxious, in exactly the same way I felt when my martial arts teachers talked about using my dantian to focus my chi instead of breathing with my diaphragm or whatever is actually useful.

↑ comment by roystgnr · 2012-10-19T00:11:03.813Z · LW(p) · GW(p)

In general the "mystic Zen martial arts nonsense" is a nice antidote to the Straw Vulcan stereotype.

That's no excuse for misusing a word in this specific instance, though.

↑ comment by Risto_Saarelma · 2012-10-18T10:43:30.822Z · LW(p) · GW(p)

The problem with regular theory exposition is that we don't have a good theoretical framework for discussing how to put theory to practice, so the difficult-to-express parts about applying the theory just get omitted. I like the martial arts nonsense insofar as it connotes an intention that you are supposed to actually put the subject matter to use and win with it, in addition to just appreciating the theory. Since we don't know how to express general instructions for putting theory to practice very well in plain speech, some evocative mysticism may be the best we can do.

↑ comment by chaosmosis · 2012-10-17T17:56:39.586Z · LW(p) · GW(p)

I don't always dislike it. "I must become stronger" benefited from the approach. I dislike this specific instance because it's jarring and doesn't fit with the context and it's a misuse of the word "koan".

↑ comment by Manfred · 2012-10-17T20:01:04.283Z · LW(p) · GW(p)

The origins of the word aren't very relevant to its current meaning

If you'll allow me to take this a bit out of context, please think of typical Zen usage as "origins of the word" and usage in this sequence of posts as "its current meaning."

The difference is obvious, of course - you know what the word means, and anything else is wrong. Which is totally fine. I just wanted to point out that if you try to make your conclusions universal or absolute here, you will in fact create more relativism - the solution is to claim the non-universal knowledge of how words should be used if you're the audience.

↑ comment by thomblake · 2012-10-17T15:25:48.275Z · LW(p) · GW(p)

The standard understanding of koans that dominates pretty much everywhere is in line with what Doriana quotes.

I disagree. I would predict that most people have no idea what "koan" means, those that have seriously studied Buddhism are aware of the controversy, and a significant mass of people (especially represented in this demographic) are more familiar with the use of "koan" in programming, as with Ruby koans.

The concern seems to be that those who haven't actually studied varieties of Buddhism but are somehow aware of the word "koan" might be confused - but the word is clearly defined before its first use in this sequence:

(A 'koan' is a puzzle that the reader is meant to attempt to solve before continuing. It's my somewhat awkward attempt to reflect the research which shows that you're much more likely to remember a fact or solution if you try to solve the problem yourself before reading the solution; succeed or fail, the important thing is to have tried first. This also reflects a problem Michael Vassar thinks is occurring, which is that since LW posts often sound obvious in retrospect, it's hard for people to visualize the diff between 'before' and 'after'; and this diff is also useful to have for learning purposes. So please try to say your own answer to the koan - ideally whispering it to yourself, or moving your lips as you pretend to say it, so as to make sure it's fully explicit and available for memory - before continuing; and try to consciously note the difference between your reply and the post's reply, including any extra details present or missing, without trying to minimize or maximize the difference.)

Replies from: chaosmosis

↑ comment by chaosmosis · 2012-10-17T18:05:55.275Z · LW(p) · GW(p)

When I google "koan", the first result is Wikipedia which says a koan is "a story, dialogue, question, or statement, which is used in Zen practice to provoke the "great-doubt", and test the student's progress in Zen practice". Very Zen, that supports my side. The second result is Merriam-Webster's dictionary, which says a koan is "a paradox to be meditated upon that is used to train Zen Buddhist monks to abandon ultimate dependence on reason". My side. The third result is for a page titled "101 Zen Koans", which again supports my belief.

Eliezer has a history of associating mysticism with rationality, as well.

My personal concern is that using words wrong is annoying because I don't like people mucking up my conceptual spaces. I can't disassociate koans from mysticism and riddles, which makes it awkward and aesthetically unpleasing for me to approach problems of rationality from a "koan".

That said, it's probably too late to change the format of the problems in this current sequence. But I'd like it to never happen again after this gets done.

Replies from: TheOtherDave

↑ comment by TheOtherDave · 2012-10-17T20:14:11.232Z · LW(p) · GW(p)

I suspect it will continue to happen. Invoking the cultural trappings of a certain kind of mysticism while discussing traditionally "rational" topics is, as you note, a popular practice... and not only of Eliezer's.

I recommend treating the word "koan" as used here as a fancy way of saying "exercise".

↑ comment by Luke_A_Somers · 2012-10-18T17:25:36.769Z · LW(p) · GW(p)

And then we realize that the use of the word 'koan' was not entirely serious, and get on with the sequence.

Also, note the side-effect of that karma penalty - responding to things without organizing the post appropriately. Whee.

(note to self: check when I loaded the page before commenting)

comment by pragmatist · 2012-10-18T14:06:28.745Z · LW(p) · GW(p)

You seem to be exaggerating the generality of the causal Markov condition (CMC) when you say it is deeper and more general than the second law of thermodynamics. In a big world, failures of the CMC abound. Let's say the correlation between the psychic cousin's predictions and the top card of the deck is explained by the person performing the test being a stooge, who is giving some non-verbal indication to the purported psychic about the top card. So here we have a causal explanation of the correlation, as the CMC would lead us to expect. But since we are in a big world, there are a massive number of Boltzmann brains out there, outside our light cone, whose brain states correlate with the top card in the same way that the cousin's does. But there is no causal explanation for this correlation, it's just the kind of thing one would expect to happen, even non-causally, in a sufficiently large world. So the CMC isn't a universal truth.

Now, the CMC is a remarkably accurate rule if we restrict it to our local environment. But it's pretty plausible that this is just because our local environment is monotonically entropy-increasing towards the future and entropy-decreasing towards the past. Because of this feature of our environment, local interventions produce correlations that propagate out spatially towards the future, but not towards the past. When you drop a rock into a pond, waves originate at the point the rock hit the water and travel outwards towards the future, eventually producing spatially distant correlations (like fish at either end of the pond being disturbed from their slumber).

Imagine that there is a patch somewhere in the trackless immensity of spacetime that looks exactly like our local environment, but time-reversed. Here we would have a pond with a rock initially lying at its bottom. Spontaneously, the edges of the pond fluctuate so as to produce a coherent inward-directed wave, which closes in on the rock, transferring to it sufficient energy to make it shoot out of the pond. If you don't allow backward causation, then it seems that the initial correlated fluctuation that produced the coherent wave has no causal explanation, a violation of the CMC.

The second law is often read as a claim about the condition of the early universe (or some patch of the universe), specifically that there were no correlations between different degrees of freedom (such as the positions and velocities of particles) except for those imposed by the macroscopic state. There were no sneaky microscopic correlations that could later produce macroscopic consequences (see this paper). Entropy increase follows from that, the story goes, and, plausibly, the success of the CMC follows from that as well. There is a strong case to be made that the second law is prior to the CMC in the order of explanation.

Replies from: scav, adam_strandberg, khafra, nshepperd

↑ comment by scav · 2012-10-18T15:02:06.608Z · LW(p) · GW(p)

I have doubts about how meaningful it is to talk of correlating things that are outside each other's light cones.

Besides that, suppose there really are an astronomical number of Boltzmann Brains that you could say are non-causally correlated with the top card of a particular deck of cards. Calling this a failure of the Causal Markov Condition is begging the question because the only thing identifying this set is selection based on the correlation itself. The set you should consider, of all Boltzmann Brains that you could test for correspondence with the top card, will not be correlated with it at all.
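A minimal sketch of the selection effect described above, under assumptions that are not in the thread: each "brain" is modelled as a uniformly random guesser with no causal link to the deck, and the deck size, number of draws, and the function name `selection_effect` are all hypothetical. It compares the hit rate over every guesser with the hit rate over only those guessers picked because they matched.

```python
import random


def selection_effect(num_brains=50_000, num_cards=4, num_draws=5, seed=0):
    """Compare hit rates for all random guessers vs. guessers picked for matching."""
    rng = random.Random(seed)
    # Hypothetical sequence of top cards revealed over several draws.
    top_cards = [rng.randrange(num_cards) for _ in range(num_draws)]

    total_hits = 0
    selected_hits = 0
    selected_count = 0
    for _ in range(num_brains):
        # Each "brain" guesses uniformly at random, independent of the deck.
        guesses = [rng.randrange(num_cards) for _ in range(num_draws)]
        hits = sum(g == c for g, c in zip(guesses, top_cards))
        total_hits += hits
        # Selection step: keep only brains that happened to match every draw.
        if hits == num_draws:
            selected_count += 1
            selected_hits += hits

    rate_all = total_hits / (num_brains * num_draws)
    rate_selected = (
        selected_hits / (selected_count * num_draws) if selected_count else None
    )
    return rate_all, rate_selected, selected_count


if __name__ == "__main__":
    rate_all, rate_selected, count = selection_effect()
    print("hit rate over all brains:     ", rate_all)
    print("hit rate over selected brains:", rate_selected, f"({count} brains)")
```

Over the whole population the hit rate stays near chance (1 in `num_cards`), while the selected subset matches perfectly by construction. Whether that subset is the right reference class is exactly what the replies below dispute.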

Entropy increase follows from that, the story goes...

Follows from it causally, like? :)

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-18T16:25:14.044Z · LW(p) · GW(p)

I have doubts about how meaningful it is to talk of correlating things that are outside each other's light cones.

I don't see why you would have these doubts. Whether or not two variables are correlated is a purely mathematical condition. Why do you think it matters where in space-time the physical properties those variables describe are instantiated?

Besides that, suppose there really are an astronomical number of Boltzmann Brains that you could say are non-causally correlated with the top card of a particular deck of cards. Calling this a failure of the Causal Markov Condition is begging the question because the only thing identifying this set is selection based on the correlation itself. The set you should consider, of all Boltzmann Brains that you could test for correspondence with the top card, will not be correlated with it at all.

Wait, why is the relevant reference class the class of all and only Boltzmann brains? It seems more natural to pick a reference class that includes all brains (or brain-states). But in that case, the probabilities of the Boltzmann brain being in the states that it is in will be exactly the same as the probabilities of the psychic cousin being in the states that he is in (since the states are the same by hypothesis), so if the psychic's brain states are correlated with the top card the BB's will be as well.

Follows from it causally, like? :)

Sure, if you want. I'm not denying here that causality is prior to the second law. I'm denying that the causal Markov condition is prior to the second law.

Replies from: scav

↑ comment by scav · 2012-10-19T09:13:53.208Z · LW(p) · GW(p)

OK. wrt the light cones, I was posting without my brain switched on. Obviously two events can be outside each other's light cones and yet a correlation between them can still be observed where their light cones overlap in the future. I was thinking fairly unclearly about whether you could be in an epistemic state to consider correlation between things outside your own light cone, but this is kind of irrelevant, so please disregard.

the probabilities of the Boltzmann brain being in the states that it is in will be exactly the same as the probabilities of the psychic cousin being in the states that he is in (since the states are the same by hypothesis)

Just because the states are the same doesn't mean the probabilities of being in that state are the same. It's only meaningful to discuss the probability of an outcome in terms of a probability distribution over possible outcomes. If you pick a set of conditions such as "Boltzmann brains in the same state as that of the psychic cousin" you are creating the hypothetical correlation yourself by the way you define it. To my mind, that's not a thought experiment that can tell you anything.

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-19T12:31:46.916Z · LW(p) · GW(p)

Just because the states are the same doesn't mean the probabilities of being in that state are the same. It's only meaningful to discuss the probability of an outcome in terms of a probability distribution over possible outcomes.

In my example, I specified that the BB is in a reference class with all other brains, including the psychic cousin's. Given that they are both in the reference class, the fact that the BB and the cousin share the same cognitive history implies that the probabilities of their cognitive histories relative to this reference class are the same. The reference class is what fixes the probability distribution over possible outcomes if you're determining probabilities by relative frequencies, and if they are in the same reference class, they will have the same probability distribution.

I suspect Eliezer was thinking of a different probability distribution over brain states when he said the psychic's brain state is correlated with the deck of cards. The probabilities he is referring to are something like the relative frequencies of brain states (or brain state types) in a single observer's cognitive history (ETA: Or perhaps more accurately for a Bayesian, the probabilities you get when you conditionalize some reasonable prior on the sequence of instantiated brain states). Even using this distribution, the BB's brain state will be correlated with the top card.

Replies from: adam_strandberg

↑ comment by adam_strandberg · 2012-10-19T13:59:02.405Z · LW(p) · GW(p)

Even if the BB and the psychic are in causally disconnected parts of your model, them having the same probability of being correlated with the card doesn't imply that the Causal Markov Condition is broken. In order to show that, you would need to specify all of the parent nodes to the BB in your model, calculate the probability of it being correlated with the card, and then see whether having knowledge of the psychic would change your probability for the BB. Since all physics currently is local in nature, I can't think of anything that would imply this is the case if the psychic is outside of the past light cone of the BB. Larger boundary conditions on the universe as a whole that may or may not make them correlate have no effect on whether the CMC holds.

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-19T14:45:43.268Z · LW(p) · GW(p)

I'm having trouble parsing this comment. You seem to be granting that the BB's state is correlated with the top card (I'm assuming this is what you mean by "having the same probability"), that there is no direct causal link between the BB and the psychic, and that there are no common causes, but saying that this still doesn't necessarily violate the CMC. Am I interpreting you right? If I'm not, could you tell me which one of those premises does not hold in my example?

If I am interpreting you correctly, then you are wrong. The CMC entails that if X and Y are correlated, X is not a cause of Y and Y is not a cause of X, then there are common causes of X and Y such that the variables are independent conditional on those common causes.

↑ comment by adam_strandberg · 2012-10-19T05:32:50.472Z · LW(p) · GW(p)

The CMC is not strictly violated in physics as far as we know. If you specify the state of the universe for the entire past light cone of some event, then you uniquely specify the event. The example that you gave of the rock shooting out of the pond indeed does not violate the laws of physics- you simply shoved the causality under the rug by claiming that the edge of the pond fluctuated "spontaneously". This is not true. The edge of the pond fluctuating was completely specified by the past light cone of that event. This is the sense in which the CMC runs deeper than the 2nd law of thermodynamics- because the 2nd "law" is probabilistic, you can find counterexamples to it in an infinite universe. If you actually found a counterexample to the CMC, it would make physics essentially impossible.

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-19T06:51:56.706Z · LW(p) · GW(p)

I meant "spontaneous" in the ordinary thermodynamic sense of spontaneity (like when we say systems spontaneously equilibrate, or that spontaneous fluctuations occur in thermodynamic systems), so no violation of microphysical law was intended. Spontaneous here just means there is no discernible macroscopic cause of the event. Now it is true that everything that happened in the scenario I described was microscopically determined by physical law, but this is not enough to satisfy the CMC. What we need is some common cause account of the macroscopic correlation that leads to a coherent inward-directed wave, and simply specifying that the process is law-governed does not provide such an account. I guess you could just say that the common cause is the initial conditions of the universe, or something like that. If that kind of move is allowed, then the CMC is trivially satisfied for every correlation. But when people usually appeal to the CMC they intend something stronger than this. They're usually talking about a spatially localized cause, not an entire spatial hypersurface.

If you allow entire hypersurfaces as nodes in your graph, you run into trouble. In a deterministic world, any correlation between two properties isn't just screened off by the contents of past hypersurfaces, it's also screened off by the contents of future hypersurfaces. But a future hypersurface can't be a common cause of the correlated properties, so we have a correlation screened off by a node that doesn't d-separate the correlated variables. This doesn't violate the CMC per se, but it does violate the Faithfulness Condition, which says that the only conditional independencies in nature are the ones described by the CMC. If the Faithfulness Condition fails, then the CMC becomes pretty useless as a tool for discerning causation from correlation. The lessons of Eliezer's posts would no longer apply. So to rule out radical failure of the Faithfulness Condition in a deterministic setting, we have to disallow the contents of an entire hypersurface from being treated as a single node in a causal graph. Nodes should correspond to sufficiently locally instantiated properties. But then that re-opens the possibility that the correlation described in my example violates the CMC. There is no locally instantiated common cause.

If there is some past screener-off of the correlation in the time-reversed patch, its counterpart would also be a future screener-off of the correlation in our patch. If we want to say that the Faithfulness Condition holds in our patch (or at least in this example), we have to rule out future screeners-off, but that also implies that the CMC fails in the time-reversed patch.

↑ comment by khafra · 2012-10-22T19:24:06.408Z · LW(p) · GW(p)

Indexically, though, you wouldn't expect to be talking to a mind that just happened to issue something it called predictions, which just happened to be correlated with some unobserved cards, would you? I think the CMC doesn't say that a mind can never be right without being causally entangled with the system it's trying to be right about; just that if it is right, it's down to pure chance.

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-22T23:49:03.357Z · LW(p) · GW(p)

I think the CMC doesn't say that a mind can never be right without being causally entangled with the system it's trying to be right about; just that if it is right, it's down to pure chance.

No, the CMC says that if you conditionalize on all of the direct causes of some variable A in some set of variables, then A will be probabilistically independent of all other variables in that set except its effects. This rules out chance correlation. If there were some other variable in the set that just happened to be correlated with A without any causal explanation, then conditionalizing on A's direct causes would not in general eliminate this correlation.
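As a rough illustration of the screening-off behaviour described above, here is a minimal simulation sketch. The three-variable model (a common cause `c` with two effects `a` and `b`) and all of its probabilities are assumptions made up for this example; they are not taken from the thread.

```python
import random

def simulate(n=50000, seed=0):
    """Sample a made-up model in which C is the only direct cause of A and of B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                   # common cause
        a = rng.random() < (0.9 if c else 0.1)   # effect 1, depends only on C
        b = rng.random() < (0.9 if c else 0.1)   # effect 2, depends only on C
        rows.append((c, a, b))
    return rows

def p_b_given(rows, a_value, c_value=None):
    """Estimate P(B=1 | A=a_value), optionally also conditioning on C=c_value."""
    selected = [b for c, a, b in rows
                if a == a_value and (c_value is None or c == c_value)]
    return sum(selected) / len(selected)

rows = simulate()

# Without conditioning on C, learning A shifts the estimate for B a lot
# (0.82 versus 0.18 in expectation for these made-up numbers).
marginal_gap = p_b_given(rows, True) - p_b_given(rows, False)

# After conditioning on the direct cause C, learning A tells us almost nothing about B.
conditional_gap = p_b_given(rows, True, True) - p_b_given(rows, False, True)

print(f"gap without conditioning on C: {marginal_gap:.3f}")
print(f"gap after conditioning on C:   {conditional_gap:.3f}")
```

If `a` and `b` were instead correlated with no causal explanation in the graph, the second gap would not shrink toward zero, and that is the situation the CMC forbids.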

↑ comment by nshepperd · 2012-10-19T07:15:18.963Z · LW(p) · GW(p)

If coincidences were a violation of the CMC, it wouldn't be a truth at all, would it?

Replies from: pragmatist

↑ comment by pragmatist · 2012-10-19T07:20:28.583Z · LW(p) · GW(p)

Well, one could still say it was true in certain environments, or true like the Ideal Gas Law is true.

comment by johnlawrenceaspden · 2012-10-18T10:23:28.228Z · LW(p) · GW(p)

I am really enjoying these causality posts. Thank you for them and for the skillful writing that makes them so readable.

comment by incariol · 2012-10-23T23:25:39.517Z · LW(p) · GW(p)

Um, let's see if I get this (thinking to myself but posting here if anyone happens to find this useful - or even intelligible)...

claiming you know about X without X affecting you, you affecting X, or X and your belief having a common cause, violates the Markov condition on causal graphs

The causal Markov condition is that a phenomenon is independent of its noneffects, given its direct causes. It is equivalent to the ordinary Markov condition for Bayesian nets (any node in a network is conditionally independent of its nondescendents, given its parents) when the structure of a Bayesian network accurately depicts causality.

So, this condition induces certain (conditional) independencies between nodes in a causal graph (that can be found using the D-separation trick), and when we find two such nodes, they must also be uncorrelated (this follows from probabilistic independence being a stronger property than uncorrelatedness).

If one therefore claims there's a persistent correlation between X and belief about X, this means there's got to be some active path in Bayesian network for probabilistic influence to flow between them - otherwise, X and Belief(X) would be D-separated and thereby independent and uncorrelated. Insisting there's no such path (e.g. no chain of directed links) leads to violation of Markov condition, since it maintains there's probabilistic dependence between two nodes in a graph that cannot be accounted for by the causal links currently in the graph.
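The D-separation check mentioned above can be sketched in a few lines. This is a minimal version using the standard ancestral moral graph criterion, with the graph stored as a hypothetical `parents` dictionary; the node names (`card`, `eyes`, `belief`, `outcome`) are invented for illustration.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG described by `parents`."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Build the moral graph on the ancestral set: link each child to its
    # parents, "marry" co-parents, and forget edge directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        node_parents = list(parents.get(child, ()))
        for p in node_parents:
            adjacent[child].add(p)
            adjacent[p].add(child)
        for p, q in combinations(node_parents, 2):
            adjacent[p].add(q)
            adjacent[q].add(p)
    # x and y are d-separated iff no path connects them once `given` is removed.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(adjacent[node])
    return True

chain = {"eyes": ["card"], "belief": ["eyes"]}   # card -> eyes -> belief
no_path = {"card": [], "belief": []}             # no causal link at all
collider = {"outcome": ["card", "belief"]}       # card -> outcome <- belief

print(d_separated(chain, "card", "belief"))                        # False
print(d_separated(chain, "card", "belief", given={"eyes"}))        # True
print(d_separated(no_path, "card", "belief"))                      # True
print(d_separated(collider, "card", "belief", given={"outcome"}))  # False
```

In the `no_path` graph the card and the belief are d-separated, so under the Markov condition they must be independent; insisting on a persistent correlation there is what amounts to a violation.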

comment by Eugine_Nier · 2012-10-19T03:06:57.089Z · LW(p) · GW(p)

More generally, for me to expect your beliefs to correlate with reality, I have to either think that reality is the cause of your beliefs, expect your beliefs to alter reality, or believe that some third factor is influencing both of them.

I can construct examples where for this to be true requires us to treat mathematical truths as causes. Of course, this causes problems for the Bayesian definition of "cause".

Replies from: Eliezer_Yudkowsky, adam_strandberg, endoself

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-27T03:33:33.055Z · LW(p) · GW(p)

Yes. An argument similar to this should still be in the other-edited version of my unfinished TDT paper, involving a calculator on Venus and a calculator on Mars, the point being that if you're not logically omniscient then you need to factor out logical uncertainty for the Markov property to hold over your causal graphs, because physically speaking, all common causes should've been screened off by observing the calculators' initial physical states on Earth. Of course, it doesn't follow that we have to factor out logical uncertainty as a causal node that works like every other causal node, but we've got to factor it out somehow.

Replies from: Eugine_Nier, None

↑ comment by Eugine_Nier · 2012-10-27T20:52:10.151Z · LW(p) · GW(p)

My point is more general than this. Namely, that a calculator on Earth and a calculator made by aliens in the Andromeda galaxy would correspond despite humans and the Andromedeans never having had any contact.

↑ comment by [deleted] · 2012-10-27T03:50:38.194Z · LW(p) · GW(p)

Of course, it doesn't follow that we have to factor out logical uncertainty as a causal node that works like every other causal node

Is there some reason not to treat logical stuff as normal causal nodes? Does that cause us actual trouble, or is it just a bit confusing sometimes?

Replies from: Eliezer_Yudkowsky

↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-10-27T04:22:25.320Z · LW(p) · GW(p)

In causal models, we can have A -> B, E -> A, E -> ~B. Logical uncertainty does not seem offhand to have the same structure as causal uncertainty.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-27T20:56:52.965Z · LW(p) · GW(p)

You seem to be confusing the causal arrow with the logical arrow. As endoself points out here, proofs logically imply their theorems, but a theorem causes its proof.

↑ comment by adam_strandberg · 2012-10-19T06:24:00.351Z · LW(p) · GW(p)

Can you provide an example? I would claim that for any model in which you have a mathematical truth as a node in a causal graph, you can replace that node by whatever series of physical events caused you to believe that mathematical truth.

Replies from: Eugine_Nier, Peterdjones

↑ comment by Eugine_Nier · 2012-10-19T07:57:16.008Z · LW(p) · GW(p)

I add 387+875 to get 1262, from this I can conclude that anyone else doing the same computation will get the same answer despite never having interacted with them.
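A small sketch of the situation being described: two adders written independently, sharing no state and never calling each other, that nonetheless agree. Both function names are hypothetical, and the schoolbook version assumes non-negative integers.

```python
def add_builtin(a, b):
    """One 'calculator': relies on the language's integer addition."""
    return a + b

def add_schoolbook(a, b):
    """Another 'calculator': digit-by-digit decimal addition with carry.

    Assumes a and b are non-negative integers.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]   # least significant digit first
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        carry, digit = divmod(dx + dy + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return int("".join(reversed(digits)))

first = add_builtin(387, 875)
second = add_schoolbook(387, 875)
print(first, second, first == second)  # 1262 1262 True
```

Nothing in the code links the two functions, so the agreement comes entirely from the arithmetic they both implement. Whether that arithmetic deserves its own node in a causal graph is the question at issue in this subthread.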

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-19T17:42:17.804Z · LW(p) · GW(p)

You can't conclude that unless you are aware of the contingent fact that they are capable of getting the answer right.

Replies from: chaosmosis

↑ comment by chaosmosis · 2012-10-19T17:51:38.512Z · LW(p) · GW(p)

"The same computation" doesn't cover that?

↑ comment by Peterdjones · 2012-10-19T17:41:31.386Z · LW(p) · GW(p)

Why would you want a mathematical truth on a causal graph? Are the transition probabilities ever going to be less than 1.0?

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-20T03:05:37.170Z · LW(p) · GW(p)

The transition probabilities from the mathematical truth on something non-mathematical will certainly be less than 1.0.

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-21T17:01:43.652Z · LW(p) · GW(p)

And the transition probabilities to a truth will be 1.0. So why write it in? It would be like sprinkling a circuit diagram with zero ohm resistors.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-21T22:23:13.895Z · LW(p) · GW(p)

Because otherwise the statement I quoted in the great-great-grandparent becomes false.

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-23T22:02:50.183Z · LW(p) · GW(p)

Inasmuch as you have stipulated that "performing the same calculation" means "performing the same calculation correctly", rather than something like "launching the same algorithm but possibly crashing", your statement is tautologous. In fact, it is a special case of the general statement that anyone successfully performing a calculation will get the same result as everyone else. But why would you want to use a causal diagram to represent a tautology? The two have different properties. Causal diagrams have <1.0 transition probabilities, which tautologies don't. Tautologies have conceptually intelligible relationships between their parts, which causal diagrams don't.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-24T03:09:35.173Z · LW(p) · GW(p)

Observe that your two objections cancel each other out. If someone performs the same calculation, there is a significant (but <1.0) chance that it will be done correctly.

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-25T08:29:25.653Z · LW(p) · GW(p)

What has that to do with mathematical truth? You might as well say that if someone follows the same recipe there is a significant chance that the same dish will be produced. Inasmuch as you are talking about something that can haphazardly fail, you are not talking about mathematical truth.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-26T00:35:03.327Z · LW(p) · GW(p)

I can predict what someone else will conclude, without any causal relationship, in the conventional sense, between us.

Replies from: CCC, Peterdjones

↑ comment by CCC · 2012-10-26T07:42:56.626Z · LW(p) · GW(p)

Your prediction is a prediction of what someone else will conclude, given a set of initial conditions (the mathematical problem) and a set of rules to apply to these conditions. The conclusion that you arrive at is a causal descendant of the problem and the rules of mathematics; the conclusion that the other person arrives at is a causal descendant of the same initial problem and the same rules.

That's the causal link.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-26T23:02:35.825Z · LW(p) · GW(p)

That's my point. Specifically, that one should have nodes in one's causal diagram for mathematical truths, what you called "rules of mathematics".

Replies from: CCC

↑ comment by CCC · 2012-10-28T14:26:36.982Z · LW(p) · GW(p)

Surely the node should be "person X was taught basic mathematics", and not mathematics itself?

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-11-03T21:50:15.830Z · LW(p) · GW(p)

The point of having the node is to have a common cause of person X's beliefs about mathematics and person Y's beliefs about mathematics that explains why these two beliefs are correlated even if both discovered said mathematics independently.

↑ comment by Peterdjones · 2012-10-26T01:05:30.677Z · LW(p) · GW(p)

What has that to do with any causal powers of mathematical truth?

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-26T01:08:49.033Z · LW(p) · GW(p)

If you want your causal graph to have the property I quoted here, you need to add nodes for mathematical truths.

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-26T01:19:36.462Z · LW(p) · GW(p)

Two people can arrive at the same solution to a crossword, but that does not mean there is a Cruciverbial Truth that has causal powers.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-26T02:32:52.331Z · LW(p) · GW(p)

Yes it does. In this case said truth even has a physical manifestation, i.e., as the crossword-writer's solution as it exists in some combination of his head and his notes which is causal to the form of the crossword the solver sees.

Replies from: Peterdjones

↑ comment by Peterdjones · 2012-10-26T08:01:47.186Z · LW(p) · GW(p)

It only has a physical manifestation. Cruciverbial Truth only summarises what could have been arrived at by a massively fine-grained examination of the crossword-solver's neurology. It doesn't have causal powers of its own. It's redundant in relation to physics.

↑ comment by endoself · 2012-10-20T04:32:50.018Z · LW(p) · GW(p)

Mathematical truths do behave like causes. Remember, Bayesian probabilities represent subjective uncertainty. Yes, my uncertainty about the Riemann hypothesis is correlated with my uncertainty about other mathematical facts in the same way that my uncertainty about some physical facts is correlated with my uncertainty about others, so I can represent them both as Bayesian networks (really, one big Bayesian network, as my uncertainty about math is also correlated with my uncertainty about the world).

comment by Decius · 2012-10-17T21:54:20.031Z · LW(p) · GW(p)

To answer your discussion about randomizing the control groups and experimental groups- you don't use randomness or noise to divide those groups. You divide the population for study into the number of groups you need, and make that division such that those groups are as close to identical as possible, using all of the data you have on all of them.

Thermal noise and pseudo-random numbers can be used to break ties, but only because if there were any known distinction between the two outcomes, the classification would be deterministic.

comment by MoritzG · 2019-09-09T17:18:44.711Z · LW(p) · GW(p)

"universe is a connected fabric of causes and effects."

I do not think that the universe as a whole is one fabric of causes and effects. There are isolating layers of randomness and chaos upon which there are new layers of emergence. This is why we can model at all without having one unified model.

"Every causally separated group of events would essentially be its own reality."

Places outside our solar system are their own realities in that sense. We have no effect there. Only maybe someone is there to amplify our radio signals.

comment by DavidAgain · 2012-10-20T08:26:18.253Z · LW(p) · GW(p)

Having spent a regrettably large amount of time on forums where the 'magisteria' type questions were had, I think that you're representing the 'outside of science' position slightly unfairly. Obviously, it often tries to have its cake and eat it. But you're substituting 'standard rationality', or perhaps 'questions of cause and effect' for 'science'. Some magisteria-types would say that there are direct causal effects from God or ghosts, but that these do not manifest with the regularity of things that you're likely to be able to find through scientific experiment. They think that the world is better explained by including God or ghosts, but that you can't devise an experiment to prove/disprove them (for a variety of reasons, up to and including 'the ghosts don't come out when you're trying to test if they exist').

This is aside from the people who basically mean that their religion or whatever is just subjective.

Replies from: Eugine_Nier

↑ comment by Eugine_Nier · 2012-10-21T02:50:48.241Z · LW(p) · GW(p)

He discusses that distinction here.

comment by Decius · 2012-10-17T22:30:03.709Z · LW(p) · GW(p)

Any time there's a noun, a verb, and a subject[sic], there's causality.

Counterexamples: "I know this." "Rational people with the same information cannot reasonably disagree about their conclusions." "General and Special Relativity both require that observers in different reference frames measure the length of an artifact differently."

I think what you might have meant is "Any time that a concrete subject takes an action with a direct object", there's causality; there's probably a more general form.

I know that the top card is either the six of spades or some variant of 'rules of poker/ranking of poker hands'. There is no 'because' in that sentence, because 'because' is a word that only has meaning in causal terms. Go ahead- test me with the nearest deck of cards.

comment by HistoricalLing · 2012-10-17T09:27:40.029Z · LW(p) · GW(p)

Previous koan:

"You say that a universe is a connected fabric of causes and effects. Well, that's a very Western viewpoint - that it's all about mechanistic, deterministic stuff. I agree that anything else is outside the realm of science, but it can still be real, you know. My cousin is psychic - if you draw a card from his deck of cards, he can tell you the name of your card before he looks at it. There's no mechanism for it - it's not a causal thing that scientists could study - he just does it. Same thing when I commune on a deep level with the entire universe in order to realize that my partner truly loves me. I agree that purely spiritual phenomena are outside the realm of causal processes that can be studied by experiments, but I don't agree that they can't be real."

that's pretty much the worst koan I've ever heard

comment by LauralH · 2012-10-30T05:04:59.612Z · LW(p) · GW(p)

You're writing this instead of Harry Potter fanfic? Sigh.

Replies from: David_Gerard, None

↑ comment by David_Gerard · 2012-10-30T08:53:56.590Z · LW(p) · GW(p)

This is actual day-job stuff.

Replies from: LauralH

↑ comment by LauralH · 2012-12-31T22:11:18.887Z · LW(p) · GW(p)

I was under the impression that the HPMoR story was to entice people to become "more rational", that is, get them to read more of the "day-job" stuff. There was also supposed to be an actual book on rationality, but it looks like that's been put on hold as well. Which to me seemed like a wise decision, since more people were being led to simply read the sequences via HPMoR already, so why bother with a book?

Replies from: BerryPick6

↑ comment by BerryPick6 · 2012-12-31T22:14:10.732Z · LW(p) · GW(p)

Which to me seemed like a wise decision, since more people were being led to simply read the sequences via HPMoR already, so why bother with a book?

What evidence do you have for this? I recall some stats from the last census which indicated that LWers referred here by HPMoR were less likely to have read the sequences and be active participants than the general population.

↑ comment by [deleted] · 2012-10-30T05:09:11.677Z · LW(p) · GW(p)

Try reading through Mysterious Answers to Mysterious Questions. You might actually find yourself enjoying it!
