Attempt to explain Bayes without much maths, please review

post by David_Gerard · 2011-08-06T09:24:30.349Z · LW · GW · Legacy · 26 comments

My current favourite waste of time is the concept of Bayesian postmodernism. Just putting those two words together invokes a world of delightful wrangling, as approximately anyone who understands one won't understand or will have contempt for the other. (Though I found at least one person - a programmer who's studied philosophy - who got the idea straight away when I posted it to my blog, which is one more than I was expecting.) It is currently a page of incoherent notes, isn't necessarily useful for anything as yet, and may never be.

Anyway, that's not my point today. My point today is that as part of this, I have to somehow explain Bayesian thinking in a nutshell to people who are highly intelligent, but have no mathematical knowledge and may actually be dyscalculic - but who can and do get the feel of things. I'm trying to get across that this is how learning works already and I just want to make them aware of it. I've run it past a couple of working artists who seemed to get the idea a bit. So I am posting this here for your technical correction.

If you think it's any good, please do run it past artists or critics of your acquaintance. (I'm glancing in AndrewHickey's direction right now.)


"The meaning of a thing is the way you should be influenced by it." - Vladimir Nesov

To explain what "Bayesian postmodernism" means, I first have to try to explain Bayesian epistemology.

  1. Probability does not exist outside in the world - it exists in your head. Things happen or not in the world; probability measures your knowledge of them.
  2. You know certain things, to some degree. New knowledge and experiences come in and affect your knowledge, pulling your degree of certainty of given ideas up or down.
  3. Bayes' Theorem is the equation describing precisely how much a new piece of information (on the probability of something holding in the world) must affect your knowledge. This is a mathematical theorem, true in the sense that 2+2=4 is true; this is a mathematical question with one right answer. If you know your "prior probability," and you know what the new information is, you know your new probability (the "posterior probability").
  4. The hard part is, of course, knowing what the hell your prior actually is, to more useful specificity than "everything you think you know about everything."
  5. (Just to make it harder, the prior is not a number, but a probability distribution over a spectrum of possible alternatives.)
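
For readers who do want to see the arithmetic, points 3 to 5 can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the three alternatives, the prior weights and the likelihoods are invented numbers, and `bayes_update` is a hypothetical helper name rather than anything standard.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over alternatives via Bayes' Theorem.

    prior: dict mapping each alternative to P(alternative)
    likelihood: dict mapping each alternative to P(evidence | alternative)
    """
    # Prior weight times how well each alternative predicts the evidence
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    # Overall probability of seeing the evidence at all
    total = sum(unnormalised.values())
    return {h: w / total for h, w in unnormalised.items()}


# Invented example: is this painting sincere, ironic, or a hoax?
# The prior is a spread of belief over alternatives, not a single number.
prior = {"sincere": 0.6, "ironic": 0.3, "hoax": 0.1}

# Invented likelihoods: how expected the new evidence (say, a deadpan
# artist's statement) would be under each alternative.
likelihood = {"sincere": 0.2, "ironic": 0.7, "hoax": 0.5}

posterior = bayes_update(prior, likelihood)
for alternative, probability in posterior.items():
    print(f"{alternative}: {probability:.2f}")
```

With these particular numbers the evidence pulls belief away from "sincere" and towards "ironic". Change the prior and the same evidence lands you somewhere else, which is why knowing your prior is the hard part.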

Bayesian epistemology is the notion of using this approach to map out the network of your degrees of certainty of your ideas and how they interact, and just how much a new idea should change your existing degrees of certainty.

The application to criticism and understanding of art should be obvious to anyone with even an enthusiast's experience in the field. (And probably not to anyone without.) Postmodernism tells us we can't be certain of anything; Bayesianism tells us precisely how uncertain we should be.

Problems:

  1. Assigning meaningful numbers is tricky. It's hard enough having some sort of feel for how certain you feel a given notion is, let alone working out how those certainties should interact with mathematical rigour.
  2. The mathematics to build a Bayesian network properly can get quite hairy. Calculus tends not to be a strong point of art critics.
  3. The subject matter is subjective internal feelings about art. Two people could build plausible yet utterly incompatible Bayesian networks of subjective feelings, even given that art is intersubjective rather than purely subjective. (There is an interesting result called Aumann's Agreement Theorem, which mathematically proves that two Bayesians who start from the same prior, and whose current estimates are common knowledge between them, cannot "agree to disagree" - but find two art enthusiasts who start from the same life experiences with the same personal inclinations. Thus, convincing others becomes an argument about bases.)

No human who claims to be a Bayesian actually has a network mapped out in their head. They're just doing their best. But that people (a) do this and (b) get useful results from it - even in number-based fields, rather than subjective feeling-based ones - is promising.

A word on competing approaches: The model that holds that probability exists in the world, which is the version found in everyday popular usage and which your statistics textbooks probably taught you how to use, is the frequentist approach. This is a grab-bag of tools and statistical methods to apply to the problem. The easy part is you don't have to know your precise prior. The hard part is that different methods can get different answers, of which only one (if that) can be right, so you have to know which one to apply. Much of the frequentist toolkit can be mathematically derived from the Bayesian approach, as special cases or approximations. The Bayesian approach is increasingly popular in science and economics, because it gives the right answer if you have your prior right.
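
A toy contrast between the two approaches, as a rough sketch rather than a fair summary of either: suppose you hear three records by an artist and like all three. The function names are hypothetical, and the Beta prior is an assumption chosen because it makes the arithmetic short.

```python
def frequentist_estimate(successes, trials):
    # Observed frequency (the maximum-likelihood estimate); no prior involved
    return successes / trials


def bayesian_estimate(successes, trials, prior_a=1.0, prior_b=1.0):
    # Posterior mean under a Beta(prior_a, prior_b) prior.
    # Beta(1, 1) is the flat "no idea either way" prior.
    return (prior_a + successes) / (prior_a + prior_b + trials)


liked, heard = 3, 3

# Frequency alone: every record so far was good
freq = frequentist_estimate(liked, heard)

# Flat prior: optimistic, but short of certainty
flat = bayesian_estimate(liked, heard)

# Sceptical prior (assumed): roughly "one record in ten is any good"
sceptical = bayesian_estimate(liked, heard, prior_a=1.0, prior_b=9.0)

print(freq, flat, round(sceptical, 3))
```

The frequency estimate needs no prior but jumps straight to certainty after three records. The Bayesian estimates are more cautious, and how cautious depends entirely on the prior you fed in.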


If the above only requires minor fixes, I may post-edit based on comments so I can just refer people to this link.

Although the section above is what I've posted here for discussion, this is going to devolve into a thread about postmodernism. So I'll answer some of the obvious points here.

Post-script: No-one's coughed up their own skull in horror yet, so I assume I haven't made any glaring technical errors and, modulo a few post-edits, this'll do for now. It's still too mathematical, but diagrams may help - maybe the next version will have some.

Nor has anyone started talking about postmodernism, to my surprise.

PPS: And I'm surprised no-one's disputed "No human who claims to be a Bayesian actually has a network mapped out in their head."

26 comments

Comments sorted by top scores.

comment by Jack · 2011-08-06T23:21:20.779Z · LW(p) · GW(p)

I don't think you've done enough work to defend the introduction of the concept of post-modernism, here. I suppose postmodernism does include the proposition that one cannot be certain of anything, but there is no particular reason to adopt the additional baggage that comes with the concept. The prototypical Enlightenment philosopher, Hume, was already there. So what does postmodernism add? The term, without any further explication, makes this post very difficult to follow, as I have no idea what exactly you mean.

Moreover, it seems to me there is a rather straightforward point of tension between postmodernism and Bayesianism, namely that Bayesianism prescribes a uniquely rational way of structuring belief networks, which certainly looks like the kind of self-assured epistemological approach that prototypical postmodernists would resist.

I think what you're actually seeing is the complementarity of Bayesianism and Quinean coherentism/pragmatism. The latter, I suppose, is closer to the postmodern tradition than, say, logical positivism. But it is a far, far cry from the Rortian position that the only constraints on knowledge are conversational and socially constructed. Is your position that the restrictions Bayesianism places on the beliefs we hold are merely socially constructed?

Which is to say, if you want to examine the relationship between Bayesianism and other philosophies it really helps to actually identify the other philosophy instead of just talking about the vague and routinely contradictory clusterfuck that is postmodernism.

And in the right light Derrida can look like anything.

Replies from: David_Gerard
comment by David_Gerard · 2011-08-07T16:10:24.303Z · LW(p) · GW(p)

I don't think I have either. Some variety of it was my starting point. And it's an eyecatching name, which may or may not be a feature (depending, I think, on how well I can justify it).

It is quite possible that I am just completely full of shit in this endeavour. It is also true that I could probably brazen my way through that.

As I attempt to level up in PM, I am becoming quite the non-fan of Derrida.

comment by RobertLumley · 2011-08-06T13:49:09.745Z · LW(p) · GW(p)

Good post, I liked most of it. The only part that stood out to me was this:

There is no reason two people could not build plausible yet utterly incompatible Bayesian networks of subjective feelings, even given that art is intersubjective rather than purely subjective.

That's a double negative, and would be much clearer if you rephrased it as "Two people can build plausible yet utterly incompatible...", or something to that effect.

Replies from: David_Gerard
comment by David_Gerard · 2011-08-06T15:58:25.862Z · LW(p) · GW(p)

Good one, thank you!

comment by torekp · 2011-08-06T13:47:23.586Z · LW(p) · GW(p)

I recommend using diagrams as much as possible, like this.

Replies from: David_Gerard
comment by David_Gerard · 2011-08-06T15:58:06.600Z · LW(p) · GW(p)

I predict the intended audience's heads would explode.

Replies from: None
comment by [deleted] · 2011-08-06T17:38:44.747Z · LW(p) · GW(p)

It's certainly possible to use simple Venn Diagrams when explaining Bayes' Theorem, and doing so actually broadens the appeal of your article because it makes it more accessible to visual learners.

Replies from: torekp, David_Gerard
comment by torekp · 2011-08-07T14:31:36.451Z · LW(p) · GW(p)

That's stunningly beautiful, and much better than what I had found just Googling a little.

comment by David_Gerard · 2011-08-06T18:17:16.668Z · LW(p) · GW(p)

Hmm, you're right. And the audience I was thinking of was people who, like myself, tend to visualise these things (though the diagrams in my head are maddeningly vague when I try to capture them).

My deeper pedagogical problem, though, is that equations - any equations, even "2+2=4" is pushing it - generate ugh fields. This is more than a little problematic when explaining mathematical concepts.

comment by dbaupp · 2011-08-06T10:19:49.624Z · LW(p) · GW(p)

Bayesian postmodernism also seems like a reasonable way of describing this idea. Although, I feel that the explanation you give is still overly technical. (I think the biggest example is talking about "prior" without explanation, as well as "probability distribution").

On an editing note, I can't parse this sentence:

The hard part is, of course knowing what the hell your prior actually is, to more useful specificity than "everything you think you know about everything."

Replies from: David_Gerard
comment by David_Gerard · 2011-08-06T10:29:29.585Z · LW(p) · GW(p)

Yeah, there's still way too much maths in it in conceptual form, even if the only equation is "2+2=4".

The sentence is intended to explain (or describe) the concept that Bayes always gives the right answer (since it's a theorem), but the hard part is knowing what the prior is with usable levels of detail. Particularly since we're talking about subjective experiences. There's a bit of knowing thyself too - in real life, your prior is "everything you know about everything", but your prior as you know it is "everything that you think you know about everything" - which is the same thing without unknown knowns (cultural and cognitive biases). "Know thyself" can be rephrased "what is my prior?"

I used "prior" because it's the correct term and worth teaching, and I feed it to my mental black box in this context and it tells me it sounds useful here. May have to expand on this. "Probability distribution" is unashamedly technical and I used it here to say "this is actually hard", but yes, expanding on it may be a good idea too. Or at least peppering it with Wikipedia links, which I'll just do now.

Of course, in general, and filling in all the details, the above could easily be expanded to book length.

Replies from: dbaupp
comment by dbaupp · 2011-08-06T11:02:41.864Z · LW(p) · GW(p)

Ah, that sentence makes sense, I just couldn't work out the syntax ("specificity" just didn't seem to fit in). My problem, not yours :)

I didn't explain it clearly at all, but the point I was trying to make was that there is quite a bit of "technical" language/jargon which acts as a stopsign (and/or induces blankness). "Prior" doesn't really fit into this category, but "probability distribution" does, and the Wikipedia article (probably) doesn't really help people from your target audience.

I would suggest removing the term completely (or maybe having the technical term in a parenthetical statement) e.g.

The prior is not a number, but a measurement of the probability of each of a spectrum of possible alternatives

comment by Xom · 2011-11-07T22:13:51.695Z · LW(p) · GW(p)

This is (morbidly) fascinating, please keep at it.

Replies from: David_Gerard, David_Gerard
comment by David_Gerard · 2011-11-27T11:54:16.583Z · LW(p) · GW(p)

Your comment inspired me to buy a notebook and dive back into Derrida. It's sort of painful.

comment by David_Gerard · 2011-11-26T12:24:31.406Z · LW(p) · GW(p)

:-D "Morbid fascination" is about how I feel about it. Bits of this stuff are still buzzing about in my head and I'll see if I can get more into written form. Not promising any time frame, you understand. But I do suspect I should be progressing rather faster ...

comment by fubarobfusco · 2011-08-06T23:07:24.671Z · LW(p) · GW(p)

One possible connection between "postmodern" (i.e. recent Continental) philosophy and Bayesian rationality may be found in the notion of "embodied philosophy" or "embodied cognition" — e.g. the work of George Lakoff in linguistics and Hans Moravec in robotics.

This is on my mind recently because I've been reading Lakoff's Women, Fire, and Dangerous Things.

Take meaning, for instance — the assignment of referents to symbols. The classical view of meaning is that there are objectively true meanings that are discovered, out there in the world. The embodied view of meaning is that meaning is necessarily subjective; there are no "God's-eye view" meanings: meaning only takes place in minds in bodies, which arrive at meanings not by objectively observing an exterior world, but by participating in the world.

(This connects to the Bayesian view of causality, at least so far as I understand it: reasoning about causation involves reasoning about interventions and not merely about observations. Observed correlation can only tell us about statistical, rather than causal, regularities; in order to discover authentic causes, we have to consider intervention.)

That meaning is subjective does not mean that it is arbitrary, or that you get to come up with whatever meanings you like and give them equal validity to meanings assigned through processes such as language acquisition, science, or social construction. Like Bayesian probability, meaning is subjectively objective: it takes place only inside (world-involved) minds, but you can still do it wrong by not paying attention to the world.

Replies from: Manfred, Jack
comment by Manfred · 2011-08-08T01:27:04.725Z · LW(p) · GW(p)

(This connects to the Bayesian view of causality, at least so far as I understand it: reasoning about causation involves reasoning about interventions and not merely about observations. Observed correlation can only tell us about statistical, rather than causal, regularities; in order to discover authentic causes, we have to consider intervention.)

You should read the first few chapters of Causality by Judea Pearl. He details how you can get causal information from static data (if you ignore "just so" correlations with measure 0). Causality (both the book and the pattern) is cool.

comment by Jack · 2011-08-07T02:36:17.694Z · LW(p) · GW(p)

(This connects to the Bayesian view of causality, at least so far as I understand it: reasoning about causation involves reasoning about interventions and not merely about observations. Observed correlation can only tell us about statistical, rather than causal, regularities; in order to discover authentic causes, we have to consider intervention.)

This isn't a "Bayesian" view of causality. You don't have to be Bayesian to be a manipulationist and you don't have to be a manipulationist to be a Bayesian.

comment by [deleted] · 2011-08-06T17:03:46.967Z · LW(p) · GW(p)

.

Replies from: David_Gerard
comment by David_Gerard · 2011-08-06T17:29:48.837Z · LW(p) · GW(p)

I'm not sure what I'm trying to do either. I've noticed what looks like an interesting correspondence that seems to potentially offer a huge amount but which I've no idea how to turn to any actually useful purpose. This is the bit that may be gold or may just be tasty, tasty crack.

Replies from: None
comment by [deleted] · 2011-08-06T17:55:42.222Z · LW(p) · GW(p)

.

Replies from: David_Gerard
comment by David_Gerard · 2011-08-06T18:15:00.557Z · LW(p) · GW(p)

The meaning of "good" would be just the sort of thing I'd like to get from this. Not necessarily the kind of good, but something that let me do something with how I feel about a given text's utility for a particular purpose. ("Text" in the PM jargon sense of "any subject matter whatsoever".)

I have vague ideas of turning star ratings (one to five stars) into numbers (say, 0.1 to 0.9). So three stars would mean 0.5, i.e. "I have literally no idea if this is good or not." Except that my prior for the value of any random record is 0.1-0.2, i.e. most music is rubbish. So this leads me to be wary of premature arithmetism - just because you have a system that lets you put a number on something in no way implies that you have any idea what the hell you're talking about. On the other hand, trying some numerical systems and seeing if any of them feel useful will obviously be necessary. While reminding myself that any given system may be completely full of shit. Welcome to the rabbit hole!
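
As a very rough sketch of that star-rating idea, assuming the simplest possible scheme (a straight line from one star at 0.1 to five stars at 0.9), with `stars_to_probability` as a hypothetical name and 0.15 picked arbitrarily from the 0.1-0.2 range above:

```python
def stars_to_probability(stars):
    # Straight-line map from 1..5 stars onto 0.1..0.9, so 3 stars sits at 0.5
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return 0.1 + 0.2 * (stars - 1)


# Assumed prior that any random record is good, taken from the 0.1-0.2 range
PRIOR_GOOD = 0.15

for stars in range(1, 6):
    p = stars_to_probability(stars)
    # Compare each rating with the prior rather than with 0.5
    verdict = "above prior" if p > PRIOR_GOOD else "at or below prior"
    print(f"{stars} stars -> {p:.1f} ({verdict})")
```

This shows the arithmetism problem directly: on this scale three stars is already well above the prior, so it can't also mean "no idea". The mapping is trivial to write down and tells you nothing about whether the numbers mean anything.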

Replies from: None
comment by [deleted] · 2011-08-06T19:05:09.824Z · LW(p) · GW(p)

.

comment by [deleted] · 2011-08-07T04:22:25.776Z · LW(p) · GW(p)

Sooner attempt to set up a tent with both hands tied behind your back.

Replies from: David_Gerard
comment by David_Gerard · 2011-11-27T11:59:08.126Z · LW(p) · GW(p)

I don't actually expect anyone to follow me into this vale of crack unless and until I come back with something plausibly resembling a result.