Learn Bayes Nets!
post by abramdemski · 2018-03-27T22:00:11.632Z · LW · GW · 8 comments
It recently occurred to me that there are a lot of people in the rationalist community who want to deeply absorb intuitions about how Bayes' theorem works and how to think with it in practice, who have not been specifically told that learning inference algorithms for Bayesian networks is one of the best ways forward.
Well, I'm telling you now.
Bayesian networks were the innovation which made probabilistic reasoning really practical and interesting for artificial intelligence -- and none of the reasons for that are special to trying to squeeze intelligence into a computer. They also, more or less, describe the way people have to think in order to do probabilistic reasoning in practice. There have been many innovations in probabilistic reasoning since Bayes nets, but those are arguably more about how to get good results on a computer and less about the fundamental conceptual issues you'll learn the most from.
I would argue that the most important inference algorithm to learn, for practical intuitions, is belief propagation. Others would argue for Monte Carlo algorithms, like MCMC (Markov chain Monte Carlo). You may want to learn both, to form your own opinion (and of course, there are many more algorithms beyond these which you may want to learn, both to gain more connections so your knowledge of the field sticks, and to gain more insights). Belief prop and MCMC are more or less the first two algorithms people thought of; there are a lot of newer developments, but they're largely elaborations.
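To make "belief propagation" less abstract before diving into Pearl, here is a minimal sketch of the message-passing computation on a three-node chain. The network, the numbers, and the variable names are all invented for illustration; this is just the shape of the algorithm, not Pearl's notation or a full implementation.

```python
# A minimal sketch of belief propagation (sum-product) on a chain A -> B -> C,
# with binary variables and made-up conditional probability tables.

P_A = [0.3, 0.7]                      # prior P(A)
P_B_given_A = [[0.9, 0.1],            # P(B | A=0)
               [0.2, 0.8]]            # P(B | A=1)
P_C_given_B = [[0.6, 0.4],            # P(C | B=0)
               [0.1, 0.9]]            # P(C | B=1)

# Suppose we observe C = 1.

# Forward "probability" message into B: the prior over B, before seeing C.
pi_B = [sum(P_A[a] * P_B_given_A[a][b] for a in range(2)) for b in range(2)]

# Backward "likelihood" message from the evidence C=1 into B.
lambda_B = [P_C_given_B[b][1] for b in range(2)]

# Backward likelihood message from B into A (marginalize B out).
lambda_A = [sum(P_B_given_A[a][b] * lambda_B[b] for b in range(2)) for a in range(2)]

def normalize(xs):
    z = sum(xs)
    return [x / z for x in xs]

# Each node's posterior is (probability message) * (likelihood message), renormalized.
posterior_A = normalize([P_A[a] * lambda_A[a] for a in range(2)])
posterior_B = normalize([pi_B[b] * lambda_B[b] for b in range(2)])

print("P(A | C=1) =", posterior_A)
print("P(B | C=1) =", posterior_B)
```

The thing to notice is the shape of the computation: evidence enters as a local likelihood message, it gets passed node by node, and each node's belief is just its prior-direction message times its likelihood-direction message, renormalized.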
Here is what I claim you can get out of it, through careful study:
- Understanding how Bayesian networks define probability distributions, and how probabilities spread through the network via belief propagation, makes your understanding of Bayes' theorem and probability theory in general much more "load-bearing" -- so it'll break under the strain if it isn't solid (which is a good thing).
- It'll give you a useful fake model of what you're doing when you're thinking. Global Bayesian updates don't just happen; they result from (something like) local updates which you have to spread across your web of beliefs, partly through conscious attention and cogitation.
- It's also a useful analogy for aspects of group epistemics, like avoiding double counting as messages pass through the social network.
- The messages which pass between nodes in belief prop fall into two types: probabilities and likelihoods. This is a deep truth; it's the "dimensional analysis" of probabilistic reasoning. Several cognitive biases can be seen as confusion between probabilities and likelihoods, most centrally base-rate neglect (a toy numerical example follows after this list).
- Understanding probability vs likelihood messages also gives a nice general understanding of the way "prior" and "posterior" are local ideas which only make sense with respect to a "frame of reference".
- Bayesian networks also lay the foundation for a formal understanding of causality, if that's something you're interested in.
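To make the probability-vs-likelihood point concrete, here is the classic base-rate calculation as a few lines of code; the disease/test numbers are made up purely for illustration.

```python
# Toy example of base-rate neglect; the disease/test numbers are invented.

prior_disease = 0.01                  # P(disease): the base rate, a "probability" message
p_pos_given_disease = 0.90            # P(+ | disease): a "likelihood" message
p_pos_given_healthy = 0.09            # P(+ | healthy): the other likelihood

# Base-rate neglect: treating P(+ | disease) = 0.90 as if it were P(disease | +).

# The actual posterior combines the likelihoods with the prior:
joint_disease = prior_disease * p_pos_given_disease
joint_healthy = (1 - prior_disease) * p_pos_given_healthy
posterior_disease = joint_disease / (joint_disease + joint_healthy)

print(f"P(disease | +) = {posterior_disease:.3f}")  # about 0.092, nowhere near 0.90
```

Base-rate neglect is reading the likelihood message P(+ | disease) = 0.90 as if it were the posterior P(disease | +), which with these numbers is closer to 9%.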
I think the best thing to read, to get up to speed, is the first four chapters of Pearl's Probabilistic Reasoning in Intelligent Systems. It's the original source; Pearl didn't invent everything, but he invented a lot, and he was the first to put it all together. There are better modern introductions for people who want to apply Bayesian networks in machine learning, but because Pearl was writing at a time when the use of probability theory was not widely accepted in artificial intelligence, he goes into the philosophy of the subject in a way newer sources don't. I think this is good for the LessWrong audience.
It would be even better, of course, if someone were to write a sequence explaining everything from a more specifically LessWrong perspective, drawing out the implications I mentioned above. Alas, I don't have that much time to spend on writing (which is to say, I have other higher-value things to do, in my current estimation).
One might also derive a more general lesson on the relevance of algorithms to rationality [LW · GW], and go read Artificial Intelligence: A Modern Approach as a rationality textbook. [LW · GW]
8 comments
comment by riceissa · 2021-02-24T20:28:07.557Z · LW(p) · GW(p)
For people who find this post in the future, Abram discussed several of the points in the bullet-point list above in Probability vs Likelihood [LW · GW].
comment by Qiaochu_Yuan · 2018-03-28T16:17:07.725Z · LW(p) · GW(p)
It would be even better, of course, if someone were to write a sequence explaining everything from a more specifically LessWrong perspective, drawing out the implications I mentioned above. Alas, I don't have that much time to spend on writing (which is to say, I have other higher-value things to do, in my current estimation).
+1; generally in favor of people who have interesting ideas but too many other competing ideas to execute them to just share those ideas and see if anyone else wants to pick them up. This strikes me as a particularly good "homework assignment" for someone who just really, really wants to grok Bayes nets.
Replies from: kinrany
comment by TurnTrout · 2018-03-28T01:06:47.597Z · LW(p) · GW(p)
Very much agree with this post. In my opinion, ch. 14 (Bayes nets) was the most important chapter in AI: AMA; I actually did every single non-programming exercise for that chapter, coming back later to ensure I could redo those I got wrong. I'd "learned" about probability by reading, but it wasn't until I put my nose to the grindstone and did the math that I started being able to see probability flowing through the networks, that I got S1 intuitions for the difference between independence and conditional independence.
Of course, that was just a chapter - hardly "careful study"; I'm very much looking forward to Pearl.
comment by Gram Stone · 2018-03-27T23:45:38.105Z · LW(p) · GW(p)
It's also a useful analogy for aspects of group epistemics, like avoiding double counting as messages pass through the social network.
Fake Causality [LW · GW] contains an intuitive explanation of double-counting of evidence.
Replies from: abramdemski
↑ comment by abramdemski · 2018-03-28T01:12:05.446Z · LW(p) · GW(p)
Yeah, and it uses the same analogy for understanding belief propagation as Pearl himself uses, and a reference to Pearl, and a bit more discussion of Bayes nets as a good way to understand things. But, I think, a lot of people didn't derive the directive "Learn Bayes nets!" from that example of insight derived from Bayes nets (and would benefit from going and doing that).
I do think there are some other intuitions lurking in Bayes net algorithms which could benefit from a write-up similar to Fake Causality, but one that goes "all the way" in describing Bayes nets rather than partially summarizing them.
comment by Viktor Riabtsev (viktor-riabtsev) · 2018-03-28T16:18:40.991Z · LW(p) · GW(p)
Ordered Probabilistic Reasoning in Intelligent Systems. Looking forward to reading it.
comment by David James (david-james) · 2024-06-14T02:30:56.921Z · LW(p) · GW(p)
Surprisingly, perhaps, https://dl.acm.org/doi/book/10.5555/534975 has a free link to the full-text PDF.