The Equation of Knowledge
post by Lê Nguyên Hoang (le-nguyen-hoang-1) · 2020-07-07T16:09:10.367Z · LW · GW · 3 comments
My book The Equation of Knowledge has just been published by CRC Press, and I'm guessing that it may be of interest to readers of LessWrong. The book aims to be a somewhat accessible and very complete introduction to Bayesianism. No prior knowledge is needed, though some sections require a solid familiarity with mathematics and computer science. The book has been designed so that these sections can be skipped without hindering the reading.
The aim of the book is to (1) highlight the most compelling arguments, theorems and empirical evidence in favor of Bayesianism, (2) present numerous applications in a very wide variety of domains, and (3) discuss solutions for pragmatic Bayesianism with limited computational resources. You can find a 5-minute promotional video of the book here.
In this post, I will briefly sketch the outline of the book. Just like the book, I'll divide the post into four sections.
Pure Bayesianism
The first section of the book is a gentle introduction to pure Bayesianism, which is defined as strictly obeying the laws of probability theory. The key equation is evidently Bayes rule, which I like to write as follows:
P(T | D) = P(D | T) P(T) / P(D).
This equation says that the critical variable is P(T | D), that is, the credence of theory T given data D. Computing this is arguably the end goal of Bayes rule. Bayes rule thus does not quite aim to distinguish truth from falsehood; rather, it motivates us to assign quantitative measures of reliability to different theories, given observed data. It suggests that we should replace questions like "is T true?" with "how credible is T?" (or perhaps even "how much should I trust the predictions of theory T?"). I argue in the book that this is a great way to improve the quality of many debates.
Bayes rule then goes on to tell us how to compute the credence of a theory given empirical data. Importantly, on the right-hand side, we have the term P(T), which measures the credence of the theory prior to the observation of data D. This is critical. A theory that was extremely unlikely before we knew D will likely remain unlikely even given D, unless D is overwhelmingly compelling. This corresponds to Carl Sagan's phrase "extraordinary claims require extraordinary evidence" (which was analyzed mathematically by Laplace back in 1814!).
Bayes rule then tells us to update our prior beliefs based on observed data, depending on how well theory T predicts data D. Essentially, we can see any theory as a betting individual. If T bets on D, which corresponds to a large value of P(D | T), then our credence in T should increase. But if theory T found the observed data unlikely (i.e. P(D | T) is small), then we should decrease our belief in T once we observe D.
Well, actually, Bayes rule tells us that this update also depends on how well alternative theories perform. Indeed, the denominator P(D) = Σ_A P(D | A) P(A) orchestrates a sort of competition between the different theories. In particular, the credence of theory T will decrease only if its bet is outperformed by the bets of alternative theories A. This means that Bayes rule forbids the analysis of a theory independently of others; the credence of a theory is only relative to the set of alternatives.
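To make this competition concrete, here is a minimal Python sketch (my own illustration, not code from the book) of a single Bayesian update over a handful of hypothetical competing theories:

```python
# A minimal sketch (not from the book): Bayesian updating as a
# competition between theories, each "betting" on the observed data.

def bayes_update(priors, likelihoods):
    """Return posterior credences P(T | D) from priors P(T) and bets P(D | T)."""
    # Joint terms P(D | T) * P(T) for each theory T.
    joints = {t: priors[t] * likelihoods[t] for t in priors}
    # The denominator P(D) aggregates the bets of all competing theories.
    p_data = sum(joints.values())
    return {t: joint / p_data for t, joint in joints.items()}

# Three hypothetical competing theories and their bets on some observed data D.
priors = {"T1": 0.70, "T2": 0.25, "T3": 0.05}
likelihoods = {"T1": 0.10, "T2": 0.60, "T3": 0.90}  # P(D | T)

print(bayes_update(priors, likelihoods))
# T1's poor bet costs it credence, even though its prior was large.
```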
Chapters 2 to 5 of the book detail the analysis of Bayes rule and illustrate it through a large number of examples, such as Sally Clark's infamous lawsuit, Hempel's raven paradox, Einstein's discovery of general relativity and the Linda problem. They also draw connections and tensions with first-order logic, Popper's falsifiability and null hypothesis statistical tests.
Chapter 6 then discusses the history of Bayesianism, which also hints at the importance of probability theory in essentially all human endeavors. Finally, Chapter 7 concludes the first part of the book by introducing Solomonoff's induction [LW · GW], which I call pure Bayesianism. In brief, Bayes rule requires any theory T to bet on any imaginable observable data (formally, T needs to define a probability measure on the space of data, otherwise the quantity P(D | T) is ill-defined). Solomonoff's genius was simply to also demand that this bet be computable. It turns out that the rest of Solomonoff's theory essentially falls out, beautifully, from this simple additional constraint.
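For readers who like to see things computed, here is a toy Python sketch in the spirit of the idea (mine, not the book's; actual Solomonoff induction is uncomputable): hypothetical "programs" receive a prior weight of 2^(-description length) and then bet on observed data.

```python
# A toy illustration (not Solomonoff's actual, uncomputable induction):
# weight each candidate "program" by 2^(-description length), then let
# the programs bet on an observed bit sequence and update by Bayes rule.

# Hypothetical hypotheses: (name, description length in bits, P(next bit = 1)).
hypotheses = [("all zeros", 3, 0.01), ("all ones", 3, 0.99), ("fair coin", 5, 0.5)]

observed = [1, 1, 1, 1, 0]  # some observed bit sequence D

# Length-based prior, in the spirit of 2^(-K): shorter programs start ahead.
priors = {name: 2.0 ** -length for name, length, _ in hypotheses}
total = sum(priors.values())
priors = {name: p / total for name, p in priors.items()}

# Each hypothesis bets on the whole sequence: P(D | T) = product of per-bit bets.
likelihoods = {}
for name, _, p_one in hypotheses:
    p = 1.0
    for bit in observed:
        p *= p_one if bit == 1 else 1.0 - p_one
    likelihoods[name] = p

evidence = sum(priors[n] * likelihoods[n] for n in priors)
print({n: priors[n] * likelihoods[n] / evidence for n in priors})
```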
Evidently, a lot more explanations and details can be found in the book!
Applied Bayesianism
The second section of the book goes deeper into applications of Bayesianism to numerous different fields. Chapter 8 discusses the strong connection between Bayesianism and privacy. After all, if Bayesianism is the right theory of knowledge, it is clearly critical to any theory of how to prevent knowledge, that is, of how to keep information from being inferred. And indeed, the leading concept of privacy, namely differential privacy, has a very natural definition in terms of probability theory.
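As an illustration of that probabilistic flavor, here is a minimal sketch (a standard example of my own, not taken from the book) of the Laplace mechanism for a counting query, which achieves epsilon-differential privacy:

```python
# A minimal sketch (not from the book) of differential privacy's probabilistic
# flavor: the Laplace mechanism adds noise to a counting query so that the
# output's distribution barely depends on any single individual's record.
import random

def laplace_noise(scale):
    """Laplace(0, scale) noise, sampled as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(true_count, epsilon):
    """Laplace mechanism for a counting query (sensitivity 1): for neighboring
    databases, the output densities differ by at most a factor exp(epsilon)."""
    return true_count + laplace_noise(1.0 / epsilon)

print(private_count(42, epsilon=0.5))
```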
Chapter 9 dwells on the strong connection between Bayesianism and economics, and in particular game theory. Nobel prize winner Roger Myerson once argued that "the unity and scope of modern information economics was found in Harsanyi’s framework". Again, this can be made evident by the fact that much of modern economics focuses on the consequences of incomplete (e.g. asymmetric) information.
Chapter 10 moves on to the surprisingly strong connections between Darwinian evolution and Bayes rule. In particular, the famous Lotka-Volterra equations for population dynamics feature an intriguing resemblance with Bayes rule. This resemblance is then exploited to discuss to what extent the spread of ideas within the scientific community can be compared to the growth of the credence in a theory for a Bayesian. This allows us to identify reliable rules of thumb to determine when a scientific consensus or a (prediction) market price is credible, and when it is less so.
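One way to see this kind of resemblance, sketched below with hypothetical numbers of my own (this is not the book's derivation): a step of discrete, replicator-style selection has exactly the arithmetic of a Bayesian update, with fitness playing the role of the likelihood.

```python
# A toy illustration (mine, not the book's): one step of discrete
# replicator-style selection has the same form as a Bayesian update,
# with fitness playing the role of the likelihood P(D | T).

def selection_step(shares, fitness):
    """Population shares after one round of selection: share_i ∝ share_i * fitness_i."""
    weighted = {k: shares[k] * fitness[k] for k in shares}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}

shares = {"idea A": 0.5, "idea B": 0.4, "idea C": 0.1}   # "priors"
fitness = {"idea A": 1.0, "idea B": 2.0, "idea C": 3.0}  # "likelihoods"
print(selection_step(shares, fitness))  # same arithmetic as a Bayesian update
```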
Chapter 11 discusses exponential growth, which emerges from repeated multiplications. Such growth is critical to understand in order to have an intuitive feel for Bayes rule, as repeated Bayesian updates are typically multiplicative. The chapter also draws a fascinating connection between Bayes rule and the multiplicative weights update algorithm and its variants like AdaBoost. It argues that the success of these methods is no accident, and that their late discovery may be due to mathematicians' poor intuitive understanding of exponential growth.
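For concreteness, here is a minimal sketch (my own, not the book's presentation) of the multiplicative weights update algorithm, whose multiplicative reweighting mirrors how Bayes rule multiplies priors by likelihoods:

```python
# A minimal sketch (not the book's code) of the multiplicative weights update:
# each expert's weight is multiplied by a factor that shrinks with its loss,
# much like Bayes rule multiplies each prior by a likelihood.

def mwu_step(weights, losses, eta=0.5):
    """One multiplicative weights update; losses are assumed to lie in [0, 1]."""
    updated = {e: w * (1.0 - eta * losses[e]) for e, w in weights.items()}
    total = sum(updated.values())
    return {e: w / total for e, w in updated.items()}

weights = {"expert 1": 1 / 3, "expert 2": 1 / 3, "expert 3": 1 / 3}
for losses in [{"expert 1": 0.0, "expert 2": 1.0, "expert 3": 0.5}] * 5:
    weights = mwu_step(weights, losses)
print(weights)  # weights concentrate exponentially fast on the low-loss expert
```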
Chapter 12 presents numerous applications of Ockham's razor to avoid erroneous conclusions. It also shows that the practical usefulness of Ockham's razor is intimately connected to the importance of priors in Bayesian thinking, as evidenced by the compelling theorem that says that, under mild assumptions, only Bayesian methods are "statistically admissible". Finally, the chapter concludes with another stunning theorem: it can be proved in one line that a version of Ockham's razor is a theorem under Bayesianism (I'll keep this one line secret to tease you!).
Chapter 13 then stresses the danger of Simpson's paradox and the importance of confounding variables when analyzing uncontrolled empirical data. After discussing the value and limits of randomized controlled trials, I reformulate the necessary analysis of plausible confounding variables as another instance of the unavoidability of priors for correct thinking. The chapter closes with some philosophical discussions on the ontology of these confounding variables.
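Here is a small, fully hypothetical numerical illustration of Simpson's paradox (my own numbers, not the book's): a treatment that looks better within every subgroup can look worse overall when a confounder such as case severity determines who gets treated.

```python
# A hypothetical illustration (mine, not the book's) of Simpson's paradox:
# the treatment wins within each subgroup, yet loses overall, because
# severity (a confounder) determines who receives the treatment.

data = {
    # subgroup: {"treated": (cured, total), "untreated": (cured, total)}
    "mild cases":   {"treated": (90, 100),   "untreated": (800, 1000)},
    "severe cases": {"treated": (300, 1000), "untreated": (20, 100)},
}

def rate(cured, total):
    return cured / total

for group, arms in data.items():
    print(group, {arm: round(rate(*counts), 2) for arm, counts in arms.items()})

overall = {"treated": [0, 0], "untreated": [0, 0]}
for arms in data.values():
    for arm, (cured, total) in arms.items():
        overall[arm][0] += cured
        overall[arm][1] += total
print("overall", {arm: round(rate(*counts), 2) for arm, counts in overall.items()})
# Treatment wins in each subgroup (0.9 > 0.8 and 0.3 > 0.2) but loses overall.
```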
Pragmatic Bayesianism
Unfortunately, pure Bayesianism demands unreasonable computational capabilities. Neither our brains nor our machines have access to such capabilities. As a result, in practice, pure Bayesianism is doomed to fail. In other words, we cannot strictly obey the laws of probability. We'll have to content ourselves with approximations of these laws.
Chapter 14 contextualizes this strategy within the more general theory of computational complexity. It gives numerous examples where this strategy has been used, for instance to study prime numbers or Ramsey theory. It also presents Turing's compelling 1950 argument, based on computational complexity, for the need for machine learning to achieve human-level AI. The chapter also draws connections with Kahneman's System 1 / System 2 model.
Chapter 15 then stresses the need to embrace (quantitative) uncertainty. It provides numerous arguments for why this uncertainty will always remain, from chaos theory to quantum mechanics, statistical mechanics and automata with irreducible computations. It then discusses ways to measure success under uncertainty, for instance using cross-entropy or, more generally, proper scoring rules [LW · GW]. Finally, it draws connections with modern machine learning, in particular generative adversarial networks (GANs).
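As a small illustration (my own, not from the book), here is how one might score binary forecasts with the log score, a proper scoring rule whose expected value is tied to cross-entropy:

```python
# A minimal sketch (not the book's code) of a proper scoring rule: the log
# score rewards forecasters for reporting their honest probabilities.
import math

def log_score(forecast_prob, outcome):
    """Log score for a binary forecast: log of the probability assigned to what happened."""
    p = forecast_prob if outcome else 1.0 - forecast_prob
    return math.log(p)

forecasts = [0.9, 0.7, 0.2, 0.95]     # hypothetical predicted probabilities of rain
outcomes = [True, True, False, True]  # whether it actually rained

total = sum(log_score(p, o) for p, o in zip(forecasts, outcomes))
print(total / len(forecasts))  # higher (closer to 0) is better
```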
Chapter 16 then discusses the challenges posed by limited information storage space, both from a computational and from a cognitive perspective. The chapter covers topics like Kalman filters, false memories, recurrent neural networks, attention mechanisms, and what should be taught in our modern world, where we can now exploit information storage systems far better than our brains.
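To give a flavor of the computational side, here is a toy one-dimensional Kalman filter (my own sketch, not the book's treatment): instead of storing every past observation, it keeps only a Gaussian summary and updates it step by step.

```python
# A toy one-dimensional Kalman filter (not the book's code): rather than
# storing every past observation, it keeps a Gaussian summary (mean and
# variance) of the state and updates it in a Bayesian way at each step.

def kalman_step(mean, var, measurement, process_var=0.1, meas_var=1.0):
    """One predict-then-update step for a random-walk state observed with noise."""
    # Predict: the state may have drifted, so uncertainty grows.
    var = var + process_var
    # Update: blend prediction and measurement, weighted by their precisions.
    gain = var / (var + meas_var)
    mean = mean + gain * (measurement - mean)
    var = (1.0 - gain) * var
    return mean, var

mean, var = 0.0, 10.0  # vague initial belief
for z in [1.2, 0.9, 1.1, 1.4, 1.0]:  # hypothetical noisy measurements
    mean, var = kalman_step(mean, var, z)
print(mean, var)
```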
Chapter 17 discusses approximations of Bayes rule using sampling. It is a gentle introduction to Monte Carlo methods, and then to Markov chain Monte Carlo (MCMC) methods. It then argues that our brains probably run MCMC-like algorithms, and discusses the consequences for cognitive biases. Indeed, MCMC only has asymptotic guarantees; if it does not run for long enough, it will be heavily biased by its starting point. Arguably, something similar occurs in our brains.
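Here is a minimal Metropolis-Hastings sketch (my own, not code from the book) that makes the starting-point bias visible: with only a few steps, the samples remain anchored near a poorly chosen initial value.

```python
# A minimal Metropolis-Hastings sketch (not the book's code) targeting a
# standard normal distribution. With few steps, the samples stay close to
# the (poorly chosen) starting point: short MCMC runs are biased.
import math
import random

def metropolis(n_steps, start=10.0, step_size=0.5):
    """Sample from a standard normal via random-walk Metropolis."""
    x, samples = start, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)
        # Acceptance ratio of unnormalized densities exp(-x^2 / 2).
        if random.random() < math.exp((x * x - proposal * proposal) / 2.0):
            x = proposal
        samples.append(x)
    return samples

short = metropolis(50)    # still anchored near the starting point 10.0
long_run = metropolis(50000)  # much closer to the true mean 0.0
print(sum(short) / len(short), sum(long_run) / len(long_run))
```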
Chapter 18 addresses a fundamental question of epistemology, namely the unreasonable effectiveness of abstraction. This chapter draws heavily on theoretical computer science, and in particular on Kolmogorov's sophistication and Bennett's logical depth, to suggest explanations of the success of abstractions based on computational properties of our current universe. It is interesting to note that, in the far past or the very far future, the state of the universe may be such that deep abstractions would be unlikely to remain useful (and thus "effective").
Chapter 19 introduces the Bayesian brain hypothesis, and the numerous fascinating recent discoveries of cognitive science in this regard. Amazingly, Bayes rule has been suggested again and again to explain our vulnerability to optical illusions, our ability to generalize from few examples, and babies' learning capabilities. The Bayesian perspective also has intriguing consequences for the famous nature versus nurture debate.
Beyond Bayesianism
The last section of the book takes a bit of distance from Bayesianism, though it is still strongly connected to the laws of probability. Chapter 20 discusses what I argue to be natural consequences of pure Bayesian thinking for scientific realism. In particular, it argues that theories are mostly tools to predict past and future data. As a result, it seems pointless to argue about the truth of their components; what matters rather seems to be the usefulness of thinking with these components. I discuss consequences for how we ought to discuss concepts like money, life or electrons.
Chapter 21 is my best effort to encourage readers to question their most strongly held beliefs. It does so by providing examples from my own journey, and by stressing the numerous cognitive biases that I have suffered from. It then goes on to underline what seem to me to be the key reasons for my progress towards Bayesianism, namely the social and informational environment I have been so lucky to end up in. Improving this environment may indeed be key for anyone to question their most strongly held beliefs.
Finally, Chapter 22 briefly goes beyond epistemology to enter the realm of moral philosophy. After discussing the importance of descriptive moral theories for understanding human interactions, the chapter gives a brief classical introduction to the main moral theories, in particular deontology and utilitarianism. It then argues that consequentialism somehow generalizes these theories, but that only Bayesian consequentialism is consistent with the laws of probability. Decision-making under Bayesian consequentialism is then illustrated with examples, which stress the importance of catastrophic events whose probability, however small, is not negligible.
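As a hypothetical toy illustration of this last point (my own numbers, not the book's examples), an expected-utility comparison can easily be dominated by a rare catastrophic outcome:

```python
# A hypothetical toy calculation (not from the book) of Bayesian
# consequentialist decision-making: a rare but catastrophic outcome can
# dominate the expected-utility comparison between two options.

def expected_utility(outcomes):
    """Expected utility of a list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Option A: modest, safe gain. Option B: slightly larger gain, tiny risk of catastrophe.
option_a = [(1.0, 100.0)]
option_b = [(0.999, 110.0), (0.001, -1_000_000.0)]

print(expected_utility(option_a))  # 100.0
print(expected_utility(option_b))  # about -890: the catastrophe dominates
```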
One last thing I'd add is that I have put a lot of effort into making the book enjoyable. It is written in a very informal style, often with personal examples. I have also tried hard to share complex ideas with a lot of enthusiasm, not because enthusiasm makes them more convincing, but because it seems necessary to me to motivate readers to really ponder these ideas.
Finally, note that French-speaking readers can also watch the series of videos I've made on Bayesianism on YouTube!
3 comments
comment by Charbel-Raphaël (charbel-raphael-segerie) · 2020-07-07T22:42:13.163Z · LW(p) · GW(p)
I read this book two years ago when it was published in French. I found it incredibly exciting to read, and that's what motivated me to discover this site and then move on to a master's degree in machine learning.
This book saved me a lot of time in discovering Bayesianism, and made a much deeper change in my way of thinking than if I had simply read a textbook on Bayesian machine learning.
I am of course happy to have read the sequences, but I think I am lucky to have started with The Equation of Knowledge, which is much shorter to read and which provides the theoretical assurances, motivation, main tools, enthusiasm and pedagogy to engage in the quest for Bayesianism.
comment by Apodosis · 2020-07-07T19:50:36.292Z · LW(p) · GW(p)
Thank you for sharing your work, I am excited to take a look at it at some point in the next few days. Based on the synopsis you've provided, it seems as if you have traveled down (and more productively so) many of the same inferential avenues as myself, which arise in the process of identifying and applying Bayesian methods in the material and social world. In particular, I have been fascinated by the game-theoretic consequences and higher-order behaviors which must arise in multi-agent interactions where actors are characterized by state and strategy updates driven by approximate Bayesian rules. Of all existing work I have found in that arena, epistemic game theory has been by far the most exciting. I am still working through the foundations of the theory, but I believe it to be powerfully descriptive for mutually observable social interactions among groups. I would love to hear your thoughts if you happen to have encountered the idea, and best of luck on your new book!
↑ comment by Lê Nguyên Hoang (le-nguyen-hoang-1) · 2020-07-09T06:55:01.336Z · LW(p) · GW(p)
Hi Apodosis, I did my PhD in Bayesian game theory, so this is a topic close to my heart ˆˆ There are plenty of fascinating things to explore in the study of interactions between Bayesians. One important finding of my PhD was that, essentially, Bayesians end up playing (stable) Bayes-Nash equilibria in repeated games, even if the only feedback they receive is their utility (and in particular even if the private information of other players remains private). I also studied Bayesian incentive-compatible mechanism design, i.e. coming up with rules that incentivize Bayesians' honesty. The book also discusses interesting features of interactions between Bayesians, such as the Aumann-Aaronson agreement theorem and Bayesian persuasion (i.e. maximizing a Bayesian judge's probability of convicting a defendant by optimizing which investigations should be pursued). One research direction I'm interested in is Byzantine Bayesian agreement, i.e. how much a group of honest Bayesians can agree if they are infiltrated by a small number of malicious individuals, though I have not yet found the time to dig into this topic further. A more empirical challenge is to determine how well these Bayesian game theory models fit the description of human (or AI) interactions. Clearly, we humans are not Bayesians. We have some systematic cognitive biases (and even powerful AIs may also have systematic biases, since they won't be running Bayes rule exactly!). How can we best model and predict humans' divergence from Bayes rule? There has been a lot of spectacular progress in cognitive science in this regard (check out Josh Tenenbaum's work for instance), but there's definitely a lot more to do!