Have you read Minsky's _Society of Mind_? It is an AI-flavored psychological model of subagents that draws heavily on psychotherapeutic ideas. It seems quite similar in flavor to what you propose here. It inspired generations of students at the MIT AI Lab (although attempts to code it never worked out).
Quote from Richard Feynman explaining why there are no objects here.
I've begun a STEM-compatible attempt to explain a "no objectively-given objects" ontology in "Boundaries, objects, and connections." That's supposed to be the introduction to a book chapter that is extensively drafted but not yet polished enough to publish.
Really glad you are working on this also!
I would not say that selves don't exist (although it's possible that I have done so somewhere, sloppily).
Rather, that selves are both nebulous and patterned ("empty forms," in Tantric terminology).
Probably the clearest summary of that I've written so far is "Selfness," which is supposed to be the introduction to a chapter of the Meaningness book that does not yet otherwise exist.
Renouncing the self is characteristic of Sutrayana.
[FWOMP Summoned spirit appears in Kaj's chalk octagram. Gouts of eldritch flame, etc. Spirit squints around at unfamiliar environment bemusedly. Takes off glasses, holds them up to the candlelight, grimaces, wipes glasses on clothing, replaces on nose. Grunts. Speaks:]
Buddhism is a diverse family of religions, with distinct conceptions of enlightenment. These seem to be quite different and contradictory.
According to one classification of disparate doctrines, Buddhism can be divided into Vajrayana (Tantra plus Dzogchen) and Sutrayana (everything else, except maybe Zen). In this classification, Sutrayana aims at "emptiness," which is a generalization of the Three Marks, including anatman (non-self). The central method of Sutrayana is renunciation. Renunciation of the self is a major aspect. For Sutrayana, clear sustained perception of anatman (or emptiness more generally) is enlightenment, by definition.
For Buddhist Tantra, experience of emptiness is the "base" or starting point. That's the sense in which "enlightenment is the prerequisite"—but it's enlightenment as understood in Sutrayana. Whereas Sutrayana is the path from "form" (ordinary appearances) to emptiness, Tantra is the path from emptiness to the non-duality of emptiness and form. The aim is to perceive everything as both simultaneously. That non-dual vision is the definition of enlightenment within Tantra. The "duality" referred to here is the duality between emptiness and form, rather than the duality between self and other—which is what is overcome in Sutrayana. The non-dual vision that is the end-point of Tantra is then the base or starting point for Dzogchen.
(Probably the best thing I've written about this is "Beyond Emptiness: Zen, Tantra, and Dzogchen." It may not be very clear but I hope at least it is entertaining. "Sutra, Tantra, and the Modern Worldview" is less fun but more concrete.)
seeing that the self is an arbitrary construct which you don't need to take too seriously, can enable you to play with it in a tantric fashion
Yes, this is a Vajrayana viewpoint. For Sutrayana, the self is non-existent, or at least "empty"; for Vajrayana, it is empty form. That is, "self" is a label applied to various phenomena, which overall are found to be insubstantial, transient, boundaryless, discontinuous, and ambiguous—and yet which exhibit heft, durability, continence, extension, and specificity. This mild paradox is quite amusing—a starting point for tantric play.
I'll say a bit more about "self" in response to Sarah Constantin's comment on this post.
Glad you liked the post! Thanks for pointing out the link problem. I've fixed it, for now. It links to a PDF of a file that's found in many places on the internet, but any one of them might be taken down at any time.
A puzzling question is why your brain doesn't get this right automatically. In particular, deciding whether to gather some food before sleeping is an issue mammals have faced in the EEA for millions of years.
Temporal difference learning seems so basic that brains ought to implement it reasonably accurately. Any idea why we might do the wrong thing in this case?
Are there any psychoactive gases or aerosols that drive you mad?
I suppose a psychedelic might push someone over the edge if they were sufficiently psychologically fragile. I don't know of any substances that specifically make people mad, though.
One aspect of what I consider the correct solution is that the only question that needs to be answered is "do I think putting a coin in the box has positive or negative utility", and one can answer that without any guess about what it is actually going to do.
What is your base rate for boxes being able to drive you mad if you put a coin in them?
Can you imagine any mechanism whereby a box would drive you mad if you put a coin in it? (I can't.)
Excellent! This is very much pointing in the direction of what I consider the correct general approach. I hadn't thought of what you suggest specifically, but it's an instance of the general category I had in mind.
Thanks for the encouragement! I have way too many half-completed writing projects, but this does seem an important point.
Oh, goodness, interesting, you do think I'm evil!
I'm not sure whether to be flattered or upset or what. It's kinda cool, anyway!
Well, the problem I was thinking of is "the universe is not a bit string." And any unbiased representation we can make of the universe as a bit string is going to be extremely large—much too large to do even sane sorts of computation with, never mind Solomonoff.
Maybe that's saying the same thing you did? I'm not sure...
I can't guarantee you won't get blown up
Yes—this is part of what I'm driving at in this post! The kinds of problems that probability and decision theory work well for have a well-defined set of hypotheses, actions, and outcomes. Often the real world isn't like that. One point of the black box is that the hypothesis and outcome spaces are effectively unbounded. Trying to enumerate everything it could do isn't really feasible. That's one reason the uncertainty here is "Knightian" or "radical."
In fact, in the real world, "and then you get eaten by a black hole incoming near the speed of light" is always a possibility. Life comes with no guarantees at all.
Often in Knightian problems you are just screwed and there's nothing rational you can do. But in this case, again, I think there's a straightforward, simple, sensible approach (which so far no one has suggested...)
Hmm... given that the previous several boxes have either paid $2 or done nothing, it seems like that primes the hypothesis that the next in the series also pays $2 or does nothing. (I'm not actually disagreeing, but doesn't that argument seem reasonable?)
To answer this we engage our big amount of human knowledge about boxes and people who hand them to you.
Of comments so far, this comes closest to the answer I have in mind... for whatever that's worth!
Part of the motivation for the black box experiment is to show that the metaprobability approach breaks down in some cases. Maybe I ought to have made that clearer! The approach I would take to the black box does not rely on metaprobability, so let's set that aside.
So, your mind is already in motion, and you do have priors about black boxes. What do you think you ought to do in this case? I don't want to waste your time with that... Maybe the thought experiment ought to have specified a time limit. Personally, I don't think enumerating things the boxes could possibly do would be helpful at all. Isn't there an easier approach?
The evidence that I didn't select it at random was my saying “I find this one particularly interesting.”
I also claimed that "I'm probably not that evil." Of course, I might be lying about that! Still, that's a fact that ought to go into your Bayesian evaluation, no?
Yes, I'm not at all committed to the metaprobability approach. In fact, I concocted the black box example specifically to show its limitations!
Solomonoff induction is extraordinarily unhelpful, I think... that it is uncomputable is only one reason.
I think there's a fairly simple and straightforward strategy to address the black box problem, which has not been mentioned so far...
That's good, yes!
How would you assign a probability to that?
So... you think I am probably evil, then? :-)
I gave you the box (in the thought experiment). I may not have selected it from Thingspace at random!
In fact, there's strong evidence in the text of the OP that I didn't...
This is interesting—it seems like the project here would be to construct a universal, hierarchical ontology of every possible thing a device could do? This seems like a very big job... how would you know you hadn't left out important possibilities? How would you go about assigning probabilities?
(The approach I have in mind is simpler...)
Well, regardless of the value of metaprobability, or its lack of value, in the case of the black box, it doesn't seem to offer any help in finding a decision strategy. (I find it helpful in understanding the problem, but not in formulating an answer.)
How would you go about choosing a strategy for the black box?
Well, I hope to continue the sequence... I ended this article with a question, or puzzle, or homework problem, though. Any thoughts about it?
So, how would you analyze this problem, more specifically? What do you think the optimal strategy is?
Hi, I have a site tech question. (Sorry if this is the wrong place to post that!—I couldn't find any other.)
I can't find a way to get email notifications of comment replies (i.e. when my inbox icon goes red). If there is one, how do I turn it on?
If there isn't one, is that a deliberate design feature, or a limitation of the software, or...?
Thanks (and thanks especially to whoever does the system maintenance here—it must be a big job.)
Then why use it instead of learning the standard terms and using those?
The standard term is A_p, which seemed unnecessarily obscure.
Re the figure, see the discussion here.
(Sorry to be slow to reply to this; I got busy and didn't check my LW inbox for more than a month.)
Thank you very much—link fixed!
That's a really funny quote!
Multi-armed bandit problems were intractable during WWII probably mainly because computers weren't available yet. In many cases, the best approach is brute force simulation. That's the way I would approach the "blue box" problem (because I'm lazy).
But exact approaches have also been found: "Burnetas AN and Katehakis MN (1996) also provided an explicit solution for the important case in which the distributions of outcomes follow arbitrary (i.e., nonparametric) discrete, univariate distributions." The blue box problem is within that class.
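(For concreteness, here's the kind of brute-force simulation I mean, as a minimal sketch. The arm payout probabilities, the $2-prize/$1-coin payoff structure, the 100-coin budget, and the epsilon-greedy strategy are all made up for illustration; they are not the actual blue-box numbers.)

```python
import random

# Illustrative only: two Bernoulli "arms" with made-up payout probabilities.
# These are not the actual blue-box parameters.
ARM_PROBS = [0.45, 0.60]               # hypothetical chance each arm pays out
PAYOUT, COST, PULLS = 2.0, 1.0, 100    # hypothetical $2 prize, $1 per coin, 100 coins

def epsilon_greedy(eps=0.1):
    """Play PULLS coins, usually exploiting the arm with the best observed rate."""
    wins, plays, total = [0, 0], [0, 0], 0.0
    for _ in range(PULLS):
        if random.random() < eps or 0 in plays:
            arm = random.randrange(2)                              # explore
        else:
            arm = max(range(2), key=lambda i: wins[i] / plays[i])  # exploit
        plays[arm] += 1
        paid = random.random() < ARM_PROBS[arm]
        wins[arm] += paid
        total += PAYOUT * paid - COST
    return total

def average_return(strategy, trials=50_000):
    """Monte Carlo estimate of a strategy's expected total return."""
    return sum(strategy() for _ in range(trials)) / trials

print("epsilon-greedy:", round(average_return(epsilon_greedy), 3))
print("pure random:   ", round(average_return(lambda: epsilon_greedy(eps=1.0)), 3))
```

The point is just the method: write down candidate strategies, simulate each a large number of times, and compare the averages. Exact solutions like Burnetas–Katehakis exist for this class, but simulation gets you an answer with much less thought.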
Thanks, yes! I.e. who is this "everyone else," and where do they treat it the same way Jaynes does? I'm not aware of any examples, but I have only a basic knowledge of probability theory.
It's certainly possible that this approach is common, but Jaynes wasn't ignorant, and he seemed to think it was a new and unusual and maybe controversial idea, so I kind of doubt it.
Also, I should say that I have no dog in this fight at all; I'm not advocating "Jaynes is the greatest thing since sliced bread", for example. (Although that does seem to be the opinion of some LW writers.)
Can you point me at some other similar treatments of the same problem? Thanks!
Thanks, that's really funny! "On the other hand" is my general approach to life, so I'm happy to argue with myself.
And yes, I'm steelmanning. I think this approach is an excellent one in some cases; it will break down in others. I'll present a first one in the next article. It's another box you can put coins in that (I'll claim) can't usefully be modeled in this way.
Here's the quote from Jaynes, by the way:
What are we doing here? It seems almost as if we are talking about the ‘probability of a probability’. Pending a better understanding of what that means, let us adopt a cautious notation that will avoid giving possibly wrong impressions. We are not claiming that P(Ap|E) is a ‘real probability’ in the sense that we have been using that term; it is only a number which is to obey the mathematical rules of probability theory.
Yes, meta-probabilities are probabilities, although somewhat odd ones; they obey the normal rules of probability. Jaynes discusses this in Chapter 18, which is worth a read.
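For readers without the book to hand, here is the core of the A_p formalism as I understand it, in condensed notation (so treat the details with caution): A_p is defined as the proposition "the probability of A is p, regardless of any other evidence," and the ordinary probability of A is recovered by averaging over the meta-distribution.

```latex
P(A \mid A_p, E) \equiv p ,
\qquad
P(A \mid E) = \int_0^1 p \, P(A_p \mid E) \, dp .
```

So the density over A_p obeys the usual axioms, and its mean reproduces the single-number probability; what it adds is the spread, i.e. how easily the estimate would be updated by further evidence.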
The statement "probability estimates are not, by themselves, adequate to make rational decisions" was meant to describe the entire sequence, not this article.
I've revised the first paragraph of the article, since it seems to have misled many readers. I hope the point is clearer now!
Are you claiming there's no prior distribution over sequences which reflects our knowledge?
No. Well, not so long as we're allowed to take our own actions into account!
I want to emphasize—since many commenters seem to have mistaken me on this—that there's an obvious, correct solution to this problem (which I made explicit in the OP). I deliberately made the problem as simple as possible in order to present the A_p framework clearly.
Are we talking about the Laplace vs. fair coins?
Not sure what you are asking here, sorry...
We could also try to summarize some features of such epistemic states by talking about the instability of estimates - the degree to which they are easily updated by knowledge of other events
Yes, this is Jaynes' A_p approach.
this will be a derived feature of the probability distribution, rather than an ontologically extra feature of probability.
I'm not sure I follow this. There is no prior distribution for the per-coin payout probabilities that can accurately reflect all our knowledge.
I reject that this is a good reason for probability theorists to panic.
Yes, it's clear from comments that my OP was somewhat misleading as to its purpose. Overall, the sequence intends to discuss cases of uncertainty in which probability theory is the wrong tool for the job, and what to do instead.
However, this particular article intended only to introduce the idea that one's confidence in a probability estimate is independent from that estimate, and to develop the A_p (meta-probability) approach to expressing that confidence.
So, let me try again to explain why I think this is missing the point... I wrote "a single probability value fails to capture everything you know about an uncertain event." Maybe "simple" would have been better than "single"?
The point is that you can't solve this problem without somehow reasoning about probabilities of probabilities. You can solve it by reasoning about the expected value of different strategies. (I said so in the OP; I constructed the example to make this the obviously correct approach.) But those strategies contain reasoning about probabilities within them. So the "outer" probabilities (about strategies) are meta-probabilistic.
[Added:] Evidently, my OP was unclear and failed to communicate, since several people missed the same point in the same way. I'll think about how to revise it to make it clearer.
Glad you liked it!
I also get "stop after two losses," although my numbers come out slightly differently. However, I suck at this sort of problem, so it's quite likely I've got it wrong.
My temptation would be to solve it numerically (by brute force), i.e. code up a simulation and run it a million times and get the answer by seeing which strategy does best. Often that's the right approach. However, sometimes you can't simulate, and an analytical (exact, a priori) answer is better.
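(To show what I mean by brute force, here's a minimal sketch. All the numbers — a $1 coin, a $2 prize, a 50/50 prior that the box is a payer, a 0.9 per-coin payout rate for payers, a 100-coin budget — are placeholders I made up for illustration, not necessarily the ones in the original post.)

```python
import random

# Placeholder parameters for illustration; not necessarily those from the original post.
P_GOOD   = 0.5     # prior probability the box is a payer rather than a dud
P_PAYOUT = 0.9     # per-coin payout probability for a payer
PAYOUT, COST, BUDGET = 2.0, 1.0, 100   # $2 prize, $1 per coin, 100 coins available

def play(stop_after_losses):
    """Feed coins until we hit `stop_after_losses` consecutive losses (or run out)."""
    good = random.random() < P_GOOD
    total, losses = 0.0, 0
    for _ in range(BUDGET):
        if losses >= stop_after_losses:
            break
        total -= COST
        if good and random.random() < P_PAYOUT:
            total += PAYOUT
            losses = 0
        else:
            losses += 1
    return total

def expected_value(stop_after_losses, trials=200_000):
    """Monte Carlo estimate of the expected return of a stopping rule."""
    return sum(play(stop_after_losses) for _ in range(trials)) / trials

for k in range(1, 6):
    print(f"stop after {k} consecutive losses: {expected_value(k):+.3f}")
```

I'm not vouching for these particular numbers, only for the method: with enough trials the printout ranks the stopping rules directly, and you can swap in the real parameters or fancier strategies without changing the structure.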
I think you are right about the sportsball case! I've updated my meta-meta-probability curve accordingly :-)
Can you think of a better example, in which the curve ought to be dead flat?
Jaynes uses "the probability that there was once life on Mars" in his discussion of this. I'm not sure that's such a great example either.
Thanks! Fixed.
Yup, it's definitely wrong! I was hoping no one would notice. I thought it would be a distraction to explain why the two are different (if that's not obvious), and also I didn't want to figure out exactly what the right math was to feed to my plotting package for this case. (Is the correct form of the curve for the p=0 case obvious to you? It wasn't obvious to me, but this isn't my area of expertise...)
Decisions are made on the basis of expected value, not probability.
Yes, that's the point here!
your analysis of the first bet ignores the value of the information gained from it in executing your options for further play thereafter.
By "the first bet" I take it that you mean "your first opportunity to put a coin in a green box" (rather than meaning "brown box").
My analysis of that was "you should put some coins in the box", exactly because of the information gain.
This statement indicates a lack of understanding of Jaynes, or at least a lack of adherence to his foundations.
This post was based closely on Chapter 18 of Jaynes' book, where he writes:
Suppose you have a penny and you are allowed to examine it carefully, and convince yourself that it is an honest coin; i.e. accurately round, with head and tail, and a center of gravity where it ought to be. Then you’re asked to assign a probability that this coin will come up heads on the first toss. I’m sure you’ll say 1/2. Now, suppose you are asked to assign a probability to the proposition that there was once life on Mars. Well, I don’t know what your opinion is there, but on the basis of all the things that I have read on the subject, I would again say about 1/2 for the probability. But, even though I have assigned the same ‘external’ probabilities to them, I have a very different ‘internal’ state of knowledge about those propositions.
Do you think he's saying something different from me here?
I don't think that is demonstrated at all by this example.
Yes, I see your point (although I don't altogether agree). But, again, what I'm doing here is setting up analytical apparatus that will be helpful for more difficult cases later.
In the mean time, the LW posts I pointed to here may motivate more strongly the claim that probability alone is an insufficient guide to action.
I'm sure you know more about this than I do! Based on a quick Wiki check, I suspect that formally the A_p are one type of hyperprior, but not all hyperpriors are A_p (a/k/a metaprobabilities).
Hyperparameters are used in Bayesian sensitivity analysis, a/k/a "Robust Bayesian Analysis", which I recently accidentally reinvented here. I might write more about that later in this sequence.
It may be helpful to read some related posts (linked by lukeprog in a comment on this post): Estimate stability, and Model Stability in Intervention Assessment, which comments on Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased). The first of those motivates the A_p (meta-probability) approach, the second uses it, and the third explains intuitively why it's important in practice.
Jeremy, I think the apparent disagreement here is due to unclarity about what the point of my argument was. The point was not that this situation can't be analyzed with decision theory; it certainly can, and I did so. The point is that different decisions have to be made in two situations where the probabilities are the same.
Your discussion seems to equate "probability" with "utility", and the whole point of the example is that, in this case, they are not the same.
Thanks, Jonathan, yes, that's how I understand it.
Jaynes' discussion motivates A_p as an efficiency hack that allows you to save memory by forgetting some details. That's cool, although not the point I'm trying to make here.
Luke, thank you for these pointers! I've read some of them, and have the rest open in tabs to read soon.
Jeremy, thank you for this. To be clear, I wasn't suggesting that meta-probability is the solution. It's a solution. I chose it because I plan to use this framework in later articles, where it will (I hope) be particularly illuminating.
I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem.
I don't think it's correct to equate probability with expected utility, as you seem to do here. The probability of a payout is the same in the two situations. The point of this example is that the probability of a particular event does not determine the optimal strategy. Because utility depends on your strategy, it differs between the two situations even though the probability does not.
This problem easily succumbs to standard expected value calculations if all actions are considered.
Yes, absolutely! I chose a particularly simple problem, in which the correct decision-theoretic analysis is obvious, in order to show that probability does not always determine optimal strategy. In this case, the optimal strategies are clear (except for the exact stopping condition), and clearly different, even though the probabilities are the same.
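To make that concrete with made-up numbers (chosen for illustration, not necessarily those in the post): suppose a coin costs $1 and a winning coin pays $2; box A pays on each coin with known probability 0.45, while box B pays with probability 0.9 if it happens to be a "payer" and never otherwise, with a 50/50 prior. For a single coin the two cases look identical:

```latex
\text{Box A: } 0.45 \times \$2 - \$1 = -\$0.10 ,
\qquad
\text{Box B: } (0.5 \times 0.9) \times \$2 - \$1 = -\$0.10 .
```

Yet the sensible policies diverge: with box A there is nothing to learn, so the only question is whether to play at all, while with box B a few exploratory coins can be worth their expected loss, because discovering a payer makes every subsequent coin worth an expected $0.80. Same per-coin probability, different optimal strategies.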
I'm using this as an introductory wedge example. I've opened a Pandora's Box: probability by itself is not a fully adequate account of rationality. Many odd things will leap and creep out of that box so long as we leave it open.
Hi!
I’ve been interested in how to think well since early childhood. When I was about ten, I read a book about cybernetics. (This was in the Oligocene, when “cybernetics” had only recently gone extinct.) It gave simple introductions to probability theory, game theory, information theory, boolean switching logic, control theory, and neural networks. This was definitely the coolest stuff ever.
I went on to MIT, and got an undergraduate degree in math, specializing in mathematical logic and the theory of computation—fields that grew out of philosophical investigations of rationality.
Then I did a PhD at the MIT AI Lab, continuing my interest in what thinking is. My work there seems to have been turned into a surrealistic novel by Ken Wilber, a woo-ish pop philosopher. Along the way, I studied a variety of other fields that give diverse insights into thinking, ranging from developmental psychology to ethnomethodology to existential phenomenology.
I became aware of LW gradually over the past few years, mainly through mentions by people I follow on Twitter. As a lurker, there’s a lot about the LW community I’ve loved. On the other hand, I think some fundamental, generally-accepted ideas here are limited and misleading. I began considering writing about that recently, and posted some musings about whether and how it might be useful to address these misconceptions. (This was perhaps ruder than it ought to have been.) It prompted a reply post from Yvain, and much discussion on both his site and mine.
I followed that up with a more constructive post on aspects of how to think well that LW generally overlooks. In comments on that post, several frequent LW contributors encouraged me to re-post that material here. I may yet do that!
For now, though, I’ve started a sequence of LW articles on the difference between uncertainty and probability. Missing this distinction seems to underlie many of the ways I find LW thinking limited. Currently my outline for the sequence has seven articles, covering technical explanations of this difference, with various illustrations; the consequences of overlooking the distinction; and ways of dealing with uncertainty when probability theory is unhelpful.
(Kaj Sotala has suggested that I ask for upvotes on this self-introduction, so I can accumulate enough karma to move the articles from Discussion to Main. I wouldn’t have thought to ask that myself, but he seems to know what he’s doing here! :-)
O&BTW, I also write about contemporary trends in Buddhism, on several web sites, including a serial, philosophical, tantric Buddhist vampire romance novel.
Can you recommend an explanation of the complete class theorem(s)? Preferably online. I've been googling pretty hard and I've turned up almost nothing. I'd like to understand what conditions they start from (suspecting that maybe the result is not quite as strong as "Bayes Rules!"). I've found only one paper, which basically said "what Wald proved is extremely difficult to understand, and probably not what you wanted."
Thank you very much!
A collection of collections of advice for graduate students! http://vlsicad.ucsd.edu/Research/Advice/
A collection of advice for graduate students I put together some time ago: http://www.cs.indiana.edu/mit.research.how.to.html
It was meant specifically for people at the MIT AI Lab, but much of it is applicable to other STEM fields.
Regarding the development of agreeableness/empathy: there are meditation techniques specifically intended to do this. (They are variously called "Metta", "Lojong", "Tonglen", or (yuck) "loving kindness meditation", all of which are pretty similar.) These originate in Mahayana Buddhism, but don't have any specifically religious content. They are often taught in conjunction with mindfulness meditation.
I don't know whether there have been any serious studies on these methods, but anecdotally they are highly effective. They seem not only to develop empathy, but also personal happiness (although that is not a stated goal). Generally, the serious studies that have been done on different meditation techniques have found that they work as advertised...