Unknown unknowns

post by Paul Crowley (ciphergoth) · 2011-08-05T12:55:37.560Z · LW · GW · Legacy · 26 comments

Sorry if this seems incomplete - thought I'd fire this off as a discussion post now and hope to return to it with a more well-rounded post later.

Less Wrongers are used to thinking of uncertainty as best represented as a probability - or perhaps as a log odds ratio, stretching from minus infinity to infinity. But when I argue with people about, for example, cryonics, it appears most people consider that some possibilities simply don't appear on this scale at all: that we should not sign up for cryonics because no belief about its chances of working can be justified.  Rejecting this "no belief" category seems to me one of the key foundational ideas of this community, but as far as I know the only article specifically discussing it is "I don't know", which doesn't make a devastatingly strong case.  What other writing discusses this idea?

I think there are two key arguments against this.  First, you have to make a decision anyway, and the "no belief" uncertainty doesn't help with that.  Second, "no belief" is treated as disconnected from the probability line; so at some point evidence causes a discontinuous jump from "no belief" to some level of confidence.  This discontinuity seems very unnatural.  How can evidence add up to a discontinuous jump - what happened to all the evidence before the jump?
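
To make the continuity point concrete, here is a minimal sketch in Python (the likelihood ratios are made-up illustrative numbers, not anyone's actual estimates): in log-odds form, Bayesian updating is just addition, so each piece of evidence shifts a belief that is already on the scale, and there is no point at which a jump from "no belief" onto the scale has to occur.

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

def prob(lo):
    return 1 / (1 + math.exp(-lo))

belief = log_odds(0.5)                  # start with no leaning either way
evidence_llrs = [0.8, -0.3, 1.2, 0.5]   # made-up log-likelihood ratios of observations

for llr in evidence_llrs:
    belief += llr                       # updating in log-odds space is just addition
    print(f"log-odds {belief:+.2f} -> p = {prob(belief):.3f}")
# Belief moves continuously with each observation; no discontinuous jump is needed.
```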

26 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2011-08-05T14:40:21.966Z · LW(p) · GW(p)

Posts linked from the Logical rudeness wiki page seem relevant, including Katja Grace's Estimation is the best we have.

One should always use the best tools available, even if they are no good or don't satisfy given standards of rigor. Declaring "I don't know" is also such a tool, but is it the best one available? Ironically, the problem seems to be partially with "I don't know" acting as a curiosity stopper, prompting one to stop thinking where a bit more thought could otherwise lead to better decisions.

comment by Kaj_Sotala · 2011-08-06T09:41:15.445Z · LW(p) · GW(p)

Here's something I just posted elsewhere (in a debate concerning cryonics!) relating to this:

Sure, there are many people who know the same amount as I do and reach different numbers. But that's hardly a unique failing of this approach: equally knowledgeable people disagree about all kinds of issues all the time, even when they don't try to put their intuitions into numbers.

No, there isn't any well-established, objective approach for deriving the numbers. But that's beside the point. The difference between just throwing around verbal arguments and writing down probability estimates is like the difference in basing judgements on vague intuitions in your head and explicitly writing out the pros and cons. Putting numbers on things not only helps clarify the exact degree to which you disagree with someone else, it also forces you to be more explicit in your reasoning.

My above comment is a good example: Aleksander challenged me on two points, which led me to 1) consciously realize that I'd been basing one of my figures on two disjoint assumptions, helping me clarify my reasons for believing it, and 2) do some more research on another figure, leading me to data that made me revise my estimate to one tenth of what it previously was. If we'd just been throwing around verbal arguments and vague appeals to intuition like we'd been doing before, I'm not sure whether either of those would have happened. So no, putting explicit numbers on things doesn't mean we'll ever reach the correct conclusion, but it does help work out the exact points of disagreement and maybe make the estimates converge at least a bit.

It's also important to realize that not putting numbers on things doesn't mean we're not still pulling numbers out of thin air! Your brain is still doing some sort of implicit probability estimate. And while one could reasonably argue that trying to put numbers on our intuitions loses important data (as we don't have introspective access to all our thought processes), there are also some pretty convincing lines of argument suggesting that consciously-held information has evolved as much to appear good and persuade others as to actually evaluate the truthfulness of things. So a refusal to put explicit numbers on things seems suspicious to me, as it seems like the kind of trick that'd be useful for hiding inconsistencies in one's arguments.

Note however that I'm most definitely not accusing anyone of intentional dishonesty, or anything along those lines. The issue is not in people being consciously deceitful. The issue is in people's minds being built in such a way as to trick their consciousness into making estimates based on something else than truth, and then having reasonable-seeming intuitions that happen to lead to a "cover-up" for the flimsiness of the reasoning. I'm just as skeptical about my own conscious thought processes as I am of those of others (or at least I try to be), which is part of the reason why I'm so eager to put down my own probability estimates for others to critique.

comment by jsteinhardt · 2011-08-05T20:34:33.516Z · LW(p) · GW(p)

The following is an idea that's been nagging at me for a while, and I finally have it clear enough in my mind to at least try to state it. Any feedback would be highly appreciated (especially if what I say is confusing!).

I think there are cases where you shouldn't assign a probability to your beliefs. Most Bayesian updating is a form of computation, and you need to assign a probability to that computation being a reasonable thing to update on. Unless I have confidence in the procedure that I'm using to update my beliefs, I shouldn't take the number I get at the end as something to update my beliefs to...yes, I should update my beliefs towards that number, but possibly only a very tiny amount.
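
A toy sketch of that intuition (my own illustration with invented numbers, not anything from the comment): treat the output of the calculation as something to move toward only in proportion to how much the calculation itself is trusted.

```python
def adopt(prior, computed, trust):
    """Move from `prior` toward the `computed` answer by a fraction `trust`,
    where `trust` is how much we believe the computation was a reasonable
    thing to update on (0 = ignore it entirely, 1 = adopt it outright)."""
    return prior + trust * (computed - prior)

print(adopt(prior=0.5, computed=0.05, trust=0.1))   # 0.455: a very tiny nudge
print(adopt(prior=0.5, computed=0.05, trust=0.9))   # 0.095: near-full adoption
```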

Now here is the problem you run into. When I start out, I probably haven't even chosen a prior. Choosing that prior is itself a computation that I have to make. If I have a low degree of confidence in the reasonability of that computation, then what am I supposed to do? It seems silly to take the result of that computation as my prior, as the result is probably meaningless. I basically want to update by a small amount away from "nothing", but "nothing" isn't a probability, so it's quite unclear what to do in probabilistic terms. (Note that once we start to have higher confidence in our calculations, then we can essentially update away from the "nothing" state by saying that almost any possible prior would have led to something close to our current belief.)

Replies from: jsteinhardt
comment by jsteinhardt · 2011-08-06T23:43:49.470Z · LW(p) · GW(p)

I appreciate the upvotes but I can't imagine that I expressed things so clearly that no one is confused / has points of disagreement / clarification / etc., especially since this idea isn't even clear in my head yet. If someone wants to take the time to help me clarify my views here, or to point out flaws in my thinking, then I would appreciate it!

ETA: I wonder if complaining about upvotes without any comments is as frowned upon as complaining about downvotes without any comments. I guess I'm about to find out...

comment by Merkle · 2011-08-14T07:03:36.117Z · LW(p) · GW(p)

In discussions with a friend, who expressed great discomfort in talking about cryonics, I finally extracted the confession that he had no emotional or social basis for considering cryonics. None of his friends or family had done it, it was not part of any of the accepted rituals that he had grown up with -- there was an emotional void around it that placed it outside of the range of options that he was able to think about. It was "other", alien, of such a nature that merely rational evaluation could not be applied.

He's in his 70's, so this issue is more than just academic. He understands that by rejecting cryonics he is embracing his own death. He does not believe in an afterlife. He becomes emotionally perturbed when I discuss cryonics precisely because I am persuasive about its technical feasibility.

Perhaps this observation isn't germane to the present thread, as this seems an emotional response rather than a response driven by "no belief." But perhaps "no belief" has an emotional component, as in "I don't want to have a belief. If I had a belief, then I'd have to take an unpleasant action."

comment by [deleted] · 2011-08-05T16:13:27.466Z · LW(p) · GW(p)

to me "no belief" often means that people have a lot of uncertainty about their probability estimates. I think that this uncertainty can be best expressed by asking people to make a market on their probability estimate, rather than just specifying a single number. So, for instance if you asked me to make a market in the probability that a fair coin toss will come up heads, I will be like 49@51, (I'll buy 49% and I'll sell 51% because I am very confident that the true value is 50% and so I know I am getting some edge on that one). If you ask me the probability that someone currently being stored at alcor will be brought back to life at some point in the future, I have very little confidence about how to estimate that, and so I'll be something like .0001 bid @ 85 offer.

If you take your beliefs seriously you should be willing to bet on them. If you are willing to bet on them, they should be two-sided markets and not single numbers, because you should not be willing to take either side of a bet at the same odds, even a coin flip: at best you have 0 EV. Once you consider credit risk, transaction costs, adverse selection, etc., you are definitely -EV unless you include a bid/ask spread.
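
A rough sketch of the expected-value arithmetic behind this (the prices are invented, and the payoff is a binary contract paying 1 if the event happens):

```python
def expected_value(true_prob, price, side):
    """EV per unit stake of a contract paying 1 if the event happens:
    buying costs `price`; selling collects `price` but pays out on the event."""
    ev_buy = true_prob - price
    return ev_buy if side == "buy" else -ev_buy

# Confident case: a fair coin quoted 49 bid / 51 offer gives edge on either side.
print(expected_value(0.50, 0.49, "buy"))    # +0.01
print(expected_value(0.50, 0.51, "sell"))   # +0.01

# Deeply uncertain case: a huge spread keeps both sides non-negative over a wide
# range of possible true probabilities.
for truth in (0.001, 0.3, 0.8):
    print(truth, expected_value(truth, 0.0001, "buy"), expected_value(truth, 0.85, "sell"))
```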

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2011-08-19T16:19:31.871Z · LW(p) · GW(p)

This seems like a similar point to When (Not) To Use Probabilities - would you agree?

comment by atucker · 2011-08-05T15:06:39.930Z · LW(p) · GW(p)

About two...

There is no jump, because "I don't know" is the maximum entropy distribution. The maximum entropy distribution is the distribution over possible outcomes with the greatest information-theoretic entropy, subject to the observed parameters of the system. This works because entropy is just the expected value of the information gained from measuring the system. You want the maximum entropy distribution because anything else is literally pulling information out of thin air: if you pick a lower-entropy distribution when you can construct a higher-entropy one consistent with the data, then you're expecting less information to be gained from a measurement, as if you already knew something about it.
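
A minimal sketch of the idea, assuming SciPy is available (the constraint value is a made-up example, in the spirit of Jaynes' loaded-die problem): with no constraint beyond normalisation, the maximum entropy distribution over six faces is uniform; adding an observed mean skews it, but only as far as the data force it to.

```python
import numpy as np
from scipy.optimize import minimize

faces = np.arange(1, 7)
target_mean = 4.5                         # hypothetical observed constraint

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)            # avoid log(0)
    return np.sum(p * np.log(p))          # minimising this maximises entropy

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},                 # normalisation
    {"type": "eq", "fun": lambda p: np.dot(p, faces) - target_mean},  # observed mean
]
p0 = np.full(6, 1 / 6)                    # start from the uniform distribution
result = minimize(neg_entropy, p0, bounds=[(0, 1)] * 6, constraints=constraints)
print(np.round(result.x, 4))              # skewed toward high faces, and no further
```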

The maximum entropy hypothesis on any yes/no question is a 50/50 chance. At those odds, cryonics is a great deal!

However, they probably have information which adjusts their probability down. An actual "I don't know" would be the result of a coin flip, whereas anything under a 50% probability of cryonics working is based on information which makes you think it's unlikely. So they do have beliefs about it.

Replies from: ciphergoth, jsteinhardt, handoflixue
comment by Paul Crowley (ciphergoth) · 2011-08-05T15:09:12.967Z · LW(p) · GW(p)

Whatever people mean by "I don't know", the way they think about it bears no resemblance to the way you discuss the maximum entropy distribution here, I'm afraid. If that's what they meant, they would mean something sensible, and I don't think they do.

Replies from: atucker
comment by atucker · 2011-08-05T15:22:46.898Z · LW(p) · GW(p)

Next time someone uses "I don't know" to try and justify not making a decision, I'll try to see if I can explain the maximum entropy distribution, and convince them that that's how it should be approached.

I anticipate that the main difficulty will be in convincing people that they have to assign a probability, and that even if they don't they're implicitly choosing one based on their actions.

Replies from: rhollerith_dot_com, jsteinhardt
comment by RHollerith (rhollerith_dot_com) · 2011-08-05T16:17:50.140Z · LW(p) · GW(p)

There was a comment writer on LW who assumed that a probabilistic argument that referred to the word "bet" applied only to gambling wagers. He had no reply when someone pointed out that the probabilistic argument under consideration worked even when every decision by every agent is considered a bet.

Rhetorical tactics like using the word "bet" in a very inclusive sense strike me as more useful for the OP's purpose than explaining the MAXENT prior.

comment by jsteinhardt · 2011-08-06T09:57:01.800Z · LW(p) · GW(p)

See my comment above which shows that the arguments surrounding maximum entropy are rather confused.

comment by jsteinhardt · 2011-08-06T09:53:08.606Z · LW(p) · GW(p)

I don't think entropy quite works that way. For notational convenience, let Q(p) denote the entropy of p. Then Q(p) > Q(q) does not by itself mean that q is strictly more informative than p. In other words, there is no total ordering on distributions such that, for any p, q with Q(p) > Q(q), I can get from p to q with Q(p) - Q(q) bits of information. The closest statement you can make would be in terms of KL divergence, but it is important to note that both KL(p||q) and KL(q||p) are positive, so KL is providing a distance, not an ordering.
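
A small numeric check of the asymmetry (the two distributions are my own arbitrary examples):

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.8, 0.1, 0.1])

print(entropy(p), entropy(q))   # Q(p) > Q(q): p has higher entropy
print(kl(p, q), kl(q, p))       # both positive, and unequal: a distance, not an ordering
```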

Also note that entropy does not in fact decrease with more information. It decreases in expectation, and even then only relative to the subjective belief distribution. But this isn't even a particularly special property. Jensen's inequality together with conservation of expected evidence implies that, instead of Q(p) = E[-log(p(x))], we could have taken any concave function Q over the space of probability distributions, which would include functions of the form Q(p) = E[f(p(x))] as long as 2f'(z)+zf''(z) <= 0 for all z.

[Proof of the statement about Jensen: Let p2 be the distribution we get from p after updating. Then E[Q(p2) | p] <= Q(E[p2 | p]) = Q(p), where <= is Jensen's inequality applied to the concave function Q, and E[p2 | p] = p by conservation of expected evidence.]

EDIT: For the interested reader, this is also strongly related to Doob's martingale convergence theorem, as your beliefs are a martingale and any concave function of them is a supermartingale.
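
A quick simulation of the claim (illustrative parameters only): the posterior averages back to the prior, so beliefs form a martingale, and a concave function of them such as entropy falls in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, prior = 100_000, 0.3
lik_if_true, lik_if_false = 0.9, 0.4      # P(observation | hypothesis true / false)

truth = rng.random(n_trials) < prior
obs = rng.random(n_trials) < np.where(truth, lik_if_true, lik_if_false)

# Bayes update of P(hypothesis) on a single binary observation.
num = np.where(obs, lik_if_true, 1 - lik_if_true) * prior
den = num + np.where(obs, lik_if_false, 1 - lik_if_false) * (1 - prior)
posterior = num / den

def binary_entropy(p):
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

print(posterior.mean())                   # ~ prior: conservation of expected evidence
print(binary_entropy(prior), binary_entropy(posterior).mean())  # entropy drops in expectation
```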

comment by handoflixue · 2011-08-05T20:28:32.871Z · LW(p) · GW(p)

I don't think they really mean maximum entropy, though. There seems to be "I don't know, it's 50/50" and then "I don't know, but it's obviously skewed this way, and I have strong confidence that there are unknown-unknowns that will skew it further when they're discovered"

Replies from: falenas108
comment by falenas108 · 2011-08-06T05:41:34.119Z · LW(p) · GW(p)

In that case, you should be able to use how strongly you anticipate the skewing to create a probability estimate.

Replies from: handoflixue
comment by handoflixue · 2011-08-06T22:01:40.187Z · LW(p) · GW(p)

I am not aware of any mathematical conversion between "I'm pretty sure you're wrong" and a specific probability estimate.

comment by mstevens · 2011-08-05T12:59:56.095Z · LW(p) · GW(p)

The "no belief" option seems at least likely to solve the "Pascal's Mugging" (http://lesswrong.com/lw/kd/pascals_mugging_tiny_probabilities_of_vast/) problem, by allowing you to ignore very low probability outcomes.

comment by timtyler · 2011-08-05T20:33:29.650Z · LW(p) · GW(p)

You just make them bet. That reveals that they do have probability estimates.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2011-08-06T09:47:11.834Z · LW(p) · GW(p)

Er, have you tried this? People refuse to bet. The idea that there should be a bet you're prepared to accept on a subject is also a part of the culture here that the people I'm talking about don't share.

Replies from: timtyler, Vladimir_Nesov
comment by timtyler · 2011-08-06T12:08:37.646Z · LW(p) · GW(p)

Emphasis on the "make". Most people should be able to understand - at least as a though experiment - that there are circumstances under which they would have to bet - for example if failure to place a bet was punishable by their own life being forfeit. So, you can normally say: "if you were forced to bet...".

I'm not saying this will work on everyone - but usually framing things in terms of bets makes them more concrete, and helps people to understand what you are asking.

comment by Vladimir_Nesov · 2011-08-06T10:47:19.182Z · LW(p) · GW(p)

Even asking to bet on 2+2=4 doesn't always work!

comment by torekp · 2011-08-05T21:45:38.586Z · LW(p) · GW(p)

Thanks, ciphergoth, for raising this fundamental issue. I'll try to defend the "no belief" approach, since I still consider it possibly correct. However, it should be noted that other options include credence intervals, for example "somewhere between almost-certain and certain, inclusive".

On the first argument - you have to make a decision, but it needn't literally be a calculated decision.

On the second, I would suggest a model of thought which has more continuity. In between "no belief" and a precise numerical probability could lie various qualitative assessments of the evidence for and against. On this model, the precise probabilities that a speaker might avow for certain select bets are good-enough approximations to a belief-state that may not quite fully live up to that precision. The exact numerical probabilities are used because expected utility calculations are a convenient approach to certain decisions. The jump from qualitative to numerical probabilities is made when the perceived advantages of expected utility calculations justify it - and perhaps the jump is more verbal than real.

Teddy Seidenfeld has a critique of maximum-entropy priors which, to my admittedly ill-trained eye, looks like a serious problem. I would love to believe that every probability question has an objective answer. But I don't, at least not yet.

comment by saturn · 2011-08-05T17:29:05.311Z · LW(p) · GW(p)

This seems like basically the same mistake that was described in "But There's Still A Chance, Right?" except on the opposite end of the probability scale.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-08-05T19:31:53.542Z · LW(p) · GW(p)

No, the post is not about believing high-probability events to be avoidable.

Replies from: handoflixue
comment by handoflixue · 2011-08-05T20:27:13.248Z · LW(p) · GW(p)

But it does seem to be saying "there's not a chance".

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2011-08-06T09:44:49.973Z · LW(p) · GW(p)

No, this is just what they aren't saying! Their position is emphatically not "I am confident that cryonics won't work". To simplify, let me treat "confident" as if it denoted an exact probability boundary. Then to us, the only possibilities are that either you're confident it won't work, or you're not confident it won't work, in which case you assign a probability to it working significantly greater than zero; and if your position is the latter, then signing up makes sense. But neither of these is their position: they don't know whether it'll work or not, they are not making a confident assertion, but they're saying that the evidence doesn't suffice to move them from the "I don't know" position to a position where they think it has a significant chance of working. They would absolutely reject any characterisation of their position as making a confident assertion.