Knightian Uncertainty from a Bayesian perspective
post by JonahS (JonahSinick) · 2014-02-04T04:16:31.805Z · LW · GW · Legacy · 35 comments
Some people have maintained that there are events to which there's no rational basis for assigning probabilities. For example, John Maynard Keynes wrote of "uncertainty" in the following sense:
"By `uncertain' knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable. The game of roulette is not subject, in this sense, to uncertainty...The sense in which I am using the term is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence...About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know." (J.M. Keynes, 1937)
This sort of uncertainty is sometimes referred to as Knightian uncertainty.
MIRI is interested in making probabilistic predictions about events such as the creation of general artificial intelligence, which are without precedent, and which therefore cannot be assigned probabilities via frequentist means. Some of these events are presumably of the type that Keynes had in mind. At MIRI's request, I did a literature review looking for arguments against there being a rational basis for assigning probabilities to such events.
Definitions of subjective probability
One can attempt to define the subjective probability that an agent assigns to an event as, intuitively, the number that it would assign if it were making a very large number of predictions with the aim of being calibrated: for each x, of the events to which it assigns probability x%, about x% should actually occur. Eliezer discusses the mathematical formalism behind this in A Technical Explanation of a Technical Explanation.
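As a rough illustration of this calibration idea (my own sketch, not from the post or from Eliezer's essay), here is how one might check calibration over a set of predictions by binning them and comparing the assigned probabilities with the observed frequencies:

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: list of (assigned_probability, event_occurred) pairs.
    Bins predictions by assigned probability and compares each bin's
    nominal probability with the observed frequency of occurrence."""
    bins = defaultdict(list)
    for p, occurred in predictions:
        bins[min(int(p * 10), 9)].append(occurred)
    for b in sorted(bins):
        outcomes = bins[b]
        observed = sum(outcomes) / len(outcomes)
        print(f"assigned {b*10}-{b*10+10}%: {observed:.0%} occurred (n={len(outcomes)})")

# Hypothetical predictions: a well-calibrated agent's 90% predictions
# should come true about 90% of the time, its 20% predictions about 20%.
calibration_table([(0.9, True), (0.9, True), (0.9, False), (0.9, True),
                   (0.2, False), (0.2, False), (0.2, True), (0.2, False)])
```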
Other definitions of subjective probabilities have been given by Ramsey (1931), de Finetti (1937), Koopman (1940), Good (1950), Savage (1954), Davidson and Suppes (1956), Kraft, Pratt and Seidenberg (1959), Anscombe and Aumann (1963) and Wakker (1989). (Fishburn (1986) gives a survey of the literature.) I have not studied the mathematical formalisms of most of these papers, but here's a definition inspired by them (one which is immune to some of the criticisms that have been raised against some of the definitions).
Assume that for each number p between 0 and 1, there is a random process R that yields an outcome O' with "objective" probability p. Here "objective" probability refers to a probability that can be determined via physics or frequentist means. Your subjective probability of an event E is defined as follows. Suppose that you have an event F that you strongly desire to happen, and a choice between the following options:
1. F occurs if and only if E occurs.
2. F occurs if and only if the outcome of R is O'.
Consider the set S of values of p such that you'd prefer #2 over #1. Then your subjective probability q of E is defined to be the greatest lower bound of S.
(F is usually taken to be a monetary reward arising from a bet.)
For example, suppose that E and F are both the event "humanity survives for millions of years" and you have the opportunity to push a button that will guarantee this with probability p and otherwise guarantee that this does not happen. If you're willing to push it when p = 99.999%, that means that you assign a probability less than 99.999% to humanity surviving for millions of years. If you're not willing to push it when p = 0.001%, that means that you assign a probability greater than 0.001% to humanity surviving for millions of years.
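To see how this definition could in principle be operationalized, here is a minimal sketch (my own, not from the post): recover an agent's subjective probability by binary search over p, assuming its preference between option #2 and option #1 flips monotonically at some threshold.

```python
def elicit_subjective_probability(prefers_lottery, tol=1e-4):
    """Approximate the greatest lower bound of the set S of values p for
    which the agent prefers option #2 ("F occurs iff R yields O'", an
    objective-probability-p lottery) over option #1 ("F occurs iff E").

    prefers_lottery(p) -> True if the agent prefers option #2 at this p.
    Assumes the preference is monotone in p, as the definition presupposes.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prefers_lottery(mid):
            hi = mid  # mid is in S, so the infimum is at or below mid
        else:
            lo = mid  # mid is not in S, so the infimum is above mid
    return (lo + hi) / 2

# A hypothetical agent whose threshold (subjective probability of E) is 0.3:
print(round(elicit_subjective_probability(lambda p: p > 0.3), 3))  # ~0.3
```

In practice, for the reasons discussed below, repeated elicitations would not return a single stable number, which is part of why an interval may be the better object to extract.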
Some objections to the definition are:
- Your value of q is sensitive to factors such as framing effects, mood and what evidence you happen to have considered most recently. Kyburg (1968) discusses this on pages 57-58, and Shafer (1986) discusses this on page 465. So q may not be well defined as a number.
- It assumes that you've considered the question in the definition. If an agent has never considered an event E, it doesn't have a probability attached to it stored in its memory, even implicitly. And even if one has considered the event E, one may not have had occasion to make an assessment, because of the absence of an event F for which option #1 in the definition could plausibly hold.
These two objections also apply to the definition that Eliezer discusses in A Technical Explanation of a Technical Explanation.
Addressing these points in turn:
- q may still be well defined as an interval: in the example above involving humanity surviving for millions of years, it could be that the value of q that you assign fluctuates wildly between 10% and 90% depending on when you're asked and how you're asked, but that it always remains between 0.001% and 99.999%. Keynes discussed this in A Treatise on Probability, Kyburg suggests this in his 1968 paper, and Niklas Moller cites Ellsberg (1961), Kaplan (1983) and Levi (1986) on page 66 of Handbook of Risk Theory.
- One can make the agent aware of the possibility of the event E, and try to create such a suitable event F. This may not be feasible, for example, because one lacks the resources to create such an event F, or because E is in the far future. But if one wishes to assign a probability to an event E, one can imagine an associated event F, and imagine that one was making the choice between #1 and #2.
Pragmatic objections to assigning subjective probabilities
Even if subjective probabilities are well-defined (up to the two issues mentioned above), assigning a subjective probability in a given instance could be bad for one's epistemology. Some proponents of the idea of Knightian uncertainty may implicitly adhere to this position. Some ways in which assigning a subjective probability can lead one astray are given below.
Overconfidence in models
Suppose that one has a model of the world that one thinks is probably right and according to which the probability of an event E is extremely small. If one forgets that the model might be wrong, one might erroneously conclude that the probability of E occurring is extremely small. (Yvain discussed this in Confidence levels inside and outside an argument.)
This appears to be close to Keynes' objection to assigning subjective probabilities. I have not studied Keynes' original work, but several people who have written about him seem to implicitly ascribe this position to him. For example, in a book review discussing Keynes, John Gray wrote:
Even our list of possible outcomes may turn out to have omitted the ones that are most important in shaping events. Such an omission was one of the factors that led Long-Term Capital Management, a highly leveraged hedge fund set up by two Nobel Prize winning economists, to fail in 1998-2000. The information used in applying the formula did not include the possibility of such events as the Asian financial crisis and Russia’s default on its sovereign debt, which destabilised global financial markets and helped destroy the fund. The orthodoxy that came unstuck with the collapse of LTCM was not faulty because it neglected the vagaries of human moods; its mistake was to think that the unknown future could be turned into a set of calculable risks and, in effect, conjured out of existence, which was impossible. Several centuries earlier, Pascal – one of the founders of probability theory – had come to the same conclusion, when in the Pensées he asks ironically: ‘Is it probable that probability brings certainty?’ The central flaw of the economic orthodoxy against which Keynes fought in the 1930s was to imagine that an insoluble problem – human ignorance of the future – had been solved. The error was repeated in the 1990s, when economists came to believe that complex mathematical formulae could tame uncertainty in the murky world of derivatives.
One can assign a probability to one's model of the world being accurate, to account for model uncertainty. Keynes' position is perhaps best interpreted as a statement about effect size: a claim that the probability that one should assign to one's model being inaccurate is large.
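To make the effect-size point concrete, here is a small numerical sketch with hypothetical numbers: once one mixes in the possibility that the model is wrong, an extreme within-model prediction is dominated by the model-uncertainty term.

```python
# Hypothetical numbers, for illustration only.
p_model_right = 0.95         # credence that model M is accurate
p_event_given_right = 1e-6   # M's prediction for event E
p_event_given_wrong = 0.01   # fallback estimate for E if M is wrong

p_event = (p_model_right * p_event_given_right
           + (1 - p_model_right) * p_event_given_wrong)
print(p_event)  # ~5e-4: dominated by the 5% chance that the model is wrong
```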
Insensitivity to robustness of evidence
Kyburg (1968) argues that probabilities don't adequately pick up on robustness of evidence. He gives the example of drawing balls from an urn with black and white balls of unknown relative frequencies. He says that there's a big difference between
- An initial guess that the relative frequencies are 50%-50%
- A guess that the relative frequencies are 50%-50% after having drawn 1,000 balls and finding that the relative frequencies of the colors of balls drawn are about 50%-50%
saying:
The person who offers odds of two to one on the first ball is not at all out of his mind in the same sense as the person who offers two to one odds on the 1001st ball.
A single probability estimate does not pick up on how much one should update in response to incoming evidence. If one assigns a probability p to an event, one might mentally categorize the event in the reference class "events with probability p" and update too little or too much in response to incoming evidence on account of anchoring on other events of probability p (for which the probability is more robustly established or less robustly established than for the event in question).
This may be addressed by replacing a subjective probability of an event with a probability distribution for an event: for each number p between 0 and 1, associating a probability q_p that the event occurs with probability p. Quoting page 67 of Handbook of Risk Theory:
Multivalued measures generally take the form of a function that assigns a numerical value to each probability value between 0 and 1. This value represents the degree of reliability or plausibility of each particular probability value. Several interpretations of the measure have been used in the literature, for example, second-order probability (Baron 1987; Skyrms 1980), fuzzy set membership (Unwin 1986; Dubois and Prade 1988), and epistemic reliability (Gardenfors and Sahlin 1982). See Moller et al. (2006) for an overview.
Probability, knowledge, and meta-probability discusses E.T. Jaynes' approach to this.
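A toy sketch of both the robustness point and the second-order-probability idea (my own code, not drawn from Kyburg or the Handbook): represent the state of knowledge about the urn as a Beta distribution over the unknown frequency of white balls. The two agents below assign the same point probability of 0.5 to the next ball being white, but differ enormously in how much ten new observations move them.

```python
def point_probability(a, b):
    """Mean of a Beta(a, b) distribution over the unknown frequency:
    the probability assigned to 'the next ball is white'."""
    return a / (a + b)

# Agent 1: an initial guess with no data       -> Beta(1, 1)
# Agent 2: ~50/50 observed over 1,000 draws    -> Beta(501, 501)
agents = {"initial guess": (1, 1), "after 1,000 draws": (501, 501)}

for name, (a, b) in agents.items():
    before = point_probability(a, b)
    a, b = a + 10, b            # both agents then observe 10 white balls in a row
    after = point_probability(a, b)
    print(f"{name}: {before:.3f} -> {after:.3f}")

# initial guess:      0.500 -> 0.917  (swings a lot)
# after 1,000 draws:  0.500 -> 0.505  (barely moves)
```

The point estimate alone cannot distinguish the two agents; the distribution over the unknown frequency can.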
Suppression of dependency of events
Given two events A and B to which one assigns probabilities p and q, the numbers p and q do not suffice to determine the probability that events A and B both occur. If one assigns probabilities to events, and forgets where the probabilities came from, there's a risk of tacitly assuming that the events are independent, and assigning probability pq to the conjunction of A and B, when the probability of the conjunction could be much higher or much lower. According to chapter 1 of Nate Silver's book The Signal and the Noise, similar mistakes contributed to the 2008 financial crisis: people in finance assigned a much smaller probability to a very large number of houses' prices dropping than they did to a smaller number of houses' prices dropping, even though the prices of different houses were correlated.
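As a toy illustration (my own numbers, not Silver's): two events with the same marginal probability can have a joint probability anywhere from about the product of the marginals up to nearly the marginal itself, depending on how much of the risk comes from a shared factor.

```python
def joint_probability(p, shock_share):
    """Two events, each with marginal probability p. A fraction
    `shock_share` of that probability comes from a common shock that
    triggers both events; the rest is idiosyncratic and independent.
    Returns P(both events occur)."""
    s = shock_share * p           # probability of the common shock
    t = (p - s) / (1 - s)         # idiosyncratic probability, keeping the marginal at p
    return s + (1 - s) * t * t    # both occur: the shock fires, or both idiosyncratic causes do

p = 0.10
print(joint_probability(p, 0.0))  # 0.0100  -- independent case: p * q
print(joint_probability(p, 0.9))  # ~0.0901 -- nearly the full marginal, driven by the shared factor
```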
Conclusion
While some people have said that subjective probabilities of arbitrary events are not meaningful, there are definitions that make the notion of subjective probability meaningful, though arguably only as intervals rather than as numbers. Using intervals rather than numbers addresses some of the objections that have been raised.
A large part of the debate about whether one should assign subjective probabilities to arbitrary events is perhaps best conceptualized as a debate about how large the probability intervals that one assigns should be. In Worst Case Scenarios (pg. 160), Sunstein wrote:
Suppose that the question is the likelihood that at least 100 million human beings will be alive in 10,000 years. For most people equipped with the knowledge they have, no probability can sensibly be assigned. Perhaps uncertainty is not unlimited; the likelihood can reasonably be described as above 0 percent and below 100 percent. But beyond that point, little can be said.
In any given instance, one has the question of how much can be said. If you have a model of the world M that's accurate with probability at least p, and M predicts an event E with probability at least q, then the probability of E is at least pq. If p is low, then this doesn't give a good lower bound on the probability of E. But suppose you have two independent models M_1 and M_2, where M_i is accurate with probability at least p_i and where M_i predicts E with probability at least q_i. Then the probability of E is bounded below by p_1q_1 + p_2q_2 - p_1q_1p_2q_2. So by using model combination you can get a better lower bound on the probability of E (although in practice the models used may not be fully independent, and if they're positively correlated then the lower bound will be worse).
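A quick numerical check of this bound, with hypothetical values for the p_i and q_i:

```python
# Model i is accurate with probability >= p_i and, if accurate,
# predicts E with probability >= q_i; the models are assumed independent.
p1, q1 = 0.3, 0.8
p2, q2 = 0.4, 0.6

bound_single = max(p1 * q1, p2 * q2)                # best bound from either model alone: 0.24
bound_combined = p1*q1 + p2*q2 - (p1*q1) * (p2*q2)  # P(at least one model is accurate and predicts E)
print(bound_single, bound_combined)                 # 0.24 0.4224
```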
The ways in which assigning subjective probabilities can be bad for one's epistemology seem to fall under the broad heading "failing to incorporate all of one's knowledge when assigning a probability and then using it uncritically, or forgetting that the probability that you assign to an event does not fully capture your knowledge pertaining to the event." These issues can be at least partially mitigated by keeping them in mind.
Comments
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-02-04T04:58:09.264Z · LW(p) · GW(p)
This may be addressed by replacing a subjective probability of an event with a probability distribution for an event: for each number p between 0 and 1, associating a probability q_p that the event occurs with probability p.
This is a terrible formalism and I have never encountered any good use for it. Subjective probability is a state of belief. To a first approximation and ignoring material covered in CFAR courses rather than in theories of rational agency, I have no need to have nonextreme subjective probabilities about what I believe; I am allowed to know very solidly what I believe.
AFAICT the instability of a probability estimate just is captured by the notion of its sensitivity to probably-encountered evidence. It's just 'something that could easily get updated a lot', the investigation of which therefore has high information-value. I have no need to conceive of this as a probability distribution over probability distributions, unless it's a probability distribution over encountering various pieces of evidence and updating accordingly.
Plus, as with every other alternative to probability, a distribution over distributions either collapses to probability when we have to make decisions and calculate expected utilities, or else it is inconsistent, etc.
↑ comment by VipulNaik · 2014-02-05T02:53:20.893Z · LW(p) · GW(p)
I'm pretty sure nothing I say here will be new to you, so consider this more of an effort to explain to you where I (and I think also Jonah, though I won't categorically speak for him) am coming from.
Jonah was looking at probability distributions over estimates of an unknown probability (such as the probability of a coin coming up heads). Unless you have some objection to probability distributions per se, I don't see anything wrong with taking a probability distribution to describe one's current state of knowledge of a probability.
If your goal is to answer the question "Will this coin come up heads?" for a single coin toss, and you can't run any experiments to augment your knowledge about the model, but only have access to your prior knowledge, then it's true that all your knowledge would be captured in a single probability number, and if you have a subjective probability distribution, the single probability number would simply be the expected value of the distribution.
If, however, you are trying to answer a similar question "Will this coin come up heads when I toss it on such-and-such date at such-and-such time?" but you can run experiments before that, it would make sense to use those experiments to try to understand the model that determines how the coin tossing works. Your model may be something like "with fairly extreme probability, I believe that there is a probability p such that the coin toss turns up heads with probability p, and that that probability p is independent of the time and place that it is tossed. I also have a Bayesian prior for the probability distribution of the probability p." You would start with the prior and then run coin-tossing experiments to continue updating that probability distribution of probabilities. The day before your grand toss, you'll need to take the expected value of the probability distribution that you have obtained by then. But at intermediate stages it would make sense to store the entire probability distribution rather than the expected value (the point estimate of the probability). For instance, if you think that the coin is either fair (probability 1/3), or always heads (probability 1/3), or always tails (probability 1/3), then it's worth storing that full prior rather than simply saying that there's a 50% chance of it turning up heads, so that you can update appropriately on the evidence. I could also construct higher-order versions of this hypothetical, but they would be too tedious to describe.
Secondly, as Jonah said, if you're running the coin-tossing experiment multiple times and measuring the probability of, say, all heads, then the subjective probability distribution for p does matter for calculating the probability of all heads, and just the point estimate (expected value) of p would give a wrong answer.
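To make this concrete, here is a minimal sketch (my own code, not from the comment) of the three-hypothesis coin: keeping the full distribution over {fair, always heads, always tails} supports correct updating and gives the right answer for "all of the next 5 flips are heads", which the point estimate alone gets wrong.

```python
# Chance of heads under each hypothesis about the coin.
hypotheses = {"fair": 0.5, "always heads": 1.0, "always tails": 0.0}
prior = {h: 1/3 for h in hypotheses}

def update(posterior, heads):
    """Bayes update of the distribution over hypotheses on one observed flip."""
    unnorm = {h: posterior[h] * (hypotheses[h] if heads else 1 - hypotheses[h])
              for h in posterior}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}

def prob_all_heads(posterior, n):
    """P(the next n flips are all heads), using the full posterior."""
    return sum(posterior[h] * hypotheses[h] ** n for h in posterior)

point = sum(prior[h] * hypotheses[h] for h in prior)  # 0.5, the point estimate
print(point ** 5)                 # 0.03125 -- wrong answer from the point estimate alone
print(prob_all_heads(prior, 5))   # ~0.344  -- correct answer from the full prior
print(update(prior, heads=True))  # one observed head rules out "always tails"
```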
Sorry if this isn't clear -- I can elaborate more later.
↑ comment by Kurros · 2014-02-05T03:25:52.478Z · LW(p) · GW(p)
"Jonah was looking at probability distributions over estimates of an unknown probability (such as the probability of a coin coming up heads)"
It sounds like you are just confusing epistemic probabilities with propensities, or frequencies. I.e., due to physics, the shape of the coin, and your style of flipping, a particular set of coin flips will have certain frequency properties that you can characterise by a bias parameter p, which you call "the probability of landing on heads". This is just a parameter of a stochastic model, not a degree of belief.
However, you can have a degree of belief about what p is no problem. So you are talking about your degree of belief that a set of coin flips has certain frequentist properties, i.e. your degree of belief in a particular model for the coin flips.
edit: I could add that GIVEN a stochastic model you then have degrees of belief about whether a given coin flip will result in heads. But this is a conditional probability: see my other comment in reply to Vaniver. This is not, however, "beliefs about beliefs". It is just standard Bayesian modelling.
↑ comment by Cyan · 2014-02-05T03:52:39.392Z · LW(p) · GW(p)
This is just a parameter of a stochastic model, not a degree of belief.
This is not exactly correct. It's true that in general there's a sharp distinction to be made between model parameters (which govern/summarize/encode properties of the entire stochastic process) and degrees of belief for various outcomes, but that distinction becomes very blurry in the current context.
What's going on here is that the probability distribution for the observable outcomes is infinitely exchangeable. Infinite exchangeability gives rise to a certain representation for the predictive distribution under which the prior expected limiting frequency is mathematically equal to the marginal prior probability for any single outcome. So under exchangeability, it's not an either/or -- it's a both/and.
↑ comment by Kurros · 2014-02-05T04:12:04.859Z · LW(p) · GW(p)
Are you referring to De Finetti's theorem? I can't say I understand your point. Does it relate to the edit I made shortly before your post? i.e. Given a stochastic model with some parameters, you then have degrees of belief about certain outcomes, some of which may seem almost the same thing as the parameters themselves? I still maintain that the two are quite different: parameters characterise probability distributions, and just in certain cases happen to coincide with conditional degrees of belief. In this 'beliefs about beliefs' context, though, it is the parameters we have degrees of belief about, we do not have degrees of belief about the conditional degrees of belief to which said parameters may happen to coincide.
↑ comment by Cyan · 2014-02-05T10:00:55.964Z · LW(p) · GW(p)
Yup, I'm referring to de Finetti's theorem. Thing is, de Finetti himself would have denied that there is such a thing as a parameter -- he was all about only assigning probabilities to observable, bet-on-able things. That's why he developed his representation theorem. From his perspective, p arises as a distinct mathematical entity merely as a result of the representation provided by exchangeability. The meaning of p is to be found in the predictive distribution; to describe p as a bias parameter is to reify a concept which has no place in de Finetti's Bayesian approach.
Now, I'm not a de-Finetti-style subjective Bayesian. For me, it's enough to note that the math is the same whether one conceives of p as stochastic model parameter or as the degree of plausibility of any single outcome. That's why I say it's not either/or.
↑ comment by Kurros · 2014-02-06T00:16:56.607Z · LW(p) · GW(p)
Hmm, interesting. I will go and learn more deeply what de Finetti was getting at. It is a little confusing... in this simple case ok fine p can be defined in a straightforward way in terms of the predictive distribution, but in more complicated cases this quickly becomes extremely difficult or impossible. For one thing, a single model with a single set of parameters may describe outcomes of vastly different experiments. E.g. consider Newtonian gravity. Ok fine strictly the Newtonian gravity part of the model has to be coupled to various other models to describe specific details of the setup, but in all cases there is a parameter G for the universal gravitation constant. G impacts on the predictive distributions for all such experiments, so it is pretty hard to see how it could be defined in terms of them, at least in a concrete sense.
↑ comment by Cyan · 2014-02-06T20:56:38.325Z · LW(p) · GW(p)
I'd guess that in Geisser-style predictive inference, the meaning or reality or what-have-you of G is to be found in the way it encodes the dependence (or maybe, compresses the description) of the joint multivariate predictive distribution. But like I say, that's not my school of thought -- I'm happy to admit the possibility of physical model parameters -- so I really am just guessing.
↑ comment by Kurros · 2014-02-07T01:00:16.227Z · LW(p) · GW(p)
Hmm, do you know of any good material to learn more about this? I am actually extremely sympathetic to any attempt to rid model parameters of physical meaning; I mean in an abstract sense I am happy to have degrees of belief about them, but in a prior-elucidation sense I find it extremely difficult to argue about what it is sensible to believe a-priori about parameters, particularly given parameterisation dependence problems.
I am a particle physicist, and a particular problem I have is that parameters in particle physics are not constant; they vary with renormalisation scale (roughly, energy of the scattering process), so that if I want to argue about what it is a-priori reasonable to believe about (say) the mass of the Higgs boson, it matters a very great deal what energy scale I choose to define my prior for the parameters at. If I choose (naively) a flat prior over low-energy values for the Higgs mass, it implies I believe some really special and weird things about the high-scale Higgs mass parameter values (they have to be fine-tuned to the bejesus); while if I believe something more "flat" about the high scale parameters, it in turn implies something extremely informative about the low-scale values, namely that the Higgs mass should be really heavy (in the Standard Model - this is essentially the Hierarchy problem, translated into Bayesian words).
Anyway, if I can more directly reason about the physically observable things and detach from the abstract parameters, it might help clarify how one should think about this mess...
↑ comment by Cyan · 2014-02-07T13:50:00.242Z · LW(p) · GW(p)
I can pass along a recommendation I have received: Operational Subjective Statistical Methods by Frank Lad. I haven't read the book myself, so I can't actually vouch for it, but it was described to me as "excellent". I don't know if it is actively prediction-centered, but it should at least be compatible with that philosophy.
↑ comment by Kurros · 2014-02-08T12:06:35.326Z · LW(p) · GW(p)
Thanks, this seems interesting. It is pretty radical; he is very insistent on the idea that for all 'quantities' about which we want to reason there must some operational procedure we can follow in order to find out what it is. I don't know what this means for the ontological status of physical principles, models, etc, but I can at least see the naive appeal... it makes it hard to understand why a model could ever have the power to predict new things we have never seen before though, like Higgs bosons...
↑ comment by VipulNaik · 2014-02-07T00:02:26.011Z · LW(p) · GW(p)
I understand this, though I hadn't thought of it with such clear terminology. I think the point Jonah was making was that in many cases, people are talking about propensities/frequencies when they refer to probabilities. So it's not so much that Jonah or I are confusing epistemic probabilities with propensities/frequencies, it's that many people use the term "probability" to refer to the latter. With language used this way, the probability distribution for this model parameter can be called the "probability distribution of the probability estimate." If you reserve the term probability exclusively for epistemic probability (degree of belief), then this would constitute an abuse of language.
↑ comment by Kurros · 2014-02-07T02:02:07.219Z · LW(p) · GW(p)
Sure, I don't want to suggest we only use the word 'probability' for epistemic probabilities (although the world might be a better place if we did...), only that if we use the word to mean different sorts of probabilities in the same sentence, or even whole body of text, without explicit clarification, then it is just asking for confusion.
↑ comment by tom_cr · 2014-02-05T20:41:20.713Z · LW(p) · GW(p)
Jonah was looking at probability distributions over estimates of an unknown probability
What is an unknown probability? Forming a probability distribution means rationally assigning degrees of belief to a set of hypotheses. The very act of rational assignment entails that you know what it is.
↑ comment by solipsist · 2014-02-05T06:26:19.276Z · LW(p) · GW(p)
That distribution of coin biases is a hyperprior.
↑ comment by JonahS (JonahSinick) · 2014-02-04T06:00:30.378Z · LW(p) · GW(p)
unless it's a probability distribution over encountering various pieces of evidence and updating accordingly.
This is what I had in mind.
For agents of bounded rationality with Kahneman and Tversky biases, of the three options
- (A) Don't assign a numerical probability to event E
- (B) Assign a numerical probability to an event E
- (C) Assign a numerical probability distribution to E corresponding to probabilities of encountering pieces of evidence of a given size
in a given instance, it could be that option (B) is inferior to option (A) but that option (C) is superior to option (A). (Here I'm assuming a scenario where you have time and resources to gather more information if you choose to before making decisions to which E is directly relevant.)
In other scenarios, option (B) could beat out both option (A) and option (C). In other scenarios (like when deciding whether to flinch away upon accidentally touching a hot pot), option (A) could beat out options (B) and (C).
One can imagine a continuum of epistemological frameworks from that of a prehistoric human to that of a Solomonoff Inductor. The optimal one for humans is somewhere in between. An objection to assigning numerical probabilities is "we don't have the hardware and software to utilize them to improve our decision making." An answer is "we do in some contexts, even if not all." An elaboration is "If, in a given instance, at first glance, it looks like quantification doesn't help, quantifying things one layer further may help, even if not always." That was the point of that part of my post.
↑ comment by cousin_it · 2014-02-04T12:04:35.746Z · LW(p) · GW(p)
Are you saying that people can't deal with regular probability theory, but can deal with two-level "probabilities of probabilities"? That seems unlikely. I'd guess that the people who claim to use "probabilities of probabilities" cannot use them correctly either.
↑ comment by Protagoras · 2014-02-06T01:05:39.967Z · LW(p) · GW(p)
How about this suggestion/interpretation? There are some probabilities which are based on a lot of evidence already and so which should only be changed slightly when new evidence comes in (unless there's a lot of it, of course), and there are some probabilities that are based on next to nothing and so that we should be prepared to shift dramatically if any actual evidence comes to light. Bad things can happen when the latter are mistakenly treated as if they were the former, but people aren't good at keeping track of the difference. Introducing two level "probabilities of probabilities" for handling the latter may not actually make them particularly manageable, but it could at least prevent them from being confused with the former, and if it prevents their being used much at all, perhaps that's for the best.
↑ comment by cousin_it · 2014-02-06T12:09:48.262Z · LW(p) · GW(p)
(I'm about 90% sure that you already know what I'm going to say, but the remaining 10% leads me to say it just in case, and it might help onlookers as well.)
A one-level prior already contains the information about how strongly you update. For example, if you have a prior about the joint outcome of two coinflips, consisting of four probabilities that sum to 1, then learning the outcome of the first coinflip allows you to update your beliefs about the second one, and any Bayesian-rational method of updating in that situation (corresponding to a single coin with known bias, single coin with unknown bias, two coins with opposite biases...) can be expressed that way.
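For onlookers, a tiny worked version of this (my own numbers): the four joint probabilities already determine how much observing the first flip shifts your belief about the second.

```python
# Joint priors over (flip1, flip2); each set of four entries sums to 1.
# A: known fair coin, independent flips.
# B: single coin that is either always-heads or always-tails, 50/50.
joint_fair = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
joint_unknown_bias = {("H", "H"): 0.5, ("H", "T"): 0.0, ("T", "H"): 0.0, ("T", "T"): 0.5}

def p_second_heads_given_first_heads(joint):
    """Condition on the first flip being heads; return P(second flip is heads)."""
    p_first_heads = joint[("H", "H")] + joint[("H", "T")]
    return joint[("H", "H")] / p_first_heads

print(p_second_heads_given_first_heads(joint_fair))          # 0.5 -- no update
print(p_second_heads_given_first_heads(joint_unknown_bias))  # 1.0 -- updates all the way
```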
↑ comment by JonahS (JonahSinick) · 2014-02-07T21:29:34.905Z · LW(p) · GW(p)
Yes, it's just a matter of which way of looking at things is most helpful psychologically (for humans, with human biases).
↑ comment by JonahS (JonahSinick) · 2014-02-04T20:58:58.185Z · LW(p) · GW(p)
Are you saying that people can't deal with regular probability theory, but can deal with two-level "probabilities of probabilities"?
Not in full generality. There may be instances though. I don't know how to articulate my intuitions here, without going into examples that are sufficiently involved so that they'd derail the conversation. If nothing else, it's true that a probability estimate does not suffice to capture the knowledge that one has about an event and that one can better use probabilities as an input into one's epistemology if one keeps this in mind.
↑ comment by Vaniver · 2014-02-04T06:29:35.739Z · LW(p) · GW(p)
For questions of a continuous nature, you think that subjective probability is best expressed as a distribution over the continuous support, right? I view these sorts of distributions over distributions as that: there's some continuous parameter potentially in the world (the proportion of white and black balls in the urn), and that continuous parameter may determine my subjective probability about binary events (whether ball #1001 is white or black).
Now, whether or not this formalism stretches to other ideas might be controversial. I might consider "the strength of the argument for Conclusion X" as having continuous support, possibly from 0 to 1, and so be able to express with my probability distribution over that how much more I expect to learn about the issue, but I can see reasons to avoid doing that.
[edit] That is, rather than modifying the likelihood ratios of all of the pieces of evidence for or against the argument being strong, I can modify my distribution on it. I think this runs into trouble with, say, argument screening off authority - there's a case where you really do want to modify the likelihood ratios.
↑ comment by Kurros · 2014-02-05T01:17:28.740Z · LW(p) · GW(p)
"I view these sorts of distributions over distributions as that- there's some continuous parameter potentially in the world (the proportion of white and black balls in the urn), and that continuous parameter may determine my subjective probability about binary events (whether ball #1001 is white or black)."
To me this just sounds like standard conditional probability. E.g. let p(x|I) be your subjective probability distribution over the parameter x (fraction of white balls in urn), given prior information I. Then
p("ball 1001 is white"|I) = integral_x { p("ball 1001 is white"|x,I)*p(x|I) } dx
So your belief in "ball 1001 is white" gets modulated by your belief distributions over x, sure. But I wouldn't call this a "distribution over a distribution". Yes, there is a set of likelihoods p("ball 1001 is white"|x,I) which specify your subjective degree of belief in "ball 1001 is white" GIVEN various x, but in the end you want your degree of belief in "ball 1001 is white" considering ALL values that x might have and their relative plausibilities, i.e. you want the marginal likelihood to make your predictions.
(my marginalisation here ignores hypotheses outside the domain implied by there being a fraction of balls in the urn...)
comment by Lumifer · 2014-02-04T21:17:12.840Z · LW(p) · GW(p)
I don't see anything with conclusion-nature in the conclusion :-)
Can we do a worked example? Let's say the assertion is: "Aliens exist!", or, speaking technically, "Representatives of an advanced non-human civilization are present in the Solar system, they are watching the humanity but are successful at concealing their presence".
How would you estimate a probability (or a probability interval or a probability distribution) for this assertion being true?
comment by [deleted] · 2015-09-11T12:18:12.206Z · LW(p) · GW(p)
Psychologists theorise that uncertainty reduction maximises subjective utility in naive humans. If a person subjectively identifies uncertain stimuli as associated with negative outcomes, this is not surprising.
Some normative decision theories treat Knightian uncertainty as risk-neutral.
Defensive pessimism is anxiety-reducing. This anxiety reduction may be because uncertainties are reduced by analysing them ad hoc in one's worry.
Anxiety can also be reduced by non-avoidance of an anxiety-producing stimulus. When the target of one's worry is an anxiety-producing stimulus and the strategy used is non-avoidance, this may entail a human agent interacting with the stimulus in order to produce a potentially spurious risk-reduction attempt, assuming that the risk is more accurately classified as Knightian uncertainty.
I suspect this framework may be useful in understanding why hopelessness can create a self-fulfilling prophecy, which can sometimes explain behaviours like suicidality: self-destructing rather than gambling optimistically. I might flesh out the logic and evidence that inspired this hypothesis more completely if I see a way that it could be operationalised.
comment by V_V · 2014-02-06T14:40:26.007Z · LW(p) · GW(p)
For example, suppose that E and F are both the event "humanity survives for millions of years" and you have the opportunity to push a button that will guarantee this with probability p and otherwise guarantee that this does not happen. If you're willing to push it when p = 99.999%, that means that you assign a probability less than 99.999% to humanity surviving for millions of years. If you're not willing to push it when p = 0.001%, that means that you assign a probability greater than 0.001% to humanity surviving for millions of years.
I think these types of definitions are intrinsically circular: you define probability in terms of rational decisions in uncertain environments. But in order to define what it means to perform rational decisions in uncertain environments, you need probability. Hence the circularity.
↑ comment by JonahS (JonahSinick) · 2014-02-07T18:13:40.597Z · LW(p) · GW(p)
But in order to define what it means to perform rational decisions in uncertain environments, you need probability.
What do you mean by this?
↑ comment by V_V · 2014-02-07T21:09:46.535Z · LW(p) · GW(p)
Essentially all forms of decision theory or game theory are based on expected utility maximization (up to some details).
In order to define expected utility maximization, you need a concept of expectation, which means that you need probability theory.
↑ comment by JonahS (JonahSinick) · 2014-02-07T21:26:33.655Z · LW(p) · GW(p)
Savage's theory shows that there exists a utility function and a subjective probability distribution such that a "rational" agent is maximizing expected utility. It doesn't disentangle the utility function and the subjective probability distribution. So what you say is true in some sense, but the agent's behavior still places constraints on the two things.
↑ comment by V_V · 2014-02-08T01:10:56.401Z · LW(p) · GW(p)
a "rational" agent is maximizing expected utility
A rational agent maximizes expected utility by definition.
but the agent's behavior still places constraints on the two things.
Ok. So if you observe the behavior of an agent, and assume it performs expected utility maximization, you can determine some constraints on its utility function and subjective probability distribution. Fair enough.
Still, this doesn't allow us to tell whether the revealed subjective probability distribution is accurate in any reasonable sense:
A person who prefers life over death and nevertheless starves himself to death due to the belief that people are trying to poison him may be perfectly rational for some choice of subjective probability distribution. We tend to call these types of probability distributions "psychotic disorders", but there is nothing in the theory of subjective probability that allows us to rule them out as wrong.
comment by torekp · 2014-02-06T02:20:44.018Z · LW(p) · GW(p)
I think the proposed definition (the one inspired by Ramsey, de Finetti, et al) assumes too much about values and our knowledge of our values. Let's consider your example:
For example, suppose that E and F are both the event "humanity survives for millions of years" and you have the opportunity to push a button that will guarantee this with probability p and otherwise guarantee that this does not happen.
"Okay Djinni," I say, "since your yellow button gives a 90% probability that humanity survives for millions of years, I'll go ahead and push -"
"Mwahah - oops, I mean, what are you waiting for, torekp?"
"Hey! No fair, this button guarantees that we survive, but in horrible agony! Let me look at this purple button instead. Ah, that's better, people survive in complete comfort. I'll go ahead -"
"Mwah - er, never mind me, I'm just getting over a cold."
"Like hell you are! I just noticed that the purple button puts people in near-stasis. They'll live for millions of years on my clock, but their subjective time is nearly nil! OK, purple button's out; let's look at green..."
This could go on ad infinitum - or until we figure out exactly what our terminal values are, which is even longer. Part of the problem is that the value of F, the event I originally wanted, could depend on the value of E, the objectively-random process we're betting on. But wait, here's where it gets really interesting: part of my reason for varying my valuation of F based on E may be the very fact of objective risk associated with E.
Maybe F is more exciting if I obtain it in a risky way. Or, maybe it becomes a lesser achievement for me when it is a matter of luck rather than pure skill. Either way, nonlinearities and discontinuities threaten to pop up and ruin the suggested interpretation of my betting choices as an expression of my epistemic probability.
comment by solipsist · 2014-02-05T06:58:19.320Z · LW(p) · GW(p)
In my (very sleepy) state, I don't see anything here that isn't accomplished with "hyperpriors". Is there?
comment by Eugine_Nier · 2014-02-07T08:10:35.935Z · LW(p) · GW(p)
Here is a somewhat mathematical introduction to higher order uncertainty.
↑ comment by Richard_Kennaway · 2014-02-07T12:09:00.822Z · LW(p) · GW(p)
Further context for that link, which is a chapter appearing in Nassim Taleb's works in progress, Silent Risk and Probability and Risk in the Real World.