Should you go with your best guess?: Against precise Bayesianism and related views
post by Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · LW · GW · 15 comments
Comments sorted by top scores.
comment by Lukas Finnveden (Lanrian) · 2025-02-01T05:25:46.380Z · LW(p) · GW(p)
If I were to write the case for this in my own words, it might be something like:
- There are many different normative criteria we should give some weight to.
- One of them is "maximizing EV according to moral theory A".
- But maximizing EV is an intuitively less appealing normative criterion when (i) it's super unclear and non-robust what credences we ought to put on certain propositions, and (ii) the recommended decision is very different depending on what our exact credences on those propositions are.
- So in such cases, as a matter of ethics, you might have the intuition that you should give less weight to "maximize EV according to moral theory A" and more weight to e.g.:
- Deontic criteria that don't use EV.
- EV-maximizing according to moral theory B (where B's recommendations are less sensitive to the propositions that are difficult to put robust credences on).
- EV-maximizing within a more narrow "domain", ignoring the effects outside of that "domain". (Where the effects within that "domain" are less sensitive to the propositions that are difficult to put robust credences on).
I like this formulation because it seems pretty arbitrary to me where you draw the boundary between a credence that you include in your representor vs. not. (Like: What degree of justification is enough? We'll always have the problem of induction to provide some degree of arbitrariness.) But if we put this squarely in the domain of ethics, I'm less fussed about this, because I'm already sympathetic to being pretty anti-realist about ethics, and there being some degree of arbitrariness in choosing what you care about. (And I certainly feel some intuitive aversion to making choices based on very non-robust credences, and it feels interesting to interpret that as an ~ethical intuition.)
Replies from: antimonyanthony, antimonyanthony, Lanrian
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-02-01T23:57:16.857Z · LW(p) · GW(p)
(I'll reply to the point about arbitrariness in another comment.)
I think it's generally helpful for conceptual clarity to analyze epistemics separately from ethics and decision theory. E.g., it's not just EV maximization w.r.t. non-robust credences that I take issue with, it's any decision rule built on top of non-robust credences. And I worry that without more careful justification, "[consequentialist] EV-maximizing within a more narrow "domain", ignoring the effects outside of that "domain"" is pretty unmotivated / just kinda looking under the streetlight. And how do you pick the domain?
(Depends on the details, though. If it turns out that EV-maximizing w.r.t. impartial consequentialism is always sensitive to non-robust credences (in your framing), I'm sympathetic to "EV-maximizing w.r.t. those you personally care about, subject to various deontological side constraints etc." as a response. Because “those you personally care about” isn’t an arbitrary domain, it’s, well, those you personally care about. The moral motivation for focusing on that domain is qualitatively different from the motivation for impartial consequentialism.)
So I'm hesitant to endorse your formulation. But maybe for most practical purposes this isn't a big deal, I'm not sure yet.
Replies from: Lanrian
↑ comment by Lukas Finnveden (Lanrian) · 2025-02-02T01:08:27.515Z · LW(p) · GW(p)
To be clear: The "domain" thing was just meant to be a vague gesture of the sort of thing you might want to do. (I was trying to include my impression of what eg bracketed choice is trying to do.) I definitely agree that the gesture was vague enough to also include some options that I'd think are unreasonable.
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-02-02T00:07:08.094Z · LW(p) · GW(p)
it seems pretty arbitrary to me where you draw the boundary between a credence that you include in your representor vs. not. (Like: What degree of justification is enough? We'll always have the problem of induction to provide some degree of arbitrariness.)
To spell out how I’m thinking of credence-setting: Given some information, we apply different (vague) non-pragmatic principles [LW · GW] we endorse — fit with evidence, Occam’s razor, deference, etc.
Epistemic arbitrariness means making choices in your credence-setting that add something beyond these principles. (Contrast this with mere “formalization arbitrariness”, the sort discussed in the part of the post [LW · GW] about vagueness.)
I don’t think the problem of induction forces us to be epistemically arbitrary. Occam’s razor (perhaps an imprecise version!) favors priors that penalize a hypothesis like “the mechanisms that made the sun rise every day in the past suddenly change tomorrow”. This seems to give us grounds for having prior credences narrower than (0, 1), even if there’s some unavoidable formalization arbitrariness. (We can endorse the principle underlying Occam’s razor, “give more weight to hypotheses that posit fewer entities”, without a circular justification like “Occam’s razor worked well in the past”. Admittedly, I don’t feel super satisfied with / unconfused about Occam’s razor, but it’s not just an ad hoc thing.)
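(To make that slightly more concrete, here's one idealized way to cash out the penalty: a rough sketch assuming a Solomonoff-style simplicity prior, which is only one formalization of the razor. Take $P(h) \propto 2^{-K(h)}$, let $h_\infty$ be "the mechanisms behind sunrise stay as they are", and let $h_N$ be "the mechanisms stay as they are until day $N$, then change". Describing $h_N$ requires specifying $N$, which for typical $N$ costs roughly an extra $\log_2 N$ bits, so, up to additive constants,
$$\frac{P(h_N)}{P(h_\infty)} \;\lesssim\; 2^{-\log_2 N} = \frac{1}{N}.$$
Each specific "sudden change at day $N$" hypothesis is heavily penalized, which is one way of seeing how a prior credence narrower than (0, 1) needn't be epistemically arbitrary.)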
By contrast, pinning down a single determinate credence (in the cases discussed in this post) seems to require favoring epistemic weights for no reason. Or at best, a very weak reason that IMO is clearly outweighed by a principle of suspending judgment. So this seems more arbitrary to me than indeterminate credences, since it injects epistemic arbitrariness on top of formalization arbitrariness.
Replies from: Lanrian
↑ comment by Lukas Finnveden (Lanrian) · 2025-02-02T03:49:12.456Z · LW(p) · GW(p)
Thanks. It still seems to me like the problem recurs. The application of Occam's razor to questions like "will the Sun rise tomorrow?" seems more solid than e.g. random intuitions I have about how to weigh up various considerations. But the latter do still seem like a very weak version of the former. (E.g. both do rely on my intuitions; and in both cases, the domain has something in common with cases where my intuitions have worked well before, and something not in common.) And so it's unclear to me what non-arbitrary standards I can use to decide whether I should let both, neither, or just the latter be "outweighed by a principle of suspending judgment".
Replies from: antimonyanthony
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-02-02T13:19:47.084Z · LW(p) · GW(p)
(General caveat that I'm not sure if I'm missing your point.)
Sure, there's still a "problem" in the sense that we don't have a clean epistemic theory of everything. The weights we put on the importance of different principles, and how well different credences fulfill them, will be fuzzy. But we've had this problem all along.
There are options other than (1) purely determinate credences or (2) implausibly wide indeterminate credences. To me, there are very compelling intuitions behind the view that the balance among my epistemic principles is best struck by (3) indeterminate credences that are narrow in proportion to the weight of evidence and how far principles like Occam seem to go. This isn't objective (neither are any other principles of rationality less trivial than avoiding synchronic sure losses). Maybe your intuitions differ, upon careful reflection. That doesn't mean it's a free-for-all. Even if it is, this isn't a positive argument for determinacy.
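(To gesture at what (3) can look like, here's a toy sketch I'm making up purely for illustration; it formalizes a representor as a finite set of Beta priors over a single binary event, which is only one crude way to do this.)

```python
# Toy sketch: a "representor" modeled as a finite set of Beta(a, b) priors over
# a binary event. The spread of posterior means shrinks as evidence accumulates,
# i.e. the credences are narrow in proportion to the weight of evidence.

def posterior_mean(a, b, successes, trials):
    """Posterior mean of a Beta(a, b) prior after `successes` out of `trials`."""
    return (a + successes) / (a + b + trials)

# One crude representor: all Beta priors with pseudo-counts a, b in {1, ..., 5}.
representor = [(a, b) for a in range(1, 6) for b in range(1, 6)]

for successes, trials in [(0, 0), (7, 10), (70, 100), (700, 1000)]:
    means = [posterior_mean(a, b, successes, trials) for a, b in representor]
    print(f"n = {trials:4d}: posterior means span [{min(means):.3f}, {max(means):.3f}]")
```

With no data the posterior means span roughly [0.17, 0.83]; after 1,000 observations they span roughly [0.697, 0.701]. The width tracks the weight of evidence rather than being fixed by fiat.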
both do rely on my intuitions
My intuitions about foundational epistemic principles are just about what I philosophically endorse — in that domain, I don’t know what else we could possibly go on other than intuition. Whereas, my intuitions about empirical claims about the far future only seem worth endorsing as far as I have reasons to think they're tracking empirical reality.
↑ comment by Lukas Finnveden (Lanrian) · 2025-02-01T05:29:38.008Z · LW(p) · GW(p)
Also, my sense is that many people are making decisions based on similar intuitions as the ones you have (albeit with much less of a formal argument for how this can be represented or why it's reasonable). In particular, my impression is that people who are uncompelled by longtermism (despite being compelled by some type of scope-sensitive consequentialism) are often driven by an aversion to very non-robust EV-estimates.
comment by Davidmanheim · 2025-01-28T11:48:53.277Z · LW(p) · GW(p)
I’m not merely saying that agents shouldn’t have precise credences when modeling environments more complex than themselves
You seem to be underestimating how pervasive / universal this critique is - essentially every environment is more complex than we are, at the very least when we're embedded agents or other humans are involved. So I'm not sure where your criticism (which I agree with) does more than the basic argument already does in a very strong form - it just seems to state it more clearly.
The problem is that Kolmogorov complexity depends on the language in which algorithms are described. Whatever you want to say about invariances with respect to the description language, this has the following unfortunate consequence for agents making decisions on the basis of finite amounts of data: For any finite sequence of observations, we can always find a silly-looking language in which the length of the shortest program outputting those observations is much lower than that in a natural-looking language (but which makes wildly different predictions of future data).
Far less confident here, but I think this isn't correct as a matter of practice. Conceptually, Solomonoff doesn't say "pick an arbitrary language once you've seen the data and then do the math"; it says "pick an arbitrary language before you've seen any data and then do the math." And if we need to implement the silly-looking language, there is a complexity penalty to doing that, one that's going to be similarly large regardless of what baseline we choose, and we can determine how large it is by reducing the language to some other language. (And I may be wrong, but picking a language cleverly shouldn't mean that Kolmogorov complexity can change something requiring NP programs to encode into something that P programs can encode, so this criticism seems weak anyways outside of toy examples.)
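(For reference, the "similarly large regardless of what baseline we choose" point is the invariance theorem: for any two universal description languages $L_1$ and $L_2$ there is a constant $c_{L_1,L_2}$, depending only on the two languages and not on the data, such that
$$K_{L_1}(x) \;\le\; K_{L_2}(x) + c_{L_1,L_2} \quad \text{for all strings } x.$$
As I read the quoted passage, the remaining disagreement is about whether that constant is reassuring: it doesn't grow with the data, but for any fixed finite sequence it can be large relative to the sequence itself, which is what a post-hoc "silly-looking language" exploits.)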
Replies from: antimonyanthony
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-01-28T13:23:24.718Z · LW(p) · GW(p)
You seem to be underestimating how pervasive / universal this critique is - essentially every environment is more complex than we are
I agree it's pretty pervasive, but the impression I've gotten from my (admittedly limited) sense of how infra-Bayesianism works is:
The "more complex than we are" condition for indeterminacy doesn't tell us much about when, if ever, our credences ought to capture indeterminacy in how we weigh up considerations/evidence — which is a problem for us independent of non-realizability. For example, I'd be surprised if many/most infra-Bayesians would endorse suspending judgment in the motivating example in this post, if they haven't yet considered the kinds of arguments I survey. This matters for how decision-relevant indeterminacy is for altruistic prioritization.
I'm also not aware of the infra-Bayesian literature addressing the "practical hallmarks" I discuss, though I might have missed something.
(The Solomonoff induction part is a bit above my pay grade, will think more about it.)
Replies from: Davidmanheim
↑ comment by Davidmanheim · 2025-01-28T15:53:48.801Z · LW(p) · GW(p)
"when, if ever, our credences ought to capture indeterminacy in how we weigh up considerations/evidence"
The obvious answer is only when there is enough indeterminacy to matter; I'm not sure if anyone would disagree. Because the question isn't whether there is indeterminacy, it's how much, and whether it's worth the costs of using a more complex model instead of doing it the Bayesian way.
I'd be surprised if many/most infra-Bayesians would endorse suspending judgment in the motivating example in this post
You also didn't quite endorse suspending judgement in that case - "If someone forced you to give a best guess one way or the other, you suppose you’d say “decrease”. Yet, this feels so arbitrary that you can’t help but wonder whether you really need to give a best guess at all…" So, yes, if it's not directly decision relevant, sure, don't pick, say you're uncertain. Which is best practice even if you use precise probability - you can have a preference for robust decisions, or a rule for withholding judgement when your confidence is low. But if it is decision relevant, and there is only a binary choice available, your best guess matters. And this is exactly why Eliezer says that when there is a decision, you need to focus your indeterminacy [LW · GW], and why he was dismissive of DS and similar approaches.
Replies from: antimonyanthony
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-01-28T16:13:10.382Z · LW(p) · GW(p)
The obvious answer is only when there is enough indeterminacy to matter; I'm not sure if anyone would disagree. Because the question isn't whether there is indeterminacy, it's how much, and whether it's worth the costs of using a more complex model instead of doing it the Bayesian way.
Based on this I think you probably mean something different by “indeterminacy” than I do (and I’m not sure what you mean). Many people in this community explicitly disagree with the claim that our beliefs should be indeterminate at all, as exemplified by the objections I respond to in the post.
When you say “whether it’s worth the costs of using a more complex model instead of doing it the Bayesian way”, I don’t know what “costs” you mean, or what non-question-begging standard you’re using to judge whether “doing it the Bayesian way” would be better. As I write in the “Background” section: "And it’s question-begging to claim that certain beliefs “outperform” others, if we define performance as leading to behavior that maximizes expected utility under those beliefs. For example, it’s often claimed that we make “better decisions” with determinate beliefs. But on any way of making this claim precise (in context) that I’m aware of, “better decisions” presupposes determinate beliefs!"
You also didn't quite endorse suspending judgement in that case - "If someone forced you to give a best guess one way or the other, you suppose you’d say “decrease”.
The quoted sentence is consistent with endorsing suspending judgment, epistemically speaking. As the key takeaways list says, “If you’d prefer to go with a given estimate as your “best guess” when forced to give a determinate answer, that doesn’t imply this estimate should be your actual belief.”
But if it is decision relevant, and there is only a binary choice available, your best guess matters
I address this in the “Practical hallmarks” section — what part of my argument there do you disagree with?
comment by Noosphere89 (sharmake-farah) · 2025-01-29T00:15:23.027Z · LW(p) · GW(p)
IMO, the problems with Precise Bayesianism for humans are mostly problems with logical omniscience not being satisfied.
Also, on the arbitrariness of the prior: this is an essential feature for a very general learner, due to the no free lunch theorems.
The no free lunch theorem prohibits any one prior from always being universally accurate or inaccurate, so the arbitrariness of the prior is just a fact of life.
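(Loosely, the version I have in mind is Wolpert's supervised-learning theorem: averaged uniformly over all possible target functions $f$, any two learning algorithms $A_1$ and $A_2$ have the same expected off-training-set error,
$$\sum_f E[\text{error}_{\mathrm{OTS}} \mid f, d, A_1] \;=\; \sum_f E[\text{error}_{\mathrm{OTS}} \mid f, d, A_2]$$
for any training set $d$, so no single prior or learner can dominate across all environments.)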
Replies from: antimonyanthony
↑ comment by Anthony DiGiovanni (antimonyanthony) · 2025-01-29T09:21:34.645Z · LW(p) · GW(p)
mostly problems with logical omniscience not being satisfied
I'm not sure, given the "Indeterminate priors" section. But assuming that's true, what implication are you drawing from that? (The indeterminacy for us doesn't go away just because we think logically omniscient agents wouldn't have this indeterminacy.)
the arbitrariness of the prior is just a fact of life
The arbitrariness of a precise prior is a fact of life. This doesn't imply we shouldn't reduce this arbitrariness [LW · GW] by having indeterminate priors.
Replies from: sharmake-farah
↑ comment by Noosphere89 (sharmake-farah) · 2025-01-29T22:01:39.325Z · LW(p) · GW(p)
I'm not sure, given the "Indeterminate priors" section. But assuming that's true, what implication are you drawing from that? (The indeterminacy for us doesn't go away just because we think logically omniscient agents wouldn't have this indeterminacy.)
In one sense, the implication is that for an ideal reasoner, you can always give a probability to every event.
You are correct that the indeterminacy for us wouldn't go away.
The arbitrariness of a precise prior is a fact of life. This doesn't imply we shouldn't reduce this arbitrariness [LW · GW] by having indeterminate priors.
Perhaps.
I'd expect that we can still extend a no free lunch style argument such that the choice of indeterminate prior is arbitrary if we want to learn in the maximally general case, but I admit no such theorem is known, and maybe imprecise priors do avoid such a theorem.
I'm not saying indeterminate priors are bad, but rather that they probably aren't magical.
comment by romeostevensit · 2025-01-28T01:23:11.682Z · LW(p) · GW(p)
Thank you for writing this. A couple of shorthands I keep in my head for aspects of this:
My confidence interval ranges across the sign flip.
Due to the waluigi effect, I don't know if the outcomes I care about are sensitive to the dimension I'm varying my credence along.