Deference and Decision-Making

post by ben_levinstein (benlev), Daniel Herrmann (Whispermute), Aydin Mohseni (aydin-mohseni) · 2025-01-27T22:02:17.578Z · LW · GW · 0 comments

Contents

  Two Types of Expert Trust
  Why Existing Theories Break Down
  Deference As Decision
    Why Thinking about Decisions Works Better
  An Epistemic Connection
  From Decisions to Deference Principles
  Why This Matters
  Looking Ahead

You're talking with a leading AI researcher about timelines to AGI. After walking through various considerations, they tell you: "Looking at the current pace of development, I'd put the probability of AGI within 3 years at about 70%. But here's the thing—I know several really sharp researchers who think that timeline is too aggressive. And given how complex and fast-moving this field is, I have to admit there's a real chance—maybe 20%—that they've got better judgment than I do on this."

What should you make of this? The researcher clearly knows far more than you about AI development. But they're also expressing real uncertainty about their own expertise. Should this make you trust them less?

This gets at a puzzle about expert judgment that turns out to have implications well beyond AI: 

When should we defer to experts who doubt themselves? 

The standard philosophical theories suggest, somewhat surprisingly, that we can't rationally defer to such experts at all, at least not directly or in any clean way. But this seems wrong—surely we can learn from experts even when they're uncertain about their own expertise!
 

This post is the first in a series based on this paper by Kevin Dorst, Ben Levinstein, Bernhard Salow and others. Most of the technical details are omitted in the first two posts, but the third will have a technical overview. Much more technical detail can be found in the paper.

This first post deals with deference to epistemically modest experts in a sense developed below. It might seem overly philosophical, but as we'll see the principle developed will connect nicely to more local forms of deference. We also think some of the concepts developed here are connected to some important principal/agent problems in AI alignment[1], which we will explore in future posts.

Two Types of Expert Trust

First let’s distinguish two different levels of trust we can have in experts:

  1. Global deference: Trusting someone's opinions about everything they have opinions about. Not just on a single question, but on every proposition or estimate they might make. This perhaps makes sense for some things. For example, you might think there's some "true distribution," and if you learned what that is, you should just listen to it. Objective chances (if such there be [LW · GW]) are commonly thought to play that role: if you learn the chance of some event, that should determine your own credence in the event. When you globally defer to the objective chances, you're trusting their probabilities for any event that might occur, not just single coin flips, but weather, the economy, conditional probabilities, everything. Another example, for perfect Bayesians, is deferring to your own future credences. If you're certain your credence tomorrow in some proposition is $q$, and you're sure you will only update by conditioning on your total evidence without forgetting anything, then your credence right now in that proposition will be $q$.[2] (A small numerical sketch of this appears just after the list.)
  2. Local deference: Trusting someone on specific questions while not listening to them much on others. This is how we usually trust human experts. You trust the weather forecaster about tomorrow's weather but not about whether you need new tires. You trust your doctor about your treatment but not about your career choices.
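
To make the future-credence case concrete, here is a minimal sketch. The numbers and signal names ("rain", "dark_clouds", etc.) are our own illustrative choices, not from the paper: an agent who will condition on one of two weather signals tomorrow, and whose current credence in rain therefore already equals their expectation of tomorrow's credence.

```python
# A toy sketch (our own numbers, not from the paper) of deferring to your
# future credences. The agent will update by conditioning on which weather
# signal they observe tomorrow, so today's credence in rain equals the
# expectation of tomorrow's credence.

# Joint prior over (weather, signal) pairs -- purely illustrative values.
prior = {("rain", "dark_clouds"): 0.30, ("rain", "clear_sky"): 0.10,
         ("dry",  "dark_clouds"): 0.15, ("dry",  "clear_sky"): 0.45}

signals = {"dark_clouds", "clear_sky"}

def p_signal(s):
    return sum(p for (w, sig), p in prior.items() if sig == s)

def p_rain_given(s):
    return sum(p for (w, sig), p in prior.items() if w == "rain" and sig == s) / p_signal(s)

current_credence = sum(p for (w, s), p in prior.items() if w == "rain")
expected_future_credence = sum(p_signal(s) * p_rain_given(s) for s in signals)

print(current_credence, expected_future_credence)  # both 0.4 (up to float rounding)
```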

Most real expert deference is local, and that's just what we'd expect! We don't need experts to be smarter than us about everything to learn from their domain expertise.

Why Existing Theories Break Down

The standard philosophical theories of expert deference are built around what are called "Reflection principles." The basic idea sounds reasonable: if you consider someone an expert, you should adopt their opinions as your own. If they're 70% confident in X, you should become 70% confident in X upon learning that's their view.

This is obviously a strong form of deference. But it makes sense in a lot of local cases: I reflect my weather app’s forecast, many betting market probabilities, and the probabilities of outcomes in a game of chance like Roulette.

But there's trouble for modest experts like our AI researcher. Let's see how this plays out. Recall, they're:

  1. About 70% confident that AGI arrives within 3 years.
  2. Only about 80% confident that their own judgment is the one to trust here[3], leaving roughly a 20% chance that the more skeptical researchers have the better judgment.

You cannot globally reflect such an expert. Why? Intuitively, what goes wrong is that when you reflect someone, you’re supposing that they are in fact the expert without any doubt, but they themselves are unsure they’re the expert.

To see how this ultimately conflicts with the laws of probability, suppose Alice is unsure whether Bob or Carol is the expert. Bob assigns probability .8 that he’s the expert but also thinks there’s a .2 chance Carol is. What should Alice’s probability be that Bob’s the expert on the supposition that Bob’s the expert? Probability tells us that, conditional on Bob being the expert, our credence in his expertise should be 1. However, if we are certain that Bob is the expert and reflect him, then we must adopt all of his opinions as our own. Since Bob himself isn’t certain of his expertise, reflection entails we can't be certain of his expertise either.

More explicitly, if $P_A$ represents Alice's credences, $P_B$ represents Bob's, and $E$ represents the expert's credences (whatever they are), then we can represent the event that Bob is the expert by $[E = P_B]$. By Reflection, we should have $P_A([E = P_B] \mid [E = P_B]) = P_B([E = P_B]) = 0.8$, but obviously the laws of probability dictate that $P_A([E = P_B] \mid [E = P_B]) = 1$.[4]

So, even in idealized cases, we need a better theory that allows learning from experts even when – perhaps especially when – they are uncertain about their own expertise.

Deference As Decision

To understand how to defer more generally, we switch to a more pragmatic orientation. Instead of thinking about deference in terms of copying someone's beliefs, we should think about it in terms of decisions. You defer to someone when you'd rather have them make decisions on your behalf (assuming their interests are aligned with yours).[5]

Take our AI researcher. Even if they're uncertain about their own judgment, you might still prefer to use their probabilities rather than your own when making decisions about AI-related matters. If you had to bet on AGI timelines, you might reasonably prefer to bet based on their 70% estimate rather than your own less-informed view.

Put differently, you treat someone as an expert if you would always (weakly) prefer to use their probabilities, combined with your utilities, to make decisions rather than using your own probabilities. (Of course, you would like even more to use your own probabilities conditioned on learning what their probabilities are. But given just the choice between using their unknown-to-you probabilities or your own, you'd go with theirs.)
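
Here is a minimal sketch of that decision-theoretic picture. Everything in it is illustrative rather than taken from the paper: a two-world toy model, an agent who is 50/50, an expert whose credences depend on which world is actual (and who is modest in each world), and a few simple bets. By the agent's own lights, letting the expert choose the bet does better than choosing it themselves.

```python
# A minimal sketch (illustrative names and numbers, not from the paper) of
# preferring to outsource a decision to a modest expert.

worlds = ["w1", "w2"]
agent = {"w1": 0.5, "w2": 0.5}                    # your credences
expert = {"w1": {"w1": 0.9, "w2": 0.1},           # expert's credences if w1 is actual
          "w2": {"w1": 0.2, "w2": 0.8}}           # expert's credences if w2 is actual

# Acts with world-dependent utilities, shared by you and the expert.
acts = {"bet_on_w1": {"w1": 1, "w2": -1},
        "bet_on_w2": {"w1": -1, "w2": 1},
        "abstain":   {"w1": 0, "w2": 0}}

def expected_utility(credence, act):
    return sum(credence[w] * acts[act][w] for w in worlds)

def best_act(credence):
    return max(acts, key=lambda a: expected_utility(credence, a))

# Expected utility, by YOUR credences, of deciding yourself...
eu_self = expected_utility(agent, best_act(agent))

# ...versus letting whichever expert is actual decide on your behalf.
eu_outsourced = sum(agent[w] * acts[best_act(expert[w])][w] for w in worlds)

print(f"decide yourself: {eu_self}, outsource: {eu_outsourced}")  # 0.0 vs 1.0
```

Even though each of the expert's possible credence functions is modest (each puts positive probability on the world where it isn't the expert's), the agent still expects to do at least as well by handing over the decision.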

This approach has several advantages, which we spell out in the next section.

Why Thinking about Decisions Works Better

Let's make this more concrete. When you defer to an expert in this way—by preferring to use their probabilities for decision-making—two interesting things happen:

First, this kind of deference turns out to work perfectly well with modest experts. Sure, our AI researcher isn't 100% confident in their own judgment. But that uncertainty is already baked into their probabilities—their 70% estimate already takes into account their doubts about their own expertise.[6] When making decisions about AI timelines, you might still prefer to use their carefully considered but self-doubting judgment over your own less informed one.

Second, this naturally extends to local deference. The question isn't "should I adopt all their opinions?" but rather "for which kinds of decisions would I prefer to use their probabilities?" You might want the AI researcher's probabilities when betting on AI timelines but not when deciding what to eat for lunch. You don't need some grand theory about why it's okay to trust them on some things but not others—it just falls out of asking which decisions you'd want them to make on your behalf (assuming basic preference alignment).

An Epistemic Connection

When you flesh out this "deference as decisions" view mathematically, it turns out to be equivalent to something that sounds quite different.[7] You prefer to use someone's probabilities for decisions if and only if you expect their estimates to be at least as accurate as your own.

In other words, compare the following two attitudes toward an expert:

  1. Pragmatic: for every decision problem (with your utilities), you'd at least weakly prefer to use their probabilities rather than your own.
  2. Epistemic: you expect their estimates to be at least as accurate as your own.

These turn out to be exactly the same thing!

If you prefer using their probabilities for every possible decision, that turns out to be mathematically equivalent to expecting their estimates to be at least as accurate as yours according to any reasonable measure of accuracy.

(This holds both locally and globally. If you are willing to outsource on everything, then that's equivalent to saying that you expect better accuracy when scoring the entire credence function. If you're willing to outsource all bets on whether it rains, then that's equivalent to saying that you expect better accuracy on the question of whether it rains.)[8]

But what makes a measure of accuracy "reasonable"? The key requirement is that it should incentivize honesty—if you're trying to minimize your expected inaccuracy score, your best strategy should be to report your actual probabilities. (Technically, we call such measures "strictly proper scoring rules", which are discussed in some detail here [LW · GW].) The familiar Brier score (squared distance) is one example, but there are many others.

Absolute distance doesn't qualify—it can sometimes incentivize you to report probabilities more extreme than what you really believe. This is why the connection between decision-making and accuracy only holds when we restrict ourselves to proper measures of accuracy.
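
To see the difference concretely, here is a small sketch with our own toy numbers: given a true belief of 0.7 in some event, honest reporting minimizes expected Brier score, while expected absolute distance rewards exaggerating all the way to 1.

```python
# A toy illustration (our own numbers) of proper vs. improper accuracy measures.
# With a true belief of 0.7 in an event, the report minimizing expected Brier
# score is the honest 0.7; the report minimizing expected absolute distance is
# the exaggerated 1.0.

def expected_brier(belief, report):
    # Expected squared distance from the event's indicator (1 if it happens, 0 if not).
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2

def expected_absolute(belief, report):
    return belief * abs(1 - report) + (1 - belief) * abs(report)

belief = 0.7
reports = [i / 100 for i in range(101)]

print(min(reports, key=lambda r: expected_brier(belief, r)))     # 0.7 -> honesty optimal
print(min(reports, key=lambda r: expected_absolute(belief, r)))  # 1.0 -> exaggeration optimal
```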

From Decisions to Deference Principles

This view suggests a new principle for when to defer to experts. Let's contrast it with the standard view:

Reflection says: Your probabilities should exactly match the expert's when you learn what they are. 

In other words, upon learning the expert assigns probability $q$ to event $X$, your probability for $X$ should be $q$. Equivalently, in the global case, we could say that for any random variable $Y$, given that the expert's estimate of $Y$ is $y$, your estimate is also $y$.

But this is too demanding when we have modest experts, as we’ve seen.

Total Trust says something weaker but more useful: When you learn the expert's estimate for something is above some threshold, your estimate should be above it too. 

Formally: for any random variable $Y$ and threshold $t$, if you learn the expert's estimate for $Y$ is at least $t$, your estimate for $Y$ should be at least $t$.

(In the global case, we could also equivalently formulate the principle of Total Trust using "is no more than $t$". In the local case, we need to add that condition explicitly to ensure symmetry.)

The key difference is that Total Trust doesn't require you to exactly match the expert's opinions—just to "move with" them in the right direction. When they're confident in something, you become confident too.[9]
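
As a sanity check, here is a small script (our own sketch) that verifies Total Trust in the two-world example from footnote 9, where Reflection fails: for every event and threshold, conditional on the expert's probability clearing the threshold, the agent's probability clears it too.

```python
# A small verification script (our own sketch) for the two-world example in
# footnote 9. The agent is 50/50; the expert's credences depend on the world
# and each is modest. We check Total Trust: conditional on the expert's
# probability for an event being at least t, the agent's probability is too.

from itertools import product

worlds = ["w1", "w2"]
agent = {"w1": 0.5, "w2": 0.5}
expert = {"w1": {"w1": 0.9, "w2": 0.1},   # expert's credences if w1 is actual
          "w2": {"w1": 0.2, "w2": 0.8}}   # expert's credences if w2 is actual

def prob(credence, event):
    return sum(credence[w] for w in event)

events = [set(), {"w1"}, {"w2"}, {"w1", "w2"}]
thresholds = [i / 20 for i in range(21)]

holds = True
for event, t in product(events, thresholds):
    info = {w for w in worlds if prob(expert[w], event) >= t}  # "expert's prob >= t"
    if prob(agent, info) == 0:
        continue  # conditioning event has probability zero; constraint is vacuous
    if prob(agent, event & info) / prob(agent, info) < t - 1e-9:
        holds = False

print("Total Trust holds:", holds)  # True
# Reflection fails here: conditional on the expert's credences being those of
# w1, the agent is certain of w1, not 0.9-confident as Reflection would demand.
```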

Why This Matters

This new way of thinking about deference—in terms of decisions and estimation rather than copying beliefs—has several important implications:

First, it vindicates our intuitive trust in modest experts. Being uncertain about your own expertise isn't a flaw, nor is it incompatible with Bayesianism or good judgment. Our framework explains why we can rationally defer to such experts even when they express uncertainty about their own abilities.

Second, it helps clarify when local deference makes sense. Rather than needing a story about why it's okay to trust someone on some topics but not others, we can simply ask: "For which kinds of decisions would I prefer to use their probabilities?" This matches how we actually rely on expert judgment in practice.

Finally, the connection to accuracy gives us a test for when deference is warranted and shows how the pragmatic and epistemic notions connect. If you expect someone's estimates to be at least as accurate as your own (by any reasonable measure), you should prefer to use their probabilities for decisions. This gives us a principled way to identify expertise worth deferring to.

Looking Ahead

We've seen how thinking about deference in terms of decisions helps solve some puzzles about expert judgment. It explains how we can rationally trust modest experts, and it provides a natural framework for understanding local deference.

But several important questions remain. 

In the next post, we'll focus on local deference—how can we formally characterize what it means to trust someone about some questions but not others? This turns out to have surprising implications even for experts who aren't modest at all. And it's more applicable to most types of situations where we want to defer, since global experts are rare.

For readers interested in the mathematical details behind these ideas, our final post will explore the formal framework and technical results, showing exactly how these principles work and what they require.

  1. ^

    For another post connecting epistemic deference to alignment, see this one [LW · GW] by @abramdemski [LW · GW].

  2. ^

    This is the Reflection Principle referenced below. There might be some exceptions for so-called self-locating propositions. Reflection is closely related to conservation of expected evidence [LW · GW].

  3. ^

    "Right" here is tricky to think about it. The most straightforward interpretation is that they're 80% confident that they're the expert here instead of some other researcher. But what does that mean? Here's an operationalized interpretation. Suppose Alice will use the credences of whoever the smartest researcher is to make some decision. She doesn't know who that is yet, and neither does this particular AI researcher, but the researcher is 80% confident it's her. (See below for more on outsourcing decision-making to the expert.)

  4. ^

    You might worry there's some funny business with these "higher-order" probabilities, with some unholy sort of typing and semantics, but the semantics we use in the paper treats propositions like $[E = P_B]$ as an ordinary set of points in the sample space, with $E$ being a random object.

  5. ^

    Generalizing this is where the connection to alignment comes in. For this post, we're asking when it's worth outsourcing decision-making to someone who has the same basic preferences as you but who you think has more accurate credences. The broader question is when you're willing to outsource decision making even if there's some misalignment of preferences, the decider doesn't have preferences in any natural sense, or can't reasonably be represented as anything like an EU-maximizing agent. Stay tuned for a later post on this more general question. 

  6. ^

    How a modest expert incorporates doubts about her own expertise into her judgment is a subtle matter, although it is compatible with Bayesianism. Suppose such an expert is someone you can globally defer to (while being Bayesian yourself). So you are willing to outsource all your decision making to this person despite the self-doubt. As it turns out (roughly), her own probabilities have to be a mixture of those of all other possible experts and her own judgment conditional on the statement that she's the expert. For example, if Bob, Carol, and Dave are possible experts that can be deferred to, then Bob's credences are a mixture of Carol's credences, Dave's credences, and then what he would think if he were to learn he was actually the expert. See Theorem 4.1 of the paper.

  7. ^

    Important caveat: many of the results we have are known to hold only when we assume the underlying probability space is finite. We don't have good generalizations yet for the infinite case.

  8. ^

    See Theorem 3.2 of this paper. Also the main result from here. All the proofs are a bit janky, but for a somewhat cleaner one, see this.

  9. ^

    Here's a simple example where they come apart. There are two worlds, $w_1$ and $w_2$, and the agent $P$ assigns credence .5 to each. The expert at $w_1$ assigns credence .9 to $w_1$ and credence .1 to $w_2$. The expert at $w_2$ assigns credence .8 to $w_2$ and credence .2 to $w_1$. Because each expert is modest (assigning positive credence to worlds where it's not the expert), $P$ can't reflect the expert. But $P$ does totally trust the expert.
