Reasons-based choice and cluelessness

post by JesseClifton · 2025-02-07T22:21:47.232Z

Contents

  Preliminaries
  Reasons for belief
  Explicitly reasons-based choice
  Cluelessness
  Acknowledgements

Crossposted from my Substack.

Rational choice theory is commonly thought of as being about what to do in light of our beliefs and preferences. But our beliefs and preferences come from somewhere. I would say that we believe and prefer things for reasons. My evidence gives me reason to believe I am presently in an airport. Any reason I have to think a coin will land heads applies to tails, too, so I should have the same degree of belief in each. The suffering I expect to be averted by donating to charity gives me a reason to donate.

I think reflecting more closely on how our reasons enter into our decision-making can open up new directions for the theory of decision-making. It is particularly relevant, perhaps, for bounded agents who cannot generally act on the basis of expected utility calculations, but can still reflect on and weigh up reasons.

In this post, I'll sketch what it looks like to base our beliefs and choices explicitly on reasons, and then consider what this perspective has to say about cluelessness regarding the long-run effects of our actions.

Preliminaries

An informal definition of "reason" that's commonly referenced in the philosophy literature is "a consideration that counts in favor of [having some belief or taking some action]"... For example, if I say "I pulled the child out of the pond because I give more weight to the welfare they'll get from living than to the inconvenience to me", I'm treating (my beliefs about) the child's welfare and the inconvenience of saving them as reasons to take one action or another.[1]

All of the reasons that actually motivate your decision might not be consciously available to you. But I'm going to focus on agents who strive to make decisions by explicitly specifying and weighing up reasons. So, when I say that some consideration r "is a reason for" some agent, I mean that they are deliberately taking r to be an input into their decision-making. This is why I say I'm writing about "explicitly" reasons-based choice.

I think this is a particularly attractive perspective on bounded rationality. We may or may not think that we can have "expected values", especially "expected values for total welfare across the cosmos given our actions", in any action-guiding sense. But we can at least strive to be clear-eyed about the reasons that go into our decision-making and how we're weighing them up.

Reasons for belief

What does it look like to base our beliefs on reasons?

For a certain kind of idealized "objective" Bayesian, the reasons for adopting some belief might be reasons for adopting a particular prior (e.g., the principle of indifference) and their evidence. They combine these reasons to get a belief corresponding to that prior conditionalized on their evidence. For an idealized subjective Bayesian, it may instead be enough that they happen to like some prior.
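As a toy illustration of this objective-Bayesian recipe (not from the original post), here is a minimal sketch: an indifference prior over a made-up two-hypothesis space, conditionalized on evidence via invented likelihoods.

```python
# Toy sketch of "indifference prior + conditionalize on evidence".
# The hypotheses and likelihoods are invented for illustration.

hypotheses = ["war ends within a year", "war continues"]
prior = {h: 1 / len(hypotheses) for h in hypotheses}  # principle of indifference

# Likelihood of the observed evidence under each hypothesis (made up).
likelihood = {"war ends within a year": 0.3, "war continues": 0.6}

# Bayes: posterior is proportional to prior * likelihood.
unnormalized = {h: prior[h] * likelihood[h] for h in hypotheses}
z = sum(unnormalized.values())
posterior = {h: p / z for h, p in unnormalized.items()}

print(posterior)  # {'war ends within a year': 0.333..., 'war continues': 0.666...}
```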

Bounded agents like us generally can't explicitly do the whole Bayesian thing. Our reasons for belief also consist of our evidence, but the evidence we can point to is much fuzzier than that of an ideal Bayesian. We certainly can't specify an exhaustive set of possible worlds to assign probabilities to, in part because many possibilities haven't even occurred to us.

Still, when we ask ourselves for our credences about some question — say, whether the war in Ukraine will have ended a year from now — some number might appear to our consciousness. Maybe it is 0.62. Does "0.62 appeared to my consciousness when I asked myself for my credence in this hypothesis" constitute a strong reason to adopt that as my belief? For some yes, for others no. Some may think that no other reason is needed to justify this belief than that it's their subjective feeling. Others might want more.

I'm in the camp that wants much more reason for adopting some belief than the mere fact that it appeared to my consciousness. (This goes not just for numerical probabilities, but qualitative beliefs like "AI doom is more likely than not" or "I expect donating to the AMF to lead to greater welfare than doing nothing".) While I can't usually use formal criteria like the principle of indifference, I can be guided in the formation of my beliefs by qualitative principles [LW · GW]. When deciding whether to wander around blindly in the road [EA(p) · GW(p)], I can think:

I have a model of the world that says that walking blindly in the road typically incurs a high risk of severe injury [principle: this belief fits the evidence], and I expect the mechanisms posited by this model not to have suddenly changed [principle: Occam's razor says I should give low weight to these mechanisms suddenly changing for no reason], so I shouldn't walk in the road.

(Of course, I don't need to go through this kind of thought process every time I do anything. This general pattern makes me feel justified in deferring to my instincts in mundane situations, at least as far as my self-interest is concerned.)

Then take a hypothesis like "misaligned AI will kill everyone". How far can qualitative principles take us? First, is there a mechanistic model of AI doom over which we can get a principled (if somewhat vague) prior? There is, for example, the counting argument for scheming, which says "most goals that are compatible with high training performance lead to scheming, so by a principle of indifference, we'll probably get a schemer". That seems reasonable on its face. But I have little idea how to put a particular indifference prior over the space of goals learned by AI training (over what specific space, for starters?), and I have little idea how to update on the empirical evidence of LLM performance thus far. (Cf. this post [LW · GW] and comments.)

What about principles that say to defer to better reasoners? Well, Eliezer Yudkowsky et al. are really smart, and their p(Doom) is very high. Other really smart people like Paul Christiano have lower numbers. I trust Christiano's reasoning a bit more, but how, precisely, should I weigh these respective views? And then there are the superforecasters [EA · GW] whose average p(Doom) is much smaller still, like 1%. Looking at the quality of their arguments, I agree with Joe Carlsmith in the linked post that we shouldn't give a ton of weight to this. But also, superforecasters are supposed to be pretty good at forecasting. Should I really give zero weight to their views? And if not, how much should I give?

And so on.
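One concrete (and purely hypothetical) way to cash out "giving some weight" to each of these views is a linear pool, i.e. a weighted average of the forecasts. The sketch below uses invented numbers for both the forecasts and the weights; as the surrounding discussion suggests, nothing here tells us where the weights themselves should come from.

```python
# Minimal sketch of linear pooling of p(Doom) estimates. The forecasts and
# weights are invented; the hard part the post is pointing at is precisely
# that nothing principled pins the weights down.

forecasts = {"Yudkowsky-ish": 0.95, "Christiano-ish": 0.25, "superforecasters": 0.01}
weights = {"Yudkowsky-ish": 0.2, "Christiano-ish": 0.5, "superforecasters": 0.3}

pooled = sum(weights[k] * forecasts[k] for k in forecasts)
print(round(pooled, 3))  # 0.318 with these made-up weights
```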

Personally, if pressed to give a quantitative expression of my belief in AI doom I might say "maybe 10-90% [AF · GW]". But a more honest description of my epistemic state would be to gesture at this heap of considerations, to gesture at the list of qualitative epistemic principles that I like, and shrug. (Perhaps this is what people mean when they talk about certain estimates being "made-up" [EA(p) · GW(p)] or "speculative" [EA · GW] or say that they "don't know" what "the" probability of some far-future outcome is.)

Again, for many people, it may be enough to look at all this and say: "The fact that I looked at all these considerations and I asked my brain for a number and it spat out 0.78 is enough reason for me to adopt that as my belief." My main purpose isn't to argue for one perspective. The goal is to illustrate the notion of "reasons for belief" and how attending to our reasons for belief may complicate our picture of belief formation.

In any case, we still have to do something. And when we do something we will be giving some weight, implicitly or explicitly, to the possibility of AI doom. I'll say more about that shortly.

Explicitly reasons-based choice

Now we've got our reasons for belief. These will be inputs into our overall reasons for action, which we'll then weigh up to get a decision.

A simple approach to reasons-based choice is an additive weighing model. We attach a weight w(r) to each of the reasons r that is true of action a, and add them up to get an overall score: Score(a) = Σ_{r true of a} w(r).

For example: consider an idealized Bayesian utilitarian. Their reasons for belief yield a probability distribution over worlds given different actions. Their basic reasons for action — the reasons r above — are then of the form:

My credence that moral patient i has welfare level u at time t, given action a, is P_E(u_{i,t} = u | a) (for short, r_{i,t,u,a}).

(I've written the "E" in "P_E" to make it really clear that the agent's probabilities are derived from their reasons for belief.)

Each such reason is then given weight w(r_{i,t,u,a}) = u · P_E(u_{i,t} = u | a) and added up:

Score(a) = Σ_{i,t,u} u · P_E(u_{i,t} = u | a), i.e., the expected total welfare given a.
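To make the additive weighing model and its Bayesian-utilitarian special case concrete, here is a minimal Python sketch (not from the original post). The function names, patients, welfare levels, and credences are all invented for illustration.

```python
# Minimal sketch of the additive weighing model: Score(a) = sum of w(r)
# over the reasons r counting for/against action a. All numbers invented.

def score(reasons_with_weights):
    """Add up the weights attached to the reasons bearing on an action."""
    return sum(weight for _reason, weight in reasons_with_weights)

# Special case: a (tiny) Bayesian utilitarian. Each "reason" is a credence
# that a moral patient ends up at some welfare level given the action,
# and its weight is welfare_level * credence.
def expected_welfare_reasons(credences):
    """credences: list of (patient, welfare_level, probability_given_action)."""
    return [
        (f"P({patient} at welfare {welfare}) = {prob}", welfare * prob)
        for patient, welfare, prob in credences
    ]

donate = expected_welfare_reasons([("child A", 10, 0.9), ("me", -1, 1.0)])
do_nothing = expected_welfare_reasons([("child A", 10, 0.2), ("me", 0, 1.0)])

print(score(donate), score(do_nothing))  # 8.0 vs 2.0 -> donating scores higher
```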

Again, as bounded agents, we can't do this. We can, at most, assign probabilities to the welfare of coarse groups of moral patients (potential malaria victims, farmed chickens, far-future digital people, ...). For most EAs, most of the time, this won't be explicitly quantitative; it will instead happen at a largely intuitive level. But we could take judgements like "looking at all my (admittedly crude, fuzzy) reasons, the expected total welfare of donating to AMF is higher than that of doing nothing" to be primitive aspects of our epistemic state. Nonetheless, I think that reflecting on the relationship between our reasons and such expectations makes it less clear that we ought to be guided by the latter.

Cluelessness

In his discussion of cluelessness, Mogensen looks at two charities aimed at doing good in the short term: the Against Malaria Foundation (AMF) and the Make-A-Wish Foundation (MAWF). He points out that the net long-run welfare implications of donating to one or the other are extremely unclear. For example, it's unclear what effect saving a child from malaria has on population growth. There are reasons to think it contributes to population growth, and reasons to think it contributes to population decline. And beyond that, there are many reasons to think changes in population growth are good, and many to think they are bad. For example, population growth affects economic growth, technological progress, and resource scarcity, all of which have many potential consequences for long-term welfare (e.g., via the probability of existential catastrophe).

Let's collect all of these considerations bearing on the flow-through effects of donating to AMF vs. MAWF, as well as any relevant epistemic principles, into a set of reasons called R_flowthrough. There are also our reasons related to the direct effects, R_direct, containing GiveWell's research etc. Let's say these favor AMF. We'll make our decision by attaching weights to each of these and adding them up to get a net score in favor of AMF:

Score(AMF over MAWF) = w(R_direct) + w(R_flowthrough).

And how do we go about this? Three options stand out to me.

Option 1 is the "maximize expected value" option. We probably don't want to try to write down probabilities and utilities for all of the potential outcomes to get a numerical expected value. But, as discussed above, we might try to form a comparative judgement like, "The expected flow-through effects from giving to the AMF are better than those of MAWF. So I should donate to the AMF."

Option 2 is to specify imprecise expected values. We say, "There's no way that these considerations pin down precise probabilities or expected utilities. It's only rational for me to have highly imprecise expected utilities, so that the difference in my expectations is a wide interval around 0:

{ E_P[total welfare | AMF] − E_P[total welfare | MAWF] : P among the precise distributions representing my beliefs } = [large negative number, large positive number]."

A natural decision rule for this setting is the "maximality" rule, which says to prefer AMF over MAWF if AMF has better expected value under every one of the precise probability distributions that together represent my beliefs. So, we'd need the smallest number in this interval to be positive. As it stands, the interval contains both positive and negative numbers, so the maximality rule says it's indeterminate which action should be preferred. In fact, this is Mogensen's argument in the linked paper: we ought to have highly imprecise expectations for these effects; the maximality rule is plausible; therefore it's plausible that we're clueless about whether to give to AMF or MAWF.
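Here is a minimal sketch of the maximality rule, under the simplifying assumption that each admissible precise distribution can be summarized by a single expected-value difference; the function name and numbers are invented.

```python
# Minimal sketch of the maximality rule for two options. We summarize each
# admissible precise distribution P by the expected-value difference
# E_P[welfare | AMF] - E_P[welfare | MAWF]. All numbers are invented.

def maximality_verdict(ev_differences):
    """Return a verdict given the set of expected-value differences."""
    if all(d > 0 for d in ev_differences):
        return "AMF determinately preferred"
    if all(d < 0 for d in ev_differences):
        return "MAWF determinately preferred"
    return "indeterminate: neither option is better under every distribution"

# A wide interval around zero, as in the cluelessness case:
print(maximality_verdict([-1e6, -10.0, 0.5, 1e7]))   # indeterminate
# If every distribution favored AMF, we'd get a determinate preference:
print(maximality_verdict([0.3, 5.0, 120.0]))          # AMF determinately preferred
```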

I like Option 2 better than Option 1, because it at least holds that our beliefs ought to be severely indeterminate [LW · GW], which seems like the appropriate response.

But, you might think that we should still be able to get a determinate preference from imprecise expectations. For example, maybe you think we should take a uniform average of our imprecise expectations, to get precise scores for each action. (I'm somewhat sympathetic to a move like this, for reasons beyond the scope of this post.) Then we can just take the action with the higher score — no cluelessness?

I'm afraid this doesn't get you very far, because I don't think our reasons pin down particular intervals of numbers, either. Our beliefs about this difference in expectations are much more like [Vague negative number, Vague positive number] than any definite interval. What's the average of that?

So, I think we should consider Option 3. Option 3 says: give zero weight to the reasons we don't know how to weigh in a principled way, and decide on the basis of the ones we do.

In our example, the weights we attach to the reasons in R_direct, given by the near-term consequences, are much more grounded in principles we endorse than any weights we could assign to reasons in R_flowthrough. (For example, the way we weigh things up here feels a lot closer to the ideal of "derive a prior over world-models from principles I endorse and conditionalize on my evidence".) And it seems kind of reasonable to say that we should set the weight of {all the reasons for which we have qualitatively less principled ways of setting weights} to zero, if these point in different directions. (I suspect this is a good description of many people's intuitions when they favor "near-termism" despite being sympathetic to axiological strong longtermism. Cf. this post.)

(Of course, there isn't always going to be a clear boundary between reasons we do and don't know how to weigh up. But there's going to be some arbitrariness no matter what. In Options 1 and 2, there will be arbitrariness in the exact precise or imprecise probabilities we specify. And you might still think that Option 3 has the best balance of avoiding arbitrariness and respecting other intuitions.)

I emphasize that this isn't the same as saying I have expected values for long-term welfare given each intervention which precisely cancel out! At least not in any sense I care about. I.e., if I adopt Option 3, I am not saying "my best guess is that the expected flow-through effects from AMF and MAWF are exactly the same". That isn't my best guess, I don't have a best guess, I only have a pile of considerations that leave me with no idea. Instead, our weights have to come from somewhere besides best guesses, and the train of thought above leads to a weight of zero. (You might say, "Well, you're still acting as if you had expectations that precisely cancel out". But I don't care what I'm acting as if I'm doing. I care about the justifications for the actual process by which I derive my decision.[2])
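Here is a minimal sketch of how Option 3 might be operationalized, under the (hypothetical) assumption that our reasons have already been partitioned into a set we can weigh in a principled way and a set we can't; the function name and all numbers are invented.

```python
# Minimal sketch of Option 3. Reason sets and weights are invented for
# illustration; the "unprincipled" set gets weight zero whenever its
# members point in conflicting directions.

def option3_score(principled_weights, unprincipled_directions):
    """Sum the principled weights; give the unprincipled set weight zero
    whenever its members point in conflicting directions."""
    conflicting = any(d > 0 for d in unprincipled_directions) and any(
        d < 0 for d in unprincipled_directions
    )
    unprincipled_weight = 0.0 if conflicting else sum(unprincipled_directions)
    return sum(principled_weights) + unprincipled_weight

# R_direct (GiveWell-style evidence, weighable) favors AMF; R_flowthrough
# (population effects, x-risk effects, ...) points in both directions.
amf_vs_mawf = option3_score(
    principled_weights=[+3.0],                          # net direct-effects case for AMF
    unprincipled_directions=[+1.0, -1.0, +0.5, -2.0],   # flow-through considerations
)
print(amf_vs_mawf)  # 3.0 -> decide on the near-term reasons alone
```

Note that in this sketch, adding one more modest reason in favor of AMF to the flow-through set would leave it conflicting, and hence still zero-weighted, which matches the "insensitivity to mild sweetening" point in footnote 2.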

This will probably seem like a strange way to reason to many people who are used to thinking about doing good in terms of maximizing expected total welfare. But it isn't so strange if you take the more normatively fundamental procedure to be the weighing of reasons given by our beliefs about our effects on the welfare of potential moral patients. From that point of view, we might see expected utility maximization as a very special case of reasons-weighing, available only to agents who are capable of assigning weights corresponding to well-grounded expected values.

Anyway, I'm not fully convinced of Option 3. But I do want to flag it as an option that isn't irrational on its face, and is consistent with the ideal of maximizing total welfare while (I'd guess) respecting many people's intuitions about how to reason as bounded agents.

Either way, a broader point is: Once we begin to be explicit about our reasons for belief, and don't take for granted that we must always strive to distill these into expected values, we can allow ourselves to explore different ways of respecting our reasons. And that might lead us to more satisfying ways of being boundedly rational.

Acknowledgements

Thanks to Anthony DiGiovanni, Anni Leskelä, Sylvester Kollin, Martín Soto, and Michael St. Jules for comments and discussion.


  1. I've tried to keep the rest of the discussion pretty informal and non-academic. But philosophers have written a lot about reasons. See the SEP article on reasons for action, for example. I found these two papers particularly helpful for understanding the reasons-based perspective and for the formal frameworks they present. The papers give representation theorems for reasons-based choice and belief, respectively. ↩︎

  2. Plus, this approach probably won't actually be formally equivalent to setting the expected total welfare associated with R_flowthrough to 0. This is because even if we find a new reason r+ (say) in favor of AMF, we probably still sometimes want to set the weight of R_flowthrough ∪ {r+} to 0! I.e., we probably want insensitivity to mild sweetening [LW · GW], which can't be captured by updating precise probabilities. ↩︎
