[Intuitive self-models] 8. Rooting Out Free Will Intuitions

post by Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · LW · GW · 1 comment

Contents

  8.1 Post summary / Table of contents
  8.2 Recurring series theme: Intuitive self-models have less relation to motivation than you’d think
  8.3 …However, the intuitive self-model can impact motivation via associations
  8.4 How should we think about motivation?
    8.4.1 The framework I’m rejecting
    8.4.2 My framework: valence, associations, and brainstorming
  8.5 Six worked examples
    8.5.1 Example 1: Implicit (non-self-reflective) desire
    8.5.2 Example 2: Explicit (self-reflective) desire
    8.5.3 Example 3: Akrasia
    8.5.4 Example 4: Fighting akrasia with attention control
    8.5.5 Example 5: The homunculus’s monopoly on sophisticated brainstorming and planning
    8.5.6 Example 6: Willpower
      8.5.6.1 Aside: The “innate drive to minimize voluntary attention control”
      8.5.6.2 Back to Example 6
  8.6 Conclusion of the series
    8.6.1 Bonus: How is this series related to my job description as an Artificial General Intelligence safety and alignment researcher?

8.1 Post summary / Table of contents

This is the final post of the Intuitive Self-Models series [? · GW].

One-paragraph tl;dr: This post is, in a sense, the flip side of Post 3 [LW · GW]. Post 3 centered around the suite of intuitions related to free will. What are these intuitions? How did these intuitions wind up in my brain, even when they have (I argue) precious little relation to real psychology or neuroscience? But Post 3 left a critical question unaddressed: If free-will-related intuitions are the wrong way to think about the everyday psychology of motivation—desires, urges, akrasia, willpower, self-control, and more—then what’s the right way to think about all those things? In this post, I offer a framework to fill that gap.

Slightly longer intro and summary: Back in Post 3 [LW · GW], I argued that the way we conceptualize free will, agency, desires, and decisions in the “Conventional Intuitive Self-Model” (§3.2 [LW · GW]) bears little relation to what’s actually happening in the brain. For example, for reasons explained in detail in §3.3 [LW · GW] (brief recap in §8.4.1 below), “true” desires are conceptualized in a way that makes them incompatible with having any upstream cause. And thus:

Of course, all those things are true. And there’s nothing mysterious about that from a neuroscience perspective. The only problem is how it feels—it’s those pesky free-will-related intuitions, which as I explained in Post 3 [LW · GW], are both deeply entrenched (strong prior [LW · GW]) and deeply misleading (not a veridical [LW · GW] model of anything at all, see §3.3.3 [LW · GW] and §3.6 [LW · GW]).

Now, I expect that most people reading this are scoffing right now that they long ago moved past their childhood state of confusion about free will. Isn’t this “Physicalism 101” stuff? Didn’t Eliezer Yudkowsky describe free will as [? · GW] “about as easy as a philosophical problem in reductionism can get, while still appearing ‘impossible’ to at least some philosophers”? …Indeed, didn’t we already wrap up this whole topic, within this very series, way back in §3.3.6 [LW · GW]??

But—don’t be so sure that you’ve really moved past it! I claim that the suite of intuitions related to free will has spread its tentacles into every corner of how we think and talk about motivation, desires, akrasia, willpower, self, and more. If you can explain how it’s possible to “freely make decisions” even while the brain algorithm is inexorably cranking forward under the hood, then good for you, that’s a great start. (If not, see §3.3.6 [LW · GW]!) But if you’re “applying willpower to fight laziness”, then what is actually happening in your brain? …And, y’know, if free-will-related intuitions generate such confusion in other areas, then isn’t it concerning how much you’re still relying on those same intuitions when you try to think through this question?

Thus, if we want a good physicalist account (§1.6 [LW · GW]) of everyday psychology in general, and motivation in particular, we need to root out all those misleading free-will-related intuitions, and replace them with a better way to think about what’s going on. And that’s my main goal in this post.

The rest of the post is organized as follows:

In Section 8.2, I revisit a recurring theme of this series: intuitive self-models have less relation to motivation than you’d think.

In Section 8.3, I add the caveat that intuitive self-models can nevertheless impact motivation via associations.

In Section 8.4, I spell out the free-will-flavored framework for motivation that I’m rejecting, and the framework I’m proposing instead, built from valence, associations, and brainstorming.

In Section 8.5, I work through six progressively more involved examples: implicit and explicit desires, akrasia, attention control, sophisticated planning, and willpower.

Finally, in Section 8.6, I conclude the whole series, including five reasons that this series is relevant to my job description as an Artificial General Intelligence safety and alignment researcher.

8.2 Recurring series theme: Intuitive self-models have less relation to motivation than you’d think

One of the things I’m trying to do in this series is to de-exoticize the motivations of people with unusual intuitive self-models. We’ve seen this over and over:

In each of these cases, the exotic misconceptions strike many people as more intuitive and plausible than the banal reality. That was certainly the case for me, before I wrote this series!

The problem in all of these cases is that we’re trying to think about what’s happening through the lens of the Conventional Intuitive Self-Model (§3.2 [LW · GW]) and its deeply confused conceptualization of how motivation works. In particular, these intuitions suggest that desires have their root cause in the “homunculus” (a more specific aspect of the “self”), and its “wanting” (§3.3.4 [LW · GW]), with no possible upstream cause prior to that (§3.3.6 [LW · GW]). So when an intuitive self-model deeply changes the nature of the homunculus concept (or jettisons it altogether), we by default mistakenly imagine that the desires get deeply changed (or jettisoned) at the same time.

(Top part is copied from §3.5.3 [LW · GW].) Within the Conventional Intuitive Self-Model (CISM), the desire for justice seems to originate as a property of the homunculus, with no possible upstream cause (§3.3.6 [LW · GW]). Therefore, if the homunculus concept gets thrown away entirely (bottom), as it does in awakening / PNSE (§6.3 [LW · GW]), then it would seem, from the CISM perspective, to imply that justice-seeking behavior would go away as well. In reality, the CISM is wrong; there is an upstream cause of justice-seeking behavior (top-right), and that’s why such behaviors can remain in PNSE (although there can be changes on the margin, see below).

8.3 …However, the intuitive self-model can impact motivation via associations

The previous section was an argument against the common intuition that intuitive self-models have an overwhelming, foundational impact on motivations. But I want to be clear that they do have some impact. Specifically, they have an impact on motivation via associations between concepts.

As background, motivation comes from valence [LW · GW], and valence in turn is a function on “thoughts” [LW · GW].

So for example, maybe the idea of driving to the beach pops into my head, and that idea is positive valence. But then that makes me think of the idea of sitting in traffic, and that thought is negative valence. So I don’t go to the beach.

…What just happened? A big part of it was an association: the idea of driving to the beach is associated with the idea of sitting in traffic. Associations are partly about beliefs (I “believe” that the road is trafficky), but they’re also about saliency (when I think of the former, the latter tends to pop right into my head).

So, associations (including but not limited to beliefs) absolutely affect motivations. And the structure of intuitive models affects associations. So this is the path by which intuitive self-models can impact motivation.
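To make those moving parts concrete, here’s a minimal toy sketch of the beach example in Python. The numbers and data structures are invented purely for illustration, and are certainly not a claim about how the brain stores anything:

```python
# A toy sketch, not a brain model: valences and associations are
# hand-coded numbers purely for illustration.

# Valence of each thought: positive = motivating, negative = demotivating.
valence = {
    "driving to the beach": +0.6,
    "sitting in traffic": -0.8,
}

# Associations: when a thought is active, associated thoughts tend to
# pop into the stream of consciousness next (saliency, not just belief).
associations = {
    "driving to the beach": ["sitting in traffic"],
    "sitting in traffic": [],
}

def net_valence(seed_thought):
    """Follow the most-salient association chain from a seed thought,
    summing the valences encountered along the way."""
    total, thought, visited = 0.0, seed_thought, set()
    while thought is not None and thought not in visited:
        visited.add(thought)
        total += valence[thought]
        next_thoughts = associations.get(thought, [])
        thought = next_thoughts[0] if next_thoughts else None
    return total

# "Driving to the beach" is appealing on its own (+0.6), but it drags
# "sitting in traffic" (-0.8) into mind, so the plan loses on net.
print(net_valence("driving to the beach"))  # ~ -0.2, so: don't go
```

The point is just the causal structure: the effective valence of a plan isn’t a fixed attribute of the plan itself, but depends on which other thoughts the plan drags into the stream of consciousness.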

We’ve already seen a few examples of such impacts in the series:

8.4 How should we think about motivation?

As I argued in Post 3 [LW · GW], the conception of motivation, agency, and goal-pursuit within the Conventional Intuitive Self-Model centers around the “homunculus” and its “vitalistic force” and “wanting”. But neither the homunculus nor its vitalistic force are a veridical model of anything in the “territory” of either atoms or brain algorithms. So it’s no wonder that, when we try to use this map, we often wind up spouting nonsense.

Instead, I propose to think about motivation much closer to the territory level—i.e., to use concepts that are tightly connected to ingredients in the underlying brain algorithms.

8.4.1 The framework I’m rejecting

Here I’m quickly summarizing a few key points from Post 3: The Homunculus [LW · GW]. (You’ve already seen this section in Post 6.) In brief: the Conventional Intuitive Self-Model centers on a “homunculus” imbued with “vitalistic force”; desires and decisions are conceptualized as the homunculus’s “wanting”, with no possible upstream cause; and neither the homunculus nor its vitalistic force is a veridical model of anything actually present in the brain.

For a proper account of motivation, we need to throw those out—or more precisely, we need to treat those intuitions as merely intuitions (i.e., learned concepts that are present in some intuitive self-models but not others) and not part of what’s really happening in the human brain. Here’s what I propose instead:

8.4.2 My framework: valence, associations, and brainstorming

My framework for thinking about motivation is built from the ingredients in the section title: valence [LW · GW] (the common currency of motivation), associations (which determine which thoughts tend to follow which), and brainstorming (the search process by which positive-valence thoughts spawn plans to make themselves happen).

Brainstorming how to open a coconut. For more discussion of brainstorming see here [LW · GW] & here [LW · GW].
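As a concrete (and heavily simplified) illustration of how those ingredients fit together, here’s a toy sketch of the coconut brainstorm. The candidate plans and the stand-in valence function are invented for illustration:

```python
import random

# A toy sketch of the brainstorming ingredient: candidate plans pop up,
# positive-valence ones stick around, negative-valence ones get dropped.
# The plans and the stand-in valence function are invented for illustration.

random.seed(0)

candidate_plans = [
    "smash the coconut on a rock",
    "drill into the soft eye with a stick",
    "ask someone with a machete for help",
    "bite through the husk",  # predicts pain: negative valence
]

def plan_valence(plan):
    # Stand-in for the learned valence function on thoughts.
    return -0.9 if "bite" in plan else random.uniform(0.1, 1.0)

def brainstorm(n_rounds=10):
    """Plans surface at random; only positive-valence ones are retained
    and get a chance to be refined or executed."""
    kept = []
    for _ in range(n_rounds):
        plan = random.choice(candidate_plans)
        if plan_valence(plan) > 0 and plan not in kept:
            kept.append(plan)  # positive valence: the thought sticks
    return kept

print(brainstorm())  # "bite through the husk" never survives
```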

8.5 Six worked examples

In general, intuitions don’t stick around unless they’re producing accurate predictions. So, as much as I’ve been ruthlessly maligning these free-will-related intuitions throughout this post, I acknowledge that they’re generally pointing at real phenomena, and thus our next mission is to get practice explaining those real phenomena in terms of my preferred framework. We’ll go through a series of progressively more involved examples.

8.5.1 Example 1: Implicit (non-self-reflective) desire

Statement: “Being inside is nice.”

Intuitive model underlying that statement: There’s some concept of “inside” as a location / environment, and that concept somehow radiates positive vibes.

How I describe what’s happening using my framework: When the concept of “inside” is active, it tends to trigger positive valence. (See “The (misleading) intuition that valence is an attribute of real-world things” [LW · GW].)

8.5.2 Example 2: Explicit (self-reflective) desire

Statement: “I want to be inside.”

Intuitive model underlying that statement: There’s a frame (§2.2.3 [LW · GW]) “X wants Y” (§3.3.4 [LW · GW]). This frame is being invoked, with X as the homunculus, and Y as the concept of “inside” as a location / environment.

How I describe what’s happening using my framework: There’s a systematic pattern (in this particular context), call it P, where self-reflective thoughts concerning the inside, like “myself being inside” or “myself going inside”, tend to trigger positive valence. That positive valence is why such thoughts arise in the first place, and it’s also why those thoughts tend to lead to actual going-inside behavior.

In my framework, that’s really the whole story. There’s this pattern P. And we can talk about the upstream causes of P—something involving innate drives and learned heuristics in the brain. And we can likewise talk about the downstream effects of P—P tends to spawn behaviors like going inside, brainstorming how to get inside, etc. But “what’s really going on” (in the “territory” of my brain algorithm) is a story about the pattern P, not about the homunculus. The homunculus only arises secondarily, as the way that I perceive the pattern P (in the “map” of my intuitive self-model).

Within the Conventional Intuitive Self-Model, by contrast, the self-reflective thoughts are conceptualized as being generated and/or kept around by the homunculus, and correspondingly the pattern P is taken to be indicative of a property of the homunculus—namely, the property of “wanting to be inside”. Why else would the homunculus be holding onto such thoughts, and carrying them through?

8.5.3 Example 3: Akrasia

Statement: “I want to get out of bed, but I can’t, this pillow just feels so soft … ughhhhhhhhhh.”

Intuitive model underlying that statement: As in Example 2, there’s a frame “X wants Y”, filled out with X = homunculus and Y = being-out-of-bed.

Separately, there’s also a frame “X’s plans are being stymied by Y”, filled out with X = homunculus and Y = feelings of comfort associated with lying on the pillow.

How I describe what’s happening using my framework: There’s a systematic pattern (in this particular context), call it P1, where self-reflective thoughts concerning being-out-of-bed, like “myself being out of bed” or “myself getting out of bed”, tend to trigger positive valence.

There’s also a systematic pattern, call it P2, where non-self-reflective thoughts concerning the feeling of the pillow trigger positive valence.

In both cases, that positive valence explains why such thoughts arise in the first place, and also why those thoughts tend to have the effects that they have—i.e., increasing the likelihood of getting out of bed (P1) or not (P2).

Again, that’s “what’s really going on” in my framework. We have two patterns, P1 and P2, and we can talk about the upstream causes and downstream effects of those patterns. There’s no homunculus.
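Here’s the same story rendered as a toy sketch (valence numbers invented), just to emphasize that behavior falls out of a competition between thoughts, with no homunculus anywhere in the model:

```python
# A toy rendering of Example 3 (all numbers invented): two patterns,
# each pairing a trigger-thought with a valence. Behavior falls out of
# whichever positive-valence thought wins the competition for attention.
# Note that no homunculus appears anywhere in the data or the code.

P1 = {"thought": "myself getting out of bed", "valence": +0.5,
      "self_reflective": True, "promotes": "getting up"}
P2 = {"thought": "the soft feeling of the pillow", "valence": +0.7,
      "self_reflective": False, "promotes": "staying in bed"}

def winning_behavior(patterns):
    # The stronger positive-valence thought tends to hold attention,
    # and the behavior it promotes tends to follow.
    winner = max(patterns, key=lambda p: p["valence"])
    return winner["promotes"]

print(winning_behavior([P1, P2]))  # "staying in bed": akrasia
```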

Let’s contrast that with what happens in the Conventional Intuitive Self-Model. There, recall from §3.5 [LW · GW] that the signature of an “intentional” (as opposed to impulsive) action is that it starts with a positive-valence self-reflective thought (which we think of as an “intention”), and that it’s these self-reflective thoughts in particular that seem to happen because the homunculus wants them to happen. So in the Conventional Intuitive Self-Model, the homunculus seems to be causing the pro-getting-out-of-bed thoughts, and thus the P1 pattern is conceptualized as being rooted in a property of the homunculus, namely its “wanting to get out of bed”.

Meanwhile, the pro-staying-in-bed thoughts, which are not self-reflective, correspondingly do not seem to be related to the homunculus, and hence the P2 pattern cannot be explained by the homunculus wanting to stay in bed. Instead, an urge-to-stay-in-bed is conceptualized as a kind of force external to the homunculus, undermining what the homunculus is trying to do.

8.5.4 Example 4: Fighting akrasia with attention control

Let’s continue the scene from above (“I want to get out of bed, but I can’t, this pillow just feels so soft … ughhhhhhhhhh”). Carrying on:

Statement: “Ughh. But I really don’t want to miss the train. C’mon, I can do this. One step at a time. I’ll make myself a deal: if I make it to the train station early enough to wait in line at the Peet’s, then I’ll treat myself to a caffe mocha. Up up up.”

Intuitive model: Per above, the homunculus wants getting-out-of-bed to happen, but is being stymied by the feelings of comfort associated with the pillow. The homunculus then brainstorms how to get around this obstacle, summoning strategies to make that happen, including deft use of both attention control and motor control.

How I describe what’s happening using my framework: I wrote above that positive-valence thoughts automatically summon brainstorming how to make them happen. Awkwardly, this is symmetric! Just as the positive-valence self-reflective thought of getting out of bed can summon brainstorming towards getting out of bed, the positive-valence feeling of the pillow can summon brainstorming towards staying in bed: rationalizing excuses, bargaining for “five more minutes”, and so on.

Indeed, the second story is definitely a thing that can happen too! Both sides of this dilemma can spawn brainstorming and strategic execution of the plans. We’re perfectly capable of brainstorming towards two contradictory goals, by sporadically flipping back and forth.

So maybe, in the Example 4 story, it just so happened that, in this particular instance, the brainstorming-towards-getting-out-of-bed had the upper hand over the brainstorming-towards-staying-in-bed, for whatever random reason.

But still, there is indeed a very important asymmetry: brainstorming-towards-getting-out-of-bed can recruit self-reflective, attention-control strategies, whereas brainstorming-towards-staying-in-bed cannot, because staying in bed stops feeling positive-valence as soon as it’s viewed in a self-reflective frame. The next two examples unpack this asymmetry.

8.5.5 Example 5: The homunculus’s monopoly on sophisticated brainstorming and planning

Statement: “If there’s sophisticated brainstorming and planning happening in my mind, it’s because it’s something I want to do—it’s not just some urge. In Example 4 above, if I was rationalizing excuses to stay in bed, then evidently I wanted to stay in bed, at least in that moment (see §2.5.2 [LW · GW]).”

Intuitive model: If there’s ever sophisticated brainstorming and planning towards goal G happening in my mind, then the homunculus must want goal G to happen, at least in that moment.

How I describe this using my framework: As in §8.4.2 above, we have the general rule that, if goal G is positive-valence, then it automatically spawns brainstorming and planning towards G. But I can add an elaboration that, if the corresponding self-reflective thought S(G) is also positive-valence, then this brainstorming and planning can be robust, sophisticated, and long-lasting, whereas if S(G) is negative-valence, then this brainstorming and planning tends to be short-lived and simple.

Why is that? Because the most sophisticated forms of brainstorming and planning involve a lot of self-reflective thoughts—recall Example 4 above, where I was “making a deal with myself”, i.e. a sophisticated attention-control strategy that intimately involves my intuitive self-models. Those self-reflective thoughts may make the associated thought S(G) pop into my head, and then if S(G) is negative-valence (demotivating), I’ll feel like that’s a crappy plan, and more generally that I shouldn’t even be brainstorming towards G in the first place.

Let’s go through an example. Let’s say G is positive-valence but S(G) is negative-valence—i.e., G seems to be an “urge” / “impulse”. For example, maybe I’ve been trying to quit smoking, and G is the idea of smoking. Then here’s something that could happen: I start idly brainstorming how to get a cigarette; as the plan gets more elaborate, the self-reflective thought S(G) (“myself smoking, right after I swore I’d quit”) pops into my head; S(G) is negative-valence, so the whole thing suddenly feels like a crappy plan, and the brainstorming fizzles out.

Note, however, that simple, non-self-reflective brainstorming towards G can happen—as in the “impulsive brainstorming” example of §3.5.2 [? · GW], where I crafted and executed a plan to get a cigarette, all without ever “wanting” the cigarette in the homunculus sense. This plan was sophisticated in some objective sense—it involved three steps, relied on my life experience and understanding of the world, and could never have happened by random chance. But in a different sense, the plan was very simple, in that I crafted it in a very fast and “unreflective” way, such that the S(G) thought never had a chance to pop up.

So that’s my framework. To sum up: The most powerful forms of brainstorming and planning involve a bunch of self-reflective thoughts, because you need to formulate a plan that will properly integrate with how your own mind works, now and in the future. So, as a rule, if we’re skillfully brainstorming and planning towards some goal G, then it’s almost certainly the case that the corresponding self-reflective S(G) has positive valence. There are exceptions, which I called “impulsive brainstorming” in §3.5.2 [? · GW], but those exceptions tend to involve plans that are relatively fast, simple, and centered around motor-control rather than attention-control.
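Schematically, the rule looks like the following toy sketch (the category labels and thresholds are mine, invented for illustration):

```python
# A toy decision rule for Example 5 (categories invented for
# illustration): positive-valence G always spawns some brainstorming,
# but the sophisticated kind requires S(G) to be positive-valence too.

def brainstorming_mode(valence_G, valence_S_of_G):
    if valence_G <= 0:
        return "no brainstorming: nothing is motivating it"
    if valence_S_of_G > 0:
        # Self-reflective planning steps don't get vetoed, so long
        # multi-step plans (deals with yourself, etc.) can develop.
        return "robust, sophisticated, long-lasting brainstorming"
    # Self-reflective steps surface S(G), which is demotivating and
    # cuts the planning short; only fast, simple plans slip through.
    return "short-lived, simple (impulsive) brainstorming"

print(brainstorming_mode(+0.8, +0.5))  # wanting to get out of bed
print(brainstorming_mode(+0.8, -0.6))  # craving a cigarette while quitting
```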

By contrast, the Conventional Intuitive Self-Model (CISM) “explains” the same set of facts by positing that the homunculus does brainstorming and planning towards things that it “wants”, and that it “wants” the G’s for which S(G) is positive valence. I think CISM doesn’t have a great way to explain “impulsive brainstorming” (§3.5.2 [? · GW]), but impulsive brainstorming is sufficiently minor and unimportant that CISM can get away with glossing over it by mumbling something like “I wasn’t thinking”, even if that’s nonsensical when taken literally.

8.5.6 Example 6: Willpower

Here’s a different continuation of the scene from Example 3 (“I want to get out of bed, but I can’t, this pillow just feels so soft … ughhhhhhhhhh”):

Statement: “I’m just going to get out of bed through sheer force of will. Grrrr … … And I’m standing up!”

Intuitive model: As above, the homunculus “wants” to get out of bed, but is being stymied by the comfortable feeling of the pillow. The homunculus “applies willpower” in order to get its way—some mental move that’s analogous to “applying force” to move a heavy desk while redecorating. It’s conceptualized as “wanting” more intensely and with more “vitalistic force”.

How I describe this using my framework: As above, we have two systematic patterns (in this particular context): Pattern P1 is the fact that self-reflective thoughts concerning being-out-of-bed, like “myself being out of bed” or “myself getting out of bed”, tend to trigger positive valence. Pattern P2 is the fact that non-self-reflective thoughts concerning the feeling of the pillow tend to trigger positive valence.

And also as above, these can both correspondingly trigger brainstorming and planning. When “myself being out of bed” is on my mind, that’s positive valence, so it triggers brainstorming towards getting out of bed. And when “the feeling of the pillow” is on my mind, that’s positive valence, so it triggers brainstorming towards staying in bed. The valences need not be equally strong, so either side might win out. But there’s an additional thumb on the scale that comes from the fact that brainstorming-towards-getting-out-of-bed has a bigger search space, particularly involving attention-control strategies flowing through my intuitive self-model. By contrast, brainstorming-towards-staying-in-bed stops feeling positive-valence as soon as we view it in a self-reflective frame, so those kinds of attention-control strategies will feel unappealing.

Now, as it turns out, there’s a very simple, obvious, one-size-fits-all, attention-control strategy for making pretty much anything happen. Since it involves attention control, this strategy is disallowed for the staying-in-bed brainstorm, but it is an option for the getting-out-of-bed brainstorm.

Here’s the (obvious) strategy: Apply voluntary attention-control to keep S(getting out of bed) at the center of attention. Don’t let it slip away, no matter what.

This strategy is what we call “applying willpower”. Naively, it might seem to be an unbeatable strategy. If S(getting out of bed) remains at center-stage in our minds, then that will keep the feeling of the pillow blocked from conscious awareness. And since S(getting out of bed) is positive valence, the brainstorming / planning process will proceed all the way to the execution phase, and bam, we’re out of bed.

It’s so simple! The interesting question is: why doesn’t that always work? Let’s pause for an aside.

8.5.6.1 Aside: The “innate drive to minimize voluntary attention control”

Recall that valence comes ultimately from “innate drives”, a.k.a. “primary rewards” [LW · GW]—eating-when-hungry is good, pain is bad, etc., along with various social instincts like the “drive to feel liked / admired” [LW · GW], and much more. The exact list of human innate drives is as yet unknown to Science, and happens to be a major research interest of mine.

So here’s my hypothesis for one of those yet-to-be-discovered innate drives: Voluntary attention control is innately negative-valence, other things equal. In particular:

So that’s a hypothesis. To flesh out my case that this alleged innate drive actually exists, let’s go over the usual three questions [LW · GW]:

8.5.6.2 Back to Example 6

Where were we? There’s a goal G (getting out of bed) such that the self-reflective S(G) (myself getting out of bed) is also positive-valence. This enables the more powerful version of brainstorming, the kind of brainstorming where the strategy space includes plans that leverage attention-control in conjunction with my understanding of my own mind. One such plan is the really simple, one-size-fits-all plan to use attention-control to hold S(G) very firmly in mind, so firmly that any thought that might otherwise kick it out (i.e., the comfortable feeling of the pillow) can’t squeeze in. We call this plan “applying willpower” to get out of bed.

Now, we can see why this one-size-fits-all plan doesn’t always work. The plan involves applying voluntary attention control to keep S(G) firmly in mind. This plan gets a positive-valence boost from the fact that S(G) has positive valence. But the plan also gets a negative-valence penalty from the “innate drive to minimize voluntary attention control” above. Thus, if the plan involves too much voluntary attention on S(G), it winds up with negative valence on net, and my brain kicks that thought out and replaces it with something else [LW · GW]. On the other hand, if the plan involves too little voluntary attention on S(G), then the oh-so-appealing thought of the comfortable pillow may successfully bubble up and kick S(G) out of consciousness. Thus, “applying willpower” sometimes works, but also sometimes doesn’t, as we know from everyday experience.
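As a toy model of that tradeoff (the functional form and every constant are invented purely for illustration), we can treat “applying willpower” as choosing how much voluntary attention to spend holding S(G) in place:

```python
# A toy model of the willpower tradeoff. The functional form and every
# constant are invented for illustration; the point is only that the
# mechanism predicts a middle range where willpower works, with a
# different failure mode on either side.

S_G_VALENCE = 0.6     # valence of S("myself getting out of bed")
ATTENTION_COST = 1.0  # penalty rate from the hypothesized innate drive
                      # to minimize voluntary attention control
PILLOW_PULL = 0.4     # attention needed to keep the pillow-thought out

def willpower_outcome(attention):
    # Net valence of the hold-S(G)-in-mind plan: boost from S(G)'s
    # positive valence, minus the attention-control penalty.
    net_valence = S_G_VALENCE - ATTENTION_COST * attention ** 2
    if net_valence < 0:
        return "plan goes negative-valence; S(G) gets kicked out"
    if attention < PILLOW_PULL:
        return "pillow-thought bubbles up and displaces S(G)"
    return "S(G) stays center-stage; out of bed we go"

for attention in (0.2, 0.5, 0.9):
    print(f"attention={attention}: {willpower_outcome(attention)}")
```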

This whole section was a discussion within my own framework. Thus, “applying willpower” points to a real-world psychological phenomenon, but we can explain that phenomenon without any of those problematic intuitions related to the homunculus, vitalistic force, “wanting”, or free will.

8.6 Conclusion of the series

Thanks for joining me on this 45,000-word journey! I for one feel very much less confused about a great many topics now than I was before. Hope you found it helpful too! Thanks to everyone who shared comments, criticisms, and experiences, both before and after initial publication—the series is very much different and better than it would have been otherwise! Keep ’em coming!

8.6.1 Bonus: How is this series related to my job description as an Artificial General Intelligence safety and alignment researcher?

Well, I don't want to overstate how related it is. That's why I wrote it fast, aiming just for the big picture rather than the details. But I think the time I spent was worthwhile.

So here are the five main directly-work-relevant things that I got out of writing these eight posts:

First, my top-priority research project right now is coming up with plausible hypotheses for how human social instincts work, which I believe to mainly revolve around little cell groups in the hypothalamus and brainstem (details here [LW · GW]). For example, I think there’s an innate “drive to be liked / admired” [LW · GW], related to social status seeking, but in order to build that drive, the genome must be somehow solving a very tricky “symbol grounding problem”, as discussed here [LW · GW]. Writing this series helped me eliminate a number of wrong ideas about how that works—for example, I now think that the hypothesis that I wrote down in “Spatial attention as a “tell” for empathetic simulation?” [LW · GW] is wrong, as were a couple other ideas that I was playing around with. I have better ideas now—a post on that is forthcoming!

Why exactly was this series helpful for that research project? Well, the big problem with this research project is that there’s almost no direct evidence to go on, for the questions I’m trying to answer about human social instincts. For example, it’s nearly impossible to measure anything whatsoever about the human hypothalamus. It doesn’t show up in fMRI, EEG, etc., and even if it did, the interesting functionality of the hypothalamus is structured as microscopic clusters of neurons, all packed right next to each other, or even worse, intermingled (details here [LW · GW]). There’s lots of data about the rodent hypothalamus,[5] but I think human social instincts are importantly different from mouse social instincts. Thus, being able to scour the weirder corners of human psychology is one of the few ways that I can narrow down possibilities.

For example, if somebody “identifies with the universal consciousness”, do they still feel a “drive to be liked / admired”? Empirically, yes! But then, what does that mean—do they want the universal consciousness to be liked / admired, or do they want their conventional selves to be liked / admired? Empirically, the latter! And what lessons can I draw from that observation, about how the “drive to be liked / admired” works under the hood? I have an answer, but I was only able to find that answer by starting with a deep understanding of what the heck a person is talking about when they say that they “identify with the universal consciousness”. That turned out to be a very important nugget that I got out of writing this series. Again, the payoff is in a forthcoming post.

Second, I sometimes talk about “the first-person problem” [LW · GW] for brain-like AGI: How might one transform third-person data (e.g. a labeled YouTube video of Alice helping Bob) into an AGI’s first-person preferences (“I want to be helpful”)? This almost certainly requires some mechanistic interpretability, which makes it hard to plan out in detail. However, writing this series makes me feel like I have a much better understanding of what we’d be looking for, how it might work, and what might go wrong. For example, if (if!) the AGI winds up with something close to the Conventional Intuitive Self-Model (§3.2 [LW · GW]), then maybe, while the AGI is watching the YouTube video, we could find some data structure in its “mind” that we interpret as an X-helping-Y frame with X=Alice and Y=Bob. If so, then we could edit that same frame to X=homunculus, Y=supervisor, and make the resulting configuration trigger positive valence [LW · GW]. There’s still a lot that could go wrong, but again, I feel much more capable of thinking about these issues than I did before.
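Purely as an illustration of the kind of intervention I have in mind (every name and data structure below is hypothetical; nobody knows what frames would actually look like inside an AGI’s world-model), the edit might be as simple as rebinding two slots and attaching valence:

```python
from dataclasses import dataclass

# Purely hypothetical: nobody knows what a "frame" would look like
# inside an AGI's world-model; this is just the shape of the edit.

@dataclass
class Frame:
    relation: str
    x: str
    y: str
    valence: float = 0.0

# What interpretability tools might (hypothetically) surface while the
# AGI watches the labeled video:
observed = Frame(relation="helps", x="Alice", y="Bob")

# The proposed intervention: rebind the slots, and make the resulting
# configuration trigger positive valence.
edited = Frame(relation="helps", x="homunculus", y="supervisor",
               valence=+1.0)

print(observed)
print(edited)
```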

Third, this 8th post is all about motivation, a central topic in AI alignment. May it help spread clear and correct thinking in that area. I’ve already noticed the ideas in this post coming up in my AI-alignment-related conversations on multiple occasions.

Fourth, I was intrigued by the observation that social status plays an important role in trance (§4.5.4 [LW · GW]). That seemed to be a hint of something about how social status works. Alas, it turned out that social status was helpful for understanding trance, but not the other way around. Oh well. Research inevitably involves chasing down leads that don't always bear fruit.

Fifth, there’s the question of sentience and moral patienthood—of people, of animals, and especially of current and future AI systems. While I assiduously avoided directly talking about those topics (§1.6.2 [LW · GW]), I obviously think that this series (especially Posts 1–3) would be highly relevant for anyone working in that area.

Thanks again for reading! Please reach out (in the comments section or by email) if you want to talk about this series or whatever else.

Thanks Simon Skade for critical comments on an earlier draft.

  1. ^

    I’m referring to the neural circuits involved in “social instincts”. I have a short generic argument that such neural circuits have to exist here [LW · GW], and a more neuroscience-y discussion of how they might work here [LW · GW], and much more nuts-and-bolts neuroscience details in a forthcoming post. But that’s off-topic here.

  2. ^

    More precisely: If there are deterministic upstream explanations of what the homunculus is doing and why, e.g. via algorithmic or other mechanisms happening under the hood, then that feels like a complete undermining of one’s free will and agency (§3.3.6 [LW · GW]). And if there are probabilistic upstream explanations of what the homunculus is doing and why, e.g. the homunculus wants to eat when hungry, then that correspondingly feels like a partial undermining of free will and agency, in proportion to how confident those predictions are.

  3. ^

    Recall that, if I’m thinking about the nice feeling of the pillow, that’s generally not self-reflective, but rather conceptualized as a property of the pillow itself, out in the world. See §3.3.2 [LW · GW].

  4. ^

    For one thing, if it’s true at all that voluntary attention-control entails more energy consumption than daydreaming, then the difference is at most a small fraction of the brain’s total 20 watt power budget. Compare that to running, which might involve generating 200 watts of mechanical power plus 600 watts of heat. It’s absurd to think that energy considerations would figure in at all. “Thinking really hard for ten seconds” probably involves less primary metabolic energy expenditure than scratching your nose.

    For another thing, I think my evolutionary story makes sense. Evolution invests so much into building and training a human brain. The obvious cost of using your brain to think about X is the opportunity cost: it means you’re not using your brain to think about any other Y. It seems highly implausible that Evolution would be ignoring this major cost.

    For another thing, I think I can be mentally “tired” but not physically tired, and vice-versa. But be warned that it’s a bit tricky to think about this, because unpleasant physical exertion generally involves both attention control and motor control. Thus, “willpower” can somewhat substitute for “physical energy” and vice-versa. See “Example 1” here.

    Most importantly, there was never any good reason to believe that “the innate drive to minimize voluntary attention control” and “the innate drive to minimize voluntary motor control” have anything to do with each other, in the first place. They intuitively seem to be related, but this isn’t a situation where we should trust our intuitions to be veridical. In particular, there’s such a strong structural parallel between those two drives that they would feel related regardless of whether they actually were related or not.

  5. ^

    Although, even in the rodent hypothalamus, I believe that only a small fraction of those probably hundreds of little idiosyncratic neuron clusters have been characterized in enough detail to say what they do and how.

1 comment


comment by Seth Herd · 2024-11-04T20:42:18.058Z · LW(p) · GW(p)

Excellent! As usual, I concur.

I certainly haven't seen this clear a description of how human brains (and therefore people) actually make decisions. I never bothered to write about it, partly on the excuse that clarifying brain function would accelerate progress toward AGI, and partly for lack of clear incentives to do the large amount of work it takes to write clearly. So this is now my reference for how people work. It seems pretty self-contained. (And I think progress toward AGI is now so rapid that spreading general knowledge about brain function will probably do more good than harm.)

First, I generally concur. This is mostly stuff I thought about a lot over my half-career in computational cognitive neuroscience. My conclusions are largely the same, in the areas I'd thought about. I hadn't thought as much about alternate self-concepts except for DID (I had a friend with pretty severe DID and reached the same conclusions you have about a "personality" triggering associated memories and valences). And I had not reached this concise definition of "willpower", although I had the same thought about the evolutionary basis of making it difficult to attend to one thing obsessively. That's how you either get eaten or fail to explore valuable new thoughts.

In particular, I very much agree with your focus on the valence attached to individual thoughts. This reminded me of a book title, Thoughts Without A Thinker, an application of Buddhist psychology to psychoanalysis. I haven't read it, but I've read and heard about it. I believe it works from much the same framework of understanding why we do things and feel as we do about things, but I don't remember if it uses something like the same theory of valence guiding thought.

 

Now to my couple of addendums.

First is an unfinished thread: I agree that we tend to keep positively-valenced thoughts while discarding negatively-valenced ones, and this leads to productive brainstorming - that which leads toward (vaguely) predicted rewards. But I've heard that many people do a lot of thinking about negative outcomes, too. I am both constitutionally and deliberately against thinking too much about negative things, but I've heard that a lot of the current human race spends a lot of time worrying - which I think probably has the same brainstorming dynamic and shares mechanisms with the positively oriented brainstorming. I don't know how to explain this; I think the avoidance of bad outcomes being a good outcome could do this work, but that's not how worrying feels - it feels like my thoughts are drawn toward potential bad outcomes even when I have no idea how to avoid them yet.

I don't have an answer. I have wondered if this is related to the subpopulation of dopamine cells that fire in response to punishments and predicted punishments, but they don't seem to project to the same areas controlling attention in the PFC (if I recall correctly, which I may not).  Anyway, that's my biggest missing piece in this puzzle.

 

Now onto "free will" or whatever makes people think that term sounds important. I think it's a terrible, incoherent term that points at several different important questions about how to properly understand the human self, and whether we bear responsibility and control our own futures.

I think different people have different intuitive models of their selves. I don't know which are most common. Some people identify with the sum total of their thought, and I think wind up less happy as a result; they assume that their actions reflect their "real" selves, for instance identifying with their depression or anger. I think the notion that "science doesn't believe in free will" can contribute to that unhelpful identification. This is not what you're saying, so I'm only addressing phrasing; but I think one pretty common (wrong) conclusion is "science says my thoughts and efforts don't matter, because my genes and environment determine my outcomes". You are saying that thoughts and "efforts" (properly understood) are exactly what determine actions and therefore outcomes. I very much agree.

I frame the free will thing a little differently. To borrow from my last comment on free will [LW(p) · GW(p)] [then edit it a bit]:

I don't know what people mean by "free will" and I don't think they usually do either. [...]

I substitute the term "self-determination" for "free will", in hopes that that term captures more of what people tend to actually care about in this topic: do I control my own future? Framed this way, I think the answer is more interesting: it's "sort of" and "sometimes", rather than a simple yes or no.

I think someone who's really concerned that "free will isn't real" would say sure [their thoughts] help determine outcomes, but the contents of my consciousness were also determined by previous processes. I didn't pick them. I'm an observer, not a cause. My conscious awareness is an observer. It causes the future, but it doesn't choose, it just predicts outcomes.

[...]

So here I think it's important to break it down further, and ask how someone would want their choices to work in an ideal world (this move is essentially borrowed from Daniel Dennett's "all the varieties of free will worth wanting").

I think the most that people would ask for is to have their decisions and therefore their outcomes controlled by their beliefs, their knowledge, their values, and importantly, their efforts at making decisions.

I think these are all perfectly valid labels for important aspects of cognition (with lots of overlap among knowledge, beliefs, and values). Effort at making a decision also plays a huge role, and I think that's a central concern - it seems like I'm working so hard at my decisions, but is that an illusion? I think what we perceive as effort involves more of the conscious predictions you describe [...] It also involves more different types of multi-step cognition, like analyzing progress so far and choosing new strategies for decisions or intermediate conclusions for complex decisions. 

(incidentally I did a whole bunch of work on exactly how the brain does [that] process of conscious predictions to choose outcomes. That's best written up in Neural mechanisms of human decision-making, but that's still barely worth reading because it's so neuroscience-specialist-oriented [and sort of a bad compromise among co-authors]).

So my response to people being bothered by being "just" the "continuation of deterministic forces via genetics and experience" is that those are condensed as beliefs, values, knowledge, and skills, and the effort with which those are applied is what determines outcomes and therefore your results and your future. [That's pretty much exactly the type of free will worth wanting.]

This leaves intact some concerns about forces you're not conscious of playing a role. Did I decide to do this because it's the best decision, or because an advertiser or a friend put an association or belief in my head in a way I wouldn't endorse on reflection? [Or was it some strong valence association I acquired in ways I've never explicitly endorsed?] I think those are valid concerns.

So my answer to "am I really in control of my behavior?" is: sometimes, in some ways - and the exceptions are worth figuring out, so we can have more self-determination in the future. 

Anyway, that's some of my $.02 on "free will" in the sense of self-determination. Excellent post!