Sequence introduction: non-agent and multiagent models of mind

post by Kaj_Sotala · 2019-01-07T14:12:30.297Z · score: 88 (33 votes) · LW · GW · 5 comments

Contents

  Published posts:
  Near-term posts (partially already written):
  Farther out (sketched out but not as extensively planned/written yet)

A typical paradigm by which people tend to think of themselves and others is as consequentialist agents: entities who can be usefully modeled as having beliefs and goals, and who then act according to their beliefs to achieve their goals.

This is often a useful model, but it doesn’t quite capture reality. It’s a bit of a fake framework [LW · GW]. Or in computer science terms, you might call it a leaky abstraction.

An abstraction in the computer science sense is a simplification which tries to hide the underlying details of a thing, letting you think in terms of the simplification rather than the details. To the extent that the abstraction actually succeeds in hiding the details, this makes things a lot simpler. But sometimes the abstraction inevitably leaks, as the simplification fails to predict some of the actual behavior that emerges from the details; in that situation you need to actually know the underlying details, and be able to think in terms of them.
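As a quick illustration of a leaking abstraction (my own example, not from the original discussion): floating-point numbers abstract over the real numbers, and most of the time you can use them without thinking about the binary representation underneath — until the details poke through.

```python
import math

# Floating-point numbers abstract over real numbers. Usually the
# abstraction holds: you add, compare, and round without thinking
# about the binary representation underneath.
a = 0.1 + 0.2

# But the abstraction leaks: 0.1 and 0.2 have no exact binary
# representation, so the sum is not exactly 0.3.
print(a == 0.3)  # False
print(a)         # 0.30000000000000004

# To handle the leak you have to drop down to the underlying details,
# e.g. by comparing with an explicit tolerance.
print(math.isclose(a, 0.3))  # True
```

When the abstraction holds, the simplified model is all you need; when it leaks, you are forced to reason about the machinery it was hiding — which is the situation this sequence claims we are in with the "humans are agents" abstraction.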

Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander’s Blue-Minimizing Robot sequence. At the same time, I do not think it has been fully internalized yet, and many foundational posts on LW go wrong because they are premised on the assumption that humans are agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they attempted to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases are the most natural place for irrationality to emerge from, so it made sense to focus there.

Just knowing that an abstraction leaks isn’t enough to improve your thinking, however. To do better, you need to learn the underlying details and build a more accurate model. In this sequence, I will elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us get past the old paradigm.

One particular family of models that I will be discussing is that of multi-agent theories of mind. The claim here is not that we literally have multiple personalities. Rather, my approach will be similar in spirit to the one in Subagents Are Not A Metaphor:

Here’s are the parts composing my technical definition of an agent:
1. Values
This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
2. World-Model
Degenerate case: stateless world model consisting of just sense inputs.
3. Search Process
Causal decision theory is a search process. “From a fixed list of actions, pick the most positively reinforced” is another. Degenerate case: lookup table from world model to actions.
Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.

This is a model that can be applied naturally to a wide range of entities, as seen from the fact that thermostats qualify. And the reason why we tend to automatically think of people - or thermostats - as agents is that our brains have evolved to naturally model things in terms of this kind of intentional stance; it’s a way of thought that comes natively to us.
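To make the quoted definition concrete, here is a minimal sketch of a thermostat written out as values, world-model, and search process. The class and the setpoint are my own illustrative choices, not from the quoted post; each component is the degenerate case the definition mentions.

```python
# A thermostat as a (degenerate) agent in the quoted technical sense.
class Thermostat:
    def __init__(self, target):
        self.target = target     # 1. Values: degenerate case, a single setpoint
        self.sensed_temp = None  # 2. World-model: degenerate case, stateless
                                 #    model consisting of just the sense input

    def sense(self, temp):
        self.sensed_temp = temp

    def act(self):
        # 3. Search process: degenerate case, a lookup table
        #    from world model to actions.
        if self.sensed_temp < self.target:
            return "heat_on"
        return "heat_off"

t = Thermostat(target=21.0)
t.sense(18.5)
print(t.act())  # heat_on
t.sense(22.0)
print(t.act())  # heat_off
```

The point of the exercise is that nothing in the definition requires sophistication: each of the three slots can be filled by something trivial, which is exactly why "a thermostat is literally an agent" is a feature of the definition rather than a bug.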

Given that we want to learn to think about humans in a new way, we should look for ways to map the new way of thinking into a native mode of thought. One of my tactics will be to look for parts of the mind that look like they could literally be agents (as in the above technical definition of an agent), so that we can replace our intuitive one-agent model with intuitive multi-agent models without needing to make trade-offs between intuitiveness and truth. This will still be a leaky simplification, but hopefully it will be a more fine-grained leaky simplification, so that overall we’ll be more accurate.

My model of what our subagents look like draws upon a number of different sources, including neuroscience, psychotherapy and meditation, so in the process of sketching out the model I will cover a number of them in turn. To give you a rough idea of what I'm trying to do, here's a summary of some upcoming content.

Published posts:

Book summary: Consciousness and the Brain [LW · GW]. One of the fundamental building blocks of much consciousness research is Global Workspace Theory (GWT). This could be described as a component of a multiagent model, focusing on the way in which different agents exchange information with one another. One elaboration of GWT, which focuses on how it might be implemented in the brain, is the Global Neuronal Workspace (GNW) model in neuroscience. Consciousness and the Brain is a 2014 book that summarizes some of the research and basic ideas behind GNW, so summarizing its main content looks like a good place to start our discussion and to get a neuroscientific grounding before we get more speculative.

Building up to an IFS model [LW · GW]. One theoretical approach for modeling humans as being composed of interacting parts is that of Internal Family Systems. In my experience and that of several other people in the rationalist community, it’s very effective for this purpose. However, having its origins in therapy, its theoretical model may seem rather unscientific and woo-y. This personally put me off the theory for a long time, as I thought that it sounded fake, and gave me a strong sense of "my mind isn't split into parts like that".

In this post, I construct a mechanistic sketch of how a mind might work, drawing on the kinds of mechanisms that have already been demonstrated in contemporary machine learning, and then end up with a model that pretty closely resembles the IFS one.

Subagents, introspective awareness, and blending. [LW · GW] In this post, I extend the model of mind that I've been building up in previous posts to explain some things about change blindness, not knowing whether you are conscious, forgetting most of your thoughts, and mistaking your thoughts and emotions as objective facts, while also connecting it with the theory in the meditation book The Mind Illuminated.

Subagents, akrasia, and coherence in humans. [LW · GW] We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one that you are currently executing, then you will switch to that better strategy. For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and what are the situations in which they fail to do so.

My conclusion is that we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack [LW · GW] in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.
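That condition can be sketched as a toy formalization (entirely my own, simplifying away the "slack" condition): the mind-system adopts a new behavior only if the weighted belief that it is better clears a threshold, and no sufficiently heavily weighted subagent vetoes it.

```python
# Toy formalization of the coherence condition described above.
# Each subagent is a tuple: (weight, probability_new_behavior_is_better, vetoes).
# All names, weights, and thresholds are illustrative.

def adopts_new_behavior(subagents, threshold=0.5, veto_weight=2.0):
    total = sum(w for w, p, v in subagents)
    # Weighted aggregate belief that the new behavior is better.
    belief = sum(w * p for w, p, v in subagents) / total
    # A sufficiently weighty subagent (e.g. an IFS-style protector)
    # can block the change outright.
    vetoed = any(v and w >= veto_weight for w, p, v in subagents)
    return belief > threshold and not vetoed

# Mostly-agreeing system, no strong protector: the behavior changes.
print(adopts_new_behavior([(1.0, 0.9, False), (1.0, 0.4, False)]))  # True
# Aggregate belief still favors the change, but a heavily weighted
# protector vetoes it: the akrasia case.
print(adopts_new_behavior([(1.0, 0.9, False), (2.5, 0.4, True)]))   # False
```

On this toy picture, akrasia is not a single failure mode but whatever pattern of weights and vetoes keeps the system from converging on the behavior it "knows" is better.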

Integrating disagreeing subagents [LW · GW]. In the previous post, I suggested that akrasia involves subagent disagreement - or in other words, different parts of the brain having differing ideas on what the best course of action is. The existence of such conflicts raises the question, how does one resolve them?

In this post I discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why this doesn’t always happen.

Subagents, neural Turing machines, thought selection, and blindspots [LW · GW]. In my summary of Consciousness and the Brain, I briefly mentioned that one of the functions of consciousness is to carry out artificial serial operations; or in other words, implement a production system (equivalent to a Turing machine) in the brain.

While I did not go into very much detail about this model in the post, I’ve used it in later articles. For instance, in Building up to an Internal Family Systems model, I used a toy model where different subagents cast votes to modify the contents of consciousness. One may conceptualize this as equivalent to the production system model, where different subagents implement different production rules which compete to modify the contents of consciousness.
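A toy sketch of that voting picture (my own illustrative code, not from the linked posts): each subagent is a production rule that bids to rewrite a shared workspace, and the highest-weighted bid wins the competition for the contents of consciousness.

```python
# Toy model: subagents as production rules competing to modify the
# contents of a global workspace. All names and weights are illustrative.

def planner(workspace):
    if workspace == "deadline_tomorrow":
        return ("start_working", 0.6)   # (proposed content, vote weight)

def comfort_seeker(workspace):
    if workspace == "deadline_tomorrow":
        return ("browse_the_web", 0.4)

def step(workspace, subagents):
    # Each subagent whose condition matches the workspace casts a
    # weighted vote; the winning proposal becomes the new content
    # of consciousness. If nothing matches, the workspace is unchanged.
    bids = [bid for agent in subagents if (bid := agent(workspace)) is not None]
    return max(bids, key=lambda b: b[1])[0] if bids else workspace

print(step("deadline_tomorrow", [planner, comfort_seeker]))  # start_working
```

Running `step` repeatedly would chain production firings into a serial computation, which is the sense in which a parallel collection of rules can implement something Turing-machine-like on top of a single shared workspace.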

In this post, I flesh out the model a bit more, as well as applying it to a few other examples, such as emotion suppression, internal conflict, and blind spots.

Subagents, trauma, and rationality [LW · GW]. This post interprets the appearance of subagents as emerging from unintegrated memory networks, and argues that the presence of these is a matter of degree. There’s a continuous progression of fragmented (dissociated) memory networks giving rise to increasingly worse symptoms as the degree of fragmentation grows. The continuum goes from everyday procrastination and akrasia on the “normal” end, to disrupted and dysfunctional beliefs in the middle, and conditions like clinical PTSD, borderline personality disorder, and dissociative identity disorder on the severely traumatized end.

I also argue that emotional work and exploring one's past traumas in order to heal them, is necessary for effective instrumental and epistemic rationality.

Against "System 1" and "System 2" [LW · GW]. The terms System 1 and System 2 were originally coined by the psychologist Keith Stanovich and then popularized by Daniel Kahneman in his book Thinking, Fast and Slow. Stanovich noted that a number of fields within psychology had been developing various kinds of theories distinguishing between fast/intuitive on the one hand and slow/deliberative thinking on the other. Often these fields were not aware of each other. The S1/S2 model was offered as a general version of these specific theories, highlighting features of the two modes of thought that tended to appear in all the theories.

Since then, academics have continued to discuss the models. Among other developments, Stanovich and other authors have discontinued the use of the System 1/System 2 terminology as misleading, choosing to instead talk about Type 1 and Type 2 processing. In this post, I will build on some of that discussion to argue that Type 2 processing is a particular way of chaining together the outputs of various subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.

Near-term posts (partially already written):

A non-mysterious explanation of the Three Marks of Existence. If being an agent is a leaky abstraction, then one way of characterizing insight meditation [LW · GW] would be as a technique for finding and staring at the places where the abstraction does leak. Here, I offer a model of insight meditation as a way to witness some of the processes by which the experience of being an agent is constructed, helping dissolve [LW · GW] the kinds of confusions that make us think we are agents in the first place.

One way of carving up the space of things that you’ll find by doing insight meditation is by what some Buddhist schools call the Three Marks of Existence: no-self, impermanence, and unsatisfactoriness. Here, I try to sketch out an explanation of the kinds of things that these marks are pointing to, how they underlie a more accurate model of human psychology than the folk intuition does, and how witnessing them might be expected to transform one’s expectations.

Farther out (sketched out but not as extensively planned/written yet)

The game theory of rationality and cooperation in a multiagent world. Multi-agent models have a natural connection to Elephant in the Brain -style dynamics: our brains doing things for purposes of which we are unaware. Furthermore, there can be strong incentives to continue systematic self-deception and not integrate conflicting beliefs. For instance, if a mind has subagents which think that specific beliefs are dangerous to hold or express, then they will work to suppress subagents holding that belief from coming into conscious awareness.

“Dangerous beliefs” might be ones that touch upon political topics, but they might also be ones of a more personal nature. For instance, someone may have an identity as being “good at X”, and then want to rationalize away any contradictory evidence - including evidence suggesting that they were wrong on a topic related to X. Or it might be something even more subtle.

These are a few examples of how rationality work has to happen on two levels at once: to debug some beliefs (individual level), people need to be in a community where holding various kinds of beliefs is actually safe (social level). But in order for the community to be safe for holding those beliefs (social level), people within the community also need to work on themselves so as to deal with their own subagents that would cause them to attack people with the “wrong” beliefs (individual level). This kind of work also seems to be necessary for fixing “politics being the mind-killer” and collaborating on issues such as existential risk across sharp value differences; but the need to carry out the work on many levels at once makes it challenging, especially since the current environment incentivizes many (sub)agents to sabotage any attempt at this.

(This topic area is also related to that stuff Valentine has been saying about Omega [LW · GW].)

AI alignment and multiagent models: submind values and the default human ontology. In a recent post [LW · GW], Wei Dai mentioned that “the only apparent utility function we have seems to be defined over an ontology very different from the fundamental ontology of the universe”. I agree, and I think it’s worth emphasizing that the difference is not just “we tend to think in terms of classical physics but actually the universe runs on particle physics”. Unless they've been specifically trained to do so, people don’t usually think of their values in terms of classical physics, either. That’s something that’s learned on top of the default ontology.

The ontology that our values are defined over, I think, shatters into a thousand shards [LW · GW] of disparate models held by different subagents with different priorities. It is mostly something like “predictions of receiving sensory data that has been previously classified as good or bad, the predictions formed on the basis of doing pattern matching to past streams of sensory data”. Things like e.g. intuitive physics simulators feed into these predictions, but I suspect that even intuitive physics is not the ontology over which our values are defined; clusters of sensory experiences are that ontology, with intuitive physics being a tool for predicting how to get those experiences. This is the same sense in which you might e.g. use your knowledge of social dynamics to figure out how to get into situations which have made you feel loved in the past, but your knowledge of social dynamics is not the same thing as the experience of being loved.


This sequence is part of research done for, and supported by, the Foundational Research Institute.

5 comments

Comments sorted by top scores.

comment by moridinamael · 2019-01-07T19:41:01.488Z · score: 10 (6 votes) · LW · GW

I really look forward to this Sequence.

comment by Hazard · 2019-01-10T02:21:13.947Z · score: 7 (3 votes) · LW · GW

I'm very excited to see the rest of this! Last spring I wrote the first post [LW · GW] for a sequence that had very similar intents. You posting this has given me a nudge to move forward with mine. Here's a brief outline of things I was going to look at (might be useful for you to further clarify to yourself the specific chunks of this topic you are trying to explore)

  • Give some computer architecture arguments for why it's hard to get something to be agent like, and why those arguments might apply to our minds.
  • Explore how social pressure to "act like an agent" and conform to the person-hood interface makes it difficult to notice one's own non-agentyness.
  • For me (and I'd guess others) a lot of my intentional S2 frames for valuing people seems to put a lot of weight on how "agenty" someone is. I would like to dwell on a "rescuing the utility function" like move for agency.
comment by Kaj_Sotala · 2019-01-17T10:02:43.736Z · score: 3 (1 votes) · LW · GW

Sounds like our posts could be nicely complementary, I encourage you to continue posting yours! And huh, you scooped me on the "agenthood is a leaky abstraction" idea, I didn't realize it had been previously used on LW. :)

comment by avturchin · 2019-01-07T15:08:32.061Z · score: 6 (4 votes) · LW · GW

There is an interesting psychotherapeutic technique of calling up subpersonalities one by one, called "Voice Dialogue", which was developed by the Stones. I experienced a few surprising results from it, both as a facilitator and as a subject of the therapy. This technique may be used to demonstrate the soundness of the subpersonalities theory for those who doubt it - or to query the subpersonalities one by one, maybe with the goal of learning their values for AI alignment. This is their site: http://delos-inc.com/

comment by KyriakosCH · 2019-10-30T18:38:22.481Z · score: 1 (1 votes) · LW · GW

I wish to examine a point in the foundations of your post - to be more precise, a point which leads to the inevitable conclusion that it is not problematic in this discussion to use the term 'agent' while it is understood in a manner which allows a thermostat to qualify as an agent.


A thermostat certainly has triggers/sensors which force a reaction when a condition has been met. However, to argue that this is akin to how a person is an agent is to argue that a rock supposedly "runs" the program known as gravity when it falls. The issue is not a lack of parallels; it is a lack of undercurrent below the parallels (in a sense, this is what causes the view that a thermostat is an agent to be a 'leaky abstraction', as you put it). For we have to consider that no actual identification of change (be it through sense or thought or both) is possible when the entity identifying such change lacks the ability to translate it in a setting of its own. By translating I mean something readily evident in the case of human agents - not so evident in the case of ants or other relatively simpler creatures. If your room is on fire you identify this as a change from the normal, but this does not mean there is only one way to identify the changed situation. Someone living next to you will also identify that there is a fire, but chances are the (to use an analogy) code for that in their mind will differ very significantly from your own. Yet on some basic level you will be in agreement that there was a fire, and you had to leave.

Now an ant, another being which has life - unlike a thermostat - picks up changes in its environment. If you try to attack it, it may go into panic mode. This, again, does not mean the act of attacking the ant is picked up as it is; it is once again translated, this time by the ant. How it translates it is not known; however, it seems impossible to argue that it merely picks up the change as something set, some block of truth with the meaning 'change/danger' etc. It picks it up due to its ability (not conscious in the case of the ant) to identify something as set, and something as a change in that original set. A thermostat has no identification of anything set, because not being alive it has no power nor need to sense a starting condition, let alone to have inside it a vortex where translations of changes are formed.


All the above is why I firmly am against the view that "agent" is to be defined in a way that both a human and a thermostat can partake in it, when the discussion is about humans and involves that term.