Do humans derive values from fictitious imputed coherence?

post by TsviBT · 2023-03-05T15:23:04.065Z · LW · GW · 8 comments

Contents

  The FIAT hypothesis
  Built-in behavior-determiners
  Some data
  Redescriptions
  Ambiguity
  Questions
8 comments

[Metadata: crossposted from https://tsvibt.blogspot.com/2022/11/do-humans-derive-values-from-fictitious.html. First completed November 1, 2022. This essay is more like research notes than exposition, so context may be missing, the use of terms may change across essays, and the text may be revised later; only the versions at tsvibt.blogspot.com are definitely up to date.]

Humans are born with some elements of their minds, and without many other elements, some of which they'll acquire as their life unfolds. In particular, the elements that we pretheoretically call "values"--aesthetic preferences, goals, life goals, squad goals, aspirations, needs, wants, yearnings, drives, cravings, principles, morals, ethics, senses of importance, and so on--are for the most part acquired or at least unfolded, rather than being explicitly present in a newborn. How does this happen? What generates these mental elements?

Hypothesis: a human derives many of zer values by imputing coherent agency to zer past behavior, and then adopting the goals of that fictitious agency as actively influential criteria for future action.

Thanks to Sam Eisenstat for relevant conversations.

The FIAT hypothesis

As a shorthand: "the FIAT hypothesis" = "the Fictitious Imputed Adopted Telos hypothesis". ("Fiat" is Latin for "may it happen" or "may it be made", which has some resonance with the FIAT hypothesis in that they both talk about a free creation of goals.) FIAT goals are goals imputed to some behavior and then adopted as goals.

Human behavior is determined by many things: built-in behavior-determiners such as the instinctive ability to breathe, socially learned behavior and values, convergent instrumental goals, and freely created autopoietic goals such as artistic goals. The FIAT hypothesis says that a major determiner of a human's behavior is the process of adopting goals based on interpreting zer past behavior as agentic.

Ze can be interpreted as asking the question: if my past behavior were the behavior of a coherent agent trying to do something, what would that something be? Then, whatever the answer was, ze adopts it as a goal--a target of more coherent behavior (more effective, more strategic, more orchestrated, more coordinated, more conscious, better resourced, more reflective, more univocal, more wasteless).

This hypothesis gives a possible answer to the question: how did evolution build something with some substantial level of agentic coherence, even though evolution can't directly program conscious concepts like "avoiding death" or "saving food" or "inclusive genetic fitness" for use as terms in a utility function for an organism to pursue?

This process could be continuous, with goals becoming gradually more coherent (and then potentially deprioritized, but usually not de-cohered). This process is iterative: it starts with built-in behavior-determiners, then adopts new FIAT goals based on past behavior mainly generated by built-in determiners (and also maybe adopts new goals for other reasons), and then adopts new goals based on past behavior influenced by previously adopted goals, including previous FIAT goals, and so on. FIAT goals come not just from a human imputing goals to zer own behavior, but also from imputing goals to the behavior of others, such as parents and leaders. Everything gets enshrined, but everything is open to criticism.

Note that calling this a hypothesis is maybe presumptuous; it's an idea, but since it's abstract and it's about a complex system, there's a lot of ambiguity between FIAT and other explanations or descriptions of behavior, and it's not necessarily obvious how to make different predictions according to the FIAT hypothesis.

Something left quite unspecified is how the FIAT process picks different possible interpretations of past behavior as serving some goal. As S.E. said, "interpretation needs a criterion".
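As a toy illustration of the iterative process described in this section, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three actions, the three candidate goals and their scores, the "total score" criterion used to pick an interpretation (which is exactly the part left unspecified above), and the fixed increment by which coherence grows each round. It is not a model of any actual mental mechanism.

```python
import random

ACTIONS = ["eat", "explore", "rest"]

# Hypothetical candidate goals: each maps an action to how well it serves the goal.
CANDIDATE_GOALS = {
    "stay_fed": {"eat": 1.0, "explore": 0.2, "rest": 0.1},
    "learn_the_world": {"eat": 0.1, "explore": 1.0, "rest": 0.1},
    "conserve_energy": {"eat": 0.3, "explore": 0.0, "rest": 1.0},
}


def impute_goal(history):
    """Return the candidate goal under which past behavior looks most purposeful.

    The criterion (highest total score over the history) is an assumption.
    """
    return max(
        CANDIDATE_GOALS,
        key=lambda goal: sum(CANDIDATE_GOALS[goal][action] for action in history),
    )


def act(builtin_weights, adopted_goal, coherence, rng):
    """With probability `coherence`, act for the adopted goal;
    otherwise fall back on the built-in behavior-determiners."""
    if adopted_goal is not None and rng.random() < coherence:
        scores = CANDIDATE_GOALS[adopted_goal]
        return max(ACTIONS, key=scores.get)
    weights = [builtin_weights[action] for action in ACTIONS]
    return rng.choices(ACTIONS, weights=weights)[0]


def run_fiat(builtin_weights, rounds=5, steps_per_round=50, seed=0):
    """Alternate between behaving and re-imputing a goal to all behavior so far."""
    rng = random.Random(seed)
    history, goal, coherence = [], None, 0.0
    trace = []
    for _ in range(rounds):
        for _ in range(steps_per_round):
            history.append(act(builtin_weights, goal, coherence, rng))
        goal = impute_goal(history)             # interpret past behavior as agentic
        coherence = min(1.0, coherence + 0.2)   # pursue the adopted goal more coherently
        trace.append((goal, coherence))
    return trace
```

Calling `run_fiat({"eat": 0.8, "explore": 0.1, "rest": 0.1})` starts from behavior produced only by the built-in weights, imputes a goal to it, and then feeds goal-directed behavior back into later rounds of imputation. Swapping in a different `impute_goal` is a cheap way to see how much the outcome depends on the choice of criterion.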

Built-in behavior-determiners

Organisms are born with features that partially generate behavior. (Or, that partially determine behavior, or partially direct behavior, viewing behavior as a free and open creation of the organism's mind.) More specifically, they're born with features that partially determine the direction of the effect on the world of their behavior, aside from the magnitude of that effect.

These behavior-determiners can to some extent be viewed as "hard-coded values", in the sense that they determine something about the direction of the effect on the world of a human's behavior. The FIAT hypothesis says that a human notices these directions, and then pursues them further than the built-in behavior-determiners pursue them.

Some overlapping examples:

Some data

The FIAT hypothesis is ambiguous with other explanations of behavior; see below. So the following possible FIAT goals are not clear examples, and could be taken as questions: are these FIAT goals? Why do humans behave like this? What are the goals involved (the aims of the behavior), if any, and by what force or reason or process are those goals created and adopted?

Imagine that a baby girl, toddling around in the course of her initial tentative investigations, reaches up onto a countertop to touch a fragile and expensive glass sculpture. She observes its color, sees its shine, feels that it is smooth and cold and heavy to the touch. Suddenly her mother interferes, grasps her hand, tells her not to ever touch that object. The child has just learned a number of specifically consequential things about the sculpture—has identified its sensory properties, certainly. More importantly, however, she has determined that approached in the wrong manner, the sculpture is dangerous (at least in the presence of mother); has discovered as well that the sculpture is regarded more highly, in its present unaltered configuration, than the exploratory tendency—at least (once again) by mother.

Redescriptions

Some ways to redescribe FIAT and related processes:

[Image: screenshot from a video by Owen's DIY]

Ambiguity

There's a lot of ambiguity between the FIAT hypothesis and other descriptions or explanations of behavior.

Some of the ambiguity is hypothetico-deductive ambiguity, i.e. testable, resolvable uncertainty between hypotheses that make different predictions. E.g. humans sometimes adopt the mere appearance of holding a value in order to signal to other humans. Mere signaling makes different predictions than FIAT when the signaling value of behavior is decreased. When people are watching, both signaling and FIAT strongly predict that a person will act as though ze has the socially desirable value, but when people aren't watching, FIAT strongly predicts the person will still behave as though ze holds the value, whereas signaling only weakly predicts that (though still isn't too surprised, because of uncertainty about being caught, and self-signaling as an aid to future signaling for some reason).
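To make the watched/unwatched comparison concrete, here is a small Bayesian sketch in Python. The likelihoods are invented for illustration and only encode the qualitative claims above: both hypotheses strongly predict value-consistent behavior when people are watching, while only FIAT strongly predicts it when they are not. The names `P_CONSISTENT` and `posterior_fiat` are hypothetical, and treating observations as independent is a simplifying assumption.

```python
# Assumed probabilities that a person acts consistently with the value,
# under each hypothesis and observation condition. Illustrative numbers only.
P_CONSISTENT = {
    ("fiat", "watched"): 0.9,
    ("signaling", "watched"): 0.9,
    ("fiat", "unwatched"): 0.9,
    ("signaling", "unwatched"): 0.4,
}


def posterior_fiat(observations, prior_fiat=0.5):
    """Posterior probability of FIAT versus mere signaling.

    `observations` is a list of (condition, consistent) pairs, where condition
    is "watched" or "unwatched" and consistent is a bool. Observations are
    treated as independent given the hypothesis.
    """
    odds = prior_fiat / (1 - prior_fiat)
    for condition, consistent in observations:
        p_fiat = P_CONSISTENT[("fiat", condition)]
        p_signaling = P_CONSISTENT[("signaling", condition)]
        if not consistent:
            p_fiat, p_signaling = 1 - p_fiat, 1 - p_signaling
        odds *= p_fiat / p_signaling  # likelihood ratio for this observation
    return odds / (1 + odds)
```

Under these assumed numbers, watched observations have a likelihood ratio of 1, so they never move the posterior away from the prior. Only unwatched behavior discriminates between the two hypotheses: consistent unwatched behavior favors FIAT, and inconsistent unwatched behavior favors signaling.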

Some of the ambiguity is descriptional ambiguity, i.e. there's more than one useful and true way to describe a situation. E.g. does a soap bubble want to have low surface area, or is it evolving under local laws of gas pressure and surface tension? Both, kind of, though the "wanting" needs more qualification than the law-following. E.g., is the sunk cost heuristic due to a FIAT process adopting goals based on past investments, or due to a more narrow heuristic or bias towards relying on a cached plan to invest in something until it pays off or obviously completely fails? These aren't necessarily mutually exclusive: we might want to interpret the FIAT process as being not some sort of unified, separate brain module, but, like many evolved mental processes, as a class of mechanisms and behaviors evolved for the same reason, towards the same end. So there may be a narrow cache-reliance bias, and this could be viewed as evolution having found that narrow mechanism for the general reason that the mechanism tends to contribute to FIAT-like behavior, which is good in general because it avoids thrashing (such as investing and then abandoning the investment).

Some things that FIAT is ambiguous with (besides the above redescriptions, which might themselves be separable hypotheses ambiguous with FIAT):

Questions

Can the FIAT hypothesis be cashed out into concrete, testable predictions?

By what criterion do or should humans select among the possible interpretations of past behavior as goal-pursuit?

In what sense can imputed goals be fictitious? What are the possible differences between a fictitious goal and a goal that's real but tacit, incompetently pursued, secret, etc.?

Is this related to Deutsch's theory of everything being open to criticism, even goals and values?

How does this relate to corrigibility?

Can or ought one impute specific goals to the FIAT process itself?

What are some clear non-examples of FIAT goals, besides built-in drives? E.g. is the subset of morality that could be derived from having to cope with the neighbors, e.g. "fairness", a value that's clearly not adopted by FIAT, but rather by symmetrization?

8 comments


comment by abramdemski · 2023-03-06T17:08:19.248Z · LW(p) · GW(p)

FIAT (by another name) was previously proposed in the book On Intelligence. The version there had a somewhat predictive-processing-like story where the cortex makes plans by prediction alone; so reflective agency (really meaning: agency arising from the cortex) is entirely dependent on building a self-model which predicts agency. Other parts of the brain are responsible for the reflexes which provide the initial data which the self-model gets built on (similar to your story).

The continuing kick toward higher degrees of agency comes from parts of the brain which have reactions to the predictions made by the cortex. (Otherwise, the cortex just learns to predict the raw reflexes, and we're stuck imitating our baby selves or something along those lines). 

It's not clear precisely how all of that works, but basically it means we have a pure predictive system (and much of the time we simply take the predicted actions), plus we have some other stuff (EG reflexes, and an override RLish system which inhibits and/or replaces the predicted action under some circumstances).

The most obvious version of FIAT which someone might write down after reading your post, otoh, is more like: run some IRL technique on your own past actions, and then (most of the time) plan based on the inferred goals, again with some overrides (built-in reflexes).

Anyway.

Here's my attempt to make a probably-false prediction from FIAT, as best I can.

It seems like the thing to do is to look for cases where people pursue their own goals, rather than the goals they would predict they have based on past actions.

It needs to be complex enough to not plausibly be a reflex/instinct. 

A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal. The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too". 

So it's got to be a case where someone does something unexpected, even to themselves, which they don't see people do, but which achieves goals-they-plausibly-had-in-hindsight.

Homosexual intercourse in the 1800s??

Christopher Thomas Knight heading off into the woods??

Replies from: TsviBT
comment by TsviBT · 2023-03-12T20:13:16.094Z · LW(p) · GW(p)

I don't recall seeing that theory in the first quarter of the book, but I'll look for it later. I somewhat agree with your description of the difference between the theories (at least, as I imagine a predictive processing flavored version). Except, the theories are more similar than you say, in that FIAT would also allow very partial coherentifying, so that it doesn't have to be "follow these goals, but allow these overrides", but can rather be, "make these corrections towards coherence; fill in the free parameters with FIAT goals; leave all the other incoherent behavior the way it is". A difference between the theories (though I don't feel I can pass the PP ITT) is that FIAT allows, you know, agency, as in, non-myopic goal pursuit based on coherent-world-model-building, whereas PP maybe strongly hints against that?

> It seems like the thing to do is to look for cases where people pursue their own goals, rather than the goals they would predict they have based on past actions.

I'm confused by this; are these supposed to be mutually exclusive? What's "their own goals"? [After thinking more: Oh like you're saying, here's what it would look like to have a goal that can't be explained as a FIAT goal? I'll assume that in the rest of this comment.]

> It needs to be complex enough to not plausibly be a reflex/instinct.

Agreed.

> A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal.

I'm not sure I buy that it can't be inferred, even the first time. Maybe you have fairly built-in instincts that aren't about the whole courtship thing, but cause you to feel good when you're around someone. So you seek being around them, and pay attention to them. You try to get them interested in being around you. This builds up the picture of a goal of being together for a long time. (This is a pretty poor explanation as stated; if this explanation works, why wouldn't you just randomly fall in love with anyone you do a favor for? But this is why it's at least plausible to me that the behavior could come from a FIAT-like thing. And maybe that's actually the case with homosexual intercourse in the 1800s.)

> The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too".

Maybe courtship is especially much like this, but in general things sort-of-well-explainable as imitation seem like admissible falsifications of FIAT, e.g. if there are also pressures against the behavior.

comment by MadHatter · 2023-03-05T18:11:39.621Z · LW(p) · GW(p)

FIAT is (somewhat) reminiscent of a humanities concept called interpellation.

comment by Mateusz Bagiński (mateusz-baginski) · 2023-07-17T15:00:58.238Z · LW(p) · GW(p)

A potential culture-level historical case of FIAT: AFAIK, Jewish monotheism emerged sometime in the 5th century BCE, in the aftermath of (during?) the Babylonian captivity. Before that, Jews were henotheistic, with a slight "preference" for JHWH. When their country was conquered, "they reasoned": "we must have insulted the God with our cult of other gods (otherwise he wouldn't allow Babylonians to enslave us), so let's erase all explicit mentions of polytheism from the scriptures and ban the worship of non-JHWH gods".

Also, I wonder how FIAT relates to the uniquely human capacity and tendency/drive to overimitate others. Is overimitation tied to inferring latent reasons for behavior, such that re-applying that mode of thinking to one's past self results in FIAT?

comment by Max H (Maxc) · 2023-03-06T04:58:50.298Z · LW(p) · GW(p)

> If there are people who sometimes really pursue power and money for their own sake--so that there's nothing specifically determined that they're going to do with the power or money, however much they get--one explanation for this would be that it's a FIAT goal born of interpreting the instrumental goal. (There are probably a lot of other explanations for this behavior; e.g. they may be traumatized into non-goal-pursuing behavior that locally seeks power, like a forest fire.)

Is the parenthetical here misplaced? It seems unrelated to the text that precedes it.

Replies from: TsviBT
comment by TsviBT · 2023-03-12T19:34:17.284Z · LW(p) · GW(p)

It's giving an alternative explanation of the observation.

Replies from: Maxc
comment by Max H (Maxc) · 2023-03-12T21:19:47.647Z · LW(p) · GW(p)

Ah, I initially interpreted "forest fire" literally, as the event that traumatizes someone into non-goal-pursuing behavior. I see now that it's supposed to be parsed as a figurative description of how the behavior itself spreads.

Replies from: TsviBT
comment by TsviBT · 2023-03-12T21:25:02.391Z · LW(p) · GW(p)

Oh right sorry. Yeah, exactly.