Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle

zack_m_davis

post by Zack_M_Davis · 2020-07-14T06:03:17.761Z · LW · GW · 20 comments

"Why didn't you tell him the truth? Were you afraid?"

"I'm not afraid. I chose not to tell him, because I anticipated negative consequences if I did so."

"What do you think 'fear' is, exactly?"

The Generalized Anti-Zombie Principle [LW · GW] calls for us to posit "consciousness" as causally upstream of reports of phenomenological experience (even if the causal link might be complicated and we might be wrong about the details of what consciousness is). If you're already familiar with conscious humans, then maybe you can specifically engineer a non-conscious chatbot that imitates the surface behaviors of humans talking about their experiences, but you can't have a zombie that just happens to talk about being conscious for no reason.

A similar philosophical methodology may help us understand other mental phenomena that we cannot perceive directly, but infer from behavior. The Hansonian Generalized Anti-Zombie Principle calls for us to posit "intent" as causally upstream of optimized behavior (even if the causal link might be complicated and we might be wrong about the details of what intent is). You can't have a zombie that just happens to systematically select actions that result in outcomes that rank high with respect to a recognizable preference ordering for no reason.

It's tempting to think that consciousness isn't part of the physical universe. Seemingly, we can imagine a world physically identical to our own—the same atom-configurations evolving under the same laws of physics—but with no consciousness, a world inhabited by philosophical "zombies" [LW · GW] who move and talk, but only as mere automatons, without the spark of mind within.

It can't actually work that way. When we talk about consciousness, we do so with our merely physical lips or merely physical keyboards. If consciousness were not part of the physical universe, then the causal explanation for talk about consciousness would have to either exist entirely within physics (in which case anything we say about consciousness is causally unrelated to consciousness, which is absurd), or there would need to be some place where the laws of physics are violated as the immaterial soul is observed to be "tugging" on the brain (which is in-principle experimentally detectable). Zombies can't exist.

But if consciousness exists within physics, it should respect a certain "locality" [LW · GW]: if the configuration-of-matter that is you, is conscious, then almost-identical configurations should also be conscious for almost the same reasons. An artificial neuron that implements the same input-output relationships as a biological one, would "play the same role" within the brain, which would continue to compute the same externally-observable behavior.

We don't want to say that only externally-observable behavior matters and internal mechanisms don't matter at all, because substantively different internal mechanisms could compute the same behavior. Prosaically, acting exists: even the best method actors aren't really occupying the same mental state that the characters they portray would be in. In the limit, we could (pretend that we could) imagine an incomprehensibly vast Giant Lookup Table [LW · GW] that has stored the outputs that a conscious mind would have produced in response to any input. Is such a Giant Lookup Table—an entirely static mapping of inputs to outputs—conscious? Really?

But this thought experiment requires us to posit the existence of a Giant Lookup Table that just happens to mimic the behavior of a conscious mind. Why would that happen? Why would that actually happen, in the real world? (Or the closest possible world large enough to contain the Giant Lookup Table.) "Just assume it happened by coincidence, for the sake of the thought experiment" is unsatisfying, because that kind of arbitrary miracle doesn't help us understand what kind of cognitive work the ordinary simple concept [LW · GW] of consciousness is doing for us. You can assume that a broken and scrambled egg will spontaneously reassemble itself for the sake of a thought experiment, but the interpretation of your thought-experimental results may seem tendentious given that we have Godlike confidence [LW · GW] that you will never, ever see that happen in the real world [LW · GW].

The hard problem of consciousness is still confusing unto me—it seems impossible [LW · GW] that any arrangement of mere matter could add up to the ineffable qualia of subjective experience. But the easier and yet clearly somehow related problem of how mere matter can do information-processing—can do things like construct "models" by using sensory data to correlate its internal state with the state of the world [LW · GW]—seems understandable, and a lot of our ordinary use of the concept of consciousness necessarily deals with the "easy" problems, like how perception works or how to interpret people's self-reports, even if we can't see the identity [LW · GW] between the hard problem and the sum of all the easy problems. Whatever the true referent of "consciousness" is—however confused our current concept of it may be—it's going to be, among other things, the cause of our thinking that we have [LW · GW] "consciousness."

If I were to punch you in the face, I could anticipate the experience [LW · GW] of you reacting somehow—perhaps by saying, "Ow, that really hurt! I'm perceiving an ontologically-basic quale of pain right now! I hereby commit to exact a costly revenge on you if you do that again, even at disproportionate cost to myself!" The fact that the human brain has the detailed functional structure to compute that kind of response, whereas rocks and trees don't, is why we can be confident that rocks and trees don't secretly have minds like ours [LW · GW].

We recognize consciousness by its effects because we can only recognize anything by its effects. For a much simpler example, consider the idea of sorting. Human alphabets aren't just a set of symbols—we also have a concept of the alphabet coming in some canonical order. The order of the alphabet doesn't play any role in the written language itself: you wouldn't have trouble reading books from an alternate world where the order of the Roman alphabet ran KUWONSEZYFIJTABHQGPLCMVDXR, but all English words were the same—but you would have trouble finding the books on a shelf that wasn't sorted in the order you're used to. Sorting is useful because it lets us find things more easily: "The title I'm looking for starts with a P, but the book in front of me starts with a B; skip ahead" is faster than "look at every book until you find the one".
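To make the bookshelf point concrete, here is a minimal Python sketch, not anything from the original argument. It sorts a shelf under the alternate alphabet quoted above and compares "look at every book" with a binary search. The titles and the helper names (`alt_key`, `linear_find`, `sorted_find`) are made up for illustration.

```python
from bisect import bisect_left

# The alternate alphabet order quoted in the text.
ALT_ORDER = "KUWONSEZYFIJTABHQGPLCMVDXR"
RANK = {letter: position for position, letter in enumerate(ALT_ORDER)}

def alt_key(title):
    """Translate a title into a list of letter ranks under ALT_ORDER (non-letters are skipped)."""
    return [RANK[ch] for ch in title.upper() if ch in RANK]

def linear_find(shelf, title):
    """Look at every book until you find the one. Returns (index, books_examined)."""
    for examined, book in enumerate(shelf, start=1):
        if book == title:
            return examined - 1, examined
    return -1, len(shelf)

def sorted_find(shelf, keys, title):
    """Binary search over a shelf already sorted by alt_key. Returns the index, or -1."""
    i = bisect_left(keys, alt_key(title))
    return i if i < len(shelf) and shelf[i] == title else -1

# Hypothetical titles, chosen only so that each starts with a different letter.
titles = ["Pears", "Bananas", "Dates", "Apples", "Raisins"]
shelf = sorted(titles, key=alt_key)
keys = [alt_key(t) for t in shelf]

print(shelf)
print(linear_find(shelf, "Dates"))
print(sorted_find(shelf, keys, "Dates"))
```

Under this order P comes before D, so "Pears" lands ahead of "Dates" on the shelf. The words themselves are unchanged; only where you would look for them differs.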

In the days before computers, the work of sorting was always done by humans: if you want your physical bookshelf to be alphabetized, you probably don't have a lot of other options than manually handling the books yourself ("This title starts with a Pl; I should put it ... da da da here, after this title starting with Pe but before its neighbor starting with Po"). But the computational work of sorting is simple enough that we can program computers to do it and prove theorems about what is being accomplished, without getting confused about the sacred mystery [LW · GW] of sorting-ness.

Very different systems can perform the work of sorting, but whether it's a human tidying her bookshelf, or a punchcard-sorting machine, or a modern computer sorting in RAM, it's useful to have a short word [LW · GW] to describe processes that "take in" some list of elements, and "output" a list with the same elements ordered with respect to some criterion, for which we can know that the theorems we prove about sorting-in-general will apply to any system that implements sorting. (For example, sorting processes that can only compare two items to check which is "greater" (as opposed to being able to exploit more detailed prior information about the distribution of elements) can expect to have to perform on the order of $n \log n$ comparisons, where $n$ is the length of the list.)
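As a rough illustration of the comparison-counting claim, the following Python sketch counts the comparisons made by a plain top-down merge sort. It sets that count beside $\lceil \log_2 n! \rceil$, the standard worst-case lower bound for comparison sorts, which grows like $n \log n$. The function names, the list length, and the random seed are arbitrary choices for this sketch.

```python
import math
import random

def merge_sort(items, counter):
    """Return a sorted copy of items, adding 1 to counter[0] for every comparison made."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one "which is greater?" question
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def comparison_lower_bound(n):
    """ceil(log2(n!)): comparisons some input of n distinct items must force on any comparison sort."""
    return math.ceil(math.log2(math.factorial(n)))

n = 1000
data = random.Random(0).sample(range(10 * n), n)  # n distinct integers
counter = [0]
result = merge_sort(data, counter)

print("comparisons used:", counter[0])
print("worst-case lower bound:", comparison_lower_bound(n))
print("n * ceil(log2 n):", n * math.ceil(math.log2(n)))
```

The lower bound is a statement about the worst case over all inputs, so a single run on a lucky input can come in under it. The point is only that the counts live on the $n \log n$ scale rather than the $n$ or $n^2$ scale.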

Someone who wasn't familiar with computers might refuse to recognize sorting algorithms as real sorting, as opposed to mere "artificial sorting" [LW · GW]. After all, a human sorting her bookshelf intends to put the books in order, whereas the computer is just an automaton following instructions, and doesn't intend anything at all—a zombie sorter!

But this position is kind of silly, a gerrymandered concept definition [LW · GW]. To be sure, it's true that the internal workings of the human are very different from those of the computer. The human wasn't special-purpose programmed to sort and is necessarily doing a lot more things. The whole modality of visual perception, whereby photons bouncing off a physical copy of Rationality: AI to Zombies and absorbed by the human's retina are interpreted as evidence to construct a mental representation of the book in physical reality, whose "title" "begins" with an "R", is much more complicated than just storing the bit-pattern 1010010 (the ASCII code for R) in RAM. Nor does the computer have the subjective experience of eagerly looking forward to how much easier it will be to find books after the bookshelf is sorted. The human also probably won't perform the exact same sequence of comparisons as a computer program implementing quicksort—which also won't perform the same sequence of comparisons as a different program implementing merge sort. But the comparisons—the act of taking two things and placing them somewhere that depends on which one is "greater"—need to happen in order to get the right answer.
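The claim that different sorters ask different questions but reach the same answer can be checked directly. The Python sketch below is an illustration under assumptions: a naive first-element-pivot quicksort and a top-down merge sort, both written for this example, each logging every pair of items it compares.

```python
def quicksort(items, log):
    """Naive quicksort with the first element as pivot; logs each (item, pivot) comparison."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    for x in rest:
        log.append((x, pivot))
        (smaller if x < pivot else larger).append(x)
    return quicksort(smaller, log) + [pivot] + quicksort(larger, log)

def merge_sort(items, log):
    """Top-down merge sort; logs each (left_item, right_item) comparison."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], log)
    right = merge_sort(items[mid:], log)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        log.append((left[i], right[j]))
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    return merged + left[i:] + right[j:]

letters = list("ZOMBIE")
quick_log, merge_log = [], []
quick_result = quicksort(letters, quick_log)
merge_result = merge_sort(letters, merge_log)

print(quick_result == merge_result)  # same output
print(quick_log == merge_log)        # different sequence of comparisons
print(format(ord("R"), "07b"))       # the bit-pattern for "R" mentioned in the text
```

The first comparison quicksort makes here is O against Z (the pivot), while merge sort starts by comparing O against M, yet both return the letters in the same order.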

The concept of "sorting into alphabetical order" may have been invented before our concept of "computers", but the most natural concept [LW · GW] of sorting includes computers performing quicksort, merge sort, &c., despite the lack of intent. We might say that intent is epiphenomenal with respect to sorting.

But even if we can understand sorting without understanding intent, intent isn't epiphenomenal to the universe. Intent is part of the fabric of [LW · GW] stuff that makes stuff happen [LW · GW]: there are sensory experiences that will cause you to usefully attribute intent to some physical systems and not others.

Specifically, whatever "intent" is—however confused our current concept of it may be—it's going to be, among other things, the cause of optimized [LW · GW] behavior. We can think of something as an optimization process if it's easier to predict its effects on the world by attributing goals to it, rather than by simulating its detailed actions and internal state. "To figure out a strange plot, look at what happens, then ask who benefits."

Alex Flint identifies robustness to perturbations as another feature of optimizing systems [LW · GW]. If you scrambled the books on the shelf while the human was taking a bathroom break away from sorting, when she came back she would notice the rearranged books, and sort them again—that's because she intends to achieve the outcome of the shelf being sorted. Sorting algorithms don't, in general, have this property: if you shuffle a subarray in memory that the operation of the algorithm assumes has already been sorted, there's nothing in the code to notice or care that the "intended" output was not achieved.
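A toy Python sketch of that contrast, under assumptions of my own choosing: an in-place insertion sort stands in for "the algorithm", a hook named `perturb` stands in for someone scrambling the shelf mid-run, and `persistent_sorter` is a crude stand-in for the human who checks the outcome and sorts again. None of these names come from the cited post.

```python
def insertion_sort(items, perturb_at=None, perturb=None):
    """In-place insertion sort. If perturb_at is set, perturb(items) runs once just before that step."""
    for i in range(1, len(items)):
        if i == perturb_at:
            perturb(items)  # outside interference with the prefix assumed to be sorted
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items

def is_sorted(items):
    """True when every adjacent pair is in order."""
    return all(a <= b for a, b in zip(items, items[1:]))

def swap_first_two(items):
    """The perturbation: swap the first two entries."""
    items[0], items[1] = items[1], items[0]

def persistent_sorter(items, perturb_at, perturb):
    """Sort, then keep checking the outcome and re-sorting until the goal state holds."""
    insertion_sort(items, perturb_at, perturb)
    while not is_sorted(items):
        insertion_sort(items)
    return items

plain = insertion_sort([5, 3, 8, 1, 9, 2], perturb_at=4, perturb=swap_first_two)
robust = persistent_sorter([5, 3, 8, 1, 9, 2], perturb_at=4, perturb=swap_first_two)

print(plain, is_sorted(plain))
print(robust, is_sorted(robust))
```

The plain run finishes without complaint and returns an unsorted list, because nothing in its code re-examines the prefix it assumed was done. The wrapper reaches the sorted state only because it compares the outcome against the goal.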

Note that this is a "behaviorist", "third person" perspective: we're not talking about some subjective feeling [LW · GW] of intending something, just systems that systematically steer reality into otherwise-improbable states that rank high with respect to some preference ordering.

Robin Hanson often writes about hidden motives in everyday life, advancing the thesis that the criteria that control our decisions aren't the same as [LW · GW] the high-minded story we tell other people, and even the story we represent to ourselves. If you take a strictly first-person perspective on intent, the very idea of hidden motives seems absurd—a contradiction in terms. What would it even mean, to intend something without being aware of it? How would you identify an alleged hidden motive?

The answer is that positing hidden motives can simplify our predictions of behavior. It can be easier to "look backwards" from what goals the behavior achieves, and continues to achieve in the presence of novel obstacles, than to "look forwards" from a detailed model of the underlying psychological mechanisms (which are typically unknown [LW · GW]).

Hanson and coauthor Kevin Simler discuss the example of nonhuman primates grooming each other—manually combing each other's fur to remove dirt and parasites. One might assume that the function of grooming is just what it appears to be: hygiene. But that doesn't explain why primates spend more time grooming than they need to, why they predominately groom others rather than themselves, and why the amount of time a species spends grooming is unrelated to the amount of hair it has to groom, but is related to the size of social groupings. These anomalies make more sense if we posit that grooming has been optimized for social-political functions, to provide a credible signal of trust.^[1] (The signal has to cost something—in this case, time—in order for it to not be profitable to fake.) The hygienic function of grooming isn't unreal—parasites do in fact get removed—but the world looks more confusing [LW · GW] if you assume the behavior is optimized solely for hygiene.

This kind of multiplicity of purposes is ubiquitous. Thus, nobody does the thing they are supposedly doing [LW · GW]: politics isn't about policy, school is not about learning, medicine is not about health, &c.

There are functional reasons for some of the purposes of social behavior to be covert, to conceal or misrepresent information that it wouldn't be profitable for others to know. (And covert motivations might be a more effective design from an evolutionary perspective [LW · GW] than outright lying if it's too expensive to maintain two mental representations: the real map for ourselves, and a fake map for our victims.) This is sometimes explained as, "We self-deceive in order to better deceive others," but I fear that this formulation might suggest more "central planning" on the cognitive side of the evolutionary–cognitive boundary [LW · GW] than is really necessary: "self-deception" can arise from different parts of the mind working at cross-purposes.

Ziz discusses the example of a father attempting to practice nonviolent communication with his unruly teenage son: the father wants to have an honest and peaceful discussion of feelings and needs, but is afraid he'll lose control and become angry and threatening.

But angry threats aren't just a random mistake, in the way it's a random mistake if I forget to carry the one while adding 143 + 28. Random mistakes don't serve a purpose and don't resist correction: there's no plausible reason for me to want the incorrect answer 143 + 28 = 161, and if you say, "Hey, you forgot to carry the one," I'll almost certainly just say "Oops" and get it right the second time. Even if I'm more likely to make arithmetic errors when I'm tired, the errors probably won't correlate in a way that steers the future in a particular direction: you can't use information about what I want to make better predictions about what specific errors I'll make, nor use observations of specific errors to infer what I want.
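For the arithmetic itself, a small Python sketch of grade-school column addition shows where 161 comes from. The function name and the `forget_carry` flag are hypothetical, introduced only for this illustration.

```python
def add_by_columns(a, b, forget_carry=False):
    """Add two non-negative integers digit by digit from the right, optionally dropping carries."""
    digits_a = [int(d) for d in str(a)][::-1]
    digits_b = [int(d) for d in str(b)][::-1]
    result, carry = [], 0
    for k in range(max(len(digits_a), len(digits_b))):
        da = digits_a[k] if k < len(digits_a) else 0
        db = digits_b[k] if k < len(digits_b) else 0
        total = da + db + carry
        result.append(total % 10)
        # The "random mistake": the carry is simply lost instead of moving to the next column.
        carry = 0 if forget_carry else total // 10
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in reversed(result)))

print(add_by_columns(143, 28))                     # 171
print(add_by_columns(143, 28, forget_carry=True))  # 161
```

The error depends only on which columns happen to overflow, not on anything the person adding might want.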

In contrast, the father is likely to "lose control" and make angry threats precisely when peaceful behavior isn't getting him what he wants. That's what anger is designed to do: threaten to impose costs or withhold benefits to induce conspecifics to place more weight on the angry individual's welfare.

Another example of hidden motives: Less Wrong commenter Caravelle tells a story about finding a loophole in an online game [LW(p) · GW(p)], and being outraged to later be accused of cheating by the game administrators—only in retrospect remembering that, on first discovering the loophole, they had specifically told their teammates not to tell the administrators. The earlier Caravelle-who-discovered-the-bug must have known that the admins wouldn't allow it (or else why instruct teammates to keep quiet about it?), but the later Caravelle-who-exploited-the-bug was able to protest with perfect sincerity that they couldn't have known.

Another example: someone asks me an innocuous-as-far-as-they-know question that I don't feel like answering. Maybe we're making a cake, and I feel self-conscious about my lack of baking experience. You ask, "Why did you just add an eighth-cup of vanilla?" I initially mishear you as having said, "Did you just add ..." and reply, "Yes." It's only a moment later that I realize that that's not what you asked: you said "Why did you ...", not "Did you ...". But I don't correct myself, and you don't press the point. I am not a cognitive scientist and I don't know what was really going on in my brain when I misheard you: maybe my audio processing is just slow. But it seems awfully convenient for me that I momentarily misheard your question specifically when I didn't want to answer it and thereby reveal that I don't know what I'm doing—almost as if the elephant in my brain bet that it could get away with pretending to mishear you, and the bet paid off.

Our existing language may lack the vocabulary to adequately describe optimized behavior that comes from a mixture of overt and hidden motives. Does the father intend to make angry threats? Did the gamer intend to cheat? Was I only pretending to mishear your question, rather than actually mishearing it? We want to say No—not in the same sense that someone consciously intends to sort her bookshelf. And yet it seems useful to have short codewords [LW · GW] to talk about the aspects of these behaviors that seem optimized. The Hansonian Generalized Anti-Zombie Principle says that when someone "loses control" and makes angry threats, it's not because they're a zombie that coincidentally happens to do so when being nice isn't getting them what they want.

As Jessica Taylor explains, when our existing language lacks the vocabulary to accommodate our expanded ontology in the wake of a new discovery, one strategy for adapting our language is to define new senses of existing words that metaphorically extend the original meaning [LW · GW]. The statement "Ice is a form of water" might be new information to a child or a primitive AI who has already seen (liquid) water, and already seen ice, but didn't know that the former turns into the latter when sufficiently cold.

The word water in the sentence "Ice is a form of water" has a different extensional meaning [LW · GW] than the word water in the sentence "Water is a liquid", but both definitions can coexist as long as we're careful to precisely disambiguate which sense [LW · GW] of the word is meant in contexts where equivocation [LW · GW] could be deceptive.

We might wish to apply a similar linguistic tactic in order to be able to concisely talk about cases where we think someone's behavior is optimized to achieve goals, but the computation that determines the behavior isn't necessarily overt or conscious.

Algorithmic seems like a promising candidate for a disambiguating adjective to make it clear that we're talking about the optimization criteria implied by a system's inputs and outputs, rather than what it subjectively feels like to be that system [LW · GW]. We could then speak of an "algorithmic intent" that doesn't necessarily imply "(conscious) intent", similarly to how ice is a form of "water" despite not being "(liquid) water". We might similarly want to speak of algorithmic "honesty" (referring to signals [LW · GW] selected on the criterion of making receivers have more accurate beliefs), "deception" [LW · GW] (referring to signals selected for producing less accurate beliefs), or even "fraud" (deception that moves resources to the agent sending the deceptive signal).

Some authors might admit the pragmatic usefulness of the metaphorical extension, but insist that the new usage be marked as "just a metaphor" with a prefix such as pseudo- or quasi- [LW(p) · GW(p)]. But I claim that broad "algorithmic" senses of "mental" words like intent often are more relevant and useful for making sense of the world than the original, narrower definitions that were invented by humans in the context of dealing with other humans, because the universe in fact does not revolve around humans.

When a predatory Photuris firefly sends the mating signal of a different species of firefly in order to lure prey, I think it makes sense to straight-up call this deceptive (rather than merely pseudo- or quasi-deceptive), even though fireflies don't have language with which to think the verbal thought, "And now I'm going to send another species's mating signal in order to lure prey ..."

When a generative adversarial network learns to produce images of realistic human faces or anime characters [LW · GW], it would in no way aid our understanding to insist that the system isn't really "learning" just because it's not a human learning the way a human would—any more than it would to insist that quicksort isn't really sorting. "Using exposure to data as an input into gaining capabilities" is a perfectly adequate definition of learning in this context.

In a nearby possible future, when you sue a company for fraud because their advertising claimed that their product would disinfect wolf bites, but the product instead gave you cancer, we would hope that the court will not be persuaded if the company's defense-lawyer AI says, "But that advertisement was composed by filtering GPT-5 output for the version that increased sales the most [LW · GW]—at no point did any human form the conscious intent to deceive you!"

Another possible concern with this proposed language usage is that if it's socially permissible to attribute unconscious motives to interlocutors, people will abuse this to selectively accuse their rivals of bad intent, leading to toxic social outcomes: there's no way for negatively-valenced intent-language like "fraud" or "deception" to stably have denotative meanings [LW · GW] independently of questions of who should be punished [LW · GW].

It seems plausible to me that this concern is correct: in a human community of any appreciable size, if you let people question the stories we tell about ourselves, you are going to get acrimonious and not-readily-falsifiable accusations of bad intent. ("Liar!" "Huh? You can argue that I'm wrong, but I actually believe what I'm saying!" "Oh, maybe consciously, but I was accusing you of being an algorithmic liar.")

Unfortunately, as an aspiring epistemic rationalist, I'm not allowed to care [LW · GW] whether some descriptions might be socially harmful for a human community to adopt; I'm only allowed to care about what descriptions shorten the length of the message [LW · GW] needed to describe my observations.

[1] Robin Hanson and Kevin Simler, The Elephant in the Brain: Hidden Motives in Everyday Life, Ch. 1, "Animal Behavior"

20 comments

Comments sorted by top scores.

comment by Viliam · 2020-07-14T20:07:04.297Z · LW(p) · GW(p)

Please correct me if I'm wrong, but it seems to me that you expanded the meaning of "intent" to include any kind of optimization, even things like "he has white skin, because he intends to catch more vitamin D from the sun" describing someone who has no idea what vitamin D is, and where it comes from.

(You also mention robustness, but without specifying timescale. If the atmosphere of the Earth changes, and as a consequence my distant descendants evolve a different skin color, does this still count as our "intent" to keep catching the vitamin D?)

So it seems to me that you are redefining words, with the risk of sneaking in connotations [LW · GW]. I mean, if you start using the word "intent" in this sense, it seems likely to me that an inattentive reader would understand it as, well, intent in the usual narrower sense. And we already have enough problems with people misinterpreting the evolutionary-cognitive boundary [LW · GW], economists believing in perfectly rational behavior of everyone, etc.

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2020-07-15T13:40:52.193Z · LW(p) · GW(p)

I don’t think that’s right. As I mention in another comment [LW(p) · GW(p)], Dennett’s notion of the intentional stance is relevant here. More specifically, it provides us with a way to distinguish between cases that Zack intended to include in his concept of “algorithmic intent”, and such cases as the “catch more vitamin D” that you mention. To wit:

The positing of “algorithmic intent” is appropriate in precisely those cases where taking the intentional stance is appropriate (i.e., where—for humans—non-trivial gains in compression of description of a given agent’s behavior may be made by treating the agent’s behavior as intentional [i.e., directed toward some posited goal]), regardless of whether the agent’s conscious mind (if any!) is involved in any relevant decision loops.

Conversely, the positing of “algorithmic intent” is not appropriate in those cases where the design stance or the physical stance suffice (i.e., where no meaningful gains in compression of description of a given agent’s behavior may be made by treating the agent’s behavior as intentional [i.e., directed toward some posited goal]).

Clearly, the “catch more vitamin D” case falls into the latter category, and therefore the term “algorithmic intent” could not apply to it.

comment by gjm · 2020-07-14T11:14:12.464Z · LW(p) · GW(p)

I strongly disagree with your "not allowed to care" claim at the end, and I note with some interest that the link you offer in support of it is something you yourself wrote, and whose comment section consists mostly of other rationalists disagreeing with you. (I also note that it got a lot of upvotes, and one possible explanation for that is that a lot of people agreed with you who didn't say so in the comments.)

We get to define our words however the hell we like. There are many ways in which one set of definitions can be better or worse than another, one of which is that ceteris paribus definitions are better when they reduce average message length. But it's not the only one, and it shouldn't be the only one. (If that were the only criterion, then we should completely redesign our language to eliminate all redundancy. There are many reasons why that would be bad; for instance, errors would have worse consequences, and the language would likely be harder to learn and use because it would match our brain-hardware worse. Note that neither of those reasons is a matter of reducing message length.)

In any case, you have not by any means proved that your proposed change would reduce average message length. I think the nearest you get is where you say

But I claim that broad "algorithmic" senses of "mental" words like intent often are more relevant and useful for making sense of the world than the original, narrower definitions that were invented by humans in the context of dealing with other humans, because the universe in fact does not revolve around humans.

and I think it's very clear that falls far short of demonstrating anything. (For several reasons, of which I will here mention only one: although indeed the universe does not revolve around humans, it happens that most of the time when we are discussing things like "intent" we are talking about humans, and definitions / linguistic conventions that lead to less clarity when talking about humans and more clarity when talking about other entities that might have something like "intent" are not obviously better.)

Replies from: Zack_M_Davis, romeostevensit

↑ comment by Zack_M_Davis · 2020-07-15T06:01:16.219Z · LW(p) · GW(p)

(Regretfully, I'm too busy at the moment to engage with most of this, but one thing leapt out at me—)

I note with some interest that the link you offer in support of it is something you yourself wrote, and whose comment section consists mostly of other rationalists disagreeing with you

I don't think that should be interesting. I don't care what so-called "rationalists" think [LW · GW]; I care about what's true.

Replies from: gjm

↑ comment by gjm · 2020-07-15T07:55:58.380Z · LW(p) · GW(p)

(Why write a lengthy and potentially controversial piece if you know you haven't time to engage with responses? But --)

[EDITED to add:] Of course, maybe you have plenty of time to engage with other responses and something about mine specifically or about me specifically makes you value such engagement particularly little. In that case there'd be no particular inconsistency. I don't know of any reason why my comments, specifically, would be not worth engaging with -- but then I wouldn't, would I?

The statement "as an aspiring epistemic rationalist I am not allowed to do X" can be interpreted in three ways. (1) "Not doing X is part of what the word 'rationalist' means." (2) "Among rationalists, X is morally prohibited." (3) "X is in some fashion objectively wrong for everyone, and it happens that rationalists pay particular attention to that sort of wrongness."

The behaviour of others who consider themselves rationalists is relevant to #1 because the meaning of a word is determined by how it is actually used. It is relevant to #2 because what is prohibited in a given community is determined by the opinions of that community as a whole. It is only tangentially relevant to #3, and I suspect that #3 is your actual meaning; but (a) prima facie #1 and #2 are also possible, and (b) even with #3 I think one function of "as an aspiring epistemic rationalist" in what you wrote is to encourage readers who also think of themselves that way to feel bad about disagreeing, which I think they shouldn't and are less likely to if aware that either your usage of "rationalist" or your opinion about what "rationalists" are allowed to do is highly idiosyncratic.

Replies from: Zack_M_Davis, Zack_M_Davis

↑ comment by Zack_M_Davis · 2020-07-15T16:29:12.939Z · LW(p) · GW(p)

Why write a lengthy and potentially controversial piece if you know you haven't time to engage with responses?

Look, I have a dayjob and I'm way behind on chores! I had a blog post idea that I'd been struggling with on-and-off for months, that I finally managed to finish a passable draft of and email to prereaders Sunday night. My prereaders thought it sucked (excerpts: "should ideally be a lot tighter. 2/3 the word count?", "Don't know why [this post] exists. It seems like you have something to say, so say it"), but on Monday night I was able to edit it into something that I wasn't embarrassed to shove out the door—even if it wasn't my best work. I have a dayjob videoconference meeting starting in two minutes. Can I maybe get back to you later???

Replies from: gjm

↑ comment by gjm · 2020-07-15T20:51:23.114Z · LW(p) · GW(p)

Sure, whatever works for you. (Including not getting back to me later, of course. If what I wrote came across as trying to impose an obligation then I put it badly.) I hope the videoconference went well.

↑ comment by Zack_M_Davis · 2020-07-18T08:25:52.738Z · LW(p) · GW(p)

So, I'm still behind on chores (some laundry on the floor, some more in the dryer, that pile of boxes in the living room, &c.), but allow me to quickly clarify one thing before I get to sleep. (I might have a separate reply for the great-grandparent later.)

the meaning of a word is determined by how it is actually used

Right, and the same word can have different meanings depending on context if it gets used differently in different contexts. Specifically, I perceive the word rationalist as commonly being used with two different meanings:

(1) anyone who seeks methods of systematically correct reasoning, general ways of thinking that produce maps that reflect the territory and use those maps to formulate plans that achieve goals

(2) a member of the particular social grouping of people who read Less Wrong. (And are also likely to read Slate Star Codex, be worried about existential risk from artificial intelligence, be polyamorous and live in Berkeley, CA, &c.)

I intended my "as an aspiring epistemic rationalist" to refer to the meaning rationalist(1), whereas I read your "other rationalists disagreeing with you" to refer to rationalist(2).

An analogy: someone who says, "As a Christian, I cannot condone homosexual 'marriage'" is unlikely to be moved by the reply, "But lots of Christians at my liberal church disagree with you". The first person is trying to be a Christian(1) as they understand it—one who accepts Jesus Christ as their lord and savior and adheres to the teachings of the Bible. The consensus of people who are merely Christian(2)—those who belong to a church—is irrelevant if that church is corrupt and has departed from God's will.

Hope this helps.

one function of "as an aspiring epistemic rationalist" in what you wrote is to encourage readers who also think of themselves that way to feel bad about disagreeing

I mean, they should feel bad if and only if feeling bad helps them be less wrong.

Replies from: gjm

↑ comment by gjm · 2020-07-18T12:18:33.870Z · LW(p) · GW(p)

Yup, all understood. (I think in practice any given use of the word is likely to have a bit of both meanings in it, with or without concomitant equivocation.)

[EDITED to add:] Maybe it's worth saying a little about your analogy. To whatever extent the analogy does more than merely illustrate your distinction between rationalist-1 and rationalist-2 (and I take it it is intended to do a little more, since the distinction was perfectly clear without it), it seems that you see yourself as being in something like the position of the Hypothetical Evangelical, asking us all "do ye not therefore err, because ye know not the Sequences, neither the power of rationality?". That of course is why I dedicated most of my comment to arguing against your actual claim, that those who seek truth through clear thinking must choose their concept-boundaries without any considerations other than minimizing average description length. And my position is a bit like one I remember holding not infrequently in the past, when I was (alas) a moderately-evangelical Christian: I can see how you see the Sequences as supporting your position, but I don't think that's the only or the best way to interpret the relevant bits of the Sequences, and to whatever extent Eliezer was saying the same thing as you I'm afraid I think that Eliezer was wrong. (Yes, I am suggesting, tongue somewhat but not wholly in cheek, a parallel between some Christians' equivocation between "God's will" and "what is written in the Bible" and your appeal to the authority of Eliezer's posts about words and concepts when arguing for what seem to me inadvisably-extreme positions on how rationalists should use words.)

Incidentally, I'm aware that that's now twice in a row that I've responded very briefly and then edited in more substantive comments. I promise I'm not doing it out of any wish to deceive or anything like that. It's just that sometimes I'm right on the fence about how much it's worth saying.

↑ comment by romeostevensit · 2020-07-14T19:54:52.129Z · LW(p) · GW(p)

The claim at the end is a sarcastic poke at a particular straw rationality also hinted at by Simulacra levels etc.

Replies from: Zack_M_Davis, gjm

↑ comment by Zack_M_Davis · 2020-07-15T06:08:46.491Z · LW(p) · GW(p)

I mean, I agree that pretending Levels 2+ don't exist may not be a good strategy for getting Level 1 discourse insofar as it's hard to coordinate on ... but maybe not as hard as you think? [LW · GW]

↑ comment by gjm · 2020-07-14T21:29:17.341Z · LW(p) · GW(p)

How sure are you about that? Other things Zack's written seem to me to argue for a rather similar position without any obvious sign of irony. Zack, if you're reading this, would you care to comment?

comment by Richard_Kennaway · 2020-10-11T12:33:17.551Z · LW(p) · GW(p)

"Why didn't you tell him the truth? Were you afraid?"
"I'm not afraid. I chose not to tell him, because I anticipated negative consequences if I did so."
"What do you think 'fear' is, exactly?"

Fear is a certain emotional response to anticipated negative consequences, which may or may not be present when such consequences are anticipated. If present, it may or may not be a factor in making decisions.

comment by Dagon · 2020-07-15T14:08:07.258Z · LW(p) · GW(p)

I think this gives too much import to consciousness. It seems simplest to model it as a side-effect of some computations, roughly how some collections of models introspect themselves. The question of whether the consciousness or the calculation "decided" something needs to be un-asked; we should instead recognize that the system which includes both did it.

Most legal theory and naive human morality use "intent" heavily as a factor, likely as a mechanism to predict future risk. It seems like if we wanted to, we could replace it with a more direct model of the agent in question which predicts how much punishment is necessary to alter future actions.

comment by Said Achmiz (SaidAchmiz) · 2020-07-15T13:32:16.094Z · LW(p) · GW(p)

This discussion would be incomplete without a mention of Daniel Dennett’s notion of the intentional stance.

comment by NancyLebovitz · 2021-04-17T13:27:20.443Z · LW(p) · GW(p)

""Why didn't you tell him the truth? Were you afraid?"

"I'm not afraid. I chose not to tell him, because I anticipated negative consequences if I did so."

"What do you think 'fear' is, exactly?""

The possibly amusing thing is that I read it as being someone who thought fear was shameful and was therefore lying, or possibly lying to themself about not feeling fear. I wasn't expecting a discussion of p-zombies, though perhaps I should have been.

Does being strongly inhibited against knowing one's own emotions make one more like a p-zombie?

As for social inhibitions against denying what other people say about their motives, it's quite true that it can be socially corrosive to propose alternate motives for what people are doing, but I don't think your proposal will make things much worse.

We're already there. A lot of political discourse includes assuming the worst about the other side's motivations.

Replies from: quwgri

↑ comment by quwgri · 2025-01-20T02:09:12.238Z · LW(p) · GW(p)

If we talk about the quote at the beginning, then its final conclusion seems to me not entirely correct.
What the vast majority of people mean by "emotions" is different from "rational functions of emotions". Yudkowsky in his essay on emotions is playing with words, using terms that are not quite traditional.
Fear is not "I calmly foresee the negative consequences of some actions and therefore I avoid them."
Fear is rather "The thought of the possibility of some negative events makes me tremble, I have useless reflections, I have cognitive distortions that make me unreasonably overestimate (or, conversely, sometimes underestimate) the probability of these negative events, I begin to feel aggression towards sources of information about the possibility of these negative events (and much more in the same spirit)."
Emotions in the human understanding are not at all the same as the rational influence of basic values on behavior in Yudkowsky's interpretation.
Emotions in the human understanding are, first of all, a mad hodgepodge of cognitive distortions.
Therefore, when Yudkowsky says something like "Why do you think that AI will be emotionless? After all, it will have values!", I even see some manipulation here. Well, yes, AI will have values influencing behavior. But at the same time, it will not be nervous, freak out, or experience the halo effect. This is absolutely not what a normal ordinary person would call emotions. In fact, here Yudkowsky's imaginary opponents are closer to the truth, depicting AI as dispassionate and emotionless (because the uniform influence of values on behavior without peaks and troughs should look exactly like that).
Does it matter?
It depends. When communicating with ordinary people, we are used to using their and our cognitive distortions. When talking to a person, you know that you can suddenly change the topic of conversation and influence the emotions of the interlocutor. In communication with AI (powerful enough and having managed to modify itself well), all this will not work. It is like trying to outsmart God.
Therefore, it seems to me that a person who tunes himself to the thought "I am communicating with an impassive inhuman being" will in some sense be closer to the truth (at least, will have fewer false subconscious hopes) than a person who tunes himself to the thought "I am communicating with the same living emotional sympathetic subject that I am." But this is context-dependent.

comment by Pattern · 2020-07-15T21:30:52.838Z · LW(p) · GW(p)

Someone who wasn't familiar with computers might refuse to recognize sorting algorithms as real sorting, as opposed to mere "artificial sorting" [LW · GW]. After all, a human sorting her bookshelf intends to put the books in order, whereas the computer is just an automaton following instructions, and doesn't intend anything at all—a zombie sorter!

The opposite case is more easily made: if the sorting isn't done by a computer, then how do you know it's been done right? (Humans can make mistakes, and may operate with fallible means (of operation and reason). Computers are so logical, how can they fail at any task?)

The benefit of sorting by human (by hand) is that machines can break down, or... fail in unfamiliar/unpredictable ways. The trolley may fail to brake, where a human would fail and collapse instead of running wild.

the most natural concept [LW · GW] of sorting includes computers performing quicksort, merge sort, &c., despite the lack of intent.

There's a difference between 'natural concepts' and 'mathematical concepts'. By that token, "shuffling" is a sort.

Sorting by subject is more natural - which computers are widely used to do impressively well - up to a point.

Note that this is a "behaviorist", "third person" perspective: we're not talking about some subjective feeling [LW · GW] of intending something, just systems that systematically steer reality into otherwise-improbable states that rank high with respect to some preference ordering.

So 'algorithmic intent' isn't a good argument against zombies. Zombie - an optimizer which doesn't experience:

pain
emotion
etc.

The fact that a zombie was made by/is being used by something that isn't a zombie, i.e. the existence of necromancers, doesn't rule out the possibility of zombies.

The Hansonian Generalized Anti-Zombie Principle says that when someone "loses control" and makes angry threats, it's not because they're a zombie that coincidentally happens to do so when being nice isn't getting them what they want.

'Hidden motives' is a shorter, better name.

When a predatory Photuris firefly sends the mating signal of a different species of firefly in order to lure prey, I think it makes sense to straight-up call this deceptive

evolved deception

it would in no way aid our understanding to insist that the system isn't really "learning"

Sticking with the faces possibility - suppose such an 'AI' is somehow induced to generate an entire character, and the result doesn't make sense at all. A bunch of heads everywhere, or a bunch of smaller portraits of headshots.

It is intuitive that there is a difference between a program that 'generates' only images in the training dataset and one that generates original images. There may be real and important differences between what these programs do and what people do.

any more than it would to insist that quicksort isn't really sorting.

1. Aside from robustness to perturbation (mentioned in the post), how does quicksort perform on a sorted list? (In a world with already/partially sorted collections, quicksort may be lacking in greed.)

2. Sort a set of works from best to worst.

3. Can a program learn how to sort a new alphabet like humans can? Perhaps one can be made, or GPT may already have the capability - but a program with that ability is different from ones that don't, and failing to recognize that a program, or an entity, does (or doesn't) have the capacity to learn a thing, affects future predictions of behavior.
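To make point 1 concrete, here is a minimal sketch, assuming a textbook quicksort that always picks the first element as its pivot (real library sorts usually choose pivots more carefully, so this is an illustration, not a claim about any particular implementation). The helper name `quicksort_comparisons` and the input size are made up for the example. It counts comparisons on an already-sorted list and on a shuffled copy of the same list.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of `items` with a first-element-pivot quicksort.

    Returns (sorted_list, number_of_comparisons). An explicit stack is used
    instead of recursion so that sorted input does not hit the recursion limit.
    """
    result = list(items)
    comparisons = 0
    stack = [(0, len(result) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = result[lo]  # naive pivot choice: always the first element
        boundary = lo + 1   # result[lo+1:boundary] holds elements < pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if result[j] < pivot:
                result[boundary], result[j] = result[j], result[boundary]
                boundary += 1
        # Move the pivot into its final position.
        result[lo], result[boundary - 1] = result[boundary - 1], result[lo]
        stack.append((lo, boundary - 2))
        stack.append((boundary, hi))
    return result, comparisons


n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed, only for repeatability

sorted_out, sorted_cmp = quicksort_comparisons(already_sorted)
shuffled_out, shuffled_cmp = quicksort_comparisons(shuffled)

print("comparisons on sorted input:  ", sorted_cmp)
print("comparisons on shuffled input:", shuffled_cmp)
```

With this pivot rule, the already-sorted list is the worst case: every partition peels off a single element, so the count is n(n-1)/2, whereas the shuffled copy needs far fewer comparisons. Both calls still return a correctly sorted list, so the algorithm is "really sorting" either way. It just fails to take advantage of order that is already there.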

Unfortunately, as an aspiring epistemic rationalist, I'm not allowed to care [LW · GW]

Truth is important.* That argument is also ridiculous...

The sentence is in the wrong tense/voice. (You can do anything.)
Would the court care that (the people who work at the) company/ies that use GPT-5 'aren't allowed' to care about anything but profit?
An 'isolated demand for rigor' may be ridiculous - but is it more ridiculous than 'the isolated presence of rigor'/topic choice?
This is a choice. (Presumably no one (else) is forcing you to conform to this standard?)

Unfortunately, as an aspiring epistemic rationalist, I'm not allowed to care [LW · GW] whether some descriptions might be socially harmful for a human community to adopt; I'm only allowed to care about what descriptions shorten the length of the message [LW · GW] needed to describe my observations.

I'd guess you don't have this property, that you do care.

It would be weirdly pedantic to say this post isn't short enough - but that standard doesn't seem like it's what is being optimized for. (If what you mean is honesty or accuracy, that seems like an imperfect description.)

*Holding up "Truth" highly makes sense, but that particular relation doesn't. (And who has been banned from this site for falsehood/deceit?)

comment by TAG · 2023-10-13T16:24:19.933Z · LW(p) · GW(p)

We recognize consciousness by its effects because we can only recognize anything by its effects.

Inasmuch as we recognize consciousness, we recognise it by its effects because we can only recognize anything by its effects. But we have no way of confirming how accurate our guesswork is, particularly regarding aspects of consciousness that aren't behavioural or functional, i.e. the hard problem aspects.

The causal explanation for talk about consciousness has to either exist entirely within physics (in which case anything we say about consciousness is causally unrelated to consciousness, which is absurd), or there needs to be some place where the laws of physics are violated as the immaterial soul is observed to be “tugging” on the brain (which is in-principle experimentally detectable).

Or the laws of physics are an adequate description of reality, not the only one.

comment by Nathaniel Monson (nathaniel-monson) · 2023-10-13T10:48:02.961Z · LW(p) · GW(p)

In the dual interest of increasing your pleasantness to interact with and your epistemic rationality, I will point out that your last paragraph is false. You are allowed to care about anything and everything you may happen to care about or choose to care about. As an aspiring epistemic rationalist, the way in which you are bound is to be honest with yourself about message-description lengths, and your own values and your own actions, and the tradeoffs they reflect.

If a crazy person holding a gun said to you (and you believed) "I will shoot you unless you tell me that you are a yellow dinosaur named Timothy", your epistemic rationality is not compromised by lying to save your life (as long as you are aware it is a lie). Similarly, if you value human social groups, whether intrinsically or instrumentally, you are allowed to externally use longer-than-necessary description lengths if you so choose without any bit of damage to your own epistemic rationality. You may worry that you damage the epistemic rationality of the group or its members, but damaging a community by using the shortest description lengths could also do damage to its epistemic rationality.
