Seems helpful for understanding how believing-ins get formed by groups, sometimes.
"Global evaluation" isn't exactly what I'm trying to posit; more like a "things bottom-out in X currency" thing.
Like, in the toy model about $ from Atlas Shrugged, an heir who spends money foolishly eventually goes broke, and can no longer get others to follow their directions. This isn't because the whole economy gets together to evaluate their projects. It's because they spend their currency locally on things again and again, and the things they bet on do not pay off, do not give them new currency.
I think the analog happens in me/others: I'll get excited about some topic, pursue it for a while, get back nothing, and decide the generator of that excitement was boring after all.
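(For concreteness, here is a minimal sketch of that toy dynamic in code; the payoff odds, payoff multiplier, and per-bet spend rate are made-up illustration parameters, not claims of the model:)

```python
import random

def heir_spends(currency=100.0, payoff_odds=0.2, rounds=50, rng=random):
    """Toy 'living money' dynamic: currency is spent locally on bets,
    and only bets that pay off return new currency."""
    for _ in range(rounds):
        if currency < 1:
            return "broke: can no longer get others to follow directions"
        stake = 0.2 * currency          # spend locally, again and again
        currency -= stake
        if rng.random() < payoff_odds:  # the thing bet on pays off...
            currency += 2 * stake       # ...and returns new currency
    return f"still solvent: {currency:.1f}"

# With payoff_odds around 0.2 the heir tends to go broke;
# around 0.7, the currency tends to compound instead.
```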
Hmm. Under your model, are there ways that parts gain/lose (steam/mindshare/something)?
Does it feel to you as though your epistemic habits / self-trust / intellectual freedom and autonomy / self-honesty take a hit here?
Fair point; I was assuming you had the capacity to lie/omit/deceive, and you're right that we often don't, at least not fully.
I still prefer my policy to the OP's, but I accept your argument that mine isn't a simple Pareto improvement.
Still:
- I really don't like letting social forces put "don't think about X" flinches into my or my friends' heads; and the OP's policy seems to me like an instance of that;
- Much less importantly: as an intelligent/self-reflective adult, you may be better at hiding info if you know what you're hiding than if you have guesses you're not letting yourself see, which your friends might still notice. (The "don't look into dragons" path often still involves hiding info, since your brain often takes a guess anyhow, and that's part of how you know not to look into this one. If you acknowledge the whole situation, you can manage your relationships consciously, including taking conscious steps to buy openness-offsets and staying freely and transparently friends wherever you can work out how.)
I don't see an advantage to remaining agnostic, compared to:
1) Acquire all the private truth one can.
Plus:
2) Tell all the public truth one is willing to incur the costs of, with priority for telling public truths about what one would and wouldn't share (e.g., prioritizing not posing as more truth-telling than one is).
--
The reason I prefer this policy to the OP's "don't seek truth on low-import highly-politicized matters" is that I fear not-seeking-truth begets bad habits. Also I fear I may misunderstand how important things are if I allow politics to influence which topics-that-interest-my-brain I do/don't pursue, compared to my current policy of having some attentional budget for "anything that interests me, whether or not it seems useful/virtuous."
Yes, this is a good point; it relates to why I claimed at the top that this is an oversimplified model. I appreciate you using logic from my stated premises; it helps keep things falsifiable.
It seems to me:
- Somehow people who are in good physical health wake up each day with a certain amount of restored willpower. (This is inconsistent with the toy model in the OP, but is still my real / more-complicated model.)
- Noticing spontaneously-interesting things can be done without willpower; but carefully noticing superficially-boring details and taking notes in hopes of later payoff indeed requires willpower, on my model. (Though, for me, less than e.g. going jogging requires.)
- If you’ve just been defeated by a force you weren’t tracking, that force often becomes spontaneously-interesting. Thus people who are burnt out can sometimes take a spontaneous interest in how willpower/burnout/visceral motivation works, and can enjoy “learning humbly” from these things.
- There’s a way burnout can help cut through ~dumb/dissociated/overconfident ideological frameworks (e.g. “only AI risk is interesting/relevant to anything”), and make space for other information to have attention again, and make it possible to learn things not in one's model. Sort of like removing a monopoly business from a given sector, so that other thingies have a shot again.
I wish the above was more coherent/model-y.
Thanks for asking. The toy model of “living money”, and the one about willpower/burnout, are meant to appeal to people who don’t necessarily put credibility in Rand; I’m trying to have the models speak for themselves; so you probably *are* in my target audience. (I only mentioned Rand because it’s good to credit models’ originators when using their work.)
Re: what the payout is:
This model suggests what kind of thing an “ego with willpower” is — where it comes from, how it keeps in existence:
- By way of analogy: a squirrel is a being who turns acorns into poop, in such a way as to be able to do more and more acorn-harvesting (via using the first acorns’-energy to accumulate fat reserves and knowledge of where acorns are located).
- An “ego with willpower”, on this model, is a ~being who turns “reputation with one’s visceral processes” into actions, in such a way as to be able to garner more and more “reputation with one’s visceral processes” over time. (Via learning how to nourish viscera, and making many good predictions.)
I find this a useful model.
One way it’s useful:
IME, many people think they get willpower by magic (unrelated to their choices, surroundings, etc., although maybe related to sleep/food/physiology), and should use their willpower for whatever some abstract system tells them is virtuous.
I think this is a bad model (makes inaccurate predictions in areas that matter; leads people to have low capacity unnecessarily).
The model in the OP, by contrast, suggests that it’s good to take an interest in which actions produce something you can viscerally perceive as meaningful/rewarding/good, if you want to be able to motivate yourself to take actions.
(IME this model works better than does trying to think in terms of physiology solely, and is non-obvious to some set of people who come to me wondering what part of their machine is broken-or-something such that they are burnt out.)
(Though FWIW, IME physiology and other basic aspects of well-being also have important impacts, and food/sleep/exercise/sunlight/friends are also worth attending to.)
I mean, I see why a party would want their members to perceive the other party's candidate as having a blind spot. But I don't see why they'd be typically able to do this, given that the other party's candidate would rather not be perceived this way, the other party would rather their candidate not be perceived this way, and, naively, one might expect voters to wish not to be deluded. It isn't enough to know there's an incentive in one direction; there's gotta be more like a net incentive across capacity-weighted players, or else an easier time creating appearance-of-blindspots vs creating visible-lack-of-blindspots, or something. So, I'm somehow still not hearing a model that gives me this prediction.
You raise a good point that Susan’s relationship to Tusan and Vusan is part of what keeps her opinions stuck/stable.
But I’m hopeful that if Susan tries to “put primary focal attention on where the scissors comes from, and how it is working to trick Susan and Robert at once”, this’ll help with her stuckness re: Tusan and Vusan. Like, it’ll still be hard, but it’ll be less hard than “what if Robert is right” would be.
Reasons I’m hopeful:
I’m partly working from a toy model in which (Susan and Tusan and Vusan) and (Robert and Sobert and Tobert) all used to be members of a common moral community, before it got scissored. And the norms and memories of that community haven’t faded all the way.
Also, in my model, Susan’s fear of Tusan’s and Vusan’s punishment isn’t mostly fear of e.g. losing her income or other material-world costs. It is mostly fear of not having a moral community she can be part of. Like, of there being nobody who upholds norms that make sense to her and sees her as a member-in-good-standing of that group of people-with-sensible-norms.
Contemplating the scissoring process… does risk her fellowship with Tusan and Vusan, and that is scary and costly for Susan.
But:
- a) Tusan and Vusan are not *as* threatened by it as they would be if Susan had, e.g., been considering more directly whether Candidate X was good. I think.
- b) Susan is at least partially compensated for her partial-risk-of-losing-Tusan-and-Vusan by the hope/memory of the previous society that (Susan and Tusan and Vusan) and (Robert and Sobert and Tobert) all shared, which she has some hope of reaccessing here.
- b2) Tusan and Vusan are maybe also a bit tempted by this, which on their simpler models (since they’re engaging with Susan’s thoughts only very loosely / from a distance, as they complain about Susan) renders as “maybe she can change some of the candidate X supporters, since she’s discussing how they got tricked”
- c) There are maybe some remnant-norms within the larger (pre-scissored) community that can appreciate/welcome Susan and her efforts.
I’m not sure I’m thinking about this well, or explicating it well. But I feel there should be some unscissoring process?
I don't follow this model yet. I see why, under this model, a party would want the opponent's candidate to enrage people / have a big blind spot (and how this would keep the extremes on their side engaged), but I don't see why this model would predict that they would want their own candidate to enrage people / have a big blind spot.
Thanks; I love this description of the primordial thing. I had not noticed it this clearly/articulately before; it is helpful.
Re: why I'm hopeful about the available levers here:
I'm hoping that, instead of Susan putting primary focal attention on Robert ("how can he vote this way, what is he thinking?"), Susan might be able to put primary focal attention on the process generating the scissors statements: "how is this thing trying to trick me and Robert, how does it work?"
A bit like how a person watching a commercial for sugary snacks, instead of putting primary focal attention on the smiling person on the screen who seems to desire the snacks, might instead put primary focal attention on "this is trying to trick me."
(My hope is that this can become more feasible if we can provide accurate patterns for how the scissors-generating-process is trying to trick Susan(/Robert). And that if Susan is trying to figure out how she and Robert were tricked, by modeling the tricking process, this can somehow help undo the trick, without needing to empathize at any point with "what if candidate X is great.")
Or: by seeing themselves, and a voter for the other side, as co-victims of an optical illusion, designed to trick each of them into being unable to find the other's areas of true seeing. And by working together to figure out how the illusion works, while seeing it as a common enemy.
But my specific hypothesis here is that the illusion works by misconstruing the other voter's "Robert can see a problem with candidate Y" as "Robert can't see the problem with candidate X", and that if you focus on trying to decode the first the illusion won't kick in as much.
By parsing the other voter as "against X" rather than "for Y", and then inquiring into how they see X as worth being against, and why, while trying really hard to play taboo and avoid ontological buckets.
Huh. Is your model that surpluses are all inevitably dissipated in some sort of waste/signaling cascade? This seems wrong to me but also like it's onto something.
I like your conjecture about Susan's concern about giving Robert steam.
I am hoping that if we decode the meme structure better, Susan could give herself and Robert steam re: "maybe I, Susan, am blind to some thing, B, that matters" without giving steam to "maybe A doesn't matter, maybe Robert doesn't have a blind spot there." Like, maybe we can make a more specific "try having empathy right at this part" request that doesn't confuse things the same way. Or maybe we can make a world where people who don't bother to try that look like schmucks who aren't memetically savvy, or something. I think there might be room for something like this?
If we can get good enough models of however the scissors-statements actually work, we might be able to help more people be more in touch with the common humanity of both halves of the country, and more able to heal blind spots.
E.g., if the above model is right, maybe we could tell at least some people "try exploring the hypothesis that Y-voters are not so much in favor of Y, as against X -- and that you're right about the problems with Y, but they might be able to see something that you and almost everyone you talk to is systematically blinded to about X."
We can build a useful genre-savviness about common/destructive meme patterns and how to counter them, maybe. LessWrong is sort of well-positioned to be a leader there: we have analytic strength, and aren't too politically mindkilled.
I don't know the answer, but it would be fun to have a Twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?
I've bookmarked this; thank you; I expect to get use from this list.
Resonating from some of the OP:
Sometimes people think I have a “utility function” that is small and is basically “inside me,” and that I also have a set of beliefs/predictions/anticipations that is large, richly informed by experience, and basically a pointer to stuff outside of me.
I don’t see a good justification for this asymmetry.
Having lived many years, I have accumulated a good many beliefs/predictions/anticipations about outside events: I believe I’m sitting at a desk, that Biden is president, that 2+3=5, and so on and so on. These beliefs came about via colliding a (perhaps fairly simple, I’m not sure) neural processing pattern with a huge number of neurons and a huge amount of world. (Via repeated conscious effort to make sense of things, partly.)
I also have a good deal of specific preference, stored in my ~perceptions of “good”: this chocolate is “better” than that one; this short story is “excellent” while that one is “meh”; such-and-such a friendship is “deeply enriching” to me; this theorem is “elegant, pivotal, very cool” and that code has good code-smell while this other theorem and that other code are merely meh; etc.
My guess is that my perceptions of which things are “good” encode quite a lot of pattern that really is in the outside world, much like my perceptions of which things are “true/real/good predictions.”
My guess is that it’s confused to say my perceptions of which things are “good” are mostly about my utility function, in much the same way that it’s confused to say that my predictions about the world are mostly about my neural processing pattern (instead of acknowledging that they’re a lot about the world I’ve been encountering, and that e.g. the cause of my belief that I’m currently sitting at a desk is mostly that I’m currently sitting at a desk).
And this requires what I've previously called "living from the inside," and "looking out of your own eyes," instead of only from above. In that mode, your soul is, indeed, its own first principle; what Thomas Nagel calls the "Last Word." Not the seen-through, but the seer (even if also: the seen).
I like this passage! It seems to me that sometimes I (perceive/reason/act) from within my own skin and perspective: "what do I want now? what's most relevant? what do I know, how do I know it, what does it feel like, why do I care? what even am I, this process that finds itself conscious right now?" And then I'm more likely to be conscious, here, caring. (I'm not sure what I mean by this, but I'm pretty sure I mean something, and that it's important.)
One thing that worries me a bit about contemporary life (school for 20 years, jobs where people work in heavily scripted ways using patterns acquired in school, relatively little practice playing in creeks or doing cooking or carpentry or whatever independently) is that it seems to me it conditions people to spend less of our mental cycles "living from the inside," as you put it, and more of them ~"generating sentences designed to seem good to some external process", and I think this may make people conscious less often.
I wish I understood better what it is to "look out from your own eyes"/"live from the inside", vs only from above.
Totally. Yes.
I love that book! I like Robin's essays, too, but the book was much easier for me to understand. I wish more people would read it, would review it on here, etc.
(I don't necessarily agree with QC's interpretation of what was going on as people talked about "agency" -- I empathize some, but empathize also with e.g. Kaj's comment in a reply that Kaj doesn't recognize this at all from Kaj's 2018 CFAR mentorship training, and did not find pressures there to coerce particular kinds of thinking).
My point in quoting this is more like: if people don't have much wanting of their own, and are immersed in an ambient culture that has opinions on what they should "want," experiences such as QC's seem sorta like the thing to expect. Which is at least a bit corroborated by QC reporting it.
-section on other ways to get inside opponent's loop, not just speed -- "more inconspicuously, more quickly, and with more irregularity" as Boyd said
this sounds interesting
-personal examples from video games: Stormgate and Friends vs. Friends
I want these
the OODA loop is not as linear as this model presents
I think the steps always go in order, but also there are many OODA loops running simultaneously
In the Observe step, one gathers information about the situation around them. In Boyd's original context of fighter aircraft operations, we can imagine a pilot looking out the canopy, checking instruments, listening to radio communications, etc.
Gotcha. I'd assumed "observe" was more like "hear a crashing noise from the kitchen" -- a kinda-automatic process that triggers the person to re-take-things-in and re-orient. Is that wrong?
Some partial responses (speaking only for myself):
1. If humans are mostly a kludge of impulses, including the humans you are training, then... what exactly are you hoping to empower using "rationality training"? I mean, what wants-or-whatever will they act on after your training? What about your "rationality training" will lead them to take actions as though they want things? What will the results be?
1b. To illustrate what I mean: once I taught a rationality technique to SPARC high schoolers (probably the first year of SPARC, not sure; I was young and naive). One of the steps in the process involved picking a goal. After walking them through all the steps, I asked for examples of how it had gone, and was surprised to find that almost all of them had picked such goals as "start my homework earlier, instead of successfully getting it done at the last minute and doing recreational math meanwhile"... which I'm pretty sure was not their goal in any wholesome sense, but was more like ambient words floating around that they had some social allegiance to. I worry that if you "teach" "rationality" to adults who do not have wants, without properly noticing that they don't have wants, you set them up to be better-hijacked by the local memeset (and to better camouflage themselves as "really caring about AI risk" or whatever) in ways that won't do anybody any good, because the words that are taking the place of wants don't have enough intelligence/depth/wisdom in them.
2. My guess is that the degree of not-wanting that is seen among many members of the professional and managerial classes in today's anglosphere is more extreme than the historical normal, on some dimensions. I think this partially because:
a. IME, my friends and I as 8-year-olds had more wanting than I see in CFAR participants a lot of the time. My friends were kids who happened to live on the same street as me growing up, so probably pretty normal. We did have more free time than typical adults.
i. I partially mean: we would've reported wanting things more often, and an observer with normal empathy would on my best guess have been like "yes it does seem like these kids wish they could go out and play 4-square" or whatever. (Like, wanting you can feel in your body as you watch someone, as with a dog who really wants a bone or something).
ii. I also mean: we tinkered, toward figuring out the things we wanted (e.g. rigging the rules different ways to try to make the 4-square game work in a way that was fun for kids of mixed ages, by figuring out laxer rules for the younger ones), and we had fun doing it. (It's harder to claim this is different from the adults, but, like, it was fun and spontaneous and not because we were trying to mimic virtue; it was also this way when we saved up for toys we wanted. I agree this point may not be super persuasive though.)
b. IME, a lot of people act more like they/we want things when on a multi-day camping trip without phones/internet/work. (Maybe like Critch's post about allowing oneself to get bored?)
c. I myself have had periods of wanting things, and have had periods of long, bleached-out not-really-wanting-things-but-acting-pretty-"agentically"-anyway. Burnout, I guess, though with all my CFAR techniques and such I could be pretty agentic-looking while quite burnt out. The latter looks to me more like the worlds a lot of people today seem to be in, partly from talking to them about it, though people vary of course and it's hard to know.
d. I have a theoretical model in which there are supposed to be cycles of yang and then yin, of goal-seeking effort and then finding the goal has become no-longer-compelling and resting / getting bored / similar until a new goal comes along that is more compelling. CFAR/AIRCS participants and similar people today seem to me to often try to stop this process -- people caffeinate, try to work full days, try to have goals all the time and make progress all the time, and on a large scale there are efforts to mess with the currency to prevent economic slumps. I think there's a pattern to where good goals/wanting come from that isn't much respected. I also think there are a lot of memes trying to hijack people, and a lot of memetic control structures that get upset when members of the professional and managerial classes think/talk/want without filtering their thoughts carefully through "will this be okay-looking" filters.
All of the above leaves me with a belief that the kinds of not-wanting we see are more "living human animals stuck in a matrix that leaves them very little slack to recover and have normal wants, with most of their 'conversation' and 'attempts to acquire rationality techniques' being hijacked by the matrix they're in rather than being earnest contact with the living animals inside" and less "this is simple ignorance from critters who're just barely figuring out intelligence but who will follow their hearts better and better as you give them more tools."
Apologies for how I'm probably not making much sense; happy to try other formats.
I'm trying to build my own art of rationality training, and I've started talking to various CFAR instructors about their experiences – things that might be important for me to know but which hadn't been written up nicely before.
Perhaps off topic here, but I want to make sure you have my biggest update if you're gonna try to build your own art of rationality training.
It is, basically: if you want actual good to result from your efforts, it is crucial to build from and enable consciousness and caring, rather than to try to mimic their functionality.
If you're willing, I'd be quite into being interviewed about this one point for a whole post of this format, or for a whole dialog, or to talking about it with you by some other means, since I don't know how to say it well and I think it's crucial. But, to babble:
Let's take math education as an analogy. There's stuff you can figure out about numbers, and how to do things with numbers, when you understand what you're doing. (e.g., I remember figuring out as a kid, in a blinding flash about rectangles, why 2*3 was 3*2, why it would always work). And other people can take these things you can figure out, and package them as symbol-manipulation rules that others can use to "get the same results" without the accompanying insights. But... it still isn't the same thing as understanding, and it won't get your students the same kind of ability to build new math or to have discernment about which math is any good.
Humans are automatically strategic sometimes. Maybe not all the way, but a lot more deeply than we are in "far-mode" contexts. For example, if you take almost anybody and put them in a situation where they sufficiently badly need to pee, they will become strategic about how to find a restroom. We are all capable of wanting sometimes, and we are a lot closer to strategic at such times.
My original method of proceeding in CFAR, and some other staff members' methods also, was something like:
- Find a person, such as Richard Feynman or Elon Musk or someone a bit less cool than that but still very cool who is willing to let me interview them. Try to figure out what mental processes they use.
- Turn these mental processes into known, described procedures that system two / far-mode can invoke on purpose, even when the viscera do not care about a given so-called "goal."
(For example, we taught processes such as: "notice whether you viscerally expect to achieve your goal. If you don't, ask why not, solve that problem, and iterate until you have a plan that you do viscerally anticipate will succeed." (aka inner sim / murphyjitsu.))
My current take is that this is no good -- it teaches non-conscious processes how to imitate some of the powers of consciousness, but in a way that lacks its full discernment, and that can lead to relatively capable non-conscious, non-caring processes doing a thing that no one who was actually awake-and-caring would want to do. (And can make it harder for conscious, caring, but ignorant processes, such as youths, to tell the difference between conscious/caring intent, and memetically hijacked processes in the thrall of institutional-preservation-forces or similar.) I think it's crucial to more like start by helping wanting/caring/consciousness to become free and to become in charge. (An Allan Bloom quote that captures some but not all of what I have in mind: "There is no real education that does not respond to felt need. All else is trifling display.")
I'm not Critch, but to speak my own defense of the numeracy/scope sensitivity point:
IMO, one of the hallmarks of a conscious process is that it can take different actions in different circumstances (in a useful fashion), rather than simply doing things the way that process does it (following its own habits, personality, etc.). ("When the facts change, I change my mind [and actions]; what do you do, sir?")
Numeracy / scope sensitivity is involved in, and maybe required for, the ability to do this deeply (to change actions all the way up to one's entire life, when moved by a thing worth being moved by there).
Smaller-scale examples of scope sensitivity, such as noticing that a thing is wasting several minutes of your day each day and taking inconvenient, non-default action to fix it, can help build this power.
I am pretty far from having fully solved this problem myself, but I think I'm better at this than most people, so I'll offer my thoughts.
My suggestion is to not attempt to "figure out goals and what to want," but to "figure out blockers that are making it hard to have things to want, and solve those blockers, and wait to let things emerge."
Some things this can look like:
- Critch's "boredom for healing from burnout" procedures. Critch has some blog posts recommending boredom (and resting until quite bored) as a method for recovering one's ability to have wants after burnout:
- Physically cleaning things out. David Allen recommends cleaning out one's literal garage (or, for those of us who don't have one, I'd suggest one's literal room, closet, inbox, etc.) so as to have many pieces of "stuck goal" that can resolve and leave more space in one's mind/heart (e.g., finding an old library book from a city you don't live in anymore, and either returning it anyhow somehow, or giving up on it and donating it to goodwill or whatever, thus freeing up whatever part of your psyche was still stuck in that goal).
- Refusing that which does not "spark joy." Marie Kondo suggests getting in touch with a thing you want your house to be like (e.g., by looking through magazines and daydreaming about your desired vibe/life), and then throwing out whatever does not "spark joy", after thanking those objects for their service thus far.
- Analogously, a friend of mine has spent the last several months refusing all requests to which they are not a "hell yes," basically to get in touch with their ability to be a "hell yes" to things.
- Repeatedly asking one's viscera "would there be anything wrong with just not doing this?". I've personally gotten a fair bit of mileage from repeatedly dropping my goals and seeing if they regenerate. For example, I would sit down at my desk, would notice at some point that I was trying to "do work" instead of to actually accomplish anything, and then I would vividly imagine simply ceasing work for the week, and would ask my viscera if there would be any trouble with that or if it would in fact be chill to simply go to the park and stare at clouds or whatever. Generally I would get back some concrete answer my viscera cared about, such as "no! then there won't be any food at the upcoming workshop, which would be terrible," whereupon I could take that as a goal ("okay, new plan: I have an hour's chance to do actual work before becoming unable to do work for the rest of the week; I should let my goal of making sure there's food at the workshop come out through my fingertips and let me contact the caterers" or whatever).
- Gendlin's "Focusing." For me and at least some others I've watched, doing this procedure (which is easier with a skilled partner/facilitator -- consider the sessions or classes here if you're fairly new to Focusing and want to learn it well) is reliably useful for clearing out the barriers to wanting, if I do it regularly (once every week or two) for some period of time.
- Grieving in general. Not sure how to operationalize this one. But allowing despair to be processed, and to leave my current conceptions of myself and of my identity and plans, is sort of the connecting thread through all of the above imo. Letting go of that which I no longer believe in.
I think the above works much better in contact also with something beautiful or worth believing in, which for me can mean walking in nature, reading good books of any sort, having contact with people who are alive and not despairing, etc.
Okay, maybe? But I've also often been "real into that" in the sense that it resolves a dissonance in my ego-structure-or-something, or in the ego-structure-analog of CFAR or some other group-level structure I've been trying to defend, and I've been more into "so you don't get to claim I should do things differently" than into whether my so-called "goal" would work. Cf "people don't seem to want things."
The specific operation that happened was applying OODA loops to the concept of OODA loops.
I love this!
Surprise 4: How much people didn't seem to want things
And, the degree to which people wanted things was even more incoherent than I thought. I thought people wanted things but didn't know how to pursue them.
[I think Critch trailed off here, but implication seemed to be "basically people just didn't want things in the first place"]
I concur. From my current POV, this is the key observation that should've, and should still, instigate a basic attempt to model what humans actually are and what is actually up in today's humans. It's too basic a confusion/surprise to respond to by patching the symptoms without understanding what's underneath.
I also quite appreciate the interview as a whole; thanks, Raemon and Critch!
I'm curious to hear how you arrived at the conclusion that a belief is a prediction.
I got this in part from Eliezer's post Make your beliefs pay rent in anticipated experiences. IMO, this premise (that beliefs should try to be predictions, and should try to be accurate predictions) is one of the cornerstones that LessWrong has been based on.
I love this post. (Somehow only just read it.)
My fav part:
> In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. "Full steam" is classically rational, but we do not always want that. We might even conjecture that we never want that.
To elaborate a bit:
It seems to me that when I let projects pull me insofar as they pull me, and when I find a thing that is interesting enough that it naturally "gains steam" in my head, it somehow increases the extent to which I am locally immune from Goodhart (e.g., my actions/writing go deeper than I might've expected). OTOH, when I try hard on a thing despite losing steam as I do it, I am more subject to Goodhart (e.g., I complete something with the same keywords and external checksums as I thought I needed to hit, but it has less use and less depth than I might've expected given that).
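(To put a toy version of the quantilization idea next to this: a standard sketch of a quantilizer over a uniform base distribution of options, with names of my own choosing, not anything taken from the post:)

```python
import random

def quantilize(options, utility, q=0.1, rng=random):
    """Apply 'limited steam': optimize only as far as the top-q fraction
    of options, then pick randomly among them, rather than argmaxing."""
    ranked = sorted(options, key=utility, reverse=True)
    top = ranked[:max(1, int(q * len(ranked)))]
    return rng.choice(top)

# "Full steam" would be max(options, key=utility), which is maximally
# exposed to Goodhart when `utility` is only a proxy for what you want.
```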
I want better models of this.
Oh, man, yes, I hadn't seen that post before and it is an awesome post and concept. I think maybe "believing in"s, and prediction-market-like structures of believing-ins, are my attempt to model how Steam gets allocated.
So, I agree there's something in common -- Wittgenstein is interested in "language games" that have function without having literal truth-about-predictions, and "believing in"s are games played with language that have function and that do not map onto literal truth-about-predictions. And I appreciate the link in to the literature.
The main difference between what I'm going for here, and at least this summary of Wittgenstein (I haven't read Wittgenstein and may well be shortchanging him and you) is that I'm trying to argue that "believing in"s pay a specific kind of rent -- they endorse particular projects capable of taking investment, they claim the speaker will themself invest resources in that project, they predict that that project will yield ROI.
Like: anticipations (wordless expectations, that lead to surprise / not-surprise) are a thing animals do by default, that works pretty well and doesn't get all that buggy. Humans expand on this by allowing sentences such as "objects in Earth's gravity accelerate at a rate of 9.8m/s^2," which... pays rent in anticipated experience in a way that "Wulky Wilkinsen is a post-utopian" doesn't, in Eliezer's example. I'm hoping to cleave off, here, a different set of sentences that are also not like "Wulky Wilkinsen is a post-utopian" and that pay a different and well-defined kind of rent.
A point I didn’t get to very clearly in the OP, that I’ll throw into the comments:
When shared endeavors are complicated, it often makes sense for them to coordinate internally via a shared set of ~“beliefs”, for much the same reason that organisms acquire beliefs in the first place (rather than simply learning lots of stimulus-response patterns or something).
This sometimes makes it useful for various collaborators in a project to act on a common set of “as if beliefs,” that are not their own individual beliefs.
I gave an example of this in the OP:
- If my various timeslices are collaborating in writing a single email, it’s useful to somehow hold in mind, as a target, a single coherent notion of how I want to trade off between quality-of-email and time-cost-of-writing. Otherwise I leave value on the table.
The above was an example within me, across my subagents. But there are also examples held across sets of people, e.g. how much a given project needs money vs insights on problem X vs data on puzzle Y, and what the cruxes are that’ll let us update about that, and so on.
A “believing in” is basically a set of ~beliefs that some portion of your effort-or-other-resources is invested in taking as a premise, that usually differ from your base-level beliefs.
(Except, sometimes people coordinate more easily via things that’re more like goals or deontologies or whatever, and English uses the phrase “believing in” for marking investment in a set of ~beliefs, or in a set of ~goals, or in a set of ~deontologies.)
I made a new post just now, "Believing In," which offers a different account of some of the above phenomena.
My current take is that my old concept of "narrative syncing" describes the behaviorist outside of a pattern of relating that pops up a lot, but doesn't describe the earnest inside that that pattern is kind of designed around.
(I still think "narrative syncing" is often done without an earnest inside, by people keeping words around an old icon after the icon has lost its original earnest meaning (e.g., to manipulate others), so I still want a term for that part; I, weirdly, do not often think using the term "narrative syncing," it doesn't quite do it for me, not sure what would. Some term that is to "believing in" as lying/deceiving is to "beliefs/predictions".)
Yes, exactly. Like, we humans mostly have something that kinda feels intrinsic but that also pays rent and updates with experience, like a Go player's sense of "elegant" go moves. My current (not confident) guess is that these thingies (that humans mostly have) might be a more basic and likely-to-pop-up-in-AI mathematical structure than are fixed utility functions + updatey beliefs, a la Bayes and VNM. I wish I knew a simple math for them.
Thanks for replying. The thing I'm wondering about is: maybe it's sort of like this "all the way down." Like, maybe the things that are showing up as "terminal" goals in your analysis (money, status, being useful) are themselves composed sort of like the apple pie business, in that they congeal while they're "profitable" from the perspective of some smaller thingies located in some large "bath" (such as an economy, or a (non-conscious) attempt to minimize predictive error or something so as to secure neural resources, or a thermodynamic flow of sunlight or something). Like, maybe it is this way in humans, and maybe it is or will be this way in an AI. Maybe there won't be anything that is well-regarded as "terminal goals."
I said something like this to a friend, who was like "well, sure, the things that are 'terminal' goals for me are often 'instrumental' goals for evolution, who cares?" The thing I care about here is: how "fixed" are the goals, do they resist updating/dissolving when they cease being "profitable" from the perspective of thingies in an underlying substrate, or are they constantly changing as what is profitable changes? Like, imagine a kid who cares about playing "good, fun" videogames, but whose notion of which games are this updates pretty continually as he gets better at gaming. I'm not sure it makes that much sense to think of this as a "terminal goal" in the same sense that "make a bunch of diamond paperclips according to this fixed specification" is a terminal goal. It might be differently satiable, differently in touch with what's below it, I'm not really sure why I care but I think it might matter for what kind of thing organisms/~agent-like-things are.
There’s a thing I’m personally confused about that seems related to the OP, though not directly addressed by it. Maybe it is sufficiently on topic to raise here.
My personal confusion is this:
Some of my (human) goals are pretty stable across time (e.g. I still like calories, and being a normal human temperature, much as I did when newborn). But a lot of my other “goals” or “wants” form and un-form without any particular “convergent instrumental drives”-style attempts to protect said “goals” from change.
As a bit of an analogy (to how I think I and other humans might approximately act): in a well-functioning idealized economy, an apple pie-making business might form (when it was the case that apple pie would deliver a profit over the inputs of apples plus the labor of those involved plus etc.), and might later fluidly un-form (when it ceased to be profitable), without "make apple pies" or "keep this business afloat" becoming a thing that tries to self-perpetuate in perpetuity. I think a lot of my desires are like this (I care intrinsically about getting outdoors every day while there's profit in it, but the desire doesn't try to shield itself from change, and it'll stop if getting outdoors stops having good results. And this notion of "profit" does not itself seem obviously like a fixed utility function, I think.).
I’m pretty curious about whether the [things kinda like LLMs but with longer planning horizons that we might get as natural extensions of the current paradigm, if the current paradigm extends this way, and/or the AGIs that an AI-accidentally-goes-foom process will summon] will have goals that try to stick around indefinitely, or goals that congeal and later dissolve again into some background process that'll later summon new goals, without summoning something lasting that is fixed-utility-function-shaped. (It seems to me that idealized economies do not acquire fixed or self-protective goals, and for all I know many AIs might be like economies in this way.)
(I’m not saying this bears on risk in any particular way. Temporary goals would still resist most wrenches while they remained active, much as even an idealized apple pie business resists wrenches while it stays profitable.)
Ben Pace, honorably quoting aloud a thing he'd previously said about Ren:
the other day i said [ren] seemed to be doing well to me
to clarify, i am not sure she has not gone crazy
she might've, i'm not close enough to be confident
i'd give it 25%
I really don't like this usage of the word "crazy", which IME is fairly common in the bay area rationality community. This is for several reasons. The simple to express one is that I really read through like 40% of this dialog thinking (from its title plus early conversational volleys) that people were afraid Ren had gone, like, the kind of "crazy" that acute mania or psychosis or something often is, where a person might lose their ability to do normal tasks that almost everyone can do, like knowing what year it is or how to get to the store and back safely. Which was a set of worries I didn't need to have, in this case. I.e., my simple complaint is that it caused me confusion here.
The harder to express but more heartfelt one, is something like: the word "crazy" is a license to write people off. When people in wider society use it about those having acute psychiatric crises, they give themselves a license to write off the sense behind the perceptions of like 2% or something of the population. When the word is instead used about people who are not practicing LW!rationality, including ordinary religious people, it gives a license to write off a much larger chunk of people (~95% of the population?), so one is less apt to seek sense behind their perceptions and actions.
This sort of writing-off is a thing people can try doing, if they want, but it's a nonstandard move and I want it to be visible as such. That is, I want people to spell it out more, like: "I think Ren might've stopped being double-plus-sane like all the rest of us are" or "I think Ren might've stopped following the principles of LW!rationality" or something. (The word "crazy" hides this as though it's the normal-person "dismiss ~2% of the population" move; these other sentences make visible that it's an unusual and more widely dismissive move.) The reason I want this move to be made visible in this way is partly that I think the outside view on (groups of people who dismiss those who aren't members of the group) is that this practice often leads to various bad things (e.g. increased conformity as group members fear being dubbed out-group; increased blindness to outside perspectives; difficulty collaborating with skilled outsiders), and I want those risks more visible.
(FWIW, I'd have the same response to a group of democrats discussing republicans or Trump-voters as "crazy", and sometimes have. But IMO bay area rationalists say this sort of thing much more than other groups I've been part of.)
Thanks for this response; I find it helpful.
Reading it over, I want to distinguish between:
- a) Relatively thoughtless application of heuristics; (system-1-integrated + fast)
- b) Taking time to reflect and notice how things seem to you once you've had more space for reflection, for taking in other peoples' experiences, for noticing what still seems to matter once you've fallen out of the day-to-day urgencies, and for tuning into the "still, quiet voice" of conscience; (system-1-integrated + slow, after a pause)
- c) Ethical reasoning (system-2-heavy, medium-paced or slow).
The brief version of my position is that (b) is awesome, while (c) is good when it assists (b) but is damaging when it is acted on in a way that disempowers rather than empowers (b).
--
The long-winded version (which may be entirely in agreement with your (Tristan's) comment, but which goes into detail because I want to understand this stuff):
I agree with you and Eccentricity that most people, including me and IMO most LWers and EAers, could benefit from doing more (b) than we tend to do.
I also agree with you that (c) can assist in doing (b). For example, it can be good for a person to ask themselves "how does this action, which I'm inclined to take, differ from the actions I condemned in others?", "what is likely to happen if I do this?", and "do my concepts and actions fit the world I'm in, or is there a tiny note of discord?"
At the same time, I don't want to just say "c is great! do more c!" because I share with the OP a concern that EA-ers, LW-ers, and people in general who attempt explicit ethical reasoning sometimes end up using these to talk themselves into doing dumb, harmful things, with the OP's example of "leave inaccurate reviews at vegan restaurants to try to save animals" giving a good example of the flavor of these errors, and with historical communism giving a good example of their potential magnitude.
My take as to the difference between virtuous use of explicit ethical reasoning, and vicious/damaging use of explicit ethical reasoning, is that virtuous use of such reasoning is aimed at cultivating and empowering a person's [prudence, phronesis, common sense, or whatever you want to call a central faculty of judgment that draws on and integrates everything the person discerns and cares about], whereas vicious/damaging uses of ethical reasoning involve taking some piece of the total set of things we care about, stabilizing it into an identity and/or a social movement ("I am a hedonistic utilitarian", "we are (communists/social justice/QAnon/EA)"), and having this artificially stabilized fragment of the total set of things one cares about act directly in the world without being filtered through one's total discernment ("Action A is the X thing to do, and I am an X, so I will take action A").
(Prudence was classically considered not only a virtue, but the "queen of the virtues" -- as Wikipedia puts it "Prudence points out which course of action is to be taken in any concrete circumstances... Without prudence, bravery becomes foolhardiness, mercy sinks into weakness, free self-expression and kindness into censure, humility into degradation and arrogance, selflessness into corruption, and temperance into fanaticism." Folk ethics, or commonsense ethics, has at its heart the cultivation of a total faculty of discernment, plus the education of this faculty to include courage/kindness/humility/whatever other virtues.)
My current guess as to how to develop prudence is basically to take an interest in things, care, notice tiny notes of discord, notice what actions have historically had what effects, notice when one is oneself "hijacked" into acting on something other than one's best judgment and how to avoid this, etc. I think this is part of what you have in mind about bringing ethical reasoning into daily life, so as to see how kindness applies in specific rather than merely claiming it'd be good to apply somehow?
Absent identity-based or social-movement-based artificial stabilization, people can and do make mistakes, including e.g. leaving inaccurate reviews in an attempt to help animals. But I think those mistakes are more likely to be part of a fairly rapid process of developing prudence (which seems pretty worth it to me), and are less likely to be frozen in and acted on for years.
(My understanding isn't great here; more dialog would be great.)
I like the question; thanks. I don't have anything smart to say about it at the moment, but it seems like a cool thread.
The idea is, normally just do straightforwardly good things. Be cooperative, friendly, and considerate. Embrace the standard virtues. Don't stress about the global impacts or second-order altruistic effects of minor decisions. But also identify the very small fraction of your decisions which are likely to have the largest effects and put a lot of creative energy into doing the best you can.
I agree with this, but would add that IMO, after you work out the consequentialist analysis of the small set of decisions that are worth intensive thought/effort/research, it is quite worthwhile to additionally work out something like a folk ethical account of why your result is correct, or of how the action you're endorsing coheres with deep virtues/deontology/tropes/etc.
There are two big upsides to this process:
- As you work this out, you get some extra checks on your reasoning -- maybe folk ethics sees something you're missing here; and
- At least as importantly: a good folk ethical account will let individuals and groups cohere around the proposed action, in a simple, conscious, wanting-the-good-thing-together way, without needing to dissociate from what they're doing (whereas accounts like "it's worth dishonesty in this one particular case" will be harder to act on wholeheartedly, even when basically correct). And this will work a lot better.
IMO, this is similar to: in math, we use heuristics and intuitions and informal reasoning a lot, to guess how to do things -- and we use detailed, not-condensed-by-heuristics algebra or mathematical proof steps sometimes also, to work out how a thing goes that we don't yet find intuitive or obvious. But after writing a math proof the sloggy way, it's good to go back over it, look for "why it worked," "what was the true essence of the proof, that made it tick," and see if there is now a way to "see it at a glance," to locate ways of seeing that will make future such situations more obvious, and that can live in one's system 1 and aesthetics as well as in one's sloggy explicit reasoning.
Or, again, in coding: usually we can use standard data structures and patterns. Sometimes we have to hand-invent something new. But after coming up with the something new: it's good, often, to condense it into readily parsable/remember-able/re-useable stuff, instead of hand-rolled spaghetti code.
Or, in physics and many other domains: new results are sometimes counterintuitive, but it is advisable to then work out intuitions whereby reality may be more intuitive in the future.
I don't have my concepts well worked out here yet, which is why I'm being so long-winded and full of analogies. But I'm pretty sure that folk ethics, where we have it worked out, has a bunch of advantages over consequentialist reasoning that're kind of like those above.
- As the OP notes, folk ethics can be applied to hundreds of decisions per day, without much thought per each;
- As the OP notes, folk ethics have been tested across huge numbers of past actions by huge numbers of people. New attempts at folk ethical reasoning can't have this advantage fully. But: I think when things are formulated simply enough, or enough in the language of folk ethics, we can back-apply them a lot more on a lot of known history and personally experienced anecdotes and so on (since they are quick to apply, as in the above bullet point), and can get at least some of the "we still like this heuristic after considering it in a lot of different contexts with known outcomes" advantage.
- As OP implies, folk ethics is more robust to a lot of the normal human bias temptations ("x must be right, because I'd find it more convenient right this minute") compared to case-by-case reasoning;
- It is easier for us humans to work hard on something, in a stable fashion, when we can see in our hearts that it is good, and can see how it relates to everything else we care about. Folk ethics helps with this. Maybe folk ethics, and notions of virtue and so on, kind of are takes on how a given thing can fit together with all the little decisions and all the competing pulls as to what's good? E.g. the OP lists as examples of commonsense goods "patience, respect, humility, moderation, kindness, honesty" -- and all of these are pretty usable guides to how to be while I care about something, and to how to relate that caring to all my other cares and goals.
- I suspect there's something particularly good here with groups. We humans often want to be part of groups that can work toward a good goal across a long period of time, while maintaining integrity, and this is often hard because groups tend to degenerate with time into serving individuals' local power, becoming moral fads, or other things that aren't as good as the intended purpose. Ethics, held in common by the group's common sense, is a lot of how this is ever avoided, I think; and this is more feasible if the group is trying to serve a thing whose folk ethics (or "commonsense good") has been worked out, vs something that hasn't.
For a concrete example:
AI safety obviously matters. The folk ethics of "don't let everyone get killed if you can help it" are solid, so that part's fine. But in terms of tactics: I really think we need to work out a "commonsense good" or "folk ethics" type account of things like:
- Is it okay to try to get lots of power, by being first to AI and trying to make use of that power to prevent worse AI outcomes? (My take: maybe somehow, but I haven't seen the folk ethics worked out, and a good working out would give a lot of checks here, I think.)
- Is it okay to try to suppress risky research, e.g. via frowning at people and telling them that only bad people do AI research, so as to try to delay tech that might kill everyone? (My take: probably, on my guess -- but a good folk ethics would bring structure and intuitions somehow, like, it would work out how this is different from other kinds of "discourage people from talking and figuring things out," it would have perceivable virtues or something for noticing the differences, which would help people then track the differences on the group commonsense level in ways that help the group's commonsense not erode its general belief in the goodness of people sharing information and doing things).
I agree, '"Flinching away from truth" is often caused by internal conflation" would be a much better title -- a more potent short take-away. (Or at least one I more agree with after some years of reflection.) Thanks!
I enjoyed this post, both for its satire of a bunch of peoples' thinking styles (including mine, at times), and because IMO (and in the author's opinion, I think), there are some valid points near here and it's a bit tricky to know which parts of the "jokes/poetry" may have valid analogs.
I appreciate the author for writing it, because IMO we have a whole bunch of different subcultures and styles of conversation and sets of assumptions colliding all of a sudden on the internet right now around AI risk, and noticing the existence of the others seems useful, and IMO the OP is an attempt to collide LW with some other styles. Judging from the comments it seems to me not to have succeeded all that much; but it was helpful to me, and I appreciate the effort. (Though, as a tactical note, it seems to me the approximate failure was due mostly to the piece's sarcasm, and I suspect sarcasm in general tends not to work well across cultural or inferential distances.)
Some points I consider valid, that also appear within [the vibes-based reasoning the OP is trying to satirize, and also to model and engage with]:
1) Sometimes, talking a lot about a very specific fear can bring about the feared scenario. (An example I'm sure of: a friend's toddler stuck her hands in soap. My friend said "don't touch your eyes." The toddler, unclear on the word 'not,' touched her eyes.) (A possible example I'm less confident in: articulated fears of AI risk may have accelerated AI because humanity's collective attentional flows, like toddlers, have no reasonable implementation of the word "not.") This may be a thing to watch out for for an AI risk movement.
(I think this is non-randomly reflected in statements like: "worrying has bad vibes.")
2) There are a lot of funny ways that attempting to control people or social processes can backfire. (Example: lots of people don't like it when they feel like something is trying to control them.) (Example: the prohibition of alcohol in the US from 1920 to 1933 is said to have fueled organized crime.) (Example I'm less confident of: Trying to keep e.g. anti-vax views out of public discourse leads some to be paranoid, untrusting of establishment writing on the subject.) This is a thing that may make trouble for some safety strategies, and that seems to me to be non-randomly reflected in "trying to control things has bad vibes."
(Though, all things considered, I still favor trying to slow things! And I care about trying to slow things.)
3) There're a lot of places where different Schelling equilibria are available, and where groups can, should, and do try to pick the equilibrium that is better. In many cases this is done with vibes. Vibes, positivity, attending to what is or isn't cool or authentic (vs boring), etc., are part of how people decide which company to congregate on, which subculture to bring to life, which approach to AI to do research within, etc. -- and this is partly doing some real work discerning what can become intellectually vibrant (vs boring, lifeless, dissociated).
TBC, I would not want to use vibes-based reasoning in place of reasoning, and I would not want LW to accept vibes in place of reasons. I would want some/many in LW to learn to model vibes-based reasoning for the sake of understanding the social processes around us. I would also want some/many at LW to sometimes, if the rate of results pans out in a given domain, use something like vibes-based reasoning as a source of hypotheses that one can check against actual reasoning. LW seems to me pretty solid on reasoning relative to other places I know on the internet, but only mediocre on generativity; I think learning to absorb hypotheses from varied subcultures (and from varied old books, from people who thought at other times and places) would probably help, and the OP is gesturing at one such subculture.
I'm posting this comment because I didn't want to post this comment for fear of being written off by LW, and I'm trying to come out of more closets. Kinda at random, since I've spent large months or small years failing to successfully implement some sort of more planned approach.