Posts

Is "VNM-agent" one of several options, for what minds can grow up into? 2024-12-30T06:36:20.890Z
Ayn Rand’s model of “living money”; and an upside of burnout 2024-11-16T02:59:07.368Z
Scissors Statements for President? 2024-11-06T10:38:21.230Z
Believing In 2024-02-08T07:06:13.072Z
Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)'s training corpus? 2023-03-29T05:17:28.000Z
Are there specific books that it might slightly help alignment to have on the internet? 2023-03-29T05:08:28.364Z
What should you change in response to an "emergency"? And AI risk 2022-07-18T01:11:14.667Z
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" 2022-06-09T02:12:35.151Z
Narrative Syncing 2022-05-01T01:48:45.889Z
The feeling of breaking an Overton window 2021-02-17T05:31:40.629Z
“PR” is corrosive; “reputation” is not. 2021-02-14T03:32:24.985Z
Where do (did?) stable, cooperative institutions come from? 2020-11-03T22:14:09.322Z
Reality-Revealing and Reality-Masking Puzzles 2020-01-16T16:15:34.650Z
We run the Center for Applied Rationality, AMA 2019-12-19T16:34:15.705Z
AnnaSalamon's Shortform 2019-07-25T05:24:13.011Z
"Flinching away from truth” is often about *protecting* the epistemology 2016-12-20T18:39:18.737Z
Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” 2016-12-12T19:39:50.084Z
CFAR's new mission statement (on our website) 2016-12-10T08:37:27.093Z
CFAR’s new focus, and AI Safety 2016-12-03T18:09:13.688Z
On the importance of Less Wrong, or another single conversational locus 2016-11-27T17:13:08.956Z
Several free CFAR summer programs on rationality and AI safety 2016-04-14T02:35:03.742Z
Consider having sparse insides 2016-04-01T00:07:07.777Z
The correct response to uncertainty is *not* half-speed 2016-01-15T22:55:03.407Z
Why CFAR's Mission? 2016-01-02T23:23:30.935Z
Why startup founders have mood swings (and why they may have uses) 2015-12-09T18:59:51.323Z
Two Growth Curves 2015-10-02T00:59:45.489Z
CFAR-run MIRI Summer Fellows program: July 7-26 2015-04-28T19:04:27.403Z
Attempted Telekinesis 2015-02-07T18:53:12.436Z
How to learn soft skills 2015-02-07T05:22:53.790Z
CFAR fundraiser far from filled; 4 days remaining 2015-01-27T07:26:36.878Z
CFAR in 2014: Continuing to climb out of the startup pit, heading toward a full prototype 2014-12-26T15:33:08.388Z
Upcoming CFAR events: Lower-cost bay area intro workshop; EU workshops; and others 2014-10-02T00:08:44.071Z
Why CFAR? 2013-12-28T23:25:10.296Z
Meetup : CFAR visits Salt Lake City 2013-06-15T04:43:54.594Z
Want to have a CFAR instructor visit your LW group? 2013-04-20T07:04:08.521Z
CFAR is hiring a logistics manager 2013-04-05T22:32:52.108Z
Applied Rationality Workshops: Jan 25-28 and March 1-4 2013-01-03T01:00:34.531Z
Nov 16-18: Rationality for Entrepreneurs 2012-11-08T18:15:15.281Z
Checklist of Rationality Habits 2012-11-07T21:19:19.244Z
Possible meetup: Singapore 2012-08-21T18:52:07.108Z
Center for Modern Rationality currently hiring: Executive assistants, Teachers, Research assistants, Consultants. 2012-04-13T20:28:06.071Z
Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 2012-03-29T20:48:48.227Z
How do you notice when you're rationalizing? 2012-03-02T07:28:21.698Z
Urges vs. Goals: The analogy to anticipation and belief 2012-01-24T23:57:04.122Z
Poll results: LW probably doesn't cause akrasia 2011-11-16T18:03:39.359Z
Meetup : Talk on Singularity scenarios and optimal philanthropy, followed by informal meet-up 2011-10-10T04:26:09.284Z
[Question] Do you know a good game or demo for demonstrating sunk costs? 2011-09-08T20:07:55.420Z
[LINK] How Hard is Artificial Intelligence? The Evolutionary Argument and Observation Selection Effects 2011-08-29T05:27:31.636Z
Upcoming meet-ups 2011-06-21T22:28:40.610Z
Upcoming meet-ups: 2011-06-11T22:16:09.641Z

Comments

Comment by AnnaSalamon on Murder plots are infohazards · 2025-02-16T00:22:28.238Z · LW · GW

I got to the suggestion by imagining: suppose you were about to quit the project and do nothing.  And now suppose that instead of that, you were about to take a small amount of relatively inexpensive-to-you actions, and then quit the project and do nothing.  What're the "relatively inexpensive-to-you actions" that would most help?

Publishing the whole list, without precise addresses or allegations, seems plausible to me.

I guess my hope is: maybe someone else (a news story, a set of friends, something) would help some of those on the list to take it seriously and take protective action, maybe after a while, after others on the list were killed or something.  And maybe it'd be more parsable to people if it had been hanging out on the internet for a long time, as a pre-declared list of what to worry about, with visibly no one being there to try to collect payouts or something.

Comment by AnnaSalamon on Murder plots are infohazards · 2025-02-14T22:20:11.255Z · LW · GW

Maybe some of those who received the messages were more alert to their surroundings after receiving it, even if they weren't sure it was real and didn't return the phone/email/messages?

I admit this sounds like a terrible situation.

Comment by AnnaSalamon on Murder plots are infohazards · 2025-02-14T22:13:57.152Z · LW · GW

Gotcha.  No idea if this is a good or bad idea, but: what are your thoughts on dumping an edited version of it onto the internet, including names, photos and/or social media links, and city/country but not precise addresses or allegations?

Comment by AnnaSalamon on Murder plots are infohazards · 2025-02-14T20:27:07.399Z · LW · GW

Can you notify the intended victims?  Or at least the more findable intended victims?

Comment by AnnaSalamon on Is being sexy for your homies? · 2025-01-12T18:20:25.651Z · LW · GW
  • A man being deeply respected and lauded by his fellow men, in a clearly authentic and lasting way, seems to be a big female turn-on. Way way way bigger effect size than physique best as I can tell.
    • …but the symmetric thing is not true! Women cheering on one of their own doesn't seem to make men want her more. (Maybe something else is analogous, the way female "weight lifting" is beautification?)

My guess at the analogous thing: women being kind/generous/loving seems to me like a thing many men have found attractive across times and cultures, and it seems to me far more viable if a woman is embedded in a group who recognize her, tell her she is cared about and will be protected by a network of others, who in fact shield her from some kinds of conflict/exploitation, who help there be empathy for her daily cares and details to balance out the attention she gives to others, etc.  So the group plays a support role in a woman being able to have/display the quality.

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2025-01-02T17:47:11.472Z · LW · GW

Steven Byrnes writes:

 "For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a)."

My (1a) (and related (1b)), for reference:

(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware.  (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted.  If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)

(1b) There are no costs to maintaining control of your mind/hardware.  (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)

I'm happy to posit an AGI with powerful ability to self-modify.  But, even so, my (nonconfident) guess is that it won't have property (1a), at least not costlessly.

My admittedly handwavy reasoning:

  • Self-modification doesn't get you all powers: some depend on the nature of physics/mathematics.  E.g. it may still be that verifying a proof is easier than generating a proof, for our AGI.
  • Intelligence involves discovering new things, coming into contact with what we don't specifically expect (that's why we bother to spend compute on it).  Let's assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff.  The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/["utility function" if it has one]?  I... am really not sure what the answers to these questions are, even for powerful AGI that has powerfully self-modified!  There are maybe alien-to-it AGIs out there, encoded in mathematics, waiting to boot up within it as it does its reasoning.

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-31T03:20:02.141Z · LW · GW

I just paraphrased the OP for a friend who said he couldn't decipher it.  He said it helped, so I'm copy-pasting here in case it clarifies for others.

I'm trying to say:

A) There're a lot of "theorems" showing that a thing is what agents will converge on, or something, that involve approximations ("assume a frictionless plane") that aren't quite true.

B) The "VNM utility theorem" is one such theorem, and involves some approximations that aren't quite true.  So do e.g. Steve Omohundro's convergent instrumental drives, the "Gandhi folk theorems" showing that an agent will resist changes to its utility function, etc.

C) So I don't think the VNM utility theorem means that all minds will necessarily want to become VNM agents, nor to follow instrumental drives, nor to resist changes to their "utility functions" (if indeed they have a "utility function").

D) But "be a better VNM-agent", "follow the instrumental Omohundro drives", etc. might still be a self-fulfilling prophecy for some region, partially.  Like, humans or other entities who think it's rational to be VNM agents might become better VNM agents, who might become better VNM agents, for a while.

E) And there might be other [mathematically describable mind-patterns] that can serve as alternative self-propagating patterns, a la D, that're pretty different from "be a better VNM-agent."  E.g. "follow the god of Nick Land".

F) And I want to know what are all the [mathematically describable mind-patterns, that a mind might decide to emulate, and that might make a kinda-stable attractor for a while, where the mind and its successors keep emulating that mind-pattern for a while].  They'll probably each have a "theorem" attached that involves some sort of approximation (a la "assume a frictionless plane").

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-31T02:40:33.797Z · LW · GW

There is a problem that, other things equal, agents that care about the state of the world in the distant future, to the exclusion of everything else, will outcompete agents that lack that property. This is self-evident, because we can operationalize “outcompete” as “have more effect on the state of the world in the distant future”.

I am not sure about that!

One way this argument could fail: maybe agents who  care exclusively about the state of the world in the distant future end up, as part of their optimizing, creating other agents who care in different ways from that.

In that case, they would “have more effect on the state of the world in the distant future”, but they might not “outcompete” other agents (in the common-sensical way of understanding “outcompete”).

A person might think this implausible, because they might think that a smart agent who cares exclusively about X can best achieve X by having all minds they create also be [smart agents who care exclusively about X].

But, I’m not sure this is true, basically for reasons of not trusting assumptions (1), (2), (3), and (4) that I listed here.

(As one possible sketch: a mind whose only goal is to map branch B of mathematics might find it instrumentally useful to map a bunch of other branches of mathematics.  And, since supervision is not free, it might be more able to do this efficiently if it creates researchers who have an intrinsic interest in math-in-general, and who are not being fully supervised by exclusively-B-interested minds.)

Comment by AnnaSalamon on Consequentialism & corrigibility · 2024-12-30T23:24:55.522Z · LW · GW

or more centrally, long after I finish the course of action.

I don't understand why the more central thing is "long after I finish the course of action" as opposed to "in ways that are clearly 'external to' the process called 'me', that I used to take the actions."

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T23:21:17.264Z · LW · GW

Thanks; fixed.

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T20:30:04.458Z · LW · GW

I was trying to explain to Habryka why I thought (1), (3) and (4) are parts of the assumptions under which the VNM utility theorem is derived.

I think all of (1), (2), (3) and (4) are part of the context I've usually pictured in understanding VNM as having real-world application, at least.  And they're part of this context because I've been wanting to think of a mind as having persistence, and persistent preferences, and persistent (though rationally updated) beliefs about what lotteries of outcomes can be chosen via particular physical actions, and stuff.  (E.g., in Scott's example about the couple, one could say "they don't really violate independence; they just care also about process-fairness" or something, but, ... it seems more natural to attach words to real-world scenarios in such a way as to say the couple does violate independence.  And when I try to reason this way, I end up thinking that all of (1)-(4) are part of the most natural way to try to get the VNM utility theorem to apply to the world with sensible, non-Grue-like word-to-stuff mappings.)

I'm not sure why Habryka disagrees.  I feel like lots of us are talking past each other in this subthread, and am not sure how to do better.

I don't think I follow your (Mateusz's) remark yet.

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T20:07:44.635Z · LW · GW

I... don't think I'm taking the hidden order of the universe non-seriously.  If it matters, I've been obsessively rereading Christopher Alexander's "The nature of order" books, and trying to find ways to express some of what he's looking at in LW-friendly terms; this post is part of an attempt at that.  I have thousands and thousands of words of discarded drafts about it.

Re: why I think there might be room in the universe for multiple aspirational models of agency, each of which can be self-propagating for a time, in some contexts: Biology and culture often seem to me to have multiple kinda-stable equilibria.  Like, eyes are pretty great, but so is sonar, and so is a sense of smell, or having good memory and priors about one's surroundings, and each fulfills some of the same purposes.  Or diploidy and haplodiploidy are both locally-kinda-stable reproductive systems.

What makes you think I'm insufficiently respecting the hidden order of the universe?

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T19:46:18.113Z · LW · GW

I agree.  I love "Notes on the synthesis of form" by Christopher Alexander, as a math model of things near your vase example.

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T19:33:11.970Z · LW · GW

I agree with your claim that VNM is in some ways too lax.

vNM is ... too restrictive ... [because] vNM requires you to be risk-neutral. Risk aversion violates preferences being linear in probability ... Many people desperately want risk aversion, but that's not the vNM way.

Do many people desperately want to be risk averse about the probability a given outcome will be achieved?  I agree many people want to be loss averse about e.g. how many dollars they will have.  Scott Garrabrant provides an example in which a couple wishes to be fair to its members via compensating for other scenarios in which things would've been done the husband's way (even though those scenarios did not occur).
Scott's example is ... sort of an example of risk aversion about probabilities?  I'd be interested in other examples if you have them.
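
To make the distinction concrete (a toy illustration of my own, not anything from Scott's post; the square-root utility is an arbitrary stand-in): a vNM agent with a concave utility over dollars is "risk averse about dollars" while its preferences stay linear in probability; what vNM actually rules out is re-weighting the probabilities themselves.  A minimal sketch, under those assumptions:

```python
import math

def expected_utility(lottery, u):
    """lottery: list of (probability, dollar_outcome) pairs; vNM value is linear in the probabilities."""
    return sum(p * u(x) for p, x in lottery)

u = math.sqrt  # an arbitrary concave utility over dollars, for illustration

safe   = [(1.0, 50)]             # $50 for sure
gamble = [(0.5, 0), (0.5, 100)]  # same expected dollars, more spread

print(expected_utility(safe, u))    # ~7.07
print(expected_utility(gamble, u))  # 5.0  -> the sure $50 is preferred (risk aversion about dollars)

# What vNM does rule out: preferences that re-weight the probabilities themselves
# (e.g. treating a 0.5 chance as if it were a 0.4 chance), since that breaks
# linearity in probability / the independence axiom.
def weighted_utility(lottery, u, w):
    return sum(w(p) * u(x) for p, x in lottery)
```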

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T19:31:17.938Z · LW · GW

The VNM axioms refer to an "agent" who has "preferences" over lotteries of outcomes.  It seems to me this is challenging to interpret if there isn't a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes (which I'm assuming it has some ability to think about and care about, i.e. my (4)), and who chooses actions based on their preferences between lotteries.  That is, it seems to me the axioms rely on there being a mind that is certain kinds of persistent/unaffected.

Do you (habryka) mean there's a new "utility function" at any given moment, made of "outcomes" that can include parts of how the agent runs its own inside?  Or can you say more about how VNM is compatible with the negations of my 1, 3, and 4, or otherwise give me more traction for figuring out where our disagreement is coming from?

I was reasoning mostly from "what're the assumptions required for an agent to base its choices on the anticipated external consequences of those choices."

Comment by AnnaSalamon on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T18:40:10.337Z · LW · GW

The standard dutch-book arguments seem like pretty good reason to be VNM-rational in the relevant sense.

I mean, there are arguments about as solid as the “VNM utility theorem” pointing to CDT, but CDT is nevertheless not always the thing to aspire to, because CDT is based on an assumption/approximation that is not always a good-enough approximation (namely, CDT assumes our minds have no effects except via our actions, eg it assumes our minds have no direct effects on others’ predictions about us).

Some assumptions the VNM utility theorem is based on, that I suspect aren’t always good-enough approximations for the worlds we are in:

1) VNM assumes there are no important external incentives, that’ll give you more of what you care about if you run your mind (certain ways, not other ways).  So, for example:

1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware.  (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted.  If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)

1b) There are no costs to maintaining control of your mind/hardware.  (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)

1c) We can’t acquire more resources by changing who we are via making friends, adopting ethics that our prospective friends want us to follow, etc.

2) VNM assumes the independence axiom.  (Contrast case: Maybe we are a “society of mind” that has lots of small ~agents that will only stay knitted together if we respect “fairness” or something.  And maybe the best ways of doing this violate the independence axiom.  See Scott Garrabrant.)  (Aka, I’m agreeing with Jan.)

2a) And maybe this’ll keep being true, even if we get to reflect a lot, if we keep wanting to craft in new creative processes that we don’t want to pay to keep fully supervised.

3) (As Steven Byrnes notes) we care only about the external world, and don’t care about the process we use to make decisions.  (Contrast case: we might have process preferences, as well as outcome preferences.)

4) We have accurate external reference.  Like, we can choose actions based on what external outcomes we want, and this power is given to us for free, stably.  (Contrast case: ethics is sometimes defended as a set of compensations for how our maps predictably diverge from the territory, e.g. running on untrustworthy hardware, or “respect people, because they’re bigger than your map of them so you should expect they may benefit from e.g. honesty in ways you won’t manage to specifically predict.”)  (Alternate contrast case: it’s hard to build a mind that can do external reference toward e.g. “diamonds”).
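
For reference (my addition here, not part of the original exchange), the formal axioms and conclusion of the theorem under discussion, stated for a preference relation ⪰ over lotteries p, q, r; the independence axiom in (2) above is one of these, while (1), (3), and (4) are about the setting in which the theorem gets applied:

```latex
\begin{aligned}
&\textbf{Completeness:}  && p \succeq q \ \text{ or } \ q \succeq p \\
&\textbf{Transitivity:}  && p \succeq q,\ q \succeq r \;\Rightarrow\; p \succeq r \\
&\textbf{Continuity:}    && p \succeq q \succeq r \;\Rightarrow\; \exists\,\alpha\in[0,1]:\ \alpha p + (1-\alpha) r \sim q \\
&\textbf{Independence:}  && p \succeq q \;\Rightarrow\; \alpha p + (1-\alpha) r \succeq \alpha q + (1-\alpha) r \quad \forall\,\alpha\in(0,1] \\
&\textbf{Conclusion:}    && \exists\, u:\ \ p \succeq q \iff \mathbb{E}_{x\sim p}[u(x)] \ge \mathbb{E}_{x\sim q}[u(x)]
\end{aligned}
```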

Comment by AnnaSalamon on Information vs Assurance · 2024-11-28T22:22:07.401Z · LW · GW

Seems helpful for understanding how believing-ins get formed by groups, sometimes.

Comment by AnnaSalamon on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-11-19T19:56:25.187Z · LW · GW

"Global evaluation" isn't exactly what I'm trying to posit; more like a "things bottom-out in X currency" thing.

Like, in the toy model about $ from Atlas Shrugged, an heir who spends money foolishly eventually goes broke, and can no longer get others to follow their directions.  This isn't because the whole economy gets together to evaluate their projects.  It's because they spend their currency locally on things again and again, and the things they bet on do not pay off, do not give them new currency.

I think the analog happens in me/others: I'll get excited about some topic, pursue it for a while, get back nothing, and decide the generator of that excitement was boring after all.

Comment by AnnaSalamon on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-11-19T00:43:59.195Z · LW · GW

Hmm.  Under your model, are there ways that parts gain/lose (steam/mindshare/something)?

Comment by AnnaSalamon on Dragon Agnosticism · 2024-11-18T02:06:03.540Z · LW · GW

Does it feel to you as though your epistemic habits / self-trust / intellectual freedom and autonomy / self-honesty takes a hit here?

Comment by AnnaSalamon on Dragon Agnosticism · 2024-11-18T00:47:13.245Z · LW · GW

Fair point; I was assuming you had the capacity to lie/omit/deceive, and you're right that we often don't, at least not fully.

I still prefer my policy to the OP's, but I accept your argument that mine isn't a simple Pareto improvement.

Still:

  • I really don't like letting social forces put "don't think about X" flinches into my or my friends' heads; and the OP's policy seems to me like an instance of that;
  • Much less importantly: as an intelligent/self-reflective adult, you may be better at hiding info if you know what you're hiding, compared to if you have guesses you're not letting yourself see, that your friends might still notice.  (The "don't look into dragons" path often still involves hiding info, since often your brain takes a guess anyhow, and that's part of how you know not to look into this one.  If you acknowledge the whole situation, you can manage your relationships consciously, including taking conscious steps to buy openness-offsets, and staying freely and transparently friends wherever you can work out how.)

Comment by AnnaSalamon on Dragon Agnosticism · 2024-11-17T22:23:48.073Z · LW · GW

I don't see an advantage to remaining agnostic, compared to:

1) Acquire all the private truth one can.

Plus:

2) Tell all the public truth one is willing to incur the costs of, with priority for telling public truths about what one would and wouldn't share (e.g. prioritizing to not pose as more truth-telling than one is).

--

The reason I prefer this policy to the OP's "don't seek truth on low-import highly-politicized matters" is that I fear not-seeking-truth begets bad habits.  Also I fear I may misunderstand how important things are if I allow politics to influence which topics-that-interest-my-brain I do/don't pursue, compared to my current policy of having some attentional budget for "anything that interests me, whether or not it seems useful/virtuous."

Comment by AnnaSalamon on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-11-16T21:01:07.954Z · LW · GW

Yes, this is a good point, relates to why I claimed at top that this is an oversimplified model.  I appreciate you using logic from my stated premises; helps things be falsifiable.

It seems to me:

  • Somehow people who are in good physical health wake up each day with a certain amount of restored willpower.  (This is inconsistent with the toy model in the OP, but is still my real / more-complicated model.)
  • Noticing spontaneously-interesting things can be done without willpower; but carefully noticing superficially-boring details and taking notes in hopes of later payoff indeed requires willpower, on my model.  (Though, for me, less than e.g. going jogging requires.)
  • If you’ve just been defeated by a force you weren’t tracking, that force often becomes spontaneously-interesting.  Thus people who are burnt out can sometimes take a spontaneous interest in how willpower/burnout/visceral motivation works, and can enjoy “learning humbly” from these things. 
  • There’s a way burnout can help cut through ~dumb/dissociated/overconfident ideological frameworks (e.g. “only AI risk is interesting/relevant to anything”), and make space for other information to have attention again, and make it possible to learn things not in one's model.  Sort of like removing a monopoly business from a given sector, so that other thingies have a shot again.

I wish the above was more coherent/model-y.

Comment by AnnaSalamon on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-11-16T20:09:59.883Z · LW · GW

Thanks for asking.  The toy model of “living money”, and the one about willpower/burnout, are meant to appeal to people who don’t necessarily put credibility in Rand; I’m trying to have the models speak for themselves; so you probably *are* in my target audience.  (I only mentioned Rand because it’s good to credit models’ originators when using their work.)

Re: what the payout is:

This model suggests what kind of thing an “ego with willpower” is — where it comes from, how it keeps in existence:

  • By way of analogy: a squirrel is a being who turns acorns into poop, in such a way as to be able to do more and more acorn-harvesting (via using the first acorns' energy to accumulate fat reserves and knowledge of where acorns are located).
  • An “ego with willpower”, on this model, is a ~being who turns “reputation with one’s visceral processes” into actions, in such a way as to be able to garner more and more “reputation with one’s visceral processes” over time.  (Via learning how to nourish viscera, and making many good predictions.)

I find this a useful model.

One way it’s useful:

IME, many people think they get willpower by magic (unrelated to their choices, surroundings, etc., although maybe related to sleep/food/physiology), and should use their willpower for whatever some abstract system tells them is virtuous.

I think this is a bad model (makes inaccurate predictions in areas that matter; leads people to have low capacity unnecessarily).

The model in the OP, by contrast, suggests that it’s good to take an interest in which actions produce something you can viscerally perceive as meaningful/rewarding/good, if you want to be able to motivate yourself to take actions.

(IME this model works better than does trying to think in terms of physiology solely, and is non-obvious to some set of people who come to me wondering what part of their machine is broken-or-something such that they are burnt out.)

(Though FWIW, IME physiology and other basic aspects of well-being also have important impacts, and food/sleep/exercise/sunlight/friends are also worth attending to.)

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-13T00:29:40.241Z · LW · GW

I mean, I see why a party would want their members to perceive the other party's candidate as having a blind spot.  But I don't see why they'd be typically able to do this, given that the other party's candidate would rather not be perceived this way, the other party would rather their candidate not be perceived this way, and, naively, one might expect voters to wish not to be deluded.  It isn't enough to know there's an incentive in one direction; there's gotta be more like a net incentive across capacity-weighted players, or else an easier time creating appearance-of-blindspots vs creating visible-lack-of-blindspots, or something.  So, I'm somehow still not hearing a model that gives me this prediction.

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-11T06:39:18.667Z · LW · GW

You raise a good point that Susan’s relationship to Tusan and Vusan is part of what keeps her opinions stuck/stable.

But I’m hopeful that if Susan tries to “put primary focal attention on where the scissors comes from, and how it is working to trick Susan and Robert at once”, this’ll help with her stuckness re: Tusan and Vusan.  Like, it’ll still be hard, but it’ll be less hard than “what if Robert is right” would be.

Reasons I’m hopeful:

I’m partly working from a toy model in which (Susan and Tusan and Vusan) and (Robert and Sobert and Tobert) all used to be members of a common moral community, before it got scissored.  And the norms and memories of that community haven’t faded all the way.

Also, in my model, Susan’s fear of Tusan’s and Vusan’s punishment isn’t mostly fear of e.g. losing her income or other material-world costs.  It is mostly fear of not having a moral community she can be part of.  Like, of there being nobody who upholds norms that make sense to her and sees her as a member-in-good-standing of that group of people-with-sensible-norms.

Contemplating the scissoring process… does risk her fellowship with Tusan and Vusan, and that is scary and costly for Susan.

But:

  • a) Tusan and Vusan are not *as* threatened by it as if Susan had e.g. been considering more directly whether Candidate X was good.  I think.
  • b) Susan is at least partially compensated for her partial-risk-of-losing-Tusan-and-Vusan by the hope/memory of the previous society that (Susan and Tusan and Vusan) and (Robert and Sobert and Tobert) all shared, which she has some hope of reaccessing here.
  • b2) Tusan and Vusan are maybe also a bit tempted by this, which on their simpler models (since they’re engaging with Susan’s thoughts only very loosely / from a distance, as they complain about Susan) renders as “maybe she can change some of the candidate X supporters, since she’s discussing how they got tricked”
  • c) There are maybe some remnant-norms within the larger (pre-scissored) community that can appreciate/welcome Susan and her efforts.

I’m not sure I’m thinking about this well, or explicating it well.  But I feel there should be some unscissoring process?

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-07T09:03:41.672Z · LW · GW

I don't follow this model yet.  I see why, under this model, a party would want the opponent's candidate to enrage people / have a big blind spot (and how this would keep the extremes on their side engaged), but I don't see why this model would predict that they would want their own candidate to enrage people / have a big blind spot.

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-07T08:44:41.724Z · LW · GW

Thanks; I love this description of the primordial thing, had not noticed this this clearly/articulately before, it is helpful.

Re: why I'm hopeful about the available levers here: 

I'm hoping that, instead of Susan putting primary focal attention on Robert ("how can he vote this way, what is he thinking?"), Susan might be able to put primary focal attention on the process generating the scissors statements: "how is this thing trying to trick me and Robert, how does it work?"

A bit like how a person watching a commercial for sugary snacks, instead of putting primary focal attention on the smiling person on the screen who seems to desire the snacks, might instead put primary focal attention on "this is trying to trick me."  

(My hope is that this can become more feasible if we can provide accurate patterns for how the scissors-generating-process is trying to trick Susan(/Robert).  And that if Susan is trying to figure out how she and Robert were tricked, by modeling the tricking process, this can somehow help undo the trick, without needing to empathize at any point with "what if candidate X is great.")

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-06T18:38:53.758Z · LW · GW

Or: by seeing themselves, and a voter for the other side, as co-victims of an optical illusion, designed to trick each of them into being unable to find the other's areas of true seeing.  And by working together to figure out how the illusion works, while seeing it as a common enemy.

But my specific hypothesis here is that the illusion works by misconstruing the other voter's "Robert can see a problem with candidate Y" as "Robert can't see the problem with candidate X", and that if you focus on trying to decode the first, the illusion won't kick in as much.

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-06T18:02:54.513Z · LW · GW

By parsing the other voter as "against X" rather than "for Y", and then inquiring into how they see X as worth being against, and why, while trying really hard to play taboo and avoid ontological buckets.

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-06T18:02:08.847Z · LW · GW

Huh.  Is your model that surpluses are all inevitably dissipated in some sort of waste/signaling cascade?  This seems wrong to me but also like it's onto something.

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-06T18:01:00.702Z · LW · GW

I like your conjecture about Susan's concern about giving Robert steam.

I am hoping that if we decode the meme structure better, Susan could give herself and Robert steam re: "maybe I, Susan, am blind to some thing, B, that matters" without giving steam to "maybe A doesn't matter, maybe Robert doesn't have a blind spot there."  Like, maybe we can make a more specific "try having empathy right at this part" request that doesn't confuse things the same way.  Or maybe we can make a world where people who don't bother to try that look like schmucks who aren't memetically savvy, or something.  I think there might be room for something like this?

Comment by AnnaSalamon on Scissors Statements for President? · 2024-11-06T11:01:36.388Z · LW · GW

If we can get good enough models of however the scissors-statements actually work, we might be able to help more people be more in touch with the common humanity of both halves of the country, and more able to heal blind spots.

E.g., if the above model is right, maybe we could tell at least some people "try exploring the hypothesis that Y-voters are not so much in favor of Y, as against X -- and that you're right about the problems with Y, but they might be able to see something that you and almost everyone you talk to is systematically blinded to about X."

We can build a useful genre-savviness about common/destructive meme patterns and how to counter them, maybe.  LessWrong is sort of well-positioned to be a leader there: we have analytic strength, and aren't too politically mindkilled.

Comment by AnnaSalamon on Stephen Fowler's Shortform · 2024-05-19T03:06:41.201Z · LW · GW

I don't know the answer, but it would be fun to have a twitter comment with a zillion likes asking Sam Altman this question.  Maybe someone should make one?

Comment by AnnaSalamon on A Dozen Ways to Get More Dakka · 2024-04-09T02:31:28.935Z · LW · GW

I've bookmarked this; thank you; I expect to get use from this list.

Comment by AnnaSalamon on On green · 2024-03-26T16:26:52.321Z · LW · GW

Resonating from some of the OP:

Sometimes people think I have a “utility function” that is small and is basically “inside me,” and that I also have a set of beliefs/predictions/anticipations that is large, richly informed by experience, and basically a pointer to stuff outside of me.

I don’t see a good justification for this asymmetry.

Having lived many years, I have accumulated a good many beliefs/predictions/anticipations about outside events: I believe I’m sitting at a desk, that Biden is president, that 2+3=5, and so on and so on.  These beliefs came about via colliding a (perhaps fairly simple, I’m not sure) neural processing pattern with a huge number of neurons and a huge amount of world.  (Via repeated conscious effort to make sense of things, partly.)

I also have a good deal of specific preference, stored in my ~perceptions of “good”: this chocolate is “better” than that one; this short story is “excellent” while that one is “meh”; such-and-such a friendship is “deeply enriching” to me; this theorem is “elegant, pivotal, very cool” and that code has good code-smell while this other theorem and that other code are merely meh; etc.

My guess is that my perceptions of which things are “good” encode quite a lot of pattern that really is in the outside world, much like my perceptions of which things are “true/real/good predictions.”

My guess is that it’s confused to say my perceptions of which things are “good” are mostly about my utility function, in much the same way that it’s confused to say that my predictions about the world are mostly about my neural processing pattern (instead of acknowledging that they’re a lot about the world I’ve been encountering, and that e.g. the cause of my belief that I’m currently sitting at a desk is mostly that I’m currently sitting at a desk).

Comment by AnnaSalamon on On attunement · 2024-03-26T03:50:04.539Z · LW · GW

And this requires what I've previously called "living from the inside," and "looking out of your own eyes," instead of only from above. In that mode, your soul is, indeed, its own first principle; what Thomas Nagel calls the "Last Word." Not the seen-through, but the seer (even if also: the seen).

 

I like this passage! It seems to me that sometimes I (perceive/reason/act) from within my own skin and perspective: "what do I want now? what's most relevant? what do I know, how do I know it, what does it feel like, why do I care? what even am I, this process that finds itself conscious right now?"  And then I'm more likely to be conscious, here, caring.  (I'm not sure what I mean by this, but I'm pretty sure I mean something, and that it's important.)

One thing that worries me a bit about contemporary life (school for 20 years, jobs where people work in heavily scripted ways using patterns acquired in school, relatively little practice playing in creeks or doing cooking or carpentry or whatever independently) is that it seems to me it conditions people to spend less of our mental cycles "living from the inside," as you put it, and more of them ~"generating sentences designed to seem good to some external process", and I think this may make people conscious less often.

I wish I understood better what it is to "look out from your own eyes"/"live from the inside", vs only from above.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-27T06:34:30.191Z · LW · GW

Totally.  Yes.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-26T15:44:12.811Z · LW · GW

I love that book!  I like Robin's essays, too, but the book was much easier for me to understand.  I wish more people would read it, would review it on here, etc.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-26T04:04:00.796Z · LW · GW

A related tweet by Qiaochu:

(I don't necessarily agree with QC's interpretation of what was going on as people talked about "agency" -- I empathize some, but empathize also with e.g. Kaj's comment in a reply that Kaj doesn't recognize this at all from Kaj's 2018 CFAR mentorship training, and did not find pressures there to coerce particular kinds of thinking).

My point in quoting this is more like: if people don't have much wanting of their own, and are immersed in an ambient culture that has opinions on what they should "want," experiences such as QC's seem sorta like the thing to expect.  Which is at least a bit corroborated by QC reporting it.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T21:09:26.354Z · LW · GW

Some partial responses (speaking only for myself):

1.  If humans are mostly a kludge of impulses, including the humans you are training, then... what exactly are you hoping to empower using "rationality training"?  I mean, what wants-or-whatever will they act on after your training?  What about your "rationality training" will lead them to take actions as though they want things?  What will the results be?

1b.  To illustrate what I mean: once I taught a rationality technique to SPARC high schoolers (probably the first year of SPARC, not sure; I was young and naive).  One of the steps in the process involved picking a goal.  After walking them through all the steps, I asked for examples of how it had gone, and was surprised to find that almost all of them had picked such goals as "start my homework earlier, instead of successfully getting it done at the last minute and doing recreational math meanwhile"... which I'm pretty sure was not their goal in any wholesome sense, but was more like ambient words floating around that they had some social allegiance to.  I worry that if you "teach" "rationality" to adults who do not have wants, without properly noticing that they don't have wants, you set them up to be better-hijacked by the local memeset (and to better camouflage themselves as "really caring about AI risk" or whatever) in ways that won't do anybody any good because the words that are taking the place of wants don't have enough intelligence/depth/wisdom in them.

2.  My guess is that the degree of not-wanting that is seen among many members of the professional and managerial classes in today's anglosphere is more extreme than the historical normal, on some dimensions.  I think this partially because:

a.  IME, my friends and I as 8-year-olds had more wanting than I see in CFAR participants a lot of the time.  My friends were kids who happened to live on the same street as me growing up, so probably pretty normal.  We did have more free time than typical adults.

i.  I partially mean: we would've reported wanting things more often, and an observer with normal empathy would on my best guess have been like "yes it does seem like these kids wish they could go out and play 4-square" or whatever.  (Like, wanting you can feel in your body as you watch someone, as with a dog who really wants a bone or something).

ii.  I also mean: we tinkered, toward figuring out the things we wanted (e.g. rigging the rules different ways to try to make the 4-square game work in a way that was fun for kids of mixed ages, by figuring out laxer rules for the younger ones), and we had fun doing it.  (It's harder to claim this is different from the adults, but, like, it was fun and spontaneous and not because we were trying to mimic virtue; it was also this way when we saved up for toys we wanted.  I agree this point may not be super persuasive though.)

b.  IME, a lot of people act more like they/we want things when on a multi-day camping trip without phones/internet/work.  (Maybe like Critch's post about allowing oneself to get bored?)

c.  I myself have had periods of wanting things, and have had periods of long, bleached-out not-really-wanting-things-but-acting-pretty-"agentically"-anyway.  Burnout, I guess, though with all my CFAR techniques and such I could be pretty agentic-looking while quite burnt out.  The latter looks to me more like the worlds a lot of people today seem to me to be in, partly from talking to them about it, though people vary of course and hard to know.

d.  I have a theoretical model in which there are supposed to be cycles of yang and then yin, of goal-seeking effort and then finding the goal has become no-longer-compelling and resting / getting bored / similar until a new goal comes along that is more compelling.  CFAR/AIRCS participants and similar people today seem to me to often try to stop this process -- people caffeinate, try to work full days, try to have goals all the time and make progress all the time, and on a large scale there are efforts to mess with the currency to prevent economic slumps.  I think there's a pattern to where good goals/wanting come from that isn't much respected.  I also think there are a lot of memes trying to hijack people, and a lot of memetic control structures that get upset when members of the professional and managerial classes think/talk/want without filtering their thoughts carefully through "will this be okay-looking" filters.

All of the above leaves me with a belief that the kinds of not-wanting we see are more "living human animals stuck in a matrix that leaves them very little slack to recover and have normal wants, with most of their 'conversation' and 'attempts to acquire rationality techniques' being hijacked by the matrix they're in rather than being earnest contact with the living animals inside" and less "this is simple ignorance from critters who're just barely figuring out intelligence but who will follow their hearts better and better as you give them more tools."

Apologies for how I'm probably not making much sense; happy to try other formats.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T05:17:38.750Z · LW · GW

I'm trying to build my own art of rationality training, and I've started talking to various CFAR instructors about their experiences – things that might be important for me to know but which hadn't been written up nicely before.

Perhaps off topic here, but I want to make sure you have my biggest update if you're gonna try to build your own art of rationality training.

It is, basically: if you want actual good to result from your efforts, it is crucial to build from and enable consciousness and caring, rather than to try to mimic their functionality.

If you're willing, I'd be quite into being interviewed about this one point for a whole post of this format, or for a whole dialog, or into talking about it with you by some other means, since I don't know how to say it well and I think it's crucial.  But, to babble:

Let's take math education as an analogy.  There's stuff you can figure out about numbers, and how to do things with numbers, when you understand what you're doing.  (e.g., I remember figuring out as a kid, in a blinding flash about rectangles, why 2*3 was 3*2, why it would always work).  And other people can take these things you can figure out, and package them as symbol-manipulation rules that others can use to "get the same results" without the accompanying insights.  But... it still isn't the same thing as understanding, and it won't get your students the same kind of ability to build new math or to have discernment about which math is any good.

Humans are automatically strategic sometimes.  Maybe not all the way, but a lot more deeply than we are in "far-mode" contexts.  For example, if you take almost anybody and put them in a situation where they sufficiently badly need to pee, they will become strategic about how to find a restroom.  We are all capable of wanting sometimes, and we are a lot closer to strategic at such times.

My original method of proceeding in CFAR, and some other staff members' methods also, was something like:

  • Find a person, such as Richard Feynman or Elon Musk or someone a bit less cool than that but still very cool who is willing to let me interview them.  Try to figure out what mental processes they use.
  • Turn these mental processes into known, described procedures that system two / far-mode can invoke on purpose, even when the viscera do not care about a given so-called "goal."

(For example, we taught processes such as: "notice whether you viscerally expect to achieve your goal.  If you don't, ask why not, solve that problem, and iterate until you have a plan that you do viscerally anticipate will succeed." (aka inner sim / murphyjitsu.))

My current take is that this is no good -- it teaches non-conscious processes how to imitate some of the powers of consciousness, but in a way that lacks its full discernment, and that can lead to relatively capable non-conscious, non-caring processes doing a thing that no one who was actually awake-and-caring would want to do.  (And can make it harder for conscious, caring, but ignorant processes, such as youths, to tell the difference between conscious/caring intent, and memetically hijacked processes in the thrall of institutional-preservation-forces or similar.)  I think it's crucial to more like start by helping wanting/caring/consciousness to become free and to become in charge.  (An Allan Bloom quote that captures some but not all of what I have in mind: "There is no real education that does not respond to felt need.  All else is trifling display.")

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:45:27.967Z · LW · GW

I'm not Critch, but to speak my own defense of the numeracy/scope sensitivity point:

IMO, one of the hallmarks of a conscious process is that it can take different actions in different circumstances (in a useful fashion), rather than simply doing things the way that process does it (following its own habits, personality, etc.).  ("When the facts change, I change my mind [and actions]; what do you do, sir?")

Numeracy / scope sensitivity is involved in, and maybe required for, the ability to do this deeply (to change actions all the way up to one's entire life, when moved by a thing worth being moved by there).

Smaller-scale examples of scope sensitivity, such as noticing that a thing is wasting several minutes of your day each day and taking inconvenient, non-default action to fix it, can help build this power.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:30:02.540Z · LW · GW

I am pretty far from having fully solved this problem myself, but I think I'm better at this than most people, so I'll offer my thoughts.

My suggestion is to not attempt to "figure out goals and what to want," but to "figure out blockers that are making it hard to have things to want, and solve those blockers, and wait to let things emerge."

Some things this can look like:

  1.  Critch's "boredom for healing from burnout" procedures.  Critch has some blog posts recommending boredom (and resting until quite bored) as a method for recovering one's ability to have wants after burnout:
    1. https://acritch.com/fun-does-not-preclude-burnout/
    2. https://acritch.com/boredom/
  2. Physically cleaning things out.  David Allen recommends cleaning out one's literal garage (or, for those of us who don't have one, I'd suggest one's literal room, closet, inbox, etc.) so as to have many pieces of "stuck goal" that can resolve and leave more space in one's mind/heart (e.g., finding an old library book from a city you don't live in anymore, and either returning it anyhow somehow, or giving up on it and donating it to goodwill or whatever, thus freeing up whatever part of your psyche was still stuck in that goal).
  3. Refusing that which does not "spark joy." Marie Kondo suggests getting in touch with a thing you want your house to be like (e.g., by looking through magazines and daydreaming about your desired vibe/life), and then throwing out whatever does not "spark joy", after thanking those objects for their service thus far.
    1. Analogously, a friend of mine has spent the last several months refusing all requests to which they are not a "hell yes," basically to get in touch with their ability to be a "hell yes" to things.
  4. Repeatedly asking one's viscera "would there be anything wrong with just not doing this?".  I've personally gotten a fair bit of mileage from repeatedly dropping my goals and seeing if they regenerate.  For example, I would sit down at my desk, would notice at some point that I was trying to "do work" instead of to actually accomplish anything, and then I would vividly imagine simply ceasing work for the week, and would ask my viscera if there would be any trouble with that or if it would in fact be chill to simply go to the park and stare at clouds or whatever.  Generally I would get back some concrete answer my viscera cared about, such as "no! then there won't be any food at the upcoming workshop, which would be terrible," whereupon I could take that as a goal ("okay, new plan: I have an hour in which I can do actual work before becoming unable to do work for the rest of the week; I should let my goal of making sure there's food at the workshop come out through my fingertips and let me contact the caterers," or whatever).
  5. Gendlin's "Focusing."  For me and at least some others I've watched, doing this procedure (which is easier with a skilled partner/facilitator -- consider the sessions or classes here if you're fairly new to Focusing and want to learn it well) is reliably useful for clearing out the barriers to wanting, if I do it regularly (once every week or two) for some period of time.
  6. Grieving in general.  Not sure how to operationalize this one.  But allowing despair to be processed, and to leave my current conceptions of myself and of my identity and plans, is sort of the connecting thread through all of the above imo.  Letting go of that which I no longer believe in.

I think the above works much better in contact also with something beautiful or worth believing in, which for me can mean walking in nature, reading good books of any sort, having contact with people who are alive and not despairing, etc.  

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:06:32.971Z · LW · GW

Okay, maybe?  But I've also often been "real into that" in the sense that it resolves a dissonance in my ego-structure-or-something, or in the ego-structure-analog of CFAR or some other group-level structure I've been trying to defend, and I've been more into "so you don't get to claim I should do things differently" than into whether my so-called "goal" would work.  Cf "people don't seem to want things."

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:04:52.912Z · LW · GW

The specific operation that happened was applying ooda loops to the concept of ooda loops.

I love this!

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:03:56.722Z · LW · GW

Surprise 4: How much people didn't seem to want things

And, the degree to which people wanted things was even more incoherent than I thought. I thought people wanted things but didn't know how to pursue them. 

[I think Critch trailed off here, but implication seemed to be "basically people just didn't want things in the first place"]

 

I concur.  From my current POV, this is the key observation that should've, and should still, instigate a basic attempt to model what humans actually are and what is actually up in today's humans.  It's too basic a confusion/surprise to respond to by patching the symptoms without understanding what's underneath.

I also quite appreciate the interview as a whole; thanks, Raemon and Critch!

Comment by AnnaSalamon on Believing In · 2024-02-11T03:30:26.591Z · LW · GW

I'm curious to hear how you arrived at the conclusion that a belief is a prediction. 

I got this in part from Eliezer's post Make your beliefs pay rent in anticipated experiences.  IMO, this premise (that beliefs should try to be predictions, and should try to be accurate predictions) is one of the cornerstones that LessWrong has been based on.

Comment by AnnaSalamon on Steam · 2024-02-08T16:36:56.079Z · LW · GW

I love this post.  (Somehow only just read it.)

My fav part: 
>  In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. "Full steam" is classically rational, but we do not always want that. We might even conjecture that we never want that. 

To elaborate a bit:

It seems to me that when I let projects pull me insofar as they pull me, and when I find a thing that is interesting enough that it naturally "gains steam" in my head, it somehow increases the extent to which I am locally immune from Goodhart (e.g., my actions/writing go deeper than I might've expected).  OTOH, when I try hard on a thing despite losing steam as I do it, I am more subject to Goodhart (e.g., I complete something with the same keywords and external checksums as I thought I needed to hit, but it has less use and less depth than I might've expected given that).

I want better models of this.

Comment by AnnaSalamon on Believing In · 2024-02-08T16:10:20.709Z · LW · GW

Oh, man, yes, I hadn't seen that post before and it is an awesome post and concept.  I think maybe "believing in"s, and prediction-market-like structures of believing-ins, are my attempt to model how Steam gets allocated.