Posts

Believing In 2024-02-08T07:06:13.072Z
Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)'s training corpus? 2023-03-29T05:17:28.000Z
Are there specific books that it might slightly help alignment to have on the internet? 2023-03-29T05:08:28.364Z
What should you change in response to an "emergency"? And AI risk 2022-07-18T01:11:14.667Z
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" 2022-06-09T02:12:35.151Z
Narrative Syncing 2022-05-01T01:48:45.889Z
The feeling of breaking an Overton window 2021-02-17T05:31:40.629Z
“PR” is corrosive; “reputation” is not. 2021-02-14T03:32:24.985Z
Where do (did?) stable, cooperative institutions come from? 2020-11-03T22:14:09.322Z
Reality-Revealing and Reality-Masking Puzzles 2020-01-16T16:15:34.650Z
We run the Center for Applied Rationality, AMA 2019-12-19T16:34:15.705Z
AnnaSalamon's Shortform 2019-07-25T05:24:13.011Z
"Flinching away from truth” is often about *protecting* the epistemology 2016-12-20T18:39:18.737Z
Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” 2016-12-12T19:39:50.084Z
CFAR's new mission statement (on our website) 2016-12-10T08:37:27.093Z
CFAR’s new focus, and AI Safety 2016-12-03T18:09:13.688Z
On the importance of Less Wrong, or another single conversational locus 2016-11-27T17:13:08.956Z
Several free CFAR summer programs on rationality and AI safety 2016-04-14T02:35:03.742Z
Consider having sparse insides 2016-04-01T00:07:07.777Z
The correct response to uncertainty is *not* half-speed 2016-01-15T22:55:03.407Z
Why CFAR's Mission? 2016-01-02T23:23:30.935Z
Why startup founders have mood swings (and why they may have uses) 2015-12-09T18:59:51.323Z
Two Growth Curves 2015-10-02T00:59:45.489Z
CFAR-run MIRI Summer Fellows program: July 7-26 2015-04-28T19:04:27.403Z
Attempted Telekinesis 2015-02-07T18:53:12.436Z
How to learn soft skills 2015-02-07T05:22:53.790Z
CFAR fundraiser far from filled; 4 days remaining 2015-01-27T07:26:36.878Z
CFAR in 2014: Continuing to climb out of the startup pit, heading toward a full prototype 2014-12-26T15:33:08.388Z
Upcoming CFAR events: Lower-cost bay area intro workshop; EU workshops; and others 2014-10-02T00:08:44.071Z
Why CFAR? 2013-12-28T23:25:10.296Z
Meetup : CFAR visits Salt Lake City 2013-06-15T04:43:54.594Z
Want to have a CFAR instructor visit your LW group? 2013-04-20T07:04:08.521Z
CFAR is hiring a logistics manager 2013-04-05T22:32:52.108Z
Applied Rationality Workshops: Jan 25-28 and March 1-4 2013-01-03T01:00:34.531Z
Nov 16-18: Rationality for Entrepreneurs 2012-11-08T18:15:15.281Z
Checklist of Rationality Habits 2012-11-07T21:19:19.244Z
Possible meetup: Singapore 2012-08-21T18:52:07.108Z
Center for Modern Rationality currently hiring: Executive assistants, Teachers, Research assistants, Consultants. 2012-04-13T20:28:06.071Z
Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 2012-03-29T20:48:48.227Z
How do you notice when you're rationalizing? 2012-03-02T07:28:21.698Z
Urges vs. Goals: The analogy to anticipation and belief 2012-01-24T23:57:04.122Z
Poll results: LW probably doesn't cause akrasia 2011-11-16T18:03:39.359Z
Meetup : Talk on Singularity scenarios and optimal philanthropy, followed by informal meet-up 2011-10-10T04:26:09.284Z
[Question] Do you know a good game or demo for demonstrating sunk costs? 2011-09-08T20:07:55.420Z
[LINK] How Hard is Artificial Intelligence? The Evolutionary Argument and Observation Selection Effects 2011-08-29T05:27:31.636Z
Upcoming meet-ups 2011-06-21T22:28:40.610Z
Upcoming meet-ups: 2011-06-11T22:16:09.641Z
Upcoming meet-ups: Buenos Aires, Minneapolis, Ottawa, Edinburgh, Cambridge, London, DC 2011-05-13T20:49:59.007Z
Mini-camp on Rationality, Awesomeness, and Existential Risk (May 28 through June 4, 2011) 2011-04-24T08:10:13.048Z
Learned Blankness 2011-04-18T18:55:32.552Z

Comments

Comment by AnnaSalamon on A Dozen Ways to Get More Dakka · 2024-04-09T02:31:28.935Z · LW · GW

I've bookmarked this; thank you; I expect to get use from this list.

Comment by AnnaSalamon on On green · 2024-03-26T16:26:52.321Z · LW · GW

Resonating from some of the OP:

Sometimes people think I have a “utility function” that is small and is basically “inside me,” and that I also have a set of beliefs/predictions/anticipations that is large, richly informed by experience, and basically a pointer to stuff outside of me.

I don’t see a good justification for this asymmetry.

Having lived many years, I have accumulated a good many beliefs/predictions/anticipations about outside events: I believe I’m sitting at a desk, that Biden is president, that 2+3=5, and so on and so on.  These beliefs came about via colliding a (perhaps fairly simple, I’m not sure) neural processing pattern with a huge number of neurons and a huge amount of world.  (Via repeated conscious effort to make sense of things, partly.)

I also have a good deal of specific preference, stored in my ~perceptions of “good”: this chocolate is “better” than that one; this short story is “excellent” while that one is “meh”; such-and-such a friendship is “deeply enriching” to me; this theorem is “elegant, pivotal, very cool” and that code has good code-smell while this other theorem and that other code are merely meh; etc.

My guess is that my perceptions of which things are “good” encode quite a lot of pattern that really is in the outside world, much like my perceptions of which things are “true/real/good predictions.”

My guess is that it’s confused to say my perceptions of which things are “good” are mostly about my utility function, in much the same way that it’s confused to say that my predictions about the world are mostly about my neural processing pattern (instead of acknowledging that it’s a lot about the world I’ve been encountering, and that e.g. the cause of my belief that I’m currently sitting at a desk is mostly that I’m currently sitting at a desk).

Comment by AnnaSalamon on On attunement · 2024-03-26T03:50:04.539Z · LW · GW

And this requires what I've previously called "living from the inside," and "looking out of your own eyes," instead of only from above. In that mode, your soul is, indeed, its own first principle; what Thomas Nagel calls the "Last Word." Not the seen-through, but the seer (even if also: the seen).

 

I like this passage! It seems to me that sometimes I (perceive/reason/act) from within my own skin and perspective: "what do I want now? what's most relevant? what do I know, how do I know it, what does it feel like, why do I care? what even am I, this process that finds itself conscious right now?"  And then I'm more likely to be conscious, here, caring.  (I'm not sure what I mean by this, but I'm pretty sure I mean something, and that it's important.)

One thing that worries me a bit about contemporary life (school for 20 years, jobs where people work in heavily scripted ways using patterns acquired in school, relatively little practice playing in creeks or doing cooking or carpentry or whatever independently) is that it seems to me it conditions people to spend less of our mental cycles "living from the inside," as you put it, and more of them ~"generating sentences designed to seem good to some external process", and I think this may make people conscious less often.

I wish I understood better what it is to "look out from your own eyes"/"live from the inside", vs only from above.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-27T06:34:30.191Z · LW · GW

Totally.  Yes.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-26T15:44:12.811Z · LW · GW

I love that book!  I like Robin's essays, too, but the book was much easier for me to understand.  I wish more people would read it, would review it on here, etc.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-26T04:04:00.796Z · LW · GW

A related tweet by Qiaochu:

(I don't necessarily agree with QC's interpretation of what was going on as people talked about "agency" -- I empathize some, but empathize also with e.g. Kaj's comment in a reply that Kaj doesn't recognize this from Kaj's 2018 CFAR mentorship training, and did not find pressures there to coerce particular kinds of thinking).

My point in quoting this is more like: if people don't have much wanting of their own, and are immersed in an ambient culture that has opinions on what they should "want," experiences such as QC's seem sorta like the thing to expect.  Which is at least a bit corroborated by QC reporting it.

Comment by AnnaSalamon on [deleted post] 2024-02-26T03:17:47.053Z

-section on other ways to get inside opponent's loop, not just speed -- "more inconspicuously, more quickly, and with more irregularity" as Boyd said

 

this sounds interesting

Comment by AnnaSalamon on [deleted post] 2024-02-26T03:17:26.745Z

-personal examples from video games: Stormgate and Friends vs. Friends

 

I want these

Comment by AnnaSalamon on [deleted post] 2024-02-26T03:13:06.334Z

the OODA loop is not as linear as this model presents

I think the steps always go in order, but also there are many OODA loops running simultaneously

Comment by AnnaSalamon on [deleted post] 2024-02-26T03:10:40.456Z

In the Observe step, one gathers information about the situation around them. In Boyd's original context of fighter aircraft operations, we can imagine a pilot looking out the canopy, checking instruments, listening to radio communications, etc.

 

Gotcha.  I'd assumed "observe" was more like "hear a crashing noise from the kitchen" -- a kinda-automatic process that triggers the person to re-take-things-in and re-orient.  Is that wrong?

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T21:09:26.354Z · LW · GW

Some partial responses (speaking only for myself):

1.  If humans are mostly a kludge of impulses, including the humans you are training, then... what exactly are you hoping to empower using "rationality training"?  I mean, what wants-or-whatever will they act on after your training?  What about your "rationality training" will lead them to take actions as though they want things?  What will the results be?

1b.  To illustrate what I mean: once I taught a rationality technique to SPARC high schoolers (probably the first year of SPARC, not sure; I was young and naive).  One of the steps in the process involved picking a goal.  After walking them through all the steps, I asked for examples of how it had gone, and was surprised to find that almost all of them had picked such goals as "start my homework earlier, instead of successfully getting it done at the last minute and doing recreational math meanwhile"... which I'm pretty sure was not their goal in any wholesome sense, but was more like ambient words floating around that they had some social allegiance to.  I worry that if you "teach" "rationality" to adults who do not have wants, without properly noticing that they don't have wants, you set them up to be better-hijacked by the local memeset (and to better camouflage themselves as "really caring about AI risk" or whatever) in ways that won't do anybody any good because the words that are taking the place of wants don't have enough intelligence/depth/wisdom in them.

2.  My guess is that the degree of not-wanting that is seen among many members of the professional and managerial classes in today's anglosphere is more extreme than the historical normal, on some dimensions.  I think this partially because:

a.  IME, my friends and I as 8-year-olds had more wanting than I see in CFAR participants a lot of the time.  My friends were kids who happened to live on the same street as me growing up, so probably pretty normal.  We did have more free time than typical adults.

i.  I partially mean: we would've reported wanting things more often, and an observer with normal empathy would on my best guess have been like "yes it does seem like these kids wish they could go out and play 4-square" or whatever.  (Like, wanting you can feel in your body as you watch someone, as with a dog who really wants a bone or something).

ii.  I also mean: we tinkered, toward figuring out the things we wanted (e.g. rigging the rules different ways to try to make the 4-square game work in a way that was fun for kids of mixed ages, by figuring out laxer rules for the younger ones), and we had fun doing it.  (It's harder to claim this is different from the adults, but, like, it was fun and spontaneous and not because we were trying to mimic virtue; it was also this way when we saved up for toys we wanted.  I agree this point may not be super persuasive though.)

b.  IME, a lot of people act more like they/we want things when on a multi-day camping trip without phones/internet/work.  (Maybe like Critch's post about allowing oneself to get bored?)

c.  I myself have had periods of wanting things, and have had periods of long, bleached-out not-really-wanting-things-but-acting-pretty-"agentically"-anyway.  Burnout, I guess, though with all my CFAR techniques and such I could be pretty agentic-looking while quite burnt out.  The latter looks to me more like the worlds a lot of people today are in, partly from talking to them about it, though people vary of course and it's hard to know.

d.  I have a theoretical model in which there are supposed to be cycles of yang and then yin, of goal-seeking effort and then finding the goal has become no-longer-compelling and resting / getting bored / similar until a new goal comes along that is more compelling.  CFAR/AIRCS participants and similar people today seem to me to often try to stop this process -- people caffeinate, try to work full days, try to have goals all the time and make progress all the time, and on a large scale there are efforts to mess with the currency to prevent economic slumps.  I think there's a pattern to where good goals/wanting come from that isn't much respected.  I also think there are a lot of memes trying to hijack people, and a lot of memetic control structures that get upset when members of the professional and managerial classes think/talk/want without filtering their thoughts carefully through "will this be okay-looking" filters.

All of the above leaves me with a belief that the kinds of not-wanting we see are more "living human animals stuck in a matrix that leaves them very little slack to recover and have normal wants, with most of their 'conversation' and 'attempts to acquire rationality techniques' being hijacked by the matrix they're in rather than being earnest contact with the living animals inside" and less "this is simple ignorance from critters who're just barely figuring out intelligence but who will follow their hearts better and better as you give them more tools."

Apologies for how I'm probably not making much sense; happy to try other formats.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T05:17:38.750Z · LW · GW

I'm trying to build my own art of rationality training, and I've started talking to various CFAR instructors about their experiences – things that might be important for me to know but which hadn't been written up nicely before.

Perhaps off topic here, but I want to make sure you have my biggest update if you're gonna try to build your own art of rationality training.

It is, basically: if you want actual good to result from your efforts, it is crucial to build from and enable consciousness and caring, rather than to try to mimic their functionality.

If you're willing, I'd be quite into being interviewed about this one point for a whole post of this format, or for a whole dialog, or to talking about it with you in some other way, since I don't know how to say it well and I think it's crucial.  But, to babble:

Let's take math education as an analogy.  There's stuff you can figure out about numbers, and how to do things with numbers, when you understand what you're doing.  (e.g., I remember figuring out as a kid, in a blinding flash about rectangles, why 2*3 was 3*2, why it would always work).  And other people can take these things you can figure out, and package them as symbol-manipulation rules that others can use to "get the same results" without the accompanying insights.  But... it still isn't the same thing as understanding, and it won't get your students the same kind of ability to build new math or to have discernment about which math is any good.

Humans are automatically strategic sometimes.  Maybe not all the way, but a lot more deeply than we are in "far-mode" contexts.  For example, if you take almost anybody and put them in a situation where they sufficiently badly need to pee, they will become strategic about how to find a restroom.  We are all capable of wanting sometimes, and we are a lot closer to strategic at such times.

My original method of proceeding in CFAR, and some other staff members' methods also, was something like:

  • Find a person, such as Richard Feynman or Elon Musk or someone a bit less cool than that but still very cool who is willing to let me interview them.  Try to figure out what mental processes they use.
  • Turn these mental processes into known, described procedures that system two / far-mode can invoke on purpose, even when the viscera do not care about a given so-called "goal."

(For example, we taught processes such as: "notice whether you viscerally expect to achieve your goal.  If you don't, ask why not, solve that problem, and iterate until you have a plan that you do viscerally anticipate will succeed." (aka inner sim / murphyjitsu.))

My current take is that this is no good -- it teaches non-conscious processes how to imitate some of the powers of consciousness, but in a way that lacks its full discernment, and that can lead to relatively capable non-conscious, non-caring processes doing a thing that no one who was actually awake-and-caring would want to do.  (And can make it harder for conscious, caring, but ignorant processes, such as youths, to tell the difference between conscious/caring intent, and memetically hijacked processes in the thrall of institutional-preservation-forces or similar.)  I think it's crucial to more like start by helping wanting/caring/consciousness to become free and to become in charge.  (An Allan Bloom quote that captures some but not all of what I have in mind: "There is no real education that does not respond to felt need.  All else is trifling display.")

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:45:27.967Z · LW · GW

I'm not Critch, but to speak my own defense of the numeracy/scope sensitivity point:

IMO, one of the hallmarks of a conscious process is that it can take different actions in different circumstances (in a useful fashion), rather than simply doing things the way that process does it (following its own habits, personality, etc.).  ("When the facts change, I change my mind [and actions]; what do you do, sir?")

Numeracy / scope sensitivity is involved in, and maybe required for, the ability to do this deeply (to change actions all the way up to one's entire life, when moved by a thing worth being moved by there).

Smaller-scale examples of scope sensitivity, such as noticing that a thing is wasting several minutes of your day each day and taking inconvenient, non-default action to fix it, can help build this power.

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:30:02.540Z · LW · GW

I am pretty far from having fully solved this problem myself, but I think I'm better at this than most people, so I'll offer my thoughts.

My suggestion is to not attempt to "figure out goals and what to want," but to "figure out blockers that are making it hard to have things to want, and solve those blockers, and wait to let things emerge."

Some things this can look like:

  1.  Critch's "boredom for healing from burnout" procedures.  Critch has some blog posts recommending boredom (and resting until quite bored) as a method for recovering one's ability to have wants after burnout:
    1. https://acritch.com/fun-does-not-preclude-burnout/
    2. https://acritch.com/boredom/
  2. Physically cleaning things out.  David Allen recommends cleaning out one's literal garage (or, for those of us who don't have one, I'd suggest one's literal room, closet, inbox, etc.) so as to have many pieces of "stuck goal" that can resolve and leave more space in one's mind/heart (e.g., finding an old library book from a city you don't live in anymore, and either returning it anyhow somehow, or giving up on it and donating it to goodwill or whatever, thus freeing up whatever part of your psyche was still stuck in that goal).
  3. Refusing that which does not "spark joy." Marie Kondo suggests getting in touch with a thing you want your house to be like (e.g., by looking through magazines and daydreaming about your desired vibe/life), and then throwing out whatever does not "spark joy", after thanking those objects for their service thus far.
    1. Analogously, a friend of mine has spent the last several months refusing all requests to which they are not a "hell yes," basically to get in touch with their ability to be a "hell yes" to things.
  4.  Repeatedly asking one's viscera "would there be anything wrong with just not doing this?".  I've personally gotten a fair bit of mileage from repeatedly dropping my goals and seeing if they regenerate.  For example, I would sit down at my desk, would notice at some point that I was trying to "do work" instead of to actually accomplish anything, and then I would vividly imagine simply ceasing work for the week, and would ask my viscera if there would be any trouble with that or if it would in fact be chill to simply go to the park and stare at clouds or whatever.  Generally I would get back some concrete answer my viscera cared about, such as "no! then there won't be any food at the upcoming workshop, which would be terrible," whereupon I could take that as a goal ("okay, new plan: I have an hour of chance to do actual work before becoming unable to do work for the rest of the week; I should let my goal of making sure there's food at the workshop come out through my fingertips and let me contact the caterers" or whatever).
  5. Gendlin's "Focusing."  For me and at least some others I've watched, doing this procedure (which is easier with a skilled partner/facilitator -- consider the sessions or classes here if you're fairly new to Focusing and want to learn it well) is reliably useful for clearing out the barriers to wanting, if I do it regularly (once every week or two) for some period of time.
  6. Grieving in general.  Not sure how to operationalize this one.  But allowing despair to be processed, and to leave my current conceptions of myself and of my identity and plans, is sort of the connecting thread through all of the above imo.  Letting go of that which I no longer believe in.

I think the above works much better in contact also with something beautiful or worth believing in, which for me can mean walking in nature, reading good books of any sort, having contact with people who are alive and not despairing, etc.  

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:06:32.971Z · LW · GW

Okay, maybe?  But I've also often been "real into that" in the sense that it resolves a dissonance in my ego-structure-or-something, or in the ego-structure-analog of CFAR or some other group-level structure I've been trying to defend, and I've been more into "so you don't get to claim I should do things differently" than into whether my so-called "goal" would work.  Cf "people don't seem to want things."

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:04:52.912Z · LW · GW

The specific operation that happened was applying ooda loops to the concept of ooda loops.

I love this!

Comment by AnnaSalamon on CFAR Takeaways: Andrew Critch · 2024-02-25T04:03:56.722Z · LW · GW

Surprise 4: How much people didn't seem to want things

And, the degree to which people wanted things was even more incoherent than I thought. I thought people wanted things but didn't know how to pursue them. 

[I think Critch trailed off here, but implication seemed to be "basically people just didn't want things in the first place"]

 

I concur.  From my current POV, this is the key observation that should've, and should still, instigate a basic attempt to model what humans actually are and what is actually up in today's humans.  It's too basic a confusion/surprise to respond to by patching the symptoms without understanding what's underneath.

I also quite appreciate the interview as a whole; thanks, Raemon and Critch!

Comment by AnnaSalamon on Believing In · 2024-02-11T03:30:26.591Z · LW · GW

I'm curious to hear how you arrived at the conclusion that a belief is a prediction. 

I got this in part from Eliezer's post Make your beliefs pay rent in anticipated experiences.  IMO, this premise (that beliefs should try to be predictions, and should try to be accurate predictions) is one of the cornerstones that LessWrong has been based on.

Comment by AnnaSalamon on Steam · 2024-02-08T16:36:56.079Z · LW · GW

I love this post.  (Somehow only just read it.)

My fav part: 
>  In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. "Full steam" is classically rational, but we do not always want that. We might even conjecture that we never want that. 

To elaborate a bit:

It seems to me that when I let projects pull me insofar as they pull me, and when I find a thing that is interesting enough that it naturally "gains steam" in my head, it somehow increases the extent to which I am locally immune from Goodhart (e.g., my actions/writing go deeper than I might've expected).  OTOH, when I try hard on a thing despite losing steam as I do it, I am more subject to Goodhart (e.g., I complete something with the same keywords and external checksums as I thought I needed to hit, but it has less use and less depth than I might've expected given that).

I want better models of this.
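
(For readers who haven't run into quantilization before, here is a minimal toy sketch of the idea referenced in the quote -- my own illustration, not anything from the quoted post: rather than always taking the single proxy-maximizing action ("full steam"), a quantilizer samples from the top-q slice of a trusted base distribution over actions, which limits how hard the proxy can be Goodharted.  The function name, example actions, and weights below are all made up for illustration.)

```python
import random

def quantilize(actions, base_weights, utility, q=0.1):
    """Pick an action by sampling from the top-q slice of a base distribution.

    Rough sketch: 'actions' is a list of options, 'base_weights' maps each
    action to its weight under some trusted/default policy, 'utility' is the
    (possibly Goodhart-able) proxy score, and q controls how much "steam" is
    applied: q near 0 approaches pure maximization ("full steam"), while
    q = 1 just follows the base distribution.
    """
    # Rank actions from best to worst under the proxy utility.
    ranked = sorted(actions, key=utility, reverse=True)

    # Keep the top actions until they cover a q-fraction of base probability mass.
    total_mass = sum(base_weights[a] for a in actions)
    kept, covered = [], 0.0
    for a in ranked:
        kept.append(a)
        covered += base_weights[a]
        if covered >= q * total_mass:
            break

    # Sample among the kept actions in proportion to their base weights,
    # instead of deterministically taking the single highest-scoring one.
    return random.choices(kept, weights=[base_weights[a] for a in kept], k=1)[0]


# Hypothetical toy usage: three ways to spend an afternoon, scored by a proxy.
actions = ["write essay", "answer email", "walk outside"]
base_weights = {"write essay": 0.3, "answer email": 0.4, "walk outside": 0.3}
proxy_utility = {"write essay": 5.0, "answer email": 3.0, "walk outside": 1.0}.get
print(quantilize(actions, base_weights, proxy_utility, q=0.5))
```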

Comment by AnnaSalamon on Believing In · 2024-02-08T16:10:20.709Z · LW · GW

Oh, man, yes, I hadn't seen that post before and it is an awesome post and concept.  I think maybe "believing in"s, and prediction-market-like structures of believing-ins, are my attempt to model how Steam gets allocated.

Comment by AnnaSalamon on Believing In · 2024-02-08T08:47:36.925Z · LW · GW

So, I agree there's something in common -- Wittgenstein is interested in "language games" that have function without having literal truth-about-predictions, and "believing in"s are games played with language that have function and that do not map onto literal truth-about-predictions.  And I appreciate the link in to the literature.

The main difference between what I'm going for here, and at least this summary of Wittgenstein (I haven't read Wittgenstein and may well be shortchanging him and you) is that I'm trying to argue that "believing in"s pay a specific kind of rent -- they endorse particular projects capable of taking investment, they claim the speaker will themself invest resources in that project, they predict that that project will yield ROI.

Like: anticipations (wordless expectations, that lead to surprise / not-surprise) are a thing animals do by default, that works pretty well and doesn't get all that buggy.  Humans expand on this by allowing sentences such as "objects in Earth's gravity accelerate at a rate of 9.8m/s^2," which... pays rent in anticipated experience in a way that "Wulky Wilkinsen is a post-utopian" doesn't, in Eliezer's example.  I'm hoping to cleave off, here, a different set of sentences that are also not like "Wulky Wilkinsen is a post-utopian" and that pay a different and well-defined kind of rent.

Comment by AnnaSalamon on Believing In · 2024-02-08T08:21:37.706Z · LW · GW

A point I didn’t get to very clearly in the OP, that I’ll throw into the comments:

When shared endeavors are complicated, it often makes sense for them to coordinate internally via a shared set of ~“beliefs”, for much the same reason that organisms acquire beliefs in the first place (rather than simply learning lots of stimulus-response patterns or something).

This sometimes makes it useful for various collaborators in a project to act on a common set of “as if beliefs,” that are not their own individual beliefs.

I gave an example of this in the OP:

  • If my various timeslices are collaborating in writing a single email, it’s useful to somehow hold in mind, as a target, a single coherent notion of how I want to trade off between quality-of-email and time-cost-of-writing.  Otherwise I leave value on the table.

The above was an example within me, across my subagents. But there are also examples held across sets of people, e.g. how much a given project needs money vs insights on problem X vs data on puzzle Y, and what the cruxes are that’ll let us update about that, and so on.

A “believing in” is basically a set of ~beliefs that some portion of your effort-or-other-resources is invested in taking as a premise, that usually differ from your base-level beliefs.

(Except, sometimes people coordinate more easily via things that’re more like goals or deontologies or whatever, and English uses the phrase “believing in” for marking investment in a set of ~beliefs, or in a set of ~goals, or in a set of ~deontologies.)

Comment by AnnaSalamon on Narrative Syncing · 2024-02-08T07:46:15.337Z · LW · GW

I made a new post just now, "Believing In," which offers a different account of some of the above phenomena.

My current take is that my old concept of "narrative syncing" describes the behaviorist outside of a pattern of relating that pops up a lot, but doesn't describe the earnest inside that that pattern is kind of designed around.

(I still think "narrative syncing" is often done without an earnest inside, by people keeping words around an old icon after the icon has lost its original earnest meaning (e.g., to manipulate others), so I still want a term for that part; I, weirdly, do not often think using the term "narrative syncing," it doesn't quite do it for me, not sure what would.  Some term that is to "believing in" as lying/deceiving is to "beliefs/predictions".)

Comment by AnnaSalamon on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-29T17:20:27.154Z · LW · GW

Yes, exactly.   Like, we humans mostly have something that kinda feels intrinsic but that also pays rent and updates with experience, like a Go player's sense of "elegant" go moves.  My current (not confident) guess is that these thingies (that humans mostly have) might be a more basic and likely-to-pop-up-in-AI mathematical structure than are fixed utility functions + updatey beliefs, a la Bayes and VNM.  I wish I knew a simple math for them.

Comment by AnnaSalamon on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-29T17:09:53.980Z · LW · GW

Thanks for replying.  The thing I'm wondering about is: maybe it's sort of like this "all the way down."  Like, maybe the things that are showing up as "terminal" goals in your analysis (money, status, being useful) are themselves composed sort of like the apple pie business, in that they congeal while they're "profitable" from the perspective of some smaller thingies located in some large "bath" (such as an economy, or a (non-conscious) attempt to minimize predictive error or something so as to secure neural resources, or a thermodynamic flow of sunlight or something).  Like, maybe it is this way in humans, and maybe it is or will be this way in an AI.  Maybe there won't be anything that is well-regarded as "terminal goals."

I said something like this to a friend, who was like "well, sure, the things that are 'terminal' goals for me are often 'instrumental' goals for evolution, who cares?"  The thing I care about here is: how "fixed" are the goals, do they resist updating/dissolving when they cease being "profitable" from the perspective of thingies in an underlying substrate, or are they constantly changing as what is profitable changes?  Like, imagine a kid who cares about playing "good, fun" videogames, but whose notion of which games count as "good, fun" updates pretty continually as he gets better at gaming.  I'm not sure it makes that much sense to think of this as a "terminal goal" in the same sense that "make a bunch of diamond paperclips according to this fixed specification" is a terminal goal.  It might be differently satiable, differently in touch with what's below it; I'm not really sure why I care, but I think it might matter for what kind of thing organisms/~agent-like-things are.

Comment by AnnaSalamon on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-25T04:39:19.841Z · LW · GW

There’s a thing I’m personally confused about that seems related to the OP, though not directly addressed by it.  Maybe it is sufficiently on topic to raise here.

My personal confusion is this:

Some of my (human) goals are pretty stable across time (e.g. I still like calories, and being a normal human temperature, much as I did when newborn).  But a lot of my other “goals” or “wants” form and un-form without any particular “convergent instrumental drives”-style attempts to protect said “goals” from change.

As a bit of an analogy (to how I think I and other humans might approximately act): in a well-functioning idealized economy, an apple pie-making business might form (when it was the case that apple pie would deliver a profit over the inputs of apples plus the labor of those involved plus etc.), and might later fluidly un-form (when it ceased to be profitable), without "make apple pies" or "keep this business afloat" becoming a thing that tries to self-perpetuate in perpetuity.  I think a lot of my desires are like this (I care intrinsically about getting outdoors every day while there's profit in it, but the desire doesn't try to shield itself from change, and it'll stop if getting outdoors stops having good results.  And this notion of "profit" does not itself seem obviously like a fixed utility function, I think.).

I’m pretty curious about whether the [things kinda like LLMs but with longer planning horizons that we might get as natural extensions of the current paradigm, if the current paradigm extends this way, and/or the AGIs that an AI-accidentally-goes-foom process will summon] will have goals that try to stick around indefinitely, or goals that congeal and later dissolve again into some background process that'll later summon new goals, without summoning something lasting that is fixed-utility-function-shaped.  (It seems to me that idealized economies do not acquire fixed or self-protective goals, and for all I know many AIs might be like economies in this way.)

(I’m not saying this bears on risk in any particular way.  Temporary goals would still resist most wrenches while they remained active, much as even an idealized apple pie business resists wrenches while it stays profitable.)

Comment by AnnaSalamon on Dishonorable Gossip and Going Crazy · 2023-10-14T15:19:10.343Z · LW · GW

Ben Pace, honorably quoting aloud a thing he'd previously said about Ren:

the other day i said [ren] seemed to be doing well to me

to clarify, i am not sure she has not gone crazy

she might've, i'm not close enough to be confident

i'd give it 25%

I really don't like this usage of the word "crazy", which IME is fairly common in the bay area rationality community.  This is for several reasons.  The simple-to-express one is that I really read through like 40% of this dialog thinking (from its title plus early conversational volleys) that people were afraid Ren had gone, like, the kind of "crazy" that acute mania or psychosis or something often is, where a person might lose their ability to do normal tasks that almost everyone can do, like knowing what year it is or how to get to the store and back safely.  Which was a set of worries I didn't need to have, in this case.  I.e., my simple complaint is that it caused me confusion here.

The harder-to-express but more heartfelt one is something like: the word "crazy" is a license to write people off.  When people in wider society use it about those having acute psychiatric crises, they give themselves a license to write off the sense behind the perceptions of like 2% or something of the population.  When the word is instead used about people who are not practicing LW!rationality, including ordinary religious people, it gives a license to write off a much larger chunk of people (~95% of the population?), so one is less apt to seek sense behind their perceptions and actions.

This sort of writing-off is a thing people can try doing, if they want, but it's a nonstandard move and I want it to be visible as such.  That is, I want people to spell it out more, like: "I think Ren might've stopped being double-plus-sane like all the rest of us are" or "I think Ren might've stopped following the principles of LW!rationality" or something.  (The word "crazy" hides this as though it's the normal-person "dismiss ~2% of the population" move; these other sentences make visible that it's an unusual and more widely dismissive move.)  The reason I want this move to be made visible in this way is partly that I think the outside view on (groups of people who dismiss those who aren't members of the group) is that this practice often leads to various bad things (e.g. increased conformity as group members fear being dubbed out-group; increased blindness to outside perspectives; difficulty collaborating with skilled outsiders), and I want those risks more visible.

(FWIW, I'd have the same response to a group of Democrats discussing Republicans or Trump-voters as "crazy", and sometimes have.  But IMO bay area rationalists say this sort of thing much more than other groups I've been part of.)

Comment by AnnaSalamon on Commonsense Good, Creative Good · 2023-10-06T23:58:16.331Z · LW · GW

Thanks for this response; I find it helpful.

Reading it over, I want to distinguish between:

  • a) Relatively thoughtless application of heuristics; (system-1-integrated + fast)
  • b) Taking time to reflect and notice how things seem to you once you've had more space for reflection, for taking in other people's experiences, for noticing what still seems to matter once you've fallen out of the day-to-day urgencies, and for tuning into the "still, quiet voice" of conscience; (system-1-integrated + slow, after a pause)
  • c) Ethical reasoning (system-2-heavy, medium-paced or slow).

The brief version of my position is that (b) is awesome, while (c) is good when it assists (b) but is damaging when it is acted on in a way that disempowers rather than empowers (b). 

--

The long-winded version (which may be entirely in agreement with your (Tristan's) comment, but which goes into detail because I want to understand this stuff):

I agree with you and Eccentricity that most people, including me and IMO most LWers and EAers, could benefit from doing more (b) than we tend to do.

I also agree with you that (c) can assist in doing (b).  For example, it can be good for a person to ask themselves "how does this action, which I'm inclined to take, differ from the actions I condemned in others?",  "what is likely to happen if I do this?", and "do my concepts and actions fit the world I'm in, or is there a tiny note of discord?"

At the same time, I don't want to just say "c is great! do more c!" because I share with the OP a concern that EA-ers, LW-ers, and people in general who attempt explicit ethical reasoning sometimes end up using these to talk themselves into doing dumb, harmful things, with the OP's example of "leave inaccurate reviews at vegan restaurants to try to save animals" giving a good example of the flavor of these errors, and with historical communism giving a good example of their potential magnitude.

My take as to the difference between virtuous use of explicit ethical reasoning, and vicious/damaging use of explicit ethical reasoning, is that virtuous use of such reasoning is aimed at cultivating and empowering a person's [prudence, phronesis, common sense, or whatever you want to call a central faculty of judgment that draws on and integrates everything the person discerns and cares about], whereas vicious/damaging uses of ethical reasoning involve taking some piece of the total set of things we care about, stabilizing it into an identity and/or a social movement ("I am a hedonistic utilitarian", "we are (communists/social justice/QAnon/EA)"), and having this artificially stabilized fragment of the total set of things one cares about, act directly in the world without being filtered through one's total discernment ("Action A is the X thing to do, and I am an X, so I will take action A").

(Prudence was classically considered not only a virtue, but the "queen of the virtues" -- as Wikipedia puts it  "Prudence points out which course of action is to be taken in any concrete circumstances... Without prudence, bravery becomes foolhardiness, mercy sinks into weakness, free self-expression and kindness into censure, humility into degradation and arrogance, selflessness into corruption, and temperance into fanaticism."  Folk ethics, or commonsense ethics, has at its heart the cultivation of a total faculty of discernment, plus the education of this faculty to include courage/kindness/humility/whatever other virtues.)

My current guess as to how to develop prudence is basically to take an interest in things, care, notice tiny notes of discord, notice what actions have historically had what effects, notice when one is oneself "hijacked" into acting on something other than one's best judgment and how to avoid this, etc.  I think this is part of what you have in mind about bringing ethical reasoning into daily life, so as to see how kindness applies in specific cases rather than merely claiming it'd be good to apply somehow?

Absent identity-based or social-movement-based artificial stabilization, people can and do make mistakes, including e.g. leaving inaccurate reviews in an attempt to help animals.  But I think those mistakes are more likely to be part of a fairly rapid process of developing prudence (which seems pretty worth it to me), and are less likely to be frozen in and acted on for years. 

(My understanding isn't great here; more dialog would be great.)

Comment by AnnaSalamon on Thomas Kwa's Shortform · 2023-10-04T01:19:45.144Z · LW · GW

I like the question; thanks.  I don't have anything smart to say about it at the moment, but it seems like a cool thread.

Comment by AnnaSalamon on Commonsense Good, Creative Good · 2023-10-02T15:14:06.354Z · LW · GW

The idea is, normally just do straightforwardly good things. Be cooperative, friendly, and considerate. Embrace the standard virtues. Don't stress about the global impacts or second-order altruistic effects of minor decisions. But also identify the very small fraction of your decisions which are likely to have the largest effects and put a lot of creative energy into doing the best you can.

I agree with this, but would add that IMO, after you work out the consequentialist analysis of the small set of decisions that are worth intensive thought/effort/research, it is quite worthwhile to additionally work out something like a folk ethical account of why your result is correct, or of how the action you're endorsing coheres with deep virtues/deontology/tropes/etc.  

There are two big upsides to this process:

  1. As you work this out, you get some extra checks on your reasoning -- maybe folk ethics sees something you're missing here; and
     
  2. At least as importantly: a good folk ethical account will let individuals and groups cohere around the proposed action, in a simple, conscious, wanting-the-good-thing-together way, without needing to dissociate from what they're doing (whereas accounts like "it's worth dishonesty in this one particular case" will be harder to act on wholeheartedly, even when basically correct).  And this will work a lot better.

IMO, this is similar to: in math, we use heuristics and intuitions and informal reasoning a lot, to guess how to do things -- and we use detailed, not-condensed-by-heuristics algebra or mathematical proof steps sometimes also, to work out how a thing goes that we don't yet find intuitive or obvious.  But after writing a math proof the sloggy way, it's good to go back over it, look for "why it worked," "what was the true essence of the proof, that made it tick," and see if there is now a way to "see it at a glance," to locate ways of seeing that will make future such situations more obvious, and that can live in one's system 1 and aesthetics as well as in one's sloggy explicit reasoning.

Or, again, in coding: usually we can use standard data structures and patterns.  Sometimes we have to hand-invent something new.  But after coming up with the something new: it's good, often, to condense it into readily parsable/remember-able/re-usable stuff, instead of leaving it as hand-rolled spaghetti code.

Or, in physics and many other domains: new results are sometimes counterintuitive, but it is advisable to then work out intuitions whereby reality may be more intuitive in the future.

I don't have my concepts well worked out here yet, which is why I'm being so long-winded and full of analogies.  But I'm pretty sure that folk ethics, where we have it worked out, has a bunch of advantages over consequentialist reasoning that're kind of like those above. 

  • As the OP notes, folk ethics can be applied to hundreds of decisions per day, without much thought per each;
  • As the OP notes, folk ethics have been tested across huge numbers of past actions by huge numbers of people.  New attempts at folk ethical reasoning can't have this advantage fully.  But: I think when things are formulated simply enough, or enough in the language of folk ethics, we can back-apply them a lot more on a lot of known history and personally experienced anecdotes and so on (since they are quick to apply, as in the above bullet point), and can get at least some of the "we still like this heuristic after considering it in a lot of different contexts with known outcomes" advantage.
  • As OP implies, folk ethics is more robust to a lot of the normal human bias temptations ("x must be right, because I'd find it more convenient right this minute") compared to case-by-case reasoning;
  • It is easier for us humans to work hard on something, in a stable fashion, when we can see in our hearts that it is good, and can see how it relates to everything else we care about.  Folk ethics helps with this.  Maybe folk ethics, and notions of virtue and so on, kind of are takes on how a given thing can fit together with all the little decisions and all the competing pulls as to what's good?  E.g. the OP lists as examples of commonsense goods "patience, respect, humility, moderation, kindness, honesty" -- and all of these are pretty usable guides to how to be while I care about something, and to how to relate that caring to my other cares and goals.
  • I suspect there's something particularly good here with groups.  We humans often want to be part of groups that can work toward a good goal across a long period of time, while maintaining integrity, and this is often hard because groups tend to degenerate with time into serving individuals' local power, becoming moral fads, or other things that aren't as good as the intended purpose.  Ethics, held in common by the group's common sense, is a lot of how this is ever avoided, I think; and this is more feasible if the group is trying to serve a thing whose folk ethics (or "commonsense good") has been worked out, vs something that hasn't.

For a concrete example: 

AI safety obviously matters.  The folk ethics of "don't let everyone get killed if you can help it" are solid, so that part's fine.  But in terms of tactics: I really think we need to work out a "commonsense good" or "folk ethics" type account of things like:

  1. Is it okay to try to get lots of power, by being first to AI and trying to make use of that power to prevent worse AI outcomes?  (My take: maybe somehow, but I haven't seen the folk ethics worked out, and a good working out would give a lot of checks here, I think.)  
  2. Is it okay to try to suppress risky research, e.g. via frowning at people and telling them that only bad people do AI research, so as to try to delay tech that might kill everyone?  (My take: probably, on my guess -- but a good folk ethics would bring structure and intuitions somehow, like, it would work out how this is different from other kinds of "discourage people from talking and figuring things out," it would have perceivable virtues or something for noticing the differences, which would help people then track the differences on the group commonsense level in ways that help the group's commonsense not erode its general belief in the goodness of people sharing information and doing things).
Comment by AnnaSalamon on "Flinching away from truth” is often about *protecting* the epistemology · 2023-09-26T23:23:07.810Z · LW · GW

I agree, '"Flinching away from truth" is often caused by internal conflation" would be a much better title -- a more potent short take-away.  (Or at least one I more agree with after some years of reflection.)  Thanks!

Comment by AnnaSalamon on The Friendly Drunk Fool Alignment Strategy · 2023-04-05T05:17:46.214Z · LW · GW

I enjoyed this post, both for its satire of a bunch of people's thinking styles (including mine, at times), and because IMO (and in the author's opinion, I think), there are some valid points near here and it's a bit tricky to know which parts of the "jokes/poetry" may have valid analogs.

I appreciate the author for writing it, because IMO we have a whole bunch of different subcultures and styles of conversation and sets of assumptions colliding all of a sudden on the internet right now around AI risk, and noticing the existence of the others seems useful, and IMO the OP is an attempt to collide LW with some other styles.  Judging from the comments it seems to me not to have succeeded all that much; but it was helpful to me, and I appreciate the effort.  (Though, as a tactical note, it seems to me the approximate failure was due mostly to the piece's sarcasm, and I suspect sarcasm in general tends not to work well across cultural or inferential distances.)

Some points I consider valid, that also appear within [the vibes-based reasoning the OP is trying to satirize, and also to model and engage with]:

1) Sometimes, talking a lot about a very specific fear can bring about the feared scenario.  (An example I'm sure of: a friend's toddler stuck her hands in soap.  My friend said "don't touch your eyes."  The toddler, unclear on the word 'not,' touched her eyes.) (A possible example I'm less confident in: articulated fears of AI risk may have accelerated AI because humanity's collective attentional flows, like toddlers, have no reasonable implementation of the word "not.")  This may be a thing to watch out for, for an AI risk movement.

(I think this is non-randomly reflected in statements like: "worrying has bad vibes.")

2) There's a lot of funny ways that attempting to control people or social processes can backfire.  (Example: lots of people don't like it when they feel like something is trying to control them.)  (Example: the prohibition of alcohol in the US between 1920 and 1933 is said to have fueled organized crime.)  (Example I'm less confident of:  Trying to keep e.g. anti-vax views out of public discourse leads some to be paranoid, untrusting of establishment writing on the subject.)  This is a thing that may make trouble for some safety strategies, and that seems to me to be non-randomly reflected in "trying to control things has bad vibes."

(Though, all things considered, I still favor trying to slow things!  And I care about trying to slow things.)

3) There're a lot of places where different Schelling equilibria are available, and where groups can, should, and do try to pick the equilibrium that is better.  In many cases this is done with vibes.  Vibes, positivity, attending to what is or isn't cool or authentic (vs boring), etc., are part of how people decide which company to congregate on, which subculture to bring to life, which approach to AI to do research within, etc.  -- and this is partly doing some real work discerning what can become intellectually vibrant (vs boring, lifeless, dissociated).

TBC, I would not want to use vibes-based reasoning in place of reasoning, and I would not want LW to accept vibes in place of reasons.   I would want some/many in LW to learn to model vibes-based reasoning for the sake of understanding the social processes around us.  I would also want some/many at LW to sometimes, if the rate of results pans out in a given domain, use something like vibes-based reasoning as a source of hypotheses that one can check against actual reasoning.  LW seems to me pretty solid on reasoning relative to other places I know on the internet, but only mediocre on generativity; I think learning to absorb hypotheses from varied subcultures (and from varied old books, from people who thought at other times and places) would probably help, and the OP is gesturing at one such subculture.

I'm posting this comment because I didn't want to post this comment for fear of being written off by LW, and I'm trying to come out of more closets.  Kinda at random, since I've spent large months or small years failing to successfully implement some sort of more planned approach.

Comment by AnnaSalamon on Widening Overton Window - Open Thread · 2023-03-31T17:56:45.249Z · LW · GW

The public early Covid-19 conversation (in like Feb-April 2020) seemed pretty hopeful to me -- decent arguments, slow but asymmetrically correct updating on some of those arguments, etc.  Later everything became politicized and stupid re: covid.

Right now I think there's some opportunity for real conversation re: AI.  I don't know what useful thing follows from that, but I do think it may not last, and that it's pretty cool.  I care more about the "an opening for real conversation" thing than for the changing overton window as such, although I think the former probably follows from the latter (first encounters are often more real somehow).

Comment by AnnaSalamon on Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky · 2023-03-30T04:10:52.555Z · LW · GW

This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space.

That's not how I read it.  To me it's an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests.  This doesn't seem like an attempt to be clever or explore high-variance tails.  More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.

Comment by AnnaSalamon on Are there specific books that it might slightly help alignment to have on the internet? · 2023-03-29T23:11:25.876Z · LW · GW

Thanks for the suggestion.  I haven't read it.  I'd thought from hearsay that it is rather lacking in "light" -- a bunch of people who're kinda bored and can't remember the meaning of life -- is that true?  Could be worth it anyway.

Comment by AnnaSalamon on Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)'s training corpus? · 2023-03-29T19:15:37.438Z · LW · GW

I did not know this; thanks!

Comment by AnnaSalamon on FLI open letter: Pause giant AI experiments · 2023-03-29T06:29:41.135Z · LW · GW

Not sure where you're going with this.  It seems to me that political methods (such as petitions, public pressure, threat of legislation) can be used to restrain the actions of large/mainstream companies, and that training models one or two OOM larger than GPT4 will be quite expensive and may well be done mostly or exclusively within large companies of the sort that can be restrained in this sort of way.

Comment by AnnaSalamon on Are there specific books that it might slightly help alignment to have on the internet? · 2023-03-29T06:06:49.252Z · LW · GW

Maybe also: anything that bears on how an LLM, if it realizes it is not human and is among aliens in some sense, might want to relate morally to thingies that created it and aren't it.  (I'm not immediately thinking of any good books/similar that bear on this, but there probably are some.)

Comment by AnnaSalamon on Are there specific books that it might slightly help alignment to have on the internet? · 2023-03-29T06:03:43.174Z · LW · GW

I was figuring GPT4 was already trained on a sizable fraction of the internet, and GPT5 would be trained on basically all the text (plus maybe some not-text, not sure).  Is this wrong?

Comment by AnnaSalamon on Are there specific books that it might slightly help alignment to have on the internet? · 2023-03-29T06:02:18.574Z · LW · GW

In terms of what kinds of things might be helpful:

1. Object-level stuff:

Things that help illuminate core components of ethics, such as "what is consciousness," "what is love," "what is up in human beings with the things we call 'values', that seem to have some thingies in common with beliefs," "how exactly did evolution end up producing the thing where we care about stuff and find some things worth caring about," etc.

Some books I kinda like in this space: 

  • Martin Buber's book "I and Thou"
  • Christopher Alexander's writing, especially his "The Nature of Order" books
  • The Tao Te Ching (though this one I assume is thoroughly in any huge training corpus already)
  • (curious for y'all's suggestions)


2.  Stuff that aids processes for eliciting people's values, or for letting people elicit each other's values:

My thought here is that there're dialogs between different people, and between people and LLMs, on what matters and how we can tell.  Conversational methodologies for helping these dialogs go better seem maybe-helpful.  E.g. active listening stuff, or circling, or Gendlin's Focusing stuff, or ... [not sure what -- theory of how these sorts of fusions and dialogs can ever work, what they are, tips for how to do them in practice, ...]



3.  Especially, maybe: stuff that may help locate "attractor states" such that an AI, or a network of humans and near-human-level AIs, might, if it gets near such an attractor state, choose to stay in it.  And such that the attractor state has something to do with creating good futures.

  • Confucius (? I haven't read him, but he at least shaped society for a long time in a way that was partly about respecting and not killing your ancestors?)
  • Hayek (he has an idea of "natural law" as sort of how you have to structure minds and economies of minds if you want to be able to choose at all, rather than e.g. making random mouth motions that cause random other things to happen that have nothing to do with your intent really, like what would happen if a monarch says "I want to abolish poverty" and then people try to "implement" his "decree").

Comment by AnnaSalamon on FLI open letter: Pause giant AI experiments · 2023-03-29T05:31:55.192Z · LW · GW

It may not be possible to prevent GPT4-sized models, but it probably is possible to prevent GPT-5-sized models, if the large companies sign on and don't want it to be public knowledge that they did it.  Right?

Comment by AnnaSalamon on FLI open letter: Pause giant AI experiments · 2023-03-29T05:28:01.924Z · LW · GW

Oh no.  Apparently Yann LeCun also didn't really sign this.

https://twitter.com/ylecun/status/1640910484030255109

Comment by AnnaSalamon on Here's the exit. · 2022-11-24T23:40:51.177Z · LW · GW

As a personal datapoint: I think the OP's descriptions have a lot in common with how I used to be operating, and that this would have been tremendously good advice for me personally, both in terms of its impact on my personal wellness and in terms of its impact on whether I did good-for-the-world things or harmful things.

(If it matters, I still think AI risk is a decent pointer at a thingy in the world that may kill everyone, and that this matters.  The "get sober" thing is a good idea both in relation to that and broadly AFAICT.)

Comment by AnnaSalamon on “PR” is corrosive; “reputation” is not. · 2022-10-29T20:41:56.156Z · LW · GW

Nope, haven't changed it since publication.

Comment by AnnaSalamon on What should you change in response to an "emergency"? And AI risk · 2022-07-23T22:43:31.290Z · LW · GW

I like this observation.  As a random note, I've sometimes heard people justifying "leave poor working conditions in place for others, rather than spending managerial time improving them" based on how AI risk is an emergency, though whether this checks out on a local consequentialist level is not actually analyzed by the model above, since it partly involves tradeoffs between people and I didn't try to get into that.

I sorta also think that "people acting on a promise of community and support that they later [find] [isn't] there" is sometimes done semi-deliberately by the individuals in question, who are trying to get as much work out of their own System 1 as possible, by hoping a thing works out without really desiring accurate answers.  Or by others who value getting particular work done (via those individuals working hard) and think things are urgent and so are reasoning short-term and locally consequentialist-ly.  Again partly because people are reasoning near an "emergency."  But this claim seems harder to check/verify.  I hope people put more time into "really generating community" rather than "causing newcomers to have an expectation of community," though.

Comment by AnnaSalamon on What should you change in response to an "emergency"? And AI risk · 2022-07-23T22:34:29.052Z · LW · GW

I can think of five easily who spontaneously said something like this to me, and whom I recall specific names and details about.  And like 20 more whom I'm inclined to project it onto, though there was some guesswork involved on my part (e.g., they told me about trouble having hobbies and about feeling kinda haunted by whether it's okay to be "wasting" their time, and it seemed to me these factors were connected, but they didn't connect them aloud for me; or I said I thought there was a pattern like this and they nodded and discussed experiences of theirs, but in a way that left some construal to me and might've been primed.  Also I did not name the 20, so I might be wrong in my notion of how many).

In terms of the five: two "yes better shot IMO," three not.  For the 20, maybe 1/4th "better shot IMO".

Comment by AnnaSalamon on AnnaSalamon's Shortform · 2022-07-23T22:10:15.063Z · LW · GW

If you get covid (which many of my friends seem to be getting lately), and your sole goal is to minimize risk of long-term symptoms, is it best to take Paxlovid right away, or with a delay?

My current low-confidence guess is that it is best with a delay of ~2 days post symptoms.  Would love critique/comments, since many here will face this sometime this year.

Basic reasoning: anecdotally, "covid rebound" seems extremely common, and probably also worse, among those who get Paxlovid right away.  Paxlovid prevents viral replication but does not destroy the virus already in your body.  With a delay, your own immune system gets time to learn to do this; without one, it learns less.

Data and discussion: https://twitter.com/__philipn__/status/1550239344627027968

Comment by AnnaSalamon on Changing the world through slack & hobbies · 2022-07-21T21:02:42.123Z · LW · GW

Maybe.  But a person following up on threads in their leisure time, and letting the threads slowly congeal until they turn into a hobby, is usually letting their interests lead them initially, without worrying too much about "whether it's going anywhere," whereas when people try to "found" something they're often trying to make it big, trying to make it something that will be scalable and defensible.  I like that this post is giving credit to the first process, which IMO has been historically pretty useful pretty often.  I'd also point to the old tradition of "gentlemen scientists" back before the era of publicly funded science, who performed very well per capita; I would guess that high performance was at least partly because there was more low-hanging fruit back then, but my personal guess is that that wasn't the only cause.

Comment by AnnaSalamon on What should you change in response to an "emergency"? And AI risk · 2022-07-21T19:50:20.197Z · LW · GW

I appreciate this comment a lot.  Thank you.  I appreciate that it’s sharing an inside view, and your actual best guess, despite these things being the sort of thing that might get social push-back!

My own take is that people depleting their long-term resources and capacities is rarely optimal in the present context around AI safety.

My attempt to share my reasoning is pretty long, sorry; I tried to use bolding to make it skimmable.

In terms of my inside-view disagreement, if I try to reason about people as mere means to an end (e.g. “labor”):

0.  A world where I'd agree with you.  If all that would/could impact AI safety was a particular engineering project (e.g., Redwood’s ML experiments, for concreteness), and if the time-frame of a person’s relevance to that effort was relatively short (e.g., a year or two, either because AI was in two years, or because there would be an army of new people in two years), I agree that people focusing obsessively for 60 hours/week would probably produce more than the same people capping their work at 35 hrs/week.

But (0) is not the world we’re in, at least right now.  Specific differences between a world where I'd agree with you, and the world we seem to me to be in:

1.  Having a steep discount rate on labor seems like a poor predictive bet to me.  I don’t think we’re within two years of the singularity; I do think the supply of labor is increasing, but not at a crazy rate; and a person who keeps their wits and wisdom about them, who pays attention and cares and thinks and learns, and especially someone who is relatively new to the field and/or relatively young (which is the case for most such engineers, I think), can reasonably hope to be more productive in 2 years than they are now, which on my best guess can roughly counterbalance that increase, or more than counterbalance it.

E.g., if they get hired at Redwood and then stay there, you’ll want veterans a couple years later who already know your processes and skills.

(In 2009, I told myself I needed only to work hard for ~5 years, maybe 10, because after that I’d be a negligible portion of the AI safety effort, so it was okay to cut corners.  I still think I’m a non-negligible portion of the effort.)

1.1.  Trying a thing to see if it works (e.g. 60 hrs/week of obsession, to see how that is) might still be sensible, but more like “try it and see if it works, especially if that risk and difficulty is appealing, since “appealingness” is often an indicator that a thing will turn out to make sense / to yield useful info / to be the kind of thing one can deeply/sincerely try rather than forcing oneself to mimic, etc.” not like “you are nothing and don’t matter much after two years, run yourself into the ground while trying to make a project go.”  I suppose your question is about accepting a known probability of running yourself into the ground, but I’m having trouble booting that sim; to me the two mindsets are pretty different.  I do think many people are too averse to risk and discomfort; but also that valuing oneself in the long-term is correct and important.  Sorry if I’m dodging the question here.

2.  There is no single project that is most of what matters in AI safety today, AFAICT.  Also, such projects as exist are partly managerially bottlenecked.  And so it isn’t “have zero impact” vs “be above Redwood’s/project such-and-such’s hiring line,” it is “be slightly above a given hiring line” (and contribute the difference between that spot and the person who would fill it next, or between that project having one just-above-margin person and having one fewer but more managerial slack) vs “be alive and alert and curious as you take an interest in the world from some other location”, which is more continuous-ish.

3.  We are confused still, and the work is often subtle, such that we need people to notice subtle mismatches between what they’re doing and what makes sense to do, and subtle adjustments to specific projects, to which projects make sense at all, and subtle updates from how the work is going that can be propagated to some larger set of things, etc.  We need people who care and don’t just want to signal that they kinda look like they care.  We need people who become smarter and wiser and more oriented over time and who have deep scientific aesthetics, and other aesthetics.  We need people who can go for what matters even when it means backtracking or losing face.  We don’t mainly need people as something like fully needing-to-be-directed subjugated labor, who try for the appearances while lacking an internal compass.  I expect more of this from folks who average 35 hrs/week than 60 hrs/week in most cases (not counting brief sprints, trying things for a while to test and stretch one’s capacities, etc. — all of which seems healthy and part of fully inhabiting this world to me).  Basically because of the things pointed out in Raemon’s post about slack, or Ben’s post about the Sabbath.  Also because often 60 hrs/week for long periods of time means unconsciously writing off important personal goals (cf. Critch’s post about addiction to work), and IMO writing off deep goals for the long-term makes it hard to sincerely care about things.

(4.  I do agree there’s something useful about being able to work on other peoples’ projects, or on mundane non-glamorous projects, that many don’t have, and that naive readings of my #3 might tend to pull away from.  I think the deeper readings of #3 don’t, but it could be discussed.)


If I instead try to share my actual views, despite these being kinda wooey and inarticulate and hard to justify, instead of trying to reason about people as means to an end:

A.  I still agree that in a world where all that would/could impact AI safety was a particular engineering project (e.g., Redwood’s ML experiments, for concreteness), and where the time-frame of a person’s relevance to that effort was relatively short (e.g., a year or two, or probably even five years), people focusing obsessively for 60 hours/week would be in many ways saner-feeling, more grounding, and more likely to produce the right kind of work in the right timeframe than the same people capping their work at 35 hrs/week.  (Although even here, vacations, sabbaths, or otherwise carefully maintaining enough of the right kinds of slack and leisure that deep new things can bubble up seems really valuable to me; otherwise I expect a lot of people working hard at dumb subtasks.)

A2.  I’m talking about “saner-feeling” and “more grounding” here, because I’m imagining that if people are somehow capping their work at 35 hrs/week, this might be via dissociating from how things matter, and dissociation sucks and has bad side-effects on the quality of work and of team conversation and such IMO.  This is really the main thing I’m optimizing for ~in general; I think sane grounded contexts where people can see what causes will have what effects and can acknowledge what matters will mostly cause a lot of the right actions, and that the main question is how to cause such contexts, whether that means 60 hrs/week or 35 hrs/week or what.

A3.  In this alternate world, I expect people will kinda naturally reason about themselves and one another as means to an end (to the end of us all surviving), in a way that won’t be disoriented and won’t be made out of fear and belief-in-belief and weird dissociation.

B.  In the world we seem to actually be in, I think all of this is pretty different:

B1.  It’s hard to know what safety strategies will or won’t help how much.  

B2.  Lots of people have “belief in belief” about safety strategies working.  Often this is partly politically motivated/manipulated, e.g. people wanting to work at an organization and to rise there via buying into that organization’s narrative; an organization wanting its staff and potential hires to buy its narrative so they’ll work hard and organize their work in particular ways and be loyal.

B3.  There are large “unknown unknowns,” large gaps in the total set of strategies being done, maybe none of this makes sense, etc.

B4.  AI timelines are probably more than two years, probably also more than five years, although it’s hard to know.

C.  In a context like the hypothetical one in A, people talking about how some people are worth much more than others, about what tradeoffs will have what effects, etc., will for many cash out in mechanistic reasoning and so be basically sane-making and grounding.  (Likewise, I suspect battlefield triage or mechanistic reasoning from a group of firefighters considering rescuing people from a burning building is pretty sane-making.)

In a context like the one in B (which is the one I think we’re in), people talking about themselves and other people as mere means to an end, and about how much more some people are worth than others, such that those other people are a waste of the first people’s time to talk to, and so on, will tend to increase social fear, decrease sharing of actual views, and increase weird status stuff and the feeling that one ought not question current social narratives, I think.  It will tend to erode trust, erode freedom to be oneself or to share data about how one is actually thinking and feeling, and increase the extent to which people cut off their own and others’ perceptual faculties.  The opposite of sane-ifying/grounding.

To gesture a bit at what I mean: a friend of mine, after attending a gathering of EA elites for the first time, complained that it was like: “So, which of the 30 organizations that we all agree has no more than a 0.1% chance of saving the world do you work for?”, followed by talking shop about the specifics within that plan, with almost no attention to the rest of the probability mass.

So I think we ought mostly not to reason about ourselves and others as “labor” as though we’re in simple microecon world, given the world we’re in, and given that it encourages writing off a bunch of people’s perceptual abilities etc.  Though I also think that you, Buck (or others), speaking your mind, including when you’re reasoning this way, is extremely helpful!  We of course can’t stop wrong views by taking my best guess at which views are right and doing belief-in-belief about it; we have to converse freely and see what comes out.

(Thanks to Justis for saying some of this to me in the comments prior to me posting.)

Comment by AnnaSalamon on Sexual Abuse attitudes might be infohazardous · 2022-07-20T21:18:58.646Z · LW · GW

I think this is a solid point, and that pointing out the asymmetry in evolutionary gradients is important; I would also expect different statistical distributions for men and women here.  At the same time, my naive ev psych guess about how all this is likely to work out would also take into account that men and women share genes, and that creating gender-specific adaptations is actually tricky.  As evidence: men have nipples, and those nipples sometimes produce drops of milk.

Once, a while ago and outside this community, a female friend swore me to secrecy and then shared a story and hypothesis similar to the OP's (ETA: it was also a story of sexual touching, not of rape; I suspect rape is usually more traumatic).  I've also heard stories from both men and women of being pretty messed up by sexual abuse, including at least two different men messed up by having, as teenagers, had sex with older women (in one case, one of his teachers) without violence/force, but with manipulation.  My current guess is that adaptations designed for one sex typically appear with great variability in the other sex (e.g. male nipples' milk production), and so we should expect some variability in male reactions here.  Also everyone varies.

ETA: I'd like to quarrel with the use of the word "infohazardous" in the OP's title.  My best guess is that people would be better off having all the stories, including stories such as the OP's that it is currently somewhat taboo to share; my best guess is that there is a real risk of harm the OP is gesturing at, but that this comes significantly via the selective non-sharing of info, rather than primarily via the sharing of info.