Posts

Where do (did?) stable, cooperative institutions come from? 2020-11-03T22:14:09.322Z
Reality-Revealing and Reality-Masking Puzzles 2020-01-16T16:15:34.650Z
We run the Center for Applied Rationality, AMA 2019-12-19T16:34:15.705Z
"Flinching away from truth” is often about *protecting* the epistemology 2016-12-20T18:39:18.737Z
Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” 2016-12-12T19:39:50.084Z
CFAR's new mission statement (on our website) 2016-12-10T08:37:27.093Z
CFAR’s new focus, and AI Safety 2016-12-03T18:09:13.688Z
On the importance of Less Wrong, or another single conversational locus 2016-11-27T17:13:08.956Z
Several free CFAR summer programs on rationality and AI safety 2016-04-14T02:35:03.742Z
Consider having sparse insides 2016-04-01T00:07:07.777Z
The correct response to uncertainty is *not* half-speed 2016-01-15T22:55:03.407Z
Why CFAR's Mission? 2016-01-02T23:23:30.935Z
Why startup founders have mood swings (and why they may have uses) 2015-12-09T18:59:51.323Z
Two Growth Curves 2015-10-02T00:59:45.489Z
CFAR-run MIRI Summer Fellows program: July 7-26 2015-04-28T19:04:27.403Z
Attempted Telekinesis 2015-02-07T18:53:12.436Z
How to learn soft skills 2015-02-07T05:22:53.790Z
CFAR fundraiser far from filled; 4 days remaining 2015-01-27T07:26:36.878Z
CFAR in 2014: Continuing to climb out of the startup pit, heading toward a full prototype 2014-12-26T15:33:08.388Z
Upcoming CFAR events: Lower-cost bay area intro workshop; EU workshops; and others 2014-10-02T00:08:44.071Z
Why CFAR? 2013-12-28T23:25:10.296Z
Meetup : CFAR visits Salt Lake City 2013-06-15T04:43:54.594Z
Want to have a CFAR instructor visit your LW group? 2013-04-20T07:04:08.521Z
CFAR is hiring a logistics manager 2013-04-05T22:32:52.108Z
Applied Rationality Workshops: Jan 25-28 and March 1-4 2013-01-03T01:00:34.531Z
Nov 16-18: Rationality for Entrepreneurs 2012-11-08T18:15:15.281Z
Checklist of Rationality Habits 2012-11-07T21:19:19.244Z
Possible meetup: Singapore 2012-08-21T18:52:07.108Z
Center for Modern Rationality currently hiring: Executive assistants, Teachers, Research assistants, Consultants. 2012-04-13T20:28:06.071Z
Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 2012-03-29T20:48:48.227Z
How do you notice when you're rationalizing? 2012-03-02T07:28:21.698Z
Urges vs. Goals: The analogy to anticipation and belief 2012-01-24T23:57:04.122Z
Poll results: LW probably doesn't cause akrasia 2011-11-16T18:03:39.359Z
Meetup : Talk on Singularity scenarios and optimal philanthropy, followed by informal meet-up 2011-10-10T04:26:09.284Z
[Question] Do you know a good game or demo for demonstrating sunk costs? 2011-09-08T20:07:55.420Z
[LINK] How Hard is Artificial Intelligence? The Evolutionary Argument and Observation Selection Effects 2011-08-29T05:27:31.636Z
Upcoming meet-ups 2011-06-21T22:28:40.610Z
Upcoming meet-ups: 2011-06-11T22:16:09.641Z
Upcoming meet-ups: Buenos Aires, Minneapolis, Ottawa, Edinburgh, Cambridge, London, DC 2011-05-13T20:49:59.007Z
Mini-camp on Rationality, Awesomeness, and Existential Risk (May 28 through June 4, 2011) 2011-04-24T08:10:13.048Z
Learned Blankness 2011-04-18T18:55:32.552Z
Talk and Meetup today 4/4 in San Diego 2011-04-04T11:40:05.167Z
Use curiosity 2011-02-25T22:23:54.462Z
Make your training useful 2011-02-12T02:14:03.597Z
Starting a LW meet-up is easy. 2011-02-01T04:05:43.179Z
Branches of rationality 2011-01-12T03:24:35.656Z
If reductionism is the hammer, what nails are out there? 2010-12-11T13:58:18.087Z
Anthropologists and "science": dark side epistemology? 2010-12-10T10:49:41.139Z
Were atoms real? 2010-12-08T17:30:37.453Z
Help request: What is the Kolmogorov complexity of computable approximations to AIXI? 2010-12-05T10:23:55.626Z

Comments

Comment by annasalamon on AGI Predictions · 2020-11-22T05:12:51.612Z · LW · GW

IMO, we decidedly do not "basically have it covered."

That said, IMO it is generally not a good idea for a person to try to force themselves on problems that will make them crazy, desperate need or no.

I am often tempted to downplay how much catastrophe-probability I see, basically to decrease the odds that people decide to make themselves crazy in the direct vicinity of alignment research and alignment researchers.

And on the other hand, I am tempted by the HPMOR passage:

"Girls?" whispered Susan. She was slowly pushing herself to her feet, though Hermione could see her limbs swaying and quivering. "Girls, I'm sorry for what I said before. If you've got anything clever and heroic to try, you might as well try it."

(To be clear, I have hope. Also, please just don't go crazy and don't do stupid things.)

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T05:44:31.455Z · LW · GW

Thanks; this resonates for me and I hadn't thought of it here.

The guess that makes sense to me along these lines: maybe it's less about individual vulnerability to attack/etc., and more that they can sense somehow that the fundamentals of our collective situation are not viable (environmental collapse, AI, social collapse, who knows, from that visceral perspective I imagine them to have), and yet they don't have a frame for understanding the "this can't keep working," and so it lands in the "in denial" bucket and their "serious person" is fake. (I don't think the "fake" comes from "scared" alone; I think you need also "in denial about it." For example, I think military units in war often do not feel fake, although their people are scared.)

(Alternative theory for scared: maybe it is just that we are lacking tribe.)

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T05:07:14.120Z · LW · GW

Thanks. I buy the death spirals thing. I'm not sure I buy the "OK in the private sector but not the public sector b/c no competitive process there" thing -- do you have a story for why the public sector remained okay for ~200 years (if it did)? Also, particular newspapers and academic institutions have competitors, and seem to me also to be in decline.

Comment by annasalamon on Open & Welcome Thread – November 2020 · 2020-11-04T05:04:30.683Z · LW · GW

"And the explosive mood had rapidly faded into a collective sentiment which might perhaps have been described by the phrase: Give us a break!

Blaise Zabini had shot himself in the name of Sunshine, and the final score had been 254 to 254 to 254."

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T00:16:01.001Z · LW · GW

The first half is fine, but replace "altruistically" with "selfishly".... They figure out how to make a living... [emphasis mine]

At first glance, if we're talking about a thing that requires cooperative effort from many people across time, this seems like a heck of a principal-agent problem. What keeps everybody's incentives aligned? Why does each of us trying selfishly to make a living result in a working firefighting group (or whatever) instead of a tug-of-war? I understand the "invisible hand" when many different individuals are individually putting up goods/services for sale; I do not understand it as an explanation for how hundreds of people get coordinated into working institutions.

My 0-3 is an attempt to understand how something-like-selfishness (or something-like-altruism, or whatever) could stitch the people together into a thingy that could produce good stuff despite the principal-agent problem / coordination difficulty.

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T23:50:18.231Z · LW · GW

Thanks for the SF crime link; you may be right. Multiple (but far from all) friends of mine in SF have been complaining about being more often accosted, having greater fear of mugging than previously, etc.; but that is a selection of crimes and is not conclusive evidence.

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T23:25:51.352Z · LW · GW

There actually seems to be far more subcultures being formed than there ever were before

DaystarEld, what are your favorite current happening scenes? (Where new art/science/music/ways of making sense of the world/neat stuff is being created?) Would love leads on where to look.

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T23:13:05.831Z · LW · GW

Thanks. Under this hypothesis, we should see an improvement in the quality of private-sector institutions. (Whereas, under some competing hypotheses, Google and other private-sector companies should also have trouble creating institutional cultures in the 0-3 sense.) Thoughts on which we see?

Also, thoughts on David Chapman's claim that subcultures (musical scenes, hobby groups, political movements, etc.) have been vanishing? Do you also hypothesize this brain drain to affect hobby groups?

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T23:02:15.445Z · LW · GW

Good question. I'm not sure if this will make sense, but: this is somehow the sort of place where I would expect people's stab-in-the-dark faculties ("blindsight", some of us call it at CFAR) to have some shot at seeing the center of the thing, and where by contrast I would expect that trying to analyze it with explicit concepts that we already know how to point to would... find us some very interesting details, but nonetheless risk having us miss the "main point," whatever that is.

Differently put: "what is up with institutional cultures lately?" is a question where I don't yet have the right frame/ontology. And so, if we try to talk from concepts/ontologies we do have, I'm afraid we'll slide off of the thing. Whereas, if we tune in to something like that tiny note of discord Eliezer talks about (or if we pan out a lot, and ask what our taste says is/isn't most relevant to the situation, or ask ourselves what does/doesn't feel most central), we may have a better shot.

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T22:54:02.139Z · LW · GW

Thanks! Fixed.

Comment by annasalamon on Where do (did?) stable, cooperative institutions come from? · 2020-11-03T22:37:21.126Z · LW · GW

One hypothesis for why it has gotten harder to form institutional cultures (I am assuming here that it has):

I’ll call this the “Geeks, MOPS, and sociopaths” model. Under this model (put forward e.g. in the essay of Ben Hoffman’s that I linked above), it has somehow become easier and more common for people to successfully ape the appearances of an institutional culture, while not actually being true to it (and so, while betraying it in longer-term or harder-to-trace ways).

In the example of the NYT, this could occur in several ways:

  • People getting jobs within the NYT who believe less sincerely in the old journalistic ethic (though they perhaps believe in looking like they believe in whatever is popular);
  • Alternative press outlets (Washington Post, or whoever) arising that believe less sincerely in journalistic ethics (or anything like this) than the NYT, but who parasitize the “kind of like journalistic ethics” brand by aping its appearance to readers;
  • Leadership of the NYT being more interested in bending the NYT’s brand (and its internal culture/ethics) to however people today happen to be evaluating which newspaper to trust, in ways that boost those leaders’ personal [$/prestige/political power] but that harm the longer-term legacy of the institution (because future people, who are under the sway of different fads, won’t see it this way).

Related argument: the 4-hour documentary / propaganda film “Century of the Self” argues that the dispersion of game theory (“it’s virtuous to think about my self-interest and e.g. defect in prisoners’ dilemmas”) and of marketing/focus groups/“public relations” (“my brand can figure out how other people are making sense of the world on a pre-conscious level, by using techniques similar to Gendlin’s Focusing on them, and can thereby figure out how to be perceived as having a certain ethic/culture/institution by hacking their detectors”) led to more of this sort of aping, and replaced institutional cultures that might’ve helped past people do real work with LARPing and “lifestyles.”

This is also quite related to Goodhart’s law. But under this hypothesis, dynamics have somehow changed so that [individuals/organizations who are trying to appear to have virtues] are able to successfully fool the detectors of [individuals/organizations that are trying to detect whether they have virtues]. It does not explain why that would have changed.

Comment by annasalamon on Open & Welcome Thread - June 2020 · 2020-06-06T16:40:59.056Z · LW · GW

I'm glad you're trying, and am sorry to hear it is so hard; that sounds really hard. You might try the book "How to Have Impossible Conversations." I don't endorse every bit of it, but there's some good stuff in there IMO, or at least I got mileage from it.

Comment by annasalamon on Hanson vs Mowshowitz LiveStream Debate: "Should we expose the youth to coronavirus?" (Mar 29th) · 2020-04-05T18:03:14.427Z · LW · GW

Yes; thanks; I now agree that this is plausible, which I did not at the time of making my above comment.

Comment by annasalamon on Hanson vs Mowshowitz LiveStream Debate: "Should we expose the youth to coronavirus?" (Mar 29th) · 2020-03-29T20:16:20.149Z · LW · GW

I think we are unlikely to hit herd immunity levels of infection in the US in the next 2 years. I want to see Robin and Zvi discuss whether they think that also or not, since this bears on the value of Robin's proposal (and lots of other things).

Comment by AnnaSalamon on [deleted post] 2020-02-17T22:18:08.885Z

Add lots of sleep and down-time, and activities with a clear feedback loop to the physical world (e.g. washing dishes or welding metals or something).

Comment by annasalamon on Mazes Sequence Roundup: Final Thoughts and Paths Forward · 2020-02-09T10:35:23.080Z · LW · GW

For anyone just tuning in and wanting to follow what I mean by “dominating and submitting,” I have in mind the kinds of interactions that Keith Johnstone describes in the “status” chapter of “Impro” (text here; excerpt and previous overcoming bias discussion here.)

This is the book that indirectly caused us to use the word “status” so often around here, but I feel the term “status” is a euphemism that brings model-distortions, versus discussing “dominating and submitting.” FWIW, Johnstone in the original passage says it is a euphemism, writing: “I should really talk about dominance and submission, but I’d create a resistance. Students who will agree readily to raising or lowering their status may object if asked to ‘dominate’ or ‘submit’.” (Hattip: Divia.)

Comment by annasalamon on Mazes Sequence Roundup: Final Thoughts and Paths Forward · 2020-02-09T10:23:39.146Z · LW · GW

The Snafu Principle, whereby communication is only fully possible between equals, leading to Situation Normal All F***ed Up.

This seems true to me in one sense of “equals” and false in another. It seems true to me that dominating and submitting prohibit real communication. It does not seem true to me that structures of authority (“This is my coffee shop; and so if you want to work here you’ll have to sign a contract with me, and then I’ll be able to stop hiring you later if I don’t want to hire you later”) necessarily prohibit communication, though. I can imagine contexts where free agents voluntarily decide to enter into an authority relationship (e.g., because I freely choose to work at Bob’s coffee shop until such time as it ceases to aid my and/or Bob’s goals), without dominating or submitting, and thereby with the possibility of communication.

Relatedly, folks who are peers can easily enough end up dominating/submitting-to each other, or getting stuck reinforcing lies to each other about how good each other’s poetry is or whatever, instead of communicating.

Do you agree that this is the true bit of the “communication is only possible between equals” claim, or do you have something else in mind?

Comment by annasalamon on Player vs. Character: A Two-Level Model of Ethics · 2020-01-20T05:46:33.016Z · LW · GW

I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-20T05:00:32.143Z · LW · GW

Responding partly to Orthonormal and partly to Raemon:

Part of the trouble is that group dynamic problems are harder to understand, harder to iterate on, and take longer to appear and to be obvious. (And are then harder to iterate toward fixing.)

Re: individuals having manic or psychotic episodes, I agree with what Raemon says. About six months to a year into CFAR’s workshop-running experience, a participant had a manic episode a couple weeks after a workshop in a way that seemed plausibly triggered partly by the workshop. (Interestingly, if I’m not mixing people up, the same individual later told me that they’d also been somewhat destabilized by reading the sequences, earlier on.) We then learned a lot about warning signs of psychotic or manic episodes and took a bunch of steps to mostly-successfully reduce the odds of having the workshop trigger these. (In terms of causal mechanisms: It turns out that workshops of all sorts, and stuff that messes with one’s head of all sorts, seem to trigger manic or psychotic episodes occasionally. E.g. Landmark workshops; meditation retreats; philosophy courses; going away to college; many different types of recreational drugs; and different small self-help workshops run by a couple of people outside the rationality community whom I tried randomly asking about this. So my guess is that it isn’t the “taking ideas seriously” aspect of CFAR as such, although I dunno.)

Re: other kinds of “less sane”:

(1) IMO, there has been a build-up over time of mentally iffy psychological habits/techniques/outlook-bits in the Berkeley “formerly known as rationality” community, including iffy thingies that affect the rate at which other iffy things get created (e.g., by messing with the taste of those receiving/evaluating/passing on new “mess with your head” techniques; and by helping people be more generative of “mess with your head” methods via them having had a chance to see several already which makes it easier to build more). My guess is that CFAR workshops have accidentally been functioning as a “gateway drug” toward many things of iffy sanity-impact, basically by: (a) providing a healthy-looking context in which people get over their concerns about introspection/self-hacking because they look around and see other happy healthy-looking people; and (b) providing some entry-level practice with introspection, and with “dialoging with one’s tastes and implicit models and so on”, which makes it easier for people to mess with their heads in other, less-vetted ways later.

My guess is that the CFAR workshop has good effects on folks who come from a sane-ish or at least stable-ish outside context, attend a workshop, and then return to that outside context. My guess is that its effects are iffier for people who are living in the bay area, do not have a day job/family/other anchor, and are on a search for “meaning.”

My guess is that those effects have been getting gradually worse over the last five or more years, as a background level of this sort of thing accumulates.

I ought probably to write about this in a top-level post, and may actually manage to do so. I’m also not at all confident of my parsing/ontology here, and would quite appreciate help with it.

(2) Separately, AI risk seems pretty hard for people, including ones unrelated to this community.

(3) Separately, “taking ideas seriously” indeed seems to pose risks. And I had conversations with e.g. Michael Vassar back in ~2008 where he pointed out that this poses risks; it wasn’t missing from the list. (Even apart from tail risks, some forms of “taking ideas seriously” seem maybe-stupid in cases where the “ideas” are not also grounded in one’s inner simulator, tastes, viscera — there is much sense in those that isn’t in ideology-mode alone.) I don’t know whether CFAR workshops increase or decrease people’s tendency to take ideas seriously in the problematic sense, exactly. They have mostly tried to connect people’s ideas and people’s viscera in both directions.

“How to take ideas seriously without [the taking ideas seriously bit] causing them to go insane” as such actually still isn’t that high on my priorities list; I’d welcome arguments that it should be, though.

I’d also welcome arguments that I’m just distinguishing 50 types of snow and that these should all be called the same thing from a distance. But for the moment for me the group-level gradual health/wholesomeness shifts and the individual-level stuff show up as pretty different.

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T19:27:42.449Z · LW · GW

There are some edge cases I am confused about, many of which are quite relevant to the “epistemic immune system vs Sequences/rationality” stuff discussed above:

Let us suppose a person has two faculties that are both pretty core parts of their “I” -- for example, deepset “yuck/this freaks me out” reactions (“A”), and explicit reasoning (“B”). Now let us suppose that the deepset “yuck/this freaks me out” reactor (A) is being used to selectively turn off the person’s contact with explicit reasoning in cases where it predicts that B “reasoning” will be mistaken / ungrounded / not conducive to the goals of the organism. (Example: a person’s explicit models start saying really weird things about anthropics, and then they have a less-explicit sense that they just shouldn’t take arguments seriously in this case.)

What does it mean to try to “help” a person in such a case, where two core faculties are already at loggerheads, or where one core faculty is already masking things from another?

If a person tinkers in such a case toward disabling A’s ability to disable B’s access to the world… the exact same process, in its exact same aspect, seems “reality-revealing” (relative to faculty B) and “reality-masking” (relative to faculty A).

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T19:27:23.700Z · LW · GW

To try yet again:

The core distinction between tinkering that is “reality-revealing” and tinkering that is “reality-masking,” is which process is learning to predict/understand/manipulate which other process.

When a process that is part of your core “I” is learning to predict/manipulate an outside process (as with the child who is whittling, and is learning to predict/manipulate the wood and pocket knife), what is happening is reality-revealing.

When a process that is not part of your core “I” is learning to predict/manipulate/screen-off parts of your core “I”s access to data, what is happening is often reality-masking.

(Multiple such processes can be occurring simultaneously, as multiple processes learn to predict/manipulate various other processes all at once.)

The "learning" in a given reality-masking process can be all in a single person's head (where a person learns to deceive themselves just by thinking self-deceptive thoughts), but it often occurs via learning to impact outside systems that then learn to impact the person themselves (like in the example of me as a beginning math tutor learning to manipulate my tutees into manipulating me into thinking I'd explained things clearly)).

The "reality-revealing" vs "reality-masking" distinction is in attempt to generalize the "reasoning" vs "rationalizing" distinction to processes that don't all happen in a single head.

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T12:42:39.358Z · LW · GW

I like your example about your math tutoring, where you "had a fun time” and “[weren’t] too results driven” and reality-masking phenomena seemed not to occur.

It reminds me of Eliezer talking about how the first virtue of rationality is curiosity.

I wonder how general this is. I recently read the book “Zen Mind, Beginner’s Mind,” where the author suggests that difficulty sticking to such principles as “don’t lie,” “don’t cheat,” “don’t steal,” comes from people being afraid that they otherwise won’t get a particular result, and recommends that people instead… well, “leave a line of retreat” wasn’t his suggested ritual, but I could imagine “just repeatedly leave a line of retreat, a lot” working for getting unattached.

Also, I just realized (halfway through typing this) that cousin_it and Said Achmiz say the same thing in another comment.

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T11:21:32.879Z · LW · GW

Thanks; you naming what was confusing was helpful to me. I tried to clarify here; let me know if it worked. The short version is that what I mean by a "puzzle" is indeed person-specific.

A separate clarification: on my view, reality-masking processes are one of several possible causes of disorientation and error; not the only one. (Sort of like how rationalization is one of several possible causes of people getting the wrong answers on math tests; not the only one.) In particular, I think singularity scenarios are sufficiently far from what folks normally expect that the sheer unfamiliarity of the situation can cause disorientation and errors (even without any reality-masking processes; though those can then make things worse).

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T10:56:35.962Z · LW · GW

The difficulties above were transitional problems, not the main effects.

Why do you say they were "transitional"? Do you have a notion of what exactly caused them?

Comment by annasalamon on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T10:17:53.429Z · LW · GW

A couple people asked for a clearer description of what a “reality-masking puzzle” is. I’ll try.

JamesPayor’s comment speaks well for me here:

There was the example of discovering how to cue your students into signalling they understand the content. I think this is about engaging with a reality-masking puzzle that might show up as "how can I avoid my students probing at my flaws while teaching" or "how can I have my students recommend me as a good tutor" or etc.

It's a puzzle in the sense that it's an aspect of reality you're grappling with. It's reality-masking in that the pressure was away from building true/accurate maps.

To say this more slowly:

Let’s take “tinkering” to mean “a process of fiddling with a [thing that can provide outputs] while having some sort of feedback-loop whereby the [outputs provided by the thing] impacts what fiddling is tried later, in such a way that it doesn’t seem crazy to say there is some ‘learning’ going on.”

Examples of tinkering:

  • A child playing with legos. (The “[thing that provides outputs]” here is the [legos + physics], which creates an output [an experience of how the legos look, whether they fall down, etc.] in reply to the child’s “what if I do this?” attempts. That output then affects the child’s future play-choices some, in such a way that it doesn’t seem crazy to say there is some “learning” happening.)
  • A person doodling absent-mindedly while talking on the phone, even if the doodle has little to no conscious attention;
  • A person walking. (Since the walking process (I think) contains at least a bit of [exploration / play / “what happens if I do this?” -- not necessarily conscious], and contains some feedback from “this is what happens when you send those signals to your muscles” to future walking patterns)
  • A person explicitly reasoning about how to solve a math problem
  • A family member A mostly-unconsciously taking actions near another family member B [while A consciously or unconsciously notices something about how B responds, and while A has some conscious or unconscious link between [how B responds] and [what actions A takes in future]].

By a “puzzle”, I mean a context that gets a person to tinker. Puzzles can be person-specific. “How do I get along with Amy?” may be a puzzle for Bob and may not be a puzzle for Carol (because Bob responds to it by tinkering, and Carol responds by, say, ignoring it). A kong toy with peanut butter inside is a puzzle for some dogs (i.e., it gets these dogs to tinker), but wouldn’t be for most people. Etc.

And… now for the hard part. By a “reality-masking puzzle”, I mean a puzzle such that the kind of tinkering it elicits in a given person will tend to make that person’s “I” somehow stupider, or in less contact with the world.

The usual way this happens is that, instead of the tinkering-with-feedback process gradually solving an external problem (e.g., “how do I get the peanut butter out of the kong toy?”), the tinkering-with-feedback process is gradually learning to mask things from part of their own mind (e.g. “how do I not-notice that I feel X”).

This distinction is quite related to the distinction between reasoning and rationalization.

However, it differs from that distinction in that “rationalization” usually refers to processes happening within a single person’s mind. And in many examples of “reality-masking puzzles,” the [process that figures out how to mask a bit of reality from a person’s “I”] is spread across multiple heads, with several different tinkering processes feeding off each other and the combined result somehow being partially about blinding someone.

I am actually not all that satisfied by the “reality-revealing puzzles” vs “reality-masking puzzles” ontology. It was more useful to me than what I’d had before, and I wanted to talk about it, so I posted it. But… I understand what it means for the evidence to run forwards vs backwards, as in Eliezer’s Sequences post about rationalization. I want a similarly clear-and-understood generalization of the “reasoning vs rationalizing” distinction that applies also to processes spread across multiple heads. I don’t have that yet. I would much appreciate help toward this. (Incremental progress helps too.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2020-01-11T19:21:22.265Z · LW · GW

No; that isn't the trouble; I could imagine us getting the money together for such a thing, since one doesn't need anything like a consensus to fund a position. The trouble is more that at this point the members of the bay area {formerly known as "rationalist"} "community" are divided into multiple political factions, or perhaps more-chaos-than-factions, which do not trust one another's judgment (even about pretty basic things, like "yes, this person's actions are outside of reasonable behavioral norms"). It is very hard to imagine an individual or a small committee that people would trust in the right way. Perhaps even more so after that individual or committee tried ruling against someone who really wanted to stay, and that person attempted to create "fear, doubt, and uncertainty" or whatever about the institution that attempted to ostracize them.

I think something in this space is really important, and I'd be interested in investing significantly in any attempt that had a decent shot at helping. Though I don't yet have a strong enough read myself on what the goal ought to be.

Comment by annasalamon on AIRCS Workshop: How I failed to be recruited at MIRI. · 2020-01-08T20:54:25.030Z · LW · GW

Hi Mark,

This maybe doesn't make much difference for the rest of your comment, but just FWIW: the workshop you attended in Sept 2016 was not part of the AIRCS series. It was a one-off experiment, funded by an FLI grant, called "CFAR for ML", where we ran most of a standard CFAR workshop and then tacked on an additional day of AI alignment discussion at the end.

The AIRCS workshops have been running ~9 times/year since Feb 2018, have been evolving pretty rapidly, and in recent iterations involve a higher ratio of either AI risk content or content about the cognitive biases etc. that seem to arise in discussions about AI risk in particular. They have somewhat smaller cohorts for more 1-on-1 conversation (~15 participants instead of 23). They are co-run with MIRI, which "CFAR for ML" was not. They have a slightly different team and are a slightly different beast.

Which... doesn't mean you wouldn't have had most of the same perceptions if you'd come to a recent AIRCS! You might well have. From a distance perhaps all our workshops are pretty similar. And I can see calling "CFAR for ML" "AIRCS", since it was in fact partially about AI risk and was aimed mostly at computer scientists, which is what "AIRCS" stands for. Still, we locally care a good bit about the distinctions between our programs, so I did want to clarify.

Comment by annasalamon on CFAR Participant Handbook now available to all · 2020-01-08T05:17:21.630Z · LW · GW

Some combination of: (a) lots of people still wanted it, and we're not sure our previous "idea inoculation" concerns are actually valid, and there's something to testing the idea of giving people what they want; and (perhaps more significantly) (b) we're making more of an overall push this year toward making our purpose and curriculum and strategy and activities and so on clear and visible so that we can dialog with people about our plans, and we figured that putting the handbook online might help with that.

Comment by annasalamon on AIRCS Workshop: How I failed to be recruited at MIRI. · 2020-01-08T05:14:59.479Z · LW · GW

Hi Arthur,

Thanks for the descriptions — it is interesting for me to hear about your experiences, and I imagine a number of others found it interesting too.

A couple clarifications from my perspective:

First: AIRCS is co-run by both CFAR and MIRI, and is not entirely a MIRI recruiting program, although it is partly that! (You might know this part already, but it seems like useful context.)

We are hoping that different people go on from AIRCS to a number of different AI safety career paths. For example:

  • Some people head straight from AIRCS to MIRI.
  • Some people attend AIRCS workshops multiple times, spaced across months or small years, while they gradually get familiar with AI safety and related fields.
  • Some people realize after an AIRCS workshop that AI safety is not a good fit for them.
  • Some people, after attending one or perhaps many AIRCS workshops, go on to do AI safety research at an organization that isn’t MIRI.

All of these are good and intended outcomes from our perspective! AI safety could use more good technical researchers, and AIRCS is a long-term investment toward improving the number of good computer scientists (and mathematicians and others) who have some background in the field. (Although it is also partly aimed to help with MIRI's recruiting in particular.)

Separately, I did not mean to "give people a rule" to "not speak about AI safety to people who do not express interest." I mean, neither I nor AIRCS as an organization has any sort of official request that people follow a “rule” of that sort. I do personally usually follow a rule of that sort, though (with exceptions). Also, when people ask me for advice about whether to try to “spread the word” about AI risk, I often share that I personally am a bit cautious about when and how I talk with people about AI risk; and I often share details about that.

I do try to have real conversations with people that reply to their curiosity and/or to their arguments/questions/etc., without worrying too much which directions such conversations will update them toward.

I try to do this about AI safety, as about other topics. And when I do this about AI safety (or other difficult topics), I try to help people have enough “space” that they can process things bit-by-bit if they want. I think it is often easier and healthier to take in a difficult topic at one’s own pace. But all of this is tricky, and I would not claim to know the one true answer to how everyone should talk about AI safety.

Also, I appreciate hearing about the bits you found distressing; thank you. Your comments make sense to me. I wonder if we’ll be able to find a better format in time. We keep getting bits of feedback and making small adjustments, but it is a slow process. Job applications are perhaps always a bit awkward, but iterating on “how do we make it less awkward” does seem to yield slow-but-of-some-value modifications over time.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-25T11:20:57.579Z · LW · GW

I’ve worked closely with CFAR since its founding in 2012, with varying degrees of closeness (ranging from ~25 hrs/week to ~60 hrs/week). My degree of involvement in CFAR’s high-level and mid-level strategic decisions has varied some, but at the moment is quite high, and is likely to continue to be quite high for at least the coming 12 months.

During work-type hours in which I’m not working for CFAR, my attention is mostly on MIRI and on MIRI's technical research. I do a good bit of work with MIRI (though I am not employed by MIRI -- I just do a lot of work with them), much of which also qualifies as CFAR work (e.g., running the AIRCS workshops and assisting with the MIRI hiring process; or hanging out with MIRI researchers who feel “stuck” about some research/writing/etc. type thing and want a CFAR-esque person to help them un-stick). I also do a fair amount of work with MIRI that does not much overlap CFAR (e.g. I am a MIRI board member).

Oliver remained confused after talking with me in April because in April I was less certain how involved I was going to be in upcoming strategic decisions. However, it turns out the answer was “lots.” I have a lot of hopes and vision for CFAR over the coming years, and am excited about hashing it out with Tim and others at CFAR, and seeing what happens as we implement; and Tim and others seem excited about this as well.

My attention oscillates some across the years between MIRI and CFAR, based partly on the needs of each organization and partly on e.g. there being some actual upsides to me taking a backseat under Pete as he (and Duncan and others) made CFAR into more of a functioning institution in ways I would’ve risked reflexively meddling with. But there has been much change in the landscape CFAR is serving, and it’s time I think for there to be much change also in e.g. our curriculum, our concept of “rationality”, our relationship to community, and how we run our internal processes -- and I am really excited to be able to be closely involved with CFAR this year, in alliance with Tim and others.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-25T05:21:17.939Z · LW · GW

My guesses, in no particular order:

  • Being a first employee is pretty different from being in a middle-stage organization. In particular, the opportunity to shape what will come has an appeal that can I think rightly bring in folks who you can’t always get later. (Folks present base rates for various reference classes below; I don’t know if anyone has one for “founding” vs “later” in small organizations?)

    • Relatedly, my initial guess back in ~2013 (a year in) was that many CFAR staff members would “level up” while they were here and then leave, partly because of that level-up (on my model, they’d acquire agency and then ask if being here as one more staff member was or wasn’t their maximum-goal-hitting thing). I was excited about what we were teaching and hoped it could be of long-term impact to those who worked here a year or two and left, as well as to longer-term people.
  • I and we intentionally hired for diversity of outlook. We asked ourselves: “does this person bring some component of sanity, culture, or psychological understanding -- but especially sanity -- that is not otherwise represented here yet?” And this… did make early CFAR fertile, and also made it an unusually difficult place to work, I think. (If you consider the four founding members of me, Julia Galef, Val, and Critch, I think you’ll see what I mean.)

  • I don’t think I was very easy to work with. I don’t think I knew how to make CFAR a very easy place to work either. I was trying to go with inside views even where I couldn’t articulate them and… really didn’t know how to create a good interface between that and a group of people. Pete and Duncan helped us do otherwise more recently I think, with Tim and Adam and Elizabeth and Jack and Dan building on it more since, with the result that CFAR is much more of a place now (less of a continuous “each person having an existential crisis all the time” than it was for some in the early days; more of a plod of mundane work in a positive sense). (The next challenge here, which we hope to accomplish this year, is to create a place that still has place-ness, and also has more visibility into strategy.)

  • My current view is that being at workshops for too much of a year is actually really hard on a person, and maybe not-good. It mucks up a person’s codebase without enough chance for ordinary check-sums to sort things back to normal again afterward. Relatedly, my guess is also that while stints at CFAR do level a person up in certain ways (~roughly as I anticipated back in 2013), they unfortunately also risk harming a person in certain ways that are related to “it’s not good to live in workshops or workshop-like contexts for too many weeks/months in a row, even though a 4-day workshop is often helpful” (which I did not anticipate in 2013). (Basically: you want a bunch of normal day-to-day work on which to check whether your new changes actually work well, and to settle back into your deeper or more long-term self. The 2-3 week “MIRI Summer Fellows Program” (MSFP) has had… some great impacts in terms of research staff coming out of the program, but also most of our least stable people additionally came out of that. I believe that this year we’ll be experimentally replacing it with repeated shorter workshops; we’ll also be trying a different rest days pattern for staff, and sabbatical months, as well as seeking stability/robustness/continuity in more cultural and less formal ways.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-25T02:04:01.100Z · LW · GW

No; this would somehow be near-impossible in our present context in the bay, IMO; although Berkeley's REACH center and REACH panel are helpful here and solve part of this, IMO.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-25T02:02:05.276Z · LW · GW

:) There's something good about "common sense" that isn't in "effective epistemics", though -- something about wanting not to lose the robustness of the ordinary vetted-by-experience functioning patterns. (Even though this is really hard, plausibly impossible, when we need to reach toward contexts far from those in which our experiences were based.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-25T02:01:17.346Z · LW · GW

To clarify: we're not joking about the need to get "what we do" and "what people think we do" more in alignment, via both communicating better and changing our organizational name if necessary. We put that on our "goals for 2020" list (both internally, and in our writeup). We are joking that CfBCSSS is an acceptable name (due to its length making it not-really-that).

(Eli works with us a lot but has been taking a leave of absence for the last few months and so didn't know that bit, but lots of us are not-joking about getting our name and mission clear.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-24T21:03:58.650Z · LW · GW

I quite like the open questions that Wei Dai wrote there, and I expect I'd find progress on those problems to be helpful for what I'm trying to do with CFAR. If I had to outline the problem we're solving from scratch, though, I might say:

  • Figure out how to:
    • use reason (and stay focused on the important problems, and remember “virtue of the void” and “lens that sees its own flaws”, and be quick where you can) without
    • going nutso, or losing humane values, and while:
    • being able to coordinate well in teams.

Wei Dai’s open problems feel pretty relevant to this!

I think in practice this goal leaves me with subproblems such as:

  • How do we un-bottleneck “original seeing” / hypothesis-generation;
  • What is the “it all adds up to normality” skill based in; how do we teach it;
  • Where does “mental energy” come from in practice, and how can people have good relationships to this;
  • What’s up with people sometimes seeming self-conscious/self-absorbed (in an unfortunate, slightly untethered way) and sometimes seeming connected to “something to protect” outside themselves?
    • It seems to me that “something to protect” makes people more robustly mentally healthy. Is that true? If so why? Also how do we teach it?
  • Why is it useful to follow “spinning plates” (objects that catch your interest for their own sake) as well as “hamming questions”? What’s the relationship between those two? (I sort of feel like they’re two halves of the same coin somehow? But I don’t have a model.)
  • As well as more immediately practical questions such as: How can a person do “rest days” well. What ‘check sums’ are useful for noticing when something breaks as you’re mucking with your head. Etc.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-24T20:25:55.022Z · LW · GW

Here’s a very partial list of blog post ideas from my drafts/brainstorms folder. Outside view, though, if I took the time to try to turn these into blog posts, I’d end up changing my mind about more than half of the content in the process of writing it up (and then would eventually end up with blog posts with somewhat different theses).

I’m including brief descriptions with the awareness that my descriptions may not parse at this level of brevity, in the hopes that they’re at least interesting teasers.

Contra-Hodgell

  • (The Litany of Hodgell says “That which can be destroyed by the truth should be”. Its contrapositive therefore says: “That which can destroy [that which should not be destroyed] must not be the full truth.” It is interesting and sometimes-useful to attempt to use Contra-Hodgell as a practical heuristic: “if adopting belief X will meaningfully impair my ability to achieve good things, there must be some extra false belief or assumption somewhere in the system, since true beliefs and accurate maps should just help” (e.g., if “there is no Judeo-Christian God” in practice impairs my ability to have good and compassionate friendships, perhaps there is some false belief somewhere in the system that is messing with that).)

The 50/50 rule

  • The 50/50 rule is a proposed heuristic claiming that about half of all progress on difficult projects will come from already-known-to-be-project-relevant subtasks -- for example, if Archimedes wishes to determine whether the king’s crown is unmixed gold, he will get about half his progress from diligently thinking about this question (plus subtopics that seem obviously and explicitly relevant to this question). The other half of progress on difficult projects (according to this heuristic) will come from taking an interest in the rest of the world, including parts not yet known to be related to the problem at hand -- in the Archimedes example, from Archimedes taking an interest in what happens to his bathwater.
  • Relatedly, the 50/50 rule estimates that if you would like to move difficult projects forward over long periods of time, it is often useful to spend about half of your high-energy hours on “diligently working on subtasks known-to-be-related to your project”, and the other half taking an interest in the world.

Make New Models, but Keep The Old

  • “... one is silver and the other’s gold.”
  • A retelling of: it all adds up to normality.

On Courage and Believing In.

  • Beliefs are for predicting what’s true. “Believing in”, OTOH, is for creating a local normal that others can accurately predict. For example: “In America, we believe in driving on the right hand side of the road” -- thus, when you go outside and look to predict which way people will be driving, you can simply predict (believe) that they’ll be driving on the right hand side.
  • Analogously, if I decide I “believe in” [honesty, or standing up for my friends, or other such things], I create an internal context in which various models within me can predict that my future actions will involve [honesty, or standing up for my friends, or similar].
  • It’s important and good to do this sometimes, rather than having one’s life be an accidental mess with nobody home choosing. It’s also closely related to courage.

Ethics for code colonies

  • If you want to keep caring about people, it makes a lot of sense to e.g. take the time to put your shopping cart back where it goes, or at minimum not to make up excuses about how your future impact on the world makes you too important to do that.
  • In general, when you take an action, you summon up black box code-modification that takes that action (and changes unknown numbers of other things). Life as a “code colony” is tricky that way.
  • Ethics is the branch of practical engineering devoted to how to accomplish things with large sets of people over long periods of time -- or even with one person over a long period of time in a confusing or unknown environment. It’s the art of interpersonal and intrapersonal coordination. (I mean, sometimes people say “ethics” means “following this set of rules here”. But people also say “math” means “following this algorithm whenever you have to divide fractions” or whatever. And the underneath-thing with ethics is (among other things, maybe) interpersonal and intra-personal coordination, kinda like how there’s an underneath-thing with math that is where those rules come from.)
  • The need to coordinate in this way holds just as much for consequentialists or anyone else.
  • It's kinda terrifying to be trying to do this without a culture. Or to be not trying to do this (still without a culture).

The explicit and the tacit (elaborated a bit in a comment in this AMA; but there’s room for more).

Cloaks, Questing, and Cover Stories

  • It’s way easier to do novel hypothesis-generation if you can do it within a “cloak”, without making any sort of claim yet about what other people ought to believe. (Teaching this has been quite useful on a practical level for many at AIRCS, MSFP, and instructor trainings -- seems worth seeing if it can be useful via text, though that’s harder.)

Me-liefs, We-liefs, and Units of Exchange

  • Related to “cloaks and cover stories” -- we have different pools of resources that are subject to different implicit contracts and commitments. Not all Bayesian evidence is judicial or scientific evidence, etc. A lot of social coordination works by agreeing to only use certain pools of resources in agreement with certain standards of evidence / procedure / deference (e.g., when a person does shopping for their workplace they follow their workplace’s “which items to buy” procedures; when a physicist speaks to laypeople in their official capacity as a physicist, they follow certain procedures so as to avoid misrepresenting the community of physicists).
  • People often manage this coordination by changing their beliefs (“yes, I agree that drunk driving is dangerous -- therefore you can trust me not to drink and drive”). However, personally I like the rule “beliefs are for true things -- social transactions can make my requests of my behaviors but not of my beliefs.” And I’ve got a bunch of gimmicks for navigating the “be robustly and accurately seen as prosocial” without modifying one’s beliefs (“In my driving, I value cooperating with the laws and customs so as to be predictable and trusted and trustworthy in that way; and drunk driving is very strongly against our customs -- so you can trust me not to drink and drive.”)

How the Tao unravels

  • A book review of part of CS Lewis’s book “The Abolition of Man.” Elaborates CS Lewis’s argument that in postmodern times, people grab hold of part of humane values and assert it in contradiction with other parts of humane values, which then assert back the thing that they’re holding and the other party is missing, and then things fragment further and further. Compares Lewis’s proposed mechanism with how cultural divides have actually been going in the rationality and EA communities over the last ten years.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-24T18:22:09.551Z · LW · GW

Examples of some common ways that people sometimes find Singularity scenarios disorienting:

When a person loses their childhood religion, there’s often quite a bit of bucket error. A person updates on the true fact “Jehovah is not a good explanation of the fossil record” and accidentally confuses that true fact with any number of other things, such as “and so I’m not allowed to take my friends’ lives and choices as real and meaningful.”

I claimed above that “coming to take singularity scenarios seriously” seems in my experience to often cause even more disruption / bucket errors / confusions / false beliefs than does “losing a deeply held childhood religion.” I’d like to elaborate on that here by listing some examples of the kinds of confusions/errors I often encounter.

None of these are present in everyone who encounters Singularity scenarios, or even in most people who encounter it. Still, each confusion below is one where I’ve seen it or near-variants of it multiple times.

Also note that all of these things are “confusions”, IMO. People semi-frequently have them at the beginning and then get over them. These are not the POV I would recommend or consider correct -- more like the opposite -- and I personally think each stems from some sort of fixable thinking error.

  • The imagined stakes in a singularity are huge. Common confusions related to this:
    • Confusion about whether it is okay to sometimes spend money/time/etc. on oneself, vs. having to give it all to attempting to impact the future.
    • Confusion about whether one wants to take in singularity scenarios, given that then maybe one will “have to” (move across the country / switch jobs / work all the time / etc.)
    • Confusion about whether it is still correct to follow common sense moral heuristics, given the stakes.
    • Confusion about how to enter “hanging out” mode, given the stakes and one’s panic. (“Okay, here I am at the beach with my friends, like my todo list told me to do to avoid burnout. But how is it that I used to enjoy their company? They seem to be making meaningless mouth-noises that have nothing to do with the thing that matters…”)
    • Confusion about how to take an actual normal interest in one’s friends’ lives, or one’s partner’s lives, or one’s Lyft drivers’ lives, or whatever, given that within the person’s new frame, the problems they are caught up in seem “small” or “irrelevant” or to have “nothing to do with what matters”.
  • The degrees of freedom in “what should a singularity maybe do with the future?” are huge. And people are often morally disoriented by that part. Should we tile the universe with a single repeated mouse orgasm, or what?
    • Are we allowed to want humans and ourselves and our friends to stay alive? Is there anything we actually want? Or is suffering bad without anything being better-than-nothing?
    • If I can’t concretely picture what I’d do with a whole light-cone (maybe because it is vastly larger than any time/money/resources I’ve ever personally obtained feedback from playing with) -- should I feel that the whole future is maybe meaningless and no good?
  • The world a person finds themselves in once they start taking Singularity scenarios seriously is often quite different from the one the neighbors think they live in, which itself can make things hard.
    • Can I have a “real” conversation with my friends? Should I feel crazy? Should I avoid taking all this in on a visceral level so that I’ll stay mentally in the same world as my friends?
    • How do I keep regarding other people’s actions as good and reasonable? The imagined scales are very large, which makes it harder to assume that “things are locally this way” is an adequate model.
    • Given this, should I get lost in “what about simulations / anthropics” to the point of becoming confused about normal day-to-day events?
  • In order to imagine this stuff, folks need to take seriously reasoning that is neither formal mathematics, nor vetted by the neighbors or academia, nor strongly based in empirical feedback loops.
    • Given this, shall I go ahead and take random piles of woo seriously also?

There are lots more where these came from, but I’m hoping this gives some flavor, and makes it somewhat plausible why I’m claiming that “coming to take singularity scenarios seriously can be pretty disruptive to common sense” -- and why it might be nice to try having a “bridge” that can help people lose less of the true parts of common sense as their world changes (much as it might be nice for someone who has just lost their childhood religion to have a bridge to “okay, here are some other atheists, and they don’t think that God is why they should get up in the morning and care about others, but they do still seem to think they should get up in the morning and care about others”).

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-24T17:47:54.530Z · LW · GW

My closest current stab is that we’re the “Center for Bridging between Common Sense and Singularity Scenarios.” (This is obviously not our real name. But if I had to grab a handle that gestures at our raison d’etre, at the moment I’d pick this one. We’ve been internally joking about renaming ourselves this for some months now.)

To elaborate: thinking about singularity scenarios is profoundly disorienting (IMO, typically worse than losing a deeply held childhood religion or similar). Folks over and over again encounter similar failure modes as they attempt this. It can be useful to have an institution for assisting with this -- collecting concepts and tools that were useful for previous waves who’ve attempted thought/work about singularity scenarios, and attempting to pass them on to those who are currently beginning to think about such scenarios.

Relatedly, the pattern of thinking required for considering AI risk and related concepts at all is pretty different from the patterns of thinking that suffice in most other contexts, and it can be useful to have a group that attempts to collect these and pass them forward.

Further, it can be useful to figure out how the heck to do teams and culture in a manner that can withstand the disruptions that can come from taking singularity scenarios seriously.

So, my best current angle on CFAR is that we should try to be a place that can help people through these standard failure modes -- a place that can try to answer the question “how can we be sane and reasonable and sensible and appropriately taking-things-seriously in the face of singularity scenarios,” and can try to pass on our answer, and can notice and adjust when our answer turns out to be invalid.

To link this up with our concrete activities:

AIRCS workshops / MSFP:

  • Over the last year, about half our staff workshop-days went into attempting to educate potential AI alignment researchers. These programs were co-run with MIRI. Workshops included a bunch of technical AI content; a bunch of practice thinking through “is there AI risk” and “how the heck would I align a superintelligence” and related things; and a bunch of discussion of e.g. how to not have “but the stakes are really really big” accidentally overwhelm one’s basic sanity skills (and other basic pieces of how to not get too disoriented).
  • Many program alumni attended multiple workshops, spaced across time, as part of a slow acculturation process: stare at AI risk; go back to one’s ordinary job/school context for some months while digesting in a back-burner way; repeat.
  • These programs aim at equipping people to contribute to AI alignment technical work at MIRI and elsewhere; in the last two years they’ve helped educate a sizable number of MIRI hires and a smaller but still important number of others (brief details in our 2019 progress report; more details coming eventually). People sometimes try to gloss the impact of AIRCS as “outreach” or “causing career changes,” but, while I think it does in fact fulfill CEA-style metrics, that doesn’t seem to me like a good way to see its main purpose -- helping folks feel their way toward being more oriented and capable around these topics in general, in a context where other researchers have done or are doing likewise.
  • They seem like a core activity for a “Center for Bridging between Common Sense and Singularity Scenarios” -- both in that they tell us more about what happens when folks encounter AI risk, and in that they let us try to use what we think we know for good. (We hope it’s “good.”)

Mainline workshops, alumni reunions, alumni workshops unrelated to AI risk, etc.:

  • We run mainline workshops (which many people just call “CFAR workshops”), alumni reunions, and some topic-specific workshops for alumni that have nothing to do with AI risk (e.g., a double crux workshop). Together, this stuff constituted about 30% of our staff workshop-days over the last two years.
  • The EA crowd often asks me why we run these. (“Why not just run AI safety workshops, since that is the part of your work that has more shot at helping large numbers of people?”) The answer is that when I imagine removing the mainline workshops, CFAR begins to feel like a table without enough legs -- unstable, liable to work for a while but then fall over, lacking enough contact with the ground.
  • More concretely: we’re developing and spreading a nonstandard mental toolkit (inner sim, double crux, Gendlin’s Focusing, etc.). That’s a tricky and scary thing to do. It’s really helpful to get to try it on a variety of people -- especially smart, thoughtful, reflective, articulate people who will let us know what seems like a terrible idea, what helps in their lives, and what causes disruption in their lives. The mainline workshops (plus follow-up sessions, alumni workshops, alumni reunions, etc.) let us develop this alleged “bridge” between common sense and singularity scenarios in a way that avoids overfitting it all to just “AI alignment work.” Which is basically to say that they let us develop and test our models of “applied rationality”.

“Sandboxes” toward trying to understand how to have a healthy culture in contact with AI safety:

  • I often treat the AIRCS workshops as “sandboxes”, and try within them to create small temporary “cultures” in which we try to get research to be able to flourish, or try to get people to be able to both be normal humans and slowly figure out how to approach AI alignment, or whatever. I find them a pretty productive vehicle for trying to figure out the “social context” thing, and not just the “individual thinking habits” thing. I care about this experimentation-with-feedback because I want MIRI and other longer-term teams to eventually have the right cultural base.

Our instructor training program, and our attempt to maintain a staff skilled at seeing what cognitive processes are actually running in people:

  • There’s a lot of trainable, transferable skill in seeing what people are thinking. CFAR staff have a bunch of this IMO, and we seem to me to be transferring a bunch of it to the instructor candidates too. We call it “seeking PCK” (pedagogical content knowledge).
  • The “seeking PCK” skillset is obviously helpful for learning to “bridge between common sense and singularity scenarios” -- it helps us see what the useful patterns folks have are, and what the not-so-useful patterns folks have are, and what exactly is happening as we attempt to intervene (so that we can adjust our interventions).
  • Thus, improving and maintaining the “seeking PCK” skillset probably makes us faster at developing any other curriculum.
  • More mundanely, of course, instructor training also gives us guest instructors who can help us run workshops -- many of whom are also out and about doing other interesting things, and porting wisdom/culture/data back and forth between those endeavors and our workshops.

To explain what “bridging between common sense and singularity scenarios” has to do with “applied rationality” and the LW Sequences and so on:

  • The farther off you need to extrapolate, the more you need reasoning (vs being able to lean on either received wisdom, or known data plus empirical feedback loops). And singularity scenarios sure are far from the everyday life our heuristics are developed for, so singularity scenarios benefit more than most from trying to be the lens that sees its flaws, and from Sequences-style thinking more broadly.
Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-22T05:16:41.546Z · LW · GW

Re: 1—“Forked codebases that have a lot in common but are somewhat tricky to merge” seems like a pretty good metaphor to me.

The question I'd like to answer that is near your questions is: "What is the minimal patch/bridge that will let us use all of both codebases without running into merge conflicts?"

We do have a candidate answer to this question, which we’ve been trying out at AIRCS to reasonable effect. Our candidate answer is something like: an explicit distinction between “tacit knowledge” (inarticulate hunches, early-stage research intuitions, the stuff people access and see in one another while circling, etc.) and the “explicit” (“knowledge” worthy of the name, as in the LW codebase—the thing I believe Ben Pace is mainly gesturing at in his comment above).

Here’s how we explain it at AIRCS:

  • By “explicit” knowledge, we mean visible-to-conscious-consideration denotative claims that are piecewise-checkable and can be passed explicitly between humans using language.
    • Example: the claim “Amy knows how to ride a bicycle” is explicit.
  • By “tacit” knowledge, we mean stuff that allows you to usefully navigate the world (and so contains implicit information about the world, and can be observationally evaluated for how well people seem to navigate the relevant parts of the world when they have this knowledge) but is not made of explicit denotations that can be fully passed verbally between humans.
    • Example: however the heck Amy actually manages to ride the bicycle (the opaque signals she sends to her muscles, etc.) is in her own tacit knowledge. We can know explicitly “Amy has sufficient tacit knowledge to balance on a bicycle,” but we cannot explicitly track how she balances, and Amy cannot hand her bicycle-balancing ability to Bob via speech (although speech may help). Relatedly, Amy can’t check the individual pieces of her (opaque) motor patterns to figure out which ones are the principles by which she successfully stays up and which are counterproductive superstition.
  • I’ll give a few more examples to anchor the concepts:
    • In mathematics:
      • Explicit: which things have been proven; which proofs are valid.
      • Tacit: which heuristics may be useful for finding proofs; which theorems are interesting/important. (Some such heuristics can be stated explicitly, but I wouldn't call those statements “knowledge.” I can't verify that they're right in the way I can verify “Amy can ride a bike” or “2+3=5.”)
    • In science:
      • Explicit: specific findings of science, such as “if you take a given amount of hydrogen at constant temperature and halve its volume, you double its pressure.” The “experiment” and “conclusion” steps of the scientific method.
      • Tacit: which hypotheses are worth testing.
    • In Paul Graham-style startups:
      • Explicit: what metrics one is hitting, once one achieves an MVP.
      • Tacit: the way Graham’s small teams of cofounders are supposed to locate their MVP. (In calling this “tacit,” I don’t mean you can’t communicate any of this verbally. Of course they use words. But the way they use words is made of ad hoc, spaghetti-code attempts to get gut intuitions back and forth between a small set of people who know each other well. It is quite different from the scalable processes of explicit science/knowledge that can compile across large sets of people and long periods of time. This is why Graham claims that co-founder teams should have 2-4 people, and that if you hire e.g. 10 people to a pre-MVP startup, it won’t scale well.)

In the context of the AIRCS workshop, we share “The Tacit and the Explicit” in order to avoid two different kinds of errors:

  • People taking “I know it in my gut” as zero-value, and attempting to live via the explicit only. My sense is that some LessWrong users like Said_Achmiz tend to err in this direction. (This error can be fatal to early-stage research, and to one’s ability to discuss ordinary life/relationship/productivity “bugs” and solutions, and many other mundanely useful topics.)
  • People taking “I know it in my gut” as vetted knowledge, and attempting to build on gut feelings in the manner of knowledge. (This error can be fatal to global epistemology: “but I just feel that religion is true / the future can’t be that weird / whatever”).

We find ourselves needing to fix both those errors in order to allow people to attempt grounded original thinking about AI safety. They need to be able to have intuitions, and take those intuitions seriously enough to develop them / test them / let them breathe, without mistaking those intuitions for knowledge.

So, at the AIRCS workshop, we introduce the explicit (which is a big part of what I take Ben Pace to be gesturing at above actually) at the same time that we introduce the tacit (which is the thing that Ben Pace describes benefiting from at CFAR IMO). And we introduce a framework to try to keep them separate so that learning cognitive processes that help with the tacit will not accidentally mess with folks’ explicit, nor vice versa. (We’ve been introducing this framework at AIRCS for about a year, and I do think it’s been helpful. I think it’s getting to the point where we could try writing it up for LW—i.e., putting the framework more fully into the explicit.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-22T05:14:09.435Z · LW · GW

With regard to whether our staff has read the sequences: five have, and have been deeply shaped by them; two have read about a third, and two have read little. I do think it’s important that our staff read them, and we decided to run this experiment with sabbatical months next year in part to ensure our staff had time to do this over the coming year.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-22T04:25:33.617Z · LW · GW

I agree very much with what Duncan says here. I forgot I need to point that kind of thing out explicitly. But a good bit of my soul-effort over the last year has gone into trying to inhabit the philosophical understanding of the world that can see as possibilities (and accomplish!) such things as integrity, legibility, accountability, and creating structures that work across time and across multiple people. IMO, Duncan had a lot to teach me and CFAR here; he is one of the core models I go to when I try to understand this, and my best guess is that it is in significant part his ability to understand and articulate this philosophical pole (as well as to do it himself) that enabled CFAR to move from the early-stage pile of un-transferrable "spaghetti code" that we were when he arrived, to an institution with organizational structure capable of e.g. hosting instructor trainings and taking in and making use of new staff.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-22T03:36:55.917Z · LW · GW

I wish someone would create good Bay Area community health. It isn’t our core mission, and it doesn’t relate all that directly to that mission; but it relates to the background environment in which CFAR and quite a few other organizations may or may not end up effective.

One daydream for a small institution that might help some with this health is as follows:

  1. Somebody creates the “Society for Maintaining a Very Basic Standard of Behavior”;
  2. It has certain very basic rules (e.g. “no physical violence”; “no doing things that are really about as over the line as physical violence according to a majority of our anonymously polled members”; etc.)
  3. It has an explicit membership list of folks who agree to both: (a) follow these rules; and (b) ostracize from “community events” (e.g. parties to which >4 other society members are invited) folks who are in bad standing with the society (whether or not they personally think those members are guilty).
  4. It has a simple, legible, explicitly declared procedure for determining who has/hasn’t entered bad standing (e.g.: a majority vote of the anonymously polled membership of the society; or an anonymous vote of a smaller “jury” randomly chosen from the society). (See the illustrative sketch just below.)
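
Purely as an illustrative aside (not anything the daydream above specifies): here is a minimal sketch, in Python, of the “anonymous vote of a randomly chosen jury” variant of that procedure. The helper names, the jury size, and the strict-majority rule are all placeholder assumptions of mine; how votes would actually be collected anonymously is out of scope.

```python
import random

def select_jury(members, jury_size=7, seed=None):
    """Randomly choose a jury from the society's membership list.

    Hypothetical helper: sorting first makes the draw reproducible for a
    given seed, regardless of how the membership set happens to be ordered.
    """
    rng = random.Random(seed)
    pool = sorted(members)
    return rng.sample(pool, k=min(jury_size, len(pool)))

def is_in_bad_standing(votes):
    """Anonymous majority vote: True iff a strict majority voted 'bad standing'.

    Hypothetical rule; a real society might want a supermajority or a quorum.
    """
    return sum(votes) > len(votes) / 2

# Hypothetical usage:
members = {"alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"}
jury = select_jury(members, jury_size=5, seed=42)
votes = [True, True, False, True, False]  # collected anonymously, somehow
print(jury, is_in_bad_standing(votes))
```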

Benefits I’m daydreaming might come from this institution:

  • A. If the society had large membership, bad actors could be ostracized from larger sections of the community, and with more simplicity and less drama.
  • B. Also, we could do that while imposing less restraint on individual speech, which would make the whole thing less creepy. Like, if many many people thought person B should be exiled, and person A wanted to defer but was not herself convinced, she could: (a) defer explicitly, while saying that’s what she was doing; and meanwhile (b) speak her mind without worrying that she would destabilize the community’s ability to ever coordinate.
Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-22T02:51:49.904Z · LW · GW

My rough guess is “we survived; most of the differences I could imagine someone fearing didn’t come to pass”. My correction on that rough guess is: “Okay, but insofar as Duncan was the main holder of certain values, skills, and virtues, it seems pretty plausible that there are gaps now today that he would be able to see and that we haven’t seen”.

To be a bit more specific: some of the poles I noticed Duncan doing a lot to hold down while he was here were:

  • Institutional accountability and legibility;
  • Clear communication with staff; somebody caring about whether promises made were kept; somebody caring whether policies were fair and predictable, and whether the institution was creating a predictable context where staff, workshop participants, and others wouldn’t suddenly experience having the rug pulled out from under them;
  • Having the workshop classes start and end on time (I’m a bit hesitant to name something this “small-seeming” here, but it is a concrete policy that supported the value above, and it is easier to track);
  • Revising the handbook into a polished state;
  • Having the workshop classes make sense to people, have clear diagrams and a clear point, etc.; having polish and visible narrative and clear expectations in the workshop;

AFAICT, these things are doing… alright in the absence of Duncan (due partly to the gradual accumulation of institutional knowledge), though I can see his departure in the organization. AFAICT also, Duncan gave me a good chunk of model of this stuff sometime after his Facebook post, actually -- and worked pretty hard on a lot of this before his departure too. But I would not fully trust my own judgment on this one, because the outside view is that people (in this case, me) often fail to see what they cannot see.

When I get more concrete:

  • Institutional accountability and legibility is I think better than it was;
  • Clear communication with staff, keeping promises, creating clear expectations, etc. -- better on some axes and worse on others -- my non-confident guess is better overall (via some loss plus lots of work);
  • Classes starting and ending on time -- at mainlines: slightly less precise class-timing but not obviously worse thereby; at AIRCS, notable decreases, with some cost;
  • Handbook revisions -- have done very little since he left;
  • Polish and narrative cohesion in the workshop classes -- less emphasized, but not obviously worse thereby IMO, due partly to the infusion of counterbalancing “original seeing” content from Brienne, which was perhaps easier to pull off by toning polish down slightly. Cohesion and polish still seem acceptable, and far far better than before Duncan arrived.

Also: I don’t know how to phrase this tactfully in a large public conversation. But I appreciate Duncan’s efforts on behalf of CFAR; and also he left pretty burnt out; and also I want to respect what I view as his own attempt to disclaim responsibility for CFAR going forward (via that Facebook post) so that he won’t have to track whether he may have left misleading impressions of CFAR’s virtues in people. I don’t want our answers here to mess that up. If you come to CFAR and turn out not to like it, it is indeed not Duncan’s fault (even though it is still justly some credit to Duncan if you do, since we are still standing on the shoulders of his and many others’ past work).

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T20:57:15.144Z · LW · GW

I'll interpret this as "Which of the rationalist virtues do you think CFAR has gotten the most mileage from your practicing?"

The virtue of the void. Hands down. Though I still haven't done it nearly as much as it would be useful to do it. Maybe this year?

If I instead interpret this as "Which of the rationalist virtues do you spend the most minutes practicing?": curiosity. Which would be my runner-up for "CFAR got the most mileage from my practicing".

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T18:10:28.904Z · LW · GW

I, too, believe that Critch played a large and helpful role here.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T14:16:54.136Z · LW · GW

Ben Pace writes:

In recent years, when I've been at CFAR events, I generally feel like at least 25% of attendees probably haven't read The Sequences, aren't part of this shared epistemic framework, and don't have any understanding of that law-based approach, and that they don't have a felt need to cache out their models of the world into explicit reasoning and communicable models that others can build on.

The “many alumni haven't read the Sequences” part has actually been the case since very near the beginning (not the initial 2012 minicamps, but the very first paid workshops of 2013 and later). (CFAR began in Jan 2012.) You can see it in our old end-of-2013 fundraiser post, where we wrote “Initial workshops worked only for those who had already read the LW Sequences. Today, workshop participants who are smart and analytical, but with no prior exposure to rationality -- such as a local politician, a police officer, a Spanish teacher, and others -- are by and large quite happy with the workshop and feel it is valuable.” We didn't name this explicitly in that post, but part of the hope was to get the workshops to work for a slightly larger/broader/more cognitively diverse set than the set for whom the original Sequences in their written form tended to spontaneously "click".

As to the “aren’t part of this shared epistemic framework” part -- when I go to e.g. the alumni reunion, I do feel there are at least basic pieces of this framework that I can rely on. For example, even on contentious issues, 95%+ of alumni reunion participants seem to me to be pretty good at remembering that arguments should not be like soldiers, that beliefs are for true things, etc. -- there is to my eyes a very noticeable positive difference between the folks at the alumni reunion and, say, unselected-for-rationality smart STEM graduate students (though STEM graduate students are also notably more skilled than the general population at this, and though both groups fall short of perfection).

Still, I agree that it would be worthwhile to build more common knowledge and [whatever the “values” analog of common knowledge is called] supporting “a felt need to cache out their models of the world into explicit reasoning and communicable models that others can build on” and that are piecewise-checkable (rather than opaque masses of skills that are useful as a mass but hard to build across people and time). This particular piece of culture is harder to teach to folks who are seeking individual utility, because the most obvious payoffs are at the level of the group and of the long-term process rather than at the level of the individual (where the payoffs to e.g. goal-factoring and murphyjitsu are located). It also pays off more in later-stage fields and less in the earliest stages of science within preparadigm fields such as AI safety, where it’s often about shower thoughts and slowly following inarticulate hunches. But still.

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T13:43:44.260Z · LW · GW

Agreed

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T13:42:41.379Z · LW · GW

Ben Pace writes:

“... The Gwerns and the Wei Dais and the Scott Alexanders of the world won't have learned anything from CFAR's exploration.”

I’d like to distinguish two things:

  1. Whether the official work activities CFAR staff are paid for will directly produce explicit knowledge in the manner valued by the Gwerns etc.
  2. Whether that CFAR work will help educate people who later produce explicit knowledge themselves in the manner valued by Gwern etc., and who wouldn't have produced that knowledge otherwise.

#1 would be useful but isn’t our primary goal (though I think we’ve done more than none of it). #2 seems like a component of our primary goal to me (“scientists”, or folks who can produce knowledge in this sense, aren’t all we’re trying to produce, but they’re part of it), and is part of what I would like to see us strengthen over the coming years.

To briefly list our situation with respect to whether we are accomplishing #2 (according to me):

  • There are in fact a good number of AI safety scientists in particular who seem to me to produce knowledge of this type, and who give CFAR some degree of credit for their present tendency to do this.
  • On a milder level, while CFAR workshops do not themselves teach most of the Sequences’ skills (which would exceed four days in length, among other difficulties), we do try to nudge participants into reading the Sequences (by referencing them with respect at the workshop, by giving all mainline participants and AIRCS participants paper copies of “How to Actually Change Your Mind” and HPMOR, and by explicitly claiming they are helpful for various things).
  • At the same time, I do think we should make Sequences-style thinking a more explicit element of the culture spread by CFAR workshops, and of the culture that folks can take for granted at e.g. alumni reunions (although it is there nevertheless to quite an extent).

(I edited this slightly to make it easier to read after Kaj had already quoted from it.)

Comment by annasalamon on We run the Center for Applied Rationality, AMA · 2019-12-21T11:14:44.867Z · LW · GW

This is my favorite question of the AMA so far. (I said something similar aloud when I first read it, before it got upvoted quite this highly, and a couple of other staff members did too.) The things I personally appreciate about your question are: (1) it points near a core direction that CFAR has already been intending to try moving toward this year (and probably across near-subsequent years; one year will not be sufficient); and (2) I think your asking it publicly in this way (and giving us an opportunity to make this intention memorable and clear to ourselves, and to parts of the community that may help us remember) will help at least some with our moving there.

Relatedly, I like the way you lay out the concepts.

Your essay (I mean, “question”) is rather long, and has a lot of things in it; and my desired response sure also has a lot of things in it. So I’m going to let myself reply via many separate discrete small comments because that’s easier.

(So: many discrete small comments upcoming.)