Posts

Rationality Research Report: Towards 10x OODA Looping? 2024-02-24T21:06:38.703Z
Exercise: Planmaking, Surprise Anticipation, and "Baba is You" 2024-02-24T20:33:49.574Z
Things I've Grieved 2024-02-18T19:32:47.169Z
CFAR Takeaways: Andrew Critch 2024-02-14T01:37:03.931Z
Skills I'd like my collaborators to have 2024-02-09T08:20:37.686Z
"Does your paradigm beget new, good, paradigms?" 2024-01-25T18:23:15.497Z
Universal Love Integration Test: Hitler 2024-01-10T23:55:35.526Z
2022 (and All Time) Posts by Pingback Count 2023-12-16T21:17:00.572Z
Raemon's Deliberate (“Purposeful?”) Practice Club 2023-11-14T18:24:19.335Z
Hiring: Lighthaven Events & Venue Lead 2023-10-13T21:02:33.212Z
"The Heart of Gaming is the Power Fantasy", and Cohabitive Games 2023-10-08T21:02:33.526Z
Related Discussion from Thomas Kwa's MIRI Research Experience 2023-10-07T06:25:00.994Z
Thomas Kwa's MIRI research experience 2023-10-02T16:42:37.886Z
Feedback-loops, Deliberate Practice, and Transfer Learning 2023-09-07T01:57:33.066Z
Open Thread – Autumn 2023 2023-09-03T22:54:42.259Z
The God of Humanity, and the God of the Robot Utilitarians 2023-08-24T08:27:57.396Z
Book Launch: "The Carving of Reality," Best of LessWrong vol. III 2023-08-16T23:52:12.518Z
Feedbackloop-first Rationality 2023-08-07T17:58:56.349Z
Private notes on LW? 2023-08-04T17:35:37.917Z
Exercise: Solve "Thinking Physics" 2023-08-01T00:44:48.975Z
Rationality !== Winning 2023-07-24T02:53:59.764Z
Announcement: AI Narrations Available for All New LessWrong Posts 2023-07-20T22:17:33.454Z
What are the best non-LW places to read on alignment progress? 2023-07-07T00:57:21.417Z
My "2.9 trauma limit" 2023-07-01T19:32:14.805Z
Automatic Rate Limiting on LessWrong 2023-06-23T20:19:41.049Z
Open Thread: June 2023 (Inline Reacts!) 2023-06-06T07:40:43.025Z
Worrying less about acausal extortion 2023-05-23T02:08:18.900Z
Dark Forest Theories 2023-05-12T20:21:49.052Z
[New] Rejected Content Section 2023-05-04T01:43:19.547Z
Tuning your Cognitive Strategies 2023-04-27T20:32:06.337Z
"Rate limiting" as a mod tool 2023-04-23T00:42:58.233Z
LessWrong moderation messaging container 2023-04-22T01:19:00.971Z
Moderation notes re: recent Said/Duncan threads 2023-04-14T18:06:21.712Z
LW Team is adjusting moderation policy 2023-04-04T20:41:07.603Z
Abstracts should be either Actually Short™, or broken into paragraphs 2023-03-24T00:51:56.449Z
Tabooing "Frame Control" 2023-03-19T23:33:10.154Z
Dan Luu on "You can only communicate one top priority" 2023-03-18T18:55:09.998Z
"Carefully Bootstrapped Alignment" is organizationally hard 2023-03-17T18:00:09.943Z
Prizes for the 2021 Review 2023-02-10T19:47:43.504Z
Robin Hanson on "Explaining the Sacred" 2023-02-06T00:50:58.490Z
I don't think MIRI "gave up" 2023-02-03T00:26:07.552Z
Voting Results for the 2021 Review 2023-02-01T08:02:06.744Z
Highlights and Prizes from the 2021 Review Phase 2023-01-23T21:41:21.948Z
Compounding Resource X 2023-01-11T03:14:08.565Z
Review AI Alignment posts to help figure out how to make a proper AI Alignment review 2023-01-10T00:19:23.503Z
The 2021 Review Phase 2023-01-05T07:12:24.251Z
What percent of people work in moral mazes? 2023-01-01T04:33:43.890Z
Recursive Middle Manager Hell 2023-01-01T04:33:29.942Z
The Meditation on Winter 2022-12-25T16:12:10.039Z
The True Spirit of Solstice? 2022-12-19T08:00:30.273Z

Comments

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-28T00:49:40.002Z · LW · GW

Yeah I don't know that I disagree with it (I think I maybe believe it less strongly than you atm, but it seems like a reasonable take)

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-27T01:53:56.213Z · LW · GW

To clarify slightly more: I think it's fine to do a hard level you haven't beaten before, even if you've played it.

Comment by Raemon on Open Thread – Winter 2023/2024 · 2024-02-27T01:52:54.805Z · LW · GW

They are just a renaming of "shortform", with some new UI. "Quick Take" sort of conveyed what we were actually going for, which is more like "you wrote it down quickly" than "it was literally short".

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T23:38:46.472Z · LW · GW

One note: custom levels now exist and you can go browse them directly even if you've beaten the game.

I do agree that this exercise, as-worded, probably nudges towards a flavor of "explicit thinking", which I don't think is even necessarily the best strategy for Baba is You overall.

I don't think this exercise necessarily says "think explicitly" – the section on metacognitive brainstorming is meant to include fuzzy/experiential/"go-take-a-shower"/"meditate" style options.

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T23:16:20.206Z · LW · GW

Quick note that I have another exercise in the works about the beginning of Patrick's Parabox, but after having investigated more I think the rest of the game doesn't hold up for my purposes.

I like your breakdown of why Baba is You fits exactly here. 

I do think most puzzle games lend themselves to some kind of rationality exercise, but not necessarily this one.

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T23:06:19.028Z · LW · GW

I haven't beaten every level in the game, but I don't have access to any levels that I haven't played before, because the reason I stopped playing was that I had already tried and failed every remaining available level.

(Though I suppose I could cheat and look up the solution for where I got stuck...)

I'm not sure I understand. If you have levels leftover that you haven't beaten because they were too hard, I think this is still a fine exercise (the fact that "it's hard" isn't a crux for me). I do think it's doable, and I think the constraints of the exercise probably help about as much as they hinder.

(You might not succeed within three tries of one-shotting, but I think you're more likely to go on to beat the level afterwards, and still learn something from it)

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T22:46:53.756Z · LW · GW

Nod, these instructions were generated for people doing early levels but I agree about how later levels feel.

Comment by Raemon on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-26T01:57:09.198Z · LW · GW

Both of these thoughts are pretty interesting, thanks.

I'd be interested in hearing a bunch more detail about how you trained decision theory and how that went. (naively this sounds like overkill to me, or "not intervening at the best level", but I'm quite interested in what sort of exercises you did and how people responded to them)

re: "how useful is planning", I do think this is specifically useful if you have deep, ambitious goals, without well established practices. (i.e. Rationality !== Winning in General).  

Comment by Raemon on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-25T20:18:36.049Z · LW · GW

(This does all have implications for what sort of ML training regimes I'd expect to produce a general mind, although I think that's, like, bad and you shouldn't do it. Also, it does look like ML is still bottlenecked more on something like 'g' than something like 's' at the moment).

Comment by Raemon on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-25T20:16:34.428Z · LW · GW

I think two main threads here:

  1. I think I just have tried to learn 'how to think on purpose', and have basically succeeded (like, somewhat, not necessarily amazingly, but enough to know there's a "there" there)
  2. Even in the world where skills don't transfer, some skills seem just useful in more places, or in "more useful places."

Re: 1

Most of the time, I'm not thinking strategically, I'm just doing some sort of pattern-matchy-find-the-nearest-reasonable-thing-to-do-and-then-do-it. My current guess is this is what most people (and, probably, ML algorithms?) are doing most of the time.

But, there's clusters of habits that seem pretty useful for solving novel problems, like asking:

  1. What is my goal here?
  2. what seem like the main inputs into that goal?
  3. what resources are available that compound?
  4. original seeing on the stimuli I'm looking at
  5. what skills are required here? what subskills make them up? what's the skill-tree?
  6. what would give me good feedbackloops for gaining those subskills, or, checking if I'm making progress towards my goal?

Each of those feel like "skills" to me, which I've practiced and cultivated, and once cultivated, can be chained into habits. 

Re: 2

If you learn to play piano, I'd expect some weak transfer into: hand-finger coordination, understanding chord progression / musical structure, etc. If you learn a couple different instruments you probably have an easier time picking up new instruments. This can pave the way towards... being really good at music, and maybe some related things.

If you learn arithmetic and algebra, you have a building block skill that applies to science, engineering, and business. These things seem more world-changing than music.

(I think music can be world changing, but I think the skill-tree there is more like 'songwriting' and 'connecting with a muse and speaking to the heart of people's souls', which I think is pretty different from piano playing)

Point #1 is sort of a subset of point #2: analyzing your goals, breaking things down into subgoals, breaking down skills into subskills, are all "skills" that I expect to generalize quite a lot in a lot of domains.

...

How much is this worth?

I do think a point you made that stands out is "well, there's only so much you can specialize. If you specialize at meta-skills, i.e. 'specialize in being a generalist', does that trade off against being a better specialist?"

Probably.

I think it depends on how early you pick up the meta-skills – it seems like a travesty that children aren't taught these skills at like age ~10 so that they get to apply them sooner/faster to more domains. If you're 30ish (like me), I don't think it's that obvious, in all cases, that you should "level up at meta". I spent the last month learning "meta", and I could have been learning ML, or math proofs, or web design, and it would have been more immediately applicable.

(See: Rationality !== Winning)

The reason I think this is important is that "how do we safely create a superintelligence" (or, how do we reliably avoid doing so) are very confusing questions. It isn't obvious whether I (or others) should learn ML, or math proofs, or geopolitics. And meta-skills seem more necessary for figuring out how to navigate that, what specialist skills to learn, and how to apply them. i.e. Specializing in Problems We Don't Understand.

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-25T19:53:06.890Z · LW · GW

POV, this is the key observation that should've, and should still, instigate a basic attempt to model what humans actually are and what is actually up in today's humans.  It's too basic a confusion/surprise to respond to by patching the symptoms without understanding what's underneath.

On one hand, when you say it like that, it does seem pretty significant.

I'm not sure I think there's that much confusion to explain? Like, my mainline story here is:

  1. Humans are mostly a kludge of impulses which vary in terms of how coherent / agentic they are. Most of them have wants that are fairly basic, and don't lend themselves super well to strategic thinking. (I think most of them also consider strategic thinking sort of uncomfortable/painful). This isn't that weird, because, like, having any kind of agency at all is an anomaly. Most animals have only limited agency and wanting-ness.
  2. There's some selection effect where the people who might want to start Rationality Orgs are more agentic, have more complex goals, and find deliberate thinking about their wants and goals more natural/fun/rewarding.  
  3. The "confusion" is mostly a typical mind error on the part of people like us, and if you look at evolution the state of most humans isn't actually that weird or surprising.

Perhaps something I'm missing or confused about is what exactly Critch (or you, if applicable?) means by "people don't seem to want things." I am maybe surprised that, even with the filtering effect, the people who showed up at CFAR workshops or similar still didn't want things.

Can you say a bit more about what you've experienced, and what felt surprising or confusing about it?

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-25T17:10:12.355Z · LW · GW

I think I'd end up constructing a new exercise for Outer Wilds but could see doing something with it. (I have started but not completed Outer Wilds)

I think this exercise works best for games where puzzles come in relatively discrete chunks where you can see most of the puzzle at once.

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-25T03:12:27.399Z · LW · GW

Fwiw I tried out Understand and was underwhelmed. (Cool concept but it wasn’t actually better as an exercise than other good puzzle games)

Comment by Raemon on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-25T00:35:15.907Z · LW · GW

Yeah something like this has already come up as a necessary stepping stone.

See also: ‘have a plan, at all’

Comment by Raemon on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-24T23:53:57.618Z · LW · GW

can you give an example of a time you implemented that shift?

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-24T23:42:39.328Z · LW · GW

Nod. I'm not actually particularly attached to this point nor think $4000 is necessarily the right amount to get the filtering effect if you're aiming for that. I do think this approach is insufficient for me because the people I most hope to intervene on with my own rationality training are college students, who don't yet have enough income for this approach to work. 

But, also, well, you do need some kind of filter.

Speaking for myself, not sure what Critch would say:

There seems like some kind of assumption here that "if we didn't filter out people unnecessarily, we'd be able to help more people." But, I think the throughput of people-who-can-be-helped here is quite small. I think it's not possible to scale this sort of org to help thousands of people per year without compromising the org. 

(In general in education, there is a problem where educational interventions work initially, because the educators are invested and have a nuanced understanding of what they're trying to accomplish. But when they attempt to scale, they get worse teachers who are less invested and have a less deep understanding of the methodology, because conveying the knowledge is hard)

So, I think it's more like "there is a smallish number of people this sort of process/org would be able to help. There are going to be thousands/millions of people who could be helped, but you don't have time to help them all." That's sort of baked in.

So, it's not necessarily "a problem" from my perspective if this filters out people who I'd have liked to have helped, so long as the program successfully outputs people who go on to be much more effective. (Past me would be more sad about that, but it's something  I've already grieved)

I do think it's important (and plausible) that this creates some kinds of distortions in which people you're selecting for, and that (in aggregate) those distortions add up. But that's a somewhat different argument from what you and sanyer presented.

But, still, ultimately the question is "okay, what sort of filtering mechanism do you have in mind, and how well does it work?".

Comment by Raemon on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-24T22:39:50.000Z · LW · GW

Nod, the prompts are meant to be suggestions and you can come up with your own prompts.

I am intending this exercise primarily for people who are interested in answering those sorts of questions though. (But, I also think the exercise is fun, and worth trying/evaluating on that basis if it feels interesting to you)

Comment by Raemon on Choosing My Quest (Part 2 of "The Sense Of Physical Necessity") · 2024-02-24T21:54:14.373Z · LW · GW

I like the "making fake things real” section and think it'd make a good short standalone post. (I also like how it's used here in this context)

I do notice I might be at risk of "trying to make a fake-thing-real, and still ending up with a fake thing" (or at least, if I had read about the advice in school, I might have ended up doing that)

Comment by Raemon on the gears to ascenscion's Shortform · 2024-02-23T21:42:15.635Z · LW · GW

Makes sense, thanks for sharing!

Comment by Raemon on the gears to ascenscion's Shortform · 2024-02-23T20:44:10.412Z · LW · GW

I vaguely remember talking to you about this at the time, but don't remember what your motivations and thoughts for cofounding Vast were back then.

I think I'm most interested in this from the perspective of "what decisionmaking processes were you following then, how did they change, and what was the nearest nearby trail of thoughts that might have led you to make a different decision at the time?"

Comment by Raemon on Less Wrong automated systems are inadvertently Censoring me · 2024-02-22T00:27:28.214Z · LW · GW

And notably, the 7 people have to have downvoted you on a comment that got below 0. 

Comment by Raemon on Dual Wielding Kindle Scribes · 2024-02-21T21:02:14.278Z · LW · GW

oh, one question: if you mostly don't look at your notes afterwards, why do you need a kindle scribe for them instead of a notebook?

Comment by Raemon on Dual Wielding Kindle Scribes · 2024-02-21T19:34:22.258Z · LW · GW

Can you say more about "couldn't stand the kindle interface for books/notes?". I'm trying to figure out if I should try this and don't quite have a model of why it didn't work for you (and whether that'd translate to me)

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-17T20:37:43.998Z · LW · GW

I totally agree that's a (the?) root level cause for most people.

My guess (although I'm not sure) is that in Critch's case working with CFAR participants he was filtering on people who were at least moderately strategic, and it turned out there was a third blocker.

Comment by Raemon on Masterpiece · 2024-02-15T01:47:27.702Z · LW · GW

Not the main point, but:

A sequel to qntm's Lena. Reading Lena first is helpful but not necessary.
 

Why was that piece called "Lena"? The word doesn't show up anywhere in the piece except the title.

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-15T00:33:30.415Z · LW · GW

In my experience, I have not been able to reliably break even. This kind of estimate assumes a kind of fungibility that is sometimes correct and sometimes not. I think When to Get Compact is relevant here--it can feel like my bottleneck is time, when in fact it is actually attentional agency or similar. There are black holes that will suck up as much of our available time as they can.

Yeah this sounds like a real/common problem, and dealing with that somehow seems like a necessary piece for this whole process to work.

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-15T00:32:38.464Z · LW · GW

I'd also broadly say of the meta update: I think human intuitions aren't fine tuned for happening to be in a +2-3sd brain loadout, so they aren't very good at cueing us to actually use those features reliably.

That's an interesting point I hadn't thought about before.

Comment by Raemon on CFAR Takeaways: Andrew Critch · 2024-02-15T00:31:46.377Z · LW · GW

I assume he said something more nuanced and less prone to blindspots than that.

I think this is pretty close to what he said. I suspect he'd have some nuances if we drilled down into it.

This involves speculation on my part, but having known Critch moderately well, I'd personally bet that he has something like 10x to 30x fewer things lying around bothering him than most people (which maybe comes in waves where, first he had 10x fewer things lying around bothering him, then he set more ambitious goals for himself that created more bothersome things, then he fixed those, etc)

He probably does still have blindspots but I think the effect is real.

Comment by Raemon on Masterpiece · 2024-02-14T18:04:17.134Z · LW · GW

I have a feeling this one gets rejected for being Too Meta, based on other warnings and disclaimers in the contest description.

Comment by Raemon on Skills I'd like my collaborators to have · 2024-02-10T18:41:18.225Z · LW · GW

You know, "Actually OODA-ing" would just be a good standalone blogpost. I'd also be interested in reading alternate versions of it by @Andrew_Critch, @LoganStrohl, you (Romeo) and me. (I think this post by Duncan feels kinda related although not structured the same way).

Comment by Raemon on Skills I'd like my collaborators to have · 2024-02-10T08:04:06.754Z · LW · GW

Actually Resting

oh geeze yeah

Comment by Raemon on Skills I'd like my collaborators to have · 2024-02-09T19:09:09.398Z · LW · GW

lol that is amazingly terrible. 

That doc was a memo at a private retreat that a) was not actually that private, but b) is mostly just a repackaging of this:

https://www.lesswrong.com/posts/rz73eva3jv267Hy7B/can-you-keep-this-confidential-how-do-you-know 

Comment by Raemon on Believing In · 2024-02-09T15:50:21.501Z · LW · GW

In my case, I learned the skill of separating my predictions from my “believing in”s (or, as I called it at the time, my “targets”) partly the hard way – by about 20 hours of practicing difficult games in which I could try to keep acting from the target of winning despite knowing winning was fairly unlikely, until my “ability to take X as a target, and fully try toward X” decoupled from my prediction that I would hit that target.
 

What were the games here? I'd guess naive calibration games aren't sufficient.

Comment by Raemon on Wrong answer bias · 2024-02-02T05:26:32.207Z · LW · GW

Related: Writing That Provokes Comments 

Comment by Raemon on "Does your paradigm beget new, good, paradigms?" · 2024-01-26T00:18:00.734Z · LW · GW

I would more say "seems like it would reasonably help a lot in getting a huge amount of useful work out of AIs". (And then this work could plausibly help with aligning superintelligent AIs, but that isn't clearly the only or even main thing we're initially targeting.)

Yeah I think if I thought more carefully before posting I'd have come up with this rephrasing myself. Matches my understanding of what you're going for.

Comment by Raemon on My research agenda in agent foundations · 2024-01-25T02:07:34.966Z · LW · GW

I agree there's a risk of overemphasizing fast-feedback loops that damages science. 

My current belief is that gaining research taste is something that shouldn't be that mysterious, and mostly it seems to be something that

  • does require quite a bit of effort (which is why I think it isn't done by default)
  • also requires at least some decent meta-taste on how to gain taste (but, my guess is Alex Altair in particular has enough of this to navigate it)

And, meanwhile, I feel like we just don't have the luxury of not at least trying on this axis to some degree.

(I don't know that I can back up this statement very much; this seems to be a research vein that I currently believe in and that no one else currently seems to)

It is plausible to me (based on things like Alex's comment on this other post you recently responded to, and other convos with him) that Alex-in-particular is already basically doing all the things that make sense to do here. 

But, like, looking here:

But I think it's a careful balancing act, and I worry that putting too much pressure on speed and legibility is going to end up causing people to do science under the streetlight. I really do not want this to happen. Field founding science is a bunch weirder than normal science, and I want to take care in giving research taste enough space to find its feet. 

I think the amount I'm pushing for this here is "at all", and it feels premature to me to jump to "this will ruin the research process".

Comment by Raemon on What exercises go best with 3 blue 1 brown's Linear Algebra videos? · 2024-01-19T19:51:00.261Z · LW · GW

Woo!

Comment by Raemon on Look For Principles Which Will Carry Over To The Next Paradigm · 2024-01-17T01:25:41.232Z · LW · GW

I think of this as a fairly central post in the unofficial series on How to specialize in Problems We Don't Understand (which, in turn, is the post that most sums up what I think the art of rationality is for. Or at least the parts I'm most excited about).

Comment by Raemon on What Are You Tracking In Your Head? · 2024-01-17T01:15:11.997Z · LW · GW

I think this concept is important. It feels sort of... incomplete. Like, it seems like there are some major follow-up threads, which are:

  • How to teach others what useful skills you have.
  • How to notice when an expert has a skill, and 
    • how to ask them questions that help them tease out the details.

This feels like a helpful concept to re-familiarize myself with as I explore the art of deliberate practice, since "actually get expert advice on what/how to practice" is one of the most centrally recommended facets.

Comment by Raemon on On how various plans miss the hard bits of the alignment challenge · 2024-01-15T20:58:28.726Z · LW · GW

This seems right to me for posts replying to individual authors/topics (and I think this criticism may apply to some other more targeted Nate posts in that vein)

But I think for giving his takes on a large breadth of people, making sure each section is well vetted increases the cost by a really prohibitive amount, and I think it's probably better to do it the way Nate did here (clearly establishing the epistemic status of the post, and letting people in the comments argue if he got something wrong).

Also, curious if you think there's a particular instance where someone(s) felt misrepresented here? (I just tried doing a skim of the comments; there were a lot of them, and the first ones I saw seemed more like arguing with the substance of the disagreement rather than with his characterization being wrong. I gave up kinda quickly, but for now: do you recall him getting something wrong here, or do you just think on general principle that one shouldn't err in this direction?)

Comment by Raemon on Anthropic | Charting a Path to AI Accountability · 2024-01-15T02:42:58.442Z · LW · GW

I held off on writing a comment here because I felt like I should thoroughly read the linked thing before having an opinion, but then it turned out that was a lot of work so I didn't.

I'm hoping to read more details this week as part of LessWrong Review time, but not sure if I'll get to it.

Comment by Raemon on High Reliability Orgs, and AI Companies · 2024-01-15T02:17:44.622Z · LW · GW

Self Review. 

I wasn't sure at the time whether the effort I put into this post would be worth it. I spent around 8 hours I think, and I didn't end up with a clear gearsy model of how High Reliability tends to work.

I did end up following up on this, in "Carefully Bootstrapped Alignment" is organizationally hard. Most of how this post applied there was me including the graph from the vague "hospital Reliability-ification process" paper, in which I argued:

The report is from Genesis Health System, a healthcare service provider in Iowa that services 5 hospitals. No, I don't know what "Serious Safety Event Rate" actually means, the report is vague on that. But, my point here is that when I optimistically interpret this graph as making a serious claim about Genesis improving, the improvements took a comprehensive management/cultural intervention over the course of 8 years.

I know people with AI timelines less than 8 years. Shane Legg from Deepmind said he put 50/50 odds on AGI by 2030

If you're working at an org that's planning a Carefully Aligned AGI strategy, and your org does not already seem to hit the Highly Reliable bar, I think you need to begin that transition now. If your org is currently small, take proactive steps to preserve a safety-conscious culture as you scale. If your org is large, you may have more people who will actively resist a cultural change, so it may be more work to reach a sufficient standard of safety. 

I don't know whether it's reasonable to use the graph in this way (i.e. I assume the graph is exaggerated and confused, but it still seems suggestive of a lower bound on how long it might take a culture/organizational practice to shift towards high reliability).

After writing "Carefully Bootstrapped Alignment" is organizationally hard, I spent a couple months exploring and putting some effort into trying to understand why the AI safety focused members of Deepmind, OpenAI and Anthropic weren't putting more emphasis on High Reliability. My own efforts there petered out and I don't know that they were particularly counterfactually helpful.

But, later on, Anthropic did announce their Scaling Policy, which included language that seems informed by biosecurity practices (since writing this post, I went on to interview someone about High Reliability practices in bio, and they described a schema that seems to roughly map onto the Anthropic security levels). I am currently kind of on the fence about whether Anthropic's policy has teeth or is more like elaborate Safetywashing, but I think it's at least plausibly a step in the right direction.

Comment by Raemon on Learning By Writing · 2024-01-15T02:01:33.762Z · LW · GW

I've had a vague intent to deliberately apply this technique since first reading this two years ago. I haven't actually done so, alas.

It still looks pretty good to me on paper, and feels like something I should attempt at some point. 

Comment by Raemon on How my team at Lightcone sometimes gets stuff done · 2024-01-15T01:24:01.889Z · LW · GW

Lightcone has evolved a bit since Jacob wrote this, and also I have a somewhat different experience from Jacob. 

Updates:

  • "Meeting day" is really important to prevent people being blocked by meetings all week, but, it's better to do it on Thursday than Tuesday (Tuesday Meeting Days basically kill all the momentum you built up on Monday)
  • We hit the upper limits of how many 1-1 public DM channels really made sense (because it grew superlinearly with the number of employees). We mostly now have "wall channels" (i.e. raemon-wall), where people who want to message me write messages. (But, for people I am often pairing extensively with, I still sometimes use dedicated 1-1 channels for that high-bandwidth communication)
  • I think we still try to have top priorities set on Monday, but they are a bit looser than the way Jacob was running things on his team at the time.

Things I still basically endorse that feel particularly significant

  • Having people work onsite and near each other so you can easily get help unblocking yourself is indeed quite valuable. The difference between being in the same room and even one room over is significant, and being across the office is very significant. Being offsite slows things down a lot.
  • Pairing feels even more important than this post makes it seem. I think there's a lot of work that feels like it doesn't need pairing, but I think pairing helps me stay focused long after my attention would have started to flag.

For pairing, I'd add:

  • When people don't pair for a long stretch of time, my sense is they might initially feel more productive, but then slide into bad habits or avoidant behaviors that are hard to notice. 
  • Pairing allows for skill transfer. Pairing between different people with different skills is great.
  • I personally prefer a style of pairing that is very explicit and... micromanagey (both when I'm the driver and when I'm the navigator), i.e. "go to the top-right corner of the screen, click the button, then go to the middle of the screen and type [X]." Some people find that difficult, and it's not one-size-fits-all, but I find it good for avoiding the confusion that crops up when you try to give or receive more open-ended directions.
  • We have shifted to almost always pairing via zoom and screen share, rather than leaning over at each other's monitors (even while in the same room), so we don't have to crane our neck all the time.

I probably could say more but that seems like how much time I want to spend on it for now.

Comment by Raemon on How To Observe Abstract Objects · 2024-01-15T00:43:50.822Z · LW · GW

My previous review of this is in this older comment. Recap:

  • I tried teaching a variation of this exercise that was focused on observing your internal state (sort of as an alternative to "Focusing"). I forgot to include the "meta strategy" step, which upon reflection was super important (so important I independently derived it for another portion of the same workshop I was running at the time). The way I taught this exercise, it fell flat, but I think the problem was probably in my presentation.
  • I did have people do the "practice observing things" (physical things, not abstract objects) in 1-1 conversations with them in between workshop sessions, and found that worked better, since I brought up the exercise right when it seemed actually relevant to them.

I plan to re-run this exercise soon and see how it goes. I feel optimistic about it at least being on the right track towards something good and important. Logan notes that they haven't done it exactly the same way twice, so seems like they have a similar take?

Comment by Raemon on Announcing Balsa Research · 2024-01-14T22:06:22.426Z · LW · GW

I think I'm personally interested in a more detailed self-review of how this project went. Like, Balsa Research seemed like your own operationalization of a particular hypothesis, which is "one partial solution to civilizational dysfunction is to accomplish clear wins that restore/improve faith that we can improve things on purpose." (or, something like that)

I didn't nominate the post but having thought about it now, I'd be interested in an in-depth update on how you think about that, and what nuts-and-bolts of your thinking have evolved over the past couple years. 

Comment by Raemon on What good is G-factor if you're dumped in the woods? A field report from a camp counselor. · 2024-01-14T20:01:59.034Z · LW · GW

I should emphasize that he did not succeed at hurting another kid in his allergy plot, and was not likely to. 1% of kids with psychopathic tendencies sounds rare when you’re parenting one kid, but it sounds like Tuesday when you have the number of kids seen by an institution like a summer camp- there’s paperwork, procedures, escalations, all hella battle-tested.

This is really interesting. This makes sense but I hadn't thought about this before at all. I'd be interested in a post that explores this in more detail.

Comment by Raemon on Patient Observation · 2024-01-13T21:33:42.119Z · LW · GW

For the past few years I've read Logan-stuff, and felt a vague sense of impatience about it, and a vague sense of "if I were more patient, maybe a good thing would happen though?". This year I started putting more explicit effort into cultivating patience.

I've read this post thrice now, and each time I start out thinking "yeah, patience seems like a thing I could have more of"... and then I get to the breakdown of "tenacity, openness and thoroughness" and go "oh shit I forgot about that breakdown. What a useful breakdown." It feels helpful because it suggests different avenues of improvement, which each feel a bit more manageable.

I think I have the most to say about "openness", which I think is maybe what I most struggle with.

I don't know that the "desire for closure/certainty" resonates. I think for me it's more like "antsy to get back to the action", and sometimes "being triggered and feeling specific resistance to a thing-I-could-be-open-about", and sometimes "just being bored." The thing I want doesn't feel like "certainty"; it's more like "good enough to get back to some other thing, even if uncertain." (Maybe this is secretly still about uncertainty, but it doesn't feel like it right now)

Hmm, let's try to call to mind some examples:

  • Triggeredness/Defensiveness. A recent example: responding to a comment on my post with a defensive need to clarify "I already thought of that!". I noticed later that this was kinda closing off opportunities to learn more about whatever distinctions the commenter may have seen. (I recall this pattern coming up in the past, on LW and Facebook)
  • Boredom/Antsiness. While doing Purposeful Practice, and trying to slow down so I can both perform a skill perfectly and do some introspection on what's going on when I mess up, I just feel like I wanna DO THE THING FAST.
  • Desire to share my frame (at the expense of either original seeing or listening to someone else's perspective). I don't have a specific example on hand but I think this comes up a lot. This feels routed through wanting to feel/seem smart and get attention or something.

"Open" vs Undirected

I think there's an overall issue of... not really believing that the time I'd need to dedicate to persistent open curiosity would pay off. 

Hrm, I guess I notice some confusion here. I'm not sure if I'm conflating Open Observation and Open Curiosity (vs Active Curiosity, or "tunnel-visioned" curiosity). I think maybe a distinction not quite articulated here is between "open" in the "not tunnel-visioned, seeing from many angles" sense, and "open" in the "undirected, you're not trying to achieve a goal or look at any particular type of thing" sense.

My brain feels pretty sold on "open" curiosity and "open" observation (although still not 100% sure if those are different). What my brain doesn't feel sold on is "undirected" attention. I believe some good things will happen if I let myself undirectedly observe things, but... I don't really buy that it'll be as useful as a more directed version of that. I look at the raven acrobatics anecdote and am like "cool, but... idk the raven acrobatics doesn't actually seem that useful." What sort of things would I do that are analogous that'd actually help me?

Note on Perceptual Dexterity

Unlike Tenacity/Openness/Thoroughness, I did remember Perceptual Dexterity as a concept, but I totally forgot it was in this essay.

I notice the section on it feels relatively self-contained. I think it'd be kinda useful to have a post that's... literally just that section copy-pasted, with the title "Perceptual Dexterity." 

Or, like, maybe make the idealized Best Version of the Perceptual Dexterity Essay that could possibly be. But, one thing that I think really helps with learning "patient observation" is, well, being continuously reminded of patient observation. I think having a larger number of lower-effort posts that keep it in people's minds would help with that.

Comment by Raemon on Trigger-Action Planning · 2024-01-13T20:34:53.224Z · LW · GW

I've long believed TAPs are a fundamental skill-building block. But I've noticed lately that I never really gained, or solidified as well as I'd like, the skill of building TAPs. 

I just reread this post to see if it'd help. One paragraph that stands out to me is this:

And in cases where this is not enough—where your trigger does indeed fire, but after two weeks of giving yourself the chance to take the stairs, you discover that you have actually taken yourself up on it zero times—the solution is not TAPs!  The problem lies elsewhere—it's not an issue with your autopilot, but rather with your chosen action or some internal conflict or hesitation, and there are other techniques that can be used to illuminate and solve those problems.

and the Tips for TAPS section, which help clarify what TAPs are for.

I think my biggest confusion right now is how to get TAPs to reliably fire in a chaotic world. I had set a TAP recently of "make sure to close a particular door fully, every time I use it." I practiced doing it as I walked through a few times. But then it failed to fire when I was carrying a heavy thing, or in the middle of a conversation, or when I was walking my scooter through the door. I couldn't figure out a way to practice the trigger that was versatile/robust. I would practice on each new variation I noticed, and I'd try practicing other variations to "cross train", but they never seemed to generalize to the situations where I actually needed them.

...

Some random notes

I find this bullet confusing:

Try gain-pain movies—first imagine some exciting or attractive aspect of the future where you’ve achieved your goal, and then think about the obstacles that lie between you and that future, and then repeat several times.[1]

I think I've previously read this sort of thing with an "and then, visualize overcoming those obstacles" clause, or something similar.

Hmm:

Locke and Latham (2002) review decades of research on goal setting and performance. Among their findings: people who set a challenging, specific goal tend to accomplish more than people who set a vague goal (such as “do as much as possible”) or those who set an easy goal.

On one hand, I think I have set challenging/specific goals for myself, but I think some part of me is still just going through life with an overall "try to do more good in the world" lens. I guess I do regularly channel that into concrete things, but I could maybe use more specificity sometimes.

Comment by Raemon on Thomas Kwa's Shortform · 2024-01-13T03:33:55.967Z · LW · GW

Mmm, nod. I think this is the first request I've gotten for the review period being longer. I think doing this would change it into a pretty different product, and I think I'd probably want to explore other ways of getting-the-thing-you-want. Six months of the year makes it basically always Review Season, and at that point there's not really a schelling nature of "we're all doing reviews at the same time and getting some cross-pollination-of-review-y-ness."

But, we've also been discussing generally having other types of review that aren't part of the Annual Review process (that are less retrospective, and more immediate-but-thorough). That might or might not help.

For the immediate future – I would definitely welcome reviews of whatever sort you are inspired to do, basically whenever. If nothing else you could write it out, and then re-link to it when Review Season comes around.