Posts

Last week of the Discussion Phase 2025-01-09T19:26:59.136Z
What are the most interesting / challenging evals (for humans) available? 2024-12-27T03:05:26.831Z
ReSolsticed vol I: "We're Not Going Quietly" 2024-12-26T17:52:33.727Z
Hire (or Become) a Thinking Assistant 2024-12-23T03:58:42.061Z
The "Think It Faster" Exercise 2024-12-11T19:14:10.427Z
Subskills of "Listening to Wisdom" 2024-12-09T03:01:18.706Z
The 2023 LessWrong Review: The Basic Ask 2024-12-04T19:52:40.435Z
JargonBot Beta Test 2024-11-01T01:05:26.552Z
The Cognitive Bootcamp Agreement 2024-10-16T23:24:05.509Z
OODA your OODA Loop 2024-10-11T00:50:48.119Z
Scaffolding for "Noticing Metacognition" 2024-10-09T17:54:13.657Z
"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" 2024-09-28T23:38:25.512Z
2024 Petrov Day Retrospective 2024-09-28T21:30:14.952Z
[Completed] The 2024 Petrov Day Scenario 2024-09-26T08:08:32.495Z
What are the best arguments for/against AIs being "slightly 'nice'"? 2024-09-24T02:00:19.605Z
Struggling like a Shadowmoth 2024-09-24T00:47:05.030Z
Interested in Cognitive Bootcamp? 2024-09-19T22:12:13.348Z
Skills from a year of Purposeful Rationality Practice 2024-09-18T02:05:58.726Z
What is SB 1047 *for*? 2024-09-05T17:39:39.871Z
Forecasting One-Shot Games 2024-08-31T23:10:05.475Z
LessWrong email subscriptions? 2024-08-27T21:59:56.855Z
Please stop using mediocre AI art in your posts 2024-08-25T00:13:52.890Z
Would you benefit from, or object to, a page with LW users' reacts? 2024-08-20T16:35:47.568Z
Optimistic Assumptions, Longterm Planning, and "Cope" 2024-07-17T22:14:24.090Z
Fluent, Cruxy Predictions 2024-07-10T18:00:06.424Z
80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) 2024-07-03T20:34:50.741Z
What percent of the sun would a Dyson Sphere cover? 2024-07-03T17:27:50.826Z
What distinguishes "early", "mid" and "end" games? 2024-06-21T17:41:30.816Z
"Metastrategic Brainstorming", a core building-block skill 2024-06-11T04:27:52.488Z
Can we build a better Public Doublecrux? 2024-05-11T19:21:53.326Z
some thoughts on LessOnline 2024-05-08T23:17:41.372Z
Prompts for Big-Picture Planning 2024-04-13T03:04:24.523Z
"Fractal Strategy" workshop report 2024-04-06T21:26:53.263Z
One-shot strategy games? 2024-03-11T00:19:20.480Z
Rationality Research Report: Towards 10x OODA Looping? 2024-02-24T21:06:38.703Z
Exercise: Planmaking, Surprise Anticipation, and "Baba is You" 2024-02-24T20:33:49.574Z
Things I've Grieved 2024-02-18T19:32:47.169Z
CFAR Takeaways: Andrew Critch 2024-02-14T01:37:03.931Z
Skills I'd like my collaborators to have 2024-02-09T08:20:37.686Z
"Does your paradigm beget new, good, paradigms?" 2024-01-25T18:23:15.497Z
Universal Love Integration Test: Hitler 2024-01-10T23:55:35.526Z
2022 (and All Time) Posts by Pingback Count 2023-12-16T21:17:00.572Z
Raemon's Deliberate (“Purposeful?”) Practice Club 2023-11-14T18:24:19.335Z
Hiring: Lighthaven Events & Venue Lead 2023-10-13T21:02:33.212Z
"The Heart of Gaming is the Power Fantasy", and Cohabitive Games 2023-10-08T21:02:33.526Z
Related Discussion from Thomas Kwa's MIRI Research Experience 2023-10-07T06:25:00.994Z
Thomas Kwa's MIRI research experience 2023-10-02T16:42:37.886Z
Feedback-loops, Deliberate Practice, and Transfer Learning 2023-09-07T01:57:33.066Z
Open Thread – Autumn 2023 2023-09-03T22:54:42.259Z
The God of Humanity, and the God of the Robot Utilitarians 2023-08-24T08:27:57.396Z

Comments

Comment by Raemon on On Eating the Sun · 2025-01-10T18:47:55.197Z · LW · GW

It sounds like there's actually like 3-5 different object level places where we're talking about slightly different things. I also updated on the practical aspect from Ryan's comment. So, idk here's a bunch of distinct points.

1. 

Ryan Greenblatt's comment updated me that the energy requirements here are minimal enough that "eating the sun" isn't really going to come up as a consideration for astronomical waste. (Eating the Earth or most of the solar system seems like it still might be. But, I agree we shouldn't Eat the Earth)

2. 

I'd interpreted most past comments for nearterm (i.e. measured in decades) crazy shit to be about building Dyson spheres, not Star Lifting. (i.e. I expected the '20 years from now in some big ol' computer' in the solstice song to be about dyson spheres and voluntary uploads). I think many people will still freak out about Dyson Sphering the sun (not sure if you would). I would personally argue "it's just pretty damn important to Dyson Sphere the sun even if it makes people uncomfortable (while designing it such that Earth still gets enough light)."

3. 

I agree in 1000 years it won't much matter whether you Starlift, for astronomical waste reasons. But I do expect in 1000 years, even assuming a maximally consent-oriented / conservative-with-regards-to-bio-human-values and all-around "good" outcome, most people will have shifted to running on computronium and experienced much more than 1000 years of subjective time and their intuitions about what's good will just be real different. There may be small groups of people who continue living in bio-world but most of them will still probably be pretty alien by our lights.

I think I do personally hope they preserve the Earth as sanctuary and/or historical relic. But I think there's a lot of compromises like "starlift a lot of material out of the sun, but move the Earth closer to the sun to compensate" (I haven't looked into the physics here, the details are obviously cruxy). 

When I imagine any kind of actual realistic future that isn't maximally conservative (i.e. the bio humans are < .1% of the solar system's population and just don't have that much bargaining power), it seems even more likely that they'll at least work on compromise solutions that preserve a reasonable Earth experience but eat a bunch of the sun, if there turn out to be serious tradeoffs there. (Again I don't actually know enough physics here and I'm recently humbled by remembering the Eternity in Six Hours paper, maybe there's literally no tradeoffs here, but, I'd still doubt it)

4. 

It sounds like it's not particularly cruxy anymore, but, I think the "0.00000004% of the Earth's current population" analogy is just quite different. 80 trillion suns involves more value than has ever been had before; 3 lives is (relatively) insignificant compared to many political compromises we've made, even going back thousands of years. Maybe whatever descendants get to reap that value are so alien that they just don't count as valuable by today's lights, and it's reasonable to have some extreme time discounting here, but, if any values-you-care-about survived it would be huge.

I agree both morally and practically with "it's way more important to make sure we have good global coordination systems that don't predictably either descend into a horrible totalitarianism, or trigger a race for power that causes horrible wars or other bad things, than to get those 80 trillion suns." But, like, the 80 trillion suns are still a big deal.

5. 

I'll note it's also not a boolean whether we "bulldoze the earth" or "bulldoze the rest of the solar system" for rushing to build a dyson sphere. You can start the process with a bunch of mining in some remote mountain regions or whatever without eating the whole earth. (But I think it might be bad to do this because "don't harvest Earth" is just a nice simple Schelling rule and once you start haggling over the details I do get a lot more worried)

6. 

I recall reading it's actually maybe cheaper to use asteroids than Mercury to make a dyson sphere because you don't need to expensively lift things out of the gravity well. It is appealing to me if there are no tradeoffs involved with deconstructing any of the charismatic astronomical objects until we've had more time to think/orient/grow-as-a-people.

7. 

Part of my outlook here is that I spent the last 14 years being personally uninterested in and scared by the sorts of rapid/crazy/exponential change you're wary of. In the past few years, I've adjusted to be more personally into it. I don't think I would have wanted to rush that grieving/orienting process for Past Me even though it cost me a lot of important time and resources (I'm referring here more to stuff like The God of Humanity, and the God of the Robot Utilitarians)

But I do wish I had somehow sped along the non-soulfully-traumatic parts of the process (i.e. some of the updates were more simple/straightforward and if someone had said the right words to me, I think I'd have gotten a strictly better outcome by my original lights).

I expect most humans, given opportunity to experiment on their own terms, will gradually have some kind of perspective shift here (maybe on a longer timescale than Me Among the Rationalists, but, like, <500 years). I don't want people to feel rushed about it, but I think there will be some societal structures that will lend themselves to dallying more and accumulating serious tradeoffs, or less.

Comment by Raemon on The Soul Key · 2025-01-10T17:40:13.695Z · LW · GW

In addition to being hauntingly beautiful, this story helped me adjust to the idea of the trans/posthuman future.

14 years ago, I very much did not identify with the Transhuman Vision. It was too alien, too much, and I didn't feel ready for it. I also didn't actively oppose it. I knew that slowly, as I hung out around rationalists, I would probably come to identify more with humanity's longterm future.

I have indeed come to identify more with the longterm future and all of its weirdness. It was mostly not because of this story, but I did particularly resonate with the framing here – in large part because it met me where I am, instead of jumping into Future Shock. It presents the increasing alienness in gentle increments, from multiple perspectives, and from the perspective of someone currently living in a more Ancestral Human perspective.

This story doesn't tell you what sort of choices are good to make, but it makes it feel easier to wrap my brain around how I (or others) might eventually make such choices.

Comment by Raemon on Change my mind: Veganism entails trade-offs, and health is one of the axes · 2025-01-10T16:43:00.603Z · LW · GW

I think it might have been important to the process for this post to exist (so you can tell Elizabeth did the work), but also for there still to be a shorter post that just gets the point across.

Comment by Raemon on On Eating the Sun · 2025-01-10T03:53:58.409Z · LW · GW

I have my own actual best guesses for what happens in reasonably good futures, which I can get into. (I'll flag for now I think "preserve Earth itself for as long as possible" is a reasonable Schelling point that is compatible with many "go otherwise quite fast" plans)

I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon. 

Why do you doubt this? (To be clear, it depends on exact details. But, my original query was about a 2 year delay. Proxima Centauri is 4 lightyears away. What is your story for how only taking 0.1% of the sun's energy while we spin up doesn't slow us down by at least 2 years?)

I have more to say but maybe should wait on your answer to that.

Mostly, I think your last comment still had its own missing mood of horror, and/or seemed to be assuming away any tradeoffs.

(I am with you on "many rationalists seem gung ho about this in a way I find scary")

Comment by Raemon on The Base Rate Times, news through prediction markets · 2025-01-10T01:20:36.681Z · LW · GW

This site is a cool innovation but is missing pieces required to be really useful. I’m giving it +1. I might give it +4 to subsidize ‘actually build shit’.

I think this site is on-the-path to something important but a) the UI isn't quite there and b) there's this additional problem where, well, most news doesn't matter (in that it doesn't affect my decisions).

During Ukraine nuclear scares, I looked at BaseRateTimes sometimes to try and orient, but I think it was less helpful than other compilations of prediction markets that Lightcone made specifically to help orient to nuclear threats.

I do think combining multiple prediction platforms into a single page is a pretty useful innovation and still seems likely to be a key piece of the final usable product I hope exists some day.

Things that feel missing:

  • good UI for tailoring the page to suit my interests and decisionmaking
  • ...more context, or something, for each market?

I assume one problem here is that matching up markets from different platforms with similar enough resolution criteria is hard. (Seems like something AI may be able to automate now?)

Comment by Raemon on On Eating the Sun · 2025-01-09T21:18:59.026Z · LW · GW

Richard said "I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that." He didn't say it'd be a longterm project; I think he just meant he didn't change his beliefs about it due to this post.

Comment by Raemon on On Eating the Sun · 2025-01-09T21:14:28.353Z · LW · GW

So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that?".

The issue is that not-eating-the-sun is also horrifying. i.e. see also All Possible Views About Humanity's Future Are Wild. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it, percentage-wise, "a small fraction of the cosmos"? Sure. But, (quickly checks Claude, which wrote up a fermi code snippet before answering, I can share the work if you want to doublecheck yourself), a two year delay would be... 0.00000004% of the universe lost beyond the lightcone horizon, which doesn't sound like much except that's 200 galaxies lost.
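
(For the curious, here is a minimal sketch of that kind of Fermi estimate. It is not Claude's actual snippet; the reachable-universe radius, galaxy count, and the "one light year of reach lost per year of delay" rate are all rough assumed values, so treat the outputs as order-of-magnitude only.)

    # Rough Fermi sketch of "what does a 2-year delay cost beyond the lightcone horizon?"
    # Assumptions (not from the original comment):
    #   - the reachable ("affectable") universe has a comoving radius of ~16.5 billion light years
    #   - it contains ~5e11 galaxies
    #   - each year of delay shrinks that radius by roughly one light year
    # For a sphere, losing a thin shell of thickness dr loses a fraction ~3*dr/r of the volume.

    REACHABLE_RADIUS_LY = 16.5e9   # assumed comoving radius, in light years
    REACHABLE_GALAXIES = 5e11      # assumed galaxy count within that radius

    def loss_from_delay(delay_years: float) -> tuple[float, float]:
        """Return (fraction of reachable volume lost, galaxies lost) for a given delay."""
        fraction_lost = 3 * delay_years / REACHABLE_RADIUS_LY
        return fraction_lost, fraction_lost * REACHABLE_GALAXIES

    fraction, galaxies = loss_from_delay(2.0)
    print(f"{fraction * 100:.8f}% of the reachable universe")   # ~0.00000004%
    print(f"~{galaxies:.0f} galaxies")                          # roughly 200 galaxies

Under those assumptions, a two-year delay costs a few parts in ten billion of the reachable volume, which is tiny as a fraction but a couple hundred galaxies in absolute terms.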

When the alternative is "the Amish get a Sun Replica that doesn't change their experience", the question is "Is it worth throwing away 80 trillion stars for the Amish to have the real thing?" It does not seem obviously worth it.

IMO there isn't an option that isn't at least a bit horrifying in some sense that one could have a missing mood about. And while I still feel unsettled about it, I think if I have to grieve something, it makes more sense to grieve in the direction of "don't throw away 80 trillion stars worth of resources."

I think you're also maybe just not appreciating how much would change in 10,000 years? Like, there is no single culture that has survived 10,000 years. (Maybe one of those small tribes in the Amazon? I'd still bet on there having been a lot of cultural drift there, but not confidently). The Amish are only a few hundred years old. I can imagine doing a lot of moral reflection and coming to the conclusion the sun shouldn't be eaten until all human cultures have decided it's the right thing to do, but I do really doubt that process takes 10,000 years.

Comment by Raemon on On Eating the Sun · 2025-01-09T20:42:00.065Z · LW · GW

What do you think Richard Ngo claimed about this?

Comment by Raemon on On Eating the Sun · 2025-01-08T20:21:04.901Z · LW · GW

This seemed like a nice explainer post, though it's somewhat confusing who the post is for – if I imagine being someone who didn't really understand any arguments about superintelligence, I think I might bounce off the opening paragraph or title because I'm like "why would I care about eating the sun." 

There is something nice and straightforward about the current phrasing, but I suspect there's an opening paragraph that would do a better job explaining why you might care about this.

(But I'd be curious to hear from people who weren't really sold on any singularity stuff who read it and can describe how it was for them)

Comment by Raemon on Dmitry Vaintrob's Shortform · 2025-01-06T23:27:33.006Z · LW · GW

FYI I think by the time I wrote Optimistic Assumptions, Longterm Planning, and "Cope", I had updated on the things you criticize about it here (but, I had started writing it a while ago from a different frame and there is something disjointed about it)

But, like, I did mean both halves of this seriously:

I think you should be scared about this, if you're the sort of theoretic researcher, who's trying to cut at the hardest parts of the alignment problem (whose feedback loops are weak or nonexistent) 

I think you should be scared about this, if you're the sort of Prosaic ML researcher who does have a bunch of tempting feedback loops for current generation ML, but a) it's really not clear whether or how those apply to aligning superintelligent agents, b) many of those feedback loops also basically translate into enhancing AI capabilities and moving us toward a more dangerous world.

...

Re:

For the last few weeks, I’ve been working on trying to find plans for AI safety. They should cover the whole problem, including the major hurdles after intent alignment. 

I strongly disagree with this being a good thing to do! We're not going to have a good, end-to-end plan about how to save the world from AGI.

I think in some sense I agree with you – the actual real plans won't be end-to-end. And I think I agree with you about some kind of neuroticism that unhelpfully bleeds through a lot of rationalist work. (Maybe in particular: actual real solutions to things tend to be a lot messier than the beautiful code/math/coordination-frameworks an autistic idealist dreams up)

But, there's still something like "plans are worthless, but planning is essential." I think you should aim for the standard of "you have a clear story for how your plan fits into something that solves the hard parts of the problem." (or, we need way more people doing that sort of thing, since most people aren't really doing it at all)

Some ways that I think about End to End Planning (and, metastrategy more generally)

Because there are multiple failure modes, I treat myself as having multiple constraints I have to satisfy:

  • My plans should backchain from solving the key problems I think we ultimately need to solve
  • My plans should forward chain through tractable near goals with at least okay-ish feedback loops. (If the okay-ish feedback loops don't exist yet, try to be inventing them. Although don't follow that off a cliff either – I was intrigued by Wentworth's recent note that overly focusing on feedback loops led him to predictably waste some time)
  • Ship something to external people, fairly regularly
  • Be Wholesome (that is to say, when I look at the whole of what I'm doing it feels healthy, not like I've accidentally min-maxed my way into some brittle extreme corner of optimization space)

And for end to end planning, have a plan for...

  • up through the end of my current OODA loop
  • (maybe up through a second OODA loop if I have a strong guess for how the first OODA loop goes)
  • as concrete a plan as I can, assuming no major updates from the first OODA loop, up through the end of the agenda.
  • as concrete a visualization of the followup steps after my plan ends, for how it goes on to positively impact the world.

End to End plans don't mean you don't need to find better feedback loops or pivot. You should plan that into the plan (and also expect to be surprised about it anyway). But, I think if you don't concretely visualize how it fits together you're likely to go down some predictably wasteful paths.

Comment by Raemon on quetzal_rainbow's Shortform · 2025-01-02T07:07:06.642Z · LW · GW

Not sure I quite parsed that, but things it makes me think of:

  • first, if you're bottlenecked on health (physical or mental), it may be that finding medication that helps is more important than your mindset.
  • try success spiralling – start doing small things, build up both a habit/muscle of doing things, and momentum in doing things, escalate to bigger things
  • if getting started is hard, maybe find a friend or pay a colleague to just sit with you and constantly be like "are you doing stuff?" and spray you with a water bottle if you look like you're overthinking stuff, until you build up a success spiral / muscle of doing things.
  • try doing doing doing just fucking do it man and when your brain is like "idk that seems like a whole lotta doing what if we're doing the wrong thing?" be like "it's okay Thinky Brain this is an experiment we will learn from later so we eventually can calibrate on Optimal Think-to-Do Ratio"

Comment by Raemon on quetzal_rainbow's Shortform · 2025-01-02T04:29:15.586Z · LW · GW

when you attempt to switch from thinking to doing, what happens instead?

Comment by Raemon on Raemon's Shortform · 2025-01-02T00:43:25.839Z · LW · GW

Sort of inspired by Erik Jenner's post:

If you've been vaguely following me and thought "man, I wish Ray hurried up and finished making the update that <X>", what are some values of X?

Comment by Raemon on The OODA Loop -- Observe, Orient, Decide, Act · 2025-01-01T18:07:05.668Z · LW · GW

Great writeup.

  1. Alice's team develops a major product without first checking to see if it's something people actually want -- after a year and a half of development, the product works great, but it turns out there isn't much of any demand. (I would consider this an observation failure -- failure to observe critical information leads to lots of wasted time.)

FYI I'd classify this more as a decision failure. They really would have had to take different actions in order to get this data, so this was more at the point when they were like "do I start building this product, or do I find some random representative users and see what they think of the idea?"

Comment by Raemon on The OODA Loop -- Observe, Orient, Decide, Act · 2025-01-01T18:05:43.362Z · LW · GW

Also, since decision can flow pretty directly from orientation, you may find these two similar enough that you want to group them as one; I'm undecided on whether to make that change to this technique "more formally" and probably need to test it with more participants to see!

I actually normally combine/conflate Observe and Orient.

I think the actual takeaway here is: any two adjacent steps can kind of blend into each other. 

You might be in a microloop where you're observing and orienting (and then maybe looking for more observations and then orienting on the new ones). 

Then, when you're eventually like "okay I have enough observations", you may be in a loop where you're evaluating decisions, and then looking at your confused model and trying to wrangle the information into a form that's useful for decisionmaking, then look at your decision options again, be dissatisfied with your current ability to make-sense-of-things, and do more orienting.

Then eventually you're in a state where you know how to think about the situation, and you pretty much know what the options are, but as you start thinking about "Acting", your brain starts to see the consequences of each decision in near mode, which changes your guesses about which actions are best.

Then, as you start acting in earnest, each action comes with some immediate observations.

But, you can't really move from "Observe" to "Decide" without having gone through at least a little bit of an orient step on how to classify your observations.

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-31T18:28:36.785Z · LW · GW

I did a session yesterday with @moonlight, which went pretty well. I ended up consolidating some notes that seemed good to share with new assistants, and then he wrote the introduction he'd personally have preferred. 

I generally work out of google docs that serve as shared-external-working-memory, with multiple tabs.

Moonlight's Intro

[Written by the first thinking assistant working with Ray, writing here what I’d have liked to read first]

Important things:

  • Ray is currently sick, so put some effort into speaking more softly and slowly.
  • There is no interview or anything similar, you’ll begin assisting him straight away.
  • By default, just watch him work (coding/planning/writing/operations), and occasionally give signs you’re still attentive, without interrupting.
  • Write moment to moment observations which feel useful to you, as well as general thoughts, down in the Assistant Notes tab. This helps you feel more proactively involved and makes you focused on noticing patterns and ways in which you could be more useful as an assistant.
  • The Journal tab is for his plans and thoughts about what to generally do. Read it as an overview.
  • This Context tab is for generally useful information about what you should do and about relevant strategies and knowledge Ray has in mind. Reading this helps you get a more comprehensive view on what his ideal workflow looks like, and what your ideal contributions look like.

For the structure of this document:

  • Collapse sections when reading. It helps traverse the document.
  • This part has my thoughts for onboarding. Under it, you can find Ray’s onboarding section. Read these two first.
  • The “Ray Facts” section has important information about logistics and operations. Currently it has only his work location. [edited out in this comment]
  • In “Ray’s Metacognitive Engine” and below, you can find the strategies and knowledge I’ve mentioned above. You can read these after, they’re not mandatory at the very start.

Ray’s First Draft Intro Materials

Strategic Overview

Goal: End the acute risk period, and ensure a flourishing human future.

I’ve recently finished a bunch of grieving necessary to say “all right I’m ready to just level up into an Elon-Musk-but-with-empathy-and-cyborg-tools type”, as well as the minimum necessary pieces of a cognitive engine that (I think) is capable of doing so.

I want to be growing in capacity at an exponential rate, both in terms of my personal resources, and the resources available to the x-risk ecosystem that are accomplishing things I think need accomplishing.

This means having a number of resources that are compounding, that are synergistic, which include:

  • Money (either mine, or ability to spend Lightcone’s)
  • Skills
    • Meta personal skills, like ability to learn, and understand things, or be strategic
    • Meta interpersonal skills, such as the ability to outsource labor or make use of assistants,
    • Object level skills like programming, UI design, Event running
    • Ability to work with employees who can take on tasks I want done
  • Capital
    • Relationships with people I work well with
    • Tools I can re-use

Things I actually do most days:

  • Coding on LessWrong
  • Coding on random other projects
  • Planning my Cybercognition Agenda, which includes workshops, cybernetic tools, and upskilling people around me.
  • UI design, trying to figure out important complex things I want people to interact with in a way that feels simple to them. 
  • Thinking strategically about what needs to be done next

Instructions for Thinking Assistants

Things I would like you to do:

  • By default, be quiet and attentive and just help me focus by being a real human who’s staring at me
  • Develop skills for tackling sort of arbitrary ops or research or coding tasks, such that I can outsource small things to you.
  • Advice
    • This is tricky because I have a good enough model of myself that a lot of advice isn’t that helpful. It’s still useful to have my blindspots pointed out. But, if I interrupt you (either with words or with a hand gesture) that probably means I want to move on to a different thread. (Ideally, you feel comfortable bringing up ideas, with no hard feelings if it doesn’t work out) 

I would like to end up with a series of if-then habits you can help me execute. I will mostly write these myself, but as you get to know me well enough to say useful things, you can make suggested-edits

From “Hire (or Become) a Thinking Assistant”:

  • By default, be quietly but visibly attentive.
  • Every now and then (~5-10 minutes, or when I look actively distracted), briefly check in (where if I'm in-the-zone, this might just be a brief "Are you focused on what you mean to be?" from them, and a nod or "yeah" from me).
  • When I need to think something through, they rubber duck (i.e. listen as I talk out loud about it, and ask clarifying questions)
  • Build a model of my thought process (partly by me explaining it to them, partly by observing, partly by asking questions)
  • Ideally, notice when my thought process seems confused/disoriented/inefficient.
  • Ideally, have a large repertoire of cognitive tools they can suggest if I seem to be missing them.
  • Intelligent enough that they can pretty easily understand the gist of what I'm working on.
  • Ability to pick things up from context so I don't need to explain things in too much detail.
  • Ideally, when my bottlenecks are emotional, also be at least fairly emotionally attuned (i.e. project a vibe that helps me work through it, or at least doesn't add extra friction or emotional labor demands from me), and ideally, basically be a competent therapist.
  • In general, own the metacognition. i.e. be taking responsibility for keeping track of things, both on a minute-to-minute timescale, and the day-to-day or week-to-week timescale.
  • Ability to get out of the way / quickly drop things if it doesn't turn out to be what I need, without it being a big deal. 

There are also important outside-the-container skillsets, such as:

  • Be responsive in communication, so that it's easy to schedule with them. If it's too much of a pain to schedule, it kinda defeats the point.
  • Potentially: proactively check in remotely during periods where I'm not actively hiring them (i.e. be a professional accountability buddy, maybe paid some base rate to briefly check in each day, with the ability to upsell into "okay today is a day that requires bigger metacognitive guns than Raemon has at the moment").

Even the minimum bar (i.e. "attentive body double") here is a surprisingly skilled position. It requires gentleness/unobtrusiveness, attentiveness, a good vibe. 

The skill ceiling, meanwhile, seems quite high. The most skilled versions of this are the sort of therapist or executive coach who would charge hundreds of dollars an hour. The sort of person who is really good at this tends to quickly find their ambitions outgrowing the role (same with good executive assistants, unfortunately).

 

Pitfalls

Common problems I've run into:

  • Having trouble scheduling with people. If you want to specialize in this role, it's often important for people to contact you on a short timeline (i.e. I might notice I'm in a brainfoggy state and want someone to assist me like right now, or tomorrow), so it helps to have a communication channel you check regularly so people can ping you about a job.
  • Asking questions in a way that is annoying instead of helpful. Since the point is to be giving me more time, if I have to spend too much time explaining the situation to someone, it undoes the value of it. This requires either them being good at picking things up quickly without much explanation, or good at reading nonverbal cues that the current thread isn't worth it and we should move on.
  • Spending too much time on unhelpful advice. Sometimes an assistant will have ideas that don't work out, and maybe push them more than appropriate. There's a delicate balance here because sometimes I am being avoidant or something and need advice outside of my usual wheelhouse, but generally if advice isn't feeling helpful, I think the assistant should back off and observe more and try to have a few other hypotheses about what to suggest if they feel that the assistee is missing something.
  • Navigating weird dynamics around "having someone entirely optimized to help another person." Having this run smoothly, in a net helpful way, means having to actually be prioritizing my needs/goals in a way that would normally be pretty rude. If I constantly feel like there's social awkwardness / wariness about whether I'm making them feel bad, the whole thing is probably net negative. I think doing a good job of navigating this requires some nuance/emotional-skill on both parties, in terms of striking a vibe where it feels like you are productively collaborating.
    • (I think this likely works best when the person is really actively interested in the job "be a thinking assistant", as opposed to something they're doing because they haven't gotten traction on their real goals).

Ray’s Metacognitive Engine

  • Twice a day, asking “what is the most important thing I could be working on and why aren’t I on track to deal with it?”
    • you probably want a more specific question (“important thing” is too vague). Three example specific questions (but, don’t be a slave to any specific operationalization)
      • what is the most important uncertainty I could be reducing, and how can I reduce it fastest?
      • what’s the most important resource bottleneck I can gain (or contribute to the ecosystem), and what would gain me that resource the fastest?
      • what’s the most important goal I’m backchaining from?
  • Have a mechanism to iterate on your habits that you use every day, and frequently update in response to new information
    • for me, this is daily prompts and weekly prompts, which are:
      • optimized for being the efficient metacognition I obviously want to do each day
      • include one skill that I want to level up in, that I can do in the morning as part of the meta-orienting (such as operationalizing predictions, or “think it faster”, or whatever specific thing I want to learn to attend to or execute better right now)
  • The five requirements each fortnight:
    • be backchaining 
      • from the most important goals
    • be forward chaining 
      • through tractable things that compound
    • ship something 
      • to users every fortnight
    • be wholesome 
      • (that is, do not minmax in a way that will predictably fail later)
    • spend 10% on meta (more if you’re Ray in particular but not during working hours. During working hours on workdays, meta should pay for itself within a week)
  • Correlates:
    • have a clear, written model of what you’re backchaining from
    • have a clear, written model of how you’re compounding
  • The general problem solving approach:
    • breadth first
    • identify cruxes
    • connect inner-sim to cruxes / predictions
    • follow your heart
    • see how your predictions went
  • Random ass skills
    • napping
    • managing working memory, innovating and applying on working memory tools
    • grieving
    • Generalizing


Skills I’m working on that haven’t paid off yet but I believe in:

  • At least once a day or so, when you notice a mistake or surprise, spend a couple minutes asking “how could I have thought that faster” (and periodically do deeper dives)
  • each day/week, figure out what you’re confused about or predictably going to tackle in a dumb way, and think in advance about how to be smart about it the first time

Comment by Raemon on The Field of AI Alignment: A Postmortem, and What To Do About It · 2024-12-30T06:33:55.901Z · LW · GW

This is the sort of thing I find appealing to believe, but I feel at least somewhat skeptical of. I notice a strong emotional pull to want this to be true (as well as an interesting counterbalancing emotional pull for it to not be true). 

I don't think I've seen output, from the people aspiring in this direction without being visibly quite smart, that makes me think "okay yeah it seems like it's on track in some sense."

I'd be interested in hearing more explicit cruxes from you about it.

I do think it's plausible that the "smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on" are sufficient (if you're at least moderately-but-not-exceptionally-smart). Those are rare enough qualities that it doesn't necessarily feel like I'm getting a free lunch, if they turn out to be sufficient for groundbreaking pre-paradigmatic research. I agree the x-risk pipeline hasn't tried very hard to filter for and/or generate people with these qualities.

(well, okay, "smart enough" is doing a lot of work there, I assume from context you mean "pretty smart but not like genius smart")

But, I've only really seen you note positive examples, and this seems like the sort of thing that'd have a lot of survivorship bias. There can be tons of people obsessed, but not necessarily on the right things, and if you're not naturally the right cluster of obsessed + smart-in-the-right-way, I don't know whether trying to cultivate the obsession on purpose will really work. 

I do nonetheless overall probably prefer people who have all your listed qualities, and who also either can:

a) self-fund to pursue the research without having to make it legible to others
b) somehow figure out a way to make it legible along the way

I probably prefer those people to tackle "the hard parts of alignment" over many other things they could be doing, but not overwhelmingly obviously (and I think it should come with a background awareness that they are making a gamble, and if they aren't the sort of person who must make that gamble due to their personality makeup, they should be prepared for the (mainline) outcome that it just doesn't work out)

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-30T02:18:02.040Z · LW · GW

I'd sort of naively guess doing it with a stranger (esp. one not even in your circles) would be easier on the "feeling private/anxious about your productivity" front – does that feel like it wouldn't work?

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-28T19:11:53.883Z · LW · GW

Okay a few people have DMd me, and I'm feeling some kind of vague friction that feels currently on track to be a dealbreaker so let's think that through here.

Problems:

  • I can't tell offhand who's good at this, and while I think this is something someone with little experience could turn out to be good at, they often won't be, and it's kind of costly to spend a slot on them, especially if I really need someone competent at it.
  • I often need someone "right now", and need a way to contact a bunch of people quickly, such that most of them will get the message and one of them will reply quickly, in a way that isn't too annoying for them but works.

I have a vision of a whole-ass website dedicated to facilitating this but right now want a quick hacky solution.

A group DM would work, but that feels like it'll produce weird competitive dynamics around who replies first, even though the first person to reply maybe isn't as good as the person who replies second.

DMing a bunch of people individually I guess is fine, but then I need to go find them.

A requirement for everyone participating as an assistant is that they have a way of being contacted that they'll respond to quickly.

Comment by Raemon on ReSolsticed vol I: "We're Not Going Quietly" · 2024-12-28T18:34:43.973Z · LW · GW

I've added lyrics to this post for now (if you expand each section)

Comment by Raemon on What are the most interesting / challenging evals (for humans) available? · 2024-12-27T17:48:18.498Z · LW · GW

Clarification (I'll add this to the OP): 

The ideal I'm looking for is things that will take a smart researcher (like a 95th percentile alignment researcher, i.e. there are somewhere between 10-30 people who might count) at least 30 minutes to solve, and that most alignment researchers would maybe have a 50% chance of figuring out in 1-3 hours.

The ideal is that people have to:

a) go through a period of planning, and replanning
b) spend at least some time feeling like the problem is totally opaque and they don't have traction.
c) have to reach for tools that they don't normally reach for.

It may be that we just don't have evals at this level yet, and I might take what I can get, but, it's what I'm aiming for. 

I'm not trying to make an IQ test – my sense from the literature is that you basically can't raise IQ through training. So many people have tried. This is very weird to me – subjectively it is just really obvious to me that I'm flexibly smarter in many ways than I was in 2011 when I started the rationality project, and this is due to me having a lot of habits I didn't used to have. The hypotheses I currently have are:

  • You just have to be really motivated to do transfer learning, and a genuinely inspiring / good teacher, and it's just really hard to replicate this sort of training scientifically
  • IQ is mostly measuring "fast intelligence", because that's what's cost-effective to measure in large enough quantities to get a robust sample. i.e. it measures whether you can solve questions in like a few minutes which mostly depends on you being able to intuitively get it. It doesn't measure your ability to figure out how to figure something out that requires longterm planning, which would allow a lot of planning skills to actually come into play.

Both seem probably at least somewhat true, but the latter one feels like a clearer story for why there would be potential (at least theoretically) in the space I'm exploring – IQ tests only take a few hours. It would be extremely expensive to do the theoretically statistically-valid version of the thing I'm aiming at.

My explicit goal here is to train researchers who are capable of doing the kind of work necessary in worlds where Yudkowsky is right about the depth/breadth of alignment difficulty.

Comment by Raemon on johnswentworth's Shortform · 2024-12-27T03:07:23.800Z · LW · GW

(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes, you may have finished early but I don't recall it being crazy early)

Comment by Raemon on johnswentworth's Shortform · 2024-12-27T02:57:34.945Z · LW · GW

(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn't like people were trying to crank through them in 7 minutes)

The problems I gave were (as listed in the csv for the diamond problems) 

  • #1 (Physics) (1 person got right, 3 got wrong, 1 didn't answer)
  • #2 (Organic Chemistry), (John got right, I think 3 people didn't finish)
  • #4 (Electromagnetism), (John and one other got right, 2 got wrong)
  • #8 (Genetics) (3 got right including John)
  • #10 (Astrophysics) (5 people got right)

Comment by Raemon on johnswentworth's Shortform · 2024-12-27T02:38:09.708Z · LW · GW

I at least attempted to be filtering the problems I gave you for GPQA diamond, although I am not very confident that I succeeded. 

(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong)

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-25T20:27:52.551Z · LW · GW

I like all these questions. "Maybe you should X" is least likely to be helpful but still fine so long as "nah" wraps up the thread quickly and we move on. The first three are usually helpful (at least filtered for assistants who are asking them fairly thoughtfully)

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-25T20:19:03.180Z · LW · GW

I imagined "FocusMate + TaskRabbit" specifically to address this issue.

Three types of workers I'm imagining here:

  • People who are reasonably skilled types, but who are youngish and haven't landed a job yet.
  • People who actively like doing this sort of work and are good at it
  • People who have trouble getting/keeping a fulltime job for various reasons (which would land them in the "unreliable" sector), but... it's FocusMate/TaskRabbit, they don't need to be reliable all the time, there just needs to be one of them online who responds to you within a few hours, who is at least reasonably competent when they're sitting down and paying attention. 

And then there are reviews (which I somehow design the UI of to elicit honest reactions, rather than just slapping on a 0-5 star rating which everyone feels obligated to rate "5" all the time unless something was actively wrong), and they have profiles about what they think they're good at and what others thought they were good at.

(where an expectation is, if you don't have active endorsements or haven't yet been rated, you will probably charge a low rate)

Meanwhile if you're actively good and actively reliable, people can "favorite" you and work out deals where you commit to some schedule.

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-24T22:49:47.034Z · LW · GW

(Quick note to people DMing me, I'm doing holidays right now and will followup in a week or so. I won't necessarily have slots/need for everyone expressing interest)

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-23T19:01:01.721Z · LW · GW

Can you say more details about how this works (in terms of practical steps) and how it went?

Comment by Raemon on Hire (or Become) a Thinking Assistant · 2024-12-23T12:19:00.900Z · LW · GW

I actually meant to say "x-risk focused individuals" there (not particularly researchers), and yes was coming from the impact side of things. (i.e. if you care about x-risk, one of the options available to you is to becoming a thinking assistant). 

Comment by Raemon on Raemon's Shortform · 2024-12-20T20:33:59.711Z · LW · GW

I’d like to hire cognitive assistants and tutors more often. This could (potentially) be you, or people you know. Please let me know if you’re interested or have recommendations.

By “cognitive assistant” I mean a range of things, but the core thing is “sit next to me, and notice when I seem like I’m not doing the optimal thing, and check in with me.” I’m interested in advanced versions who have particular skills (like coding, or Applied Quantitivity, or good writing, or research taste) who can also be tutoring me as we go.

I’d like a large rolodex of such people, both for me, and other people I know who could use help. Let me know if you’re interested.

I was originally thinking "people who live in Berkeley" but upon reflection this could maybe be a remote role.

Comment by Raemon on Dress Up For Secular Solstice · 2024-12-20T18:56:41.415Z · LW · GW

Yep, endorsed. One thing I would add: the "semi-official" dresscode I've been promoting explicitly includes black (for space/darkness), silver (for stars), gold (for the sun), and blue (for the earth). 

(Which is pretty much what you have here, I think the blue works best when it is sort of a minority-character distributed across people, such that it's a bit special when you notice it)

Comment by Raemon on Basics of Rationalist Discourse · 2024-12-19T22:02:44.271Z · LW · GW

The complaints I remember about this post seem mostly to be objecting to how some phrases were distilled into the opening short "guideline" section. When I go reread the details it mostly seems fine. I have suggestions on how to tweak it.

(I vaguely expect this post to get downvotes that are some kind of proxy for vague social conflict with Duncan, and I hope people will actually read what's written here and vote on the object level. I also encourage more people to write up versions of The Basics of Rationalist Discourse as they see them)

The things I'd want to change are:

1. Make some minor adjustments to the "Hold yourself to the absolute highest standard when directly modeling or assessing others' internal states, values, and thought processes." (Mostly, I think the word "absolute" is just overstating it. "Hold yourself to a higher standard" seems fine to me. How much higher-a-standard depends on context)

2. Somehow resolve an actual confusion I have with the "...and behave as if your interlocutors are also aiming for convergence on truth" clause. I think this is doing important, useful work, but a) it depends on the situation, b) it feels like it's not quite stating the right thing.

Digging into #2...

Okay, so when I reread the detailed section, I think I basically don't object to anything. I think the distillation sentence in the opening paragraphs conveys a thing that a) oversimplifies, and b) some people have a particularly triggered reaction to.

The good things this is aiming for that I'm tracking:

  • Conversations where everyone trusts that each other are converging on truth are way less frictiony than ones where everyone is mistrustful and on edge about it.
  • Often, even when the folk you're talking to aren't aiming for convergence on truth, proactively acting as if they are helps make it more true. Conversational vibes are contagious.
  • People are prone to see others' mistakes as more intense than their own mistakes, and if most humans aren't specifically trying to compensate for this bias, there's a tendency to spiral into a low-trust conversation unnecessarily (and then have the wasted motion/aggression of a low-trust conversation instead of a medium-or-high one). 

I think maybe the thing I want to replace this with is more like "aim for about 1-2 levels more trusting-that-everyone-is-aiming-for-truth than currently feel warranted, to account for your own biases, and to lead by example in having the conversation focus on truth." But I'm not sure if this is quite right either.

...

This post came a few months before we created our New User Reject Template system. It should have at least occurred to me to use some of the items here as some of the advice we have easily-on-hand to give to new users (either as part of a rejection notice, or just "hey, welcome to LW but it seems like you're missing some of the culture here.")

If this post was voted into the Top 50, and a couple points were resolved, I'd feel good making a fork with minor context-setting adjustments and then linking to it as a moderation resource, since I'd feel like The People had a chance to weigh in on it.

The context-setting I'm imagining is not "these are the official norms of LessWrong", but, if I think a user is making a conversation worse for reasons covered in this post, be more ready to link to this post. Since this post came out, we've developed better Moderator UI for sending users comments on their comments, and it hadn't occurred to me until now to use this post as reference for some of our Stock Replies.

(Note: I currently plan to make it so, during the Review, anyone can write Reviews on a post even if they're normally blocked from commenting. Ideally I'd make it so they can also comment on Review comments. I haven't shipped this feature yet but hopefully will soon)

Comment by Raemon on Dear Self; we need to talk about ambition · 2024-12-19T21:52:43.370Z · LW · GW

Previously, I think I had mostly read this through the lens of "what worked for Elizabeth?" rather than actually focusing on which of this might be useful to me. I think that's a tradeoff on the "write to your past self" vs "attempt to generalize" spectrum – generalizing in a useful way is more work.

When I reread it just now, I found the "Ways to Identify Fake Ambition" section the most useful (both for the specific advice of "these emotional reactions might correspond to those motivations", and the meta-level advice of "check for your emotional reactions and see what they seem to be telling you.")

I'd kinda like to see a post that is just that section, with a bit of fleshing out to help people figure out when/why they should check for fake ambition (and how to relate to it). I think literally a copy-paste version would be pretty good, and I think there's a more (well, um) ambitious version that does more interviewing with various people and seeing how the advice lands for them.

I might incorporate this section more directly into my metastrategy workshops.

Comment by Raemon on Subskills of "Listening to Wisdom" · 2024-12-18T18:38:26.801Z · LW · GW

Well to be honest in the future there is probably mostly an AI tool that just beams wisdom directly into your brain or something.

Comment by Raemon on Everything you care about is in the map · 2024-12-18T18:35:59.261Z · LW · GW

I wrote about 1/3 of this myself fyi. (It was important to me to get it to a point where it was not just a weaksauce version of itself but where I felt like I at least might basically endorse it and find it poignant as a way of looking at things)

Comment by Raemon on Being Present is Not a Skill · 2024-12-18T01:47:22.786Z · LW · GW

One way I parse this is "the skill of being present (may be) about untangling emotional blocks that prevent you from being present, more than some active action you take."

It's not like untangling emotional blocks isn't tricky!

Comment by Raemon on Being Present is Not a Skill · 2024-12-18T01:39:17.444Z · LW · GW

I don't have a strong belief that this experience won't generalize, but, I want to flag the jump between "this worked for me" and an implied "this'll work for everyone/most-people." (I expect most people would benefit from hearing this suggestion; I just generally have a yellow-flag about some of the phrasings you have here)

Comment by Raemon on Everything you care about is in the map · 2024-12-18T00:19:07.657Z · LW · GW

Nod. 

Fwiw I mostly just thought it was funny in a way that was sort of neutral on "is this a reasonable frame or not?". It was the first thing I thought of as soon as I read your post title.

(I think it's both true that in an important sense everything we care about is in the Map, and also true in an important sense that it's not, and in the ways it was true it felt like a kind of legitimately poignant rewrite that felt like it helped me appreciate your post, and insofar as it was false it seemed hilarious (non-meanspiritedly, just in a "it's funny that so many lines from the original remain reasonable sentences when you reframe it as about epistemology"))

Comment by Raemon on Everything you care about is in the map · 2024-12-17T21:24:00.250Z · LW · GW

lol at the strong downvote and wondering if it is more objecting to the idea itself or more because Claude co-wrote it?

Comment by Raemon on Everything you care about is in the map · 2024-12-17T20:15:25.533Z · LW · GW

Look again at that map. 

That's here. That's all we know. That's us. 

On that map lies everything you love, everyone you know, everything you've ever heard of. The aggregate of our joy and suffering, thousands of confident religions, ideologies, and economic doctrines, every thought and feeling, every hero and villain, every creator and destroyer of ideas, every paradigm and perspective, every romantic notion, every parent's love, every child's wonder, every flash of insight and exploration, every moral framework, every friendship, every "universal truth", every "fundamental principle" - all of these lived there, in a mere approximation suspended in consciousness.

Our mind is a very small theater in the vast unknown of reality. Think of the endless conflicts between holders of one corner of this mental map and the barely distinguishable beliefs of another corner, how frequent their misunderstandings, how eager they are to impose their models on one another, how fervent their certainties. Think of the rivers of ink spilled by all those philosophers and ideologues so that, in glory and triumph, they could become the momentary arbiters of a fraction of a map.

It has been said that epistemology is a humbling and character-building pursuit. There is perhaps no better demonstration of the folly of human certainty than this recognition of our lenses' limits. To me, it underscores our responsibility to hold our maps more lightly, to deal more kindly with those whose maps differ, and to preserve and cherish this precious capacity for understanding, the only world we've ever known.

(partially written by Claude because I was too lazy busy to write the whole thing by hand)

Comment by Raemon on The 2023 LessWrong Review: The Basic Ask · 2024-12-17T17:18:43.323Z · LW · GW

Can you post a screenshot?

One confounder: by default it’s filtering to posts you’ve read. Toggle off the read filter to see the entire amount.

Comment by Raemon on The 2023 LessWrong Review: The Basic Ask · 2024-12-16T18:43:18.458Z · LW · GW

(I've appreciated your reviews that went and took this to heart, thanks!)

Comment by Raemon on A Way To Be Okay · 2024-12-16T07:23:09.675Z · LW · GW

Another piece of the "how to be okay in the face of possible existential loss" puzzle. I particularly liked the "don't locate your victory conditions inside people/things you can't control" frame. (I'd heard that elsewhere I think but it felt well articulated here)

Comment by Raemon on Competitive, Cooperative, and Cohabitive · 2024-12-16T07:18:58.622Z · LW · GW

I appreciated both this and Mako Yass' Cohabitive Games so Far (I believe Screwtape's post actually introduced the term "cohabitive", which Mako adopted). I think both posts 

I have an inkling that cohabitive games may turn out to be important for certain kinds of AI testing and evaluation – can an AI not only win games with ruthless optimization, but also be a semi-collaborative player in an open-ended context? (This idea is shaped in part by some ideas I got reading about Encultured)

Comment by Raemon on Fighting without hope · 2024-12-16T07:10:39.173Z · LW · GW

A simple but important point, that has shaped my frame for how to be an emotionally healthy and productive person, even if the odds seem long.

Comment by Raemon on Biological risk from the mirror world · 2024-12-14T19:57:08.166Z · LW · GW

Curated. I'd previously heard vague things about Mirror Life but didn't understand why it would be threatening. This post laid out the case much more clearly than I'd previously heard.

Comment by Raemon on Cohabitive Games so Far · 2024-12-14T19:30:50.118Z · LW · GW

I think it's fine to edit in "here's a link to the thing I shipped later" at the top and/or bottom and/or middle of the post.

Comment by Raemon on Communications in Hard Mode (My new job at MIRI) · 2024-12-14T03:15:54.630Z · LW · GW

Mod note: I frontpaged this. It was a bit of an edge case because we normally don't frontpage "organizational announcements", but, I felt like this one had enough implicit models that I'd feel good reading it in a couple years, even if MIRI is no longer doing this particular strategy.

Comment by Raemon on Cohabitive Games so Far · 2024-12-14T00:16:42.755Z · LW · GW

Now they aren't :) This is a case where I think the review's sort of caught the development process in amber.

I'm not sure I understand what the topic is, but, flagging that you are encouraged to edit posts during the Review to make them into better, more timeless versions of themselves.

Comment by Raemon on The "Think It Faster" Exercise · 2024-12-13T20:28:39.484Z · LW · GW

I dunno, @sarahconstantin do you remember?

(I'm also curious what @Eliezer Yudkowsky thinks of this post, for that matter, if he's up for it)