How feasible is long-range forecasting? 2019-10-10T22:11:58.309Z · score: 40 (10 votes)
AI Alignment Writing Day Roundup #2 2019-10-07T23:36:36.307Z · score: 35 (9 votes)
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More 2019-10-04T04:08:49.942Z · score: 138 (47 votes)
Follow-Up to Petrov Day, 2019 2019-09-27T23:47:15.738Z · score: 82 (26 votes)
Honoring Petrov Day on LessWrong, in 2019 2019-09-26T09:10:27.783Z · score: 137 (51 votes)
SSC Meetups Everywhere: Salt Lake City, UT 2019-09-14T06:37:12.296Z · score: 0 (0 votes)
SSC Meetups Everywhere: San Diego, CA 2019-09-14T06:34:33.492Z · score: 0 (0 votes)
SSC Meetups Everywhere: San Jose, CA 2019-09-14T06:31:06.068Z · score: 0 (0 votes)
SSC Meetups Everywhere: San José, Costa Rica 2019-09-14T06:25:45.112Z · score: 0 (0 votes)
SSC Meetups Everywhere: São José dos Campos, Brazil 2019-09-14T06:18:23.523Z · score: 0 (0 votes)
SSC Meetups Everywhere: Seattle, WA 2019-09-14T06:13:06.891Z · score: 0 (-1 votes)
SSC Meetups Everywhere: Seoul, South Korea 2019-09-14T06:08:26.697Z · score: 0 (0 votes)
SSC Meetups Everywhere: Sydney, Australia 2019-09-14T05:53:45.606Z · score: 0 (0 votes)
SSC Meetups Everywhere: Tampa, FL 2019-09-14T05:49:31.139Z · score: 0 (0 votes)
SSC Meetups Everywhere: Toronto, Canada 2019-09-14T05:45:15.696Z · score: 0 (-1 votes)
SSC Meetups Everywhere: Vancouver, Canada 2019-09-14T05:39:25.503Z · score: 0 (0 votes)
SSC Meetups Everywhere: Victoria, BC, Canada 2019-09-14T05:34:40.937Z · score: 0 (-1 votes)
SSC Meetups Everywhere: Vienna, Austria 2019-09-14T05:27:31.640Z · score: 2 (2 votes)
SSC Meetups Everywhere: Warsaw, Poland 2019-09-14T05:24:16.061Z · score: 0 (0 votes)
SSC Meetups Everywhere: Wellington, New Zealand 2019-09-14T05:17:28.055Z · score: 0 (0 votes)
SSC Meetups Everywhere: West Lafayette, IN 2019-09-14T05:11:28.211Z · score: 0 (0 votes)
SSC Meetups Everywhere: Zurich, Switzerland 2019-09-14T05:03:43.295Z · score: 0 (0 votes)
Rationality Exercises Prize of September 2019 ($1,000) 2019-09-11T00:19:51.488Z · score: 85 (25 votes)
Stories About Progress 2019-09-08T23:07:10.443Z · score: 31 (9 votes)
Political Violence and Distraction Theories 2019-09-06T20:21:23.801Z · score: 18 (7 votes)
Stories About Education 2019-09-04T19:53:47.637Z · score: 41 (16 votes)
Stories About Academia 2019-09-02T18:40:00.106Z · score: 33 (21 votes)
Peter Thiel/Eric Weinstein Transcript on Growth, Violence, and Stories 2019-08-31T02:44:16.833Z · score: 71 (29 votes)
AI Alignment Writing Day Roundup #1 2019-08-30T01:26:05.485Z · score: 34 (14 votes)
Why so much variance in human intelligence? 2019-08-22T22:36:55.499Z · score: 55 (20 votes)
Announcement: Writing Day Today (Thursday) 2019-08-22T04:48:38.086Z · score: 31 (11 votes)
"Can We Survive Technology" by von Neumann 2019-08-18T18:58:54.929Z · score: 35 (11 votes)
A Key Power of the President is to Coordinate the Execution of Existing Concrete Plans 2019-07-16T05:06:50.397Z · score: 115 (34 votes)
Bystander effect false? 2019-07-12T06:30:02.277Z · score: 19 (10 votes)
The Hacker Learns to Trust 2019-06-22T00:27:55.298Z · score: 77 (22 votes)
Welcome to LessWrong! 2019-06-14T19:42:26.128Z · score: 90 (42 votes)
Von Neumann’s critique of automata theory and logic in computer science 2019-05-26T04:14:24.509Z · score: 30 (11 votes)
Ed Boyden on the State of Science 2019-05-13T01:54:37.835Z · score: 64 (16 votes)
Why does category theory exist? 2019-04-25T04:54:46.475Z · score: 35 (7 votes)
Formalising continuous info cascades? [Info-cascade series] 2019-03-13T10:55:46.133Z · score: 17 (4 votes)
How large is the harm from info-cascades? [Info-cascade series] 2019-03-13T10:55:38.872Z · score: 23 (4 votes)
How can we respond to info-cascades? [Info-cascade series] 2019-03-13T10:55:25.685Z · score: 15 (3 votes)
Distribution of info-cascades across fields? [Info-cascade series] 2019-03-13T10:55:17.194Z · score: 15 (3 votes)
Understanding information cascades 2019-03-13T10:55:05.932Z · score: 55 (19 votes)
(notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach 2019-02-04T22:08:34.337Z · score: 46 (16 votes)
How did academia ensure papers were correct in the early 20th Century? 2018-12-29T23:37:35.789Z · score: 79 (20 votes)
Open and Welcome Thread December 2018 2018-12-04T22:20:53.076Z · score: 28 (10 votes)
The Vulnerable World Hypothesis (by Bostrom) 2018-11-06T20:05:27.496Z · score: 47 (17 votes)
Open Thread November 2018 2018-10-31T03:39:41.480Z · score: 17 (6 votes)
Introducing the AI Alignment Forum (FAQ) 2018-10-29T21:07:54.494Z · score: 90 (33 votes)


Comment by benito on Rationality and Levels of Intervention · 2019-10-14T07:00:39.211Z · score: 2 (1 votes) · LW · GW


(I framed it in the style of "am I allowed this thought" / "will my teacher accept it if I make this inference?" because that's literally the frame used in the post ;P)

Comment by benito on Benito's Shortform Feed · 2019-10-11T07:46:30.741Z · score: 6 (6 votes) · LW · GW

At the SSC meetup at my house tonight, I was in a group conversation and asked a stranger if they'd read anything interesting on the new LessWrong in the last six months or so (I had not yet mentioned my involvement in the project). He told me about an interesting post comparing the variance in human intelligence to the variance in mouse intelligence. I said it was nice to know people read the posts I write, and the group then had a longer conversation about the question. It was enjoyable to hear strangers tell me about reading my posts.

Comment by benito on List of resolved confusions about IDA · 2019-10-09T20:27:01.681Z · score: 4 (2 votes) · LW · GW
As far as I know, Paul hasn't explained his choice in detail. One reason he does mention, in this comment, is that in the context of strategy-stealing, preferences like "help me stay in control and be well-informed" do not make sense when interpreted as preferences-as-elicited, since the current user has no way to know if they are in control or well-informed.

I agree this example adds nuance, and I'm unsure how to correctly categorise it.

Comment by benito on List of resolved confusions about IDA · 2019-10-09T07:21:24.632Z · score: 4 (2 votes) · LW · GW

Seems odd to have the idealistic goal get the standard name, while the dime-a-dozen failure mode gets a longer, more confusing one.

I note that Wei says a similar thing happened to 'act-based':

My understanding is that "act-based agent" used to mean something different (i.e., a simpler kind of AI that tries to do the same kind of action that a human would), but most people nowadays use it to mean an AI that is designed to satisfy someone's short-term preferences-on-reflection, even though that no longer seems particularly "act-based".

Is there a reason why the standard terms are not being used to refer to the standard, short-term results?

(I suppose that economics assumes rational agents who know their preferences, so taking language from economics might lead to this situation with the 'short-term preferences' decision.)

In the post Wei contrasts "current" and "actual" preferences; "stated" vs "reflective" preferences also seem like nice alternatives.

Comment by benito on Rationality and Levels of Intervention · 2019-10-09T06:35:54.071Z · score: 2 (1 votes) · LW · GW

(I want to note that I'm quite interested in having a conversation about the above, both with Geoff but also with others who have thought a lot about rationality.)

Comment by benito on List of resolved confusions about IDA · 2019-10-09T06:34:07.146Z · score: 2 (1 votes) · LW · GW

Oh, okay. Is it not important to have a name for the class of thing we could accidentally train an ML system to optimise for that isn't our ultimate preferences? Is there a term for that?

Comment by benito on List of resolved confusions about IDA · 2019-10-09T00:17:49.091Z · score: 4 (2 votes) · LW · GW

You have a section titled

learning user preferences for corrigibility isn't enough for corrigible behavior

Would this be more consistently titled "Learning narrow preferences for corrigibility isn't enough for corrigible behavior"?

Comment by benito on List of resolved confusions about IDA · 2019-10-09T00:14:25.072Z · score: 2 (1 votes) · LW · GW

I understand Paul to be saying that he hopes that corrigibility will fall out if we train an AI to score well on your short-term preferences, not just your narrow-preferences.

Comment by benito on List of resolved confusions about IDA · 2019-10-09T00:13:51.857Z · score: 2 (1 votes) · LW · GW
At some point Paul used "short-term preferences" and "narrow preferences" interchangeably, but no longer does (or at least no longer endorses doing so).

I would like to have these two terms defined. Let me offer my understanding from reading the relevant thread.

short-term preferences = short-term preferences-on-reflection ≠ narrow preferences

Short-term preferences refer to the most useful action I can take next, given my ultimate goals. This is to be contrasted with my current best guess about the outcome of that process. It's what I would want, not what I do want.

An AI optimising for my short-term preferences may reasonably say "No, don't take this action, because you'd actually prefer the alternative if you only thought longer. It fits your true short-term preferences; you're just mistaken about them." This is in contrast with something you might call narrow preferences, where you tell the AI to do what you said anyway.

Comment by benito on List of resolved confusions about IDA · 2019-10-08T23:55:24.391Z · score: 11 (3 votes) · LW · GW


Comment by benito on Rationality Exercises Prize of September 2019 ($1,000) · 2019-10-08T23:22:21.951Z · score: 8 (2 votes) · LW · GW

I did #4 and #1. Here is what I wrote for each section of #4 (note: this will spoil your ability to do the exercise if you read it).

1. How do you explain these effects?

Seems like a trick question. Like, I have models of the world that feel like they might predict effects 2 and 3, and I can sort of wrangle explanations for 1 and 4, but my split-second reaction is “I’m not sure these are real effects, probably none replicate (though number 2 sounds like it might just be a restatement of a claim I already believe)”.

2. How would you have gone about uncovering them?

As I think about trying to determine whether someone adopted their diet for ethical reasons, I immediately feel highly skeptical of the result. The things people will tick-box as 'because I care about animals' do not necessarily refer to a deep underlying structure of the world that is 'ethics', and can refer to one of many things (e.g. exposure to effective guilt-based marketing, reflections on ethical philosophy, the ownership of a dog/cat/pet from an early age, etc). But I guess that just doing a simple questionnaire isn't of literally zero value.

Number 2 (loyalty) feels like a thing I could design a better measure for, but I worry this is tangled up with me believing it's true, and thus illusion-of-transparency assuming people mean the same thing as me when they check-box 'loyalty'.

Number 3 seems totally testable and straightforward.

Number 4 seems broadly testable. Creativity could be done with that “list the uses of a brick” test, or some other fun ones.

I notice this makes me more skeptical about the first two ‘results’ and more trusting of the last two ‘results’.

3. These are all reversed, and the actual findings were the opposite of what I said. How do you explain the opposite, correct effects?

Ah, the classic 'I reversed the experimental findings' trick. Well, I guess I did fine on it this time. Oh look, I just managed to think of an explanation for number 2: a more discerning audience of less loyal customers increases adversarial pressure among service providers, raising prices. Interesting. I think I'm mostly noticing how modern psychological research methodology can be quite terrible, and that such a questionnaire, without incorporating a thoughtful model of the environment, will often be useless. Answers to model-free empirical questions can be overdetermined by the implicit model.

4. Actually, none of these results could be replicated. Why and how were non-null effects detected in the first place? Answers using your designs from (2) are preferable.

Okay. Science is awful.


More general thoughts: This helped me notice that a simple empirical psychological claim like this shouldn't be relied on as evidence about anything. That pattern-matches to radical skepticism, but that's not what I mean. I think I'm mostly saying that context-free/theory-free claims are meaningless in psychology/sociology, or something like that.

And #1.

The only thing I can come up with is that the graph doesn't prove causality in any particular way (it took me like three whole minutes to notice that correlation isn't causation; I was primarily looking for things like axes labelled in unhelpful ways). I can tell a story where these are uncorrelated and everyone is dumb. I can tell a story where decreasing wages are the *explanation* for why debt is growing: it was previously in equilibrium, but now is getting paid off much more slowly. I can tell a story of active prevention, whereby because wages are going down, the government is making students pay less up front and store more of it as debt so they still have a good quality of life immediately after college.

Again, I’m noticing how simple context-free/theory-free claims do not determine an interpretation.

While the post promised answers in the comments, there were no comments, either on the post or on the linked Washington Post article, so I'm not sure what the expected take-away was.

Comment by benito on Proving Too Much (w/ exercises) · 2019-10-08T23:14:16.579Z · score: 7 (2 votes) · LW · GW

I did all the exercises above. Here's what I wrote down during the timed sections. (It's a stream-of-consciousness account, so it may not be very clear.)

How would you generalize the common problem in the above arguments? You have 2 minutes

The structure of the reasoning does not necessarily correlate with one outcome more than others. You say A because X, but I can argue that B because X.

But I'm confused, because I can do this for any argument that's not maximally well-specified. Like, there's always a gotcha. If I argue for the structure of genetics from the pattern of children born with certain features, I could also use that evidence combined with an anti-inductive prior to argue the opposite. I'm not quite sure why some things feel like they prove too much and some don't. I suppose it's just "in the context of my actual understanding of the situation, do I feel like this argument pins down a world-state positively correlated with the belief or not?", and if it doesn't, then I can neatly express this by showing it can prove anything, because it's not actually real evidence.

Oh huh, maybe that's wrong. It's not that it isn't evidence for anything; it's that if it were evidence for this, it would be evidence for many inconsistent things. (Though I think those two are the same.)
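(One way to make this precise, in Bayesian terms: an argument form that would be equally available whether or not the claim is true has a likelihood ratio of 1, so observing it can't move your credence. A minimal sketch in Python; the function and the numbers are mine, purely illustrative.)

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule: P(H|E) from a prior on H and the two likelihoods of E."""
    joint_h = p_e_given_h * prior
    joint_not_h = p_e_given_not_h * (1 - prior)
    return joint_h / (joint_h + joint_not_h)

prior = 0.3

# Genuine evidence: twice as likely to be observed if H is true, so it
# raises the posterior above the prior.
assert posterior(prior, p_e_given_h=0.8, p_e_given_not_h=0.4) > prior

# An argument that "proves too much" is one you could make whether or not H
# holds, so its likelihoods are equal and the posterior equals the prior.
assert abs(posterior(prior, p_e_given_h=0.8, p_e_given_not_h=0.8) - prior) < 1e-9
```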

What algorithm were you running when you solved the above problems? Is there a more ideal/general algorithm? You have 3 minutes.

Hmm, I actually did like the thing that happened. Normally in such a disagreement with a person, I would explain the structure of my beliefs around the thing they called a 'reason'. I'd do lots of interpretive work like that: "Let me explain the process by which smart people get their beliefs and when those processes are/aren't truth-tracking", or "Let me explain what heuristics help predict whether a startup is successful", or "Let me explain what p-hacking is". But in all of them the new mental motion was much cleaner/cheaper: producing a small impossibility proof.

I think I normally avoid such proofs because they’re non-constructive - they don’t tell you where the mistake was or how that part of the world works, and I’m often worried this will feel like a demotivating thing or conversation killer for the other person I’m talking with. But I think it’s worth thinking this way for myself more. I do want to practice it, certainly. I should be able to use all tools of proof and disproof, not just those that make conversations go smoothly.

Some general thoughts

  • I found doing the exercises very enjoyable.
  • I think that the answers here could've been more to a set format. These aren't very open-ended questions, and I think that if I'd practiced matching a format, that would've drilled a more specific tool better. But it's not clear that would be appropriate.
  • I didn’t like how all the examples were of the “don’t believe a dumb low-status thing” variety. People often build epistemologies around making sure never to be religious, endorse a failed startup idea, or believe homeopathy, but I think you should mostly build yours around making sure you will make successful insights in physics, or build a successful startup, which is a different frame. I would’ve liked much more difficult examples in areas where the right choice isn’t clear purely from pattern-matching to low-status beliefs.
  • The post tells people to sit by a clock. I would've instead told people to find a timer by googling ‘timer’ (when you do that, one just appears on Google); otherwise I expect most folks to have bounced off and not done the exercises.
  • I really liked the ‘reflect on the general technique’ sections, they were excellent and well-placed.
Comment by benito on AI Alignment Writing Day Roundup #2 · 2019-10-08T22:34:24.886Z · score: 4 (2 votes) · LW · GW

Oops. Edited.

Comment by benito on Intentionally Raising the Sanity Waterline · 2019-10-05T00:51:40.826Z · score: 7 (4 votes) · LW · GW

In hindsight, I should've listened to dude562.

Comment by benito on Concrete experiments in inner alignment · 2019-10-04T06:33:06.980Z · score: 2 (1 votes) · LW · GW

It would be good to note that this is for the Alignment Newsletter; for a few seconds I didn't realise that's what this was.

Comment by benito on FB/Discord Style Reacts · 2019-10-04T04:23:40.351Z · score: 4 (2 votes) · LW · GW

My understanding is that Ray wants them to not be anonymous; the idea being that voting, and anything else that determines the order in which your comment gets seen, is always anonymous, and everything else is public.

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-10-03T18:53:41.730Z · score: 2 (1 votes) · LW · GW

Just FYI, I am planning to make another post in maybe two weeks to open further discussion and nail down the specific details of what we want to celebrate and what a fitting way to do that is, because that seems like the correct way to build traditions.

Comment by benito on Open & Welcome Thread - October 2019 · 2019-10-02T02:25:32.940Z · score: 5 (3 votes) · LW · GW

That sounds quite interesting to me.

Comment by benito on Why so much variance in human intelligence? · 2019-10-01T23:49:45.203Z · score: 4 (2 votes) · LW · GW

For me, I’m pretty sure it was Yudkowsky (but maybe Bostrom) who put it pithily enough that I remembered. Would have to look for a cite.

Comment by benito on List of resolved confusions about IDA · 2019-10-01T01:45:35.824Z · score: 28 (11 votes) · LW · GW

This is a great post! I know there's been lots of conversations here and elsewhere about this topic, often going for dozens of comments, and I felt like a lot of them needed summarising else they'd be lost to history. Thanks for summarising them briefly and linking back to them.

Comment by benito on Please Take the 2019 EA Survey! · 2019-09-30T21:52:47.563Z · score: 6 (3 votes) · LW · GW

Done, thanks!

Added: I’ve made an exception and put this post on the frontpage; I think it's worthwhile for a few big data-gathering efforts like the SSC and EA surveys to reach all those who would want to take them.

Comment by benito on Rationality and Levels of Intervention · 2019-09-30T04:40:44.920Z · score: 5 (3 votes) · LW · GW

I also want to mention, as Geoff indicates in the OP, that once you start looking on the time scale of months and years, motivation becomes an obvious factor. One way to think of it: you have to ask not merely whether an epistemic heuristic is a good fit for a person's environment, but also how likely the person is to consistently use the heuristic when it's appropriate. Heuristics with a high effort-to-information ratio often wear a person out, and they use them less and less.

Comment by benito on Rationality and Levels of Intervention · 2019-09-30T04:39:24.335Z · score: 11 (5 votes) · LW · GW

I like the overall framing, which goes from intervening on the minutiae to long-term, big-picture interventions, and correctly noting that optimising for truth at each level does not look the same and that such strategies can even be in conflict.

I want to think more concretely about what short-term and long-term interventions look like, so I'll try to categorise a bunch of recent ideas on LessWrong, by looking back at all the curated posts and picking ones I think I can fit into this system. I want to do this to see if I'm getting the right overall picture from Geoff's post, so I'm gonna do this in a pretty fast and loose way, and I assign about a 35% probability that a lot of these posts are severely misplaced.

I think there are two main axes here: one is the period of time over which you observe and then make the intervention, and the other is whether you're looking at an individual or a group. I'll start just with individuals.

I think that thought regulation evaluates whether particular thoughts are acceptable. This feels to me like the most rigorous type of analysis. Eliezer's Local Validity as Key to Sanity and Civilization is about making sure each step of reasoning follows from the previous, so that you don't wander into false conclusions from true premises. Abram's post Mistakes with conservation of expected evidence is an example of taking the basic rules of reasoning and showing when particular thoughts are improper. This isn't a broad heuristic, it's a law, and comes with a lot of rigour. These are posts about moving from thought A to thought B, and whether thought B is allowed given thought A.

If I frame train of thought regulation as being about taking short walks that aren't all definitely locally valid steps, while making sure that you end in a place that is true, I think this is often like 'wearing hats' or 'red teaming' or 'doing perspective taking': you try out a frame of thinking that isn't your best guess at the truth but captures something you've not been thinking about, and you end up with a concrete hypothesis to test or a piece of evidence you've missed that you still find valuable after you take the frame off.

Some examples of this include alkjash's Babbling then Pruning, which is about generating many thoughts that don't meet your high standards and then reducing them to only the good ones, and my recommendation to Hold On To The Curiosity, which can involve saying statements that are not accurate according to your all-things-considered view while you search for the thing you've noticed. Habryka's post Models of Moderation tries on a lot of different perspectives in short succession, none of which seem straightforwardly true to him but all of which capture some important aspect of the problem; the next step is finding solutions that score highly on lots of different perspectives at once. Scott's If It's Worth Doing, It's Worth Doing With Made-Up Statistics, which involves building a false-ish model that makes a true point, has some similarity, and maybe so does Jessicata's Writing Children's Picture Books, which offers a frame for thinking about a subject for a while.

A different post that naturally fits in here is Abram's Track-Back Meditation, where you just practice for noticing your trains-of-thought. Eliezer's writing on humility also covers making sure you check that your train of thought was actually accurate.

OP says the next level is about rules. If I think of it as basically being about trains of thought, the plural rather than the singular, I'll say the next level is about multi-train-of-thought regulation. I think a central example here would be Anna's "Flinching away from truth" is often about *protecting* the epistemology. This post says that you will often have mildly broken trains of thought, and that trying to fix them at the level of never letting yourself believe a single false thought, or never letting a train of thought conclude in a false place, will be bad, because sometimes the reason you're doing that is to make sure the most important, big-picture thoughts are true. As long as you notice when you seem to be avoiding true thoughts, and look into what implicit buckets you're making, you'll be able to think the important true thoughts without breaking things in the meantime by trying to fix everything locally in a way that messes up the bigger picture.

I think Paul's post Argument, Intuition and Recursion also fits into this category. I'd need to read it again carefully to be sure, but I recall it primarily being about how to ensure you're moving in the true direction in the long-run if you often can't get the ground truth in reasonable amounts of time - if you cannot check whether each of your train of thoughts terminated in being actually true - and how to learn to trust alternative sources of information and ideas.

Plausibly much of Brienne's writing about noticing (at her blog Agenty Duck) fits in here as well; it's about increasing your long-term ability to bring important parts of your experience into your trains of thought. It's not about any one train of thought ending right or wrong, but about improving them more generally.

That said, this section was the hardest for me to find posts on (I feel like there's loads for the others), which is interesting, and perhaps suggests we're neglecting this facet of rationality on LessWrong.

Then we move onto individual holistic regulation, which feels to me like it is about stepping into a very complex system, trying to understand it, and recommending a high-level change to its trajectory. This isn't about getting particular thoughts or trains of thought right; it's about asking where the system is and how all the parts work. Kaj's post Building up to an Internal Family Systems model feels like it believes you'll never get perfect thoughts all of the time, but that you can build a self-model that will help you notice the main culprits of bad outcomes and address those head-on from time to time. Ray's Strategies of Personal Growth works on this level too. Zvi's post Slack is about noticing whether you have the sort of environment that allows you the space to complete the important trains of thought, and if not, that you should do something. There isn't currently a notion of perfect slack and there's no formula for it (yet), but it's a really useful high-level heuristic.


Looking at it this way, I notice the posts I listed started at the more rigorous end and became less rigorous as I went along. I wonder if this suggests that when you understand something very deeply, you can simply label individual thoughts as good or bad, but when you have a much weaker grasp you can only notice the pattern with massive amounts of data, and even then only vaguely. I've often said that I'd like to see the notion of Slack formalised, and that I bet it would be really valuable, but for now we'll have to stick to Zvi's excellent poetry.


Anyhow, Geoff: even though I'd guess you haven't read most of the linked posts, I'm curious whether you feel the above does a good job of capturing what you think of as the main axis of levels-of-intervention for individuals. I'm also interested to hear from others if they would've put posts in very different categories, or if they want to offer more examples I didn't include (of which there are many).

Comment by benito on Rationality and Levels of Intervention · 2019-09-30T04:37:48.300Z · score: 3 (2 votes) · LW · GW
Imagine that you have a classroom of children that you want to behave well... You could intervene at the level of the child’s individual thoughts. Police each thought, make sure it is a well-behaved-child thought.

I want to briefly point to a different relevant axis. Your framing is primarily about policing bad thoughts and generally making the group stable and well-behaved. If I had a group of 30 people to command, while there are some ways I'd try to satisfice for every person (e.g. make sure they all learn a certain level of math, all have a certain ceiling on the trauma they experience in the year), I would actually put a lot of effort (perhaps >50% of my focus) into children achieving the biggest wins possible (e.g. getting one child to a state where they are deeply curious about some aspect of the world and spending a lot of self-directed effort to better understand that phenomenon, or two children getting very excited about building something and spending most of their time doing that well). The motivation here is that a single child growing up and making breakthrough discoveries in fundamental physics is something I will trade off against a lot of days of many children being 'well-behaved'.

But this is an abstract point, and it's easy for people to talk past one another or create a double illusion of transparency when talking in ungrounded abstractions, so I'll write another, much more concrete, comment.

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T19:43:50.885Z · score: 25 (6 votes) · LW · GW

Currently writing that post :)

Added: Will post it sometime today, but probably later on.

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T07:01:30.961Z · score: 13 (4 votes) · LW · GW

"And on that day, the curse was lifted."

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T05:46:04.373Z · score: 13 (4 votes) · LW · GW

To quote Stanislav himself:

I imagined if I'd assume the responsibility for unleashing the third World War...
...and I said, no, I wouldn't. ... I always thought of it. Whenever I came on duty, I always refreshed it in my memory.

I don't think it's obvious that Petrov's choice was correct in foresight; I think he didn't know whether it was a false alarm. My current understanding is that he just didn't want to destroy the world, and that's why he disobeyed his orders. It's a fascinating historical case where someone actually got to make the choice, and made the right one. Real-world situations are messy, and it's hard to say exactly what his reasoning process was and how justifiable it was. It's really bad that decisions like these have to be made, and it doesn't seem likely to me that there's some simple decision rule that gets the right answer in all situations (or even most). I didn't make any explicit claim about his reasoning in the post. I simply celebrate that he managed to make the correct choice.

The rationality community takes as a given Petrov's assertion that it was obviously silly for the United States to attack the Soviet Union with a single ICBM.

I don't take it as a given. It seems like I should push back on claims about 'the rationality community' believing something before you first point to a single person who does, and when the person who wrote the post you're commenting on explicitly doesn't.

I agree with you that while LW's red-button has some similarities with Petrov's situation it doesn't reflect many parts of it. As I say in the exchange with Zvi, I think it is instead representative of the broader situation with nukes and other destructive technologies, where we're building them for little good reason and putting ourselves in increasingly precarious positions - which Petrov's 1983 incident illustrates. We honour Petrov Day by not destroying the world, and I think it's good to practice that in this way.

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T05:09:33.567Z · score: 18 (4 votes) · LW · GW


Perhaps the key problem with attempts to lift the unilateralist's curse is that it's very easy to enforce dangerous conformity - 'conformity' being a term I made sure not to use in the OP. It's crucial to be able to not do the thing that you're being told to do under threat of immediate and strong social punishment, especially when there's a long time before you find out whether your action was actually the right one. Consistently going against the grain because it's better in the long run, not because it brings immediate reward, is very difficult.

Being able to think and act for yourself, while also not disregarding others so much that you break things, is a delicate balance, and many people end up too far on one end or the other. They find themselves punished for unilateralist action, and never speak up again; or they find that others are stopping them from being themselves, and then ignore all the costs they're imposing on their community. My current sense is that most people lean towards conformity, but also that the small number of unilateralists have caused outsized harm.

(Then again, failures from conformity are often more silent, so I have wide error bars around the magnitude of their cost.)

Comment by benito on Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists · 2019-09-27T00:11:04.186Z · score: 22 (8 votes) · LW · GW

Curated. I've tried to think about similar topics - silencing of different kinds of information can also lead to information cascades, for example. This was a simple toy model that had properties I'd never put into an explicit model before - if signalling always looks like at least a 3:1 ratio of args in your side's favour, then random chance is gonna mean some people (even if 3:1 is the ground truth) will have lopsided info and have to lie, and that's a massive corruption of those people's epistemologies.

So far, so standard. (You did read the Sequences, right??)

Yes, indeed I did. A bunch of the beginning was nice to see again - it's good for people to reread that stuff, and for any newer users who haven't, to read it for the first time.

I wasn't especially enjoying one political footnote, which seemed mostly off-topic, until the line at the end saying

If I'm doing my job right, then my analogue in a "nearby" Everett branch whose local subculture was as "right-polarized" as my Berkeley environment is "left-polarized", would have written a post making the same arguments.

which I really like as a way of visualising needling the truth between political biases in any environment.

The post is very readable and clearly explained, plus lots of links for context, which is always great.

I mostly feel confused about how to quantify how biased the decisions are. If you have 9 honest rolls, then that's log_2 of 9 ≈ 3.2 bits. But if you roll it 9 times and hide the 3 rolls in a certain direction, then you don't have log_2 of 6 ≈ 2.6 bits. That would be true if you had 6 honest rolls (looking like 2:2:2), but 3:3:0 surely is not the same amount of evidence. I'm generally not sure how best to understand the effects of biases of this sort, and want to think about that more.
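To make my confusion a bit more concrete, here's a quick toy sketch. The setup is my own, not from the post: I pick two hypotheses where the coin favours one side 3:1 (the 0.75/0.25 numbers and the `bits` helper are my assumptions), and measure evidence as a log-likelihood ratio in bits.

```python
import math

def bits(p_a, p_b, heads, tails):
    """Log-likelihood ratio, in bits, favouring hypothesis A over B
    for an i.i.d. sample of coin flips."""
    return (heads * math.log2(p_a / p_b)
            + tails * math.log2((1 - p_a) / (1 - p_b)))

# Two hypotheses: the coin favours heads 3:1 (A) or tails 3:1 (B).
A, B = 0.75, 0.25

# 9 honest flips that came up 6 heads, 3 tails:
honest = bits(A, B, heads=6, tails=3)   # ≈ 4.75 bits for A

# Same 9 flips, but the 3 tails are hidden. A naive listener who
# treats the 6 reported heads as the whole sample overcounts:
naive = bits(A, B, heads=6, tails=0)    # ≈ 9.51 bits for A

# A listener who knows the filtering policy ("tails get hidden")
# can reconstruct the full sample and recovers the honest figure:
savvy = bits(A, B, heads=6, tails=3)    # ≈ 4.75 bits for A
```

On this framing, the gap between the naive and savvy readings is one way of putting a number on the corruption: the filtering doesn't just delete bits, it manufactures evidence for whichever side gets reported.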

Comment by benito on Towards an empirical investigation of inner alignment · 2019-09-26T19:19:12.648Z · score: 2 (1 votes) · LW · GW

Snap, I was also gonna write this comment.

Comment by benito on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T19:05:13.779Z · score: 46 (20 votes) · LW · GW
The enemy is smart.
"The enemy knew perfectly well that you'd check whose launch codes were entered, especially since the nukes being set off at all tells us that someone can appear falsely trustworthy." Ben shut his eyes, thinking harder, trying to put himself into the enemy's shoes. Why would he, or his dark side, have done something like - "We're meant to conclude that the enemy has the launch codes. But that's actually something the enemy can only do with difficulty, or under special conditions; they're trying to create a false appearance of omnipotence." Like I would. "Later, hypothetically, the nukes actually get fired. We think it was Quirinus_Quirrell firing them, but really, it was just someone firing them independently."
"Unless that is precisely what Quirinus_Quirrell expects us to think," said Jim Babcock, his brow furrowed in concentration. "In which case he does have the launch codes, as well as the other person."
"Does Quirinus_Quirrell really use plots with that many levels of meta -"
"Yes," said Habryka and Jim.
Ben nodded distantly. "Then this could be a setup to either make us think the personalised launch codes are telling the truth about his identity when they're lying, or a setup to make us think the codes are lying when they're telling the truth, depending on what level the enemy expects us to reason at. But if the enemy is planning to make us trust the personalised codes - we would have trusted the personalised codes anyway, if we'd been given no reason to distrust them. So there's no need to go to all the work of framing another user in a way that we would realize we were intended to discover, just to trick us into going meta -"
Comment by benito on Meetups as Institutions for Intellectual Progress · 2019-09-19T18:35:37.243Z · score: 4 (2 votes) · LW · GW

If I recall correctly, I knew the guy running it, and while he was well-intentioned, he had not read the sequences or much of LW, and the low-quality content was the reason for the name change.

Comment by benito on G Gordon Worley III's Shortform · 2019-09-16T23:23:16.063Z · score: 11 (5 votes) · LW · GW

[Mod note] I thought for a while about how shortform interacts with moderation here. When Ray initially wrote the shortform announcement post, he described the features, goals, and advice for using it, but didn’t mention moderation. Let me follow-up by saying: You’re welcome and encouraged to enforce whatever moderation guidelines you choose to set on shortform, using tools like comment removal, user bans, and such. As a reminder, see the FAQ section on moderation for instructions on how to use the mod tools. Do whatever you want to help you think your thoughts here in shortform and feel comfortable doing so.

Some background thoughts on this: In other places on the internet, being blocked locks you out of the communal conversation, but two factors make it pretty different here. Firstly, banning someone from a post on LW means they can’t reply to the content they’re banned from, but it doesn’t hide your content from them or their content from you. And secondly, everyone here on LessWrong has a common frontpage where the main conversation happens, while shortform is a low-key place and a relatively unimportant part of the conversation - I expect the median shortform post gets 3x-10x fewer views than the median frontpage post. (You can be banned from posts on frontpage, but that action requires meeting high standards not required for shortform bans.) Shortform is a place for more casual conversation, hopefully leading to the best ideas getting made into posts - indeed we’re working on adding an option to turn shortform posts into blogposts. This is why we never frontpage a user’s shortform feed - those posts rarely meet frontpage standards, and they’re not supposed to.

Just to mention this thread in particular, Gordon is well within his rights to ban users or remove their comments from his shortform posts if he wishes to, and the LW mod team will back him up when he wants to do that.

Comment by benito on SSC Meetups Everywhere: Sacramento, CA · 2019-09-14T19:53:21.528Z · score: 3 (2 votes) · LW · GW

Who gives a damn about official? ;)

But the post you made doesn't actually name the location, which I think is a bit unfortunate.

I'll make this post yours; you can do what you like with the two.

Comment by benito on Rationality Exercises Prize of September 2019 ($1,000) · 2019-09-13T04:15:17.919Z · score: 5 (3 votes) · LW · GW

I didn’t think about it much, just copied what I remembered the Alignment Prize as doing - but yes, writing them here is totally fine, probably the way many people would want to do it, and I’m not discouraging it at all :)

Though even with like 3 exercises that build on each other, if they have answers, many folks might want to put their answers in comments (as on the fixed point exercises), and that would work better if the exercises had their own posts.

Comment by benito on The 3 Books Technique for Learning a New Skilll · 2019-09-12T19:00:46.036Z · score: 3 (2 votes) · LW · GW

<3 that book.

Comment by benito on What are the merits of signing up for cryonics with Alcor vs. with the Cryonics Institute? · 2019-09-11T19:11:39.739Z · score: 4 (2 votes) · LW · GW

I encourage you to make a google form for people to fill out, that just asks which one you’ve signed up with (and ‘none’ and ‘other’).

Comment by benito on G Gordon Worley III's Shortform · 2019-09-11T18:48:17.859Z · score: 4 (2 votes) · LW · GW

Er, I generally have FB blocked, but I have now just seen the thread on FB that Duncan made about you, and that does change how I read the dialogue (it makes Duncan’s comments feel more like they’re motivated by social coordination around you rather than around meditation/spirituality, which I’d previously assumed).

(Just as an aside, I think it would’ve been clearer to me if you’d said “I feel like you’re trying to attack me personally for some reason and so it feels especially difficult to engage in good faith with this particular public accusation of norm-violation” or something like that.)

I may make some small edits to my last comment up-thread after taking this into account, though I am still curious about your answer to the question as I initially stated it.

Comment by benito on G Gordon Worley III's Shortform · 2019-09-11T17:33:48.345Z · score: 13 (3 votes) · LW · GW

nods Then I suppose I feel confused by your final response.

If I imagine writing a shortform post and someone said it was:

  • Very rude to another member of the community
  • Endorsing a study that failed to replicate
  • Lying about an experience of mine
  • Trying to unfairly change a narrative so that I was given more status

I would often be like “No, you’re wrong” or maybe “I actually stand by it and intended to be rude” or “Thanks, that’s fair, I’ll edit”. I can also imagine times where the commenter is needlessly aggressive and uncooperative where I’d just strong downvote and ignore.

But I’m confused by saying “you’re not allowed to tell me off for norm-violations on my shortform”. To apply that principle more concretely, it could say “you’re not allowed to tell me off for lying on my shortform”.

My actual model of you feels a bit confused by Duncan’s claim or something, and wants to fight back against being attacked for something you don’t see as problematic. Like, it feels presumptuous of Duncan to walk into your post and hold you to what feels mostly like high standards of explanation, and you want to (rightly) say that he’s not allowed to do that.

Does that all seem right?

Comment by benito on G Gordon Worley III's Shortform · 2019-09-11T08:35:46.597Z · score: 11 (3 votes) · LW · GW

Hey Gordon, let me see if I understand your model of this thread. I’ll write mine and can you tell me if it matches your understanding?

  • You write a post giving your rough understanding of a commonly discussed topic that many are confused by
  • Duncan objects to a framing sentence that he claims means “I know better than other people what's going on in those other people's heads; I am smarter/wiser/more observant/more honest." because it seems inappropriate and dangerous in this domain (spirituality)
  • You say “Dude, I’m just getting some quick thoughts off my chest, and it’s hard to explain everything”
  • Duncan says you aren’t responding to him properly - he does not believe this is a disagreement but a norm-violation
  • You say that Duncan is not welcome to prosecute norm violations on your wall unless they are norms that you support
Comment by benito on G Gordon Worley III's Shortform · 2019-09-11T01:42:23.383Z · score: 9 (2 votes) · LW · GW

Yeah, I think there's a subtle distinction. While it's often correct to believe things that you have a hard time communicating explicitly (e.g. most of my actual world model at any given time), the claim that there's something definitely true but that in-principle I can't persuade you of and also can't explain to you, especially when used by a group of people to coordinate around resources, is often functioning as a coordination flag and not as a description of reality.

Comment by benito on Rationality Exercises Prize of September 2019 ($1,000) · 2019-09-11T00:40:57.047Z · score: 4 (2 votes) · LW · GW

I like it! If you had a bunch more worked examples and hid your answers behind spoiler tags or rot13, that'd be a solid submission.

Comment by benito on Rationality Exercises Prize of September 2019 ($1,000) · 2019-09-11T00:38:54.644Z · score: 2 (1 votes) · LW · GW

Sure, that sounds right. In this case it's often practice in using a concept. I am kinda hoping to define it extensionally by pointing to all the examples.

Comment by benito on Benito's Shortform Feed · 2019-09-10T00:28:43.462Z · score: 4 (2 votes) · LW · GW

Sometimes I get confused between r/ssc and r/css.

Comment by benito on Peter Thiel/Eric Weinstein Transcript on Growth, Violence, and Stories · 2019-09-09T23:15:14.407Z · score: 2 (1 votes) · LW · GW

I've replied here :-)

Comment by benito on Stories About Progress · 2019-09-09T23:14:57.993Z · score: 10 (2 votes) · LW · GW

(Replying to lionhearted, who previously asked what parts felt insightful to me personally.)

Some thoughts:

  • The part on excitement vs nihilism is the part that I've come back to the most, in the context of false narratives about growth. It helps me focus on moving things forward rather than spending time on local political disputes. I've brought this up to other people when I feel like the political action they're taking is trying to beat people over the head with a sign that says "actually, things are bad", when I know that they have the skills and taste to be working directly on making good things (but aren't doing so).
  • The stories about academia, and the sections from the stories about education on conformity and Malthus, felt like someone else re-deriving the same theorems as me, and it was exciting to see how they did it.
  • The specific story that says growth stopped in 1973, and that our dysfunction comes from that plus lying about that, is interesting and one I want to think about more for myself. I don't know what I think about it yet. The focus on violence as the key force being balanced felt like a potential deep insight.
  • I really like the name 'distraction theory', it helps me notice large classes of things that are optimised to take up my time and attention and make me avoid looking at the real things.
  • The bit describing the shock of being a libertarian and then realising you're so radically imitative and copy everyone around you, then trying to balance those two ideas, is really interesting, and I'll probably think about that a lot more.
  • The section about whether scientific progress can be a motivating story for society felt a bit like a punch to the gut. I don't think it's as bad news as it felt like when I read it, but it definitely shocked me to consider the idea that scientific progress can only be a strong social narrative when it's being used to crush your enemies.
  • I also feel like I absorbed some hard-to-describe social tools for thinking for yourself, even while you feel your narrative is being silenced in a great deal of public discourse.

The main reason I made the transcript was not any one of the specific points above - I was just surprised to find lots of new parts of Thiel's worldview that I'd not heard before, and felt it would be better if they could be engaged with in public discourse, so that any valuable ideas can be mined. Rule Thinkers In, Not Out, etc.

Comment by benito on Book Review: Secular Cycles · 2019-09-09T22:16:42.165Z · score: 4 (2 votes) · LW · GW

I’ve curated this post. I feel really confused about theories of history, and have no idea whether the theory of secular cycles is true. But I think history is a really crucial and difficult field, and this post (along with its sequel) really helped me engage with models of this domain - by diving into a natural category (cyclic theories, which are common in many other social/biological fields) and using lots of robust, general approaches to figure out what’s true (degrees of freedom, independent search for datasets as spot check, etc).

Comment by benito on AI Alignment Writing Day Roundup #1 · 2019-09-08T01:15:50.111Z · score: 2 (1 votes) · LW · GW


Comment by benito on What are the reasons to *not* consider reducing AI-Xrisk the highest priority cause? · 2019-09-06T19:42:19.295Z · score: 3 (2 votes) · LW · GW

My bad, just read the title.

Comment by benito on Formalising decision theory is hard · 2019-09-05T22:22:07.597Z · score: 6 (3 votes) · LW · GW

Pardon me, but I removed your first link, because it linked to a document not about alignment and said at the top that it shouldn’t be shared on the public internet. I think you used the wrong link. Apologies if that was a mistake.

Added: Have PM’d you the link.