Case Studies Highlighting CFAR’s Impact on Existential Risk 2017-01-10T18:51:53.178Z · score: 4 (5 votes)
Results of a One-Year Longitudinal Study of CFAR Alumni 2015-12-12T04:39:46.399Z · score: 35 (35 votes)
The effect of effectiveness information on charitable giving 2014-04-15T16:43:24.702Z · score: 15 (16 votes)
Practical Benefits of Rationality (LW Census Results) 2014-01-31T17:24:38.810Z · score: 16 (17 votes)
Participation in the LW Community Associated with Less Bias 2012-12-09T12:15:42.385Z · score: 34 (34 votes)
[Link] Singularity Summit Talks 2012-10-28T04:28:54.157Z · score: 8 (11 votes)
Take Part in CFAR Rationality Surveys 2012-07-18T23:57:52.193Z · score: 18 (19 votes)
Meetup : Chicago games at Harold Washington Library (Sun 6/17) 2012-06-13T04:25:05.856Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups Resume 5/26 2012-05-16T17:53:54.836Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups 2012-04-12T06:14:54.526Z · score: 2 (3 votes)
[LINK] Being proven wrong is like winning the lottery 2011-10-29T22:40:12.609Z · score: 29 (30 votes)
Harry Potter and the Methods of Rationality discussion thread, part 8 2011-08-25T02:17:00.455Z · score: 8 (13 votes)
[SEQ RERUN] Failing to Learn from History 2011-08-09T04:42:37.325Z · score: 4 (5 votes)
[SEQ RERUN] The Modesty Argument 2011-04-23T22:48:04.458Z · score: 6 (7 votes)
[SEQ RERUN] The Martial Art of Rationality 2011-04-19T19:41:19.699Z · score: 7 (8 votes)
Introduction to the Sequence Reruns 2011-04-19T19:39:41.706Z · score: 6 (9 votes)
New Less Wrong Feature: Rerunning The Sequences 2011-04-11T17:01:59.047Z · score: 33 (36 votes)
Preschoolers learning to guess the teacher's password [link] 2011-03-18T04:13:23.945Z · score: 23 (26 votes)
Harry Potter and the Methods of Rationality discussion thread, part 7 2011-01-14T06:49:46.793Z · score: 7 (10 votes)
Harry Potter and the Methods of Rationality discussion thread, part 6 2010-11-27T08:25:52.446Z · score: 6 (9 votes)
Harry Potter and the Methods of Rationality discussion thread, part 3 2010-08-30T05:37:32.615Z · score: 5 (8 votes)
Harry Potter and the Methods of Rationality discussion thread 2010-05-27T00:10:57.279Z · score: 34 (35 votes)
Open Thread: April 2010, Part 2 2010-04-08T03:09:18.648Z · score: 3 (4 votes)
Open Thread: April 2010 2010-04-01T15:21:03.777Z · score: 4 (5 votes)


Comment by unnamed on A Critique of Functional Decision Theory · 2019-09-14T16:59:04.493Z · score: 2 (1 votes) · LW · GW
Perhaps the Scots tend to one-box, whereas the English tend to two-box.

My intuition is that two-boxing is the correct move in this scenario where the Predictor always fills the box with $1M for the Scots and never for the English. An Englishman has no hope of walking away with the $1M, so why should he one-box? He could wind up being one of the typical Englishmen who walk away with $1000, or one of the atypical Englishmen who walk away with $0, but he is not going to wind up being an Englishman who walks away with $1M because those don't exist and he is not going to wind up being a Scottish millionaire because he is English.

EDT might also recommend two-boxing in this scenario, because empirically p($1M | English & one-box) = 0.

Comment by unnamed on What's In A Name? · 2019-08-27T04:51:29.061Z · score: 9 (4 votes) · LW · GW

These studies have not held up well under further scrutiny. See Scott's 2016 post "Devoodooifying Psychology", or even better Simonsohn's (2011) paper "Spurious? Name similarity effects (implicit egotism) in marriage, job, and moving decisions".

Comment by unnamed on Solving for X instead of 3 in love triangles? · 2019-07-23T01:47:48.232Z · score: 5 (3 votes) · LW · GW

Number of weakly connected digraphs with n nodes.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:57:41.611Z · score: 24 (7 votes) · LW · GW

It also seems worth noting that this study looked at whether people intervened in aggressive public conflicts, which is a type of situation where the bystander's safety could be at risk and there can be safety in numbers. A lone bystander intervening in a fight is at higher risk of getting hurt, compared to a group of 10 bystanders acting together. This factor doesn't exist (or is much weaker) in situations like "does anyone stop to see if the person lying on the ground needs medical help" or "does anyone notify the authorities about the smoke which might indicate a fire emergency." So I'd be cautious about generalizing to those sorts of situations.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:53:17.822Z · score: 35 (10 votes) · LW · GW

The standard claim in bystander effect research is that an individual bystander's probability of intervening goes down as the number of bystanders increases (see, e.g., Wikipedia). Whereas this study looked at the probability of any intervention from the group of bystanders, which is a different thing.

The abstract of the paper actually begins with this distinction:

Half a century of research on bystander behavior concludes that individuals are less likely to intervene during an emergency when in the presence of others than when alone. By contrast, little is known regarding the aggregated likelihood that at least someone present at an emergency will do something to help.

So: not a debunking. And another example of why it's good practice to check the paper in question (or at least its abstract) and the Wikipedia article(s) on the topic rather than believing news headlines.
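
A toy calculation shows how both claims can hold at once. This is only a sketch under assumptions that are not from the paper: bystanders are treated as acting independently, and the per-person probabilities are made-up numbers chosen to illustrate the direction of the two effects.

```python
def prob_anyone_intervenes(p_individual: float, n_bystanders: int) -> float:
    """Chance that at least one of n independent bystanders intervenes."""
    return 1.0 - (1.0 - p_individual) ** n_bystanders


# Hypothetical per-person intervention probabilities (illustrative only):
# each individual becomes LESS likely to act as the group grows.
individual_p_by_group_size = {1: 0.50, 3: 0.30, 10: 0.20}

# Aggregated likelihood that at least someone in the group helps.
aggregate_p_by_group_size = {
    n: prob_anyone_intervenes(p, n)
    for n, p in individual_p_by_group_size.items()
}

for n, p in individual_p_by_group_size.items():
    print(f"n={n:2d}  individual={p:.2f}  anyone={aggregate_p_by_group_size[n]:.2f}")
```

With these assumed numbers the individual probability falls as the group grows while the chance that someone intervenes rises, so the two measures can move in opposite directions.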

Comment by unnamed on Why the tails come apart · 2019-06-18T04:56:47.544Z · score: 27 (5 votes) · LW · GW

One angle for thinking about why the tails come apart (which seems worth highlighting even more than it was highlighted in the OP) is that the farther out you go in the tail on some variable, the smaller the set of people you're dealing with.

Which is better, the best basketball team that you can put together from people born in Pennsylvania or the best basketball team that you can put together from people born in Delaware? Probably the Pennsylvania team, since there are about 13x as many people in that state so you get to draw from a larger pool. If there were no other relevant differences between the states then you'd expect 13 of the best 14 players to be Pennsylvanians, and probably the two neighboring states are similar enough so that Delaware can't overcome that population gap.
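
The "13 of the best 14" expectation can be checked with a quick simulation. This is a rough sketch, assuming skill is drawn from the same distribution in both states and using scaled-down, hypothetical populations (1300 vs. 100) that keep the 13:1 ratio; the function name and parameters are mine, not from the OP.

```python
import heapq
import random


def mean_big_state_players_in_top(k, big_pop, small_pop, trials, seed=0):
    """Average number of top-k players who come from the bigger state."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Same skill distribution in both states; only the pool size differs.
        people = [(rng.gauss(0, 1), "big") for _ in range(big_pop)]
        people += [(rng.gauss(0, 1), "small") for _ in range(small_pop)]
        top = heapq.nlargest(k, people)
        total += sum(1 for _, state in top if state == "big")
    return total / trials


average = mean_big_state_players_in_top(k=14, big_pop=1300, small_pop=100, trials=200)
print(f"average number of big-state players in the top 14: {average:.2f}")
```

By symmetry each top-14 slot belongs to the bigger state with probability 13/14, so the average should come out close to 13.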

Now, imagine you're picking the best 10 basketball players from the 1,000 tallest basketball-aged Americans (20-34 year-olds), and you're putting together another group consisting of the best 10 basketball players from the next 100,000 tallest basketball-aged Americans. Which is a better group of basketball players? In this case it's not obvious - getting to pick from a pool of 100x as many people is an obvious advantage, but that height advantage could matter a lot too. That's the tails coming apart - the very tallest don't necessarily give you the very best basketball players, because "the very tallest" is a much smaller set than the "also really tall but not quite as tall" set.

(I ran some numbers and estimate that the two teams are pretty similar in basketball ability. Which is a remarkable sign of how important height is for basketball - one pool has about a 4 inch height advantage on average, the other pool has 100x as many people, and those factors roughly balance out. If you want the example to more definitively show the tails coming apart, you can expand the larger pool by another factor of 30x and then they'll clearly be better.)

Similarly, who has higher arm strength: the one person in our sample who has the highest grip strength, or the most arm-strong person out of the next ten people who rank 2-11 in grip strength? Grip strength is closely related to arm strength, but you get to pick the best from a 10x larger pool if you give up a little bit of grip strength. In the graph in the OP, the person who was 6th (or maybe 5th) in grip strength had the highest arm strength, so getting to pick from a pool of 10 was more important. (The average arm strength of the people ranked 2-11 in grip strength was lower than the arm strength of the #1 gripper, but we get to pick out the strongest arm of the ten rather than averaging them.)
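
Here is a rough simulation of that comparison. It is a sketch under assumptions of my own, not a reanalysis of the OP's data: grip and arm strength are modeled as standard normal variables with correlation `rho`, and the sample size of 100 and `rho = 0.7` are hypothetical choices.

```python
import math
import random


def prob_runner_up_pool_wins(rho, sample_size=100, pool=10, trials=2000, seed=0):
    """Estimate how often the strongest arm among grip ranks 2..pool+1
    beats the arm strength of the #1 gripper."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    wins = 0
    for _ in range(trials):
        people = []
        for _ in range(sample_size):
            grip = rng.gauss(0, 1)
            # Arm strength correlates with grip strength at level rho.
            arm = rho * grip + noise_scale * rng.gauss(0, 1)
            people.append((grip, arm))
        people.sort(reverse=True)  # strongest grip first
        top_gripper_arm = people[0][1]
        best_arm_in_pool = max(arm for _, arm in people[1:pool + 1])
        if best_arm_in_pool > top_gripper_arm:
            wins += 1
    return wins / trials


estimate = prob_runner_up_pool_wins(rho=0.7)
print(f"P(best arm among grip ranks 2-11 beats the #1 gripper) is about {estimate:.2f}")
```

Raising `rho` toward 1 shrinks the estimate toward 0, since the second variable then adds nothing that the larger pool could exploit.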

So: the tails come apart because most of the people aren't way out on the tail. And you usually won't find the very best person at something if you're looking in a tiny pool, even if that's a pretty well selected pool.

Thrasymachus's intuitive explanation covered this - having a smaller pool to pick from hurts because there are other variables that matter, and the smaller the pool the less you get to select for people who do well on those other variables. But his explanation highlighted the "other variables matter" part of this more than the pool size part of it, and both of these points of emphasis seem helpful for getting an intuitive grasp of the statistics in these types of situations, so I figured I'd add this comment.

Comment by unnamed on The Schelling Choice is "Rabbit", not "Stag" · 2019-06-09T00:43:45.445Z · score: 11 (5 votes) · LW · GW
And I said, in a move designed to be somewhat socially punishing: "I don't really trust the conversation to go anywhere useful." And then I took out my laptop and mostly stopped paying attention.

This 'social punishment' move seems problematic, in a way that isn't highlighted in the rest of the post.

One issue: What are you punishing them for? It seems like the punishment is intended to enforce the norm that you wanted the group to have, which is a different kind of move than enforcing a norm that is already established. Enforcing existing norms is generally prosocial, but it's more problematic if each person is trying to enforce the norms that he personally wishes the group to have.

A second thing worth highlighting is that this attempt at norm enforcement looked a lot like a norm violation (of norms against disengaging from a meeting). Sometimes "punishing others for violating norms" is a special case where it's appropriate to do something which would otherwise be a norm violation, but that's often a costly/risky way of doing things (especially when the norm you're enforcing isn't clearly established and so your actions are less legible).

Comment by unnamed on Asymmetric Weapons Aren't Always on Your Side · 2019-06-07T21:42:12.177Z · score: 29 (8 votes) · LW · GW

When Scott used the term "asymmetric weapons", I understood him to mean truth-asymmetric weapons or weapons that favor what's good & true. He was trying to set that particular dimension of asymmetry apart from the various other ways in which a weapon might be more useful in some hands than in others.

I think it's an important concept, and I wish we had better terminology for it.

Comment by unnamed on Was CFAR always intended to be a distinct organization from MIRI? · 2019-05-28T00:41:09.265Z · score: 9 (5 votes) · LW · GW

Yes. Or at least, becoming a distinct organization was already the plan when I got there in early 2012: get a group of people together to create a rationality organization, initially rely on MIRI for institutional support, become an independent organization some months later once all the pieces are in place to do so.

Comment by unnamed on Nash equilibriums can be arbitrarily bad · 2019-05-02T05:45:37.329Z · score: 9 (6 votes) · LW · GW

Unilateral precommitment lets people win at "Almost Free Lunches".

One way to model precommitment is as a sequential game: first player 1 chooses a number, then player 1 has the option of either showing that number to player 2 or keeping it hidden, then player 2 chooses a number. Optimal play is for player 1 to pick £1,000,000 and show that number, and then for player 2 to choose £999,999.99.

An interesting feature of this is that player 1's precommitment helped player 2 even more than it helped player 1. Player 1 is "taking one for the team", but still winning big. This distinguishes it from games like chicken, where precommitment is a threat that allows the precommitter to win the larger share. Though this means that if either player can precommit (rather than one being pre-assigned to go first as player 1) then they'd both prefer to have the other one be the precommitter.

This benefit of precommitment does not extend to the two option version (n2 vs. n1). In that version, player 2 is incentivized to say "n1" regardless of what player 1 commits to, so unilateral precommitment doesn't help them avoid the Nash Equilibrium, just as in the prisoner's dilemma.

Comment by unnamed on How much funding and researchers were in AI, and AI Safety, in 2018? · 2019-03-30T00:39:52.063Z · score: 20 (4 votes) · LW · GW

Some numbers related to c (how many capabilities researchers):

In 2018 about 8,500 people attended NeurIPS and about 4,000 people attended ICML. There are about 2,000 researchers who work at Google AI, and in December 2017 there were reports that about 700 total people work at DeepMind including about 400 with a PhD.

Turning this into a single estimate for "number of researchers" is tricky for the sorts of reasons that catherio gives. "Capabilities researchers" is a fuzzy category, and it's not clear to what extent the set of people who are working on advancing the state of the art in general AI capabilities should include people who are primarily working on applications using the current art and people who are primarily working on advancing the state of the art in narrower subfields. Also obviously only some fraction of the relevant researchers attended those conferences or work at those companies.

I'll suggest 10,000 people as a rough order-of-magnitude estimate. I'd be surprised if the number that came out of a more careful estimation process wasn't within a factor of ten of that.

Comment by unnamed on Blackmail · 2019-02-20T06:30:00.201Z · score: 11 (7 votes) · LW · GW

After discussing this offline, I think the main argument that I laid out does not hold up well in the case of blackmail (though it works better for many other kinds of threats). The key bit is here:

if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too)

This only looks at the effects on Alice and on Bob, as a simplification. But with blackmail "carrying out the threat" means telling other people information about Bob, and that is often useful for those other people. If Alice tells Casey something bad about Bob, that will often be bad for Bob but good for Casey. So it's not obviously negative sum for the whole world.

Comment by unnamed on Blackmail · 2019-02-19T23:06:02.111Z · score: 23 (11 votes) · LW · GW

There's a pretty simple economic argument for why blackmail is bad: it involves a negative-sum threat rather than a positive-sum deal. I was surprised to not see this argument in the econbloggers' discussion; good to see it come up here. To lay it out succinctly and separate from other arguments:

Ordinarily, when two people make a deal we can conclude that it's win-win because both of them chose to make the deal rather than just not interacting with each other. By default Alice would just act on her own preferences and completely ignore Bob's preferences, and the mirror image for Bob, but sometimes they find a deal where they each give up something in return for the other person doing something that they value even more. With some simplifying assumptions, the worst case scenario is that they don't reach a deal and they both break even (compared to if they hadn't interacted), and if they do reach a deal then they both wind up better off.

With a threat, Alice has an alternative course of action available which is somewhat worse for Alice than her default action but much worse for Bob, and Alice tells Bob that she will do the alternative action unless Bob does something for Alice. With some simplifying assumptions, if Bob agrees to give in then their interaction is zero-sum (Alice gets a transfer from Bob), if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too), and if Bob refuses and Alice backs down then it's zero sum (both take the default action).

Ordinary deals add value to the world and threats subtract value from the world, and blackmail is a type of threat.

If we remove some simplifying assumptions (e.g. no transaction costs, one-shot interaction) then things get more complicated, but mostly in ways that make ordinary deals better and threats worse. In the long run deals bring people together as they seek more interactions which could lead to win-win deals, deals encourage people to invest in abilities that will make them more useful to other people so that they'll have more/better opportunities to make deals, and the benefits of deals must outweigh the transaction costs & risks involved at least in expectation (otherwise people would just opt out of trying to make those deals). Whereas threats push people apart as they seek to avoid negative sum interactions, threats encourage people to invest in abilities that make them more able to harm other people, and transaction costs increase the badness of threats (turning zero sum interactions into negative sum) but don't prevent those interactions unless they drive the threatmaker's returns down far enough.

Comment by unnamed on Epistemic Tenure · 2019-02-19T21:57:47.387Z · score: 32 (17 votes) · LW · GW

I think that there's a spectrum between treating someone as a good source of conclusions and treating them as a good source of hypotheses.

I can have thoughts like "Carol looked closely into the topic and came away convinced that Y is true, so for now I'm going to act as if Y is probably true" if I take Carol to be a good source of conclusions.

Whereas if I took Alice to be a good source of hypotheses but not a good source of conclusions, then I would instead have thoughts like "Alice insists that Z is true, so Z seems like something that's worth thinking about more."

Giving someone epistemic tenure as a source of conclusions seems much more costly than giving them epistemic tenure as a source of hypotheses.

Comment by unnamed on Avoiding Jargon Confusion · 2019-02-18T05:09:09.594Z · score: 4 (2 votes) · LW · GW

Huh? I am sufficiently surprised/confused by this example to want a citation.

Edit: The surprise/confusion was in reference to the pre-edit version of the above comment, and does not apply to the current edition.

Comment by Unnamed on [deleted post] 2019-02-16T22:35:31.453Z

I think we should take more care to separate the question of whether AI developments will be decentralized from the question of whether decentralization is safer. It is not obvious to me whether a decentralized, economy-wide path to advanced AIs will be safer or riskier than a concentrated path within a single organization. It seems like the opening sentence of this question is carrying the assumption that decentralized is safer ("Robin Hanson has argued that those who believe AI Risk to be a primary concern for humanity, are suffering from a bias toward thinking that concentration of power is always more efficient than a decentralised system").

Comment by unnamed on Greatest Lower Bound for AGI · 2019-02-06T09:02:52.496Z · score: 4 (3 votes) · LW · GW

I think you mean 50/62 = 0.81?

Comment by unnamed on The Valley of Bad Theory · 2018-10-13T06:16:47.569Z · score: 2 (1 votes) · LW · GW

Sometimes theory can open up possibilities rather than closing them off. In these cases, once you have a theory that claims that X is important, then you can explore different values of X and do local hill-climbing. But before that it is difficult to explore by varying X, either because there are too many dimensions or because there is some subtlety in recognizing that X is a dimension and being able to vary its level.

This depends on being able to have and use a theory without believing it.

Comment by unnamed on Does This Fallacy Have A Name? · 2018-10-03T05:19:18.138Z · score: 6 (5 votes) · LW · GW

This sounds most similar to what LWers call generalizing from one example or the typical mind fallacy and to what psychologists call the false-consensus effect or egocentric bias.

Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-14T20:24:35.670Z · score: 15 (5 votes) · LW · GW

Here are relatively brief responses on these 3 particular points; I’ve made a separate comment which lays out my thinking on metrics like the Big 5 which provides some context for these responses.

We have continued to collect measures like the ones in the 2015 longitudinal study. We are mainly analyzing them in large batches, rather than workshop to workshop, because the sample size isn’t big enough to distinguish signal from noise for single workshops. One of the projects that I’m currently working on is an analysis of a couple years of these data.

The 2017 impact report was not intended as a comprehensive account of all of CFAR’s metrics; it was just focused on CFAR’s EA impact. So it looked at the data that were most directly related to CFAR alums’ impact on the world, and “on average alums have some increase in conscientiousness” seemed less relevant than the information that we did include. The first few paragraphs of the report say more about this.

I’m curious why you’re especially interested in Raven’s Progressive Matrices. I haven’t looked closely at the literature on it, but my impression is that it’s one of many metrics which are loosely related to the thing that we mean by “rationality.” It has the methodological advantage of being a performance score rather than self-report (though this is partially offset by the possibility of practice effects and effort effects). The big disadvantage is the one that Kaj pointed to: it seems to track relatively stable aspects of a person’s thinking skills, and might not change much even if a person made large improvements. For instance, I could imagine a person developing MacGyver-level problem-solving ability while having little or no change in their Raven’s score.

Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-14T20:20:34.515Z · score: 22 (7 votes) · LW · GW

Here’s a sketch of my thinking about the usefulness of metrics like the Big 5 for what CFAR is trying to do.

It would be convenient if there was a definitive measure of a person’s rationality which closely matched what we mean by the term and was highly sensitive to changes. But as far as I can tell there isn’t one, and there isn’t likely to be one anytime soon. So we rely on a mix of indicators, including some that are more like systematic metrics, some that are more like individuals’ subjective impressions, and some that are in between.

I think of the established psychology metrics (Big 5, life satisfaction, general self-efficacy, etc.) as primarily providing a sanity check on whether the workshop is doing something, along with a very very rough picture of some of what it is doing. They are quantitative measures that don’t rely on staff members’ subjective impressions of participants, they have been validated (at least to some extent) in existing psychology research, and they seem at least loosely related to the effects that CFAR hopes to have. And, compared to other ways of evaluating CFAR’s impact on individuals, they’re relatively easy for an outsider to make sense of.

A major limitation of these established psychology metrics is that they haven’t been that helpful as feedback loops. One of the main purposes of a metric is to provide input into CFAR’s day-to-day and workshop-to-workshop efforts to develop better techniques and refine the workshop. That is hard to do with metrics like the ones in the longitudinal study, because of a combination of a few factors:

  1. The results aren’t available until several months after the workshop, which would make for very slow feedback loops and iteration.
  2. The results are too noisy to tell if changes from one workshop to the next are just random variation. It takes several workshops worth of data to get a clear signal on most of the metrics.
  3. These metrics are only loosely related to what we care about. If a change to the workshop leads to larger increases in conscientiousness that does not necessarily mean that we want to do it, and when a curriculum developer is working on a class they are generally not that interested in these particular metrics.
  4. These metrics are relatively general/coarse indicators of the effect of the workshop as a whole, not tied to particular inputs. So (for example) if we make some changes to the TAPs class and want to see if the new version of the class works better or worse, there isn’t a metric that isolates the effects of the TAPs class from the rest of the workshop.

Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-06T18:59:04.994Z · score: 50 (11 votes) · LW · GW

(This is Dan from CFAR)

CFAR's 2015 Longitudinal Study measured the Big 5 and some other standard psychology metrics. It did find changes including decreased neuroticism and increased conscientiousness.

Comment by unnamed on Birth order effect found in Nobel Laureates in Physics · 2018-09-05T20:28:18.269Z · score: 16 (7 votes) · LW · GW

Seems interesting to get data on:

Some group that isn't heavily selected for intelligence / intellectual curiosity: skateboarders, protestors, professional hockey players...

Some non-STEM group that is selected for success based on mental skills: literature laureates, governors, ...

Not sure which groups it would be easy to get data on.

There is also the option of looking into existing research on birth order to see what groups other people have already looked at.

Comment by unnamed on nostalgebraist - bayes: a kinda-sorta masterpost · 2018-09-04T23:45:56.702Z · score: 28 (11 votes) · LW · GW

Seems worth noting that nostalgebraist published this post in June 2017, which was (for example) before Eliezer's post on toolbox thinking.

Comment by unnamed on Birth order effect found in Nobel Laureates in Physics · 2018-09-04T22:41:38.976Z · score: 27 (13 votes) · LW · GW

Now that we have data on LWers/SSCers, mathematicians, and physicists, if anyone wants to put more work into this I'd like to see them look someplace different. We don't want to fall into the Wason 2-4-6 trap of only looking for birth order effects among smart STEM folks. We want data that can distinguish Scott's intelligence / intellectual curiosity hypothesis from other possibilities like some non-big-5 personality difference or a general "firstborns are more likely to show up anywhere" phenomenon.

Comment by unnamed on Historical mathematicians exhibit a birth order effect too · 2018-08-21T07:14:29.942Z · score: 6 (3 votes) · LW · GW

For each mathematician, actual firstbornness was coded as 0 or 1, and expected firstbornness as 1/n (where n is the number of children that their parents had). Then we just did a paired t-test, which is equivalent to subtracting actual minus expected for each data point and then doing a one sample t-test against a mean of 0. You can see this all in Eli's spreadsheet here; the data are also all there for you to try other statistical tests if you want to.
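
For anyone who wants to see the mechanics, here is a minimal sketch of that calculation using only the standard library. The rows in `toy_sample` are hypothetical placeholders, not the mathematicians data, and the function name is mine.

```python
import math
import statistics

# Hypothetical toy rows, NOT the real dataset:
# (actual firstbornness coded 0 or 1, number of children in the family)
toy_sample = [(1, 2), (1, 3), (0, 2), (1, 4), (1, 1), (0, 3), (1, 5), (1, 2)]


def firstborn_t_statistic(sample):
    """One-sample t-test of (actual - expected) firstbornness against 0."""
    # Expected firstbornness for a family with n children is 1/n.
    diffs = [actual - 1.0 / n for actual, n in sample]
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    degrees_of_freedom = len(diffs) - 1
    return mean_diff, mean_diff / standard_error, degrees_of_freedom


mean_diff, t_stat, dof = firstborn_t_statistic(toy_sample)
print(f"mean(actual - expected) = {mean_diff:.3f}, t = {t_stat:.2f}, df = {dof}")
```

A paired t-test on the (actual, expected) columns computes exactly this statistic, since it works on the per-row differences; the p-value then comes from a t distribution with `df` degrees of freedom.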

Comment by Unnamed on [deleted post] 2018-08-14T07:53:29.520Z

You could think of CEV applied to a single unitary agent as a special case where achieving coherence is trivial. It's an edge case where the problem becomes easier, rather than an edge case where the concepts threaten to break.

Although this terminology makes it harder to talk about several agents who each separately have their own extrapolated volition (as you were trying to do in your original comment in this thread). Though replacing it with Personal Extrapolated Volition only helps a little, if we also want to talk about several separate groups who each have their own within-group extrapolated volition (which is coherent within each group but not between groups).

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T05:40:09.291Z · score: 9 (3 votes) · LW · GW

Looking at the math of dividing a fixed pool of resources among a non-fixed number of people, a feature of log(r) that matters a lot is that log(0)<0. The first unit of resources that you give to a person is essentially wasted, because it just gets them up to 0 utility (which is no better than just having 1 fewer person around).

That favors having fewer people, so that you don't have to keep wasting that first unit of resource on each person. If the utility function for a person in terms of their resources was f(r)=r-1 you would similarly find that it is best not to have too many people (in that case having exactly 1 person would work best).

Whereas if it was f(r)=sqrt(r) then it would be best to have as many people as possible, because you're starting from 0 utility at 0 resources and sqrt is steepest right near 0. Doing the calculation... if you have R units of resources divided equally among N people, the total utility is sqrt(RN). log(1+r) is similar to sqrt - it increases as N increases - but it is bounded if R is fixed and just approaches that bound (if we use natural log, that bound is just R).

To sum up: diminishing marginal utility favors having more people each with fewer resources (in addition to favoring equal distribution of resources), f(0)<0 favors having fewer people each with more resources (to avoid "wasting" the bit of resources that get a person up to 0 utility), and functions with both features like log(r) favor some intermediate solution with a moderate population size.
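
The comparison above can be sketched numerically. This is an illustration under assumed values (R = 100 units of resources, population capped at 1000 just to bound the search, natural log throughout), not a general proof.

```python
import math


def total_utility(per_person_utility, resources, population):
    """Total utility when resources are split equally among the population."""
    return population * per_person_utility(resources / population)


def best_population(per_person_utility, resources, max_population):
    """Population size (1..max_population) that maximizes total utility."""
    candidates = range(1, max_population + 1)
    return max(candidates, key=lambda n: total_utility(per_person_utility, resources, n))


R = 100.0      # assumed fixed pool of resources
MAX_N = 1000   # assumed cap on population, only to keep the search finite

utility_functions = {
    "log(r)": math.log,
    "r - 1": lambda r: r - 1.0,
    "sqrt(r)": math.sqrt,
    "log(1 + r)": math.log1p,
}

best = {name: best_population(f, R, MAX_N) for name, f in utility_functions.items()}
for name, n in best.items():
    print(f"{name:11s} best N = {n}")
```

With log(r) the best population lands near R/e (about 37 when R = 100), r - 1 picks a single person, and sqrt(r) and log(1 + r) both run up to whatever cap is set, with the log(1 + r) total staying below R.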

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T03:26:14.442Z · score: 4 (2 votes) · LW · GW

Total utilitarianism does imply the repugnant conclusion, very straightforwardly.

For example, imagine that world A has 1000000000000000000 people each with 10000000 utility and world Z has 10000000000000000000000000000000000000000 people each with 0.0000000001 utility. Which is better?

Total utilitarianism says that you just multiply. World A has 10^18 people x 10^7 utility per person = 10^25 total utility. World Z has 10^40 people x 10^-10 utility per person = 10^30 total utility. World Z is way better.

This seems repugnant; intuitively world Z is much worse than world A.
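
The multiplication can be checked with exact arithmetic. A minimal sketch, using the population and per-person utility figures from the example above; the function name is just for illustration.

```python
from fractions import Fraction


def total_utility(population: int, utility_per_person: Fraction) -> Fraction:
    """Total utilitarianism: sum of utility across everyone in the world."""
    return population * utility_per_person


world_a = total_utility(10**18, Fraction(10**7))        # 10^18 people, 10^7 each
world_z = total_utility(10**40, Fraction(1, 10**10))    # 10^40 people, 10^-10 each

print(f"Z has {world_z / world_a} times the total utility of A")
```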

Parfit went through cleverer steps because he wanted his argument to apply more generally, not just to total utilitarianism. Even much weaker assumptions can get to this repugnant-seeming conclusion that a world like Z is better than a world like A.

The point is that lots of people are confused about axiology. When they try to give opinions about population ethics, judging in various scenarios whether one hypothetical world is better than another, they'll wind up making judgments that are inconsistent with each other.

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T03:13:11.641Z · score: 2 (1 votes) · LW · GW

The paragraph that I was quoting from was just about diminishing marginal utility and equality/redistribution, not about the repugnant conclusion in particular.

Comment by unnamed on Open Thread August 2018 · 2018-08-12T05:13:29.957Z · score: 4 (2 votes) · LW · GW

The beta distribution is often used to represent this type of scenario. It is straightforward to update in simple cases where you get more data points, though it's not straightforward to update based on messier evidence like hearing someone's opinion.
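
In the simple case the update is just addition. A minimal sketch, assuming a uniform Beta(1, 1) prior and a made-up run of 7 successes and 3 failures; the class and method names are mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    alpha: float  # prior pseudo-count of successes
    beta: float   # prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> "Beta":
        """Conjugate update: add the observed counts to the parameters."""
        return Beta(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


prior = Beta(1.0, 1.0)                              # uniform prior (assumed)
posterior = prior.update(successes=7, failures=3)   # hypothetical data
print(posterior, f"mean={posterior.mean:.3f}")
```

There is no equally mechanical rule for folding in something like another person's stated opinion, which is the messier case mentioned above.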

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-10T21:03:19.286Z · score: 4 (2 votes) · LW · GW
I don't know to what extent have others explored the connection between total utilitarianism and equality

Diminishing marginal utility is one of the standard arguments for redistribution.

Comment by unnamed on Strategies of Personal Growth · 2018-07-29T05:58:12.677Z · score: 4 (2 votes) · LW · GW

I disagree with having "Try Things" identified with one of these strategies. IMO it can be applied to (nearly?) any strategy. e.g., You can try different ways of doing low-key practice, or try different skills or subskills to practice. Or you can run experiments in internal alignment, like what happens if I let the part of me that has been saying "I want cake!" have complete control of what I eat for the next week?

Comment by unnamed on Toolbox-thinking and Law-thinking · 2018-06-02T21:34:56.288Z · score: 21 (5 votes) · LW · GW

His description of LW there is: "LW suggests (sometimes, not always) that Bayesian probability is the main tool for effective, accurate thinking. I think it is only a small part of what you need."

This seems to reflect the toolbox vs. law misunderstanding that Eliezer describes in the OP. Chapman is using a toolbox frame and presuming that, when LWers go on about Bayes, they are using a similar frame and thinking that it's the "main tool" in the toolbox.

In the rest of the post it looks like Chapman thinks that what he's saying is contrary to the LW ethos, but it seems to me like his ideas would fit in fine here. For example, Scott has also discussed how a robot can use simple rules which outsource much of its cognition to the environment instead of constructing an internal representation and applying Bayes & expected utility maximization.

Comment by unnamed on Hold On To The Curiosity · 2018-04-23T20:10:47.814Z · score: 10 (3 votes) · LW · GW

The standard name for the "infinity-mean" is the midrange.
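
A quick numerical check, assuming "infinity-mean" means the point that minimizes the largest absolute deviation from the data (the p → ∞ analogue of the mean for p = 2 and the median for p = 1). The data and the search grid are made up for illustration:

```python
def midrange(xs):
    """Halfway point between the smallest and largest values."""
    return (min(xs) + max(xs)) / 2


def max_abs_dev(c, xs):
    """Largest absolute deviation of the data from a candidate center c."""
    return max(abs(x - c) for x in xs)


data = [1.0, 2.0, 4.0, 10.0]  # illustrative data
m = midrange(data)

# Brute-force search over a grid of candidate centers from 0.00 to 11.00.
grid = [i / 100 for i in range(0, 1101)]
best = min(grid, key=lambda c: max_abs_dev(c, data))

print(m, best, max_abs_dev(m, data))
```

Only the two extreme values matter here, which is why the midrange is so sensitive to outliers.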

Comment by unnamed on Survey: Help Us Research Coordination Problems In The Rationalist/EA Community · 2018-04-08T19:57:30.282Z · score: 20 (4 votes) · LW · GW

CFAR's 2013 description of its mission was "to create people who can and will solve important problems", via a community with a mix of competence, epistemic rationality, and do-gooding.

Comment by unnamed on April Fools: Announcing: Karma 2.0 · 2018-04-01T17:02:21.768Z · score: 20 (7 votes) · LW · GW

I expect that I (and many other users) would get more benefit out of this feature if it was more personalized. If I have personally upvoted a lot of posts by a user, then make that user's comments appear even larger to me (but not to other readers). That way, the people who I like would be a "bigger" part of my Less Wrong experience.

It's a bit concerning that you seem not to have considered this possibility. It seems like this sort of personalization would've naturally come under consideration if LW's leadership was paying attention to the state of the art in user experience like the Facebook news feed.

Comment by unnamed on Focusing · 2018-02-27T06:44:02.980Z · score: 17 (4 votes) · LW · GW
I find this CFAR version of Focusing surprising.

Good noticing.

The exercise described here is one application of Focusing, for finding bugs. At CFAR workshops we do something like this in the class on Hamming problems.

The CFAR class on Focusing is more similar to Conor's post and puts a lot more emphasis on searching for a handle that fits.

Comment by unnamed on Design 2 · 2018-02-23T21:28:10.933Z · score: 9 (2 votes) · LW · GW

Go to chrome://settings/, click "Advanced" at the bottom, unselect "Use a prediction service to help complete searches and URLs typed in the address bar" and maybe also "Use a prediction service to load pages more quickly".

Comment by unnamed on The Three Stages Of Model Development · 2018-02-22T19:30:24.811Z · score: 4 (1 votes) · LW · GW

One advantage of having both a weak intuitive model and a weak analytical model is that you can notice where there are mismatches in their predictions and flag them as places where you're confused.

This helps with making predictions about specific cases. In cases where your intuitive naive physics and your s2 sense of "objects in motion tend to remain in motion" make the same prediction, they're usually right. In cases where they disagree, you now have a trigger to look into the case in a more careful, detailed way rather than relying on either cached model.

It also helps with upgrading your models. Instead of waiting to be surprised by reality when it contradicts what your model predicted, you can notice as soon as your intuitive and analytical models disagree with each other and look for ways to improve them.

Comment by unnamed on Confidence Confusion · 2018-02-16T02:28:40.549Z · score: 22 (5 votes) · LW · GW

Scott mentioned that fact about superforecasters in his review; from what I remember the book doesn't add much detail beyond Scott's summary.

One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.
Comment by unnamed on Confidence Confusion · 2018-02-16T02:24:49.040Z · score: 15 (4 votes) · LW · GW

Albert and Betty should share likelihood ratios, not posterior beliefs.
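
A minimal sketch of why, assuming the two of them start from the same prior and that their pieces of evidence are independent given the hypothesis. The prior of 0.5 and the likelihood ratios of 4 are made-up numbers, and the function names are hypothetical:

```python
def to_odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)


def to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)


def combine(prior_prob, likelihood_ratios):
    """Multiply the prior odds by each likelihood ratio in turn."""
    odds = to_odds(prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return to_prob(odds)


prior = 0.5      # assumed shared prior
albert_lr = 4.0  # Albert's evidence favors the hypothesis 4:1
betty_lr = 4.0   # Betty's independent evidence also favors it 4:1

albert_posterior = combine(prior, [albert_lr])        # what Albert believes alone
betty_posterior = combine(prior, [betty_lr])          # what Betty believes alone
pooled = combine(prior, [albert_lr, betty_lr])        # using both likelihood ratios
averaged = (albert_posterior + betty_posterior) / 2   # naive averaging of posteriors

print(albert_posterior, betty_posterior, averaged, pooled)
```

If they only exchange posteriors, each hears "80%" and has no way to tell whether the other's 80% reflects new evidence or the prior they already share. With likelihood ratios the evidence stacks to odds of 16:1.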

Comment by unnamed on A LessWrong Crypto Autopsy · 2018-01-30T02:27:31.351Z · score: 17 (5 votes) · LW · GW

And mine is that it sounds like we did do much better than the average tech industry.

Though maybe you/Scott/others have different intuitions than I do about how common it has been for tech folks to make a bunch of money on cryptocurrency. My impression from Scott's post was that we wouldn't differ much in our estimates of the numbers, just about whether "15% is way less than 100%" was more salient than "15% is way more than 1% (or whatever the relevant base rate is)".

Comment by unnamed on A LessWrong Crypto Autopsy · 2018-01-29T09:51:22.294Z · score: 19 (7 votes) · LW · GW

15% seems like a lot.

Maybe I have that impression because I tend to reach for a null hypothesis as my default model, which in this case is that LWers would do no better with cryptocurrency than other people who had similar levels of interest in tech. And 15% of people making $1000+ is way more than I would predict on that model.

Comment by unnamed on A simpler way to think about positive test bias · 2018-01-22T21:11:29.553Z · score: 18 (5 votes) · LW · GW

Terminology request: Can we use the term "positive test bias" instead of "positive bias"?

"Positive bias" seems like bad jargon - it is not used by researchers, an unfamiliar listener would probably think that it had something to do with having an overly rosy view of things, and all of the results on the first page of Google except for those from LW use it to refer to an overly rosy view.

Whereas "positive test bias" is used by some researchers in the same sense that Eliezer used "positive bias", is only used in that sense on the first page of Google hits, is a more precise phrasing of the same idea, and is less likely to be taken by unfamiliar listeners as referring to an overly rosy view.

The term that is most commonly used by researchers is "confirmation bias", but as Eliezer noted in his original post this term gets used to talk about a cluster of related biases; some researchers recognize this and instead talk about "confirmatory biases". Singling out "positive test bias" with a separate label seems like a potentially good case of jargon proliferation - having more terms in order to talk more precisely about different related concepts - but calling it "positive bias" seems like a mistake.

Comment by unnamed on No, Seriously. Just Try It: TAPs · 2018-01-14T23:29:48.202Z · score: 10 (3 votes) · LW · GW

Yeah, a lot of Brienne's posts about cognitive TAPs and noticing are relevant.

Comment by unnamed on No, Seriously. Just Try It: TAPs · 2018-01-14T17:54:09.852Z · score: 14 (5 votes) · LW · GW

Related: Making intentions concrete - Trigger-Action Planning by Kaj_Sotala

Comment by unnamed on Babble · 2018-01-11T09:45:47.125Z · score: 20 (9 votes) · LW · GW
Something something reinforcement learning partially observable Markov decision process I'm in over my head.

Some aspects of this remind me of generative adversarial networks (GANs).

In one use case: The Generator network (Babbler) takes some noise as input and generates an image. The Discriminator network (sorta Pruner) tries to say if that image came from the set of actual photographs or from the Generator. The Discriminator wins if it guesses correctly, the Generator wins if it fools the Discriminator. Both networks get trained up and get better and better at their task. Eventually (if things go right) the Generator makes photorealistic images.

So the pruning happens in two ways: first the Discriminator learns to recognize bad Babble by comparing the Babble with "reality". Then the Generator learns the structure behind what the Discriminator catches and learns a narrower target for what to generate so that it doesn't produce that kind of unrealistic Babble in the first place. And the process iterates - once the Generator learns not to make more obvious mistakes, then the Discriminator learns to catch subtler mistakes.

GANs share the failure mode of a too-strict Prune filter, or more specifically a Discriminator that is much better than the Generator. If every image that the Generator produces is recognized as a fake then it doesn't get feedback about some pieces of Babble being better than others so it stops learning.
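
One way to see the "stops learning" part, as a sketch rather than a full GAN: assume the Discriminator outputs sigmoid(a) as its probability that an image is real, and the Generator is trained to minimize the original minimax loss log(1 - D). The gradient of that loss with respect to the logit a is -sigmoid(a), which vanishes when the Discriminator is confident the image is fake. The commonly used non-saturating alternative, -log(D), is shown for comparison; the logit values are made up:

```python
import math


def sigmoid(a):
    return 1 / (1 + math.exp(-a))


def saturating_grad(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)


def non_saturating_grad(a):
    # d/da [-log(sigmoid(a))] = -(1 - sigmoid(a))
    return -(1 - sigmoid(a))


# a = 0: Discriminator is unsure; a = -10: it is nearly certain the image is fake.
for a in (0.0, -5.0, -10.0):
    print(a, saturating_grad(a), non_saturating_grad(a))
```

With the saturating loss, the better the Discriminator gets at rejecting everything, the weaker the signal that reaches the Generator.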

(Some other features of Babble aren't captured by GANs.)

Comment by unnamed on Goodhart Taxonomy · 2018-01-03T01:24:55.910Z · score: 17 (4 votes) · LW · GW

Adversarial: A college basketball player who wants to get drafted early and signed to a big contract grows his hair up, so that NBA teams will measure him as being taller (up to the top of his hair).

Comment by unnamed on Goodhart Taxonomy · 2018-01-02T06:55:27.911Z · score: 23 (6 votes) · LW · GW

Height is correlated with basketball ability.

Regressional: But the best basketball player in the world (according to the NBA MVP award) is just 6'3" (1.91m), and a randomly selected 7 foot (2.13m) tall person in his 20s would probably be pretty good at basketball but not NBA caliber. That's regression to the mean; the tails come apart.
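
A toy simulation of the regressional case, with everything in standardized units. The correlation of 0.5 between height and ability, the population size, and the seed are all made-up assumptions, not estimates of anything real:

```python
import random


def simulate(n=20000, r=0.5, seed=0):
    """Draw (height, ability) pairs in standard units with correlation r."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        height = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Ability is partly explained by height, partly by independent noise.
        ability = r * height + (1 - r ** 2) ** 0.5 * noise
        people.append((height, ability))
    return people


people = simulate()

# Take the tallest 1% and compare their average height to their average ability.
tallest = sorted(people, reverse=True)[: len(people) // 100]
mean_height = sum(h for h, _ in tallest) / len(tallest)
mean_ability = sum(a for _, a in tallest) / len(tallest)

print(mean_height, mean_ability)
```

The tallest group is well above average in ability, but much less extreme in ability than in height, so selecting hard on height does not get you the people at the top of the ability distribution.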

Extremal: The tallest person on record, Robert Wadlow, was 8'11" (2.72m). He grew to that height because of a pituitary disorder. He would have struggled to play basketball because he "required leg braces to walk and had little feeling in his legs and feet", and he died at age 22. His basketball ability was well below what one might naively predict based on his height and the regression line, and that is unsurprising because the human body wasn't designed for such an extreme height.