Case Studies Highlighting CFAR’s Impact on Existential Risk 2017-01-10T18:51:53.178Z · score: 4 (5 votes)
Results of a One-Year Longitudinal Study of CFAR Alumni 2015-12-12T04:39:46.399Z · score: 35 (35 votes)
The effect of effectiveness information on charitable giving 2014-04-15T16:43:24.702Z · score: 15 (16 votes)
Practical Benefits of Rationality (LW Census Results) 2014-01-31T17:24:38.810Z · score: 16 (17 votes)
Participation in the LW Community Associated with Less Bias 2012-12-09T12:15:42.385Z · score: 34 (34 votes)
[Link] Singularity Summit Talks 2012-10-28T04:28:54.157Z · score: 8 (11 votes)
Take Part in CFAR Rationality Surveys 2012-07-18T23:57:52.193Z · score: 18 (19 votes)
Meetup : Chicago games at Harold Washington Library (Sun 6/17) 2012-06-13T04:25:05.856Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups Resume 5/26 2012-05-16T17:53:54.836Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups 2012-04-12T06:14:54.526Z · score: 2 (3 votes)
[LINK] Being proven wrong is like winning the lottery 2011-10-29T22:40:12.609Z · score: 29 (30 votes)
Harry Potter and the Methods of Rationality discussion thread, part 8 2011-08-25T02:17:00.455Z · score: 8 (13 votes)
[SEQ RERUN] Failing to Learn from History 2011-08-09T04:42:37.325Z · score: 4 (5 votes)
[SEQ RERUN] The Modesty Argument 2011-04-23T22:48:04.458Z · score: 6 (7 votes)
[SEQ RERUN] The Martial Art of Rationality 2011-04-19T19:41:19.699Z · score: 7 (8 votes)
Introduction to the Sequence Reruns 2011-04-19T19:39:41.706Z · score: 6 (9 votes)
New Less Wrong Feature: Rerunning The Sequences 2011-04-11T17:01:59.047Z · score: 33 (36 votes)
Preschoolers learning to guess the teacher's password [link] 2011-03-18T04:13:23.945Z · score: 23 (26 votes)
Harry Potter and the Methods of Rationality discussion thread, part 7 2011-01-14T06:49:46.793Z · score: 7 (10 votes)
Harry Potter and the Methods of Rationality discussion thread, part 6 2010-11-27T08:25:52.446Z · score: 6 (9 votes)
Harry Potter and the Methods of Rationality discussion thread, part 3 2010-08-30T05:37:32.615Z · score: 5 (8 votes)
Harry Potter and the Methods of Rationality discussion thread 2010-05-27T00:10:57.279Z · score: 34 (35 votes)
Open Thread: April 2010, Part 2 2010-04-08T03:09:18.648Z · score: 3 (4 votes)
Open Thread: April 2010 2010-04-01T15:21:03.777Z · score: 4 (5 votes)


Comment by unnamed on Dunning Kruger vs. Double Descent · 2020-01-20T03:17:24.562Z · score: 19 (4 votes) · LW · GW

The popular conception of Dunning-Kruger has strayed from what's in Kruger & Dunning's research. Their empirical results look like this, not like the "Mt. Stupid" graph.

Comment by unnamed on The Tails Coming Apart As Metaphor For Life · 2020-01-16T00:26:58.730Z · score: 2 (1 votes) · LW · GW
the most interesting takeaway here is not the part where the predictor regressed to the mean, but that extreme things tend to be differently extreme on different axes.

Even though the two variables are strongly correlated, things that are extreme on one variable are somewhat closer to the mean on the other variable.
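As a quick illustration of this point, here is a hypothetical simulation, assuming the two variables are jointly normal with a correlation of 0.7 (an arbitrary assumed value):

```python
import random

rng = random.Random(0)
rho = 0.7  # assumed correlation between the two variables
pairs = []
for _ in range(100_000):
    x = rng.gauss(0, 1)
    # Construct y so that corr(x, y) = rho
    y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
    pairs.append((x, y))

# Look at the far tail on x: these points are extreme on x but,
# on average, only about rho-times-as-extreme on y.
tail = [(x, y) for x, y in pairs if x > 2.5]
mean_x = sum(x for x, _ in tail) / len(tail)
mean_y = sum(y for _, y in tail) / len(tail)
print(round(mean_x, 2), round(mean_y, 2))  # mean_y is roughly rho * mean_x
```

The points that are 2.5+ standard deviations out on one axis sit, on average, noticeably closer to the mean on the other axis, even with a fairly strong correlation.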

Comment by unnamed on The Tails Coming Apart As Metaphor For Life · 2020-01-16T00:24:51.988Z · score: 2 (1 votes) · LW · GW

I think they're close to identical. "The tails come apart", "regression to the mean", "regressional Goodhart", "the winner's curse", "the optimizer's curse", and "the unilateralist's curse" are all talking about essentially the same statistical phenomenon. They come at it from different angles, and highlight different implications, and are evocative of different contexts where it is relevant to account for the phenomenon.

Comment by unnamed on How would we check if "Mathematicians are generally more Law Abiding?" · 2020-01-13T02:03:48.525Z · score: 16 (7 votes) · LW · GW

Eric Schwitzgebel has done studies on whether moral philosophers behave more ethically (e.g., here). Some of the measures from that research seem to match reasonably well with law-abidingness (e.g., returning library books, paying conference registration fees, survey response honesty) and could be used in studies of mathematicians.

Comment by unnamed on Are "superforecasters" a real phenomenon? · 2020-01-09T23:15:15.412Z · score: 10 (5 votes) · LW · GW
A better sentence should give the impression that, by way of analogy, some basketball players are NBA players.

This analogy seems like a good way of explaining it. Saying (about forecasting ability) that some people are superforecasters is similar to saying (about basketball ability) that some people are NBA players or saying (about chess ability) that some people are Grandmasters. If you understand in detail the meaning of any one of these claims (or a similar claim about another domain besides forecasting/basketball/chess), then most of what you could say about that claim would port over pretty straightforwardly to the other claims.

Comment by unnamed on Are "superforecasters" a real phenomenon? · 2020-01-09T03:45:37.707Z · score: 13 (5 votes) · LW · GW

I don't see much disagreement between the two sources. The Vox article doesn't claim that there is much reason for selecting the top 2% rather than the top 1% or the top 4% or whatever. And the SSC article doesn't deny that the people who scored in the top 2% (and are thereby labeled "Superforecasters") systematically do better than most at forecasting.

I'm puzzled by the use of the term "power law distribution". I think that the GJP measured forecasting performance using Brier scores, and Brier scores are always between 0 and 1, which is the wrong shape for a fat-tailed distribution. And the next sentence (which begins "that is") isn't describing anything specific to power law distributions. So probably the Vox article is just misusing the term.
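For reference, here is a minimal sketch of the bounded score in question, using the simple mean-squared-error form of the Brier score for binary outcomes (the GJP's exact scoring rules aren't reproduced here):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Each term (p - o)**2 lies in [0, 1], so the score is bounded --
    a bounded quantity can't be fat-tailed in the power-law sense.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A calibrated, confident forecaster scores near 0; maximally wrong scores 1.
print(round(brier_score([0.9, 0.1, 0.8], [1, 0, 1]), 3))  # 0.02
print(brier_score([1.0, 0.0], [0, 1]))                    # 1.0
```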

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T09:34:23.973Z · score: 21 (6 votes) · LW · GW

(This is Dan, from CFAR since 2012)

Working at CFAR (especially in the early years) was a pretty intense experience: the workflow regularly threw you into these immersive workshops, regularly demanded digging deeply into your thinking and how your mind works and what you could do better, and on top of that there was the work of making this fledgling organization survive & function. I think the basic thing that happened is that, even for people who were initially really excited about taking this on, things looked different for them a few years later. Part of that is personal, with things like burnout, or feeling like they’d gotten their fill and had learned a large chunk of what they could from this experience, or wanting a life full of experiences which were hard to fit into this (probably these 3 things overlap). And part of it was professional, where they got excited about other projects for doing good in the world while CFAR wanted to stay pretty narrowly focused on rationality workshops.

I’m tempted to try to go into more detail, but it feels like that would require starting to talk about particular individuals rather than the set of people who were involved in early CFAR, and I feel weird about that.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T09:24:44.113Z · score: 26 (8 votes) · LW · GW

(This is Dan from CFAR)

In terms of what happened that day, the article covers it about as well as I could. There’s also a report from the sheriff’s office which goes into a bit more detail about some parts.

For context, all four of the main people involved live in the Bay Area and interact with the rationality community. Three of them had been to a CFAR workshop. Two of them are close to each other, and CFAR had banned them prior to the reunion based on a number of concerning things they had done. I’m not sure how the other two got involved.

They have made a bunch of complaints about CFAR and other parts of the community (the bulk of which are false or hard to follow), and it seems like they were trying to create a big dramatic event to attract attention. I’m not sure quite how they expected it to go.

This doesn’t seem like the right venue to go into details to try to sort out the concerns about them or the complaints they’ve raised; there are some people looking into each of those things.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:39:04.234Z · score: 22 (7 votes) · LW · GW

Not precise at all. The confidence interval is HUGE.

stdev = 5.9 (without Bessel's correction)

std error = 2.6

95% CI = (0.5, 10.7)

The confidence interval should not need to go that low. Maybe there's a better way to do the statistics here.
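For what it's worth, the arithmetic in the parent comments can be reproduced from the summary numbers alone (the five raw answers aren't given, so this just plugs in the reported mean and standard deviation):

```python
import math

n, mean, stdev = 5, 5.6, 5.9  # reported poll size, mean, and population stdev

std_error = round(stdev / math.sqrt(n), 1)  # rounded first, as in the comment
ci = (round(mean - 1.96 * std_error, 1), round(mean + 1.96 * std_error, 1))
print(std_error, ci)  # 2.6 (0.5, 10.7)
```

A normal-approximation interval with n = 5 is crude, which is part of why the interval dips implausibly low; a t-based interval, or one that respects the bounds of the answer scale, would behave better.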

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:30:41.401Z · score: 24 (9 votes) · LW · GW

(This is Dan from CFAR)

Warning: this sampling method contains selection effects.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:28:32.024Z · score: 22 (6 votes) · LW · GW

(This is Dan, from CFAR since June 2012)

These are more like “thoughts sparked by Duncan’s post” rather than “thoughts on Duncan’s post”. Thinking about the question of how well you can predict what a workshop experience will be like if you’ve been at a workshop under different circumstances, and looking back over the years...

In terms of what it’s like to be at a mainline CFAR workshop, as a first approximation I’d say that it has been broadly similar since 2013. Obviously there have been a bunch of changes since January 2013 in terms of our curriculum, our level of experience, our staff, and so on, but if you’ve been to a mainline workshop since 2013 (and to some extent even before then), and you’ve also had a lifetime full of other experiences, your experience at that mainline workshop seems like a pretty good guide to what a workshop is like these days. And if you haven’t been to a workshop and are wondering what it’s like, then talking to people who have been to workshops since 2013 seems like a good way to learn about it.

More recent workshops are more similar to the current workshop than older ones. The most prominent cutoff that comes to mind for more vs. less similar workshops is the one I already mentioned (Jan 2013) which is the first time that we basically understood how to run a workshop. The next cutoff that comes to mind is January 2015, which is when the current workshop arc & structure clicked into place. The next is July 2019, which is the second workshop which was run by something like the current team and the first one where we hit our stride (it was also the first one after we started this year's instructor training, which I think helped with hitting our stride). And after that is sometime in 2016 I think when the main classes reached something resembling their current form.

Besides recency, it’s also definitely true that the people at the workshop bring a different feel to it. European workshops have a different feel than US workshops because so many of the people there are from somewhat different cultures. Each staff member brings a different flavor - we try to have staff who approach things in different ways, partly in order to span more of the space of possible ways that it can look like to be engaging with this rationality stuff. The workshop MC (which was generally Duncan’s role while he was involved) does impart more of their flavor on the workshop than most people, although for a single participant their experience is probably shaped more by whichever people they wind up connecting with the most and that can vary a lot even between participants at the same workshop.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:02:58.758Z · score: 27 (10 votes) · LW · GW

I don’t think that time is my main constraint, but here are some of my blog post shaped ideas:

  • Taste propagates through a medium
  • Morality: do-gooding and coordination
  • What to make of ego depletion research
  • Taboo "status"
  • What it means to become calibrated
  • The NFL Combine as a case study in optimizing for a proxy
  • The ability to paraphrase
  • 5 approaches to epistemics
Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T07:53:59.413Z · score: 35 (9 votes) · LW · GW

(This is Dan from CFAR)

I did a quick poll of 5 staff members and the average answer was 5.6.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T07:52:42.100Z · score: 13 (5 votes) · LW · GW

(This is Dan from CFAR)

Guided By The Beauty Of Our Weapons

Asymmetric vs. symmetric tools is now one of the main frameworks that I use to think about rationality (although I wish we had better terminology for it). A rationality technique (as opposed to a productivity hack or a motivation trick or whatever) helps you get more done on something in cases where getting more done is a good idea.

This wasn’t a completely new idea when I read Scott’s post about it, but the post seems to have helped a lot with getting the framework to sink in.

Comment by unnamed on Bayesian examination · 2019-12-10T02:57:43.212Z · score: 2 (1 votes) · LW · GW

I recall hearing about classes at Carnegie Mellon (in the Social and Decision Sciences department) which gave exams in this sort of format.

Comment by unnamed on Integrity and accountability are core parts of rationality · 2019-11-14T09:11:00.489Z · score: 13 (3 votes) · LW · GW

Related: Integrity for consequentialists by Paul Christiano

Comment by unnamed on How do you assess the quality / reliability of a scientific study? · 2019-11-12T22:38:05.665Z · score: 31 (11 votes) · LW · GW

Context: My experience is primarily with psychology papers (heuristics & biases, social psych, and similar areas), and it seems to generalize pretty well to other social science research and fields with similar sorts of methods.

One way to think about this is to break it into three main questions:

1. Is this "result" just noise? Or would it replicate?

2. (If there's something besides noise) Is there anything interesting going on here? Or are all the "effects" just confounds, statistical artifacts, demonstrating the obvious, etc.

3. (If there is something interesting going on here) What is going on here? What's the main takeaway? What can we learn from this? Does it support the claim that some people are tempted to use it to support?

There is some benefit just to explicitly considering all three questions, and keeping them separate.

For #1 ("Is this just noise?") people apparently do a pretty good job of predicting which studies will replicate. Relevant factors include:

1a. How strong is the empirical result (tiny p-value, large sample size, precise estimate of effect size, etc.)?

1b. How plausible is this effect on priors? Including: How big an effect size would you expect on priors? And: How definitively does the researchers' theory predict this particular empirical result?

1c. Experimenter degrees of freedom / garden of forking paths / possibility of p-hacking. Preregistration is best, visible signs of p-hacking are worst.

1d. How filtered is this evidence? How much publication bias?

1e. How much do I trust the researchers about things like (c) and (d)?

I've found that this post on how to think about whether a replication study "failed" has also helped clarify my thinking about whether a study is likely to replicate.

If there are many studies of essentially the same phenomenon, then try to find the methodologically strongest few and focus mainly on those. (Rather than picking one study at random and dismissing the whole area of research if that study is bad, or assuming that just because there are lots of studies they must add up to solid evidence.)

If you care about effect size, it's also worth keeping in mind that the things which turn noise into "statistically significant results" also tend to inflate effect sizes.

For #2 ("Is there anything interesting going on here?"), understanding methodology & statistics is pretty central. Partly that's background knowledge & expertise that you keep building up over the years, partly that's taking the time & effort to sort out what's going on in this study (if you care about this study and can't sort it out quickly), sometimes you can find other writings which comment on the methodology of this study which can help a lot. You can try googling for criticisms of this particular study or line of research (or check google scholar for papers that have cited it), or google for criticisms of specific methods they used. It is often easier to recognize when someone makes a good argument than to come up with that argument yourself.

One framing that helps me think about a study's methodology (and whether or not there's anything interesting going on here) is to try to flesh out "null hypothesis world": in the world where nothing interesting is going on, what would I expect to see come out of this experimental process? Sometimes I'll come up with more than one world that feels like a null hypothesis world. Exercise: try that with this study (Egan, Santos, Bloom 2007). Another exercise: Try that with the hot hand effect.

#3 ("What is going on here?") is the biggest/broadest question of the three. It's the one that I spend the most time on (at least if the study is any good), and it's the one that I could most easily write a whole bunch about (making lots of points and elaborating on them). But it's also the one that is the most distant from Eli's original question, and I don't want to turn this post into a big huge essay, so I'll just highlight a few things here.

A big part of the challenge is thinking for yourself about what's going on and not being too anchored on how things are described by the authors (or the press release or the person who told you about the study). Some moves here:

3a. Imagine (using your inner sim) being a participant in the study, such that you can picture what each part of the study was like. In particular, be sure that you understand every experimental manipulation and measurement in concrete terms (okay, so then they filled out this questionnaire which asked if you agree with statements like such-and-such and blah-blah-blah).

3b. Be sure you can clearly state the pattern of results of the main finding, in a concrete way which is not laden with the authors' theory (e.g. not "this group was depleted" but "this group gave up on the puzzles sooner"). You need this plus 3a to understand what happened in the study, then from there you're trying to draw inferences about what the study implies.

3c. Come up with (one or several) possible models/theories about what could be happening in this study. Especially look for ones that seem commonsensical / that are based in how you'd inner sim yourself or other people in the experimental scenario. It's fine if you have a model that doesn't make a crisp prediction, or if you have a theory that seems a lot like the authors' theory (but without their jargon). Exercise: try that with a typical willpower depletion study.

3d. Have in mind the key takeaway of the study (e.g., the one sentence summary that you would tell a friend; this is the thing that's the main reason why you're interested in reading the study). Poke at that sentence to see if you understand what each piece of it means. As you're looking at the study, see if that key takeaway actually holds up. e.g., Does the main pattern of results match this takeaway or do they not quite match up? Does the study distinguish the various models that you've come up with well enough to strongly support this main takeaway? Can you edit the takeaway claim to make it more precise / to more clearly reflect what happened in the study / to make the specifics of the study unsurprising to someone who heard the takeaway? What sort of research would it take to provide really strong support for that takeaway, and how does the study at hand compare to that?

3e. Look for concrete points of reference outside of this study which resemble the sort of thing the researchers are talking about. Search in particular for ones that seem out-of-sync with this study. e.g., This study says not to tell other people your goals, but the other day I told Alex about something I wanted to do and that seemed useful; do the specifics of this experiment change my sense of whether that conversation with Alex was a good idea?

Some narrower points which don't neatly fit into my 3-category breakdown:

A. If you care about effect sizes then consider doing a Fermi estimate, or otherwise translating the effect size into numbers that are intuitively meaningful to you. Also think about the range of possible effect sizes rather than just the point estimate, and remember that the issues with noise in #1 also inflate effect size.

B. If the paper finds a null effect and claims that it's meaningful (e.g., that the intervention didn't help) then you do care about effect sizes. (e.g., If it claims the intervention failed because it had no effect on mortality rates, then you might assume a value of $10M per life and try to calculate a 95% confidence interval on the value of the intervention based solely on its effect on mortality.)
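As a sketch of that calculation with made-up numbers (the effect size, standard error, and dollar value below are all hypothetical):

```python
# Hypothetical study: a "null result" on mortality.
lives_saved_per_person = 2e-4  # point estimate: 2 lives per 10,000 people
std_error = 5e-4               # large enough that the CI spans zero
value_per_life = 10_000_000    # the assumed $10M per life

lo = (lives_saved_per_person - 1.96 * std_error) * value_per_life
hi = (lives_saved_per_person + 1.96 * std_error) * value_per_life
print(f"95% CI on value per person: (${lo:,.0f}, ${hi:,.0f})")
```

The interval runs from about -$7,800 to +$11,800 per person: the "no significant effect" framing hides the fact that the study can't rule out the intervention being enormously valuable (or costly).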

C. New papers that claim to debunk an old finding are often right when they claim that the old finding has issues with #1 (it didn't replicate) or #2 (it had methodological flaws) but are rarely actually debunkings if they claim that the old finding has issues with #3 (it misdescribes what's really going on). The new study on #3 might be important and cause you to change your thinking in some ways, but it's generally an incremental update rather than a debunking. Examples that look to me like successful debunkings: behavioral social priming research (#1), the Dennis-dentist effect (#2), the hot hand fallacy (#2 and some of B), the Stanford Prison Experiment (closest to #2), various other things that didn't replicate (#1). Examples of alleged "debunkings" which seem like interesting but overhyped incremental research: the bystander effect (#3), loss aversion (this study) (#3), the endowment effect (#3).

Comment by unnamed on Epistemic Spot Check: Unconditional Parenting · 2019-11-12T20:14:29.874Z · score: 8 (4 votes) · LW · GW

My experience was similar to Habryka's. I followed the "too small and subdivided" link to find more details on what exactly the book claimed about the research and how the research looked to you. I didn't see more details on the page where I landed, and couldn't tell where to navigate from there, so I gave up on that and didn't bother clicking any other links from the article. I think I had a similar experience the last time you relied on Roam links. So I've been getting more out of your epistemic spot checks when they've included the content in the post.

Comment by unnamed on How feasible is long-range forecasting? · 2019-10-15T21:13:09.353Z · score: 13 (3 votes) · LW · GW

The shape of the graph will depend a lot on what questions you ask. So it's hard to interpret many aspects of the graph without seeing the questions that it's based on (or at least a representative subset of questions).

In particular, my recollection is that some GJP questions took the form "Will [event] happen by [date]?", where the market closed around the same time as the date that was asked about. These sorts of questions essentially become different questions as time passes - a year before the date they are asking if the event will happen in a one-year-wide future time window, but a month before the date they are instead asking if the event either will happen in a one-month-wide future time window or if it has already happened in an eleven-month-wide past time window. People can give more and more confident answers as the event draws closer because it's easier to know if the event happened in the past than it is to know if the event will happen in the future, regardless of whether predicting the near future is easier than predicting the far future.

For example, consider the question "an earthquake of at least such-and-such magnitude will happen in such-and-such region between October 16 2019 and October 15 2020". If you know that the propensity for such earthquakes is that they have a probability p of happening each day on average, and you have no information that allows you to make different guesses about different times, then the math on this question is pretty straightforward. Your initial estimate will be that there's a (1-p)^365 chance of No Qualifying Earthquake. Each day that passes with no qualifying earthquake happening, you'll increase the probability you put on No Qualifying Earthquake by reducing the exponent by 1 ("I know that an earthquake didn't happen yesterday, so now how likely is it to happen over the next 364 days?", etc.). And if a qualifying earthquake ever does happen then you'll change your prediction to a 100% chance of earthquake in that window (0% chance of No Qualifying Earthquake). You're able to predict the near future (e.g. probability of an earthquake on October 17 2019) and the distant future (e.g. probability of an earthquake on October 14 2020) equally well, but with this [event] by [date] formulation of the question it'll look like you're able to correctly get more and more confident as the date grows closer.
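The arithmetic in that example can be sketched directly (the daily probability of 0.002 below is an arbitrary assumed value):

```python
def p_no_quake(p_daily, days_left):
    """Probability of No Qualifying Earthquake in the remaining window."""
    return (1 - p_daily) ** days_left

p = 0.002  # assumed daily probability of a qualifying earthquake
# Each quake-free day shrinks the exponent, so confidence in "no earthquake"
# rises even though nothing about the future got easier to predict:
for days_left in (365, 180, 30):
    print(days_left, round(p_no_quake(p, days_left), 2))
```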

Comment by unnamed on A Critique of Functional Decision Theory · 2019-09-14T16:59:04.493Z · score: 2 (1 votes) · LW · GW
Perhaps the Scots tend to one-box, whereas the English tend to two-box.

My intuition is that two-boxing is the correct move in this scenario where the Predictor always fills the box with $1M for the Scots and never for the English. An Englishman has no hope of walking away with the $1M, so why should he one-box? He could wind up being one of the typical Englishmen who walk away with $1000, or one of the atypical Englishmen who walk away with $0, but he is not going to wind up being an Englishman who walks away with $1M, because those don't exist, and he is not going to wind up being a Scottish millionaire, because he is English.

EDT might also recommend two-boxing in this scenario, because empirically p($1M | English & one-box) = 0.
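Concretely, the expected values for an Englishman under the stipulated predictor look like this (using the standard Newcomb amounts, assumed here):

```python
def expected_value(p_big_box_filled, one_box):
    """EV given the probability the $1M box is filled, which in this
    scenario depends only on nationality, not on the choice made."""
    big = p_big_box_filled * 1_000_000
    return big if one_box else big + 1_000

# For the English the big box is empirically never filled:
print(expected_value(0.0, one_box=True))   # 0.0
print(expected_value(0.0, one_box=False))  # 1000.0
```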

Comment by unnamed on What's In A Name? · 2019-08-27T04:51:29.061Z · score: 9 (4 votes) · LW · GW

These studies have not held up well to further rigor. See Scott's 2016 post Devoodooifying Psychology, or even better Simonsohn's (2011) paper Spurious? Name similarity effects (implicit egotism) in marriage, job, and moving decisions.

Comment by unnamed on Solving for X instead of 3 in love triangles? · 2019-07-23T01:47:48.232Z · score: 5 (3 votes) · LW · GW

Number of weakly connected digraphs with n nodes.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:57:41.611Z · score: 24 (7 votes) · LW · GW

It also seems worth noting that this study looked at whether people intervened in aggressive public conflicts, which is a type of situation where the bystander's safety could be at risk and there can be safety in numbers. A lone bystander intervening in a fight is at higher risk of getting hurt, compared to a group of 10 bystanders acting together. This factor doesn't exist (or is much weaker) in situations like "does anyone stop to see if the person lying on the ground needs medical help" or "does anyone notify the authorities about the smoke which might indicate a fire emergency." So I'd be cautious about generalizing to those sorts of situations.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:53:17.822Z · score: 35 (10 votes) · LW · GW

The standard claim in bystander effect research is that an individual bystander's probability of intervening goes down as the number of bystanders increases (see, e.g., Wikipedia). Whereas this study looked at the probability of any intervention from the group of bystanders, which is a different thing.

The abstract of the paper actually begins with this distinction:

Half a century of research on bystander behavior concludes that individuals are less likely to intervene during an emergency when in the presence of others than when alone. By contrast, little is known regarding the aggregated likelihood that at least someone present at an emergency will do something to help.

So: not a debunking. And another example of why it's good practice to check the paper in question (or at least its abstract) and the Wikipedia article(s) on the topic rather than believing news headlines.

Comment by unnamed on Why the tails come apart · 2019-06-18T04:56:47.544Z · score: 27 (5 votes) · LW · GW

One angle for thinking about why the tails come apart (which seems worth highlighting even more than it was highlighted in the OP) is that the farther out you go in the tail on some variable, the smaller the set of people you're dealing with.

Which is better, the best basketball team that you can put together from people born in Pennsylvania or the best basketball team that you can put together from people born in Delaware? Probably the Pennsylvania team, since there are about 13x as many people in that state so you get to draw from a larger pool. If there were no other relevant differences between the states then you'd expect 13 of the best 14 players to be Pennsylvanians, and probably the two neighboring states are similar enough so that Delaware can't overcome that population gap.

Now, imagine you're picking the best 10 basketball players from the 1,000 tallest basketball-aged Americans (20-34 year-olds), and you're putting together another group consisting of the best 10 basketball players from the next 100,000 tallest basketball-aged Americans. Which is a better group of basketball players? In this case it's not obvious - getting to pick from a pool of 100x as many people is an obvious advantage, but that height advantage could matter a lot too. That's the tails coming apart - the very tallest don't necessarily give you the very best basketball players, because "the very tallest" is a much smaller set than the "also really tall but not quite as tall".

(I ran some numbers and estimate that the two teams are pretty similar in basketball ability. Which is a remarkable sign of how important height is for basketball - one pool has about a 4 inch height advantage on average, the other pool has 100x as many people, and those factors roughly balance out. If you want the example to more definitively show the tails coming apart, you can expand the larger pool by another factor of 30x and then they'll clearly be better.)

Similarly, who has higher arm strength: the one person in our sample who has the highest grip strength, or the most arm-strong person out of the next ten people who rank 2-11 in grip strength? Grip strength is closely related to arm strength, but you get to pick the best from a 10x larger pool if you give up a little bit of grip strength. In the graph in the OP, the person who was 6th (or maybe 5th) in grip strength had the highest arm strength, so getting to pick from a pool of 10 was more important. (The average arm strength of the people ranked 2-11 in grip strength was lower than the arm strength of the #1 gripper, but we get to pick out the strongest arm of the ten rather than averaging them.)

So: the tails come apart because most of the people aren't way out on the tail. And you usually won't find the very best person at something if you're looking in a tiny pool, even if that's a pretty well selected pool.

Thrasymachus's intuitive explanation covered this - having a smaller pool to pick from hurts because there are other variables that matter, and the smaller the pool the less you get to select for people who do well on those other variables. But his explanation highlighted the "other variables matter" part of this more than the pool size part of it, and both of these points of emphasis seem helpful for getting an intuitive grasp of the statistics in these types of situations, so I figured I'd add this comment.
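The pool-size point can be checked with a small hypothetical simulation: generate people with correlated "height" and "ability" (a correlation of 0.7, an assumed value), then compare the single tallest person against the best player from the next 500 tallest:

```python
import random

def trial(rng, n=5000, rho=0.7):
    people = []
    for _ in range(n):
        h = rng.gauss(0, 1)                                    # height
        a = rho * h + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)  # ability
        people.append((h, a))
    people.sort(reverse=True)                    # tallest first
    best_tiny = people[0][1]                     # ability of the single tallest
    best_big = max(a for _, a in people[1:501])  # best of the next 500 tallest
    return best_big > best_tiny

rng = random.Random(0)
wins = sum(trial(rng) for _ in range(200))
print(wins, "out of 200")  # the larger, slightly-shorter pool usually wins
```

Even with a strong height-ability correlation, getting to pick the best from a 500x larger pool beats having the single most extreme height most of the time.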

Comment by unnamed on The Schelling Choice is "Rabbit", not "Stag" · 2019-06-09T00:43:45.445Z · score: 11 (5 votes) · LW · GW
And I said, in a move designed to be somewhat socially punishing: "I don't really trust the conversation to go anywhere useful." And then I took out my laptop and mostly stopped paying attention.

This 'social punishment' move seems problematic, in a way that isn't highlighted in the rest of the post.

One issue: What are you punishing them for? It seems like the punishment is intended to enforce the norm that you wanted the group to have, which is a different kind of move than enforcing a norm that is already established. Enforcing existing norms is generally prosocial, but it's more problematic if each person is trying to enforce the norms that he personally wishes the group to have.

A second thing worth highlighting is that this attempt at norm enforcement looked a lot like a norm violation (of norms against disengaging from a meeting). Sometimes "punishing others for violating norms" is a special case where it's appropriate to do something which would otherwise be a norm violation, but that's often a costly/risky way of doing things (especially when the norm you're enforcing isn't clearly established and so your actions are less legible).

Comment by unnamed on Asymmetric Weapons Aren't Always on Your Side · 2019-06-07T21:42:12.177Z · score: 29 (8 votes) · LW · GW

When Scott used the term "asymmetric weapons", I understood him to mean truth-asymmetric weapons or weapons that favor what's good & true. He was trying to set that particular dimension of asymmetry apart from the various other ways in which a weapon might be more useful in some hands than in others.

I think it's an important concept, and I wish we had better terminology for it.

Comment by unnamed on Was CFAR always intended to be a distinct organization from MIRI? · 2019-05-28T00:41:09.265Z · score: 9 (5 votes) · LW · GW

Yes. Or at least, becoming a distinct organization was already the plan when I got there in early 2012: get a group of people together to create a rationality organization, initially rely on MIRI for institutional support, become an independent organization some months later once all the pieces are in place to do so.

Comment by unnamed on Nash equilibriums can be arbitrarily bad · 2019-05-02T05:45:37.329Z · score: 9 (6 votes) · LW · GW

Unilateral precommitment lets people win at "Almost Free Lunches".

One way to model precommitment is as a sequential game: first player 1 chooses a number, then player 1 has the option of either showing that number to player 2 or keeping it hidden, then player 2 chooses a number. Optimal play is for player 1 to pick £1,000,000 and show that number, and then for player 2 to choose £999,999.99.

An interesting feature of this is that player 1's precommitment helped player 2 even more than it helped player 1. Player 1 is "taking one for the team", but still winning big. This distinguishes it from games like chicken, where precommitment is a threat that allows the precommitter to win the larger share. Though this means that if either player can precommit (rather than one being pre-assigned to go first as player 1) then they'd both prefer to have the other one be the precommitter.

This benefit of precommitment does not extend to the two option version (n2 vs. n1). In that version, player 2 is incentivized to say "n1" regardless of what player 1 commits to, so unilateral precommitment doesn't help them escape the Nash equilibrium, just as in the prisoner's dilemma.
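To make the sequential version concrete, here's a sketch in code. I'm assuming the payoff rule from the original post as I understand it (the lower number x goes to whoever named it, and the other player gets x minus a penny), scaled down to a £1 cap so the search is instant; ties are glossed over since player 2 strictly undercuts:

```python
MAX = 100  # cap in pence (scaled down from £1,000,000 for speed)

def payoffs(a, b):
    """(player 1, player 2) payoffs in pence when they name a and b."""
    if a < b:
        return a, a - 1
    if b < a:
        return b - 1, b
    return a - 1, a - 1  # tie handling is a guess; it doesn't matter below

def best_undercut(a):
    """Player 2's best strict undercut of a visible commitment a."""
    return max(range(1, a), key=lambda b: payoffs(a, b)[1])

b = best_undercut(MAX)
print(b, payoffs(MAX, b))  # player 2 ends up a penny better off than player 1
```

Compare that to the Nash equilibrium of the simultaneous game, where both name a penny: the visible commitment leaves player 1 slightly behind player 2 but makes both vastly better off.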

Comment by unnamed on How much funding and researchers were in AI, and AI Safety, in 2018? · 2019-03-30T00:39:52.063Z · score: 20 (4 votes) · LW · GW

Some numbers related to c (how many capabilities researchers):

In 2018 about 8,500 people attended NeurIPS and about 4,000 people attended ICML. There are about 2,000 researchers who work at Google AI, and in December 2017 there were reports that about 700 total people work at DeepMind including about 400 with a PhD.

Turning this into a single estimate for "number of researchers" is tricky for the sorts of reasons that catherio gives. "Capabilities researchers" is a fuzzy category: it's not clear to what extent it should include people who are primarily working on applications of the current art, or people who are primarily working on advancing the state of the art in narrower subfields, rather than advancing the state of the art in general AI capabilities. Also, obviously only some fraction of the relevant researchers attended those conferences or work at those companies.

I'll suggest 10,000 people as a rough order-of-magnitude estimate. I'd be surprised if the number that came out of a more careful estimation process wasn't within a factor of ten of that.

Comment by unnamed on Blackmail · 2019-02-20T06:30:00.201Z · score: 11 (7 votes) · LW · GW

After discussing this offline, I think the main argument that I laid out does not hold up well in the case of blackmail (though it works better for many other kinds of threats). The key bit is here:

if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too)

This only looks at the effects on Alice and on Bob, as a simplification. But with blackmail "carrying out the threat" means telling other people information about Bob, and that is often useful for those other people. If Alice tells Casey something bad about Bob, that will often be bad for Bob but good for Casey. So it's not obviously negative sum for the whole world.

Comment by unnamed on Blackmail · 2019-02-19T23:06:02.111Z · score: 23 (11 votes) · LW · GW

There's a pretty simple economic argument for why blackmail is bad: it involves a negative-sum threat rather than a positive-sum deal. I was surprised to not see this argument in the econbloggers' discussion; good to see it come up here. To lay it out succinctly and separate from other arguments:

Ordinarily, when two people make a deal we can conclude that it's win-win because both of them chose to make the deal rather than just not interacting with each other. By default Alice would just act on her own preferences and completely ignore Bob's preferences, and the mirror image for Bob, but sometimes they find a deal where they each give up something in return for the other person doing something that they value even more. With some simplifying assumptions, the worst case scenario is that they don't reach a deal and they both break even (compared to if they hadn't interacted), and if they do reach a deal then they both wind up better off.

With a threat, Alice has an alternative course of action available which is somewhat worse for Alice than her default action but much worse for Bob, and Alice tells Bob that she will do the alternative action unless Bob does something for Alice. With some simplifying assumptions, if Bob agrees to give in then their interaction is zero-sum (Alice gets a transfer from Bob), if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too), and if Bob refuses and Alice backs down then it's zero sum (both take the default action).

Ordinary deals add value to the world and threats subtract value from the world, and blackmail is a type of threat.

If we remove some simplifying assumptions (e.g. no transaction costs, one-shot interaction) then things get more complicated, but mostly in ways that make ordinary deals better and threats worse. In the long run deals bring people together as they seek more interactions which could lead to win-win deals, deals encourage people to invest in abilities that will make them more useful to other people so that they'll have more/better opportunities to make deals, and the benefits of deals must outweigh the transaction costs & risks involved at least in expectation (otherwise people would just opt out of trying to make those deals). Whereas threats push people apart as they seek to avoid negative sum interactions, threats encourage people to invest in abilities that make them more able to harm other people, and transaction costs increase the badness of threats (turning zero sum interactions into negative sum) but don't prevent those interactions unless they drive the threatmaker's returns down far enough.
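A toy payoff table (all numbers made up) may make the surplus accounting above easier to see:

```python
# (Alice, Bob) payoffs relative to each player's default action.
outcomes = {
    "deal reached":             (3, 5),     # win-win: positive sum
    "no deal":                  (0, 0),     # both break even
    "threat: Bob gives in":     (4, -4),    # pure transfer: zero sum
    "threat: carried out":      (-1, -10),  # negative sum
    "threat: Alice backs down": (0, 0),     # zero sum: default actions
}

for name, (alice, bob) in outcomes.items():
    print(f"{name:26s} total surplus {alice + bob:+d}")
```

The deal branch is the only one that creates surplus; the best the threat branch can do is break even, and its worst case destroys value.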

Comment by unnamed on Epistemic Tenure · 2019-02-19T21:57:47.387Z · score: 32 (17 votes) · LW · GW

I think that there's a spectrum between treating someone as a good source of conclusions and treating them as a good source of hypotheses.

I can have thoughts like "Carol looked closely into the topic and came away convinced that Y is true, so for now I'm going to act as if Y is probably true" if I take Carol to be a good source of conclusions.

Whereas if I took Alice to be a good source of hypotheses but not a good source of conclusions, then I would instead have thoughts like "Alice insists that Z is true, so Z seems like something that's worth thinking about more."

Giving someone epistemic tenure as a source of conclusions seems much more costly than giving them epistemic tenure as a source of hypotheses.

Comment by unnamed on Avoiding Jargon Confusion · 2019-02-18T05:09:09.594Z · score: 4 (2 votes) · LW · GW

Huh? I am sufficiently surprised/confused by this example to want a citation.

Edit: The surprise/confusion was in reference to the pre-edit version of the above comment, and does not apply to the current edition.

Comment by Unnamed on [deleted post] 2019-02-16T22:35:31.453Z

I think we should take more care to separate the question of whether AI development will be decentralized from the question of whether decentralization is safer. It is not obvious to me whether a decentralized, economy-wide path to advanced AIs will be safer or riskier than a concentrated path within a single organization. It seems like the opening sentence of this question carries the assumption that decentralized is safer ("Robin Hanson has argued that those who believe AI Risk to be a primary concern for humanity, are suffering from a bias toward thinking that concentration of power is always more efficient than a decentralised system").

Comment by unnamed on Greatest Lower Bound for AGI · 2019-02-06T09:02:52.496Z · score: 4 (3 votes) · LW · GW

I think you mean 50/62 = 0.81?

Comment by unnamed on The Valley of Bad Theory · 2018-10-13T06:16:47.569Z · score: 2 (1 votes) · LW · GW

Sometimes theory can open up possibilities rather than closing them off. In these cases, once you have a theory that claims that X is important, then you can explore different values of X and do local hill-climbing. But before that it is difficult to explore by varying X, either because there are too many dimensions or because there is some subtlety in recognizing that X is a dimension and being able to vary its level.

This depends on being able to have and use a theory without believing it.

Comment by unnamed on Does This Fallacy Have A Name? · 2018-10-03T05:19:18.138Z · score: 6 (5 votes) · LW · GW

This sounds most similar to what LWers call generalizing from one example or the typical mind fallacy and to what psychologists call the false-consensus effect or egocentric bias.

Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-14T20:24:35.670Z · score: 15 (5 votes) · LW · GW

Here are relatively brief responses on these 3 particular points; I’ve made a separate comment which lays out my thinking on metrics like the Big 5 which provides some context for these responses.

We have continued to collect measures like the ones in the 2015 longitudinal study. We are mainly analyzing them in large batches, rather than workshop to workshop, because the sample size isn’t big enough to distinguish signal from noise for single workshops. One of the projects that I’m currently working on is an analysis of a couple years of these data.

The 2017 impact report was not intended as a comprehensive account of all of CFAR's metrics; it focused specifically on CFAR's EA impact. So it looked at the data that were most directly related to CFAR alums' impact on the world, and "on average alums have some increase in conscientiousness" seemed less relevant than the information that we did include. The first few paragraphs of the report say more about this.

I’m curious why you’re especially interested in Raven’s Progressive Matrices. I haven’t looked closely at the literature on it, but my impression is that it’s one of many metrics which are loosely related to the thing that we mean by “rationality.” It has the methodological advantage of being a performance score rather than self-report (though this is partially offset by the possibility of practice effects and effort effects). The big disadvantage is the one that Kaj pointed to: it seems to track relatively stable aspects of a person’s thinking skills, and might not change much even if a person made large improvements. For instance, I could imagine a person developing MacGyver-level problem-solving ability while having little or no change in their Raven’s score.

Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-14T20:20:34.515Z · score: 22 (7 votes) · LW · GW

Here’s a sketch of my thinking about the usefulness of metrics like the Big 5 for what CFAR is trying to do.

It would be convenient if there was a definitive measure of a person’s rationality which closely matched what we mean by the term and was highly sensitive to changes. But as far as I can tell there isn’t one, and there isn’t likely to be one anytime soon. So we rely on a mix of indicators, including some that are more like systematic metrics, some that are more like individuals’ subjective impressions, and some that are in between.

I think of the established psychology metrics (Big 5, life satisfaction, general self-efficacy, etc.) as primarily providing a sanity check on whether the workshop is doing something, along with a very very rough picture of some of what it is doing. They are quantitative measures that don’t rely on staff members’ subjective impressions of participants, they have been validated (at least to some extent) in existing psychology research, and they seem at least loosely related to the effects that CFAR hopes to have. And, compared to other ways of evaluating CFAR’s impact on individuals, they’re relatively easy for an outsider to make sense of.

A major limitation of these established psychology metrics is that they haven’t been that helpful as feedback loops. One of the main purposes of a metric is to provide input into CFAR’s day-to-day and workshop-to-workshop efforts to develop better techniques and refine the workshop. That is hard to do with metrics like the ones in the longitudinal study, because of a combination of a few factors:

  1. The results aren’t available until several months after the workshop, which would make for very slow feedback loops and iteration.
  2. The results are too noisy to tell if changes from one workshop to the next are just random variation. It takes several workshops worth of data to get a clear signal on most of the metrics.
  3. These metrics are only loosely related to what we care about. If a change to the workshop leads to larger increases in conscientiousness that does not necessarily mean that we want to do it, and when a curriculum developer is working on a class they are generally not that interested in these particular metrics.
  4. These metrics are relatively general/coarse indicators of the effect of the workshop as a whole, not tied to particular inputs. So (for example) if we make some changes to the TAPs class and want to see if the new version of the class works better or worse, there isn’t a metric that isolates the effects of the TAPs class from the rest of the workshop.
Comment by unnamed on No standard metric for CFAR workshops? · 2018-09-06T18:59:04.994Z · score: 50 (11 votes) · LW · GW

(This is Dan from CFAR)

CFAR's 2015 Longitudinal Study measured the Big 5 and some other standard psychology metrics. It did find changes including decreased neuroticism and increased conscientiousness.

Comment by unnamed on Birth order effect found in Nobel Laureates in Physics · 2018-09-05T20:28:18.269Z · score: 16 (7 votes) · LW · GW

Seems interesting to get data on:

Some group that isn't heavily selected for intelligence / intellectual curiosity: skateboarders, protestors, professional hockey players...

Some non-STEM group that is selected for success based on mental skills: literature laureates, governors, ...

Not sure which groups it would be easy to get data on.

There is also the option of looking into existing research on birth order to see what groups other people have already looked at.

Comment by unnamed on nostalgebraist - bayes: a kinda-sorta masterpost · 2018-09-04T23:45:56.702Z · score: 28 (11 votes) · LW · GW

Seems worth noting that nostalgebraist published this post in June 2017, which was (for example) before Eliezer's post on toolbox thinking.

Comment by unnamed on Birth order effect found in Nobel Laureates in Physics · 2018-09-04T22:41:38.976Z · score: 27 (13 votes) · LW · GW

Now that we have data on LWers/SSCers, mathematicians, and physicists, if anyone wants to put more work into this I'd like to see them look someplace different. We don't want to fall into the Wason 2-4-6 trap of only looking for birth order effects among smart STEM folks. We want data that can distinguish Scott's intelligence / intellectual curiosity hypothesis from other possibilities like some non-big-5 personality difference or a general firstborns more likely phenomenon.

Comment by unnamed on Historical mathematicians exhibit a birth order effect too · 2018-08-21T07:14:29.942Z · score: 6 (3 votes) · LW · GW

For each mathematician, actual firstbornness was coded as 0 or 1, and expected firstbornness as 1/n (where n is the number of children that their parents had). Then we just did a paired t-test, which is equivalent to subtracting actual minus expected for each data point and then doing a one sample t-test against a mean of 0. You can see this all in Eli's spreadsheet here; the data are also all there for you to try other statistical tests if you want to.
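For anyone who wants to replicate the test, here's a sketch with made-up data (not Eli's actual spreadsheet); each tuple is (firstborn?, total number of children the parents had):

```python
import math

# Hypothetical example rows: (actual firstbornness 0/1, family size n)
data = [(1, 2), (1, 3), (0, 4), (1, 2), (0, 3), (1, 5), (1, 2), (0, 2)]

# actual firstbornness minus expected firstbornness 1/n
diffs = [actual - 1.0 / n for actual, n in data]

m = len(diffs)
mean = sum(diffs) / m
var = sum((d - mean) ** 2 for d in diffs) / (m - 1)   # sample variance
t = mean / math.sqrt(var / m)   # one-sample t statistic against a mean of 0

print(f"mean excess firstbornness = {mean:.3f}, t = {t:.2f} on {m - 1} df")
```

A paired t-test on the (actual, expected) columns gives exactly the same t, since it just tests whether these differences average to zero.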

Comment by Unnamed on [deleted post] 2018-08-14T07:53:29.520Z

You could think of CEV applied to a single unitary agent as a special case where achieving coherence is trivial. It's an edge case where the problem becomes easier, rather than an edge case where the concepts threaten to break.

This terminology does make it harder to talk about several agents who each separately have their own extrapolated volition (as you were trying to do in your original comment in this thread). And replacing it with Personal Extrapolated Volition only helps a little, if we also want to talk about several separate groups that each have their own within-group extrapolated volition (coherent within each group but not between groups).

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T05:40:09.291Z · score: 10 (4 votes) · LW · GW

Looking at the math of dividing a fixed pool of resources among a non-fixed number of people, a feature of log(r) that matters a lot is that log(r)<0 for r<1. The first unit of resources that you give to a person is essentially wasted, because it just gets them up to 0 utility (which is no better than just having 1 fewer person around).

That favors having fewer people, so that you don't have to keep wasting that first unit of resource on each person. If the utility function for a person in terms of their resources was f(r)=r-1 you would similarly find that it is best not to have too many people (in that case having exactly 1 person would work best).

Whereas if it was f(r)=sqrt(r) then it would be best to have as many people as possible, because you're starting from 0 utility at 0 resources and sqrt is steepest right near 0. Doing the calculation: if you have R units of resources divided equally among N people, the total utility is N*sqrt(R/N) = sqrt(RN), which keeps growing as N increases. log(1+r) is similar to sqrt - the total utility N*log(1+R/N) also increases with N - but it is bounded if R is fixed and just approaches that bound (if we use natural log, that bound is just R).

To sum up: diminishing marginal utility favors having more people each with fewer resources (in addition to favoring equal distribution of resources), f(0)<0 favors having fewer people each with more resources (to avoid "wasting" the bit of resources that get a person up to 0 utility), and functions with both features like log(r) favor some intermediate solution with a moderate population size.
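A quick numeric check of the calculations above (R = 100 and the population sizes are arbitrary choices):

```python
import math

R = 100.0  # fixed pool of resources, split equally among N people

def total(f, N):
    """Total utility when N people each get R/N resources."""
    return N * f(R / N)

# sqrt: total utility equals sqrt(R*N), growing without bound in N
for N in (1, 10, 100):
    assert abs(total(math.sqrt, N) - math.sqrt(R * N)) < 1e-9

# log(1+r): total utility increases with N but approaches the bound R
for N in (1, 100, 10_000, 1_000_000):
    print(N, round(total(lambda r: math.log(1 + r), N), 3))
```

The printed totals climb toward R = 100 without ever reaching it, matching the bound described above.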

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T03:26:14.442Z · score: 4 (2 votes) · LW · GW

Total utilitarianism does imply the repugnant conclusion, very straightforwardly.

For example, imagine that world A has 1000000000000000000 people each with 10000000 utility and world Z has 10000000000000000000000000000000000000000 people each with 0.0000000001 utility. Which is better?

Total utilitarianism says that you just multiply. World A has 10^18 people x 10^7 utility per person = 10^25 total utility. World Z has 10^40 people x 10^-10 utility per person = 10^30 total utility. World Z is way better.

This seems repugnant; intuitively world Z is much worse than world A.

Parfit went through cleverer steps because he wanted his argument to apply more generally, not just to total utilitarianism. Even much weaker assumptions can get to this repugnant-seeming conclusion that a world like Z is better than a world like A.

The point is that lots of people are confused about axiology. When they try to give opinions about population ethics, judging in various scenarios whether one hypothetical world is better than another, they'll wind up making judgments that are inconsistent with each other.

Comment by unnamed on Logarithms and Total Utilitarianism · 2018-08-13T03:13:11.641Z · score: 2 (1 votes) · LW · GW

The paragraph that I was quoting from was just about diminishing marginal utility and equality/redistribution, not about the repugnant conclusion in particular.

Comment by unnamed on Open Thread August 2018 · 2018-08-12T05:13:29.957Z · score: 4 (2 votes) · LW · GW

The beta distribution is often used to represent this type of scenario. It is straightforward to update in simple cases where you get more data points, though it's not straightforward to update based on messier evidence like hearing someone's opinion.
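A minimal sketch of that conjugate update (the prior parameters and the counts here are arbitrary):

```python
def update(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior -> Beta(alpha + s, beta + f) posterior."""
    return alpha + successes, beta + failures

def mean(alpha, beta):
    """Posterior mean estimate of the success probability."""
    return alpha / (alpha + beta)

a, b = 1, 1                   # uniform prior over the unknown probability
a, b = update(a, b, 7, 3)     # observe 7 successes and 3 failures
print(a, b, mean(a, b))       # -> 8 4 0.666...
```

This simple counting update is exactly what breaks down for messier evidence like someone's opinion, which doesn't arrive as clean success/failure counts.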