Using smart thermometer data to estimate the number of coronavirus cases 2020-03-23T04:26:32.890Z · score: 30 (9 votes)
Case Studies Highlighting CFAR’s Impact on Existential Risk 2017-01-10T18:51:53.178Z · score: 4 (5 votes)
Results of a One-Year Longitudinal Study of CFAR Alumni 2015-12-12T04:39:46.399Z · score: 35 (35 votes)
The effect of effectiveness information on charitable giving 2014-04-15T16:43:24.702Z · score: 15 (16 votes)
Practical Benefits of Rationality (LW Census Results) 2014-01-31T17:24:38.810Z · score: 16 (17 votes)
Participation in the LW Community Associated with Less Bias 2012-12-09T12:15:42.385Z · score: 34 (34 votes)
[Link] Singularity Summit Talks 2012-10-28T04:28:54.157Z · score: 8 (11 votes)
Take Part in CFAR Rationality Surveys 2012-07-18T23:57:52.193Z · score: 18 (19 votes)
Meetup : Chicago games at Harold Washington Library (Sun 6/17) 2012-06-13T04:25:05.856Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups Resume 5/26 2012-05-16T17:53:54.836Z · score: 0 (1 votes)
Meetup : Weekly Chicago Meetups 2012-04-12T06:14:54.526Z · score: 2 (3 votes)
[LINK] Being proven wrong is like winning the lottery 2011-10-29T22:40:12.609Z · score: 29 (30 votes)
Harry Potter and the Methods of Rationality discussion thread, part 8 2011-08-25T02:17:00.455Z · score: 8 (13 votes)
[SEQ RERUN] Failing to Learn from History 2011-08-09T04:42:37.325Z · score: 4 (5 votes)
[SEQ RERUN] The Modesty Argument 2011-04-23T22:48:04.458Z · score: 6 (7 votes)
[SEQ RERUN] The Martial Art of Rationality 2011-04-19T19:41:19.699Z · score: 7 (8 votes)
Introduction to the Sequence Reruns 2011-04-19T19:39:41.706Z · score: 6 (9 votes)
New Less Wrong Feature: Rerunning The Sequences 2011-04-11T17:01:59.047Z · score: 33 (36 votes)
Preschoolers learning to guess the teacher's password [link] 2011-03-18T04:13:23.945Z · score: 23 (26 votes)
Harry Potter and the Methods of Rationality discussion thread, part 7 2011-01-14T06:49:46.793Z · score: 7 (10 votes)
Harry Potter and the Methods of Rationality discussion thread, part 6 2010-11-27T08:25:52.446Z · score: 6 (9 votes)
Harry Potter and the Methods of Rationality discussion thread, part 3 2010-08-30T05:37:32.615Z · score: 5 (8 votes)
Harry Potter and the Methods of Rationality discussion thread 2010-05-27T00:10:57.279Z · score: 34 (35 votes)
Open Thread: April 2010, Part 2 2010-04-08T03:09:18.648Z · score: 3 (4 votes)
Open Thread: April 2010 2010-04-01T15:21:03.777Z · score: 4 (5 votes)


Comment by unnamed on Has LessWrong been a good early alarm bell for the pandemic? · 2020-04-04T03:55:49.340Z · score: 5 (3 votes) · LW · GW

In the broader rationality/EA community there was also a Siderea post on Jan 30 and an 80K podcast on Feb 3 (along with a followup podcast on Feb 14).

These two, plus Matthew Barnett's late Jan EA Forum post (which you linked), are the three examples I recall which look most like early visible public alarms from the rationality/EA community.

Other writing was less visible (e.g., on Twitter, Facebook, or Metaculus), less alarm-like (discussions of some aspect of what was happening rather than a call to attention), or later (like the putanumonit Seeing the Smoke post on Feb 27).

Comment by unnamed on Has LessWrong been a good early alarm bell for the pandemic? · 2020-04-03T22:34:03.635Z · score: 7 (4 votes) · LW · GW

I think this post is giving the stock market too much credit.

I'd date the start of the stock market fall as February 24 rather than February 20. The S&P close on Feb 20 & Feb 21 was roughly the same as it had been over the previous couple weeks, and higher than the close on Feb 7, 5, 4, or 3. The first notable dip happened on February 24th; that was the first day that set a low for the month of Feb 2020 (and Feb 25 was the first day that set a low for calendar year 2020).

Also, that was just the start of the crash. The stock market continued falling sharply and erratically for a couple more weeks, and didn't get within 10% of its current level until March 12th (2.5 weeks after it started its fall on Feb 24).

Comment by unnamed on April Fools: Announcing LessWrong 3.0 – Now in VR! · 2020-04-01T08:50:51.188Z · score: 38 (17 votes) · LW · GW

This is now my favorite way to read HPMOR. I love the Star Wars feel.

Comment by unnamed on mind viruses about body viruses · 2020-03-29T05:15:26.059Z · score: 2 (1 votes) · LW · GW

I think Scott linked to Pueyo's essay as an illustration of the ideas, not as the source from which the smart people got the ideas.

If so, this post's attempt to track & evaluate the information flows is working from an inaccurate map of how the information actually flowed.

Comment by unnamed on March Coronavirus Open Thread · 2020-03-25T22:34:30.279Z · score: 4 (2 votes) · LW · GW

Keep in mind that the trend in the number of confirmed cases only provides hints about the trend in new infections. The number of confirmed cases is highly dependent on the amount of testing, and increases in testing capacity will tend to lead to more confirmed cases. Also, there is a substantial delay between when a person is infected and when they test positive, typically somewhere in the range of 1-2 weeks (with the length of the delay also depending on the testing regime).

Comment by unnamed on Using smart thermometer data to estimate the number of coronavirus cases · 2020-03-23T19:59:20.799Z · score: 2 (1 votes) · LW · GW

I think that's right, although the data can still tell us something once we're in that ambiguous range where it's hard to distinguish increasing covid from decreasing flu.

One nice thing about this pattern is that it provides some evidence that the anti-covid interventions are reducing the spread of fever-inducing diseases. And the size of the drop in total fevers tells us something about how well they're working on the whole, even if it doesn't tell us the precise trend in covid cases.

Another thing that might be possible is to find other sources of data on the actual prevalence of flu, and use that to come up with a better "baseline" which reflects actual current conditions rather than an estimate of the trendline in the counterfactual world where there was no coronavirus pandemic.

A third thing is that 0 is a lower bound on the number of non-covid fevers, so the trend in total fevers is an upper bound on the number of covid cases.

This third thing already tells us something about Seattle (King County). Their peak in excess fevers happened March 9 at 1.76 scale points (observed minus expected), and the March 22 data show the total fevers at 2.77 scale points. As an upper bound, if those are all covid fevers, that is 1.6x as many new daily cases on March 22 compared to March 9. That's 13 days, and not even a full doubling in the number of daily new fevers. Which suggests that suppression there is either working or coming very close to working (even though the number of confirmed cases has kept curving upward, at least through March 21).
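As a sanity check, the upper-bound arithmetic can be reproduced in a few lines (a sketch, using the scale-point numbers quoted above; the implied doubling time is what "not even a full doubling in 13 days" amounts to):

```python
import math

# Seattle (King County) fever data as quoted above, in Kinsa "scale points".
excess_mar9 = 1.76   # observed minus expected fevers at the Mar 9 peak
total_mar22 = 2.77   # total fevers on Mar 22 (upper bound on covid fevers)

ratio = total_mar22 / excess_mar9  # ~1.6x over 13 days, as an upper bound
days = 13
doubling_time = days * math.log(2) / math.log(ratio)

print(round(ratio, 2))          # ~1.57
print(round(doubling_time, 1))  # ~20 days, far slower than uncontrolled spread
```

Even in this worst case (all fevers being covid), a ~20-day doubling time is much slower than the 3-5 day doubling seen in uncontrolled outbreaks.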

Comment by unnamed on Using smart thermometer data to estimate the number of coronavirus cases · 2020-03-23T19:38:56.459Z · score: 2 (1 votes) · LW · GW

If you look at the time series for King County (Seattle area), it shows a spike peaking on March 9 with the upward trend beginning sometime around Feb 28 - Mar 2.

I think the pattern of a spike and then flattening & maybe decline (which has happened at different times in different regions) reflects a drop in the number of influenza cases, as people's anti-covid precautions also prevent flu transmission. So the baseline estimate of how many new fevers there would be if there wasn't a coronavirus pandemic doesn't actually represent the number of non-covid fevers, because there are fewer non-covid fevers than there would've been without this pandemic.

Elizabeth's comment also describes this.

Comment by unnamed on How can we estimate how many people are C19 infected in an area? · 2020-03-23T04:34:34.877Z · score: 12 (4 votes) · LW · GW

Kinsa, a company that sells smart thermometers, has a dashboard that shows which regions of the US have an unusually high number of fevers. They have previously used these methods to track regional flu trends in the US. (FitBit has done something similar.)

I wrote a post here describing my attempt to turn their data into a rough estimate of the total number of coronavirus infections in the United States. Something similar could be done for smaller regions.

Comment by unnamed on Using the Quantified Self paradigma for COVID-19 · 2020-03-23T02:42:29.678Z · score: 11 (6 votes) · LW · GW

I agree that a lot could be done with those sorts of data.

One company that already is making some use of a similar dataset is Kinsa, who sells smart thermometers. They started a few years ago, tracking trends in the flu in the US based on the temperature readings of the people using their thermometers (along with location, age, and gender). Now they have a coronavirus tracking website up. It looks like the biggest useful thing that they've been able to do so far with their data is to quickly identify hotspots - parts of the country where there has been a spike in the number of people with a fever. That used to be a sign of a local flu outbreak, now it's a sign of a local coronavirus outbreak. From the NYTimes:

Just last Saturday, Kinsa’s data indicated an unusual rise in fevers in South Florida, even though it was not known to be a Covid-19 epicenter. Within days, testing showed that South Florida had indeed become an epicenter.

Companies like Fitbit could make a similar pivot, looking to see if they can find atypical trends in their data in the Seattle area Feb 28 - Mar 9, the Miami area Mar 2-19, etc. And they might be able to take the extra step of identifying new indicators that help identify individuals who may have coronavirus (unlike Kinsa, for whom high body temperature was already a known indicator).

There are potentially a bunch more useful things that could be done with all of these datasets, if more researchers had access to them. For example, it might be possible to get much more accurate estimates of the number of people who have been infected with coronavirus. I may make another post about this soon.

Comment by unnamed on COVID-19's Household Secondary Attack Rate Is Unknown · 2020-03-17T00:51:08.429Z · score: 9 (5 votes) · LW · GW

Has there been research from other similarish diseases breaking down the household secondary attack rate by relevant variables? It seems like there could be large differences between:

romantic partners who sleep in the same bed vs. housemates who sleep in different rooms

circumstances where the household has heightened concerns and is taking precautions vs. unsuspecting households

situations where people are removed from the household shortly after they're infected vs. households where people continue to live after infection

Group houses are mostly in the safer of the two possibilities for the first two of these three.

Comment by unnamed on A Significant Portion of COVID-19 Transmission Is Presymptomatic · 2020-03-16T05:13:24.053Z · score: 5 (3 votes) · LW · GW

I was looking at this paper (for other reasons) and saw that it estimated a mean serial interval of 6.3 days in Shenzhen while there was aggressive testing, contact tracing, and isolating. They report that the mean serial interval was 3.6 days among patients who were infected by someone who was isolated within 2 days of symptom onset, and 8.1 days among patients who were infected by someone who wasn't isolated until 3+ days after symptom onset, for an overall average serial interval of 6.3 in their population. They found R=0.4 - an average of 0.4 known transmissions from each infected person.

Comment by unnamed on March Coronavirus Open Thread · 2020-03-16T02:29:44.752Z · score: 2 (1 votes) · LW · GW

This paper looks at cases which were confirmed in Shenzhen (Guangdong, China) Jan 14 - Feb 12, which is while coronavirus was being brought under control there (by the end of the study the cases had fallen to less than 1/3 of their peak). I suspect that they qualify for point 1, a place with an unusually good testing regime.

The paper reports that "Cases detected through symptom-based surveillance were confirmed on average 5.5 days (95% CI 5.0, 5.9) after symptom onset (Figure 3, Table S2); compared to 3.2 days (95% CI 2.6,3.7) in those detected by contact-based surveillance", and also that the median incubation period was 4.8 days from infection to symptom onset (in the smaller sample where both of those dates were known).

Adding 5.5+4.8, that implies that an average of 10.3 days passed between when a person became infected and when they tested positive for cases detected based on symptoms, and 8.0 days for those detected by contact tracing. Since the paper reports that 77% of cases were detected through symptom-based surveillance, that gives an overall average of 9.8 days. (And this is only for the cases that were detected; it's not adjusting at all for people who were infected but never got a positive test.)

That means that in places where testing is as good as it was in Shenzhen, then the number of positive tests is telling us about the number of infections 9.8 days ago. If the number of cases in that region is doubling every 4 days, then that's 2.4 doublings, so the number of confirmed cases would only be 18% of the actual number of cases due to the delay in testing (again, without factoring in people who never got tested). (With a 3 day doubling period it would be 10%, with a 5 day doubling period 26%.)

So in places that don't have a good testing regime it would be significantly less than that.
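The arithmetic above can be reproduced in a short sketch (numbers as quoted from the paper):

```python
# Infection -> positive test delays, Shenzhen figures as quoted above.
delay_symptom = 5.5 + 4.8   # symptom-based surveillance: 10.3 days
delay_contact = 3.2 + 4.8   # contact-based surveillance: 8.0 days
frac_symptom = 0.77         # share of cases found via symptom-based surveillance

avg_delay = frac_symptom * delay_symptom + (1 - frac_symptom) * delay_contact

def detected_fraction(doubling_days, delay=avg_delay):
    """Confirmed cases as a fraction of true current cases, given the testing delay."""
    return 2 ** (-delay / doubling_days)

print(round(avg_delay, 1))             # 9.8
print(round(detected_fraction(4), 2))  # ~0.18 with a 4-day doubling period
print(round(detected_fraction(3), 2))  # ~0.10 with a 3-day doubling period
print(round(detected_fraction(5), 2))  # ~0.26 with a 5-day doubling period
```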

Comment by unnamed on A Significant Portion of COVID-19 Transmission Is Presymptomatic · 2020-03-14T08:18:38.858Z · score: 3 (2 votes) · LW · GW

Yeah, I agree that contact tracing & testing/quarantining contacts is good, and that presymptomatic transmission is possible.

It looked to me like you were claiming that the hypothesis "stopping all symptomatic transmission is sufficient to prevent the number of COVID-19 cases from curving upwards" has been tested by some countries' measures and found to be false, and I am questioning that apparent assertion.

Comment by unnamed on A Significant Portion of COVID-19 Transmission Is Presymptomatic · 2020-03-14T07:08:21.487Z · score: 3 (2 votes) · LW · GW

I notice that the estimates of serial interval (almost?) all come from places that had pretty aggressive & successful containment measures in place, such as identifying & isolating potential carriers (including people who show symptoms, traced contacts, and high-risk travelers). That would tend to shorten the serial interval, since people who are identified early in their infection lose the opportunity to transmit during the later portion of their illness.

Are there estimates of what R was for these populations? If it's a lot less than the 2-3 that other studies have found that would be some evidence that a lot of later-stage transmissions were prevented.

Comment by unnamed on A Significant Portion of COVID-19 Transmission Is Presymptomatic · 2020-03-14T06:57:11.547Z · score: 7 (3 votes) · LW · GW
COVID-19 is successfully spreading in countries which have taken these measures ["tell people to stay home if they have those symptoms"] and other more extreme measures

How true is this? I haven't delved in that closely, but my impression is that a big part of what's been successful in containing the spread in places like Hong Kong and mainland China has involved identifying & isolating people as soon as they show symptoms.

Comment by unnamed on March Coronavirus Open Thread · 2020-03-12T21:59:42.730Z · score: 4 (2 votes) · LW · GW

Here's a method to try to estimate the number of cases in a region which I haven't seen calculations of:

1. Identify the places which have the best testing regimes

2. Try to estimate what fraction of cases are identified in those places, potentially along with other variables like how long from infection until the case is identified

3. Use those numbers to extrapolate to other places, based on other similarities between those places besides # of confirmed cases (e.g., number of deaths, or rate of infection in travelers coming from that place, or hospital utilization rate)

I have made some initial attempts to do this, which I'll try to post later today. I'm wondering if anyone has thoughts or sources on any of these 3 points (e.g., which places have the best testing regimes?), or on the method as a whole.

Comment by unnamed on March Coronavirus Open Thread · 2020-03-12T08:35:35.499Z · score: 5 (3 votes) · LW · GW

I think each little bit of curve flattening makes things a little less bad (since a smaller number of cases are beyond capacity, and a little more time is created to prepare), but the graphs tend to draw the "capacity" line unrealistically high. This graph is more realistic than many since the flattened curve still peaks above the capacity line, but it still paints too rosy a picture.

Comment by unnamed on Growth rate of COVID-19 outbreaks · 2020-03-10T08:25:22.042Z · score: 5 (3 votes) · LW · GW

Agreed that #2 could be a big issue. Rapid increase in confirmed cases could easily be due to rapid increase in testing rather than (such) rapid spread of the virus.

What would the graphs look like if they plotted the number of deaths attributed to COVID-19 rather than the number of confirmed cases? In theory the number of deaths should mostly be a lagged & noisier reflection of the number of cases, with less dependence on testing regimes.

Comment by unnamed on Model estimating the number of infected persons in the bay area · 2020-03-09T10:11:40.978Z · score: 17 (3 votes) · LW · GW

I also made an estimate of the number of cases in the bay area, based on deaths and estimated death rate. My calculations are in this spreadsheet.

Comment by unnamed on 2018 Review: Voting Results! · 2020-01-26T21:55:32.744Z · score: 9 (4 votes) · LW · GW
Pearson correlation between karma and vote count is 0.355

And it's even stronger in magnitude (r = -0.46) between amount of karma and ranking in the vote.

Comment by unnamed on Modest Superintelligences · 2020-01-26T00:04:10.067Z · score: 2 (1 votes) · LW · GW

Oh, you're right.

With A & B iid normal variables, if you take someone who is 1 in a billion at A+B, then in the median case they will be 1 in 90,000 at A. Then if you take someone who is 1 in 90,000 at A and give them the median level of B, they will be 1 in 750 at A+B.

(You can get to rarer levels by reintroducing some of the variation rather than taking the median case twice.)
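The calculation can be checked with the standard library's normal distribution (a sketch; A and B are taken as iid standard normals, so A+B has stdev sqrt(2)):

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal; A, B iid ~ Z, so A+B has stdev sqrt(2)

z_sum = Z.inv_cdf(1 - 1e-9)      # ~6.0: 1-in-a-billion on the A+B scale
a_median = z_sum / math.sqrt(2)  # median split: A and B contribute equally
rarity_a = 1 / Z.cdf(-a_median)  # how rare that A value is on its own

# Now take someone that rare at A and give them the median (0) on B:
z_new_sum = a_median / math.sqrt(2)  # their A+B, in A+B standard deviations
rarity_sum = 1 / Z.cdf(-z_new_sum)

print(round(rarity_a))    # ~90,000
print(round(rarity_sum))  # ~740, i.e. roughly the "1 in 750" above
```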

Comment by unnamed on Modest Superintelligences · 2020-01-25T06:50:13.689Z · score: 4 (2 votes) · LW · GW

The component should have a smaller standard deviation, though. If A and B each have stdev=1 & are independent then A+B has stdev=sqrt(2).

I think that means that we'd expect someone who is +6 sigma on A+B to be about +3*sqrt(2) sigma on A in the median case. That's +4.24 sigma, or 1 in 90,000.

Comment by unnamed on Modest Superintelligences · 2020-01-25T06:43:37.322Z · score: 2 (1 votes) · LW · GW

500 seems too small. If someone is 1 in 30,000 on A and 1 in 30,000 on B, then about 1 in a billion will be at least as extreme as them on both A and B. That's not exactly the number that we're looking for but it seems like it should give the right order of magnitude (30,000 rather than 500).

And it seems like the answer we're looking for should be larger than 30,000, since people who are more extreme than them on A+B includes everyone who is more extreme than them on both A and B, plus some people who are more extreme on only either A or B. That would make extreme scores on A+B more common, so we need a larger number than 30,000 to keep it as rare as 1 in a billion.

Comment by Unnamed on [deleted post] 2020-01-20T03:17:24.562Z

The popular conception of Dunning-Kruger has strayed from what's in Kruger & Dunning's research. Their empirical results look like this, not like the "Mt. Stupid" graph.

Comment by unnamed on The Tails Coming Apart As Metaphor For Life · 2020-01-16T00:26:58.730Z · score: 2 (1 votes) · LW · GW
the most interesting takeaway here is not the part where predictor regressed to the mean, but that extreme things tend to be differently extreme on different axis.

Even though the two variables are strongly correlated, things that are extreme on one variable are somewhat closer to the mean on the other variable.

Comment by unnamed on The Tails Coming Apart As Metaphor For Life · 2020-01-16T00:24:51.988Z · score: 2 (1 votes) · LW · GW

I think they're close to identical. "The tails come apart", "regression to the mean", "regressional Goodhart", "the winner's curse", "the optimizer's curse", and "the unilateralist's curse" are all talking about essentially the same statistical phenomenon. They come at it from different angles, and highlight different implications, and are evocative of different contexts where it is relevant to account for the phenomenon.
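The shared phenomenon is easy to see in a quick simulation (a sketch with an assumed correlation of 0.8): even with strongly correlated variables, the items most extreme on one variable sit noticeably closer to the mean on the other.

```python
import math
import random

random.seed(0)

# Two strongly correlated variables: y = rho*x + noise, both standard normal.
rho = 0.8
data = []
for _ in range(100_000):
    x = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    data.append((x, y))

# Among the top 1% on x, the average y regresses toward the mean:
top_x = sorted(data, reverse=True)[:1000]
avg_x = sum(x for x, _ in top_x) / 1000
avg_y = sum(y for _, y in top_x) / 1000

print(round(avg_x, 2))  # ~2.67 sigma on x
print(round(avg_y, 2))  # ~2.13 sigma on y, roughly rho * avg_x
```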

Comment by unnamed on How would we check if "Mathematicians are generally more Law Abiding?" · 2020-01-13T02:03:48.525Z · score: 16 (7 votes) · LW · GW

Eric Schwitzgebel has done studies on whether moral philosophers behave more ethically (e.g., here). Some of the measures from that research seem to match reasonably well with law-abidingness (e.g., returning library books, paying conference registration fees, survey response honesty) and could be used in studies of mathematicians.

Comment by unnamed on Are "superforecasters" a real phenomenon? · 2020-01-09T23:15:15.412Z · score: 10 (5 votes) · LW · GW
A better sentence should give the impression that, by way of analogy, some basketball players are NBA players.

This analogy seems like a good way of explaining it. Saying (about forecasting ability) that some people are superforecasters is similar to saying (about basketball ability) that some people are NBA players or saying (about chess ability) that some people are Grandmasters. If you understand in detail the meaning of any one of these claims (or a similar claim about another domain besides forecasting/basketball/chess), then most of what you could say about that claim would port over pretty straightforwardly to the other claims.

Comment by unnamed on Are "superforecasters" a real phenomenon? · 2020-01-09T03:45:37.707Z · score: 13 (5 votes) · LW · GW

I don't see much disagreement between the two sources. The Vox article doesn't claim that there is much reason for selecting the top 2% rather than the top 1% or the top 4% or whatever. And the SSC article doesn't deny that the people who scored in the top 2% (and are thereby labeled "Superforecasters") systematically do better than most at forecasting.

I'm puzzled by the use of the term "power law distribution". I think that the GJP measured forecasting performance using Brier scores, and Brier scores are always between 0 and 1, which is the wrong shape for a fat-tailed distribution. And the next sentence (which begins "that is") isn't describing anything specific to power law distributions. So probably the Vox article is just misusing the term.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T09:34:23.973Z · score: 21 (6 votes) · LW · GW

(This is Dan, from CFAR since 2012)

Working at CFAR (especially in the early years) was a pretty intense experience, which involved a workflow that regularly threw you into these immersive workshops, and also regularly digging deeply into your thinking and how your mind works and what you could do better, and also trying to make this fledgling organization survive & function. I think the basic thing that happened is that, even for people who were initially really excited about taking this on, things looked different for them a few years later. Part of that is personal, with things like burnout, or feeling like they’d gotten their fill and had learned a large chunk of what they could from this experience, or wanting a life full of experiences which were hard to fit in to this (probably these 3 things overlap). And part of it was professional, where they got excited about other projects for doing good in the world while CFAR wanted to stay pretty narrowly focused on rationality workshops.

I’m tempted to try to go into more detail, but it feels like that would require starting to talk about particular individuals rather than the set of people who were involved in early CFAR, and I feel weird about that.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T09:24:44.113Z · score: 26 (8 votes) · LW · GW

(This is Dan from CFAR)

In terms of what happened that day, the article covers it about as well as I could. There’s also a report from the sheriff’s office which goes into a bit more detail about some parts.

For context, all four of the main people involved live in the Bay Area and interact with the rationality community. Three of them had been to a CFAR workshop. Two of them are close to each other, and CFAR had banned them prior to the reunion based on a bunch of concerning things they’ve done. The other two I’m not sure how they got involved.

They have made a bunch of complaints about CFAR and other parts of the community (the bulk of which are false or hard to follow), and it seems like they were trying to create a big dramatic event to attract attention. I’m not sure quite how they expected it to go.

This doesn’t seem like the right venue to go into details to try to sort out the concerns about them or the complaints they’ve raised; there are some people looking into each of those things.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:39:04.234Z · score: 22 (7 votes) · LW · GW

Not precise at all. The confidence interval is HUGE.

stdev = 5.9 (without Bessel's correction)

std error = 2.6

95% CI = (0.5, 10.7)

The confidence interval should not need to go that low. Maybe there's a better way to do the statistics here.
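For concreteness, here's the interval reconstructed from the summary stats (a sketch, assuming the n = 5 poll with mean 5.6 mentioned elsewhere in this AMA):

```python
import math

# Poll summary: 5 staff answers, mean 5.6; stdev 5.9 without Bessel's correction.
n, mean, sd = 5, 5.6, 5.9
se = sd / math.sqrt(n)                         # ~2.6
lo, hi = mean - 1.96 * se, mean + 1.96 * se    # normal-approximation 95% CI

print(round(se, 1))                  # 2.6
print(round(lo, 1), round(hi, 1))    # roughly (0.4, 10.8)
```

With n = 5 a t-interval (t ≈ 2.78 at 4 df) would be wider still, and since the underlying answers can't be negative, a normal-approximation interval reaching down near 0 is itself a sign the approximation is strained.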

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:30:41.401Z · score: 24 (9 votes) · LW · GW

(This is Dan from CFAR)

Warning: this sampling method contains selection effects.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:28:32.024Z · score: 22 (6 votes) · LW · GW

(This is Dan, from CFAR since June 2012)

These are more like “thoughts sparked by Duncan’s post” rather than “thoughts on Duncan’s post”. Thinking about the question of how well you can predict what a workshop experience will be like if you’ve been at a workshop under different circumstances, and looking back over the years...

In terms of what it’s like to be at a mainline CFAR workshop, as a first approximation I’d say that it has been broadly similar since 2013. Obviously there have been a bunch of changes since January 2013 in terms of our curriculum, our level of experience, our staff, and so on, but if you’ve been to a mainline workshop since 2013 (and to some extent even before then), and you’ve also had a lifetime full of other experiences, your experience at that mainline workshop seems like a pretty good guide to what a workshop is like these days. And if you haven’t been to a workshop and are wondering what it’s like, then talking to people who have been to workshops since 2013 seems like a good way to learn about it.

More recent workshops are more similar to the current workshop than older ones. The most prominent cutoff that comes to mind for more vs. less similar workshops is the one I already mentioned (Jan 2013), which is the first time that we basically understood how to run a workshop. The next cutoff that comes to mind is January 2015, which is when the current workshop arc & structure clicked into place. The next is July 2019, the second workshop run by something like the current team and the first where we hit our stride (it was also the first one after we started this year's instructor training, which I think helped). And after that is sometime in 2016, I think, when the main classes reached something resembling their current form.

Besides recency, it’s also definitely true that the people at the workshop bring a different feel to it. European workshops have a different feel than US workshops because so many of the people there are from somewhat different cultures. Each staff member brings a different flavor - we try to have staff who approach things in different ways, partly in order to span more of the space of possible ways that it can look like to be engaging with this rationality stuff. The workshop MC (which was generally Duncan’s role while he was involved) does impart more of their flavor on the workshop than most people, although for a single participant their experience is probably shaped more by whichever people they wind up connecting with the most and that can vary a lot even between participants at the same workshop.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T08:02:58.758Z · score: 27 (10 votes) · LW · GW

I don’t think that time is my main constraint, but here are some of my blog post shaped ideas:

  • Taste propagates through a medium
  • Morality: do-gooding and coordination
  • What to make of ego depletion research
  • Taboo "status"
  • What it means to become calibrated
  • The NFL Combine as a case study in optimizing for a proxy
  • The ability to paraphrase
  • 5 approaches to epistemics
Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T07:53:59.413Z · score: 35 (9 votes) · LW · GW

(This is Dan from CFAR)

I did a quick poll of 5 staff members and the average answer was 5.6.

Comment by unnamed on We run the Center for Applied Rationality, AMA · 2019-12-22T07:52:42.100Z · score: 13 (5 votes) · LW · GW

(This is Dan from CFAR)

Guided By The Beauty Of Our Weapons

Asymmetric vs. symmetric tools is now one of the main frameworks that I use to think about rationality (although I wish we had better terminology for it). A rationality technique (as opposed to a productivity hack or a motivation trick or whatever) helps you get more done on something in cases where getting more done is a good idea.

This wasn’t a completely new idea when I read Scott’s post about it, but the post seems to have helped a lot with getting the framework to sink in.

Comment by unnamed on Bayesian examination · 2019-12-10T02:57:43.212Z · score: 2 (1 votes) · LW · GW

I recall hearing about classes at Carnegie Mellon (in the Social and Decision Sciences department) which gave exams in this sort of format.

Comment by unnamed on Integrity and accountability are core parts of rationality · 2019-11-14T09:11:00.489Z · score: 13 (3 votes) · LW · GW

Related: Integrity for consequentialists by Paul Christiano

Comment by unnamed on How do you assess the quality / reliability of a scientific study? · 2019-11-12T22:38:05.665Z · score: 31 (11 votes) · LW · GW

Context: My experience is primarily with psychology papers (heuristics & biases, social psych, and similar areas), and it seems to generalize pretty well to other social science research and fields with similar sorts of methods.

One way to think about this is to break it into three main questions:

1. Is this "result" just noise? Or would it replicate?

2. (If there's something besides noise) Is there anything interesting going on here? Or are all the "effects" just confounds, statistical artifacts, demonstrations of the obvious, etc.?

3. (If there is something interesting going on here) What is going on here? What's the main takeaway? What can we learn from this? Does it support the claim that some people are tempted to use it to support?

There is some benefit just to explicitly considering all three questions, and keeping them separate.

For #1 ("Is this just noise?") people apparently do a pretty good job of predicting which studies will replicate. Relevant factors include:

1a. How strong is the empirical result (tiny p-value, large sample size, precise estimate of effect size, etc.)?

1b. How plausible is this effect on priors? Including: How big an effect size would you expect on priors? And: How definitively does the researchers' theory predict this particular empirical result?

1c. Experimenter degrees of freedom / garden of forking paths / possibility of p-hacking. Preregistration is best, visible signs of p-hacking are worst.

1d. How filtered is this evidence? How much publication bias?

1e. How much do I trust the researchers about things like (c) and (d)?

This post on how to think about whether a replication study "failed" has also helped clarify my thinking about whether a study is likely to replicate.

If there are many studies of essentially the same phenomenon, then try to find the methodologically strongest few and focus mainly on those. (Rather than picking one study at random and dismissing the whole area of research if that study is bad, or assuming that just because there are lots of studies they must add up to solid evidence.)

If you care about effect size, it's also worth keeping in mind that the things which turn noise into "statistically significant results" also tend to inflate effect sizes.

For #2 ("Is there anything interesting going on here?"), understanding methodology & statistics is pretty central. Partly that's background knowledge & expertise that you keep building up over the years, and partly it's taking the time & effort to sort out what's going on in this particular study (if you care about it and can't sort it out quickly). Sometimes you can find other writings that comment on the study's methodology, which can help a lot: try googling for criticisms of the particular study or line of research (or check Google Scholar for papers that have cited it), or for criticisms of the specific methods it used. It is often easier to recognize when someone makes a good argument than to come up with that argument yourself.

One framing that helps me think about a study's methodology (and whether or not there's anything interesting going on here) is to try to flesh out "null hypothesis world": in the world where nothing interesting is going on, what would I expect to see come out of this experimental process? Sometimes I'll come up with more than one world that feels like a null hypothesis world. Exercise: try that with this study (Egan, Santos, Bloom 2007). Another exercise: Try that with the hot hand effect.
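As a minimal sketch of fleshing out a "null hypothesis world" for the hot hand exercise (parameters here are assumptions, not from any real dataset): simulate shooters whose shots are independent 50% coin flips, then measure the proportion of shots immediately following a hit that are also hits. In short finite sequences this proportion averages noticeably below 50% even though no hot hand exists, because of a selection bias in which shots get counted - so the null world itself produces a surprising-looking pattern.

```python
import random

def hit_rate_after_hit(seq):
    """Fraction of shots immediately following a hit that are also hits."""
    follows = [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == 1]
    return sum(follows) / len(follows) if follows else None

random.seed(0)
rates = []
for _ in range(100_000):
    shots = [random.randint(0, 1) for _ in range(10)]  # 10 independent 50% shots
    r = hit_rate_after_hit(shots)
    if r is not None:  # skip sequences with no hit before the last shot
        rates.append(r)

# Averages noticeably below 0.5, despite every shot being a fair coin flip.
print(sum(rates) / len(rates))
```

The intuition: conditioning on "a hit just happened" within a short fixed-length sequence oversamples sequences where the hits are used up early, which is exactly the kind of thing that only becomes visible once you simulate the null world.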

#3 ("What is going on here?") is the biggest/broadest question of the three. It's the one that I spend the most time on (at least if the study is any good), and it's the one that I could most easily write a whole bunch about (making lots of points and elaborating on them). But it's also the one that is the most distant from Eli's original question, and I don't want to turn this post into a big huge essay, so I'll just highlight a few things here.

A big part of the challenge is thinking for yourself about what's going on and not being too anchored on how things are described by the authors (or the press release or the person who told you about the study). Some moves here:

3a. Imagine (using your inner sim) being a participant in the study, such that you can picture what each part of the study was like. In particular, be sure that you understand every experimental manipulation and measurement in concrete terms (okay, so then they filled out this questionnaire which asked if you agree with statements like such-and-such and blah-blah-blah).

3b. Be sure you can clearly state the pattern of results of the main finding, in a concrete way which is not laden with the authors' theory (e.g. not "this group was depleted" but "this group gave up on the puzzles sooner"). You need this plus 3a to understand what happened in the study, then from there you're trying to draw inferences about what the study implies.

3c. Come up with (one or several) possible models/theories about what could be happening in this study. Especially look for ones that seem commonsensical / that are based in how you'd inner sim yourself or other people in the experimental scenario. It's fine if you have a model that doesn't make a crisp prediction, or if you have a theory that seems a lot like the authors' theory (but without their jargon). Exercise: try that with a typical willpower depletion study.

3d. Have in mind the key takeaway of the study (e.g., the one sentence summary that you would tell a friend; this is the thing that's the main reason why you're interested in reading the study). Poke at that sentence to see if you understand what each piece of it means. As you're looking at the study, see if that key takeaway actually holds up. e.g., Does the main pattern of results match this takeaway or do they not quite match up? Does the study distinguish the various models that you've come up with well enough to strongly support this main takeaway? Can you edit the takeaway claim to make it more precise / to more clearly reflect what happened in the study / to make the specifics of the study unsurprising to someone who heard the takeaway? What sort of research would it take to provide really strong support for that takeaway, and how does the study at hand compare to that?

3e. Look for concrete points of reference outside of this study which resemble the sort of thing the researchers are talking about. Search in particular for ones that seem out-of-sync with this study. e.g., This study says not to tell other people your goals, but the other day I told Alex about something I wanted to do and that seemed useful; do the specifics of this experiment change my sense of whether that conversation with Alex was a good idea?

Some narrower points which don't neatly fit into my 3-category breakdown:

A. If you care about effect sizes then consider doing a Fermi estimate, or otherwise translating the effect size into numbers that are intuitively meaningful to you. Also think about the range of possible effect sizes rather than just the point estimate, and remember that the issues with noise in #1 also inflate effect size.

B. If the paper finds a null effect and claims that it's meaningful (e.g., that the intervention didn't help) then you do care about effect sizes. (e.g., if it claims the intervention failed because it had no effect on mortality rates, then you might assume a value of $10M per life and try to calculate a 95% confidence interval on the value of the intervention based solely on its effect on mortality.)
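To illustrate point B with made-up numbers (the effect size, standard error, and $10M figure are all assumptions for the sketch): a trial that finds a statistically insignificant 0.1 deaths averted per 1,000 people treated can still have a confidence interval spanning thousands of dollars of value per person, so "no significant effect" does not mean "worthless."

```python
# Hypothetical numbers: the trial estimates 0.1 deaths averted per 1,000
# people treated, with a standard error of 0.3 (i.e., not significant).
effect_per_1000 = 0.1
se_per_1000 = 0.3
value_per_life = 10_000_000  # assumed $10M value per life

z = 1.96  # 95% normal confidence interval
lo = effect_per_1000 - z * se_per_1000
hi = effect_per_1000 + z * se_per_1000

# Translate into dollar value per person treated, from mortality alone.
point = effect_per_1000 / 1000 * value_per_life
ci = (lo / 1000 * value_per_life, hi / 1000 * value_per_life)
print(f"point estimate: ${point:,.0f} per person treated")
print(f"95% CI: (${ci[0]:,.0f}, ${ci[1]:,.0f})")
```

With these numbers the interval runs from several thousand dollars of harm to several thousand dollars of benefit per person, which is a very different claim from "the intervention doesn't work."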

C. New papers that claim to debunk an old finding are often right when they claim that the old finding has issues with #1 (it didn't replicate) or #2 (it had methodological flaws) but are rarely actually debunkings if they claim that the old finding has issues with #3 (it misdescribes what's really going on). The new study on #3 might be important and cause you to change your thinking in some ways, but it's generally an incremental update rather than a debunking. Examples that look to me like successful debunkings: behavioral social priming research (#1), the Dennis-dentist effect (#2), the hot hand fallacy (#2 and some of B), the Stanford Prison Experiment (closest to #2), various other things that didn't replicate (#1). Examples of alleged "debunkings" which seem like interesting but overhyped incremental research: the bystander effect (#3), loss aversion (this study) (#3), the endowment effect (#3).

Comment by unnamed on Epistemic Spot Check: Unconditional Parenting · 2019-11-12T20:14:29.874Z · score: 8 (4 votes) · LW · GW

My experience was similar to Habryka's. I followed the "too small and subdivided" link to find more details on what exactly the book claimed about the research and how the research looked to you. I didn't see more details on the page where I landed, and couldn't tell where to navigate from there, so I gave up on that and didn't bother clicking any other links from the article. I think I had a similar experience the last time you relied on Roam links. So I've been getting more out of your epistemic spot checks when they've included the content in the post.

Comment by unnamed on How feasible is long-range forecasting? · 2019-10-15T21:13:09.353Z · score: 13 (3 votes) · LW · GW

The shape of the graph will depend a lot on what questions you ask. So it's hard to interpret many aspects of the graph without seeing the questions that it's based on (or at least a representative subset of questions).

In particular, my recollection is that some GJP questions took the form "Will [event] happen by [date]?", where the market closed around the same time as the date that was asked about. These sorts of questions essentially become different questions as time passes - a year before the date they are asking if the event will happen in a one-year-wide future time window, but a month before the date they are instead asking if the event either will happen in a one-month-wide future time window or if it has already happened in an eleven-month-wide past time window. People can give more and more confident answers as the event draws closer because it's easier to know if the event happened in the past than it is to know if the event will happen in the future, regardless of whether predicting the near future is easier than predicting the far future.

For example, consider the question "an earthquake of at least such-and-such magnitude will happen in such-and-such region between October 16 2019 and October 15 2020". If you know that the propensity for such earthquakes is that they have a probability p of happening each day on average, and you have no information that allows you to make different guesses about different times, then the math on this question is pretty straightforward. Your initial estimate will be that there's a (1-p)^365 chance of No Qualifying Earthquake. Each day that passes with no qualifying earthquake happening, you'll increase the probability you put on No Qualifying Earthquake by reducing the exponent by 1 ("I know that an earthquake didn't happen yesterday, so now how likely is it to happen over the next 364 days?", etc.). And if a qualifying earthquake ever does happen then you'll change your prediction to a 100% chance of earthquake in that window (0% chance of No Qualifying Earthquake). You're able to predict the near future (e.g. probability of an earthquake on October 17 2019) and the distant future (e.g. probability of an earthquake on October 14 2020) equally well, but with this [event] by [date] formulation of the question it'll look like you're able to correctly get more and more confident as the date grows closer.
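The earthquake arithmetic can be sketched in a few lines (p = 0.002 is an arbitrary assumed daily probability):

```python
p = 0.002  # assumed average daily probability of a qualifying quake

def prob_no_quake(days_remaining, quake_already_happened):
    """Forecast for 'No Qualifying Earthquake' partway through the window."""
    if quake_already_happened:
        return 0.0  # the event is settled; certainty, not skill
    return (1 - p) ** days_remaining

start = prob_no_quake(365, False)           # forecast at question open
after_100_days = prob_no_quake(265, False)  # after 100 quiet days

# Confidence in "no quake" rises with each quiet day, even though the
# forecaster's per-day predictive ability hasn't changed at all.
print(start, after_100_days)
```

Nothing about the forecaster's skill at predicting any individual future day changes over the window; only the exponent shrinks and past days get settled.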

Comment by unnamed on A Critique of Functional Decision Theory · 2019-09-14T16:59:04.493Z · score: 2 (1 votes) · LW · GW
Perhaps the Scots tend to one-box, whereas the English tend to two-box.

My intuition is that two-boxing is the correct move in this scenario where the Predictor always fills the box with $1M for the Scots and never for the English. An Englishman has no hope of walking away with the $1M, so why should he one-box? He could wind up being one of the typical Englishmen who walk away with $1000, or one of the atypical Englishmen who walk away with $0, but he is not going to wind up being an Englishman who walks away with $1M because those don't exist and he is not going to wind up being a Scottish millionaire because he is English.

EDT might also recommend two-boxing in this scenario, because empirically p($1M | English & one-box) = 0.

Comment by unnamed on What's In A Name? · 2019-08-27T04:51:29.061Z · score: 9 (4 votes) · LW · GW

These studies have not held up well to further rigor. See Scott's 2016 post Devoodooifying Psychology, or even better Simonsohn's (2011) paper Spurious? Name similarity effects (implicit egotism) in marriage, job, and moving decisions.

Comment by unnamed on Solving for X instead of 3 in love triangles? · 2019-07-23T01:47:48.232Z · score: 5 (3 votes) · LW · GW

Number of weakly connected digraphs with n nodes.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:57:41.611Z · score: 24 (7 votes) · LW · GW

It also seems worth noting that this study looked at whether people intervened in aggressive public conflicts, which is a type of situation where the bystander's safety could be at risk and there can be safety in numbers. A lone bystander intervening in a fight is at higher risk of getting hurt, compared to a group of 10 bystanders acting together. This factor doesn't exist (or is much weaker) in situations like "does anyone stop to see if the person lying on the ground needs medical help" or "does anyone notify the authorities about the smoke which might indicate a fire emergency." So I'd be cautious about generalizing to those sorts of situations.

Comment by unnamed on Bystander effect false? · 2019-07-12T06:53:17.822Z · score: 35 (10 votes) · LW · GW

The standard claim in bystander effect research is that an individual bystander's probability of intervening goes down as the number of bystanders increases (see, e.g., Wikipedia). Whereas this study looked at the probability of any intervention from the group of bystanders, which is a different thing.

The abstract of the paper actually begins with this distinction:

Half a century of research on bystander behavior concludes that individuals are less likely to intervene during an emergency when in the presence of others than when alone. By contrast, little is known regarding the aggregated likelihood that at least someone present at an emergency will do something to help.

So: not a debunking. And another example of why it's good practice to check the paper in question (or at least its abstract) and the Wikipedia article(s) on the topic rather than believing news headlines.

Comment by unnamed on Why the tails come apart · 2019-06-18T04:56:47.544Z · score: 27 (5 votes) · LW · GW

One angle for thinking about why the tails come apart (which seems worth highlighting even more than it was highlighted in the OP) is that the farther out you go in the tail on some variable, the smaller the set of people you're dealing with.

Which is better, the best basketball team that you can put together from people born in Pennsylvania or the best basketball team that you can put together from people born in Delaware? Probably the Pennsylvania team, since there are about 13x as many people in that state so you get to draw from a larger pool. If there were no other relevant differences between the states then you'd expect 13 of the best 14 players to be Pennsylvanians, and probably the two neighboring states are similar enough so that Delaware can't overcome that population gap.

Now, imagine you're picking the best 10 basketball players from the 1,000 tallest basketball-aged Americans (20-34 year-olds), and you're putting together another group consisting of the best 10 basketball players from the next 100,000 tallest basketball-aged Americans. Which is a better group of basketball players? In this case it's not obvious - getting to pick from a pool of 100x as many people is an obvious advantage, but that height advantage could matter a lot too. That's the tails coming apart - the very tallest don't necessarily give you the very best basketball players, because "the very tallest" is a much smaller set than the "also really tall but not quite as tall" group.

(I ran some numbers and estimate that the two teams are pretty similar in basketball ability. Which is a remarkable sign of how important height is for basketball - one pool has about a 4 inch height advantage on average, the other pool has 100x as many people, and those factors roughly balance out. If you want the example to more definitively show the tails coming apart, you can expand the larger pool by another factor of 30x and then they'll clearly be better.)

Similarly, who has higher arm strength: the one person in our sample who has the highest grip strength, or the most arm-strong person out of the next ten people who rank 2-11 in grip strength? Grip strength is closely related to arm strength, but you get to pick the best from a 10x larger pool if you give up a little bit of grip strength. In the graph in the OP, the person who was 6th (or maybe 5th) in grip strength had the highest arm strength, so getting to pick from a pool of 10 was more important. (The average arm strength of the people ranked 2-11 in grip strength was lower than the arm strength of the #1 gripper, but we get to pick out the strongest arm of the ten rather than averaging them.)
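This pool-size intuition is easy to check with a quick simulation (the 0.7 correlation and the sample sizes are assumptions for the sketch, not estimates from the OP's data): generate people whose grip and arm strength are correlated, and ask how often the #1 gripper also beats the strongest arm among the grippers ranked 2-11.

```python
import random

random.seed(1)
rho = 0.7  # assumed correlation between grip strength and arm strength
n_people, n_trials = 1000, 2000

top_gripper_wins = 0
for _ in range(n_trials):
    people = []
    for _ in range(n_people):
        z = random.gauss(0, 1)  # shared underlying strength factor
        grip = z
        arm = rho * z + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
        people.append((grip, arm))
    people.sort(key=lambda person: person[0], reverse=True)  # rank by grip
    best_arm_ranks_2_to_11 = max(arm for _, arm in people[1:11])
    if people[0][1] > best_arm_ranks_2_to_11:
        top_gripper_wins += 1

# Well below 1/2: the 10x larger pool usually contains the stronger arm.
print(top_gripper_wins / n_trials)
```

Even though grip strength is a good predictor of arm strength here, the #1 gripper loses most of the time: their small grip edge over ranks 2-11 is usually outweighed by getting ten draws of the uncorrelated component instead of one.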

So: the tails come apart because most of the people aren't way out on the tail. And you usually won't find the very best person at something if you're looking in a tiny pool, even if that's a pretty well selected pool.

Thrasymachus's intuitive explanation covered this - having a smaller pool to pick from hurts because there are other variables that matter, and the smaller the pool the less you get to select for people who do well on those other variables. But his explanation highlighted the "other variables matter" part of this more than the pool size part of it, and both of these points of emphasis seem helpful for getting an intuitive grasp of the statistics in these types of situations, so I figured I'd add this comment.

Comment by unnamed on The Schelling Choice is "Rabbit", not "Stag" · 2019-06-09T00:43:45.445Z · score: 11 (5 votes) · LW · GW
And I said, in a move designed to be somewhat socially punishing: "I don't really trust the conversation to go anywhere useful." And then I took out my laptop and mostly stopped paying attention.

This 'social punishment' move seems problematic, in a way that isn't highlighted in the rest of the post.

One issue: What are you punishing them for? It seems like the punishment is intended to enforce the norm that you wanted the group to have, which is a different kind of move than enforcing a norm that is already established. Enforcing existing norms is generally prosocial, but it's more problematic if each person is trying to enforce the norms that he personally wishes the group to have.

A second thing worth highlighting is that this attempt at norm enforcement looked a lot like a norm violation (of norms against disengaging from a meeting). Sometimes "punishing others for violating norms" is a special case where it's appropriate to do something which would otherwise be a norm violation, but that's often a costly/risky way of doing things (especially when the norm you're enforcing isn't clearly established and so your actions are less legible).

Comment by unnamed on Asymmetric Weapons Aren't Always on Your Side · 2019-06-07T21:42:12.177Z · score: 29 (8 votes) · LW · GW

When Scott used the term "asymmetric weapons", I understood him to mean truth-asymmetric weapons or weapons that favor what's good & true. He was trying to set that particular dimension of asymmetry apart from the various other ways in which a weapon might be more useful in some hands than in others.

I think it's an important concept, and I wish we had better terminology for it.