How large is the harm from info-cascades? [Info-cascade series]

post by jacobjacob, Ben Pace (Benito) · 2019-03-13T10:55:38.872Z · score: 23 (4 votes) · LW · GW · 2 comments

This is a question post.


This is a question in the info-cascade question series [LW · GW]. There is a prize pool of up to $800 for answers to these questions. See the link above for full background on the problem (including a bibliography) as well as examples of responses we’d be especially excited to see.

___

How can we quantify the impact (harm) of info-cascades?

There are many ways in which info-cascades are harmful. Insofar as people base their decisions on the cascaded info, this can result in bad career choices, mistaken research directions, misallocation of grants, a culture that is easier for cleverly signalling outsiders to hijack (simply by “joining the bubble”), and more.

But in order to properly allocate resources to work on info-cascades, we need a better model of how large the effects are and how they compare with other problems. How can we think about info-cascades from a cost-effectiveness perspective?

We are especially interested in answers to this question that ultimately bear on the effective altruism/rationality communities, or analyses of other institutions with insights that transfer to these communities.

As an example step in this direction, we built a Guesstimate model, which is described in an answer below.

Answers

answer by jacobjacob · 2019-03-13T10:26:49.855Z · score: 12 (4 votes) · LW · GW

Ben Pace and I (with some help from Niki Shams) made a Guesstimate model of how much information cascades are costing science in terms of wasted grant money. The model is largely based on the excellent paper “How citation distortions create unfounded authority: analysis of a citation network” (Greenberg, 2009), which traces how an uncertain claim in biomedicine was inflated into established knowledge over a period of 15 years, and used to justify ~$10 million in grant money from the NIH (we calculated that number ourselves here).

There are many open questions about some of the inputs to our model, as well as how it generalises outside of academia (or even outside of biomedicine). However, we see this as a “Jellybaby” in Douglas Hubbard’s sense -- a first data-point and stab at the problem which brings us from “no idea how big or small the costs of info-cascades are” to at least “it is plausible, though very uncertain, that the costs are on the order of billions of dollars per year in academic grant money”.

2 comments

Comments sorted by top scores.

comment by jacobjacob · 2019-03-13T10:26:32.892Z · score: 7 (4 votes) · LW · GW

This might be an interesting pointer.

In Note-8 in the supplementary materials, Greenberg begins to quantify the problem. He defines an amplification measure for paper P as the number of citation-paths originating at P and terminating at all other papers, except for paths of length 1 flowing directly to primary data papers. The amplification density of a network is the mean amplification across its papers.
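To make the definition concrete, here is a minimal sketch of the amplification measure on a toy citation network. The paper names, the network itself, and the helper functions are illustrative assumptions, not data from Greenberg's paper; the only thing taken from the paper is the definition (count all citation paths originating at P, excluding length-1 paths that flow directly to primary data papers).

```python
# Toy citation network: edges point from a citing paper to the paper
# it cites. Names and structure are made up for illustration.
cites = {
    "review_C":  ["review_B", "claim_A"],
    "review_B":  ["claim_A", "primary_1"],
    "claim_A":   ["primary_1", "primary_2"],
    "primary_1": [],
    "primary_2": [],
}
primary_data = {"primary_1", "primary_2"}

def paths_from(p):
    """Number of citation paths of length >= 1 starting at paper p."""
    return sum(1 + paths_from(q) for q in cites[p])

def amplification(p):
    """Paths from p to all other papers, minus the length-1 paths
    that go directly to primary-data papers."""
    direct_to_primary = sum(1 for q in cites[p] if q in primary_data)
    return paths_from(p) - direct_to_primary

def amplification_density(network):
    """Mean amplification across all papers in the network."""
    return sum(amplification(p) for p in network) / len(network)

print(amplification("review_C"))           # 8 paths, none excluded
print(amplification_density(cites))        # mean over all five papers
```

In this toy graph a claim cited only through intermediaries (like `review_C`) accumulates many paths, while a paper citing primary data directly contributes nothing via those edges, which is the distortion-amplifying pattern the measure is meant to capture.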

Greenberg then finds that, in the particular network analysed, the amplification density reached about 1000 over a 15-year time-frame, growing exponentially with a doubling time of very roughly 2 years.
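As a back-of-the-envelope check on how these numbers fit together (my own arithmetic, not Greenberg's), exponential growth with doubling time T gives density(t) = density(0) · 2^(t/T):

```python
def density(t_years, d0=1.0, doubling_time=2.0):
    """Amplification density after t_years of exponential growth,
    assuming it doubles every `doubling_time` years."""
    return d0 * 2 ** (t_years / doubling_time)

# With a 2-year doubling time, 15 years gives roughly a 180x increase
# (2 ** 7.5 ~= 181); reaching ~1000 from a starting density near 1
# would require a somewhat shorter doubling time, which is consistent
# with the "very roughly" hedge on the 2-year figure.
print(density(15))
```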