Understanding information cascades

post by jacobjacob, Ben Pace (Benito) · 2019-03-13T10:55:05.932Z · score: 55 (19 votes) · LW · GW · 33 comments

This is a question post.

Contents

  Background
  Questions
  Bounties
  Answers
    23 Davidmanheim
    20 Jan_Kulveit
    18 jacobjacob
    14 Pablo_Stafforini
    12 Pablo_Stafforini
10 comments

Meta: Because we think understanding info cascades is important, we recently spent ~10 hours trying to figure out how to quantitatively model them, and have contributed our thinking as answers below. While we don't currently have the time to continue exploring, we wanted to experiment with seeing how much the LW community could together build on top of our preliminary search, so we’ve put up a basic prize for more work and tried to structure the work around a couple of open questions. This is an experiment! We’re looking forward to reading any of your contributions to the topic, including things like summaries of existing literature and building out new models of the domain.

Background

Consider the following situation:

Bob is wondering whether a certain protein injures the skeletal muscle of patients with a rare disease. He finds a handful of papers with some evidence for the claim (and some with evidence against it), so he simply states the claim in his paper, with some caution, and adds those papers as citations. Later, Alice comes across Bob’s paper and sees the cited claim, and she proceeds to cite Bob, but without tracing the citation trail back to the original evidence. This keeps happening, in various shapes and forms, and after a while a literature of hundreds of papers builds up where it’s common knowledge that β amyloid injures the skeletal muscle of patients with inclusion body myositis -- without the claim having accumulated any more evidence. (This real example was taken from Greenberg, 2009, which is a case study of this event.)

An information cascade occurs when people update on each other's beliefs, rather than sharing the causes of those beliefs, and those beliefs end up with a vestige of support that far outstrips the evidence for them. Satvik Beri might describe this as the problem of only sharing the outputs of your thinking process, not your inputs.
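To make the definition concrete, here is a minimal sketch of a sequential-choice cascade, loosely in the spirit of the economics models cited in the answers (e.g. Bikhchandani et al., 1992). It is not taken from any of those papers. The assumptions are ours: each agent gets one private binary signal of equal accuracy, sees only the earlier agents' actions, and follows their own signal when the public evidence leaves them indifferent. The names `run_sequence` and `wrong_final_action_rate` are hypothetical.

```python
import random


def run_sequence(signals):
    """Return each agent's public action given private binary signals (0 or 1).

    `lead` counts revealed signals for option 1 minus revealed signals for
    option 0. Once |lead| >= 2, one opposing private signal cannot overturn
    the public evidence, so later agents copy the majority and their actions
    reveal nothing new (assumption: ties are broken by following one's own
    signal).
    """
    lead = 0
    actions = []
    for signal in signals:
        if lead >= 2:
            actions.append(1)        # cascade on 1: private signal ignored
        elif lead <= -2:
            actions.append(0)        # cascade on 0: private signal ignored
        else:
            actions.append(signal)   # the action reveals the private signal
            lead += 1 if signal == 1 else -1
    return actions


def wrong_final_action_rate(accuracy, n_agents, n_runs, seed=0):
    """Fraction of runs where the last agent picks 0 although the truth is 1."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_runs):
        signals = [1 if rng.random() < accuracy else 0 for _ in range(n_agents)]
        if run_sequence(signals)[-1] == 0:
            wrong += 1
    return wrong / n_runs


# Two early misleading signals lock in option 0; four later correct signals
# are never revealed through anyone's action.
print(run_sequence([0, 0, 1, 1, 1, 1]))
print(wrong_final_action_rate(accuracy=0.7, n_agents=50, n_runs=10_000))
```

In the first call every agent ends up choosing 0, even though four of the six private signals pointed to 1: everyone updates rationally on what they can see, yet the public record carries only two signals' worth of evidence.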

The dynamics here are perhaps reminiscent of those underlying various failures of collective rationality such as asset bubbles, bystander effects and stampedes.

Note that this effect is different from other problems of collective rationality like the replication crisis, which involve low standards for evidence (such as unreasonably lax p-value thresholds or coordination problems preventing publishing of failed experiments), or the degeneracy of much online discussion, which involves tribal signalling and UI encouraging problematic selection effects. Rather, information cascades involve people rationally updating without any object-level evidence at all, and would persist even if the replication crisis and online outrage culture disappeared. If nobody lies or tells untruths, you can still be subject to an information cascade.

Questions

Ben and I are confused about how to think about the negative effects of this problem. We understand the basic idea, but aren't sure how to reason quantitatively about the impacts, and how to trade off solving these problems in a community against making other improvements to the overall efficacy and efficiency of a community. We currently know only how to think about these qualitatively.

We’re posting a couple of related questions that we have some initial thoughts on, that might help clarify the problem.

If you have something you’d like to contribute, but that doesn’t seem to fit into the related questions above, leave it as an answer to this question.

Bounties

We are committing to pay at least either $800 or (No. of answers and comments * $25), whichever is smaller, for work on this problem recorded on LW, done before May 13th. The prize pool will be split across comments in accordance with how valuable we find them, and we might make awards earlier than the deadline (though if you know you’ll put in work in x weeks, it would be good to mention that to one of us via PM).

Ben and Jacob are each responsible for half of the prize money.

Jacob is funding this through Metaculus AI, a new forecasting platform tracking and improving the state-of-the-art in AI forecasting, partly to help avoid info-cascades in the AI safety and policy communities (we’re currently live and inviting beta users; you can sign up here).

Examples of work each of us is especially excited about:

Jacob

Ben

Answers

answer by Davidmanheim · 2019-03-14T10:44:31.314Z · score: 23 (5 votes) · LW · GW

I'm unfortunately swamped right now, because I'd love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.

First, the need for extremizing forecasts (see: Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone, Lyle H. Ungar (2014) Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis 11(2):133-145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn't typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106-1115).

Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point - in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")
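The arithmetic in that parenthetical is a simple weighted average. A minimal sketch, where the function name `blended_estimate` and its parameter names are hypothetical labels rather than anything from the forecasting literature:

```python
def blended_estimate(own_estimate, consensus, credence_in_own):
    """Weight one's own model against the consensus forecast.

    credence_in_own is the share of belief placed in one's own model; the
    remainder is allocated to the consensus value.
    """
    if not 0.0 <= credence_in_own <= 1.0:
        raise ValueError("credence_in_own must be between 0 and 1")
    return credence_in_own * own_estimate + (1.0 - credence_in_own) * consensus


# The example from the text: own model 25%, 50% credence in it, consensus 90%.
final = blended_estimate(own_estimate=0.25, consensus=0.90, credence_in_own=0.50)
print(f"{final:.3f}")  # 0.575
```

Reporting the three inputs rather than only the 57.5% lets a reader recover how much of the final number is independent evidence and how much is the consensus reflected back.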

Third, I'd be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.

comment by Pablo_Stafforini · 2019-03-15T16:20:13.254Z · score: 11 (4 votes) · LW · GW

Thanks for this.

Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that "more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke."

comment by Davidmanheim · 2019-03-17T10:36:30.211Z · score: 7 (2 votes) · LW · GW

That's a great point. I'm uncertain whether the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994 .

Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can in expectation outscore everyone else slightly and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It's almost the equivalent of betting a dollar more than the current high bid on The Price Is Right - you don't need to be close, you just need to beat the other people's scores to win. But if I report my best-strategy answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.

comment by jacobjacob · 2019-03-22T15:10:05.785Z · score: 4 (2 votes) · LW · GW

"more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke."

Do you have a link to this data?

comment by Davidmanheim · 2019-03-25T09:20:14.511Z · score: 3 (2 votes) · LW · GW

As I replied to Pablo below, "...it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing. "

comment by Pablo_Stafforini · 2019-03-25T03:51:34.739Z · score: 2 (1 votes) · LW · GW

I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?

comment by Davidmanheim · 2019-03-25T09:19:47.975Z · score: 9 (3 votes) · LW · GW

You don't need the data - it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.
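A short worked version of that first-principles argument, assuming Brier scoring (squared error, lower is better) and a single binary question whose true probability is exactly 90%. The scoring rule and function names are assumptions for illustration; the comment does not specify them.

```python
def brier(forecast, outcome):
    """Squared error of a probability forecast for a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


def expected_brier(forecast, true_prob):
    """Expected Brier score when the event really occurs with probability true_prob."""
    return true_prob * brier(forecast, 1) + (1 - true_prob) * brier(forecast, 0)


true_prob = 0.90
honest, extremized = 0.90, 0.95

# Probability that the extremized forecast scores strictly better on one question.
win_rate = sum(
    prob
    for outcome, prob in ((1, true_prob), (0, 1 - true_prob))
    if brier(extremized, outcome) < brier(honest, outcome)
)

print(f"expected score, honest 90%:     {expected_brier(honest, true_prob):.4f}")
print(f"expected score, extremized 95%: {expected_brier(extremized, true_prob):.4f}")
print(f"chance extremizing wins on one question: {win_rate:.2f}")
```

Under these assumptions the extremized forecast has the worse expected score (0.0925 against 0.0900) but the better realized score on 9 questions out of 10, which is why a small sample can make extremizing look good.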

comment by jacobjacob · 2019-03-25T22:03:23.949Z · score: 4 (1 votes) · LW · GW

One should be able to think quantitatively about that, eg how many questions do you need to ask until you find out whether your extremization hurt you. I'm surprised by the suggestion that GJP didn't do enough, unless their extremizations were frequently in the >90% range.
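One rough way to put numbers on this, as a sketch under strong simplifying assumptions that are not from the thread: every question is independent, has a true probability of exactly 90%, is forecast at 90% by the honest forecaster and 95% by the extremizer, and is scored with the Brier rule. The function name `prob_extremizer_ahead` is hypothetical. Exact fractions avoid floating-point ties.

```python
from fractions import Fraction
from math import comb


def prob_extremizer_ahead(n_questions,
                          true_prob=Fraction(9, 10),
                          honest=Fraction(9, 10),
                          extremized=Fraction(19, 20)):
    """Probability that the extremizer has a strictly lower total Brier score
    than the honest forecaster after n independent, identical questions."""
    # Score advantage of extremizing when the event happens (a "hit") ...
    gain_per_hit = (1 - honest) ** 2 - (1 - extremized) ** 2
    # ... and score penalty when it does not (a "miss").
    loss_per_miss = extremized ** 2 - honest ** 2
    total = Fraction(0)
    for misses in range(n_questions + 1):
        hits = n_questions - misses
        if hits * gain_per_hit > misses * loss_per_miss:
            total += (comb(n_questions, misses)
                      * (1 - true_prob) ** misses
                      * true_prob ** hits)
    return total


for n in (1, 10, 100):
    print(n, float(prob_extremizer_ahead(n)))
```

With one question the extremizer is ahead 90% of the time; with ten questions they are ahead only if there are no misses at all. How quickly the advantage disappears in a real tournament depends on how many questions sit near the extremes, which this toy setup does not model.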

comment by Davidmanheim · 2019-05-05T05:39:17.587Z · score: 3 (2 votes) · LW · GW

Each season, there were too few questions for this to be obvious, rather than a minor effect, and the "misses" were excused as getting an actually unlikely event wrong. It's hard to say, post hoc, whether the ~1% consensus opinion about a "freak event" was accurate and there was simply a huge surprise (and yes, this happened at least twice), or whether the consensus was simply overconfident.

(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)

comment by jacobjacob · 2019-03-25T22:00:06.295Z · score: 4 (2 votes) · LW · GW

I did, he said a researcher mentioned it in conversation.

comment by Pablo_Stafforini · 2019-03-15T16:28:16.427Z · score: 3 (3 votes) · LW · GW

[meta] Not sure why the link to the overview isn't working. Here's how the comment looks before I submit it:

https://imgur.com/MF5Z2X4

(The same problem is affecting this comment.)

In any case, the URL is:

https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/

comment by habryka (habryka4) · 2019-03-15T16:44:34.172Z · score: 4 (2 votes) · LW · GW

It's because I am a bad developer and I broke some formatting stuff (again). Will be fixed within the hour.

Edit: Fixed now

comment by Pablo_Stafforini · 2019-03-15T19:35:39.640Z · score: 4 (2 votes) · LW · GW

Thanks, Oli!

answer by Jan_Kulveit · 2019-03-14T12:58:43.494Z · score: 20 (10 votes) · LW · GW

Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like spreading dynamics in complex networks. Information cascades does not seem to be the best choice of keywords.

There are many options for how you can model the state of the node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how you may represent the dynamics (something like Ising model / softmax, versions of voter model, oscillator coupling, ...) and multiple options for how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBM, or scale-free networks, or Erdős–Rényi, or Watts-Strogatz, or real-world network data, ...). This creates a somewhat large space of options, most of which have usually already been explored somewhere in the literature.

Possibly the single most important thing to know about this is that there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.

Overall I would suggest approaching this with some intellectual humility and studying existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic, often by quite good people.)

comment by jacobjacob · 2019-03-14T15:45:50.640Z · score: 10 (3 votes) · LW · GW

I haven't looked through your links in much detail, but wanted to reply to this:

"Overall I would suggest approaching this with some intellectual humility and studying existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic, often by quite good people.)"

I either disagree or am confused. It seems good to use resources to outsource your ability to do literature reviews, distillation or extrapolation, to someone with higher comparative advantage. If the LW question feature can enable that, it will make the market for intellectual progress more efficient; and I wanted to test whether this was so.

I am not trying to reinvent network science, and I'm not that interested in the large amount of theoretical work that has been done. I am trying to 1) apply these insights to very particular problems I face (relating to forecasting and more); and 2) think about this from a cost-effectiveness perspective.

I am very happy to trade money for my time in answering these questions.

(Neither 1) nor 2) seems like something I expect the existing literature to have been very interested in. I believe this for similar reasons to those Holden Karnofsky expresses here [LW · GW].)

comment by Jan_Kulveit · 2019-03-14T17:09:21.601Z · score: 10 (4 votes) · LW · GW

I was a bit confused by "but aren't sure how to reason quantitatively about the impacts" and "how much the LW community could together build on top of our preliminary search", which seemed to nudge toward original research. Outsourcing literature reviews, distillation or extrapolation seems great.

comment by Ben Pace (Benito) · 2019-03-14T17:24:56.050Z · score: 12 (5 votes) · LW · GW

Agreed. I realise the OP could be misread; I've updated the first paragraph with an extra sentence mentioning that summarising and translating existing work/literature in related domains is also really helpful.

comment by Ben Pace (Benito) · 2019-03-14T16:16:03.430Z · score: 7 (4 votes) · LW · GW

Thanks for the pointers to network science Jan, I don't know this literature, and if it's useful here then I'm glad you understand it well enough to guide us (and others) to key parts of it. I don't see yet how to apply it to thinking quantitatively about scientific and forecasting communities.

If you (or another LWer) think that the theory around universality classes is applicable in thinking about how to ensure good info propagation in e.g. a scientific community, and you're right, then I (and Jacob and likely many others) would love to read a summary, posted here as an answer. Might you explain how understanding the linked paper on universality classes has helped you think about info propagation in forecasting communities / related communities? Concrete heuristics would be especially interesting.

(Note that Jacob and I have not taken a math course in topology or graph theory and won't be able to read answers that assume such, though we've both studied formal fields of study and could likely pick it up quickly if it seemed practically useful.)

In general we're not looking for *novel* contributions. To give an extreme example, if one person translates an existing theoretical literature into a fully fleshed out theory of info-cascades for scientific and forecasting communities, we'll give them the entire prize pot.

comment by Jan_Kulveit · 2019-03-18T10:19:11.340Z · score: 17 (6 votes) · LW · GW

Short summary of why the linked paper is important: you can think about bias as some sort of perturbation. You are then interested in the "cascade of spreading" of the perturbation, and especially factors like the distribution of sizes of cascades. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper) depending mainly on local dynamics (forecaster-forecaster interactions). Now if you have a good model of the local dynamics, you can determine the parameters and determine into which universality class the problem belongs. Also you can try to infer the dynamics if you have good data on your interactions.

I'm afraid I don't know enough about how "forecasting communities" work to be able to give you some good guesses about what the points of leverage may be. One quick idea, if you have everybody on the same platform, may be to do some sort of A/B experiment - manipulate the data so some forecasters would see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group. If you have data on "individual dynamics" like that, and some knowledge of network structure, the theory can help you predict the cascade size distribution.

(I also apologize for not being more helpful, but I really don't have time to work on this for you.)

comment by Pablo_Stafforini · 2019-03-19T13:29:20.242Z · score: 4 (4 votes) · LW · GW

"Information cascades does not seem to be the best choice of keywords."

I wouldn't say that 'information cascades' isn't the best choice of keywords. What's happening here is that the same phenomenon is studied by different disciplines in relative isolation from each other. As a consequence, the phenomenon is discussed under different names, depending on the discipline studying it. 'Information cascades' (or, as it is sometimes spelled, 'informational cascades') is the name used in economics, while network science seems to use a variety of related expressions, such as the one you mention.

answer by jacobjacob · 2019-03-13T10:17:00.402Z · score: 18 (4 votes) · LW · GW

Here's a quick bibliography we threw together.

Background:

Previous LessWrong posts referring to info cascades:

And then here are all the LW posts we could find that used the concept (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) . Not sure how relevant they are, but might be useful in orienting around the concept.

answer by Pablo_Stafforini · 2019-03-19T14:17:30.687Z · score: 14 (4 votes) · LW · GW

Information Cascades in Multi-Agent Models by Arthur De Vany & Cassey Lee has a section with a useful summary of the relevant economic literature up to 1999. (For more recent overviews, see my other comment.) I copy it below, with links to the works cited (with the exception of Chen (1978) and Lee (1999), both unpublished doctoral dissertations, and De Vany and Walls (1999b), an unpublished working paper):

A seminal paper by Bikhchandani et al (1992) explains the conformity and fragility of mass behavior in terms of informational cascades. In a closely related paper Banerjee (1992) models optimizing agents who engage in herd behavior which results in an inefficient equilibrium. Anderson and Holt (1997) are able to induce information cascades in a laboratory setting by implementing a version of Bikhchandani et al (1992) model.
The second strand of literature examines the relationship between information cascades and large fluctuations. Lee (1998) shows how failures in information aggregation in a security market under sequential trading result in market volatility. Lee advances the notion of “informational avalanches” which occurs when hidden information (e.g. quality) is revealed during an informational cascade thus reversing the direction of information cascades.
The third strand explores the link between information cascades and heavy tailed distributions. Cont and Bouchaud (1998) put forward a model with random groups of imitators that gives rise to stock price variations that are heavy-tailed distributed. De Vany and Walls (1996) use a Bose-Einstein allocation model to model the box office revenue distribution in the motion picture industry. The authors describe how supply adapts dynamically to an evolving demand that is driven by an information cascade (via word-of-mouth) and show that the distribution converges to a Pareto-Lévy distribution. The ability of the Bose-Einstein allocation model to generate the Pareto size distribution of rank and revenue has been proven by Hill (1974) and Chen (1978). De Vany and Walls (1996) present empirical evidence that the size distribution of box office revenues is Pareto. Subsequent work by Walls (1997), De Vany and Walls (1999a), and Lee (1999) has verified this finding for other markets, periods and larger data sets. De Vany and Walls (1999a) show that the tail weight parameter of the Pareto-Levy distribution implies that the second moment may not be finite. Lastly, De Vany and Walls (1999b) have shown that motion picture information cascades begin as action-based, noninformative cascades, but undergo a transition to an informative cascade after enough people have seen it to exchange “word of mouth” information. At the point of transition from an uninformed to an informed cascade, there is loss of correlation and an onset of turbulence, followed by a recovery of week to week correlation among high quality movies.

answer by Pablo_Stafforini · 2019-03-19T14:41:16.616Z · score: 12 (3 votes) · LW · GW

Two recent articles that review the existing economic literature on information cascades:

  • Sushil Bikhchandani, David Hirshleifer and Ivo Welch, Information cascades, The new Palgrave dictionary of economics (Macmillan, 2018), pp. 6492-6500.
  • Oksana Doherty, Informational cascades in financial markets: review and synthesis, Review of behavioral finance, vol. 10, no. 1 (2018), pp. 53-69.

An earlier review:

  • Maria Grazia Romano, Informational cascades in financial economics: a review, Giornale degli Economisti e Annali di Economia, vol. 68, no. 1 (2009), pp. 81-109.

33 comments

    Comments sorted by top scores.

    comment by DanielFilan · 2019-03-14T07:19:31.783Z · score: 15 (4 votes) · LW · GW

    This paper looks at the dynamics of information flows in social networks using multi-agent reinforcement learning. I haven't read it, but am impressed by the work of the second author. Abstract:

    We model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. Even more so when the adversary has information about agents' network position or private signal. When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.

    comment by Davidmanheim · 2019-03-14T10:57:02.174Z · score: 9 (5 votes) · LW · GW

    There are better, simpler results that I recall but cannot locate right now on doing local updating that is algebraic, rather than deep learning. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning: Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210–225. https://doi.org/10.1016/j.geb.2012.06.001

    Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I'm working on, so thanks!)

    comment by shminux · 2019-03-13T15:26:46.621Z · score: 12 (3 votes) · LW · GW

    Have you read the Backreaction blog, where Sabine Hossenfelder details much the same phenomenon in high-energy physics? She claims that the prevailing groupthink ended up believing in String Theory without a shred of evidence for it (only some vague hints), and so far with every single prediction of it refuted.

    comment by Kaj_Sotala · 2019-03-14T04:50:12.182Z · score: 11 (4 votes) · LW · GW

    Is it necessarily a good idea to break up the topic into so many separate questions before having a general discussion post about it first? I would imagine that people might have comments which were related to several of the different questions, but now the discussion is going to get fragmented over many places.

    E.g. if someone knows about a historical info cascade in academia and how people failed to deal with that, then that example falls under two different questions. So then the answer with that example either has to be split into two or to be posted in an essentially similar form on both pages, neither of which is good for keeping the entire context of the discussion in one place.

    comment by Kaj_Sotala · 2019-03-14T04:58:02.510Z · score: 12 (7 votes) · LW · GW

    Separately, there's a part of me that finds it viscerally annoying to have multiple questions around the same theme posted around the same time. It feels like it incentivizes people with a pet topic to promote that topic by asking a lot of questions about it so that other topics get temporarily drowned out. Even if the topic is sometimes important enough to be worth it, it still feels like the kind of thing to discourage.

    comment by mr-hire · 2019-03-14T10:00:46.372Z · score: 10 (5 votes) · LW · GW

    I also have this visceral feeling. It feels like a "subquestions" feature could fix both these issues.

    comment by jacobjacob · 2019-03-14T15:20:16.372Z · score: 6 (4 votes) · LW · GW

    Seems like a sensible worry, and we did consider some version of it. My reasoning was roughly:

    1) The questions feature is quite new, and if it will be very valuable, most use-cases and the proper UI haven't been discovered yet (these can be hard to predict in advance without getting users to play around with different things and then talking to them).

    No one has yet attempted to use multiple questions. So it would be valuable for the LW team and the community to experiment with that, despite possible countervailing considerations (any good experiment will have sufficient uncertainty that such considerations will always exist).

    2) Questions 1/2, 3 and 4 are quite different, and it seems good to be able to do research on one sub-problem without taking mindshare from everyone working on any subproblem.

    comment by jacobjacob · 2019-05-14T09:56:59.477Z · score: 6 (3 votes) · LW · GW

    This is an update on the timeline for paying out the bounties on this question. They will be awarded for work done before May 13th, but we're delayed by another few weeks in deciding on the allocation. Apologies!

    comment by Michael McLaren (michael-mclaren) · 2019-03-15T18:39:44.789Z · score: 5 (3 votes) · LW · GW

    Nissen et al 2016 ("Publication bias and the canonization of false facts") give a simple model for how publication bias in academic research can have a similar effect to the "information cascades" described in the OP. False scientific claims are likely to be falsified by an experiment, but will sometimes be found to be true. Positive results supporting a claim may be more likely to be published than negative results against the claim. The authors' model assumes that the credence of the scientific community in the claim is determined by the number of published positive and negative results, and that new studies will be done to repeatedly test the claim until the credence becomes sufficiently close to 0 or 1. The publication bias favoring false results can overpower the odds against getting a positive result in any given experimental replication and lead to false claims becoming canonized as fact with a non-negligible probability.

    The mechanism here differs in a sense from the "information cascade" examples in the OP and on the Wikipedia page in that the false claim is being repeatedly tested with new experiments. However, I think it could be seen as fundamentally the same as the citation bias example of Greenberg 2009 in the OP, if we think of the scientific community rather than an individual scientist as being the actor. In the Greenberg 2009 example, the problem is that individual scientists tend only to cite positive findings; in the Nissen et al model, the scientific community tends to only publish positive findings. (Of course, this second problem feeds into the first.)
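The publication-bias mechanism described in this comment can be sketched as a small simulation. This is not the authors' code or parameterization: the false-positive rate, power, publication probability, thresholds and the naive updating rule (readers update as if every result had been published) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def canonization_run(rng, alpha=0.05, power=0.8, publish_negative=0.1,
                     prior=0.5, lower=0.001, upper=0.999, max_studies=10_000):
    """Repeatedly test a FALSE claim; return True if it ends up accepted as fact."""
    log_odds = math.log(prior / (1 - prior))
    # Assumed naive readers: they update as if there were no publication bias.
    step_positive = math.log(power / alpha)
    step_negative = math.log((1 - power) / (1 - alpha))
    accept = math.log(upper / (1 - upper))
    reject = math.log(lower / (1 - lower))
    for _ in range(max_studies):
        positive = rng.random() < alpha  # the claim is false, so this is a false positive
        published = positive or rng.random() < publish_negative
        if not published:
            continue                     # unpublished negatives never reach readers
        log_odds += step_positive if positive else step_negative
        if log_odds >= accept:
            return True
        if log_odds <= reject:
            return False
    return False


def canonization_rate(n_runs, seed=0, **kwargs):
    """Fraction of simulated false claims that get canonized."""
    rng = random.Random(seed)
    return sum(canonization_run(rng, **kwargs) for _ in range(n_runs)) / n_runs


for p in (1.0, 0.3, 0.1, 0.0):
    print(p, canonization_rate(2_000, publish_negative=p))
```

Sweeping `publish_negative` from 1.0 down to 0.0 shows how suppressing negative results moves a false claim from almost always rejected to always accepted, even though each individual experiment is honest.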

    comment by jacobjacob · 2019-03-13T10:46:06.236Z · score: 3 (2 votes) · LW · GW

    See this post [LW · GW] for a good, simple mathematical description of the discrete version of the phenomenon.