The Median Researcher Problem

post by johnswentworth · 2024-11-02T20:16:11.341Z · LW · GW · 30 comments

Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”.

Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
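A toy sketch of the prototypical example, under loudly stated assumptions: each researcher gets a single hypothetical "skill" number, and a researcher declines to spread a flawed paper only if their skill is at least the (also hypothetical) difficulty of spotting the flaw. None of these names or numbers come from the post; they only illustrate how spread can track the bulk of the field rather than its best members.

```python
from statistics import median


def share_fraction(skills, flaw_difficulty):
    """Fraction of researchers who would pass along a flawed paper.

    Toy assumption: a researcher notices the flaw (and so does not share)
    only if their skill is at least `flaw_difficulty`.
    """
    sharers = [s for s in skills if s < flaw_difficulty]
    return len(sharers) / len(skills)


# Hypothetical field: 90 researchers with weak statistics, 10 strong ones.
field = [0.3] * 90 + [0.9] * 10
# Hypothetical small community whose median member is strong.
community = [0.9] * 8 + [0.3] * 2

flaw = 0.6  # assumed skill needed to notice the p-hacking

print("field:    ", median(field), share_fraction(field, flaw))
print("community:", median(community), share_fraction(community, flaw))
```

In this toy setup the ten strong researchers in the large field barely affect how widely the flawed paper spreads, while the small community mostly filters it out.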

(Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor:

… mostly, though, the reason I believe the claim is from seeing how people in fact interact with research and decide to spread it.)

Two interesting implications of the median researcher problem:

In particular, LessWrong sure seems like such a community. We have a user base with probably-unusually-high intelligence, community norms which require basically everyone to be familiar with statistics and economics, we have fuzzier community norms explicitly intended to avoid various forms of predictable stupidity, and we definitely have our own internal meme population. It’s exactly the sort of community which can potentially outperform whole large fields, because of the median researcher problem. On the other hand, that does not mean that those fields are going to recognize LessWrong as a thought-leader or whatever.

30 comments

Comments sorted by top scores.

comment by quetzal_rainbow · 2024-11-03T08:35:32.263Z · LW(p) · GW(p)

I'm not sure the median researcher is particularly important here, relative to, say, the median lab leader.

The median voter theorem works precisely because everyone's vote counts equally, but if you have a lab/research group leader who disincentivizes bad research practices, then in theory you should get a lab with good research practices.

In practice, lab leaders are often people who Goodhart incentives, which results in the current situation.

LessWrong has a chance to be better exactly because it is outside of the current system of perverse incentives. Although, it has its own bad incentives.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2024-11-03T19:23:29.890Z · LW(p) · GW(p)

I had the thought while reading the original post that I recall speaking to at least one researcher who, pre-replication crisis, was like "my work is built on a pretty shaky foundation as is most of the research in this field, but what can you do, this is the way the game is played". So that suggested to me that plenty of median researchers might have recognized the issue but not been incentivized to change it.

Lab leaders aren't necessarily in a much better position. If they feel responsibility toward their staff, they might feel even more pressured to keep gaming the metrics so that the lab can keep getting grants and its researchers good CVs.

comment by Hastings (hastings-greer) · 2024-11-03T00:19:33.931Z · LW(p) · GW(p)

Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers- and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.

What you really need to look out for are fields that could never, on a conceptual level, have a devastating replication crisis. Lesswrong sometimes strays a little close to this camp.

Replies from: gwern, scipio
comment by gwern · 2024-11-03T00:42:40.535Z · LW(p) · GW(p)

Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers- and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.

So... parapsychology? How'd that work out? Did they have the (ahem) spirit and get there sooner or later?

Replies from: hastings-greer
comment by Hastings (hastings-greer) · 2024-11-03T02:37:42.620Z · LW(p) · GW(p)

Personally I am quite pleased with the field of parapsychology. For example, they took a human intuition and experience ("Wow, last night when I went to sleep I floated out of my body. That was real!") and operationalized it into a testable hypothesis ("When a subject capable of out of body experiences floats out of their body, they will be able to read random numbers written on a card otherwise hidden from them.") They went and actually performed this experiment, with a decent deal of rigor, writing the results down accurately, and got an impossible result: one subject could read the card. (Tart, 1968.) A great deal of effort quickly went into further exploration (including military attention with the men who stare at goats etc) and it turned out that the experiment didn't replicate, even though everyone involved seemed to genuinely expect it to. In the end, no, you can't use an out of body experience to remotely view, but I'm really glad someone did the obvious experiments instead of armchair philosophizing.

https://digital.library.unt.edu/ark:/67531/metadc799368/m2/1/high_res_d/vol17-no2-73.pdf is a great read from someone who obviously believes in the metaphysical, and then does a great job designing and running experiments and accurately reporting their observations, and so it's really only a small ding against them that the author draws the wrong larger conclusions in the end.

comment by ROM (scipio ) · 2024-11-04T16:36:06.122Z · LW(p) · GW(p)

a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers-

A field can be absolutely packed with dreadful research and still see virtually no one getting fired. Take, for instance, the moment a prominent psychologist dubbed peers who questioned methodological standards as “methodological terrorists.” It’s the kind of rhetoric that sends a clear message: questioning sloppy methods isn’t just unwelcome; it’s practically heretical.

comment by Anon User (anon-user) · 2024-11-03T20:39:29.816Z · LW(p) · GW(p)

I am not sure about the median researcher. Many fields have a few "big names" that everybody knows and whose opinions have disproportionate weight.

comment by Kaj_Sotala · 2024-11-03T19:32:57.699Z · LW(p) · GW(p)

community norms which require basically everyone to be familiar with statistics and economics,

I think this is too strong. There are quite a few posts that don't require knowledge of either one to write, read, or comment on. I'm certain that one could easily accumulate lots of karma and become a well-respected poster without knowing either.

Replies from: Raemon, johnswentworth
comment by Raemon · 2024-11-03T19:34:49.729Z · LW(p) · GW(p)

Yeah, I didn't read this post and come away with "and this is why LessWrong works great", I came away with a crisper model of "here are some reasons LW performs well sometimes", but more importantly "here is an important gear for what LW needs to work great."

comment by johnswentworth · 2024-11-03T19:48:17.645Z · LW(p) · GW(p)

Our broader society has community norms which require basically everyone to be literate. Nonetheless, there are jobs in which one can get away without reading, and the inability to read does not make it that much harder to make plenty of money and become well-respected. These statements are not incompatible.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2024-11-03T21:25:41.830Z · LW(p) · GW(p)

Hmm... let me rephrase: it doesn't seem to me like we would actually have a clear community norm for this, at least not one strong enough to ensure that the median community member would actually be familiar with stats and econ.

comment by CstineSublime · 2024-11-03T04:36:45.925Z · LW(p) · GW(p)

Could you please elaborate on what you mean by "highly memetic" and "internal memetic selection pressures"? I'm probably not the right audience for this piece, but that particular word (memetic) is making it difficult for me to get to grips with the post as a whole. I'm confused if you mean there is a high degree of uncritical mimicry, or if you're making some analogy to 'genetic' (and what that analogy is...)

Replies from: johnswentworth
comment by johnswentworth · 2024-11-03T05:10:19.760Z · LW(p) · GW(p)

It is indeed an analogy to 'genetic'. Ideas "reproduce" via people sharing them. Some ideas are shared more often, by more people, than others. So, much like biologists think about the relative rate at which genes reproduce as "genetic fitness", we can think of the relative rate at which ideas reproduce as "memetic fitness". (The term comes from Dawkins back in the 70's; this is where the word "meme" originally came from, as in "internet memes".)

Replies from: gjm
comment by gjm · 2024-11-03T17:11:28.734Z · LW(p) · GW(p)

I think you're using "memetic" to mean "of high memetic fitness", and I wish you wouldn't. No one uses "genetic" in that way.

An idea that gets itself copied a lot (either because of "actually good" qualities like internal consistency, doing well at explaining observations, etc., or because of "bad" (or at least irrelevant) ones like memorability, grabbing the emotions, etc.) has high memetic fitness. Similarly, a genetically transmissible trait that tends to lead to its bearers having more surviving offspring with the same trait has high genetic fitness. On the other hand, calling a trait genetic means that it propagates through the genes rather than being taught, formed by the environment, etc., and one could similarly call an idea or practice memetic if it comes about by people learning it from one another rather than (e.g.) being instinctive or a thing that everyone in a particular environment invents out of necessity.

When you say, e.g., "lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc." I am pretty certain you mean "of high memetic fitness" rather than "people aware of it are aware of it because they learned of it from others rather than because it came to them instinctively or they reinvented it spontaneously because it was obvious from what was around them".

(It would be possible, though I'd dislike it, to use "memetic" to mean something like "of high memetic fitness for 'bad' reasons" -- i.e., liable to be popular for the sort of reason that we might not appreciate without the notion of memes. But I don't think that can be your meaning in the words I quoted, which seem to presuppose that the "default" way for a piece of work to be "memetic" is for it to be of high quality.)

Replies from: johnswentworth
comment by johnswentworth · 2024-11-03T17:30:16.250Z · LW(p) · GW(p)

I have split feelings on this one. On the one hand, you are clearly correct that it's useful to distinguish those two things and that my usage here disagrees with the analogous usage in genetics. On the other hand, I have the vague impression that my usage here is already somewhat standard, so changing to match genetics would potentially be confusing in its own right.

It would be useful to hear from others whether they think my usage in this post is already standard (beyond just me), or they had to infer it from the context of the post. If it's mostly the latter, then I'm pretty sold on changing my usage to match genetics.

Replies from: Thane Ruthenis, habryka4, D0TheMath
comment by Thane Ruthenis · 2024-11-04T11:53:17.491Z · LW(p) · GW(p)

Your use of "memetic" here did strike me as somewhat idiosyncratic; I had to infer it. I would have used "memetically viral" and derivatives in its place. (E.g., in place of "lots of work in that field will be highly memetic despite trash statistics", I would've said "lots of ideas in that field will be highly viral despite originating from research with trash statistics" or something.)

comment by habryka (habryka4) · 2024-11-03T18:19:18.324Z · LW(p) · GW(p)

Yep, it seems like pretty standard usage to me (and IMO seems conceptually fine, despite the fact that "genetic" means something different, since for some reason using "memetic" in the same way feels very weird or confused to me, like I would almost never say "this has memetic origin")

Replies from: johnswentworth
comment by johnswentworth · 2024-11-03T18:37:00.683Z · LW(p) · GW(p)

since for some reason using "memetic" in the same way feels very weird or confused to me, like I would almost never say "this has memetic origin"

... though now that it's been pointed out, I do feel like I want a short handle for "this idea is mostly passed from person-to-person, as opposed to e.g. being rederived or learned firsthand".

I also kinda now wish "highly genetic" meant that a gene has high fitness, that usage feels like it would be more natural.

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2024-11-03T18:54:44.762Z · LW(p) · GW(p)

I think in principle it makes sense in the same sense “highly genetic” makes sense. If a trait is highly genetic, then there’s a strong chance for it to be passed on given a reproductive event. If a meme is highly memetic, then there’s a strong chance for it to be passed on via an information transmission.

In genetic evolution it makes sense to distinguish this from fitness, because in genetic evolution the dominant feedback signal is whether you found a mate, not the probability a given trait is passed to the next generation.

In memetic evolution, the dominant feedback signal is the probability a meme gets passed on given a conversation, because there is a strong correlation between the probability someone passes on the information you told them, and getting more people to listen to you. So a highly memetic meme is also incredibly likely to be highly memetically fit.

comment by Garrett Baker (D0TheMath) · 2024-11-03T17:54:22.673Z · LW(p) · GW(p)

I definitely had no trouble understanding the post, and the usage seems very standard among blogs I read and people I talk to.

comment by bhauth · 2024-11-04T03:12:20.336Z · LW(p) · GW(p)

What would this say about subculture gatekeeping? About immigration policy?

comment by tailcalled · 2024-11-03T14:01:32.159Z · LW(p) · GW(p)

I think one thing that's missing here is that you're making a first-order linear approximation of "research" as just continually improving in some direction. I would instead propose a quadratic model where there is some maximal mode of activity in the world, but this mode can face certain obstacles that people can remove. Research progress is what happens when there's an interface for removing obstacles that people are gradually developing familiarity with (for instance because it's a newly developed interface).

Different people have different speeds by which they reach the equilibrium, but generally those who have an advantage would also exhibit an explosion of skills and production as they use their superior understanding of the interface.

comment by Ninety-Three · 2024-11-04T20:15:34.245Z · LW(p) · GW(p)

A small research community of unusually smart/competent/well-informed people can relatively-easily outperform a whole field, by having better internal memetic selection pressures.

 

It's not obvious to me that this is true, except insofar as a small research community can be so unusually smart/competent/etc that their median researcher is better than a whole field's median researcher so they get better selection pressure "for free". But if an idea's popularity in a wide field is determined mainly by its appeal to the median researcher, I would naturally expect its popularity in a small community to be determined mainly by its appeal to the median community member.

This claim looks like it's implying that research communities can build better-than-median selection pressures but, can they? And if so why have we hypothesized that scientific fields don't?

Replies from: Raemon
comment by Raemon · 2024-11-04T21:24:32.199Z · LW(p) · GW(p)

This claim looks like it's implying that research communities can build better-than-median selection pressures but, can they? And if so why have we hypothesized that scientific fields don't?

I'm a bit surprised this is the crux for you. Smaller communities have a lot more control over their gatekeeping because, like, they control it themselves, whereas the larger field's gatekeeping is determined via open-ended incentives in the broader world that thousands (maybe millions?) of people have influence over. (There are also things you could do in addition to gatekeeping. See Selective, Corrective, Structural: Three Ways of Making Social Systems Work [LW · GW])

(This doesn't mean smaller research communities automatically have good gatekeeping or other mechanisms, but it doesn't feel like a very confusing or mysterious problem of how to do better)

comment by ROM (scipio ) · 2024-11-04T16:27:55.221Z · LW(p) · GW(p)

People did in fact try to sound the alarm about poor statistical practices well before the replication crisis, and yet practices did not change, 

This rings painfully true. As early as the late 1950s, at least one person was already raising a red flag about the risks that psychology[1] might veer into publishing a sea of false claims:

There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to other investigators may be repeated independently until eventually by chance a significant result occurs—an 'error of the first kind'—and is published. Significant results published in these fields are seldom verified by independent replication. The possibility thus arises that the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance.

[1] Sterling isn't explicitly talking about psychology, but rather any field where significance tests are used.
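The mechanism in the quoted passage can be put into rough numbers. As a minimal sketch, assume (these are illustrative assumptions, not anything stated in the quote) that the effect being studied is truly null, that each attempt is an independent test at significance level alpha = 0.05, and that only a significant attempt gets published. The function name below is hypothetical.

```python
def prob_false_positive_published(n_attempts: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests of a true null
    comes out 'significant' at level alpha (an error of the first kind)."""
    return 1.0 - (1.0 - alpha) ** n_attempts


# How the chance of a publishable false positive grows with repeated attempts.
for n in (1, 5, 14, 20):
    print(n, round(prob_false_positive_published(n), 3))
```

Under these assumptions, 14 independent unpublished attempts are already enough to make a published "significant" result more likely than not.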

comment by Oxidize · 2024-11-03T17:46:47.996Z · LW(p) · GW(p)

How do you think competent people can solve this problem within their own fields of expertise? 

For example, the EA community is a small & effective community like you've referenced for commonplace charity/altruism practices. 

How could we solve the median researcher problem & improve the efficacy & reputation of altruism as a whole?

Personally, I suggest taking a marketing approach. If we endeavor to understand important similarities between "median researchers", so that we can talk to them in the language they want to hear, we may be able to attract attention from the broader altruism community which can eventually be leveraged to place EA in a position of authority or expertise.

What do you think?

comment by João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-11-03T21:33:58.130Z · LW(p) · GW(p)

Consistency in research fields and protection against elementary methodological malpractices such as p-hacking and the like should be enforced through automation. IQ over median does not correlate with creativity over median, as indicated by recent research, so I wouldn't worry too much about this side of your argument. I think future research in general has to contemplate what is the best way to harvest human creativity while ensuring that consistency, novelty and methodological robustness are enforced through automation.

Replies from: justinpombrio
comment by justinpombrio · 2024-11-04T14:55:07.471Z · LW(p) · GW(p)

IQ over median does not correlate with creativity over median

That's not what that paper says. It says that IQ over 110 or so (quite above median) correlates less strongly (but still positively) with creativity. In Chinese children, age 11-13.

Replies from: joao-ribeiro-medeiros
comment by João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-11-04T16:41:31.079Z · LW(p) · GW(p)

Correlation value over IQ at 100 seems to be already well under the variance so not really meaningful, and if you look at what the researchers call Originality, the correlation is actually negative over IQ 110. 

Just as a correction to your comment: I am not stating this as an adamant fact, but as an "indication", not a "demonstration". I said: "indicated by recent research".

I understand the reference I pointed out has a limited scope (Chinese children, age 11-13), as any research of this kind, but beyond the rigorous scientific demonstration of this concept, I am expressing the fact that IQ tests are very incomplete, which is not novel. 

Thank you for your response.