Simultaneous Redundant Research
post by eg · 2021-08-17T12:17:27.701Z · LW · GW
Suppose for some big problem that research labor is plentiful and time is short. Obviously the first thing to do is to divide the research into subproblems and research them in parallel. But what if the number of research teams still exceeds the number of real subproblems identified?
One possibility: some teams preregister methodologies for the subproblems, and each remaining team picks one to copy and perform independently, effectively a replication run concurrently with the original team.
Possible advantages of this approach:
* Redundancy: If one team gets stopped or delayed, you still get the preliminary result on time.
* Immediate replication: If all goes well, you get the more confident replicated result faster than with sequential replication.
* Faster error detection: If there is a discrepancy in results, there could be immediate investigation of what the teams did differently.
* Research capital: The "extra" teams get to spend their time doing real research instead of other things, presumably making them better at doing real research for when more subproblems become available.
* Replication motivation: Public reports of the results of the first team to finish could mention the other teams by name, thus creating more prestige for replication.
And for research that is more conceptual than empirical, the teams might go in completely different directions and generate insights that a single team or individual would not.
1 comment
comment by gwern · 2021-08-17T17:09:06.933Z · LW(p) · GW(p)
> Obviously the first thing to do is to divide the research into subproblems and research them in parallel. But what if the number of research teams still exceeds the number of real subproblems identified?
This is easy but not necessarily optimal. Sometimes you want to overkill a hypothesis before falling back. Imagine a scenario where the top hypothesis has 50% prior probability, you run an experiment which is powered to have a 10% error rate in definitively accepting/rejecting it, and you could run a second experiment reducing that error to 5%; do you really want to instead spend that experiment testing a dark horse hypothesis with a prior probability of 1%? Probably better to drive that top hypothesis down to <1% first, letting the marginal value of the other hypotheses grow, before investing in buying lottery tickets.
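To make that arithmetic concrete, here is a minimal sketch of the Bayesian updating, assuming symmetric error rates (false positives as likely as false negatives) and conditionally independent experiments; the numbers are the ones above, and the `posterior` helper is purely illustrative.

```python
# Sketch of the "overkill the top hypothesis first" arithmetic.
# Assumes symmetric per-experiment error rates and conditional independence.

def posterior(prior, error_rates, results):
    """Posterior P(H | results), where results[i] = True means experiment i
    supported H and error_rates[i] is that experiment's error rate."""
    like_h, like_not_h = prior, 1.0 - prior
    for err, supports in zip(error_rates, results):
        like_h *= (1 - err) if supports else err
        like_not_h *= err if supports else (1 - err)
    return like_h / (like_h + like_not_h)

top = 0.50         # top hypothesis, 50% prior
dark_horse = 0.01  # long-shot hypothesis, 1% prior

# One 10%-error experiment rejecting the top hypothesis:
print(posterior(top, [0.10], [False]))                # ~0.10
# A follow-up 5%-error experiment also rejecting it:
print(posterior(top, [0.10, 0.05], [False, False]))   # ~0.006, i.e. <1%
# Versus spending that second experiment on the dark horse, and it "hits":
print(posterior(dark_horse, [0.10], [True]))          # ~0.08, still a long shot
```

On these assumptions, the follow-up experiment drives the top hypothesis from roughly 10% down to well under 1%, while a single experiment on the 1% dark horse only moves it to around 8%.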
This is a pretty classic sort of multi-stage decision problem in research, so relevant material comes up everywhere depending on how you look at it: it's related to experiment design, particularly factorial design; to external vs internal validity, especially in meta-analysis, where you balance measuring between-study heterogeneity/systematic error against overcoming within-study random sampling error; to group testing; and to parallelized blackbox optimization (especially hyperparameter optimization, where you can more easily run many models in parallel than one model really fast), where you have to distribute many arms' samples across the loss landscape and need to avoid over-concentrating in narrow regions of settings.
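For the parallelized-optimization angle specifically, one toy way to picture "spread arms without over-concentrating" is Thompson sampling, where each parallel team draws from the current posterior and picks the best-looking arm; the success rates and team counts below are made up purely for illustration and are not from any of the references above.

```python
# Toy Thompson-sampling allocation of several parallel "teams" per round
# across competing hypotheses/arms with unknown payoff probabilities.

import random

true_rates = [0.50, 0.10, 0.01]   # hypothetical payoff probability per hypothesis
alpha = [1.0] * len(true_rates)   # Beta posterior parameters per arm (successes + 1)
beta = [1.0] * len(true_rates)    # (failures + 1)

teams_per_round = 4
for _ in range(25):
    # Each team independently samples a plausible rate for every arm and
    # picks the argmax; the randomness keeps teams from all piling onto one arm.
    picks = []
    for _ in range(teams_per_round):
        draws = [random.betavariate(alpha[i], beta[i]) for i in range(len(true_rates))]
        picks.append(max(range(len(draws)), key=draws.__getitem__))
    # Run the chosen experiments and update the posteriors.
    for arm in picks:
        if random.random() < true_rates[arm]:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0

print([round(a / (a + b), 2) for a, b in zip(alpha, beta)])  # posterior mean per arm
```

Because each team draws its own posterior sample, allocation naturally spreads across arms while the evidence is weak and concentrates as one arm pulls ahead.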