ISO: Automated P-Hacking Detection
post by johnswentworth · 2019-06-16T21:15:52.837Z · score: 6 (1 votes) · LW · GW · 3 comments
I'm sure there are some ML students/researchers on LessWrong in search of new projects, so here's one I'd love to see and probably won't build myself: an automated method for predicting which papers are unlikely to replicate, given the text of the paper. Ideally, I'd like to be able to use it to filter and/or rank results from Google Scholar.
Getting a good data set would probably be the main bottleneck for such a project. Various replication-crisis papers which review replication success/failure for tens or hundreds of other studies seem like a natural starting point. Presumably some amount of feature engineering would be needed; I doubt anyone has a large enough dataset of labelled papers to just throw raw or lightly-processed text into a black box.
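A minimal sketch of what one hand-engineered feature might look like, assuming the paper text is available as a plain string: pull out the reported p-values with a regular expression and measure how many of the exactly reported ones sit just under the conventional 0.05 threshold. The function names, the regex, and the 0.01 window are all illustrative assumptions, not a validated detector, and a real pipeline would need many more features plus labelled replication outcomes to train on.

```python
import re

# Matches reported p-values such as "p = 0.049", "p < .05", "P=0.003".
# Hypothetical pattern: real papers use many more notations than this covers.
P_VALUE_RE = re.compile(r"\bp\s*([<=>])\s*(0?\.\d+)", re.IGNORECASE)


def extract_p_values(text):
    """Return (comparator, value) pairs for each p-value found in the text."""
    return [(m.group(1), float(m.group(2))) for m in P_VALUE_RE.finditer(text)]


def p_value_features(text, alpha=0.05, window=0.01):
    """Compute a few simple features from the p-values reported in a paper.

    The feature names and the `window` default are assumptions chosen for
    illustration only.
    """
    reported = extract_p_values(text)
    # Only exactly reported values ("p = ...") can be placed relative to alpha.
    exact = [value for op, value in reported if op == "="]
    just_below = [v for v in exact if alpha - window < v <= alpha]
    return {
        "n_p_values": len(reported),
        "n_exact": len(exact),
        "frac_just_below_alpha": len(just_below) / len(exact) if exact else 0.0,
        "min_exact_p": min(exact) if exact else None,
    }


if __name__ == "__main__":
    sample = "Effect A was significant (p = 0.049). Effect B, p = .003; effect C, p < .05."
    print(p_value_features(sample))
```

Feature dictionaries like this one could then be fed to any standard classifier alongside labels taken from the replication reviews mentioned above.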
Also, if anyone knows of previous attempts to do this, I'd be interested to hear about them.