Ceteris paribus, I'd choose the second theory, since the process that generated it had strictly more information. Assume that the scientists would've generated the same theory given the same data, and that the data in question are coin flips. The first scientist sees a random-looking series of 10 coin flips with 5 heads and 5 tails and hypothesizes that they are generated by the random flips of a fair coin. We collect 10 more data points, and again we get 5 heads and 5 tails, the most probable head count given the first theory. Now the second scientist sees the same 20 coin flips and notices that the second series of 10 flips exactly duplicates the first. So the second scientist hypothesizes that the generating process is deterministic with a period of 10 flips. So even though the same 20 data points are a maximally likely outcome under both theories, the second theory assigns them more probability mass: a fair coin gives any particular 20-flip sequence probability 2^-20, while the periodic theory only has to account for the first 10 flips. I think this becomes more salient intuitively if we imagine increasing the length of the repeating series to, say, 1,000,000.
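
To put rough numbers on the comparison, here is a minimal Python sketch. It rests on an assumption I'm adding, not something stated above: the periodic theory is modeled as "the first 10 flips are uniformly random, and every later flip copies the flip 10 positions earlier." The function names and the example sequence are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def fair_coin_prob(flips):
    """Probability of one specific sequence under independent fair flips."""
    return Fraction(1, 2) ** len(flips)

def periodic_prob(flips, period):
    """Probability under a hypothetical 'repeats every `period` flips' model.

    Assumption (mine, for illustration): the first `period` flips are
    uniformly random, and each later flip must copy the flip one period back.
    """
    for i in range(period, len(flips)):
        if flips[i] != flips[i - period]:
            return Fraction(0)  # any break in the repetition rules the model out
    return Fraction(1, 2) ** min(period, len(flips))

first_ten = "HTTHHTHTTH"  # illustrative series: 5 heads, 5 tails
flips = first_ten * 2     # the second 10 flips duplicate the first 10

ratio = periodic_prob(flips, 10) / fair_coin_prob(flips)
print(ratio)  # 1024
```

Under these assumptions the periodic theory assigns the observed 20 flips 2^10 = 1024 times as much probability as the fair-coin theory. With a repeating series of length 1,000,000 the same calculation gives a factor of 2^1,000,000.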