Estimating the Number of Players from Game Result Percentages

post by Daniel L (daniel-lyakovetsky) · 2024-04-28T17:42:03.247Z · LW · GW · 2 comments

Contents

2 comments

Recently I got into the daily word puzzle game Couch Potato Salad. At the end of the game, it shows the percent of players who “nailed”, ”sailed”, ”prevailed”, ”exhaled” and ”failed”. Once, I played the game shortly after midnight when the new puzzle becomes available. I nailed it (ha!), but noticed that the game results suspiciously neatly split into 33.3%, 66.6% and zeroes for the rest. Aha, I thought, there must be only three or six or nine or twelve of us playing the game that early in the morning. That got me interested in figuring the number of players based on the score. 

Generalized as a number of respondents based on the poll results, I thought the following logic should work.

The number of respondents is the minimum positive integer that when multiplied by each of the voting percentages produces a near-integer number.

ChatGPT had concurred, while pedantically pointing that “this concept is akin to finding the least common multiple (LCM) of denominators in fractional representations of those percentages.” Ok.

I concocted the following Pythonic implementation:

N = min(
	[n for n in range(1, Nmax) 
		if all(
			[abs(v[i]*n - round(v[i]*n)) < 100*e 
				for i in range(0, len(v))
			])
	])

Where

I tried it with actual numbers from that game results from a bit later:

>>> N = lambda v, Nmax=10000, e=0.001: \
      min([n for n in range(1, Nmax) \
        if all(
          [abs(v[i]*n - round(v[i]*n)) < 100*e \
            for i in range(0, len(v))])
         ])
>>> N([0.333, 0.666, 0, 0, 0])
3
>>> N([0.125, 0.125, 0.75, 0, 0])
8
>>> N([0.409, 0.136, 0.182, 0.045, 0.227])
22
>>> N([0.821, 0.103, 0.051, 0.026, 0])
39
>>> N([0.671, 0.075, 0.171, 0.04, 0.043])
374

I don't know what was the actual number of people played, but intuitively this seems alright. It was still early in the morning.

ChatGPT suggested using numpy for efficiency, but agreed that my one-liner is Pythonic and elegant.

Implementation aside, what do you think? Is there a different / better way to think about this problem? How would you approach this?

2 comments

Comments sorted by top scores.

comment by bsol · 2024-04-29T18:19:44.397Z · LW(p) · GW(p)

For one, with e = 0.001 the algorithm N([0.64, 0.36, 0, 0, 0]) = 3 gives a wrong result. Using e = 0.0001 would give a better answer in this case. I haven't given more thought though on what e is optimal.

Though the bigger issue is the number of players can't strictly be computed based on percentages alone.

It would be nice if we could make the assumption that the least common multiple (with some leeway to account for rounding) is the correct answer, but I don't feel comfortable with that. This assumption comes from it being just past midnight and therefore we assume the number of players is "small." But how small?

What should N([0.5, 0.5, 0, 0, 0]) return? It could be any multiple of 2, many of which are small.

The only way I can think to get a correct answer is to frequently refresh the page to get a time series of percentages. Then an algorithm could possibly give the number of players. 

For example say you saw [0.5, 0.5, 0, 0, 0] and then refresh and immediately after saw [0.444, 0.444, 0.111, 0, 0]. If you were confident that only 1 player joined between refreshes, you would know that the number of players is now 9. If you can't be sure it was only 1 player, it would be a lot more complicated though.

Replies from: gwern
comment by gwern · 2024-04-29T23:25:16.797Z · LW(p) · GW(p)

I'm not sure what "margin of error" is. This is just rounding, is it not? It's not like the website is adding random epsilon numbers to screw with you: it is simply rounding off percentages. They are exact and deterministic up to rounding.

Though the bigger issue is the number of players can't strictly be computed based on percentages alone.

Since it's a non-deterministic problem, you'd represent all possible answers in ascending order as a lazy generator. You can then filter it by any known constraints or requirements (maybe you know players have to be paired, so it's always an even number of players, and you can filter out all odd values). Since the number of possible valid values will increase greatly as the total N increases, this might get slow and require some sort of sieve. (At a guess, since it's rounding, the range of possible values presumably increases rapidly, and so it might make more sense to instead returns pairs of (lower,upper) bounds?)

Personally, I would first start with the forward problem, since it is so simple. Then I could test any algorithms or tweaks by generating a random N of players and testing that all of the generator values <= N are correct.

This has the benefit that you can also easily do simple Bayesian updates by ABC without the rigmarole of pymc or Stan etc: just draw a sample from the prior over the n players, feed it in, see if you replicate the exact observed %s, and if not, delete the sample; the samples you keep are the new posterior.