D&D.Sci Holiday Special: How the Grinch Pessimized Christmas Evaluation & Ruleset

post by aphyer · 2022-01-11T01:29:59.816Z · LW · GW · 3 comments

Contents

  RULESET
  DATASET GENERATION
  STRATEGY
  LEADERBOARD
  FEEDBACK REQUEST

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

Full generation code is available here if you are interested, or you can read on.

RULESET

Each of the six toys makes a different amount of noise depending on what child receives it:

Each child's noise is the sum of their two toys' noises.  Additionally, each child has a favorite toy: any noise a child makes with their favorite toy is doubled.

For example, Johnny Drew Who, in our dataset, is a 4-year-old Male Who Child whose favorite toy is the Blum-Blooper, so any noise he makes with a Blum-Blooper counts double.
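A minimal Python sketch of that computation, with placeholder per-toy noise values (in the real ruleset, each toy's noise depends on the child receiving it):

```python
# Placeholder per-toy noise values: NOT the real table, in which a
# toy's noise depends on the child who receives it.
TOY_NOISE = {"Blum-Blooper": 8, "Fum-Foozler": 6}

def child_noise(toys, favorite):
    """Sum both toys' noise, doubling any noise from the favorite toy."""
    return sum(TOY_NOISE[t] * (2 if t == favorite else 1) for t in toys)

# Johnny's favorite is the Blum-Blooper, so its noise counts double:
print(child_noise(["Blum-Blooper", "Fum-Foozler"], favorite="Blum-Blooper"))
# -> 8 * 2 + 6 = 22 with these placeholder values
```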



DATASET GENERATION

There are three families: the Drew Whos, Lou Whos and Sue Whos.

Each year, Who Children age and may be born - each family rolls a d4 and has a new child if the roll is greater than the number of children it currently has.  As such, there are never more than 12 children (4 in each of the 3 families), and the overall number is sometimes a bit lower (averaging around 3 children per family at any given time).  Each Who child aged 1-12 is then given two toys.
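A minimal sketch of that birth mechanic (how grown children leave the pool, and the newborn's starting age, are assumptions of this sketch):

```python
import random

def simulate_year(families):
    """One year of Who demographics, as a sketch.

    `families` maps family name -> list of child ages. Children age one
    year, then each family rolls a d4 and gains a newborn if the roll
    is greater than its current number of children. The age-12 cutoff
    for leaving the toy pool is an assumption, not stated in the post.
    """
    for ages in families.values():
        ages[:] = [a + 1 for a in ages if a + 1 <= 12]  # age up; assumed cutoff
        if random.randint(1, 4) > len(ages):            # the d4 birth roll
            ages.append(1)                              # newborn counts as age 1

families = {"Drew Who": [], "Lou Who": [], "Sue Who": []}
for _ in range(30):
    simulate_year(families)
```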

Each Who child, when they are born, is assigned a Favorite Toy at random.  

There weren't actually any tricks in this one.  I meddled with the RNG in only one minor way: by default it produced a 1-year-old Who Child just born this year, and I left that child out because there would have been no way for you to determine their Favorite Toy.  Aside from that, the Whos' ages and their favorite toys were determined purely by the RNG.



STRATEGY

Once you know how the rules work, the noise-minimizing strategy is to pick, for each child, the pair of toys whose total noise (with the favorite-toy doubling taken into account) is lowest.

The noise-maximizing strategy is essentially the reverse.
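A rough sketch of that per-child pair-picking (assuming each child receives two distinct toys, and ignoring any inventory constraints the full scenario may impose):

```python
from itertools import combinations

TOYS = ["Blum-Blooper", "Fum-Foozler", "Gah-Ginka",
        "Sloo-Slonker", "Trum-Trooper", "Who-Whonker"]

def best_pair(noise_for, favorite, maximize=False):
    """Return the two-toy pair with the lowest (or highest) total noise
    for one child. `noise_for(toy)` gives this child's noise for a toy;
    noise from the favorite toy counts double."""
    def total(pair):
        return sum(noise_for(t) * (2 if t == favorite else 1) for t in pair)
    return (max if maximize else min)(combinations(TOYS, 2), key=total)

# Friendly-Grinch direction for one child; maximize=True gives the reverse.
# (The noise_for function here is a placeholder, not the real table.)
print(best_pair(lambda toy: len(toy), favorite="Blum-Blooper"))
```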

The Who children and their Favorite Toys are:

Name            | Age | Gender | Favorite Toy
Andy Sue Who    | 12  | M      | Sloo-Slonker
Betty Drew Who  | 11  | F      | Fum-Foozler
Sally Sue Who   | 11  | F      | Fum-Foozler
Phoebe Drew Who | 9   | F      | Sloo-Slonker
Freddie Lou Who | 8   | M      | Sloo-Slonker
Eddie Sue Who   | 8   | M      | Who-Whonker
Cindy Drew Who  | 6   | F      | Who-Whonker
Mary Lou Who    | 6   | F      | Gah-Ginka
Ollie Lou Who   | 5   | M      | Fum-Foozler
Johnny Drew Who | 4   | M      | Blum-Blooper

One example of a noise-minimizing allocation is:

For a total of 108 noise.  One example of a noise-maximizing allocation is:

For a total of 237 noise.

LEADERBOARD

Note: inputting the toy selections was a somewhat manual process; if you think I've scored you wrong, let me know.

Player                      | Noise
Noise-minimizing allocation | 108
GuySrinivasan (min)         | 108
abstractapplic (min)        | 118
simon (min)                 | 123
Yonge (min)                 | 128
MadHatter (min)             | 129
Random allocation           | 165
MadHatter (max)             | 204
abstractapplic (max)        | 207
simon (max)                 | 229
Noise-maximizing allocation | 237

Congratulations to everyone who submitted, particularly to GuySrinivasan (who figured out the entire ruleset and got the perfect noise-minimizing allocation) and also to simon (who had the best Friendly Grinch solution).


FEEDBACK REQUEST

As usual, I'm interested in feedback.  If you played the scenario, what did you like and what did you not like?  If you might have played but in the end did not, what drove you away?  Is the timeline too long/too short/just right?

In particular, I tried to make this scenario simple - there were a couple of sneaky things (Favorite Toys and Trum-Troopers), but for the most part I think it was a lot less complex than previous scenarios.  Bearing this out, we had a perfect answer from GuySrinivasan and near-perfect answers from several others.  How do players think this difficulty level compared to what you want to see?

3 comments


comment by abstractapplic · 2022-01-11T10:22:04.876Z · LW(p) · GW(p)

Reflections on my attempt:

It looks like I was basically right. Even in the place I came up short – figuring out Trum-Troopers – I knew I was probably missing something, since it would have been weird for that to be the only not-perfectly-predictable part of the problem.

Reflections on the challenge:

This is the first D&D.Sci which is a pure puzzle; that is, the first one without randomness in the linkage between explanatory and response variables. I think this would be unfair for something presented as a social science problem, except that a) the Seussian context was a pretty big hint that normal rules don't apply and implausibly-tidy solutions are on the table, b) it was fairly obvious after an hour or two of looking at the data that there were unusually clean linkages between at least some toy choices and at least some noise levels (G+S equals exactly 16 more than half the time, and many toy combos have a suspiciously low number of possible noise outputs), and c) using only ML can still net you a much-better-than-chance result. However, I suspect the anomalous neatness may have limited engagement: once someone posts a complete or near-complete solution, why bother investigating for yourself? And why bother writing about your analysis if it's identical or near-identical to one already posted?
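For instance, the G+S check could look something like this sketch (the filename and column names are assumptions):

```python
import pandas as pd

# How often does the Gah-Ginka + Sloo-Slonker combination produce
# exactly 16 noise? Filename and column names are assumptions.
df = pd.read_csv("grinch_data.csv")
pair = df[["First Toy", "Second Toy"]].apply(lambda r: tuple(sorted(r)), axis=1)
gs = df[pair.isin([("Gah-Ginka", "Sloo-Slonker")])]
print((gs["Noise"] == 16).mean())  # fraction of G+S rows at exactly 16
```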

Others may have other opinions, but I really liked the 10-day length: giving players a week plus a choice of weekend provided a lot of breathing room. I also (continue to) think the problem introduction was the best-written one so far. And while I don’t know if the high puzzleishness was a good idea overall, it definitely enabled one heck of a eureka moment when I figured out (most of) the rules. Thank you very much for running this game.

comment by SarahSrinivasan (GuySrinivasan) · 2022-01-12T16:48:21.227Z · LW(p) · GW(p)

I enjoyed this. Due to Events taking many, many spoons, I would not have tried to engage with something that seemed less ... deterministic. But this clearly had very deterministic elements just by groupby toy-pair | uniq, and was pretty fun to find each successive "obvious" thing, model it, "subtract it out", find a new "obvious", repeat until done.
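A pandas sketch of that first groupby check (the filename and column names are assumptions):

```python
import pandas as pd

# Sketch of the "groupby toy-pair | uniq" check; the filename and the
# column names ("First Toy", "Second Toy", "Noise") are assumptions.
df = pd.read_csv("grinch_data.csv")
pair = df.apply(lambda r: tuple(sorted((r["First Toy"], r["Second Toy"]))), axis=1)
print(df.groupby(pair)["Noise"].nunique().sort_values())
# Toy pairs that map to only a few distinct noise values look
# deterministic, hinting that the generating rules are exact.
```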

comment by MadHatter · 2022-01-11T02:33:39.127Z · LW(p) · GW(p)

Thanks for organizing!

Feedback: I was a little bit surprised to see a perfectly regular solution. (And I did relatively poorly because of my assumption that there would not be one.) I feel like real-world data is never as clean as this; on the other hand, all data benefits from taking a closer look at it and trying to understand if there are any regularities in the failure modes of your modeling toolkit, so maybe this is just a lesson for me. Hard to say!