D&D.Sci December 2022 Evaluation and Ruleset

post by abstractapplic · 2022-12-12T21:21:08.781Z · LW · GW · 8 comments

Contents

  Ruleset
    Snark Sub-Species
    Which Snarks are Hunted?
  Strategy
  Leaderboardplot
  Reflections
8 comments

This is a followup to the D&D.Sci post I made ten days ago; if you haven’t already read it, you should do so now before spoiling yourself.

Here is the web interactive I built to let you evaluate your solution; below is an explanation of the rules used to generate the dataset (my full generation code is available here, in case you’re curious about details I omitted). You’ll probably want to test your answer before reading any further.


Ruleset

Snark Sub-Species

There are thirteen distinct types of Snark; three of these are Boojums. Typical characteristics for each sub-species (which are frequently deviated from; see my generation code for details) are summarized in the table below:

| Name | Freq | Boojum? | Average Waking-Time | Other Characteristics |
|---|---|---|---|---|
| Vorpal | 19% | No | 2:27pm | Hollow yet Crisp taste; Extreme Fondness; Moderate Cleanliness; Moderate Phobia |
| Frumious | 7% | No | 2:00pm | Crumbling yet Blunt taste; Mild/Moderate Fondness; Moderate Cleanliness; Extreme Phobia |
| Slythy | 14% | No | 4:20pm | Hollow/Artless taste; Crisp/Neat taste; Mild Everything |
| Mimsy | 4% | No | 4:10pm | Artless/Meagre taste; Bright taste; Moderate Everything |
| Manxome | 4% | No | 2:41pm | Hollow/Haunting taste; Blunt taste; Unusually specific sleep schedule; (Very!) Mild Fondness; Moderate Cleanliness; Extreme/Moderate Phobia |
| Whiffling | 6% | No | 1:39pm | Hollow yet Bright taste; Relatively specific sleep schedule; Extreme Fondness; (Very!) Mild Cleanliness; Moderate Phobia |
| Burbling | 8% | No | 4:33pm | Artless yet Crisp/Clear taste; Extreme Fondness; Moderate Phobia |
| Gyring | 10% | No | 3:22pm | Meagre yet Neat taste; Moderate Fondness; Moderate Cleanliness; Extreme Phobia |
| Gimbling | 11% | No | 3:23pm | Artless/Meagre taste; Extreme Fondness; Moderate Cleanliness; Mild Phobia |
| Cromulent | 5% | Yes | 4:01pm | Hollow/Crumbling taste; Blunt taste; Mild Cleanliness; Phobia almost never Moderate |
| Snippid | 2% | Yes | 2:44pm | Meagre/Haunting taste; Clear taste; Moderate Fondness; Moderate/Extreme Phobia |
| Scrumbling | 3% | Yes | 4:22pm | Crumbling yet Blunt taste; Mild Fondness; Moderate/Mild Cleanliness; Moderate/Mild Phobia |

Which Snarks are Hunted?

2% of all sighted Snarks are left unhunted due to logistical problems or gluts of potential targets.

Conventional wisdom in the Snark-hunting community is that Snarks with a Taste containing “Crumbling” and (to a lesser extent) “Blunt” are much more likely to be Boojums, and so should not be hunted. Six-sevenths of Snark-hunters follow this advice regarding “Blunt”, and everyone follows it regarding “Crumbling”.
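
The hunting rules above imply a strong selection effect in the dataset. Here is a minimal sketch of it, assuming the 2% logistical skip applies independently of Taste and that a Snark is otherwise hunted only when its hunter does not avoid its Taste. The function name, its parameters, and this way of combining the two rules are illustrative assumptions, not taken from the generation code.

```python
def p_hunted(taste: str, base_skip: float = 0.02, blunt_avoiders: float = 6 / 7) -> float:
    """Rough probability that a sighted Snark with this Taste gets hunted.

    Assumption: the logistical skip and the taste-based avoidance are independent.
    """
    if "Crumbling" in taste:
        return 0.0  # every hunter avoids Crumbling
    p = 1.0 - base_skip  # not dropped for logistical reasons
    if "Blunt" in taste:
        p *= 1.0 - blunt_avoiders  # only one hunter in seven ignores the advice
    return p


for taste in ("Hollow yet Crisp", "Hollow yet Blunt", "Crumbling yet Blunt"):
    print(f"{taste}: {p_hunted(taste):.3f}")
```

Under these assumptions, outcomes for Crumbling Snarks never appear in the hunted data at all, and Blunt Snarks are heavily under-represented.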

Strategy

The risk associated with each Snark in the list is as follows:

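| Snark | Risk |
|---|---|
| A | 1.75% |
| B | 3.39% |
| C | 2.41% |
| D | 0.29% |
| E | 98.93% |
| F | 77.46% |
| G | 0.32% |
| H | 0.07% |
| I | 3.71% |
| J | 1.34% |
| K | 2.21% |
| L | 1.21% |
| M | 0.86% |
| N | 0.14% |
| O | 0.55% |
| P | 1.26% |
| Q | 0.03% |
| R | 0.38% |
| S | 0% |
| T | 0.01% |
| U | 0.03% |
| V | 0.05% |
| W | 4.58% |
| X | 7.13% |
| Y | 0.04% |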

Which Snarks are worth hunting is a function of your own appetite for risk; the only certainty (given that you assign money sublinear utility, only care about killing Snarks so you can spend the reward money, and value your life and the lives of the crew at >0) is that E, F and X aren't worth it.
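
One way to make the trade-off concrete is a minimal sketch that assumes the risks are independent, that a single Boojum forfeits everything, and that the hunter is risk-neutral (maximizing expected Snarks rather than a sublinear utility). The names below are illustrative; the risk values are copied from the table above.

```python
# Risk (in percent) that each Snark is a Boojum, copied from the table above.
RISK_PCT = {
    "A": 1.75, "B": 3.39, "C": 2.41, "D": 0.29, "E": 98.93, "F": 77.46,
    "G": 0.32, "H": 0.07, "I": 3.71, "J": 1.34, "K": 2.21, "L": 1.21,
    "M": 0.86, "N": 0.14, "O": 0.55, "P": 1.26, "Q": 0.03, "R": 0.38,
    "S": 0.0, "T": 0.01, "U": 0.03, "V": 0.05, "W": 4.58, "X": 7.13,
    "Y": 0.04,
}


def safest_prefixes(risk_pct):
    """Yield (chosen, p_survival, expected_snarks) for the k safest Snarks, k = 1..n."""
    chosen, p_survival = [], 1.0
    for name in sorted(risk_pct, key=risk_pct.get):  # safest first
        chosen.append(name)
        p_survival *= 1.0 - risk_pct[name] / 100.0
        yield list(chosen), p_survival, len(chosen) * p_survival


# Risk-neutral choice: the prefix with the highest expected number of Snarks.
best = max(safest_prefixes(RISK_PCT), key=lambda row: row[2])
print(f"hunt {len(best[0])} Snarks: {''.join(best[0])}")
print(f"P(survival) = {best[1]:.3%}, expected Snarks = {best[2]:.2f}")
```

Under this model, adding a (k+1)-th Snark with risk r raises the expected haul only when r < 1/(k+1); anyone more risk-averse than that would stop earlier.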

Leaderboardplot

The solutions provided for the Bonus Task look like this.

| Player (listed alphabetically) | Snarks Selected | P(Survival) | EV |
|---|---|---|---|
| aphyer | ABGHLPQVWY | 87.899% | 8.79 Snarks |
| simon | YNVGQRPHDBL | 93.004% | 10.23 Snarks |
| Thomas Sepulchre | BPGYHQTS | 94.959% | 7.60 Snarks |
| Yonge | BGHPQY | 94.964% | 5.69 Snarks |
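
The P(Survival) and EV columns are consistent with a simple reading: survival is the product of (1 - risk) over the selected Snarks, and EV is the number selected times that probability. Here is a minimal sketch checking this against the rounded risks from the Strategy table, so the last digits can differ slightly from the leaderboard. Function and variable names are illustrative.

```python
from math import prod

# Risk (in percent) that each Snark is a Boojum, from the Strategy table.
RISK_PCT = {
    "A": 1.75, "B": 3.39, "C": 2.41, "D": 0.29, "E": 98.93, "F": 77.46,
    "G": 0.32, "H": 0.07, "I": 3.71, "J": 1.34, "K": 2.21, "L": 1.21,
    "M": 0.86, "N": 0.14, "O": 0.55, "P": 1.26, "Q": 0.03, "R": 0.38,
    "S": 0.0, "T": 0.01, "U": 0.03, "V": 0.05, "W": 4.58, "X": 7.13,
    "Y": 0.04,
}

SELECTIONS = {
    "aphyer": "ABGHLPQVWY",
    "simon": "YNVGQRPHDBL",
    "Thomas Sepulchre": "BPGYHQTS",
    "Yonge": "BGHPQY",
}


def survival_and_ev(picks: str) -> tuple[float, float]:
    """Return (P(no Boojum among picks), number of picks times that probability)."""
    p_survival = prod(1.0 - RISK_PCT[s] / 100.0 for s in picks)
    return p_survival, len(picks) * p_survival


for player, picks in SELECTIONS.items():
    p, ev = survival_and_ev(picks)
    print(f"{player:17s} {p:8.3%}  {ev:5.2f} Snarks")
```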

 

In classic Carroll-ian fashion, all have won and all must have prizes. Everyone but aphyer is on the Efficient Frontier, and I was planning to offer [UNSPECIFIED BENEFIT (UNDERWHELMING)] to him just for being a fellow D&D.Sci creator; you'll all be contacted when it's ready.
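
The Efficient Frontier claim can be checked directly from the leaderboard. This is a minimal sketch, assuming "efficient" means that no other entry is at least as good on both P(Survival) and EV and strictly better on one; the helper name is illustrative.

```python
# (P(Survival) in percent, EV in Snarks), copied from the leaderboard.
ENTRIES = {
    "aphyer": (87.899, 8.79),
    "simon": (93.004, 10.23),
    "Thomas Sepulchre": (94.959, 7.60),
    "Yonge": (94.964, 5.69),
}


def dominated(name: str) -> bool:
    """True if some other entry is no worse on both axes and better on at least one."""
    p, ev = ENTRIES[name]
    return any(
        q >= p and other_ev >= ev and (q > p or other_ev > ev)
        for other, (q, other_ev) in ENTRIES.items()
        if other != name
    )


frontier = [name for name in ENTRIES if not dominated(name)]
print(frontier)
```

Here aphyer's entry is dominated by simon's, which has both a higher survival probability and a higher EV; the other three each trade one axis against the other.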

Reflections

In my view, this challenge did nothing it attempted to do, while facilitating several worthwhile things it attempted to facilitate, and a few it didn’t.

The bad news first. Most critically, I provided far too many rows and far too few columns (continuous columns in particular), enabling people to successfully select Snarks by simply subsetting (sorry!). The central conceit of trying to optimize conflicting objectives under uncertainty also lacked a necessary basis: without a good sense of how much a Snark-corpse is worth, and how much the lives of the crew (should?) matter, the intended tension between these targets hangs loose.

However, the challenge ended up serving as a fruitful basis itself. It (completely unintentionally!) provided a demonstration of how much easier optimizing for EV is than minimizing risk, and how much harder it is to fairly evaluate. It allowed the creation of an interactive with a novel gimmick, tempting players to either hyperbolically discount and give up early, or to take on more risk than they planned in order to reach the next story event. It represented a cautionary tale for makers of future D&D.Sci games. And, finally, the intro poem was (in my humble opinion) an absolute banger.

In summary, this game was much better at having Artistic Merit than at actually being worth playing (my congratulations and condolences to the people who did play it for their excellent work). That was accidentally a really good match for the time of year, since everyone with Data skills spends December frantically trying to hit end-of-year targets (cf. the muted response to aphyer’s excellent How the Grinch Pessimized Christmas). I’ll be sure to take seasonality into account when planning future games.

. . . I usually end these sections with “feedback on these points, and any other point, is greatly appreciated”, but this time I’m already pretty sure what went well/badly. That said, if you disagree with any of the above, I’d still like to be corrected.

8 comments

Comments sorted by top scores.

comment by Christian Z R · 2024-10-02T12:33:39.399Z · LW(p) · GW(p)

Thanks for the great work. Found out that a simple Random Forest model combined with avoiding everything Crumbly bagged me 20 snarks with a 72.5% survival chance. So expected number of snarks would be 14.5. Looking at it afterwards this seems like actually the worst and most suicidal way to attack the problem. But, hey, at least I got made Boatmaster.

comment by aphyer · 2022-12-13T00:40:03.405Z · LW(p) · GW(p)

Oh come on, at least put 'Random Play' in the leaderboard so I can feel better about being the only person not to win! :P

Replies from: simon
comment by simon · 2022-12-13T03:51:48.616Z · LW(p) · GW(p)

Calculated using assumptions that I thiiiink are correct given that each snark hunting choice is independent, if you don't trust me you can work it out for yourself :p

I used the 3% chance of conventional non-hunting for non-blunt non-crumbling snarks given in the code, not the 2% given in the post.

RandomN = N% chance to pick each Snark (no floor at 6).

comment by Thomas Sepulchre · 2022-12-12T21:46:45.563Z · LW(p) · GW(p)

My model was spot on, and yet, somehow, the results aren't very close to the truth, I'll have to think about what I did wrong. Anyhow, I liked the challenge a lot, thank you!

Also, the poem was amazing, thanks again for that!

Replies from: simon
comment by simon · 2022-12-13T03:56:45.038Z · LW(p) · GW(p)

How inaccurate were your results? Maybe your expectations were just too high?

Replies from: Thomas Sepulchre
comment by Thomas Sepulchre · 2022-12-13T06:26:35.822Z · LW(p) · GW(p)

I placed B as the safest snark, despite it being the 21st

In general, I completely missed the fact that the choice not to hunt a snark was very far from random, thus introducing a bias I neither noticed nor corrected for

Replies from: simon
comment by simon · 2022-12-13T19:39:35.358Z · LW(p) · GW(p)

I'm not convinced that's the issue... 

If B is a boojum it's almost certainly a Snippid, which should show up just fine.

(0.03386145617504304, {'Vorpal': 0.9114744863640762, 'Frumious': 0.00013955487845201242, 'Slythy': 0.012207182834474093, 'Mimsy': 0.0, 'Manxome': 0.0, 'Whiffling': 0.0, 'Burbling': 0.0, 'Uffish': 0.0, 'Gyring': 0.015702867032507836, 'Gimbling': 0.026614452715446928, 'Cromulent': 1.1153450923986715e-05, 'Snippid': 0.033850302724119055, 'Scrumbling': 0.0})

The above is the output for B from adding a "normalized_sprobs" to abstractapplic's eval_snark_probs as follows:

def eval_snark_probs(ptaste, mtaste, wakemins, fond, lin, phob):
    # Unnormalized weight of each sub-species given the observed traits
    # (eval_species_probs and snarks are defined elsewhere in the evaluation code).
    sprobs = eval_species_probs(ptaste, mtaste, wakemins, fond, lin, phob)
    Ybooj = sum(sprobs[name] for name in sprobs if snarks[name]["boojum"])
    Nbooj = sum(sprobs[name] for name in sprobs if not snarks[name]["boojum"])
    # Rescale so the per-species probabilities sum to 1.
    total = sum(sprobs.values())
    normalized_sprobs = {name: prob / total for name, prob in sprobs.items()}
    return Ybooj / (Ybooj + Nbooj), normalized_sprobs
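
As a sanity check on the output above, the headline risk for B should equal the total normalized probability on the Boojum sub-species. Here is a minimal sketch using the printed values (zero entries omitted) and the Boojum labels from the post's sub-species table; the variable names are illustrative.

```python
# Non-zero normalized sub-species probabilities printed for Snark B.
posterior_b = {
    "Vorpal": 0.9114744863640762,
    "Frumious": 0.00013955487845201242,
    "Slythy": 0.012207182834474093,
    "Gyring": 0.015702867032507836,
    "Gimbling": 0.026614452715446928,
    "Cromulent": 1.1153450923986715e-05,
    "Snippid": 0.033850302724119055,
}
# The three sub-species marked as Boojums in the post's table.
BOOJUMS = {"Cromulent", "Snippid", "Scrumbling"}

# Total probability mass on Boojum sub-species, and Snippid's share of it.
risk_b = sum(p for name, p in posterior_b.items() if name in BOOJUMS)
snippid_share = posterior_b["Snippid"] / risk_b

print(f"P(B is a Boojum)    = {risk_b:.4%}")
print(f"Snippid share of it = {snippid_share:.2%}")
```

The sum reproduces the first element of the tuple, and nearly all of the Boojum mass sits on Snippid, in line with the remark above.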

Replies from: Thomas Sepulchre
comment by Thomas Sepulchre · 2022-12-16T14:41:54.601Z · LW(p) · GW(p)

Sorry for the late response

You're absolutely right, thank you, the inaccurate positioning of B has nothing to do with the probability for a snark not to be hunted

Looking at the code, my model is actually not really spot on, it just kind of looks similar to the real one. I also assume that the snarks can be split into species, each with specific waking-times, phenotypes and probability of being a Boojum, but, in the details, both are actually quite different.

So yes, I built a different model, and got a different ranking of snarks, what was I expecting '^^

Thank you