[S] D&D.Sci: All the D8a. Allllllll of it. Evaluation and Ruleset

post by aphyer · 2023-02-27T23:15:39.094Z · LW · GW · 7 comments

Contents

  RULESET
    PLAYER STATS
  STRATEGY
  LEADERBOARD
  BONUS OBJECTIVES
  FEEDBACK REQUEST
  UNRELATED JOB HUNTING BLEG
7 comments

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

A party has three stats, corresponding to the three main types of challenge they will face in the course of the game:

Combat.  The Black King wields the immense power of his Ring Of Orbs N-fold (where N is # of players).  If your Combat stat is too low to defeat him, you will surely be vanquished.

Friendship.  You will need to make allies among the consorts and denizens of your world...and also stay on good terms with your other party members!  If your Friendship stat is too low to do this, you will end up killing NPCs you were supposed to work with, or with your party collapsing into recriminations and refusing to talk to one another.

Shenanigans.  The world of Skaia is full of self-fulfilling prophecies, stable time loops, and similar such Shenanigans.  If you manipulate these well, they can work to secure your victory - if your Shenanigans stat is too low, they can work to guarantee your defeat.

Your party's success rate is determined by whichever of these three stats is lowest.  So stats of 10-10-10 are better than stats of 20-20-9.

To determine victory, you roll a number of d4s equal to your lowest stat.  Each die showing a 4 counts as a success.  If you get a number of successes >= the number of players, you win - if not, you lose.

For example, if a 2-player party has Combat 6, Friendship 10, and Shenanigans 4 (abbreviated 6-10-4), they will roll 4 d4s and win if at least 2 of those dice show a 4.
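The win rule above is a binomial tail probability, so it can be computed exactly rather than simulated. The following is a minimal sketch; the function names are illustrative and are not taken from the generation code.

```python
from math import comb

def win_probability(lowest_stat: int, num_players: int) -> float:
    """Chance that at least `num_players` of `lowest_stat` d4s show a 4."""
    p = 0.25  # a d4 shows a 4 one time in four
    # Add up the probabilities of rolling too few 4s, then take the complement.
    too_few = sum(
        comb(lowest_stat, k) * p**k * (1 - p) ** (lowest_stat - k)
        for k in range(num_players)
    )
    return 1.0 - too_few

def party_win_probability(combat, friendship, shenanigans, num_players):
    """Only the lowest of the three party stats decides how many dice are rolled."""
    return win_probability(min(combat, friendship, shenanigans), num_players)

print(f"{party_win_probability(6, 10, 4, 2):.1%}")
```

For the 6-10-4 two-player example this prints 26.2%.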

A party's stat total is the sum of the stats of its members.

PLAYER STATS

Each class and aspect has different stats (generally summing to 6):

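| Class | Aspect | Combat | Friendship | Shenanigans |
|---|---|---|---|---|
| Knight | Rage | 4 | 1 | 1 |
| Maid | Heart | 1 | 4 | 1 |
| Seer | Time | 1 | 1 | 4 |
| Rogue | Blood | 3 | 2 | 1 |
| Thief | Doom | 3 | 1 | 2 |
| Bard | Breath | 2 | 3 | 1 |
| Sylph | Life | 1 | 3 | 2 |
| Witch | Light | 2 | 1 | 3 |
| Mage | Mind | 1 | 2 | 3 |
| Heir | Space | 2 | 2 | 2 |
| Prince | Void | 2* | 2* | 2* |
| Page | Hope | 3* | 3* | 3* |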

To determine a character's score in a stat, you multiply the score of their class by the score of their aspect.  For example, an Heir (2-2-2) of Breath (2-3-1) has stats of 4-6-2.  A Seer (1-1-4) of Light (2-1-3) has stats of 2-1-12.

Two class/aspect pairs have special effects:

Prince and Void invert whatever they are paired with.  Any aspect paired with Prince, or any class paired with Void, treats all its stats as being 4 minus whatever they would usually be.  For example, the Prince (2-2-2) of Life (1-3-2) inverts the values of Life to 3-1-2, and so has stats 6-2-4.

Page and Hope provide great potential that is more difficult to reach.  They have higher-than-usual stats of 3-3-3, but any aspect paired with Page or any class paired with Hope treats all its stats as being 1 less than they would usually be (but not below 1).  For example, the Page (3-3-3) of Life (1-3-2) reduces the stats of Life to 1-2-1, and so has stats of 3-6-3.

Note that there is a symmetry between class and aspect.  For example, a Knight of Heart and a Maid of Rage are identical - both pair a 4-1-1 with a 1-4-1 for total stats of 4-4-1.  I don't think anyone actually explicitly realized this, but abstractapplic seemed very close to figuring it out (noticing that the six classpects that always lost a solo run were the Knight/Maid/Seer of Void and the Prince of Rage/Heart/Time).
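The multiplication rule and the two special cases can be sketched in a few lines. This is an illustrative reconstruction from the rules as described here, not the actual generation code. One assumption: when both halves are special (e.g. Prince of Void), each half modifies the other, which matches the Page of Hope (4-4-4) and Prince of Hope (1-1-1) values quoted elsewhere in this post.

```python
CLASSES = {
    "Knight": (4, 1, 1), "Maid": (1, 4, 1), "Seer": (1, 1, 4),
    "Rogue": (3, 2, 1), "Thief": (3, 1, 2), "Bard": (2, 3, 1),
    "Sylph": (1, 3, 2), "Witch": (2, 1, 3), "Mage": (1, 2, 3),
    "Heir": (2, 2, 2), "Prince": (2, 2, 2), "Page": (3, 3, 3),
}
ASPECTS = {
    "Rage": (4, 1, 1), "Heart": (1, 4, 1), "Time": (1, 1, 4),
    "Blood": (3, 2, 1), "Doom": (3, 1, 2), "Breath": (2, 3, 1),
    "Life": (1, 3, 2), "Light": (2, 1, 3), "Mind": (1, 2, 3),
    "Space": (2, 2, 2), "Void": (2, 2, 2), "Hope": (3, 3, 3),
}
INVERTERS = {"Prince", "Void"}  # partner's stats become 4 - stat
REDUCERS = {"Page", "Hope"}     # partner's stats become max(1, stat - 1)

def adjust(stats, partner):
    """Apply the partner's special effect (if any) to a base stat triple."""
    if partner in INVERTERS:
        return tuple(4 - s for s in stats)
    if partner in REDUCERS:
        return tuple(max(1, s - 1) for s in stats)
    return stats

def classpect_stats(cls, aspect):
    """(Combat, Friendship, Shenanigans) for one character."""
    class_part = adjust(CLASSES[cls], aspect)
    aspect_part = adjust(ASPECTS[aspect], cls)
    return tuple(c * a for c, a in zip(class_part, aspect_part))

print(classpect_stats("Prince", "Life"))   # Life inverted to 3-1-2, times 2-2-2
print(classpect_stats("Knight", "Heart") == classpect_stats("Maid", "Rage"))
```

Because the class table and the aspect table contain the same twelve stat rows, swapping the two halves of any classpect gives the same result, which is the symmetry noted above.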


STRATEGY

Having a high stat total is valuable, and some classpect combinations with low stat totals are bad.  However, more important than high stats is for your stat totals to be balanced.  

The highest stat total a character can have is 18, available to the Knight of Rage (16-1-1), Maid of Heart (1-16-1) and Seer of Time (1-1-16).  However, these characters are only good if other characters can compensate for their weaknesses, and on their own they're much weaker than a more balanced character.

Your party began with a Knight of Blood (12-2-1) and a Mage of Time (1-2-12).

So your current stat total is 13-4-13.

The most important thing for you to do therefore was to select high-Friendship characters to help resolve your weakness in that stat.

Best possible play was to bring a Maid of Heart (1-16-1) and either a Thief of Light or a Witch of Doom (both 6-1-6).

This brings your stat total to 20-21-20, letting you roll 20 dice for your required 4 successes, for a win rate of 77.5%.
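With the rules known, the best pair can be found by brute force over all candidate additions. The sketch below makes two assumptions that are not stated in the ruleset: the starting party is taken to be a Knight of Blood and a Mage of Time (which reproduces the 13-4-13 total), and a "legal" party is taken to mean no repeated class or aspect (switchable via the hypothetical `unique` flag).

```python
from itertools import combinations, product

# (class, aspect, base stats) rows from the PLAYER STATS table.
ROWS = [
    ("Knight", "Rage", (4, 1, 1)), ("Maid", "Heart", (1, 4, 1)),
    ("Seer", "Time", (1, 1, 4)), ("Rogue", "Blood", (3, 2, 1)),
    ("Thief", "Doom", (3, 1, 2)), ("Bard", "Breath", (2, 3, 1)),
    ("Sylph", "Life", (1, 3, 2)), ("Witch", "Light", (2, 1, 3)),
    ("Mage", "Mind", (1, 2, 3)), ("Heir", "Space", (2, 2, 2)),
    ("Prince", "Void", (2, 2, 2)), ("Page", "Hope", (3, 3, 3)),
]
CLASSES = {c: s for c, _, s in ROWS}
ASPECTS = {a: s for _, a, s in ROWS}

def adjust(stats, partner):
    if partner in ("Prince", "Void"):      # inversion
        return tuple(4 - s for s in stats)
    if partner in ("Page", "Hope"):        # reduction, floored at 1
        return tuple(max(1, s - 1) for s in stats)
    return stats

def hero_stats(cls, aspect):
    c, a = adjust(CLASSES[cls], aspect), adjust(ASPECTS[aspect], cls)
    return tuple(x * y for x, y in zip(c, a))

def best_additions(party, picks=2, unique=True):
    """Return (best lowest stat, list of hero-name tuples achieving it)."""
    base = [sum(col) for col in zip(*(hero_stats(c, a) for c, a in party))]
    heroes = [h for h in product(CLASSES, ASPECTS) if h not in party]
    best, winners = -1, []
    for combo in combinations(heroes, picks):
        team = list(party) + list(combo)
        # Assumed legality rule: no class or aspect appears twice in the party.
        if unique and (len({c for c, _ in team}) < len(team)
                       or len({a for _, a in team}) < len(team)):
            continue
        totals = [b + sum(hero_stats(c, a)[i] for c, a in combo)
                  for i, b in enumerate(base)]
        low = min(totals)
        if low > best:
            best, winners = low, []
        if low == best:
            winners.append(tuple(sorted(f"{c} of {a}" for c, a in combo)))
    return best, winners

START = [("Knight", "Blood"), ("Mage", "Time")]  # assumed starting party
print(best_additions(START))
```

Under these assumptions the search should reproduce the lowest stat of 20, with the Maid of Heart paired with either the Thief of Light or the Witch of Doom.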


LEADERBOARD

| Player | Characters | Combat | Friendship | Shenanigans | Lowest Stat | Win Rate |
|---|---|---|---|---|---|---|
| Optimal Play | Maid of Heart (1-16-1), Thief of Light/Witch of Doom (6-1-6) | 20 | 21 | 20 | 20 | 77.5% |
| simon | Sylph of Breath (2-9-2), Maid of Hope (3-9-3) | 18 | 22 | 18 | 18 | 69.4% |
| abstractapplic | Page of Mind (3-3-6), Seer of Void (6-6-0) | 22 | 13 | 19 | 13 | 41.6% |
| Random Play | Two random legal heroes | ?? | ?? | ?? | ?? | 36.2% |
| Yonge | Page of Hope (4-4-4), Heir of Space (4-4-4) | 21 | 12 | 21 | 12 | 35.1% |
| Pessimal Play | Prince of Heart (6-0-6), Maid of Void (6-0-6) | 25 | 4 | 25 | 4 | 0.4% |

Congratulations to simon, who picked two heroes who were both strong in Friendship and okay in other stats, getting fairly close to the optimal score.

You cannot hope to beat simon in a data-science-off.  He is simply the best there is.

Condolences to Yonge, who selected two excellent all-around heroes (both 4-4-4) that would have been a very good 2-person party, but didn't fit particularly well into your existing party with its Friendship weakness.

 

BONUS OBJECTIVES

Troll Bonus 1: This was something of a matter of opinion, since 'which aspects are nice' is not set in stone.  My personal recommendation would have been a Maid of Hope and a Page of Heart, both 3-9-3, for total stats of 19-22-19 (one short of the optimal score of 20), or just choosing the Thief of Light rather than the Witch of Doom in the optimal solution.

Troll Bonus 2: Total stats of the troll party were Combat 46, Friendship 42, Shenanigans 49 (meaning that trolls contributing to Friendship were more valuable).  From most to least valuable (ties broken arbitrarily):

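| Troll | Combat | Friendship | Shenanigans | Lowest Stat Without |
|---|---|---|---|---|
| Knight of Blood | 12 | 2 | 1 | 34 |
| Rogue of Heart | 3 | 8 | 1 | 34 |
| Sylph of Space | 2 | 6 | 4 | 36 |
| Page of Breath | 3 | 6 | 3 | 36 |
| Seer of Mind | 1 | 2 | 12 | 37 |
| Bard of Rage | 8 | 3 | 1 | 38 |
| Heir of Void | 4 | 4 | 4 | 38 |
| Maid of Time | 1 | 4 | 4 | 38 |
| Witch of Life | 2 | 3 | 6 | 39 |
| Thief of Light | 6 | 1 | 6 | 40 |
| Mage of Doom | 3 | 2 | 6 | 40 |
| Prince of Hope | 1 | 1 | 1 | 41 |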
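The "Lowest Stat Without" column is a leave-one-out calculation: subtract each troll's stats from the party totals and take the minimum of what remains. A small sketch using the stat triples from the table (the function name is illustrative):

```python
# (Combat, Friendship, Shenanigans) for each troll, copied from the table.
TROLLS = {
    "Knight of Blood": (12, 2, 1), "Rogue of Heart": (3, 8, 1),
    "Sylph of Space": (2, 6, 4), "Page of Breath": (3, 6, 3),
    "Seer of Mind": (1, 2, 12), "Bard of Rage": (8, 3, 1),
    "Heir of Void": (4, 4, 4), "Maid of Time": (1, 4, 4),
    "Witch of Life": (2, 3, 6), "Thief of Light": (6, 1, 6),
    "Mage of Doom": (3, 2, 6), "Prince of Hope": (1, 1, 1),
}

def party_totals(party):
    """Column-wise sum of every member's stats."""
    return [sum(col) for col in zip(*party.values())]

def lowest_stat_without(party):
    """For each member, the party's lowest stat if that member were removed."""
    totals = party_totals(party)
    return {
        name: min(t - s for t, s in zip(totals, stats))
        for name, stats in party.items()
    }

# Lower "lowest stat without" means the troll is harder to do without.
for name, low in sorted(lowest_stat_without(TROLLS).items(), key=lambda kv: kv[1]):
    print(f"{name:16} {low}")
```

Since `sorted` is stable, ties keep the order in which the trolls are listed, mirroring the "ties broken arbitrarily" note.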

simon and abstractapplic both identified the Prince of Hope as not being very useful (this is one of two maximally-dreadful classpects with stats of 1-1-1, along with the Page of Void), and both flagged the Mage of Doom as being subpar (with mediocre overall stats and a tilt away from Friendship).  simon in particular gave a list that approximated this reasonably well (with the biggest diffs being in the Heir of Void and Maid of Time - while the Heir of Void (4-4-4) is much better than the Maid of Time (1-4-4) in general, in this party the Combat diff doesn't matter and both are equally okay).


FEEDBACK REQUEST

As usual, I'm interested to hear feedback on what people thought of this scenario.  If you played it, what did you like and what did you not like?  If you might have played it but decided not to, what drove you away?  What would you like to see more of/less of in future?  Do you think the scenario was too complicated to decipher?  Or too simple to feel realistic?  Or both at once?  Do you have any other feedback?

Given that this scenario got fewer answers and less engagement than past ones, I suspect I did something wrong in writing it, but I'm not quite sure what.  The scenario didn't feel that overcomplicated from the inside - was it much more unapproachable than I expected?  Was the dataset being too large a serious inconvenience?  Did the baffling Homestuck-based 'story' and 'art' turn off a lot of players?

In any case, thanks for playing, and I hope you had fun!

UNRELATED JOB HUNTING BLEG

I'm currently in the market for a new job.  (Summary: eight years' work experience in finance/programming, currently working in NYC but unattached and willing to move.)  If you know anyone interested in hiring the sort of person who spends his spare time writing these scenarios, shoot me a PM!  Thank you!

7 comments

Comments sorted by top scores.

comment by simon · 2023-02-28T05:21:25.482Z · LW(p) · GW(p)

Thanks aphyer, and thanks for enabling me to spend a lot of time learning Haskell instead of efficiently working towards solving this.

Since I have now been officially recognized as "the best" by the all-important Creator of This and Many Other Scenarios, I have some advice for others who might be inclined to participate and to prove your claim wrong:

  1. Just do it
  2. Be curious and have fun
  3. Don't have too high standards for techniques or tools, just try something -- anything you don't know the answer to already provides you new information
  4. Always keep in mind that any result you do get is incomplete and may well be misleading
  5. ...but also that that difference between map and territory is something you may be able to reason about or probe
  6. ...and also that it's always an option to switch to trying something else
  7. Don't be afraid to be "subjective" - these scenarios are for fun and learning, and of course getting the best result, not for providing some "objective" justification for a decision-maker. Use that natural neural net!

About the difficulty of the scenario:

I think that the quantity of data was a major friction here, such that likely it would have had more participation and more success if both the number of class/aspect combos and the number of games were a lot smaller. 

While the option of the reduced dataset was provided, it's hard to get oneself to do something obviously worse when there's an "objectively" superior choice available. I guess I ignored my own advice above!

comment by abstractapplic · 2023-02-28T00:38:56.730Z · LW(p) · GW(p)

Reflections on my performance:

This stings my pride a little; I console myself with the fact that my "optimize conditional on Space and Life" allocation got a 64.7% success rate.

If I'd allocated more time, I would have tried a wider range of ML algorithms on this dataset, instead of just throwing XGBoost at it. I'm . . . not actually sure if that would have helped; in hindsight, trying the same algorithms on different subsets ("what if I built a model on only the 4-player games?") and/or doing more by-hand analysis ("is Princeliness like Voidliness, and if so, what does that mean?") might have provided better results.

Reflections on the challenge:

I found this one hard to get started with because it had a de facto 144 explanatory columns ("does this party include a [Class] of [Aspect]?") along with its 1.4m rows, and the effects of each column were mediated by the effects of each other column. This made it difficult - and computationally intensive! - to figure out anything about what classpect combinations affect the outcome.

That said, I appreciated this scenario. The premise was fun, the writing was well-executed, and the challenge was fair. Also, it served as a much-needed proof-by-example that "train one ML model, then optimize over inputs" isn't a perfect skeleton key for solving problems shaped like this. If it was a little obtuse on top of that . . . well, I can chalk that up to realism.

Replies from: aphyer, abstractapplic
comment by aphyer · 2023-02-28T01:17:11.758Z · LW(p) · GW(p)

Good to know, thank you!  I think my main takeaway is that I am really bad at judging difficulty levels on these: I actually expected this scenario to be easier than the previous Dwarves & D.Sci scenario, but that one had three different near-perfect solutions while this one only had one noticeably-better-than-random solution.

Long-winded and empirically incorrect argument that led me to that expectation follows:

I was aware of the large number of possible characters - this is why the dataset ended up being so big, because I wanted to be sure it was large enough to allow simple analyses to work in spite of that.  One sample approach I tried out on my end as part of designing the scenario was this:

  • Take only teams that contained a Knight of Blood and a Mage of Time (but of any size).
  • For each possible classpect, find its winrate on those teams.

This would have given you ~4k teams, with ~120 containing each other possible classpect. That wasn't enough to get an optimal solution, but it would have been an excellent first step:

  • Page of Heart has a 59.46% winrate
  • Maid of Heart has a 57.01% winrate
  • Maid of Breath has a 51.55% winrate
  • ...
  • ...
  • Heir of Hope has a 27.10% winrate
  • Heir of Rage has a 26.85% winrate
  • Maid of Void has a 22.64% winrate
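The filter-then-tabulate approach described above can be sketched without any ML tooling. The data layout here is hypothetical (each game as a pair of a team roster and a win flag); the real dataset's columns would need to be mapped into that shape first, and the toy games below are made up purely to show the mechanics.

```python
from collections import defaultdict

def conditional_winrates(games, required):
    """Win rate of every other classpect on teams containing all of `required`.

    games: iterable of (team, won), where team is a collection of classpect
    names and won is truthy for a victory (a hypothetical layout).
    """
    required = set(required)
    wins, seen = defaultdict(int), defaultdict(int)
    for team, won in games:
        members = set(team)
        if not required <= members:   # skip teams missing a required hero
            continue
        for hero in members - required:
            seen[hero] += 1
            wins[hero] += bool(won)
    return {hero: wins[hero] / seen[hero] for hero in seen}

# Toy data, invented for illustration only.
toy_games = [
    (["Knight of Blood", "Mage of Time", "Page of Heart"], True),
    (["Knight of Blood", "Mage of Time", "Page of Heart", "Maid of Void"], False),
    (["Knight of Blood", "Maid of Void"], True),   # no Mage of Time: ignored
    (["Knight of Blood", "Mage of Time", "Maid of Void"], False),
]
rates = conditional_winrates(toy_games, ["Knight of Blood", "Mage of Time"])
for hero, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{hero} has a {rate:.2%} winrate")
```

With only ~120 games per classpect, as noted above, the resulting rates are noisy, so the ranking is better read as a shortlist than as a final answer.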

As I envisioned things playing out:

  • Just running this approach and grabbing the two highest characters you could:
    • You would have picked a Page of Heart (3-9-3) and a Maid of Breath (2-12-1) 
    • This would have given you stats of 18-25-17, for a lowest stat of 17 and a 64% winrate. 
    • This isn't optimal (it over-invests in Friendship, since you've picked two different high-Friendship characters), but it's noticeably better than random.
  • Additionally, looking at the high/low scores might point you further in useful directions: 
    • For instance, Heart/Breath/Life showed up an awful lot in the top on a variety of different classes.
    • This might have pointed you in the direction of 'there's a specific thing I'm missing' and gotten you to bring only one Heart-like hero.

Sadly it seems I overestimated how obvious a thing to try that was.  Based on the answers it looks like:

  • simon did something fairly similar to this, requiring 4-person teams but only requiring one of your two starting characters on the team, and ended up with a similar outcome of 'generally good, but overinvested a bit in Friendship'.
  • Yonge ran some analysis that did a good job of finding 'generally strong characters' but wasn't specific to the two characters you started with.
  • You did some kind of ML thing I didn't understand.
comment by abstractapplic · 2023-02-28T00:48:20.430Z · LW(p) · GW(p)

Reflections x3 combo:

Just realized this could have been a perfect opportunity to show off that modelling library I built, except:

A) I didn't have access to the processing power I'd need to make it work well on a dataset of this size.

B) I was still thinking in terms of "what party archetype predicts success", when "what party archetype predicts failure" would have been more enlightening. Or in other words . . .

. . . I forgot to flip the problem turn-ways.

comment by Jonathan Paulson (jpaulson) · 2023-02-28T00:02:13.541Z · LW(p) · GW(p)

The link at the top is to the wrong previous scenario

Replies from: aphyer
comment by aphyer · 2023-02-28T00:03:55.855Z · LW(p) · GW(p)

YOU SAW NOTHING

comment by Yonge · 2023-03-01T00:00:51.728Z · LW(p) · GW(p)

I think this would have worked better if there had been less data. Looking at the other responses, I'm not convinced that adding several hundred thousand additional rows did anything significant other than extend the time needed to scan across the data. This stopped me from doing a more extensive analysis.

Also I found this one less interesting than the previous ones. I suspect a lot of this has to do with the introductory storyline which didn't make much sense to me. Possibly this was because I hadn't heard of the "homestuck" story before, and didn't have any relevant pre-existing context to relate this to.