D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset

post by aphyer · 2021-10-05T17:30:50.049Z · LW · GW · 17 comments

Contents

  RULESET
    CHARACTER STATS
    HOW CHARACTERS FIGHT
    HOW TEAMS FIGHT
  DATASET GENERATION
  PVE LEADERBOARD
  PVP LEADERBOARD
  FEEDBACK REQUEST

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

RULESET

Code is available here for those who are interested.

CHARACTER STATS

A character has two stats: an Element and a Power Level.

18 of the 19 characters are power levels 1-6 of the elements Fire, Water and Earth:

| Power Level | Fire | Water | Earth |
|---|---|---|---|
| 1 | Volcano Villain | Arch-Alligator | Landslide Lord |
| 2 | Oil Ooze | Captain Canoe | Earth Elemental |
| 3 | Fire Fox | Maelstrom Mage | Dire Druid |
| 4 | Inferno Imp | Siren Sorceress | Quartz Questant |
| 5 | Phoenix Paladin | Warrior of Winter | Rock-n-roll Ranger |
| 6 | Blaze Boy | Tidehollow Tyrant | Greenery Giant |

The remaining character, the Nullifying Nightmare, has a Power Level of 5 with the unique element of Void.  

The NPC team consists of Fire 5, Water 6, Earth 3, Earth 4, Earth 6.
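For readers who want to work with the roster programmatically, here is a minimal sketch that encodes the roster and the NPC team as plain Python data. The names `ELEMENTAL_NAMES`, `build_roster`, `ROSTER` and `NPC_TEAM` are hypothetical and are not taken from the linked code; only the character names, elements and power levels come from this post.

```python
# Hypothetical encoding of the roster; only the names and stats come from the post.
ELEMENTAL_NAMES = {
    "Fire": ["Volcano Villain", "Oil Ooze", "Fire Fox",
             "Inferno Imp", "Phoenix Paladin", "Blaze Boy"],
    "Water": ["Arch-Alligator", "Captain Canoe", "Maelstrom Mage",
              "Siren Sorceress", "Warrior of Winter", "Tidehollow Tyrant"],
    "Earth": ["Landslide Lord", "Earth Elemental", "Dire Druid",
              "Quartz Questant", "Rock-n-roll Ranger", "Greenery Giant"],
}


def build_roster():
    """Map (element, power_level) -> character name for all 19 characters."""
    roster = {}
    for element, names in ELEMENTAL_NAMES.items():
        # Names are listed in order of power level, starting at 1.
        for power, name in enumerate(names, start=1):
            roster[(element, power)] = name
    # The one character outside the 3x6 grid.
    roster[("Void", 5)] = "Nullifying Nightmare"
    return roster


ROSTER = build_roster()

# The NPC team as stated in the post.
NPC_TEAM = [("Fire", 5), ("Water", 6), ("Earth", 3), ("Earth", 4), ("Earth", 6)]
npc_names = [ROSTER[key] for key in NPC_TEAM]
print(npc_names)
```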

Congratulations here to abstractapplic, who was the first to figure out the elements.

HOW CHARACTERS FIGHT

A fight between two teams is composed of fights between the individual characters.  When two characters fight one another, it works as follows:

HOW TEAMS FIGHT

To find the outcome of a game between two teams:

This ruleset encourages balanced teams.  For example, a team of 5 Fire characters will lose to a team of 4 Earth and 1 Water characters.  Even though 4/5 of character matchups favor the Fire characters, nothing on their team can beat the one Water character, and eventually it will work its way through their whole team and win.

Once you understand how the rules work, general strategy is:

DATASET GENERATION

The games you have access to were played by players grouped together by the game's auto-party functionality.  These players do not build coordinated teams, so there's no correlation between characters.

However, some characters are more popular than others (most notably the Siren Sorceress, whose infamous costume has won her a large and...enthusiastic...fan base).

Overall, Water characters are somewhat more common and Earth characters somewhat less common.  This doesn't affect the game itself, but it means that naive win rate evaluations will make Fire characters look substantially weaker than they are (since the Water characters that counter them are common and the Earth characters that they counter are rare).
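To see why skewed pick rates distort naive win rates, here is a toy one-on-one model. It is only a sketch and is not the game's actual ruleset: the 70% edge for a favoured matchup and the pick rates are made-up numbers, and "Earth counters Water" is an assumption that completes the cycle (the post only states that Water counters Fire and Fire counters Earth).

```python
# Toy model only: EDGE and the pick rates below are invented for illustration.
BEATS = {"Water": "Fire", "Fire": "Earth", "Earth": "Water"}  # Earth > Water is assumed
EDGE = 0.7  # hypothetical win probability for the favoured element


def win_prob(mine, theirs):
    """Hypothetical chance that `mine` beats `theirs` in a single matchup."""
    if BEATS[mine] == theirs:
        return EDGE
    if BEATS[theirs] == mine:
        return 1.0 - EDGE
    return 0.5  # mirror matchup


def naive_win_rates(pick_rates):
    """Average win rate of each element against opponents drawn by pick rate."""
    return {
        mine: sum(rate * win_prob(mine, theirs) for theirs, rate in pick_rates.items())
        for mine in pick_rates
    }


UNIFORM = {"Water": 1 / 3, "Fire": 1 / 3, "Earth": 1 / 3}
PICK_RATES = {"Water": 0.40, "Fire": 0.35, "Earth": 0.25}  # Water common, Earth rare

uniform = naive_win_rates(UNIFORM)
skewed = naive_win_rates(PICK_RATES)
for element in PICK_RATES:
    print(f"{element}: uniform {uniform[element]:.3f}, skewed {skewed[element]:.3f}")
```

In this toy model all three elements are equally strong, yet Fire has the lowest observed win rate once Water is over-represented and Earth under-represented among opponents.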

The full dataset contains around a million game results.  However, Cloud Liquid Gaming's previous data specialist carelessly loaded the data into Excel and accidentally truncated it at row 65536, so you only received 65535 entries in your data.

(Real-world Data Science Moral: it is very rare for the length of a dataset to naturally be a power of 2, or one less than a power of 2.  If you see that, you should suspect that your data got cut off at some point.)
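A quick sanity check along these lines might look like the sketch below. The function names and the `min_rows` cutoff are hypothetical; the cutoff is there only because tiny row counts hit powers of 2 all the time.

```python
def is_power_of_two(n):
    """True if n is a positive integer power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def truncation_suspect(n_rows, min_rows=255):
    """Flag row counts that are a power of 2, or one less than a power of 2.

    min_rows is an arbitrary, hypothetical cutoff that ignores tiny datasets.
    """
    if n_rows < min_rows:
        return False
    return is_power_of_two(n_rows) or is_power_of_two(n_rows + 1)


for rows in (65535, 65536, 1_000_000):
    print(rows, truncation_suspect(rows))
```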

PVE LEADERBOARD

Note: all winrates below were Monte-Carlo calculated rather than explicitly derived.  Luckily it doesn't look like rankings are close enough for slight Monte Carlo error to matter.
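As a rough check on how much Monte Carlo noise to expect, the standard error of a simulated win rate can be estimated with the usual binomial formula. This sketch assumes each simulated game is independent and that 100,000 runs were used per matchup; that run count is taken from the example call later in the post and may not be what the leaderboard actually used.

```python
import math


def win_rate_standard_error(win_rate, runs):
    """Binomial standard error of a win rate estimated from `runs` games."""
    return math.sqrt(win_rate * (1.0 - win_rate) / runs)


def gap_in_standard_errors(rate_a, rate_b, runs):
    """Size of the gap between two independently estimated win rates, in SEs."""
    se_a = win_rate_standard_error(rate_a, runs)
    se_b = win_rate_standard_error(rate_b, runs)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(rate_a - rate_b) / se_diff


RUNS = 100_000  # assumption: same as the example call in the post
top_two_gap = gap_in_standard_errors(0.8147, 0.8040, RUNS)  # top two PVE rows
print(f"standard error near 81%: {win_rate_standard_error(0.8147, RUNS):.4%}")
print(f"gap between top two PVE rows: {top_two_gap:.1f} standard errors")
```

Under those assumptions the standard error is on the order of a tenth of a percentage point, so a gap of about one percentage point is several standard errors wide.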

The optimal team for fighting the NPC team consists of Fire 5 (Phoenix Paladin), Fire 6 (Blaze Boy), Water 1 (Arch-Alligator), Earth 5 (Rock-n-roll Ranger), Earth 6 (Greenery Giant).

The most important things to bring (in roughly descending order) were:

(Nullifying Nightmare, despite its very high overall win rate, is not that good to bring here - it performs poorly against teams with multiple 6s on them).

Entries were:

| Entrant(s) | Team | Win Rate |
|---|---|---|
| Optimal Play | Fire5, Fire6, Water1, Earth5, Earth6 | 81.47% |
| gjm* | Fire5, Fire6, Water1, Earth1, Earth6 | 80.40% |
| Alumium, simon | Fire5, Fire6, Earth1, Earth5, Earth6 | 76.53% |
| GuySrinivasan | Fire5, Fire6, Water3, Earth5, Earth6 | 72.41% |
| Measure, Jemist | Fire6, Water6, Earth5, Earth6, Void5 | 70.01% |
| abstractapplic | Fire5, Fire6, Water6, Earth6, Void5 | 66.97% |
| Maxwell Peterson | Fire5, Water1, Earth1, Earth6, Void5 | 62.05% |
| Yonge | Water6, Earth1, Earth5, Earth6, Void5 | 36.00% |
| Random Play | 5 randomly selected characters | 28.55% |
| lsusr | Fire3, Water5, Earth2, Earth3, Earth4 | 14.97% |

*After fixing a very entertaining early bug where he set up his code to pessimize his team instead of optimizing it.  I like to think that a guy from the opposing team snuck in and offered him a briefcase full of cash to sabotage his employers.

Congratulations to everyone who submitted.  Particular shoutouts go to: 

For any future players who want to test their performance, you can edit and run these lines of the code to simulate a given team against the NPC team:

# The NPC team: Fire 5, Water 6, Earth 3, Earth 4, Earth 6.
enemy_team = [
    get_hero_by_stats(heroes, 'F', 5),
    get_hero_by_stats(heroes, 'W', 6),
    get_hero_by_stats(heroes, 'E', 3),
    get_hero_by_stats(heroes, 'E', 4),
    get_hero_by_stats(heroes, 'E', 6),
]

# Edit this list to the team you want to test.
test_team_pve = [
    get_hero_by_stats(heroes, 'F', 5),
    get_hero_by_stats(heroes, 'F', 6),
    get_hero_by_stats(heroes, 'E', 1),
    get_hero_by_stats(heroes, 'E', 5),
    get_hero_by_stats(heroes, 'E', 6),
]

# Monte Carlo estimate of the test team's results against the NPC team.
res = team_matchup_evaluate(test_team_pve, enemy_team, runs=100000, verbose=True)

or if you aren't familiar with the code, DM me and I can run for you.

PVP LEADERBOARD

Note: all winrates below were Monte-Carlo calculated rather than explicitly derived. Luckily it doesn't look like rankings are close enough for slight Monte Carlo error to matter.

Note: The commentary below should be considered non-final for a few days to give people time to point out that I've misread the teams they submitted/added up win percentages wrong/made other obvious mistakes.  If I have messed something like that up I'll have to recalculate, so don't count on victory/defeat until some more eyes have confirmed.

Edited to add: Alumium has been disqualified for cheating.  Alumium is someone I know in real life, with whom I discussed this scenario while writing it.  They made an account on LW in order to submit an answer to this scenario, even though they already knew the rules.  As such, they are DISQUALIFIED.  The new winner is simon.  My apologies to all other players.

The PVP submissions received were (ordered from earliest to latest received):

lsusr: Fire4, Water1, Water5, Earth5, Void5

Measure: Fire1, Water4, Water6, Earth1, Void5

abstractapplic: Fire5, Fire6, Water6, Earth6, Void5

Yonge: Water6, Earth1, Earth5, Earth6, Void5

GuySrinivasan: Fire6, Water5, Water6, Earth6, Void5

Maxwell Peterson: Fire6, Water6, Earth1, Earth6, Void5

gjm: Fire5, Fire6, Water6, Earth6, Void5

Alumium: Fire6, Water6, Earth1, Earth5, Earth6

Jemist: Fire6, Water6, Earth5, Earth6, Void5

simon: Fire2, Fire6, Water1, Water6, Earth6

The most common team consisted of the Nullifying Nightmare, all three 6-power characters, and one 5-power character: abstractapplic, GuySrinivasan, gjm and Jemist all submitted variants on this team (between them choosing all three different elements for their 5-power character), plus Alumium submitted that team earlier before changing it out for a different one.

Win rates were:

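| | Alumium | simon | Maxwell Peterson | Jemist | abstractapplic | gjm | GuySrinivasan | Yonge | Measure | lsusr | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Alumium | - | 46.29% | 56.57% | 58.27% | 62.69% | 62.87% | 65.60% | 82.14% | 55.99% | 73.95% | DQ |
| simon | 53.71% | - | 53.87% | 64.97% | 67.80% | 68.06% | 60.75% | 69.63% | 50.21% | 68.09% | 5.57 |
| Maxwell Peterson | 43.43% | 46.13% | - | 54.18% | 58.78% | 58.53% | 61.90% | 73.82% | 68.37% | 84.80% | 5.50 |
| Jemist | 41.73% | 35.03% | 45.82% | - | 50.46% | 50.49% | 49.26% | 73.09% | 76.70% | 87.46% | 5.10 |
| abstractapplic | 37.31% | 32.20% | 41.22% | 49.54% | - | 50.13% | 50.51% | 67.15% | 64.42% | 86.36% | 4.79 |
| gjm | 37.13% | 31.94% | 41.47% | 49.51% | 49.87% | - | 50.21% | 67.34% | 64.28% | 86.50% | 4.78 |
| GuySrinivasan | 34.40% | 39.25% | 38.10% | 50.74% | 49.49% | 49.79% | - | 54.93% | 57.00% | 88.72% | 4.62 |
| Yonge | 17.86% | 30.37% | 26.18% | 26.91% | 32.85% | 32.66% | 45.07% | - | 77.44% | 75.83% | 3.65 |
| Measure | 44.01% | 49.79% | 31.63% | 23.30% | 35.58% | 35.73% | 43.00% | 22.56% | - | 40.80% | 3.26 |
| lsusr | 26.05% | 31.91% | 15.20% | 12.54% | 13.64% | 13.50% | 11.28% | 24.17% | 59.20% | - | 2.07 |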
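The Overall Score column appears to be the sum of each row's win rates against the nine other entrants, expressed as fractions (so winning every game against everyone would score 9.00). That reading is inferred from the numbers rather than stated in the post; the sketch below checks it for three rows, using a hypothetical helper name `overall_score`.

```python
# Win rates (in percent) against the nine other entrants, copied from the table.
WIN_RATES = {
    "simon": [53.71, 53.87, 64.97, 67.80, 68.06, 60.75, 69.63, 50.21, 68.09],
    "Maxwell Peterson": [43.43, 46.13, 54.18, 58.78, 58.53, 61.90, 73.82, 68.37, 84.80],
    "Jemist": [41.73, 35.03, 45.82, 50.46, 50.49, 49.26, 73.09, 76.70, 87.46],
}


def overall_score(percentages):
    """Sum of win rates as fractions, rounded to two decimals like the table."""
    return round(sum(percentages) / 100.0, 2)


scores = {name: overall_score(rates) for name, rates in WIN_RATES.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```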

The four nearly-symmetrical teams did quite well (with the differences between them coming down to which elements countered other teams best), but did not ultimately win.  

The Nullifying Nightmare was extremely common but not very strong (since most teams included multiple 6s) - ultimately neither of the top 2 teams included it.  

Conditional on this data holding up when more eyes look at it, I believe the victory went to Alumium, who managed to get all three 6s, avoid the Nightmare, include a Power 1 character for the counterpick against strong teams, and have an elemental tilt that helped prey more effectively on some lower-tier teams.  Alumium's team was a bit Earth-heavy, but no other team quite managed to compete.

Congratulations to simon, whose confusing inclusion of Oil Ooze (Fire2) instead of either Fire1 or Fire5 cost him some percentage points but who was the only submitter not to include the Nightmare on his team.

Congratulations simon!  Once you've figured out what theme (either a general genre or a specific work*) you want to request an upcoming scenario be based on, PM or comment and I'll try to get it to happen.  I can't promise it'll happen soon: it'll take some time to write one of these, other people are queued up to publish theirs, and I might end up submitting a Christmas-themed one in December, so you may end up waiting until some time late this year or early next year.

*Ability to select a specific work is contingent on me being familiar with that work and thinking I can write a scenario based on it.

FEEDBACK REQUEST

I'm interested to hear feedback on what people thought of this scenario.  If you played it, what did you like and what did you not like?  If you might have played it but decided not to, what drove you away?  What would you like to see more of/less of in future?

Thanks for playing!  Now, if you'll excuse me, the League of Legends world championship is starting, and I need to go watch North America's finest...best...least dreadful teams be shamefully routed by teams from countries I've never heard of!

17 comments


comment by abstractapplic · 2021-10-06T17:01:10.446Z · LW(p) · GW(p)

This was extremely good. In particular, I like that you managed to make the challenge tractable to both Analysis and Machine Learning. I also appreciated that you included an explicit Real-world Data Science Moral in the wrap-up; I should try to do that more often.

comment by gjm · 2021-10-05T23:44:09.232Z · LW(p) · GW(p)

Feedback on the scenario: I liked it, but evidently I took the framing story a bit too seriously because I took it to indicate that I would probably be wasting my time trying to understand the game mechanics, when actually they weren't so very complicated. I can't complain about how that worked out for me, given that mindlessly putting the given data into a model-fitting machine and cranking the handle produced a very good result, and if it had seemed more likely that the thing was approachable then I might have avoided it lest it be too much work :-), but I do feel a bit bad about not trying to use my brain a bit more and my computer a bit less.

comment by gjm · 2021-10-05T19:44:54.659Z · LW(p) · GW(p)

I strongly suspect that I got rather lucky; at any rate, my model's predicted win-rate for my team was substantially less than the real ~80%, suggesting that the model didn't do a great job of capturing reality.

I wonder whether lsusr had a sign-error bug similar to mine.

comment by simon · 2021-10-05T18:54:20.871Z · LW(p) · GW(p)

Hmm how about we switch to using a Condorcet method for the PVP ranking? 

I screwed up my thinking on whether the sides were different, will add an edit/reply to my comment on the main post later.

Thanks to Maxwell Peterson for introducing me to R through his post and code, I hope to continue using R later, ideally with a better gears-level understanding than currently.

Replies from: aphyer
comment by aphyer · 2021-10-05T19:11:23.363Z · LW(p) · GW(p)

True, you are the Condorcet winner. :P

Do you know how you ended up with Oil Ooze on your team? I was expecting to trick a lot of people into submitting Nightmare, but I wasn't expecting the Ooze to show up.

Replies from: simon, Alumium
comment by simon · 2021-10-05T19:38:40.288Z · LW(p) · GW(p)

The magic black box  supplied to me by Maxwell, after I fiddled with it a tiny bit and supplied it with adjusted data, told me that BGNOT was supposedly the strongest team in general, and that the strongest counter to BGNOT was ABGOT. It also claimed that ABGOT was the strongest counter to gjm's BGNPT. I asked the magic black box how well a few candidate teams, including ABGOT, did against the non-secret competitors already posted, and the numbers it gave looked more generally decent for ABGOT than the other candidates (I was looking for broad-spectrum effectiveness more than average effectiveness) and it also said that ABGOT would have a decent winrate against average teams, so I went with it. I would have liked to make a figure of merit and find the top team for that figure of merit but wasn't able to do so in time.

In other words, the magic black box liked Oil Ooze for some reason.

Replies from: maxwell-peterson
comment by Maxwell Peterson (maxwell-peterson) · 2021-10-05T21:54:59.115Z · LW(p) · GW(p)

When I read in the main post that the inclusion of Oil Ooze was confusing, I thought my magic box might be the guilty one!

comment by Alumium · 2021-10-06T02:53:06.260Z · LW(p) · GW(p)

I'm perfectly happy not to claim any sort of prize. 

I only got the elements due to Measure.

I only dropped the Nightmare due to Maxwell Peterson.

Also, I have no idea what I'd even ask for as a scenario.

Replies from: simon
comment by simon · 2021-10-06T04:08:43.198Z · LW(p) · GW(p)

> Also, I have no idea what I'd even ask for as a scenario.

Neither do I; I was just seeking the glory of (slightly tarnished due to hypothetical rule change) victory.

comment by Alumium · 2021-10-06T02:49:25.974Z · LW(p) · GW(p)

I legitimately did not expect to do that well. 

If I have done better than others, it is because I have stood on the shoulders of, uh, ogres at least.

I've gotten two morals from this. The good moral is 'Even if you have no idea what is going on, you can still data science at something.'

The bad moral is 'When in doubt, crib other people's notes (only be sure always to call it please 'research').'

Replies from: aphyer, Alumium
comment by aphyer · 2021-10-06T23:27:47.292Z · LW(p) · GW(p)

I think you're selling yourself a bit short here. You say that you 'only dropped the Nightmare due to Maxwell Peterson' - but Maxwell himself included the Nightmare on his PVP team!

Only two people submitted PVP teams without the Nightmare on them. One of them was you, and the other was using an analysis he didn't understand that led to him including Oil Ooze on his team for no reason he can discern even after the fact. (Sorry, simon).

If you managed to read through other people's findings and get more use out of them than those people themselves did, I think that leads to a well-deserved victory.

comment by Alumium · 2021-10-06T02:50:32.883Z · LW(p) · GW(p)

Feedback: I'm biased because I won, but I had a great time. This was very approachable even for a complete beginner, while still having sneaky hidden tricks.

comment by SarahNibs (GuySrinivasan) · 2021-10-05T20:18:05.378Z · LW(p) · GW(p)

I was fairly surprised my PvE team did so well. Why did my heuristics basically work?

How well do individuals do? Multiply. How well do teams do, relative to the product of their members? How well do individuals on a team do against the known opponents, pairwise? Okay try all teams, combine their "vs opponents" scores with "team coherence" scores, pick the max.

Finding how well teams did relative to how their individuals did let me accidentally pick just from balanced teams; asking how well individuals did against the known opponents got rid of Null and biased towards the 6s, 5s, and 1s. I think Maelstrom (Water 3) probably got on there because the team did really well in the data despite Maelstrom's inclusion, which looked the same to me as Maelstrom synergizing well with the rest of the team.

I weighted "individuals vs opposing team" equal to "team synergy"; I wonder if I had left the weights as each individual = each other individual = team synergy, whether that would have correctly axed 2-4s while still axing Null? [tries] Nope, that lets Null in quickly.

Replies from: GuySrinivasan
comment by SarahNibs (GuySrinivasan) · 2021-10-05T21:52:35.447Z · LW(p) · GW(p)

Feedback: I liked this one a lot, in theory. I just found myself with less time than I wanted to actually engage. It had a latent structure that was definitely discoverable, but also definitely plenty obfuscated. Even with time constraints the barrier-to-entry was low enough that I could get something nice in pretty quickly. And the differences between individual, team, vs-that-team, and vs-any-team were interesting to mull over.

comment by Measure · 2021-10-05T20:17:53.048Z · LW(p) · GW(p)

Confirmed PVP rankings with my own test script (my run swapped 5th and 6th place, but they're very close).

Replies from: Measure
comment by Measure · 2021-10-07T00:28:58.507Z · LW(p) · GW(p)

Lol, I just now realized that 5th/6th place are the exact same team.

Replies from: aphyer
comment by aphyer · 2021-10-07T00:32:15.169Z · LW(p) · GW(p)

...huh, indeed they are, I guess I missed that.