D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset
post by aphyer · 2021-10-05T17:30:50.049Z · 17 comments
This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.
RULESET
Code is available here for those who are interested.
CHARACTER STATS
A character has two stats: an Element and a Power Level.
18 of the 19 characters cover Power Levels 1-6 in each of the elements Fire, Water and Earth:
Power Level | Fire | Water | Earth |
1 | Volcano Villain | Arch-Alligator | Landslide Lord |
2 | Oil Ooze | Captain Canoe | Earth Elemental |
3 | Fire Fox | Maelstrom Mage | Dire Druid |
4 | Inferno Imp | Siren Sorceress | Quartz Questant |
5 | Phoenix Paladin | Warrior of Winter | Rock-n-roll Ranger |
6 | Blaze Boy | Tidehollow Tyrant | Greenery Giant |
The remaining character, the Nullifying Nightmare, has a Power Level of 5 with the unique element of Void.
The NPC team consists of Fire 5, Water 6, Earth 3, Earth 4, Earth 6.
Congratulations here to abstractapplic, who was the first to figure out the elements.
HOW CHARACTERS FIGHT
A fight between two teams is composed of fights between the individual characters. When two characters fight one another, it works as follows:
- Some elements counter others:
- Fire is countered by Water
- Water is countered by Earth
- Earth is countered by Fire
- If one character's element is countered by the other character's element, that character loses the fight (regardless of power level). For example, if Oil Ooze (Fire 2) fights Greenery Giant (Earth 6), Oil Ooze will win, because Fire counters Earth.
- If the characters are the same element, the higher-power one will win. For example, if Phoenix Paladin (Fire 5) fights Fire Fox (Fire 3), Phoenix Paladin will win.
- If the characters are the same element and the same power, each has a 50% chance to win.
- There are two special cases:
- The 1 of each element counters the 6 of the same element, and beats it rather than losing to it. So if Volcano Villain (Fire 1) fights Blaze Boy (Fire 6), Volcano Villain will win. (Congratulations to Yonge, who I think is the first person to have explicitly noticed one of these counters).
- The Nullifying Nightmare has Power Level 5, and the unique element of Void. Void does not counter any elements, and is not countered by any elements - the Nightmare just fights with Power Level directly, as if it were the same element as its opponent. So it will lose a fight to any Power 6, have a 50% chance against any Power 5, and beat any Power 1-4.
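The one-on-one rules above fit in a single function. This is a minimal sketch, not the author's linked code: it represents a character as an (element, power) tuple using the 'F'/'W'/'E' codes from the author's snippet further down, plus a hypothetical 'V' code for Void, and returns the probability that the first character wins.

```python
# Each element maps to the element that counters it.
COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}


def win_probability(a, b):
    """Probability that character a beats character b one-on-one.

    Characters are (element, power) tuples. "V" is a hypothetical code for
    the Void element of the Nullifying Nightmare.
    """
    (ea, pa), (eb, pb) = a, b
    if "V" not in (ea, eb) and ea != eb:
        # Different elements: whoever is countered loses, regardless of power.
        return 0.0 if COUNTERED_BY[ea] == eb else 1.0
    if ea == eb and {pa, pb} == {1, 6}:
        # Special case: the 1 of an element beats the 6 of the same element.
        return 1.0 if pa == 1 else 0.0
    # Same element, or Void involved: compare power levels directly.
    if pa == pb:
        return 0.5
    return 1.0 if pa > pb else 0.0


print(win_probability(("F", 2), ("E", 6)))  # Oil Ooze vs Greenery Giant -> 1.0
print(win_probability(("F", 1), ("F", 6)))  # Volcano Villain vs Blaze Boy -> 1.0
print(win_probability(("V", 5), ("W", 6)))  # Nightmare vs Tidehollow Tyrant -> 0.0
```

Because exactly one of two different elements counters the other, "not countered" is the same as "counters" here, which keeps the elemental branch to one line.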
HOW TEAMS FIGHT
To find the outcome of a game between two teams:
- Choose a random character from each team.
- Those two characters fight.
- The loser is KOd and removed from their team.
- The winner sticks around.
- Repeat this process until one team has run out of characters. That team loses.
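The team procedure can be simulated by Monte Carlo, the same way the leaderboards below were estimated. This is a hypothetical reimplementation rather than the author's code. In particular, it reads "choose a random character" as a fresh random draw from both teams every round, with the winner simply staying in the pool; if "the winner sticks around" instead means the winner stays in to face the next challenger, only the selection step needs to change.

```python
import random

COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}


def win_probability(a, b):
    """Probability that (element, power) character a beats b; "V" = Void."""
    (ea, pa), (eb, pb) = a, b
    if "V" not in (ea, eb) and ea != eb:
        return 0.0 if COUNTERED_BY[ea] == eb else 1.0
    if ea == eb and {pa, pb} == {1, 6}:
        return 1.0 if pa == 1 else 0.0
    return 0.5 if pa == pb else (1.0 if pa > pb else 0.0)


def play_game(team_a, team_b, rng):
    """Play one game and return True if team_a wins."""
    a, b = list(team_a), list(team_b)
    while a and b:
        # Choose a random remaining character from each team (assumed fresh draw).
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        # The loser is KOd and removed; the winner stays on its team.
        if rng.random() < win_probability(a[i], b[j]):
            b.pop(j)
        else:
            a.pop(i)
    return bool(a)


def estimate_win_rate(team_a, team_b, runs=10_000, seed=0):
    """Monte Carlo estimate of team_a's win rate against team_b."""
    rng = random.Random(seed)
    return sum(play_game(team_a, team_b, rng) for _ in range(runs)) / runs


# Five Fire characters against four Earth characters and one Water character.
all_fire = [("F", p) for p in range(2, 7)]
earth_plus_water = [("E", 2), ("E", 3), ("E", 4), ("E", 5), ("W", 1)]
print(estimate_win_rate(all_fire, earth_plus_water))  # 0.0: nothing beats the Water character
```

Every round removes exactly one character, so a game always ends after at most nine fights in a five-on-five matchup.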
This ruleset encourages balanced teams. For example, a team of 5 Fire characters will lose to a team of 4 Earth and 1 Water characters. Even though 4/5 of character matchups favor the Fire characters, nothing on their team can beat the one Water character, and eventually it will work its way through their whole team and win.
Once you understand how the rules work, general strategy is:
- Choose high-Power characters.
- Try to be reasonably balanced elementally.
- Try to counter the enemy team (with 1s against their 6s, a tilt towards the right elements, etc).
- The elemental counter mechanic does not always work as you might expect it to at the team level - rather than thinking about countering your opponent's common elements, you should think of it as targeting your opponent's weak elements. If your opponent has 3 Earth and 2 Water characters, for example, rather than saying 'that team has lots of Earth characters, I should bring high-power Fire characters to beat them' (which will get you wiped out by the Water characters), you want to say 'that team has no Fire characters, I should bring high-power Earth characters it won't be able to beat'.
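The "weak elements" idea can be made mechanical: list the elements that nothing on the opposing team counters. Characters of those elements can then lose only to same-element opponents (on power, or a 6 to a 1) or to the Nullifying Nightmare. This is a hypothetical helper for illustration, using the 'F'/'W'/'E' codes from the author's snippet.

```python
# Each element maps to the element that counters it.
COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}


def unanswered_elements(opponent_elements):
    """Elements that no character on the opposing team counters.

    opponent_elements is a list such as ["E", "E", "E", "W", "W"]; a Void
    ("V") entry counters nothing, so it never removes an element here.
    """
    present = set(opponent_elements)
    return [e for e in ("F", "W", "E") if COUNTERED_BY[e] not in present]


print(unanswered_elements(["E", "E", "E", "W", "W"]))  # ['E']
print(unanswered_elements(["F", "W", "E", "E", "E"]))  # [] for the NPC team
```

For the 3 Earth / 2 Water example this returns only Earth, matching the advice above. The NPC team fields all three elements, so against it the useful targets are individual characters rather than whole elements.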
DATASET GENERATION
The games you have access to were played by players grouped together by the game's auto-party functionality. These players do not build coordinated teams, so there's no correlation between characters.
However, some characters are more popular than others (most notably the Siren Sorceress, whose infamous costume has won her a large and...enthusiastic...fan base).
Overall, Water characters are somewhat more common and Earth characters somewhat less common. This doesn't affect the game itself, but it means that naive win rate evaluations will make Fire characters look substantially weaker than they are (since the Water characters that counter them are common and the Earth characters that they counter are rare).
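A toy calculation shows the direction of this bias. The sketch scores every elemental character by its one-on-one win probability against an opponent drawn in proportion to element popularity. The popularity weights (Water 1.3, Fire 1.0, Earth 0.7) are made up for illustration, not taken from the dataset. One-on-one fights are also a simplification of full team games, and the Nightmare is left out.

```python
COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}
# The 18 elemental characters; the Nullifying Nightmare is left out of this toy model.
CHARACTERS = [(e, p) for e in "FWE" for p in range(1, 7)]


def win_probability(a, b):
    """Probability that (element, power) character a beats b (no Void here)."""
    (ea, pa), (eb, pb) = a, b
    if ea != eb:
        return 0.0 if COUNTERED_BY[ea] == eb else 1.0
    if {pa, pb} == {1, 6}:
        return 1.0 if pa == 1 else 0.0
    return 0.5 if pa == pb else (1.0 if pa > pb else 0.0)


def naive_element_win_rates(popularity):
    """Average one-on-one win rate per element against a random opponent
    whose element is drawn in proportion to `popularity`."""
    total = sum(popularity[e] for e, _ in CHARACTERS)
    rates = {}
    for elem in "FWE":
        mine = [c for c in CHARACTERS if c[0] == elem]
        per_char = [
            sum(popularity[o[0]] * win_probability(c, o) for o in CHARACTERS) / total
            for c in mine
        ]
        rates[elem] = sum(per_char) / len(per_char)
    return rates


def rounded(rates):
    return {e: round(r, 3) for e, r in rates.items()}


print(rounded(naive_element_win_rates({"F": 1.0, "W": 1.0, "E": 1.0})))
# {'F': 0.5, 'W': 0.5, 'E': 0.5}
print(rounded(naive_element_win_rates({"F": 1.0, "W": 1.3, "E": 0.7})))  # hypothetical weights
# {'F': 0.4, 'W': 0.55, 'E': 0.55}
```

The rules treat the elements identically, yet in this toy model the skewed pool drops Fire from 50% to 40% purely because its counter is common and its prey is rare.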
The full dataset contains around a million game results. However, Cloud Liquid Gaming's previous data specialist carelessly loaded the data into Excel and accidentally truncated it at row 65536, so you only received 65535 entries in your data.
(Real-world Data Science Moral: it is very rare for the length of a dataset to naturally be a power of 2/one less than a power of 2. If you see that, you should suspect that your data got cut off at some point.)
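The moral is easy to turn into a quick sanity check. For background, 65,536 rows is the per-worksheet limit of Excel's legacy .xls format (Excel 2003 and earlier); the newer .xlsx format allows 1,048,576 rows. The function name below is hypothetical.

```python
def suspicious_length(n_rows):
    """True if n_rows is a power of 2, or one less than a power of 2."""
    if n_rows <= 0:
        return False
    # A power of two has a single set bit, so clearing the lowest bit leaves 0.
    is_power_of_two = (n_rows & (n_rows - 1)) == 0
    # One less than a power of two is all ones, so adding 1 carries every bit away.
    is_one_less = (n_rows & (n_rows + 1)) == 0
    return is_power_of_two or is_one_less


print(suspicious_length(65535))    # True
print(suspicious_length(1000000))  # False
```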
PVE LEADERBOARD
Note: all winrates below were Monte-Carlo calculated rather than explicitly derived. Luckily it doesn't look like rankings are close enough for slight Monte Carlo error to matter.
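To put a number on "slight Monte Carlo error": the standard error of a win rate p estimated from n independent games is sqrt(p(1-p)/n). The sketch assumes 100,000 runs per matchup, the value used in the author's snippet further down; the post does not say how many runs the leaderboards themselves used.

```python
import math


def standard_error(p, runs):
    """Standard error of a win rate p estimated from `runs` independent games."""
    return math.sqrt(p * (1 - p) / runs)


def z_score(p1, p2, runs):
    """How many standard errors separate two independently estimated win rates."""
    se_diff = math.sqrt(standard_error(p1, runs) ** 2 + standard_error(p2, runs) ** 2)
    return (p1 - p2) / se_diff


# Optimal Play (81.47%) vs gjm (80.40%), the closest pair in the table below.
print(round(standard_error(0.8147, 100_000), 5))   # 0.00123, about 0.12 percentage points
print(round(z_score(0.8147, 0.8040, 100_000), 1))  # about 6 standard errors apart
```

Under that assumption, even the smallest gap in the PVE table is several standard errors wide.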
The optimal team for fighting the NPC team consists of Fire 5 (Phoenix Paladin), Fire 6 (Blaze Boy), Water 1 (Arch-Alligator), Earth 5 (Rock-n-roll Ranger), Earth 6 (Greenery Giant).
The most important things to bring (in roughly descending order) were:
- Blaze Boy (Fire6), who can be beaten only by one character on the NPC team (their Tidehollow Tyrant, Water6).
- Greenery Giant (Earth6), who can be beaten only by the NPC's Phoenix Paladin (Fire5), or by losing the tiebreaker to the NPC's Greenery Giant.
- Characters that beat the NPC Tidehollow Tyrant (e.g. Arch-Alligator, Rock-n-roll Ranger). If you can KO the Tidehollow Tyrant, your Blaze Boy can beat everything else on their team by itself.
- Characters that beat the NPC Greenery Giant/Phoenix Paladin (e.g. Phoenix Paladin)
(Nullifying Nightmare, despite its very high overall win rate, is not that good to bring here - it performs poorly against teams with multiple 6s on them).
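The reasoning above comes down to each character's one-on-one record against the five NPC characters. This sketch tabulates that record; characters are (element, power) tuples, and "V" is a hypothetical code for Void.

```python
COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}
NPC_TEAM = [("F", 5), ("W", 6), ("E", 3), ("E", 4), ("E", 6)]


def win_probability(a, b):
    """Probability that (element, power) character a beats b; "V" = Void."""
    (ea, pa), (eb, pb) = a, b
    if "V" not in (ea, eb) and ea != eb:
        return 0.0 if COUNTERED_BY[ea] == eb else 1.0
    if ea == eb and {pa, pb} == {1, 6}:
        return 1.0 if pa == 1 else 0.0
    return 0.5 if pa == pb else (1.0 if pa > pb else 0.0)


def record_vs(character, opponents):
    """Split opponents into those the character beats, coin flips, and losses."""
    beats, coin_flips, loses_to = [], [], []
    for opp in opponents:
        p = win_probability(character, opp)
        if p == 1.0:
            beats.append(opp)
        elif p == 0.5:
            coin_flips.append(opp)
        else:
            loses_to.append(opp)
    return beats, coin_flips, loses_to


for name, char in [("Blaze Boy", ("F", 6)), ("Greenery Giant", ("E", 6)),
                   ("Arch-Alligator", ("W", 1)), ("Nullifying Nightmare", ("V", 5))]:
    beats, flips, losses = record_vs(char, NPC_TEAM)
    print(f"{name}: beats {beats}, 50/50 vs {flips}, loses to {losses}")
```

Blaze Boy loses only to the Tidehollow Tyrant, while the Nightmare loses to both NPC 6s and only splits with the Phoenix Paladin, which is why it underperforms here despite its overall win rate.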
Entries were:
Entrant(s) | Team | Win Rate |
Optimal Play | Fire5, Fire6, Water1, Earth5, Earth6 | 81.47% |
gjm* | Fire5, Fire6, Water1, Earth1, Earth6 | 80.40% |
Alumium, simon | Fire5, Fire6, Earth1, Earth5, Earth6 | 76.53% |
GuySrinivasan | Fire5, Fire6, Water3, Earth5, Earth6 | 72.41% |
Measure, Jemist | Fire6, Water6, Earth5, Earth6, Void5 | 70.01% |
abstractapplic | Fire5, Fire6, Water6, Earth6, Void5 | 66.97% |
Maxwell Peterson | Fire5, Water1, Earth1, Earth6, Void5 | 62.05% |
Yonge | Water6, Earth1, Earth5, Earth6, Void5 | 36.00% |
Random Play | 5 randomly selected characters | 28.55% |
lsusr | Fire3, Water5, Earth2, Earth3, Earth4 | 14.97% |
*After fixing a very entertaining early bug where he set up his code to pessimize his team instead of optimizing it. I like to think that a guy from the opposing team snuck in and offered him a briefcase full of cash to sabotage his employers.
Congratulations to everyone who submitted. Particular shoutouts go to:
- The top answer submitted by gjm, whose answer was extremely close to optimal (Earth5 vs Earth1 is a very close call against this team).
- The second-place answer submitted by Alumium and simon, who did extremely well despite submitting a team with a worrying-looking elemental tilt (the lack of Water characters did not end up hurting them much because the NPC team has only one Fire character, Phoenix Paladin with Power 5, which can be KOd e.g. by your Blaze Boy).
- The answer by GuySrinivasan, who suffered a lot by bringing Maelstrom Mage (Water3) instead of Water6 or Water1, but who was the first person not to get tricked into bringing the Nullifying Nightmare.
- The answer by Maxwell Peterson, who suffered a lot by missing Blaze Boy from his team but was the first person to bring along Arch-Alligator (the optimal counter to the opposing team's Tidehollow Tyrant).
Future players who want to test their performance can edit and run these lines of the code to simulate a given team against the NPC team:
# Requires heroes, get_hero_by_stats and team_matchup_evaluate from the linked code.
# The NPC team: Fire 5, Water 6, Earth 3, Earth 4, Earth 6.
enemy_team = [
    get_hero_by_stats(heroes, 'F', 5),
    get_hero_by_stats(heroes, 'W', 6),
    get_hero_by_stats(heroes, 'E', 3),
    get_hero_by_stats(heroes, 'E', 4),
    get_hero_by_stats(heroes, 'E', 6),
]
# Edit this list (element code, power level) to test your own team.
test_team_pve = [
    get_hero_by_stats(heroes, 'F', 5),
    get_hero_by_stats(heroes, 'F', 6),
    get_hero_by_stats(heroes, 'E', 1),
    get_hero_by_stats(heroes, 'E', 5),
    get_hero_by_stats(heroes, 'E', 6),
]
res = team_matchup_evaluate(test_team_pve, enemy_team, runs=100000, verbose=True)
Or, if you aren't familiar with the code, DM me and I can run it for you.
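The leaderboards were estimated by simulation, but a five-on-five matchup is small enough to compute exactly: a state is just the set of characters still standing on each side, so there are at most 2^5 × 2^5 = 1,024 states. This sketch does that with memoized recursion and exact fractions. It is not part of the author's code, and it assumes a fresh random pick from each team every round, so its numbers may differ from the leaderboards if the author's code handles the surviving winner differently.

```python
from fractions import Fraction
from functools import lru_cache

COUNTERED_BY = {"F": "W", "W": "E", "E": "F"}


def win_probability(a, b):
    """Exact probability that (element, power) character a beats b; "V" = Void."""
    (ea, pa), (eb, pb) = a, b
    if "V" not in (ea, eb) and ea != eb:
        return Fraction(0) if COUNTERED_BY[ea] == eb else Fraction(1)
    if ea == eb and {pa, pb} == {1, 6}:
        return Fraction(1) if pa == 1 else Fraction(0)
    if pa == pb:
        return Fraction(1, 2)
    return Fraction(1) if pa > pb else Fraction(0)


@lru_cache(maxsize=None)
def _p_win(a, b):
    """Probability that survivors `a` beat survivors `b` (both sorted tuples)."""
    if not b:
        return Fraction(1)
    if not a:
        return Fraction(0)
    total = Fraction(0)
    # Every pairing of one survivor from each side is assumed equally likely.
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            p = win_probability(x, y)
            total += p * _p_win(a, b[:j] + b[j + 1:])        # y is KOd
            total += (1 - p) * _p_win(a[:i] + a[i + 1:], b)  # x is KOd
    return total / (len(a) * len(b))


def exact_win_rate(team_a, team_b):
    """Exact probability that team_a beats team_b under the fresh-draw reading."""
    return _p_win(tuple(sorted(team_a)), tuple(sorted(team_b)))


npc = [("F", 5), ("W", 6), ("E", 3), ("E", 4), ("E", 6)]
optimal = [("F", 5), ("F", 6), ("W", 1), ("E", 5), ("E", 6)]
print(float(exact_win_rate(optimal, npc)))
```

Fractions avoid floating-point drift, so a team played against an identical copy of itself comes out at exactly 1/2.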
PVP LEADERBOARD
Note: all winrates below were Monte-Carlo calculated rather than explicitly derived. Luckily it doesn't look like rankings are close enough for slight Monte Carlo error to matter.
Note: The commentary below should be considered non-final for a few days to give people time to point out that I've misread the teams they submitted/added up win percentages wrong/made other obvious mistakes. If I have messed something like that up I'll have to recalculate, so don't count on victory/defeat until some more eyes have confirmed.
Edited to add: Alumium has been disqualified for cheating. Alumium is a person I know in real life, with whom I discussed this scenario while writing it. They made an account on LW in order to submit an answer to this scenario, even though they already knew the rules. As such, they are DISQUALIFIED. The new winner is simon. My apologies to all other players.
The PVP submissions received were (ordered from earliest to latest received):
lsusr: Fire4, Water1, Water5, Earth5, Void5
Measure: Fire1, Water4, Water6, Earth1, Void5
abstractapplic: Fire5, Fire6, Water6, Earth6, Void5
Yonge: Water6, Earth1, Earth5, Earth6, Void5
GuySrinivasan: Fire6, Water5, Water6, Earth6, Void5
Maxwell Peterson: Fire6, Water6, Earth1, Earth6, Void5
gjm: Fire5, Fire6, Water6, Earth6, Void5
Alumium: Fire6, Water6, Earth1, Earth5, Earth6
Jemist: Fire6, Water6, Earth5, Earth6, Void5
simon: Fire2, Fire6, Water1, Water6, Earth6
The most common team consisted of the Nullifying Nightmare, all three 6-power characters, and one 5-power character: abstractapplic, GuySrinivasan, gjm and Jemist all submitted variants on this team (between them choosing all three different elements for their 5-power character), and Alumium had submitted that team before swapping it out for a different one.
Win rates were:
Win rate (row vs. column) | simon | Maxwell Peterson | Jemist | abstractapplic | gjm | GuySrinivasan | Yonge | Measure | lsusr | Overall Score |
simon | – | 53.87% | 64.97% | 67.80% | 68.06% | 60.75% | 69.63% | 50.21% | 68.09% | 5.57 |
Maxwell Peterson | 46.13% | – | 54.18% | 58.78% | 58.53% | 61.90% | 73.82% | 68.37% | 84.80% | 5.50 |
Jemist | 35.03% | 45.82% | – | 50.46% | 50.49% | 49.26% | 73.09% | 76.70% | 87.46% | 5.10 |
abstractapplic | 32.20% | 41.22% | 49.54% | – | 50.13% | 50.51% | 67.15% | 64.42% | 86.36% | 4.79 |
gjm | 31.94% | 41.47% | 49.51% | 49.87% | – | 50.21% | 67.34% | 64.28% | 86.50% | 4.78 |
GuySrinivasan | 39.25% | 38.10% | 50.74% | 49.49% | 49.79% | – | 54.93% | 57.00% | 88.72% | 4.62 |
Yonge | 30.37% | 26.18% | 26.91% | 32.85% | 32.66% | 45.07% | – | 77.44% | 75.83% | 3.65 |
Measure | 49.79% | 31.63% | 23.30% | 35.58% | 35.73% | 43.00% | 22.56% | – | 40.80% | 3.26 |
lsusr | 31.91% | 15.20% | 12.54% | 13.64% | 13.50% | 11.28% | 24.17% | 59.20% | – | 2.07 |
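The table can also be read head-to-head. A Condorcet winner is a team that beats every other team more than half the time; this sketch checks each row for that property, after confirming that the table is internally consistent (each pair of mirrored cells sums to 100%). The data layout is an arbitrary illustrative choice, and Alumium's disqualified team is not in the table, so it is not considered.

```python
TEAMS = ["simon", "Maxwell Peterson", "Jemist", "abstractapplic", "gjm",
         "GuySrinivasan", "Yonge", "Measure", "lsusr"]

# Row team's win rate (%) against each column team, copied from the table above.
# None marks the diagonal (a team does not play itself).
WIN_RATES = {
    "simon":            [None, 53.87, 64.97, 67.80, 68.06, 60.75, 69.63, 50.21, 68.09],
    "Maxwell Peterson": [46.13, None, 54.18, 58.78, 58.53, 61.90, 73.82, 68.37, 84.80],
    "Jemist":           [35.03, 45.82, None, 50.46, 50.49, 49.26, 73.09, 76.70, 87.46],
    "abstractapplic":   [32.20, 41.22, 49.54, None, 50.13, 50.51, 67.15, 64.42, 86.36],
    "gjm":              [31.94, 41.47, 49.51, 49.87, None, 50.21, 67.34, 64.28, 86.50],
    "GuySrinivasan":    [39.25, 38.10, 50.74, 49.49, 49.79, None, 54.93, 57.00, 88.72],
    "Yonge":            [30.37, 26.18, 26.91, 32.85, 32.66, 45.07, None, 77.44, 75.83],
    "Measure":          [49.79, 31.63, 23.30, 35.58, 35.73, 43.00, 22.56, None, 40.80],
    "lsusr":            [31.91, 15.20, 12.54, 13.64, 13.50, 11.28, 24.17, 59.20, None],
}


def is_complementary(win_rates, teams):
    """Sanity check: A's rate against B plus B's rate against A should be 100%."""
    return all(
        abs(win_rates[a][j] + win_rates[b][i] - 100) < 0.01
        for i, a in enumerate(teams)
        for j, b in enumerate(teams)
        if i != j
    )


def condorcet_winners(win_rates):
    """Teams that win more than 50% of games against every other team."""
    return [team for team, row in win_rates.items()
            if all(rate > 50 for rate in row if rate is not None)]


print(is_complementary(WIN_RATES, TEAMS))  # True
print(condorcet_winners(WIN_RATES))        # ['simon']
```

Only simon's row clears 50% in every column, although his edge over Measure is just 50.21%.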
The four nearly-symmetrical teams did quite well (with the differences between them coming down to which elements countered other teams best), but did not ultimately win.
The Nullifying Nightmare was extremely common but not very strong (since most teams included multiple 6s): neither Alumium's team nor simon's team, the top two before the disqualification, included it.
Conditional on this data holding up when more eyes look at it, I believe the victory went to Alumium, who managed to get all three 6s, avoid the Nightmare, include a Power 1 character for the counterpick against strong teams, and have an elemental tilt that helped prey more effectively on some lower-tier teams. Alumium's team was a bit Earth-heavy, but no other team quite managed to compete.
Congratulations to simon, whose confusing inclusion of Oil Ooze (Fire2) instead of either Fire1 or Fire5 cost him some percentage points, but who was the only other submitter not to include the Nightmare on his team.
Congratulations simon! Once you've figured out what theme (either a general genre or a specific work*) you want to request an upcoming scenario be based on, PM or comment and I'll try to get it to happen. I can't promise it'll happen soon (it'll take some time to write one of these, other people are queued up to publish theirs, and I might end up submitting a Christmas-themed one in December, so you'll end up waiting until some time late this year or early next year).
*Ability to select a specific work is contingent on me being familiar with that work and thinking I can write a scenario based on it.
FEEDBACK REQUEST
I'm interested to hear feedback on what people thought of this scenario. If you played it, what did you like and what did you not like? If you might have played it but decided not to, what drove you away? What would you like to see more of/less of in future?
Thanks for playing! Now, if you'll excuse me, the League of Legends world championship is starting, and I need to go watch North America's finest (er, best; er, least dreadful) teams be shamefully routed by teams from countries I've never heard of!
17 comments
comment by abstractapplic · 2021-10-06T17:01:10.446Z
This was extremely good. In particular, I like that you managed to make the challenge tractable to both Analysis and Machine Learning. I also appreciated that you included an explicit Real-world Data Science Moral in the wrap-up; I should try to do that more often.
comment by gjm · 2021-10-05T23:44:09.232Z
Feedback on the scenario: I liked it, but evidently I took the framing story a bit too seriously because I took it to indicate that I would probably be wasting my time trying to understand the game mechanics, when actually they weren't so very complicated. I can't complain about how that worked out for me, given that mindlessly putting the given data into a model-fitting machine and cranking the handle produced a very good result, and if it had seemed more likely that the thing was approachable then I might have avoided it lest it be too much work :-), but I do feel a bit bad about not trying to use my brain a bit more and my computer a bit less.
comment by gjm · 2021-10-05T19:44:54.659Z
I strongly suspect that I got rather lucky; at any rate, my model's predicted win-rate for my team was substantially less than the real ~80%, suggesting that the model didn't do a great job of capturing reality.
I wonder whether lsusr had a sign-error bug similar to mine.
comment by simon · 2021-10-05T18:54:20.871Z
Hmm how about we switch to using a Condorcet method for the PVP ranking?
I screwed up my thinking on whether the sides were different, will add an edit/reply to my comment on the main post later.
Thanks to Maxwell Peterson for introducing me to R through his post and code, I hope to continue using R later, ideally with a better gears-level understanding than currently.
↑ comment by aphyer · 2021-10-05T19:11:23.363Z
True, you are the Condorcet winner. :P
Do you know how you ended up with Oil Ooze on your team? I was expecting to trick a lot of people into submitting Nightmare, but I wasn't expecting the Ooze to show up.
↑ comment by simon · 2021-10-05T19:38:40.288Z
The magic black box supplied to me by Maxwell, after I fiddled with it a tiny bit and supplied it with adjusted data, told me that BGNOT was supposedly the strongest team in general, and that the strongest counter to BGNOT was ABGOT. It also claimed that ABGOT was the strongest counter to gjm's BGNPT. I asked the magic black box how well a few candidate teams, including ABGOT, did against the non-secret competitors already posted, and the numbers it gave looked more generally decent for ABGOT than the other candidates (I was looking for broad-spectrum effectiveness more than average effectiveness) and it also said that ABGOT would have a decent winrate against average teams, so I went with it. I would have liked to make a figure of merit and find the top team for that figure of merit but wasn't able to do so in time.
In other words, the magic black box liked Oil Ooze for some reason.
↑ comment by Maxwell Peterson (maxwell-peterson) · 2021-10-05T21:54:59.115Z
When I read in the main post that the inclusion of Oil Ooze was confusing, I thought my magic box might be the guilty one!
comment by Alumium · 2021-10-06T02:49:25.974Z
I legitimately did not expect to do that well.
If I have done better than others, it is because I have stood on the shoulders of, uh, ogres at least.
I've gotten two morals from this. The good moral is 'Even if you have no idea what is going on, you can still data science at something.'
The bad moral is 'When in doubt, crib other people's notes (only be sure always to call it please 'research').'
↑ comment by aphyer · 2021-10-06T23:27:47.292Z
I think you're selling yourself a bit short here. You say that you 'only dropped the Nightmare due to Maxwell Peterson' - but Maxwell himself included the Nightmare on his PVP team!
Only two people submitted PVP teams without the Nightmare on them. One of them was you, and the other was using an analysis he didn't understand that led to him including Oil Ooze on his team for no reason he can discern even after the fact. (Sorry, simon).
If you managed to read through other people's findings and get more use out of them than those people themselves did, I think that leads to a well-deserved victory.
comment by SarahNibs (GuySrinivasan) · 2021-10-05T20:18:05.378Z
I was fairly surprised my PvE team did so well. Why did my heuristics basically work?
How well do individuals do? Multiply. How well do teams do, relative to the product of their members? How well do individuals on a team do against the known opponents, pairwise? Okay try all teams, combine their "vs opponents" scores with "team coherence" scores, pick the max.
Finding how well teams did relative to how their individuals did let me accidentally pick just from balanced teams; asking how well individuals did against the known opponents got rid of Null and biased towards the 6s, 5s, and 1s. I think Maelstrom (Water 3) probably got on there because the team did really well in the data despite Maelstrom's inclusion, which looked the same to me as Maelstrom synergizing well with the rest of the team.
I weighted "individuals vs opposing team" equal to "team synergy"; I wonder if I had left the weights as each individual = each other individual = team synergy, whether that would have correctly axed 2-4s while still axing Null? [tries] Nope, that lets Null in quickly.
↑ comment by SarahNibs (GuySrinivasan) · 2021-10-05T21:52:35.447Z
Feedback: I liked this one a lot, in theory. I just found myself with less time than I wanted to actually engage. It had a latent structure that was definitely discoverable, but also definitely plenty obfuscated. Even with time constraints the barrier-to-entry was low enough that I could get something nice in pretty quickly. And the differences between individual, team, vs-that-team, and vs-any-team were interesting to mull over.