D&D.Sci 5E: Return of the League of Defenders Evaluation & Ruleset

post by aphyer · 2023-06-09T15:25:21.948Z · LW · GW · 8 comments

Contents

  RULESET
  STRATEGY
  PVE LEADERBOARD
  PVP LEADERBOARD
  FEEDBACK REQUEST
8 comments

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

A character has three stats, Attack, Defense and Range.

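| Name | Attack | Defense | Range |
| --- | --- | --- | --- |
| Daring Duelist | 5 | 0 | 1 |
| Bludgeon Bandit | 4 | 1 | 1 |
| Silent Samurai | 3 | 2 | 1 |
| Lamellar Legionary | 2 | 3 | 1 |
| Granite Golem | 1 | 4 | 1 |
| Flamethrower Felon | 4 | 0 | 2 |
| Captain Chakram | 3 | 1 | 2 |
| Jaunty Javelineer | 2 | 2 | 2 |
| Hammer Hurler | 1 | 3 | 2 |
| Professor Pyro | 3 | 0 | 3 |
| Matchlock Marauder | 2 | 1 | 3 |
| Rugged Ranger | 1 | 2 | 3 |
| Thunder Tyrant | 2 | 0 | 4 |
| Amazon Archer | 1 | 1 | 4 |
| Wily Wizard | 1 | 0 | 5 |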

Congratulations to abstractapplic, who I think was the first person to identify range differences (albeit with a slight confusion that led to categorizing the 2-range characters as 'long-range' ones).

The most important element of underlying structure is that when playing the game your team will line up in order, with a 'front line', 'mid line' and 'back line' character.  Lower-range characters will go at the front, higher-ranged ones behind them.  If two characters have the same Range, the higher-Defense one will go in front.

For example, if a blue team of Duelist (A5-D0-R1), Samurai (A3-D2-R1), and Archer (A1-D1-R4) plays against a green team of Bandit (A4-D1-R1), Javelineer (A2-D2-R2) and Hurler (A1-D3-R2), the teams will line up like this:

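| Blue backline | Blue midline | Blue frontline | Green frontline | Green midline | Green backline |
| --- | --- | --- | --- | --- | --- |
| Amazon Archer | Daring Duelist | Silent Samurai | Bludgeon Bandit | Hammer Hurler | Jaunty Javelineer |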

with each team's frontline next to the opposing frontline, and the midlines and backlines further away.
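The ordering rule can be sketched as a two-key sort: Range ascending, then Defense descending. This is a minimal illustration rather than the scenario's actual generation code; the `Character` type and `line_up` function are hypothetical names, and what happens when two characters tie on both Range and Defense is not specified in the ruleset (the sketch just keeps input order).

```python
from typing import NamedTuple


class Character(NamedTuple):
    # Hypothetical container for the three stats described in the ruleset.
    name: str
    attack: int
    defense: int
    range_: int


def line_up(team):
    """Return the team ordered front to back.

    Lower Range stands in front; on equal Range, higher Defense stands in front.
    Ties on both keys keep input order (an assumption, not a stated rule).
    """
    return sorted(team, key=lambda c: (c.range_, -c.defense))


blue = [
    Character("Duelist", 5, 0, 1),
    Character("Samurai", 3, 2, 1),
    Character("Archer", 1, 1, 4),
]
green = [
    Character("Bandit", 4, 1, 1),
    Character("Javelineer", 2, 2, 2),
    Character("Hurler", 1, 3, 2),
]

blue_order = [c.name for c in line_up(blue)]
green_order = [c.name for c in line_up(green)]
print(blue_order)   # ['Samurai', 'Duelist', 'Archer']
print(green_order)  # ['Bandit', 'Hurler', 'Javelineer']
```

Both results match the example: Samurai stands in front of Duelist because of its higher Defense, and Hurler stands in front of Javelineer for the same reason.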

The game is played as follows:

STRATEGY

Broad general strategy was:

The strongest team types I'm aware of were:

 

The NPC team had:

All three of the main strategies had some potential viability against the NPC team:

PVE LEADERBOARD

Submissions were:

| Player | Frontline | Midline | Backline | Winrate |
| --- | --- | --- | --- | --- |
| Optimal Play | Hurler (A1-D3-R2) | Professor (A3-D0-R3) | Tyrant (A2-D0-R4) | 100% |
| gjm | Golem (A1-D4-R1) | Felon (A4-D0-R2) | Professor (A3-D0-R3) | 85% |
| Optimal 1-2-3 | Legionary or Golem | Felon | Professor, Marauder or Ranger | 85% |
| simon | Bandit (A4-D1-R1) | Duelist (A5-D0-R1) | Javelineer (A2-D2-R2) | 64.4% |
| abstractapplic | Legionary (A2-D3-R1) | Hurler (A1-D3-R2) | Professor (A3-D0-R3) | 50% |
| Yonge | Hurler (A1-D3-R2) | Marauder (A2-D1-R3) | Professor (A3-D0-R3) | 26.4% |
| Random play | ?? | ?? | ?? | 17.5% |

Two players submitted 1-2-3 teams: 

Two players boldly tried something different:

PVP LEADERBOARD

The below should be considered non-final for a few days so people can check and confirm that I've gotten it right.

Again, most players submitted 1-2-3 teams (and everyone included Professor Pyro):

| Player | Frontline | Midline | Backline |
| --- | --- | --- | --- |
| abstractapplic | Legionary (A2-D3-R1) | Javelineer (A2-D2-R2) | Professor (A3-D0-R3) |
| gjm* | Golem (A1-D4-R1) | Felon (A4-D0-R2) | Professor (A3-D0-R3) |
| simon | Legionary (A2-D3-R1) | Captain (A3-D1-R2) | Professor (A3-D0-R3) |
| Yonge* | Hurler (A1-D3-R2) | Marauder (A2-D1-R3) | Professor (A3-D0-R3) |

*Note: gjm and Yonge did not submit distinct PVP teams, so I've used their PVE ones here, which may not have been intended for PVP.

with winrates:

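| | abstractapplic | simon | Yonge | gjm | Overall |
| --- | --- | --- | --- | --- | --- |
| abstractapplic | - | 52.8% | 100% | 80.4% | 2.33 |
| simon | 47.2% | - | 51.7% | 96.5% | 1.95 |
| Yonge | 0% | 48.3% | - | 94.3% | 1.43 |
| gjm | 19.6% | 3.5% | 5.7% | - | 0.29 |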
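The Overall column is not defined explicitly, but the figures are consistent with it being the sum of each player's three winrates expressed as fractions (i.e. expected wins across one match against each opponent). A quick check under that assumption, with the `winrates` dictionary transcribed by hand from the table:

```python
# Row player's winrate (in percent) against each opponent, copied from the table.
winrates = {
    "abstractapplic": {"simon": 52.8, "Yonge": 100.0, "gjm": 80.4},
    "simon": {"abstractapplic": 47.2, "Yonge": 51.7, "gjm": 96.5},
    "Yonge": {"abstractapplic": 0.0, "simon": 48.3, "gjm": 94.3},
    "gjm": {"abstractapplic": 19.6, "simon": 3.5, "Yonge": 5.7},
}

# Assumed definition: Overall = sum of winrates as fractions, rounded to 2 places.
overall = {
    player: round(sum(row.values()) / 100, 2)
    for player, row in winrates.items()
}

# Sanity check: each pairing's two winrates should add up to 100%.
symmetric = all(
    abs(winrates[a][b] + winrates[b][a] - 100.0) < 1e-9
    for a in winrates
    for b in winrates[a]
)

for player, score in sorted(overall.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {score:.2f}")
print("pairings sum to 100%:", symmetric)
```

Under this reading the computed scores reproduce the published column (2.33, 1.95, 1.43, 0.29), and every pairing's winrates are complementary.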

The 1-2-3 teams all had similarly durable frontlines, and the same Professor backline, but performed very differently based on their midlines.

While gjm's midline Felon was an excellent pick against the PVE team (able to target their 2-defense frontline and do 2 damage a hit), it fared very poorly here: it could only do 1 damage per hit to the durable frontlines it encountered, and it was vulnerable to being KO'd extremely fast (either once its frontline was defeated or by Yonge's team targeting it directly).

simon and particularly abstractapplic submitted more durable midlines that performed better overall, and brought Legionary rather than Golem as their frontline (a slight improvement since no-one brought a Duelist).

Yonge's 2-3-3 team helped highlight the midline difference: with two characters targeting the opposing midline, its results varied enormously with how durable that midline was (ranging from a 94% winrate against gjm to guaranteed defeat against abstractapplic).

Congratulations to abstractapplic for winning the PVP overall!  Now abstractapplic gets to specify two scenarios... I have to feel twice as guilty about not having finished abstractapplic's requested scenario yet... abstractapplic gets to feel twice as much pride in their Data Science Skills!

FEEDBACK REQUEST

As usual, I'm interested to hear feedback on what people thought of this scenario.  If you played it, what did you like and what did you not like?  If you might have played it but decided not to, what drove you away?  What would you like to see more of/less of in future?  Do you think the scenario was too complicated to decipher?  Or too simple to feel realistic?  Or both at once?  Do you have any other feedback?

8 comments

Comments sorted by top scores.

comment by gjm · 2023-06-10T01:08:14.967Z · LW(p) · GW(p)

I didn't actually submit a PvP entry. I assume you used my PvE one, but it wasn't intended for PvP use and I am in no way surprised that it came last. I don't particularly object to its having been entered in the PvP tournament, but maybe there should be a note explaining that it was never meant for that?

Replies from: aphyer
comment by aphyer · 2023-06-10T02:32:48.646Z · LW(p) · GW(p)

Fair enough, edited.

Replies from: gjm
comment by gjm · 2023-06-10T17:08:34.338Z · LW(p) · GW(p)

Thanks!

comment by simon · 2023-06-09T16:20:06.204Z · LW(p) · GW(p)

Thanks for the scenario, aphyer. 

I made a last minute PVE change which didn't get into the results, but looks like it would have gotten 64.44% winrate which is still lower than gjm's. Congrats to gjm and abstractapplic. I also had previously changed my PVE selection which also isn't in the results, but that change didn't make any difference - it was still 50%.

Interesting ruleset that has some complicated behaviour, but still allows analysis. I think it was actually quite good in this respect, though even with the extension I didn't really get to a point where I felt I was done.

If I had continued the analysis, my next thing to look at would have been how different candidate PVP teams, plus yonge's PVP team, interacted with different team compositions (classified according to the groups which corresponded to range 1, range 2, and range 3+).

Not sure what I would have ended up concluding from this. 

Replies from: aphyer
comment by aphyer · 2023-06-09T16:24:41.071Z · LW(p) · GW(p)

I actually edited to include your PVE change, you did manage a 64% winrate.  Sorry not to give you more time, didn't realize there was work still ongoing.

Replies from: simon
comment by simon · 2023-06-09T17:09:38.354Z · LW(p) · GW(p)

NP aphyer, I didn't ask for any more time, though I was happy to get some extra due to you extending for yonge. I hadn't been particularly focused on it for a while, until trying to get things figured out at the last minute, largely I think due to me having spent a greatly disproportionate-to-value effort on figuring out how to do similarity clustering on a highly reduced (and thus much more random) version of the dataset, and then not knowing what to do with the results once I got them. (though I did learn stuff about finding the similarity clustering, so that was good).

Looks like the clusters I found in the reduced dataset more or less corresponded to:

either an aggressive 2-ranged character or everything fairly tanky (FLR cluster)

tending towards tankier 2-ranged and aggressive 1-ranged (melee) character (HSM cluster, note I had excluded B and D from this dataset)

tending towards more aggression to the back  (JGP cluster)

So now I'm trying to figure out why the observed FLR>HSM>JGP>FLR rock-paper-scissors effect occurred...

edit: a just-so story (don't know if real reason):

JGP vs FLR: FLR loses the melee first, then likely loses the 2-range since very squishy, then doomed.

FLR vs HSM: HSM loses the melee first. Then FLR might well lose the 2-range first, depending on initiative. FLR would then be splitting damage, but since HSM's 2-range is already damaged and FLR's tank typically isn't that tanky, HSM's 2-range might well die before FLR's backline? dunno, seems a weak explanation

HSM vs JGP:  HSM loses the melee first. But then, the tanky 2-range of HSM tends to last a while, and the tanky melee of JGP doesn't contribute much. Once JGP loses its 2-range, it splits damage between HSM's remaining characters, while HSM focuses and defeats JGP's squishy backline? 

comment by abstractapplic · 2023-06-10T12:42:12.578Z · LW(p) · GW(p)

Reflections on my performance:

I failed to stick the landing for PVE; looking at gjm’s work, it seems like what I was most missing was feature-engineering while/before building ML models. I’ll know better next time.

For PVP, I did much better. My strategy was guessing (correctly, as it turned out) that everyone else would include a Professor, noticing that they’re weak to Javelineers, and making sure to include one as my midline.

Reflections on the challenge:

I really appreciated this challenge, largely because I got to use it as an excuse to teach myself to build Neural Nets, and try out an Interpretability idea I had (this went nowhere, but at least failed definitively/interestingly).

I have no criticisms, or at least none which don’t double as compliments. The ruleset was complicated and unwieldy, increasing the rarity of “aha!” moments and natural stopping points during analysis, and making it hard to get an intuitive sense of how a given matchup would shake out (even after the rules were revealed) . . . but that’s exactly what made it such a useful testing ground, and such valuable preparation for real-world problems.

Replies from: gjm
comment by gjm · 2023-06-10T22:30:43.230Z · LW(p) · GW(p)

I think calling anything I did "feature engineering" is pretty generous :-). (I haven't checked whether the model still likes FGP without the unprincipled feature-tweaking I did. It might.)