D&D.Sci 4th Edition: League of Defenders of the Storm
post by aphyer · 2021-09-28T23:19:43.916Z · LW · GW · 31 comments
STORY (skippable)
When you graduated top of your class from Data Science School, you didn't care where you ended up working; you just wanted to find the highest-paying job possible. (Your student loans may have had an impact on this decision.)
You were expecting a job on Wall Street, or perhaps at some Silicon Valley firm. You were not expecting Goldman Sachs to be outbid at the last minute by a South Korean e-sports team looking for a 'data specialist' to assist them in winning tournaments of the entirely original new game 'League of Defenders of the Storm©.'
Your new employers at Cloud Liquid Gaming seem friendly enough. They show you their database of games, and tell you they're looking for assistance in selecting their team for an upcoming tournament.
Then you make the mistake of asking them how the game works. All of them start talking at once:
"Okay, so the first thing you need to understand is that if your HK carry isn't able to scale to swamp fights you won't be able to contest Count Shorna's Curse-"
"No, first you need to explain how itemization works, and how ever since Mike's Malign Maul got nerfed to deal with the Grapeshot Gloom build it's been-"
"You have to start from the beginning! You are all Callers, resolving your disputes by summoning spirits to do battle for-"
"Oh come on, why do we need to care about the lore, it's-"
Things do not get more useful from there. Half an hour later, with very little new information and a gigantic headache, you excuse yourself to look at the data. Perhaps you can get something useful out of that without having to listen to them explain every detail of the game. Apparently they have an important game tomorrow that they're looking for advice on, and you're eager to make a good impression on your new employers.
DATA & OBJECTIVES
You've managed to learn a few basic things about how the game works:
- Each team simultaneously selects 5 characters.
- The same character can be selected by both teams.
- The same character cannot be selected twice by the same team.
- Your employers are confident they know which characters their opponents will play at their next game:
- Dire Druid
- Greenery Giant
- Phoenix Paladin
- Quartz Questant
- Tidehollow Tyrant
- They would like you to help them pick out a set of 5 characters that maximizes their chances of winning against that enemy team. You may select any 5 out of the 19 characters available:
- Arch-Alligator
- Blaze Boy
- Captain Canoe
- Dire Druid
- Earth Elemental
- Fire Fox
- Greenery Giant
- Inferno Imp
- Landslide Lord
- Maelstrom Mage
- Nullifying Nightmare
- Oil Ooze
- Phoenix Paladin
- Quartz Questant
- Rock-n-Roll Ranger
- Siren Sorceress
- Tidehollow Tyrant
- Volcano Villain
- Warrior of Winter
- You have data on past games played (which characters were played on each team, and who won) to help you with this.
- The data is also available here in an alternative format, for those who prefer it or lack the tools to manipulate the original easily.
- You've been told that this data is all from games on 'the current patch': this means that the rules of the game won't have changed over the time of this dataset.
BONUS PVP OBJECTIVE
You may also submit a PVP team. I recommend sending it as a PM to me, but if you don't mind other people seeing it you can just put it in your answer. The PVP team with the best overall record (sum of performances against all other submitted teams) will win ~~lots of money~~ ~~eternal glory~~ the right to specify the theme of an upcoming D&D.Sci scenario. (Do you want something based on fantasy dungeon-crawling adventurers? Futuristic cyberpunk something-or-other? Superheroes? Harry Potter? I can't guarantee success, but I will try to build a D&D.Sci scenario around whatever theme you are interested in.)
I don't want the existence of a PVP objective to incentivize people too strongly against posting findings in the chat, so, as an effort to reduce the risk of your findings being used against you: if multiple people submit the same PVP team, I will break the tie in favor of whoever submitted it earlier.
I'll aim to post the ruleset and results in one week's time, on October 5th. PVP teams should be submitted by October 4th to give me time to evaluate them. If you find yourself wanting extra time, comment below and I can push these deadlines back.
Edited to add: Working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers that contain information about the dataset.
31 comments
Comments sorted by top scores.
comment by abstractapplic · 2021-09-29T20:49:11.122Z · LW(p) · GW(p)
Thank you for making this.
Regular team:
Nullifying Nightmare, Blaze Boy, Greenery Giant, Tidehollow Tyrant, and . . . yeah, okay, Phoenix Paladin.
(I was on the fence about whether the last spot should go to Paladin or Ranger, but when I saw Measure's answer I decided to let hipsterism be the deciding factor.)
Key Insights:
There seems to be a rock-paper-scissors thing going on here: Earthy fighters have an advantage over Watery fighters, Watery fighters have an advantage over Flamey fighters, and Flamey fighters - kinda, sorta, unreliably - have an advantage over Earthy fighters. (And the Nightmare has an advantage over everyone.)
This is relevant because 3/5 of the opposing team is Earthy fighters, including Greenery Giant, who has strength that rivals the Nightmare, and whose presence on a team predicts a ~60% chance of victory.
Teams which are slanted too heavily towards a given element have an extremely low win rate. I can't tell to what extent this is because losing the rock-paper-scissors game hurts you more than winning it helps, and to what extent balance is inherently valuable, so I'm playing it safe and not building an entire team of firestarters (also, there are only two Flamey fighters with non-terrible win/loss ratios).
Tangential insights:
I infer from the format of the alternative list that - absent an extremely tricky fakeout - position doesn't matter: A+B+C+D+E is equivalent to E+D+C+B+A.
Different fighters are used with very different frequencies, but this sampling bias doesn't seem to affect my analysis much.
Eyeballing the correlation matrix, it looks like teams are thrown together randomly; no pairs that always show up together, etc. This makes things much simpler, since I can be confident that (for example) GG's apparent power isn't just because people keep using him alongside NN (or vice versa).
There's a random element here. Existence proof: A+B+C+S+V vs A+E+I+T+V happened twice with different outcomes. Given this, I'd want to push Cloud Lightning Gaming to have the match be best-of-five, to decrease randomness' relevance to the outcome.
I appreciate the omission of letters that would let us (accidentally or otherwise) spell out common swearwords.
PVP team:
DM'd
↑ comment by aphyer · 2021-09-29T21:35:48.402Z · LW(p) · GW(p)
Your team is called Cloud Liquid Gaming. Cloud Lightning Gaming is actually their opponent.
↑ comment by abstractapplic · 2021-09-30T09:50:24.743Z · LW(p) · GW(p)
. . . I feel oddly proud to have continued the tradition of D&D players getting in-universe names wrong.
comment by Yonge · 2021-10-01T21:07:27.470Z · LW(p) · GW(p)
Spoiler protection.
Team colour doesn't appear to have any meaningful impact on the chances of winning.
There are only 2 games where the opponents' combination was played, which is too small to draw any conclusions from; however, there are 352 games where a team has had 4/5 of these characters.
Teams with Greenery Giant or Nullifying Nightmare in them seem to do unusually well against Dire Druid (Nullifying Nightmare does slightly better).
Greenery Giant looks very strong: no other character has been part of a winning team >50% of the time against teams she has been in, though Landslide Lord comes close.
Nullifying Nightmare and Tidehollow Tyrant seem to do best against Phoenix Paladin.
Quartz Questant is also fairly strong, but teams with Blaze Boy in them tend to win slightly more often than teams with Quartz Questant in them. Greenery Giant and Nullifying Nightmare also do very well. Rock-n-Roll Ranger, however, does better than all except Greenery Giant.
Against teams with Tidehollow Tyrant in them, teams with Greenery Giant and Rock-n-Roll Ranger do best.
This suggests the following line-up:
- Greenery Giant
- Landslide Lord
- Nullifying Nightmare
- Rock-n-Roll Ranger
- Tidehollow Tyrant
Both times this combination was used it won.
Teams with at least 4 of those characters won 77 percent of the time, which suggests this is a strong combination.
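(For anyone wanting to reproduce this kind of overlap count: a minimal pandas sketch, with a hypothetical file name and column layout; this is not Yonge's actual code.)

```python
import pandas as pd

TARGET = {"Greenery Giant", "Landslide Lord", "Nullifying Nightmare",
          "Rock-n-Roll Ranger", "Tidehollow Tyrant"}

# Hypothetical layout: columns green1..green5, blue1..blue5, winner ('green'/'blue').
games = pd.read_csv("league_games.csv")

wins = total = 0
for _, row in games.iterrows():
    for side in ("green", "blue"):
        team = {row[f"{side}{i}"] for i in range(1, 6)}
        if len(team & TARGET) >= 4:  # this side shares at least 4/5 characters
            total += 1
            wins += int(row["winner"] == side)

print(f"Teams with >=4/5 overlap won {wins}/{total} ({wins/total:.0%})")
```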
I will therefore use the following for both the regular and PvP teams:
- Greenery Giant
- Landslide Lord
- Nullifying Nightmare
- Rock-n-Roll Ranger
- Tidehollow Tyrant
comment by gjm · 2021-10-03T15:17:52.670Z · LW(p) · GW(p)
[An earlier version of this comment used teams that had accidentally been pessimized rather than optimized according to my model, as the result of a sign error. I think I have fixed it, though I make a lot of mistakes and I would be unsurprised if something were still broken, and the teams below might now make more sense. Thanks to Alumium for encouraging me to check for errors, though in fact I was already highly suspicious that I had made at least one.]
[After that, I thought I ought to check for overfitting, which I hadn't done before because I'm lazy, and try a bit harder to avoid it. This resulted in a change to my regular team below. I was prompted to do this by the fact that Maxwell did something broadly similar to what I'd done, but more carefully, and got very different-looking results. I also tried something smarter for the PvP, driven mostly by realising that it would run quite efficiently using sklearn despite feeling expensive.]
Regular team:
ABGLP
PvP team:
BGNPT
I confess that here I just
made a random-forest model, tried all possible hands against DGPQT for the regular team, and ran some tournaments using that model to adjudicate for the PvP team. I haven't made any attempt to use my brain to find any patterns, and it's entirely possible that I've missed important things.
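(As a concrete illustration of this sort of approach: a minimal sketch, not gjm's actual code. It one-hot-encodes each game as 38 binary columns, fits a random forest, then exhaustively scores all C(19,5) = 11,628 candidate teams against DGPQT. The file name and column layout are hypothetical.)

```python
import numpy as np
import pandas as pd
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

CHARS = ["Arch-Alligator", "Blaze Boy", "Captain Canoe", "Dire Druid",
         "Earth Elemental", "Fire Fox", "Greenery Giant", "Inferno Imp",
         "Landslide Lord", "Maelstrom Mage", "Nullifying Nightmare", "Oil Ooze",
         "Phoenix Paladin", "Quartz Questant", "Rock-n-Roll Ranger",
         "Siren Sorceress", "Tidehollow Tyrant", "Volcano Villain",
         "Warrior of Winter"]
IDX = {c: i for i, c in enumerate(CHARS)}

def encode(green, blue):
    """38-dim binary vector: 19 indicator columns for green, then 19 for blue."""
    x = np.zeros(38)
    for c in green:
        x[IDX[c]] = 1
    for c in blue:
        x[19 + IDX[c]] = 1
    return x

games = pd.read_csv("league_games.csv")  # hypothetical layout: green1..5, blue1..5, winner
X = np.array([encode([r[f"green{i}"] for i in range(1, 6)],
                     [r[f"blue{i}"] for i in range(1, 6)])
              for _, r in games.iterrows()])
y = (games["winner"] == "green").astype(int).to_numpy()

model = RandomForestClassifier(n_estimators=1000).fit(X, y)

ENEMY = ["Dire Druid", "Greenery Giant", "Phoenix Paladin",
         "Quartz Questant", "Tidehollow Tyrant"]
candidates = list(combinations(CHARS, 5))
p_win = model.predict_proba(np.array([encode(t, ENEMY) for t in candidates]))[:, 1]
print(candidates[int(np.argmax(p_win))])  # team with highest modeled win probability
```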
I hadn't looked at anyone else's comments before writing the first version of this. I did look at others' comments between writing the first version of the above and correcting a sign-error bug, but I have made no other changes in my code. (In particular, the process that produced my PvP entry is in no way adapted to the entries others have already submitted.)
There is still an excellent chance, despite my having fixed one catastrophic bug, that I have made boneheaded errors that result in my teams actually being pessimized rather than optimized, having everyone off-by-one in the list of heroes, etc. :-)
My decision to do this fairly brainlessly wasn't purely the result of laziness. OP is clearly intended to indicate that the game is super-complicated, so I wasn't optimistic about the chances of spotting enough patterns to do better than a brainless approach. A few more technical details in case my entries turn out to do well enough or badly enough that anyone wants to learn from my success or failure:
The random forest model was built using the RandomForestClassifier class in the scikit-learn library for Python. It uses an ensemble of size 1000, and other parameters had their default values. It's rather slow, so for the tournament I used a cheaper random forest model.
For PvP, what I did at first was to run 10 sub-tournaments with 30 players in each, initially selected at random. Each tournament is played round-robin, and then three badly-performing hands are replaced by mutated versions of three well-performing hands. Scores in the tournament are not pure win/loss but are the total of the win probabilities according to the random forest model. After every 10 rounds in each tournament, I play another tournament using 3 randomly chosen players from each sub-tournament, again doing the replacing thing (so we get a little bit of mixing between sub-tournaments, but hopefully not so much that all diversity is lost too quickly). I ran a couple of hundred iterations of this and it seemed to have more or less converged. To choose my final PvP hand, I ran a single tournament on the best three hands from each sub-tournament using the more expensive and hopefully better random-forest model.
But afterwards I thought better of this and decided instead to use the "multiplicative weights update" approach: just take all the possible teams and give each an initial "weight" of 1, then repeatedly play each against a random opponent and nudge its weight up or down according to the result, choosing the random opponents proportionally to the weights. Exactly what team comes out on top depends on exactly how you do the weight-nudging.
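(As an illustration, here is a minimal sketch of that multiplicative-weights loop; this is my code, not gjm's. The win_prob stand-in just returns 0.5; in practice it would wrap the fitted model, and the particular nudging rule below is one arbitrary choice among many.)

```python
import numpy as np
from itertools import combinations

teams = list(combinations(range(19), 5))  # all 11,628 possible teams, by character index

def win_prob(a, b):
    # Stand-in: should return the model's P(team a beats team b).
    return 0.5

w = np.ones(len(teams))  # every team starts with weight 1
eta = 0.1                # step size for the weight nudges
rng = np.random.default_rng(0)

for _ in range(200):  # rounds
    # Each team plays one opponent per round, opponents drawn proportionally to weight.
    opponents = rng.choice(len(teams), size=len(teams), p=w / w.sum())
    for i, j in enumerate(opponents):
        p = win_prob(teams[i], teams[j])
        w[i] *= (1 + eta) ** (2 * p - 1)  # nudge up for wins, down for losses

print(teams[int(np.argmax(w))])  # the heaviest-weighted team
```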
↑ comment by Alumium · 2021-10-03T17:03:13.152Z · LW(p) · GW(p)
Man, it’s hard to say “check your work” without being rude. I understood about half of what you said, so your analysis was almost certainly better than my very basic one. But according to my basic analysis you’ve submitted an elementally imbalanced team which manages to also be weak. Um, no offense. Sorry. It’s just I’d want someone to correct me if I looked wrong.
↑ comment by gjm · 2021-10-03T18:59:17.510Z · LW(p) · GW(p)
No offence taken. I did say there's an excellent chance that I've made boneheaded errors, though, and I stand by that :-). I may or may not find time before the deadline to look over what I did and see if there are any errors so boneheaded that even I can find them[1].
[1] I don't mean that I'm exceptionally stupid; the point is that one's own errors are usually extra-hard to spot.
comment by Maxwell Peterson (maxwell-peterson) · 2021-10-04T00:06:51.909Z · LW(p) · GW(p)
I had a great time analyzing this! Thanks for making it. I wrote up a post with my analysis: https://www.lesswrong.com/posts/tWQF3LBkaQDK8aqcG/an-analysis-of-the-less-wrong-d-and-d-sci-4th-edition-game
comment by SarahSrinivasan (GuySrinivasan) · 2021-10-02T05:22:21.298Z · LW(p) · GW(p)
For my upcoming Cloud Liquid Gaming team, I choose:
Blaze Boy
Greenery Giant
Maelstrom Mage
Phoenix Paladin
Rock-n-Roll Ranger
I'm not super happy about it, because I know nothing of how the matchups actually work. This is just a product of a kinda ad-hoc value function. How well do individuals do? Multiply. How well do teams do, relative to the product of their members? How well do individuals on a team do against the known opponents, pairwise? Okay try all teams, combine their "vs opponents" scores with "team coherence" scores, pick the max.
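(My best-guess reconstruction of that value function, as a hedged sketch: the helper tables, their names, and the exact form of the "team coherence" term are all my assumptions, not GuySrinivasan's code.)

```python
from itertools import combinations
import numpy as np

ENEMY = ["Dire Druid", "Greenery Giant", "Phoenix Paladin",
         "Quartz Questant", "Tidehollow Tyrant"]

def value(team, solo_wr, pair_wr, vs_wr):
    """solo_wr[c]: c's overall win rate (assumed precomputed from the game log).
    pair_wr[frozenset((a, b))]: win rate of teams containing both a and b.
    vs_wr[(c, e)]: win rate of teams with c against teams with e."""
    # "How well do individuals do? Multiply."
    base = np.prod([solo_wr[c] for c in team])
    # "How well do teams do, relative to the product of their members?"
    coherence = np.mean([pair_wr[frozenset((a, b))] / (solo_wr[a] * solo_wr[b])
                         for a, b in combinations(team, 2)])
    # "How well do individuals on a team do against the known opponents, pairwise?"
    vs_enemy = np.mean([vs_wr[(c, e)] for c in team for e in ENEMY])
    return base * coherence * vs_enemy

# Usage, once the three tables are computed from the game log (CHARS = all 19 names):
# best = max(combinations(CHARS, 5), key=lambda t: value(t, solo_wr, pair_wr, vs_wr))
```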
For my PvP team, I choose to PM our benevolent creator.
comment by Measure · 2021-10-02T12:12:57.544Z · LW(p) · GW(p)
As abstractapplic noticed, there are three elemental groups of six champions (earth, fire, and water) plus the nightmare, and mixed team comps do much better than something like 5x fire. My guess is that the group of strongest individual champions just happens to be fairly balanced between the elemental groups and that the elemental advantages aren't strong enough to make a big difference vs just picking the strongest champions (I hadn't even noticed the elemental groups when I picked my teams, and I still picked a balanced team with basically the same comp as everyone else).
comment by Pattern · 2021-09-30T02:12:10.818Z · LW(p) · GW(p)
If you wanted to make it harder, you could say that right before the start of the game, the enemy team may 'ban' X characters from the list.* So you should rank enough teams as choices that:
a) you'll be able to pick a team, whatever is banned; and
b) that team will do well together.
*The list you were given for the enemy team is what they'll pick after your team issues its ban (they know what they'll ban in advance).
comment by Maxwell Peterson (maxwell-peterson) · 2021-10-05T02:28:11.524Z · LW(p) · GW(p)
Was anyone able to use Bayesian probability theory on this problem directly to do things besides 1-v-1 hero matchups? If so, what was your approach? I couldn't figure out how to start.
comment by J Bostock (Jemist) · 2021-10-04T11:14:45.755Z · LW(p) · GW(p)
Using Python I conducted a few different analyses:
Proportion of character wins vs. other characters [table omitted]
Proportion of character wins when paired with other characters [table omitted]
With these I gave each possible team a score: the sum, over its characters, of (the sum over enemy characters of that character's win proportion against them) plus (the sum over its teammates of that character's win proportion when paired with them). The highest-scoring team was:
Rock-n-Roll Ranger, Blaze Boy, Nullifying Nightmare, Greenery Giant, Tidehollow Tyrant
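(A minimal sketch of that scoring rule; the two win-proportion tables correspond to the analyses above, and all names are my own illustration rather than J Bostock's code.)

```python
from itertools import combinations

ENEMY = ["Dire Druid", "Greenery Giant", "Phoenix Paladin",
         "Quartz Questant", "Tidehollow Tyrant"]

def score(team, vs_wr, pair_wr):
    """vs_wr[(c, e)]: proportion of wins for character c against character e.
    pair_wr[frozenset((c, t))]: proportion of wins for c when paired with t."""
    total = 0.0
    for c in team:
        total += sum(vs_wr[(c, e)] for e in ENEMY)
        total += sum(pair_wr[frozenset((c, t))] for t in team if t != c)
    return total

# Usage, once the two tables are computed from the game log (CHARS = all 19 names):
# best = max(combinations(CHARS, 5), key=lambda t: score(t, vs_wr, pair_wr))
```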
This was much more pleasant than using Excel! I think I might try and learn R or some other dedicated statistical language for the next one.
My PvP team (chosen without time to estimate anything about my enemies' teams, so highly likely to get countered) has actually come out the same. There's a good chance something is up with my analysis, or my method is too biased towards synergistic teams.
Tidehollow Tyrant, Rock-n-Roll Ranger, Nullifying Nightmare, Greenery Giant, Blaze Boy
comment by simon · 2021-10-04T08:13:48.925Z · LW(p) · GW(p)
I did some initial analysis and came up with the same team as Measure and Alumium. It's also what abstractapplic was on the fence about (but chose differently). Remarks on the initial analysis:
The water, fire, and earth groups remarked on by abstractapplic are, it seems to me, more easily noticed via their (anti-)synergies than via their counters, which seem somewhat hit-and-miss, presumably due to more individual countering effects. The groups are:
Fire: BFIOPV
Water: ACMSTW
Earth: DEGLQR
and Nullifying Nightmare is separate from each.
The enemy team is, recall, DGPQT.
My initial pick of BGNRT was based on green-side counters (imagining the enemy on the blue team): GNT are generically strong; R counters DQT on the enemy team; and B counters D and P and (weakly) Q, is not as bad against G as most heroes, and is not as badly stomped by T as other fire characters are.
The synergies did not seem important enough to affect the team selection particularly much.
Then I saw Maxwell Peterson's post [LW · GW] and downloaded his gradient-boosting R code. I suggest reading his post first. Remarks based on my use of this code:
Maxwell used 350 training steps, based on green and assuming the enemy to be blue. His code outputted AGLNP as the best counter to the target enemy team DGPQT.
My remarks:
The validation is still improving when Maxwell stops training in the code as supplied in that post. So I increased the number of training steps. The risk is that it will just overfit more, but I assume that if that was a big problem it would make the validation worse?
Changing to 1000 training steps results in the code outputting AGLNO as the best counter to the enemy team, with AGLNP as the fourth pick.
The problem as supplied by aphyer does not seem to specify whether we are Green or Blue. So we should probably be prepared for both (and this especially applies for PVP). I also wondered if the apparent differences were due to random chance. So, I created a flipped version of the initial csv file (changing all green characters to blue and vice versa, and also the wins). Maxwell's code run on this flipped version outputted LOPRT as the best counter to DGPQT - quite the change!
I also created a merged csv file from the original and flipped files, and 2 cut csv files from the merged file which each contain some entries from the original and some from the flipped file (but no overlap between each other). Validation scores on the cut csv files were worse than on the flipped and original csv files, confirming that side does matter (unless I misimplemented the flip, which would embarrassingly invalidate all my subsequent analysis...). However, even though side does matter, since we don't know what side we are on, I figure we are mainly interested in averaged data, so I used the merged file anyway. I also increased the number of steps to 1500 for the merged model and doubled the cutoff indices for the validation (because 2x the data).
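(The flip itself is mechanical; here is a pandas sketch under the same hypothetical column layout used elsewhere in this thread, not simon's actual code.)

```python
import pandas as pd

games = pd.read_csv("league_games.csv")  # hypothetical layout: green1..5, blue1..5, winner

# Swap the green and blue character columns, then flip the recorded winner.
swap = {f"green{i}": f"blue{i}" for i in range(1, 6)}
swap.update({f"blue{i}": f"green{i}" for i in range(1, 6)})

flipped = games.rename(columns=swap)
flipped["winner"] = flipped["winner"].map({"green": "blue", "blue": "green"})

merged = pd.concat([games, flipped], ignore_index=True)  # symmetrized, 2x the rows
merged.to_csv("league_games_merged.csv", index=False)
```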
The code now outputs BGLPR as the best counter to DGPQT. So, that's my choice (unless I change it later). It also happens to be one off from GuySrinivasan's choice (who picked M instead of L).
For PVP, against the average team the code now recommends BGNOT. However, I'd like to modify it to be more specialized against likely opponent teams. This will take some time, possibly more than I can spare, since I have not used R before. If I don't have time to adjust it, I will go with BGNOT.
edit: now switching to ABGOT as the PVP team. According to the code this counters my above choice, as well as gjm's pick of BGNPT (sorry gjm). Also seems to do OK against others who have published PVP teams (a good reason not to publish them). Attempts to get a better value estimator for PVP teams and apply it to all possible choices have thus far been thwarted by my unfamiliarity with R.
Current choices:
BGLPR for main answer
ABGOT as PVP team
↑ comment by Maxwell Peterson (maxwell-peterson) · 2021-10-04T16:51:33.215Z · LW(p) · GW(p)
Woah! The effect of flipping is disturbing. Interesting find.
↑ comment by simon · 2021-10-06T01:13:27.712Z · LW(p) · GW(p)
Post-mortem on my thinking about the sides being asymmetrical:
In order to determine whether there was symmetry, I applied the model to the following datasets and compared the validation scores:
1. The original data set
2. A flipped version of the data set
3. A "cut" version of the data set where some of the data points were flipped and others not (this should contain all games in one form or the other, with no game appearing in both forms)
4. The complement of the above (each game flipped iff it was not flipped in set 3)
On finding that 1 and 2 had better validation scores than 3 and 4, and the gap between 1/2 and 3/4 was larger (but really not all that much larger!) than the gap between 1 and 2 or 3 and 4, I declared that there was asymmetry.
But really this was totally invalid, because 1 and 2 are isomorphic to each other under a column swap and bit flip (as pointed out by Maxwell), and while this transformation may affect the results, it should not affect validation scores if the algorithm is unbiased, up to random variation if it has random elements (I don't know if it does, but the validation scores were not actually identical). Likewise, 3 and 4 should have the same validation scores. On the other hand, 1/2 are not isomorphic to 3/4 under such a transformation, and so have no need to have the same validation scores. So there was a 50% chance of the observed result happening by chance.
Even if my method would have worked the way I had been thinking**, it would be a pretty weak* test. So why was I so willing to believe it? Well, in my previous analysis I had noticed differences between the sides, which might or might not be random, particularly the winrate for games with Greenery Giant on both sides. In such matchups, green wins 1192, (ironically) much less than blue's 1308. This is not at all unlikely (less than 2 sigma, which I didn't check, and there are many other possible hypotheses), but this plus the knowledge that League of Legends' map is, while almost symmetrical, not perfectly so, led me to have a too-weak prior against asymmetry going into poking Maxwell's magic box.
Regardless of this mistake, I do think that my choice to create and use a merged data set including the original data and the flipped data was correct. Given that we either don't care about asymmetries or don't believe they exist, the ideal thing to do would be to add some kind of constraint to the learning algorithm to respect the assumed symmetry, but this is an easier alternative.
*Edit: in the sense of providing weak Bayesian evidence due to high false positive rate
**Edit: which I could have made happen by comparing results from disjoint subsets of the data in each of 1 and 2, etc.
↑ comment by Maxwell Peterson (maxwell-peterson) · 2021-10-04T18:04:24.836Z · LW(p) · GW(p)
Actually, I’ve thought about it more, and I don’t think it’s possible for a flip to change the predictor like this. Flipping like this is equivalent to swapping the first 19 columns of the matrices with the latter 19 columns, and bit-flipping the response. This should end up giving prediction vectors v_flip such that 1 - v_flip = v_original. So my money is currently on something being off with the code that flips.
↑ comment by simon · 2021-10-04T23:30:41.253Z · LW(p) · GW(p)
I don't understand the details of how your code works very well, but wouldn't 1-v_flip = v_original be what you would get with just flipping the response without swapping columns?
Also, I spot-checked the flipped csv file and didn't see any discrepancies.
↑ comment by Maxwell Peterson (maxwell-peterson) · 2021-10-05T00:40:38.676Z · LW(p) · GW(p)
Yup, it would be - I guess I’m saying that I don’t think swapping the columns has any effect. To the model, during training, it is just 38 unnamed columns. Swapping the first 19 with the last shouldn’t do anything? Weird weird weird
comment by Alumium · 2021-10-02T06:54:09.630Z · LW(p) · GW(p)
Never tried one of these before. I have no idea what I'm doing so we'll see how it goes!
I swear I did the work on my own, but I am producing exactly the same team as another player. I guess that's a good sign?
I've edited my teams based on newer analysis. If this is cheating, just use the old ones!
Regular team:
~~Blaze Boy, Greenery Giant, Nullifying Nightmare, Rock-n-Roll Ranger, Tidehollow Tyrant~~
Blaze Boy, Greenery Giant, Landslide Lord, Rock-n-Roll Ranger, Phoenix Paladin
How I got it:
I looked at each champion's win-rate against every other champion. Then I took the 5 champions on their team and looked at which champions had the highest mean win-rate against all champions on the opponent's team.
Coincidentally, there are exactly 5 champions that have a >50% winrate by this criterion. So I'm bringing all 5 of them as my team.
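(A sketch of this counter-picking rule, assuming a precomputed table of champion-vs-champion win rates; all names are illustrative, not Alumium's code. The commented loop at the end is the iteration described under the PvP team below.)

```python
import numpy as np

ENEMY = ["Dire Druid", "Greenery Giant", "Phoenix Paladin",
         "Quartz Questant", "Tidehollow Tyrant"]

def counter_team(wr, chars, enemy, k=5):
    """wr[(c, e)]: win rate of teams containing c against teams containing e.
    Returns the k characters with the highest mean win rate vs the enemy team."""
    mean_vs = {c: np.mean([wr[(c, e)] for e in enemy]) for c in chars}
    return sorted(chars, key=mean_vs.get, reverse=True)[:k]

# Usage, once wr is computed from the game log (CHARS = all 19 characters):
# team = counter_team(wr, CHARS, ENEMY)
# Iterating the rule looks for a self-countering fixed point:
# for _ in range(10):
#     team = counter_team(wr, CHARS, enemy=team)
```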
PvP team:
~~The same as my regular team, embarrassingly enough.~~
Greenery Giant, Rock-n-Roll Ranger, Landslide Lord, Blaze Boy, Tidehollow Tyrant
How I got it:
I decided to iterate the process I used to beat the opponent's team. And discovered that the 5 best champions against that team were, in fact, that team.
I feel like this shouldn't happen. Maybe I'll look more at this tomorrow and change my mind.
↑ comment by Alumium · 2021-10-03T03:54:55.626Z · LW(p) · GW(p)
Some kind of rock-paper-scissors does seem likely.
5 watery fighters as an input team causes my program to output a team that loses Blaze Boy in favor of Quartz Questant. 5 earthy fighters loses Tidehollow Tyrant in favor of Phoenix Paladin. 5 fiery fighters loses Blaze Boy in favor of Warrior of Winter.
That last one is strange. I'd expect to lose Greenery Giant there. I guess he's just that strong? Or fire is kind of bad? (Or, I have been cunningly tricked because I am bad at this).
The NPC team looks like 3 earth 1 fire 1 water. So normally you'd want more fire to deal with this. But fire is bad, so instead we bring in two strong neutral(?) fighters, the Nightmare, which just seems awesome, and the Rock'n'Roll Ranger, which is still pretty good.
Balanced teams seem important, so cutting any of our 3 strong elemental heroes seems like a bad idea. It's possible the weaker untyped hero should be swapped out for a strong fiery hero; that is, Rock'n'Roll Ranger swapped out for Phoenix Paladin, which is... also a team someone has already submitted.
I think I've made a mistake guessing elements. Because I count 6 water, 5 earth, 5 fire, and 3 neutral. I'd expect 5,5,5,4. Maybe it's 6,6,6,1?
Oil Ooze is what, pollution? Poison? Could be earth? I guess? I'm picturing Hexxus from Ferngully, honestly.
Rock'n'Roll Ranger - um, sonic?, although I guess rangers are nature associated so it could be earth again (maybe)?
Nullifying Nightmare - uh, dreams? If there is only one neutral fighter, it's probably this one.
Man, maybe one of the watery fighters isn't water? Probably the Siren Sorceress as another sonic?
Or maybe the elements are just uneven and I'm reading way too much into this.
↑ comment by Alumium · 2021-10-04T16:15:07.349Z · LW(p) · GW(p)
Okay, I refined my approach somewhat.
I decided to search by pairs or triads of players in the opposing team, rather than just individual fighters (because that might have me selecting the overall strongest team, not a tailored one).
Using pairs, the first-level solution is in fact the same (Blaze Boy, Greenery Giant, Nullifying Nightmare, Rock-n-Roll Ranger, Tidehollow Tyrant). However, when I iterate this, I find something odd.
Nightmare is replaced by Landslide Lord.
Now, this is odd, since Nightmare is great, but a possible solution is revealed by @Maxwell Peterson's awesome analysis. Against specific (strong) characters, the Nightmare falls off. The NPC team includes only 2 or 3 of these, so Nightmare is still good, but my anti-NPC team contains 4-ish, so the Nightmare is countered.
I realize this is rather unsporting, but if possible I'd like to change my PvP team to the new combination, as it seems likely that most PvP teams will be fairly strong.
When I attempted to search by triads, I somehow emerged with the impossible result that the only winning fighter was Warrior of Winter. ~~Still searching for the bug.~~ Caught the mistake.
By triads, my first level solution loses Nightmare for Landslide Lord, and Tidehollow Tyrant for Phoenix Paladin. I don't know how much I trust this, since elemental balance has seemed important, but someone already submitted the team I was using so let's be different!
My new chosen team to beat the NPCs is Greenery Giant, Rock-n-Roll Ranger, Blaze Boy, Phoenix Paladin, Landslide Lord.
Iterating this is much more interesting. We bounce around between elementally balanced teams (Tidehollow Tyrant, Greenery Giant, Landslide Lord, Blaze Boy, Arch-Alligator), and teams with nothing but 5 earth-types. Nightmare is nowhere to be seen, suggesting that cutting it from PvP is almost certainly correct.
Reassuringly, the end result of this iteration is the same as that of the pair search above, so I'm confident this is a good PvP team. Amusingly, I think the earth-focused elementally balanced team the NPCs went with is the correct PvP choice; they just didn't get it quite right.