D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset

aphyer

post by aphyer · 2024-05-14T03:35:10.586Z · LW · GW · 3 comments

This is a follow-up to last week's D&D.Sci scenario [LW · GW]: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

Each alien has a different amount of HP:

Alien	HP	Threat*
Swarming Scarab	1	1
Chitinous Crawler	3	2
Voracious Venompede	5	3
Arachnoid Abomination	9	4
Towering Tyrant	15	5

*Threat has no direct effect on combat: it measures how threatening Earth considers each alien to be, which scales how many soldiers Earth sends. (The war has been getting worse: early on, Earth sent on average ~1 soldier per 4 Threat of aliens, but today it's more like 1 soldier per 6 Threat. The wave you're facing has 41 Threat, so Earth would send ~7 soldiers to it on average. Earth doesn't exercise much selection with weapons, but sends soldiers in pairs such that each pair has two different weapons, which is a slight bias towards diversity.)

Each weapon has a damage it deals per shot, and a rate of fire that determines how many shots it can get off before the wielder is perforated by venomous spines/dissolved into a puddle of goo/voraciously devoured by a ravenous toothed maw:

Weapon	Damage	Min Shots	Max Shots
Macross Minigun	1	5	8
Fusion Flamethrower	1	3	12
Pulse Phaser	2	4	6
Rail Rifle	3	3	5
Laser Lance	5	2	5
Gluon Grenades	7	2	3
Thermo-Torpedos	13	1	3
Antimatter Artillery	20	1	2

Each soldier will be able to fire a number of shots chosen uniformly at random between Min Shots and Max Shots (inclusive) - for example, a soldier with a Laser Lance will have time to fire 1d4+1 shots, each doing 5 damage.

During a battle, humans roll for how many shots each weapon gets, and then attempt to allocate damage from their shots to bring down all aliens. If they succeed, the humans win - if not, the humans lose. While doing this optimally is theoretically very difficult, your soldiers are well-trained and the battles are not all that large, so your soldiers will reliably find a solution if one exists.

For example, if you are fighting two Towering Tyrants and two Swarming Scarabs using two soldiers:

If you bring one soldier with Antimatter Artillery and one with a Macross Minigun, the Minigun soldier will reliably kill the Scarabs and have 3-6 shots left over (not enough to kill a Tyrant). The Artillery soldier will get either 1 or 2 shots: half the time they will roll a 2, kill both Tyrants and you will win, while the other half they will roll a 1, a Tyrant will survive and you will lose.
You can do a little better by bringing one soldier with Antimatter Artillery and one with a Laser Lance. The Laser Lance rolls 2-5 shots - it will always kill both Scarabs, and 1/4 of the time it will roll 5 shots and also be able to kill a Tyrant (at which point you'll win even if the Antimatter Artillery rolls a 1), giving you a 5/8 winrate overall.
You can do better still by bringing one soldier with Thermo-Torpedos and one with a Pulse Phaser. The Phaser soldier gets at least 4 shots, with which they kill both Scarabs and do 2 damage to each Tyrant (dropping the Tyrants both to 13 HP). And the Torpedo soldier gets 1-3 shots, with a 2/3 chance of being able to kill both Tyrants now that they've been softened up. I believe this is the best winrate you can get in this example.
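
The three winrates in this example can be checked by brute force. The following is a minimal sketch, not the actual generation code: it assumes shot counts are rolled uniformly (as the 1d4+1 example suggests), the shortened weapon and alien names are my own labels, and `can_win` and `winrate` are hypothetical helpers. It enumerates every possible roll and searches exhaustively for a way to split the shots across the aliens.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

# Stats copied from the tables above; the short keys are illustrative labels.
ALIEN_HP = {"Scarab": 1, "Crawler": 3, "Venompede": 5, "Abomination": 9, "Tyrant": 15}
WEAPONS = {  # name: (damage per shot, min shots, max shots)
    "Minigun": (1, 5, 8), "Flamethrower": (1, 3, 12), "Phaser": (2, 4, 6),
    "Rifle": (3, 3, 5), "Lance": (5, 2, 5), "Grenades": (7, 2, 3),
    "Torpedos": (13, 1, 3), "Artillery": (20, 1, 2),
}


def can_win(alien_hps, shots):
    """True if the shots (dict: damage -> count) can be split to kill every alien."""
    damages = tuple(sorted(shots, reverse=True))

    @lru_cache(maxsize=None)
    def solve(hps, counts):
        if not hps:
            return True  # every alien is dead
        rest = hps[1:]

        def spend(i, hp_left, counts_left):
            # Decide how many shots of damages[i] to put into the current alien.
            if hp_left <= 0:
                return solve(rest, counts_left)
            if i == len(damages):
                return False
            # Never use more shots of one weapon than needed to finish the alien.
            most = min(counts_left[i], -(-hp_left // damages[i]))
            for k in range(most, -1, -1):
                nxt = counts_left[:i] + (counts_left[i] - k,) + counts_left[i + 1:]
                if spend(i + 1, hp_left - k * damages[i], nxt):
                    return True
            return False

        return spend(0, hps[0], counts)

    start = tuple(shots[d] for d in damages)
    return solve(tuple(sorted(alien_hps, reverse=True)), start)


def winrate(squad, aliens):
    """Exact winrate, assuming each soldier's shot count is rolled uniformly."""
    hps = [ALIEN_HP[a] for a in aliens]
    ranges = [range(WEAPONS[w][1], WEAPONS[w][2] + 1) for w in squad]
    wins = total = 0
    for rolls in product(*ranges):
        shots = {}
        for weapon, n in zip(squad, rolls):
            damage = WEAPONS[weapon][0]
            shots[damage] = shots.get(damage, 0) + n
        wins += can_win(hps, shots)
        total += 1
    return Fraction(wins, total)


example = ["Tyrant", "Tyrant", "Scarab", "Scarab"]
for squad in (["Artillery", "Minigun"], ["Artillery", "Lance"], ["Torpedos", "Phaser"]):
    print(squad, winrate(squad, example))
```

Under those assumptions the three squads come out at 1/2, 5/8 and 2/3, matching the figures above. The exhaustive search is only practical because battles are small, which mirrors the note that optimal allocation is hard in general.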

STRATEGY

The most important element of strategy was sending the right kind of weapons for each alien: high-health aliens like Tyrants are extremely inefficient to kill with light weapons like Miniguns, while small, numerous aliens like Scarabs are extremely inefficient to kill with heavy weapons like artillery.

There were a few subtler elements of strategy:

Some weapons are higher/lower variance than others.
- A Flamethrower fires more shots on average than a Minigun, but with higher randomness that leads to it sometimes doing very poorly.
- Gluon Grenades and Pulse Phasers are subpar weapons on average, but have an extremely low variance (if all weapons roll minimum shots, Gluon Grenades deal 14 total damage, outperforming both Laser Lances at 10 and Thermo-Torpedos at 13).
- This was valuable to understand in connection with different numbers of soldiers: at high numbers of soldiers, with near-guaranteed wins, it was valuable to use low-variance weapons to reduce risk even further, while at lower numbers of soldiers the average performance mattered more (and at very low numbers where winrates were below 50% the variance could be actively good).
Some aliens could be killed by combinations of weapons more efficiently than by either weapon in isolation:
- For example, Thermo-Torpedos are only okay against Tyrants in a pure 1v1 despite being the second-heaviest weapon available, requiring two of their 13-damage hits to kill one 15-HP Tyrant. A single soldier with Thermo-Torpedos has a 2/3 chance to beat one Tyrant, while a single soldier with a Laser Lance has a 3/4 chance.
- However, if you have spare shots from some lighter weapon, Thermo-Torpedos become much better against Tyrants, able to point one 13-damage hit at a Tyrant and then leave it for a smaller shot to finish off.
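
The variance and 1v1 claims above follow directly from the weapon table. Here is a small sketch, assuming uniform shot rolls; `shot_stats` and `solo_kill_chance` are hypothetical helper names and the short weapon keys are my own labels.

```python
from fractions import Fraction

WEAPONS = {  # name: (damage per shot, min shots, max shots), from the table above
    "Minigun": (1, 5, 8), "Flamethrower": (1, 3, 12), "Phaser": (2, 4, 6),
    "Rifle": (3, 3, 5), "Lance": (5, 2, 5), "Grenades": (7, 2, 3),
    "Torpedos": (13, 1, 3), "Artillery": (20, 1, 2),
}


def shot_stats(weapon):
    """Mean and variance of the shot count, plus damage dealt on a minimum roll."""
    damage, lo, hi = WEAPONS[weapon]
    n = hi - lo + 1  # number of equally likely outcomes
    return {
        "mean_shots": Fraction(lo + hi, 2),
        "var_shots": Fraction(n * n - 1, 12),  # variance of a discrete uniform roll
        "min_damage": damage * lo,
    }


def solo_kill_chance(weapon, hp):
    """Chance that one soldier alone rolls enough shots to kill one alien."""
    damage, lo, hi = WEAPONS[weapon]
    needed = -(-hp // damage)  # ceiling division
    favourable = max(0, hi - max(lo, needed) + 1)
    return Fraction(favourable, hi - lo + 1)


for name in WEAPONS:
    print(name, shot_stats(name), solo_kill_chance(name, 15))
```

With these numbers the Flamethrower averages 7.5 shots against the Minigun's 6.5 but with a much larger variance, and a lone soldier beats a 15-HP Tyrant with probability 2/3 using Thermo-Torpedos versus 3/4 using a Laser Lance.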

Optimal play used different weapons based on how many soldiers you brought:

It was possible to guarantee victory using 8 soldiers. To 100%-guarantee victory, you needed to assume every weapon would fire its minimal number of shots, at which point you could win by bringing exactly:
- 3 Antimatter Artillery, firing 3 times and killing the three Tyrants.
- 2 Gluon Grenades, firing 4 times to kill the Venompede and reduce the three Abominations to 2HP.
- 1 Pulse Phaser, firing 4 times to finish off the 3 Abominations and kill one Scarab.
- 1 Rail Rifle, firing 3 times to kill the 2 Crawlers and one Scarab.
- 1 Macross Minigun, firing 5 times to clear out the remaining Scarabs.
At 7 soldiers, it's no longer possible to completely guarantee success, and we select weapons based more on average performance and less on minimum performance, swapping out the Gluon Grenades and Pulse Phaser (which were mostly helping us just by guaranteeing decent performance rather than by being good overall) and bringing in a Laser Lance and Thermo-Torpedos (very powerful weapons, but with more variance). This squad manages the best possible 95.4% winrate:
- 2 Antimatter Artillery.
- 2 Thermo-Torpedos.
- 1 Laser Lance.
- 1 Rail Rifle.
- 1 Macross Minigun.
At 6 soldiers, as our odds get worse, we swap the Minigun for the Flamethrower to clear out Scarabs. The Flamethrower is higher-variance but fires more shots on average, which makes it an unnecessary risk when our winrate is high but worth taking when our odds aren't near-perfect (it goes up from a 50% chance to a 60% chance of 'killing 7 Scarabs with no help from other weapons', and is sometimes even able to finish off other aliens). This squad manages the best possible 69.7% winrate:
- 2 Antimatter Artillery.
- 1 Thermo-Torpedos.
- 1 Laser Lance.
- 1 Rail Rifle.
- 1 Fusion Flamethrower.
At 5 soldiers, our odds continue getting worse, down to a maximum 27.9%:
- 2 Antimatter Artillery.
- 1 Thermo-Torpedos.
- 1 Laser Lance.
- 1 Fusion Flamethrower.
And at 4 soldiers things are worse still - almost all teams are mathematically guaranteed to lose, and the best we can do is a 2.1% winrate with:
- 1 Antimatter Artillery.
- 1 Thermo-Torpedos.
- 1 Laser Lance.
- 1 Fusion Flamethrower.
With 3 or fewer soldiers, there is no squad with any non-zero winrate.
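
Two of the numbers above can be sanity-checked with simple arithmetic. The sketch below assumes uniform shot rolls and a wave of 3 Tyrants, 3 Abominations, 1 Venompede, 2 Crawlers and 7 Scarabs; that composition is inferred from the 8-soldier allocation and the 41 Threat total rather than stated outright here. `p_at_least` and `worst_case_damage` are hypothetical helpers.

```python
from fractions import Fraction

WEAPONS = {  # name: (damage per shot, min shots, max shots), from the table above
    "Minigun": (1, 5, 8), "Flamethrower": (1, 3, 12), "Phaser": (2, 4, 6),
    "Rifle": (3, 3, 5), "Lance": (5, 2, 5), "Grenades": (7, 2, 3),
    "Torpedos": (13, 1, 3), "Artillery": (20, 1, 2),
}
MAX_ALIEN_HP = 15  # a Towering Tyrant; no single shot can usefully deal more
WAVE_HP = [15] * 3 + [9] * 3 + [5] + [3] * 2 + [1] * 7  # inferred wave, 90 HP total


def p_at_least(weapon, n):
    """Chance that a weapon rolls at least n shots."""
    _, lo, hi = WEAPONS[weapon]
    favourable = max(0, hi - max(lo, n) + 1)
    return Fraction(favourable, hi - lo + 1)


def worst_case_damage(squad, cap=MAX_ALIEN_HP):
    """Upper bound on useful damage when every soldier rolls minimum shots.

    Each shot is capped at the largest alien's HP, since overkill is wasted.
    """
    return sum(min(WEAPONS[w][0], cap) * WEAPONS[w][1] for w in squad)


eight = ["Artillery"] * 3 + ["Grenades"] * 2 + ["Phaser", "Rifle", "Minigun"]
seven = ["Artillery"] * 2 + ["Torpedos"] * 2 + ["Lance", "Rifle", "Minigun"]

print(p_at_least("Minigun", 7), p_at_least("Flamethrower", 7))
print(sum(WAVE_HP), worst_case_damage(eight), worst_case_damage(seven))
```

The damage budget is only a necessary condition, not a proof that an allocation exists. It is still enough to show why the optimal 7-soldier squad cannot be guaranteed: on minimum rolls its useful damage is at most 80 against 90 HP of aliens, while the 8-soldier squad has 95 to work with.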

Random play would result in a lower winrate:

Soldiers	Random Winrate	Optimal Winrate
4	0.02%	2.08%
5	2.68%	27.92%
6	21.38%	69.65%
7	54.45%	95.43%
8	80.26%	100.00%
9*	92.74%	100.00%
10*	97.57%	100.00%

*Earth is unlikely to send this many soldiers with the war as it currently stands.

At current troop levels, Earth would usually send 6-8 soldiers to this mission, and almost always send 5-9:

Soldiers	Chance to Send
5	6.5%
6	33.0%
7	35.8%
8	17.4%
9	6.3%
10	<0.1%

The interactive will be evaluating your submissions accordingly.

LEADERBOARD

Submissions were:

abstractapplic submitted a 7-soldier squad with 3 Artillery, 1 Torpedo, 2 Lances, and 1 Minigun.

This was not quite the optimal submission, but was fairly well-optimized and ended up very close, with a winrate of 94.40% (compared to 95.43% with the optimal 7-soldier squad).

Yonge and Unnamed both used the small battles to figure out what could reliably beat each alien in isolation, and submitted 10-soldier squads with separate soldiers for each alien type:

Yonge: 3 Artillery, 3 Torpedos, 1 Grenades, 1 Lance, 2 Miniguns.
Unnamed: 3 Artillery, 3 Grenades, 1 Lance, 1 Rifle, 2 Miniguns.

These teams both manage a 100% winrate (albeit while sending a larger squad than the PGFDA would usually send).

Measure said this:

According to my model, for larger numbers of soldiers, you don't need a specific anti-Scarab weapon. It's slightly more important to make sure you have a good matchup against the Tyrants.

and boldly submitted a 6-soldier squad with 4 Artillery and 2 Lances.

Unfortunately for Measure, Scarabs were not in fact a non-threat, and while this squad has excellent handling of Tyrants and Abominations, plus some Lances for the Crawlers and Venompedes, it doesn't have anything that can deal efficiently with the Scarabs, and ends up with only a 9.38% winrate. My condolences to Measure.

qwertyasdef submitted a 7-soldier squad armed with 7 Thermo-Torpedos. (Apparently this was the result of a model that assumed any given weapon added some amount to your winrate based on which aliens were present, but didn't include parameters for other weapons you already had.)

Sadly, this again ends up with a lot of large weapons for the bigger aliens but nothing to handle Scarabs, and has only a 1.65% winrate.

REFLECTION & FEEDBACK REQUEST

One thing I was trying to push with this scenario was 'look at the simple cases first to investigate the underlying world/ruleset, and after that try to apply what you've learned to complicated cases like the one at hand'. I deliberately didn't impose any minimum size on battles, to ensure that there were a lot of extremely simple 'one soldier and one Tyrant'-type battles from which players could try to derive rules, rather than just immediately looking for 'battles that look like the one we're headed into' to try to find what did well there.

I'm reasonably happy with how this part went - I saw several players explicitly analyzing the simple cases, and didn't see anyone doing the 'jump straight to this battle' approach. On the other hand, a couple players seemed to end up tricked by one thing or another into submitting worse-than-random teams. How did this feel from the player side?

As usual, I'm also interested to hear more general feedback on what people thought of this scenario. If you played it, what did you like and what did you not like? If you might have played it but decided not to, what drove you away? What would you like to see more of/less of in future? Do you think the scenario was too complicated to decipher? Too simple to have anything interesting/realistic to uncover? Or both at once? Do you have any other feedback?

3 comments

comment by abstractapplic · 2024-05-14T08:45:24.072Z · LW(p) · GW(p)

Reflections on my performance:

I'm pleasantly surprised by the effectiveness of my reasoning, and of my meta-reasoning. Not only did my loadout do well, but my calibration was impressively close: the final decision I pegged at a "~95%" success rate got 94.4%, and most of the alternative strategies I mentioned in my post [LW(p) · GW(p)] were similarly on-the-nose.

(Unfortunately, my meta-meta-reasoning could still use some work. I figured out that this was a "linear-ish logistic success model with some interactions on top" kind of problem, took this as an opportunity to test that library I made [LW · GW], created a good predictor with a bunch of pretty/informative graphs . . . and then found myself thinking "only need one minigun? doesn't sound right to me", "why would Tyrants/Artillery and Scarabs/Minigun-or-Flamethrowers be so much stronger than every other potential feature interaction?", and "I'm totally gonna turn out to have screwed up and wish I'd handled this with XGBoost, better not even mention how I built my model". If I'd been more calibrated about how calibrated I ended up being, this could have been a really good chance to show off by calling in advance that my unconventional ML approach would succeed here.)

Reflections on the challenge:

This was the 2D performance thing I tried to pull off in Boojumologist [LW · GW], but with better conceptual underpinning and flawless execution. I'm proud, gladdened and envious: can't think of a single way to improve this scenario.

(I, uh, may be biased by how well I happened to do: please take this feedback with a grain of salt.)

comment by MathiasKB (MathiasKirkBonde) · 2024-05-14T20:59:20.891Z · LW(p) · GW(p)

Just played through it tonight. This was my first D&D.Sci, found it quite difficult and learned a few things while working on it.

Initially I tried to figure out the best counters and found a few patterns (flamethrowers were especially good against certain units). I then tried to look and adjust for any chronology, but after tinkering around for a while without getting anywhere I gave up on that. Eventually I just went with a pretty brainless ML approach.

I ended up sending squads for 5 and 6 which managed a 13.89% and 53.15% chance of surviving. I think it's good I'm not in charge of any soldiers in real life!

Overall I had good fun, and I'm looking forward to looking at the next one.

comment by qwertyasdef · 2024-05-14T23:20:45.904Z · LW(p) · GW(p)

I'm not surprised my submission did badly since it was the easiest thing I could quickly come up with after seeing that I was already late. I wasn't quite expecting to be unable to come up with anything better though. After looking at other people's comments I'm particularly disappointed that it never once crossed my mind to try analyzing single-soldier combats. I was explicitly trying to figure out the effect of one soldier of each weapon, and I had a histogram of the number of soldiers per combat from which I could have easily gleaned that there were lots of single-soldier combats to investigate had I thought to do so. Instead I tried to analyze the win rates of (some combination of weapons) vs (some combination of weapons) + (1 more of the weapon I'm trying to investigate), ran into trouble with the fact that the extra soldier is also correlated with an increased alien threat, and didn't know how to tease the two effects apart.
