DeepMind on Stratego, an imperfect information game

sanxiyn

post by sanxiyn · 2022-10-24T05:57:39.462Z · LW · GW · 9 comments

This is a link post for https://arxiv.org/abs/2206.15378

Playing an imperfect information game well has been a challenge for AI. DeepMind reports their (IMHO impressive) work on Stratego, an imperfect information game.

9 comments

Comments sorted by top scores.

comment by Shmi (shminux) · 2022-10-24T06:28:03.886Z · LW(p) · GW(p)

I am confused why it stopped at the human level:

DeepNash was tested against top human players for two weeks in early April 2022, yielding 50 ranking matches in which DeepNash won 42%.

instead of self-improving to beat even the best human player every time.

Replies from: paulfchristiano, ChristianKl

↑ comment by paulfchristiano · 2022-10-24T06:46:25.573Z · LW(p) · GW(p)

The quote is:

DeepNash was evaluated against top human players over the course of two weeks in the beginning of April 2022, resulting in 50 ranked matches. Of these matches, 42 (i.e. 84%) were won by DeepNash

Given the game has imperfect information, it's not clear you should expect to be able to win much more than that. (I haven't played much Stratego but I would have guessed that a reasonably strong player going for high-variance strategies could beat God 10-20% of the time.)

Replies from: shminux

↑ comment by Shmi (shminux) · 2022-10-24T07:56:34.918Z · LW(p) · GW(p)

Hmm, so is this one of those games where a novice can beat an expert a significant fraction of the time because of the imperfect information? Is there a theoretical upper limit on the win percentage of a perfect player against the best human player?

Replies from: sanxiyn, paulfchristiano, Dagon

↑ comment by sanxiyn · 2022-10-24T09:01:16.847Z · LW(p) · GW(p)

I am a Stratego player, and the answer is no, not really. In fact, DeepNash won 30/30 (100%) against Probe, which has won the Computer Stratego World Championship three times.

But I think Paul is not wrong. While Stratego is mostly skill not luck (it's not like you are drawing cards and you need good cards, there is zero randomness, just hidden information), there is a bit of rock-paper-scissors involved. Novices can't beat experts, but I do think experts can beat God.

↑ comment by paulfchristiano · 2022-10-24T16:23:17.622Z · LW(p) · GW(p)

My main point was that you quoted 42% when the win rate was 84%.

Even if there's no cap on winrate, I don't think you should necessarily expect to "self-improve to beat the best human players every time." Even in a game of perfect information I think there are 2+ orders of magnitude of scale (or equivalent algorithmic progress) where you will beat human players 60-99% of the time.

So I think it could make sense to be surprised "Isn't Stratego easy enough that AI should be crushing humans?" but it would not make sense to say "Given that AI is able to beat humans at Stratego, why is it not able to crush them every time?"

(Note that humans could potentially do better if they knew they were playing against a much stronger opponent and trying to play for a lucky win.)

↑ comment by Dagon · 2022-10-26T17:57:06.913Z · LW(p) · GW(p)

It doesn't have to be that a novice has a chance against an expert in order for there to be declining returns to further expertise. As an example, rock-scissors-paper-nothing (rock beats scissors and nothing, scissors beats paper and nothing, paper beats rock and nothing) has the "expert" strategy of "randomize, but never choose 'nothing'", which beats the incredible-novice who chooses "nothing" most of the time. Further, there is expertise in noticing patterns among your opponents, while obscuring the patterns that such prediction brings to your plays. But very good AI can probably do better than 50% against human experts, without getting anywhere near 100%.
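The rock-scissors-paper-nothing example can be checked with a few lines of Python. This is only a sketch under assumptions not stated in the comment: the "expert" mixes uniformly over rock, scissors, and paper, the "incredible-novice" plays "nothing" 70% of the time and each other move 10% of the time, and identical moves tie. The names `BEATS`, `outcome_probs`, `expert`, and `novice` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Which moves each move beats, as described in the comment.
BEATS = {
    "rock": {"scissors", "nothing"},
    "scissors": {"paper", "nothing"},
    "paper": {"rock", "nothing"},
    "nothing": set(),
}


def outcome_probs(p1, p2):
    """Return (win, tie, loss) probabilities for player 1.

    p1 and p2 are mixed strategies: dicts mapping a move to its probability.
    """
    win = tie = loss = Fraction(0)
    for (m1, q1), (m2, q2) in product(p1.items(), p2.items()):
        prob = q1 * q2
        if m2 in BEATS[m1]:
            win += prob
        elif m1 in BEATS[m2]:
            loss += prob
        else:
            tie += prob  # identical moves (assumed to tie)
    return win, tie, loss


# Assumed strategies, chosen only for illustration.
expert = {m: Fraction(1, 3) for m in ("rock", "scissors", "paper")}
novice = {
    "nothing": Fraction(7, 10),
    "rock": Fraction(1, 10),
    "scissors": Fraction(1, 10),
    "paper": Fraction(1, 10),
}

print(outcome_probs(expert, novice))  # expert vs. novice
print(outcome_probs(expert, expert))  # expert vs. expert
```

Under these assumptions the expert beats the novice 80% of the time, while two experts each win only a third of their games. That is the declining-returns pattern: the expert crushes the novice, but no amount of further expertise beats another randomizing expert more often than chance.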

84% for Stratego is higher than I'd have predicted.

↑ comment by ChristianKl · 2022-10-28T13:55:41.740Z · LW(p) · GW(p)

From where did you take that quote? I find in the paper:

DeepNash was evaluated against top human players over the course of two weeks in the beginning of April 2022, resulting in 50 ranked matches. Of these matches, 42 (i.e. 84%) were won by DeepNash. In the Classic Stratego challenge ranking 2022 this corresponds to a rating of 1799, which resulted in a 3rd place for DeepNash of all ranked Gravon Stratego players

I expect that you mistakenly added the % after 42.

Replies from: shminux

↑ comment by Shmi (shminux) · 2022-10-28T16:50:14.964Z · LW(p) · GW(p)

I quoted an article mentioning it without checking the source, oops:

https://www.marktechpost.com/2022/07/09/deepmind-ai-researchers-introduce-deepnash-an-autonomous-agent-trained-with-model-free-multiagent-reinforcement-learning-that-learns-to-play-the-game-of-stratego-at-expert-level/

comment by Dirichlet-to-Neumann · 2022-10-26T18:32:05.396Z · LW(p) · GW(p)

It's certainly interesting, although to be honest I'm pretty confident the top human Stratego players are nowhere near the top achievable level for a human player (in contrast with games like chess or StarCraft).
