Simplified Poker
post by Zvi · 2018-06-04T15:50:00.299Z · LW · GW · 17 commentsContents
17 comments
This is intended as a three-part sequence. Part two will go over my strategy. Part three will reveal the results and discuss some implications.
In the same class in which we later played The Darwin Game, we played a less complex game called Simplified Poker. As in The Darwin Game, we were given the rules and asked to submit instructions for a computer program that would play the game, and the professor would then code our programs for us.
The rules of Simplified Poker are as follows:
Game is played with a 3-card deck, with the cards labeled 1, 2 and 3.
Each hand, the players alternate who goes first, each player antes one chip and is dealt one card.
The first player can bet one chip, or check.
If the first player bets, the second player can either call the one chip bet, or fold.
If the first player checks, the second player can either also check, or can bet. If the second player bets, the first player can either call the one chip bet, or fold.
There is at most one bet per hand, as neither player is allowed to raise.
If either player folds, the other wins the pot of 2 chips and takes back their 1 chip bet. Neither card is shown. If neither player folds – either both players check, or there is a bet and a call – then both cards are revealed and the player with the higher card takes all 4 chips.
In the class, all programs would play a round robin with all other programs, with 50 hands per match. Your goal is to maximize the average number of chips won over all rounds – note that how many opponents you beat does not matter, only the number of chips won.
The game is simple. A lot, but far from all, of your decisions are forced. There’s no weird trick, but optimal play still isn’t obvious. I’ll pause here to allow and encourage thinking about what strategy you’d submit.
17 comments
Comments sorted by top scores.
comment by Raemon · 2018-06-15T21:52:35.160Z · LW(p) · GW(p)
Curated, both for providing a clear exercise (this is something the mod team would like to see a lot more of), as well as the for the followup posts that discuss that exercise [LW · GW] and some of the important, broader ramifications [LW · GW].
comment by Dacyn · 2018-06-04T18:03:10.488Z · LW(p) · GW(p)
If neither player folds – either both players check, or there is a bet and a call – then both cards are revealed and the player with the higher card takes all 4 chips.
Presumably you mean to say that if both players check, the player with the higher card takes the 2 chip pot, not 4 chips?
Anyway here is the unique (up to choice of x) Nash equilibrium, if my calculations are correct:
1 tbrf svefg: org jvgu cebonovyvgl k, arire pnyy
2 tbrf svefg: arire org, pnyy jvgu cebonovyvgl k+1/3
3 tbrf svefg: org jvgu cebonovyvgl 3k, nyjnlf pnyy
1 tbrf frpbaq: org jvgu cebonovyvgl 1/3, arire pnyy
2 tbrf frpbaq: arire org, pnyy jvgu cebonovyvgl 1/3
3 tbrf frpbaq: nyjnlf org, nyjnlf pnyy
Replies from: Sniffnoy, Zvi↑ comment by Sniffnoy · 2018-06-05T05:53:16.594Z · LW(p) · GW(p)
Heh, I tried to game-theory it all out and got nonsensical answers (my set of equilibria wasn't convex!), and I wasn't about to redo it, so I'm glad someone managed to determine [what I'm presuming is] the correct Nash equlibirum...
It is a little surprising to me that nyjnlf purpxvat jura fgnegvat jvgu n 2 vf gur bayl sbeprq pubvpr jvgubhg na boivbhf ernfba sbe vg. Gur bgure 6 sbeprq pubvprf ner rnfl gb qrgrezvar jvgubhg univat gb qb zhpu zngu, ohg abg gung bar. Naq gur bgure 5 pubvprf gung qb erdhver qbvat zngu gb qrgrezvar qba'g raq hc nf 1f be 0f.
But yeah, as Zvi says, I guess part of the whole question is how far into donkeyspace you want to go. i would've stuck with Nash, if I could figure it out, but...
Replies from: Dacyn↑ comment by Dacyn · 2018-06-05T09:30:39.731Z · LW(p) · GW(p)
Jryy, bapr lbh xabj gung frpbaq-1 arire pnyyf naq frpbaq-3 nyjnlf orgf naq pnyyf, lbh xabj gung svefg-2: purpx naq pnyy fgevpgyl qbzvangrf svefg-2: org (fvapr vg qbrf gur fnzr ntnvafg 3 naq fyvtugyl orggre ntnvafg 1).
Replies from: Sniffnoy↑ comment by Sniffnoy · 2018-06-08T04:26:05.419Z · LW(p) · GW(p)
Ohg vs nf svefg-2 ntnvafg frpbaq-1 lbh purpx, frpbaq-1 zvtug org naq (fvapr lbh qba'g xabj vg'f 1 engure guna 3) lbh zvtug sbyq va erfcbafr (-1 sbe lbh), zrnavat lbh raq hc qbvat jbefr guna lbh jbhyq unir unq lbh org (+1 sbe lbh). Fb V qba'g frr ubj vg qbzvangrf?
Replies from: Dacyncomment by Vaniver · 2018-06-05T20:30:43.567Z · LW(p) · GW(p)
It seems to me like you have four decisions (bet/check if first, call/fold if second, bet/check if second, and call/fold if first) in three states of knowledge, meaning twelve possible pure agents--with the challenge that the agent might be shifting over time, as not only can they play a mixed strategy of pure agents, but also they can change that mix according to what happens in the game.
It's not obvious to me that, with only fifty rounds, you have enough time to identify exploitable agents and exploit them (especially since it looks like mixed strategies are allowed). Which then leads to the question of whether you should just submit Nash (as Dacyn lays out), something that underperforms when playing Nash but overperforms when playing naive agents (note that there's no population change, so the important distribution is the submitted distribution), or something that tries to adapt.
So what does a naive bot look like? If it's looking at a 1, it'll check or fold. If it's looking at a 3, it'll bet or call. If it's looking at a two, it'll probably flip a coin to play aggressively (betting / calling) or defensively (checking / folding) or just play defensively.
The difficulty of identifying the opponent's strategy suggests to me a two-phase plan, as opposed to a fully reactive one. Start off playing overly aggressively, track how many chips you're up or down, and then switch to playing Nash if down a particular amount. (Perhaps it just always bets and calls? Without thinking of more naive agents, it's not clear what'll exploit them best.)
Replies from: Douglas_Knight, lionhearted↑ comment by Douglas_Knight · 2018-06-17T17:19:26.665Z · LW(p) · GW(p)
FWIW, you aren't playing 50 rounds of one game. You're really playing 25 rounds each of 2 games, so you have half as many parameters to estimate in half as many observations.
↑ comment by lionhearted (Sebastian Marshall) (lionhearted) · 2018-06-06T09:10:34.809Z · LW(p) · GW(p)
>It's not obvious to me that, with only fifty rounds, you have enough time to identify exploitable agents and exploit them
That was the first line of thinking I had too.
If so, you might want to check-open and call down with your 2's early in the game to see if you're going to only be paying 3's, or what frequency they bluff 1's.
Also very important to observe whether they call-down with 2's or toss them in either position, and with what frequency.
On a quick think, the inability to raise means that bluffing (1's) and semi-bluffing (2's) is probably more viable... if the opponent wasn't running any based-on-opposition-adjustment, I wonder to what extent "bet every hand from opening position" might play out. It's certainly not optimal but might do better than expected. Presumably second-position always folds 1's and calls 3's, so you'd wind up with —
ALWAYS BET OPENING HANDS
If each hand mix is around 1/9th of the distribution of hands (it doesn't say how large the deck is)...
1v1: You steal a little more than a half chip than expected, instead of push or fold to bluff. Opponent never calls.
1v2: You steal one chip... 50% of the time? 70%? And lose an extra chip the remainder of the time.
1v3: You lose two chips 100% of the time, instead of losing one chip as expected.
2v1: You get one chip with minimal gained equity (small value in you don't have to face a bluff).
2v2: You steal one chip... 50% of the time? 70%? And you push the remainder of the time.
2v3: You lose two chips 100% of the time, instead of losing one chip as expected.
3v1: They fold, gain one chip as expected.
3v2: They fold... 50% of the time? 70%? You get a second chip the remainder of the time.
3v3: They call, it's a push.
So you'd wind up with...
1v1: +0.5 expected over equity.
1v2: +equity if they fold at 67% or higher, -equity if they fold 66% or less.
1v3: -1 expected over equity.
2v1: Minimally small +equity for not having to make decision against bluff, but not much gained.
2v2: Guaranteed +equity if they ever fold, with their fold rate of 2's being the equity gained rate.
2v3: Slightly less than -1 expected over equity (since you'd check-call sometimes to protect).
3v1: They fold, as expected. Breakeven.
3v2: +equity on whatever their calling rate is.
3v3: Push, as expected. Breakeven
Assuming the deck is large enough that these are all similarly weighted matchups (EX 1v1 or 2v2 isn't much less likely), then you'd get +0.5, ???, -1, +tiny, +medium, -1, 0, +medium, 0 [as compared to expected normal equity].
I think that comes out ahead — I'd speculate that among people who don't sit and run analysis, that they fold 2's to open bets slightly too often. Or, hmm, maybe not. You could do better than this of course... but the fact that I'm even running the analysis to see if "literally bet everything from opening position" is good shows just how much the inability to raise changes the game.
Replies from: Vanivercomment by Joachim Bartosik (joachim-bartosik) · 2018-06-04T17:12:39.684Z · LW(p) · GW(p)
Can my program use history of hands it player against the current opponent? What about history of hands the opponent player against others?
Replies from: Zvicomment by JeremyHussell · 2019-01-26T17:28:42.942Z · LW(p) · GW(p)
8 months late. I'm coming into this cold but having previously read about a very similar competition to create strategies to play Rock-Paper-Scissors (RPS). First, work out all the decision points in the game, and the possible information available at each decision point. We end up with 2 binary decisions for each player, and 3 states of information at each decision point.
So my first strategy is to predict my opponent's decisions, and calculate which of my possible decisions will give me the best result. For RPS this is pretty simple:
P(R), P(P), P(S): probabilities my opponent will play Rock, Paper, and Scissors.
V(R), V(P), V(S): expected score (value) for me playing Rock, Paper, Scissors.
V(R) = (P(R) * 0 + P(P) * -1 + P(S) * 1) / (P(R) + P(P) + P(S))
The calculation on the line above is for the general case. For the specific case of RPS, it simplifies to:
V(R) = P(S) - P(P)
V(P) = P(R) - P(S)
V(S) = P(P) - P(R)
A surprising number of competitors fail to play optimally against their opponent's predicted actions. For example, with P(R) = 0.45, P(P) = 0.16, P(S) = 0.39, many competitors play Paper, even though the best expected value is from playing Rock. (Optimal play exploits unusually low probabilities as well as unusually high probabilities.)
In RPS there are three possible decisions, but in simplified poker all the decision points are binary, so we can use A and !A to represent both probabilities, instead of A, B, and C. I choose to represent betting and calling as direct probabilities, and checking and folding as the complementary probabilities.
A, B, C: player #1 bets with a 1, 2, or 3 respectively
D, E, F: after a check and a bet, player #1 calls with a 1, 2, 3
G, H, I: after a bet, player #2 calls with a 1, 2, 3
J, K, L: after a check, player #2 bets with a 1, 2, 3
The expected value calculations are more complicated than in RPS (among other things, you can be uncertain about the current state of the game because you don't know which card your opponent has, and the outcome of player #1's game sometimes depends on its own future decisions), but thanks to the binary decisions the results can be simplified almost as much as in RPS.
D(A), D(B), etc.: condition necessary to decide to do A, B, etc. Calculate V(A) and V(!A), then D(A) = V(A) > V(!A) and D(!A) = V(A) < V(!A). If they're equal, then you play your predetermined Nash equilibrium strategy.
Player #1:
D(A) = 4/3 > P(H) + P(I)
D(B) = 2 + P(G) + z > 3 * P(I), where z = P(L) - P(J) when 3 * P(J) > P(L) and z = 2 * P(J) when 3 * P(J) < P(L)
D(C) = P(G) + P(H) > P(J) + P(K)
D(D) = false
D(E) = 3 * P(J) > P(L)
D(F) = true
Player #2:
D(G) = false
D(H) = 3 * P(A) > P(C)
D(I) = true
D(J) = P(!B) * (2 * P(!E) - P(E)) > P(!C) * (P(F) - 2 * P(!F))
D(K) = P(!A) * P(D) > P(!C) * (3 * P(F) + 2)
D(L) = P(!A) * P(D) + P(!B) * P(E) > 0
Translated back to English:
#1 with a 1: If you predict #2 will fold often enough, then bet (bluff), otherwise check, and always fold if #2 bets.
#1 with a 2: Bet only if you predict #2 will call with a 1 and fold with a 3 enough more than bluffing with a 1 and checking with a 3. Call after #2 bets if there's a high enough chance it's a bluff.
#1 with a 3: Bet or call depending on whether #2 is more likely to call your bet or bet after you check. Always call if #2 bets.
#2 with a 1: If #1 bets, fold. If #1 checks and will fold often enough, then bluff, otherwise check.
#2 with a 2: If #1 bets, call if the chances of a bluff are high enough, otherwise fold. If #2 checks, check unless you predict #1 will call with a 1 and fold with a 3 often enough combined to be worth it.
#2 with a 3: If #1 bets, call. If #1 checks, bet.
Alert readers will complain that I've skipped over the most interesting step: predicting what my opponent will play. This is true, but the above steps needed to be done first, because many of the interesting strategies for predicting your opponent's play assume they've done the same analysis. If both players play following this strategy, and both know that the other will play following this strategy, then play settles into one of the Nash equilibriums. But, many players won't play optimally, and if you can identify deviations from the Nash equilibrium quickly then you can get a better score. If your opponent is doing the same thing, then you can fake a deviation from Nash that lowers your score a little, but causes your opponent to deviate from the Nash equilibrium in a way that you can exploit for more gain than your loss (until your opponent catches on). So I can predict you will predict I will predict you will... and it seems to go into an infinite loop of ever-higher levels of double-think.
My most important takeaway from the Rock-Paper-Scissors competition was that if there are a finite number of deterministic strategies, then the number of levels of double-think are finite too. This is much easier to see in RPS. Given a method of prediction P:
P0: assume your opponent is vulnerable to prediction by method P, play to beat it.
P1: assume your opponent thinks you will use method P0, and plays to beat it. Play to beat that.
P2: assume your opponent thinks you will use P1, and plays to beat it. Play to beat that.
But because in RPS there are only 3 possible deterministic strategies, P3 recommends you play the same way as P0!
There's also a second stack where you assume your opponent is using P to predict you, then assuming you know that, and so on, which also ends with 3 deterministic strategies.
In simplified poker, if you predict your opponent is not playing a Nash equilibrium strategy, and respond optimally yourself, then you will respond in one of 16 ways. If you assume your opponent has guessed your play and will respond optimally, then there are 8 ways for player #1 to respond, and only 4 ways for player #2 to respond. So, assuming I haven't made a mistake, there are at most 5 levels of second guessing, 1 for responding to naive play, and at most 4 more for responding to optimal play before either you or your opponent start repeating yourselves.
So, for any method of prediction which does not involve double-thinking, you can generate all double-think strategies and reverse double-think strategies. Then you need a meta-strategy to decide which one to use on the next hand. If you do this successfully then you'll defeat anyone who is vulnerable to one of your methods of prediction, uses one of your methods of prediction, or uses a strategy to directly defeat one of your methods of prediction.