An Exercise in Rational Cooperation and Communication: Let's Play Hanabi

post by ejacob · 2021-04-14T08:36:07.846Z · LW · GW · 10 comments

Contents

  Why Play Hanabi?
  Hanabi Rules
    Optional Rules
  Why is this on LessWrong?
  Where to Play or Buy Hanabi
10 comments

Why Play Hanabi?

Hanabi is a game requiring modeling others' minds, communication, and strategy. Its unique challenges and cooperative (rather than adversarial) objective have piqued the interest of the AI community, which recently began using it as a testing environment for AI agents. 

Hanabi is a card game in which 2-5 players must cooperate to put on a dazzling firework display. The cards in the game represent different stages of fireworks. The most basic version of the game has five different colors of fireworks (cards), each of which has five stages, numbered 1-5, that must be set up in ascending order. The players must play cards from their hands to add them to the communal display on the table. The players' score at the end of the game is the sum of the latest stage of each color they successfully deployed; reaching stage 5 for all 5 colors receives the maximum score of 25.
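The scoring rule can be written down in a couple of lines. This is only a sketch: the dictionary layout and the color names are my own illustrative choices, not anything prescribed by the game.

```python
# Hypothetical representation: map each color to the highest stage
# successfully played so far (0 means nothing played in that color).
def score(display):
    """Sum the latest stage reached in each color."""
    return sum(display.values())


# Color names are illustrative placeholders.
full_display = {"red": 5, "green": 5, "blue": 5, "yellow": 5, "white": 5}
partial_display = {"red": 3, "green": 0, "blue": 5, "yellow": 2, "white": 1}

print(score(full_display))     # 25
print(score(partial_display))  # 11
```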

However, a slight wrinkle: players must hold their cards so that they face away from them; no player is allowed to look at their own cards, only those of other players. To play their cards at the right time, each player must rely on clues from their teammates.

I like this game a lot. I also think it is good for people to play in that it forces you to think about how others will interpret your communications when they don't have the same information or perspective that you do. The communication skills and theory of mind required for Hanabi are also good for real life - in handling interpersonal conflicts with people you ultimately want to cooperate with, or in explaining your expertise to a layperson, as some examples. (I believe this strongly enough that I will volunteer myself as a partner for any reader interested in trying the game online; just leave a comment or send a message.)

Hanabi Rules

Instead of or in addition to reading this section, you can watch a humorous explainer video.

To start, shuffle all the cards together and deal 5 cards to each player in a 2 or 3 player game or 4 cards to each player in a 4 or 5 player game. Place the rest of the cards in the middle of the table and set up the 8 clock tokens (also called clue tokens) and three fuse tokens face-up.
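For readers who think in code, here is a minimal sketch of the setup. The hand sizes come from the rule above. The deck composition (three 1s, two each of 2 through 4, and one 5 per color, 50 cards total) is not spelled out in this post; it is the standard base-game deck. The function names and color labels are hypothetical.

```python
import random

COLORS = ["red", "green", "blue", "yellow", "white"]  # illustrative labels
# Assumption from the standard base game, not stated in this post:
# three 1s, two each of 2-4, and a single 5 in every color.
COPIES = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}


def hand_size(num_players):
    """5 cards each for 2-3 players, 4 cards each for 4-5 players."""
    if not 2 <= num_players <= 5:
        raise ValueError("Hanabi supports 2-5 players")
    return 5 if num_players <= 3 else 4


def deal(num_players, seed=None):
    """Shuffle the deck and deal; returns (hands, remaining_deck)."""
    rng = random.Random(seed)
    deck = [(color, number)
            for color in COLORS
            for number, copies in COPIES.items()
            for _ in range(copies)]
    rng.shuffle(deck)
    size = hand_size(num_players)
    hands = [[deck.pop() for _ in range(size)] for _ in range(num_players)]
    return hands, deck


hands, deck = deal(4, seed=0)
print(len(hands), len(hands[0]), len(deck))  # 4 4 34
```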

Players take turns until the game is over. On their turn, a player must take one of three actions:

  1. Give a hint to another player
  2. Play a card from their hand
  3. Discard a card from their hand

Give a Hint: To give a hint, a player picks one other player, then points to cards in that player's hand that match either a number or color (e.g., "this card is a 5" or "these three cards are red"). They must point to all the cards that match the color or number. This is the only way to communicate card values or colors in Hanabi. To give a hint, the player must flip over one of 8 clock tokens. If there is no token to flip over, the player cannot give a hint.

Example: Bob is holding a red 3, a green 2, a blue 2, and a blue 1. Alice wants to give Bob a clue. Two of the clues she is allowed to give are "This card is a 3" and "These two cards are blue," while pointing at the corresponding card(s). She is not allowed to say "This card is a two" while pointing to only one of the two 2s; she must indicate both cards if she wants to tell Bob which cards are 2s.
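The "point to all matching cards" rule is easy to state as a check. The sketch below uses Bob's hand from the example; the function names are hypothetical, cards are (color, number) pairs, and requiring a hint to touch at least one card is my assumption rather than something stated above.

```python
def touched_cards(hand, clue):
    """Indices of every card matching the clue (a color string or a number)."""
    return [i for i, (color, number) in enumerate(hand)
            if clue in (color, number)]


def is_legal_hint(hand, clue, pointed_at, clue_tokens):
    """A hint needs a token to flip and must indicate ALL matching cards."""
    touched = touched_cards(hand, clue)
    # Assumption: a hint that matches no cards at all is not allowed.
    return clue_tokens > 0 and len(touched) > 0 and sorted(pointed_at) == touched


bob = [("red", 3), ("green", 2), ("blue", 2), ("blue", 1)]

print(touched_cards(bob, 3))               # [0]
print(touched_cards(bob, "blue"))          # [2, 3]
print(is_legal_hint(bob, 2, [1], 8))       # False: only one of the two 2s
print(is_legal_hint(bob, 2, [1, 2], 8))    # True
print(is_legal_hint(bob, 3, [0], 0))       # False: no token left to flip
```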

Play a Card: If a player thinks one of their cards is ready to be added to the display, they can announce they are playing a card, and then put it on the table. If it is ready to be added to the display, great! The card is added to its color's pile and the players' score increases by 1. If instead it is too early to play that card, or another copy of that card has already been played, it is discarded and the players lose one of the three fuse tokens. If all three fuse tokens are lost, the display explodes and the players score 0 points.

Example: The game has just begun and there are no cards on the table. On the first turn, Alice tells Bob he is holding a 1. On Bob's subsequent turn, he plays the card Alice pointed out. It is a blue 1, which is added to the display.

Example: In the middle of the game the green fireworks are at stage 3 and the yellow fireworks are at stage 2. Bob is holding a card that he thinks is a green 4, so he announces he is playing a card and puts it on the table. However, Bob's card was actually a yellow 4, which cannot be played before a yellow 3. Bob discards his yellow 4 and one fuse token.

Playing a 5 of a particular color suit completes that color firework and un-flips a clue token as a bonus.
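The play rules above can be sketched as a single function. The names and the dictionary layout are hypothetical; the example replays Bob's yellow 4 misplay.

```python
MAX_CLUE_TOKENS = 8  # the game is set up with 8 clock (clue) tokens


def play_card(card, display, clue_tokens, fuse_tokens):
    """Resolve a play. `display` maps color -> highest stage played so far.

    Returns (success, clue_tokens, fuse_tokens); mutates `display` on success.
    """
    color, number = card
    if number == display.get(color, 0) + 1:
        display[color] = number
        if number == 5:
            # Completing a color un-flips one token, if any are flipped.
            clue_tokens = min(MAX_CLUE_TOKENS, clue_tokens + 1)
        return True, clue_tokens, fuse_tokens
    # Too early, or that stage was already played: lose a fuse token.
    return False, clue_tokens, fuse_tokens - 1


display = {"green": 3, "yellow": 2}
print(play_card(("yellow", 4), display, clue_tokens=4, fuse_tokens=3))  # (False, 4, 2)
print(play_card(("green", 4), display, clue_tokens=4, fuse_tokens=2))   # (True, 4, 2)
print(display)  # {'green': 4, 'yellow': 2}
```

A full implementation would also end the game with a score of 0 once the fuse count reaches zero; that check is left out here for brevity.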

Discard a Card: Players can announce they are discarding a card and then remove one of their cards in their hand from the game. Doing this returns one clue token to its unflipped state, allowing the team to give another clue in the future. Other than playing 5s, this is the only way to un-flip a clue token. Most cards in the game have multiple copies. Since each color suit only needs one card of each value to be played, there are plenty of cards that can be "safely" discarded.
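As a rough sketch, discarding is just bookkeeping on the token count, and one kind of "safe" discard (a card whose stage is already on the table) is a one-line comparison. The names here are hypothetical, and capping the token count at 8 is my reading of the rule rather than something the text states outright.

```python
MAX_CLUE_TOKENS = 8


def discard(card, discard_pile, clue_tokens):
    """Remove a card from the game and un-flip one clue token (capped at 8)."""
    discard_pile.append(card)
    return min(MAX_CLUE_TOKENS, clue_tokens + 1)


def already_played(card, display):
    """True if this card's stage is already on the table, so it can never be played."""
    color, number = card
    return number <= display.get(color, 0)


display = {"green": 3, "yellow": 2}
pile = []
print(already_played(("green", 2), display))   # True: safe to discard
print(already_played(("yellow", 4), display))  # False: still needed later
print(discard(("green", 2), pile, clue_tokens=5))  # 6
```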

Whenever a player loses a card by playing or discarding, they draw another at the end of their turn. When the deck runs out, everyone takes one last turn, and then the game ends. The game can also end early if the display is completed.

Optional Rules

There are lots of variations on the base game which range from fun to challenging to absolutely diabolical. I will outline the two most common below. 

Rainbow Cards: The game also includes a rainbow suit, which can be included as a 6th suit alongside the other 5 solid colors. Depending on the desired difficulty, players can choose to have the rainbow cards be their own distinct color alongside the basic 5, or they can choose for the rainbow cards to match all color clues. In the latter case, a player who is told one of their cards is red, for instance, cannot be sure if the indicated card is red or rainbow until they receive more information.

Black Powder: An expansion to the base game adds a "black powder" suit of cards, which are added to the deck like the rainbow cards. Black cards differ from the other cards in two ways. First, they cannot be indicated by any color clues (e.g. "this card is black" is an illegal clue). Second, they must be played in descending order, instead of the normal ascending order that the other suits use.
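Both variants amount to small changes in two checks: which cards a color clue touches, and which number a suit needs next. A sketch, with hypothetical function names and the assumption that "rainbow" is only offered as a clue when rainbow is played as its own distinct color:

```python
def matches_color_clue(card_color, clue_color, rainbow_is_wild=True):
    """Does a color clue touch a card of the given color?"""
    if card_color == "black":
        return False  # black cards cannot be indicated by any color clue
    if card_color == "rainbow":
        # Wild mode: rainbow matches every color clue.
        # Distinct mode: rainbow behaves like an ordinary sixth color.
        return True if rainbow_is_wild else clue_color == "rainbow"
    return card_color == clue_color


def next_playable(color, cards_played):
    """Number needed next, given how many cards of that suit are on the table."""
    # Black powder is played 5, 4, 3, 2, 1; every other suit 1 through 5.
    return 5 - cards_played if color == "black" else cards_played + 1


print(matches_color_clue("rainbow", "red"))                         # True
print(matches_color_clue("rainbow", "red", rainbow_is_wild=False))  # False
print(matches_color_clue("black", "red"))                           # False
print(next_playable("black", 0), next_playable("red", 0))           # 5 1
```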

Why is this on LessWrong?

Playing Hanabi online over the past month or two taught me lessons about communication. It's also a case study in how an anonymous crowd finds and uses Schelling points. 

People who play Hanabi on a particular site (or in person with the same group of people) will gradually evolve, by group consensus, norms and expectations about how to play. For example, in online settings:

The benefit of having such norms is hopefully obvious; players can take advantage of "pre-loaded" information to correctly indicate which cards to play while using fewer clues. Most, if not all, of the norms are arguably the "best" norms that could be chosen, often because they are Schelling points [LW · GW] in player strategy that arise naturally from the rules of the game. For example, with respect to the above norms: 

The casual reference to Schelling points is not to be overlooked - anonymous players online approaching Hanabi with a common strategy is not unlike the New York City question.

A highly ranked player will be expected by other highly ranked players to know these informal rules, to the point where not following them is met with confusion (if not outright hostility - it is internet gaming, after all). However, there isn't perfect consensus about particular edge cases of these norms, and there are sometimes situations where perfectly communicating information to another player is simply not possible. With respect to this reality, there are basically two types of players, which I will nickname "Goofus" and "Gallant."

Goofus believes that the correct way to play Hanabi is to perfectly understand and follow the informal rules. If a player he gave a clue to misunderstands it and misplays, Goofus is likely to be confused or angry. To Goofus, to play Hanabi is to execute an algorithm that he has only partial control of. He relies on other players to correctly execute their parts of the algorithm, and feels helpless when they don't.

Gallant understands that what makes communication good or bad is whether or not it is correctly understood by its recipient. If a player he gave a clue to misunderstands it and misplays, Gallant asks himself, "why didn't my clue mean what I thought it meant?" To Gallant, to play Hanabi is to set an objective; the algorithm of clue-giving and interpreting can and must change to reach it.

(An aside: This post was inspired by my experience after I had the poor fortune to be matched with a Goofus yesterday. During our game, he/she gave me a clue that was ambiguous: I could tell that the indicated card was important, but not if it was ready to be played immediately or if it should be saved for later. In fact, he/she had meant "play this card now," but I instead took the risk-averse action and discarded some other card instead of playing. In response the player sent me an angry message, intentionally (I think) lost the game for us, and then wrote a negative comment about me, visible to other players, about how I don't follow the right informal norms of play. I glanced at the player's recent activity: it contained two full pages of the player writing negative comments about people with whom he/she had played Hanabi, stretching back weeks. It struck me as impressive that one can consistently fail at communicating and yet insist that they are communicating correctly.)

I see this as the ultimate "moral of the story" when it comes to Hanabi: the value of good communication lies in whether or not it is properly understood, and ultimately is measured by whether or not it produces the desired behavior in its recipient. 

Don't be Goofus. Be Gallant.

Where to Play or Buy Hanabi

I play Hanabi regularly online (usually with strangers) on Board Game Arena; there are a few other websites with different features available as well. You can also order a physical copy of the game from your favorite online retailer or board game shop. The game components are simple enough that you could make your own set, with some effort.

A benefit of playing online instead of in person is that most online implementations of Hanabi track clues for you, eliminating the memory aspect of the game and allowing you to focus only on the logic and communication. (In my opinion, this is a serious plus.)

Hanabi can be played in less than half an hour, and I recommend it for adults and families with children aged 8-10 and up, or perhaps even younger if they are particularly clever.

10 comments

Comments sorted by top scores.

comment by alexzhaohong · 2021-04-14T19:25:45.682Z · LW(p) · GW(p)

the value of good communication lies in whether or not it is properly understood, and ultimately is measured by whether or not it produces the desired behavior in its recipient. 

Hidden information in board games is a great way to explore coordination games and game-theoretic focal points such as the New York City question. Are you familiar with other board games in this space? Have you heard of The Mind?

Replies from: ejacob
comment by ejacob · 2021-04-14T19:30:38.684Z · LW(p) · GW(p)

No other examples come to mind, which is one reason I thought to write the post on Hanabi. I have not heard of The Mind but it sounds like a somewhat surreal experience.

Replies from: ahong
comment by ahong · 2021-04-14T19:50:01.684Z · LW(p) · GW(p)

I find Hanabi to be a simple way to challenge my bias that other individuals will arrive at a similar conclusion as I, even with imperfect information. The simplicity of the scoring mechanism provides discrete, actionable information that can provide a narrative that, yes, I can improve the quality and consistency of my interpersonal communication.

For me, Hanabi is a meditative process of self-reflection. To read that Goofus not only can compartmentalize two pages of negative feedback but also can stay strict to the "correct informal conventions" is eye-opening. It sounds like an unpleasant process for both Goofus and the teammates. Given the fact that there are many other games/exercises that offer a higher reliability for engines, perhaps Goofus is looking in Hanabi for that one partner that can understand him 100%, without his needing to evaluate his communication process.

How can one be a Gallant?

Replies from: ejacob
comment by ejacob · 2021-04-14T22:08:41.780Z · LW(p) · GW(p)

Being a Gallant is about communicating the message which is most likely to make sense to your recipient, not the message that best appeals to you for some reason. I suspect it is a skill like any other and is best improved simply through regular practice. Post-mortems on failed communications are also likely to be helpful, if only to prevent future mistakes of the same kind. Eventually, one who has fully embodied Gallant will do a pre-mortem before every attempt at communication, iteratively improving their draft message before communicating the best possible signal they can produce.

comment by Randomized, Controlled (BossSleepy) · 2021-04-14T21:40:03.501Z · LW(p) · GW(p)

I glanced at the player's recent activity: it contained two full pages of the player writing negative comments about people with whom he/she had played Hanabi, stretching back weeks. It struck me as impressive that one can consistently fail at communicating and yet insist that they are communicating correctly.

 

Or, alternatively, the game this person was actually playing was: I'm going to try and gain status and/or self-regard by denigrating others. Perhaps with a helping of: kek kek kek troll troll troll.

Replies from: philh, BossSleepy, ejacob
comment by philh · 2021-04-18T17:02:26.134Z · LW(p) · GW(p)

Seems important to note here that we don't know (and I guess OP doesn't know either) how often they successfully communicated versus failed to communicate. Someone who usually communicates successfully, but yells at you when they fail, would have a similar pattern.

comment by Randomized, Controlled (BossSleepy) · 2021-04-18T21:11:38.240Z · LW(p) · GW(p)

Fair point. @ejacob, please play more Hanabi with this person and report abuse statistics : D

Replies from: ejacob
comment by ejacob · 2021-04-18T21:16:06.011Z · LW(p) · GW(p)

I can't wait to see their response to my invitations 😃

comment by ejacob · 2021-04-14T21:54:30.100Z · LW(p) · GW(p)

Possibly! A game of Hanabi is cooperative, but the meta-game of playing repeated games of Hanabi as an online community may not be.

Replies from: BossSleepy
comment by Randomized, Controlled (BossSleepy) · 2021-04-14T22:05:03.428Z · LW(p) · GW(p)

Even if the meta-game is strictly (or just largely) cooperative, I think there could still be a dynamic where someone plays poorly or ambiguously, and then gets off on thinking that the other players were the ones in the wrong. It's a way to gain self-regard. 

Normally, I would think this is an unlikely strategy for someone to take, but if you see someone who's been shitting on other people for weeks, it does raise the question: what are you getting out of this?