The AI Safety Game (UPDATED)

post by Daniel Kokotajlo (daniel-kokotajlo) · 2020-12-05T10:27:05.778Z · LW · GW · 5 comments


  The basic idea:
  The cards:
  More fleshed-out (optional) rules ideas:
  Where to go from here:

Last year, Donald Hobson created this awesome AI Safety card game. I and a few other people at MSFP played it and had loads of fun. :) Unfortunately, we've lost the cards and the rules. So I'm reviving the game here. Enjoy! (Also, please invent and add new cards to the spreadsheet!)

(UPDATE: Thanks to jimrandomh I now have doubled the number of cards!)

The basic idea:

There are six decks of cards: Compute, Input-Output, Alignment Data, Politics, Fun, and Problem.

You draw one card from each deck. Then, you try to think of how you could save the world, given the stuff written on the cards. When you think you've got it, you tell the other players, and everyone argues about whether your plan would work or not. Repeat.
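The setup above is easy to sketch in code. Here's a minimal Python version of the deal; the card text is entirely hypothetical (the real cards live in the linked spreadsheet), and only the deck names come from the post:

```python
import random

# Hypothetical placeholder cards -- the real card text is in the spreadsheet.
DECKS = {
    "Compute": ["A 2010-era laptop", "A top-500 supercomputer for one year"],
    "Input-Output": ["A text terminal only", "Full internet access"],
    "Alignment Data": ["A corpus of human ethical judgments", "Nothing at all"],
    "Politics": ["You are a mid-level ML researcher", "You advise a head of state"],
    "Fun": ["Magic 100x Multiplier", "You can pause time one hour per day"],
    "Problem": ["A rival lab is six months ahead", "Your funding is cut in half"],
}

def deal_hand(decks=DECKS, rng=random):
    """Draw one card from each deck to set up a round."""
    return {deck: rng.choice(cards) for deck, cards in decks.items()}

hand = deal_hand()
for deck, card in hand.items():
    print(f"{deck}: {card}")
```

From there the game is all conversation: propose a world-saving plan using the drawn cards, then argue about it.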

The cards:

Can be found here; print it all out and then cut it up with scissors.

The text for all the cards can be found in this spreadsheet. Feel free to copy it and add new cards of your own design, or simply add cards directly to the master sheet. Also feel free to make comments suggesting adjustments to existing cards, etc.

More fleshed-out (optional) rules ideas:

All players can draw cards and create plans simultaneously; perhaps the goal is to be the first player who comes up with a plan that has a 50%+ chance of working. Though I expect the game will work better if it is less competitive and more cooperative.

If you don't like one of the decks, don't play with it. For example, maybe you think the Fun deck is too distracting, or the Problem deck makes things way too hard.

You could let everyone draw an additional card from a deck of their choice, but if they do, they have to also draw a Problem card.

Stronger players can be given handicaps, e.g. one less card or one more Problem card.

There's a place in the spreadsheet to add FAQ/Rulings/Errata, which no doubt will come up a lot as people play and try to stretch the interpretation of the cards.

It is considered unsporting to exploit the fact that you have these powers, rather than exploiting the powers themselves. This rule is especially important for the super powers in the Fun deck. Just use the powers to do stuff; don't exploit the fact that people might conclude you are a god, or that the whole world is a simulation, or something like that. For example, say you have the Magic 100x Multiplier compute card. It's fine for you to start a cloud computing business and drive everyone else out of business with your competitive advantage. It's not fine to have a team of international scientists observe how a computer speeds up as soon as you become its legal owner, and thereby convince them that you are God.

If you really want to make this into a more competitive game with precise rules, I suggest the following: Everyone looks at their cards simultaneously. When you think you have a viable plan, say so. When two people think they have viable plans, both people explain their plans -- starting with the second person to come up with a plan -- and then everyone gets a chance to speak for up to a minute or so, arguing about the plans. You are allowed to make revisions to your plan. Then, everyone votes on (a) which plan is more likely to save the world, and (b) whether the best plan is actually likely to save the world. If your plan is voted more likely to work than the other player's, and actually likely to work, then you get three points. If your plan is voted unlikely to work, you lose a point. If your plan is voted likely to work but not as likely as the other player's, you gain one point. Then everyone draws new cards; repeat until a target game-length is reached (e.g. 45 minutes).
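For anyone keeping score on paper, the suggested point system boils down to a small function. This is just a sketch of the rules as written above (the function name and boolean inputs are my own framing):

```python
def score_round(won_vote_a: bool, likely_to_work: bool) -> int:
    """Score one player's plan under the suggested competitive rules.

    won_vote_a:     plan was voted more likely to save the world than the rival's
    likely_to_work: plan was voted actually likely to save the world
    """
    if won_vote_a and likely_to_work:
        return 3   # best plan, and judged actually likely to work
    if likely_to_work:
        return 1   # judged likely to work, but the rival's plan rated higher
    return -1      # judged unlikely to work
```

Note that winning vote (a) doesn't help if the plan fails vote (b): a plan voted unlikely to work still loses a point.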

Where to go from here:

I'm optimistic that this game will be enjoyed by many others, and expanded into a real-ish game. I think that it is actually a decent tool for teaching AI safety concepts and thinking skills. (It's like MIRI's USB hypercomputer thought experiment, but different every time. Also, the arguments about whether a plan would work may be quite valuable, and different depending on who you play with. So it's a fun way of getting to know other people's views.)

The right set of cards is probably not the current set; probably we need to add lots of ideas and then prune through playtesting. Some cards are way overpowered, others are way underpowered. Perhaps it would be good to rate the cards by OP-ness, so that we can e.g. play only with the weak cards, or only let the newbie take the strong cards, etc.

If anyone has any clever ideas for what the game should be called, I'm all ears! My ideas so far are:

The AI Safety Game

AI Safety

The AI Safety Card Game

Donald Saves the World

Also, please add new card ideas to the spreadsheet! The more cards we have, the better the game gets.

Many thanks to Donald Hobson for inventing this game, Abram Demski for bugging me to revive it and remembering some of the cards, and Daniel Demski for helping me make the cards.


Comments sorted by top scores.

comment by Scott Garrabrant · 2020-12-06T06:11:00.981Z · LW(p) · GW(p)

I believe jimrandomh actually copied down all the cards at some point.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-12-06T06:19:56.350Z · LW(p) · GW(p)

Ah! OK great, will PM them.

Replies from: jimrandomh
comment by jimrandomh · 2020-12-06T21:32:20.990Z · LW(p) · GW(p)

Yes. I found this lying around in the existential risk coworking space and transcribed it, without any information about the rules. The set I found had five decks, not six, which I labelled based on what their theme seemed to be (not exactly matching the definitions above); it's possible two of them got merged together.

comment by digital_carver · 2020-12-08T20:27:01.337Z · LW(p) · GW(p)

Then, you try to think of how you could save the world, given the stuff written on the cards.

Just to make it explicit, does "save the world" mean things like solve world hunger, create a true utopia, take us on an accelerated path to post-scarcity state, etc.?

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-12-08T20:37:51.744Z · LW(p) · GW(p)

That's what I had in mind, yes. Creating aligned AGI and letting it do the rest would be the primary way to achieve this.

But feel free to play the game with different definitions/objectives if you prefer!