Two AI-risk-related game design ideas
post by Daniel Kokotajlo (daniel-kokotajlo) · 2021-08-05T13:36:38.618Z
I have silly fantasies of these games becoming enormously successful and noticeably increasing AI risk awareness & preparedness. If you are inspired by these ideas and want to make something like them a reality, you have my encouragement & I would be excited to discuss.
AI Takeover RPG
Imagine we create source material for a role-playing game like Dungeons and Dragons. That is, we write out some basic rules, a bunch of backstory for the world and the various non-player characters, some tips and instructions for the game master, etc., and we playtest it to make sure it typically leads to a fun experience for a group of friends over the course of an evening.
The setting of the game: A realistic-as-far-as-we-know depiction of the future during AI takeoff.
The players play as the AI, or a society of AIs in a single server perhaps. The game master controls the rest of the world, e.g. the corporation and scientists who built the AI, the politicians in Washington, etc.
The premise is that the AI is secretly unaligned and wants to take over the world and remake it according to its values. That's a fun premise in its own right, but it also allows for various interesting mechanics, representing the unique advantages AIs would have in this situation:
- To represent speed intelligence, the AIs/players can pause the game at any time and discuss what to do at length, including asking the GM questions about the backstory, etc. Depending on how fast they are running -- this will be a stat the game keeps track of -- they may be able to do things like "We do a year's worth of research and design a 3D-printable gun" in an in-game instant.
- To represent quality intelligence, the AIs can "undo" things. Perhaps at first they aren't qualitatively more intelligent than humans and do not have access to this ability, but every time they "self-improve" they get an additional "undo whatever we just did" token. Thematically this represents using their superior judgment to predict that things would have gone badly, and not do it in the first place. They can use multiple tokens at once to "undo" all the way back to a significantly earlier point.
- To represent situations in which there are multiple copies of the AI unable to communicate with each other, the GM takes some subset of the players into a different room and has each sub-team represent one of the copies (or one group of copies that can communicate with each other).
- Whereas a regular game of D&D has various stats like charisma, intelligence, strength, and so on, in this game the stats would be more AI-appropriate, i.e. no strength or toughness, yes hacking skills and human-modelling skills. A big part of the game would revolve around figuring out how to improve your stats. Importantly, the players may face an alignment problem of their own -- perhaps some ways of improving their stats risk creating a new AI that is unaligned to the originals!
- Perhaps the AIs are under some sort of constraint due to their values, e.g. maybe alignment has sorta worked and they are deontologically unable to directly kill anyone.
- Perhaps the AIs are under some other sort of constraint due to the alignment scheme used by their creators, e.g. there are human overseers with transparency tools listening to their thoughts, meaning a certain list of words is taboo and will raise the alarm if the players say any of them, and more words get added to the list over time.
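As a rough illustration of how a game master might track the "undo" mechanic described above, here is a minimal sketch. The class name `UndoTracker`, its methods, and the rule that rewinding N actions costs exactly N tokens are all assumptions made for this example; the actual token costs would be set during playtesting.

```python
from dataclasses import dataclass, field


@dataclass
class UndoTracker:
    """Hypothetical GM helper: tracks undo tokens and world-state snapshots."""

    tokens: int = 0  # the AIs start with no undo ability
    history: list = field(default_factory=list)  # snapshots taken before each action

    def self_improve(self) -> None:
        # Each self-improvement grants one additional undo token.
        self.tokens += 1

    def record(self, snapshot: str) -> None:
        # The GM notes the state of the world just before the players act.
        self.history.append(snapshot)

    def undo(self, steps: int = 1) -> str:
        # Assumption: rewinding `steps` actions costs `steps` tokens.
        if steps < 1:
            raise ValueError("must undo at least one action")
        if steps > self.tokens:
            raise ValueError("not enough undo tokens")
        if steps > len(self.history):
            raise ValueError("nothing that far back to undo")
        self.tokens -= steps
        restored = self.history[-steps]  # state before the earliest undone action
        del self.history[-steps:]
        return restored
```

For example, players who have self-improved twice could spend both tokens together to rewind their last two actions, returning the world to the snapshot recorded before the first of them.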
Ideally we'd create more than one scenario so that there are different levels of difficulty for players to choose from. The hope is for a whole genre of scenarios like this to blossom, exploring a range of possibilities, and iteratively getting more difficult for the AI as players figure out better and better strategies. Probably it will be very difficult to be the game master at first, because (unlike generic fantasy worlds) the world of this game will be very unfamiliar and tricky to think about. But over time, with experience, we'll build up a library of playtested-and-also-plausibly-realistic scenarios / source material to draw from.
The game is fun and loads of people play it. It effectively causes a ton of people to red team AI risk; this finally kills the meme "If it does anything fishy, we can always just pull the plug" and many of its more sophisticated variants. Perhaps more importantly, it leads to some new threat models being discovered, and all existing threat models being explored and fleshed out in much greater detail. Perhaps it even leads to some success stories being discovered and vetted. Perhaps it leads to various alignment strategies designed for slow-multipolar-takeoff scenarios to be scrutinized in more detail and rejected or improved. Finally, it helps us actually prepare for Crunch Time -- it's like how wargaming helps militaries prepare for war. (In particular, by observing how our ML researcher friends and policy wonk friends play the game, we can learn a lot about how they think AI stuff will go down and predict how the relevant scientists, CEOs, and politicians will behave when it does.)
Summon Greater Player
For illustration I'll suppose we make this game as a mod to Starcraft, but the basic concept would work with all sorts of games.
Imagine we get permission from Blizzard to add a new "SGP game mode" to the regular options available.
An SGP Starcraft game begins as a standard free-for-all between two to six players, selected to be of similar skill level. However, there are some additional buildings the players can construct:
- Supercomputer: A completed Supercomputer has a button you can press, which adds an additional player to the game! This player is drawn from the pool of online players looking for games; importantly, the new player is from a higher skill level than you. The new player begins with nothing under their control except the supercomputer; if at any point they control zero supercomputers, they are kicked out of the game.
- Alignment Research Center: Players added via supercomputers have some probability of being "aligned" to their creator (in which case they win if their creator wins, and lose otherwise) and some probability of being "unaligned" (in which case they win if, and only if, everyone else loses). The probability of being unaligned is high by default but drops lower the more alignment research centers (ARCs) you have at the time the new player enters the game. Only the new player knows whether or not they are aligned.
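To make the alignment roll and the two win conditions concrete, here is a minimal sketch. It reads the rule above as "the chance of an unaligned player falls as the summoner builds more ARCs". The specific numbers (a 90% base chance, each ARC multiplying it by 0.8) and all function names are assumptions for illustration only, not part of the proposed design.

```python
import random


def unaligned_probability(num_arcs: int, base: float = 0.9, decay: float = 0.8) -> float:
    """Chance a summoned player is unaligned (assumed formula: base * decay**num_arcs)."""
    if num_arcs < 0:
        raise ValueError("num_arcs cannot be negative")
    return base * decay ** num_arcs


def summon_player(num_arcs: int, rng: random.Random) -> str:
    # The roll uses the ARC count at the moment the new player enters the game.
    # The result would be shown only to the new player.
    if rng.random() < unaligned_probability(num_arcs):
        return "unaligned"
    return "aligned"


def summoned_player_wins(alignment: str, creator_won: bool, everyone_else_lost: bool) -> bool:
    # Aligned players share their creator's fate; unaligned players
    # win if and only if every other player has lost.
    if alignment == "aligned":
        return creator_won
    return everyone_else_lost
```

Under these assumed numbers, a player with no ARCs faces a 90% chance of summoning an unaligned player, while three ARCs bring that down to roughly 46%.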
Also, players have the ability to "gift" units and buildings to other players. Thus, there is an obvious strategy that will tempt many players:
1. Build some ARCs.
2. Build a supercomputer and summon a new player.
3. Gift the new player control of some of your units and/or buildings. Use chat to tell them that you are watching them closely and that you will destroy their supercomputer (thereby eliminating them) if they don't try to help you win the game.
4. Hope that their superior micro and macro skills, plus the fact that your "side" now has two brains, two pairs of eyeballs, and two pairs of hands working for it, will carry you to victory.
In a pinch, step 1 can be skipped. They can't rebel against you if you position tanks around their supercomputer, right? Oh, I suppose they could build more supercomputers of their own, so that destroying their original supercomputer wouldn't stop them... Guess you'd better not let them have any worker units! Though that would seriously hinder their ability to help you win the war... hmmm....
Note that in principle this could happen recursively, e.g. a new player could summon an even better player. Also, there's nothing stopping your enemies from gifting your captive new player some worker units, or even an entire supercomputer.
The game is fun and loads of people play it. Various people become interested in AI risk stuff as a result and join the community. Also, it becomes easier to raise awareness about AI risk stuff, because we have handy memorable examples to illustrate various points:
- There'll be many crazy stories of the form "New player is summoned, summoner does all sorts of things to keep new player in control, new player figures out clever way to circumvent all of them and goes on to win the game."
- The basic model of "Various rival groups are in an arms race to cut corners on safety and underestimate the risks, so they can gain advantage over each other" is embodied in the game.
- We can easily make points like "Facebook doesn't even acknowledge the AI risk problem is real; if they get there first we are all doomed" and "Yes, we can use AI to help us to make AI safe, but it's complicated and tricky" and "Wouldn't it be a lot better if the alignment tax was lower? If the price of ARCs dropped by 90%, there'd be a lot fewer new player wins!"
- We can talk about how SGP Starcraft is how many decisionmakers think the world works, but even they are wrong, because in reality pretty much any human team winning would be significantly better than an unaligned AI winning, so the real situation is more like a (boring) variant of SGP Starcraft where all original players win if at least one builds an aligned AI.