A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?

post by JonathanErhardt · 2022-09-05T07:55:05.572Z · LW · GW · 15 comments


I'm working on a PC & mobile game about metaethics, ethics & AI alignment. (Our Steam page + announcement teaser will be up in 1-2 weeks.) It's important to me that we nail the AI alignment part and give people a good idea of why AI alignment is hard and why optimism shouldn't be our default position.

Some constraints that come with the medium:

  1. We can't go into arguments that are too technical
  2. We need to keep the exposition of these ideas short

What do you think of the following framing of why AI alignment is hard? Are we missing any crucial considerations? (This is merely our internal script and will later be turned into more digestible dialogues, mini games, etc.)


There are two ways alignment can pan out: either the values are programmed into the system, or they are learned by the system.

 

1) Values are programmed into the system. But:

2) Values are learned by the system. But:

Other reasons for concern:


15 comments

Comments sorted by top scores.

comment by Gregviers · 2022-09-07T04:00:41.941Z · LW(p) · GW(p)

It would help to know what genre of game you are making. You talk about exposition ("We need to keep the exposition of these ideas short"), and I would take this to the extreme if I were you. Show, don't tell. If players don't learn the concepts from the gameplay, then the game isn't about those concepts.

For example, if you want to teach players that AI optimism is not a good default and alignment is hard, give them a chance to do an alignment task or make alignment choices in which the optimistic options end badly. Or make a game that's almost unwinnable, to emphasize how hard the problem is.

Have you played Universal Paperclips? I've found it a fun first introduction to AI alignment for people with no knowledge of the topic.

Replies from: JonathanErhardt
comment by JonathanErhardt · 2022-09-08T07:54:53.015Z · LW(p) · GW(p)

We will post more when the game is announced, which should be in 2-3 weeks. For now I'm mostly interested in getting feedback on whether this way of setting up the problem is plausible and doesn't miss crucial elements, and less interested in how to translate it into gameplay and digestible dialogue.

Once the announcement (including the teaser) is out I'll create a new post for concrete ideas on gameplay + dialogue.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2023-12-12T23:52:43.249Z · LW(p) · GW(p)

Did you get around to finishing the game? I didn't see it. Or is it this:

AI takeover tabletop RPG: "The Treacherous Turn" [LW · GW]

Replies from: JonathanErhardt
comment by JonathanErhardt · 2024-01-09T07:32:58.674Z · LW(p) · GW(p)

Not yet unfortunately, as our main project (QubiQuest: Castle Craft) has taken more of our resources than I had hoped. The goal is to release it this year in Q3. We do have a Steam page and a trailer now: https://store.steampowered.com/app/2086720/Elementary_Trolleyology/

comment by Ericf · 2022-09-05T15:40:16.197Z · LW(p) · GW(p)

Cautionary tale: There was a browser game about sustainable fishing that was supposed to show the value of catch shares, but the concept was only introduced at the end of the game, so after playing for 30 minutes I hadn't even seen it (and had gotten bored with the mechanics).

Don't wait too long into the play experience to have your player start interacting with your key concepts.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-09-07T06:04:04.368Z · LW(p) · GW(p)

Cool! I suggest you read the following post by Ajeya Cotra [AF · GW] if you haven't already; I think it's a good summary of one of the core problems (which I suppose fits under 2b in your classification, and it may give some good inspiration as well).

Replies from: JonathanErhardt
comment by JonathanErhardt · 2022-09-08T07:52:27.660Z · LW(p) · GW(p)

Thanks for the link, I will read that!

comment by James_Miller · 2022-09-05T12:15:24.826Z · LW(p) · GW(p)

You could do a prisoners' dilemma mini game. The human player and (say) three computer players are AI companies. Each company independently decides how much risk to take of ending the world by creating an unaligned AI. The more risk you take relative to the other players, the higher your score if the world doesn't end. In the game's last round, the chance of the world being destroyed is determined by how much risk everyone took.
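A minimal sketch of how this round could be scored, assuming an illustrative scoring rule (the function name, the linear `doom_weight` model, and the 10-points-per-unit-of-risk payoff are all made up for the example, not taken from the comment):

```python
import random

def play_round(risk_levels, doom_weight=0.2):
    """One round of the AI-race dilemma: each company picks a risk in [0, 1]."""
    # The chance of catastrophe grows with the total risk everyone took.
    p_doom = min(1.0, doom_weight * sum(risk_levels))
    if random.random() < p_doom:
        return None  # the world ends and nobody scores
    # Survivors score in proportion to how much risk they took.
    return [10 * r for r in risk_levels]

# Example: one cautious human player among three reckless AI companies.
random.seed(0)
outcome = play_round([0.1, 0.9, 0.9, 0.9])
```

The dilemma shows up in the payoffs: if the world survives, the cautious player scores worst, so each player is individually tempted toward more risk even though everyone's total risk drives up the shared chance of losing everything.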

Replies from: Ericf, JonathanErhardt
comment by Ericf · 2022-09-05T15:35:30.660Z · LW(p) · GW(p)

Isn't that begging the question? If the goal is to teach why being optimistic is dangerous, declaring by fiat that an unaligned AI ends the world skips the whole "teaching" part of a game.

Replies from: James_Miller
comment by James_Miller · 2022-09-05T18:01:15.575Z · LW(p) · GW(p)

Yes, it doesn't establish why unaligned AI is inherently dangerous, but it does help explain a key challenge to coordinating to reduce the danger.

comment by JonathanErhardt · 2022-09-05T13:08:53.587Z · LW(p) · GW(p)

I really like that, and it happens to fit well with the narrative that we're developing. I'll see where we can include a scene like this.

Replies from: James_Miller
comment by James_Miller · 2022-09-05T17:59:38.818Z · LW(p) · GW(p)

Excellent. I would be happy to help. I teach game theory at Smith College.

comment by cubefox · 2022-09-05T12:52:55.998Z · LW(p) · GW(p)

Ethical truths are probably different from empirical truths. An advanced AI may learn empirical truths on its own from enough data, but it seems unlikely that it will automatically converge on the ethical truth. Instead, it seems that any degree of intelligence can be combined with any kind of goal. (Orthogonality Thesis)

I think the main point of the orthogonality thesis is less about an advanced AI not being able to figure out the true ethics, and more about the AI not being motivated to act ethically even if it figures out the correct theory. If there is a true moral theory and the orthogonality thesis is true, then the thesis of moral internalism (true moral beliefs are intrinsically motivating) is false. See the section "Unrescuability of moral internalism" here: https://arbital.com/p/normative_extrapolated_volition/

Replies from: JonathanErhardt
comment by JonathanErhardt · 2022-09-05T13:05:19.895Z · LW(p) · GW(p)

Good point, I see what you mean. I think we could have two distinct concepts of "ethics" and two corresponding orthogonality theses:
 

  1. Concept "ethics1" requires ethics to be motivational. Some set of rules can only be the true ethics if, necessarily, everyone who knows them is motivated to follow them. (I think moral internalists probably use this concept?)
  2. Concept "ethics2" doesn't require some set of rules to be motivational to be the correct ethics.
     

The orthogonality thesis for 1 is what I mentioned: since there are (probably) no rules that necessarily motivate everyone who knows them, the AI would not find the true ethical theory.

The orthogonality thesis for 2 is what you mention: even if the AI finds the true ethical theory, it would not necessarily be motivated by it.

Replies from: cubefox
comment by cubefox · 2022-09-05T14:00:54.659Z · LW(p) · GW(p)

Exactly!