The Darwin Game - Conclusion

post by lsusr · 2020-12-04T08:06:37.301Z · LW · GW · 30 comments

Contents

  Rounds 0-20
  Rounds 21-40
  Rounds 41-1208
  Winner
  Today's Obituary
  Conclusion

Evolution is unintelligent. The bugs removed intelligence from the design of the bots. The more bugs I wrote into my simulator, the better my simulation replicated real-world Darwinian population dynamics. After two alternate timelines with a buggy game engine, I have finally gotten around to running the game for real.

Alas, this game is between intelligently-designed species, not randomly-generated chunks of code.

Rounds 0-20

MeasureBot takes an early lead.

Rounds 21-40

Multicore's EarlyBirdMimicBot steals the lead from MeasureBot.

Rounds 41-1208

Welcome to Planet Multicore.

Winner

Bot Team Description Round
EarlyBirdMimicBot Multicore Superintelligence [? · GW]

Today's Obituary

Everyone else.

Conclusion

I hope you had fun. This wouldn't have been possible without the community here at Less Wrong. At least 75% of the code (not counting pseudocode) was written by people other than me. Thank you everyone who competed, debated, plotted and hacked. Thank you for the espionage and counter-espionage. Thank you everyone who helped spot bugs in the game engine. Thank you Zvi for posting the original Less Wrong Darwin Game series. Extra thanks to moderator Ben Pace for prettifying the tables behind the scenes and moderator Oliver for fixing multiple timestamps.

The source code to the game and all the bots is available here. If there is a bug in this timeline you can fix it yourself.

This concludes the 2020 Less Wrong Darwin Game.

30 comments

Comments sorted by top scores.

comment by Multicore (KaynanK) · 2020-12-04T12:48:55.566Z · LW(p) · GW(p)

Looking through the code, yep, my simulation criteria were so conservative that I only simulated the PasswordBots. OscillatingTwoThreeBot was oh so close to only having two open parentheses, but it used a decorative set of parentheses in the class definition (as did many others). Looks like I didn't need simulation anyway.

I am somewhat interested in using the code to explore alternate timelines. Who wins without me? Who wins the clone showdown if it's allowed to happen? What happens if you start the game at round 90 and make the smart bots use their endgame strategies in a pool full of silly bots? What happens if you remove NPC bots and have a pool of only players? Does anything interesting happen if the number of turns per round is 101 rather than 100? I'm probably not interested enough to commit to doing this in a timely manner though.

What marginal submission would win in this pool? Probably just a MimicBot with Measure's opening game. Using simulation, especially hy-compatible simulation, could make you win more as long as you didn't simulate MeasureBot, or only simulated it in a separate thread.

It's been a great ride. Thanks for running the game, lsusr.

Replies from: Ericf
comment by Ericf · 2020-12-13T15:54:01.089Z · LW(p) · GW(p)

Why did you have your password bots: A. Look at the opponent's plays, rather than opponent's source code for the password? B. Return 2 when friendly instead of 0? C. Return 3 against the field instead of trying to cooperate?

Replies from: KaynanK
comment by Multicore (KaynanK) · 2020-12-14T01:24:50.849Z · LW(p) · GW(p)

A: EarlyBirdMimicBot is extremely restrictive about what it simulates, because I was worried about malware. MeasureBot confirmed this fear, though I could have been less restrictive and still avoided it. Therefore, PasswordBot cannot look at its opponent's source code if it wants EarlyBirdMimicBot to simulate it.

B: EarlyBirdMimicBot's simulation strategy is brute force, looking at the result of every possible sequence of the next N moves. lsusr required bots to make their moves quickly, so to save on time I only considered the moves 2 and 3 when simulating. 

I could have addressed this by simply having a special case behavior against PasswordBots instead of simulating them, but I didn't think of that.

C: I was actually planning to do this but screwed it up and did not check it properly before uploading. It would have been tit-for-tat against the field if I did it right.
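
The brute-force lookahead described in B can be sketched roughly as follows. This is an illustration, not EarlyBirdMimicBot's actual code: the `move(previous)` interface, the `best_sequence` helper, the `AlwaysTwo` toy opponent, and the scoring rule (you score your own number when the two numbers sum to 5 or less, otherwise nothing) are all assumptions made for the example.

```python
import itertools


def payoff(mine, theirs):
    """Assumed rule: score your own number if the pair sums to at most 5."""
    return mine if mine + theirs <= 5 else 0


class AlwaysTwo:
    """Hypothetical toy opponent that always plays 2."""

    def move(self, previous=None):
        return 2


def best_sequence(make_opponent, depth, candidates=(2, 3)):
    """Try every sequence of `depth` moves drawn from `candidates`
    against a fresh copy of the opponent and keep the best-scoring one."""
    best_score, best_moves = -1, None
    for moves in itertools.product(candidates, repeat=depth):
        opponent = make_opponent()  # fresh state for each candidate sequence
        previous, score = None, 0
        for mine in moves:
            theirs = opponent.move(previous)  # opponent sees our last move
            score += payoff(mine, theirs)
            previous = mine
        if score > best_score:
            best_score, best_moves = score, moves
    return best_moves, best_score
```

Restricting `candidates` to two moves means a depth-N search tries 2^N sequences instead of 6^N, which is the time saving mentioned above. The cost is that a sequence containing 0 is never considered.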

Replies from: Ericf
comment by Ericf · 2020-12-14T04:07:52.501Z · LW(p) · GW(p)

Ah, but the beauty of collusion is that you don't have to be robust. Having thought about this for a month, I would have just had the poor bots check the opponent's source code for a password, and play all 0 against you (and perfect cooperation vs each other). Then you just check their source code for a password, and play all 5s against them as a special case. Private keys are amazing things, as long as you can secure them. And unless lsusr was going to let people update their source code from within their program for subsequent rounds (navigate the file directory for your source file, write a new version, hope the game engine re-loads it between rounds) there's nothing other bots could do to exploit it.

comment by jacobjacob · 2020-12-09T23:40:46.347Z · LW(p) · GW(p)

Curated -- as a celebration of the entire sequence.

(This is my first curation notice since becoming an admin.)

What can I do except gaze into the horizon and drawl "Hogwarts... needs more... plotting" 

I think games, simulations, practice, exercises, and so forth are really valuable. They help us build a culture of rationality as a craft, as an ongoing practice, that is grounded in real feedback loops. 

One fear with rationality is to end up in a "hall of mirrors": an endless chasing of your own thoughts and feelings that never actually bottom out in the real world. For example, it is said that a PhD student in literary theory can go through an entire education without ever being conclusively proven wrong. 

How much of LessWrong can you go through without ever being conclusively proven wrong? That is, how much of the Sequences can you delude yourself into understanding? How many teacher's passwords can you guess? 

I'm not sure. But having institutions like the Darwin game offers a valuable opportunity for people to have reality punch them in the face. 

Now exercises can vary along the degree to which they test your rationality. Climbing an obstacle course is also a way to end up getting a log smashed in your face... but it seems to have less relevance for rationality. However, the Darwin Game combined important elements of economics, game theory (open-source and programmatic, LessWrong style!), computer science, thinking outside the box and more.

Finally, I want to encourage and reward the large amounts of thinking that were spent on this by both lsusr and the participants, which impressed me a fair bit. Professor Quirrell thinks you... Exceeded Expectations.

Replies from: Quirrell
comment by Quirrell · 2020-12-10T09:13:25.160Z · LW(p) · GW(p)

No, but I await the day when they produce something which does. Otherwise, Britain is doomed.

comment by Zvi · 2020-12-04T14:27:43.754Z · LW(p) · GW(p)

This looks more like what I would have expected to happen. Congratulations to Multicore. 

The automatic cooperation means that once you reach an endgame where everyone is always cooperating, whoever has the biggest share will win, so the game is about entering the endgame with the largest share more than being slightly better at late execution. The other games where things didn't collapse seemed weird, and it makes sense that it was largely buggy code. The other possibility is incidental perfect cooperation - e.g. if BendBot always starts 2 and Manticore always starts 3, and there are 100 turns, then the game becomes static if everyone else is gone. 
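
To make the arithmetic behind perfect cooperation concrete, here is a small sketch. It assumes the scoring rule from the original Darwin Game (each player picks a number from 0 to 5 and scores it only if the two numbers sum to 5 or less); the function names are invented for illustration and are not taken from the game engine.

```python
def payoff(a, b):
    """Return (score_a, score_b) for one turn under the assumed rule."""
    return (a, b) if a + b <= 5 else (0, 0)


def alternating_match(turns, first_a=2):
    """Two bots that alternate 2 and 3 in opposite phase."""
    total_a = total_b = 0
    a = first_a
    for _ in range(turns):
        b = 5 - a
        score_a, score_b = payoff(a, b)
        total_a += score_a
        total_b += score_b
        a = 5 - a  # swap roles for the next turn
    return total_a, total_b


print(alternating_match(100))  # (250, 250): 2.5 per turn each
print(alternating_match(101))  # (252, 253): the odd turn breaks the tie
```

With an even number of turns the alternating pair splits the points exactly, so relative population shares stay frozen. An odd turn count gives whoever starts on 3 a one-point edge per round.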

I am content with a 3rd place finish given I did it without writing code. This was sharp competition! 

If people are running new simulations, some things I'd be curious about to get juices flowing:

  1. What happens if you rerun the thing a few times? Does it always look the same? Graphs seem to have some big semi-random events on them.
  2. What happens if we change 100 turns/round to 101?
  3. What is the simplest bot that, when added to the field, would win?
  4. Could BendBot have won if it had been able to expand its logic to cover more cases and thus get off to a better start, or if it had chosen a slightly better opening sequence? What if it always started 2 and never 3? On reflection alternation was wrong, I shouldn't have hedged my bets here.
  5. What happens if the reward for self-ID is lowered from 2.5/round to the theoretical maximum, where they have to figure out who starts high and who starts low? That makes an equilibrium more likely (since you can have 2 bots that do better than that against each other, because they're not identical).
  6. How much does it matter if you add in password bots for various people, or new silly bots, or take silly bots away? What happens if we don't include any silly bots or password bots? What happens if the password bots are more obfuscated, so EarlyBirdMimicBot doesn't look at them?
  7. What happens if EarlyBirdMimicBot is less scared to simulate? How much faster does it win? 
Replies from: Zvi, lsusr, KaynanK, Ericf
comment by Zvi · 2020-12-04T14:28:53.200Z · LW(p) · GW(p)

Also, this series should definitely become a sequence. Great job all around, big thanks to lsusr.

comment by lsusr · 2020-12-04T21:06:35.908Z · LW(p) · GW(p)

I am content with a 3rd place finish given I did it without writing code. This was sharp competition!

Plus, I didn't even implement the whole thing.

comment by Multicore (KaynanK) · 2020-12-04T17:57:50.024Z · LW(p) · GW(p)

What happens if EarlyBirdMimicBot is less scared to simulate? How much faster does it win?

I actually win less in that case, even if I get there faster. I get perfect cooperation with the deterministic cooperators written in Python, so one or two of them stick around forever if they last long enough. It can be two if one of them starts 2 and the other starts 3 so they cooperate with each other, though I'm not sure if there's a deterministic Python bot that starts 3.

comment by Ericf · 2020-12-04T16:07:14.172Z · LW(p) · GW(p)

A Bully Bot could actually do pretty well here (even without attempting simulation) - you get to exploit all the Silly bots, get the most you can out of the Clone army (more than 50% at the beginning when they are willing to back down, 40% once you have to be Fold Bot against them) and still cooperate or cooperate+ against everyone else (especially if you can trick simulators or pseudo-simulators into folding to you).

Replies from: KaynanK
comment by Multicore (KaynanK) · 2020-12-04T17:48:31.643Z · LW(p) · GW(p)

The clones do not fold; in the early game they play an EquityBot-ish strategy that gives attackers less than cooperation would have gotten them. Only a couple of players were willing to fold in the early game, and usually only after ten or more turns of attack. Attacking for tens of turns to find out whether your opponent is a FoldBot will destroy you in a pool of mostly not FoldBots.

Simulation would be able to tell you who to bully without having to go through that - run the opponent for 100 turns and see if they eventually fold against all 3s. But as always, simulation runs the risk of MeasureBot-style malware.
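
A rough sketch of that check, assuming a `move(previous)` interface in which a bot receives its opponent's previous move. The `FoldAfterTen` and `NeverFold` opponents and the `folds_against_threes` helper are hypothetical and exist only for this example.

```python
class FoldAfterTen:
    """Hypothetical opponent: plays 3 until it has seen ten straight 3s,
    then backs down to 2."""

    def __init__(self):
        self.attacks_seen = 0

    def move(self, previous=None):
        if previous == 3:
            self.attacks_seen += 1
        else:
            self.attacks_seen = 0
        return 2 if self.attacks_seen >= 10 else 3


class NeverFold:
    """Hypothetical opponent that plays 3 no matter what."""

    def move(self, previous=None):
        return 3


def folds_against_threes(make_opponent, turns=100):
    """Simulate playing 3 every turn. Return the zero-based index of the
    first turn on which the opponent plays 2 or less, or None if it never does."""
    opponent = make_opponent()
    previous = None
    for turn in range(turns):
        if opponent.move(previous) <= 2:
            return turn
        previous = 3  # we attack with 3 on every turn
    return None
```

Note that `folds_against_threes` executes the opponent's code directly, so it does nothing to address the malware risk mentioned above.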

Replies from: Ericf
comment by Ericf · 2020-12-04T18:41:13.387Z · LW(p) · GW(p)

Ah, right, I misread that code.

If there's a time limit on running, a quick "loop until 75% of the time limit is used up" will stop any simulator from running more than 1 turn of simulation.
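
A minimal sketch of that idea, assuming the per-move time limit is known to the bot in seconds. The `burn_time` helper, the `SlowBot` class, and the `move(previous)` interface are made up for illustration.

```python
import time


def burn_time(time_limit_seconds, fraction=0.75):
    """Busy-wait until `fraction` of the time limit has elapsed."""
    deadline = time.monotonic() + fraction * time_limit_seconds
    while time.monotonic() < deadline:
        pass


class SlowBot:
    """Hypothetical bot that wastes most of its time budget before moving."""

    def __init__(self, time_limit_seconds):
        self.time_limit_seconds = time_limit_seconds

    def move(self, previous=None):
        burn_time(self.time_limit_seconds)
        return 2
```

A simulator that calls `move` on such a bot pays the delay on every simulated turn, so a second simulated turn would already push it past the limit.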

Replies from: Ericf
comment by Ericf · 2020-12-14T04:52:47.509Z · LW(p) · GW(p)

Having now looked over the code, it looks like no one expected so many silly bots that would play 0 every round if simulated correctly. So, a bot that did some checking and cooperated with complex things, simulated and crushed silly bots, and folded to the clone army would probably have gotten a superior early lead, and possibly held onto it. Especially if lsusr was re-loading the source code from the original file each round, and you took advantage of the loophole in the rules, which prohibited:

  1. Hacking your opponent's source file (but not your own)
  2. Looking at the game engine stuff
  3. Saving any "information" from one round to another. But, crucially, not replacing your own source code file deterministically after a particular round. So, after you finish exploiting the silly bots for the first 10-20 rounds, replace your source code with a compliant Clone Bot with an aggressive payload to win after round 90.
Replies from: Zvi
comment by Zvi · 2020-12-14T13:09:04.723Z · LW(p) · GW(p)

I mean I did include an explicit "if they seem to be playing 0 then don't be an idiot and play 5" line and a similar one to play 4 if they kept playing 1. I had complexity restrictions that prevented me from doing more than that, but I'm confident those lines of code did good work.

comment by Insub · 2020-12-04T19:21:38.089Z · LW(p) · GW(p)

For the record, here's what the 2nd place CooperateBot [Insub] did:

  • On the first turn, play 2.
  • On other turns:
    • If we added up to 5 on the last round, play the opponent's last move
    • Otherwise, 50% of the time play max(my last move, opponents last move), and 50% of the time play 5 minus that

My goal for the bot was to find a simple strategy that gets into streaks of 2.5's as quickly as possible with other cooperation-minded bots. Seems like it mostly worked.
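
The rules above translate fairly directly into code. This is a reconstruction from the description, not the submitted bot: the class name, the `move(previous)` interface, and the injectable `rng` parameter are assumptions.

```python
import random


class CooperateBotSketch:
    """Reconstruction of the described strategy, not the submitted source."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.my_last = None

    def move(self, previous=None):
        if previous is None or self.my_last is None:
            choice = 2  # first turn
        elif self.my_last + previous == 5:
            choice = previous  # in sync: take the opponent's last move
        else:
            high = max(self.my_last, previous)
            # out of sync: coin flip between the higher move and its complement
            choice = high if self.rng.random() < 0.5 else 5 - high
        self.my_last = choice
        return choice
```

Once two such bots land on a 2/3 split, each copies the other's last move, so they keep swapping 2 and 3 and average 2.5 per turn.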

comment by Vanilla_cabs · 2020-12-04T10:38:12.813Z · LW(p) · GW(p)

Congrats Multicore, for an uncontested victory by mastering both technical and social aspects of the game!

Thanks lsusr for giving us a tournament to fight, thanks all clique members, plotting with you was fun in spades :)

I didn't win, but our clique was probably instrumental to Multicore's victory, so I'll be content with that. Until next time.

Replies from: Bucky, lsusr
comment by Bucky · 2020-12-04T11:42:23.091Z · LW(p) · GW(p)

I didn't win, but our clique was probably instrumental to Multicore's victory, so I'll be content with that. 

"Well," thought the antelope, as its spirit floated above the scene, "at least that lion is getting plenty of sustenance from my corpse."  :)

I feel like Measure has good reason to feel at least a little smug for having predicted that something like this might happen:

How surprised would you be if someone managed to bypass the code checking and defect from the group?

For the record, I do agree that the presence of the clique made for an interesting contest!

comment by lsusr · 2020-12-04T10:41:16.020Z · LW(p) · GW(p)

Thank you for assembling the clique and then keeping the nested clique secret at my request. You made the game way more interesting and gave me all sorts of drama to write about. Also, congratulations to the clique for winning the Mutant timeline.

comment by Bucky · 2020-12-04T11:44:30.212Z · LW(p) · GW(p)

Multicore, please leave a comment on the post so I can upvote you for winning!

comment by Ben Pace (Benito) · 2020-12-04T10:22:17.258Z · LW(p) · GW(p)

Three cheers for Multicore, the true player of games!

Replies from: lsusr
comment by lsusr · 2020-12-04T10:47:44.198Z · LW(p) · GW(p)

Two cheers for Measure, Vanilla_cabs, Zack_M_Davis, Vanessa Kosoy and Emiya for their elegant creative strategies.

comment by philh · 2020-12-05T16:05:40.635Z · LW(p) · GW(p)

Thanks for running this! I'm a little disappointed the clique didn't make it to round 90, so that my custom code never ran, but so it goes.

Marking my predictions:

  • I win: 20%.
  • A CloneBot wins: 75%.
  • At least one clique member submits a non-CloneBot (by accident or design): 60%.
  • At least one clique member fails to submit anything: 60%. (I think this happened? I don't remember where someone said that though.)
  • At least one bot tries to simulate me after the showdown and doesn't get disqualified: 10%.
  • At least one bot tries to simulate me after the showdown and succeeds: 5%.
  • At least one CloneBot manages to act out: 5%.
  • I get disqualified: 5%.

Only two where I was on the wrong side of 50%, but giving 5% to a CloneBot acting out is embarrassing. I think if I'd said 10% I'd feel okay with it. I'm curious whether, if we started at round 90, the "after the showdown" predictions would have gone the other way; but I think I did try to price in the chance of never making it there when I made them, so. (Were there even any simulators in the clique, other than EarlyBirdMimicBot which wouldn't have tried against any of us?)

Replies from: lsusr
comment by lsusr · 2020-12-05T19:28:20.297Z · LW(p) · GW(p)

At least one clique member fails to submit anything: 60%. (I think this happened? I don't remember where someone said that though.)

Yes. One clique member failed to submit anything.

comment by jacobjacob · 2020-12-05T01:35:40.901Z · LW(p) · GW(p)

A big thanks for running this! It seemed like a massive effort to code up everything, but I think testing our rationality "in the field" with challenges like this is hugely valuable. 

comment by Measure · 2020-12-04T14:39:20.321Z · LW(p) · GW(p)

I'm a little bit surprised that the presence of AbstractSpyTreeBot caused MeasureBot to do worse overall. Good game all, though. Congratulations Multicore!

comment by Lukas Finnveden (Lanrian) · 2020-12-04T10:08:22.099Z · LW(p) · GW(p)

Thanks for running this, and congratulations to Multicore!

Who is the CooperateBot surviving the second longest? Is it CooperateBot [Larks] or CooperateBot [Insub]?

Replies from: lsusr
comment by lsusr · 2020-12-04T10:32:41.585Z · LW(p) · GW(p)
Bot                               Death Round
Silly 0 Bot                       2
Silly Chaos Bot                   5
Silly 5 Bot                       5
Silly 4 Bot                       5
Silly Invert Bot 0                5
Silly 1 Bot                       6
Definitely Not Collusion Bot      6
S_A                               6
Silly Invert Bot 1                7
PasswordBot                       7
Silly 3 Bot                       8
Ben-Bot                           8
Silly Invert Bot 4                9
Silly Invert Bot 5                9
Silly Random Invert Bot           9
Silly Counter Invert Bot          10
Silly Random Invert Bot 2-3       10
CooperateBot [Larks]              11
Silly Cement Bot 3                12
Silly Invert Bot 2                12
Random-start-turn-taking          12
Silly Cement Bot 2-3              14
Silly Invert Bot 3                14
Silly Cement Bot 2                18
RandomOrGreedyBot                 18
RaterBot                          20
Pure TFT                          25
AbstractSpyTreeBot                26
Silly 2 Bot                       28
Copoperater                       29
Winner against low constant bots  32
CopyBot Deluxe                    41
jacobjacob-Bot                    42
KarmaBot                          43
CloneBot                          44
incomprehensibot                  45
a_comatose_squirrel               46
A Very Social Bot                 47
Akrasia Bot                       47
CliqueZviBot                      49
Clone wars, episode return 3      49
Why can't we all just get along   50
Silly TFT Bot 3                   51
Empiricist                        57
AttemptAtFair                     62
Silly TFT Bot 2                   65
OscillatingTwoThreeBot            90
BeauBot                           109
LiamGoddard                       172
MeasureBot                        436
SimplePatternFinderBot            436
BendBot                           579
CooperateBot [Insub]              1208
EarlyBirdMimicBot                 N/A
comment by Insub · 2020-12-04T16:13:21.624Z · LW(p) · GW(p)

Is something strange going on in the Round 21-40 plot vs the round 41-1208 plot? It looks like the line labeled MeasureBot in the Round 21-40 plot switches to be labeled CooperateBot [Insub] in the Round 41-1208 plot. I hope my simple little bot actually did get second place!

Replies from: Measure
comment by Measure · 2020-12-04T16:50:10.754Z · LW(p) · GW(p)

I thought the same thing when I first saw the graphs, but I think the crossover happened near round 400 where the line dips down and is obscured by the labels. This is consistent with lsusr's obituary comment showing MeasureBot died shortly afterward at round 436.