A.D&D.Sci May 2021 Evaluation and Ruleset

post by abstractapplic · 2021-05-24T16:25:13.704Z · LW · GW · 16 comments

Contents

  Ruleset
    Species
    Days Since Death
    Butchery
      Mild Boar
      Jungle Mammoths
      Dragons
      Jewel Beetles
    Universes
    The NPC
  And the winner is . . .
  Reflections
  Scheduling
16 comments

This is a follow-up to the D&D.Sci post I made last week; if you haven’t already read it, you should do so now so you know what I'm talking about here.


Ruleset

Generation code is now up here; this section goes through the important points.

Species

The carcasses obtained by the stranger are 41% Mild Boar, 31% Jungle Mammoths, 5% Jewel Beetles, and 23% Dragons (5% Green, 2% Gray, 8% Blue, 8% Red).
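As a rough illustration, the species mix above can be sampled with a weighted draw. This is a minimal sketch rather than the actual generation code: the names `SPECIES_WEIGHTS` and `sample_species` are made up for this example, and the percentages are simply the ones quoted above.

```python
import random

# Carcass shares in percent, as stated in the post; dragons are split by colour.
SPECIES_WEIGHTS = {
    "Mild Boar": 41,
    "Jungle Mammoth": 31,
    "Jewel Beetle": 5,
    "Green Dragon": 5,
    "Gray Dragon": 2,
    "Blue Dragon": 8,
    "Red Dragon": 8,
}


def sample_species(rng: random.Random) -> str:
    """Draw one carcass type with probability proportional to its share."""
    names = list(SPECIES_WEIGHTS)
    weights = list(SPECIES_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

The four dragon colours add up to the stated 23%, and all seven entries add up to 100%.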

Days Since Death

The number of days since each creature’s death is modelled by rolling two d10s and taking the lower result.

(For the rest of this post, let “[DSD]” stand in for “Days Since Death”.)
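For anyone who wants the exact distribution this rule implies, it can be enumerated over all 100 equally likely pairs of rolls. A minimal sketch, assuming fair dice; `dsd_pmf`, `PMF` and `MEAN_DSD` are names invented for this example.

```python
from fractions import Fraction
from itertools import product


def dsd_pmf(sides: int = 10) -> dict[int, Fraction]:
    """Exact distribution of the lower of two fair dice with `sides` faces."""
    pmf = {k: Fraction(0) for k in range(1, sides + 1)}
    for a, b in product(range(1, sides + 1), repeat=2):
        pmf[min(a, b)] += Fraction(1, sides * sides)
    return pmf


PMF = dsd_pmf()
# Mean days since death under this rule.
MEAN_DSD = sum(k * p for k, p in PMF.items())
```

Under these assumptions a carcass is one day dead 19% of the time and ten days dead only 1% of the time, with a mean of 3.85 days.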

Butchery

Mild Boar

The revenue from a boar is found by rolling a d4, a d8, a d12, and a d20, then summing every result greater than [DSD].

EV is graphed below.
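The same EV can be tabulated exactly from the rule as stated. A minimal sketch, assuming fair dice and a strict "greater than" comparison; `BOAR_DICE` and `boar_ev` are names made up for this example.

```python
from fractions import Fraction

# Die sizes rolled for a boar, per the rule above.
BOAR_DICE = (4, 8, 12, 20)


def boar_ev(dsd: int) -> Fraction:
    """Expected boar revenue: a die's face only counts if it is strictly above dsd."""
    total = Fraction(0)
    for sides in BOAR_DICE:
        counted = sum(face for face in range(1, sides + 1) if face > dsd)
        total += Fraction(counted, sides)
    return total


# EV for every possible [DSD] from 1 to 10.
BOAR_EV_TABLE = {dsd: boar_ev(dsd) for dsd in range(1, 11)}
```

For example, at [DSD] = 10 only the d12 and d20 can still contribute, giving an EV of 29/3, or a little under 10sp.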

Jungle Mammoths

The revenue from a Jungle Mammoth carcass is given by 20+10d4-3*[DSD]

EV is graphed below.
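Since a d4 averages 2.5, the formula works out to an EV of 45 - 3*[DSD]. A minimal sketch follows; `mammoth_ev` and `roll_mammoth` are names invented here, and the sampler assumes "10d4" means ten separate d4 rolls added together (reading it as ten times a single d4 would give the same EV but a wider spread).

```python
import random
from fractions import Fraction


def mammoth_ev(dsd: int) -> Fraction:
    """Expected mammoth revenue: the 10d4 term averages 10 * 5/2 = 25."""
    return 20 + 10 * Fraction(5, 2) - 3 * dsd


def roll_mammoth(dsd: int, rng: random.Random) -> int:
    """One sampled revenue. ASSUMPTION: 10d4 is the sum of ten d4 rolls."""
    return 20 + sum(rng.randint(1, 4) for _ in range(10)) - 3 * dsd
```

So a one-day-old mammoth is worth 42sp on average, and a ten-day-old one 15sp.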

Dragons

Valuable components of Dragons are Scales, Tongue, Heart and Spleen.

A Dragon has 5d8 Scales; these are worth 1sp apiece, unless it’s Red, in which case they’re worth 2sp apiece (this is the only way colour is relevant).

A Tongue is worth 10sp, so long as the corpse is less than three days dead. A Heart is worth 30sp until day five, and a Spleen is worth 5sp until day seven. Once these thresholds are passed, the organs lose their magic and become worthless.

EV is graphed below.
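The dragon rules combine into a step function of [DSD]. The sketch below is illustrative only: `ORGANS` and `dragon_ev` are made-up names, and the exact boundary days are an assumption (each organ is treated as valuable while [DSD] is strictly below 3, 5 and 7 respectively, which is one reading of the wording above).

```python
from fractions import Fraction

# Organ -> (value in sp, first [DSD] at which it is worthless).
# ASSUMPTION: the cutoffs are strict, e.g. a Heart still counts at [DSD] = 4.
ORGANS = {"Tongue": (10, 3), "Heart": (30, 5), "Spleen": (5, 7)}


def dragon_ev(dsd: int, colour: str) -> Fraction:
    """Expected dragon revenue: scales plus any organs that are still magical."""
    scale_price = 2 if colour == "Red" else 1
    # 5d8 scales, each d8 averaging 9/2.
    scales = 5 * Fraction(9, 2) * scale_price
    organs = sum(value for value, cutoff in ORGANS.values() if dsd < cutoff)
    return scales + organs
```

Under these assumptions a fresh non-Red dragon averages 67.5sp, a fresh Red one 90sp, and once every organ has expired only the scales remain (22.5sp, or 45sp for Red).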

Jewel Beetles

The revenue from a Jewel Beetle is found by starting with a value of 1 and rolling a d6 repeatedly. On a 1, stop rolling and take the value as your revenue; on a 6, double the value and keep rolling; on any other roll, add your result to the value and keep rolling.

EV is infinite. This is cold comfort to those who won the one on auction, which happened to be worth only 18sp.
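One way to see why the EV diverges: if the current value is v and you roll again, the expected value carried forward (counting a stop as zero) is (2v + 4v + 14)/6 = v + 7/3. The expected "still rolling" value therefore never drops below 1, and every one of the infinitely many possible stopping rolls contributes at least 1/6 to the total. The procedure itself is easy to simulate; this is a minimal sketch, with `beetle_revenue` a name invented for this example.

```python
import random


def beetle_revenue(rng: random.Random) -> int:
    """Roll a d6 until a 1 appears, following the Jewel Beetle rule above."""
    value = 1
    while True:
        roll = rng.randint(1, 6)
        if roll == 1:
            return value      # stop and cash out
        if roll == 6:
            value *= 2        # double and keep rolling
        else:
            value += roll     # rolls of 2-5 are added to the value
```

Sample means from a simulation like this will not settle down as the number of trials grows, which is what an infinite EV looks like in practice; most individual beetles are nonetheless worth very little.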

Universes

I got 7 entries. These were randomly assigned between two universes, and the universe with three human players got an NPC to even the odds.

The NPC

To hedge my bets against an unexpected level of cooperation – if all or most of the human players made low bids, I didn’t want the NPC to ruin their fun – I knew I had to make NPC behaviour derived from player behaviour, while not granting it any unfair advantage. Therefore, for each lot, the NPC bid is randomly selected from the seven player-submitted bids.
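In code, the NPC rule might look something like the following. This is a sketch under stated assumptions, not the actual evaluation code: `npc_bids` is a made-up name, each player's entry is assumed to be a list with one bid per lot, and the draw is assumed to be uniform and independent for every lot.

```python
import random


def npc_bids(player_bids: list[list[int]], rng: random.Random) -> list[int]:
    """For each lot, copy the bid that one randomly chosen player made on it."""
    n_lots = len(player_bids[0])
    # A fresh player is drawn for every lot, so the NPC is a patchwork of entries.
    return [rng.choice(player_bids)[lot] for lot in range(n_lots)]
```

Because every NPC bid is one that some human actually submitted for that lot, the NPC never outbids the highest human bid or undercuts the lowest one.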

(Three of you become aware that one of your compatriots appears to be a flickering, translucent tangle of selves. The auctioneer, noticing your concern, reassures you that it’s probably fine and this sort of thing happens all the time.)

And the winner is . . .

The winner, making a 51sp profit, is . . .

 

 

 

 

 

. . . hilariously, the NPC.

(I know, I was surprised too. Check my evaluation code if you like.)

However, the most successful human player was . . .

 

 

 

 

 

. . . GuySrinivasan, with a 31sp profit!

(As promised, you get to specify one non-ridiculous thing about July’s scenario. DM me when you’ve made your decision.)

Reflections

Pitting players against each other was interesting. When designing this challenge, I devoted a decent amount of thought to troll-proofing – the reason you have a finite budget is so players bidding unreasonably high could do limited damage – but what actually threatened to derail the scenario was cooperation; even in a zero-sum winner-take-all system, two of the seven entries were “1sp on everything”. If there had been a few more bidders like that, and/or if more than one of them had happened to end up in the same universe, this could have been a very different game.

The other novel thing about this scenario is how it built on earlier work. To me, this experiment was an unqualified success: I got to reuse a lot of code and conceptual legwork, players could balance priors from the previous scenario against new evidence from this one, and I felt more comfortable throwing a wicked problem at people who had already had a ‘normal’/‘fair’ challenge of the same kind. I enthusiastically solicit feedback on these points, and on all other points.

Scheduling

The next scenario ought to be ready early-to-mid-June. However, as rationalists, we are all familiar with the is/ought distinction; see you when I see you.

16 comments

Comments sorted by top scores.

comment by SarahNibs (GuySrinivasan) · 2021-05-24T19:58:41.191Z · LW(p) · GW(p)

Here are the average profits and win rates if we re-ran the sim many times:

bidder  avg_profit  prob_win  bids
A       2 sp        0.5%      [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
B       2 sp        0.5%      [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
C       49 sp       18.4%     [76, 36, 13, 26, 21, 51, 18, 13, 9, 12, 9, 18, 102, 21, 26, 31, 13, 41, 13, 36]
D       43 sp       7.0%      [50, 36, 13, 25, 19, 62, 17, 13, 1, 11, 1, 19, 85, 20, 20, 20, 13, 10, 13, 20]
E       52 sp       25.6%     [73, 34, 15, 21, 17, 57, 16, 15, 6, 11, 6, 15, 8, 17, 28, 31, 15, 42, 15, 34]
F       37 sp       18.1%     [73, 35, 14, 22, 14, 63, 16, 7, 5, 10, 2, 18, 4, 14, 29, 29, 10, 44, 14, 34]
G       59 sp       20.0%     [71, 33, 16, 24, 20, 51, 19, 16, 9, 15, 9, 18, 12, 20, 26, 31, 16, 42, 16, 33]
NPC     42 sp       9.9%      []

I am bidder E. Whoever bidder G is, they make more profit than I do on average, but I win 25% of the time and they only win 20% of the time.

My method was to interleave two bidding strategies, trying to either spend 300sp at a decent ROI, or spend less on an ROI high enough to beat whoever ended up spending 300sp at a not-quite-as-good-ROI-as-my-original-target.

I am favored to win all heads-up matches, but I am never favored to win a 4-way with real players. G wins all but one of those, and F wins more 2nd-place finishes in those than I do. So I'm heavily reliant on A/B/NPC being present to make the matchups look more like head-to-head than 4-way.

Replies from: simon, Measure
comment by simon · 2021-05-24T23:49:27.055Z · LW(p) · GW(p)

Bidder G reporting in... 

Looks like my incorrect speculations on the exact models were likely not helpful. I also did not expect the 1 bidders (fine strategy against real duplicates like in the scenario given, but we're trying to have a competition here!).

Replies from: Pattern, GuySrinivasan
comment by Pattern · 2021-05-25T18:10:38.426Z · LW(p) · GW(p)
we're trying to have a competition here!).

How much time did you spend coming up with that strategy?

Replies from: simon
comment by simon · 2021-05-26T02:40:18.048Z · LW(p) · GW(p)

Good point. I should have anticipated strategies that require less effort to be more popular.

Replies from: Pattern
comment by Pattern · 2021-05-26T04:19:33.350Z · LW(p) · GW(p)

Returns on time aside (I meant that question seriously - plotting out a returns on compute versus compute (time) curve sounds interesting***):

It* requires less effort because 'cooperation' reduces effort, while 'competition' increases it**.

(This is also measurable in the split between the traveler and the players.)


*The strategy

**effort

***In particular, getting a sense for something like the marginal returns on time invested, and then comparing it across problems.

Replies from: simon
comment by simon · 2021-05-28T06:21:15.767Z · LW(p) · GW(p)

It* requires less effort because 'cooperation' reduces effort, while 'competition' increases it**.

In general, one would define cooperation in games as strategies that lead to better overall gains, and ignore effort involved in thinking up the strategy. In this case, there was an easy cooperative strategy, but that isn't true in general: in the Darwin Game, for example, designing a cooperative strategy was more complicated than a simple 3-bot defect strategy. 3-bot didn't do well, but possibly could have if a lot of non-punishing simulators had been submitted (there weren't).

Also, even in this particular case, you could have had better results if you had taken the effort to get more to follow the same strategy. The rules did not explicitly forbid coordination, even by non-Lesswrongers, so you could have recruited a horde of acquaintances to spam 1-bids. (that might have been against the spirit of the rules, but you could have asked abstractapplic about it first, I guess).

Replies from: Pattern
comment by Pattern · 2021-05-29T20:45:25.468Z · LW(p) · GW(p)
In general, one would define cooperation in games as strategies that lead to better overall gains, and ignore effort involved in thinking up the strategy.

You should change your username to 'one' then.*

Imagine a game where the 'optimal strategy' is more difficult to calculate than the optimal strategy in chess. Or, suppose you're playing a chess game. You know how to calculate the optimal strategy. Unfortunately, it will take 10 years to calculate on your supercomputer, and you can't take 10 years to make the first move. To neglect time as a resource is to neglect that 'the optimal strategy' must be executed after it is formulated, not before.


The rules did not explicitly forbid coordination, even by non-Lesswrongers, so you could have recruited a horde of acquaintances to spam 1-bids. (that might have been against the spirit of the rules, but you could have asked abstractapplic about it first, I guess).

Do you want to make a bet concerning abstractapplic's response to this question?


*I expect Neo hasn't been taken yet.

comment by SarahNibs (GuySrinivasan) · 2021-05-24T23:59:48.920Z · LW(p) · GW(p)

What I wrote to abstractapplic:

Vague price sense > guessing how others might bid > guarding against someone aiming for significantly higher ROI than you did > exact price sense, I think?

Replies from: simon
comment by simon · 2021-05-25T15:34:42.236Z · LW(p) · GW(p)

Yeah, and actually 1-bidding can be a good strategy even from a selfish perspective if you can get enough people to coordinate on it, since a small enough number of high bidders will run out of money and the 1-bidders make a large profit on what they do win, though it's not stable against defection (2-bidders win in the 1-bidder-filled environment).

Replies from: Pattern
comment by Pattern · 2021-05-25T18:07:33.690Z · LW(p) · GW(p)
(2-bidders win in the 1-bidder-filled environment)

But:

  • are the 2-bidders stable against 'defection'?
  • There weren't any 2-bidders.
Replies from: simon
comment by simon · 2021-05-26T02:37:29.619Z · LW(p) · GW(p)

are the 2-bidders stable against 'defection'?

Of course not, they lose to 3-bidders. I wouldn't consider that "defection" in the same way though, since the 1-bidding is presumably an attempt at coordination and the 2-bidding would be exploiting that coordination and not directly a coordination attempt.

There weren't any 2-bidders.

Sure, but if 1-bidding were to become popular in similar problems, there would start to be 2-bidders.

comment by Measure · 2021-05-24T21:38:44.528Z · LW(p) · GW(p)

I am bidder C above. Mostly I was trying to win the Jewel Beetle and hope for a lucky roll. I was expecting a larger number of entries and was optimizing for chance to win rather than expected profit. I didn't put a ton of effort into modeling the exact EVs of the lots, and many of my bids were adjusted to be slightly above the nearest multiple of five.

Replies from: Ericf
comment by Ericf · 2021-05-26T15:04:36.351Z · LW(p) · GW(p)

Same here as "D" - given a goal of "score highest" winning a high value Beetle auction was the best way to do it. I did try to tweak my valuations such that I would either win a bunch of auctions up front, and then not be able to bid on Beetle, or not win the early auctions, and then have a chance at a Beetle win.

Sadly, the EV (excluding beetle) was only ~600 sp, so there was no manipulation of "let others win the first few auctions, then when they run out of money clean up with low bids at the end"

Replies from: GuySrinivasan
comment by SarahNibs (GuySrinivasan) · 2021-05-26T15:47:06.112Z · LW(p) · GW(p)

The Jewel Beetle was weird. It was what, like 8% to auto-win everything by winning the Beetle? Except there was just one roll, overall. So in each group of four, one person auto-wins, and then it becomes a cross-group auction where whoever got the Beetle for way less ends up winning. Seems like with very few people participating overall, going for the Beetle caps your odds of winning at 8%, which is not great. With very many people participating, like 100, going for the Beetle caps your odds of winning at the chance there is not a cohort with pure non-beetlers, otherwise whichever of them wins the beetle probably just wins.

Replies from: Ericf
comment by Ericf · 2021-05-26T16:27:24.750Z · LW(p) · GW(p)

Very good points. I actually made a counting error, and estimated the odds of Beetle Wins at ~20%. And then I also failed to account for more than 4 players.

comment by Pattern · 2021-05-25T18:05:22.861Z · LW(p) · GW(p)
Pitting players against each other was interesting. When designing this challenge, I devoted a decent amount of thought to troll-proofing – the reason you have a finite budget is so players bidding unreasonably high could do limited damage – but what actually threatened to derail the scenario was cooperation; even in a zero-sum winner-take-all system, two of the seven entries were “1sp on everything”. If there had been a few more bidders like that, and/or if more than one of them had happened to end up in the same universe, this could have been a very different game.

The specter of super-rationality.