
Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-24T07:04:04.681Z · LW · GW

Update on Epsilon resonance:

It's cubic, not sine. I can fit Maria's Epsilon data with a curve that rounds to exactly the correct value at every data point, and I can separately do the same for Janelle's data. I still need to check whether a single curve plus a per-pilot multiplier can round exactly for both Maria and Janelle, but the two fitted curves do look like they are at least fairly close to exact multiples of each other.

Interestingly, no x-value rounding needs to be assumed, at least to get the correctly rounding values for Maria and Janelle separately. So, perhaps the x (heteropneum amplitude) values are exact?

The cubic curve does take a big dive at high heteropneum amplitudes, but fortunately not until after Earwax's ~3.2 amplitude. Also, the fit for Maria's 0.57-amplitude result of 0.1 is actually around 0.096. Will's result of 0.21 therefore suggests he is at least around 2.13 times stronger than Maria using Epsilon, and he is projected to get at least about 3.85 against a 3.2-amplitude heteropneum. So Will using Epsilon still looks like a safe pick to survive if we can't find guaranteed survival another way.
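The "at least around 2.13" bound comes from treating Will's 0.21 as a value rounded to two decimals and dividing by Maria's fitted 0.096 at the same amplitude. A minimal sketch of that interval arithmetic, assuming round-to-nearest at the second decimal place (the constant and function names are illustrative, not from the puzzle data):

```python
# Rounding-interval arithmetic for Will's Epsilon multiplier (illustrative sketch).
# Assumption: the reported 0.21 was rounded to two decimals, so the true
# value lies in [0.205, 0.215).

MARIA_FIT_AT_057 = 0.096  # fitted value of Maria's Epsilon cubic at amplitude 0.57
WILL_REPORTED = 0.21      # Will's only Epsilon result, at amplitude 0.57
HALF_STEP = 0.005         # half of the last reported decimal place


def multiplier_bounds(reported, fitted, half_step=HALF_STEP):
    """Range of pilot multipliers consistent with a rounded observation."""
    return (reported - half_step) / fitted, (reported + half_step) / fitted


low, high = multiplier_bounds(WILL_REPORTED, MARIA_FIT_AT_057)
print(f"Will/Maria Epsilon multiplier in [{low:.3f}, {high:.3f})")
```

Multiplying Maria's fitted value at amplitude 3.2 by `low` gives the lower-bound projection for Will against Earwax, provided the shared-curve-times-multiplier assumption holds.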

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-23T05:20:37.489Z · LW · GW

Initial impressions:

Even though Janelle probably always uses Beta and Maria probably always uses Delta, we can get an idea of the characteristics of each resonance type by comparing their hypothetical results against heteropneums weak enough for them to overwhelm. 

From eyeballing graphs of strength v. heteropneum amplitude for each resonance type and both pilots:

The qualitative behaviour of each resonance looks similar between Janelle and Maria, but quantitatively different (likely a simple multiplicative factor, but I should check!). The multiplier per pilot is different for the different resonance types (so, e.g., Janelle is about as strong as Maria with Beta resonance, but weaker with other resonance types).

And for the different resonance types the graphs are as follows:

Alpha does not depend on enemy strength and maybe has two clumps.

Beta does not depend on enemy strength.

Gamma's points seem to line up on straight, mostly slanted lines from a more-or-less common origin at zero enemy strength, suggesting a strong dependence on enemy strength. However, one of the lines is flat and too low, so we need a way to find out which line we'll end up on.

Delta has a gentle upward trend for Maria (too noisy to detect for Janelle). This does not appear to be a selection effect, as Maria is always handily beating the heteropneums.

Epsilon has a curve that looks like a parabola at first, but then slows down, so maybe a sine curve? It is very consistent looking (not noisy) so should be possible to have an accurate fit for it. The curve looks a bit distorted in places for Janelle but this is likely just rounding due to her very low values at this resonance. 

Zeta and Eta have points lined up on flat lines. For Zeta one of those lines is at zero.

Based on this, some candidate responses: 

  1. If we can figure out which line we'll end up on, possibly Gamma as used by Janelle. We need to be confident however that we'll end up on a good line.
  2. Ditto for Zeta as used by Corazon, but with the additional caveat that we need to know that her past hypothetical results of 1.98 were low tier results (and she'll get high tier this time). If the 1.98's were high tier, she'll lose.
  3. If we can't figure out the information needed for either of the previous, the safe choice appears to be...Epsilon as used by Will. This may seem surprising at first glance since Will's only hypothetical result with Epsilon resonance was a measly 0.21. However, this result was at Heteropneum amplitude 0.57, near the bottom of the Epsilon power curve as seen for Maria and Janelle. If Will has the same Epsilon power curve but with a multiplier, he is around twice as strong as Maria with Epsilon resonance (but check rounding error bars!), and should confidently beat Earwax as long as the Epsilon power curve doesn't take a surprisingly sharp turn to decline between 3.12, where Maria last overwhelms heteropneums with Delta, and Earwax's amplitude of 3.2. However, Will will not overwhelm Earwax, and either option 1 or 2 could do so if successful, so if we can figure out the necessary information for either of those options, they would be preferable.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-28T06:21:15.767Z · LW · GW

It* requires less effort because 'cooperation' reduces effort, while 'competition' increases it**.

In general, one would define cooperation in games as strategies that lead to better overall gains, and ignore the effort involved in thinking up the strategy. In this case there was an easy cooperative strategy, but that is not true in general: for example, in the Darwin Game, designing a cooperative strategy was more complicated than a simple 3-bot defect strategy. 3-bot didn't do well, but possibly could have if a lot of non-punishing simulators had been submitted (they weren't).

Also, even in this particular case, you could have had better results if you had made the effort to get more people to follow the same strategy. The rules did not explicitly forbid coordination, even by non-LessWrongers, so you could have recruited a horde of acquaintances to spam 1-bids. (That might have been against the spirit of the rules, but you could have asked abstractapplic about it first, I guess.)

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-26T02:40:18.048Z · LW · GW

Good point. I should have anticipated strategies that require less effort to be more popular.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-26T02:37:29.619Z · LW · GW

are the 2-bidders stable against 'defection'?

Of course not, they lose to 3-bidders. I wouldn't consider that "defection" in the same way though, since the 1-bidding is presumably an attempt at coordination and the 2-bidding would be exploiting that coordination and not directly a coordination attempt.

There weren't any 2-bidders.

Sure, but if 1-bidding were to become popular in similar problems, there would start to be 2-bidders.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-25T15:34:42.236Z · LW · GW

Yeah, and actually 1-bidding can be a good strategy even from a selfish perspective if you can get enough people to coordinate on it, since a small enough number of high bidders will run out of money and the 1-bidders make a large profit on what they do win, though it's not stable against defection (2-bidders win in the 1-bidder-filled environment).

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-24T23:49:27.055Z · LW · GW

Bidder G reporting in... 

Looks like my incorrect speculations on the exact models were likely not helpful. I also did not expect the 1-bidders (a fine strategy against real duplicates like in the scenario given, but we're trying to have a competition here!).

Comment by simon on A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction · 2021-05-24T02:21:56.238Z · LW · GW

I'm assuming that BST is British Summer Time and the deadline has passed. Remarks about the problem and my bid before abstractapplic posts the results:

Decision on how aggressively to bid

With some exceptions for the jewel beetle and mild boars, discussed below, I generally estimated the EV and bid lower by a scaling factor. The scaling factor was pretty ad hoc and not based on some sophisticated game theory, as I don't really know how aggressively people are going to bid. I did not adjust the scaling factor based on the lot number.

One Schelling point is to bid a total of 300, so I figure I should probably bid higher than that on average (given the revenue up for grabs is more than twice that). Another would be to bid at the minimum end of the observed range for each lot, so I could have tried to beat that if the minimums were reasonable, but didn't get around to actually checking this, except that I did note that my bids were above my expectations for what the true minimums were in the cases where I got around to estimating that. 

I assume other people are also bidding above these points. If that is not the case, I will win a lot of bids, but likely lose in profit to someone making higher per-lot profit on fewer lots.

Analysis of revenue from different carcass types:

Jungle Mammoths:

The Jungle Mammoths (=elephants?) looked consistent with a formula of 31+4d6-3dsd so I assumed that their EV was 45-3dsd.
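The step from 31+4d6-3dsd to an EV of 45-3dsd uses E[4d6] = 4 × 3.5 = 14. A small exact check by enumerating all 6^4 rolls (the function names are illustrative):

```python
from fractions import Fraction
from itertools import product


def expected_sum_of_dice(n, sides):
    """Exact expected value of n fair dice with faces 1..sides, by enumeration."""
    outcomes = list(product(range(1, sides + 1), repeat=n))
    return Fraction(sum(map(sum, outcomes)), len(outcomes))


def mammoth_ev(dsd):
    """EV under the hypothesised Jungle Mammoth formula 31 + 4d6 - 3*dsd."""
    return 31 + expected_sum_of_dice(4, 6) - 3 * dsd


print(expected_sum_of_dice(4, 6))    # 14
print(mammoth_ev(0), mammoth_ev(5))  # 45 30
```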

Dragons:

The dragons look like they all have similar characteristics in their drops over time, with in particular a big drop of around 30 value between 4 and 5 dsd (except gray dragon which has too little data to tell). One possibility would be that each has their own non-time variant distribution which is added with a "dragon curve". If I had more time, I would have tried to figure out the dragon curve and the separate distributions based on comparing the different dragon types (or rule it out and look for another hypothesis). As it is, I estimated the dragons in a pretty ad-hoc manner (eyeballing graphs mostly).

I do note that red dragon has some interesting even/odd behaviour, as it is always odd from 1 dsd to 6 dsd, and always even from 7 dsd to 10 dsd. If the "dragon curve" hypothesis is true, then this could be explained by an always-even or always-odd "red distribution" (e.g. 2*2d12?) combined with a "dragon curve" that switches from odd to even at that point.
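Why an always-even component would leave the parity entirely up to the dragon curve: adding any even number preserves parity. A tiny sketch, assuming the hypothetical 2*2d12 "red distribution" and made-up curve values 31 and 30:

```python
from itertools import product

# Hypothetical always-even "red distribution": 2 * (2d12), as guessed above.
RED_VALUES = [2 * (a + b) for a, b in product(range(1, 13), repeat=2)]


def parities_with_curve(curve_value):
    """Set of parities of (red roll + curve value) over every 2d12 outcome."""
    return {(v + curve_value) % 2 for v in RED_VALUES}


# An always-even component leaves exactly one parity: the curve's own.
print(parities_with_curve(31))  # {1}: odd curve value -> always odd
print(parities_with_curve(30))  # {0}: even curve value -> always even
```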

Mild boars:

For the mild boars (=pigs?), I tried to figure some model out that would match the observed qualitative behaviour and came up with rolling two d20s and setting each individually to 0 if less than or equal to the dsd. However, this did not match the quantitative characteristics, as it was consistently too pessimistic at low dsd and too optimistic at high dsd.
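The first mild-boar model can be scored exactly, which is how "too pessimistic at low dsd, too optimistic at high dsd" would be checked against the observed averages. A sketch that enumerates all 400 rolls and cross-checks a closed form (the function names are illustrative; the observed boar data is not reproduced here):

```python
from fractions import Fraction
from itertools import product


def two_d20_ev(dsd):
    """Exact EV of the first mild-boar model by enumerating all 400 rolls:
    roll 2d20 and set each die to 0 if it is <= dsd."""
    total = sum(
        sum(0 if die <= dsd else die for die in dice)
        for dice in product(range(1, 21), repeat=2)
    )
    return Fraction(total, 20 ** 2)


def two_d20_ev_closed_form(dsd):
    """Same EV in closed form: each die contributes (210 - dsd*(dsd+1)/2) / 20,
    where 210 = 1 + 2 + ... + 20."""
    return 2 * Fraction(210 - dsd * (dsd + 1) // 2, 20)


for dsd in range(0, 11, 2):
    print(dsd, float(two_d20_ev(dsd)))
```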

So, instead of taking the hint that I was wrong, I doubled down and added some epicycles. Namely, rolling 3 dice, setting each to zero if below dsd, then taking the top two, except that if you rolled a zero, you had to include the zero. (That's a pretty crazy hypothesis as stated, but maybe slightly less crazy in the equivalent formulation of adding the dsd to each die, taking the top two dice, and then setting any die over 20 to zero).
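The claim that the two formulations of the 2-out-of-3 model are equivalent can be verified by brute force over all 20^3 rolls. This sketch zeroes dice that are less than or equal to the dsd (the convention from the first model; with that convention the two formulations agree exactly); the function names are hypothetical:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def keep_two_zeroing(dice, dsd):
    """Formulation 1: zero each die <= dsd; any zeros must be kept, and the
    remaining slots (two in total) go to the highest non-zero dice."""
    zeros = [0 for die in dice if die <= dsd]
    rest = sorted((die for die in dice if die > dsd), reverse=True)
    return sum((zeros + rest)[:2])


def keep_two_shifted(dice, dsd):
    """Formulation 2: add dsd to each die, keep the top two, then set any kept
    die over 20 to zero."""
    top_two = sorted((die + dsd for die in dice), reverse=True)[:2]
    return sum(0 if value > 20 else value for value in top_two)


def distribution(rule, dsd, n_dice=3):
    """Exact outcome counts over all 20**n_dice equally likely rolls."""
    return Counter(rule(dice, dsd) for dice in product(range(1, 21), repeat=n_dice))


def expected_value(counts):
    n = sum(counts.values())
    return Fraction(sum(value * c for value, c in counts.items()), n)


for dsd in range(0, 11, 2):
    a = distribution(keep_two_zeroing, dsd)
    b = distribution(keep_two_shifted, dsd)
    print(dsd, a == b, float(expected_value(a)))
```

The equivalence holds because, die by die, both versions zero a die with probability dsd/20 and otherwise leave a value uniform on dsd+1..20, and both keep zeroed dice first.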

This seemed to predict the low-dsd mild boars a lot better, but was still optimistic on the high-dsd mild boars. Due to low numbers, a close fit on the high-dsd boars might be less necessary though. It also predicts a bimodal distribution with a trough at around 22 and while you can sort of see something like a hint of that in the data, it is not very convincing. Going to 4 dice adversely affected the early mild boar fit and seemed worse overall.

Anyway, I decided to roll with it (the 2 out of 3 d20s model), but since I am not super convinced, I limited my bids on the 8 dsd mild boars (lots 9 and 11) to 9sp, equal to the ceiling of the average of observed value for 8 dsd mild boars. Due to the "winner's curse", in the very likely event that I am wrong on their distribution I will probably take a loss on these.

Jewel Beetle

As previously remarked on by other commenters, the jewel beetle (or "lottery ticket beetle" as I think of it) has a high-variance distribution. It looks more or less like a power law. In fact, it looks like such an extreme power law that it won't even have a finite expected value, as the rare extreme outliers will have values large enough to outweigh their low frequency.

So, if I were in the position of the hypothetical scenario provided, I would probably bid a lot for the lottery ticket beetle.

However, I'm not in that situation. I am instead competing for the glory of being Numbah One. And while the jewel beetle might have an extreme value, it probably doesn't. So, I reduced my jewel beetle bid to the median jewel beetle value of 12 instead of gambling on an outlier here.
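A standard illustration of why the median is the safer bid here: for a Pareto distribution the mean is infinite once the tail exponent alpha is at most 1, while the median stays finite. This sketch uses Pareto as a stand-in for "power law", with hypothetical parameters that are not fitted to the jewel beetle data:

```python
import math


def pareto_mean(alpha, x_min):
    """Mean of a Pareto(alpha, x_min) distribution; infinite when alpha <= 1."""
    return math.inf if alpha <= 1 else alpha * x_min / (alpha - 1)


def pareto_median(alpha, x_min):
    """Median of a Pareto(alpha, x_min) distribution; finite for any alpha > 0."""
    return x_min * 2 ** (1 / alpha)


# Hypothetical parameters, NOT fitted to the jewel beetle data.
for alpha in (2.0, 1.0, 0.8):
    print(alpha, pareto_mean(alpha, 5), round(pareto_median(alpha, 5), 2))
```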

I also note that new jewel beetles seem to tend to be lower in value than old ones. Not sure if this is just random; my prior is generally against such an effect.

Actual Bid

71,33,16,24,20,51,19,16,9,15,9,18,12,20,26,31,16,42,16,33

Comment by simon on A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction · 2021-05-17T23:19:16.095Z · LW · GW

Is our profit evaluated based on actual results, or based on expected value?

Comment by simon on Does butterfly affect? · 2021-05-15T02:03:52.818Z · LW · GW

Sure, the butterfly is really minor compared to everything else going on, and so only "causes" the hurricane if you unnaturally consider the butterfly as a variable while many more important factors are held fixed.

But, I don't believe the assassination of Franz Ferdinand is in the same category. While there's certainly a danger that hindsight could make certain events look more pivotal than they really were, the very fact that we have a natural seeming chain of apparent causation from the assassination to the war is evidence against it being a "butterfly effect". 

Comment by simon on D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-15T01:41:40.467Z · LW · GW

The amount Carver gains from a Yeti carcass is given by 70+1d6-[DSD]d6

 

No, I already went over this with GuySrinivasan lol...

line # 89 (carcass # 88): Yeti,0,60sp,60sp,77sp

Anyways, I'm assuming that's a typo there and you meant to put in 72.

Also

60-28*[DSD] for Snow Serpents 

that should be a 20.

 

This one really brought home to me the usefulness of strong (yet correct) priors. 

Assuming that the typo wasn't in the d6, credit to GuySrinivasan for correctly defending the d6 against the weight of evidence for the d5, and also for insisting that a triangular age distribution deserved a higher prior probability than a weighted average that just happens to come out triangular would have.

This puzzle was made a lot easier by the simplicity of the model: everything was independent of everything else, except that bids and value obtained depended on monster type and days since death (which the problem primed us to expect), and there were no hidden variables beyond the randomness needed to actually have something to work out. I don't particularly feel like a Bayesian superintelligence, though maybe all problems look like this to one sufficiently advanced.

Looking forward to whatever non-puzzle you have in mind for Monday. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-11T01:08:20.413Z · LW · GW

Anyway, in the spirit of tumbling platonic solids:

One possible distribution for the age numbers would be the distribution generated by min(d12,d12)-1. This is not the same as the 1,2,3,4...12 triangular distribution, but rather a 1,3,5,7,...23 triangular distribution. (The 1,2,3...12 distribution would be generated by min (d12, d13)-1).

And checking the likelihood - this one is actually better. 

LOG10 likelihood

-1672.05 for 1,2,3,...12

-1671.43 for 1,3,5,...23
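Both triangular shapes can be checked by enumerating the dice, and the two reported log10 likelihoods translate into a likelihood ratio of about 4 in favour of the 1,3,5,...,23 shape. A sketch (function names are illustrative; the age counts themselves are not reproduced here, so `log10_likelihood` is shown without real data):

```python
import math
from collections import Counter
from itertools import product


def min_dice_age_weights(a, b):
    """Counts of min(da, db) - 1 over all a*b equally likely (da, db) outcomes."""
    counts = Counter(min(x, y) - 1 for x, y in product(range(1, a + 1), range(1, b + 1)))
    return [counts[age] for age in range(12)]


print(min_dice_age_weights(12, 12))  # [23, 21, 19, 17, 15, 13, 11, 9, 7, 5, 3, 1]
print(min_dice_age_weights(12, 13))  # [24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2]


def log10_likelihood(counts, weights):
    """log10 P(observed counts per age | categorical model with these weights).

    The multinomial coefficient is left out: it is the same for every model
    scored on the same data, so it cancels in likelihood ratios.
    """
    total = sum(weights)
    return sum(c * math.log10(w / total) for c, w in zip(counts, weights) if c)


# Ratio implied by the two log10 likelihoods reported above.
print(round(10 ** ((-1671.43) - (-1672.05)), 2))  # about 4.17
```

The second list is proportional to 12, 11, ..., 1, i.e. the 1,2,...,12 triangle (whose weights sum to 78 = 13 × 6).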

P.S. I was terse in the previous comment because of time constraints. About the difficulty of the triangular distribution, I was thinking it wasn't that unlikely anyway because in the previous problem abstractapplic generated a weighted average by taking a random entry from a list that contained duplicates, and a suitable list could be generated easily enough using a for loop. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T15:55:16.138Z · LW · GW

Looks like the likelihood for triangular is over a million times better (to log-nearest order of magnitude ~10^-1672 v. ~10^-1679) than the 1/6 drop per turn exponential. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T05:12:57.727Z · LW · GW

The age 0 amount is higher than expected from the model distribution, but it's nowhere near 2.5x the age 1 amount. I have:

Overall 289 age 0, 233 age 1 (expected 258, 236.5)

Snow Serpents 103 age 0, 92 age 1

Winter Wolves 141 age 0, 108 age 1

Yeti 45 age 0, 33 age 1
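The expected figures 258 and 236.5 are reproduced exactly by the 1,2,...,12 triangle applied to 129 × 13 = 1677 recorded carcasses. A sketch that also computes the same two expectations under the 1,3,...,23 triangle for comparison (names are illustrative):

```python
from fractions import Fraction

N_RECORDED = 129 * 13  # 1677 carcasses in the record

LINEAR_WEIGHTS = [12 - age for age in range(12)]   # 12, 11, ..., 1  (1,2,...,12 triangle)
ODD_WEIGHTS = [23 - 2 * age for age in range(12)]  # 23, 21, ..., 1  (1,3,...,23 triangle)


def expected_counts(weights, n=N_RECORDED):
    """Expected number of carcasses at each age under a weighted age model."""
    total = sum(weights)
    return [Fraction(n * w, total) for w in weights]


linear = expected_counts(LINEAR_WEIGHTS)
odd = expected_counts(ODD_WEIGHTS)
print(float(linear[0]), float(linear[1]))                # 258.0 236.5
print(round(float(odd[0]), 1), round(float(odd[1]), 1))  # 267.9 244.6
```

Against the observed 289 and 233, both triangles under-predict age 0, the 1,3,...,23 version by less.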

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T01:30:21.447Z · LW · GW

A potential model of the full problem (involves questionable numerology):

There are 13 lots currently, and the number of carcasses in the record is divisible by 13 (129*13). If we include the current auctions, that's 13*130, or 13*13*10. 

So, I'll assume that all auctions have 13 lots.

The individual monster types aren't divisible by 13 (except Snow Serpents), nor are they if we include the current auctions (except Winter Wolves). However, the Wolf:Yeti ratio seems very close to 3, and if the overall ratios were 2:5:6 that would fit in with the 13 theme and seems close enough to the Yeti:Serpent:Wolf ratio.

The age distributions look fairly triangularish, with a maximum age of 11. One possible way to express that would be there is on average 1 carcass of age 11 for every 2 aged 10, up to every 12 aged 0. And what's 1+2+...12? Of course - a multiple of 13. Specifically 13*6.

Now, it would be nice if looking at the data in blocks of 13*6 showed some pattern, but I don't see one, nor is the data a multiple of 6. No matter, we will press on without such empirical validation.

I also note that, in the current auction, the early lots look newer than the late lots. Coincidence? Probably.

So, model:
Base Lot Generation (low confidence):
Auction of 13 lots each day
Each lot assigned an age from 0 to 11 by a random distribution weighted by {12-age}
Each lot assigned an animal type by a random distribution weighted 2:5:6 for Yeti:Serpent:Wolf

Bids: There are two non-Carver bidders, one of whom bids:
60-20*age for Snow Serpents
50-12*age for Winter Wolves
55-6*age for Yeti

and the other one bids:
9+d8 for Snow Serpents
19+d4 for Winter Wolves
29+d6 for Yeti

whereas Carver bids (not high confidence):
7+2d10 for Snow Serpents
31+2d8-3*age for Winter Wolves
32+2d20-2*age for Yeti

Revenue: (credit to GuySrinivasan)
20+2d6 for Snow Serpents
25+4d6-2*age for Winter Wolves
72+1d6-{age}d6 for Yeti (assuming that the prior for d6's over d5's is stronger than the 45 times better fit for a d5 in the base part of this formula)
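The model above can be transcribed into a small simulator for sanity-checking it against the record. This is a sketch under the same low confidence as the model; the use of Python's `random` module, the fixed seed, and returning negative linear bids as-is are assumptions of the sketch, not part of the model:

```python
import random

# Transcription of the low-confidence model above into a simulator (sketch).

AGES = range(12)
AGE_WEIGHTS = [12 - age for age in AGES]  # weight {12 - age}
KINDS = ["Yeti", "Snow Serpent", "Winter Wolf"]
KIND_WEIGHTS = [2, 5, 6]                  # Yeti:Serpent:Wolf


def roll(n, sides, rng):
    """Sum of n independent d{sides}; 0 when n == 0."""
    return sum(rng.randint(1, sides) for _ in range(n))


def generate_day(rng, lots=13):
    ages = rng.choices(AGES, weights=AGE_WEIGHTS, k=lots)
    kinds = rng.choices(KINDS, weights=KIND_WEIGHTS, k=lots)
    return list(zip(kinds, ages))


def linear_bidder(kind, age):
    base, slope = {"Snow Serpent": (60, 20), "Winter Wolf": (50, 12), "Yeti": (55, 6)}[kind]
    return base - slope * age  # can go negative at high ages


def dice_bidder(kind, rng):
    offset, sides = {"Snow Serpent": (9, 8), "Winter Wolf": (19, 4), "Yeti": (29, 6)}[kind]
    return offset + roll(1, sides, rng)


def carver_bid(kind, age, rng):
    if kind == "Snow Serpent":
        return 7 + roll(2, 10, rng)
    if kind == "Winter Wolf":
        return 31 + roll(2, 8, rng) - 3 * age
    return 32 + roll(2, 20, rng) - 2 * age


def revenue(kind, age, rng):
    if kind == "Snow Serpent":
        return 20 + roll(2, 6, rng)
    if kind == "Winter Wolf":
        return 25 + roll(4, 6, rng) - 2 * age
    return 72 + roll(1, 6, rng) - roll(age, 6, rng)  # independent d6s per day


rng = random.Random(0)
for kind, age in generate_day(rng):
    bids = (linear_bidder(kind, age), dice_bidder(kind, rng), carver_bid(kind, age, rng))
    print(f"{kind:12s} age {age:2d}  bids {bids}  revenue {revenue(kind, age, rng)}")
```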

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-09T19:28:33.525Z · LW · GW

Nice analysis! But I am not so confident on how strong the prior should be and am somewhat torn on the conclusion.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-09T06:08:31.684Z · LW · GW

OK I calculated the likelihoods of getting the full yeti revenue data set for those distributions and got the following results:

For 78-{age+1}d6 (which is equivalent to 71+1d6-{age}d6):

likelihood ~ 10^-192

For 72 + 1d5 - {age}d6:

likelihood ~10^-189

So the 1d5 version is literally 1000 times better fit to the data, and I doubt that the prior for 1d6 over 1d5 is that strong. Besides, the base value of 72 is a round number in a way, which might increase the 1d5 prior a bit. I'd definitely bet on the 1d5 version over the 1d6.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T23:32:23.549Z · LW · GW

OK, I see such an argument for the die used, but the base value proposed can't be correct with that die.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T22:01:14.878Z · LW · GW

Your yeti revenue distribution would predict a maximum of 76 at day 0, but we see 77. We also don't see anything below 73. I suggest 72 + 1d5 at day 0. 

The full formula would be then 72 + 1d5 - {age}d6, where these are independent d6 rolls, and not a single roll multiplied by the age (I think this is what you meant, but clarifying since I wasn't 100% sure on the terminology used). (The other interpretation would lead to too high variation at high ages, I think).
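The variance argument can be made exact: {age} independent d6 rolls add age × 35/12 to the variance, while a single d6 multiplied by the age adds age² × 35/12. A sketch computing both exact distributions for the proposed 72 + 1d5 - {age}d6 (helper names are illustrative):

```python
from collections import Counter
from fractions import Fraction


def convolve(a, b):
    """Distribution of X + Y for independent X ~ a, Y ~ b (dicts value -> prob)."""
    out = Counter()
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] += px * py
    return out


def die(sides):
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def point(value):
    return {value: Fraction(1)}


def yeti_independent(age, base=72, bonus=5):
    """base + 1d{bonus} - {age}d6, with age independent d6 rolls."""
    pmf = convolve(point(base), die(bonus))
    minus_d6 = {-face: p for face, p in die(6).items()}
    for _ in range(age):
        pmf = convolve(pmf, minus_d6)
    return pmf


def yeti_single_roll(age, base=72, bonus=5):
    """Alternative reading: base + 1d{bonus} - age * (one d6 roll)."""
    scaled = Counter()
    for face, p in die(6).items():
        scaled[-age * face] += p  # accumulate: at age 0 every face maps to 0
    return convolve(convolve(point(base), die(bonus)), scaled)


def variance(pmf):
    mean = sum(x * p for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())


for age in (0, 2, 4, 8):
    print(age, variance(yeti_independent(age)), variance(yeti_single_roll(age)))
```

At day 0 both readings give 73 to 77 inclusive, matching the observed day-0 range; they only diverge as the age grows.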

The formulas for Snow Serpent and Winter Wolf look consistent with the data to me.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T20:37:53.331Z · LW · GW

Nice graphs in the colab. 

For the bidder model, the Yeti day=3 data is consistent with the Days 0,1,2 line and not with the days >=4 distribution, and in general, whichever of the new-carcass line or old-carcass distribution would be higher is what we see.

So, my model would be that there are two other bidders, one who makes deterministic linear bids and one who bids in a uniform distribution (see my reply to my top-level comment); both bid on everything (well, I guess as long as the bid is >0), but we don't see the lower bid.
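Under this two-bidder model the record only shows the higher of the two non-Carver bids. A sketch listing which visible bids are possible per age, using the old-carcass ranges estimated elsewhere in this thread (10-17, 20-23, 30-35) and leaving Carver's bids out (names are illustrative):

```python
# Sketch of the two-bidder model: the record only shows the higher of the
# deterministic linear bid and the uniform "old-carcass" bid.
LINEAR = {"Snow Serpent": (60, 20), "Winter Wolf": (50, 12), "Yeti": (55, 6)}
UNIFORM_RANGE = {"Snow Serpent": (10, 17), "Winter Wolf": (20, 23), "Yeti": (30, 35)}


def visible_bids(kind, age):
    """Sorted set of highest non-Carver bids that could appear in the record."""
    base, slope = LINEAR[kind]
    linear = base - slope * age
    low, high = UNIFORM_RANGE[kind]
    return sorted({max(linear, u) for u in range(low, high + 1)})


for age in range(6):
    print("Yeti", age, visible_bids("Yeti", age))
```

For Yeti, age 3 can only show the linear bid of 37, while from age 4 (linear bid 31) onward the uniform range shows through.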

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T08:36:02.312Z · LW · GW

To save anyone time if they want to read my decision, it's the same as GuySrinivasan's better formatted decision except on lots 10 and 11, where I took seriously the statement that we're indifferent to work and bid 1 sp higher than GuySrinivasan did.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T08:29:53.510Z · LW · GW

Above I explicitly noticed that Carver's bids would distort the distribution, set that aside, then, when I encountered a skewed distribution, failed to think to attribute it to Carver's bids. But of course that would be the obvious explanation. 

I looked at this, and found that the actual distribution does match pretty well what I'd expect if the true distribution of non-Carver bids were a uniform distribution for old carcasses, though the skew effect was only really dramatic for Snow Serpents anyway.

(edit): Assuming this is correct, we can now better calculate the probability of losing if we bid lower.

For Lot 10: 23 v. 22 sp on 7 day wolf, we cut win rate by exactly 1/3 going down to 22, the revenue is a bit over 25 by my linear fit or exactly 25 by GuySrinivasan's formula; meaning, if GuySrinivasan is right then the average profit is exactly the same either way.

For Lot 11: 22 v. 21 sp on 8 day wolf, we exactly halve win rate going down to 21, the revenue is a bit over 23 by my linear fit or exactly 23 by GuySrinivasan's formula; meaning, if GuySrinivasan is right then the average profit is exactly the same either way.

(edit2): A further note - Even though accounting for Carver's bids when calculating the old-carcass bid distribution dramatically reduces the loss probability from bidding low on Snow Serpents, it's still not worth it to bid low on them (since 1/8 probability of losing more than makes up for profit going from 9 to 10). However, if the parameters of the problem had been changed a bit, this could have been a nice trap.
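The lot-10, lot-11 and snow serpent comparisons can be checked with exact fractions. This sketch assumes the only relevant rival is the uniform old-carcass bidder, and that a tie counts as a loss (the assumption that yields the "exactly 1/3" and "exactly halve" figures above); the function names are illustrative:

```python
from fractions import Fraction


def win_probability(my_bid, rival_low, rival_high):
    """P(my_bid beats a rival bid uniform on rival_low..rival_high); ties lost."""
    outcomes = range(rival_low, rival_high + 1)
    wins = sum(1 for r in outcomes if my_bid > r)
    return Fraction(wins, len(outcomes))


def expected_profit(my_bid, mean_revenue, rival_low, rival_high):
    return win_probability(my_bid, rival_low, rival_high) * (mean_revenue - my_bid)


# Lot 10: 7-day wolf, mean revenue 25 (25+4d6-2*7); rival 19+d4 -> 20..23
print(expected_profit(23, 25, 20, 23), expected_profit(22, 25, 20, 23))  # 3/2 3/2
# Lot 11: 8-day wolf, mean revenue 23
print(expected_profit(22, 23, 20, 23), expected_profit(21, 23, 20, 23))  # 1/2 1/2
# Old snow serpent: mean revenue 27 (20+2d6); rival 9+d8 -> 10..17
print(expected_profit(18, 27, 10, 17), expected_profit(17, 27, 10, 17))  # 9 35/4
```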

(edit3): Or wait - do we actually know that the non-Carver bid distribution for Snow Serpents only goes down to 10? What if it goes down to 8 or lower, but we miss those because Carver bids over them? Carver's bids go down to 9, so we could theoretically see a 9 non-Carver bid, but the data might not include any by coincidence. We can't possibly see a non-Carver bid below 9. Looking at this - no, if it went down to 8, the number of non-Carver bids we'd expect to see seems too low, so I'm reasonably confident it doesn't go down that low.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T06:10:03.722Z · LW · GW

Initial observations and results (warning: for all I know right now, this could be a substantially complete solution):

It looks like the non-Carver bidders for newer carcasses are consistent, and drop linearly with age in days:

snow serpents: 60 - 20*age 

winter wolf: 50-12*age (technically we don't have a third point to see that this is linear, but see below)

yeti: 55-6*age

Presumably, this is due to a single bidder.  The trend continues until it drops below the distribution for older carcasses, so I guess that this bidder continues to bid this pattern down to 0 sp.

For older carcasses, there is a range of variation that doesn't seem to depend much on age. I therefore assume for now that it really doesn't depend on age, though for all I know it does.

For winter wolf, Carver won all the auctions for 2 and 3 day carcasses. However, Carver bid high enough for this data to be consistent with 2 day old winter wolves following the linear trend and 3 day old winter wolves following the old-carcass distribution, so I am assuming that they do follow these trends.

One potential pitfall in assessing this distribution for older carcasses is that where Carver also bids within the range, the lower non-Carver bids are more likely to be overbid by Carver. Ignoring this for now. For 4-day-old yetis, the new-carcass bidder's bid (31) would fall within the expected older-carcass non-Carver bid distribution, though all actual bids were higher. Ignoring this as well.

The old-carcass distribution appears to be:

snow serpent: 10-17, but heavily skewed to higher values

winter wolf: 20-23, possibly some skew to higher values

yeti: uniformish 30-35, though with few data points, could also have some skew to higher.

In terms of revenue obtained, Yetis and Winter Wolves appear to show a linearish decline with age, while Snow Serpents appear roughly constant. Assuming such linearity/constantness:

snow serpent expected revenue: 27.15

winter wolf expected revenue: about 39 -2*age

yeti expected revenue: about 75-3.4*age

Solution based on the above:

Lot 1, Yeti, 0 days: bid 56 to beat expected new-carcass bid of 55. Expected profit about 19sp, we have 344 sp remaining.

Lot 2, Snow Serpent, 2 days: bid 21 to beat expected new-carcass bid of 20, ~6 sp expected profit. 323 sp remaining.

Lot 3, Snow Serpent, 1 day: no profitable bid expected, skip.

Lot 4, Winter Wolf, 1 day: no profitable bid expected, skip.

Lot 5, Yeti, 5 days: bid 36 as there is enough likelihood of a bid up to 35 to avoid losing the ~22sp expected profit. 287 sp remaining.

Lot 6, Winter Wolf, 1 day: skip as with previous 1 day winter wolf

Lot 7, Snow Serpent, 1 day: skip as with previous 1 day snow serpent

Lot 8, Snow Serpent, 5 days: bid 18 as there is enough likelihood of a bid of 17 to make it not worth risking losing the ~9sp expected profit. 269 sp remaining.

Lot 9, Winter Wolf, 3 days: bid 24 as there is enough likelihood of a bid of 23 to make it not worth risking the ~9sp expected profit. 245 sp remaining.

Lot 10, Winter Wolf, 7 days: at an expected revenue of ~25, it is worth going down to 23sp to get ~2sp profit though we might lose. Going down to 22sp looks like it will reduce the chances of getting it by more than a third for less than 50% gain, not worth it. So, bid 23sp, ~2sp profit, 222 sp remaining.

Lot 11, Winter Wolf, 8 days: at an expected revenue of a bit over 23, it is worth going down to 22sp, but not to 21sp (which halves or slightly more than halves the success chance for less than a doubling of profit). So, bid 22sp, ~1sp profit, 200 sp remaining.

Lot 12, Snow Serpent, 8 days: bid 18 for the same reason as any other old snow serpent. ~9sp profit, 182 sp remaining.

Lot 13, Winter Wolf, 2 days: bid 27 to beat expected new-carcass bid of 26 (even though we have no 2-day winter wolf non-Carver bids to confirm this); expected profit ~8sp, 155 sp remaining.

Since we have lots of sp remaining, I go back and put in bids for the 1 day winter wolves and snow serpents I skipped. Specifically, I'll put in 18 for the one day snow serpents and 24 for the one day winter wolves to beat the old-carcass bid distribution in case the new-carcass bidder doesn't show up.
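The running balance in the plan is simple to recompute. A sketch assuming a 400 sp starting budget (inferred from "bid 56 ... 344 sp remaining") and that every bid wins, which is the worst case for the budget; expected margins use the approximate linear fits given earlier in the comment, and the names are illustrative:

```python
# Running-balance check for the bid plan above (sketch).
START_SP = 400  # assumption inferred from lot 1: 400 - 56 = 344

FIRST_PASS = [  # (lot, kind, age in days, bid)
    (1, "Yeti", 0, 56),
    (2, "Snow Serpent", 2, 21),
    (5, "Yeti", 5, 36),
    (8, "Snow Serpent", 5, 18),
    (9, "Winter Wolf", 3, 24),
    (10, "Winter Wolf", 7, 23),
    (11, "Winter Wolf", 8, 22),
    (12, "Snow Serpent", 8, 18),
    (13, "Winter Wolf", 2, 27),
]
FALLBACK = [(3, "Snow Serpent", 1, 18), (4, "Winter Wolf", 1, 24),
            (6, "Winter Wolf", 1, 24), (7, "Snow Serpent", 1, 18)]


def expected_revenue(kind, age):
    """Approximate fits: serpent constant, wolf and yeti linear in age."""
    if kind == "Snow Serpent":
        return 27.15
    if kind == "Winter Wolf":
        return 39 - 2 * age
    return 75 - 3.4 * age


def ledger(plan, start):
    """Balance after each bid, assuming every bid wins (worst case for budget)."""
    remaining, rows = start, []
    for lot, kind, age, bid in plan:
        remaining -= bid
        rows.append((lot, bid, round(expected_revenue(kind, age) - bid, 2), remaining))
    return rows


first = ledger(FIRST_PASS, START_SP)
for row in first:
    print("lot %2d  bid %2d  margin if won %6.2f  left %3d" % row)
print("after fallback bids:", ledger(FALLBACK, first[-1][3])[-1][3])
```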

Comment by simon on D&D.Sci April 2021 Evaluation and Ruleset · 2021-04-19T16:09:51.296Z · LW · GW

All the distributions look pretty much as I expected except the merpeople; while I suspected multiple peaks and/or a long tail I didn't expect that the lower-damage visible(ish) peak had the long tail.

One thing you didn't discuss though, is how you generated the number of trips and trip directions. One thing I noticed, but didn't remark on, is that the variation in total number of trips each month is smaller than the variation in northbound trips and southbound trips separately. That seems unexpected given the scenario, but my main theory is that you simply had some formula for the total trips per month and then randomly assigned each trip a direction?
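The theory predicts exactly the observed pattern: if the monthly total comes from a formula and each trip's direction is a coin flip, the total barely varies while each direction varies binomially (variance n·p·(1-p), about 25 for 100 trips at p = 0.5). A sketch with a hypothetical constant total of 100 trips and p = 0.5, neither of which comes from the puzzle:

```python
import random
import statistics


def simulate_months(n_months, trips_per_month, p_north, rng):
    """Hypothetical generator: a fixed total number of trips per month, with
    each trip's direction assigned independently at random."""
    months = []
    for _ in range(n_months):
        north = sum(rng.random() < p_north for _ in range(trips_per_month))
        months.append((north, trips_per_month - north))
    return months


rng = random.Random(1)
months = simulate_months(n_months=60, trips_per_month=100, p_north=0.5, rng=rng)
north = [n for n, _ in months]
south = [s for _, s in months]
total = [n + s for n, s in months]
print("variance of total:", statistics.pvariance(total))
print("variance of north:", statistics.pvariance(north))
print("variance of south:", statistics.pvariance(south))
```

Because north and south are negatively correlated in this generator, the total varies less than either direction, whereas independently generated north and south counts would make the total vary more than either.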

Comment by simon on D&D.Sci April 2021: Voyages of the Gray Swan · 2021-04-17T20:56:41.765Z · LW · GW

Possible counter-evidence:

Pirates have a bimodal distribution (around 20% and 40% damage) and only the 40% part of the distribution seems to have declined. So, this looks like two different populations and theoretically, the 20% pirates could be the strong, smart pirates who win a lot and back off early if they won't get an easy win, while the 40% pirates could be weak, stupid pirates who go all out every time. 

Still all totally speculative of course.

Comment by simon on D&D.Sci April 2021: Voyages of the Gray Swan · 2021-04-14T02:34:06.909Z · LW · GW

I haven't looked much into dependence on time and direction yet, apart from

noticing that the pirates decline in relative frequency

but, I would like some clarification about whether the 100gp budget is for a single set of interventions we'll use on all trips, or is spent each time on a (potentially varying) arrangement. (edit: I see abstractapplic already responded to such a question from Measure: it's a single set obtained up front and not changed.)

My current thoughts:

From looking at the probability distributions, I mostly agree with gjm; my current recommendations are the same as GuySrinivasan's and Measure's.

The Demon Whale distribution scares the crap out of me, so I'd probably panic-buy all 20 oars allowed. I mostly agree that the distribution looks like it could be peaking near 100% damage, but I am not at all confident of this and think it could be consistent with something growing a lot bigger beyond the cutoff. I expect 250-1000 of the destroyed vessels to have been destroyed by demon whales. While it looks like it will be under control before needing 20 oars, I am uncertain enough to value oars pretty high. Budget spent = 20gp.

Merpeople have a very weird-looking distribution. It doesn't seem to be tailing off at the end, and so (like gjm) I am very uncertain about what happens after the 100% cutoff. I think (like GuySrinivasan) there's a possibility that it has a multimodal distribution with another peak beyond 100% (not saying bimodal, since it even looks like there might be a small peak around 25% distorting the main peak of about 50%, though this could very easily be random). I figure merpeople are responsible for around 200 (assuming no extra peak) or potentially vastly more (assuming an extra peak) of the losses. Merpeople are potentially another solid choice for mitigation imo, unless removing them from the encounter table puts demon whales in as the substitute. Budget spent = 45+20=65gp.

I do not agree with gjm that only about 1% of crabmonster encounters are terminal. The distribution seems to be tailing off very slowly, visually more or less consistent with a triangle-shaped distribution. A simple linear extrapolation would suggest a few percent, which I would take as a lower bound. But the taper might not be linear; it might slow down, so the true figure could be much, much more than this. For all I know (apart from the finite number of sinkings) it might not even sum to a finite value. On the other hand, we only really care about crabmonster attacks that do less than 200% damage, since the only relevant intervention reduces damage by 50%. I estimate that between about 50 and about 250 of the destroyed vessels were destroyed by the relevant part of the crabmonster damage distribution, with potentially unlimited numbers destroyed by crabmonsters outside that range. Despite the expected maximum of only about 250 mitigatable losses, I consider arming carpenters a pretty solid choice for 20gp. Budget spent = 20+65=85gp.

Nessie looks like a pretty straightforward distribution where I assume about 10-20 losses are from the bit of the distribution we can't see. A single cannon (which is all that we can afford after the other purchases) should suffice. Budget spent = 10+85=95gp.

Conclusion: 20 oars (20gp) + pay off merpeople (45gp) + arm carpenters (20gp) + one cannon (10gp), total 95 gp spent. This is the same plan previously recommended by GuySrinivasan and Measure.

Nothing else looks like it can kill us, unless e.g. some bimodal distribution has one of its humps located entirely within the >=100% zone. 

However, stepping out of the pure data analysis and into reasoning about the fantasy world, it seems strange that pirates would bother to attack us if they only ever do 64% damage. They are intelligent, after all. Maybe they run away from a hard fight rather than sticking around to do more damage than that, and take over the ship entirely if they win? I might consider a second cannon (which also provides extra insurance in case Nessie's distribution extends further than expected). Dropping 3 oars would provide the funds, and the remaining oars will probably still be enough for the demon whale. 

(and...also anticipated by Measure on the pirate theory).

Comment by simon on Don't Sell Your Soul · 2021-04-07T03:37:18.771Z · LW · GW

They say that now, but perhaps they would change their mind in a hypothetical future where they actually regularly interacted with ems.

Comment by simon on I'm from a parallel Earth with much higher coordination: AMA · 2021-04-06T09:32:15.980Z · LW · GW

I think the idea is that the different Earths have the same landmasses, and in particular the landmass corresponding to our Japan has in Dath Ilan an exile-location function that is vaguely analogous to Australia in the past in our timeline.

Comment by simon on Core Pathways of Aging · 2021-03-30T00:42:55.585Z · LW · GW

It's possible that natural selection has historically kept the quantity of transposons down to small levels relative to the amount that one gains in non-gonad cells during aging. While this may change now that selection is relaxed, if the transposon suppression in gonads is good enough, it may take a long time. (And selection may not really be relaxed in our case, given our tendency toward late reproduction.)

Comment by simon on Core Pathways of Aging · 2021-03-29T23:57:10.988Z · LW · GW

It's a stochastic process, not a clock. One person gets an extra transposon copy at location A, another gets one at location B, and sexual reproduction drops both (their child inherits neither) 1/4 of the time.
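A minimal sketch of where the 1/4 comes from, assuming each parent carries its extra copy on only one of its two homologous chromosomes (heterozygous) and that locations A and B are unlinked, so each copy is passed on independently with probability 1/2. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def insertion_count_distribution():
    """Distribution of how many of the two extra copies (A from one parent,
    B from the other) a child inherits."""
    dist = {}
    # Each parent passes on either the homolog carrying its copy (True) or the
    # one without it (False), each with probability 1/2, independently.
    for gets_a, gets_b in product([True, False], repeat=2):
        k = int(gets_a) + int(gets_b)
        dist[k] = dist.get(k, Fraction(0)) + Fraction(1, 4)
    return dist


print(insertion_count_distribution()[0])  # 1/4: child inherits neither copy
```

Under these assumptions the child inherits neither copy 1/4 of the time, exactly one 1/2 of the time, and both 1/4 of the time, which gives selection variation to act on.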

Comment by simon on Core Pathways of Aging · 2021-03-29T15:00:21.516Z · LW · GW

If the suppression of transposons in the gonads is good enough, the reset could be the same as with any other harmful mutation: shuffling by sexual reproduction, plus natural selection.

This may suggest a reason why sexual reproduction exists in the first place.

Comment by simon on Texas Freeze Retrospective: meetup notes · 2021-03-18T16:16:52.536Z · LW · GW

We see evidence reported that climate change may increase the likelihood of extreme weather events, both hot and cold, in the coming years.

Your source does not seem to support this claim, and as far as I am aware (largely, admittedly, from skeptic sources, but I haven't seen credible mainstream sources contradicting this), global warming is expected to reduce and not increase overall temperature variation (high temperatures increase, but low temperatures get more moderate by a larger amount).

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T11:27:51.140Z · LW · GW

P.S:

With (1) (total number of 1's) excluded, but all of (2), (3), (4) included:

Confidence level: 61.8

Score: 20.2

With (2) (total number of runs) excluded, but all of (1), (3), (4) included:

Confidence level: 59.4

Score: 13.0

With ONLY (1) (total number of 1's) included:

Confidence level: 52.0

Score: -1.8

With ONLY (2) (total number of runs) included:

Confidence level: 57.9

Score: 18.4

So really it was the total number of runs doing the vast majority of the work. All calculations here do include setting the probability for string 106 to zero, both for the confidence level and final score.

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T10:51:34.273Z · LW · GW

I think it depends a lot more on the number of strings you get wrong than on the total number of strings, so I think GuySrinivasan has a good point that deliberate overconfidence would be viable if the dataset were easy. I was thinking the same thing at the start, but gave it up when it became clear my heuristics weren't giving enough information.

My own theory though was that most overconfidence wasn't deliberate but simply from people not thinking through how much information they were getting from apparent non-randomness (i.e. the way I compared my results to what would be expected by chance).

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T10:32:46.265Z · LW · GW

Whoops, missed this post at the time.

In response to:

(4) average XOR-correlation between bits and the previous 4 bits (not sure what this means -Eric)

This is simply XOR-ing each bit (starting with the 5th one) with the previous 4 and adding it all up. This test was meant to look for a possible tendency (or the opposite) to end streaks at medium range (the other tests were either short-range or looked at the whole string). I didn't add more tests using numbers other than 4, since using multiple tests that correlate significantly on random input would lead to overconfidence unless I did something fancy to compensate. 
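One reading of test (4) as code. This sketch assumes each bit from the 5th onward is XOR-ed separately against each of the 4 bits before it; the exact pairing used in the original analysis may differ, and the function name `xor_prev4_stats` is hypothetical. For a random string the average should sit near 0.5; higher values would suggest a tendency to end streaks within 4 bits, lower values the opposite.

```python
def xor_prev4_stats(bits, window=4):
    """Sum and average of bit[i] XOR bit[i-j] for every i >= window
    (i.e. from the 5th bit on when window=4) and every j in 1..window."""
    b = [int(c) for c in bits]
    total = 0
    count = 0
    for i in range(window, len(b)):
        for j in range(1, window + 1):
            total += b[i] ^ b[i - j]
            count += 1
    average = total / count if count else 0.0
    return total, average


print(xor_prev4_stats("0110100110010110"))
```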

In response to:

“XOR derivative” refers to the 149-bit substring where the k-th bit indicates whether the k-th bit and the (k+1)-th bit of the original string were the same or different. So this is measuring the number of runs ... (3) average value of second XOR derivative and (4) average XOR-correlation between bits and the previous 4 bits...

I’m curious how much, if any, of simon’s success came from (3) and (4). 
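A sketch of tests (1) to (3) following the quoted definitions, assuming each string is given as text of '0'/'1' characters (at least 3 long); the helper names are hypothetical. The number of runs is one more than the number of 1s in the first XOR derivative, since every change of value starts a new run.

```python
def xor_derivative(bits):
    """k-th output bit is 1 when bits k and k+1 differ (150 bits -> 149)."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def summary(bits_str):
    bits = [int(c) for c in bits_str]
    d1 = xor_derivative(bits)
    d2 = xor_derivative(d1)  # second XOR derivative
    return {
        "ones": sum(bits),                           # test (1)
        "runs": 1 + sum(d1),                         # test (2)
        "avg_second_derivative": sum(d2) / len(d2),  # test (3)
    }


print(summary("0011100"))
```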

Values below. Confidence level refers to the probability of randomness assigned to the values that weren't in the tails of any of the tests I used.

Actual result:

Confidence level: 63.6

Score: 21.0

With (4) excluded:

Confidence level: 61.4

Score: 19.4

With (3) excluded:

Confidence level: 62.2

Score: 17.0

With both (3) and (4) excluded:

Confidence level: 60.0

Score: 16.1

Scores in each case were calculated using the probabilities rounded to the nearest percent (as they were, or would have been, ultimately submitted). Oddly, in every single case the rounding improved my score (20.95 v. 20.92, 19.36 v. 19.33, 16.96 v. 16.89, and 16.11 v. 16.08).

So, it looks like I would only have gone down to fifth place if I had looked at only the total number of 1's and the number of runs. I'd put that down to not messing up calibration too badly, but it looks like that would still have put me in sixth in terms of post-squeeze scores? (I didn't calculate the squeeze; I'm just comparing my hypothetical raw score with others' post-squeeze scores.)

Comment by simon on What is going on in the world? · 2021-01-20T01:48:14.608Z · LW · GW

True, the typical argument for the great silence implying a late filter is weak, because an early filter is not all that a priori implausible. 

However, the OP (Katja Grace) specifically mentioned "anthropic reasoning".

As she previously pointed out, an early filter makes our present existence much less probable than a late filter does. So, given our current experience, we should weight the probability of a late filter much higher than the prior would be without anthropic considerations.
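A worked Bayes update showing the shape of this argument, with made-up numbers: the likelihoods below stand in for how probable observers in our situation are under each hypothesis, and they are not estimates from Katja Grace's post. The function name is hypothetical.

```python
def posterior_late(prior_late, p_obs_given_late, p_obs_given_early):
    """P(late filter | observers like us exist), by Bayes' rule."""
    prior_early = 1.0 - prior_late
    late = prior_late * p_obs_given_late
    early = prior_early * p_obs_given_early
    return late / (late + early)


# Hypothetical: an even prior, but an early filter makes our situation 100x rarer.
print(posterior_late(0.5, 1.0, 0.01))
```

With these invented inputs the posterior on a late filter is about 0.99, and even a prior of only 0.1 on a late filter ends up above 0.9.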

Comment by simon on Centrally planned war · 2021-01-06T15:59:46.809Z · LW · GW

Individuals may be bad at foresight, but if there's predictably going to be a good price for 100,000 coats in a few months, someone's likely to supply them, unless of course there's some anti "price gouging" legislation.

Comment by simon on D&D.Sci Evaluation and Ruleset · 2020-12-12T21:40:15.034Z · LW · GW

If you didn’t account for selection effects, you may have correctly avoided boosting DEX because you thought it was actively harmful instead of merely useless. 

I immediately considered a selection effect, but then I tricked myself into believing dex did matter, using a method that corrected for the selection effect but was vulnerable to randomness/falsely seeing patterns. Oops. Specifically, I found the average dex for successful and failed adventurers for each total non-dex stat value, but had them listed in an inconvenient big column with lots of gaps. I looked at some differences, and it seemed that for middle values of non-dex stats, successful adventurers consistently had lower average dex than failed ones, while that reversed for extreme values. When I make a bar chart out of the data now (I didn't at the time), it's a lot clearer that there's no good evidence for any effect of dex on success:

[Bar chart not reproduced: average dex of successful vs. failed adventurers for each total non-dex stat value]

If you didn’t look for interactions, you may have dodged the WIS<INT penalty just because WIS seemed like a better place to put points than INT. 

Yep. Thing is, I *did* look for interactions - with DEX. I had the idea that DEX might be bad due to such interactions, and when I didn't find anything, I more or less stopped looking for such interactions.

And I’m pretty sure even the three people who submitted optimal answers on the last post (good job simon, seed, and Ericf) didn’t find them by using the right link function

For sure in my case. I calculated the success/fail ratios for each value of each stat individually (no smoothing), and found the reachable stat combo that maximized the product of those ratios. This method found the importance of reaching 8. I was never confident that this wasn't random, though.
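A brute-force sketch of that kind of search, assuming a per-value odds table (successes divided by failures) for each stat and a fixed number of points to spend. The toy tables are invented for illustration and are not taken from the D&D.Sci dataset; `best_allocation` is a hypothetical name.

```python
from itertools import product
from math import prod


def best_allocation(start, odds, budget):
    """Try every way of spending up to `budget` points and return the stat
    values whose odds multiply to the largest product."""
    stats = list(start)
    best_values, best_score = None, float("-inf")
    for extra in product(range(budget + 1), repeat=len(stats)):
        if sum(extra) > budget:
            continue  # not reachable with the available points
        values = {s: start[s] + e for s, e in zip(stats, extra)}
        if any(v not in odds[s] for s, v in values.items()):
            continue  # past the end of a stat's table
        # Assumes stats act multiplicatively on the odds (no interactions).
        score = prod(odds[s][v] for s, v in values.items())
        if score > best_score:
            best_values, best_score = values, score
    return best_values, best_score


# Invented toy tables: odds of success at each stat value.
odds = {
    "STR": {6: 1.0, 7: 1.1, 8: 1.5},
    "WIS": {12: 1.0, 13: 0.9, 14: 1.0, 15: 1.6},
}
print(best_allocation({"STR": 6, "WIS": 12}, odds, budget=3))
```

With six stats and ten points this enumerates 11^6 combinations, which is slow but still feasible; a smarter search is only needed for larger budgets.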

When I did later start simming guesses, what I simmed would have given smoothed results: a bunch of stat checks with a d20, with success if the total number of passed stat checks was greater than a threshold. The actual test would have been pretty far down the list of things I would have checked given infinite time.
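A sketch of that kind of model, computed exactly rather than simulated. It assumes a check passes when a d20 roll is at most the stat (roll-under) and treats the threshold as a free parameter; none of these choices come from the actual D&D.Sci ruleset, and `success_probability` is a hypothetical name. Because each extra stat point raises one check's pass chance by 1/20, success probability changes smoothly with stats, with no jump at any particular value.

```python
from fractions import Fraction


def success_probability(stats, threshold):
    """P(number of passed d20 checks > threshold), one check per stat."""
    dist = [Fraction(1)]  # dist[k] = P(exactly k checks passed so far)
    for s in stats:
        p = Fraction(min(max(s, 0), 20), 20)  # roll-under: pass if d20 <= s
        new = [Fraction(0)] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this check fails
            new[k + 1] += q * p    # this check passes
        dist = new
    return sum(dist[threshold + 1:], Fraction(0))


print(float(success_probability([8, 15, 13, 13, 15, 8], 3)))
```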

Comment by simon on D&D.Sci · 2020-12-11T08:04:29.220Z · LW · GW

In reply to:

Graduate stats likely come from 2d10 drop anyone under 60 total

I think you're right. The character stats data seems consistent with starting with 10000 candidates, each with 6 stats independently chosen by 2d10, and tossing out everything with a total below 60. 

One possible concern with this is the top score being the round number of 100, but I tested it and got only one score above 100 (it was 103), so this seems consistent with the 100 top score being coincidence.
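A sketch of the proposed generating process (10,000 candidates, six stats of 2d10 each, totals below 60 discarded), using Python's `random` module with a fixed seed; the function name is hypothetical. Counting totals above 100 gives a quick check on how often a top score over 100 would show up.

```python
import random


def generate_graduates(n_candidates=10_000, cutoff=60, seed=0):
    """Roll six 2d10 stats per candidate and keep those totalling >= cutoff."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        stats = [rng.randint(1, 10) + rng.randint(1, 10) for _ in range(6)]
        if sum(stats) >= cutoff:
            kept.append(stats)
    return kept


grads = generate_graduates()
totals = [sum(s) for s in grads]
print(len(grads), max(totals), sum(t > 100 for t in totals))
```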

Comment by simon on D&D.Sci · 2020-12-07T16:17:33.294Z · LW · GW

You do indeed miss out on some gains from a jump - WIS gets you a decline in success at +1 but a big gain at +3. (Edit: actually my method uses odds ratio (successes divided by failures) not probabilities (successes divided by total). So, may not be equivalent to detecting jump gains for your method. Also my method tries to maximize multiplicative gain, while your words "greatest positive" suggest you maximize additive gain.)

STR - 8 (increased by 2)

CON - 15 (increased by 1)

DEX - 13 (no change)

INT - 13 (no change)

WIS - 15 (increased by 3)

CHA - 8 (increased by 4)

calculation method: spreadsheet adhockery resulting in tables for each stat of:

per point gain = ((success odds ratio for current stat + n)/(success odds ratio for current stat))^(1/n); find the n and table resulting in the highest per point gain, generate a new table for that stat with the new stat value as the starting point, and repeat.
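A sketch of that greedy procedure, with the per-point gain taken as (odds at stat+n / odds at current stat)^(1/n). The toy odds tables, the stopping rule (stop when no affordable step improves the odds), and the function name are assumptions for illustration.

```python
def greedy_allocation(start, odds, budget):
    """Repeatedly take the (stat, n) step with the best per-point gain."""
    values = dict(start)
    remaining = budget
    while remaining > 0:
        best = None  # (per-point gain, stat, n)
        for stat, table in odds.items():
            cur = values[stat]
            for n in range(1, remaining + 1):
                if cur + n not in table:
                    break  # assumes each table covers a contiguous range
                gain = (table[cur + n] / table[cur]) ** (1 / n)
                if best is None or gain > best[0]:
                    best = (gain, stat, n)
        if best is None or best[0] <= 1.0:
            break  # no affordable step improves the odds
        _, stat, n = best
        values[stat] += n
        remaining -= n
    return values


# Invented toy tables: odds of success at each stat value.
odds = {
    "STR": {6: 1.0, 7: 1.1, 8: 1.5},
    "WIS": {12: 1.0, 13: 0.9, 14: 1.0, 15: 1.6},
}
print(greedy_allocation({"STR": 6, "WIS": 12}, odds, budget=3))
```

On this toy table the greedy pass takes STR +2 (per-point gain about 1.22) and then stops, for a combined odds factor of 1.5, while spending all 3 points on WIS would reach 1.6: the kind of jump gain mentioned above.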

Comment by simon on D&D.Sci · 2020-12-07T11:23:11.976Z · LW · GW

str +2 points to 8, con +1 point to 15, cha +4 points to 8, wis +3 points to 15, based on assuming that (a) different stats have multiplicative effects (no other stat interactions), (b) the effect of any stat is accurately represented by looking at the overall data in terms of just that stat, and (c) the true distribution is exactly the data distribution with no random variation. I have not done anything to verify that these assumptions make sense.

dex looks like it actually has a harmful effect. I don't know whether the apparent effect is or is not too large to be explained by it helping bad candidates meet the college's apparent 60-point cutoff.
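A quick simulation of how the cutoff alone could produce an apparent dex penalty. It assumes the 2d10-per-stat, drop-below-60 generating process and treats one stat as a "dex" that has no effect on anything; the function names are hypothetical. Among the survivors, dex ends up negatively correlated with the sum of the other stats, so high-dex graduates tend to be weaker elsewhere.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def dex_vs_rest_correlation(n_candidates=10_000, cutoff=60, seed=1):
    rng = random.Random(seed)
    dex, rest = [], []
    for _ in range(n_candidates):
        stats = [rng.randint(1, 10) + rng.randint(1, 10) for _ in range(6)]
        if sum(stats) >= cutoff:
            dex.append(stats[2])                # call the third stat "dex" (arbitrary)
            rest.append(sum(stats) - stats[2])  # total of the other five stats
    return pearson(dex, rest)


print(dex_vs_rest_correlation(cutoff=0))   # no selection: close to zero
print(dex_vs_rest_correlation(cutoff=60))  # with the cutoff: clearly negative
```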

Comment by simon on Anti-EMH Evidence (and a plea for help) · 2020-12-05T21:54:51.815Z · LW · GW

I would worry in a lot of these cases that there's some risk that your model isn't taking account of, so you could be "picking up pennies in front of a steamroller". Not in all cases though - 70-200% isn't pennies.

But cases like supposedly equivalent assets that used to be closely priced now diverging seem highly suspicious.

Comment by simon on Can preference falsification be reduced with Ring Signatures? · 2020-11-30T01:46:47.893Z · LW · GW

You need to have a private key to sign, otherwise it would be useless as a "signature".

For signing (in the non-ring case), you encrypt with your private key and they decrypt with your public key, whereas in normal encryption (again, non-ring) you encrypt with their public key and they decrypt with their private key.
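A toy illustration with textbook RSA, the scheme where this "encrypt with the private key" description is closest to literally true. The tiny primes make it insecure; real signature schemes add hashing and padding, and some (such as ECDSA) are not built from encryption at all.

```python
# Textbook RSA with tiny primes (insecure; illustration only).
p, q = 61, 53
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)


def sign(h):
    """Signer 'encrypts' the message hash with the private exponent."""
    return pow(h, d, n)


def verify(h, signature):
    """Anyone 'decrypts' with the public exponent and compares."""
    return pow(signature, e, n) == h


def encrypt(m):
    """Normal encryption uses the recipient's public exponent..."""
    return pow(m, e, n)


def decrypt(c):
    """...and only the private exponent undoes it."""
    return pow(c, d, n)


s = sign(65)
print(verify(65, s), verify(66, s))
print(decrypt(encrypt(65)))
```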

Comment by simon on Ongoing free money at PredictIt · 2020-11-12T08:15:46.324Z · LW · GW

It's not necessarily structural inefficiency at PredictIt specifically that is causing most of this; to a large extent it's bettors pricing in the odds of Trump still winning the election. Apparently Betfair's odds of Trump winning are still around 10% - link I found from searching for articles on betting odds from the last day, but I wasn't able to find the odds at Betfair itself.

Comment by simon on The Born Rule is Time-Symmetric · 2020-11-02T02:52:42.603Z · LW · GW

Yes, if you consider one branch of the wavefunction, you have less information than the full state you branched from. But the analogous situation would apply to a merger of different branches: in any one of the initial branches, you would have less than full information about the resulting merged state.

Comment by simon on On the Dangers of Time Travel · 2020-10-27T05:53:22.546Z · LW · GW

I have (semi-*)seriously considered the possibility that time travel would in fact instantly destroy the universe (not the multiverse, though). One of the theories on what happens when you time travel is based on "postselection" (see also the YouTube video by Seth Lloyd): what you get at the end is what you'd expect if you just discarded any final state that led to a contradiction with what you put in.

Now, Seth Lloyd says you then renormalize the probabilities. But this renormalization seems like an extraneous assumption to me: a more natural interpretation is that the amplitude that would otherwise be associated with the contradictory possibilities simply disappears. It would be hard to time travel without a camel-through-the-eye-of-a-needle level of contortion to prevent some contradiction, so time travel would in effect reliably destroy the universe if it were ever used.
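A toy version of the difference between the two readings, with made-up real amplitudes for two final states, one consistent with what was sent back and one not. This is only the bookkeeping, not Lloyd's actual formalism: renormalizing makes the consistent outcome certain, while the unrenormalized reading leaves just the surviving 36% of the total probability and loses the rest.

```python
# Made-up real amplitudes; probabilities are squared magnitudes (0.36 + 0.64 = 1).
outcomes = [
    # (label, amplitude, consistent with what was sent back?)
    ("A", 0.6, True),
    ("B", 0.8, False),
]


def surviving_probability(outcomes):
    """Total probability left after discarding contradictory outcomes."""
    return sum(abs(amp) ** 2 for _, amp, ok in outcomes if ok)


def renormalized(outcomes):
    """Renormalizing reading: rescale the survivors so they sum to 1."""
    total = surviving_probability(outcomes)
    return {label: abs(amp) ** 2 / total for label, amp, ok in outcomes if ok}


print(surviving_probability(outcomes))  # unrenormalized: only this much survives
print(renormalized(outcomes))           # renormalized: A becomes certain
```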

*I am bad at taking things seriously, or I probably wouldn't post this; e.g., imagine if our lack of time travel is due to the anthropic principle. Edit: we should place greater weight on hypotheses that have more observers in our situation, so we should expect that, conditional on time travel being deadly, it's probably too hard for us so far, and not merely that the worlds where it has been invented are gone. Whew, that's a relief (for now)! If anyone reading this does discover an easy time travel method though, DO NOT USE IT.

Comment by simon on The Darwin Game - Rounds 0 to 10 · 2020-10-25T08:26:09.284Z · LW · GW

I'm not so optimistic about your bot... if the clones will be getting 250 per round and you will be getting 200, you'll lose about 1/5 of your copies per round, which is like a 3-round half-life. There won't be anything left by round 90 at that rate.
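The arithmetic as a sketch, assuming your share of copies shrinks each round by the ratio of your score to the clones' score (200/250 = 0.8) and ignoring random pairings; the function names are hypothetical.

```python
import math


def share_after(rounds, your_score=200.0, clone_score=250.0):
    """Your relative share of copies after `rounds`, if it shrinks by
    your_score / clone_score every round (multiplicative, not additive)."""
    return (your_score / clone_score) ** rounds


def half_life(your_score=200.0, clone_score=250.0):
    """Rounds needed for the share to halve at that per-round ratio."""
    return math.log(0.5) / math.log(your_score / clone_score)


print(round(half_life(), 2))
print(share_after(90))
```

With these numbers the half-life comes out at about 3.1 rounds, and after 90 rounds the remaining share is on the order of one in a billion.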

Comment by simon on The Darwin Game - Rounds 0 to 10 · 2020-10-24T18:11:22.385Z · LW · GW

Ah, I had misunderstood how the system works. I had not read carefully and assumed some kind of weighted round robin. Random pairings allow for a lot more random variation.

Comment by simon on The Darwin Game - Rounds 0 to 10 · 2020-10-24T07:29:36.260Z · LW · GW

All clones should act equally against non-clones until the showdown round. I guess some outsider bots could be adjusting behavior depending on finding certain patterns in the code in order to respond to those patterns, and the relevant patterns occur in the payloads of some clones?

FWIW, doing better or worse in any given round has a multiplicative effect between rounds, not an additive one. So that might affect the level of randomness, though even with 100 it seems too big to be random.

Comment by simon on Local Solar Time · 2020-10-24T04:33:19.372Z · LW · GW

The main objection in the link seems to be a presupposition that solar time information about different places would be less available than time zone conversion information is now. That seems likely false. Also, the fact that sleep schedules depend on culture and don't necessarily line up with solar time is no less of an issue now.
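For illustration, local mean solar time is a one-line calculation from longitude (4 minutes of clock time per degree, east positive). This sketch ignores the equation of time, so it gives mean rather than sundial time, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_mean_solar_time(utc_time, longitude_deg):
    """Local mean solar time for a longitude (degrees, east positive).

    The Earth turns 360 degrees in 24 hours, so each degree of longitude
    is 4 minutes of clock time. Ignores the equation of time.
    """
    if utc_time.tzinfo is None:
        raise ValueError("utc_time must be timezone-aware")
    solar = utc_time.astimezone(timezone.utc) + timedelta(minutes=4 * longitude_deg)
    return solar.replace(tzinfo=None)  # naive: solar time is not a real UTC offset


noon_utc = datetime(2020, 10, 24, 12, 0, tzinfo=timezone.utc)
print(local_mean_solar_time(noon_utc, -0.1276))  # London
print(local_mean_solar_time(noon_utc, 139.69))   # Tokyo
```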