Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan Evaluation & Ruleset · 2021-09-10T06:46:49.130Z · LW · GW

FWIW I had fun, or at least I remember it that way (I could have suppressed memory of frustrations). I do think I prefer things that are more complicated, or have "secrets", in terms of the underlying dynamics. As you noted in the original post, even if we don't find everything we can still find some things.

I was still planning on doing more analysis but was busy the last few days. (I also made a mistake where I tried to separate the different merfolk areas with a column and row check and noticed afterwards that only the column check had worked, which provided a tiny, but maybe not insignificant psychological activation barrier to continuing.)

Where complication is maybe less desirable is in terms of the data we are supplied. Even so, while I didn't look at, e.g. captains or voyage purpose, I don't feel it was a detriment to my enjoyment of the scenario and I could have looked at them if I had more time.

One thing that did provide some frustration at the start was separating the planned voyages into columns. Text-to-columns did not work correctly in LibreOffice Calc until I made all the hexes the same number of characters. In Excel on the other hand it immediately worked with dash as a separator. (I ended up switching back to LibreOffice Calc again though when I couldn't immediately figure out how to use regular expressions in Excel.)

Edit: I agree with abstractapplic that I'd prefer complicated dynamics to arise from simple rules where possible, but also don't know if that's practical when setting up a puzzle. I am fine though if they are not simple - in real life things are usually not simple. And, in a sense, extra complications are kind of like random noise when you don't figure them out, and are fun-to-deduce regularities to the extent you do. Which is OK (or good) either way.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-07T09:26:27.273Z · LW · GW

Yes, thanks; deleted the extraneous N5.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-06T18:10:06.153Z · LW · GW

Some (late relative to others) initial remarks:

As others have noted, all but one datapoint are consistent with Galleon/Carrack having 30hp and Barquentine/Dhow having 20hp, the one exception being a single case of a Carrack taking 5% dmg from Reef. Also as abstractapplic has noted, Barquentines tend to be affected by things similarly to Galleons, and Dhows similarly to Carracks. abstractapplic claims that they are the same other than hp, but it would take a massive confounder to account for e.g. the different encounter probabilities (as found by measure, also remarked on below).

Note, per-hex encounter probabilities below don't account for selection effects, except that I tended to round up when it was a close call whether to round up or down. I do count only out-of-port ships that didn't get destroyed in the denominator. Damage numbers don't account for selection effects either.


Reef, Kraken, Iceberg, Merfolk and Wyrd Majick Fyre have location dependence as noted by others.


Reef always does at least 1 damage, exponentialish decline with long tail, 3.5-4 average

As noted by abstractapplic, Reefs occur on hexes adjacent to land but not adjacent to ports. I haven't seen anyone mention that for the purpose of this rule, L16 is a land hex. I guess it's a seamount.

The probability of receiving a reef encounter if going through a reef hex is ~20% for non-Dhows, and ~4% for Dhows. Combined with the potentially high damage, this makes these a high priority to avoid if not using a Dhow.


Kraken: spiky damage histogram. Spikes decline for higher values (but selection effects?), worse for Carrack/Dhow, and Carrack/Dhow also seem to lack a low-damage component present in the Barquentine/Galleon distribution. ~3.5 average for Galleon/Barquentine, ~6.5 for Dhow, ~8 for Carrack.

As others have noted Kraken have "territories". These "territories" actually are just a simple rule as with Reefs:

Kraken territories =  spaces at least a 2 hex gap away from land (where land has the same definition as for Reefs, i.e. L16 is a land hex).

Around 25-30% encounter probability per relevant hex. Combined with the high damage, high priority to avoid for all ships but especially Carrack/Dhow.

You can always avoid Kraken+Reefs by keeping a 1 hex gap between you and land (or L16) when not adjacent to a port. There are minimum length paths that follow this rule between most port combinations except between South Point and either Norwatch or Eastmarch, where a detour is required (a quite significant one for Eastmarch/South Point).

As noted by others, the target points are in Kraken territory (IMO this is likely a coincidence since that just means they are far at sea). We can avoid going into any additional Kraken territory, but this will require an additional detour (relative to just avoiding reefs) for the western target particularly if avoiding E7 for which we have no data.


Icebergs: dmg roughly consistent with 1d6 as reported by abstractapplic, so about 3.5 average damage. However, Galleons and Carracks seem to take 1 damage more often and Barquentines and Dhows take 1 damage less often. Maybe coincidence?

Icebergs are found in rows 0-2 in summer (Jun-Aug), in rows 0-6 in spring and fall (Mar-May and Sep-Nov), and in rows 0-10 in winter (Dec-Feb). (Others have remarked on Iceberg seasonality/northernness more generally).

Icebergs are not particularly high probability (<10%) per hex, but would add up if far enough north. Since this voyage will occur in summer, we don't have to worry about icebergs unless taking a significant detour to the north.


Merfolk: Do 0 dmg a lot of the time, though unlike abstractapplic I am not convinced it is exactly half. Exponentialish(?) decline if they do do damage, which can go to high values. ~2.6 average damage for Galleons, ~1.7 for Barquentines, ~6 for Carracks, ~4 for Dhows.

As noted by abstractapplic Merfolk have two zones.

Most Merfolk reports form a giant triangularish donut centered around the northeast corner of J8. The donut looks like it should include F7 and L11, but there are no reports from there, and looks like it should not include O5, but O5 does have one Merfolk report. In the case of F7 this is probably just chance, since it's not visited a lot. All other reports are in another Merfolk zone southwest of Westengard. The giant donut occupies most of the center of the map and is hard to avoid, so should be analyzed further. 

Merfolk have a ~9% probability per hex for Galleons, ~2.7% for Barquentines, and ~1.8% for Carrack/Dhow. 

I have also noticed that relative to the low popularity of these hexes, Merfolk are significantly more likely to be encountered in the southwest region. I have not checked if this is connected with the ship type stats, but I will leave this for now since we don't need to go to that region. 

Wyrd Majick Fyre: high damage, mostly 7 or less, but with a tail (exponentialish?). 4-5 average for all except Dhow, which gets ~1.2.

As others have reported, Wyrd Majick Fyre mostly occurs around J8 (almost but not quite aligned with the Merfolk donut hole), with a few random-looking other instances. It is a >10% encounter in these hexes making them important to avoid for non-Dhows even if they did not also have Reefs (which they do).


Pirates: as abstractapplic noted, they never do 1 dmg, but very often do zero; otherwise the damage looks like 2d3(?) but with a long tail. ~3.1 average for Galleon/Barquentine, ~4.4 for Dhow and ~5.29 for Carrack.

As measure noted, Galleons receive more pirate attacks. Per-hex encounter probability of ~12% for Galleons and ~4% for everything else. Todo: check to see if this depends e.g. on mission type.


Storm: usually 0-7dmg. some tail. 2.5-2.6 average damage. Around ~7% chance per hex regardless of ship type.


Sharks: as abstractapplic noted, dmg is consistent with min(2d4)-1. As with Pirates, the per-hex encounter probability is ~12% for Galleons and ~4% for everything else. While not as damaging as Pirates, this adds a reason to avoid Galleons.
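As a rough check on the shark numbers, here is a minimal sketch that enumerates the damage distribution, assuming "min(2d4)-1" means the lower of two four-sided dice minus one (my reading of the notation, not something confirmed by the ruleset). The per-hex costs it prints use the ~12% and ~4% encounter probabilities quoted above.

```python
from fractions import Fraction
from itertools import product

# Assumption: "min(2d4)-1" = lower of two d4 rolls, minus one.
def shark_damage_distribution():
    """Exact probability of each damage value under min(2d4) - 1."""
    dist = {}
    for a, b in product(range(1, 5), repeat=2):
        dmg = min(a, b) - 1
        dist[dmg] = dist.get(dmg, Fraction(0)) + Fraction(1, 16)
    return dist

dist = shark_damage_distribution()
mean_dmg = sum(d * p for d, p in dist.items())

# Per-hex cost = encounter probability x mean damage per encounter.
galleon_cost = 0.12 * float(mean_dmg)
other_cost = 0.04 * float(mean_dmg)

print(float(mean_dmg))         # 0.875
print(round(galleon_cost, 3))  # 0.105
print(round(other_cost, 3))    # 0.035
```

Under that reading the mean is 0.875 damage per encounter, which lines up with per-hex shark costs of roughly 0.11 for Galleons and 0.035 for the other ships.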


Harpies:  Galleons and Barquentines seem to take 0 dmg 2/3 of the time, and 1-2 damage 1/3 of the time. Carracks and Dhows seem to take 0 dmg 1/3 of the time, and 1-4 dmg 2/3 of the time. So, theoretically 0.5 average for Galleons/Barquentines and 1.7 average for Carracks/Dhows

Per-hex encounter probability is ~9% for Galleons and ~4% for others, but per-hex damage from Harpies is still less for Galleons than for Carracks and Dhows.
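The "theoretical" Harpy averages can be reproduced with a few lines of arithmetic. This sketch assumes non-zero damage is uniform over the stated range (1-2 or 1-4); that uniformity is my assumption, since only the ranges are given.

```python
from fractions import Fraction

# Assumption: non-zero damage is uniform over 1..max_dmg.
def mean_harpy_damage(p_zero, max_dmg):
    """Mean damage when damage is 0 with probability p_zero, else uniform."""
    nonzero_mean = Fraction(1 + max_dmg, 2)
    return (1 - p_zero) * nonzero_mean

galleon_like = mean_harpy_damage(Fraction(2, 3), 2)  # Galleon/Barquentine
carrack_like = mean_harpy_damage(Fraction(1, 3), 4)  # Carrack/Dhow

print(float(galleon_like))            # 0.5
print(round(float(carrack_like), 2))  # 1.67

# Per-hex cost = encounter probability x mean damage per encounter.
per_hex = {
    "Galleon": 0.09 * float(galleon_like),
    "Barquentine": 0.04 * float(galleon_like),
    "Carrack/Dhow": 0.04 * float(carrack_like),
}
for ship, cost in per_hex.items():
    print(ship, round(cost, 3))
```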


Dragon: long tail in damage; peaks later for Carracks/Dhows; ~3.4 average for Barquentine, ~3.7 Galleon, ~5 for Dhow, ~7.5 for Carracks. 

Per-hex encounter probability is ~2% for Galleons, ~0.6% for Barquentines, ~0.3% for Carracks and Dhows. In terms of average damage per hex, the extra encounter probability for Galleons more than makes up for them taking less damage per event than Carracks and Dhows.

Route and ship selection (analysis):

Looking at the encounter types that initially seem location-independent, we have an average per-hex movement cost in damage points, by ship type, of:

Ship/encounter | Pirates | Storm | Sharks | Harpies | Dragon | Total

Galleon | 0.36 | 0.17 | 0.11 | 0.039 | 0.073 | 0.75

Carrack | 0.19 | 0.18 | 0.034 | 0.069 | 0.021 | 0.50

Barquentine | 0.11 | 0.18 | 0.034 | 0.021 | 0.019 | 0.37

Dhow | 0.16 | 0.16 | 0.036 | 0.058 | 0.017 | 0.44

Taking into account ship hp, the Carrack looks the best here, with ~60 hexes of movement.
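The "~60 hexes" figure is just hp divided by the per-hex cost. A minimal sketch, using the inferred hp values and the Total column above (the variable names are mine):

```python
# Inferred hit points and total per-hex damage cost, copied from the text above.
SHIP_HP = {"Galleon": 30, "Carrack": 30, "Barquentine": 20, "Dhow": 20}
BASE_COST_PER_HEX = {"Galleon": 0.75, "Carrack": 0.50, "Barquentine": 0.37, "Dhow": 0.44}

def expected_range(ship):
    """Hexes a ship can cross before expected damage equals its hp."""
    return SHIP_HP[ship] / BASE_COST_PER_HEX[ship]

ranges = {ship: expected_range(ship) for ship in SHIP_HP}
best = max(ranges, key=ranges.get)

for ship, hexes in sorted(ranges.items(), key=lambda kv: -kv[1]):
    print(f"{ship}: ~{hexes:.0f} hexes")
print("best:", best)
```

This treats damage as if it accrued at its average rate, so it ignores the variance and long tails discussed further down.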

We are also likely going to go into Merfolk territory though, which adds an additional cost:

Galleon: 0.24

Carrack: 0.11

Barquentine: 0.044

Dhow: 0.072

It's looking a lot more even here between Carrack/Barquentine, but still slightly favouring Carrack. Since not all of the trip will be in Merfolk territory, might as well go for the Carrack?
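The Merfolk costs are roughly what you get by multiplying the per-hex encounter probabilities by the average damages quoted in the Merfolk section. A small sketch of that, plus the resulting all-Merfolk range per ship (rounded inputs, so the last digit can differ slightly from the figures above):

```python
# Merfolk per-hex encounter probability and mean damage, from the Merfolk section.
MERFOLK_PROB = {"Galleon": 0.09, "Carrack": 0.018, "Barquentine": 0.027, "Dhow": 0.018}
MERFOLK_MEAN_DMG = {"Galleon": 2.6, "Carrack": 6.0, "Barquentine": 1.7, "Dhow": 4.0}

# Location-independent per-hex cost and inferred hp, from the earlier table.
BASE_COST = {"Galleon": 0.75, "Carrack": 0.50, "Barquentine": 0.37, "Dhow": 0.44}
SHIP_HP = {"Galleon": 30, "Carrack": 30, "Barquentine": 20, "Dhow": 20}

merfolk_cost = {s: MERFOLK_PROB[s] * MERFOLK_MEAN_DMG[s] for s in MERFOLK_PROB}

# Range if every hex travelled were a Merfolk hex.
merfolk_range = {s: SHIP_HP[s] / (BASE_COST[s] + merfolk_cost[s]) for s in SHIP_HP}

for ship in SHIP_HP:
    print(f"{ship}: merfolk cost {merfolk_cost[ship]:.3f}, range ~{merfolk_range[ship]:.1f} hexes")
```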

One other thing - this cost assumes uniformity of Merfolk, though I actually think the southwest merfolk are more aggressive. Should adjust to account for that later.

We also want to minimize chance of sinking, not damage to be repaired in port. If confident average damage will be tolerable, we might want to reduce long tails rather than average damage. This could favour the Barquentine.

Dhow has less chance of hitting Reefs. Going to the east target, we can take a shortcut through Reefs and might want to consider a Dhow for that.

Additional dmg per Reef hex (v. non-Reef):

Galleon: 0.79

Carrack:  0.76

Barquentine: 0.68

Dhow: 0.17

Going to the West target, we might want to take a shortcut through Kraken territory, for which a Barquentine might be more suitable than a Carrack.

Additional damage per Kraken hex (v. non-Kraken):

Galleon: 0.99

Carrack: 2.04

Barquentine: 0.95

Dhow: 1.59

We also might want to avoid E7, for which we have no data. There be dragons. I mean ... in-universe hypothetical squared dragons.

Also, early on I noted down some hexes where >1/5 of ships passing through were destroyed. They include some hexes which should not be especially dangerous from the above info, but this could just be that the routes also pass through dangerous hexes. Anyway, something to look at with further analysis, and maybe avoid if not costly to do so.

With all the above in mind, candidate routes and other info messily drawn on the map:

Edit: map deleted and moved to imgur since it wasn't being spoilered properly

imgur link

When counting hexes, I don't count the port since these seem safe from the data.

For the west target:

Route A is the obvious choice taking all the above at face value. With a return trip, it will involve 27 hexes, of which 22 are Merfolk hexes and 1 Kraken.

Route B avoids the unknowns of E7. It's the same overall length including Merfolk length as Route A, but has 5 Kraken hexes on the round trip.

Route C also expensively avoids E7. It's 37 hexes, of which 20  are Merfolk hexes and 1 Kraken. No way that's going to be worth it.

I also added route G later which minimizes distance (and avoids E7) at the cost of additional Kraken hexes. 25 hexes, of which 20 Merfolk and  9 Kraken.

All of these routes involve >1/5 destroyed hexes, but I'm not prioritizing avoiding these super hard atm on the theory that these hexes will turn out to just be on paths that go through other dangerous stuff or are long.

For the east target:

Route D avoids reefs,  but is long and goes through Merfolk territory. It also goes through some >1/5 destroyed hexes. 21 hexes round trip, of which 18 Merfolk and 1 Kraken.

Route E takes 2 reefs to shortcut. 19 hexes round trip, of which 4 reefs and 1 Kraken.

Route F takes 3 reefs to shorten the path a bit more. 17 hexes round trip, of which 6 reefs and 1 Kraken. This is the shortest possible path given the constraint that you can't go across land.

So, expected damage for each route :

Route/ship type | Galleon | Carrack | Barquentine | Dhow

Route A (west) | 26.5 | 17.9 | 11.8 | 15.0

Route B (west) | 30.5 | 26.0 | 15.6 | 21.3

Route C (west) | 33.5 | 22.6 | 15.4 | 19.2

Route G (west) | 32.4 | 33.0 | 18.6 | 26.6

Route D (east) | 21.1 | 14.4 | 9.5 | 12.0

Route E (east) | 18.4 | 14.6 | 10.7 | 10.6

Route F (east) | 18.4 | 15.1 | 11.3 | 10.0
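The table appears to follow a simple additive model: total hexes times the base per-hex cost, plus the extra cost for each Merfolk, Reef and Kraken hex. A sketch under that assumption, using the rounded per-hex figures and hex counts given above (so results can differ from the table by a tenth or two). Hex counts not stated in the text, such as Merfolk hexes on Routes E and F, are assumed to be zero.

```python
# Per-hex expected damage by ship type, copied from the figures above.
BASE = {"Galleon": 0.75, "Carrack": 0.50, "Barquentine": 0.37, "Dhow": 0.44}
MERFOLK = {"Galleon": 0.24, "Carrack": 0.11, "Barquentine": 0.044, "Dhow": 0.072}
REEF = {"Galleon": 0.79, "Carrack": 0.76, "Barquentine": 0.68, "Dhow": 0.17}
KRAKEN = {"Galleon": 0.99, "Carrack": 2.04, "Barquentine": 0.95, "Dhow": 1.59}

# Round-trip hex counts per route: (total, merfolk, reef, kraken).
# Assumption: counts not mentioned in the text are zero.
ROUTES = {
    "A": (27, 22, 0, 1),
    "B": (27, 22, 0, 5),
    "C": (37, 20, 0, 1),
    "G": (25, 20, 0, 9),
    "D": (21, 18, 0, 1),
    "E": (19, 0, 4, 1),
    "F": (17, 0, 6, 1),
}

def expected_damage(route, ship):
    """Expected round-trip damage for a ship on a route (additive model)."""
    total, merfolk, reef, kraken = ROUTES[route]
    return (total * BASE[ship] + merfolk * MERFOLK[ship]
            + reef * REEF[ship] + kraken * KRAKEN[ship])

for route in ROUTES:
    row = "  ".join(f"{ship}={expected_damage(route, ship):5.1f}" for ship in BASE)
    print(f"Route {route}: {row}")
```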

Route A looks so much better than the other western choices that I am willing to have the sailors brave the squared dragons. Barquentine looking like the best choice even with only 20 hp.

For the eastern target, Route D looks good with either a Barquentine or a Carrack, or Route F with a Dhow. Some considerations: Route D does go through >1/5 destroyed hexes, so I should try to find out if that really is a problem. On the other hand, the Dhow has low chance to hit a Reef but not low damage if it does get hit - high variance is risky. On balance, I pick the Barquentine on D for now.

Current route and ship choice:

So, for now I pick: 

"The Bloody Diamond, a Barquentine captained by Angus MacDougal" on Route A (Q6-P6-O6-N6-M5-L5-K5-J5-I5-H5-G5-F5-F6-E7-E8) and back by the same route.

"The Saucy Heart, a Barquentine captained by Erin Aubrey" on Route D (Q6-P6-O6-N6-M7-M8-L8-K9-K10-K11-L12-L13) and back by the same route.

Comparing to others' selections:

My selected routes A and D are the same ones chosen by abstractapplic, but I use two Barquentines whereas abstractapplic uses a Galleon and a Barquentine.

Yonge selected Route F to go to the east target and for the west first selected something that looks like it should be equivalent to my Route B, in terms of length and types of hexes it goes in, but at the bottom of the comment changes it (why?) to add some additional dilly-dallying in Kraken territory. Yonge chose a Galleon and a Carrack.

measure picked two Dhows (unconventional!), and sent one of them on a route equivalent-seeming to Route E, which looks sensible to me, but the other one is going to the west target starting out at (up to the last hex) the same route (so, super long route), and is a Dhow cutting through Kraken territory, which looks not so sensible.


Look at Merfolk donut only, check to see if that affects merfolk stats

Look to see if expected damage can reasonably account for observed losses, check where excess losses are occurring (is Jemist right that there are unexpected losses?)

check to see if Captains affect anything

check to see if time docked affects anything

check to see if voyage purpose affects anything

additional remarks:

As Yonge notes, there are 19 encounters not on the planned route. I did not see a pattern and attribute this to noise in the data. Note that it is possible that, even if something was displaced by noise, it would still end up on the planned route. I am inclined to attribute the Merfolk event on O5 to such noise, the event probably having really occurred on N5, which was also on the ship's route.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-04T00:17:32.512Z · LW · GW

Thanks. I can just switch to Excel then if it's significantly better for this purpose. In my case this is not a problem since I have office 365 access through work - I just normally avoid closed source stuff (other than games) for my personal use. GuySrinivasan mentioned another thing in an earlier thread (comment link), I probably should check that out, though expect a bigger learning curve.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-03T05:35:06.336Z · LW · GW

Nice, though I have been finding LibreOffice Calc rather annoying to work with on this one...

Is the following data point a bug?

voyage 3352 has a storm encounter at P14, which is a land hex

Comment by simon on D&D.Sci August 2021: The Oracle and the Monk · 2021-08-14T06:26:38.263Z · LW · GW

Observations and results so far:

Ignoring any time dependence, solar+lunar and solar+earth are the most successful combinations; they would have succeeded on 246 and 235, respectively, of the existing 374 datapoints.

Note, in my remarks on individual mana types I may include information on other mana types.


Solar seems to have a 27 day cycle with 3 peaks within it (so 9 day cycle?) but which peaks are stronger has been changing over time. The current cycle, cycle 14, has been weird with days 15-20 of the cycle (days 366-371) being higher than expected. The last 3 days (372-374) are low, but not far from expected. Other slightly weird cycles include cycle 8 (slightly higher than expected values from days 12-14 of the cycle, i.e. days 201-203) and cycle 10 (slightly higher than expected values from days 20 to 25 of the cycle, i.e. days 263-268). I'm counting "day 1" of a cycle to be the one that's a multiple of 27 from the day 1 of the overall data.
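For reference, the cycle/day bookkeeping described here can be written as a small helper. This is a sketch of the stated convention (day 1 of the data is day 1 of cycle 1, and a new cycle starts every 27 days); the function name is mine.

```python
def cycle_position(day, period=27):
    """Map an overall day number to (cycle number, day within cycle), both 1-based."""
    cycle, offset = divmod(day - 1, period)
    return cycle + 1, offset + 1

print(cycle_position(366))  # (14, 15)
print(cycle_position(371))  # (14, 20)
print(cycle_position(201))  # (8, 12)
```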

If solar is back to its normal pattern, on day 384 it should be on the way down from a high peak and approaching a high trough, so still doing pretty well (>40 expected), making solar a good candidate for one of the mana choices.


Lunar, like solar, shows a 27 day cycle (shouldn't it be 28 days?) with 3 peaks changing which is stronger over time; outliers include day 26 of cycle 8 to day 1 of cycle 9 (days 215-217), which are unexpectedly weak. Like solar, lunar should be declining to a high trough on day 384; I expect >35. Assuming solar is back to normal, solar + lunar should succeed.


Ocean varies greatly from single digits to over 60. While some possible patterns appear (e.g. some short range autocorrelation, and a degree of autocorrelation at a 4-day displacement almost as high as at 1-day displacement) it does not seem to have a fixed period of variation. The possible high values make this a potentially interesting choice if it can be predicted, but more analysis is needed to determine what ocean will be at on day 384.


Breeze looks fairly random, distribution peaks at 13 and varies from 6 to 20.


Flame, like Ocean, has short-range autocorrelation but no single period. Varies from 11 to 41.


Like Ocean and Flame, has short range autocorrelation but does not seem to have a single period. Varies from 2 to 10, so of scientific interest only.


Earth looks random except for some possible 1-displacement autocorrelation. Varies from 9 to 74, so definitely of interest if a pattern can be found.


Looks fairly random, varies between 17 and 31. This is the same size of range as for Breeze, but has a fatter and asymmetrical distribution.


Doom has an 8-day sawtooth pattern with some possible random variation (peak on day 2, bottom on day 3, then steady rise). Notable outliers from expected pattern: Day 9 of cycle 1 (i.e., day 9), day 5 of cycle 22 (i.e. day 181), day 3 of cycle 11 (i.e. day 91), and day 1 of cycle 31 (i.e. day 249).

On day 384, it will be day 8 of the cycle, and something in the range of 27-33 is likely. Not as good as solar and lunar, even discounting the risk of a miscalculation with one of the dangerous mana types.


Spite has a 28-day periodicity with lots of peaks and troughs within the period; the troughs are often (but not always) 0. A reliable spike on day 5 gives the period away. Seems to also prefer specific values instead of a smooth distribution. Day 384 will be day 20 of the period, and a low value can be expected (7 or 0).

Preliminary answer:

So far Solar+Lunar seems the best choice.

This is also the choice that looked best before getting time dependence information for Solar, Lunar, Doom and Spite, so further research on the time dependence of other mana types (especially Ocean or Earth which can have high values) might find an alternative, better answer.

Edit after reading aphyer's solution:

aphyer found that Earth and Ocean are anticorrelated and their sum has a smooth 22-day pattern. As aphyer reports, day 384 should be close to the peak, with an expected value of 75-80 or so. This looks like a good, safe solution.

I actually expect Solar+Lunar to be slightly higher, since the peaks and troughs have been shifting in height/depth over time, and while they will be near a trough at day 384, the trough is one that has been shifting upward. I expect ~43-45 from Solar and ~36-40 from Lunar on day 384. However, this is less certain than the Earth+Ocean expectation and, as aphyer notes, Solar has been weird lately. Solar+Lunar is definitely the riskier pick and probably is objectively not what one should pick based on the available info, so I'd switch to aphyer's solution in real life, but I'll stick with Solar+Lunar as what I want to get credit for (for now) since I have an excuse that it might be better and it's what my analysis was on.

Haven't had time for this recently (fortunately there's an extra weekend, though), but after checking gjm's, Jemist's and GuySrinivasan's comments:

Whoops, so much for Lunar.

Only a few further remarks, as time is running out:

The following predicts solar with +-1 accuracy:

28 day cycle:  32,32,32,27,27,27,27,27,27,32,32,32,34,34,35,36,37,40,41,42,42,41,40,37,36,35,34,34

9 day cycle: 9,10,10,10,9,3,0,0,3

anomalies: +8 days 61 and 62, +12 days 201-204, +9 days 263-268, +24 days 366-371

predicted result for day 384 is 45+-1, unless there's an anomaly.
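A minimal sketch of this predictor, summing the two lookup tables and any anomaly bonus. The phase alignment is my assumption: I take the first entry of each table to correspond to day 1 of the data, which does reproduce the stated 45 for day 384.

```python
# Lookup tables copied from the text above.
CYCLE_28 = [32, 32, 32, 27, 27, 27, 27, 27, 27, 32, 32, 32, 34, 34,
            35, 36, 37, 40, 41, 42, 42, 41, 40, 37, 36, 35, 34, 34]
CYCLE_9 = [9, 10, 10, 10, 9, 3, 0, 0, 3]

# Observed anomalies as (first day, last day, bonus).
ANOMALIES = [(61, 62, 8), (201, 204, 12), (263, 268, 9), (366, 371, 24)]

def predict_solar(day):
    """Predicted solar mana (+-1), assuming both tables start on day 1."""
    base = CYCLE_28[(day - 1) % 28] + CYCLE_9[(day - 1) % 9]
    bonus = sum(b for first, last, b in ANOMALIES if first <= day <= last)
    return base + bonus

print(predict_solar(384))  # 45
```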

Doom's 8 day cycle is 30,32,18,20,22,24,26,28 plus 0-5 with anomalies on days 34, 91, 181, 249. Unlike solar there are both positive and negative anomalies. Expected result (no anomaly) is 28-33 on day 384.
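The Doom range for a given day can be read off the same way. This sketch assumes the listed cycle starts on day 1 of the data and ignores anomalies; the function name is mine.

```python
# Base values of the 8-day Doom cycle, copied from the text above.
DOOM_CYCLE = [30, 32, 18, 20, 22, 24, 26, 28]

def doom_range(day):
    """(low, high) expected Doom on a day: base value plus 0-5, no anomaly."""
    base = DOOM_CYCLE[(day - 1) % 8]
    return base, base + 5

print(doom_range(384))  # (28, 33)
```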

Solar+doom should give (with no anomaly) 72-81, does not look as good as Earth+Ocean's 74-80 (from GuySrinivasan) though obviously Doom is better than Lunar's known value of 16 at day 384.

With respect to Earth and Ocean, both include many values larger than the minimum of the sum of the two, and Ocean has a sharp minimum value at 4, while Earth's minimum value is not so sharp with 9 being the smallest but more of the bottom edge of the Earth-Ocean x-y plot being at 11 or so. This probably says something about how they are made but I have not thought of it. So far, nothing better than Earth+Ocean found.

also (whoops, this postdated the eval, and was apparently spurious to boot):

Obviously, the best candidates for beating earth+ocean are solar+ocean or solar+earth (whichever we can find out will be bigger).

Spite correlates a bit with Ocean and anticorrelates a bit with Earth. Not a super large effect, but relating ocean/earth (which we need to predict) to Spite (which we know deterministically) is very interesting.

Comment by simon on Punishing the good · 2021-07-21T07:58:48.093Z · LW · GW

So, what are your meta-moral considerations here?

If the underlying meta-moral considerations are utilitarian, then I think that using moral outrage as a social punishment against people with differing moral views is likely to backfire very badly in general, and so is not particularly compatible with maximizing utility. (A sin tax is probably a lot safer.)

Now, at least the example of Bob involves a topic on which people in general have differing moral views, but the particular people involved in both examples likely have the same relevant moral views as you. So in these particular cases, perhaps moral outrage might "get the incentives straight", though if people with differing moral views are treated differently (in order to prevent the likely defensive reaction from disagreers), that creates its own set of problematic incentives.

Comment by simon on Covid 7/15: Rates of Change · 2021-07-18T06:48:01.798Z · LW · GW

Yes, as Zvi mentioned in the quote and I acknowledged.

Comment by simon on Covid 7/15: Rates of Change · 2021-07-16T02:17:51.289Z · LW · GW

One possible way for this to kinda sorta work is that perhaps there are people who get tested in order to show a negative test, whose tests get reported every time, and people who get tested because they want to actually know if they have Covid, who mostly only report when they’re positive. Then, doubling the size of the second group doesn’t change reported test counts much? That’s the best I can come up with.

My mental model (backed by nothing) is that a lot of people get tested due to symptoms that often aren't due to covid, so this provides a relatively constant level of negative tests even though they do actually want to know if they have Covid. (In addition to any who simply want a negative test, of course). 

It's possible that perceived prevalence would affect the tendency to get tested for a given level of symptoms, but if so I wouldn't be surprised if this perceived prevalence lags the positive tests. (There's a lot of potential for weirdness here though).

People getting tested due to an interaction with a positive case would provide negative tests correlated with positive tests, but I expect this would lag the positive tests. 

Just speculating - I haven't been paying attention to the negative test patterns in the past, so this might for all I know be totally at odds with the actual data.

Comment by simon on D&D.Sci(-Fi) June 2021 Evaluation and Ruleset · 2021-06-30T01:16:29.181Z · LW · GW

Zeta Resonance randomly fails and produces 0kCept 22% of the time.

So it was statistically independent, and I still managed to Texas-sharpshooter up a  0.00006457 significance level for a false stop at 2.27. Tbh, I expected that the other reasons (edit: see this comment) for Zeta on Earwax to be unsafe were more likely than the stop at 2.27 being just random variation - looks like my priors against such a stop were too weak.

Also the Poisson theme seems it should have been discoverable, but I didn't look into the random variation in the resonances where the random variation seemed irrelevant. (And it mostly was, except that it would have provided insight into gamma, where it did matter.)

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-29T05:34:41.253Z · LW · GW

Since time is pretty much up, summarizing where I'm at:

Afaict each resonance pilot strength result is the product of a pilot-independent, resonance-specific rule and a pilot-specific power level with that resonance (though, tbh, I haven't checked this that closely).

With Maria out, the highest pilot power levels per resonance appear to be:

Alpha: Corazon. This resonance has apparently random variation about a constant value that jumped slightly somewhere around Floorday 500. It is too weak to save us.

Beta:  Janelle.  This resonance has apparently random variation about a constant value. It is too weak to be likely to save us.

Gamma: Janelle. Credit to GuySrinivasan for finding the specific formula of (1+k*amplitude) (times pilot gamma power level). The integer k is from 0 to 5 with 1 being slightly more common than 0 (could be random variation) but dropping off beyond that. Though that isn't the most expected random distribution and may hint at something non-random, I haven't found the pattern if there is one. Janelle needs k to be 1 or higher to save us or 2 or higher to overwhelm Earwax. Without knowing a pattern for the k value, this seems too risky.

Delta: Amir. This resonance has apparently random variation along with a moderate upward slope with heteropneum amplitude. It is too weak to save us.

Epsilon: Will. This resonance follows a cubic formula (credit to GuySrinivasan for reporting the cubic dependence first, though I hadn't read his comment when I reported it). Though GuySrinivasan expresses low confidence in Epsilon, it seems to me that, assuming the cubic formula and the multiplicative relationship between power values for different pilots are correct, there is no way the coefficients could possibly be off by enough for Will not to beat Earwax. And these assumptions seem to me more solid than for Zeta below, so I see this as the safe choice (but not my current choice, because Epsilon will not overwhelm).

Eta: Will. This resonance has a non-random constant value with several jumps over time. One of the jumps appears to coincide with Alpha's jump. Without any reason to expect a further jump since the last observed data, it is not strong enough to save us.

Zeta: Corazon. Zeta is either zero, or one of two non-zero values. Afaict whether it is zero is random, except that no zero values have been observed for heteropneum amplitudes above 2.27, so I weakly infer that there will not be a zero value against Earwax. It seems that which non-zero value occurs depends on which of two or more populations the heteropneum belongs to. The large majority of heteropneums belong to a population with amplitudes that are (before rounding) multiples of 0.142 or something very close to 0.142. These always get a low Zeta value if they get a non-zero result. The minority that are not in this population always get a high Zeta result if they get a non-zero result. Earwax's rounded 3.2 value cannot be obtained by rounding a multiple of 0.142, so we can expect a high Zeta result and for Earwax to be overwhelmed. Thus, I pick this choice, despite my uncertainty as to whether I have enough evidence against a zero result.

A potential wild card is that we don't know Flint's power levels except for alpha, since he never overwhelmed any heteropneums. If there is a way to predict power levels without seeing a strength result with that resonance, this could reveal further opportunities with Flint.

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-26T09:01:00.040Z · LW · GW

update on Eta resonance:

Eta is simply a multiplication of a character-dependent Eta power level and a date-dependent Eta strength. The date-dependent Eta strength is constant except occasionally it jumps. The lowest strength was from floordays 2-253, then second lowest from 280-297, then next level from 316-495, then it jumped to the highest level from 516-746, and then dropped to the second highest level from 749-804. (no relevant data in the time gaps). It has never jumped back to a level after going to a different level.

Will's sole eta value of 0.9 occurred on floorday 110 when Eta was at its lowest strength. This means Will is almost as strong as Maria at Eta (everyone else for whom we have Eta data is lower). Unfortunately, this is still not strong enough to beat Earwax if the Eta strength remains, now at floorday 814, at the same level it's been from 749-804. 

edited to add: Alpha also shows a jump between floorday 495 and floorday 516. (This is the reason for the bimodal appearance of its distribution). Since this jump occurred in both Alpha and Eta, but the others only occurred in Eta, this suggests that it might have a different cause than the other jumps. 

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-25T01:08:40.966Z · LW · GW

update on Zeta resonance:

Though duplicate amplitude values are common, all verifiably high-tier Zeta values so far have been against heteropneums with unique amplitudes. Admittedly, this is only 5 datapoints.

The good news: Corazon got her Zeta results against duplicate-valued heteropneums, so if the pattern holds true for her, her results have been low-tier so far and she is strong enough to overwhelm Earwax if she gets a high-tier result.

The bad news: Earwax has a duplicate amplitude value (as long as the formatting including rounding if applicable is consistent between Earwax and the other entries) so if the pattern holds true for Earwax, there will be no high-tier Zeta result against Earwax. (Wrong, see below.)

Edited to add: Earwax and the "duplicate" (Divisor, floorday 389) have not been overwhelmed and are likely rounded to 1 decimal place, but all of our zeta data is from overwhelmed heteropneums, reducing the likely relevance of the "duplicate".  More detailed info below.

Further addition: I failed to mention earlier that all the non-duplicated entries have either high-tier or zero Zeta results (whereas all the duplicated entries have zero or low-tier Zeta). So, this is very likely significant.

On reviewing the relationships between the duplicated entries for which we have overwhelm results, all are equal to 0.142 multiplied by an integer from 2 to 22 (when that is rounded to 2 decimal places). The 0.142 might not be the exact value but it makes them round correctly. The 22 is probably not the highest but is simply where Maria last overwhelms heteropneums (3.12 amplitude).

Importantly, Earwax's value of 3.2 cannot be rounded from a multiple of 0.142 (3.12 is too low, and the next value would be 3.266, which would round up to 3.3). If I try to lower the base value to 0.1419, this already prevents correct rounding of the known values (it would predict the 18x number would round to 2.55, but the 18x number needs to round to 2.56), and this too-low base value still predicts 3.2637 for the next multiple. Thus, Earwax is not from this population of heteropneums, which accounts for all the low tier Zeta results!
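As a sanity check on the rounding argument, here is a minimal sketch using exact decimal arithmetic. It assumes half-up rounding and that Earwax's amplitude is rounded to 1 decimal place (both are my assumptions about the data format); the helper names are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: Decimal, step: str) -> Decimal:
    """Round to the precision given by `step` (e.g. "0.1"), halves rounding up."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)


def rounded_multiples(base: str, step: str, max_multiple: int = 40) -> set:
    """All values obtainable by rounding base*k for k = 1..max_multiple."""
    b = Decimal(base)
    return {round_half_up(b * k, step) for k in range(1, max_multiple + 1)}


# Can any multiple of 0.142 show up as 3.2 once rounded to 1 decimal place?
earwax_reachable = Decimal("3.2") in rounded_multiples("0.142", "0.1")

# The 18x multiple must round to 2.56; compare the two candidate base values.
eighteen_x = round_half_up(Decimal("0.142") * 18, "0.01")
eighteen_x_low_base = round_half_up(Decimal("0.1419") * 18, "0.01")

print(earwax_reachable, eighteen_x, eighteen_x_low_base)
```

Using `Decimal` rather than floats avoids binary rounding surprises right at the half-way points.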

However, we still need to find a way to predict if we will get a zero result. Of the 9 non-duplicated overwhelmed heteropneums, there was a zero pilot strength Zeta result in 4 of these cases.

Still further addition: For amplitudes above 2.27, we have no cases of zero zeta. Among the overwhelmed heteropneums (which is all we have Zeta data for) we have 27 total cases of zero Zeta among 150 data points, and there are 41 cases with more than 2.27 amplitude. So, if all are statistically independent, then the probability of this happening by chance is (123/150)x(122/149)x...x(83/110)=0.00006457.
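For anyone who wants to reproduce that number, a minimal sketch of the product, under the same independence assumption (the function and constant names are just for illustration):

```python
from fractions import Fraction
from math import comb

TOTAL_POINTS = 150   # overwhelmed heteropneums with Zeta data
ZERO_ZETA = 27       # of those, cases with zero Zeta
HIGH_AMPLITUDE = 41  # cases with amplitude above 2.27


def prob_no_zero_in_high(total: int, zeros: int, high: int) -> Fraction:
    """Chance that none of the `high` cases is a zero-Zeta case, if zeros
    were scattered at random among the `total` cases."""
    p = Fraction(1)
    for i in range(high):
        # (123/150) x (122/149) x ... x (83/110)
        p *= Fraction(total - zeros - i, total - i)
    return p


p = prob_no_zero_in_high(TOTAL_POINTS, ZERO_ZETA, HIGH_AMPLITUDE)
print(float(p))  # compare with the 0.00006457 quoted above
```

The same quantity can be written as a ratio of binomial coefficients, `comb(123, 41) / comb(150, 41)`.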

There are a variety of reasons not to be too impressed by this probability number.

  1. We don't have a very good reason to believe that the results are statistically independent. Duplicates in amplitude values do vary in whether they get zero Zeta (if amplitude less than or equal to 2.27), so they might be statistically independent, though.
  2. I came up with the hypothesis (that Zeta is never zero above some amplitude) after seeing the data, not before, so I need to discount it for possible hindsight bias.
  3. Even if there is a non-random pattern causing the results, it doesn't necessarily imply that it will hold for Earwax.

That being said, my expectation is that abstractapplic did leave us a way to overwhelm Earwax, so I'm confident enough (barely) to switch my proposed response to Corazon with Zeta. In real life, I'd stick with Will and Epsilon, which I am far more confident in.

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-24T07:04:04.681Z · LW · GW

Update on Epsilon resonance:

It's cubic not sine; I can fit Maria's Epsilon data so that the curve rounds to the exactly correct value for every data point, and also for Janelle's data (separately) to round to the exactly correct value; I still need to check if I can make a single curve and multiplier between Maria and Janelle to round exactly for both, but it does look like the curves are at least fairly close to exact multiples of each other. 

Interestingly, no x-value rounding needs to be assumed, at least to get the correctly rounding values for Maria and Janelle separately. So, perhaps the x (heteropneum amplitude) values are exact? (No, see below.)

The cubic curve does take a big dive at high heteropneum amplitudes, but fortunately not until after Earwax's ~3.2 amplitude. Also, the fit for Maria's 0.57 amplitude result of 0.1 is actually around 0.096.  Will getting 0.21 suggests he is at least around 2.13 times stronger than Maria using Epsilon and is projected to get at least about 3.85 against a 3.2 amplitude heteropneum. So, Will using Epsilon still looks like a safe pick to survive if we can't find guaranteed survival another way.

edited to add: note that the speculation that the x-axis values might be exact should only apply to the overwhelmed heteropneums - these are the only ones we have epsilon data on and also the only ones we have data to 2 decimal places on. (Irrelevant, see below.)

Every overwhelmed heteropneum that duplicates the amplitude of another overwhelmed heteropneum has an amplitude equal to 0.142 times an integer from 2 to 22 (the integers probably go higher than this, but Maria stops overwhelming them at that point). This value of 0.142 might not be the exact value, but it makes the numbers round correctly, whereas the rounded values are not in exactly the right ratios, so presumably the amplitude values are rounded.

Comment by simon on D&D.Sci(-Fi) June 2021: The Duel with Earwax · 2021-06-23T05:20:37.489Z · LW · GW

Initial impressions:

Even though Janelle probably always uses Beta and Maria probably always uses Delta, we can get an idea of the characteristics of each resonance type by comparing their hypothetical results against heteropneums weak enough for them to overwhelm. 

From eyeballing graphs of strength v. heteropneum amplitude for each resonance type and both pilots:

The qualitative behaviour of each resonance looks similar between Janelle and Maria, but quantitatively different (likely a simple multiplicative factor, but I should check!). The multiplier per pilot is different for the different resonance types (so, e.g, Janelle is about as strong as Maria with Beta resonance, but weaker with other resonance types).

And for the different resonance types the graphs are as follows:

Alpha does not depend on enemy strength and maybe has two clumps.

Beta does not depend on enemy strength.

Gamma's points seem to line up on straight, mostly slanted lines from a more-or-less common origin at zero enemy strength. This suggests a strong dependence on enemy strength, but one of the lines is flat and too low, so we need a way to find out which line you'll end up on.

Delta has a gentle upward trend for Maria (too noisy to detect for Janelle).  This does not appear to be a selection effect as Maria is always handily beating the heteropneums.

Epsilon has a curve that looks like a parabola at first, but then slows down, so maybe a sine curve? It is very consistent looking (not noisy) so should be possible to have an accurate fit for it. The curve looks a bit distorted in places for Janelle but this is likely just rounding due to her very low values at this resonance. 

Zeta and Eta have points lined up on flat lines. For Zeta one of those lines is at zero.

Based on this, some candidate responses: 

  1. If we can figure out which line we'll end up on, possibly Gamma as used by Janelle. We need to be confident however that we'll end up on a good line.
  2. Ditto for Zeta as used by Corazon, but with the additional caveat that we need to know that her past hypothetical results of 1.98 were low tier results (and she'll get high tier this time). If the 1.98's were high tier, she'll lose.
  3. If we can't figure out the information needed for either of the previous, the safe choice appears to be...Epsilon as used by Will. This may seem surprising at first glance since Will's only hypothetical result with Epsilon resonance was a measly 0.21. However, this result was at Heteropneum amplitude 0.57, near the bottom of the Epsilon power curve as seen for Maria and Janelle. If Will has the same Epsilon power curve but with a multiplier, he is around twice as strong as Maria with Epsilon resonance (but check rounding error bars!), and should confidently beat Earwax as long as the Epsilon power curve doesn't take a surprisingly sharp turn to decline between 3.12, where Maria last overwhelms heteropneums with Delta, and Earwax's amplitude of 3.2. However, Will will not overwhelm Earwax, and either option 1 or 2 could do so if successful, so if we can figure out the necessary information for either of those options, they would be preferable.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-28T06:21:15.767Z · LW · GW

It* requires less effort because 'cooperation' reduces effort, while 'competition' increases it**.

In general, one would define cooperation in games as strategies that lead to better overall gains, and ignore the effort involved in thinking up the strategy. In this case there was an easy cooperative strategy, but that's not true in general: for example, in the Darwin Game, designing a cooperative strategy was more complicated than a simple 3-bot defect strategy. 3-bot didn't do well, but possibly could have if a lot of non-punishing simulators had been submitted (there weren't).

Also, even in this particular case, you could have had better results if you had taken the effort to get more people to follow the same strategy. The rules did not explicitly forbid coordination, even by non-Lesswrongers, so you could have recruited a horde of acquaintances to spam 1-bids. (That might have been against the spirit of the rules, but you could have asked abstractapplic about it first, I guess.)

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-26T02:40:18.048Z · LW · GW

Good point. I should have anticipated strategies that require less effort to be more popular.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-26T02:37:29.619Z · LW · GW

are the 2-bidders stable against 'defection'?

Of course not, they lose to 3-bidders. I wouldn't consider that "defection" in the same way though, since the 1-bidding is presumably an attempt at coordination and the 2-bidding would be exploiting that coordination and not directly a coordination attempt.

There weren't any 2-bidders.

Sure, but if 1-bidding were to become popular in similar problems, there would start to be 2-bidders.

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-25T15:34:42.236Z · LW · GW

Yeah, and actually 1-bidding can be a good strategy even from a selfish perspective if you can get enough people to coordinate on it, since a small enough number of high bidders will run out of money and the 1-bidders make a large profit on what they do win, though it's not stable against defection (2-bidders win in the 1-bidder-filled environment).

Comment by simon on A.D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-24T23:49:27.055Z · LW · GW

Bidder G reporting in... 

Looks like my incorrect speculations on the exact models were likely not helpful. I also did not expect the 1 bidders (a fine strategy against real duplicates like in the scenario given, but we're trying to have a competition here!).

Comment by simon on A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction · 2021-05-24T02:21:56.238Z · LW · GW

I'm assuming that BST is British Summer Time and the deadline has passed. Remarks about the problem and my bid before abstractapplic posts the results:

Decision on how aggressively to bid

With some exceptions for the jewel beetle and mild boars, discussed below, I generally estimated the EV and bid lower by a scaling factor. The scaling factor was pretty ad hoc and not based on some sophisticated game theory, as I don't really know how aggressively people are going to bid. I did not adjust the scaling factor based on the lot number.

One Schelling point is to bid a total of 300, so I figure I should probably bid higher than that on average (given the revenue up for grabs is more than twice that). Another would be to bid at the minimum end of the observed range for each lot, so I could have tried to beat that if the minimums were reasonable, but didn't get around to actually checking this, except that I did note that my bids were above my expectations for what the true minimums were in the cases where I got around to estimating that. 

I assume other people are also bidding above these points. If that is not the case, I will win a lot of bids, but likely lose in profit to someone making higher per-lot profit on fewer lots.

Analysis of revenue from different carcass types:

Jungle Mammoths:

The Jungle Mammoths (=elephants?) looked consistent with a formula of 31+4d6-3dsd so I assumed that their EV was 45-3dsd.


Dragons:

The dragons look like they all have similar characteristics in how their value drops over time, in particular a big drop of around 30 in value between 4 and 5 dsd (except gray dragon, which has too little data to tell). One possibility would be that each has its own time-invariant distribution, to which a shared "dragon curve" is added. If I had more time, I would have tried to figure out the dragon curve and the separate distributions by comparing the different dragon types (or rule it out and look for another hypothesis). As it is, I estimated the dragons in a pretty ad-hoc manner (eyeballing graphs mostly).

I do note that red dragon has some interesting even/odd behaviour, as it is always odd from 1 dsd to 6 dsd, and always even from 7 dsd to 10 dsd. If the "dragon curve" hypothesis is true, then this could be explained by an always-even or always-odd "red distribution" (e.g. 2*2d12?) combined with a "dragon curve" that switches from odd to even at that point.

Mild boars:

For the mild boars (=pigs?), I tried to figure some model out that would match the observed qualitative behaviour and came up with rolling two d20s and setting each individually to 0 if less than or equal to the dsd. However, this did not match the quantitative characteristics, as it was consistently too pessimistic at low dsd and too optimistic at high dsd.
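To make that first model concrete, here is a small sketch of its expected value as a function of dsd. It only covers the simple two-d20 version described above; the function names are hypothetical and nothing here is fitted to the data.

```python
from fractions import Fraction


def zeroed_die_ev(dsd: int, sides: int = 20) -> Fraction:
    """Expected value of one die that is set to 0 when the roll is <= dsd."""
    surviving = sum(v for v in range(1, sides + 1) if v > dsd)
    return Fraction(surviving, sides)


def two_dice_model_ev(dsd: int) -> Fraction:
    """Expected value of the two-d20 model: each die is zeroed independently."""
    return 2 * zeroed_die_ev(dsd)


for dsd in (0, 4, 8, 12):
    print(dsd, float(two_dice_model_ev(dsd)))
```

Comparing these numbers against the observed per-dsd averages is one way to see where the model is too pessimistic or too optimistic.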

So, instead of taking the hint that I was wrong, I doubled down and added some epicycles. Namely, rolling 3 dice, setting each to zero if below dsd, then taking the top two, except that if you rolled a zero, you had to include the zero. (That's a pretty crazy hypothesis as stated, but maybe slightly less crazy in the equivalent formulation of adding the dsd to each die, taking the top two dice, and then setting any die over 20 to zero).

This seemed to predict the low-dsd mild boars a lot better, but was still optimistic on the high-dsd mild boars. Due to low numbers, a close fit on the high-dsd boars might be less necessary though. It also predicts a bimodal distribution with a trough at around 22 and while you can sort of see something like a hint of that in the data, it is not very convincing. Going to 4 dice adversely affected the early mild boar fit and seemed worse overall.

Anyway, I decided to roll with it (the 2 out of 3 d20s model), but since I am not super convinced, I limited my bids on the 8 dsd mild boars (lots 9 and 11) to 9sp, equal to the ceiling of the average of observed value for 8 dsd mild boars. Due to the "winner's curse", in the very likely event that I am wrong on their distribution I will probably take a loss on these.

Jewel Beetle

As previously remarked on by other commenters, the jewel beetle (or "lottery ticket beetle" as I think of it) has a high variance distribution. It looks more or less like a power law. In fact, it looks like it's such an extreme power law that it won't even have a finite expected value, as the extreme low frequency outliers will have value disproportionate to the low frequency.

So, if I were in the position of the hypothetical scenario provided, I would probably bid a lot for the lottery ticket beetle.

However, I'm not in that situation. I am instead competing for the glory of being Numbah One. And while the jewel beetle might have an extreme value, it probably doesn't. So, I reduced my jewel beetle bid to the median jewel beetle value of 12 instead of gambling on an outlier here.

I also note that new jewel beetles seem to tend to be lower in value than old ones. Not sure if this is random and my prior is generally against this.

Actual Bid


Comment by simon on A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction · 2021-05-17T23:19:16.095Z · LW · GW

Is our profit evaluated based on actual results, or based on expected value?

Comment by simon on Does butterfly affect? · 2021-05-15T02:03:52.818Z · LW · GW

Sure, the butterfly is really minor compared to everything else going on, and so only "causes" the hurricane if you unnaturally consider the butterfly as a variable while many more important factors are held fixed.

But, I don't believe the assassination of Franz Ferdinand is in the same category. While there's certainly a danger that hindsight could make certain events look more pivotal than they really were, the very fact that we have a natural seeming chain of apparent causation from the assassination to the war is evidence against it being a "butterfly effect". 

Comment by simon on D&D.Sci May 2021 Evaluation and Ruleset · 2021-05-15T01:41:40.467Z · LW · GW

The amount Carver gains from a Yeti carcass is given by 70+1d6-[DSD]d6


No, I already went over this with GuySrinivasan lol...

line # 89 (carcass # 88): Yeti,0,60sp,60sp,77sp

Anyways, I'm assuming that's a typo there and you meant to put in 72.


60-28*[DSD] for Snow Serpents 

that should be a 20.


This one really brought home to me the usefulness of strong (yet correct) priors. 

Assuming that the typo wasn't in the d6, credit to GuySrinivasan for correctly defending the d6 against the weight of evidence for the d5. Also, for insisting on an age distribution with a higher prior probability than a weighted average that just happens to be triangular would have.

This puzzle was made a lot easier by the simplicity of the model: everything was independent of everything else (except for bids and value obtained depending on monster type and days since death, which we were primed to expect by the problem), and there were no hidden variables except the randomness necessary to actually have something to work out. I don't particularly feel like a Bayesian superintelligence, though maybe all problems look like this to one sufficiently advanced.

Looking forward to whatever non-puzzle you have in mind for Monday. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-11T01:08:20.413Z · LW · GW

Anyway, in the spirit of tumbling platonic solids:

One possible distribution for the age numbers would be the distribution generated by min(d12,d12)-1. This is not the same as the 1,2,3,4...12 triangular distribution, but rather a 1,3,5,7,...23 triangular distribution. (The 1,2,3...12 distribution would be generated by min(d12,d13)-1.)
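A quick enumeration of both dice constructions shows where the two sets of weights come from. This is just a sketch of the counting argument (the function name is made up), not a claim about how the data was actually generated.

```python
from collections import Counter
from itertools import product


def min_minus_one_counts(sides_a: int, sides_b: int) -> list:
    """Count how many (a, b) roll pairs give each value of min(a, b) - 1."""
    counts = Counter()
    for a, b in product(range(1, sides_a + 1), range(1, sides_b + 1)):
        counts[min(a, b) - 1] += 1
    # Index 0 is age 0, the last index is the maximum age.
    return [counts[age] for age in range(min(sides_a, sides_b))]


odd_weights = min_minus_one_counts(12, 12)     # min(d12, d12) - 1
linear_weights = min_minus_one_counts(12, 13)  # min(d12, d13) - 1

print(odd_weights)
print(linear_weights)
```

Age 0 gets the largest weight in both cases; the second list is the plain 12, 11, ..., 1 weighting scaled by a constant factor.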

And checking the likelihood - this one is actually better. 

LOG10 likelihood

-1672.05 for 1,2,3,...12

-1671.43 for 1,3,5,...23

P.S. I was terse in the previous comment because of time constraints. About the difficulty of the triangular distribution, I was thinking it wasn't that unlikely anyway because in the previous problem abstractapplic generated a weighted average by taking a random entry from a list that contained duplicates, and a suitable list could be generated easily enough using a for loop. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T15:55:16.138Z · LW · GW

Looks like the likelihood for triangular is over a million times better (to log-nearest order of magnitude ~10^-1672 v. ~10^-1679) than the 1/6 drop per turn exponential. 

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T05:12:57.727Z · LW · GW

The age 0 amount is higher than expected from the model distribution, but it's nowhere near 2.5x the age 1 amount. I have:

Overall 289 age 0, 233 age 1 (expected 258, 236.5)

Snow Serpents 103 age 0, 92 age 1

Winter Wolves 141 age 0, 108 age 1

Yeti 45 age 0, 33 age 1

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-10T01:30:21.447Z · LW · GW

A potential model of the full problem (involves questionable numerology):

There are 13 lots currently, and the number of carcasses in the record is divisible by 13 (129*13). If we include the current auctions, that's 13*130, or 13*13*10. 

So, I'll assume that all auctions have 13 lots.

The individual monster types aren't divisible by 13 (except Snow Serpents), nor are they if we include the current auctions (except Winter Wolves). However, the Wolf:Yeti ratio seems very close to 3, and if the overall ratios were 2:5:6 that would fit in with the 13 theme and seems close enough to the Yeti:Serpent:Wolf ratio.

The age distributions look fairly triangularish, with a maximum age of 11. One possible way to express that would be there is on average 1 carcass of age 11 for every 2 aged 10, up to every 12 aged 0. And what's 1+2+...12? Of course - a multiple of 13. Specifically 13*6.

Now, it would be nice if looking at the data in blocks of 13*6 showed some pattern, but I don't see one, nor is the data a multiple of 6. No matter, we will press on without such empirical validation.

I also note that, in the current auction, the early lots look newer than the late lots. Coincidence? Probably.

So, model:
Base Lot Generation (low confidence):
Auction of 13 lots each day
Each lot assigned an age from 0 to 11 by a random distribution weighted by {12-age}
Each lot assigned an animal type by a random distribution weighted 2:5:6 for Yeti:Serpent:Wolf

Bids: There are two non-Carver bidders, one of whom bids:
60-20*age for Snow Serpents
50-12*age for Winter Wolves
55-6*age for Yeti

and the other one bids:
9+d8 for Snow Serpents
19+d4 for Winter Wolves
29+d6 for Yeti

whereas Carver bids (not high confidence):
7+2d10 for Snow Serpents
31+2d8-3*age for Winter Wolves
32+2d20-2*age for Yeti

Revenue: (credit to GuySrinivasan)
20+2d6 for Snow Serpents
25+4d6-2*age for Winter Wolves
72+1d6-{age}d6 for Yeti (assuming that the prior for d6's over d5's is stronger than the 45 times better fit for a d5 in the base part of this formula)
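
For reference, the expected revenue implied by these revenue formulas can be worked out from the d6 mean of 3.5. A minimal sketch, taking the formulas exactly as written above (the function and the monster labels are made up for illustration):

```python
from fractions import Fraction

D6_MEAN = Fraction(7, 2)  # average roll of a fair six-sided die


def expected_revenue(monster: str, age: int) -> Fraction:
    """Expected revenue under the revenue formulas listed above."""
    if monster == "snow_serpent":
        return 20 + 2 * D6_MEAN                  # 20+2d6
    if monster == "winter_wolf":
        return 25 + 4 * D6_MEAN - 2 * age        # 25+4d6-2*age
    if monster == "yeti":
        return 72 + D6_MEAN - age * D6_MEAN      # 72+1d6-{age}d6
    raise ValueError(f"unknown monster type: {monster}")


for monster, age in [("snow_serpent", 5), ("winter_wolf", 7), ("yeti", 0)]:
    print(monster, age, float(expected_revenue(monster, age)))
```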

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-09T19:28:33.525Z · LW · GW

Nice analysis! But I am not so confident on how strong the prior should be and am somewhat torn on the conclusion.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-09T06:08:31.684Z · LW · GW

OK I calculated the likelihoods of getting the full yeti revenue data set for those distributions and got the following results:

For 78-{age+1}d6 (which is equivalent to 71+1d6-{age}d6):

likelihood ~ 10^-192

For 72 + 1d5 - {age}d6:

likelihood ~10^-189

So the 1d5 version is literally a 1000 times better fit to the data, and I doubt that the prior for 1d6 over 1d5 is that strong. Besides, the base value of 72 is a round number in a way, which might increase the 1d5 prior a bit. I'd definitely bet on the 1d5 version over the 1d6.
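To spell out how the likelihood ratio trades off against the prior, here is a minimal sketch of the odds form of Bayes' rule. The log-likelihoods are the rounded values above; the prior odds in the example are purely hypothetical placeholders, not something I estimated.

```python
def posterior_odds(log10_like_a: float, log10_like_b: float,
                   prior_odds_a_over_b: float) -> float:
    """Posterior odds of hypothesis A over B = Bayes factor x prior odds."""
    bayes_factor = 10.0 ** (log10_like_a - log10_like_b)
    return bayes_factor * prior_odds_a_over_b


# A = "72 + 1d5 - {age}d6", B = "71 + 1d6 - {age}d6"
bayes_factor_d5 = posterior_odds(-189, -192, 1.0)

# Hypothetical prior: suppose 1d6 started out 100x more plausible than 1d5.
odds_d5_given_strong_d6_prior = posterior_odds(-189, -192, 1 / 100)

print(bayes_factor_d5, odds_d5_given_strong_d6_prior)
```

In other words, the 1d6 prior would need to be on the order of 1000:1 just to break even against this fit.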

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T23:32:23.549Z · LW · GW

OK, I see such an argument for the die used, but the base value proposed can't be correct with that die.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T22:01:14.878Z · LW · GW

Your yeti revenue distribution would predict a maximum of 76 at day 0, but we see 77. We also don't see anything below 73. I suggest 72 + 1d5 at day 0. 

The full formula would be then 72 + 1d5 - {age}d6, where these are independent d6 rolls, and not a single roll multiplied by the age (I think this is what you meant, but clarifying since I wasn't 100% sure on the terminology used). (The other interpretation would lead to too high variation at high ages, I think).

The formulas for Snow Serpent and Winter Wolf look consistent with the data to me.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T20:37:53.331Z · LW · GW

Nice graphs in the colab. 

For the bidder model, the Yeti day=3 data is consistent with the Days 0,1,2 line and not with the days >=4 distribution, and in general, whichever of the new-carcass line or old-carcass distribution would be higher is what we see.

So, my model would be that there are two other bidders, one who makes deterministic linear bids and one who bids in a uniform distribution (see my reply to my top-level comment); both bid on everything (well, I guess as long as the bid is >0), but we don't see the lower bid.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T08:36:02.312Z · LW · GW

To save anyone time if they want to read my decision, it's the same as GuySrinivasan's better formatted decision except on lots 10 and 11, where I took seriously the statement that we're indifferent to work and bid 1 sp higher than GuySrinivasan did.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T08:29:53.510Z · LW · GW

Above I explicitly noticed that Carver's bids would distort the distribution, set that aside, then, when I encountered a skewed distribution, failed to think to attribute it to Carver's bids. But of course that would be the obvious explanation. 

I looked at this, and found that the actual distribution does match pretty well what I'd expect if the true distribution of non-Carver bids were a uniform distribution for old carcasses, though the skew effect was only really dramatic for Snow Serpents anyway.

(edit): Assuming this is correct, we can now better calculate the probability of losing if we bid lower.

For Lot 10: 23 v. 22 sp on 7 day wolf, we cut win rate by exactly 1/3 going down to 22, the revenue is a bit over 25 by my linear fit or exactly 25 by GuySrinivasan's formula; meaning, if GuySrinivasan is right then the average profit is exactly the same either way.

For Lot 11: 22 v. 21 sp on 8 day wolf, we exactly halve win rate going down to 21, the revenue is a bit over 23 by my linear fit or exactly 23 by GuySrinivasan's formula; meaning, if GuySrinivasan is right then the average profit is exactly the same either way.

(edit2): A further note - Even though accounting for Carver's bids when calculating the old-carcass bid distribution dramatically reduces the loss probability from bidding low on Snow Serpents, it's still not worth it to bid low on them (since 1/8 probability of losing more than makes up for profit going from 9 to 10). However, if the parameters of the problem had been changed a bit, this could have been a nice trap.
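
The comparisons in these edits can be sketched as follows. The assumptions are mine: the old-carcass rival bid is uniform over the integers 20-23 for Winter Wolves and 10-17 for Snow Serpents, we only win with a strictly higher bid, the linear new-carcass bidder is irrelevant at these ages, and revenue is replaced by its expected value (25 and 23 for the wolves, 27 for the serpents).

```python
from fractions import Fraction


def expected_profit(bid: int, revenue: int, rival_low: int, rival_high: int) -> Fraction:
    """Expected profit against one rival bidding uniformly on the integers
    rival_low..rival_high, where we win only if our bid is strictly higher."""
    outcomes = rival_high - rival_low + 1
    wins = sum(1 for rival in range(rival_low, rival_high + 1) if bid > rival)
    return Fraction(wins, outcomes) * (revenue - bid)


# Lot 10 (7 day wolf, revenue 25) and Lot 11 (8 day wolf, revenue 23)
lot10 = (expected_profit(23, 25, 20, 23), expected_profit(22, 25, 20, 23))
lot11 = (expected_profit(22, 23, 20, 23), expected_profit(21, 23, 20, 23))

# Old snow serpent (revenue 27): bid 18 to always win, or 17 to risk a tie
serpent = (expected_profit(18, 27, 10, 17), expected_profit(17, 27, 10, 17))

print(lot10, lot11, serpent)
```

With a different tie-breaking rule the win probabilities shift, so the exact ties between bids would not necessarily survive.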

(edit3): Or wait - do we actually know that the non-Carver bid distribution for Snow Serpents only goes down to 10? What if it goes down to 8 or lower, but we miss those because Carver bids over them? Carver's bids go down to 9, so we could theoretically see a 9 non-Carver bid, but the data might not include any by coincidence. We can't possibly see a non-Carver bid below 9. Looking at this - no, if it goes down to 8, the number of non-Carver bids we'd expect to see seems too low - I'm reasonably confident it doesn't go down that low.

Comment by simon on D&D.Sci May 2021: Monster Carcass Auction · 2021-05-08T06:10:03.722Z · LW · GW

Initial observations and results (warning, for all I know right now it could be a substantially complete solution):

It looks like the non-Carver bidders for newer carcasses are consistent, and drop linearly with age in days:

snow serpents: 60 - 20*age 

winter wolf: 50-12*age (technically we don't have a third point to see that this is linear, but see below)

yeti: 55-6*age

Presumably, this is due to a single bidder.  The trend continues until it drops below the distribution for older carcasses, so I guess that this bidder continues to bid this pattern down to 0 sp.

For older carcasses, there is a range of variation that doesn't seem to depend much on age. I therefore assume for now that it really doesn't depend on age, though for all I know it does.

For winter wolf, Carver won all the auctions for 2 and 3 day carcasses. However, Carver bid high enough for this data to be consistent with 2 day old winter wolves following the linear trend and 3 day old winter wolves following the old-carcass distribution, so I am assuming that they do follow these trends.

One potential pitfall in assessing this distribution for older carcasses is that where Carver also bids within the range, the lower non-Carver bids are more likely to be overbid by Carver. Ignoring this for now. For 4 day old yetis, the new-carcass bidder would fall within the expected older-carcass non-Carver bid distribution, though all actual bids were higher. Ignoring this as well.

The old-carcass distribution appears to be:

snow serpent: 10-17, but heavily skewed to higher values

winter wolf: 20-23, possibly some skew to higher values

yeti: uniformish 30-35, though with few data points, could also have some skew to higher.

In terms of revenue obtained, Yetis and Winter Wolves appear to show a linearish decline with age, while Snow Serpents appear roughly constant. Assuming such linearity/constantness:

snow serpent expected revenue: 27.15

winter wolf expected revenue: about 39 -2*age

yeti expected revenue: about 75-3.4*age

Solution based on the above:

Lot 1, Yeti, 0 days: bid 56 to beat expected  new-carcass bid of 55. Expected profit about 19sp, we have 344 sp remaining.

Lot 2, Snow Serpent, 2 days: bid 21 to beat expected new-carcass bid of 20, 7 sp expected profit.  323 sp remaining.

Lot 3, Snow Serpent, 1 day: no profitable bid expected, skip.

Lot 4, Winter Wolf, 1 day: no profitable bid expected, skip.

Lot 5, Yeti, 5 days: bid 36 as there is enough likelihood of a bid up to 35 to avoid losing the ~22sp expected profit. 287 sp remaining.

Lot 6, Winter Wolf, 1 day: skip as with previous 1 day winter wolf

Lot 7, Snow Serpent, 1 day: skip as with previous 1 day snow serpent

Lot 8, Snow Serpent, 5 days: bid 18 as there is enough likelihood of a bid of 17 to make it not worth risking losing the ~9sp expected profit. 269 sp remaining.

Lot 9, Winter Wolf, 3 days: bid 24 as there is enough likelihood of a bid of 23 to make it not worth risking the ~9sp expected profit. 245 sp remaining.

Lot 10, Winter Wolf, 7 days: at an expected revenue of ~25, it is worth going down to 23sp to get ~2sp profit though we might lose. Going down to 22sp looks like it will reduce the chances of getting it by more than a third for less than 50% gain, not worth it. So, bid 23sp, ~2sp profit, 222 sp remaining.

Lot 11, Winter Wolf, 8 days: at an expected revenue of a bit over 23, it is worth going down to 22sp, but not to 21sp (which halves or slightly more than halves success chance for less than a doubling of revenue). So, bid 22sp, ~1sp profit, 200 sp remaining.

Lot 12, Snow Serpent, 8 days: bid 18 for same reason as any other old snow serpent. ~9sp profit, 192 sp remaining.

Lot 13, Winter Wolf, 2 days: bid 27 to beat expected new-carcass bid of 26 (even though we have no 2-day winter wolf non-Carver bids to confirm this);  expected profit ~8sp, 165 sp remaining. 

Since we have lots of sp remaining, I go back and put in bids for the 1 day winter wolves and snow serpents I skipped. Specifically, I'll put in 18 for the one day snow serpents and 24 for the one day winter wolves to beat the old-carcass bid distribution in case the new-carcass bidder doesn't show up.

Comment by simon on D&D.Sci April 2021 Evaluation and Ruleset · 2021-04-19T16:09:51.296Z · LW · GW

All the distributions look pretty much as I expected except the merpeople; while I suspected multiple peaks and/or a long tail I didn't expect that the lower-damage visible(ish) peak had the long tail.

One thing you didn't discuss though, is how you generated the number of trips and trip directions. One thing I noticed, but didn't remark on, is that the variation in total number of trips each month is smaller than the variation in northbound trips and southbound trips separately. That seems unexpected given the scenario, but my main theory is that you simply had some formula for the total trips per month and then randomly assigned each trip a direction?

Comment by simon on D&D.Sci April 2021: Voyages of the Gray Swan · 2021-04-17T20:56:41.765Z · LW · GW

Possible counter-evidence:

Pirates have a bimodal distribution (around 20% and 40% damage) and only the 40% part of the distribution seems to have declined. So, this looks like two different populations and theoretically, the 20% pirates could be the strong, smart pirates who win a lot and back off early if they won't get an easy win, while the 40% pirates could be weak, stupid pirates who go all out every time. 

Still all totally speculative of course.

Comment by simon on D&D.Sci April 2021: Voyages of the Gray Swan · 2021-04-14T02:34:06.909Z · LW · GW

I haven't looked much into dependence on time and direction yet, apart from

noticing that the pirates decline in relative frequency

but, I would like some clarification about whether the 100gp budget is for a single set of interventions we'll use on all trips, or is spent each time on a (potentially varying) arrangement. (edit: I see abstractapplic already responded to such a question from Measure: it's a single set obtained up front and not changed.)

My current thoughts:

From looking at the probability distributions, I mostly agree with gjm; my current recommendations are the same as GuySrinivasan's and Measure's.

Demon Whale distribution scares the crap out of me, and I probably panic buy all 20 oars allowed. I mostly agree that the distribution looks like it could be peaking near 100% damage, but I am not at all confident of this and think it could be consistent with something growing a lot bigger beyond the cutoff. I expect 250-1000 of the destroyed vessels to have been destroyed by demon whales. While it looks like it will be in control before needing 20 oars, I am uncertain enough to value oars pretty high. Budget spent = 20gp.

Merpeople have a very weird-looking distribution. It doesn't seem to be tailing off at the end, and so (like gjm) I am very uncertain about what happens after the 100% cutoff. I think (like GuySrinivasan) there's a possibility that it has a multimodal distribution with another peak beyond 100% (not saying bimodal since it even looks like there might be a small peak around 25% distorting the main peak of about 50%, though this could very easily be random). I figure merpeople are responsible for around 200 (assuming no extra peak) or potentially vastly more (assuming an extra peak) of the losses. Merpeople are potentially another solid choice for mitigation imo, unless removing them from the encounter table puts demon whales in as the substitute. Budget spent = 45+20=65gp.

I do not agree with gjm that only about 1% of crabmonster encounters are terminal. The distribution seems to be tailing off very slowly, visually more or less consistent with a triangle-shaped distribution. A simple linear extrapolation would suggest a few percent which I would take as a lower bound. But it might not be linear, but slowing in how it tapers off, so it might be much much more than this. For all I know (apart from the finite number of sinkings) it might not even sum to a finite value. On the other hand, we only really care about crabmonster attacks that do less than 200% damage, since the only relevant intervention reduces damage by 50%. I estimate that between about 50 and about 250 of the destroyed vessels were destroyed by the relevant part of the crabmonster damage distribution, with potentially unlimited numbers destroyed by crabmonsters outside that range. Despite the expected max of about 250 mitigatable losses I consider arming carpenters a pretty solid choice for 20gp. Budget spent = 20+65=85gp.
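The "less than 200%" cutoff follows from the 50% reduction, and can be sketched as below. This assumes the reduction is multiplicative and that a ship sinks at 100% damage or more; the function name `crab_outcome` is hypothetical.

```python
def crab_outcome(raw_damage, reduction=0.5, sink_at=1.0):
    """Classify a crabmonster hit by raw damage (1.0 means 100% of the hull).

    ASSUMPTION: the intervention scales damage by (1 - reduction) and a ship
    is lost once the damage taken reaches `sink_at`.
    """
    if raw_damage < sink_at:
        return "survives regardless"
    if raw_damage * (1 - reduction) < sink_at:
        return "saved by the intervention"
    return "sinks regardless"


for damage in (0.8, 1.5, 2.0):
    print(damage, crab_outcome(damage))
```

Only hits in the 100% to 200% band change outcome, which is why the tail beyond 200% does not affect the value of the intervention.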

Nessie looks like a pretty straightforward distribution where I assume about 10-20 losses are from the bit of the distribution we can't see. A single cannon (which is all that we can afford after the other purchases) should suffice. Budget spent = 10+85=95gp.

Conclusion: 20 oars (20gp) + pay off merpeople (45gp) + arm carpenters (20gp) + one cannon (10gp), total 95 gp spent. This is the same plan previously recommended by GuySrinivasan and Measure.
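As a quick check of the running totals, the tally can be reproduced from the prices quoted above and the 100gp budget. The variable names are just for illustration.

```python
from itertools import accumulate

BUDGET_GP = 100

# Purchases in the order discussed above, with the prices quoted in the text.
plan = [
    ("20 oars", 20),
    ("pay off merpeople", 45),
    ("arm carpenters", 20),
    ("one cannon", 10),
]

running_totals = list(accumulate(cost for _, cost in plan))
total_spent = running_totals[-1]
remaining = BUDGET_GP - total_spent

for (item, cost), spent in zip(plan, running_totals):
    print(f"{item}: {cost}gp (spent so far: {spent}gp)")
print(f"remaining: {remaining}gp")
```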

Nothing else looks like it can kill us, unless e.g. some bimodal distribution has one of its humps located entirely within the >=100% zone. 

However, stepping out of the pure data analysis and into reasoning about the fantasy world, it seems strange that pirates would bother to attack us if they only ever do 64% damage. They are intelligent, after all. Maybe they run away if in a hard fight, not sticking around to do more damage than that, and take over the ship entirely if they win? I might consider a second cannon (also provides extra insurance in case nessie's distribution extends further than expected). Dropping 3 oars would provide the funds, and will probably still be enough for the demon whale. 

(and...also anticipated by Measure on the pirate theory).

Comment by simon on Don't Sell Your Soul · 2021-04-07T03:37:18.771Z · LW · GW

They say that now, but perhaps they would change their mind in a hypothetical future where they actually regularly interacted with ems.

Comment by simon on I'm from a parallel Earth with much higher coordination: AMA · 2021-04-06T09:32:15.980Z · LW · GW

I think the idea is that the different Earths have the same landmasses, and in particular the landmass corresponding to our Japan has in Dath Ilan an exile-location function that is vaguely analogous to Australia in the past in our timeline.

Comment by simon on Core Pathways of Aging · 2021-03-30T00:42:55.585Z · LW · GW

It's possible that natural selection has historically kept the quantity of transposons down to small levels relative to the amount that one gains in non-gonad cells during aging. While this may change now that selection is relaxed, if the transposon suppression in gonads is good enough, it may take a long time. (and selection may not really be relaxed in our case, given our tendencies to late reproduction).

Comment by simon on Core Pathways of Aging · 2021-03-29T23:57:10.988Z · LW · GW

It's a stochastic process, not a clock. One person gets an extra transposon copy at location A, another gets one at location B, sexual reproduction drops both 1/4 of the time.

Comment by simon on Core Pathways of Aging · 2021-03-29T15:00:21.516Z · LW · GW

If the suppression of transposons in the gonads is good enough, the reset could be the same as with any other harmful mutation - shuffling by sexual reproduction and natural selection. 


Which may suggest a reason why sexual reproduction exists in the first place.

Comment by simon on Texas Freeze Retrospective: meetup notes · 2021-03-18T16:16:52.536Z · LW · GW

We see evidence reported that climate change may increase the likelihood of extreme weather events, both hot and cold, in the coming years.

Your source does not seem to support this claim, and from what I am aware (largely, admittedly, from skeptic sources, but I haven't seen credible mainstream sources contradicting this), global warming is expected to reduce and not increase overall temperature variation (high temperatures increase but low temperatures get more moderate by a larger amount).

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T11:27:51.140Z · LW · GW


With (1) (total number of 1's) excluded, but all of (2), (3), (4) included:

Confidence level: 61.8

Score: 20.2

With (2) (total number of runs) excluded, but all of (1), (3), (4) included:

Confidence level: 59.4

Score: 13.0

With ONLY (1) (total number of 1's) included:

Confidence level: 52.0

Score: -1.8

With ONLY (2) (total number of runs) included:

Confidence level: 57.9

Score: 18.4

So really it was the total number of runs doing the vast majority of the work. All calculations here do include setting the probability for string 106 to zero, both for the confidence level and final score.

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T10:51:34.273Z · LW · GW

I think it depends a lot more on the number of strings you get wrong than on the total number of strings, so I think GuySrinivasan has a good point that deliberate overconfidence would be viable if the dataset were easy. I was thinking the same thing at the start, but gave it up when it became clear my heuristics weren't giving enough information.

My own theory though was that most overconfidence wasn't deliberate but simply from people not thinking through how much information they were getting from apparent non-randomness (i.e. the way I compared my results to what would be expected by chance).

Comment by simon on Pseudorandomness contest: prizes, results, and analysis · 2021-01-31T10:32:46.265Z · LW · GW

Whoops, missed this post at the time.

In response to:

(4) average XOR-correlation between bits and the previous 4 bits (not sure what this means -Eric)

This is simply XOR-ing each bit (starting with the 5th one) with the previous 4 and adding it all up. This test was to look for a possible tendency (or the opposite) to end streaks at medium range (other tests were either short range or looked at the whole string).  I didn't throw in more tests using any other numbers than 4 since using different tests with any significant correlation on random input would lead to overconfidence unless I did something fancy to compensate. 
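One reading of that description, sketched in code. This is an assumption about the details: it XORs each bit separately with each of the 4 preceding bits, and the function names are hypothetical.

```python
def xor_correlation_sum(bits, window=4):
    """Sum of bits[i] XOR bits[i - lag] over lags 1..window, for every bit
    that has `window` predecessors (i.e. starting with the 5th bit)."""
    total = 0
    for i in range(window, len(bits)):
        for lag in range(1, window + 1):
            total += bits[i] ^ bits[i - lag]
    return total


def xor_correlation_average(bits, window=4):
    """Same statistic, normalised by the number of comparisons made."""
    comparisons = (len(bits) - window) * window
    return xor_correlation_sum(bits, window) / comparisons


alternating = [0, 1] * 4  # 01010101
print(xor_correlation_sum(alternating), xor_correlation_average(alternating))
```

Under this reading a high value means bits tend to differ from their recent predecessors, and a low value means streaks tend to persist at that range.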

In response to:

“XOR derivative” refers to the 149-bit substring where the k-th bit indicates whether the k-th bit and the (k +1)-th bit of the original string were the same or different. So this is measuring the number of runs ... (3) average value of second XOR derivative and (4) average XOR-correlation between bits and the previous 4 bits...

I’m curious how much, if any, of simon’s success came from (3) and (4). 

Values below. Confidence level refers to the probability of randomness assigned to the values that weren't in the tails of any of the tests I used.

Actual result:

Confidence level: 63.6

Score: 21.0

With (4) excluded:

Confidence level: 61.4

Score: 19.4

With (3) excluded:

Confidence level: 62.2

Score: 17.0

With both (3) and (4) excluded:

Confidence level: 60.0

Score: 16.1

Score in each case calculated using the probabilities rounded to the nearest percent (as they were or would have been submitted ultimately). Oddly, in every single case the rounding improved my score (20.95 v. 20.92, 19.36 v. 19.33, 16.96 v. 16.89 and 16.11 v. 16.08).

So, it looks like I would have only gone down to fifth place if I had only looked at total number of 1's and number of runs. I'd put that down to not messing up calibration too badly, but it looks like that would have still put me in sixth in terms of post-squeeze scores? (I didn't calculate the squeeze, just comparing my hypothetical raw score with others' post-squeeze scores.)
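For reference, the XOR-derivative statistics behind tests (2) and (3) can be sketched as follows, based on the definition quoted above. The function names are hypothetical, and the example string is made up.

```python
def xor_derivative(bits):
    """k-th output bit is 1 when the k-th and (k+1)-th input bits differ."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def count_runs(bits):
    """Number of runs: one more than the number of places where bits change."""
    if not bits:
        return 0
    return sum(xor_derivative(bits)) + 1


def second_derivative_average(bits):
    """Average value of the XOR derivative of the XOR derivative."""
    second = xor_derivative(xor_derivative(bits))
    return sum(second) / len(second)


example = [0, 0, 1, 1, 1, 0]
print(xor_derivative(example))
print(count_runs(example))
print(second_derivative_average(example))
```

A 150-bit string gives a 149-bit first derivative, matching the quoted description.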

Comment by simon on What is going on in the world? · 2021-01-20T01:48:14.608Z · LW · GW

True, the typical argument for the great silence implying a late filter is weak, because an early filter is not all that a priori implausible. 

However, the OP (Katja Grace) specifically mentioned "anthropic reasoning".

As she previously pointed out, an early filter makes our present existence much less probable than a late filter. So, given our current experience, we should weight the probability of a late filter much higher than the prior would be without anthropic considerations.

Comment by simon on Centrally planned war · 2021-01-06T15:59:46.809Z · LW · GW

Individuals may be bad at foresight, but if there's predictably going to be a good price for 100000 coats in a few months, someone's likely to supply them, unless of course there's some anti "price gouging" legislation.