Comments

Comment by simon on Omicron Post #14 · 2022-01-14T02:59:29.211Z · LW · GW

I'm not sure what your point is here? Omicron is, to my understanding, replacing delta, which should be beneficial.

Comment by simon on Omicron Post #14 · 2022-01-13T20:54:06.889Z · LW · GW

Not sure what reasons the Wall Street Journal gave (haven't seen the article), but they're probably right. Faster spread doesn't change the total number of cases all that much versus slow spread, but it does reduce the average number of sequential infections each lineage experiences before the disease burns out, and thus the evolutionary distance the disease can achieve.

Comment by simon on D&D.Sci Holiday Special: How the Grinch Pessimized Christmas · 2022-01-07T23:16:49.023Z · LW · GW

My current analysis and results:

I noticed that the B/F/W combinations tended to be consistent when given to the same child multiple times, so looked at these first.

Assuming that noise is a summed contribution from both toys, then it looks like B usually contributes 6, with some discrepancies consistent with it sometimes contributing 12 instead.

F meanwhile always contributes 4, 8 or 16.

and W contributes 5 (only if F is 8 or 16), 9 (only if F is 4 or 8), 10 (only if F is 8), or 18 (seen with F=4).

Note: it is always possible to make such an attribution with 3 variables, regardless of whether it really is a sum, but it seems to have worked out in view of subsequent results.

Next, I looked at T, combinations with which seem to be inconsistent, but not in a way that suggests an age relationship. It looks like T is for some Whos 5 or 10 (inconsistently) and for others 10 or 20 (inconsistently).

G and S vary usually from about 5-11 (Note - I wrote this before finding the numbers for the latest Whos!), with G dropping by 1 every 2nd year, and S rising by one every second year for any particular Who.

Comparing to actual recent results we can fill in expectations for how much noise our current Whos will make next year with different toys:

Who | B | F | G | S | T | W

1550 Andy Sue | 6 | 4 | 5 | 18 22 | 5 or 10 | 9

1551 Betty Drew | 6 | 16 | 11 | 10 | 5 or 10 | 5

1552 Sally Sue | 6 | 16 | 6 | 10 | ? 5 or 10 | 5

1553 Phoebe Drew | 6 | 8 | 7 | 18 | ? 5 or 10 | 5

1554 Freddie Lou | ? 6 | 4 | 7 | 17 | 5 or 10 | 9

1555 Eddie Sue | 6 | 4 | 7 | 9 | ? | ? 9 or 18

1556 Cindy Drew | 6 | 8 | 8 | 8 | ? 5 or 10 | 10

1557 Mary Lou | ? 6 | 8 | 19 16 | 8 | 5 or 10 | ? 5

1558 Ollie Lou | ? | 8 | 8 or 9 | 7 | ? | 9

1559 Johnny Drew | 12 | 4 | ? | 7 | ? 5 or 10 | 9

And whoops, looks like something's wrong since Andy Sue  had a huge jump in noise from toy combos including Sloo-Slonkers between the ages of 3 and 4, looking like he enjoys S to the tune of 14 noise at age 4 and 16 at 7. Something's also up with Phoebe Drew. But, I continued to project forwards in both cases to 18 at the current year anyway as if nothing is wrong.

Also something wrong with Mary Lou and Gah-Ginkas, again I press forward assuming it's all OK.

While we have seen combos with Sally Sue, Phoebe Drew, Eddie Sue, Cindy Drew, Ollie Lou and Johnny Drew involving Trum-Troopas, they've all had a 10 contribution from T, so we don't know if they get 5/10 or 10/20 from them.

Now for the solutions:

We have 4B, 4F, 2G, 3S, 3T and 4W to distribute.

In order to try to maximize noise, I'll distribute as follows:

1550 Andy Sue  S + W 

1551 Betty Drew  F + G

1552 Sally Sue  F + T

1553 Phoebe Drew   F + S + T

1554 Freddie Lou B + S   

1555 Eddie Sue F + T + W

1556 Cindy Drew F+ W 

1557 Mary Lou B+ G

1558 Ollie Lou B+ W T

1559 Johnny Drew B+ W

In order to try to minimize noise, I'll distribute as follows:

1550 Andy Sue  F + G T

1551 Betty Drew  B + W 

1552 Sally Sue  B + W

1553 Phoebe Drew B + W

1554 Freddie Lou  F + T

1555 Eddie Sue F + G 

1556 Cindy Drew B + S

1557 Mary Lou T + W 

1558 Ollie Lou  G + S + T 

1559 Johnny Drew  F + S

edit: in view of abstractapplic's observations, we can fill in some of the ?'s in the chart; these are added in above. The information that a doubling is involved with the anomalies also changes how we project forward the high Gs or Ss. This also led me to reinterpret some results above, and I noticed I had accidentally projected upwards instead of down for G for Mary Lou. Whoops. Since we don't know what Ollie Lou has doubled, I've now avoided giving him T when minimizing noise, even at the expense of a less optimal G allocation. I'm also fishing for more upside when maximizing noise.
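As a sanity check on allocations like the ones above, here is a minimal sketch that confirms an allocation uses exactly the available toys. It assumes every Who receives two toys, and it reads the edited minimizing entries as Andy Sue F + T and Ollie Lou G + S; that reading is an assumption, and the names `INVENTORY`, `MINIMIZE` and `uses_exact_inventory` are hypothetical.

```python
from collections import Counter

# Inventory quoted in the comment: 4B, 4F, 2G, 3S, 3T and 4W.
INVENTORY = Counter({"B": 4, "F": 4, "G": 2, "S": 3, "T": 3, "W": 4})

# Noise-minimizing allocation. The entries for Andy Sue and Ollie Lou are an
# assumed reading of the edited lines (F + T and G + S respectively).
MINIMIZE = {
    "Andy Sue": "FT",
    "Betty Drew": "BW",
    "Sally Sue": "BW",
    "Phoebe Drew": "BW",
    "Freddie Lou": "FT",
    "Eddie Sue": "FG",
    "Cindy Drew": "BS",
    "Mary Lou": "TW",
    "Ollie Lou": "GS",
    "Johnny Drew": "FS",
}


def uses_exact_inventory(allocation, inventory=INVENTORY):
    """Return True if every Who gets two toys and the toys used match the inventory."""
    if any(len(toys) != 2 for toys in allocation.values()):
        return False
    used = Counter(toy for toys in allocation.values() for toy in toys)
    return used == inventory


print(uses_exact_inventory(MINIMIZE))
```

Under that reading the minimizing allocation uses each toy type exactly as often as it is available.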

Comment by simon on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-22T11:22:36.336Z · LW · GW

Strategy thoughts at this point.

Like abstractapplic, I would recommend going for Crows until we get one (but I see that's not allowed by Jemist, so I'll refine below). So Scorchsands on week 6 and Thunderwood on weeks 7-15 if still hunting Crows.

However, I'll recommend Electro Chainmail rather than Ground Greaves for Crow hunting. Electro Chainmail has only a single instance of coming home with nothing during Crow season, better than any other armor type, and was used in 2 crow kills, tied for the best of any armor type.

I was on the fence between Windrider Crossbow and Stormblade, but settled on Windrider, same choice as abstractapplic, for the reason that the Stormblade+Electro Chainmail combo has the single case of coming home with nothing during crow season while wearing Electro Chainmail (though this is probably a coincidence).

After Crow hunting, it'll be time to hunt Raging Windriders which apparently don't leave footprints (so the biologists foolishly think they're super rare, and they're also large so the Hunters will like them). They are found in Scorchsands and Miresmouth. Miresmouth has far fewer other monsters in spring, increasing the chance of getting Windriders (but also chances of duplicates).

Raging Windriders are most commonly caught when wearing Flaming Faulds or Icemail (though likely coincidence) and Icemail has fewer cases of coming home with nothing in the Windrider territories. So I suggest Icemail. Though, in spring specifically Icemail looks worse (likely coincidence). I also note that the single Poison weapon in the game, Toxicala Blowgun, has only failed once in its admittedly small 13 uses. It's also never brought down a Bull-king/Sliding Queen/Crow, suggesting maybe it isn't used in difficult circumstances (but has brought down a Raging Windrider, with Icemail). The single fail was in Thunderwood in the spring, suggesting it might have lost to a Crow.

So, for hunting Raging Windriders, I'll use Toxicala Blowgun and Icemail, and do it in Miresmouth for a better chance of a Windrider. I'll also do it in weeks 8-10 to line up with when Windriders have most commonly actually been caught in Miresmouth (in case of sub-seasonal migration).

I expect much further progress could be made by going through all the failed hunts and trying to assess the most likely causes of each one, then seeing what we can infer from this when compared with successes, and then reassessing the causes of the fails, etc. However, I have been finding little motivation to do it because it looks like a lot of work, and the results would still be uncertain due to low N.

So, recommendation (unless I change it in the likely short time until the answer is posted):

Week 6: Scorchsand Shores with Windrider Crossbow and Electro Chainmail

Week 7: Thunderwood Peaks with Windrider Crossbow and Electro Chainmail

Weeks 8-10: Miresmouth Forests with Toxicala Blowgun and Icemail

Weeks 11-15: Thunderwood Peaks with Windrider Crossbow and Electro Chainmail

Comment by simon on Manipulation resistance of futarchy · 2021-12-21T02:30:27.816Z · LW · GW

I think there's supposed to be an infinite amount of stock available. That is, the futures market administration lets any participant buy a full combination of all possibilities at the full reward value, and then they can sell off the ones they think are overpriced so a market hoarder can't prevent such selling.

That being said, a manipulator may be hard to distinguish from someone with private information from the point of view of other participants, and I expect they would in fact influence the price.

Comment by simon on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-18T02:41:15.768Z · LW · GW

Thanks, I could also use a bit extra time.

Comment by simon on Omicron Post #6 · 2021-12-15T20:03:15.954Z · LW · GW

Overlay comparing with new cases from https://graphics.reuters.com/world-coronavirus-tracker-and-maps/countries-and-territories/south-africa/ (yellow line). I think I have reasonably accurate x-axis scaling, not sure about x-axis position. Y axis position and scaling is arbitrary (scaling based on preserving aspect ratio of original charts, position for easy visual comparison to line for total deaths).

Yellow line is new cases (7 day average) from Reuters tracker, overlaid on deaths chart from SAMRC report cited by avturchin

Comment by simon on Working through D&D.Sci, problem 1 · 2021-12-13T04:32:13.442Z · LW · GW

For anyone looking for the latest D&D.Sci-type problem, Jemist has recently posted one that doesn't seem to have gotten a lot of attention yet.

Comment by simon on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-12T04:11:06.122Z · LW · GW

Thanks for setting up this problem. Initial remarks from considering biology data only:

There is a clear seasonality with 

Winter= weeks 46-6

Spring=weeks 7-19

Summer=weeks 20-32

Fall=weeks 33-45

The biologists do not check Miresmouth Forests in the Fall, do not check Scorchsand Shores in the Winter, do not check The Lordesteppes in the Summer, and do not check Thunderwood Peaks in the Spring. There is also Devil's Maw which they can check year-round.
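The season boundaries and the biologists' seasonal gaps described above can be sketched as a small lookup. This is only an illustration: the 52-week year is an assumption (it makes each season 13 weeks long), and the names `season`, `UNCHECKED` and `biologists_check` are hypothetical.

```python
# Location the biologists skip in each season, as listed above.
UNCHECKED = {
    "Fall": "Miresmouth Forests",
    "Winter": "Scorchsand Shores",
    "Summer": "The Lordesteppes",
    "Spring": "Thunderwood Peaks",
}


def season(week: int) -> str:
    """Map a week number (assumed 1-52) to a season using the boundaries above."""
    if not 1 <= week <= 52:
        raise ValueError("week must be in 1..52")
    if week >= 46 or week <= 6:
        return "Winter"  # wraps around the end of the year
    if week <= 19:
        return "Spring"
    if week <= 32:
        return "Summer"
    return "Fall"


def biologists_check(location: str, week: int) -> bool:
    """Return True if the biologists survey this location in this week."""
    return UNCHECKED[season(week)] != location


print(season(46), season(7), biologists_check("Thunderwood Peaks", 10))
```

A lookup like this makes it easy to tell a true absence of tracks apart from a week in which nobody looked.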

For all the species the biologists have seen tracks for, they are consistent either with being in one or more locations year-round or migrating between two locations, one in winter or summer and the other for the rest of the year. Note that the information can have gaps due to the seasonal non-checking mentioned above, so it might be that they really aren't all in these patterns, but I'm going with consistency.

The species seen are:

Downhanger (98 total). Migratory. Devil's Maw, wintering in Thunderwood Peaks.

Northern Badger (77 total). Non-Migratory. Scorchsand Shores and Thunderwood Peaks.

Earthmover (71 total). Devil's Maw, summering in Scorchsand Shores.

Dull viper (65 total). Miresmouth Forests.

Sandcrawler (53 total). Scorchsand Shores, wintering in Miresmouth Forests.

Flamu (53 total). Non-Migratory. Scorchsand Shores and The Lordesteppes.

Cold Parrot (47 total). The Lordesteppes, summering in Thunderwood Peaks.

Cassowarrior (46 total). Scorchsand Shores, summering in Miresmouth Forests. Edit: incorrectly first wrote wintering here instead of summering, caught the mistake after finding Hunter data inconsistent with what I had written.

Flying Storm (45 total). Miresmouth Forests, wintering in The Lordesteppes.

Wrathrope (41 total). Scorchsand Shores, wintering in Devil's Maw.

Peaksnake (41 total). Thunderwood Peaks.

Macrophant (35 total). The Lordesteppes. Edit: migratory, summering in Miresmouth Forests. This was clear in the bio data but I somehow missed it when summing up and detected the omission from the Hunter data.

Puffdrake (33 total). Thunderwood Peaks, summering  in Miresmouth Forests.

Thunderclap Wyvern (31 total). The Lordesteppes.

Rimewinder (25 total). The Lordesteppes, wintering in Devil's Maw.

Toxicala (19 total). Thunderwood Peaks, summering in The Lordesteppes. I'm going out on a limb on this one (since the Lordesteppes has no summer data, their tracks have only actually been seen in Thunderwood Peaks). But, they definitely are not found in Thunderwood Peaks in the Summer so for consistency with the assumed pattern I think they must be in the Lordesteppes at that time.

todo: check Hunter data for consistency with this, plus learn more from the hunter data. 

The biologists have not seen tracks for: Sliding Queen Shash, Bull-King of Heaven, Raging Windrider, Crow That Breaks the Sky, despite these being included in their notes.

Followup comparing to Hunter data:

Unfortunately, the Hunter data also does not have any Toxicala hauls in the Summer, so the hypothesis that they summer in The Lordesteppes is neither confirmed nor refuted. All data seems consistent with what I should have written above (not necessarily with what I actually wrote pre-editing).

The Hunter data also has info on:

Bull-King of Heaven (Summer in Thunderwood Peaks, only 1 data point).

Crow That Breaks The Sky (Summer in The Lordesteppes, Spring in Thunderwood Peaks, Fall in Miresmouth Forests, Winter in Scorchsand Shores). Hmm. This is a familiar pattern - looks like the biologists' "ancient rules" are avoiding this monster for some reason. 

Raging Windrider (Scorchsand Shores and Miresmouth Forests).

Sliding Queen Shash (Fall in Devil's Maw, 2 data points).

Some additional remarks:

The available diet data is compatible with the following rule:

Herbivores migrate in the summer, Carnivores migrate in the winter, Omnivores don't migrate and live in multiple zones, Scavengers don't migrate and live in a single zone. Hmm: to check - do omnivores really not migrate, or do they migrate at a higher frequency than the seasons? (edit: falsified: year 1 week 45 Flamu seen in both Scorchsands and the Lordesteppes.)

Comment by simon on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-11T19:56:39.041Z · LW · GW

I'm somewhat bemused that the hunters want us to get bigger monsters, but the size data only comes from the biologists and is incomplete. The obvious thing would be to just ask the hunters, but I guess they have such contempt for us they will not even provide that information?

Comment by simon on How common are abiogenesis events? · 2021-11-27T21:04:24.391Z · LW · GW

OK, but how does this evolve into a bacterium? Won't it evolve into a local maximum of RNA enzyme replication efficiency and stay there?

Comment by simon on What’s the weirdest way to win this game? · 2021-11-22T05:50:08.410Z · LW · GW

My solution:

Each person is ignorant between 2 possible states, but each is ignorant between the correct state and a different incorrect state.

We need to partition states so that some (minority) of states are "special" so that if anyone is ignorant as to whether the state is "special" they will guess they are not in the special state, and if they know it isn't "special" they all stay silent. That way we will lose if the state is special, but will win if the state is not special but someone is ignorant about whether it is special. (What someone does if they know the state is special is not specified, but we're doomed in this case anyway since others will guess wrong).

Ideally, in all cases there will be at least one person who is ignorant as to whether the state is special, so that you go free unless the state is special. Then we want as few special states as possible.

If we imagine the hat arrangements as vertices on a 4-dimensional hypercube, then we want each non-special vertex to be adjacent to at least one (and as few as possible) special vertices. So, the question is, what is the smallest number of special vertices we can use?

So I tried drawing this (with a pair of cubes to represent a hypercube) and it seems you need 4 special vertices (if you only have 3 there are two vertices that are not adjacent to any of them).

Let's just start with

1) all white hats

being special. Then the other special states can be:

2) prisoner 1 has a white hat and all others have black hats

3) prisoner 2 has a white hat and all others have black hats

4) prisoner 3 and prisoner 4 have white hats and prisoner 1 and prisoner 2 have black hats

All other states are within 1 of at least one of these states, so if everyone follows the strategy, someone will always be ignorant as to whether we are in one of these special states, and we will win if we are not in these 4 states, which is a 75% chance of winning since there are 16 possible states.
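The strategy above can be checked by brute force over all 16 hat arrangements. The sketch below assumes the usual win condition (at least one prisoner guesses and nobody guesses wrong); the encoding and the names `SPECIAL`, `guess`, `wins` and `covered` are hypothetical.

```python
from itertools import combinations, product

WHITE, BLACK = 1, 0

# The four special states listed above, as (prisoner 1, ..., prisoner 4).
SPECIAL = {
    (WHITE, WHITE, WHITE, WHITE),
    (WHITE, BLACK, BLACK, BLACK),
    (BLACK, WHITE, BLACK, BLACK),
    (BLACK, BLACK, WHITE, WHITE),
}


def guess(state, i):
    """Prisoner i's guess given the other three hats, or None to stay silent."""
    as_black = state[:i] + (BLACK,) + state[i + 1:]
    as_white = state[:i] + (WHITE,) + state[i + 1:]
    black_special = as_black in SPECIAL
    white_special = as_white in SPECIAL
    if black_special == white_special:
        return None  # not ignorant about specialness: stay silent
    # Exactly one candidate is special: guess the colour of the other one.
    return WHITE if black_special else BLACK


def wins(state):
    """Assumed rule: someone guesses, and every guess made is correct."""
    made = [(i, g) for i in range(4) if (g := guess(state, i)) is not None]
    return bool(made) and all(state[i] == g for i, g in made)


def covered(special):
    """All states that are special or differ from a special state in one hat."""
    result = set()
    for s in special:
        result.add(s)
        for i in range(4):
            result.add(s[:i] + (1 - s[i],) + s[i + 1:])
    return result


states = list(product((BLACK, WHITE), repeat=4))
winning = [s for s in states if wins(s)]
three_suffices = any(len(covered(c)) == 16 for c in combinations(states, 3))

print(len(winning), "of", len(states), "states win")
print("three special states can cover everything:", three_suffices)
```

The second check enumerates every choice of three special states, which is a cheap way to confirm that four are needed rather than relying on the drawing.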

Comment by simon on Does anyone know what Marvin Minsky is talking about here? · 2021-11-19T02:09:19.749Z · LW · GW

With respect to the second question the answer will depend on the discount rate. I expect Solomonoff is assuming that we are in the limit of low discount rate, where exponential decay will look linear, so essentially you are minimizing the expected total number of attempts. 

I haven't done the math to confirm Solomonoff's answer, but if you were to go to each box with probability equal to the probability of it being correct, then your expected number of attempts would be equal to the number of boxes, since each box would have an expected number of attempts, conditional on it being the right box, equal to the inverse of its probability. So this is no better than choosing randomly. With this in mind it seems intuitive that some intermediate strategy, such as square roots, would then be better.
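The comparison above can be made concrete with a small sketch. It assumes each attempt independently picks box i with probability q[i] (so attempts may repeat), which gives an expected number of attempts of sum(p[i] / q[i]); the prior `p` and the function names are hypothetical.

```python
from math import sqrt


def expected_attempts(p, q):
    """Expected number of independent draws from q until the right box is hit,
    where the right box is i with probability p[i]."""
    return sum(pi / qi for pi, qi in zip(p, q))


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


p = [0.64, 0.16, 0.16, 0.04]  # hypothetical prior over four boxes

proportional = p                                # choose in proportion to p
uniform = [1 / len(p)] * len(p)                 # choose uniformly at random
square_root = normalize([sqrt(x) for x in p])   # choose in proportion to sqrt(p)

for name, q in [("proportional", proportional),
                ("uniform", uniform),
                ("square root", square_root)]:
    print(name, expected_attempts(p, q))
```

Under this model the proportional and uniform strategies both cost as many attempts as there are boxes, while the square-root strategy costs the square of the sum of the square roots of the probabilities, which is never larger.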

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command Evaluation & Ruleset · 2021-11-16T02:11:57.447Z · LW · GW

Thanks for making the scenario. 

I found I was slow to start because it seemed a bit intimidating, as there were so many variables that were obviously going to confuse each other. (This was part of why I looked at Threat Level, since it was something I could actually solve). Once I got going, there was plenty to consider but not enough time, which is OK. Maybe some more obviously low-hanging fruit could help people get into it.

A minor formatting preference: if you would add a unique ID to each row, this would help identify a particular row in a way that is preserved through sorting of the data. 

On my particular solution:

I didn't realize I needed all three of Fighter, Ranger and Mage on Dragon. I should have looked at class combos on Dragon in more detail; that should have been discoverable. Meanwhile, the optimal choice on Goblin Warrens was (by sheer coincidence) going to be my second choice, but even if I had chosen it, I would still have been behind abstractapplic who got the proper Dragon combo as well as a good Goblin party. Congrats abstractapplic.

I explicitly contemplated the idea that Cleric might restore resources, not from the data but from priors, but didn't look into it (whether Clerics are good at easy-but-long dungeons would be one possible line of approach). I also would have eventually looked at the relative importance of different classes being high level if I had enough time.

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-15T07:59:40.666Z · LW · GW

My strategy and analysis:

general remarks:

As abstractapplic notes, parties are less likely to fail on earlier encounters. I can think of multiple possible reasons:

  1. resource depletion. Fights can deplete a resource (or possibly there could be multiple resources) and the party gives up when it runs out of the/a resource.
  2. depth-dependent difficulty. Each fight is an independent binary check, but at a difficulty that depends on the depth of that fight.
  3. debuffs. Each fight can apply a debuff that reduces success chance on later fights. Importantly different from 1 in what you can deduce from failure rate on a particular fight.
  4. incomplete data. It's just too embarrassing to report to the guild that you turned back at the very beginning.

I'm proceeding assuming (1), but on a weak basis: it's the a priori most likely (imo), things don't immediately appear incompatible with this (without me having really checked), and the "Threat Level" stat seems most compatible with this. But I don't really know!

If resource depletion (at least, single-resource) is an accurate model, losses to an encounter type should be fairly informative about the difficulty of an encounter type and what is strong against it, with at least these caveats:

  1. later encounters should falsely appear harder due to resources having been depleted
  2. classes that are better at earlier encounters in the same sort of dungeon that the later encounter is in will falsely appear stronger in the later encounter due to having higher resources entering the later encounters

I am keeping these points in mind but have not really done anything to actually deal with them in the analysis.
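The first caveat can be illustrated with a toy single-resource model, enumerated exactly. Everything here is hypothetical (the starting resource, the cost distribution, the depth); the only point is that identical encounters look harder the later they occur.

```python
from fractions import Fraction
from itertools import product

START = 10            # hypothetical starting resource
COSTS = (1, 2, 3, 4)  # hypothetical, equally likely cost of each encounter
DEPTH = 6             # number of identical encounters in the dungeon


def conditional_failure_rates():
    """Exact P(fail at encounter k | reached encounter k) for k = 1..DEPTH."""
    reached = [0] * DEPTH
    failed = [0] * DEPTH
    # Every cost sequence is equally likely, so counting sequences is exact.
    for costs in product(COSTS, repeat=DEPTH):
        remaining = START
        for k, cost in enumerate(costs):
            reached[k] += 1
            remaining -= cost
            if remaining < 0:  # resource exhausted: the party gives up here
                failed[k] += 1
                break
    return [Fraction(f, r) for f, r in zip(failed, reached)]


for position, rate in enumerate(conditional_failure_rates(), start=1):
    print(position, rate, float(rate))
```

Even though every encounter draws from the same cost distribution, the failure rate among parties that reach an encounter rises with depth, so raw loss rates per encounter type would need correcting for position before being compared.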

I'm also assuming things like only the encounters in the dungeon matter and not the name, order of party members doesn't matter, etc.

edited to add: after posting this I did a brief check on the effect of levels and noticed two outliers where a high level party was defeated early:

Abandoned Dungeon of Azmar: Ranger 7 Druid 6 Fighter 6 Rogue 7 Snake Pit Lich Lich (...etc, but defeated there)

Forgotten Temple of Stormwind: Ranger 4 Rogue 6 Fighter 5 Cleric 6 Basilisk Basilisk (...etc, but defeated there)

This makes me think a single-resource-depletion model is maybe less likely, and it might be a multiple-resource model (such that repeated encounters deplete the same resource), but I have no time to re-consider things.

The Goblin Warrens of Khaz-Gorond:

  • Goblins -> Boulder Trap -> Unknown x 8

Measure reports:

Encounters 3-9 are 2/3 Goblins and 1/3 Boulder Trap, and the final encounter is 3/4 Goblin Chieftain and 1/4 Goblins

I confirm, but go a bit further.  Dungeons with "Goblin" in the name all seem to have basically the same encounter generation, unless they have the "Night" prefix (in which case they can have Ghosts) or the "Mountain" prefix (in which case they can have Wolves). There are 118 Dungeons with "Goblin" in the name that have 9 or more encounters; all of these end with a Goblin Chieftain. Also, dungeons with "Goblin" in the name never have exactly 8 encounters, suggesting a discontinuity in the generation rules. Thus, the dungeon will end with a Goblin Chieftain (high confidence).

Threat Level: 3.95 (but beware potential bias regarding perceived difficulty of Goblins, reported by abstractapplic)

Goblins: Rangers are strongest.

Goblin Chieftain: Looks like a relatively hard encounter (but be aware of late dungeon bias).  Fighter is strongest.

Boulder Trap: looks like a relatively easy encounter. Fighter is strongest.

Ranger and Fighter are obvious choices. 

There are a decent number of very similar dungeons to this in the data. Restricting to dungeons with 7 or more total encounters including Goblins, Boulder Trap, Goblin Chieftain and no other encounters, there are 12 wins and 22 losses, maybe not enough to do accurate statistical analysis, but certainly enough for me to engage in my favourite pastime of overfitting to spurious  patterns.

Looking at these particular dungeons, Rangers look incredibly strong, Mages look very bad, and Clerics look uncharacteristically meh (they are usually quite good). Going by strongest-looking to weakest-looking and selecting the top four different types, it looks like Ranger, Fighter, Druid, Rogue would be the strongest party.

However, going further down the overfitting-to-spurious-patterns rabbit hole, Druids and Rogues together have a terrible record (1-9) on these dungeons. Note, Druids and Rogues don't seem to have particular antisynergy in general, so this is most likely completely spurious. However, double Ranger (2-1) looks better assuming the likely spurious patterns are real. Who to drop of Druid and Rogue?

Ranger+Rogue looks good, Fighter+Rogue not so much, Ranger+Druid and Fighter+Druid both OK, so Druid looks the safer choice. 

All of this, to be clear, is far too low N to be of any reliable use. But I'm doing it anyway because, whatever, maybe there's something there.

Another possibility would be Ranger+Fighter+ double Cleric, since both double Cleric parties won. But I'm sticking with double Ranger, Fighter, Druid. 

Levels: see below.

The Lost Temple of Lemarchand:

  • Skeletons -> Poison Needle Trap -> Zombies -> Snake Pit -> Poison Needle Trap -> Skeletons -> Snake Pit -> Ghosts

Threat Level: 4

None of these look like particularly tough fights.

Skeletons, Zombies: Cleric looks best, followed by Mage.

Ghosts: Cleric still looks best, but Mages look about comparable to Fighters in distant second place.

Note: Ghosts commonly occur in both physical and undead-oriented dungeons, while Skeletons and Zombies are more restricted to undead-themed dungeons. So, potential for different biases to creep in from parties being weakened by other fights in different dungeon types.

Poison Needle Trap: Rogues do best, as reported by abstractapplic and yonge.

Snake Pit: Druids do best, as reported by abstractapplic and yonge. Possibly notable: Fighters don't look especially good here, despite Snake Pits commonly occurring in physical-oriented dungeons.

Obvious choice from just this info is Cleric+Mage+Rogue+Druid.

There isn't a big pool of dungeons with this exact encounter combo to look at for my overfitting. Although undead-themed dungeons with Skeletons, Zombies and Ghosts are common, and they often have Poison Needle Traps, they don't often have Snake Pits. In fact, dungeons with "Undead" in the name never have Snake Pits. However, Undead-themed dungeons without "Undead" in the name apparently have looser rules.

However, we can look at dungeons that have subsets of these encounters (with the understanding that the sample is low in Snake Pits).

Looking at this, Clerics, Mages and Rogues look good as expected, but next place is Fighter. Druids look quite bad.

If we look at synergies between these in this tiny data pool, all of Clerics, Mages and Rogues work well with each other, and while Druids still don't look as good as Fighters, they do OK when paired with these. So, I speculate that Druids only look bad because their parties were missing something we needed more, and that we probably have what we need with this party. Therefore, taking Druid to deal with Snake Pits is probably worth missing out on whatever Fighter is bringing us on other fights.

So: Cleric+Mage+Rogue+Druid (same as abstractapplic and Yonge)

Level: see below

The Infernal Den of Cheliax:

  • Orcs -> Snake Pit -> Wolves -> Snake Pit -> Unknown

Measure reports:

The final encounter is a Dragon.

I confirm.

Threat Level: 4.5

Orcs: Fighters do best, then Clerics (as reported by Yonge).

Snake Pit: encounter is also in The Lost Temple of Lemarchand. Druids do best, as reported by abstractapplic and Yonge.

Wolves: Looks like an easy fight, but I expect this is largely because both Fighters and Druids do well against them, and since parties typically have 4 of the 6 classes the large majority of parties will have at least one Fighter or Druid. Fortunately, we are likely to want a Fighter for Orcs or a Druid for Snake Pits anyway.

Dragon: a very hard fight. No class looks to have a big advantage; Fighters, Rangers, Mages and Clerics all do comparably well/badly with Druids worse (by a bit) and Rogues worst.

There are few dungeons that are really close to this one, so only really looked at the individual fights.

If you look at parties with Fighters in them, Rangers seem to do better against Dragons than Mages, so I am thinking: Fighter+Ranger+Druid+Cleric.

Conclusion

I have not analyzed the effect of levels.  I will just slightly buff up the characters that seem the most important and on the dungeons I expect to be harder, and vice versa. I don't know the effect of giving different characters much different levels and will avoid that to avoid possible danger.

Solution (unless I change it later):

The Lost Temple of Lemarchand (9,000gp):

Cleric: 3

Mage: 2

Rogue: 2

Druid: 2

The Infernal Den of Cheliax (14,000gp):

Fighter: 4

Ranger: 4

Druid: 3

Cleric: 3

The Goblin Warrens of Khaz-Gorond (13,000gp):

Ranger: 3

Fighter: 4

Druid: 3

Ranger: 3

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-12T03:33:37.519Z · LW · GW

Anyone who wants to know about "Threat level" can now find information in a separate comment.

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-12T03:24:11.661Z · LW · GW

You are at your local village's Evil Overlords Club meeting. (Yes, Evil Overlording, or to be more precise, wanna-be Evil Overlording, is very popular in this civilization).

Several other club members have coincidentally also taken an interest in arranging for adventures, and some fruitful discussions have taken place (see other people's comments for details).

One of the club members, however, reveals some further information:

"When I sent to get adventuring data from the guild," said your fellow club member, who you only know as simon, "the Evil Overlords liaison there showed my representative something he wasn't supposed to see. It was only a moment, but that was enough - he got it with his Secret Encoder Ring. It was a list of adventure data, but with an additional column, for something called "Threat Level"."

"What," you say, "the liaison didn't show my representative that. I thought the liaison was supposed to show data to all Evil Overlords International member representatives equally."

 "Indeed. Which caused me some concern that it might strain relations if I used it. But, I wasn't specifically told to throw it away. Now, the liaison said that this data, supposedly, "didn't end up being used", but even if it wasn't "used",  it's potentially of interest since it might be in some way related to dungeon difficulty or perception of difficulty - or someone's perception of what someone else might perceive as the difficulty - or something. Anyway, this "Threat Level" column had numbers in it and I figured out a formula for the numbers. If you're interested, keep listening."

"So, what's the formula?"

"Simple. It's a sum of contributions from the individual encounters in the dungeon, minus 0.25. 

Goblins = 0.325

Orcs=Skeletons=Zombies=Boulder Trap=Lever Puzzle Room=Riddle Door=Cursed Altar=Snake Pit=Poison Needle Trap=0.5

Goblin Chieftain=Wolves=Orc Shaman=Ghosts=0.75

Orc Warlord=Basilisk=1.25

Lich=1.75

and Dragon=2.5"

"Yes, that does seem simple," you say, "I assume that was pretty easy to figure out."

"Uh...of course," Evil Overlord simon responded, with a slightly pained look. "Not that it had to be easy... a lesser mind than myself might have proceeded from the assumption that there's no 0.25 subtraction, and then, since the first entries when sorted by Threat Level each include exactly one physical trap (by which I mean Boulder Trap, Lever Puzzle Room, Snake Pit, or Poison Needle Trap), counted the physical traps as having 0.25. And then later, that lesser mind would have noticed that they needed to count extra physical traps beyond the first as having 0.5. And, if they had extended that as well to magic traps (by which I mean Riddle Door and Cursed Altar) everything would have worked out, since all dungeons have at least one trap of one sort or another. But, that lesser mind might have been tripped up by the fact that the first magical traps, when the dungeons are sorted by threat level, occur along with a physical trap - so the 0.25 subtraction is accounted for already - and before dungeons with two physical traps are encountered, so before they knew that traps after the first contributed 0.5 to the threat level. So it looks like magic traps are a straight 0.5 and different from physical traps. And since many later dungeons have some magical and no physical traps, the lesser mind would then get the wrong answer for those dungeons and add epicycles. And then become tied in knots trying to make the epicycles work, knowing something is wrong but expecting some simplification to become apparent when they just add enough epicycles, but not looking back to the very earliest assumptions. Of course, super-smart Evil Overlord that I am, avoided that easily."

"That's, um, a very specific pitfall. How did you avoid it?"

"Intuition," Evil Overlord simon grimaced. "By the way," he added, changing the subject, "there were some very tiny discrepancies that look like rounding errors. Like, on the order of 10 to the minus 16th or thereabouts. Anyway, I won't be looking into those. I consider my duty to equalize the information properly discharged."

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-07T23:47:45.597Z · LW · GW

OK, but you've added a new column for "Threat Level", is that intended?

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-07T23:43:12.283Z · LW · GW

Thanks. Actually though, could you keep both versions available? Having some entries listed as "Unknown" makes it easier to check what a party actually fought - something I had been intending to extract from the "# of <x> Encounters" columns when I noticed the issue.

edit: thanks. comment was made before seeing aphyer's comment on the corrected version being available and the edits to the post.

Comment by simon on D&D.Sci Dungeoncrawling: The Crown of Command · 2021-11-07T23:16:53.487Z · LW · GW

A potential issue with the dataset:

The listings for total number of encounters of the different types appear to include the actual encounters that are shown as "Unknown" in the main list of encounters. Is that intended?

Comment by simon on Do you think you are a Boltzmann brain? If not, why not? · 2021-10-15T09:51:56.820Z · LW · GW

Even if the vast majority of entities with your current mental state are Boltzmann brains, you can only expect the mental operations to carry out the conclusion "and therefore I am likely a Boltzmann brain" to validly operate in the entities in which you are not, in fact, a Boltzmann brain. That operation, therefore, would only harm the accuracy of your beliefs. 

Comment by simon on Do you think you are a Boltzmann brain? If not, why not? · 2021-10-15T08:33:53.986Z · LW · GW

In addition to what DanArmak said:

Even if you, in the moment, do not have good reason to be confident that you are not a Boltzmann brain, you do have much better reason to believe that any entity you create in the future is not a Boltzmann brain.

If you wish to improve the accuracy of that entity's beliefs, you can do so by instilling that entity with a low prior of being a Boltzmann brain.

Among the entities you will create in the future is your own future self. 

Comment by simon on Are we in an AI overhang? · 2021-10-09T18:52:14.641Z · LW · GW

I believe that is referring to the baseline driver assistance system, and not the advanced "full self driving" one (that has to be paid for separately). Though it's hard to tell that level of detail from a mainstream media report.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset · 2021-10-06T04:08:43.198Z · LW · GW

Also, I have no idea what I'd even ask for as a scenario.

Neither do I; I was just seeking the glory of (slightly tarnished due to hypothetical rule change) victory.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm · 2021-10-06T01:13:27.712Z · LW · GW

Post-mortem on my thinking about the sides being asymmetrical:

In order to determine whether there was symmetry, I applied the model to the following datasets and compared the validation scores:

  1. The original data set
  2. A flipped version of the data set
  3. A "cut" version of the data set where some of the data points were flipped and others not (should contain every game in one form or the other, but never in both)
  4. The complement of the above (everything flipped relative to 3)

On finding that 1 and 2 had better validation scores than 3 and 4, and the gap between 1/2 and 3/4 was larger (but really not all that much larger!) than the gap between 1 and 2 or 3 and 4, I declared that there was asymmetry.

But really, this was totally invalid, because 1 and 2 are isomorphic to each other under a column swap and bit flip (as pointed out by Maxwell), and while this transformation may affect the results, it should not affect validation scores if the algorithm is unbiased, up to random variation if it has random elements (I don't know if it does, but the validation scores were not actually identical). Likewise, 3 and 4 should have the same validation scores. On the other hand, 1/2 are not isomorphic to 3/4 under such a transformation and so have no need to have the same validation scores. So there was a 50% chance of the observed result happening by chance.

Even if my method would have worked the way I had been thinking**, it would be a pretty weak* test. So why was I so willing to believe it? Well, in my previous analysis I had noticed differences in the sides, which might or might not be random, particularly the winrate for games with Greenery Giant on both sides. In such matchups, green wins 1192, (ironically) much less than blue's 1308. This is not at all unlikely (less than 2 sigma (which I didn't check), and many other possible hypotheses) but this plus the knowledge that League of Legends' map is, while almost symmetrical, not perfectly so, led me to have a too-weak prior against asymmetry when going into poking Maxwell's magic box.

Regardless of this mistake, I do think that my choice to create and use a merged data set including the original data and the flipped data was correct. Given that we either don't care about asymmetries or don't believe they exist, the ideal thing to do would be to add some kind of constraint to the learning algorithm to respect the assumed symmetry, but this is an easier alternative.

*Edit: in the sense of providing weak Bayesian evidence due to high false positive rate

**Edit: which I could have made happen by comparing results from disjoint subsets of the data in each of 1 and 2, etc.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset · 2021-10-05T19:38:40.288Z · LW · GW

The magic black box  supplied to me by Maxwell, after I fiddled with it a tiny bit and supplied it with adjusted data, told me that BGNOT was supposedly the strongest team in general, and that the strongest counter to BGNOT was ABGOT. It also claimed that ABGOT was the strongest counter to gjm's BGNPT. I asked the magic black box how well a few candidate teams, including ABGOT, did against the non-secret competitors already posted, and the numbers it gave looked more generally decent for ABGOT than the other candidates (I was looking for broad-spectrum effectiveness more than average effectiveness) and it also said that ABGOT would have a decent winrate against average teams, so I went with it. I would have liked to make a figure of merit and find the top team for that figure of merit but wasn't able to do so in time.

In other words, the magic black box liked Oil Ooze for some reason.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset · 2021-10-05T18:54:20.871Z · LW · GW

Hmm how about we switch to using a Condorcet method for the PVP ranking? 

I screwed up my thinking on whether the sides were different, will add an edit/reply to my comment on the main post later.

Thanks to Maxwell Peterson for introducing me to R through his post and code, I hope to continue using R later, ideally with a better gears-level understanding than currently.

Comment by simon on 2021 Darwin Game - Tundra · 2021-10-05T03:59:45.341Z · LW · GW

Yeah, that seems crazy high, with only ~300 resources to fight over. If there were a steady state, you'd expect ~300 herbivores and fewer carnivores, across all species.

It's also not the energy budget in the code I downloaded (which was 1000, and even that might be pretty high in tundra given the number of entrants and available resources).

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm · 2021-10-05T01:18:24.132Z · LW · GW

I don't understand what you mean about swapping the columns not having any effect; it would seem to imply, e.g., that swapping which characters you have on your team would have no effect.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm · 2021-10-04T23:30:41.253Z · LW · GW

I don't understand the details of how your code works very well, but wouldn't 1-v_flip = v_original be what you would get with just flipping the response without swapping columns? 

Also, I spot-checked the flipped csv file and didn't see any discrepancies.

Comment by simon on D&D.Sci 4th Edition: League of Defenders of the Storm · 2021-10-04T08:13:48.925Z · LW · GW

I did some initial analysis and initially came up with the same team as Measure and Alumium. It's also what abstractapplic was on the fence on (but chose differently). Remarks on the initial analysis:

the water, fire and earth groups remarked on by abstractapplic are, it seems to me, more easily noticeable via their (anti-)synergies than via their counters, which seem somewhat hit and miss, presumably due to more individual countering effects. The groups are:

Fire: BFIOPV

Water: ACMSTW

Earth: DEGLQR

and Nullifying Nightmare is separate from each.

The enemy team is, recall, DGPQT

My initial pick, BGNRT, was based on green side counters (imagining the enemy on the blue team). GNT are generically strong, R counters DQT on the enemy team, and B counters D and P and (weakly) Q, is not as bad against G as most heroes, and is not as badly stomped by T as other fire characters.

The synergies did not seem important enough to affect the team selection particularly much.

 Then I saw Maxwell Peterson's post and downloaded his gradient-boosting R code. I suggest reading his post first. Remarks based on my use of this code:

Maxwell used 350 training steps, based on green and assuming the enemy to be blue. His code outputted AGLNP as the best counter to the target enemy team DGPQT.

My remarks:

The validation is still improving when Maxwell stops training in the code as supplied in that post. So I increased the number of training steps. The risk is that it will just overfit more, but I assume that if that was a big problem it would make the validation worse? 

Changing to 1000 training steps results in the code outputting AGLNO as the best counter to the enemy team, with AGLNP as the fourth pick.

The problem as supplied by aphyer does not seem to specify as to whether we are Green or Blue. So we should probably be prepared for both (and this especially applies for PVP). I also wondered if the apparent differences were due to random chance. So, I created a flipped version of the initial csv file (changing all green characters to blue and vice versa, and also the wins). Maxwell's code run on this flipped version outputted LOPRT as the best counter to DGPQT  - quite the change! 

I also created a merged csv file from the original and flipped files and 2 cut csv files from the merged file which each contain some entries from the original and some from the flipped file (but no overlap between each other). Validation scores on the cut csv files were worse than on the flipped and original csv files, confirming that side does matter (unless I misimplemented the flip, which would embarrassingly invalidate all my subsequent analysis...). However, even though side does matter, since we don't know what side we are on, I figure we are mainly interested in averaged data so used the merged file anyway. I also increased the number of steps to 1500 for the merged model and doubled the cutoff indices for the validation (because 2x the data).
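
The flip, merge and cut steps described above can be sketched as follows. This is only an illustration in Python, not the R code that was actually used: the `Game` record, its field names, and the three toy games are all hypothetical stand-ins for the real csv rows.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """One match. Field names are hypothetical, not the csv's real columns."""
    green: tuple
    blue: tuple
    green_won: bool


def flip(game):
    # Swap the sides AND the outcome: the same match seen from the other side.
    return Game(green=game.blue, blue=game.green, green_won=not game.green_won)


def merged(games):
    # Original data plus its flipped copy (twice as many rows).
    return list(games) + [flip(g) for g in games]


def cut_pair(games):
    # Two "cut" sets: each game appears once in each set, in opposite forms,
    # so the two sets share no rows and together make up the merged set.
    cut_a = [g if i % 2 == 0 else flip(g) for i, g in enumerate(games)]
    cut_b = [flip(g) if i % 2 == 0 else g for i, g in enumerate(games)]
    return cut_a, cut_b


# Toy data, made up purely for illustration.
games = [
    Game(("B", "G", "N", "R", "T"), ("D", "G", "P", "Q", "T"), True),
    Game(("A", "G", "L", "N", "P"), ("B", "F", "I", "O", "V"), False),
    Game(("A", "C", "M", "S", "W"), ("D", "E", "L", "Q", "R"), True),
]

cut_a, cut_b = cut_pair(games)
print(len(merged(games)), len(cut_a), len(cut_b))
```

Splitting by index parity is just one arbitrary way to choose which rows get flipped; any split that puts each game into the two cut sets in opposite forms would do.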

The code now outputs BGLPR as the best counter to DGPQT. So, that's my choice (unless I change it later). It also happens to be one off from GuySrinivasan's choice (who picked M instead of L).

For PVP, against the average team the code now recommends BGNOT. However, I'd like to modify it to be more specialized against likely opponent teams. This will take some time, possibly more than I can spare, since I have not used R before. If I don't have time to adjust it, I will go with BGNOT.

edit: now switching to ABGOT as the PVP team. According to  the code this counters my above choice, as well as gjm's pick of BGNPT (sorry gjm). Also seems to do OK against others who have published PVP teams (a good reason not to publish them). Attempts to get a better value estimator for PVP teams and apply it to all possible choices have thus far been thwarted by my unfamiliarity with R.

Current choices:

 BGLPR for main answer

ABGOT as PVP team

Comment by simon on An analysis of the Less Wrong D&D.Sci 4th Edition game · 2021-10-04T06:06:23.992Z · LW · GW

Thanks. Well, now that you've given this out I'll steal it for my own use for this problem :)

I'll comment on my answer  on the main D&D.sci post.

Edit: linked comment

Comment by simon on An analysis of the Less Wrong D&D.Sci 4th Edition game · 2021-10-04T02:18:40.898Z · LW · GW

This is very interesting btw, thanks. I've downloaded your code but I haven't used R before so am having some trouble figuring things out.

Does this output from your code relate to generic win probability (i.e. not against that specific team)?

# Groups:   V1, V2, V3, V4 [14]
  V1             V2              V3                   V4         V5           p
  <chr>          <chr>           <chr>                <chr>      <chr>    <dbl>
1 Blaze Boy      Greenery Giant  Nullifying Nightmare Rock-n-Ro~ Tidehol~ 0.805
2 Blaze Boy      Greenery Giant  Nullifying Nightmare Phoenix P~ Tidehol~ 0.802
3 Blaze Boy      Greenery Giant  Landslide Lord       Nullifyin~ Tidehol~ 0.795
4 Blaze Boy      Greenery Giant  Nullifying Nightmare Oil Ooze   Tidehol~ 0.790
5 Blaze Boy      Greenery Giant  Nullifying Nightmare Quartz Qu~ Tidehol~ 0.789
6 Arch-Alligator Blaze Boy       Greenery Giant       Nullifyin~ Tidehol~ 0.787
7 Blaze Boy      Greenery Giant  Nullifying Nightmare Tidehollo~ Volcano~ 0.784
8 Blaze Boy      Greenery Giant  Inferno Imp          Nullifyin~ Tidehol~ 0.784
9 Blaze Boy      Dire Druid      Greenery Giant       Nullifyin~ Tidehol~ 0.781
10 Blaze Boy      Greenery Giant  Nullifying Nightmare Tidehollo~ Warrior~ 0.781
11 Blaze Boy      Captain Canoe   Greenery Giant       Nullifyin~ Tidehol~ 0.778
12 Blaze Boy      Greenery Giant  Nullifying Nightmare Siren Sor~ Tidehol~ 0.775
13 Blaze Boy      Earth Elemental Greenery Giant       Nullifyin~ Tidehol~ 0.773
14 Blaze Boy      Fire Fox        Greenery Giant       Nullifyin~ Tidehol~ 0.769
15 Blaze Boy      Greenery Giant  Maelstrom Mage       Nullifyin~ Tidehol~ 0.769

I also got "object not found" errors and commented out the following lines to fix. I figured they looked likely duplicative of the code generating the above, but am concerned that I might have commented out the code that shows the best teams against the specific enemy team.

The specific lines I commented out were:

arrange(desc(win_proba)) %>% head(20) %>% name_teams()
    
op_matchups_dat %>% head()

nvm, found the output about matchups with that specific team (below), still curious if the above was anything important

   person_1       person_2       person_3             person_4 person_5 win_proba
  <chr>          <chr>          <chr>                <chr>    <chr>        <dbl>
1 Arch-Alligator Greenery Giant Landslide Lord       Nullify~ Phoenix~     0.755
2 Arch-Alligator Landslide Lord Nullifying Nightmare Phoenix~ Rock-n-~     0.750
3 Arch-Alligator Landslide Lord Phoenix Paladin      Rock-n-~ Warrior~     0.748
4 Arch-Alligator Blaze Boy      Landslide Lord       Phoenix~ Rock-n-~     0.738
5 Arch-Alligator Greenery Giant Landslide Lord       Nullify~ Oil Ooze     0.726
6 Arch-Alligator Landslide Lord Phoenix Paladin      Rock-n-~ Tidehol~     0.724
7 Arch-Alligator Landslide Lord Nullifying Nightmare Oil Ooze Phoenix~     0.719
8 Arch-Alligator Landslide Lord Phoenix Paladin      Rock-n-~ Volcano~     0.719
9 Arch-Alligator Landslide Lord Phoenix Paladin      Rock-n-~ Siren S~     0.718
10 Arch-Alligator Greenery Giant Landslide Lord       Phoenix~ Rock-n-~     0.712

Comment by simon on An analysis of the Less Wrong D&D.Sci 4th Edition game · 2021-10-04T00:31:39.465Z · LW · GW

Although the model has OK calibration, it's pretty underconfident at the low end, and pretty overconfident at the high end. 

In a calibration context, I would think that this is underconfident at both ends: when it predicts a win, it is more likely to win than it thinks and when it predicts a loss it is more likely to lose than it thinks.

Comment by simon on An analysis of the Less Wrong D&D.Sci 4th Edition game · 2021-10-04T00:17:25.318Z · LW · GW

Since there are 19 characters, each shows up in 1/19 = 5.3% of all possible compositions. ... I'm surprised how many characters have a ratio of more than 1 here - I don't understand that. Naively, I'd've expected half to have a ratio of less than 1.

You have a team of 5 not a team of one, so each shows up in 5/19 of all possible compositions.
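
A quick way to check the 5/19 figure is to count teams directly. This is a minimal sketch assuming teams are unordered sets of 5 distinct characters drawn from 19; the variable names are only illustrative.

```python
from fractions import Fraction
from math import comb

n_characters = 19
team_size = 5

# All possible teams of 5 out of 19 characters.
all_teams = comb(n_characters, team_size)

# Teams containing one particular character: fix that character,
# then choose the remaining 4 from the other 18.
teams_with_character = comb(n_characters - 1, team_size - 1)

share = Fraction(teams_with_character, all_teams)
print(all_teams, teams_with_character, share, float(share))
```

So each character appears in 5/19, or about 26%, of all compositions rather than 5.3%.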

Comment by simon on 2021 Darwin Game - Contestants · 2021-10-03T16:24:17.413Z · LW · GW

Other duplicate names included ... Forest Tribble

The fact that only Forest Tribble and not the regular Tribble is listed here concerns me. Perhaps I accidentally submitted Forest Tribble twice in which case I was likely the one with 11 entries?

 

This is the realm of ...  Sandworms, 

Heh, I'm pretty sure I submitted "Shai-Hulud" with 10 weapons and 0 armor. It's supposed to be pretty sensitive anyway - that's how you steer them when riding.

Comment by simon on Problems with using approval voting to elect to a multi-individual body? · 2021-09-28T16:31:27.257Z · LW · GW

From the link:

the independence of clones criterion measures an election method's robustness to strategic nomination. Nicolaus Tideman was the first to formulate this criterion, which states that the winner must not change due to the addition of a non-winning candidate who is similar to a candidate already present.

That doesn't seem to be what the OP is concerned with at all, nor does it appear that Approval would violate this criterion.

Comment by simon on The 2021 Less Wrong Darwin Game · 2021-09-25T03:22:03.574Z · LW · GW

Tested: 

Verified: the omnivore wipes them out in 40 generations in my test with both starting on grassland and no other animals in the simulation.

on the other hand:

if a specialized killer of the omnivore is added (weapons 2, speed 2, nothing else) then it eventually (85 generations) killed off the omnivore in my simulation and was able to subsequently co-exist with the minimal seed-eater, which it also predated but not enough to crash the population. Theoretically, the omnivore could move somewhere else, but no such populations got established in my simulation.

(edit: post bugfix, this was 38 generations and 114 generations. The omnivore did spread to other biomes, but its nemesis spread too and wiped out the colonies before dying out in the biomes without the minimalist)

Comment by simon on The 2021 Less Wrong Darwin Game · 2021-09-25T02:14:08.523Z · LW · GW

(cleaned up with edits) OK, I was running:

hy 1.0a3 

matplotlib 3.4.3

python 3.9.0

And thanks, after the following it worked, no need to change the others:

pip3 uninstall hy

pip3 install --user hy==0.20.0

Comment by simon on The 2021 Less Wrong Darwin Game · 2021-09-25T02:04:21.954Z · LW · GW

I'm trying to run the source code to test it, but unfortunately I'm not familiar with hylang (and low familiarity with python). 

I've installed (or at least attempted to install) hy and matplotlib and tried to run "hy main.hy" (is that the way you are supposed to run it?):

I get an error in line 105 when I run that:

    (return 1))
   ^
hy.errors.HySyntaxError: parse error for pattern macro 'if': got unexpected token: hy.models.Expression([
 hy.models.Symbol('return'),
 hy.models.Integer(1)]), expected: end of file

Anyone know what I'm doing wrong?

Comment by simon on The 2021 Less Wrong Darwin Game · 2021-09-24T23:40:03.668Z · LW · GW

I was thinking of the name Tribbles for this strategy.  Locusts would be fast, I think.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan Evaluation & Ruleset · 2021-09-10T06:46:49.130Z · LW · GW

FWIW I had fun, or at least I remember it that way (I could have suppressed memory of frustrations). I do think I prefer things that are more complicated, or have "secrets", in terms of the underlying dynamics. As you noted in the original post, even if we don't find everything we can still find some things.

I was still planning on doing more analysis but was busy the last few days. (I also made a mistake where I tried to separate the different merfolk areas with a column and row check and noticed afterwards that only the column check had worked, which provided a tiny, but maybe not insignificant psychological activation barrier to continuing.)

Where complication is maybe less desirable is in terms of the data we are supplied. Even so, while I didn't look at, e.g. captains or voyage purpose, I don't feel it was a detriment to my enjoyment of the scenario and I could have looked at them if I had more time.

One thing that did provide some frustration at the start was separating the planned voyages into columns. Text-to-columns did not work correctly in LibreOffice Calc until I made all the hexes the same number of characters. In Excel on the other hand it immediately worked with dash as a separator. (I ended up switching back to LibreOffice Calc again though when I couldn't immediately figure out how to use regular expressions in Excel.)

Edit: I agree with abstractapplic that I'd prefer complicated dynamics to arise from simple rules where possible, but also don't know if that's practical when setting up a puzzle. I am fine though if they are not simple - in real life things are usually not simple. And, in a sense, extra complications are kind of like random noise when you don't figure them out, and are fun-to-deduce regularities to the extent you do. Which is OK (or good) either way.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-07T09:26:27.273Z · LW · GW

Yes, thanks; deleted the extraneous N5.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-06T18:10:06.153Z · LW · GW

Some (late relative to others) initial remarks:

As others have noted, all but one datapoint is consistent with Galleon/Carrack having 30hp and Barquentine/Dhow having 20hp, the one exception being a single case of a Carrack taking 5% dmg from Reef. Also, as abstractapplic has noted, Barquentines tend to be affected by things similarly to Galleons, and Dhows similarly to Carracks. abstractapplic claims that they are the same other than hp, but it would take a massive confounder to account for e.g. the different encounter probabilities (as found by measure, also remarked on below).

Note, per-hex encounter probabilities below don't account for selection effects except that I tended to round up if close call to round up or down. I do count only out-of-port ships that didn't get destroyed in the denominator. Damage numbers don't account for selection effects either.

Encounters:

Reef, Kraken, Iceberg, Merfolk and Wyrd Majick Fyre have location dependence as noted by others.

Reef: 

Reef always does at least 1 damage, exponentialish decline with long tail, 3.5-4 average

As noted by abstractapplic, Reefs occur on hexes adjacent to land but not adjacent to ports. I haven't seen anyone mention that for the purpose of this rule, L16 is a land hex. I guess it's a seamount.

The probability of receiving a reef encounter if going through a reef hex is about 20% for non-Dhows, and about 4% for Dhows. Combined with the potentially high damage, this makes these a high priority to avoid if not using a Dhow.

Kraken:

Kraken: spiky damage histogram. Spikes decline for higher values (but selection effects?), worse for Carrack/Dhow, and Carrack/Dhow also seem to lack a low damage component present in the Barquentine/Galleon distribution. ~3.5 average for Galleon/Barquentine, ~6.5 for Dhow, ~8 for Carrack.

As others have noted Kraken have "territories". These "territories" actually are just a simple rule as with Reefs:

Kraken territories =  spaces at least a 2 hex gap away from land (where land has the same definition as for Reefs, i.e. L16 is a land hex).

Around 25-30% encounter probability per relevant hex. Combined with the high damage, high priority to avoid for all ships but especially Carrack/Dhow.

You can always avoid Kraken+Reefs by keeping a 1 hex gap between you and land (or L16) when not adjacent to a port. There are minimum length paths that follow this rule between most port combinations except between South Point and either Norwatch or Eastmarch, where a detour is required (a quite significant one for Eastmarch/South Point).

As noted by others, the target points are in Kraken territory (IMO this is likely a coincidence since that just means they are far at sea). We can avoid going into any additional Kraken territory, but this will require an additional detour (relative to just avoiding reefs) for the western target particularly if avoiding E7 for which we have no data.

Iceberg:

Icebergs: dmg roughly consistent with 1d6 as reported by abstractapplic, so about 3.5 average damage. However, Galleons and Carracks seem to take 1 damage more often and Barquentines and Dhows take 1 damage less often. Maybe coincidence?

Icebergs are found in rows 0-2 in summer (Jun-Aug), in rows 0-6 in spring and fall (Mar-May, Sept-Nov), and in rows 0-10 in winter (Dec-Feb). (Others have remarked on Iceberg seasonality/northernness more generally).

Icebergs are not particularly high probability (<10%) per hex, but would add up if far enough north. Since this voyage will occur in summer, we don't have to worry about icebergs unless taking a significant detour to the north.

Merfolk:

Merfolk: Do 0 dmg a lot of the time, though unlike abstractapplic I am not convinced it is exactly half. Exponentialish? decline if they do do damage, which can go to high values. ~2.6 average damage for Galleons, ~1.7 for Barquentines, ~6 for Carracks, ~4 for Dhows.

As noted by abstractapplic Merfolk have two zones.

Most Merfolk reports form a giant triangularish donut centered around the northeast corner of J8. The donut looks like it should include F7 and L11, but there are no reports from there, and looks like it should not include O5, but O5 does have one Merfolk report. In the case of F7 this is probably just chance, since it's not visited a lot. All other reports are in another Merfolk zone southwest of Westengard. The giant donut occupies most of the center of the map and is hard to avoid, so should be analyzed further. 

Merfolk have a ~9% probability per hex for Galleons, ~2.7% for Barquentines, and ~1.8% for Carrack/Dhow. 

I have also noticed that relative to the low popularity of these hexes, Merfolk are significantly more likely to be encountered in the southwest region. I have not checked if this is connected with the ship type stats, but I will leave this for now since we don't need to go to that region. 

Wyrd Majick Fyre:

Wyrd Majick Fyre: high damage, mostly 7 or less, but with a tail (exponentialish?). 4-5 average for all except Dhow, which gets ~1.2.

As others have reported, Wyrd Majick Fyre mostly occurs around J8 (almost but not quite aligned with the Merfolk donut hole), with a few random-looking other instances. It is a >10% encounter in these hexes making them important to avoid for non-Dhows even if they did not also have Reefs (which they do).

Pirates:

Pirates: as abstractapplic noted, do not do 1 dmg (but do do zero, very often); otherwise 2d3? but with a long tail. ~3.1 average for Galleons/Barquentines, ~4.4 for Dhow and ~5.29 for Carrack.

As measure noted, Galleons receive more pirate attacks. Per-hex encounter probability of ~12% for Galleons and ~4% for everything else. Todo: check to see if this depends e.g. on mission type.

Storm:

Storm: usually 0-7dmg. some tail. 2.5-2.6 average damage. Around ~7% chance per hex regardless of ship type.

Sharks:

Sharks: as abstractapplic noted, dmg is consistent with min(2d4)-1. As with Pirates, per-hex encounter probability is ~12% for Galleons and ~4% for everything else. While not as damaging as Pirates, this adds a reason to avoid Galleons.

Harpies:

Harpies:  Galleons and Barquentines seem to take 0 dmg 2/3 of the time, and 1-2 damage 1/3 of the time. Carracks and Dhows seem to take 0 dmg 1/3 of the time, and 1-4 dmg 2/3 of the time. So, theoretically 0.5 average for Galleons/Barquentines and 1.7 average for Carracks/Dhows
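
The two theoretical averages can be reproduced with a few lines of arithmetic. This sketch assumes, beyond what the data strictly shows, that nonzero damage is uniform over the stated range (1-2 or 1-4); the function name is hypothetical.

```python
from fractions import Fraction


def expected_damage(p_zero, nonzero_values):
    """Mean damage when a hit does 0 with probability p_zero and is
    otherwise uniform over nonzero_values (an assumption)."""
    p_hit = 1 - p_zero
    mean_when_hit = Fraction(sum(nonzero_values), len(nonzero_values))
    return p_hit * mean_when_hit


# Galleons/Barquentines: 0 dmg 2/3 of the time, else 1-2.
galleon_like = expected_damage(Fraction(2, 3), [1, 2])

# Carracks/Dhows: 0 dmg 1/3 of the time, else 1-4.
carrack_like = expected_damage(Fraction(1, 3), [1, 2, 3, 4])

print(float(galleon_like), round(float(carrack_like), 2))
```

Under that assumption the averages come out to 1/2 and 5/3, matching the 0.5 and 1.7 quoted above.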

Per-hex encounter probability is ~9% for Galleons and ~4% for others, but per-hex damage from Harpies is still less for Galleons than for Carracks and Dhows.

Dragon:

Dragon: long tail in damage; peaks later for Carracks/Dhows; ~3.4 average for Barquentine, ~3.7 Galleon, ~5 for Dhow, ~7.5 for Carracks. 

Per-hex encounter probability is ~2% for Galleons, ~0.6% for Barquentines, ~0.3% for Carracks and Dhows. In terms of average damage per hex, the extra encounter probability for Galleons more than makes up for them taking less damage per event than Carracks and Dhows.

Route and ship selection (analysis):

Looking at the encounter types that initially seem location-independent, we have an average per-hex movement cost in damage points, by ship type, of:

Ship/encounter | Pirates | Storm | Sharks | Harpies | Dragon | Total
Galleon        | 0.36    | 0.17  | 0.11   | 0.039   | 0.073  | 0.75
Carrack        | 0.19    | 0.18  | 0.034  | 0.069   | 0.021  | 0.50
Barquentine    | 0.11    | 0.18  | 0.034  | 0.021   | 0.019  | 0.37
Dhow           | 0.16    | 0.16  | 0.036  | 0.058   | 0.017  | 0.44

Taking into account ship hp, the Carrack looks the best here, with ~60 hexes of movement.

We are also likely going to go into Merfolk territory though, which adds an additional cost:

Galleon: 0.24

Carrack: 0.11

Barquentine: 0.044

Dhow: 0.072

It's looking a lot more even here between Carrack/Barquentine, but still slightly favouring Carrack. Since not all of the trip will be in Merfolk territory, might as well go for the Carrack?
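
To make the comparison concrete, here is a rough sketch that divides each ship's hp by its expected damage per hex, with and without the Merfolk surcharge. It uses only the totals and hp values quoted in this comment; treating "hexes of movement" as hp divided by average damage is a simplification that ignores variance.

```python
HP = {"Galleon": 30, "Carrack": 30, "Barquentine": 20, "Dhow": 20}

# Location-independent damage per hex (the Total column above).
BASE_PER_HEX = {"Galleon": 0.75, "Carrack": 0.50, "Barquentine": 0.37, "Dhow": 0.44}

# Additional damage per hex inside Merfolk territory.
MERFOLK_PER_HEX = {"Galleon": 0.24, "Carrack": 0.11, "Barquentine": 0.044, "Dhow": 0.072}


def hexes_of_movement(ship, in_merfolk_territory=False):
    """Hexes travelled before expected damage equals the ship's hp."""
    per_hex = BASE_PER_HEX[ship]
    if in_merfolk_territory:
        per_hex += MERFOLK_PER_HEX[ship]
    return HP[ship] / per_hex


for ship in HP:
    print(ship,
          round(hexes_of_movement(ship), 1),
          round(hexes_of_movement(ship, in_merfolk_territory=True), 1))
```

On these numbers the Carrack gets about 60 hexes outside Merfolk territory and about 49 inside it, against roughly 54 and 48 for the Barquentine, which is why the gap narrows but still slightly favours the Carrack.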

One other thing - this cost assumes uniformity of Merfolk, though I actually think the southwest merfolk are more aggressive. Should adjust to account for that later.

We also want to minimize chance of sinking, not damage to be repaired in port. If confident average damage will be tolerable, we might want to reduce long tails rather than average damage. This could favour the Barquentine.

Dhow has less chance of hitting Reefs. Going to the east target, we can take a shortcut through Reefs and might want to consider a Dhow for that.

Additional dmg per Reef hex (v. non-Reef):

Galleon: 0.79

Carrack:  0.76

Barquentine: 0.68

Dhow: 0.17

Going to the West target, we might want to take a shortcut through Kraken territory, for which a Barquentine might be more suitable than a Carrack.

Additional damage per Kraken hex (v. non-Kraken):

Galleon: 0.99

Carrack: 2.04

Barquentine: 0.95

Dhow: 1.59

We also might want to avoid E7, for which we have no data. There be dragons. I mean ... in-universe hypothetical squared dragons.

Also, early on I noted down some hexes where >1/5 of ships passing through were destroyed. They include some hexes which should not be especially dangerous from the above info, but this could just be that the routes also pass through dangerous hexes. Anyway, something to look at with further analysis, and maybe avoid if not costly to do so.

With all the above in mind, candidate routes and other info messily drawn on the map:

Edit: map deleted and moved to imgur since it wasn't being spoilered properly

imgur link

When counting hexes, I don't count the port since these seem safe from the data.

For the west target:

Route A is the obvious choice taking all the above at face value. With a return trip, it will involve 27 hexes, of which 22 are Merfolk hexes and 1 Kraken.

Route B avoids the unknowns of E7. It's the same overall length including Merfolk length as Route A, but has 5 Kraken hexes on the round trip.

Route C also expensively avoids E7. It's 37 hexes, of which 20  are Merfolk hexes and 1 Kraken. No way that's going to be worth it.

I also added route G later which minimizes distance (and avoids E7) at the cost of additional Kraken hexes. 25 hexes, of which 20 Merfolk and  9 Kraken.

All of these routes involve >1/5 destroyed hexes, but I'm not prioritizing avoiding these super hard atm on the theory that these hexes will turn out to just be on paths that go through other dangerous stuff or are long.

For the east target:

Route D avoids reefs,  but is long and goes through Merfolk territory. It also goes through some >1/5 destroyed hexes. 21 hexes round trip, of which 18 Merfolk and 1 Kraken.

Route E takes 2 reefs to shortcut. 19 hexes round trip, of which 4 reefs and 1 Kraken.

Route F takes 3 reefs to shorten the path a bit more. 17 hexes round trip, of which 6 reefs and 1 Kraken. This is the shortest possible path given the constraint that you can't go across land.

So, expected damage for each route :

Route / ship type | Galleon | Carrack | Barquentine | Dhow
Route A (west)    | 26.5    | 17.9    | 11.8        | 15.0
Route B (west)    | 30.5    | 26.0    | 15.6        | 21.3
Route C (west)    | 33.5    | 22.6    | 15.4        | 19.2
Route G (west)    | 32.4    | 33.0    | 18.6        | 26.6
Route D (east)    | 21.1    | 14.4    | 9.5         | 12.0
Route E (east)    | 18.4    | 14.6    | 10.7        | 10.6
Route F (east)    | 18.4    | 15.1    | 11.3        | 10.0
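A quick sanity check on the table: Routes A and B have the same total length and the same Merfolk length, so under an additive per-hex damage model their gap should come entirely from the 4 extra Kraken hexes on Route B (5 versus 1). The sketch below assumes that additivity; the function name `kraken_gap` is made up for illustration, and the numbers are copied from the per-hex Kraken list and the table.

```python
# Additional expected damage per Kraken hex, by ship type (from the list above).
KRAKEN_EXTRA = {"Galleon": 0.99, "Carrack": 2.04, "Barquentine": 0.95, "Dhow": 1.59}

# Expected round-trip damage for the two routes being compared (from the table).
ROUTE_A = {"Galleon": 26.5, "Carrack": 17.9, "Barquentine": 11.8, "Dhow": 15.0}
ROUTE_B = {"Galleon": 30.5, "Carrack": 26.0, "Barquentine": 15.6, "Dhow": 21.3}


def kraken_gap(ship, extra_kraken_hexes):
    """Extra expected damage from additional Kraken hexes, assuming additivity."""
    return KRAKEN_EXTRA[ship] * extra_kraken_hexes


# Route B has 5 Kraken hexes on the round trip against Route A's 1.
for ship in KRAKEN_EXTRA:
    predicted = ROUTE_A[ship] + kraken_gap(ship, 5 - 1)
    print(f"{ship}: predicted {predicted:.2f}, table {ROUTE_B[ship]}")
```

Each predicted value lands within 0.1 of the Route B row, which is about what rounding to one decimal would allow.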

Route A looks so much better than the other western choices that I am willing to have the sailors brave the squared dragons. Barquentine looking like the best choice even with only 20 hp.

For the eastern target, Route D looks good with either a Barquentine or a Carrack, or Route F with a Dhow. Some considerations: Route D does go through >1/5 destroyed hexes, so I should try to find out if that really is a problem. On the other hand, the Dhow has low chance to hit a Reef but not low damage if it does get hit - high variance is risky. On balance, I pick the Barquentine on D for now.

Current route and ship choice:

So, for now I pick: 

"The Bloody Diamond, a Barquentine captained by Angus MacDougal" on Route A (Q6-P6-O6-N6-M5-L5-K5-J5-I5-H5-G5-F5-F6-E7-E8) and back by the same route.

"The Saucy Heart, a Barquentine captained by Erin Aubrey" on Route D (Q6-P6-O6-N6-M7-M8-L8-K9-K10-K11-L12-L13) and back by the same route.

Comparing to others' selections:

My selected routes A and D are the same ones chosen by abstractapplic, but I use two Barquentines whereas abstractapplic uses a Galleon and a Barquentine.

Yonge selected Route F to go to the east target and for the west first selected something that looks like it should be equivalent to my Route B, in terms of length and types of hexes it goes in, but at the bottom of the comment changes it (why?) to add some additional dilly-dallying in Kraken territory. Yonge chose a Galleon and a Carrack.

measure picked two Dhows (unconventional!), and sent one of them on a route equivalent-seeming to Route E, which looks sensible to me, but the other one is going to the west target starting out at (up to the last hex) the same route (so, super long route), and is a Dhow cutting through Kraken territory, which looks not so sensible.

todo:

Look at Merfolk donut only, check to see if that affects merfolk stats

Look to see if expected damage can reasonably account for observed losses, check where excess losses are occurring (is Jemist right that there are unexpected losses?)

check to see if Captains affect anything

check to see if time docked affects anything

check to see if voyage purpose affects anything

additional remarks:

As Yonge notes, there are 19 encounters not on the planned route. I did not see a pattern and attribute this to noise in the data. Note that it is possible that, even if something was displaced by noise, it would still end up on the planned route. I am inclined to attribute the Merfolk event on O5 to such noise, the event probably having really occurred on N5, which was also on the ship's route.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-04T00:17:32.512Z · LW · GW

Thanks. I can just switch to Excel then if it's significantly better for this purpose. In my case this is not a problem since I have office 365 access through work - I just normally avoid closed source stuff (other than games) for my personal use. GuySrinivasan mentioned another thing in an earlier thread (comment link), I probably should check that out, though expect a bigger learning curve.

Comment by simon on D&D.Sci Pathfinder: Return of the Gray Swan · 2021-09-03T05:35:06.336Z · LW · GW

Nice, though I have been finding LibreOffice Calc rather annoying to work with on this one...

Is the following data point a bug?

voyage 3352 has a storm encounter at P14, which is a land hex

Comment by simon on D&D.Sci August 2021: The Oracle and the Monk · 2021-08-14T06:26:38.263Z · LW · GW

Observations and results so far:

Ignoring any time dependence, solar+lunar and solar+earth are the most successful combinations; they would have succeeded on 246 and 235 respectively of the existing 374 datapoints.

Note, in my remarks on individual mana types I may include information on other mana types.

Solar:

Solar seems to have a 27 day cycle with 3 peaks within it (so 9 day cycle?) but which peaks are stronger has been changing over time. The current cycle, cycle 14, has been weird with days 15-20 of the cycle (days 366-371) being higher than expected. The last 3 days (372-374) are low, but not far from expected. Other slightly weird cycles include cycle 8 (slightly higher than expected values from days 12-14 of the cycle, i.e. days 201-203) and cycle 10 (slightly higher than expected values from days 20 to 25 of the cycle, i.e. days 263-267). I'm counting "day 1" of a cycle to be the one that's a multiple of 27 from the day 1 of the overall data.
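The cycle/day bookkeeping above can be reproduced with a small helper. This is only a sketch of the convention described (day 1 of the data is day 1 of cycle 1, both counts 1-indexed); the name `cycle_position` is hypothetical.

```python
def cycle_position(day, period=27):
    """Return (cycle number, day within cycle), both 1-indexed.

    Assumes day 1 of the dataset is day 1 of cycle 1.
    """
    cycle, offset = divmod(day - 1, period)
    return cycle + 1, offset + 1


print(cycle_position(366))  # (14, 15)
print(cycle_position(201))  # (8, 12)
```

Passing a different `period` applies the same convention to the other periodic mana types.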


If solar is back to its normal pattern, on day 384 it should be on the way down from a high peak and approaching a high trough, so still doing pretty well (>40 expected), making solar a good candidate for one of the mana choices.

Lunar:

Lunar, like solar, shows a 27 day cycle (shouldn't it be 28 days?) with 3 peaks changing which is stronger over time; outliers include days 26 of cycle 8 to day 1 of cycle 9 (days 215-217) which are unexpectedly weak. Like solar, lunar should be declining to a high trough on day 384, I expect >35.  Assuming solar is back to normal, solar + lunar should succeed.

Ocean:

Ocean varies greatly from single digits to over 60. While some possible patterns appear (e.g. some short range autocorrelation, and a degree of autocorrelation at a 4-day displacement almost as high as at 1-day displacement) it does not seem to have a fixed period of variation. The possible high values make this a potentially interesting choice if it can be predicted, but more analysis is needed to determine what ocean will be at on day 384.
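For reference, autocorrelation at a given displacement can be computed along these lines. This is a generic sketch using the usual sample autocorrelation formula, not necessarily the calculation done in the spreadsheet, and the series is a toy stand-in rather than the actual Ocean data.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at a positive lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Toy series (NOT the real Ocean data): an exact 4-day repeat.
toy = [10, 30, 50, 20] * 10
for lag in (1, 2, 3, 4):
    print(lag, round(autocorrelation(toy, lag), 2))
```

On a series with a real 4-day repeat the lag-4 value stands out; on the Ocean data the 1-day and 4-day values were reported as similar, which is part of why no fixed period was inferred.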

Breeze:

Breeze looks fairly random, distribution peaks at 13 and varies from 6 to 20.

Flame:

Like Ocean, has short-range autocorrelation but no single period. Varies from 11 to 41.

Ash:

Like Ocean and Flame, has short range autocorrelation but does not seem to have a single period. Varies from 2 to 10, so of scientific interest only.

Earth:

Looks random except for some possible 1-displacement autocorrelation. Varies from 9 to 74, so definitely of interest if a pattern can be found.

Void:

Looks fairly random, varies between 17 and 31. This is the same size of range as for Breeze, but has a fatter and asymmetrical distribution.

Doom:

8-day sawtooth pattern with some possible random variation (peak on day 2, bottom on day 3, then steady rise). Notable outliers from expected pattern: Day 9 of cycle 1 (i.e., day 9), day 5 of cycle 22 (i.e. day 181), day 3 of cycle 11 (i.e. day 91), and day 1 of cycle 31 (i.e. day 249).

On day 384, it will be day 8 of the cycle, and something in the range of 27-33 is likely. Not as good as solar and lunar, even discounting the risk of a miscalculation with one of the dangerous mana types.

Spite:

28-day periodicity with lots of peaks and troughs within the period; the troughs are often (but not always) 0. A reliable spike on day 5 gives the period away. Seems to also prefer specific values instead of a smooth distribution. Day 384 will be day 20 of the period, and a low value can be expected (7 or 0).

Preliminary answer:

So far Solar+Lunar seems the best choice.

This is also the choice that looked best before getting time dependence information for Solar, Lunar, Doom and Spite, so further research on the time dependence of other mana types (especially Ocean or Earth which can have high values) might find an alternative, better answer.

Edit after reading aphyer's solution:

aphyer found that Earth and Ocean are anticorrelated and their sum has a smooth 22-day pattern. As aphyer reports, day 384 should be close to the peak. Expected value 75-80 or so. This looks like a good, safe solution.

I actually expect Solar+Lunar to be slightly higher, since the peaks and troughs have been shifting in height/depth over time, and while they will be near a trough at day 384, the trough is one that has been shifting upward. I expect ~43-45 from Solar and ~36-40 from Lunar on day 384. However, this is less certain than the Earth+Ocean expectation and as aphyer notes Solar has been weird lately. Solar+Lunar is definitely the riskier pick and probably is objectively not what one should pick based on the available info, so I'd switch to aphyer's solution in real life but I'll stick with Solar+Lunar as what I want to get credit for (for now) since I have an excuse that it might be better and it's what my analysis was on.

Not had time for this recently (fortunately extra weekend though) but after checking gjm's, Jemist's and GuySrinivasan's comments:

Whoops, so much for Lunar.

Only a few further remarks as time is running out:

The following predicts solar with +-1 accuracy:

28 day cycle:  32,32,32,27,27,27,27,27,27,32,32,32,34,34,35,36,37,40,41,42,42,41,40,37,36,35,34,34

9 day cycle: 9,10,10,10,9,3,0,0,3

anomalies: +8 days 61 and 62, +12 days 201-204, +9 days 263-268, +24 days 366-371

predicted result for day 384 is 45+-1, unless there's an anomaly.

Doom's 8 day cycle is 30,32,18,20,22,24,26,28 plus 0-5 with anomalies on days 34, 91, 181, 249. Unlike solar there are both positive and negative anomalies. Expected result (no anomaly) is 28-33 on day 384.
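The day-384 figures can be reproduced mechanically. The sketch below assumes the two solar cycles are summed and that day 1 of the data lines up with the first entry of each list; both are assumptions made for illustration (they do reproduce the stated 45 and 28-33), and the function names are hypothetical. Anomalies are ignored.

```python
SOLAR_28 = [32, 32, 32, 27, 27, 27, 27, 27, 27, 32, 32, 32, 34, 34,
            35, 36, 37, 40, 41, 42, 42, 41, 40, 37, 36, 35, 34, 34]
SOLAR_9 = [9, 10, 10, 10, 9, 3, 0, 0, 3]
DOOM_8 = [30, 32, 18, 20, 22, 24, 26, 28]


def solar_baseline(day):
    """Sum of the 28-day and 9-day solar cycles (assumed alignment: day 1 = first entry)."""
    return SOLAR_28[(day - 1) % 28] + SOLAR_9[(day - 1) % 9]


def doom_range(day):
    """(low, high) for Doom: cycle value plus the 0-5 random term, same alignment assumption."""
    base = DOOM_8[(day - 1) % 8]
    return base, base + 5


print(solar_baseline(384))  # 45
print(doom_range(384))      # (28, 33)
```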

Solar+doom should give (with no anomaly) 72-81, which does not look as good as Earth+Ocean's 74-80 (from GuySrinivasan), though obviously Doom is better than Lunar's known value of 16 at day 384.

With respect to Earth and Ocean, both include many values larger than the minimum of the sum of the two, and Ocean has a sharp minimum value at 4, while Earth's minimum value is not so sharp with 9 being the smallest but more of the bottom edge of the Earth-Ocean x-y plot being at 11 or so. This probably says something about how they are made but I have not thought of it. So far, nothing better than Earth+Ocean found.

also (whoops, this postdated the eval, and was apparently spurious to boot):

Obviously, the best candidates for beating earth+ocean are solar+ocean or solar+earth (whichever we can find out will be bigger).

Spite correlates a bit with Ocean and anticorrelates a bit with Earth. Not a super large effect, but relating ocean/earth (which we need to predict) to Spite (which we know deterministically) is very interesting.

Comment by simon on Punishing the good · 2021-07-21T07:58:48.093Z · LW · GW

So, what are your meta-moral considerations here?

If the underlying meta-moral considerations are utilitarian, then I think that using moral outrage as a social punishment against people with differing moral views is likely to backfire very badly in general, and so is not particularly compatible with maximizing utility. (A sin tax is probably a lot safer.)

Now, at least the example of Bob involves a topic on which people in general have differing moral views, but the particular people involved in both examples likely have the same relevant moral views as you. So in these particular cases, perhaps moral outrage might "get the incentives straight", though if people with differing moral views are treated differently (in order to prevent the likely defensive reaction from disagreers), that creates its own set of problematic incentives.

Comment by simon on Covid 7/15: Rates of Change · 2021-07-18T06:48:01.798Z · LW · GW

Yes, as Zvi mentioned in the quote and I acknowledged.

Comment by simon on Covid 7/15: Rates of Change · 2021-07-16T02:17:51.289Z · LW · GW

One possible way for this to kinda sorta work is that perhaps there are people who get tested in order to show a negative test, whose tests get reported every time, and people who get tested because they want to actually know if they have Covid, who mostly only report when they’re positive. Then, doubling the size of the second group doesn’t change reported test counts much? That’s the best I can come up with.

My mental model (backed by nothing) is that a lot of people get tested due to symptoms that often aren't due to covid, so this provides a relatively constant level of negative tests even though they do actually want to know if they have Covid. (In addition to any who simply want a negative test, of course). 

It's possible that perceived prevalence would affect the tendency to get tested for a given level of symptoms, but if so I wouldn't be surprised if this perceived prevalence lags the positive tests. (There's a lot of potential for weirdness here though).

People getting tested due to an interaction with a positive case would provide negative tests correlated with positive tests, but I expect this would lag the positive tests. 

Just speculating - I haven't been paying attention to the negative test patterns in the past, so this might for all I know be totally at odds with the actual data.