D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset] 2024-05-20T09:38:55.228Z
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures 2024-05-17T00:25:42.950Z
Unintentionally Creating Value 2024-04-28T20:05:08.479Z
An Unintentional Compliment 2024-04-28T20:04:56.522Z
A D&D.Sci Dodecalogue 2024-04-12T01:10:01.625Z
D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset] 2024-04-09T14:01:34.426Z
D&D.Sci: The Mad Tyrant's Pet Turtles 2024-03-29T16:22:13.732Z
D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset] 2024-01-22T19:20:05.001Z
D&D.Sci(-fi): Colonizing the SuperHyperSphere 2024-01-12T23:36:54.248Z
Fifty Flips 2023-10-01T15:30:43.268Z
Some 2-4-6 problems 2023-03-28T06:32:02.946Z
Bayesian Scenario: Snipers & Soldiers 2023-02-26T21:48:00.788Z
Durkon, an open-source tool for Inherently Interpretable Modelling 2022-12-24T01:49:58.684Z
D&D.Sci December 2022 Evaluation and Ruleset 2022-12-12T21:21:08.781Z
D&D.Sci December 2022: The Boojumologist 2022-12-02T23:39:49.398Z
D&D.Sci September 2022 Evaluation and Ruleset 2022-09-26T22:19:01.415Z
D&D.Sci September 2022: The Allocation Helm 2022-09-16T23:10:23.364Z
My Opportunity Costs 2022-07-10T10:14:58.827Z
D&D.Sci June 2022 Evaluation and Ruleset 2022-06-13T10:31:25.447Z
D&D.Sci June 2022: A Goddess Tried To Reincarnate Me Into A Fantasy World, But I Insisted On Using Data Science To Select An Optimal Combination Of Cheat Skills! 2022-06-04T01:28:18.301Z
Clem's Memo 2022-04-16T11:59:55.704Z
How I repeatedly failed to use Tobit modelling on censored data 2022-04-02T18:10:42.063Z
Lesson Plan: Biases in Quantity Estimation 2022-03-26T00:23:17.822Z
D&D.Sci August 2021 Evaluation and Ruleset 2021-08-23T22:49:20.528Z
D&D.Sci August 2021: The Oracle and the Monk 2021-08-13T22:36:38.572Z
D&D.Sci(-Fi) June 2021 Evaluation and Ruleset 2021-06-29T21:02:20.072Z
D&D.Sci(-Fi) June 2021: The Duel with Earwax 2021-06-22T11:48:44.718Z
Does anyone have any Data Sidequests? 2021-06-11T23:40:09.844Z
A.D&D.Sci May 2021 Evaluation and Ruleset 2021-05-24T16:25:13.704Z
A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction 2021-05-17T15:54:28.974Z
D&D.Sci May 2021 Evaluation and Ruleset 2021-05-14T11:37:14.328Z
D&D.Sci May 2021: Monster Carcass Auction 2021-05-07T19:33:19.920Z
D&D.Sci April 2021 Evaluation and Ruleset 2021-04-19T13:26:58.278Z
D&D.Sci April 2021: Voyages of the Gray Swan 2021-04-12T18:23:11.674Z
D&D.Sci III Evaluation and Ruleset 2021-03-08T23:01:36.833Z
D&D.Sci III: Mancer Matchups 2021-03-05T19:07:17.473Z
D&D.Sci II Evaluation and Ruleset 2021-01-17T16:58:40.087Z
D&D.Sci II: The Sorceror's Personal Shopper 2021-01-12T01:38:44.168Z
D&D.Sci Evaluation and Ruleset 2020-12-12T15:00:20.984Z
D&D.Sci 2020-12-05T23:26:40.934Z
Model Depth as Panacea and Obfuscator 2020-11-09T00:02:03.297Z
Case Study II 2018-09-30T00:37:32.974Z
Applying Bayes to an incompletely specified sample space 2018-07-29T17:33:53.978Z
Excessive EDA Effortposting 2018-06-03T19:17:22.595Z


Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:54:33.958Z · LW · GW

After some consideration (and reading other people's answers, in particular simon's) I've come to the conclusion that the best answer to give is actually

Vampire Fang, Troll Blood, Ground Bone, Oaken Twigs, Demon Claw

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:51:55.348Z · LW · GW

Wait . . . actually, if we're in the mood for galaxy-brained moves, we could go one better and try to

con the lich into brewing & drinking a regen potion.

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:20:47.026Z · LW · GW

I think your theory about

him switching the Barkskin and Necromantic Power potions

is completely correct and I feel dumb for not thinking of it; ditto your proposed reaction. On reflection, I suspect that this is because

he's actually the Loathsome Lich in disguise

so your right-ness is a lot more important than it might seem at first glance. Good catch!

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T02:25:15.080Z · LW · GW

Thanks for running this!

Unless I made some trivial mistake,

Crushed Onyx, Ground Bone, Demon Claw, Giant's Toe, Vampire Fang

should work.


First two ingredients specify the potion, remaining three make it juuust impossible enough to guarantee that it will reliably be magical without going boom.

Comment by abstractapplic on 3 Levels of Rationality Verification · 2024-06-05T00:04:09.296Z · LW · GW

Reputational: D&D.Sci.

Experimental: D&D.Sci, with a consistent limit on time & resources used.

Organizational: D&D.Sci, with a consistent limit on time & resources used, using freshly-baked scenarios you know no-one has ever played before.


  • Takes several hours to play most scenarios.
  • Requires generic coding/spreadsheeting/data-science-ing skills in addition to Rationality; people who are good at those skills get an unfair(?) advantage.
  • Getting familiar with the genre gives an unfair(!) advantage.

Misc. addl. reflections on the topic:

  • Starting from zero is a valid approach, but looking at existing tests and thinking "okay but what if this was better/harder/about slightly different skills" is also sensible. Figuring out how clever and effective people are is a big industry! We should take inspiration from tests employers give job applicants, and any test any gatekeepers give anyone. (Especially if that means we get to subsidize development of rationality-tests by selling them to HR departments.)
  • . . . are there any ways to test rationality which don't rely on complementary skills? Even written tests test your ability to read the questions.
  • Videogames could be so good for this if they weren't optimized for fun and accessibility.

Comment by abstractapplic on The Pearly Gates · 2024-05-30T13:53:04.503Z · LW · GW

stared daggers


This has connotations of being angry, which I don't think is what you're going for. (Unless Peter is getting mad at Oskar for potentially revealing his scheme to his bosses by doing something too similar, or he's irritated that a kindred spirit isn't recognising him fast enough, or unless I've completely misunderstood the implication here.)

Comment by abstractapplic on Higher-Order Forecasts · 2024-05-23T21:05:05.786Z · LW · GW

I think 0th-order, 2nd-order and 3rd-order forecasting should be called threecasting, fivecasting and sixcasting respectively. This easily lets speakers differentiate between layers; also, imo, names which are bad puns tend to stick.

Comment by abstractapplic on Procedural Executive Function, Part 3 · 2024-05-22T12:19:22.572Z · LW · GW

Second link is broken.

Comment by abstractapplic on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures · 2024-05-17T14:12:37.293Z · LW · GW

Response to clarifying question:

Yes. The Duke has learned the hard way that his architects' guesses as to how much their projects will end up costing are consistently worse than useless; if you want to optimize on cost as well as impossibility, that's another thing you'll have to deduce from the record of finished projects.

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset · 2024-05-14T08:45:24.072Z · LW · GW

Reflections on my performance:

I'm pleasantly surprised by the effectiveness of my reasoning, and of my meta-reasoning. Not only did my loadout do well, but my calibration was impressively close: the final decision I pegged at a "~95%" success rate got 94.4%, and most of the alternative strategies I mentioned in my post were similarly on-the-nose.

(Unfortunately, my meta-meta-reasoning could still use some work. I figured out that this was a "linear-ish logistic success model with some interactions on top" kind of problem, took this as an opportunity to test that library I made, created a good predictor with a bunch of pretty/informative graphs . . . and then found myself thinking "only need one minigun? doesn't sound right to me", "why would Tyrants/Artillery and Scarabs/Minigun-or-Flamethrowers be so much stronger than every other potential feature interaction?", and "I'm totally gonna turn out to have screwed up and wish I'd handled this with XGBoost, better not even mention how I built my model". If I'd been more calibrated about how calibrated I ended up being, this could have been a really good chance to show off by calling in advance that my unconventional ML approach would succeed here.)

Reflections on the challenge:

This was the 2D performance thing I tried to pull off in Boojumologist, but with better conceptual underpinning and flawless execution. I'm proud, gladdened and envious: can't think of a single way to improve this scenario.

(I, uh, may be biased by how well I happened to do: please take this feedback with a grain of salt.)

Comment by abstractapplic on How do I get better at D&D Sci? · 2024-05-12T02:46:02.774Z · LW · GW

By imitating other players

As Jay Bailey mentioned, you can look at how other players approached challenges, and copy the approaches that worked. Pablo Repetto’s playthroughs of three early .scis seem particularly worthwhile given your situation, both because of how comprehensive & well-written they are, and because they were made by someone in the process of learning to use code on data science problems (the first playthrough was done in pure Excel, the other two were handled in Python).

By following a sensible strategy

Below is my standard plan for investigating a dataset, synthetic or otherwise (cribbed from an otherwise-mediocre Udacity course I took most of a decade ago, and still worth following).


Univariate Analysis: How is each feature distributed when considered in isolation? You should probably make a histogram for each column.

Bivariate Analysis: Construct and check the correlation matrix between all features. Are there clusters? Create scatterplots (or equivalent) for any pair of features which correlate unusually strongly, any pair of features where at least one is a response variable, and any pair of features you find yourself curious about.

Feature Derivation: Based on what you’ve seen so far – and/or common sense – are there meaningful features you can create from what you’ve been provided? (i.e., if you’re given "Number of Wizards", "Number of Sorcerors" and "Number of Druids" for each row, it might be worth creating a “Total Number of Magic Users” column.) Investigate how these features interact with others.

ML Modelling: If you can, and it seems like a good idea, build an ML model predicting the important/unknown features from those you have. If constructed successfully, this becomes an oracle you can ask about the outcome of any possible choice you could make. (XGBoost and similar tools are extremely versatile, and have pretty good performance on most problems.)


(The above is just a rough guide for what to do when you don’t know what to do. If you follow it, you should pretty quickly find yourself with a list of rabbitholes to fall down; you should probably err on the side of dropping everything and deviating from the path as soon as you find something interesting.)
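The first three steps of the plan above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the toy rows, the "Victory" column, and the helper names (`histogram`, `pearson`, `correlation_matrix`, `add_total`) are all hypothetical, and in practice a dataframe library would do the same job with less code. The ML Modelling step is left out.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dataset; the column names and values are illustrative only.
rows = [
    {"Wizards": 1, "Sorcerors": 0, "Druids": 2, "Victory": 0},
    {"Wizards": 2, "Sorcerors": 1, "Druids": 1, "Victory": 1},
    {"Wizards": 0, "Sorcerors": 0, "Druids": 1, "Victory": 0},
    {"Wizards": 3, "Sorcerors": 2, "Druids": 0, "Victory": 1},
    {"Wizards": 1, "Sorcerors": 1, "Druids": 0, "Victory": 0},
    {"Wizards": 2, "Sorcerors": 2, "Druids": 2, "Victory": 1},
]


def column(rows, name):
    """Pull one feature out of the row dicts as a list."""
    return [row[name] for row in rows]


def histogram(values):
    """Univariate analysis: count how often each value occurs."""
    return dict(sorted(Counter(values).items()))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def correlation_matrix(rows, names):
    """Bivariate analysis: correlation between every pair of features."""
    return {
        (a, b): pearson(column(rows, a), column(rows, b))
        for a in names
        for b in names
    }


def add_total(rows, parts, new_name):
    """Feature derivation: add a column that sums several existing ones."""
    return [{**row, new_name: sum(row[p] for p in parts)} for row in rows]


derived = add_total(rows, ["Wizards", "Sorcerors", "Druids"], "MagicUsers")
matrix = correlation_matrix(derived, ["Wizards", "MagicUsers", "Victory"])

print(histogram(column(derived, "Wizards")))
print(round(matrix[("MagicUsers", "Victory")], 3))
```

Any pair that stands out in the matrix is a candidate for a scatterplot, and a derived column that correlates with the response better than its parts do is usually worth following up.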

By playing easier D&D.Scis

Difficulty of D&D.Sci games tends to be both high and high-variance; it’s usually assumed that players will have both data-manipulation and model-building skills. For what it’s worth, I can confirm that two relatively-approachable scenarios where not-using-ML won't put you at a disadvantage are (spoilered because this technically leaks information about them):

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-30T19:02:49.380Z · LW · GW

If I'm following your notation right, it looks like you mixed up Flamethrowers and Miniguns.

Comment by abstractapplic on D&D.Sci · 2024-04-28T21:33:10.599Z · LW · GW

I'm glad you liked it!

(. . . could you spoiler your strategy and win chance? I know this challenge is three years old, and what you mention here isn't game-breaking info, but I want to keep it possible for people looking/playing through the archives to seek clarifications in the comments without unwittingly/unwillingly learning anything else about the scenario.)

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-28T01:33:01.130Z · LW · GW

That makes sense, ty.

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-28T00:56:30.929Z · LW · GW

The leaderboard will track how well you've done relative to random/best play at the # of soldiers you chose to bring.

Could you elaborate on this? I think I'd do better relative to best play with

high numbers of soldiers,

and do better relative to random play with

low numbers of soldiers,

so it's not clear which way I should lean; also, I don't know how you plan to quantify "relative to".

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-28T00:45:44.089Z · LW · GW

What we're facing:

  • A horrifying number of Tyrants,
  • A large quantity of Scarabs and Abominations, and
  • A below-par-given-they-showed-up-at-all-but-still-significantly-above-zero count of Crawlers and Venompedes.

Relevant Weapons:

  • Artillery is the optimal counter for Tyrants.
  • Miniguns are very good at handling Scarabs (to the point that bringing more than one would likely be overkill), and pretty useless at handling most other xenos (to the point that bringing more than one would likely harm our chances).
  • Lances are good counters for anything which isn't a Tyrant or a Scarab. (And also not-terrible vs Tyrants)
  • Torpedos are slightly better than Lances when facing Abominations, and only slightly worse than Artillery when facing Tyrants.
  • (As far as I can tell, the other four weapons aren't worth considering.)

Current strategies per number of soldiers:

8 Soldiers: 3 Artillery, 2 Lances, 1 Minigun, 2 Torpedos.

(My model says this gives me >99% chance of survival, but also says that about just bringing one of every weapon. We can be more daring!)

7 Soldiers: 3 Artillery, 2 Lances, 1 Minigun, 1 Torpedo.

(My model says this gives me ~95% chance of survival.)

6 Soldiers: 2 Artillery, 2 Lances, 1 Minigun, 1 Torpedo.

(My model says this gives me about a 2/3 chance of waking up the next morning.)

5 Soldiers: 2 Artillery, 1 Lance, 1 Minigun, 1 Torpedo.

(My model says this has slightly worse odds than a game of Russian Roulette with five bullets loaded.)

4 Soldiers: 1 Artillery, 1 Lance, 1 Minigun, 1 Torpedo.

(My model says this almost gives me an entire 1% survival chance.)

If I have to pick one strategy:

7 Soldiers: 3 Artillery, 2 Lances, 1 Minigun, 1 Torpedo.

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-27T23:44:31.865Z · LW · GW

Description of an investigative cul-de-sac:

I notice that

  • Duels between a Tyrant and an Artilleryman always end well.
  • Duels between a Tyrant and a Minigunner, Phaser or Flamethrower always end badly.
  • Tyrant vs Artilleryman 2v2s . . . don't happen, ever. (Turns out the quartermasters do display some nonrandom behaviors, and one of these is a bias towards weapon variety.)
  • 2v2s involving two Tyrants, an Artilleryman, and someone who'd lose a 1v1 against a Tyrant . . . end well pretty much exactly half the time, regardless of which [MPF] is used.

I reason that

This is what we'd see in a turn-based fight where humans heroically always take the first move, and the xenos move randomly. The Artilleryman caps a Tyrant every time; the remaining Tyrant then picks a random human to squish; they pick the dud half the time; we get the coinflip we see.

But then

I find out that there are 2v1 fights between two Tyrants and a lone Artilleryman, and these have the exact same 50% win chance; the dud isn't even useful as a decoy; my hypothesis is falsified.

From all this I conclude

Absolutely nothing.
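For anyone who wants to check the falsification, the turn-based hypothesis can be written down and evaluated exactly. This is only a sketch of one reading of that hypothesis: the rules encoded here (each living Artilleryman removes one Tyrant per round, duds do nothing, each surviving Tyrant then squishes one uniformly random living human) and the names `win_probability` and `_after_attacks` are assumptions made for illustration, not the scenario's actual ruleset.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def win_probability(artillery: int, duds: int, tyrants: int) -> Fraction:
    """Exact human win chance under the hypothesised turn-based rules."""
    # Humans move first: each living Artilleryman removes one Tyrant.
    tyrants = max(0, tyrants - artillery)
    if tyrants == 0:
        return Fraction(1)
    # Xenos move second: each surviving Tyrant attacks once.
    return _after_attacks(artillery, duds, tyrants, tyrants)


@lru_cache(maxsize=None)
def _after_attacks(artillery: int, duds: int, tyrants: int,
                   attacks_left: int) -> Fraction:
    """Resolve the remaining Tyrant attacks for this round, one at a time."""
    humans = artillery + duds
    if humans == 0:
        return Fraction(0)  # nobody left to fight: the humans lose
    if attacks_left == 0:
        return win_probability(artillery, duds, tyrants)  # next round
    total = Fraction(0)
    if artillery:
        # The attacking Tyrant happens to pick an Artilleryman.
        total += Fraction(artillery, humans) * _after_attacks(
            artillery - 1, duds, tyrants, attacks_left - 1)
    if duds:
        # The attacking Tyrant happens to pick a dud.
        total += Fraction(duds, humans) * _after_attacks(
            artillery, duds - 1, tyrants, attacks_left - 1)
    return total


print(win_probability(1, 1, 2))  # Artilleryman plus a dud vs two Tyrants
print(win_probability(1, 0, 2))  # lone Artilleryman vs two Tyrants
```

Under these assumed rules the model matches the coinflip when a dud is present, but gives the lone Artilleryman no chance against two Tyrants, which is why the observed 50% in those fights rules the hypothesis out.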

Comment by abstractapplic on D&D.Sci Evaluation and Ruleset · 2024-04-27T23:15:48.214Z · LW · GW

Like, conceptually it's absolutely unpredictable

That's exactly what I was going for; I wanted phenomena which couldn't have been predicted without using the dataset.

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-27T15:06:05.015Z · LW · GW

Misc. prelim notes:

  • There's a random element. (Existence proof: 16079 and 17759 were the same fight but we only lost 17759.)
  • There's an implicit chrono effect: It looks like this war has been developing not necessarily to our advantage. (Luckily it seems like this is probably 'just' enemies outnumbering our troops more frequently in later rows, and not anyone actually getting better/worse at their job.)
  • The number of troops sent scales with the size of the enemy forces, making inference trickier; however, I haven't seen anything contradicting the hypothesis that loadouts are decided by throwing darts at a board.
  • Specific weapons counter specific enemies: in particular, the Minigun is usually pretty lousy, but drops Scarabs like flies.
  • I expected to find synergies between weapons, and didn't. I did, however, find some antisynergies: Miniguns and Flamethrowers are hella redundant (presumably because they're both anti-Scarab bugspray), and the [MPR] set all clash with each other ("Why do you need Gun? You already have Gun!")
  • Guaranteed victories seem possible. (A single soldier with a minigun can perfectly-reliably survive 5 Scarabs, but not 6.)
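
The existence proof in the first bullet generalises: group rows by everything except the outcome, then flag any group whose outcomes disagree. A minimal sketch follows; the record layout, the field names ("id", "won", and so on) and the sample rows are hypothetical, not taken from the actual dataset.

```python
from collections import defaultdict


def find_inconsistent_fights(records, outcome_key="won", id_key="id"):
    """Return lists of ids for fights that match on every feature
    but do not all share the same outcome."""
    groups = defaultdict(list)
    for record in records:
        # The key is every field except the row id and the outcome.
        features = tuple(sorted(
            (name, value) for name, value in record.items()
            if name not in (outcome_key, id_key)
        ))
        groups[features].append(record)
    return [
        [record[id_key] for record in group]
        for group in groups.values()
        if len({record[outcome_key] for record in group}) > 1
    ]


# Hypothetical rows: fights 1 and 3 are identical apart from the result.
records = [
    {"id": 1, "Scarabs": 4, "Tyrants": 1, "Miniguns": 1, "won": True},
    {"id": 2, "Scarabs": 2, "Tyrants": 0, "Miniguns": 1, "won": True},
    {"id": 3, "Scarabs": 4, "Tyrants": 1, "Miniguns": 1, "won": False},
]

print(find_inconsistent_fights(records))
```

A single disagreeing pair is enough to show the outcome is not a deterministic function of the recorded features.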

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-27T10:14:37.512Z · LW · GW

Thanks for running this when my one was going to be late, and thanks for checking with me beforehand.

(Also, thanks for the scenario, like, in general: it looks like a fun one!)

Comment by abstractapplic on Is there software to practice reading expressions? · 2024-04-25T10:13:51.868Z · LW · GW

I (to my own surprise) got an "above average" score when I took this test a few years back, which I attribute mostly to the lack of emotional and circumstantial 'noise' in the images. I don't think being able to tell what is being emoted by a professional actor told to display exactly one (1) emotion, with no mediating factors, has much connection with being able to read actual people.

(. . . though a level-2 version with tags like "excited but hesitant" or "proud and angry" or "cheerful; unrelatedly, lowkey seasick" could actually be extremely useful, now I think on it.)

Comment by abstractapplic on Motivation gaps: Why so much EA criticism is hostile and lazy · 2024-04-22T20:15:51.908Z · LW · GW


"Al gore"->"Al Gore"


"south park"->"South Park"

"scott alexander"->"Scott Alexander"

"a littler deeper"->"a little deeper"


(. . . I'm now really curious as to why you keep decapitalizing names and proper nouns.)

Regarding the actual content of the post: appreciated, approved, and strong-upvoted. Thank you.

Comment by abstractapplic on Thinking harder doesn’t work · 2024-04-10T20:04:36.012Z · LW · GW

This, linked at "Never." in the OP.

Comment by abstractapplic on Symbiotic Conflicts · 2024-04-10T19:06:53.703Z · LW · GW

an alliance socialist nations


an alliance of socialist nations

Comment by abstractapplic on Thinking harder doesn’t work · 2024-04-10T19:00:33.087Z · LW · GW

I didn't like this post, but I did very much like the "insight porn" post it linked to. (Unfortunately LW doesn't let you simultaneously downvote and strong-upvote a post, so consider my weak-upvote as a sum-of-vibes.)

If someone says ‘What’s for supper?’ a beginner will desperately try to think up something original. He will carefully evaluate dozens of options in his mind.

“Is this funny?” “Will this not reveal something weird about myself?”

It will take him ages to come up with something and eventually he will say something “fried mermaid”.

An improv pro would simply respond “fish”.

Taken - almost verbatim, without attribution - from Impro, by Keith Johnstone. (I don't know whether LW would consider this plagiarism, or consider that to be bad.)

Comment by abstractapplic on Medical Roundup #2 · 2024-04-09T16:13:16.352Z · LW · GW

taking it out early and letting it sit


What I actually usually do is move it from the freezer to the refrigerator like 15min before I eat it, so the change in temperature is more predictable and evenly distributed (instead of some parts being melted while others stay too cold).

Is the point that it's initially too hard to scoop?

That and it being too cold to properly enjoy the taste.

(The votes on my original comment make me think most people are less concerned about their dessert-that's-supposed-to-be-cold being too cold. Typical-mind strikes again, I guess.)

Comment by abstractapplic on D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset] · 2024-04-09T16:05:08.390Z · LW · GW

I enjoyed the exercise, thanks! 


You're welcome, and thank you for playing.

(I wrote a custom loss function for the NN)

I'm curious how you defined that. (i.e. was it "gradient = x for rows where predicted>actual, gradient = -8x for rows where actual>predicted", or something finickier?)
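
For concreteness, the simple version of the asymmetric gradient I describe could look like the sketch below. It assumes that "x" means the size of the error, which makes this an asymmetric squared-error loss; the function names and the constant-prediction demo are hypothetical, and this is not a claim about what the other commenter actually wrote.

```python
def asymmetric_gradient(predicted, actual, under_penalty=8.0):
    """Gradient of the loss with respect to the prediction.

    Overestimates give a gradient equal to the error; underestimates
    give a gradient under_penalty times larger, with the opposite sign.
    """
    error = predicted - actual
    if error > 0:
        return error
    return under_penalty * error  # error <= 0, so this is -8x (or zero)


def fit_constant(values, learning_rate=0.05, steps=2000):
    """Gradient descent on a single constant prediction for all rows."""
    prediction = sum(values) / len(values)  # start from the plain mean
    for _ in range(steps):
        mean_gradient = sum(
            asymmetric_gradient(prediction, value) for value in values
        ) / len(values)
        prediction -= learning_rate * mean_gradient
    return prediction


weights = [1, 2, 3, 4, 5]
print(sum(weights) / len(weights))  # the symmetric answer
print(fit_constant(weights))        # pulled upward by the penalty
```

With underestimates penalised eight times as heavily, the best single guess moves well above the mean, which is the behaviour a loss like this is meant to produce.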

Comment by abstractapplic on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-04-09T14:54:09.644Z · LW · GW

I think this is because

LightGBM and its kin are tools for creating decision forests, not decision trees. If you use standard hyperparameters while creating a single-tree model then they will under-train, resulting in the "predict in a way that's correlated with reality but ridiculously conservative in its deviations from the average" behavior you see here. Setting num_boost_round (or whatever parameter decides the number of trees) to 200 or so should go some way to fixing that problem (while giving you the new problem of having produced an incomprehensible-to-humans black-box model which can only be evaluated by its output).

(I would have said this sooner but helping a player while the challenge was still running seemed like a bad look.)
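
The under-training effect described above can be reproduced without LightGBM itself. What follows is a toy gradient-boosting loop over one-split stumps, written from scratch for illustration; the function names, the shrinkage of 0.1 and the data are assumptions, and real libraries differ in many details.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def boost(xs, ys, num_rounds, learning_rate=0.1):
    """Start from the average, then add num_rounds shrunken stumps."""
    average = sum(ys) / len(ys)
    predictions = [average] * len(ys)
    for _ in range(num_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_mean if x < threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return predictions


def spread(values):
    return max(values) - min(values)


def mse(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


xs = list(range(20))
ys = [float(x) for x in xs]  # true values span a range of 19

one_tree = boost(xs, ys, num_rounds=1)
many_trees = boost(xs, ys, num_rounds=200)
print(spread(one_tree), mse(one_tree, ys))
print(spread(many_trees), mse(many_trees, ys))
```

With one round, every prediction sits within a small step of the average even though the split itself is sensible; with 200 rounds the predictions spread out towards the true values, at the cost of no longer being a single readable tree.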

Comment by abstractapplic on Medical Roundup #2 · 2024-04-09T14:24:04.455Z · LW · GW

I suspect a large (possibly not dominant) part of the ice cream effect is required preptime triggering myopic discounting. If eating ice cream at home, you need to take it out of the freezer at least a few minutes before eating it; this means that if your comfort food of choice is ice cream, you'll only eat it if it seems like a legitimately good idea ('a moment of weakness' becomes 'like 10min of weakness', a higher bar for cravings to clear).

Comment by abstractapplic on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-04-04T15:39:53.326Z · LW · GW

Note: I'll be unavoidably and unexpectedly busy at the start of next week, and so will have to delay resolution of this challenge until either Tuesday or Wednesday (probably Tuesday). I'd apologise for the inconvenience but I'm pretty sure no-one minds.

Comment by abstractapplic on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-04-02T18:05:54.473Z · LW · GW


The Tyrant will weigh his Precious Beasts with the same level of diligence you would: no more, no less.

You can predict weights with as fine a granularity as you like; if you want to claim a turtle has a weight of 12.345678lb, that's fine.

Comment by abstractapplic on Back to Basics: Truth is Unitary · 2024-03-30T15:13:31.777Z · LW · GW

"Mu": Japanese word roughly translatable as 'absence'.

"Kami": Japanese word roughly translatable as 'god'.

"-sama": Japanese honorific for referring to someone whose status/position is much higher than yours.

Comment by abstractapplic on Back to Basics: Truth is Unitary · 2024-03-29T21:18:10.991Z · LW · GW

Mukami-sama, the God of Atheism

I found this disproportionately charming.

Comment by abstractapplic on [Linkpost] Leif Wenar's The Deaths of Effective Altruism · 2024-03-29T21:13:06.753Z · LW · GW

I think the commentary on the state of Givewell's evidence - in particular, that worryingly large parts of it come down to "we called a mid-ranking employee once and they claimed they were doing X and we thought they had good vibes" - was good, correct, novel and important: strong upvote for that alone.

 (I disagree that you should blame Givewell for that: they're not hiding their flaws, and AFAICT the only people other than this author who are openly discussing Givewell's limitations are Givewell themselves. Most of their alleged sins IMO come down to the way people insist on treating them, and the bizarre dearth of competitor/successor organisations aiming for "Givewell but >10x more rigorous/demanding".)

I think almost everything else the author says is some combination of incoherent, incorrect, mean-spirited and fnordful. But in the marketplace of ideas, one bullseye is worth any number of missed shots.

Comment by abstractapplic on One-shot strategy games? · 2024-03-11T11:41:16.866Z · LW · GW

Disrecommending Slay The Spire. While it's a great game and it fits the rest of your criteria like a glove, it has very little hidden information in a practical sense (one of the more innovative things about it is that you can almost always see what the enemy will do next turn), and as such has basically no places where explore/exploit tradeoffs and VOI calculations would be relevant (I assume that this isn't a negotiable part of what you're asking for; if not, yeah I also recommend StS).

Comment by abstractapplic on One-shot strategy games? · 2024-03-11T11:35:29.491Z · LW · GW

Tentative recommendation of Slipways; the VOI part isn't as central as I suspect you'd like, but sending out probes sure does cost time and money you could use for settling planets and forming trade routes; and while it's easy enough to survive to the end of your term, it gives what you're asking for if you choose to define 'victory' as 'get 5+ stars on Tough'.

Comment by abstractapplic on Lsusr's Rationality Dojo · 2024-02-15T20:42:56.347Z · LW · GW

The objective of rationality is to become right instead of wrong.


I think this is technically false, in a subtle but important way. If I gained [knowledge of whether every six-digit number is prime] in exchange for [knowledge of whether wandering out into open traffic is a good idea], I'd have gleaned a net 899999 bits of right-ness, but it still wouldn't have been a worthwhile deal, or made me more rational in any practical sense. The missing gears are becoming right about important && relevant things, bothering to apply that knowledge, and - conditional on applying it at all - applying it well.

I think this project is good (Like, unusually good! It's a step forward! I enjoyed it, and I commend you for your service to the Cause!), but I notice a lack of emphasis on changing actions vs changing minds, both in this post and in the videos I watched, and I want to make sure you've noticed that too.

(And yes, I do recognize the irony of me pointing out a true thing about [pointing out true things without having an associated practical outcome] without having an associated practical outcome. Still think it's worth saying!)

Comment by abstractapplic on D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset] · 2024-02-05T22:49:00.726Z · LW · GW

Sorry about that, reality got in the way; also, ended up scrapping my concept for the next one and my backup concept for it; no idea when it'll end up actually made (not necessarily this month), except that I plan to release on a Friday to do the standard "10 days with a choice of weekend" thing.

Comment by abstractapplic on D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset] · 2024-01-23T14:50:29.666Z · LW · GW

Damn! Mea culpa; I'll edit the original post so anyone going through the archives won't have the same problem.

Comment by abstractapplic on How I repeatedly failed to use Tobit modelling on censored data · 2024-01-21T02:11:41.329Z · LW · GW

Also, strong-upvoted for asking "so, with X years of hindsight, how did this pan out?" on an old post. More people should do that.

Comment by abstractapplic on How I repeatedly failed to use Tobit modelling on censored data · 2024-01-21T02:10:17.422Z · LW · GW

Before circumstances let me answer that question, the client got bought out by a bigger company, which was (and is) a lot more cagey about both hiring contractors and sharing internal details with outsiders; last I heard, the client's absorbed remnants are still sometimes using my modelling approach, but I have no idea how much they're using it, how much they're relying on it, or to what extent it's benefiting them.

Comment by abstractapplic on D&D.Sci(-fi): Colonizing the SuperHyperSphere · 2024-01-14T12:48:08.129Z · LW · GW

There are no time effects in the data; past trends can generally be assumed to hold in the present.

(Good question!)

Comment by abstractapplic on D&D.Sci(-fi): Colonizing the SuperHyperSphere · 2024-01-14T12:17:49.112Z · LW · GW

The same way it does everything: in a weird, non-Euclidean manner which defies human intuition.

Comment by abstractapplic on Bounty: Diverse hard tasks for LLM agents · 2023-12-17T23:51:36.892Z · LW · GW

For the unreleased challenge, b) isn't for sale: making something intended to (eventually) be played by humans on LW and then using it solely as LLM-fodder would just be too sad. And I'm guessing you wouldn't want a) without b); if so, so much for that.

. . . if the "it must never be released to the public internet" constraint really is that stringent, I might be better advised to make D&D.Sci-style puzzles specifically for your purposes. The following questions then become relevant:

  • How closely am I allowed to copy existing work? (This gets easier the more I can base it on something I've already done.)
  • How many challenges are you likely to want, and how similar can they be to each other? (Half the difficulty on my end would be getting used to the requirements, format etc; I'd be more inclined to try this if I knew I could get paid for many challenges built along similar lines.)
  • Is there a deadline? (When are you likely to no longer need challenges like this?) (Conversely, would I get anything extra for delivering a challenge within the next week or so?)

Comment by abstractapplic on Bounty: Diverse hard tasks for LLM agents · 2023-12-17T13:04:09.145Z · LW · GW

This seems like a natural fit for D&D.Sci games. All the ones I made are public domain, so you can use them freely (and I bet the other people who made some would give you permission if you asked them nicely), they've been publicly played by clever humans with a variety of skill levels and associated outcomes, and they're obscure enough that I doubt an LLM would have memorized the solutions (and if not you could tweak the names and data-generation hyperparameters to flatfoot them).

. . . I happen to have a completed-but-unreleased D&D.Sci game, which I was planning to put on LW early next month, after everyone got back from their holidays. Would it be helpful if I sent it to you and delayed the release until Feb, so you and yours could let LLMs try it first?

Comment by abstractapplic on New LessWrong feature: Dialogue Matching · 2023-12-11T00:58:23.374Z · LW · GW

I am in literally the exact same situation, and think your proposed remedy makes sense.

Comment by abstractapplic on A Socratic dialogue with my student · 2023-12-06T14:40:47.159Z · LW · GW

I haven't eaten meat in months.


Completely orthogonal to any of the more interesting points you were trying to make, but: it looks like you might be going vegan in an unsystematic way. I heard this gives people severe permanent disabilities, in ways that are trivial to dodge once you know what they are. (I realize you've probably already addressed this, but thought I'd err on the side of caution and nag you anyway.)

Comment by abstractapplic on Fifty Flips · 2023-10-02T03:19:49.165Z · LW · GW

>You link to index C twice, rather than linking to index D. 

Whoops! Fixed now, thank you.

Comment by abstractapplic on D&D.Sci 5E: Return of the League of Defenders Evaluation & Ruleset · 2023-06-10T12:42:12.578Z · LW · GW

Reflections on my performance:

I failed to stick the landing for PVE; looking at gjm’s work, it seems like what I was most missing was feature-engineering while/before building ML models. I’ll know better next time.

For PVP, I did much better. My strategy was guessing (correctly, as it turned out) that everyone else would include a Professor, noticing that they’re weak to Javelineers, and making sure to include one as my backmidline.

Reflections on the challenge:

I really appreciated this challenge, largely because I got to use it as an excuse to teach myself to build Neural Nets, and try out an Interpretability idea I had (this went nowhere, but at least failed definitively/interestingly).

I have no criticisms, or at least none which don’t double as compliments. The ruleset was complicated and unwieldy, increasing the rarity of “aha!” moments and natural stopping points during analysis, and making it hard to get an intuitive sense of how a given matchup would shake out (even after the rules were revealed) . . . but that’s exactly what made it such a useful testing ground, and such valuable preparation for real-world problems.

Comment by abstractapplic on D&D.Sci 5E: Return of the League of Defenders · 2023-05-31T00:40:23.133Z · LW · GW

Just recording for posterity that yes, I have noticed that

Rangers are unusually good at handling Samurai, so it might make sense to have one on my PVE team.

However, I've also noticed that

Rangers are unusually BAD at handling Felons, to a similar or greater degree.

As such,

I think it makes more sense to keep Pyro Professor as my mid-range heavy-hitter in PVE.

(. . . to my surprise, this seems to be the only bit of hero-specific rock-paper-scissors that's relevant to the PVE challenge. I suspect I'm missing something here.)