Posts

The Foraging (Ex-)Bandit [Ruleset & Reflections] 2024-11-14T20:16:21.535Z
Inferential Game: The Foraging (Ex-)Bandit 2024-11-11T16:59:42.058Z
The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King! 2024-10-26T12:34:51.059Z
Three Subtle Examples of Data Leakage 2024-10-01T20:45:27.731Z
abstractapplic's Shortform 2024-09-15T16:44:20.274Z
D&D.Sci Scenario Index 2024-07-23T02:00:43.483Z
D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset] 2024-07-17T22:34:25.111Z
D&D.Sci: Whom Shall You Call? 2024-07-05T20:53:37.010Z
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset] 2024-05-20T09:38:55.228Z
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures 2024-05-17T00:25:42.950Z
A D&D.Sci Dodecalogue 2024-04-12T01:10:01.625Z
D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset] 2024-04-09T14:01:34.426Z
D&D.Sci: The Mad Tyrant's Pet Turtles 2024-03-29T16:22:13.732Z
D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset] 2024-01-22T19:20:05.001Z
D&D.Sci(-fi): Colonizing the SuperHyperSphere 2024-01-12T23:36:54.248Z
Fifty Flips 2023-10-01T15:30:43.268Z
Some 2-4-6 problems 2023-03-28T06:32:02.946Z
Bayesian Scenario: Snipers & Soldiers 2023-02-26T21:48:00.788Z
Durkon, an open-source tool for Inherently Interpretable Modelling 2022-12-24T01:49:58.684Z
D&D.Sci December 2022 Evaluation and Ruleset 2022-12-12T21:21:08.781Z
D&D.Sci December 2022: The Boojumologist 2022-12-02T23:39:49.398Z
D&D.Sci September 2022 Evaluation and Ruleset 2022-09-26T22:19:01.415Z
D&D.Sci September 2022: The Allocation Helm 2022-09-16T23:10:23.364Z
My Opportunity Costs 2022-07-10T10:14:58.827Z
D&D.Sci June 2022 Evaluation and Ruleset 2022-06-13T10:31:25.447Z
D&D.Sci June 2022: A Goddess Tried To Reincarnate Me Into A Fantasy World, But I Insisted On Using Data Science To Select An Optimal Combination Of Cheat Skills! 2022-06-04T01:28:18.301Z
Clem's Memo 2022-04-16T11:59:55.704Z
How I repeatedly failed to use Tobit modelling on censored data 2022-04-02T18:10:42.063Z
Lesson Plan: Biases in Quantity Estimation 2022-03-26T00:23:17.822Z
D&D.Sci August 2021 Evaluation and Ruleset 2021-08-23T22:49:20.528Z
D&D.Sci August 2021: The Oracle and the Monk 2021-08-13T22:36:38.572Z
D&D.Sci(-Fi) June 2021 Evaluation and Ruleset 2021-06-29T21:02:20.072Z
D&D.Sci(-Fi) June 2021: The Duel with Earwax 2021-06-22T11:48:44.718Z
Does anyone have any Data Sidequests? 2021-06-11T23:40:09.844Z
A.D&D.Sci May 2021 Evaluation and Ruleset 2021-05-24T16:25:13.704Z
A.D&D.Sci May 2021: Interdimensional Monster Carcass Auction 2021-05-17T15:54:28.974Z
D&D.Sci May 2021 Evaluation and Ruleset 2021-05-14T11:37:14.328Z
D&D.Sci May 2021: Monster Carcass Auction 2021-05-07T19:33:19.920Z
D&D.Sci April 2021 Evaluation and Ruleset 2021-04-19T13:26:58.278Z
D&D.Sci April 2021: Voyages of the Gray Swan 2021-04-12T18:23:11.674Z
D&D.Sci III Evaluation and Ruleset 2021-03-08T23:01:36.833Z
D&D.Sci III: Mancer Matchups 2021-03-05T19:07:17.473Z
D&D.Sci II Evaluation and Ruleset 2021-01-17T16:58:40.087Z
D&D.Sci II: The Sorceror's Personal Shopper 2021-01-12T01:38:44.168Z
D&D.Sci Evaluation and Ruleset 2020-12-12T15:00:20.984Z
D&D.Sci 2020-12-05T23:26:40.934Z
Model Depth as Panacea and Obfuscator 2020-11-09T00:02:03.297Z
Case Study II 2018-09-30T00:37:32.974Z
Applying Bayes to an incompletely specified sample space 2018-07-29T17:33:53.978Z
Excessive EDA Effortposting 2018-06-03T19:17:22.595Z

Comments

Comment by abstractapplic on Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" · 2024-11-20T01:21:53.556Z · LW · GW

Something like D&D.Sci, then?

Comment by abstractapplic on Inferential Game: The Foraging (Ex-)Bandit · 2024-11-14T20:05:13.832Z · LW · GW

Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely. 

 

Good point; I've amended the game accordingly. Thank you.

Comment by abstractapplic on LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction · 2024-11-09T23:55:24.819Z · LW · GW

I can't get any of the AIs to produce any output other than

Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.

Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.

Comment by abstractapplic on A Different Angle on Thinking Balance · 2024-11-09T12:15:44.844Z · LW · GW

Notes:

  • There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.

  • Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?

  • X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think they need. They don't dialogue about any kind of external reality, or show off their different approaches to a real problem: the one place they mention the object level is Y 'helping' X avoid "avocado coffee", a problem which neither he nor anyone else has ever had. (Contrast the Appendix, which is more interesting and meaningful because it involves actual things which actually happened.)

But it’s still really hard for me, which is why these dialogues are the best cost-benefit I’ve found to stimulate my probabilistic thinking. Do you know of any better ones?

Play-money prediction markets (like Metaculus)?

Comment by abstractapplic on The Cartesian Crisis · 2024-11-01T23:18:46.689Z · LW · GW

Do you have sources for those bulletpoints?

Comment by abstractapplic on abstractapplic's Shortform · 2024-11-01T16:37:58.575Z · LW · GW

I should probably get into the habit of splitting my comments up. I keep making multiple assertions in a single response, which means when people add (dis)agreement votes I have no idea which part(s) they're (dis)agreeing with.

Comment by abstractapplic on D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset · 2024-10-29T02:25:41.507Z · LW · GW

Notes on my performance:

Well, I feel pretty dumb (which is the feeling of becoming smarter). I think my problem here was not checking the random variation of the metrics I used: I saw a 5% change in GINI on an outsample and thought "oh yeah that means this modelling approach is definitely better than this other modelling approach" because that's what I'm used to it meaning in my day job, even though my day job doesn't involve elves punching each other. (Or, at least, that's my best post hoc explanation for how I kept failing to notice simon's better model was indeed better; it could also have been down to an unsquished bug in my code, and/or LightGBM not living up to the hype.)

ETA: I have finally tracked down the trivial coding error that ended up distorting my model: I accidentally used kRace in a few places where I should have used kClass while calculating simon's values for Speed and Strength.
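For anyone wanting to run the "check the random variation of the metric" step described above, here is a minimal sketch of one way to do it: bootstrap the outsample and look at how much the Gini gap between two models moves around. This is not the code used for the scenario; the function names, the Gini = 2*AUC - 1 definition, and the percentile interval are all assumptions chosen for illustration.

```python
import random

def gini(y_true, scores):
    """Gini = 2*AUC - 1, with AUC from pairwise comparisons (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 2.0 * wins / (len(pos) * len(neg)) - 1.0

def bootstrap_gini_diff(y_true, scores_a, scores_b, n_boot=200, seed=0):
    """Rough 95% percentile interval for gini(A) - gini(B) on one outsample."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        # Resample rows with replacement, keeping both models' scores paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample is missing a class, so AUC is undefined
        a = gini(y, [scores_a[i] for i in idx])
        b = gini(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

If the interval comfortably straddles zero, a 5% gap on a single outsample probably isn't telling you much about which modelling approach is better.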

Notes on the scenario:

I thought the bonus objective was executed very well: you told us there was Something Else To Look Out For, and provided just enough information that players could feel confident in their answers after figuring things out. I also really liked the writing. Regarding the actual challenge part of the challenge . . . I'm recusing myself from having an opinion until I figure out how I could have gotten it right; all I can tell you for sure is this wasn't below 4/5 Difficulty. (Making all features' effects conditional on all other features' effects tends to make both Analytic and ML solutions much trickier.)

ETA: I now have an opinion, and my opinion is that it's good. The simple-in-hindsight underlying mechanics were converted seamlessly into complex and hard-but-fair-to-detangle feature effects; the flavortext managed to stay relevant without dominating the data. This scenario also fits in neatly alongside earlier entries with superficially similar premises: we've had "counters matter" games, "archetypes matter" games, and now a "feature engineering matters" game.

I have exactly one criticism, which is that it's a bit puzzlier than I'd have liked. Players get best results by psychoanalyzing the GM and exploiting symmetries in the dataset, even though these aren't skills which transfer to most real-world problems, and the real-world problems they do transfer to don't look like "who would win a fight?"; this could have been addressed by having class and race effects be slightly more arbitrary and less consistent, instead of having uniform +Strength / -Speed gaps for each step. However, my complaint is moderated by the facts that:

  • This is an isekai-world; simplified mechanics and uncannily well-balanced class systems come with the territory. (I thought the lack of magic-users was a tell for "this one will be realistic-ish" but that's on me tbh.)

  • Making the generation function any more complicated would have made it (marginally but nontrivially) less elegant and harder to explain.

  • I might just be being a sore loser only-barely-winner here.

  • Puzzles are fun!

Comment by abstractapplic on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King! · 2024-10-28T11:50:32.156Z · LW · GW

Some belated Author's Notes:

  • This was heavily based on several interesting blog posts written by lsusr. All errors are mine.

  • I understand prediction markets just well enough to feel reasonably sure this story """makes""" """sense""" (modulo its absurd implicit and explicit premises), but not well enough to be confident I can explain anything in it any further without making a mistake or contradicting myself. Accordingly, I'm falling back on an "if you think you've found a plot hole, try to work it out on your own, and if you can't then I guess I actually did screw up lol" stance.

  • The fact that

neither of the protagonists ever consider the possibility of the Demon King also deriving strategic benefit from consulting an accurate and undistorted conditional prediction market

was an intended part of the narrative and I'm surprised no-one's brought it up yet.

Comment by abstractapplic on A Different Perspective on Rationality - Would This Be Valuable? · 2024-10-26T19:20:02.342Z · LW · GW

I'm interested.

(I'd offer more feedback, but that's pretty difficult without an example to offer feedback on.)

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-24T10:08:43.161Z · LW · GW

I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.

Alternatively

I might just have screwed up my code somehow.

Still . . .

I'm sticking with my choices for now.
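For concreteness, the single explanatory variable described at the top of this comment ("Strength diff plus 8 times sign(speed diff)") can be written as below. This is only a sketch: the function names and the `speed_weight` parameter are made up for illustration, and how the strength and speed values themselves are derived is not shown here.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def combined_edge(strength_diff, speed_diff, speed_weight=8):
    """Strength gap plus a fixed bonus (or penalty) for being the faster (or slower) fighter."""
    return strength_diff + speed_weight * sign(speed_diff)
```

Note that the size of the speed gap is thrown away entirely: being faster by 1 and being faster by 10 produce the same value.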

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-21T18:06:35.772Z · LW · GW

Update:

I tried fitting my ML model without access to speed variables other than sign(speed diff) and got slightly but non-negligibly worse metrics on an outsample. This suggests that sign(speed diff) tells you most of the information you need about speed but if you rely solely on it you're still missing useful and relevant information.

(. . . either that or my code has another error, I guess. Looking forward to finding out in seven days.)

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-21T16:26:54.380Z · LW · GW

Regarding my strategic approach

I agree pick-characters-then-equipment has the limitation you describe - I'm still not sure about the B-vs-X matchup in particular - but I eyeballed some possible outcomes and they seem close enough to optimal that I'm not going to write any more code for this.

I put your solution into my ML model and it seems to think

That your A and C matchups are pretty good (though A could be made slightly better by benching Willow and letting Uzben do her job with the same gear), but B and D have <50% success odds.

However

I didn't do much hyperparameter tuning and I'm working with a new model type, so it might have more epicycles than warranted.

And

"My model says the solution my model says is best is better than another solution" isn't terribly reassuring.

. . . regardless, I'm sticking with my choices.

One last note:

I don't actually think there's a strict +4 speed benefit cutoff - if I did I'd reallocate the +1 Boots from Y to V - but I suspect there's some emergent property that kindasorta does the same thing in some high-level fights maybe.

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-20T05:48:25.857Z · LW · GW

Took an ML approach, got radically different results which I'm choosing to trust.

Fit a LightGBM model to the raw data, and to the data transformed by simon's stats-to-strength-and-speed model. Simon's version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to 'cheat' by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon's model: this invariably lowered performance, reassuring me that he got all the multipliers right.)

Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.

New strategy goes like this:

Against A, send U, with +3 Boots

Against B, send X, with +2 Boots and +1 Gauntlets

Against C, send V, with +3 Gauntlets

Against D, send Y, with +1 Boots and +2 Gauntlets
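The "iterate all matchups, then all loadouts, maximize expected wins" search described above can be sketched roughly as follows. Everything numeric here is invented: `win_prob` is a toy logistic stand-in for the fitted model, the (strength, speed) stats are placeholders, and "Boots add speed, Gauntlets add strength" is an assumption of the sketch rather than something read off the dataset.

```python
from itertools import permutations
from math import exp

# Hypothetical (strength, speed) stats; the real values are not reproduced here.
OURS = {"U": (10, 12), "V": (14, 8), "W": (9, 9),
        "X": (12, 11), "Y": (11, 13), "Z": (13, 7)}
THEIRS = {"A": (12, 10), "B": (13, 12), "C": (11, 15), "D": (16, 9)}
BOOTS = (1, 2, 3, 0)      # 0 = no boots; the +4 pair is deliberately excluded
GAUNTLETS = (1, 2, 3, 0)  # 0 = no gauntlets

def win_prob(strength_diff, speed_diff):
    """Toy stand-in for a fitted model's predicted win probability."""
    return 1.0 / (1.0 + exp(-0.3 * (strength_diff + speed_diff)))

def best_plan():
    """Brute-force the (opponent, fighter, boots, gauntlets) plan with max expected wins."""
    # Precompute every single-fight probability once.
    table = {}
    for opp, (o_str, o_spd) in THEIRS.items():
        for me, (m_str, m_spd) in OURS.items():
            for b in BOOTS:
                for g in GAUNTLETS:
                    table[opp, me, b, g] = win_prob(m_str + g - o_str,
                                                    m_spd + b - o_spd)
    opponents = sorted(THEIRS)
    best_ev, best = -1.0, None
    for team in permutations(sorted(OURS), len(opponents)):
        for boots in permutations(BOOTS):
            for gauntlets in permutations(GAUNTLETS):
                plan = list(zip(opponents, team, boots, gauntlets))
                ev = sum(table[row] for row in plan)  # expected number of wins
                if ev > best_ev:
                    best_ev, best = ev, plan
    return best_ev, best
```

Summing per-fight probabilities gives the expected count of wins, which is the quantity being maximized; a different objective (say, "at least one win") would need the fights' outcomes combined differently.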

Notes:

The machines say this gives me ~2.6 expected victories but I'm selecting for things they liked so realistically I expect my EV somewhere in the 2-2.5 range.

If I was doing this IRL I'd move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.

My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed. But that's just post hoc confabulation.
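The "I'm selecting for things they liked" discount in the first note is the usual selection effect: pick the option with the best noisy estimate, and the estimate for that pick will tend to overstate its true value. A small simulation of the effect, with all numbers (option count, noise level, the mean of 2.0) invented purely for illustration:

```python
import random

def selection_gap(n_options=50, noise_sd=0.3, n_trials=2000, seed=0):
    """Average estimated vs. true value of the option with the highest estimate."""
    rng = random.Random(seed)
    total_est, total_true = 0.0, 0.0
    for _ in range(n_trials):
        # True expected wins for each candidate plan, plus a noisy model estimate.
        true_vals = [rng.gauss(2.0, 0.2) for _ in range(n_options)]
        est_vals = [t + rng.gauss(0.0, noise_sd) for t in true_vals]
        pick = max(range(n_options), key=est_vals.__getitem__)
        total_est += est_vals[pick]
        total_true += true_vals[pick]
    return total_est / n_trials, total_true / n_trials
```

The first returned number (what the model claims for its favourite) comes out above the second (what that favourite is actually worth), and the gap grows with the number of options searched and the noise in the estimates.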

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-20T01:03:31.470Z · LW · GW

>only the last 12 having boots 2 and gauntlets 3 (likely post-theft)

Didn't notice that but it confirms my theory, nice.

>It seems to me that they appear both as red and black, though.

Ah, I see where the error in my code was that made me think otherwise. Strange coincidence: I thought "oh yeah a powerful wealthy elf ninja who pointedly wears black when assigned red clothes, what a neat but oddly specific 8-bit theater reference" and then it turned out to be a glitch.

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-19T21:36:17.120Z · LW · GW

Noting that I read this (and that therefore you get partial credit for any solution I come up with from here on out): your model and the strategies it implies are both very interesting. I should be able to investigate them with ML alongside everything else, when/if I get around to doing that.

Regarding the Bonus Objective:

I can't figure out whether offering that guy we unknowingly robbed his shoes back is the best or the worst diplomatic approach our character could take, but yeah I'm pretty sure we both located the problem and roughly what it implies for the scenario.

Comment by abstractapplic on D&D Sci Coliseum: Arena of Data · 2024-10-19T21:04:58.605Z · LW · GW

I took an analytic approach and picked some reasonable choices based on that. I'll almost certainly try throwing ML at this problem at some point but for now I want to note down what a me-who-can't-use-XGBoost would do.

Findings:

There are at least some fingerprintable gladiators who keep gladiating, and who need to be Accounted For (the presence of such people makes all archetypery suspect: are Dwarven Knights really that Good, or are there just a handful of super-prolific Dwarven Knights who give everyone an unfairly good impression?). This includes a Level 7 Elven Ninja, almost certainly Cadagal's Champion, who inexplicably insists on always wearing black (even though it doesn't seem to make a difference to how well ninjas ninj).

Level 4 Boots and Level 4 Gauntlets are super rare in the dataset. The Gauntlets are always worn by a pair of hypercompetent Level 7 Dwarven Monks; the Boots are always worn by the Level 7 Elven Ninja.

Despite this, Cadagal's Champion is facing us with Level 2 Boots.

We have some Level 4 Boots.

. . . we robbed this guy, didn't we? And if we wear the boots - our most powerful equipment - he'll flip out and set his House against us whether we win or lose? Dammit . . .

Who fights whom?

A is a Human Warrior. Warriors lose to Fencers, Humans lose to Fencers, Humans lose to Elves. We have an Elven Fencer on call; send Y.

B is a Human Knight. Rangers are best vs Knights, so send W. (Not super confident in this one)

C is an Elven Ninja. Ninjas are super weak against Knights. Send Z, the Elven Knight. (Slightly concerned by how underrepresented Elves are in the sample of gladiators who managed to beat this guy but I'm assuming that's either noise or an effect which Z will be able to shrug off with the Power of Friendship and/or Urgency)

D is a Dwarven Monk. Monks are weak to Ninjas; send U.

Who wears what?

I haven't managed to figure out how equipment works beyond "higher number good"; if there's specific synergies with/against specific classes/races/whatever they elude me. For that reason:

Y and Z are my best shots. I'll have them both wear what their opponents are wearing, to reduce the effects of uncertainty and turn those fights into "who wore it better?" contests. (So +3 Boots and +1 Gauntlets for Y, +2 Boots and +3 Gauntlets for Z.)

U vs D looks pretty solid so I'll give him the remaining +2 Gauntlets and +1 Boots.

W vs B is my most tenuous guess; I hope she won't hold a grudge after I send her out unequipped to boost everyone else's chances.

Comment by abstractapplic on What's a good book for a technically-minded 11-year old? · 2024-10-19T12:13:13.718Z · LW · GW

Math textbooks. Did you know that you can just buy math textbooks which are "several years too advanced for you"? And that due to economies of scale and the objectivity of their subject matter, they tend to be of both high and consistent quality? Not getting my parents to do this at that age is something I still regret decades later.

Or did you specifically mean fiction? If so, you're asking for fiction recommendations on the grew-up-reading-HPMOR website, we're obviously going to recommend HPMOR (especially if they've already read Harry Potter, but it's still good if you only know the broad strokes).

Comment by abstractapplic on Interest in Leetcode, but for Rationality? · 2024-10-17T11:02:13.156Z · LW · GW

Are you able to pinpoint exactly what gives you this feeling?

 

Less a single sharp pinpoint, more a death of a thousand six cuts:

  • The emphasis on learning the names of biases is kinda guessing-the-teacher's-password-y.
  • You'd need to put forth an unusual effort to make sure you're communicating the subset of psychological research which actually replicates reliably.
  • Any given bias might not be present in the student or their social/business circle.
  • The suggested approach implies that the set of joints psychologists currently carve at is the 'best' one; what if I happen to see Bias A and Bias B as manifestations of Bias C?
  • I worry some students would round this off to "here's how to pathologize people who disagree with me!" training.
  • Like I said, this is the kind of fruit that's low-hanging enough that it's mostly already picked.

All that said, I still think this is potentially worthwhile and would still playtest it if you wanted. But I'm much more excited about literally every other idea you mentioned.

Comment by abstractapplic on Interest in Leetcode, but for Rationality? · 2024-10-16T21:56:44.607Z · LW · GW

I am extremely interested in this, and all similar efforts in this space. I agree our community should be doing much more along these lines.

Regarding your specific ideas:

Cognitive Bias Detection

Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way. Also, there's a lot of pre-existing work (I found out about this earlier today).

Calibration Training

The Credence Calibration Game exists. So does my variation on the same idea (see also the associated lesson plan). So do play-money and real-money prediction markets. That said, I do think there's a valuable and unfilled niche for something which doesn't require a download and has a nice user interface and has a four-digit number of questions and lets you check your answers immediately (. . . though I don't know how many people other than me would consider it valuable).

Bite-Sized, Practical Challenges

I am very much in favor of this, to the point where I'm already (tentatively) planning to (eventually) build some games with a similar motivation. Relatedly, the "ask users to predict an outcome based on limited data" example sounds like a description of that genre I invented (though "Bite-Sized" suggests you're thinking in terms of something much more polished/generally-accessible).

(Side note: A subtle benefit of the "Practical Challenges" approach is that it can correct for biases you weren't aiming for. A large part of my motivation for making D&D.Sci was "forcing them to confront the common pitfalls of overconfidence or representativeness heuristics"; I found that a Lesswronger working in a Data Science context will more often be insufficiently confident, and place too little weight on surface appearances; my endeavor 'failed' gracefully and people got a chance to notice those errors instead (plus various other problems I didn't even consider).)

-

I look forward to seeing what comes of this. If you want anything playtested, please let me know.

Comment by abstractapplic on Open letter to young EAs · 2024-10-11T22:20:05.334Z · LW · GW

As with the last article, I think this is almost entirely incoherent/wrong; as with the last article, I'm strong-upvoting it anyway because I think it makes ~1.5 good and important points I've not seen made anywhere else, and they're worth the chaff. (I'd go into more detail but I don't want anyone leaning on my summary instead of reading it themselves.)

. . . is there a reason this is a link to a google doc, instead of a copypaste?[1]

  1. ^

    and add footnotes.

Comment by abstractapplic on Three Subtle Examples of Data Leakage · 2024-10-03T22:23:15.367Z · LW · GW

"Using information during Training and/or Evaluation of models which wouldn't be available in Deployment."

. . . I'll edit that into the start of the post.

Comment by abstractapplic on D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset] · 2024-09-27T17:16:22.936Z · LW · GW

Thanks for a good one

 

I'm glad you feel that way about this scenario. I wish I did . . .

(For future reference, on the off-chance you haven't seen it: there's a compilation of all the past scenarios here, handily rated by quality and steamrollability.)

One thing that perhaps would make it easier was if the web interactive could tell whether or not your selection was the optimal one directly, and possibly how higher your expected price was than the optimal price (I first plugged mine in, then had to double check with your table out here)

. . . huh. I feel conflicted about this on aesthetic grounds - like, Reality doesn't come with big flashing signs saying "EV-maxxing solution reached!" when you reach an EV-maxxing solution - but it does sound both convenient to have and easy to set up. Might try adding this functionality to the interactive for the next one; would be curious to hear what anyone else who happens to be reading this comment thinks.

Anyway, greetings, and looking forward to seeing the next one.

Good to have you on board!

Comment by abstractapplic on I finally got ChatGPT to sound like me · 2024-09-18T13:17:16.914Z · LW · GW

I really wish I could simultaneously strong-upvote and strong-downvote the "agree" thing for this reply. I think most of the description is horoscope-y flattery, but it doesn't have zero correlation with reality: I do think lsusr's writing is ninety-something-th percentile for 

a kind of minimalist clarity that leaves room for the reader to reflect and draw their own conclusions

and at least eighty-something-th percentile for

willing to question common assumptions within the rationalist sphere

while afaict there's nothing in the description that's the opposite of the truth.

(I also think there's something interesting about how the most meaningful/distinguishing lines in the description are the ones which could be most easily rephrased as criticisms. Does "describe X as neutrally as possible" or "describe the upsides and downsides of X" produce better LLM results than "describe X"?)

Comment by abstractapplic on abstractapplic's Shortform · 2024-09-15T16:44:20.711Z · LW · GW

I used to implicitly believe that when I have a new idea for a creative(/creative-adjacent) project, all else being equal, I should add it to the end of my to-do list (FIFO). I now explicitly believe the opposite: that the fresher an idea is, the sooner I should get started making it a reality (LIFO). This way:

  • I get to use the burst of inspired-by-a-new-idea energy on the project in question.
  • I spend more time working on projects conceived by a me with whom I have a lot in common.

The downsides are:

  • Some old ideas will end up near the bottom of the pile until I die or the Singularity happens. (But concepts are cheaper than execution, and time is finite.)
  • I get less time to polish ideas in my head before committing pen to paper. (But maybe that's good?)

Thoughts on this would be appreciated.

Comment by abstractapplic on Developing Positive Habits through Video Games · 2024-08-29T11:36:15.736Z · LW · GW

Something low complexity, repeatable, non-addictive, non-time consuming.

 

I think a lot of people have a habit like that, and it's different things for different people.

I think it's meditation

Beware.

(Like, from what I hear you're not wrong, but . . . y'know, Beware.)

Comment by abstractapplic on How to hire somebody better than yourself · 2024-08-28T21:58:45.293Z · LW · GW

If anyone reading this wants me to build them a custom D&D.Sci scenario (or something similar) to use as a test task, they should DM me. If the relevant org is doing something I judge to be Interesting and Important and Not Actively Harmful - and if they're okay with me releasing it to the public internet after they're done using it to eval candidates - I'll make it for free.

Comment by abstractapplic on Developing Positive Habits through Video Games · 2024-08-26T09:23:09.318Z · LW · GW

There are many games already, made for many reasons: insofar as this could work, it almost certainly already has.

Strengthens neural circuits involved in developing/maintaining positive habits

That's any game where grinding makes your character(s) stronger, or where progression is gated behind the player learning new skills. (I'm pretty sure Pokemon did exactly this for me as a child.)

Build any sort of positive habits that transfer to real life decision making

That's any strategy game. (I'm thinking particularly of XCOM:EU, with its famously 'unfair' - i.e. not-rigged-in-the-player's-favor - hit probabilities.)

I do think that there are untapped possibilities in this space - I wouldn't have made all those educational games if I didn't - but what you're describing as possibly-impossible seems pretty mundane and familiar to me. (Kudos for considering the possibility in the first place, though.)

Comment by abstractapplic on Exploring the Boundaries of Cognitohazards and the Nature of Reality · 2024-08-21T07:14:50.648Z · LW · GW

I think you can address >95% of this problem >95% of the time with the strategy "spoiler-tag and content-warn appropriately, then just say whatever".

Comment by abstractapplic on Exploring the Boundaries of Cognitohazards and the Nature of Reality · 2024-08-21T07:11:06.024Z · LW · GW

Is there value in seeking out and confronting these limits,

 

Yes.

or should we exercise caution in our pursuit of knowledge?

Yes.

 

. . . to be less flippant: I think there's an awkward kind of balance to be struck around the facts that

A) Most ideas which feel like they 'should' be dangerous aren't[1].

B) "This information is dangerous" is a tell for would-be tyrants (and/or people just making kinda bad decisions out of intellectual laziness and fear of awkwardness).

but C) Basilisks aren't not real, and people who grok A) and B) then have to work around the temptation to round it off to "knowledge isn't dangerous, ever, under any circumstance" or at least "we should all pretend super hard that knowledge can't be dangerous".

D) Some information - "here's a step-by-step-guide to engineering the next pandemic!" - is legitimately bad to have spread around even if it doesn't harm the individual who knows it. (LWers distinguish between harmful-to-holder vs harmful-to-society with "infohazard" vs "exfohazard".)

and E) It's super difficult to predict what ideas will end up being a random person's kryptonite. (Learning about factory farming as a child was not good for my mental health.)

I shouldn't trusted with language right now.

I might be reading too much into this, but it sounds like you're going through some stuff right now. The sensible/responsible/socially-scripted thing to say is "you should get some professional counseling about this". The thing I actually want to say is "you should post about whatever's bothering you on the appropriate 4chan board, being on that site is implicit consent for exposure to potential basilisks, I guarantee they've seen worse and weirder". On reflection I tentatively endorse both of these suggestions, though I recognize they both have drawbacks.

  1. ^

    For what it's worth, I'd bet large sums at long odds that whatever you're currently thinking about falls into this category.

Comment by abstractapplic on Interdictor Ship · 2024-08-21T06:27:03.556Z · LW · GW

In Thrawn's experience there are three ingredients to greatness

 

I think the way tenses are handled in the early part of this section is distractingly weird. (I can't tell how petty I'm being here.) (I'd be inclined to fix the problem by italicizing the parts Thrawn is thinking, and changing "Thrawn wasn't" to "I'm not".)

If you go up to someone powerful and ask for something, then there's 60% chance you lose nothing and a 1% chance you win big.

. . . what happens in the remaining 39%?

Also (outrageously pedantic stylistic point even by my standards incoming) it's strange to follow up "60% chance" with "a 1% chance": it should either be "n% chance" both times or "a n% chance" both times.

Comment by abstractapplic on Decision Theory in Space · 2024-08-18T10:55:12.842Z · LW · GW

succomb

succumb

"Tatooine. They're on Tatooine,"

Was this deviation from canon intentional? (I remember in the movie she picks a different planet with a similar-sounding name.)

Comment by abstractapplic on You're a Space Wizard, Luke · 2024-08-18T10:53:08.666Z · LW · GW

brainsteam

 

Can't tell if this is a typo'd "brainstem".

Comment by abstractapplic on D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset] · 2024-07-18T08:17:10.502Z · LW · GW

Good catch; fixed now; ty.

Comment by abstractapplic on D&D.Sci: Whom Shall You Call? · 2024-07-14T17:37:54.068Z · LW · GW

I think there's a typo; the text refers to "Poltergeist Pummelers" but the input data says "Phantom Pummelers".

 

Good catch; fixed now; thank you.

Comment by abstractapplic on CIV: a story · 2024-07-05T16:11:41.188Z · LW · GW

The Conservative party in the UK are also called "Tories".

Carla is harder: I think she's some combination of Carl (literally, "free man": appropriate for someone who wants to avoid tyranny) and Karl (as in Karl Marx), but I wouldn't be surprised if there were a more prosaic explanation.

Comment by abstractapplic on Paying Russians to not invade Ukraine · 2024-06-24T20:21:58.223Z · LW · GW

Caplan has been saying this intermittently for the past two years.

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-18T18:55:27.328Z · LW · GW

I think this is plausibly the best scenario either of us has made to date.

The basic game was very good, layering simple rules on top of each other to create a complex system which was challenging to detangle but easy to understand once you know the rules. I was particularly impressed by the fact that you managed the (imo, near-impossible) feat of making an enjoyable D&D.Sci where handling data starvation was a key part of the problem: most players (including me) seem to have had the (sensible) thought "okay, let's filter for only potions with Onyx and Bone", and success past that point was predicated on realizing there weren't quite enough rows to justify being that picky.

The twist struck me as fair, funny and fun. It provided an object lesson in noticing when things don't quite add up, and letting the qualitative shade the quantitative; it also expanded the scope of the genre in ways I realize I've been neglecting to.

All that said, I have some (minor, petty) criticisms . . . not of the game itself, but how it was presented. Namely:

  • This entry was billed as "relatively simple", but I think it was about median difficulty by the standards of D&D.Sci; pretty sure it was harder than (for example) The Sorceror's Personal Shopper.
  • "STORY (skippable)" was kind of misleading this time: the flavortext had a lot of little hints that the Archmage wasn't on the level, so someone who didn't read it (or failed to read between the lines, like me) would be at a (small) disadvantage.
  • "Archmage Anachronos is trying to brew Barkskin Potion" was A) the GM saying something false directly to the players, and B) a missed opportunity: if you'd written something like "Your goal is to help Archmage Anachronos brew Barkskin Potion", that would have been a subtle confirmation that giving him exactly what he asked for would lead to the best outcome (vs more aggressive / galaxy-brained forms of sabotage, or refusing to cast judgement on his pursuit of immortality, or any other reaction).

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:54:33.958Z · LW · GW

After some consideration (and reading other people's answers, in particular simon's) I've come to the conclusion that the best answer to give is actually

Vampire Fang, Troll Blood, Ground Bone, Oaken Twigs, Demon Claw

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:51:55.348Z · LW · GW

Wait . . . actually, if we're in the mood for galaxy-brained moves, we could go one better and try to

con the lich into brewing & drinking a regen potion.

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T19:20:47.026Z · LW · GW

I think your theory about

him switching the Barkskin and Necromantic Power potions

is completely correct and I feel dumb for not thinking of it; ditto your proposed reaction. On reflection, I suspect that this is because

he's actually the Loathsome Lich in disguise

so your right-ness is a lot more important than it might seem at first glance. Good catch!

Comment by abstractapplic on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T02:25:15.080Z · LW · GW

Thanks for running this!

Unless I made some trivial mistake,

Crushed Onyx, Ground Bone, Demon Claw, Giant's Toe, Vampire Fang

should work.

Explanation:

First two ingredients specify the potion, remaining three make it juuust impossible enough to guarantee that it will reliably be magical without going boom.

Comment by abstractapplic on 3 Levels of Rationality Verification · 2024-06-05T00:04:09.296Z · LW · GW

Reputational: D&D.Sci.

Experimental: D&D.Sci, with a consistent limit on time & resources used.

Organizational: D&D.Sci, with a consistent limit on time & resources used, using freshly-baked scenarios you know no-one has ever played before.

Limitations:

  • Takes several hours to play most scenarios.
  • Requires generic coding/spreadsheeting/data-science-ing skills in addition to Rationality; people who are good at those skills get an unfair(?) advantage.
  • Getting familiar with the genre gives an unfair(!) advantage.

Misc. addl. reflections on the topic:

  • Starting from zero is a valid approach, but looking at existing tests and thinking "okay but what if this was better/harder/about slightly different skills" is also sensible. Figuring out how clever and effective people are is a big industry! We should take inspiration from tests employers give job applicants, and any test any gatekeepers give anyone. (Especially if that means we get to subsidize development of rationality-tests by selling them to HR departments.)
  • . . . are there any ways to test rationality which don't rely on complementary skills? Even written tests test your ability to read the questions.
  • Videogames could be so good for this if they weren't optimized for fun and accessibility.

Comment by abstractapplic on The Pearly Gates · 2024-05-30T13:53:04.503Z · LW · GW

stared daggers

 

This has connotations of being angry, which I don't think is what you're going for. (Unless Peter is getting mad at Oskar for potentially revealing his scheme to his bosses by doing something too similar, or he's irritated that a kindred spirit isn't recognising him fast enough, or unless I've completely misunderstood the implication here.)

Comment by abstractapplic on Higher-Order Forecasts · 2024-05-23T21:05:05.786Z · LW · GW

I think 0th-order, 2nd-order and 3rd-order forecasting should be called threecasting, fivecasting and sixcasting respectively. This easily lets speakers differentiate between layers; also, imo, names which are bad puns tend to stick.

Comment by abstractapplic on Procedural Executive Function, Part 3 · 2024-05-22T12:19:22.572Z · LW · GW

Second link is broken.

Comment by abstractapplic on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures · 2024-05-17T14:12:37.293Z · LW · GW

Response to clarifying question:

Yes. The Duke has learned the hard way that his architects' guesses as to how much their projects will end up costing are consistently worse than useless; if you want to optimize on cost as well as impossibility, that's another thing you'll have to deduce from the record of finished projects.

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset · 2024-05-14T08:45:24.072Z · LW · GW

Reflections on my performance:

I'm pleasantly surprised by the effectiveness of my reasoning, and of my meta-reasoning. Not only did my loadout do well, but my calibration was impressively close: the final decision I pegged at a "~95%" success rate got 94.4%, and most of the alternative strategies I mentioned in my post were similarly on-the-nose.

(Unfortunately, my meta-meta-reasoning could still use some work. I figured out that this was a "linear-ish logistic success model with some interactions on top" kind of problem, took this as an opportunity to test that library I made, created a good predictor with a bunch of pretty/informative graphs . . . and then found myself thinking "only need one minigun? doesn't sound right to me", "why would Tyrants/Artillery and Scarabs/Minigun-or-Flamethrowers be so much stronger than every other potential feature interaction?", and "I'm totally gonna turn out to have screwed up and wish I'd handled this with XGBoost, better not even mention how I built my model". If I'd been more calibrated about how calibrated I ended up being, this could have been a really good chance to show off by calling in advance that my unconventional ML approach would succeed here.)
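For anyone unsure what a "linear-ish logistic success model with some interactions on top" looks like in practice, here is a minimal sketch. Every coefficient and feature name below is invented for illustration; this is not the scenario's actual ruleset, nor the output of the library mentioned above. The idea is simply that each feature adds to a score, certain pairs of features add an extra bonus when both are present, and the score is squashed into a probability.

```python
from math import exp

# Hypothetical coefficients, invented purely for illustration.
INTERCEPT = -1.0
LINEAR = {"minigun": 0.8, "flamethrower": 0.6, "artillery": 0.5}
INTERACTIONS = {("tyrants", "artillery"): 1.5, ("scarabs", "minigun"): 1.2}


def success_probability(features):
    """Logistic model: sigmoid of linear terms plus pairwise interaction terms."""
    score = INTERCEPT
    # Linear part: each feature contributes independently of the others.
    score += sum(w * features.get(name, 0) for name, w in LINEAR.items())
    # Interaction part: a pair only contributes when both features are nonzero.
    score += sum(
        w * features.get(a, 0) * features.get(b, 0)
        for (a, b), w in INTERACTIONS.items()
    )
    return 1 / (1 + exp(-score))


# Artillery alone vs artillery facing Tyrants (the made-up interaction kicks in).
print(round(success_probability({"artillery": 1}), 3))
print(round(success_probability({"artillery": 1, "tyrants": 1}), 3))
```

A model of this shape stays easy to inspect, since each term can be read off and graphed separately. That is the usual argument for preferring it over a boosted-tree model when the underlying rules really are close to additive.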

Reflections on the challenge:

This was the 2D performance thing I tried to pull off in Boojumologist, but with better conceptual underpinning and flawless execution. I'm proud, gladdened and envious: can't think of a single way to improve this scenario.

(I, uh, may be biased by how well I happened to do: please take this feedback with a grain of salt.)

Comment by abstractapplic on How do I get better at D&D Sci? · 2024-05-12T02:46:02.774Z · LW · GW

By imitating other players

As Jay Bailey mentioned, you can look at how other players approached challenges, and copy the approaches that worked. Pablo Repetto’s playthroughs of three early .scis seem particularly worthwhile given your situation, both because of how comprehensive & well-written they are, and because they were made by someone in the process of learning to use code on data science problems (the first playthrough was done in pure Excel, the other two were handled in Python).

By following a sensible strategy

Below is my standard plan for investigating a dataset, synthetic or otherwise (cribbed from an otherwise-mediocre Udacity course I took most of a decade ago, and still worth following).

-

Univariate Analysis: How is each feature distributed when considered in isolation? You should probably make a histogram for each column.

Bivariate Analysis: Construct and check the correlation matrix between all features. Are there clusters? Create scatterplots (or equivalent) for any pair of features which correlate unusually strongly, any pair of features where at least one is a response variable, and any pair of features you find yourself curious about.

Feature Derivation: Based on what you’ve seen so far – and/or common sense – are there meaningful features you can create from what you’ve been provided? (e.g., if you’re given "Number of Wizards", "Number of Sorcerors" and "Number of Druids" for each row, it might be worth creating a "Total Number of Magic Users" column.) Investigate how these features interact with others.

ML Modelling: If you can, and it seems like a good idea, build an ML model predicting the important/unknown features from those you have. If constructed successfully, this becomes an oracle you can ask about the outcome of any possible choice you could make. (XGBoost and similar tools are extremely versatile, and have pretty good performance on most problems.)

-

(The above is just a rough guide for what to do when you don’t know what to do. If you follow it, you should pretty quickly find yourself with a list of rabbitholes to fall down; you should probably err on the side of dropping everything and deviating from the path as soon as you find something interesting.)
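The first three steps of the plan above can be sketched in plain Python with no third-party libraries. The dataset, the column names and the "Victory" response variable are all made up for illustration; in a real scenario you would load the provided CSV and would probably reach for pandas and a plotting library instead.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy dataset: column names and values are invented for illustration.
rows = [
    {"Wizards": 2, "Sorcerors": 1, "Druids": 0, "Victory": 1},
    {"Wizards": 0, "Sorcerors": 1, "Druids": 1, "Victory": 0},
    {"Wizards": 3, "Sorcerors": 0, "Druids": 2, "Victory": 1},
    {"Wizards": 1, "Sorcerors": 0, "Druids": 0, "Victory": 0},
    {"Wizards": 2, "Sorcerors": 2, "Druids": 1, "Victory": 1},
    {"Wizards": 0, "Sorcerors": 0, "Druids": 1, "Victory": 0},
]


def column(rows, name):
    """Pull one column out of a list of row dicts."""
    return [row[name] for row in rows]


def histogram(values):
    """Univariate step: count how often each value occurs."""
    return dict(sorted(Counter(values).items()))


def pearson(xs, ys):
    """Bivariate step: Pearson correlation between two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlation_table(rows, names):
    """Correlation for every unordered pair of columns."""
    return {
        (a, b): pearson(column(rows, a), column(rows, b))
        for a, b in combinations(names, 2)
    }


def add_total_magic_users(rows):
    """Feature derivation: sum the three caster counts into one new column."""
    return [
        dict(row, TotalMagicUsers=row["Wizards"] + row["Sorcerors"] + row["Druids"])
        for row in rows
    ]


enriched = add_total_magic_users(rows)
features = ["Wizards", "Sorcerors", "Druids", "TotalMagicUsers", "Victory"]

# Univariate: one text "histogram" per column.
for name in features:
    print(name, histogram(column(enriched, name)))

# Bivariate: list pairs from strongest to weakest correlation.
correlations = correlation_table(enriched, features)
for pair, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(pair, round(r, 2))
```

The ML Modelling step is left out on purpose: it needs a third-party library (XGBoost or similar), and whether it is worth doing depends on what the first three steps turn up.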

By playing easier D&D.Scis

Difficulty of D&D.Sci games tends to be both high and high-variance; it’s usually assumed that players will have both data-manipulation and model-building skills. For what it’s worth, I can confirm that two relatively-approachable scenarios where not-using-ML won't put you at a disadvantage are (spoilered because this technically leaks information about them):

Comment by abstractapplic on D&D.Sci Long War: Defender of Data-mocracy · 2024-04-30T19:02:49.380Z · LW · GW

If I'm following your notation right, it looks like you mixed up Flamethrowers and Miniguns.

Comment by abstractapplic on D&D.Sci · 2024-04-28T21:33:10.599Z · LW · GW

I'm glad you liked it!

(. . . could you spoiler your strategy and win chance? I know this challenge is three years old, and what you mention here isn't game-breaking info, but I want to keep it possible for people looking/playing through the archives to seek clarifications in the comments without unwittingly/unwillingly learning anything else about the scenario.)