Comments
I can't tell if this post is a request for more feedback for you in future, or trying to open a more general discussion about what norms and conventions exist around giving feedback, or if it's about you wanting to see people give more love to other creators.
I was trying to do all of these things simultaneously.
The second graph you link to seems - unless I'm missing something? - to confirm the point you're trying to use it to rebut: set the x axis to five years and you can absolutely see a massive jump where Milei changed the exchange rate.
(Regardless, strong-upvoted for picking holes and citing sources.)
Just realized I forgot to mention this: I really like how the interactive handled the Bonus Objective, i.e. if the player is thinking along the right lines their character automatically makes the in-universe sensible/optimal decision for them (which means you can set up a fair Bonus Objective for players who don't live in that universe and so don't have all the context).
Notes on my performance:
. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)
There were only three likely hypotheses based on the problem statement: A) adventurers scout one room ahead, B) adventurers take optimal path(s), and C) adventurers hit every room so all that matters is the order. Early efforts ruled out C, and the Bonus Objective being fully achievable under A but not B made A a lot more plausible; however, further investigations[1] made it seem like that might be a fakeout[2], so I (narrowly) chose to max-min instead of max-max; even in retrospect, I'm not 100% sure that was a bad decision.
Notes on the scenario:
I have strongly ambivalent feelings about almost every facet of this game.
The central concept was solid gold but could have been handled better. In particular, I think puzzling out the premise could have been a lot more fun if we hadn't known the entry and exit squares going in.
The writing was as fun and funny as usual - if not more so! - but seemed less . . . pointed?/ambitious?/thematically-coherent? than I've come to expect.
The difficulty curve was perfect early but annoying late. A lot of our scenarios commit the minor sin of making initial headway hard, discouraging casual players and giving negligible or negative reward for initial investigations; this one emphatically doesn't, since pairing high-traffic rooms with high-challenge creatures was an easy(-ish) way to get better-than-random EV. However, the central mechanics of "dungeoneers scout one room ahead" and "loud fights alert" interfered in ways that made it hard to pin either of them down: more rows and columns might have made this smoother (imo a 4x4 or 5x5 dungeon would probably have been easier than the 3x3, especially for reliably distinguishing between hypotheses A and B), as would having more simple and easily-discoverable rules to use as a firm foundation ("When given a choice, Adventurers will always choose rooms with Sirens, and never choose rooms with Tar Pits"?).
The timespan was a very good choice for the season (and I'm absolutely doing things this way next time I run an end-of-year game) but paired badly with the premise. Your last Christmas scenario was, in retrospect, a really good match, because (possible spoiler for any reader who might want to play it)
it was puzzle-y and un-random enough that a player could be 100% confident their inferences were correct, so it wouldn't occupy mental real estate once they put it down, and they wouldn't mind waiting for their answers to be confirmed
but this one was kind of the opposite.
There was one aspect about which I have unreservedly positive feelings: the chrono effects, the hag poem and the varying numbers of adventurers were all excellent red herrings, seeming like they might hint towards subtle opportunities for performance improvement (and/or a secret Bonus Bonus Objective) but being quickly dismissable as fingertraps. (Yay verisimilitude!)
In summary, I think I'd put this one at about a 3/3 for Quality and Complexity . . . though I suspect others might have radically different opinions depending on how all the above happened to hit for them.
- ^
I found a row where Adventurers were clearly choosing an easy path starting with Orcs over a hard path starting with Boulders, and took this to mean "adventurers take perfect paths under at least some circumstances" instead of "there's some predictable condition for which Orcs<Boulders". Whoops!
- ^
"You tried to play the GM instead of the game? Doom! Doom for you!" <- what I thought you might be thinking
Enemy HP: 72/104
Fractionalist cast Reduce-4!
It succeeded!
Enemy HP: 18/26
I've procrastinated and prevaricated for the entire funding period, because, well . . . on the one hand . . .
- Lightcone runs LW, runs Lighthaven, and does miscellaneous Community Things. I've never visited Lighthaven, don't plan to, and (afaik) have never directly benefited from its existence; I have similar sentiments regarding the Community Things. Which means that, from my point of view, ~$2M/yr is being raised to run a web forum. This strikes me as unsustainable, unscaleable, and unreasonable.
- The graphs here say the number of monthly users is ~4000. If you disqualify the ~half of those who are students, lurkers, drive-by posters, third-worlders, or people who just forgot their wallet . . . that implies ~$1000, per person, per year, to run a web forum. (Contrast the Something Awful forums, which famously sustain themselves with a one-time entry fee of $10-$25 per person (plus some ads shown to the people who only paid $10).)
- I suspect Rationality is one of those things you get less of as you add more money; relatedly, frugality is one of the more reliable defenses against Chapman-style capital-S Sociopaths, and Zvi-style capital-M Mazes.
But on the other hand . . .
- It's a really good web forum; very plausibly the best that exists. This walled garden is impeccably managed and curated, and has an outsized impact on the rest of the world.
- If just a handful of the mundane UI innovations prototyped here caught on in the wider internet, that could justify every penny being asked for and then some (I'm thinking particularly of multidimensional voting and the associated ability to distinguish "this comment is Good" from "this comment is Right").
- LW has had a non-negligible and almost-certainly-net-positive impact on my personal and professional lives, and I think that should be rewarded.
I've decided to square this circle by giving $200 but being super tsundere about it. Hopefully the fact that this is about a fifth of what's implicitly being asked for, while being about five times what I'd consider sensible for any other site, serves to underline everything I've said above.
Typo in title: prioritize, not priorities.
Here's Claude's take on a diagram to make this less confusing.
The diagram did not make things less confusing, and in fact did the opposite. A table would be more practical imo.
10 chat sessions
As in, for each possible config, and each possible channel, run ten times from scratch? For a total of 360 actual sessions? This isn't clear to me.
Regardless: a small useful falsifiable practical result, with no egregious errors in the parts of the methodology I understand. Upvoted.
Oh, and as for
the Bonus Objective
if I'm continuing with my current paradigm I'd guess it has something to do with
an apparent interaction between Orcs and Hags which makes a path containing both less dangerous than might otherwise be expected
possibly such that
I could remove the Goblin in Room 7 without making the easiest path any easier
but
I have low confidence in this answer
and
I have no idea how I could get away with purging the second Goblin
Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.
The machines seem to think that the best solution I can offer is
BOG/OWH/GCD
and I've
found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute
so I'm making that my answer for now.
Did some more tinkering with this scenario. It is remarkably difficult to be 100% confident about its basic mechanics, i.e.
whether adventuring parties can see more than one room ahead.
And I'm beginning to suspect that
some adventuring parties always take the optimal path, while some others are greedy algorithms just picking the easiest next encounter.
( . . . and IQ tests, and exam papers, and probably some other things that are too obvious for me to call to mind . . . )
You might want to look into tests given to job applicants. (Human intelligence evaluation is an entire industry already!)
D&D.Sci, for Data Science and related skills (including, to an extent, inference-in-general).
"What important truth do you believe, which most people don't?"
"I don't think I possess any rare important truths."
On reflection, I think
my initial guess happened to be close to optimal
because
Adventurers will successfully deduce that a mid-dungeon Trap is less dangerous than a mid-dungeon Orc
and
Hag-then-Dragon seems to make best use of the weird endgame interaction I still don't understand
however
I'm scared Adventurers might choose Orcs-plus-optionality over Boulders
so my new plan is
CBW/OOH/XXD
(and I also suspect
COW/OBH/XXD
might be better because of
the tendency of Adventuring parties to pick Eastern routes over Southern ones when all else is equal
but I don't have the confidence to make that my answer.)
Oh and just for Posterity's sake, marking that I noticed both
the way some Tournaments will have 3 judges and others will have 4
and
the change in distribution somewhere between Tournaments 3000 and 4000
but I have no clue how to make use of these phenomena.
On further inspection it turns out I'm completely wrong about
how traps work.
and it looks like
Dungeoneers can always tell what kinds of fight they'll be getting into: min(feature effect) between 2 and 4 is what decides how they collectively impact Score.
It also looks like
The rankings of effectiveness are different between the Entry Square, the Exit Square, and Everywhere Else; Steel Golems are far and away the best choice for guarding the entrance but 'only' on par with Dragons elsewhere.
Lastly
It looks like there's a small but consistent benefit to dungeoneers having no choice even between similarly strong creatures: a choice of two dragons and a choice of two hags are both a bit scarier than hag-or-dragon. (Though that might just be because multiple of the same strong creature is evidence you're in a well-stocked dungeon? Feature effects are hard to detangle.)
Also
It seems like there's a weirdly strong interaction between the penultimate obstacle and the ultimate obstacle?
I still have a bunch of checking to confirm whether this actually works, but I'm getting my preliminary decision down ASAP:
CWB/OOH/XXD (where the Xes are Nothing or Goblins depending on whether I'm Hard-mode-ing)
On the basis that:
Adventurers should prioritize the 'empty' trapped rooms over the ones with Orcs, then end up funnelled into the traps and towards the Hag; Clay Golem and Dragon are our aces, so they're placed in the two locations Adventurers can't complete the course without touching.
But you know you can just go onto Ligben and type in the name yourself, right?
I didn't, actually; I've never used libgen before and assumed there'd be more to it. Thanks for taking the time to show me otherwise.
as documented in Curses! Broiled Again!, a collection of urban legends available on Libgen
Link?
You're right. I'll delete that aside.
I can't believe I forgot that one; edited; ty!
Congrats on applying Bayes; unfortunately, you applied it to the wrong numbers.
The key point is that "Question 3: Bayes" is describing a new village, with demographics slightly different to the village in the first half of your post. You grandfathered in the 0.2 from there, when the equivalent number in Village Two is 0.16 (P(Cat) = P(Witch with Cat) + P(Muggle with Cat) = 0.1*0.7 + 0.9*0.1 = 0.07 + 0.09 = 0.16), for a final answer of 43.75%.
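If it helps, here's the same arithmetic as a quick script (just a sketch restating the numbers quoted above, nothing new):

```python
# Village Two demographics, as quoted above:
p_witch = 0.1             # P(Witch)
p_cat_given_witch = 0.7   # P(Cat | Witch)
p_cat_given_muggle = 0.1  # P(Cat | Muggle)

# Law of total probability:
p_cat = p_witch * p_cat_given_witch + (1 - p_witch) * p_cat_given_muggle  # 0.16

# Bayes' theorem: P(Witch | Cat) = P(Cat | Witch) * P(Witch) / P(Cat)
p_witch_given_cat = p_cat_given_witch * p_witch / p_cat
print(p_witch_given_cat)  # 0.4375
```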
(The meta-lesson here is not to trust LLMs to give you info you can't personally verify, and especially not to trust them to check anything.)
ETA: Also, good on you for posting this. I think LW needs more numbery posts, more 101-level posts, and more falsifiable posts; a numbery 101-level falsifiable post gets a gold star (even if it ends up falsified).
Edited it to be less pointlessly poetic; hopefully the new version is less ambiguous. Ty!
Has some government or random billionaire sought out Petrov's heirs and made sure none of them have to work again if they don't want to? It seems like an obviously sensible thing to do from a game-theoretic point of view.
everyone who ever votes (>12M)
I . . . don't think that's a correct reading of the stats presented? Unless I'm missing something, "votes" counts each individual [up|down]vote each individual user makes, so there are many more total votes than total people.
'Everyone' paying a one-time $10 subscription fee would solve the problem.
A better (though still imperfect) measure of 'everyone' is the number of active users. The graph says that was ~4000 this month. $40,000 would not solve the problem.
CS from MIT OCW
Good choice of topic.
(5:00-6:00 AM)
(6:00-7:00 AM)
Everyone has their own needs and tolerances, so I won't presume to know yours . . . and if you're trying to build daily habits, "every morning" is probably easier to reliably schedule than "every night" . . . but still, sleep is a big deal, especially for intellectual work. If you're not unusually good at going without for long stretches, and/or planning to turn in before 10pm to compensate . . . you might benefit from a slightly less Spartan schedule.
- Put together a plan to learn to write and execute it.
What kind(s) of writing do you want to be able to produce?
- Practice
I'm curious how you plan on practicing your rationality, and how you intend to measure improvement. As far as I can tell our subculture has been trying to figure this out for a decade and change, with sharply limited success.
compute
I don't remember the equations for integration by parts and haven't used them in years. However, when I saw this, I immediately started scribbling on the whiteboard by my bed, thinking:
"Okay, so start with (x^2)log(x). Differentiating that gives two times the target, but also gives us a spare x we'd need to get rid of. So the answer is (0.5)(x^2)log(x) - (x^2)/4."
So I actually think you're right in general but wrong on this specific example: getting a deep sense for what you're doing when you're doing integration-by-parts would be a more robust help than rote memorization.
(Though rote memorization and regular practice absolutely have their place; if I'd done more of those I'd have remembered to stick a "+c" on the end.)
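(For reference, here's the standard integration-by-parts route written out, assuming the target was the integral of x·log(x), which is what the answer above corresponds to:)

```latex
% u = \log x,\ dv = x\,dx \;\Rightarrow\; du = \tfrac{dx}{x},\ v = \tfrac{x^{2}}{2}
\int x \log x \, dx
  = \frac{x^{2}}{2}\log x - \int \frac{x^{2}}{2}\cdot\frac{1}{x}\, dx
  = \frac{x^{2}}{2}\log x - \frac{1}{2}\int x \, dx
  = \frac{x^{2}}{2}\log x - \frac{x^{2}}{4} + c
```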
Something like D&D.Sci, then?
Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely.
Good point; I've amended the game accordingly. Thank you.
I can't get any of the AIs to produce any output other than
Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.
Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.
Notes:
- There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.
- Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?
- X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think they need. They don't dialogue about any kind of external reality, or show off their different approaches to a real problem: the one place they mention the object level is Y 'helping' X avoid "avocado coffee", a problem which neither he nor anyone else has ever had. (Contrast the Appendix, which is more interesting and meaningful because it involves actual things which actually happened.)
But it’s still really hard for me, which is why these dialogues are the best cost-benefit I’ve found to stimulate my probabilistic thinking. Do you know of any better ones?
Play-money prediction markets (like Metaculus)?
Do you have sources for those bulletpoints?
I should probably get into the habit of splitting my comments up. I keep making multiple assertions in a single response, which means when people add (dis)agreement votes I have no idea which part(s) they're (dis)agreeing with.
Notes on my performance:
Well, I feel pretty dumb (which is the feeling of becoming smarter). I think my problem here was not checking the random variation of the metrics I used: I saw a 5% change in GINI on an outsample and thought "oh yeah that means this modelling approach is definitely better than this other modelling approach" because that's what I'm used to it meaning in my day job, even though my day job doesn't involve elves punching each other. (Or, at least, that's my best post hoc explanation for how I kept failing to notice simon's better model was indeed better; it could also have been down to an unsquished bug in my code, and/or LightGBM not living up to the hype.)
ETA: I have finally tracked down the trivial coding error that ended up distorting my model: I accidentally used kRace in a few places where I should have used kClass while calculating simon's values for Speed and Strength.
Notes on the scenario:
I thought the bonus objective was executed very well: you told us there was Something Else To Look Out For, and provided just enough information that players could feel confident in their answers after figuring things out. I also really liked the writing. Regarding the actual challenge part of the challenge . . . I'm recusing myself from having an opinion until I figure out how I could have gotten it right; all I can tell you for sure is this wasn't below 4/5 Difficulty. (Making all features' effects conditional on all other features' effects tends to make both Analytic and ML solutions much trickier.)
ETA: I now have an opinion, and my opinion is that it's good. The simple-in-hindsight underlying mechanics were converted seamlessly into complex and hard-but-fair-to-detangle feature effects; the flavortext managed to stay relevant without dominating the data. This scenario also fits in neatly alongside earlier entries with superficially similar premises: we've had "counters matter" games, "archetypes matter" games, and now a "feature engineering matters" game.
I have exactly one criticism, which is that it's a bit puzzlier than I'd have liked. Players get best results by psychoanalyzing the GM and exploiting symmetries in the dataset, even though these aren't skills which transfer to most real-world problems, and the real-world problems they do transfer to don't look like "who would win a fight?"; this could have been addressed by having class and race effects be slightly more arbitrary and less consistent, instead of having uniform +Strength / -Speed gaps for each step. However, my complaint is moderated by the facts that:
- This is an isekai-world; simplified mechanics and uncannily well-balanced class systems come with the territory. (I thought the lack of magic-users was a tell for "this one will be realistic-ish" but that's on me tbh.)
- Making the generation function any more complicated would have made it (marginally but nontrivially) less elegant and harder to explain.
- I might just be being a sore loser (well, only-barely-winner) here.
- Puzzles are fun!
Some belated Author's Notes:
- This was heavily based on several interesting blog posts written by lsusr. All errors are mine.
- I understand prediction markets just well enough to feel reasonably sure this story """makes""" """sense""" (modulo its absurd implicit and explicit premises), but not well enough to be confident I can explain anything in it any further without making a mistake or contradicting myself. Accordingly, I'm falling back on an "if you think you've found a plot hole, try to work it out on your own, and if you can't then I guess I actually did screw up lol" stance.
- The fact that
neither of the protagonists ever consider the possibility of the Demon King also deriving strategic benefit from consulting an accurate and undistorted conditional prediction market
was an intended part of the narrative and I'm surprised no-one's brought it up yet.
I'm interested.
(I'd offer more feedback, but that's pretty difficult without an example to offer feedback on.)
I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.
Alternatively
I might just have screwed up my code somehow.
Still . . .
I'm sticking with my choices for now.
Update:
I tried fitting my ML model without access to speed variables other than sign(speed diff) and got slightly but non-negligibly worse metrics on an outsample. This suggests that sign(speed diff) tells you most of the information you need about speed but if you rely solely on it you're still missing useful and relevant information.
(. . . either that or my code has another error, I guess. Looking forward to finding out in seven days.)
Regarding my strategic approach
I agree pick-characters-then-equipment has the limitation you describe - I'm still not sure about the B-vs-X matchup in particular - but I eyeballed some possible outcomes and they seem close enough to optimal that I'm not going to write any more code for this.
I put your solution into my ML model and it seems to think
That your A and C matchups are pretty good (though A could be made slightly better by benching Willow and letting Uzben do her job with the same gear), but B and D have <50% success odds.
However
I didn't do much hyperparameter tuning and I'm working with a new model type, so it might have more epicycles than warranted.
And
"My model says the solution my model says is best is better than another solution" isn't terribly reassuring.
. . . regardless, I'm sticking with my choices.
One last note:
I don't actually think there's a strict +4 speed benefit cutoff - if I did I'd reallocate the +1 Boots from Y to V - but I suspect there's some emergent property that kindasorta does the same thing in some highlevel fights maybe.
Took an ML approach, got radically different results which I'm choosing to trust.
Fit a LightGBM model to the raw data, and to the data transformed by simon's stats-to-strength-and-speed model. Simon's version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to 'cheat' by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon's model: this invariably lowered performance, reassuring me that he got all the multipliers right.)
Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.
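(For anyone curious what that looks like in practice, here's a minimal sketch of the raw-vs-transformed comparison; the column names and target field are made up for illustration, and the real run also involved simon's stat transformations and some categorical encoding:)

```python
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical field names; the real dataset's columns differ.
df = pd.read_csv("gladiator_fights.csv")
target = "first_fighter_won"
raw_features = [c for c in df.columns if c != target]
derived_features = ["strength_diff", "speed_diff"]  # simon-style transformed stats

def outsample_gini(feature_cols):
    """Fit LightGBM on a train split, report Gini (2*AUC - 1) on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[target], test_size=0.2, random_state=0
    )
    model = lgb.LGBMClassifier()
    model.fit(X_train, y_train)
    return 2 * roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) - 1

print("raw stats:    ", outsample_gini(raw_features))
print("derived stats:", outsample_gini(derived_features))
```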
New strategy goes like this:
Against A, send U, with +3 Boots
Against B, send X, with +2 Boots and +1 Gauntlets
Against C, send V, with +3 Gauntlets
Against D, send Y, with +1 Boots and +2 Gauntlets
Notes:
The machines say this gives me ~2.6 expected victories, but I'm selecting for things they liked, so realistically I expect my EV to land somewhere in the 2-2.5 range.
If I was doing this IRL I'd move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.
My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed. But that's just post hoc confabulation.
>only the last 12 having boots 2 and gauntlets 3 (likely post-theft)
Didn't notice that but it confirms my theory, nice.
>It seems to me that they appear both as red and black, though.
Ah, I see the error in my code that made me think otherwise. Strange coincidence: I thought "oh yeah a powerful wealthy elf ninja who pointedly wears black when assigned red clothes, what a neat but oddly specific 8-bit theater reference" and then it turned out to be a glitch.
Noting that I read this (and that therefore you get partial credit for any solution I come up with from here on out): your model and the strategies it implies are both very interesting. I should be able to investigate them with ML alongside everything else, when/if I get around to doing that.
Regarding the Bonus Objective:
I can't figure out whether offering that guy we unknowingly robbed his shoes back is the best or the worst diplomatic approach our character could take, but yeah I'm pretty sure we both located the problem and roughly what it implies for the scenario.
I took an analytic approach and picked some reasonable choices based on that. I'll almost certainly try throwing ML at this problem at some point, but for now I want to note down what a me-who-can't-use-XGBoost would do.
Findings:
There are at least some fingerprintable gladiators who keep gladiating, and who need to be Accounted For (the presence of such people makes all archetypery suspect: are Dwarven Knights really that Good, or are there just a handful of super-prolific Dwarven Knights who give everyone an unfairly good impression?). This includes a Level 7 Elven Ninja, almost certainly Cadagal's Champion, who inexplicably insists on always wearing black (even though it doesn't seem to make a difference to how well ninjas ninj).
Level 4 Boots and Level 4 Gauntlets are super rare in the dataset. The Gauntlets are always worn by a pair of hypercompetent Level 7 Dwarven Monks; the Boots are always worn by the Level 7 Elven Ninja.
Despite this, Cadagal's Champion is facing us with Level 2 Boots.
We have some Level 4 Boots.
. . . we robbed this guy, didn't we? And if we wear the boots - our most powerful equipment - he'll flip out and set his House against us whether we win or lose? Dammit . . .
Who fights whom?
A is a Human Warrior. Warriors lose to Fencers, Humans lose to Fencers, Humans lose to Elves. We have an Elven Fencer on call; send Y.
B is a Human Knight. Rangers are best vs Knights, so send W. (Not super confident in this one)
C is an Elven Ninja. Ninjas are super weak against Knights. Send Z, the Elven Knight. (Slightly concerned by how underrepresented Elves are in the sample of gladiators who managed to beat this guy but I'm assuming that's either noise or an effect which Z will be able to shrug off with the Power of Friendship and/or Urgency)
D is a Dwarven Monk. Monks are weak to Ninjas; send U.
Who wears what?
I haven't managed to figure out how equipment works beyond "higher number good"; if there are specific synergies with/against specific classes/races/whatever, they elude me. For that reason:
Y and Z are my best shots. I'll have them both wear what their opponents are wearing, to reduce the effects of uncertainty and turn those fights into "who wore it better?" contests. (So +3 Boots and +1 Gauntlets for Y, +2 Boots and +3 Gauntlets for Z.)
U vs D looks pretty solid so I'll give him the remaining +2 Gauntlets and +1 Boots.
W vs B is my most tenuous guess; I hope she won't hold a grudge after I send her out unequipped to boost everyone else's chances.
Math textbooks. Did you know that you can just buy math textbooks which are "several years too advanced for you"? And that due to economies of scale and the objectivity of their subject matter, they tend to be of both high and consistent quality? Not getting my parents to do this at that age is something I still regret decades later.
Or did you specifically mean fiction? If so, you're asking for fiction recommendations on the grew-up-reading-HPMOR website, we're obviously going to recommend HPMOR (especially if they've already read Harry Potter, but it's still good if you only know the broad strokes).
Are you able to pinpoint exactly what gives you this feeling?
Less a single sharp pinpoint, more a death of a thousand six cuts:
- The emphasis on learning the names of biases is kinda guessing-the-teacher's-password-y.
- You'd need to put forth an unusual effort to make sure you're communicating the subset of psychological research which actually replicates reliably.
- Any given bias might not be present in the student or their social/business circle.
- The suggested approach implies that the set of joints psychologists currently carve at is the 'best' one; what if I happen to see Bias A and Bias B as manifestations of Bias C?
- I worry some students would round this off to "here's how to pathologize people who disagree with me!" training.
- Like I said, this is the kind of fruit that's low-hanging enough that it's mostly already picked.
All that said, I still think this is potentially worthwhile and would still playtest it if you wanted. But I'm much more excited about literally every other idea you mentioned.
I am extremely interested in this, and all similar efforts in this space. I agree our community should be doing much more along these lines.
Regarding your specific ideas:
Cognitive Bias Detection
Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way. Also, there's a lot of pre-existing work (I found out about this earlier today).
Calibration Training
The Credence Calibration Game exists. So does my variation on the same idea (see also the associated lesson plan). So do play-money and real-money prediction markets. That said, I do think there's a valuable and unfilled niche for something which doesn't require a download and has a nice user interface and has a four-digit number of questions and lets you check your answers immediately (. . . though I don't know how many people other than me would consider it valuable).
Bite-Sized, Practical Challenges
I am very much in favor of this, to the point where I'm already (tentatively) planning to (eventually) build some games with a similar motivation. Relatedly, the "ask users to predict an outcome based on limited data" example sounds like a description of that genre I invented (though "Bite-Sized" suggests you're thinking in terms of something much more polished/generally-accessible).
(Side note: A subtle benefit of the "Practical Challenges" approach is that it can correct for biases you weren't aiming for. A large part of my motivation for making D&D.Sci was "forcing them to confront the common pitfalls of overconfidence or representativeness heuristics"; I found that a Lesswronger working in a Data Science context will more often be insufficiently confident, and place too little weight on surface appearances; my endeavor 'failed' gracefully and people got a chance to notice those errors instead (plus various other problems I didn't even consider).)
-
I look forward to seeing what comes of this. If you want anything playtested, please let me know.
As with the last article, I think this is almost entirely incoherent/wrong; as with the last article, I'm strong-upvoting it anyway because I think it makes ~1.5 good and important points I've not seen made anywhere else, and they're worth the chaff. (I'd go into more detail but I don't want anyone leaning on my summary instead of reading it themselves.)
. . . is there a reason this is a link to a google doc, instead of a copypaste?
- ^
and add footnotes.
"Using information during Training and/or Evaluation of models which wouldn't be available in Deployment."
. . . I'll edit that into the start of the post.
Thanks for a good one
I'm glad you feel that way about this scenario. I wish I did . . .
(For future reference, on the off-chance you haven't seen it: there's a compilation of all the past scenarios here, handily rated by quality and steamrollability.)
One thing that perhaps would make it easier was if the web interactive could tell whether or not your selection was the optimal one directly, and possibly how higher your expected price was than the optimal price (I first plugged mine in, then had to double check with your table out here)
. . . huh. I feel conflicted about this on aesthetic grounds - like, Reality doesn't come with big flashing signs saying "EV-maxxing solution reached!" when you reach an EV-maxxing solution - but it does sound both convenient to have and easy to set up. Might try adding this functionality to the interactive for the next one; would be curious to hear what anyone else who happens to be reading this comment thinks.
Anyway, greetings, and looking forward to seeing the next one.
Good to have you on board!