framing contradictory evidence as biased or manipulated
Most contradictory evidence is, to some extent (regardless of what it's contradicting).
dismissing critics as [...] deluded, or self-interested
Most critics are, to some extent (regardless of what they're criticizing).
Assuming I didn't make any mistakes in my deductions or decisions, optimal plan goes like this:
Give everyone a Cockatrice Eye (to get the most out of the associated rebate) and a Dragon Head (to dodge the taxing-you-twice-on-every-Head-after-the-first thing).
Give the mage and the rogue a Unicorn Horn and a Zombie Hand each, and give the cleric four Zombie Hands; this should get them all as close to the 30sp threshold as possible without wrecking anything else.
Give literally everything else to the fighter, allowing them to bear the entire 212sp cost; if they get mad about it, analogize it to being a meatshield in the financial world as well as the physical.
Thanks for your reply, and (re-)welcome to LW!
My conclusion is that I'm pretty sure you're wrong in ways that are fun and useful to discuss!
I hope so! Let's discuss.
(Jsyk you can spoiler possible spoilers on Desktop using ">!" at the start of paragraphs, in case you want to make sure no LWers are spoiled on the contents of a most-of-a-century-old play.)
Regarding the witnesses:
I agree - emphatically! - that eyewitness testimony is a lot less reliable than most people believe. I mostly only brought the witnesses up in my discussion because I thought the jury dismissed them for bad reasons, instead of a general policy of "eyewitnesses are unreliable". (In retrospect, I could have been a lot clearer on this point.)
Regarding the knife:
I agree that the knife being unique would have made things a lot more clear-cut, but disagree about the implications.
If no-one is deliberately trying to frame the accused, the odds of the real killer happening to use the same brand of knife as the one he favors are very low. (What fraction of knives* available to potential suspects are of that exact type? One in a hundred, maybe? If we assume no frame-up or suicide and start with your prior probability of 10% then a naive Bayesian update and a factor of 100 moves that to >90% even without other evidence**.)
If he is actively being framed . . . that's not overwhelmingly implausible, since it's not a secret what kind of knife he uses, and the real killer would be highly motivated to shift blame. However, the idea that he'd have lost his knife, by coincidence, at the same time that someone was using an exact duplicate to frame him (and then couldn't find it afterwards, even though it would be decisive for his defense) . . . strains credulity. I'm less sure about how to quantify the possibility a real killer took his knife without him knowing, got into the victim's apartment, and performed the kill all while the accused was out at the movies; but I feel pretty confident the accused's knife was the murder weapon.
*I'm ignoring the effects of the murder weapon being a knife at all because they're surprisingly weak. The accused owns a knife and favors using it, but so would many alternative suspects; and the accused cohabiting with the victim implies he also has easy access to many alternative methods - poison, arranging an accident - that Hypothetical Killer X wouldn't.
**Full disclosure, I didn't actually perform the calculation until I started writing this post; I admit to being surprised by how little a factor of ~100 changes a ~10% prior probability, though I still feel it's a stronger effect than you're accounting for, and for that matter think your base rates are too low to start with (the fight wasn't just a fight, it was the culmination of years of persistent abuse).
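For anyone who wants to check the arithmetic in the knife argument, here is a minimal sketch of the odds-form update. It assumes the 10% prior and the factor-of-100 likelihood ratio used above; the function name is mine, not anything standard.

```python
def update_with_likelihood_ratio(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes update: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)              # 0.10 -> odds of 1:9
    posterior_odds = prior_odds * likelihood_ratio  # 1:9 times 100 -> 100:9
    return posterior_odds / (1.0 + posterior_odds)  # convert odds back to a probability


# Assumed inputs, taken from the comment: 10% prior, evidence worth a factor of ~100.
posterior = update_with_likelihood_ratio(prior=0.10, likelihood_ratio=100.0)
print(f"{posterior:.3f}")  # 0.917
```

So the factor of 100 moves 10% to roughly 92%: past 90%, but by less than intuition might suggest.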
Regarding my conspiracy theories:
I agree that the protagonist having ideological or personal reasons to make the case turn out this way is much more likely than him having been successfully bribed or threatened; aside from anything else, the accused doesn't seem terribly wealthy or well-connected.
I also agree with your analysis of the racist juror's emotional state as presented, though I continue to think it's slightly suspicious that things happened to break that conveniently (the Doylist explanation is of course that the director wanted the bigot to come off as weak and/or needed things to wrap up satisfyingly inside a two-hour runtime, but I'm an incorrigible Watsonian).
One last, even more speculative thought:
Literally everything the racist juror does in the back half of the movie is weird and suspicious. It's strange that he expects people to be convinced by his bigoted tirade; it's also strangely convenient that he's willing to vote not guilty by the end even though he A) hasn't changed his mind and B) knows a hung jury would probably eventually lead to the death of the accused, which he wants.
I don't think it's likely, but I'd put maybe a ~1% probability on . . .
. . . him being in league with the protagonist, and them running a two-man con on the other ten jurors to get the unanimous verdict they want.
I recently watched (the 1997 movie version of) Twelve Angry Men, and found it fascinating from a Bayesian / confusion-noticing perspective.
My (spoilery) notes (cw death, suspicion, violence etc):
- The existence of other knives of the same kind as the murder weapon is almost perfectly useless as evidence. The fact that the knife used was identical to the one the accused owned, and was used to kill so close to when the defendant's knife (supposedly) went missing, is still too much of a coincidence to ignore. The only way it would realistically be a different knife is if someone was actively trying to frame the defendant, and arranged for his knife to be lost at the same time; and if they could do both of those things, it makes more sense for Hypothetical Secret Mastermind X to just stab the victim with the accused's actual knife. (This means Juror 8's illegal purchase of an identical knife in the name of justice was epistemically pointless, and only served to muddy the waters; I'm oddly enamored by the probably-accidental pro-Lawful-Good thematic implications.)
- The old man's testimony is suspect for more reasons than the jurors notice. The lack of fingerprints on the murder weapon suggests the culprit wiped it off first, but the old man claims the culprit ran off immediately after the body hit the floor. However, this aligns with the other reason to consider him unreliable, which is him (allegedly) managing to move quickly enough to see the accused leave the scene; it seems pretty plausible that he got the timing wrong but everything else right.
- The paramedic juror's claim that the knife was used incorrectly - that it's the kind of knife made to stab up through the gut instead of down through the ribs - doesn't exonerate the defendant, and might actually incriminate him. It's a fact about the knife, not the user; if anything, a young man might be more likely than the average assailant to wield his weapon wrong.
- The other witness turning out to (probably) habitually wear glasses doesn't necessarily make her testimony invalid. She could be farsighted, could need reading glasses, or could just habitually wear them to seem intelligent or as a fashion statement. All of these explanations seem more likely than a - by all accounts, scarily competent - prosecutor putting her on the stand without checking she could actually see the murder. (None of the jurors consider requesting additional testimony on this topic, even though it's both easy to check and the point which ends up deciding the final verdict.)
From all the above, I conclude:
The accused is very likely to have committed the murder.
and
The protagonist probably has some kind of agenda: either he takes issue with capital punishment, knows the defendant personally, strongly dislikes the carceral justice system, is being bribed, or is trying to arrange acquittal for a guilty party just to see if he can.
However
I still think a case can be made for the existence of reasonable doubt.
if and only if
You consider the possibility it was a suicide.
(trigger warning for detailed discussion of that thing I just mentioned)
If I knew for a fact the defendant was innocent, most of my probability mass would be on some variation of the following sequence of events.
- The 'victim' has his (injury-free) altercation with the accused. This rattles the accused to the point that he forgets to take his knife with him when he leaves for the movies; he falsely assumes that it "fell out of his pocket".
- The 'victim' is also rattled, and decides to commit suicide. (Possible motivations: realizing that he can no longer reliably win a fight against the target of his abuse and wanting to quit while he's ahead, feeling regret about his treatment of the accused, being angry at the accused and wanting to die in such a way that the accused ends up accused.)
- The 'victim' stabs himself in the chest, and not through the gut, in an attempt to end his life as quickly, painlessly, and dramatically as possible. Possibly he shouts "I'm going to kill you!" as he does this, either out of genuine self-loathing or an attempt to implicate the accused; possibly he makes a point of staggering around near an open window with a knife sticking out of his chest before collapsing; alternatively, the witness testimonies may just be mistaken and/or falsified for reasons discussed in the film.
- The accused returns home to find a dead body and two policemen. Between the lingering effects of earlier events, the presence of the victim's corpse, his current predicament and (quite plausibly) some mind-altering substances he chooses not to admit to using in the subsequent trial . . . the accused finds himself unable to provide satisfactory answers when the police ask for the titles and lead actors of the movies he watched. (He may or may not be able to recall other details about these movies: but either he doesn't think to volunteer this information and the police don't ask for it, or the police choose not to record inconvenient facts in an attempt to close the case cleanly while technically telling the truth.)
This hypothesis makes sense of the paramedic's claim about the type of knife, makes sense of the silent evidence of neither the accused nor the corpse having any injuries mentioned aside from the single stab wound (a person comfortable with violence yells an explicit verbal warning at another person comfortable with violence, and then stabs him to death, but there's no sign of a struggle?), and is supported by base rates (suicide is significantly more common than homicide in first-world nations).
. . . to be clear, I'd still say murder is much more likely, but I consider the above possibility just possible enough to be conflicted about the reasonableness of reasonable doubt in this case.
I'm curious what other LW users think.
Can't believe I missed that; edited; ty!
True. But if things were opened up this way, realistically more than one person would want to get in on it. (Enough to cover an entire percentage point of the bid? I have no idea.)
. . . Is there a way a random punter could kick in, say, $100k towards Elon's bid? Either they end up spending $100k on shares valued at somewhere between $100k and $150k; or, more likely, they make the seizure of OpenAI $100k harder at no cost to themselves.
I once saw an advert claiming that a pregnancy test was “over 99% accurate”. This inspired me to invent an only-slightly-worse pregnancy test, which is over 98% accurate. My invention is a rock with “NOT PREGNANT” scrawled on it: when applied to a randomly selected human being, it is right more than 98% of the time. It is also cheap, non-invasive, endlessly reusable, perfectly consistent, immediately effective and impossible to apply incorrectly; this massive improvement in cost and convenience is obviously worth the ~1% decrease in accuracy.
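A rough sketch of why the rock scores so well, assuming (hypothetically) that about 1.5% of randomly selected humans are pregnant at any given time; the prevalence figure is an illustrative placeholder, not a sourced statistic.

```python
def rock_metrics(prevalence: float) -> dict:
    """Metrics for a 'test' that always answers NOT PREGNANT."""
    accuracy = 1.0 - prevalence  # correct on every non-pregnant person, wrong on the rest
    sensitivity = 0.0            # it never detects an actual pregnancy
    specificity = 1.0            # it never raises a false alarm
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity}


ASSUMED_PREVALENCE = 0.015  # hypothetical base rate, for illustration only
metrics = rock_metrics(ASSUMED_PREVALENCE)
print(metrics)
```

The accuracy figure is carried entirely by the base rate; the sensitivity of zero is what the headline number hides.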
I can't tell if this post is a request for more feedback for you in future, or trying to open a more general discussion about what norms and conventions exist around giving feedback, or if it's about you wanting to see people give more love to other creators.
I was trying to do all of these things simultaneously.
The second graph you link to seems - unless I'm missing something? - to confirm the point you're trying to use it to rebut: set the x axis to five years and you can absolutely see a massive jump where Milei changed the exchange rate.
(Regardless, strong-upvoted for picking holes and citing sources.)
Just realized I forgot to mention this: I really like how the interactive handled the Bonus Objective, i.e. if the player is thinking along the right lines their character automatically makes the in-universe sensible/optimal decision for them (which means you can set up a fair Bonus Objective for players who don't live in that universe and so don't have all the context).
Notes on my performance:
. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)
There were only three likely hypotheses based on the problem statement: A) adventurers scout one room ahead, B) adventurers take optimal path(s), and C) adventurers hit every room so all that matters is the order. Early efforts ruled out C, and the Bonus Objective being fully achievable under A but not B made A a lot more plausible; however, further investigations[1] made it seem like that might be a fakeout[2], so I (narrowly) chose to max-min instead of max-max; even in retrospect, I'm not 100% sure that was a bad decision.
Notes on the scenario:
I have strongly ambivalent feelings about almost every facet of this game.
The central concept was solid gold but could have been handled better. In particular, I think puzzling out the premise could have been a lot more fun if we hadn't known the entry and exit squares going in.
The writing was as fun and funny as usual - if not more so! - but seemed less . . . pointed?/ambitious?/thematically-coherent? than I've come to expect.
The difficulty curve was perfect early but annoying late. A lot of our scenarios commit the minor sin of making initial headway hard to make, discouraging casual players and giving negligible or negative reward for initial investigations; this one emphatically doesn't, since pairing high-traffic rooms with high-challenge creatures was an easy(-ish) way to get better-than-random EV. However, the central mechanics of "dungeoneers scout one room ahead" and "loud fights alert" interfered in ways that made it hard to pin either of them down: more rows and columns might have made this smoother (imo a 4x4 or 5x5 dungeon would probably have been easier than the 3x3, especially for reliably distinguishing between hypotheses A and B), as would having more simple and easily-discoverable rules to use as a firm foundation ("When given a choice, Adventurers will always choose rooms with Sirens, and never choose rooms with Tar Pits"?).
The timespan was a very good choice for the season (and I'm absolutely doing things this way next time I run an end-of-year game) but paired badly with the premise. Your last Christmas scenario was, in retrospect, a really good match, because (possible spoiler for any reader who might want to play it)
it was puzzle-y and un-random enough that a player could be 100% confident their inferences were correct, so it wouldn't occupy mental real estate once they put it down, and they wouldn't mind waiting for their answers to be confirmed
but this one was kind of the opposite.
There was one aspect about which I have unreservedly positive feelings: the chrono effects, the hag poem and the varying numbers of adventurers were all excellent red herrings, seeming like they might hint towards subtle opportunities for performance improvement (and/or a secret Bonus Bonus Objective) but being quickly dismissable as fingertraps. (Yay verisimilitude!)
In summary, I think I'd put this one at about a 3/3 for Quality and Complexity . . . though I suspect others might have radically different opinions depending on how all the above happened to hit for them.
- ^
I found a row where Adventurers were clearly choosing an easy path starting with Orcs over a hard path starting with Boulders, and took this to mean "adventurers take perfect paths under at least some circumstances" instead of "there's some predictable condition for which Orcs<Boulders". Whoops!
- ^
"You tried to play the GM instead of the game? Doom! Doom for you!" <- what I thought you might be thinking
Enemy HP: 72/104
Fractionalist cast Reduce-4!
It succeeded!
Enemy HP: 18/26
I've procrastinated and prevaricated for the entire funding period, because, well . . . on the one hand . . .
- Lightcone runs LW, runs Lighthaven, and does miscellaneous Community Things. I've never visited Lighthaven, don't plan to, and (afaik) have never directly benefited from its existence; I have similar sentiments regarding the Community Things. Which means that, from my point of view, ~$2M/yr is being raised to run a web forum. This strikes me as unsustainable, unscaleable, and unreasonable.
- The graphs here say the number of monthly users is ~4000. If you disqualify the ~half of those who are students, lurkers, drive-by posters, third-worlders, or people who just forgot their wallet . . . that implies ~$1000, per person, per year, to run a web forum. (Contrast the Something Awful forums, which famously sustain themselves with a one-time entry fee of $10-$25 per person (plus some ads shown to the people who only paid $10).)
- I suspect Rationality is one of those things you get less of as you add more money; relatedly, frugality is one of the more reliable defenses against Chapman-style capital-S Sociopaths, and Zvi-style capital-M Mazes.
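The per-person figure above is back-of-envelope arithmetic; a minimal sketch, using only the rough numbers quoted in these bullets (~$2M/yr, ~4000 monthly users, ~half disqualified), looks like this. The variable names are mine.

```python
# Rough inputs as quoted above; all three are approximations.
annual_budget_usd = 2_000_000  # ~$2M/yr being raised
monthly_users = 4_000          # ~4000 monthly users, per the graphs
paying_fraction = 0.5          # ~half assumed unable or unwilling to pay

plausible_payers = monthly_users * paying_fraction
cost_per_payer = annual_budget_usd / plausible_payers
print(f"${cost_per_payer:,.0f} per person per year")  # $1,000 per person per year
```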
But on the other hand . . .
- It's a really good web forum; very plausibly the best that exists. This walled garden is impeccably managed and curated, and has an outsized impact on the rest of the world.
- If just a handful of the mundane UI innovations prototyped here caught on in the wider internet, that could justify every penny being asked for and then some (I'm thinking particularly of multidimensional voting and the associated ability to distinguish "this comment is Good" from "this comment is Right").
- LW has had a non-negligible and almost-certainly-net-positive impact on my personal and professional lives, and I think that should be rewarded.
I've decided to square this circle by giving $200 but being super tsundere about it. Hopefully the fact that this is about a fifth of what's implicitly being asked for, while being about five times what I'd consider sensible for any other site, serves to underline everything I've said above.
Typo in title: prioritize, not priorities.
Here's Claude's take on a diagram to make this less confusing.
The diagram did not make things less confusing, and in fact did the opposite. A table would be more practical imo.
10 chat sessions
As in, for each possible config, and each possible channel, run ten times from scratch? For a total of 360 actual sessions? This isn't clear to me.
Regardless: a small useful falsifiable practical result, with no egregious errors in the parts of the methodology I understand. Upvoted.
Oh, and as for
the Bonus Objective
if I'm continuing with my current paradigm I'd guess it has something to do with
an apparent interaction between Orcs and Hags which makes a path containing both less dangerous than might otherwise be expected
possibly such that
I could remove the Goblin in Room 7 without making the easiest path any easier
but
I have low confidence in this answer
and
I have no idea how I could get away with purging the second Goblin
Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.
The machines seem to think that the best solution I can offer is
BOG/OWH/GCD
and I've
found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute
so I'm making that my answer for now.
Did some more tinkering with this scenario. It is remarkably difficult to be 100% confident about its basic mechanics, i.e.
whether adventuring parties can see more than one room ahead.
And I'm beginning to suspect that
some adventuring parties always take the optimal path, while some others are greedy algorithms just picking the easiest next encounter.
( . . . and IQ tests, and exam papers, and probably some other things that are too obvious for me to call to mind . . . )
You might want to look into tests given to job applicants. (Human intelligence evaluation is an entire industry already!)
D&D.Sci, for Data Science and related skills (including, to an extent, inference-in-general).
"What important truth do you believe, which most people don't?"
"I don't think I possess any rare important truths."
On reflection, I think
my initial guess happened to be close to optimal
because
Adventurers will successfully deduce that a mid-dungeon Trap is less dangerous than a mid-dungeon Orc
and
Hag-then-Dragon seems to make best use of the weird endgame interaction I still don't understand
however
I'm scared Adventurers might choose Orcs-plus-optionality over Boulders
so my new plan is
CBW/OOH/XXD
(and I also suspect
COW/OBH/XXD
might be better because of
the tendency of Adventuring parties to pick Eastern routes over Southern ones when all else is equal
but I don't have the confidence to make that my answer.)
Oh and just for Posterity's sake, marking that I noticed both
the way some Tournaments will have 3 judges and others will have 4
and
the change in distribution somewhere between Tournaments 3000 and 4000
but I have no clue how to make use of these phenomena.
On further inspection it turns out I'm completely wrong about
how traps work.
and it looks like
Dungeoneers can always tell what kinds of fight they'll be getting into: min(feature effect) between 2 and 4 is what decides how they collectively impact Score.
It also looks like
The rankings of effectiveness are different between the Entry Square, the Exit Square, and Everywhere Else; Steel Golems are far and away the best choice for guarding the entrance but 'only' on par with Dragons elsewhere.
Lastly
It looks like there's a weak but solid benefit to dungeoneers having no choice even between similarly strong creatures: a choice of two dragons and a choice of two hags are both a bit scarier than hag-or-dragon. (Though that might just be because multiple of the same strong creature is evidence you're in a well-stocked dungeon? Feature effects are hard to detangle.)
Also
It seems like there's a weirdly strong interaction between the penultimate obstacle and the ultimate obstacle?
I still have a bunch of checking to confirm whether this actually works, but I'm getting my preliminary decision down ASAP:
CWB/OOH/XXD (where the Xes are Nothing or Goblins depending on whether I'm Hard-mode-ing)
On the basis that:
Adventurers should prioritize the 'empty' trapped rooms over the ones with Orcs, then end up funnelled into the traps and towards the Hag; Clay Golem and Dragon are our aces, so they're placed in the two locations Adventurers can't complete the course without touching.
But you know you can just go onto Libgen and type in the name yourself, right?
I didn't, actually; I've never used libgen before and assumed there'd be more to it. Thanks for taking the time to show me otherwise.
as documented in Curses! Broiled Again!, a collection of urban legends available on Libgen
Link?
You're right. I'll delete that aside.
I can't believe I forgot that one; edited; ty!
Congrats on applying Bayes; unfortunately, you applied it to the wrong numbers.
The key point is that "Question 3: Bayes" is describing a new village, with demographics slightly different to the village in the first half of your post. You grandfathered in the 0.2 from there, when the equivalent number in Village Two is 0.16 (P(Cat) = P(Witch with Cat) + P(Muggle with Cat) = 0.1*0.7 + 0.9*0.1 = 0.07 + 0.09 = 0.16), for a final answer of 43.75%.
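A minimal sketch of the Village Two calculation, so it can be checked by hand or by machine. The three inputs are read off the arithmetic above (P(Witch) = 0.1, P(Cat | Witch) = 0.7, P(Cat | Muggle) = 0.1); the variable names are mine.

```python
# Village Two inputs, as read from the arithmetic above.
p_witch = 0.1
p_cat_given_witch = 0.7
p_cat_given_muggle = 0.1

# Law of total probability: P(Cat) = P(Witch with Cat) + P(Muggle with Cat).
p_cat = p_witch * p_cat_given_witch + (1.0 - p_witch) * p_cat_given_muggle

# Bayes' theorem: P(Witch | Cat) = P(Cat | Witch) * P(Witch) / P(Cat).
p_witch_given_cat = p_witch * p_cat_given_witch / p_cat

print(f"P(Cat) = {p_cat:.2f}")                      # P(Cat) = 0.16
print(f"P(Witch | Cat) = {p_witch_given_cat:.2%}")  # P(Witch | Cat) = 43.75%
```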
(The meta-lesson here is not to trust LLMs to give you info you can't personally verify, and especially not to trust them to check anything.)
ETA: Also, good on you for posting this. I think LW needs more numbery posts, more 101-level posts, and more falsifiable posts; a numbery 101-level falsifiable post gets a gold star (even if it ends up falsified).
Edited it to be less pointlessly poetic; hopefully the new version is less ambiguous. Ty!
Has some government or random billionaire sought out Petrov's heirs and made sure none of them have to work again if they don't want to? It seems like an obviously sensible thing to do from a game-theoretic point of view.
everyone who ever votes (>12M)
I . . . don't think that's a correct reading of the stats presented? Unless I'm missing something, "votes" counts each individual [up|down]vote each individual user makes, so there are many more total votes than total people.
'Everyone' paying a one-time $10 subscription fee would solve the problem.
A better (though still imperfect) measure of 'everyone' is the number of active users. The graph says that was ~4000 this month. $40,000 would not solve the problem.
CS from MIT OCW
Good choice of topic.
(5:00-6:00 AM)
(6:00-7:00 AM)
Everyone has their own needs and tolerances, so I won't presume to know yours . . . and if you're trying to build daily habits, "every morning" is probably easier to reliably schedule than "every night" . . . but still, sleep is a big deal, especially for intellectual work. If you're not unusually good at going without for long stretches, and/or planning to turn in before 10pm to compensate . . . you might benefit from a slightly less Spartan schedule.
- Put together a plan to learn to write and execute it.
What kind(s) of writing do you want to be able to produce?
- Practice
I'm curious how you plan on practicing your rationality, and how you intend to measure improvement. As far as I can tell our subculture has been trying to figure this out for a decade and change, with sharply limited success.
compute
I don't remember the equations for integration by parts and haven't used them in years. However, when I saw this, I immediately started scribbling on the whiteboard by my bed, thinking:
"Okay, so start with (x^2)log(x). Differentiating that gives two times the target, but also gives us a spare x we'd need to get rid of. So the answer is (0.5)(x^2)log(x) - (x^2)/4."
So I actually think you're right in general but wrong on this specific example: getting a deep sense for what you're doing when you're doing integration-by-parts would be a more robust help than rote memorization.
(Though rote memorization and regular practice absolutely have their place; if I'd done more of those I'd have remembered to stick a "+c" on the end.)
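Written out, the whiteboard reasoning goes roughly as follows. This assumes the target integral was x log(x), which is what the scribbled answer implies; the quoted problem itself isn't reproduced here.

```latex
\begin{align*}
\frac{d}{dx}\left[x^2 \log x\right] &= 2x \log x + x \\
\Rightarrow\quad \int x \log x \, dx &= \tfrac{1}{2}\left(x^2 \log x - \int x \, dx\right) \\
&= \tfrac{1}{2} x^2 \log x - \tfrac{1}{4} x^2 + c
\end{align*}
```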
Something like D&D.Sci, then?
Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely.
Good point; I've amended the game accordingly. Thank you.
I can't get any of the AIs to produce any output other than
Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.
Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.
Notes:
- There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.
- Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?
- X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think they need. They don't dialogue about any kind of external reality, or show off their different approaches to a real problem: the one place they mention the object level is Y 'helping' X avoid "avocado coffee", a problem which neither he nor anyone else has ever had. (Contrast the Appendix, which is more interesting and meaningful because it involves actual things which actually happened.)
But it’s still really hard for me, which is why these dialogues are the best cost-benefit I’ve found to stimulate my probabilistic thinking. Do you know of any better ones?
Play-money prediction markets (like Metaculus)?
Do you have sources for those bulletpoints?
I should probably get into the habit of splitting my comments up. I keep making multiple assertions in a single response, which means when people add (dis)agreement votes I have no idea which part(s) they're (dis)agreeing with.
Notes on my performance:
Well, I feel pretty dumb (which is the feeling of becoming smarter). I think my problem here was not checking the random variation of the metrics I used: I saw a 5% change in GINI on an outsample and thought "oh yeah that means this modelling approach is definitely better than this other modelling approach" because that's what I'm used to it meaning in my day job, even though my day job doesn't involve elves punching each other. (Or, at least, that's my best post hoc explanation for how I kept failing to notice simon's better model was indeed better; it could also have been down to an unsquished bug in my code, and/or LightGBM not living up to the hype.)
ETA: I have finally tracked down the trivial coding error that ended up distorting my model: I accidentally used kRace in a few places where I should have used kClass while calculating simon's values for Speed and Strength.
Notes on the scenario:
I thought the bonus objective was executed very well: you told us there was Something Else To Look Out For, and provided just enough information that players could feel confident in their answers after figuring things out. I also really liked the writing. Regarding the actual challenge part of the challenge . . . I'm recusing myself from having an opinion until I figure out how I could have gotten it right; all I can tell you for sure is this wasn't below 4/5 Difficulty. (Making all features' effects conditional on all other features' effects tends to make both Analytic and ML solutions much trickier.)
ETA: I now have an opinion, and my opinion is that it's good. The simple-in-hindsight underlying mechanics were converted seamlessly into complex and hard-but-fair-to-detangle feature effects; the flavortext managed to stay relevant without dominating the data. This scenario also fits in neatly alongside earlier entries with superficially similar premises: we've had "counters matter" games, "archetypes matter" games, and now a "feature engineering matters" game.
I have exactly one criticism, which is that it's a bit puzzlier than I'd have liked. Players get best results by psychoanalyzing the GM and exploiting symmetries in the dataset, even though these aren't skills which transfer to most real-world problems, and the real-world problems they do transfer to don't look like "who would win a fight?"; this could have been addressed by having class and race effects be slightly more arbitrary and less consistent, instead of having uniform +Strength / -Speed gaps for each step. However, my complaint is moderated by the facts that:
This is an isekai-world: simplified mechanics and uncannily well-balanced class systems come with the territory. (I thought the lack of magic-users was a tell for "this one will be realistic-ish" but that's on me tbh.)
Making the generation function any more complicated would have made it (marginally but nontrivially) less elegant and harder to explain.
I might just be being a sore loser only-barely-winner here.
Puzzles are fun!
Some belated Author's Notes:
This was heavily based on several interesting blog posts written by lsusr. All errors are mine.
I understand prediction markets just well enough to feel reasonably sure this story """makes""" """sense""" (modulo its absurd implicit and explicit premises), but not well enough to be confident I can explain anything in it any further without making a mistake or contradicting myself. Accordingly, I'm falling back on an "if you think you've found a plot hole, try to work it out on your own, and if you can't then I guess I actually did screw up lol" stance.
The fact that
neither of the protagonists ever considers the possibility of the Demon King also deriving strategic benefit from consulting an accurate and undistorted conditional prediction market
was an intended part of the narrative and I'm surprised no-one's brought it up yet.
I'm interested.
(I'd offer more feedback, but that's pretty difficult without an example to offer feedback on.)
I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.
Alternatively
I might just have screwed up my code somehow.
Still . . .
I'm sticking with my choices for now.
Update:
I tried fitting my ML model without access to speed variables other than sign(speed diff) and got slightly but non-negligibly worse metrics on an outsample. This suggests that sign(speed diff) tells you most of what you need to know about speed, but if you rely solely on it you're still missing useful and relevant information.
(. . . either that or my code has another error, I guess. Looking forward to finding out in seven days.)
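For concreteness, the single explanatory variable discussed above ("Strength diff plus 8 times sign(speed diff)") can be sketched as below. The argument names are hypothetical, and the inputs are assumed to be the derived Strength and Speed values for the two fighters rather than raw stats.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def combined_feature(strength_a, speed_a, strength_b, speed_b, speed_weight=8):
    """Strength difference plus a flat bonus for whichever fighter is faster."""
    return (strength_a - strength_b) + speed_weight * sign(speed_a - speed_b)
```

A model given only this value sees who is faster but not by how much, which is the information the sign-only experiment deliberately withholds.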
Regarding my strategic approach
I agree pick-characters-then-equipment has the limitation you describe - I'm still not sure about the B-vs-X matchup in particular - but I eyeballed some possible outcomes and they seem close enough to optimal that I'm not going to write any more code for this.
I put your solution into my ML model and it seems to think
That your A and C matchups are pretty good (though A could be made slightly better by benching Willow and letting Uzben do her job with the same gear), but B and D have <50% success odds.
However
I didn't do much hyperparameter tuning and I'm working with a new model type, so it might have more epicycles than warranted.
And
"My model says the solution my model says is best is better than another solution" isn't terribly reassuring.
. . . regardless, I'm sticking with my choices.
One last note:
I don't actually think there's a strict +4 speed benefit cutoff - if I did I'd reallocate the +1 Boots from Y to V - but I suspect there's some emergent property that kindasorta does the same thing in some high-level fights maybe.
Took an ML approach, got radically different results which I'm choosing to trust.
Fit a LightGBM model to the raw data, and to the data transformed by simon's stats-to-strength-and-speed model. Simon's version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to 'cheat' by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon's model: this invariably lowered performance, reassuring me that he got all the multipliers right.)
Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.
New strategy goes like this:
Against A, send U, with +3 Boots
Against B, send X, with +2 Boots and +1 Gauntlets
Against C, send V, with +3 Gauntlets
Against D, send Y, with +1 Boots and +2 Gauntlets
Notes:
The machines say this gives me ~2.6 expected victories but I'm selecting for things they liked so realistically I expect my EV somewhere in the 2-2.5 range.
If I was doing this IRL I'd move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.
My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed. But that's just post hoc confabulation.
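A minimal sketch of the "iterate all matchups, then all loadouts" search described above, assuming a fitted model is wrapped in a hypothetical `win_prob(fighter, gear, opponent)` function. The rule that a fighter carries at most one item of each kind is an assumption about the scenario, and `p_at_least_one` additionally assumes the fights are independent.

```python
from itertools import permutations, product

def best_plan(fighters, opponents, items, win_prob):
    """Exhaustively search lineups and loadouts for the highest expected win count.

    items is a list of (kind, bonus) tuples such as ("boots", 2).
    win_prob(fighter, gear, opponent) returns a probability; gear maps kind -> bonus.
    """
    best_ev, best_assignment = float("-inf"), None
    slots = range(len(opponents))
    for lineup in permutations(fighters, len(opponents)):
        # Each item goes to exactly one slot, or to None (left unused).
        for owners in product([None, *slots], repeat=len(items)):
            gear = [dict() for _ in slots]
            valid = True
            for (kind, bonus), owner in zip(items, owners):
                if owner is None:
                    continue
                if kind in gear[owner]:
                    valid = False  # assumed rule: one item of each kind per fighter
                    break
                gear[owner][kind] = bonus
            if not valid:
                continue
            probs = [win_prob(f, g, o) for f, g, o in zip(lineup, gear, opponents)]
            ev = sum(probs)  # expected wins = sum of per-fight win probabilities
            if ev > best_ev:
                best_ev = ev
                best_assignment = list(zip(opponents, lineup, gear, probs))
    return best_ev, best_assignment

def p_at_least_one(probs):
    """Chance of winning at least one fight, treating fights as independent."""
    miss = 1.0
    for p in probs:
        miss *= 1 - p
    return 1 - miss
```

Maximizing `sum(probs)` and maximizing `p_at_least_one(probs)` can favor different loadouts, which is the trade-off behind moving the Gauntlets from V to U. Because the search keeps whatever the model scores highest, the reported EV is also biased upward, as noted above.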