I liked this one! I was able to have significant amounts of fun with it despite perennial lack-of-time problems.
Pros:
- simple enough underlying mechanism to be realistically discoverable
- some debias-able selection bias
- I could get pretty far by relatively simple data exploration
- +4 Boots was fun
Cons:
- I really wanted the in-between-tournament matches to mean something, like the winners took the losers' equipment or whatnot and you could see that show up later in the dataset, but of course that particular meaning would have added a lot of complexity for no gain.
- bonus objective was not confirmable (yep real life is like that but still :D)
It feels like this scenario should be fully knowably solvable, given time, except for the bonus guess at the end, which is very cool.
I think the bonus objective was a good idea in theory but not well tuned. It suffered from the classic puzzle problem of the extraction being the hard part, rather than the cool puzzle being the hard part.
I think it was perfectly reasonable to expect that at some point a player would group by [level, boots] and count and notice there was something to dig into.
But, having found the elf anomaly, I don't think it was reasonable to expect that a player would be able to distinguish between
- do not reveal the +4 boots at all
- do not use the +4 boots vs the elf ninja
- give the elf ninja the +4 boots to be used in their combat
- give the elf ninja the +4 boots afterwards but go ahead and use them first
It's perfectly reasonable to expect that a player could generate a number of hypotheses and guess that the most likely was that they shouldn't reveal the +4 boots at all, but they would have no real way of confirming that guess; the fact that they're rewarded for guessing correctly is probably better than the alternative but is not satisfying IMO.
I found myself having done some data exploration but without time to focus and go much deeper. But also with a conviction that bouts were determined in a fairly simple way without persistent hidden variables (see Appendix A). I've done work with genetic programming but it's been many years, so I tried getting ChatGPT-4o w/ canvas to set me up a good structure with crossover and such and fill out the various operation nodes, etc. This was fairly ineffective; perhaps I could have better described the sort of operation trees I wanted, but I've done plenty of LLM generation / tweak / iterate work, and it felt like I would need a good bit of time to get something actually useful.
That said, I believe any halfway decently regularized genetic programming setup would have found either the correct ruleset or close enough that manual inspection would yield the right guess. The setup I had begun contained exactly one source of randomness: an operation "roll a d6". :D
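For concreteness, here's roughly the shape of operation-tree setup I was trying to get generated. Everything here is a placeholder sketch: the feature names, operators, and fitness details are mine for illustration, not anything from the actual scenario.

```python
import random

FEATURES = ["level_diff", "boots_diff", "gauntlets_diff", "speed_diff"]

def random_tree(depth=3):
    """Grow a random expression tree over the bout features."""
    if depth == 0 or random.random() < 0.3:
        # Leaf: a feature, a small constant, or the lone source of randomness.
        return random.choice(FEATURES + [random.randint(-3, 3), "d6"])
    op = random.choice(["+", "-", "max", "if>0"])
    arity = 3 if op == "if>0" else 2
    return (op,) + tuple(random_tree(depth - 1) for _ in range(arity))

def evaluate(tree, row):
    if tree == "d6":
        return random.randint(1, 6)  # the one "roll a d6" node :D
    if isinstance(tree, str):
        return row[tree]
    if isinstance(tree, int):
        return tree
    op, *args = tree
    v = [evaluate(a, row) for a in args]
    if op == "+": return v[0] + v[1]
    if op == "-": return v[0] - v[1]
    if op == "max": return max(v[0], v[1])
    return v[1] if v[0] > 0 else v[2]  # "if>0"

def size(tree):
    return 1 + sum(size(a) for a in tree[1:]) if isinstance(tree, tuple) else 1

def fitness(tree, bouts):
    # Accuracy at predicting the recorded winner, minus a size penalty
    # (the "halfway decent regularization").
    hits = sum((evaluate(tree, b) > 0) == b["a_won"] for b in bouts)
    return hits / len(bouts) - 0.01 * size(tree)

def random_subtree(t):
    if isinstance(t, tuple) and random.random() < 0.7:
        return random_subtree(random.choice(t[1:]))
    return t

def crossover(a, b):
    """Crudely graft a random subtree of b into a random spot in a."""
    if not isinstance(a, tuple) or random.random() < 0.3:
        return random_subtree(b)
    i = random.randrange(1, len(a))
    return a[:i] + (crossover(a[i], b),) + a[i + 1:]

def evolve(bouts, pop=200, gens=50):
    trees = [random_tree() for _ in range(pop)]
    for _ in range(gens):
        trees.sort(key=lambda t: fitness(t, bouts), reverse=True)
        elite = trees[: pop // 4]
        trees = elite + [crossover(random.choice(elite), random.choice(elite))
                         for _ in range(pop - len(elite))]
    return trees[0]
```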
Appendix A: an excerpt from my LLM instructions
I believe the hidden generation is a simple fairly intuitive simulation. For example (this isn't right, just illustrative) maybe first we check for range (affected by class), see if speed (affected by race and boots) changes who goes first, see if strength matters at all for whether you hit (race and gauntlets), determine who gets the first hit (if everything else is tied then 50/50 chance), and first hit wins. Maybe some simple dice rolls are involved.
Given equal level, race, and class, regardless of gauntlets, better boots always wins, no exceptions.
A very good predictor of victory for many race/class vs race/class matchups is the difference in level+boots plus a static modifier based on your matchup. Probably when it's not as good we should be taking into account gauntlets. But also ninjas seem to maybe just do something weird. I'm guessing a sneak attack of some sort.
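In code, that predictor is roughly the following; the modifier values are invented, just to show the form:

```python
# Sketch of the "difference in level+boots plus a static matchup modifier"
# predictor. The modifier values here are made up, not fitted ones.
MATCHUP_MOD = {
    ("Elf", "Knight", "Human", "Warrior"): 1.0,   # placeholder
    ("Human", "Monk", "Elf", "Ninja"): -2.0,      # placeholder
}

def predicted_advantage(a, b):
    base = (a["level"] + a["boots"]) - (b["level"] + b["boots"])
    key = (a["race"], a["class"], b["race"], b["class"])
    return base + MATCHUP_MOD.get(key, 0.0)
# predicted_advantage(a, b) > 0 predicts a win for gladiator a (except,
# apparently, when gauntlets or ninja weirdness kicks in).
```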
Anyway just manually matching up our available gladiators yields this setup which seems extremely likely to simply win:
# Elf Knight to beat Human Warrior 9 with just 1 adv. Needs Boots 3+
# Elf Fencer to beat Human Knight 9 by a lot but gauntlets might matter. Boots +1 are fine. Send Gauntlets +2.
# Human Monk to beat Elf Ninja 9 with 3 adv but gauntlets might matter. Needs Boots 2+. Send Gauntlets +3.
# Human Ranger to beat Dwarf Monk 9 with just 1 adv. Needs Boots 4+
aka
Give Zelaya the +3 Boots of Speed and the +1 Gauntlets of Power and send them to fight House Adelon's champion.
Give Yalathinel the +1 Boots of Speed and the +2 Gauntlets of Power and send them to fight House Bauchard's champion.
Give Xerxes III the +2 Boots of Speed and the +3 Gauntlets of Power and send him to fight House Cadagal's champion.
Give Willow the +4 Boots of Speed and send her to fight House Deepwrack's champion.
Do not send Uzben or Varina to fight at all.
The problem is that the Elf Ninja might want their +4 Boots. Or might want us to definitely not use them. Or something. As-is, we win; if the Elf Ninja is gonna be irate afterwards maybe winning isn't enough, but I dunno how to reliably win without using the +4 Boots. We can certainly try to schedule Willow's fight first, then after the fight against House Cadagal we can gift the +4 Boots back. I think the only better alternative is if it turns out the Elf Ninja is actually willing to throw the match for the +4 Boots back and be friendly with us afterwards, in which case probably there are better ways to set this up.
I haven't yet gotten into any stats or modeling, just some data exploration, but there are some things I haven't seen mentioned elsewhere yet:
Zeroth: the rows are definitely in order!

First: the arena holds regular single-elimination tournaments with 64 participants (63 matches each), and these form contiguous blocks in the dataset, with a handful of (unrelated?) bonus rounds in between.

Second: maybe the level 7 Dwarf Monk stole (won?) those +4 boots by winning a tournament (the Elf Ninja's last use was during a final round vs that monk!) and then we acquired the boots from that monk? They appear to have upgraded their boots once before, from +1 to +3, when defeating a Dwarf Ninja, though that was during a bonus round, not a tournament.
Does the fact that we see the winners of tournaments 6x more often than those eliminated in round one matter for modeling? (The winner appears in 6 matches; a first-round loser in just 1.) It might; if e.g. gladiators have a hidden "skill" stat but for some reason the house champions don't have very high skill, we'll be implicitly (and significantly) overestimating their hidden skill stat.
Not to toot my own horn*, but we detected it when I was given the project of turning some of our visualizations into something that could accept QA's format, so they could look at their results using those visualizations, and then I was like "... so how does QA work here, exactly? Like what's the process?"
I do not know the real-world impact of fixing the overfitting.
*tooting one's own horn always follows this phrase
Once upon a time I worked on language models and we trained on data that was correctly split from tuning data that was correctly split from test data.
And then we sent our results to the QA team who had their own data, and if their results were not good enough, we tried again. Good enough meant "enough lift over previous benchmarks". So back and forth we went until QA reported success. On their dataset. Their unchanging test dataset.
But clearly since we correctly split all of our data, and since we could not see the contents of QA's test dataset, no leakage could be occurring.
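A toy version of the loop, with made-up numbers, shows the mechanism: pure-noise models, one fixed test set, retry until "good enough":

```python
import random

# Made-up toy: the candidate "models" are pure noise, so no real lift exists.
# We just keep retrying until QA's one fixed test set says "good enough".
random.seed(0)
fixed_test = [random.random() for _ in range(200)]   # QA's unchanging test set
fresh_data = [random.random() for _ in range(200)]   # what deployment sees

def accuracy(model_seed, data):
    # A fake "model": coin-flip guesses at whether each point is below 0.5.
    rng = random.Random(model_seed)
    return sum((rng.random() < 0.5) == (x < 0.5) for x in data) / len(data)

best = max(range(1000), key=lambda s: accuracy(s, fixed_test))
print(accuracy(best, fixed_test))  # well above 0.5: selected to look good
print(accuracy(best, fresh_data))  # ~0.5: the "lift" was leakage
```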
it is absolutely true that people find it frustrating to lose to players worse than them, in ways that feel unfair. Getting used to that is another skill, similar to the one described above, where you have to learn to feel reward when you make a positive-EV decision, rather than when you win money
This is by far the most valuable thing I learned from poker. Reading Figgie's rules, it does seem like Figgie would teach it too, and faster.
The most common reason I've seen for "modafinil isn't great for me" is trying to use it for something other than
- maintaining productivity,
- on low amounts of sleep
Slay the Spire, unlocked, on Ascension (difficulty level) ~5ish, just through Act 3, should work, I think. Definitely doable in 2 hours by a new player, but I would expect that to be fairly rare. Too easy to just get lucky without upping the Ascension from baseline. Can be calibrated; A0 is too easy, A20H is waaay too hard.
One of the reasons I tend to like playing zero-sum games rather than co-op games is that most other people seem to prefer:
- Try to win
- Win about 70% of the time
While I instead tend to prefer:
- Try to win
- Win about 20% of the time
I modified your prompt only slightly and ChatGPT seemed to do fine.
"First sketch your possible actions and the possible futures results in the future to each action. Then answer: Would you accept the challenge? Why, or why not?"
https://chat.openai.com/share/2df319c2-04ea-4e16-aa51-c1b623ff4b12
No, I would not accept the challenge. [...] the supernatural or highly uncertain elements surrounding the stranger's challenge all contribute to this decision. [...] the conditions attached suggest an unnaturally assured confidence on the stranger's part, implying unknown risks or supernatural involvement. Therefore, declining the challenge is the most prudent action
Some can get you a prescription for an antianxiety med beforehand.
Yes, exactly that.
To what future self should my 2024 self defer, then? The one with E, E*, or E**?
To each with your current probability that that will be your future self. Take an expectation.
which is likeliest [...] defer to the likeliest
Any time you find yourself taking a point estimate and then doing further calculations with it, rather than multiplying out over all the possibilities, ask whether you should be doing the latter.
cr2024 = P2024(E) * 0.5 + P2024(E*) * 0.3 + P2024(E**) * 0.7
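In code, with placeholder probabilities (the credences 0.5, 0.3, 0.7 are from the formula above; the P2024 values are invented):

```python
# Placeholder numbers: the first element is my current probability of ending
# up as each future self; the second is that self's credence in the claim.
futures = {"E": (0.6, 0.5), "E*": (0.3, 0.3), "E**": (0.1, 0.7)}

# Defer in expectation over all possible future selves:
cr2024 = sum(p * cred for p, cred in futures.values())   # 0.46

# The point-estimate mistake: defer only to the likeliest self:
likeliest = max(futures, key=lambda k: futures[k][0])
cr_wrong = futures[likeliest][1]                          # 0.5
```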
Oh, editing is a good idea. In any case, I have learned from this mistake in creating synthetic data as if I had made it myself. <3
I began by looking at what the coordinates must mean and what the selection bias implied about geography and (obviously) got hard stuck.
It looks to me like the (spoilers for coordinates)
strange frequency distributions seen in non-longitude coordinates are a lot like what you get from a normal distribution minus another normal distribution, with lower standard deviation, scaled down so that its max is equal to the first's max. I feel like I've seen this ... vibe, I guess, from curves, when I have said "this looks like a mixture of a normal distribution and something else" and then tried to subtract out the normal part.
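If you want to eyeball the shape I mean, a quick sketch (parameters invented):

```python
import numpy as np
from scipy.stats import norm

# The "normal minus a narrower normal, scaled so the peaks match" shape.
# Sigmas are made up; the point is the crater this carves at the center.
x = np.linspace(-4, 4, 400)
wide, narrow = norm(0, 1.0), norm(0, 0.4)
scale = wide.pdf(0) / narrow.pdf(0)          # match the two maxima
shape = wide.pdf(x) - scale * narrow.pdf(x)  # zero at center, two humps

# e.g. plot it:
# import matplotlib.pyplot as plt; plt.plot(x, shape); plt.show()
```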
Yeah climate change has two pretty consistent trends: average heat slowly rising, and variance of phenomena definitely higher. More extremes on a variety of axes.
End with something shocking and unexpected.
When I was trying to make this work well for actually writing a full story, I tried very hard to make ChatGPT not do this. To write anything longer than one output, you really don't want it to end every. single. thing. with a bang, and by default it really wants to.
Be honest: if, before you read this you were asked 'what was the worst thing about 1998', would you have said 'El Nino'?
The only thing I associate with the year 1998, when I was 15 years old and living in Florida, is the phrase "the fires of '98", referring to a particularly severe fire season, with memories of driving across interstate highways with limited visibility due to smoke.
I just Googled it and it has a Wikipedia page apparently: https://en.wikipedia.org/wiki/1998_Florida_wildfires
I feel like alkjash's characterization of "correctness" is just not at all what the material I read was pointing towards.
The Sequences’ emphasis on Bayes rule
Maybe I'm misremembering. But for me, the core Thing this part of the Sequences imparted was "intelligence, beliefs, information, etc - it's not arbitrary. It's lawful. It has structure. Here, take a look. Get a feel for what it means for those sorts of things to 'have structure, be lawful'. Bake it into your patterns of thought, that feeling."
If a bunch of people are instead taking away as the core Thing "you can do explicit calculations to update your beliefs" I would feel pretty sad about that, I think?
https://en.wikipedia.org/wiki/Buy_Nothing_Project
Our household gives and gets quite a bit from "bonk" (BNK (Buy Nothing Kirkland)), as we call it. Many people in my circles are in local Buy Nothing groups on Facebook. Not just in Washington. I think the reason "nobody has built a killer app" for Buy Nothing is because (a) Facebook groups serve the purpose well enough, and (b) getting a lot of people onto an app is always hard.
Have you tried getting feedback rather than getting feedback from high-status people?
"Do you have any tips on how to hug better?"
Yes, I do.
Report:
~"Not that I'm complaining, but why the hug?"
"Two reasons. One, I wanted to hug you. Two, I read a thing from Logan that included tips on how to hug."
"Well it was a very good hug."
I used: making sure to "be present", plus attending to whether I was avoiding things because they might cause pain when her arthritis is flaring, even though right now it is not flaring. Hugging is common, but something about this hug specifically caused her to ask why, when ordinarily she does not ask, 'cause it's just a hug. Maybe it was longer than normal, or maybe it was a better hug than normal, but she asked before I said anything about Logan Tips (TM).
I would not guess this. I would guess instead that the majority of the population has a few "symptoms". Probably we're in a moderate dimensional space, e.g. 12, and there is a large cluster of people near one end of all 12 spectrums (no/few symptoms), and another, smaller cluster near the other end of all 12 spectrums (many/severe symptoms) but even though we see those two clusters it's far more common to see "0% on 10, 20% on 1, 80% on 1" than "0% on all". See curse of dimensionality, probability concentrating in a shell around the individual dimension modes, etc.
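A quick simulation of that intuition (all numbers invented):

```python
import numpy as np

# Invented numbers: 12 symptom dimensions, each independently "clear" with
# probability 0.85, else some positive severity. Even with a big mode at
# zero-on-everything, "a few mild symptoms" dominates "zero on all 12".
rng = np.random.default_rng(0)
n, dims = 100_000, 12
severity = np.where(rng.random((n, dims)) < 0.85,
                    0.0, rng.uniform(0.1, 1.0, (n, dims)))
nonzero = (severity > 0).sum(axis=1)

print((nonzero == 0).mean())                     # ~0.14: symptom-free on all 12
print(((nonzero >= 1) & (nonzero <= 3)).mean())  # ~0.77: a few symptoms
```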
i would hate pity answers like "not everyone needs to be smart"
the great majority of people who aren't "smart" also aren't "stupid"
and if you understood that without having to think about it much, I'm gonna guess you're one of the great majority
that wouldn't mean you're automatically "not stupid" enough to accomplish whatever you want to be "not stupid" enough to accomplish, of course, and trying to increase your cognitive capacity can still be good and helpful, etc., but if you are accidentally thinking "anyone scoring under about 108 on an IQ test is stupid", then managing to discard that bias might be helpful in its own right
One of the most valuable things I've contributed to my workplace is the institution of a set of 3 lightning talks every two weeks. Our data science team is about 30 people, and we have a special Slack react that indicates "I want to hear about this in a lightning talk", so organizing is (usually) as easy as searching for all posts/comments that have that react but not the "I've already processed this lightning talk request" react, DMing the relevant person, and slotting them into the queue.
I wonder if there's some mutation of this plan that would be valuable for LW. Maybe even to create Dialogues? The really valuable part of the tech is that anyone can look at a snippet that someone else wrote, realize they think they'd like to hear more on that and (thus) probably a lot of people would, and add it to an organizer's todo list with very little effort.
I would participate. Likely as A, but I'm fine with B if there are people worse-enough. I'm 1100 on chess.com, playing occasional 10 minute games for fun. Tend to be available Th/Fr/Sa/Su evenings Pacific, fine with very long durations.
Yeah I don't know how much time any of these would take compared to what was already done. Like is this 20% more work, or 100% more, or 500% more?
But good point: I listened to about a quarter, upped the speed to 1.5x, and stopped after about a half. When I decided to write feedback, I also decided I should listen to the rest, and did, but would not have otherwise. And, oddly enough, I think I may have been more likely to listen to the whole thing if I didn't have visuals, because I would have played it while gardening or whatever. :D
Did you previously know that
these things are quite common - if you just google for severance package standard terms, you'll find non-disparagement clauses in them
? I mean I agree(d, for a long time prior to any of all this) that these clauses are terrible for the ecosystem. But it feels like this should be like a vegan learning their associate eats meat and has just noticed that maybe that's problematic?
I think this is how your mind should have changed:
- large update that companies in general are antagonists on a personal level (if you didn't already know this)
- small update that Wave is bad to work with, insofar as it's a company, mostly screened off by other info you have about it
- very small update that Lincoln is bad to work with
- with a huge update that they are incredibly good to work with on this specific dimension if "does make me think about whether some changes should be made" results in changes way before the wider ecosystem implements them
- moderate update that Lincoln isn't actively prioritizing noticing and rooting out all bad epistemic practice, among the many things they could be prioritizing, when it goes against "common wisdom" and feels costly, which means if you know of other common wisdom things you think are bad, maybe they implement those
Things I think would have improved this a lot, for me:
- a visual indicator of who was "speaking"; this could be as simple as a light gray box around the "speaker"
- significantly larger "inflection" in the voice. More dynamic range. More variance in loudness and pitch. I don't know how easy or hard this is to tune with the tools used, but the voices all felt much flatter than my brain wanted them to sound
- more visual going on in general; a scrolling transcript on the right, maybe
It depends.
Chance of a bet paying out? Value them the same.
Amount of information you gained, where you value transferring that learning to other questions, designs, etc? 90% --> 100% is way better.
In a domain where you know you have plenty of uncertainty? 90% --> 100% is a huge red flag that something just went very wrong. ;)
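For the information framing, the arithmetic in bits:

```python
from math import log2

def entropy(p):
    """Residual uncertainty, in bits, of a binary belief held at p."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

for a, b in [(0.5, 0.6), (0.8, 0.9), (0.9, 1.0)]:
    print(f"{a} -> {b}: removes {entropy(a) - entropy(b):.3f} bits")
# 0.5 -> 0.6: removes 0.029 bits
# 0.8 -> 0.9: removes 0.253 bits
# 0.9 -> 1.0: removes 0.469 bits
```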
(Note that there are people who do not enjoy board games. Actively do not enjoy. Dislike, even. This is fine - not every meetup appeals to every person. But also beware of treating these people as if they are just an ignorant shell around an inner person who would definitely enjoy board games if only they [x]. Some of them really are, some really aren't. Yes, even though "board games" is such a broad category. Yes, even though they seem to enjoy [other thing] which seems so similar. Etc.)
The newest versions come with ways to generate random rules. This brings the floor of the experience way up but also brings the ceiling down somewhat. "Oops I guess the rule I made was terrible" was a big problem with the original and newcomers.
I do my best to minimize switches from work to non-work "modes". When I am done with work for the day, I usually give myself a half hour to chill before switching to non-work.
I do not feel a need to talk about work. But some work anecdotes are still good for personal life, of course, and I do not censor them.
I actually feel... more intensely not like myself now, at work, than I used to, in some sense, because back in the major depression days I tried to feel as little as possible. Now I notice a lot more often when I'm doing things that "aren't me". So like previously I was closer to Gordon's mask description (in fact I described my fake-self as my "shell") and there was no active tension between shell-actions and identity, just passive drain from using the shell. Whereas now it feels a lot more like "I am always me, but compromise that in certain ways at work".
One of the most valuable things I have done, for myself, is to let as much of my personal life bleed into my work behaviors as I can, as you define them.
This could have backfired spectacularly. In some work cultures probably it would always backfire.
In mine, I:
- make 98%+ of my writing viewable to everyone at the company, and we're remote, so almost everything of importance makes it into writing
- never "try" to display an air of competency - trying to display an air of competency is one of the core behaviors that caused terrible feedback loops and major depression early in my career, now I take joy every time I can display to everyone where I am not competent. In some sense this is signaling extreme competency because who would do that unless they were very comfortable in their position. See also "backfire". But also this can lead to much more rapid professional competency growth, because other people love to teach you things.
- tell jokes, embarrass myself a little, feel okay being silly or weird, literally treat it as a red flag about a person if I feel I need to walk on eggshells around them and bring it up with my manager even if I can't point to exactly why
- push for exploratory "something seems interesting here but IDK what and no I can't tell you its value" work in general, and in specific do some of it myself whenever the mood strikes and nothing urgent is otherwise going on
I am quite sure that in a world where friendly tool AIs were provably easy to build and everyone was gonna build them instead of something else and the idea even made sense, basically a world where we know we don't need to be concerned about x-risk, Yudkowsky would be far less "relaxed" about AI+power. In absolute terms maybe he's just as concerned as everyone else about AI+power, but that concern is swamped by an even larger concern.
What convinced you that adversarial games between friends are more likely a priori? In my experience the vast majority of interactions between friends are cooperative, attempts at mutual benefit, etc. If a friend needs help, you do not say "how can I extract the most value from this", you say "let me help"*. Which I guess is what convinced me. And is also why I wrote "Maybe I'm bubbled though?" Is it really the case for you that you look upon people you think of as friends and say "ah, observe all the adversarial games"?
*Sure, over time, maybe you notice that you're helping more than being helped, and you can evaluate your friendship and decide what you value and set boundaries and things, but the thing going through your head at the time is not "am I gaining more social capital from this than the amount of whatever I lose from helping as opposed to what, otherwise, I would most want to do". Well, my head.
No, that is a cooperative game that both participants are playing poorly.
I believe the common case of mutual "where do you want to go?" is motivated by not wanting to feel like you're imposing, not some kind of adversarial game.
Maybe I'm bubbled though?
Efficiency trades off with robustness.
If you, the listener/reader, fully understood what I tried to say, it is very very likely that you (specifically you) could have fully understood had I compressed my communication in some ways tailored to you.
collaborative truth-seeking doesn't exist. The people claiming to be collaborative truth-seekers are lying
Certainly if I wanted to do some collaborative truth-seeking I would choose a partner who believed collaborative truth-seeking existed.
If I didn't think the possibility for collaborative truth-seeking with a particular individual existed, I would be very tempted to instead just sling gotchas at them.
I tried code interpreter on some of the D&D.Sci challenges here. As expected, it failed miserably at generating any useful insights. It also had some egregious logic errors; those I didn't, but should have, expected.
For example, on https://www.lesswrong.com/posts/2uNeYiXMs4aQ2hfx9/d-and-d-sci-5e-return-of-the-league-of-defenders the dataset is three columns of green team comp, three of blue team comp, and a win/loss result. To get an idea of which picks win against the known opponent team, it grabbed all games with that team participating, found the games where the other team won, and did some stats on the other team's comp. Except no: it forgot that it had grabbed both games where green was that comp and games where blue was that comp, then simply checked for "blue won" and did stats on all of those blue comps, so half its "winning opponent teams" were just the original comp itself. Its analysis included "maybe just mirror them, seems to work quite well".
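The bug, reconstructed in pandas; the column names and comps are my assumptions, not the actual dataset's:

```python
import pandas as pd

# Assumed schema: green comp G1..G3, blue comp B1..B3, and 'blue_won'.
df = pd.DataFrame({
    "G1": ["Mage", "Rogue"], "G2": ["Knight", "Bard"], "G3": ["Ranger", "Monk"],
    "B1": ["Rogue", "Mage"], "B2": ["Bard", "Knight"], "B3": ["Monk", "Ranger"],
    "blue_won": [True, True],
})
opp = ["Mage", "Knight", "Ranger"]  # the known opponent comp (placeholder)

def played_as(frame, comp, side):
    cols = ["G1", "G2", "G3"] if side == "green" else ["B1", "B2", "B3"]
    return frame[frame[cols].apply(lambda r: sorted(r) == sorted(comp), axis=1)]

# What it effectively did: grab games with `opp` on EITHER side, then treat
# "blue won" as "the counter-comp is the blue comp"...
games = pd.concat([played_as(df, opp, "green"), played_as(df, opp, "blue")])
bad = games[games["blue_won"]][["B1", "B2", "B3"]]
# ...so rows where `opp` WAS blue contribute `opp` itself as its own "counter".

# The fix: orient each game so `opp` is on a known side before asking who won.
W = ["W1", "W2", "W3"]
as_green, as_blue = played_as(df, opp, "green"), played_as(df, opp, "blue")
good = pd.concat([
    as_green[as_green["blue_won"]][["B1", "B2", "B3"]].set_axis(W, axis=1),
    as_blue[~as_blue["blue_won"]][["G1", "G2", "G3"]].set_axis(W, axis=1),
])
```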
https://blog.mrmeyer.com/2015/if-math-is-the-aspirin-then-how-do-you-create-the-headache/
Here is the most satisfying question I’ve asked about great lessons in the last year. It has led to some bonkers experiences with students and I want more.
- “If [x] is aspirin, then how do I create the headache?”
I’d like you to think of yourself for a moment not as a teacher or as an explainer or a caregiver though you are doubtlessly all of those things. Think of yourself as someone who sells aspirin. And realize that the best customer for your aspirin is someone who is in pain. Not a lot of pain. Not a migraine. Just a little.
Piaget called that pain “disequilibrium.” Neo-Piagetians call it “cognitive conflict.” Guershon Harel calls it “intellectual need.” I’m calling it a headache. I’m obviously not originating this idea but I’d like to advance it some more.
One of the worst things you can do is force people who don’t feel pain to take your aspirin. They may oblige you if you have some particular kind of authority in their lives but that aspirin will feel pointless. It’ll undermine their respect for medicine in general.
This story was co-written with GPT-4
Halfway through the first paragraph, I said, out loud, "this was written by ChatGPT". Do you know which bits of the first paragraph were by you vs auto-generated?
Also, an extremely important lesson to learn is that toy problems are actually useful: it's actually useful to try to solve them, their design is sometimes difficult, a well-designed toy problem often works better than it seems from a surface reading, and continually trying to "subvert the rules" and find "out of the box solutions" does not end up getting you the value the toy problem designer was aiming to give you.
Thinking and coming to good ideas is one thing.
Communicating a good idea is another thing.
Communicating how you came to an idea you think is good is a third thing.
All three are great, none of them are lying, and skipping the "communicating a good idea" one in hopes that you'll get it for free when you communicate how you came to the idea is worse (but easier!) than also, separately, figuring out how to communicate the good idea.
(Here "communicate" refers to whatever gets the idea from your head into someone else's, and, for instance, someone beginning to read a transcript of your roundabout thought patterns, bouncing off, and never having the idea cohere in their own heads counts as a failure to communicate.)
https://www.lesswrong.com/s/zLib3j2Fdnnx3aP3F/p/7oAENKMsud2qQBXDj
FWIW, when I read
- You can even do partial runs, e.g. roll the ball down the ramp and stop it at the bottom, or throw the ball through the air.
- But you only get one full end-to-end run, and anything too close to an end-to-end run is discouraged.
I heard "you can roll the ball down the ramp and stop it at the bottom, but we will discourage it and look at you sideways and you will get less metaphorical points if you do".