TurnTrout's shortform feed
post by TurnTrout · 2019-06-30T18:56:49.775Z · score: 29 (6 votes) · 101 comments
My maternal grandfather was the scientist in my family. I was young enough that my brain hadn't decided to start doing its job yet, so my memories with him are scattered and inconsistent and hard to retrieve. But there's no way that I could forget all of the dumb jokes he made; how we'd play Scrabble and he'd (almost surely) pretend to lose to me; how, every time he got to see me, his eyes would light up with boyish joy.
My greatest regret took place in the summer of 2007. My family celebrated the first day of the school year at an all-you-can-eat buffet, delicious food stacked high as the eye could fathom under lights of green, red, and blue. After a particularly savory meal, we made to leave the surrounding mall. My grandfather asked me to walk with him.
I was a child who thought to avoid being seen too close to uncool adults. I wasn't thinking. I wasn't thinking about hearing the cracking sound of his skull against the ground. I wasn't thinking about turning to see his poorly congealed blood flowing from his forehead out onto the floor. I wasn't thinking I would nervously watch him bleed for long minutes while shielding my seven-year-old brother from the sight. I wasn't thinking that I should go visit him in the hospital, because that would be scary. I wasn't thinking he would die of a stroke the next day.
I wasn't thinking the last thing I would ever say to him would be "no[, I won't walk with you]".
Who could think about that? No, that was not a foreseeable mistake. Rather, I wasn't thinking about how precious and short my time with him was. I wasn't appreciating how fragile my loved ones are. I didn't realize that something as inconsequential as an unidentified ramp in a shopping mall was allowed to kill my grandfather.
I miss you, Joseph Matt.
My mother told me my memory was indeed faulty. He never asked me to walk with him; instead, he asked me to hug him during dinner. I said I'd hug him "tomorrow".
But I did, apparently, want to see him in the hospital; it was my mother and grandmother who decided I shouldn't see him in that state.
Gone, but never forgotten.
Thank you for sharing.
For quite some time, I've disliked wearing glasses. However, my eyes are sensitive, so I dismissed the possibility of contacts.
Over break, I realized I could still learn to use contacts, it would just take me longer. Sure enough, it took me an hour and five minutes to put in my first contact, and I couldn't get it out on my own. An hour of practice later, I put in a contact on my first try, and took it out a few seconds later. I'm very happily wearing contacts right now, as a matter of fact.
I'd suffered glasses for over fifteen years because of a cached decision – because I didn't think to rethink something literally right in front of my face every single day.
What cached decisions have you not reconsidered?
If you want to read Euclid's Elements, look at this absolutely gorgeous online rendition:
Nice! Thanks!
While reading Focusing today, I thought about the book and wondered how many exercises it would have. I felt a twinge of aversion. In keeping with my goal of increasing internal transparency, I said to myself: "I explicitly and consciously notice that I felt averse to some aspect of this book".
I then Focused on the aversion. Turns out, I felt a little bit disgusted, because a part of me reasoned thusly:
If the book does have exercises, it'll take more time. That means I'm spending reading time on things that aren't math textbooks. That means I'm slowing down.
(Transcription of a deeper Focusing on this reasoning)
I'm afraid of being slow. Part of it is surely the psychological remnants of the RSI I developed in the summer of 2018. That is, slowing down is now emotionally associated with disability and frustration. There was a period of meteoric progress as I started reading textbooks and doing great research, and then there was pain. That pain struck even when I was just trying to take care of myself, sleep, open doors. That pain then left me on the floor of my apartment, staring at the ceiling, desperately willing my hands to just get better. They didn't (for a long while), so I just lay there and cried. That was slow, and it hurt. No reviews, no posts, no typing, no coding. No writing, slow reading. That was slow, and it hurt.
Part of it used to be a sense of "I need to catch up and learn these other subjects which [Eliezer / Paul / Luke / Nate] already know". Through internal double crux, I've nearly eradicated this line of thinking, which is neither helpful nor relevant nor conducive to excitedly learning the beautiful settled science of humanity. Although my most recent post touched on impostor syndrome, that isn't really a thing for me. I feel reasonably secure in who I am, now (although part of me worries that others wrongly view me as an impostor?).
However, I mostly just want to feel fast, efficient, and swift again. I sometimes feel like I'm in a race with Alex, and I feel like I'm losing.
For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of things: shots, physical therapy, trigger-point therapy, acupuncture, massage tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got better. Sometimes it got worse.
No Beat Saber, no lifting weights, and every time I read a damn book I would start translating the punctuation into Dragon NaturallySpeaking syntax.
Text: "Consider a bijection $f: X \to Y$"
My mental narrator: "Cap consider a bijection space dollar foxtrot colon cap x backslash tango oscar cap y dollar"
Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling your computer "click" and waiting a few seconds while resisting the temptation to just grab the mouse? Dictating your way through a computer science PhD?
And then... and then, a month ago, I got fed up. What if it was all just in my head, at this point? I'm only 25. This is ridiculous. How can it possibly take me this long to heal such a minor injury?
I wanted my hands back; I wanted it real bad. I wanted it so bad that I did something dirty: I made myself believe something. Well, actually, I pretended to be a person who really, really believed his hands were fine and healing and the pain was all psychosomatic.
And... it worked, as far as I can tell. It totally worked. I haven't dictated in over three weeks. I play Beat Saber as much as I please. I type for hours and hours a day with only the faintest traces of discomfort.
What?
I'm glad it worked :) It's not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...
This is unlike anything I have heard!
It's very similar to what John Sarno (author of Healing Back Pain and The Mindbody Prescription) preaches, as well as Howard Schubiner. There's also a rationalist-adjacent dude who started a company (Axy Health) based on these principles. Fuck if I know how any of it works though, and it doesn't work for everyone. Congrats though TurnTrout!
It seems my dad might have a psychosomatic stomach ache. How can I convince him to convince himself that he has no problem?
If you want to try out the hypothesis, I recommend that he (or you, if he's not receptive to it) read Sarno's book. I want to reiterate that it does not work in every situation, but you're welcome to take a look.
Looks like a reverse stigmata effect.
Listening to Eneasz Brodski's excellent reading of Crystal Society, I noticed how curious I am about how AGI will end up working. How are we actually going to do it? What are those insights? I want to understand quite badly, which I didn't realize until experiencing this (so far) intelligently written story.
Similarly, how do we actually "align" agents, and what are good frames for thinking about that?
Here's to hoping we don't sate the former curiosity too early.
I passed a homeless man today. His face was wracked in pain, body rocking back and forth, eyes clenched shut. A dirty sign lay forgotten on the ground: "very hungry".
This man was once a child, with parents and friends and dreams and birthday parties and maybe siblings he'd get in arguments with and snow days he'd hope for.
And now he's just hurting.
And now I can't help him without abandoning others. So he's still hurting. Right now.
Reality is still allowed to make this happen. This is wrong. This has to change.
How would you help this man, if having to abandon others in order to do so were not a concern? (Let us assume that someone else—someone whose competence you fully trust, and who will do at least as good a job as you will—is going to take care of all the stuff you feel you need to do.)
What is it you had in mind to do for this fellow—specifically, now—that you can’t (due to those other obligations)?
Suppose I actually cared about this man with the intensity he deserved – imagine that he were my brother, father, or best friend.
The obvious first thing to do before interacting further is to buy him a good meal and a healthy helping of groceries. Then, I need to figure out his deal. Is he hurting, or is he also suffering from mental illness?
If the former, I'd go the more straightforward route of befriending him, helping him purchase a sharp business-professional outfit, teaching him to interview and present himself with confidence, and helping him secure an apartment and find a job.
If the latter, this gets trickier. I'd still try and befriend him (consistently being a source of cheerful conversation and delicious food would probably help), but he might not be willing or able to get the help he needs, and I wouldn't have the legal right to force him. My best bet might be to enlist the help of a psychological professional for these interactions. If this doesn't work, my first thought would be to influence the local government to get the broader problem fixed (I'd spend at least an hour considering other plans before proceeding further, here). Realistically, there's likely a lot of pressure in this direction already, so I'd need to find an angle from which few others are pushing or pulling where I can make a difference. I'd have to plot out the relevant political forces, study accounts of successful past lobbying, pinpoint the people I need on my side, and then target my influencing accordingly.
(All of this is without spending time looking at bird's-eye research and case studies of poverty reduction; assume counterfactually that I incorporate any obvious improvements to these plans, because I'd care about him and dedicate more than like 4 minutes of thought).
Well, a number of questions may be asked here (about desert, about causation, about autonomy, etc.). However, two seem relevant in particular:
First, it seems as if (in your latter scenario) you’ve arrived (tentatively, yes, but not at all unreasonably!) at a plan involving systemic change. As you say, there is quite a bit of effort being expended on this sort of thing already, so, at the margin, any effective efforts on your part would likely be both high-level and aimed in an at-least-somewhat-unusual direction.
… yet isn’t this what you’re already doing?
Second, and unrelatedly… you say:
Suppose I actually cared about this man with the intensity he deserved—imagine that he were my brother, father, or best friend.
Yet it seems to me that, empirically, most people do not expend the level of effort which you describe, even for their siblings, parents, or close friends. Which is to say that the level of emotional and practical investment you propose to make (in this hypothetical situation) is, actually, quite a bit greater than that which most people invest in their family members or close friends.
The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?
… yet isn’t this what you’re already doing?
I work on technical AI alignment, so some of those I help (in expectation) don't even exist yet. I don't view this as what I'd do if my top priority were helping this man.
The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?
That's a good question. I think the answer is yes, at least for my close family. Recently, I've expended substantial energy persuading my family to sign up for cryonics with me, winning over my mother, brother, and (I anticipate) my aunt. My father has lingering concerns which I think he wouldn't have upon sufficient reflection, so I've designed a similar plan for ensuring he makes what I perceive to be the correct, option-preserving choice. For example, I made significant targeted donations to effective charities on his behalf to offset (what he perceives as) a considerable drawback of cryonics: his inability to also be an organ donor.
A universe in which humanity wins but my dad is gone would be quite sad to me, and I'll take whatever steps necessary to minimize the chances of that.
I don't know how unusual this is. This reminds me of the relevant Harry-Quirrell exchange; most people seem beaten-down and hurt themselves, and I can imagine a world in which people are in better places and going to greater lengths for those they love. I don't know if this is actually what would make more people go to these lengths (just an immediate impression).
:(
Song I wrote about this once (not very polished)
Weak derivatives
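In calculus, the product rule says $\frac{d}{dx}\big(f(x)\,g(x)\big) = f'(x)\,g(x) + f(x)\,g'(x)$. The fundamental theorem of calculus says that the Riemann integral acts as the antiderivative.[1] Combining these two facts, we derive integration by parts:
$$\int_a^b f'(x)\,g(x)\,dx = \big[f(x)\,g(x)\big]_a^b - \int_a^b f(x)\,g'(x)\,dx.$$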
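As a quick numerical sanity check of the identity $\int_a^b f'g\,dx = [fg]_a^b - \int_a^b f g'\,dx$, here is a minimal Python sketch. The functions $f(x) = x^2$ and $g(x) = \sin x$, the interval $[0, 1]$, and the Simpson's-rule integrator are illustrative choices of mine, not anything from the post; the only point is that both sides agree.

```python
import math

def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h over [a, b] (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = h(a) + h(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3

# Illustrative choices (not from the post): f(x) = x^2 and g(x) = sin(x) on [0, 1].
def f(x):
    return x ** 2

def df(x):
    return 2 * x

g, dg = math.sin, math.cos
a, b = 0.0, 1.0

lhs = simpson(lambda x: df(x) * g(x), a, b)                               # ∫ f'g
rhs = (f(b) * g(b) - f(a) * g(a)) - simpson(lambda x: f(x) * dg(x), a, b)  # [fg] - ∫ fg'
print(lhs, rhs)  # both approximate 2 * (sin 1 - cos 1)
```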
It turns out that we can use these two properties to generalize the derivative to match some of our intuitions on edge cases. Let's think about the absolute value function:
(Figure: graph of the absolute value function; image from Wikipedia.)
The boring old normal derivative isn't defined at $x = 0$, but it seems like it'd make sense to be able to say that the derivative is e.g. 0. Why might this make sense?
Taylor's theorem (and its generalizations) characterizes first derivatives as tangent lines with slope $L := f'(x_0)$, which provide good local approximations of $f$ around $x_0$: $f(x) \approx f(x_0) + L\,(x - x_0)$. You can prove that this is the best approximation you can get using only $f(x_0)$ and $L$! In the absolute value example, defining the "derivative" to be zero at $x = 0$ would minimize approximation error on average in neighborhoods around the origin.
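To make "minimize approximation error on average" concrete, here is a small sketch under one assumed reading of "on average": the mean squared error of the linear approximation $|x| \approx |0| + Lx$ over a symmetric grid on $[-h, h]$. The grid, the value of $h$, and the candidate slopes are hypothetical choices for illustration.

```python
def mean_squared_error(slope, h=0.1, n=1000):
    """Average squared error of |x| ≈ |0| + slope * x over 2n + 1 evenly spaced
    points in the symmetric interval [-h, h] (an assumed notion of 'on average')."""
    xs = [h * i / n for i in range(-n, n + 1)]
    return sum((abs(x) - slope * x) ** 2 for x in xs) / len(xs)

candidate_slopes = [-1.0, -0.5, 0.0, 0.5, 1.0]
errors = {s: mean_squared_error(s) for s in candidate_slopes}
best = min(errors, key=errors.get)
print(best)  # 0.0
```

Because $x\,|x|$ is odd, its average over a symmetric interval vanishes, so the error equals $(1 + L^2)$ times the average of $x^2$, which is smallest at $L = 0$ for every symmetric neighborhood, not just this grid.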
In multivariable calculus, the Jacobian defines a tangent plane which again minimizes approximation error (with respect to the Euclidean distance, usually) in neighborhoods around the point. That is, having a first derivative means that the function can be locally approximated by a linear map. It's like a piece of paper that you glue onto the point in question.
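A rough numerical illustration of "locally approximated by a linear map": for an example map $\mathbb{R}^2 \to \mathbb{R}^2$ (my own choice, with a hand-computed Jacobian), the error of the linear approximation shrinks faster than the step size. This sketch only probes a single direction, whereas differentiability demands the same behavior uniformly over all directions.

```python
import math

def f(x, y):
    # Illustrative map from R^2 to R^2 (not from the post).
    return (x * x * y, math.sin(x) + y)

def jacobian(x, y):
    # Hand-computed partial derivatives of f; rows are outputs, columns are inputs.
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def linearization_error(x, y, dx, dy):
    """Euclidean norm of f(p + h) - (f(p) + J(p) h) for p = (x, y), h = (dx, dy)."""
    f1, f2 = f(x, y)
    (j11, j12), (j21, j22) = jacobian(x, y)
    g1, g2 = f(x + dx, y + dy)
    e1 = g1 - (f1 + j11 * dx + j12 * dy)
    e2 = g2 - (f2 + j21 * dx + j22 * dy)
    return math.hypot(e1, e2)

point = (1.0, 2.0)
direction = (0.6, 0.8)  # a unit vector
steps = [1e-1, 1e-2, 1e-3]
# Error divided by step length; differentiability means this ratio tends to 0.
ratios = [linearization_error(*point, t * direction[0], t * direction[1]) / t
          for t in steps]
print(ratios)
```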
This reasoning even generalizes to the infinite-dimensional case with functional derivatives (see my recent functional analysis textbook review). All of these cases are instances of the Fréchet derivative.
Complex analysis provides another perspective on why this might make sense, but I think you get the idea and I'll omit that for now.
We can define a weaker notion of differentiability which lets us do this – in fact, it lets us define the weak derivative to be anything at $x = 0$! Now that I've given some motivation, here's a great explanation of how weak derivatives arise from the criterion of "satisfy integration by parts for all relevant functions".
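Here is a minimal numerical sketch of that criterion for $|x|$, assuming the standard definition: $g$ is a weak derivative of $f$ if $\int f\,\varphi'\,dx = -\int g\,\varphi\,dx$ for every smooth, compactly supported test function $\varphi$. It checks only one test function (a shifted bump, my choice) rather than all of them, with $g = -1$ for $x < 0$ and $g = +1$ for $x > 0$. The integrals are split at $0$, so the value of $g$ at $x = 0$ never enters the computation, which is why it can be anything.

```python
import math

def bump(u):
    """Smooth test function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def bump_prime(u):
    """Derivative of bump: -2u / (1 - u^2)^2 * bump(u)."""
    return -2.0 * u / (1.0 - u * u) ** 2 * bump(u) if abs(u) < 1.0 else 0.0

SHIFT = 0.3  # hypothetical shift so the test function is not symmetric about 0

def phi(x):
    return bump(x - SHIFT)

def phi_prime(x):
    return bump_prime(x - SHIFT)

def simpson(h, a, b, n=2000):
    """Composite Simpson's rule (n even) for the integral of h over [a, b]."""
    step = (b - a) / n
    total = h(a) + h(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3

# phi vanishes outside (-0.7, 1.3), so [-2, 2] covers its support.
# Left side: ∫ |x| phi'(x) dx, split at the kink x = 0.
lhs = (simpson(lambda x: abs(x) * phi_prime(x), -2.0, 0.0)
       + simpson(lambda x: abs(x) * phi_prime(x), 0.0, 2.0))
# Right side: -∫ g(x) phi(x) dx with g = -1 on x < 0 and g = +1 on x > 0.
rhs = -(simpson(lambda x: -phi(x), -2.0, 0.0) + simpson(phi, 0.0, 2.0))
print(lhs, rhs)
```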
[1] As far as I can tell, the indefinite Riemann integral being the antiderivative means that it's the inverse of $\frac{d}{dx}$ in the group-theoretic sense – with respect to composition in the vector space of operators on real-valued functions. You might not expect this, because $\int$ maps an integrable function $f$ to a set of functions $\{F + C \mid C \in \mathbb{R}\}$. However, this doesn't mean that the inverse fails to be unique (as inverses must be), because the inverse lives in operator-space.
The reason the derivative at $x = 0$ is undefined for the absolute value function is that you need the limit of the difference quotient to be the same along all sequences converging to 0 – both from the left and from the right. There's a nice way to motivate this in higher-dimensional settings by thinking about the action of e.g. complex multiplication, but this is a much stronger notion than real differentiability, and I'm not quite sure how to think about motivating the single-valued real case yet. Of course, you can say things like "the theorems just work out nicer if you require both the lower and upper limits be the same"...
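The sequence criterion is easy to see numerically: along a sequence $h_k \to 0$ from the right, the difference quotients of $|x|$ at $0$ are all $+1$, and from the left they are all $-1$, so no single limit exists. A minimal sketch (the particular sequences $\pm 10^{-k}$ are an arbitrary choice):

```python
def difference_quotient(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

# Two sequences converging to 0: one from the right, one from the left.
right = [difference_quotient(abs, 0.0, 10.0 ** -k) for k in range(1, 6)]
left = [difference_quotient(abs, 0.0, -(10.0 ** -k)) for k in range(1, 6)]
print(right)  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(left)   # [-1.0, -1.0, -1.0, -1.0, -1.0]
```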
Good, original thinking feels present to me, as if mental resources are well-allocated.
The thought which prompted this:
Sure, if people are asked to solve a problem and say they can't after two seconds, yes, make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.
Reacting to a bit of HPMOR here, I noticed something felt off about Harry's reply to the Fred/George-tried-for-two-seconds thing. Having a bit of experience noticing confusion, I did not think "I notice I am confused" (although this can be useful). I did not think "Eliezer probably put thought into this", or "Harry is kinda dumb in certain ways, so what if he's a bit unfair here?". Without resurfacing, or distraction, or wondering if this train of thought is more fun than just reading further, I just thought about the object-level exchange.
People need to allocate mental energy wisely; this goes far beyond focusing on important tasks. Your existing mental skillsets already optimize and autopilot certain mental motions for you, so you should allocate less deliberation to them. In this case, the confusion-noticing module was honed; by not worrying about how well I noticed confusion, I was able to quickly have an original thought.
When thought processes derail or brainstorming sessions bear no fruit, inappropriate allocation may be to blame. For example, if you're anxious, you're interrupting the actual thoughts with "what-if"s.
To contrast, non-present thinking feels like a controller directing thoughts to go from here to there: do this, and then check that, come up for air over and over... Present thinking is a stream of uninterrupted strikes, the train of thought chugging along without self-consciousness. Moving, instead of thinking about moving while moving.
I don't know if I've nailed down the thing I'm trying to point at yet.
Sure, if people are asked to solve a problem and say they can't after two seconds, yes, make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.
Expanding on this, there is an aspect of Actually Trying that is probably missing from S1 precomputation. So, maybe the two-second "attempt" is actually useless for most people because subconscious deliberation isn't hard-ass enough at giving its all, at making desperate and extraordinary efforts to solve the problem.
From my Facebook
My life has gotten a lot more insane over the last two years. However, it's also gotten a lot more wonderful, and I want to take time to share how thankful I am for that.
Before, life felt like... a thing that you experience, where you score points and accolades and check boxes. It felt kinda fake, but parts of it were nice. I had this nice cozy little box that I lived in, a mental cage circumscribing my entire life. Today, I feel (much more) free.
I love how curious I've become, even about "unsophisticated" things. Near dusk, I walked the winter wonderland of Ogden, Utah with my aunt and uncle. I spotted this gorgeous red ornament hanging from a tree, with a hunk of snow stuck to it at northeast orientation. This snow had apparently decided to defy gravity. I just stopped and stared. I was so confused. I'd kinda guessed that the dry snow must induce a huge coefficient of static friction, hence the winter wonderland. But that didn't suffice to explain this. I bounded over and saw the smooth surface was iced, so maybe part of the snow melted in the midday sun, froze as evening advanced, and then the part-ice, part-snow chunk stuck much more solidly to the ornament.
Maybe that's right, and maybe not. The point is that two years ago, I'd have thought this was just "how the world worked", and it was up to physicists to understand the details. Whatever, right? But now, I'm this starry-eyed kid in a secret shop full of wonderful secrets. Some secrets are already understood by some people, but not by me. A few secrets I am the first to understand. Some secrets remain unknown to all. All of the secrets are enticing.
My life isn't always like this; some days are a bit gray and draining. But many days aren't, and I'm so happy about that.
Socially, I feel more fascinated by people in general, more eager to hear what's going on in their lives, more curious what it feels like to be them that day. In particular, I've fallen in love with the rationalist and effective altruist communities, which was totally a thing I didn't even know I desperately wanted until I already had it in my life! There are so many kind, smart, and caring people, inside many of whom burns a similarly intense drive to make the future nice, no matter what. Even though I'm estranged from the physical community much of the year, I feel less alone: there's a home for me somewhere.
Professionally, I'm working on AI alignment, which I think is crucial for making the future nice. Two years ago, I felt pretty sidelined: I hadn't met the bars I thought I needed to meet in order to do Important Things, so I just planned for a nice, quiet, responsible, normal life, doing little kindnesses. Surely the writers of the universe's script would make sure things turned out OK, right?
I feel in the game now. The game can be daunting, but it's also thrilling. It can be scary, but it's important. It's something we need to play, and win. I feel that viscerally. I'm fighting for something important, with every intention of winning.
I really wish I had the time to hear from each and every one of you. But I can't, so I do what I can: I wish you a very happy Thanksgiving. :)
Yesterday, I put the finishing touches on my chef d'œuvre, a series of important safety-relevant proofs I've been striving for since early June. Strangely, I felt a great exhaustion come over me. These proofs had been my obsession for so long, and now... now, I'm done.
I've had this feeling before; three years ago, I studied fervently for a Google interview. The literal moment the interview concluded, a fever overtook me. I was sick for days. All the stress and expectation and readiness-to-fight which had been pent up, released.
I don't know why this happens. But right now, I'm still a little tired, even after getting a good night's sleep.
Suppose you could choose how much time to spend at your local library, during which:
- you do not age. Time stands still outside; no one enters or exits the library (which is otherwise devoid of people).
- you don't need to sleep/eat/get sunlight/etc.
- you can use any computers, but not access the internet or otherwise bring in materials with you
- you can't leave before the requested time is up
Suppose you don't go crazy from solitary confinement, etc. Remember that value drift is a potential thing.
How long would you ask for?
How good are the computers?
Windows machines circa ~2013. Let’s say 128GB hard drives which magically never fail, for 10 PCs.
Probably 35 years then. I'd use it to get a stronger foundation in low-level programming skills, math and physics. The limiting factors would be entertainment in the library to keep me sane and the inevitable degradation of my social skills from spending so much time alone.
Judgment in Managerial Decision Making says that (subconscious) misapplication of e.g. the representativeness heuristic causes insensitivity to base rates and to sample size, failure to reason about probabilities correctly, failure to consider regression to the mean, and the conjunction fallacy. My model of this is that representativeness / availability / confirmation bias work off of a mechanism somewhat similar to attention in neural networks: due to how the brain performs time-limited search, more salient/recent memories get prioritized for recall.
The availability heuristic goes wrong when our saliency-weighted perceptions of the frequency of events are a biased estimator of the real frequency, or maybe when we just happen to be extrapolating off of a very small sample size. Concepts get inappropriately activated in our mind, and we therefore reason incorrectly. Attention also explains anchoring: you can more readily bring to mind things related to your anchor due to salience.
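A toy calculation of the "saliency-weighted frequency" idea, under the hypothetical assumption that each memory is recalled with probability proportional to its salience. The event types, counts, and salience weights are invented for illustration; the point is only that the recalled share is a biased estimate of the true share whenever salience differs across event types.

```python
def true_shares(counts):
    total = sum(counts.values())
    return {event: c / total for event, c in counts.items()}

def recalled_shares(counts, salience):
    """Expected share of each event type among recalled memories, assuming
    (hypothetically) that recall probability is proportional to salience."""
    weighted = {event: counts[event] * salience[event] for event in counts}
    total = sum(weighted.values())
    return {event: w / total for event, w in weighted.items()}

# Invented numbers: one event type is rare but vivid, the other common but mundane.
counts = {"car crash": 990, "plane crash": 10}
salience = {"car crash": 1.0, "plane crash": 20.0}

truth = true_shares(counts)
recalled = recalled_shares(counts, salience)
print(truth)
print(recalled)  # plane crashes look far more common than they are
```

If salience were equal across event types, the two distributions would coincide; all of the bias in this toy model comes from the unequal weights.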
The case for confirmation bias seems to be a little more involved: first, we had evolutionary pressure to win arguments, which means our search is meant to find supportive arguments and avoid even subconsciously signalling that we are aware of the existence of counterarguments. This means that those supportive arguments feel salient, and we (perhaps by "design") get to feel unbiased: we aren't consciously discarding evidence, we're just following our normal search/reasoning process! This is what our search algorithm feels like from the inside.
This reasoning feels clicky, but I'm just treating it as an interesting perspective for now.
With respect to the integers, 2 is prime. But with respect to the Gaussian integers, it's not: it has factorization $2 = (1 + i)(1 - i)$. Here's what's happening.
You can view complex multiplication as scaling and rotating the complex plane. So, when we take our unit vector 1 and multiply by $1 + i$, we're scaling it by $\sqrt{2}$ and rotating it counterclockwise by $45^\circ$:
This gets us to the purple vector. Now, we multiply by $1 - i$, scaling it up by $\sqrt{2}$ again (in green) and rotating it clockwise by the same $45^\circ$. You can even deal with the scaling and rotations separately (scale twice by $\sqrt{2}$, for a total factor of 2, with zero net rotation).
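Python's built-in complex numbers can check the rotation-and-scaling story: the product is exactly $2$, each factor has modulus $\sqrt{2}$, and the angles are $+45^\circ$ and $-45^\circ$. The `norm` helper is my own addition, using the Gaussian-integer norm $N(x + yi) = x^2 + y^2$. Since $N$ is multiplicative and neither factor has norm $1$ (the norm of the units $\pm 1, \pm i$), neither factor is a unit, so the factorization is nontrivial.

```python
import cmath
import math

a = 1 + 1j  # the Gaussian integer 1 + i
b = 1 - 1j  # the Gaussian integer 1 - i

print(a * b)  # (2+0j)

# Polar form: modulus (scaling factor) and angle (rotation) of each factor.
for z in (a, b):
    r, theta = cmath.polar(z)
    print(f"{z}: modulus {r:.4f}, angle {math.degrees(theta):.1f} degrees")
# (1+1j): modulus 1.4142, angle 45.0 degrees
# (1-1j): modulus 1.4142, angle -45.0 degrees

def norm(z):
    """Gaussian-integer norm N(x + yi) = x^2 + y^2; multiplicative: N(zw) = N(z)N(w)."""
    return int(z.real) ** 2 + int(z.imag) ** 2

print(norm(2), norm(a), norm(b))  # 4 2 2
```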
I feel very excited by the AI alignment discussion group I'm running at Oregon State University. Three weeks ago, most attendees didn't know much about "AI security mindset"-ish considerations. This week, I asked the question "what, if anything, could go wrong with a superhuman reward maximizer which is rewarded for pictures of smiling people? Don't just fit a bad story to the reward function. Think carefully."
There was some discussion and initial optimism, after which someone said "wait, those optimistic solutions are just the ones you'd prioritize! What's that called, again?" (It's called anthropomorphic optimism)
I'm so proud.
An exercise in the companion workbook to the Feynman Lectures on Physics asked me to compute a rather arduous numerical simulation. At first, this seemed like a "pass" in favor of an exercise more amenable to analytic and conceptual analysis; arithmetic really bores me. Then, I realized I was being dumb: I'm a computer scientist.
Suddenly, this exercise became very cool, as I quickly figured out the equations and code, crunched the numbers in an instant, and churned out a nice scatterplot. This seems like a case where cross-domain competence is unusually helpful (although it's not like I had to bust out any esoteric theoretical CS knowledge). I'm wondering whether this kind of thing will compound as I learn more and more areas; whether previously arduous or difficult exercises become easy when attacked with well-honed tools and frames from other disciplines.
Broca’s area handles syntax, while Wernicke’s area handles the semantic side of language processing. Subjects with damage to the latter can speak in syntactically fluent, jargon-filled sentences (fluent aphasia) – and they can’t tell that their utterances don’t make sense, because they can’t make sense of the words leaving their own mouths!
It seems like GPT-2 : Broca’s area :: ??? : Wernicke’s area. Are there any cog psych/AI theories on this?
Cool Math Concept You Never Realized You Wanted: Fréchet distance.
Imagine a man traversing a finite curved path while walking his dog on a leash, with the dog traversing a separate one. Each can vary their speed to keep slack in the leash, but neither can move backwards. The Fréchet distance between the two curves is the length of the shortest leash sufficient for both to traverse their separate paths. Note that the definition is symmetric with respect to the two curves—the Frechet distance would be the same if the dog was walking its owner.
The Fréchet distance between two concentric circles of radius $r_1$ and $r_2$ respectively is $|r_1 - r_2|$. The longest leash is required when the owner stands still and the dog travels to the opposite side of the circle ($r_1 + r_2$), and the shortest leash when both owner and dog walk at a constant angular velocity around the circle ($|r_1 - r_2|$).
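The continuous Fréchet distance is awkward to compute directly, but its discrete variant, which only couples sample points of the two curves, has a simple dynamic program (Eiter and Mannila, 1994) and approximates the continuous value as the sampling gets denser. Here is a minimal Python sketch of that discrete variant (a swapped-in relative of the definition above, not the continuous definition itself), checked against the concentric-circles example with radii 1 and 3.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q (lists of (x, y)).

    ca[i][j] is the shortest leash that lets the owner reach p[i] and the dog
    reach q[j], with each of them only ever stepping forward (or waiting).
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

def circle(radius, samples=360):
    """Sample points on a circle of the given radius, from angle 0 back to angle 0."""
    return [(radius * math.cos(2 * math.pi * k / samples),
             radius * math.sin(2 * math.pi * k / samples))
            for k in range(samples + 1)]

inner, outer = circle(1.0), circle(3.0)
print(discrete_frechet(inner, outer))  # close to |r1 - r2| = 2
```

Each step of the recurrence advances the owner, the dog, or both, which mirrors the "neither can move backwards" rule; keeping the running maximum along the best path is what makes the result a leash length.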
Earlier today, I became curious why extrinsic motivation tends to preclude or decrease intrinsic motivation. This phenomenon is known as overjustification. There are likely agreed-upon theories for this, but here's some stream-of-consciousness as I reason and read through summarized experimental results. (ETA: Looks like there isn't consensus on why this happens)
My first hypothesis was that recognizing external rewards somehow precludes activation of curiosity circuits in our brain. I'm imagining a kid engrossed in a puzzle. Then, they're told that they'll be given $10 upon completion. I'm predicting that the kid won't become significantly less engaged, which surprises me?
third graders who were rewarded with a book showed more reading behaviour in the future, implying that some rewards do not undermine intrinsic motivation.
Might this be because the reward for reading is more reading, which doesn't undermine the intrinsic interest in reading? You aren't looking forward to escaping the task, after all.
While the provision of extrinsic rewards might reduce the desirability of an activity, the use of extrinsic constraints, such as the threat of punishment, against performing an activity has actually been found to increase one's intrinsic interest in that activity. In one study, when children were given mild threats against playing with an attractive toy, it was found that the threat actually served to increase the child's interest in the toy, which was previously undesirable to the child in the absence of threat.
A few experimental summaries:
1. Researchers at Southern Methodist University conducted an experiment on 188 female university students in which they measured the subjects' continued interest in a cognitive task (a word game) after their initial performance under different incentives.
The subjects were divided into two groups. Members of the first group were told that they would be rewarded for competence. Above-average players would be paid more and below-average players would be paid less. Members of the second group were told that they would be rewarded only for completion. Their pay was scaled by the number of repetitions or the number of hours playing. Afterwards, half of the subjects in each group were told that they overperformed, and the other half were told that they underperformed, regardless of how well each subject actually did.
Members of the first group generally showed greater interest in the game and continued playing for a longer time than the members of the second group. "Overperformers" continued playing longer than "underperformers" in the first group, but "underperformers" continued playing longer than "overperformers" in the second group. This study showed that, when rewards do not reflect competence, higher rewards lead to less intrinsic motivation. But when rewards do reflect competence, higher rewards lead to greater intrinsic motivation.
2. Richard Titmuss suggested that paying for blood donations might reduce the supply of blood donors. To test this, a field experiment with three treatments was conducted. In the first treatment, the donors did not receive compensation. In the second treatment, the donors received a small payment. In the third treatment, donors were given a choice between the payment and an equivalent-valued contribution to charity. None of the three treatments affected the number of male donors, but the second treatment almost halved the number of female donors. However, allowing the contribution to charity fully eliminated this effect.
From a glance at the Wikipedia page, it seems like there's not really expert consensus on why this happens. However, according to self-perception theory,
a person infers causes about his or her own behavior based on external constraints. The presence of a strong constraint (such as a reward) would lead a person to conclude that he or she is performing the behavior solely for the reward, which shifts the person's motivation from intrinsic to extrinsic.
This lines up with my understanding of self-consistency effects.
Virtue ethics seems like model-free consequentialism to me.
I was thinking along similar lines!
From my notes from 2019-11-24: "Deontology is like the learned policy of bounded rationality of consequentialism"
The new "Broader Impact" NeurIPS statement is a good step, but incentives are misaligned. Admitting fatally negative impact would set a researcher back in their career, as the paper would be rejected.
Idea: Consider a dangerous paper which would otherwise have been published. What if that paper were published title-only on the NeurIPS website, so that the researchers can still get career capital?
Problem: How do you ensure resubmission doesn't occur elsewhere?
The people at NeurIPS who reviewed the paper might notice if resubmission occurred elsewhere? Automated tools might help with this, by searching for specific phrases.
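A minimal sketch of the phrase-matching idea, assuming the reviewers keep the full text of the withheld paper: compare overlapping five-word "shingles" between two texts with Jaccard similarity. The function names and the 0.3 threshold are hypothetical choices for illustration, not an existing NeurIPS tool.

```python
import re


def shingles(text: str, n: int = 5) -> set:
    """Set of overlapping n-word sequences ("shingles") in lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_resubmission(withheld: str, candidate: str,
                        threshold: float = 0.3, n: int = 5) -> bool:
    """Flag a candidate paper that shares many word n-grams with a withheld one."""
    return jaccard(shingles(withheld, n), shingles(candidate, n)) >= threshold
```

A determined author could paraphrase around a phrase-level check like this, so at best it would catch lazy resubmissions.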
There's been talk of having a Journal of Infohazards. Seems like an idea worth exploring to me. Your suggestion sounds like a much more feasible first step.
Problem: Any entity with halfway decent hacking skills (such as a national government, or clever criminal) would be able to peruse the list of infohazardy titles, look up the authors, cyberstalk them, and then hack into their personal computer and steal the files. We could hope that people would take precautions against this, but I'm not very optimistic. That said, this still seems better than the status quo.
Sentences spoken aloud are a latent space embedding of our thoughts; when trying to move a thought from our mind to another's, our thoughts are encoded with the aim of minimizing the other person's decoder error.
Going through an intro chem textbook, it immediately strikes me how this should be as appealing and mysterious as the alchemical magic system of Fullmetal Alchemist: "the law of equivalent exchange", "conservation of energy/elements/mass (the last two holding only for normal chemical reactions)", etc. If only it were natural to take joy in the merely real...
Have you been continuing your self-study schemes into realms beyond math stuff? If so I'm interested in both the motivation and how it's going! I remember having little interest in other non-physics science growing up, but that was also before I got good at learning things and my enjoyment was based on how well it was presented.
Yeah, I've read a lot of books since my reviews fell off last year, most of them still math. I wasn't able to type reliably until early this summer, so my reviews kinda got derailed. I've read Visual Group Theory, Understanding Machine Learning, Computational Complexity: A Conceptual Perspective, Introduction to the Theory of Computation, An Illustrated Theory of Numbers, most of Tadelis' Game Theory, the beginning of Multiagent Systems, parts of several graph theory textbooks, and I'm going through Munkres' Topology right now. I've gotten through the first fifth of the first volume of the Feynman Lectures, which has given me an unbelievable amount of mileage for generally reasoning about physics.
I want to go back to my reviews, but I just have a lot of other stuff going on right now. Also, I run into fewer basic confusions than when I was just starting at math, so I generally have less to talk about. I guess I could instead try and represent the coolest concepts from the book.
My "plan" is to keep learning math until the low graduate level (I still need to at least do complex analysis, topology, field / ring theory, ODEs/PDEs, and something to shore up my atrocious trig skills, and probably more)^{[1]}, and then branch off into physics + a "softer" science (anything from microecon to psychology). CS ("done") > math > physics > chem > bio is the major track for the physical sciences I have in mind, but that might change. I dunno, there's just a lot of stuff I still want to learn. :)
I also still want to learn Bayes nets, category theory, get a much deeper understanding of probability theory, provability logic, and decision theory. ↩︎
We can think about how consumers respond to changes in price by considering the elasticity of the quantity demanded at a given price: how quickly does demand decrease as we raise prices? Price elasticity of demand is defined as $\frac{\%\Delta Q}{\%\Delta P}$; in other words, for price $P$ and quantity $Q$, this is $\frac{\Delta Q / Q}{\Delta P / P}$ (this looks kinda weird, and it wasn't immediately obvious what's happening here...). Revenue is the total amount of cash changing hands: $R = PQ$.
What's happening here is that raising prices is a good idea when the revenue gained (the "price effect") outweighs the revenue lost to falling demand (the "quantity effect"). A lot of words so far for an easy concept:
If the magnitude of price elasticity is greater than 1, demand is elastic and price hikes decrease revenue (and you should probably have a sale). However, if it's less than 1, demand is inelastic and boosting the price increases revenue: demand isn't dropping off quickly enough to drag down the revenue. You can just look at the area of the revenue rectangle for each effect!
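A quick numerical check of the revenue rule, assuming a hypothetical linear demand curve Q = 100 - 2P. Since quantity falls as price rises, the elasticity comes out negative, so the comparison is on its magnitude.

```python
def quantity(price: float, a: float = 100.0, b: float = 2.0) -> float:
    """Hypothetical linear demand curve Q(P) = a - b*P, clipped at zero."""
    return max(a - b * price, 0.0)


def elasticity(price: float, a: float = 100.0, b: float = 2.0) -> float:
    """Point elasticity (dQ/dP) * (P / Q); only defined while Q > 0 (here P < 50)."""
    return -b * price / quantity(price, a, b)


def revenue(price: float, a: float = 100.0, b: float = 2.0) -> float:
    """Revenue R = P * Q: the area of the price-quantity rectangle."""
    return price * quantity(price, a, b)


for p in (10.0, 40.0):
    change = revenue(p + 1) - revenue(p)
    print(f"P={p:.0f}: |E|={abs(elasticity(p)):.2f}, revenue change from +$1: {change:+.1f}")
# P=10: |E|=0.25, revenue change from +$1: +58.0   (inelastic: raising price helps)
# P=40: |E|=4.00, revenue change from +$1: -62.0   (elastic: raising price hurts)
```

For this curve, revenue peaks at P = 25, exactly where the elasticity is -1.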
How does representation interact with consciousness? Suppose you're reasoning about the universe via a partially observable Markov decision process, and that your model is incredibly detailed and accurate. Further suppose you represent states only as numbers, by their numeric labels.
To get a handle on what I mean, consider the game of Pac-Man, which can be represented as a finite, deterministic, fully observable MDP. Think about all possible game screens you can observe, and number them. Now get rid of the game screens. From the perspective of reinforcement learning, you haven't lost anything: all policies yield the same return they did before, and the transitions/rules of the game haven't changed. In fact, there's a pretty strong isomorphism I can show between these two MDPs. All you've done is change the labels; representation means practically nothing to the mathematical object of the MDP, although many algorithms (deep RL ones, for example) should be able to exploit regularities in the representation to reduce sample complexity.
So what does this mean? If you model the world as a partially observable MDP whose states are single numbers... can you still commit mind crime via your deliberations? Is the structure of the POMDP in your head somehow sufficient for consciousness to be accounted for (like how the theorems of complexity theory govern computers both of flesh and of silicon)? I'm confused.
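A minimal sketch of the relabeling argument, using a hypothetical four-state toy MDP in place of Pac-Man (the state names, rewards, and discount of 0.9 are made up). Value iteration returns the same values whether the states are descriptive "screens" or arbitrary integers.

```python
import random

GAMMA = 0.9

# Toy deterministic MDP with descriptive state labels.
# screens[state][action] = (next_state, reward)
screens = {
    "start":    {"left": ("corridor", 0.0), "right": ("ghost", -1.0)},
    "corridor": {"left": ("pellet", 1.0),   "right": ("start", 0.0)},
    "pellet":   {"left": ("pellet", 0.0),   "right": ("corridor", 0.0)},
    "ghost":    {"left": ("ghost", 0.0),    "right": ("ghost", 0.0)},
}


def value_iteration(mdp, gamma=GAMMA, iters=500):
    """Optimal state values via repeated Bellman optimality backups."""
    v = {s: 0.0 for s in mdp}
    for _ in range(iters):
        v = {s: max(r + gamma * v[s2] for s2, r in mdp[s].values()) for s in mdp}
    return v


def relabel(mdp, seed=0):
    """Replace every state label with an arbitrary integer; keep the dynamics."""
    states = list(mdp)
    ids = list(range(len(states)))
    random.Random(seed).shuffle(ids)
    name = dict(zip(states, ids))
    numeric = {name[s]: {a: (name[s2], r) for a, (s2, r) in acts.items()}
               for s, acts in mdp.items()}
    return numeric, name


numeric, name = relabel(screens)
v_screens = value_iteration(screens)
v_numbers = value_iteration(numeric)
# Same values, state for state, under the bijection `name`.
assert all(abs(v_screens[s] - v_numbers[name[s]]) < 1e-12 for s in screens)
```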
I think a reasonable and related question we don't have a solid answer for is if humans are already capable of mind crime.
For example, maybe Alice is mad at Bob and imagines causing harm to Bob. How well does Alice have to model Bob for her imaginings to be mind crime? If Alice has low cognitive empathy is it not mind crime but if her cognitive empathy is above some level is it then mind crime?
I think we're currently confused enough about what mind crime is such that it's hard to even begin to know how we could answer these questions based on more than gut feelings.
I suspect that it doesn't matter how accurate or straightforward a predictor is in modeling people. What would make prediction morally irrelevant is that it's not noticed by the predicted people, irrespective of whether this happens because it spreads the moral weight conferred to them over many possibilities (giving inaccurate prediction), keeps the representation sufficiently baroque, or for some other reason. In the case of inaccurate prediction or baroque representation, it probably does become harder for the predicted people to notice being predicted, and I think this is the actual source of moral irrelevance, not those things on their own. A more direct way of getting the same result is to predict counterfactuals where the people you reason about don't notice the fact that you are observing them, which also gives a form of inaccuracy (imagine that your predicting them is part of their prior, that'll drive the counterfactual further from reality).
I seem to differently discount different parts of what I want. For example, I'm somewhat willing to postpone fun to low-probability, high-fun futures, whereas I'm not willing to do the same with romance.
Idea: learn by making conjectures (math, physical, etc) and then testing them / proving them, based on what I've already learned from a textbook.
Learning seems easier and faster when I'm curious about one of my own ideas.
For what it's worth, this is very true for me as well.
I'm also reminded of a story of Robin Hanson from Cryonics magazine:
Robin’s attraction to the more abstract ideas supporting various fields of interest was similarly shown in his approach – or rather, lack thereof – to homework. “In the last two years of college, I simply stopped doing my homework, and started playing with the concepts. I could ace all the exams, but I got a zero on the homework… Someone got scatter plots up there to convince people that you could do better on exams if you did homework.” But there was an outlier on that plot, courtesy of Robin, that said otherwise.
How do you estimate how hard your invented problems are?
AFAICT, the deadweight loss triangle from e.g. price ceilings is just a lower bound on lost surplus. Inefficient allocation to consumers means that people who value the good less than the market equilibrium price can buy it, while the DWL triangle optimistically assumes that the consumers with the highest willingness to pay will eat up the limited supply.
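A toy simulation of this point. The market is hypothetical, with one buyer valuing the good at each of $1..$100 and one seller with each marginal cost $1..$100, plus a $30 ceiling. It compares the textbook triangle, which assumes efficient rationing, with one less optimistic assumption: the limited units go to a random subset of the buyers willing to pay the ceiling.

```python
import random


def total_surplus(buyer_values, seller_costs):
    """Gains from trade: sum of buyers' valuations minus sum of sellers' costs."""
    return sum(buyer_values) - sum(seller_costs)


values = list(range(1, 101))  # hypothetical buyer valuations, $1..$100
costs = list(range(1, 101))   # hypothetical seller marginal costs, $1..$100

# Competitive equilibrium: highest-value buyers trade with lowest-cost sellers.
pairs = zip(sorted(values, reverse=True), sorted(costs))
traded = [(v, c) for v, c in pairs if v >= c]
eq_surplus = total_surplus([v for v, _ in traded], [c for _, c in traded])

ceiling = 30
sold_costs = [c for c in costs if c <= ceiling]  # only these sellers still sell
eligible = [v for v in values if v >= ceiling]   # these buyers want to buy
q = len(sold_costs)

# Textbook DWL triangle: the q units go to the highest-value eligible buyers.
best = total_surplus(sorted(eligible, reverse=True)[:q], sold_costs)

# Random rationing: the q units go to a random subset of eligible buyers.
rng = random.Random(0)
trials = [total_surplus(rng.sample(eligible, q), sold_costs) for _ in range(10_000)]
random_avg = sum(trials) / len(trials)

print(f"equilibrium surplus: {eq_surplus}")
print(f"DWL triangle (efficient rationing): {eq_surplus - best}")
print(f"average lost surplus (random rationing): {eq_surplus - random_avg:.0f}")
```

Here the triangle is 400, while random rationing loses about 1015 on average. The exact expectation is 2500 - (30 * 65 - 465) = 1015, since the average eligible buyer values the good at $65.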
Good point. By searching for "deadweight loss price ceiling lower bound" I was able to find a source (see page 26) that acknowledges this, but most explications of price ceilings do not seem to mention that the triangle is just a lower bound for lost surplus.
Lost surplus is definitely a loss; it's not linear with utility, but it's not uncorrelated. Also, if supply is elastic over any relevant timeframe, there's an additional source of loss. And I'd argue that for most goods, over timeframes smaller than most price-fixing proposals are expected to last, there is significant price elasticity.
Lost surplus is definitely a loss; it's not linear with utility, but it's not uncorrelated.
I don't think I was disagreeing?
Ah, I took the "just" in "just a lower bound on lost surplus" as an indicator that it's less important than other factors. And I lightly believe (meaning: for the cases I find most available, I believe it, but I don't know how general it is) that the supply elasticity _is_ the more important effect of such distortions.
So I wanted to reinforce that I wasn't ignoring that cost, only pointing out a greater cost.
I had an intuition that attainable utility preservation (RL but you maintain your ability to achieve other goals) points at a broader template for regularization. AUP regularizes the agent's optimal policy to be more palatable towards a bunch of different goals we may wish we had specified. I hinted at the end of Towards a New Impact Measure that the thing-behind-AUP might produce interesting ML regularization techniques.
This hunch was roughly correct; Model-Agnostic Meta-Learning (MAML) tunes the network parameters such that they can be quickly adapted to achieve low loss on other tasks (the problem of few-shot learning). The parameters are not overfit on the scant few data points to which the parameters are adapted, which is also interesting.
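A toy sketch of MAML's core move under simplifying assumptions that are not from the paper: one-dimensional quadratic task losses, exact gradients, and a single inner adaptation step. Differentiating through the adaptation step places the initialization somewhere other than the plain average-loss minimizer. In this example the first task gets zero weight, because one inner step solves it from any starting point.

```python
# Hypothetical tasks: task i has loss L_i(theta) = a_i * (theta - c_i)**2.
TASKS = [(1.0, -2.0), (0.2, 3.0), (0.5, 5.0)]  # (curvature a_i, optimum c_i)
ALPHA = 0.5  # inner-loop (adaptation) step size
BETA = 0.5   # outer-loop (meta) step size


def grad(theta, a, c):
    """dL_i/dtheta for the quadratic task loss."""
    return 2 * a * (theta - c)


def adapt(theta, a, c):
    """One inner gradient step on a single task: the few-shot adaptation."""
    return theta - ALPHA * grad(theta, a, c)


def meta_grad(theta):
    """Gradient of sum_i L_i(adapt(theta)), differentiating through the inner step."""
    total = 0.0
    for a, c in TASKS:
        d_adapt = 1 - ALPHA * 2 * a  # d adapt(theta) / d theta
        total += grad(adapt(theta, a, c), a, c) * d_adapt
    return total


theta = 0.0
for _ in range(500):
    theta -= BETA * meta_grad(theta)


def weighted_mean(weights):
    """Weighted average of the task optima c_i."""
    return sum(w * c for w, (_, c) in zip(weights, TASKS)) / sum(weights)


# Closed forms for this toy: MAML weights task i by a_i * (1 - 2*ALPHA*a_i)**2,
# while plain joint training (minimize sum_i L_i(theta)) weights it by a_i.
maml_optimum = weighted_mean([a * (1 - 2 * ALPHA * a) ** 2 for a, _ in TASKS])
joint_optimum = weighted_mean([a for a, _ in TASKS])
print(round(theta, 3), round(maml_optimum, 3), round(joint_optimum, 3))  # 3.988 3.988 0.647
```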
Dylan: There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?
Lucas: No, I would not.
Dylan: I think most people wouldn’t. Now, what if I told you that that program was actually implemented as a search that’s using the correct goal test? It actually turns out that if it’s within 10 steps of a winning play, it always finds that for you, but because of computational limitations, it usually doesn’t. Now, is the system value-aligned? I think it’s a little harder to tell here. What I do find is that when I tell people the story, and I start off with the search algorithm with the correct goal test, they almost always say that that is value-aligned but stupid.
There’s an interesting thing going on here, which is we’re not totally sure what the target we’re shooting for is. You can take this thought experiment and push it further. Suppose you’re doing that search, but, now, it says it’s heuristic search that uses the correct goal test but has an adversarially chosen heuristic function. Would that be a value-aligned system? Again, I’m not sure. If the heuristic was adversarially chosen, I’d say probably not. If the heuristic just happened to be bad, then I’m not sure.
Consider the optimizer/optimized distinction: the AI assistant is better described as optimized to either help or stop you from winning the game. This optimization may or may not have been carried out by a process which is "aligned" with you; I think that ascribing intent alignment to the assistant's creator makes more sense. In terms of the adversarial heuristic case, intent alignment seems unlikely.
But, this also feels like passing the buck – hoping that at some point in history, there existed something to which we are comfortable ascribing alignment and responsibility.
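A toy sketch of the heuristic-search variant of Dylan's thought experiment, set in a hypothetical line world rather than chess. Two greedy best-first searches share the same correct goal test and the same expansion budget, which stands in for "computational limitations", and differ only in their heuristic.

```python
import heapq


def greedy_best_first(start, is_goal, neighbors, heuristic, budget):
    """Greedy best-first search that gives up after `budget` node expansions.

    Returns the path to the first goal state popped, or None if the budget runs out.
    """
    frontier = [(heuristic(start), start)]
    parent = {start: None}
    expansions = 0
    while frontier and expansions < budget:
        _, state = heapq.heappop(frontier)
        if is_goal(state):  # the (correct) goal test
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        expansions += 1
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(frontier, (heuristic(nxt), nxt))
    return None


# Hypothetical world: positions 0..20 on a line, start at 10, the "win" at 17.
def line(s):
    return [n for n in (s - 1, s + 1) if 0 <= n <= 20]


def win(s):
    return s == 17


def helpful(s):
    return abs(s - 17)  # ranks states closer to the win first


def adversarial(s):
    return -abs(s - 17)  # ranks states farther from the win first


print(greedy_best_first(10, win, line, helpful, budget=8))      # [10, 11, ..., 17]
print(greedy_best_first(10, win, line, adversarial, budget=8))  # None
```

With a small enough budget, even the helpful heuristic fails (for example `budget=3`), which matches the "value-aligned but stupid" case.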
On page 22 of Probabilistic Reasoning in Intelligent Systems, Pearl writes:
Raw experiential data is not amenable to reasoning activities such as prediction and planning; these require that data be abstracted into a representation with a coarser grain. Probabilities are summaries of details lost in this abstraction...
An agent observes a sequence of images displaying either a red or a blue ball. The balls are drawn according to some deterministic rule of the time step. Reasoning directly from the experiential data leads to ~Solomonoff induction. What might Pearl's "coarser grain" look like for a real agent?
Imagine an RNN trained with gradient descent and a binary cross-entropy loss function ("given the data so far, did it correctly predict the next draw?"), and suppose the learned predictive accuracy is good. How might this happen?

1) The network learns to classify whether the most recent input image contains a red or blue ball, for instrumental predictive reasons, and
2) a recurrent state records salient information about the observed sequence, which could be arbitrarily long. The RNN + learned weights form a low-complexity function approximator in the space of functions on arbitrary-length sequences. My impression is that gradient descent has simplicity as an inductive bias (cf. the double descent debate).
Being an approximation of some function over arbitrary-length sequences, the network outputs a prediction for the next color, a specific feature of the next image in the sequence. Can this prediction be viewed as nontrivially probabilistic? In other words, could we use the output to learn about the network's "beliefs" over hypotheses which generate the sequence of balls?
The RNN probably isn't approximating the true (deterministic) hypothesis which explains the sequence of balls. Since it's trained to minimize cross-entropy loss, it learns to hedge, essentially making it approximate a distribution over hypotheses. This implicitly defines its "posterior probability distribution".
Under this interpretation, the output is just the measure of hypotheses predicting blue versus the measure predicting red.
In particular, the coarse grain is what I mentioned in 1) – beliefs are easier to manage with respect to a fixed featurization of the observation space.
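A small sketch of the "mixture of deterministic hypotheses" reading, under assumptions that are not in the post. The hypotheses are periodic color patterns with period at most 6, weighted by a 4^-length simplicity prior, and the predictor is the exact Bayesian mixture rather than a trained RNN. Under log loss, this mixture is the best predictor in expectation over that prior. That is the sense in which a well-trained network might approximate something like it.

```python
from itertools import product


def hypotheses(max_period=6):
    """Deterministic hypotheses: "the colors repeat this block forever".

    Each block gets prior weight 4**-len(block), an assumed simplicity prior.
    """
    for length in range(1, max_period + 1):
        for block in product("RB", repeat=length):
            yield block, 4.0 ** -length


def predict_blue(observed, max_period=6):
    """Posterior probability that the next ball is blue ("B") given observed colors.

    This is exactly the posterior mass of the consistent hypotheses that predict blue.
    """
    t = len(observed)
    mass = {"R": 0.0, "B": 0.0}
    for block, prior in hypotheses(max_period):
        if all(block[i % len(block)] == color for i, color in enumerate(observed)):
            mass[block[t % len(block)]] += prior
    return mass["B"] / (mass["R"] + mass["B"])


print(predict_blue("RRB"))     # 7/30: hedged, though every hypothesis is deterministic
print(predict_blue("RBRBRB"))  # 0.0: every surviving hypothesis predicts red next
```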
Only related to the first part of your post, I suspect Pearl!2020 would say the coarse-grained model should be some sort of causal model on which we can do counterfactual reasoning.
We can imagine aliens building a superintelligent agent which helps them get what they want. This is a special case of aliens inventing tools. What kind of general process should these aliens use – how should they go about designing such an agent?
Assume that these aliens want things in the colloquial sense (not that they’re e.g. nontrivially VNM EU maximizers) and that a reasonable observer would say they’re closer to being rational than anti-rational. Then it seems^{[1]} like these aliens eventually steer towards reflectively coherent rationality (provided they don’t blow themselves to hell before they get there): given time, they tend to act to get what they want, and act to become more rational. But, they aren’t fully “rational”, and they want to build a smart thing that helps them. What should they do?
In this situation, it seems like they should build an agent which empowers them & increases their flexible control over the future, since they don’t fully know what they want now. Lots of flexible control means they can better error-correct and preserve value for what they end up believing they actually want. This also protects them from catastrophe and unaligned competitor agents.
I don’t know if this is formally and literally always true, I’m just trying to gesture at an intuition about what kind of agentic process these aliens are. ↩︎
Ordinal preferences just tell you which outcomes you like more than others: apples more than oranges.
Interval scale preferences assign numbers to outcomes, which communicates how close outcomes are in value: kiwi 1, orange 5, apple 6. You can say that apples have 5 times the advantage over kiwis that they do over oranges, but you can't say that apples are six times as good as kiwis. Fahrenheit and Celsius are also like this.
Ratio scale ("rational"? 😉) preferences do let you say that apples are six times as good as kiwis, and you need this property to maximize expected utility. You have to be able to weigh off the relative desirability of different outcomes, and ratio scale is the structure which lets you do it – the important content of a utility function isn't in its numerical values, but in the ratios of the valuations.
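A quick check of the interval-scale claim, using the fruit numbers from above and an arbitrarily chosen positive affine rescaling (multiply by 2, then add 10). This is the same kind of transformation that relates Celsius to Fahrenheit.

```python
prefs = {"kiwi": 1.0, "orange": 5.0, "apple": 6.0}


def affine(u, scale=2.0, shift=10.0):
    """A positive affine transformation (scale > 0) of every utility value."""
    return {k: scale * v + shift for k, v in u.items()}


def advantage_ratio(u):
    """(apple - kiwi) / (apple - orange): 'apples beat kiwis 5x as much as oranges'."""
    return (u["apple"] - u["kiwi"]) / (u["apple"] - u["orange"])


def value_ratio(u):
    """apple / kiwi: 'apples are six times as good as kiwis'."""
    return u["apple"] / u["kiwi"]


shifted = affine(prefs)
print(advantage_ratio(prefs), advantage_ratio(shifted))  # 5.0 5.0
print(value_ratio(prefs), value_ratio(shifted))          # 6.0 1.8333333333333333
```

The ratio of differences survives the rescaling, but the ratio of raw values does not.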
Isn't the typical assumption in game theory that preferences are ordinal? This suggests that you can make quite a few strategic decisions without bringing in ratio.
From what I have read, and from self-introspection, humans mostly have ordinal preferences. Some of them we can interpolate to interval scales or ratios (or higher-order functions) but if we extrapolate very far, we get odd results.
It turns out you can do a LOT with just ordinal preferences. Almost all realworld decisions are made this way.
It seems to me that Zeno's paradoxes leverage incorrect, naïve notions of time and computation. We exist in the world, and we might suppose that the world is being computed in some way. If time is continuous, then the computer might need to do some pretty weird things to determine our location at an infinite number of intermediate times. However, even if that were the case, we would never notice it – we exist within time and we would not observe the external behavior of the system which is computing us, nor its runtime.
What are your thoughts on infinitely small quantities?
Don't have much of an opinion; I haven't rigorously studied infinitesimals yet. I usually just think of infinite / infinitely small quantities as being produced by limiting processes. For example, the intersection of all the balls around a real number is just that number (under the standard topology), which is a set of measure 0 and is, in a sense, "infinitely small".
Very rough idea
In 2018, I started thinking about corrigibility as "being the kind of agent lots of agents would be happy to have activated". This seems really close to a more ambitious version of what AUP tries to do (not be catastrophic for most agents).
I wonder if you could build an agent that rewrites itself / makes an agent which would tailor the AU landscape towards its creators' interests, under a wide distribution of creator agent goals/rationalities/capabilities. And maybe you then get a kind of generalization, where most simple algorithms which solve this solve ambitious AI alignment in full generality.
My autodidacting has given me a mental reflex which attempts to construct a gears-level explanation of almost any claim I hear. For example, when listening to “Listen to Your Heart” by Roxette:
Listen to your heart,
There’s nothing else you can do
I understood what she obviously meant and simultaneously found myself subvocalizing “she means all other reasonable plans are worse than listening to your heart, not that that’s literally all you can do”.
This reflex is really silly and annoying in the wrong context; I’ll fix it soon. But it’s pretty amusing that this is now how I process claims by default, and I think it usually serves me well.
The framing effect & aversion to losses generally cause us to execute more cautious plans. I’m realizing this is another reason to reframe my x-risk motivation from “I won’t let the world be destroyed” to “there’s so much fun we could have, and I want to make sure that happens”. I think we need more exploratory thinking in alignment research right now.
(Also, the former motivation style led to me crashing and burning a bit when my hands were injured and I was no longer able to do much.)
ETA: actually, I’m realizing I had the effect backwards. Framing via losses actually encourages more risk-taking plans. Oops. I’d like to think about this more, since I notice my model didn’t protest when I argued the opposite of the experimental conclusions.
I’m realizing how much more risk-neutral I should be:
Paul Samuelson... offered a colleague a coin-toss gamble. If the colleague won the coin toss, he would receive $200, but if he lost, he would lose $100. Samuelson was offering his colleague a positive expected value with risk. The colleague, being risk-averse, refused the single bet, but said that he would be happy to toss the coin 100 times! The colleague understood that the bet had a positive expected value and that across lots of bets, the odds virtually guaranteed a profit. Yet with only one trial, he had a 50% chance of regretting taking the bet.
Notably, Samuelson’s colleague doubtless faced many gambles in life… He would have fared better in the long run by maximizing his expected value on each decision... all of us encounter such “small gambles” in life, and we should try to follow the same strategy. Risk aversion is likely to tempt us to turn down each individual opportunity for gain. Yet the aggregated risk of all of the positive expected value gambles that we come across would eventually become infinitesimal, and potential profit quite large.
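A quick check of the arithmetic in the quote, assuming a fair coin and independent tosses.

```python
from math import comb

WIN, LOSS = 200, 100  # the bet from the quote: +$200 on heads, -$100 on tails


def prob_net_loss(n, win=WIN, loss=LOSS):
    """Probability of ending behind after n independent fair tosses of the bet."""
    losing = sum(comb(n, k) for k in range(n + 1) if k * win - (n - k) * loss < 0)
    return losing / 2 ** n


def expected_value(n, win=WIN, loss=LOSS):
    """Expected total winnings over n tosses."""
    return n * (0.5 * win - 0.5 * loss)


for n in (1, 100):
    print(f"{n:>3} tosses: EV ${expected_value(n):,.0f}, "
          f"P(ending behind) = {prob_net_loss(n):.4f}")
```

One toss loses half the time. Over 100 tosses, you end behind only if you win 33 or fewer, which happens well under 1% of the time, while the expected gain is $5,000.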
For what it's worth, I tried something like the "I won't let the world be destroyed" → "I want to make sure the world keeps doing awesome stuff" reframing back in the day and it broadly didn't work. This had less to do with cautious/uncautious behavior and more to do with status quo bias. Saying "I won't let the world be destroyed" treats "the world being destroyed" as an event that deviates from the status quo of the world existing. In contrast, saying "There's so much fun we could have" treats "having more fun" as the event that deviates from the status quo of us not continuing to have fun.
When I saw the world being destroyed as status quo, I cared a lot less about the world getting destroyed.
I was having a bit of trouble holding the point of quadratic residues in my mind. I could effortfully recite the definition, give an example, and walk through the broad-strokes steps of proving quadratic reciprocity. But it felt fake and stale and memorized.
Alex Mennen suggested a great way of thinking about it. For some odd prime $p$, consider the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$. This group is abelian and has even order $p - 1$. Now, consider a primitive root / generator $g$. By definition, every element of the group can be expressed as $g^e$. The quadratic residues are those expressible by even $e$ (this is why, for prime numbers, half of the group is square mod $p$). This also lets us easily see that the residual subgroup is closed under multiplication by $g^2$ (which generates it), that two nonresidues multiply to make a residue, and that a residue and nonresidue make a nonresidue. The Legendre symbol then just tells us, for $a = g^e$, whether $e$ is even.
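A brute-force sketch of the generator picture for small primes. The helper names are hypothetical, and the primitive-root search is the naive one, which is fine for small p. The code checks that the even powers of a generator are exactly the squares, and that the parity of the discrete log agrees with Euler's criterion.

```python
def primitive_root(p):
    """Smallest generator of the multiplicative group mod an odd prime p (naive search)."""
    order = p - 1
    primes_dividing = [q for q in range(2, order + 1)
                       if order % q == 0 and all(q % r for r in range(2, q))]
    for g in range(2, p):
        # g generates the group iff g^(order/q) != 1 for every prime q dividing the order.
        if all(pow(g, order // q, p) != 1 for q in primes_dividing):
            return g
    raise ValueError("p must be an odd prime")


def residues_via_generator(p):
    """Quadratic residues as the even powers g^0, g^2, g^4, ... of a generator g."""
    g = primitive_root(p)
    return {pow(g, e, p) for e in range(0, p - 1, 2)}


def legendre_via_discrete_log(a, p):
    """Legendre symbol (a/p) for p not dividing a: +1 if a = g^e with e even, else -1."""
    g = primitive_root(p)
    e = next(e for e in range(p - 1) if pow(g, e, p) == a % p)
    return 1 if e % 2 == 0 else -1


p = 23
squares = {x * x % p for x in range(1, p)}
assert residues_via_generator(p) == squares and len(squares) == (p - 1) // 2
```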
Now, consider composite numbers $n$ whose prime decomposition only contains $0$ or $1$ in the exponents (i.e. squarefree $n$). By the fundamental theorem of finite abelian groups and the Chinese remainder theorem, we see that a number is square mod $n$ iff it is square mod all of the prime factors of $n$.
I'm still a little confused about how to think of squares mod $p^k$.
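A brute-force check of the squarefree claim (a sanity check, not a proof), plus a small example of why the exponent restriction matters. The helper names are hypothetical.

```python
def is_square_mod(a, n):
    """Brute force: does x*x == a (mod n) have a solution?"""
    return any(x * x % n == a % n for x in range(n))


def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def square_mod_all_factors(a, n):
    """Is a a square modulo every prime factor of n?"""
    return all(is_square_mod(a, q) for q in prime_factors(n))


for n in (15, 21, 30, 105):  # squarefree: every prime exponent is 1
    assert all(is_square_mod(a, n) == square_mod_all_factors(a, n) for a in range(n))

# With a repeated prime factor the shortcut fails: 3 is a square mod 3 (it is 0), but not mod 9.
print(is_square_mod(3, 9), square_mod_all_factors(3, 9))  # False True
```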
The theorem: where $a$ is relatively prime to an odd prime $p$ and $n < k$, $p^n a$ is a square mod $p^k$ iff $a$ is a square mod $p$ and $n$ is even.
The real meat of the theorem is the $n = 0$ case (i.e. a square mod $p$ that isn't a multiple of $p$ is also a square mod $p^k$). Deriving the general case from there should be fairly straightforward, so let's focus on this special case.
Why is it true? This question has a surprising answer: Newton's method for finding roots of functions. Specifically, we want to find a root of $f(x) := x^2 - a$, except in $\mathbb{Z}/p^k\mathbb{Z}$ instead of $\mathbb{R}$.
To adapt Newton's method to work in this situation, we'll need the p-adic absolute value on $\mathbb{Z}$: $|p^n a|_p := p^{-n}$ for $a$ relatively prime to $p$. This has lots of properties that you should expect of an "absolute value": it's positive ($|x|_p \geq 0$, with $|x|_p = 0$ only when $x = 0$), multiplicative ($|xy|_p = |x|_p |y|_p$), symmetric ($|-x|_p = |x|_p$), and satisfies a triangle inequality ($|x + y|_p \leq |x|_p + |y|_p$; in fact, we get more in this case: $|x + y|_p \leq \max(|x|_p, |y|_p)$). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, an ultrametric, because of the strong version of the triangle inequality) $d(x, y) := |x - y|_p$. To visualize this distance function, draw $p$ giant circles, and sort integers into circles based on their value mod $p$. Then draw $p$ smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod $p^2$. Then draw $p$ even smaller circles inside each of those, and sort based on value mod $p^3$, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, $p^n$ converges to $0$.
Now on to Newton's method: if $a$ is a square mod $p$, let $x_0$ be one of its square roots mod $p$. Then $|f(x_0)|_p \leq 1/p$; that is, $x_0$ is somewhat close to being a root of $f$ with respect to the p-adic absolute value. Also $f'(x_0) = 2x_0$, so $|f'(x_0)|_p = 1$ (since $p$ is odd and $p \nmid x_0$); that is, $f$ is steep near $x_0$. This is good, because starting close to a root and the slope of the function being steep enough are things that help Newton's method converge; in general, it might bounce around chaotically instead. Specifically, it turns out that, in this case, $|f(x_0)|_p < |f'(x_0)|_p^2$ is exactly the right sense of being close enough to a root with steep enough slope for Newton's method to work.
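Now, Newton's method says that, from $x_0$, you should go to $x_1 := x_0 - \frac{f(x_0)}{f'(x_0)}$. Since $f'(x_0)$ is invertible mod $p^k$, we can do this. Now here's the kicker: $f(x_1) = f(x_0) - f'(x_0)\frac{f(x_0)}{f'(x_0)} + \left(\frac{f(x_0)}{f'(x_0)}\right)^2 = \left(\frac{f(x_0)}{f'(x_0)}\right)^2$, so $|f(x_1)|_p = \frac{|f(x_0)|_p^2}{|f'(x_0)|_p^2} = |f(x_0)|_p^2$. That is, $x_1$ is closer to being a root of $f$ than $x_0$ is. Now we can just iterate this process until we reach $x_n$ with $|f(x_n)|_p \leq p^{-k}$, and we've found our square root of $a$ mod $p^k$.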
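A minimal sketch of this lifting argument (Hensel's lemma in its simplest form). The helper names are hypothetical. The starting root mod p is found by brute force, which is only reasonable for small p, and `pow(df, -1, modulus)` (Python 3.8+) plays the role of dividing by f'(x).

```python
def padic_abs(x, p):
    """p-adic absolute value: |p**n * a|_p = p**(-n) for a coprime to p, and |0|_p = 0."""
    if x == 0:
        return 0.0
    n = 0
    while x % p == 0:
        x //= p
        n += 1
    return float(p) ** -n


def sqrt_mod_prime_power(a, p, k):
    """Square root of a mod p**k, for an odd prime p with p not dividing a.

    Starts from a root mod p and applies the Newton step x -> x - f(x)/f'(x)
    for f(x) = x**2 - a, dividing via a modular inverse.
    Raises StopIteration if a is not a square mod p.
    """
    modulus = p ** k
    x = next(x for x in range(1, p) if (x * x - a) % p == 0)  # x_0
    while (x * x - a) % modulus != 0:
        f, df = x * x - a, 2 * x
        x = (x - f * pow(df, -1, modulus)) % modulus
        # |f(x)|_p is squared at each step (until it reaches p**-k),
        # so the loop runs about log2(k) times.
    return x


root = sqrt_mod_prime_power(2, 7, 10)
print(padic_abs(root * root - 2, 7) <= 7.0 ** -10)  # True
```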
Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.
The part about derivatives might have seemed a little odd. After all, you might think, $\mathbb{Z}$ is a discrete set, so what does it mean to take derivatives of functions on it? One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we're using a different metric than usual, and $\mathbb{Z}$ isn't discrete at all! Indeed, for any number $x$, $\lim_{n \to \infty} (x + p^n) = x$, so no points are isolated, and we can define differentiation of functions on $\mathbb{Z}$ in exactly the usual way with limits.
I noticed I was confused and liable to forget my grasp on what the hell is so "normal" about normal subgroups. You know what that means: colorful picture time!
First, the classic definition. A subgroup $H \leq G$ is normal when, for all group elements $g$, $gH = Hg$ (this is trivially true for all subgroups of abelian groups).
ETA: I drew the bounds a bit incorrectly; is most certainly within the left coset ().
Notice that nontrivial cosets aren't subgroups, because they don't have the identity $e$.
This "normal" thing matters because sometimes we want to highlight regularities in the group by taking a quotient. Taking an example from the excellent Visual Group Theory, the integers have a quotient group $\mathbb{Z}/12\mathbb{Z}$ consisting of the congruence classes $0 + 12\mathbb{Z}, 1 + 12\mathbb{Z}, \ldots, 11 + 12\mathbb{Z}$, each integer slotted into a class according to its value mod 12. We're taking a quotient with the cyclic subgroup $\langle 12 \rangle = 12\mathbb{Z}$.
So, what can go wrong? Well, if the subgroup isn't normal, strange things can happen when you try to take a quotient.
Here's what's happening:
Normality means that when you form the new Cayley diagram, the arrows behave properly. You're at the origin, $H$. You travel to $gH$ using $g$. What we need for this diagram to make sense is that if you follow any $h \in H$ you please, applying $g^{-1}$ means you go back to $H$. In other words, $ghg^{-1} \in H$. In other words, $gHg^{-1} = H$. In other other words (and using a few properties of groups), $gH = Hg$.
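A brute-force sketch on S3, the smallest non-abelian group. The example is a choice made here, not the one from the pictures. The code checks normality via gH = Hg, and checks whether multiplying cosets by way of representatives is well-defined, which is what the quotient's Cayley diagram needs.

```python
from itertools import permutations, product


def compose(f, g):
    """Composite permutation of {0, 1, 2} stored as tuples: (f∘g)(i) = f(g(i))."""
    return tuple(f[g[i]] for i in range(len(g)))


G = list(permutations(range(3)))   # the symmetric group S3
E = (0, 1, 2)
H_FLIP = {E, (1, 0, 2)}            # generated by swapping 0 and 1
H_ROT = {E, (1, 2, 0), (2, 0, 1)}  # the rotations (A3)


def left_coset(g, H):
    return frozenset(compose(g, h) for h in H)


def right_coset(H, g):
    return frozenset(compose(h, g) for h in H)


def is_normal(H):
    """gH == Hg for every g in G."""
    return all(left_coset(g, H) == right_coset(H, g) for g in G)


def coset_product_well_defined(H):
    """Does (aH)(bH) := (ab)H come out the same for every choice of representatives?"""
    for a, a2, b, b2 in product(G, repeat=4):
        same_inputs = left_coset(a, H) == left_coset(a2, H) and left_coset(b, H) == left_coset(b2, H)
        if same_inputs and left_coset(compose(a, b), H) != left_coset(compose(a2, b2), H):
            return False
    return True


print(is_normal(H_ROT), coset_product_well_defined(H_ROT))    # True True
print(is_normal(H_FLIP), coset_product_well_defined(H_FLIP))  # False False
```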
One of the reasons I think corrigibility might have a simple core principle is: it seems possible to imagine a kind of AI which would make a lot of different possible designers happy. That is, if you imagine the same AI design deployed by counterfactually different agents with different values and somewhat-reasonable rationalities, it ends up doing a good job by almost all of them. It ends up acting to further the designers' interests in each counterfactual. This has been a useful informal way for me to think about corrigibility, when considering different proposals.
This invariance also shows up (in a different way) in AUP, where the agent maintains its ability to satisfy many different goals. In the context of long-term safety, AUP agents are designed to avoid gaining power, which implicitly ends up respecting the control of other agents present in the environment (no matter their goals).
I'm interested in thinking more about this invariance, and why it seems to show up in a sensible way in two different places.
Continuous functions can be represented by their values on the rationals; in particular, for each real number $x$, choose a sequence of rational numbers $(q_n)$ converging to $x$, and let $f(x) := \lim_{n \to \infty} f(q_n)$.
Therefore, there is an injection from the vector space of continuous functions $C(\mathbb{R})$ to the vector space of all real sequences $\mathbb{R}^{\mathbb{N}}$: since the rationals are countable, enumerate them by $q_1, q_2, \ldots$. Then the sequence $(f(q_1), f(q_2), \ldots)$ represents continuous function $f$.
This map is not a surjection because not every map from the rational numbers to the real numbers extends to a continuous function on the reals, and so not every sequence represents a continuous function. It is linear and injective, so it shows that a basis for the latter space is at least as large in cardinality as a basis for the former space. One can construct an injective linear map in the other direction, showing that both spaces have bases of the same cardinality, and so they are isomorphic.
(Just starting to learn microecon, so please feel free to chirp corrections)
How diminishing marginal utility helps create supply/demand curves: think about the uses you could find for a pillow. Your first few pillows are used to help you fall asleep. After that, maybe some for your couch, and then a few spares to keep in storage. You prioritize pillow allocation in this manner; the value of the latter uses is much less than the value of having a place to rest your head.
How many pillows do you buy at a given price point? Well, if you buy any, you'll buy some for your bed at least. Then, when pillows get cheap enough, you'll start buying them for your couch. At what price, exactly? Depends on the person, and their utility function. So as the price goes up or down, it does or doesn't become worth it to buy pillows for different levels of the "use hierarchy".
Then part of what the supply/demand curve is reflecting is the distribution of pillow use valuations in the market. It tracks when different uses become worth it for different agents, and how significant these shifts are!
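A minimal sketch of that story, assuming each buyer buys one pillow for every use whose value (in dollars) is at least the price, i.e. utility that is quasi-linear in money. The three buyers and their valuations are made up. Summing the per-buyer counts gives a market demand curve that falls as the price rises.

```python
from typing import List, Sequence


def quantity_demanded(use_values: Sequence[float], price: float) -> int:
    """One buyer's demand: one pillow per use worth at least the price.
    use_values is that buyer's "use hierarchy" (bed, couch, spares, ...)."""
    return sum(1 for value in use_values if value >= price)


def market_demand(buyers: Sequence[Sequence[float]], price: float) -> int:
    """Market demand at a price: add up every buyer's quantity."""
    return sum(quantity_demanded(values, price) for values in buyers)


# Made-up dollar valuations, highest-value use first.
buyers: List[List[float]] = [
    [40, 35, 15, 12, 3],  # two bed pillows, two couch pillows, one spare
    [30, 25, 20, 5],
    [50, 10],
]

for price in (45, 30, 18, 10, 4, 1):
    print(price, market_demand(buyers, price))
```

With these numbers, demand goes 1, 4, 6, 9, 10, 11 as the price falls from 45 to 1; each step is some buyer's next use in their hierarchy becoming worth it.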
Physics has existed for hundreds of years. Why can you reach the frontier of knowledge with just a few years of study? Think of all the thousands of insights and ideas and breakthroughs that have been had; yet I do not imagine you need most of those to grasp modern consensus.
Idea 1: the tech tree is rather horizontal; for any given question, several approaches and frames are tried. Some are inevitably more attractive or useful. You can view a Markov decision process in several ways: through the Bellman equations, through the structure of the state visitation distribution functions, through the environment's topology, or through the Markov chains induced by different policies. Almost everyone thinks about them in terms of Bellman equations; there were thousands of papers on that frame pre-2010, and you don't need to know most of them to understand how deep Q-learning works.
Idea 2: some "insights" are wrong (phlogiston) or approximate (Newtonian mechanics) and so are later discarded. The insights become historical curiosities and/or pedagogical tools and/or numerical approximations of a deeper phenomenon.
Idea 3: most work is on narrow questions which end up being dead ends or not generalizing. As a dumb example, I could construct increasingly precise torsion balance pendulums, in order to measure the mass of my copy of Dune to increasing accuracies. I would be learning new facts about the world using a rigorous and accepted methodology. But no one would care.
More realistically, perhaps only a few other algorithms researchers care about my refinement of a specialized sorting algorithm (from to ), but the contribution is still quite publishable and legible.
I'm not sure what publishing incentives were like before the second half of the 20th century, so perhaps this kind of research was less incentivized in the past.
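Idea 1's point about MDP frames can be made concrete with a toy sketch, assuming a hypothetical 3-state Markov chain induced by some fixed policy: the same state values come out of the Bellman frame (iterate $V \leftarrow r + \gamma P V$) and the state-visitation frame (dot the discounted state visitation counts with the reward vector). This is plain policy evaluation in pure Python, not Q-learning; the chain, rewards, $\gamma = 0.9$, and the truncation at 1000 steps are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

# Hypothetical 3-state Markov chain induced by some fixed policy:
# P[s][t] = Pr(next state t | current state s); r[s] = reward in state s.
P: Matrix = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
]
r: Vector = [0.0, 1.0, 2.0]
gamma = 0.9


def bellman_values(P: Matrix, r: Vector, gamma: float, iters: int = 1000) -> Vector:
    """Bellman frame: iterate V <- r + gamma * P V until (effectively) fixed."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]
    return v


def visitation_values(P: Matrix, r: Vector, gamma: float, iters: int = 1000) -> Vector:
    """Visitation frame: V(start) = sum_s d(s) * r(s), where
    d(s) = sum_t gamma^t * Pr(S_t = s | S_0 = start), truncated at iters steps."""
    n = len(r)
    values = []
    for start in range(n):
        dist = [1.0 if s == start else 0.0 for s in range(n)]  # Pr(S_t = s)
        visits = [0.0] * n  # discounted visitation counts d(s)
        discount = 1.0
        for _ in range(iters):
            visits = [visits[s] + discount * dist[s] for s in range(n)]
            dist = [sum(dist[u] * P[u][s] for u in range(n)) for s in range(n)]
            discount *= gamma
        values.append(sum(visits[s] * r[s] for s in range(n)))
    return values


print(bellman_values(P, r, gamma))
print(visitation_values(P, r, gamma))
```

The two printed vectors agree up to floating-point error; the frames differ in what they make easy to see, not in what they compute.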
Could this depend on your definition of "physics"? Like, if you use a narrow definition like "general relativity + quantum mechanics", you can learn that in a few years. But if you include things like electricity, expansion of the universe, fluid mechanics, particle physics, superconductors, optics, string theory, acoustics, aerodynamics... most of them may be relatively simple to learn, but all of them together are too much.
When under moral uncertainty, rational EV maximization will look a lot like preserving attainable utility / choiceworthiness for your different moral theories / utility functions, while you resolve that uncertainty.
This seems right to me, and I think it's essentially the rationale for the idea of the Long Reflection.
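A toy sketch of the moral-uncertainty point, under loud assumptions: two hypothetical theories `T1` and `T2` with made-up choiceworthiness scores on a common scale (itself a contested assumption), and a reflection period that fully resolves which theory is right at a small cost. Committing now earns the credence-weighted average of an outcome's scores; reflecting keeps both outcomes attainable and earns each theory's best outcome, weighted by credence, minus the delay cost.

```python
from typing import Dict

Credences = Dict[str, float]  # theory -> probability it is correct
Scores = Dict[str, Dict[str, float]]  # theory -> outcome -> choiceworthiness


def ev_commit(outcome: str, credences: Credences, scores: Scores) -> float:
    """Lock in an outcome now, before the uncertainty is resolved."""
    return sum(p * scores[theory][outcome] for theory, p in credences.items())


def ev_reflect(credences: Credences, scores: Scores, delay_cost: float) -> float:
    """Keep every outcome attainable, learn which theory is right (assumed to
    resolve fully), then pick that theory's best outcome."""
    return sum(p * max(scores[theory].values()) for theory, p in credences.items()) - delay_cost


credences = {"T1": 0.5, "T2": 0.5}
scores = {"T1": {"A": 10.0, "B": 0.0}, "T2": {"A": 0.0, "B": 10.0}}
print(ev_commit("A", credences, scores))              # 5.0
print(ev_commit("B", credences, scores))              # 5.0
print(ev_reflect(credences, scores, delay_cost=1.0))  # 9.0
```

The gap between 9.0 and 5.0 is the value of keeping options open; it shrinks if reflection is costly or resolves less of the uncertainty.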